# Sources

**GDP:** https://donnees.banquemondiale.org/indicateur/NY.GDP.MKTP.CD  
**Unemployment rate:**  
**Deposit interest rate:** https://donnees.banquemondiale.org/indicateur/FR.INR.DPST?most_recent_value_desc=false&view=chart  
**HICP:**  
**Stock price history:**  
**Currency:** https://ec.europa.eu/eurostat/databrowser/view/tec00033/default/table?lang=en&category=t_ert  
**Commodities:** https://bdm.insee.fr/series/sdmx/data/SERIES_BDM/010002100 **and** https://bdm.insee.fr/series/sdmx/data/SERIES_BDM/010002091  
**Public debt:**  


# Planned analyses and expected results

## Planned analyses

- Correlations between the different datasets  
- Study of stock market indices
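
As a rough illustration of the first item, here is a minimal sketch of a pairwise correlation matrix computed with pandas. The column names and all values are synthetic assumptions made up for the example; nothing here comes from the actual datasets.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data, for illustration only (not the project's data)
rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-01", periods=24, freq="MS")
gdp = pd.Series(np.linspace(100, 120, 24), index=dates)

indicators = pd.DataFrame({
    "gdp": gdp,
    "stock_index": gdp * 2 + rng.normal(0, 1, 24),             # built to follow GDP
    "unemployment": 10 - gdp * 0.05 + rng.normal(0, 0.1, 24),  # built to move against GDP
})

# Pairwise Pearson correlations between all columns
corr_matrix = indicators.corr(method="pearson")
print(corr_matrix.round(2))
```

Trending series tend to look correlated in levels, so correlating month-over-month changes (for example with `pct_change()`) is a common precaution before interpreting the coefficients.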

## Expected results

### GDP
Rising GDP is often associated with a strong economy, which can lift stock markets. The analysis will seek to quantify this relationship.

### Unemployment rate
A low unemployment rate can reflect a robust economy and a business-friendly climate, which in turn affects share prices. The correlations between unemployment data and stock market performance will be examined.

### Deposit interest rate
Changes in interest rates directly influence the attractiveness of equity investments. A negative correlation between interest rates and market performance is often expected.

### HICP (Harmonised Index of Consumer Prices)
Inflation, measured here by the HICP, is a key factor in understanding how financial markets adjust to changes in interest rates and prices.

### Stock price history
Analysing past trends in share prices will make it possible to assess how responsive markets are to changes in economic indicators.
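
One possible way to look at this responsiveness is a lagged correlation: correlate an indicator with the market returns observed a few months later. The sketch below is only an assumption about how this could be done; the helper `lagged_correlations` is hypothetical and the series are synthetic.

```python
import numpy as np
import pandas as pd

def lagged_correlations(indicator: pd.Series, returns: pd.Series, max_lag: int = 6) -> pd.Series:
    """Correlation between the indicator and the returns observed `lag` periods later."""
    return pd.Series(
        {lag: indicator.corr(returns.shift(-lag)) for lag in range(max_lag + 1)},
        name="correlation",
    )

# Synthetic example: the returns are built to echo the indicator 2 months later
rng = np.random.default_rng(1)
dates = pd.date_range("2013-01-01", periods=120, freq="MS")
indicator = pd.Series(rng.normal(size=120), index=dates)
returns = indicator.shift(2) + 0.1 * pd.Series(rng.normal(size=120), index=dates)

lags = lagged_correlations(indicator, returns)
best_lag = lags.idxmax()
print(lags.round(2))
print("strongest lag:", best_lag)
```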

### Currency
Exchange rate fluctuations can have a direct impact, particularly on companies operating internationally. The relationships between share prices and currency movements will be explored.

### Commodities
Some stock market sectors depend heavily on commodity prices. The study will analyse the specific correlations between these prices and share performance in the sectors concerned.

### Public debt
A country's level of debt can influence investor confidence and, as a result, market behaviour. Studying the correlations in this context will be essential.


# Start of the code

In [601]:
# Necessary imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import warnings
import xml.etree.ElementTree as ET
from collections import defaultdict
import requests

import ipywidgets as widgets
from ipywidgets import interact, interact_manual, VBox, HBox
import geopandas as gpd

# Public debt

In [602]:
# Silence the openpyxl warnings raised while reading the Excel file
warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")
# URLs of the data files
dette_publique_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/dette_pub.xlsx'
code_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/code.tsv'

# Load the files directly from the URLs
dette_publique = pd.read_excel(dette_publique_url, sheet_name='Feuille 1')
code = pd.read_csv(code_url, sep='\t')

# Define column names for code and label
col_code = 'CODE'
col_label = 'Label - French'

def find_value(x):
    """Find the label value based on the code."""
    matched = not code[code[col_code] == x].empty
    if matched:
        return code[code[col_code] == x].loc[:, col_label].iloc[0]
    elif x == 'TIME':
        return 'Country'
    return None

# Modify the index of the dataframe
dette_publique.index = dette_publique.iloc[:, 0].apply(find_value)
dette_publique.index.name = None

# Set the column names based on the 'Country' row
dette_publique.columns = dette_publique.loc['Country']

# Keep only the rows from 'Belgique' to 'Suède' and exclude the 'TIME' column.
# Only the countries are of interest here,
# and the data are missing for Islande, Norvège, Suisse and United Kingdom.
dette_publique = dette_publique.loc['Belgique':'Suède', dette_publique.columns != 'TIME']

# Keep only the columns from the year 2002 onwards
dette_publique = dette_publique.loc[:,2002:] # Unresolved issue: with a start year earlier than 2002,
                                             #                   the interpolation does not work anywhere.

def to_date(x):
    """Convert a value to datetime."""
    return pd.to_datetime(x, format='%Y')

# Vectorize the to_date function
vect_to_date = np.vectorize(to_date)

# Convert the columns to datetime
dette_publique.columns = vect_to_date(dette_publique.columns.values)

monthly_dates = pd.date_range(start=dette_publique.columns.values[0], end=dette_publique.columns.values[-1], freq='MS')

# Add a column for each month between the first and the last year
dette_publique = dette_publique.reindex(columns=dette_publique.columns.union(monthly_dates))

def fill_val(x):
    """Fill missing values by resampling and interpolating."""
    return x.resample('MS').interpolate(method='quadratic')

# Apply the fill_val function to each row
dette_publique = dette_publique.apply(func=fill_val, axis=1).T
dette_publique.columns.name = 'Country'

dette_publique.isna().sum().sum() # Number of missing values (0)


np.int64(0)
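
The annual-to-monthly step above can be tried in isolation on a toy series. This is a minimal sketch under stated assumptions: the values are invented, and it uses `method="time"` rather than the notebook's `'quadratic'`, because the quadratic method is delegated to SciPy while the time-based one needs no extra dependency.

```python
import pandas as pd

# Hypothetical annual values for one country (not taken from the dataset)
annual = pd.Series(
    [100.0, 112.0, 118.0],
    index=pd.to_datetime(["2020", "2021", "2022"], format="%Y"),
)

# Upsample to month starts: the new months first appear as NaN
monthly = annual.resample("MS").asfreq()

# Fill the gaps; 'time' interpolates linearly, weighted by elapsed days
monthly = monthly.interpolate(method="time")

print(monthly.loc["2020-01":"2020-04"])
```

The original annual points are left untouched; only the months in between are estimated, so the monthly series carries no more information than the annual one.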

# HICP (IPCH)

In [603]:
# Define URLs for the data sources
ipch_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/ipch.tsv'
code_cp_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/code_cp.tsv'

# Load the data directly from the URLs
ipch = pd.read_csv(ipch_url, sep='\t')  # Load the ipch data using tab as a separator
code_cp = pd.read_csv(code_cp_url, sep='\t')  # Load the code_cp data using tab as a separator


# Set index
def get_CP(x): return x[8:12]  # Extract CP code
def get_id(x): return x[13:]  # Extract id
vect_get_cp = np.vectorize(get_CP)
vect_get_id = np.vectorize(get_id)
ipch['CP'] = vect_get_cp(ipch.iloc[:, 0])  # Apply CP extraction
ipch['id'] = vect_get_id(ipch.iloc[:, 0])  # Apply id extraction

ipch = ipch[ipch['CP'] == 'CP00']
ipch.drop(columns = 'CP', inplace = True)

ipch.drop(columns='freq,unit,coicop,geo\\TIME_PERIOD', inplace=True)  # Drop unnecessary columns
ipch['country'] = ipch.loc[:, 'id'].apply(find_value)  # Find country names
ipch.set_index(['country'], inplace=True)  # Set index
ipch.drop(columns='id', inplace=True)  # Drop id column

# Convert the columns to datetime
def to_date_M(x):
    """Convert a 'YYYY-MM' column label (followed by one trailing character) to datetime."""
    try:
        return pd.to_datetime(x[:-1], format='%Y-%m')
    except (TypeError, ValueError):
        print('fail')
        return x

vect_to_date_M = np.vectorize(to_date_M)
ipch.columns = vect_to_date_M(ipch.columns.values)  # Apply datetime conversion

# Filter data
ipch = ipch[~ipch.index.str.startswith(('Union', 'Zone', 'Espace'))]  # Exclude the rows whose label starts with 'Union', 'Zone' or 'Espace'

# Clean and convert to numeric
ipch = ipch.map(lambda x: pd.to_numeric(
    str(x).replace(' ', '').replace('d', ''), errors='coerce'))
ipch = ipch.T
ipch.columns.name = 'Country'

# Missing values
# Dictionary storing the ranges of missing dates per country
missing_ranges = defaultdict(list)

# Flag the missing values
missing_values = ipch.isna()

# Scan each country (column) for ranges of missing dates
for country in ipch.columns:
    country_missing = missing_values[country]
    if country_missing.any():
        # Find the ranges of missing dates
        missing_dates = country_missing[country_missing].index
        start_date = None
        for date in missing_dates:
            if start_date is None:
                start_date = date
            if (date + pd.DateOffset(months=1)) not in missing_dates:
                end_date = date
                missing_ranges[country].append((start_date.strftime('%Y-%m'), end_date.strftime('%Y-%m')))
                start_date = None

# Convert the defaultdict to a plain dict
missing_ranges = dict(missing_ranges)

for country, dates in missing_ranges.items():
    print(f'{country}: {dates}')

# Drop the countries whose missing data spans more than 4 years
ipch.drop(columns = 'Albanie, Kosovo*, Monténégro'.split(', '), inplace = True)

# Fill missing values using interpolation as the most consistent method for long gaps
ipch.interpolate(method='time', inplace=True, limit_direction='both')  # Interpolate linearly by date for smoother transitions

# Optionally fill remaining missing values (if interpolation failed for some edge cases) with column mean
ipch.fillna(ipch.mean(), inplace=True)

ipch.isna().sum().sum() # Number of missing values (0)

Albanie: [('2010-12', '2016-11')]
Monténégro: [('2010-12', '2015-11')]
United Kingdom: [('2020-12', '2024-11')]
Kosovo*: [('2010-12', '2016-11')]


np.int64(0)
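
The search for ranges of missing dates can also be written without the explicit date arithmetic. The sketch below is an alternative, not the notebook's method: the helper `missing_ranges_for` is hypothetical, and it assumes the index has one row per month with no skipped rows, so that consecutive rows mean consecutive months.

```python
import numpy as np
import pandas as pd

def missing_ranges_for(series: pd.Series) -> list[tuple[str, str]]:
    """Return the (start, end) labels of each run of consecutive NaN values."""
    is_na = series.isna()
    # A new run starts whenever the NaN flag differs from the previous row
    run_id = (is_na != is_na.shift()).cumsum()
    ranges = []
    for _, run in is_na[is_na].groupby(run_id[is_na]):
        ranges.append((run.index[0].strftime("%Y-%m"), run.index[-1].strftime("%Y-%m")))
    return ranges

dates = pd.date_range("2020-01-01", periods=8, freq="MS")
toy = pd.Series([1.0, np.nan, np.nan, 4.0, 5.0, np.nan, 7.0, 8.0], index=dates)

toy_ranges = missing_ranges_for(toy)
print(toy_ranges)
```

Applied to the notebook's frame, this would be called once per column, for example `{c: missing_ranges_for(ipch[c]) for c in ipch.columns}`.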

# Unemployment

In [604]:
def parse_xml(file_url):
    """Parse XML file and extract data into a DataFrame."""
    
    # Download the XML file content using requests
    response = requests.get(file_url)
    response.raise_for_status()  # Stop early if the download failed
    xml_content = response.text  # Get the content of the XML file as a string
    
    # Parse the XML content with ElementTree
    root = ET.fromstring(xml_content)  # Parse the XML string directly
    
    # Initialize lists to store data
    data = []
    columns = set()
    rows = set()

    # Extract data from <Series> and <Obs> tags
    for series in root.findall('.//Series'):
        geo = series.attrib.get('geo')  # Get "geo" attribute
        if geo:
            rows.add(geo)
            for obs in series.findall('Obs'):
                time_period = obs.attrib.get('TIME_PERIOD')  # Get "TIME_PERIOD" attribute
                obs_value = obs.attrib.get('OBS_VALUE')  # Get "OBS_VALUE" attribute
                if time_period and obs_value:
                    columns.add(time_period)
                    data.append((geo, time_period, obs_value))

    # Create DataFrame with appropriate indices
    df = pd.DataFrame(index=sorted(rows), columns=sorted(columns))

    # Fill DataFrame with extracted values
    for geo, time_period, obs_value in data:
        df.at[geo, time_period] = obs_value

    return df

# URL for the XML data
file_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/chomage.xml'

# Load the data
chomage = parse_xml(file_url)

# Format the data
chomage.columns = chomage.columns.map(lambda x: \
                                      pd.to_datetime(x, format='%Y-%m'))  # Convert columns to datetime
chomage.index = chomage.index.map(lambda x: \
                code.loc[code.loc[:, 'CODE'] == x, 'Label - French'].iloc[0])  # Map index to labels
chomage.drop('Zone euro - 20 pays (à partir de 2023)', inplace=True)  # Drop specific rows
chomage.drop('Union européenne - 27 pays (à partir de 2020)', inplace=True)  # Drop specific rows
chomage = chomage.apply(pd.to_numeric).T  # Convert data to numeric
chomage.columns.name = 'Country'


# Missing values
missing_ranges = defaultdict(list) # Store the ranges of missing dates per country
missing_values = chomage.isna() # Flag the missing values

for country in chomage.columns: # Scan each country (column) for ranges of missing dates
    country_missing = missing_values[country]
    if country_missing.any():
        # Find the ranges of missing dates
        missing_dates = country_missing[country_missing].index
        start_date = None
        for date in missing_dates:
            if start_date is None:
                start_date = date
            if date + pd.DateOffset(months=1) not in missing_dates:
                end_date = date
                missing_ranges[country].append((start_date.strftime('%Y-%m'), 
                                                end_date.strftime('%Y-%m')))
                start_date = None

# Fill missing values using interpolation as the most consistent method for long gaps
chomage.interpolate(method='time', inplace=True, limit_direction='both')  # Interpolate linearly by date for smoother transitions

# Optionally fill remaining missing values (if interpolation failed for some edge cases) with column mean
chomage.fillna(chomage.mean(), inplace=True)

chomage.isna().sum().sum() # Number of missing values (0)

np.int64(0)
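
The XML parsing above fills the DataFrame cell by cell. A shorter route is to collect the observations in long format and pivot them, as in this sketch. The XML fragment is a made-up example that only mimics the `<Series geo=...>` / `<Obs .../>` layout assumed by `parse_xml`; the real file may use namespaces or other attributes.

```python
import xml.etree.ElementTree as ET
import pandas as pd

# Hypothetical XML fragment, for illustration only
xml_text = """
<DataSet>
  <Series geo="FR">
    <Obs TIME_PERIOD="2023-01" OBS_VALUE="7.1"/>
    <Obs TIME_PERIOD="2023-02" OBS_VALUE="7.0"/>
  </Series>
  <Series geo="DE">
    <Obs TIME_PERIOD="2023-01" OBS_VALUE="3.0"/>
  </Series>
</DataSet>
""".strip()

root = ET.fromstring(xml_text)

# One record per observation (long format)
records = [
    {"geo": s.get("geo"), "date": o.get("TIME_PERIOD"), "value": o.get("OBS_VALUE")}
    for s in root.iter("Series")
    for o in s.findall("Obs")
]

long_df = pd.DataFrame(records)
long_df["date"] = pd.to_datetime(long_df["date"], format="%Y-%m")
long_df["value"] = pd.to_numeric(long_df["value"])

# One row per date, one column per country; absent observations become NaN
wide = long_df.pivot(index="date", columns="geo", values="value")
print(wide)
```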

# GDP (PIB)

In [605]:
def parse_xml_pib(file_url):
    """Parse XML file and extract data into a DataFrame."""
    
    # Download the XML file content using requests
    response = requests.get(file_url)
    xml_content = response.text  # Get the content of the XML file as a string
    
    # Parse the XML content with ElementTree
    root = ET.fromstring(xml_content)  # Parse the XML string directly
    
    # Initialize a list to store data
    data = []

    # Extract data
    for record in root.findall('.//record'):
        record_data = {}
        for field in record.findall('field'):
            name = field.attrib.get('name')
            text = field.text
            match name:
                case "Country or Area": record_data['country'] = text
                case "Value": 
                    try: record_data['value'] = float(text)
                    except (TypeError, ValueError): record_data['value'] = None
                case "Year": record_data['year'] = pd.to_datetime(str(text))
        data.append(record_data)

    # Create DataFrame from the list of dictionaries
    df = pd.DataFrame(data)

    # Remove duplicates
    df = df.drop_duplicates(subset=['year', 'country'])

    # Pivot the DataFrame to get the desired format
    df = df.pivot(index='year', columns='country', values='value')

    return df

url = 'https://julie-sclaunich.emi.u-bordeaux.fr/DATA/API_NY.GDP.MKTP.CD_DS2_fr_xml_v2_38351.xml'
pib = parse_xml_pib(url)


In [606]:
# Reindex monthly
monthly_dates = pd.date_range(start=pib.index.values[0], end=pib.index.values[-1], freq='MS')

# Add a row for each month between the first and the last year
pib = pib.reindex(index=pib.index.union(monthly_dates))
pib.columns.name = 'Country'

# Select only same dates and countries as dette_publique
pib = pib.loc[dette_publique.index, dette_publique.columns.intersection(pib.columns)]

def fill_val(x):
    """Fill missing values by resampling and interpolating."""
    return x.resample('MS').interpolate(method='quadratic')

# Apply the fill_val function to each column (country)
pib = pib.apply(func=fill_val, axis=0)

pib.isna().sum().sum() # Number of missing values (0)


np.int64(0)

# Interest rate

Apparently only NA values for the European countries?\
Uncommenting the line `# taux = taux.loc[dette_publique.index, dette_publique.columns.intersection(taux.columns)]` empties the DataFrame completely. That line takes the intersection of the columns of (that is, the countries present in) dette_publique with those of taux. I assume that if the result is empty, it is because all the countries in dette_publique have NA values in taux.\
Maybe this makes sense: lending is done at the European level through the central bank?
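
To tell apart the two possible causes (countries absent from `taux` under the same name, or present but entirely NA), a quick diagnostic along these lines could help. It is a sketch on hypothetical stand-in data: `rates` plays the role of `taux` and `debt_countries` the role of `dette_publique.columns`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins, for illustration only
dates = pd.date_range("2013-01-01", periods=4, freq="MS")
rates = pd.DataFrame(
    {"France": [np.nan] * 4, "Suisse": [0.5, 0.4, np.nan, 0.3], "Japon": [0.1] * 4},
    index=dates,
)
debt_countries = pd.Index(["France", "Allemagne", "Suisse"])

# 1) Which countries match by name, and which are absent altogether?
shared = debt_countries.intersection(rates.columns)
absent = debt_countries.difference(rates.columns)

# 2) Among the shared ones, what fraction of the observations is NaN?
na_share = rates[shared].isna().mean()

print("shared:", list(shared))
print("absent:", list(absent))
print(na_share)
```

A share of 1.0 means the country is present but has no usable value, whereas a country listed as absent points to a naming mismatch between the two sources.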

In [607]:
url = 'https://julie-sclaunich.emi.u-bordeaux.fr/DATA/API_FR.INR.DPST_DS2_fr_xml_v2_52919.xml'
taux = parse_xml_pib(url)

In [608]:
# Reindex monthly
monthly_dates = pd.date_range(start=taux.index.values[0], end=taux.index.values[-1], freq='MS')

# Add a row for each month between the first and the last year
taux = taux.reindex(index=taux.index.union(monthly_dates))

# Select only same dates and countries as dette_publique
#taux = taux.loc[dette_publique.index, dette_publique.columns.intersection(taux.columns)]

def fill_val(x):
    """Fill missing values by resampling and interpolating."""
    return x.resample('MS').interpolate(method='quadratic')

# Apply the fill_val function to each column (country)
taux = taux.apply(func=fill_val, axis=0)

taux.isna().sum().sum() # Number of missing values (not 0 here, see the note above)


np.int64(144389)

# Currency

In [609]:
# URL of the CSV file
url_3 = 'https://julie-sclaunich.emi.u-bordeaux.fr/DATA/estat_tec00033_filtered_en.csv'

# Load the CSV file into a DataFrame
devise = pd.read_csv(url_3)

# Remove unnecessary columns to clean up the data
columns_to_delete = ['DATAFLOW', 'LAST UPDATE', 'freq', 'statinfo', 'unit', 'OBS_FLAG']
devise.drop(columns=columns_to_delete, inplace=True)

# Convert 'TIME_PERIOD' to datetime format and filter rows after 2012
devise['TIME_PERIOD'] = pd.to_datetime(devise['TIME_PERIOD'], format='%Y', errors='coerce')  # Convert to datetime
devise = devise[devise['TIME_PERIOD'] > '2012-12-31']  # Keep rows with dates after 2012
devise['TIME_PERIOD'] = devise['TIME_PERIOD'].dt.strftime('%Y/%m')  # Format as YYYY/MM

# Reshape the DataFrame to have 'TIME_PERIOD' as row index and 'currency' as columns
devise = devise.pivot(index='TIME_PERIOD', columns='currency', values='OBS_VALUE')

# Set the index to be a DatetimeIndex for resampling
devise.index = pd.to_datetime(devise.index, format='%Y/%m', errors='coerce')
devise.index.name = None
devise.columns.name = 'Currency'

# Define a function to fill missing values by resampling and interpolating
def fill_val(x):
    """Fill missing values by resampling to monthly frequency and using quadratic interpolation."""
    return x.resample('MS').interpolate(method='quadratic')

# Apply the interpolation function to fill missing values
devise = fill_val(devise)
print(devise.isna().sum().sum()) # Number of missing values (24, cause still to be investigated)
# Display a sample of the corrected data
devise.head()


24


Currency,Bosnia and Herzegovina convertible mark,Bulgarian lev,Canadian dollar,Czech koruna,Danish krone,Hungarian forint,Icelandic króna,Japanese yen,North Macedonian denar,Norwegian krone,Polish zloty,Pound sterling,Romanian leu,Russian rouble,Serbian dinar,Swedish krona,Swiss franc,Turkish lira,US dollar
2013-01-01,1.95583,1.9558,1.3684,25.98,7.4579,296.87,162.38,129.66,61.585,7.8067,4.1975,0.84926,4.419,42.337,113.1369,8.6515,1.2311,2.5335,1.3281
2013-02-01,1.95583,1.9558,1.384134,26.195389,7.457178,298.365029,161.761335,131.231491,61.59073,7.849537,4.197594,0.848684,4.42244,42.618453,113.521815,8.697162,1.236956,2.577914,1.33979
2013-03-01,1.95583,1.9558,1.397158,26.376653,7.456598,299.637286,161.199346,132.544486,61.595525,7.888816,4.197485,0.847671,4.425333,42.944501,113.864562,8.737177,1.241087,2.615999,1.348489
2013-04-01,1.95583,1.9558,1.410265,26.562635,7.456038,300.959399,160.573608,133.88034,61.600413,7.932953,4.19715,0.846004,4.428299,43.385011,114.238586,8.780119,1.244379,2.655914,1.356062
2013-05-01,1.95583,1.9558,1.421635,26.727907,7.455577,302.152358,159.964517,135.05522,61.604722,7.976317,4.19661,0.843845,4.430932,43.890885,114.595096,8.820316,1.246282,2.692291,1.361329


# Commodities

In [610]:
url_or = "https://bdm.insee.fr/series/sdmx/data/SERIES_BDM/010002100" # URL of the series

response = requests.get(url_or) # Retrieve XML data
response.raise_for_status() # Check that the request succeeded
xml_content = response.content

root = ET.fromstring(xml_content) # Parse XML content


data = [] # Initialize a list to store the data


for series in root.findall(".//{*}Series"): # Browse each series

    for obs in series.findall(".//{*}Obs"): # Browse the observations in each series

        # Extract relevant attributes
        time_period = obs.attrib.get("TIME_PERIOD")
        obs_value = obs.attrib.get("OBS_VALUE")
        # Append the data to the list
        data.append({"TIME_PERIOD": time_period, "OBS_VALUE": obs_value})


df_or = pd.DataFrame(data) # Create a DataFrame from the extracted data


# Convert columns to appropriate types (datetime also makes the date filtering easier)
df_or["TIME_PERIOD"] = pd.to_datetime(df_or["TIME_PERIOD"], format="%Y-%m")
df_or["OBS_VALUE"] = pd.to_numeric(df_or["OBS_VALUE"])

# Filter years between 2013 and 2023
start_date = '2013-01-01'
end_date = '2023-12-31'
df_or = df_or[(df_or['TIME_PERIOD'] >= start_date) & (df_or['TIME_PERIOD'] <= end_date)]

df_or.set_index('TIME_PERIOD', inplace=True) # Use the dates as index
df_or.index.name = None
df_or.columns.name = None
df_or.rename(columns = {'OBS_VALUE': 'Or'}, inplace = True)
print(df_or.isna().sum().sum()) # Number of missing values (0)
# Show the first 5 rows
df_or.head()

0


Unnamed: 0,Or
2023-12-01,201.5
2023-11-01,198.6
2023-10-01,195.8
2023-09-01,194.0
2023-08-01,190.2


In [611]:
url_petrol = "https://bdm.insee.fr/series/sdmx/data/SERIES_BDM/010002091" # URL of the series

response = requests.get(url_petrol) # Retrieve XML data
response.raise_for_status()   # Check that the request succeeded
xml_content = response.content

root = ET.fromstring(xml_content) # Parse XML content

# Initialize a list to store the data
data = []


for series in root.findall(".//{*}Series"): # Browse each series

    for obs in series.findall(".//{*}Obs"):  # Browse the observations in each series

        # Extract relevant attributes
        time_period = obs.attrib.get("TIME_PERIOD")
        obs_value = obs.attrib.get("OBS_VALUE")
        # Append the data to the list
        data.append({"TIME_PERIOD": time_period, "OBS_VALUE": obs_value})


df_petrol = pd.DataFrame(data)  # Create a DataFrame from the extracted data

# Convert columns to appropriate types (datetime also makes the date filtering easier)
df_petrol["TIME_PERIOD"] = pd.to_datetime(df_petrol["TIME_PERIOD"], format="%Y-%m")
df_petrol["OBS_VALUE"] = pd.to_numeric(df_petrol["OBS_VALUE"])

# Filter years between 2013 and 2023
start_date = '2013-01-01'
end_date = '2023-12-31'
df_petrol = df_petrol[(df_petrol['TIME_PERIOD'] >= start_date) & (df_petrol['TIME_PERIOD'] <= end_date)]

df_petrol.set_index('TIME_PERIOD', inplace=True) # Use the dates as index
df_petrol.index.name = None
df_petrol.columns.name = None
df_petrol.rename(columns = {'OBS_VALUE': 'Petrol'}, inplace = True)

print(df_petrol.isna().sum().sum()) # Number of missing values (0)
# Show the first 5 rows
df_petrol.head()
material = pd.concat((df_petrol, df_or), axis = 1, join = 'inner')
material.columns.name = 'Material'
material.head()

0


Material,Petrol,Or
2023-12-01,118.1,201.5
2023-11-01,127.3,198.6
2023-10-01,142.3,195.8
2023-09-01,145.2,194.0
2023-08-01,130.9,190.2


# Merging the DataFrames

In [612]:
common_countries = ipch.columns.intersection(\
                   dette_publique.columns.intersection(\
                   chomage.columns.intersection(\
                    pib.columns)))

common_dates = ipch.index.intersection(\
                   dette_publique.index.intersection(\
                   chomage.index.intersection(\
                    pib.index)))

expected_dates = pd.date_range(start='2013-01-01', end='2023-01-01', freq='MS')
all_dates_present = expected_dates.isin(common_dates).all()

print(f"All dates from {common_dates[0].strftime('%d/%m/%Y')} "
      f"to {common_dates[-1].strftime('%d/%m/%Y')} are present"
      if all_dates_present else 'Some dates are missing')

def set_index(df, index_name):
    """Restrict df to the common dates and countries, then add a top 'Type' column level."""
    df = df.loc[common_dates, common_countries]
    if isinstance(df.columns, pd.MultiIndex):
        # Prepend the new label as the top level of the existing MultiIndex
        new_index = pd.MultiIndex.from_tuples(
            [(index_name, *idx) for idx in df.columns],
            names=['Type'] + list(df.columns.names))
    else:
        # Build a two-level MultiIndex from the new label and the existing columns
        new_index = pd.MultiIndex.from_tuples(
            [(index_name, idx) for idx in df.columns],
            names=['Type', df.columns.names[0]])
    df.columns = new_index
    return df
dette_publique = set_index(dette_publique, 'Dette publique')
chomage = set_index(chomage, 'Chomage')
ipch = set_index(ipch, 'IPCH')
pib = set_index(pib, 'PIB')

data = pd.concat((dette_publique, chomage, ipch, pib), axis = 1, join = 'inner')
# Note: groupby(axis=1) is deprecated in recent pandas versions; it is kept
# because the later cells rely on data.get_group(...)
data = data.groupby('Type', axis = 1)


All dates from 01/01/2013 to 01/01/2023 are present


  data = data.groupby('Type', axis = 1)


In [613]:
data.get_group('IPCH')

Type,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH,IPCH
Country,Autriche,Belgique,Bulgarie,Chypre,Allemagne,Danemark,Estonie,Grèce,Espagne,Finlande,...,Lituanie,Luxembourg,Lettonie,Malte,Pays-Bas,Pologne,Portugal,Roumanie,Suède,Slovénie
2013-01-01,2.8,1.5,2.6,2.0,1.9,0.9,3.7,0.0,2.8,2.6,...,2.7,2.1,0.6,2.4,3.2,1.6,0.4,5.1,0.6,2.8
2013-02-01,2.6,1.5,2.2,1.8,1.8,1.1,4.0,0.1,2.9,2.4,...,2.3,2.4,0.3,1.8,3.2,1.2,0.2,4.9,0.5,2.9
2013-03-01,2.4,1.4,1.6,1.3,1.9,0.7,3.8,-0.2,2.6,2.5,...,1.6,2.0,0.3,1.4,3.2,1.0,0.7,4.4,0.5,2.2
2013-04-01,2.1,1.2,0.9,0.1,1.1,0.5,3.4,-0.6,1.5,2.4,...,1.4,1.7,-0.4,0.9,2.8,0.8,0.4,4.4,0.0,1.6
2013-05-01,2.4,1.2,1.0,0.1,1.5,0.7,3.6,-0.3,1.8,2.5,...,1.5,1.4,-0.2,0.8,3.1,0.5,0.9,4.4,0.3,1.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-09-01,11.0,12.1,15.6,9.0,10.9,11.1,24.1,12.1,9.0,8.4,...,22.5,8.8,22.0,7.4,17.1,15.7,9.8,13.4,10.3,10.6
2022-10-01,11.6,13.1,14.8,8.6,11.6,11.4,22.5,9.5,7.3,8.4,...,22.1,8.8,21.7,7.4,16.8,16.4,10.6,13.5,9.8,10.3
2022-11-01,11.2,10.5,14.3,8.1,11.3,9.7,21.4,8.8,6.7,9.1,...,21.4,7.3,21.7,7.2,11.3,16.1,10.2,14.6,10.1,10.8
2022-12-01,10.5,10.2,14.3,7.6,9.6,9.6,17.5,7.6,5.5,8.8,...,20.0,6.2,20.7,7.3,11.0,15.3,9.8,14.1,10.8,10.8
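
For comparison, the two-level columns can also be obtained directly from `pd.concat` through its `keys` argument, which avoids building the MultiIndex by hand. This is a sketch of an alternative on toy frames, not what the notebook does; note that selecting a type this way drops the `Type` level, unlike `get_group`.

```python
import pandas as pd

dates = pd.date_range("2013-01-01", periods=3, freq="MS")
# Hypothetical per-indicator frames sharing the same dates and countries
ipch_toy = pd.DataFrame({"France": [1.0, 1.1, 1.2], "Italie": [2.0, 2.1, 2.2]}, index=dates)
pib_toy = pd.DataFrame({"France": [10.0, 11.0, 12.0], "Italie": [20.0, 21.0, 22.0]}, index=dates)

# `keys` becomes the top column level, `names` labels both levels
combined = pd.concat(
    [ipch_toy, pib_toy], axis=1, keys=["IPCH", "PIB"], names=["Type", "Country"]
)

# Selecting the top level returns the sub-frame with plain country columns
ipch_only = combined["IPCH"]
print(ipch_only)
```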


# Display

The plots are displayed three times; I do not know why.\
For the map display, I started writing a function at the very bottom. The translation of the country names is still missing. I do not know whether it will work once that is done; to be checked.

## Plotting function

In [614]:
def plot_graph(countries, start, end, data, data_type):
    plt.figure(figsize=(9, 4))
    for country in countries.split(', '):
        data.loc[start:end, (data_type, country)].plot(label=f'{country}')

    plt.title(f'{data_type} ({start.strftime("%m/%Y")} - {end.strftime("%m/%Y")})')
    plt.legend()
    plt.show()

## Map display function

In [619]:
# Load GeoJSON file into a GeoDataFrame
url = "https://raw.githubusercontent.com/leakyMirror/map-of-europe/master/GeoJSON/europe.geojson"
europe = gpd.read_file(url)

def plot_map(date: str, data: pd.core.groupby.DataFrameGroupBy, data_type: str):
    """
    Plot a map of Europe for the specified date and data type.
    
    Args:
        date (str): Date in 'YYYY-MM' format for which to filter the data.
        data (pd.core.groupby.DataFrameGroupBy): A grouped DataFrame with MultiIndex columns
            ('Type', 'Country') and a DateTimeIndex.
        data_type (str): The type of data to display (e.g., 'IPCH').
    """
    # Prepare the data
    df = data.get_group(data_type).copy()
    df.columns = df.columns.droplevel('Type')
    df.rename(columns={
        "Albanie": "Albania",
        "Allemagne": "Germany",
        "Andorre": "Andorra",
        "Autriche": "Austria",
        "Belgique": "Belgium",
        "Biélorussie": "Belarus",
        "Bosnie-Herzégovine": "Bosnia and Herzegovina",
        "Bulgarie": "Bulgaria",
        "Croatie": "Croatia",
        "Danemark": "Denmark",
        "Espagne": "Spain",
        "Estonie": "Estonia",
        "Finlande": "Finland",
        "France": "France",
        "Grèce": "Greece",
        "Hongrie": "Hungary",
        "Irlande": "Ireland",
        "Islande": "Iceland",
        "Italie": "Italy",
        "Kosovo": "Kosovo",
        "Lettonie": "Latvia",
        "Liechtenstein": "Liechtenstein",
        "Lituanie": "Lithuania",
        "Luxembourg": "Luxembourg",
        "Malte": "Malta",
        "Moldavie": "Moldova",
        "Monaco": "Monaco",
        "Monténégro": "Montenegro",
        "Norvège": "Norway",
        "Pays-Bas": "Netherlands",
        "Pologne": "Poland",
        "Portugal": "Portugal",
        "République tchèque": "Czech Republic",
        "Roumanie": "Romania",
        "Royaume-Uni": "United Kingdom",
        "Russie": "Russia",
        "Saint-Marin": "San Marino",
        "Serbie": "Serbia",
        "Slovaquie": "Slovakia",
        "Slovénie": "Slovenia",
        "Suède": "Sweden",
        "Suisse": "Switzerland",
        "Ukraine": "Ukraine",
        "Vatican": "Vatican City"
    }, inplace=True)

    # Keep only the month containing `date` (the index holds month-start timestamps)
    target = pd.to_datetime(date).to_period('M').to_timestamp()
    df_melted = (df.loc[target]
                   .rename('Value')
                   .rename_axis('Country')
                   .reset_index())

    # Merge GeoJSON data with the values of the selected month
    europe_merged = europe.merge(df_melted, left_on='NAME', right_on='Country', how='left')

    # Plot the data
    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    europe_merged.plot(column='Value', ax=ax, legend=True, cmap='viridis', 
                       missing_kwds={"color": "lightgrey"},
                       legend_kwds={'label': data_type})

    plt.title(f"Map of {data_type} in Europe in {pd.to_datetime(date).strftime('%B %Y')}", fontsize=16)
    plt.show()

In [None]:
# Widgets to select the countries and the dates
countries_widget = widgets.Text(
    value='France, Allemagne, Italie',
    description='Countries:',
    placeholder='Enter countries separated by commas'
)

start_date_widget = widgets.DatePicker(
    value=pd.to_datetime('2015-1', format='%Y-%m'),
    description='Start Date'
)

end_date_widget = widgets.DatePicker(
    value=pd.to_datetime('2020-3', format='%Y-%m'),
    description='End Date'
)

# Multiple-choice widget for the data to display
multi_choice_widget = widgets.SelectMultiple(
    options=['IPCH', 'Dette publique', 'PIB', 'Chomage'],
    value=['IPCH'],
    description='Select Data',
    disabled=False
)

# Checkbox widget
checkbox_widget = widgets.Checkbox(
    value=False,
    description='Map',
    disabled=False
)

# General function to plot the data
@interact_manual(countries=countries_widget, \
                 start_date=start_date_widget, end_date=end_date_widget, \
                 type=multi_choice_widget, map=checkbox_widget)
def plot_G(countries, start_date, end_date, type, map = True):
    if map:
        plot_map(date = start_date, data = data, data_type= type[0])
    else:
        for t in type:
            plot_graph(countries, start_date, end_date, data.get_group(t), t)



interactive(children=(Text(value='France, Allemagne, Italie', continuous_update=False, description='Countries:…

In [15]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/leakyMirror/map-of-europe/master/GeoJSON/europe.geojson"
europe = gpd.read_file(url)

def plot_map(year, data, data_type):
    to_plot = data.loc[f'{year}-01', :]
    to_plot = europe.merge(to_plot, left_on='NAME', right_index=True, how='left')
    
    fig, ax = plt.subplots(1, 1, figsize=(15, 10))
    europe.plot(ax=ax, color='lightgrey', edgecolor='black')
    to_plot.plot(column='Value', ax=ax, legend=True, cmap='viridis', 
                 missing_kwds={"color": "lightgrey", "label": "Données manquantes"},
                 legend_kwds={'label': "PIB", 'orientation': "vertical"},
                 edgecolor='black')
    
    ax.set_title(f"Carte des PIB en Europe (en {year})", fontsize=16)
    ax.axis("off")
    plt.show()

plot_map(2020, data.get_group('PIB'), 'PIB')

ERROR 1: PROJ: proj_create_from_database: Open of /opt/local/stow/conda/miniforge3/envs/cremi/share/proj failed


MergeError: Not allowed to merge between different levels. (1 levels on the left, 2 on the right)
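
The `MergeError` above comes from merging a frame with single-level columns with one whose columns still carry two levels (`Type`, `Country`). A possible fix is to drop the `Type` level and reduce the data to a single named Series before merging. The sketch below uses plain pandas stand-ins: `shapes` is a hypothetical substitute for the GeoDataFrame and `grouped_frame` for `data.get_group('PIB')`, with made-up values and country names already in English.

```python
import pandas as pd

# Hypothetical stand-ins, for illustration only
shapes = pd.DataFrame({"NAME": ["France", "Italy", "Norway"]})
dates = pd.date_range("2020-01-01", periods=2, freq="MS")
grouped_frame = pd.DataFrame(
    [[2.6, 1.9], [2.7, 2.0]],
    index=dates,
    columns=pd.MultiIndex.from_tuples(
        [("PIB", "France"), ("PIB", "Italy")], names=["Type", "Country"]
    ),
)

# Drop the 'Type' level, then take a single date as a Series named 'Value'
flat = grouped_frame.droplevel("Type", axis=1)
values = flat.loc[pd.Timestamp("2020-01-01")].rename("Value")

# Both sides now have single-level labels, so the merge is allowed
merged = shapes.merge(values, left_on="NAME", right_index=True, how="left")
print(merged)
```

Countries without a value (Norway here) end up as NaN, which is what `missing_kwds` is meant to colour on the map. With the real data, the French country labels would still have to be mapped to the names used in the GeoJSON file.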

Commodity price display

In [19]:
import matplotlib.pyplot as plt
import seaborn as sb
import pandas as pd
import ipywidgets as widgets
from ipywidgets import interact

# Function to plot the data
def plot_data(start_date, end_date, show_or, show_petrol):
    plt.figure(figsize=(10, 6))
    start_date, end_date = pd.to_datetime(start_date), pd.to_datetime(end_date)

    # Filter the data based on the selected dates
    filtered_material = material[(material.index >= start_date) & (material.index <= end_date)]

    # Plot gold ('Or') if its checkbox is checked
    if show_or:
        sb.lineplot(data=filtered_material, x=filtered_material.index, y='Or', label='Or', marker='o')

    # Plot oil ('Petrol') if its checkbox is checked
    if show_petrol:
        sb.lineplot(data=filtered_material, x=filtered_material.index, y='Petrol', label='Pétrole', marker='o')

    # Configure the plot with a title, legend, and grid
    plt.title(f'Cours de l\'or et du pétrole en euros ({start_date.strftime("%m/%Y")} - {end_date.strftime("%m/%Y")})')
    plt.legend()  # The legend only lists the series that were drawn
    plt.grid(True)
    plt.show()

# Widgets for selecting the start and end dates, and options to display data
start_date_widget = widgets.DatePicker(description='Début', value=pd.to_datetime('2013-01-01'))
end_date_widget = widgets.DatePicker(description='Fin', value=pd.to_datetime('2023-12-31'))
show_or_widget = widgets.Checkbox(description='Or', value=True)  # Checkbox for showing 'Or' data
show_petrol_widget = widgets.Checkbox(description='Pétrole', value=True)  # Checkbox for showing 'Petrol' data

# Interactive interface to control the plot function with widgets
interact(
    plot_data,  # The function to interact with
    start_date=start_date_widget,  # Start date widget
    end_date=end_date_widget,  # End date widget
    show_or=show_or_widget,  # 'Or' checkbox widget
    show_petrol=show_petrol_widget  # 'Petrol' checkbox widget
)
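
The date filter used in this cell can be isolated into a small helper, which also makes it easy to handle a date picker that has been cleared. This is a minimal sketch on invented data: `filter_period` and `material_demo` are hypothetical names, and returning an empty frame when a bound is missing is a design assumption, not the notebook's behaviour.

```python
import datetime as dt

import pandas as pd


def filter_period(frame, start, end):
    """Return the rows of `frame` whose DatetimeIndex lies in [start, end]."""
    # pd.to_datetime accepts strings, datetime.date objects and Timestamps.
    start, end = pd.to_datetime(start), pd.to_datetime(end)
    # A cleared picker yields None; return an empty selection in that case.
    if pd.isna(start) or pd.isna(end):
        return frame.iloc[0:0]
    return frame[(frame.index >= start) & (frame.index <= end)]


# Toy monthly prices (invented values), with the same column names as `material`.
material_demo = pd.DataFrame(
    {"Or": [40.1, 41.3, 39.8, 42.0], "Petrol": [80.2, 82.5, 79.9, 85.1]},
    index=pd.date_range("2012-12-01", periods=4, freq="MS"),
)

selection = filter_period(material_demo, dt.date(2013, 1, 1), "2013-02-28")
print(selection)
```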


Displaying currencies

In [17]:
# Function to plot currency data
def plot_devise_data(select_all, start_date, end_date, **currency_checkboxes):
    plt.figure(figsize=(12, 6))
    # Normalize the DatePicker values so they compare cleanly with the index
    start_date, end_date = pd.to_datetime(start_date), pd.to_datetime(end_date)

    # List of selected currencies: all of them if "select all" is checked,
    # otherwise only those whose checkbox is ticked
    if select_all:
        selected_currencies = devise.columns.tolist()
    else:
        selected_currencies = [currency for currency, is_selected in currency_checkboxes.items() if is_selected]

    # Filter the data based on the selected date range and currencies
    filtered_data = devise[(devise.index >= start_date) & (devise.index <= end_date)]
    filtered_data = filtered_data[selected_currencies]

    # Plot the time series for each selected currency
    for currency in selected_currencies:
        sb.lineplot(data=filtered_data, x=filtered_data.index, y=currency, label=currency, marker='o')

    # Configure the plot with a title, x and y labels, and a legend
    plt.title(f'Valeurs des devises (équivalent en euros) ({start_date.strftime("%m/%Y")} - {end_date.strftime("%m/%Y")})')
    plt.xlabel('TIME_PERIOD')
    plt.ylabel('OBS_VALUE')
    plt.legend(title='Currency')  # Currency legend
    plt.grid(True)
    plt.show()

# Widgets for selecting the start and end dates
start_date_widget = widgets.DatePicker(description='Début', value=pd.to_datetime('2013-01-01'))
end_date_widget = widgets.DatePicker(description='Fin', value=pd.to_datetime('2023-12-31'))

# Dynamically generate checkboxes for each currency
currency_checkboxes = {
    currency: widgets.Checkbox(description=currency, value=False)  # Default value is False (unchecked)
    for currency in devise.columns
}

# Checkbox to "Select All" currencies
select_all_widget = widgets.Checkbox(description='Tout sélectionner', value=False)

# Function to dynamically update checkboxes based on "Select All"
def update_checkboxes(change):
    for checkbox in currency_checkboxes.values():
        checkbox.value = change['new']  # Update the state of checkboxes based on the "Select All" checkbox

select_all_widget.observe(update_checkboxes, names='value')  # Observe changes to the "Select All" checkbox

# Create a container for all the checkboxes
# (widgets.VBox avoids relying on a separate `from ipywidgets import VBox`)
checkbox_container = widgets.VBox([select_all_widget] + list(currency_checkboxes.values()))

# Interactive interface to control the plot function with widgets
interact(
    plot_devise_data,  # The function to interact with
    select_all=select_all_widget,  # "Select All" widget
    start_date=start_date_widget,  # Start date widget
    end_date=end_date_widget,  # End date widget
    **currency_checkboxes  # Pass each currency checkbox widget as a parameter
)
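
As an alternative to looping over currencies, the selected columns can be reshaped to long format, with one row per date and currency. The sketch below uses invented exchange rates; `devise_demo` and `to_long` are hypothetical names, and the `TIME_PERIOD` / `OBS_VALUE` column names are borrowed from the axis labels used in the cell above.

```python
import pandas as pd

# Toy exchange-rate table (invented values), one column per currency.
devise_demo = pd.DataFrame(
    {"USD": [1.10, 1.12, 1.08], "GBP": [0.85, 0.86, 0.87], "CHF": [1.05, 1.04, 1.03]},
    index=pd.to_datetime(["2021-01-01", "2022-01-01", "2023-01-01"]),
)


def to_long(frame, currencies):
    """Reshape the selected currency columns to one row per (date, currency)."""
    selected = frame[list(currencies)]
    return (
        selected.rename_axis("TIME_PERIOD")  # name the date index
        .reset_index()                       # turn it into a regular column
        .melt(id_vars="TIME_PERIOD", var_name="Currency", value_name="OBS_VALUE")
    )


long_df = to_long(devise_demo, ["USD", "GBP"])
print(long_df)
```

With the data in this shape, a single call such as `sb.lineplot(data=long_df, x="TIME_PERIOD", y="OBS_VALUE", hue="Currency")` draws one line per currency and builds the legend from the `Currency` column.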




# Correlations
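
When the columns carry two levels (indicator type, then country), `DataFrame.corr()` returns a square matrix with those same two levels on both axes. The sketch below, on invented figures, shows one way to pull out the block that compares two indicators and then read the same-country correlations off its diagonal. The names `toy`, `block`, and `same_country` are hypothetical, and the (Type, Country) layout is an assumption based on the output printed in this section.

```python
import pandas as pd

# Invented yearly figures: GDP rises while unemployment falls.
columns = pd.MultiIndex.from_product(
    [["PIB", "Chomage"], ["France", "Italie"]], names=["Type", "Country"]
)
toy = pd.DataFrame(
    [[100, 80, 9.0, 11.0],
     [104, 82, 8.5, 10.6],
     [109, 83, 8.1, 10.9],
     [115, 87, 7.4, 9.8]],
    columns=columns,
)

# Full correlation matrix: rows and columns are both (Type, Country) pairs.
corr = toy.corr()

# Keep only the rows for PIB and the columns for Chomage: a country x country block.
block = corr.xs("PIB", level="Type", axis=0).xs("Chomage", level="Type", axis=1)

# The diagonal of that block is the PIB/Chomage correlation within each country.
same_country = pd.Series({c: block.loc[c, c] for c in block.index})
print(block)
print(same_country)
```

Reading only the diagonal avoids mixing cross-country pairs (for example French GDP against Italian unemployment) into the interpretation.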

In [20]:
def correlation_matrix(data, variables):
    """Return the correlation matrix restricted to the given indicator types."""
    # Drop the outer column level inside each group; the grouped apply is assumed
    # to put the group key back as the outer level, hence the (Type, Country) output
    data_flat = data.apply(lambda x: x.droplevel(0, axis=1))
    return data_flat[variables].corr()  # Compute and return the correlation matrix

# Quick test
variables = ['PIB', 'Chomage', 'IPCH']
print(correlation_matrix(data, variables).head())

Type                 PIB                                                    \
Country         Autriche  Belgique  Bulgarie    Chypre Allemagne  Danemark   
Type Country                                                                 
PIB  Autriche   1.000000  0.990586  0.906579  0.962533  0.993234  0.963908   
     Belgique   0.990586  1.000000  0.930290  0.973268  0.979347  0.981178   
     Bulgarie   0.906579  0.930290  1.000000  0.983123  0.880897  0.925532   
     Chypre     0.962533  0.973268  0.983123  1.000000  0.942629  0.962962   
     Allemagne  0.993234  0.979347  0.880897  0.942629  1.000000  0.963257   

Type                                                    ...      IPCH  \
Country          Estonie     Grèce   Espagne  Finlande  ...  Lituanie   
Type Country                                            ...             
PIB  Autriche   0.938519  0.637396  0.958764  0.962428  ...  0.649532   
     Belgique   0.945167  0.656340  0.939378  0.948630  ...  0.685807   
     Bulga