# Sources

**GDP:** https://donnees.banquemondiale.org/indicateur/NY.GDP.MKTP.CD  
**Unemployment rate:** https://ec.europa.eu/eurostat/databrowser/view/UNE_RT_M__custom_14826434/default/table?lang=fr  
**HICP:** https://ec.europa.eu/eurostat/databrowser/view/PRC_HICP_MANR__custom_14819170/default/table?lang=fr  
**Stock price history:**  
**Currency:** https://ec.europa.eu/eurostat/databrowser/view/tec00033/default/table?lang=en&category=t_ert  
**Commodities:** https://bdm.insee.fr/series/sdmx/data/SERIES_BDM/010002100 **and** https://bdm.insee.fr/series/sdmx/data/SERIES_BDM/010002091  
**Public debt:** https://ec.europa.eu/eurostat/databrowser/view/sdg_17_40/default/table?lang=fr

# Planned analyses and expected results

## Planned analyses

- Correlations between the different datasets  
- Study of stock market indicators

## Expected results

### GDP
Rising GDP is often associated with a strong economy, which can lift stock markets. The analysis will seek to quantify this relationship.  

### Unemployment rate
A low unemployment rate can reflect a robust economy and a business-friendly climate, and therefore affect share prices. The correlations between unemployment and stock performance will be examined.  

### HICP (Harmonised Index of Consumer Prices)
Inflation, measured here by the HICP, is a key factor in understanding how financial markets adjust to changes in interest rates and prices.  

### Stock price history
Analysing past trends in share prices will help assess how responsive markets are to changes in economic indicators.  

### Currency
Exchange-rate fluctuations can have a direct impact, particularly on companies operating internationally. The relationships between share prices and currency movements will be explored.  

### Commodities
Some stock market sectors depend heavily on commodity prices. The study will analyse the correlations between these prices and share performance in the sectors concerned.  

### Public debt
A country's level of debt can influence investor confidence and, in turn, market behaviour. Studying the correlations in this context will be essential.  
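
As a minimal sketch of the planned correlation analysis, the snippet below computes a Pearson correlation between two monthly series. The data is synthetic and the names `unemployment` and `stock` are hypothetical. Correlating month-over-month changes rather than levels is an assumption made here to limit spurious trend effects, not a choice stated in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (hypothetical, for illustration only)
dates = pd.date_range("2013-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
unemployment = pd.Series(8 + rng.normal(0, 0.2, 24).cumsum(), index=dates)
stock = pd.Series(100 - 3 * unemployment + rng.normal(0, 0.1, 24), index=dates)

df = pd.DataFrame({"unemployment": unemployment, "stock": stock})

# Correlate month-over-month changes instead of raw levels
corr = df.diff().dropna().corr(method="pearson")
print(corr.loc["unemployment", "stock"])
```

The same call on a wider table (one column per indicator) returns the full correlation matrix, which can then be passed to a heatmap.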


# Start of the code

In [1]:
# Necessary imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
import warnings
import xml.etree.ElementTree as ET
from collections import defaultdict
import requests

import ipywidgets as widgets
from ipywidgets import interact, interact_manual, VBox, HBox
import geopandas as gpd

import yfinance as yf

# Public debt

In [2]:
# Silence the openpyxl UserWarnings raised while reading the Excel file
warnings.filterwarnings("ignore", category=UserWarning, module="openpyxl")
# Data source URLs
dette_publique_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/dette_pub.xlsx'
code_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/code.tsv'

# Load the files directly from the URLs
dette_publique = pd.read_excel(dette_publique_url, sheet_name='Feuille 1')
code = pd.read_csv(code_url, sep='\t')

# Define column names for code and label
col_code = 'CODE'
col_label = 'Label - French'

def find_value(x):
    """Return the French label for a code, 'Country' for the 'TIME' header, else None."""
    matches = code.loc[code[col_code] == x, col_label]
    if not matches.empty:
        return matches.iloc[0]
    if x == 'TIME':
        return 'Country'
    return None

# Modify the index of the dataframe
dette_publique.index = dette_publique.iloc[:, 0].apply(find_value)
dette_publique.index.name = None

# Set the column names based on the 'Country' row
dette_publique.columns = dette_publique.loc['Country']

# Keep only the rows from 'Belgique' to 'Suède' and exclude the 'TIME' column.
# Only the countries are of interest here,
# and the data is missing for Islande, Norvège, Suisse and United Kingdom.
dette_publique = dette_publique.loc['Belgique':'Suède', dette_publique.columns != 'TIME']

# Keep only the columns from the year 2002 onwards.
# Unresolved issue: with a start year earlier than 2002, the interpolation fails everywhere.
dette_publique = dette_publique.loc[:, 2002:]

def to_date(x):
    """Convert a value to datetime."""
    return pd.to_datetime(x, format='%Y')

# Vectorize the to_date function
vect_to_date = np.vectorize(to_date)

# Convert the columns to datetime
dette_publique.columns = vect_to_date(dette_publique.columns.values)

monthly_dates = pd.date_range(start=dette_publique.columns.values[0], end=dette_publique.columns.values[-1], freq='MS')

# Add a column for every month start between the first and last year
dette_publique = dette_publique.reindex(columns=dette_publique.columns.union(monthly_dates))

def fill_val(x):
    """Fill missing values by resampling and interpolating."""
    return x.resample('MS').interpolate(method='quadratic')

# Apply the fill_val function to each row
dette_publique = dette_publique.apply(func=fill_val, axis=1).T
dette_publique.columns.name = 'Country'

dette_publique.isna().sum().sum() # Number of missing values (0)


0
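
The annual-to-monthly step used in the cell above can be illustrated on a toy series. This is only a sketch with made-up values. It uses `method="time"` (linear in time) instead of the notebook's `method="quadratic"`, purely to avoid the SciPy dependency that quadratic interpolation requires in pandas.

```python
import pandas as pd

# Hypothetical annual values (not taken from the dataset)
annual = pd.Series(
    [100.0, 112.0, 124.0],
    index=pd.to_datetime(["2020", "2021", "2022"], format="%Y"),
)

# Upsample to month starts; the new months are NaN until interpolated
monthly = annual.resample("MS").asfreq()

# Fill the gaps proportionally to the elapsed time between known points
monthly = monthly.interpolate(method="time")

print(monthly.head())
```

The known January values are left untouched; only the inserted months are estimated.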

# HICP

In [3]:
# Define URLs for the data sources
ipch_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/ipch.tsv'
code_cp_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/code_cp.tsv'

# Load the data directly from the URLs
ipch = pd.read_csv(ipch_url, sep='\t')  # Load the ipch data using tab as a separator
code_cp = pd.read_csv(code_cp_url, sep='\t')  # Load the code_cp data using tab as a separator


# Set index
def get_CP(x): return x[8:12]  # Extract CP code
def get_id(x): return x[13:]  # Extract id
vect_get_cp = np.vectorize(get_CP)
vect_get_id = np.vectorize(get_id)
ipch['CP'] = vect_get_cp(ipch.iloc[:, 0])  # Apply CP extraction
ipch['id'] = vect_get_id(ipch.iloc[:, 0])  # Apply id extraction

ipch = ipch[ipch['CP'] == 'CP00']
ipch.drop(columns = 'CP', inplace = True)

ipch.drop(columns='freq,unit,coicop,geo\\TIME_PERIOD', inplace=True)  # Drop unnecessary columns
ipch['country'] = ipch.loc[:, 'id'].apply(find_value)  # Find country names
ipch.set_index(['country'], inplace=True)  # Set index
ipch.drop(columns='id', inplace=True)  # Drop id column

# Convert the columns to datetime
def to_date_M(x):
    """Convert a column label to datetime (dropping its last character); return it unchanged if parsing fails."""
    try:
        return pd.to_datetime(x[:-1], format='%Y-%m')
    except (ValueError, TypeError):
        print('fail')
        return x

vect_to_date_M = np.vectorize(to_date_M)
ipch.columns = vect_to_date_M(ipch.columns.values)  # Apply datetime conversion

# Filter data
ipch = ipch[~ipch.index.str.startswith(('Union', 'Zone', 'Espace'))]  # Exclude aggregates (labels starting with 'Union', 'Zone' or 'Espace')

# Clean and convert to numeric
ipch = ipch.map(lambda x: pd.to_numeric(
    str(x).replace(' ', '').replace('d', ''), errors='coerce'))
ipch = ipch.T
ipch.columns.name = 'Country'

# Missing values
# Dictionary storing the ranges of missing dates for each country
missing_ranges = defaultdict(list)

# Identify the missing values
missing_values = ipch.isna()

# Go through each country (column) to find its ranges of missing dates
for country in ipch.columns:
    country_missing = missing_values[country]
    if country_missing.any():
        # Dates at which the value is missing
        missing_dates = country_missing[country_missing].index
        start_date = None
        for date in missing_dates:
            if start_date is None:
                start_date = date
            # A range ends when the following month is not missing
            if (date + pd.DateOffset(months=1)) not in missing_dates:
                end_date = date
                missing_ranges[country].append((start_date.strftime('%Y-%m'), end_date.strftime('%Y-%m')))
                start_date = None

# Convert the defaultdict to a plain dict
missing_ranges = dict(missing_ranges)

for country, dates in missing_ranges.items():
    print(f'{country}: {dates}')

# Drop the countries whose data is missing over a period longer than four years
ipch.drop(columns=['Albanie', 'Kosovo*', 'Monténégro'], inplace=True)

# Fill missing values using interpolation as the most consistent method for long gaps
ipch.interpolate(method='time', inplace=True, limit_direction='both')  # Interpolate linearly by date for smoother transitions

# Optionally fill remaining missing values (if interpolation failed for some edge cases) with column mean
ipch.fillna(ipch.mean(), inplace=True)

ipch.isna().sum().sum() # Number of missing values (0)

Albanie: [('2010-12', '2016-11')]
Monténégro: [('2010-12', '2015-11')]
United Kingdom: [('2020-12', '2024-11')]
Kosovo*: [('2010-12', '2016-11')]


0
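
The loop that builds `missing_ranges` can also be sketched as a small reusable helper. The function name `missing_month_ranges` and the sample values are hypothetical. The sketch assumes a regular monthly index, so that consecutive rows are consecutive months.

```python
import pandas as pd

def missing_month_ranges(series: pd.Series) -> list:
    """Return (start, end) 'YYYY-MM' pairs for each run of consecutive NaN values."""
    is_na = series.isna()
    # A new run starts wherever the NaN flag differs from the previous row
    run_id = (is_na != is_na.shift()).cumsum()
    ranges = []
    for _, run in series[is_na].groupby(run_id[is_na]):
        ranges.append((run.index[0].strftime("%Y-%m"), run.index[-1].strftime("%Y-%m")))
    return ranges

# Hypothetical monthly series with two gaps
dates = pd.date_range("2020-01-01", periods=8, freq="MS")
values = pd.Series([1.0, None, None, 4.0, 5.0, None, 7.0, 8.0], index=dates)
ranges = missing_month_ranges(values)
print(ranges)
```

Applying the helper to each column of a table gives the same kind of per-country summary as the printed output above.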

# Unemployment

In [4]:
def parse_xml(file_url):
    """Parse XML file and extract data into a DataFrame."""
    
    # Download the XML file content using requests
    response = requests.get(file_url)
    xml_content = response.text  # Get the content of the XML file as a string
    
    # Parse the XML content with ElementTree
    root = ET.fromstring(xml_content)  # Parse the XML string directly
    
    # Initialize lists to store data
    data = []
    columns = set()
    rows = set()

    # Extract data from <Series> and <Obs> tags
    for series in root.findall('.//Series'):
        geo = series.attrib.get('geo')  # Get "geo" attribute
        if geo:
            rows.add(geo)
            for obs in series.findall('Obs'):
                time_period = obs.attrib.get('TIME_PERIOD')  # Get "TIME_PERIOD" attribute
                obs_value = obs.attrib.get('OBS_VALUE')  # Get "OBS_VALUE" attribute
                if time_period and obs_value:
                    columns.add(time_period)
                    data.append((geo, time_period, obs_value))

    # Create DataFrame with appropriate indices
    df = pd.DataFrame(index=sorted(rows), columns=sorted(columns))

    # Fill DataFrame with extracted values
    for geo, time_period, obs_value in data:
        df.at[geo, time_period] = obs_value

    return df

# URL for the XML data
file_url = 'https://sebastien-hein.emi.u-bordeaux.fr/OI-sbzrthstrm/DATA/chomage.xml'

# Load the data
chomage = parse_xml(file_url)

# Format the data
chomage.columns = chomage.columns.map(lambda x: \
                                      pd.to_datetime(x, format='%Y-%m'))  # Convert columns to datetime
chomage.index = chomage.index.map(lambda x: \
                code.loc[code.loc[:, 'CODE'] == x, 'Label - French'].iloc[0])  # Map index to labels
chomage.drop('Zone euro - 20 pays (à partir de 2023)', inplace=True)  # Drop specific rows
chomage.drop('Union européenne - 27 pays (à partir de 2020)', inplace=True)  # Drop specific rows
chomage = chomage.apply(pd.to_numeric).T  # Convert data to numeric
chomage.columns.name = 'Country'


# Missing values
missing_ranges = defaultdict(list) # Initialize dictionary to store missing date ranges
missing_values = chomage.isna() # Identify missing values

for country in chomage.columns: # Loop through each country (column) to find missing date ranges
    country_missing = missing_values[country]
    if country_missing.any():
        # Find the ranges of missing dates
        missing_dates = country_missing[country_missing].index
        start_date = None
        for date in missing_dates:
            if start_date is None:
                start_date = date
            if date + pd.DateOffset(months=1) not in missing_dates:
                end_date = date
                missing_ranges[country].append((start_date.strftime('%Y-%m'), 
                                                end_date.strftime('%Y-%m')))
                start_date = None

# Fill missing values using interpolation as the most consistent method for long gaps
chomage.interpolate(method='time', inplace=True, limit_direction='both')  # Interpolate linearly by date for smoother transitions

# Optionally fill remaining missing values (if interpolation failed for some edge cases) with column mean
chomage.fillna(chomage.mean(), inplace=True)

chomage.isna().sum().sum() # Number of missing values (0)


0

# GDP

In [5]:
def parse_xml_pib(file_url):
    """Parse XML file and extract data into a DataFrame."""
    
    # Download the XML file content using requests
    response = requests.get(file_url)
    xml_content = response.text  # Get the content of the XML file as a string
    
    # Parse the XML content with ElementTree
    root = ET.fromstring(xml_content)  # Parse the XML string directly
    
    # Initialize a list to store data
    data = []

    # Extract data
    for record in root.findall('.//record'):
        record_data = {}
        for field in record.findall('field'):
            name = field.attrib.get('name')
            text = field.text
            match name:
                case "Country or Area": record_data['country'] = text
                case "Value": 
                    try: record_data['value'] = float(text)
                    except (TypeError, ValueError): record_data['value'] = None
                case "Year": record_data['year'] = pd.to_datetime(str(text))
        data.append(record_data)

    # Create DataFrame from the list of dictionaries
    df = pd.DataFrame(data)

    # Remove duplicates
    df = df.drop_duplicates(subset=['year', 'country'])

    # Pivot the DataFrame to get the desired format
    df = df.pivot(index='year', columns='country', values='value')

    return df

# URL for the XML data
url = 'https://julie-sclaunich.emi.u-bordeaux.fr/DATA/API_NY.GDP.MKTP.CD_DS2_fr_xml_v2_38351.xml'

# Load the data
pib = parse_xml_pib(url)



In [6]:
# Reindex monthly
monthly_dates = pd.date_range(start=pib.index.values[0], end=pib.index.values[-1], freq='MS')

# Add a row for every month start between the first and last year
pib = pib.reindex(index=pib.index.union(monthly_dates))
pib.columns.name = 'Country'

# Select only same dates and countries as dette_publique
pib = pib.loc[dette_publique.index, dette_publique.columns.intersection(pib.columns)]

def fill_val(x):
    """Fill missing values by resampling and interpolating."""
    return x.resample('MS').interpolate(method='quadratic')

# Apply the fill_val function to each column
pib = pib.apply(func=fill_val, axis=0)

pib.isna().sum().sum() # Number of missing values (0)


0

# Currency

In [7]:
# URL of the CSV file
url_3 = 'https://julie-sclaunich.emi.u-bordeaux.fr/DATA/estat_tec00033_filtered_en.csv'

# Load the CSV file into a DataFrame
devise = pd.read_csv(url_3)

# Remove unnecessary columns to clean up the data
columns_to_delete = ['DATAFLOW', 'LAST UPDATE', 'freq', 'statinfo', 'unit', 'OBS_FLAG']
devise.drop(columns=columns_to_delete, inplace=True)

# Convert 'TIME_PERIOD' to datetime format and filter rows after 2012
devise['TIME_PERIOD'] = pd.to_datetime(devise['TIME_PERIOD'], format='%Y', errors='coerce')  # Convert to datetime
devise = devise[devise['TIME_PERIOD'] > '2012-12-31']  # Keep rows with dates after 2012
devise['TIME_PERIOD'] = devise['TIME_PERIOD'].dt.strftime('%Y/%m')  # Format as YYYY/MM

# Reshape the DataFrame to have 'TIME_PERIOD' as row index and 'currency' as columns
devise = devise.pivot(index='TIME_PERIOD', columns='currency', values='OBS_VALUE')

# Set the index to be a DatetimeIndex for resampling
devise.index = pd.to_datetime(devise.index, format='%Y/%m', errors='coerce')
devise.index.name = None
devise.columns.name = 'Currency'

# Define a function to fill missing values by resampling and interpolating
def fill_val(x):
    """Fill missing values by resampling to monthly frequency and using quadratic interpolation."""
    return x.resample('MS').interpolate(method='quadratic')

# Apply the interpolation function to fill missing values
devise = fill_val(devise)
print(devise.isna().sum().sum()) # Number of missing values (24; cause still to be investigated)
# Display a sample of the corrected data
devise.head()


24


Currency,Bosnia and Herzegovina convertible mark,Bulgarian lev,Canadian dollar,Czech koruna,Danish krone,Hungarian forint,Icelandic króna,Japanese yen,North Macedonian denar,Norwegian krone,Polish zloty,Pound sterling,Romanian leu,Russian rouble,Serbian dinar,Swedish krona,Swiss franc,Turkish lira,US dollar
2013-01-01,1.95583,1.9558,1.3684,25.98,7.4579,296.87,162.38,129.66,61.585,7.8067,4.1975,0.84926,4.419,42.337,113.1369,8.6515,1.2311,2.5335,1.3281
2013-02-01,1.95583,1.9558,1.384134,26.195389,7.457178,298.365029,161.761335,131.231491,61.59073,7.849537,4.197594,0.848684,4.42244,42.618453,113.521815,8.697162,1.236956,2.577914,1.33979
2013-03-01,1.95583,1.9558,1.397158,26.376653,7.456598,299.637286,161.199346,132.544486,61.595525,7.888816,4.197485,0.847671,4.425333,42.944501,113.864562,8.737177,1.241087,2.615999,1.348489
2013-04-01,1.95583,1.9558,1.410265,26.562635,7.456038,300.959399,160.573608,133.88034,61.600413,7.932953,4.19715,0.846004,4.428299,43.385011,114.238586,8.780119,1.244379,2.655914,1.356062
2013-05-01,1.95583,1.9558,1.421635,26.727907,7.455577,302.152358,159.964517,135.05522,61.604722,7.976317,4.19661,0.843845,4.430932,43.890885,114.595096,8.820316,1.246282,2.692291,1.361329


# Commodities
## Gold

In [8]:
url_or = "https://bdm.insee.fr/series/sdmx/data/SERIES_BDM/010002100"  # URL of the series

response = requests.get(url_or)  # Retrieve the XML data
response.raise_for_status()  # Raise an error if the request failed
xml_content = response.content

root = ET.fromstring(xml_content)  # Parse the XML content


data = [] # Initialize a list to store the data


for series in root.findall(".//{*}Series"):  # Go through each series
    for obs in series.findall(".//{*}Obs"):  # Go through the observations of each series
        # Extract the relevant attributes
        time_period = obs.attrib.get("TIME_PERIOD")
        obs_value = obs.attrib.get("OBS_VALUE")
        # Append the observation to the list
        data.append({"TIME_PERIOD": time_period, "OBS_VALUE": obs_value})


df_or = pd.DataFrame(data) # Create a DataFrame from the extracted data


# Convert columns to appropriate types
df_or["TIME_PERIOD"] = pd.to_datetime(df_or["TIME_PERIOD"], format="%Y-%m")
df_or["OBS_VALUE"] = pd.to_numeric(df_or["OBS_VALUE"])

# Filter years between 2013 and 2023
start_date = '2013-01-01'
end_date = '2023-12-31'
df_or = df_or[(df_or['TIME_PERIOD'] >= start_date) & (df_or['TIME_PERIOD'] <= end_date)]

df_or.set_index('TIME_PERIOD', inplace=True)  # Use the dates as the index
df_or.index.name = None
df_or.columns.name = None
df_or.rename(columns={'OBS_VALUE': 'Or'}, inplace=True)
print(df_or.isna().sum().sum())  # Number of missing values (0)
# Show the first five rows
df_or.head()

0


Unnamed: 0,Or
2023-12-01,201.5
2023-11-01,198.6
2023-10-01,195.8
2023-09-01,194.0
2023-08-01,190.2


## Oil

In [9]:
url_petrol = "https://bdm.insee.fr/series/sdmx/data/SERIES_BDM/010002091"  # URL of the series

response = requests.get(url_petrol)  # Retrieve the XML data
response.raise_for_status()  # Raise an error if the request failed
xml_content = response.content

root = ET.fromstring(xml_content)  # Parse the XML content

data = []  # Initialize a list to store the data


for series in root.findall(".//{*}Series"):  # Go through each series
    for obs in series.findall(".//{*}Obs"):  # Go through the observations of each series
        # Extract the relevant attributes
        time_period = obs.attrib.get("TIME_PERIOD")
        obs_value = obs.attrib.get("OBS_VALUE")
        # Append the observation to the list
        data.append({"TIME_PERIOD": time_period, "OBS_VALUE": obs_value})


df_petrol = pd.DataFrame(data)  # Create a DataFrame from the extracted data

# Convert columns to appropriate types
df_petrol["TIME_PERIOD"] = pd.to_datetime(df_petrol["TIME_PERIOD"], format="%Y-%m")
df_petrol["OBS_VALUE"] = pd.to_numeric(df_petrol["OBS_VALUE"])


# Filter years between 2013 and 2023
start_date = '2013-01-01'
end_date = '2023-12-31'
df_petrol = df_petrol[(df_petrol['TIME_PERIOD'] >= start_date) & (df_petrol['TIME_PERIOD'] <= end_date)]

df_petrol.set_index('TIME_PERIOD', inplace=True)  # Use the dates as the index
df_petrol.index.name = None
df_petrol.columns.name = None
df_petrol.rename(columns={'OBS_VALUE': 'Petrol'}, inplace=True)

print(df_petrol.isna().sum().sum())  # Number of missing values (0)

# Combine oil and gold into one table and show the first five rows
material = pd.concat((df_petrol, df_or), axis=1, join='inner')
material.columns.name = 'Material'
material.head()

0


Material,Petrol,Or
2023-12-01,118.1,201.5
2023-11-01,127.3,198.6
2023-10-01,142.3,195.8
2023-09-01,145.2,194.0
2023-08-01,130.9,190.2


# Stocks

## Overview of the ```Index``` class for importing an asset's data and computing indicators

The class imports its data from Yahoo Finance.

The `ticker_symbol` argument is enough to instantiate the class. It is the ticker symbol of the stock whose data will be downloaded.\
The `data` argument lets you pass the data directly if it has already been downloaded.

The `update` method computes the indicators plotted by the `display` method, which calls it before plotting.

The stock's price history is available through the `get_data` method.\
The OBV, ADLine, ADX and Aroon indicators are implemented and available through the `get_index` method.

## Overview of the indicators

### OBV
OBV (On-Balance Volume) is a momentum indicator that measures positive and negative volume flows.

If the OBV curve rises (or falls) sharply without a significant change in the asset's price, this suggests that the price should eventually jump upwards (or downwards).

When institutions start buying an asset that retail investors are still selling, the price is still falling slightly or levelling off while volume increases. The reverse also occurs.
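
A minimal sketch of the OBV calculation on made-up prices and volumes: each day's volume is added when the close rises and subtracted when it falls. The helper name `on_balance_volume` is hypothetical. It starts the running total at zero, a common convention that differs from the class below, which counts the first day's volume as positive.

```python
import numpy as np
import pandas as pd

def on_balance_volume(close: pd.Series, volume: pd.Series) -> pd.Series:
    """Cumulative volume, signed by the direction of the close-to-close change."""
    # +1 when the close rises, -1 when it falls, 0 when unchanged or on the first day
    direction = np.sign(close.diff()).fillna(0)
    return (direction * volume).cumsum()

# Hypothetical data
close = pd.Series([10.0, 10.5, 10.2, 10.2, 10.8])
volume = pd.Series([100, 150, 120, 80, 200])

obv = on_balance_volume(close, volume)
print(obv.tolist())
```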

### ADLine
The ADLine (*Accumulation/Distribution Line*) is an indicator that measures the money flow of an asset by taking both price changes and volumes into account.

A rising ADLine indicates increased buying pressure, often interpreted as accumulation by investors, whereas a falling ADLine reveals selling pressure, or distribution.

A divergence between the ADLine and the asset's price can be used to anticipate a potential trend reversal. For example, if the price rises while the ADLine falls, this could signal a weakening uptrend.
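
The sketch below shows the usual way the Accumulation/Distribution Line is built: a money flow multiplier between -1 and 1 locates the close within the day's range, is weighted by volume, and is then accumulated. The helper name `ad_line` and the sample data are hypothetical, and this is not necessarily how the class below implements it.

```python
import pandas as pd

def ad_line(high: pd.Series, low: pd.Series, close: pd.Series, volume: pd.Series) -> pd.Series:
    """Accumulation/Distribution Line from high, low, close and volume."""
    span = high - low
    # Money flow multiplier: +1 if the close is at the high, -1 if it is at the low
    multiplier = ((close - low) - (high - close)) / span.where(span != 0)
    multiplier = multiplier.fillna(0.0)  # Flat days (high == low) contribute nothing
    return (multiplier * volume).cumsum()

# Hypothetical data
data = pd.DataFrame({
    "High": [10.0, 11.0, 12.0],
    "Low": [8.0, 9.0, 10.0],
    "Close": [10.0, 9.0, 11.0],
    "Volume": [100, 200, 100],
})

adl = ad_line(data["High"], data["Low"], data["Close"], data["Volume"])
print(adl.tolist())
```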

### ADX
The ADX (Average Directional Index) identifies a strong trend when it is above 25 and a weak trend when it is below 20. \
Crossovers of the $-DI$ and $+DI$ lines can also be used to generate trading signals: 
- When $+DI$ crosses above $-DI$ and the ADX is above 20 (ideally 25), this is a potential buy signal.
- Conversely, when $-DI$ crosses above $+DI$ and the ADX is above 20 (or 25), this is a potential sell signal.
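
The crossover rule above can be sketched as follows, assuming the $+DI$, $-DI$ and ADX series have already been computed. The function name `di_cross_signals`, the `threshold` parameter and the sample values are hypothetical.

```python
import pandas as pd

def di_cross_signals(plus_di: pd.Series, minus_di: pd.Series, adx: pd.Series,
                     threshold: float = 20.0) -> pd.Series:
    """Label each period 'buy', 'sell' or 'hold' from DI crossovers filtered by ADX."""
    above = plus_di > minus_di
    below = minus_di > plus_di
    # A crossover happens when the relation holds now but did not in the previous period.
    # fill_value=True prevents a signal on the first period, which has no predecessor.
    crossed_up = above & ~above.shift(1, fill_value=True)
    crossed_down = below & ~below.shift(1, fill_value=True)
    strong = adx > threshold

    signals = pd.Series("hold", index=adx.index)
    signals[crossed_up & strong] = "buy"
    signals[crossed_down & strong] = "sell"
    return signals

# Hypothetical indicator values
plus_di = pd.Series([15.0, 22.0, 25.0, 18.0, 17.0])
minus_di = pd.Series([20.0, 20.0, 19.0, 21.0, 22.0])
adx = pd.Series([18.0, 26.0, 27.0, 24.0, 15.0])

signals = di_cross_signals(plus_di, minus_di, adx)
print(signals.tolist())
```

Passing `threshold=25` applies the stricter version of the filter.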

### Aroon
Indicates how recently the maximum or minimum price was reached over the last periods (25 by default). The price used can be the open, the close, the high or the low of each period. A value of 100 means the maximum price was reached in the most recent period and the minimum price at or before the start of the window studied; a value of -100 means the opposite.
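
A minimal sketch of an Aroon oscillator consistent with the description above, using the common definition "Aroon Up minus Aroon Down". The function name `aroon` and the sample prices are hypothetical, and the class below may handle ties or the window length differently.

```python
import numpy as np
import pandas as pd

def aroon(high: pd.Series, low: pd.Series, period: int = 25) -> pd.DataFrame:
    """Aroon Up, Aroon Down and their difference over a window of period + 1 values."""
    # Position of the extreme inside the window: `period` means "most recent value"
    up = high.rolling(period + 1).apply(lambda w: 100 * np.argmax(w) / period, raw=True)
    down = low.rolling(period + 1).apply(lambda w: 100 * np.argmin(w) / period, raw=True)
    return pd.DataFrame({"up": up, "down": down, "oscillator": up - down})

# Hypothetical steadily rising prices: the maximum is always the latest value
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
result = aroon(prices, prices, period=5)
print(result.iloc[-1])
```

With steadily falling prices the roles are reversed and the oscillator reaches -100.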

In [10]:
class Index():
    """
    A class to represent and calculate various financial indices for a given stock.
    """

    @staticmethod
    def load_data(ticker_symbol='AAPL', period='max'):
        """
        Import stock price history.
        
        IN: ticker_symbol: <str> Stock identifier
            period: <str> Period over which data is downloaded
                    arg: '1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max'
        OUT: <pd.DataFrame>: Stock history
        """
        ticker = yf.Ticker(ticker_symbol)  # Create a Ticker object
        data_hist = ticker.history(period=period)  # Download historical data
        dates = pd.Series(data_hist['Open'].index).apply(lambda x: pd.to_datetime(x.strftime('%Y-%m-%d')))  # Convert dates
        data = data_hist.reset_index(drop=True).set_index(dates).drop(['Dividends', 'Stock Splits'], axis=1)  # Prepare data
        data.index.name = None
        return data

    @classmethod
    def smooth(cls, series, period=14, method='simple', fill='NoFill', alpha=1/14):
        """
        Calculate moving average.
        
        IN: series: <Pandas Series> Series for which the moving average is calculated
            period: <int> Number of periods used for the moving average
            method: <str> Method used for calculating the average
                    'simple': Calculate arithmetic average
                    'exp': Calculate exponential average
                    'weight': Assign increasing weights to series and calculate arithmetic average
            fill: <str> If the first period values are initialized or left empty (fill='NoFill')
                  'constant': Fill with the first non-null value (at position period)
                  'data': Fill with the first period values of the series
                  'smooth': Fill value i with the moving average of the first i+1 values of the series over a period of i+1, 
                            for i from 1 to period - 1. The same method of calculating the average is used.
            alpha: <float> Argument for calculating the average by the 'exp' method. Must be between 0 and 1.
        OUT: <Pandas Series>: Smoothed series
        """
        match method:
            case 'simple':
                smoothed = series.rolling(period).sum().copy() / period  # Simple moving average
            case 'exp':
                smoothed = [series.iloc[0]]  # Initialize with the first value
                for i in range(1, len(series)):
                    smoothed += [alpha * series.iloc[i] + (1 - alpha) * smoothed[i - 1]]  # Exponential moving average
                smoothed = pd.Series(smoothed, index=series.index)  # Keep the original index
            case 'weight':
                weight = np.array([k for k in range(1, period + 1)])  # Weights for weighted moving average
                smoothed = series.rolling(period).apply(lambda x: np.dot(x, weight).sum() / weight.sum())  # Weighted moving average
        
        match fill:
            case 'constant':
                smoothed.iloc[:period] = smoothed.iloc[period - 1]  # Fill with constant value
            case 'data':
                smoothed.iloc[:period - 1] = series.iloc[:period - 1]  # Fill with initial data values
            case 'smooth':
                smoothed.iloc[0] = series.iloc[0]  # Initialize with the first value
                for i in range(1, period):
                    smoothed.iloc[i] = cls.smooth(series=series.iloc[:i + 1], period=i + 1, method=method, fill='NoFill').iloc[i]  # Smooth fill
            case 'NoFill':
                pass  # No fill

        return smoothed

    def __init__(self, name = 'Apple', ticker_symbol='AAPL', data=False) -> None:
        """
        Initialize the Index class.
        
        IN: ticker_symbol: <str> Stock ticker symbol to study
            data: <bool or Pandas DataFrame> If data is False, the history will be downloaded from Yahoo Finance. 
                                             Otherwise, data must contain the data downloaded from Yahoo Finance and 
                                             transformed as in the static method load_data.
        """
        self.__name = name
        self.__TICKER_SYMBOL = ticker_symbol  # Stock ticker symbol
        self.__data = None  # Data placeholder
        self.__index = None  # Index placeholder
        self.__date_limits = {'Start': None, 'End': None}  # Date limits

        self.__start(data)  # Initialize data

    def __start(self, data):
        """Start the data initialization."""
        if data is not False:
            self.__data = data  # Use provided data
        else:
            self.__data = self.load_data(ticker_symbol = self.__TICKER_SYMBOL)  # Load data from Yahoo Finance

        self.__index = pd.DataFrame(index = self.__data.index)  # Initialize index DataFrame
        self.__date_limits['Start'] = self.__data.index[0]  # Set start date
        self.__date_limits['End'] = self.__data.index[-1]  # Set end date

    def get_name(self):
        return self.__name
 
    def get_data(self):
        """Return the price history resampled to month starts."""
        monthly_dates = pd.date_range(start=self.__data.index.values[0], end=self.__data.index.values[-1], freq='MS')

        # Add a row for every month start in the covered range (on a copy, so the stored data is kept)
        data = self.__data.reindex(index=self.__data.index.union(monthly_dates))

        def fill_val(x):
            """Fill missing values by resampling and interpolating."""
            return x.resample('MS').interpolate(method='quadratic')

        # Apply the fill_val function to each column
        data = data.apply(func=fill_val, axis=0)

        return data[data.index.is_month_start]
    
    def get_index(self):
        """Return the indicators resampled to month starts."""
        monthly_dates = pd.date_range(start=self.__index.index.values[0], end=self.__index.index.values[-1], freq='MS')

        # Add a row for every month start in the covered range (on a copy, so the stored indicators are kept)
        index = self.__index.reindex(index=self.__index.index.union(monthly_dates))

        def fill_val(x):
            """Fill missing values by resampling and interpolating."""
            return x.resample('MS').interpolate(method='quadratic')

        # Apply the fill_val function to each column
        index = index.apply(func=fill_val, axis=0)

        return index[index.index.is_month_start]
   
    def update(self, **kwargs):
        """Update the index with calculated indicators."""
        self.__index = pd.DataFrame(index=self.__data.index)  # Reset index DataFrame
        self.OBV(**kwargs)
        self.ADLine(**kwargs)
        self.ADX(**kwargs)
        self.Aroon(**kwargs)
    
    def add(self, indicator, dates, **kwargs):
        """Add a specific indicator to the index."""
        match indicator:
            case 'C': return self.C(dates=dates, **kwargs)  # Add indicator 'C'

    # Display methods
    def display(self, start_date, end_date, list_indices, **kwargs):
        """
        Display the graphs of the specified indices in list_indices.
        
        IN: start_date, end_date: <str> Start and end dates for displaying the indices
            list_indices: <list of str> List of indices in string format to display. 
                         Multiple lists will give multiple graphs. 
                         Different indices within a single list will be displayed on the same graph.
                         Index names: 'OBV', 'ADLine', 'ADX'
            **kwargs: Arguments to pass to the index calculation methods
        """
        self.update(**kwargs)  # Update indices

        # Convert start and end date
        start = pd.to_datetime(start_date)
        end = pd.to_datetime(end_date)

        if start < self.__date_limits['Start']: start = self.__date_limits['Start']  # Adjust start date
        else: pass
        if end > self.__date_limits['End']: end = self.__date_limits['End']  # Adjust end date
        else: pass

        data = self.__index.loc[start:end]  # Filter data by date range
        x = pd.Series(data.index).apply(lambda x: x.strftime('%d/%m/%y'))  # Format dates for x-axis

        # squeeze=False keeps `ax` two-dimensional even when a single graph is requested
        fig, ax = plt.subplots(len(list_indices), 1, figsize=(20, 3 * len(list_indices)), squeeze=False)
        fig.suptitle(f'Period {start.strftime("%d/%m/%y")} - {end.strftime("%d/%m/%y")}')  # Set title
        # Positions of at most 20 evenly spaced tick labels (iloc needs integer positions)
        tick_positions = np.unique(np.linspace(0, len(x) - 1, min(20, len(x))).astype(int))
        for i, indices in enumerate(list_indices):
            axis = ax[i, 0]
            for indicator in indices:
                if indicator in data:
                    y = data[indicator]  # Get data for the index if present
                else:
                    y = self.add(indicator, dates=data.index, **kwargs)  # Add index data if not present
                style = kwargs.get(f'{indicator}_style', '')  # Get style for the index
                axis.plot(x, y, style, label=indicator)  # Plot the index
            axis.legend()
            axis.set_xticks(ticks=x.iloc[tick_positions])
    
    # Index calculation methods
    def OBV(self, start='Close', end=False, **kwargs):
        """
        Calculate the On-Balance Volume (OBV) indicator.
        
        IN: start, end: <str> Among 'Open' and 'Close', the directions for adding volumes will be calculated according 
                        to the opening price ('Open') or closing price ('Close') of day n for the start and day n+1 for the end.
            **kwargs: Exists for compatibility
        OUT: <Pandas Series>: OBV indicator
        """
        if start == 'Open' and end == 'Close':
            direction = self.__data[end] - self.__data[start]  # Same-day move: Close minus Open
        elif end:
            direction = self.__data[end] - self.__data[start].shift(1)  # `end` price of day n+1 minus `start` price of day n
        else: direction = self.__data[start] - self.__data[start].shift(1)  # Day-over-day change of the `start` price

        direction.iloc[0] = 1  # Set initial direction
        # np.sign yields +1, -1 or 0, so an unchanged price adds no volume instead of producing 0/0 = NaN
        OBV = (self.__data['Volume'] * np.sign(direction)).fillna(0).cumsum()  # Calculate OBV
        self.__index['OBV'] = OBV  # Store OBV

        return self.__index['OBV']
        
    def ADLine(self, **kwargs):
        """
        Calculate the Accumulation/Distribution Line (ADLine) indicator.
        
        IN: **kwargs: Exists for compatibility
        OUT: <Pandas Series>: ADLine indicator
        """
        MFM = ((self.__data['High'] - 2 * self.__data['Close'] + self.__data['Low'])) / \
              (self.__data['Low'] - self.__data['High'])  # Money Flow Multiplier
        MFV = (MFM * self.__data['Volume']).fillna(0)  # Money Flow Volume; 0 when High == Low (0/0)

        self.__index['ADLine'] = MFV.cumsum()  # Calculate and store ADLine

        return self.__index['ADLine']
    
    def ADX(self, period_ADX=14, method='exp', fill='NoFill', alpha=1/14, **kwargs):
        """
        Calculate the Average Directional Index (ADX) indicator.
        
        IN: period_ADX: <int> Number of periods used for the ADX calculation
            method: <str> Method used for calculating the average
                    'simple': Calculate arithmetic average
                    'exp': Calculate exponential average
                    'weight': Assign increasing weights to series and calculate arithmetic average
            fill: <str> If the first period values are initialized or left empty (fill='NoFill')
                  'constant': Fill with the first non-null value (at position period)
                  'data': Fill with the first period values of the series
                  'smooth': Fill value i with the moving average of the first i+1 values of the series over a period of i+1, 
                            for i from 1 to period - 1. The same method of calculating the average is used.
            alpha: <float> Argument for calculating the average by the 'exp' method. Must be between 0 and 1.
            **kwargs: Exists for compatibility
        OUT: <Pandas Series>: ADX indicator
        """
        up_move = self.__data['High'] - self.__data['High'].shift(1)
        down_move = self.__data['Low'].shift(1) - self.__data['Low']
        # A directional movement only counts when it is positive and larger than the opposite one
        DMp = np.where((up_move > down_move) & (up_move > 0), up_move, 0)  # Positive Directional Movement
        DMm = np.where((down_move >= up_move) & (down_move > 0), down_move, 0)  # Negative Directional Movement

        # True Range: row-wise maximum of the three candidate ranges (NaN from the first shift is skipped)
        TR = pd.concat([self.__data['High'] - self.__data['Low'],
                        self.__data['High'] - self.__data['Close'].shift(1),
                        self.__data['Close'].shift(1) - self.__data['Low']], axis=1).max(axis=1)
        TR = TR.reset_index(drop=True)  # Positional index, like the Series built from DMp and DMm

        # Smoothed Positive DM, Negative DM and True Range
        DMpsmooth = self.smooth(series=pd.Series(DMp), period=period_ADX, method=method, fill=fill, alpha=alpha)
        DMmsmooth = self.smooth(series=pd.Series(DMm), period=period_ADX, method=method, fill=fill, alpha=alpha)
        TRsmooth = self.smooth(series=pd.Series(TR), period=period_ADX, method=method, fill=fill, alpha=alpha)

        DIp = 100 * DMpsmooth / TRsmooth  # Positive Directional Indicator
        DIm = 100 * DMmsmooth / TRsmooth  # Negative Directional Indicator

        DX = 100 * (DIp - DIm).abs() / (DIp + DIm)  # Directional Movement Index (absolute DI spread)
        DX.iloc[0] = 0  # Set initial DX
        ADX = self.smooth(DX, period=period_ADX, method=method, fill=fill, alpha=alpha)  # Average Directional Index

        # Set index for ADX, DIp, DIm
        ADX.index = self.__index.index
        DIp.index = self.__index.index
        DIm.index = self.__index.index
        
        # Store ADX, Positive DI and Negative DI in index
        self.__index['ADX'] = ADX
        self.__index['pDI'] = DIp
        self.__index['mDI'] = DIm
        
        return self.__index['ADX']
    
    def Aroon(self, period_Aroon=25, event='Close', **kwargs):
        """
        Calculate the Aroon indicator.
        
        IN: period_Aroon: <int> Number of periods over which the index is calculated
            event: <str> Among 'Open', 'Close', 'Low', and 'High', the Aroon index will be calculated based on the variations of these events.
            **kwargs: Exists for compatibility
        OUT: <Pandas Series>: Aroon indicator
        """
        def indMin(i):
            if i < period_Aroon: start, end = 0, max(i, 1)
            else: start, end = i - period_Aroon, i
            return np.argmin(self.__data[event].iloc[start: end])  # Index of minimum value
        def indMax(i):
            if i < period_Aroon: start, end = 0, max(i, 1)
            else: start, end = i - period_Aroon, i
            return np.argmax(self.__data[event].iloc[start: end])  # Index of maximum value

        # Calculate argMin, argMax and Aroon indicator for each period
        argMin = pd.Series(range(len(self.__data))).apply(indMin)
        argMax = pd.Series(range(len(self.__data))).apply(indMax)
        Aroon = 100 * (argMax - argMin) / period_Aroon  # Aroon Up - Aroon Down: positive when the high is more recent than the low
        
        # Store Aroon in index
        Aroon.index = self.__index.index
        self.__index['Aroon'] = Aroon

        return self.__index['Aroon']
    
    # Other lines
    def C(self, dates, c=25, **kwargs):
        """
        Generate a horizontal curve.
        
        IN: dates: <Pandas Index> Dates for the curve
            c: <int> Height of the curve
            **kwargs: Exists for compatibility
        OUT: <Pandas Series>: Horizontal curve
        """
        return pd.Series(c, index=dates)  # Constant value c at every date
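
The OBV and Accumulation/Distribution formulas used by the class can be checked by hand on a few rows. The sketch below is standalone and only illustrative: `toy`, `toy_obv` and `toy_adline` are hypothetical names, the prices and volumes are made up, and the helpers mirror the formulas rather than the class itself.

```python
import numpy as np
import pandas as pd

# Toy OHLCV data (made-up values, for illustration only)
toy = pd.DataFrame(
    {
        "High":   [10.0, 11.0, 11.0, 12.0],
        "Low":    [8.0, 9.0, 10.0, 10.0],
        "Close":  [9.0, 11.0, 10.0, 11.0],
        "Volume": [100.0, 200.0, 150.0, 300.0],
    },
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

def toy_obv(df: pd.DataFrame) -> pd.Series:
    """On-Balance Volume from close-to-close moves (hypothetical helper)."""
    direction = np.sign(df["Close"].diff())
    direction.iloc[0] = 1  # first day has no previous close: count it as positive
    return (df["Volume"] * direction).cumsum()

def toy_adline(df: pd.DataFrame) -> pd.Series:
    """Accumulation/Distribution line (hypothetical helper)."""
    spread = df["High"] - df["Low"]
    # Money Flow Multiplier: +1 when Close == High, -1 when Close == Low
    mfm = ((df["Close"] - df["Low"]) - (df["High"] - df["Close"])) / spread
    return (mfm.fillna(0) * df["Volume"]).cumsum()

obv = toy_obv(toy)
adl = toy_adline(toy)
print(pd.DataFrame({"OBV": obv, "ADLine": adl}))
```

On these four rows the close rises, falls, then rises again, so the OBV adds, subtracts, then adds the daily volume, while the A/D line only moves on days where the close is not at the midpoint of the range.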



## Stock selection

In [11]:
list_entreprise = {
    # CAC 40 companies
    "Accor": "AC.PA",
    "Air Liquide": "AI.PA",
    "ArcelorMittal": "MT.AS",
    "BNP Paribas": "BNP.PA",
    
    # Defensive companies
    "Nestlé": "NESN.SW",
    "Sanofi": "SAN.PA",
    "Novo Nordisk": "NOVO-B.CO",
    "GlaxoSmithKline": "GSK.L",
    
    # Cyclical companies
    "Volkswagen": "VOW3.DE",
    "BMW": "BMW.DE",
    "LVMH": "MC.PA",
    "Hermès": "RMS.PA",
    
    # Value companies
    "TotalEnergies": "TTE.PA",
    "Schneider Electric": "SU.PA",
    "Airbus": "AIR.PA",
    "L'Oréal": "OR.PA",
    
    # Companies likely to correlate with macroeconomic or technical indicators
    "SAP": "SAP.DE",
    "ASML": "ASML.AS",
    "Siemens": "SIE.DE",
    "Danone": "BN.PA",
    "Kering": "KER.PA",
    "Orange": "ORA.PA",
    "Publicis": "PUB.PA"
}

In [12]:
# Initialize empty DataFrames for historical actions and indices
hist_action = pd.DataFrame()
hist_index = pd.DataFrame()

# Iterate over each company in the dictionary
for company, ticker in list_entreprise.items():
    # Instantiate an Index object for the company
    index_obj = Index(name=company, ticker_symbol=ticker)
    
    # Update the index to calculate technical indices
    index_obj.update()
    
    # Get the historical action data
    action_data = index_obj.get_data()
    
    # Create a MultiIndex for the columns in the format ('Company', 'Price')
    action_data.columns = pd.MultiIndex.from_product([[company], action_data.columns], names=['Company', 'Price'])
    action_data.set_index(action_data.index, inplace=True)  # Ensure the index remains the DateTimeIndex
    
    # Concatenate the historical action data into hist_action
    hist_action = pd.concat([hist_action, action_data], axis=1)
    
    # Get the historical index data
    index_data = index_obj.get_index()
    
    # Create a MultiIndex for the columns in the format ('Company', 'Index')
    index_data.columns = pd.MultiIndex.from_product([[company], index_data.columns], names=['Company', 'Index'])
    index_data.set_index(index_data.index, inplace=True)  # Ensure the index remains the DateTimeIndex
    
    # Concatenate the index data into hist_index
    hist_index = pd.concat([hist_index, index_data], axis=1)


Air Liquide AI.PA
Publicis PUB.PA


In [13]:
hist_action.loc['2002':, :]

Company,Accor,Accor,Accor,Accor,Accor,Air Liquide,Air Liquide,Air Liquide,Air Liquide,Air Liquide,...,Orange,Orange,Orange,Orange,Orange,Publicis,Publicis,Publicis,Publicis,Publicis
Price,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume,...,Open,High,Low,Close,Volume,Open,High,Low,Close,Volume
2002-01-01,4.646613,4.646613,4.646613,4.646613,0.000000e+00,15.310338,15.310338,15.310338,15.310338,0.000000e+00,...,11.522519,11.522519,11.522519,11.522519,0.000000e+00,15.785186,15.785186,15.785186,15.785186,0.000000
2002-02-01,4.734243,4.734243,4.618163,4.648890,1.148837e+06,15.514610,15.611880,15.300615,15.417339,2.017343e+06,...,9.572158,9.610653,8.943424,8.971653,6.914240e+06,14.936230,15.413765,14.936230,15.328870,484905.000000
2002-03-01,4.830975,4.895843,4.795696,4.862840,1.319025e+06,15.923143,16.370586,15.757784,16.253862,1.923018e+06,...,7.698786,7.916919,7.698786,7.852762,4.716427e+06,16.161898,16.660657,16.077003,16.453726,393082.000000
2002-04-01,5.120037,5.120037,5.120037,5.120037,0.000000e+00,16.020416,16.020416,16.020416,16.020416,0.000000e+00,...,8.904932,8.904932,8.904932,8.904932,0.000000e+00,20.390736,20.390736,20.390736,20.390736,0.000000
2002-05-01,5.105242,5.105242,5.105242,5.105242,0.000000e+00,16.711031,16.711031,16.711031,16.711031,0.000000e+00,...,6.918643,6.918643,6.918643,6.918643,0.000000e+00,17.987152,17.987152,17.987152,17.987152,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2024-09-01,36.300782,36.237238,34.918575,35.264210,1.248972e+06,172.543858,172.457424,169.212728,169.877726,7.082982e+05,...,10.128504,10.178257,10.091507,10.219860,5.506408e+06,97.115961,97.069237,95.096323,95.316574,609924.483516
2024-10-01,39.169998,39.230000,37.849998,38.320000,1.122141e+06,173.720001,173.820007,169.839996,170.940002,6.621300e+05,...,10.015405,10.059056,9.957205,10.020255,7.110345e+06,98.580002,99.000000,96.779999,96.879997,697233.000000
2024-11-01,41.520000,41.919998,41.279999,41.849998,3.448940e+05,164.880005,166.759995,164.279999,166.300003,4.285810e+05,...,9.753501,9.860202,9.753501,9.758351,4.762588e+06,97.800003,98.059998,97.480003,98.000000,254459.000000
2024-12-01,,,,,,,,,,,...,,,,,,,,,,
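
With the two-level columns shown above (`Company`, `Price`), a single field for all companies or all fields for one company can be extracted directly. A minimal sketch on a toy frame with the same column layout; `toy_hist` and its values are made up for illustration.

```python
import pandas as pd

# Toy frame with the same two-level column layout as hist_action (values are made up)
dates = pd.date_range("2024-01-01", periods=3, freq="MS")
cols = pd.MultiIndex.from_product(
    [["Accor", "Orange"], ["Close", "Volume"]], names=["Company", "Price"]
)
toy_hist = pd.DataFrame(
    [[40.0, 1e6, 10.0, 5e6],
     [41.0, 2e6, 10.5, 6e6],
     [39.0, 3e6, 11.0, 7e6]],
    index=dates, columns=cols,
)

# One field for every company: cross-section on the 'Price' level
closes = toy_hist.xs("Close", axis=1, level="Price")

# Every field for one company: select on the first column level
accor = toy_hist["Accor"]

print(closes)
print(accor)
```

`closes` has one column per company, which is a convenient shape for computing correlations against a macroeconomic series.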


# Merging the DataFrames

In [14]:
common_countries = ipch.columns.intersection(\
                   dette_publique.columns.intersection(\
                   chomage.columns.intersection(\
                    pib.columns)))

common_dates = ipch.index.intersection(\
                   dette_publique.index.intersection(\
                   chomage.index.intersection(\
                    pib.index)))

expected_dates = pd.date_range(start=common_dates[0], end=common_dates[-1], freq='MS')
all_dates_present = expected_dates.isin(common_dates).all()

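# Report whether every month between the first and last common date is present.
# The format is kept in a variable: nesting the same quote type inside an f-string
# only works on Python 3.12+, and a backslash continuation inside the string
# leaks the indentation into the printed message.
date_format = '%d/%m/%Y'
print(f'Toutes les dates du {common_dates[0].strftime(date_format)} '
      f'au {common_dates[-1].strftime(date_format)} sont présentes'
      if all_dates_present else 'Il manque des dates')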

def set_index(df, index_name):
    df = df.loc[common_dates, common_countries]
    if isinstance(df.columns, pd.MultiIndex):
        # Add the new label as the top level of the existing MultiIndex
        new_index = pd.MultiIndex.from_tuples([(index_name, *idx) \
                        for idx in df.columns], names=['Type'] + df.columns.names)
    else:
        # Build a MultiIndex by pairing the new label with the existing column index
        new_index = pd.MultiIndex.from_tuples([(index_name, idx) 
                        for idx in df.columns], names=['Type', df.columns.names[0]])
    df.columns = new_index
    return df

hist_action = hist_action[hist_action.index.isin(common_dates)]
hist_index = hist_index[hist_index.index.isin(common_dates)]


# Tag each indicator with its 'Type' level at concatenation time, without overwriting
# the original frames, so the cell can be re-run safely. Without this tagging the
# columns have no 'Type' level and the groupby below raises KeyError: 'Type'.
data = pd.concat((set_index(dette_publique, 'Dette publique'),
                  set_index(chomage, 'Chomage'),
                  set_index(ipch, 'IPCH'),
                  set_index(pib, 'PIB')), axis = 1, join = 'inner')
# Note: axis=1 in groupby is deprecated in recent pandas versions
data = data.groupby(level='Type', axis = 1)


Toutes les dates du 01/01/2013       au 01/01/2023 sont présentes
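
As a possible alternative to a column-wise `groupby`, the `Type` level can be attached at concatenation time and each indicator selected with `xs`. The sketch below is an assumption-laden illustration, not the notebook's method: `toy_pib`, `toy_chomage`, `merged` and `by_type` are hypothetical names and the values are made up.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=3, freq="MS")
toy_pib = pd.DataFrame({"France": [1.0, 2.0, 3.0], "Italie": [4.0, 5.0, 6.0]}, index=dates)
toy_chomage = pd.DataFrame({"France": [8.0, 7.5, 7.0], "Italie": [9.0, 9.5, 10.0]}, index=dates)
toy_pib.columns.name = "Country"
toy_chomage.columns.name = "Country"

# The dict keys become the outer column level, named 'Type'
merged = pd.concat(
    {"PIB": toy_pib, "Chomage": toy_chomage}, axis=1, join="inner", names=["Type"]
)

# Select one indicator: the 'Type' level is dropped, the country columns remain
pib_only = merged.xs("PIB", axis=1, level="Type")

# A plain dict of per-type frames can stand in for the grouped object
by_type = {
    t: merged.xs(t, axis=1, level="Type")
    for t in merged.columns.get_level_values("Type").unique()
}
print(pib_only)
```

A dict such as `by_type` offers lookups by indicator name without relying on `groupby(axis=1)`, which recent pandas versions deprecate.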

# Display

The plots are displayed three times; I do not know why.\
For the map display, I started writing a function at the very bottom. The country-name translation is still missing. I do not know whether it will work once that is done; to be checked.

## Plot display function

In [None]:
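def plot_graph(countries, start, end, data, data_type):
    """Plot one indicator (data_type) over [start, end] for a comma-separated list of countries."""
    start, end = pd.to_datetime(start), pd.to_datetime(end)  # DatePicker widgets return datetime.date
    plt.figure(figsize=(9, 4))
    for country in (c.strip() for c in countries.split(',')):  # Tolerate missing or extra spaces
        data.loc[start:end, (data_type, country)].plot(label=f'{country}')

    plt.title(f'{data_type} ({start.strftime("%m/%Y")} - {end.strftime("%m/%Y")})')
    plt.legend()
    plt.show()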

## Map display function

In [None]:
# Load GeoJSON file into a GeoDataFrame
url = "https://raw.githubusercontent.com/leakyMirror/map-of-europe/master/GeoJSON/europe.geojson"
europe = gpd.read_file(url)

def plot_map(date: str, data: pd.core.groupby.DataFrameGroupBy, data_type: str):
    """
    Plot a map of Europe for the specified date and data type.
    
    Args:
        date (str): Date in 'YYYY-MM' format for which to filter the data.
        data (pd.core.groupby.DataFrameGroupBy): A grouped DataFrame with MultiIndex columns
            ('Type', 'Country') and a DateTimeIndex.
        data_type (str): The type of data to display (e.g., 'IPCH').
    """
    # Prepare the data
    df = data.get_group(data_type).copy()
    df.columns = df.columns.droplevel('Type')
    df.rename(columns={
        "Albanie": "Albania",
        "Allemagne": "Germany",
        "Andorre": "Andorra",
        "Autriche": "Austria",
        "Belgique": "Belgium",
        "Biélorussie": "Belarus",
        "Bosnie-Herzégovine": "Bosnia and Herzegovina",
        "Bulgarie": "Bulgaria",
        "Croatie": "Croatia",
        "Danemark": "Denmark",
        "Espagne": "Spain",
        "Estonie": "Estonia",
        "Finlande": "Finland",
        "France": "France",
        "Grèce": "Greece",
        "Hongrie": "Hungary",
        "Irlande": "Ireland",
        "Islande": "Iceland",
        "Italie": "Italy",
        "Kosovo": "Kosovo",
        "Lettonie": "Latvia",
        "Liechtenstein": "Liechtenstein",
        "Lituanie": "Lithuania",
        "Luxembourg": "Luxembourg",
        "Malte": "Malta",
        "Moldavie": "Moldova",
        "Monaco": "Monaco",
        "Monténégro": "Montenegro",
        "Norvège": "Norway",
        "Pays-Bas": "Netherlands",
        "Pologne": "Poland",
        "Portugal": "Portugal",
        "République tchèque": "Czech Republic",
        "Roumanie": "Romania",
        "Royaume-Uni": "United Kingdom",
        "Russie": "Russia",
        "Saint-Marin": "San Marino",
        "Serbie": "Serbia",
        "Slovaquie": "Slovakia",
        "Slovénie": "Slovenia",
        "Suède": "Sweden",
        "Suisse": "Switzerland",
        "Ukraine": "Ukraine",
        "Vatican": "Vatican City"
    }, inplace=True)

    # Keep only the row for the requested month (the index holds month-start dates),
    # so that each country contributes a single value to the map
    month = pd.to_datetime(date).to_period('M').to_timestamp()
    df_melted = df.loc[month].rename('Value').rename_axis('Country').reset_index()

    # Merge GeoJSON data with DataFrame
    europe_merged = europe.merge(df_melted, left_on='NAME', right_on='Country', how='left')

    # Plot the data
    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    europe_merged.plot(column='Value', ax=ax, legend=True, cmap='viridis', 
                       missing_kwds={"color": "lightgrey"},
                       legend_kwds={'label': data_type})

    plt.title(f"Map of {data_type} in Europe in {pd.to_datetime(date).strftime('%B %Y')}", fontsize=16)
    plt.show()
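
The data preparation for the map (rename the countries, keep one month, join on the country name) can be tried without any geometry. In this sketch a plain DataFrame named `shapes` stands in for the GeoDataFrame; `toy`, `shapes` and the values are hypothetical, and only the `NAME` column of the GeoJSON file is assumed.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=2, freq="MS")
toy = pd.DataFrame({"Allemagne": [1.5, 1.7], "France": [1.2, 1.4]}, index=dates)
toy = toy.rename(columns={"Allemagne": "Germany"})  # French -> English names

# Plain DataFrame standing in for the GeoDataFrame; only the NAME column matters here
shapes = pd.DataFrame({"NAME": ["France", "Germany", "Spain"]})

# Normalise the requested date to the first day of its month, then take that row
month = pd.to_datetime("2020-02").to_period("M").to_timestamp()
values = toy.loc[month].rename("Value").rename_axis("Country").reset_index()

# Left join: countries without data keep a missing value (drawn in grey on the map)
merged = shapes.merge(values, left_on="NAME", right_on="Country", how="left")
print(merged)
```

Selecting a single row before the merge keeps one line per country, which is what a choropleth expects.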

In [None]:
# Widgets to select the countries and the dates
countries_widget = widgets.Text(
    value='France, Allemagne, Italie',
    description='Countries:',
    placeholder='Enter countries separated by commas'
)

start_date_widget = widgets.DatePicker(
    value=pd.to_datetime('2015-1', format='%Y-%m'),
    description='Start Date'
)

end_date_widget = widgets.DatePicker(
    value=pd.to_datetime('2020-3', format='%Y-%m'),
    description='End Date'
)

# Multiple-choice widget for the data to display
multi_choice_widget = widgets.SelectMultiple(
    options=['IPCH', 'Dette publique', 'PIB', 'Chomage'],
    value=['IPCH'],
    description='Select Data',
    disabled=False
)

# Checkbox widget
checkbox_widget = widgets.Checkbox(
    value=False,
    description='Map',
    disabled=False
)

# General function to plot the data
@interact_manual(countries=countries_widget, \
                 start_date=start_date_widget, end_date=end_date_widget, \
                 type=multi_choice_widget, map=checkbox_widget)
def plot_G(countries, start_date, end_date, type, map = True):
    if map: plot_map(date = start_date, data = data, data_type= type[0])
    else:
        for t in type:
            plot_graph(countries, start_date, end_date, data.get_group(t), t)



## Commodity display

In [None]:
# Function to plot the data
def plot_data(start_date, end_date, show_or, show_petrol):
    plt.figure(figsize=(10, 6))
    start_date, end_date = pd.to_datetime(start_date), pd.to_datetime(end_date)
    
    # Filter the data based on the selected dates
    filtered_material = material[(material.index >= start_date) & (material.index <= end_date)]
    
    if show_or:
        sb.lineplot(data=filtered_material, x=filtered_material.index, y='Or', label='Or', marker='o')
        # If 'Or' checkbox is checked, plot the data for 'Or'

    if show_petrol:
        sb.lineplot(data=filtered_material, x=filtered_material.index, y='Petrol', label='Pétrole', marker='o')
        # If 'Petrol' checkbox is checked, plot the data for 'Petrol'

    # Configure the plot with a title, legend, and grid
    plt.title(f'Gold and oil prices in euros ({start_date.strftime("%m/%Y")} - {end_date.strftime("%m/%Y")})')
    plt.legend()  # The legend is automatically updated based on the checked datasets
    plt.grid(True)
    plt.show()

# Widgets for selecting the start and end dates, and options to display data
start_date_widget = widgets.DatePicker(description='Début', value=pd.to_datetime('2013-01-01'))
end_date_widget = widgets.DatePicker(description='Fin', value=pd.to_datetime('2023-12-31'))
show_or_widget = widgets.Checkbox(description='Or', value=True)  # Checkbox for showing 'Or' data
show_petrol_widget = widgets.Checkbox(description='Pétrole', value=True)  # Checkbox for showing 'Petrol' data

# Interactive interface to control the plot function with widgets
interact(
    plot_data,  # The function to interact with
    start_date=start_date_widget,  # Start date widget
    end_date=end_date_widget,  # End date widget
    show_or=show_or_widget,  # 'Or' checkbox widget
    show_petrol=show_petrol_widget  # 'Petrol' checkbox widget
)

## Currency display

In [None]:
# Function to plot currency data
def plot_devise_data(select_all, start_date, end_date, **currency_checkboxes):
    plt.figure(figsize=(12, 6))

    # List of selected currencies
    if select_all:
        selected_currencies = devise.columns.tolist()  # If "select all" is checked, include all currencies
    else:
        selected_currencies = [currency for currency, is_selected in currency_checkboxes.items() if is_selected]
        # If not, include only the selected currencies based on the checkboxes

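    # Filter the data based on the selected date range and currencies
    # (DatePicker returns datetime.date, so convert before comparing with a DatetimeIndex)
    start_date, end_date = pd.to_datetime(start_date), pd.to_datetime(end_date)
    filtered_data = devise[(devise.index >= start_date) & (devise.index <= end_date)]
    filtered_data = filtered_data[selected_currencies]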

    # Plot the time series for each selected currency
    for currency in selected_currencies:
        sb.lineplot(data=filtered_data, x=filtered_data.index, y=currency, label=currency, marker='o')

    # Configure the plot with a title, x and y labels, and a legend
    plt.title(f'Currency values (euro equivalent) ({start_date.strftime("%m/%Y")} - {end_date.strftime("%m/%Y")})')
    plt.xlabel('TIME_PERIOD')
    plt.ylabel('OBS_VALUE')
    plt.legend(title='Currency')  # Currency legend
    plt.grid(True)
    plt.show()

# Widgets for selecting the start and end dates
start_date_widget = widgets.DatePicker(description='Début', value=pd.to_datetime('2013-01-01'))
end_date_widget = widgets.DatePicker(description='Fin', value=pd.to_datetime('2023-12-31'))

# Dynamically generate checkboxes for each currency
currency_checkboxes = {
    currency: widgets.Checkbox(description=currency, value=False)  # Default value is False (unchecked)
    for currency in devise.columns
}

# Checkbox to "Select All" currencies
select_all_widget = widgets.Checkbox(description='Tout sélectionner', value=False)

# Function to dynamically update checkboxes based on "Select All"
def update_checkboxes(change):
    for checkbox in currency_checkboxes.values():
        checkbox.value = change['new']  # Update the state of checkboxes based on the "Select All" checkbox

select_all_widget.observe(update_checkboxes, names='value')  # Observe changes to the "Select All" checkbox

# Create a container for all the checkboxes
checkbox_container = VBox([select_all_widget] + list(currency_checkboxes.values()))

# Interactive interface to control the plot function with widgets
interact(
    plot_devise_data,  # The function to interact with
    select_all=select_all_widget,  # "Select All" widget
    start_date=start_date_widget,  # Start date widget
    end_date=end_date_widget,  # End date widget
    **currency_checkboxes  # Pass each currency checkbox widget as a parameter
)



# Correlations

Arguments: data, type, historique (pd.Series), countries\
Compute the correlation between each column and historique, for each column in data.get_group(type).loc[:, countries].\
Average the correlations.\
Return the mean correlation.
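
The specification above could be sketched as follows. This is only an illustration under assumptions: `mean_correlation` is a hypothetical name, `frame` stands in for `data.get_group(type)` with its `Type` level already dropped (one column per country), and the values are made up.

```python
import pandas as pd

def mean_correlation(frame: pd.DataFrame, historique: pd.Series, countries: list[str]) -> float:
    """Mean Pearson correlation between `historique` and each selected country column."""
    # Series.corr aligns both series on their index and ignores incomplete pairs
    correlations = [frame[country].corr(historique) for country in countries]
    return float(pd.Series(correlations).mean())  # mean() skips NaN correlations

dates = pd.date_range("2020-01-01", periods=4, freq="MS")
toy_frame = pd.DataFrame(
    {"France": [2.0, 4.0, 6.0, 8.0], "Italie": [4.0, 3.0, 2.0, 1.0]}, index=dates
)
toy_hist = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

both = mean_correlation(toy_frame, toy_hist, ["France", "Italie"])
france_only = mean_correlation(toy_frame, toy_hist, ["France"])
print(both, france_only)
```

In this toy case one country moves exactly with the series and the other exactly against it, so the two correlations cancel out in the mean; an average can therefore hide strong but opposite relationships.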

In [None]:
def correlation_matrix(data, variables):
  """Return the correlation matrix for specified variables."""
  data_flat = data.apply(lambda x: x.droplevel(0, axis=1))  # Flatten MultiIndex
  return data_flat[variables].corr()  # Compute and return correlation matrix

#test
variables = ['PIB', 'Chomage','IPCH']
correlation_matrix(data, variables).head()

In [None]:
def compute_avg_correlation(data, hist_action, country, data_type, company):
    """
    Compute avg correlation between a company's data in hist_action and a country's data in the provided dataset.
    """
    # Flatten the MultiIndex structure in the dataset to simplify access to columns.
    ungrouped_data = data.apply(lambda df: df.droplevel(0, axis=1))

    # Check if the specified company and country exist in their respective datasets.
    if company not in hist_action.columns or country not in ungrouped_data[data_type].columns:
        return None  # Return None if either is not present.
    
    # Extract a specific column for the company.
    # If the company's data is a DataFrame (multi-column), select the 'Close' column by default.
    if isinstance(hist_action[company], pd.DataFrame):
        col_company = hist_action[company]['Close']  # Use the 'Close' column for the company's data.
    else:
        col_company = hist_action[company]  # Use the entire series if no multi-column structure exists.
    
    # Extract the country's data as a Series.
    col_country = ungrouped_data[data_type][country]
    
    # Ensure both columns have valid (non-NaN) data by dropping null values.
    col_company = col_company.dropna()
    col_country = col_country.dropna()

    # Align the two Series by finding the intersection of their indices (dates).
    common_index = col_company.index.intersection(col_country.index)
    col_company = col_company.loc[common_index]
    col_country = col_country.loc[common_index]
    
    # Compute the correlation between the two aligned Series if there are valid data points.
    if len(col_company) > 0 and len(col_country) > 0:
        return col_company.corr(col_country)  # Return the correlation coefficient.
    else:
        return None  # Return None if there are no valid data points.


  
result = compute_avg_correlation(data, hist_action, 'France', 'PIB', 'Air Liquide')
print(result)


In [None]:
def avg_material_correlation(material, hist_action, commodity, company):
  """compute avg correlation between commodity in material and company in hist_action"""
  col_material = material[commodity]  # select commodity column
  col_company = hist_action.xs(company, axis=1, level='Company')  # select company column
  correlations = [col_material.corr(col_company[c]) for c in col_company.columns]  # compute correlations
  return sum(correlations) / len(correlations)  # return mean correlation

avg_material_correlation(material, hist_action, 'Or', 'Air Liquide')

In [None]:
def compute_avg_currency_correlation(devise, hist_action, currency, company):
  """compute avg correlation between company in hist_action and currency in devise"""
  col_company = hist_action.xs(company, axis=1, level='Company')  # select company column
  col_currency = devise[currency]  # select currency column
  correlations = [col_company[c].corr(col_currency) for c in col_company.columns]  # compute correlations
  return sum(correlations) / len(correlations)  # return mean correlation

compute_avg_currency_correlation(devise, hist_action, 'US dollar', 'Air Liquide')