<center>
    <img src="https://upload.wikimedia.org/wikipedia/commons/6/6f/Dauphine_logo_2019_-_Bleu.png" style="width: 600px;"/>
</center>

<div align="center"><span style="font-family:Arial Black;font-size:33px;color:darkblue"> Master Economie Finance </span></div>


<div align="center"><span style="font-family:Arial Black;font-size:27px;color:darkblue">Index Construction</span></div>

# Definition

A financial index is a statistical indicator that reflects the change in value of a set of financial assets, such as stocks, bonds, or other financial instruments.
It is used to:
- Measure the performance of a given market or sector.
- Serve as a benchmark for financial products (ETFs, index funds, etc.).
- Compare returns between different asset classes.

Examples of well-known financial indices: CAC 40 (France), Stoxx Europe 600 (Europe), S&P 500 (United States).

An index is not a financial instrument in itself, since it cannot be traded (bought or sold). It does not represent a monetary value but rather a reference value. An investor wishing to gain exposure to the risk represented by a given index generally uses an ETF, which is designed to replicate the chosen financial index.

**Index Composition**

Selection criteria: The index groups companies according to factors such as:
- Size (market capitalization)
- Industry sector
- Country or geographic area

Examples: CAC 40: 40 large French companies listed on Euronext Paris, S&P 500: 500 large U.S. companies.

Although thematic indices are the most common, other more complex indices exist. If an investment strategy can be defined based on a set of rules, it can be expressed in the form of an index.

**Weighting Methods**

- Market capitalization weighting: Each company has a weight proportional to its market capitalization.
- Equal weighting: Each company has the same weight in the index at the time of rebalancing.
- Risk-based weighting: Each company is weighted according to its volatility or another risk indicator.

**Calculation of the Index Value**

Once the investment universe is defined and the weighting method chosen, the index value/performance is calculated at set intervals, generally on a daily basis.

**Index Rebalancing** 

Index rebalancing is the process by which the composition or weighting of the securities that make up a stock index is periodically adjusted.
This ensures that the index remains representative of the market or sector it measures. Rebalancing may include:
- Adding new companies that meet the criteria.
- Removing companies that no longer meet the requirements.

Index rebalancing also helps manage weighting drift between rebalancing dates.
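
To illustrate weighting drift, here is a minimal sketch that starts from an equal-weighted basket and shows how the weights move once prices diverge. The tickers and prices are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical prices at the last rebalancing date and some time later.
prices_at_rebalancing = pd.Series({"AAA": 100.0, "BBB": 50.0, "CCC": 20.0})
prices_later = pd.Series({"AAA": 130.0, "BBB": 50.0, "CCC": 15.0})

# Equal weighting at the rebalancing date: each security gets 1/N.
n_securities = len(prices_at_rebalancing)
target_weights = pd.Series(1.0 / n_securities, index=prices_at_rebalancing.index)

# Holdings are fixed between rebalancing dates, so each position grows
# with its own price and the weights drift away from the target.
growth = prices_later / prices_at_rebalancing
position_values = target_weights * growth
drifted_weights = position_values / position_values.sum()

print(drifted_weights.round(4))
```

The security that outperformed ends up overweighted relative to its 1/N target, and the next rebalancing brings every weight back to the target.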

The rebalancing frequency is defined before the index is launched; it can be monthly, quarterly, or yearly. Some market events can trigger exceptional rebalancing (e.g., a volatility spike for certain indices based on a quantitative strategy).


# Part I - Import and Retrieval of the Required Data

For this practical work, we will use Yahoo Finance to retrieve financial data. Other free alternatives exist, such as Stooq, Alpha Vantage, and Quandl; Alpha Vantage and Quandl require working with their APIs.

For those who wish to access the implementation of yfinance, it is available here: https://github.com/ranaroussi/yfinance

In [None]:
!pip install yfinance pandas_datareader pandas numpy matplotlib scipy numba seaborn --quiet

In [None]:
# import of the libraries required for the project
import os
import math
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import yfinance as yf
import datetime as dt
import scipy.stats as st
import textwrap
import pprint

from datetime import datetime, timedelta
from dateutil.relativedelta import relativedelta
from matplotlib.ticker import FuncFormatter
from pandas_datareader import data as pdr
from typing import Union, Tuple, List, Dict

In [None]:
# Data retrieval via yfinance - understanding how to use the yfinance package

ticker_test = "TTE.PA"
stock = yf.Ticker(ticker_test)

# The variable stock is an object that exposes many attributes and methods.
# To list all of them, we can use Python's built-in function dir().
dir(stock)

In [None]:
# The info attribute returns the main characteristics of the security as a dictionary.
stock_info = stock.info
print("The variable stock_info has the following type:", type(stock_info))
print("")
print("Data available in stock_info:\n")
pprint.pprint(stock_info, indent=4, width=80, sort_dicts=False)
print()
# To access one of the info data points, we just need to provide the desired key, since the variable is a dictionary.
print("The free float of ", stock_info.get("longName"), " is ", stock_info.get("floatShares"))

Our first objective is to build a French stock index that tracks the 20 largest French companies in terms of market capitalization over a 10-year period.
To do this, we need to retrieve the following historical data: closing price, market capitalization.

In [None]:
tickers = [
    'ACA.PA',   # Crédit Agricole
    'BN.PA',    # Danone
    'DSY.PA',   # Dassault Systèmes
    'ENGI.PA',  # Engie
    'RMS.PA',   # Hermès
    'KER.PA',   # Kering
    'OR.PA',    # L'Oréal
    'MC.PA',    # LVMH
    'ML.PA',    # Michelin
    'ORA.PA',   # Orange
    'RI.PA',    # Pernod Ricard
    'PUB.PA',   # Publicis Groupe
    'RNO.PA',   # Renault
    'SAF.PA',   # Safran
    'SGO.PA',   # Saint-Gobain
    'SAN.PA',   # Sanofi
    'SU.PA',    # Schneider Electric
    'GLE.PA',   # Société Générale
    'TEP.PA',   # Teleperformance
    'HO.PA',    # Thales
    'TTE.PA',   # TotalEnergies
    'URW.PA',   # Unibail-Rodamco-Westfield
    'DG.PA',    # Vinci
    'VIV.PA',   # Vivendi
    'WLN.PA',   # Worldline
    'AC.PA',    # Accor
    'ALO.PA',   # Alstom
    'ERF.PA',   # Eurofins Scientific
    'EDEN.PA',  # Edenred
    'LI.PA',    # Klépierre
]

# We will download 10 years of historical data for our stock universe.
end_date = dt.datetime.now() - dt.timedelta(days=1)
start_date = end_date - relativedelta(years=10)

input_data_from_yf = yf.download(tickers, start=start_date, end=end_date, auto_adjust=True)  # retrieves the following fields: Close, High, Low, Open, Volume.
close_data_from_yf = input_data_from_yf['Close']
path_to_store_data = os.getcwd()
close_data_from_yf.to_csv(path_to_store_data + '/close_data_historical_10y.csv')

# The number of shares is not available historically in Yahoo Finance. We will therefore use a CSV file containing this information.
number_of_shares_data_historical = pd.read_csv(path_to_store_data +'/number_of_shares_data_historical_10y.csv')



# Part II - Index Construction Methodology

## Formula for defining the weights of index constituents

* **Equal weighting**: $w_i = 1/N$.
* **Market-cap weighting**: $w_i = \frac{cap_i}{\sum_j cap_j}$.
* **Custom weighting**: score followed by normalization $w_i \propto score_i$.
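
As a quick illustration of the three formulas, the sketch below computes each set of weights on a small hypothetical universe. The market capitalizations and volatilities are made up, and using the inverse of volatility as the custom score is an assumption chosen for the example, not a rule stated here.

```python
import pandas as pd

# Hypothetical market capitalizations (EUR bn) and annualized volatilities.
caps = pd.Series({"AAA": 300.0, "BBB": 150.0, "CCC": 50.0})
vols = pd.Series({"AAA": 0.20, "BBB": 0.25, "CCC": 0.40})

# Equal weighting: w_i = 1/N
equal_w = pd.Series(1.0 / len(caps), index=caps.index)

# Market-cap weighting: w_i = cap_i / sum_j cap_j
cap_w = caps / caps.sum()

# Custom weighting: score then normalization (assumed score: 1 / volatility)
scores = 1.0 / vols
custom_w = scores / scores.sum()

weights = pd.DataFrame({"equal": equal_w, "market_cap": cap_w, "custom": custom_w})
print(weights.round(4))
```

Each column sums to 1, which is the only constraint the three schemes share.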

### Formula for determining the value of a market-cap weighted index

Index value (base):

$$
I_t = \frac{\sum_{i=1}^N w_{i,t} \, P_{i,t}}{D}
$$

where $D$ is a divisor that ensures the continuity of the index during corporate actions.
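
The role of the divisor can be sketched on a composition change. In this example, $w_{i,t}$ is read as a quantity attached to each constituent (for instance a number of shares); this reading, the tickers, and all figures are assumptions made for the illustration. The divisor is rescaled so that the index level is identical just before and just after the change.

```python
import pandas as pd

def index_level(quantities: pd.Series, prices: pd.Series, divisor: float) -> float:
    """Index level: sum_i w_i * P_i over the constituents, divided by D."""
    aligned_prices = prices.reindex(quantities.index)
    return float((quantities * aligned_prices).sum() / divisor)

# Hypothetical prices, observed at the moment of the change.
prices = pd.Series({"AAA": 100.0, "BBB": 50.0, "CCC": 20.0})

# Composition before the change; D is chosen so that the index equals 100.
old_quantities = pd.Series({"AAA": 10.0, "BBB": 20.0})
old_numerator = (old_quantities * prices.reindex(old_quantities.index)).sum()
old_divisor = old_numerator / 100.0
level_before = index_level(old_quantities, prices, old_divisor)

# Composition after the change: BBB is replaced by CCC.
new_quantities = pd.Series({"AAA": 10.0, "CCC": 80.0})
new_numerator = (new_quantities * prices.reindex(new_quantities.index)).sum()

# Rescale the divisor so the level does not jump at the change.
new_divisor = old_divisor * new_numerator / old_numerator
level_after = index_level(new_quantities, prices, new_divisor)

print(level_before, level_after, new_divisor)
```

Without the adjustment, the numerator would move from 2000 to 2600 and the index would show a 30% jump that no price movement explains.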

## Eligibility rules and common filters applied when defining the investment universe

Examples:

* Minimum free-float market capitalization
* Minimum average daily volume (ADV)
* ESG criteria (exclusion of tobacco producers, controversial weapons, or companies generating energy from coal)
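
Filters of this kind can be expressed as boolean masks on a table describing the universe. The sketch below is only an illustration: the column names, thresholds, and figures are hypothetical.

```python
import pandas as pd

# Hypothetical universe: all figures and thresholds are made up.
universe = pd.DataFrame(
    {
        "free_float_mkt_cap_eur_bn": [120.0, 8.0, 45.0, 2.5],
        "adv_eur_mn": [350.0, 12.0, 90.0, 1.5],
        "excluded_on_esg": [False, False, True, False],
    },
    index=["AAA", "BBB", "CCC", "DDD"],
)

MIN_FREE_FLOAT_CAP_BN = 5.0  # minimum free-float market capitalization
MIN_ADV_MN = 10.0            # minimum average daily volume

# A security is eligible only if it passes every filter.
is_eligible = (
    (universe["free_float_mkt_cap_eur_bn"] >= MIN_FREE_FLOAT_CAP_BN)
    & (universe["adv_eur_mn"] >= MIN_ADV_MN)
    & ~universe["excluded_on_esg"]
)
eligible = universe[is_eligible]

print(eligible.index.tolist())
```

Here `CCC` is removed by the ESG exclusion and `DDD` by the size and liquidity thresholds.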

## Calculation of weights and base value — implementation

### Data cleaning

Write a function `clean_data(close_df, shares_df)` that:

* Identifies the tickers common to both DataFrames.
* Aligns the common dates.
* Returns the two cleaned, comparable DataFrames.

### Market capitalization calculation

Write a function `compute_market_cap(prices, shares_outstanding)` that:

* Takes as input two dictionaries (prices and number of shares).
* Calculates each company’s market capitalization.

### Selection of the largest capitalizations

On the initial date, select the 20 companies with the highest market capitalization.

Store their capitalizations in a dictionary `top_20_mkt_cap`.

### Calculation of the index weights

Write a function `compute_weights(market_caps)` that calculates the relative weight of each security within the index.

### Initial composition of the index

* Load the closing prices and the number of shares from the CSV file.
* Apply the `clean_data` function to both DataFrames.
* Extract the closing prices and the number of shares for the first date in the cleaned DataFrames.
* Compute the market capitalization of each company.
* Sort the companies by market capitalization and keep only the 20 largest.
* Compute the portfolio weights of the selected securities.



In [None]:
CLOSE_CSV = "close_data_historical_10y.csv"
SHARES_CSV = "number_of_shares_data_historical_10y.csv"

# =============================================================================
# 1) Data cleaning and alignment
# =============================================================================
def clean_data(close_df: pd.DataFrame, shares_df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Align columns and index between *close_df* and *shares_df*.
    Returns DataFrames containing ONLY the common tickers and dates.
    
    Instructions:
    1. Find the intersection of the columns (tickers) between *close_df* and *shares_df*. Hint: see `.columns.intersection(...)`.
    2. Find the intersection of the common indexes (dates).
    3. Filter both DataFrames to keep only these columns and common dates. Hint: see `.loc[...]`.
    4. Return the two sorted DataFrames.
     """
    # common_tickers = ...
    # common_dates = ...
   
    # close_df = ...
    # shares_df = ...
    
    raise NotImplementedError("Complete clean_data() according to the instructions above")
    

# =============================================================================
# 2) Market capitalization calculation
# =============================================================================
def compute_market_cap(prices: Dict[str, float], shares_outstanding: Dict[str, float]) -> Dict[str, float]:
    """Compute market capitalization per ticker from two dicts.
      
    Instructions:
    - Iterate over the tickers present in *prices*.
    - Retrieve *close_price* and *shares*.
    - Handle missing values (None/NaN) by treating them as 0 if necessary.
    - Return a dict {ticker: market_cap_float} with the market cap of all elements.
    """
    
    raise NotImplementedError("Complete compute_market_cap()")


# =============================================================================
# 3) Weights calculation
# =============================================================================
def compute_weights(market_caps: Dict[str, float]) -> Dict[str, float]:
    """Compute weights (sum = 1.0) from market caps.
    
    Instructions:
    - Calculate the total sum `total_mkt_cap`.
    - For each ticker, weight = value / total.
    - Return a dict {ticker: weight_float}.
    """

    raise NotImplementedError("Complete compute_weights()")



## Implementation Part 2 – index value calculation and rebalancing

### Convert the DataFrames to facilitate data manipulation:

From the two DataFrames `close_data_from_yf` and `number_of_shares_data_historical`, create two dictionaries that map each date to a sub-dictionary containing the corresponding values for each stock.
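
One possible way to obtain this structure is `DataFrame.to_dict(orient="index")`, sketched here on a small hypothetical price table (the tickers and values are made up).

```python
import pandas as pd

# Small hypothetical price table (date index, tickers as columns).
toy_close = pd.DataFrame(
    {"AAA": [100.0, 101.0], "BBB": [50.0, 49.5]},
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)

# orient="index" maps each row label (a date) to a {ticker: value} sub-dictionary.
close_by_date = toy_close.to_dict(orient="index")

print(close_by_date[pd.Timestamp("2024-01-03")])
```

The keys of the outer dictionary are the `Timestamp` objects of the index, so a lookup must use the same type.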

### Defining the rebalancing dates

Define the rebalancing dates as the last business day of each year.
Store these dates in a list `rebalancing_dates`.

### Initializing the index

Set the initial index value to **100** at the first available rebalancing date.
Create a dictionary `portfolio_value` to store the index value at each rebalancing date.

---

### Annual rebalancing loop

For each period between two rebalancing dates:

* Compute market capitalizations at the start date of the period.
* Select the 20 largest market capitalizations.
* Compute the relative weights of the selected securities in the index.
* Compute each security’s return up to the next period.
* Compute the portfolio return.
* Update the index value at the end of the period.
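
The last four steps can be sketched for a single period on toy data. The figures are hypothetical, and the universe is assumed to be already reduced to the selected securities.

```python
import pandas as pd

# Hypothetical data for one period.
caps_start = pd.Series({"AAA": 300.0, "BBB": 150.0, "CCC": 50.0})
price_start = pd.Series({"AAA": 100.0, "BBB": 50.0, "CCC": 20.0})
price_end = pd.Series({"AAA": 110.0, "BBB": 45.0, "CCC": 24.0})

# Relative weights from the market capitalizations at the start date.
weights = caps_start / caps_start.sum()

# Simple return of each security over the period.
returns = price_end / price_start - 1.0

# Portfolio return: weighted sum of the individual returns.
period_return = float((weights * returns).sum())

# Index value at the end of the period.
index_start = 100.0
index_end = index_start * (1.0 + period_return)

print(f"period return: {period_return:.2%}, index value: {index_end:.2f}")
```

With weights of 60%, 30% and 10% and returns of +10%, -10% and +20%, the portfolio return is 5% and the index moves from 100 to 105.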


In [None]:
# We now want to extend the index construction to include an annual rebalancing

# Converting the DataFrames to make data manipulation easier.


# Defining the rebalancing dates: here, the last observation of each year
rebalancing_dates = close_data_from_yf.groupby(close_data_from_yf.index.year).apply(lambda x: x.index.max()).tolist()
if len(rebalancing_dates) < 2:
    raise ValueError("Not enough rebalancing dates to calculate performance.")

# Initialization: retrieve the first available date and set the index value to 100


portfolio_value = {}
# Loop over each rebalancing period (year by year)
for i, date in enumerate(rebalancing_dates[:-1]):
    start_date = date
    end_date = rebalancing_dates[i+1]

    # Compute market capitalizations at the rebalancing date
    
    # Select the 20 largest market capitalizations
    
    # Compute the corresponding weights
    
    # Prices at the start and end of the period for the stocks in the index
    
    # Compute each stock’s contribution to the portfolio return
    
    # Update the portfolio value at the end of the period


# Convert the dict to a Pandas Series and sort by date
pv_series = pd.Series(portfolio_value).sort_index()
print("Index level over time:\n")
print(pv_series)

# Plot
fig, ax = plt.subplots(figsize=(10, 4))
pv_series.plot(ax=ax)
ax.set_title("Value of the rebalanced index (base 100)")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()


## Implementation Part 3 – Encapsulate the code and make the rebalancing frequency configurable

### Objective

Encapsulate the provided code into **a single function** and allow the user to choose the **rebalancing frequency** among: *annual*, *quarterly*, *monthly*, or *daily*.
We will **reuse as much as possible** from the existing code (dictionary construction, selection/weighting logic, update loop).

1. **Function signature**
   Create a function (name `build_index_with_rebalancing`) with the following signature:

* **Inputs**:

  * `close_data_from_yf: pd.DataFrame` (date index, ticker columns, closing prices)
  * `number_of_shares_data_historical: pd.DataFrame` (same structure, number of shares)
  * `frequency: str` in `{"annual", "quarterly", "monthly", "daily"}` (default value: `"annual"`)
    
* **Outputs**:

  * `pv_series: pd.Series` sorted by date (index value, base 100)

2. **Configuring the rebalancing frequency**

* Set up a **mapping** between `frequency` and a rule for aggregating by period:

  * `"annual"` → last business day of each **year**
  * `"quarterly"` → last business day of each **quarter**
  * `"monthly"` → last business day of each **month**
  * `"daily"` → every **available day** (equivalent to rebalancing daily)
* Expected implementation: group the price index by period (year/quarter/month/day) and **take the last observation** of each period, as in the existing annual code (generalized to other periodicities).
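
A minimal sketch of such a mapping, assuming pandas period aliases are used to group the dates. The names `PERIOD_ALIAS` and `last_observation_per_period` are hypothetical helpers introduced for this example.

```python
import pandas as pd

# Assumed mapping from the `frequency` argument to a pandas period alias.
PERIOD_ALIAS = {"annual": "Y", "quarterly": "Q", "monthly": "M", "daily": "D"}

def last_observation_per_period(dates: pd.DatetimeIndex, frequency: str) -> list:
    """Return the last available date of each period for the given frequency."""
    if frequency not in PERIOD_ALIAS:
        raise ValueError(f"Unknown frequency: {frequency!r}")
    periods = dates.to_period(PERIOD_ALIAS[frequency])
    # Group the dates by period and keep the latest one in each group.
    return pd.Series(dates, index=dates).groupby(periods).max().tolist()

# Toy calendar of business days (not real market data).
toy_dates = pd.bdate_range("2023-11-01", "2024-02-15")
monthly_dates = last_observation_per_period(toy_dates, "monthly")
daily_dates = last_observation_per_period(toy_dates, "daily")

print(monthly_dates)
```

With `"daily"`, every available date forms its own period, so the function returns the full calendar.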

3. **Index initialization**

* Set the initial value to **100.0** at the **first** available rebalancing date.

4. **Rebalancing loop (generalize the existing logic)**

* Reuse the rebalancing logic implemented earlier.

5. **Compare the impact of rebalancing frequency on index performance**

* Run the code for all frequencies and display the results.

### Notes

* **Reuse** as much of the provided code as possible (dictionaries, sorting by market cap, computing weights and contributions).
* The financial logic (top 20 by market cap, cap-weighted weights, simple returns) **remains the same**; only the **definition of rebalancing dates** becomes configurable via `frequency`.
* **Test your code with an annual rebalancing frequency to ensure you obtain the same results as before.**


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

def build_index_with_rebalancing(
    close_data_from_yf: pd.DataFrame,
    number_of_shares_data_historical: pd.DataFrame,
    frequency: str = "annual",           # "annual" | "quarterly" | "monthly" | "daily"
    n_constituents: int = 20,
    base_value: float = 100.0,
    return_details: bool = True) -> Union[pd.Series, Tuple[pd.Series, List[Dict[str, object]]]]:
    """
    Build the time series of an index rebalanced at the chosen frequency.
    - Prices and number of shares are provided as DataFrames (date index, tickers as columns).
    - The `n_constituents` largest market capitalizations (20 by default) are selected at each rebalancing.
    - Weights are proportional to market capitalizations.

    - If the parameter return_details is True: create a list `details` that stores,
      for each rebalancing period (as a dictionary), the following information:
      start_date, end_date, instruments in the index, each instrument’s weight,
      return, initial portfolio value, final portfolio value.

    - The function returns a `pd.Series` with dates and the index value, or the tuple
      `(pv_series, details)` if return_details is True.

    Assumptions: data are clean/aligned, and utility functions are available:
      - compute_market_cap(close_row_dict, shares_row_dict) -> dict{ticker: cap}
      - compute_weights(top_caps_dict) -> dict{ticker: weight}
    """

    # 1) Convert DataFrames -> dictionaries (reuse existing code)
   
    # 2) Define the rebalancing dates (last observed day of each period)

    # Rebalancing loop over each period
    for i, start_date in enumerate(rebalancing_dates[:-1]):
        pass

In [None]:
frequencies = ["annual", "quarterly", "monthly", "daily"]
results = {} 
details_log = {}

# Build the index for each rebalancing frequency
for freq in frequencies:
    pv_series, details = build_index_with_rebalancing(
        close_data_from_yf=close_data_from_yf,
        number_of_shares_data_historical=number_of_shares_data_historical,
        frequency=freq,
        n_constituents=20,
        base_value=100.0,
        return_details=True
    )
    results[freq] = pv_series
    details_log[freq] = details

# Combine into a DataFrame for comparison
pv_df = pd.concat(results, axis=1)  # the dict keys (frequencies) become the column names
pv_df = pv_df.sort_index()

print("Preview of index values (all frequencies):")
print(pv_df.dropna())

# Annual performance calculation
annual_perf = {}
for freq, ser in results.items():
    yearly = ser.groupby(ser.index.year).last().pct_change().dropna()
    annual_perf[freq] = yearly

perf_df = pd.DataFrame(annual_perf)

print("\nTable of annual performances (%):")
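# DataFrame.applymap is deprecated since pandas 2.1; a column-wise map works on all versions.
print(perf_df.apply(lambda col: col.map(lambda x: f"{x:.2%}")))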

# Plot comparison of all frequencies
plt.figure(figsize=(10, 5))
for freq, ser in results.items():
    ser.plot(label=freq)
plt.title("Rebalanced index value (frequency comparison)")
plt.xlabel("Date")
plt.ylabel("Value (base 100)")
plt.grid(True, alpha=0.3)
plt.legend()
plt.tight_layout()
plt.show()

# Individual plots per frequency
for freq, ser in results.items():
    fig, ax = plt.subplots(figsize=(9, 3.5))
    ser.plot(ax=ax)
    ax.set_title(f"Rebalanced index value – {freq}")
    ax.set_xlabel("Date")
    ax.set_ylabel("Value (base 100)")
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
