# Stock Analysis and Visualization Script

This script fetches historical stock data, processes it, and visualizes key metrics.

### Imports and Settings

In [88]:
import zoneinfo
from datetime import datetime, timedelta

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import yfinance as yf

# Pandas display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Store yfinance's timezone cache in a custom directory
yf.set_tz_cache_location("custom_cache")

### Configurable Parameters

In [89]:
# Stock ticker symbol
TICKER = "^GSPC"  # S&P 500 Index

DURATION_DAYS = 366
END_DATE = datetime.now().astimezone(zoneinfo.ZoneInfo("Europe/Paris"))
START_DATE = END_DATE - timedelta(days=DURATION_DAYS)

In [90]:
print("=" * 70)
print("S&P 500 DATA RETRIEVAL")
print("=" * 70)
print(f"Ticker: {TICKER}")
print(f"Period: {START_DATE.strftime('%Y-%m-%d')} to {END_DATE.strftime('%Y-%m-%d')}")
print(f"Duration: {DURATION_DAYS} days")
print("=" * 70)

S&P 500 DATA RETRIEVAL
Ticker: ^GSPC
Period: 2024-11-08 to 2025-11-09
Duration: 366 days


### Fetching the Data from Yahoo Finance

In [91]:
sp500 = yf.download(TICKER, start=START_DATE, end=END_DATE, interval='1d', progress=True)

[*********************100%***********************]  1 of 1 completed


In [92]:
sp500.head(10)

Columns form a two-level MultiIndex (`Price`, `Ticker`); the ticker is `^GSPC` for every column.

| Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| 2024-11-08 | 5995.540039 | 6012.450195 | 5976.759766 | 5976.759766 | 4666740000 |
| 2024-11-11 | 6001.350098 | 6017.310059 | 5986.689941 | 6008.859863 | 4333000000 |
| 2024-11-12 | 5983.990234 | 6009.919922 | 5960.080078 | 6003.600098 | 4243400000 |
| 2024-11-13 | 5985.379883 | 6008.189941 | 5965.910156 | 5985.75 | 4220180000 |
| 2024-11-14 | 5949.169922 | 5993.879883 | 5942.279785 | 5989.680176 | 4184570000 |
| 2024-11-15 | 5870.620117 | 5915.319824 | 5853.009766 | 5912.790039 | 4590960000 |
| 2024-11-18 | 5893.620117 | 5908.120117 | 5865.950195 | 5874.169922 | 3983860000 |
| 2024-11-19 | 5916.97998 | 5923.509766 | 5855.290039 | 5870.049805 | 4036940000 |
| 2024-11-20 | 5917.109863 | 5920.669922 | 5860.560059 | 5914.339844 | 3772620000 |
| 2024-11-21 | 5948.709961 | 5963.319824 | 5887.259766 | 5940.580078 | 4230120000 |


### Understanding the S&P 500 Data Columns

When we download the S&P 500 data from Yahoo Finance, we get the following columns:

- **Open**: The price at which the index opened at the beginning of the trading day.
- **High**: The highest price reached during the trading day.
- **Low**: The lowest price reached during the trading day.
- **Close**: The price at which the index closed at the end of the trading day.
- **Volume**: The total number of shares/contracts traded during the day.
- **Date**: The trading date. It is the index of the DataFrame rather than a regular column.

> Note: For indices like the S&P 500, there is typically no `Adj Close` column by default.
> Markets are closed on weekends and on exchange holidays, which is why the data has fewer than 365 rows.


In [93]:
sp500.shape[0]  # number of rows, i.e. trading days in the window

250
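
As a rough sanity check on that row count, the sketch below counts Monday-to-Friday dates in the window printed earlier. The hard-coded dates are an assumption copied from that output, and `pd.bdate_range` knows nothing about exchange holidays, so the remaining gap between its count and the actual number of rows should correspond to holidays and to how the download treats the end date.

```python
import pandas as pd

# Assumed window, copied from the period printed earlier in the notebook.
start = pd.Timestamp("2024-11-08")
end = pd.Timestamp("2025-11-09")

# Inclusive number of calendar days in the window.
calendar_days = (end - start).days + 1

# bdate_range keeps Monday-Friday only; exchange holidays are NOT removed.
weekdays = len(pd.bdate_range(start, end))

print(f"Calendar days: {calendar_days}")
print(f"Weekdays:      {weekdays}")
```
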

### Close vs Adjusted Close

- **Close**: The raw closing price of the asset at the end of the trading day.
- **Adjusted Close (Adj Close)**: The closing price **adjusted for dividends and stock splits**, reflecting the "true" value for investors who reinvest dividends.

Why does Adjusted Close matter?

For individual stocks:
- Dividends and splits affect the nominal closing price.
- Using `Close` alone can misrepresent actual returns.
- `Adj Close` corrects for these events, giving the real return over time.
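
To make the point about misrepresented returns concrete, here is a minimal sketch with made-up numbers for a hypothetical stock on its ex-dividend day. It does not reproduce the exact adjustment a data provider applies to build `Adj Close`; it only shows how far the raw `Close` return can be from the return an investor actually receives.

```python
# Hypothetical stock: all three values are invented for illustration.
prev_close = 100.0  # close on the day before the ex-dividend date
close = 99.0        # raw close on the ex-dividend date
dividend = 2.0      # cash dividend per share received by the holder

# Return computed from raw closing prices only.
price_return = close / prev_close - 1

# Return once the dividend received is taken into account.
total_return = (close + dividend) / prev_close - 1

print(f"Return from Close only:        {price_return:.2%}")
print(f"Return including the dividend: {total_return:.2%}")
```

With these numbers the raw `Close` shows a loss of about 1% while the investor is actually up about 1%.
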

For the S&P 500 Index:

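- The index itself pays no dividends and never splits; splits of constituent stocks are absorbed by the index divisor. Note that `^GSPC` is a price index, so constituent dividends are not included in its level.
- Therefore, **Close ≈ Adjusted Close** for the S&P 500.
- To be consistent with future analyses, we will rename `Close` to `AdjClose`: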

In [94]:
sp500.rename(columns={"Close": "AdjClose"}, inplace=True)

In [95]:
sp500.columns

MultiIndex([('AdjClose', '^GSPC'),
            (    'High', '^GSPC'),
            (     'Low', '^GSPC'),
            (    'Open', '^GSPC'),
            (  'Volume', '^GSPC')],
           names=['Price', 'Ticker'])
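
Because the columns are a two-level `MultiIndex`, a plain `rename(columns=...)` matches labels on any level. As a small sketch on a toy frame (the data and the `toy` / `renamed` names are illustrative, not part of the notebook), the `level` argument of `DataFrame.rename` can restrict the renaming to the `Price` level only:

```python
import pandas as pd

# Toy frame with the same two-level column layout as the downloaded data.
columns = pd.MultiIndex.from_tuples(
    [("Close", "^GSPC"), ("High", "^GSPC")],
    names=["Price", "Ticker"],
)
toy = pd.DataFrame([[5995.54, 6012.45]], columns=columns)

# level="Price" limits the renaming to the first level of the MultiIndex.
renamed = toy.rename(columns={"Close": "AdjClose"}, level="Price")

print(renamed.columns.tolist())
```

Returning a new frame instead of using `inplace=True` keeps the original available for comparison.
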

### Daily Returns

To analyze the S&P 500, we often compute **daily returns**, which measure the relative change in price from one day to the next.

Formula

The **simple daily return** is calculated as:
$$
R_t = \frac{P_t}{P_{t-1}} - 1
$$

Where:

- $R_t$: daily return at day $t$
- $P_t$: `AdjClose` price at day $t$
- $P_{t-1}$: `AdjClose` price at the previous trading day

Why this formula?

- It represents the **percentage change** in price from one day to the next.  
- It is the basis for many statistical analyses such as **volatility**, **Sharpe ratio**, and technical indicators like **RSI**.
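
As a quick worked example of the formula, the sketch below applies it to the first two closing values from the table printed earlier (2024-11-08 and 2024-11-11). The variable names are illustrative only.

```python
# Two consecutive closing values copied from the data shown earlier.
p_prev = 5995.540039  # P_{t-1}: close on 2024-11-08
p_curr = 6001.350098  # P_t:     close on 2024-11-11

# Simple daily return: R_t = P_t / P_{t-1} - 1
daily_return = p_curr / p_prev - 1

print(f"{daily_return:.6f}")  # prints 0.000969
```

That is a gain of roughly 0.097% over one trading day.
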


In [99]:
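# Extract the AdjClose series for the ticker once instead of indexing row by row.
adj_close = sp500[("AdjClose", "^GSPC")]
sp500_length = len(adj_close)

# The first day has no previous close, so its return is left at 0.0.
sp500_return_per_day = [0.0] * sp500_length

def calculate_return(pt, pt_minus_one):
    """Simple return: P_t / P_{t-1} - 1."""
    return pt / pt_minus_one - 1

for i in range(1, sp500_length):
    sp500_return_per_day[i] = calculate_return(adj_close.iloc[i], adj_close.iloc[i - 1])

sp500[("DailyReturn", "^GSPC")] = sp500_return_per_day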

In [104]:
sp500.head()

| Date | AdjClose | High | Low | Open | Volume | DailyReturn |
|---|---|---|---|---|---|---|
| 2024-11-08 | 5995.540039 | 6012.450195 | 5976.759766 | 5976.759766 | 4666740000 | 0.0 |
| 2024-11-11 | 6001.350098 | 6017.310059 | 5986.689941 | 6008.859863 | 4333000000 | 0.000969 |
| 2024-11-12 | 5983.990234 | 6009.919922 | 5960.080078 | 6003.600098 | 4243400000 | -0.002893 |
| 2024-11-13 | 5985.379883 | 6008.189941 | 5965.910156 | 5985.75 | 4220180000 | 0.000232 |
| 2024-11-14 | 5949.169922 | 5993.879883 | 5942.279785 | 5989.680176 | 4184570000 | -0.00605 |


In [105]:
sp500_pct_change = sp500[("AdjClose", "^GSPC")].pct_change()
sp500_manual = sp500[("DailyReturn", "^GSPC")][1:]
sp500_pandas = sp500_pct_change[1:]

print(f"Vectors are the same : {sp500_manual.equals(sp500_pandas)}")

Vectors are the same : True
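
The check above relies on exact floating-point equality, which happens to hold here. A more tolerant variant, sketched below on a small stand-in series built from the first closes shown earlier (the variable names are illustrative), writes the formula with `shift` and compares it to `pct_change` using `np.allclose`.

```python
import numpy as np
import pandas as pd

# Small stand-in series: the first four closes from the data shown earlier.
prices = pd.Series([5995.540039, 6001.350098, 5983.990234, 5985.379883])

# Vectorized form of R_t = P_t / P_{t-1} - 1.
via_shift = prices / prices.shift(1) - 1
via_pct_change = prices.pct_change()

# The first element is NaN in both (no previous day), so drop it before comparing.
same = np.allclose(via_shift.dropna(), via_pct_change.dropna())

print(f"Equal within floating-point tolerance: {same}")
```

Note that both vectorized versions leave the first return as `NaN`, whereas the manual loop stores `0.0` there, which is why the first element is skipped in the comparison.
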
