# **Financial risks in stock markets**

## In this notebook, we are going to study financial risk in stock markets by comparing the volatility of companies from different sectors. For ths we have to answer this question :

## **Which stocks are the riskiest ? Are some sectors correlated/ Move together or not ?**




## 1. Collect :



### 1.1 Import librairies :


In [96]:
#1. IMPORTING LIBRARIES WE WILL USE LATER

#data manipulation
import pandas as pd
import numpy as np

#data visualization
import matplotlib.pyplot as plt
import seaborn as sns


#time
from datetime import datetime, timedelta

print("✓ All libraries imported successfully!")
print(f"Current date: {datetime.now().strftime('%Y-%m-%d')}")

✓ All libraries imported successfully!
Current date: 2025-11-30


### 1.2 Data Collection

In this section, we collect daily stock price data for the five selected companies representing different sectors:

| Ticker | Company               | Sector        |
|--------|------------------------|----------------|
| AAPL   | Apple Inc.             | Technology     |
| JPM    | JPMorgan Chase & Co.   | Finance        |
| AMZN   | Amazon.com Inc.        | Consumer       |
| JNJ    | Johnson & Johnson      | Healthcare     |
| TTE    | TotalEnergies SE       | Energy         |



In [97]:
#1. Import  the datasets
apple_df=pd.read_csv('Apple_daily.csv')
amazon_df=pd.read_csv('Amazon_daily.csv')
JP_Morgan_df=pd.read_csv('JP_Morgan_daily.csv')
Johnson_df=pd.read_csv('Johnson&Johnson_daily.csv')
Total_Energies_df=pd.read_csv('TotalEnergies_daily.csv')

In [98]:
#See the data shape since they need top have the same number of rows
print("Apple datasets rows and columns",apple_df.shape)
print("Amazon datasets rows and columns",amazon_df.shape)
print("JP_Morgan_df datasets rows and columns",JP_Morgan_df.shape)
print("Johnson_df datasets rows and columns",Johnson_df.shape)
print("Total_Energies_df datasets rows and columns",Total_Energies_df.shape)

Apple datasets rows and columns (1740, 10)
Amazon datasets rows and columns (1740, 10)
JP_Morgan_df datasets rows and columns (1745, 10)
Johnson_df datasets rows and columns (1740, 10)
Total_Energies_df datasets rows and columns (1735, 10)


## 2. Clean :


###2.1 Assessing Data Quality :

* Check for missing values

* Check data types

*  Check for duplicates

*  Validate date ranges

*  Check for outliers

* Check data consistency



In [99]:
#Look at each data set to inspect
from IPython.display import display
print("Apple")
display(apple_df.head())
print("Amazon")
display(amazon_df.head())
print("JP_Morgan")
display(JP_Morgan_df.head())
print("Johnson")
display(Johnson_df.head())
print("Total_Energies")
display(Total_Energies_df.head())

Apple


Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,2019-01-02 00:00:00-05:00,38.72249984741211,39.712501525878906,38.557499,39.48,37.538818,148158800.0,0.0,
1,1,2019-01-03 00:00:00-05:00,35.994998931884766,36.43000030517578,35.5,35.547501,33.799679,365248800.0,0.0,
2,2,2019-01-04 00:00:00-05:00,36.13249969482422,37.13750076293945,35.950001,37.064999,35.242561,234428400.0,0.0,0.0
3,3,2019-01-07 00:00:00-05:00,37.17499923706055,37.20750045776367,36.474998,,35.164124,219111200.0,0.0,0.0
4,4,2019-01-08 00:00:00-05:00,,37.95500183105469,37.130001,37.6875,35.834446,164101200.0,0.0,0.0


Amazon


Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,2019-01-02 00:00:00-05:00,,77.66799926757812,73.046501,76.956497,76.956497,159662000.0,0.0,0.0
1,1,2019-01-03 00:00:00-05:00,76.00050354003906,76.9000015258789,74.855499,75.014,75.014,139512000.0,0.0,0.0
2,2,2019-01-04 00:00:00-05:00,76.5,79.69999694824219,75.915497,78.769501,78.769501,183652000.0,0.0,0.0
3,3,2019-01-07 00:00:00-05:00,80.1155014038086,81.72799682617188,,81.475502,81.475502,159864000.0,,0.0
4,4,2019-01-08 00:00:00-05:00,83.2344970703125,83.83049774169922,80.830498,,82.829002,177628000.0,0.0,0.0


JP_Morgan


Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,2019-01-02 00:00:00-05:00,95.9499969482422,99.77999877929688,95.940002,99.309998,81.616707,15670900.0,0.0,0.0
1,1,2019-01-03 00:00:00-05:00,98.63999938964844,98.88999938964844,96.690002,97.110001,80.456787,,0.8,0.0
2,2,2019-01-04 00:00:00-05:00,,100.93000030517578,98.279999,100.690002,83.422867,16935200.0,0.0,0.0
3,3,2019-01-07 00:00:00-05:00,100.43000030517578,101.47000122070312,,100.760002,83.480865,15430700.0,0.0,0.0
4,4,2019-01-08 00:00:00-05:00,101.62999725341795,101.81999969482422,99.550003,100.57,83.323441,13578800.0,0.0,0.0


Johnson


Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,2019-01-02 00:00:00-05:00,128.130005,128.3800048828125,,127.75,105.609131,7631700.0,0.0,0.0
1,1,2019-01-03 00:00:00-05:00,128.139999,128.27000427246094,,125.720001,103.930962,8654500.0,,0.0
2,2,2019-01-04 00:00:00-05:00,127.120003,128.64999389648438,126.730003,127.830002,105.67527,8831700.0,0.0,0.0
3,3,2019-01-07 00:00:00-05:00,127.629997,128.35000610351562,126.800003,127.010002,104.997383,8404700.0,0.0,0.0
4,4,2019-01-08 00:00:00-05:00,128.179993,130.5,127.730003,129.960007,107.436119,9351600.0,0.0,0.0


Total_Energies


Unnamed: 0.1,Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,0,2019-01-02 00:00:00-05:00,51.810001373291016,,51.610001,52.790001,34.675293,1212500.0,0.0,0.0
1,1,2019-01-03 00:00:00-05:00,52.95000076293945,53.04999923706055,52.32,52.91,34.754124,1099600.0,0.0,0.0
2,2,2019-01-04 00:00:00-05:00,53.68000030517578,54.459999084472656,53.560001,54.459999,35.772228,,0.0,0.0
3,3,2019-01-07 00:00:00-05:00,53.77999877929688,54.54999923706055,53.549999,54.360001,35.706551,1222400.0,0.0,0.0
4,4,2019-01-08 00:00:00-05:00,54.27999877929688,54.43000030517578,54.009998,,35.594883,2322000.0,0.0,0.0


In [100]:
#we need to remove the Unnamed because it's the index considered as column in the table
for df in [apple_df, amazon_df, JP_Morgan_df, Johnson_df, Total_Energies_df]:
    df.drop(columns=[col for col in df.columns if "Unnamed" in col], inplace=True, errors="ignore")
    df.index.name = None   # remove index name if it exists
apple_df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
0,2019-01-02 00:00:00-05:00,38.72249984741211,39.712501525878906,38.557499,39.48,37.538818,148158800.0,0.0,
1,2019-01-03 00:00:00-05:00,35.994998931884766,36.43000030517578,35.5,35.547501,33.799679,365248800.0,0.0,
2,2019-01-04 00:00:00-05:00,36.13249969482422,37.13750076293945,35.950001,37.064999,35.242561,234428400.0,0.0,0.0
3,2019-01-07 00:00:00-05:00,37.17499923706055,37.20750045776367,36.474998,,35.164124,219111200.0,0.0,0.0
4,2019-01-08 00:00:00-05:00,,37.95500183105469,37.130001,37.6875,35.834446,164101200.0,0.0,0.0


In [101]:
#have a quick looak at the summary statistics
print("Apple")
display(apple_df.describe(include='all'))
print("Amazon")
display(amazon_df.describe(include='all'))
print("JP_Morgan")
display(JP_Morgan_df.describe(include='all'))
print("Johnson")
display(Johnson_df.describe(include='all'))
print("Total_Energies")
display(Total_Energies_df.describe(include='all'))

Apple


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,1740,1621.0,1715.0,1695.0,1654.0,1698.0,1703.0,1596.0,1625.0
unique,1735,1541.0,1643.0,,,,,,
top,2020-12-04 00:00:00-05:00,181.2700042724609,151.57000732421875,,,,,,
freq,2,3.0,3.0,,,,,,
mean,,,,146.606967,148.453876,146.866574,87877990.0,0.003805,0.002462
std,,,,58.633556,59.186657,59.670528,53867980.0,0.029172,0.099228
min,,,,35.5,35.547501,30.29178,-174048100.0,0.0,0.0
25%,,,,112.962498,115.017502,112.154127,53627600.0,0.0,0.0
50%,,,,149.360001,151.284996,149.002686,75864400.0,0.0,0.0
75%,,,,187.474998,189.697498,188.187782,107497000.0,0.0,0.0


Amazon


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,1740,1731.0,1690.0,1581.0,1577.0,1710.0,1692.0,1630.0,1637.0
unique,1735,1650.0,1636.0,,,,,,
top,2019-01-09 00:00:00-05:00,162.5,166.89999389648438,,,,,,
freq,2,3.0,3.0,,,,,,
mean,,,,145.554984,147.98216,147.543898,65291490.0,0.0,0.012217
std,,,,43.220491,43.851818,43.46106,36542820.0,0.0,0.494317
min,,,,73.046501,75.014,75.014,-311346000.0,0.0,0.0
25%,,,,101.599998,105.372002,104.424627,43670080.0,0.0,0.0
50%,,,,151.029999,153.786499,153.360001,58486450.0,0.0,0.0
75%,,,,174.259995,176.257507,176.228626,79621300.0,0.0,0.0


JP_Morgan


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,1745,1604.0,1737.0,1582.0,1636.0,1719.0,1629.0,1622.0,1624.0
unique,1735,1490.0,1624.0,,,,,,
top,2021-11-19 00:00:00-05:00,142.94000244140625,104.5500030517578,,,,,,
freq,2,4.0,3.0,,,,,,
mean,,,,157.278629,159.404626,148.846937,12145600.0,0.017201,0.0
std,,,,55.548462,56.773825,60.880834,6963025.0,0.134341,0.0
min,,,,76.910004,79.029999,67.407692,-43595700.0,0.0,0.0
25%,,,,115.3325,116.325001,105.232346,8327000.0,0.0,0.0
50%,,,,140.729996,142.560005,133.16333,10490700.0,0.0,0.0
75%,,,,179.537495,184.087505,177.872009,14208800.0,0.0,0.0


Johnson


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,1740,1662.0,1653.0,1642.0,1739.0,1685.0,1688.0,1615.0,1631.0
unique,1735,,1391.0,,,,,,
top,2019-10-04 00:00:00-04:00,,164.38999938964844,,,,,,
freq,2,,5.0,,,,,,
mean,,156.956276,,155.773149,157.144959,142.820882,7935034.0,0.015554,0.0
std,,14.361019,,14.439609,14.631667,18.687577,6955988.0,0.130249,0.0
min,,117.0,,109.160004,111.139999,23.964251,-14417500.0,0.0,0.0
25%,,147.404995,,146.0625,147.205002,128.345688,5689475.0,0.0,0.0
50%,,158.214996,,156.784996,158.320007,146.858017,6899850.0,0.0,0.0
75%,,166.0,,164.822502,166.184998,153.897949,8623650.0,0.0,0.0


Total_Energies


Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits
count,1735,1649.0,1598.0,1722.0,1638.0,1662.0,1563.0,1642.0,1729.0
unique,1735,1292.0,1303.0,,,,,,
top,2025-11-24 00:00:00-05:00,55.0,61.7599983215332,,,,,,
freq,1,4.0,5.0,,,,,,
mean,,,,54.213078,54.853316,45.334847,1933867.0,0.013096,0.0
std,,,,9.589139,10.482233,12.379737,1226913.0,0.101968,0.0
min,,,,22.129999,24.76,17.157011,-4659500.0,0.0,0.0
25%,,,,48.259998,48.73,35.487028,1220950.0,0.0,0.0
50%,,,,55.004999,55.545,42.105955,1662600.0,0.0,0.0
75%,,,,61.474999,61.937499,57.527821,2389650.0,0.0,0.0


In [None]:
#all columns types need to be floats except the data one need to be formated as data type
print("________Apple______")
display(apple_df.info())
print("_______Amazon_______")
display(amazon_df.info())
print("________JP_Morgan________")
display(JP_Morgan_df.info())
print("_________Johnson_________")
display(Johnson_df.info())
print("_________Total_Energies_________")
display(Total_Energies_df.info())

________Apple______
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1740 entries, 0 to 1739
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Date          1740 non-null   object 
 1   Open          1621 non-null   object 
 2   High          1715 non-null   object 
 3   Low           1695 non-null   float64
 4   Close         1654 non-null   float64
 5   Adj Close     1698 non-null   float64
 6   Volume        1703 non-null   float64
 7   Dividends     1596 non-null   float64
 8   Stock Splits  1625 non-null   float64
dtypes: float64(6), object(3)
memory usage: 122.5+ KB


None

_______Amazon_______
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1740 entries, 0 to 1739
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Date          1740 non-null   object 
 1   Open          1731 non-null   object 
 2   High          1690 non-null   object 
 3   Low           1581 non-null   float64
 4   Close         1577 non-null   float64
 5   Adj Close     1710 non-null   float64
 6   Volume        1692 non-null   float64
 7   Dividends     1630 non-null   float64
 8   Stock Splits  1637 non-null   float64
dtypes: float64(6), object(3)
memory usage: 122.5+ KB


None

________JP_Morgan________
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1745 entries, 0 to 1744
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Date          1745 non-null   object 
 1   Open          1604 non-null   object 
 2   High          1737 non-null   object 
 3   Low           1582 non-null   float64
 4   Close         1636 non-null   float64
 5   Adj Close     1719 non-null   float64
 6   Volume        1629 non-null   float64
 7   Dividends     1622 non-null   float64
 8   Stock Splits  1624 non-null   float64
dtypes: float64(6), object(3)
memory usage: 122.8+ KB


None

_________Johnson_________
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1740 entries, 0 to 1739
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Date          1740 non-null   object 
 1   Open          1662 non-null   float64
 2   High          1653 non-null   object 
 3   Low           1642 non-null   float64
 4   Close         1739 non-null   float64
 5   Adj Close     1685 non-null   float64
 6   Volume        1688 non-null   float64
 7   Dividends     1615 non-null   float64
 8   Stock Splits  1631 non-null   float64
dtypes: float64(7), object(2)
memory usage: 122.5+ KB


None

_________Total_Energies_________
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1735 entries, 0 to 1734
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Date          1735 non-null   object 
 1   Open          1649 non-null   object 
 2   High          1598 non-null   object 
 3   Low           1722 non-null   float64
 4   Close         1638 non-null   float64
 5   Adj Close     1662 non-null   float64
 6   Volume        1563 non-null   float64
 7   Dividends     1642 non-null   float64
 8   Stock Splits  1729 non-null   float64
dtypes: float64(6), object(3)
memory usage: 122.1+ KB


None

Since there are some negative values for volumes, nan values, big  differences between Adjusted close and Close price, not all columns types are floats etc..., we need to process to cleanig the data.

Since our dataset,we need to have for each unique day ( like a Date ID), one row, we need to see if there are duplicates :

In [102]:
#check if there are duplicates by seeing if there are two rows for the same day
print(apple_df['Date'].duplicated().sum())
print(amazon_df['Date'].duplicated().sum())
print(JP_Morgan_df['Date'].duplicated().sum())
print(Johnson_df['Date'].duplicated().sum())
print(Total_Energies_df['Date'].duplicated().sum())

5
5
10
5
0


> Since it's the same reasoning for the  5 datasets, we'll define functions that perform the same operations for the cleaning



In [103]:
datasets = {
    "Apple": apple_df,
    "Amazon": amazon_df,
    "JP Morgan": JP_Morgan_df,
    "Johnson&Johnson": Johnson_df,
    "Total Energies": Total_Energies_df,
}

In [104]:
# function that drops duplicates
#First Change data column type to data format
def change_to_date(df):
    df['Date'] = pd.to_datetime(df['Date'],errors="coerce",utc=True)

    print("Column Date converted to date format and sorted by date ")
    return df

#Convert other columns to float, if it's error, make it NaN  :
def convert_to_float(df):
    for col in df.columns:
        if col != 'Date':
            df[col] = pd.to_numeric(df[col], errors='coerce')

    return df


In [105]:
#drop duplicates by looking at the date that acts like key
def remove_duplicates(df,name):
    dup_count = df['Date'].duplicated().sum()
    if dup_count > 0:
        df = df.drop_duplicates()
        print(f"{name}: duplicates removed")
    return df

###2.2 Handling Missing Values :

In [106]:
#First of all, since we have negative values which can't be possible
#So we assume that it's just an input error and it's just postive value Instead

def handle_negative_values(df):
   numeric_cols = df.select_dtypes(include=["int64", "float64"]).columns
   for col in numeric_cols:
       df[col] = df[col].abs()
   return df



> Here, it's preferable to not drop any empty missing entry since we are going to compare the companies by periods the NaN values placements are random


*   Middle rows :
*   First rows :
*   Last rows :




In [107]:
import pandas as pd
import numpy as np

def fill_missing_prices_by_neighbors(df):

    price_cols = [c for c in ['Close', 'Adj Close'] if c in df.columns]

    for col in price_cols:
        n = len(df)

        #first row case
        if pd.isna(df[col].iloc[0]):
          #if the first cell if NaN it takes the next cell if it's not NAN, otherwise the next one
            if pd.notna(df[col].iloc[1]):
                df[col].iloc[0] = df[col].iloc[1]
            elif pd.notna(df[col].iloc[2]):
                df[col].iloc[0] = df[col].iloc[2]

        #starting from second row to before last row
        for i in range(1, n - 1):
          #takes the average between it pre
            if pd.isna(df[col].iloc[i]):
                prev_val = df[col].iloc[i - 1]
                next_val = df[col].iloc[i + 1]

                if pd.notna(prev_val) and pd.notna(next_val):
                    df[col].iloc[i] = (prev_val + next_val) / 2

                elif pd.notna(prev_val):
                    df[col].iloc[i] = prev_val #takes only the previous one if there the next row is empty

                elif pd.notna(next_val):
                    df[col].iloc[i] = next_val #takes only the new one if there the previous row is empty

        #last row case
        if pd.isna(df[col].iloc[n - 1]):
            if pd.notna(df[col].iloc[n - 2]):
                df[col].iloc[n - 1] = df[col].iloc[n - 2]
            elif pd.notna(df[col].iloc[n - 3]):
                df[col].iloc[n - 1] = df[col].iloc[n - 3]

    return df



In [108]:
#for the volume, we replace it by the median
def fill_missing_volume(df):
  if 'Volume' in df.columns:
    median_vol = df['Volume'].median()
    df['Volume'] = df['Volume'].fillna(median_vol)
  return df

###2.3 Transforming the features :

In [109]:
# Since Dividends and Stock Splits columns are always 0, drop them
def drop_dividends_and_stock_splits(df):
    df = df.drop(columns=['Dividends', 'Stock Splits'])
    return df

In [113]:
#see if there are outliers between close price and ajusted price (the difference should not be huge )
def difference_close_adjusted(df,threshhold=2):
  for i in range (len(df)):
    close_val = df['Close'].iloc[i]
    adj_val = df['Adj Close'].iloc[i]   # or 'Adj Close' if this is your column
    diff = abs(close_val - adj_val)
    if diff>threshhold:
      df.loc[i, adj_val] = close_val
    return df


In [111]:
#Add Year Month Day column

def add_year_month_day(df):
  df['Year'] = df['Date'].dt.year
  df['Month'] = df['Date'].dt.month
  df['Day'] = df['Date'].dt.day
  return df


###2.4 Apply these transformations to the dataset once  :

In [114]:
def full_clean_df(df, name):


    #2-remove duplicates
    df=change_to_date(df)
    df=convert_to_float(df)
    df = remove_duplicates(df, name)

    #2- Handle missing values
    df = handle_negative_values(df)
    df = fill_missing_prices_by_neighbors(df)
    df = fill_missing_volume(df)

    #3-Transform
    df = drop_dividends_and_stock_splits(df)
    df= difference_close_adjusted(df,threshhold=2) # Handling outliers
    df= add_year_month_day(df)
    return df

# Original code from the cell
cleaned_datasets = {}

for name, df in datasets.items():
    cleaned_datasets[name] = full_clean_df(df.copy(), name)


Column Date converted to date format and sorted by date 
Apple: duplicates removed
Column Date converted to date format and sorted by date 
Amazon: duplicates removed


You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  df[col].iloc[i] = (prev_val + next_val) / 2
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col].iloc[i] = (

Column Date converted to date format and sorted by date 
JP Morgan: duplicates removed
Column Date converted to date format and sorted by date 
Johnson&Johnson: duplicates removed


You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  df[col].iloc[i] = (prev_val + next_val) / 2
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col].iloc[i] = (

Column Date converted to date format and sorted by date 


You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:

df["col"][row_indexer] = value

Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

  df[col].iloc[i] = (prev_val + next_val) / 2
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[col].iloc[i] = (

## 3. Store :

## 4. Analyze :




Kaggle links to follow  :
https://www.kaggle.com/code/noeyislearning/apple-stock-prices-lstm-lr

For indicators , this notebook is top : https://www.kaggle.com/code/lusfernandotorres/data-science-for-financial-markets


###4.1 : Financial Indicators :

> FINANCIAL RETURNS

In [115]:
### on peut faire un markdaws before the code to write each formula

# FINANCIAL RETURNS
#Compute daily returns
#compute log returns


> RISK MEASUREMENT

In [None]:
#Compute volatility
#Annualized volatility
#Value at Risk (VaR) – Downside Risk
#Maximum Drawdown (Crash Risk)


> Comparative Risk Analysis Between Sectors

###4.2 : Diversification

> Correlation Matrix


> Portfolio Variance Formula et Equal-Weight Diversified Portfolio (
   pour comparer Individual stock volatility et portfolio volatility )

> sector level correlation ( genre entre tech et consumer, entre finance et energy .... )

> COVID Crisis Period Analysis (JSP LE TRUC DU COVID ON LE MET OU EXACTEMET mdr )

###4.3 Visualizations : ( la ya dans le dans le 1er link du kaggle )

## 5. Communicate :