# EDA ON SP500 STOCKS

## Import Libraries and Load Data from Pickle

In [5]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from helpermodules import correlation_study
import pickle
from helpermodules.correlation_study import CorrelationAnalysis

In [6]:
# Load the cleaned S&P 500 price DataFrame from the pickle file
with open('pickle_files/cleaned_sp500_df.pkl', 'rb') as f:
    df = pickle.load(f)

The DataFrame `df` contains the cleaned closing prices of the stocks that make up the S&P 500 index, from 2014 to 2024.


## Descriptive Statistics

In [7]:
# Extract tickers
tickers = df.columns.tolist()

# Compute percentage changes and drop NaNs
df_returns = df.pct_change().dropna(how='any')

# Calculate mean, median, standard deviation, skewness, kurtosis for each stock
descriptive_stats = df_returns.describe().transpose()
descriptive_stats['skewness'] = df_returns.skew()
descriptive_stats['kurtosis'] = df_returns.kurtosis()

# Summarize the distribution of the per-stock mean daily returns across all stocks
avg_stats_mean = descriptive_stats['mean'].describe()

print(avg_stats_mean)

count    469.000000
mean       0.000646
std        0.000394
min       -0.000789
25%        0.000422
50%        0.000586
75%        0.000837
max        0.003719
Name: mean, dtype: float64


1. **Mean:** The cross-sectional mean of the per-stock average daily returns is approximately 0.000646 (or 0.0646%), indicating a small positive daily return on average across the 469 stocks.

2. **Standard Deviation (std):** The standard deviation of the average daily returns is approximately 0.000394 (or 0.0394%). This is small in absolute terms, although it is roughly 60% of the mean (0.000394 / 0.000646), so the average return still differs noticeably from stock to stock.

3. **Minimum and Maximum:** The minimum mean daily return is -0.000789 (-0.0789%), while the maximum is 0.003719 (0.3719%). These extremes suggest that while most stocks exhibit small positive returns, there are outliers with both small negative and relatively high positive mean daily returns.

4. **Percentiles (25%, 50%, 75%):** 
   - **25th Percentile (0.000422):** 25% of the stocks have mean daily returns below 0.0422%.
   - **50th Percentile (Median, 0.000586):** The median value represents the midpoint of the mean returns distribution, with half the stocks having returns above this value and half below.
   - **75th Percentile (0.000837):** 75% of the stocks have mean daily returns below 0.0837%, indicating that most stocks cluster around small positive returns.
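To give the daily figure in item 1 a more familiar scale, the snippet below annualizes it. This is only a rough sketch: the 252 trading days per year is a common convention assumed here, not something stated in the notebook, and compounding an arithmetic mean ignores volatility, so the result should be read as an order of magnitude rather than a realized return.

```python
# Cross-sectional mean of the per-stock mean daily returns (from the printed output)
mean_daily_return = 0.000646

# Assumption: about 252 trading days per year (convention, not taken from the source)
TRADING_DAYS_PER_YEAR = 252

# Simple (non-compounded) annualization: daily mean times number of trading days
annual_simple = mean_daily_return * TRADING_DAYS_PER_YEAR

# Compounded annualization: grow by the daily mean on every trading day
annual_compounded = (1 + mean_daily_return) ** TRADING_DAYS_PER_YEAR - 1

print(f"Simple annualized return:     {annual_simple:.2%}")
print(f"Compounded annualized return: {annual_compounded:.2%}")
```

Both figures land in the mid-to-high teens in percent, which is why a daily mean that looks negligible is still economically meaningful.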

The distribution of the mean daily percentage returns indicates that the S&P 500 stocks generally show small positive returns, with limited variability. The presence of some outliers, as seen in the minimum and maximum values, suggests a need to consider the effects of extreme performers when analyzing the dataset.
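One way to make "extreme performers" concrete is Tukey's interquartile-range (IQR) rule. The sketch below applies it to the quartiles printed above; the 1.5 multiplier is the conventional default and an assumption here, and the helper name `flag_iqr_outliers` is hypothetical, not part of the notebook or of `helpermodules`.

```python
import pandas as pd

# Quartiles and extremes of the per-stock mean daily returns (from the printed output)
q1, q3 = 0.000422, 0.000837
observed_min, observed_max = -0.000789, 0.003719

# Tukey fences: points beyond 1.5 * IQR from the quartiles count as outliers
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"Fences: [{lower_fence:.6f}, {upper_fence:.6f}]")
print(f"Minimum is an outlier: {observed_min < lower_fence}")
print(f"Maximum is an outlier: {observed_max > upper_fence}")


def flag_iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside the Tukey fences (hypothetical helper)."""
    first, third = values.quantile(0.25), values.quantile(0.75)
    spread = third - first
    return (values < first - k * spread) | (values > third + k * spread)
```

With the notebook's data, the same rule could be applied to every stock via `flag_iqr_outliers(descriptive_stats['mean'])`, which returns a mask indexed by ticker.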

In [8]:
# Count number of stocks with positive skewness
positive_skewness = descriptive_stats['skewness'] > 0
print(f'Number of stocks with positive skewness: {positive_skewness.sum()}')

# Count and print the number of stocks with negative kurtosis
negative_kurtosis = descriptive_stats['kurtosis'] < 0
print(f'Number of stocks with negative kurtosis: {negative_kurtosis.sum()}')

Number of stocks with positive skewness: 205
Number of stocks with negative kurtosis: 0


Of the 469 stocks in the dataset, 205 exhibit positive skewness in their daily returns. Positive skewness indicates that the right tail of the distribution is longer or fatter than the left, meaning that large positive deviations from the mean are more pronounced than large negative ones. This asymmetry may appeal to investors seeking occasional outsized gains, although skewness alone says nothing about how often such gains occur. The remaining stocks have a more pronounced left tail.

None of the stocks have negative kurtosis. Note that pandas' `kurtosis()` returns excess kurtosis (Fisher's definition), for which a normal distribution scores 0. A positive value therefore indicates heavier tails than a normal distribution (leptokurtic), which is typical for financial return data. This implies that extreme events (both positive and negative) are more likely than a normal distribution would predict.
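The following sketch illustrates how `skew()` and `kurtosis()` behave on distributions with known shapes. All data here is synthetic and generated purely for illustration (the sample size, seed, and distribution parameters are arbitrary assumptions); none of it comes from the S&P 500 dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100_000  # large sample so the estimates are stable

samples = pd.DataFrame({
    # Symmetric, normal tails: skewness and excess kurtosis both near 0
    "normal": rng.normal(loc=0.0, scale=1.0, size=n),
    # Symmetric, heavy tails: excess kurtosis clearly above 0
    "student_t": rng.standard_t(df=5, size=n),
    # Long right tail: skewness clearly above 0
    "lognormal": rng.lognormal(mean=0.0, sigma=0.5, size=n),
})

shape_stats = pd.DataFrame({
    "skewness": samples.skew(),
    # pandas reports excess kurtosis (normal distribution == 0)
    "excess_kurtosis": samples.kurtosis(),
})
print(shape_stats.round(2))
```

The normal column should score close to zero on both measures, while the Student-t column shows the kind of positive excess kurtosis reported for every stock above, and the lognormal column shows positive skewness.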