# Mini Project 2

**2025 Introduction to Quantiative Methods in Finance**

**The Erdös Institute**


###  Hypothesis Testing of Standard Assumptions Theoretical Financial Mathematics

In the theory of mathematical finance, it is common to assume the log returns of a stock/index are normally distributed.


Investigate if the log returns of stocks or indexes of your choosing are normally distributed. Some suggestions for exploration include:

    1) Test if there are period of times when the log-returns of a stock/index have evidence of normal distribution.
    
    2) Test if removing extremal return data creates a distribution with evidence of being normal.
    
    3) Create a personalized portfolio of stocks with historical log return data that is normally distributed.
    
    4) Test if the portfolio you created in the first mini-project has significant periods of time with evidence of normally distributed log returns.
    
    5) Gather x-number of historical stock data and just perform a normality test on their log return data to see if any of the stocks exhibit evidence of log returns that are normally distributed.

In [2]:
#Import Packages
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from scipy.optimize import minimize
import datetime as dt
import scipy.stats as stats
sns.set_style('darkgrid')
from scipy.stats import normaltest

import yfinance as yf

## Testing portfolios in the first mini-project for normality for significant periods of time.

In first mini projects, we built 2 portfolios (of higher and lower risk) made up of stocks over 5 years:

- **AAPL**: Apple Inc.  
- **GM**: General Motors Company  
- **NFLX**: Netflix, Inc.  
- **F**: Ford Motor Company  
- **ADBE**: Adobe Inc.  
- **TSLA**: Tesla, Inc.

We first find the portfolio return time series and then test for normality of log daily return distribution for time intervals of one year over 5 years. 

To assess whether the log returns are normally distributed, we perform D'Agostino and Pearson’s test for normality. The resulting p-value provides statistical evidence to evaluate the null hypothesis of normality. Specifically, if the p-value is less than 0.05, we reject the null hypothesis, indicating the log returns do not follow a normal distribution.


### Testing for lower risk portfolio

In [158]:
#Uploading the portfolio data

tickers = ["AAPL", "GM", "NFLX", "F", "ADBE", "TSLA"]

start_date = dt.datetime.today()-dt.timedelta(days = 5*365)
end_date = dt.datetime.today()

stock = yf.download(tickers, start = start_date, end =end_date)

#Find daily log returns for stocks
daily_log_returns = np.log(stock['Close']/stock['Close'].shift(1))
daily_log_returns = daily_log_returns.dropna()

covariance_matrix = 252*((daily_log_returns).cov())

# Number of assets i.e. stocks
n_assets = len(tickers)

# Define an initial guess for asset weights (we take equal weights)
initial_weights = np.array([1/n_assets] * n_assets)

# Define weight constraints
constraints = ({'type': 'eq', 'fun': lambda weights: np.sum(weights)-1}, #Sum of weights equals 1 
               {'type': 'ineq', 'fun': lambda weights: min(weights)-.05}, #Allocate at least 5% of capital into each index in stock_symbols
              {'type': 'ineq', 'fun': lambda weights: .30-max(weights)}) #Do not allocate more than 30% of capital into each index in stock_symbol

# Objective function to minimize portfolio variance i.e. yearly volatility
def portfolio_volatility(weights):
    portfolio_std_dev = np.sqrt(np.dot(weights.T, np.dot(covariance_matrix, weights)))
    return portfolio_std_dev

# Run the optimization to find the optimal weights with constraints
result = minimize(portfolio_volatility, initial_weights, constraints=constraints)

# Optimal asset weights
optimal_weights = result.x

  stock = yf.download(tickers, start = start_date, end =end_date)
[*********************100%***********************]  6 of 6 completed


In [159]:
# Extract 'Close' prices
close = stock['Close']

# Calculate daily returns (simple returns) for each stock
daily_returns = close/close.iloc[0]
daily_returns = daily_returns.dropna()

# Calculate daily weighted portfolio returns
daily_weighted_returns = daily_returns.dot(optimal_weights)

# Calculate daily log returns of portfolio
daily_log_weighted_returns = np.log(daily_weighted_returns)



In [160]:
# Perform normality test for each year over 5 years

# Initialize list to store p-values and corresponding date ranges
p_value_array = []

for j in range(5):
    # Define time window: from (j+1) years ago to j years ago
    end_date = dt.datetime.today() - dt.timedelta(days=j*365)
    start_date = dt.datetime.today() - dt.timedelta(days=(j+1)*365)
    
    # Filter log returns data for this yearly period
    filtered_log_return_data = daily_log_weighted_returns.loc[start_date.strftime('%Y-%m-%d'):end_date.strftime('%Y-%m-%d')]
    
    # Perform normality test on filtered yearly log return data
    p_value = stats.normaltest(filtered_log_return_data)[1]
    
    # Store (start_date to end_date) and associated p-value in list
    p_value_array.append([start_date.strftime('%Y-%m-%d') + " to " + end_date.strftime('%Y-%m-%d'), p_value])

# Iterate over stored p-values and print interpretation
for period, p_val in p_value_array:
    if p_val < 0.05:
        print(f"No statistical evidence for normal distribution from {period}")
    else:
        print(f"Statistical evidence for normal distribution from {period}")
    print('--' * 40)
 

Statistical evidence for normal distribution from 2024-06-28 to 2025-06-28
--------------------------------------------------------------------------------
Statistical evidence for normal distribution from 2023-06-29 to 2024-06-28
--------------------------------------------------------------------------------
Statistical evidence for normal distribution from 2022-06-29 to 2023-06-29
--------------------------------------------------------------------------------
No statistical evidence for normal distribution from 2021-06-29 to 2022-06-29
--------------------------------------------------------------------------------
No statistical evidence for normal distribution from 2020-06-29 to 2021-06-29
--------------------------------------------------------------------------------


### Testing for higher risk portfolio

In [161]:
#Uploading the portfolio data

tickers = ["AAPL", "GM", "NFLX", "F", "ADBE", "TSLA"]

start_date = dt.datetime.today()-dt.timedelta(days = 5*365)
end_date = dt.datetime.today()

stock = yf.download(tickers, start = start_date, end =end_date)

n_assets = len(tickers)


# Portfolio with eqaul weights for each stock
equal_weights = np.array([1/n_assets] * n_assets)




  stock = yf.download(tickers, start = start_date, end =end_date)
[*********************100%***********************]  6 of 6 completed


In [162]:
# Extract 'Close' prices
close = stock['Close']

# Calculate daily returns (simple returns) for each stock
daily_returns = close/close.iloc[0]
daily_returns = daily_returns.dropna()

# Calculate daily weighted portfolio returns
daily_weighted_returns = daily_returns.dot(equal_weights)

# Calculate daily log returns of portfolio
daily_log_weighted_returns = np.log(daily_weighted_returns)


In [163]:
# Perform normality test for each year over 5 years

# Initialize list to store p-values and corresponding date ranges
p_value_array = []

for j in range(5):
    # Define time window: from (j+1) years ago to j years ago
    end_date = dt.datetime.today() - dt.timedelta(days=j*365)
    start_date = dt.datetime.today() - dt.timedelta(days=(j+1)*365)
    
    # Filter log returns data for this yearly period
    filtered_log_return_data = daily_log_weighted_returns.loc[start_date.strftime('%Y-%m-%d'):end_date.strftime('%Y-%m-%d')]
    
    # Perform normality test on filtered yearly log return data
    p_value = stats.normaltest(filtered_log_return_data)[1]
    
    # Store (start_date to end_date) and associated p-value in list
    p_value_array.append([start_date.strftime('%Y-%m-%d') + " to " + end_date.strftime('%Y-%m-%d'), p_value])

# Iterate over stored p-values and print interpretation
for period, p_val in p_value_array:
    if p_val < 0.05:
        print(f"No statistical evidence for normal distribution from {period}")
    else:
        print(f"Statistical evidence for normal distribution from {period}")
    print('--' * 40)
 


No statistical evidence for normal distribution from 2024-06-28 to 2025-06-28
--------------------------------------------------------------------------------
Statistical evidence for normal distribution from 2023-06-29 to 2024-06-28
--------------------------------------------------------------------------------
Statistical evidence for normal distribution from 2022-06-29 to 2023-06-29
--------------------------------------------------------------------------------
Statistical evidence for normal distribution from 2021-06-29 to 2022-06-29
--------------------------------------------------------------------------------
No statistical evidence for normal distribution from 2020-06-29 to 2021-06-29
--------------------------------------------------------------------------------



## Normality Test on Log Returns of Selected Stocks

We analyze the log returns of the following stocks over the past five years of trading days:

- Amazon.com, Inc. (AMZN)  
- Microsoft Corporation (MSFT)  
- Alphabet Inc. (GOOG)  
- The Home Depot, Inc. (HD)  
- General Motors Company (GM)  
- Apple Inc. (AAPL)  
- Intel Corporation (INTC)  
- Adobe Inc. (ADBE)

To assess whether the log returns are normally distributed, we perform D'Agostino and Pearson’s test for normality. The resulting p-value provides statistical evidence to evaluate the null hypothesis of normality. Specifically, if the p-value is less than 0.05, we reject the null hypothesis, indicating the log returns do not follow a normal distribution.



In [3]:
# Upload stock x-number of stock data for 2 years
tickers= ['AMZN', 'MSFT', 'GOOG', 'HD', 'GM','AAPL', 'INTC', 'ADBE']


start_date = dt.datetime.today()-dt.timedelta(days = 5*365)
end_date = dt.datetime.today()

stock = yf.download(tickers, start = start_date, end =end_date)

  stock = yf.download(tickers, start = start_date, end =end_date)
[*********************100%***********************]  8 of 8 completed


In [4]:
#Collect p-values of normality tests (D'Agostino and Pearson’s test)

# Initialize list to store p-values
p_value_array = []

for i in tickers:
    # Calculate daily returns and log returns for ticker i
    stock_returns = stock['Close'] / stock['Close'].shift(1)
    stock_log_returns = np.log(stock_returns.dropna())[i].values


    # Perform normality test and extract p-value
    p_value = stats.normaltest(stock_log_returns)[1]
    
    # Append ticker and p-value to list
    p_value_array.append([i, p_value])

# Print p-values for all stocks
print(p_value_array)

[['AMZN', 4.0703026631235075e-30], ['MSFT', 1.5612340539801503e-18], ['GOOG', 1.533550146188051e-22], ['HD', 3.672475568695469e-28], ['GM', 4.1746435261323195e-10], ['AAPL', 5.827347067355156e-34], ['INTC', 1.7886716279390967e-100], ['ADBE', 1.0938045110970739e-88]]


In [5]:
#Print evidence/non-evidence of normality

for i in range(len(tickers)):
    print(f"Test for normality for stock, {tickers[i]}")
    p_value = p_value_array[i]
    if p_value[1]< 0.05:
        print("→ Statistically significant evidence that the data is NOT normally distributed.")
    else:    
       print("→ No statistically significant evidence against normality.")
    print('--'*40) 
    print('--'*40) 

Test for normality for stock, AMZN
→ Statistically significant evidence that the data is NOT normally distributed.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Test for normality for stock, MSFT
→ Statistically significant evidence that the data is NOT normally distributed.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Test for normality for stock, GOOG
→ Statistically significant evidence that the data is NOT normally distributed.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Test for normality for stock, HD
→ Statistically significant evidence that the data is NOT normally distributed.
--------------------------------------------------------

Thus, we see that the given stocks does not have evidence for normality of daily log-returns

## Finding period of times where we have evidence for normal distribution

Here, the 5 year stock data from previous sections is taken and then we perform D'Agostino and Pearson’s test for Normality of daily log returns on time periods of one year. Thus, for 5 years of data, we have 5 time intervals. For this test we compute p-value as statistical evidence. For p<0.05, the null hypothesis of normality is rejected.

In [6]:

tickers =  ['AMZN', 'MSFT', 'GOOG', 'HD', 'GM','AAPL', 'INTC', 'ADBE']
start_date = dt.datetime.today() - dt.timedelta(days=5*365)
end_date = dt.datetime.today()

# Download data for all stocks
stock = yf.download(tickers, start=start_date, end=end_date)

# Extract 'Close' prices
close = stock['Close']

# Calculate daily simple returns for each stock
daily_returns = close/close.iloc[0]
daily_returns = daily_returns.dropna()

# Calculate daily log returns for each stock
daily_log_returns = np.log(daily_returns)



  stock = yf.download(tickers, start=start_date, end=end_date)
[*********************100%***********************]  8 of 8 completed


In [7]:
#printing results for test of normality after obtaining the p-values of different stocks

# Initialize results list
p_value_array = []

# Perform normality test on each stock's yearly log returns
for ticker in tickers:
    for j in range(5):
        # Define date range for year j
        end_period = dt.datetime.today() - dt.timedelta(days=j*365)
        start_period = dt.datetime.today() - dt.timedelta(days=(j+1)*365)
        
        # Filter log returns for the date range
        filtered_log_returns = daily_log_returns[ticker].loc[start_period.strftime('%Y-%m-%d'):end_period.strftime('%Y-%m-%d')]
        
        # Perform normality test if there is sufficient data
        if len(filtered_log_returns) > 0:
            p_value = stats.normaltest(filtered_log_returns)[1]
            p_value_array.append([ticker, start_period.strftime('%Y-%m-%d'), end_period.strftime('%Y-%m-%d'), p_value])
        else:
            p_value_array.append([ticker, start_period.strftime('%Y-%m-%d'), end_period.strftime('%Y-%m-%d'), None])

for i in tickers:
    print(f"Test for normality for stock, {i}")
    print('--'*40)
    # Filter p_value_array entries for the current ticker i
    ticker_entries = [entry for entry in p_value_array if entry[0] == i]
    
    for entry in ticker_entries:
        p_val = entry[3]
        start_date = entry[1]
        end_date = entry[2]

        if p_val < 0.05:
            print(f"No statistically significant evidence for normal distribution for time period: {start_date} to {end_date}")
        else:
            print(f"Statistically significant evidence for normal distribution for time period: {start_date} to {end_date}")
    print('--'*40)
    print('--'*40)     

Test for normality for stock, AMZN
--------------------------------------------------------------------------------
No statistically significant evidence for normal distribution for time period: 2024-06-28 to 2025-06-28
No statistically significant evidence for normal distribution for time period: 2023-06-29 to 2024-06-28
No statistically significant evidence for normal distribution for time period: 2022-06-29 to 2023-06-29
No statistically significant evidence for normal distribution for time period: 2021-06-29 to 2022-06-29
No statistically significant evidence for normal distribution for time period: 2020-06-29 to 2021-06-29
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Test for normality for stock, MSFT
--------------------------------------------------------------------------------
Statistically significant evidence for normal distribution for time period: 2024-06-28

Thus, we observe that, for the given stocks, while their log-daily returns over the past five years do not exhibit evidence of normality, there is evidence supporting normality over certain one-year intervals.

## Testing for Normality after removing extremal data

For removing extremal data of previously given stocks, we exclude log-daily returns data points outside the quantile range of 2 and 98 percentile. After trimming, we then test for normality.

In [45]:
# Define function for trimming extremal data in quantile range of 5th and 95th percentile
def trim_extremes(data, lower_quantile=0.02, upper_quantile=0.98):
    low = np.quantile(data, lower_quantile)
    high = np.quantile(data, upper_quantile)
    trimmed_data = data[(data >= low) & (data <= high)]
    return trimmed_data

In [28]:
#downloading stock data

# Upload stock x-number of stock data for 2 years
tickers= ['AMZN', 'MSFT', 'GOOG', 'HD', 'GM','AAPL', 'INTC', 'ADBE']


start_date = dt.datetime.today()-dt.timedelta(days = 5*365)
end_date = dt.datetime.today()

stock = yf.download(tickers, start = start_date, end =end_date)

  stock = yf.download(tickers, start = start_date, end =end_date)
[*********************100%***********************]  8 of 8 completed


In [46]:
#Trime data and collect p-values of normality tests (D'Agostino and Pearson’s test)

# Initialize list to store p-values
p_value_array = []

for i in tickers:
    # Calculate daily returns and log returns for ticker i
    stock_returns = stock['Close'] / stock['Close'].shift(1)
    stock_log_returns = np.log(stock_returns.dropna())[i].values

    #trim data
    trimmed_returns=trim_extremes(stock_log_returns)


    # Perform normality test and extract p-value
    p_value = stats.normaltest(trimmed_returns)[1]
    
    # Append ticker and p-value to list
    p_value_array.append([i, p_value])

# Print p-values for all stocks
print(p_value_array)

[['AMZN', 0.3557547963830886], ['MSFT', 0.024520457347429962], ['GOOG', 0.05076975301184344], ['HD', 0.00715072276890955], ['GM', 0.5551199021276989], ['AAPL', 0.028486611303305886], ['INTC', 0.052580037722450214], ['ADBE', 0.0014048113821803732]]


In [47]:
#Print evidence/non-evidence of normality

for i in range(len(tickers)):
    print(f"Test for normality for stock, {tickers[i]}")
    p_value = p_value_array[i]
    if p_value[1]< 0.05:
        print("→ Statistically significant evidence that the data is NOT normally distributed.")
    else:    
       print("→ No statistically significant evidence against normality.")
    print('--'*40) 
    print('--'*40) 

Test for normality for stock, AMZN
→ No statistically significant evidence against normality.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Test for normality for stock, MSFT
→ Statistically significant evidence that the data is NOT normally distributed.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Test for normality for stock, GOOG
→ No statistically significant evidence against normality.
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Test for normality for stock, HD
→ Statistically significant evidence that the data is NOT normally distributed.
--------------------------------------------------------------------------------
-----------------

Thus, after removing the extremal data, we see evidence of normality of log-daily returns for some of the given stocks.