#### Is the Monday Effect an Urban Myth?

Author: Sungbin Youk\
Date: May 22nd, 2021
----

## Predicting the Stock Market

It would be great to be able to predict changes in the stock market. It would make you rich. Isn't that everyone's dream? Unfortunately, the [efficient market hypothesis](https://www.investopedia.com/terms/e/efficientmarkethypothesis.asp) postulates that share prices already reflect all available information, which makes it impossible to consistently predict price movements and outperform the market.

A predictable *pattern* in the stock market would contradict the efficient market hypothesis. In 1973, [Frank Cross](https://www.jstor.org/stable/pdf/4529641.pdf?refreqid=excelsior%3Adeff8e6e9e2c4c0b275b4b03a21b9c13) documented non-random movement in stock prices. Here are the main findings from his examination of the Standard & Poor's Composite Stock Index from 1953 to 1970:
- The index rose on Friday more often than on any other day of the week, and rose least often on Monday.
- When the Friday index declined, the Monday index was more likely to decline as well. When the Friday index advanced, the Monday index was likely to remain static (neither advancing nor declining).
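
To make the two findings concrete, here is a minimal sketch of how each could be measured. It runs on a synthetic random-walk index rather than real S&P data, so the numbers themselves carry no meaning. The variable names (`index_level`, `share_up`, `conditional`) are ours and do not come from Cross's paper.

```python
import numpy as np
import pandas as pd

# Synthetic index level on business days (illustrative only, not market data)
rng = np.random.default_rng(1)
dates = pd.bdate_range("2014-01-01", "2017-12-31")
index_level = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))), index=dates)

# True on days when the index closed higher than on the previous day
advanced = index_level.diff().dropna() > 0

# Finding 1: share of sessions on which the index rose, per weekday
share_up = advanced.groupby(advanced.index.day_name()).mean()

# Finding 2: Monday's direction conditional on the preceding Friday's direction
fridays = advanced[advanced.index.dayofweek == 4]
mondays = advanced[advanced.index.dayofweek == 0]
pairs = pd.DataFrame({
    # Pair each Monday with the Friday three calendar days earlier
    "friday_up": fridays.reindex(mondays.index - pd.Timedelta(days=3)).to_numpy(),
    "monday_up": mondays.to_numpy(),
}).dropna()
conditional = pd.crosstab(pairs["friday_up"], pairs["monday_up"], normalize="index")

print(share_up)
print(conditional)
```

With real index data in place of `index_level`, `share_up` would correspond to the first finding and the rows of `conditional` to the second.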

## What is the Monday Effect?

Over the years, Frank Cross's findings came to be known as the **Monday Effect**. The term has two different definitions, each corresponding to one of the two findings mentioned above.

- The Monday effect states that returns on Monday are lower than on the other days of the week, and are often negative on average ([Pettengill, 2003](https://www.jstor.org/stable/pdf/23292837.pdf?refreqid=excelsior%3A6da162ff7d91746d901fc154171e6015)).
- The Monday effect states that stock market returns on Monday, especially in the first few hours, follow the pattern of the previous Friday, especially its last few hours ([Investopedia](https://www.investopedia.com/terms/m/mondayeffect.asp)).

You may wonder what causes this anomaly in stock prices. Because the existence of the Monday effect is controversial (hence our project), there isn't a clear answer. Some argue that stock returns are low on Monday because companies may hold on to bad news until the last trading day of the week (Friday), which in turn makes the next trading day (Monday) take the hit.

## Our Objectives

The objective of our project is twofold:
1) [Arman and Lestari](https://www.atlantis-press.com/proceedings/icame-18/125917114) examined the Monday effect (the first definition) on the Indonesian Stock Exchange. We will first examine whether we can replicate their results in the U.S. stock market.


## Summary of Arman and Lestari's Study

Arman and Lestari tested for the Monday effect in the banking sector of the Indonesian stock market from 2014 to 2017. A one-sample t-test was conducted for each weekday. The results indicated that the average stock return on Monday was -0.0006, which was not statistically significant.

## Tackling Objective 1

In our analysis, we examine the stock returns of S&P 500 companies from 2014 to 2017. The stock return data are obtained with the yfinance package in Python.

### Importing Libraries and Packages

In [1]:
import yfinance as yf
import numpy as np
import pandas as pd
import requests
import datetime
from datetime import date
import calendar
import io
from scipy import stats

### Importing the list of tickers for the S&P 500 between 2014 and 2017

The first step is to retrieve the companies that constituted the S&P 500 in the past.

In [2]:
# Downloading the csv file from a Github page which has a list of companies and when they were added or removed from S&P 500
url = "https://raw.githubusercontent.com/leosmigel/analyzingalpha/master/sp500-historical-components-and-changes/sp500_history.csv"
download = requests.get(url).content

# Reading the downloaded content and turning it into a pandas dataframe
df = pd.read_csv(io.StringIO(download.decode('utf-8')))

#Turning the date column into a datetime object
df["date"] = pd.to_datetime(df["date"])

# Printing out the first 5 rows of the dataframe
df.head()

Unnamed: 0.1,Unnamed: 0,cik,date,name,value,variable
0,183,72741.0,1957-01-01,Eversource Energy,ES,added_ticker
1,228,874766.0,1957-01-01,Hartford Financial Svc.Gp.,HIG,added_ticker
2,435,1113169.0,1957-01-01,T. Rowe Price Group,TROW,added_ticker
3,349,1111711.0,1957-01-01,NiSource Inc.,NI,added_ticker
4,185,1109357.0,1957-01-01,Exelon Corp.,EXC,added_ticker


In [3]:
# Function to retrieve the tickers that were in the S&P 500 as of a given end date
# (assumes the rows of df are sorted by date, since the loop stops at the first later row)
def past_SP_ticker(end_date):
    ticker_list = []
    for index, row in df.iterrows():
        if row['date'] > end_date:
            break
        if row['variable'] == "added_ticker":
            ticker_list.append(row['value'])
        elif row['value'] in ticker_list:
            # Any other event is treated as a removal
            ticker_list.remove(row['value'])
    return ticker_list
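
To see the replay logic in isolation, here is a minimal sketch on a made-up change log. The function name `tickers_as_of`, the toy tickers, and the `removed_ticker` label are our own assumptions for illustration. Like the function above, it treats any row that is not `added_ticker` as a removal, but it sorts the events itself and takes the dataframe as an argument.

```python
import pandas as pd

def tickers_as_of(changes: pd.DataFrame, end_date) -> list:
    """Replay add/remove events up to end_date (inclusive) and return the surviving tickers."""
    end_date = pd.Timestamp(end_date)
    # Sort explicitly so the replay does not depend on the row order of the file
    events = changes[changes["date"] <= end_date].sort_values("date", kind="stable")
    members = []
    for ticker, kind in zip(events["value"], events["variable"]):
        if kind == "added_ticker":
            if ticker not in members:
                members.append(ticker)
        elif ticker in members:
            members.remove(ticker)
    return members

# Toy change log (made-up events, not real index history)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2005-06-01", "2010-03-15", "2018-02-01"]),
    "value": ["AAA", "BBB", "AAA", "CCC"],
    "variable": ["added_ticker", "added_ticker", "removed_ticker", "added_ticker"],
})

print(tickers_as_of(toy, "2017-12-31"))
```

In the toy log, `AAA` is added and later removed, and `CCC` is added only after the cut-off date, so `BBB` is the only member at the end of 2017.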

In [4]:
# Using the past_SP_ticker() function to retrieve the tickers of S&P 500 for 2017. 
end_date = '20171231'
date_time_obj = datetime.datetime.strptime(end_date,'%Y%m%d')
SP_ticker_2017 = past_SP_ticker(date_time_obj)

### Creating a dataframe of stock returns for the identified S&P 500 constituents of 2017
The next step is to obtain the daily stock returns of the selected companies. This requires several steps: obtaining the stock data of the 2017 S&P 500 constituents, deleting the missing values, calculating the log returns, and creating a multilevel index (i.e., a hierarchical index) with the days of the week.

#### Obtaining the stock data of S&P 500 constituents of 2017

In [5]:
# Using the ticker to obtain stock prices from yfinance
rawdata = yf.download(SP_ticker_2017, start="2013-12-31", end="2017-12-31")
rawdata.head()

[*********************100%***********************]  488 of 488 completed

33 Failed downloads:
- STI: No data found, symbol may be delisted
- Q: No data found for this date range, symbol may be delisted
- VIAB: No data found, symbol may be delisted
- JEC: No data found, symbol may be delisted
- RTN: No data found, symbol may be delisted
- NBL: No data found, symbol may be delisted
- UA-C: No data found, symbol may be delisted
- KFT: No data found for this date range, symbol may be delisted
- LUK: No data found for this date range, symbol may be delisted
- CXO: No data found, symbol may be delisted
- WCG: No data found, symbol may be delisted
- BRK.B: No data found, symbol may be delisted
- ETFC: No data found, symbol may be delisted
- TIF: No data found, symbol may be delisted
- SYMC: No data found, symbol may be delisted
- DLPH: No data found, symbol may be delisted
- MYL: No data found, symbol may be delisted
- GGP: No data found for this date range, symbol may be delisted
- FOXA: Da

,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,Adj Close,...,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume,Volume
,A,AAL,AAP,AAPL,ABBV,ABC,ABT,ACN,ADBE,ADI,...,XEL,XLNX,XOM,XRAY,XRX,XYL,YUM,ZBH,ZION,ZTS
Date,,,,,,,,,,,...,,,,,,,,,,
2013-12-30,,,,,,,,,,,...,,,,,,,,,,
2013-12-31,38.247643,23.80422,108.380943,17.819059,38.220161,62.277599,33.028294,70.729378,59.880001,42.959011,...,1752800.0,1215400.0,8509600.0,434400.0,2033475.0,558000.0,2966800.0,650000.0,1077400.0,2270400.0
2014-01-02,37.592236,23.907927,107.460464,17.568451,37.619469,61.905617,32.942123,69.79171,59.290001,41.567257,...,3192300.0,3436800.0,11028100.0,1025400.0,3977691.0,765100.0,2721200.0,868800.0,1356700.0,2576100.0
2014-01-03,38.067085,25.020357,110.535233,17.18255,37.851063,61.949898,33.295418,70.023994,59.16,41.845615,...,2939400.0,1982700.0,9295600.0,623300.0,2763747.0,454500.0,2026800.0,1288200.0,1122500.0,2524900.0
2014-01-06,37.879818,25.482304,109.477676,17.276245,36.468731,61.728436,33.734875,69.284195,58.119999,41.609428,...,3382300.0,1970800.0,11848500.0,986700.0,5657131.0,849400.0,4083600.0,1414900.0,1988200.0,2763200.0


#### Deleting the missing values

In [6]:
# Inspecting the number of missing values per ticker (column)
rawdata['Adj Close'].isna().sum(axis=0).describe()

count     488.000000
mean       82.977459
std       262.763277
min         3.000000
25%         3.000000
50%         3.000000
75%         3.000000
max      1011.000000
dtype: float64

In [7]:
# Making a list of (field, ticker) tuples for tickers that have more than 3 missing values
high_missing_ticker = rawdata['Adj Close'].isna().sum(axis=0) > 3
high_missing_ticker_list = high_missing_ticker[high_missing_ticker].index.tolist()
high_missing_ticker_tuples = list()
for i in ['Adj Close', 'Open', 'Close', 'High' ,'Low', 'Volume']:
    high_missing_ticker_tuples += list(zip([i]*len(high_missing_ticker_list),high_missing_ticker_list))

In [8]:
# Excluding columns (i.e., tickers) that have more than 3 missing values
rawdata = rawdata.drop(high_missing_ticker_tuples, axis = 1)
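
The same columns can also be dropped without building the list of tuples, because `DataFrame.drop` accepts a `level` argument for MultiIndex columns. Here is a minimal sketch on a toy frame. The tickers `AAA`, `BBB`, `CCC` and all values are made up, and the toy uses `Close` where the cells above use `Adj Close`.

```python
import numpy as np
import pandas as pd

# Toy frame with (field, ticker) columns, mimicking the layout of the downloaded data
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB", "CCC"]])
toy = pd.DataFrame(np.arange(30, dtype=float).reshape(5, 6), columns=columns)
toy.loc[0:3, ("Close", "BBB")] = np.nan  # BBB has 4 missing closing prices

# Count missing values per ticker and keep the names of the sparse ones
missing_per_ticker = toy["Close"].isna().sum()
sparse_tickers = missing_per_ticker[missing_per_ticker > 3].index

# Drop every column whose ticker level matches, across all fields at once
cleaned = toy.drop(columns=sparse_tickers, level=1)
print(cleaned.columns.tolist())
```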

In [9]:
# Finding the dates on which at least one ticker (column) has a missing value
missingdate = rawdata.isna().sum(axis=1) > 0
missingdate[missingdate].index

DatetimeIndex(['2013-12-30', '2016-01-18', '2017-01-02'], dtype='datetime64[ns]', name='Date', freq=None)

In [10]:
# The row with the index of 2013-12-30 is deleted as it is out of the scope of our data (2014~2017)
rawdata = rawdata.drop(pd.Timestamp('2013-12-30'))

In [11]:
# Rows with the index of 2016-01-18 and 2017-01-02 are filled with the values from the previous trading day
# (ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions)
rawdata = rawdata.ffill()

In [12]:
# Double-checking that all the missing values were either removed or replaced
rawdata.isna().any().any()

False

#### Calculating the log returns for closing price

In [13]:
# Getting the log returns from stock prices
logret = np.log(rawdata['Close']).diff()
logret.columns = pd.MultiIndex.from_product([['logreturn'], logret.columns])
# Joining logret and rawdata 
rawdata = rawdata.join(logret)
# row with the index of 2013-12-31 will be deleted as it is out of the scope of our data (2014~2017)
rawdata = rawdata.drop(pd.Timestamp('2013-12-31'))
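
For reference, the log return of day t is ln(P_t / P_(t-1)), which equals ln(P_t) - ln(P_(t-1)). That difference is what `np.log(...).diff()` computes. The small check below uses made-up closing prices (not values from our data) to show both forms side by side and the property that log returns add up over time.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for four consecutive trading days
close = pd.Series(
    [100.0, 102.0, 99.0, 101.0],
    index=pd.to_datetime(["2014-01-02", "2014-01-03", "2014-01-06", "2014-01-07"]),
)

log_ret = np.log(close).diff()               # difference of log prices
ratio_form = np.log(close / close.shift(1))  # ln(P_t / P_(t-1)), written out

# The first day has no previous price, so its return is NaN in both forms
print(log_ret)

# Log returns are additive: their sum equals the log of the total price ratio
total = log_ret.sum()
print(np.isclose(total, np.log(close.iloc[-1] / close.iloc[0])))
```

The leading NaN is why the first row (2013-12-31 in our data) carries no return and only serves as the base price for the first trading day of 2014.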

In [14]:
# len(rawdata) is used for the day count because logret still holds the 2013-12-31 row (all NaN)
print("After preprocessing the data, we have identified the log returns of {} companies, which were included in the S&P 500 in 2017. To recap, we are examining the stock returns from 2014 to 2017. Therefore, we will be examining the stock returns of {} days.".format(len(logret.columns), len(rawdata)))

After preprocessing the data, we have identified the log returns of 438 companies, which were included in the S&P 500 in 2017. To recap, we are examining the stock returns from 2014 to 2017. Therefore, we will be examining the stock returns of 1009 days.


#### Creating a new column for the days of the week

In [15]:
# The day of the week is added as a new column (it becomes an index level further below)
rawdata['days of week'] = [calendar.day_name[day.weekday()] for day in rawdata.index]

#### Exporting dataframe as csv

In [16]:
rawdata.to_csv('Sungbin2014_2017_SP500.csv')

#### Creating a multilevel index: Adding the day of the week as an index

In [17]:
# logret still contains the 2013-12-31 row that was dropped from rawdata (1010 vs. 1009 rows),
# so the date level is taken from rawdata's own index to keep the lengths equal
rawdata = rawdata.set_index(['days of week', rawdata.index])
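
Once the day of the week is an index level, all rows for a given day can be selected with a single `.loc` lookup. Here is a minimal sketch on a three-row toy frame. The column name `ret` and its values are made up, and `day_name()` is used here in place of the `calendar` module.

```python
import pandas as pd

# Three trading days: a Thursday, a Friday, and the following Monday
frame = pd.DataFrame(
    {"ret": [0.01, -0.02, 0.005]},
    index=pd.DatetimeIndex(["2014-01-02", "2014-01-03", "2014-01-06"], name="Date"),
)

# Add the weekday name as a column, then move it into the index as the outer level
frame["day"] = frame.index.day_name()
indexed = frame.set_index("day", append=True).swaplevel()

# Select every Monday in one lookup; the result is indexed by Date only
monday_rows = indexed.loc["Monday"]
print(monday_rows)
```

Deriving the weekday from the frame's own index means the new level always has the same length as the frame.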

### Analyzing the Monday Effect
The next step is to analyze the Monday effect. First, as in Arman and Lestari's research, a one-sample t-test with a test value of 0 is conducted for each day of the week. A significant result therefore indicates that the observed average log stock return for that day would be highly unlikely if the null hypothesis (i.e., the average log stock return is 0) were true.

In [122]:
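# Before we conduct the one-sample t-test,
# let's look at the mean of the log stock returns for each day of the week
# (grouping by the weekday name of each date, then averaging across companies)
logret.groupby(logret.index.day_name().rename('day')).mean().mean(axis=1)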

day
Friday       0.000360
Monday      -0.000040
Thursday     0.000372
Tuesday      0.000450
Wednesday    0.001083
dtype: float64

Wow! Unlike the other days of the week (where the average log stock return is positive), Monday has a negative average log stock return. Let's see if this value is statistically significant.

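In [128]:
# One-sample t-test (test value 0) on the Monday log returns.
# Assumption: returns are averaged across companies for each date to form a single series.
monday_returns = logret[logret.index.day_name() == 'Monday'].mean(axis=1).dropna()
stats.ttest_1samp(monday_returns, 0)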
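
A per-weekday version of the test could be sketched as follows. This is a minimal example on synthetic returns with a true mean of 0, not our S&P 500 data, so the printed statistics say nothing about the Monday effect. The names `returns` and `summary` are illustrative; `ttest_1samp` needs a plain numeric sample per call, which is why the groups are looped over one by one.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic daily returns on business days (illustrative only, not market data)
rng = np.random.default_rng(0)
dates = pd.bdate_range("2014-01-01", "2017-12-31")
returns = pd.Series(rng.normal(0.0, 0.01, len(dates)), index=dates)

# One-sample t-test against 0, run separately for each day of the week
results = {}
for day, sample in returns.groupby(returns.index.day_name()):
    t_stat, p_value = stats.ttest_1samp(sample, popmean=0)
    results[day] = (sample.mean(), t_stat, p_value)

summary = pd.DataFrame.from_dict(results, orient="index", columns=["mean", "t", "p"])
print(summary)
```

Each row reports the mean return of one weekday, its t statistic (the mean divided by its standard error), and the two-sided p-value. A small p-value would indicate that the mean for that day differs from 0.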