# Project Goal:

Predict the percent change in stock price after the next earnings event for all companies in the ticker list.

One record per earnings event date per company.

## Document Purpose:
Get the trading day before and the trading day after each stock's earnings events (`before_eps_date`, `after_eps_date`).

# Package Imports

In [1]:
!pip install pandas_market_calendars



# Loading Data

## Filter Trading Dates for tickers

Based on the tickers in the ticker CSV, keep only the earnings date records for those stocks.

In [2]:
## Imports
import pandas as pd
import pandas_market_calendars as mcal
import numpy as np
from datetime import datetime
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

import yfinance as yf

In [3]:
# list of stock tickers we are interested in measuring the percent change in (provided):
tickers_df = pd.read_csv("/users/brigitteasullivan/Documents/0. Data Science/Notebooks/Riipen Project/eps-stock-market-ml-project/data/all_tickers-with-options.csv", 
                         header = 0, names = ['Ticker'])
tickers_df

Unnamed: 0,Ticker
0,\t\t1INVH
1,\t\t1LNCO
2,\t\t1LUV
3,\t\t2LW
4,\t\t9AC
...,...
12829,\t\tZZ
12830,\t\tZZK
12831,\t\tZZV
12832,\t\tZZZ


In [4]:
# cleaning: trim tab
tickers_df["Ticker"] = tickers_df['Ticker'].str.strip()
tickers_df

Unnamed: 0,Ticker
0,1INVH
1,1LNCO
2,1LUV
3,2LW
4,9AC
...,...
12829,ZZ
12830,ZZK
12831,ZZV
12832,ZZZ


## Earnings dates

The objective is to measure the percentage change in a stock's price across an earnings announcement, which requires one trading date before the announcement and one after it. Companies announce earnings either "premarket" (before the market opens) or "postmarket" (after the market closes for that day).
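The target itself is a one-line formula. The sketch below is only an illustration: `percent_change` is a hypothetical helper, the prices are made up, and the notebook does not state which price (open, close, or adjusted close) is used on each date.

```python
def percent_change(price_before: float, price_after: float) -> float:
    """Percent change from price_before to price_after."""
    if price_before == 0:
        raise ValueError("price_before must be non-zero")
    return (price_after - price_before) / price_before * 100.0

# Illustrative prices only, not taken from the data set.
move = percent_change(50.0, 53.0)   # a rise from 50 to 53
drop = percent_change(100.0, 90.0)  # a fall from 100 to 90
```

A positive value means the price rose across the earnings event and a negative value means it fell.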

**Decision:** If the company announces premarket, the dates to watch are the previous trading day and the announcement day. 
If the company announces postmarket, the dates to watch are the announcement day and the next trading day.
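The decision rule can be sketched as a small pure function. This is a minimal illustration using only the standard library; `eps_window` is a hypothetical name, and treating a missing `when` as premarket follows the assumption made later in this notebook.

```python
from datetime import date

def eps_window(announce_date, when, prior_trading_day, next_trading_day):
    """Return (before_date, after_date) for one earnings event.

    premarket  -> (prior trading day, announcement day)
    postmarket -> (announcement day, next trading day)
    A missing `when` is treated as premarket (assumption from this notebook).
    """
    if when == "postmarket":
        return announce_date, next_trading_day
    # "premarket" and missing values share the same window.
    return prior_trading_day, announce_date

# Dates match the LEDS (premarket) and SGH (postmarket) rows in the earnings data.
pre = eps_window(date(2023, 1, 3), "premarket", date(2022, 12, 30), date(2023, 1, 4))
post = eps_window(date(2023, 1, 3), "postmarket", date(2022, 12, 30), date(2023, 1, 4))
```

Both events are dated 2023-01-03, yet the premarket window ends on that day while the postmarket window starts on it.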

From documentation: https://www.earningscalendar.net/documentation

*   `date` ISO date of this earnings event.
*   `ticker` Ticker as traded on the exchange.
*   `url` URL to the press release that confirmed this event.
*   `title` Title of the press release that confirmed this event.
*   `security_name` Name of the security trading under this symbol/ticker.
*   `exchange` The exchange where the security is trading. One of AMEX, NASDAQ, NYSE.
*   `when` When are earnings slated to be released that day. Returned only if known, otherwise the string "null".
*   `pub_date` Publication date of the press release that confirmed this event.



In [5]:
# load provided data set for earnings events:
earnings_df = pd.read_csv('/users/brigitteasullivan/Documents/0. Data Science/Notebooks/Riipen Project/eps-stock-market-ml-project/data/confirmed_earnings.csv')
earnings_df

Unnamed: 0,date,exchange,symbol,when,title,url,pub_date,security_name
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDs to Announce Fiscal First Quarter 2023...,https://www.businesswire.com/news/home/2022122...,2022-12-27T11:05:00,SemiLEDS Corporation - Common Stock
1,2023-01-03,NASDAQ,SGH,postmarket,SGH Announces First Quarter Fiscal 2023 Financ...,https://www.businesswire.com/news/home/2022121...,2022-12-13T21:05:00,"SMART Global Holdings, Inc. - Ordinary Shares"
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. to Announce Fiscal ...",https://www.businesswire.com/news/home/2022122...,2022-12-21T21:05:00,"Resources Connection, Inc. - Common Stock"
3,2023-01-04,NASDAQ,SLP,postmarket,Simulations Plus Sets Date for First Quarter F...,https://www.businesswire.com/news/home/2022121...,2022-12-19T21:06:00,"Simulations Plus, Inc. - Common Stock"
4,2023-01-05,NYSE,CAG,,CONAGRA BRANDS ANNOUNCES DETAILS OF FISCAL 202...,https://www.prnewswire.com/news-releases/conag...,2022-12-05T07:30:00,"ConAgra Brands, Inc. Common Stock"
...,...,...,...,...,...,...,...,...
9471,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira to Announce Fourth Quarter and Full ...,https://www.businesswire.com/news/home/2023120...,2023-12-07T13:30:00,Limoneira Co - Common Stock
9472,2023-12-21,NYSE,NKE,,"NIKE, Inc. Announces Second Quarter Fiscal 202...",https://www.businesswire.com/news/home/2023112...,2023-11-21T21:15:00,"Nike, Inc. Common Stock"
9473,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. Schedules Second Quarter Fiscal ...",https://www.businesswire.com/news/home/2023120...,2023-12-08T19:00:00,"Paychex, Inc. - Common Stock"
9474,2023-12-21,NYSE,WS,postmarket,Worthington Steel to Webcast Discussion of Sec...,https://www.businesswire.com/news/home/2023120...,2023-12-04T21:10:00,"Worthington Steel, Inc. Common Shares"


In [6]:
earnings_df = earnings_df.loc[:,['date', 'exchange', 'symbol', 'when', 'security_name']].copy()

In [7]:
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares"
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock"
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock"
4,2023-01-05,NYSE,CAG,,"ConAgra Brands, Inc. Common Stock"
...,...,...,...,...,...
9471,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock
9472,2023-12-21,NYSE,NKE,,"Nike, Inc. Common Stock"
9473,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock"
9474,2023-12-21,NYSE,WS,postmarket,"Worthington Steel, Inc. Common Shares"


**Decision:** If no value for when company announced, assume the announcement is premarket. 

In [8]:
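# cleaning - fill NaNs:
# fill missing values in "when" with "premarket" (see decision above)
# assign the result back: fillna(inplace=True) on a .loc selection may not update earnings_df

earnings_df['when'] = earnings_df['when'].fillna('premarket')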
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares"
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock"
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock"
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock"
...,...,...,...,...,...
9471,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock
9472,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock"
9473,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock"
9474,2023-12-21,NYSE,WS,postmarket,"Worthington Steel, Inc. Common Shares"


In [9]:
# Convert the 'date' column to datetime
earnings_df['date'] = pd.to_datetime(earnings_df['date'])

In [10]:
# QA - check date data type
earnings_df.dtypes

date             datetime64[ns]
exchange                 object
symbol                   object
when                     object
security_name            object
dtype: object

In [11]:
# remove data for companies not in ticker list
earnings_df = earnings_df[earnings_df['symbol'].isin(tickers_df['Ticker'])].copy()
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares"
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock"
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock"
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock"
...,...,...,...,...,...
9470,2023-12-21,NYSE,KMX,premarket,CarMax Inc
9471,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock
9472,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock"
9473,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock"


In [12]:
# QA: check how many tickers remain
ticker_unique_num = tickers_df['Ticker'].nunique()
print('unique ticker symbols in ticker df:', ticker_unique_num)

symbol_unique_num = earnings_df['symbol'].nunique()
print('unique symbols in earnings df:', symbol_unique_num)


# symbols present in both ticker df and earnings df
unique_ticker = set(tickers_df['Ticker'].unique())
unique_earnings = set(earnings_df['symbol'].unique())

intersection = list(unique_ticker.intersection(unique_earnings))
length_intersection = len(intersection)
print('unique symbols in earnings df AND in ticker:', length_intersection)

unique ticker symbols in ticker df: 12834
unique symbols in earnings df: 2502
unique symbols in earnings df AND in ticker: 2502


In [13]:
earnings_df['exchange'].value_counts()

exchange
NYSE      4691
NASDAQ    3698
AMEX        67
OTC          2
Name: count, dtype: int64

In [14]:
# check percent missing values for each column
percent_missing = earnings_df.isnull().sum() * 100 / len(earnings_df)
missing_value_df = pd.DataFrame({'column_name': earnings_df.columns,
                                 'percent_missing': percent_missing})

In [15]:
missing_value_df

Unnamed: 0,column_name,percent_missing
date,date,0.0
exchange,exchange,3.755121
symbol,symbol,0.0
when,when,0.0
security_name,security_name,3.709604


In [16]:
## cleaning - filter out OTC

earnings_df = earnings_df.loc[~earnings_df['exchange'].isin(["OTC"]),:].copy()
# earnings_df

In [17]:
## QA - check that OTC not in Dataframe anymore
earnings_df['exchange'].value_counts(dropna = False)

exchange
NYSE      4691
NASDAQ    3698
NaN        330
AMEX        67
Name: count, dtype: int64

In [18]:
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares"
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock"
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock"
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock"
...,...,...,...,...,...
9470,2023-12-21,NYSE,KMX,premarket,CarMax Inc
9471,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock
9472,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock"
9473,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock"


# Before and After Earnings Dates
Find the previous and next trading dates for each earnings date, using the trading calendar of the exchange the company is listed on.


Note:
The AMEX stock exchange is also known as NYSE American and follows the same trading schedule as the NYSE, so the NYSE calendar is used for all AMEX values.

In [19]:
# dictionary of stock exchanges and codes 

stock_ex_codes = {
    'NYSE': 'XNYS',
    'New York Stock Exchange (NYSE)': 'XNYS',
    'NASDAQ': 'NASDAQ',
    'London Stock Exchange (LSE)': 'XLON',
    'Tokyo Stock Exchange (TSE)': 'XTKS',
    'Hong Kong Stock Exchange (HKEX)': 'XHKG',
    'Toronto Stock Exchange (TSX)': 'XTSE',
    'Shanghai Stock Exchange (SSE)': 'XSHG',
    'Shenzhen Stock Exchange (SZSE)': 'XSHE',
    'Bombay Stock Exchange (BSE)': 'XBOM',
    'National Stock Exchange of India (NSE)': 'XNSE',
    'AMEX': 'XNYS' # See note
}


In [21]:
# Create a copy of the 'exchange' column
exchange_column_copy = earnings_df['exchange'].copy()

# Add a new column 'exchange_code' based on the copied 'exchange' column
earnings_df.loc[:,'exchange_code'] = exchange_column_copy.map(stock_ex_codes)

earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name,exchange_code
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock,NASDAQ
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares",NASDAQ
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock",NASDAQ
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock",NASDAQ
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock",XNYS
...,...,...,...,...,...,...
9470,2023-12-21,NYSE,KMX,premarket,CarMax Inc,XNYS
9471,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock,NASDAQ
9472,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock",XNYS
9473,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock",NASDAQ


In [22]:
earnings_df['exchange'].value_counts(dropna=False)

exchange
NYSE      4691
NASDAQ    3698
NaN        330
AMEX        67
Name: count, dtype: int64

In [24]:
# drop rows with a missing exchange, security name, or exchange code:
earnings_df = earnings_df.dropna(inplace = False, ignore_index = True, subset=['exchange', 'security_name', 'exchange_code']).copy()
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name,exchange_code
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock,NASDAQ
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares",NASDAQ
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock",NASDAQ
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock",NASDAQ
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock",XNYS
...,...,...,...,...,...,...
8451,2023-12-21,NYSE,KMX,premarket,CarMax Inc,XNYS
8452,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock,NASDAQ
8453,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock",XNYS
8454,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock",NASDAQ


In [25]:
earnings_df['exchange'].value_counts(dropna=False)

exchange
NYSE      4691
NASDAQ    3698
AMEX        67
Name: count, dtype: int64

In [26]:
# store calendars in separate variables to increase speed (function + apply took 15 mins otherwise)
nyse_cal = mcal.get_calendar('XNYS')
nasdaq_cal = mcal.get_calendar('NASDAQ')


### Trading Day After Earnings Release

In [29]:
def get_next_trading_day(row):
    ''' Returns the next trading day after row['date'] for the row's exchange, skipping holidays / weekends.
    Returns np.nan if the exchange code is not recognised.
    Intended to be applied to a df row by row (axis = 1).
    '''
    next_trading_day2 = np.nan
    try:
        # valid_days includes the start date when it is a trading day, so item [1] is the following session.
        # A 10-day window leaves enough range for holiday closures.
        if row['exchange_code'] == 'XNYS':
            next_trading_day2 = nyse_cal.valid_days(start_date=row['date'], end_date = row['date'] +pd.Timedelta(days=10))[1]
        elif row['exchange_code'] == 'NASDAQ':
            next_trading_day2 = nasdaq_cal.valid_days(start_date=row['date'], end_date = row['date'] +pd.Timedelta(days=10))[1]
        return next_trading_day2
    except Exception as e:
        print(f'Error:{e}')
        return np.nan
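One caveat about taking item `[1]`: it is the next session only when the earnings date is itself a trading day. If an announcement were dated on a weekend or holiday, item `[0]` would already be the next session and `[1]` would skip one. The sketch below shows a lookup that avoids this by searching a sorted list of sessions with the standard-library `bisect` module. The hand-written `SESSIONS` list stands in for the calendar output, and `next_session` is a hypothetical helper, not part of `pandas_market_calendars`.

```python
from bisect import bisect_right
from datetime import date

# Hand-written NYSE sessions around New Year 2023 (2023-01-02 was a market holiday).
# In the notebook this list would come from the exchange calendar instead.
SESSIONS = [
    date(2022, 12, 29), date(2022, 12, 30),
    date(2023, 1, 3), date(2023, 1, 4), date(2023, 1, 5), date(2023, 1, 6),
]

def next_session(day, sessions=SESSIONS):
    """First session strictly after `day`, or None if the list ends first."""
    i = bisect_right(sessions, day)  # index of the first session greater than `day`
    return sessions[i] if i < len(sessions) else None

after_trading_day = next_session(date(2023, 1, 3))    # earnings date is a session
after_weekend_day = next_session(date(2022, 12, 31))  # earnings date is a Saturday
```

Because the search is strict, the result is the same kind of answer whether or not the input date is a session.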

In [30]:
#show how long this cell took to run.
import timeit

start_time = timeit.default_timer()

#apply function
earnings_df.loc[:,'next_trading_day'] = earnings_df.apply(get_next_trading_day, axis = 1)

#print length of time to run function on entire DF
elapsed = timeit.default_timer() - start_time
print(f"Time taken: {elapsed} seconds")

Time taken: 1.4960818749999945 seconds


In [31]:
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name,exchange_code,next_trading_day
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock,NASDAQ,2023-01-04 00:00:00+00:00
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares",NASDAQ,2023-01-04 00:00:00+00:00
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock",XNYS,2023-01-06 00:00:00+00:00
...,...,...,...,...,...,...,...
8451,2023-12-21,NYSE,KMX,premarket,CarMax Inc,XNYS,2023-12-22 00:00:00+00:00
8452,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock,NASDAQ,2023-12-22 00:00:00+00:00
8453,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock",XNYS,2023-12-22 00:00:00+00:00
8454,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock",NASDAQ,2023-12-22 00:00:00+00:00


In [32]:
# QA: check to make sure all records in df have a next trading day
earnings_df[earnings_df['next_trading_day'].isnull()]

Unnamed: 0,date,exchange,symbol,when,security_name,exchange_code,next_trading_day


In [33]:
#drop rows that didn't have a trading day that was found (there should be none)
earnings_df = earnings_df.dropna(inplace = False, ignore_index = True).copy()
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name,exchange_code,next_trading_day
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock,NASDAQ,2023-01-04 00:00:00+00:00
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares",NASDAQ,2023-01-04 00:00:00+00:00
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock",XNYS,2023-01-06 00:00:00+00:00
...,...,...,...,...,...,...,...
8451,2023-12-21,NYSE,KMX,premarket,CarMax Inc,XNYS,2023-12-22 00:00:00+00:00
8452,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock,NASDAQ,2023-12-22 00:00:00+00:00
8453,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock",XNYS,2023-12-22 00:00:00+00:00
8454,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock",NASDAQ,2023-12-22 00:00:00+00:00


In [34]:
# store as datetime types
earnings_df['next_trading_day'] = pd.to_datetime(earnings_df['next_trading_day'])
earnings_df['date'] = pd.to_datetime(earnings_df['date'])
earnings_df.dtypes

date                     datetime64[ns]
exchange                         object
symbol                           object
when                             object
security_name                    object
exchange_code                    object
next_trading_day    datetime64[ns, UTC]
dtype: object

### Trading Day Before Earnings release

In [38]:
def get_prior_trading_day(row):
    ''' Returns the trading day before row['date'] for the row's exchange, skipping holidays / weekends.
    Returns np.nan if the exchange code is not recognised.
    Intended to be applied to a df row by row (axis = 1).
    The window starts 10 days before the earnings date so there is enough range for winter holidays;
    the last item is the earnings date itself, so the second-to-last item is taken.
    '''
    # default so an unrecognised exchange code returns np.nan instead of raising a NameError
    prior_trading_day = np.nan
    try:
        if row['exchange_code'] == 'XNYS':
            prior_trading_day = nyse_cal.valid_days(start_date = row['date'] -pd.Timedelta(days=10), end_date=row['date'])[-2]
        elif row['exchange_code'] == 'NASDAQ':
            prior_trading_day = nasdaq_cal.valid_days(start_date = row['date'] -pd.Timedelta(days=10), end_date=row['date'])[-2]
        return prior_trading_day
    except Exception as e:
        # print(f'Error:{e}')
        return np.nan
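Taking item `[-2]` assumes the earnings date is the last session in the window. If the earnings date were not a trading day, `[-1]` would already be the prior session and `[-2]` would step back one day too many. A minimal sketch of a lookup without that assumption, again using a hand-written session list as a stand-in for the calendar output (`prior_session` is a hypothetical helper):

```python
from bisect import bisect_left
from datetime import date

# Hand-written NYSE sessions around New Year 2023 (2023-01-02 was a market holiday).
SESSIONS = [
    date(2022, 12, 29), date(2022, 12, 30),
    date(2023, 1, 3), date(2023, 1, 4), date(2023, 1, 5), date(2023, 1, 6),
]

def prior_session(day, sessions=SESSIONS):
    """Last session strictly before `day`, or None if there is none in the list."""
    i = bisect_left(sessions, day)  # number of sessions earlier than `day`
    return sessions[i - 1] if i > 0 else None

before_trading_day = prior_session(date(2023, 1, 3))  # earnings date is a session
before_holiday = prior_session(date(2023, 1, 2))      # earnings date is a holiday
```

For 2023-01-03 this agrees with the `prior_trading_day` of 2022-12-30 that the notebook reports for the first rows.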

In [39]:
#show how long this cell took to run.
import timeit

start_time = timeit.default_timer()

earnings_df['prior_trading_day'] = earnings_df.apply(get_prior_trading_day, axis = 1)

elapsed = (timeit.default_timer() - start_time) / 60
print(f"Time taken: {elapsed} minutes")

Time taken: 0.022231659716666505 minutes


In [40]:
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name,exchange_code,next_trading_day,prior_trading_day
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock,NASDAQ,2023-01-04 00:00:00+00:00,2022-12-30 00:00:00+00:00
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares",NASDAQ,2023-01-04 00:00:00+00:00,2022-12-30 00:00:00+00:00
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00,2023-01-03 00:00:00+00:00
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00,2023-01-03 00:00:00+00:00
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock",XNYS,2023-01-06 00:00:00+00:00,2023-01-04 00:00:00+00:00
...,...,...,...,...,...,...,...,...
8451,2023-12-21,NYSE,KMX,premarket,CarMax Inc,XNYS,2023-12-22 00:00:00+00:00,2023-12-20 00:00:00+00:00
8452,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock,NASDAQ,2023-12-22 00:00:00+00:00,2023-12-20 00:00:00+00:00
8453,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock",XNYS,2023-12-22 00:00:00+00:00,2023-12-20 00:00:00+00:00
8454,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock",NASDAQ,2023-12-22 00:00:00+00:00,2023-12-20 00:00:00+00:00


### Next trading day or day of 

In [41]:
# QA Confirm no NaN in When column
earnings_df['when'].value_counts(dropna=False)

when
premarket     4754
postmarket    3702
Name: count, dtype: int64

In [42]:
## ASSUMPTION - if 'when' is not specified, assume premarket and use the release day as the after-EPS day (per direction from manager)

def eps_day_after(row):
    eps_day_value = np.nan
    if row['when'] == 'premarket':
        eps_day_value = row['date']
    elif row['when'] == 'postmarket':
         eps_day_value = row['next_trading_day']
    elif pd.isna(row['when']):
        eps_day_value = row['date']
    return eps_day_value

In [43]:
## Based on when the EPS is announced, determine the day we are interested in for movement
earnings_df['after_eps_date'] = earnings_df.apply(eps_day_after, axis = 1)

In [44]:
earnings_df

Unnamed: 0,date,exchange,symbol,when,security_name,exchange_code,next_trading_day,prior_trading_day,after_eps_date
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock,NASDAQ,2023-01-04 00:00:00+00:00,2022-12-30 00:00:00+00:00,2023-01-03 00:00:00
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares",NASDAQ,2023-01-04 00:00:00+00:00,2022-12-30 00:00:00+00:00,2023-01-04 00:00:00+00:00
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00,2023-01-03 00:00:00+00:00,2023-01-05 00:00:00+00:00
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00,2023-01-03 00:00:00+00:00,2023-01-05 00:00:00+00:00
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock",XNYS,2023-01-06 00:00:00+00:00,2023-01-04 00:00:00+00:00,2023-01-05 00:00:00
...,...,...,...,...,...,...,...,...,...
8451,2023-12-21,NYSE,KMX,premarket,CarMax Inc,XNYS,2023-12-22 00:00:00+00:00,2023-12-20 00:00:00+00:00,2023-12-21 00:00:00
8452,2023-12-21,NASDAQ,LMNR,postmarket,Limoneira Co - Common Stock,NASDAQ,2023-12-22 00:00:00+00:00,2023-12-20 00:00:00+00:00,2023-12-22 00:00:00+00:00
8453,2023-12-21,NYSE,NKE,premarket,"Nike, Inc. Common Stock",XNYS,2023-12-22 00:00:00+00:00,2023-12-20 00:00:00+00:00,2023-12-21 00:00:00
8454,2023-12-21,NASDAQ,PAYX,premarket,"Paychex, Inc. - Common Stock",NASDAQ,2023-12-22 00:00:00+00:00,2023-12-20 00:00:00+00:00,2023-12-21 00:00:00


In [45]:
# Pick the day before the earnings reaction, based on when earnings are announced
def eps_day_before(row):
    if row['when'] == 'postmarket':
        return row['date']
    # premarket, or missing 'when' (assumed premarket, matching eps_day_after)
    return row['prior_trading_day']

# Apply the function to create the new column 'before_eps_date'
earnings_df['before_eps_date'] = earnings_df.apply(eps_day_before, axis=1)


In [46]:
earnings_df['after_eps_date'] = pd.to_datetime(earnings_df['after_eps_date'], utc=True).dt.strftime('%Y-%m-%d')
earnings_df['before_eps_date'] = pd.to_datetime(earnings_df['before_eps_date'], utc=True).dt.strftime('%Y-%m-%d')
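The columns being formatted mix timezone-naive values (from `date`) with midnight-UTC values (from the calendar). Reading the aware values in UTC keeps the calendar day intact, whereas converting midnight UTC to a US offset first would move it to the previous day. The standard-library sketch below illustrates this; `to_day_string` is a hypothetical helper, and the fixed UTC-5 offset is only a stand-in for US Eastern standard time.

```python
from datetime import datetime, timedelta, timezone

def to_day_string(ts: datetime) -> str:
    """Format a timestamp as YYYY-MM-DD, reading timezone-aware values in UTC."""
    if ts.tzinfo is not None:
        ts = ts.astimezone(timezone.utc)
    return ts.strftime("%Y-%m-%d")

session = datetime(2023, 1, 4, tzinfo=timezone.utc)  # tz-aware, like next_trading_day
announce = datetime(2023, 1, 3)                      # tz-naive, like the date column

# Fixed UTC-5 offset used as a stand-in for US Eastern standard time.
eastern = timezone(timedelta(hours=-5))
shifted_day = session.astimezone(eastern).strftime("%Y-%m-%d")
```

Here `shifted_day` lands on 2023-01-03 rather than 2023-01-04, which is the off-by-one-day error that formatting in UTC avoids.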

In [47]:
earnings_df.head(15)

Unnamed: 0,date,exchange,symbol,when,security_name,exchange_code,next_trading_day,prior_trading_day,after_eps_date,before_eps_date
0,2023-01-03,NASDAQ,LEDS,premarket,SemiLEDS Corporation - Common Stock,NASDAQ,2023-01-04 00:00:00+00:00,2022-12-30 00:00:00+00:00,2023-01-03,2022-12-30
1,2023-01-03,NASDAQ,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares",NASDAQ,2023-01-04 00:00:00+00:00,2022-12-30 00:00:00+00:00,2023-01-04,2023-01-03
2,2023-01-04,NASDAQ,RGP,postmarket,"Resources Connection, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00,2023-01-03 00:00:00+00:00,2023-01-05,2023-01-04
3,2023-01-04,NASDAQ,SLP,postmarket,"Simulations Plus, Inc. - Common Stock",NASDAQ,2023-01-05 00:00:00+00:00,2023-01-03 00:00:00+00:00,2023-01-05,2023-01-04
4,2023-01-05,NYSE,CAG,premarket,"ConAgra Brands, Inc. Common Stock",XNYS,2023-01-06 00:00:00+00:00,2023-01-04 00:00:00+00:00,2023-01-05,2023-01-04
5,2023-01-05,NASDAQ,HELE,premarket,Helen of Troy Limited - Common Stock,NASDAQ,2023-01-06 00:00:00+00:00,2023-01-04 00:00:00+00:00,2023-01-05,2023-01-04
6,2023-01-05,NYSE,LNN,premarket,Lindsay Corporation Common Stock,XNYS,2023-01-06 00:00:00+00:00,2023-01-04 00:00:00+00:00,2023-01-05,2023-01-04
7,2023-01-05,NYSE,LW,premarket,"Lamb Weston Holdings, Inc. Common Stock",XNYS,2023-01-06 00:00:00+00:00,2023-01-04 00:00:00+00:00,2023-01-05,2023-01-04
8,2023-01-05,NYSE,MSM,premarket,"MSC Industrial Direct Company, Inc. Common Stock",XNYS,2023-01-06 00:00:00+00:00,2023-01-04 00:00:00+00:00,2023-01-05,2023-01-04
9,2023-01-05,NYSE,RPM,premarket,RPM International Inc. Common Stock,XNYS,2023-01-06 00:00:00+00:00,2023-01-04 00:00:00+00:00,2023-01-05,2023-01-04


In [48]:
# tidy up final data set, drop unnecessary columns, reorder
earnings_df  = earnings_df.drop(columns = ['next_trading_day', 'prior_trading_day', 'exchange'], inplace=False).copy()

# the column is named 'exchange_code'; reindexing on a name that does not exist would create an all-NaN column
earnings_df = earnings_df.reindex(columns = ['date', 'symbol', 'when', 'security_name', 'exchange_code', 'before_eps_date', 'after_eps_date'])
earnings_df

Unnamed: 0,date,symbol,when,security_name,exchange_code,before_eps_date,after_eps_date
0,2023-01-03,LEDS,premarket,SemiLEDS Corporation - Common Stock,NASDAQ,2022-12-30,2023-01-03
1,2023-01-03,SGH,postmarket,"SMART Global Holdings, Inc. - Ordinary Shares",NASDAQ,2023-01-03,2023-01-04
2,2023-01-04,RGP,postmarket,"Resources Connection, Inc. - Common Stock",NASDAQ,2023-01-04,2023-01-05
3,2023-01-04,SLP,postmarket,"Simulations Plus, Inc. - Common Stock",NASDAQ,2023-01-04,2023-01-05
4,2023-01-05,CAG,premarket,"ConAgra Brands, Inc. Common Stock",XNYS,2023-01-04,2023-01-05
...,...,...,...,...,...,...,...
8451,2023-12-21,KMX,premarket,CarMax Inc,XNYS,2023-12-20,2023-12-21
8452,2023-12-21,LMNR,postmarket,Limoneira Co - Common Stock,NASDAQ,2023-12-21,2023-12-22
8453,2023-12-21,NKE,premarket,"Nike, Inc. Common Stock",XNYS,2023-12-20,2023-12-21
8454,2023-12-21,PAYX,premarket,"Paychex, Inc. - Common Stock",NASDAQ,2023-12-20,2023-12-21


# Export

In [49]:
#export to CSV as EPS DAYS
earnings_df.to_csv('/users/brigitteasullivan/Documents/0. Data Science/Notebooks/Riipen Project/eps-stock-market-ml-project/data/eps_days.csv', index = False)
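A quick sanity check on the exported file can catch window errors before the dates are used to pull prices. The sketch below is a hedged illustration: `find_bad_windows` is a hypothetical helper, the two sample rows are copied from the LEDS and SGH records in this notebook, and it assumes all three date columns are written as `YYYY-MM-DD` strings.

```python
import csv
import io

# Two sample rows; in practice, pass an open handle to eps_days.csv instead.
sample = io.StringIO(
    "date,symbol,when,before_eps_date,after_eps_date\n"
    "2023-01-03,LEDS,premarket,2022-12-30,2023-01-03\n"
    "2023-01-03,SGH,postmarket,2023-01-03,2023-01-04\n"
)

def find_bad_windows(handle):
    """Return symbols whose window is out of order or does not touch the event date."""
    bad = []
    for row in csv.DictReader(handle):
        # ISO date strings sort in calendar order, so string comparison is enough.
        ordered = row["before_eps_date"] < row["after_eps_date"]
        touches = row["date"] in (row["before_eps_date"], row["after_eps_date"])
        if not (ordered and touches):
            bad.append(row["symbol"])
    return bad

bad_rows = find_bad_windows(sample)
```

An empty list means every record has its before date ahead of its after date, with the earnings date at one end of the window.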