# Prashant Dutt

## Research question/interests
How does seasonality affect the market? <br>
Which sectors, stocks, and ETFs are most and least sensitive to seasonality?

In [3]:
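# Core libraries for data handling, numerics, and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Use the FiveThirtyEight plot style for all figures in this notebook
plt.style.use('fivethirtyeight')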

Let's load a dataset. I've chosen RETL (Direxion Daily Retail Bull 3x Shares):

*"RETL, as a levered product, is a short-term tactical instrument and not a buy-and-hold ETF. Like many levered funds, it promises 3x exposure only for one day. Over longer periods, returns can vary significantly from 3x exposure to its underlying index. Investors should note that RETL's underlying index may have a different take on the retail space than other indexes. The fund tends to overweight apparel, auto-parts & service and specialty retailers, while underweighting department stores. RETL’s exposure also extends to some unexpected sectors like oil & gas refining and marketing. As a short-term product, trading costs are relatively more important for the fund, than with buy-and-hold ETFs. Note: On December 1, 2016, RETL changed its underlying index from Russell 1000 Retail Index to S&P Retail Select Industry Index."*
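The quote's point that the fund delivers 3x exposure "only for one day" follows from daily compounding. The sketch below illustrates this under simple assumptions: hypothetical index returns of +10% and then -10%, with fees and financing costs ignored. These numbers are illustrative, not RETL data.

```python
# Illustrative only: hypothetical index returns, not RETL data.
# A daily-rebalanced 3x fund multiplies each *daily* return by 3,
# so its multi-day return is generally not 3x the index's multi-day return.

def compound(daily_returns):
    """Total return from compounding a sequence of daily returns."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0

index_daily = [0.10, -0.10]                   # index: +10%, then -10%
levered_daily = [3 * r for r in index_daily]  # 3x daily exposure: +30%, then -30%

index_total = compound(index_daily)      # 1.10 * 0.90 - 1 = -0.01
levered_total = compound(levered_daily)  # 1.30 * 0.70 - 1 = -0.09

print(f"Index over 2 days:     {index_total:+.2%}")
print(f"3x fund over 2 days:   {levered_total:+.2%}")
print(f"3x the index's return: {3 * index_total:+.2%}")
```

Over the two days the index loses about 1%, but the 3x fund loses about 9% instead of the 3% that tripling the index's return would suggest.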

In [5]:
data = pd.read_csv("../data/raw/archive/ETFs/retl.us.txt", sep=",", header=0, names=["Date", "Open", "High", "Low", "Close", "Volume", "OpenInt"])
data.head()

| | Date | Open | High | Low | Close | Volume | OpenInt |
|---|---|---|---|---|---|---|---|
| 0 | 2010-07-14 | 3.0042 | 3.0842 | 3.0042 | 3.0283 | 923050 | 0 |
| 1 | 2010-07-15 | 3.0314 | 3.0788 | 3.0251 | 3.0717 | 1225763 | 0 |
| 2 | 2010-07-16 | 2.9498 | 2.952 | 2.8828 | 2.8937 | 773018 | 0 |
| 3 | 2010-07-19 | 2.8696 | 2.8696 | 2.8588 | 2.8634 | 147921 | 0 |
| 4 | 2010-07-20 | 2.8409 | 2.9708 | 2.8409 | 2.9708 | 54019 | 0 |
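As an alternative to the separate clean-up steps below, `pd.read_csv` can skip OpenInt, parse the dates, and set the index at load time. This sketch assumes the file's header line is `Date,Open,High,Low,Close,Volume,OpenInt`, matching the column names used above. The first rows are inlined through `io.StringIO` so the example runs without the file.

```python
import io

import pandas as pd

# First rows of retl.us.txt, copied from the output above, so this runs
# without the file. Assumption: the real file has this same header line.
sample = io.StringIO(
    "Date,Open,High,Low,Close,Volume,OpenInt\n"
    "2010-07-14,3.0042,3.0842,3.0042,3.0283,923050,0\n"
    "2010-07-15,3.0314,3.0788,3.0251,3.0717,1225763,0\n"
    "2010-07-16,2.9498,2.952,2.8828,2.8937,773018,0\n"
)

data_alt = pd.read_csv(
    sample,
    usecols=["Date", "Open", "High", "Low", "Close", "Volume"],  # skip OpenInt
    parse_dates=["Date"],  # convert the Date strings to datetime values
    index_col="Date",      # use the parsed dates as the index
)
print(data_alt.dtypes)
print(data_alt.index)
```

On the real data, pass the file path from In [5] in place of `sample`.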


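First, let's remove the OpenInt (open interest) column. It is 0 in every row shown above, and open interest carries no information for ETF shares.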

In [7]:
data.drop(columns=['OpenInt'], inplace=True)
data.head()

| | Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|---|
| 1 | 2010-07-15 | 3.0314 | 3.0788 | 3.0251 | 3.0717 | 1225763 |
| 2 | 2010-07-16 | 2.9498 | 2.952 | 2.8828 | 2.8937 | 773018 |
| 3 | 2010-07-19 | 2.8696 | 2.8696 | 2.8588 | 2.8634 | 147921 |
| 4 | 2010-07-20 | 2.8409 | 2.9708 | 2.8409 | 2.9708 | 54019 |
| 5 | 2010-07-21 | 3.0175 | 3.0205 | 2.928 | 2.928 | 48877 |
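Because `head()` shows only five rows, dropping OpenInt depends on the assumption that it is zero throughout the file. One way to test this is to look for constant columns before dropping anything. In the sketch below, `df` is a hypothetical three-row stand-in built from values shown earlier. The same check can be run on `data` before the drop.

```python
import pandas as pd

# Hypothetical stand-in built from values shown above; on the real
# DataFrame, run the same check before calling drop().
df = pd.DataFrame({
    "Close": [3.0283, 3.0717, 2.8937],
    "OpenInt": [0, 0, 0],
})

# A column with at most one distinct value (NaN counted) carries no information.
constant_cols = [col for col in df.columns if df[col].nunique(dropna=False) <= 1]
print(constant_cols)

# Drop only the columns that the check flagged.
df = df.drop(columns=constant_cols)
```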


Next, let's examine the data types of the various columns.

In [8]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1781 entries, 1 to 1781
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    1781 non-null   object 
 1   Open    1781 non-null   float64
 2   High    1781 non-null   float64
 3   Low     1781 non-null   float64
 4   Close   1781 non-null   float64
 5   Volume  1781 non-null   int64  
dtypes: float64(4), int64(1), object(1)
memory usage: 97.4+ KB


It looks like the Date column is being treated as a string rather than as a date. Let's fix this.

In [10]:
# Convert the whole column in one vectorised call instead of element by element
data['Date'] = pd.to_datetime(data['Date'])
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1781 entries, 1 to 1781
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Date    1781 non-null   datetime64[ns]
 1   Open    1781 non-null   float64       
 2   High    1781 non-null   float64       
 3   Low     1781 non-null   float64       
 4   Close   1781 non-null   float64       
 5   Volume  1781 non-null   int64         
dtypes: datetime64[ns](1), float64(4), int64(1)
memory usage: 97.4 KB
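If you pass an explicit `format` to `pd.to_datetime`, the conversion becomes strict and pandas does not need to infer the date format. The sketch below uses a few date strings taken from the data above. It also shows `errors="coerce"` as one way to handle malformed entries, which is an illustrative choice and not something the original notebook uses.

```python
import pandas as pd

dates = pd.Series(["2010-07-15", "2010-07-16", "2010-07-19"])

# Strict, vectorised parsing: a string not matching %Y-%m-%d raises an error.
parsed = pd.to_datetime(dates, format="%Y-%m-%d")
print(parsed.dtype)

# errors="coerce" turns unparseable strings into NaT instead of raising.
messy = pd.to_datetime(
    pd.Series(["2010-07-20", "not a date"]), format="%Y-%m-%d", errors="coerce"
)
print(messy.isna().sum())
```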


Next, let's make the Date column the DataFrame's index.

In [11]:
data.set_index('Date', inplace=True)
data.head()

| Date | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2010-07-15 | 3.0314 | 3.0788 | 3.0251 | 3.0717 | 1225763 |
| 2010-07-16 | 2.9498 | 2.952 | 2.8828 | 2.8937 | 773018 |
| 2010-07-19 | 2.8696 | 2.8696 | 2.8588 | 2.8634 | 147921 |
| 2010-07-20 | 2.8409 | 2.9708 | 2.8409 | 2.9708 | 54019 |
| 2010-07-21 | 3.0175 | 3.0205 | 2.928 | 2.928 | 48877 |
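Now that the index is a `DatetimeIndex`, we can start exploring the research question. One simple way to measure seasonality is the average month-over-month return for each calendar month. This is offered as a sketch, not as the author's method. The prices below are synthetic (a random walk), so the resulting numbers mean nothing. `prices` stands in for `data` once it has been loaded.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for `data` (hypothetical random-walk prices, NOT RETL):
# a business-day DatetimeIndex with a Close column.
rng = np.random.default_rng(0)
idx = pd.bdate_range("2011-01-03", "2016-12-30", name="Date")
close = 10 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx))))
prices = pd.DataFrame({"Close": close}, index=idx)

# Last close of each calendar month (indexed by monthly Period).
monthly_close = prices["Close"].groupby(prices.index.to_period("M")).last()

# Month-over-month simple returns; the first month has no prior close.
monthly_ret = (monthly_close / monthly_close.shift(1) - 1).dropna()

# Average return for each calendar month (1 = January, ..., 12 = December).
by_month = monthly_ret.groupby(monthly_ret.index.month).mean()
print(by_month.round(4))
```

With only a handful of years, each calendar-month average rests on five or six observations. Any apparent pattern should therefore be checked against noise before you read it as seasonality.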
