# Note
Done in collaboration with Vinnie Ronzano on my team.

# Load data and packages

In [112]:
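library("tidyverse");   # dplyr (filter, select, lag, lead) and stringr (str_detect)
library("reshape");     # cast() to pivot and melt() to unpivot
library("data.table");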

In [113]:
df <- read.csv(file="all_stocks_2017-2018.csv", header=TRUE, sep=",");
tail(df,5)

Unnamed: 0,Date,Open,High,Low,Close,Volume,Name
7777,12/22/2017,71.42,71.87,71.22,71.58,10979165,AABA
7778,12/26/2017,70.94,71.39,69.63,69.86,8542802,AABA
7779,12/27/2017,69.77,70.49,69.69,70.06,6345124,AABA
7780,12/28/2017,70.12,70.32,69.51,69.82,7556877,AABA
7781,12/29/2017,69.79,70.13,69.43,69.85,6613070,AABA


## Choose tech stocks as the focus

In [114]:
# keep only the four tech tickers (exact matches on Name)
df_stock <- df %>%
  filter(Name %in% c("AAPL", "MSFT", "AMZN", "GOOGL"))

# Pivot dataframe, so that each stock is a column
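
As a rough illustration of what the pivot does, the sketch below reshapes a tiny long table into wide form. It uses base R's `reshape()` instead of `cast()` so that it runs without extra packages; that substitution, and the names `long` and `wide`, are assumptions made only for this example. The four closing prices are copied from the last two rows of the pivoted output in this section.

```r
# Toy long table: one row per (Date, Name) pair
long <- data.frame(
  Date  = c("12/28/2017", "12/28/2017", "12/29/2017", "12/29/2017"),
  Name  = c("AAPL", "MSFT", "AAPL", "MSFT"),
  Close = c(171.08, 85.72, 169.23, 85.54)
)

# Long -> wide: one row per Date, one Close.<ticker> column per stock
wide <- reshape(long, idvar = "Date", timevar = "Name", direction = "wide")
print(wide)
```

Note that `reshape()` prefixes the value column name, so the columns come out as `Close.AAPL` and `Close.MSFT`, whereas `cast()` in the cell below names them by ticker alone.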

In [115]:
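# pivot to wide format: one row per Date, one column of closing prices per stock
pivoted_df <- cast(df_stock, Date ~ Name, value = "Close")
# Date is m/d/Y text, so parse it to sort the rows chronologically
pivoted_df <- pivoted_df[order(as.Date(pivoted_df$Date, format="%m/%d/%Y")), ]
tail(pivoted_df)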

Unnamed: 0,Date,AAPL,AMZN,GOOGL,MSFT
73,12/21/2017,175.01,1174.76,1070.85,85.5
74,12/22/2017,175.01,1168.36,1068.86,85.51
75,12/26/2017,170.57,1176.76,1065.85,85.4
76,12/27/2017,170.6,1182.26,1060.2,85.71
77,12/28/2017,171.08,1186.1,1055.95,85.72
78,12/29/2017,169.23,1169.47,1053.4,85.54


Turn the date into the index (row names in R) to simplify later transformations.

In [116]:
rownames(pivoted_df) <- pivoted_df$Date
pivoted_df$Date <- NULL

# Prepare data for analysis

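### Use lag to quickly change the index to leave the first few rows blank
Comparison: R (via dplyr) has separate `lag()` and `lead()` functions, while pandas in Python uses a single `shift()` with a positive or negative offset. The offset counts rows (trading days here), not calendar days.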
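
To make the mechanics visible, here is a minimal base R sketch of what lagging and leading a column does. The helpers `lag_rows()` and `lead_rows()` are hypothetical stand-ins written for this example (they assume `1 <= n < length(x)`); the notebook itself uses `lag()` and `lead()`. The prices are the first five AAPL closes that appear in this notebook.

```r
# Hypothetical helpers that mimic lag()/lead() on a plain vector
lag_rows  <- function(x, n = 1) c(rep(NA, n), head(x, -n))   # push values down
lead_rows <- function(x, n = 1) c(tail(x, -n), rep(NA, n))   # pull values up

prices <- c(116.15, 116.02, 116.61, 117.91, 118.99)

lagged <- lag_rows(prices, 2)    # NA NA 116.15 116.02 116.61
led    <- lead_rows(prices, 2)   # 116.61 117.91 118.99 NA NA

print(lagged)
print(led)
```

In pandas terms, `lag_rows(x, 2)` corresponds to `shift(2)` and `lead_rows(x, 2)` to `shift(-2)`.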

In [117]:
# dplyr::lag() pushes each column down by 7 rows, filling the first 7 with NA
shifted <- lapply(pivoted_df,dplyr::lag,n=7)
head(data.frame(shifted),10)

AAPL,AMZN,GOOGL,MSFT
,,,
,,,
,,,
,,,
,,,
,,,
,,,
116.15,753.67,808.01,62.58
116.02,757.18,807.77,62.3
116.61,780.45,813.02,62.3


### Calculate returns over 7 days prior as { prices_today/prices_7_days_ago -1.0 }
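
As a quick sanity check of the formula, the sketch below works one value by hand, using two AAPL closes that appear in this notebook. The variable names are illustrative only.

```r
# AAPL closing prices taken from the tables in this notebook
price_7_rows_ago <- 116.15  # close on 1/3/2017
price_today      <- 119.25  # close on 1/12/2017, seven rows later

# Simple return over the window: prices_today / prices_7_days_ago - 1.0
return_7_rows <- price_today / price_7_rows_ago - 1
print(round(return_7_rows, 5))  # 0.02669, i.e. roughly a 2.7% gain
```

This agrees with the `delta_ 7` value for AAPL on 1/12/2017 in the final ABT.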

In [118]:
# calculate returns over the prior 7 rows (trading days): price_today / price_7_rows_ago - 1
delta_7 <-  pivoted_df/shifted - 1

# show some rows as examples
head(delta_7,10)

Unnamed: 0,AAPL,AMZN,GOOGL,MSFT
1/3/2017,,,,
1/4/2017,,,,
1/5/2017,,,,
1/6/2017,,,,
1/9/2017,,,,
1/10/2017,,,,
1/11/2017,,,,
1/12/2017,0.02669,0.079571,0.026633,0.000479
1/13/2017,0.02603,0.079189,0.028684,0.006421
1/17/2017,0.029071,0.037504,0.017761,0.003692


### Melt (unpivot) data to create an analytical base table (ABT)
A long table, with one row per date and stock, should be easier to merge, filter, and feed into time-series analysis than the wide layout.

In [119]:
# melt() needs the date as a regular column (the id variable), so copy it from the rownames
delta_7$Date <- rownames(delta_7)

In [120]:
# melt with melt(), then give the generated columns meaningful names
melt_7 <- melt(delta_7, id="Date")
colnames(melt_7)[2] <- "Name"
colnames(melt_7)[3] <- "delta_7"

In [121]:
# melted dataframe example (with 7 days)
tail(melt_7)

Unnamed: 0,Date,Name,delta_7
999,12/21/2017,MSFT,-0.000935
1000,12/22/2017,MSFT,0.001875
1001,12/26/2017,MSFT,0.008384
1002,12/27/2017,MSFT,-0.013126
1003,12/28/2017,MSFT,-0.007641
1004,12/29/2017,MSFT,-0.003379


As the output shows, the former column names (the stock tickers) are now consolidated into a single `Name` column.
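
For readers who want to see the unpivot step in isolation, here is a minimal base R sketch of the same wide-to-long reshaping. It builds the long table by hand instead of calling `melt()`, and the numbers in `wide` are made-up toy values, not results from the dataset.

```r
# Toy wide table: one column per ticker (values are made up)
wide <- data.frame(
  Date = c("1/12/2017", "1/13/2017"),
  AAPL = c(0.01, 0.02),
  MSFT = c(0.03, 0.04)
)

tickers <- setdiff(names(wide), "Date")

# Stack the ticker columns: one row per (Date, Name) pair
long <- data.frame(
  Date    = rep(wide$Date, times = length(tickers)),
  Name    = rep(tickers, each = nrow(wide)),
  delta_7 = unlist(wide[tickers], use.names = FALSE)
)
print(long)
```

The result has `nrow(wide) * length(tickers)` rows, which is why the melted table in this notebook has 1004 rows for four stocks.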

### Repeat the shifting, calculation, and melting for the 14, 21, 28-day offset
This means comparing today's prices with those 14, 21, or 28 rows (trading days) earlier.
We shift the data to build a dataset of returns that look back 7 days (1 week), 14 days (2 weeks), 21 days, and 28 days.

In [122]:
# offsets in rows; the names are labels only, the loop iterates over the values
delta_dict <- list(7,14,21,28)
names(delta_dict) <- c("delta_7","delta_14","delta_21","delta_28")

In [123]:
melted_dfs <- list()
for (i in delta_dict)
{
  # paste() inserts a space: "delta_ 7", "delta_ 14", ...
  key <- paste("delta_",i)
  transit <- lapply(pivoted_df,lag,n=i)    # prices i rows earlier
  rate_change <-  pivoted_df/transit - 1   # return over the prior i rows
  rate_change$Date <- rownames(rate_change)
  melted_dfs[[key]] <- melt(rate_change, id="Date")
  colnames(melted_dfs[[key]])[2] <- "Name"
  colnames(melted_dfs[[key]])[3] <- key
}
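
One detail of the loop is worth checking: `paste()` separates its arguments with a space by default, which is why the merged table further down has column names such as `delta_ 7`. The small base R sketch below shows the difference; the `offsets` vector is introduced only for this example.

```r
offsets <- c(7, 14, 21, 28)

with_space    <- paste("delta_", offsets)    # default sep = " "
without_space <- paste0("delta_", offsets)   # no separator

print(with_space)     # "delta_ 7"  "delta_ 14" "delta_ 21" "delta_ 28"
print(without_space)  # "delta_7"   "delta_14"  "delta_21"  "delta_28"
```

Swapping `paste()` for `paste0()` in the loop would give names without the space; the outputs shown in this notebook come from the `paste()` version.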

### Create a separate set for the 7-day-forward offset

In [124]:
# We need a separate section to shift 7 rows forward with lead() (to act as a future predictor)
transit <- lapply(pivoted_df,lead,n=7)
# note: this is today's price relative to the price 7 rows ahead;
# the usual forward return is the inverse ratio (price 7 rows ahead / price today - 1)
return_7 <-  pivoted_df/transit - 1
return_7$Date <- rownames(return_7)
melted_dfs[["return_7"]] <- melt(return_7, id="Date")
colnames(melted_dfs[["return_7"]])[2] <- "Name"
colnames(melted_dfs[["return_7"]])[3] <- "return_7"
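
To make the direction of the ratio concrete, here is a small sketch with two AAPL closes from this notebook. It suggests that `pivoted_df/transit - 1` measures today's price relative to the price 7 rows ahead, whereas the more common forward-return convention divides the other way round. The variable names are illustrative only.

```r
price_today <- 116.15  # AAPL close on 1/3/2017
price_in_7  <- 119.25  # AAPL close on 1/12/2017, seven rows ahead

as_in_notebook <- price_today / price_in_7 - 1   # same direction as return_7 above
forward_return <- price_in_7 / price_today - 1   # conventional forward return

print(round(as_in_notebook, 6))  # -0.025996
print(round(forward_return, 6))  # 0.02669
```

The first value matches the `return_7` entry for AAPL on 1/3/2017 in the final ABT, so anyone using `return_7` as a prediction target may want to keep the sign convention in mind.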

### Use `Reduce()` to merge all dataframes (representing intervals) in the list into one single ABT
`Reduce()` is a much quicker way to merge the dataframes than writing a long-winded function to iterate through the list: it applies `merge()` pairwise and carries the merged result forward.
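
A minimal sketch of the pattern on three toy data frames, using only base R. The dates `d1`/`d2` and all numbers are made up for illustration.

```r
# Three small feature tables sharing the keys Date and Name (toy values)
a  <- data.frame(Date = c("d1", "d2"), Name = "AAPL", delta_7  = c(0.01, 0.02))
b  <- data.frame(Date = c("d1", "d2"), Name = "AAPL", delta_14 = c(0.03, NA))
c3 <- data.frame(Date = "d1",          Name = "AAPL", return_7 = -0.01)

# Reduce() folds the list: merge(merge(a, b), c3)
merged <- Reduce(
  function(x, y) merge(x, y, by = c("Date", "Name"), all = TRUE),
  list(a, b, c3)
)
print(merged)
```

Because of `all = TRUE` (a full outer join), the `d2` row is kept even though `c3` has no entry for it; its `return_7` is simply `NA`.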

In [125]:
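# Reduce() works with merge() to combine features into an analytical base table;
# all=TRUE keeps every (Date, Name) pair and fills missing features with NA
abt <- Reduce(function(x,y) merge(x,y,by=c("Date","Name"),all=TRUE),melted_dfs)

# display examples from the ABT
# (merge() sorts Date as text, so the tail shows September rather than December)
tail(abt, 10)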

Unnamed: 0,Date,Name,delta_ 7,delta_ 14,delta_ 21,delta_ 28,return_7
995,9/6/2017,GOOGL,0.012380441,-0.002382793,-0.00394396,-0.011013008,0.0071956292
996,9/6/2017,MSFT,0.007964845,-0.003394433,0.013812155,0.003280481,-0.0253618377
997,9/7/2017,AAPL,-0.001300551,0.021538072,0.007371314,0.078662207,0.0163231865
998,9/7/2017,AMZN,0.03535866,0.019675817,-0.010476441,-0.03977295,0.0054198873
999,9/7/2017,GOOGL,0.023444992,0.023963521,0.006036921,-0.008806987,0.021661737
1000,9/7/2017,MSFT,0.020733214,0.02679558,0.021294134,0.017798467,-0.0109100585
1001,9/8/2017,AAPL,-0.026272175,0.007174603,-0.015087545,0.066563572,-0.0006300006
1002,9/8/2017,AMZN,0.012410121,0.007751938,-0.016405128,-0.022150681,-0.0040830635
1003,9/8/2017,GOOGL,0.006048624,0.016443888,0.001414773,-0.004325754,0.0048566488
1004,9/8/2017,MSFT,0.012731006,0.020554559,0.020836208,0.017606602,-0.0193531283


In [126]:
# grab the base features (Date, Name, Volume, Close) from the original dataset
# note: df still holds every stock, so the all=TRUE merge also keeps rows for the other tickers;
# select from df_stock instead to keep only the four tech stocks
base_df <- select(df,'Date', 'Name', 'Volume', 'Close')

In [127]:
# add the base features to the list of dataframes to merge
melted_dfs[['base']] <- base_df

In [128]:
# reduce-merge features into analytical base table
abt <- Reduce(function(x,y) merge(x,y,by=c("Date","Name"),all=TRUE),melted_dfs)
# sort by stock, then chronologically within each stock
abt <- abt[order(abt$Name, as.Date(abt$Date, format="%m/%d/%Y")), ]
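
Sorting on `as.Date()` rather than on the raw `Date` column matters because the dates are stored as `m/d/Y` text, which does not sort chronologically; that is most likely why the earlier `tail(abt, 10)` ends in September. A minimal base R sketch with three dates from this notebook:

```r
dates <- c("9/8/2017", "1/10/2017", "12/29/2017")

# Parse the m/d/Y text into Date objects, then order by the parsed values
parsed <- as.Date(dates, format = "%m/%d/%Y")
chronological <- dates[order(parsed)]

print(chronological)  # "1/10/2017" "9/8/2017" "12/29/2017"
```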

In [130]:
# display examples from the ABT 
head(abt,10)

Unnamed: 0,Date,Name,delta_ 7,delta_ 14,delta_ 21,delta_ 28,return_7,Volume,Close
404,1/3/2017,AAPL,,,,,-0.025995807,28781865,116.15
497,1/4/2017,AAPL,,,,,-0.025369624,21118116,116.02
528,1/5/2017,AAPL,,,,,-0.02825,22193587,116.61
559,1/6/2017,AAPL,,,,,-0.017334778,31751900,117.91
590,1/9/2017,AAPL,,,,,-0.006595425,33561948,118.99
1,1/10/2017,AAPL,,,,,-0.007416667,24462051,119.11
32,1/11/2017,AAPL,,,,,-0.002748168,27588593,119.75
63,1/12/2017,AAPL,0.02668963,,,,-0.0060015,27086220,119.25
94,1/13/2017,AAPL,0.02602999,,,,-0.023301608,26111948,119.04
125,1/17/2017,AAPL,0.02907126,,,,-0.015909464,34439843,120.0


**We now have an ABT ready for analysis in analytics tools such as Tableau.**