### Predicting stock price movements: Part 1 of 2

This program creates a data set to be used in machine learning models
to predict the future direction of stock prices. The general approach used in prediction is based on a paper by Jim Kyung-Soo Liew and Boris Mayster: "Forecasting ETFs with Machine Learning Algorithms" (2017). 

This program gets the raw data from yahoo, transforms it into a dataset that can directly be fed into scikit-learn for prediction, and writes the transformed dataset in a CSV file.


The inputs to the program are (1) set of ticker symbols, and (2) timeframe, i.e, the beginning and end dates. We also need N (number of days to be used to calculate returns and average volumes) and j (number of lags of N-day returns and average volumes). In this program, N = 20 and j = 10. The same time frame will be used for all tickers and all tickers need to have data for the specified timeframe to avoid issues related to missing data.

Initial version: February 21, 2017 (APPL data only)
Next revision: June 22, 2017  (Generalizes to a set of tickers)
This version: July 7, 2017 - creates an output in "wide" format so that for each day, there are a number of columns per Ticker. The ticker is part of the column name so that it's easy to identify tickers.

Note: yahoo data portal may not be available in the future. I had to do the "workaround" below - fix_yahoo_finance to get it to work in July 2017

Second note: yahoo finance API no longer sems to be available (as of early 2018). Quandl.com has similar publicly-available data for stocks (but not for ETFs).

Murat Aydogdu

In [1]:
from IPython.display import display
import pandas as pd
import numpy as np
import datetime

In [2]:
pd.set_option('display.max_columns', None)
#pd.options.display.max_rows = 1000
pd.options.display.float_format = '{:20,.2f}'.format

In [65]:
from pandas_datareader import data as pdr
import fix_yahoo_finance  # This is the "temporary" fix to get the data from yahoo
tickers = ["AAPL", "GOOGL","AMZN","MSFT","IBM"]

dt = pdr.get_data_yahoo(tickers, start="1999-12-01", end="2017-06-30")

[*********************100%***********************]  5 of 5 downloaded

In [66]:
# This is panel data, not a data frame
dt

<class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 4424 (major_axis) x 5 (minor_axis)
Items axis: Open to Volume
Major_axis axis: 1999-12-01 00:00:00 to 2017-06-30 00:00:00
Minor_axis axis: AAPL to MSFT

In [67]:
# Convert panel to data frame
dtdf = dt.to_frame()
display(dtdf)

Unnamed: 0_level_0,Unnamed: 1_level_0,Open,High,Low,Close,Adj Close,Volume
Date,minor,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1999-12-01,AAPL,3.61,3.73,3.57,3.32,3.68,154641200.00
1999-12-01,AMZN,87.25,87.88,81.97,85.00,85.00,10663600.00
1999-12-01,IBM,102.56,104.44,102.25,77.49,103.42,5336400.00
1999-12-01,MSFT,45.53,46.97,45.44,31.27,46.59,48864200.00
1999-12-02,AAPL,3.68,3.95,3.63,3.55,3.94,141839600.00
1999-12-02,AMZN,86.00,91.31,85.62,89.06,89.06,9538700.00
1999-12-02,IBM,103.44,106.31,103.38,78.87,105.27,6216900.00
1999-12-02,MSFT,46.53,47.62,46.44,31.81,47.41,55473800.00
1999-12-03,AAPL,4.01,4.13,4.00,3.71,4.11,161980000.00
1999-12-03,AMZN,92.50,93.38,86.06,86.56,86.56,11151200.00


In [22]:
# Save raw data to a CSV first
#dtdf.to_csv('RAWDATA.CSV')
# March 7, 2018: yahoo finance is no longer working.
# Similar data is available on quandl for common stocks.
# Use the existing raw data file from a previous run of the 
# data acquisition step above.
dtdf = pd.read_csv('RAWDATA.CSV')

In [23]:
# Transform and keep only adjusted prices and dollar volume
# Use dollar volume so that we can compare across securities
dtdf['P'] = dtdf['Adj Close']
dtdf['V'] = dtdf['Volume'] * dtdf['Close'] 
display(dtdf)

Unnamed: 0,Date,Ticker,Open,High,Low,Close,Adj Close,Volume,P,V
0,1999-12-01,AAPL,3.61,3.73,3.57,3.32,3.68,154641200.00,3.68,514098793.03
1,1999-12-01,AMZN,87.25,87.88,81.97,85.00,85.00,10663600.00,85.00,906406000.00
2,1999-12-01,IBM,102.56,104.44,102.25,77.49,103.42,5336400.00,103.42,413506877.82
3,1999-12-01,MSFT,45.53,46.97,45.44,31.27,46.59,48864200.00,46.59,1527855705.25
4,1999-12-02,AAPL,3.68,3.95,3.63,3.55,3.94,141839600.00,3.94,504139213.72
5,1999-12-02,AMZN,86.00,91.31,85.62,89.06,89.06,9538700.00,89.06,849540468.75
6,1999-12-02,IBM,103.44,106.31,103.38,78.87,105.27,6216900.00,105.27,490323409.10
7,1999-12-02,MSFT,46.53,47.62,46.44,31.81,47.41,55473800.00,47.41,1764767363.15
8,1999-12-03,AAPL,4.01,4.13,4.00,3.71,4.11,161980000.00,4.11,600869021.48
9,1999-12-03,AMZN,92.50,93.38,86.06,86.56,86.56,11151200.00,86.56,965275750.00


In [None]:
# March 7, 2018: When starting with the flatfile (RAWDATA.CSV), this step is not needed
# Rename indices Date: level 0 and Ticker: level 1
#dtdf.index = dtdf.index.set_names('Date', level=0) 
#dtdf.index = dtdf.index.set_names('Ticker', level=1)

# Drop columns other than adjusted close and dollar volume
#dtdf.drop(dtdf.columns[[0,1,2,3,4,5]], inplace = True, axis=1)
# Sort by ticker
#dtdf.sort_index(level=1, ascending = True, inplace=True)
#display(dtdf)

In [24]:
# Starting with flatfile:
# Drop columns other than Date, Ticker, Adjusted closing price and Dollar volume
dtdf.drop(dtdf.columns[[2,3,4,5,6,7]], inplace = True, axis=1)

In [25]:
dtdf.sort_values(['Ticker', 'Date'], ascending=[True, True], inplace=True)
display(dtdf)

Unnamed: 0,Date,Ticker,P,V
0,1999-12-01,AAPL,3.68,514098793.03
4,1999-12-02,AAPL,3.94,504139213.72
8,1999-12-03,AAPL,4.11,600869021.48
12,1999-12-06,AAPL,4.14,436649612.25
16,1999-12-07,AAPL,4.21,422797351.29
20,1999-12-08,AAPL,3.93,365987679.69
24,1999-12-09,AAPL,3.76,725854559.39
28,1999-12-10,AAPL,3.68,529731959.78
32,1999-12-13,AAPL,3.54,423097228.19
36,1999-12-14,AAPL,3.39,333479975.37


Construct variables using lags and leads.

Today is day *t*. For each day *t*, we need P~t~, P~t-N~, P~t+N~, and AV~t~
 
 AV~t~ is the average dollar volume of last N days, ending (i.e., including) today. This is parallel to how returns are calculated.

In [26]:
# m : minus, p: plus
# Negative values can be used for leads ("forward lags")
# Return will be measured based on Pt and Pt-N
# Average dollar volume will be measured based on Vt through Vt-N

N = 20
lagN = N*1
leadN = N*-1

# Price: we just need the N-lead and N-lag values
cname = 'P'+'m'+str(N).zfill(2)
dtdf[cname] = dtdf.groupby('Ticker')['P'].shift(lagN)
cname = 'P'+'p'+str(N).zfill(2)
dtdf[cname] = dtdf.groupby('Ticker')['P'].shift(leadN)
cname = 'AV'
# Average dollar volume of last N days, ending (including) today
dtdf[cname] = dtdf.groupby('Ticker')['V'].rolling(lagN).mean().reset_index(0,drop=True)    
display(dtdf)

Unnamed: 0,Date,Ticker,P,V,Pm20,Pp20,AV
0,1999-12-01,AAPL,3.68,514098793.03,,3.58,
4,1999-12-02,AAPL,3.94,504139213.72,,3.67,
8,1999-12-03,AAPL,4.11,600869021.48,,4.00,
12,1999-12-06,AAPL,4.14,436649612.25,,3.66,
16,1999-12-07,AAPL,4.21,422797351.29,,3.71,
20,1999-12-08,AAPL,3.93,365987679.69,,3.39,
24,1999-12-09,AAPL,3.76,725854559.39,,3.55,
28,1999-12-10,AAPL,3.68,529731959.78,,3.49,
32,1999-12-13,AAPL,3.54,423097228.19,,3.31,
36,1999-12-14,AAPL,3.39,333479975.37,,3.11,


In [27]:
# Compute the N-day return 
#  and the sign of the N-period ahead price movement

dtdf['R'] = (dtdf['P'] / dtdf['Pm20']) - 1
dtdf['Y'] = (dtdf['Pp20'] >= dtdf['P']).astype(int)

dtdf = dtdf.dropna()
display(dtdf)

Unnamed: 0,Date,Ticker,P,V,Pm20,Pp20,AV,R,Y
80,1999-12-30,AAPL,3.58,167566808.43,3.68,3.63,363007272.51,-0.03,1
84,1999-12-31,AAPL,3.67,135815743.06,3.94,3.71,344591098.97,-0.07,1
88,2000-01-03,AAPL,4.00,483655734.41,4.11,3.58,338730434.62,-0.03,0
92,2000-01-04,AAPL,3.66,423520692.32,4.14,3.53,338073988.62,-0.12,0
96,2000-01-05,AAPL,3.71,652759257.04,4.21,3.69,349572083.91,-0.12,0
100,2000-01-06,AAPL,3.39,588342234.14,3.93,3.86,360689811.63,-0.14,1
104,2000-01-07,AAPL,3.55,369687177.83,3.76,4.07,342881442.56,-0.05,1
108,2000-01-10,AAPL,3.49,398128945.80,3.68,4.10,336301291.86,-0.05,1
112,2000-01-11,AAPL,3.31,330257970.38,3.54,4.02,331659328.97,-0.06,1
116,2000-01-12,AAPL,3.11,686270313.02,3.39,4.05,349298845.85,-0.08,1


In [28]:
pd.options.mode.chained_assignment = None  # default='warn'
# j-lags of returns and average volumes
J = 10
vars = ['R','AV']
for i in vars:
    for j in range (1,J+1):
        cname = i+str(j).zfill(2)
        dtdf[cname] = dtdf[i].shift(j)
display(dtdf)  

Unnamed: 0,Date,Ticker,P,V,Pm20,Pp20,AV,R,Y,R01,R02,R03,R04,R05,R06,R07,R08,R09,R10,AV01,AV02,AV03,AV04,AV05,AV06,AV07,AV08,AV09,AV10
80,1999-12-30,AAPL,3.58,167566808.43,3.68,3.63,363007272.51,-0.03,1,,,,,,,,,,,,,,,,,,,,
84,1999-12-31,AAPL,3.67,135815743.06,3.94,3.71,344591098.97,-0.07,1,-0.03,,,,,,,,,,363007272.51,,,,,,,,,
88,2000-01-03,AAPL,4.00,483655734.41,4.11,3.58,338730434.62,-0.03,0,-0.07,-0.03,,,,,,,,,344591098.97,363007272.51,,,,,,,,
92,2000-01-04,AAPL,3.66,423520692.32,4.14,3.53,338073988.62,-0.12,0,-0.03,-0.07,-0.03,,,,,,,,338730434.62,344591098.97,363007272.51,,,,,,,
96,2000-01-05,AAPL,3.71,652759257.04,4.21,3.69,349572083.91,-0.12,0,-0.12,-0.03,-0.07,-0.03,,,,,,,338073988.62,338730434.62,344591098.97,363007272.51,,,,,,
100,2000-01-06,AAPL,3.39,588342234.14,3.93,3.86,360689811.63,-0.14,1,-0.12,-0.12,-0.03,-0.07,-0.03,,,,,,349572083.91,338073988.62,338730434.62,344591098.97,363007272.51,,,,,
104,2000-01-07,AAPL,3.55,369687177.83,3.76,4.07,342881442.56,-0.05,1,-0.14,-0.12,-0.12,-0.03,-0.07,-0.03,,,,,360689811.63,349572083.91,338073988.62,338730434.62,344591098.97,363007272.51,,,,
108,2000-01-10,AAPL,3.49,398128945.80,3.68,4.10,336301291.86,-0.05,1,-0.05,-0.14,-0.12,-0.12,-0.03,-0.07,-0.03,,,,342881442.56,360689811.63,349572083.91,338073988.62,338730434.62,344591098.97,363007272.51,,,
112,2000-01-11,AAPL,3.31,330257970.38,3.54,4.02,331659328.97,-0.06,1,-0.05,-0.05,-0.14,-0.12,-0.12,-0.03,-0.07,-0.03,,,336301291.86,342881442.56,360689811.63,349572083.91,338073988.62,338730434.62,344591098.97,363007272.51,,
116,2000-01-12,AAPL,3.11,686270313.02,3.39,4.05,349298845.85,-0.08,1,-0.06,-0.05,-0.05,-0.14,-0.12,-0.12,-0.03,-0.07,-0.03,,331659328.97,336301291.86,342881442.56,360689811.63,349572083.91,338073988.62,338730434.62,344591098.97,363007272.51,


In [29]:
dtdf = dtdf.dropna()

In [30]:
# All we need are the returns and volumes, 
# plus the target variable Y (future N-day price movement)

# July 14: We want the latest return R, and 10 of its lags
# and same for AV. So keep R and AV
# There will be 11 return / volume features 
# dtdf = dtdf.drop(['P','V','Pm20','Pp20', 'AV', 'R'], 1)
dtdf = dtdf.drop(['P','V','Pm20','Pp20'], 1)

In [31]:
columns = dtdf.columns.tolist()
columns

['Date',
 'Ticker',
 'AV',
 'R',
 'Y',
 'R01',
 'R02',
 'R03',
 'R04',
 'R05',
 'R06',
 'R07',
 'R08',
 'R09',
 'R10',
 'AV01',
 'AV02',
 'AV03',
 'AV04',
 'AV05',
 'AV06',
 'AV07',
 'AV08',
 'AV09',
 'AV10']

In [32]:
columns = [#'index',
 'Date','Ticker','Y',
 'R', 'R01', 'R02', 'R03', 'R04', 'R05', 'R06', 'R07', 'R08', 'R09', 'R10',
 'AV', 'AV01', 'AV02', 'AV03', 'AV04', 'AV05', 'AV06', 'AV07', 'AV08', 'AV09', 'AV10']
dtdf = dtdf[columns]

In [33]:
# Convert Date and Ticker into columns - not necessary if from flatfile
#dtdf.reset_index(inplace=True)
display(dtdf)

Unnamed: 0,Date,Ticker,Y,R,R01,R02,R03,R04,R05,R06,R07,R08,R09,R10,AV,AV01,AV02,AV03,AV04,AV05,AV06,AV07,AV08,AV09,AV10
120,2000-01-13,AAPL,1,-0.00,-0.08,-0.06,-0.05,-0.05,-0.14,-0.12,-0.12,-0.03,-0.07,-0.03,365218945.34,349298845.85,331659328.97,336301291.86,342881442.56,360689811.63,349572083.91,338073988.62,338730434.62,344591098.97,363007272.51
124,2000-01-14,AAPL,1,0.02,-0.00,-0.08,-0.06,-0.05,-0.05,-0.14,-0.12,-0.12,-0.03,-0.07,362641843.67,365218945.34,349298845.85,331659328.97,336301291.86,342881442.56,360689811.63,349572083.91,338073988.62,338730434.62,344591098.97
128,2000-01-18,AAPL,1,0.04,0.02,-0.00,-0.08,-0.06,-0.05,-0.05,-0.14,-0.12,-0.12,-0.03,361926200.44,362641843.67,365218945.34,349298845.85,331659328.97,336301291.86,342881442.56,360689811.63,349572083.91,338073988.62,338730434.62
132,2000-01-19,AAPL,1,0.09,0.04,0.02,-0.00,-0.08,-0.06,-0.05,-0.05,-0.14,-0.12,-0.12,376383516.50,361926200.44,362641843.67,365218945.34,349298845.85,331659328.97,336301291.86,342881442.56,360689811.63,349572083.91,338073988.62
136,2000-01-20,AAPL,1,0.11,0.09,0.04,0.02,-0.00,-0.08,-0.06,-0.05,-0.05,-0.14,-0.12,447471279.14,376383516.50,361926200.44,362641843.67,365218945.34,349298845.85,331659328.97,336301291.86,342881442.56,360689811.63,349572083.91
140,2000-01-21,AAPL,0,0.11,0.11,0.09,0.04,0.02,-0.00,-0.08,-0.06,-0.05,-0.05,-0.14,456549821.86,447471279.14,376383516.50,361926200.44,362641843.67,365218945.34,349298845.85,331659328.97,336301291.86,342881442.56,360689811.63
144,2000-01-24,AAPL,1,0.03,0.11,0.11,0.09,0.04,0.02,-0.00,-0.08,-0.06,-0.05,-0.05,465858524.74,456549821.86,447471279.14,376383516.50,361926200.44,362641843.67,365218945.34,349298845.85,331659328.97,336301291.86,342881442.56
148,2000-01-25,AAPL,1,0.13,0.03,0.11,0.11,0.09,0.04,0.02,-0.00,-0.08,-0.06,-0.05,481616413.84,465858524.74,456549821.86,447471279.14,376383516.50,361926200.44,362641843.67,365218945.34,349298845.85,331659328.97,336301291.86
152,2000-01-26,AAPL,1,0.12,0.13,0.03,0.11,0.11,0.09,0.04,0.02,-0.00,-0.08,-0.06,488127199.61,481616413.84,465858524.74,456549821.86,447471279.14,376383516.50,361926200.44,362641843.67,365218945.34,349298845.85,331659328.97
156,2000-01-27,AAPL,1,0.09,0.12,0.13,0.03,0.11,0.11,0.09,0.04,0.02,-0.00,-0.08,491663343.83,488127199.61,481616413.84,465858524.74,456549821.86,447471279.14,376383516.50,361926200.44,362641843.67,365218945.34,349298845.85


In [34]:
# Find the intersection of minimum and maximum of dates across tickers
# then use this dataframe for all observations

mindate = dtdf.groupby('Ticker')['Date'].min()
maxdate = dtdf.groupby('Ticker')['Date'].max()

begdate = max(mindate)
enddate = min(maxdate)

print begdate, enddate

2004-09-17 2017-06-02


In [35]:
df1 = dtdf.where((dtdf['Date'] >= begdate) & (dtdf['Date'] <= enddate))\
      .dropna(axis=0, how='all')
display(df1)

Unnamed: 0,Date,Ticker,Y,R,R01,R02,R03,R04,R05,R06,R07,R08,R09,R10,AV,AV01,AV02,AV03,AV04,AV05,AV06,AV07,AV08,AV09,AV10
4836,2004-09-17,AAPL,1.00,0.21,0.15,0.14,0.15,0.15,0.18,0.15,0.15,0.18,0.18,0.14,219559758.92,214147068.13,208767832.07,210204598.47,213725751.70,213791432.11,209843079.33,204623590.50,203476436.62,201875425.28,205360390.52
4841,2004-09-20,AAPL,1.00,0.22,0.21,0.15,0.14,0.15,0.15,0.18,0.15,0.15,0.18,0.18,219142114.70,219559758.92,214147068.13,208767832.07,210204598.47,213725751.70,213791432.11,209843079.33,204623590.50,203476436.62,201875425.28
4846,2004-09-21,AAPL,1.00,0.22,0.22,0.21,0.15,0.14,0.15,0.15,0.18,0.15,0.15,0.18,224611086.39,219142114.70,219559758.92,214147068.13,208767832.07,210204598.47,213725751.70,213791432.11,209843079.33,204623590.50,203476436.62
4851,2004-09-22,AAPL,1.00,0.16,0.22,0.22,0.21,0.15,0.14,0.15,0.15,0.18,0.15,0.15,226930891.07,224611086.39,219142114.70,219559758.92,214147068.13,208767832.07,210204598.47,213725751.70,213791432.11,209843079.33,204623590.50
4856,2004-09-23,AAPL,1.00,0.13,0.16,0.22,0.22,0.21,0.15,0.14,0.15,0.15,0.18,0.15,225399146.59,226930891.07,224611086.39,219142114.70,219559758.92,214147068.13,208767832.07,210204598.47,213725751.70,213791432.11,209843079.33
4861,2004-09-24,AAPL,1.00,0.08,0.13,0.16,0.22,0.22,0.21,0.15,0.14,0.15,0.15,0.18,209793465.19,225399146.59,226930891.07,224611086.39,219142114.70,219559758.92,214147068.13,208767832.07,210204598.47,213725751.70,213791432.11
4866,2004-09-27,AAPL,1.00,0.09,0.08,0.13,0.16,0.22,0.22,0.21,0.15,0.14,0.15,0.15,211053920.92,209793465.19,225399146.59,226930891.07,224611086.39,219142114.70,219559758.92,214147068.13,208767832.07,210204598.47,213725751.70
4871,2004-09-28,AAPL,1.00,0.11,0.09,0.08,0.13,0.16,0.22,0.22,0.21,0.15,0.14,0.15,215886136.18,211053920.92,209793465.19,225399146.59,226930891.07,224611086.39,219142114.70,219559758.92,214147068.13,208767832.07,210204598.47
4876,2004-09-29,AAPL,1.00,0.12,0.11,0.09,0.08,0.13,0.16,0.22,0.22,0.21,0.15,0.14,213944096.10,215886136.18,211053920.92,209793465.19,225399146.59,226930891.07,224611086.39,219142114.70,219559758.92,214147068.13,208767832.07
4881,2004-09-30,AAPL,1.00,0.08,0.12,0.11,0.09,0.08,0.13,0.16,0.22,0.22,0.21,0.15,212311312.92,213944096.10,215886136.18,211053920.92,209793465.19,225399146.59,226930891.07,224611086.39,219142114.70,219559758.92,214147068.13


In [36]:
# Convert to "wide format" so that for each day, 
# there are a number of columns,
# and each ticekr's name is atteched to its respective columns
df2 = df1.pivot(index='Date', columns='Ticker')
df2.columns = [' '.join(col).strip() for col in df2.columns.values]
df2.reset_index(inplace=True)

display(df2)

Unnamed: 0,Date,Y AAPL,Y AMZN,Y GOOGL,Y IBM,Y MSFT,R AAPL,R AMZN,R GOOGL,R IBM,R MSFT,R01 AAPL,R01 AMZN,R01 GOOGL,R01 IBM,R01 MSFT,R02 AAPL,R02 AMZN,R02 GOOGL,R02 IBM,R02 MSFT,R03 AAPL,R03 AMZN,R03 GOOGL,R03 IBM,R03 MSFT,R04 AAPL,R04 AMZN,R04 GOOGL,R04 IBM,R04 MSFT,R05 AAPL,R05 AMZN,R05 GOOGL,R05 IBM,R05 MSFT,R06 AAPL,R06 AMZN,R06 GOOGL,R06 IBM,R06 MSFT,R07 AAPL,R07 AMZN,R07 GOOGL,R07 IBM,R07 MSFT,R08 AAPL,R08 AMZN,R08 GOOGL,R08 IBM,R08 MSFT,R09 AAPL,R09 AMZN,R09 GOOGL,R09 IBM,R09 MSFT,R10 AAPL,R10 AMZN,R10 GOOGL,R10 IBM,R10 MSFT,AV AAPL,AV AMZN,AV GOOGL,AV IBM,AV MSFT,AV01 AAPL,AV01 AMZN,AV01 GOOGL,AV01 IBM,AV01 MSFT,AV02 AAPL,AV02 AMZN,AV02 GOOGL,AV02 IBM,AV02 MSFT,AV03 AAPL,AV03 AMZN,AV03 GOOGL,AV03 IBM,AV03 MSFT,AV04 AAPL,AV04 AMZN,AV04 GOOGL,AV04 IBM,AV04 MSFT,AV05 AAPL,AV05 AMZN,AV05 GOOGL,AV05 IBM,AV05 MSFT,AV06 AAPL,AV06 AMZN,AV06 GOOGL,AV06 IBM,AV06 MSFT,AV07 AAPL,AV07 AMZN,AV07 GOOGL,AV07 IBM,AV07 MSFT,AV08 AAPL,AV08 AMZN,AV08 GOOGL,AV08 IBM,AV08 MSFT,AV09 AAPL,AV09 AMZN,AV09 GOOGL,AV09 IBM,AV09 MSFT,AV10 AAPL,AV10 AMZN,AV10 GOOGL,AV10 IBM,AV10 MSFT
0,2004-09-17,1.00,0.00,1.00,0.00,1.00,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,0.15,0.12,0.05,0.03,0.01,0.15,0.11,0.05,0.03,0.01,0.18,0.06,0.08,0.06,0.02,0.15,0.04,0.08,0.03,-0.00,0.15,0.02,0.08,0.01,-0.02,0.18,0.09,0.07,0.02,0.01,0.18,0.09,0.07,0.01,-0.00,0.14,0.10,0.07,-0.01,0.00,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35,210204598.47,294829499.08,3221311366.90,257153246.13,879542110.67,213725751.70,276458134.97,3208938125.76,258064698.59,877284133.29,213791432.11,269322060.58,3305496494.59,260827920.41,872290144.11,209843079.33,276929220.24,3476070785.80,270901977.21,868888248.06,204623590.50,280958269.06,3480190122.43,272912432.90,865451049.19,203476436.62,287074085.06,3478905651.54,268223778.32,870221627.50,201875425.28,292031919.35,3514978060.75,266772942.72,876251112.86,205360390.52,302333536.62,3528430379.53,272257300.66,903231652.31
1,2004-09-20,1.00,0.00,1.00,1.00,1.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,0.15,0.12,0.05,0.03,0.01,0.15,0.11,0.05,0.03,0.01,0.18,0.06,0.08,0.06,0.02,0.15,0.04,0.08,0.03,-0.00,0.15,0.02,0.08,0.01,-0.02,0.18,0.09,0.07,0.02,0.01,0.18,0.09,0.07,0.01,-0.00,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35,210204598.47,294829499.08,3221311366.90,257153246.13,879542110.67,213725751.70,276458134.97,3208938125.76,258064698.59,877284133.29,213791432.11,269322060.58,3305496494.59,260827920.41,872290144.11,209843079.33,276929220.24,3476070785.80,270901977.21,868888248.06,204623590.50,280958269.06,3480190122.43,272912432.90,865451049.19,203476436.62,287074085.06,3478905651.54,268223778.32,870221627.50,201875425.28,292031919.35,3514978060.75,266772942.72,876251112.86
2,2004-09-21,1.00,0.00,1.00,1.00,1.00,0.22,0.10,0.08,0.01,0.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,0.15,0.12,0.05,0.03,0.01,0.15,0.11,0.05,0.03,0.01,0.18,0.06,0.08,0.06,0.02,0.15,0.04,0.08,0.03,-0.00,0.15,0.02,0.08,0.01,-0.02,0.18,0.09,0.07,0.02,0.01,224611086.39,305412423.55,449781859.73,262897371.99,910145968.56,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35,210204598.47,294829499.08,3221311366.90,257153246.13,879542110.67,213725751.70,276458134.97,3208938125.76,258064698.59,877284133.29,213791432.11,269322060.58,3305496494.59,260827920.41,872290144.11,209843079.33,276929220.24,3476070785.80,270901977.21,868888248.06,204623590.50,280958269.06,3480190122.43,272912432.90,865451049.19,203476436.62,287074085.06,3478905651.54,268223778.32,870221627.50
3,2004-09-22,1.00,0.00,1.00,1.00,1.00,0.16,0.06,0.13,-0.00,-0.00,0.22,0.10,0.08,0.01,0.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,0.15,0.12,0.05,0.03,0.01,0.15,0.11,0.05,0.03,0.01,0.18,0.06,0.08,0.06,0.02,0.15,0.04,0.08,0.03,-0.00,0.15,0.02,0.08,0.01,-0.02,226930891.07,315771102.75,432226256.27,270429744.44,935367207.57,224611086.39,305412423.55,449781859.73,262897371.99,910145968.56,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35,210204598.47,294829499.08,3221311366.90,257153246.13,879542110.67,213725751.70,276458134.97,3208938125.76,258064698.59,877284133.29,213791432.11,269322060.58,3305496494.59,260827920.41,872290144.11,209843079.33,276929220.24,3476070785.80,270901977.21,868888248.06,204623590.50,280958269.06,3480190122.43,272912432.90,865451049.19
4,2004-09-23,1.00,0.00,1.00,1.00,1.00,0.13,0.04,0.14,-0.01,-0.01,0.16,0.06,0.13,-0.00,-0.00,0.22,0.10,0.08,0.01,0.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,0.15,0.12,0.05,0.03,0.01,0.15,0.11,0.05,0.03,0.01,0.18,0.06,0.08,0.06,0.02,0.15,0.04,0.08,0.03,-0.00,225399146.59,316652788.35,433659678.40,271508563.29,933743930.15,226930891.07,315771102.75,432226256.27,270429744.44,935367207.57,224611086.39,305412423.55,449781859.73,262897371.99,910145968.56,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35,210204598.47,294829499.08,3221311366.90,257153246.13,879542110.67,213725751.70,276458134.97,3208938125.76,258064698.59,877284133.29,213791432.11,269322060.58,3305496494.59,260827920.41,872290144.11,209843079.33,276929220.24,3476070785.80,270901977.21,868888248.06
5,2004-09-24,1.00,0.00,1.00,1.00,1.00,0.08,0.02,0.11,-0.00,-0.01,0.13,0.04,0.14,-0.01,-0.01,0.16,0.06,0.13,-0.00,-0.00,0.22,0.10,0.08,0.01,0.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,0.15,0.12,0.05,0.03,0.01,0.15,0.11,0.05,0.03,0.01,0.18,0.06,0.08,0.06,0.02,209793465.19,315346225.14,441859306.99,277231309.77,946899492.84,225399146.59,316652788.35,433659678.40,271508563.29,933743930.15,226930891.07,315771102.75,432226256.27,270429744.44,935367207.57,224611086.39,305412423.55,449781859.73,262897371.99,910145968.56,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35,210204598.47,294829499.08,3221311366.90,257153246.13,879542110.67,213725751.70,276458134.97,3208938125.76,258064698.59,877284133.29,213791432.11,269322060.58,3305496494.59,260827920.41,872290144.11
6,2004-09-27,1.00,0.00,1.00,1.00,1.00,0.09,0.00,0.11,-0.01,-0.01,0.08,0.02,0.11,-0.00,-0.01,0.13,0.04,0.14,-0.01,-0.01,0.16,0.06,0.13,-0.00,-0.00,0.22,0.10,0.08,0.01,0.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,0.15,0.12,0.05,0.03,0.01,0.15,0.11,0.05,0.03,0.01,211053920.92,323652035.81,446270344.13,284324595.16,960039513.83,209793465.19,315346225.14,441859306.99,277231309.77,946899492.84,225399146.59,316652788.35,433659678.40,271508563.29,933743930.15,226930891.07,315771102.75,432226256.27,270429744.44,935367207.57,224611086.39,305412423.55,449781859.73,262897371.99,910145968.56,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35,210204598.47,294829499.08,3221311366.90,257153246.13,879542110.67,213725751.70,276458134.97,3208938125.76,258064698.59,877284133.29
7,2004-09-28,1.00,0.00,1.00,1.00,1.00,0.11,0.03,0.24,0.00,-0.00,0.09,0.00,0.11,-0.01,-0.01,0.08,0.02,0.11,-0.00,-0.01,0.13,0.04,0.14,-0.01,-0.01,0.16,0.06,0.13,-0.00,-0.00,0.22,0.10,0.08,0.01,0.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,0.15,0.12,0.05,0.03,0.01,215886136.18,332347224.53,486748259.86,289538638.45,983495071.98,211053920.92,323652035.81,446270344.13,284324595.16,960039513.83,209793465.19,315346225.14,441859306.99,277231309.77,946899492.84,225399146.59,316652788.35,433659678.40,271508563.29,933743930.15,226930891.07,315771102.75,432226256.27,270429744.44,935367207.57,224611086.39,305412423.55,449781859.73,262897371.99,910145968.56,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35,210204598.47,294829499.08,3221311366.90,257153246.13,879542110.67
8,2004-09-29,1.00,0.00,1.00,1.00,1.00,0.12,0.07,0.28,0.00,0.01,0.11,0.03,0.24,0.00,-0.00,0.09,0.00,0.11,-0.01,-0.01,0.08,0.02,0.11,-0.00,-0.01,0.13,0.04,0.14,-0.01,-0.01,0.16,0.06,0.13,-0.00,-0.00,0.22,0.10,0.08,0.01,0.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,0.14,0.10,0.06,0.03,0.01,213944096.10,340886332.04,574252127.29,292218090.39,992813117.89,215886136.18,332347224.53,486748259.86,289538638.45,983495071.98,211053920.92,323652035.81,446270344.13,284324595.16,960039513.83,209793465.19,315346225.14,441859306.99,277231309.77,946899492.84,225399146.59,316652788.35,433659678.40,271508563.29,933743930.15,226930891.07,315771102.75,432226256.27,270429744.44,935367207.57,224611086.39,305412423.55,449781859.73,262897371.99,910145968.56,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17,208767832.07,296993529.92,3174982857.13,261047814.69,876020800.35
9,2004-09-30,1.00,0.00,1.00,1.00,1.00,0.08,0.07,0.29,0.02,0.01,0.12,0.07,0.28,0.00,0.01,0.11,0.03,0.24,0.00,-0.00,0.09,0.00,0.11,-0.01,-0.01,0.08,0.02,0.11,-0.00,-0.01,0.13,0.04,0.14,-0.01,-0.01,0.16,0.06,0.13,-0.00,-0.00,0.22,0.10,0.08,0.01,0.00,0.22,0.10,0.10,0.01,0.01,0.21,0.11,0.17,0.01,0.01,0.15,0.08,0.07,0.01,-0.01,212311312.92,350074923.88,595947130.78,293984873.82,1014559939.00,213944096.10,340886332.04,574252127.29,292218090.39,992813117.89,215886136.18,332347224.53,486748259.86,289538638.45,983495071.98,211053920.92,323652035.81,446270344.13,284324595.16,960039513.83,209793465.19,315346225.14,441859306.99,277231309.77,946899492.84,225399146.59,316652788.35,433659678.40,271508563.29,933743930.15,226930891.07,315771102.75,432226256.27,270429744.44,935367207.57,224611086.39,305412423.55,449781859.73,262897371.99,910145968.56,219142114.70,301833067.24,478445205.49,263419386.51,878373383.16,219559758.92,295295629.57,508588884.88,263741569.62,873070421.48,214147068.13,302882886.14,3250494371.22,258640513.47,854596244.17


In [37]:
# This dataset will form the basis for the machine learning part of the analysis
#df2.to_csv('Predict_01.CSV')