# Data Preparation

The first step in this project is to download the raw stock data from Alpha Vantage and process it into a form ready for training.

Before starting, you must get an API key from the [Alpha Vantage site](https://www.alphavantage.co/support/#api-key).

## Load some necessary modules

In [None]:
# You only need to run this once on a SageMaker instance or on your PC  
!pip install alpha_vantage

In [38]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
from pandas_datareader import data
import time
import yaml

from alpha_vantage.timeseries import TimeSeries

%matplotlib inline

## Read the user defined settings

In [43]:
with open('settings.yml') as f:
    settings = yaml.safe_load(f)
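
The cells below read three keys from the settings: `tickers`, `data_start`, and `data_end`. A minimal sketch of a sanity check for them follows. The function name `validate_settings` and the example dates are hypothetical and not part of the original notebook; the tickers are the ones that appear in the output further down.

```python
from datetime import date, datetime

REQUIRED_KEYS = ("tickers", "data_start", "data_end")


def _as_date(value):
    """Accept a date, a datetime, or an ISO 'YYYY-MM-DD' string."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return date.fromisoformat(str(value))


def validate_settings(settings):
    """Raise ValueError if required keys are missing or the dates are inconsistent."""
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("settings.yml is missing: {}".format(", ".join(missing)))
    if not settings["tickers"]:
        raise ValueError("'tickers' must list at least one symbol")
    if _as_date(settings["data_start"]) > _as_date(settings["data_end"]):
        raise ValueError("'data_start' must not be later than 'data_end'")
    return settings


# Hypothetical values, shaped like the dictionary returned by yaml.safe_load
example = {
    "tickers": ["BCE", "RCI", "SJR"],
    "data_start": "2015-01-01",  # assumed date, for illustration only
    "data_end": "2019-12-16",    # assumed date, for illustration only
}
validate_settings(example)
```

Calling `validate_settings(settings)` right after loading the file would surface a missing key before any API request is spent.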

## Download the stock history
Enter your Alpha Vantage API key manually in the following cell. The key is unique to each user and is limited to 5 API requests per minute and 500 requests per day, so please don't use someone else's key.
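
Because the key is personal, one option is to keep it out of the notebook and read it from an environment variable instead. The sketch below assumes a variable named `ALPHAVANTAGE_API_KEY`; that name and the helper `load_api_key` are illustrative choices, not something the original notebook defines.

```python
import os


def load_api_key(env_var="ALPHAVANTAGE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {} environment variable to your Alpha Vantage key".format(env_var)
        )
    return key


# Possible usage in the next cell:
# api_key = load_api_key()
```

This way the notebook can be shared or committed without exposing anyone's key.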

In [44]:
# Set your API key (replace the placeholder with your own key)
api_key = 'YOUR_API_KEY'

# Format connection
ts = TimeSeries(key=api_key, output_format='pandas')

# Read saved data if it exists
if os.path.isfile('stock-data.pkl'):
    saved_stocks = pd.read_pickle('stock-data.pkl')
    saved_stock_names = saved_stocks.columns.to_list()
else:
    saved_stock_names = []
    
# Alpha Vantage allows 5 requests per minute, so pause after each API call
stocks = []
for ticker in settings['tickers']:
    if ticker in saved_stock_names:
        print("Reading {} from saved data.".format(ticker))
        stocks.append(saved_stocks.loc[:, [ticker]])
    else:
        print("Reading {} from API.".format(ticker))
        daily, _ = ts.get_daily_adjusted(symbol=ticker, outputsize='full')
        daily = daily.rename(columns={'5. adjusted close': ticker})
        # The index runs from newest to oldest, so slice from data_end back to data_start
        stocks.append(daily.loc[settings['data_end']:settings['data_start'], [ticker]])
        time.sleep(12)  # 60 seconds / 5 requests

# Combine the per-ticker columns and cache the result for the next run
stocks = pd.concat(stocks, axis=1)
stocks.to_pickle('stock-data.pkl')

Reading BCE from API.
Reading RCI from saved data.
Reading SJR from saved data.
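
The limit of 5 requests per minute can also be handled with a small reusable throttle that waits only as long as needed between API calls. This is a sketch under stated assumptions: the `Throttle` class is hypothetical and not part of the `alpha_vantage` package, and it simply enforces a minimum gap of `period / max_calls` seconds between consecutive calls.

```python
import time


class Throttle:
    """Enforce a minimum gap of period / max_calls seconds between calls."""

    def __init__(self, max_calls=5, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = period / max_calls
        self._clock = clock   # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last = None     # time of the previous call, None before the first one

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Possible usage inside the download loop, before each API request:
# throttle = Throttle(max_calls=5, period=60.0)
# throttle.wait()
```

Tickers served from the saved pickle never call `wait()`, so only real API requests are slowed down.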


In [45]:
stocks['2019-12-16':'2019-12-11']

| date | BCE | RCI | SJR |
|---|---|---|---|
| 2019-12-16 | 47.9 | 48.73 | 20.1825 |
| 2019-12-13 | 47.5 | 48.44 | 20.1127 |
| 2019-12-12 | 48.1606 | 48.0 | 20.1625 |
| 2019-12-11 | 48.3878 | 47.87 | 20.2072 |
