# Capstone Project 2
**Predicting stock prices with a neural network**

## Introduction

The objective of this project is to predict stock price movements with a deep learning model. We will analyze daily time series data from three major tech companies: Apple, Google, and Microsoft.

## Data Wrangling
We pull daily stock price data from the web using Quandl's API.

```python
import datetime

import pandas as pd
import quandl
```

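```python
def quandl_stocks(symbol, start_date=(2000, 1, 1), end_date=None):
    """
    Download daily prices for one ticker from Quandl's WIKI Prices dataset.

    symbol is a string representing a stock symbol, e.g. 'AAPL'

    start_date and end_date are tuples of integers representing the year,
    month, and day

    end_date defaults to the current date when None
    """
    # Quandl dataset codes take the form 'DATABASE/DATASET', e.g. 'WIKI/AAPL'.
    # Note: the WIKI dataset stopped receiving updates in early 2018.
    query_list = ['WIKI/' + symbol]

    # Convert the (year, month, day) tuples into datetime.date objects
    start_date = datetime.date(*start_date)
    if end_date:
        end_date = datetime.date(*end_date)
    else:
        end_date = datetime.date.today()

    # Return daily rows in ascending date order as a pandas DataFrame
    return quandl.get(query_list,
                      returns='pandas',
                      start_date=start_date,
                      end_date=end_date,
                      collapse='daily',
                      order='asc')
```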
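
The argument handling inside `quandl_stocks` can be checked without an API key or network access. The sketch below uses a hypothetical helper, `build_query_args`, that mirrors the function's date conversion and returns the values it would pass to `quandl.get` instead of calling it; the `query` key is an illustrative name, not a Quandl parameter.

```python
import datetime


def build_query_args(symbol, start_date=(2000, 1, 1), end_date=None):
    """Hypothetical helper: build the arguments quandl_stocks would pass
    to quandl.get, without making a network request."""
    query_list = ['WIKI/' + symbol]
    start = datetime.date(*start_date)
    # Fall back to today's date when no end date is supplied
    end = datetime.date(*end_date) if end_date else datetime.date.today()
    return {
        'query': query_list,  # illustrative key name
        'start_date': start,
        'end_date': end,
        'collapse': 'daily',
        'order': 'asc',
    }


args = build_query_args('AAPL', (2010, 1, 1), (2018, 12, 31))
print(args['query'])       # ['WIKI/AAPL']
print(args['start_date'])  # 2010-01-01
print(args['end_date'])    # 2018-12-31
```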

```python
# Daily AAPL prices from 2010-01-01 through 2018-12-31
apple_df = quandl_stocks('AAPL', (2010, 1, 1), (2018, 12, 31))
apple_df.head()
```

| Date | WIKI/AAPL - Open | WIKI/AAPL - High | WIKI/AAPL - Low | WIKI/AAPL - Close | WIKI/AAPL - Volume | WIKI/AAPL - Ex-Dividend | WIKI/AAPL - Split Ratio | WIKI/AAPL - Adj. Open | WIKI/AAPL - Adj. High | WIKI/AAPL - Adj. Low | WIKI/AAPL - Adj. Close | WIKI/AAPL - Adj. Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-04 | 213.43 | 214.5 | 212.38 | 214.01 | 17633200.0 | 0.0 | 1.0 | 27.42873 | 27.56624 | 27.29379 | 27.503268 | 123432400.0 |
| 2010-01-05 | 214.6 | 215.59 | 213.25 | 214.38 | 21496600.0 | 0.0 | 1.0 | 27.579091 | 27.70632 | 27.405597 | 27.550818 | 150476200.0 |
| 2010-01-06 | 214.38 | 215.23 | 210.75 | 210.97 | 19720000.0 | 0.0 | 1.0 | 27.550818 | 27.660055 | 27.084312 | 27.112585 | 138040000.0 |
| 2010-01-07 | 211.75 | 212.0 | 209.05 | 210.58 | 17040400.0 | 0.0 | 1.0 | 27.212826 | 27.244955 | 26.865839 | 27.062465 | 119282800.0 |
| 2010-01-08 | 210.3 | 212.0 | 209.06 | 211.98 | 15986100.0 | 0.0 | 1.0 | 27.026481 | 27.244955 | 26.867124 | 27.242385 | 111902700.0 |
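
Because `quandl_stocks` passes a list of dataset codes to `quandl.get`, every column comes back prefixed with its code, as the headers above show (`WIKI/AAPL - Close`). A minimal sketch for stripping that prefix, assuming a hypothetical helper `strip_ticker_prefix` and using two rows copied from the output above:

```python
import pandas as pd

# Two-row excerpt of the apple_df.head() output shown above
sample = pd.DataFrame(
    {
        'WIKI/AAPL - Close': [214.01, 214.38],
        'WIKI/AAPL - Adj. Close': [27.503268, 27.550818],
    },
    index=pd.to_datetime(['2010-01-04', '2010-01-05']).rename('Date'),
)


def strip_ticker_prefix(df, symbol):
    """Hypothetical helper: drop the 'WIKI/<symbol> - ' prefix from column names."""
    prefix = f'WIKI/{symbol} - '
    return df.rename(
        columns=lambda col: col[len(prefix):] if col.startswith(prefix) else col
    )


clean = strip_ticker_prefix(sample, 'AAPL')
print(list(clean.columns))  # ['Close', 'Adj. Close']
```

Columns that do not carry the prefix are left unchanged, so the helper is safe to call on a frame that has already been cleaned.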


```python
# Daily GOOGL (Alphabet Class A) prices over the same period
google_df = quandl_stocks('GOOGL', (2010, 1, 1), (2018, 12, 31))
google_df.head()
```

| Date | WIKI/GOOGL - Open | WIKI/GOOGL - High | WIKI/GOOGL - Low | WIKI/GOOGL - Close | WIKI/GOOGL - Volume | WIKI/GOOGL - Ex-Dividend | WIKI/GOOGL - Split Ratio | WIKI/GOOGL - Adj. Open | WIKI/GOOGL - Adj. High | WIKI/GOOGL - Adj. Low | WIKI/GOOGL - Adj. Close | WIKI/GOOGL - Adj. Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-04 | 626.95 | 629.51 | 624.24 | 626.75 | 3908400.0 | 0.0 | 1.0 | 314.445664 | 315.729627 | 313.086468 | 314.345354 | 3908400.0 |
| 2010-01-05 | 627.18 | 627.84 | 621.54 | 623.99 | 6003300.0 | 0.0 | 1.0 | 314.56102 | 314.892042 | 311.732288 | 312.961081 | 6003300.0 |
| 2010-01-06 | 625.86 | 625.86 | 606.36 | 608.26 | 7949400.0 | 0.0 | 1.0 | 313.898976 | 313.898976 | 304.118786 | 305.071727 | 7949400.0 |
| 2010-01-07 | 609.4 | 610.0 | 592.65 | 594.1 | 12815700.0 | 0.0 | 1.0 | 305.643492 | 305.944421 | 297.242559 | 297.969804 | 12815700.0 |
| 2010-01-08 | 592.0 | 603.25 | 589.11 | 602.02 | 9439100.0 | 0.0 | 1.0 | 296.916553 | 302.558971 | 295.467079 | 301.942066 | 9439100.0 |


```python
# Daily MSFT prices over the same period
msft_df = quandl_stocks('MSFT', (2010, 1, 1), (2018, 12, 31))
msft_df.head()
```

| Date | WIKI/MSFT - Open | WIKI/MSFT - High | WIKI/MSFT - Low | WIKI/MSFT - Close | WIKI/MSFT - Volume | WIKI/MSFT - Ex-Dividend | WIKI/MSFT - Split Ratio | WIKI/MSFT - Adj. Open | WIKI/MSFT - Adj. High | WIKI/MSFT - Adj. Low | WIKI/MSFT - Adj. Close | WIKI/MSFT - Adj. Volume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-04 | 30.62 | 31.1 | 30.59 | 30.95 | 38409100.0 | 0.0 | 1.0 | 24.885276 | 25.275379 | 24.860895 | 25.153472 | 38409100.0 |
| 2010-01-05 | 30.85 | 31.1 | 30.64 | 30.96 | 49749600.0 | 0.0 | 1.0 | 25.072201 | 25.275379 | 24.901531 | 25.161599 | 49749600.0 |
| 2010-01-06 | 30.88 | 31.08 | 30.52 | 30.77 | 58182400.0 | 0.0 | 1.0 | 25.096582 | 25.259125 | 24.804005 | 25.007183 | 58182400.0 |
| 2010-01-07 | 30.63 | 30.7 | 30.19 | 30.452 | 50559700.0 | 0.0 | 1.0 | 24.893404 | 24.950294 | 24.53581 | 24.748741 | 50559700.0 |
| 2010-01-08 | 30.28 | 30.88 | 30.24 | 30.66 | 51197400.0 | 0.0 | 1.0 | 24.608954 | 25.096582 | 24.576445 | 24.917785 | 51197400.0 |
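
The three outputs above start on the same trading days, so one column per ticker can be lined up side by side for later modelling. A minimal sketch, assuming `Adj. Close` is the series of interest (the adjusted columns account for splits and dividends, which is why AAPL's adjusted prices sit far below its raw closes); the two-row series are copied from the outputs above:

```python
import pandas as pd

dates = pd.to_datetime(['2010-01-04', '2010-01-05'])

# Two-row excerpts of the Adj. Close columns shown above
adj_close = {
    'AAPL': pd.Series([27.503268, 27.550818], index=dates),
    'GOOGL': pd.Series([314.345354, 312.961081], index=dates),
    'MSFT': pd.Series([25.153472, 25.161599], index=dates),
}

# pd.concat on a dict uses the keys as column labels; rows are aligned on
# the Date index (an outer join by default, so a missing day becomes NaN)
prices = pd.concat(adj_close, axis=1)
prices.index.name = 'Date'
print(prices)
```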


Now that we have the raw data, let's save each DataFrame to a CSV file for further analysis.

```python
from pathlib import Path

# Directory that holds the raw CSV exports
raw_dir = Path('/Users/jessemailhot/Documents/GitHub/springboard/Capstone 2/raw data')

apple_df.to_csv(raw_dir / 'apple.csv')
google_df.to_csv(raw_dir / 'google.csv')
msft_df.to_csv(raw_dir / 'msft.csv')
```
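
When these files are loaded again later, the `Date` column has to be turned back into a datetime index. A minimal sketch of the round trip, using an in-memory `io.StringIO` buffer in place of the CSV files and a two-row excerpt of `msft_df`:

```python
import io

import pandas as pd

# Two-row excerpt of msft_df, shaped like the frames written to disk
df = pd.DataFrame(
    {'WIKI/MSFT - Adj. Close': [25.153472, 25.161599]},
    index=pd.to_datetime(['2010-01-04', '2010-01-05']).rename('Date'),
)

buffer = io.StringIO()  # stands in for a file such as msft.csv
df.to_csv(buffer)
buffer.seek(0)

# index_col restores Date as the index; parse_dates converts it to datetimes
restored = pd.read_csv(buffer, index_col='Date', parse_dates=True)
print(restored)
```

Without `parse_dates=True`, the index would come back as plain strings, and date-based slicing and resampling would not behave as expected.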