# Getting The Data

#### Brian Bahmanyar


___

[Quandl](https://www.quandl.com) provides free daily financial data, which is used in the analyses that follow. It also provides a free, but somewhat lackluster, Python API.

In [1]:
import numpy as np
import pandas as pd
import Quandl

The function below wraps Quandl's Python API and adds some needed functionality: price ratios between pairs of tickers and natural log transformed copies of selected price series.

In [2]:
def get_adj_close(tickers, start, end="", ratios=[], log_transforms=[]):
    """
    Collect adjusted daily close prices, plus optional price ratios and log transforms.

    Args:
        tickers (list): ticker symbols for which to collect adjusted daily close prices
        start (string, format: 2013-01-01): start date from which to collect prices
        end (string, format: 2013-01-01): optional end date, today if not specified
        ratios (list): tuples of tickers from 'tickers' for which to calculate
                price ratios (the stock with the larger mean price is the numerator)
        log_transforms (list): tickers from 'tickers' for which to include additional
                natural log transformed copies

    Returns (dataframe): all adj. close prices, ratios, and log transforms specified
    """
    result = {}
    
    for ticker in tickers:
        try:
            result[ticker] = Quandl.get('WIKI/'+ticker, trim_start=start, trim_end=end)['Adj. Close']
        except Quandl.DatasetNotFound:  # qualified name: the bare DatasetNotFound is never imported above
            print('ERROR:')
            print(ticker, 'is not a valid ticker')

    for ratio in ratios:
        try:
            ticker1, ticker2 = ratio
            if result[ticker1].mean() > result[ticker2].mean():
                result[ticker1+'/'+ticker2] = result[ticker1]/result[ticker2]
            else:
                result[ticker2+'/'+ticker1] = result[ticker2]/result[ticker1]
        except KeyError:
            print('ERROR:')
            print(ticker1, 'or', ticker2, 'is not in the list of specified tickers')
    
    for log_transform in log_transforms:
        try:
            result['ln('+log_transform+')'] = np.log(result[log_transform])
        except KeyError:
            print('ERROR:')
            print(log_transform, 'is not in the list of specified tickers')
    
    return pd.DataFrame(result).dropna()  # drop NaN rows here because the stocks differ in length of history

A copy of this function is placed into api_wrapper.py for use in other notebooks.
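
The ratio rule and the final `dropna()` can be checked without calling Quandl. The sketch below is only an illustration on made-up prices: the tickers `AAA` and `BBB` and the helper `oriented_ratio` are hypothetical and not part of the wrapper, but the helper follows the same "larger mean is the numerator" rule as the function above.

```python
import numpy as np
import pandas as pd

# Synthetic prices (made up for illustration, not market data).
dates = pd.date_range("2013-01-01", periods=5, freq="D")
prices = {
    "AAA": pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=dates),
    # "BBB" starts two days later, mimicking a stock with a shorter history.
    "BBB": pd.Series([40.0, 42.0, 44.0], index=dates[2:]),
}


def oriented_ratio(series_by_ticker, ticker1, ticker2):
    """Return (label, ratio) with the larger-mean series as the numerator."""
    a, b = series_by_ticker[ticker1], series_by_ticker[ticker2]
    if a.mean() > b.mean():
        return ticker1 + "/" + ticker2, a / b
    return ticker2 + "/" + ticker1, b / a


label, ratio = oriented_ratio(prices, "AAA", "BBB")
prices[label] = ratio
prices["ln(AAA)"] = np.log(prices["AAA"])

# Series are aligned on the union of their dates, so the two days on which
# "BBB" has no price become NaN and are removed by dropna().
frame = pd.DataFrame(prices).dropna()
print(label)       # BBB/AAA
print(len(frame))  # 3
```

Because `BBB` has the larger mean, the pair `('AAA', 'BBB')` produces a column named `BBB/AAA`. This is why the column names in the output can appear flipped relative to the order given in `ratios`.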

***

### Usage

In [3]:
bundle = get_adj_close( ['FB', 'AMZN', 'AAPL', 'VZ', 'T', 'SBUX', 'NKE', 'CMG', 'KO', 'PEP', 'XOM', 'CVX'], 
                           start='2013-01-01', 
                           ratios=[('FB','AMZN'), ('FB','AAPL'), ('VZ','T'), ('KO','PEP'), ('XOM','CVX')],
                           log_transforms=['FB', 'AMZN', 'AAPL', 'NKE', 'CMG'] )

In [4]:
bundle.head()

| Date | AAPL | AAPL/FB | AMZN | AMZN/FB | CMG | CVX | CVX/XOM | FB | KO | NKE | ... | SBUX | T | VZ | VZ/T | XOM | ln(AAPL) | ln(AMZN) | ln(CMG) | ln(FB) | ln(NKE) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-01-02 | 73.295822 | 2.617708 | 257.31 | 9.189643 | 301.06 | 98.749493 | 1.208567 | 28.0 | 34.350312 | 25.070117 | ... | 26.504768 | 30.260352 | 41.463123 | 1.370213 | 81.707939 | 4.294504 | 5.550282 | 5.70731 | 3.332205 | 3.221677 |
| 2013-01-03 | 72.370116 | 2.606054 | 258.48 | 9.307886 | 300.95 | 98.329054 | 1.205596 | 27.77 | 34.350312 | 25.326428 | ... | 26.683073 | 30.277644 | 41.266437 | 1.362934 | 81.560568 | 4.281793 | 5.554818 | 5.706944 | 3.323956 | 3.231848 |
| 2013-01-04 | 70.354805 | 2.446273 | 259.15 | 9.010779 | 300.18 | 98.847894 | 1.206371 | 28.76 | 34.405126 | 25.573066 | ... | 26.837283 | 30.459206 | 41.491221 | 1.36219 | 81.938206 | 4.253551 | 5.557407 | 5.704382 | 3.358986 | 3.24154 |
| 2013-01-07 | 69.940953 | 2.377327 | 268.46 | 9.125085 | 299.59 | 98.17698 | 1.212219 | 29.42 | 34.07624 | 25.611755 | ... | 26.85174 | 30.597539 | 41.856493 | 1.367969 | 80.989506 | 4.247651 | 5.592702 | 5.702415 | 3.381675 | 3.243051 |
| 2013-01-08 | 70.129189 | 2.413255 | 266.38 | 9.166552 | 297.76 | 97.73865 | 1.199305 | 29.06 | 33.838711 | 25.340936 | ... | 26.803549 | 30.087436 | 40.849652 | 1.357698 | 81.496093 | 4.250339 | 5.584924 | 5.696288 | 3.369363 | 3.232421 |


In [5]:
bundle.to_csv('bundle.csv')
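
Other notebooks will probably want the dates back as the index when they load the saved file. A minimal sketch of the round trip, assuming the index is named `Date` as in the output above; the small `demo` frame and the temporary file path are stand-ins for `bundle` and `bundle.csv`:

```python
import os
import tempfile

import pandas as pd

# Small stand-in frame shaped like `bundle` (values are made up).
index = pd.date_range("2013-01-02", periods=3, freq="D", name="Date")
demo = pd.DataFrame(
    {"FB": [28.0, 27.77, 28.76], "T": [30.26, 30.28, 30.46]},
    index=index,
)

# Write to a temporary location so the real bundle.csv is not overwritten.
path = os.path.join(tempfile.mkdtemp(), "bundle_demo.csv")
demo.to_csv(path)

# index_col restores the date index; parse_dates converts the date strings
# back into timestamps.
restored = pd.read_csv(path, index_col="Date", parse_dates=True)
print(restored.index.dtype)
```

Without `index_col` and `parse_dates`, `Date` would come back as an ordinary string column, and date-based slicing would not work as it does on `bundle`.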

### Tests

In [6]:
assert len(bundle.columns) == 22
assert bundle.isnull().sum().sum() == 0
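
The expected count of 22 in the first assertion follows from the arguments in the usage cell: one column per ticker, one per ratio pair, and one per log transform. A small sketch of that arithmetic, assuming every ticker downloads successfully and every ratio pair is distinct (the helper `expected_column_count` is hypothetical and not part of `api_wrapper.py`):

```python
def expected_column_count(tickers, ratios, log_transforms):
    """One column per ticker, per ratio pair, and per log transform."""
    return len(tickers) + len(ratios) + len(log_transforms)


# Same arguments as in the usage cell.
tickers = ['FB', 'AMZN', 'AAPL', 'VZ', 'T', 'SBUX', 'NKE', 'CMG', 'KO', 'PEP', 'XOM', 'CVX']
ratios = [('FB', 'AMZN'), ('FB', 'AAPL'), ('VZ', 'T'), ('KO', 'PEP'), ('XOM', 'CVX')]
log_transforms = ['FB', 'AMZN', 'AAPL', 'NKE', 'CMG']

print(expected_column_count(tickers, ratios, log_transforms))  # 22 = 12 + 5 + 5
```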