# Baseline - Linear Interpolation

For a ticker, given a sample period (3-1-2009 to ???), we design a linear interpolation method to predict its closing price on 3-31-2019:  

predicted_price = price_start + (price_end - price_start) * (end_label - start)/(end - start)  
10_bagger = predicted_price > price_start*10  
 
where:  
* price_start - first valid price in the sample period
* price_end - last valid price in the sample period
* end_label - 3-31-2019
* start - date of the first valid price in the sample period
* end - date of the last valid price in the sample period

This serves as the baseline for the more advanced machine learning methods for 10-bagger prediction.
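
The formula above can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's final implementation: the function names `interpolate_price` and `is_ten_bagger` are hypothetical, and the example prices for ticker A are the first and last values of the price series printed later in this notebook. Because `end_label` falls after `end`, the line through the two observed prices is extended slightly past the last observation.

```python
from datetime import date


def interpolate_price(price_start, price_end, start, end, end_label):
    """Extend the straight line through (start, price_start) and
    (end, price_end) to the date end_label."""
    sample_days = (end - start).days
    if sample_days <= 0:
        raise ValueError("end must fall after start")
    label_days = (end_label - start).days
    return price_start + (price_end - price_start) * label_days / sample_days


def is_ten_bagger(predicted_price, price_start):
    """True when the predicted price exceeds ten times the starting price."""
    return predicted_price > price_start * 10


# Ticker A: first and last closing prices in the 3-1-2009 to 12-31-2018 sample.
predicted = interpolate_price(
    price_start=12.68, price_end=67.46,
    start=date(2009, 3, 2), end=date(2018, 12, 31),
    end_label=date(2019, 3, 31),
)
print(round(predicted, 2), is_ten_bagger(predicted, 12.68))
```

Dates are converted to day counts so that the ratio `(end_label - start)/(end - start)` is a plain number.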


In [1]:
import quandl  # Access to Sharadar Core US Equities Bundle
api_key = '7B87ndLPJbCDzpNHosH3'

import math
import platform
import matplotlib
import matplotlib.pyplot as plt
from pylab import rcParams
import numpy as np
import torch
import pandas as pd
from IPython.display import display
import time

# datetime.time is deliberately not imported: it would shadow the time module imported above
from datetime import date, datetime, timedelta


print("Python version: ", platform.python_version())
print("Pytorch version: {}".format(torch.__version__))

Python version:  3.6.9
Pytorch version: 1.2.0


## Import Labels

For each sample period (e.g. 3-1-2009 to 12-31-2018), we import a list of valid tickers. A valid ticker is one that has been active for at least 180 days before the end of the sample period. 

For example, if the sample period ends on 12-31-2018, a ticker must have been active since 7-4-2018 or earlier. Any ticker that IPOs after 7-4-2018 is not a valid ticker, since there is not enough price history to make an educated prediction.
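
The validity rule can be expressed as a date comparison. The sketch below is only an illustration under stated assumptions: `is_valid_ticker` and `MIN_HISTORY_DAYS` are hypothetical names, and the check looks only at the first price date, which may differ from how the labels file was actually built.

```python
from datetime import date, timedelta

MIN_HISTORY_DAYS = 180  # minimum history required before the sample end


def is_valid_ticker(first_price_date, sample_end, min_history_days=MIN_HISTORY_DAYS):
    """A ticker qualifies if its first price is at least
    min_history_days before the end of the sample period."""
    cutoff = sample_end - timedelta(days=min_history_days)
    return first_price_date <= cutoff


sample_end = date(2018, 12, 31)
print(sample_end - timedelta(days=MIN_HISTORY_DAYS))   # cutoff date for this sample
print(is_valid_ticker(date(2016, 11, 1), sample_end))  # first price well before the cutoff
print(is_valid_ticker(date(2018, 9, 1), sample_end))   # first price after the cutoff
```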

In [2]:
labels = pd.read_csv("../datasets/sharader/labels_12-31-2018.csv")

y = labels.set_index('ticker')
y['firstpricedate']= pd.to_datetime(y['firstpricedate'])
y['lastpricedate']= pd.to_datetime(y['lastpricedate'])

y.head()

ticker,appreciation,10bagger,table,permaticker,name,exchange,isdelisted,category,cusips,siccode,...,currency,location,lastupdated,firstadded,firstpricedate,lastpricedate,firstquarter,lastquarter,secfilings,companysite
A,6.339117,False,SEP,196290,Agilent Technologies Inc,NYSE,N,Domestic,00846U101,3826.0,...,USD,California; U.S.A,2020-01-14,2014-09-26,1999-11-18,2020-01-14,1997-06-30,2019-09-30,https://www.sec.gov/cgi-bin/browse-edgar?actio...,http://www.agilent.com
AA,1.224348,False,SEP,124392,Alcoa Corp,NYSE,N,Domestic,013872106,3350.0,...,USD,New York; U.S.A,2020-01-14,2016-11-01,2016-11-01,2020-01-14,2014-12-31,2019-09-30,https://www.sec.gov/cgi-bin/browse-edgar?actio...,http://www.alcoa.com
AAAGY,1.275556,False,SEP,120538,Altana Aktiengesellschaft,NYSE,Y,ADR,02143N103,2834.0,...,USD,Jordan,2018-10-16,2018-02-13,2002-05-22,2010-08-12,2000-12-31,2005-12-31,https://www.sec.gov/cgi-bin/browse-edgar?actio...,
AAAP,3.331837,False,SEP,155760,Advanced Accelerator Applications SA,NASDAQ,Y,ADR,00790T100,2834.0,...,USD,France,2018-06-28,2016-05-19,2015-11-11,2018-02-09,2012-12-31,2016-12-31,https://www.sec.gov/cgi-bin/browse-edgar?actio...,
AAC,0.099459,False,SEP,187592,AAC Holdings Inc,NYSE,Y,Domestic,000307108,8093.0,...,USD,Tennessee; U.S.A,2019-10-25,2015-09-11,2014-10-02,2019-10-25,2013-09-30,2019-09-30,https://www.sec.gov/cgi-bin/browse-edgar?actio...,


### Number of active tickers

In [3]:
tickers = list(y.index)
print(len(tickers))

9881


In [4]:
valid_tickers = pd.Series(tickers, name = 'ticker')

valid_tickers.head()

0        A
1       AA
2    AAAGY
3     AAAP
4      AAC
Name: ticker, dtype: object

In [9]:
prices = pd.read_csv("../datasets/sharader/inputs_notfilled_2018-12-31.csv")
prices

Unnamed: 0,date,A,AA,AAAGY,AAAP,AAC,AACC,AACG,AACPF,AAGIY,...,ZUO,ZURVY,ZVO,ZVUE,ZXAIY,ZYME,ZYNE,ZYTO,ZYXI,ZZ
0,2009-03-02,12.68,,15.75,,,3.29,5.180,,,...,,12.750,,0.01,,,,0.011,1.21,0.84
1,2009-03-03,12.68,,15.75,,,3.30,5.320,,,...,,12.850,,0.01,,,,0.011,1.22,0.76
2,2009-03-04,13.31,,16.35,,,3.33,5.080,,,...,,13.740,,0.01,,,,0.011,1.22,0.76
3,2009-03-05,12.54,,15.59,,,3.30,5.080,,,...,,11.910,,0.01,,,,0.011,1.17,0.58
4,2009-03-06,12.65,,15.97,,,3.40,5.250,,,...,,11.300,,0.01,,,,0.011,1.20,0.55
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2472,2018-12-24,62.67,25.15,,,1.52,,0.970,,32.335,...,16.36,28.680,6.73,,0.67,11.08,2.94,,2.69,
2473,2018-12-26,65.54,27.14,,,1.67,,0.998,,32.700,...,17.81,29.050,6.78,,0.67,11.99,2.92,,2.63,
2474,2018-12-27,66.48,27.16,,,1.49,,0.980,,32.330,...,17.90,28.769,6.83,,0.62,11.70,2.95,,2.63,
2475,2018-12-28,65.96,26.60,,,1.41,,1.000,,32.900,...,17.65,29.668,6.87,,0.72,13.65,2.92,,2.65,


In [12]:
X = prices.set_index('date')
print (X['A'])

date
2009-03-02    12.68
2009-03-03    12.68
2009-03-04    13.31
2009-03-05    12.54
2009-03-06    12.65
              ...  
2018-12-24    62.67
2018-12-26    65.54
2018-12-27    66.48
2018-12-28    65.96
2018-12-31    67.46
Name: A, Length: 2477, dtype: float64


In [24]:
# Start and end date of the sampled period
start_date_sample = '2009-03-01'
end_date_sample = '2018-03-31'

# List for saving predicted price on 3-31-2019
predicts = []

for ticker in valid_tickers:
    
    print("Ticker: {}".format(ticker))
    
    # First and last dates when the ticker is active
    first_price_date = y['firstpricedate'].loc[ticker]
    last_price_date = y['lastpricedate'].loc[ticker]
    
    print("First Trading Date: {}".format(first_price_date.strftime('%m-%d-%Y')))
    print("Last Trading Date: {}".format(last_price_date.strftime('%m-%d-%Y')))
    
    if datetime(start_date_sample) > first_price_date:
        start = datetime(start_date_sample)
    else:
        start = first_price_date
    
    first_price = X.loc[start.strftime('%m-%d-%Y'), ticker]
#     last_price = X.loc[last_price_date.strftime('%m-%d-%Y'), ticker]

    print("First Trading Date: {}  Price: {}".format(start,first_price))
#     print("Last Trading Date: {}  Price: {}".format(last_price_date, last_price))
    
    """
    s = X[('close',ticker)]
    
    price_start = X[('close',ticker)].loc[s.first_valid_index()]
    
    price_end = X[('close',ticker)].loc[ s.last_valid_index()]
    
    print("First Trading Date: {}  Price: {}".format(firstTradingDate,price_start))
    print("Last Trading Date: {}  Price: {}".format(lastTradingDate, price_end ))
    
    delta = (lastTradingDate - firstTradingDate).days
    
    # For now, set the two to be the same
    predict_last_price = price_end
    
    appreciation = predict_last_price/price_start
    predictions.append(appreciation > 10.0)
    
    print(appreciation, appreciation > 10.0)    
    """


Ticker: A
First Trading Date: 11-18-1999
Last Trading Date: 01-14-2020


TypeError: an integer is required (got type str)
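
The `TypeError` above is raised because `datetime(...)` expects integer year, month and day arguments, not a date string. A minimal sketch of one possible fix follows. It assumes the index of `X` holds ISO-formatted strings such as `2009-03-02`, as the printed series suggests; under that assumption a lookup key formatted with `%m-%d-%Y` would not match any row either. `X_demo` is a hypothetical three-row stand-in for `X`, with the ticker A values copied from the output above.

```python
import pandas as pd

# Hypothetical stand-in for X: ISO date strings as the index, one column per ticker.
# AA is all NaN here because it had no prices in this window.
X_demo = pd.DataFrame(
    {"A": [12.68, 12.68, 13.31], "AA": [float("nan")] * 3},
    index=pd.Index(["2009-03-02", "2009-03-03", "2009-03-04"], name="date"),
)

start_date_sample = "2009-03-01"
first_price_date = pd.Timestamp("1999-11-18")  # firstpricedate of ticker A

# Parse the string instead of passing it to the datetime constructor.
sample_start = pd.to_datetime(start_date_sample)

# The usable window starts at whichever date is later.
start = max(sample_start, first_price_date)

# Keep rows on or after start (ISO strings sort chronologically),
# then take the first non-missing price in that window.
window = X_demo.loc[start.strftime("%Y-%m-%d"):, "A"]
first_date = window.first_valid_index()
first_price = window.loc[first_date]

print("First Trading Date: {}  Price: {}".format(first_date, first_price))
```

Slicing from `start` rather than looking up `start` directly matters here because 3-1-2009 has no row in the price table; the first available row is 2009-03-02.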

In [88]:
print(delta)

2533


In [62]:
X[('close','A')].loc[('2009-03-02')]

12.68