# Analyze Top 100 Tickers

For each sample period (e.g., 3-1-2009 to 12-31-2015), the logistic regression model outputs a list of tickers sorted in descending order of probability score.
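
As a minimal sketch of that ranking step, assuming the model's scores are available as a mapping from ticker to probability, the sorted list and its top slice could be produced as follows. The function name `rank_by_probability`, the tickers, and the scores are hypothetical and are used only for illustration; they do not come from the notebook's models.

```python
def rank_by_probability(scores):
    """Return tickers sorted from highest to lowest probability score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical scores for illustration only (not model output).
example_scores = {"AAA": 0.12, "BBB": 0.87, "CCC": 0.45}

ranked = rank_by_probability(example_scores)
top_n = ranked[:2]  # the notebook keeps the top 100; 2 is used here for brevity
```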

In this notebook, we will analyze the characteristics of the sets of top 100 tickers generated by these models:

1. Are the top tickers already 10-baggers by the time they reach the top of the list?
2. What kind of return can an investor achieve by adding these tickers to their portfolio?
3. How much churn is there? How many of the top tickers remain in the list as the end of the sample period advances from 12-31-2010 to 12-31-2018?

Answering these questions will help us design a portfolio strategy aimed at maximizing portfolio return.
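
One possible way to quantify the churn in question 3 is to measure, for each pair of consecutive sample periods, the fraction of the earlier top list that is missing from the later one. The sketch below assumes the top lists are held in a dictionary keyed by period; the names `retention` and `churn_by_period` and the example tickers are hypothetical.

```python
def retention(prev_top, next_top):
    """Fraction of tickers in prev_top that also appear in next_top."""
    prev_set = set(prev_top)
    if not prev_set:
        return 0.0
    return len(prev_set & set(next_top)) / len(prev_set)


def churn_by_period(top_lists):
    """Map each pair of consecutive periods to churn = 1 - retention.

    top_lists is a dict of period -> list of tickers, in chronological order.
    """
    periods = list(top_lists)
    return {
        (a, b): 1.0 - retention(top_lists[a], top_lists[b])
        for a, b in zip(periods, periods[1:])
    }


# Hypothetical top lists for illustration only.
example_lists = {
    "p1": ["A", "B", "C", "D"],
    "p2": ["B", "C", "E", "F"],
    "p3": ["B", "C", "E", "F"],
}
example_churn = churn_by_period(example_lists)
```

With the real data, `top_lists` would hold the first 100 tickers of each period's sorted list.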

In [1]:
import quandl  # Access to Sharadar Core US Equities Bundle
api_key = '7B87ndLPJbCDzpNHosH3'

import math
import platform
import matplotlib
import matplotlib.pyplot as plt
from pylab import rcParams
import numpy as np
from sklearn import linear_model  # package for logistic regression (not using GPU)
import torch
import pandas as pd
from IPython.display import display
import time
import pickle

from utils import *

# Note: this rebinds the name `time` to datetime.time, shadowing the `time` module imported above
from datetime import date, datetime, time, timedelta


print("Python version: ", platform.python_version())
print("Pytorch version: {}".format(torch.__version__))

Python version:  3.6.6
Pytorch version: 1.1.0


In [3]:
label_filenames = ['labels_12-31-2010.csv',
              'labels_12-31-2011.csv',
              'labels_12-31-2012.csv',
              'labels_12-31-2013.csv',
              'labels_12-31-2014.csv',
              'labels_12-31-2015.csv',
              'labels_12-31-2016.csv',
              'labels_12-31-2017.csv',
              'labels_12-31-2018.csv'
             ]

prices_filenames = ['inputs_notfilled_2010-12-31.csv',
            'inputs_notfilled_2011-12-31.csv',
            'inputs_notfilled_2012-12-31.csv',
            'inputs_notfilled_2013-12-31.csv',
            'inputs_notfilled_2014-12-31.csv',
            'inputs_notfilled_2015-12-31.csv',
            'inputs_notfilled_2016-12-31.csv',
            'inputs_notfilled_2017-12-31.csv',
            'inputs_notfilled_2018-12-31.csv']

# Dictionary of sorted tickers output by the logistic regression models trained on ticker price only
with open("sorted_tickers_logreg_price_only.pkl", "rb") as f:
    sorted_tickers = pickle.load(f)

In [14]:
# Regularization constants that produced the best model, one per sample period
C_values = [
    1e-3,
    1e-3,
    1e-3,
    1e-3,
    1e-3,
    1e-2,
    1e-5,
    1e-5,
    1e-3
]

end_dates = [
    '2010-12-31',
    '2011-12-30',
    '2012-12-31',
    '2013-12-31',
    '2014-12-31',
    '2015-12-31',
    '2016-12-30',
    '2017-12-29',    
    '2018-12-31', 
]

for label_filename, prices_filename, C, end_date in zip(
        label_filenames, prices_filenames, C_values, end_dates):
    
    print('Sample Period:{}'.format(label_filename))
    
    labels = pd.read_csv("../datasets/sharader/"+label_filename)
    y = labels.set_index('ticker')

    prices = pd.read_csv("../datasets/sharader/"+prices_filename)
    X = prices.set_index('date')
    first_valid_prices = X.apply(first_valid_idx, axis=0)

    X_filled = X.ffill(axis=0)  # forward fill along date axis with last valid price
    X_filled = X_filled.fillna(0)  # fill the remaining NaN (dates before the first valid price) with zero

    # Transpose so that each row is a ticker, then normalize each row by its first valid price
    X_all = X_filled.transpose().div(first_valid_prices, axis=0)
    
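    total_apprec = 0.0  # running sum of per-ticker appreciation
    top_tickers = sorted_tickers[label_filename][C][:100]  # top 100 tickers for this period

    for num, ticker in enumerate(top_tickers):
        buy_price = X_all[end_date].loc[ticker]  # normalized price on the last day of the sample period
        sell_price = y['appreciation'].loc[ticker]
        
        apprec = sell_price/buy_price - 1.0
        total_apprec += apprec
        
        label = y['10bagger'].loc[ticker]
        print('{}. {}: {:.2f} --> {:.2f} ({:.2f}-{})'.format(num, ticker, buy_price, sell_price, apprec, label))
    
    # Equal-weighted average appreciation across the top tickers
    print('Portfolio Appreciation: {:.2f}'.format(total_apprec/len(top_tickers)))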
    print('\n')

Sample Period:labels_12-31-2010.csv
0. PXPLY: 95.00 --> 125.00 (0.32-True)
1. ECPN: 68.00 --> 1.00 (-0.99-False)
2. ATSG: 41.58 --> 121.32 (1.92-True)
3. MTOR: 38.00 --> 37.69 (-0.01-True)
4. KERX: 34.70 --> 25.45 (-0.27-True)
5. TRXBQ: 30.75 --> 48.00 (0.56-True)
6. DS: 26.80 --> 2.99 (-0.89-False)
7. TRW: 23.53 --> 47.08 (1.00-True)
8. LVS: 20.98 --> 27.84 (0.33-True)
9. SPRD: 20.88 --> 35.15 (0.68-True)
10. VITK: 1.00 --> 0.01 (-0.99-False)
11. LBY: 19.10 --> 3.51 (-0.82-False)
12. DXLG: 16.93 --> 8.75 (-0.48-False)
13. CPWM: 14.06 --> 31.88 (1.27-True)
14. CROX: 15.02 --> 22.59 (0.50-True)
15. BWEBF: 13.12 --> 28.12 (1.14-True)
16. TECK: 20.27 --> 7.60 (-0.63-False)
17. ESCA: 15.19 --> 26.60 (0.75-True)
18. RT: 12.44 --> 2.29 (-0.82-False)
19. APKT: 12.75 --> 7.01 (-0.45-False)
20. RIOM: 16.67 --> 17.53 (0.05-True)
21. LCUT: 11.14 --> 7.50 (-0.33-False)
22. SRZ: 13.97 --> 37.13 (1.66-True)
23. FOE: 12.96 --> 16.75 (0.29-True)
24. DNDNQ: 12.04 --> 0.01 (-1.00-False)
25. HGSI: 14.14 

0. DTG: 128.68 --> 128.68 (0.00-True)
1. MITK: 84.86 --> 174.86 (1.06-True)
2. INHX: 96.15 --> 96.15 (0.00-True)
3. CAR: 96.24 --> 83.00 (-0.14-True)
4. PATK: 68.76 --> 242.35 (2.52-True)
5. DAN: 59.45 --> 53.76 (-0.10-True)
6. NXST: 65.56 --> 127.49 (0.94-True)
7. DEIX: 50.11 --> 50.11 (0.00-True)
8. HRVEQ: 50.00 --> 0.50 (-0.99-False)
9. STRZA: 52.12 --> 63.32 (0.21-True)
10. TEN: 45.62 --> 17.87 (-0.61-True)
11. MILL: 35.20 --> 0.02 (-1.00-False)
12. SPRD: 35.15 --> 35.15 (0.00-True)
13. CMTX: 36.36 --> 181.82 (4.00-True)
14. TRS: 41.12 --> 31.16 (-0.24-True)
15. CPWM: 31.88 --> 31.88 (0.00-True)
16. ATLS: 40.74 --> 0.01 (-1.00-False)
17. LVS: 36.01 --> 27.84 (-0.23-True)
18. BARZ: 22.73 --> 3.64 (-0.84-False)
19. FSII: 28.27 --> 28.27 (0.00-True)
20. ESCA: 28.02 --> 26.60 (-0.05-True)
21. SBGI: 31.62 --> 34.05 (0.08-True)
22. LULU: 22.75 --> 63.15 (1.78-True)
23. AXL: 26.91 --> 18.83 (-0.30-True)
24. CRRS: 20.23 --> 0.77 (-0.96-False)
25. LBY: 25.93 --> 3.51 (-0.86-False)
26. REGN:

0. HTLLQ: 3000.00 --> 3000.00 (0.00-True)
1. PPKZ: 250.00 --> 250.00 (0.00-True)
2. DTG: 128.68 --> 128.68 (0.00-True)
3. MITK: 87.86 --> 174.86 (0.99-True)
4. CAR: 87.33 --> 83.00 (-0.05-True)
5. DDRX: 69.48 --> 69.48 (0.00-True)
6. GTT: 79.86 --> 96.39 (0.21-True)
7. STRZA: 63.32 --> 63.32 (0.00-True)
8. TRXBQ: 48.00 --> 48.00 (0.00-True)
9. BZ: 50.12 --> 50.12 (0.00-True)
10. AVNR: 40.38 --> 40.38 (0.00-True)
11. ULTA: 46.61 --> 63.75 (0.37-True)
12. LAD: 38.58 --> 36.95 (-0.04-True)
13. CATM: 47.04 --> 30.67 (-0.35-True)
14. CNO: 37.55 --> 31.73 (-0.16-True)
15. BEXP: 32.57 --> 32.57 (-0.00-True)
16. VRUS: 31.41 --> 31.41 (0.00-True)
17. REGN: 27.11 --> 30.33 (0.12-True)
18. MEAS: 26.54 --> 26.54 (0.00-True)
19. FSII: 28.27 --> 28.27 (0.00-True)
20. SBGI: 29.51 --> 34.05 (0.15-True)
21. EGHT: 29.18 --> 41.22 (0.41-True)
22. GIII: 17.81 --> 24.07 (0.35-True)
23. BBX: 34.86 --> 42.29 (0.21-True)
24. LULU: 25.04 --> 63.15 (1.52-True)
25. INTT: 27.06 --> 39.53 (0.46-True)
26. CTDH: 21.
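
Each output row above reports a normalized buy price, a sell price, and the appreciation `sell_price/buy_price - 1.0`. The following is a minimal pure-Python sketch of those steps on a single price series: forward fill, normalization by the first valid price, per-ticker appreciation, and an equal-weighted average. The helper names are hypothetical and are not the notebook's `utils` functions. Treating a normalized price of 10 or more as "already a 10-bagger" is an assumption about how the label is defined.

```python
def forward_fill(prices):
    """Replace each missing price (None) with the last valid price before it.

    Entries before the first valid price become 0.0, mirroring fillna(0).
    """
    filled, last = [], None
    for p in prices:
        if p is not None:
            last = p
        filled.append(last if last is not None else 0.0)
    return filled


def normalize(prices):
    """Divide a forward-filled series by its first valid price."""
    first_valid = next(p for p in prices if p is not None)
    return [p / first_valid for p in forward_fill(prices)]


def appreciation(buy_price, sell_price):
    """Fractional gain from buying at buy_price and selling at sell_price."""
    return sell_price / buy_price - 1.0


def portfolio_appreciation(pairs):
    """Equal-weighted mean appreciation over (buy_price, sell_price) pairs."""
    return sum(appreciation(b, s) for b, s in pairs) / len(pairs)


def already_ten_bagger(normalized_price):
    """Assumption: a ticker counts as a 10-bagger at 10x its first valid price."""
    return normalized_price >= 10.0


# PXPLY and ATSG rows from the first block of output above.
rows = [(95.00, 125.00), (41.58, 121.32)]
per_ticker = [round(appreciation(b, s), 2) for b, s in rows]  # [0.32, 1.92]
```

The two rounded values match the appreciation figures printed for PXPLY and ATSG.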

## S&P benchmark

2019 31.10%  
2018 -4.41%  
2017 21.94%  
2016 11.93%  
2015 1.31%  
2014 13.81%  
2013 32.43%  
2012 15.88%  
2011 2.07%  
2009 27.11%  
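
To compare a multi-year portfolio appreciation with the benchmark, the annual returns can be compounded over the holding period. The sketch below assumes simple annual compounding; the function name `compound_return` is hypothetical, and the example uses the 2016 to 2018 figures listed above.

```python
def compound_return(annual_returns_pct):
    """Compound a sequence of annual returns (in percent) into one total return (in percent)."""
    growth = 1.0
    for r in annual_returns_pct:
        growth *= 1.0 + r / 100.0
    return (growth - 1.0) * 100.0


# Benchmark returns for 2016, 2017, and 2018, copied from the list above.
benchmark_2016_2018 = compound_return([11.93, 21.94, -4.41])
```

The list above has no entry for 2010, so a holding period that includes that year cannot be compounded from these figures alone.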