# Context Search

It is possible that predicting 10 Baggers (stocks whose price appreciates more than tenfold) cannot be solved simply by applying more advanced machine learning.

If a person had travelled back in time from the future, we could solve this problem simply by asking them. But suppose that person were not interested in the stock market, so we could not ask directly, "Which stocks appreciated in price more than 10 times from date X1 to date X2?" What general questions, then, should we ask to increase the probability of predicting 10 Baggers?

"In the future, do people still drive gas-powered cars or electric cars?"  

"In the future, are there a lot of robots or flying drones? What kind of purposes do they serve?"  

"In the future, is cancer (or Alzheimer's disease) a treatable disease?"

And so on.

In [1]:
import quandl  # Access to Sharadar Core US Equities Bundle
api_key = '7B87ndLPJbCDzpNHosH3'

import math
import platform
import matplotlib
import matplotlib.pyplot as plt
from pylab import rcParams
import numpy as np
from sklearn import linear_model  # package for logistic regression (not using GPU)
import torch
import pandas as pd
from IPython.display import display
import time

from utils import *

from datetime import date, datetime, timedelta
from datetime import time as dt_time  # aliased so it does not shadow the `time` module imported above


print("Python version: ", platform.python_version())
print("Pytorch version: {}".format(torch.__version__))

Python version:  3.6.6
Pytorch version: 1.1.0


In [8]:
stocks = pd.read_csv("stock_tickers_sorted.cvs")

display(stocks['industry'].describe())
display(stocks['industry'].unique())

count             10207
unique              178
top       Biotechnology
freq                828
Name: industry, dtype: object

array(['Diagnostics & Research', 'Aluminum', 'Biotechnology',
       'Medical Care', 'Asset Management',
       'Education & Training Services', 'Conglomerates', 'None',
       'Airlines', 'Aerospace & Defense', 'Airports & Air Services',
       'Insurance - Life', 'Rental & Leasing Services',
       'Banks - Regional', 'Semiconductors',
       'Building Products & Equipment', 'Specialty Retail', 'Leisure',
       'Consumer Electronics', 'REIT - Diversified', 'Gold',
       'Oil & Gas E&P', 'Electrical Equipment & Parts',
       'Specialty Industrial Machinery', 'Drug Manufacturers - General',
       'Medical Distribution', 'Health Information Services',
       'Specialty Chemicals', 'Beverages - Brewers',
       'Auto & Truck Dealerships', 'Engineering & Construction',
       'Uranium', 'Communication Equipment', 'Industrial Distribution',
       'Insurance - Specialty', 'Specialty Business Services',
       'Medical Devices', 'REIT - Mortgage', 'Telecom Services',
       'Capital Mar

In [44]:
industries = stocks['industry'].value_counts()
sectors = industries.to_frame(name="num")

counts = []
ratios = []

for industry, num in industries.items():
    # Count the tickers in this industry that are flagged as 10 Baggers.
    is_10bagger = (stocks['industry'] == industry) & (stocks['10bagger'] == 1)
    num_10baggers = int(is_10bagger.sum())
    ratio = num_10baggers / num

    counts.append(num_10baggers)
    ratios.append(ratio)

    print("{}:  {} tickers  {} 10baggers ({:.3f})".format(industry, num, num_10baggers, ratio))


Biotechnology:  828 tickers  43 10baggers (0.052)
Banks - Regional:  793 tickers  12 10baggers (0.015)
None:  480 tickers  8 10baggers (0.017)
Software - Application:  411 tickers  30 10baggers (0.073)
Oil & Gas E&P:  279 tickers  4 10baggers (0.014)
Medical Devices:  240 tickers  13 10baggers (0.054)
Communication Equipment:  199 tickers  9 10baggers (0.045)
Semiconductors:  175 tickers  21 10baggers (0.120)
Asset Management:  169 tickers  3 10baggers (0.018)
Oil & Gas Midstream:  169 tickers  2 10baggers (0.012)
Telecom Services:  161 tickers  6 10baggers (0.037)
REIT - Mortgage:  161 tickers  4 10baggers (0.025)
Conglomerates:  148 tickers  0 10baggers (0.000)
Specialty Business Services:  136 tickers  7 10baggers (0.051)
Diagnostics & Research:  136 tickers  9 10baggers (0.066)
Information Technology Services:  132 tickers  8 10baggers (0.061)
Specialty Industrial Machinery:  127 tickers  11 10baggers (0.087)
Utilities - Regulated Electric:  123 tickers  0 10baggers (0.000)
Insuran

Broadcasting - TV:  5 tickers  0 10baggers (0.000)
Computer Systems:  5 tickers  1 10baggers (0.200)
Media - Diversified:  5 tickers  0 10baggers (0.000)
Pay TV:  4 tickers  0 10baggers (0.000)
Copper:  4 tickers  0 10baggers (0.000)
Semiconductor Memory:  3 tickers  0 10baggers (0.000)
Apparel Stores:  3 tickers  0 10baggers (0.000)
Financial Conglomerates:  3 tickers  0 10baggers (0.000)
Business Equipment:  3 tickers  0 10baggers (0.000)
Long-Term Care Facilities:  3 tickers  0 10baggers (0.000)
Broadcasting - Radio:  2 tickers  0 10baggers (0.000)
Farm & Construction Equipment:  2 tickers  0 10baggers (0.000)
Home Improvement Stores:  2 tickers  0 10baggers (0.000)
Health Care Plans:  2 tickers  1 10baggers (0.500)
Staffing & Outsourcing Services:  2 tickers  0 10baggers (0.000)
Marketing Services:  2 tickers  0 10baggers (0.000)
Home Furnishings & Fixtures:  2 tickers  0 10baggers (0.000)
Data Storage:  2 tickers  0 10baggers (0.000)
Beverages - Soft Drinks:  1 tickers  0 10bagger
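
The per-industry loop above rescans the whole table once for every industry. As a hedged alternative, the same counts and ratios can probably be produced in one pass with `groupby`. The sketch below is not the notebook's method: `stocks_demo` and `summarize_by_industry` are hypothetical names, the rows are made up for illustration, and only the `industry` and `10bagger` column names come from the notebook.

```python
import pandas as pd

# Hypothetical miniature stand-in for the notebook's `stocks` table.
# Only the 'industry' and '10bagger' column names are taken from the notebook.
stocks_demo = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "industry": [
        "Semiconductors", "Semiconductors",
        "Banks - Regional", "Banks - Regional", "Banks - Regional",
    ],
    "10bagger": [True, False, False, False, True],
})


def summarize_by_industry(df):
    """Return tickers, 10 Baggers, and their ratio for each industry."""
    grouped = df.groupby("industry")["10bagger"]
    summary = pd.DataFrame({
        "num": grouped.size(),                       # tickers per industry
        "num_10baggers": grouped.sum().astype(int),  # True counts as 1
    })
    summary["ratio"] = summary["num_10baggers"] / summary["num"]
    # Largest industries first, similar to the value_counts() ordering.
    return summary.sort_values("num", ascending=False)


sectors_demo = summarize_by_industry(stocks_demo)
print(sectors_demo)
```

Industries with the same number of tickers may come out in a different order than with `value_counts()`, so the result should be compared by industry name rather than by position.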

In [34]:
stocks[(stocks['industry']=='Electronic Gaming & Multimedia') & (stocks['10bagger']==1)]

| Unnamed: 0 | ticker | appreciation | 10bagger | table | permaticker | name | exchange | isdelisted | category | cusips | ... | currency | location | lastupdated | firstadded | firstpricedate | lastpricedate | firstquarter | lastquarter | secfilings | companysite |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3981 | GLUU | 23.782609 | True | SEP | 194438 | Glu Mobile Inc | NASDAQ | N | Domestic | 379890106 | ... | USD | California; U.S.A | 2020-01-14 | 2014-10-28 | 2007-03-22 | 2020-01-14 | 2005-12-31 | 2019-09-30 | https://www.sec.gov/cgi-bin/browse-edgar?actio... | http://www.glu.com |
| 4114 | GRVY | 23.344203 | True | SEP | 189912 | GRAVITY Co Ltd | NASDAQ | N | ADR | 38911N206 38911N107 | ... | USD | Republic Of Korea | 2020-01-14 | 2016-02-03 | 2005-02-08 | 2020-01-14 | 2002-12-31 | 2018-12-31 | https://www.sec.gov/cgi-bin/browse-edgar?actio... | http://www.gravity.co.kr |
| 9255 | TTWO | 16.702655 | True | SEP | 197239 | Take Two Interactive Software Inc | NASDAQ | N | Domestic | 874054109 | ... | USD | New York; U.S.A | 2020-01-14 | 2015-03-20 | 1997-04-15 | 2020-01-14 | 1996-12-31 | 2019-09-30 | https://www.sec.gov/cgi-bin/browse-edgar?actio... | http://www.take2games.com |


In [47]:
sectors['num_10baggers'] = counts
sectors['ratio'] = ratios
sectors.index.name = 'industry'

sectors

| industry | num | num_10baggers | ratio |
|---|---|---|---|
| Biotechnology | 828 | 43 | 0.051932 |
| Banks - Regional | 793 | 12 | 0.015132 |
| None | 480 | 8 | 0.016667 |
| Software - Application | 411 | 30 | 0.072993 |
| Oil & Gas E&P | 279 | 4 | 0.014337 |
| ... | ... | ... | ... |
| Integrated Shipping & Logistics | 1 | 0 | 0.000000 |
| Financial Exchanges | 1 | 0 | 0.000000 |
| Infrastructure Operations | 1 | 0 | 0.000000 |
| Banks - Regional - Latin America | 1 | 0 | 0.000000 |
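
Raw ratios are easy to misread for tiny industries: Health Care Plans reaches 0.500 with only 2 tickers, while Semiconductors reaches 0.120 with 175. One way to compare industries more fairly is to require a minimum number of tickers before ranking by ratio. The sketch below uses figures copied from the per-industry printout above. The cutoff of 100 tickers and the helper name `rank_by_ratio` are assumptions made for illustration, not part of the notebook.

```python
# (number of tickers, number of 10 Baggers), copied from the per-industry printout.
industry_counts = {
    "Biotechnology": (828, 43),
    "Banks - Regional": (793, 12),
    "Software - Application": (411, 30),
    "Semiconductors": (175, 21),
    "Specialty Industrial Machinery": (127, 11),
    "Computer Systems": (5, 1),
    "Health Care Plans": (2, 1),
}


def rank_by_ratio(counts, min_tickers):
    """Rank industries by 10 Bagger ratio, skipping those with too few tickers."""
    rows = [
        (industry, num, hits, hits / num)
        for industry, (num, hits) in counts.items()
        if num >= min_tickers
    ]
    # Highest ratio first.
    return sorted(rows, key=lambda row: row[3], reverse=True)


MIN_TICKERS = 100  # assumed cutoff; tune it to the question being asked
ranked = rank_by_ratio(industry_counts, MIN_TICKERS)

for industry, num, hits, ratio in ranked:
    print("{:<32} {:>4} {:>3} {:.3f}".format(industry, num, hits, ratio))
```

On the notebook's `sectors` frame, the equivalent filter would be along the lines of `sectors[sectors['num'] >= 100].sort_values('ratio', ascending=False)`.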


In [50]:
sectors.to_csv(r'sectors.csv', header=True)