In [175]:
!python -V
# !pip install poetry && python -m poetry install --no-root


Python 3.8.3
Using version [1m^1.5.1[0m for [36mpandarallel[0m

[34mUpdating dependencies[0m
[2K[34mResolving dependencies...[0m [30;1m(1.3s)[0m[34mResolving dependencies...[0m [30;1m(0.8s)[0m

[34mWriting lock file[0m


Package operations: [34m2[0m installs, [34m0[0m updates, [34m0[0m removals

  - Installing [36mdill[0m ([1m0.3.2[0m)
  - Installing [36mpandarallel[0m ([1m1.5.1[0m)


# Stock market price prediction
## What is the stock market
The stock market is a place where people and companies can buy and sell shares of companies and commodities.
People go to the stock market in the hopes of investing money in the place which will give them their best ROI(return on investment)
## How the stock market works
In the market, each item has a price, which is decided according to the demand and supply.
If there are more people who want to buy a share of AAPL(Apple Inc.), the price will go up, otherwise, the price will go down.
When one wants to buy (buy order), he hopes that the price will go up in the future, where he could sell the stock and  profit on the difference.
If Alice wants to sell AAPL share, lets say at 40\\$ minimum, she can only sell if there is a buyer, let's say bob, who agrees to buy at 40\\$ or more.
This can only be achieved if Alice believe that AAPL is **overpriced** and Bob believe the AAPL is **underpriced**.

## What is the problem?
If one could know for sure, at all times, if a share is underpriced or overpriced, one could always profit in the stock market. 
The stock price is aim to reflect the value of the company - if APPL has 4 million shared, 100 \\$ each, 
then AAPL is worth 400 \\$ million. 
The value of the stock is then thorised to be the "the wisdom of the crowd", and is composed of the aggregated knowledge of all the shareholders and investors.
These knowledge might contain hidden(inside knowledge) and public(cash flow, debts, yearly profits) parameters which can affect the value of the company.
One company can also be influenced by another, if Sumsung price go up because of a new phone release, Apple price might go down.
Knowing and taking all these parameters into account is not an easy task.

## Solutions
Naturally, when a lot of money is involved, A lot of people are trying to make sense of the stock market, and developed many ways to try and gain a little knowledge about the stock value before all the other investors do. 
* **tehcincal indicators** - creating indicators that should give some hints at where the stock is going to go(moving averages, high volumes thresholds, etc)
* **technical analysis** - trying to find some patterns that are said to be correlative with up/down movement in the price.
* **fundamental anaylsis** - analysing the companies quartly reports, trying to make sense of the profits, debts, etc.
* **Social media** - analysis social media sentiment about the company(tweeter, facebook, news reports).





# Contents
1. [Data set](#Data-set)
  1. [Validation split]
  1. [Close price and volume](#Close-price-and-volume)
  2. [Close-Volume correlation](#Close-Volume-correlation)
  3. [Profits stats](#Profits-stats)
  4. [Summary](#Data-Summary)
2. [Pre processing](#PreProcessing)
  1. [Scaling the data](#Scaling-the-data)
  2. [Stationary time series](#Stationary-time-series)
  3. [Summary](#Pre-Process-Summary)


# Data set 
The data is provided by https://www.kaggle.com/dgawlik/nyse?select=prices-split-adjusted.csv 
Dataset consists of following files:

1. **prices.csv**: raw, as-is daily prices. Most of data spans from 2010 to the end 2016, for companies new on stock market date range is shorter. There have been approx. 140 stock splits in that time, this set doesn't account for that.
2. **prices-split-adjusted.csv**: same as prices, but there have been added adjustments for splits.
3. **securities.csv**: general description of each company with division on sectors
4. **fundamentals.csv**: metrics extracted from annual SEC 10K fillings (2012-2016), should be enough to derive most of popular fundamental indicators.

In [2]:
import pandas as pd
import plotly.express as px 
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from plotly.offline import init_notebook_mode
from IPython.display import display
import seaborn as sns
import numpy as np
init_notebook_mode(connected=True)

In [3]:
prices = pd.read_csv("./prices-split-adjusted.csv", parse_dates=["date"],) 
prices.head()

Unnamed: 0,date,symbol,open,close,low,high,volume
0,2016-01-05,WLTW,123.43,125.839996,122.309998,126.25,2163600.0
1,2016-01-06,WLTW,125.239998,119.980003,119.940002,125.540001,2386400.0
2,2016-01-07,WLTW,116.379997,114.949997,114.93,119.739998,2489500.0
3,2016-01-08,WLTW,115.480003,116.620003,113.5,117.440002,2006300.0
4,2016-01-11,WLTW,117.010002,114.970001,114.089996,117.330002,1408600.0


We see that we have multiple columsn here
* date - single trading day
* symbol - ticker - the shortened name of the company
* open - the open(first) price value of the day
* close - the close(last) price value of the day
* low - the lowest price value of the day
* high - the highest price value of the day
* volume - number of transactions in a day. 

Let's focus on a single symbol = AAPL

In [4]:
df = prices[prices['symbol']=="AAPL"].drop(columns="symbol")
# Lets see how that looks now
df.head()

Unnamed: 0,date,open,close,low,high,volume
254,2010-01-04,30.49,30.572857,30.34,30.642857,123432400.0
721,2010-01-05,30.657143,30.625713,30.464285,30.798571,150476200.0
1189,2010-01-06,30.625713,30.138571,30.107143,30.747143,138040000.0
1657,2010-01-07,30.25,30.082857,29.864286,30.285715,119282800.0
2125,2010-01-08,30.042856,30.282858,29.865715,30.285715,111902700.0



We want to extract extra information from each ticket data.

We'll focus on the `close` price, since it's usually close to the `open`/`low`/`high` prices, and the volume. 


## Validation split
First things first - Let's take 15% of the data and set it aside for validation in the end.  
We don't want to jump into conclutions when we look at the future


In [24]:
prices = pd.read_csv("./prices-split-adjusted.csv", parse_dates=["date"],).sort_values("date")
_split_date = prices.date.unique()[int(len(prices.date.unique())*0.85)]

prices, validation = prices[prices.date<_split_date], prices[prices.date>=_split_date]
# basic utilities to get data for certain symbol and plot 
def prices_df(symbol):
    return prices[prices.symbol.str.match(symbol)]

def plot_close(df):
    return px.line(data_frame=df,
             y='close',
             line_group='symbol', color="symbol",
            hover_data=["open", "high","low", "volume"],
             x='date')

def plot_volume(df):
    return px.bar(df, x="date", y="volume", color="symbol", 
                  barmode='group')
    


## Close price and volume
We'll plot close price and volume for sample of symbols..
Volume is plotted as sum in quarters, in order to make the graph easier to understand

In [25]:
df = prices_df('GOOGL|AAPL|^FB$|MSFT')
plot_close(df).show()
df = df.set_index("date")

df = df.groupby([df.index.year.astype(str)+'_'+(df.index.month//4).astype(str), df.symbol]).sum()
plot_volume(df.reset_index())

## Close-Volume correlation
We can see that the volume tends to decrease through time, as the price goes up.
Let's see if the price is correlated with the volume


In [26]:
# Describe 
corr = pd.pivot_table(prices_df(".*").groupby("symbol").corr()['close'].reset_index(),
               values="close",
               columns=["level_1"], 
               index="symbol").volume
px.histogram(x=corr.index, y=corr)

We can see that for most of the stocks, close has negative correlation with price.
This might make sense since the more expensive the share there are less people who can afford to trade it.

### Close*Volume ratio.
We saw earlier that close and volume are opposite correlated.
it might tell us what is the the amount of money going transfered in a day.

In [27]:
df = prices_df("AAPL|GOOGL|FB$")
px.line(data_frame=df,x="date", y=df.close*df.volume, 
        line_group="symbol", color="symbol", title="Close*Volume value")


We can see that the Volume is very different between stocks, and is tending to decline.   
We would think the the volume*close price would tell us something about the value of the company,  
but we can see that it change in *Billions* of dollars in value. There are some "Anomalies",  
extremely high values of volume*close, and it might be a hint for something that is happening in the company.

## Cross company correlation
We want to see if some companies are correlated with each other.  
Does apple prices influence google price?  
We'll do that by checking the difference between each day to the previous one,  
and then finding the correlation between these values for different companies.

In [28]:
df = prices_df('GOOGL|AAPL|FB$|TWTR|MSFT|AMZN|JNJ|JPM')

close_diff = df.set_index(["symbol", "date"]).close.transform(
    lambda f: np.append(0, f.values[1:]-f.values[:-1])
)
corr_matrix = df.assign(close_diff=close_diff)[["symbol","date", "close_diff"]].pivot(index="date", columns="symbol", values="close_diff").corr()
go.Figure(go.Heatmap(z=corr_matrix, x=corr_matrix.keys(), y=corr_matrix.keys(),colorscale='Reds'))

We notice few strong (pearson's) correlations:
1. Amazon(`AMZN`) and Apple(`AAPL`) negative opposite correlation of -0.82.  
2. Facebook(`FB`) and Google(`GOOGL`) have negative correlation of -0.7.
3. Microsoft(`MSFT`) and Facebook(`FB`) have positive coorelation of 0.5.  
Seems that if we want to predict one stock value, we might need to use data from multiple companies.

## Profits stats
I want to check how much do stocks tend to go up, how many positive(value increased) days/month they have.

In [29]:
def get_stats(df):
    month_change = df.groupby(df.date.dt.to_period("M")).close.agg(lambda i: i.values[-1]-i.values[0])
    percent = lambda pred, total: round(100*len(total[pred])/len(total), 2)
    profit_per_share = df.close.iloc[-1] - df.close.iloc[0]
    return pd.DataFrame.from_records([{
        "trading_days": len(df),
        "positive_days": len(df[df.close > df.open]),
        "positive_days_perc": percent(df.close > df.open, df),
        "trading_month": len(month_change),
        "positive_month": len(month_change[month_change>0]),
        "positive_month_percent": percent(month_change>0, month_change),
        "profit_100_shares": 100*profit_per_share,
        "profit_2000_usd": (2000//df.close.iloc[0])*profit_per_share
    }])

stats = prices_df(".*").groupby("symbol").apply(get_stats)

display(stats)
display(stats.describe())

Unnamed: 0_level_0,Unnamed: 1_level_0,trading_days,positive_days,positive_days_perc,trading_month,positive_month,positive_month_percent,profit_100_shares,profit_2000_usd
symbol,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
A,0,1497,776,51.84,72,38,52.78,1780.087096,1584.277515
AAL,0,1497,712,47.56,72,40,55.56,3732.000000,15637.080000
AAP,0,1497,768,51.30,72,45,62.50,10847.000500,5315.030245
AAPL,0,1497,742,49.57,72,45,62.50,8260.714314,5369.464304
ABBV,0,743,407,54.78,36,20,55.56,1892.000200,1059.520112
...,...,...,...,...,...,...,...,...,...
YHOO,0,1497,705,47.09,72,38,52.78,1581.000000,1833.960000
YUM,0,1497,757,50.57,72,41,56.94,2603.163192,2056.498922
ZBH,0,1497,789,52.71,72,41,56.94,3893.999900,1285.019967
ZION,0,1497,762,50.90,72,40,55.56,1403.000100,2104.500150


Unnamed: 0,trading_days,positive_days,positive_days_perc,trading_month,positive_month,positive_month_percent,profit_100_shares,profit_2000_usd
count,499.0,499.0,499.0,499.0,499.0,499.0,499.0,499.0
mean,1440.184369,737.228457,51.108337,69.300601,39.623246,56.785611,4238.376018,2514.604109
std,235.416665,124.424522,1.958371,11.191731,7.747414,6.7422,7987.567605,3511.800733
min,19.0,10.0,39.82,2.0,0.0,0.0,-8020.0009,-1752.270078
25%,1497.0,747.5,50.1,72.0,38.0,52.78,1072.99995,672.799918
50%,1497.0,765.0,51.24,72.0,41.0,56.94,2560.633087,1674.619854
75%,1497.0,783.0,52.37,72.0,44.0,61.11,5118.358505,3259.535117
max,1497.0,836.0,55.85,72.0,52.0,72.22,106136.9949,41300.280891


## Stats points
We can see that there is a big difference between different stocks.
1. 51% percent of the days, for more than 50% of the stocks, are positive.
2. 56% percent of the months, for more than 50% of the stocks are positive.
3. If one would invest 2K\\$ in 2010, by 2016, he would have, in average, made 2.8K\\$ profit.
    3.1.  However, if we choose poorly, we could either lose money(minimum of 1.6K\\$) or have huge profit of up to 30K\\$ !

## Data-Summary
1. Prices and volumes are very different between stocks.
2. Most of the time, most of the stocks, increase in value.
3. Volume has **opposite** correlation with price. when volume goes up, price goes down.
4. Volume*price ratio 
5. Every 3 monthes(quarter) the companies report about their buisness. usually on these days the price is more volatile.   
6. Some companies are correlated with each other.


# Pre Processing
Each stock looks different from the other stocks. If we want to make a unified model, we want to scale the values to a similar range.
In order to do that, we will start with minmax scaling for each symbol. It's the simplest transformation that doesn't change the way the data looks.

1. scale within each symbol using minmax scaling.
2. replace the date field with "days" = days since the first data point (global)


**note**: This is only preprocessing for initial research, when trainig a model we need to fit everything only to the test set.

In [30]:
from sklearn.preprocessing import minmax_scale
from datetime import timedelta
import numpy as np


## Scaling the data

In [31]:
def minmax(df):
    """MinMax scaling the dataframe"""
    df = df.set_index(["symbol", "date"])
    
    # Scale by symbol
    df = df.groupby("symbol").transform(minmax_scale)
    df = df.reset_index()
    return df

def reset_days(df):
    """Convert date to day since start. easier to use than with actual dates."""
    days_since_start = ((df.date - df.date.min()).astype("timedelta64[s]")/(60*60*24)).astype(int)
    df = df.assign(day=days_since_start).drop(columns="date")
    df = df.sort_values("day")
    return df

pre_processed_1 = reset_days(minmax(prices))

def stock_df(symbol):
    return pre_processed_1[pre_processed_1.symbol.str.match(symbol)]

# plot_close and plot_volume, since we are not using the raw data anymore.
# Overriding
def plot_close(df):
    return px.line(data_frame=df,
             y='close',
             line_group='symbol', color="symbol",
            hover_data=["open", "high","low", "volume"],
             x='day', title="Close price")

# Overriding
def plot_volume(df, days=90):
    df = df.drop(columns="day").groupby([df.day//90, "symbol"]).sum().reset_index()
    return px.bar(df, x="day", y="volume", color="symbol", 
                  barmode='group', title="Volume")

Plot the close prices and Volume to see that prices are in the same range.

In [32]:
df = stock_df("AAPL|GOOGL|FB$")
plot_close(df).show()
plot_volume(df).show()

We can see that the values are much more similar between the different companies.

## Stationary time series
Naturally, stock data has both **trend** (company value increases/decreases) and **seasonality** (Apple release a new IPhone every year).  
We will have to normalize them both in order to continue.  
The point of doing that is to make the data "Stationary".
Steps:
1. Detrend the prices per company (using order of 1, 2)
2. Try to deseasonalize with:
  1. Weekly period
  2. 90 days period(quarter)
  3. yearly period


Terms and defenitions taken from this links   
* [Stationarity in time series analysis](https://towardsdatascience.com/stationarity-in-time-series-analysis-90c94f27322)   
* [Trend, Seasonality, Moving Average, Auto Regressive Model : My Journey to Time Series Data with Interactive Code](https://towardsdatascience.com/trend-seasonality-moving-average-auto-regressive-model-my-journey-to-time-series-data-with-edc4c0c8284b)

In [33]:
from statsmodels.tsa.tsatools import detrend
from statsmodels.tsa.seasonal import seasonal_decompose

In [49]:
def remove_trend_and_seasonality(df):
    if len(df)<180:
        return pd.DataFrame()
    day = df.reset_index().day
    df = df.reset_index(drop=True)
    data = {}
    for col in df.keys():
        decomp = seasonal_decompose(df[col], period=90, two_sided=True, extrapolate_trend=1)
        data[f"{col}_trend"] = decomp.trend
        data[f"{col}_seasonal"] = decomp.seasonal
        data[col] = decomp.resid  # Residual will override the original column
        data["day"] = day[-len(decomp.resid):]  # Mocking "day"
    return pd.DataFrame.from_dict(data)

pre_processed_2 = (pre_processed_1
                   .set_index(["symbol", "day"])
                   .groupby("symbol")
                   .apply(remove_trend_and_seasonality)
                   .reset_index(1, drop=True)  # Remove "day" index
                   .reset_index())

# Override 
def stock_df(symbol):
    return pre_processed_2[pre_processed_2.symbol.str.match(symbol)]

# Overriding
def plot_close(df, resid_only=True):
    if not resid_only:
        df = df.assign(close = df.close_trend+df.close_seasonal+df.close)
        
    return px.line(data_frame=df,
                   y='close',
                   line_group='symbol', color="symbol",
                   hover_data=["open", "high","low", "volume"],
                   x='day', title="Close price")

# Overriding
def plot_volume(df, days=90, resid_only=True):
    if not resid_only:
        df = df.assign(close = df.close_trend + df.close_seasonal + df.close)
        
    df = df.drop(columns="day").groupby([df.day // days, "symbol"]).sum().reset_index()
    return px.bar(df, x="day", y="volume", color="symbol", 
                  barmode='group', title="Volume")



In [35]:

df = stock_df("AAPL|GOOGL|FB$|MSFT")
fig = make_subplots(2,2, column_titles=["residual only", "original"], row_titles=["close price", "volume"])
fig.add_traces(plot_close(df).data, 1,1)
fig.add_traces(plot_close(df, resid_only=False).data, 1, 2,)
fig.add_traces(plot_volume(df).data, 2, 1)
fig.add_traces(plot_volume(df, resid_only=False).data, 2, 2)
fig


## Pre Process Summary
We can see we achieved what we needed.
1. Our data is in the same range cross-symbols
2. we have inversable transformation that removes seasonality and trend


# Base model
Lets define the task better.
We want to take the last few days (const) and predict the price for the next **1** day.  
We should define a base line (to measure our improvement) and a loss function.  
I chose these arbitrary
1. Base line - "*tomorrow will be the same as today*", Nothing changes.   
2. loss function - mase [mean absolute scaled error](#https://en.wikipedia.org/wiki/Mean_absolute_scaled_error)  
3. metrics
    1. mae - [Mean absolute error](#https://en.wikipedia.org/wiki/Mean_absolute_error)
    2. mse - [Mean squared error](#https://en.wikipedia.org/wiki/Mean_squared_error)
    3. mape - [mean absolute percentage error](#https://en.wikipedia.org/wiki/Mean_absolute_percentage_error)

In [75]:
import sktime  # Scikit learn like package designed for time series analysis.
from sktime.forecasting.naive import NaiveForecaster
from tensorflow.keras.metrics import mape, mae, mse
# from sktime. import  

In [108]:
def train_test_split(df, split=0.8, col="day"):
    """Temporal train test split by column value. 
    Not using default existing version because they dont look at the value of a specific column.
    """
    split_val = df[col].unique()[int(len(df[col].unique())*split)]
    return df[df[col]<split_val], df[df[col]>=split_val]
    
def predict_naive(X):
    # I.E for each value, predicting the current value
    return X[:-1]

train, test = train_test_split(stock_df("GOOGL").reset_index(drop=True))
y_pred = predict_naive(test.close)
y_true = test.close[1:]

metrics = [mape, mae, mse]
go.Figure([
    go.Scatter(x=train.index, y=train.close),
    go.Scatter(x=test.index, y=test.close),
    go.Scatter(x=y_pred.index-1, y=y_pred)
]).show()
{mf.__name__: mf(y_true, y_pred).numpy() for mf in metrics}

{'mean_absolute_percentage_error': 131.37082644986677,
 'mean_absolute_error': 0.01192722746173797,
 'mean_squared_error': 0.00032011503907438966}

We can see that mean in terms of `mae` and `mse` we have pretty good performance! of course it doesn't help us so much.  
`mape` looks like a better idea in this case

# XGBoost model

In [193]:
from pandarallel import pandarallel
pandarallel.initialize(progress_bar=False)
from concurrent.futures import ThreadPoolExecutor

INFO: Pandarallel will run on 12 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.


In [196]:

def ravel_window(df, window=90):
    if len(df)!= window:
        return pd.DataFrame()
    df = df.reset_index(drop=True)
    columns = [f"{c}_{i}" for i in range(window-1) for c in df.keys()]
    y_columns = [f"y_{c}" for c in df.keys()]
    return pd.DataFrame([df.to_numpy().ravel()], columns=columns+y_columns)

window = 90
def windowed_df(df):
    with ThreadPoolExecutor(10) as tp:
        return pd.concat(tp.map(ravel_window, df.reset_index(drop=True).rolling(window)))
        

windowed_df(stock_df("GOOGL"))

#     data.append(ravel_window(i.drop(columns=["day", "symbol"]), len(i)))
    
    


Unnamed: 0,symbol_0,open_trend_0,open_seasonal_0,open_0,day_0,close_trend_0,close_seasonal_0,close_0,low_trend_0,low_seasonal_0,...,y_close,y_low_trend,y_low_seasonal,y_low,y_high_trend,y_high_seasonal,y_high,y_volume_trend,y_volume_seasonal,y_volume
0,GOOGL,0.152780,-0.009364,0.023431,0.0,0.154936,-0.011974,0.025790,0.155450,-0.010440,...,0.004043,0.069342,-0.013841,0.005670,0.070665,-0.014171,0.004579,0.216892,0.004762,0.019679
0,GOOGL,0.151660,-0.014061,0.029450,1.0,0.153794,-0.012259,0.024775,0.154261,-0.012488,...,0.007957,0.068340,-0.010440,0.010760,0.069678,-0.010431,0.010545,0.218253,0.001610,-0.014895
0,GOOGL,0.150541,-0.013056,0.028398,2.0,0.152652,-0.016125,0.015862,0.153072,-0.016119,...,0.006309,0.067255,-0.012488,0.001260,0.068647,-0.013733,0.005240,0.217760,-0.020605,0.062446
0,GOOGL,0.149421,-0.015622,0.017525,3.0,0.151509,-0.021020,0.009367,0.151884,-0.019413,...,0.011555,0.066175,-0.016119,0.007851,0.067646,-0.014974,0.005180,0.217894,-0.010823,-0.038884
0,GOOGL,0.148301,-0.021247,0.008880,4.0,0.150367,-0.022536,0.019035,0.150695,-0.022002,...,0.008841,0.065294,-0.019413,0.010880,0.066755,-0.017216,0.010597,0.217133,-0.007060,-0.039696
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
0,GOOGL,0.708231,0.014811,0.063925,2034.0,0.710054,0.019273,0.047362,0.708688,0.018608,...,0.035817,0.925158,0.015005,0.042479,0.931214,0.013391,0.034317,0.060541,-0.016640,-0.005504
0,GOOGL,0.710430,0.015175,0.048921,2037.0,0.712450,0.013297,0.063710,0.710904,0.014789,...,0.033777,0.927033,0.018608,0.029493,0.933230,0.016759,0.024822,0.060746,-0.002853,-0.029089
0,GOOGL,0.712834,0.017316,0.056182,2038.0,0.714877,0.019579,0.048917,0.713214,0.018192,...,0.015560,0.928908,0.014789,0.012649,0.935247,0.014602,0.020485,0.060951,0.054747,-0.060130
0,GOOGL,0.715295,0.015936,0.057592,2039.0,0.717174,0.017053,0.070384,0.715423,0.018585,...,0.002908,0.930784,0.018192,0.012715,0.937263,0.018158,-0.002615,0.061157,0.051238,-0.081427


array([0.15278001, 0.15166037, 0.15054072, 0.14942108, 0.14830144,
       0.1471818 , 0.14606216, 0.14494251, 0.14382287, 0.14270323,
       0.14158359, 0.14046395, 0.13934431, 0.13822466, 0.13710502,
       0.13598538, 0.13486574, 0.1337461 , 0.13262645, 0.13150681,
       0.13038717, 0.12926753, 0.12814789, 0.12702825, 0.1259086 ,
       0.12478896, 0.12366932, 0.12254968, 0.12143004, 0.1203104 ,
       0.11919075, 0.11807111, 0.11695147, 0.11583183, 0.11471219,
       0.11359254, 0.1124729 , 0.11135326, 0.11023362, 0.10911398,
       0.10799434, 0.10687469, 0.10575505, 0.10463541, 0.10351577,
       0.10239613, 0.10127648, 0.10011444, 0.09904089, 0.09808202,
       0.09702493, 0.09580643, 0.09470403, 0.09366473, 0.09254921,
       0.09152952, 0.09059841, 0.0896341 , 0.08874555, 0.0881102 ,
       0.08766929, 0.08727393, 0.08678821, 0.08625797, 0.08573957,
       0.08521533, 0.08478955, 0.0843566 , 0.08393329, 0.08363217,
       0.08330747, 0.08295762, 0.08257636, 0.08213658, 0.08162

TypeError: must be real number, not NoneType

In [None]:
# TODO: 
# Look at the fundementals data
# Use deep learning and (RNN, LSTMs)
# Find anomalies in data
# Find connections between different stocks and use that data
# Reinforcment learning - automatic strategy.