# Now We Will Start On The Code

We'll start by getting code to install Python packages on Google Collab.

We'll be using the following libraries:

*    pandas 
*    numpy 
*    ta
*    yfinance
*    plotly

In [1]:
# Code to do library installations
import sys # Import the library that does system activities (like install other packages)

Next we'll have code that you, as a user, will input. 

You'll submit what company ticker you're interested in, and the start and end dates of interest. 

You can tind the ticker symbols for a lot of companies [here](http://www.eoddata.com/symbols.aspx?AspxAutoDetectCookieSupport=1).

When selected a date range, keep in mind COVID-19 has had an insane effect on the market. Stocks are trading very irreguarly and differently than the did, pre-COVID. If you traing on historical data before COVID, and try to use it after COVID, your model might not be that effective. 

In [2]:
# Choose your ticker
tickerSymbol = "AMZN"

# Choose date range - format should be 'YYYY-MM-DD' 
startDate = '2015-04-01' # as strings
endDate = '2020-01-01' # as strings

Next, we'll go ahead and install that *yfinance* Python library. As a reminder, this is how we'll get stock price information from the Yahoo! Finance website. 

This will be what we use to go and get the stock data for that ticker.

In [3]:
# Check if local computer has the library yfinance. If not, install. Then Import it.
!{sys.executable} -m pip install yfinance # Check if the machine has yfinance, if not, download yfinance
import yfinance as yf # Import library to access Yahoo finance stock data

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting yfinance
  Downloading yfinance-0.1.74-py2.py3-none-any.whl (27 kB)
Collecting requests>=2.26
  Downloading requests-2.28.1-py3-none-any.whl (62 kB)
[K     |████████████████████████████████| 62 kB 1.3 MB/s 
Installing collected packages: requests, yfinance
  Attempting uninstall: requests
    Found existing installation: requests 2.23.0
    Uninstalling requests-2.23.0:
      Successfully uninstalled requests-2.23.0
Successfully installed requests-2.28.1 yfinance-0.1.74


Now that *yfinance* is imported, let's go ahead and get the stock data using the *yfinance* package.

We'll print out a preview of what the data looks like once it is complete. 

In [4]:
# Create ticker yfinance object
tickerData = yf.Ticker(tickerSymbol)

# Create historic data dataframe and fetch the data for the dates given. 
df = tickerData.history(start = startDate, end = endDate)

# Print statement showing the download is done

# Show what the first 5 rows of the data frame
# Note the dataframe has:
#   - Date (YYY-MM-DD) as an index
#   - Open (price the stock started as)
#   - High (highest price stock reached that day)
#   - Low (lowest price stock reached that day)
#   - Close (price the stock ended the day as)
#   - Volume (how many shares were traded that day)
#   - Dividends (any earnings shared to shareholders)
#   - Stock Splits (any stock price changes)

print('-----------------------')
print('Done!')
print(df.head())

-----------------------
Done!
                 Open       High        Low      Close    Volume  Dividends  \
Date                                                                          
2015-04-01  18.605000  18.658001  18.417000  18.513000  49162000          0   
2015-04-02  18.525000  18.664000  18.450001  18.612499  37506000          0   
2015-04-06  18.504999  19.010000  18.468000  18.851999  61014000          0   
2015-04-07  18.807501  18.965500  18.701500  18.720501  39098000          0   
2015-04-08  18.733000  19.079000  18.732500  19.059999  52728000          0   

            Stock Splits  
Date                      
2015-04-01             0  
2015-04-02             0  
2015-04-06             0  
2015-04-07             0  
2015-04-08             0  


Let's get another useful library imported, [pandas](https://pandas.pydata.org/). 

*pandas* is the best way to manipulate dataframe objects.

[*numpy*](https://numpy.org/) is also helpful dealing with data structures. 

In [5]:
# Import the library that does dataframe management
import pandas as pd # Library that manages dataframes
import numpy as np

The date is just a string right now, but Python is smart and can realize it is a date if we help it out. These date variable types are easier to work with and efficient. 

Let's change the date time from a string to a date type. 

In [12]:
# Change the date column to a pandas date time column 

# Define string format
date_change = '%Y-%m-%d'

# Create a new date column from the index
df['Date'] = df.index

# Perform the date type change
df['Date'] = pd.to_datetime(df['Date'], format = date_change)

# Create a variable that is the date column
Dates = df['Date']

We know the "Open", "High", "Low", "Close", "Volume" are useful, but there is more data that can be derived off of this data. 

Financial Technical Indicators are useful to understand what is going on with a particular stock. 

We will create some of these with help from a package called *ta* standing for technical anlysis. 

First, we'll have to install and then import the package.

In [13]:
# Add financial information and indicators 
!{sys.executable} -m pip install ta # Download ta
from ta import add_all_ta_features # Library that does financial technical analysis 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Now that the package is imported, let's add these technical indicators to our dataframe.

We'll print out each column name of our dataframe to see what new columns we gained. 

In [14]:
# Add all technical analysis to the dataframe we've already loaded
df = add_all_ta_features(df, "Open", "High", "Low", "Close", "Volume", fillna=True) 

print(df.columns)

  dip[idx] = 100 * (self._dip[idx] / value)
  din[idx] = 100 * (self._din[idx] / value)


Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Dividends', 'Stock Splits',
       'Date', 'volume_adi', 'volume_obv', 'volume_cmf', 'volume_fi',
       'volume_em', 'volume_sma_em', 'volume_vpt', 'volume_vwap', 'volume_mfi',
       'volume_nvi', 'volatility_bbm', 'volatility_bbh', 'volatility_bbl',
       'volatility_bbw', 'volatility_bbp', 'volatility_bbhi',
       'volatility_bbli', 'volatility_kcc', 'volatility_kch', 'volatility_kcl',
       'volatility_kcw', 'volatility_kcp', 'volatility_kchi',
       'volatility_kcli', 'volatility_dcl', 'volatility_dch', 'volatility_dcm',
       'volatility_dcw', 'volatility_dcp', 'volatility_atr', 'volatility_ui',
       'trend_macd', 'trend_macd_signal', 'trend_macd_diff', 'trend_sma_fast',
       'trend_sma_slow', 'trend_ema_fast', 'trend_ema_slow',
       'trend_vortex_ind_pos', 'trend_vortex_ind_neg', 'trend_vortex_ind_diff',
       'trend_trix', 'trend_mass_index', 'trend_dpo', 'trend_kst',
       'trend_kst_sig', 'trend_kst_diff', 'trend

Yay! Now, we've added the techincal indicators!

You can learn and understand what all these new values are on the documentation of the *ta* site. They have a [dictionary](https://technical-analysis-library-in-python.readthedocs.io/en/latest/ta.html) that exmplains what these indicators are, and what they mean. 

Now that we have the technical indicators and dates sorted out, let's add some date features that will show what month it is, what day of the dear it is, what day in the quarter it is, ect. 

To do that, we will use a Python package called *fastai*. 

Let's install the package.

In [15]:
# Install fastai to use the date function
!{sys.executable} -m pip install fastai # Download fastai 
import fastai.tabular # Library that does date factors

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


After it is imported, let's add the new date features. 

In [17]:
from fastai.tabular.all import *
# Define the date parts 
fastai.tabular.add_datepart(df,'Date', drop = 'True')

# Ensure the correct format
df['Date'] = pd.to_datetime(df.index.values, format = date_change)

# Add the date parts
fastai.tabular.add_cyclic_datepart(df, 'Date', drop = 'True')

AttributeError: ignored

### Data Pulling, Complete!

We've now pulled all the data we need. We'll start creating our model now! 

Let's start by defining how far out we want to do our predictions. 

I'm interested in 1 day out, 5 days out, and 10 days out so I'll add those to my *shifts* list. 

We'll also define how much of our data we want to use to train and how much we will use to evaluate the model. 75% is a good start. 




In [None]:
# Define key model parameters

# Set days out to predict 
shifts = [1,5,10]

# Set a training percentage
train_pct = .75

# Plotting dimensions
w = 16 # width
h = 4 # height 

### Defining Functions

Next, we'll define some functions to do some tasks for us.

The first one is boring and tedious, but the packages we used were a little lazy on what variable types they used. The following function just goes through and makes sure the right columns are numbers (floats) and the right columns are categories (like strings). 

In [None]:
# Ensure column types are correct

def CorrectColumnTypes(df):
  # Input: dataframe 
  # ouptut: dataframe (with column types changed)

  # Numbers
  for col in df.columns[1:80]:
      df[col] = df[col].astype('float')

  for col in df.columns[-10:]:
      df[col] = df[col].astype('float')

  # Categories 
  for col in df.columns[80:-10]:
      df[col] = df[col].astype('category')

  return df 

In order to do the days in the future, we have to move our closing costs by that number of days.

We'll write a function that does that for us. 

In [None]:
# Create the lags 
def CreateLags(df,lag_size):
  # inputs: dataframe , size of the lag (int)
  # ouptut: dataframe ( with extra lag column), shift size (int)

  # add lag
  shiftdays = lag_size
  shift = -shiftdays
  df['Close_lag'] = df['Close'].shift(shift)
  return df, shift


Finally, we'll actually divide the historic data into the test and train sets.

We'll split up the x's and the y as well for this.

We'll end up with a test and training set for the *x*'s and the *y*. 

In [None]:
# Split the testing and training data 
def SplitData(df, train_pct, shift):
  # inputs: dataframe , training_pct (float between 0 and 1), size of the lag (int)
  # ouptut: x train dataframe, y train data frame, x test dataframe, y test dataframe, train data frame, test dataframe

  train_pt = int(len(df)*train_pct)
  
  train = df.iloc[:train_pt,:]
  test = df.iloc[train_pt:,:]
  
  x_train = train.iloc[:shift,1:-1]
  y_train = train['Close_lag'][:shift]
  x_test = test.iloc[:shift,1:-1]
  y_test = test['Close'][:shift]

  return x_train, y_train, x_test, y_test, train, test



The best way to understand how good our predictions are is to actually *see* and *compare*. We'll do this by making a time series visualization.This visual will compare the actual versus the predicted over time. 

The best visualization package for Python is [plotly](https://plotly.com/). 

We'll start by installing it. 

In [None]:
!{sys.executable} -m pip install plotly # Download plotly 
import plotly.graph_objs as go  # Import the graph ojbects 

Now we'll make a function that greats a sweet graph for us

In [None]:
# Function to make the plots
def PlotModelResults_Plotly(train, test, pred, ticker, w, h, shift_days,name):
  # inputs: train dataframe, test dataframe, predicted value (list), ticker ('string'), width (int), height (int), shift size (int), name (string)
  # output: None

  # Create lines of the training actual, testing actual, prediction 
  D1 = go.Scatter(x=train.index,y=train['Close'],name = 'Train Actual') # Training actuals
  D2 = go.Scatter(x=test.index[:shift],y=test['Close'],name = 'Test Actual') # Testing actuals
  D3 = go.Scatter(x=test.index[:shift],y=pred,name = 'Our Prediction') # Testing predction

  # Combine in an object  
  line = {'data': [D1,D2,D3],
          'layout': {
              'xaxis' :{'title': 'Date'},
              'yaxis' :{'title': '$'},
              'title' : name + ' - ' + tickerSymbol + ' - ' + str(shift_days)
          }}
  # Send object to a figure 
  fig = go.Figure(line)

  # Show figure
  fig.show()

## Making the Model

In order to make the models, we'll be using a package called SciKit Learn.

We'll have to install and import the package. 

In [None]:
# Import sklearn modules that will help with modeling building

!{sys.executable} -m pip install sklearn # Download sklearn 
from sklearn.metrics import mean_squared_error # Install error metrics 
from sklearn.linear_model import LinearRegression # Install linear regression model
from sklearn.neural_network import MLPRegressor # Install ANN model 
from sklearn.preprocessing import StandardScaler # to scale for ann

As discussed earlier, the easiest form of machine learning is linear regression. 

In [None]:
# Regreesion Function

def LinearRegression_fnc(x_train,y_train, x_test, y_test):
  #inputs: x train data, y train data, x test data, y test data (all dataframe's)
  # output: the predicted values for the test data (list)
  
  lr = LinearRegression()
  lr.fit(x_train,y_train)
  lr_pred = lr.predict(x_test)
  lr_MSE = mean_squared_error(y_test, lr_pred)
  lr_R2 = lr.score(x_test, y_test)
  print('Linear Regression R2: {}'.format(lr_R2))
  print('Linear Regression MSE: {}'.format(lr_MSE))

  return lr_pred


In [None]:
# ANN Function 

def ANN_func(x_train,y_train, x_test, y_test):

  # Scaling data
  scaler = StandardScaler()
  scaler.fit(x_train)
  x_train_scaled = scaler.transform(x_train)
  x_test_scaled = scaler.transform(x_test)


  MLP = MLPRegressor(random_state=1, max_iter=1000, hidden_layer_sizes = (100,), activation = 'identity',learning_rate = 'adaptive').fit(x_train_scaled, y_train)
  MLP_pred = MLP.predict(x_test_scaled)
  MLP_MSE = mean_squared_error(y_test, MLP_pred)
  MLP_R2 = MLP.score(x_test_scaled, y_test)

  print('Muli-layer Perceptron R2 Test: {}'.format(MLP_R2))
  print('Multi-layer Perceptron MSE: {}'.format(MLP_MSE))

  return MLP_pred

Let's create one last function to calculate how much money we would have made, had we been trading this strategy

In [None]:
def CalcProfit(test_df,pred,j):
  pd.set_option('mode.chained_assignment', None)
  test_df['pred'] = np.nan
  test_df['pred'].iloc[:-j] = pred
  test_df['change'] = test_df['Close_lag'] - test_df['Close'] 
  test_df['change_pred'] = test_df['pred'] - test_df['Close'] 
  test_df['MadeMoney'] = np.where(test_df['change_pred']/test_df['change'] > 0, 1, -1) 
  test_df['profit'] = np.abs(test['change']) * test_df['MadeMoney']
  profit_dollars = test['profit'].sum()
  print('Would have made: $ ' + str(round(profit_dollars,1)))
  profit_days = len(test_df[test_df['MadeMoney'] == 1])
  print('Percentage of good trading days: ' + str( round(profit_days/(len(test_df)-j),2))     )

  return test_df, profit_dollars

## Let's Start Predicting!
## Time To Make Money!

We've gotten our data, created functions, now let's get to the point of actually doing predictions. 

For the ticker, we'll have a prediction for each time length out into the future. 

In [None]:
# Go through each shift....

for j in shifts: 
  print(str(j) + ' days out:')
  print('------------')
  df_lag, shift = CreateLags(df,j)
  df_lag = CorrectColumnTypes(df_lag)
  x_train, y_train, x_test, y_test, train, test = SplitData(df, train_pct, shift)

  # Linear Regression
  print("Linear Regression")
  lr_pred = LinearRegression_fnc(x_train,y_train, x_test, y_test)
  test2, profit_dollars = CalcProfit(test,lr_pred,j)
  PlotModelResults_Plotly(train, test, lr_pred, tickerSymbol, w, h, j, 'Linear Regression')

  # Artificial Neuarl Network 
  print("ANN")
  MLP_pred = ANN_func(x_train,y_train, x_test, y_test)
  test2, profit_dollars = CalcProfit(test,MLP_pred,j)
  PlotModelResults_Plotly(train, test, MLP_pred, tickerSymbol, w, h, j, 'ANN')
  print('------------')







1 days out:
------------
Linear Regression
Linear Regression R2: -1.2744069625602852
Linear Regression MSE: 33739.732916610694
Would have made: $ -53.1
Percentage of good trading days: 0.53


ANN
Muli-layer Perceptron R2 Test: 0.9844618026594185
Multi-layer Perceptron MSE: 230.50168105652483
Would have made: $ 455.0
Percentage of good trading days: 0.54


------------
5 days out:
------------
Linear Regression
Linear Regression R2: -19.888112691972612
Linear Regression MSE: 311731.97149463434
Would have made: $ 1000.2
Percentage of good trading days: 0.53


ANN
Muli-layer Perceptron R2 Test: 0.8877512966148928
Multi-layer Perceptron MSE: 1675.1877069967784
Would have made: $ -53.1
Percentage of good trading days: 0.49


------------
10 days out:
------------
Linear Regression
Linear Regression R2: -48.516137962624775
Linear Regression MSE: 750876.139489157
Would have made: $ 1919.2
Percentage of good trading days: 0.6


ANN
Muli-layer Perceptron R2 Test: 0.7318278853782345
Multi-layer Perceptron MSE: 4066.6346454124987
Would have made: $ 1152.8
Percentage of good trading days: 0.56


------------
