# Stock Price Prediction with TensorFlow

### Basis
We assume that the stock price is the sum of the intrinsic value of the stock and a mispricing error (e.g. due to speculation). The intrinsic value is a function of fundamental factors (such as P/E ratio, EBITDA, and revenue), so we can use these factors to train an ML model that estimates the intrinsic value of a stock.
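The same assumption can be written as an equation. The notation here is ours, not the original author's. For stock $i$ with fundamental factor vector $\mathbf{x}_i$, market price $P_i$, intrinsic value $V(\mathbf{x}_i)$ and mispricing error $\varepsilon_i$, a model $f_\theta$ is fitted so that $f_\theta(\mathbf{x}_i) \approx V(\mathbf{x}_i)$, by minimising the mean squared error over $N$ training stocks:

$$
P_i = V(\mathbf{x}_i) + \varepsilon_i,
\qquad
\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \bigl(P_i - f_\theta(\mathbf{x}_i)\bigr)^2
$$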

We further assume that the mispricing errors are independent across stocks and cancel out when averaged over the whole market, so a model fitted to market prices recovers the intrinsic value on average. This means that if the intrinsic value calculated by the model is much lower than the current stock price, the stock is overvalued (and, conversely, undervalued if it is much higher).
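The decision rule above can be sketched in a few lines, assuming the model's intrinsic-value estimate is already available. The function name `valuation_signal` and the 20% relative `margin` are hypothetical. The text only says "much lower", so the threshold is an illustrative choice, not the author's.

```python
def valuation_signal(price: float, intrinsic: float, margin: float = 0.2) -> str:
    """Compare a market price with a model-estimated intrinsic value.

    `margin` is a hypothetical relative threshold for "much higher/lower".
    """
    if intrinsic <= 0:
        raise ValueError("intrinsic value must be positive")
    gap = (price - intrinsic) / intrinsic  # > 0 means the price exceeds the value
    if gap > margin:
        return "overvalued"
    if gap < -margin:
        return "undervalued"
    return "fairly valued"


if __name__ == "__main__":
    print(valuation_signal(price=150.0, intrinsic=100.0))  # overvalued
    print(valuation_signal(price=70.0, intrinsic=100.0))   # undervalued
    print(valuation_signal(price=105.0, intrinsic=100.0))  # fairly valued
```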

### Settings
We use the yfinance library (https://github.com/ranaroussi/yfinance) to obtain fundamental data for stocks. The dataset covers all stocks listed on the NYSE and NASDAQ for which yfinance has adequate data, and is stored in `stock_data.csv`.
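The notebook does not include the collection step (its `yfinance` import is commented out). The following is a minimal sketch of how such a CSV could be built. It assumes the ticker list is already known and that "adequate data" means every required field is present and not `None`. `collect_rows`, `write_csv` and the example field `trailingPE` are illustrative assumptions. Only `yf.Ticker(...).info` is the real yfinance call, and it is imported lazily so that the sketch also runs offline with stand-in data.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List


def fetch_info_yfinance(ticker: str) -> dict:
    """Fetch Yahoo Finance fundamentals for one ticker via yfinance."""
    import yfinance as yf  # lazy import: not needed for the offline demo
    return yf.Ticker(ticker).info


def collect_rows(tickers: Iterable[str], fields: List[str],
                 fetch_info: Callable[[str], dict] = fetch_info_yfinance) -> List[Dict]:
    """Keep only tickers whose info dict has a non-missing value for every field."""
    rows = []
    for ticker in tickers:
        try:
            info = fetch_info(ticker)
        except Exception:
            continue  # lookup failed: treat as inadequate data
        if all(info.get(field) is not None for field in fields):
            rows.append({"symbol": ticker, **{field: info[field] for field in fields}})
    return rows


def write_csv(rows: List[Dict], fields: List[str], stream) -> None:
    """Write one header row of column names, then one row per stock."""
    writer = csv.DictWriter(stream, fieldnames=["symbol", *fields])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Offline demo with stand-in data instead of live yfinance calls
    fake = {
        "AAA": {"currentPrice": 10.0, "sector": "Technology", "trailingPE": 25.0},
        "BBB": {"currentPrice": 5.0, "sector": "Energy", "trailingPE": None},
    }
    fields = ["currentPrice", "sector", "trailingPE"]
    rows = collect_rows(["AAA", "BBB", "CCC"], fields, fetch_info=fake.__getitem__)
    buffer = io.StringIO()
    write_csv(rows, fields, buffer)
    print(buffer.getvalue())
```

Against live data, `collect_rows(tickers, fields)` uses yfinance by default. The result can be written with `open('stock_data.csv', 'w', newline='')` as the stream.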

We use pandas to process the data and build the input vectors from the factors listed in `stock_intrinsic_factors.json`. Each input feature is normalised to have mean 0 and variance 1.
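The contents of `stock_intrinsic_factors.json` are not shown. Judging by how `generate_stock_vector` iterates over it, the file is a JSON array of column names from `stock_data.csv`, and the names below are illustrative guesses. The sketch standardises each column using statistics fitted on training rows only, so that test rows stay unseen. It also guards against zero-variance columns. Both are design choices of this sketch rather than a description of the notebook. The helper names `fit_standardiser` and `standardise` are hypothetical.

```python
import json

import numpy as np

# Hypothetical contents of stock_intrinsic_factors.json: a list of column names
FACTORS_JSON = '["trailingPE", "ebitda", "totalRevenue"]'


def fit_standardiser(x_train: np.ndarray):
    """Per-column mean and standard deviation (ddof=0, like np.var) from training rows."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero for constant columns
    return mean, std


def standardise(x: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Shift and scale each column with the fitted statistics."""
    return (x - mean) / std


if __name__ == "__main__":
    factors = json.loads(FACTORS_JSON)
    rng = np.random.default_rng(0)
    # Synthetic training rows, one column per factor
    x_train = rng.normal(loc=[20.0, 5e9, 3e10], scale=[8.0, 2e9, 1e10],
                         size=(100, len(factors)))
    mean, std = fit_standardiser(x_train)
    z = standardise(x_train, mean, std)
    print(factors)
    print(np.round(z.mean(axis=0), 6), np.round(z.var(axis=0), 6))
```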

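We create the model in TensorFlow (Keras). It is a feed-forward network with two 200-unit ReLU layers, each followed by 20% dropout, and a single output unit. It is trained on current market prices with mean squared error as the loss function, and its output is the predicted intrinsic price of the stock.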
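Mean squared error fits the basis above for a simple reason. The constant that minimises MSE over a set of values is their mean. So if mispricing errors average to zero, the MSE-optimal prediction for stocks with the same fundamentals is their intrinsic value. The following minimal NumPy check uses synthetic prices, not data from the notebook. `best_constant` is a hypothetical helper that searches a grid of candidate constants.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Plain mean of squared differences."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def best_constant(y: np.ndarray, candidates: np.ndarray) -> float:
    """Return the candidate constant prediction with the lowest MSE against y."""
    losses = [mse(y, np.full_like(y, c)) for c in candidates]
    return float(candidates[int(np.argmin(losses))])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    intrinsic = 100.0
    # Synthetic prices for stocks with identical fundamentals:
    # intrinsic value plus zero-mean mispricing noise
    prices = intrinsic + rng.normal(0.0, 15.0, size=2000)
    grid = np.linspace(80.0, 120.0, 801)  # candidates 0.05 apart
    print(best_constant(prices, grid), prices.mean())
```

The grid search lands within one grid step of the sample mean, which itself sits close to the intrinsic value of 100.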

```python
import json
import os

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
# import yfinance as yf
```

```python
def generate_stock_vector(ticker, stock_data_df, stock_intrinsic_factors):
    """
    Generate the input vector (one float per intrinsic factor) for a ticker
    """
    stock = stock_data_df.loc[stock_data_df['symbol'] == ticker]
    stock_vector = [float(stock[f].values[0]) for f in stock_intrinsic_factors]
    return np.array(stock_vector)


# Numerical codes for the recommendation keys handled by this notebook
REC_KEY_VALUES = {'none': 0, 'hold': 1, 'buy': 2, 'strong_buy': 3}


def rec_key_change(key):
    """
    Change recommendation key to a numerical value.
    Keys not in REC_KEY_VALUES return None (a missing value in the DataFrame).
    """
    return REC_KEY_VALUES.get(key)


def stock_sector_change(key, stock_sectors):
    """
    Change stock sector into its index in the sorted list of sectors
    """
    return stock_sectors.index(key)
```
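A design note on `stock_sector_change`: mapping each sector to its index in a sorted list gives the network an arbitrary ordering (for example "Energy" < "Technology"). One-hot encoding is a common alternative that avoids implying any order. It is not what the notebook does, and it would widen the input vector by one column per sector. The following is a minimal NumPy sketch. The function `one_hot_sectors` is hypothetical.

```python
import numpy as np


def one_hot_sectors(sectors, vocabulary=None):
    """One-hot encode sector names.

    Returns (matrix, vocabulary); each row has a single 1 in its sector's column.
    Pass the training vocabulary when encoding new stocks so the columns line up.
    """
    if vocabulary is None:
        vocabulary = sorted(set(sectors))
    index = {name: i for i, name in enumerate(vocabulary)}
    matrix = np.zeros((len(sectors), len(vocabulary)), dtype=np.float32)
    for row, name in enumerate(sectors):
        matrix[row, index[name]] = 1.0  # raises KeyError for a sector unseen in training
    return matrix, vocabulary


if __name__ == "__main__":
    m, vocab = one_hot_sectors(["Technology", "Energy", "Technology"])
    print(vocab)
    print(m)
```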

```python
class stock_price_predictor:
    def __init__(self):
        pass

    def initialize_dataset(self, training_data_csv_file='stock_data.csv',
                           intrinsic_factors='stock_intrinsic_factors.json'):
        '''
        Read and process dataset from csv file
        '''
        # Read important intrinsic factors to be set for input
        with open(intrinsic_factors, 'r') as file:
            self.stock_intrinsic_factors = json.load(file)

        # Read training data from csv
        self.stock_data_df = pd.read_csv(training_data_csv_file)

        # Generate sorted list of sectors represented in training data
        # (np.unique already returns its values sorted)
        self.stock_sectors = list(np.unique(self.stock_data_df['sector'].values))
        with open('stock_sectors.json', 'w') as file:
            json.dump(self.stock_sectors, file)

        # Change sectors into numerical values
        self.stock_data_df['sector'] = self.stock_data_df['sector'].map(
            lambda sector: stock_sector_change(sector, self.stock_sectors))
        # Change recommendationKey into numerical values
        self.stock_data_df['recommendationKey'] = self.stock_data_df['recommendationKey'].map(rec_key_change)

    def generate_mean_variance(self, x_data):
        '''
        Compute and save the per-factor mean and variance of x_data.
        These statistics are used to normalise inputs to mean 0 and variance 1.
        '''
        self.x_mean = np.mean(x_data, axis=0)
        self.x_variance = np.var(x_data, axis=0)

        # Save mean and variance into json file
        with open('mean_variance.json', 'w') as file:
            json.dump({'mean': self.x_mean.tolist(), 'variance': self.x_variance.tolist()}, file)

    def generate_training_set(self):
        '''
        Generate the input vector from the stock data collected
        Split dataset into training and test sets
        Normalise inputs to mean 0 and variance 1 using training-set statistics
        '''
        # One input vector per stock, in DataFrame row order
        x_data = np.array([generate_stock_vector(f, self.stock_data_df, self.stock_intrinsic_factors)
                           for f in self.stock_data_df['symbol'].values])
        # Target: current market price
        y_data = self.stock_data_df['currentPrice'].to_numpy(dtype=float)
        self.data_set = [x_data, y_data]

        # Split data into training and test sets
        x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.20, random_state=42)

        # Fit mean/variance on the training set only, so no test-set information leaks in
        self.generate_mean_variance(x_train)
        x_train = (x_train - self.x_mean) / np.sqrt(self.x_variance)
        x_test = (x_test - self.x_mean) / np.sqrt(self.x_variance)

        self.training_set = [x_train, y_train]
        self.test_set = [x_test, y_test]

    def create_model(self):
        '''
        Create tensorflow model
        Create path linked to previously saved model weights
        '''
        # Set model architecture
        self.model = tf.keras.models.Sequential([
            tf.keras.layers.Dense(200, activation='relu'),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(200, activation='relu'),
            tf.keras.layers.Dropout(0.2),
            # ReLU on the output keeps predicted prices non-negative
            tf.keras.layers.Dense(1, activation='relu')
            ])

        # Prepare to save the model weights
        self.checkpoint_path = "training_1/cp.ckpt"
        self.checkpoint_dir = os.path.dirname(self.checkpoint_path)

    def training(self):
        '''
        Train tensorflow model
        '''
        # Check TensorFlow version
        print("TensorFlow version:", tf.__version__)

        # Create ML model
        self.create_model()

        # Create a callback that saves the model's weights after every epoch
        cp_callback = tf.keras.callbacks.ModelCheckpoint(filepath=self.checkpoint_path,
                                                         save_weights_only=True,
                                                         verbose=1)

        # Define loss function
        loss_fn = tf.keras.losses.MeanSquaredError()

        # Compile model
        self.model.compile(optimizer='adam',
                           loss=loss_fn,
                           metrics=['mse'])

        [x_train, y_train] = self.training_set
        [x_test, y_test] = self.test_set

        # Train model
        self.model.fit(x_train, y_train, epochs=50, callbacks=[cp_callback])

        # summary() prints the table itself and returns None
        self.model.summary()

        # Evaluate model on test set
        self.model.evaluate(x_test, y_test, verbose=2)
        print('---------------------------')
        predictions = self.model(x_test[:5]).numpy()
        for i in range(len(predictions)):
            print(f'Model price: {predictions[i][0]}, Actual price: {y_test[i]}')
        print('---------------------------')

    def predict_price(self, ticker):
        '''
        Use the ML model to predict the price of a stock given its fundamental info such as P/E ratio, cashflow, etc.
        The fundamental info is read from stock_data_df (collected from Yahoo Finance with yfinance),
        so the ticker must be present in stock_data.csv.
        '''
        stock_vector = generate_stock_vector(ticker, self.stock_data_df, self.stock_intrinsic_factors)
        # Normalise with the stored statistics and add a batch dimension
        stock_vector = ((stock_vector - self.x_mean) / np.sqrt(self.x_variance))[np.newaxis, :]
        price = self.model(stock_vector).numpy()[0][0]
        return price
```
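`predict_price` looks the ticker up in the full `stock_data_df`, so the stock may have been part of the training split, in which case its prediction is in-sample. scikit-learn's `train_test_split` accepts extra arrays (such as the `symbol` column) and splits them with the same indices, which is one way to keep track of held-out tickers. The following sketch shows the same idea with a NumPy permutation standing in for `train_test_split`, so it will not reproduce the notebook's exact split. `split_with_symbols` is a hypothetical helper.

```python
import numpy as np


def split_with_symbols(x, y, symbols, test_size=0.2, seed=42):
    """Shuffle rows once and split x, y and symbols with the same indices."""
    x, y, symbols = np.asarray(x), np.asarray(y), np.asarray(symbols)
    order = np.random.default_rng(seed).permutation(len(symbols))
    n_test = int(np.ceil(len(symbols) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return (x[train_idx], y[train_idx], symbols[train_idx],
            x[test_idx], y[test_idx], symbols[test_idx])


if __name__ == "__main__":
    symbols = ["AAA", "BBB", "CCC", "DDD", "EEE"]
    x = np.arange(10, dtype=float).reshape(5, 2)
    y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
    *_, test_symbols = split_with_symbols(x, y, symbols)
    held_out = set(test_symbols.tolist())
    for s in symbols:
        print(s, "held out" if s in held_out else "in training set (in-sample)")
```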

```python
# Training demonstration
predictor = stock_price_predictor()
predictor.initialize_dataset()
predictor.generate_training_set()
predictor.training()
```

```text
TensorFlow version: 2.11.0
Epoch 1/50
 1/39 [..............................] - ETA: 16s - loss: 30941.5781 - mse: 30941.5781
Epoch 1: saving model to training_1\cp.ckpt
Epoch 2/50
Epoch 2: saving model to training_1\cp.ckpt
Epoch 3/50
 1/39 [..............................] - ETA: 0s - loss: 83548.4062 - mse: 83548.4062
Epoch 3: saving model to training_1\cp.ckpt
Epoch 4/50
 1/39 [..............................] - ETA: 0s - loss: 2572.9048 - mse: 2572.9048
Epoch 4: saving model to training_1\cp.ckpt
Epoch 5/50
Epoch 5: saving model to training_1\cp.ckpt
Epoch 6/50
Epoch 6: saving model to training_1\cp.ckpt
Epoch 7/50
Epoch 7: saving model to training_1\cp.ckpt
Epoch 8/50
Epoch 8: saving model to training_1\cp.ckpt
Epoch 9/50
 1/39 [..............................] - ETA: 0s - loss: 1869.5520 - mse: 1869.5520
Epoch 9: saving model to training_1\cp.ckpt
Epoch 10/50
 1/39 [..............................] - ETA: 0s - loss: 690.8137 - mse: 690.8137
Epoch 10: saving model to training_1\cp.ckp

 1/39 [..............................] - ETA: 0s - loss: 343.4335 - mse: 343.4335
Epoch 37: saving model to training_1\cp.ckpt
Epoch 38/50
 1/39 [..............................] - ETA: 0s - loss: 240.8310 - mse: 240.8310
Epoch 38: saving model to training_1\cp.ckpt
Epoch 39/50
 1/39 [..............................] - ETA: 0s - loss: 105.4042 - mse: 105.4042
Epoch 39: saving model to training_1\cp.ckpt
Epoch 40/50
 1/39 [..............................] - ETA: 0s - loss: 308.4404 - mse: 308.4404
Epoch 40: saving model to training_1\cp.ckpt
Epoch 41/50
 1/39 [..............................] - ETA: 0s - loss: 337.1203 - mse: 337.1203
Epoch 41: saving model to training_1\cp.ckpt
Epoch 42/50
 1/39 [..............................] - ETA: 0s - loss: 1560.5594 - mse: 1560.5594
Epoch 42: saving model to training_1\cp.ckpt
Epoch 43/50
 1/39 [..............................] - ETA: 0s - loss: 159.8494 - mse: 159.8494
Epoch 43: saving model to training_1\cp.ckpt
Epoch 44/50
 1/39 [..................
```
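The log above shows single-batch losses, which jump around from step to step. Keras' `Model.fit` returns a `History` object whose `history` attribute is a dict mapping each loss/metric name to a list of per-epoch values, and that dict is easier to summarise. The helper `summarise_history` below is hypothetical. It is exercised on a stand-in dict with made-up values, not on output from this notebook. It also reports RMSE (the square root of MSE), which is in the same units as the price.

```python
import math


def summarise_history(history: dict, key: str = "loss") -> dict:
    """Best epoch and RMSE from a Keras-style History.history dict."""
    losses = history[key]
    best = min(range(len(losses)), key=losses.__getitem__)
    return {
        "best_epoch": best + 1,  # Keras numbers epochs from 1
        "best_mse": losses[best],
        "best_rmse": math.sqrt(losses[best]),
        "final_rmse": math.sqrt(losses[-1]),
    }


if __name__ == "__main__":
    # Stand-in for `history = self.model.fit(...)`; values are illustrative only
    fake_history = {"loss": [400.0, 250.0, 100.0, 144.0]}
    print(summarise_history(fake_history))
```

To use the helper in the notebook, `training` would need to keep the return value of `fit` (for example `history = self.model.fit(...)`) and pass `history.history` to it.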

### Test: Microsoft ($MSFT)

```python
ticker = 'MSFT'
price = predictor.predict_price(ticker=ticker)
stock_data_df = predictor.stock_data_df
actual_price = stock_data_df.loc[stock_data_df['symbol'] == ticker, 'currentPrice'].values[0]
print(f'Actual Price: {actual_price}')
print(f'Predicted Price: {price}')
```

```text
Actual Price: 308.13
Predicted Price: 346.7920837402344
```
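Applying the rule from the Basis section to the two figures printed above: the predicted intrinsic value is about 12.5% above the market price, which under that framing would suggest MSFT is undervalued. This is a single stock, and MSFT may have been in the 80% training split, so the comparison is not necessarily an out-of-sample test. The arithmetic, with the printed values copied in:

```python
actual_price = 308.13                # "Actual Price" printed above
predicted_price = 346.7920837402344  # "Predicted Price" printed above

# Positive gap: the model's intrinsic value is above the market price
gap = (predicted_price - actual_price) / actual_price
print(f"Predicted intrinsic value is {gap:.1%} above the market price")
# prints: Predicted intrinsic value is 12.5% above the market price
```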
