# Profit Prophet: The Stock Market ML Predictor

This project uses an LSTM (long short-term memory) model to make predictions on the stock market and recommend a course of action.

## Stock Data

TBD. What specific data should we collect for the LSTM?

## Approach

First, we train an LSTM model on [TBD. Aggregated index? Each stock?] to forecast the market from purely numeric data.

In parallel, we read the news, use ML to categorize each item by its implication for the stock, and adjust our estimates as follows:

|    Stock Implication   |          Past          |         Present           |          Future          |
| :--------------------: | :--------------------: | :-----------------------: | :----------------------: |
| Artificially Increased |  Reduce Past Estimate  |  Reduce Present Estimate  |  Reduce Future Estimate  |
| Artificially Decreased | Increase Past Estimate | Increase Present Estimate | Increase Future Estimate |
|        No Change       |       Do Nothing       |         Do Nothing        |        Do Nothing        |

This gives us an estimate and forecast of the *True* value of the stock, which we can use to make fat stacks.
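
The adjustment table can be read as a sign rule applied to whichever estimate the news refers to. The following is a minimal sketch, assuming a fixed relative correction. The names `Implication`, `ADJUSTMENT_SIGN`, `adjust_estimate`, and the `strength` parameter are illustrative and not part of the project, and the right size of the correction is still an open question.

```python
from enum import Enum


class Implication(Enum):
    ARTIFICIALLY_INCREASED = "artificially increased"
    ARTIFICIALLY_DECREASED = "artificially decreased"
    NO_CHANGE = "no change"


# Direction of the correction for each row of the table.
ADJUSTMENT_SIGN = {
    Implication.ARTIFICIALLY_INCREASED: -1,  # price is inflated: reduce the estimate
    Implication.ARTIFICIALLY_DECREASED: +1,  # price is depressed: increase the estimate
    Implication.NO_CHANGE: 0,                # do nothing
}


def adjust_estimate(estimate, implication, strength=0.02):
    """Scale an estimate up or down by `strength` (hypothetical, 2% by default)."""
    return estimate * (1 + ADJUSTMENT_SIGN[implication] * strength)


# The same rule applies to the past, present, and future estimates.
estimates = {"past": 100.0, "present": 105.0, "future": 110.0}
adjusted = {
    period: adjust_estimate(value, Implication.ARTIFICIALLY_INCREASED)
    for period, value in estimates.items()
}
```

Keeping the direction of the correction in a lookup table keeps the code in step with the table above if more categories are added later.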

## Tools

There are several ways to get stock prices, including Bloomberg terminals and OpenBB.

### Bloomberg API

Bloomberg terminals are the de facto way to get stock information. UW provides access to four of these terminals in the MC building. The Bloomberg API requires the terminal to be running, so it can only be used on a machine with the terminal open.

For this reason, we are moving away from the Bloomberg API.

In [None]:
# Bloomberg API

import os

from xbbg import blp

DATA_DIR = './Data/'
os.makedirs(DATA_DIR, exist_ok=True)  # to_csv fails if the directory does not exist

tickers = ['NVDA US Equity', 'AAPL US Equity']
fields = ['High', 'Low', 'Last_Price']
start_date = '2024-11-01'
end_date = '2024-11-10'

# This line hangs unless it is running with a Bloomberg terminal
hist_tick_data = blp.bdh(tickers=tickers, fields=fields, start_date=start_date, end_date=end_date)

filename = f'tick_data_{start_date}_to_{end_date}.csv'
hist_tick_data.to_csv(DATA_DIR + filename)



### OpenBB

OpenBB is a free, open-source alternative to the Bloomberg terminal. It runs without any special software in the background.

In [28]:
import openbb
openbb.build()

In [2]:
# OpenBB API

from openbb import obb
import pandas as pd

obb.user.preferences.output_type = 'OBBject'

def downloadStockData(symbol, start_date=None, end_date=None):
    """Return daily OHLCV data for one symbol, with RSI, MACD, and S&P 500 close columns."""
    # Fetch daily OHLCV data
    ohlcv_data = obb.equity.price.historical(symbol=symbol, start_date=start_date, end_date=end_date)
    ohlcv_df = ohlcv_data.to_df()
    print(ohlcv_df.head())

    # Calculate RSI
    rsi_data = obb.technical.rsi(data=ohlcv_data.results, target='close', length=14, scalar=100.0, drift=1)
    rsi_df = rsi_data.to_df().rename(columns={'rsi': 'RSI_14'})

    # Calculate MACD
    macd_data = obb.technical.macd(data=ohlcv_data.results, target='close', fast=12, slow=26, signal=9)
    macd_df = macd_data.to_df()

    # Merge into main DataFrame
    merged_df = ohlcv_df.merge(rsi_df, left_index=True, right_index=True)
    merged_df = merged_df.merge(macd_df, left_index=True, right_index=True)

    # Fetch S&P 500 data
    sp500_data = obb.equity.price.historical("^GSPC", start_date=start_date, end_date=end_date)
    sp500_df = sp500_data.to_df()[['close']].rename(columns={'close': 'SP500'})

    # Merge with OHLCV data
    merged_df = merged_df.merge(sp500_df, left_index=True, right_index=True)

    return merged_df

# Declare search bounds 
symbols = ['AAPL', 'NVDA']
start_date = '2010-01-01'
end_date = '2025-01-01'

data_df = downloadStockData(symbols, start_date, end_date)

data_df.to_csv('Stock Data.csv')
print(data_df)

            open  high   low  close     volume symbol
date                                                 
2010-01-04  7.63  7.66  7.59   7.64  493728200   AAPL
2010-01-04  0.46  0.47  0.45   0.46  800352668   NVDA
2010-01-05  7.67  7.70  7.62   7.66  601904016   AAPL
2010-01-05  0.46  0.47  0.46   0.47  728697549   NVDA
2010-01-06  7.66  7.69  7.53   7.54  552158376   AAPL


OpenBBError: 
[Unexpected Error] -> InvalidIndexError -> Reindexing only valid with uniquely valued Index objects
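
The printed rows above point to the likely cause of the error: a multi-symbol request returns one row per symbol per day, so each date appears more than once in the index. The pandas-only sketch below uses a hand-built frame that copies the first four rows of that output. It shows the duplicate index and one possible way to get a unique date index per ticker. Splitting by `symbol` is an assumption about how the data should be organised, not a confirmed fix for the OpenBB call.

```python
import pandas as pd

# First four rows of the output above, rebuilt by hand for illustration.
rows = pd.DataFrame(
    {
        "date": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-05", "2010-01-05"]),
        "close": [7.64, 0.46, 7.66, 0.47],
        "symbol": ["AAPL", "NVDA", "AAPL", "NVDA"],
    }
).set_index("date")

# Each date appears once per symbol, so the index is not unique.
print(rows.index.is_unique)  # False

# Splitting by symbol gives one frame per ticker, each with a unique date index.
per_symbol = {
    symbol: frame.drop(columns="symbol") for symbol, frame in rows.groupby("symbol")
}
print(all(frame.index.is_unique for frame in per_symbol.values()))  # True
```

With that layout, `downloadStockData` could be called once per ticker and the results kept in a dictionary or concatenated afterwards.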

In [None]:
# LSTM pre-processing

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import numpy as np

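# Declare search bounds
# downloadStockData fails on a list of symbols (see the error above), so use one ticker.
# The range must leave well over sequence_length rows once the indicator warm-up rows are dropped.
symbol = 'AAPL'
start_date = '2010-01-01'
end_date = '2025-01-01'

# Get stock data
data_df = downloadStockData(symbol, start_date, end_date)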

# Drop NA (from rolling indicators like MACD)
data_df.dropna(inplace=True)

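# Select features (adjust as needed); the names must match the columns in data_df
features = ['open', 'high', 'low', 'close', 'volume', 'RSI_14', 'MACD_12_26', 'MACD_signal', 'SP500']
target = 'close'

# Normalize all features to [0, 1]; the target column is scaled along with the rest.
# Note: fitting on the full series lets the test period influence the scaling.
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(data_df[features])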

target_idx = features.index(target)  # column of the target within the scaled array

def create_sequences(data, sequence_length):
    """Pair each window of sequence_length rows with the next row's target value."""
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:i+sequence_length])
        y.append(data[i+sequence_length, target_idx])
    return np.array(X), np.array(y)

sequence_length = 30  # Adjust based on your model
X, y = create_sequences(scaled_features, sequence_length)

# Chronological 80/20 split: no shuffling, so the test set is the most recent data
train_size = int(0.8 * len(X))
X_train, X_test = X[:train_size], X[train_size:]
y_train, y_test = y[:train_size], y[train_size:]

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(sequence_length, len(features))))
model.add(Dropout(0.2))
model.add(LSTM(50))
model.add(Dropout(0.2))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.1)
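
The windowing step is easy to check on a toy array before training. The sketch below repeats `create_sequences` with the target column passed in as a parameter (`target_index` is an illustrative addition) and uses random numbers in place of real prices. It also shows that fewer rows than `sequence_length` produce no windows at all, which matters when the requested date range is short.

```python
import numpy as np


def create_sequences(data, sequence_length, target_index):
    """Build (window, next-step target) pairs from a 2-D array of scaled features."""
    X, y = [], []
    for i in range(len(data) - sequence_length):
        X.append(data[i:i + sequence_length])
        y.append(data[i + sequence_length, target_index])
    return np.array(X), np.array(y)


rng = np.random.default_rng(0)
toy = rng.random((100, 9))  # 100 days, 9 features (placeholder values)

X, y = create_sequences(toy, sequence_length=30, target_index=3)
print(X.shape, y.shape)  # (70, 30, 9) (70,)

# Fewer rows than the window length: the loop never runs.
X_short, y_short = create_sequences(toy[:6], sequence_length=30, target_index=3)
print(len(X_short), len(y_short))  # 0 0
```

The first shape matches the `(samples, sequence_length, len(features))` input that the LSTM layer expects.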