In [76]:
import yfinance as yf
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score

S&P 500 index data is downloaded from Yahoo Finance through the yfinance package.
The data is trimmed to start on 1990-01-01 and run to the present.
The machine learning model uses past data to predict whether the next day's closing price will be higher than the current day's.
The strategy buys only on days when the model predicts that the price will go up.
Days on which the model predicts a drop but the price rises are missed opportunities and are ignored, because the aim is to minimize losing trades rather than to catch every gain.
The implementation therefore favors precision over recall.
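
The trade-off between precision and recall can be illustrated with a small sketch. The labels and probabilities below are invented for illustration and are not S&P 500 results; the helper names `apply_threshold` and `precision_recall` are hypothetical. The 0.6 cutoff mirrors the one used in the `predict` function of this notebook.

```python
# Toy data, invented for illustration only (not real market results).
actual = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]   # 1 = price rose the next day
proba = [0.70, 0.55, 0.65, 0.52, 0.40, 0.58, 0.62, 0.45, 0.61, 0.51]


def apply_threshold(probabilities, threshold):
    """Turn predicted probabilities of a rise into 0/1 buy signals."""
    return [1 if p >= threshold else 0 for p in probabilities]


def precision_recall(actual_labels, predicted_labels):
    """Compute precision and recall for the positive class by counting."""
    pairs = list(zip(actual_labels, predicted_labels))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # bought, price rose
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # bought, price fell
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # skipped, price rose
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.5, 0.6):
    predicted = apply_threshold(proba, threshold)
    precision, recall = precision_recall(actual, predicted)
    print(f"threshold={threshold}: precision={precision:.3f}, recall={recall:.3f}")
```

On this toy data the stricter cutoff buys on fewer days: precision rises from 0.625 to 0.750 while recall falls from 0.833 to 0.500. That is the behavior the strategy aims for, since a skipped up day costs nothing but a bad buy loses money.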


In [77]:
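# Download the full price history of the S&P 500 index.
sp500 = yf.Ticker('^GSPC')
sp500 = sp500.history(period='max')
# Drop columns that are not used as predictors.
del sp500["Dividends"]
del sp500["Stock Splits"]
# "Tomorrow" is the next trading day's close; "Target" is 1 if it is higher than today's close.
sp500["Tomorrow"] = sp500["Close"].shift(-1)
sp500["Target"] = (sp500["Tomorrow"] > sp500["Close"]).astype(int)
# Keep data from 1990 onward.
sp500 = sp500.loc["1990-01-01":].copy()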

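Multiple trends of the S&P 500 are examined to improve the prediction. Rolling windows of 2, 5, 60, 250 and 1000 trading days are used, which correspond roughly to 2 days, 1 week, 3 months, 1 year and 4 years. For each window, two predictors are added: the ratio of the close to its rolling average, and the number of up days in the preceding window.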

In [78]:
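horizons = [2,5,60,250,1000]
new_predictions = []  # names of the predictor columns fed to the model

for horizon in horizons:
    rolling_avg = sp500.rolling(horizon).mean()
    # Today's close relative to the average close over the window.
    ratio = f"Close_Ratio_{horizon}"
    sp500[ratio] = sp500["Close"]/rolling_avg["Close"]
    # Number of up days in the window ending yesterday (shifted to exclude today's target).
    trend = f"Trend_Column_{horizon}"
    sp500[trend] = sp500.shift(1).rolling(horizon).sum()["Target"]
    new_predictions += [ratio,trend]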
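
To make the two predictors concrete, the sketch below recomputes them for a short, made-up price series using plain Python lists instead of pandas. The function name `rolling_features` and the prices are hypothetical; the arithmetic is meant to mirror the loop above, with `None` standing in for the NaN values pandas produces while a window is still incomplete.

```python
def rolling_features(closes, horizon):
    """Mirror the notebook's close-ratio and trend columns on a plain list."""
    n = len(closes)
    # Target is 1 when the next close is higher. The last day has no
    # "tomorrow", so its target is 0, as in the notebook's comparison.
    target = [1 if t + 1 < n and closes[t + 1] > closes[t] else 0 for t in range(n)]

    close_ratio, trend = [], []
    for t in range(n):
        # Ratio of today's close to the mean of the last `horizon` closes,
        # today included.
        if t >= horizon - 1:
            window = closes[t - horizon + 1 : t + 1]
            close_ratio.append(closes[t] / (sum(window) / horizon))
        else:
            close_ratio.append(None)
        # Count of up days over the `horizon` rows before today; today's own
        # target is excluded, which is what shift(1) achieves.
        if t >= horizon:
            trend.append(sum(target[t - horizon : t]))
        else:
            trend.append(None)
    return target, close_ratio, trend


# Made-up closing prices, for illustration only.
closes = [100, 102, 101, 103, 104, 102]
target, close_ratio, trend = rolling_features(closes, horizon=2)
print(target)   # [1, 0, 1, 1, 0, 0]
print(trend)    # [None, None, 1, 1, 2, 1]
print([None if r is None else round(r, 4) for r in close_ratio])
```

Because the trend window ends at the previous row, each row uses only price moves that are already known at that day's close.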


In [79]:
sp500 = sp500.dropna(subset=sp500.columns[sp500.columns != "Tomorrow"])

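A random forest classifier is used to predict whether the next day's close will be higher than the current day's.
The predict function fits the model and marks a day as a buy only when the predicted probability of a rise is at least 0.6.
The backtest function trains and tests on the S&P 500 data by calling predict on successive windows: it trains on all rows before each window and tests on the next 250 rows, starting after the first 2500 rows.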

In [80]:
model = RandomForestClassifier(n_estimators=200,min_samples_split=50,random_state=1)

def predict(train,test,predict_arr,model):
    # Fit on the training rows, then get the probability of class 1 (price rises).
    model.fit(train[predict_arr],train["Target"])
    predictions = model.predict_proba(test[predict_arr])[:,1]
    # Signal a buy only when the probability of a rise is at least 0.6.
    predictions[predictions >= 0.6] = 1
    predictions[predictions < 0.6] = 0
    predictions = pd.Series(predictions,index=test.index,name="Predictions")
    # Put actual targets and predictions side by side.
    combined = pd.concat([test["Target"],predictions],axis=1)
    return combined

def backtest(data,model,predict_arr,start,step):
    yrly_pred = []
    # Train on all rows before i and test on the next `step` rows.
    for i in range(start,data.shape[0],step):
        train = data.iloc[0:i].copy()
        test = data.iloc[i:(i+step)].copy()
        prediction = predict(train,test,predict_arr,model)
        yrly_pred.append(prediction)
    return pd.concat(yrly_pred)
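
The windows that `backtest` walks through can be listed without any market data. The sketch below is a minimal illustration; the helper name `walk_forward_splits` and the row count of 3100 are hypothetical, while `start=2500` and `step=250` match the values passed to `backtest` in this notebook.

```python
def walk_forward_splits(n_rows, start, step):
    """Return (train_range, test_range) row-index pairs as half-open intervals."""
    splits = []
    for i in range(start, n_rows, step):
        train_range = (0, i)                    # every row before the test window
        test_range = (i, min(i + step, n_rows)) # the last window may be shorter
        splits.append((train_range, test_range))
    return splits


# 3100 rows is an assumed size, chosen only to keep the output short.
for train_range, test_range in walk_forward_splits(n_rows=3100, start=2500, step=250):
    print(f"train rows {train_range[0]}-{train_range[1] - 1}, "
          f"test rows {test_range[0]}-{test_range[1] - 1}")
```

The training set grows with every step and never includes rows from the test window or later, so each prediction relies only on data that would have been available at the time.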

In [90]:
predictions = backtest(sp500,model,new_predictions,2500,250)

Share of days in the backtest period on which the S&P 500 closed higher (1) or not (0) on the following day. This is the success rate of buying at every close and selling at the next close.

In [91]:
predictions["Target"].value_counts() / predictions.shape[0]

1    0.545971
0    0.454029
Name: Target, dtype: float64

Precision of the program's predictions measured against the actual outcomes.

In [92]:
precision_score(predictions["Target"],predictions["Predictions"])

0.5691358024691358

On the days it chose to buy, the program was right about 56.9% of the time, compared with 54.6% for buying and selling the S&P 500 at the close every day.
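
The size of that edge can be worked out from the two figures printed above. This is only arithmetic on the reported outputs, with hypothetical variable names; it says nothing about transaction costs or the size of the gains and losses.

```python
# Values copied from the outputs shown in this notebook.
baseline_rate = 0.545971               # share of up days, from value_counts
model_precision = 0.5691358024691358   # from precision_score

# Absolute and relative improvement of the model over always buying.
lift = model_precision - baseline_rate
relative_lift = lift / baseline_rate

print(f"absolute lift: {lift:.4f}")
print(f"relative lift: {relative_lift:.2%}")
```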

In [94]:
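# The last row is the most recent trading day; its prediction is the forecast for the next session.
# A value of 0 means the predicted probability of a rise was below the 0.6 cutoff.
check = predictions["Predictions"].iat[-1]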
if check == 0:
    print("The market price is expected to decrease from today at closing.")
else:
    print("The market price is expected to increase from today at closing.")


The market price is expected to decrease from today at closing.
