<h1> Building a backtesting system </h1>

Currently we're only able to test against the last 100 days. <br>
In the real world, we want to be able to test across multiple years of data, so we're gonna do something called "backtesting".

In [1]:
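#first, we create a prediction function
def predict(train, test, predictors, model):
    #fit the model on the training rows
    model.fit(train[predictors], train["Target"])
    #predict 0 (down) or 1 (up) for each test row
    preds = model.predict(test[predictors])
    #keep the test index so the predictions line up with the actual targets
    preds = pd.Series(preds, index=test.index, name="Predictions")
    #put the actual target and the prediction side by side
    combined = pd.concat([test["Target"], preds], axis=1)
    return combined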

Now we create a backtest function.
It takes the S&P 500 data, the ML model, the predictors, and a start and step value. <br>
We set start to 2500 because a trading year has about 250 days and we want the first training set to cover about 10 years (250 * 10 = 2500). <br>
step is 250, so we retrain the model and predict one year (about 250 trading days) at a time. <br>
So we're gonna take the first 10 years of data and predict values for the 11th year.
Then we'll take the first 11 years and predict values for the 12th year, and so on. <br>
This way, we'll be able to have more confidence in our model.
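
To see which rows each pass of the loop trains and tests on, here's a minimal sketch that only computes the window boundaries. The helper name `backtest_windows` and the row count of 3100 are made up for illustration; they are not part of the notebook or its real data.

```python
def backtest_windows(n_rows, start=2500, step=250):
    """Return (train_end, test_end) row positions for each backtest pass.

    Training uses rows [0, train_end); testing uses rows [train_end, test_end).
    Hypothetical helper, it mirrors the slicing idea but fits no model.
    """
    windows = []
    for i in range(start, n_rows, step):
        # the last window can be shorter than `step` if the data runs out
        test_end = min(i + step, n_rows)
        windows.append((i, test_end))
    return windows


# 3100 rows is an assumed size, chosen so that the last window is a partial year
windows = backtest_windows(3100)
for train_end, test_end in windows:
    print(f"train on rows [0, {train_end}), test on rows [{train_end}, {test_end})")
```

Note that the training set keeps growing while the test set is always the next block of rows, so the model never sees the rows it's tested on.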


In [3]:
def backtest(data, model, predictors, start=2500, step=250):
    all_predictions = []
    
    #loop across the data year by year and make predictions for all the years except the first 10 or so
    #data.shape[0] = number of rows
    for i in range(start, data.shape[0], step):
        #splitting train and test data
        #train = all of the years prior to the current year
        train = data.iloc[0:i].copy()
        #test = the current year
        test = data.iloc[i:(i+step)].copy()
        #using the predict function to generate our predictions
        predictions = predict(train, test, predictors, model)
        all_predictions.append(predictions)
    #at the end, we concatenate all our predictions together
    return pd.concat(all_predictions)

In [None]:
#backtesting the S&P 500 data with the model and predictors we created earlier
predictions = backtest(sp500, model, predictors)

In [None]:
#let's see how many days we predicted the market would go up versus down
#1 = up
#0 = down
predictions["Predictions"].value_counts()

In [None]:
#checking the precision score
#so when we said the market would go up, it actually went up 52% of the time
precision_score(predictions["Target"], predictions["Predictions"])

In [None]:
#fraction of days where the market actually went up (1) versus down (0)
predictions["Target"].value_counts() / predictions.shape[0]
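
A precision score only means something next to the share of days the market actually went up. Here's a small sketch on made-up toy data (the `toy` frame and its values are assumptions, not real results) that computes both numbers by hand with pandas, so the comparison is explicit.

```python
import pandas as pd

# made-up targets and predictions, for illustration only
toy = pd.DataFrame({
    "Target":      [1, 0, 1, 1, 0, 1, 0, 1],
    "Predictions": [1, 1, 1, 0, 0, 1, 0, 0],
})

# precision = of the days we predicted "up", the fraction that really went up
# (for 0/1 labels this is the same quantity precision_score computes)
precision = toy.loc[toy["Predictions"] == 1, "Target"].mean()

# base rate = fraction of all days that went up, whatever we predicted
base_rate = toy["Target"].mean()

# positive edge means the predictions beat simply assuming "up" every day
edge = precision - base_rate
print(precision, base_rate, edge)
```

If the edge is close to zero or negative, the model isn't doing better than buying every day.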

<h1> Adding more predictors to our model to improve the accuracy </h1>

We're gonna create a variety of rolling averages and calculate the mean closing price over the last 2 days, the last trading week (5 days), the last 3 months (about 60 trading days), the last year (250 trading days), and the last 4 years (1000 trading days). <br>
Then we're gonna find the ratio between today's closing price and the average closing price over each of those periods.
This helps us tell whether the market has gone up a ton, in which case it may be due for a downturn, or gone down a ton, in which case it may be due for an upswing.
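
Here's a minimal sketch of the ratio idea on a tiny made-up price series (the prices and the 2-day horizon are assumptions for illustration). One thing worth noticing: a pandas rolling window ends at the current row, so the average includes today's close.

```python
import pandas as pd

# made-up closing prices, for illustration only
close = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0], name="Close")

horizon = 2
# mean of the current row and the row before it; the first row has no full window
rolling_mean = close.rolling(horizon).mean()

# above 1: today closed above its recent average; below 1: it closed below
ratio = close / rolling_mean
print(ratio)
```

The first `horizon - 1` rows come out as NaN because there isn't enough history yet.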

In [None]:
horizons = [2, 5, 60, 250, 1000]
new_predictors = []

#loop through the horizons and calculate a rolling average over each horizon
for horizon in horizons:
    rolling_averages = sp500.rolling(horizon).mean()
    #creating a couple of columns
    ratio_column = f"Close_Ratio_{horizon}"
    #first time through the loop, this is the ratio between today's close and the average close over the last 2 days
    #second time through, it's the ratio between today's close and the average close over the last 5 days, and so on...
    sp500[ratio_column] = sp500["Close"] / rolling_averages["Close"]

    #trend is the number of days in the past `horizon` days on which the stock market actually went up
    trend_column = f"Trend_{horizon}"
    #shift forward one day so the current row's own target is excluded, then take the rolling sum of the target
    sp500[trend_column] = sp500.shift(1).rolling(horizon).sum()["Target"]

    new_predictors += [ratio_column, trend_column]

#drop rows with NaN values once, after all the columns exist
#(dropping inside the loop would throw away extra rows on every pass)
sp500 = sp500.dropna()
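
The `shift(1)` in the trend column matters. Here's a small sketch on made-up targets (the values and the 3-day horizon are assumptions) comparing the shifted rolling sum with an unshifted one. Without the shift, each row's sum would include that row's own Target, which is exactly the value the model is trying to predict.

```python
import pandas as pd

# made-up up/down labels, for illustration only
target = pd.Series([1, 1, 0, 1, 0, 0, 1], name="Target")
horizon = 3

# shifted: each row only sums the targets of the 3 rows before it
trend = target.shift(1).rolling(horizon).sum()

# unshifted, shown only for contrast: each row's sum includes its own target
leaky = target.rolling(horizon).sum()

print(pd.DataFrame({"Target": target, "trend": trend, "leaky": leaky}))
```

The shifted version needs one extra row of history, so it has one more NaN at the start than the unshifted one.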

<h1> Improving our model </h1>

In [None]:
#increasing the number of estimators and reducing min_samples_split
model = RandomForestClassifier(n_estimators=200, min_samples_split=50, random_state=1)

When we run .predict, the model returns 0 or 1.
We want more control over what becomes 1 and what becomes 0.
So instead, we're gonna use the .predict_proba method.
It returns the probability that each row will be 0 or 1.
(The probability of the stock market going down or up tomorrow.)

Then we want to set our own custom threshold.
By default, the threshold is 0.5. <br>
That means: if there's a greater than 50% chance that the price will go up, the model predicts that the price will go up. But we're gonna set the threshold to 0.6 (60%), so the model has to be more confident before it predicts that the price will go up.
This way, it's gonna reduce the number of days on which we trade,
because it reduces the number of days on which the model predicts the price will go up.
But it should increase the chance that the price actually goes up on those days.
This fits really well with what we want.
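
Here's a minimal sketch of what changing the threshold does, using made-up probabilities instead of real model output (the numbers in `proba_up` are assumptions). The 0.5 cutoff stands in for roughly what .predict does by default.

```python
import numpy as np

# made-up probabilities that the price goes up, for illustration only
proba_up = np.array([0.45, 0.52, 0.58, 0.61, 0.70])

# roughly the default behaviour: call "up" at 50% or more
default_calls = (proba_up >= 0.5).astype(int)

# stricter custom threshold: call "up" only at 60% or more
strict_calls = (proba_up >= 0.6).astype(int)

print(default_calls.sum(), "up days at 0.5")
print(strict_calls.sum(), "up days at 0.6")
```

The stricter threshold keeps only the days where the model is most confident, which is why the number of predicted "up" days drops.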

In [None]:
def predict(train, test, predictors, model):
    model.fit(train[predictors], train["Target"])
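    #predict_proba returns one column per class; the second column is the probability of class 1 (price goes up)
    preds = model.predict_proba(test[predictors])[:,1]

    #custom threshold: only predict 1 (up) when the probability is at least 0.6
    preds[preds >= .6] = 1
    preds[preds < .6] = 0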
    
    preds = pd.Series(preds, index=test.index, name="Predictions")
    combined = pd.concat([test["Target"], preds], axis=1)
    return combined

In [None]:
#running our backtest again
#this time we use the new ratio and trend columns (new_predictors)
predictions = backtest(sp500, model, new_predictors)

The distribution is different now.
This time, there are only a few days on which we predicted the price would go up.
That's because we changed the threshold.
We asked the model to be more confident in its predictions, which means we're gonna buy stock on fewer days and hopefully be more accurate on those days.

In [None]:
predictions["Predictions"].value_counts()

In [None]:
#we'll check the precision score
precision_score(predictions["Target"], predictions["Predictions"])

In [9]:
#so when the model predicts that the stock will go up, it will go up ??% of the time

In [None]:
predictions