<a href="https://colab.research.google.com/github/LukeSchreiber/FastAI-Projects/blob/main/Lesson6RandomForests.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Here is my random forest project. It uses a Random Forest classifier to predict whether Bitcoin's price will go up or down the next day, based on historical market features.

First, we need to import everything. We're importing pandas to organize the data, numpy for the numerical work, yfinance to download the price data, sklearn to build the random forest, and Matplotlib to plot the results. Then we download daily BTC-USD prices and keep only the open, high, low, close, and volume columns.

In [None]:
import pandas as pd
import numpy as np
import yfinance as yf
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

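# Download daily BTC-USD candles; period="max" asks for the full history
# explicitly instead of relying on the library's default window.
df = yf.download("BTC-USD", period="max", interval="1d", auto_adjust=False)
# Newer yfinance versions return MultiIndex columns (field, ticker); keep the field level.
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)
df = df.dropna().copy()

# Keep only the OHLCV columns used for the features below
df = df[['Open','High','Low','Close','Volume']].copy()
df.head()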


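Now we need to build the feature matrix (X) and the target (y). The features are the day's open-to-close return, the high-low range relative to the open, and a 14-day z-score of volume. The target is 1 if the next day's close is higher than today's close, and 0 otherwise.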

In [None]:
X = pd.DataFrame(index=df.index)

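# Open-to-close return for the day
X['pct_oc'] = (df['Close'] - df['Open']) / df['Open']

# High-low range, scaled by the open
X['range_hl'] = (df['High'] - df['Low']) / df['Open']

# Volume z-score (14d)
vol = df['Volume']
X['vol_z14'] = (vol - vol.rolling(14).mean()) / vol.rolling(14).std()

# Drop the warm-up rows where the 14-day window is incomplete
X = X.dropna()
# The last day has no next-day close yet, so its label would silently become 0; drop it
X = X.drop(index=df.index[-1], errors='ignore')

# Target: 1 if tomorrow's close is higher than today's, else 0
y = (df['Close'].shift(-1) > df['Close']).astype(int).reindex(X.index)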

X.head(), y.head()
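
To make the features and target concrete, here is a small sketch on a made-up four-day price table. The numbers and the `toy_` names are invented for illustration and are not real Bitcoin data; the volume z-score is left out because it needs a 14-day window.

```python
import pandas as pd

# Hypothetical prices, invented for illustration only.
toy = pd.DataFrame({
    'Open':  [100.0, 102.0, 101.0, 105.0],
    'High':  [104.0, 103.0, 106.0, 107.0],
    'Low':   [99.0, 100.0, 100.0, 103.0],
    'Close': [102.0, 101.0, 105.0, 104.0],
})

toy_X = pd.DataFrame(index=toy.index)
toy_X['pct_oc'] = (toy['Close'] - toy['Open']) / toy['Open']    # day 0: 2 / 100 = 0.02
toy_X['range_hl'] = (toy['High'] - toy['Low']) / toy['Open']    # day 0: 5 / 100 = 0.05

# 1 if the next day's close is higher than today's close, else 0.
# The final day has no "next day", so the comparison against NaN yields 0 there.
toy_y = (toy['Close'].shift(-1) > toy['Close']).astype(int)

print(toy_X.round(4))
print(toy_y.tolist())  # [0, 1, 0, 0]
```

Day 1 is labeled 1 because the close rises from 101 to 105 the following day; the other days are followed by a lower close (or, for the final day, by no data at all).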


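Next we split the data 80/20 in chronological order: the first 80% of days are used for training and the most recent 20% for testing. The rows are not shuffled, so the model is never trained on days that come after the ones it is tested on.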

In [None]:
split_idx = int(len(X) * 0.8)
X_train, X_test = X.iloc[:split_idx], X.iloc[split_idx:]
y_train, y_test = y.iloc[:split_idx], y.iloc[split_idx:]

len(X_train), len(X_test)
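
A quick sanity check for this kind of split is to confirm that every training day comes before every test day. The sketch below does that on a synthetic ten-day index; `toy_X`, `toy_train`, and `toy_test` are hypothetical stand-ins for the real frames.

```python
import pandas as pd

# Synthetic daily index standing in for the real feature matrix.
dates = pd.date_range("2024-01-01", periods=10, freq="D")
toy_X = pd.DataFrame({'feature': range(10)}, index=dates)

# Same positional 80/20 split as in the notebook.
split_idx = int(len(toy_X) * 0.8)
toy_train, toy_test = toy_X.iloc[:split_idx], toy_X.iloc[split_idx:]

# Every training day should come strictly before every test day.
assert toy_train.index.max() < toy_test.index.min()
print(len(toy_train), len(toy_test))  # 8 2
```

The same assertion can be run on `X_train.index` and `X_test.index`, assuming the downloaded data is sorted by date.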


We now have clean, split data, so let's create the model and train it.

In [None]:
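# Random forest training
rf = RandomForestClassifier(
    n_estimators=300,      # number of trees in the forest
    max_depth=None,        # no explicit depth limit
    min_samples_leaf=3,    # every leaf must hold at least 3 training samples
    random_state=42,       # fixed seed for reproducible results
    n_jobs=-1              # use all available CPU cores
)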
rf.fit(X_train, y_train)


Now that the model is trained, we can use it to predict on the test set and evaluate the results.

In [None]:
pred = rf.predict(X_test)

acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)
print(f"Accuracy: {acc:.3f}\n")
print("Confusion matrix:\n", cm, "\n")
print(classification_report(y_test, pred, digits=3))
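
An accuracy number for an up/down prediction is easier to judge next to a naive baseline. One option is scikit-learn's `DummyClassifier` with `strategy="most_frequent"`, which always predicts the most common class in the training labels. The sketch below runs on synthetic data; the `toy_` arrays are hypothetical stand-ins for `X_train`, `y_train`, `X_test`, and `y_test`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-ins for the notebook's train/test data (assumption: 3 features).
toy_X_train = rng.normal(size=(200, 3))
toy_y_train = (rng.random(200) < 0.55).astype(int)
toy_X_test = rng.normal(size=(50, 3))
toy_y_test = (rng.random(50) < 0.55).astype(int)

# Always predicts whichever class was most common in the training labels.
baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(toy_X_train, toy_y_train)

baseline_acc = accuracy_score(toy_y_test, baseline.predict(toy_X_test))
print(f"Baseline accuracy: {baseline_acc:.3f}")
```

In this notebook the baseline would be fit on `X_train, y_train` and scored on `X_test, y_test`; the random forest is only adding something if its accuracy is clearly above that number.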


This block shows the feature importances: how much each feature contributed to the splits in the trained forest.

In [None]:
imp = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(imp)

imp.plot(kind='bar', title='Feature importances');
plt.show()
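
The `feature_importances_` attribute is scikit-learn's impurity-based measure, computed from the training data. As a cross-check, a different technique, permutation importance, measures how much the score on held-out data drops when one feature's values are shuffled. The sketch below uses `sklearn.inspection.permutation_importance` on synthetic data in which only the hypothetical `signal` column determines the label; none of the names come from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Synthetic data: the label depends only on 'signal'; the other columns are noise.
toy_X = pd.DataFrame(rng.normal(size=(300, 3)), columns=['signal', 'noise_a', 'noise_b'])
toy_y = (toy_X['signal'] > 0).astype(int)

toy_train_X, toy_test_X = toy_X.iloc[:240], toy_X.iloc[240:]
toy_train_y, toy_test_y = toy_y.iloc[:240], toy_y.iloc[240:]

toy_rf = RandomForestClassifier(n_estimators=50, random_state=0)
toy_rf.fit(toy_train_X, toy_train_y)

# Shuffle each feature 10 times on the held-out rows and record the mean score drop.
result = permutation_importance(toy_rf, toy_test_X, toy_test_y, n_repeats=10, random_state=0)
perm_imp = pd.Series(result.importances_mean, index=toy_X.columns).sort_values(ascending=False)
print(perm_imp)
```

Applied to this project, the call would take `rf`, `X_test`, and `y_test`. Features whose permutation importance is near zero contribute little to test-set accuracy, whatever their impurity-based ranking says.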


Now we can compare what actually happened with what we predicted.

In [None]:
viz = pd.DataFrame({
    'y_true': y_test.values.ravel(),   # flatten to 1D
    'y_pred': pred.ravel()             # flatten to 1D
}, index=y_test.index)

viz.tail(30)


Here are the accuracy score and the classification report.

In [None]:
from sklearn.metrics import accuracy_score, classification_report

print("Accuracy:", accuracy_score(y_test, pred))
print(classification_report(y_test, pred))


Finally, we can save the trained model to bitcoin_rf_model.pkl!

In [None]:
import joblib
joblib.dump(rf, "bitcoin_rf_model.pkl")
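
A saved model can be read back with `joblib.load`. The sketch below shows the round trip on a small model trained on synthetic data and written to a temporary directory, so it does not touch the real file; `toy_rf` and `toy_rf_model.pkl` are hypothetical names.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Small synthetic training set (assumption: 3 features, like the notebook's X).
toy_X = rng.normal(size=(100, 3))
toy_y = (toy_X[:, 0] > 0).astype(int)
toy_rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(toy_X, toy_y)

# Save to a temporary file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_rf_model.pkl")
    joblib.dump(toy_rf, path)
    loaded = joblib.load(path)

# The reloaded model should reproduce the original predictions exactly.
same = np.array_equal(toy_rf.predict(toy_X), loaded.predict(toy_X))
print(same)
```

For this project, `joblib.load("bitcoin_rf_model.pkl")` would return the trained forest, which expects the same three feature columns in the same order. Pickle-based files should only be loaded from sources you trust, ideally with the same scikit-learn version that saved them.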
