![QuantConnect Logo](https://cdn.quantconnect.com/web/i/icon.png)
<hr>

### Random Forest Regression

In another installment of our "mini-series" of examples on moving your work from the research environment into production, we show how to implement a basic random forest regression model with sklearn's RandomForestRegressor. Briefly, a random forest is a supervised ensemble method that fits many decision trees on bootstrap samples of the data and averages their predictions. Here we use it for regression, read the importance of each feature from the fitted model, and use those importances as weights to build a tradeable portfolio.
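As a minimal sketch of what "random forest regression" means in sklearn, the example below fits RandomForestRegressor on synthetic data (the data and the `n_estimators=50` setting are illustrative assumptions, not the notebook's) and checks that the forest's prediction is simply the average of its individual trees' predictions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: 200 samples, 3 features, a noisy nonlinear target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Each of the 50 trees is grown on a bootstrap sample of the rows (bootstrap=True by default)
forest = RandomForestRegressor(n_estimators=50, min_samples_split=5, random_state=0)
forest.fit(X, y)

# The forest's prediction is the mean of the individual trees' predictions
tree_preds = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_mean = tree_preds.mean(axis=0)
print(np.allclose(manual_mean, forest.predict(X)))  # True
```

Averaging many deep, decorrelated trees is what reduces the variance a single tree would have on its own.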

To start, we continue to use the US Treasuries ETF basket and fetch its historical data. We use the most recent 500 hourly bars to build our train/test datasets.

```python
# QuantBook Analysis Tool
# For more information see [https://www.quantconnect.com/docs/research/overview]
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

qb = QuantBook()

# Subscribe to the US Treasuries ETF basket and keep a ticker -> Symbol map
assets = ["SHY", "TLT", "SHV", "TLH", "EDV", "BIL",
          "SPTL", "TBT", "TMF", "TMV", "TBF", "VGSH", "VGIT",
          "VGLT", "SCHO", "SCHR", "SPTS", "GOVT"]
symbols = {ticker: qb.AddEquity(ticker, Resolution.Minute).Symbol for ticker in assets}

#Copy Paste Region For Backtesting.
#==========================================
# Set up the regressor
# Initialize an instance of the Random Forest Regressor
regressor = RandomForestRegressor(n_estimators=100, min_samples_split=5, random_state=1990)

# Fetch the last 500 hourly bars for every security in our universe
df = qb.History(qb.Securities.Keys, 500, Resolution.Hour)

# Build features: hourly returns, one row per timestamp and one column per symbol
returns = df.unstack(level=1).close.transpose().pct_change().dropna()
X = returns
# Placeholder target: random values around 100,000, so the importances below are illustrative only.
# In an algorithm, replace y with a real target series aligned to the rows of X
# (for example, portfolio values recorded over the same hours).
y = np.random.normal(100000, 5, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1990)
```
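The reshaping chain that builds `returns` is easy to misread. As a minimal sketch, assuming qb.History returns a DataFrame indexed by (symbol, time) with a `close` column, here is the same chain run on a small hypothetical two-symbol frame (the prices are made up), followed by the same shuffled 80/20 split:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for qb.History(...): a (symbol, time) MultiIndex with a "close" column
times = pd.date_range("2021-01-04 10:00", periods=6, freq="60min")
index = pd.MultiIndex.from_product([["SHY", "TLT"], times], names=["symbol", "time"])
closes = [100, 101, 102, 101, 103, 104,   # SHY
          50, 49, 50, 51, 52, 51]         # TLT
df = pd.DataFrame({"close": closes}, index=index)

# Same chain as the notebook: rows become timestamps, columns become symbols
returns = df.unstack(level=1).close.transpose().pct_change().dropna()
print(returns.shape)  # (5, 2): the first bar has no prior close, so its row is dropped

# Shuffled 80/20 split, as in the notebook
X_train, X_test = train_test_split(returns, test_size=0.2, random_state=1990)
print(len(X_train), len(X_test))  # 4 1
```

Note that `dropna()` drops any timestamp where at least one symbol is missing a return, so the real feature matrix can end up with fewer than 499 rows.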

Once we have our data and have initialized our regressor, we fit the model and read off the importance of each feature. Here the features are the hourly returns of each ETF, so each importance corresponds to one symbol.

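Our final variable <em>selected</em> is a zip of (symbol, weight) tuples to be used in building our portfolio. Because a zip is a one-shot iterator, the print loop below consumes it; wrap it in list(...) if you need the pairs again.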
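To see how feature importances behave when one feature really does drive the target, here is a minimal sketch on synthetic data (the data-generating process is an illustrative assumption; the forest settings mirror the notebook's). Contrast this with the notebook, where y is random noise, so the importances there mostly reflect the forest fitting noise and come out spread across all symbols:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
# Only column 2 carries signal; the other three columns are pure noise
y = 3.0 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=100, min_samples_split=5, random_state=1990)
model.fit(X, y)

# Impurity-based importances: non-negative and normalized to sum to 1
importances = model.feature_importances_
print(importances.round(3))
print(importances.sum())
print(int(np.argmax(importances)))  # 2
```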

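```python
# Fit the regressor on the training split
regressor.fit(X_train, y_train)

# Feature importances are non-negative and sum to 1, so they can serve as long-only weights
weights = regressor.feature_importances_
# Apply the same mask to symbols and weights so each pair stays aligned
mask = weights > 0
selected = zip(returns.columns[mask], weights[mask])
for symbol, weight in selected:
    print(f'Symbol: {symbol}, Weight: {weight}')
```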

```
Symbol: BIL TT1EBZ21QWKL, Weight: 0.06860311608980395
Symbol: EDV TYCF240SL9PH, Weight: 0.06236283531890688
Symbol: GOVT V45XL2BVKU3P, Weight: 0.04368632766949955
Symbol: SCHO UOVIOSUIT3DX, Weight: 0.061877890201449244
Symbol: SCHR UOVIOSUIT3DX, Weight: 0.07772532787888427
Symbol: SHV TP8J6Z7L419H, Weight: 0.06571655997142939
Symbol: SHY SGNKIKYGE9NP, Weight: 0.0637328714858151
Symbol: SST V2245V5VOQQT, Weight: 0.09435425484418379
Symbol: TBF UF9WRZG9YA1X, Weight: 0.043968112419911415
Symbol: TBT U297ZHBXJ5NP, Weight: 0.035527396871039806
Symbol: TLH TP8J6Z7L419H, Weight: 0.07438852724550572
Symbol: TLO TT1EBZ21QWKL, Weight: 0.04383431314645853
Symbol: TLT SGNKIKYGE9NP, Weight: 0.02775756426489062
Symbol: TMF UBTUG7D0B7TX, Weight: 0.029645002255879502
Symbol: TMV UBTUG7D0B7TX, Weight: 0.03533909176068473
Symbol: VGIT UHVG8V7B7YAT, Weight: 0.06269298525092254
Symbol: VGLT UHVG8V7B7YAT, Weight: 0.03941532098553043
Symbol: VGSH UHVG8V7B7YAT, Weight: 0.06937250233920446
```
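To turn importances into target weights without losing the symbol-weight alignment, here is a minimal pure-Python sketch. The tickers and importance values are hypothetical, and renormalizing after dropping zero-importance symbols is our assumption about how you might handle them; inside an algorithm you could then pass each pair to SetHoldings.

```python
# Hypothetical inputs: tickers and the importances a fitted forest returned for them
tickers = ["SHY", "TLT", "SHV", "TLH"]
importances = [0.5, 0.0, 0.3, 0.2]

# Filter symbols and weights together so the pairs never drift out of alignment
selected = [(ticker, w) for ticker, w in zip(tickers, importances) if w > 0]

# Renormalize so the surviving weights still sum to 1 (a fully invested, long-only book)
total = sum(w for _, w in selected)
target_weights = {ticker: w / total for ticker, w in selected}

for ticker, weight in target_weights.items():
    print(f"{ticker}: {weight:.2f}")
# SHY: 0.50
# SHV: 0.30
# TLH: 0.20
```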
