# Challenge: Overfitting on Other Datasets

## Download data from `yfinance`

In [7]:
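import pandas as pd
import yfinance as yf

ticker = 'META'
df = yf.download(ticker, auto_adjust=False)  # keep the 'Adj Close' column
# Newer yfinance returns (Price, Ticker) MultiIndex columns: keep the price level
if isinstance(df.columns, pd.MultiIndex):
    df.columns = df.columns.get_level_values(0)
df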



| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2012-05-18 | 42.049999 | 45.000000 | 38.000000 | 38.230000 | 38.230000 | 573576400 |
| 2012-05-21 | 36.529999 | 36.660000 | 33.000000 | 34.029999 | 34.029999 | 168192700 |
| ... | ... | ... | ... | ... | ... | ... |
| 2023-05-11 | 233.050003 | 238.210007 | 232.300003 | 235.789993 | 235.789993 | 20449000 |
| 2023-05-12 | 236.740005 | 236.960007 | 231.449997 | 233.809998 | 233.809998 | 16155300 |


## Preprocess the data

### Filter the date range

- Keep at least one year of history: here, every trading day from 2020-01-01 onwards

In [8]:
df = df.loc['2020-01-01':].copy()
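The cell above hard-codes the start date. If you would rather keep a window that always covers the most recent year, one option is to compute the cutoff from the last available date with `pd.DateOffset`. This is a minimal sketch; the helper name `filter_last_years` is hypothetical, and the synthetic price frame only stands in for the downloaded data.

```python
import pandas as pd


def filter_last_years(df: pd.DataFrame, years: int = 1) -> pd.DataFrame:
    """Hypothetical helper: keep rows within `years` of the last date in the index."""
    cutoff = df.index.max() - pd.DateOffset(years=years)
    # .loc slicing on a sorted DatetimeIndex includes both endpoints
    return df.loc[cutoff:].copy()


# Synthetic daily index standing in for the yfinance DataFrame
dates = pd.date_range("2021-01-01", "2023-05-12", freq="D")
prices = pd.DataFrame({"Adj Close": range(len(dates))}, index=dates)

recent = filter_last_years(prices, years=1)
print(recent.index.min(), recent.index.max())
```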

### Create the target variable

#### Percentage change

- Next-day percentage change of `Adj Close`, stored on today's row so it can be predicted from today's data

In [9]:
# pct_change(-1) gives P[t]/P[t+1] - 1; negating it yields (P[t+1] - P[t]) / P[t+1],
# i.e. tomorrow's change (relative to tomorrow's price), expressed in percent
df['change_tomorrow'] = df['Adj Close'].pct_change(-1) * -1 * 100
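The cell above negates `pct_change(-1)`, which divides by tomorrow's price. The conventional next-day return divides by today's price, which in pandas is `pct_change().shift(-1)`. The small sketch below, on a made-up three-price series, shows that the two differ slightly and that both leave the last row as `NaN` (there is no "tomorrow" for it), which is why the next step drops missing rows.

```python
import pandas as pd

# Made-up prices, only to compare the two formulas
adj_close = pd.Series([100.0, 110.0, 99.0])

# Formula used in the notebook: (P[t+1] - P[t]) / P[t+1], in percent
as_in_notebook = adj_close.pct_change(-1) * -1 * 100
# Conventional next-day return: (P[t+1] - P[t]) / P[t], in percent
conventional = adj_close.pct_change().shift(-1) * 100

print(pd.DataFrame({"notebook": as_in_notebook, "conventional": conventional}))
```

For typical daily moves of a few percent the two targets are close, but they are not identical, so it is worth choosing one deliberately.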

#### Remove rows with any missing data

In [10]:
df = df.dropna().copy()
df

| Date | Open | High | Low | Close | Adj Close | Volume | change_tomorrow |
|---|---|---|---|---|---|---|---|
| 2020-01-02 | 206.750000 | 209.789993 | 206.270004 | 209.779999 | 209.779999 | 12077100 | -0.531941 |
| 2020-01-03 | 207.210007 | 210.399994 | 206.949997 | 208.669998 | 208.669998 | 11188400 | 1.848546 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-05-10 | 236.169998 | 236.750000 | 230.720001 | 233.080002 | 233.080002 | 19119000 | 1.149324 |
| 2023-05-11 | 233.050003 | 238.210007 | 232.300003 | 235.789993 | 235.789993 | 20449000 | -0.846840 |


## Machine Learning modelling

### Feature selection

1. Target: which variable do you want to predict?
2. Explanatory: which variables will you use to calculate the prediction?

In [11]:
y = df.change_tomorrow
X = df.drop(columns='change_tomorrow')

### Train test split

In [12]:
from sklearn.model_selection import train_test_split

In [None]:
help(train_test_split)  # inspect the parameters (test_size, shuffle, random_state)

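In [None]:
# shuffle=False keeps the split chronological: the test set is the most recent 33% of days,
# so the backtest never trades on dates the model saw during training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, shuffle=False)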
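For time series, passing `shuffle=False` to `train_test_split` keeps every test date after every training date, which is what an out-of-sample backtest needs; a random split would scatter test days among training days. A quick check on a synthetic, date-indexed stand-in for `X` and `y`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's X and y (business days as the index)
dates = pd.date_range("2020-01-01", periods=30, freq="B")
X = pd.DataFrame({"Close": np.arange(30.0)}, index=dates)
y = pd.Series(np.arange(30.0), index=dates, name="change_tomorrow")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, shuffle=False)

# The test size is rounded up: ceil(0.33 * 30) = 10 rows
print(len(X_train), len(X_test))
# Every training date precedes every test date
print(X_train.index.max() < X_test.index.min())
```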

### Fit the model on train set
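One possible way to fill this step, assuming you reuse the hyperparameters that the `Regression` strategy below uses (`DecisionTreeRegressor(max_depth=15, random_state=42)`). The helper `fit_tree` is hypothetical, and the random features only stand in for `X_train` and `y_train`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor


def fit_tree(X_train, y_train, max_depth=15):
    """Hypothetical helper: fit the same kind of tree the Strategy uses."""
    model = DecisionTreeRegressor(max_depth=max_depth, random_state=42)
    model.fit(X_train, y_train)
    return model


# Synthetic stand-in data (not market data)
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.normal(size=(200, 3)), columns=["Open", "Close", "Volume"])
y_train = pd.Series(rng.normal(size=200))

model = fit_tree(X_train, y_train)
print(model.get_depth())
```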

### Evaluate model

#### On test set

In [None]:
from sklearn.metrics import ???

#### On train set
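Overfitting shows up as an error on the training set that is far lower than on the test set. The sketch below uses `mean_squared_error` (reported as RMSE) as one possible metric for the `???` import above; the target is pure random noise, an assumption chosen so that nothing real can be learned and any train/test gap comes from memorisation by the deep tree.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: the target is pure noise, unrelated to the features
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = rng.normal(size=500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, shuffle=False)

model = DecisionTreeRegressor(max_depth=15, random_state=42).fit(X_train, y_train)

# RMSE as the square root of MSE (avoids the version-dependent `squared` argument)
rmse_train = mean_squared_error(y_train, model.predict(X_train)) ** 0.5
rmse_test = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"RMSE train: {rmse_train:.3f} | RMSE test: {rmse_test:.3f}")
```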

## Backtesting

In [None]:
from backtesting import Backtest, Strategy

### Create the `Strategy`

In [None]:
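from sklearn.tree import DecisionTreeRegressor

class Regression(Strategy):
    # Class-level parameters, so bt.run(limit_buy=..., limit_sell=...) can override them
    limit_buy = 1    # buy when the forecast exceeds +1 %
    limit_sell = -5  # sell when the forecast drops below -5 %

    def init(self):
        self.model = DecisionTreeRegressor(max_depth=15, random_state=42)
        self.already_bought = False

        # Challenge: fit self.model on the training data
        ???

    def next(self):
        # Latest bar as a one-row DataFrame; its columns must match those the model was fit on
        explanatory_today = self.data.df.iloc[[-1], :]
        forecast_tomorrow = self.model.predict(explanatory_today)[0]

        if forecast_tomorrow > self.limit_buy and not self.already_bought:
            self.buy()
            self.already_bought = True
        elif forecast_tomorrow < self.limit_sell and self.already_bought:
            self.sell()
            self.already_bought = False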
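The decision rule in `next()` can be replayed without the backtesting library to see which bars trigger orders. This is a sketch: `trade_signals` is a hypothetical helper, and the forecasts are made-up percentages. As we read the backtesting.py documentation, with `exclusive_orders=True` each new order closes the previous position, so `self.sell()` closes the long and also opens a short; `self.position.close()` is the call that only exits.

```python
def trade_signals(forecasts, limit_buy=1.0, limit_sell=-5.0):
    """Hypothetical helper: apply the Regression.next() thresholds to a list of forecasts (%)."""
    already_bought = False
    signals = []
    for forecast in forecasts:
        if forecast > limit_buy and not already_bought:
            signals.append("buy")
            already_bought = True
        elif forecast < limit_sell and already_bought:
            signals.append("sell")
            already_bought = False
        else:
            signals.append("hold")
    return signals


# Made-up forecasts: a second buy signal is ignored while already long
print(trade_signals([0.5, 1.5, 2.0, -3.0, -6.0, -7.0, 1.2]))
```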

### Run the backtest on `test` data

In [None]:
bt = Backtest(???, Regression,
              cash=10000, commission=.002, exclusive_orders=True)

In [None]:
results = bt.run(limit_buy=1, limit_sell=-5)

df_results_test = results.to_frame(name='Values').loc[:'Return [%]']\
    .rename({'Values':'Out of Sample (Test)'}, axis=1)
df_results_test

### Run the backtest on `train` data

In [None]:
bt = Backtest(???, Regression,
              cash=10000, commission=.002, exclusive_orders=True)

results = bt.run(limit_buy=1, limit_sell=-5)

df_results_train = results.to_frame(name='Values').loc[:'Return [%]']\
    .rename({'Values':'In Sample (Train)'}, axis=1)
df_results_train

### Compare both backtests

- HINT: Concatenate the previous `DataFrames`
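Following the hint, `pd.concat` with `axis=1` aligns both summaries on the statistic name and places them side by side. The helper `compare_backtests` is hypothetical, and the frames below hold placeholder values, not real backtest results.

```python
import pandas as pd


def compare_backtests(df_results_test, df_results_train):
    """Hypothetical helper: out-of-sample and in-sample stats side by side."""
    return pd.concat([df_results_test, df_results_train], axis=1)


# Dummy frames shaped like the notebook's summaries (placeholder values only)
stats = ["Equity Final [$]", "Equity Peak [$]", "Return [%]"]
test = pd.DataFrame({"Out of Sample (Test)": [0.0, 0.0, 0.0]}, index=stats)
train = pd.DataFrame({"In Sample (Train)": [1.0, 1.0, 1.0]}, index=stats)

print(compare_backtests(test, train))
```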

#### Plot both backtest reports

## Continue with the tutorials on the following chapter

**How to solve the overfitting problem?**

Walk Forward Validation: A Realistic Approach to Algorithmic Trading

[LinkedIn Course Chapter]()

![](<src/10_Table_Validation Methods.png>)