<a href="https://colab.research.google.com/github/YutongWu12/FRE-GY5040-Machine-Learning-for-Finance-with-Python/blob/main/4_Feature_Engineering_%2B_DL_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Feature Engineering


*   The fourth project is the development of a notebook (code + explanation) that successfully engineers 12 unique types of features, **three** for each type of feature engineering: **transforming**, **interacting**, **mapping**, and **extracting**.
* The second part of the assignment is the development of a **deep learning classification** model to predict the direction of the S&P500 for the dates **2018-01-01—2018-07-12** (test set).
* The feature engineering section is unrelated to the model section: you can develop any features, not just features that would work for deep learning models (later on you can decide which features to use in your model).
*  You also have to uncomment all the example features and make them run successfully: **every** feature example has one or more errors that you have to fix. Please also describe the error you fixed!
*   Note that we *won't* be attempting to measure the quality of every feature (i.e., how much it improves the model); that is slightly too advanced for this course.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns

### Preparing the Data

In [None]:
# preparing our data
raw_prices = pd.read_csv("https://storage.googleapis.com/sovai-public/random/assetalloc.csv", sep=';', parse_dates=True, index_col='Dates', dayfirst=True)
df = raw_prices.sort_values(by='Dates')
df["target"] = df["SP500"].pct_change().shift(-1)
df["target"] = np.where(df["target"]>0,1,0)
df.head()

Dates,FTSE,EuroStoxx50,SP500,Gold,French-2Y,French-5Y,French-10Y,French-30Y,US-2Y,US-5Y,US-10Y,US-30Y,Russel2000,EuroStox_Small,FTSE_Small,MSCI_EM,CRB,target
1989-02-01,2039.7,875.47,297.09,392.5,99.081,99.039,99.572,100.0,100.031,100.345,101.08,101.936,154.38,117.5,1636.57,133.584,286.67,0
1989-02-02,2043.4,878.08,296.84,392.0,98.898,99.117,99.278,99.692,100.0,100.314,101.017,101.905,154.94,117.69,1642.94,135.052,287.03,1
1989-02-03,2069.9,884.09,296.97,388.75,98.907,99.002,99.145,99.178,99.812,100.062,100.921,101.718,155.69,118.62,1659.11,137.134,285.63,0
1989-02-06,2044.3,885.49,296.04,388.0,98.484,98.502,98.51,97.739,99.812,100.062,100.794,101.468,155.58,118.89,1656.86,137.037,284.69,1
1989-02-07,2072.8,883.82,299.63,392.75,98.438,98.312,98.292,97.688,99.906,100.251,101.144,102.092,156.84,118.28,1662.76,136.914,284.21,0
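
To make the target construction above concrete, here is a minimal sketch on a toy price series (the `prices` values are made up for illustration). `pct_change().shift(-1)` places the next day's return on the current row, so the label says whether the price rises on the following day. The last row has no next day, so its return is NaN and `np.where` labels it 0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices, not taken from the dataset
prices = pd.Series([100.0, 101.0, 99.0, 102.0], name="price")

# Return realised on the NEXT day, aligned to the current row
next_day_return = prices.pct_change().shift(-1)

# 1 if the next day's return is positive, else 0 (NaN compares as False, so the last row is 0)
target = np.where(next_day_return > 0, 1, 0)
print(target.tolist())
```

This matches the first rows of the table: SP500 falls from 297.09 to 296.84 after 1989-02-01, so that row's target is 0.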


### Train Test Split

In [None]:
from sklearn.model_selection import train_test_split
y = df.pop("target")
X = df.copy()

X_train = X[X.index.astype(str)<'2018-01-01']
y_train = y[X_train.index]
X_test = X[~X.index.isin(X_train.index)]
y_test = y[X_test.index]

# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
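
The split above is chronological rather than random. As a rough sketch of the same idea, the cutoff can also be compared directly against the datetime index, without casting it to strings. The frame, labels, and variable names below are hypothetical stand-ins for `X` and `y`.

```python
import numpy as np
import pandas as pd

# Hypothetical business-day index straddling the cutoff date
dates = pd.date_range("2017-12-27", "2018-01-04", freq="B")
features = pd.DataFrame({"SP500": np.linspace(2680.0, 2720.0, len(dates))}, index=dates)
labels = pd.Series(np.arange(len(dates)) % 2, index=dates, name="target")

# Everything strictly before the cutoff is training data; the rest is the test set
cutoff = pd.Timestamp("2018-01-01")
train_mask = features.index < cutoff

X_tr, X_te = features[train_mask], features[~train_mask]
y_tr, y_te = labels[train_mask], labels[~train_mask]
print(len(X_tr), len(X_te))
```

Because no row is shuffled, every training observation precedes every test observation, which is the property a time-series split needs.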

### Transforming

1. Refresh your memory on transformation methods by going back to the material. I am simply providing one example here.
1. Don't repeat my logarithmic return calculation; develop your own transformation (there are thousands of types of transformations).
1. The example I provide also contains errors that you have to fix. For example, one of the errors below is that you should actually use `np.log1p()`, but there is another one, so watch out!
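
One possible reading of the `np.log1p()` hint, sketched on a made-up price series: a logarithmic return is `log(1 + r)` for a simple return `r`, and `np.log1p` computes exactly that. The result equals the difference of log prices. This is an illustration under that assumption, not necessarily the intended fix.

```python
import numpy as np
import pandas as pd

# Hypothetical toy prices
prices = pd.Series([100.0, 102.0, 101.0, 105.0])

# Simple returns r_t = P_t / P_{t-1} - 1 (first value is NaN)
simple_returns = prices.pct_change()

# Log returns via log1p(r_t) = log(1 + r_t)
log_returns = np.log1p(simple_returns)

# Equivalent formulation: difference of log prices
log_price_diff = np.log(prices).diff()

print(np.allclose(log_returns.dropna(), log_price_diff.dropna()))
```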

In [None]:
# The code doesn't handle non-positive values in the FTSE column, which would cause a math domain error when applying np.log().
import numpy as np

df["FTSE_log"] = np.log1p(df["FTSE"] - 1)

df.replace([np.inf, -np.inf], np.nan, inplace=True)
df.fillna(0, inplace=True)

df.head()

Dates,FTSE,EuroStoxx50,SP500,Gold,French-2Y,French-5Y,French-10Y,French-30Y,US-2Y,US-5Y,...,FTSE_Small,MSCI_EM,CRB,FTSE_sqrt,SP500_exp,US-10Y_log1p,SP500_FTSE_interaction,FTSE_US10Y_interaction,Gold_SP500_interaction,FTSE_log
1989-02-01,2039.7,875.47,297.09,392.5,99.081,99.039,99.572,100.0,100.031,100.345,...,1636.57,133.584,286.67,45.163038,1.058151e+129,4.625757,605974.473,20.179066,689.59,7.620558
1989-02-02,2043.4,878.08,296.84,392.0,98.898,99.117,99.278,99.692,100.0,100.314,...,1642.94,135.052,287.03,45.203982,8.240888e+128,4.625139,606562.856,20.228278,688.84,7.62237
1989-02-03,2069.9,884.09,296.97,388.75,98.907,99.002,99.145,99.178,99.812,100.062,...,1659.11,137.134,285.63,45.496154,9.384957e+128,4.624198,614698.203,20.510102,685.72,7.635256
1989-02-06,2044.3,885.49,296.04,388.0,98.484,98.502,98.51,97.739,99.812,100.062,...,1656.86,137.037,284.69,45.213936,3.70287e+128,4.622951,605194.572,20.281961,684.04,7.622811
1989-02-07,2072.8,883.82,299.63,392.75,98.438,98.312,98.292,97.688,99.906,100.251,...,1662.76,136.914,284.21,45.528013,1.341701e+130,4.626384,621073.064,20.493554,692.38,7.636656


In [None]:
## Transforming 1 (Add code below)
df["FTSE_sqrt"] = np.sqrt(df["FTSE"].replace(-np.inf, 0).replace(np.inf, 0))

In [None]:
## Transforming 2 (Add code below)
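# Note: np.exp overflows float64 for inputs above roughly 709, so rows where
# SP500 exceeds that level become inf (this is what triggers the RuntimeWarning).
df["SP500_exp"] = np.exp(df["SP500"].replace(-np.inf, 0).replace(np.inf, 0))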


In [None]:
## Transforming 3 (Add code below)
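from sklearn.preprocessing import StandardScaler  # needed here; it was only imported in a later cell

def z_score_normalization(train, test):
    # Fit the scaler on the training set only and reuse it on the test set,
    # so that no test-set statistics leak into the feature.
    scaler = StandardScaler()
    train = train.dropna(subset=["SP500"]).copy()
    test = test.dropna(subset=["SP500"]).copy()
    train["SP500_zscore"] = scaler.fit_transform(train[["SP500"]]).ravel()
    test["SP500_zscore"] = scaler.transform(test[["SP500"]]).ravel()
    return train, test

# Applying the transformation
X_train, X_test = z_score_normalization(X_train, X_test)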


### Interacting

There are millions of possible interaction methods; be creative and come up with your own. For this assignment there is no 'right' feature engineering method: you simply develop one and give it a name and a description.
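
As a hedged illustration of one interaction, the sketch below builds a ratio of two return series on a made-up frame and guards against division by zero, which would otherwise produce `inf`. The column values and the name `ratio` are hypothetical; the feature name follows the `a__div__b` pattern used in the example cell.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; US-10Y is flat between the first two rows
frame = pd.DataFrame({
    "Gold": [390.0, 392.0, 392.0, 388.0],
    "US-10Y": [101.0, 101.0, 100.5, 101.2],
})

gold_r = frame["Gold"].pct_change()
teny_r = frame["US-10Y"].pct_change()

# Replace zero denominators with NaN so the ratio is undefined rather than infinite
ratio = gold_r / teny_r.replace(0, np.nan)
frame["gold_r__div__teny_r"] = ratio.replace([np.inf, -np.inf], np.nan)
print(frame)
```

Leaving undefined ratios as NaN keeps the choice of imputation (zero, forward fill, dropping rows) explicit instead of hiding it inside the feature.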

In [None]:
#The code attempts to divide two columns, but doesn't ensure that the columns are aligned correctly in terms of index.
def gold_to_yield(df):
    teny_returns = df["US-10Y"].pct_change().fillna(0)
    gold_returns = df["Gold"].pct_change().fillna(0)
    df["gold_r__div__teny_r"] = gold_returns / teny_returns
    return df

X_train = gold_to_yield(X_train.copy())
X_test = gold_to_yield(X_test.copy())

X_train.head()

Dates,FTSE,EuroStoxx50,SP500,Gold,French-2Y,French-5Y,French-10Y,French-30Y,US-2Y,US-5Y,...,EuroStox_Small,FTSE_Small,MSCI_EM,CRB,SP500_BB_high,SP500_BB_low,FTSE_MA,Gold_ROC,SP500_zscore,gold_r__div__teny_r
1989-02-01,2039.7,875.47,297.09,392.5,99.081,99.039,99.572,100.0,100.031,100.345,...,117.5,1636.57,133.584,286.67,,,,,-1.502403,
1989-02-02,2043.4,878.08,296.84,392.0,98.898,99.117,99.278,99.692,100.0,100.314,...,117.69,1642.94,135.052,287.03,,,,,-1.502854,2.043878
1989-02-03,2069.9,884.09,296.97,388.75,98.907,99.002,99.145,99.178,99.812,100.062,...,118.62,1659.11,137.134,285.63,,,,,-1.50262,8.724098
1989-02-06,2044.3,885.49,296.04,388.0,98.484,98.502,98.51,97.739,99.812,100.062,...,118.89,1656.86,137.037,284.69,,,,,-1.504298,1.533094
1989-02-07,2072.8,883.82,299.63,392.75,98.438,98.312,98.292,97.688,99.906,100.251,...,118.28,1662.76,136.914,284.21,,,,,-1.497821,3.525563


In [None]:
## Interacting 1 (Add code below)
df["SP500_FTSE_interaction"] = df["SP500"] * df["FTSE"]

In [None]:
## Interacting 2 (Add code below)
df["FTSE_US10Y_interaction"] = df["FTSE"] / (df["US-10Y"].replace(0, np.nan))

In [None]:
## Interacting 3 (Add code below)
df["Gold_SP500_interaction"] = df["Gold"] + df["SP500"]

### Mapping

This one is slightly harder: you have to identify other dimensionality reduction methods; there are many more than just PCA. You could also perform the decompositions on a single asset class, e.g., US-2Y, US-5Y, US-10Y, US-30Y form a fixed income asset class, but there are a few others in the dataset.
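
A minimal sketch of the single-asset-class idea, assuming synthetic data in place of the real yields: the scaler and PCA are fitted on the training rows of the four US fixed income columns only, then applied to the test rows. The helper `asset_class_pca` and the feature name `us_rates_pc1` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def asset_class_pca(train, test, columns, name):
    """Add the first principal component of `columns` as a new feature `name`."""
    scaler = StandardScaler()
    pca = PCA(n_components=1)
    # Fit on the training rows only, then transform the test rows with the same fit
    train_s = scaler.fit_transform(train[columns])
    test_s = scaler.transform(test[columns])
    train, test = train.copy(), test.copy()
    train[name] = pca.fit_transform(train_s)[:, 0]
    test[name] = pca.transform(test_s)[:, 0]
    return train, test

# Synthetic stand-in: four series driven by one common factor plus noise
rng = np.random.default_rng(0)
us_cols = ["US-2Y", "US-5Y", "US-10Y", "US-30Y"]
common = rng.normal(size=(60, 1))
data = pd.DataFrame(100 + common + 0.1 * rng.normal(size=(60, 4)), columns=us_cols)

train_pc, test_pc = asset_class_pca(data.iloc[:50], data.iloc[50:], us_cols, "us_rates_pc1")
print(train_pc["us_rates_pc1"].head())
```

Restricting the decomposition to one asset class keeps the component interpretable, here roughly as a common level factor across maturities.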

In [None]:
#The code tries to assign the result of pca.fit_transform() to a DataFrame column but misuses the fillna(0) function on an already transformed array.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np

def pca_first(X_train, X_test):
    sc = StandardScaler()

    X_train.replace([np.inf, -np.inf], np.nan, inplace=True)
    X_test.replace([np.inf, -np.inf], np.nan, inplace=True)

    X_train_s = sc.fit_transform(X_train.fillna(0))
    X_test_s = sc.transform(X_test.fillna(0))

    pca = PCA(n_components=1)
    X_train["first_principal"] = pca.fit_transform(X_train_s)
    X_test["first_principal"] = pca.transform(X_test_s)

    return X_train, X_test

X_train, X_test = pca_first(X_train.copy(), X_test.copy())

X_train.head()

Dates,FTSE,EuroStoxx50,SP500,Gold,French-2Y,French-5Y,French-10Y,French-30Y,US-2Y,US-5Y,...,FTSE_Small,MSCI_EM,CRB,SP500_BB_high,SP500_BB_low,FTSE_MA,Gold_ROC,SP500_zscore,gold_r__div__teny_r,first_principal
1989-02-01,2039.7,875.47,297.09,392.5,99.081,99.039,99.572,100.0,100.031,100.345,...,1636.57,133.584,286.67,,,,,-1.502403,,-6.608872
1989-02-02,2043.4,878.08,296.84,392.0,98.898,99.117,99.278,99.692,100.0,100.314,...,1642.94,135.052,287.03,,,,,-1.502854,2.043878,-6.607859
1989-02-03,2069.9,884.09,296.97,388.75,98.907,99.002,99.145,99.178,99.812,100.062,...,1659.11,137.134,285.63,,,,,-1.50262,8.724098,-6.612638
1989-02-06,2044.3,885.49,296.04,388.0,98.484,98.502,98.51,97.739,99.812,100.062,...,1656.86,137.037,284.69,,,,,-1.504298,1.533094,-6.640477
1989-02-07,2072.8,883.82,299.63,392.75,98.438,98.312,98.292,97.688,99.906,100.251,...,1662.76,136.914,284.21,,,,,-1.497821,3.525563,-6.620515


In [None]:
## Mapping 1 (Add code below)
from sklearn.decomposition import KernelPCA

def kernel_pca_mapping(X_train, X_test):
    kpca = KernelPCA(n_components=2, kernel='rbf')
    X_train_kpca = kpca.fit_transform(X_train.fillna(0))
    X_test_kpca = kpca.transform(X_test.fillna(0))
    return X_train_kpca, X_test_kpca

X_train_kpca, X_test_kpca = kernel_pca_mapping(X_train, X_test)

In [None]:
## Mapping 2 (Add code below)
from sklearn.decomposition import TruncatedSVD

def svd_mapping(X_train, X_test):
    svd = TruncatedSVD(n_components=2)
    X_train_svd = svd.fit_transform(X_train.fillna(0))
    X_test_svd = svd.transform(X_test.fillna(0))
    return X_train_svd, X_test_svd

X_train_svd, X_test_svd = svd_mapping(X_train, X_test)

In [None]:
## Mapping 3 (Add code below)
from sklearn.decomposition import FastICA

def ica_mapping(X_train, X_test):
    ica = FastICA(n_components=2)
    X_train_ica = ica.fit_transform(X_train.fillna(0))
    X_test_ica = ica.transform(X_test.fillna(0))
    return X_train_ica, X_test_ica

X_train_ica, X_test_ica = ica_mapping(X_train, X_test)

### Extracting

In [None]:
#The code attempted to rename columns in a way that could conflict with existing column names.
import pandas as pd
import numpy as np

def vola(df):
    volatility = df.pct_change().rolling(window=365).std() * (365**0.5)

    volatility.replace([np.inf, -np.inf], np.nan, inplace=True)
    volatility.fillna(0, inplace=True)

    new_names = [col + '_vol' for col in df.columns]
    volatility.columns = new_names

    df = pd.concat([df, volatility], axis=1)

    return df
X_train = vola(X_train.copy())
X_test = vola(X_test.copy())

X_train.head()



Dates,FTSE,EuroStoxx50,SP500,Gold,French-2Y,French-5Y,French-10Y,French-30Y,US-2Y,US-5Y,...,FTSE_Small_vol,MSCI_EM_vol,CRB_vol,SP500_BB_high_vol,SP500_BB_low_vol,FTSE_MA_vol,Gold_ROC_vol,SP500_zscore_vol,gold_r__div__teny_r_vol,first_principal_vol
1989-02-01,2039.7,875.47,297.09,392.5,99.081,99.039,99.572,100.0,100.031,100.345,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1989-02-02,2043.4,878.08,296.84,392.0,98.898,99.117,99.278,99.692,100.0,100.314,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1989-02-03,2069.9,884.09,296.97,388.75,98.907,99.002,99.145,99.178,99.812,100.062,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1989-02-06,2044.3,885.49,296.04,388.0,98.484,98.502,98.51,97.739,99.812,100.062,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1989-02-07,2072.8,883.82,299.63,392.75,98.438,98.312,98.292,97.688,99.906,100.251,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
## Extracting 1 (Add code below)
def moving_average(df, window=30):
    df.loc[:, "FTSE_MA"] = df["FTSE"].rolling(window=window).mean()
    return df

X_train = moving_average(X_train)
X_test = moving_average(X_test)


In [None]:
## Extracting 2 (Add code below)
def bollinger_bands(df, window=30):
    rolling_mean = df["SP500"].rolling(window=window).mean()
    rolling_std = df["SP500"].rolling(window=window).std()
    df.loc[:, "SP500_BB_high"] = rolling_mean + (rolling_std * 2)
    df.loc[:, "SP500_BB_low"] = rolling_mean - (rolling_std * 2)
    return df

X_train = bollinger_bands(X_train)
X_test = bollinger_bands(X_test)

In [None]:
## Extracting 3 (Add code below)
def price_rate_of_change(df, window=30):
    df.loc[:, "Gold_ROC"] = df["Gold"].pct_change(periods=window)
    return df

X_train = price_rate_of_change(X_train)
X_test = price_rate_of_change(X_test)

## Deep Learning Binary Classification

* For the deep learning model you can perform new data preprocessing methods and new feature engineering that are better suited to neural networks. You can also use all or some of the features you developed above (most features work in deep learning models as long as they are normalized).
* It is very hard to predict the stock price, so in my grading I will look more at the quality of your modelling process (e.g., that there is no data leakage, that you performed some hyperparameter tuning).
* Make sure that you switch your GPU on; you have access to it on Colab. The training stage also takes a long time, so you might want to use a smaller amount of data or fewer epochs at first to speed up your development process.
* After your training is done, you don't have to save your model, but you do have to print the performance of your model. You can report two metrics against the test set: the ROC AUC and the accuracy.
* Also remember to set the random seed (random state) so that when I run your software, I get similar results (the results don't have to be exactly the same).
* You can choose any type of deep learning architecture, e.g., LSTM, GRU, CNN; it is up to you.
* Remember that this section is less than 25% of the grade, so don't waste your time here.
* And lastly, remember this is the stock market, so it is **difficult** to have an accuracy above 50%. Good luck!
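
The two requested metrics can be computed with scikit-learn once the model has produced test-set probabilities. The sketch below uses made-up labels and probabilities in place of real model output; `y_true`, `y_prob`, and the 0.5 threshold are assumptions for illustration. ROC AUC is computed from the probabilities, accuracy from the thresholded predictions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Fix NumPy's global seed; the deep learning framework has its own seed to set as well
np.random.seed(42)

# Hypothetical test labels and predicted probabilities of the "up" class
y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.7, 0.4, 0.6, 0.9])

# Accuracy needs hard class labels, so threshold the probabilities at 0.5
y_pred = (y_prob >= 0.5).astype(int)

test_accuracy = accuracy_score(y_true, y_pred)
test_auc = roc_auc_score(y_true, y_prob)
print(f"Accuracy: {test_accuracy:.3f}, ROC AUC: {test_auc:.3f}")
```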

In [None]:
## Implement Here
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

# Building a simple LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], 1)))
model.add(LSTM(50))
model.add(Dense(1, activation='sigmoid'))

# Compiling the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Reshape the data to match LSTM input requirements
X_train_reshaped = X_train.values.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test_reshaped = X_test.values.reshape((X_test.shape[0], X_test.shape[1], 1))

# Train the model
model.fit(X_train_reshaped, y_train, epochs=10, batch_size=64, validation_data=(X_test_reshaped, y_test))

# Evaluate the model
loss, accuracy = model.evaluate(X_test_reshaped, y_test)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")


Epoch 1/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 8s 27ms/step - accuracy: 0.4876 - loss: nan - val_accuracy: 0.4460 - val_loss: nan
Epoch 2/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 3s 23ms/step - accuracy: 0.4774 - loss: nan - val_accuracy: 0.4460 - val_loss: nan
Epoch 3/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 6s 52ms/step - accuracy: 0.4790 - loss: nan - val_accuracy: 0.4460 - val_loss: nan
Epoch 4/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 10s 52ms/step - accuracy: 0.4866 - loss: nan - val_accuracy: 0.4460 - val_loss: nan
Epoch 5/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 7s 29ms/step - accuracy: 0.4721 - loss: nan - val_accuracy: 0.4460 - val_loss: nan
Epoch 6/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 3s 25ms/step - accuracy: 0.4817 - loss: nan - val_accuracy: 0.4460 - val_loss: nan
Epoch 7/10
118/118 ━━━━━━━━━━━━━━━━━━━━

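A `loss: nan` in every epoch is what one would expect when the network inputs contain NaN or infinite values; the engineered features above do contain NaN (for example in the rolling-window columns), and the raw price columns are unscaled. The NumPy-only sketch below shows one possible preparation step under those assumptions: sanitize and standardize with training-set statistics, then build sliding windows of consecutive days for a recurrent model. The helpers `clean_and_scale` and `make_windows` and the `lookback` parameter are hypothetical, not part of the notebook.

```python
import numpy as np

def clean_and_scale(train, test):
    """Standardize with training statistics and replace non-finite values by 0."""
    train = np.where(np.isfinite(train), train, np.nan)
    test = np.where(np.isfinite(test), test, np.nan)
    mean = np.nanmean(train, axis=0)
    std = np.nanstd(train, axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    train_s = np.nan_to_num((train - mean) / std, nan=0.0)
    test_s = np.nan_to_num((test - mean) / std, nan=0.0)
    return train_s, test_s

def make_windows(features, labels, lookback):
    """Stack `lookback` consecutive rows per sample; the label is that of the window's last row."""
    X = np.stack([features[i - lookback:i] for i in range(lookback, len(features) + 1)])
    y = labels[lookback - 1:]
    return X, y

# Tiny made-up example containing NaN and inf
train_raw = np.array([[1.0, np.nan], [2.0, 4.0], [3.0, np.inf]])
test_raw = np.array([[4.0, 4.0]])
train_clean, test_clean = clean_and_scale(train_raw, test_raw)

demo_features = np.arange(10, dtype=float).reshape(5, 2)
demo_labels = np.array([0, 1, 0, 1, 1])
X_seq, y_seq = make_windows(demo_features, demo_labels, lookback=3)
print(X_seq.shape, y_seq)
```

The resulting array has shape `(samples, lookback, features)`, which is the layout recurrent layers such as LSTM expect, with time steps along the second axis.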
In [None]:
pip install --upgrade scikit-learn scikeras



In [None]:
pip install scikeras



In [None]:
!pip install scikit-learn==1.0.2
!pip install scikeras==0.4.0
!pip install tensorflow==2.8.0

Collecting scikit-learn==1.0.2
  Downloading scikit_learn-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Downloading scikit_learn-1.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26.5/26.5 MB 27.5 MB/s eta 0:00:00
Installing collected packages: scikit-learn
  Attempting uninstall: scikit-learn
    Found existing installation: scikit-learn 1.5.1
    Uninstalling scikit-learn-1.5.1:
      Successfully uninstalled scikit-learn-1.5.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 1.15.0 requires scikit-learn>=1.2.2, but you have scikit-learn 1.0.2 which is incompatible.
scikeras 0.13.0 requires scikit-learn>=1.4.2, but you have scikit-learn 1.0.2 which is incompatible.
Successfully instal

Collecting scikeras==0.4.0
  Downloading scikeras-0.4.0-py3-none-any.whl.metadata (3.0 kB)
Downloading scikeras-0.4.0-py3-none-any.whl (26 kB)
Installing collected packages: scikeras
  Attempting uninstall: scikeras
    Found existing installation: scikeras 0.13.0
    Uninstalling scikeras-0.13.0:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/base_command.py", line 179, in exc_logging_wrapper
    status = run_func(*args)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/cli/req_command.py", line 67, in wrapper
    return func(self, options, args)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/commands/install.py", line 455, in run
    installed = install_given_reqs(
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/req/__init__.py", line 65, in install_given_reqs
    uninstalled_pathset = requirement.uninstall(auto_confirm=True)
  File "/usr/local/lib/python3.10/dist-packages/pip/_internal/req

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from sklearn.model_selection import GridSearchCV
from scikeras.wrappers import KerasClassifier

def create_model(optimizer='adam', init='uniform', dropout_rate=0.0):
    model = Sequential()
    model.add(Dense(64, input_dim=X_train.shape[1], kernel_initializer=init, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(32, kernel_initializer=init, activation='relu'))
    model.add(Dense(1, kernel_initializer=init, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

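# scikeras routes arguments of the build function through the `model__` prefix,
# and `model=` replaces the deprecated `build_fn=` argument.
model = KerasClassifier(model=create_model, verbose=0)

param_grid = {
    'batch_size': [10, 20, 40],
    'epochs': [10, 50, 100],
    'model__optimizer': ['SGD', 'Adam'],
    'model__init': ['glorot_uniform', 'normal', 'uniform'],
    'model__dropout_rate': [0.0, 0.1, 0.2]
}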

grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=5, scoring='accuracy', verbose=2)
grid_result = grid.fit(X_train, y_train)

print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")



NameError: name 'X_train' is not defined
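
Separately from the error above, the rows here are ordered in time, so a plain `cv=5` lets some folds train on rows that come after the rows they are validated on. A minimal sketch of a time-ordered alternative uses scikit-learn's `TimeSeriesSplit`. The synthetic data and the `LogisticRegression` stand-in are assumptions chosen so the example runs quickly; the same `cv` object can be passed to a grid search over any scikit-learn compatible estimator.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic, time-ordered stand-in for the real features and labels
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Each split trains on an initial segment and validates on the block that follows it
tscv = TimeSeriesSplit(n_splits=5)
splits = list(tscv.split(X_demo))

search = GridSearchCV(
    LogisticRegression(),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=tscv,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```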