
TabNetRegressor vs other networks #523

Closed
dcarrion87 opened this issue Oct 28, 2023 · 1 comment
dcarrion87 commented Oct 28, 2023

Describe the bug

I've been testing a few networks on my data, and the TabNetRegressor predictions are wildly different from those of a RandomForestRegressor and a basic PyTorch linear-regression network.

Code looks like this:

import pandas as pd
from pytorch_tabnet.tab_model import TabNetRegressor
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

train_data = pd.read_excel(config.TRAIN_DATA_FILE)
validation_data = pd.read_excel(config.VALIDATION_DATA_FILE)

# Features: everything except the target and the row identifier
X_train = train_data.drop(columns=['Months', 'ID'])
X_val = validation_data.drop(columns=['Months', 'ID'])
y_train_mth = train_data['Months']
y_val_mth = validation_data['Months']

# Impute missing values with the training-set mean
imputer = SimpleImputer(strategy='mean')
X_train = imputer.fit_transform(X_train)
X_val = imputer.transform(X_val)

# Standardize features using training-set statistics only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)

# TabNetRegressor expects 2D targets of shape (n_samples, n_targets)
y_train_mth = y_train_mth.values.reshape(-1, 1)
y_val_mth = y_val_mth.values.reshape(-1, 1)

regressor = TabNetRegressor(verbose=1, seed=42)
regressor.fit(
    X_train=X_train,
    y_train=y_train_mth,
    eval_set=[(X_val, y_val_mth)],
    virtual_batch_size=64,
    eval_metric=['rmse'],
)

y_val_pred = regressor.predict(X_val)

for true_val, pred_val in zip(y_val_mth, y_val_pred):
    print(f"True: {true_val[0]}, Predicted: {pred_val[0]}")

Data looks like this:

ID | Months | Max     | Volume  | A1      | A2      | A3      | A4       | A5 | A...
1  | 20.47  | 7.26346 | 488601  | 9.99133 | 15.7748 | 4.87628 | 2.38E+06 | 41
2  | 89.23  | 15.4819 | 101610  | 16.0093 | 22.9652 | 8.06708 | 819696   | 3
3  | 24.57  | 4.18762 | 26165.2 | 5.00004 | 5.83497 | 4.4945  | 117598   | 7

What is the current behavior?

The training loss stays at 0.0 from the very first epoch, the validation RMSE never improves, and the predicted values are wildly wrong:

epoch 0  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 1  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 2  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 3  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 4  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 5  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 6  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 7  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 8  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 9  | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s
epoch 10 | loss: 0.0     | val_0_rmse: 52.95739|  0:00:00s

Early stopping occurred at epoch 10 with best_epoch = 0 and best_val_0_rmse = 52.95739
.../pytorch_tabnet/callbacks.py:172: UserWarning: Best weights from best epoch are automatically used!
  warnings.warn(wrn_msg)
True: 12.98, Predicted: -0.04269159585237503
True: 67.55, Predicted: -0.0058983564376831055
True: 56.64, Predicted: -0.4818570613861084
True: 9.03, Predicted: 0.05411398410797119
True: 54.01, Predicted: -0.0857810527086258

Expected behavior

Should look closer to:

True: 12.98, Predicted: 30.733763574218806
True: 67.55, Predicted: 58.54040414611832
True: 56.64, Predicted: 60.1098913061525
True: 9.03, Predicted: 16.965372472534174
True: 54.01, Predicted: 64.88073784667964

Thanks for any insight!

@dcarrion87 dcarrion87 added the bug Something isn't working label Oct 28, 2023
dcarrion87 (Author) commented:
Nevermind, setting batch_size fixed it!
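A likely explanation for why this worked (an assumption; the thread itself doesn't spell it out): pytorch_tabnet's training DataLoader drops the last incomplete batch by default, and `fit`'s default `batch_size` is 1024. If the training set has fewer rows than `batch_size`, every batch is a partial batch and gets dropped, so no gradient step ever runs, the loss sits at 0.0, and the predictions stay near the untrained network's output. The batch-count arithmetic can be sketched as:

```python
def num_training_batches(n_rows, batch_size, drop_last=True):
    """Count batches per epoch; with drop_last=True a final partial batch is discarded."""
    full, remainder = divmod(n_rows, batch_size)
    return full if drop_last else full + (1 if remainder else 0)

# Hypothetical small dataset vs. the default batch size of 1024
print(num_training_batches(800, 1024))  # 0 -> no updates, loss stays 0.0
print(num_training_batches(800, 128))   # 6 -> training actually happens
```

Under that assumption, passing e.g. `batch_size=128` to `fit` (alongside `virtual_batch_size=64`) guarantees at least one full batch per epoch, which matches the fix reported above.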

@Optimox Optimox removed the bug Something isn't working label Oct 28, 2023