Different results when running multiple jobs in parallel #13

Closed
4Freye opened this issue May 13, 2024 · 1 comment

4Freye commented May 13, 2024

I discovered an error when setting n_jobs to anything other than 1. Notably, the predictions are slightly different from those produced when running only one job (the default).

Until this issue is resolved, I recommend setting n_jobs to 1 when using cross_val_fit_predict, cross_val_fit, and cross_val_predict. Otherwise the predicted values may not be optimal.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
import pandas as pd
# Assumed import path for PanelSplit; adjust it to match your installed panelsplit version
from panelsplit import PanelSplit

# create dataframe
np.random.seed(1)
df = pd.DataFrame(np.random.random((40, 10)))
df['entity'] = np.repeat(['A', 'B', 'C', 'D'], 10)
df['time'] = list(range(10)) * 4
df.set_index(['entity', 'time'], inplace=True)

# specify model
model = RandomForestRegressor(n_estimators=10, random_state=1)

# run panel split: initialize, generate test labels, and fit and predict on data with and without parallel
ps = PanelSplit(periods=pd.Series(df.index.get_level_values('time')), n_splits=5)
pred_df = ps.gen_test_labels(df.iloc[:, 0].reset_index())
pred_df['pred'], _ = ps.cross_val_fit_predict(model, X=df.iloc[:, 1:], y=df.iloc[:, 0])
pred_df['pred_parallel'], _ = ps.cross_val_fit_predict(model, X=df.iloc[:, 1:], y=df.iloc[:, 0], n_jobs=-1)

# see if the predictions are the same in parallel and not. The output is False.
print(mean_squared_error(pred_df[0], pred_df['pred']) == mean_squared_error(pred_df[0], pred_df['pred_parallel']))
```
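
Comparing two MSE values with `==` only says whether they match exactly. It does not show whether the gap is floating-point rounding or genuinely different predictions. A minimal sketch of an element-wise check follows. The helper name `compare_predictions`, its tolerances, and the sample numbers are all made up for illustration; none of this is part of panelsplit.

```python
import numpy as np


def compare_predictions(pred_a, pred_b, rtol=1e-9, atol=1e-12):
    """Summarise how two prediction vectors differ (hypothetical helper)."""
    a = np.asarray(pred_a, dtype=float)
    b = np.asarray(pred_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")

    abs_diff = np.abs(a - b)
    # Same tolerance rule that np.allclose applies element-wise
    outside_tol = abs_diff > (atol + rtol * np.abs(b))
    return {
        "identical": bool(np.array_equal(a, b)),
        "close": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
        "max_abs_diff": float(abs_diff.max()) if abs_diff.size else 0.0,
        "n_different": int(outside_tol.sum()),
    }


# Made-up numbers standing in for pred_df['pred'] and pred_df['pred_parallel']
serial = np.array([0.41, 0.52, 0.63, 0.74])
rounding_only = serial + 1e-15   # differs only at rounding scale
different = serial.copy()
different[2] = 0.60              # one genuinely different prediction

report_rounding = compare_predictions(serial, rounding_only)
report_different = compare_predictions(serial, different)
print(report_rounding)
print(report_different)
```

With the reproduction above, the same helper could be called as `compare_predictions(pred_df['pred'], pred_df['pred_parallel'])`. If `close` is False and `n_different` is nonzero, the parallel run is producing different predictions rather than a rounding-level discrepancy.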

4Freye commented May 14, 2024

Closed as it was fixed when this issue was resolved
