
java.lang.NullPointerException with predict on a model_fit object #25

spsanderson opened this issue Dec 11, 2022 · 0 comments
spsanderson commented Dec 11, 2022

Data:
testing_data.csv
training_data.csv
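Setup (a minimal sketch of the assumed preamble, not part of the original script; the package name is a guess based on the functions used, and the CSV loading is an assumption — the transcript further down was generated from an rsample split, training(daily_splits), which the attached CSVs presumably mirror):

# Assumed preamble (not shown in the original report)
library(tidymodels)     # fit(), predict(), slice(), pull()
library(modeltime.h2o)  # assumption: package providing automl_reg(), automl_leaderboard(), automl_update_model()
library(h2o)
library(readr)

h2o.init()  # start (or connect to) a local H2O cluster

# assumption: the attached CSVs load directly into the objects used below
training_data <- read_csv("training_data.csv")
testing_data  <- read_csv("testing_data.csv")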

Script:

model_spec <- automl_reg(mode = 'regression') %>%
  set_engine(
    engine                     = 'h2o',
    max_runtime_secs           = 60 * 60, 
    max_runtime_secs_per_model = 60 * 30,
    max_models                 = 15,
    nfolds                     = 5,
    #exclude_algos              = c("DeepLearning"),
    verbosity                  = NULL,
    seed                       = 786
  ) 

model_spec

model_fitted <- model_spec %>%
  fit(posting_amount_positive ~ ., data = training_data)

automl_leaderboard(model_fitted)

automl_update_model(
  model_fitted, 
  model_id = automl_leaderboard(model_fitted) %>%
    slice(1) %>%
    pull(model_id)
)

predict(model_fitted, testing_data)

Running the script above produces the following failure:

> model_spec <- automl_reg(mode = 'regression') %>%
+   set_engine(
+     engine                     = 'h2o',
+     max_runtime_secs           = 60 * 60, 
+     max_runtime_secs_per_model = 60 * 30,
+     max_models                 = 15,
+     nfolds                     = 5,
+     #exclude_algos              = c("DeepLearning"),
+     verbosity                  = NULL,
+     seed                       = 786
+   ) 
> model_spec
H2O AutoML Model Specification (regression)

Engine-Specific Arguments:
  max_runtime_secs = 60 * 60
  max_runtime_secs_per_model = 60 * 30
  max_models = 15
  nfolds = 5
  verbosity = NULL
  seed = 786

Computational engine: h2o 

> model_fitted <- model_spec %>%
+   fit(posting_amount_positive ~ ., data = training(daily_splits))
Converting to H2OFrame...
  |=======================================================================================| 100%

Training H2O AutoML...
  |=======================================================================================| 100%
  |=======================================================================================| 100%


Leaderboard: 
                                                 model_id      rmse       mse       mae rmsle
1 StackedEnsemble_BestOfFamily_1_AutoML_7_20221211_140030 0.9665065 0.9341348 0.3746044   NaN
2    StackedEnsemble_AllModels_1_AutoML_7_20221211_140030 0.9667923 0.9346873 0.3753276   NaN
3    DeepLearning_grid_3_AutoML_7_20221211_140030_model_1 0.9685012 0.9379946 0.3743257   NaN
4    DeepLearning_grid_2_AutoML_7_20221211_140030_model_1 0.9734572 0.9476189 0.3956395   NaN
5             GBM_grid_1_AutoML_7_20221211_140030_model_2 0.9818772 0.9640828 0.3952624   NaN
6                          GBM_1_AutoML_7_20221211_140030 0.9822901 0.9648938 0.3997966   NaN
  mean_residual_deviance
1              0.9341348
2              0.9346873
3              0.9379946
4              0.9476189
5              0.9640828
6              0.9648938

[17 rows x 6 columns] 

Using top model: StackedEnsemble_BestOfFamily_1_AutoML_7_20221211_140030
> automl_leaderboard(model_fitted)
# A tibble: 17 x 6
   model_id                                                 rmse   mse   mae rmsle mean_residua~1
   <chr>                                                   <dbl> <dbl> <dbl> <lgl>          <dbl>
 1 StackedEnsemble_BestOfFamily_1_AutoML_7_20221211_140030 0.967 0.934 0.375 NA             0.934
 2 StackedEnsemble_AllModels_1_AutoML_7_20221211_140030    0.967 0.935 0.375 NA             0.935
 3 DeepLearning_grid_3_AutoML_7_20221211_140030_model_1    0.969 0.938 0.374 NA             0.938
 4 DeepLearning_grid_2_AutoML_7_20221211_140030_model_1    0.973 0.948 0.396 NA             0.948
 5 GBM_grid_1_AutoML_7_20221211_140030_model_2             0.982 0.964 0.395 NA             0.964
 6 GBM_1_AutoML_7_20221211_140030                          0.982 0.965 0.400 NA             0.965
 7 GLM_1_AutoML_7_20221211_140030                          0.982 0.965 0.394 NA             0.965
 8 DeepLearning_1_AutoML_7_20221211_140030                 0.983 0.966 0.424 NA             0.966
 9 GBM_grid_1_AutoML_7_20221211_140030_model_1             0.994 0.988 0.411 NA             0.988
10 DeepLearning_grid_1_AutoML_7_20221211_140030_model_1    0.994 0.988 0.404 NA             0.988
11 GBM_grid_1_AutoML_7_20221211_140030_model_3             0.995 0.989 0.413 NA             0.989
12 GBM_2_AutoML_7_20221211_140030                          1.00  1.00  0.431 NA             1.00 
13 GBM_4_AutoML_7_20221211_140030                          1.00  1.01  0.431 NA             1.01 
14 GBM_3_AutoML_7_20221211_140030                          1.01  1.02  0.430 NA             1.02 
15 XRT_1_AutoML_7_20221211_140030                          1.03  1.07  0.457 NA             1.07 
16 GBM_5_AutoML_7_20221211_140030                          1.04  1.07  0.436 NA             1.07 
17 DRF_1_AutoML_7_20221211_140030                          1.04  1.08  0.463 NA             1.08 
# ... with abbreviated variable name 1: mean_residual_deviance
> automl_update_model(
+   model_fitted, 
+   model_id = automl_leaderboard(model_fitted) %>%
+     slice(1) %>%
+     pull(model_id)
+ )
parsnip model object


H2O AutoML - Stackedensemble
--------
Model: Model Details:
==============

H2ORegressionModel: stackedensemble
Model ID:  StackedEnsemble_BestOfFamily_1_AutoML_7_20221211_140030 
Number of Base Models: 5

Base Models (count by algorithm type):

deeplearning          drf          gbm          glm 
           1            2            1            1 

Metalearner:

Metalearner algorithm: glm
Metalearner cross-validation fold assignment:
  Fold assignment scheme: AUTO
  Number of folds: 5
  Fold column: NULL
Metalearner hyperparameters: 


H2ORegressionMetrics: stackedensemble
** Reported on training data. **

MSE:  0.933981
RMSE:  0.9664269
MAE:  0.3746983
RMSLE:  NaN
Mean Residual Deviance :  0.933981



H2ORegressionMetrics: stackedensemble
** Reported on cross-validation data. **
** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **

MSE:  0.9341348
RMSE:  0.9665065
MAE:  0.3746044
RMSLE:  NaN
Mean Residual Deviance :  0.9341348


Cross-Validation Metrics Summary: 
                             mean        sd cv_1_valid cv_2_valid cv_3_valid cv_4_valid
mae                      0.374247  0.026903   0.392305   0.405298   0.350108   0.380611
mean_residual_deviance   0.932917  0.179598   1.096220   1.072399   0.729629   1.018487
mse                      0.932917  0.179598   1.096220   1.072399   0.729629   1.018487
null_deviance          394.018070 80.098915 460.412320 469.711000 316.659150 422.672200
r2                      -0.000552  0.000855  -0.000413  -0.000020  -0.000022  -0.000252
residual_deviance      394.018070 80.098915 460.412320 469.711000 316.659150 422.672200
rmse                     0.962148  0.094791   1.047005   1.035567   0.854183   1.009201
rmsle                          NA  0.000000         NA         NA         NA         NA
                       cv_5_valid
mae                      0.342915
mean_residual_deviance   0.747850
mse                      0.747850
null_deviance          300.635680
r2                      -0.002052
residual_deviance      300.635680
rmse                     0.864783
rmsle                          NA
> predict(model_fitted, testing(daily_splits))
Converting to H2OFrame...
  |=======================================================================================| 100%
  |                                                                                       |   0%

java.lang.NullPointerException

java.lang.NullPointerException
	at water.MRTask.dfork(MRTask.java:623)
	at water.MRTask.doAll(MRTask.java:529)
	at water.MRTask.doAll(MRTask.java:549)
	at hex.glm.GLMModel.predictScoreImpl(GLMModel.java:2045)
	at hex.Model.score(Model.java:1938)
	at hex.ensemble.StackedEnsembleModel.predictScoreImpl(StackedEnsembleModel.java:252)
	at hex.Model.score(Model.java:1938)
	at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:497)
	at water.H2O$H2OCountedCompleter.compute(H2O.java:1677)
	at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
	at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
	at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
	at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
	at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

The error seems to stem from the non-deep-learning models.
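A diagnostic sketch to help narrow this down (the include_algos pass-through to h2o.automl() is my assumption; h2o.getModel() and h2o.predict() go through the h2o API directly, bypassing the parsnip wrapper):

# (1) Re-run AutoML restricted to DeepLearning only; if predict() succeeds
#     here, that supports the non-deep-learning models being the culprit.
#     Assumption: include_algos is forwarded by set_engine() to h2o.automl().
dl_only_fit <- automl_reg(mode = "regression") %>%
  set_engine(
    engine        = "h2o",
    max_models    = 5,
    nfolds        = 5,
    include_algos = c("DeepLearning"),
    seed          = 786
  ) %>%
  fit(posting_amount_positive ~ ., data = training_data)

predict(dl_only_fit, testing_data)

# (2) Score the current leader directly through the h2o API to check whether
#     the NullPointerException reproduces outside the parsnip wrapper.
leader_id <- automl_leaderboard(model_fitted) %>%
  slice(1) %>%
  pull(model_id)

h2o.predict(h2o.getModel(leader_id), as.h2o(testing_data))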
