Unable to set parameter ranges in metalearners (misleading error message in "ensemble_model_spec") #20

Closed

lg1000 opened this issue May 17, 2022 · 0 comments

lg1000 commented May 17, 2022
When using the standard modeling workflow, without stacked ensembles, I have no trouble setting individual parameter ranges like this:

xgb_grid <- grid_latin_hypercube(
  learn_rate(range = c(-5.0, -0.1)),
  size = 30
)
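
(Side note on that range: as far as I understand dials, learn_rate() is parameterized on a log10 scale, so c(-5.0, -0.1) spans actual learning rates from 1e-05 up to about 0.79. A quick way to check:)

# learn_rate() uses a log10 transform, so the range above is in exponents:
# 10^-5 = 1e-05 up to 10^-0.1 ~= 0.79 on the original scale.
lr <- learn_rate(range = c(-5.0, -0.1))
grid_regular(lr, levels = 3)  # values come back on the original scale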

I also know how to update parameters and how to pull them from workflow objects.
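
For example, in the non-ensemble case something like this works for me (a sketch; "wflw" stands in for any workflow with tune() placeholders, it is not an object from the reprex below):

# "wflw" is a placeholder for a tunable workflow, not part of the reprex
wflw_params <- hardhat::extract_parameter_set_dials(wflw) %>%
  update(learn_rate = learn_rate(range = c(-5.0, -0.1)))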

What I do not know is how this works with metalearner stacks. If I am not fundamentally wrong, the argument "param_info" is used for this purpose.
As the documentation of "ensemble_model_spec" states, param_info can take a dials parameter object as input. However, either I am not getting the concept of dials parameter objects right, or there is some problem with my code, because my attempt fails with "all models failed, see .notes column".
This error message is not helpful: I do not have a tuned object created by, for example, a tune_grid function, so where would I even see a .notes column here? From this point on I am stuck, because I have no clear indication of the source of the error.
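
(For comparison, with an ordinary tune_grid() result the notes are at least reachable, e.g.:)

# "res" stands in for a plain tune_grid() result; no such object is
# exposed by ensemble_model_spec(), which is exactly the problem here
res$.notes                 # list-column with per-resample failure notes
tune::collect_notes(res)   # available in tune >= 0.2.0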

reprex:

# time series ML
suppressPackageStartupMessages(library(modeltime))
suppressPackageStartupMessages(library(tidymodels))
suppressPackageStartupMessages(library(modeltime.ensemble))
suppressPackageStartupMessages(library(modeltime.resample))
suppressPackageStartupMessages(library(timetk))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(doParallel))

data <- m4_monthly

H = 6
# training + forecast
full_data_tbl <- data %>%
  group_by(id) %>%
  future_frame(
    .length_out = H,
    .bind_data  = TRUE
  ) %>%
  ungroup() %>%
  mutate(id = fct_drop(id))
# training and test
data_prepared_tbl <- full_data_tbl %>%
  filter(!is.na(value))
# forecast
future_tbl <- full_data_tbl %>%
  filter(is.na(value))
# splits
set.seed(544)
splits <- data_prepared_tbl %>%
  time_series_split(
    date_var    = date,
    assess      = H,
    cumulative = TRUE
  )
resamples_tscv <- data_prepared_tbl %>%
  time_series_cv(
    date_var    = date,
    assess      = "6 months",
    skip        = "6 months",
    cumulative  = TRUE,
    slice_limit = 3
  )
# recipe
recipe_spec_mars <- recipe(value ~ ., data = training(splits)) %>%
  update_role(date, new_role = "ID") %>%
  step_dummy(all_nominal(), one_hot = TRUE)
# mars ------------------------
set.seed(522)
wflw_fit_mars <- workflow() %>%
  add_model(
    mars(num_terms = 5, prod_degree = 2, mode = "regression") %>%
      set_engine("earth")
  ) %>%
  add_recipe(recipe_spec_mars) %>%
  fit(training(splits))
# lasso -----------------------
set.seed(522)
wflw_fit_lasso <- workflow() %>%
  add_model(
    linear_reg(penalty = 0.1, mixture = 1, mode = "regression") %>%
      set_engine("glmnet")
  ) %>%
  add_recipe(recipe_spec_mars) %>%
  fit(training(splits))

#### STACK ------------------------
submodels_stacks <- modeltime_table(
  wflw_fit_lasso,
  wflw_fit_mars
)
# fit resamples
cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
set.seed(234)
submodel_predictions <- submodels_stacks %>%
  modeltime_fit_resamples(
    resamples = resamples_tscv,
    control   = control_resamples(verbose = TRUE)
  )
stopCluster(cl)
# Metalearner XGBOOST
set.seed(123)
cores <- parallel::detectCores(logical = FALSE)
cl <- makePSOCKcluster(cores)
registerDoParallel(cl)
ensemble_fit_xgboost <- submodel_predictions %>%
  ensemble_model_spec(
    model_spec = boost_tree(
      trees          = tune(),
      tree_depth     = tune(),
      learn_rate     = tune(),
      loss_reduction = tune(),
      min_n          = tune(), 
      mtry           = tune()
    ) %>%
      set_engine("xgboost"),
    kfolds = 10,
    grid   = 30,
    param_info = tune::parameters(
      learn_rate(range = c(-0.5, -0.01))
    ),
    control = control_grid(verbose = TRUE,
                           allow_par = TRUE)
  )
stopCluster(cl)
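
For anyone who finds this later: my best guess (not confirmed in this thread) is that param_info has to describe every argument marked tune(), and that mtry additionally needs a data-dependent upper bound via finalize(); covering only learn_rate leaves the grid incomplete. A sketch of what that might look like:

# Sketch only (assumption, not a confirmed fix): build a parameter set
# covering all six tune() arguments, then finalize mtry on the predictors.
xgb_spec <- boost_tree(
  trees          = tune(),
  tree_depth     = tune(),
  learn_rate     = tune(),
  loss_reduction = tune(),
  min_n          = tune(),
  mtry           = tune()
) %>%
  set_engine("xgboost")

xgb_params <- hardhat::extract_parameter_set_dials(xgb_spec) %>%
  update(learn_rate = learn_rate(range = c(-0.5, -0.01))) %>%
  dials::finalize(training(splits) %>% dplyr::select(-value, -date))

Passing xgb_params as param_info (with xgb_spec as model_spec) should then give the grid complete ranges to draw from.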

lg1000 closed this as completed May 18, 2022