DefaultAlgorithm errors out during cross validation with encoder and feature selection interaction #2904

Closed
jeremyliweishih opened this issue Oct 12, 2021 · 0 comments · Fixed by #2944
Labels
bug Issues tracking problems with existing features.


@jeremyliweishih
Collaborator

Related to #2903

After running a naive pipeline with preprocessing and a random forest estimator, DefaultAlgorithm uses that random forest estimator to create a feature selection component (the Select Columns Transformer in the parameters below). However, because this component is placed after the One Hot Encoder, the columns it selects are post-encoding columns. If the encoder receives a different set of values during cross validation, it may not produce every selected column, and the feature selector errors out because it cannot find them. In the example below, the original column INTERNODE_21 becomes encoded columns such as INTERNODE_21_3.5. On a given cross-validation fold, however, some values are never passed to the encoder, so the corresponding columns are never created and feature selection fails.
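A minimal sketch of this failure mode, assuming pandas.get_dummies as a stand-in for evalml's One Hot Encoder. The toy values, the fold, and the helper names (one_hot, check_input_for_columns) are hypothetical; check_input_for_columns only mimics the check that raises in the traceback and is not the evalml source. The logged parameters also show 'top_n': 10, which may shrink the encoded column set on some folds even further; the sketch ignores that.

```python
import pandas as pd

# Hypothetical toy data: INTERNODE_21 mixes numbers and "?", so it is treated
# as a categorical column and one-hot encoded.
X = pd.DataFrame({
    "INTERNODE_21": ["0.3", "3.5", "0.3", "?", "3.8", "0.3"],
    "TL": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})


def one_hot(df: pd.DataFrame) -> pd.DataFrame:
    # Stand-in for the One Hot Encoder: one indicator column per value seen in df.
    return pd.get_dummies(df, columns=["INTERNODE_21"])


def check_input_for_columns(df: pd.DataFrame, columns: list) -> None:
    # Simplified re-creation of the check in column_selectors.py (hypothetical,
    # not the evalml source): every selected column must exist in the input.
    missing_cols = set(columns) - set(df.columns)
    if missing_cols:
        raise ValueError(f"Columns of type {missing_cols} not found in input data.")


# Feature selection works on the encoding of the full data, so it can pick
# post-encoding columns such as INTERNODE_21_3.5 (here every column is kept).
selected = list(one_hot(X).columns)

# A CV training split in which the values "3.5" and "3.8" never occur.
fold_train = X.iloc[[0, 2, 3, 5]]
missing = sorted(set(selected) - set(one_hot(fold_train).columns))

try:
    check_input_for_columns(one_hot(fold_train), selected)
except ValueError as err:
    print(err)  # same form of message as the Fold 0 exception in this issue
```

Here missing evaluates to ['INTERNODE_21_3.5', 'INTERNODE_21_3.8']: those columns exist in the full-data encoding but are never created for this fold.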

Fold 0: Exception during automl search: Columns of type {'INTERNODE_21_3.5', 'INTERNODE_9_6.1', 'INTERNODE_6_1.0', 'INTERNODE_23_2.6', 'INTERNODE_19_0.8', 'INTERNODE_4_0.5', 'INTERNODE_10_0.6', 'INTERNODE_19_1.3', 'INTERNODE_25_3.1', 'INTERNODE_26_2.2', 'INTERNODE_16_0.5', 'INTERNODE_14_4.6', 'INTERNODE_19_2.1', 'INTERNODE_16_4.3', 'INTERNODE_13_5.1', 'INTERNODE_25_?', 'INTERNODE_18_3.3', 'INTERNODE_11_4.8', 'INTERNODE_10_6.0', 'INTERNODE_16_0.6', 'INTERNODE_8_4.5', 'INTERNODE_17_1.9', 'INTERNODE_6_4.5', 'INTERNODE_13_6.0', 'INTERNODE_7_3.2', 'INTERNODE_7_8.6', 'INTERNODE_20_5.1', 'INTERNODE_20_5.5', 'INTERNODE_8_3.0', 'INTERNODE_9_5.7', 'INTERNODE_12_6.6', 'INTERNODE_18_0.6', 'INTERNODE_15_1.7', 'INTERNODE_16_0.4', 'INTERNODE_11_5.8', 'INTERNODE_3_2.2', 'INTERNODE_28_0.9', 'INTERNODE_11_6.2', 'INTERNODE_22_0.4', 'INTERNODE_8_3.3', 'INTERNODE_14_4.0', 'INTERNODE_18_3.4', 'INTERNODE_20_0.2', 'INTERNODE_11_7.1', 'INTERNODE_21_3.8', 'INTERNODE_22_2.0', 'INTERNODE_15_4.5', 'INTERNODE_11_2.3', 'INTERNODE_29_0.3', 'INTERNODE_12_5.1', 'INTERNODE_27_1.2', 'INTERNODE_2_1.5', 'INTERNODE_13_4.0', 'INTERNODE_14_3.4', 'INTERNODE_18_2.6', 'INTERNODE_5_2.2', 'INTERNODE_12_5.6', 'INTERNODE_13_5.5', 'INTERNODE_5_1.3', 'INTERNODE_19_1.2', 'INTERNODE_11_8.5', 'INTERNODE_5_4.2', 'INTERNODE_9_5.8', 'INTERNODE_17_3.4', 'INTERNODE_15_1.5', 'INTERNODE_13_6.1', 'INTERNODE_15_2.3', 'INTERNODE_17_3.2', 'INTERNODE_6_3.6', 'INTERNODE_10_6.3', 'INTERNODE_13_7.4', 'TREE_Q19', 'INTERNODE_17_1.3', 'INTERNODE_17_0.8', 'INTERNODE_5_2.8', 'INTERNODE_9_5.2', 'INTERNODE_14_2.4', 'INTERNODE_15_3.8', 'INTERNODE_4_3.1', 'INTERNODE_22_2.6', 'INTERNODE_10_7.2', 'INTERNODE_14_5.8', 'INTERNODE_4_1.0', 'INTERNODE_13_4.9', 'TREE_M25', 'INTERNODE_15_3.1', 'INTERNODE_15_3.7', 'INTERNODE_15_5.5', 'INTERNODE_18_1.3', 'INTERNODE_16_5.1', 'INTERNODE_12_7.5', 'INTERNODE_19_1.7', 'INTERNODE_6_4.2', 'INTERNODE_7_4.3', 'INTERNODE_24_2.7', 'INTERNODE_17_4.2', 'INTERNODE_20_3.1', 'INTERNODE_14_3.2', 'INTERNODE_19_5.6', 
'INTERNODE_18_0.2', 'INTERNODE_20_0.6', 'INTERNODE_12_4.6', 'INTERNODE_3_1.2', 'INTERNODE_12_6.8'} not found in input data.
			Fold 0: Parameters:
	{'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'categorical_fill_value': None, 'numeric_fill_value': None}, 'One Hot Encoder': {'top_n': 10, 'features_to_encode': None, 'categories': None, 'drop': 'if_binary', 'handle_unknown': 'ignore', 'handle_missing': 'error'}, 'Select Columns Transformer': {'columns': ['N', 'TL', 'IN', 'INTERNODE_1', 'TREE_D13', 'TREE_D18', 'TREE_D22', 'TREE_G27', 'TREE_J31', 'TREE_M25', 'TREE_Q17', 'TREE_Q19', 'TREE_Q23', 'TREE_Q25', 'BR_?', 'BR_A', 'BR_B', 'BR_C', 'BR_D', 'BR_E', 'BR_F', 'BR_G', 'BR_H', 'BR_I', 'INTERNODE_2_0.3', 'INTERNODE_2_0.4', 'INTERNODE_2_0.5', 'INTERNODE_2_0.6', 'INTERNODE_2_0.7', 'INTERNODE_2_0.8', 'INTERNODE_2_1.0', 'INTERNODE_2_1.1', 'INTERNODE_2_1.5', 'INTERNODE_2_?', 'INTERNODE_3_0.2', 'INTERNODE_3_0.3', 'INTERNODE_3_0.4', 'INTERNODE_3_0.5', 'INTERNODE_3_0.6', 'INTERNODE_3_0.7', 'INTERNODE_3_1.0', 'INTERNODE_3_1.2', 'INTERNODE_3_2.2', 'INTERNODE_3_?', 'INTERNODE_4_0.2', 'INTERNODE_4_0.3', 'INTERNODE_4_0.4', 'INTERNODE_4_0.5', 'INTERNODE_4_0.6', 'INTERNODE_4_0.8', 'INTERNODE_4_1.0', 'INTERNODE_4_1.5', 'INTERNODE_4_3.1', 'INTERNODE_4_?', 'INTERNODE_5_0.2', 'INTERNODE_5_0.3', 'INTERNODE_5_0.7', 'INTERNODE_5_1.1', 'INTERNODE_5_1.3', 'INTERNODE_5_2.2', 'INTERNODE_5_2.8', 'INTERNODE_5_4.2', 'INTERNODE_5_6.8', 'INTERNODE_5_?', 'INTERNODE_6_0.2', 'INTERNODE_6_0.3', 'INTERNODE_6_0.4', 'INTERNODE_6_1.0', 'INTERNODE_6_1.7', 'INTERNODE_6_2.8', 'INTERNODE_6_3.6', 'INTERNODE_6_4.2', 'INTERNODE_6_4.5', 'INTERNODE_6_?', 'INTERNODE_7_0.2', 'INTERNODE_7_0.3', 'INTERNODE_7_0.7', 'INTERNODE_7_0.8', 'INTERNODE_7_1.5', 'INTERNODE_7_3.2', 'INTERNODE_7_4.3', 'INTERNODE_7_6.6', 'INTERNODE_7_8.6', 'INTERNODE_7_?', 'INTERNODE_8_0.2', 'INTERNODE_8_0.3', 'INTERNODE_8_0.4', 'INTERNODE_8_1.5', 'INTERNODE_8_3.0', 'INTERNODE_8_3.1', 'INTERNODE_8_3.3', 'INTERNODE_8_4.0', 'INTERNODE_8_4.5', 'INTERNODE_8_?', 'INTERNODE_9_0.2', 'INTERNODE_9_0.3', 'INTERNODE_9_1.7', 'INTERNODE_9_5.2', 
'INTERNODE_9_5.3', 'INTERNODE_9_5.7', 'INTERNODE_9_5.8', 'INTERNODE_9_6.1', 'INTERNODE_9_6.3', 'INTERNODE_9_?', 'INTERNODE_10_0.2', 'INTERNODE_10_0.6', 'INTERNODE_10_1.1', 'INTERNODE_10_1.6', 'INTERNODE_10_3.3', 'INTERNODE_10_6.0', 'INTERNODE_10_6.3', 'INTERNODE_10_6.5', 'INTERNODE_10_7.2', 'INTERNODE_10_?', 'INTERNODE_11_0.2', 'INTERNODE_11_0.3', 'INTERNODE_11_0.5', 'INTERNODE_11_2.3', 'INTERNODE_11_4.8', 'INTERNODE_11_5.8', 'INTERNODE_11_6.2', 'INTERNODE_11_7.1', 'INTERNODE_11_8.5', 'INTERNODE_11_?', 'INTERNODE_12_0.4', 'INTERNODE_12_1.7', 'INTERNODE_12_4.6', 'INTERNODE_12_5.1', 'INTERNODE_12_5.2', 'INTERNODE_12_5.6', 'INTERNODE_12_6.6', 'INTERNODE_12_6.8', 'INTERNODE_12_7.5', 'INTERNODE_12_?', 'INTERNODE_13_0.2', 'INTERNODE_13_4.0', 'INTERNODE_13_4.9', 'INTERNODE_13_5.1', 'INTERNODE_13_5.3', 'INTERNODE_13_5.5', 'INTERNODE_13_6.0', 'INTERNODE_13_6.1', 'INTERNODE_13_7.4', 'INTERNODE_13_?', 'INTERNODE_14_2.4', 'INTERNODE_14_2.5', 'INTERNODE_14_3.2', 'INTERNODE_14_3.4', 'INTERNODE_14_4.0', 'INTERNODE_14_4.6', 'INTERNODE_14_5.8', 'INTERNODE_14_6.0', 'INTERNODE_14_6.3', 'INTERNODE_14_?', 'INTERNODE_15_0.2', 'INTERNODE_15_1.5', 'INTERNODE_15_1.7', 'INTERNODE_15_2.3', 'INTERNODE_15_3.1', 'INTERNODE_15_3.7', 'INTERNODE_15_3.8', 'INTERNODE_15_4.5', 'INTERNODE_15_5.5', 'INTERNODE_15_?', 'INTERNODE_16_0.2', 'INTERNODE_16_0.4', 'INTERNODE_16_0.5', 'INTERNODE_16_0.6', 'INTERNODE_16_2.5', 'INTERNODE_16_3.7', 'INTERNODE_16_4.3', 'INTERNODE_16_4.5', 'INTERNODE_16_5.1', 'INTERNODE_16_?', 'INTERNODE_17_0.3', 'INTERNODE_17_0.8', 'INTERNODE_17_1.3', 'INTERNODE_17_1.9', 'INTERNODE_17_2.4', 'INTERNODE_17_3.2', 'INTERNODE_17_3.4', 'INTERNODE_17_4.2', 'INTERNODE_17_6.2', 'INTERNODE_17_?', 'INTERNODE_18_0.2', 'INTERNODE_18_0.3', 'INTERNODE_18_0.4', 'INTERNODE_18_0.6', 'INTERNODE_18_0.8', 'INTERNODE_18_1.3', 'INTERNODE_18_2.6', 'INTERNODE_18_3.3', 'INTERNODE_18_3.4', 'INTERNODE_18_?', 'INTERNODE_19_0.2', 'INTERNODE_19_0.8', 'INTERNODE_19_1.2', 'INTERNODE_19_1.3', 'INTERNODE_19_1.6', 
'INTERNODE_19_1.7', 'INTERNODE_19_2.1', 'INTERNODE_19_4.6', 'INTERNODE_19_5.6', 'INTERNODE_19_?', 'INTERNODE_20_0.2', 'INTERNODE_20_0.3', 'INTERNODE_20_0.6', 'INTERNODE_20_0.8', 'INTERNODE_20_2.0', 'INTERNODE_20_3.1', 'INTERNODE_20_3.3', 'INTERNODE_20_5.1', 'INTERNODE_20_5.5', 'INTERNODE_20_?', 'INTERNODE_21_0.3', 'INTERNODE_21_0.5', 'INTERNODE_21_1.2', 'INTERNODE_21_1.6', 'INTERNODE_21_2.1', 'INTERNODE_21_3.5', 'INTERNODE_21_3.8', 'INTERNODE_21_?', 'INTERNODE_22_0.2', 'INTERNODE_22_0.4', 'INTERNODE_22_0.6', 'INTERNODE_22_0.8', 'INTERNODE_22_2.0', 'INTERNODE_22_2.6', 'INTERNODE_22_?', 'INTERNODE_23_0.6', 'INTERNODE_23_2.6', 'INTERNODE_23_?', 'INTERNODE_24_0.7', 'INTERNODE_24_2.7', 'INTERNODE_24_?', 'INTERNODE_25_0.3', 'INTERNODE_25_3.1', 'INTERNODE_25_?', 'INTERNODE_26_2.2', 'INTERNODE_27_1.2', 'INTERNODE_28_0.9', 'INTERNODE_29_0.3']}, 'Decision Tree Classifier': {'criterion': 'gini', 'max_features': 'auto', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0}}
			Fold 0: Traceback:
  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/automl/engine/engine_base.py", line 207, in train_and_score_pipeline
    cv_pipeline = train_pipeline(

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/automl/engine/engine_base.py", line 136, in train_pipeline
    cv_pipeline.fit(X, y)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/utils/base_meta.py", line 19, in _set_fit
    return_value = method(self, X, y)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/pipelines/classification_pipeline.py", line 53, in fit
    self._fit(X, y)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/pipelines/pipeline_base.py", line 272, in _fit
    self.component_graph.fit(X, y)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/pipelines/component_graph.py", line 189, in fit
    self._compute_features(self.compute_order, X, y, fit=True)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/pipelines/component_graph.py", line 365, in _compute_features
    output = component_instance.fit_transform(x_inputs, y_input)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/utils/base_meta.py", line 19, in _set_fit
    return_value = method(self, X, y)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/pipelines/components/transformers/transformer.py", line 65, in fit_transform
    return self.fit(X_ww, y_ww).transform(X_ww, y_ww)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/utils/base_meta.py", line 19, in _set_fit
    return_value = method(self, X, y)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/pipelines/components/transformers/column_selectors.py", line 51, in fit
    self._check_input_for_columns(X)

  File "/Users/jeremy.shih/.pyenv/versions/3.8.8/envs/evalml38/lib/python3.8/site-packages/evalml/pipelines/components/transformers/column_selectors.py", line 34, in _check_input_for_columns
    raise ValueError(f"Columns of type {missing_cols} not found in input data.")
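One possible direction, sketched as an assumption and not as a description of the change made in #2944: record which original column each encoded column came from, map the feature selector's choices back to those pre-encoding columns, and select them before encoding. The helpers encode_with_provenance and selected_source_columns are hypothetical, and pandas.get_dummies again stands in for the One Hot Encoder.

```python
import pandas as pd


def encode_with_provenance(df: pd.DataFrame, categorical: list):
    """One-hot encode `categorical` and record the source column of each
    encoded column (hypothetical helper, not an evalml API)."""
    provenance = {}
    for col in categorical:
        # get_dummies names columns f"{col}_{value}", matching the full encoding below.
        for name in pd.get_dummies(df[col], prefix=col).columns:
            provenance[name] = col
    return pd.get_dummies(df, columns=categorical), provenance


def selected_source_columns(selected_encoded: list, provenance: dict) -> list:
    # Map each selected post-encoding column back to its original column;
    # columns that were never encoded (e.g. numeric ones) map to themselves.
    sources = []
    for name in selected_encoded:
        src = provenance.get(name, name)
        if src not in sources:
            sources.append(src)
    return sources


# Hypothetical toy data, as in the reproduction of the failure.
X = pd.DataFrame({
    "INTERNODE_21": ["0.3", "3.5", "0.3", "?", "3.8", "0.3"],
    "TL": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
encoded, provenance = encode_with_provenance(X, ["INTERNODE_21"])

# Suppose feature selection on the full data kept these encoded columns.
selected = ["TL", "INTERNODE_21_3.5"]
sources = selected_source_columns(selected, provenance)

# Selecting pre-encoding columns succeeds on any fold, whatever values it holds.
fold_train = X.iloc[[0, 2, 3, 5]]
fold_encoded = pd.get_dummies(fold_train[sources], columns=["INTERNODE_21"])
print(sources)
print(list(fold_encoded.columns))
```

The trade-off is granularity: selection now happens per original column, so keeping INTERNODE_21 keeps every encoded value of it on each fold, not just INTERNODE_21_3.5.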