Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AutoMM usage of text in TabularPredictor #2628

Merged
merged 2 commits into from
Jan 4, 2023
Merged

Conversation

Innixma
Copy link
Contributor

@Innixma Innixma commented Jan 3, 2023

Issue #, if available:

Description of changes:

  • Fixed case where AutoMM is specified as a model in TabularPredictor, but the global preprocessor does not keep raw text features in output, leading AutoMM to not use text features.
  • Now this situation is fixed for all future models so long as they specify handles_text=True in _class_tags.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added this to the 0.6.2 Release milestone Jan 3, 2023
@github-actions
Copy link

github-actions bot commented Jan 4, 2023

Job PR-2628-384e3f4 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2628/384e3f4/index.html

@@ -1606,8 +1606,14 @@ def _ag_params(self) -> set:
def _features(self):
return self._features_internal

def _get_tags(self):
collected_tags = {}
def _get_tags(self) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussion point: maybe call these capabilities vs tags? Tags seems too generic and not pointing to the main purpose of this labeling: what functionality is supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the API used by sklearn for this logic, so I'd prefer to be consistent with them.


@classmethod
def _class_tags(cls):
return {'handles_text': False}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move these constants to a separate constant class where they can be fully documented + reduces likelihood of typos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scikit-learn just uses strings: https://github.com/scikit-learn/scikit-learn/blob/ac665ad80be586a90c7d353ed37f1e64a4f547a1/sklearn/neighbors/_classification.py#L712-L713

I think we can also just use strings unless it becomes a problem.

is_advanced_hyperparameter_type = False
for key in hyperparameters:
if isinstance(key, int) or key == 'default':
is_advanced_hyperparameter_type = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add break to skip the rest of the cycle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

Comment on lines +3499 to +3506
if is_advanced_hyperparameter_type:
for key in hyperparameters:
for m in hyperparameters[key]:
models_in_hyperparameters.add(m)
else:
for key in hyperparameters:
models_in_hyperparameters.add(key)
models_in_hyperparameters_raw_text_compatible = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for key, m in hyperparameters.items():
    models_in_hyperparameters.add(m if is_advanced_hyperparameter_type else key)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x = {1: {'GBM': {}}}
a = set()
is_advanced_hyperparameter_type = True
for key, m in x.items():
    a.add(m if is_advanced_hyperparameter_type else key)
Traceback (most recent call last):
  File "/Users/neerick/workspace/virtual/autogluon38/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-8-2d35d45650af>", line 2, in <module>
    a.add(m if is_advanced_hyperparameter_type else key)
TypeError: unhashable type: 'dict

The proposed code is not the same and will error.

@Innixma Innixma merged commit c59d751 into master Jan 4, 2023
@github-actions
Copy link

github-actions bot commented Jan 4, 2023

Job PR-2628-58f7dac is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2628/58f7dac/index.html

@Innixma Innixma deleted the fix_text_raw_automm branch January 18, 2023 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants