Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Refactor][Bug fix] Fix fit_summary + move problem_type outside predictor.py #2578

Merged
merged 24 commits into from
Dec 21, 2022

Conversation

sxjscience
Copy link
Collaborator

@sxjscience sxjscience commented Dec 16, 2022

Issue #, if available:

Description of changes:

  • Fix fit_summary in MultiModalPredictor. Previously, calling predictor.fit_summary() won't work after you have loaded the predictor from folder.
  • Move problem_type registry outside of predictor.py. In addition, register problem type to ease future refactoring.
  • Revise tutorial and fix typo in multimodal NER tutorial filename.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


name: str # Name of the problem
support_fit: bool = True # Whether the problem type support `.fit()`
inference_ready: bool = False # Support `.predict()` and `.evaluate()` without calling `.fit()`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think inference_ready is better to be a state of predictor.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indicates whether the problem supports zero-shot inference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using name support_zero_shot? I think we may need inference_ready in predictor in later refactoring.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. support_zero_shot is reasonable.

@github-actions
Copy link

Job PR-2578-27d2029 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2578/27d2029/index.html


Usually you do not need to preprocess the text / image data. AutoGluon Multimodal has built-in
support of text / image preprocessing. However, this won't block you from appending custom preprocessing logic before
feeding in the dataframe to AutoGluon Multimodal.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this won't block you from appending custom preprocessing logic before feeding in the dataframe to AutoGluon Multimodal. Just want to confirm, if user already preprocess the image (e.g., minus 255, divided by customized mean, or data augmentation), will AutoGluon do another round of preprocessing and data augmentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will do another round of normalization. As in

processor.append(self.normalization)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a flag to disable normalization in a later PR. I think this is a good point.

Copy link
Contributor

@bryanyzhu bryanyzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor, looks good to me with a question.

Comment on lines 5 to 6
from typing import OrderedDict as t_OrderedDict
from typing import Union
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two lines are not used?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

t_OrderedDict is used for type hint. Will remove Union.

@@ -1152,7 +1119,7 @@ def _fit(
)

config = update_config_by_rules(
problem_type=self._problem_type,
problem_type=self.problem_type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see all self._problem_type are replaced by self.problem_type. Mind to tell the reasons?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to access it via the property call. Both self._problem_type and self.problem_type should be okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Currently, self.problem_type has the same value as self._problem_type. But since the property is more user-faced, there may be some cases they are not exactly the same. I think using self._problem_type internally may be a better choice to avoid possible changes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can also change all self.problem_type to self._problem_type so that it will be consistent.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thanks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One advantage of self.problem_type over self._problem_type is that you can never use self.problem_type = XXX which accidentally overwrite the value.

@github-actions
Copy link

Job PR-2578-88c338f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2578/88c338f/index.html

support_zero_shot=True,
is_matching=True,
supported_modality_type={TEXT},
supported_label_type={CATEGORICAL, NUMERICAL},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For matching problems, labels can be absent. Will it support this case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those case, we can treat the label type as BINARY. But we need to submit a follow-up PR to refactor the code-base to better use these flags.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of no label is different from binary label. We'd better distinguish between them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there are multiple concepts: modality type, column type and problem type. This is the first PR to split the problem type into a separate file. Need to submit follow-up PRs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matching itself also needs major refactor so I haven’t touched it extensively

IMAGE_SIMILARITY,
IMAGE_TEXT_SIMILARITY,
MULTICLASS,
NAMED_ENTITY_RECOGNITION,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not recall I used this constant: "NAMED_ENTITY_RECOGNITION". Shall we use NER only?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both NER and NAMED_ENTITY_RECOGNITION will be mapped to NER.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can view NAMED_ENTITY_RECOGNITION as the alias of NER problem type.

Copy link
Contributor

@zhiqiangdon zhiqiangdon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@bryanyzhu bryanyzhu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link

Job PR-2578-69f61a0 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2578/69f61a0/index.html

@sxjscience sxjscience merged commit e306b3d into autogluon:master Dec 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants