feat: text classification lightning pipeline #119
Conversation
metrics = MetricCollection(
    [
        Accuracy(num_classes=self.hparams.num_labels),
        Precision(num_classes=self.hparams.num_labels, average="macro"),
Do we prefer `macro` or `weighted`? I mostly used a `weighted` average due to possible imbalances in the data. wdyt? @ktagowski @Albert097 @riomus @djaniak
These metrics are mainly for the optimization step of the model (schedulers, early stopping, etc.), so I think they should correspond with the loss: if the loss is non-weighted I would use `macro`, and in the case of a weighted loss I would use `weighted`. After training is finished, we use our Evaluators classes instead of the metrics defined here.
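To make the weighted-loss case concrete: the PR does not define how class weights would be computed, but a common choice is inverse-frequency weights, analogous to what one would pass as the `weight` argument of a weighted cross-entropy loss. A minimal sketch (the function name and normalization are illustrative assumptions, not part of this PR):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Normalized so the weights sum to the number of classes; a rare class
    gets a weight > 1, a frequent class a weight < 1. Illustrative only --
    not the scheme used in this repository.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    raw = {c: total / count for c, count in counts.items()}
    norm = sum(raw.values())
    return {c: w * n_classes / norm for c, w in raw.items()}

# Imbalanced 2-class dataset: 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
print(inverse_frequency_weights(labels))  # {0: 0.2, 1: 1.8}
```

With weights like these driving the loss, a `weighted` metric average tracks the optimization objective more closely than a `macro` one.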
I usually prefer the macro average, since it's safer to assume the data may be imbalanced; the metric will then tell us when the predictor doesn't work on a particular class. Although if we were to use these metrics in a sequence tagging task, I'd probably prefer the weighted average, since in that case we will likely have more classes that are not fully covered.
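The difference the thread is debating can be shown in a few lines. A macro average treats every class equally, while a weighted average scales each class by its support, so a minority class that the model fails on can be hidden by the majority class. A minimal sketch (pure Python, numbers chosen only for illustration):

```python
def macro_average(scores):
    """Unweighted mean over per-class scores: every class counts equally."""
    return sum(scores) / len(scores)

def weighted_average(scores, supports):
    """Mean over per-class scores weighted by class support (sample count)."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total

# Imbalanced 2-class case: the majority class scores 1.0, the minority 0.5.
per_class_precision = [1.0, 0.5]
support = [90, 10]

print(macro_average(per_class_precision))              # 0.75
print(weighted_average(per_class_precision, support))  # 0.95
```

The macro score (0.75) exposes the weak minority class; the weighted score (0.95) largely reflects the majority class, which is why macro is the more conservative choice under imbalance.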
#     task_model_kwargs={"pool_strategy": "cls", "learning_rate": 5e-4}
# )

pipeline = TextClassificationPipeline(
This example should also be used as a test, with a limited number of epochs and dataset size.
I wrote a test, but there is an error caused by the pytorch-lightning and transformers versions; it's fixed in transformers > 4.10.x, so this PR should fix it: #116
I can add the tests in that PR, or in this one if the other PR gets merged faster.
Ok, so I would add the tests in a separate PR, without pinning the versions, until the versions on main are updated.
@djaniak please paste the PR reference when it is ready.
* feat: adapt and extend pipelines functionalities
* chore: change arguments from positional to keyword in pipelines
Force-pushed from 7d10624 to cbac091
Co-authored-by: Albert <34009816+asawczyn@users.noreply.github.com>