Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: text classification lightning pipeline #119

Merged
merged 43 commits into from
Dec 16, 2021

Conversation

djaniak
Copy link
Collaborator

@djaniak djaniak commented Dec 6, 2021

No description provided.

@netlify
Copy link

netlify bot commented Dec 6, 2021

✔️ Deploy Preview for embeddingsclarinpl canceled.

🔨 Explore the source changes: b802ff8

🔍 Inspect the deploy log: https://app.netlify.com/sites/embeddingsclarinpl/deploys/61bb40ba87e74e00085f60a0

@djaniak djaniak temporarily deployed to Test deployment December 6, 2021 14:09 Inactive
@djaniak djaniak temporarily deployed to Test deployment December 7, 2021 12:32 Inactive
@djaniak djaniak temporarily deployed to Test deployment December 7, 2021 13:51 Inactive
metrics = MetricCollection(
[
Accuracy(num_classes=self.hparams.num_labels),
Precision(num_classes=self.hparams.num_labels, average="macro"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we prefer macro or weighted? I used mostly weighted average due to possible imbalances in data. wdyt? @ktagowski @Albert097 @riomus @djaniak

Copy link
Collaborator

@ktagowski ktagowski Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is mainly for purpose of optimization step of the model (schedulers, early stopping etc.) I thing that it should correspond with the loss. If loss is non-weighted I would use macro, in case of weighted loss I would use weighted. After the training is finished instead of these metrics defined here we use our Evaluators classes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually prefer macro-average since it's better to assume that there may be imbalance in the data and then the metric will tell us that the predictor doesn't work on the particular class. Although if we were to use these metrics in a sequence tagging task I'd probably prefer the weighted average since we will probably have more not fully covered classes in that case.

examples/evaluate_lightning_text_classification.py Outdated Show resolved Hide resolved
embeddings/pipeline/lightning_pipeline.py Show resolved Hide resolved
embeddings/pipeline/lightning_pipeline.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
examples/evaluate_lightning_text_classification.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/task/lightning_task/text_classification.py Outdated Show resolved Hide resolved
# task_model_kwargs={"pool_strategy": "cls", "learning_rate": 5e-4}
# )

pipeline = TextClassificationPipeline(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example should be also used in form of test. With limited number of epochs and dataset size

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote test but there is an error with pytorch-lightning and transformers version; it's fixed in transformers > 4.10.x so this PR should fix it #116
I can append the tests in the PR or in this one if other PR will be merged faster

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I would add tests in the separate PR. Without fixing versions until the versions on the main will be updated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djaniak paste PR reference when it will be ready

@djaniak djaniak temporarily deployed to Test deployment December 9, 2021 10:48 Inactive
@djaniak djaniak temporarily deployed to Test deployment December 9, 2021 13:08 Inactive
@djaniak djaniak temporarily deployed to Test deployment December 9, 2021 15:34 Inactive
@djaniak djaniak temporarily deployed to Test deployment December 9, 2021 16:27 Inactive
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/task/lightning_task/text_classification.py Outdated Show resolved Hide resolved
embeddings/task/lightning_task/lightning_task.py Outdated Show resolved Hide resolved
@djaniak djaniak temporarily deployed to Test deployment December 10, 2021 11:46 Inactive
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
@djaniak djaniak temporarily deployed to Test deployment December 10, 2021 14:04 Inactive
@djaniak djaniak temporarily deployed to Test deployment December 10, 2021 16:17 Inactive
Base automatically changed from feature/hyperparameters-search to main December 11, 2021 19:45
@djaniak djaniak force-pushed the feature/lightning-text-classification-pipeline branch from 7d10624 to cbac091 Compare December 14, 2021 12:16
@djaniak djaniak temporarily deployed to Test deployment December 14, 2021 12:47 Inactive
@djaniak djaniak temporarily deployed to Test deployment December 15, 2021 17:46 Inactive
ktagowski
ktagowski previously approved these changes Dec 15, 2021
embeddings/data/datamodule.py Outdated Show resolved Hide resolved
embeddings/task/lightning_task/lightning_task.py Outdated Show resolved Hide resolved
Co-authored-by: Albert <34009816+asawczyn@users.noreply.github.com>
@djaniak djaniak temporarily deployed to Test deployment December 16, 2021 10:48 Inactive
embeddings/pipeline/lightning_classification.py Outdated Show resolved Hide resolved
embeddings/pipeline/lightning_pipeline.py Outdated Show resolved Hide resolved
embeddings/task/lightning_task/lightning_task.py Outdated Show resolved Hide resolved
@djaniak djaniak temporarily deployed to Test deployment December 16, 2021 11:58 Inactive
@djaniak djaniak temporarily deployed to Test deployment December 16, 2021 14:01 Inactive
@ktagowski ktagowski merged commit 5f352ff into main Dec 16, 2021
@ktagowski ktagowski deleted the feature/lightning-text-classification-pipeline branch December 16, 2021 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants