feat: text classification lightning pipeline #119
Conversation
metrics = MetricCollection(
    [
        Accuracy(num_classes=self.hparams.num_labels),
        Precision(num_classes=self.hparams.num_labels, average="macro"),
Do we prefer `macro` or `weighted`? I mostly used a `weighted` average due to possible imbalances in the data. wdyt? @ktagowski @Albert097 @riomus @djaniak
These metrics are mainly for the optimization step of the model (schedulers, early stopping, etc.), so I think they should correspond with the loss: if the loss is non-weighted I would use `macro`, and in the case of a weighted loss I would use `weighted`. After training is finished, we use our Evaluators classes instead of the metrics defined here.
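To make the weighted-loss case concrete: the PR does not define how class weights would be computed, but a common choice is inverse-frequency weights, analogous to what one would pass as the `weight` argument of a weighted cross-entropy loss. A minimal sketch (the function name and normalization are illustrative assumptions, not part of this PR):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-class weights inversely proportional to class frequency.

    Normalized so the weights sum to the number of classes; a rare class
    gets a weight > 1, a frequent class a weight < 1. Illustrative only --
    not the scheme used in this repository.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    total = len(labels)
    raw = {c: total / count for c, count in counts.items()}
    norm = sum(raw.values())
    return {c: w * n_classes / norm for c, w in raw.items()}

# Imbalanced 2-class dataset: 90 samples of class 0, 10 of class 1.
labels = [0] * 90 + [1] * 10
print(inverse_frequency_weights(labels))  # {0: 0.2, 1: 1.8}
```

With weights like these driving the loss, a `weighted` metric average tracks the optimization objective more closely than a `macro` one.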
I usually prefer the macro average, since it's safer to assume the data may be imbalanced; the metric will then tell us when the predictor doesn't work on a particular class. Although if we were to use these metrics in a sequence tagging task, I'd probably prefer the weighted average, since in that case we will likely have more classes that are not fully covered.
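The difference the thread is debating can be shown in a few lines. A macro average treats every class equally, while a weighted average scales each class by its support, so a minority class that the model fails on can be hidden by the majority class. A minimal sketch (pure Python, numbers chosen only for illustration):

```python
def macro_average(scores):
    """Unweighted mean over per-class scores: every class counts equally."""
    return sum(scores) / len(scores)

def weighted_average(scores, supports):
    """Mean over per-class scores weighted by class support (sample count)."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total

# Imbalanced 2-class case: the majority class scores 1.0, the minority 0.5.
per_class_precision = [1.0, 0.5]
support = [90, 10]

print(macro_average(per_class_precision))              # 0.75
print(weighted_average(per_class_precision, support))  # 0.95
```

The macro score (0.75) exposes the weak minority class; the weighted score (0.95) largely reflects the majority class, which is why macro is the more conservative choice under imbalance.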
#     task_model_kwargs={"pool_strategy": "cls", "learning_rate": 5e-4}
# )

pipeline = TextClassificationPipeline(
This example should also be used as a test, with a limited number of epochs and dataset size.
I wrote a test, but there is an error caused by the pytorch-lightning and transformers versions; it's fixed in transformers > 4.10.x, so this PR should fix it: #116
I can add the tests in that PR, or in this one if the other PR gets merged faster.
Ok, so I would add the tests in a separate PR, without pinning the versions, until the versions on main are updated.
@djaniak please paste the PR reference when it is ready.
* feat: adapt and extend pipelines functionalities
* chore: change arguments from positional to keyword in pipelines
Force-pushed from 7d10624 to cbac091
Co-authored-by: Albert <34009816+asawczyn@users.noreply.github.com>