refactor token classification #35
Merged
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #35 +/- ##
==========================================
- Coverage 95.62% 95.20% -0.43%
==========================================
Files 40 41 +1
Lines 3316 3417 +101
==========================================
+ Hits 3171 3253 +82
- Misses 145 164 +19
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #35 +/- ##
==========================================
+ Coverage 95.62% 95.79% +0.16%
==========================================
Files 40 45 +5
Lines 3316 3544 +228
==========================================
+ Hits 3171 3395 +224
- Misses 145 149 +4
…not have "train" as key)
… can not have "train" as key)
…enClassificationModelWithSeq2SeqEncoderAndCrf; use taskmodule_config in model tests
…also for TokenClassificationModelWithSeq2SeqEncoderAndCrf
…allAndF1ForLabeledAnnotations
… is recommended to use...")
…ssificationTaskModule and remove deprecated parameters
…add reset parameter
…skModule to DefaultModel
This PR introduces several changes to the token-classification-based labeled span extraction setups. In particular, metric setup now happens in the task module, and span-based metrics are logged during training.
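To illustrate the division of labour described above (the task module sets up the metric, the model only updates and logs it), here is a minimal, framework-free sketch. It does not use pytorch-ie or torchmetrics: ToyTaskModule, ToyModel, SpanF1 and the stage argument are hypothetical names chosen for illustration, and only the method name configure_model_metric() is taken from this PR.

```python
from typing import Dict, List, Set, Tuple

# A labeled span as (start, end, label). Hypothetical representation.
Span = Tuple[int, int, str]


class SpanF1:
    """Toy micro-averaged span F1 with an update / compute / reset cycle."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        self.tp = 0
        self.fp = 0
        self.fn = 0

    def update(self, predicted: Set[Span], gold: Set[Span]) -> None:
        self.tp += len(predicted & gold)
        self.fp += len(predicted - gold)
        self.fn += len(gold - predicted)

    def compute(self) -> float:
        denominator = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denominator if denominator else 0.0


class ToyTaskModule:
    """Knows the label set, so it is the natural place to build the metric."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = list(labels)

    def configure_model_metric(self, stage: str) -> SpanF1:
        # Assumption: one independent metric instance per stage.
        return SpanF1()


class ToyModel:
    """Asks the task module for its metrics instead of building them itself."""

    def __init__(self, taskmodule: ToyTaskModule) -> None:
        self.metrics: Dict[str, SpanF1] = {
            stage: taskmodule.configure_model_metric(stage)
            for stage in ("train", "val", "test")
        }

    def step(self, stage: str, predicted: Set[Span], gold: Set[Span]) -> None:
        self.metrics[stage].update(predicted, gold)

    def on_epoch_end(self, stage: str) -> float:
        # Compute the value that would be logged, then start a fresh epoch.
        value = self.metrics[stage].compute()
        self.metrics[stage].reset()
        return value
```

Keeping metric construction in the task module means the model does not need to know how label indices map back to spans.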
Task Modules and Metrics
- WrappedMetricWithPrepareFunction: newly added torchmetrics.Metric that pre-processes the predictions and targets with a prepare_function
  - prepare_function can unbatch the predictions / targets, in which case the wrapped metric is updated multiple times
- PrecisionRecallAndF1ForLabeledAnnotations: make it a real torchmetrics.Metric
- LabeledSpanExtractionByTokenClassificationTaskModule: formerly TokenClassificationTaskModule
  - rename label_token_pad_id to label_pad_id (default as before: -100), i.e. this is breaking
  - unbatch_output() expects a LongTensor representing the predicted label indices (and label_pad_id on pad positions)
  - special_tokens_mask in input encodings
  - configure_model_metric() that creates the metrics
  - log_precision_recall_metrics (default: True): set to False to disable logging of precision and recall metrics, which is useful to avoid being overwhelmed by too many logged metrics

Models

- WithMetricsFromTaskModule: new mixin
- Model: new abstract model
  - WithMetricsFromTaskModule
- SimpleTokenClassificationModel: newly added
  - pytorch_ie.models.TransformerTokenClassificationModel, simple wrapper around transformers.AutoModelForTokenClassification

For both models, SimpleTokenClassificationModel and TokenClassificationModelWithSeq2SeqEncoderAndCrf:

- rename label_pad_token_id to label_pad_id, i.e. this is breaking
- metric setup via the task module (taskmodule.configure_model_metric() in WithMetricsFromTaskModule)
- Model

Requires

- post_prepare() in TaskModule._from_config() pytorch-ie#399, i.e. pytorch-ie >= 0.29.7
- decode() to PyTorchIEModel pytorch-ie#402, i.e. pytorch-ie >= 0.29.8

Follow-up

- pie-modules>=0.10.2 and use its token classification taskmodule argumentation-structure-identification#193
- simple_generative model #36
- WithMetricsFromTaskModule and for DefaultModel

TODOs:

- step()
- taskmodule to LabeledSpanExtraction(TaskModule|Model)
- LabeledSpanExtractionByTokenClassificationTaskModule
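
The prepare_function behaviour listed under Task Modules and Metrics (pre-process predictions and targets, optionally unbatch them so the wrapped metric is updated once per example) can be sketched without torchmetrics as follows. This is not the implementation from this PR: WrappedMetricSketch, CountingAccuracy, strip_padding and the prepare_does_unbatch flag are hypothetical names, and plain Python lists stand in for tensors. Only the pad value -100 (the default of label_pad_id) comes from the description above.

```python
from typing import Any, Callable, List

LABEL_PAD_ID = -100  # default label_pad_id named in this PR


class CountingAccuracy:
    """Toy metric: fraction of (prediction, target) pairs that match exactly."""

    def __init__(self) -> None:
        self.correct = 0
        self.total = 0

    def update(self, prediction: Any, target: Any) -> None:
        self.correct += int(prediction == target)
        self.total += 1

    def compute(self) -> float:
        return self.correct / self.total if self.total else 0.0


class WrappedMetricSketch:
    """Applies prepare_function to both inputs before updating the metric."""

    def __init__(
        self,
        metric: CountingAccuracy,
        prepare_function: Callable[[Any], Any],
        prepare_does_unbatch: bool = False,
    ) -> None:
        self.metric = metric
        self.prepare_function = prepare_function
        self.prepare_does_unbatch = prepare_does_unbatch

    def update(self, predictions: Any, targets: Any) -> None:
        prepared_predictions = self.prepare_function(predictions)
        prepared_targets = self.prepare_function(targets)
        if self.prepare_does_unbatch:
            if len(prepared_predictions) != len(prepared_targets):
                raise ValueError("unbatched predictions and targets differ in length")
            # One update of the wrapped metric per example in the batch.
            for prediction, target in zip(prepared_predictions, prepared_targets):
                self.metric.update(prediction, target)
        else:
            # A single update with the whole prepared batch.
            self.metric.update(prepared_predictions, prepared_targets)

    def compute(self) -> float:
        return self.metric.compute()


def strip_padding(batch: List[List[int]]) -> List[List[int]]:
    """Unbatch label ids: one list per example, pad positions removed."""
    return [[i for i in sequence if i != LABEL_PAD_ID] for sequence in batch]
```

With prepare_does_unbatch=True, a batch of two sequences leads to two updates of the wrapped metric; with the flag left at False, the same batch leads to a single update.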