ThutmoseTaggerModel, a new model for inverse text normalization #4011
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
This pull request introduces 8 alerts when merging 42730ba into 9005f23 - view on LGTM.com new alerts:
…icense headers Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
examples/nlp/text_normalization_as_tagging/conf/thutmose_tagger_itn_config.yaml
examples/nlp/text_normalization_as_tagging/conf/thutmose_tagger_itn_config.yaml
examples/nlp/text_normalization_as_tagging/conf/thutmose_tagger_itn_config.yaml
examples/nlp/text_normalization_as_tagging/utils/eval_per_class.py
examples/nlp/text_normalization_as_tagging/utils/prepare_corpora_after_alignment.py
examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py
examples/nlp/text_normalization_as_tagging/normalization_as_tagging_infer.py
examples/nlp/text_normalization_as_tagging/utils/extract_giza_alignments.py
'--giza_suffix', type=str, required=True, help='suffix of alignment files, e.g. \"Ahmm.5\", \"A3.final\"'
)
parser.add_argument('--out_filename', type=str, required=True, help='Output file')
parser.add_argument('--lang', type=str, required=True, help="Language")
en and ru only?
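If the script really does support only those two languages, one way to make that explicit is to restrict the argument with `choices`. This is a minimal sketch, not the code from the PR: the `SUPPORTED_LANGS` tuple and the `build_parser` helper are hypothetical names, and the assumption that only `en` and `ru` are valid comes from the question above, not from the source.

```python
import argparse

# Assumption: only "en" and "ru" are supported. This tuple is hypothetical.
SUPPORTED_LANGS = ("en", "ru")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that rejects unsupported --lang values up front."""
    parser = argparse.ArgumentParser(description="Illustrative argument parser sketch")
    parser.add_argument(
        "--giza_suffix", type=str, required=True, help='suffix of alignment files, e.g. "Ahmm.5", "A3.final"'
    )
    parser.add_argument("--out_filename", type=str, required=True, help="Output file")
    parser.add_argument(
        "--lang",
        type=str,
        required=True,
        choices=SUPPORTED_LANGS,  # argparse exits with a usage error for any other value
        help="Language, one of: " + ", ".join(SUPPORTED_LANGS),
    )
    return parser
```

With `choices` set, an unsupported value fails at argument parsing time with a usage message, and the help text documents the allowed values.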
examples/nlp/text_normalization_as_tagging/utils/get_label_vocab.py
nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py
nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py
nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py
nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py
nemo/collections/nlp/data/text_normalization_as_tagging/thutmose_tagger_dataset.py
This pull request introduces 1 alert when merging 1d48227 into 0d052c8 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging b03d10e into 70d9687 - view on LGTM.com new alerts:
examples/nlp/text_normalization_as_tagging/prepare_dataset_en.sh
examples/nlp/text_normalization_as_tagging/prepare_dataset_en.sh
examples/nlp/text_normalization_as_tagging/prepare_dataset_en.sh
examples/nlp/text_normalization_as_tagging/utils/corpus_errors.ru
examples/nlp/text_normalization_as_tagging/utils/extract_giza_alignments.py
examples/nlp/text_normalization_as_tagging/utils/filter_sentences_with_errors.py
nemo/collections/nlp/data/text_normalization_as_tagging/utils.py
src_hiddens = self.bert_model(input_ids=input_ids, token_type_ids=segment_ids, attention_mask=input_mask)
log_softmax = self.logits(hidden_states=src_hiddens)
log_softmax_semiotic = self.semiotic_logits(hidden_states=src_hiddens)
why is this called log_softmax if log_softmax=False in the self.logits init?
ok, I renamed it
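For readers following the naming point: raw logits and log-softmax outputs are different quantities, even though they select the same tag under argmax. The sketch below is a standalone illustration in plain Python, not the model's code; the `log_softmax` helper and the example scores are assumptions made for the demonstration.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one vector of raw scores."""
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return [x - log_sum_exp for x in logits]


# Hypothetical raw, unnormalized scores for three tags.
tag_logits = [2.0, 0.5, -1.0]
tag_log_probs = log_softmax(tag_logits)

# Log-probabilities exponentiate to a distribution that sums to 1; raw logits do not.
prob_sum = sum(math.exp(x) for x in tag_log_probs)

# The argmax is unchanged, so predicted tags are the same either way.
best_from_logits = max(range(len(tag_logits)), key=tag_logits.__getitem__)
best_from_log_probs = max(range(len(tag_log_probs)), key=tag_log_probs.__getitem__)
```

Since the values differ whenever they are fed to a loss or compared across positions, naming a tensor of raw scores `log_softmax` can mislead later readers, which is why the rename helps.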
span_predictions.append(cid)
else:
span_predictions.append(self.tag_classification_report.num_classes - 1)  # this stands for WRONG
assert len(span_labels) == len(span_predictions)
raise errors instead of assert
done
else:
# this stands for WRONG
multiword_span_predictions.append(self.tag_classification_report.num_classes - 1)
assert len(multiword_span_labels) == len(multiword_span_predictions)
raise errors instead of assert
done
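Both length checks quoted above rely on `assert`, which Python removes entirely when run with `-O`, so the check can silently disappear. A possible shape for the requested change is sketched below. The `check_same_length` helper and its message wording are hypothetical and may differ from what was actually committed.

```python
from typing import Sequence


def check_same_length(labels: Sequence[int], predictions: Sequence[int], name: str = "span") -> None:
    """Raise ValueError if labels and predictions differ in length.

    Unlike `assert`, this check still runs under `python -O`.
    """
    if len(labels) != len(predictions):
        raise ValueError(
            f"Length mismatch for {name}: {len(labels)} labels vs {len(predictions)} predictions"
        )
```

It could be called once per check, for example `check_same_length(span_labels, span_predictions)` and `check_same_length(multiword_span_labels, multiword_span_predictions, name="multiword span")`.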
This pull request introduces 1 alert when merging fc8c6ce into 58ff608 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging 883d9e8 into 58ff608 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging 7073117 into 58ff608 - view on LGTM.com new alerts:
examples/nlp/text_normalization_as_tagging/dataset_preparation/prepare_corpora_for_alignment.py
examples/nlp/text_normalization_as_tagging/dataset_preparation/prepare_corpora_for_alignment.py
nemo/collections/nlp/models/text_normalization_as_tagging/thutmose_tagger.py
Signed-off-by: Alexandra Antonova aleksandraa@nvidia.com
What does this PR do?
A new tagger-based model for inverse text normalization
Collection: [NLP]