Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ThutmoseTaggerModel, a new model for inverse text normalization #4011

Merged
merged 21 commits into from
Apr 29, 2022

Conversation

bene-ges
Copy link
Contributor

@bene-ges bene-ges commented Apr 15, 2022

Signed-off-by: Alexandra Antonova aleksandraa@nvidia.com

What does this PR do ?

A new tagger-based model for inverse text normalization

Collection: [NLP]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • [ X] New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
@lgtm-com
Copy link

lgtm-com bot commented Apr 15, 2022

This pull request introduces 8 alerts when merging 42730ba into 9005f23 - view on LGTM.com

new alerts:

  • 7 for Module-level cyclic import
  • 1 for Module is imported with 'import' and 'import from'

@ekmb ekmb self-requested a review April 15, 2022 21:18
Alexandra Antonova added 4 commits April 17, 2022 18:14
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
…icense headers

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
'--giza_suffix', type=str, required=True, help='suffix of alignment files, e.g. \"Ahmm.5\", \"A3.final\"'
)
parser.add_argument('--out_filename', type=str, required=True, help='Output file')
parser.add_argument('--lang', type=str, required=True, help="Language")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

en and ru only?

Alexandra Antonova added 3 commits April 22, 2022 20:17
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
@bene-ges bene-ges marked this pull request as ready for review April 26, 2022 19:39
Alexandra Antonova added 3 commits April 28, 2022 00:35
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
@lgtm-com
Copy link

lgtm-com bot commented Apr 27, 2022

This pull request introduces 1 alert when merging 1d48227 into 0d052c8 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
@lgtm-com
Copy link

lgtm-com bot commented Apr 28, 2022

This pull request introduces 1 alert when merging b03d10e into 70d9687 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>

src_hiddens = self.bert_model(input_ids=input_ids, token_type_ids=segment_ids, attention_mask=input_mask)
log_softmax = self.logits(hidden_states=src_hiddens)
log_softmax_semiotic = self.semiotic_logits(hidden_states=src_hiddens)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this called log_softmax if log_softmax=False in the self.logits init?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I renamed it

span_predictions.append(cid)
else:
span_predictions.append(self.tag_classification_report.num_classes - 1) # this stands for WRONG
assert len(span_labels) == len(span_predictions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise errors instead of assert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

else:
# this stands for WRONG
multiword_span_predictions.append(self.tag_classification_report.num_classes - 1)
assert len(multiword_span_labels) == len(multiword_span_predictions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise errors instead of assert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
@lgtm-com
Copy link

lgtm-com bot commented Apr 29, 2022

This pull request introduces 1 alert when merging fc8c6ce into 58ff608 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
@bene-ges bene-ges requested a review from ekmb April 29, 2022 17:20
@lgtm-com
Copy link

lgtm-com bot commented Apr 29, 2022

This pull request introduces 1 alert when merging 883d9e8 into 58ff608 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
@lgtm-com
Copy link

lgtm-com bot commented Apr 29, 2022

This pull request introduces 1 alert when merging 7073117 into 58ff608 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
ekmb
ekmb previously approved these changes Apr 29, 2022
Alexandra Antonova added 2 commits April 30, 2022 00:40
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
Signed-off-by: Alexandra Antonova <aleksandraa@nvidia.com>
@ekmb ekmb merged commit c232ef3 into NVIDIA:main Apr 29, 2022
@bene-ges bene-ges deleted the thutmose_tagger branch May 3, 2022 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants