Bug/augmentation output differs from input file #734

Merged

ArshaanNazir
Collaborator

@ArshaanNazir ArshaanNazir commented Aug 30, 2023

Description

This PR fixes the augmentation output format and also fixes swap_entities, which now handles sentences that contain only I- labels with no preceding B- tag.

We fixed an issue in the data augmentation process where the format of augmented files differed from that of the input files, leading to inconsistencies that were negatively impacting model training and evaluation.

Expected Output

-DOCSTART- -X- -X- O

CRICKET NNP B-NP O
- : O O
LEICESTERSHIRE NNP B-NP B-ORG
TAKE NNP I-NP O
OVER IN B-PP O
AT NNP B-NP O
TOP NNP I-NP O
AFTER NNP I-NP O
INNINGS NNP I-NP O
VICTORY NN I-NP O
. . O O

Actual Output

-DOCSTART- -X- -X- O

CRICKET -X- -X- O
- -X- -X- O
LEICESTERSHIRE -X- -X- B-ORG
TAKE -X- -X- O
OVER -X- -X- O
AT -X- -X- O
TOP -X- -X- O
AFTER -X- -X- O
INNINGS -X- -X- O
VICTORY -X- -X- O
. -X- -X- O
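
The difference between the two listings above is that the POS and chunk columns are replaced by the `-X-` placeholder in the augmented file. As a rough illustration only (not the code from this PR), a check along these lines could flag such rows. The helper names `token_rows` and `rows_with_lost_tags` are hypothetical, and the sketch assumes four whitespace-separated columns per token row.

```python
from typing import List

PLACEHOLDER = "-X-"  # placeholder seen in the "Actual Output" listing


def token_rows(conll_text: str) -> List[List[str]]:
    """Split CoNLL text into per-token field lists, skipping blank and -DOCSTART- lines."""
    return [
        line.split()
        for line in conll_text.splitlines()
        if line.strip() and not line.startswith("-DOCSTART-")
    ]


def rows_with_lost_tags(original: str, augmented: str) -> List[int]:
    """Return indices of token rows whose POS or chunk tag was replaced by the
    placeholder in the augmented text, or whose column count is not four."""
    flagged = []
    for idx, (src, aug) in enumerate(zip(token_rows(original), token_rows(augmented))):
        if len(src) != 4 or len(aug) != 4:
            flagged.append(idx)  # malformed row: wrong number of columns
            continue
        pos_lost = aug[1] == PLACEHOLDER and src[1] != PLACEHOLDER
        chunk_lost = aug[2] == PLACEHOLDER and src[2] != PLACEHOLDER
        if pos_lost or chunk_lost:
            flagged.append(idx)
    return flagged


original = "CRICKET NNP B-NP O\n- : O O\nLEICESTERSHIRE NNP B-NP B-ORG\n"
augmented = "CRICKET -X- -X- O\n- -X- -X- O\nLEICESTERSHIRE NNP B-NP B-ORG\n"
print(rows_with_lost_tags(original, augmented))  # [0, 1]
```

Rows are compared by position rather than by token text, since augmentation may change the tokens themselves.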

Swap-entities issue:
image
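
For the swap_entities case mentioned in the description (sentences whose entities carry I- labels with no B- tag), one possible way to group labels into entity spans is sketched below. This is an assumption about the general approach, not the implementation merged in this PR, and `extract_entity_spans` is a hypothetical helper name.

```python
from typing import List, Tuple


def extract_entity_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Group IOB labels into (start, end_exclusive, entity_type) spans.

    An I- tag that does not continue an open span of the same type is
    treated as the start of a new span, so sentences that contain only
    I- tags still yield entities.
    """
    spans = []
    start = None
    current_type = None
    for i, label in enumerate(labels):
        prefix, _, ent_type = label.partition("-")
        if prefix == "I" and start is not None and ent_type == current_type:
            continue  # still inside the open span
        if start is not None:
            # Any other label closes the open span.
            spans.append((start, i, current_type))
            start, current_type = None, None
        if prefix in ("B", "I") and ent_type:
            # B- always opens a span; an orphan I- is tolerated and opens one too.
            start, current_type = i, ent_type
    if start is not None:
        spans.append((start, len(labels), current_type))
    return spans


labels = ["O", "I-ORG", "I-ORG", "O", "B-PER", "I-PER", "I-LOC"]
print(extract_entity_spans(labels))
# [(1, 3, 'ORG'), (4, 6, 'PER'), (6, 7, 'LOC')]
```

With spans recovered this way, an entity-swapping step can replace a whole span at once instead of skipping entities that lack a B- tag.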

@ArshaanNazir ArshaanNazir linked an issue Aug 30, 2023 that may be closed by this pull request
@ArshaanNazir ArshaanNazir added the 🐛 Bug (Something isn't working) and 💡Enhancements (Something can be improved) labels Aug 30, 2023
Collaborator

@chakravarthik27 chakravarthik27 left a comment

LGTM

@ArshaanNazir ArshaanNazir merged commit b86680a into release/1.4.0 Aug 31, 2023
3 checks passed
@ArshaanNazir ArshaanNazir deleted the bug/augmentation-output-differs-from-input-file branch September 6, 2023 04:55
Development

Successfully merging this pull request may close these issues.

Augmentation Output Differs from Input File
2 participants