fix/sentences containing white spaces for ConllDataset #681

Merged
JulesBelveze merged 2 commits into release/1.2.0 from error-in-load_data-function-when-handling-conll-data
Jul 31, 2023

Conversation

Contributor

@Prikshit7766 Prikshit7766 commented Jul 29, 2023

Description

The bug is caused by the sentence-splitting logic in load_data and load_raw_data for ConllDataset:

sentences = doc.strip().split("\n\n")

This cannot handle the case where sentences are separated by a line that contains only whitespace:

\n   \n

In that case the sentences are not split correctly.

Change the split to:

sentences = re.split(r"\n\n|\n\s+\n", doc.strip())
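
A minimal sketch of the difference between the two splits. The CoNLL sample below is made up for illustration and is not taken from the linked issue; only the two split expressions come from this PR.

```python
import re

# Hypothetical CoNLL sample: the two sentences are separated by a line
# that contains only spaces instead of a truly empty line.
doc = (
    "John NNP B-NP B-PER\n"
    "lives VBZ B-VP O\n"
    "   \n"
    "Paris NNP B-NP B-LOC\n"
    "shines VBZ B-VP O\n"
)

# Old behaviour: only an exact "\n\n" counts as a sentence boundary.
old_sentences = doc.strip().split("\n\n")

# New behaviour: a whitespace-only line is also treated as a boundary.
new_sentences = re.split(r"\n\n|\n\s+\n", doc.strip())

print(len(old_sentences))  # 1 (both sentences stay merged)
print(len(new_sentences))  # 2
```

With the old split the whitespace-only line is kept inside a single merged "sentence"; with the regex split it acts as a separator.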

Fixes #680

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Usage

Checklist:

  • I've added Google style docstrings to my code.
  • I've used pydantic for typing when/where necessary.
  • I have linted my code
  • I have added tests to cover my changes.

Screenshots (if appropriate):

image

@Prikshit7766 Prikshit7766 added the 🐛 Bug Something isn't working label Jul 29, 2023
@Prikshit7766 Prikshit7766 linked an issue Jul 29, 2023 that may be closed by this pull request
@Prikshit7766 Prikshit7766 self-assigned this Jul 29, 2023
Contributor

@JulesBelveze JulesBelveze left a comment

LGTM

@JulesBelveze JulesBelveze merged commit 4f8f911 into release/1.2.0 Jul 31, 2023
@ArshaanNazir ArshaanNazir deleted the error-in-load_data-function-when-handling-conll-data branch August 7, 2023 05:59
Labels

🐛 Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error in load_data function when handling CoNLL data

2 participants