CoNLLGenerator Bugfixes and Improvements both on Scala and Python #13051

jfernandrezj
Contributor

@jfernandrezj jfernandrezj commented Nov 8, 2022

Description

Add a test for non-integer metadata values in CoNLLGenerator
Fix the bug triggered by non-integer metadata values in CoNLLGenerator
Enable escaping when writing the CSV so that special-character tokens are preserved
Remove an unnecessary option from the CSV write
Add a metadata sentence key parameter to select which metadata field identifies the sentence during CoNLL generation
Minor formatting in Scala; Python refactoring (several bug fixes to support the overloaded Scala methods); Python tests for the 2- and 3-argument variants
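
To illustrate two of the points above, here is a minimal, library-independent sketch in plain Python. It is not the Spark NLP implementation: `group_by_sentence`, `write_rows`, `read_rows`, and the `sentence_key` argument name are hypothetical, and the standard-library `csv` module stands in for Spark's CSV writer. It assumes the fix amounts to treating the sentence metadata value as an opaque key instead of parsing it as an integer, and it shows that quoting on write lets special-character tokens survive a round trip.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape for illustration: (token text, annotation metadata).
Token = Tuple[str, Dict[str, str]]


def group_by_sentence(tokens: Iterable[Token],
                      sentence_key: str = "sentence") -> List[List[str]]:
    """Group token texts by metadata[sentence_key], in first-seen order.

    The metadata value is used as an opaque string and never passed to
    int(), so identifiers such as "s-1" do not raise.
    """
    groups: Dict[str, List[str]] = {}
    for text, metadata in tokens:
        groups.setdefault(metadata.get(sentence_key, ""), []).append(text)
    return list(groups.values())


def write_rows(rows: List[List[str]]) -> str:
    """Write space-delimited rows, quoting fields that need it."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=" ", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def read_rows(text: str) -> List[List[str]]:
    """Read back rows produced by write_rows."""
    return list(csv.reader(io.StringIO(text), delimiter=" "))
```

Passing a different `sentence_key` selects another metadata field for grouping, and `read_rows(write_rows(rows))` returns the original rows even when a token is a double quote or contains a space.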

Motivation and Context

Issue 13004

How Has This Been Tested?

Tested existing projects with the 2-argument variant (the only one supported until now)
Added new tests covering this functionality

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • Code improvements with no or little impact
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING page.
  • I have added tests to cover my changes.
  • The Scala tests reported as failing were already failing before this PR and are not caused by it.
  • All new and existing tests passed.

@maziyarpanahi maziyarpanahi self-assigned this Nov 8, 2022
@maziyarpanahi maziyarpanahi changed the base branch from release/423-release-candidate to SPARKNLP-645-fix-a-bug-in-co-nll-generator-annotator November 8, 2022 15:55
@maziyarpanahi maziyarpanahi merged commit a456407 into JohnSnowLabs:SPARKNLP-645-fix-a-bug-in-co-nll-generator-annotator Nov 9, 2022
maziyarpanahi added a commit that referenced this pull request Nov 10, 2022
* SPARKNLP-645 Update unit tests for CoNLLGenerator

* CoNLLGenerator Bugfixes and Improvements both on Scala and Python (#13051)

* Test for non-int metadata values in CoNLLGenerator

* Fix for non-int metadata values bug in CoNLLGenerator

* Include escaping when writing csv in order to preserve special char tokens

* Remove unnecessary option from csv write

* Adding metadata sentence key parameter in order to select which metadata field to use as sentence for CoNLL Generation

* Minor formatting in scala, Python refactorization (several bug fixes supporting scala overloaded methods), Python Tests for 2 and 3 arguments

Co-authored-by: Andres Fernandez <51669244+jfernandrezj@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

exportConllFiles from CoNLLGenerator failes when the token has non-int metadata