Add better support for csv files #368

Merged: gabegma merged 4 commits into main from ggm/better-support-csv on Jan 13, 2023

@gabegma gabegma commented Jan 9, 2023

Description:

  • Refactor the syntax tagging module to remove the tokenizer that was used for counting tokens. Sentence length is now counted in words instead of sub-words, since that is what spaCy provides and it is also more intuitive. I slightly lowered the default value for long_sentence to account for the change.
  • Add a loading function for CSV files.

Checklist:

You should check all boxes before the PR is ready. If a box does not apply, check it to acknowledge it.

  • ISSUE NUMBER. You linked the issue number (Ex: Resolve #XXX).
  • PRE-COMMIT. You ran pre-commit on all commits, or else, you
    ran pre-commit run --all-files at the end.
  • USER CHANGES. The changes are added to CHANGELOG.md and the documentation, if they impact
    our users.
  • DEV CHANGES.
    • Update the documentation if this PR changes how to develop/launch on the app.
    • Update the README files and our wiki for any big design decisions, if relevant.
    • Add unit tests, docstrings, typing and comments for complex sections.

gabegma commented Jan 12, 2023

@lindsaydbrin @JosephMarinier - I did a big refactoring of the syntax tagging module. You can look at commit 9061507 on its own to check that it makes sense.

@lindsaydbrin lindsaydbrin left a comment

Looks great! Big fan of the syntax tagging refactoring. 😁 Thanks!

Co-authored-by: Lindsay Brin <lindsay.brin@servicenow.com>
@JosephMarinier JosephMarinier left a comment

Beautiful! 👍

@gabegma gabegma merged commit a4c96d0 into main Jan 13, 2023
@gabegma gabegma deleted the ggm/better-support-csv branch January 13, 2023 16:09