Issue281 handle invalid lines in training data tsv file #299

Merged

juhoinkinen (Member) commented Jul 17, 2019

1. Handle lines that are missing a tab in the training data .tsv file by emitting a warning for each such line and continuing:
$ annif train tfidf-en trainingdata_with_invalid_lines.tsv 
creating vectorizer
warning: Skipping invalid line (missing tab): ""
warning: Skipping invalid line (missing tab): "A line without tabs"
Backend tfidf: creating similarity index
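
The skip-and-warn behaviour shown above could be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: the function name `parse_tsv_lines`, the logger name, and the handling of angle brackets around URIs are all assumptions made for the example.

```python
import logging
from typing import Iterable, Iterator, List, Tuple

# Hypothetical logger name, used only for this sketch
logger = logging.getLogger("tsv_sketch")


def parse_tsv_lines(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (text, subject_uris) pairs, skipping lines that contain no tab."""
    for line in lines:
        if "\t" not in line:
            # Warn and move on instead of raising, so one bad line
            # does not abort the whole training run
            logger.warning('Skipping invalid line (missing tab): "%s"',
                           line.rstrip())
            continue
        text, uris = line.split("\t", maxsplit=1)
        # Assumption: URIs are whitespace-separated and may be wrapped in <>
        subjects = [uri.strip("<>") for uri in uris.split()]
        yield text, subjects
```

Empty lines and lines without a tab are both reported and skipped, which matches the two warnings in the console output above; valid lines pass through unchanged.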

2. Hide full traceback for missing file:

Edit: missing file case is handled by issue #318.

This closes #281.

codecov bot commented Jul 17, 2019

Codecov Report

Merging #299 into master will increase coverage by 0.07%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #299      +/-   ##
==========================================
+ Coverage   99.38%   99.45%   +0.07%     
==========================================
  Files          55       56       +1     
  Lines        2920     2959      +39     
==========================================
+ Hits         2902     2943      +41     
+ Misses         18       16       -2

Impacted Files | Coverage Δ
annif/corpus/document.py | 100% <100%> (ø) ⬆️
tests/test_corpus.py | 100% <100%> (ø) ⬆️
tests/test_exception.py | 100% <0%> (ø)
annif/exception.py | 97.14% <0%> (+9.64%) ⬆️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b660eed...8786802.

@juhoinkinen juhoinkinen marked this pull request as ready for review August 8, 2019 10:24
@juhoinkinen juhoinkinen self-assigned this Aug 12, 2019

lgtm-com bot commented Aug 26, 2019

This pull request introduces 1 alert when merging ec3cc2c into b660eed - view on LGTM.com

new alerts:

  • 1 for Unused import

@juhoinkinen juhoinkinen merged commit 129974c into master Sep 2, 2019
@juhoinkinen juhoinkinen deleted the issue281-handle-invalid-lines-in-training-data-TSV-file branch September 2, 2019 10:26

lgtm-com bot commented Sep 2, 2019

This pull request fixes 1 alert when merging 8786802 into 2fb6108 - view on LGTM.com

fixed alerts:

  • 1 for Unused import

@juhoinkinen juhoinkinen added this to the 0.42 milestone Sep 2, 2019
Successfully merging this pull request may close these issues.

Validate TSV training input
2 participants