Refactor get_training_data for simplicity #115

Merged (2 commits) on Dec 1, 2021

MarkDavidson commented Dec 1, 2021

This change simplifies how training data is retrieved from the database so that future improvements are easier. I ran it locally and the application behaved as expected. For manual testing, I trained a model and uploaded reports.

What changed:

  • get_training_data() now returns an (X, y) tuple instead of objects. The tuple has better ergonomics for the scikit-learn pipeline.
  • test() and train() both call get_training_data() directly.
  • Removed dead code (y_counts) in SKLearnModel.test.
  • Deleted SKLearnModel._load_and_vectorize_data. It has been replaced by the improved get_training_data().
  • Replaced _preprocess_text() with lemmatize().
    • Input changed from a list of sentences to a single sentence.
    • Output changed from a list of lemmatized sentences to a single lemmatized sentence.
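
To illustrate the new shapes described above, here is a minimal sketch, not the code from this PR. It assumes the database rows can be modeled as hypothetical (sentence, label) pairs, and it assumes get_training_data() applies lemmatize() to each sentence, which the description does not state. The lemmatize() body is a placeholder that only lowercases and collapses whitespace so the example runs without an NLP library; the sentences and labels are made up for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for database rows: (sentence text, technique label).
Row = Tuple[str, str]


def lemmatize(sentence: str) -> str:
    """One sentence in, one sentence out.

    Placeholder only: this lowercases and collapses whitespace.
    It is NOT real lemmatization.
    """
    return " ".join(sentence.lower().split())


def get_training_data(rows: Iterable[Row]) -> Tuple[List[str], List[str]]:
    """Return parallel lists X (normalized sentences) and y (labels)."""
    X: List[str] = []
    y: List[str] = []
    for text, label in rows:
        X.append(lemmatize(text))  # assumption: normalization happens here
        y.append(label)
    return X, y


# Illustrative rows, not taken from the project's data.
rows = [
    ("The malware   Creates a scheduled task.", "T1053"),
    ("Credentials were dumped from LSASS memory.", "T1003"),
]
X, y = get_training_data(rows)
```

Because scikit-learn estimators and pipelines take features and labels as separate arguments to fit(X, y), a caller can unpack the tuple and pass it straight through without an intermediate conversion step.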

Limitations:

  • Project test coverage dropped by 0.15% because the lines I deleted were covered by tests.
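
The arithmetic behind that drop can be checked with a short sketch. It uses the hit and line counts from the Codecov report for this PR (777 of 824 lines before, 756 of 803 after). Truncating to two decimals is an assumption about how the percentages are displayed; it happens to reproduce the reported 94.29% and 94.14%.

```python
import math


def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10000) / 100


# Counts taken from the Codecov report on this PR.
before = coverage_percent(777, 824)
after = coverage_percent(756, 803)

# Untruncated drop in percentage points.
raw_drop = (777 / 824 - 756 / 803) * 100

# Removing 21 covered lines lowers both hits and total lines by 21,
# while the 47 missed lines stay put, so the ratio goes down.
misses_before = 824 - 777
misses_after = 803 - 756
```

The untruncated drop is a little under 0.15 percentage points, which is why it can appear as either 0.14% or 0.15% depending on whether the figure is truncated or rounded.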

codecov bot commented Dec 1, 2021

Codecov Report

Merging #115 (01f0520) into master (cf70164) will decrease coverage by 0.14%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #115      +/-   ##
==========================================
- Coverage   94.29%   94.14%   -0.15%     
==========================================
  Files          17       17              
  Lines         824      803      -21     
==========================================
- Hits          777      756      -21     
  Misses         47       47              

Impacted Files              Coverage Δ
src/tram/tram/ml/base.py    91.83% <100.00%> (-0.82%) ⬇️
src/tram/tram/models.py     100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cf70164...01f0520.

MarkDavidson changed the title from "WIP: Refactor get_training_data for simplicity" to "Refactor get_training_data for simplicity" on Dec 1, 2021

sonarcloud bot commented Dec 1, 2021

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

markeaimark merged commit b583bf4 into master on Dec 1, 2021
markeaimark deleted the simplify_sklearn_model branch on December 1, 2021 at 14:39