Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add primitives from nlp_primitives that do not require additional external libraries #2328

Merged
merged 5 commits into from Oct 20, 2022

Conversation

thehomebrewnerd
Copy link
Contributor

Add primitives from nlp_primitives that do not require additional external libraries

Closes #2287

Moves the following primitives from nlp_primitives to Featuretools:

  • CountString
  • MeanCharactersPerWord
  • MedianWordLength
  • NumUniqueSeparators
  • NumberOfCommonWords
  • NumberOfHashtags
  • NumberOfMentions
  • NumberOfUniqueWords
  • NumberOfWordsInQuotes
  • PunctuationCount
  • TitleWordCount
  • TotalWordLength
  • UpperCaseCount
  • WhitespaceCount

@codecov
Copy link

codecov bot commented Oct 19, 2022

Codecov Report

Merging #2328 (23828ae) into main (c98d2ff) will increase coverage by 0.00%.
The diff coverage is 99.43%.

@@           Coverage Diff            @@
##             main    #2328    +/-   ##
========================================
  Coverage   99.38%   99.38%            
========================================
  Files         147      177    +30     
  Lines       17895    18773   +878     
========================================
+ Hits        17785    18658   +873     
- Misses        110      115     +5     
Impacted Files Coverage Δ
featuretools/tests/primitive_tests/utils.py 93.82% <93.50%> (-6.18%) ⬇️
featuretools/primitives/standard/api.py 100.00% <100.00%> (ø)
...itives/standard/natural_language_primitives/api.py 100.00% <100.00%> (ø)
.../standard/natural_language_primitives/constants.py 100.00% <100.00%> (ø)
...andard/natural_language_primitives/count_string.py 100.00% <100.00%> (ø)
...al_language_primitives/mean_characters_per_word.py 100.00% <100.00%> (ø)
.../natural_language_primitives/median_word_length.py 100.00% <100.00%> (ø)
...tural_language_primitives/num_unique_separators.py 100.00% <100.00%> (ø)
...ural_language_primitives/number_of_common_words.py 100.00% <100.00%> (ø)
.../natural_language_primitives/number_of_hashtags.py 100.00% <100.00%> (ø)
... and 22 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@gsheni
Copy link
Contributor

gsheni commented Oct 19, 2022

Should we put all these primitives in 1 file?

@thehomebrewnerd
Copy link
Contributor Author

thehomebrewnerd commented Oct 20, 2022

Should we put all these primitives in 1 file?

@gsheni I fine with combining if you want to be consistent with how we have other primitives and tests organized. I just did it this way to start since that was the easiest way to move things over. What is your preference?

EDIT: Combining into a single file goes against the proposal in Issue #2179

@gsheni
Copy link
Contributor

gsheni commented Oct 20, 2022

Gotcha

@thehomebrewnerd thehomebrewnerd merged commit bb43b32 into main Oct 20, 2022
@thehomebrewnerd thehomebrewnerd deleted the OSSFT-289-add-nlp-primitives branch October 20, 2022 16:25
@thehomebrewnerd thehomebrewnerd restored the OSSFT-289-add-nlp-primitives branch October 20, 2022 16:25
@thehomebrewnerd thehomebrewnerd deleted the OSSFT-289-add-nlp-primitives branch October 20, 2022 16:25
Comment on lines +1 to +43
# flake8: noqa
from featuretools.primitives.standard.natural_language_primitives.count_string import (
CountString,
)
from featuretools.primitives.standard.natural_language_primitives.mean_characters_per_word import (
MeanCharactersPerWord,
)
from featuretools.primitives.standard.natural_language_primitives.median_word_length import (
MedianWordLength,
)
from featuretools.primitives.standard.natural_language_primitives.num_unique_separators import (
NumUniqueSeparators,
)
from featuretools.primitives.standard.natural_language_primitives.number_of_common_words import (
NumberOfCommonWords,
)
from featuretools.primitives.standard.natural_language_primitives.number_of_hashtags import (
NumberOfHashtags,
)
from featuretools.primitives.standard.natural_language_primitives.number_of_mentions import (
NumberOfMentions,
)
from featuretools.primitives.standard.natural_language_primitives.number_of_unique_words import (
NumberOfUniqueWords,
)
from featuretools.primitives.standard.natural_language_primitives.number_of_words_in_quotes import (
NumberOfWordsInQuotes,
)
from featuretools.primitives.standard.natural_language_primitives.punctuation_count import (
PunctuationCount,
)
from featuretools.primitives.standard.natural_language_primitives.title_word_count import (
TitleWordCount,
)
from featuretools.primitives.standard.natural_language_primitives.total_word_length import (
TotalWordLength,
)
from featuretools.primitives.standard.natural_language_primitives.upper_case_count import (
UpperCaseCount,
)
from featuretools.primitives.standard.natural_language_primitives.whitespace_count import (
WhitespaceCount,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor things since we are going into the restructure directory route, to reduce the lines here we can just use relative imports, it's a bit cleaner. I would also move all of this code to __init__.py since we're trying to remove api.py as it's redundant to how init.py works.

ex: https://github.com/alteryx/featuretools/pull/2187/files#diff-4ffbd94fdc40f8ee9f37abe2ead42e08aca5156d0b13c958a07573691929c9ce

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll clean this up in the PR for issue #2179

@@ -5,5 +5,6 @@
from featuretools.primitives.standard.datetime_transform_primitives import *
from featuretools.primitives.standard.exponential_transform_primitives import *
from featuretools.primitives.standard.latlong_transform_primitives import *
from featuretools.primitives.standard.natural_language_primitives.api import *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the comment below you would just be able to do:
from featuretools.primitives.standard.natural_language_primitives import *

@sbadithe sbadithe mentioned this pull request Oct 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move all Natural Language primitives that don't require an external library into core Featuretools
3 participants