Sourcery refactored master branch #3

Closed
wants to merge 5 commits into from

@sourcery-ai sourcery-ai bot commented Mar 14, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^

Help us improve this pull request!

sourcery-ai bot requested a review from adbar on March 14, 2022, 12:13

py3langid/langid.py (four resolved review threads)

- urlExtraCrapBeforeEnd = regex_or(punctChars, entity) + "+?"
+ urlExtraCrapBeforeEnd = f'{regex_or(punctChars, entity)}+?'

Lines 58-195 refactored with the following changes:

This removes the following comments (why?):

# iOS 'emoji' characters (some smileys, some symbols) [\ue001-\uebbb]
# Standard version  :) :( :] :D :P
# myleott: o.O and O.o are two of the biggest sources of differences
# reversed version (: D:  use positive lookbehind to remove "(word):"
# TODO should try a big precompiled lexicon from Wikipedia, Dan Ramage told me (BTO) he does this
#          between this and the Java version. One little hack won't hurt...
#inspired by http://en.wikipedia.org/wiki/User:Scapler/emoticons#East_Asian_style
# because eyes on the right side is more ambiguous with the standard usage of : ;
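
The one-line change quoted above swaps string concatenation for an f-string. A minimal sketch of why the two forms build the same pattern, assuming a `regex_or` helper that joins its arguments into a non-capturing group (the helper body and the sample `punctChars` and `entity` values below are illustrative assumptions, not the project's definitions):

```python
import re


def regex_or(*items):
    # Hypothetical stand-in: join alternatives into one non-capturing group.
    return "(?:" + "|".join(items) + ")"


# Illustrative values only; the real patterns live in the refactored file.
punctChars = r"[.?!,:;]"
entity = r"&(?:amp|lt|gt|quot);"

# Form before the refactoring: concatenation.
old_style = regex_or(punctChars, entity) + "+?"
# Form after the refactoring: f-string interpolation.
new_style = f"{regex_or(punctChars, entity)}+?"

# Both expressions produce the identical pattern string.
assert old_style == new_style
pattern = re.compile(new_style)
```

Since the resulting strings are equal, the change is purely stylistic. The removed comments listed above are a separate matter, as they are dropped rather than rewritten.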

py3langid/train/NBtrain.py (outdated, resolved)
Comment on lines -138 to +142
-       f = lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
-       yield f
+       yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
    else:
      if initializer is not None:
        initializer(*initargs)
-     f = imap
-     yield f
+
+     yield imap

Function MapPool refactored with the following changes:
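
The diff above inlines a variable that was yielded immediately after assignment. A rough, self-contained sketch of the same pattern, assuming a context manager that falls back to a serial map when only one process is requested (`map_pool` and `square` are hypothetical names, and the built-in `map` stands in for the `imap` seen in the diff):

```python
from contextlib import closing, contextmanager
import multiprocessing as mp


@contextmanager
def map_pool(processes=1, initializer=None, initargs=(), chunksize=1):
    """Yield a map-like callable, parallel only when processes > 1."""
    if processes > 1:
        with closing(mp.Pool(processes, initializer, initargs)) as pool:
            # Yield the callable directly instead of binding it to a name first.
            yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
    else:
        # Serial path: run the initializer once in this process.
        if initializer is not None:
            initializer(*initargs)
        yield map


def square(x):
    return x * x


# Serial usage; results are sorted because the parallel path is unordered.
with map_pool(processes=1) as apply:
    results = sorted(apply(square, range(5)))
```

Yielding the expression directly behaves the same as assigning it to `f` first. Note that the parallel path needs a picklable, module-level function, and on platforms that spawn worker processes it should be called under an `if __name__ == "__main__":` guard.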

-     for docname in filenames:
-         candidates.append(os.path.join(dirpath, docname))
+     candidates.extend(os.path.join(dirpath, docname) for docname in filenames)

Function CorpusIndexer.__init__ refactored with the following changes:
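
The diff above replaces a loop of `append` calls with a single `extend` over a generator expression. A small sketch of the equivalence, using hypothetical helper functions and sample file names rather than the project's code:

```python
import os


def collect_with_loop(dirpath, filenames):
    # Form before the refactoring: one append per file name.
    candidates = []
    for docname in filenames:
        candidates.append(os.path.join(dirpath, docname))
    return candidates


def collect_with_extend(dirpath, filenames):
    # Form after the refactoring: extend consumes the generator in order.
    candidates = []
    candidates.extend(os.path.join(dirpath, docname) for docname in filenames)
    return candidates


sample = ["a.txt", "b.txt", "c.txt"]
same = collect_with_loop("corpus", sample) == collect_with_extend("corpus", sample)
```

Both versions produce the same list in the same order, so the change only affects readability.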

Comment on lines -183 to +184
-     reject_langs = {
-         l
-         for l in lang_domain_count if lang_domain_count[l] < min_domain
-     }
-
-     # Remove the languages from the indexer
-     if reject_langs:
+     if reject_langs := {
+         l for l in lang_domain_count if lang_domain_count[l] < min_domain
+     }:

Function CorpusIndexer.prune_min_domain refactored with the following changes:

This removes the following comments (why?):

# Remove the languages from the indexer
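
The diff above folds the set construction into the `if` test with an assignment expression, which requires Python 3.8 or later. A minimal sketch comparing the two forms, with hypothetical function names and sample counts:

```python
def rejected_old(lang_domain_count, min_domain):
    # Form before the refactoring: build the set, then test it.
    reject_langs = {
        lang for lang in lang_domain_count if lang_domain_count[lang] < min_domain
    }
    if reject_langs:
        return sorted(reject_langs)
    return []


def rejected_new(lang_domain_count, min_domain):
    # Form after the refactoring: bind and test in one step (Python 3.8+).
    if reject_langs := {
        lang for lang in lang_domain_count if lang_domain_count[lang] < min_domain
    }:
        return sorted(reject_langs)
    return []


counts = {"en": 3, "de": 1, "fr": 2}  # illustrative data only
old_result = rejected_old(counts, 2)
new_result = rejected_new(counts, 2)
```

The two functions behave identically. The trade-offs are the higher minimum Python version and the loss of the explanatory comment noted above.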

py3langid/train/tokenize.py (outdated, resolved)

sourcery-ai bot commented Mar 14, 2022

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 1.14%.

| Quality metrics | Before | After | Change |
| --- | --- | --- | --- |
| Complexity | 15.48 🙂 | 18.77 😞 | 3.29 👎 |
| Method Length | 59.99 ⭐ | 52.08 ⭐ | -7.91 👍 |
| Working memory | 10.82 😞 | 10.28 😞 | -0.54 👍 |
| Quality | 56.72% 🙂 | 57.86% 🙂 | 1.14% 👍 |

| Other metrics | Before | After | Change |
| --- | --- | --- | --- |
| Lines | 1695 | 914 | -781 |

| Changed files | Quality Before | Quality After | Quality Change |
| --- | --- | --- | --- |
| py3langid/langid.py | 49.48% 😞 | 49.48% 😞 | 0.00% |
| py3langid/tools/printfeats.py | 94.84% ⭐ | 96.01% ⭐ | 1.17% 👍 |
| py3langid/train/common.py | 80.33% ⭐ | 81.22% ⭐ | 0.89% 👍 |
| py3langid/train/index.py | 63.10% 🙂 | 63.28% 🙂 | 0.18% 👍 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| py3langid/langid.py | main | 65 ⛔ | 623 ⛔ | 17 ⛔ | 7.88% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| py3langid/langid.py | application | 21 😞 | 223 ⛔ | 15 😞 | 30.34% 😞 | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| py3langid/train/index.py | CorpusIndexer.__init__ | 10 🙂 | 146 😞 | 12 😞 | 50.01% 🙂 | Try splitting into smaller methods. Extract out complex expressions |
| py3langid/train/index.py | CorpusIndexer.prune_min_domain | 9 🙂 | 104 🙂 | 11 😞 | 57.78% 🙂 | Extract out complex expressions |
| py3langid/langid.py | LanguageIdentifier.set_languages | 7 ⭐ | 88 🙂 | 12 😞 | 60.16% 🙂 | Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

adbar closed this on Mar 14, 2022