
Sourcery refactored master branch #8

Closed · sourcery-ai[bot] wants to merge 1 commit into master from sourcery/master

Conversation

sourcery-ai[bot] commented Sep 2, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

```sh
git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^
```

Help us improve this pull request!

```diff
-return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile).group(1)
+return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)[1]
```

Function get_version refactored.
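For context on this change: since Python 3.6, `re.Match` objects support subscripting, so `m[1]` is equivalent to `m.group(1)`. A minimal sketch, with a hypothetical value standing in for `initfile`:

```python
import re

# Hypothetical stand-in for the file contents read into `initfile`.
initfile = '__version__ = "0.3.1"'

m = re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)
# Since Python 3.6, match objects support indexing: m[1] == m.group(1).
assert m is not None and m[1] == m.group(1) == '0.3.1'
```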

```diff
-urlExtraCrapBeforeEnd = regex_or(punctChars, entity) + "+?"
+urlExtraCrapBeforeEnd = f"{regex_or(punctChars, entity)}+?"
```

Lines 58-195 refactored.

This removes the following comments (why?):

```python
# Standard version  :) :( :] :D :P
# iOS 'emoji' characters (some smileys, some symbols) [\ue001-\uebbb]
#inspired by http://en.wikipedia.org/wiki/User:Scapler/emoticons#East_Asian_style
# TODO should try a big precompiled lexicon from Wikipedia, Dan Ramage told me (BTO) he does this
#          between this and the Java version. One little hack won't hurt...
# reversed version (: D:  use positive lookbehind to remove "(word):"
# myleott: o.O and O.o are two of the biggest sources of differences
# because eyes on the right side is more ambiguous with the standard usage of : ;
```
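The diff at the top of this hunk is a plain concatenation-to-f-string rewrite; both forms build the same string. A minimal sketch, with a hypothetical fragment standing in for `regex_or(punctChars, entity)`:

```python
# Hypothetical fragment standing in for regex_or(punctChars, entity).
fragment = r"[.,;]|&\w+;"

old_style = fragment + "+?"
new_style = f"{fragment}+?"
assert old_style == new_style == r"[.,;]|&\w+;+?"
```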

Comment on lines -233 to +240:

```diff
-    indices.append(first)
-    indices.append(second)
+    indices.extend((first, second))
```

Function simpleTokenize refactored.
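The two `append` calls collapse into one `extend` call, which accepts any iterable; a quick sketch:

```python
first, second = 3, 7

a = []
a.append(first)
a.append(second)

b = []
b.extend((first, second))  # one call, same result

assert a == b == [3, 7]
```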

Comment on lines -274 to +280:

```diff
-    m = Contractions.search(token)
-    if m:
+    if m := Contractions.search(token):
```

Function splitToken refactored.
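The rewrite uses an assignment expression (the `:=` "walrus" operator, Python 3.8+), which binds and tests the match in one step. A sketch with a simplified contraction pattern, not the full one from `_twokenize.py`:

```python
import re

# Simplified stand-in for the Contractions pattern in _twokenize.py.
Contractions = re.compile(r"(?i)(\w+)(n't|'ve|'ll|'d|'re|'s|'m)$")

def splitToken(token):
    # Python 3.8+: bind the match and test it in a single expression.
    if m := Contractions.search(token):
        return [m.group(1), m.group(2)]
    return [token]

assert splitToken("can't") == ["ca", "n't"]
assert splitToken("word") == ["word"]
```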

```diff
-    if args.printfeat:
-        return feat
-    else:
-        return repr(feat)
+    return feat if args.printfeat else repr(feat)
```

Function show refactored.

Comment on lines -134 to +139:

```diff
-        f = lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
-        yield f
+        yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
     else:
         if initializer is not None:
             initializer(*initargs)
-        f = imap
-        yield f
+        yield imap
```

Function MapPool refactored.
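MapPool is a context manager that yields a map-like callable; the refactoring simply yields the lambda directly instead of naming it first. A condensed sketch of the pattern, assuming the Python 3 built-in `map` as the serial fallback (the original used Python 2's `imap`):

```python
import multiprocessing as mp
from contextlib import closing, contextmanager

@contextmanager
def map_pool(processes=1, chunksize=1):
    if processes > 1:
        with closing(mp.Pool(processes)) as pool:
            # Yield the callable directly instead of binding it to `f` first.
            yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
    else:
        yield map  # serial fallback; map() is lazy in Python 3, like imap was

with map_pool() as f:
    assert sorted(f(str.upper, ["a", "b"])) == ["A", "B"]
```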

Comment on lines -94 to +97:

```diff
-        self.domain_index = dict((k,v) for v,k in enumerate(domains))
+        self.domain_index = {k: v for v,k in enumerate(domains)}

         self.coverage_index = defaultdict(set)
-        self.items = list()
+        self.items = []
```

Function CorpusIndexer.__init__ refactored.
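The generator-into-`dict()` call becomes a dict comprehension; both invert `enumerate()` to map each value to its index. A quick sketch:

```python
domains = ["news", "web", "wiki"]

# Invert enumerate(): map each value to its position.
old = dict((k, v) for v, k in enumerate(domains))
new = {k: v for v, k in enumerate(domains)}
assert old == new == {"news": 0, "web": 1, "wiki": 2}
```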

Comment on lines -142 to +149:

```diff
-    reject_langs = set( l for l in lang_domain_count if lang_domain_count[l] < min_domain)
-
-    # Remove the languages from the indexer
-    if reject_langs:
+    if reject_langs := {
+        l for l in lang_domain_count if lang_domain_count[l] < min_domain
+    }:
       #print "reject (<{0} domains): {1}".format(min_domain, sorted(reject_langs))
-      reject_ids = set(self.lang_index[l] for l in reject_langs)
+      reject_ids = {self.lang_index[l] for l in reject_langs}

       new_lang_index = defaultdict(Enumerator())
-      lm = dict()
+      lm = {}
```

Function CorpusIndexer.prune_min_domain refactored.

This removes the following comments (why?):

```python
# Remove the languages from the indexer
```

Comment on lines -218 to +217:

```diff
-    if args.model:
-        model_dir = args.model
-    else:
-        model_dir = os.path.join('.', corpus_name+'.model')
-
+    model_dir = args.model or os.path.join('.', f'{corpus_name}.model')
```

Lines 218-222 refactored.
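The `or` idiom returns the right-hand default whenever the left side is falsy (not only when it is `None`), which matches the original `if args.model:` truthiness test exactly. A sketch with a hypothetical `args` object:

```python
import os
from types import SimpleNamespace

corpus_name = "corpus"
args = SimpleNamespace(model=None)  # hypothetical stand-in for an argparse result

# Falls back whenever args.model is falsy (None or ""), mirroring `if args.model:`.
model_dir = args.model or os.path.join('.', f'{corpus_name}.model')
assert model_dir == os.path.join('.', 'corpus.model')
```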

```diff
-    tk_output_f = dict( (k,[feats[i] for i in v]) for k,v in tk_output.iteritems() )
+    tk_output_f = {k: [feats[i] for i in v] for k,v in tk_output.iteritems()}
```

Function Scanner.from_file refactored.
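One thing worth noting about this hunk: `dict.iteritems()` exists only in Python 2; the Python 3 spelling of the refactored comprehension uses `.items()`. A sketch with hypothetical data:

```python
# Hypothetical data shaped like tk_output / feats in Scanner.from_file.
tk_output = {0: [1, 2]}
feats = ["a", "b", "c"]

# Python 3 spelling: .items() instead of the Python 2-only .iteritems().
tk_output_f = {k: [feats[i] for i in v] for k, v in tk_output.items()}
assert tk_output_f == {0: ["b", "c"]}
```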

Comment on lines -76 to +77:

```diff
-        goto = dict()
-        fail = dict()
+        goto = {}
+        fail = {}
```

Function Scanner.build refactored.

Comment on lines -176 to +177:

```diff
-        for key in self.output.get(state, []):
-            yield key
+        yield from self.output.get(state, [])
```

Function Scanner.search refactored with the following changes:

  • Replace yield inside for loop with yield from (yield-from)
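`yield from` delegates to a sub-iterable and replaces the explicit loop one-for-one. A minimal sketch of the pattern in isolation, with hypothetical data:

```python
def search_outputs(output, states):
    for state in states:
        # Delegates to the sub-iterable; equivalent to the removed for/yield loop.
        yield from output.get(state, [])

output = {1: ["ab"], 2: ["abc", "bc"]}
assert list(search_outputs(output, [1, 2, 3])) == ["ab", "abc", "bc"]
```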

```diff
-    tk_output = {}
-    for k,v in raw_output.items():
-        tk_output[k] = tuple(feat_index[f] for f in v)
+    tk_output = {k: tuple(feat_index[f] for f in v) for k, v in raw_output.items()}
```

Function build_scanner refactored.

Comment on lines -212 to +210:

```diff
-    return dict((k,v) for (v,k) in enumerate(seq))
+    return {k: v for (v,k) in enumerate(seq)}
```

Function index refactored.

Comment on lines -225 to +223:

```diff
-    if args.output:
-        output_path = args.output
-    else:
-        output_path = input_path + '.scanner'
-
+    output_path = args.output or f'{input_path}.scanner'
```

Lines 225-229 refactored.

Comment on lines -113 to +122:

```diff
-    b_freq_lang = [tempfile.mkstemp(prefix=__procname+'-', suffix='.lang', dir=p)[0] for p in __b_dirs]
-    b_freq_domain = [tempfile.mkstemp(prefix=__procname+'-', suffix='.domain', dir=p)[0] for p in __b_dirs]
+    b_freq_lang = [
+        tempfile.mkstemp(prefix=f'{__procname}-', suffix='.lang', dir=p)[0]
+        for p in __b_dirs
+    ]
+
+    b_freq_domain = [
+        tempfile.mkstemp(prefix=f'{__procname}-', suffix='.domain', dir=p)[0]
+        for p in __b_dirs
+    ]
```

Function pass_tokenize refactored.
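For readers unfamiliar with the call: `tempfile.mkstemp()` returns a `(file descriptor, path)` pair, which is why the comprehension keeps only element `[0]`. A quick sketch:

```python
import os
import tempfile

# mkstemp() returns (OS-level file descriptor, absolute path); the
# comprehension above keeps only the descriptor via [0].
fd, path = tempfile.mkstemp(prefix='demo-', suffix='.lang')
os.close(fd)
os.remove(path)  # clean up the demo file
```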

Comment on lines -166 to +182:

```diff
-    b_dirs = [ tempfile.mkdtemp(prefix="tokenize-",suffix='-{0}'.format(tokenizer.__class__.__name__), dir=outdir) for i in range(buckets) ]
+    b_dirs = [
+        tempfile.mkdtemp(
+            prefix="tokenize-",
+            suffix='-{0}'.format(tokenizer.__class__.__name__),
+            dir=outdir,
+        )
+        for _ in range(buckets)
+    ]
```

Function build_index refactored.
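Besides the reflow, the unused loop variable `i` is renamed to `_`, the conventional name for a variable whose value is never read; the range only sets the iteration count. A quick sketch:

```python
import shutil
import tempfile

buckets = 3
# `_` marks the loop variable as unused; range() only drives the count.
b_dirs = [tempfile.mkdtemp(prefix="tokenize-") for _ in range(buckets)]
assert len(b_dirs) == buckets

for d in b_dirs:
    shutil.rmtree(d)  # clean up the demo directories
```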

Comment on lines -214 to +230:

```diff
-    if args.temp:
-        buckets_dir = args.temp
-    else:
-        buckets_dir = os.path.join(args.model, 'buckets')
+    buckets_dir = args.temp or os.path.join(args.model, 'buckets')
```

Lines 214-245 refactored.

sourcery-ai[bot] commented Sep 2, 2022

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.76%.

| Quality metrics | Before | After | Change |
|---|---|---|---|
| Complexity | 9.59 🙂 | 8.74 🙂 | -0.85 👍 |
| Method Length | 90.77 🙂 | 89.61 🙂 | -1.16 👍 |
| Working memory | 9.66 🙂 | 9.68 🙂 | 0.02 👎 |
| Quality | 54.70% 🙂 | 55.46% 🙂 | 0.76% 👍 |

| Other metrics | Before | After | Change |
|---|---|---|---|
| Lines | 1961 | 1944 | -17 |

| Changed files | Quality Before | Quality After | Quality Change |
|---|---|---|---|
| setup.py | 94.23% ⭐ | 94.38% ⭐ | 0.15% 👍 |
| py3langid/examples/_twokenize.py | 55.17% 🙂 | 54.79% 🙂 | -0.38% 👎 |
| py3langid/tools/printfeats.py | 94.18% ⭐ | 95.34% ⭐ | 1.16% 👍 |
| py3langid/train/DFfeatureselect.py | 52.88% 🙂 | 54.27% 🙂 | 1.39% 👍 |
| py3langid/train/IGweight.py | 44.61% 😞 | 46.48% 😞 | 1.87% 👍 |
| py3langid/train/LDfeatureselect.py | 53.86% 🙂 | 54.50% 🙂 | 0.64% 👍 |
| py3langid/train/NBtrain.py | 59.63% 🙂 | 61.82% 🙂 | 2.19% 👍 |
| py3langid/train/common.py | 82.11% ⭐ | 83.02% ⭐ | 0.91% 👍 |
| py3langid/train/index.py | 56.40% 🙂 | 57.03% 🙂 | 0.63% 👍 |
| py3langid/train/scanner.py | 43.95% 😞 | 43.01% 😞 | -0.94% 👎 |
| py3langid/train/tokenize.py | 48.42% 😞 | 49.66% 😞 | 1.24% 👍 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
|---|---|---|---|---|---|---|
| py3langid/train/scanner.py | Scanner.build | 44 ⛔ | 386 ⛔ | | 8.94% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods |
| py3langid/train/tokenize.py | pass_tokenize | 20 😞 | 276 ⛔ | 14 😞 | 29.58% 😞 | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| py3langid/train/IGweight.py | pass_IG | 17 🙂 | 345 ⛔ | 13 😞 | 31.13% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| py3langid/train/NBtrain.py | learn_ptc | 3 ⭐ | 283 ⛔ | 15 😞 | 42.95% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| py3langid/train/tokenize.py | build_index | 4 ⭐ | 183 😞 | 14 😞 | 49.18% 😞 | Try splitting into smaller methods. Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

@adbar closed this Sep 2, 2022
@adbar deleted the sourcery/master branch September 2, 2022 15:51