
Sourcery refactored master branch #8

Closed · sourcery-ai[bot] wants to merge 1 commit into master from sourcery/master

Conversation

sourcery-ai[bot] commented Sep 2, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

```sh
git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^
```

Help us improve this pull request!

```diff
-return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile).group(1)
+return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)[1]
```

Function get_version refactored.
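For context on this change: since Python 3.6, `re.Match` objects support subscripting, so `m[1]` is equivalent to `m.group(1)`. A minimal sketch, with a hypothetical value standing in for `initfile`:

```python
import re

# Hypothetical stand-in for the file contents read into `initfile`.
initfile = '__version__ = "0.3.1"'

m = re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)
# Since Python 3.6, match objects support indexing: m[1] == m.group(1).
assert m is not None and m[1] == m.group(1) == '0.3.1'
```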

```diff
-urlExtraCrapBeforeEnd = regex_or(punctChars, entity) + "+?"
+urlExtraCrapBeforeEnd = f"{regex_or(punctChars, entity)}+?"
```

Lines 58-195 refactored.

This removes the following comments (why?):

```python
# Standard version  :) :( :] :D :P
# iOS 'emoji' characters (some smileys, some symbols) [\ue001-\uebbb]
#inspired by http://en.wikipedia.org/wiki/User:Scapler/emoticons#East_Asian_style
# TODO should try a big precompiled lexicon from Wikipedia, Dan Ramage told me (BTO) he does this
#          between this and the Java version. One little hack won't hurt...
# reversed version (: D:  use positive lookbehind to remove "(word):"
# myleott: o.O and O.o are two of the biggest sources of differences
# because eyes on the right side is more ambiguous with the standard usage of : ;
```
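The diff at the top of this hunk is a plain concatenation-to-f-string rewrite; both forms build the same string. A minimal sketch, with a hypothetical fragment standing in for `regex_or(punctChars, entity)`:

```python
# Hypothetical fragment standing in for regex_or(punctChars, entity).
fragment = r"[.,;]|&\w+;"

old_style = fragment + "+?"
new_style = f"{fragment}+?"
assert old_style == new_style == r"[.,;]|&\w+;+?"
```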

Comment on lines -233 to +240:

```diff
-    indices.append(first)
-    indices.append(second)
+    indices.extend((first, second))
```

Function simpleTokenize refactored.
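The two `append` calls collapse into one `extend` call, which accepts any iterable; a quick sketch:

```python
first, second = 3, 7

a = []
a.append(first)
a.append(second)

b = []
b.extend((first, second))  # one call, same result

assert a == b == [3, 7]
```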

Comment on lines -274 to +280:

```diff
-    m = Contractions.search(token)
-    if m:
+    if m := Contractions.search(token):
```

Function splitToken refactored.
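The rewrite uses an assignment expression (the `:=` "walrus" operator, Python 3.8+), which binds and tests the match in one step. A sketch with a simplified contraction pattern, not the full one from `_twokenize.py`:

```python
import re

# Simplified stand-in for the Contractions pattern in _twokenize.py.
Contractions = re.compile(r"(?i)(\w+)(n't|'ve|'ll|'d|'re|'s|'m)$")

def splitToken(token):
    # Python 3.8+: bind the match and test it in a single expression.
    if m := Contractions.search(token):
        return [m.group(1), m.group(2)]
    return [token]

assert splitToken("can't") == ["ca", "n't"]
assert splitToken("word") == ["word"]
```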

```diff
-    if args.printfeat:
-        return feat
-    else:
-        return repr(feat)
+    return feat if args.printfeat else repr(feat)
```

Function show refactored.

Comment on lines -134 to +139:

```diff
-        f = lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
-        yield f
+        yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
     else:
         if initializer is not None:
             initializer(*initargs)
-        f = imap
-        yield f
+        yield imap
```

Function MapPool refactored.
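MapPool is a context manager that yields a map-like callable; the refactoring simply yields the lambda directly instead of naming it first. A condensed sketch of the pattern, assuming the Python 3 built-in `map` as the serial fallback (the original used Python 2's `imap`):

```python
import multiprocessing as mp
from contextlib import closing, contextmanager

@contextmanager
def map_pool(processes=1, chunksize=1):
    if processes > 1:
        with closing(mp.Pool(processes)) as pool:
            # Yield the callable directly instead of binding it to `f` first.
            yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
    else:
        yield map  # serial fallback; map() is lazy in Python 3, like imap was

with map_pool() as f:
    assert sorted(f(str.upper, ["a", "b"])) == ["A", "B"]
```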

Comment on lines -94 to +97:

```diff
-        self.domain_index = dict((k,v) for v,k in enumerate(domains))
+        self.domain_index = {k: v for v,k in enumerate(domains)}

         self.coverage_index = defaultdict(set)
-        self.items = list()
+        self.items = []
```

Function CorpusIndexer.__init__ refactored.
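The generator-into-`dict()` call becomes a dict comprehension; both invert `enumerate()` to map each value to its index. A quick sketch:

```python
domains = ["news", "web", "wiki"]

# Invert enumerate(): map each value to its position.
old = dict((k, v) for v, k in enumerate(domains))
new = {k: v for v, k in enumerate(domains)}
assert old == new == {"news": 0, "web": 1, "wiki": 2}
```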

Comment on lines -142 to +149:

```diff
-    reject_langs = set( l for l in lang_domain_count if lang_domain_count[l] < min_domain)
-
-    # Remove the languages from the indexer
-    if reject_langs:
+    if reject_langs := {
+        l for l in lang_domain_count if lang_domain_count[l] < min_domain
+    }:
       #print "reject (<{0} domains): {1}".format(min_domain, sorted(reject_langs))
-      reject_ids = set(self.lang_index[l] for l in reject_langs)
+      reject_ids = {self.lang_index[l] for l in reject_langs}

       new_lang_index = defaultdict(Enumerator())
-      lm = dict()
+      lm = {}
```

Function CorpusIndexer.prune_min_domain refactored.

This removes the following comments (why?):

```python
# Remove the languages from the indexer
```

Comment on lines -218 to +217:

```diff
-    if args.model:
-        model_dir = args.model
-    else:
-        model_dir = os.path.join('.', corpus_name+'.model')
-
+    model_dir = args.model or os.path.join('.', f'{corpus_name}.model')
```

Lines 218-222 refactored.
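The `or` idiom returns the right-hand default whenever the left side is falsy (not only when it is `None`), which matches the original `if args.model:` truthiness test exactly. A sketch with a hypothetical `args` object:

```python
import os
from types import SimpleNamespace

corpus_name = "corpus"
args = SimpleNamespace(model=None)  # hypothetical stand-in for an argparse result

# Falls back whenever args.model is falsy (None or ""), mirroring `if args.model:`.
model_dir = args.model or os.path.join('.', f'{corpus_name}.model')
assert model_dir == os.path.join('.', 'corpus.model')
```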

```diff
-    tk_output_f = dict( (k,[feats[i] for i in v]) for k,v in tk_output.iteritems() )
+    tk_output_f = {k: [feats[i] for i in v] for k,v in tk_output.iteritems()}
```

Function Scanner.from_file refactored.
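One thing worth noting about this hunk: `dict.iteritems()` exists only in Python 2; the Python 3 spelling of the refactored comprehension uses `.items()`. A sketch with hypothetical data:

```python
# Hypothetical data shaped like tk_output / feats in Scanner.from_file.
tk_output = {0: [1, 2]}
feats = ["a", "b", "c"]

# Python 3 spelling: .items() instead of the Python 2-only .iteritems().
tk_output_f = {k: [feats[i] for i in v] for k, v in tk_output.items()}
assert tk_output_f == {0: ["b", "c"]}
```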

Comment on lines -76 to +77:

```diff
-        goto = dict()
-        fail = dict()
+        goto = {}
+        fail = {}
```

Function Scanner.build refactored.

Comment on lines -176 to +177:

```diff
-        for key in self.output.get(state, []):
-            yield key
+        yield from self.output.get(state, [])
```

Function Scanner.search refactored with the following changes:

  • Replace yield inside for loop with yield from (yield-from)
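`yield from` delegates to a sub-iterable and replaces the explicit loop one-for-one. A minimal sketch of the pattern in isolation, with hypothetical data:

```python
def search_outputs(output, states):
    for state in states:
        # Delegates to the sub-iterable; equivalent to the removed for/yield loop.
        yield from output.get(state, [])

output = {1: ["ab"], 2: ["abc", "bc"]}
assert list(search_outputs(output, [1, 2, 3])) == ["ab", "abc", "bc"]
```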

```diff
-    tk_output = {}
-    for k,v in raw_output.items():
-        tk_output[k] = tuple(feat_index[f] for f in v)
+    tk_output = {k: tuple(feat_index[f] for f in v) for k, v in raw_output.items()}
```

Function build_scanner refactored.

Comment on lines -212 to +210:

```diff
-    return dict((k,v) for (v,k) in enumerate(seq))
+    return {k: v for (v,k) in enumerate(seq)}
```

Function index refactored.

Comment on lines -225 to +223:

```diff
-    if args.output:
-        output_path = args.output
-    else:
-        output_path = input_path + '.scanner'
-
+    output_path = args.output or f'{input_path}.scanner'
```

Lines 225-229 refactored.

Comment on lines -113 to +122:

```diff
-    b_freq_lang = [tempfile.mkstemp(prefix=__procname+'-', suffix='.lang', dir=p)[0] for p in __b_dirs]
-    b_freq_domain = [tempfile.mkstemp(prefix=__procname+'-', suffix='.domain', dir=p)[0] for p in __b_dirs]
+    b_freq_lang = [
+        tempfile.mkstemp(prefix=f'{__procname}-', suffix='.lang', dir=p)[0]
+        for p in __b_dirs
+    ]
+
+    b_freq_domain = [
+        tempfile.mkstemp(prefix=f'{__procname}-', suffix='.domain', dir=p)[0]
+        for p in __b_dirs
+    ]
```

Function pass_tokenize refactored.
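For readers unfamiliar with the call: `tempfile.mkstemp()` returns a `(file descriptor, path)` pair, which is why the comprehension keeps only element `[0]`. A quick sketch:

```python
import os
import tempfile

# mkstemp() returns (OS-level file descriptor, absolute path); the
# comprehension above keeps only the descriptor via [0].
fd, path = tempfile.mkstemp(prefix='demo-', suffix='.lang')
os.close(fd)
os.remove(path)  # clean up the demo file
```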

Comment on lines -166 to +182:

```diff
-    b_dirs = [ tempfile.mkdtemp(prefix="tokenize-",suffix='-{0}'.format(tokenizer.__class__.__name__), dir=outdir) for i in range(buckets) ]
+    b_dirs = [
+        tempfile.mkdtemp(
+            prefix="tokenize-",
+            suffix='-{0}'.format(tokenizer.__class__.__name__),
+            dir=outdir,
+        )
+        for _ in range(buckets)
+    ]
```

Function build_index refactored.
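Besides the reflow, the unused loop variable `i` is renamed to `_`, the conventional name for a variable whose value is never read; the range only sets the iteration count. A quick sketch:

```python
import shutil
import tempfile

buckets = 3
# `_` marks the loop variable as unused; range() only drives the count.
b_dirs = [tempfile.mkdtemp(prefix="tokenize-") for _ in range(buckets)]
assert len(b_dirs) == buckets

for d in b_dirs:
    shutil.rmtree(d)  # clean up the demo directories
```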

Comment on lines -214 to +230:

```diff
-    if args.temp:
-        buckets_dir = args.temp
-    else:
-        buckets_dir = os.path.join(args.model, 'buckets')
+    buckets_dir = args.temp or os.path.join(args.model, 'buckets')
```

Lines 214-245 refactored.

sourcery-ai[bot] commented Sep 2, 2022

Sourcery Code Quality Report

✅  Merging this PR will increase code quality in the affected files by 0.76%.

| Quality metrics | Before | After | Change |
|---|---|---|---|
| Complexity | 9.59 🙂 | 8.74 🙂 | -0.85 👍 |
| Method Length | 90.77 🙂 | 89.61 🙂 | -1.16 👍 |
| Working memory | 9.66 🙂 | 9.68 🙂 | 0.02 👎 |
| Quality | 54.70% 🙂 | 55.46% 🙂 | 0.76% 👍 |

| Other metrics | Before | After | Change |
|---|---|---|---|
| Lines | 1961 | 1944 | -17 |

| Changed files | Quality Before | Quality After | Quality Change |
|---|---|---|---|
| setup.py | 94.23% ⭐ | 94.38% ⭐ | 0.15% 👍 |
| py3langid/examples/_twokenize.py | 55.17% 🙂 | 54.79% 🙂 | -0.38% 👎 |
| py3langid/tools/printfeats.py | 94.18% ⭐ | 95.34% ⭐ | 1.16% 👍 |
| py3langid/train/DFfeatureselect.py | 52.88% 🙂 | 54.27% 🙂 | 1.39% 👍 |
| py3langid/train/IGweight.py | 44.61% 😞 | 46.48% 😞 | 1.87% 👍 |
| py3langid/train/LDfeatureselect.py | 53.86% 🙂 | 54.50% 🙂 | 0.64% 👍 |
| py3langid/train/NBtrain.py | 59.63% 🙂 | 61.82% 🙂 | 2.19% 👍 |
| py3langid/train/common.py | 82.11% ⭐ | 83.02% ⭐ | 0.91% 👍 |
| py3langid/train/index.py | 56.40% 🙂 | 57.03% 🙂 | 0.63% 👍 |
| py3langid/train/scanner.py | 43.95% 😞 | 43.01% 😞 | -0.94% 👎 |
| py3langid/train/tokenize.py | 48.42% 😞 | 49.66% 😞 | 1.24% 👍 |

Here are some functions in these files that still need a tune-up:

| File | Function | Complexity | Length | Working Memory | Quality | Recommendation |
|---|---|---|---|---|---|---|
| py3langid/train/scanner.py | Scanner.build | 44 ⛔ | 386 ⛔ | | 8.94% ⛔ | Refactor to reduce nesting. Try splitting into smaller methods |
| py3langid/train/tokenize.py | pass_tokenize | 20 😞 | 276 ⛔ | 14 😞 | 29.58% 😞 | Refactor to reduce nesting. Try splitting into smaller methods. Extract out complex expressions |
| py3langid/train/IGweight.py | pass_IG | 17 🙂 | 345 ⛔ | 13 😞 | 31.13% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| py3langid/train/NBtrain.py | learn_ptc | 3 ⭐ | 283 ⛔ | 15 😞 | 42.95% 😞 | Try splitting into smaller methods. Extract out complex expressions |
| py3langid/train/tokenize.py | build_index | 4 ⭐ | 183 😞 | 14 😞 | 49.18% 😞 | Try splitting into smaller methods. Extract out complex expressions |

Legend and Explanation

The emojis denote the absolute quality of the code:

  • ⭐ excellent
  • 🙂 good
  • 😞 poor
  • ⛔ very poor

The 👍 and 👎 indicate whether the quality has improved or gotten worse with this pull request.


Please see our documentation here for details on how these metrics are calculated.

We are actively working on this report - lots more documentation and extra metrics to come!

Help us improve this quality report!

@adbar closed this Sep 2, 2022
@adbar deleted the sourcery/master branch September 2, 2022 15:51