
Sourcery refactored master branch #7

Closed · wants to merge 1 commit

Conversation

sourcery-ai[bot] commented Jul 13, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^


-return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile).group(1)
+return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)[1]

Function get_version refactored with the following changes:
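
For context, re.Match objects support subscripting (Python 3.6+), so m[1] returns the same group as m.group(1). A minimal standalone sketch of the pattern; the version string below is illustrative, not taken from the repository:

    import re

    initfile = '__version__ = "0.1.0"'   # illustrative content, not the real __init__.py
    m = re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)
    if m is not None:                     # both forms raise if the search found nothing
        print(m.group(1))                 # 0.1.0
        print(m[1])                       # 0.1.0 -- same group via __getitem__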

-urlExtraCrapBeforeEnd = regex_or(punctChars, entity) + "+?"
+urlExtraCrapBeforeEnd = f"{regex_or(punctChars, entity)}+?"

Lines 58-195 refactored with the following changes:

This removes the following comments:

# because eyes on the right side is more ambiguous with the standard usage of : ;
#          between this and the Java version. One little hack won't hurt...
# TODO should try a big precompiled lexicon from Wikipedia, Dan Ramage told me (BTO) he does this
# iOS 'emoji' characters (some smileys, some symbols) [\ue001-\uebbb]
# myleott: o.O and O.o are two of the biggest sources of differences
# Standard version  :) :( :] :D :P
#inspired by http://en.wikipedia.org/wiki/User:Scapler/emoticons#East_Asian_style
# reversed version (: D:  use positive lookbehind to remove "(word):"
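
The urlExtraCrapBeforeEnd change above is a plain concatenation-to-f-string rewrite. A hedged sketch of the equivalence, with regex_or, punctChars and entity defined here only as stand-ins for the tokenizer's real definitions:

    def regex_or(*items):                     # stand-in for the module's helper
        return '(?:' + '|'.join(items) + ')'

    punctChars = r"['\".?!,:;]"               # illustrative values only
    entity = r"&(?:amp|lt|gt|quot);"

    old = regex_or(punctChars, entity) + "+?"
    new = f"{regex_or(punctChars, entity)}+?"
    assert old == new                         # identical regex fragment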

Comment on lines -233 to +240
-indices.append(first)
-indices.append(second)
+indices.extend((first, second))

Function simpleTokenize refactored with the following changes:
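
The append-to-extend change is behavior-preserving when building a list; a tiny illustration with made-up span boundaries:

    first, second = 3, 7                  # hypothetical indices

    indices_a = []
    indices_a.append(first)
    indices_a.append(second)

    indices_b = []
    indices_b.extend((first, second))     # one call, same resulting list

    assert indices_a == indices_b == [3, 7]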

Comment on lines -274 to +280
-m = Contractions.search(token)
-if m:
+if m := Contractions.search(token):

Function splitToken refactored with the following changes:
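
The named expression (walrus operator, Python 3.8+) folds the assignment into the if test. A self-contained sketch; the Contractions pattern below is a simplified stand-in, and the return value only approximates what splitToken does with the match:

    import re

    # Simplified stand-in for the precompiled Contractions regex.
    Contractions = re.compile(r"(?i)(\w+)(n['’]t|['’](?:ve|ll|d|re|s|m))$")

    def splitToken_sketch(token):
        if m := Contractions.search(token):   # assign and test in one expression
            return [m.group(1), m.group(2)]
        return [token]

    print(splitToken_sketch("don't"))   # ['do', "n't"]
    print(splitToken_sketch("word"))    # ['word']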

-if args.printfeat:
-    return feat
-else:
-    return repr(feat)
+return feat if args.printfeat else repr(feat)

Function show refactored with the following changes:
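
The if/else pair collapses into a conditional expression; a minimal sketch with a local printfeat flag standing in for args.printfeat:

    def show_sketch(feat, printfeat):
        return feat if printfeat else repr(feat)

    print(show_sketch("a\tb", True))    # raw value, tab expanded
    print(show_sketch("a\tb", False))   # 'a\tb' -- repr-escaped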

Comment on lines -134 to +139
-    f = lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
-    yield f
+    yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
 else:
     if initializer is not None:
         initializer(*initargs)
-    f = imap
-    yield f
+    yield imap

Function MapPool refactored with the following changes:
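
MapPool is a generator-based context manager that hands back a mapping callable, so the lambda can be yielded directly instead of being bound to a temporary name first. A hedged, much-reduced sketch: the real function also handles initializer/initargs as shown in the hunk, and the original targets Python 2, where imap came from itertools; Python 3's built-in map is already lazy.

    from contextlib import contextmanager
    import multiprocessing as mp

    @contextmanager
    def MapPool_sketch(processes=None, chunksize=1):
        if processes:
            with mp.Pool(processes) as pool:
                # yield the callable directly instead of naming it first
                yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
        else:
            yield map     # serial fallback

    with MapPool_sketch() as f:
        print(list(f(abs, [-1, -2, 3])))   # [1, 2, 3]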

Comment on lines -94 to +97
-self.domain_index = dict((k,v) for v,k in enumerate(domains))
+self.domain_index = {k: v for v,k in enumerate(domains)}

 self.coverage_index = defaultdict(set)
-self.items = list()
+self.items = []

Function CorpusIndexer.__init__ refactored with the following changes:
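
Both the dict() generator form and the comprehension build the same value-to-position index; the domain names below are illustrative, not from the corpus:

    domains = ["wiki", "news", "web"]

    old_index = dict((k, v) for v, k in enumerate(domains))
    new_index = {k: v for v, k in enumerate(domains)}

    assert old_index == new_index == {"wiki": 0, "news": 1, "web": 2}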

Comment on lines -142 to +149
-reject_langs = set( l for l in lang_domain_count if lang_domain_count[l] < min_domain)
-
-# Remove the languages from the indexer
-if reject_langs:
+if reject_langs := {
+    l for l in lang_domain_count if lang_domain_count[l] < min_domain
+}:
     #print "reject (<{0} domains): {1}".format(min_domain, sorted(reject_langs))
-    reject_ids = set(self.lang_index[l] for l in reject_langs)
+    reject_ids = {self.lang_index[l] for l in reject_langs}

     new_lang_index = defaultdict(Enumerator())
-    lm = dict()
+    lm = {}

Function CorpusIndexer.prune_min_domain refactored with the following changes:

This removes the following comments:

# Remove the languages from the indexer
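
The pruning step above combines a set comprehension with a named expression: languages seen in fewer than min_domain domains are collected and tested in one statement. A sketch with made-up counts:

    lang_domain_count = {"en": 5, "fr": 2, "de": 1}   # hypothetical counts
    min_domain = 2

    if reject_langs := {l for l in lang_domain_count if lang_domain_count[l] < min_domain}:
        print(sorted(reject_langs))   # ['de']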

Comment on lines -218 to +217
-if args.model:
-    model_dir = args.model
-else:
-    model_dir = os.path.join('.', corpus_name+'.model')
-
+model_dir = args.model or os.path.join('.', f'{corpus_name}.model')

Lines 218-222 refactored with the following changes:
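
`x = a or b` keeps a when it is truthy and falls back to b otherwise, which replaces the if/else default here. One caveat worth noting: unlike an explicit `is None` check, it also falls through to the default when a is an empty string. A sketch with a local variable standing in for args.model:

    import os

    corpus_name = "mycorpus"                # illustrative name

    model_arg = None
    print(model_arg or os.path.join('.', f'{corpus_name}.model'))   # ./mycorpus.model

    model_arg = "/tmp/custom.model"
    print(model_arg or os.path.join('.', f'{corpus_name}.model'))   # /tmp/custom.model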

-tk_output_f = dict( (k,[feats[i] for i in v]) for k,v in tk_output.iteritems() )
+tk_output_f = {k: [feats[i] for i in v] for k,v in tk_output.iteritems()}

Function Scanner.from_file refactored with the following changes:
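
Note that the refactored line still calls .iteritems(), which exists only on Python 2 dicts; under Python 3 the same comprehension would use .items(). A sketch with made-up feature data:

    feats = ['aa', 'ab', 'ba']                     # hypothetical feature list
    tk_output = {0: [0, 2], 1: [1]}                # state -> feature indices

    tk_output_f = {k: [feats[i] for i in v] for k, v in tk_output.items()}
    print(tk_output_f)   # {0: ['aa', 'ba'], 1: ['ab']}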

Comment on lines -76 to +77
-goto = dict()
-fail = dict()
+goto = {}
+fail = {}

Function Scanner.build refactored with the following changes:

Comment on lines -176 to +177
-for key in self.output.get(state, []):
-    yield key
+yield from self.output.get(state, [])

Function Scanner.search refactored with the following changes:

  • Replace yield inside for loop with yield from (yield-from)
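
`yield from` delegates to the iterable and yields each of its items in turn, which is equivalent to the removed for loop here. A sketch with a stand-in output table:

    output = {3: ['foo', 'foobar'], 5: ['bar']}    # hypothetical state -> keys table

    def keys_for_state(state):
        # same as: for key in output.get(state, []): yield key
        yield from output.get(state, [])

    print(list(keys_for_state(3)))   # ['foo', 'foobar']
    print(list(keys_for_state(9)))   # []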

-tk_output = {}
-for k,v in raw_output.items():
-    tk_output[k] = tuple(feat_index[f] for f in v)
+tk_output = {k: tuple(feat_index[f] for f in v) for k, v in raw_output.items()}

Function build_scanner refactored with the following changes:
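
The loop that filled tk_output entry by entry becomes a single comprehension producing the same mapping. Illustrative inputs below:

    feat_index = {'aa': 0, 'ab': 1}                # hypothetical feature index
    raw_output = {2: ['aa', 'ab'], 4: ['ab']}

    tk_output = {k: tuple(feat_index[f] for f in v) for k, v in raw_output.items()}
    print(tk_output)   # {2: (0, 1), 4: (1,)}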

Comment on lines -212 to +210
-return dict((k,v) for (v,k) in enumerate(seq))
+return {k: v for (v,k) in enumerate(seq)}

Function index refactored with the following changes:

Comment on lines -225 to +223
-if args.output:
-    output_path = args.output
-else:
-    output_path = input_path + '.scanner'
-
+output_path = args.output or f'{input_path}.scanner'

Lines 225-229 refactored with the following changes:

Comment on lines -113 to +122
-b_freq_lang = [tempfile.mkstemp(prefix=__procname+'-', suffix='.lang', dir=p)[0] for p in __b_dirs]
-b_freq_domain = [tempfile.mkstemp(prefix=__procname+'-', suffix='.domain', dir=p)[0] for p in __b_dirs]
+b_freq_lang = [
+    tempfile.mkstemp(prefix=f'{__procname}-', suffix='.lang', dir=p)[0]
+    for p in __b_dirs
+]
+
+b_freq_domain = [
+    tempfile.mkstemp(prefix=f'{__procname}-', suffix='.domain', dir=p)[0]
+    for p in __b_dirs
+]

Function pass_tokenize refactored with the following changes:

Comment on lines -166 to +182
-b_dirs = [ tempfile.mkdtemp(prefix="tokenize-",suffix='-{0}'.format(tokenizer.__class__.__name__), dir=outdir) for i in range(buckets) ]
+b_dirs = [
+    tempfile.mkdtemp(
+        prefix="tokenize-",
+        suffix='-{0}'.format(tokenizer.__class__.__name__),
+        dir=outdir,
+    )
+    for _ in range(buckets)
+]

Function build_index refactored with the following changes:
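
Renaming the loop variable to `_` just signals that it is unused: the comprehension repeats the mkdtemp call `buckets` times. A runnable sketch; here dir=None makes mkdtemp fall back to the system temp directory rather than the real outdir:

    import tempfile

    buckets = 3
    b_dirs = [
        tempfile.mkdtemp(prefix="tokenize-", suffix="-Sketch", dir=None)
        for _ in range(buckets)
    ]
    print(len(b_dirs))   # 3 fresh scratch directories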

Comment on lines -214 to +230
-if args.temp:
-    buckets_dir = args.temp
-else:
-    buckets_dir = os.path.join(args.model, 'buckets')
+buckets_dir = args.temp or os.path.join(args.model, 'buckets')

Lines 214-245 refactored with the following changes:

adbar closed this Jul 13, 2022