Sourcery refactored master branch #8
-return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile).group(1)
+return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)[1]
Function get_version refactored with the following changes:
- Replace m.group(x) with m[x] for re.Match objects (use-getitem-for-re-match-groups)
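A minimal sketch of why this change should be behavior-preserving, assuming Python 3.6 or later (where match objects support indexing). The `initfile` string is a made-up stand-in for the package's real `__init__.py` contents.

```python
import re

# Hypothetical file contents, standing in for the real __init__.py.
initfile = '__version__ = "1.2.3"\n'

match = re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)

# Indexing a match object is equivalent to calling .group() with the same argument.
version_via_group = match.group(1)
version_via_index = match[1]
```

Both forms raise an error if the pattern does not match, because `re.search` returns `None` in that case.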
-urlExtraCrapBeforeEnd = regex_or(punctChars, entity) + "+?"
+urlExtraCrapBeforeEnd = f"{regex_or(punctChars, entity)}+?"
Lines 58-195 refactored with the following changes:
- Use f-string instead of string concatenation [×31] (use-fstring-for-concatenation)
This removes the following comments (why?):
# Standard version :) :( :] :D :P
# iOS 'emoji' characters (some smileys, some symbols) [\ue001-\uebbb]
#inspired by http://en.wikipedia.org/wiki/User:Scapler/emoticons#East_Asian_style
# TODO should try a big precompiled lexicon from Wikipedia, Dan Ramage told me (BTO) he does this
# between this and the Java version. One little hack won't hurt...
# reversed version (: D: use positive lookbehind to remove "(word):"
# myleott: o.O and O.o are two of the biggest sources of differences
# because eyes on the right side is more ambiguous with the standard usage of : ;
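As a rough check that the f-string rewrite produces the same pattern text, here is a small sketch. The `regex_or` helper and the values of `punctChars` and `entity` are assumptions made up for illustration; they are not taken from the refactored file.

```python
def regex_or(*items):
    # Hypothetical helper: joins alternatives into one non-capturing group.
    return "(?:" + "|".join(items) + ")"

punctChars = r"['\".?!,:;]"        # assumed value, for illustration only
entity = r"&(?:amp|lt|gt|quot);"   # assumed value, for illustration only

# Old style: string concatenation.
concatenated = regex_or(punctChars, entity) + "+?"
# New style: the same expression embedded in an f-string.
formatted = f"{regex_or(punctChars, entity)}+?"
```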
-indices.append(first)
-indices.append(second)
+indices.extend((first, second))
Function simpleTokenize refactored with the following changes:
- Merge consecutive list appends into a single extend (merge-list-appends-into-extend)
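A quick sketch of the equivalence, using made-up values for `first` and `second`:

```python
first, second = 3, 7  # hypothetical span boundaries

# Old style: two consecutive appends.
appended = [0]
appended.append(first)
appended.append(second)

# New style: one extend with a tuple, preserving order.
extended = [0]
extended.extend((first, second))
```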
-m = Contractions.search(token)
-if m:
+if m := Contractions.search(token):
Function splitToken refactored with the following changes:
- Use named expression to simplify assignment and conditional (use-named-expression)
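A sketch of the named-expression form, which requires Python 3.8 or later. The `Contractions` pattern and the `split_token` function below are simplified stand-ins invented for this example, not the project's actual definitions.

```python
import re

# Hypothetical stand-in for the tokenizer's Contractions pattern.
Contractions = re.compile(r"(?i)^(\w+)(n't|'ll|'re|'ve|'s|'d|'m)$")

def split_token(token):
    # The walrus operator binds the match and tests it in one step.
    if m := Contractions.search(token):
        return [m.group(1), m.group(2)]
    return [token]
```

On interpreters older than 3.8 the `:=` syntax is a `SyntaxError`, so this change is only safe if the project no longer supports them.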
-if args.printfeat:
-    return feat
-else:
-    return repr(feat)
+return feat if args.printfeat else repr(feat)
Function show refactored with the following changes:
- Replace if statement with if expression (assign-if-exp)
-    f = lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
-    yield f
+    yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
 else:
     if initializer is not None:
         initializer(*initargs)
-    f = imap
-    yield f
+    yield imap
Function MapPool refactored with the following changes:
- Inline variable that is immediately yielded [×2] (inline-immediately-yielded-variable)
-self.domain_index = dict((k,v) for v,k in enumerate(domains))
+self.domain_index = {k: v for v,k in enumerate(domains)}

 self.coverage_index = defaultdict(set)
-self.items = list()
+self.items = []
Function CorpusIndexer.__init__ refactored with the following changes:
- Replace list(), dict() or set() with comprehension (collection-builtin-to-comprehension)
- Replace list() with [] (list-literal)
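A short sketch showing that both forms build the same reverse index. The `domains` list is a made-up example.

```python
domains = ["news", "wiki", "web"]  # hypothetical domain names

# Old style: dict() over a generator of (key, value) pairs.
via_dict_call = dict((k, v) for v, k in enumerate(domains))
# New style: dict comprehension.
via_comprehension = {k: v for v, k in enumerate(domains)}

# list() and [] both produce a new empty list.
old_items = list()
new_items = []
```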
-reject_langs = set( l for l in lang_domain_count if lang_domain_count[l] < min_domain)
-
-# Remove the languages from the indexer
-if reject_langs:
+if reject_langs := {
+    l for l in lang_domain_count if lang_domain_count[l] < min_domain
+}:
     #print "reject (<{0} domains): {1}".format(min_domain, sorted(reject_langs))
-    reject_ids = set(self.lang_index[l] for l in reject_langs)
+    reject_ids = {self.lang_index[l] for l in reject_langs}

     new_lang_index = defaultdict(Enumerator())
-    lm = dict()
+    lm = {}
Function CorpusIndexer.prune_min_domain refactored with the following changes:
- Use named expression to simplify assignment and conditional (use-named-expression)
- Replace list(), dict() or set() with comprehension [×2] (collection-builtin-to-comprehension)
- Replace dict() with {} (dict-literal)
This removes the following comments (why?):
# Remove the languages from the indexer
-if args.model:
-    model_dir = args.model
-else:
-    model_dir = os.path.join('.', corpus_name+'.model')
+model_dir = args.model or os.path.join('.', f'{corpus_name}.model')
Lines 218-222 refactored with the following changes:
- Simplify if expression by using or (or-if-exp-identity)
- Replace if statement with if expression (assign-if-exp)
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
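The `or` form relies on truthiness, just as the original `if args.model:` test did, so falsy values such as `None` or an empty string fall through to the default in both versions. A sketch with hypothetical helper functions (the names and the `corpus_name` value are invented for illustration):

```python
import os

def model_dir_if_else(model, corpus_name):
    # Original shape: explicit if/else on the truthiness of `model`.
    if model:
        return model
    else:
        return os.path.join('.', corpus_name + '.model')

def model_dir_or(model, corpus_name):
    # Refactored shape: `or` returns the first truthy operand.
    return model or os.path.join('.', f'{corpus_name}.model')
```

The right-hand side of `or` is only evaluated when `model` is falsy, matching the `else` branch of the original.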
-tk_output_f = dict( (k,[feats[i] for i in v]) for k,v in tk_output.iteritems() )
+tk_output_f = {k: [feats[i] for i in v] for k,v in tk_output.iteritems()}
Function Scanner.from_file refactored with the following changes:
- Replace list(), dict() or set() with comprehension (collection-builtin-to-comprehension)
-goto = dict()
-fail = dict()
+goto = {}
+fail = {}
Function Scanner.build refactored with the following changes:
- Replace dict() with {} [×2] (dict-literal)
- Replace list(), dict() or set() with comprehension (collection-builtin-to-comprehension)
-for key in self.output.get(state, []):
-    yield key
+yield from self.output.get(state, [])
Function Scanner.search refactored with the following changes:
- Replace yield inside for loop with yield from (yield-from)
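A sketch comparing the two generator shapes, assuming Python 3.3 or later for `yield from`. The `output` mapping and both function names are invented for this example and only loosely mirror the scanner's structure.

```python
# Hypothetical state -> keys mapping, standing in for Scanner.output.
output = {0: ["a", "b"], 2: ["c"]}

def search_loop(states):
    # Original shape: explicit inner loop that yields each key.
    for state in states:
        for key in output.get(state, []):
            yield key

def search_yield_from(states):
    # Refactored shape: delegate to the iterable with `yield from`.
    for state in states:
        yield from output.get(state, [])
```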
-tk_output = {}
-for k,v in raw_output.items():
-    tk_output[k] = tuple(feat_index[f] for f in v)
+tk_output = {k: tuple(feat_index[f] for f in v) for k, v in raw_output.items()}
Function build_scanner refactored with the following changes:
- Convert for loop into dictionary comprehension (dict-comprehension)
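A small sketch of the loop-to-comprehension rewrite. The contents of `feat_index` and `raw_output` are made-up sample data.

```python
# Hypothetical sample data.
feat_index = {"ab": 0, "bc": 1, "cd": 2}
raw_output = {1: ["ab", "cd"], 5: ["bc"]}

# Original shape: build the dict one key at a time.
tk_output_loop = {}
for k, v in raw_output.items():
    tk_output_loop[k] = tuple(feat_index[f] for f in v)

# Refactored shape: a single dict comprehension.
tk_output_comp = {k: tuple(feat_index[f] for f in v) for k, v in raw_output.items()}
```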
-return dict((k,v) for (v,k) in enumerate(seq))
+return {k: v for (v,k) in enumerate(seq)}
Function index refactored with the following changes:
- Replace list(), dict() or set() with comprehension (collection-builtin-to-comprehension)
-if args.output:
-    output_path = args.output
-else:
-    output_path = input_path + '.scanner'
+output_path = args.output or f'{input_path}.scanner'
Lines 225-229 refactored with the following changes:
- Simplify if expression by using or (or-if-exp-identity)
- Replace if statement with if expression (assign-if-exp)
- Use f-string instead of string concatenation (use-fstring-for-concatenation)
-b_freq_lang = [tempfile.mkstemp(prefix=__procname+'-', suffix='.lang', dir=p)[0] for p in __b_dirs]
-b_freq_domain = [tempfile.mkstemp(prefix=__procname+'-', suffix='.domain', dir=p)[0] for p in __b_dirs]
+b_freq_lang = [
+    tempfile.mkstemp(prefix=f'{__procname}-', suffix='.lang', dir=p)[0]
+    for p in __b_dirs
+]
+
+b_freq_domain = [
+    tempfile.mkstemp(prefix=f'{__procname}-', suffix='.domain', dir=p)[0]
+    for p in __b_dirs
+]
Function pass_tokenize refactored with the following changes:
- Use f-string instead of string concatenation [×2] (use-fstring-for-concatenation)
-b_dirs = [ tempfile.mkdtemp(prefix="tokenize-",suffix='-{0}'.format(tokenizer.__class__.__name__), dir=outdir) for i in range(buckets) ]
+b_dirs = [
+    tempfile.mkdtemp(
+        prefix="tokenize-",
+        suffix='-{0}'.format(tokenizer.__class__.__name__),
+        dir=outdir,
+    )
+    for _ in range(buckets)
+]
Function build_index refactored with the following changes:
- Replace unused for index with underscore (for-index-underscore)
-if args.temp:
-    buckets_dir = args.temp
-else:
-    buckets_dir = os.path.join(args.model, 'buckets')
+buckets_dir = args.temp or os.path.join(args.model, 'buckets')
Lines 214-245 refactored with the following changes:
- Simplify if expression by using or [×2] (or-if-exp-identity)
- Replace if statement with if expression (assign-if-exp)
Sourcery Code Quality Report
✅ Merging this PR will increase code quality in the affected files by 0.76%.
Branch master refactored by Sourcery.
If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.