
Sourcery refactored master branch #7

Closed · wants to merge 1 commit

Conversation

sourcery-ai[bot] commented Jul 13, 2022

Branch master refactored by Sourcery.

If you're happy with these changes, merge this Pull Request using the Squash and merge strategy.

See our documentation here.

Run Sourcery locally

Reduce the feedback loop during development by using the Sourcery editor plugin.

Review changes via command line

To manually merge these changes, make sure you're on the master branch, then run:

git fetch origin sourcery/master
git merge --ff-only FETCH_HEAD
git reset HEAD^


-return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile).group(1)
+return re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)[1]

Function get_version refactored with the following changes:
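
For context, re.Match objects support subscripting (Python 3.6+), so m[1] returns the same group as m.group(1). A minimal standalone sketch of the pattern; the version string below is illustrative, not taken from the repository:

    import re

    initfile = '__version__ = "0.1.0"'   # illustrative content, not the real __init__.py
    m = re.search('__version__ = [\'"]([^\'"]+)[\'"]', initfile)
    if m is not None:                     # both forms raise if the search found nothing
        print(m.group(1))                 # 0.1.0
        print(m[1])                       # 0.1.0 -- same group via __getitem__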

-urlExtraCrapBeforeEnd = regex_or(punctChars, entity) + "+?"
+urlExtraCrapBeforeEnd = f"{regex_or(punctChars, entity)}+?"

Lines 58-195 refactored with the following changes:

This removes the following comments:

# because eyes on the right side is more ambiguous with the standard usage of : ;
#          between this and the Java version. One little hack won't hurt...
# TODO should try a big precompiled lexicon from Wikipedia, Dan Ramage told me (BTO) he does this
# iOS 'emoji' characters (some smileys, some symbols) [\ue001-\uebbb]
# myleott: o.O and O.o are two of the biggest sources of differences
# Standard version  :) :( :] :D :P
#inspired by http://en.wikipedia.org/wiki/User:Scapler/emoticons#East_Asian_style
# reversed version (: D:  use positive lookbehind to remove "(word):"
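
The urlExtraCrapBeforeEnd change above is a plain concatenation-to-f-string rewrite. A hedged sketch of the equivalence, with regex_or, punctChars and entity defined here only as stand-ins for the tokenizer's real definitions:

    def regex_or(*items):                     # stand-in for the module's helper
        return '(?:' + '|'.join(items) + ')'

    punctChars = r"['\".?!,:;]"               # illustrative values only
    entity = r"&(?:amp|lt|gt|quot);"

    old = regex_or(punctChars, entity) + "+?"
    new = f"{regex_or(punctChars, entity)}+?"
    assert old == new                         # identical regex fragment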

Comment on lines -233 to +240
-indices.append(first)
-indices.append(second)
+indices.extend((first, second))

Function simpleTokenize refactored with the following changes:
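
The append-to-extend change is behavior-preserving when building a list; a tiny illustration with made-up span boundaries:

    first, second = 3, 7                  # hypothetical indices

    indices_a = []
    indices_a.append(first)
    indices_a.append(second)

    indices_b = []
    indices_b.extend((first, second))     # one call, same resulting list

    assert indices_a == indices_b == [3, 7]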

Comment on lines -274 to +280
-m = Contractions.search(token)
-if m:
+if m := Contractions.search(token):

Function splitToken refactored with the following changes:
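
The named expression (walrus operator, Python 3.8+) folds the assignment into the if test. A self-contained sketch; the Contractions pattern below is a simplified stand-in, and the return value only approximates what splitToken does with the match:

    import re

    # Simplified stand-in for the precompiled Contractions regex.
    Contractions = re.compile(r"(?i)(\w+)(n['’]t|['’](?:ve|ll|d|re|s|m))$")

    def splitToken_sketch(token):
        if m := Contractions.search(token):   # assign and test in one expression
            return [m.group(1), m.group(2)]
        return [token]

    print(splitToken_sketch("don't"))   # ['do', "n't"]
    print(splitToken_sketch("word"))    # ['word']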

-if args.printfeat:
-    return feat
-else:
-    return repr(feat)
+return feat if args.printfeat else repr(feat)

Function show refactored with the following changes:
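
The if/else pair collapses into a conditional expression; a minimal sketch with a local printfeat flag standing in for args.printfeat:

    def show_sketch(feat, printfeat):
        return feat if printfeat else repr(feat)

    print(show_sketch("a\tb", True))    # raw value, tab expanded
    print(show_sketch("a\tb", False))   # 'a\tb' -- repr-escaped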

Comment on lines -134 to +139
-    f = lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
-    yield f
+    yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
 else:
     if initializer is not None:
         initializer(*initargs)
-    f = imap
-    yield f
+    yield imap

Function MapPool refactored with the following changes:
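
MapPool is a generator-based context manager that hands back a mapping callable, so the lambda can be yielded directly instead of being bound to a temporary name first. A hedged, much-reduced sketch: the real function also handles initializer/initargs as shown in the hunk, and the original targets Python 2, where imap came from itertools; Python 3's built-in map is already lazy.

    from contextlib import contextmanager
    import multiprocessing as mp

    @contextmanager
    def MapPool_sketch(processes=None, chunksize=1):
        if processes:
            with mp.Pool(processes) as pool:
                # yield the callable directly instead of naming it first
                yield lambda fn, chunks: pool.imap_unordered(fn, chunks, chunksize=chunksize)
        else:
            yield map     # serial fallback

    with MapPool_sketch() as f:
        print(list(f(abs, [-1, -2, 3])))   # [1, 2, 3]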

Comment on lines -94 to +97
-self.domain_index = dict((k,v) for v,k in enumerate(domains))
+self.domain_index = {k: v for v,k in enumerate(domains)}

 self.coverage_index = defaultdict(set)
-self.items = list()
+self.items = []

Function CorpusIndexer.__init__ refactored with the following changes:
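
Both the dict() generator form and the comprehension build the same value-to-position index; the domain names below are illustrative, not from the corpus:

    domains = ["wiki", "news", "web"]

    old_index = dict((k, v) for v, k in enumerate(domains))
    new_index = {k: v for v, k in enumerate(domains)}

    assert old_index == new_index == {"wiki": 0, "news": 1, "web": 2}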

Comment on lines -142 to +149
-reject_langs = set( l for l in lang_domain_count if lang_domain_count[l] < min_domain)
-
-# Remove the languages from the indexer
-if reject_langs:
+if reject_langs := {
+    l for l in lang_domain_count if lang_domain_count[l] < min_domain
+}:
     #print "reject (<{0} domains): {1}".format(min_domain, sorted(reject_langs))
-    reject_ids = set(self.lang_index[l] for l in reject_langs)
+    reject_ids = {self.lang_index[l] for l in reject_langs}

     new_lang_index = defaultdict(Enumerator())
-    lm = dict()
+    lm = {}

Function CorpusIndexer.prune_min_domain refactored with the following changes:

This removes the following comments:

# Remove the languages from the indexer
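
The pruning step above combines a set comprehension with a named expression: languages seen in fewer than min_domain domains are collected and tested in one statement. A sketch with made-up counts:

    lang_domain_count = {"en": 5, "fr": 2, "de": 1}   # hypothetical counts
    min_domain = 2

    if reject_langs := {l for l in lang_domain_count if lang_domain_count[l] < min_domain}:
        print(sorted(reject_langs))   # ['de']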

Comment on lines -218 to +217
-if args.model:
-    model_dir = args.model
-else:
-    model_dir = os.path.join('.', corpus_name+'.model')
-
+model_dir = args.model or os.path.join('.', f'{corpus_name}.model')

Lines 218-222 refactored with the following changes:
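
`x = a or b` keeps a when it is truthy and falls back to b otherwise, which replaces the if/else default here. One caveat worth noting: unlike an explicit `is None` check, it also falls through to the default when a is an empty string. A sketch with a local variable standing in for args.model:

    import os

    corpus_name = "mycorpus"                # illustrative name

    model_arg = None
    print(model_arg or os.path.join('.', f'{corpus_name}.model'))   # ./mycorpus.model

    model_arg = "/tmp/custom.model"
    print(model_arg or os.path.join('.', f'{corpus_name}.model'))   # /tmp/custom.model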

-tk_output_f = dict( (k,[feats[i] for i in v]) for k,v in tk_output.iteritems() )
+tk_output_f = {k: [feats[i] for i in v] for k,v in tk_output.iteritems()}

Function Scanner.from_file refactored with the following changes:
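
Note that the refactored line still calls .iteritems(), which exists only on Python 2 dicts; under Python 3 the same comprehension would use .items(). A sketch with made-up feature data:

    feats = ['aa', 'ab', 'ba']                     # hypothetical feature list
    tk_output = {0: [0, 2], 1: [1]}                # state -> feature indices

    tk_output_f = {k: [feats[i] for i in v] for k, v in tk_output.items()}
    print(tk_output_f)   # {0: ['aa', 'ba'], 1: ['ab']}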

Comment on lines -76 to +77
-goto = dict()
-fail = dict()
+goto = {}
+fail = {}

Function Scanner.build refactored with the following changes:

Comment on lines -176 to +177
-for key in self.output.get(state, []):
-    yield key
+yield from self.output.get(state, [])

Function Scanner.search refactored with the following changes:

  • Replace yield inside for loop with yield from (yield-from)
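
`yield from` delegates to the iterable and yields each of its items in turn, which is equivalent to the removed for loop here. A sketch with a stand-in output table:

    output = {3: ['foo', 'foobar'], 5: ['bar']}    # hypothetical state -> keys table

    def keys_for_state(state):
        # same as: for key in output.get(state, []): yield key
        yield from output.get(state, [])

    print(list(keys_for_state(3)))   # ['foo', 'foobar']
    print(list(keys_for_state(9)))   # []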

-tk_output = {}
-for k,v in raw_output.items():
-    tk_output[k] = tuple(feat_index[f] for f in v)
+tk_output = {k: tuple(feat_index[f] for f in v) for k, v in raw_output.items()}

Function build_scanner refactored with the following changes:
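
The loop that filled tk_output entry by entry becomes a single comprehension producing the same mapping. Illustrative inputs below:

    feat_index = {'aa': 0, 'ab': 1}                # hypothetical feature index
    raw_output = {2: ['aa', 'ab'], 4: ['ab']}

    tk_output = {k: tuple(feat_index[f] for f in v) for k, v in raw_output.items()}
    print(tk_output)   # {2: (0, 1), 4: (1,)}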

Comment on lines -212 to +210
-return dict((k,v) for (v,k) in enumerate(seq))
+return {k: v for (v,k) in enumerate(seq)}

Function index refactored with the following changes:

Comment on lines -225 to +223
-if args.output:
-    output_path = args.output
-else:
-    output_path = input_path + '.scanner'
-
+output_path = args.output or f'{input_path}.scanner'

Lines 225-229 refactored with the following changes:

Comment on lines -113 to +122
-b_freq_lang = [tempfile.mkstemp(prefix=__procname+'-', suffix='.lang', dir=p)[0] for p in __b_dirs]
-b_freq_domain = [tempfile.mkstemp(prefix=__procname+'-', suffix='.domain', dir=p)[0] for p in __b_dirs]
+b_freq_lang = [
+    tempfile.mkstemp(prefix=f'{__procname}-', suffix='.lang', dir=p)[0]
+    for p in __b_dirs
+]
+
+b_freq_domain = [
+    tempfile.mkstemp(prefix=f'{__procname}-', suffix='.domain', dir=p)[0]
+    for p in __b_dirs
+]

Function pass_tokenize refactored with the following changes:

Comment on lines -166 to +182
-b_dirs = [ tempfile.mkdtemp(prefix="tokenize-",suffix='-{0}'.format(tokenizer.__class__.__name__), dir=outdir) for i in range(buckets) ]
+b_dirs = [
+    tempfile.mkdtemp(
+        prefix="tokenize-",
+        suffix='-{0}'.format(tokenizer.__class__.__name__),
+        dir=outdir,
+    )
+    for _ in range(buckets)
+]

Function build_index refactored with the following changes:
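
Renaming the loop variable to `_` just signals that it is unused: the comprehension repeats the mkdtemp call `buckets` times. A runnable sketch; here dir=None makes mkdtemp fall back to the system temp directory rather than the real outdir:

    import tempfile

    buckets = 3
    b_dirs = [
        tempfile.mkdtemp(prefix="tokenize-", suffix="-Sketch", dir=None)
        for _ in range(buckets)
    ]
    print(len(b_dirs))   # 3 fresh scratch directories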

Comment on lines -214 to +230
-if args.temp:
-    buckets_dir = args.temp
-else:
-    buckets_dir = os.path.join(args.model, 'buckets')
+buckets_dir = args.temp or os.path.join(args.model, 'buckets')

Lines 214-245 refactored with the following changes:

adbar closed this Jul 13, 2022