Commit

also dropping sentences now!
Robert Meyer committed Feb 20, 2018
1 parent 615a1aa commit 814a9d3
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions trufflepig/preprocessing.py
@@ -221,6 +221,10 @@ def preprocess(post_df, ncores=4, chunksize=500,
         post_df.filtered_sentences,
         ncores=ncores,
         chunksize=chunksize)
+    post_df.drop('filtered_sentences', axis=1, inplace=True)
+    logger.info('Intermediate garbage collection.')
+    gc.collect()
+
     to_drop = post_df.loc[post_df.grammar_errors_per_sentence > max_grammar_errors_per_sentence]
     post_df.drop(to_drop.index, inplace=True)
     logger.info('Filtered according to grammar mistake limit {} per sentence '
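
The pattern this commit adds (drop an intermediate column in place, then request a garbage collection pass) can be tried in isolation with the minimal sketch below. The helper name `drop_intermediate_column` and the toy frame are hypothetical and not part of trufflepig; only the `drop`, `logger.info`, and `gc.collect()` calls mirror the diff. Whether memory is actually returned to the operating system depends on the interpreter and allocator, so treat this as an illustration rather than a guarantee.

```python
import gc
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def drop_intermediate_column(df, column):
    """Drop a column that is no longer needed and run a GC pass.

    Hypothetical helper for illustration; it modifies `df` in place,
    as the commit does with `post_df`.
    """
    df.drop(column, axis=1, inplace=True)
    logger.info('Intermediate garbage collection.')
    gc.collect()
    return df


# Toy frame standing in for post_df; the values are made up.
toy_df = pd.DataFrame({
    'filtered_sentences': [['a sentence'], ['another', 'one']],
    'grammar_errors_per_sentence': [0.1, 0.7],
})
drop_intermediate_column(toy_df, 'filtered_sentences')
remaining_columns = list(toy_df.columns)
```

Dropping the column before the grammar-error filter means the later row drop operates on a narrower frame, which is presumably why the collection is placed at this point in `preprocess`.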
