Commit

Merge pull request #13 from hugo-quantmetry/master
fixed tokenizer from conf file
hugo-quantmetry committed Aug 28, 2019
2 parents 9a9d54d + 40aef80 commit 038641b
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion melusine/nlp_tools/tokenizer.py
@@ -121,7 +121,7 @@ def tokenize(self, row):
     def _tokenize(self, text, pattern=regex_tokenize):
         """Returns list of tokens from text."""
         if isinstance(text, str):
-            tokens = re.findall("\w+(?:[\?\-\"_]\w+)*", text, re.M+re.DOTALL)
+            tokens = re.findall(pattern, text, re.M+re.DOTALL)
             tokens = self._remove_stopwords(tokens)
         else:
             tokens = []
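
For context, the following is a minimal standalone sketch (not the melusine Tokenizer class itself; stopword removal is omitted, and the example patterns and strings are illustrative). It shows the effect of the fix: the pattern argument, which can be loaded from the conf file, is now actually passed to re.findall instead of being shadowed by the hardcoded default.

import re

# Default pattern, the same one that was hardcoded before the fix.
regex_tokenize = r"\w+(?:[\?\-\"_]\w+)*"

def tokenize(text, pattern=regex_tokenize):
    """Return the list of tokens matched by `pattern` in `text` (sketch only)."""
    if isinstance(text, str):
        return re.findall(pattern, text, re.M + re.DOTALL)
    return []

# With the default pattern, hyphenated words are kept whole.
print(tokenize("re-send the e-mail"))           # ['re-send', 'the', 'e-mail']
# A different pattern, e.g. one loaded from a conf file, is now honoured.
print(tokenize("re-send the e-mail", r"\w+"))   # ['re', 'send', 'the', 'e', 'mail']

Before the fix, the second call would have produced the same output as the first, because the hardcoded pattern always took precedence over the one passed in.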

0 comments on commit 038641b