@bact commented Sep 21, 2019

No description provided.

bact and others added 30 commits May 10, 2019 07:45
Merge from 2.0.5 release
1.  Move all functions related to text processing to the new file
`ulmfit/rules.py`

2. For the rules imported from the `fastai` library, we copied the code
into the pythainlp library

3. Use the code of the BaseTokenizer class from the `fastai` library
Function ungroup_emoji has a Cognitive Complexity of 7
(exceeds 5 allowed). Consider refactoring.
Changes in this commit:

- assert the output of the method `get_ner`
   when argument `tag` is set to True
   for all 13 tags as described in
   (https://github.com/wannaphongcom/thai-ner/tree/master/model/1.2)

- assert the output of the method `get_ner`
   when argument `pos` is set to True

- assert the output of the method `get_ner`
   when argument `pos` is set to False
As thainer version 1.2 provides different results from the previous version
- Add a new function `process_thai` to the `ulmfit` module.
This function processes Thai text
(for example, replacing repeated characters or words).
More details will be given
in the release notes.
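
To illustrate what "replacing repeated characters" could look like, here is a minimal sketch. It is not the code from this PR: the function name `collapse_repeated_chars`, the marker token `xxrep`, and the threshold of three repeats are all assumptions made for the example, and the actual rules in `ulmfit/rules.py` may behave differently.

```python
import re

# Hypothetical marker token; the token actually used by pythainlp may differ.
TK_REP = "xxrep"

# A non-space character followed by at least two more copies of itself.
_REPEAT_RE = re.compile(r"(\S)\1{2,}")


def collapse_repeated_chars(text: str) -> str:
    """Keep one copy of a character repeated 3+ times, then add a marker."""
    return _REPEAT_RE.sub(lambda m: m.group(1) + " " + TK_REP + " ", text)


print(collapse_repeated_chars("cooool!"))
```

Runs shorter than three characters are left untouched, so ordinary doubled letters are not affected.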
Changes in this commit:

- add new test case: test_remove_space
- add new test case: test_replace_wrep_post_nonum
- add new test case: test_replace_wrep_post
- add new test case: test_replace_rep_nonum
- add new test case: test_replace_rep_after
- remove a test case: test_replace_all_caps (as the function `replace_all_caps` was removed from /pythainlp/ulmfit/__init__.py)
@pep8speaks

Hello @bact! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 31:80: E501 line too long (194 > 79 characters)
Line 32:80: E501 line too long (182 > 79 characters)
Line 33:80: E501 line too long (174 > 79 characters)
Line 41:80: E501 line too long (83 > 79 characters)

Line 13:1: E302 expected 2 blank lines, found 1
Line 13:80: E501 line too long (97 > 79 characters)
Line 14:72: E225 missing whitespace around operator
Line 14:80: E501 line too long (86 > 79 characters)
Line 16:80: E501 line too long (87 > 79 characters)
Line 17:80: E501 line too long (96 > 79 characters)
Line 23:80: E501 line too long (85 > 79 characters)
Line 29:80: E501 line too long (85 > 79 characters)
Line 30:28: E231 missing whitespace after ','
Line 34:1: E302 expected 2 blank lines, found 1
Line 34:80: E501 line too long (96 > 79 characters)
Line 34:97: W291 trailing whitespace
Line 35:48: E225 missing whitespace around operator
Line 37:80: E501 line too long (87 > 79 characters)
Line 38:80: E501 line too long (112 > 79 characters)
Line 44:80: E501 line too long (85 > 79 characters)
Line 45:80: E501 line too long (106 > 79 characters)
Line 50:23: E225 missing whitespace around operator
Line 51:31: E231 missing whitespace after ','
Line 51:40: E231 missing whitespace after ','
Line 51:50: E231 missing whitespace after ','
Line 51:58: E231 missing whitespace after ','
Line 51:80: E501 line too long (81 > 79 characters)
Line 53:29: E231 missing whitespace after ','
Line 53:39: E231 missing whitespace after ','
Line 53:47: E231 missing whitespace after ','
Line 53:55: E231 missing whitespace after ','
Line 57:1: E302 expected 2 blank lines, found 1
Line 57:80: E501 line too long (110 > 79 characters)
Line 57:90: E231 missing whitespace after ','
Line 57:95: E252 missing whitespace around parameter equals
Line 57:96: E252 missing whitespace around parameter equals
Line 57:98: E231 missing whitespace after ','
Line 57:103: E225 missing whitespace around operator
Line 59:80: E501 line too long (87 > 79 characters)
Line 69:26: E225 missing whitespace around operator
Line 69:43: E227 missing whitespace around bitwise or shift operator
Line 69:53: E225 missing whitespace around operator
Line 78:65: E231 missing whitespace after ','
Line 84:80: E501 line too long (86 > 79 characters)

Line 97:1: W293 blank line contains whitespace

Line 40:80: E501 line too long (80 > 79 characters)
Line 66:80: E501 line too long (83 > 79 characters)
Line 435:1: W293 blank line contains whitespace
Line 435:1: W391 blank line at end of file

Line 14:1: W293 blank line contains whitespace

Line 27:1: W391 blank line at end of file

Line 30:9: E225 missing whitespace around operator
Line 40:1: E302 expected 2 blank lines, found 1
Line 54:8: W291 trailing whitespace
Line 55:80: E501 line too long (2015 > 79 characters)
Line 386:80: E501 line too long (83 > 79 characters)

Line 182:5: E303 too many blank lines (2)
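
For readers unfamiliar with the most frequent codes in the report above, the following sketch shows code written to satisfy E225, E231, E252, and E302. The functions `scale` and `describe` are hypothetical and do not come from this PR; they exist only to show the spacing conventions.

```python
def scale(values, factor=2):
    # E225: spaces around the binary operator `*`.
    return [v * factor for v in values]


# E302: two blank lines separate top-level definitions.
def describe(name: str, count: int = 0) -> str:
    # E252: spaces around `=` when the parameter is annotated.
    total = count + 1
    # E231: a space follows every comma.
    return "{}:{}".format(name, total)


print(scale([1, 2]), describe("a"))
```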

@coveralls

Coverage increased (+2.9%) to 90.069% when pulling b9025aa on dev into d61bfa3 on 2.1.

@bact merged commit 984f3f7 into 2.1 on Sep 21, 2019

9 participants