[FEATURE] Add text format segmentation #30
The checklist is not completed. |
Codecov Report
@@ Coverage Diff @@
## master #30 +/- ##
=======================================
Coverage 99.85% 99.85%
=======================================
Files 46 46
Lines 1338 1342 +4
=======================================
+ Hits 1336 1340 +4
Misses 2 2
Continue to review full report at Codecov.
tswsxk
left a comment
Please add some examples to sif4sci in EduNLP/SIF/sif.py and to seg in EduNLP/SIF/segment/segment.py; they will help users understand how to deal with \textbf.
In addition, the checklist should be completed.
The checklist should be completed.
- add text format examples
- fix the bug where a newly added text_segment may be of type str rather than TextSegment
EduNLP/SIF/segment/segment.py
Outdated
| def append(self, segment) -> None: | ||
| if isinstance(segment, TextSegment): | ||
| self._text_segments.append(len(self)) | ||
| if len(self._text_segments) != 0 and self._text_segments[-1] == len(self) - 1: |
Why modify these lines?
I think \textf{} can simply be treated as a TextSegment.
As you said,
"hello world, I am \textf{Robot, b}" should be aggregated into a single text segment: ['hello world, I am Robot'].
Without the if branch, it would be split into ['hello world, I am', 'Robot'].
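The merging behaviour discussed here can be sketched in isolation. This is a minimal illustration, not the EduNLP implementation: `SegmentList` and this `TextSegment` are hypothetical stand-ins, and only the `_text_segments` index list and the `len(self) - 1` adjacency check are taken from the diff. It also shows why the merged value has to be re-wrapped, since concatenating two `str` subclasses yields a plain `str`.

```python
class TextSegment(str):
    """Hypothetical stand-in for a text segment type (assumption, not EduNLP's class)."""


class SegmentList:
    """Toy container that merges adjacent text segments on append."""

    def __init__(self):
        self._segments = []
        self._text_segments = []  # indices of text segments inside self._segments

    def __len__(self):
        return len(self._segments)

    def append(self, segment) -> None:
        if isinstance(segment, TextSegment):
            # If the previous segment is also text, merge instead of adding a new entry.
            if self._text_segments and self._text_segments[-1] == len(self) - 1:
                # str + str returns a plain str, so re-wrap to keep the TextSegment type.
                self._segments[-1] = TextSegment(self._segments[-1] + segment)
            else:
                self._text_segments.append(len(self))
                self._segments.append(segment)
        else:
            self._segments.append(segment)

    @property
    def text_segments(self):
        return [self._segments[i] for i in self._text_segments]


segments = SegmentList()
segments.append(TextSegment("hello world, I am "))
segments.append(TextSegment("Robot"))
print(segments.text_segments)
```

Dropping the adjacency branch in this sketch would leave two separate entries in `text_segments`, which is the split result described above.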
However, why not first check whether the text contains textf and process it with a regex?
I do not think processing it in two steps is a good approach: you break the original code logic without enough test cases, and the change has in fact already failed at the test stage.
Got it, let me give it another shot
In addition, your modification does not pass the tests, so the merge is blocked.
Oops, I didn't know it could be tested locally. QAQ
I made some changes in sif.py/is_sif, in order to avoid Chinese character warning when parsing in |
The tests do not pass; please make them pass before you open a PR.
tswsxk
left a comment
I think these changes violate our processing logic.
First, let us make the roles of the three steps clear:
1. is_sif: only judges whether the item follows the SIF protocol;
2. to_sif: only converts a non-SIF item into the SIF protocol;
3. sif4sci: conducts syntax analysis on an item that is already in the SIF protocol.
Your changes break this separation of responsibilities, which is unacceptable. Please only modify the code in segment.py where the sentence contains |
If is_sif raises some warnings, contact @karin0018 for modification. |
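The three-step separation described in this review (check only, convert only, analyse only) can be illustrated with a toy pipeline. Everything below is an assumption made for illustration: the `toy_*` functions are not the EduNLP functions, and the "balanced `$`" rule is an invented placeholder for the real protocol checks. Only the `re.split(r"(\$.+?\$)", item)` call is taken from the diff.

```python
import re


def toy_is_sif(item: str) -> bool:
    """Step 1 (toy): only judge; never modify the item.
    Invented rule: every '$' delimiter must be paired."""
    return item.count("$") % 2 == 0


def toy_to_sif(item: str) -> str:
    """Step 2 (toy): only convert a non-conforming item.
    Invented fix: close a dangling '$'."""
    return item if toy_is_sif(item) else item + "$"


def toy_sif4sci(item: str) -> list:
    """Step 3 (toy): only analyse an item that already conforms."""
    if not toy_is_sif(item):
        raise ValueError("item must be converted before analysis")
    # Split into text and formula pieces, keeping the formulas.
    return [piece for piece in re.split(r"(\$.+?\$)", item) if piece]


raw = "x is $x^2"
print(toy_sif4sci(toy_to_sif(raw)))
```

Keeping each function to one job means a change to formula or text-format handling stays inside the analysis step and cannot alter what counts as a valid item.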
|
Run |
Delete a blank line which results in error
EduNLP/SIF/segment/segment.py
Outdated
| self._tag_segments = [] | ||
| self._sep_segments = [] | ||
| segments = re.split(r"(\$.+?\$)", item) | ||
| item_detextf = '' |
The variable name is not intuitive enough; use a full name.
In addition, please add a short but clear comment here.
Rename variable and add annotation for removing `$\textf{}$`
EduNLP/SIF/segment/segment.py
Outdated
| segments = re.split(r"(\$.+?\$)", item) | ||
| remove_textf_item = '' | ||
| remove_textf_segments = re.split(r"\$\\textf\{([^,]+?),b?d?i?t?u?w?}\$", item) | ||
| # 按照$\textf{}$切割,$\textf{}$段仅捕获文本内容 |
Use English
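The regex in the snippet above can be tried on its own. This is a small sketch under stated assumptions: the pattern is copied from the diff, while `strip_textf` and `TEXTF_PATTERN` are hypothetical names, and the input uses the `$\textf{text,flags}$` form with no space after the comma because that is what the pattern accepts.

```python
import re

# Pattern copied from the diff under review; group 1 captures only the text content.
TEXTF_PATTERN = re.compile(r"\$\\textf\{([^,]+?),b?d?i?t?u?w?}\$")


def strip_textf(item: str) -> str:
    """Replace each $\\textf{text,flags}$ span with its plain text (hypothetical helper)."""
    return TEXTF_PATTERN.sub(lambda match: match.group(1), item)


item = r"hello world, I am $\textf{Robot,b}$"
plain = strip_textf(item)
print(plain)

# After stripping, the usual formula split no longer sees a $...$ span here,
# so the sentence stays in one piece.
print(re.split(r"(\$.+?\$)", plain))
```

Because the text-format spans are removed before the `$...$` split, ordinary formulas such as `$x^2$` are left untouched for the existing segmentation logic.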
EduNLP/SIF/segment/segment.py
Outdated
| self._tag_segments = [] | ||
| self._sep_segments = [] | ||
| segments = re.split(r"(\$.+?\$)", item) | ||
| remove_textf_item = '' |
Maybe item_no_textf?
I will handle this
Thanks for sending a pull request!
Please make sure you click the link above to view the contribution guidelines,
then fill out the blanks below.
Description
(Brief description on what this PR is about)
What does this implement/fix? Explain your changes.
Add the corresponding Text Format code to EduNLP/SIF/parser and EduNLP/SIF/segment; tests pass.
Pull request type
Changes
Does this close any currently open issues?
N/A
Any relevant logs, error output, etc?
N/A
Checklist
Before you submit a pull request, please make sure you have the following:
Essentials
Comments