Describe the problem you encountered
`extract_email` performs very badly, even worse than a simple regex. Is this expected, or am I doing something wrong? May…
Version:
Python version: 3.10
jionlp version: 1.5.2
jionlp calling code and input text (Code & Text):
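```python
import jionlp as jio

text = '请发简历至dongyu@163.com。'   # "Please send your résumé to dongyu@163.com."
text1 = '请发简历至dongyu@163.com'    # same text without the trailing full stop

print(jio.extract_email(text, detail=True))   # this works
print(jio.extract_email(text1, detail=True))  # returns empty
```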
Expected behavior
If the result is not satisfactory, describe what you expected to happen.
I have fixed the bug for your example and pushed the fix in the latest commit. The cause: when I wrote the regex, I padded the input text with `#` at the beginning and end. `#` is a poor choice of padding character because the email regex also accepts it as a valid email character.
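To illustrate why `#` padding breaks boundary detection, here is a minimal sketch. It is a hypothetical reconstruction, not jionlp's actual pattern: it assumes an email character class that includes `#` (legal in the local part of an address) and lookarounds that require a non-email character on each side. The names `EMAIL_CHARS`, `extract_padded`, and `extract_unpadded` are made up for this example.

```python
import re
from typing import List

# Hypothetical set of "email characters". '#' is included because it is a
# legal character in the local part of an address (RFC 5322 atext).
EMAIL_CHARS = r"A-Za-z0-9._%+#-"

# Positive lookarounds: the address must be preceded and followed by a
# character that is NOT an email character. A positive lookaround cannot
# match at the very start or end of the string, hence the padding.
PADDED_PATTERN = re.compile(
    rf"(?<=[^{EMAIL_CHARS}])[{EMAIL_CHARS}]+@[A-Za-z0-9.-]+\.[A-Za-z]{{2,}}"
    rf"(?=[^{EMAIL_CHARS}])"
)

# Negative lookarounds also succeed at string boundaries, so no padding is needed.
UNPADDED_PATTERN = re.compile(
    rf"(?<![{EMAIL_CHARS}])[{EMAIL_CHARS}]+@[A-Za-z0-9.-]+\.[A-Za-z]{{2,}}"
    rf"(?![{EMAIL_CHARS}])"
)


def extract_padded(text: str, pad: str) -> List[str]:
    """Surround the text with `pad` on both ends, then find addresses."""
    return PADDED_PATTERN.findall(pad + text + pad)


def extract_unpadded(text: str) -> List[str]:
    """Find addresses without any padding."""
    return UNPADDED_PATTERN.findall(text)


# With '#' padding, "…163.com#" ends in an email character, so the
# lookahead fails and nothing is found; '。' or ' ' works as a boundary.
print(extract_padded("请发简历至dongyu@163.com。", "#"))  # ['dongyu@163.com']
print(extract_padded("请发简历至dongyu@163.com", "#"))    # []
print(extract_padded("请发简历至dongyu@163.com", " "))    # ['dongyu@163.com']
print(extract_unpadded("请发简历至dongyu@163.com"))       # ['dongyu@163.com']
```

Switching to negative lookarounds is one way to drop the sentinel character entirely; whether the actual fix took this route is not stated in the thread.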
Besides, my email regex cannot cover every case. For example, Chinese characters are actually legal in email addresses, but the regex does not match them. If you find other exceptions, feel free to open an issue.
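The following sketch illustrates one trade-off behind that limitation. It compares a hypothetical ASCII-only pattern with a hypothetical Unicode-aware one. Internationalized addresses (SMTPUTF8, RFC 6531/6532) may contain CJK characters, but in unspaced Chinese text a pattern that accepts them can no longer tell where the surrounding sentence ends and the address begins. Neither pattern is jionlp's.

```python
import re

# ASCII-only sketch: CJK characters are not in the class, so in unspaced
# Chinese text they act as natural boundaries around the address.
ASCII_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Unicode-aware sketch: in a str pattern, \w also matches CJK characters,
# so internationalized addresses such as 张三@例子.中国 are accepted.
UNICODE_EMAIL = re.compile(r"[\w.%+-]+@[\w-]+(?:\.[\w-]+)+")

text = "请发简历至dongyu@163.com"
print(ASCII_EMAIL.findall(text))    # ['dongyu@163.com']
print(UNICODE_EMAIL.findall(text))  # ['请发简历至dongyu@163.com'] (swallows the sentence)
print(UNICODE_EMAIL.findall("联系 张三@例子.中国 即可"))  # ['张三@例子.中国']
```

Covering Chinese addresses therefore needs some extra boundary signal, such as whitespace or punctuation, rather than just a wider character class.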
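When asking a question, please respect me! State the necessary information clearly: your environment, the exact input text, and which function you ran.
Don't just toss off one vague sentence. I can't locate the problem from that, and it wastes time. Issues like that will be closed immediately.
Description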
The email pattern seems quite limited. It fails to extract emails in many cases. A simple modification of the example makes the extraction fail:
请发简历至dongyu@163.com
I dug a bit deeper and found that your regex pattern is problematic.
I think it would be better to change it to a more general pattern. Here is what I found:
I also think the following line is problematic:
Apparently the search can return `None`, and that case is not handled. I found similar problems repeatedly throughout the code. Adding type hints and running mypy before each release and before merging PRs would probably prevent many issues like this.
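A minimal sketch of the suggested pattern follows. It guards the `None` returned by `re.search` and uses type hints so mypy can catch an unguarded access. The function `first_email` and the pattern `EMAIL_RE` are hypothetical illustrations, not the line referenced above or jionlp's code.

```python
import re
from typing import Optional

# Hypothetical, deliberately simple ASCII email pattern.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def first_email(text: str) -> Optional[str]:
    """Return the first address in `text`, or None if there is none."""
    match = EMAIL_RE.search(text)  # a match object, or None when nothing matches
    # Without this guard, match.group() raises AttributeError when there is
    # no match, and mypy reports the attribute access on an Optional value.
    if match is None:
        return None
    return match.group()


print(first_email("请发简历至dongyu@163.com"))  # dongyu@163.com
print(first_email("没有邮箱"))                  # None
```

Because typeshed annotates `search` as returning an optional match, running `mypy` over code that calls `.group()` directly on the result flags it before release, with no runtime test needed.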
Expected behavior