I don't think an arbitrary number would work in my implementation, but I can raise the limit to any finite number you want (say 2 or 3 punctuation marks). I think one is fine; more than two is probably overkill.
Edit: Maybe allow 3 or 4 to cover "...", although I'm not sure how often people use an ellipsis after a date. I could see people writing a date like this: He said, "The date is 1/2/13." So raising the limit is probably a good idea, and I can allow any number of punctuation marks after the date, just not before it.
I tend to agree, but the one thing that concerns me is that this worked before 0.15.0 (I used 0.13.0 as an example):
venv ❯ python3
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import arrow
>>> arrow.__version__
'0.13.0'
>>> arrow.get("This date has too many punctuation marks following it 11.11.2011", "DD.MM.YYYY")
<Arrow [2011-11-11T00:00:00+00:00]>
>>> arrow.get("This date has too many punctuation marks following it (11.11.2011)", "DD.MM.YYYY")
<Arrow [2011-11-11T00:00:00+00:00]>
>>> arrow.get("This date has too many punctuation marks following it (11.11.2011).", "DD.MM.YYYY")
<Arrow [2011-11-11T00:00:00+00:00]>
This is definitely an improvement, but to restore the full pre-0.15.0 behavior while keeping the improvements, we probably need to support any number of punctuation marks. Out of curiosity, why would a finite number work but not an unbounded one (e.g. with the + quantifier in regex)?
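One possible explanation, assuming the implementation uses a regex lookbehind to skip punctuation before the date (the thread does not confirm this): Python's `re` module only accepts fixed-width lookbehind patterns, while lookaheads may have any width. That would match the "after the date, just not before it" remark. A minimal sketch; the `compiles` helper and the patterns are illustrative and are not arrow's parser code:

```python
import re


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# Fixed-width lookbehind: exactly two non-word characters before a digit.
fixed_lookbehind = compiles(r"(?<=\W{2})\d")

# Variable-width lookbehind: one or more non-word characters before a digit.
# The re module rejects this ("look-behind requires fixed-width pattern").
variable_lookbehind = compiles(r"(?<=\W+)\d")

# Variable-width lookahead: one or more non-word characters after a digit.
variable_lookahead = compiles(r"\d(?=\W+)")

print(fixed_lookbehind, variable_lookbehind, variable_lookahead)
```

Consuming the leading punctuation in the main pattern (outside any lookbehind) would avoid the restriction, at the cost of having to capture the date in its own group.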
jadchaar changed the title from "Handling of punctuation in string parsing" to "Improve string parsing of natural language string with punctuation" on Dec 21, 2019
jadchaar changed the title from "Improve string parsing of natural language string with punctuation" to "Improve parsing of natural language string with punctuation" on Dec 21, 2019
We definitely need to figure out a way to make the regex simpler and more general. It would be nice to allow any number of punctuation marks rather than hardcoding a limit.
A leading boundary of (?<![\S]) (not preceded by a non-whitespace character) and a trailing boundary of (?![\w]) (not followed by a word character) could be a possibility.
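To see what these two boundaries would accept, here is a rough sketch that wraps them around a hand-written DD.MM.YYYY pattern. `DATE`, `PATTERN`, and `find_date` are hypothetical names for illustration only; arrow builds its patterns from the format string and is not reproduced here:

```python
import re

# Hand-written stand-in for the "DD.MM.YYYY" format used in the examples.
DATE = r"\d{2}\.\d{2}\.\d{4}"

# Leading boundary: not preceded by a non-whitespace character.
# Trailing boundary: not followed by a word character.
PATTERN = re.compile(r"(?<![\S])(" + DATE + r")(?![\w])")


def find_date(text):
    """Return the matched date substring, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


plain = find_date("following it 11.11.2011")
trailing = find_date('He said, the date is 11.11.2011...')
leading = find_date("following it (11.11.2011).")
glued = find_date("following it 11.11.2011abc")

print(plain, trailing, leading, glued)
```

With these boundaries any amount of trailing punctuation is accepted, but a date preceded directly by punctuation such as "(" is still rejected, so the parenthesized case from the 0.13.0 session would need a looser leading boundary.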
The comment above on the punctuation limit was originally posted by @andrewchouman in #720.
The reply above with the 0.13.0 session was originally posted by @jadchaar in #720.