Problem finding dates with EXTRA_TOKENS_PATTERNS words in sentence #14

Closed
AngelaO opened this issue Feb 24, 2016 · 4 comments

@AngelaO

AngelaO commented Feb 24, 2016

Hi,
Thanks for writing this module. I've been playing around with it and have found what seems to be an issue finding dates when words like "to", "by" and "until" are in the string. I notice these words are included in EXTRA_TOKENS_PATTERNS in datefinder.py, but I'm not really familiar with the dateutil module, so I'm not sure why this should cause an issue. Below is some output showing examples where dates aren't identified, and how swapping the word "to" for the word "so" means the dates are correctly identified:

>>> import datefinder
>>> chk = "i am looking for a date june 4th 1996 to july 3rd 2013"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[]
>>> chk = "i am looking for a date june 4th 1996 so july 3rd 2013"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1996, 6, 4, 0, 0), datetime.datetime(2013, 7, 3, 0, 0)]
>>> chk = "october 27 1994 to be put into effect on june 1 1995"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1995, 6, 1, 0, 0)]
>>> chk = "october 27 1994 so be put into effect on june 1 1995"
>>> matches = datefinder.find_dates(chk)
>>> matchlist = list(matches)
>>> matchlist
[datetime.datetime(1994, 10, 27, 0, 0), datetime.datetime(1995, 6, 1, 0, 0)]
@ranchodeluxe
Collaborator

@AngelaO: thanks for your feedback! I'll look into this later this morning.

Did you install datefinder through pip? Or did you clone the github repo?

ranchodeluxe added a commit to ranchodeluxe/datefinder that referenced this issue Feb 24, 2016
1. handle dateutil.parser.parse ValueError
2. handle REPLACEMENT tokens such as 'to'
ranchodeluxe added a commit to ranchodeluxe/datefinder that referenced this issue Feb 24, 2016
dateutil.parser.parse throws ValueError for crud date strings, handle it
@ranchodeluxe
Collaborator

@AngelaO: your wonderful feedback has been very fruitful, thank you.

It's led to the tickets #15, #16, #17 and, most importantly, #18 (which your first example highlights). I think I've got a fix for #16 and #17 (your third example).

More to come soon!

@AngelaO
Author

AngelaO commented Feb 25, 2016

@thebigspoon no worries. I installed through pip. For now I have been removing the EXTRA_TOKENS_PATTERNS words from my strings, as I'm not interested in date ranges, just strict dates, but I will check back for the fixes. Thank you!
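
For anyone stuck on an affected version, here is a minimal sketch of that kind of pre-processing workaround. It is not part of datefinder: the function name `neutralize_range_words` is hypothetical, the word list is an assumption taken from the words mentioned in this thread ("to", "by", "until") rather than from the library source, and the default replacement "so" simply mirrors the experiment in the original report.

```python
import re

# Assumed list: the connector words named in this thread, not read from
# datefinder's EXTRA_TOKENS_PATTERNS.
RANGE_WORDS = ("to", "by", "until")


def neutralize_range_words(text, words=RANGE_WORDS, replacement="so"):
    """Replace whole-word occurrences of `words` with `replacement`.

    Word boundaries keep substrings such as the "to" in "october" or
    "into" untouched.
    """
    alternation = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return pattern.sub(replacement, text)


cleaned = neutralize_range_words(
    "i am looking for a date june 4th 1996 to july 3rd 2013"
)
print(cleaned)  # i am looking for a date june 4th 1996 so july 3rd 2013
```

The cleaned string can then be passed to `datefinder.find_dates` as in the examples above. Replacing rather than deleting the word keeps some text between the two dates, which is what the "so" examples in the report relied on.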

@akoumjian
Owner

All these issues have now been addressed!

In [17]: chk = "i am looking for a date june 4th 1996 to july 3rd 2013"

In [18]: print(next(datefinder.find_dates(chk, source=True)))
(datetime.datetime(1996, 6, 4, 0, 0), 'date june 4th 1996')

In [19]: chk = "i am looking for a date june 4th 1996 so july 3rd 2013"

In [20]: print(list(datefinder.find_dates(chk, source=True)))
[(datetime.datetime(1996, 6, 4, 0, 0), 'date june 4th 1996'), (datetime.datetime(2013, 7, 3, 0, 0), 'july 3rd 2013')]

In [21]: chk = "october 27 1994 to be put into effect on june 1 1995"

In [22]: print(list(datefinder.find_dates(chk, source=True)))
[(datetime.datetime(1994, 10, 27, 0, 0), 'october 27 1994'), (datetime.datetime(1995, 6, 1, 0, 0), 'on june 1 1995')]

In [23]: chk = "october 27 1994 so be put into effect on june 1 1995"

In [24]: print(list(datefinder.find_dates(chk, source=True)))
[(datetime.datetime(1994, 10, 27, 0, 0), 'october 27 1994'), (datetime.datetime(1995, 6, 1, 0, 0), 'on june 1 1995')]

In [25]: chk = 'june 5th 2012 to january 1st 2014'

In [26]: print(list(datefinder.find_dates(chk, source=True)))
[(datetime.datetime(2012, 6, 5, 0, 0), 'june 5th 2012'), (datetime.datetime(2014, 1, 1, 0, 0), 'january 1st 2014')]
