New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved date parsing #122

Merged
merged 6 commits into from Nov 17, 2014

Conversation

Projects
None yet
3 participants
@alicebartlett
Contributor

alicebartlett commented Nov 14, 2014

Trello ticket: https://trello.com/c/DoeSThvG/401-finders-increase-the-range-of-dates-that-we-support-3

Opening this up for comment @james and I did a bit of pairing on this.

Our user research and a look at analytics showed us that we weren't being as flexible with dates as our users expected, so we've written a test suite of all the dates we'd like to accept and have switched to using the Chronic date parsing gem and rewritten the date parser to accept the most commonly input dates.

Thoughts welcome, I'm interested on any feedback on the clarity of the code especially, as there is quite a lot of date fiddling going on to support a few additional formats for which chronic doesn't work.

Ideally after this is merged here I'd like to do the same thing for the code supporting gov.uk/publications as I think this is better than what's currently there.

Alice Bartlett added some commits Nov 14, 2014

Alice Bartlett
Test suite for new date parser
Bin all the old tests, a new approach for this is that we have a whole load of dates that we have seen users entering (through analytics) and we want to support the most common of these.

We have a bit of meta-programming to run through all of these dates. Meta-programming is a little complex (for me, at least), but does allow us to have a test for each date without having to actually write a test for each date. Having one test per date means we have one failing test per failing date, so it's easier to assess the health of our code as we can say "7 out of 10 tests are failing" rather than "our test is failing for an unknown number of dates because it fails for one date and then doesn't run the rest"

The second test stubs takes a date with unspecified year and makes sure the date we assume is this year by stubbing Time.
Alice Bartlett
Add Chronic gem
We're going to use Chronic to handle dates, simply because a few more tests in the test suite pass with chronic than with Date.parse and we're trying to write as little date parsing as possible.
Alice Bartlett
Re-write date parsing
User research showed us that some users were using a lot broader date types than we were supporting (specifically, just entering a year, no month, no day) We looked into this a little deeper and found there were a couple of date formats we weren't supporting that people were entering.

Now we have a full test suite for all the dates we want to accept, we can add the code required to make it pass. Date parsing is largely handled well by the Chronic gem, with three notable exceptions:

Chronic parses single years ie `2008` as a time: `20:08`. We never expect users to enter a time into a date field on a finder so we are going to assume that four digits is always a year and so convert it to 01/01/yyyy

Chronic doesn't handle eight digits without delimiters. Since eight digits is an unambiguous date so long as you're assuming ddmmyyyy order we can guess at this by converting dddddddd into dd/dd/dddd. The 'so long as you're assuming' here is based on the analytics for this page which show us that a small number of users have put in 8 digits in a row which can be interpreted as ddmmyyyy. None have done yyyymmdd.

Finally, chronic doesn't handle spaces or full stops to delimit dates so when we find numeric dates seperated by these we convert them to /'s

Then pass the date to Chronic, and rescue nil, which is what is returned if the date is invalid.
parsed_date.month.should == 9
parsed_date.day.should == 22
}
# These dates have been chosen based on analytics from site search more info here: https://designpatterns.hackpad.com/Dates-vpx6XlVjIbE

This comment has been minimized.

@edds

edds Nov 17, 2014

Contributor

👍 Really nice to see this checking that this approach works as you expect with real data.

@edds

edds Nov 17, 2014

Contributor

👍 Really nice to see this checking that this approach works as you expect with real data.

@edds

This comment has been minimized.

Show comment
Hide comment
@edds

edds Nov 17, 2014

Contributor

My only thought on this is does this want to strip trailing whitespace on the entered data before you start running your regular expressions on this so it doesn't suffer from the same problem described in this blog post on the GDS data blog (this might already be happening elsewhere and I might just not know about it).

Contributor

edds commented Nov 17, 2014

My only thought on this is does this want to strip trailing whitespace on the entered data before you start running your regular expressions on this so it doesn't suffer from the same problem described in this blog post on the GDS data blog (this might already be happening elsewhere and I might just not know about it).

Alice Bartlett and others added some commits Nov 17, 2014

@alicebartlett

This comment has been minimized.

Show comment
Hide comment
@alicebartlett

alicebartlett Nov 17, 2014

Contributor

Good point @edds! I have written some additional tests and added ruby's .trim to the parser to remove whitespace.

Contributor

alicebartlett commented Nov 17, 2014

Good point @edds! I have written some additional tests and added ruby's .trim to the parser to remove whitespace.

@alicebartlett alicebartlett changed the title from [DO NOT MERGE] Date parsing to Improved date parsing Nov 17, 2014

@tommyp

This comment has been minimized.

Show comment
Hide comment
@tommyp

tommyp Nov 17, 2014

Contributor

I like this PR. 👍

Contributor

tommyp commented Nov 17, 2014

I like this PR. 👍

tommyp added a commit that referenced this pull request Nov 17, 2014

@tommyp tommyp merged commit b3b2049 into master Nov 17, 2014

1 check passed

default Build #473 succeeded on Jenkins
Details

@tommyp tommyp deleted the date-parsing branch Nov 17, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment