csv2rec imports dates incorrectly and has no option #208

Closed
ddale opened this issue Jun 20, 2011 · 5 comments

Comments

@ddale
Contributor

ddale commented Jun 20, 2011

Original report at SourceForge, opened Mon Jun 6 05:36:16 2011

When using csv2rec, there is no option to specify the date format before importing, which means ambiguous dates can be parsed with the wrong convention.

For example, in a csv file with the date

12-01-1998

The default for mlab is to import in the MM-DD-YYYY format, which results in December 1st, 1998.
However, the next line,

13-01-1998

dateutil correctly identifies as unambiguous and parses as January 13th, 1998.

The dateutil library that mlab uses has ways to specify date precedence, as seen here: http://labix.org/python-dateutil#head-b95ce2094d189a89f80f5ae52a05b4ab7b41af47

But there's no way to specify date precedence in mlab.
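
The behaviour reported above can be reproduced with dateutil directly. This is only a minimal sketch, assuming python-dateutil is installed; the variable names are illustrative, and `dayfirst` is the precedence keyword from the dateutil documentation linked above.

```python
from dateutil import parser

# Default precedence: an ambiguous string is read month-first.
default_ambiguous = parser.parse("12-01-1998")
print(default_ambiguous.date())    # 1998-12-01

# "13" cannot be a month, so dateutil silently falls back to day-first.
default_unambiguous = parser.parse("13-01-1998")
print(default_unambiguous.date())  # 1998-01-13

# With dayfirst=True both lines are read with the same convention.
dayfirst_ambiguous = parser.parse("12-01-1998", dayfirst=True)
dayfirst_unambiguous = parser.parse("13-01-1998", dayfirst=True)
print(dayfirst_ambiguous.date())   # 1998-01-12
print(dayfirst_unambiguous.date()) # 1998-01-13
```

The two consecutive lines of the file end up eleven months apart with the defaults, and one day apart once the precedence is stated.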

@surak

surak commented Jun 27, 2011

I'm the original poster of this issue at SourceForge. Perhaps an option to set the "dayfirst" and "yearfirst" parameters of the dateutil library without touching the current interface would be fine.

@mbewley

mbewley commented Sep 6, 2012

Hi, I've just been hit by this bug, and it's actually worse than described.
If you have a list of dates in non-US format (i.e. DD/MM/YYYY), it does something horrible. All the dates that are also valid as US dates (the first 12 days of every month) are parsed as US format. Days 13 and later in every month are interpreted correctly as non-US. I agree with surak that there should be an easy option (using converterd is a bit clunky for every column), but at the very least, the default behaviour should be to interpret the entire column of data as the same date format (i.e. ALL DD/MM or ALL MM/DD).
The impact is not immediately obvious: if you have several years' worth of data, it effectively adds some noise to your dates (by inverting day and month for about 1/3 of them). This makes it a particularly easy one to let slip by!
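
To put a rough number on that "about 1/3", here is a small standard-library sketch. It does not call dateutil or csv2rec; `parse_us_then_fallback` and `count_swapped` are hypothetical helpers that only emulate the "try US first, fall back to day-first" behaviour described in this thread.

```python
from datetime import date, datetime, timedelta

def parse_us_then_fallback(text):
    """Try MM/DD/YYYY first; fall back to DD/MM/YYYY if that is invalid."""
    try:
        return datetime.strptime(text, "%m/%d/%Y").date()
    except ValueError:
        return datetime.strptime(text, "%d/%m/%Y").date()

def count_swapped(year):
    """Write every day of `year` as DD/MM/YYYY and count the misread ones."""
    day = date(year, 1, 1)
    total = swapped = 0
    while day.year == year:
        text = day.strftime("%d/%m/%Y")
        if parse_us_then_fallback(text) != day:
            swapped += 1
        total += 1
        day += timedelta(days=1)
    return swapped, total

swapped, total = count_swapped(1998)
print(f"{swapped} of {total} dates changed ({swapped / total:.0%})")
```

For 1998 this reports 132 of 365 dates changed: the first 12 days of each month are misread, except the 12 dates where day and month are equal.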

@dmcdougall
Member

What would be better, in my opinion, is for csv2rec to assume US dates by default and if any of them are 'invalid', then convert them all to UK dates. How does that sound?
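
A rough sketch of that whole-column idea, using only the standard library. `parse_date_column` and its format arguments are hypothetical names for illustration, not anything csv2rec provides.

```python
from datetime import datetime

def parse_date_column(values, us_format="%m/%d/%Y", uk_format="%d/%m/%Y"):
    """Parse every value as a US date; if any fails, reparse all as UK dates."""
    try:
        return [datetime.strptime(v, us_format).date() for v in values]
    except ValueError:
        # At least one entry is not a valid US date, so assume the
        # whole column uses the day-first convention instead.
        return [datetime.strptime(v, uk_format).date() for v in values]

mixed_looking = parse_date_column(["12/01/1998", "13/01/1998"])
print(mixed_looking)   # both read day-first: 12 and 13 January 1998

all_ambiguous = parse_date_column(["12/01/1998", "11/02/1998"])
print(all_ambiguous)   # no invalid US date, so both are read month-first
```

The second call shows the limit of this heuristic: a column that happens to contain only days 1 to 12 is still read as US dates, so an explicit option would still be needed for that case.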

@efiring
Member

efiring commented Sep 6, 2012

csv2rec is built around using dateutil.parser for handling dates, and that in turn is designed to be dangerous; if a date is invalid for one convention, another convention is tried, until something fits, or it fails completely. It doesn't tell you what convention it ended up using, it just returns the result. The simplest way to improve csv2rec would be to let it pass precedence kwargs to dateutil.parser. I think the only way to go beyond that would be to allow one to specify that only a single convention, given as a strptime format, would be tried, thereby allowing one to bypass all the dateutil flexibility.
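
The single-convention option could look something like the sketch below. `make_strict_date_converter` is a hypothetical helper built on `datetime.strptime`; it is not part of csv2rec or dateutil, though a converter of this shape could presumably be supplied per column through the converterd argument mentioned earlier in the thread.

```python
from datetime import datetime

def make_strict_date_converter(fmt):
    """Return a converter that accepts only dates matching `fmt`."""
    def convert(text):
        # strptime raises ValueError instead of guessing another convention.
        return datetime.strptime(text.strip(), fmt).date()
    return convert

to_date = make_strict_date_converter("%d-%m-%Y")
parsed = to_date("12-01-1998")
print(parsed)  # 1998-01-12

try:
    to_date("01-13-1998")  # month 13 does not exist in a day-first reading
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

Failing loudly on a mismatch is the point here: a bad row stops the import rather than being quietly reinterpreted.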

@dmcdougall
Member

Closing; #1210 was merged.
