New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dateutil kwargs to csv2rec #1210
Conversation
Fixes ambiguous date problems between US/UK date conventions.
@@ -2090,7 +2090,7 @@ def extract(r): | |||
|
|||
def csv2rec(fname, comments='#', skiprows=0, checkrows=0, delimiter=',', | |||
converterd=None, names=None, missing='', missingd=None, | |||
use_mrecords=False): | |||
use_mrecords=False, dayfirst=False, yearfirst=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just use the builtin datetime for this? (showing my naivety here). If a user specifies a string format as a kwarg, use that with strftime. If not, let dateparser
do its (frightening) magic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about it, that wouldn't fix current scripts...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hopeless; there is nothing reasonably simple you can do to fix existing usage of this function when applied to ambiguous dates. All we can do is make it easier for users to write new code, or fix old code, so it doesn't blow up in their faces. Dateparser is badly designed for our purposes, but for the time being we are stuck with it. Some method of feeding it the kwargs seems like an essential improvement.
This may or may not be helpful. |
@dmcdougall - where do you want to go with this PR? For this to get merged a bit of documentation would be needed, but not much else. A much more extreme option would be to start down the road of deprecating these routines and reducing the amount of non visualisation code that sits in the mpl codebase - if I ever need to read a CSV file matplotlib would not be the first place I looked. I suspect this option wouldn't be very popular though... 😉 |
@pelson I think the kwargs option, as @efiring pointed out some time ago, is the way to go. As far as removing CSV support is concerned, I think that is a matter needing discussion and consensus on the mailing list. I'll push a rebase and doc update for this. P.S. Thanks for peeling the scabs off these old wounds. They have been sitting for too long and have become stale. |
@pelson Done. |
Hello, I've been looking at csv2rec for a bit, trying to understand what looks like a bug in the code. If a test.csv file contains datetime type entries (ie, file contains "11/01/14 12:00:01 AM" only), and you attempt to run the following script:
...then the datetime object echoed misinterprets the datetime by swapping the day and the month. The problem is that there are inconsistent handlings of formats in the converter candidate dictionary in mlab.csv2rec. If the time entry is a simple date ("11/01/14"), then the dates will be echoed by the print statement correctly. If the column in the .csv file contains a datetime object (ie date and time, for instance "11/01/14 12:00:01 AM"), then the datetime data will be interpreted as though dayfirst and yearfirst have NOT been specified, and so will NOT be echoed correctly. This happens because the mydate converter (which uses dayfirst and yearfirst) returns an error if d.hour or d.minute or d.second is greater than zero, but the alternative mydateparser converter, which does not return an error, does not interpret dayfirst or yearfirst, so assumes monthfirst. So then datetime strings in the .csv file are not interpreted correctly if they are other than month first format, and are a datetime string with hour or minute or second other than zero. In other words, mydateparser is NOT consistent with mydate. Mydateparser should reflect the approach used for mydate and also refer to the dayfirst and yearfirst arguments so that datetime elements of the .csv file column are interpreted in a way that matches dayfirst and/or yearfirst arguments in the call, if specified as True. A fix that works in my own C:\Python33\Lib\site-packages\matplotlib\mlab.py file is to replace this line:
with this block:
This problem is also present in Python 3.4, and probably exists in other releases as well. This appears to be a design bug. I think it is a dangerous bug because it fails in a way that is inconsistent with other aspects of function behaviour, and can silently leave the user with invalid data despite the correct format having been specified. The proposed fix shouldn't break any code that uses the workaround of explicitly specifying a converterd in the csv2rec arguments (which also works - included as commented out line in script code above). It would be good to get this bug fixed properly, so let me know if I need to do anything further to achieve this. I'm new to this bug reporting system, so apologies if I've done anything unconventional or improper - please just point me in the right direction! |
@ultra-andy Can you please create a new issue or pr. Comments on a almost 3 year old pull request that have already been merged are almost sure to be lost. |
Thanks Jen, have created new issue #6184 - I went with creating a new issue as I'm new on this and would prefer not to do a pull request myself until I understand how the system works. |
Fixes ambiguous date problems between US/UK date conventions.
My attempt at resolving #208.
I'm not sure this is the best way to resolve this issue. Another option is to keep the
dateutil
stuff separate from the specificcsv2rec
kwargs by having only onedateutil_properties
kwarg, say. This kwarg would take a dictionary of keyword arguments that are passed directly to the dateutil parser. The functionality would be the same, with the only difference being an aesthetic one.Another method is suggested by @efiring in the comments in #208.
Edit: @efiring's suggestion is to have one kwarg specifying the exact date in strptime format, rather than having
dateutil
do it.