In io.ascii, fall back to string if integers are too large #2234
Or maybe use Python longs stored in an object array. This would keep operations on these columns numerical rather than string-based (though obviously they'd still be far slower to work with than "native" ints).
I agree: an object array could be used for that column. Not ideal, but better than strings.
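A minimal NumPy sketch of the object-array option, assuming the column arrives as a list of strings (the values here are made up). Python's int parses each value at arbitrary precision, and dtype=object stores those Python ints unchanged, so arithmetic stays exact but runs through Python rather than native machine integers.

```python
import numpy as np

# Hypothetical column values; the first one exceeds the int64 maximum (2**63 - 1).
values = ["12345678901234567890123", "42"]

# Parse with Python's arbitrary-precision int and keep the results as objects.
col = np.array([int(v) for v in values], dtype=object)

# Element-wise arithmetic still works, using Python int semantics (exact, but slow).
shifted = col + 1
print(col.dtype)
print(shifted)
```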
There is a really simple fix that leads to an alternative behavior. Essentially what's happening is that the code below, which tries various conversion options, is missing
Adding
To me this seems like a better alternative than strings or objects. What do you think? Doing objects is a bit scary to me because I'm not sure what else will break. I'll try to post on Stack Overflow later, but on a 32-bit machine there is also the option to explicitly specify the converter as
@taldcroft - converting to a float sounds good to me.
Code attached.
@taldcroft - this needs rebasing for some reason.
Speaking as a regular Python user (not an astropy user, and not even a NumPy user): I would be extremely wary of automatic conversion to
Maybe a compromise is that this should emit a warning?
I was going to say the same thing as @jkyeung. Putting arbitrary-precision Python longs in object arrays, while slower, doesn't introduce any data loss, which I think is more important. Maybe we can provide a flag to select the desired behavior here? Of course, the user can always convert from an object array of long ints to floats after the fact if necessary, but the reverse is not possible without data loss.
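The one-way nature of that conversion can be checked directly. This is a small sketch, not astropy code: it stores 2**64 + 1 exactly in an object array, converts to float64 after the fact, and shows that the original value cannot be recovered, because float64 has only a 53-bit significand.

```python
import numpy as np

big = 2**64 + 1  # needs more than 64 bits to represent exactly

# Object array of Python ints: the value is stored exactly.
exact = np.array([big], dtype=object)

# Converting to float64 afterwards is possible, but it rounds to the nearest double.
approx = exact.astype(np.float64)

# Converting back does not give the original value: the "+ 1" was lost.
recovered = int(approx[0])
print(recovered == big)  # False
print(recovered)         # 18446744073709551616, i.e. 2**64
```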
Thinking about this more, I think it's highly unlikely that whoever stored >64-bit integers meant them as actual numbers; much more likely they were intended as some kind of ID. In that case, I would argue that it would make more sense to use an array of strings than an object array of long ints. One can still easily convert an array of strings to an array of normal ints or long ints. This would also be consistent with the idea that if a value can't be parsed, it stays a string. Object arrays have the potential to confuse people, and will be less efficient than string arrays.
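A short sketch of the "strings can always be converted later" point, assuming the oversized values were kept as a NumPy string column (the values are illustrative). Exact values come back via Python's int; a string column whose values fit in 64 bits can be cast with astype.

```python
import numpy as np

# Oversized IDs kept as a string column: nothing is lost.
ids = np.array(["98765432109876543210", "7"])

# Recover exact values on demand with Python's arbitrary-precision int.
exact_ids = [int(s) for s in ids]

# A string column whose values fit in 64 bits can be cast directly to native ints.
small = np.array(["1", "2", "3"]).astype(np.int64)
```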
OK, I'm persuaded that float is no good. I'm on the fence about object vs. string. In the past I've been burned by having a NumPy array that looked like a normal int array but was actually of object dtype. I can't remember the specific problem, but it was one of those subtle issues that took a while to understand, and users with more limited knowledge might have a difficult time with that. If you get a string instead, it's immediately obvious what's going on. As a slight technical issue, at the point of the code in question what you actually have is a list of strings. The only thing you know is that doing
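The "looks like an int array" trap is easy to reproduce: an object array of small Python ints prints exactly like a native integer array, and only the dtype gives it away. A minimal illustration:

```python
import numpy as np

ints = np.array([1, 2, 3])                # native integer dtype
objs = np.array([1, 2, 3], dtype=object)  # Python ints stored as objects

# Printed with str(), the two are indistinguishable.
print(ints)
print(objs)

# Checking the dtype is what tells them apart.
print(np.issubdtype(ints.dtype, np.integer))  # True
print(np.issubdtype(objs.dtype, np.integer))  # False
```

A string column, by contrast, prints with quotes in its repr and has a "U" dtype kind, which is part of why it is harder to mistake for numbers.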
I forgot to say that I do think there are cases where the big int was really intended as an int and getting a string will be initially surprising / annoying. In any case a warning can be emitted, which should help.
Thinking about this even more, the conversion function can also be user-specified, so you don't even know that
I've reworked this to fall through to strings and emit a warning. What do y'all think?
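A minimal sketch of the "fall through to strings and warn" behavior, under stated assumptions: the function name convert_column and the int, float, str order are hypothetical, and this is not the io.ascii code from the merged commit. The key point it illustrates is that an integer overflow jumps straight to strings instead of trying float, since float conversion was rejected above as lossy.

```python
import warnings

import numpy as np


def convert_column(str_vals):
    """Hypothetical converter chain: int64, then float64, else strings."""
    try:
        return np.array([int(v) for v in str_vals], dtype=np.int64)
    except OverflowError:
        # Values parse as integers but do not fit in int64: keep them as
        # strings (not floats) so no precision is silently lost.
        warnings.warn("integers too large for int64; column kept as strings")
        return np.array(str_vals)
    except ValueError:
        pass  # not integers at all; try float next

    try:
        return np.array([float(v) for v in str_vals], dtype=np.float64)
    except ValueError:
        # Not numeric: strings are the final fallback.
        return np.array(str_vals)
```

Usage: convert_column(["1", "2"]) gives an int64 array, convert_column(["1.5", "2"]) a float64 array, and convert_column(["1180591620717411303424", "1"]) a string array plus a UserWarning.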
This works for me! Just a couple of comments:
EDIT: ignore the second comment; at the moment the code would crash, so it's not as if it's working at all. This can go in 0.3.2.
@astrofrog - you can see the resolution of this in 0bfbd1f. I think this is sufficiently in the corner-case realm to leave as a known issue. Eventually numpy 1.5 support will go away.
Sounds good! I've restarted Travis; feel free to merge once it passes.
OK, there is one unrelated Travis failure (SAMP), so I'm merging.
In io.ascii, fall back to string if integers are too large
In io.ascii, fall back to string if integers are too large. Conflicts: CHANGES.rst, astropy/io/ascii/core.py
If I try to read the following file using Table, I get:
Maybe when the integers are too large, strings should be used instead? This is a simplified version of the issue described in http://stackoverflow.com/questions/22617428/overflowerror-python-int-too-large-to-convert-to-c-long-with-astropy-table.
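The underlying failure can be reproduced without astropy. This sketch assumes, for illustration only, that a reader has already split the column into strings and then tries to store them in a 64-bit integer array; the value 1180591620717411303424 (2**70) is made up and lies far outside the int64 range.

```python
import numpy as np

# Hypothetical column as a reader sees it before type conversion.
str_vals = ["1180591620717411303424", "1"]

overflowed = False
try:
    np.array([int(v) for v in str_vals], dtype=np.int64)
except OverflowError as exc:
    overflowed = True
    # The wording depends on the NumPy version; the linked question
    # shows "Python int too large to convert to C long".
    print("OverflowError:", exc)
```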