In io.ascii, fall back to string if integers are too large #2234

merged 7 commits into astropy:master from taldcroft:catch-overflow

5 participants


If I try and read the following file:

a                             b
12121312311248721894712984728 1122

using Table, I get:

In [5]: t ='data', format='ascii')
ERROR: OverflowError: Python int too large to convert to C long []
OverflowError                             Traceback (most recent call last)
<ipython-input-5-bcb96ef3f221> in <module>()
----> 1 t ='data', format='ascii')

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/table/ in read(cls, *args, **kwargs)
   1725         passed through to the underlying data reader (e.g. ````).
   1726         """
--> 1727         return, *args, **kwargs)
   1729     def write(self, *args, **kwargs):

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ in read(cls, *args, **kwargs)
    318         reader = get_reader(format, cls)
--> 319         table = reader(*args, **kwargs)
    321         if not isinstance(table, cls):

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ascii/ in read_asciitable(filename, **kwargs)
     19 def read_asciitable(filename, **kwargs):
     20     from .ui import read
---> 21     return read(filename, **kwargs)
     23 io_registry.register_reader('ascii', Table, read_asciitable)

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ascii/ in read(table, guess, **kwargs)
    154         guess = _GUESS
    155     if guess:
--> 156         dat = _guess(table, new_kwargs)
    157     else:
    158         reader = get_reader(**new_kwargs)

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ascii/ in _guess(table, read_kwargs)
    197             reader = get_reader(**guess_kwargs)
--> 198             dat =
    200             # When guessing require at least two columns

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ascii/ in read(self, table)
--> 853         table = self.outputter(cols, self.meta)
    854         self.cols = self.header.cols

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ascii/ in __call__(self, cols, meta)
    650     def __call__(self, cols, meta):
--> 651         self._convert_vals(cols)
    653         # If there are any values that were filled and tagged with a mask bit then this

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ascii/ in _convert_vals(self, cols)
    631                     if not issubclass(converter_type, col.type):
    632                         raise TypeError()
--> 633            = converter_func(col.str_vals)
    634                     col.type = converter_type
    635                 except (TypeError, ValueError):

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ascii/ in converter(vals)
    591     def converter(vals):
--> 592         return numpy.array(vals, numpy_type)
    593     return converter, converter_type

OverflowError: Python int too large to convert to C long

Maybe when the integers are too large, strings should be used instead? This is a simplified version of the issue described in

@astrofrog astrofrog added this to the v0.3.2 milestone

Or maybe use Python longs stored in an object array. This would keep operations on these columns numerical rather than string-based (though obviously they'd still be far slower to work with than "native" ints).
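For illustration only (a minimal sketch, not astropy code), an object-dtype column keeps the values usable as numbers:

```python
import numpy as np

# Hypothetical sketch: arbitrary-precision Python ints stored in an object
# array; arithmetic still works, just much slower than a native int column.
str_vals = ['12121312311248721894712984728', '1122']
col = np.array([int(v) for v in str_vals], dtype=object)
print(col * 2)   # element-wise arithmetic on Python ints
```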


I agree--an object array could be used for that column. Not ideal, but better than strings.


There is a really simple fix that leads to an alternative behavior. Essentially what's happening is that the code below, which tries the various conversion options, does not include OverflowError among the exceptions it catches. If I add it, then when a value is too large for the native int type the int conversion fails and the code falls through to float.

/Volumes/Raptor/Library/Python/3.3/lib/python/site-packages/astropy-0.4.dev7437-py3.3-macosx-10.8-x86_64.egg/astropy/io/ascii/ in _convert_vals(self, cols)
    631                     if not issubclass(converter_type, col.type):
    632                         raise TypeError()
--> 633            = converter_func(col.str_vals)
    634                     col.type = converter_type
    635                 except (TypeError, ValueError):
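In code, the idea is roughly the following (a self-contained sketch of the fall-through behavior, not the actual patch; the function and variable names are illustrative):

```python
import numpy as np

def convert_with_fallback(str_vals):
    """Sketch: try int first, then float, then string; an OverflowError
    falls through to the next converter instead of propagating."""
    for dtype in (int, float, str):
        try:
            return np.array(str_vals, dtype=dtype)
        except (TypeError, ValueError, OverflowError):
            continue
    return np.array(str_vals)

print(convert_with_fallback(['12121312311248721894712984728']))
# -> array([1.21213123e+28])
```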

Adding OverflowError above yields:

In [4]: print['a b', '12121312311248721894712984728 1122'], format='ascii')
        a          b  
----------------- ----
1.21213123112e+28 1122

To me this seems like a better alternative than string or objects. What do you think? Doing objects is a bit scary to me because I'm not sure what else will break.

I'll try to post on stackoverflow later, but on a 32-bit machine there is also the option of explicitly specifying the converter as int64 (or int128, if the platform supports it).
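For reference, specifying the converter explicitly is already possible through the ``converters`` argument; something like this should work where int64 is wide enough (a sketch, with made-up data):

```python
import numpy as np
from import ascii

# Force column 'a' to be read as int64 rather than the native C long;
# the data here are illustrative only.
lines = ['a b', '12345678901234 1122']
dat =, format='basic',
                 converters={'a': [ascii.convert_numpy(np.int64)]})
print(dat['a'].dtype)   # int64
```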


@taldcroft - converting to a float sounds good to me.


Code attached.


@taldcroft - this needs rebasing for some reason


Speaking as a regular Python user (not an astropy user, and not even a NumPy user): I would be extremely wary of automatic conversion to float. It could be that incoming data was very carefully and specifically crafted to be integer. Silently introducing potential loss of precision doesn't strike me as the right thing to do. But I dunno; maybe astropy's domain is such that float is always a reasonable choice.


Maybe a compromise is that this should emit a warning?


I was going to say the same thing as @jkyeung. Putting arbitrary precision Python longs in object arrays, while slower, doesn't introduce any data loss, which I think is more important. Maybe we can provide a flag to select the desired behavior here? Of course, the user can always convert from "object array of long ints" to floats if necessary after the fact, but the inverse is not true without data loss.


Thinking about this more, I think that it's highly unlikely whoever stored >64-bit integers stored them as actual numbers, and much more likely they were intended as some kind of ID. In that case, I would argue that it would make more sense to use an array of strings than an object array of long ints. One can still easily convert an array of strings to an array of normal ints or long ints. This would also be in line with the idea that if a value can't be parsed, it stays as a string. Object arrays have the potential to confuse people, and will be less efficient than string arrays.
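For example (a sketch assuming the string fallback is in place; the data are illustrative):

```python
import numpy as np
from import ascii

# With the fallback, the over-wide column comes back as strings; exact
# integers are easy to recover afterwards if that is what the user wants.
dat =['a b', '12121312311248721894712984728 1122'], format='basic')
big_ints = [int(v) for v in dat['a']]        # arbitrary-precision Python ints
obj_col = np.array(big_ints, dtype=object)   # or an object array, if preferred
```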


OK, I'm persuaded that float is no good.

I'm on the fence about object vs. string. I'll say that in the past I've been burned by having a numpy array that looked like a normal int array but was actually an object dtype. I can't remember the specific problem, but it was one of those subtle issues that took a while to understand. Users with more limited knowledge might have a difficult time with that.

If you get a string instead then it's immediately obvious what's going on.

As a slight technical issue, at the point of the code in question, what you actually have is a list of strings. The only thing you know is that doing np.array(str_vals, gave an overflow error. So there is no list of Python ints at that point, but I suppose it could be created.
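For example, the failing call boils down to something like this (a minimal sketch; the dtype comes from the column's converter):

```python
import numpy as np

# At this point in the code only the raw strings exist; this conversion is
# what raises the error when the value exceeds the native C long.
str_vals = ['12121312311248721894712984728', '1122']
np.array(str_vals, dtype=int)   # OverflowError: Python int too large to convert to C long
```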


I forgot to say that I do think there are cases where the big int was really intended as an int and getting a string will be initially surprising / annoying.

In any case a warning can be emitted which should help out.


Thinking about this even more, the conversion function can also be user-specified, so you don't even know that int is the intended target. From that perspective the only universally applicable response is to leave the values as strings and emit a warning like "Warning: overflow error occurred during type conversion for column <name>, leaving as string".


I've reworked this to fall through to strings and emit a warning. What do y'all think?


This works for me! Just a couple of comments:

  • It looks like the Travis failure is genuine
  • Should this be 0.4.0 since technically it changes the API (at least it changes the output type for some table columns)?

EDIT: ignore the second comment - at the moment the code would crash, so it's not like it's working at all. This can go in 0.3.2.


@astrofrog - you can see the resolution of this in 0bfbd1f. I think this is sufficiently in the corner-case realm to leave as a known issue. Eventually numpy 1.5 support will go away.


Sounds good! I've restarted Travis - feel free to merge once it passes.


OK there is one unrelated Travis failure (SAMP), so I'm merging.

@taldcroft taldcroft merged commit 9c5debf into astropy:master

1 check failed: The Travis CI build failed
@taldcroft taldcroft deleted the taldcroft:catch-overflow branch
4 CHANGES.rst
@@ -194,6 +194,10 @@ Bug Fixes
- ````
+ - When reading a table with values that generate an overflow error
+ during type conversion (e.g. overflowing the native C long type), fall
+ through to using string. Previously this generated an exception [#2234].
- ````
- ````
8 astropy/io/ascii/
@@ -16,10 +16,12 @@
import itertools
import functools
import numpy
+import warnings
from ...extern import six
from ...extern.six.moves import zip
from ...extern.six.moves import cStringIO as StringIO
+from ...utils.exceptions import AstropyWarning
from ...table import Table
from import get_readable_fileobj
@@ -642,6 +644,12 @@ def _convert_vals(self, cols):
col.type = converter_type
except (TypeError, ValueError):
+ except OverflowError:
+ # Overflow during conversion (most likely an int that doesn't fit in native C long).
+ # Put string at the top of the converters list for the next while iteration.
+ warnings.warn("OverflowError converting to {0} for column {1}, using string instead."
+ .format(converter_type.__name__,, AstropyWarning)
+ col.converters.insert(0, convert_numpy(numpy.str))
except IndexError:
raise ValueError('Column %s failed to convert' %
17 astropy/io/ascii/tests/
@@ -11,11 +11,28 @@
from ... import ascii as asciitable # TODO: delete this line, use ascii.*
from ... import ascii
from ....table import Table
+from distutils import version
from .common import (raises, assert_equal, assert_almost_equal,
assert_true, setup_function, teardown_function)
from ....tests.helper import pytest
+_NUMPY_VERSION = version.LooseVersion(np.__version__)
+def test_convert_overflow():
+ """
+ Test reading an extremely large integer, which falls through to
+ string due to an overflow error (#2234).
+ """
+ # Before Numpy 1.6 the exception from np.array(['1' * 10000],
+ # is exactly the same as np.array(['abc'], In this case
+ # it falls through to float, so we just accept this as a known issue for
+ # numpy < 1.6.
+ expected_kind = ('f',) if _NUMPY_VERSION < version.LooseVersion('1.6') else ('S', 'U')
+ dat =['a', '1' * 10000], format='basic', guess=False)
+ assert dat['a'].dtype.kind in expected_kind
def test_guess_with_names_arg():
Make sure reading a table with guess=True gives the expected result when
12 docs/known_issues.rst
@@ -184,3 +184,15 @@ One workaround is to install the ``bsddb3`` module.
.. [#] Continuum `says
this will be fixed in their next Python build.
+Very long integers in ASCII tables silently converted to float for Numpy 1.5
+For Numpy 1.5, when reading an ASCII table that has integers which are too
+large to fit into the native C long int type for the machine, then the
+values get converted to float type with no warning. This is due to the
+behavior of `numpy.array` and cannot easily be worked around. We recommend
+that users upgrade to a newer version of Numpy. For Numpy >= 1.6 a warning
+is printed and the values are treated as strings to preserve all information.