
Arrays have a 'silent' length limit due to Int32 type on size. #4011

Open
shelvacu opened this issue Feb 8, 2017 · 15 comments
@shelvacu

shelvacu commented Feb 8, 2017

arr = Array(UInt8).new(2147483647, 1_u8)
arr << 2_u8
arr.size #=> -2147483648

This could be a very surprising behaviour to run across. Ideally, an error should be raised or the limit removed entirely.

A similar bug exists in Indexable: each_index also doesn't work if the size is greater than 2**31 - 1, because the index variable (i) defaults to Int32. https://github.com/crystal-lang/crystal/blob/master/src/indexable.cr#L181
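
For reference, the pattern at the linked line looks roughly like the sketch below (paraphrased, not the exact stdlib source): the literal 0 types the counter as Int32, so it misbehaves once the size exceeds Int32::MAX.

def each_index
  i = 0            # integer literals default to Int32
  while i < size
    yield i
    i += 1         # wraps (or, with overflow checks, raises) past Int32::MAX
  end
end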

Version: Crystal 0.20.5+37 [2b57a98] (2017-02-08)
OS: Arch Linux x86_64

@mverzilli

Removing the limit would mean that you could store an arbitrary number of elements in an Array, and still be oblivious to the implications of that. I think if you're writing code that needs to hold more than 2147483647 elements in memory:

  1. You should have a deep understanding of the hardware you're running on.
  2. You are likely to need very particular algorithms to deal with them.

Maybe raising a compilation error would be a more acceptable solution, but then again, if implementing it means penalizing the compiler's performance, I wouldn't do it. Storing more than 2147483647 elements in a normal Array is a 1-in-10000 case; it doesn't make sense to penalize the other 9999.

@drosehn

drosehn commented Feb 8, 2017

I've written programs where 2147483647 was an incredibly huge limit (in the context of the program) when it was written, only to have the program still be in use when that limit became a serious issue. Programs which dealt with disk-sizes, for instance. Once you do hit that limit, maybe years after writing a program, it is very helpful if the program fails in a spectacular way instead of silently doing the wrong thing.

It would be nice if there were a way to ask the compiler to add those checks wherever they were needed (which is to say, "for all arithmetic which might cause an overflow"). Then programmers who are paranoid could turn that checking on, and those who need top performance could leave it off.

I'll also note that many years ago I worked with a group who wrote a systems-programming compiler where there was an option like this. And it turned out that the performance penalty of checking for overflows (*1) was not significant, and we compiled almost everything with the checks turned on. Even in production code which was being used by thousands of users on a single machine.

(*1 - this was true for the kinds of programs we were writing. It might not be true for everyone!)

@jzakiya

jzakiya commented Feb 8, 2017

This is one of the exact problems Rust (https://www.rust-lang.org) was created to prevent. I understand Crystal builds upon libc and other C libraries; is this correct? Have you considered using the equivalent Rust libraries?

@drosehn

drosehn commented Feb 8, 2017

Who, me? I have considered Rust; at this point I have no interest in it. I'd also say that the main problem they want to solve is ownership of memory, which did interest me. But the more I looked into the language, the less I liked what they were doing.

There are some aspects of Rust which are interesting, but for me personally there is no desire to choose Rust over Crystal.

@RX14
Contributor

RX14 commented Feb 8, 2017

👍 for raising if an array tries to expand past Int32::MAX elements.

👎 for increasing the max size of Array. If arrays being too small ever emerges as a big problem, we can revisit this.

@RX14
Contributor

RX14 commented Feb 8, 2017

@jzakiya @drosehn I don't think Rust has anything to do with this. Crystal can implement overflow checking, and there is a separate issue for that where you can make your opinion known.

@shelvacu
Author

shelvacu commented Feb 9, 2017

Is there really any significant cost to increasing the max size of the array? As I understand it, it would result in the array type being 8 bytes larger (Int32 -> UInt64 for both @size and @capacity). Would there be any performance detriment?
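
As a rough illustration of the change being asked about (BigArray and this layout are hypothetical, not the actual stdlib source), only the two header fields would widen, growing each array header by 8 bytes:

class BigArray(T)
  @size : UInt64       # Array currently stores this as Int32
  @capacity : UInt64   # likewise Int32 today
  @buffer : Pointer(T)

  def initialize(capacity : UInt64)
    @size = 0_u64
    @capacity = capacity
    @buffer = Pointer(T).malloc(capacity)
  end
end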

@asterite
Member

asterite commented Feb 9, 2017

@shelvacu It's not about the performance, it's about consistency and expected behaviour. We don't want to make the size be UInt64: unsigned integers are very unintuitive and bug-prone. Int64 could work, but the default numeric type for literals is Int32, so for example this will stop working:

a = [] of Int32
a << 1
a << a.size # Error: you'll have to do a.size.to_i

Also, right now creating the array from the first snippet in this thread, the one with 2147483647 elements, takes 14 seconds. I don't think such big arrays are common, so making the language work for a very rare use case at the cost of making everything else worse (like the example above) is not a good trade. If you need a big array you could make one yourself: with Pointer you can allocate more memory than the Int32 limit allows.

That said, we do need these checks in Array, Hash and other containers, instead of silently failing.
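
A minimal sketch of the Pointer escape hatch mentioned above (illustrative only; you manage the memory and the bounds yourself):

# Room for 3 GiB of bytes, more than an Array(UInt8) can ever hold.
count = 3_i64 * 1024 * 1024 * 1024
buf = Pointer(UInt8).malloc(count)
buf[0] = 1_u8
buf[count - 1] = 2_u8   # raw pointers can be indexed with Int64 offsets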

@RX14
Contributor

RX14 commented Feb 9, 2017

I think that, in addition to what @asterite said, once your array reaches Int32::MAX members it's often better to fail hard than to keep eating memory. Consider an array of objects (pointers, assuming 64-bit): that alone will consume 16GiB of memory before it fails. In fact, for that case the limit is already far too high.
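
The arithmetic behind that figure, assuming 8-byte pointers:

elements = Int32::MAX.to_i64 + 1   # 2_147_483_648
elements * 8                       # => 17_179_869_184 bytes, i.e. 16 GiB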

@sevk

sevk commented Mar 9, 2017

suggest:

  • return nil if not defined
  • add limit check

irb(main):001:0> a=[]
=> []
irb(main):002:0> a[999]
=> nil
irb(main):003:0> a[999999999]
=> nil
irb(main):004:0> a[9999999999999999]
RangeError: bignum too big to convert into `long' 
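
For comparison, Crystal already separates these two cases with a nilable accessor, which keeps the nil visible in the type (a small sketch):

a = [] of Int32
a[999]?  # => nil (Array#[]? returns T | Nil for out-of-range indices)
a[999]   # raises IndexError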


@refi64
Contributor

refi64 commented Mar 9, 2017

return nil if not defined

Not type safe...

@sevk

sevk commented Mar 10, 2017

Why do we need type safety? Duck typing is useful:

class Object
  def add(y)
    t = choose_type(self.type,y.type)
    self.to_type(t) + y.to_type(t)
  end
end
# Number -> String

@refi64
Contributor

refi64 commented Mar 10, 2017

Because type safety is useful too, and Crystal already checks for nil, so that's not even possible...

@RespiteSage

As a note, this issue changed as of Crystal 1.0.0 (and several versions before it) due to arithmetic overflow checking. The length limit is no longer silent, though the error could be wrapped to be more informative.
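
A small sketch of what that looks like now, assuming default overflow checking (the exact error text may differ):

arr = Array(UInt8).new(Int32::MAX, 1_u8)  # needs roughly 2 GiB of memory
arr << 2_u8  # raises OverflowError instead of silently wrapping the size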

@crysbot

crysbot commented Feb 18, 2024

This issue has been mentioned on Crystal Forum. There might be relevant details there:

https://forum.crystal-lang.org/t/is-increasing-array-index-size-in-the-future/6610/2
