PEP 237 -- int/long Unification #52

jdhardy · 2014-04-05T16:06:58Z

Python 3 gets rid of the distinction between int and long, with int now behaving as long. One way would be to switch everything to BigInteger, but it might be more efficient to try and use Int32 as much as possible behind the scenes, and switch to BigInteger if we have to while keeping the visible type int at all times.

Probably worth doing some benchmarks to see if it's worthwhile first.

markusschaber · 2021-04-20T13:38:28Z

On modern 64 bit machines, we could use Int64 instead of Int32 for "simple" operations. (Or just use 64 bit on all platforms, assuming emulated Int64 is still more performant than BigInteger... )

This is also what Python 2 did on 64 bit machines, as I just verified on an older ubuntu installation:

>>> sys.version
'2.7.17 (default, Sep 30 2020, 13:38:04) \n[GCC 7.5.0]'
>>> sys.maxint
9223372036854775807

BCSharp · 2022-02-02T04:25:30Z

To get a feeling about performance of various options across frameworks, I have done some measurements of a few arithmetic operations executed in a long loop. Results of all operations fit in 32 bits so it is possible to compare to native System.Int32.

.NET Performance Results

Here are best times for various types used for the variables in the loop, for both checked and unchecked code, on AMD Ryzen Threadripper 3960X.

Test loop template in C#

    public void CheckedIntTest()
    {
        int n = int.MaxValue;
        int z = 0;
        checked
        {
            for (int i = 0; i < n; i++)
            {
                if (i % 2 == 0)
                {
                    z += i;
                }
                else
                {
                    z -= i;
                }
            }
        }
    }

Method	.NET Fmwk	.NET 6.0
CheckedIntTest	1.71 s	1.22 s
UncheckedIntTest	1.24 s	1.06 s
CheckedLongTest	1.71 s	1.22 s
UncheckedLongTest	1.23 s	1.05 s
CheckedBoxTest	12.89 s	12.50 s
UncheckedBoxTest	13.08 s	12.74 s
CheckedBigIntTest[1]	179.35 s	62.35 s
UncheckedBigIntTest	179.55 s	59.17 s
CheckedGmpTest[2]	95.81 s	N/A
UncheckedGmpTest	94.14 s	N/A

Comments:

[1]: BigInteger on .NET 6 is 3x faster than on .NET Framework. Good optimization job?
[2]: GMP stands for the GNU Multiple Precision Arithmetic Library, reportedly the fastest out there. It is indeed faster than BigInteger in .NET Framework but loses to .NET 6.0 probably because of the P/Invoke overhead. Included here as a reference point.

The above is just the performance of raw .NET. It is useful to see the the relative cost of BigInteger and compare it to other options, but is not really representative for IronPython. In IronPython there is more happening under the hood even for a simple addition.

IronPython/CPython Performance Results

I have rewritten the test code above in Python and run it under various Python interpreters. Here are the results on the same workstation.

Test loop template in Python

def int_test():
    n = 0x7FFFFFFF
    z = 0
    i = 0
    while i < n:
        if i % 2 == 0:
            z += i
        else:
            z -= i
        i += 1
    return z

Interpreter	default	BigInteger
CPython 2.7.17 64-bit [1]	149 s
CPython 3.4.4 64-bit [1]	401 s
CPython 3.8.9 64-bit [1]	206 s
IronPython 3 on .NET Fmwk 4.8.4420	98 s	496 s
IronPython 3 on .NET Core 3.1.22	95 s	362 s
IronPython 3 on .NET 5.0.13	73 s	280 s
IronPython 3 on .NET 6.0.1	78 s	279 s

[1]: CPython test runs with GC disabled.

It is interesting to notice that, thanks to using Int32, IronPython beats CPython. Another interesting observation is that by switching over to BigInteger, the performance of IronPython is still comparable to CPython, however then modern CPython beats IronPython on all .NET versions (though this may be due to GC impact).

Of course there is more in the test going on than just simple integer arithmetics: comparisons, jumps, variable assignment, etc. This is deliberate. I wanted to get an impression of performance of an "average" Python code doing some basic arithmetics and the relative cost of the switch for programs at large.

Below are test results focused only on the performance of a single arithmetic operation.

Test Code

import timeit
timeit.timeit("z = a * b", setup="a=3; b=4", number=1000000000)

Interpreter	default	BigInteger
CPython 2.7.17 64-bit [1]	20.8 s
CPython 3.4.4 64-bit [1]	32.0 s
CPython 3.8.9 64-bit [1]	16.4 s
IronPython 3 on .NET Fmwk 4.8.4420	19.3 s	42.8 s
IronPython 3 on .NET Core 3.1.22	20.2 s	27.0 s
IronPython 3 on .NET 5.0.13	13.7 s	19.8 s
IronPython 3 on .NET 6.0.1	15.4 s	21.3 s

[1]: CPython test runs with GC disabled.

The overall picture is the same as in the previous test.

Conclusions

It seems to me that adopting BigInteger as the .NET type backing IronPython's int type is an acceptable solution from the performance point of view. But frankly it feels a little like a waste to abandon Int32 and leave some performance on the table, especially from the point of view of those programs that don't need very large integers. Also, the .NET API uses primarily Int32 wherever an integer number is required/returned, resorting to other integer types only when there is a good reason to. So to be able to treat Int32 as a native Python int type is very compelling (just as it is compelling to equate System.String with Python str).

So my recommendation is to go ahead with modifying IronPython in a way that both Int32 and BigInteger are mapped to Python int for all regular Python code. I am willing to give it a try.

What about using `Int64` as the backing type?

Responding to @markusschaber: thanks for the suggestion, indeed, using Int64 instead of Int32 has no performance impact, and reduces the chance that the program has to fall back on BigInteger. This would result in performance improvements in some programs that heavily depend on values within the 64 bit range but outside 32 bit range. However, it is not totally without cost.

Since most .NET API uses Int32 by default, this will require casting when such .NET calls are made. I am not concerned about the performance of the casts per se, more the load on the method resolution in the Binder, and the question how tho treat Int32 values returned from .NET calls. If we keep them as they are, it would introduce a very confusing behaviour since those values would not be int and overflow when exceed 32 bits in computation.

One solution may be to widen them automatically to Int64, but that would be also not intuitive when the interpreter spontaneously changes the underlying .NET type. IronPython is after all a .NET language and should play nicely within the .NET ecosystem.

Another solution could be to use Int64 in addition to rather than instead of Int32. That could work, but would considerably complicate the type unification mechanism (assuming the unification of Int32/BigInteger is even feasible). We already have lots of places in which we test "is it int" then "is it BigInteger", then "is it Extensible<int>" and "is it Extensible<BigInteger>", and so all those places would need to be found and upgraded to handle Int64 too. Also, to make use of the performance benefits of of Int64, we would have to add overloads to our builitins so that Int64 doesn't have to be widened to BigInteger at a call site. This is a lot of work and error-prone, and having more overloads for the same function puts extra burden on the Binder and code maintenance.

So, in the end, although it is a viable option, I am not convinced it is worth the effort. In any case I see it more as a performance improvement that can be done later, and not as a part of the long/int unification.

markusschaber · 2022-02-03T09:18:41Z

Considering the dominance of Int32 in .NET APIs, I agree that getting Int64 "on board" may not be worth the effort, and it can still be implemented later.

slozier · 2022-02-03T21:19:16Z

So my recommendation is to go ahead with modifying IronPython in a way that both Int32 and BigInteger are mapped to Python int for all regular Python code. I am willing to give it a try.

I was leaning towards something of the sort as well. Did you have an approach in mind? I guess a hacky way could be to just hide it by having type(1<<64) return int. Might be enough for all intents and purposes. My main concern with this approach is that we then have different sets of .NET methods exposed on the object depending on the state (int/BigInt) or the number which is not necessarily great.

BCSharp · 2022-02-04T19:43:52Z

Did you have an approach in mind?

I was thinking of keeping type(1<<64) and type(1) separate but make them appear as (almost) the same from the perspective of the Python code. This would require a few small hacks but in several places: both types get the same name int (PythonBinder), become equivalent in issubclass and isinstance tests (PythonType.IsSubclassOf), become equal in comparisons (extra __eq__, __ne__, and __hash__ (PythonType) and would test true if compared using is (PythonOps.IsRetBool). However, because they are actually different instances of type, it would still be possible to distinguish them with id(...), which I thought could be useful in debugging. Since they are backed by different .NET types, sometimes it may be surprising which overload gets picked in some obscure cases.

On a side note, I was surprised to discover that currently isinstance(1<<64, int) returns True, though issubclass(type(1<<64), int) returns False (as expected). How is it done?

I guess a hacky way could be to just hide it by having type(1<<64) return int. Might be enough for all intents and purposes.

I like your idea better than mine because it seems to require only one hack. Support for debugging can be done by adding an IronPython-specific dunder property to the type (say, __underlying_system_type__). My main concern with this approach is that there may be a code path that does GetPythonTypeFromType in one place and then later casts back to System.Type. It could lead to situations that the .NET type obtained does not match the type of the boxed integer. Am I correct?

My main concern with this approach is that we then have different sets of .NET methods exposed on the object depending on the state (int/BigInt) or the number which is not necessarily great.

I thought that we could merge BigIntegerOps into IntOps and associacte both Int32 and BigInteger with IntOps in PythonBinder. That would guarantee the same set of methods exposed, wouldn't it?

I have envisioned one more change: abolishing the usage of Extensible<int> altogether and always use Extensible<BigInteger> for subclasses. Currently the following fails:

class SubInt(int): pass

SubInt(1<<64) # OverflowError

There are a number of places that test if the the argument is, among other possibilities, Extensible<int>. This could be removed or changed to a test for Extensible<BigInteger>.

slozier · 2022-02-04T20:40:49Z

On a side note, I was surprised to discover that currently isinstance(1<<64, int) returns True, though issubclass(type(1<<64), int) returns False (as expected). How is it done?

I'm special casing it. See #858.

It could lead to situations that the .NET type obtained does not match the type of the boxed integer. Am I correct?

I'm not sure, I briefly looked into it before suggesting and there seemed to be a special call site for type(object) which could be tweaked without affecting other things (GetPythonTypeFromType would still return long)? We'd just be tweaking the if below:

ironpython3/Src/IronPython/Runtime/Types/PythonType.Generated.cs

Lines 94 to 100 in a8d1454

    
           private object GetPythonType(CallSite site, CodeContext context, object type, object instance) { 
        
               if (type == TypeCache.PythonType) { 
        
                   return DynamicHelpers.GetPythonType(instance); 
        
               } 
        
               return ((CallSite<Func<CallSite, CodeContext, object, object, object>>)site).Update(site, context, type, instance); 
        
           }

That would guarantee the same set of methods exposed, wouldn't it?

Yes and no. I'm not too worried about IntOps/BigIntegerOps since those are fully under out control. My concern was more with the stuff defined on the .NET side of things. For example:

import clr
clr.AddReference("System.Numerics")
import System.Numerics.BigInteger
print(set(dir(1<<64)) - set(dir(1)))
print(set(dir(1)) - set(dir(1<<64)))

Although most of those appear to be static properties on Int32/BigInteger. I guess if we had __underlying_system_type__ whoever wants to do .NET specific stuff can figure it out...

I have envisioned one more change: abolishing the usage of Extensible altogether and always use Extensible for subclasses.

This is interesting. Probably a good idea from a compat point of view.

BCSharp · 2022-02-04T21:17:56Z

I'm special casing it. See #858.

Thanks for the pointer. I assume this special case can be removed after the unification is completed.

I'm not sure, I briefly looked into it before suggesting and there seemed to be a special call site for type(object) which could be tweaked without affecting other things (GetPythonTypeFromType would still return long)?

OK, it settles it then. Just to make sure we are on the same page: both type(1<<64) and type(1) would return long renamed to int, not the current int (which is backed by Int32).

Yes and no. I'm not too worried about IntOps/BigIntegerOps since those are fully under out control. My concern was more with the stuff defined on the .NET side of things.

I see. Although most of those appear to be methods on BigInteger when I test on my laptop:

IronPython 3.4.0a1 (3.4.0.0001)
[.NETCoreApp,Version=v3.1 on .NET Core 3.1.22 (64-bit)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import clr
>>> clr.AddReference("System.Numerics")
>>> import System.Numerics.BigInteger
>>> print(set(dir(1<<64)) - set(dir(1)))
{'GetWords', 'ToChar', 'Divide', 'Xor', 'BitwiseOr', 'Remainder', 'ToDouble', 'Abs', 'Min', 'IsZero', 'AsUInt64', 'ToSingle', 'TryWriteBytes', 'ToUInt64', 'AsInt32', 'GetBitCount', 'GetWord', 'IsEven', 'ExclusiveOr', 'GetWordCount', 'BitwiseAnd', 'AsDecimal', 'IsPowerOfTwo', 'Compare', 'IsOne', 'ToInt64', 'Add', 'MinusOne', '__complex__', 'IsNegative', 'One', 'LeftShift', 'GetByteCount', 'Log', 'ToSByte', 'ToUInt16', 'ToUInt32', 'ToBoolean', 'Pow', 'ToType', 'ToByte', 'AsInt64', 'Zero', 'RightShift', 'ToFloat', 'Negate', 'OnesComplement', 'ToInt16', 'GreatestCommonDivisor', 'Multiply', 'Sign', 'DivRem', 'ToByteArray', 'Subtract', 'Mod', 'IsPositive', 'ToDecimal', 'Square', 'Log10', 'ModPow', 'ToInt32', 'AsUInt32', 'Create', 'Max'}

I'm not too worried about it. Most are redundant but we may still add the missing ones to IntOps. Going the other way will be tougher, but there are only two properties that BigInteger is missing: MinValue and MaxValue. These obviously are meaningless for BigInteger. This actually can be useful: it can be a way to differentiate between the actual underlying types without resorting to a new dunder property.

slozier · 2022-02-04T21:40:56Z

I assume this special case can be removed after the unification is completed.

Not sure if we'll be able to get rid of it, it might just change to typeinfo == TypeCache.BigInteger && o is int? Anyway, guess it depends on how things are implemented. Haven't thought it all through.

OK, it settles it then. Just to make sure we are on the same page: both type(1<<64) and type(1) would return long renamed to int, not the current int (which is backed by Int32).

Hmm, interesting, hadn't though about what the "real" backing int type should be. I guess if it's BigInteger it makes things like forcing class SubInt(int): pass to be Extensible<BigInteger> easier. So I guess we'd have str(System.Int32) == "<class 'Int32'> and then throw if you try to do class test(System.Int32): pass? I guess I don't really mind either way (as long as the logic is understandable).

BCSharp · 2022-02-04T21:58:50Z

Hmm, interesting, hadn't though about what the "real" backing int type should be. I guess if it's BigInteger it makes things like forcing class SubInt(int): pass to be Extensible<BigInteger> easier.

That was one of the motivations. Another was to make issubtype(SubInt, int) work without any extra hacks.

So I guess we'd have str(System.Int32) == "<class 'Int32'> and then throw if you try to do class test(System.Int32): pass?

Exactly. This will make the type System.Int32 behave in line with other unmanaged integer types.

BCSharp · 2022-02-09T05:58:19Z

There is one more place where a hack is required: PythonOps.GetPythonTypeName/PythonTypeOps.GetName. Since we have both Int32 and BigInteger objects flying around as Python int, reporting the actual type is not correct: it will report either int or Int32 (currently it reports int or long which is also not OK). It is mostly (if not always) used in error messages, which should say int in both cases.

Currently both functions are identical in functionality (one calls the other). I propose to add the hack in PythonOps.GetPythonTypeName and leave PythonTypeOps.GetName always reporting the true type name, which might be useful in debugging, tracing etc.

The new code:

internal static string GetPythonTypeName(object? obj) {
    if (obj is int) return "int";

    return PythonTypeOps.GetName(obj);
}

@slozier, are you OK with that? I am asking in advance because currently half the calls in the codebase are to the first function and the other half to the second. So after the change I will have to change 99+ places to use GetPythonTypeName and I'd rather agree on it in advance before commencing this change.

The alternative is to place the hack at the lower level in PythonTypeOps; no caller updates are necessary then but also no way to get the true type name.

slozier · 2022-02-09T14:36:42Z

@BCSharp This one is interesting. I've been wanting to normalize the "get type name" throughout the codebase so I'm fine with picking one call site and doing it. Also probably a good idea to have an analyzer to discourage the use of the other patterns. Another pattern to watch out for is DynamicHelpers.GetPythonType(o).Name.

Since this is going to be hitting a lot of code might be a good idea to have a PR specifically for this - will make reviewing easier.

slozier added the What's New In Python 3.0 label Nov 14, 2019

BCSharp mentioned this issue Apr 15, 2020

Update bytes/bytearray init #787

Merged

BCSharp mentioned this issue May 21, 2020

Factor out IPythonBuffer from IBufferProtocol #808

Merged

This was referenced May 21, 2020

Enable test_fractions #845

Merged

Make isinstance(val, int) return true for long #858

Merged

slozier mentioned this issue May 28, 2020

Remove long from builtin #862

Merged

slozier changed the title ~~int/long Unification~~ PEP 237 -- int/long Unification Feb 24, 2021

This was referenced Feb 12, 2022

Type name #1315

Merged

Inconsistent type conversion rules for CLI arrays #1320

Closed

This was referenced Feb 24, 2022

Extensible<T> fails on accessing static properties #1326

Open

PEP 237: int/long unification #1329

Merged

BCSharp mentioned this issue Mar 6, 2022

Eliminate Extensible<int> #1332

Merged

BCSharp added a commit to BCSharp/ironpython3 that referenced this issue Mar 6, 2022

Revert workarounds in StdLib for Issue IronLanguages#52

9c37909

BCSharp mentioned this issue Mar 6, 2022

Revert workarounds in StdLib for Issue #52 #1333

Merged

BCSharp added a commit that referenced this issue Mar 7, 2022

Revert workarounds in StdLib for Issue #52 (#1333)

1d6b2c6

This was referenced Mar 7, 2022

Rename LongOps.cs to BigIntegerOps.cs #1335

Merged

Cleanup of constructors in BigIntegerOps/IntOps #1347

Merged

This was referenced Mar 16, 2022

Remove obsolete BigIntegerV2 API from BigIntegerOps #1357

Merged

Relegate Int32 to non-Python-type level #1360

Merged

This was referenced Mar 20, 2022

Finish IConvertible members in BigIntegerOps #1366

Merged

Support large enums in int constructor #1367

Merged

Separate FloatOps.Generated from IntOps.Generated #1368

Merged

This was referenced Mar 27, 2022

Implement __int__ across all integer types #1382

Merged

Support BigInteger values in range #1374

Merged

BCSharp mentioned this issue Apr 4, 2022

Cleanup IntOps #1390

Merged

This was referenced Apr 13, 2022

Mimic BigInteger members on Int32 #1399

Merged

On Mono, not all CLR members defined on int are available on all int instances #1401

Open

Rename ConvertToLong to ConvertToInt #1403

Merged

slozier mentioned this issue Apr 25, 2022

Version 3.4.0-beta #1422

Merged

8 tasks

BCSharp added this to the 3.4-beta milestone Apr 25, 2022

BCSharp mentioned this issue Apr 26, 2022

Update notes on upgrading from ipy2 to cover PEP 237 #1423

Merged

BCSharp linked a pull request Apr 27, 2022 that will close this issue

Update notes on upgrading from ipy2 to cover PEP 237 #1423

Merged

BCSharp closed this as completed in #1423 Apr 27, 2022

This was referenced May 11, 2022

Improve clr.accepts/returns #1449

Merged

Improve method resolution for numerics #1456

Merged

slozier mentioned this issue Sep 9, 2022

Rich compasions of Int32-based ints and subclasses of int return NotImplemented #1554

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PEP 237 -- int/long Unification #52

PEP 237 -- int/long Unification #52

jdhardy commented Apr 5, 2014

markusschaber commented Apr 20, 2021 •

edited

BCSharp commented Feb 2, 2022

markusschaber commented Feb 3, 2022

slozier commented Feb 3, 2022

BCSharp commented Feb 4, 2022

slozier commented Feb 4, 2022

BCSharp commented Feb 4, 2022

slozier commented Feb 4, 2022

BCSharp commented Feb 4, 2022

BCSharp commented Feb 9, 2022

slozier commented Feb 9, 2022

PEP 237 -- int/long Unification #52

PEP 237 -- int/long Unification #52

Comments

jdhardy commented Apr 5, 2014

markusschaber commented Apr 20, 2021 • edited

BCSharp commented Feb 2, 2022

.NET Performance Results

IronPython/CPython Performance Results

Conclusions

What about using Int64 as the backing type?

markusschaber commented Feb 3, 2022

slozier commented Feb 3, 2022

BCSharp commented Feb 4, 2022

slozier commented Feb 4, 2022

BCSharp commented Feb 4, 2022

slozier commented Feb 4, 2022

BCSharp commented Feb 4, 2022

BCSharp commented Feb 9, 2022

slozier commented Feb 9, 2022

markusschaber commented Apr 20, 2021 •

edited

What about using `Int64` as the backing type?