Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PEP 237 -- int/long Unification #52

Closed
jdhardy opened this issue Apr 5, 2014 · 11 comments · Fixed by #1423 or #1456
Closed

PEP 237 -- int/long Unification #52

jdhardy opened this issue Apr 5, 2014 · 11 comments · Fixed by #1423 or #1456

Comments

@jdhardy
Copy link
Member

jdhardy commented Apr 5, 2014

Python 3 gets rid of the distinction between int and long, with int now behaving as long. One way would be to switch everything to BigInteger, but it might be more efficient to try and use Int32 as much as possible behind the scenes, and switch to BigInteger if we have to while keeping the visible type int at all times.

Probably worth doing some benchmarks to see if it's worthwhile first.

@slozier slozier changed the title int/long Unification PEP 237 -- int/long Unification Feb 24, 2021
@markusschaber
Copy link

markusschaber commented Apr 20, 2021

On modern 64 bit machines, we could use Int64 instead of Int32 for "simple" operations. (Or just use 64 bit on all platforms, assuming emulated Int64 is still more performant than BigInteger... )

This is also what Python 2 did on 64 bit machines, as I just verified on an older ubuntu installation:

>>> sys.version
'2.7.17 (default, Sep 30 2020, 13:38:04) \n[GCC 7.5.0]'
>>> sys.maxint
9223372036854775807

@BCSharp
Copy link
Member

BCSharp commented Feb 2, 2022

To get a feeling about performance of various options across frameworks, I have done some measurements of a few arithmetic operations executed in a long loop. Results of all operations fit in 32 bits so it is possible to compare to native System.Int32.

.NET Performance Results

Here are best times for various types used for the variables in the loop, for both checked and unchecked code, on AMD Ryzen Threadripper 3960X.

Test loop template in C#
    public void CheckedIntTest()
    {
        int n = int.MaxValue;
        int z = 0;
        checked
        {
            for (int i = 0; i < n; i++)
            {
                if (i % 2 == 0)
                {
                    z += i;
                }
                else
                {
                    z -= i;
                }
            }
        }
    }
Method .NET Fmwk .NET 6.0
CheckedIntTest 1.71 s 1.22 s
UncheckedIntTest 1.24 s 1.06 s
CheckedLongTest 1.71 s 1.22 s
UncheckedLongTest 1.23 s 1.05 s
CheckedBoxTest 12.89 s 12.50 s
UncheckedBoxTest 13.08 s 12.74 s
CheckedBigIntTest[1] 179.35 s 62.35 s
UncheckedBigIntTest 179.55 s 59.17 s
CheckedGmpTest[2] 95.81 s N/A
UncheckedGmpTest 94.14 s N/A

Comments:

[1]: BigInteger on .NET 6 is 3x faster than on .NET Framework. Good optimization job?
[2]: GMP stands for the GNU Multiple Precision Arithmetic Library, reportedly the fastest out there. It is indeed faster than BigInteger in .NET Framework but loses to .NET 6.0 probably because of the P/Invoke overhead. Included here as a reference point.

The above is just the performance of raw .NET. It is useful to see the the relative cost of BigInteger and compare it to other options, but is not really representative for IronPython. In IronPython there is more happening under the hood even for a simple addition.

IronPython/CPython Performance Results

I have rewritten the test code above in Python and run it under various Python interpreters. Here are the results on the same workstation.

Test loop template in Python
def int_test():
    n = 0x7FFFFFFF
    z = 0
    i = 0
    while i < n:
        if i % 2 == 0:
            z += i
        else:
            z -= i
        i += 1
    return z
Interpreter default BigInteger
CPython 2.7.17 64-bit [1] 149 s
CPython 3.4.4 64-bit [1] 401 s
CPython 3.8.9 64-bit [1] 206 s
IronPython 3 on .NET Fmwk 4.8.4420 98 s 496 s
IronPython 3 on .NET Core 3.1.22 95 s 362 s
IronPython 3 on .NET 5.0.13 73 s 280 s
IronPython 3 on .NET 6.0.1 78 s 279 s

[1]: CPython test runs with GC disabled.

It is interesting to notice that, thanks to using Int32, IronPython beats CPython. Another interesting observation is that by switching over to BigInteger, the performance of IronPython is still comparable to CPython, however then modern CPython beats IronPython on all .NET versions (though this may be due to GC impact).

Of course there is more in the test going on than just simple integer arithmetics: comparisons, jumps, variable assignment, etc. This is deliberate. I wanted to get an impression of performance of an "average" Python code doing some basic arithmetics and the relative cost of the switch for programs at large.

Below are test results focused only on the performance of a single arithmetic operation.

Test Code
import timeit
timeit.timeit("z = a * b", setup="a=3; b=4", number=1000000000)
Interpreter default BigInteger
CPython 2.7.17 64-bit [1] 20.8 s
CPython 3.4.4 64-bit [1] 32.0 s
CPython 3.8.9 64-bit [1] 16.4 s
IronPython 3 on .NET Fmwk 4.8.4420 19.3 s 42.8 s
IronPython 3 on .NET Core 3.1.22 20.2 s 27.0 s
IronPython 3 on .NET 5.0.13 13.7 s 19.8 s
IronPython 3 on .NET 6.0.1 15.4 s 21.3 s

[1]: CPython test runs with GC disabled.

The overall picture is the same as in the previous test.

Conclusions

It seems to me that adopting BigInteger as the .NET type backing IronPython's int type is an acceptable solution from the performance point of view. But frankly it feels a little like a waste to abandon Int32 and leave some performance on the table, especially from the point of view of those programs that don't need very large integers. Also, the .NET API uses primarily Int32 wherever an integer number is required/returned, resorting to other integer types only when there is a good reason to. So to be able to treat Int32 as a native Python int type is very compelling (just as it is compelling to equate System.String with Python str).

So my recommendation is to go ahead with modifying IronPython in a way that both Int32 and BigInteger are mapped to Python int for all regular Python code. I am willing to give it a try.

What about using Int64 as the backing type?

Responding to @markusschaber: thanks for the suggestion, indeed, using Int64 instead of Int32 has no performance impact, and reduces the chance that the program has to fall back on BigInteger. This would result in performance improvements in some programs that heavily depend on values within the 64 bit range but outside 32 bit range. However, it is not totally without cost.

Since most .NET API uses Int32 by default, this will require casting when such .NET calls are made. I am not concerned about the performance of the casts per se, more the load on the method resolution in the Binder, and the question how tho treat Int32 values returned from .NET calls. If we keep them as they are, it would introduce a very confusing behaviour since those values would not be int and overflow when exceed 32 bits in computation.

One solution may be to widen them automatically to Int64, but that would be also not intuitive when the interpreter spontaneously changes the underlying .NET type. IronPython is after all a .NET language and should play nicely within the .NET ecosystem.

Another solution could be to use Int64 in addition to rather than instead of Int32. That could work, but would considerably complicate the type unification mechanism (assuming the unification of Int32/BigInteger is even feasible). We already have lots of places in which we test "is it int" then "is it BigInteger", then "is it Extensible<int>" and "is it Extensible<BigInteger>", and so all those places would need to be found and upgraded to handle Int64 too. Also, to make use of the performance benefits of of Int64, we would have to add overloads to our builitins so that Int64 doesn't have to be widened to BigInteger at a call site. This is a lot of work and error-prone, and having more overloads for the same function puts extra burden on the Binder and code maintenance.

So, in the end, although it is a viable option, I am not convinced it is worth the effort. In any case I see it more as a performance improvement that can be done later, and not as a part of the long/int unification.

@markusschaber
Copy link

Considering the dominance of Int32 in .NET APIs, I agree that getting Int64 "on board" may not be worth the effort, and it can still be implemented later.

@slozier
Copy link
Contributor

slozier commented Feb 3, 2022

So my recommendation is to go ahead with modifying IronPython in a way that both Int32 and BigInteger are mapped to Python int for all regular Python code. I am willing to give it a try.

I was leaning towards something of the sort as well. Did you have an approach in mind? I guess a hacky way could be to just hide it by having type(1<<64) return int. Might be enough for all intents and purposes. My main concern with this approach is that we then have different sets of .NET methods exposed on the object depending on the state (int/BigInt) or the number which is not necessarily great.

@BCSharp
Copy link
Member

BCSharp commented Feb 4, 2022

Did you have an approach in mind?

I was thinking of keeping type(1<<64) and type(1) separate but make them appear as (almost) the same from the perspective of the Python code. This would require a few small hacks but in several places: both types get the same name int (PythonBinder), become equivalent in issubclass and isinstance tests (PythonType.IsSubclassOf), become equal in comparisons (extra __eq__, __ne__, and __hash__ (PythonType) and would test true if compared using is (PythonOps.IsRetBool). However, because they are actually different instances of type, it would still be possible to distinguish them with id(...), which I thought could be useful in debugging. Since they are backed by different .NET types, sometimes it may be surprising which overload gets picked in some obscure cases.

On a side note, I was surprised to discover that currently isinstance(1<<64, int) returns True, though issubclass(type(1<<64), int) returns False (as expected). How is it done?

I guess a hacky way could be to just hide it by having type(1<<64) return int. Might be enough for all intents and purposes.

I like your idea better than mine because it seems to require only one hack. Support for debugging can be done by adding an IronPython-specific dunder property to the type (say, __underlying_system_type__). My main concern with this approach is that there may be a code path that does GetPythonTypeFromType in one place and then later casts back to System.Type. It could lead to situations that the .NET type obtained does not match the type of the boxed integer. Am I correct?

My main concern with this approach is that we then have different sets of .NET methods exposed on the object depending on the state (int/BigInt) or the number which is not necessarily great.

I thought that we could merge BigIntegerOps into IntOps and associacte both Int32 and BigInteger with IntOps in PythonBinder. That would guarantee the same set of methods exposed, wouldn't it?

I have envisioned one more change: abolishing the usage of Extensible<int> altogether and always use Extensible<BigInteger> for subclasses. Currently the following fails:

class SubInt(int): pass

SubInt(1<<64) # OverflowError

There are a number of places that test if the the argument is, among other possibilities, Extensible<int>. This could be removed or changed to a test for Extensible<BigInteger>.

@slozier
Copy link
Contributor

slozier commented Feb 4, 2022

On a side note, I was surprised to discover that currently isinstance(1<<64, int) returns True, though issubclass(type(1<<64), int) returns False (as expected). How is it done?

I'm special casing it. See #858.

It could lead to situations that the .NET type obtained does not match the type of the boxed integer. Am I correct?

I'm not sure, I briefly looked into it before suggesting and there seemed to be a special call site for type(object) which could be tweaked without affecting other things (GetPythonTypeFromType would still return long)? We'd just be tweaking the if below:

private object GetPythonType(CallSite site, CodeContext context, object type, object instance) {
if (type == TypeCache.PythonType) {
return DynamicHelpers.GetPythonType(instance);
}
return ((CallSite<Func<CallSite, CodeContext, object, object, object>>)site).Update(site, context, type, instance);
}

That would guarantee the same set of methods exposed, wouldn't it?

Yes and no. I'm not too worried about IntOps/BigIntegerOps since those are fully under out control. My concern was more with the stuff defined on the .NET side of things. For example:

import clr
clr.AddReference("System.Numerics")
import System.Numerics.BigInteger
print(set(dir(1<<64)) - set(dir(1)))
print(set(dir(1)) - set(dir(1<<64)))

Although most of those appear to be static properties on Int32/BigInteger. I guess if we had __underlying_system_type__ whoever wants to do .NET specific stuff can figure it out...

I have envisioned one more change: abolishing the usage of Extensible altogether and always use Extensible for subclasses.

This is interesting. Probably a good idea from a compat point of view.

@BCSharp
Copy link
Member

BCSharp commented Feb 4, 2022

I'm special casing it. See #858.

Thanks for the pointer. I assume this special case can be removed after the unification is completed.

I'm not sure, I briefly looked into it before suggesting and there seemed to be a special call site for type(object) which could be tweaked without affecting other things (GetPythonTypeFromType would still return long)?

OK, it settles it then. Just to make sure we are on the same page: both type(1<<64) and type(1) would return long renamed to int, not the current int (which is backed by Int32).

Yes and no. I'm not too worried about IntOps/BigIntegerOps since those are fully under out control. My concern was more with the stuff defined on the .NET side of things.

I see. Although most of those appear to be methods on BigInteger when I test on my laptop:

IronPython 3.4.0a1 (3.4.0.0001)
[.NETCoreApp,Version=v3.1 on .NET Core 3.1.22 (64-bit)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import clr
>>> clr.AddReference("System.Numerics")
>>> import System.Numerics.BigInteger
>>> print(set(dir(1<<64)) - set(dir(1)))
{'GetWords', 'ToChar', 'Divide', 'Xor', 'BitwiseOr', 'Remainder', 'ToDouble', 'Abs', 'Min', 'IsZero', 'AsUInt64', 'ToSingle', 'TryWriteBytes', 'ToUInt64', 'AsInt32', 'GetBitCount', 'GetWord', 'IsEven', 'ExclusiveOr', 'GetWordCount', 'BitwiseAnd', 'AsDecimal', 'IsPowerOfTwo', 'Compare', 'IsOne', 'ToInt64', 'Add', 'MinusOne', '__complex__', 'IsNegative', 'One', 'LeftShift', 'GetByteCount', 'Log', 'ToSByte', 'ToUInt16', 'ToUInt32', 'ToBoolean', 'Pow', 'ToType', 'ToByte', 'AsInt64', 'Zero', 'RightShift', 'ToFloat', 'Negate', 'OnesComplement', 'ToInt16', 'GreatestCommonDivisor', 'Multiply', 'Sign', 'DivRem', 'ToByteArray', 'Subtract', 'Mod', 'IsPositive', 'ToDecimal', 'Square', 'Log10', 'ModPow', 'ToInt32', 'AsUInt32', 'Create', 'Max'}

I'm not too worried about it. Most are redundant but we may still add the missing ones to IntOps. Going the other way will be tougher, but there are only two properties that BigInteger is missing: MinValue and MaxValue. These obviously are meaningless for BigInteger. This actually can be useful: it can be a way to differentiate between the actual underlying types without resorting to a new dunder property.

@slozier
Copy link
Contributor

slozier commented Feb 4, 2022

I assume this special case can be removed after the unification is completed.

Not sure if we'll be able to get rid of it, it might just change to typeinfo == TypeCache.BigInteger && o is int? Anyway, guess it depends on how things are implemented. Haven't thought it all through.

OK, it settles it then. Just to make sure we are on the same page: both type(1<<64) and type(1) would return long renamed to int, not the current int (which is backed by Int32).

Hmm, interesting, hadn't though about what the "real" backing int type should be. I guess if it's BigInteger it makes things like forcing class SubInt(int): pass to be Extensible<BigInteger> easier. So I guess we'd have str(System.Int32) == "<class 'Int32'> and then throw if you try to do class test(System.Int32): pass? I guess I don't really mind either way (as long as the logic is understandable).

@BCSharp
Copy link
Member

BCSharp commented Feb 4, 2022

Hmm, interesting, hadn't though about what the "real" backing int type should be. I guess if it's BigInteger it makes things like forcing class SubInt(int): pass to be Extensible<BigInteger> easier.

That was one of the motivations. Another was to make issubtype(SubInt, int) work without any extra hacks.

So I guess we'd have str(System.Int32) == "<class 'Int32'> and then throw if you try to do class test(System.Int32): pass?

Exactly. This will make the type System.Int32 behave in line with other unmanaged integer types.

@BCSharp
Copy link
Member

BCSharp commented Feb 9, 2022

There is one more place where a hack is required: PythonOps.GetPythonTypeName/PythonTypeOps.GetName. Since we have both Int32 and BigInteger objects flying around as Python int, reporting the actual type is not correct: it will report either int or Int32 (currently it reports int or long which is also not OK). It is mostly (if not always) used in error messages, which should say int in both cases.

Currently both functions are identical in functionality (one calls the other). I propose to add the hack in PythonOps.GetPythonTypeName and leave PythonTypeOps.GetName always reporting the true type name, which might be useful in debugging, tracing etc.

The new code:

internal static string GetPythonTypeName(object? obj) {
    if (obj is int) return "int";

    return PythonTypeOps.GetName(obj);
}

@slozier, are you OK with that? I am asking in advance because currently half the calls in the codebase are to the first function and the other half to the second. So after the change I will have to change 99+ places to use GetPythonTypeName and I'd rather agree on it in advance before commencing this change.

The alternative is to place the hack at the lower level in PythonTypeOps; no caller updates are necessary then but also no way to get the true type name.

@slozier
Copy link
Contributor

slozier commented Feb 9, 2022

@BCSharp This one is interesting. I've been wanting to normalize the "get type name" throughout the codebase so I'm fine with picking one call site and doing it. Also probably a good idea to have an analyzer to discourage the use of the other patterns. Another pattern to watch out for is DynamicHelpers.GetPythonType(o).Name.

Since this is going to be hitting a lot of code might be a good idea to have a PR specifically for this - will make reviewing easier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants