Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickling is slow wrt int's #404

Open
skirpichev opened this issue Mar 27, 2023 · 2 comments
Open

Pickling is slow wrt int's #404

skirpichev opened this issue Mar 27, 2023 · 2 comments

Comments

@skirpichev
Copy link
Contributor

An example (~4-5x difference):

$ python -m timeit -r10 -s 'from math import factorial as f;from pickle import dumps as d' \
     -s 'a=f(1000)' 'd(a)'
50000 loops, best of 10: 5.7 usec per loop
$ python -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z;from pickle import dumps as d' \
    -s 'a=f(z(1000))' 'd(a)'
10000 loops, best of 10: 24.4 usec per loop

I wonder if using the pure python function for pickling support could be a significant part of this speed loss.
(Inspired by mpmath/mpmath#667)

@casevh
Copy link
Collaborator

casevh commented Mar 28, 2023

I tried three tests.

$ py311 -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z;from pickle import dumps as d' -s 'a=f(z(1000))' 'd(a)'
50000 loops, best of 10: 4.92 usec per loop
$ py311 -m timeit -r10 -s 'from math import factorial as f;from pickle import dumps as d' -s 'a=f(1000)' 'd(a)'
200000 loops, best of 10: 1.3 usec per loop
$ py311 -m timeit -r10 -s 'from gmpy2 import fac as f, mpz as z, to_binary as d' -s 'a=f(z(1000))' 'd(a)'
100000 loops, best of 10: 3.23 usec per loop

The third test directly calls gmpy2.to_binary which creates the pickled data. to_binary calls mpz_export for all the data manipulation.

The Python code inside gmpy2 is ancient. I would accept removing it since we no longer need to support old versions of Python and mpz is now a real type and not a factory function.

I don't know if it is worth trying to improve on mpz_export/mpz_import? The code in mpz_pylong.c may help.

@skirpichev
Copy link
Contributor Author

skirpichev commented May 28, 2023

BTW, the mpz.to_bytes() (or to_binary) is systematically slower than int.to_bytes(). Here is a simple benchmark:
to_bytes-100000
Code:

# a.py
from gmpy2 import mpz, to_binary
import sys
import random
import time
import platform
import matplotlib.pyplot as plt

int_time = []
mpz_time = []
#bin_time = []

times = 15
nbits = int(float(sys.argv[1]))

random.seed(1)

r = sorted(random.sample(range(1, nbits), 500))

for k in r:
    ns = [random.randint(2**k//2, 2**k) for _ in range(times)]
    nl = [(n, (n.bit_length() + 7)//8 + 2) for n in ns]

    start = time.perf_counter_ns()
    for n, l in nl:
        n.to_bytes(l)
    int_time.append((time.perf_counter_ns() - start) / times)

    nl = [(mpz(n), l) for n, l in nl]
    start = time.perf_counter_ns()
    for n, l in nl:
        n.to_bytes(l)
    mpz_time.append((time.perf_counter_ns() - start) / times)

#   start = time.perf_counter_ns()
#   for n, l in nl:
#       to_binary(n)
#   bin_time.append((time.perf_counter_ns() - start) / times)


fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(r, int_time, label='int.to_bytes()')
ax.plot(r, mpz_time, label='mpz.to_bytes()')
#ax.plot(r, bin_time, label='to_binary(mpz)')
ax.set_yscale('log')
ax.set_xlabel('bits')
ax.set_ylabel('time (ns)')
ax.legend()
plt.title('Benchmark for to_bytes with ' + str(nbits) + ' bits.')
plt.show()
fig.savefig('to_bytes-'+str(nbits) + '.png')

UPD:

$ python
Python 3.11.3+ (heads/3.11:9fbb614c4e, Apr 29 2023, 14:18:05) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gmpy2, sys
>>> gmpy2.mp_limbsize()
64
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4, default_max_str_digits=4300, str_digits_check_threshold=640)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants