@Hritik14 Hritik14 commented Jun 11, 2021

Fixes: #475
Early profiling showed that a lot of time was being spent on the auto
commits issued by Django. Wrapping the importer in an atomic block
avoids most of these database commits and yields a large performance improvement.
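
To illustrate the idea outside of this codebase, here is a minimal sketch that uses the standard library's `sqlite3` instead of Django and PostgreSQL. The `advisory` table and the `import_rows` and `timed_import` helpers are hypothetical and are not part of the project. In Django, the single-transaction variant corresponds to running the importer inside `with transaction.atomic():`.

```python
import sqlite3
import time


def import_rows(conn, rows, single_transaction):
    """Insert rows, committing either once per row or once at the end."""
    cur = conn.cursor()
    for name in rows:
        cur.execute("INSERT INTO advisory (name) VALUES (?)", (name,))
        if not single_transaction:
            conn.commit()  # one commit per row, similar to autocommit
    if single_transaction:
        conn.commit()  # a single commit for the whole batch


def timed_import(rows, single_transaction):
    """Run one import against a fresh in-memory database and time it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE advisory (name TEXT)")
    conn.commit()
    start = time.perf_counter()
    import_rows(conn, rows, single_transaction)
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM advisory").fetchone()[0]
    conn.close()
    return count, elapsed


if __name__ == "__main__":
    rows = [f"advisory-{i}" for i in range(10_000)]
    for mode in (False, True):
        count, elapsed = timed_import(rows, single_transaction=mode)
        print(f"single_transaction={mode}: {count} rows in {elapsed:.3f}s")
```

Both modes store the same rows; only the number of commits differs. The size of the timing gap depends on the storage backend, so an in-memory database will likely show a much smaller difference than the PostgreSQL numbers reported here.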

Alpine: 202.7s -> 50.9s
Archlinux: 2116.6s -> 107.8s
Gentoo: 3176.3s -> 225.8s

Taken together, the three importers spend about 93% less time (roughly 14x faster).
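
As a quick check of the arithmetic, the 93% figure matches the reduction in combined run time across the three importers rather than the mean of the per-importer reductions. The sketch below only uses the timings listed above; the variable names are illustrative.

```python
# Wall-clock seconds (before, after) taken from the figures above.
timings = {
    "alpine": (202.7, 50.9),
    "archlinux": (2116.6, 107.8),
    "gentoo": (3176.3, 225.8),
}

total_before = sum(before for before, _ in timings.values())
total_after = sum(after for _, after in timings.values())

# Reduction and speedup over the combined run time of all importers.
overall_reduction = 1 - total_after / total_before
overall_speedup = total_before / total_after

# Reduction for each importer on its own.
per_importer = {name: 1 - after / before for name, (before, after) in timings.items()}

print(f"overall: {overall_reduction:.1%} less time, {overall_speedup:.1f}x faster")
for name, reduction in per_importer.items():
    print(f"{name}: {reduction:.1%} less time")
```

The per-importer reductions differ noticeably: Alpine gains the least (about 75%), while Archlinux and Gentoo each drop by more than 90%.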

Before using atomic blocks:

---> restats_alpine <---
Fri Jun 11 20:06:11 2021    restats_default/restats_alpine

         17843326 function calls (17320561 primitive calls) in 202.763 seconds

   Ordered by: internal time
   List reduced from 2779 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     4155  128.501    0.031  128.501    0.031 {method 'commit' of 'psycopg2.extensions.connection' objects}
    26521   12.478    0.000   12.756    0.000 {method 'execute' of 'psycopg2.extensions.cursor' objects}
      216    9.673    0.045    9.673    0.045 {method 'read' of '_ssl._SSLSocket' objects}
       36    5.905    0.164    5.905    0.164 {method 'do_handshake' of '_ssl._SSLSocket' objects}
       36    5.556    0.154    5.556    0.154 {method 'connect' of '_socket.socket' objects}

---> restats_archlinux <---
Fri Jun 11 21:34:26 2021    restats_default/restats_archlinux

         74938815 function calls (72697906 primitive calls) in 2116.686 seconds

   Ordered by: internal time
   List reduced from 830 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    27687 1472.325    0.053 1472.325    0.053 {method 'commit' of 'psycopg2.extensions.connection' objects}
    87507  377.318    0.004  379.760    0.004 {method 'execute' of 'psycopg2.extensions.cursor' objects}
   665302    8.463    0.000   19.086    0.000 local.py:46(_get_context_id)
   148038    7.755    0.000   63.665    0.000 query.py:1207(build_filter)
5738127/5072825    5.744    0.000   21.943    0.000 {built-in method builtins.hasattr}

---> restats_gentoo <---
Fri Jun 11 20:59:08 2021    restats_default/restats_gentoo

         78731890 function calls (76578734 primitive calls) in 3176.336 seconds

   Ordered by: internal time
   List reduced from 962 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    33185 1755.378    0.053 1755.378    0.053 {method 'commit' of 'psycopg2.extensions.connection' objects}
    99975 1087.850    0.011 1090.304    0.011 {method 'execute' of 'psycopg2.extensions.cursor' objects}
        3   27.587    9.196   27.587    9.196 {method 'poll' of 'select.poll' objects}
   691872    9.332    0.000   21.079    0.000 local.py:46(_get_context_id)
   153630    8.780    0.000   67.248    0.000 query.py:1207(build_filter)

After using atomic blocks:

---> restats_alpine <---
Fri Jun 11 21:42:18 2021    restats_atomic/restats_alpine

         18479196 function calls (17923185 primitive calls) in 50.974 seconds

   Ordered by: internal time
   List reduced from 2788 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      217    9.647    0.044    9.647    0.044 {method 'read' of '_ssl._SSLSocket' objects}
    34831    6.498    0.000    6.654    0.000 {method 'execute' of 'psycopg2.extensions.cursor' objects}
       36    5.862    0.163    5.862    0.163 {method 'do_handshake' of '_ssl._SSLSocket' objects}
       36    5.563    0.155    5.563    0.155 {method 'connect' of '_socket.socket' objects}
       36    1.989    0.055    1.990    0.055 {built-in method _socket.getaddrinfo}

---> restats_archlinux <---
Fri Jun 11 21:47:52 2021    restats_atomic/restats_archlinux

         79233966 function calls (76769671 primitive calls) in 107.852 seconds

   Ordered by: internal time
   List reduced from 832 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   143004   25.728    0.000   26.574    0.000 {method 'execute' of 'psycopg2.extensions.cursor' objects}
   665812    2.235    0.000    5.033    0.000 local.py:46(_get_context_id)
   148134    2.047    0.000   17.039    0.000 query.py:1207(build_filter)
       45    1.610    0.036    1.610    0.036 {method 'read' of '_ssl._SSLSocket' objects}
5714564/5048752    1.539    0.000    5.840    0.000 {built-in method builtins.hasattr}

---> restats_gentoo <---
Fri Jun 11 21:46:04 2021    restats_atomic/restats_gentoo

         83829866 function calls (81411194 primitive calls) in 225.877 seconds

   Ordered by: internal time
   List reduced from 971 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3  104.933   34.978  104.933   34.978 {method 'poll' of 'select.poll' objects}
   166348   30.434    0.000   31.271    0.000 {method 'execute' of 'psycopg2.extensions.cursor' objects}
   691888    2.486    0.000    5.597    0.000 local.py:46(_get_context_id)
   153633    2.255    0.000   17.736    0.000 query.py:1207(build_filter)
5795554/5103666    1.677    0.000    6.539    0.000 {built-in method builtins.hasattr}
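
For anyone wanting to reproduce reports in this format, the sketch below shows one way to do it with the standard library's `cProfile` and `pstats`. It is not the exact procedure used for the numbers above: `run_importer` is a hypothetical stand-in for a real importer run, and `profile_to_report` is an illustrative helper.

```python
import cProfile
import io
import pstats


def run_importer():
    """Hypothetical stand-in for an importer run; not the project's code."""
    return sum(len(str(i)) for i in range(50_000))


def profile_to_report(func, limit=5):
    """Profile func and return a report of its top `limit` entries by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sorting by "tottime" is what pstats labels "Ordered by: internal time".
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


report = profile_to_report(run_importer)
print(report)
```

Sorting by `tottime` puts functions that are slow in their own right at the top, which is how the `commit` calls stand out in the "before" reports.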

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>

@sbs2001 sbs2001 left a comment

Good job.

I don't see any harm in using a single transaction per importer.

With the atomic transaction introduced in the previous commit, it is
unnecessary to catch the erroneous Advisory: if an exception is raised,
all the changes will be rolled back anyway.

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
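
The rollback behaviour this commit relies on can be sketched with the standard library's `sqlite3`, used here as a stand-in for Django's atomic block. The `advisory` table, the `import_all` helper, and the empty-name failure are all hypothetical and only serve to trigger an exception partway through.

```python
import sqlite3


def import_all(conn, names):
    """Insert every name inside one transaction; any error undoes them all."""
    # Used as a context manager, a sqlite3 connection commits on success
    # and rolls back if the block raises.
    with conn:
        for name in names:
            if not name:
                raise ValueError("erroneous advisory")  # hypothetical failure
            conn.execute("INSERT INTO advisory (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE advisory (name TEXT)")

try:
    # The third entry fails, so the first two inserts are rolled back too.
    import_all(conn, ["advisory-1", "advisory-2", ""])
except ValueError:
    pass

remaining = conn.execute("SELECT COUNT(*) FROM advisory").fetchone()[0]
print(remaining)
```

No rows remain after the failed run, so there is nothing left to clean up per advisory. The trade-off is that one bad record discards the whole batch for that importer.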
@sbs2001 sbs2001 merged commit 52562d1 into aboutcode-org:main Jun 12, 2021
Development

Successfully merging this pull request may close these issues.

Improve importing speed