Get rid of the custom paginated() and use the built-in iterator() #1798

@keshav-space

The custom BaseQuerySet.paginated() performs far worse than the built-in iterator(), both in compute time and in memory usage.

Below is a comparison of .paginated() and .iterator() when iterating over about 8 million advisories (8,073,782 Advisory objects).

Using custom paginated()

After 20 minutes the iteration has reached only 50%, and the ETA grows with every progress report.

Iterating advisory data using custom BaseQuerySet.paginated()
INFO 2025-03-07 08:56:41.500 Pipeline [IterateAdvisory] starting
INFO 2025-03-07 08:56:41.500 Step [iterate_advisories] starting
INFO 2025-03-07 08:56:41.796 Iterating over 8073782 Advisory objects.
INFO 2025-03-07 08:57:50.106 Progress: 10% (807379/8073782) ETA: 615 seconds (10.2 minutes)
INFO 2025-03-07 09:00:09.226 Progress: 20% (1614757/8073782) ETA: 830 seconds (13.8 minutes)
INFO 2025-03-07 09:03:34.920 Progress: 30% (2422135/8073782) ETA: 964 seconds (16.1 minutes)
INFO 2025-03-07 09:09:04.444 Progress: 40% (3229513/8073782) ETA: 1114 seconds (18.6 minutes)
INFO 2025-03-07 09:16:32.571 Progress: 50% (4036891/8073782) ETA: 1191 seconds (19.9 minutes)
^CCommandError: Keyboard interrupt received. Stopping...
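
An ETA that keeps growing is what one would expect if each page is fetched with a separate LIMIT/OFFSET query, because the database has to skip all earlier rows again for every page. This issue does not show how paginated() is implemented, so treat that as an assumption. The sketch below uses the standard-library sqlite3 module as a stand-in for the real database; the table name `advisory` and both helper functions are hypothetical and only illustrate the two access patterns.

```python
import sqlite3


def iterate_with_offset(conn, per_page):
    """Yield ids page by page with LIMIT/OFFSET.

    Hypothetical stand-in for an offset-based paginator: one query per page.
    """
    offset = 0
    while True:
        rows = conn.execute(
            "SELECT id FROM advisory ORDER BY id LIMIT ? OFFSET ?",
            (per_page, offset),
        ).fetchall()
        if not rows:
            return
        for (row_id,) in rows:
            yield row_id
        offset += per_page


def iterate_with_cursor(conn, chunk_size):
    """Yield ids from a single query, fetching chunk_size rows at a time."""
    cursor = conn.execute("SELECT id FROM advisory ORDER BY id")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        for (row_id,) in rows:
            yield row_id


# Small in-memory table, only to show that both patterns visit the same rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE advisory (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO advisory (id) VALUES (?)", ((i,) for i in range(1, 10_001))
)

paged = list(iterate_with_offset(conn, per_page=1000))
streamed = list(iterate_with_cursor(conn, chunk_size=1000))
```

Both functions yield the same ids. The difference is that the offset variant issues one query per page and each query starts over from the beginning of the result set, so later pages cost more than earlier ones, while the cursor variant runs a single query and reads it forward in chunks.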

Using built-in iterator()

Iterating over all 8 million advisories takes about 4 minutes (249 seconds).

Iterating advisory data using built-in .iterator()
INFO 2025-03-07 10:02:43.483 Pipeline [IterateAdvisory] starting
INFO 2025-03-07 10:02:43.483 Step [iterate_advisories] starting
INFO 2025-03-07 10:02:44.042 Iterating over 8073782 Advisory objects.
INFO 2025-03-07 10:05:00.181 Progress: 10% (807379/8073782) ETA: 1225 seconds (20.4 minutes)
INFO 2025-03-07 10:05:19.790 Progress: 20% (1614757/8073782) ETA: 623 seconds (10.4 minutes)
INFO 2025-03-07 10:05:37.013 Progress: 30% (2422135/8073782) ETA: 404 seconds (6.7 minutes)
INFO 2025-03-07 10:05:50.865 Progress: 40% (3229513/8073782) ETA: 280 seconds (4.7 minutes)
INFO 2025-03-07 10:06:01.161 Progress: 50% (4036891/8073782) ETA: 197 seconds (3.3 minutes)
INFO 2025-03-07 10:06:11.465 Progress: 60% (4844270/8073782) ETA: 138 seconds (2.3 minutes)
INFO 2025-03-07 10:06:21.761 Progress: 70% (5651648/8073782) ETA: 93 seconds (1.6 minutes)
INFO 2025-03-07 10:06:32.129 Progress: 80% (6459026/8073782) ETA: 57 seconds
INFO 2025-03-07 10:06:42.248 Progress: 90% (7266404/8073782) ETA: 26 seconds
INFO 2025-03-07 10:06:52.325 Progress: 100% (8073782/8073782)
INFO 2025-03-07 10:06:52.587 Step [iterate_advisories] completed in 249 seconds (4.2 minutes)
INFO 2025-03-07 10:06:52.587 Pipeline completed in 249 seconds (4.2 minutes)
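
To make the difference between the two logs easier to see, the time spent on each 10% step can be computed from the log timestamps. The snippet below is only a small helper for reading the logs above; the timestamps are copied from the "Iterating over ..." line and each "Progress" line, and the function name `seconds_per_step` is made up for this example.

```python
from datetime import datetime

# Timestamps copied from the two logs: the "Iterating over ..." line,
# followed by each "Progress" line.
PAGINATED = [
    "08:56:41.796", "08:57:50.106", "09:00:09.226",
    "09:03:34.920", "09:09:04.444", "09:16:32.571",
]
ITERATOR = [
    "10:02:44.042", "10:05:00.181", "10:05:19.790", "10:05:37.013",
    "10:05:50.865", "10:06:01.161", "10:06:11.465", "10:06:21.761",
    "10:06:32.129", "10:06:42.248", "10:06:52.325",
]


def seconds_per_step(stamps):
    """Return the elapsed seconds between consecutive timestamps."""
    times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [round((b - a).total_seconds(), 1) for a, b in zip(times, times[1:])]


paginated_steps = seconds_per_step(PAGINATED)
iterator_steps = seconds_per_step(ITERATOR)
print("paginated():", paginated_steps)
print("iterator(): ", iterator_steps)
```

With paginated(), every 10% step takes longer than the previous one, going from about 68 seconds for the first step to about 448 seconds for the fifth. With iterator(), the first step takes about 136 seconds, and after that each step settles at roughly 10 seconds.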
