Archive: Respect `filter_size` in query for existing nodes #6404

sphuber · 2024-05-20T14:27:41Z

The `QueryParams` dataclass defines the `filter_size` attribute, which is used in all queries to limit the number of parameters per query. Without this limit, large archives would produce queries with so many parameters that some database backends raise exceptions; SQLite, for example, defines a limit of 1000 by default.

The `aiida.tools.archive._import_nodes` function did not respect this setting when determining which nodes in the archive already exist in the target storage. As a result, importing a large archive into a storage using SQLite raised an exception. The fix uses the `batch_iter` utility to retrieve the existing UUIDs in batches of size `filter_size`.
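
To illustrate the idea, here is a minimal, self-contained sketch of batching an `IN (...)` lookup so that no single query exceeds a parameter limit. It is not the actual AiiDA implementation: it uses the standard-library `sqlite3` module directly, and the `iter_batches` and `find_existing_uuids` helpers, the `node` table, and the batch sizes are all assumptions made for illustration. `iter_batches` is only a hypothetical stand-in for `batch_iter`.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Set


def iter_batches(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most ``size`` items.

    Hypothetical stand-in for the ``batch_iter`` utility mentioned above.
    """
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def find_existing_uuids(
    conn: sqlite3.Connection, candidate_uuids: Iterable[str], filter_size: int = 999
) -> Set[str]:
    """Return the candidate UUIDs already present in the (assumed) ``node`` table.

    Each query binds at most ``filter_size`` parameters, so the backend's
    limit on bound parameters is never exceeded. The default is illustrative.
    """
    existing: Set[str] = set()
    for batch in iter_batches(candidate_uuids, filter_size):
        placeholders = ', '.join('?' for _ in batch)
        rows = conn.execute(f'SELECT uuid FROM node WHERE uuid IN ({placeholders})', batch)
        existing.update(row[0] for row in rows)
    return existing


# Toy target storage: only the even-numbered UUIDs already exist.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE node (uuid TEXT PRIMARY KEY)')
conn.executemany('INSERT INTO node (uuid) VALUES (?)', [(f'uuid-{i}',) for i in range(0, 5000, 2)])

# Toy archive: 5000 UUIDs, looked up in batches of 500 instead of one large query.
archive_uuids = [f'uuid-{i}' for i in range(5000)]
existing = find_existing_uuids(conn, archive_uuids, filter_size=500)
```

The design choice is to trade one large query for several smaller ones: the result set is the same, but each statement stays within the parameter limit of the backend.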

sphuber · 2024-05-20T17:25:43Z

With this fix, I have successfully imported the MC3D archive into a `core.sqlite_dos` profile.

GeigerJ2

I also just tried it, and it worked for me as well. So this seems ready to go.

sphuber requested a review from GeigerJ2 May 20, 2024 14:27

sphuber force-pushed the fix/6402/archive-import-filter-size branch from 26c4c2a to b81ed20 Compare May 20, 2024 18:29

GeigerJ2 approved these changes May 21, 2024

sphuber merged commit ef60b66 into aiidateam:main May 21, 2024
18 of 19 checks passed

sphuber deleted the fix/6402/archive-import-filter-size branch May 21, 2024 12:19
