yaml parser: investigate use of multiprocessing to parallelise loading YAML #51

GraemeWatt · 2022-11-15T18:13:46Z

The Kubernetes pods used in production each have 16 CPUs (16 sockets with 1 core per socket and 1 thread per core). Using the Python multiprocessing package could potentially speed up the parsing of large submissions by parallelising the loop over data tables:

hepdata-converter/hepdata_converter/parsers/yaml_parser.py, line 83 in 3f0330d:

for i in range(0, len(submission_data)):

It looks like an attempt to use multiprocessing.Pool was started in 980dd23 but later removed in 4b1ad68. If successful, the use of multiprocessing could be extended to other parts of the converter code.
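
As a rough illustration of the idea (not the code from 980dd23, and not the converter's actual API), the loop over data tables could be handed to a `multiprocessing.Pool` along the lines below. The names `parse_table` and `parse_tables` and the shape of each table entry are assumptions made for this sketch; in the real parser the per-table work would be loading that table's YAML data file.

```python
import multiprocessing


def parse_table(entry):
    """Hypothetical stand-in for the work done per table inside the loop.

    In the real converter this is where a table's YAML data file would be
    loaded. Here the entry is only tagged, so the sketch needs no
    third-party packages.
    """
    index, table = entry
    return index, {"name": table["name"], "parsed": True}


def parse_tables(submission_data, processes=None):
    """Parse every table, spreading the work over worker processes.

    processes=None lets Pool use os.cpu_count() workers.
    On platforms that start workers with "spawn", call this from code
    guarded by `if __name__ == "__main__":`.
    """
    entries = list(enumerate(submission_data))
    if processes == 1 or len(entries) < 2:
        # Serial fallback: avoids process start-up cost for small inputs.
        return [parse_table(entry) for entry in entries]
    with multiprocessing.Pool(processes=processes) as pool:
        # Pool.map returns results in the same order as the input list.
        return pool.map(parse_table, entries)
```

Whether this helps in practice would need measuring: the worker function and its arguments must be picklable, and each parsed table is pickled back to the parent process, which may offset some of the gain for very large tables.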

GraemeWatt added the enhancement label Nov 15, 2022

GraemeWatt mentioned this issue Apr 26, 2024

converter: large records timeout after 220 seconds HEPData/hepdata#788

Open
