Write speedups for multi-tiles runs #461

Open · wants to merge 2 commits into main

Conversation

araichoor
Contributor

This PR adds a couple of simple but efficient "tricks" to speed up the writing of the individual fba-TILEID.fits files:

  • allow parallelisation in write_assignment_fits(): when N>1 tiles are run at once, the current main writes them sequentially;
  • add a fast_match argument to the TargetTagalong class: it uses the desitarget.geomask.match() function instead of building a dictionary keyed by TARGETID, which takes a long time when dealing with 10M+ targets (see the sketch after this list).
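
To illustrate the fast_match idea, here is a toy sketch, not the actual TargetTagalong code: in the PR the real work is done by desitarget.geomask.match(), which is emulated below with numpy.searchsorted, hypothetical function/array names, and the assumption of unique TARGETIDs.

import numpy as np

def fill_column_slow(assigned_tid, targ_tid, targ_col):
    # dict-based fill: one Python dict entry per input target,
    # which is what gets slow for 10M+ targets
    lookup = {tid: i for i, tid in enumerate(targ_tid)}
    return np.array([targ_col[lookup[tid]] for tid in assigned_tid])

def fill_column_fast(assigned_tid, targ_tid, targ_col):
    # vectorized fill, same spirit as desitarget.geomask.match():
    # sort once, then locate every assigned TARGETID in a single call;
    # assumes both TARGETID arrays are unique and every assigned_tid is present
    order = np.argsort(targ_tid)
    idx = order[np.searchsorted(targ_tid, assigned_tid, sorter=order)]
    return targ_col[idx]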

These two "tricks" are controlled by two new arguments in parse_assign() in fiberassign/scripts/assign.py:

  • --write_fits_numproc (default=0): number of jobs to be run in parallel when calling write_assignment_fits();
  • --fast_match (default=False).
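
For example, a many-tile run could then be launched with something like the following (hypothetical invocation through the fba_run entry point, which uses parse_assign(); the other required arguments are omitted):

fba_run ... --write_fits_numproc 64 --fast_match True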

The use case is running N>>1 tiles at once with tens of millions of targets (I developed this to explore the expected fiber assignment when the GD1 tiles are added to the Bright program).
The default values disable the "tricks", so the change is backwards-compatible and will change nothing in operations.

Test case with the GD1 tiles:
This case had ~800 tiles and 7M targets plus 43M sky targets, i.e. 50M targets in total.
The fiber assignment computation itself is reasonably fast (~1h), but the writing part was not.
Here are a few timings to write the ~800 fba-TILEID.fits files (i.e. once the fiber assignment is computed):

  • current main: 6h30;
  • --fast_match True --write_fits_numproc 0: 1h30;
  • --fast_match True --write_fits_numproc 64: 13min;
  • --fast_match True --write_fits_numproc 128: 10min (no significant improvement w.r.t. 64 processes).

Note about the fast_match trick:
It assumes that the input TARGETID values are unique.
If that is not the case, the results will be wrong (i.e. the column values will be wrongly filled); but that is probably also the case for the current main... (I don't have a clear, in-depth understanding of the process).
As the goal is speed, no checks are done (uniqueness of the input TARGETID, correctness of the matching).
Note to myself: a sanity check is planned to be implemented in desitarget.geomask.match() (desihub/desitarget#811); once that is done, I will have to make sure that such a check is disabled in the call here, to preserve the speed-up.
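
A toy example of the failure mode (illustrative numpy only, not the actual code): with a duplicated TARGETID, a vectorized match silently picks one of the duplicated rows, and no error is raised.

import numpy as np

targ_tid = np.array([1001, 1002, 1002, 1003])    # 1002 is duplicated...
targ_flux = np.array([10.0, 20.0, 99.0, 30.0])   # ...with two different flux values

assigned_tid = np.array([1002, 1003])
order = np.argsort(targ_tid, kind="stable")
idx = order[np.searchsorted(targ_tid, assigned_tid, sorter=order)]
print(targ_flux[idx])   # [20. 30.]: the 99.0 row is silently ignored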

For sanity, I've rerun 1% of the main tiles (125 backup, bright, and dark tiles) with the current main and with the --fast_match option turned on, and I get no difference in the output data.

Note about the write_fits_numproc trick:
I had to use multiprocessing.pool.ThreadPool instead of the "usual" multiprocessing.Pool, because the latter was crashing with the Assignment() object asgn.
I found this workaround with a quick Google search and it worked.
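
For reference, here is a stripped-down sketch of the pattern (names and dummy values are illustrative; the real per-tile writer is write_assignment_fits_tile in fiberassign/assign.py). ThreadPool workers are threads in the same process, so the non-picklable Assignment object never has to be serialized, which is what multiprocessing.Pool chokes on:

from multiprocessing.pool import ThreadPool

def write_one_tile(tile_id, asgn, out_dir):
    # stand-in for the real per-tile FITS writer
    print(f"writing tile {tile_id} to {out_dir}")

asgn = object()            # stands in for the (non-picklable) Assignment object
tile_ids = [1000, 1001, 1002]
out_dir = "."
numproc = 2                # --write_fits_numproc

all_params = [(tid, asgn, out_dir) for tid in tile_ids]
with ThreadPool(numproc) as pool:
    pool.starmap(write_one_tile, all_params)

Given the large speedup observed with 64 threads, the per-tile writing presumably spends most of its time in I/O or compiled code rather than holding the GIL.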

@dstndstn: in case you have time to have a look, I'd appreciate it! But there is no rush.

@sbailey
Contributor

sbailey commented Feb 26, 2024

These speedups look great. If they produce algorithmically identical results and/or don't make things slower for the standard single tile case, I encourage you to make the --write_fits_numproc and --fast_match options default to what you normally would want to use, instead of having to opt-in to get good performance. Though consider defaulting to not using the multiprocessing pool for the ntile=1 case to avoid a layer of spawning complexity for normal ops. I'm a little surprised that Pool didn't work and ThreadPool did, but haven't investigated myself.

Assuming uniqueness of TARGETID and silently doing the wrong thing sounds dangerous. Using the TARGETID column from /global/cfs/cdirs/desi/spectro/redux/iron/zcatalog/zall-tilecumulative-iron.fits, it takes 2.4 seconds to check 30 million targets for len(np.unique(t['TARGETID'])) == len(t['TARGETID']). Compared to a 10 minute overall runtime, those few seconds seem worth it. If that assumption is ever wrong, it will take way more than 10 minutes of human time to mop up the mess after we discover the wrong columns years later...
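
For concreteness, the suggested guard is essentially the following (illustrative snippet, not existing fiberassign code):

import numpy as np

def check_unique_targetid(tid):
    # tid: the TARGETID column of the input target table
    if len(np.unique(tid)) != len(tid):
        raise ValueError("duplicate TARGETID values: fast_match would silently mis-fill columns")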

@araichoor
Contributor Author

thanks for the prompt comments!

let me first answer about ThreadPool:
sorry, I didn't provide the example of the crash.
here's what I get if I use Pool on a simple test case (i.e. if I use mp.Pool(numproc) instead of ThreadPool(numproc)):

Traceback (most recent call last):
  File "/global/cfs/cdirs/desicollab/users/raichoor/fiberassign-designs/fiberassign-gd1-scnd/test/gd1.py", line 401, in <module>
    main()
  File "/global/cfs/cdirs/desicollab/users/raichoor/fiberassign-designs/fiberassign-gd1-scnd/test/gd1.py", line 382, in main
    run(fbadir, tilesfn, targfn, skiesfn, rundate, args.numproc)
  File "/global/cfs/cdirs/desicollab/users/raichoor/fiberassign-designs/fiberassign-gd1-scnd/test/gd1.py", line 259, in run
    run_assign_full(ag)
  File "/global/homes/r/raichoor/software_dev/fiberassign_write_speedups/py/fiberassign/scripts/assign.py", line 451, in run_assign_full
    write_assignment_fits(tiles, tagalong, asgn, out_dir=args.dir,
  File "/global/homes/r/raichoor/software_dev/fiberassign_write_speedups/py/fiberassign/assign.py", line 655, in write_assignment_fits
    _ = pool.starmap(write_assignment_fits_tile, all_params)
  File "/global/common/software/desi/perlmutter/desiconda/20230111-2.1.0/conda/lib/python3.10/multiprocessing/pool.py", line 375, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/global/common/software/desi/perlmutter/desiconda/20230111-2.1.0/conda/lib/python3.10/multiprocessing/pool.py", line 774, in get
    raise self._value
  File "/global/common/software/desi/perlmutter/desiconda/20230111-2.1.0/conda/lib/python3.10/multiprocessing/pool.py", line 540, in _handle_tasks
    put(task)
  File "/global/common/software/desi/perlmutter/desiconda/20230111-2.1.0/conda/lib/python3.10/multiprocessing/connection.py", line 211, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/global/common/software/desi/perlmutter/desiconda/20230111-2.1.0/conda/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'fiberassign._internal.Assignment' object
