
MultiBackendJobManager.run_jobs() doesn't add new jobs to existing job_tracker #558

Open
VincentVerelst opened this issue Apr 16, 2024 · 2 comments


@VincentVerelst (Contributor)

The MultiBackendJobManager.run_jobs() method takes two inputs: df, a DataFrame describing all the jobs to run, and output_file, the path to a CSV file used to track the status of those jobs.
If the output_file already exists, however, run_jobs() ignores the df input and simply resumes from the jobs already listed in the output_file, as seen in the code below:

output_file = Path(output_file)
if output_file.exists() and output_file.is_file():
    # Resume from existing CSV
    _log.info(f"Resuming `run_jobs` from {output_file.absolute()}")
    df = pd.read_csv(output_file)
    status_histogram = df.groupby("status").size().to_dict()
    _log.info(f"Status histogram: {status_histogram}")

As a result, once a MultiBackendJobManager has been run with a given output_file, a second run with the same output_file cannot add any new jobs.
Would it be possible, when output_file already exists, for run_jobs() to create the union of the input df and the existing output_file? Or is there a good reason not to? A sketch of what that could look like is shown below.
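For illustration, a minimal sketch of such a union, assuming each job can be identified by some unique key column (the "id" column below is hypothetical, not something run_jobs guarantees):

import pandas as pd
from pathlib import Path

def merge_job_dfs(df: pd.DataFrame, output_file: Path) -> pd.DataFrame:
    """Union of the input job DataFrame and an existing job tracker CSV.

    Rows already tracked in the CSV keep their recorded status;
    rows from `df` that are not in the CSV yet are appended as new jobs.
    """
    existing = pd.read_csv(output_file)
    # Hypothetical assumption: an "id" column uniquely identifies each job.
    new_rows = df[~df["id"].isin(existing["id"])]
    return pd.concat([existing, new_rows], ignore_index=True)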

@soxofaan (Member) commented Apr 22, 2024

I haven't played a lot with MultiBackendJobManager myself and, to be honest, don't know the practical usage details.

@jdries (Collaborator) commented Apr 22, 2024

@VincentVerelst this is certainly a possibility. I suggest that data engineering is free to extend this job manager as needed.
The main reason not to do it would be to avoid unexpected behaviour: you really don't want your job CSV to get corrupted and lose all its info.
What I have sometimes done in the past is use a separate script to make the necessary updates to the CSV while the job manager script is stopped, and then, after verifying the CSV, restart the job manager with the updated job list. A sketch of such a script follows below.
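For illustration, a minimal sketch of such an offline update script (the file names are hypothetical, and the assumption that rows with status "not_started" are picked up as new jobs should be verified against the tracker CSV at hand):

import pandas as pd

TRACKER_CSV = "jobs.csv"       # the run_jobs output_file (hypothetical name)
NEW_JOBS_CSV = "new_jobs.csv"  # extra jobs to append (hypothetical name)

# Run this only while the job manager script is stopped.
tracker = pd.read_csv(TRACKER_CSV)
new_jobs = pd.read_csv(NEW_JOBS_CSV)

# Mark the added rows as not started yet (assumed status convention).
new_jobs["status"] = "not_started"

updated = pd.concat([tracker, new_jobs], ignore_index=True)

# Write to a separate file first, so the original tracker CSV cannot get
# corrupted; rename it over TRACKER_CSV only after manual verification.
updated.to_csv(TRACKER_CSV + ".tmp", index=False)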
