Ordering of results in run_over_distribution() does not match ordering in configuration list #133

Open
benmsanderson opened this issue May 9, 2024 · 3 comments

Comments

@benmsanderson
Collaborator

benmsanderson commented May 9, 2024

Calling a distributed case:
distrorun1 = DistributionRun(testconfig, numvalues=10000)

results1 = distrorun1.run_over_distribution(scendata, output_vars, max_workers=100)

where scendata is a list of configuration dictionaries and output_vars is a list of desired outputs.

The ordering of outputs in df = results1.timeseries() does not match that in distrorun1.cfgs, which renders any emulation impossible, since each output can no longer be matched to the configuration that produced it. I suspect this traces back to the parallel utility in openscmrunner:
https://github.com/openscm/openscm-runner/blob/main/src/openscm_runner/adapters/utils/_parallel_process.py

which (I think) doesn't preserve the ordering of the original configuration vector in the output.
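
For illustration, here is a minimal sketch of how ordering can be lost when results are gathered in completion order, and how an index carried by each config restores it. It is not confirmed that the linked utility works this way. `fake_run`, the `delay` entry, and the helper names are hypothetical stand-ins, and threads are used only to keep the example small.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_run(cfg):
    """Hypothetical stand-in for one model run; sleeps for cfg['delay'] seconds."""
    time.sleep(cfg["delay"])
    return {"Index": cfg["Index"], "value": cfg["Index"] ** 2}


def run_in_completion_order(cfgs, nworkers=4):
    """Collect results as they finish, so the order follows run time, not input."""
    with ThreadPoolExecutor(nworkers) as exe:
        futures = [exe.submit(fake_run, cfg) for cfg in cfgs]
        return [fut.result() for fut in as_completed(futures)]


def reorder_by_index(results, n):
    """Place each result in the slot named by its 'Index' entry."""
    ordered = [None] * n
    for res in results:
        ordered[int(res["Index"])] = res
    return ordered


# Later configs get shorter delays, so they tend to finish first.
cfgs = [{"Index": i, "delay": 0.05 * (4 - i)} for i in range(4)]
scrambled = run_in_completion_order(cfgs)
restored = reorder_by_index(scrambled, len(cfgs))
```

Here `scrambled` follows whatever order the runs happened to finish in, while `restored` lines up with `cfgs` again regardless of timing.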

Working on simpler code that bypasses openscmrunner for this...

@benmsanderson
Collaborator Author

benmsanderson commented May 9, 2024

Alright, this seems to work without calling openscmrunner ( https://github.com/ciceroOslo/ciceroscm/blob/calibration-workflow/notebooks/CSCM_calibrate.ipynb ):

Firstly, a quick wrapper to jointly return results and config, and handle crashes:

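def get_results(cfg):
    # cscm_dir is assumed to be a model instance defined elsewhere (see the linked notebook)
    try:
        cscm_dir._run(
            {"results_as_dict": True},
            pamset_udm=cfg["pamset_udm"],
            pamset_emiconc=cfg["pamset_emiconc"],
        )
        res = cscm_dir.results
    except Exception:
        # a crashed run is recorded as None instead of stopping the ensemble
        res = None
    # return the config alongside the result so the two stay paired
    return [cfg, res]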

then:

from concurrent.futures import ProcessPoolExecutor
from tqdm import tqdm

def run_parallel(cfgs, nworkers=4):
    results = len(cfgs) * [None]
    with ProcessPoolExecutor(nworkers) as exe:
        # execute tasks concurrently; exe.map yields results in input order
        pres = list(tqdm(exe.map(get_results, cfgs), total=len(cfgs)))
    for cfg, res in pres:
        # get the corresponding index of the config
        ind = int(cfg["Index"])
        # put it in the right element of the results vector
        results[ind] = res
    return results
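
As a side note, `Executor.map` in `concurrent.futures` is documented to behave like the built-in `map`, so it hands results back in input order even when later tasks finish first. Under that reading, the 'Index' lookup above acts as an extra safeguard. A small check, with `slow_square` and its delays as hypothetical placeholders and threads used for brevity:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(item):
    """Hypothetical task: earlier inputs are given longer delays."""
    index, delay = item
    time.sleep(delay)
    return index * index


# The first item sleeps longest, so it is the last to finish.
items = [(i, 0.05 * (3 - i)) for i in range(4)]
with ThreadPoolExecutor(4) as exe:
    squares = list(exe.map(slow_square, items))
```

`squares` comes back as `[0, 1, 4, 9]`, matching the order of `items` rather than the order of completion.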

so, in use - you do:

distrorun1 = DistributionRun(testconfig, numvalues=10000)
results=run_parallel(distrorun1.cfgs,nworkers=100)
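
Because crashed runs are stored as None, the configs and results usually need to be filtered together before any emulation step. A minimal sketch of one way to do that, assuming results is aligned with the list of configs as above; `pair_successful_runs` and the example data (including the "T" key) are hypothetical:

```python
def pair_successful_runs(cfgs, results):
    """Return (cfg, result) pairs, skipping runs that returned None."""
    if len(cfgs) != len(results):
        raise ValueError("cfgs and results must have the same length")
    return [(cfg, res) for cfg, res in zip(cfgs, results) if res is not None]


# Hypothetical data: the second run crashed and was stored as None.
example_cfgs = [{"Index": 0}, {"Index": 1}, {"Index": 2}]
example_results = [{"T": 1.1}, None, {"T": 1.3}]
pairs = pair_successful_runs(example_cfgs, example_results)
```

Filtering the two lists in one pass keeps each surviving result next to the parameters that produced it, which is the pairing the emulation relies on.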

@maritsandstad
Contributor

This is nice, I will take a look at implementing this on Monday, @benmsanderson. In the meantime, a quick review of #131 so we have a working version would be really great ;-)

@benmsanderson
Collaborator Author

benmsanderson commented May 12, 2024

Will try to look tomorrow. In the meantime, a simple parallel implementation without openscmrunner is here; it works fine on qbo (working on the emulation/optimization now): https://github.com/ciceroOslo/ciceroscm/blob/calibration-workflow/notebooks/calibration%20pipeline/1%20-%20run%20parallel%20PPE.ipynb
