It is best to run this in a clean conda environment.
# get analysis code and install as package
git clone git@github.com:andrzejnovak/boostedhiggs.git
cd boostedhiggs
pip install -e .
cd ..
# get runner code
git clone git@github.com:andrzejnovak/nanocc.git
cd nanocc
# initiate a grid proxy, e.g.
# voms-proxy-init --voms cms:/cms/dcms --valid 168:00 --vomses ~/.grid-security/vomses/
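Before launching a job it can be handy to confirm that the proxy is still valid. The following is a minimal sketch, not part of nanocc: `check_proxy` is a hypothetical helper name, the one-hour default is an arbitrary choice, and it assumes `voms-proxy-info --timeleft` prints the remaining lifetime in seconds, as it does in standard VOMS client installations.

```shell
# Hypothetical helper (not part of nanocc): succeed only if a proxy exists
# with at least the requested lifetime left, given in seconds (default 3600).
check_proxy() {
    min_seconds="${1:-3600}"
    if ! command -v voms-proxy-info >/dev/null 2>&1; then
        echo "voms-proxy-info not found; is the grid software set up?" >&2
        return 2
    fi
    # Assumption: --timeleft prints the remaining lifetime in seconds.
    left="$(voms-proxy-info --timeleft 2>/dev/null)"
    # Treat empty or non-numeric output as "no usable proxy".
    case "$left" in
        ''|*[!0-9]*) left=0 ;;
    esac
    if [ "$left" -lt "$min_seconds" ]; then
        echo "proxy missing or expiring in ${left}s; re-run voms-proxy-init" >&2
        return 1
    fi
    echo "proxy valid for ${left}s"
}
```

Calling `check_proxy 7200 && python runner.py ...` would then start the runner only if at least two hours of proxy lifetime remain.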
Use --executor iterative to run in a single process for debugging, or --executor futures for local multiprocessing.
python runner.py --id test17 --json metadata/v2x17.json --year 2017 --limit 1 --chunk 5000 --max 2 --executor futures -j 5
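When debugging, it can help to keep the test command fixed and vary only the executor. The wrapper below is a small sketch of that idea: `run_test` and the `DRY_RUN` variable are hypothetical names introduced here, while the runner.py options are copied from the test command above. It appends `-j 5` only for the futures executor, mirroring the commands shown in this README.

```shell
# Hypothetical wrapper: run the README's test command with a chosen executor.
# Usage: run_test [iterative|futures]   (defaults to iterative)
run_test() {
    executor="${1:-iterative}"
    # Build the command as positional parameters so quoting stays intact.
    set -- python runner.py --id test17 --json metadata/v2x17.json \
        --year 2017 --limit 1 --chunk 5000 --max 2 --executor "$executor"
    # -j 5 appears in this README only together with the futures executor.
    if [ "$executor" = "futures" ]; then
        set -- "$@" -j 5
    fi
    # With DRY_RUN=1 only print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 run_test futures` prints the command that would run, and `run_test iterative` runs the single-process variant from inside the nanocc directory.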
Scale-out depends on the cluster setup. If the cluster is sufficiently permissive, the command below might run right away; otherwise some editing of runner.py and of the HighThroughputExecutor configuration used with --executor parsl will be necessary. The same applies to --executor dask.
python runner.py --id test17 --json metadata/v2x17.json --year 2017 --limit 1 --chunk 5000 --max 2 --executor parsl
Once the test run succeeds, drop the test limiters (the command below omits --limit, --chunk and --max):
python runner.py --id test17 --json metadata/v2x17.json --year 2017 --executor parsl