
[WIP] Coffea2023 #14
Open · wants to merge 50 commits into main

Conversation

kmohrman (Collaborator)

No description provided.

analysis/wwz/run_wwz4l.py (Outdated)
mask = (lep_collection.idx == ak.flatten(ll_pairs_idx.l0[sfos_pair_closest_to_z_idx]))
mask = (mask | (lep_collection.idx == ak.flatten(ll_pairs_idx.l1[sfos_pair_closest_to_z_idx])))
flat_pair_idxs = ak.flatten(ll_pairs_idx[sfos_pair_closest_to_z_idx])
mask = ((lep_local_idx == flat_pair_idxs.l0) | (lep_local_idx == flat_pair_idxs.l1))

can you ... unfix ... this for now - some other people are looking at the over-touching issue in your code in situ :-)

kmohrman (Collaborator, Author)

Unfixed : )

@@ -43,7 +43,6 @@
"/store/user/kdownham/skimOutput/3LepTau_4Lep/DoubleEG_Run2016D-HIPM_UL2016_MiniAODv2_NanoAODv9-v2_NANOAOD_3LepTau_4Lep/output_1.root",
"/store/user/kdownham/skimOutput/3LepTau_4Lep/DoubleEG_Run2016D-HIPM_UL2016_MiniAODv2_NanoAODv9-v2_NANOAOD_3LepTau_4Lep/output_45.root",
"/store/user/kdownham/skimOutput/3LepTau_4Lep/DoubleEG_Run2016D-HIPM_UL2016_MiniAODv2_NanoAODv9-v2_NANOAOD_3LepTau_4Lep/output_38.root",
"/store/user/kdownham/skimOutput/3LepTau_4Lep/DoubleEG_Run2016D-HIPM_UL2016_MiniAODv2_NanoAODv9-v2_NANOAOD_3LepTau_4Lep/output_15.root",

So this file opens and the tree can be accessed, but the tree is completely empty when checked by hand?

preprocess should just remove files like that when skip_bad_files is turned on.
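
(A minimal sketch of the skip_bad_files suggestion, assuming coffea's dataset_tools.preprocess interface; the dataset name, file path, and step size below are illustrative placeholders, not taken from this PR.)

from coffea.dataset_tools import preprocess

# Illustrative fileset in the coffea "dataset -> files -> treename" layout
fileset = {
    "DoubleEG_Run2016D": {
        "files": {
            "/store/user/.../output_1.root": "Events",  # placeholder path
        }
    }
}

# With skip_bad_files=True, files that fail to open (or whose tree is
# missing) are dropped from the runnable output instead of raising.
dataset_runnable, dataset_updated = preprocess(
    fileset,
    step_size=100_000,
    skip_bad_files=True,
)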

kmohrman (Collaborator, Author)

Checking by hand, it seems the file has zero events. I have since found a few others that are empty as well and will probably remove those too (and I'll talk with the person who did the skimming to try to understand why their skimming code is producing some empty files).

The reason I noticed this now is that these empty files seem to cause a crash in apply_to_fileset like this [1]. (I guess previously they'd just been getting essentially skipped, but not causing a crash.)

I would hesitate to use skip_bad_files as a solution, as generally I would want to process exactly the same static set of input samples every run. This is to avoid getting unpredictably different results in different runs (due to instances where e.g. some transient xrd error causes one file to not be opened properly).

[1]

Traceback (most recent call last):
  File "/home/k.mohrman/coffea_dir/migrate_to_coffea2023_repo/ewkcoffea/analysis/wwz/run_wwz4l.py", line 356, in <module>
    histos_to_compute, reports = apply_to_fileset(
  File "/home/k.mohrman/coffea_dir/migrate_to_coffea2023_repo/ewkcoffea/coffea_dir/coffea/src/coffea/dataset_tools/apply_processor.py", line 125, in apply_to_fileset
    dataset_out = apply_to_dataset(
  File "/home/k.mohrman/coffea_dir/migrate_to_coffea2023_repo/ewkcoffea/coffea_dir/coffea/src/coffea/dataset_tools/apply_processor.py", line 67, in apply_to_dataset
    events = NanoEventsFactory.from_root(
  File "/home/k.mohrman/coffea_dir/migrate_to_coffea2023_repo/ewkcoffea/coffea_dir/coffea/src/coffea/nanoevents/factory.py", line 674, in events
    events = self._mapping(form_mapping=self._schema)
  File "/blue/p.chang/k.mohrman/dir_for_miniconda/miniconda3/envs/coffea2023_env00/lib/python3.9/site-packages/uproot/_dask.py", line 167, in dask
    files = uproot._util.regularize_files(files, steps_allowed=True, **options)
  File "/blue/p.chang/k.mohrman/dir_for_miniconda/miniconda3/envs/coffea2023_env00/lib/python3.9/site-packages/uproot/_util.py", line 927, in regularize_files
    for file_path, object_path, maybe_steps in _regularize_files_inner(
  File "/blue/p.chang/k.mohrman/dir_for_miniconda/miniconda3/envs/coffea2023_env00/lib/python3.9/site-packages/uproot/_util.py", line 888, in _regularize_files_inner
    maybe_steps = regularize_steps(maybe_steps)
  File "/blue/p.chang/k.mohrman/dir_for_miniconda/miniconda3/envs/coffea2023_env00/lib/python3.9/site-packages/uproot/_util.py", line 786, in regularize_steps
    raise TypeError(
TypeError: 'files' argument's steps must be an iterable of integer offsets or start-stop pairs.
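
(A minimal sketch of the "static fileset" approach described above: instead of relying on skip_bad_files, prune a hand-curated list of known-empty files from the input dictionary before calling apply_to_fileset. The fileset layout follows the coffea "dataset -> {'files': {path: treename}}" convention; the path in KNOWN_EMPTY_FILES is a placeholder, not one of the files from this PR.)

# Hand-curated set of files found to be empty by manual inspection
KNOWN_EMPTY_FILES = {
    "/store/user/.../some_empty_file.root",  # placeholder path
}

def drop_known_empty(fileset):
    """Return a copy of the fileset with the known-empty files removed."""
    pruned = {}
    for dataset, spec in fileset.items():
        kept = {
            path: tree
            for path, tree in spec["files"].items()
            if path not in KNOWN_EMPTY_FILES
        }
        pruned[dataset] = {**spec, "files": kept}
    return pruned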

@lgray (Jan 12, 2024)

OK - then a better solution is to return [[0,0]] for the steps in those cases and supply a dataset transformation to remove those files.

It would be fairly straightforward to implement filter_files that takes some function you define and just drops files when that function returns True.
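
(A sketch of what such a filter_files helper could look like; this is hypothetical code, not an existing coffea API at the time of this comment. It assumes the post-preprocess fileset layout in which each file entry is a dict carrying fields such as "steps" and "num_entries".)

def filter_files(fileset, predicate):
    """Drop every file for which predicate(path, file_info) returns True."""
    out = {}
    for dataset, spec in fileset.items():
        kept = {
            path: info
            for path, info in spec["files"].items()
            if not predicate(path, info)
        }
        out[dataset] = {**spec, "files": kept}
    return out

# Example predicate: treat a file as bad if preprocess recorded no events,
# e.g. steps of [[0, 0]] or num_entries == 0.
def is_empty_file(path, info):
    return info.get("num_entries", 0) == 0 or info.get("steps") in ([[0, 0]], [])

# Usage (assuming preprocessed_fileset is the output of preprocess):
#     filtered_fileset = filter_files(preprocessed_fileset, is_empty_file)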
