
[WIP] Coffea2023 #14
Open · wants to merge 50 commits into main

Conversation

kmohrman (Collaborator)

No description provided.

analysis/wwz/run_wwz4l.py (Outdated)
mask = (lep_collection.idx == ak.flatten(ll_pairs_idx.l0[sfos_pair_closest_to_z_idx]))
mask = (mask | (lep_collection.idx == ak.flatten(ll_pairs_idx.l1[sfos_pair_closest_to_z_idx])))
flat_pair_idxs = ak.flatten(ll_pairs_idx[sfos_pair_closest_to_z_idx])
mask = ((lep_local_idx == flat_pair_idxs.l0) | (lep_local_idx == flat_pair_idxs.l1))

can you ... unfix ... this for now - some other people are looking at the over-touching issue in your code in situ :-)

kmohrman (Collaborator, Author)

Unfixed : )

@@ -43,7 +43,6 @@
"/store/user/kdownham/skimOutput/3LepTau_4Lep/DoubleEG_Run2016D-HIPM_UL2016_MiniAODv2_NanoAODv9-v2_NANOAOD_3LepTau_4Lep/output_1.root",
"/store/user/kdownham/skimOutput/3LepTau_4Lep/DoubleEG_Run2016D-HIPM_UL2016_MiniAODv2_NanoAODv9-v2_NANOAOD_3LepTau_4Lep/output_45.root",
"/store/user/kdownham/skimOutput/3LepTau_4Lep/DoubleEG_Run2016D-HIPM_UL2016_MiniAODv2_NanoAODv9-v2_NANOAOD_3LepTau_4Lep/output_38.root",
"/store/user/kdownham/skimOutput/3LepTau_4Lep/DoubleEG_Run2016D-HIPM_UL2016_MiniAODv2_NanoAODv9-v2_NANOAOD_3LepTau_4Lep/output_15.root",

So this file opens and the tree can be accessed, but the tree is completely empty when checked by hand?

preprocess should just remove files like that when skip_bad_files is turned on.
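
(A minimal sketch of the skip_bad_files suggestion, assuming coffea's dataset_tools.preprocess interface; the dataset name, file path, and step size below are illustrative placeholders, not taken from this PR.)

from coffea.dataset_tools import preprocess

# Illustrative fileset in the coffea "dataset -> files -> treename" layout
fileset = {
    "DoubleEG_Run2016D": {
        "files": {
            "/store/user/.../output_1.root": "Events",  # placeholder path
        }
    }
}

# With skip_bad_files=True, files that fail to open (or whose tree is
# missing) are dropped from the runnable output instead of raising.
dataset_runnable, dataset_updated = preprocess(
    fileset,
    step_size=100_000,
    skip_bad_files=True,
)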

kmohrman (Collaborator, Author)

Checking by hand, it seems the file has zero events. I have since found a few others that are empty as well and will probably remove those too (and I'll talk with the person who did the skimming to try to understand why their skimming code is producing some empty files).

The reason I noticed this now is that these empty files seem to cause a crash in apply_to_fileset like this [1]. (I guess previously they'd just been getting essentially skipped, but not causing a crash.)

I would hesitate to use skip_bad_files as a solution, as generally I would want to process exactly the same static set of input samples every run. This is to avoid getting unpredictably different results in different runs (due to instances where e.g. some transient xrd error causes one file to not be opened properly).

[1]

Traceback (most recent call last):
  File "/home/k.mohrman/coffea_dir/migrate_to_coffea2023_repo/ewkcoffea/analysis/wwz/run_wwz4l.py", line 356, in <module>
    histos_to_compute, reports = apply_to_fileset(
  File "/home/k.mohrman/coffea_dir/migrate_to_coffea2023_repo/ewkcoffea/coffea_dir/coffea/src/coffea/dataset_tools/apply_processor.py", line 125, in apply_to_fileset
    dataset_out = apply_to_dataset(
  File "/home/k.mohrman/coffea_dir/migrate_to_coffea2023_repo/ewkcoffea/coffea_dir/coffea/src/coffea/dataset_tools/apply_processor.py", line 67, in apply_to_dataset
    events = NanoEventsFactory.from_root(
  File "/home/k.mohrman/coffea_dir/migrate_to_coffea2023_repo/ewkcoffea/coffea_dir/coffea/src/coffea/nanoevents/factory.py", line 674, in events
    events = self._mapping(form_mapping=self._schema)
  File "/blue/p.chang/k.mohrman/dir_for_miniconda/miniconda3/envs/coffea2023_env00/lib/python3.9/site-packages/uproot/_dask.py", line 167, in dask
    files = uproot._util.regularize_files(files, steps_allowed=True, **options)
  File "/blue/p.chang/k.mohrman/dir_for_miniconda/miniconda3/envs/coffea2023_env00/lib/python3.9/site-packages/uproot/_util.py", line 927, in regularize_files
    for file_path, object_path, maybe_steps in _regularize_files_inner(
  File "/blue/p.chang/k.mohrman/dir_for_miniconda/miniconda3/envs/coffea2023_env00/lib/python3.9/site-packages/uproot/_util.py", line 888, in _regularize_files_inner
    maybe_steps = regularize_steps(maybe_steps)
  File "/blue/p.chang/k.mohrman/dir_for_miniconda/miniconda3/envs/coffea2023_env00/lib/python3.9/site-packages/uproot/_util.py", line 786, in regularize_steps
    raise TypeError(
TypeError: 'files' argument's steps must be an iterable of integer offsets or start-stop pairs.
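
(A minimal sketch of the "static fileset" approach described above: instead of relying on skip_bad_files, prune a hand-curated list of known-empty files from the input dictionary before calling apply_to_fileset. The fileset layout follows the coffea "dataset -> {'files': {path: treename}}" convention; the path in KNOWN_EMPTY_FILES is a placeholder, not one of the files from this PR.)

# Hand-curated set of files found to be empty by manual inspection
KNOWN_EMPTY_FILES = {
    "/store/user/.../some_empty_file.root",  # placeholder path
}

def drop_known_empty(fileset):
    """Return a copy of the fileset with the known-empty files removed."""
    pruned = {}
    for dataset, spec in fileset.items():
        kept = {
            path: tree
            for path, tree in spec["files"].items()
            if path not in KNOWN_EMPTY_FILES
        }
        pruned[dataset] = {**spec, "files": kept}
    return pruned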

@lgray (Jan 12, 2024)

OK - then a better solution is to return [[0,0]] for the steps in those cases and supply a dataset transformation to remove those files.

It would be fairly straightforward to implement filter_files that takes some function you define and just drops files when that function returns True.
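
(A sketch of what such a filter_files helper could look like; this is hypothetical code, not an existing coffea API at the time of this comment. It assumes the post-preprocess fileset layout in which each file entry is a dict carrying fields such as "steps" and "num_entries".)

def filter_files(fileset, predicate):
    """Drop every file for which predicate(path, file_info) returns True."""
    out = {}
    for dataset, spec in fileset.items():
        kept = {
            path: info
            for path, info in spec["files"].items()
            if not predicate(path, info)
        }
        out[dataset] = {**spec, "files": kept}
    return out

# Example predicate: treat a file as bad if preprocess recorded no events,
# e.g. steps of [[0, 0]] or num_entries == 0.
def is_empty_file(path, info):
    return info.get("num_entries", 0) == 0 or info.get("steps") in ([[0, 0]], [])

# Usage (assuming preprocessed_fileset is the output of preprocess):
#     filtered_fileset = filter_files(preprocessed_fileset, is_empty_file)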
