feat: preprocess a dataset's base form and hint it to uproot #978

Merged
merged 8 commits into from Jan 10, 2024

Conversation

lgray (Collaborator) commented Jan 4, 2024

This results in significant speedups when processing large numbers of datasets on files hosted in xrootd.

lgray (Collaborator, Author) commented Jan 4, 2024

@nsmith- would appreciate your thoughts here!

lgray requested a review from nsmith- on January 4, 2024 21:16
lgray force-pushed the preprocess_form branch 2 times, most recently from 6421121 to b7fbd93, on January 8, 2024 21:53
Resolved review threads:
src/coffea/dataset_tools/preprocess.py (6 threads, 5 on outdated code)
src/coffea/util.py (1 thread, on outdated code)
nsmith- (Member) commented Jan 9, 2024

I feel like the DatasetSpec could itself have methods like .normalize() or something, so you could construct it in a variety of ways and then choose what to cache.
Then it might be more straightforward to invalidate parts by simply setting dataset.cached_form = None, or del dataset.steps, etc.
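
To make the suggestion concrete, here is a minimal sketch of what such a container might look like. Everything in it is an assumption for illustration: this `DatasetSpec` class, its field names, and the `compute_steps` / `compute_form` callables are hypothetical and are not coffea's actual API. The idea shown is only that `normalize()` fills in whatever is missing, so setting a field back to `None` invalidates just that piece.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DatasetSpec:
    """Hypothetical container for one dataset's preprocessing results."""

    # Mapping of file path -> tree name, as supplied by the user.
    files: dict[str, str]
    # Cached chunk boundaries per file; None means "not computed yet".
    steps: Optional[dict[str, list[list[int]]]] = None
    # Cached serialized form; None means "not computed yet".
    cached_form: Optional[str] = None

    def normalize(
        self,
        compute_steps: Callable[[dict[str, str]], dict[str, list[list[int]]]],
        compute_form: Callable[[dict[str, str]], str],
    ) -> "DatasetSpec":
        """Compute only the pieces that are currently missing."""
        if self.steps is None:
            self.steps = compute_steps(self.files)
        if self.cached_form is None:
            self.cached_form = compute_form(self.files)
        return self


# Stand-in computations (assumptions); real ones would open the files.
calls: list[str] = []


def fake_steps(files: dict[str, str]) -> dict[str, list[list[int]]]:
    calls.append("steps")
    return {path: [[0, 10]] for path in files}


def fake_form(files: dict[str, str]) -> str:
    calls.append("form")
    return "form-v1"


dataset = DatasetSpec(files={"a.root": "Events"}).normalize(fake_steps, fake_form)

# Invalidate only the cached form; the steps stay cached.
dataset.cached_form = None
dataset.normalize(fake_steps, fake_form)
```

After the second `normalize()` call, only the form computation has run again; the cached steps are reused as-is.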

lgray (Collaborator, Author) commented Jan 9, 2024

I'll address your higher-level comment in another PR. I think it's a good idea, but it requires changes outside the functionality being introduced here.

lgray merged commit c3bf334 into master on Jan 10, 2024
14 checks passed
lgray deleted the preprocess_form branch on January 10, 2024 01:38