feat: Pass list of files to NanoEventsFactory #837
I think we should instead change this interface to allow the standard entry points of uproot.dask / uproot.open and just forward the arguments given here to them. Doing it piecewise, when it's converging in that direction anyway, seems kind of silly. What do you think?
Do you mean skipping all the checks of the type and just pass the input to |
Yes, precisely! Feel free to try your hand at it. |
Okay, I am giving it a try now. There is some awkwardness regarding the use of the "treepath" argument. I am converging towards making it optional as it is useful only when the user is passing an |
That may break most of the runner interface but it's fairly broken in coffea 2023 anyway (I'm starting to work on that this week). |
Yeah, I am not sure. Your input is very welcome if you can think of a better way, but I can't think of one if we expose the uproot interface directly.
Actually I think you're safe here. In all other cases uproot will know how to get the tree. |
@lgray I basically got rid of the type-checking. Now it works as expected when passing a dict like:
{
    "file_1.root": "Events",
    "file_2.root": "Events"
}
or a list of dicts:
[{"file_1.root": "Events"}, {"file_2.root": "Events"}]
Let me know if you think this should be the expected behavior.
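For illustration, the input forms discussed in this thread (a single str plus tree path, a list of str, a dict, or a list of dicts) can all be reduced to one file-to-tree mapping. The following is only a minimal sketch of that idea, not the code in this PR or in uproot: `normalize_file_spec` is a hypothetical helper, and using `None` for an unspecified tree is an assumption made for the example.

```python
from collections.abc import Mapping


def normalize_file_spec(files, treepath=None):
    """Hypothetical helper: flatten a file specification into {file: tree}.

    Illustrative only; this is not coffea's or uproot's implementation.
    """
    if isinstance(files, str):
        # A bare path carries no tree name, so fall back to `treepath`
        # (None here simply means "tree not specified" in this sketch).
        return {files: treepath}
    if isinstance(files, Mapping):
        # A dict already maps each file to its tree.
        return dict(files)
    if isinstance(files, (list, tuple)):
        # A list may mix bare paths and dicts; merge them in order.
        merged = {}
        for item in files:
            merged.update(normalize_file_spec(item, treepath))
        return merged
    raise TypeError(f"unsupported file specification: {type(files).__name__}")


as_dict = normalize_file_spec({"file_1.root": "Events", "file_2.root": "Events"})
as_list_of_dicts = normalize_file_spec(
    [{"file_1.root": "Events"}, {"file_2.root": "Events"}]
)
as_list_of_str = normalize_file_spec(["file_1.root", "file_2.root"], treepath="Events")
```

Under these assumptions all three calls yield the same mapping, which is why the tree path only needs to be given separately when the input is a bare str or a list of str.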
Cool - this is good. I'm working on what will use this in the daskified runner interface. If tests pass (on macos) I'll merge. Hopefully this + revamped runner will get tests running again. |
OK, this seems to have broken the case of passing a single file or str + tree path. Or is the point to remove those as entry points? I'm actually more interested in the latter implementation, since it makes more sense: between str/list/dict/set you expect different bits of specification.
for more information, see https://pre-commit.ci
If a single file that contains multiple trees is passed to |
Yeah - exactly. I've updated the tests to not use the older way. I think just promoting the uproot interface to NanoEventsFactory makes a ton of sense. |
The only way to pass multiple files to NanoEventsFactory as of now is by using a string with wildcard characters, e.g., `path/*.root`. This won't work for `xrootd` access because `fsspec` support is still missing. I added support for passing a list of filenames. This should resolve this problem temporarily.

I tested this locally using 1) a `str`, 2) a list with just one `str`, and 3) a list with two `str`. Then I dumped the output with `dak.to_parquet()`. It behaves as expected. Let me know if I need to add unit tests as well.
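To make the wildcard limitation concrete, here is a small sketch using only the Python standard library. It is not how NanoEventsFactory or uproot resolve paths; the redirector URL and the `expand_local` helper are hypothetical. It only shows that local glob expansion finds files on disk but has nothing to match for a remote URL, which is where an explicit list of filenames helps.

```python
import glob
import os
import tempfile


def expand_local(pattern):
    """Hypothetical helper: expand a wildcard against the local filesystem."""
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Create two empty placeholder files so the wildcard has something to match.
    for name in ("a.root", "b.root"):
        open(os.path.join(tmp, name), "w").close()
    local_names = [
        os.path.basename(path) for path in expand_local(os.path.join(tmp, "*.root"))
    ]

# Hypothetical xrootd location; local globbing cannot see remote files.
redirector = "root://example.invalid//store/user"
remote_matches = expand_local(f"{redirector}/*.root")

# Without remote wildcard support, spell out every remote file explicitly.
remote_files = [f"{redirector}/{name}" for name in local_names]
```

Here `local_names` contains both placeholder files, `remote_matches` is empty, and `remote_files` is the kind of explicit list this PR allows as input.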