Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Pass list of files to NanoEventsFactory #837

Merged
merged 18 commits into from Jun 14, 2023

Conversation

chrispap95
Copy link
Contributor

The only way to pass multiple files to NanoEventsFactory as of now is by using a string with wildcard characters, e.g., path/*.root. This won't work for xrootd access because fsspec support is still missing. I added support for passing a list of filenames. This should resolve this problem temporarily.

I tested this locally using 1) a str, 2) a list with just one str, and 3) a list with two str. Then I dumped the output with dak.to_parquet(). It behaves as expected. Let me know if I need to add unit tests as well.

@lgray
Copy link
Collaborator

lgray commented Jun 12, 2023

I think, instead we should change this interface to allow the standard entry points of uproot.dask / uproot.open and just forward the arguments here to there.

Doing it piecewise, when it's just converging in that direction anyway seems kind of silly.

What do you think?

@chrispap95
Copy link
Contributor Author

Do you mean skipping all the checks of the type and just pass the input to uproot.open/dask? I think this is okay. Since the only way to open files is through uproot we can let it handle the type. We are not adding any extra robustness, that's true.

@lgray
Copy link
Collaborator

lgray commented Jun 12, 2023

Yes, precisely!

Feel free to try your hand at it.
It makes a lot of sense especially for uproot.dask.

@chrispap95
Copy link
Contributor Author

Okay, I am giving it a try now. There is some awkwardness regarding the use of the "treepath" argument. I am converging towards making it optional as it is useful only when the user is passing an uproot.reading.ReadOnlyDirectory. In all other cases, the user must embed the treepath within the file argument as they would do when interfacing directly with uproot. What do you think?

@lgray
Copy link
Collaborator

lgray commented Jun 12, 2023

That may break most of the runner interface but it's fairly broken in coffea 2023 anyway (I'm starting to work on that this week).

@chrispap95
Copy link
Contributor Author

Yeah, I am not sure. Your input is very welcome if you can of a better way. But I can't think of a better way if we expose the uproot interface directly.

@lgray
Copy link
Collaborator

lgray commented Jun 12, 2023

Actually I think you're safe here.
The only place where you have to specify the tree path is when the root file is already open and that's the only entrypoint into the file you have. Then you must additionally specify the treepath to get.

In all other cases uproot will know how to get the tree.

@chrispap95
Copy link
Contributor Author

@lgray I basically got rid of the type-checking. Now it works as expected when passing a dict like:

{
    "file_1.root": "Events", 
    "file_2.root"`: "Events"
}

or a list of dicts:

[ {"file_1.root": "Events"}, {"file_2.root": "Events"} ]

Let me know if you think this should be the expected behavior.

@lgray
Copy link
Collaborator

lgray commented Jun 14, 2023

Cool - this is good. I'm working on what will use this in the daskified runner interface. If tests pass (on macos) I'll merge. Hopefully this + revamped runner will get tests running again.

@lgray
Copy link
Collaborator

lgray commented Jun 14, 2023

OK this seems to have broken the case of passing a single file or str + tree path, or is the point to remove those as entrypoints?

I'm actually more interested in the latter implementation since it makes more sense. since between str/list/dict/set you expect different bits of specification.

@chrispap95
Copy link
Contributor Author

If a single file that contains multiple trees is passed to uproot.dask() without specifying the tree by either using ":treename" or the dictionary then uproot will spit an error out so I guess that's the expected behavior. Unless you are referring to something different.

@lgray
Copy link
Collaborator

lgray commented Jun 14, 2023

Yeah - exactly. I've updated the tests to not use the older way.

I think just promoting the uproot interface to NanoEventsFactory makes a ton of sense.

@lgray lgray merged commit 7aa88ad into CoffeaTeam:master Jun 14, 2023
8 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants