Performance impact of opening a new XRootD File in every `cat` operation (TBasket in Uproot) #55
Comments
So, I'm going to try to write up my performance study in scikit-hep/uproot5#1157 to collect everything in one spot, but yes, fsspec has …
I spent a bit of time this morning reading the protocol document https://xrootd.slac.stanford.edu/doc/dev56/XRdv520.pdf and I don't see any obvious way to interact with an XRootD server in a stateless way: one needs to acquire an opaque …
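To make the cost of statelessness concrete, here is a toy model. It assumes a stateful protocol in which every read must name a handle returned by an earlier open; `ToyStatefulServer` and `read_baskets` are hypothetical names, not the XRootD or fsspec API. It simply counts round trips for a stateless `cat` (open, read, close per call) versus reusing one handle.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ToyStatefulServer:
    """Hypothetical in-memory stand-in for a stateful file server.

    Assumption for illustration: read() must name an opaque handle that an
    earlier open() returned; every request counts as one round trip.
    """
    files: dict
    round_trips: int = 0
    _handles: dict = field(default_factory=dict, init=False)
    _next: itertools.count = field(
        default_factory=lambda: itertools.count(1), init=False
    )

    def open(self, path):
        self.round_trips += 1
        handle = next(self._next)  # opaque to the client
        self._handles[handle] = path
        return handle

    def read(self, handle, offset, size):
        self.round_trips += 1
        data = self.files[self._handles[handle]]
        return data[offset:offset + size]

    def close(self, handle):
        self.round_trips += 1
        del self._handles[handle]


def stateless_cat(server, path, offset, size):
    """One cat()-style call that keeps no state: open, read, close."""
    handle = server.open(path)
    try:
        return server.read(handle, offset, size)
    finally:
        server.close(handle)


def read_baskets(server, path, ranges, reuse_handle):
    """Read (offset, size) ranges either statelessly or via one held handle."""
    if not reuse_handle:
        return [stateless_cat(server, path, o, n) for o, n in ranges]
    handle = server.open(path)
    try:
        return [server.read(handle, o, n) for o, n in ranges]
    finally:
        server.close(handle)
```

Reading 100 basket-sized ranges costs 300 round trips on the stateless path and 102 when the handle is held, before counting any server-side cost of the open itself.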
Does anyone actually write ROOT files directly over XRootD? I understand why …
It is now technically possible, but I don't think it's a good idea. Writing a ROOT file involves a lot of seeking back and forth (it can't be written directly from beginning to end unless the sizes of everything that is to be written are known in advance), and over the network that means a lot of round trips. Since it was a requested feature, we can't break it, but we don't need to ensure that it is the optimal path. If reopening the file is necessary for writing but not for reading, that would be fine.
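A toy illustration of why seeking back is needed: the header at offset 0 must record where the index and the end of the file are, but those are only known after every object has been written. The layout below (`TOY0` magic, `write_toy`, `read_toy`) is hypothetical and much simpler than the real ROOT format; it only demonstrates the seek-back-and-patch step.

```python
import io
import struct

MAGIC = b"TOY0"  # hypothetical magic bytes, not ROOT's
HEADER = struct.Struct("<4sQQ")  # magic, end-of-file offset, index offset


def write_toy(sink, objects):
    """Write objects whose sizes are not known up front.

    The header lives at offset 0 but depends on values known only at the
    end, so the writer must seek back and patch it.
    """
    sink.write(HEADER.pack(MAGIC, 0, 0))  # placeholder header
    offsets = []
    for payload in objects:
        offsets.append(sink.tell())
        sink.write(struct.pack("<I", len(payload)) + payload)
    index_offset = sink.tell()
    sink.write(struct.pack(f"<I{len(offsets)}Q", len(offsets), *offsets))
    end = sink.tell()
    sink.seek(0)  # the step a strictly append-only sink cannot perform
    sink.write(HEADER.pack(MAGIC, end, index_offset))
    sink.seek(end)
    return end


def read_toy(data):
    """Recover the objects by following the patched header."""
    magic, end, index_offset = HEADER.unpack_from(data, 0)
    assert magic == MAGIC and end == len(data)
    (count,) = struct.unpack_from("<I", data, index_offset)
    offsets = struct.unpack_from(f"<{count}Q", data, index_offset + 4)
    out = []
    for off in offsets:
        (size,) = struct.unpack_from("<I", data, off)
        out.append(data[off + 4:off + 4 + size])
    return out
```

Against a local `io.BytesIO` the seek is free; against a remote file each patch is another network request, which is the cost described above.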
Writing files over XRootD in general is a highly desired feature. For example, I am writing several GB of Parquet files to FNAL EOS storage in my skim example, and it works quite well. I would hope we extend Uproot writing to support fsspec sinks, using the simplecache local-cache feature so that the whole file is written (committed) only at the end of the writing process. All that said, I am happy to restart work on #54.
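The "commit the whole file at the end" idea can be sketched with the standard library alone. This is not fsspec's simplecache implementation; `CommitOnClose` and the `upload` callable are hypothetical. All seeks and rewrites hit a local temporary file, and the remote side sees a single upload only if writing finished without an exception.

```python
import os
import tempfile


class CommitOnClose:
    """Hypothetical writer: buffer a whole file locally, upload once at close.

    Seeks and rewrites happen against the local temporary file, so the
    remote side only ever receives the finished file.
    """

    def __init__(self, upload):
        self._upload = upload  # assumed callable(local_path) -> None
        fd, self._path = tempfile.mkstemp(suffix=".part")
        self._file = os.fdopen(fd, "w+b")

    def __enter__(self):
        return self._file

    def __exit__(self, exc_type, exc, tb):
        self._file.close()
        try:
            if exc_type is None:
                self._upload(self._path)  # commit only on success
        finally:
            os.remove(self._path)  # never leave the partial file behind
        return False  # propagate any exception from the with-body
```

The trade-off is local disk space equal to the full output size, in exchange for a writer that can seek freely and never leaves a half-written file on the remote storage.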
That's just the thing: the Parquet format is defined in such a way that all metadata that needs to know the sizes of things is written after the data it describes (at larger seek positions). Knowing only what has already been written, a writer can produce the file in order, from beginning to end. That can't be done with the ROOT format, especially if the file has to be valid between writes of individual objects and the sizes of everything that will be written aren't known in advance.
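A toy version of the footer-last layout shows why no seek is needed. Parquet really does start and end with the `PAR1` magic and store a 4-byte little-endian footer length just before the final magic; everything else here is simplified, and the JSON footer stands in for Parquet's Thrift-encoded metadata. `AppendOnlySink` is a hypothetical sink with `write()` but no `seek()`.

```python
import json
import struct

MAGIC = b"PAR1"  # Parquet's real magic; the rest of this layout is a toy


class AppendOnlySink:
    """A sink that supports write() but not seek(), like a streamed upload."""

    def __init__(self):
        self.chunks = []
        self.position = 0

    def write(self, data):
        self.chunks.append(bytes(data))
        self.position += len(data)

    def getvalue(self):
        return b"".join(self.chunks)


def write_footer_last(sink, columns):
    """Write data first and the metadata describing it last."""
    sink.write(MAGIC)
    meta = []
    for name, payload in columns.items():
        meta.append({"name": name, "offset": sink.position, "size": len(payload)})
        sink.write(payload)
    footer = json.dumps(meta).encode()  # real Parquet uses Thrift, not JSON
    sink.write(footer)
    sink.write(struct.pack("<I", len(footer)))
    sink.write(MAGIC)


def read_footer_last(data):
    """Locate the footer from the end of the file, then the data."""
    assert data[:4] == MAGIC and data[-4:] == MAGIC
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    meta = json.loads(data[-8 - footer_len:-8])
    return {m["name"]: data[m["offset"]:m["offset"] + m["size"]] for m in meta}
```

Every offset the footer records was already known when it was written, so the sink only ever appends.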
Reported by @chrisburr in scikit-hep/uproot5#1157 (comment):
In my experience, these `File` objects are heavy and slow to open. fsspec's `cat` interface is stateless, so it seems that you have to create a new one of these for every call, which in Uproot means one for every TBasket. Is there an alternative we can use, such as some `multi_cat` or a context that holds the `File` object, so that we don't need so many? Is there a way to use XRootD in a lightweight, stateless way (like HTTP connections)?