file locations: distinguish between EOSPUBLIC and OPENDATA URIs #115

Open
tiborsimko opened this issue Jan 4, 2021 · 0 comments

Current behaviour

The client currently exposes EOSPUBLIC locations of files, for example:

$ cernopendata-client get-file-locations --recid 5000          
http://opendata.cern.ch/eos/opendata/cms/software/2011-doubleelectron-doublemu-mueg-ttbar/2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz

This file also exists as an attachment to the record under /record/NNN/files/FILE.EXTENSION, which would give:

http://opendata.cern.ch/record/5000/files/2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz

What is the difference? In the first case, the file is served from OPENDATA via a reverse HTTP proxy to EOSPUBLIC (and is not cached). In the second case, the file is served from OPENDATA via an XRootD proxy to EOSPUBLIC (and is cached if it is sufficiently small).

Due to several issues with the EOSPUBLIC reverse proxy, in PR #113 we have introduced file index lookups from the latter URIs, while still exposing the former URIs.

Expected behaviour

It would be good to consistently expose both kinds of URIs and to let the user choose between them with a command-line switch.

Example: we can introduce a new command-line option --uri-style having two values, "eos" and "record":

$ cernopendata-client get-file-locations --recid 5000 --uri-style=eos
http://opendata.cern.ch/eos/opendata/cms/software/2011-doubleelectron-doublemu-mueg-ttbar/2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz
$ cernopendata-client get-file-locations --recid 5000 --uri-style record
http://opendata.cern.ch/record/5000/files/2011-doubleelectron-doublemu-mueg-ttbar-1.0.0.tar.gz

The default value could be "eos" to keep the old behaviour, but we could switch to "record" if that style proves more stable.
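As a rough illustration of the mapping between the two styles, here is a minimal sketch. The helper name `format_uri` and its signature are hypothetical and not part of the current client; it assumes the "record" URI is built from the record ID and the file's base name, as in the `/record/NNN/files/FILE.EXTENSION` pattern above, and it only makes sense for files attached directly to the record.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of values for the proposed --uri-style option.
URI_STYLES = ("eos", "record")


def format_uri(eos_uri, recid, uri_style="eos"):
    """Return the file location in the requested URI style.

    Assumes ``eos_uri`` is an "eos"-style location such as
    http://opendata.cern.ch/eos/opendata/... and that the file is
    attached directly to record ``recid``.
    """
    if uri_style not in URI_STYLES:
        raise ValueError(f"Unknown URI style: {uri_style!r}")
    if uri_style == "eos":
        # Old behaviour: expose the EOSPUBLIC location unchanged.
        return eos_uri
    # "record" style: keep scheme and host, rebuild the path from the
    # record ID and the file name only.
    parts = urlsplit(eos_uri)
    filename = posixpath.basename(parts.path)
    record_path = f"/record/{recid}/files/{filename}"
    return urlunsplit((parts.scheme, parts.netloc, record_path, "", ""))
```

For example, calling `format_uri(uri, 5000, "record")` on the "eos" location shown above would yield the `/record/5000/files/...` location.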

Things to be aware of:

  • The new option --uri-style would be used for everything, i.e. for exposing URI locations, for downloading files, etc.
  • All files attached to records are usually accessible under "record" URI style, with the exception of files behind file indexes (see next point).
  • The files behind file indexes, such as for record ID 1, are a special case. The file index files themselves (example: http://opendata.cern.ch/eos/opendata/cms/Run2010B/BTau/AOD/Apr21ReReco-v1/file-indexes/CMS_Run2010B_BTau_AOD_Apr21ReReco-v1_0000_file_index.json) are also accessible under "record" URI style, but the data files (example: http://opendata.cern.ch/eos/opendata/cms/Run2010B/BTau/AOD/Apr21ReReco-v1/0005/FE3F8388-E471-E011-9377-00E08179189B.root) are only accessible under "eos" URI style. Hence special care will have to be taken regarding the difference between cernopendata-client get-file-locations --recid 1 --no-expand and cernopendata-client get-file-locations --recid 1. Namely, in the "expand" use case, the "eos" URI style is forced; while in the "no-expand" use case, people could use either "eos" style or "record" style.
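
The expand/no-expand rule in the last point could be captured in a small decision helper. This is only a sketch: the function name `effective_uri_style` and the `has_file_index` flag are hypothetical, and whether a forced override should be silent or should warn the user is left open.

```python
def effective_uri_style(requested_style, expand, has_file_index):
    """Decide which URI style can actually be served.

    requested_style: value of the proposed --uri-style option.
    expand: True unless --no-expand was given.
    has_file_index: True if the record's files sit behind file indexes
        (as for record ID 1).
    """
    if requested_style not in ("eos", "record"):
        raise ValueError(f"Unknown URI style: {requested_style!r}")
    if has_file_index and expand:
        # Data files listed inside file indexes are only reachable
        # under "eos" style, so that style is forced here.
        return "eos"
    # Files attached to the record (including the file index files
    # themselves) are reachable under either style.
    return requested_style
```

With this rule, `--recid 1 --uri-style record` would still yield "eos" locations unless `--no-expand` is also given.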