Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a backend for rucio #300

Merged
merged 16 commits into from Aug 14, 2020
Merged

Conversation

SidAhmedMa
Copy link
Contributor

Hello everyone, given the particular structure of rucio, which for each file matches a sub-sub-folder, it was mandatory to create a new backend to manage this new structure,

In the new structure, we use strax.rucio as the backend and the database frontend, you will find a notebook below which shows the tests I performed.

Notebook: /dali/lgrandi/sahmedma/rucio-backend-test.ipynb

rucio.py Outdated
@@ -0,0 +1,41 @@
import json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you accidentally uploaded this file twice. Can you remove this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the remark

dirname = str(dirname)
prefix = dirname_to_prefix(dirname)
metadata_json = f'{prefix}-metadata.json'
fn = rucio_path(metadata_json, dirname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you raise the same error as here if the md is not available. Strax may rely on this later:
https://github.com/AxFoundation/strax/blob/master/strax/storage/files.py#L215

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, furthermore, I had much more outputs with this error than without 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright thanks for adding it

return json.loads(f.read())

def _read_chunk(self, dirname, chunk_info, dtype, compressor):
#print('yes')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot to remove this line ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry 😅

"Cannot save directly into rucio, upload with admix instead")


def rucio_path(filename, dirname):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bit is a bit hard to follow without comments (sorry I'm not an expert in the rucio naming convention). What do the paths look like and are we sure its always these hard-coded conversions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Te path look like ('/dali/lgrandi/rucio/xnt_008710/1f/fb/peaklets-b7dgmtzaef-000047'), It is a hard-coded convention on the Rucio code, you can see it here https://github.com/rucio/rucio/blob/671fe6253981eb632aae3c9ddfe54eb83e63fd1a/lib/rucio/rse/protocols/protocol.py#L114



def rucio_path(filename, dirname):
root_path ='/dali/lgrandi/rucio'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't hard-code it like this as this might change to e.g. '/dali/lgrandi/xenonnt/rucio' I guess you want to get the info from the dirname.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, dirname does not include the root path 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm okay. So how about this:

You pass it on at initialization:
https://github.com/XENONnT/straxen/blob/5b762f32eec7d3fb8a1a96e31ae504fe8c8cf3cd/straxen/rundb.py#L110
and set the default to /dali/lgrandi/rucio at the init of the rundb here:
https://github.com/XENONnT/straxen/blob/5b762f32eec7d3fb8a1a96e31ae504fe8c8cf3cd/straxen/rundb.py#L42

You would also have to change these lines:

(with adding a def init_ (..) ) and
fn = rucio_path(chunk_info['filename'], dirname)

If you want some help I can also commit this idea to prevent hardcodes?

@JoranAngevaare
Copy link
Member

JoranAngevaare commented Aug 11, 2020

Thanks Sid, this would be an absolute wonderful feature to add! Can you address the comments and/or provide some more info to this PR (e.g. the rucio naming convention would be useful)

File uploaded twice
Remonving a commented test
@SidAhmedMa
Copy link
Contributor Author

Thank you very much @jorana for reviewing my pr, I hope that I answered to all your remarks 🙂

@JoranAngevaare
Copy link
Member

If you'd use SidAhmedMa#1 you can prevent specifying the rootdir

raise data not available if folder does not exist
fix codefactor
@JoranAngevaare
Copy link
Member

Hi Sid, I tested your implementation on dali and found some things about data not being available that are fixed by 52d6a99.

Since this is working and a really nice addition I'm going to merge this PR.

@SidAhmedMa
Copy link
Contributor Author

Hi Sid, I tested your implementation on dali and found some things about data not being available that are fixed by 52d6a99.

Since this is working and a really nice addition I'm going to merge this PR.

I see I thought that the test to see if the directory exists or not was done upstream before going to the backend, but it is probably also useful to do a new test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants