Add a backend for rucio #300

Merged
merged 16 commits into from
Aug 14, 2020
41 changes: 41 additions & 0 deletions rucio.py
@@ -0,0 +1,41 @@
import json
Member

You accidentally uploaded this file twice. Can you remove this one?

Contributor Author

Done, thanks for the remark

import hashlib
import os.path as osp

import strax
from strax.storage.files import dirname_to_prefix

export, __all__ = strax.exporter()


@export
class rucio(strax.StorageBackend):
    """Get data from a rucio directory
    """

    def get_metadata(self, dirname, **kwargs):
        dirname = str(dirname)
        prefix = dirname_to_prefix(dirname)
        metadata_json = f'{prefix}-metadata.json'
        fn = rucio_path(metadata_json, dirname)
        with open(fn, mode='r') as f:
            return json.loads(f.read())

    def _read_chunk(self, dirname, chunk_info, dtype, compressor):
        #print('yes')
        fn = rucio_path(chunk_info['filename'], dirname)
        return strax.load_file(fn, dtype=dtype, compressor=compressor)

    def _saver(self, dirname, metadata):
        raise NotImplementedError(
            "Cannot save directly into rucio, upload with admix instead")


def rucio_path(filename, dirname):
    root_path ='/dali/lgrandi/rucio'
    scope = "xnt_"+dirname.split('-')[0]
    rucio_did = "{0}:{1}".format(scope,filename)
    rucio_md5 = hashlib.md5(rucio_did.encode('utf-8')).hexdigest()
    t1 = rucio_md5[0:2]
    t2 = rucio_md5[2:4]
    return osp.join(root_path,scope,t1,t2,filename)
1 change: 1 addition & 0 deletions strax/__init__.py
@@ -11,6 +11,7 @@

from .storage.common import *
from .storage.files import *
from .storage.rucio import *
from .storage.mongo import *
from .storage.s3 import *
from .storage.zipfiles import *
41 changes: 41 additions & 0 deletions strax/storage/rucio.py
@@ -0,0 +1,41 @@
import json
import hashlib
import os.path as osp

import strax
from strax.storage.files import dirname_to_prefix

export, __all__ = strax.exporter()


@export
class rucio(strax.StorageBackend):
    """Get data from a rucio directory
    """

    def get_metadata(self, dirname, **kwargs):
        dirname = str(dirname)
        prefix = dirname_to_prefix(dirname)
        metadata_json = f'{prefix}-metadata.json'
        fn = rucio_path(metadata_json, dirname)
Member

Can you raise the same error as here if the metadata is not available? Strax may rely on this later:
https://github.com/AxFoundation/strax/blob/master/strax/storage/files.py#L215

Contributor Author

Yes. Furthermore, I got many more outputs with this error than without 😅

Member

Alright, thanks for adding it.
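Editor's sketch of the change requested in this thread: check for the metadata file before opening it and raise a dedicated error when it is absent. The exception class strax raises at the linked line is not shown on this page, so `MetadataNotAvailable` below is a hypothetical stand-in, and `read_metadata` and `demo` are illustrative names rather than strax API.

```python
import json
import os.path as osp
import tempfile


class MetadataNotAvailable(Exception):
    """Hypothetical stand-in for the error strax raises for missing metadata."""


def read_metadata(path):
    """Load a metadata JSON file, raising a dedicated error if it is absent."""
    if not osp.exists(path):
        # Fail with a specific exception instead of a bare FileNotFoundError,
        # so calling code can tell "no metadata" apart from other I/O problems.
        raise MetadataNotAvailable(f"No metadata found at {path}")
    with open(path, mode='r') as f:
        return json.load(f)


def demo():
    """Exercise both branches in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        present = osp.join(tmp, 'peaklets-metadata.json')
        with open(present, mode='w') as f:
            json.dump({'chunks': []}, f)
        loaded = read_metadata(present)
        try:
            read_metadata(osp.join(tmp, 'missing-metadata.json'))
        except MetadataNotAvailable:
            missing_raised = True
        else:
            missing_raised = False
    return loaded, missing_raised
```

In the backend itself the same check would sit in `get_metadata`, right after `fn` is computed, with the stand-in replaced by whichever exception strax expects.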

        with open(fn, mode='r') as f:
            return json.loads(f.read())

    def _read_chunk(self, dirname, chunk_info, dtype, compressor):
        #print('yes')
Member

I think you forgot to remove this line ;)

Contributor Author

Sorry 😅

        fn = rucio_path(chunk_info['filename'], dirname)
        return strax.load_file(fn, dtype=dtype, compressor=compressor)

    def _saver(self, dirname, metadata):
        raise NotImplementedError(
            "Cannot save directly into rucio, upload with admix instead")


def rucio_path(filename, dirname):
Member

This part is a bit hard to follow without comments (sorry, I'm not an expert in the rucio naming convention). What do the paths look like, and are we sure it's always these hard-coded conversions?

Contributor Author

The paths look like '/dali/lgrandi/rucio/xnt_008710/1f/fb/peaklets-b7dgmtzaef-000047'. It is a hard-coded convention in the Rucio code; you can see it here: https://github.com/rucio/rucio/blob/671fe6253981eb632aae3c9ddfe54eb83e63fd1a/lib/rucio/rse/protocols/protocol.py#L114
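Editor's sketch of the convention described in this thread, with the comments the reviewer asked for. It follows the logic of `rucio_path` in the diff: the scope and file name form a `scope:name` identifier, and the first four hex digits of its MD5 hash become two directory levels. The name `rucio_path_sketch` and the explicit `root_dir` argument are assumptions made for illustration, not part of the PR.

```python
import hashlib
import os.path as osp


def rucio_path_sketch(root_dir, filename, dirname):
    """Build a local file path following the hash-based layout discussed above."""
    # The scope is derived from the run number, i.e. the part of dirname
    # before the first '-' (for example '008710' gives 'xnt_008710').
    scope = "xnt_" + dirname.split('-')[0]
    # The identifier that gets hashed has the form scope:name.
    rucio_did = f"{scope}:{filename}"
    md5 = hashlib.md5(rucio_did.encode('utf-8')).hexdigest()
    # Hex digits 0-1 and 2-3 of the hash name the two sub-directory levels.
    return osp.join(root_dir, scope, md5[0:2], md5[2:4], filename)
```

A call such as `rucio_path_sketch('/dali/lgrandi/rucio', 'peaklets-b7dgmtzaef-000047', '008710-peaklets-b7dgmtzaef')` should yield a path of the shape quoted in the comment above; the `dirname` value here is assumed from that example path.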

    root_path ='/dali/lgrandi/rucio'
Member

We shouldn't hard-code it like this, as it might change to e.g. '/dali/lgrandi/xenonnt/rucio'. I guess you want to get the info from the dirname.

Contributor Author

Unfortunately, dirname does not include the root path 😕

Member

Hmm okay. So how about this:

You pass it on at initialization:
https://github.com/XENONnT/straxen/blob/5b762f32eec7d3fb8a1a96e31ae504fe8c8cf3cd/straxen/rundb.py#L110
and set the default to /dali/lgrandi/rucio at the init of the rundb here:
https://github.com/XENONnT/straxen/blob/5b762f32eec7d3fb8a1a96e31ae504fe8c8cf3cd/straxen/rundb.py#L42

You would also have to change these lines:

(by adding a def __init__(..)) and
fn = rucio_path(chunk_info['filename'], dirname)

If you want some help, I can also commit this idea to avoid the hard-coded path.
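Editor's sketch of the suggestion above: pass the root directory in at construction and keep the current value only as a default. `RucioBackendSketch`, its `rucio_dir` parameter and the `path_for` method are hypothetical names; the real class would subclass `strax.StorageBackend`, which is left out here so the example runs on its own.

```python
import hashlib
import os.path as osp


class RucioBackendSketch:
    """Hypothetical backend that receives the rucio root at construction."""

    def __init__(self, rucio_dir='/dali/lgrandi/rucio'):
        # The default mirrors the value hard-coded in the diff;
        # callers can override it if the storage location moves.
        self.rucio_dir = rucio_dir

    def path_for(self, filename, dirname):
        """Resolve a file below the configured root instead of a module constant."""
        scope = "xnt_" + dirname.split('-')[0]
        md5 = hashlib.md5(f"{scope}:{filename}".encode('utf-8')).hexdigest()
        return osp.join(self.rucio_dir, scope, md5[0:2], md5[2:4], filename)
```

With this shape, `get_metadata` and `_read_chunk` would call `self.path_for(...)`, and a moved storage area only needs a different constructor argument, for instance `RucioBackendSketch(rucio_dir='/dali/lgrandi/xenonnt/rucio')`.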

    scope = "xnt_"+dirname.split('-')[0]
    rucio_did = "{0}:{1}".format(scope,filename)
    rucio_md5 = hashlib.md5(rucio_did.encode('utf-8')).hexdigest()
    t1 = rucio_md5[0:2]
    t2 = rucio_md5[2:4]
    return osp.join(root_path,scope,t1,t2,filename)