Commit

Minimal implementation of ora2 as an uncurl variant
The basic idea here is:

- Once we can describe the structure of a RIA store in an `uncurl` URL
  template, generic implementations can handle actual operations
- We need a dataset ID for this, and take it either from the traditional
  `archive-id` remote parameter, or the DataLad dataset ID in the repo
- With this information and the base URL from the `url` parameter, we
  build the `uncurl` URL template, and write it into the active config
  (as an override). From there `uncurl` picks it up, and we ensure this
  by running its `prepare()` in the `ora2` `prepare()` method last.
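
The steps above can be sketched in isolation as follows. This is only an
illustration under stated assumptions, not the code of this changeset:
`build_uncurl_template()` and its parameter names are hypothetical, and
the two ID arguments stand in for the `archive-id` remote parameter and
the DataLad dataset ID.

```python
import uuid


def build_uncurl_template(ria_url, archive_id=None, datalad_id=None):
    """Return a URL template for annex objects of one dataset in a RIA store.

    Hypothetical helper that mirrors the steps described in the list above.
    """
    if not ria_url.startswith('ria+'):
        raise ValueError('ria+<scheme>://... URL expected')
    # prefer an explicitly configured ID, fall back on the dataset ID
    dsid = archive_id or datalad_id
    try:
        # under all circumstances this must be a valid UUID;
        # a missing ID (None) raises TypeError, a malformed one ValueError
        uuid.UUID(dsid)
    except (TypeError, ValueError) as e:
        raise ValueError('no valid dataset UUID found') from e
    # strip the 'ria+' prefix to get the plain base URL
    base_url = ria_url[len('ria+'):]
    return (
        # base URL and dataset ID are filled in directly
        f'{base_url}/{dsid[:3]}/{dsid[3:]}/annex/objects/'
        # these placeholders are left for later substitution
        '{annex_dirhash}{annex_key}/{annex_key}'
    )
```

With a `file://` store, the remaining placeholders could then be filled
via `str.format()`, for example
`build_uncurl_template('ria+file:///tmp/store', datalad_id=some_uuid).format(annex_dirhash='X9/6J/', annex_key=key)`.
The trailing slash in the dirhash value is inferred from the shape of the
template and is an assumption here.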

An included test confirms the basic principle by copying an annex key
to a local (`file://`) RIA store via a standard `git annex copy`
operation.

This setup is particularly attractive, because now specialized URL
handlers can also be selected via standard `uncurl` means for handler
selection (based on URL matching, which includes protocol switching).
If this works as expected, a handler such as a persistent-SSH-shell
handler can be made active only for particular hosts (and even only
temporarily).
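
To illustrate the idea of handler selection by URL matching, here is a
generic sketch. It does not use or reproduce the actual `uncurl`
configuration mechanism; the registry, the handler names, and the host
name are all hypothetical.

```python
import re

# hypothetical registry: (URL pattern, handler name), most specific first
HANDLERS = [
    (r'ssh://store\.example\.org/', 'persistent-ssh-shell'),
    (r'ssh://', 'plain-ssh'),
    (r'file://', 'file'),
]


def select_handler(url, handlers=HANDLERS):
    """Return the name of the first handler whose pattern matches the URL."""
    for pattern, name in handlers:
        # re.match() anchors at the start of the URL, so the protocol
        # (and optionally the host) decides which handler applies
        if re.match(pattern, url):
            return name
    raise LookupError(f'no handler for {url!r}')
```

In such a scheme, removing the first registry entry would make the same
host fall back to the generic SSH handler, which is what makes a
temporary, per-host activation possible.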

This changeset includes several TODOs that outline further steps that
need to be approached in the future.
mih committed Sep 28, 2023
1 parent 0195327 commit 18e5e3a
Showing 2 changed files with 71 additions and 0 deletions.
54 changes: 54 additions & 0 deletions datalad_ria/ora_remote.py
@@ -1,3 +1,5 @@
import uuid

from datalad_next.annexremotes import (
    RemoteError,
    super_main,
@@ -43,6 +45,9 @@ def initremote(self):
            # Adopt git-annex's style of messaging
            raise RemoteError('ria+<scheme>://... URL expected for url=')

        # TODO run _get_ria_dsid() to confirm the validity of the ID
        # setup

        # here we could do all kinds of sanity and version checks.
        # however, in some sense `initremote` is not very special in
        # this regard. Most (or all) of these checks would also run
@@ -52,6 +57,55 @@ def initremote(self):
        # not seeing a relevant remote-specific git-config). Therefore
        # we are not doing any checks here (for now).

    def prepare(self):
        # UUID to use for this dataset in the store
        dsid = self._get_ria_dsid()

        # check for a remote-specific uncurl config
        # self.get_remote_gitcfg() would also consider a remote-type
        # general default, which is undesirable here
        tmpl_var = f'remote.{self.remotename}.uncurl-url'
        url_tmpl = self.repo.config.get(tmpl_var, None)
        if url_tmpl is None:
            # pull the recorded ria URL from git-annex
            ria_url = self.annex.getconfig('url')
            assert ria_url.startswith('ria+')
            # TODO check the layout settings of the actual store
            # to match this template
            base_url = ria_url[4:]
            url_tmpl = (
                # we fill in base url and dsid directly here (not via
                # uncurl templating), because it is simpler
                f'{base_url}/{dsid[:3]}/{dsid[3:]}/annex/objects/'
                # RIA v? uses the "mixed" dirhash
                '{annex_dirhash}{annex_key}/{annex_key}'
            )
            # we set the URL template in the config for the base class
            # routines to find
            self.repo.config.set(tmpl_var, url_tmpl, scope='override')
        # the rest is UNCURL "business as usual"
        super().prepare()

    #
    # helpers
    #
    def _get_ria_dsid(self):
        # check if the remote has a particular dataset ID configured
        # via git-annex
        dsid = self.annex.getconfig('archive-id')
        # if not, fall back on the datalad dataset ID
        if not dsid:
            dsid = self.repo.config.get('datalad.dataset.id')
        # under all circumstances this must be a valid UUID
        try:
            uuid.UUID(dsid)
        # a missing ID (None) raises TypeError, a malformed one ValueError
        except (TypeError, ValueError) as e:
            raise RemoteError(
                'No valid dataset UUID identifier found, '
                'specify via archive-id='
            ) from e
        return dsid


def main():
    """CLI entry point installed as ``git-annex-remote-ora2``"""
17 changes: 17 additions & 0 deletions datalad_ria/tests/test_ora.py
@@ -51,3 +51,20 @@ def test_ora_localops(ria_store_localaccess, populated_dataset):

    # smoke test that it can run
    repo.call_annex(ir_cmd + [f'url=ria+{store_path.as_uri()}'])

    cp_cmd = [
        'copy',
        '-t', f'test-{ora_external_type}',
    ]

    repo.call_annex(cp_cmd + ['one.txt'])
    # the annex key properties (and dirhash) are determined by the
    # file content and the MD5E backend default.
    # If neither of those changes, they must not change
    key_fpath = store_path / \
        ds.id[:3] / ds.id[3:] / 'annex' / 'objects' / \
        'X9' / '6J' / \
        'MD5E-s8--7e55db001d319a94b0b713529a756623.txt' / \
        'MD5E-s8--7e55db001d319a94b0b713529a756623.txt'
    assert key_fpath.exists()
    assert key_fpath.read_text() == 'content1'
