
data-proxy.ebrain.eu hosting unsupported #47

Closed
mih opened this issue Jan 30, 2023 · 4 comments

Comments

@mih
Member

mih commented Jan 30, 2023

Example of a failing CI job: https://ci.appveyor.com/project/mih/datalad-ebrains/builds/46083682/job/i2viw0n9me58r2l0

@mih
Member Author

mih commented Feb 3, 2023

This is the cause of the breakage:

```
-> assert dvr_prefix.startswith('prefix=')
(Pdb) l
160             # output is: https://example.com/<basepath>
161             # the prefix is part of the file IRIs again
162             dvr_url_p = urlparse(dvr.iri.value)
163             dvr_prefix = dvr_url_p.query
164             # this is a prefix and there are no other variables
165  ->         assert dvr_prefix.startswith('prefix=')
166             assert dvr_prefix.count('=') == 1
167             dvr_prefix = dvr_prefix[len('prefix='):]
168             dvr_baseurl = dvr_url_p._replace(query='').geturl()
169             for f in self.iter_files(dvr):
170                 f_url = f.iri.value
(Pdb) dvr_url_p
ParseResult(scheme='https', netloc='data-proxy.ebrains.eu', path='/api/v1/public/buckets/d-900a1c2d-4914-42d5-a316-5472afca0d90', params='', query='', fragment='')
(Pdb) p kg_dsver.version_identifier
'v3.0'
(Pdb) p kg_dsver.release_date
None
```
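The listed code expects every file-repository IRI to carry exactly one `prefix=` query variable. The data-proxy IRI shown in the debugger has an empty query, so the first assert fails. A minimal sketch of the same parsing that raises a regular error instead of an assertion failure could look like this. The function name `split_prefixed_repo_iri` is hypothetical, and the `https://example.com/<basepath>?prefix=<prefix>` form is taken from the comments in the listing, not verified against real CSCS IRIs.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_prefixed_repo_iri(iri: str) -> Tuple[str, str]:
    """Hypothetical helper: split a file-repository IRI of the assumed form
    https://example.com/<basepath>?prefix=<prefix> into (base URL, prefix).

    Raises ValueError, rather than failing an assert, for any other IRI
    form, e.g. a data-proxy bucket IRI with an empty query.
    """
    parsed = urlparse(iri)
    query = parsed.query
    # exactly one query variable is allowed, and it must be 'prefix'
    if not query.startswith('prefix=') or query.count('=') != 1:
        raise ValueError(
            'unsupported file repository IRI (expected a single '
            f'"prefix=" query variable): {iri}')
    prefix = query[len('prefix='):]
    # the base URL is the IRI with the query removed
    baseurl = parsed._replace(query='').geturl()
    return baseurl, prefix
```

With the data-proxy IRI from the debugger session, this raises a `ValueError` whose message names the offending IRI.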

So the cause is the v3.0 release of the dataset used for testing in the CI (no release date is available in the metadata). The version_innovation description does not differ much from previous releases, but in fact a lot has changed in the dataset:

The internal layout is quite different. Here is v2.9:

[Screenshot: file layout of dataset release v2.9]

and here is the latest version v3.0:

[Screenshot: file layout of dataset release v3.0]

Except for the license file, everything is different now. This in itself would not be a problem.

However, the data hosting has also changed. Presently, only CSCS file repositories are supported, but v3.0 points to data-proxy.ebrains.eu.

It needs investigating what supporting this type of data hosting would require.
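Supporting more than one kind of hosting first requires telling them apart. A rough sketch, assuming that data-proxy IRIs always have the `/api/v1/public/buckets/<bucket>` path seen in the debugger output above (only public buckets are considered). The function name and the returned labels are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# assumption: host and public-bucket path as seen in the pdb session
DATA_PROXY_HOST = 'data-proxy.ebrains.eu'
PUBLIC_BUCKETS_PATH = '/api/v1/public/buckets/'


def classify_file_repository(iri: str) -> Tuple[str, Optional[str]]:
    """Hypothetical dispatcher: return ('data-proxy', <bucket name>),
    ('prefixed', <prefix>) or ('unsupported', None)."""
    parsed = urlparse(iri)
    if (parsed.netloc == DATA_PROXY_HOST
            and parsed.path.startswith(PUBLIC_BUCKETS_PATH)):
        # the bucket name is the first path component after the prefix
        bucket = parsed.path[len(PUBLIC_BUCKETS_PATH):].split('/', 1)[0]
        return 'data-proxy', bucket
    query = parsed.query
    if query.startswith('prefix=') and query.count('=') == 1:
        # the previously supported form: a single 'prefix' query variable
        return 'prefixed', query[len('prefix='):]
    return 'unsupported', None
```

Returning a label instead of raising leaves it to the caller to pick a reporter or emit an error message.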

Until that happens, it would seem sensible to switch to a different dataset for testing.

@mih changed the title from "Reporting of file repository content broke cloning" to "data-proxy.ebrain.eu hosting unsupported" on Feb 3, 2023
@mih
Member Author

mih commented Feb 3, 2023

With #48 resolved, this no longer causes an assertion error, but a regular error message stating that data-proxy access is presently unsupported.

@mih
Member Author

mih commented Feb 4, 2023

https://data-proxy.ebrains.eu/api/docs has the API docs for the data proxy. Looks doable.

@mih
Member Author

mih commented Feb 6, 2023

It seems that very little needs to be done to start supporting free-access data via the data-proxy. Fairgraph readily supports reporting on data-proxy bucket content. We mostly need to refactor the code a little so that filename/path generation better fits the individual reporters.
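One possible shape for that refactoring, sketched with hypothetical class names: a shared reporting loop, with only the IRI-to-path mapping differing per hosting type. The IRI layouts assumed by each subclass are illustrations, not verified against actual CSCS or data-proxy metadata, and the sketch works on plain IRI strings rather than fairgraph objects.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, Tuple
from urllib.parse import unquote, urlparse


class FileReporter(ABC):
    """Hypothetical base class: each hosting type decides how a file IRI
    maps onto a path inside the dataset."""

    @abstractmethod
    def relpath(self, file_iri: str) -> str:
        """Return the dataset-relative path for one file IRI."""

    def report(self, file_iris: Iterable[str]) -> Iterator[Tuple[str, str]]:
        # shared loop; only path generation differs between reporters
        for iri in file_iris:
            yield self.relpath(iri), iri


class PrefixedRepoReporter(FileReporter):
    """Assumes file IRIs look like <baseurl>/<prefix><relpath>."""

    def __init__(self, baseurl: str, prefix: str):
        self.head = f"{baseurl.rstrip('/')}/{prefix}"

    def relpath(self, file_iri: str) -> str:
        if not file_iri.startswith(self.head):
            raise ValueError(f'{file_iri} is not under {self.head}')
        return file_iri[len(self.head):].lstrip('/')


class DataProxyReporter(FileReporter):
    """Assumes file IRIs look like <bucket IRI>/<object name>, with the
    object name possibly percent-encoded."""

    def __init__(self, bucket_iri: str):
        self.bucket_path = urlparse(bucket_iri).path.rstrip('/') + '/'

    def relpath(self, file_iri: str) -> str:
        path = urlparse(file_iri).path
        if not path.startswith(self.bucket_path):
            raise ValueError(f'{file_iri} is not in bucket {self.bucket_path}')
        # decode percent-escapes so the path matches the object name
        return unquote(path[len(self.bucket_path):])
```

Keeping `report()` in the base class means a new hosting type only needs its own `relpath()`.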

@mih closed this as completed in 736f542 on Feb 6, 2023