
Run client against catalog in-process #473

Open · Tracked by #552
danielballan opened this issue Jun 24, 2023 · 5 comments

Comments

@danielballan (Member) commented Jun 24, 2023

Consider the following use case.

User connects to a remote Tiled server.

from tiled.client import from_uri

client = from_uri("https://tiled.nsls2.bnl.gov")
...

Perhaps a client-side cache is engaged---maybe by default---and automatically stashes locally any data or metadata they browse, up to a certain size, just as a web browser cache does.
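
For illustration only (this is not Tiled's cache API), here is a generic sketch of that browser-style behavior: entries are stashed on disk, and the oldest ones are evicted once a size cap is exceeded.

from pathlib import Path

CACHE_DIR = Path("~/.cache/example_cache").expanduser()
CAPACITY_BYTES = 500_000_000  # "up to a certain size"

def stash(key: str, payload: bytes) -> None:
    # Write the entry to disk, then trim the cache back under its size cap.
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    (CACHE_DIR / key).write_bytes(payload)
    evict_if_needed()

def evict_if_needed() -> None:
    # Drop the least-recently-modified entries until the total size fits the cap.
    entries = sorted(CACHE_DIR.iterdir(), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in entries)
    while total > CAPACITY_BYTES and entries:
        oldest = entries.pop(0)
        total -= oldest.stat().st_size
        oldest.unlink()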

Then, they decide they want to download a swath of the data for local, offline use. Something like:

# This does not exist---just a proposal.
from tiled.client import download

download(client.search(...), "stuff/")

That could also be available as a CLI (tiled download ...) or a button in a web app. However it happens, suppose that this creates a zip archive or directory with contents like:

stuff/
    catalog.db
    data/
        ...

where data/ contains files. These would be the same files backing Tiled on the server side, perhaps exactly the files the detector wrote. (Notice that if there were a client-side cache engaged, download(...) would naturally use it, so the user would not be downloading anything twice.)
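
As a rough sketch of the layout above, here is what the proposed download(...) might produce, using only the standard library; the catalog schema and file names are purely illustrative and are not Tiled's real schema.

import shutil
import sqlite3
from pathlib import Path

target = Path("stuff")
(target / "data").mkdir(parents=True, exist_ok=True)

# Copy data files into stuff/data/ (in the real proposal these would come from
# the server, or be reused from the client-side cache so nothing is fetched twice).
for source in [Path("scan_0001.h5"), Path("scan_0002.h5")]:  # hypothetical files, assumed to exist
    shutil.copy2(source, target / "data" / source.name)

# Write a small SQLite catalog recording where each piece of data lives.
with sqlite3.connect(target / "catalog.db") as db:
    db.execute("CREATE TABLE IF NOT EXISTS nodes (key TEXT, data_path TEXT)")
    db.execute("INSERT INTO nodes VALUES (?, ?)", ("scan_0001", "data/scan_0001.h5"))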

The user can then use this local archive in three different ways.

Option 1: Just the files, please

Open the files in data/, and just ignore catalog.db and Tiled.

Option 2: Local Tiled Server

Run a local Tiled server against this data:

tiled serve catalog --public stuff/

Then navigate to http://localhost:8000 in a web browser, or connect to it from any other program, or use the Tiled Python client:

from tiled.client import from_uri

client = from_uri("http://localhost:8000")

Option 3: In-process access

But if we want to access the local data from Python specifically, it's not even necessary to start a server in a separate process with tiled serve catalog .... We can skip that step and do everything from one Python process.

# This does not exist---just a proposal.
from tiled.client import from_catalog

client = from_catalog("stuff/catalog.db", readable_storage=["stuff/data"])

The above uses ASGI to run a "server" and the client in the user's Python process, passing HTTP messages via Python function calls within one process instead of TCP packets between separate server and client processes.
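
As a minimal, library-agnostic sketch of that pattern (a toy ASGI app, not Tiled's server app), httpx can route requests into an ASGI application directly, with no sockets involved:

import asyncio
import httpx

async def app(scope, receive, send):
    # Toy ASGI application standing in for the Tiled server app.
    assert scope["type"] == "http"
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": b'{"hello": "catalog"}'})

async def main():
    # ASGITransport hands each request straight to the app as a Python call.
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://local") as client:
        response = await client.get("/")
        print(response.json())  # {'hello': 'catalog'}

asyncio.run(main())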

Because this runs in a single process, it can easily be wrapped up in third-party convenience libraries, which can be as "magical" or as explicit as one wants:

from nsls2_data_thingie import remote_access, download, local_access

client = remote_access()

download()  # perhaps downloads to some default location, like ~/.cache/nsls2_data_thingie
client = local_access()
danielballan added this to the v0.1.0 release milestone Jun 24, 2023
@danielballan (Member Author) commented:

P.S. The above imagines a SQLite database, catalog.db, which is probably best for the vast majority of users. But we could also support a PostgreSQL target for download and from_catalog. I am not sure how necessary that is, but it would be straightforward to add.

@danielballan (Member Author) commented:

This will require adding an endpoint like /assets/{id}, which is something we wanted anyway.
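
For concreteness, a rough sketch of what such an endpoint could look like in a FastAPI app; this is not Tiled's actual code, and the id-to-path mapping is a stand-in for a lookup in the catalog database:

from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse

app = FastAPI()

# Hypothetical id -> path mapping; in practice this would be a catalog database query.
ASSET_PATHS = {1: "/data/scan_0001/detector_image.h5"}

@app.get("/assets/{asset_id}")
def get_asset(asset_id: int):
    # Resolve the asset id to a file on disk and stream it back to the client.
    path = ASSET_PATHS.get(asset_id)
    if path is None:
        raise HTTPException(status_code=404, detail="No such asset")
    return FileResponse(path)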

@padraic-shafer (Contributor) commented:

> This will require adding an endpoint like /assets/{id}, which is something we wanted anyway.

What should this endpoint return when the asset is a directory (or directory-like, such as an HDF5 virtual dataset)?

  • An archived collection — .zip, .tar
  • A list of the underlying asset endpoint URLs — probably just the next level down
  • Other

@danielballan (Member Author) commented:

Good question; this wrinkle had not occurred to me yet.

In #450, we use ZIP to bundle multiple buffers (numpy arrays) into one response. We considered TAR, but ZIP has two important points in its favor. ZIP supports random access---unlike TAR, it has an index---and ZIP is also understood by web browsers, which could be useful in the context of web apps.
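
To illustrate the random-access point, the standard-library zipfile module can read the archive's index and pull out a single member without scanning the whole file (assuming a hypothetical bundle.zip already exists):

import zipfile

with zipfile.ZipFile("bundle.zip") as zf:   # hypothetical bundle of array buffers
    print(zf.namelist())                    # the central-directory index, read up front
    one_buffer = zf.read("array_0.npy")     # fetch just this member, not the whole archive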

I wonder if we'll decide to support both a ZIP bundle and individual asset URLs ("the next level down").

@padraic-shafer (Contributor) commented:

ZIP seems like a good starting point to move forward with. We could wait to add the option for asset URLs at a later time when/if they become necessary.
