
Run client against catalog in-process #473

Open · Tracked by #552
danielballan opened this issue Jun 24, 2023 · 5 comments

Comments

@danielballan (Member) commented Jun 24, 2023

Consider the following use case.

User connects to a remote Tiled server.

from tiled.client import from_uri

client = from_uri("https://tiled.nsls2.bnl.gov")
...

Perhaps a client-side cache is engaged---maybe by default---and automatically stashes locally any data or metadata they browse, up to a certain size, just as a web browser cache does.
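
For illustration only (this is not Tiled's cache API), here is a generic sketch of that browser-style behavior: entries are stashed on disk, and the oldest ones are evicted once a size cap is exceeded.

from pathlib import Path

CACHE_DIR = Path("~/.cache/example_cache").expanduser()
CAPACITY_BYTES = 500_000_000  # "up to a certain size"

def stash(key: str, payload: bytes) -> None:
    # Write the entry to disk, then trim the cache back under its size cap.
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    (CACHE_DIR / key).write_bytes(payload)
    evict_if_needed()

def evict_if_needed() -> None:
    # Drop the least-recently-modified entries until the total size fits the cap.
    entries = sorted(CACHE_DIR.iterdir(), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in entries)
    while total > CAPACITY_BYTES and entries:
        oldest = entries.pop(0)
        total -= oldest.stat().st_size
        oldest.unlink()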

Then, they decide they want to download a swath of the data for local, offline use. Something like:

# This does not exist---just a proposal.
from tiled.client import download

download(client.search(...), "stuff/")

That could also be available as a CLI (tiled download ...) or a button in a web app. However it happens, suppose that this creates a zip archive or directory with contents like:

stuff/
    catalog.db
    data/
        ...

where data/ contains files. These would be the same files backing Tiled on the server side, perhaps exactly the files the detector wrote. (Notice that if there were a client-side cache engaged, download(...) would naturally use it, so the user would not be downloading anything twice.)
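
As a rough sketch of the layout above, here is what the proposed download(...) might produce, using only the standard library; the catalog schema and file names are purely illustrative and are not Tiled's real schema.

import shutil
import sqlite3
from pathlib import Path

target = Path("stuff")
(target / "data").mkdir(parents=True, exist_ok=True)

# Copy data files into stuff/data/ (in the real proposal these would come from
# the server, or be reused from the client-side cache so nothing is fetched twice).
for source in [Path("scan_0001.h5"), Path("scan_0002.h5")]:  # hypothetical files, assumed to exist
    shutil.copy2(source, target / "data" / source.name)

# Write a small SQLite catalog recording where each piece of data lives.
with sqlite3.connect(target / "catalog.db") as db:
    db.execute("CREATE TABLE IF NOT EXISTS nodes (key TEXT, data_path TEXT)")
    db.execute("INSERT INTO nodes VALUES (?, ?)", ("scan_0001", "data/scan_0001.h5"))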

The user can then use this local archive in three different ways.

Option 1: Just the files, please

Open the files in data/, and just ignore catalog.db and Tiled.

Option 2: Local Tiled Server

Run a local Tiled server against this data:

tiled serve catalog --public stuff/

Then navigate to http://localhost:8000 in a web browser, or connect to it from any other program, or use the Tiled Python client:

from tiled.client import from_uri

client = from_uri("http://localhost:8000")

Option 3: In-process access

But if we want to access the local data from Python specifically, it's not even necessary to start a server in a separate process with tiled serve catalog .... We can skip that step and do everything from one Python process.

# This does not exist---just a proposal.
from tiled.client import from_catalog

client = from_catalog("stuff/catalog.db", readable_storage=["stuff/data"])

The above uses ASGI to run a "server" and the client in the user's Python process, passing HTTP messages via Python function calls within one process instead of TCP packets between separate server and client processes.
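
As a minimal, library-agnostic sketch of that pattern (a toy ASGI app, not Tiled's server app), httpx can route requests into an ASGI application directly, with no sockets involved:

import asyncio
import httpx

async def app(scope, receive, send):
    # Toy ASGI application standing in for the Tiled server app.
    assert scope["type"] == "http"
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"application/json")]})
    await send({"type": "http.response.body", "body": b'{"hello": "catalog"}'})

async def main():
    # ASGITransport hands each request straight to the app as a Python call.
    transport = httpx.ASGITransport(app=app)
    async with httpx.AsyncClient(transport=transport, base_url="http://local") as client:
        response = await client.get("/")
        print(response.json())  # {'hello': 'catalog'}

asyncio.run(main())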

Because this runs in a single process, it can easily be wrapped up in third-party convenience libraries, which can be as "magical" or as explicit as one wants:

from nsls2_data_thingie import remote_access, download, local_access

client = remote_access()

download()  # perhaps downloads to some default location, like ~/.cache/nsls2_data_thingie
client = local_access()
danielballan added this to the v0.1.0 release milestone Jun 24, 2023
@danielballan (Member Author) commented:

P.S. The above imagines a SQLite database, catalog.db, which is probably best for the vast majority of users. But we could also support a PostgreSQL target for download and from_catalog. I am not sure how necessary that is, but it would be straightforward to add.

@danielballan (Member Author) commented:

This will require adding an endpoint like /assets/{id}, which is something we wanted anyway.
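
For concreteness, a rough sketch of what such an endpoint could look like in a FastAPI app; this is not Tiled's actual code, and the id-to-path mapping is a stand-in for a lookup in the catalog database:

from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse

app = FastAPI()

# Hypothetical id -> path mapping; in practice this would be a catalog database query.
ASSET_PATHS = {1: "/data/scan_0001/detector_image.h5"}

@app.get("/assets/{asset_id}")
def get_asset(asset_id: int):
    # Resolve the asset id to a file on disk and stream it back to the client.
    path = ASSET_PATHS.get(asset_id)
    if path is None:
        raise HTTPException(status_code=404, detail="No such asset")
    return FileResponse(path)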

@padraic-shafer (Contributor) commented:

> This will require adding an endpoint like /assets/{id}, which is something we wanted anyway.

What should this endpoint return when the asset is a directory (or directory-like, such as an HDF5 virtual dataset)?

  • An archived collection — .zip, .tar
  • A list of the underlying asset endpoint URLs — probably just the next level down
  • Other

@danielballan (Member Author) commented:

Good question; this wrinkle had not occurred to me yet.

In #450, we use ZIP to bundle multiple buffers (numpy arrays) into one response. We considered TAR, but ZIP has two important points in its favor. ZIP supports random access---unlike TAR, it has an index---and ZIP is also understood by web browsers, which could be useful in the context of web apps.
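
To illustrate the random-access point, the standard-library zipfile module can read the archive's index and pull out a single member without scanning the whole file (assuming a hypothetical bundle.zip already exists):

import zipfile

with zipfile.ZipFile("bundle.zip") as zf:   # hypothetical bundle of array buffers
    print(zf.namelist())                    # the central-directory index, read up front
    one_buffer = zf.read("array_0.npy")     # fetch just this member, not the whole archive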

I wonder if we'll decide to support both a ZIP bundle and individual asset URLs ("the next level down").

@padraic-shafer (Contributor) commented:

ZIP seems like a good starting point to move forward with. We could wait to add the option for asset URLs at a later time when/if they become necessary.
