# Storing and Accessing Objects via Descartes Labs Storage
## Blob Demo

The Storage module lets users upload, store, and access a wide variety of objects in the Descartes Labs infrastructure. The object type is intentionally arbitrary, so users can store and access things like compute logs, model weight parameters, and so on.

Storage objects are accessed through an associated `Blob` object. `Blob`s are queryable by name, geospatial location (e.g., points or polygons), and assigned tags. They can be downloaded to local files or retrieved directly as Python `bytes` objects. Storage supports the same sharing mechanisms as Catalog products and includes `owners`, `writers`, and `readers` attributes.

**Upcoming improvements:**
 * Interoperability with Explorer
 * Access Batch Compute results via Storage
 * Expiration dates for Storage objects

**Note:** The new AWS Catalog Storage service is only available via `descarteslabs` version 2.0.0. Please run the following once before running the demo to install the correct `descarteslabs` client version.

In [1]:
# Install edge client of descarteslabs
!pip install -U git+https://github.com/descarteslabs/descarteslabs-python.git

Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/descarteslabs/descarteslabs-python.git
  Cloning https://github.com/descarteslabs/descarteslabs-python.git to /tmp/pip-req-build-cb2jqpp_
  Running command git clone --filter=blob:none --quiet https://github.com/descarteslabs/descarteslabs-python.git /tmp/pip-req-build-cb2jqpp_
  Resolved https://github.com/descarteslabs/descarteslabs-python.git to commit 58d0681b1aa0417928ad9b4b1aaf408cf5e38822
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: descarteslabs
  Building wheel for descarteslabs (setup.py) ... done
  Created wheel for descarteslabs: filename=descarteslabs-2.0.0rc4-py3-none-any.whl size=443105 sha256=62be89e05874de0ffb385191e1ca83438db2e5b6998fb69b765f6adf0d4603e6
  Stored in directory: /tmp/pip-ephem-wheel-cache-i696rxbe/wheels/f3/77/44/42127262e38a146014a3649de9c3326600c722582ca79fbbf1
Successfully built descarte

In [1]:
import json
import descarteslabs as dl
dl.select_env(dl.AWS_ENVIRONMENT)

from descarteslabs.catalog import Blob, properties
import os

In [4]:
dl.auth.Auth()

<descarteslabs.auth.auth.Auth at 0x7f17dcc76670>

In [3]:
# Check that we are working with the AWS offering and correct version
print("descarteslabs version:", dl.__version__)
print("Current Env: ", dl.get_settings().peek_settings().current_env)

descarteslabs version: 2.0.0rc4
Current Env:  aws-production


### Store JSON information as Blob

First, let's create a new `Blob` with a JSON object for data and an associated geometry. For example, say we have the geometry for a field of hops in Yakima Valley, WA, and want to connect it with some (very brief) information about the crop.

In [2]:
# JSON of crop info
crop_info = {
    "crop": "hops",
    "acreage": 450,
}
# Geometry for the field
field_geom = {
    "type": "Polygon",
    "coordinates": [[
        [-120.4023, 46.551],
        [-120.3859, 46.551],
        [-120.3859, 46.5534],
        [-120.4023, 46.5534],
        [-120.4023, 46.551],  # repeat the first vertex to close the ring, as GeoJSON requires
    ]],
}

As with creating an `Image` in the Catalog module, you construct an unsaved `Blob` with the attributes you want, then call either `upload()` to upload a file from the local filesystem or `upload_data()` to upload arbitrary Python data. All data is handled as bytes internally: strings are automatically encoded as UTF-8, and anything else should first be serialized to bytes or a string. When the upload completes, the `Blob` object has all of its recorded attributes populated.

In [3]:
blob = Blob(name="hop_field_info_demo", geometry=field_geom, tags=["hops_project"])
blob.upload_data(json.dumps(crop_info))
blob

Blob: hop_field_info_demo
  id: data/descarteslabs/hop_field_info_demo
  created: Wed May 17 19:40:29 2023
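
As a minimal sketch of the serialization step that precedes `upload_data()`, the snippet below turns Python data into UTF-8 bytes using only the standard library. `to_payload` is a hypothetical helper name used for illustration; it is not part of the `descarteslabs` client, which performs the string encoding itself.

```python
import json


def to_payload(obj):
    """Serialize data to UTF-8 bytes (hypothetical helper, not a client API)."""
    # bytes pass through unchanged, str is encoded, anything else goes via JSON
    if isinstance(obj, bytes):
        return obj
    if isinstance(obj, str):
        return obj.encode("utf-8")
    return json.dumps(obj).encode("utf-8")


payload = to_payload({"crop": "hops", "acreage": 450})
print(payload)       # b'{"crop": "hops", "acreage": 450}'
print(len(payload))  # 32
```

The 32-byte length agrees with the `size_bytes` value reported for the demo blob.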

The saved `Blob` has several attributes, including those shown below. Note that the `id` is the concatenation of the `storage_type`, the `namespace`, and the `name`. Only the `name` may contain internal `/` characters.

In [4]:
print("Blob ID:", blob.id)
print("Blob size:", blob.size_bytes)
print("Blob geometry: ", blob.geometry)
print("Blob assigned tags: ", blob.tags)

Blob ID: data/descarteslabs/hop_field_info_demo
Blob size: 32
Blob geometry:  POLYGON ((-120.4023 46.551, -120.3859 46.551, -120.3859 46.5534, -120.4023 46.5534, -120.4023 46.551))
Blob assigned tags:  ['hops_project']
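
Because only the `name` may contain `/`, an `id` can be taken apart by splitting on the first two separators. The sketch below illustrates that rule in plain Python; `split_blob_id` and `join_blob_id` are hypothetical helpers written for this demo, not client functions.

```python
def split_blob_id(blob_id):
    """Split an id into (storage_type, namespace, name); illustrative only."""
    # Only the name may contain '/', so split on the first two separators.
    storage_type, namespace, name = blob_id.split("/", 2)
    return storage_type, namespace, name


def join_blob_id(storage_type, namespace, name):
    """Rebuild an id from its three parts."""
    return "/".join([storage_type, namespace, name])


print(split_blob_id("data/descarteslabs/hop_field_info_demo"))
# ('data', 'descarteslabs', 'hop_field_info_demo')
print(split_blob_id("data/descarteslabs/foo/bar"))
# ('data', 'descarteslabs', 'foo/bar')
```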


To see the remaining attributes, iterate over the `Blob`'s attributes, as is done for `file_blob` later in this demo.

### Storing a file

If we want to store a file in Storage, let's say a derived GeoTIFF of the NIR and red bands for a Sentinel-2 L2A image, we can create our `Blob` object with the desired attributes and use the `.upload()` method to upload directly from a local file.

In [None]:
file_blob = Blob(
    namespace="hop_yields_project",
    name="sentinel-2_image_demo",
    readers=["group:hop_researchers"],
    tags=[]  # Even if no tags are assigned, an empty list should be passed here
)
file_blob.upload("sentinel-2_l2a_nir_red.tif")
file_blob

In [4]:
# Get your organization ID
org = dl.auth.Auth().payload['org']
file_blob = Blob(
    namespace="compute_demo",
    name="yakima_valley",
    readers=[f"org:{org}"],
    tags=["compute_demo"],
)
file_blob.upload("../data/CountyZoning.zip")
file_blob

Blob: yakima_valley
  id: data/descarteslabs:compute_demo/yakima_valley
  created: Tue May 23 01:39:18 2023

Also note that for these uploads we set a specific `namespace`, for example to store an object as part of the "hop_yields_project" within our organization. The first blob received the default `namespace`, which is the owner's org. A specified `namespace` is automatically prefixed with the user's `org`, so users within an org can work in the same namespaces while collisions between users in different orgs are impossible.
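
The naming rule can be pictured with a few lines of plain Python. This is only an illustration of the behavior visible in the ids printed in this demo; `qualify_namespace` is a hypothetical function, and the service applies the prefix on its own.

```python
def qualify_namespace(org, namespace=None):
    """Return the effective namespace under the rule described above (illustrative)."""
    # No namespace given: the default is the owner's org.
    if not namespace:
        return org
    # Otherwise the org is prepended, separated by a colon.
    return f"{org}:{namespace}"


print(qualify_namespace("descarteslabs"))                  # descarteslabs
print(qualify_namespace("descarteslabs", "compute_demo"))  # descarteslabs:compute_demo
```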

Another useful attribute is `href`, which shows where in the DL-managed S3 bucket and region this `Blob` is stored.

In [6]:
print("S3 HREF for Blob")
file_blob.href

S3 HREF for Blob


's3://dl-catalog-storage-production.us-west-2/792f2f3d48174547a7cdac4534761291/data/descarteslabs:compute_demo/yakima_valley'

In [7]:
for k,v in file_blob._attributes.items():
    print(f"{k}: {v}")

id: data/descarteslabs:compute_demo/yakima_valley
created: 2023-05-17 19:42:58.266937+00:00
description: None
expires: None
extra_properties: {}
geometry: None
hash: 6cabda957e28608b3b2e3a2516ae14c1
href: s3://dl-catalog-storage-production.us-west-2/792f2f3d48174547a7cdac4534761291/data/descarteslabs:compute_demo/yakima_valley
modified: 2023-05-17 19:42:58.266937+00:00
name: yakima_valley
namespace: descarteslabs:compute_demo
owners: ['org:descarteslabs', 'user:ca44eb051c6fdcfa155a20166b234253eb33538c']
readers: ['org:descarteslabs']
size_bytes: 1036082
storage_state: available
storage_type: data
tags: ['compute_demo']
writers: []


### Searching across storage objects
Catalog search methods, including geospatial searches, can be applied to your storage objects.

In [8]:
# List all Blob objects in your storage
for b in Blob.search():
    print(b.id)

data/descarteslabs/foo/bar
data/descarteslabs/foo/baz
data/descarteslabs/hop_field_info
data/descarteslabs/hop_field_info_demo
data/descarteslabs:compute_demo/yakima_valley
data/descarteslabs:hop_yields_project/sentinel-2_image


In [9]:
# Geospatial searches by intersection
## Intersect particular coordinate
print([b.id for b in Blob.search().intersects({"type": "Point", "coordinates": [-120.40, 46.552]})])
## Intersecting geometry object
print([b.id for b in Blob.search().intersects(field_geom)])
# Filter by tags
print([b.id for b in Blob.search().filter(properties.tags == "hops_project")])
# Filter by namespace
# Get your org for namespace
org = dl.auth.Auth().payload['org']
print([b.id for b in Blob.search().filter(properties.namespace == f"{org}:hop_yields_project")])

['data/descarteslabs/hop_field_info', 'data/descarteslabs/hop_field_info_demo']
['data/descarteslabs/hop_field_info', 'data/descarteslabs/hop_field_info_demo']
['data/descarteslabs/hop_field_info', 'data/descarteslabs/hop_field_info_demo']
['data/descarteslabs:hop_yields_project/sentinel-2_image']


The `name` field allows embedded "/" characters, so you can structure your blobs as if they were in a file system. As with typical cloud storage systems, there are no real directories, but consistent use of the "/" delimiter enables powerful prefix searches. First we'll create a few more interesting blobs:

In [10]:
blob3 = Blob(name="foo/bar")
blob3.upload_data("this is a test")
blob4 = Blob(name="foo/baz")
blob4.upload_data("this is not a test")

ConflictError: 
    409: A document with id `data/descarteslabs/foo/bar` already exists

Now we can use a prefix filter to pick out these new blobs:

In [11]:
for b in Blob.search().filter(properties.name.prefix("foo/")):
    print(b.id)

data/descarteslabs/foo/bar
data/descarteslabs/foo/baz
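
To make the idea concrete without touching Storage, the sketch below applies the same kind of prefix match to a local list of names. It assumes the filter behaves like a plain string prefix test on `name`; `with_prefix` is a hypothetical helper.

```python
# Names taken from the blobs listed earlier in this demo
names = ["foo/bar", "foo/baz", "hop_field_info", "hop_field_info_demo"]


def with_prefix(names, prefix):
    """Return the names that start with the given prefix (local illustration)."""
    return [n for n in names if n.startswith(prefix)]


print(with_prefix(names, "foo/"))       # ['foo/bar', 'foo/baz']
print(with_prefix(names, "hop_field"))  # ['hop_field_info', 'hop_field_info_demo']
```

Ending the prefix with "/" is what makes it act like a directory listing: `"foo/"` would not match a blob named `foobar`.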


### Retrieving data from Storage
The `Blob` data may be retrieved, either by downloading directly to a local file or some other file-like object (e.g. an `io.IOBase` object), or directly into memory. Here's a simple download to a file:

In [12]:
# Download to a local file, then inspect that same file
blob.download("hop_field_info_demo.json")
!ls -l hop_field_info_demo.json
!md5sum hop_field_info_demo.json


You can also retrieve the data directly as raw bytes:

In [13]:
print(blob.data())

b'{"crop": "hops", "acreage": 450}'
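
Since the blob was created from a JSON string, the returned bytes can be decoded back into a Python dictionary with the standard library. The sketch below uses a literal copy of the bytes shown above rather than a live call to Storage.

```python
import json

# Literal copy of the bytes returned by the cell above
raw = b'{"crop": "hops", "acreage": 450}'

# Decode UTF-8 bytes to text, then parse the JSON document
decoded = json.loads(raw.decode("utf-8"))
print(decoded["crop"], decoded["acreage"])  # hops 450
```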


It's also possible to do streaming, iterative downloads. The `iter_data()` method iterates on chunks of bytes, while the `iter_lines()` method iterates on delimited lines.
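
To illustrate how the two iteration styles relate, the sketch below rebuilds text lines from an arbitrary sequence of byte chunks. It is not the client's implementation, only a standard-library picture of what line iteration over chunked data involves; `lines_from_chunks` is a hypothetical name.

```python
def lines_from_chunks(chunks, encoding="utf-8"):
    """Yield decoded lines from an iterable of byte chunks (illustrative sketch)."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently held in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode(encoding)
    # Emit any trailing text that lacked a final newline.
    if buffer:
        yield buffer.decode(encoding)


# Chunk boundaries deliberately fall in the middle of lines
chunks = [b"This is some te", b"xt.\nThis is ", b"some more.\n"]
for lineno, line in enumerate(lines_from_chunks(chunks)):
    print(f"{lineno} {line}")
# 0 This is some text.
# 1 This is some more.
```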

Let's quickly create a multi-line text file and push it to storage:

In [14]:
with open("some_file", "w") as f:
    f.write("This is some text.\nThis is some more.\n")
!ls -l some_file
!md5sum some_file

-rw-r--r-- 1 jovyan jovyan 38 May 17 19:46 some_file
fa939c0e9a504cd9d395a93b77e496fd  some_file
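
The size and MD5 digest can also be computed in Python instead of with shell commands. The sketch below recreates the same text in a temporary directory so that it runs anywhere; `file_size_and_md5` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def file_size_and_md5(path, chunk_size=1 << 20):
    """Return (size in bytes, hex MD5 digest) for a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


# Recreate the demo file in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "some_file")
    with open(path, "w", newline="\n") as f:
        f.write("This is some text.\nThis is some more.\n")
    size, md5_hex = file_size_and_md5(path)

print(size, md5_hex)
```

The two values could then be passed as `size_bytes` and `hash` rather than being typed in by hand.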


In [15]:
multi_blob = Blob(
    namespace="some-project",
    name="its-a-file_demo",
    size_bytes=38,
    hash="fa939c0e9a504cd9d395a93b77e496fd",
    readers=["group:thecoolones"]
)
multi_blob.upload("some_file")
multi_blob

Blob: its-a-file_demo
  id: data/descarteslabs:some-project/its-a-file_demo
  created: Wed May 17 19:46:48 2023

In [16]:
for lineno, line in enumerate(multi_blob.iter_lines(decode_unicode=True)):
    print(f"{lineno} {line}")

0 This is some text.
1 This is some more.


### Sharing Blob objects
As with Catalog products, you can add specific organizations or users as readers or writers of your `Blob` objects to give others access. Update the `readers`, `writers`, or `owners` attribute, then save the `Blob` with `.save()`.

In [44]:
# Add a coworker and an org as readers of the previously stored file
file_blob.readers = ["email:rockstar@wherewework.com", "org:wherewework"]
file_blob.save()
file_blob.readers
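
Sharing entries in this demo all take the form of a prefix followed by an identifier. A small local check can catch a mistyped entry before it is saved. The sketch below assumes that the only valid prefixes are the four that appear in this notebook (`user:`, `email:`, `org:`, `group:`), which may not be the complete set; `check_principals` is a hypothetical helper.

```python
# Assumption: only the prefixes seen in this demo are accepted
KNOWN_PREFIXES = ("user:", "email:", "org:", "group:")


def check_principals(principals):
    """Return the list unchanged, or raise ValueError for malformed entries."""
    bad = [
        p for p in principals
        # Reject unknown prefixes and entries with nothing after the prefix
        if not p.startswith(KNOWN_PREFIXES) or p in KNOWN_PREFIXES
    ]
    if bad:
        raise ValueError(f"Unrecognized sharing entries: {bad}")
    return list(principals)


readers = check_principals(["email:rockstar@wherewework.com", "org:wherewework"])
print(readers)
```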

### Deleting blob files
`Blob`s can be deleted by iterating over the results of the Catalog search methods used earlier and calling `.delete()` on each one.

In [17]:
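# Delete by tags
for b in Blob.search().filter(properties.tags == "hops_project"):
    b.delete()
# Delete by project namespace (namespaces are prefixed with the org, as in the search above)
org = dl.auth.Auth().payload['org']
for b in Blob.search().filter(properties.namespace == f"{org}:hop_yields_project"):
    b.delete()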

ServerError: 
    Access denied: No delete permission for object: data/descarteslabs/hop_field_info
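
The loop above stops at the first blob the caller is not allowed to delete. One way to keep going is to catch the failure per object and report it afterwards. The sketch below shows the pattern with a stand-in class so that it runs without network access; `delete_all` and `FakeBlob` are hypothetical, and the broad `except Exception` is used because the exact exception class is not assumed here.

```python
def delete_all(blobs):
    """Try to delete every object; return (deleted_ids, failures)."""
    deleted, failures = [], []
    for b in blobs:
        try:
            b.delete()
        except Exception as exc:  # broad on purpose: exception class not assumed
            failures.append((b.id, str(exc)))
        else:
            deleted.append(b.id)
    return deleted, failures


class FakeBlob:
    """Stand-in for a Blob, used only to exercise the loop locally."""

    def __init__(self, id, deletable=True):
        self.id = id
        self.deletable = deletable

    def delete(self):
        if not self.deletable:
            raise PermissionError(f"No delete permission for object: {self.id}")


deleted, failures = delete_all([
    FakeBlob("data/descarteslabs/hop_field_info", deletable=False),
    FakeBlob("data/descarteslabs/hop_field_info_demo"),
])
print(deleted)   # ['data/descarteslabs/hop_field_info_demo']
print(failures)
```

With the real client, the same function could presumably be given the iterable returned by a `Blob.search()` query, since it only relies on each object having an `id` and a `delete()` method.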

Let's clean up our mess by deleting everything owned by our personal user account:

In [18]:
# Delete all files
for b in Blob.search().filter(properties.owners == "user:" + dl.auth.Auth().namespace):
    b.delete()
assert Blob.search().filter(properties.owners == "user:" + dl.auth.Auth().namespace).count() == 0

*Note: all `Blob` objects created here are automatically available in your organization's namespace. Before sharing this demo with others in your org, delete any created objects that are no longer needed to prevent errors.*