# Storing and Accessing objects via Descartes Labs Storage
## Blob Demo

The Storage module lets users upload, store, and access a wide variety of objects in the Descartes Labs infrastructure. The object type is intentionally arbitrary, so users can store and access things like compute logs, model weight parameters, and more.

Storage objects are accessed through an associated `Blob` object. `Blob`s can be queried by name, geospatial location (e.g., points or polygons), and assigned tags, and they can be downloaded to local files or retrieved directly as Python `bytes` objects. Storage supports the same sharing mechanisms as Catalog products, through the `owners`, `writers`, and `readers` attributes.

**Planned improvements:**
 * Interoperability with Explorer
 * Access Batch Compute results via Storage
 * Expiration dates for Storage objects

In [None]:
import json, os

import descarteslabs as dl
from descarteslabs.catalog import Blob, properties as p

### Store JSON information as Blob

First, let's create a new `Blob` with a JSON object as its data and an associated geometry. For example, say we have the geometry for a field of hops in Yakima Valley, WA, and want to associate it with some (very brief) information about the crop.

In [None]:
# JSON of crop info
crop_info = {
    "crop": "hops",
    "acreage": 450,
}
# Geometry for the field
field_geom = {
    "type": "Polygon",
    "coordinates": [
        [
            [-120.4023, 46.551],
            [-120.3859, 46.551],
            [-120.3859, 46.5534],
            [-120.4023, 46.5534],
            [-120.4023, 46.551],  # GeoJSON rings must end at their starting position
        ]
    ],
}
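GeoJSON (RFC 7946) requires every linear ring in a `Polygon` to contain at least four positions and to end at the same position it starts with. The sketch below checks a ring and, if needed, closes it before the geometry is attached to a `Blob`. The helpers `is_closed_ring` and `close_ring` are hypothetical and are not part of the Descartes Labs client.

```python
def is_closed_ring(ring):
    """Return True if a GeoJSON linear ring has >= 4 positions and is closed."""
    return len(ring) >= 4 and list(ring[0]) == list(ring[-1])


def close_ring(ring):
    """Return a copy of the ring, appending the first position if it is not closed."""
    ring = [list(pt) for pt in ring]
    if ring and ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    return ring


# The four corners of the hops field, without the closing position
open_ring = [
    [-120.4023, 46.551],
    [-120.3859, 46.551],
    [-120.3859, 46.5534],
    [-120.4023, 46.5534],
]
closed = close_ring(open_ring)
print(is_closed_ring(open_ring), is_closed_ring(closed), len(closed))  # False True 5
```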

As with creating an `Image` in the Catalog module, you construct an unsaved `Blob` with whatever attributes you need, then call either `upload()` to upload a file from the local filesystem or `upload_data()` to upload arbitrary Python data. All data is handled internally as bytes: strings are automatically encoded as UTF-8, and anything else must first be serialized to bytes or a string. Once the upload completes, the `Blob` object has all of its recorded attributes populated.
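Because everything is stored as bytes, it can help to see what the JSON payload looks like once encoded. This local sketch mirrors the UTF-8 encoding described above, plus the reverse step you would apply to bytes read back from Storage. It makes no Storage calls, and the helper names `to_payload` and `from_payload` are hypothetical.

```python
import json


def to_payload(obj):
    """Serialize a JSON-compatible object to UTF-8 bytes (the form that gets stored)."""
    return json.dumps(obj).encode("utf-8")


def from_payload(data):
    """Decode UTF-8 bytes (e.g. bytes read back from a Blob) into a Python object."""
    return json.loads(data.decode("utf-8"))


crop_info = {"crop": "hops", "acreage": 450}
payload = to_payload(crop_info)
print(type(payload).__name__, len(payload))
print(from_payload(payload) == crop_info)
```

Since strings are encoded as UTF-8 before storage, we would expect the uploaded blob's `size_bytes` to equal `len(payload)`, though that is an inference rather than documented behavior.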

*Note:* `Blob` names must be _unique within your organization's namespace_, so for this demo we include a unique string in each name to avoid conflicts within the organization.

In [None]:
from uuid import uuid4

blob = Blob(
    name=f"hop_field_info_demo-{uuid4()}",
    geometry=field_geom,
    tags=["hops_project", "storage_demo"],
)
blob.upload_data(json.dumps(crop_info))
blob

The resulting saved `Blob` has several attributes, including those printed below. Note that the `id` is the concatenation of the `storage_type`, the `namespace`, and the `name`. Only the name may contain internal '/' characters.

If we inspect the `id` field, we can break it down further:
* `data/` is the storage type
* `your-organization-name:your-user-hash/` is your organization's namespace
* `hop_field_info_demo-XXXXX` is the `Blob` object's unique name
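Since only the name may contain '/', an ID can be split back into its three parts by splitting on the first two '/' characters only. A minimal sketch with a hypothetical helper `split_blob_id` and placeholder IDs modeled on the layout above:

```python
def split_blob_id(blob_id):
    """Split a Blob ID into (storage_type, namespace, name).

    Only the name may contain '/', so split on the first two slashes only.
    """
    storage_type, namespace, name = blob_id.split("/", 2)
    return storage_type, namespace, name


# Placeholder IDs, not real blobs
print(split_blob_id("data/your-organization-name:your-user-hash/hop_field_info_demo-XXXXX"))
print(split_blob_id("data/your-organization-name:your-user-hash/foo/bar-1234"))
```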

In [None]:
print("Blob ID:", blob.id)
print("Blob size:", blob.size_bytes)
print("Blob geometry: ", blob.geometry)
print("Blob assigned tags: ", blob.tags)

To see all of the available attributes, you can print a `Blob`'s `_attributes` dictionary, as done for `file_blob` later in this notebook.

### Storing a file

If we want to store a file in Storage, say a derived GeoTIFF of the NIR and red bands from a Sentinel-2 L2A image, we can create our `Blob` with the desired attributes and use the `.upload()` method to upload it directly from a local file.

In [None]:
file_blob = Blob(
    namespace="hop_yields_project",  # fixed so the namespace filter in the search section matches
    name=f"sentinel-2_image_demo-{uuid4()}",
    readers=["group:hop_researchers"],
    tags=[
        "storage_demo"
    ],  # Even if no tags are assigned, an empty list should be passed here
)
file_blob.upload("data/rgb.tif")
file_blob

In [None]:
# Get your organization ID
org = dl.auth.Auth().payload["org"]
file_blob = Blob(
    namespace="storage_demo",
    name="yakima_valley",
    readers=[f"org:{org}"],
    tags=[
        "storage_demo"
    ],  # Even if no tags are assigned, an empty list should be passed here
)
file_blob.upload("data/yakima.geojson")
file_blob

Also note that for the GeoTIFF upload we set a specific `namespace` to store the object as part of the "hop_yields_project" within our organization, and the GeoJSON upload likewise uses the "storage_demo" namespace. The first blob received the default `namespace`, which is the owner's org. A specified `namespace` is automatically prefixed with the user's `org`: users within an org can work in the same namespaces, while users in different orgs can never collide.
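The prefixing rule is what lets two orgs reuse the same namespace string without colliding. A sketch, assuming the `<org>:<namespace>` form used by the namespace filter in the search section of this notebook; `qualified_namespace` and the org names are hypothetical.

```python
def qualified_namespace(org, namespace):
    """Return the org-prefixed namespace.

    Assumption: Storage uses the '<org>:<namespace>' form, as in the
    p.namespace filter used when searching later in this notebook.
    """
    return f"{org}:{namespace}"


# The same user-chosen namespace in two different (hypothetical) orgs
ns_a = qualified_namespace("org-a", "hop_yields_project")
ns_b = qualified_namespace("org-b", "hop_yields_project")
print(ns_a, ns_b, ns_a != ns_b)
```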

Another useful attribute is `href`, which shows where in the DL-managed S3 bucket or region this `Blob` is stored:

In [None]:
print("S3 HREF for Blob")
file_blob.href

In [None]:
for k, v in file_blob._attributes.items():
    print(f"{k}: {v}")

### Searching across storage objects
Catalog search methods can also be used across your storage objects, including geospatial searches.

In [None]:
# Geospatial searches by intersection
## Intersect particular coordinate
print("Point Intersection:")
print(
    [
        b.id
        for b in Blob.search().intersects(
            {"type": "Point", "coordinates": [-120.40, 46.552]}
        )
    ]
)
print("Polygon Intersection:")
## Intersecting geometry object
print([b.id for b in Blob.search().intersects(field_geom)])
print("Tag Filter:")
# Filter by tags
print([b.id for b in Blob.search().filter(p.tags == "storage_demo")])
print("Namespace Filter:")
# Filter by namespace
# Get your org for namespace
org = dl.auth.Auth().payload["org"]
print([b.id for b in Blob.search().filter(p.namespace == f"{org}:hop_yields_project")])
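The point query above matches the hops field because the point falls inside its polygon. The sketch below reproduces that check locally with a standard ray-casting point-in-polygon test. It only illustrates intersection for a simple polygon and is not how the Catalog service evaluates `intersects` server-side; `point_in_ring` is a hypothetical helper.

```python
def point_in_ring(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside a closed linear ring."""
    inside = False
    # Walk each edge (ring must be closed: first position == last position)
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray from (x, y) toward +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


field_ring = [
    [-120.4023, 46.551],
    [-120.3859, 46.551],
    [-120.3859, 46.5534],
    [-120.4023, 46.5534],
    [-120.4023, 46.551],
]
print(point_in_ring(-120.40, 46.552, field_ring))  # the query point used above
print(point_in_ring(-120.30, 46.552, field_ring))  # a point east of the field
```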

The `name` field allows embedded "/" characters, so you can structure your blobs as if in a file system. As with typical cloud storage systems, there are no real directories, but consistent use of the "/" delimiter enables powerful prefix searches. First, we'll create a few more blobs to search:

In [None]:
blob3 = Blob(name=f"foo/bar-{uuid4()}", tags=["storage_demo"])
blob3.upload_data("this is a test")
blob4 = Blob(name=f"foo/baz-{uuid4()}", tags=["storage_demo"])
blob4.upload_data("this is not a test")

Now we can use a prefix filter to pick out these new blobs:

In [None]:
for b in Blob.search().filter(p.name.prefix("foo/")):
    print(b.id)
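Conceptually, a prefix filter is a plain string-prefix match on names, and "directories" only emerge from how you split the remainder on "/". A local sketch over a hypothetical list of names (no Storage calls) that mimics listing one level of a pseudo-directory; `group_by_prefix` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_prefix(names, prefix):
    """Select names under `prefix` and group them by their next path segment."""
    groups = defaultdict(list)
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # The first segment after the prefix acts like a "subdirectory" or a leaf name
        head = rest.split("/", 1)[0]
        groups[head].append(name)
    return dict(groups)


names = ["foo/bar-1", "foo/baz-2", "foo/qux/deep-3", "hop_field_info_demo-4"]
print(group_by_prefix(names, "foo/"))
```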

### Retrieving data from Storage
The `Blob` data may be retrieved, either by downloading directly to a local file or some other file-like object (e.g. an `io.IOBase` object), or directly into memory. Here's a simple download to a file:

In [None]:
file_blob.download("yakima_valley.geojson")

You can also retrieve the data directly into memory as raw bytes:

In [None]:
print(file_blob.data())

It's also possible to do streaming, iterative downloads: the `iter_data()` method iterates over chunks of bytes, while the `iter_lines()` method iterates over delimited lines.
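To see what `iter_lines()` saves you from writing, here is a sketch of the buffering needed to turn arbitrary byte chunks (such as those `iter_data()` yields) into delimited lines. This is a generic local illustration, not the client's implementation, and `lines_from_chunks` is a hypothetical name.

```python
def lines_from_chunks(chunks, delimiter=b"\n"):
    """Yield complete lines (without the delimiter) from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently held in the buffer
        while delimiter in buffer:
            line, buffer = buffer.split(delimiter, 1)
            yield line
    if buffer:
        yield buffer  # trailing data without a final delimiter


# Chunk boundaries deliberately fall mid-line
chunks = [b"This is so", b"me text.\nThis is", b" some more.\n"]
for lineno, line in enumerate(lines_from_chunks(chunks)):
    print(lineno, line.decode("utf-8"))
```

Decoding only after a full line is assembled also avoids splitting a multi-byte UTF-8 character across chunk boundaries.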

Let's quickly create a multi-line text file and push it to storage:

In [None]:
with open("some_file", "w") as f:
    f.write("This is some text.\nThis is some more.\n")
!ls -l some_file
!md5sum some_file
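The `size_bytes` and `hash` values passed to the `Blob` in the next cell appear to be the file's byte count and MD5 digest, as reported by the shell commands above. A minimal sketch computing the same two values in pure Python with `hashlib`, which is handy where `md5sum` is unavailable. That Storage expects the MD5 form specifically is an assumption based on the value used in this notebook, and `file_size_and_md5` is a hypothetical helper.

```python
import hashlib
import os


def file_size_and_md5(path):
    """Return (size in bytes, hex MD5 digest) for a local file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
    return os.path.getsize(path), md5.hexdigest()


# Write in binary mode so no platform newline translation changes the size
with open("some_file", "wb") as f:
    f.write(b"This is some text.\nThis is some more.\n")
size, digest = file_size_and_md5("some_file")
print(size, digest)
```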

In [None]:
multi_blob = Blob(
    namespace="some-project",
    name="its-a-file_demo",
    size_bytes=38,
    hash="fa939c0e9a504cd9d395a93b77e496fd",
    readers=["group:thecoolones"],
    tags=["storage_demo"],
)
multi_blob.upload("some_file")
multi_blob

In [None]:
for lineno, line in enumerate(multi_blob.iter_lines(decode_unicode=True)):
    print(f"{lineno} {line}")

### Sharing Blob objects
As with Catalog products, you can add specific organizations or users as readers or writers of your `Blob` objects to give others access. Simply update `.readers`, `.writers`, etc., and then save the `Blob` using `.save()`:

In [None]:
# Add a coworker and their org as readers of the previously stored GeoJSON file
file_blob.readers = ["email:rockstar@wherewework.com", "org:wherewework"]
file_blob.save()
file_blob.readers
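Each entry in `readers`, `writers`, or `owners` is a typed principal string, and a bare email address without its prefix is an easy mistake. A small pre-save check, assuming only the `email:`, `group:`, and `org:` forms that appear in this notebook (the service may accept others); `check_principals` is hypothetical.

```python
# Assumption: only the principal prefixes used in this notebook
KNOWN_PREFIXES = ("email:", "group:", "org:")


def check_principals(principals):
    """Raise ValueError for entries lacking a known prefix or an identifier after it."""
    bad = [p for p in principals if not p.startswith(KNOWN_PREFIXES) or p.endswith(":")]
    if bad:
        raise ValueError(f"unrecognized principal(s): {bad}")
    return list(principals)


print(check_principals(["email:rockstar@wherewework.com", "org:wherewework"]))
```

Calling it on the list right before assigning `file_blob.readers` would catch a missing prefix before `.save()` is called.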

### Deleting blob files
`Blob`s can be deleted by combining the Catalog search methods used earlier with each `Blob`'s `delete()` method.

In [None]:
# Delete by tags
for b in Blob.search().filter(p.tags == "storage_demo"):
    print(f"Deleting {b.id}")
    b.delete()
os.remove("yakima_valley.geojson")
os.remove("some_file")

### *Note: all Blob objects created here are automatically available in your organization's namespace. Before sharing this demo with others in your org, delete all unnecessary created objects to prevent errors.*