In [11]:
%pip install -U odp-sdk

Collecting odp-sdk==0.4.5
  Using cached odp_sdk-0.4.5-py3-none-any.whl (20 kB)
Note: you may need to restart the kernel to use updated packages.


# SDK - Tabular Roundtrip

In this example we will do the following:
 1. Create a tabular dataset
 2. Create schema
 3. Insert data into the dataset
 4. Query the data
 5. Update the data
 6. Delete the data
 7. Cleanup

In [12]:
from odp_sdk.client import OdpClient # The SDK
from odp_sdk.dto import ResourceDto # Resource Data Transfer Object
from odp_sdk.dto.table_spec import TableSpec # Table Specification
from odp_sdk.exc import OdpResourceNotFoundError

## Initiate the client
This is where we set up the client for our environment.
When a client is initiated within workspaces, it automatically authenticates requests to the platform.
When using the SDK on your own computer you will need to authenticate, either with environment variables or with our interactive login.

In [13]:
client = OdpClient()

## Create a resource data transfer object
This object is what is sent back and forth to the API to reference a specific resource.

In [14]:
my_dataset = ResourceDto(
    **{
        "kind": "catalog.hubocean.io/dataset",
        "version": "v1alpha3",
        "metadata": {
            "name": "narwhals",  # Add your name to the dataset
        },
        "spec": {
            "storage_controller": "registry.hubocean.io/storageController/storage-tabular",
            "storage_class": "registry.hubocean.io/storageClass/tabular",
            "maintainer": {"contact": "Just Me <raw_client_example@hubocean.earth>"},  # <-- strict syntax here
        },
    }
)
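The comment on the maintainer contact notes that its syntax is strict. As a rough local sanity check before creating the dataset, the sketch below tests whether a string resembles the `Name <email>` form used above. The pattern and the helper name `looks_like_contact` are assumptions made for illustration; the platform's actual validation rules are not shown in this notebook and may differ.

```python
import re

# Assumed pattern: a display name, a space, then an email address in angle brackets.
# This is only a local sanity check, not the platform's own validation.
CONTACT_PATTERN = re.compile(r"^[^<>]+ <[^<>@\s]+@[^<>@\s]+\.[^<>@\s]+>$")


def looks_like_contact(contact: str) -> bool:
    """Return True if `contact` resembles 'Name <user@example.com>'."""
    return CONTACT_PATTERN.match(contact) is not None


print(looks_like_contact("Just Me <raw_client_example@hubocean.earth>"))  # True
print(looks_like_contact("raw_client_example@hubocean.earth"))  # False
```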

## Create the dataset
Resources such as datasets and collections are managed in the catalog part of the platform, which is why we use the catalog client of the SDK.

In [15]:
# The dataset is created in the catalog.
my_dataset = client.catalog.create(my_dataset)

## Response

When a dataset is created, the platform adds some extra data to the resource DTO. The returned object is the same type as the one we sent, but with additional fields set, such as the UUID, which is now the unique identifier of the dataset.

In [16]:
print(my_dataset)

kind='catalog.hubocean.io/dataset' version='v1alpha3' metadata=MetadataDto(name='narwhals', display_name=None, description=None, uuid=UUID('ff17fbb2-5dd6-4f89-897f-64ccd9adc2e2'), labels={}, owner=UUID('9f3aecc0-3b11-41a6-a029-773b33d2d5b9')) status=ResourceStatusDto(num_updates=0, created_time=datetime.datetime(2024, 1, 17, 13, 21, 15, 846136), created_by=UUID('9f3aecc0-3b11-41a6-a029-773b33d2d5b9'), updated_time=datetime.datetime(2024, 1, 17, 13, 21, 15, 846136), updated_by=UUID('9f3aecc0-3b11-41a6-a029-773b33d2d5b9'), deleted_time=None, deleted_by=None) spec={'storage_class': 'registry.hubocean.io/storageClass/tabular', 'storage_controller': 'registry.hubocean.io/storageController/storage-tabular', 'data_collection': None, 'maintainer': {'contact': 'Just Me <raw_client_example@hubocean.earth>', 'organisation': None}, 'citation': None, 'documentation': [], 'attributes': [], 'facets': None, 'tags': []}


## Create a schema

A schema needs to be created before any data can be inserted. The schema defines the shape of the data. create_schema returns the updated table specification object.

In [17]:
table_schema = {"Data": {"type": "string"}}  # Our schema has one field named Data with the type string
my_table_spec = TableSpec(table_schema=table_schema)

my_table_spec = client.tabular.create_schema(resource_dto=my_dataset, table_spec=my_table_spec)
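Since the schema defines the shape of the data, it can help to check rows locally before writing them. The sketch below is a plain-Python illustration that does not call the SDK. The helper `find_schema_mismatches`, the mapping of `"string"` to Python `str`, and the choice to flag missing or extra fields are all assumptions; the platform's own validation may behave differently.

```python
# Assumed mapping from schema type names to Python types.
# Only "string" appears in this notebook; other type names are not covered here.
PYTHON_TYPES = {"string": str}


def find_schema_mismatches(rows, table_schema):
    """Return a list of readable problems; an empty list means every row fits."""
    problems = []
    for index, row in enumerate(rows):
        missing = set(table_schema) - set(row)
        extra = set(row) - set(table_schema)
        if missing:
            problems.append(f"row {index}: missing fields {sorted(missing)}")
        if extra:
            problems.append(f"row {index}: unknown fields {sorted(extra)}")
        for field, definition in table_schema.items():
            expected = PYTHON_TYPES.get(definition["type"])
            if field in row and expected is not None and not isinstance(row[field], expected):
                problems.append(f"row {index}: field {field!r} is not of type {definition['type']}")
    return problems


example_schema = {"Data": {"type": "string"}}
print(find_schema_mismatches([{"Data": "Test"}, {"Data": "Test1"}], example_schema))  # []
print(find_schema_mismatches([{"Data": 42}], example_schema))
```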


## Insert data

Now that we have our schema set we can start inserting data into our dataset.

In [18]:
test_data = [{"Data": "Test"}, {"Data": "Test1"}]

client.tabular.write(resource_dto=my_dataset, data=test_data)

## Query data

With our data inserted into the dataset we can query it.

In [19]:
our_data = client.tabular.select_as_list(my_dataset)

print("-------DATA IN DATASET--------")
print(our_data)

-------DATA IN DATASET--------
[{'Data': 'Test'}, {'Data': 'Test1'}]


## Update the data

To update data, declare a filter that selects the rows to update and provide the replacement data. The number of rows matched by the filter must equal the number of rows provided, because the system updates the data one to one.

Filters are written in a query language named OQS. The OQS specification can be found in our documentation: https://docs.hubocean.earth/guides/querying/querying-resources/

In [None]:
update_filters = {"#EQUALS": ["$Data", "Test"]}
new_data = [{"Data": "Test Updated"}]

client.tabular.update(
    resource_dto=my_dataset,
    data=new_data,
    filter_query=update_filters,
)

result = client.tabular.select_as_list(my_dataset)

print("-------UPDATED DATA IN DATASET--------")
print(result)
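The one-to-one rule for updates can be illustrated locally without touching the platform. The sketch below evaluates a single `#EQUALS` filter against in-memory rows and replaces the matches in order. The helpers `matches` and `simulate_update` are hypothetical, and raising an error on a count mismatch and replacing rows in list order are assumptions; this is not how the platform implements updates, and it handles no other OQS operators.

```python
def matches(row, filter_query):
    """Evaluate a single {"#EQUALS": ["$Field", value]} filter against one row."""
    field_ref, expected = filter_query["#EQUALS"]
    return row.get(field_ref.lstrip("$")) == expected


def simulate_update(rows, filter_query, new_rows):
    """Replace matching rows one to one, in order; raise if the counts differ."""
    matched = [i for i, row in enumerate(rows) if matches(row, filter_query)]
    if len(matched) != len(new_rows):
        raise ValueError(
            f"filter matched {len(matched)} rows but {len(new_rows)} replacements were given"
        )
    updated = list(rows)  # copy so the input list is left untouched
    for i, replacement in zip(matched, new_rows):
        updated[i] = replacement
    return updated


rows = [{"Data": "Test"}, {"Data": "Test1"}]
query = {"#EQUALS": ["$Data", "Test"]}
print(simulate_update(rows, query, [{"Data": "Test Updated"}]))
# [{'Data': 'Test Updated'}, {'Data': 'Test1'}]
```

In this simulation, passing both original rows as replacements for a filter that matches only one row raises a `ValueError`, which mirrors the requirement that the filtered and provided row counts agree.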

## Delete the data

To delete data we again need filters to specify which data to delete.

In [20]:
delete_filters = {"#EQUALS": ["$Data", "Test1"]}
client.tabular.delete(resource_dto=my_dataset, filter_query=delete_filters)

result = client.tabular.select_as_list(my_dataset)

print("-------DATA IN DATASET AFTER DELETION--------")
print(result)

-------DATA IN DATASET AFTER DELETION--------
[{'Data': 'Test'}]


## Cleanup

For cleanup we remove the schema and delete the dataset.

In [21]:
# Delete the schema
client.tabular.delete_schema(my_dataset)

try:
    client.tabular.get_schema(my_dataset)
except OdpResourceNotFoundError as e:
    print("Schema not found error since it is deleted:")
    print(e)

# Delete the dataset
client.catalog.delete(my_dataset)
print("Dataset deleted successfully")

Schema not found error since it is deleted:
Schema not found
Dataset deleted successfully
