# Hands-On Hecuba


## Part 1
Hecuba is built around two main data structures: `StorageObj` and `StorageDict`. A `StorageObj` is a regular Python object with a set of persistent attributes, each declared in the class docstring as `@ClassField name type`, for example `@ClassField myattr int`.

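A `StorageDict`, on the other hand, represents a persistent dictionary. Its data model is also declared in the class docstring, as `@TypeSpec dict<<key>,value>`, where each key and value field follows the format `name:type`. Several keys or values are separated by commas, e.g. `@TypeSpec dict<<city:str>,lat:float,lon:float>`. Keep in mind that a `StorageObj` can have many `@ClassField` annotations, while a `StorageDict` has exactly one `@TypeSpec`.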
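To make the two annotation formats concrete, here is a small sketch that breaks them down into names and types. The functions `parse_classfields`, `parse_typespec` and `parse_fields` are hypothetical helpers written for this illustration. They are not Hecuba's own parser, and they assume simple, non-nested type names (no `tuple<int,int>` and the like). In a real Hecuba class, the annotation lines below would sit in the class docstring.

```python
import re

# Hypothetical helpers, NOT part of Hecuba: they only show how the
# @ClassField and @TypeSpec annotations break down into names and types.
# Assumption: types are simple names (no nested generics like tuple<int,int>).

CLASSFIELD = re.compile(r"@ClassField[ \t]+(\w+)[ \t]+(\S+)")
TYPESPEC = re.compile(r"@TypeSpec\s+dict\s*<\s*<([^>]*)>\s*,(.*)>\s*$")


def parse_fields(text):
    """Turn 'a:int, b:str' into [('a', 'int'), ('b', 'str')]."""
    fields = []
    for item in text.split(","):
        name, _, type_name = item.strip().partition(":")
        fields.append((name.strip(), type_name.strip()))
    return fields


def parse_classfields(doc):
    """Return {attribute: type} for every @ClassField line (a StorageObj may have many)."""
    return dict(CLASSFIELD.findall(doc))


def parse_typespec(doc):
    """Return (keys, values) from the single @TypeSpec line of a StorageDict."""
    specs = [line.strip() for line in doc.splitlines()
             if line.strip().startswith("@TypeSpec")]
    if len(specs) != 1:
        raise ValueError(f"expected exactly one @TypeSpec, found {len(specs)}")
    match = TYPESPEC.match(specs[0])
    if match is None:
        raise ValueError(f"malformed @TypeSpec: {specs[0]!r}")
    keys, values = match.groups()
    return parse_fields(keys), parse_fields(values)


obj_doc = """
@ClassField name str
@ClassField age int
"""
dict_doc = "@TypeSpec dict<<city:str>,lat:float,lon:float>"

print(parse_classfields(obj_doc))  # {'name': 'str', 'age': 'int'}
print(parse_typespec(dict_doc))    # ([('city', 'str')], [('lat', 'float'), ('lon', 'float')])
```

The "exactly one `@TypeSpec`" rule is mirrored by the `ValueError` in `parse_typespec`. `parse_classfields`, by contrast, collects as many fields as it finds.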


List of supported data types:
https://github.com/bsc-dd/hecuba/wiki/1:-User-Manual#immutable-types-supported

Naming conventions:
https://github.com/bsc-dd/hecuba/wiki/1:-User-Manual#hecuba-data-classes

### Exercise 1 - Define data models

Define a class that inherits from either `StorageObj` or `StorageDict`. Then add a data model with more than one attribute if you chose `StorageObj`, or with more than one value if you chose `StorageDict`.

Create an instance with the empty constructor `MyClass()`. At this point the object lives only in memory. Add some data, then call its `make_persistent("name")` method, which sends the data to persistent storage.

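You can now inspect the stored data with `cqlsh`, Cassandra's command-line shell, which runs SQL-like CQL commands. Run `cqlsh` from your terminal and explore the data, or run queries directly from the notebook. For example, the following commands describe the `my_app` keyspace and list the first entries of `hecuba.istorage`, the table where Hecuba registers its persistent objects: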

```
!cqlsh -e 'DESCRIBE my_app'
```

```
!cqlsh -e 'SELECT storage_id,name FROM hecuba.istorage LIMIT 10'
```

Finally, add a method to the class definition. The method should combine several attributes (for a `StorageObj`) or the values stored under a given key (for a `StorageDict`).

Instantiate the class again, this time passing the same name you used with `make_persistent`, e.g. `MyClass("name")`. Because the name identifies the stored data, the new object recovers the previous contents. Try calling the new method on it as well.

### Exercise 2 - Let's parallelize workloads

Now, declare a class that inherits from `StorageDict`, and define a data model.

Then create an instance with the persistent constructor `MyClass("someid")`, so that the object is persistent from the start, and populate it with, say, 100,000 to 10 million key-value pairs.
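Generating that many pairs in a list first would waste memory, so a generator is a natural fit. The sketch below uses a hypothetical helper, `synthetic_pairs`, that is not part of Hecuba. It assumes a data model along the lines of `@TypeSpec dict<<id:int>,x:float,y:float>`, with integer keys and `(x, y)` float tuples as values. A plain dict stands in for the Hecuba object so the snippet runs without Cassandra.

```python
import random

# Hypothetical helper, NOT part of Hecuba: yields synthetic key-value pairs
# lazily, so 100k-10M entries never need to be materialized in memory at once.
# Assumption: the data model is something like
#   @TypeSpec dict<<id:int>,x:float,y:float>
# so each key is an int and each value is an (x, y) tuple of floats.


def synthetic_pairs(n, seed=0):
    """Yield n (key, (x, y)) pairs; the same seed reproduces the same data."""
    rng = random.Random(seed)
    for key in range(n):
        yield key, (rng.random(), rng.random())


# Works with any dict-like target; a plain dict stands in for the Hecuba object here.
target = {}
for key, value in synthetic_pairs(1_000):
    target[key] = value
print(len(target))  # 1000
```

In the notebook, you would replace `target` with the persistent instance (e.g. `my_dict = MyClass("someid")`) and raise `n` to 100,000 or more. This assumes an entry with several values can be assigned as a tuple. Check the Hecuba manual for the exact form your data model expects.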

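All Hecuba objects have a generator method, `split()`, that yields subsets of the object until all the data has been returned. Each subset can be handed to a separate task, which is what makes it possible to parallelize workloads over the data.
Try it: the keys are not distributed across the subsets in any obvious order, but together the subsets contain all the data.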
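The claim that "all data is there" can be checked rather than eyeballed. The sketch below uses a hypothetical helper, `check_split`, that is not part of Hecuba. It verifies that a sequence of partitions covers every key exactly once and returns the size of each partition. It assumes that iterating a partition yields its keys, as iterating a dict does. Plain dicts stand in for the subsets yielded by `split()`.

```python
from collections import Counter

# Hypothetical helper, NOT part of Hecuba: checks that a sequence of
# partitions (for example, the subsets yielded by split()) covers every key
# exactly once. Assumption: iterating a partition yields its keys, as
# iterating a dict does.


def check_split(all_keys, partitions):
    """Return the size of each partition; raise ValueError if keys are lost or repeated."""
    expected = set(all_keys)
    counts = Counter()
    sizes = []
    for part in partitions:
        keys = list(part)
        sizes.append(len(keys))
        counts.update(keys)
    seen = set(counts)
    duplicated = [k for k, c in counts.items() if c > 1]
    missing = expected - seen
    unexpected = seen - expected
    if duplicated or missing or unexpected:
        raise ValueError(
            f"{len(missing)} missing, {len(duplicated)} duplicated, "
            f"{len(unexpected)} unexpected keys"
        )
    return sizes


# Stand-in data: a plain dict split into three interleaved partitions.
data = {i: i * i for i in range(10)}
parts = ({k: data[k] for k in data if k % 3 == r} for r in range(3))
print(check_split(data, parts))  # [4, 3, 3]
```

With Hecuba, the call would look like `check_split(my_dict, my_dict.split())`, assuming that iterating the `StorageDict` also yields its keys. `Counter` is used instead of a plain set union so that a key returned by two different subsets is reported rather than silently merged. The helper keeps every key in memory, so try it on a smaller instance before running it against millions of keys.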