# Push data to and load data from ManiVault

> Install the modules listed in requirements.txt before running this notebook!

In order to interact with the ManiVault application that started this Python kernel, we use a dedicated module:

In [None]:
import mvstudio.data

Our main entry point to ManiVault is its data hierarchy:

In [None]:
dh = mvstudio.data.Hierarchy()
print(dh)

At this point, no data is loaded and the data hierarchy is empty.

Let's load some data on the Python side and push it to ManiVault:

In [None]:
from skimage import data
import matplotlib.pyplot as plt

cat = data.cat()

plt.imshow(cat)

We push our cat image to ManiVault via the data hierarchy `dh`. We may add `Point`, `Image` and `Cluster` data. ManiVault handles images as two data set items: a collection of data points and a description of how they relate to image positions. Similarly, ManiVault treats clusters as a meta data set that is connected to point data and stores clusters as sets of indices into the parent points.

We can use `addPointsItem` if we have a simple point data set, but here we use `addImageItem` to automatically create both ManiVault data items:

In [None]:
cat_mv_item = dh.addImageItem(cat, "Cat")

ManiVault informs us that the data points were internally converted to `float`.

We can inspect the data item that ManiVault created and see that indeed two data sets were populated, a `Points` and an `Image` data set:

In [None]:
print(cat_mv_item)

Each data set is assigned a unique identifier, a data set ID. When loading data from ManiVault, we can use this ID, its position in the data hierarchy, or its name to identify a data set.
The unique identifier of a data set is accessible via the data hierarchy in ManiVault: right-click a data entry and select "Copy dataset ID" to save the ID to the clipboard. Using `Ctrl` + `V` you can insert the ID in this notebook.

In [None]:
cat1 = dh.getItemByDataID(cat_mv_item.datasetId)
cat2 = dh.getItemByIndex([1])
cat3 = dh.getItemByName("Cat")

print(cat1)
print(cat2)
print(cat3)

> When multiple data sets have the same name, `getItemByName` will only return the first instance!

Our cat `Point` data does not know anything about the spatial arrangement in image space:

In [None]:
print(f"Data points and dimensions: {cat1.points.shape}")

The corresponding cat `Image` data arranges the data into a proper shape:

In [None]:
cat_img = dh.getItemByIndex([1, 1])
print(f"Image height, width and channels: {cat_img.image.shape}")
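The relation between the two shapes can be sketched with plain NumPy, independent of ManiVault. This is a minimal illustration on a small synthetic array; the row-major pixel ordering used here is an assumption about how the point data is laid out, not something this notebook confirms.

```python
import numpy as np

# Synthetic stand-in for an RGB image: 4 rows, 6 columns, 3 channels.
height, width, channels = 4, 6, 3
image = np.arange(height * width * channels, dtype=np.uint8).reshape(height, width, channels)

# Point view: one row per pixel, one column per channel.
# Assumption: pixels are ordered row by row (row-major).
points = image.reshape(-1, channels)

# Image view: the same values, restored to (height, width, channels).
restored = points.reshape(height, width, channels)

print(f"Data points and dimensions: {points.shape}")
print(f"Image height, width and channels: {restored.shape}")
print(f"Identical after reshaping: {np.array_equal(image, restored)}")
```

Under this assumption, the point with index `width` is the first pixel of the second image row.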

Remember that ManiVault converted the data into `float`. For a full roundtrip, we'd like to convert them back to their original type:

In [None]:
import numpy as np
cat_roundtrip = cat_img.image.astype(np.uint8)
print(f"Value type of original image: {cat.dtype}")
print(f"Value type of roundtrip image: {cat_roundtrip.dtype}")

Let's have another look. It should be the same!

In [None]:
plt.imshow(cat_roundtrip)
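Besides comparing the plots by eye, the roundtrip can be checked numerically. The sketch below uses a small random array instead of the cat image and assumes the internal `float` type is 32-bit (`np.float32`), which represents every `uint8` value exactly.

```python
import numpy as np

# Random 8-bit test image (hypothetical stand-in for the cat image).
rng = np.random.default_rng(seed=0)
original = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)

# Assumption: the float conversion corresponds to a cast to 32-bit floats.
as_float = original.astype(np.float32)

# Cast back to the original value type.
roundtrip = as_float.astype(np.uint8)

print(f"Value type of roundtrip image: {roundtrip.dtype}")
print(f"Roundtrip is lossless: {np.array_equal(original, roundtrip)}")
```

In this notebook, the equivalent check would be `np.array_equal(cat, cat_roundtrip)`.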

Next, let's push a `Cluster` data set to ManiVault. 

We first need to create a list of indices that define our clusters. Here, we simply define the upper and lower half of our image as clusters.

In [None]:
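import numpy as np

def split_array(n):
    """Split the indices 0..n-1 into two equally sized halves."""
    if n % 2 != 0:
        raise ValueError("n must be an even number")
    half = n // 2
    array1 = np.arange(0, half)  # first half of the points (upper image half)
    array2 = np.arange(half, n)  # second half of the points (lower image half)
    return [array1, array2]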

clusterIndices = split_array(cat_mv_item.numpoints)
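Splitting the point indices in the middle yields the upper and lower image halves only if the points are ordered row by row. The following sketch checks this on a small synthetic grid; the row-major ordering and the grid size are assumptions made for illustration.

```python
import numpy as np

# Hypothetical small image grid with an even number of rows.
height, width = 4, 6
num_points = height * width

# Same split as above: first and second half of the flat point indices.
upper = np.arange(0, num_points // 2)
lower = np.arange(num_points // 2, num_points)

# Map flat indices back to image rows, assuming row-major ordering.
upper_rows = np.unravel_index(upper, (height, width))[0]
lower_rows = np.unravel_index(lower, (height, width))[0]

print(f"Rows covered by first cluster: {np.unique(upper_rows)}")
print(f"Rows covered by second cluster: {np.unique(lower_rows)}")
```

With an odd number of rows, the split would fall in the middle of a row instead of between two rows.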

We need to tell ManiVault which point data set our clusters refer to:

In [None]:
cluster_mv_item = dh.addClusterItem(cat_mv_item.datasetId, clusterIndices, "CatClusters")

Let's have a look at the data hierarchy again. It mirrors what you see in ManiVault!

In [None]:
dh = mvstudio.data.Hierarchy()
print(dh)

Clusters in ManiVault are automatically assigned names and colors. Optionally, you can set them yourself when calling `addClusterItem(names=[...], colors=[...])` by passing lists of strings and NumPy arrays, respectively.

In [None]:
clusters_retrieved = dh.getItemByIndex([1, 2])
print(f"First indices of first cluster: {clusters_retrieved.cluster.indices[0][:5]}")
print(f"Cluster names: {clusters_retrieved.cluster.names}")
print(f"Cluster colors: {clusters_retrieved.cluster.colors}")