# Data

The `geoh5` format allows storing data (values) on different parts of an `Object`. The data types currently supported by `geoh5py` are:

- Float
- Integer
- Text
- Colormap
- Well log

![data](./images/data.png)

In [57]:
from geoh5py.workspace import Workspace
import numpy as np

# Re-use the previous workspace
workspace = Workspace("my_project.geoh5")

# Get the curve from previous section
curve = workspace.get_entity("Curve")[0]

## Float

Numerical `float` data can be attached to the various elements making up an object. Data are added to an `Object` entity using the `add_data` method.

In [58]:
curve.add_data({
    "my_cell_values": {
        "association":"CELL", 
        "values": np.random.randn(curve.n_cells)
    }
})

<geoh5py.data.float_data.FloatData at 0x1a33c1ac8b0>

The `association` can be one of:

- OBJECT: Single element characterizing the parent object
- VERTEX: Array of values associated with the parent object vertices
- CELL: Array of values associated with the parent object cells 

The length and order of the array of values must be consistent with the corresponding element of `association`. If the `association` argument is omitted, `geoh5py` will attempt to assign the data to the correct part based on the shape of the data values, either `object.n_vertices` or `object.n_cells`.
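
The shape-based assignment described above can be pictured with a small standalone sketch. The helper `infer_association` is hypothetical and is not part of `geoh5py`; it only mimics the idea of comparing the length of the values against the vertex and cell counts of the parent object, and the library's actual logic may differ.

```python
import numpy as np


def infer_association(values, n_vertices, n_cells):
    """Guess an association from the length of `values` (illustrative only)."""
    values = np.atleast_1d(values)
    if values.size == n_vertices:
        return "VERTEX"
    if values.size == n_cells:
        return "CELL"
    if values.size == 1:
        return "OBJECT"
    raise ValueError(
        f"Cannot match {values.size} values to "
        f"{n_vertices} vertices or {n_cells} cells."
    )


# A curve with 12 vertices has 11 cells (segments) in this made-up case
print(infer_association(np.zeros(12), n_vertices=12, n_cells=11))
print(infer_association(np.zeros(11), n_vertices=12, n_cells=11))
print(infer_association("hello_world", n_vertices=12, n_cells=11))
```

In this sketch, length alone becomes ambiguous when the vertex and cell counts happen to be equal, which is one reason to pass `association` explicitly.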

In [59]:
# Add multiple data vectors in a single call
data = {}
for ii in range(8):
    data[f"Period:{ii}"] = {
        "association":"VERTEX", 
        "values": (ii+1) * np.cos(ii*curve.vertices[:, 0]*np.pi/curve.vertices[:, 0].max()/4.)
    }

data_list = curve.add_data(data)
print([obj.name for obj in data_list])

['Period:0', 'Period:1', 'Period:2', 'Period:3', 'Period:4', 'Period:5', 'Period:6', 'Period:7']


The newly created data is directly added to the project's `geoh5` file and available for visualization:

![adddata](./images/adddata.png)

## Integer

## Text

Text (string) data can only be associated with the object itself.

In [60]:
curve.add_data({
    "my_comment": {
        "association":"OBJECT", 
        "values": "hello_world"
    }
})

<geoh5py.data.text_data.TextData at 0x1a33c1d6e50>

## Colormap

The colormap data type can be used to store or customize the color palette used by Geoscience ANALYST.

In [61]:
from geoh5py.data.color_map import ColorMap

# Create some data on a grid2D entity.
grid = workspace.get_entity("Grid2D")[0]

# Add data
radius = grid.add_data({
    "radial": {"values": np.linalg.norm(grid.centroids, axis=1)}
})

![colormap](./images/default_colormap.png)

In [62]:
# Create a simple colormap that spans the data range
nc = 10
rgba = np.vstack([
    np.linspace(radius.values.min(), radius.values.max(), nc), # Values
    np.linspace(0, 255, nc), # Red
    np.linspace(255, 0, nc), # Green
    np.linspace(125, 15, nc), # Blue
    np.ones(nc) * 255, # Alpha
])

We now have an array that holds levels for red, green, blue and alpha (RGBA), all within the 0 to 255 range, over the span of the data values. This array can be used to define a colormap.

In [63]:
# Create a record array with labels
cmap = np.asarray(
    np.rec.fromarrays(  # np.core.records is deprecated in recent NumPy
        rgba, 
        names=["Value", "Red", "Green", "Blue", "Alpha"], 
        formats=["<f8", "u1", "u1", "u1", "u1"]
    )
)

# Assign the colormap to the data type
radius.entity_type.color_map = {
    "values": cmap, "name": "Custom colormap"
}

workspace.finalize() # Update the geoh5

![colormap](./images/custom_colormap.png)
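
To check what the record array looks like before assigning it, the construction can be reproduced without a workspace. The sketch below is a standalone illustration: the `values` array is synthetic and stands in for `radius.values`, and the checks at the end are only suggestions, not something `geoh5py` requires.

```python
import numpy as np

# Synthetic stand-in for radius.values (assumption: any 1D float array works)
values = np.linspace(0.0, 50.0, 200)

nc = 10
rgba = np.vstack([
    np.linspace(values.min(), values.max(), nc),  # Values
    np.linspace(0, 255, nc),                      # Red
    np.linspace(255, 0, nc),                      # Green
    np.linspace(125, 15, nc),                     # Blue
    np.ones(nc) * 255,                            # Alpha
])

# Each row of `rgba` becomes one named field; colors are cast to unsigned bytes
cmap = np.asarray(
    np.rec.fromarrays(
        rgba,
        names=["Value", "Red", "Green", "Blue", "Alpha"],
        formats=["<f8", "u1", "u1", "u1", "u1"],
    )
)

print(cmap.dtype.names)
print(cmap[0], cmap[-1])

# Sanity checks: one record per color stop, values strictly increasing
assert cmap.shape == (nc,)
assert np.all(np.diff(cmap["Value"]) > 0)
```

Because the color fields use the `u1` format, fractional levels such as 28.3 are truncated to whole numbers when the record array is filled.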

## Well Data

In the case of [Drillhole](#Drillhole) objects, data are added as either `interval log` or `point log` values.

### Point Log Data

Point log data are used to represent measurements recorded at discrete depths along the well path. A `depth` attribute is required on creation. If the `Drillhole` object already holds point log data, `geoh5py` will attempt to match collocated depths within a tolerance. By default, depth markers within 1 centimeter are merged (`collocation_distance=1e-2`).

In [64]:
well = workspace.get_entity("Drillhole")[0]
depths_A = np.arange(0, 50.) # First list of depths

# Second list, its first few depths slightly offset from the first list
depths_B = np.arange(47.1, 100) 

# Add both sets of log data, the second with a 0.5 m tolerance
well.add_data({
    "my_log_values": {
            "depth": depths_A,
            "values": np.random.randn(depths_A.shape[0]),
    },
    "log_wt_tolerance": {
            "depth": depths_B,
            "values": np.random.randn(depths_B.shape[0]),
            "collocation_distance": 0.5
    }
})

[<geoh5py.data.float_data.FloatData at 0x1a33c1d6a30>,
 <geoh5py.data.float_data.FloatData at 0x1a33c1d6670>]

![DHlog](./images/DHlog.png){width="50%"}
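
The effect of `collocation_distance` on the two depth lists above can be reasoned about with plain NumPy. The function `match_depths` below is a hypothetical helper written for this illustration; it snaps each new depth to the nearest existing depth when the two are within the tolerance, which is one plausible reading of the matching rule and not necessarily how `geoh5py` implements it.

```python
import numpy as np


def match_depths(existing, new, collocation_distance=1e-2):
    """Snap `new` depths onto `existing` ones when within tolerance (illustrative only)."""
    existing = np.asarray(existing, dtype=float)
    new = np.asarray(new, dtype=float)

    # Distance from every new depth to every existing depth
    dist = np.abs(new[:, None] - existing[None, :])
    nearest = dist.argmin(axis=1)
    matched = dist[np.arange(new.size), nearest] <= collocation_distance

    # Matched depths take the existing value, the others are kept as new markers
    snapped = np.where(matched, existing[nearest], new)
    merged = np.union1d(existing, snapped)  # sorted, duplicates removed
    return snapped, merged


depths_A = np.arange(0, 50.)
depths_B = np.arange(47.1, 100)

snapped, merged = match_depths(depths_A, depths_B, collocation_distance=0.5)
print(snapped[:4])   # first three depths fall onto 47, 48 and 49
print(merged.size)   # number of distinct depth markers after merging

# With the default 1 cm tolerance, the 10 cm offsets are too large to merge
_, merged_default = match_depths(depths_A, depths_B)
print(merged_default.size)
```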

### Interval Log Data

Interval log data are defined by constant values bounded by a start and an end depth. A `from-to` attribute is expected on creation. Users can also control how intervals are matched by supplying a `tolerance` argument in meters (default `tolerance: 1e-3` meter).

In [65]:
# Add some geology as interval data  
well.add_data({
    "interval_values": {
        "values": [1, 2, 3], 
        "from-to": np.vstack([
            [0.25, 25.5],
            [30.1, 55.5],
            [56.5, 80.2]
        ]),
        "value_map": {
            1: "Unit_A",
            2: "Unit_B",
            3: "Unit_C"
        },
        "type": "referenced",
    }
})

<geoh5py.data.referenced_data.ReferencedData at 0x1a33c7c4640>

![DHinterval](./images/DHinterval.png){width="50%"}
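
As a way to read the interval definition above, the sketch below looks up which unit a given depth falls in, using the same `from-to`, `values` and `value_map` entries as the example. The function `unit_at_depth` is hypothetical and exists only for this illustration; it is not a `geoh5py` method.

```python
import numpy as np

# Same intervals, codes and labels as in the example above
from_to = np.vstack([
    [0.25, 25.5],
    [30.1, 55.5],
    [56.5, 80.2],
])
values = np.array([1, 2, 3])
value_map = {1: "Unit_A", 2: "Unit_B", 3: "Unit_C"}


def unit_at_depth(depth, from_to, values, value_map):
    """Return the label of the interval containing `depth`, or None in a gap."""
    inside = (from_to[:, 0] <= depth) & (depth <= from_to[:, 1])
    if not inside.any():
        return None
    # argmax returns the index of the first interval that contains the depth
    return value_map[int(values[inside.argmax()])]


for depth in (10.0, 28.0, 40.0, 60.0):
    print(depth, unit_at_depth(depth, from_to, values, value_map))
```

Each row of `from-to` pairs with one entry of `values`, so the two must have the same length; depths between intervals, such as 28 m here, carry no value.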

## Get data
Just like any `Entity`, data can be retrieved from the `Workspace` using the `get_entity` method. For convenience, `Objects` also have `get_data_list` and `get_data` methods that focus only on their own children `Data`.

In [66]:
my_list = curve.get_data_list()
print(my_list, curve.get_data(my_list[0]))

['Period:0', 'Period:0', 'Period:1', 'Period:1', 'Period:2', 'Period:2', 'Period:3', 'Period:3', 'Period:4', 'Period:4', 'Period:5', 'Period:5', 'Period:6', 'Period:6', 'Period:7', 'Period:7', 'Visual Parameters', 'my_cell_values', 'my_cell_values', 'my_comment', 'my_comment'] [<geoh5py.data.float_data.FloatData object at 0x000001A33C1A33A0>, <geoh5py.data.float_data.FloatData object at 0x000001A33BAB8D30>]


# Property Groups

`Data` entities sharing the same parent `Object` and `association` can be linked within a `property_group` and made available for profiling. This can be used to group data that would normally be stored as a 2D array.

In [67]:
# Create a group from the VERTEX data added previously
curve.add_data_to_group([obj.name for obj in data_list], "my_trig_group")

<geoh5py.groups.property_group.PropertyGroup at 0x1a33c7c4880>

![propgroups](./images/propgroups.png)
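
The 2D-array view of a property group can be pictured without a workspace. The sketch below rebuilds the eight `Period:ii` channels with the same formula as above and stacks them column-wise; the `x` array is a synthetic stand-in for `curve.vertices[:, 0]`, and the stacking is only an analogy for what the group represents, not a `geoh5py` call.

```python
import numpy as np

# Synthetic stand-in for curve.vertices[:, 0] (assumption: 25 points along x)
x = np.linspace(0.0, 100.0, 25)

# Same formula as the "Period:ii" data created earlier
channels = {
    f"Period:{ii}": (ii + 1) * np.cos(ii * x * np.pi / x.max() / 4.0)
    for ii in range(8)
}

# One row per vertex, one column per member of the group
table = np.column_stack(list(channels.values()))

print(list(channels))
print(table.shape)
```

Stacking works here because every channel shares the same association and therefore the same length, which is the same condition the property group places on its members.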