# Examine Clusters

This notebook examines the clustering algorithm used by Spyral. To use it, you *must* first have run the point cloud phase of Spyral and generated those results. The clustering algorithm is applied to the point clouds, and plots are displayed showing the results. This is useful for tuning the various clustering parameters. Note that data generated here is NOT saved; this notebook is for testing only.

First we import everything we need

In [None]:
import sys
sys.path.append('..')
from spyral.core.config import load_config
from spyral.core.workspace import Workspace
from spyral.core.point_cloud import PointCloud
from spyral.core.clusterize import form_clusters, join_clusters, cleanup_clusters

import h5py as h5
import numpy.random as random
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from plotly.colors import DEFAULT_PLOTLY_COLORS

# Utility for syncing plot colors across subplots
def get_color(value: int) -> str:
    # Label -1 marks noise; draw it in black
    if value == -1:
        return "black"
    # Wrap around the palette when there are more labels than colors
    return DEFAULT_PLOTLY_COLORS[value % len(DEFAULT_PLOTLY_COLORS)]

Now we'll load the point clouds generated by phase one. This is very similar to the previous example where we loaded traces, so please reference that section if anything is unclear.

First load the config and the workspace

In [None]:
config = load_config('../local_config.json')
# Tweak some parameters
# config.cluster.min_points = 3
# config.cluster.n_neighbors_outiler_test = 5
# Create our workspace
ws = Workspace(config.workspace)

Now use the workspace to load the point cloud file

In [None]:
run_number = config.run.run_min
point_file = h5.File(ws.get_point_cloud_file_path(run_number), 'r')

cloud_group: h5.Group = point_file.get('cloud')
min_event = cloud_group.attrs['min_event']
max_event = cloud_group.attrs['max_event']

Now load a random cloud from the file. If you want to debug with a fixed event, you can do that as well by uncommenting a line in the block below.

In [None]:
# numpy's randint excludes the upper bound, so add 1 to make max_event selectable
event = random.randint(min_event, max_event + 1)
# You can hardcode a specific event to debug
# event = 79
print(f'Event {event}')
event_data = cloud_group[f'cloud_{event}']
cloud = PointCloud()
cloud.load_cloud_from_hdf5_data(event_data[:].copy(), event)
print(f'Cloud size: {len(cloud.cloud)}')

fig = make_subplots(2,1,specs=[[{"type": "scene"}],[{"type": "xy"}]],row_heights=[0.6,0.4])
fig.add_trace(
    go.Scatter3d(
        x=cloud.cloud[:, 2], 
        y=cloud.cloud[:, 0], 
        z=cloud.cloud[:, 1], 
        mode="markers", 
        marker= {
            "size": 3, 
            "color": cloud.cloud[:, 3], 
            "showscale": True
        }, 
        name="Point Cloud"
    ),
    row=1,
    col=1
)
fig.add_trace(
    go.Scatter(x=np.linalg.norm(cloud.cloud[:, :3], axis=1), y=cloud.cloud[:, 4], mode="markers", name="Charge"),
    row=2,
    col=1
)
fig.update_layout(
    xaxis_title="Distance (mm)",
    yaxis_title="Integrated Charge",
    scene = {
        "xaxis_range": [0.0, 1000.0],
        "yaxis_range": [-300.0, 300.0],
        "zaxis_range": [-300.0, 300.0],
        "xaxis_title": "Z (mm)",
        "yaxis_title": "X (mm)",
        "zaxis_title": "Y (mm)",
        "aspectratio": {
            "x": 3.3,
            "y": 1.0,
            "z": 1.0
        }
    },
    width=1300,
    height=1000,
)

Above you should see two plots. One is the 3-D point cloud and the other is the integrated charge on the pad as a function of distance from the origin (a proxy for the Bragg curve). These are essentially the features we will be clustering on. Now let's cluster!
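As a quick illustration of how the lower plot's axes are built, here is a minimal sketch on a toy array. The array `toy_cloud` is a hypothetical stand-in for `cloud.cloud`, and the column layout (x, y, z, amplitude, integrated charge) is an assumption inferred from the indices used in the plotting cell above.

```python
import numpy as np

# Hypothetical stand-in for cloud.cloud.
# Assumed columns: x, y, z, amplitude, integrated charge
toy_cloud = np.array([
    [3.0, 4.0, 0.0, 10.0, 100.0],
    [0.0, 0.0, 5.0, 20.0, 250.0],
    [6.0, 8.0, 0.0, 15.0, 180.0],
])

# Distance of each point from the origin, using the first three columns
distance = np.linalg.norm(toy_cloud[:, :3], axis=1)
# Integrated charge, read from column 4 as in the plot above
charge = toy_cloud[:, 4]

for d, q in zip(distance, charge):
    print(f"distance = {d:.1f} mm, charge = {q:.1f}")
```

Each point contributes one (distance, charge) pair, which is what the scatter trace in the second subplot draws.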

### Analysis

Now that we have our cloud, we're ready to cluster! The first step returns a list of clusters identified by the algorithm.

In [None]:
clusters = form_clusters(cloud, config.cluster)
total_points = 0
for cluster in clusters:
    total_points += len(cluster.point_cloud.cloud)
print(f"Size: {total_points}")

We can then plot the clusters together to check the performance of the algorithm

In [None]:
fig = make_subplots(2,1,specs=[[{"type": "scene"}],[{"type": "xy"}]],row_heights=[0.6,0.4])
for cluster in clusters:
    fig.add_trace(
        go.Scatter3d(
            x=cluster.point_cloud.cloud[:, 2], 
            y=cluster.point_cloud.cloud[:, 0], 
            z=cluster.point_cloud.cloud[:, 1], 
            mode="markers",
            legendgroup="clusters",
            marker= {
                "size": 3,
                "color": get_color(cluster.label)
            }, 
            name=f"Cluster {cluster.label}"
        ),
        row=1,
        col=1
    )
    fig.add_trace(
        go.Scatter(
            x=np.linalg.norm(cluster.point_cloud.cloud[:, :3], axis=1), 
            y=cluster.point_cloud.cloud[:, 4],
            legendgroup="clusters",
            mode="markers",
            marker= {
                "color": get_color(cluster.label)
            },
            showlegend=False,
            name=f"Cluster {cluster.label}"
        ),
        row=2,
        col=1
    )
fig.update_layout(
    xaxis_title="Distance (mm)",
    yaxis_title="Integrated Charge",
    scene = {
        "xaxis_range": [0.0, 1000.0],
        "yaxis_range": [-300.0, 300.0],
        "zaxis_range": [-300.0, 300.0],
        "xaxis_title": "Z (mm)",
        "yaxis_title": "X (mm)",
        "zaxis_title": "Y (mm)",
        "aspectratio": {
            "x": 3.3,
            "y": 1.0,
            "z": 1.0
        }
    },
    width=1300,
    height=1000,
)

The plots above show the different clusters identified by the algorithm, with the labels supplied by the algorithm. Points labeled -1 were identified as noise.
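When tuning parameters, it can help to put a number on how much of the event was flagged as noise. The sketch below is one way to do that; `noise_fraction` and `example_sizes` are hypothetical names introduced here, not part of Spyral.

```python
def noise_fraction(sizes_by_label: dict[int, int]) -> float:
    """Fraction of all points that carry the noise label (-1)."""
    total = sum(sizes_by_label.values())
    if total == 0:
        return 0.0
    return sizes_by_label.get(-1, 0) / total


# Hypothetical label -> point count mapping, standing in for a real event
example_sizes = {0: 120, 1: 45, 2: 15, -1: 20}
print(f"Noise fraction: {noise_fraction(example_sizes):.2f}")
```

In this notebook the mapping could be built from the clusters above, for example with `{c.label: len(c.point_cloud.cloud) for c in clusters}`.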

Typically the algorithm breaks trajectories into many clusters due to varying pad density, trajectory gaps, Bragg effects, etc. So we need to rejoin these cluster pieces into an actual trajectory cluster. We do this by fitting a circle to each cluster and seeing how much the circles overlap. If they overlap enough, they are deemed to be from the same trajectory. We also check the mean charge of each cluster segment to avoid including crosstalk clusters.

In [None]:
joined_clusters = join_clusters(clusters, config.cluster)
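To make the joining idea concrete, here is a minimal sketch of a circle fit and an overlap measure on a 2-D projection. It is not Spyral's implementation: `fit_circle`, `overlap_fraction`, the overlap definition (intersection area divided by the smaller circle's area), and the 0.9 threshold are all assumptions for illustration. `join_clusters` may define overlap differently, and it also applies the mean charge check.

```python
import numpy as np


def fit_circle(x: np.ndarray, y: np.ndarray) -> tuple[float, float, float]:
    """Algebraic least-squares circle fit; returns (center_x, center_y, radius)."""
    # Rewrite (x - a)^2 + (y - b)^2 = r^2 as the linear system
    # 2*a*x + 2*b*y + c = x^2 + y^2, with c = r^2 - a^2 - b^2
    design = np.column_stack((2.0 * x, 2.0 * y, np.ones_like(x)))
    target = x**2 + y**2
    (a, b, c), *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(a), float(b), float(np.sqrt(c + a**2 + b**2))


def overlap_fraction(c1, c2) -> float:
    """Intersection area of two circles divided by the area of the smaller one."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    d = float(np.hypot(x2 - x1, y2 - y1))
    if d >= r1 + r2:
        return 0.0  # circles do not overlap
    if d <= abs(r1 - r2):
        return 1.0  # smaller circle lies entirely inside the larger one
    # Standard lens-area formula; clip guards against round-off outside [-1, 1]
    alpha = np.arccos(np.clip((d**2 + r1**2 - r2**2) / (2.0 * d * r1), -1.0, 1.0))
    beta = np.arccos(np.clip((d**2 + r2**2 - r1**2) / (2.0 * d * r2), -1.0, 1.0))
    kite = (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2)
    lens = r1**2 * alpha + r2**2 * beta - 0.5 * np.sqrt(max(kite, 0.0))
    return float(lens / (np.pi * min(r1, r2) ** 2))


# Two arcs sampled from the same hypothetical circle (center (1, 2), radius 3)
theta = np.linspace(0.0, np.pi, 50)
seg_a = fit_circle(1.0 + 3.0 * np.cos(theta), 2.0 + 3.0 * np.sin(theta))
seg_b = fit_circle(1.0 + 3.0 * np.cos(theta + 1.5), 2.0 + 3.0 * np.sin(theta + 1.5))

# 0.9 is an arbitrary illustrative threshold, not a Spyral default
same_track = overlap_fraction(seg_a, seg_b) > 0.9
print(f"Segments belong to the same trajectory: {same_track}")
```

Normalizing by the smaller circle means a short segment fully contained in a longer one still counts as a complete overlap, which suits the case of a small fragment broken off a long track.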

Now we can again plot our clusters

In [None]:
fig = make_subplots(2,1,specs=[[{"type": "scene"}],[{"type": "xy"}]],row_heights=[0.6,0.4])
for cluster in joined_clusters:
    fig.add_trace(
        go.Scatter3d(
            x=cluster.point_cloud.cloud[:, 2], 
            y=cluster.point_cloud.cloud[:, 0], 
            z=cluster.point_cloud.cloud[:, 1], 
            mode="markers",
            legendgroup="clusters",
            marker= {
                "size": 3,
                "color": get_color(cluster.label)
            }, 
            name=f"Cluster {cluster.label}"
        ),
        row=1,
        col=1
    )
    fig.add_trace(
        go.Scatter(
            x=np.linalg.norm(cluster.point_cloud.cloud[:, :3], axis=1), 
            y=cluster.point_cloud.cloud[:, 4],
            legendgroup="clusters",
            mode="markers",
            marker= {
                "color": get_color(cluster.label)
            },
            showlegend=False,
            name=f"Cluster {cluster.label}"
        ),
        row=2,
        col=1
    )
fig.update_layout(
    xaxis_title="Distance (mm)",
    yaxis_title="Integrated Charge",
    scene = {
        "xaxis_range": [0.0, 1000.0],
        "yaxis_range": [-300.0, 300.0],
        "zaxis_range": [-300.0, 300.0],
        "xaxis_title": "Z (mm)",
        "yaxis_title": "X (mm)",
        "zaxis_title": "Y (mm)",
        "aspectratio": {
            "x": 3.3,
            "y": 1.0,
            "z": 1.0
        }
    },
    width=1300,
    height=1000,
)

Now you should see well-defined trajectory clusters! If you don't, try tweaking some of the parameters or cycling to a different point cloud.

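Finally, a cleanup pass is run on the joined clusters to reduce noise and smooth the trajectory. Note that we change types here. Previously our clusters were of type LabeledCloud, a temporary holding type. Now our clusters are of type Cluster, so the way we access the data changes slightly: the points live in `cluster.data` rather than `cluster.point_cloud.cloud`, and the integrated charge is read from column 3 instead of column 4.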

In [None]:
cleaned_clusters = cleanup_clusters(joined_clusters, config.cluster)

We can again plot our projections to examine the results

In [None]:
fig = make_subplots(2,1,specs=[[{"type": "scene"}],[{"type": "xy"}]],row_heights=[0.6,0.4])
for cluster in cleaned_clusters:
    fig.add_trace(
        go.Scatter3d(
            x=cluster.data[:, 2], 
            y=cluster.data[:, 0], 
            z=cluster.data[:, 1], 
            mode="markers",
            legendgroup="clusters",
            marker= {
                "size": 3,
                "color": get_color(cluster.label)
            }, 
            name=f"Cluster {cluster.label}"
        ),
        row=1,
        col=1
    )
    fig.add_trace(
        go.Scatter(
            x=np.linalg.norm(cluster.data[:, :3], axis=1), 
            y=cluster.data[:, 3],
            legendgroup="clusters",
            mode="markers",
            marker= {
                "color": get_color(cluster.label)
            },
            showlegend=False,
            name=f"Cluster {cluster.label}"
        ),
        row=2,
        col=1
    )
fig.update_layout(
    xaxis_title="Distance (mm)",
    yaxis_title="Integrated Charge",
    scene = {
        "xaxis_range": [0.0, 1000.0],
        "yaxis_range": [-300.0, 300.0],
        "zaxis_range": [-300.0, 300.0],
        "xaxis_title": "Z (mm)",
        "yaxis_title": "X (mm)",
        "zaxis_title": "Y (mm)",
        "aspectratio": {
            "x": 3.3,
            "y": 1.0,
            "z": 1.0
        }
    },
    width=1300,
    height=1000,
)

### Conclusion

We've now generated clusters from our point clouds and tested the parameters, so you can take these parameters and run the full phase 2 analysis. The next step is to perform basic physics analysis and estimate some parameters (phase 3).