# KMeans Clustering

This application lets users cluster data stored on [Geoscience ANALYST](https://mirageoscience.com/mining-industry-software/geoscience-analyst) objects using the [Scikit-Learn.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html?highlight=kmeans#sklearn.cluster.KMeans)
clustering algorithm. Leveraging [Plotly](https://plotly.com/) visualization tools, users are able to assess the clustering
results using histogram, box, scatter, inertia and cross-correlation plots.

New user? Visit the [Getting Started](../installation.rst) page.

<img src="./images/cluster/cluster_app.png">

## Project Selection

Select and connect to an existing **geoh5** or **ui.json** project file containing data. 

<img align="left" src="./images/dash_upload.png">

See the [Project Panel](base_application.ipynb#Project-Panel) page for more details.

## Object and Data Selection

The `Object` dropdown contains a list of objects available from the target `geoh5` project. Only the data selected in the `Data subset` dropdown are used in the clustering routine.

<img align="left" src="./images/cluster/cluster_object.png">

<img align="left" src="./images/cluster/cluster_data.png">

## Downsampling

Reduce the number of points displayed on the clustering plots to a percentage of the original count, with a maximum of 5000 points.

<img align="left" src="./images/scatter/scatter_downsampling.png">
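The downsampling rule above can be sketched in a few lines. This is only an illustration, assuming uniform random sampling; the application's actual selection method is not described here, and `downsample_indices` is a hypothetical helper name.

```python
import random


def downsample_indices(n_points, percent, max_points=5000, seed=0):
    """Return sorted indices of a random subset of ``n_points`` samples.

    The subset size is ``percent`` % of the input, capped at ``max_points``.
    Uniform random sampling is an assumption made for this sketch.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    # Requested size, at least one point, never more than the cap or the input.
    n_keep = min(max_points, max(1, round(n_points * percent / 100)))
    n_keep = min(n_keep, n_points)
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), n_keep))


indices = downsample_indices(n_points=20_000, percent=50)
print(len(indices))  # 50 % of 20000 is 10000, capped at 5000
```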

## Clustering

Select the number of clusters (groups) desired.

<img align="left" src="./images/cluster/cluster_nclusters.png">

By default, the application runs KMeans for 2, 4, 8, 16 and 32 groups in order to draw a meaningful [Inertia Curve](#inertia_curve).
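As a rough sketch of how such a curve could be built with scikit-learn, the loop below fits one model per cluster count and records its inertia. The synthetic `values` array, `n_init=10` and `random_state=0` are assumptions chosen for the example, not the application's settings.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the selected data fields: 500 samples, 3 fields.
values = rng.normal(size=(500, 3))

cluster_counts = [2, 4, 8, 16, 32]
inertia = {}
for n_clusters in cluster_counts:
    # Fit one KMeans model per cluster count and keep its inertia.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(values)
    inertia[n_clusters] = model.inertia_

for n_clusters, value in inertia.items():
    print(f"{n_clusters:>2} clusters: inertia = {value:.1f}")
```

Plotting `inertia` against the cluster count gives the curve shown in the Inertia section.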

## Clusters Color

Check the `Select cluster color` checkbox to display a color picker that can be used to assign a specific color to a given cluster group.

<img align="left" src="./images/cluster/cluster_colorselect.png">

## Analytics

Plotting options to analyze the selected data and KMeans clusters. The default displayed plots are the Crossplot and Inertia Plot. Use the `Show Analytics & Normalization` checkbox to display the Histogram, Boxplot, Statistics, and Confusion Matrix.

### Crossplot

See the [Scatter Plot](#Scatter-Plot) documentation for details.

In [1]:
import plotly.io as io
import plotly.graph_objects as go

go.FigureWidget(io.read_json("./images/cluster/cluster_scatter.json"))


By default, the color values displayed correspond to the cluster groups.

### Statistics

Display statistics for the chosen data channels using [pandas.DataFrame.describe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html).

<img align="left" src="./images/cluster/cluster_stats.png">
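For reference, a minimal example of the summary that `pandas.DataFrame.describe` produces. The field name `CaO` and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical data channels; the values are invented for this example.
data = pd.DataFrame(
    {
        "Al2O3": [10.0, 12.0, 14.0, 16.0],
        "CaO": [1.0, 2.0, 3.0, 6.0],
    }
)

# One column per data channel; rows are count, mean, std, min,
# the 25/50/75 % percentiles and max.
stats = data.describe()
print(stats)
```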

### Confusion Matrix

Display the confusion matrix for the chosen data channels.

In [2]:
import plotly.io as io
import plotly.graph_objects as go

go.FigureWidget(io.read_json("./images/cluster/cluster_matrix.json"))


### Histograms

Display histograms for each data field.

In [3]:
import plotly.io as io
import plotly.graph_objects as go

go.FigureWidget(io.read_json("./images/cluster/cluster_hist.json"))


By default, all fields are normalized between [0, 1].
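A minimal sketch of min-max normalization to [0, 1], assuming each field is rescaled independently using its own minimum and maximum. `normalize` is a hypothetical helper, not a function of the application.

```python
import numpy as np


def normalize(values):
    """Rescale a 1D array linearly so its minimum maps to 0 and its maximum to 1."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = np.nanmin(values), np.nanmax(values)
    if vmax == vmin:
        # A constant field carries no contrast; map it to zeros.
        return np.zeros_like(values)
    return (values - vmin) / (vmax - vmin)


print(normalize([2.0, 4.0, 6.0, 10.0]).tolist())  # [0.0, 0.25, 0.5, 1.0]
```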

#### Scale

Option to increase the weight of a specific data field.

<img align="left" src="./images/cluster/cluster_scale.png">

#### Upper Bound

Upper bound (maximum) value used for the KMeans clustering.

<img align="left" src="./images/cluster/cluster_upper.png">

#### Lower Bound

Lower bound (minimum) value used for the KMeans clustering.

<img align="left" src="./images/cluster/cluster_lower.png">
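One plausible way the scale and bounds could combine with the normalization is sketched below. It assumes values outside the bounds are clipped, the result is normalized to [0, 1], and the scale is applied as a multiplier; the application may order or implement these steps differently. Because KMeans relies on Euclidean distances, a field with a larger range contributes more to the distance, which is why a multiplier acts as a weight.

```python
import numpy as np


def prepare_field(values, lower=None, upper=None, scale=1.0):
    """Clip to [lower, upper], normalize to [0, 1], then multiply by ``scale``.

    Hypothetical helper: the clipping and the order of operations are
    assumptions made for this sketch.
    """
    values = np.asarray(values, dtype=float)
    if lower is not None:
        values = np.maximum(values, lower)
    if upper is not None:
        values = np.minimum(values, upper)
    span = values.max() - values.min()
    normalized = (values - values.min()) / span if span > 0 else np.zeros_like(values)
    return scale * normalized


result = prepare_field([0.0, 5.0, 10.0, 100.0], upper=20.0, scale=2.0)
print(result.tolist())  # [0.0, 0.5, 1.0, 2.0]
```

In this example the outlier at 100 is clipped to 20, so it no longer compresses the other values toward zero.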

### Inertia

Display the inertia of the clustering, that is, the sum of squared distances between each sample and the center of its cluster group. The optimal number of clusters is generally taken to be at the point of maximum curvature (the "elbow") of the curve.
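The definition can be checked by hand on a tiny example. The points, centers and labels below are made up, and `compute_inertia` is a hypothetical helper written for this illustration.

```python
def compute_inertia(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    total = 0.0
    for point, label in zip(points, labels):
        center = centers[label]
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total


points = [(0.0, 0.0), (0.0, 2.0), (4.0, 4.0), (6.0, 4.0)]
centers = [(0.0, 1.0), (5.0, 4.0)]
labels = [0, 0, 1, 1]  # index of the center assigned to each point

# Every point lies at distance 1 from its center, so the inertia is 4.
print(compute_inertia(points, centers, labels))  # 4.0
```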

In [4]:
import plotly.io as io
import plotly.graph_objects as go

go.FigureWidget(io.read_json("./images/cluster/cluster_inertia.json"))


### Boxplot

Display boxplots describing the range of values within each cluster for a chosen data field.

In [5]:
import plotly.io as io
import plotly.graph_objects as go

go.FigureWidget(io.read_json("./images/cluster/cluster_boxplot.json"))


## Output panel

Clusters can be exported directly to the target object by clicking on the `Export` button. This can yield two possible outcomes:

- If no data field with the same name exists on the target object, a new data field is created.
- If a data field with the same name is found on the target object, values are replaced. This allows users to quickly experiment with different numbers of clusters without having to delete previous trials.

<img align="left" src="./images/cluster/cluster_output.png">
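The create-or-replace behavior on export can be sketched with a plain dictionary standing in for the target object's data fields: a new name creates a field, an existing name overwrites its values. `export_clusters` and the field names are hypothetical; the application itself writes to the `geoh5` object.

```python
def export_clusters(data_fields, name, values):
    """Store cluster labels under ``name`` and report what happened."""
    outcome = "replaced" if name in data_fields else "created"
    data_fields[name] = list(values)
    return outcome


# Dictionary standing in for the data fields of the target object.
fields = {"Al2O3": [10.0, 12.0, 14.0]}

print(export_clusters(fields, "kmeans", [0, 1, 1]))  # created
print(export_clusters(fields, "kmeans", [1, 0, 2]))  # replaced
```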

### (Optional) Geoscience ANALYST Pro - Live link

Activate the Live Link between Geoscience ANALYST and the application. The `Output path` lets users select the target monitoring folder used by Geoscience ANALYST.

  <img align="left" width="50%" src="./images/monitoring_folder.png"> 

Every time the `Export` button is clicked, the application writes the result to the target `geoh5` and also writes a lightweight temporary `geoh5`, containing only the result, to the monitoring folder. If the Live Link is activated in Geoscience ANALYST, the result is automatically processed and displayed in the workspace. This low-level interaction allows users to directly see the outcome of a computation in 3D.

Need help? Contact us at support@mirageoscience.com