# KMeans Clustering

This application lets users cluster data stored on [Geoscience ANALYST](https://mirageoscience.com/mining-industry-software/geoscience-analyst) objects using the [Scikit-Learn.KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html?highlight=kmeans#sklearn.cluster.KMeans)
clustering algorithm. Leveraging [Plotly](https://plotly.com/) visualization tools, users are able to assess the clustering
results using histogram, box, scatter, inertia and cross-correlation plots.


  <img align="right" width="50%" src="./images/clustering_app.gif">


New user? Visit the [Getting Started](../installation.rst) page.

## Application
The following sections provide details on the different parameters controlling the application. Interactive widgets shown below are for demonstration purposes only.

In [1]:
from geoapps.processing import Clustering

app = Clustering(h5file=r"../../../assets/FlinFlon.geoh5")
app.main

VBox(children=(VBox(children=(Label(value='Workspace', style=DescriptionStyle(description_width='initial')), H…

## Project Selection

Select and connect to an existing **geoh5** project file containing data. 

In [2]:
app.project_panel

VBox(children=(Label(value='Workspace', style=DescriptionStyle(description_width='initial')), HBox(children=(F…

See the [Project Panel](base_application.ipynb#Project-Panel) page for more details.

## Object and Data Selection

Select an object from the list of objects available in the target `geoh5` project, then choose its data channels. Only the selected data channels are used in the clustering routine.

In [3]:
app.data_panel

VBox(children=(Dropdown(description='Object:', index=70, options=(['', None], ['fault_splay1', UUID('0febc8ad-…

## Clustering

Select the number of clusters (groups) desired.

In [4]:
app.n_clusters

IntSlider(value=8, continuous_update=False, description='Number of clusters', min=2, style=SliderStyle(descrip…

By default, the application runs KMeans for 2, 4, 8, 16 and 32 groups in order to draw a meaningful [Inertia Curve](#Inertia).
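The sweep over group counts can be reproduced outside the application with scikit-learn. The following is a minimal sketch, not the application's own code: the random `data` array is a stand-in for the selected data channels, and the `n_init` and `random_state` values are assumptions chosen for repeatability.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the selected data channels (200 samples, 3 channels)
data = rng.random((200, 3))

# Fit KMeans for each group count and record the resulting inertia
inertia = {}
for k in (2, 4, 8, 16, 32):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    inertia[k] = model.inertia_

for k, value in inertia.items():
    print(f"{k:>2} clusters: inertia = {value:.3f}")
```

Plotting `inertia` against the number of clusters gives the kind of curve shown under the Inertia option.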

### Refresh

Re-run the clustering after changing the list of input data or [Population Downsampling](#Population-Downsampling).

In [5]:
app.refresh_clusters



## Clusters Color

Assign a specific color to a given cluster group.

In [6]:
app.clusters_panel

VBox(children=(Dropdown(description='Cluster', options=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …

## Analytics

Plotting options to analyze the selected data and KMeans clusters.

In [7]:
app.plotting_options

ToggleButtons(description='Analytics', options=('Crossplot', 'Statistics', 'Confusion Matrix', 'Histogram', 'B…

### Crossplot

See the [Scatter Plot](#Scatter-Plot) documentation for details.

In [8]:
app.figure

FigureWidget({
    'data': [{'marker': {'color': array([0.57142857, 1.        , 0.71428571, ..., 0.57142857, 0…

By default, the color values displayed correspond to the cluster groups.

### Statistics

Display statistics for the chosen data channels using [pandas.DataFrame.describe](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html).

In [9]:
app.dataframe.describe(percentiles=None, include=None, exclude=None)

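|       | Al2O3     | CaO       | V          | MgO      | Ba         |
|-------|-----------|-----------|------------|----------|------------|
| count | 4310.0    | 4310.0    | 4310.0     | 4310.0   | 4310.0     |
| mean  | 14.007694 | 6.860206  | 170.181636 | 4.566049 | 295.058564 |
| std   | 2.611845  | 3.771473  | 141.316527 | 3.651026 | 304.941323 |
| min   | -17.0     | -17.0     | -17.0      | -17.0    | -20.0      |
| 25%   | 12.7      | 4.09      | 48.0       | 2.02     | 99.0       |
| 50%   | 14.21     | 7.01      | 133.0      | 4.31     | 193.0      |
| 75%   | 15.41     | 9.02      | 284.75     | 5.69     | 385.0      |
| max   | 25.5      | 29.799999 | 640.0      | 24.77    | 3200.0     |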


### Confusion Matrix

Display the confusion matrix for the chosen data channels.

In [10]:
app.plotting_options.value = "Confusion Matrix" # Emulate button click
app.heatmap_fig

FigureWidget({
    'data': [{'colorscale': [[0.0, '#440154'], [0.1111111111111111, '#482878'],
               …
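The introduction lists cross-correlation plots among the outputs, so this matrix is presumably a pairwise correlation between the chosen channels. That reading is an assumption rather than something this page states. Under that assumption, a comparable matrix can be sketched with `pandas.DataFrame.corr`; the values below are illustrative only.

```python
import pandas as pd

# Illustrative values only, not taken from the project file
df = pd.DataFrame(
    {
        "Al2O3": [12.7, 14.21, 15.41, 25.5],
        "CaO": [4.09, 7.01, 9.02, 29.8],
        "V": [284.75, 133.0, 48.0, 17.0],
    }
)

# Pairwise Pearson correlation between channels (symmetric, unit diagonal)
corr = df.corr()
print(corr.round(2))
```

Strongly correlated channels carry overlapping information, which can be a reason to drop one of them before clustering.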

### Histograms

Display histograms for each data field.

In [11]:
app.plotting_options.value = "Histogram" # Emulate button click
app.histo_plots["Al2O3"]

FigureWidget({
    'data': [{'histnorm': 'percent',
              'name': 'Al2O3',
              'type': 'hist…

By default, all fields are normalized between [0, 1].

#### Scale

Option to increase the weight of a specific data field.

In [12]:
app.scalings["Al2O3"]

IntSlider(value=1, continuous_update=False, description='Scale', max=10, min=1)
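To see why the scale changes a field's weight, consider the sketch below. It assumes that the scale is a simple multiplier applied after min-max normalization to [0, 1]; the function name `normalize_and_scale` is hypothetical and the application's internals may differ.

```python
import numpy as np


def normalize_and_scale(values, scale=1):
    """Min-max normalize to [0, 1], then multiply by an integer scale (assumed behavior)."""
    values = np.asarray(values, dtype=float)
    vmin, vmax = np.nanmin(values), np.nanmax(values)
    span = vmax - vmin
    if span == 0:
        # Constant field: nothing to normalize
        return np.zeros_like(values)
    return scale * (values - vmin) / span


al2o3 = normalize_and_scale([12.7, 14.21, 15.41, 25.5], scale=1)
cao = normalize_and_scale([4.09, 7.01, 9.02, 29.8], scale=3)
print(al2o3.max(), cao.max())
```

Because KMeans minimizes squared Euclidean distances, a field multiplied by 3 contributes 9 times more to the distance than an unscaled field with the same spread.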

#### Upper Bound

Upper bound (maximum) value used for the KMeans clustering.

In [13]:
app.upper_bounds["Al2O3"]

FloatText(value=25.5, description='Upper bound')

#### Lower Bound

Lower bound (minimum) value used for the KMeans clustering.

In [14]:
app.lower_bounds["Al2O3"]

FloatText(value=-17.0, description='Lower bound')
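Bounds are useful for excluding implausible values: a negative value such as the default lower bound of -17.0 cannot be a real Al2O3 concentration. The sketch below shows one way bounds could be applied, by masking out-of-range samples with NaN. Whether the application masks or clips such values is an assumption here, and `apply_bounds` is a hypothetical helper.

```python
import numpy as np


def apply_bounds(values, lower, upper):
    """Replace values outside [lower, upper] with NaN (assumed masking behavior)."""
    values = np.asarray(values, dtype=float)
    inside = (values >= lower) & (values <= upper)
    return np.where(inside, values, np.nan)


al2o3 = apply_bounds([-17.0, 12.7, 14.21, 25.5], lower=0.0, upper=20.0)
print(al2o3)
```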

### Inertia

Display the cluster inertia, i.e. the sum of squared distances between each sample
and the center of its cluster group. The optimal number of clusters is
generally thought to be at the point of maximum curvature (the "elbow") of the inertia curve.

In [15]:
app.plotting_options.value = "Inertia" # Emulate button click
app.inertia_plot

FigureWidget({
    'data': [{'mode': 'lines',
              'type': 'scatter',
              'uid': 'd7f7e25c-…
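The definition of inertia can be checked against scikit-learn's `inertia_` attribute. This is a minimal sketch on synthetic data; the array `data` and the choice of 4 clusters are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the selected data channels
data = rng.random((200, 3))

model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(data)

# Center assigned to each sample, then the sum of squared distances to it
centers = model.cluster_centers_[model.labels_]
manual_inertia = float(np.sum((data - centers) ** 2))

print(manual_inertia, model.inertia_)
```

The two printed values should agree to within floating-point precision.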

### Boxplot

Display boxplots describing the range of values within each cluster for a chosen data field.

In [16]:
app.plotting_options.value = "Boxplot" # Emulate button click
app.box_plots["Al2O3"]

FigureWidget({
    'data': [{'fillcolor': '#000000',
              'line': {'color': '#000000'},
             …
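The numbers behind a boxplot (quartiles and extremes per cluster) can also be tabulated with pandas. The sketch below uses made-up values and a hypothetical `cluster` column holding the group labels.

```python
import pandas as pd

# Made-up values; "cluster" is a hypothetical column of KMeans labels
df = pd.DataFrame(
    {
        "Al2O3": [12.0, 13.0, 14.0, 15.0, 16.0, 20.0],
        "cluster": [0, 0, 0, 1, 1, 1],
    }
)

# One row per cluster: count, mean, std, min, quartiles and max
summary = df.groupby("cluster")["Al2O3"].describe()
print(summary[["min", "25%", "50%", "75%", "max"]])
```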

## Output Panel

Clusters can be exported directly to the target object by clicking the export button. This can yield two possible outcomes:

- If no cluster data with the same name exists on the object, a new data field is created.
- If a data field with the same name is found on the target object, its values are replaced. This allows users to quickly experiment with different numbers of clusters without having to delete previous trials.

In [17]:
app.output_panel

VBox(children=(VBox(children=(Button(button_style='danger', description='Export', icon='check', style=ButtonSt…

### (Optional) GA Pro - Live link
See the [Output Panel](base_application.ipynb#Output-Panel) section of the base applications page.

In [18]:
app.channels_plot_options.value = "V"
app.box_plots["V"].write_image("images/cluster_thumbnail.png")
app.plotting_options.value = "Crossplot" # Emulate button click

Need help? Contact us at support@mirageoscience.com