## Creating Feature Class & Running Multivariate Clustering

This notebook is a walk-through of the following steps: 
- Resample bathymetry layer
- Convert one of the BTM output rasters to points
- Create feature class containing all layers 
- Aggregate points to hexagons for better visualization
- Run Multivariate Clustering 

### <u>Aggregate bathymetry data to coarser resolution</u>

The GEBCO bathymetry data has a resolution of ~450 m, while the Bio-ORACLE and POC flux layers are much coarser (1/12-degree, or ~9 km). To bring the layers to comparable cell sizes, we down-sample one of the bathymetry-derived layers to ~5 km resolution (450 m × 11 ≈ 4.95 km) using the **Aggregate** Spatial Analyst tool. Use the following settings: Cell factor = 11, Aggregation technique = Mean. Any of the bathymetry-derived layers can be used here (bathymetry, BBPI, FBPI, or slope). <br>
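To make the Cell factor and Mean settings concrete, here is a minimal pure-Python sketch of block-mean aggregation on a toy grid. It is an illustration of the idea, not the **Aggregate** tool itself: the function name `aggregate_mean`, the use of `None` for NoData, and the way edge blocks and NoData cells are handled are all assumptions made for this example.

```python
from statistics import mean


def aggregate_mean(grid, cell_factor):
    """Average non-overlapping cell_factor x cell_factor blocks of a 2-D grid.

    Assumptions for this sketch: None marks NoData and is skipped, a block
    with no valid cells yields None, and partial blocks at the right/bottom
    edges are averaged over whatever cells they contain.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, n_rows, cell_factor):
        out_row = []
        for c0 in range(0, n_cols, cell_factor):
            block = [
                grid[r][c]
                for r in range(r0, min(r0 + cell_factor, n_rows))
                for c in range(c0, min(c0 + cell_factor, n_cols))
                if grid[r][c] is not None
            ]
            out_row.append(mean(block) if block else None)
        out.append(out_row)
    return out


# Toy 4x4 depth grid (made-up values, metres), aggregated with a cell factor of 2.
depths = [
    [-10, -20, -30, -40],
    [-10, -20, -30, -40],
    [-50, -50, None, -70],
    [-50, -50, -60, -80],
]
coarse = aggregate_mean(depths, 2)

# Output cell size = input cell size x cell factor (values from the text above).
output_cell_size_m = 450 * 11
```

With the settings from the text, each output cell summarizes an 11 × 11 block of ~450 m cells, which is where the ~5 km figure comes from.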

### <u>Convert Raster to Points, create bins, and create Feature Class</u>

1. Convert the raster aggregated in the previous step to points using the **Raster to Point** tool. <br>
After this step is done, you can right-click the layer in the Contents pane and select "Attribute Table" to inspect the table. <br>
The next step is to add all other layers (BTM outputs and environmental layers) to the same feature class.

2. Optional: convert the points to hexagons using the **Aggregate Points** tool.

3. Use the **Extract Multi Values to Points** tool to add the other layers to the feature class generated in the previous steps. Select the input rasters to include in the feature class: all BTM output layers, the Bio-ORACLE layers, and the benthic POC flux model layer.

<img src="images/table.png" style="width:600px; height:auto;" />
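Conceptually, step 3 looks up, for every point, the value of each raster at that location and stores it as a new attribute. The sketch below shows that lookup in plain Python. It assumes all rasters share one grid (same upper-left corner and cell size) and takes the value of the cell containing the point with no interpolation; the function names, field names, and values are hypothetical and are not taken from the tool.

```python
import math


def sample_raster(raster, x_min, y_max, cell_size, x, y):
    """Return the value of the cell containing (x, y), or None if outside.

    Rows are ordered north to south; (x_min, y_max) is the upper-left corner.
    """
    col = math.floor((x - x_min) / cell_size)
    row = math.floor((y_max - y) / cell_size)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


def extract_multi_values(points, rasters, x_min, y_max, cell_size):
    """Return a copy of each point record with one extra field per raster."""
    table = []
    for pt in points:
        record = dict(pt)
        for name, raster in rasters.items():
            record[name] = sample_raster(
                raster, x_min, y_max, cell_size, pt["x"], pt["y"]
            )
        table.append(record)
    return table


# Two made-up 2x2 rasters on a 5 km grid whose upper-left corner is (0, 10000).
rasters = {
    "temperature": [[4.0, 4.5], [5.0, 5.5]],
    "slope": [[1.0, 2.0], [3.0, 4.0]],
}
points = [{"x": 2500, "y": 7500}, {"x": 7500, "y": 2500}]
table = extract_multi_values(points, rasters, x_min=0, y_max=10000, cell_size=5000)
```

Each record in `table` then plays the role of one row in the attribute table: the point coordinates plus one column per input raster.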

### <u>Run Clustering Analysis</u>

Use the **[Multivariate Clustering](https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/multivariate-clustering.htm)** Spatial Statistics tool by doing the following:
- Input Features: select the feature class generated in the previous step
- Analysis Fields: select the fields to be used in the clustering (temperature, salinity, POC flux, BBPI, FBPI, slope)
- Select the k-medoids clustering method, and "Optimized seed locations" as the initialization method
- Enter a name/path for the output table used to evaluate the number of clusters

<img src="images/clustering_settings.png" style="width:400px; height:auto;" />
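For readers unfamiliar with k-medoids, the following is a minimal sketch of the general idea: each cluster is represented by one of its own members (the medoid), and points are assigned to the nearest medoid. This is not the tool's implementation. In particular, the farthest-first seeding used here is only a simple stand-in and is an assumption of this example, not the "Optimized seed locations" method; the function name and toy data are hypothetical.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Cluster points by alternating assignment and medoid-update steps.

    Returns (medoid_indices, labels), where labels[i] is the cluster of points[i].
    """
    # Seeding (illustrative stand-in): start from the first point, then
    # repeatedly add the point farthest from the medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        candidates = [i for i in range(len(points)) if i not in medoids]
        farthest = max(
            candidates,
            key=lambda i: min(math.dist(points[i], points[m]) for m in medoids),
        )
        medoids.append(farthest)

    def assign(current):
        # Label each point with the position of its nearest medoid.
        return [
            min(range(k), key=lambda j: math.dist(p, points[current[j]]))
            for p in points
        ]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if a cluster emptied out
                new_medoids.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(
                min(
                    members,
                    key=lambda c: sum(math.dist(points[c], points[i]) for i in members),
                )
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Two well-separated toy groups in a made-up two-variable space.
toy_points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
toy_medoids, toy_labels = k_medoids(toy_points, k=2)
```

Because medoids are actual observations, each cluster can be summarized by a real location in the data, which is one common reason to prefer k-medoids over k-means when outliers are a concern.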

Once the clustering has finished running, open the Pseudo-F statistic chart to evaluate how many clusters to use for your analysis. The Pseudo-F statistic is the ratio of between-cluster variance to within-cluster variance, so higher values indicate better-separated clusters. Once you've decided on the number of clusters, re-run the clustering, this time entering that number under "Number of Clusters".

<img src="images/pseudo_f_allparams.png" style="width:700px; height:auto;" />
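The ratio described above can be written out explicitly. The sketch below uses the common Calinski-Harabasz form of the pseudo-F statistic, (SSB / (k - 1)) / (SSW / (n - k)), where SSB and SSW are the between- and within-cluster sums of squares, n is the number of points, and k is the number of clusters. Treat the exact form as an assumption: the text only states that it is a between-to-within variance ratio, and the tool may preprocess the fields differently. The function name and data are hypothetical.

```python
def pseudo_f(values, labels):
    """Calinski-Harabasz pseudo-F for rows of numeric values and cluster labels."""
    n = len(values)
    dims = len(values[0])

    clusters = {}
    for row, label in zip(values, labels):
        clusters.setdefault(label, []).append(row)
    k = len(clusters)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 clusters and more points than clusters")

    grand_mean = [sum(row[d] for row in values) / n for d in range(dims)]

    ssb = 0.0  # between-cluster sum of squares
    ssw = 0.0  # within-cluster sum of squares
    for members in clusters.values():
        centroid = [sum(row[d] for row in members) / len(members) for d in range(dims)]
        ssb += len(members) * sum(
            (centroid[d] - grand_mean[d]) ** 2 for d in range(dims)
        )
        ssw += sum((row[d] - centroid[d]) ** 2 for row in members for d in range(dims))

    if ssw == 0:
        return float("inf")  # every point sits exactly on its cluster centroid
    return (ssb / (k - 1)) / (ssw / (n - k))


# Made-up data: the same four points under a sensible and a poor labelling.
toy_values = [[0, 0], [0, 2], [10, 0], [10, 2]]
f_good = pseudo_f(toy_values, [1, 1, 2, 2])
f_poor = pseudo_f(toy_values, [1, 2, 1, 2])
```

The sensible labelling scores far higher than the poor one, which is the behavior the chart relies on: peaks in the pseudo-F curve point to cluster counts that separate the data well.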

Once the clustering has finished running with your specified number of clusters, box plots are generated to show how the clusters relate to the various input layers. Select Multivariate Clustering Box-Plots in the Contents pane to interact with those figures. <br>
As an example, cluster 7 (yellow line in the plot below) tends to be productive (high POC flux), with low salinity and higher temperature.

<img src="images/boxplot.png" style="width:700px; height:auto;" />
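A quick numeric companion to the box plots is a per-cluster summary of each field. The sketch below computes per-cluster medians (the center line of each box) from a list of records. The field names, including `cluster_id`, and all values are made up for illustration and are not the tool's output.

```python
from statistics import median


def cluster_medians(records, fields, cluster_field="cluster_id"):
    """Return {cluster: {field: median}} for the given numeric fields."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec[cluster_field], []).append(rec)
    return {
        cid: {f: median(r[f] for r in members) for f in fields}
        for cid, members in grouped.items()
    }


# Made-up records for two clusters.
records = [
    {"cluster_id": 7, "poc_flux": 8.0, "salinity": 33.9},
    {"cluster_id": 7, "poc_flux": 10.0, "salinity": 34.0},
    {"cluster_id": 7, "poc_flux": 12.0, "salinity": 34.1},
    {"cluster_id": 3, "poc_flux": 1.0, "salinity": 34.8},
    {"cluster_id": 3, "poc_flux": 2.0, "salinity": 34.9},
    {"cluster_id": 3, "poc_flux": 3.0, "salinity": 35.0},
]
summary = cluster_medians(records, ["poc_flux", "salinity"])
```

Comparing the medians across clusters gives the same kind of reading as the plot: in this toy data, cluster 7 has the higher POC flux and the lower salinity.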

## <u>Visualizing outputs</u>

There are several ways to visualize the outputs of the clustering in a map:
- One option is to use the **Point to Raster** tool to generate a raster layer from the point feature class. In this case, each color corresponds to a cluster ID generated during the clustering step.

<img src="images/output_raster.png" style="width:400px; height:auto;" />

- Another option is to visualize the point-level data in hexagonal bins using the **Aggregate Points** GeoAnalytics Desktop tool.
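The raster option in the list above can be pictured as writing each point's cluster ID into the grid cell that contains it. The sketch below does this in plain Python under simplifying assumptions: a regular grid with a known upper-left corner, a NoData value of -1, and "last point wins" when several points fall in one cell. These choices, the function name, and the data are hypothetical and do not describe the **Point to Raster** tool's options.

```python
import math


def points_to_raster(points, x_min, y_max, cell_size, n_rows, n_cols,
                     field="cluster_id", nodata=-1):
    """Write each point's field value into the grid cell that contains it."""
    grid = [[nodata] * n_cols for _ in range(n_rows)]
    for pt in points:
        col = math.floor((pt["x"] - x_min) / cell_size)
        row = math.floor((y_max - pt["y"]) / cell_size)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = pt[field]  # later points overwrite earlier ones
    return grid


# Three made-up clustered points on a 2x2 grid of 5 km cells.
clustered = [
    {"x": 2500, "y": 7500, "cluster_id": 1},
    {"x": 7500, "y": 7500, "cluster_id": 2},
    {"x": 7500, "y": 2500, "cluster_id": 2},
]
cluster_grid = points_to_raster(
    clustered, x_min=0, y_max=10000, cell_size=5000, n_rows=2, n_cols=2
)
```

Cells without any point keep the NoData value, which is why gaps in the point layer show up as empty cells in the output raster.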