## Hot spot and outlier analysis

Hot spot analysis and outlier analysis can identify statistically meaningful clusters in your data, which helps reduce the subjectivity of interpreting patterns on a map. In this tutorial, we will cover how these tools use statistics to detect spatial patterns in your data.

### Hot Spot Analysis - identifying statistical clusters

The Hot Spot Analysis tool uses the [Getis-Ord Gi*](https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/h-how-hot-spot-analysis-getis-ord-gi-spatial-stati.htm) statistic to identify statistically significant clusters of high and low values. In other words, it finds places where values that are very different from the average (either very high or very low) cluster together spatially in a nonrandom way.

Let's use this set of polygons to illustrate how a hot spot analysis works. First, let's get clear on some terms.

### Feature

![feature.PNG](attachment:5ff95486-8955-4ca2-aa94-788924ffe435.PNG)

Each polygon in this dataset is called a `feature`. Because a hot spot analysis looks for clusters of high and low values, each feature must have an associated numeric value. The value can be a count, a rate, an average, or any other numeric measure. It can be an attribute already stored with your features, or you can aggregate incident data (for example, incident points) into the features to produce a count.
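
Aggregating incidents into count values can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and does not show how ArcGIS performs aggregation. It assumes the features are square grid cells whose lower-left corner sits at (0, 0), and that the incidents are (x, y) points in the same coordinate system. `aggregate_to_grid` is a hypothetical helper name.

```python
def aggregate_to_grid(points, cell_size, n_cols, n_rows):
    """Count incident points per square grid cell, keyed by (column, row).

    Assumes (hypothetically) a grid whose lower-left corner sits at (0, 0).
    Every cell starts at zero so that empty cells still get a value.
    """
    counts = {(col, row): 0 for col in range(n_cols) for row in range(n_rows)}
    for x, y in points:
        # Floor division finds the column and row that contain the point.
        cell = (int(x // cell_size), int(y // cell_size))
        if cell in counts:  # skip points that fall outside the grid extent
            counts[cell] += 1
    return counts


incidents = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (2.2, 2.9)]
counts = aggregate_to_grid(incidents, cell_size=1.0, n_cols=3, n_rows=3)
print(counts[(0, 0)], counts[(1, 0)], counts[(2, 2)], counts[(1, 1)])  # → 2 1 1 0
```

Keeping the zero-count cells matters. A cell with no incidents is still a feature with a value (0), and it is still part of the study area.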

### Neighborhood

![neighborhood.PNG](attachment:22214ed4-1335-4dcf-89a9-85ae07887d71.PNG)

Next, every feature has what we call a `neighborhood`, which is made up of its surrounding features and includes the feature itself. 

### Study Area

![study_area.PNG](attachment:85bf8114-448e-4cfd-8c47-e9973f963a94.PNG)

Finally, the study area refers to all of the features in our dataset combined.

When we run the hot spot analysis, we're asking,

>"Is this neighborhood significantly different from the study area?" 

If the neighborhood's values are significantly higher than those of the study area as a whole, the feature at the center of that neighborhood is marked as a hot spot. If they are significantly lower, it is marked as a cold spot. Each feature's neighborhood is compared to the study area in this way, and the feature is assigned a z-score and p-value indicating whether it belongs to a statistically significant cluster.
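
The neighborhood-versus-study-area comparison can be sketched in plain Python. This is a minimal sketch and not the ArcGIS implementation. It assumes binary weights (a feature is either in the neighborhood or not) and that each feature's neighbors are already known as a list of indices. `getis_ord_gi_star` is a hypothetical helper name. The sketch follows the Gi* z-score formula in the Esri documentation linked above, in which the feature counts as part of its own neighborhood. It does not handle the degenerate cases where all values are equal or a neighborhood covers the entire study area, because the denominator is zero in both.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """Return a Gi* z-score for every feature.

    values: list of numeric values, one per feature.
    neighbors: list of index lists; neighbors[i] holds feature i's neighbors.
    Weights are binary, and feature i is added to its own neighborhood.
    """
    n = len(values)
    mean = sum(values) / n
    # Standard deviation of the whole study area.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        hood = set(neighbors[i]) | {i}
        w_sum = len(hood)     # sum of the 0/1 weights
        w_sq_sum = len(hood)  # sum of squared weights (same for 0/1 weights)
        local_sum = sum(values[j] for j in hood)
        # Observed neighborhood sum minus the sum expected from the study-area mean.
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores


# Six features in a row: three low values followed by three high values.
values = [1, 1, 1, 9, 9, 9]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print([round(z, 2) for z in getis_ord_gi_star(values, neighbors)])
# → [-1.58, -2.24, -0.75, 0.75, 2.24, 1.58]
```

The feature in the middle of the high run gets the largest positive z-score, because its whole neighborhood is high. The feature on the boundary between the runs scores much lower, because its neighborhood mixes low and high values.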

### Hot spot analysis

![hot_and_cold.PNG](attachment:2a88b514-754b-4b93-b336-d2cc82155f86.PNG)

The Hot Spot Analysis tool reports three levels of confidence. We can be 90%, 95%, or 99% confident that a feature belongs to a nonrandom cluster of high values (a hot spot) or to a nonrandom cluster of low values (a cold spot). The colors in the resulting map correspond directly to those confidence levels.
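
Turning a z-score into one of these confidence levels can be sketched as a comparison against the two-tailed critical values of the standard normal distribution (about 1.645, 1.960, and 2.576). `confidence_bin` is a hypothetical helper name. The -3 to +3 numbering is an illustrative choice. This sketch applies no correction for multiple testing.

```python
def confidence_bin(z):
    """Map a Gi* z-score to a bin from -3 (99% cold spot) to +3 (99% hot spot).

    0 means the feature is not statistically significant.
    """
    # Two-tailed critical z-values for 99%, 95%, and 90% confidence.
    for level, critical in ((3, 2.576), (2, 1.960), (1, 1.645)):
        if abs(z) >= critical:
            return level if z > 0 else -level
    return 0


for z in (2.7, 2.0, 1.7, 0.5, -1.8, -3.1):
    print(z, confidence_bin(z))
```

Checking the strictest threshold first means each feature lands in the highest confidence level it qualifies for.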


### Cluster and Outlier Analysis - identifying spatial outliers


The Cluster and Outlier Analysis tool uses the [Anselin Local Moran's I](https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/h-how-cluster-and-outlier-analysis-anselin-local-m.htm#:~:text=Potential%20applications-,The%20Cluster%20and%20Outlier%20Analysis%20(Anselin%20Local%20Moran's%20I)%20tool,poverty%20in%20a%20study%20area%3F) statistic to identify both value clusters and local outliers. Like a hot spot analysis, a cluster and outlier analysis finds places where high and low values cluster together. It also identifies features with values significantly different from their surroundings, such as a low value surrounded by high values or a high value surrounded by low values.

Just like with the hot spot analysis, we have the feature, the neighborhood, and the study area. But in a Cluster and Outlier Analysis, the feature does not belong to its own neighborhood. This allows for two different comparisons. 

> Is the neighborhood significantly different from the study area?\
> Is the feature significantly different from the study area?

If both the feature and the neighborhood have values very different from the study area, the feature is considered significant.
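
The local Moran's I value can be sketched in plain Python. This minimal sketch uses Anselin's formulation with row-standardized binary weights. It assumes each feature has at least one neighbor, listed by index. `local_morans_i` is a hypothetical helper name. Esri's tool may normalize the variance differently, and it also tests each value for significance, which this sketch omits.

```python
def local_morans_i(values, neighbors):
    """Return the local Moran's I value for every feature.

    values: list of numeric values, one per feature.
    neighbors: list of index lists; neighbors[i] holds feature i's neighbors.
    Each feature is assumed to have at least one neighbor.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    m2 = sum(d * d for d in deviations) / n  # variance of the whole study area
    result = []
    for i in range(n):
        # Unlike Gi*, the feature itself is excluded from its neighborhood.
        hood = [j for j in neighbors[i] if j != i]
        # Row-standardized weights: the spatial lag is the mean neighbor deviation.
        lag = sum(deviations[j] for j in hood) / len(hood)
        result.append(deviations[i] / m2 * lag)
    return result


# A single high value in the middle of a row of low values.
values = [1, 1, 9, 1, 1]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
print([round(i, 3) for i in local_morans_i(values, neighbors)])
# → [0.25, -0.375, -1.0, -0.375, 0.25]
```

A positive value means the feature resembles its neighbors (a cluster). A negative value means it differs from them (a potential outlier). In this example, the isolated high value has the most negative result.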

### The Significant Four

There are four possible types of significant result.

![significant_4.PNG](attachment:9c578196-4713-481d-9148-cf4d7d860eb0.PNG)

If the feature and the neighborhood have similar values, the feature is considered to belong to a cluster. So a feature with a high value surrounded by a neighborhood with a high value is marked as a `high-high` cluster, while a feature with a low value surrounded by a neighborhood with a low value is marked as a `low-low` cluster.

If the feature and the neighborhood have dissimilar values, the feature is considered an `outlier`. Outliers are features with values very different from their surroundings. So a feature with a high value surrounded by a neighborhood with a low value is marked as a `high-low outlier`, and a feature with a low value surrounded by a neighborhood with a high value is marked as a `low-high outlier`.
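
The four labels can be assigned by comparing both the feature and its neighborhood mean to the study-area mean. This is a minimal sketch, and `classify_features` is a hypothetical helper name. The `significant` flags are a hypothetical input that would come from a significance test this sketch does not perform. In the codes, "HH" stands for `high-high`, "LL" for `low-low`, "HL" for `high-low outlier`, and "LH" for `low-high outlier`.

```python
def classify_features(values, neighbors, significant):
    """Label each feature 'HH', 'LL', 'HL', or 'LH', or None if not significant.

    significant: list of booleans from a significance test that this sketch
    does not perform (a hypothetical input).
    """
    n = len(values)
    mean = sum(values) / n
    labels = []
    for i in range(n):
        if not significant[i]:
            labels.append(None)
            continue
        # The feature is excluded from its own neighborhood, as in Local Moran's I.
        hood = [j for j in neighbors[i] if j != i]
        hood_mean = sum(values[j] for j in hood) / len(hood)
        # First letter describes the feature, second letter its neighborhood.
        # Values equal to the study-area mean are treated as low.
        feature_code = "H" if values[i] > mean else "L"
        hood_code = "H" if hood_mean > mean else "L"
        labels.append(feature_code + hood_code)
    return labels


values = [1, 1, 9, 1, 1]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(classify_features(values, neighbors, [True] * 5))
# → ['LL', 'LH', 'HL', 'LH', 'LL']
```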

The resulting map shows the clusters in the lighter shades of pink and blue and the outliers in the brighter shades of red and blue. Both of these statistical clustering methods help us identify meaningful patterns in our spatial data by analyzing features in the context of their immediate surroundings.

### How are neighborhood sizes determined?

![neighborhood_size.PNG](attachment:ea8b43ea-3f5e-45ab-a6a1-c5d7c0984ea7.PNG)

As Tobler's first law of geography states,

>“Everything is related to everything else, but near things are more related than distant things.”

So an important part of these statistical cluster analysis methods is defining the distance at which we expect the phenomena we are studying to be related. This step is called conceptualizing spatial relationships: defining what it means to be a neighbor. There are many different ways to do this.

#### Fixed distance band

![fixed_distance_band.PNG](attachment:aae8932d-b9a5-4ada-b2f5-8d561434438e.PNG)

One of the more common methods is to use a `fixed distance band`. With a fixed distance band, you specify a distance of spatial influence, and all features within that distance are considered spatially related, or neighbors. This is the method used by the Optimized Hot Spot Analysis and Optimized Outlier Analysis tools, which examine your data to provide a default distance at which clustering is most pronounced.
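
A fixed distance band can be sketched as a brute-force search that compares every pair of features. This sketch assumes each feature is represented by a centroid point and that Euclidean distance is used. `fixed_distance_neighbors` is a hypothetical helper name, and the coordinates are made up for illustration.

```python
import math


def fixed_distance_neighbors(points, threshold):
    """Return, for each point, the indices of the other points within `threshold`."""
    neighbors = []
    for i, (xi, yi) in enumerate(points):
        hood = [j for j, (xj, yj) in enumerate(points)
                if j != i and math.hypot(xi - xj, yi - yj) <= threshold]
        neighbors.append(hood)
    return neighbors


centroids = [(0, 0), (1, 0), (3, 0), (10, 0)]
print(fixed_distance_neighbors(centroids, threshold=2))
# → [[1], [0, 2], [1], []]
```

The last feature lies more than 2 units from every other feature, so it ends up with no neighbors at all. This is why the choice of band distance deserves care.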

#### Contiguity

![contiguity.PNG](attachment:f116a3b1-13a8-4359-95c3-fd9f900d4f5e.PNG)

Another approach is to use `contiguity`. With this method, anything that shares a border with a feature will be considered its neighbor. 
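
Contiguity can be sketched by checking which polygons share an edge. This sketch assumes the polygons are lists of vertices and that adjacent polygons use identical vertices along their common border, which requires topologically clean data. A shared border that is split into different vertex segments would not be detected. The `queen` option, which also counts polygons that touch only at a corner, is an extra variant not described in the text above. `polygon_edges` and `contiguity_neighbors` are hypothetical helper names.

```python
from itertools import combinations


def polygon_edges(polygon):
    """Return a polygon's edges as unordered pairs of vertices."""
    return {frozenset((polygon[k], polygon[(k + 1) % len(polygon)]))
            for k in range(len(polygon))}


def contiguity_neighbors(polygons, queen=False):
    """Return a set of neighbor indices for each polygon.

    By default, two polygons are neighbors when they share an edge.
    With queen=True, sharing a single corner vertex is enough.
    """
    edge_sets = [polygon_edges(p) for p in polygons]
    vertex_sets = [set(p) for p in polygons]
    neighbors = [set() for _ in polygons]
    for i, j in combinations(range(len(polygons)), 2):
        if queen:
            touching = bool(vertex_sets[i] & vertex_sets[j])
        else:
            touching = bool(edge_sets[i] & edge_sets[j])
        if touching:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


# A 2 x 2 block of unit squares: 0 = lower left, 1 = lower right,
# 2 = upper left, 3 = upper right.
squares = [
    [(0, 0), (1, 0), (1, 1), (0, 1)],
    [(1, 0), (2, 0), (2, 1), (1, 1)],
    [(0, 1), (1, 1), (1, 2), (0, 2)],
    [(1, 1), (2, 1), (2, 2), (1, 2)],
]
print([sorted(s) for s in contiguity_neighbors(squares)])
# → [[1, 2], [0, 3], [0, 3], [1, 2]]
print([sorted(s) for s in contiguity_neighbors(squares, queen=True)])
# → [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
```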

#### K-nearest neighbors


![k1.PNG](attachment:446e48cc-5a0a-4dba-8a0f-de34277bc4ec.PNG) ![k2.PNG](attachment:05e1f3e5-bcaf-4083-adcb-230be095466a.PNG) ![k3.PNG](attachment:617c8d9a-00c3-477e-8529-eb982db6b00a.PNG)

The k-nearest neighbors approach allows neighborhood sizes to vary by always including each feature's closest features, no matter how far away they actually are.

In this illustration, we've specified four nearest neighbors, which are much closer to some features than to others. Regardless of distance, those nearest features are still the ones most likely to be related.
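
The k-nearest neighbors approach can be sketched by sorting every other feature by distance and keeping the first k. This sketch assumes centroid points and Euclidean distance, and it breaks distance ties by feature index. `k_nearest_neighbors` is a hypothetical helper name, and the coordinates are made up for illustration.

```python
import math


def k_nearest_neighbors(points, k):
    """Return, for each point, the indices of its k closest other points."""
    neighbors = []
    for i, (xi, yi) in enumerate(points):
        # Sort (distance, index) pairs; ties fall back to the lower index.
        others = sorted((math.hypot(xi - xj, yi - yj), j)
                        for j, (xj, yj) in enumerate(points) if j != i)
        neighbors.append([j for _, j in others[:k]])
    return neighbors


centroids = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(k_nearest_neighbors(centroids, k=2))
# → [[1, 2], [0, 2], [1, 0], [2, 1]]
```

The isolated feature at x = 10 still receives two neighbors, whereas a fixed distance band of 2 units would have left it with none.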

#### Network Spatial Weights

![network_spatial_weights.PNG](attachment:6430b3ea-78af-488d-b57f-2ae1493fd3ac.PNG)

The `Network Spatial Weights` method is similar to a fixed distance band, except that it measures distance along a network, such as roads. For example, you could define neighbors as everything within a 10-minute drive time or a 5-mile driving distance, which is particularly useful for capturing human behavior.
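
Measuring distance along a network can be sketched with Dijkstra's shortest-path algorithm, stopping at a travel-cost cutoff. This is a minimal sketch and not how ArcGIS builds network spatial weights. The road network is a hypothetical dictionary of nodes with travel times in minutes. `network_neighbors` is a hypothetical helper name.

```python
import heapq
import math


def network_neighbors(graph, origin, max_cost):
    """Return {node: cost} for every node reachable from origin within max_cost.

    graph: dict mapping node -> list of (neighbor_node, edge_cost) pairs,
    where edge_cost is a travel time or distance along the network.
    """
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, math.inf):
            continue  # a cheaper route to this node was already processed
        for nxt, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            # Keep only routes that stay within the cutoff and beat the best so far.
            if new_cost <= max_cost and new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    del best[origin]  # a feature is not its own network neighbor here
    return best


# Hypothetical road network with travel times in minutes (two-way roads).
roads = {
    "A": [("B", 4), ("C", 12)],
    "B": [("A", 4), ("C", 5)],
    "C": [("A", 12), ("B", 5), ("D", 3)],
    "D": [("C", 3)],
}
print(network_neighbors(roads, "A", max_cost=10))
# → {'B': 4.0, 'C': 9.0}
```

C counts as a 10-minute neighbor of A because the route through B takes 9 minutes, even though the direct road takes 12. D is 12 minutes away by the fastest route, so it falls outside the cutoff.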

No matter which conceptualization of spatial relationships you choose, it's extremely important that you consider both the subject matter and your analysis question to thoughtfully define what it means to be a neighbor.
