## Section 4 Exercise 1: Detect patterns

## Introduction

Statistical cluster analysis can help you minimize the subjectivity in your maps by identifying meaningful clusters in your data. The Hot Spot Analysis and Outlier Analysis tools use statistics to detect spatial patterns in your data, but each provides slightly different information about these patterns.

The `Hot Spot Analysis` tool uses the **Getis-Ord Gi\*** statistic to identify statistically significant spatial clusters of high values (hot spots) and low values (cold spots).
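To make the statistic concrete, here is a minimal Python sketch of the Gi* z-score for point features. It assumes planar coordinates and binary fixed-distance weights in which each feature counts as its own neighbor. The function name `getis_ord_gi_star` and the toy data are hypothetical. This illustrates the published Gi* formula and is not Esri's implementation.

```python
import math


def getis_ord_gi_star(coords, values, distance_band):
    """Return a Gi* z-score for each feature.

    Sketch assumptions: planar (x, y) coordinates, binary weights
    (1 inside the distance band, 0 outside), and each feature is
    included in its own neighborhood (the "*" in Gi*).
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the analysis field (clamped against rounding error)
    variance = max(sum(v * v for v in values) / n - mean * mean, 0.0)
    s = math.sqrt(variance)
    z_scores = []
    for i in range(n):
        weights = [1.0 if math.dist(coords[i], coords[j]) <= distance_band else 0.0
                   for j in range(n)]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        # Local weighted sum minus the sum expected if values were spread at random
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        # A band covering every feature (or constant values) carries no local signal
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores


# Toy data: ten points on a line, high values on the left, low values on the right
points = [(float(x), 0.0) for x in range(10)]
rates = [10, 10, 10, 10, 10, 1, 1, 1, 1, 1]
z = getis_ord_gi_star(points, rates, distance_band=1.5)
```

Large positive z-scores mark features whose neighborhoods hold unusually high values (hot spots); large negative z-scores mark cold spots.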

The `Outlier Analysis` tool uses the **Anselin Local Moran's I** statistic to identify statistically significant clusters of high and low values and to detect spatial outliers, or features with values that are significantly dissimilar from their neighbors.
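A comparable sketch of the Local Moran's I value for each feature follows. It again assumes planar coordinates and binary fixed-distance weights, but this time the feature itself is excluded from its neighborhood. A positive value means a feature resembles its neighbors (a cluster); a negative value means it differs from them (a potential outlier). The function name and data are hypothetical, and significance testing is left out.

```python
import math


def local_morans_i(coords, values, distance_band):
    """Return the Local Moran's I value for each feature (binary weights, self excluded)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    result = []
    for i in range(n):
        # Variance term computed from every feature except i
        s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
        # Sum of the neighbors' deviations from the mean (unstandardized spatial lag)
        lag = sum(dev[j] for j in range(n)
                  if j != i and math.dist(coords[i], coords[j]) <= distance_band)
        result.append(dev[i] / s2 * lag if s2 > 0 else 0.0)
    return result


# Feature 2 has a low value surrounded by high values
points = [(float(x), 0.0) for x in range(10)]
rates = [10, 10, 1, 10, 10, 1, 1, 1, 1, 1]
local_i = local_morans_i(points, rates, distance_band=1.5)
```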

ArcGIS provides traditional and optimized statistical cluster analysis tools. The optimized tools interrogate your data to provide smart default values, optimizing the analysis workflow. The traditional tools allow you more flexibility in defining the spatial relationships in your data, giving you more control over your analysis. In this exercise, you will use the optimized statistical cluster analysis tools to explore spatial patterns in data.



## Scenario
The Supplemental Nutrition Assistance Program (SNAP) is a federal program in the United States that helps families buy nutritious food to maintain their health and well-being. In this exercise, you will complete a hot spot analysis and an outlier analysis to find meaningful patterns of high and low SNAP participation. This information can help decision makers distribute resources more efficiently and equitably, ensuring that healthy food is accessible to all SNAP recipients.

### **Step 1: Download the exercise data files**

In this step, you will download the exercise data files.

1. Open a new web browser tab or window.

2. Go to https://links.esri.com/Section04/Data and download the exercise data ZIP file.

>Note: The complete URL to the exercise data file is https://www.arcgis.com/home/item.html?id=c8cc198207034a9f957df448527f21e3.

3. Extract the files to the EsriTraining folder on your local computer.

### **Step 2: Open an ArcGIS Pro project**

In this step, you will open the ArcGIS Pro project that you downloaded.

1. Start ArcGIS Pro.

2. If necessary, sign in using the provided course ArcGIS account.

3. On the Start page, click Open Another Project.

>Note: If you have configured ArcGIS Pro to start without a project template or with a default project, you will not see the Start page. On the Project tab, click Open, and then click Open Another Project.

4. In the Open Project dialog box, browse to the `PatternDetection_SpaceTime` folder that you saved on your computer.

5. Click `PatternDetection_SpaceTime.aprx` to select it, and then click OK.

Your ArcGIS Pro project includes a map of the counties in the contiguous United States. Each county is symbolized by the rate of the population that participated in SNAP during 2019.

![pattern_detection.PNG](attachment:ca93e4e9-183f-41f4-8b4a-ed90d4011818.PNG)


### **Step 3: Run a hot spot analysis**

The statistical cluster analysis tools analyze your data to detect patterns of high and low values. In this analysis, you will examine the distribution of a field named SNAPRate, which contains the percentage of eligible people in each county who participate in SNAP.

1. In the Geoprocessing pane, under the search field, click the Toolboxes tab.

> Note: If you closed the Geoprocessing pane, from the Analysis tab, in the Geoprocessing group, click Tools.

2. Expand Spatial Statistics Tools, and then expand Mapping Clusters.

3. Click Optimized Hot Spot Analysis.

![optimized_hot_spot_analysis.PNG](attachment:287f9ac5-392b-485f-a931-07ec875b8c2c.PNG)

The Optimized Hot Spot Analysis tool opens in the Geoprocessing pane.

4. In the Geoprocessing pane, set the following parameters:

- Input Features: US_Counties
- Output Features: **SNAPHotSpots**
- Analysis Field: SNAPRate

5. Expand Override Settings.

The Optimized Hot Spot Analysis tool uses a fixed-distance band to define each feature's neighborhood. If you do not specify a distance, the tool uses incremental spatial autocorrelation to determine whether there is a scale, or distance, at which clustering across the dataset is most pronounced. If the Optimized Hot Spot Analysis tool cannot identify a distance using this method, it will compute the average distance that would yield 30 neighbors for each feature. To learn more about the Incremental Spatial Autocorrelation tool, go to ArcGIS Pro Help: [Incremental Spatial Autocorrelation (Spatial Statistics)](https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/incremental-spatial-autocorrelation.htm).
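The fallback rule described above can be sketched in Python. One plausible reading, assumed here, is to find each feature's distance to its 30th nearest neighbor and average those distances. The function name is hypothetical, and Esri's exact computation may differ.

```python
import math


def average_kth_neighbor_distance(coords, k=30):
    """Average, over all features, of the distance to each feature's k-th nearest neighbor."""
    if len(coords) <= k:
        raise ValueError("need more than k features")
    total = 0.0
    for i, p in enumerate(coords):
        # Distances from feature i to every other feature, nearest first
        dists = sorted(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        total += dists[k - 1]
    return total / len(coords)
```

For ten points spaced one unit apart on a line and `k=2`, the two end points reach their second neighbor at a distance of 2 and the eight interior points at a distance of 1, so the function returns 1.2.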

The Optimized Hot Spot Analysis tool defines feature neighborhoods using only the fixed-distance band method. Other methods are available with the traditional Hot Spot Analysis (Getis-Ord Gi*) tool. To learn more about this tool, go to ArcGIS Pro Help: [Hot Spot Analysis (Getis-Ord Gi*) (Spatial Statistics)](https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/hot-spot-analysis.htm).
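To see why the choice of neighborhood matters, the following sketch builds neighbor lists two ways: with a fixed distance band and with k nearest neighbors, which serves here only as an example of an alternative conceptualization. Both helper functions are hypothetical illustrations, not code from either tool.

```python
import math


def fixed_band_neighbors(coords, distance_band):
    """Neighbors of each feature are all other features within the distance band."""
    return [[j for j, q in enumerate(coords) if j != i and math.dist(p, q) <= distance_band]
            for i, p in enumerate(coords)]


def k_nearest_neighbors(coords, k):
    """Neighbors of each feature are its k closest other features (ties broken by index)."""
    result = []
    for i, p in enumerate(coords):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        result.append([j for _, j in others[:k]])
    return result


# Three clustered features and one isolated feature at x = 10
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
band = fixed_band_neighbors(points, distance_band=1.5)
knn = k_nearest_neighbors(points, k=1)
```

The isolated feature has no neighbors under the fixed band, while k nearest neighbors always assigns it exactly k. This illustrates the trade-off: a fixed band keeps the analysis scale constant, and k nearest neighbors guarantees that every feature has neighbors.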

6. Click Run.

The result of your analysis is a layer displaying hot spots in three shades of red and cold spots in three shades of blue. The shades correspond to three confidence levels (90, 95, and 99 percent), indicating how confident you can be that these patterns are not the result of random chance.

![SNAP_hotspots.PNG](attachment:54acacbf-feea-4d60-b120-4e7de66ff88a.PNG)
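The three shades correspond to z-score thresholds. The following minimal sketch shows that mapping, assuming the standard two-tailed critical values (1.645, 1.960, and 2.576 for 90, 95, and 99 percent confidence). The function name is hypothetical. The sketch also ignores the False Discovery Rate correction that the optimized tool applies to account for multiple testing, so the tool may classify fewer features as significant.

```python
def gi_bin(z):
    """Map a Gi* z-score to a confidence bin.

    +3/-3: 99 percent, +2/-2: 95 percent, +1/-1: 90 percent, 0: not significant.
    Positive bins are hot spots; negative bins are cold spots.
    """
    magnitude = abs(z)
    if magnitude >= 2.576:
        level = 3
    elif magnitude >= 1.960:
        level = 2
    elif magnitude >= 1.645:
        level = 1
    else:
        level = 0
    return level if z >= 0 else -level


bins = [gi_bin(z) for z in (3.1, 2.0, 1.7, 0.4, -1.8, -2.7)]
```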

### **Step 4: Review the analysis parameters**

Next, you will review the analysis details to ensure that the parameters were appropriate for your analysis of SNAP participation.

1. At the bottom of the Geoprocessing pane, click View Details.

The Optimized Hot Spot Analysis tool message window appears, listing the tool's geoprocessing steps in detail.

> Tip: If you choose the default values for the Optimized Hot Spot Analysis tool, review the geoprocessing details to identify the default parameter values. Ensure that these values are appropriate for the scale of your analysis.

2. Click Messages, and then review the geoprocessing details to answer the following question.

> Q. Which distance band was chosen for this analysis?

> A. ![distance_band.PNG](attachment:c818e4e7-3085-4924-a74f-afbaf3b5e38e.PNG)

The tool chose a default distance band of approximately 150 kilometers based on the average distance to 30 nearest neighbors. This default value is a good place to start exploring your data, but it may not represent the scale at which you want to analyze patterns in your dataset. In this example, a 150-kilometer distance band is too large because you want to analyze more local patterns in SNAP participation. You will reduce the distance band to 75 kilometers to detect more local patterns in this county-level dataset.
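Shrinking the distance band shrinks every neighborhood. The sketch below counts neighbors within 150 and 75 kilometers for a hypothetical 5-by-5 grid of county centroids spaced 50 kilometers apart in a projected coordinate system. Real county centroids are irregular, so treat the numbers as illustrative only.

```python
import math


def neighbor_counts(coords_km, band_km):
    """Number of other features within band_km of each feature (planar coordinates in km)."""
    return [sum(1 for j, q in enumerate(coords_km) if j != i and math.dist(p, q) <= band_km)
            for i, p in enumerate(coords_km)]


# Hypothetical centroids on a 50 km grid; index 12 is the center of the grid
centroids = [(x * 50.0, y * 50.0) for x in range(5) for y in range(5)]
wide = neighbor_counts(centroids, band_km=150)
narrow = neighbor_counts(centroids, band_km=75)
```

For the center centroid, the 150-kilometer band includes all 24 other features. The 75-kilometer band includes only the 8 adjacent ones, so each local statistic summarizes a much smaller area.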

3. Close the Optimized Hot Spot Analysis tool message window.

4. In the Geoprocessing pane, expand Override Settings, if necessary.

5. Under Distance Band, in the left-hand field, type 75.

6. In the field to the right of 75, click the down arrow and choose Kilometers.

7. Click Run.

![SNAP_hotspots_75km.PNG](attachment:4076a6fb-8b7a-44e2-bc39-5768d93b8e71.PNG)

Reducing the size of the distance band identified more detailed patterns. This scale is more appropriate for this particular analysis.

### **Step 5: Interpret the results**

In this step, you will interpret the results of your statistical analysis.

1. In the Contents pane, locate the SNAPHotSpots and US_States layers.

2. Drag the US_States layer above the SNAPHotSpots layer.

3. If necessary, expand the SNAPHotSpots layer to see the legend.

![SNAP_hotspots_states.PNG](attachment:3efbb257-3d87-4861-8af6-05670e55728d.PNG)

> Q. What statistically significant spatial patterns can you detect from this analysis?

> A. Generally, the southeastern areas of the contiguous United States have statistically significantly high SNAP participation, and the north-central areas of the contiguous United States have statistically significantly low SNAP participation.

The results of this statistical analysis provide a measure of confidence that can help you identify areas with clusters of high SNAP participation. You can use this information to investigate these areas and their access to stores that accept SNAP and carry healthy foods.

4. Save the project.



### **Step 6: Run an outlier analysis**

Completing an outlier analysis will help you identify features whose values are statistically significantly different from their neighbors' values. This analysis will provide additional insight into the spatial patterns of the data.

1. In the top-left corner of the Geoprocessing pane, click the Back button ![image.png](attachment:cec9bb5c-f584-4a9d-a9de-ff4cd9ed6388.png), and then search for **outlier**.

2. Click Optimized Outlier Analysis (Spatial Statistics Tools).

Similar to the Optimized Hot Spot Analysis tool, the Optimized Outlier Analysis tool interrogates your data to determine an appropriate neighborhood distance for your analysis. The traditional outlier analysis tool, Cluster and Outlier Analysis (Anselin Local Moran's I), gives you more control over the analysis parameters. To learn more about this tool, go to ArcGIS Pro Help: Cluster and Outlier Analysis (Anselin Local Moran's I) (Spatial Statistics).

3. In the Optimized Outlier Analysis tool, set the following parameters:

- Input Features: US_Counties
- Output Features: **SNAPOutliers**
- Analysis Field: SNAPRate

The Performance Adjustment parameter sets the number of permutations used to create a reference distribution of random spatial arrangements. The tool then compares your data's observed spatial pattern with these randomly generated arrangements. To balance precision and processing time, you will leave the default value. For more information about permutations, go to ArcGIS Pro Help: [How Cluster and Outlier Analysis (Anselin Local Moran's I) works](https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/h-how-cluster-and-outlier-analysis-anselin-local-m.htm).
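The permutation idea can be sketched for a single feature. Keep the feature's own value fixed, repeatedly give it randomly drawn neighbor values from the rest of the dataset, and count how often the simulated Local Moran's I is at least as extreme as the observed one. The function name, the explicit neighbor list, the comparison in the direction of the observed value, and `permutations=499` are all assumptions of this sketch, not documented tool internals.

```python
import random


def permutation_p_value(values, i, neighbor_ids, permutations=499, seed=None):
    """Observed Local Moran's I for feature i and its pseudo p-value.

    Conditional randomization: feature i keeps its value, and its neighbors'
    values are replaced by random draws (without replacement) from all other features.
    """
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s2 = sum(dev[j] ** 2 for j in range(n) if j != i) / (n - 1)
    observed = dev[i] / s2 * sum(dev[j] for j in neighbor_ids)
    others = [dev[j] for j in range(n) if j != i]
    as_extreme = 0
    for _ in range(permutations):
        simulated = dev[i] / s2 * sum(rng.sample(others, len(neighbor_ids)))
        # Count simulations at least as extreme in the direction of the observed value
        if (observed >= 0 and simulated >= observed) or (observed < 0 and simulated <= observed):
            as_extreme += 1
    return observed, (as_extreme + 1) / (permutations + 1)


# Feature 0 and its three neighbors all have high rates; the other 16 features are low
rates = [100, 100, 100, 100] + [0] * 16
observed, p = permutation_p_value(rates, 0, [1, 2, 3], seed=42)
```

With 499 permutations, the smallest reachable pseudo p-value is 1/500 = 0.002. More permutations allow finer p-values at the cost of processing time, which is the trade-off the Performance Adjustment parameter controls.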

4. Expand Override Settings.

>Tip: When you compare the results of a hot spot analysis and an outlier analysis, use the same distance band for both analyses.

5. Update the Distance Band fields to 75 Kilometers.

6. Click Run.

>Note: Because the Optimized Outlier Analysis tool compares your data values with randomly generated permutations, your results may vary slightly from the following graphic.

![optimized_outlier_analysis.PNG](attachment:a0be9f42-21b7-4358-a62f-37f798c3b5cb.PNG)

The bright red and dark blue features represent spatial outliers. Features with high values surrounded by areas with low values are called High-Low outliers and are displayed in red. Features with low values surrounded by areas with high values are called Low-High outliers and are displayed in dark blue. The pink and light blue colors indicate clusters of features with statistically significantly high values (pink) and statistically significantly low values (light blue). These clusters typically align with the hot spots and cold spots from the Optimized Hot Spot Analysis tool.
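The four categories come from comparing a feature's value and its neighbors' values with the dataset mean, for features that pass the significance test. A minimal sketch follows. The function name, the 0.05 threshold, and the short labels are assumptions chosen to mirror the High-High, High-Low, Low-High, and Low-Low categories described above.

```python
def classify_local_moran(value_dev, lag_dev, p_value, alpha=0.05):
    """Label a feature from its deviation from the mean and its neighbors' deviation.

    HH and LL are clusters of similar values; HL and LH are spatial outliers.
    """
    if p_value > alpha:
        return "Not significant"
    if value_dev >= 0:
        # High value: a cluster if the neighbors are also high, otherwise an outlier
        return "HH" if lag_dev >= 0 else "HL"
    # Low value: an outlier if the neighbors are high, otherwise a cluster
    return "LH" if lag_dev >= 0 else "LL"
```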

### **Step 7: Compare the results of analyses**

You have completed two different methods of pattern detection—a hot spot analysis and an outlier analysis. You may be wondering which method is more appropriate to use. The answer is often both. Each method answers a different question and is valuable in understanding the patterns in your data. You can compare these analyses to gain additional insight into the spatial patterns of your data.
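One way to compare the two outputs feature by feature is to cross-tabulate each feature's hot spot category against its cluster and outlier category. The sketch below uses Python's `collections.Counter` and assumes that you have already read the two result fields into parallel lists. The helper names and the five values shown are hypothetical.

```python
from collections import Counter


def hot_spot_label(gi_bin):
    """Collapse a confidence bin (-3 to 3) into a hot spot category."""
    if gi_bin > 0:
        return "Hot spot"
    if gi_bin < 0:
        return "Cold spot"
    return "Not significant"


def cross_tabulate(gi_bins, outlier_types):
    """Count each (hot spot category, outlier category) pair across features."""
    if len(gi_bins) != len(outlier_types):
        raise ValueError("the two result lists must describe the same features")
    return Counter((hot_spot_label(b), t) for b, t in zip(gi_bins, outlier_types))


# Hypothetical results for five counties
table = cross_tabulate([3, 2, 0, -1, -3], ["HH", "HL", "LH", "LL", "LL"])
for (hot_spot, outlier), count in sorted(table.items()):
    print(f"{hot_spot:15} {outlier:15} {count}")
```

You would expect hot spots to pair mostly with HH clusters and cold spots with LL clusters. HL and LH outliers mark individual counties that may not stand out on the hot spot map alone.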

1. In the Contents pane, locate the SNAPOutliers and US_States layers.

2. Drag the US_States layer above the SNAPOutliers layer.

3. From the Feature Layer tab, in the Compare group, click Swipe.

4. In the Contents pane, click the SNAPOutliers layer.

5. Click the map, and then drag your pointer to the left, to the right, or up and down to compare the results from the Optimized Hot Spot Analysis tool and the Optimized Outlier Analysis tool.

Using a hot spot analysis and an outlier analysis, you located statistically significant clusters of both high and low SNAP participation. This information can help in the allocation of SNAP resources to areas of greater food insecurity and also help identify areas where eligible residents may be underutilizing or not enrolling in the SNAP program. The results can help drive the decision to distribute resources more efficiently and equitably.

6. At the top of the map view, next to Pattern Detection, click the X to close the map.

7. Save the project and exit ArcGIS Pro.

### **Step 8: Stretch goal (Optional)**

Another way to run a hot spot analysis in ArcGIS Pro is by using R. The R-ArcGIS Bridge offers you the ability to tap directly into R from your current ArcGIS Pro project. You can then use the R-ArcGIS Bridge to combine the power of R and ArcGIS to solve spatial problems and to access and visualize spatial data. This stretch goal introduces you to the R-ArcGIS Bridge. You will use R to run a hot spot analysis and an outlier analysis.

> Note: To complete the stretch goal, you must install the following software:

- R 3.5 or newer (https://www.r-project.org) 
- R-ArcGIS Bridge (https://links.esri.com/RArcGISBridge)

1. Use the following high-level steps to complete this analysis using ArcGIS Pro and RStudio:

- Install R, if necessary.
- Install R-ArcGIS Bridge, if necessary.
- Clone the default arcgispro-py3 environment, and then activate the clone.
- Install R-Essentials in the cloned environment.
- In ArcGIS Pro, in the Python Package Manager, make sure that you set your environment to arcgispro-r.
- Explore your data.
- Run a hot spot analysis using R.
- Run an outlier analysis using R.

>Note: For additional support, see the following sites:

- GitHub: R-Bridge-Tutorial-Notebooks (https://github.com/R-ArcGIS/R-Bridge-Tutorial-Notebooks)
- Esri: R-ArcGIS Support (https://links.esri.com/RArcGIS/Support)
- Esri: R-ArcGIS Bridge Overview (https://links.esri.com/RArcGIS/Overview)
- R Tutorial: Hotspot Analysis using Getis Ord Gi (https://rpubs.com/heatherleeleary/hotspot_getisOrd_tut)

2. Use the Lesson Forum to post your questions and observations. Be sure to include the #stretch hashtag in the posting title.

3. When you are finished, save the project and exit ArcGIS Pro.