## Reclassifying Data in GIS Analysis

Reclassifying data according to specific criteria is a common task in GIS analysis. This section demonstrates how to reclassify values based on predetermined criteria.


### Tutorial Objectives

In this tutorial, we will:

- Use classification schemes from the [`PySAL mapclassify`](https://pysal.org/mapclassify/) library to categorize population counts into different classes.


In [None]:
import pathlib

# Resolve the current working directory and point to the data folder inside it
NOTEBOOK_PATH = pathlib.Path().resolve()
DATA_DIRECTORY = NOTEBOOK_PATH / "Data_L4"

Here we will use a population polygon grid for Värmland at 1 kilometer resolution, published by the statistical agency of Sweden.

In [None]:
import geopandas

pop_1km = geopandas.read_file(
    DATA_DIRECTORY
    / "pop_1km_clipped.gpkg"
)

pop_1km.head()

In [None]:
pop_1km.plot()

## Common Classifiers in Spatial Analysis

### Classification Schemes for Thematic Maps

[PySAL](https://pysal.org/) is a comprehensive Python library for spatial analysis. It includes a wide range of data classifiers that are commonly used in data visualization. The [mapclassify](https://github.com/pysal/mapclassify) module within `PySAL` provides access to the following classifiers:

- **Box Plot**
- **Equal Interval**
- **Fisher Jenks**
- **Fisher Jenks Sampled**
- **HeadTail Breaks**
- **Jenks Caspall**
- **Jenks Caspall Forced**
- **Jenks Caspall Sampled**
- **Max P Classifier**
- **Maximum Breaks**
- **Natural Breaks**
- **Quantiles**
- **Percentiles**
- **Std Mean**
- **User Defined**
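
To make the difference between two of the schemes listed above more concrete, the following sketch computes equal-interval and quantile break values by hand with NumPy. The sample values are made up for illustration, and the calculation only mirrors the idea behind these two schemes; it is not meant to reproduce mapclassify's own implementation.

```python
import numpy as np

# Hypothetical, strongly skewed sample (not taken from the tutorial data)
values = np.array([1, 2, 2, 3, 4, 5, 8, 12, 40, 100], dtype=float)
k = 4  # number of classes

# Equal interval: split the value range into k bins of equal width
equal_interval_breaks = np.linspace(values.min(), values.max(), k + 1)[1:]

# Quantiles: place the breaks so that each class holds a similar number of values
quantile_breaks = np.quantile(values, np.arange(1, k + 1) / k)

print("Equal interval:", equal_interval_breaks)
print("Quantiles:     ", quantile_breaks)
```

With skewed values like these, most observations fall below the first equal-interval break, while the quantile breaks crowd together at the low end of the range.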

Here we are interested in the `POP` column, which, not surprisingly, holds population counts.

Let’s plot the data and see what it looks like. Two plotting parameters are relevant here:

- The `cmap` parameter defines the color map. Read more about choosing [colormaps](https://matplotlib.org/3.1.0/tutorials/colors/colormaps.html) in matplotlib.

- The `scheme` option scales the colors according to a classification scheme (requires the `mapclassify` module to be installed):

In [None]:
# Plot using 5 classes and classify the values using "Natural Breaks" classification
pop_1km.plot(column="POP", scheme="Natural_Breaks", k=5, cmap="RdYlBu", linewidth=1, legend=True)

In [None]:
# Plot using 5 classes and classify the values using "Quantiles" classification
pop_1km.plot(column="POP", scheme="Quantiles", k=5, cmap="RdYlBu", linewidth=0, legend=True)

Population counts are often strongly skewed, so a log transform is commonly applied to make them easier to work with.

In [None]:
# Apply a log(1 + x) transformation, then classify the result with natural breaks
import numpy as np
pop_1km["POP_log"] = np.log1p(pop_1km["POP"])
pop_1km.plot(column="POP_log", scheme="Natural_Breaks", k=5, cmap="RdYlBu", linewidth=0, edgecolor="gray", legend=True)
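
The choice of `np.log1p` over `np.log` can be illustrated on a few made-up counts: `log1p(x)` computes `log(1 + x)`, so a grid cell with a population of 0, should one occur, maps to 0 instead of negative infinity, and `np.expm1` reverses the transformation. The numbers below are hypothetical.

```python
import numpy as np

# Hypothetical population counts, including an empty cell
counts = np.array([0, 1, 10, 100, 1000], dtype=float)

# log1p(x) = log(1 + x), so a count of 0 stays at 0
log_counts = np.log1p(counts)
print(log_counts.round(2))

# expm1 is the inverse of log1p and recovers the original counts
restored = np.expm1(log_counts)
print(restored.round(2))
```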



### Applying classifiers to data
As mentioned, the scheme option defines the classification scheme using pysal/mapclassify. Let’s have a closer look at how these classifiers work.

In [None]:
import mapclassify

In [None]:
# Natural breaks
mapclassify.NaturalBreaks(y=pop_1km["POP"], k=6)
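
Natural-breaks style classifiers look for class boundaries that keep similar values together, which is commonly formalized as minimizing the within-class sum of squared deviations. The following brute-force sketch illustrates that objective on made-up values. It is only practical for tiny inputs, the function names are hypothetical, and it is not meant to reflect how mapclassify computes its breaks.

```python
from itertools import combinations

import numpy as np


def within_class_ssd(classes):
    """Sum of squared deviations from each class mean, added over all classes."""
    return sum(float(((c - c.mean()) ** 2).sum()) for c in classes)


def brute_force_breaks(values, k):
    """Try every way to cut the sorted values into k groups and keep the best one."""
    ordered = np.sort(np.asarray(values, dtype=float))
    best_ssd, best_cuts = None, None
    # A cut at position i separates ordered[:i] from ordered[i:]
    for cuts in combinations(range(1, len(ordered)), k - 1):
        ssd = within_class_ssd(np.split(ordered, list(cuts)))
        if best_ssd is None or ssd < best_ssd:
            best_ssd, best_cuts = ssd, cuts
    # Report the upper bound (largest member) of each class
    return [float(c.max()) for c in np.split(ordered, list(best_cuts))]


# Hypothetical values with three visible clusters
sample = [1, 2, 3, 20, 21, 22, 90, 95]
print(brute_force_breaks(sample, k=3))
```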


In [None]:
# Quantiles
mapclassify.Quantiles(y=pop_1km["POP"], k=5)

### Extracting threshold values
It’s possible to extract the threshold values into an array:

In [None]:
classifier = mapclassify.NaturalBreaks(y=pop_1km["POP"], k=6)
classifier.bins
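
The values in `bins` are read here as the upper bounds of the classes, with each upper bound belonging to its own class. This is an assumption about the convention, and it is worth checking against the intervals printed in the classifier summary. Under that assumption, class indices can be derived from a set of breaks with `np.digitize`; the break values and counts below are made up.

```python
import numpy as np

# Hypothetical upper bounds for three classes
bins = np.array([10.0, 50.0, 200.0])

# Made-up population counts to classify
values = np.array([0, 10, 11, 50, 199, 200])

# right=True makes each interval closed on its upper bound: (lower, upper]
class_ids = np.digitize(values, bins, right=True)
print(class_ids)
```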

Let’s apply one of the PySAL classifiers to our data and classify the population counts into 5 classes. The classifier first needs to be initialized with the `make()` function, which takes the number of desired classes as an input parameter.

In [None]:
# Create a Quantiles classifier
classifier = mapclassify.Quantiles.make(k=5)

In [None]:
# Classify the data
classifications = pop_1km[["POP"]].apply(classifier)

# Let's see what we have
classifications.head()

In [None]:
type(classifications)
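
Conceptually, `make()` can be thought of as returning a function that remembers its settings and classifies whatever data it is given later. The sketch below imitates that pattern with a plain closure and NumPy quantiles. `make_quantile_classifier` is a hypothetical helper, not part of mapclassify, and the input values are made up.

```python
import numpy as np


def make_quantile_classifier(k):
    """Return a function that assigns each value to one of k quantile classes."""

    def classify(values):
        values = np.asarray(values, dtype=float)
        # Upper bounds of the k classes, computed from the data passed in
        breaks = np.quantile(values, np.arange(1, k + 1) / k)
        # Class index of each value; each upper bound belongs to its own class
        return np.digitize(values, breaks, right=True)

    return classify


# Made-up counts standing in for the population column
pop = np.array([3, 8, 15, 40, 120, 600, 9, 75])

classifier_sketch = make_quantile_classifier(k=4)
labels = classifier_sketch(pop)
print(labels.tolist())
```

Because the breaks are computed inside the returned function, the same classifier object can be reused on other columns, each of which then gets breaks based on its own values.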

We can also add the classification values directly into a new column in our dataframe:

In [None]:
# Add the class labels as a new column. Note that the classifier created above
# is Quantiles, so "nb" in the column name does not stand for natural breaks here
pop_1km["nb_POP"] = pop_1km[["POP"]].apply(classifier)

# Check the original values and classification
pop_1km[["nb_POP", "POP"]].head()

Let’s visualize the results and see how they look.

In [None]:
# Plot
pop_1km.plot(column="nb_POP", linewidth=0, legend=True)

### Plotting a histogram
A histogram is a graphic representation of the distribution of the data. When classifying data, it’s always good to consider how the data is distributed and how the classification scheme divides the values into different ranges.

In [None]:
# Histogram for pop data
pop_1km["POP"].plot.hist(bins=10)

In [None]:
# Histogram for log pop data
pop_1km["POP_log"].plot.hist(bins=10)

Let’s also add threshold values on top of the histogram as vertical lines.

Natural breaks on the log-transformed population:

In [None]:
import matplotlib.pyplot as plt

# Define classifier
classifier = mapclassify.NaturalBreaks(y=pop_1km["POP_log"], k=5)

# Plot histogram 
pop_1km["POP_log"].plot.hist(bins=50)

# Add vertical lines for class breaks
for break_point in classifier.bins:
    plt.axvline(break_point, color="k", linestyle="dashed", linewidth=1)
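
As a numeric complement to the histogram, it can help to count how many observations fall between consecutive breaks. The following is a minimal sketch with `pandas.cut`, using made-up values and break points rather than the tutorial data.

```python
import numpy as np
import pandas as pd

# Hypothetical log-transformed values and class upper bounds
values = pd.Series([0.0, 0.7, 1.1, 1.6, 2.3, 2.4, 3.0, 4.6, 5.2, 6.9])
breaks = [1.5, 3.0, 7.0]

# Prepend -inf so the first class has a lower edge;
# right=True gives intervals of the form (lower, upper]
edges = [-np.inf] + breaks
classes = pd.cut(values, bins=edges, right=True)

# Number of observations per class, ordered from the lowest class to the highest
counts = classes.value_counts().sort_index()
print(counts)
```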

### Applying a custom classifier

Sometimes we want to classify our data using our own criteria or methods (for instance, unsupervised clustering with k-means or user-defined scenarios). Let's see a simple example of that.

In [None]:


# Custom classifier function
def classify_population(pop):
    """Return a population class label based on fixed thresholds."""
    if pop < 50:
        return "Low"
    elif pop < 250:
        return "Medium"
    else:
        return "High"

# Apply the custom classifier to every value in the POP column
pop_1km["POP_class"] = pop_1km["POP"].apply(classify_population)

# Display the modified GeoDataFrame
print(pop_1km.head())


Let’s see how this worked out:

In [None]:
# Plot
pop_1km.plot(column="POP_class", linewidth=0)
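
For larger grids, the same thresholds can also be applied without calling a Python function once per row. The sketch below uses `pandas.cut` on made-up values. The bin edges mirror the thresholds in `classify_population`, and `right=False` keeps a value of 50 in "Medium" and a value of 250 in "High", as in that function.

```python
import numpy as np
import pandas as pd

# Made-up population counts standing in for the POP column
pop = pd.Series([0, 49, 50, 249, 250, 1000])

# Intervals are closed on the left: [-inf, 50), [50, 250), [250, inf)
pop_class = pd.cut(
    pop,
    bins=[-np.inf, 50, 250, np.inf],
    labels=["Low", "Medium", "High"],
    right=False,
)
print(pop_class.tolist())
```

The result is an ordered categorical, so the classes keep the order Low, Medium, High when sorted instead of being arranged alphabetically.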