# GG4257 - Urban Analytics: A Toolkit for Sustainable Urban Development
## Lab Workbook No 4: Geovisualization II - Apps.
## CHALLENGE THREE
---
Dr Fernando Benitez -  University of St Andrews - School of Geography and Sustainable Development - Iteration 2024

# Plotly - Dash

Plotly is one of the most widely used open-source libraries for creating interactive charts, maps and other visualisation components for displaying your processed data. Plotly is available for Python, R, Julia, JavaScript, ggplot2, F#, MATLAB®, and Dash.

> More information at: https://plotly.com/graphing-libraries/

Using Plotly, we will practise creating choropleth maps, which, as we described in the lecture, are composed of coloured polygons. These maps are used to represent **spatial variations of a quantity**.

Plotly figures made with **Plotly Express** `px.scatter_geo`, `px.line_geo` or `px.choropleth` functions or containing `go.Choropleth` or `go.Scattergeo` graph objects have a `go.layout.Geo` object which can be used to control the appearance of the base map onto which data is plotted.

**What are the main components you need to be aware of?**

1. Geometry information:
This can be a supplied GeoJSON file where each feature has either an `id` field or some identifying value in `properties` (code below adapted from: https://plotly.com/python/choropleth-maps/#indexing-by-geojson-properties )
2. A list of values indexed by feature identifier.
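
The two components only work together if they share identifiers. As a minimal sketch (the tiny GeoJSON and the `values` dictionary below are made up for illustration and are not part of the lab data), you could check that every value has a matching feature `id` before plotting:

```python
# Hypothetical two-feature GeoJSON; real files follow the same overall structure.
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "01001",
            "properties": {"NAME": "Area A"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
            },
        },
        {
            "type": "Feature",
            "id": "01003",
            "properties": {"NAME": "Area B"},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]],
            },
        },
    ],
}

# Values indexed by feature identifier (made-up numbers).
values = {"01001": 5.3, "01003": 5.4, "01005": 8.6}

# Collect the identifiers that actually have a polygon.
feature_ids = {feature["id"] for feature in toy_geojson["features"]}

# Identifiers that have a value but no polygon cannot be placed on the map.
unmatched = sorted(set(values) - feature_ids)
print("Values without a polygon:", unmatched)
```

A non-empty `unmatched` list is usually a sign that the key columns differ in type or formatting.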


To begin with, we'll use `urlopen` from the standard library module `urllib.request` to read a GeoJSON file from the cloud. The `counties` variable holds the parsed response as a dictionary, and its `"features"` entry is a list with one element per county. By inspecting the first element of this list, you can view all the attributes provided by the service. Check that each polygon has an `id`.

In [None]:
from urllib.request import urlopen
import json

url_path='https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json'

with urlopen(url_path) as response:
    counties = json.load(response)


counties["features"][0]

Using an ID as a shared key, we will merge a table with unemployment data. The attribute `fips` will be used as the shared key.
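
Reading the key as text matters because FIPS codes can start with a zero. The sketch below uses a small made-up CSV (not the lab file) and a hypothetical set of GeoJSON ids to compare what happens to the key when pandas infers its type with what happens when `dtype={"fips": str}` is used:

```python
import io

import pandas as pd

# Made-up sample with the same column names as the unemployment table.
csv_text = "fips,unemp\n01001,5.3\n01003,5.4\n"

# Let pandas infer the type: the codes are parsed as integers.
inferred = pd.read_csv(io.StringIO(csv_text))

# Force the key to be read as text: the leading zero is kept.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"fips": str})

print(inferred["fips"].tolist())
print(as_text["fips"].tolist())

# Hypothetical ids as they would appear in the GeoJSON features.
geojson_ids = {"01001", "01003"}
matches_inferred = int(inferred["fips"].astype(str).isin(geojson_ids).sum())
matches_text = int(as_text["fips"].isin(geojson_ids).sum())
print(matches_inferred, matches_text)
```

With the inferred integer key, none of the rows match the GeoJSON ids, so nothing would be coloured on the map.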

In [None]:
import pandas as pd

csv_path='https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv'
df = pd.read_csv(csv_path,
                 dtype={"fips": str}
                )
df.head()

In [None]:
# Here we can get all the properties of the px.choropleth object
# https://plotly.com/python-api-reference/generated/plotly.express.choropleth.html

# One useful parameter is scope to define where you want to centre and locate your map.

import plotly.express as px

fig = px.choropleth(df, 
                    geojson=counties,
                    locations='fips',
                    color='unemp',
                    color_continuous_scale="Viridis",
                    range_color=(0, 12),
                    scope="usa",
                    labels={'unemp':'unemployment rate'}
                    )
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

## choropleth
Now, let's try to use a similar example but using local data (e.g. shapefile) and geopandas. As I described before, you need an `ID`, which could also be one of the attributes of your data. 

In this case, we will use the `DataZone` attribute from the SIMD (Scottish Index of Multiple Deprivation), available from the official website: https://www.spatialdata.gov.scot


In [None]:
import plotly.express as px
import geopandas as gpd

shapefile_path = 'SIMD_2020_GlasgowCity.shp'
gdf = gpd.read_file(shapefile_path)

In [None]:
gdf.set_index('DataZone', inplace=True)
gdf.head()

In [None]:
gdf.columns

In [None]:
fig = px.choropleth(gdf,
                    geojson=gdf.geometry,
                    locations=gdf.index,
                    color="Quintilev2",
                    projection="mercator", #Why do you think we had to use this?
                   )
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

In Plotly Express, you can adjust the colour scheme of a choropleth map using the `color_continuous_scale` parameter. It specifies a colour scale for continuous data, and you can choose from a variety of built-in scales provided by Plotly. Note that this changes only the colours, not how the values are grouped into classes.

Here you can find all the built-in scales. https://plotly.com/python/builtin-colorscales/

## choropleth_mapbox

For the next example, we will use another object, `choropleth_mapbox`, which provides other properties that create more stylish maps that include base maps. Reference site: https://plotly.com/python-api-reference/generated/plotly.express.choropleth_mapbox 

In [None]:
fig = px.choropleth_mapbox(gdf,
                           geojson=gdf.geometry,
                           locations=gdf.index,
                           color="Quintilev2",
                           color_continuous_scale="ylgn",
                           range_color=(1, 5),
                           opacity=0.5,
                           center={"lat": 55.866193, "lon": -4.258246},
                           mapbox_style="carto-positron",
                           zoom=9.5)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

## mapclassify from PySAL

To complete this section, we can use the `mapclassify` library https://pysal.org/mapclassify/ from PySAL to choose an appropriate classification method and specify the number of classes, based on the nature and type of your data. As discussed in the lecture, it's crucial to choose the right method to avoid misleading your audience. Choropleth maps are subjective: different maps can represent the same data with different classification methods and colour schemes. Each may be valid, but the classification method determines how the data is grouped into classes and coloured.
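
To make the effect of the method concrete, here is a minimal NumPy sketch that computes equal-interval and quantile break points by hand on a made-up, right-skewed sample. It illustrates the idea only and is not how `mapclassify` is implemented; the sample values are invented.

```python
import numpy as np

# Made-up, right-skewed sample: many small values and one large one.
values = np.array([1, 2, 2, 3, 3, 4, 5, 6, 8, 50], dtype=float)
k = 5  # number of classes

# Equal interval: split the value range into k bins of equal width
# and keep the upper bound of each bin.
equal_interval = np.linspace(values.min(), values.max(), k + 1)[1:]

# Quantiles: choose upper bounds so that each class holds roughly
# the same number of observations.
quantiles = np.quantile(values, np.linspace(0, 1, k + 1)[1:])

# How many observations end up in the first equal-interval class?
in_first_equal_class = int((values <= equal_interval[0]).sum())

print("Equal interval upper bounds:", equal_interval)
print("Quantile upper bounds:", quantiles)
print("Observations in first equal-interval class:", in_first_equal_class)
```

On skewed data like this, equal intervals put almost every observation into the first class, while quantiles spread them across all classes. The same contrast is what makes two maps of the same attribute look different.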

In [None]:
import mapclassify as mc
import matplotlib.pyplot as plt
import folium
import seaborn as sns

shapefile_path = 'SIMD_2020_GlasgowCity.shp'
gdf = gpd.read_file(shapefile_path)
gdf.head()

Classifying data can be a complex task since it is often subjective. However, regardless of the numerical attribute you want to represent, you can plot a histogram to visualize how the attribute is distributed. After that, you can find a classification method that not only better represents the data distribution but also makes sense with how your data can be clustered into classes.

Now we can plot several histograms as exploratory data analysis (EDA) to see how the data is distributed in three attributes: crime counts (`CrimeCount`), number of working-age people who are employment deprived (`EmpNumDep`) and hospital stays related to alcohol use (`HlthAlcSR`).

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(15, 5))


sns.histplot(data=gdf, x="CrimeCount",ax=axes[0], kde=True) 
sns.histplot(data=gdf, x="EmpNumDep",ax=axes[1], kde=True) # Number of working age people who are employment deprived
sns.histplot(data=gdf, x="HlthAlcSR",ax=axes[2], kde=True) # Hospital stays related to alcohol use

axes[0].set_title("Crime Counts ")
axes[1].set_title("# of working age people who are employment deprived ")
axes[2].set_title("Hospital stays related to alcohol use ")

plt.tight_layout()
plt.show()

Next, we call the library classifier to run the Natural Breaks; if we print the object, we can see the intervals (classes) created and the number of elements that fall into each interval. We can also see how the bins are created and which one is the minimum and maximum value from the classification method.
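
The counts per interval can also be reproduced by hand once the break points are known. The sketch below assumes the break points are inclusive upper bounds of each class and uses made-up values and bins rather than the SIMD data:

```python
import numpy as np

# Made-up values and break points (upper bound of each class).
values = np.array([1, 2, 2, 3, 3, 4, 5, 6, 8, 50], dtype=float)
bins = np.array([2.0, 4.0, 8.0, 50.0])

# right=True means a value equal to a break point belongs to the class
# that the break point closes.
class_ids = np.digitize(values, bins, right=True)

# Number of observations that fall into each class.
counts = np.bincount(class_ids, minlength=len(bins))

print("Class id per value:", class_ids)
print("Count per class:", counts)
```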

In [None]:
# Number of classes for classification
num_classes = 5

# Using Natural Breaks (Jenks) classification
# http://nbviewer.jupyter.org/github/pysal/mapclassify/blob/main/notebooks/03_choropleth.ipynb
classifier_nb = mc.NaturalBreaks(gdf['EmpNumDep'], k=num_classes)
print(classifier_nb)
print(min(classifier_nb.bins), max(classifier_nb.bins))
print(classifier_nb.bins) # bins object stores the break points when considering the classification method.

# Using Equal Interval classification
classifier_ei = mc.EqualInterval(gdf['EmpNumDep'], k=num_classes)
print(classifier_ei)
print(min(classifier_ei.bins), max(classifier_ei.bins))
print(classifier_ei.bins) # bins object stores the break points when considering the classification method.

In [None]:
fig, ax = plt.subplots(figsize=(8, 5))


sns.histplot(data=gdf, x="EmpNumDep", ax=ax, kde=True, bins=20)

# The following is a bit tricky, so let's break it down. The idea is to plot the bins or breakpoints
# along with the histogram to see how the data is being classified.
# 1. Initially, define the style of the lines to represent the breakpoints.
ax.axvline(classifier_nb.bins[0], color='red', linestyle='dashed', linewidth=2, label='Breakpoints') 
# 2. A simple For to loop over all the elements in the array 'classifier_nb.bins', you printed that in the cell above.
for bin_value in classifier_nb.bins:
    ax.axvline(bin_value, color='red', linestyle='dashed', linewidth=2) # here I use axvline https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axvline.html
 
# 3. Now, just styling the plot
ax.set_title("Histogram with Breakpoints for Natural Breaks")

# We need a legend...but is optional.
plt.legend()
plt.show()

Now, what if we want to see the difference between the classification methods? Let's plot both in the same figure so you can see how the classes differ.

In [None]:
# 1. You know this already.
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# 2. Same as explained before, but with one extra property, ax=axes[0]; axes is an array of plots with 1 row and two columns.
sns.histplot(data=gdf, x="EmpNumDep", ax=axes[0], kde=True, bins=20)
axes[0].axvline(classifier_nb.bins[0], color='red', linestyle='dashed', linewidth=2, label='Natural Breaks')
for bin_value in classifier_nb.bins:
    axes[0].axvline(bin_value, color='red', linestyle='dashed', linewidth=2)
axes[0].set_title("EmpNumDep Histogram with Natural Breaks")
axes[0].legend()

# 3. The next plot using now ax=axes[1]
sns.histplot(data=gdf, x="EmpNumDep", ax=axes[1], kde=True, bins=20)
axes[1].axvline(classifier_ei.bins[0], color='blue', linestyle='dashed', linewidth=2, label='Equal Interval')
for bin_value in classifier_ei.bins:
    axes[1].axvline(bin_value, color='blue', linestyle='dashed', linewidth=2)
axes[1].set_title("EmpNumDep Histogram with Equal Interval")
axes[1].legend()

# 4. Adjust the plot.
plt.tight_layout()
plt.show()

Finally we use matplotlib to make a map of the data using intervals created by the classifier library.

In [None]:
fig, ax = plt.subplots(figsize=(12, 10))
gdf.plot(column='EmpNumDep', ax=ax,
         legend=True, cmap='viridis',
         scheme='UserDefined',
         classification_kwds={'bins': classifier_nb.bins} 
        )
plt.title("Choropleth Map using mapclassify with Natural Breaks - Map 1")
plt.show()

Now, same process but using another classification method.

In [None]:
fig, ax = plt.subplots(figsize=(12, 10))
gdf.plot(column='EmpNumDep', ax=ax,
         legend=True, cmap='viridis',
         scheme='UserDefined',
         classification_kwds={'bins': classifier_ei.bins},
        )
plt.title("Choropleth Map using Classifier with Equal Intervals - Map 2")
plt.show()

We can integrate both plots in one figure using subplots

In [None]:
fig, axs = plt.subplots(1, 2, figsize=(18, 8))

gdf.plot(column='EmpNumDep', ax=axs[0],
         legend=True, cmap='viridis',
         scheme='UserDefined',
         classification_kwds={'bins': classifier_nb.bins}
        )

axs[0].set_title("Choropleth Map with Natural Breaks")

gdf.plot(column='EmpNumDep', ax=axs[1],
         legend=True, cmap='viridis',
         scheme='UserDefined',
         classification_kwds={'bins': classifier_ei.bins})

axs[1].set_title("Choropleth Map with Equal Intervals")

plt.tight_layout() #Optional but useful.
plt.show()

**Finally**, let's make it interactive using Plotly. In this case, we need to create a column with the class assigned to each observation by the classifier object. Then, we will use that new column to plot the map. I will also use another attribute, `CrimeCount`.
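
As a rough illustration of what such a class column looks like, the sketch below uses `pd.cut` on a made-up table with hypothetical break points (it does not use `mapclassify`). It shows that integer class ids start at 0, which matters when you set `range_color`, and how readable text labels can be attached to the same classes:

```python
import pandas as pd

# Made-up values and break points (upper bound of each class).
df = pd.DataFrame({"CrimeCount": [3, 10, 12, 25, 40, 41, 90]})
upper_bounds = [10, 25, 41, 90]

# pd.cut needs a lower edge below the minimum so the smallest value is included.
edges = [df["CrimeCount"].min() - 1] + upper_bounds
labels = [f"<= {bound}" for bound in upper_bounds]

# Integer class id per row, starting at 0.
df["class_id"] = pd.cut(df["CrimeCount"], bins=edges, labels=False)

# Text label per row, stored as plain strings.
df["class_label"] = pd.cut(df["CrimeCount"], bins=edges, labels=labels).astype(str)

print(df)
```

Plotly Express treats a text column passed to `color` as categorical, so a column like `class_label` would give a discrete legend instead of a continuous colour bar.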

In [None]:
num_classes = 5

classifier_nb = mc.NaturalBreaks(gdf['CrimeCount'], k=num_classes)
gdf['classification_nb'] = classifier_nb.yb # yb holds the class id (0 to k-1) assigned to each observation.

print(classifier_nb)
print(gdf[['CrimeCount', 'classification_nb']])


In [None]:
fig = px.choropleth_mapbox(gdf,
                           geojson=gdf.geometry,
                           locations=gdf.index,
                           color="classification_nb",
                           color_continuous_scale="viridis",
                           range_color=(0, 4), # class ids from yb run from 0 to 4
                           opacity=0.5,
                           center={"lat": 55.866193, "lon": -4.258246},
                           mapbox_style="carto-positron",
                           zoom=9.5)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

In [None]:
num_classes = 5

classifier_quant = mc.Quantiles(gdf['CrimeCount'], k=num_classes)
gdf['classification_qn'] = classifier_quant.yb

print(classifier_quant)
print(gdf[['CrimeCount', 'classification_qn']])

In [None]:
fig_2 = px.choropleth_mapbox(gdf,
                           geojson=gdf.geometry,
                           locations=gdf.index,
                           color="classification_qn",
                           color_continuous_scale="viridis",
                           range_color=(0, 4), # class ids from yb run from 0 to 4
                           opacity=0.5,
                           center={"lat": 55.866193, "lon": -4.258246},
                           mapbox_style="carto-positron",
                           zoom=9.5)
fig_2.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig_2.show()

## Challenge 3
 
Now is the time for you to create some choropleth maps. 

1. Go to this portal https://www.spatialdata.gov.scot/geonetwork/srv/eng/catalog.search#/home
2. Get the Scottish Index of Multiple Deprivation (SIMD) 2020 dataset and extract the data only for the city of Edinburgh.
3. Create two static choropleth maps (e.g. `matplotlib`). These maps should represent an attribute you find interesting in the SIMD dataset. Using two different classifier methods, you need to show how the maps appear different even though the data and attributes are the same. Include a clear description of your choice and of the difference between the classification methods for the attribute chosen (e.g. plotting histograms with breakpoints (bins)). You can find a complete list of classifiers at https://pysal.org/mapclassify/api.html.
4. Finally, create two more interactive maps (e.g. `choropleth_mapbox`), one for Glasgow and one for Edinburgh, to represent the difference in deprivation between the two cities. Pick any of the available attributes.
   > As always, include the appropriate descriptions and code comments where you narrate how you are processing the data, and the insights you get from the results.

In [None]:
import requests
import zipfile
import os
import pandas as pd
import geopandas as gpd

#downloading shapefile
url = "https://maps.gov.scot/ATOM/shapefiles/SG_SIMD_2020.zip"
zip_path = "SG_SIMD_2020.zip"
extract_path = "SG_SIMD_2020"
with open(zip_path, "wb") as f:
    f.write(requests.get(url).content)

#unzipping shapefile
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall(extract_path)

#loading the shapefile into our gdf
shapefile = [f for f in os.listdir(extract_path) if f.endswith(".shp")][0]
gdf = gpd.read_file(os.path.join(extract_path, shapefile))

#printing the first five rows of data
print(gdf.head())

**References**: bytecode (2024). Reading multiple shapefiles with geopandas from a zip file in memory. [online] Stack Overflow. Available at: https://stackoverflow.com/questions/77823335/reading-multiple-shapefiles-with-geopandas-from-a-zip-file-in-memory.

In [None]:
#subset our data frame to keep only data for edinburgh city
#(= assigns the result; .copy() lets us add columns to the subset later without warnings)

gdf_subset = gdf[gdf["LAName"] == "City of Edinburgh"].copy()
gdf_subset.head(3)

In [None]:
#seeing the column titles
gdf_subset.columns

In [None]:
#importing the libraries used for the histograms and choropleth maps
import numpy as np
import mapclassify as mc
import matplotlib.pyplot as plt
import folium
import seaborn as sns

#creating one row of three axes for the histograms
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

#exploring different variables through generating histograms
sns.histplot(data=gdf_subset, x="IncRate",ax=axes[0], kde=True) 
sns.histplot(data=gdf_subset, x="EduAttend",ax=axes[1], kde=True) 
sns.histplot(data=gdf_subset, x="CrimeRate",ax=axes[2], kde=True) 

axes[0].set_title("Income Rate")
axes[1].set_title("Educational Attendance")
axes[2].set_title("Crime Rate")

plt.tight_layout()
plt.show()

In [None]:
#checking for any missing values
print(gdf_subset['IncRate'].isna().sum())
print(gdf_subset['EduAttend'].isna().sum())
print(gdf_subset['CrimeRate'].isna().sum())

In [None]:
#checking the data types
print(gdf_subset['IncRate'].dtype)
print(gdf_subset['EduAttend'].dtype)
print(gdf_subset['CrimeRate'].dtype)

In [None]:
# number of classes for classification
num_classes = 5

# using natural breaks (jenks classification)
classifier_nb = mc.NaturalBreaks(gdf_subset['CrimeRate'], k=num_classes)
print(classifier_nb)
print(min(classifier_nb.bins), max(classifier_nb.bins))
print(classifier_nb.bins) # break points (upper bound of each class)

# using equal interval classification
classifier_ei = mc.EqualInterval(gdf_subset['CrimeRate'], k=num_classes)
print(classifier_ei)
print(min(classifier_ei.bins), max(classifier_ei.bins))
print(classifier_ei.bins) 

In [None]:
#creating a histogram with breakpoints for the crime rate in Edinburgh

fig, ax = plt.subplots(figsize=(8, 5))

sns.histplot(data=gdf_subset, x="CrimeRate", ax=ax, kde=True, bins=20)

# defining the style of the lines to represent the breakpoints
ax.axvline(classifier_nb.bins[0], color='red', linestyle='dashed', linewidth=2, label='Breakpoints') 
# a simple For to loop over all the elements in the array 'classifier_nb.bins'
for bin_value in classifier_nb.bins:
    ax.axvline(bin_value, color='red', linestyle='dashed', linewidth=2) 
 
#styling the histogram
ax.set_title("Histogram with Breakpoints for Natural Breaks")

#adding a legend
plt.legend()
plt.show()

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
#generating histogram for crime rates in Edinburgh with natural breaks
sns.histplot(data=gdf_subset, x="CrimeRate", ax=axes[0], kde=True, bins=20)
axes[0].axvline(classifier_nb.bins[0], color='red', linestyle='dashed', linewidth=2, label='Natural Breaks')
for bin_value in classifier_nb.bins:
    axes[0].axvline(bin_value, color='red', linestyle='dashed', linewidth=2)
axes[0].set_title("Crime Rate Histogram with Natural Breaks")
axes[0].legend()

sns.histplot(data=gdf_subset, x="CrimeRate", ax=axes[1], kde=True, bins=20)
axes[1].axvline(classifier_ei.bins[0], color='blue', linestyle='dashed', linewidth=2, label='Equal Interval')
for bin_value in classifier_ei.bins:
    axes[1].axvline(bin_value, color='blue', linestyle='dashed', linewidth=2)
axes[1].set_title("Crime Rate Histogram with Equal Interval")
axes[1].legend()
#using the tight layout so the two histograms do not overlap
plt.tight_layout()
plt.show()

In [None]:
#generating a choropleth map with natural breaks
fig, ax = plt.subplots(figsize=(12, 10))
gdf_subset.plot(column='CrimeRate', ax=ax,
         legend=True, cmap='viridis',
         scheme='UserDefined',
         classification_kwds={'bins': classifier_nb.bins} 
        )
plt.title("Choropleth Map using mapclassify with Natural Breaks - Map 1")
plt.show()

In [None]:
#generating a choropleth map with equal intervals
fig, ax = plt.subplots(figsize=(12, 10))
gdf_subset.plot(column='CrimeRate', ax=ax,
         legend=True, cmap='viridis',
         scheme='UserDefined',
         classification_kwds={'bins': classifier_ei.bins},
        )
plt.title("Choropleth Map using Classifier with Equal Intervals - Map 2")
plt.show()

In [None]:
#comparing the two choropleth maps for crime rates in Edinburgh
fig, axs = plt.subplots(1, 2, figsize=(18, 8))

gdf_subset.plot(column='CrimeRate', ax=axs[0],
         legend=True, cmap='viridis',
         scheme='UserDefined',
         classification_kwds={'bins': classifier_nb.bins}
        )

axs[0].set_title("Choropleth Map with Natural Breaks")

gdf_subset.plot(column='CrimeRate', ax=axs[1],
         legend=True, cmap='viridis',
         scheme='UserDefined',
         classification_kwds={'bins': classifier_ei.bins})

axs[1].set_title("Choropleth Map with Equal Intervals")
#using the tight layout
plt.tight_layout() 
plt.show()

In [None]:
#creating choropleth mapbox using natural breaks
num_classes = 5

classifier_edi = mc.NaturalBreaks(gdf_subset['CrimeRate'], k=num_classes)
gdf_subset['classification_edi'] = classifier_edi.yb #yb holds the class id (0 to k-1) of each data zone

print(classifier_edi)
print(gdf_subset[['CrimeRate', 'classification_edi']])

In [None]:
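fig_edinburgh = px.choropleth_mapbox(gdf_subset,
                           geojson=gdf_subset.geometry.to_crs(epsg=4326), #Plotly expects lon/lat (WGS84), so reproject in case the shapefile uses a projected CRS
                           locations=gdf_subset.index,
                           color="classification_edi",
                           color_continuous_scale="viridis",
                           range_color=(0, 4), #class ids from yb run from 0 to 4
                           opacity=0.5,
                           center={"lat": 55.9533, "lon": -3.1883}, #centre of Edinburgh
                           mapbox_style="carto-positron",
                           zoom=9.5)
fig_edinburgh.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig_edinburgh.show()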

In [None]:
#checking for the LA name for glasgow
print(gdf["LAName"].unique())  

In [None]:
#comparing glasgow and edinburgh 

#creating a subset for glasgow to compare to edinburgh
gdf_subset2 = gdf[gdf["LAName"] == "Glasgow City"].copy() #= assigns the result; .copy() lets us add columns later without warnings
gdf_subset2.head(3)

In [None]:
#observing the columns
gdf_subset2.columns

In [None]:
#generating a mapbox for Glasgow crime rate
num_classes = 5

classifier_gla = mc.NaturalBreaks(gdf_subset2['CrimeRate'], k=num_classes)
gdf_subset2['classification_gla'] = classifier_gla.yb #yb holds the class id (0 to k-1) of each data zone

print(classifier_gla)
print(gdf_subset2[['CrimeRate', 'classification_gla']])

In [None]:
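fig_glasgow = px.choropleth_mapbox(gdf_subset2,
                           geojson=gdf_subset2.geometry.to_crs(epsg=4326), #Plotly expects lon/lat (WGS84), so reproject in case the shapefile uses a projected CRS
                           locations=gdf_subset2.index,
                           color="classification_gla",
                           color_continuous_scale="viridis",
                           range_color=(0, 4), #class ids from yb run from 0 to 4
                           opacity=0.5,
                           center={"lat": 55.866193, "lon": -4.258246}, #centre of Glasgow
                           mapbox_style="carto-positron",
                           zoom=9.5)
fig_glasgow.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig_glasgow.show()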

In [None]:
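#comparing the mapbox for Glasgow vs Edinburgh's crime rate
from plotly.subplots import make_subplots

fig = make_subplots(rows=1, cols=2, subplot_titles=("Glasgow Crime Rate", "Edinburgh Crime Rate"),
                    specs=[[{"type": "mapbox"}, {"type": "mapbox"}]])

#combining both figures (fig_glasgow and fig_edinburgh are the two choropleth_mapbox figures)
fig.add_trace(fig_glasgow.data[0], row=1, col=1)
fig.add_trace(fig_edinburgh.data[0], row=1, col=2)

#each mapbox subplot needs its own style, zoom and centre
fig.update_mapboxes(style="carto-positron", zoom=9.5)
fig.update_mapboxes(center={"lat": 55.866193, "lon": -4.258246}, row=1, col=1)
fig.update_mapboxes(center={"lat": 55.9533, "lon": -3.1883}, row=1, col=2)

#updating the layout; the shared colour axis covers the class ids 0 to 4
fig.update_layout(coloraxis={"colorscale": "viridis", "cmin": 0, "cmax": 4},
                  margin={"r":0,"t":50,"l":0,"b":0},
                  height=600,
                  showlegend=False)
#displaying the figure
fig.show()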

# Finishing the Lab

Please ensure that you save all your code and upload the latest version of this notebook to your **GitHub repository**. 

> Always check the size of your notebook before making any commit; use the `.gitignore` to skip big data sets or undesired files, but describe where to find the data and the correct structure so that, when the marker forks your repository, your code can reproduce your results.


# More resources

* [Plotly](https://www.geeksforgeeks.org/choropleth-maps-using-plotly-in-python/) - some useful examples of Plotly implementations
* [mapclassify from PySAL](https://nbviewer.org/github/pysal/mapclassify/blob/main/notebooks/03_choropleth.ipynb) - Examples of implementing mapclassify from PySAL
* [Choropleth Maps: The Theory Behind](https://geographicdata.science/book/notebooks/05_choropleth.html) - Spatial Data Science Book