# Lecture 4 - Geopandas
![](images/panda.jpeg)

## Geopandas uses the same structure but adds geometry
![](images/geodataframe.png)
- https://geopandas.org/getting_started/introduction.html

### What is geometry?
- Good question.
  - The purpose of geopandas is to add geometry (spatial) data
  - But it's also to add spatial operations.
- First, however, we need to know what geometry is:
  - It's a representation of a spatial location
    - A geometry column can only have one CRS (coordinate reference system)
  - It can come in several types, well beyond point, line, and polygon
    - We can actually mix points, lines, and polygons in the same geodataframe
      - I _really_ don't recommend this. It's like mixing types in an array, but worse
  - Spatial information is stored as spatially encoded objects (using a library called _shapely_, which we don't really need to know about; it in turn is built on GEOS)
- Let's use an example we are familiar with and make it spatial
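To make the idea of a geometry column concrete, here is a minimal sketch that turns an ordinary pandas DataFrame into a GeoDataFrame. The city names and approximate coordinates are illustrative values chosen for this example; they are not part of the lecture data.

```python
import pandas as pd
import geopandas as gpd

# Illustrative data: approximate longitude/latitude of three NZ cities
cities = pd.DataFrame({
    "name": ["Auckland", "Wellington", "Christchurch"],
    "lon": [174.76, 174.78, 172.64],
    "lat": [-36.85, -41.29, -43.53],
})

# Build point geometry from the coordinate columns and attach a single CRS
cities_gdf = gpd.GeoDataFrame(
    cities,
    geometry=gpd.points_from_xy(cities["lon"], cities["lat"]),
    crs="EPSG:4326",  # one CRS for the whole geometry column
)

print(cities_gdf.geom_type.unique())  # geometry types present in the column
print(cities_gdf.crs)                 # the CRS shared by every geometry
```

Everything else about `cities_gdf` still behaves like a pandas DataFrame; the `geometry` column and the CRS are the additions.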

### Loading data
- Really easy to do with geopandas.
- Here is how you load a shapefile
  - Note that `read_file` can also load a zip file directly!
    - This is simply awesome, as it means we no longer have to mess about with separate .shp, .shx, .prj, and .dbf files
    - You can actually store your entire dataset in a zip file with multiple folders and datasets. It is simply fantastic!


In [2]:
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt

In [3]:
nzpop = gpd.read_file('data/2023-census-population-change-by-ethnic-group-and-regional-c.shp')


In [4]:
lookup = pd.read_csv('data/2023_census_population_change_by_ethnic_group_and_rc_lookup_table.csv')

In [5]:
nzpop

Unnamed: 0,REGC2023_V,REGC2023_1,REGC2023_2,VAR_1_1,VAR_1_2,VAR_1_3,VAR_1_4,VAR_1_5,VAR_1_6,VAR_1_7,...,VAR_1_35,VAR_1_36,VAR_1_37,VAR_1_38,VAR_1_39,VAR_1_40,AREA_SQ_KM,LAND_AREA_,Shape_Leng,geometry
0,13,Canterbury Region,Canterbury Region,448650,41910,12720,35847,4374,10236,539433,...,26.1,29.6,40.8,-1.2,8.6,8.6,56773.925695,44503.596192,1543745.0,"POLYGON ((1662227.733 5360071.829, 1662324.677..."
1,15,Southland Region,Southland Region,79731,11607,1917,2841,315,2031,93342,...,32.8,32.3,87.8,5.7,2.7,2.7,55237.978437,31218.954918,1403398.0,"POLYGON ((1205425.608 5087372.12, 1205533.939 ..."
2,14,Otago Region,Otago Region,171618,14388,3933,10038,2043,4164,202470,...,35.7,27.5,29.8,5.0,7.0,7.0,38514.354961,31186.161571,1277273.0,"POLYGON ((1333232.513 5127595.856, 1333277.532..."
3,12,West Coast Region,West Coast Region,27438,3171,315,678,117,837,32148,...,13.2,25.0,7.1,25.5,5.7,5.7,36339.584151,23245.518393,1582117.0,"POLYGON ((1536071.582 5480250.38, 1536107.374 ..."
4,3,Waikato Region,Waikato Region,296097,83742,14700,26382,3561,6660,403641,...,26.0,38.9,39.4,2.2,8.9,8.9,34888.83171,23900.953428,1268420.0,"POLYGON ((1871103.957 5970628.885, 1871289.964..."
5,1,Northland Region,Northland Region,105057,44931,4461,3927,555,2565,151689,...,25.8,32.7,40.3,4.0,8.3,8.3,30084.273236,12507.139052,811359.8,"POLYGON ((1611941.312 6214121.225, 1613308.196..."
6,8,Manawatū-Whanganui Region,Manawatu-Whanganui Region,172101,43599,7341,10863,1335,4422,222672,...,23.5,25.7,30.4,-0.9,5.3,5.3,25322.178157,22220.638989,1176417.0,"POLYGON ((1821624.069 5738734.422, 1823326.936..."
7,4,Bay of Plenty Region,Bay of Plenty Region,189597,68943,7728,12963,1266,4407,267741,...,30.0,32.3,81.3,8.4,8.3,8.3,21883.742229,12071.549623,1093737.0,"POLYGON ((1911825.034 5859943.054, 1912783.041..."
8,6,Hawke's Bay Region,Hawke's Bay Region,110940,34662,6270,5115,666,2763,151179,...,16.6,37.7,50.8,15.2,5.2,5.2,21444.158181,14139.051332,928780.5,"POLYGON ((1959613.233 5721027.67, 1972259.876 ..."
9,18,Marlborough Region,Marlborough Region,37041,4776,969,1182,246,1044,43416,...,24.4,38.3,83.6,-3.5,4.4,4.4,17688.82252,10457.888026,766969.9,"POLYGON ((1729284.9 5448401.834, 1726981.459 5..."


In [6]:
print(nzpop)

   REGC2023_V                 REGC2023_1                 REGC2023_2  VAR_1_1  \
0          13          Canterbury Region          Canterbury Region   448650   
1          15           Southland Region           Southland Region    79731   
2          14               Otago Region               Otago Region   171618   
3          12          West Coast Region          West Coast Region    27438   
4          03             Waikato Region             Waikato Region   296097   
5          01           Northland Region           Northland Region   105057   
6          08  Manawatū-Whanganui Region  Manawatu-Whanganui Region   172101   
7          04       Bay of Plenty Region       Bay of Plenty Region   189597   
8          06         Hawke's Bay Region         Hawke's Bay Region   110940   
9          18         Marlborough Region         Marlborough Region    37041   
10         02            Auckland Region            Auckland Region   789306   
11         09          Wellington Region

In [9]:
nzpop.crs

<Projected CRS: EPSG:2193>
Name: NZGD2000 / New Zealand Transverse Mercator 2000
Axis Info [cartesian]:
- N[north]: Northing (metre)
- E[east]: Easting (metre)
Area of Use:
- name: New Zealand - North Island, South Island, Stewart Island - onshore.
- bounds: (166.37, -47.33, 178.63, -34.1)
Coordinate Operation:
- name: New Zealand Transverse Mercator 2000
- method: Transverse Mercator
Datum: New Zealand Geodetic Datum 2000
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
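The CRS matters because it decides what the coordinate numbers mean. As a small sketch, the point below (an approximate longitude/latitude for Wellington, chosen for illustration) is reprojected from degrees into NZTM2000, the CRS reported above.

```python
import geopandas as gpd
from shapely.geometry import Point

# Illustrative point: approximate longitude/latitude of Wellington
pt_wgs84 = gpd.GeoSeries([Point(174.78, -41.29)], crs="EPSG:4326")

# Reproject to NZTM2000 (EPSG:2193)
pt_nztm = pt_wgs84.to_crs(epsg=2193)

print(pt_wgs84.crs.is_geographic)  # coordinates are in degrees
print(pt_nztm.crs.is_projected)    # coordinates are in metres
print(pt_nztm.iloc[0])             # x is the easting, y is the northing
```

The reprojected values are in the same range as the polygon coordinates in the `nzpop` geometry column, which is what we expect when two layers share a CRS.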

In [11]:
lookup

Unnamed: 0,Data_file_content,Column_name,Shapefile_name,Unit_count,Subject_population,Year,Measure,Variable1,Variable1_category,Field_name_alias
0,2023 Census population change by ethnic group ...,REGC2023_V1_00,REGC2023_V,,,,,Usual residence - Regional council,Code,Regional council (RC) 2023 code
1,2023 Census population change by ethnic group ...,REGC2023_V1_00_NAME,REGC2023_1,,,,,Usual residence - Regional council,Name,Regional council (RC) 2023 name
2,2023 Census population change by ethnic group ...,REGC2023_V1_00_NAME_ASCII,REGC2023_2,,,,,Usual residence - Regional council,ASCII name,Regional council (RC) 2023 name no macrons
3,2023 Census population change by ethnic group ...,VAR_1_1,VAR_1_1,Individual,Census usually resident population count,2013,Count,Ethnicity (grouped total responses),European,Subject pop: Census usually resident populatio...
4,2023 Census population change by ethnic group ...,VAR_1_2,VAR_1_2,Individual,Census usually resident population count,2013,Count,Ethnicity (grouped total responses),Māori,Subject pop: Census usually resident populatio...
5,2023 Census population change by ethnic group ...,VAR_1_3,VAR_1_3,Individual,Census usually resident population count,2013,Count,Ethnicity (grouped total responses),Pacific Peoples,Subject pop: Census usually resident populatio...
6,2023 Census population change by ethnic group ...,VAR_1_4,VAR_1_4,Individual,Census usually resident population count,2013,Count,Ethnicity (grouped total responses),Asian,Subject pop: Census usually resident populatio...
7,2023 Census population change by ethnic group ...,VAR_1_5,VAR_1_5,Individual,Census usually resident population count,2013,Count,Ethnicity (grouped total responses),Middle Eastern/Latin American/African,Subject pop: Census usually resident populatio...
8,2023 Census population change by ethnic group ...,VAR_1_6,VAR_1_6,Individual,Census usually resident population count,2013,Count,Ethnicity (grouped total responses),Other Ethnicity,Subject pop: Census usually resident populatio...
9,2023 Census population change by ethnic group ...,VAR_1_7,VAR_1_7,Individual,Census usually resident population count,2013,Count,Ethnicity (grouped total responses),Total,Subject pop: Census usually resident populatio...
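One use for the lookup table is turning the cryptic `VAR_1_*` names into readable ones. The sketch below uses three lookup rows and the Canterbury counts copied from the outputs above; the `_demo` frames and the `category_year` naming pattern are choices made for this example, not something the dataset prescribes.

```python
import pandas as pd

# Three rows copied from the lookup table shown above
lookup_demo = pd.DataFrame({
    "Shapefile_name": ["VAR_1_1", "VAR_1_2", "VAR_1_7"],
    "Year": [2013, 2013, 2013],
    "Variable1_category": ["European", "Māori", "Total"],
})

# The matching Canterbury values from the nzpop table shown earlier
pop_demo = pd.DataFrame({
    "REGC2023_1": ["Canterbury Region"],
    "VAR_1_1": [448650],
    "VAR_1_2": [41910],
    "VAR_1_7": [539433],
})

# Build a mapping such as "VAR_1_1" -> "European_2013"
labels = lookup_demo["Variable1_category"] + "_" + lookup_demo["Year"].astype(str)
rename_map = dict(zip(lookup_demo["Shapefile_name"], labels))

readable = pop_demo.rename(columns=rename_map)
print(readable.columns.tolist())
```

On the full `lookup` table the `Year` column has missing values in the first rows, so it would probably need `dropna` and a cast to `int` before building the labels.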


In [None]:
nzpop.plot()

In [None]:
nzpop.boundary.plot()

In [None]:
nzpop.plot(column="VAR_1_23")

In [None]:
nzpop.plot(column='VAR_1_23', legend=True,
           legend_kwds={'label': "Population in 2023",
                        'orientation': "vertical"},
           cmap='OrRd')
plt.title('Choropleth Map of Population in 2023')
plt.show()

In [None]:
import mapclassify as mc

In [None]:
# Classify using Natural Breaks (Jenks) with k=5 classes.
# geopandas builds the mapclassify classifier for us from the scheme name.

# Plot the choropleth map
fig, ax = plt.subplots(1, 1, figsize=(10, 6))
nzpop.plot(column="VAR_1_23", cmap='OrRd', linewidth=0.8, ax=ax, edgecolor='0.8',
           scheme='NaturalBreaks', classification_kwds={'k': 5}, legend=True)

# Add a title
plt.title('Choropleth Map of Population with Natural Breaks')

# Show the plot
plt.show()


In [None]:
nzpop.plot(kind="scatter", x="VAR_1_15", y="VAR_1_23")

In [None]:
temp_df = nzpop.copy()
temp_df['VAR_1_15_div'] = temp_df['VAR_1_15'] / 1000
temp_df['VAR_1_23_div'] = temp_df['VAR_1_23'] / 1000

In [None]:
temp_df

In [None]:
# Plot using the temporary DataFrame
temp_df.plot(kind="scatter", x="VAR_1_23_div", y="VAR_1_15_div")

# Optionally add labels and title
ax = plt.gca() # get current axis
ax.set_xlabel('Total Pop 2023 (in thousands)')
ax.set_ylabel('Total Pop 2018 (in thousands)')
ax.set_title('Census 2018 vs Census 2023')
plt.show()

In [None]:
nzpop[["VAR_1_15", "VAR_1_23", "geometry"]].plot.hist(alpha=.4)

In [None]:
# Select the columns you want to plot
selected_columns = nzpop[["VAR_1_15", "VAR_1_23"]]

# Create a histogram for each selected column
selected_columns.plot.hist(alpha=0.4, subplots=True, layout=(1, 2), figsize=(12, 6), bins=30)

# Optionally adjust spacing
plt.tight_layout()
plt.show()

In [None]:
# Create a figure with 1 row and 2 columns for subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

# Plot the first variable
nzpop.plot(column='VAR_1_15', ax=ax1, legend=True, cmap='OrRd')
ax1.set_title('Population 2018')
cbar1 = ax1.get_figure().get_axes()[2]  # Get the first colorbar axis
cbar1.ticklabel_format(style='plain') # Access the colorbars (legends) and apply ticklabel_format

# Plot the second variable
nzpop.plot(column='VAR_1_23', ax=ax2, legend=True, cmap='YlGn')
ax2.set_title('Population 2023')
cbar2 = ax2.get_figure().get_axes()[3]  # Get the second colorbar axis
cbar2.ticklabel_format(style='plain')

# Adjust layout to prevent overlap
plt.tight_layout()
# Show the plot
plt.show()

In [None]:
# Create a figure with 1 row and 3 columns for subplots
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(18, 6))

# Plot the first variable (VAR_1_7 is the 2013 total in the lookup table)
nzpop.plot(column='VAR_1_7', ax=ax1, legend=True, cmap='magma')
ax1.set_title('Population 2013')
cbar1 = fig.axes[-1]  # The colorbar axis is the most recently added axis
cbar1.ticklabel_format(style='plain')  # Plain numbers instead of scientific notation

# Plot the second variable
nzpop.plot(column='VAR_1_15', ax=ax2, legend=True, cmap='OrRd')
ax2.set_title('Population 2018')
cbar2 = fig.axes[-1]
cbar2.ticklabel_format(style='plain')

# Plot the third variable
nzpop.plot(column='VAR_1_23', ax=ax3, legend=True, cmap='YlGn')
ax3.set_title('Population 2023')
cbar3 = fig.axes[-1]
cbar3.ticklabel_format(style='plain')

# Adjust layout to prevent overlap
plt.tight_layout()
# Show the plot
plt.show()

In [None]:
import geodatasets

In [None]:
chicago = gpd.read_file(geodatasets.get_path("geoda.chicago_commpop"))
groceries = gpd.read_file(geodatasets.get_path("geoda.groceries"))

In [None]:
chicago.crs

In [None]:
chicago_shapes = chicago[['geometry', 'NID']]
chicago_names = chicago[['community', 'NID']]

In [None]:
chicago_shapes = chicago_shapes.merge(chicago_names, on='NID')
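The order of the two tables in an attribute join matters. In this small sketch the two-square `shapes` table and the `names` table are hypothetical stand-ins for `chicago_shapes` and `chicago_names`.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical two-area dataset standing in for chicago_shapes / chicago_names
shapes = gpd.GeoDataFrame(
    {"NID": [1, 2]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
names = pd.DataFrame({"NID": [1, 2], "community": ["North", "South"]})

# GeoDataFrame on the left: the result keeps its geometry and CRS
geo_left = shapes.merge(names, on="NID")

# Plain DataFrame on the left: the result is a plain DataFrame
df_left = names.merge(shapes, on="NID")

print(type(geo_left).__name__)
print(type(df_left).__name__)
```

Calling `merge` from the spatial table, as the cell above does, is the safer habit because the result can still be plotted and used in spatial operations.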

In [None]:
chicago.head()
groceries.head()

In [None]:
groceries.crs

In [None]:
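# Reproject to match chicago's CRS (EPSG:4326) so the two layers line up in the spatial join
groceries = groceries.to_crs(chicago.crs)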

In [None]:
groceries_with_community = groceries.sjoin(chicago, how="inner", predicate='intersects')

In [None]:
groceries_with_community.head()

In [None]:
groceries_with_community.plot()
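To see what the spatial join does row by row, here is a toy version. The two square "communities" and three "shops" are hypothetical data made up for this sketch.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data: two square "communities" and three "shops"
areas = gpd.GeoDataFrame(
    {"community": ["West", "East"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
shops = gpd.GeoDataFrame(
    {"shop": ["A", "B", "C"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# Inner join: only shops that fall in some community are kept
inner = shops.sjoin(areas, how="inner", predicate="intersects")

# Left join: every shop is kept, unmatched ones get missing values
left = shops.sjoin(areas, how="left", predicate="intersects")

print(inner[["shop", "community"]])
print(left[["shop", "community"]])
```

Shop C lies outside both squares, so it disappears from the inner join and shows up with a missing `community` in the left join.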

The default spatial index in GeoPandas currently supports the following values for `predicate`, which are defined in the Shapely documentation:

* intersects
* contains
* within
* touches
* crosses
* overlaps
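The predicates differ mostly in how they treat boundaries. A minimal sketch, using a unit square and three illustrative points (one inside, one exactly on the edge, one outside):

```python
import geopandas as gpd
from shapely.geometry import Point, box

square = box(0, 0, 1, 1)
points = gpd.GeoSeries(
    [Point(0.5, 0.5), Point(1, 0.5), Point(2, 2)],
    index=["inside", "on_edge", "outside"],
)

# intersects: any shared location, including the boundary
print(points.intersects(square))

# within: the point must lie in the interior of the square
print(points.within(square))

# touches: only the boundaries meet
print(points.touches(square))

# The predicates accepted by the spatial index of this GeoSeries
print(points.sindex.valid_query_predicates)
```

The point on the edge intersects and touches the square but is not within it, which is why the choice of `predicate` can change the number of rows a spatial join returns.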