# Working with Spatial Sociodemographic Data - Quickstart

In this tutorial we will explore some methods for analysing sociodemographic data with a spatial component.

In [1]:
import pandas as pd
import geopandas as gpd
import folium 
import matplotlib.pyplot as plt

from pysal.lib import weights  
import segregation as seg




# Segregation 

Residential segregation describes how far two or more population groups live apart from one another across the areal units of a region, such as census tracts. Massey and Denton (1988) identified five dimensions of residential segregation: (1) evenness, the degree to which groups are over- or under-represented across areas; (2) exposure, the degree of potential contact between members of different groups within an area; (3) concentration, the relative amount of physical space a group occupies; (4) centralization, how close a group lives to the centre of an urban area; and (5) clustering, the degree to which areas inhabited by a group adjoin one another.

The segregation module of the PySAL package allows for the exploration of an extended set of segregation indices.

For more information, see: *Massey, D. S., & Denton, N. A. (1988). The dimensions of residential segregation. Social Forces, 67(2), 281–315* and *Cortes, R. X., Rey, S., Knaap, E., & Wolf, L. J. (2020). An open-source framework for non-spatial and spatial segregation measures: The PySAL segregation module. Journal of Computational Social Science, 3(1), 135–166*.

In [2]:
zensus_bremen_df = pd.read_csv("../data_example/Example_Bremen_Zensus_Population.csv", sep = ";", encoding = 'utf-8-sig')

In [3]:
zensus_bremen_df

Unnamed: 0.1,Unnamed: 0,Grid_Code,population_total_units,al_10_under,al_10_19,al_20_29,al_30_39,al_40_49,al_50_59,al_60_69,...,sh_russia,sh_turkey,sh_ukraine,sh_other,sk_germany,sk_abroad,sz_one,sz_mult_german_foreign,sz_mult_foreign_only,sz_unknown
0,9654,100mN33306E42414,61,3,0,7,7,6,12,14,...,0,0,0,4,56,5,57,4,0,0
1,9655,100mN33306E42415,41,0,0,18,13,4,3,0,...,0,0,0,3,37,4,40,0,0,0
2,9827,100mN33307E42414,22,0,0,8,5,3,4,0,...,0,0,0,9,8,14,20,3,0,0
3,9828,100mN33307E42415,176,3,5,54,41,25,24,10,...,0,4,0,46,113,63,165,11,0,0
4,9829,100mN33307E42416,231,3,13,84,39,31,33,17,...,0,3,0,40,173,58,207,24,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1275,20902,100mN33362E42396,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1276,20903,100mN33362E42397,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1277,20904,100mN33362E42398,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1278,20905,100mN33362E42399,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


We will analyse one dimension of segregation for population sub-groups defined by age. The first sub-group is that of seniors, or population of age 65 or older, and the second sub-group is that of children, or population of age 18 or younger.

In [4]:
single_var_1 = "alk_65_over"  # seniors
single_var_2 = "alk_18_under"  # children
total_pop_var = "population_total_units"

# Cast the population counts to integers
for col in (total_pop_var, single_var_1, single_var_2):
    zensus_bremen_df[col] = zensus_bremen_df[col].astype(int)

The chosen segregation measure is **interaction**, the extent to which members of a sub-group are exposed to members of the rest of the population within a unit area. Interaction is a single-group measure, in the sense that it is assessed for one sub-group at a time. This is in contrast with multi-group measures, where segregation is assessed between multiple population sub-groups simultaneously. Interaction takes values between 0 and 1, with higher values indicating more exposure to the rest of the population.
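As a rough illustration of this definition, the a-spatial interaction index is commonly written as xPy = Σ_i (x_i / X) · (y_i / t_i), where x_i is the sub-group count in unit i, t_i the unit's total population, y_i = t_i - x_i the rest of the population, and X the sub-group's overall total. The sketch below computes this on made-up counts. The function name `interaction_index` and the toy numbers are assumptions for illustration, and the `segregation` package may treat edge cases differently.

```python
def interaction_index(group_counts, total_counts):
    """Exposure of a sub-group to the rest of the population (xPy).

    Illustrative sketch only, not the library implementation.
    """
    group_total = sum(group_counts)
    if group_total == 0:
        raise ValueError("the sub-group has no members")
    index = 0.0
    for x, t in zip(group_counts, total_counts):
        if t == 0:
            continue  # empty units contribute nothing
        rest = t - x  # everyone in the unit who is not in the sub-group
        index += (x / group_total) * (rest / t)
    return index


# Made-up counts for three unit areas (hypothetical data)
seniors = [10, 0, 5]
population = [20, 10, 5]
result = interaction_index(seniors, population)
print(round(result, 3))
```

In this toy case two thirds of the seniors live in a unit that is half seniors and one third live in a unit that is all seniors, so the index is 2/3 × 1/2 = 1/3.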

We observe that both seniors and children interact highly with the rest of the population, but interaction is higher for children (0.84) than for seniors (0.72).

In [5]:
# Keep only grid cells with at least one inhabitant
zensus_bremen_df = zensus_bremen_df[zensus_bremen_df[total_pop_var] > 0]

# A-spatial segregation index
int1 = seg.singlegroup.Interaction(data = zensus_bremen_df, group_pop_var = single_var_1, total_pop_var = total_pop_var)
int2 = seg.singlegroup.Interaction(data = zensus_bremen_df, group_pop_var = single_var_2, total_pop_var = total_pop_var)

print("Interaction of age category %s: %.2f and Interaction of age category %s: %.2f" %(single_var_1, int1.statistic, single_var_2, int2.statistic))

Interaction of age category alk_65_over: 0.72 and Interaction of age category alk_18_under: 0.84


We will now measure a form of spatial segregation, where the spatial connections between census unit areas are also considered when assessing the segregation of a population sub-group.

In [6]:
zensus_bremen_grid = gpd.read_file("../data_example/Example_Bremen_Zensus_Grid_100m.gpkg")
idx_column = "Grid_Code"
zensus_bremen_grid = zensus_bremen_grid.merge(zensus_bremen_df, on = idx_column, how = "inner") 

For the computation of spatial indices to be successful, all unit areas that are isolated (i.e. have no connection with another area) should be removed first. We define connections between cells using Queen contiguity, under which two cells are neighbours if they share an edge or a corner.
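To make the idea of an isolated cell concrete, here is a minimal pure-Python sketch that flags cells without any Queen neighbour on a regular grid. It assumes that codes follow the `100mN<north>E<east>` layout seen in the `Grid_Code` column and that consecutive integers denote adjacent 100 m cells. The helper names `parse_grid_code` and `find_islands` are hypothetical, and this is not how `libpysal` builds its weights.

```python
import re


def parse_grid_code(code):
    """Split a code such as '100mN33306E42414' into (north, east) cell indices.

    Assumes the 'N<digits>E<digits>' layout seen in the Grid_Code column.
    """
    match = re.fullmatch(r"100mN(\d+)E(\d+)", code)
    if match is None:
        raise ValueError(f"unexpected grid code: {code!r}")
    return int(match.group(1)), int(match.group(2))


def find_islands(codes):
    """Return the codes of cells that have no Queen neighbour in `codes`."""
    cells = {parse_grid_code(code): code for code in codes}
    islands = []
    for (north, east), code in cells.items():
        # Queen contiguity: any of the 8 surrounding cells counts as a neighbour
        has_neighbour = any(
            (north + dn, east + de) in cells
            for dn in (-1, 0, 1)
            for de in (-1, 0, 1)
            if (dn, de) != (0, 0)
        )
        if not has_neighbour:
            islands.append(code)
    return islands


codes = [
    "100mN33306E42414",
    "100mN33306E42415",
    "100mN33307E42416",  # touches 33306/42415 only at a corner
    "100mN33362E42396",  # far away from the other cells
]
islands = find_islands(codes)
print(islands)
```

Only the last cell is reported as an island. The corner-touching cell still counts as connected, which is what distinguishes Queen from Rook contiguity.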

In [7]:
wr = weights.contiguity.Queen.from_dataframe(zensus_bremen_grid, geom_col = "geometry", ids = idx_column)    
zensus_bremen_grid_copy = zensus_bremen_grid[~zensus_bremen_grid[idx_column].isin(wr.islands)].copy()

As the map below shows, the newly created dataset does not contain isolated areas.

In [8]:
m = zensus_bremen_grid.explore(height=500, width=1000, color="gray", name="Zensus Grid Cells 100mx100m")
m = zensus_bremen_grid_copy.explore(m=m, color="blue", name="Zensus Grid Cells (filtered)")

folium.LayerControl().add_to(m)
m

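We now measure the extent to which members of one sub-group are exposed to members of the rest of the population, anywhere in space, using **distance decay interaction**. This index weights exposure to the population of other cells by a function that decreases with distance.

Results are very similar to the a-spatial interaction values (0.76 versus 0.72 for seniors and 0.86 versus 0.84 for children).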
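The following sketch shows one common way to write a distance-decay interaction index: each unit's exposure is a weighted average over all units, with weights proportional to exp(-d_ij) · t_j. It uses a plain exponential kernel on Euclidean distances between centroids, which is an assumption. The kernel, distance units, and self-distance handling in the `segregation` package may differ, and the function name and toy data are hypothetical.

```python
import math


def distance_decay_interaction(group_counts, total_counts, centroids):
    """Distance-weighted exposure of a sub-group to the rest of the population.

    Illustrative sketch with an exp(-distance) kernel, not the library code.
    """
    group_total = sum(group_counts)
    n = len(total_counts)
    index = 0.0
    for i in range(n):
        if group_counts[i] == 0:
            continue  # units without sub-group members contribute nothing
        # Kernel weight from unit i to every unit j, scaled by j's population
        weights_i = [
            math.exp(-math.dist(centroids[i], centroids[j])) * total_counts[j]
            for j in range(n)
        ]
        weight_sum = sum(weights_i)
        # Weighted share of "rest of population" among people reachable from i
        exposure = sum(
            (w / weight_sum) * ((t - x) / t)
            for w, x, t in zip(weights_i, group_counts, total_counts)
            if t > 0
        )
        index += (group_counts[i] / group_total) * exposure
    return index


# Made-up counts (hypothetical data)
seniors = [10, 0, 5]
population = [20, 10, 5]

# Units so far apart that only the own unit matters
far_apart = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
isolated = distance_decay_interaction(seniors, population, far_apart)

# Units stacked on the same spot: everyone is exposed to the global mix
same_place = [(0.0, 0.0)] * 3
mixed = distance_decay_interaction(seniors, population, same_place)
print(round(isolated, 3), round(mixed, 3))
```

When units are very far apart the index collapses to the a-spatial interaction value, and when they coincide it equals the overall share of the rest of the population. Spatial and a-spatial results that lie close together, as here, suggest that neighbouring cells have a similar composition.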

In [9]:
dint1 = seg.singlegroup.DistanceDecayInteraction(data = zensus_bremen_grid_copy, group_pop_var = single_var_1, total_pop_var = total_pop_var)
dint2 = seg.singlegroup.DistanceDecayInteraction(data = zensus_bremen_grid_copy, group_pop_var = single_var_2, total_pop_var = total_pop_var)
    
print("Interaction (spatial) of age category %s: %.2f and Interaction (spatial) of age category %s: %.2f"
      % (single_var_1, dint1.statistic, single_var_2, dint2.statistic))

Interaction (spatial) of age category alk_65_over: 0.76 and Interaction (spatial) of age category alk_18_under: 0.86


Finally, to get a sense of the distribution of the sub-groups in space, we can display the percentage of each unit area's population that belongs to a sub-group. This is exemplified for seniors below.
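As a small sketch of what the map cell below does, the code here computes percentage shares with a guard for empty units and then assigns each share to one of k equal-width classes, which is the idea behind an equal-interval colour scheme. The choice of k = 5, the helper names, and the toy counts are assumptions for illustration. The classifier used by the plotting call may place class boundaries slightly differently.

```python
from bisect import bisect_left


def percentage_share(group_counts, total_counts):
    """Percentage of each unit's population in the sub-group (0 for empty units)."""
    return [100 * x / t if t != 0 else 0.0 for x, t in zip(group_counts, total_counts)]


def equal_interval_classes(values, k=5):
    """Assign each value to one of k equal-width classes, numbered 0 .. k-1."""
    low, high = min(values), max(values)
    width = (high - low) / k
    if width == 0:
        return [0 for _ in values]  # all values identical: a single class
    upper_bounds = [low + width * (i + 1) for i in range(k)]
    # A value falls in the first class whose upper bound is >= the value;
    # min(...) guards against floating-point overshoot at the maximum
    return [min(bisect_left(upper_bounds, v), k - 1) for v in values]


# Made-up counts for four unit areas (hypothetical data)
seniors = [0, 2, 10, 20]
population = [10, 20, 20, 20]
shares = percentage_share(seniors, population)
classes = equal_interval_classes(shares)
print(shares, classes)
```

With equal intervals, a few cells with very high shares can push most cells into the lowest classes, which is worth keeping in mind when reading the map.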

In [10]:
# Share of seniors in each cell, in percent (cells with zero population get 0)
zensus_bremen_grid_copy[single_var_1 + "_perc"] = [
    x * 100 / y if y != 0 else 0
    for (x, y) in zip(zensus_bremen_grid_copy[single_var_1], zensus_bremen_grid_copy[total_pop_var])
]

m = zensus_bremen_grid_copy.explore(height=500, width=1000, name="Seniors (65 and over)",
                                    column=single_var_1 + "_perc", scheme="EqualInterval", cmap="inferno", legend=True)


folium.LayerControl().add_to(m)
m