# Introduction: From Space to Place
Thank you for checking out the code for: 

> Hogan, Bernie (2022, forthcoming) _From Social Science to Data Science_. Sage Publications. 

This notebook contains the code from the book, along with section headers and additional author notes (not in the book) to help you navigate the code. You can run this notebook in a browser by clicking the buttons below. 
    
The version uploaded to GitHub should have all the results pasted in, but the best way to follow along is to clear all outputs and start afresh. To do this in Jupyter, go to the menu and select "Kernel -> Restart Kernel and Clear all Outputs...". To do this on Google Colab, go to the menu and select "Edit -> Clear all outputs".
    
The most up-to-date version of this code can be found at https://www.github.com/berniehogan/fsstds 

Additional resources and teaching materials can be found on Sage's forthcoming website for this book. 

All code for the book and derivative code on the book's repository is released as open source under the MIT license. 
    

[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/berniehogan/fsstds/main?filepath=chapters%2FCh.14.Geography.ipynb)[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/berniehogan/fsstds/blob/main/chapters/Ch.14.Geography.ipynb)

# Kinds of spatial data 

## From a sphere to a rectangle

## Mapping places on to spaces 

### Installing GeoPandas 

In [1]:
import geopandas 
import matplotlib.pyplot as plt 
%config InlineBackend.figure_format = 'svg'

In [2]:
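# Note: geopandas.datasets was removed in GeoPandas 1.0, so this cell
# needs an earlier GeoPandas release.
world = geopandas.read_file(
    geopandas.datasets.get_path('naturalearth_lowres'))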

world.columns = [x.replace("_","-") for x in world.columns]
print(world.columns)

Index(['pop-est', 'continent', 'name', 'iso-a3', 'gdp-md-est', 'geometry'], dtype='object')


In [None]:
world.plot() 
plt.show()

In [None]:
world[world['iso-a3'] == "GBR"].plot()
plt.show()

## GeoPandas GeoDataFrames discussed

In [4]:
world[["name","iso-a3","gdp-md-est","geometry"]].head()

|   | name | iso-a3 | gdp-md-est | geometry |
|---|------|--------|------------|----------|
| 0 | Fiji | FJI | 8374.0 | MULTIPOLYGON (((180.00000 -16.06713, 180.00000... |
| 1 | Tanzania | TZA | 150600.0 | POLYGON ((33.90371 -0.95000, 34.07262 -1.05982... |
| 2 | W. Sahara | ESH | 906.5 | POLYGON ((-8.66559 27.65643, -8.66512 27.58948... |
| 3 | Canada | CAN | 1674000.0 | MULTIPOLYGON (((-122.84000 49.00000, -122.9742... |
| 4 | United States of America | USA | 18560000.0 | MULTIPOLYGON (((-122.84000 49.00000, -120.0000... |


In [5]:
(world[["name","iso-a3","gdp-md-est","geometry"]]
            .head()
            .style.to_latex(hrules=True))

'\\begin{tabular}{lllrl}\n\\toprule\n & name & iso-a3 & gdp-md-est & geometry \\\\\n\\midrule\n0 & Fiji & FJI & 8374.000000 & MULTIPOLYGON (((180 -16.06713266364245, 180 -16.5552165666392, 179.3641426619641 -16.80135407694688, 178.7250593629971 -17.01204167436804, 178.5968385951171 -16.63915, 179.0966093629971 -16.4339842775474, 179.4135093629971 -16.3790542775474, 180 -16.06713266364245)), ((178.12557 -17.50481, 178.3736 -17.33992, 178.71806 -17.62846, 178.55271 -18.15059, 177.93266 -18.28799, 177.38146 -18.16432, 177.28504 -17.72465, 177.67087 -17.38114, 178.12557 -17.50481)), ((-179.7933201090486 -16.02088225674122, -179.9173693847653 -16.5017831356494, -180 -16.5552165666392, -180 -16.06713266364245, -179.7933201090486 -16.02088225674122))) \\\\\n1 & Tanzania & TZA & 150600.000000 & POLYGON ((33.90371119710453 -0.9500000000000001, 34.07261999999997 -1.059819999999945, 37.69868999999994 -3.096989999999948, 37.7669 -3.67712, 39.20222 -4.67677, 38.74053999999995 -5.908949999999948, 38

In [None]:
world['gdp-per-cap'] = world['gdp-md-est'] / world['pop-est']

world_na = world[world.name!="Antarctica"]

world_na.plot(column='gdp-per-cap')

plt.show()
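
A note on units: in the Natural Earth data, `gdp-md-est` is an estimate in millions of US dollars, so the ratio computed above is in millions of dollars per person, which is why the values are so small. The sketch below rescales to dollars per person. It uses Fiji's GDP figure from the table above together with a made-up population; the `pop_est` value here is hypothetical and not taken from the dataset.

```python
# GDP estimate for Fiji in millions of US dollars (from the table above).
gdp_md_est = 8374.0
# Hypothetical population, for illustration only.
pop_est = 900_000

# Millions of dollars per person (what the cell above computes).
gdp_per_cap_millions = gdp_md_est / pop_est
# Dollars per person, after converting millions to units.
gdp_per_cap_dollars = gdp_md_est * 1_000_000 / pop_est

print(round(gdp_per_cap_millions, 4))
print(round(gdp_per_cap_dollars, 2))
```

Rescaling does not change the map's colour ordering, only the numbers shown in the legend.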

In [67]:
from mpl_toolkits.axes_grid1 import make_axes_locatable

In [None]:
fig, ax = plt.subplots()

divider = make_axes_locatable(ax)

cax = divider.append_axes("right", size="5%", pad=0.1)

world_na.plot(column='gdp-per-cap', ax=ax, legend=True, cax=cax); 
plt.show()

In [None]:
# Column names use hyphens here because underscores were replaced above.
world_na.sort_values("gdp-per-cap")\
    [["name","iso-a3","pop-est","gdp-md-est","gdp-per-cap"]].tail()

## Splitting the data into intervals using `mapclassify` 

In [None]:
world_na.plot(column='gdp-per-cap', 
              scheme='quantiles',
              cmap='YlGn',
              legend =True);

plt.show()

In [27]:
import mapclassify as mc 

print(mc.Quantiles(world["gdp-per-cap"], k=5), sep="\n\n")

Quantiles           

  Interval     Count
--------------------
[0.00, 0.00] |    36
(0.00, 0.01] |    35
(0.01, 0.02] |    35
(0.02, 0.04] |    35
(0.04, 0.20] |    36
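
To see roughly what a quantile scheme does, the sketch below computes five-class break points with the standard library and counts the observations in each class. The values are made up, and `statistics.quantiles` with `method="inclusive"` is used as a stand-in; it is not claimed to reproduce the `mapclassify` output exactly.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles

# Hypothetical, right-skewed values (not from the dataset).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40, 80, 160, 320]

k = 5
# k - 1 interior cut points that split the data into k equal-count groups.
breaks = quantiles(values, n=k, method="inclusive")
# The last class is closed by the maximum value.
upper_bounds = breaks + [max(values)]

# Each value goes to the first class whose upper bound is >= the value,
# so intervals are closed on the right.
counts = Counter(bisect_left(upper_bounds, v) for v in values)

for i, upper in enumerate(upper_bounds):
    print(f"class {i}: upper bound {upper:6.1f}, count {counts[i]}")
```

Every class ends up with the same number of observations even though the class widths differ greatly, which is the defining trade-off of a quantile scheme.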


## Plotting points

In [None]:
cities = geopandas.read_file(
    geopandas.datasets.get_path('naturalearth_cities'))

cities.head()

In [None]:
ax = world_na.plot() 

cities.plot(marker = "*", color = "black", markersize = 5, ax=ax)

plt.show()

# Creating your own GeoDataFrame

In [31]:
import pandas as pd 

In [None]:
landmarks = pd.DataFrame({"name":["Greenville"],
             "lat":[33.398],"long":[-91.048]})

# points_from_xy takes x (longitude) first, then y (latitude)
gdf = geopandas.GeoDataFrame(landmarks, 
        geometry=geopandas.points_from_xy(landmarks.long,landmarks.lat))

display(gdf)

In [None]:
ax = world_na.plot(color="grey") 

gdf.plot(color="black", ax=ax)

plt.show()

## Loading your own maps

In [38]:
from pathlib import Path

In [None]:
data_dir = Path.cwd().parent / "data" 
gbshp = geopandas.read_file(data_dir / "gadm36_GBR_shp" / "gadm36_GBR_1.shp")
gbshp.plot()

plt.show()

In [42]:
gbshp = geopandas.read_file(data_dir / "gadm36_GBR_shp" / "gadm36_GBR_3.shp")

gbshp.columns = [x.replace("_","-") for x in gbshp.columns]

print(gbshp.columns)

Index(['GID-0', 'NAME-0', 'GID-1', 'NAME-1', 'NL-NAME-1', 'GID-2', 'NAME-2',
       'NL-NAME-2', 'GID-3', 'NAME-3', 'VARNAME-3', 'NL-NAME-3', 'TYPE-3',
       'ENGTYPE-3', 'CC-3', 'HASC-3', 'geometry'],
      dtype='object')


In [None]:
gbshp[["NAME-1","GID-3","NAME-3","geometry"]].sample(5,random_state=7)

## Linking maps to other data sources 

In [52]:
covid_df = pd.read_csv(data_dir / "covid19db-epidemiology-GBR_PHE.csv.bz2")

print(len(covid_df))
display(covid_df.loc[0])

260119


source                       GBR_PHE
date                      14-09-2021
country               United Kingdom
countrycode                      GBR
adm_area_1                   England
adm_area_2                 Hampshire
adm_area_3                   Gosport
tested                           NaN
confirmed                     6949.0
recovered                        NaN
dead                           158.0
hospitalised                     NaN
hospitalised_icu                 NaN
quarantined                      NaN
gid                 ['GBR.1.38.5_1']
Name: 0, dtype: object

In [54]:
cdf = covid_df[covid_df.date=="14-09-2021"].copy()
print(len(cdf),len(gbshp[gbshp['NAME-1'] == "England"]))

315 326


In [55]:
import ast

In [56]:
try: 
    cdf.gid = cdf.gid.map(ast.literal_eval)
    print(len(cdf.gid.sum()))
except ValueError: 
    print("You can only convert the data once")

326
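
The `gid` column is read from the CSV as text, so each cell holds the string representation of a list rather than a list. A minimal sketch of why the conversion works once and then fails on a second run, using the value from the first row shown above:

```python
import ast

raw = "['GBR.1.38.5_1']"         # what pandas reads from the CSV: a string
parsed = ast.literal_eval(raw)    # safely evaluates the literal into a list

print(type(raw).__name__, type(parsed).__name__)

# A second pass receives a list instead of a string and raises ValueError,
# which is the case the try/except above guards against.
try:
    ast.literal_eval(parsed)
    second_pass_failed = False
except ValueError:
    second_pass_failed = True

print(second_pass_failed)
```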


In [58]:
cdf["lengids"]= cdf.gid.map(len)
cdf["ndead"] = cdf["dead"] / cdf["lengids"]
cdf["nconfirmed"] = cdf["confirmed"] / cdf["lengids"]

cdf_ex = cdf[["gid","ndead","nconfirmed"]].explode("gid")

# Some data integrity checks 
print(len(gbshp[gbshp["NAME-1"] == "England"]) == len(cdf_ex))
print(cdf["confirmed"].sum() == cdf_ex["nconfirmed"].sum())

True
True
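
The cell above splits each record's counts evenly across the map areas it covers and then gives each area its own row. A small sketch of the same pattern on made-up data (the identifiers and counts below are hypothetical):

```python
import pandas as pd

# Hypothetical records: the second one covers two map areas.
df = pd.DataFrame({
    "gid": [["A_1"], ["B_1", "B_2"]],
    "confirmed": [100.0, 300.0],
})

# Share each record's count equally among its areas.
df["lengids"] = df["gid"].map(len)
df["nconfirmed"] = df["confirmed"] / df["lengids"]

# One row per area; the shared value is repeated on each new row.
exploded = df[["gid", "nconfirmed"]].explode("gid")

print(exploded)
print(exploded["nconfirmed"].sum() == df["confirmed"].sum())
```

The totals match exactly here because the divisions are exact. With real data, floating-point rounding can make an `==` comparison fragile, so `math.isclose` may be a safer check.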


In [59]:
cdf_ex["cfr"] = cdf_ex['ndead'] / cdf_ex['nconfirmed']

print(cdf_ex["cfr"].describe())

count    326.000000
mean       0.020349
std        0.005692
min        0.008807
25%        0.016426
50%        0.019581
75%        0.023528
max        0.045195
Name: cfr, dtype: float64


In [None]:
cdf_ex["cfr"].plot(kind="hist");
plt.show()

In [None]:
fig, ax = plt.subplots() 

cdf_ex["cfr"].plot(kind="hist",bins=20,
                   color="lightgrey",label="Case-Fatality Ratio") 

res_fj = mc.FisherJenks(cdf_ex.cfr)
ax.axvline(res_fj.bins[0],color="black",label="Fisher Jenks")
for i in res_fj.bins[1:]: 
    ax.axvline(i,color="black")

res_q = mc.Quantiles(cdf_ex.cfr)
ax.axvline(res_q.bins[0],linestyle=":",color="red",label="Quantiles")
for i in res_q.bins[1:]: 
    ax.axvline(i,linestyle=":", color="red")

ax.legend()

plt.show()

In [65]:
eng_covid_df = gbshp.merge(cdf_ex,
                     left_on="GID-3",right_on="gid")
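
The merge above uses the default inner join, so rows of `gbshp` without a matching `gid` (presumably the areas outside England) are dropped without warning. One way to inspect what fails to match, sketched here on small made-up frames rather than the real data, is an outer join with `indicator=True`:

```python
import pandas as pd

# Hypothetical stand-ins for the map rows and the exploded statistics.
shapes = pd.DataFrame({"GID-3": ["A_1", "B_1", "B_2", "C_1"]})
stats = pd.DataFrame({"gid": ["A_1", "B_1", "B_2"],
                      "cfr": [0.02, 0.03, 0.03]})

# The outer join keeps unmatched rows from both sides; the _merge column
# records whether each row came from the left, the right, or both.
check = shapes.merge(stats, left_on="GID-3", right_on="gid",
                     how="outer", indicator=True)

print(check["_merge"].value_counts())

unmatched = check[check["_merge"] != "both"]
print(unmatched)
```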

In [None]:
ax = eng_covid_df.plot(column="cfr", 
         scheme="fisher_jenks", legend=True)

leg = ax.get_legend()
leg.set_bbox_to_anchor((0., 0.5, 1.5, 0.))
ax.set_axis_off()

plt.show()

# Summary

# Further topics and reading

# Extending and reflecting