<center><img src="https://github.com/DACSS-CSSmeths/guidelines/blob/main/pics/small_logo_ccs_meths.jpg?raw=true" width="700"></center>

## Geo Merging

Remember we have these maps:

In [1]:
import geopandas as gpd

mainLink='https://github.com/DACSS-CSSmeths/Spatial-Exploring/raw/refs/heads/main/'
mapsLink=mainLink+'worldMaps_Py.gpkg'

gpd.list_layers(mapsLink)

Unnamed: 0,name,geometry_type
0,countries_poly,MultiPolygon
1,rivers_line,MultiLineString
2,cities_point,Point


Let's see what the polygons have:

In [2]:
world=gpd.read_file(mapsLink, layer='countries_poly')
world.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 252 entries, 0 to 251
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   COUNTRY   252 non-null    object  
 1   geometry  252 non-null    geometry
dtypes: geometry(1), object(1)
memory usage: 4.1+ KB


This map has no interesting information beyond the geometry, which, as we know, serves to plot the polygons.

This section is about adding columns to the geodataframe, so that we can do more than plotting the polygons.

Let me bring some data on country fragility from this [study](https://fragilestatesindex.org/indicators/). I have saved the data in the GitHub repo of this session, too. Let's see those indicators per country for 2023:

In [28]:
import pandas as pd


fragilityLink=mainLink+'dataFragility/fragility2023.csv'

fragility=pd.read_csv(fragilityLink)

fragility.head()

Unnamed: 0,Country,Year,Total,C1_SecurityApparatus,C2_FactionalizedElites,C3_GroupGrievance,E1_Economy,E2_EconomicInequality,E3_HumanFlightandBrainDrain,P1_StateLegitimacy,P2_PublicServices,P3_HumanRights,S1_DemographicPressures,S2_RefugeesandIDPs,X1_ExternalIntervention,iso2,iso3
0,SOMALIA,2023,111.9,9.5,10.0,8.7,9.5,9.1,8.6,9.6,9.8,9.0,10.0,9.0,9.1,SO,SOM
1,CONGO DEMOCRATIC REPUBLIC,2023,107.2,8.8,9.6,9.4,8.1,8.4,6.4,9.3,9.3,9.3,9.7,9.8,9.1,CD,COD
2,SUDAN,2023,106.2,8.3,9.6,9.3,9.3,8.5,7.5,9.4,8.6,9.2,8.8,9.6,8.1,SD,SDN
3,SOUTH SUDAN,2023,108.5,9.9,9.2,8.6,8.6,8.6,6.5,9.8,9.7,8.7,9.7,10.0,9.2,SS,SSD
4,CHAD,2023,104.6,8.7,9.5,8.1,8.4,8.7,7.7,9.1,9.6,8.4,9.5,9.0,7.9,TD,TCD


As you see, you have several variables:

In [4]:
fragility.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 179 entries, 0 to 178
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Country                      179 non-null    object 
 1   Year                         179 non-null    int64  
 2   Total                        179 non-null    float64
 3   C1_SecurityApparatus         179 non-null    float64
 4   C2_FactionalizedElites       179 non-null    float64
 5   C3_GroupGrievance            179 non-null    float64
 6   E1_Economy                   179 non-null    float64
 7   E2_EconomicInequality        179 non-null    float64
 8   E3_HumanFlightandBrainDrain  179 non-null    float64
 9   P1_StateLegitimacy           179 non-null    float64
 10  P2_PublicServices            179 non-null    float64
 11  P3_HumanRights               179 non-null    float64
 12  S1_DemographicPressures      179 non-null    float64
 13  S2_RefugeesandIDPs  

These indicators are explained [here](https://fragilestatesindex.org/indicators/); and at the end, you have the ISO codes for the countries.

# Basic merging

For the merging process, we need a common column. The country name is the option.

In [5]:
world.head()

Unnamed: 0,COUNTRY,geometry
0,Aruba (Netherlands),"MULTIPOLYGON (((-69.88223 12.41111, -69.94695 ..."
1,Antigua and Barbuda,"MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ..."
2,Afghanistan,"MULTIPOLYGON (((61.27656 35.60725, 61.29638 35..."
3,Algeria,"MULTIPOLYGON (((-5.15213 30.18047, -5.13917 30..."
4,Azerbaijan,"MULTIPOLYGON (((46.54037 38.87559, 46.49554 38..."


The map has the country names in mixed case, while the fragility data has them in upper case; then:

In [6]:
# to upper case.
world['COUNTRY']=world['COUNTRY'].str.upper()

When you add data to the map, the map (the GeoDataFrame) has to call the merge, so that the result is still a GeoDataFrame:

In [7]:
world.merge(fragility, left_on='COUNTRY', right_on='Country')

Unnamed: 0,COUNTRY,geometry,Country,Year,Total,C1_SecurityApparatus,C2_FactionalizedElites,C3_GroupGrievance,E1_Economy,E2_EconomicInequality,E3_HumanFlightandBrainDrain,P1_StateLegitimacy,P2_PublicServices,P3_HumanRights,S1_DemographicPressures,S2_RefugeesandIDPs,X1_ExternalIntervention,iso2,iso3
0,ANTIGUA AND BARBUDA,"MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ...",ANTIGUA AND BARBUDA,2023,53.8,4.9,3.7,3.6,6.6,5.1,6.2,3.6,3.8,3.8,3.7,2.7,6.1,AG,ATG
1,AFGHANISTAN,"MULTIPOLYGON (((61.27656 35.60725, 61.29638 35...",AFGHANISTAN,2023,106.6,9.7,8.7,8.3,9.6,8.2,8.5,9.4,10.0,8.7,9.2,8.6,7.7,AF,AFG
2,ALGERIA,"MULTIPOLYGON (((-5.15213 30.18047, -5.13917 30...",ALGERIA,2023,70.0,5.8,6.9,7.0,6.2,5.2,5.1,7.6,5.0,6.9,5.0,6.2,3.1,DZ,DZA
3,AZERBAIJAN,"MULTIPOLYGON (((46.54037 38.87559, 46.49554 38...",AZERBAIJAN,2023,72.7,5.8,7.9,5.9,4.5,4.5,4.6,9.2,4.9,7.5,3.8,6.3,7.8,AZ,AZE
4,ALBANIA,"MULTIPOLYGON (((20.79192 40.43154, 20.78722 40...",ALBANIA,2023,56.8,4.8,6.2,3.5,6.1,2.9,8.5,5.0,3.8,3.6,4.1,2.8,5.5,AL,ALB
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
163,YEMEN,"MULTIPOLYGON (((48.68639 14.0375, 48.61 14.044...",YEMEN,2023,108.9,8.6,9.9,8.8,9.9,7.9,6.4,9.8,9.6,9.6,9.6,9.6,9.2,YE,YEM
164,ZAMBIA,"MULTIPOLYGON (((30.21302 -14.98172, 30.21917 -...",ZAMBIA,2023,81.8,3.9,5.6,5.9,8.2,9.1,6.3,6.7,8.1,7.5,9.4,4.9,6.2,ZM,ZMB
165,ZIMBABWE,"MULTIPOLYGON (((32.48888 -21.34445, 32.46541 -...",ZIMBABWE,2023,96.9,8.4,10.0,5.9,9.2,7.8,7.1,8.9,8.8,7.8,8.7,7.6,6.7,ZW,ZWE
166,SOUTH SUDAN,"MULTIPOLYGON (((34.21807 9.96458, 34.20722 9.9...",SOUTH SUDAN,2023,108.5,9.9,9.2,8.6,8.6,8.6,6.5,9.8,9.7,8.7,9.7,10.0,9.2,SS,SSD


Notice that the merge produces 168 rows, while the map originally has:

In [8]:
# amount of rows in map
world.shape[0]

252

The fragility data has this number of rows:

In [9]:
fragility.shape[0]

179

The default merge is an inner join: it keeps only the rows whose country names match exactly, so we lose some rows. Let's see if we can recover some of them.
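A quick way to see which rows a merge drops is to ask pandas for an outer merge with `indicator=True`. This is only a sketch with two tiny made-up tables (the names `toyMap` and `toyData`, and the values in them, are for illustration and are not part of our data):

```python
import pandas as pd

# toy tables, made up for illustration only
toyMap = pd.DataFrame({'COUNTRY': ['PERU', 'BRUNEI', 'ANDORRA']})
toyData = pd.DataFrame({'Country': ['PERU', 'BRUNEI DARUSSALAM'],
                        'Total': [70.0, 55.0]})

# default merge (inner join): only exact matches survive
inner = toyMap.merge(toyData, left_on='COUNTRY', right_on='Country')

# outer join + indicator: the '_merge' column says where each row came from
check = toyMap.merge(toyData, left_on='COUNTRY', right_on='Country',
                     how='outer', indicator=True)

print(inner.shape[0])
print(check['_merge'].value_counts())
```

The rows flagged as `left_only` or `right_only` are the ones that the inner join would leave out.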

## Fuzzy merging

Let's see what is not matching:

In [29]:
onlyFragil=set(fragility.Country)- set(world.COUNTRY)
onlyMap=set(world.COUNTRY)- set(fragility.Country)

Check here:

In [30]:
onlyFragil

{'BRUNEI DARUSSALAM',
 'CONGO DEMOCRATIC REPUBLIC',
 'CONGO REPUBLIC',
 "COTE D'IVOIRE",
 'ESWATINI',
 'GUINEA BISSAU',
 'KYRGYZ REPUBLIC',
 'MICRONESIA',
 'SAMOA',
 'SLOVAK REPUBLIC',
 'TIMOR-LESTE'}

In [31]:
# and here
onlyMap

{'AMERICAN SAMOA (US)',
 'AMERICAN VIRGIN ISLANDS (US)',
 'ANDORRA',
 'ANGUILLA (UK)',
 'ANTARCTICA',
 'ARUBA (NETHERLANDS)',
 'BAKER ISLAND (US)',
 'BERMUDA (UK)',
 'BONAIRE (NETHERLANDS)',
 'BOUVET ISLAND (NORWAY)',
 'BRITISH INDIAN OCEAN TERRITORY (UK)',
 'BRITISH VIRGIN ISLANDS(UK)',
 'BRUNEI',
 'CAYMAN ISLANDS (UK)',
 'CHRISTMAS ISLAND (AUSTRALIA)',
 'COCOS (KEELING) ISLANDS (AUSTRALIA)',
 'CONGO',
 'COOK ISLANDS (NEW ZEALAND)',
 'CURACAO (NETHERLANDS)',
 'DEMOCRATIC REPUBLIC OF THE CONGO',
 'DOMINICA',
 'EAST TIMOR',
 'FALKLAND ISLANDS (UK)',
 'FAROE ISLANDS (DENMARK)',
 'FEDERATED STATES OF MICRONESIA',
 'FRENCH GUIANA (FRANCE)',
 'FRENCH POLYNESIA (FRANCE)',
 'FRENCH SOUTHERN & ANTARCTIC LANDS (FRANCE)',
 'GIBRALTAR (UK)',
 'GLORIOSO ISLANDS (FRANCE)',
 'GREENLAND (DENMARK)',
 'GUADELOUPE (FRANCE)',
 'GUAM (US)',
 'GUERNSEY (UK)',
 'GUINEA-BISSAU',
 'HEARD ISLAND & MCDONALD ISLANDS (AUSTRALIA)',
 'HOWLAND ISLAND (US)',
 'ISLE OF MAN (UK)',
 'IVORY COAST',
 'JAN MAYEN (NORWAY)',

Let's find similar names:

In [32]:
from thefuzz import process

[(country, process.extractOne(country,onlyMap)) for country in sorted(onlyFragil)]

[('BRUNEI DARUSSALAM', ('BRUNEI', 90)),
 ('CONGO DEMOCRATIC REPUBLIC', ('DEMOCRATIC REPUBLIC OF THE CONGO', 95)),
 ('CONGO REPUBLIC', ('CONGO', 90)),
 ("COTE D'IVOIRE", ('IVORY COAST', 63)),
 ('ESWATINI', ('MARTINIQUE (FRANCE)', 60)),
 ('GUINEA BISSAU', ('GUINEA-BISSAU', 100)),
 ('KYRGYZ REPUBLIC', ('DEMOCRATIC REPUBLIC OF THE CONGO', 86)),
 ('MICRONESIA', ('FEDERATED STATES OF MICRONESIA', 90)),
 ('SAMOA', ('AMERICAN SAMOA (US)', 90)),
 ('SLOVAK REPUBLIC', ('DEMOCRATIC REPUBLIC OF THE CONGO', 86)),
 ('TIMOR-LESTE', ('EAST TIMOR', 81))]
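To get a feeling for what a similarity score is, here is a small sketch that uses `difflib` from the Python standard library as a stand-in. It is not the scorer that **thefuzz** uses, so the numbers will differ from the ones above; the helper `best_match` and the short list of candidates are hypothetical, just for illustration:

```python
from difflib import SequenceMatcher

def best_match(name, candidates):
    """Return (candidate, score) for the most similar candidate; score goes from 0 to 100."""
    scored = [(cand, round(100 * SequenceMatcher(None, name, cand).ratio()))
              for cand in candidates]
    # keep the pair with the highest score
    return max(scored, key=lambda pair: pair[1])

# a few names taken from the sets above
candidates = ['GUINEA-BISSAU', 'IVORY COAST', 'BRUNEI']
print(best_match('GUINEA BISSAU', candidates))
```

As with **thefuzz**, a high score is only a suggestion: you still have to read the pairs and decide if they are really the same country.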

Let's change the difficult ones manually:

In [33]:
changesDict={'Country':{'ESWATINI': 'SWAZILAND'}}

fragility.replace(changesDict,inplace=True)
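The nested dictionary tells pandas in which column to look: `{column: {old value: new value}}`. A minimal sketch with a made-up table (the `toy` data frame and its `Note` column are only for illustration):

```python
import pandas as pd

# toy table: the same text appears in two columns
toy = pd.DataFrame({'Country': ['ESWATINI', 'PERU'],
                    'Note': ['ESWATINI', 'ok']})

# only the 'Country' column is searched and changed
toy = toy.replace({'Country': {'ESWATINI': 'SWAZILAND'}})
print(toy)
```

The value in `Note` is left alone, because the replacement was requested for `Country` only.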

In [36]:
onlyFragil=set(fragility.Country)- set(world.COUNTRY)
onlyMap=set(world.COUNTRY)- set(fragility.Country)
[(country, process.extractOne(country,onlyMap)) for country in sorted(onlyFragil) if country!='SAMOA']

[('BRUNEI DARUSSALAM', ('BRUNEI', 90)),
 ('CONGO DEMOCRATIC REPUBLIC', ('DEMOCRATIC REPUBLIC OF THE CONGO', 95)),
 ('CONGO REPUBLIC', ('CONGO', 90)),
 ("COTE D'IVOIRE", ('IVORY COAST', 63)),
 ('GUINEA BISSAU', ('GUINEA-BISSAU', 100)),
 ('KYRGYZ REPUBLIC', ('DEMOCRATIC REPUBLIC OF THE CONGO', 86)),
 ('MICRONESIA', ('FEDERATED STATES OF MICRONESIA', 90)),
 ('SLOVAK REPUBLIC', ('DEMOCRATIC REPUBLIC OF THE CONGO', 86)),
 ('TIMOR-LESTE', ('EAST TIMOR', 81))]

In [37]:
[(country, process.extractOne(country,onlyMap)) for country in sorted(onlyFragil) 
 if country!='SAMOA' and process.extractOne(country,onlyMap)[1]>=90]

[('BRUNEI DARUSSALAM', ('BRUNEI', 90)),
 ('CONGO DEMOCRATIC REPUBLIC', ('DEMOCRATIC REPUBLIC OF THE CONGO', 95)),
 ('CONGO REPUBLIC', ('CONGO', 90)),
 ('GUINEA BISSAU', ('GUINEA-BISSAU', 100)),
 ('MICRONESIA', ('FEDERATED STATES OF MICRONESIA', 90))]

In [38]:
# then:
try1={country: process.extractOne(country,onlyMap)[0] for country in sorted(onlyFragil) 
 if country!='SAMOA' and process.extractOne(country,onlyMap)[1]>=90}
try1

{'BRUNEI DARUSSALAM': 'BRUNEI',
 'CONGO DEMOCRATIC REPUBLIC': 'DEMOCRATIC REPUBLIC OF THE CONGO',
 'CONGO REPUBLIC': 'CONGO',
 'GUINEA BISSAU': 'GUINEA-BISSAU',
 'MICRONESIA': 'FEDERATED STATES OF MICRONESIA'}
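The idea of keeping only the matches above a threshold can be wrapped in a small function. This sketch uses `difflib.get_close_matches` from the standard library instead of **thefuzz** (so the cutoff goes from 0 to 1, and the scores are not the same as above); `propose_renames` is a hypothetical helper name:

```python
from difflib import get_close_matches

def propose_renames(unmatched, candidates, cutoff=0.9):
    """Build {old name: new name} using only matches at or above the cutoff."""
    renames = {}
    for name in sorted(unmatched):
        hits = get_close_matches(name, candidates, n=1, cutoff=cutoff)
        if hits:  # no hit means nothing was similar enough
            renames[name] = hits[0]
    return renames

unmatched = {'GUINEA BISSAU', "COTE D'IVOIRE"}
candidates = ['GUINEA-BISSAU', 'IVORY COAST', 'BRUNEI']
renames = propose_renames(unmatched, candidates)
print(renames)
```

Names that do not reach the cutoff stay out of the dictionary, so you can review them by hand, as we do in the next rounds.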

In [39]:
changesDict1={'Country':try1}

fragility.replace(changesDict1,inplace=True)


# updating
onlyFragil=set(fragility.Country)- set(world.COUNTRY)
onlyMap=set(world.COUNTRY)- set(fragility.Country)

In [42]:
# new matches
[(country, process.extractOne(country,onlyMap)) for country in sorted(onlyFragil) if country!='SAMOA']

[("COTE D'IVOIRE", ('IVORY COAST', 63)),
 ('KYRGYZ REPUBLIC', ('KYRGYZSTAN', 68)),
 ('SLOVAK REPUBLIC', ('SLOVAKIA', 77)),
 ('TIMOR-LESTE', ('EAST TIMOR', 81))]

In [43]:
# then:
try2={country: process.extractOne(country,onlyMap)[0] for country in sorted(onlyFragil) 
 if country!='SAMOA'}
try2

{"COTE D'IVOIRE": 'IVORY COAST',
 'KYRGYZ REPUBLIC': 'KYRGYZSTAN',
 'SLOVAK REPUBLIC': 'SLOVAKIA',
 'TIMOR-LESTE': 'EAST TIMOR'}

In [44]:
changesDict2={'Country':try2}

fragility.replace(changesDict2,inplace=True)


# updating
onlyFragil=set(fragility.Country)- set(world.COUNTRY)
onlyMap=set(world.COUNTRY)- set(fragility.Country)



In [45]:
# new matches
[(country, process.extractOne(country,onlyMap)) for country in sorted(onlyFragil)]

[('SAMOA', ('WESTERN SAMOA', 90))]

In [46]:
# then:
try3={country: process.extractOne(country,onlyMap)[0] for country in sorted(onlyFragil) }
try3

{'SAMOA': 'WESTERN SAMOA'}

Making changes and updating:

In [47]:
changesDict3={'Country':try3}

fragility.replace(changesDict3,inplace=True)


# updating
onlyFragil=set(fragility.Country)- set(world.COUNTRY)
onlyMap=set(world.COUNTRY)- set(fragility.Country)

# new matches
[(country, process.extractOne(country,onlyMap)) for country in sorted(onlyFragil)]

[]

No country from the fragility data is left unmatched, so we cannot improve the situation any further.

Now, when you merge a GDF with a DF, **the GDF has to be on the left**:

In [48]:
theMapAndData=world.merge(fragility,left_on='COUNTRY', right_on='Country')
# here it is (new map):
theMapAndData.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 179 entries, 0 to 178
Data columns (total 19 columns):
 #   Column                       Non-Null Count  Dtype   
---  ------                       --------------  -----   
 0   COUNTRY                      179 non-null    object  
 1   geometry                     179 non-null    geometry
 2   Country                      179 non-null    object  
 3   Year                         179 non-null    int64   
 4   Total                        179 non-null    float64 
 5   C1_SecurityApparatus         179 non-null    float64 
 6   C2_FactionalizedElites       179 non-null    float64 
 7   C3_GroupGrievance            179 non-null    float64 
 8   E1_Economy                   179 non-null    float64 
 9   E2_EconomicInequality        179 non-null    float64 
 10  E3_HumanFlightandBrainDrain  179 non-null    float64 
 11  P1_StateLegitimacy           179 non-null    float64 
 12  P2_PublicServices            179 non-null    float64 
 1
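Why does the GDF have to be on the left? A small sketch with a made-up map of two points (the names `toy_map` and `toy_data` are only for illustration) shows what you get in each case:

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# toy map (GeoDataFrame) and toy table (DataFrame), made up for illustration
toy_map = gpd.GeoDataFrame({'COUNTRY': ['A', 'B']},
                           geometry=[Point(0, 0), Point(1, 1)])
toy_data = pd.DataFrame({'Country': ['A', 'B'], 'Total': [10.0, 20.0]})

# the map calls the merge: the result is still a GeoDataFrame
map_left = toy_map.merge(toy_data, left_on='COUNTRY', right_on='Country')
# the table calls the merge: the result is a plain DataFrame
data_left = toy_data.merge(toy_map, left_on='Country', right_on='COUNTRY')

print(type(map_left).__name__)
print(type(data_left).__name__)
```

In the second case the geometry column is still there, but the object has lost its spatial methods (you would need to rebuild the GeoDataFrame before plotting it as a map).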

# Choropleths

We should plan how to color the polygons based on some variable:

In [57]:
theMapAndData.describe()

Unnamed: 0,Year,Total,C1_SecurityApparatus,C2_FactionalizedElites,C3_GroupGrievance,E1_Economy,E2_EconomicInequality,E3_HumanFlightandBrainDrain,P1_StateLegitimacy,P2_PublicServices,P3_HumanRights,S1_DemographicPressures,S2_RefugeesandIDPs,X1_ExternalIntervention
count,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0,179.0
mean,2023.0,65.832402,5.014525,6.618436,5.57486,5.687151,5.323464,5.184358,5.741341,5.459218,5.436872,5.955866,4.764246,5.072067
std,0.0,23.966251,2.37981,2.427869,2.367757,2.200741,2.068546,2.079591,2.901853,2.581299,2.602588,2.278726,2.373935,2.577801
min,2023.0,14.5,0.3,1.0,0.3,1.0,1.4,0.4,0.3,0.9,0.4,1.1,0.5,0.3
25%,2023.0,49.0,3.35,4.95,3.6,4.1,3.65,3.7,3.65,3.45,3.6,4.1,2.8,3.15
50%,2023.0,68.2,5.1,7.2,5.5,6.0,5.2,5.6,6.4,5.1,5.7,5.9,4.5,5.3
75%,2023.0,82.2,6.7,8.55,7.55,7.15,7.2,6.6,8.1,7.95,7.5,8.05,6.45,7.0
max,2023.0,111.9,10.0,10.0,9.7,9.9,9.6,10.0,10.0,10.0,9.9,10.0,10.0,10.0


In [66]:
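# boxplots of the indicators, plotted from a plain DataFrame (no geometry column)
indicatorCols=theMapAndData.columns[5:15]
pd.DataFrame(theMapAndData[indicatorCols]).boxplot(rot=90)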

In [56]:
import seaborn as sea

sea.histplot(data=theMapAndData, x="C1_SecurityApparatus")

<Axes: xlabel='C1_SecurityApparatus', ylabel='Count'>

Let's see other possibilities to cut the data (instead of the amount of intervals presented in the histogram), but please install [**numba**](https://numba.readthedocs.io/en/stable/user/installing.html) before running the next code; also make sure you have **pysal**, **mapclassify** and **numpy** installed:

In [23]:
pip show numba pysal mapclassify numpy

Name: numba
Version: 0.59.1
Summary: compiling Python code using LLVM
Home-page: https://numba.pydata.org
Author: 
Author-email: 
License: BSD
Location: /Users/JoseManuel/opt/anaconda3/envs/ASIES/lib/python3.11/site-packages
Requires: llvmlite, numpy
Required-by: hyppo, pynndescent, quantecon, segregation, umap-learn
---
Name: pysal
Version: 23.7
Summary: A library of spatial analysis functions.
Home-page: http://pysal.org
Author: 
Author-email: 
License: BSD
Location: /Users/JoseManuel/opt/anaconda3/envs/ASIES/lib/python3.11/site-packages
Requires: access, esda, giddy, inequality, libpysal, mapclassify, mgwr, momepy, pointpats, segregation, spaghetti, spglm, spint, splot, spopt, spreg, spvcm, tobler
Required-by: 
---
Name: mapclassify
Version: 2.6.1
Summary: Classification Schemes for Choropleth Maps.
Home-page: 
Author: 
Author-email: 
License: BSD 3-Clause
Location: /Users/JoseManuel/opt/anaconda3/envs/ASIES/lib/python3.11/site-packages
Requires: networkx, numpy, pandas, scikit-lear

In [24]:
import mapclassify 
import numpy as np

np.random.seed(12345) # so we all get the same results!

# let's try 5 intervals
K=5
theVar=theMapAndData.Total
# same interval width, easy interpretation
ei5 = mapclassify.EqualInterval(theVar, k=K)
# same interval width based on standard deviation, easy - but not as the previous one, poor when high skewness
msd = mapclassify.StdMean(theVar)
# interval width varies, counts per interval are close, not easy to grasp, repeated values complicate cuts                                
q5=mapclassify.Quantiles(theVar,k=K)

# based on similarity, good for multimodal data 
mb5 = mapclassify.MaximumBreaks(theVar, k=K)
# based on similarity, good for skewed data
ht = mapclassify.HeadTailBreaks(theVar) # no K needed
# based on similarity, optimizer
fj5 = mapclassify.FisherJenks(theVar, k=K)
# based on similarity, optimizer
jc5 = mapclassify.JenksCaspall(theVar, k=K)
# based on similarity, optimizer
mp5 = mapclassify.MaxP(theVar, k=K) 
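To see the difference between the first and the third strategy, here is a sketch that computes the cuts by hand with **numpy** on made-up, skewed values. It follows the general idea of equal intervals and quantiles; **mapclassify** may treat the edges a little differently:

```python
import numpy as np

# made-up, skewed data: nine small values and one very large
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
k = 5

# equal interval: k bins of the same width between min and max
ei_edges = np.linspace(values.min(), values.max(), k + 1)
# quantiles: edges chosen so each bin holds about the same amount of values
q_edges = np.quantile(values, np.linspace(0, 1, k + 1))

# label each value with its bin (0 to k-1), using the interior edges
ei_labels = np.digitize(values, ei_edges[1:-1], right=True)
q_labels = np.digitize(values, q_edges[1:-1], right=True)

# how many values fall in each bin
print(np.bincount(ei_labels, minlength=k))
print(np.bincount(q_labels, minlength=k))
```

With equal intervals, the outlier leaves almost everything in the first bin; with quantiles, every bin gets the same count, but the widths are very different.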

How can we select the right classification?
Let me use the Absolute Deviation around Class Median (ADCM) to make the comparison (the lower the ADCM, the better the classes fit the data):

In [25]:
class5 = ei5,msd, q5,mb5,  ht, fj5, jc5, mp5
# Collect ADCM for each classifier
fits = np.array([ c.adcm for c in class5])
# Convert ADCM scores to a DataFrame
adcms = pd.DataFrame(fits)
# Add classifier names
adcms['classifier'] = [c.name for c in class5]
# Add column names to the ADCM
adcms.columns = ['ADCM', 'Classifier']
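If you wonder what the ADCM measures, this sketch computes it by hand from its definition (the sum of the absolute differences between each value and the median of its class). The function `adcm` and the toy values are my own illustration; I assume the `adcm` attribute in **mapclassify** follows the same definition:

```python
import numpy as np

def adcm(values, labels):
    """Sum of absolute deviations of each value around the median of its class."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for lab in np.unique(labels):
        group = values[labels == lab]
        total += np.abs(group - np.median(group)).sum()
    return total

values = [1, 2, 3, 10, 11, 12]
good = [0, 0, 0, 1, 1, 1]   # classes that follow the two clusters
bad = [0, 0, 1, 1, 1, 1]    # the value 3 is sent to the wrong class

print(adcm(values, good), adcm(values, bad))
```

The classification that respects the clusters gets the smaller ADCM.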

Now, plot the **adcms**:

In [26]:
adcms.sort_values('ADCM').plot.barh(x='Classifier')

<Axes: ylabel='Classifier'>

Let's save the best three strategies:

In [27]:
theMapAndData.loc[:,'Total_ei5'] = ei5.yb
theMapAndData.loc[:,'Total_fj5'] = fj5.yb
theMapAndData.loc[:,'Total_jc5'] = jc5.yb

In [28]:
# there you are
theMapAndData.head()

Unnamed: 0,COUNTRY,geometry,Country,Year,Total,C1_SecurityApparatus,C2_FactionalizedElites,C3_GroupGrievance,E1_Economy,E2_EconomicInequality,...,P2_PublicServices,P3_HumanRights,S1_DemographicPressures,S2_RefugeesandIDPs,X1_ExternalIntervention,iso2,iso3,Total_ei5,Total_fj5,Total_jc5
0,ANTIGUA AND BARBUDA,"MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ...",ANTIGUA AND BARBUDA,2023,53.8,4.9,3.7,3.6,6.6,5.1,...,3.8,3.8,3.7,2.7,6.1,AG,ATG,2,1,1
1,AFGHANISTAN,"MULTIPOLYGON (((61.27656 35.60725, 61.29638 35...",AFGHANISTAN,2023,106.6,9.7,8.7,8.3,9.6,8.2,...,10.0,8.7,9.2,8.6,7.7,AF,AFG,4,4,4
2,ALGERIA,"MULTIPOLYGON (((-5.15213 30.18047, -5.13917 30...",ALGERIA,2023,70.0,5.8,6.9,7.0,6.2,5.2,...,5.0,6.9,5.0,6.2,3.1,DZ,DZA,2,2,2
3,AZERBAIJAN,"MULTIPOLYGON (((46.54037 38.87559, 46.49554 38...",AZERBAIJAN,2023,72.7,5.8,7.9,5.9,4.5,4.5,...,4.9,7.5,3.8,6.3,7.8,AZ,AZE,2,2,2
4,ALBANIA,"MULTIPOLYGON (((20.79192 40.43154, 20.78722 40...",ALBANIA,2023,56.8,4.8,6.2,3.5,6.1,2.9,...,3.8,3.6,4.1,2.8,5.5,AL,ALB,2,2,1


Let's check the mean of 'Total' for each label (from 0 to 4) of the columns created:

In [29]:
indexList=['Total_ei5','Total_fj5','Total_jc5']
aggregator={'Total': ['mean']}

pd.concat([theMapAndData[['Total',col]].groupby(col,as_index=False).agg(aggregator) for col in indexList],axis=1)

Unnamed: 0,Total_ei5,Total (mean),Total_fj5,Total (mean),Total_jc5,Total (mean)
0,0,22.854545,0,23.356522,0,23.356522
1,1,44.793333,1,46.771429,1,47.318919
2,2,64.109804,2,65.377778,2,66.102222
3,3,82.029412,3,80.406977,3,80.555
4,4,100.8,4,98.2875,4,98.00303


Verify data types:

In [30]:
theMapAndData.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 22 columns):
 #   Column                       Non-Null Count  Dtype   
---  ------                       --------------  -----   
 0   COUNTRY                      178 non-null    object  
 1   geometry                     178 non-null    geometry
 2   Country                      178 non-null    object  
 3   Year                         178 non-null    int64   
 4   Total                        178 non-null    float64 
 5   C1_SecurityApparatus         178 non-null    float64 
 6   C2_FactionalizedElites       178 non-null    float64 
 7   C3_GroupGrievance            178 non-null    float64 
 8   E1_Economy                   178 non-null    float64 
 9   E2_EconomicInequality        178 non-null    float64 
 10  E3_HumanFlightandBrainDrain  178 non-null    float64 
 11  P1_StateLegitimacy           178 non-null    float64 
 12  P2_PublicServices            178 non-null    float64 
 1

Let me create a copy of those columns with new names:

In [31]:
newColNames=[ name+"_cat" for name in indexList]

theMapAndData[newColNames]=theMapAndData.loc[:,indexList]
theMapAndData.head()

Unnamed: 0,COUNTRY,geometry,Country,Year,Total,C1_SecurityApparatus,C2_FactionalizedElites,C3_GroupGrievance,E1_Economy,E2_EconomicInequality,...,S2_RefugeesandIDPs,X1_ExternalIntervention,iso2,iso3,Total_ei5,Total_fj5,Total_jc5,Total_ei5_cat,Total_fj5_cat,Total_jc5_cat
0,ANTIGUA AND BARBUDA,"MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ...",ANTIGUA AND BARBUDA,2023,53.8,4.9,3.7,3.6,6.6,5.1,...,2.7,6.1,AG,ATG,2,1,1,2,1,1
1,AFGHANISTAN,"MULTIPOLYGON (((61.27656 35.60725, 61.29638 35...",AFGHANISTAN,2023,106.6,9.7,8.7,8.3,9.6,8.2,...,8.6,7.7,AF,AFG,4,4,4,4,4,4
2,ALGERIA,"MULTIPOLYGON (((-5.15213 30.18047, -5.13917 30...",ALGERIA,2023,70.0,5.8,6.9,7.0,6.2,5.2,...,6.2,3.1,DZ,DZA,2,2,2,2,2,2
3,AZERBAIJAN,"MULTIPOLYGON (((46.54037 38.87559, 46.49554 38...",AZERBAIJAN,2023,72.7,5.8,7.9,5.9,4.5,4.5,...,6.3,7.8,AZ,AZE,2,2,2,2,2,2
4,ALBANIA,"MULTIPOLYGON (((20.79192 40.43154, 20.78722 40...",ALBANIA,2023,56.8,4.8,6.2,3.5,6.1,2.9,...,2.8,5.5,AL,ALB,2,2,1,2,2,1


In [32]:
# renaming
newLabelsForLevels={0:"0_Great", 1:"1_Good", 2:"2_Middle", 3:"3_Bad", 4:"4_Poor"}

theMapAndData[newColNames]=theMapAndData.loc[:,newColNames].replace(newLabelsForLevels)
theMapAndData.drop(columns=['Country'],inplace=True)
theMapAndData

Unnamed: 0,COUNTRY,geometry,Year,Total,C1_SecurityApparatus,C2_FactionalizedElites,C3_GroupGrievance,E1_Economy,E2_EconomicInequality,E3_HumanFlightandBrainDrain,...,S2_RefugeesandIDPs,X1_ExternalIntervention,iso2,iso3,Total_ei5,Total_fj5,Total_jc5,Total_ei5_cat,Total_fj5_cat,Total_jc5_cat
0,ANTIGUA AND BARBUDA,"MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ...",2023,53.8,4.9,3.7,3.6,6.6,5.1,6.2,...,2.7,6.1,AG,ATG,2,1,1,2_Middle,1_Good,1_Good
1,AFGHANISTAN,"MULTIPOLYGON (((61.27656 35.60725, 61.29638 35...",2023,106.6,9.7,8.7,8.3,9.6,8.2,8.5,...,8.6,7.7,AF,AFG,4,4,4,4_Poor,4_Poor,4_Poor
2,ALGERIA,"MULTIPOLYGON (((-5.15213 30.18047, -5.13917 30...",2023,70.0,5.8,6.9,7.0,6.2,5.2,5.1,...,6.2,3.1,DZ,DZA,2,2,2,2_Middle,2_Middle,2_Middle
3,AZERBAIJAN,"MULTIPOLYGON (((46.54037 38.87559, 46.49554 38...",2023,72.7,5.8,7.9,5.9,4.5,4.5,4.6,...,6.3,7.8,AZ,AZE,2,2,2,2_Middle,2_Middle,2_Middle
4,ALBANIA,"MULTIPOLYGON (((20.79192 40.43154, 20.78722 40...",2023,56.8,4.8,6.2,3.5,6.1,2.9,8.5,...,2.8,5.5,AL,ALB,2,2,1,2_Middle,2_Middle,1_Good
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,ZAMBIA,"MULTIPOLYGON (((30.21302 -14.98172, 30.21917 -...",2023,81.8,3.9,5.6,5.9,8.2,9.1,6.3,...,4.9,6.2,ZM,ZMB,3,3,3,3_Bad,3_Bad,3_Bad
174,ZIMBABWE,"MULTIPOLYGON (((32.48888 -21.34445, 32.46541 -...",2023,96.9,8.4,10.0,5.9,9.2,7.8,7.1,...,7.6,6.7,ZW,ZWE,4,4,4,4_Poor,4_Poor,4_Poor
175,SOUTH SUDAN,"MULTIPOLYGON (((34.21807 9.96458, 34.20722 9.9...",2023,108.5,9.9,9.2,8.6,8.6,8.6,6.5,...,10.0,9.2,SS,SSD,4,4,4,4_Poor,4_Poor,4_Poor
176,INDONESIA,"MULTIPOLYGON (((123.21846 -10.80917, 123.19832...",2023,65.6,5.2,7.1,6.9,4.1,4.4,5.7,...,4.4,3.7,ID,IDN,2,2,2,2_Middle,2_Middle,2_Middle
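The number at the start of each label is not decoration: text labels are sorted alphabetically, so the prefix keeps them in the right order. A small sketch with made-up codes (the names `codes`, `asText` and `asCat` are only for illustration), which also shows an ordered categorical as an alternative:

```python
import pandas as pd

# made-up class codes, like the ones produced by a classifier
codes = pd.Series([2, 4, 0, 1, 3])
labels = {0: "0_Great", 1: "1_Good", 2: "2_Middle", 3: "3_Bad", 4: "4_Poor"}

# turn each integer code into its text label
asText = codes.map(labels)
# alphabetical order matches the intended order thanks to the prefix
print(sorted(asText))

# alternative: an ordered categorical makes the ranking explicit
asCat = pd.Categorical(asText, categories=list(labels.values()), ordered=True)
print(asCat.min(), asCat.max())
```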


We are ready for a choropleth:

In [33]:
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, figsize=(10, 10))
theMapAndData.plot(column='Total_ei5', # variable to plot
                   cmap='viridis', # set of colors
                   categorical=True, # can be interpreted as category
                   edgecolor='white', # border color
                   linewidth=0., # width of border
                   alpha=1, # level of transparency (0 is invisible)
                   legend=True, # need a legend?
                   # location of legend: 'best', 'upper right', 'upper left', 'lower left',
                   # 'lower right', 'right', 'center left', 'center right',
                   # 'lower center', 'upper center', 'center'
                   legend_kwds={'loc':"lower left"}, 
        ax=ax
       )

ax.set_axis_off()

In [34]:
# alternatively:

import matplotlib.pyplot as plt

f, ax = plt.subplots(1, figsize=(10, 10))
theMapAndData.plot(column='Total_ei5_cat', # annotated
        cmap='viridis', 
        categorical=True,
        edgecolor='white', 
        linewidth=0., 
        alpha=1, 
        legend=True,
        legend_kwds={'loc':3},
        ax=ax
       )

ax.set_axis_off()

Once you have chosen a classification scheme (with the help of the ADCM), you can request the choropleth directly with the `scheme` argument, without creating a new column:

In [35]:
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, figsize=(10, 10))
theMapAndData.plot(column='Total', 
        cmap='viridis', 
                   scheme="equal_interval",
        edgecolor='white', 
        linewidth=0., 
        alpha=0.75, 
        legend=True,
        legend_kwds={'loc':3},
        ax=ax
       )

ax.set_axis_off()

In [36]:
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, figsize=(10, 10))
theMapAndData.plot(column='Total_ei5_cat', 
        cmap='viridis', 
        categorical=True,
        edgecolor='white', 
        linewidth=0., 
        alpha=0.75, 
        legend=True,
        legend_kwds={'loc':"lower right"},
        ax=ax
       )

ax.set_axis_off()

Let's save our data:

In [37]:
import os

# create the output folder if needed, then save the map with its new columns
os.makedirs("maps", exist_ok=True)
theMapAndData.to_file(os.path.join("maps","theMapAndData.gpkg"), layer='fragility', driver="GPKG")

<div class="alert alert-danger">
  <strong>CHALLENGE 1</strong> 
    <br> * Create a public repo named "week2_spatial" with its README file. (1 point)
    <br> * Clone the repo to your computer. (1 point)
    <br> * In the local repo in your computer, create a folder named "data". (1 point)
    <br> * Get three maps for the same country: the lines can be rivers, highways or similar; the points have to be airports; and the polygons have to be the 2nd administrative division ('provinces' in Perú, 'counties' in USA). Download those maps into the "data" folder. You can find airports here: https://ourairports.com/data/ (5 points)
    <br> * Plot in one map the three layers of maps, including the code. (5 points)
    <br> * Publish the three layer map. (3 points)
    <br> * Update the README to offer a quick explanation, the data dictionary, and the link to the published map. (2 points)
    <br> * Make sure the code is well organized (explanations, comments, no warnings, no python messages). (2 points)
    
</div>


<div class="alert alert-danger">
  <strong>CHALLENGE 2</strong> 
    <br> * Create a public repo named week2_spatial with its README file. (1 point)
    <br> * Clone the repo to your computer. (1 point)
    <br> * In the local repo in your computer, create a folder named "data". (1 point)
    <br> * For the provinces of Peru, get data on any variable of your interest; the variable has to be measured in several years (you need at least 3 measures if the measures were taken every 5 or 10 years, or 10 measures if taken yearly). (4 points)
    <br> * Merge that data into the map of provinces of Peru. (3 points)
    <br> * Plot two maps, one with the provinces that improved, and another with the ones that worsened; include the code. (3 points)
    <br> * Publish the two maps. (3 points)
    <br> * Update the README to offer a quick explanation, the data dictionary, and the link to the published map. (2 points)
    <br> * Make sure the code is well organized (explanations, comments, no warnings, no python messages). (2 points)
    
</div>
