<center><img src="https://i.imgur.com/zRrFdsf.png" width="700"></center>

# The Thematic map

Last session we opened this map:

In [2]:
import geopandas as gpd

linkGitSession='https://github.com/CienciaDeDatosEspacial/dataSets/raw/main/'
linkCountries='WORLD/World_Countries.zip'

fullLinkCountries=linkGitSession+linkCountries
countries=gpd.read_file(fullLinkCountries)
countries

Unnamed: 0,COUNTRY,geometry
0,Aruba (Netherlands),"POLYGON ((-69.88223 12.41111, -69.94695 12.436..."
1,Antigua and Barbuda,"MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ..."
2,Afghanistan,"POLYGON ((61.27656 35.60725, 61.29638 35.62853..."
3,Algeria,"POLYGON ((-5.15213 30.18047, -5.13917 30.19236..."
4,Azerbaijan,"MULTIPOLYGON (((46.54037 38.87559, 46.49554 38..."
...,...,...
247,South Sudan,"POLYGON ((34.21807 9.96458, 34.20722 9.905, 34..."
248,Indonesia,"MULTIPOLYGON (((123.21846 -10.80917, 123.19832..."
249,East Timor,"MULTIPOLYGON (((124.41824 -9.3001, 124.40446 -..."
250,Curacao (Netherlands),"POLYGON ((-68.96556 12.19889, -68.91196 12.181..."


As you see, the GDF above has just two columns: enough to plot a map, but no more than that.

Let me open a DF:

In [3]:
import pandas as pd

someDataLink='WORLD/FragilityCia_isos.csv'

someData=pd.read_csv(linkGitSession+someDataLink)

someData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 170 entries, 0 to 169
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Country            170 non-null    object 
 1   Officialstatename  170 non-null    object 
 2   InternetccTLD      170 non-null    object 
 3   iso2               169 non-null    object 
 4   iso3               170 non-null    object 
 5   fragility          170 non-null    float64
 6   co2                170 non-null    float64
 7   region             170 non-null    object 
 8   ForestRev_gdp      170 non-null    float64
dtypes: float64(3), object(6)
memory usage: 12.1+ KB


The DF has some interesting numerical data (_float64_): 
* fragility: fragility index 2023 [details here](https://fragilestatesindex.org/2023/06/14/fragile-states-index-2023-annual-report/)
* co2: metric tonnes of CO2 emitted [details here](https://www.cia.gov/the-world-factbook/field/carbon-dioxide-emissions/country-comparison/)
* ForestRev_gdp: revenue from the harvesting of forests, as a percentage of the country's GDP [details here](https://www.cia.gov/the-world-factbook/about/archives/2023/field/revenue-from-forest-resources/)

There are also other columns that may be of help:

In [4]:
someData.head()

Unnamed: 0,Country,Officialstatename,InternetccTLD,iso2,iso3,fragility,co2,region,ForestRev_gdp
0,AFGHANISTAN,The Islamic Republic of Afghanistan,.af,AF,AFG,105.0,7893000.0,SOUTH ASIA,0.2
1,ALBANIA,The Republic of Albania,.al,AL,ALB,58.9,3794000.0,EUROPE,0.18
2,ALGERIA,The People's Democratic Republic of Algeria,.dz,DZ,DZA,75.4,151633000.0,AFRICA,0.1
3,ANGOLA,The Republic of Angola,.ao,AO,AGO,87.8,19362000.0,AFRICA,0.36
4,ANTIGUA AND BARBUDA,Antigua and Barbuda,.ag,AG,ATG,54.4,729000.0,CENTRAL AMERICA AND THE CARIBBEAN,0.0


Preparing thematic maps requires social data about the geometry (line, polygon, point). The object **countries** has no social data, so the preprocessing requires geomerging.

## Geo Merging

This is a critical preprocessing operation: it combines two data sets, in this case a DF into a GDF. Some rules apply:

* At least one common column is needed to match the rows.
* The common column(s), or KEY(s), must have their contents written exactly the same way in both tables.
* It is recommended that the KEY(s) have the same name in both tables.
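
To see why exact spelling matters, here is a minimal sketch with two made-up tables (the names `left`, `right` and the values are hypothetical, not part of the session data). It uses the `how` and `indicator` arguments of pandas `merge` to list the rows that did not find a partner:

```python
import pandas as pd

# toy stand-ins for the map table and the indicator table (made-up values)
left = pd.DataFrame({'Country': ['PERU', 'CHILE', 'Bolivia']})
right = pd.DataFrame({'Country': ['PERU', 'CHILE', 'BOLIVIA'],
                      'value': [1.0, 2.0, 3.0]})

# an outer merge keeps unmatched rows; indicator=True adds a '_merge' column
# telling whether each row came from both tables or from only one of them
check = left.merge(right, on='Country', how='outer', indicator=True)

# rows whose KEY was not written identically in both tables
unmatched = check[check['_merge'] != 'both']
print(unmatched)
```

Here 'Bolivia' and 'BOLIVIA' end up as two separate unmatched rows, because the KEY comparison is case sensitive.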

Here we have the KEYs:

In [5]:
countries.COUNTRY.head()

0    Aruba (Netherlands)
1    Antigua and Barbuda
2            Afghanistan
3                Algeria
4             Azerbaijan
Name: COUNTRY, dtype: object

In [7]:
someData.Country.head()

0            AFGHANISTAN
1                ALBANIA
2                ALGERIA
3                 ANGOLA
4    ANTIGUA AND BARBUDA
Name: Country, dtype: object

Let's solve the name differences for the KEYs:

In [9]:
countries.rename(columns={'COUNTRY':'Country'},inplace=True)

The KEYs also differ in letter case. Let's work in uppercase:

In [10]:
countries['Country']=countries.Country.str.upper()

It is very unlikely that all the names are written the same way in both tables. Verify:

In [11]:
onlyDF=set(someData.Country)- set(countries.Country)
onlyGDF=set(countries.Country)- set(someData.Country)

Check here:

In [12]:
onlyDF

{'BAHAMAS (THE)',
 'BOLIVIA (PLURINATIONAL STATE OF)',
 'BRUNEI DARUSSALAM',
 'CABO VERDE',
 'CENTRAL AFRICAN REPUBLIC (THE)',
 'COMOROS (THE)',
 'CONGO (THE DEMOCRATIC REPUBLIC OF THE)',
 'CONGO (THE)',
 "CÔTE D'IVOIRE",
 'DOMINICAN REPUBLIC (THE)',
 'ESWATINI',
 'GAMBIA (THE)',
 'IRAN (ISLAMIC REPUBLIC OF)',
 'KOREA (THE REPUBLIC OF)',
 "LAO PEOPLE'S DEMOCRATIC REPUBLIC (THE)",
 'MICRONESIA (FEDERATED STATES OF)',
 'MOLDOVA (THE REPUBLIC OF)',
 'NETHERLANDS (THE)',
 'NIGER (THE)',
 'NORTH MACEDONIA',
 'NORTHERN MARIANA ISLANDS (THE)',
 'PHILIPPINES (THE)',
 'RUSSIAN FEDERATION (THE)',
 'SAMOA',
 'SUDAN (THE)',
 'TANZANIA, THE UNITED REPUBLIC OF',
 'TIMOR-LESTE',
 'UNITED ARAB EMIRATES (THE)',
 'UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND (THE)',
 'UNITED STATES OF AMERICA (THE)',
 'VIET NAM'}

There are several countries in the DF that did not find a match in the GDF (map). Of course, most of them are in the map under a different spelling. Let's see why they were not matched:

### Fuzzy merging

We use this to detect similarities between strings. You need this package **thefuzz**:

In [15]:
# !pip install thefuzz


This is the basic idea:

In [27]:
from thefuzz.process import extractOne as best

best('BAHAMAS (THE)',onlyGDF)

('BAHAMAS', 90)

As you see, the match between those strings scores 90 out of 100. That may be a reliable result.
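
To get a feel for what such a score means, here is a minimal sketch that uses `difflib` from the standard library as a stand-in. This is an assumption for illustration only: **thefuzz** uses its own scorer, so the numbers will generally differ from the ones shown in this session. The helpers `similarity` and `best_match` and the candidate list are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """0-100 similarity score (standard-library stand-in, not thefuzz's scorer)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def best_match(name, candidates):
    """Return the (candidate, score) pair with the highest score."""
    scored = ((c, similarity(name, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])

# made-up list of candidates
candidates = ['BAHAMAS', 'BAHRAIN', 'BARBADOS']
match = best_match('BAHAMAS (THE)', candidates)
print(match)
```

The idea is the same as in `extractOne`: score the string against every candidate and keep the top one.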

Let's do the same for all the unmatched countries:

In [46]:
[(country, best(country,onlyGDF)) for country in onlyDF]

[('GAMBIA (THE)', ('GAMBIA', 90)),
 ('DOMINICAN REPUBLIC (THE)', ('DOMINICAN REPUBLIC', 95)),
 ('BAHAMAS (THE)', ('BAHAMAS', 90)),
 ('VIET NAM', ('VIETNAM', 93)),
 ('NORTH MACEDONIA', ('MACEDONIA', 90)),
 ('NIGER (THE)', ('NIGER', 90)),
 ('MICRONESIA (FEDERATED STATES OF)', ('FEDERATED STATES OF MICRONESIA', 95)),
 ('RUSSIAN FEDERATION (THE)', ('RUSSIA', 90)),
 ('CENTRAL AFRICAN REPUBLIC (THE)', ('CENTRAL AFRICAN REPUBLIC', 95)),
 ('TIMOR-LESTE', ('EAST TIMOR', 81)),
 ('CONGO (THE DEMOCRATIC REPUBLIC OF THE)',
  ('DEMOCRATIC REPUBLIC OF THE CONGO', 95)),
 ('COMOROS (THE)', ('COMOROS', 90)),
 ('PHILIPPINES (THE)', ('PHILIPPINES', 95)),
 ('NETHERLANDS (THE)', ('NETHERLANDS', 95)),
 ('BOLIVIA (PLURINATIONAL STATE OF)', ('BOLIVIA', 90)),
 ('UNITED ARAB EMIRATES (THE)', ('UNITED ARAB EMIRATES', 95)),
 ('NORTHERN MARIANA ISLANDS (THE)', ('NORTHERN MARIANA ISLANDS (US)', 91)),
 ('MOLDOVA (THE REPUBLIC OF)', ('MOLDOVA', 90)),
 ('BRUNEI DARUSSALAM', ('BRUNEI', 90)),
 ('IRAN (ISLAMIC REPUBLIC OF

In [47]:
#or

[(country, best(country,onlyGDF)[0],best(country,onlyGDF)[1]) for country in onlyDF]

[('GAMBIA (THE)', 'GAMBIA', 90),
 ('DOMINICAN REPUBLIC (THE)', 'DOMINICAN REPUBLIC', 95),
 ('BAHAMAS (THE)', 'BAHAMAS', 90),
 ('VIET NAM', 'VIETNAM', 93),
 ('NORTH MACEDONIA', 'MACEDONIA', 90),
 ('NIGER (THE)', 'NIGER', 90),
 ('MICRONESIA (FEDERATED STATES OF)', 'FEDERATED STATES OF MICRONESIA', 95),
 ('RUSSIAN FEDERATION (THE)', 'RUSSIA', 90),
 ('CENTRAL AFRICAN REPUBLIC (THE)', 'CENTRAL AFRICAN REPUBLIC', 95),
 ('TIMOR-LESTE', 'EAST TIMOR', 81),
 ('CONGO (THE DEMOCRATIC REPUBLIC OF THE)',
  'DEMOCRATIC REPUBLIC OF THE CONGO',
  95),
 ('COMOROS (THE)', 'COMOROS', 90),
 ('PHILIPPINES (THE)', 'PHILIPPINES', 95),
 ('NETHERLANDS (THE)', 'NETHERLANDS', 95),
 ('BOLIVIA (PLURINATIONAL STATE OF)', 'BOLIVIA', 90),
 ('UNITED ARAB EMIRATES (THE)', 'UNITED ARAB EMIRATES', 95),
 ('NORTHERN MARIANA ISLANDS (THE)', 'NORTHERN MARIANA ISLANDS (US)', 91),
 ('MOLDOVA (THE REPUBLIC OF)', 'MOLDOVA', 90),
 ('BRUNEI DARUSSALAM', 'BRUNEI', 90),
 ('IRAN (ISLAMIC REPUBLIC OF)', 'IRAN', 90),
 ('SUDAN (THE)', 'S

At this point we should make a decision. The list format above is hard to read; a data frame sorted by score may help:

In [48]:
pd.DataFrame([(country, best(country,onlyGDF)[0],best(country,onlyGDF)[1]) for country in onlyDF]).sort_values(2)

Unnamed: 0,0,1,2
23,CÔTE D'IVOIRE,IVORY COAST,58
24,CABO VERDE,CAPE VERDE,80
9,TIMOR-LESTE,EAST TIMOR,81
0,GAMBIA (THE),GAMBIA,90
22,CONGO (THE),CONGO,90
21,"TANZANIA, THE UNITED REPUBLIC OF",TANZANIA,90
20,SUDAN (THE),SUDAN,90
19,IRAN (ISLAMIC REPUBLIC OF),IRAN,90
18,BRUNEI DARUSSALAM,BRUNEI,90
17,MOLDOVA (THE REPUBLIC OF),MOLDOVA,90


The previous result helps make a plan based on these cases:
* **ESWATINI** is always a problem, because it is also *SWAZILAND*.
* **KOREA (THE REPUBLIC OF)** is not _NORTH KOREA_.
* **LAO PEOPLE'S DEMOCRATIC REPUBLIC (THE)** is too long to match *LAOS* (if it exists in the map like that).
* **SAMOA** is always a problem, because it is also *WESTERN SAMOA*.

Based on this, we should see what our GDF has:

In [38]:
countries.Country[countries.Country.str.contains('SWAZ|LAO|SAMOA|KORE')]

9      AMERICAN SAMOA (US)
120            NORTH KOREA
122            SOUTH KOREA
126                   LAOS
242          WESTERN SAMOA
243              SWAZILAND
Name: Country, dtype: object

Then, it makes sense to change those manually:

In [43]:
manualChanges={'SWAZILAND':'ESWATINI',
               'LAOS':"LAO PEOPLE'S DEMOCRATIC REPUBLIC (THE)",
               'SOUTH KOREA':'KOREA (THE REPUBLIC OF)',
               'WESTERN SAMOA':'SAMOA',
              }

countries.replace(to_replace={'Country':manualChanges},inplace=True)

At this stage, we should recompute the differences:

In [44]:
# updating
onlyDF=set(someData.Country)- set(countries.Country)
onlyGDF=set(countries.Country)- set(someData.Country)

And rerun this code:

In [49]:
# keeping high scores

pd.DataFrame([(country, best(country,onlyGDF)[0],best(country,onlyGDF)[1]) for country in onlyDF]).sort_values(2)

Unnamed: 0,0,1,2
23,CÔTE D'IVOIRE,IVORY COAST,58
24,CABO VERDE,CAPE VERDE,80
9,TIMOR-LESTE,EAST TIMOR,81
0,GAMBIA (THE),GAMBIA,90
22,CONGO (THE),CONGO,90
21,"TANZANIA, THE UNITED REPUBLIC OF",TANZANIA,90
20,SUDAN (THE),SUDAN,90
19,IRAN (ISLAMIC REPUBLIC OF),IRAN,90
18,BRUNEI DARUSSALAM,BRUNEI,90
17,MOLDOVA (THE REPUBLIC OF),MOLDOVA,90


All this is correct now!

Let's prepare the changes:

In [51]:
changesToDF={country: best(country,onlyGDF)[0] for country in onlyDF}
changesToDF

{'GAMBIA (THE)': 'GAMBIA',
 'DOMINICAN REPUBLIC (THE)': 'DOMINICAN REPUBLIC',
 'BAHAMAS (THE)': 'BAHAMAS',
 'VIET NAM': 'VIETNAM',
 'NORTH MACEDONIA': 'MACEDONIA',
 'NIGER (THE)': 'NIGER',
 'MICRONESIA (FEDERATED STATES OF)': 'FEDERATED STATES OF MICRONESIA',
 'RUSSIAN FEDERATION (THE)': 'RUSSIA',
 'CENTRAL AFRICAN REPUBLIC (THE)': 'CENTRAL AFRICAN REPUBLIC',
 'TIMOR-LESTE': 'EAST TIMOR',
 'CONGO (THE DEMOCRATIC REPUBLIC OF THE)': 'DEMOCRATIC REPUBLIC OF THE CONGO',
 'COMOROS (THE)': 'COMOROS',
 'PHILIPPINES (THE)': 'PHILIPPINES',
 'NETHERLANDS (THE)': 'NETHERLANDS',
 'BOLIVIA (PLURINATIONAL STATE OF)': 'BOLIVIA',
 'UNITED ARAB EMIRATES (THE)': 'UNITED ARAB EMIRATES',
 'NORTHERN MARIANA ISLANDS (THE)': 'NORTHERN MARIANA ISLANDS (US)',
 'MOLDOVA (THE REPUBLIC OF)': 'MOLDOVA',
 'BRUNEI DARUSSALAM': 'BRUNEI',
 'IRAN (ISLAMIC REPUBLIC OF)': 'IRAN',
 'SUDAN (THE)': 'SUDAN',
 'TANZANIA, THE UNITED REPUBLIC OF': 'TANZANIA',
 'CONGO (THE)': 'CONGO',
 "CÔTE D'IVOIRE": 'IVORY COAST',
 'CABO VERD

In [52]:
someData.replace(to_replace={'Country':changesToDF},inplace=True)
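
The dictionary above accepts every best match, which is fine here because each pair was reviewed by eye. A more defensive variant, sketched below, accepts only the pairs above a cutoff and sets the rest aside for manual review. It again uses `difflib` as a stand-in for **thefuzz** (so scores differ from the ones in this session); the function names, the cutoff of 60 and the sample names are assumptions.

```python
from difflib import SequenceMatcher

def score(a, b):
    """0-100 similarity score (standard-library stand-in, not thefuzz's scorer)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def split_by_cutoff(names, candidates, cutoff=60):
    """Separate best matches into accepted pairs and pairs needing review."""
    accepted, review = {}, []
    for name in sorted(names):
        best = max(candidates, key=lambda c: score(name, c))
        if score(name, best) >= cutoff:
            accepted[name] = best          # safe to replace automatically
        else:
            review.append((name, best))    # a person should look at this pair
    return accepted, review

# made-up sample of unmatched names and map names
names = {'VIET NAM', "CÔTE D'IVOIRE"}
candidates = ['VIETNAM', 'IVORY COAST', 'GAMBIA']

accepted, review = split_by_cutoff(names, candidates)
print(accepted)
print(review)
```

A character-based score is low for names that are translations rather than spelling variants, so those pairs land in the review list instead of being replaced blindly.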

What is left to match?

In [53]:
# updating
onlyDF=set(someData.Country)- set(countries.Country)
onlyGDF=set(countries.Country)- set(someData.Country)
[(country, best(country,onlyGDF)) for country in onlyDF]

[]

Once you are here, merging can proceed:

In [57]:
## GDF to the 'left'
theMapAndData=countries.merge(someData, on='Country')
theMapAndData

Unnamed: 0,Country,geometry,Officialstatename,InternetccTLD,iso2,iso3,fragility,co2,region,ForestRev_gdp
0,ANTIGUA AND BARBUDA,"MULTIPOLYGON (((-61.73889 17.54055, -61.75195 ...",Antigua and Barbuda,.ag,AG,ATG,54.4,729000.0,CENTRAL AMERICA AND THE CARIBBEAN,0.00
1,AFGHANISTAN,"POLYGON ((61.27656 35.60725, 61.29638 35.62853...",The Islamic Republic of Afghanistan,.af,AF,AFG,105.0,7893000.0,SOUTH ASIA,0.20
2,ALGERIA,"POLYGON ((-5.15213 30.18047, -5.13917 30.19236...",The People's Democratic Republic of Algeria,.dz,DZ,DZA,75.4,151633000.0,AFRICA,0.10
3,AZERBAIJAN,"MULTIPOLYGON (((46.54037 38.87559, 46.49554 38...",The Republic of Azerbaijan,.az,AZ,AZE,73.2,35389000.0,MIDDLE EAST,0.02
4,ALBANIA,"POLYGON ((20.79192 40.43154, 20.78722 40.39472...",The Republic of Albania,.al,AL,ALB,58.9,3794000.0,EUROPE,0.18
...,...,...,...,...,...,...,...,...,...,...
165,ZAMBIA,"POLYGON ((30.21302 -14.98172, 30.21917 -15.096...",The Republic of Zambia,.zm,ZM,ZMB,85.7,6798000.0,AFRICA,4.45
166,ZIMBABWE,"POLYGON ((32.48888 -21.34445, 32.46541 -21.325...",The Republic of Zimbabwe,.zw,ZW,ZWE,99.5,7902000.0,AFRICA,1.61
167,SOUTH SUDAN,"POLYGON ((34.21807 9.96458, 34.20722 9.905, 34...",The Republic of South Sudan,.ss,SS,SSD,112.2,1778000.0,AFRICA,2.65
168,INDONESIA,"MULTIPOLYGON (((123.21846 -10.80917, 123.19832...",The Republic of Indonesia,.id,ID,IDN,70.4,563543000.0,EAST AND SOUTHEAST ASIA,0.39


And our GDF has social data now!

In [59]:
theMapAndData.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 170 entries, 0 to 169
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype   
---  ------             --------------  -----   
 0   Country            170 non-null    object  
 1   geometry           170 non-null    geometry
 2   Officialstatename  170 non-null    object  
 3   InternetccTLD      170 non-null    object  
 4   iso2               169 non-null    object  
 5   iso3               170 non-null    object  
 6   fragility          170 non-null    float64 
 7   co2                170 non-null    float64 
 8   region             170 non-null    object  
 9   ForestRev_gdp      170 non-null    float64 
dtypes: float64(3), geometry(1), object(6)
memory usage: 13.4+ KB


# Choropleths

## Transformation of data values

### Re Scaling

We should plan how to color the polygons based on some variable, let me check our variables of interest:

In [None]:
DataNames=['fragility', 'co2', 'ForestRev_gdp']

In [None]:

pd.melt(theMapAndData[DataNames])

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

sns.displot(pd.melt(theMapAndData[DataNames]),
            x="value", hue="variable",kind="kde",
            log_scale=(False,False))

The variables are in different units, we should try a data rescaling strategy:

In [None]:
# !pip install -U scikit-learn

* **StandardScaler**:

In [None]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
normalized_data = scaler.fit_transform(theMapAndData[DataNames])
sns.displot(pd.melt(pd.DataFrame(normalized_data,columns=DataNames)),
            x="value", hue="variable",kind="kde",
            log_scale=(False,False))

* **MinMaxScaler**:

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data=scaler.fit_transform(theMapAndData[DataNames])

sns.displot(pd.melt(pd.DataFrame(scaled_data,columns=DataNames)),
            x="value", hue="variable",kind="kde",
            log_scale=(False,False))

* **RobustScaler**:

In [None]:
from sklearn.preprocessing import RobustScaler
scaler = RobustScaler()
robScaled_data = scaler.fit_transform(theMapAndData[DataNames])

sns.displot(pd.melt(pd.DataFrame(robScaled_data,columns=DataNames)),
            x="value", hue="variable",kind="kde",
            log_scale=(False,False))

* **QuantileTransformer**:

In [None]:
from sklearn.preprocessing import QuantileTransformer
scaler = QuantileTransformer(n_quantiles=99, random_state=0,output_distribution='normal') #or 'uniform'
QtScaled_data = scaler.fit_transform(theMapAndData[DataNames])

sns.displot(pd.melt(pd.DataFrame(QtScaled_data,columns=DataNames)),
            x="value", hue="variable",kind="kde",
            log_scale=(False,False))

Let's keep the last one:

In [None]:
theMapAndData['fragility_Qt']=QtScaled_data[:,0]
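
For intuition about what the rescalers above do, here is a minimal sketch of the arithmetic behind the first two, written with the standard library on made-up values. It follows the usual definitions: standardization subtracts the mean and divides by the population standard deviation, and min-max rescaling maps the minimum to 0 and the maximum to 1.

```python
from statistics import mean, pstdev

# made-up values standing in for one numeric column
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# standardization: center on the mean, divide by the population standard deviation
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

# min-max: shift and stretch so the values span the interval [0, 1]
lo, hi = min(values), max(values)
minmax = [(v - lo) / (hi - lo) for v in values]

print(standardized)
print(minmax)
```

Both are linear rescalings, so they change the units but not the shape of the distribution; the quantile transformation kept above also reshapes it.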

### Discretizing

I will keep the **fragility_Qt** column. Now, I want to cut the data into intervals.
Please install [**numba**](https://numba.readthedocs.io/en/stable/user/installing.html) before running the next code; also make sure you have **pysal**, **mapclassify** and **numpy** installed:

In [None]:
! pip show numba mapclassify numpy

In [None]:
# !pip install mapclassify

Let me discretize **fragility_Qt**:

In [None]:
import mapclassify
import numpy as np

np.random.seed(12345) # so we all get the same results!

# let's try 5 intervals
K=5
theVar=theMapAndData.fragility_Qt
# same interval width, easy interpretation
ei5 = mapclassify.EqualInterval(theVar, k=K)
# same interval width based on standard deviation, easy - but not as the previous one, poor when high skewness
msd = mapclassify.StdMean(theVar)
# interval width varies, counts per interval are close, not easy to grasp, repeated values complicate cuts
q5=mapclassify.Quantiles(theVar,k=K)

# based on similarity, good for multimodal data
mb5 = mapclassify.MaximumBreaks(theVar, k=K)
# based on similarity, good for skewed data
ht = mapclassify.HeadTailBreaks(theVar) # no K needed
# based on similarity, optimizer
fj5 = mapclassify.FisherJenks(theVar, k=K)
# based on similarity, optimizer
jc5 = mapclassify.JenksCaspall(theVar, k=K)
# based on similarity, optimizer
mp5 = mapclassify.MaxP(theVar, k=K)
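
To make the difference between the first and third schemes concrete, here is a minimal sketch with **numpy** on made-up values that include one outlier. It is a hand-made illustration of the idea, not the code **mapclassify** runs; the variable names are hypothetical.

```python
import numpy as np

# made-up values with one large outlier
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
k = 5

# equal interval: k bins of identical width between the minimum and the maximum
ei_breaks = np.linspace(values.min(), values.max(), k + 1)[1:]
# quantiles: upper bounds chosen so each bin holds about the same number of values
q_breaks = np.quantile(values, np.linspace(0, 1, k + 1)[1:])

# label each value with the first bin whose upper bound is >= the value
ei_labels = np.searchsorted(ei_breaks, values, side='left')
q_labels = np.searchsorted(q_breaks, values, side='left')

# how many values fall in each bin
ei_counts = np.bincount(ei_labels, minlength=k)
q_counts = np.bincount(q_labels, minlength=k)
print(ei_counts)
print(q_counts)
```

With equal intervals the outlier stretches the range, so nine values share the first bin and three bins stay empty; with quantiles every bin gets two values, but the bin widths are very uneven.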

How can we select the right classification?
Let me use the absolute deviation around the class median (ADCM) to make the comparison; the lower the ADCM, the more homogeneous the classes:

In [None]:
class5 = ei5,msd, q5,mb5,  ht, fj5, jc5, mp5
# Collect ADCM for each classifier
fits = np.array([ c.adcm for c in class5])
# Convert ADCM scores to a DataFrame
adcms = pd.DataFrame(fits)
# Add classifier names
adcms['classifier'] = [c.name for c in class5]
# Add column names to the ADCM
adcms.columns = ['ADCM', 'Classifier']

Now, plot the **adcms**:

In [None]:
adcms.sort_values('ADCM').plot.barh(x='Classifier')
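
As a rough guide to reading that plot, here is a minimal sketch of how an ADCM can be computed by hand: for each class, sum the absolute distances of its members to the class median, then add up the classes. The function `adcm_by_hand` and the toy values are assumptions for illustration.

```python
import numpy as np

def adcm_by_hand(values, labels):
    """Sum of absolute deviations of each value from the median of its class."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for lab in np.unique(labels):
        members = values[labels == lab]
        total += np.abs(members - np.median(members)).sum()
    return total

# made-up values with two obvious groups
values = [1, 2, 3, 10, 11, 12]
tight_classes = [0, 0, 0, 1, 1, 1]   # respects the two groups
loose_classes = [0, 0, 1, 1, 1, 1]   # puts the value 3 in the wrong group

adcm_tight = adcm_by_hand(values, tight_classes)
adcm_loose = adcm_by_hand(values, loose_classes)
print(adcm_tight, adcm_loose)
```

The classification that respects the groups gets the smaller value, which is why the shortest bar in the plot points to the best fit.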

Let's save the best strategy:

In [None]:
theMapAndData['fragility_Qt_jc5'] = jc5.yb

In [None]:
# there you are
theMapAndData[['fragility_Qt','fragility_Qt_jc5']].head()

Let's check the mean of 'fragility_Qt' for each label of the column just created (from 0 to 4):

In [None]:
indexList=['fragility_Qt_jc5'] # add more?
aggregator={'fragility_Qt': ['mean']}

pd.concat([theMapAndData[['fragility_Qt',col]].groupby(col,as_index=False).agg(aggregator) for col in indexList],axis=1)

We could create a new column:

In [None]:
# renaming
newLabelsForLevels={0:"0_Great", 1:"1_Good", 2:"2_Middle", 3:"3_Bad", 4:"4_Poor"}

theMapAndData['fragility_Qt_jc5_cat']=theMapAndData.loc[:,'fragility_Qt_jc5'].replace(newLabelsForLevels)

# we have
theMapAndData[['fragility_Qt','fragility_Qt_jc5','fragility_Qt_jc5_cat']].head(20)

We are ready for a choropleth:

In [None]:
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, figsize=(10, 10))
theMapAndData.plot(column='fragility_Qt_jc5_cat', # variable to plot
                   cmap='viridis', # set of colors
                   categorical=True, # can be interpreted as category
                   edgecolor='white', # border color
                   linewidth=0., # width of border
                   alpha=1, # level of transparency (0 is invisible)
                   legend=True, # need a legend?
                   # location of legend: 'best', 'upper right', 'upper left', 'lower left',
                   # 'lower right', 'right', 'center left', 'center right',
                   # 'lower center', 'upper center', 'center'
                   legend_kwds={'loc':"lower left"},
        ax=ax
       )

ax.set_axis_off()

However, once you know which scheme gives the lowest ADCM, you can request the choropleth directly, without creating a new column:

In [None]:
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, figsize=(10, 10))
theMapAndData.plot(column='fragility_Qt',
                   cmap='OrRd',
                   scheme="jenkscaspall",k=5,
        edgecolor='grey',
        linewidth=0.5,
        alpha=1,
        legend=True,
        legend_kwds={'loc':3},
        ax=ax
       )

ax.set_axis_off()

In [None]:
# finally
import os

os.makedirs("maps", exist_ok=True) # create the output folder if it is missing
theMapAndData.to_file(os.path.join("maps","worldMaps.gpkg"), layer='indicators', driver="GPKG")