## Exercises

1. Create a GeoDataFrame containing a list of countries and their capital cities. Add a geometry column with the locations of the capitals.
2. Load a shapefile of your choice, filter the data to only include a specific region or country, and save the filtered GeoDataFrame to a new file.
3. Perform a spatial join between two GeoDataFrames: one containing polygons (e.g., country borders) and one containing points (e.g., cities). Find out which points fall within which polygons.
4. Plot a map showing the distribution of a particular attribute (e.g., population) across different regions.

#### This script reads in two country datasets. One supplies the geometries; the other contributes a COUNTRYAFF (country affiliation) field. They are merged, checked, corrected, re-merged, and saved to checkpoint1.

## **Section 1:**
## Read country geometry data
##### At the end of each section we save a checkpoint file, so you can skip ahead to Section 3 after running only the package-import cell

In [27]:
# Import a few packages
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import folium
import os
pd.options.display.max_rows = 300

In [28]:
print(os.getcwd())
os.chdir('G:/My Drive/Clark')

G:\My Drive\Clark


In [3]:
# Get country boundaries from the ArcGIS Online API
countries_url = "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/World_Countries_(Generalized)/FeatureServer/0/query?outFields=*&where=1%3D1&f=geojson"
countries_gdf = gpd.read_file(countries_url)
print(len(countries_gdf))
countries_gdf.head(20)

251


Unnamed: 0,FID,COUNTRY,ISO,COUNTRYAFF,AFF_ISO,Shape__Area,Shape__Length,geometry
0,1,Afghanistan,AF,Afghanistan,AF,934649400000.0,6110457.0,"POLYGON ((61.27655 35.60725, 61.29638 35.62854..."
1,2,Albania,AL,Albania,AL,50586700000.0,1271948.0,"POLYGON ((19.57083 41.68527, 19.58195 41.69569..."
2,3,Algeria,DZ,Algeria,DZ,3014489000000.0,8316049.0,"POLYGON ((4.60335 36.88791, 4.63555 36.88638, ..."
3,4,American Samoa,AS,United States,US,175458100.0,67291.2,"POLYGON ((-170.7439 -14.37555, -170.74942 -14...."
4,5,Andorra,AD,Andorra,AD,934995600.0,117137.5,"POLYGON ((1.44584 42.60194, 1.48653 42.65042, ..."
5,6,Angola,AO,Angola,AO,1318923000000.0,6619216.0,"MULTIPOLYGON (((23.47611 -17.62584, 23.28916 -..."
6,7,Anguilla,AI,United Kingdom,GB,101736200.0,52905.18,"POLYGON ((-63.16778 18.16445, -63.15695 18.177..."
7,8,Antarctica,AQ,,,696611700000000.0,250272500.0,"MULTIPOLYGON (((-179.99999 -84.30535, -179.931..."
8,9,Antigua and Barbuda,AG,Antigua and Barbuda,AG,592132400.0,132436.2,"MULTIPOLYGON (((-61.73806 16.98972, -61.82917 ..."
9,10,Argentina,AR,Argentina,AR,4313167000000.0,17681590.0,"MULTIPOLYGON (((-71.85916 -41.01128, -71.83806..."


In [4]:
# print(countries_gdf.crs)
# # Reproject to a global equal-area projection
# # A projected CRS is needed for planar calculations such as areas and centroids
# countries_gdf = countries_gdf.to_crs("EPSG:6933")
# print(countries_gdf.crs)
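
The reprojection step commented out above can be sketched on a toy geometry. This is only an illustration: `demo` is a hypothetical one-polygon GeoDataFrame, not part of the tutorial data. In EPSG:4326 the coordinates are degrees, so areas only become meaningful after moving to a projected CRS such as EPSG:6933.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 1-degree by 1-degree cell at the equator (illustration only)
demo = gpd.GeoDataFrame({"name": ["cell"]}, geometry=[box(0, 0, 1, 1)], crs="EPSG:4326")

# Reproject to the equal-area CRS used in the commented-out cell
demo_ea = demo.to_crs("EPSG:6933")

# Coordinates are now in metres, so .area is in square metres
area_km2 = demo_ea.geometry.area.iloc[0] / 1e6
print(demo_ea.crs.to_epsg(), round(area_km2))
```

The same call on `countries_gdf` would be the place to start if area or centroid columns are needed later.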

In [5]:
# Set COUNTRY as the index for easier reference
#countries_gdf = countries_gdf.set_index("COUNTRY")

In [5]:
#countries_gdf.explore('Shape__Area', legend=False)

##### This country layer is not very precise, and it leaves out Kosovo, Taiwan, Hong Kong, etc.
##### Let's find a different countries layer for the actual geometries
##### What I do like about the data layer above is the COUNTRYAFF field
##### This data is edited here, transferred back to ArcGIS Pro, and then re-read into the script.
##### Edits performed on the original data in ArcGIS Pro mostly merge areas marked "In dispute" into the larger country

In [33]:
# Source: https://international.ipums.org/international/gis.shtml
# The file being read in here has already had some edits in ArcGIS Pro
countries2_path = "GIS Tutorials/Geog-312/Geog-312/1.Geopandas/inputData/IPUMSI_Countries/Corrected/countriesCorrect.shp"
countries2 = gpd.read_file(countries2_path)
print(len(countries2))
countries2.head(10)

249


Unnamed: 0,CNTRY_NAME,geometry
0,Afghanistan,"POLYGON ((74.88986 37.23409, 74.88962 37.23314..."
1,Akrotiri and Dhekelia,"MULTIPOLYGON (((32.8388 34.70555, 32.84127 34...."
2,Albania,"MULTIPOLYGON (((20.0789 42.5558, 20.07939 42.5..."
3,Algeria,"MULTIPOLYGON (((8.64188 36.94206, 8.64196 36.9..."
4,American Samoa (Eastern Samoa),"MULTIPOLYGON (((-171.07753 -11.06622, -171.080..."
5,Andorra,"POLYGON ((1.7258 42.5044, 1.71149 42.49224, 1...."
6,Angola,"MULTIPOLYGON (((13.10288 -4.68421, 13.10173 -4..."
7,Anguilla,"MULTIPOLYGON (((-63.42216 18.59739, -63.42672 ..."
8,Antarctica,"MULTIPOLYGON (((-46.15775 -60.51078, -46.1787 ..."
9,Antigua & Barbuda,"MULTIPOLYGON (((-61.84592 17.72958, -61.83383 ..."


In [34]:
# countries_gdf has a separate field (COUNTRYAFF) for the country each territory is affiliated with,
# so we can get rid of the names in parentheses in CNTRY_NAME of countries2
countries2['CNTRY_NAME'] = countries2['CNTRY_NAME'].str.replace(r' \(.+\)', '', regex=True)
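
To see what that regular expression does, here is a small sketch on a hypothetical Series (`names` is made up for illustration; the first value mirrors the style of the IPUMS names). The pattern matches a space followed by a parenthesised suffix, so names without parentheses pass through untouched.

```python
import pandas as pd

# Hypothetical sample of names in the style of CNTRY_NAME
names = pd.Series(["American Samoa (Eastern Samoa)", "Andorra", "Antigua & Barbuda"])

# Remove a space followed by a parenthesised suffix; other names are unchanged
cleaned = names.str.replace(r" \(.+\)", "", regex=True)
print(cleaned.tolist())
# ['American Samoa', 'Andorra', 'Antigua & Barbuda']
```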

In [35]:
# print(countries2.crs)
# # Reproject to a global equal-area projection
# # A projected CRS is needed for planar calculations such as areas and centroids
# countries2 = countries2.to_crs("EPSG:6933")
# print(countries2.crs)

In [36]:
# Merge on 'CNTRY_NAME' in countries2 and 'COUNTRY' in countries_gdf
countries_w_aff = countries2.merge(
    countries_gdf[['COUNTRY', 'ISO', 'COUNTRYAFF']],  # Select only necessary columns
    left_on='CNTRY_NAME', 
    right_on='COUNTRY', 
    how='left'
)

# Show the result
countries_w_aff

Unnamed: 0,CNTRY_NAME,geometry,COUNTRY,ISO,COUNTRYAFF
0,Afghanistan,"POLYGON ((74.88986 37.23409, 74.88962 37.23314...",Afghanistan,AF,Afghanistan
1,Akrotiri and Dhekelia,"MULTIPOLYGON (((32.8388 34.70555, 32.84127 34....",,,
2,Albania,"MULTIPOLYGON (((20.0789 42.5558, 20.07939 42.5...",Albania,AL,Albania
3,Algeria,"MULTIPOLYGON (((8.64188 36.94206, 8.64196 36.9...",Algeria,DZ,Algeria
4,American Samoa,"MULTIPOLYGON (((-171.07753 -11.06622, -171.080...",American Samoa,AS,United States
5,Andorra,"POLYGON ((1.7258 42.5044, 1.71149 42.49224, 1....",Andorra,AD,Andorra
6,Angola,"MULTIPOLYGON (((13.10288 -4.68421, 13.10173 -4...",Angola,AO,Angola
7,Anguilla,"MULTIPOLYGON (((-63.42216 18.59739, -63.42672 ...",Anguilla,AI,United Kingdom
8,Antarctica,"MULTIPOLYGON (((-46.15775 -60.51078, -46.1787 ...",Antarctica,AQ,
9,Antigua & Barbuda,"MULTIPOLYGON (((-61.84592 17.72958, -61.83383 ...",,,


In [37]:
# However, we know the two datasets will not line up perfectly, because they have different numbers of rows
# and there are naming discrepancies
# Identify mismatches:
# If 'COUNTRY' is NaN, the CNTRY_NAME value in countries2 has no exact match in countries_gdf
no_match = countries_w_aff[countries_w_aff['COUNTRY'].isna()]
print(len(no_match))
no_match

26


Unnamed: 0,CNTRY_NAME,geometry,COUNTRY,ISO,COUNTRYAFF
1,Akrotiri and Dhekelia,"MULTIPOLYGON (((32.8388 34.70555, 32.84127 34....",,,
9,Antigua & Barbuda,"MULTIPOLYGON (((-61.84592 17.72958, -61.83383 ...",,,
27,Bosnia & Herzegovina,"MULTIPOLYGON (((19.0222 44.85549, 19.02346 44....",,,
41,Caribbean Netherlands,"MULTIPOLYGON (((-68.23875 12.09664, -68.23917 ...",,,
51,Cook Island,"MULTIPOLYGON (((-158.00752 -8.95122, -158.0067...",,,
77,French Southern & Antarctic Lands,"MULTIPOLYGON (((47.37331 -11.51044, 47.3725 -1...",,,
95,Heard Island and McDonald Island,"MULTIPOLYGON (((73.58247 -52.91919, 73.57833 -...",,,
97,Hong Kong,"MULTIPOLYGON (((114.22564 22.54497, 114.22601 ...",,,
116,Kosovo,"POLYGON ((21.587 42.26278, 21.58644 42.26252, ...",,,
128,Macau,"MULTIPOLYGON (((113.53481 22.20751, 113.53416 ...",,,
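
A left merge only shows names from countries2 that failed to match. As a possible complement, pandas' `merge` accepts `how='outer'` with `indicator=True`, which flags unmatched rows on both sides. The sketch below uses hypothetical miniature tables (`left`, `right`), not the tutorial data.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables
left = pd.DataFrame({"CNTRY_NAME": ["Albania", "Antigua & Barbuda", "Kosovo"]})
right = pd.DataFrame({
    "COUNTRY": ["Albania", "Antigua and Barbuda"],
    "COUNTRYAFF": ["Albania", "Antigua and Barbuda"],
})

# how='outer' keeps unmatched rows from both sides;
# indicator=True adds a '_merge' column saying where each row came from
check = left.merge(right, left_on="CNTRY_NAME", right_on="COUNTRY",
                   how="outer", indicator=True)

left_only = check.loc[check["_merge"] == "left_only", "CNTRY_NAME"].tolist()
right_only = check.loc[check["_merge"] == "right_only", "COUNTRY"].tolist()
print(left_only, right_only)
```

Seeing both lists side by side makes pairs like 'Antigua & Barbuda' and 'Antigua and Barbuda' easier to spot.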


In [38]:
countries_gdf_unique = countries_gdf['COUNTRY'].unique()
print(countries_gdf_unique)

['Afghanistan' 'Albania' 'Algeria' 'American Samoa' 'Andorra' 'Angola'
 'Anguilla' 'Antarctica' 'Antigua and Barbuda' 'Argentina' 'Armenia'
 'Aruba' 'Australia' 'Austria' 'Azerbaijan' 'Azores' 'Bahamas' 'Bahrain'
 'Bangladesh' 'Barbados' 'Belarus' 'Belgium' 'Belize' 'Benin' 'Bermuda'
 'Bhutan' 'Bolivia' 'Bonaire' 'Bosnia and Herzegovina' 'Botswana'
 'Bouvet Island' 'Brazil' 'British Indian Ocean Territory'
 'British Virgin Islands' 'Brunei' 'Bulgaria' 'Burkina Faso' 'Burundi'
 'Cape Verde' 'Cambodia' 'Cameroon' 'Canada' 'Canarias' 'Cayman Islands'
 'Central African Republic' 'Chad' 'Chile' 'China' 'Christmas Island'
 'Cocos Islands' 'Colombia' 'Comoros' 'Republic of the Congo'
 'Democratic Republic of the Congo' 'Cook Islands' 'Costa Rica'
 'Ivory Coast' 'Croatia' 'Cuba' 'Curacao' 'Cyprus' 'Czechia' 'Denmark'
 'Djibouti' 'Dominica' 'Dominican Republic' 'Ecuador' 'Egypt'
 'El Salvador' 'Equatorial Guinea' 'Eritrea' 'Estonia' 'Eswatini'
 'Ethiopia' 'Falkland Islands' 'Faroe Islands' 'Fij

In [39]:
# First thing we notice is '&' vs 'and'
# We can fix all of those with one line:
countries2['CNTRY_NAME'] = countries2['CNTRY_NAME'].str.replace('&', 'and', regex=False)

In [40]:
# Then we notice some other discrepancies
# Modify tables pre-merge based on what does not line up post-merge
# Cook Islands
countries2.loc[countries2['CNTRY_NAME'] == 'Cook Island', 'CNTRY_NAME'] = 'Cook Islands'
# Curacao
countries2.loc[countries2['CNTRY_NAME'] == 'Netherlands Antilles', 'CNTRY_NAME'] = 'Curacao'
# Heard Island and McDonald Islands
countries2.loc[countries2['CNTRY_NAME'] == 'Heard Island and McDonald Island', 'CNTRY_NAME'] = 'Heard Island and McDonald Islands'
# Norfolk Island
countries2.loc[countries2['CNTRY_NAME'] == 'Norfolk Islands', 'CNTRY_NAME'] = 'Norfolk Island'
# North Macedonia
countries2.loc[countries2['CNTRY_NAME'] == 'Macedonia', 'CNTRY_NAME'] = 'North Macedonia'
# Turks and Caicos Islands
countries2.loc[countries2['CNTRY_NAME'] == 'Turks and Caicos', 'CNTRY_NAME'] = 'Turks and Caicos Islands'


# Match Brunei
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Brunei Darussalam', 'COUNTRY'] = 'Brunei'
# Match Cape Verde
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Cabo Verde', 'COUNTRY'] = 'Cape Verde'
# Match Republic of the Congo
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Congo', 'COUNTRY'] = 'Republic of the Congo'
# Match Democratic Republic of Congo
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Congo DRC', 'COUNTRY'] = 'Democratic Republic of the Congo'
# Match Ivory Coast
countries_gdf.loc[countries_gdf['COUNTRY'] == "Côte d'Ivoire", 'COUNTRY'] = 'Ivory Coast'
# Match French Southern and Antarctic Lands
countries_gdf.loc[countries_gdf['COUNTRY'] == 'French Southern Territories', 'COUNTRY'] = 'French Southern and Antarctic Lands'
# Match Palestine
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Palestinian Territory', 'COUNTRY'] = 'Palestine'
# Match Pitcairn Islands
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Pitcairn', 'COUNTRY'] = 'Pitcairn Islands'
# Match Russia
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Russian Federation', 'COUNTRY'] = 'Russia'
# Match Svalbard and Jan Mayen
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Svalbard', 'COUNTRY'] = 'Svalbard and Jan Mayen'
# Match Timor Leste
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Timor-Leste', 'COUNTRY'] = 'Timor Leste'
# Match Turkey
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Turkiye', 'COUNTRY'] = 'Turkey'
# Match Czechia
countries_gdf.loc[countries_gdf['COUNTRY'] == 'Czech Republic', 'COUNTRY'] = 'Czechia'
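
The one-line-per-country pattern above can also be written with a single lookup table. This is a sketch of that alternative, not a change to the cell: `name_fixes` and `demo` are hypothetical names, and the mapping holds only a few of the renames as examples.

```python
import pandas as pd

# Hypothetical subset of the renames above, collected into one mapping
name_fixes = {
    "Cook Island": "Cook Islands",
    "Macedonia": "North Macedonia",
    "Turks and Caicos": "Turks and Caicos Islands",
}

demo = pd.DataFrame({"CNTRY_NAME": ["Cook Island", "Macedonia", "Albania"]})

# Series.replace with a dict swaps exact matches only and leaves other values alone
demo["CNTRY_NAME"] = demo["CNTRY_NAME"].replace(name_fixes)
print(demo["CNTRY_NAME"].tolist())
# ['Cook Islands', 'North Macedonia', 'Albania']
```

Keeping the renames in a dict makes accidental duplicates easier to spot than in a long run of `.loc` assignments.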

In [41]:
# Do the merge again after correcting the names in both tables
countries_w_aff2 = countries2.merge(
    countries_gdf[['COUNTRY', 'ISO', 'COUNTRYAFF']],  # Select only necessary columns
    left_on='CNTRY_NAME', 
    right_on='COUNTRY', 
    how='left'
)

# Show the result
countries_w_aff2

Unnamed: 0,CNTRY_NAME,geometry,COUNTRY,ISO,COUNTRYAFF
0,Afghanistan,"POLYGON ((74.88986 37.23409, 74.88962 37.23314...",Afghanistan,AF,Afghanistan
1,Akrotiri and Dhekelia,"MULTIPOLYGON (((32.8388 34.70555, 32.84127 34....",,,
2,Albania,"MULTIPOLYGON (((20.0789 42.5558, 20.07939 42.5...",Albania,AL,Albania
3,Algeria,"MULTIPOLYGON (((8.64188 36.94206, 8.64196 36.9...",Algeria,DZ,Algeria
4,American Samoa,"MULTIPOLYGON (((-171.07753 -11.06622, -171.080...",American Samoa,AS,United States
5,Andorra,"POLYGON ((1.7258 42.5044, 1.71149 42.49224, 1....",Andorra,AD,Andorra
6,Angola,"MULTIPOLYGON (((13.10288 -4.68421, 13.10173 -4...",Angola,AO,Angola
7,Anguilla,"MULTIPOLYGON (((-63.42216 18.59739, -63.42672 ...",Anguilla,AI,United Kingdom
8,Antarctica,"MULTIPOLYGON (((-46.15775 -60.51078, -46.1787 ...",Antarctica,AQ,
9,Antigua and Barbuda,"MULTIPOLYGON (((-61.84592 17.72958, -61.83383 ...",Antigua and Barbuda,AG,Antigua and Barbuda


In [42]:
# This will give us countries that were missing from the first dataset we read in (countries_gdf)
# We will have to fill in the COUNTRYAFF field
no_match = countries_w_aff2[countries_w_aff2['COUNTRY'].isna()]
print(len(no_match))
no_match

10


Unnamed: 0,CNTRY_NAME,geometry,COUNTRY,ISO,COUNTRYAFF
1,Akrotiri and Dhekelia,"MULTIPOLYGON (((32.8388 34.70555, 32.84127 34....",,,
41,Caribbean Netherlands,"MULTIPOLYGON (((-68.23875 12.09664, -68.23917 ...",,,
97,Hong Kong,"MULTIPOLYGON (((114.22564 22.54497, 114.22601 ...",,,
116,Kosovo,"POLYGON ((21.587 42.26278, 21.58644 42.26252, ...",,,
128,Macau,"MULTIPOLYGON (((113.53481 22.20751, 113.53416 ...",,,
164,Northern Mariana Island,"MULTIPOLYGON (((144.89828 20.54539, 144.90167 ...",,,
181,Reunion,"MULTIPOLYGON (((55.45831 -20.85878, 55.46581 -...",,,
207,South Georgia and the South Sandwich Islands,"MULTIPOLYGON (((-42.03108 -53.54346, -42.03186...",,,
218,Taiwan,"MULTIPOLYGON (((120.47583 26.39625, 120.47789 ...",,,
245,Western Sahara,"MULTIPOLYGON (((-8.66667 27.66877, -8.66667 27...",,,


In [43]:
# Match remaining COUNTRYAFF NaNs
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Akrotiri and Dhekelia', 'COUNTRYAFF'] = 'United Kingdom'
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Caribbean Netherlands', 'COUNTRYAFF'] = 'Netherlands'
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Hong Kong', 'COUNTRYAFF'] = 'China'
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Kosovo', 'COUNTRYAFF'] = 'Kosovo'
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Macau', 'COUNTRYAFF'] = 'China'
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Northern Mariana Island', 'COUNTRYAFF'] = 'United States'
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Reunion', 'COUNTRYAFF'] = 'France'
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'South Georgia and the South Sandwich Islands', 'COUNTRYAFF'] = 'United Kingdom'
# Making the editorial choice to consider Taiwan an independent country
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Taiwan', 'COUNTRYAFF'] = 'Taiwan'
# We will consider Western Sahara as part of Morocco
countries_w_aff2.loc[countries_w_aff2['CNTRY_NAME'] == 'Western Sahara', 'COUNTRYAFF'] = 'Morocco'
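
The manual fills above could also be expressed as a lookup that touches only rows where COUNTRYAFF is still missing. A minimal sketch, assuming a toy table (`demo`) and a hypothetical mapping (`manual_aff`) with two of the affiliations chosen above:

```python
import pandas as pd

# Toy table: one matched row and two rows with no affiliation yet
demo = pd.DataFrame({
    "CNTRY_NAME": ["Albania", "Hong Kong", "Reunion"],
    "COUNTRYAFF": ["Albania", None, None],
})

# Hypothetical lookup of manually chosen affiliations
manual_aff = {"Hong Kong": "China", "Reunion": "France"}

# map() looks up each CNTRY_NAME; fillna() uses the result only where COUNTRYAFF is missing
demo["COUNTRYAFF"] = demo["COUNTRYAFF"].fillna(demo["CNTRY_NAME"].map(manual_aff))
print(demo["COUNTRYAFF"].tolist())
# ['Albania', 'China', 'France']
```

Because `fillna` never overwrites existing values, affiliations that came through the merge are left as they are.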

In [20]:
countries_w_aff2

Unnamed: 0,CNTRY_NAME,geometry,COUNTRY,ISO,COUNTRYAFF
0,Afghanistan,"POLYGON ((74.88986 37.23409, 74.88962 37.23314...",Afghanistan,AF,Afghanistan
1,Akrotiri and Dhekelia,"MULTIPOLYGON (((32.8388 34.70555, 32.84127 34....",,,United Kingdom
2,Albania,"MULTIPOLYGON (((20.0789 42.5558, 20.07939 42.5...",Albania,AL,Albania
3,Algeria,"MULTIPOLYGON (((8.64188 36.94206, 8.64196 36.9...",Algeria,DZ,Algeria
4,American Samoa,"MULTIPOLYGON (((-171.07753 -11.06622, -171.080...",American Samoa,AS,United States
5,Andorra,"POLYGON ((1.7258 42.5044, 1.71149 42.49224, 1....",Andorra,AD,Andorra
6,Angola,"MULTIPOLYGON (((13.10288 -4.68421, 13.10173 -4...",Angola,AO,Angola
7,Anguilla,"MULTIPOLYGON (((-63.42216 18.59739, -63.42672 ...",Anguilla,AI,United Kingdom
8,Antarctica,"MULTIPOLYGON (((-46.15775 -60.51078, -46.1787 ...",Antarctica,AQ,
9,Antigua and Barbuda,"MULTIPOLYGON (((-61.84592 17.72958, -61.83383 ...",Antigua and Barbuda,AG,Antigua and Barbuda


##### One thing I want to be able to do is quickly create a dataset of just primary countries, not including territories
##### We'd do this with a filter: primaryCountries = countries_w_aff2[countries_w_aff2['CNTRY_NAME'] == countries_w_aff2['COUNTRYAFF']]
##### But for this to work we need to clean up and standardize the names in 'COUNTRYAFF', which, again, involves some subjectivity
##### We are not correcting to exact/legal/official names, just matching standard names
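
The filter idea can be sketched on a toy table. Two assumptions here: the comparison uses CNTRY_NAME rather than COUNTRY, because COUNTRY is empty for rows that had no match in the merge, and `demo` is a made-up table whose values mirror rows shown in the outputs above.

```python
import pandas as pd

# Toy table: two sovereign entries, one territory, one row with no affiliation
demo = pd.DataFrame({
    "CNTRY_NAME": ["Albania", "Anguilla", "Kosovo", "Antarctica"],
    "COUNTRYAFF": ["Albania", "United Kingdom", "Kosovo", None],
})

# A row is "primary" when its own name equals its affiliation
is_primary = demo["CNTRY_NAME"] == demo["COUNTRYAFF"]
primary = demo[is_primary]
territories = demo[~is_primary]

print(primary["CNTRY_NAME"].tolist())
# ['Albania', 'Kosovo']
print(territories["CNTRY_NAME"].tolist())
# ['Anguilla', 'Antarctica']
```

Note that a missing COUNTRYAFF never compares equal, so a row like Antarctica lands in the territories group.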

In [44]:
# Bolivia
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Bolivia, Plurinational State of', 'COUNTRYAFF'] = 'Bolivia'
# Brunei
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Brunei Darussalam', 'COUNTRYAFF'] = 'Brunei'
# Cape Verde
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Cabo Verde', 'COUNTRYAFF'] = 'Cape Verde'
# DRC
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Congo, The Democratic Republic of the', 'COUNTRYAFF'] = 'Democratic Republic of the Congo'
# Ivory Coast
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == "Côte d'Ivoire", 'COUNTRYAFF'] = 'Ivory Coast'
# Czechia
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Czech Republic', 'COUNTRYAFF'] = 'Czechia'
# Iran
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Iran, Islamic Republic of', 'COUNTRYAFF'] = 'Iran'
# Laos
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == "Lao People's Democratic Republic", 'COUNTRYAFF'] = 'Laos'
# Micronesia
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Micronesia, Federated States of', 'COUNTRYAFF'] = 'Micronesia'
# Moldova
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Moldova, Republic of', 'COUNTRYAFF'] = 'Moldova'
# North Korea
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == "Korea, Democratic People's Republic of", 'COUNTRYAFF'] = 'North Korea'
# Palestine
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Palestine, State of', 'COUNTRYAFF'] = 'Palestine'
# Republic of the Congo
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Congo', 'COUNTRYAFF'] = 'Republic of the Congo'
# Russia
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Russian Federation', 'COUNTRYAFF'] = 'Russia'
# Sao Tome and Principe
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'São Tomé and Príncipe', 'COUNTRYAFF'] = 'Sao Tome and Principe'
# South Korea
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Korea, Republic of', 'COUNTRYAFF'] = 'South Korea'
# Syria
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Syrian Arab Republic', 'COUNTRYAFF'] = 'Syria'
# Tanzania
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Tanzania, United Republic of', 'COUNTRYAFF'] = 'Tanzania'
# Timor Leste
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Timor-Leste', 'COUNTRYAFF'] = 'Timor Leste'
# Turkey
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Turkiye', 'COUNTRYAFF'] = 'Turkey'
# Vatican City
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Holy See (Vatican City State)', 'COUNTRYAFF'] = 'Vatican City'
# Venezuela
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Venezuela, Bolivarian Republic of', 'COUNTRYAFF'] = 'Venezuela'
# Vietnam
countries_w_aff2.loc[countries_w_aff2['COUNTRYAFF'] == 'Viet Nam', 'COUNTRYAFF'] = 'Vietnam'

In [45]:
countries_w_aff2.head()

Unnamed: 0,CNTRY_NAME,geometry,COUNTRY,ISO,COUNTRYAFF
0,Afghanistan,"POLYGON ((74.88986 37.23409, 74.88962 37.23314...",Afghanistan,AF,Afghanistan
1,Akrotiri and Dhekelia,"MULTIPOLYGON (((32.8388 34.70555, 32.84127 34....",,,United Kingdom
2,Albania,"MULTIPOLYGON (((20.0789 42.5558, 20.07939 42.5...",Albania,AL,Albania
3,Algeria,"MULTIPOLYGON (((8.64188 36.94206, 8.64196 36.9...",Algeria,DZ,Algeria
4,American Samoa,"MULTIPOLYGON (((-171.07753 -11.06622, -171.080...",American Samoa,AS,United States


In [46]:
countries_w_aff2.tail(10)

Unnamed: 0,CNTRY_NAME,geometry,COUNTRY,ISO,COUNTRYAFF
239,Uzbekistan,"MULTIPOLYGON (((70.94322 42.26363, 70.94461 42...",Uzbekistan,UZ,Uzbekistan
240,Vanuatu,"MULTIPOLYGON (((166.54581 -13.07461, 166.55086...",Vanuatu,VU,Vanuatu
241,Vatican City,"POLYGON ((12.45547 41.90742, 12.4557 41.90631,...",Vatican City,VA,Vatican City
242,Venezuela,"MULTIPOLYGON (((-63.61687 15.66669, -63.61814 ...",Venezuela,VE,Venezuela
243,Vietnam,"MULTIPOLYGON (((107.99838 21.5465, 108.00004 2...",Vietnam,VN,Vietnam
244,Wallis and Futuna,"MULTIPOLYGON (((-176.20213 -13.1825, -176.2033...",Wallis and Futuna,WF,France
245,Western Sahara,"MULTIPOLYGON (((-8.66667 27.66877, -8.66667 27...",,,Morocco
246,Yemen,"MULTIPOLYGON (((53.10909 16.65008, 53.10647 16...",Yemen,YE,Yemen
247,Zambia,"POLYGON ((32.95964 -9.40068, 32.94946 -9.41174...",Zambia,ZM,Zambia
248,Zimbabwe,"POLYGON ((30.42184 -15.62117, 30.42186 -15.624...",Zimbabwe,ZW,Zimbabwe


In [47]:
# At this point, any row where 'CNTRY_NAME' != 'COUNTRYAFF' should be a territory
# Check for rows where 'CNTRY_NAME' != 'COUNTRYAFF'
# Look through this list to verify they are only sub-national entities
territories = countries_w_aff2[countries_w_aff2['CNTRY_NAME'] != countries_w_aff2['COUNTRYAFF']]

print("Rows where 'CNTRY_NAME' != 'COUNTRYAFF':")
print('Number of territories in dataset:', len(territories))
print(territories['CNTRY_NAME'])

Rows where 'CNTRY_NAME' != 'COUNTRYAFF':
Number of territories in dataset: 52
1                             Akrotiri and Dhekelia
4                                    American Samoa
7                                          Anguilla
8                                        Antarctica
12                                            Aruba
24                                          Bermuda
29                                    Bouvet Island
31                   British Indian Ocean Territory
32                           British Virgin Islands
41                            Caribbean Netherlands
42                                   Cayman Islands
47                                 Christmas Island
48                                    Cocos Islands
51                                     Cook Islands
70                                 Falkland Islands
71                                    Faroe Islands
75                                    French Guiana
76                                 Fre

In [48]:
# Looks good
# Do additional edits in ArcGIS Pro
countries_w_aff2.to_file("GIS Tutorials/Geog-312/geopandas_Files/checkpoint1/countriesMerged.shp", driver='ESRI Shapefile')
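
One thing worth checking before writing an ESRI Shapefile: its attribute table is stored in dBASE format, which limits field names to 10 characters, so longer names get shortened on save. A quick sketch of that check, using the column names from the output tables above (`cols` is typed in by hand here rather than read from the GeoDataFrame):

```python
# Column names as they appear in countries_w_aff2 (typed in for illustration)
cols = ["CNTRY_NAME", "geometry", "COUNTRY", "ISO", "COUNTRYAFF"]

# The geometry column is stored separately, so only attribute fields count
too_long = [c for c in cols if c != "geometry" and len(c) > 10]
print(too_long)
# []
```

An empty list means every field name fits; in the notebook the same test could be run on `countries_w_aff2.columns`.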
