# Removes Null Values for Geographic Data<br>

## Purpose
Takes in the .csv created in the 275-GDP_Null_Values.ipynb notebook and fills in the null values that exist within it for geography based columns

## Datasets
* .csv created in the 275-GDP_Null_Values.ipynb notebook

Imports necessary libraries

In [42]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os.path
from glob import glob

Loads the file into a dataframe

In [43]:
# Ensure the file exists
if not os.path.exists( r"..\..\data\prep\Countries\countries_275.csv"):
    print("Missing dataset file")
else:
    df = pd.read_csv(  r"..\..\data\prep\Countries\countries_275.csv" , encoding = "ISO-8859-1")
    print("File Read")

File Read


Prints the first 5 lines of the dataframe

In [44]:
df.head()

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
0,Afghanistan,1960,8996351.0,4649361.0,4346990.0,32.337561,537777800.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,13.921671,414.371,,,414.371,,AFG
1,Afghanistan,1964,9731361.0,4996990.0,4734371.0,34.101902,800000000.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,15.059084,839.743,,,839.743,,AFG
2,Afghanistan,1968,10604346.0,5419182.0,5185164.0,35.832415,1373333000.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,16.410011,1224.778,,,1224.778,,AFG
3,Afghanistan,1972,11721940.0,5967987.0,5753953.0,37.620171,1595555000.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,18.139465,1532.806,9170.59,2530.158,13233.554,,AFG
4,Afghanistan,1976,12840299.0,6524577.0,6315722.0,39.58539,2555556000.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,19.870103,1987.514,10535.6,3265.633,15788.747,,AFG


Counts the number of null values associated with elevation, if elevation is null then nulls are present for all of the geographic features

In [45]:
len(df[df.Elevation.isnull()])

315

Prints the countries with null values

In [46]:
df[df.Elevation.isnull()].Country.unique()

array(['Bahamas, The', 'Bahrain', 'Barbados', 'Bermuda', 'Fiji', 'Grenada',
       'Hong Kong SAR, China', 'Korea, Dem. People?s Rep.', 'Kosovo',
       'Liechtenstein', 'Mauritius', 'Samoa', 'Singapore', 'Tonga',
       'Virgin Islands (U.S.)'], dtype=object)

## Filling in null values<br>
Filling in the null elevation values using external sources<br>
All of these countries also lack area data so using the same sources we will input this

### Source of elevations:
<br> <b>https://www.cia.gov/library/publications/the-world-factbook/fields/print_2020.html</b><br>
<b>https://www.graphicmaps.com/bahamas</b><br>
For any countries lacking data on mean elevation I will find the average of their highest and lowest points of elevation<br>
Each of the following countries were filled in manually

* <b>Bahamas

In [47]:
elevation = 0
area = 13880.00
for i in df[df.Country == 'Bahamas, The'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Bahamas, The'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
105,"Bahamas, The",1960,109528.0,52385.0,57143.0,62.729049,169803900.0,Caribbean Islands,0.0,13880.0,,,,410.704,,,410.704,,BAH


* <b>Bahrain

In [48]:
elevation = 0
area = 760.00
for i in df[df.Country == 'Bahrain'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Bahrain'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
126,Bahrain,1960,162427.0,86961.0,75466.0,52.092439,1004431000.0,West and Central Asia,0.0,760.0,,,,575.719,,,575.719,,BRN


* <b>Barbados

In [49]:
elevation = 0
area = 430.00
for i in df[df.Country == 'Barbados'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Barbados'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
147,Barbados,1960,230939.0,103461.0,127478.0,60.738049,509597500.0,Caribbean Islands,0.0,430.0,,,,172.349,,,172.349,,BAR


* <b>Bermuda

In [50]:
elevation = 0
area = 54.00
for i in df[df.Country == 'Bermuda'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Bermuda'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
189,Bermuda,1960,44400.0,22184.0,22216.0,,84466650.0,Caribbean Islands,0.0,54.0,,,,157.681,,,157.681,,BER


* <b>Fiji

In [51]:
elevation = 0
area = 18274.00
for i in df[df.Country == 'Fiji'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Fiji'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
672,Fiji,1960,393386.0,202612.0,190774.0,55.958488,112328400.0,Oceania,0.0,18274.0,,,,194.351,,,194.351,,FIJ


* <b>Grenada

In [52]:
elevation = 0
area = 344.00
for i in df[df.Country == 'Grenada'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Grenada'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
819,Grenada,1960,89869.0,41696.0,48173.0,59.815927,881823.349342,Caribbean Islands,0.0,344.0,,,,22.002,,,22.002,,GRN


* <b>Hong Kong

In [53]:
elevation = 0
area = 1073.00
for i in df[df.Country == 'Hong Kong SAR, China'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Hong Kong SAR, China'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
882,"Hong Kong SAR, China",1960,3075605.0,1580690.0,1494915.0,66.961683,1320797000.0,East Asia,0.0,1073.0,,,,2955.602,,,2955.602,,HKG


* <b>Korea (North)

In [54]:
elevation = 600
area = 120408.00
for i in df[df.Country == 'Korea, Dem. People?s Rep.'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Korea, Dem. People?s Rep.'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1176,"Korea, Dem. People?s Rep.",1960,11424176.0,5279170.0,6145006.0,51.077171,18010260000.0,East Asia,600.0,120408.0,,,,,,,0.0,,PRK


* <b> Kosovo

In [55]:
elevation = 450
area = 10887.00
for i in df[df.Country == 'Kosovo'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Kosovo'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1218,Kosovo,1960,947000.0,447829.0,499171.0,,7153878000000000.0,Europe,450.0,10887.0,,,,,,,0.0,,KOS


* <b>Liechtenstein

In [56]:
elevation = (2599-430)/2
area = 160.00
for i in df[df.Country == 'Liechtenstein'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Liechtenstein'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1281,Liechtenstein,1960,16495.0,7800.0,8695.0,,23992000.0,Europe,1084.5,160.0,,,,,,,0.0,,LIE


* <b>Mauritius

In [57]:
elevation = 0
area = 2030.00
for i in df[df.Country == 'Mauritius'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Mauritius'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1344,Mauritius,1960,659351.0,328768.0,330583.0,58.74522,131489900.0,Sub-Saharan Africa,0.0,2030.0,,,,179.683,,,179.683,,MRI


* <b>Samoa

In [58]:
elevation = 0
area = 2821.00
for i in df[df.Country == 'Samoa'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Samoa'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1785,Samoa,1960,108646.0,55148.0,53498.0,49.969512,31481980.0,Oceania,0.0,2821.0,,,,14.668,,,14.668,,SAM


* <b>Singapore

In [59]:
elevation = 0
area = 687.00
for i in df[df.Country == 'Singapore'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Singapore'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1848,Singapore,1960,1646400.0,867319.0,779081.0,65.659829,704462300.0,South and Southeast Asia,0.0,687.0,,,,1393.46,,,1393.46,,SGP


* <b> Tonga

In [60]:
elevation = 0
area = 717.00
for i in df[df.Country == 'Tonga'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Tonga'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
2100,Tonga,1960,61601.0,31271.0,30330.0,61.365024,21947860.0,Oceania,0.0,717.0,,,,11.001,,,11.001,,TGA


* <b> US Virgin Islands

In [61]:
elevation = 0
area = 346.00
for i in df[df.Country == 'Virgin Islands (U.S.)'].index:
    df.loc[i,'Elevation'] = elevation
    df.loc[i,'Area_SqKM'] = area
df[df.Country == 'Virgin Islands (U.S.)'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
2331,Virgin Islands (U.S.),1960,32000.0,15864.0,16136.0,66.224854,24200000.0,South America,0.0,346.0,,,,,,,0.0,,ISV


# Checking that all values null values have been filled

Check that the area and elevation figures are not null

In [62]:
df.isnull().sum()

Country                      0
Year                         0
Population                   0
Males                        0
Females                      0
Life_Expectancy            174
GDP                        171
Region                       0
Elevation                    0
Area_SqKM                    0
Centroid_Longitude         315
Centroid_Latitude          315
Population_Density         315
CO2_Emissions              453
Methane_Emissions          729
Nitrous_Oxide_Emisions     729
Total_Emissions              0
Emmisions_per_Capita      2793
Code                         0
dtype: int64

# Dealing with null values for<br> Population Density

Now that all null values have been filled for area it is possible to evaluate the values for population density that are currently null

In [63]:
for i in df[df.Population_Density.isnull()].index:
    pop = df.loc[i].Population
    area = df.loc[i].Area_SqKM
    df.loc[i,'Population_Density'] = (pop/area)

Now to ensure that all population density values have been filled we now check that there is no more null values

In [64]:
df.isnull().sum()

Country                      0
Year                         0
Population                   0
Males                        0
Females                      0
Life_Expectancy            174
GDP                        171
Region                       0
Elevation                    0
Area_SqKM                    0
Centroid_Longitude         315
Centroid_Latitude          315
Population_Density           0
CO2_Emissions              453
Methane_Emissions          729
Nitrous_Oxide_Emisions     729
Total_Emissions              0
Emmisions_per_Capita      2793
Code                         0
dtype: int64

# Dealing with nulls for <br>centroid longitude and latitude

The next section of this notebook deals with establishing the correct values corresponding to the null values for longitude and latitude

In [65]:
df[df.Centroid_Longitude.isnull()].Country.unique()

array(['Bahamas, The', 'Bahrain', 'Barbados', 'Bermuda', 'Fiji', 'Grenada',
       'Hong Kong SAR, China', 'Korea, Dem. People?s Rep.', 'Kosovo',
       'Liechtenstein', 'Mauritius', 'Samoa', 'Singapore', 'Tonga',
       'Virgin Islands (U.S.)'], dtype=object)

Using an external data source I filled in the missing values for longitude and latitude

### Bahamas <br>
<b>Source - </b>https://www.latlong.net/place/the-bahamas-9095.html

In [66]:
long = -78.035889
lat = 25.025885
for i in df[df.Country == 'Bahamas, The'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Bahamas, The'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
105,"Bahamas, The",1960,109528.0,52385.0,57143.0,62.729049,169803900.0,Caribbean Islands,0.0,13880.0,-78.035889,25.025885,7.891066,410.704,,,410.704,,BAH


### Bahrain <br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [67]:
long = 50.55
lat = 26
for i in df[df.Country == 'Bahrain'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Bahrain'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
126,Bahrain,1960,162427.0,86961.0,75466.0,52.092439,1004431000.0,West and Central Asia,0.0,760.0,50.55,26.0,213.719737,575.719,,,575.719,,BRN


### Barbados <br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [68]:
long = -59.533333
lat = 13.183333
for i in df[df.Country == 'Barbados'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Barbados'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
147,Barbados,1960,230939.0,103461.0,127478.0,60.738049,509597500.0,Caribbean Islands,0.0,430.0,-59.533333,13.183333,537.067442,172.349,,,172.349,,BAR


### Bermuda<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [69]:
long = -64.75
lat = 32.333333
for i in df[df.Country == 'Bermuda'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Bermuda'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
189,Bermuda,1960,44400.0,22184.0,22216.0,,84466650.0,Caribbean Islands,0.0,54.0,-64.75,32.333333,822.222222,157.681,,,157.681,,BER


### Fiji<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [70]:
long = 175
lat = -18
for i in df[df.Country == 'Fiji'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Fiji'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
672,Fiji,1960,393386.0,202612.0,190774.0,55.958488,112328400.0,Oceania,0.0,18274.0,175.0,-18.0,21.527088,194.351,,,194.351,,FIJ


### Grenada<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [71]:
long = 12.116667
lat = -61.666667
for i in df[df.Country == 'Grenada'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Grenada'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
819,Grenada,1960,89869.0,41696.0,48173.0,59.815927,881823.349342,Caribbean Islands,0.0,344.0,12.116667,-61.666667,261.247093,22.002,,,22.002,,GRN


### Hong Kong SAR, China<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [72]:
long = 114.166667
lat = 22.25
for i in df[df.Country == 'Hong Kong SAR, China'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Hong Kong SAR, China'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
882,"Hong Kong SAR, China",1960,3075605.0,1580690.0,1494915.0,66.961683,1320797000.0,East Asia,0.0,1073.0,114.166667,22.25,2866.360671,2955.602,,,2955.602,,HKG


### Korea, Dem. People?s Rep.<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [73]:
long = 127
lat = 40
for i in df[df.Country == 'Korea, Dem. People?s Rep.'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Korea, Dem. People?s Rep.'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1176,"Korea, Dem. People?s Rep.",1960,11424176.0,5279170.0,6145006.0,51.077171,18010260000.0,East Asia,600.0,120408.0,127.0,40.0,94.878878,,,,0.0,,PRK


### Kosovo<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [74]:
long = 21
lat = 42.583333
for i in df[df.Country == 'Kosovo'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Kosovo'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1218,Kosovo,1960,947000.0,447829.0,499171.0,,7153878000000000.0,Europe,450.0,10887.0,21.0,42.583333,86.984477,,,,0.0,,KOS


### Liechtenstein<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [75]:
long = 9.533333
lat = 47.266667
for i in df[df.Country == 'Liechtenstein'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Liechtenstein'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1281,Liechtenstein,1960,16495.0,7800.0,8695.0,,23992000.0,Europe,1084.5,160.0,9.533333,47.266667,103.09375,,,,0.0,,LIE


### Mauritius<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [76]:
long = 57.55
lat = -20.283333
for i in df[df.Country == 'Mauritius'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Mauritius'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1344,Mauritius,1960,659351.0,328768.0,330583.0,58.74522,131489900.0,Sub-Saharan Africa,0.0,2030.0,57.55,-20.283333,324.803448,179.683,,,179.683,,MRI


### Samoa<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [77]:
long = -172.333333
lat = -13.583333
for i in df[df.Country == 'Samoa'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Samoa'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1785,Samoa,1960,108646.0,55148.0,53498.0,49.969512,31481980.0,Oceania,0.0,2821.0,-172.333333,-13.583333,38.513293,14.668,,,14.668,,SAM


### Singapore<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [78]:
long = 103.8
lat = 1.366667
for i in df[df.Country == 'Singapore'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Singapore'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
1848,Singapore,1960,1646400.0,867319.0,779081.0,65.659829,704462300.0,South and Southeast Asia,0.0,687.0,103.8,1.366667,2396.50655,1393.46,,,1393.46,,SGP


### Tonga<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [79]:
long = -175
lat = -20
for i in df[df.Country == 'Tonga'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Tonga'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
2100,Tonga,1960,61601.0,31271.0,30330.0,61.365024,21947860.0,Oceania,0.0,717.0,-175.0,-20.0,85.914923,11.001,,,11.001,,TGA


### Virgin Islands (U.S.)<br>
<b>Source - </b>https://www.indexmundi.com/factbook/fields/geographic-coordinates

In [80]:
long = -64.833333
lat = 18.333333
for i in df[df.Country == 'Virgin Islands (U.S.)'].index:
    df.loc[i,'Centroid_Longitude'] = long
    df.loc[i,'Centroid_Latitude'] = lat
df[df.Country == 'Virgin Islands (U.S.)'].head(1)

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
2331,Virgin Islands (U.S.),1960,32000.0,15864.0,16136.0,66.224854,24200000.0,South America,0.0,346.0,-64.833333,18.333333,92.485549,,,,0.0,,ISV


Now we establish that there are no null values remaining for any of the geographic features

In [81]:
df.isnull().sum()

Country                      0
Year                         0
Population                   0
Males                        0
Females                      0
Life_Expectancy            174
GDP                        171
Region                       0
Elevation                    0
Area_SqKM                    0
Centroid_Longitude           0
Centroid_Latitude            0
Population_Density           0
CO2_Emissions              453
Methane_Emissions          729
Nitrous_Oxide_Emisions     729
Total_Emissions              0
Emmisions_per_Capita      2793
Code                         0
dtype: int64

### Output

In [82]:
df.to_csv('../../data/prep/Countries/countries_300.csv', index=False)