# Get Elevation Range

## Purpose
Takes the .csv created in the 300-Geographical_Nulls.ipynb notebook and establishes the range of elevations (highest point minus lowest point) within each country.

## Datasets
* The .csv created in the 300-Geographical_Nulls.ipynb notebook (countries_300.csv)
* The elevation page of the CIA World Factbook, scraped below

Imports necessary libraries

In [39]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os.path
from glob import glob
import urllib.request
from bs4 import BeautifulSoup

Loads file into dataframe

In [40]:
# Ensure the file exists before loading it.
# Forward slashes resolve on Windows, macOS and Linux, and match the output path used at the end.
path = "../../data/prep/Countries/countries_300.csv"
if not os.path.exists(path):
    print("Missing dataset file")
else:
    df = pd.read_csv(path, encoding="ISO-8859-1")
    print("File Read")

File Read
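
A path can also be assembled from its parts with `pathlib`, which avoids hard-coding a separator. This is a minimal sketch that assumes the same relative folder layout; `describe_path` is a hypothetical helper, not part of the notebook.

```python
from pathlib import Path

# Same relative location as the load cell, built from parts so no separator is hard-coded.
DATA_PATH = Path("..") / ".." / "data" / "prep" / "Countries" / "countries_300.csv"

def describe_path(path: Path) -> str:
    """Return a short status message for a dataset path (hypothetical helper)."""
    return "File found" if path.exists() else "Missing dataset file"

print(describe_path(DATA_PATH))
```

`pd.read_csv` accepts a `Path` object directly, so `DATA_PATH` could be passed to it unchanged.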


Prints the first 5 lines of the dataframe

In [41]:
df.head()

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,Centroid_Longitude,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code
0,Afghanistan,1960,8996351.0,4649361.0,4346990.0,32.337561,537777800.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,13.921671,414.371,,,414.371,,AFG
1,Afghanistan,1964,9731361.0,4996990.0,4734371.0,34.101902,800000000.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,15.059084,839.743,,,839.743,,AFG
2,Afghanistan,1968,10604346.0,5419182.0,5185164.0,35.832415,1373333000.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,16.410011,1224.778,,,1224.778,,AFG
3,Afghanistan,1972,11721940.0,5967987.0,5753953.0,37.620171,1595555000.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,18.139465,1532.806,9170.59,2530.158,13233.554,,AFG
4,Afghanistan,1976,12840299.0,6524577.0,6315722.0,39.58539,2555556000.0,West and Central Asia,1884.71,646212.0,66.1685,33.78231,19.870103,1987.514,10535.6,3265.633,15788.747,,AFG


## Scrapes Elevation Data 

The CIA World Factbook lists the mean elevation, lowest point, and highest point of each country, so we scrape this data from its page.

In [42]:
url='https://www.cia.gov/library/publications/the-world-factbook/fields/print_2020.html'

Opens the page and extracts all table rows

In [43]:
with urllib.request.urlopen(url) as response:
    page = response.read()
soup = BeautifulSoup(page, 'html.parser')
links = soup.find_all("tr")

Removes the first row of the table, which only contains the column titles, and creates an empty dataframe to hold the elevation data

In [44]:
links = links[1:]
eldf = pd.DataFrame(columns=['Country','Mean_Elevation','Lowest_Point','Highest_Point'])
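
The row extraction can be tried offline on a small HTML string. This is a sketch only: `SAMPLE_HTML` and the country "Exampleland" are made up, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Country</th><th>Elevation</th></tr>
  <tr><td>Exampleland</td><td>mean elevation: 1,200 m
lowest point: 10 m
highest point: 3,000 m</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = soup.find_all("tr")   # every table row, header included
data_rows = rows[1:]         # drop the header row, which holds only titles

for row in data_rows:
    cells = row.find_all("td")
    country = cells[0].get_text()
    # One non-empty line per elevation field
    lines = [ln.strip() for ln in cells[1].get_text().split("\n") if ln.strip()]
    print(country, lines)
```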

Goes through each country's information and extracts the data. Prints an error if the format of the information differs from the norm

In [45]:
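for i in range(len(links)):
    # First cell holds the country name, second cell holds the elevation text
    country = links[i].find("td").get_text()
    text = links[i].find_all("td")[1].get_text().split("\n")
    vals = []
    for line in text:
        if not line == "":
            # Strip thousands separators, then keep tokens that look like (possibly negative) integers
            nums = [float(s) for s in line.replace(",",'').split() if s.isdigit() or (s[0] == '-' and len(s) > 1)]
            if len(nums) == 0:
                vals.append(np.nan)  # no number found on this line
            else:
                vals.append(nums[0])  # first number on the line
    # Expect exactly three values: mean elevation, lowest point, highest point
    if len(vals) != 3:
        print(country+" Error in Format")
    else:
        row = [country, vals[0], vals[1], vals[2]]
        eldf.loc[len(eldf)] = row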

Antarctica Error in Format
Colombia Error in Format
Ecuador Error in Format
France Error in Format
Jan Mayen Error in Format
Netherlands Error in Format
United States Pacific Island Wildlife Refuges Error in Format
United States Error in Format
World Error in Format
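
The token filter in the loop passes any token that starts with `-` to `float()`, so a token such as a hyphen-prefixed word would raise a `ValueError`. A more defensive variant can be sketched with a regular expression. `first_number` is a hypothetical helper, and the sample lines are illustrative rather than copied from the page.

```python
import math
import re

# An optional minus sign followed by digits, matched against the whole token.
_INT_TOKEN = re.compile(r"-?\d+")

def first_number(line: str) -> float:
    """Return the first integer-like token in `line` as a float, or NaN if there is none."""
    for token in line.replace(",", "").split():
        if _INT_TOKEN.fullmatch(token):
            return float(token)
    return math.nan

# Illustrative lines in the style the loop expects.
print(first_number("mean elevation: 1,884 m"))
print(first_number("lowest point: Some Lake -40 m"))
print(first_number("highest point: -unnamed peak"))
```

The last call returns NaN instead of raising, because `-unnamed` is not a whole-token match.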


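Deals with the entries that caused errors. Most of these countries didn't follow the conventional format on the page, so their values are entered manually. The remaining entries (Antarctica, Jan Mayen, United States Pacific Island Wildlife Refuges and World) do not appear in the country dataset and are left out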

In [46]:
eldf.loc[len(eldf)] = ['Colombia', 593, 0, 5730]
eldf.loc[len(eldf)] = ['Ecuador', 1117, 0, 6267]
eldf.loc[len(eldf)] = ['France', 375, -2, 4810]
eldf.loc[len(eldf)] = ['Netherlands', 30, -7, 322]
eldf.loc[len(eldf)] = ['United States', 760, -86, 6190]

Prints the first 5 lines of the dataframe

In [47]:
eldf.head()

Unnamed: 0,Country,Mean_Elevation,Lowest_Point,Highest_Point
0,Afghanistan,1884.0,258.0,7492.0
1,Albania,708.0,0.0,2764.0
2,Algeria,800.0,-40.0,2908.0
3,American Samoa,,0.0,964.0
4,Andorra,1996.0,840.0,2946.0


Prints the countries not included in the elevation dataframe

In [48]:
temp = list(eldf.Country)
df[~df.Country.isin(temp)].Country.unique()

array(['Egypt, Arab Rep.', 'Hong Kong SAR, China', 'Iran, Islamic Rep.',
       'Korea, Dem. People?s Rep.', 'Korea, Rep.', 'Syrian Arab Republic',
       'Venezuela, RB', 'Virgin Islands (U.S.)', 'Czech Republic',
       'Kyrgyz Republic', 'Macedonia, FYR', 'Russian Federation',
       'Slovak Republic'], dtype=object)
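
The same `isin` check can be tried on two toy frames to see how naming differences show up. This is a sketch with made-up frames standing in for `df` and `eldf`.

```python
import pandas as pd

# Toy frames standing in for the country data and the scraped elevation data.
countries = pd.DataFrame(
    {"Country": ["Egypt, Arab Rep.", "Albania", "Albania", "Russian Federation"]}
)
elevations = pd.DataFrame({"Country": ["Egypt", "Albania", "Russia"]})

# Names present in the country data but absent from the elevation data.
mask = ~countries["Country"].isin(elevations["Country"])
missing = countries.loc[mask, "Country"].unique()
print(sorted(missing))
```

Only exact string matches count, which is why "Egypt" and "Egypt, Arab Rep." are treated as different countries.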

Changes the country names so that the data in the elevation table matches that of the country dataset

In [49]:
eldf.loc[eldf[eldf.Country == 'Egypt'].index[0],'Country'] = 'Egypt, Arab Rep.'
eldf.loc[eldf[eldf.Country == 'Hong Kong'].index[0],'Country'] = 'Hong Kong SAR, China'
eldf.loc[eldf[eldf.Country == 'Korea, North'].index[0],'Country'] = 'Korea, Dem. People?s Rep.'
eldf.loc[eldf[eldf.Country == 'Iran'].index[0],'Country'] = 'Iran, Islamic Rep.'
eldf.loc[eldf[eldf.Country == 'Korea, South'].index[0],'Country'] = 'Korea, Rep.'
eldf.loc[eldf[eldf.Country == 'Kyrgyzstan'].index[0],'Country'] = 'Kyrgyz Republic'
eldf.loc[eldf[eldf.Country == 'Macedonia'].index[0],'Country'] = 'Macedonia, FYR'
eldf.loc[eldf[eldf.Country == 'Russia'].index[0],'Country'] = 'Russian Federation'
eldf.loc[eldf[eldf.Country == 'Slovakia'].index[0],'Country'] = 'Slovak Republic'
eldf.loc[eldf[eldf.Country == 'Syria'].index[0],'Country'] = 'Syrian Arab Republic'
eldf.loc[eldf[eldf.Country == 'Venezuela'].index[0],'Country'] = 'Venezuela, RB'
eldf.loc[eldf[eldf.Country == 'Virgin Islands'].index[0],'Country'] = 'Virgin Islands (U.S.)'
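
The per-country assignments can also be written as a single mapping passed to `Series.replace`. A minimal sketch on a toy frame, using two of the renames from the cell above. The explicit check is an assumption added here to keep the fail-loudly behaviour of `.index[0]`, since `replace` silently ignores names it cannot find.

```python
import pandas as pd

# Toy elevation frame with only the Country column.
elev = pd.DataFrame({"Country": ["Egypt", "Russia", "Albania"]})

# Scraped name -> name used in the country dataset.
name_map = {"Egypt": "Egypt, Arab Rep.", "Russia": "Russian Federation"}

# Fail loudly if a source name is absent from the frame.
unknown = set(name_map) - set(elev["Country"])
if unknown:
    raise KeyError(f"names not found: {sorted(unknown)}")

elev["Country"] = elev["Country"].replace(name_map)
print(list(elev["Country"]))
```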

After renaming, only one country in the dataset has no elevation data on the page

In [50]:
temp = list(eldf.Country)
df[~df.Country.isin(temp)].Country.unique()

array(['Czech Republic'], dtype=object)

Manual Entry of <b>Czech Republic</b><br>
<b>Source</b> - https://www.worldatlas.com/webimage/countrys/europe/czechrepublic/czland.htm

In [51]:
eldf.loc[len(eldf)] = ['Czech Republic', np.nan, 115,1603]

In [52]:
temp = list(eldf.Country)
df[~df.Country.isin(temp)].Country.unique()

array([], dtype=object)

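Counts the missing values in each column. Mean_Elevation is missing for 85 entries, and the country dataset already has an Elevation column, so Mean_Elevation is dropped in the next step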

In [53]:
eldf.isnull().sum()

Country            0
Mean_Elevation    85
Lowest_Point       0
Highest_Point      0
dtype: int64

Selects the relevant columns from the elevation dataframe, leaving out Mean_Elevation

In [54]:
eldf = eldf[['Country','Lowest_Point','Highest_Point']]
eldf.head()

Unnamed: 0,Country,Lowest_Point,Highest_Point
0,Afghanistan,258.0,7492.0
1,Albania,0.0,2764.0
2,Algeria,-40.0,2908.0
3,American Samoa,0.0,964.0
4,Andorra,840.0,2946.0


Merges the two tables on the Country column with a left join, so every row of the country dataset is kept

In [55]:
new_df = pd.merge(df, eldf, how='left', on='Country')
new_df.head()

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,...,Centroid_Latitude,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code,Lowest_Point,Highest_Point
0,Afghanistan,1960,8996351.0,4649361.0,4346990.0,32.337561,537777800.0,West and Central Asia,1884.71,646212.0,...,33.78231,13.921671,414.371,,,414.371,,AFG,258.0,7492.0
1,Afghanistan,1964,9731361.0,4996990.0,4734371.0,34.101902,800000000.0,West and Central Asia,1884.71,646212.0,...,33.78231,15.059084,839.743,,,839.743,,AFG,258.0,7492.0
2,Afghanistan,1968,10604346.0,5419182.0,5185164.0,35.832415,1373333000.0,West and Central Asia,1884.71,646212.0,...,33.78231,16.410011,1224.778,,,1224.778,,AFG,258.0,7492.0
3,Afghanistan,1972,11721940.0,5967987.0,5753953.0,37.620171,1595555000.0,West and Central Asia,1884.71,646212.0,...,33.78231,18.139465,1532.806,9170.59,2530.158,13233.554,,AFG,258.0,7492.0
4,Afghanistan,1976,12840299.0,6524577.0,6315722.0,39.58539,2555556000.0,West and Central Asia,1884.71,646212.0,...,33.78231,19.870103,1987.514,10535.6,3265.633,15788.747,,AFG,258.0,7492.0
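
A left merge like this one can be made self-checking with the `validate` and `indicator` options of `pd.merge`. This is a sketch on toy frames; the two elevation rows reuse the Afghanistan and Albania values shown earlier, and the frame names are hypothetical.

```python
import pandas as pd

# Several yearly rows per country on the left, one elevation row per country on the right.
left = pd.DataFrame(
    {"Country": ["Afghanistan", "Afghanistan", "Albania"], "Year": [1960, 1964, 1960]}
)
right = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania"],
        "Lowest_Point": [258.0, 0.0],
        "Highest_Point": [7492.0, 2764.0],
    }
)

# validate raises MergeError if a country appears more than once on the right;
# indicator adds a _merge column recording where each row found a match.
merged = pd.merge(
    left, right, how="left", on="Country", validate="many_to_one", indicator=True
)
unmatched = merged[merged["_merge"] != "both"]
merged = merged.drop(columns="_merge")
print(merged)
```

An empty `unmatched` frame confirms that every country row picked up elevation values.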


Creates an Elevation_Range column holding the difference between each country's highest and lowest points

In [56]:
new_df['Elevation_Range'] = new_df['Highest_Point']-new_df['Lowest_Point']
new_df.head()

Unnamed: 0,Country,Year,Population,Males,Females,Life_Expectancy,GDP,Region,Elevation,Area_SqKM,...,Population_Density,CO2_Emissions,Methane_Emissions,Nitrous_Oxide_Emisions,Total_Emissions,Emmisions_per_Capita,Code,Lowest_Point,Highest_Point,Elevation_Range
0,Afghanistan,1960,8996351.0,4649361.0,4346990.0,32.337561,537777800.0,West and Central Asia,1884.71,646212.0,...,13.921671,414.371,,,414.371,,AFG,258.0,7492.0,7234.0
1,Afghanistan,1964,9731361.0,4996990.0,4734371.0,34.101902,800000000.0,West and Central Asia,1884.71,646212.0,...,15.059084,839.743,,,839.743,,AFG,258.0,7492.0,7234.0
2,Afghanistan,1968,10604346.0,5419182.0,5185164.0,35.832415,1373333000.0,West and Central Asia,1884.71,646212.0,...,16.410011,1224.778,,,1224.778,,AFG,258.0,7492.0,7234.0
3,Afghanistan,1972,11721940.0,5967987.0,5753953.0,37.620171,1595555000.0,West and Central Asia,1884.71,646212.0,...,18.139465,1532.806,9170.59,2530.158,13233.554,,AFG,258.0,7492.0,7234.0
4,Afghanistan,1976,12840299.0,6524577.0,6315722.0,39.58539,2555556000.0,West and Central Asia,1884.71,646212.0,...,19.870103,1987.514,10535.6,3265.633,15788.747,,AFG,258.0,7492.0,7234.0
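
The subtraction is worth checking on a few rows, since a lowest point below sea level widens the range and a missing value propagates as NaN. A small sketch: the Afghanistan and Algeria values come from the elevation table above, and "Nowhere" is a made-up row with a missing lowest point.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Algeria", "Nowhere"],  # "Nowhere" is hypothetical
        "Lowest_Point": [258.0, -40.0, np.nan],
        "Highest_Point": [7492.0, 2908.0, 100.0],
    }
)

# Column-wise subtraction: 7492 - 258 = 7234 and 2908 - (-40) = 2948; NaN stays NaN.
sample["Elevation_Range"] = sample["Highest_Point"] - sample["Lowest_Point"]
print(sample)
```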


#### Outputs

In [57]:
new_df.to_csv('../../data/prep/Countries/countries_325.csv', index=False)