# Big Mountains in North America

We want to list the mountains in North America with a prominence of at least 2000 m and an isolation of at least 100 km.

## Preparations

Import necessary packages.

In [1]:
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd
pd.options.mode.chained_assignment = None  # silence pandas' SettingWithCopyWarning
import matplotlib
import matplotlib.pyplot as plt

%matplotlib inline

Set the URL of the Wikipedia page that lists the ultras of North America.

In [2]:
URL = "https://en.wikipedia.org/wiki/List_of_Ultras_of_North_America"

Set the request headers (user agent etc.).

In [3]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/109.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}

## Getting the data from the Wikipedia page

Get the HTML code of the webpage.

In [4]:
page = requests.get(URL, headers = headers)
soup1 = BeautifulSoup(page.content, "html.parser")
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")

Let's see if we have the right page.

In [5]:
print(soup2.title.string)


   List of Ultras of North America - Wikipedia
  


Now we find the data that we are looking for, namely the table of ultras in North America.

In [6]:
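from io import StringIO

# The ultras table is the fourth-from-last <table> on the page; this index breaks if the page layout changes.
NAultrashtml = soup2.find_all("table")[-4]
# Wrap the HTML in StringIO: newer pandas versions deprecate passing literal HTML strings to read_html.
NAultrasdf = pd.read_html(StringIO(str(NAultrashtml)))[0]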
NAultrasdf = NAultrasdf.drop("Rank", axis = 1)
display(NAultrasdf.head(5))
print(f'There are {NAultrasdf["Isolation"].isna().sum()} NaNs in the Isolation column.')

,Mountain peak,Region,Mountain range,Elevation,Prominence,Isolation,Location
0,Denali [l],Alaska,Alaska Range,"20,310 ft","20,146 ft",,".mw-parser-output .geo-default,.mw-parser-outp..."
1,Mount Logan [m],Yukon,Saint Elias Mountains,"19,541 ft","17,215 ft",387 mi,60°34′02″N 140°24′20″W ﻿ / ﻿ 60.5671°N 140....
2,Pico de Orizaba [n] ( Citlaltépetl ),Puebla Veracruz,Cordillera Neovolcanica,"18,491 ft","16,148 ft",,19°01′50″N 97°16′11″W ﻿ / ﻿ 19.0305°N 97.26...
3,Mount Rainier [o] [p],Washington,Cascade Range,"14,417 ft","13,210 ft",,46°51′10″N 121°45′37″W ﻿ / ﻿ 46.8529°N 121....
4,Volcán Tajumulco [q],Guatemala,Sierra de las Nubes,"13,845 ft","13,091 ft",448 mi,15°02′35″N 91°54′13″W ﻿ / ﻿ 15.0430°N 91.90...


There are 8 NaNs in the Isolation column.


## Data cleaning

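The Isolation column contains 8 NaNs, few enough to fill in by hand with values in km. We also clean up the other columns: strip footnote markers and alternative names from the peak names, convert elevation and prominence from feet to meters and isolation from miles to kilometers, and reduce the location to decimal coordinates.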
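
A manual fill that relies on the order of the rows is easy to get wrong if the table changes. The following is a minimal sketch of an alternative that keys the hand-entered values by peak name. The miniature frame `sample`, the dictionary `manual_km` and the constant `KM_PER_MILE` are hypothetical names introduced for illustration; the two isolation values are the first two used in this notebook.

```python
import pandas as pd

# Hypothetical miniature of the scraped table: isolation in miles, some entries missing.
sample = pd.DataFrame({
    "Mountain peak": ["Denali", "Mount Logan", "Pico de Orizaba"],
    "Isolation": [None, "387 mi", None],
})

# Hand-entered isolations in km, keyed by peak name instead of row position.
manual_km = {"Denali": 7450.24, "Pico de Orizaba": 2690.14}

KM_PER_MILE = 1.609344  # exact, by definition of the international mile

# Convert the scraped "387 mi" strings to km; missing entries stay NaN.
miles = pd.to_numeric(
    sample["Isolation"].str.replace(" mi", "", regex=False), errors="coerce"
)
scraped_km = miles * KM_PER_MILE

# Fill the gaps by name, then round to whole kilometers.
sample["Isolation in km"] = (
    scraped_km.fillna(sample["Mountain peak"].map(manual_km)).round().astype(int)
)
print(sample[["Mountain peak", "Isolation in km"]])
```

Because this sketch skips the km to miles to km round trip, its results can differ by a kilometer from the values the notebook computes.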

In [7]:
NAultrasdf = NAultrasdf.astype({"Mountain peak": "string", "Region": "string", "Mountain range": "string"})

# Strip footnote markers such as "[l]" and alternative names in parentheses from the peak names.
for i in range(len(NAultrasdf["Mountain peak"])):
    for char in ["[", "("]:
        name = NAultrasdf["Mountain peak"][i]
        if char in name:
            NAultrasdf.loc[i, "Mountain peak"] = name[0: name.find(char)-2]

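# Row labels of the peaks whose isolation is missing.
missing = NAultrasdf.index[NAultrasdf["Isolation"].isna()].tolist()

km_to_mi = 1/1.60924  # approximate; the exact factor is 1/1.609344
ft_to_m = 0.3048

# Isolation values in km for those peaks, entered by hand in table order.
isolation_km = [7450.24, 2690.14, 1176.72, 3254.13, 2649.47, 1079.15, 1318.95, 1913.49]

# Store them as "<miles> mi" strings so they match the scraped entries.
for i, km in zip(missing, isolation_km):
    NAultrasdf.loc[i, "Isolation"] = f"{round(km*km_to_mi)} mi"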

# Convert strings such as "20,310 ft" to whole meters.
for col in ["Elevation","Prominence"]:
    for i in range(len(NAultrasdf[col])):
        text = NAultrasdf[col][i]
        NAultrasdf.loc[i, col] = round(float("".join((text[0:-7], text[-6:-3])))*ft_to_m)

# Convert strings such as "387 mi" to whole kilometers.
for i in range(len(NAultrasdf["Isolation"])):
    NAultrasdf.loc[i, "Isolation"] = round(float(NAultrasdf["Isolation"][i][0:-3])/km_to_mi)



NAultrasdf.rename(columns = {"Elevation": "Elevation in m", "Prominence": "Prominence in m", "Isolation": "Isolation in km"}, inplace = True)

for i in range(len(NAultrasdf["Location"])):
    # The last two space-separated tokens are the decimal coordinates, e.g. "60.5671°N" and "140.4055°W".
    lat, lon = str(NAultrasdf["Location"][i]).split(" ")[-2:]
    # Drop the trailing "°N"/"°W". This assumes north latitude and west longitude, so the longitude is negated.
    NAultrasdf.loc[i, "Location"] = f"({lat[0:-2]}, -{lon[0:-2]})"

display(NAultrasdf)

,Mountain peak,Region,Mountain range,Elevation in m,Prominence in m,Isolation in km,Location
0,Denali,Alaska,Alaska Range,6190,6141,7451,"(63.0690, -151.0063)"
1,Mount Logan,Yukon,Saint Elias Mountains,5956,5247,623,"(60.5671, -140.4055)"
2,Pico de Orizaba,Puebla Veracruz,Cordillera Neovolcanica,5636,4922,2691,"(19.0305, -97.2698)"
3,Mount Rainier,Washington,Cascade Range,4394,4026,1176,"(46.8529, -121.7604)"
4,Volcán Tajumulco,Guatemala,Sierra de las Nubes,4220,3990,721,"(15.0430, -91.9037)"
...,...,...,...,...,...,...,...
348,Mount Joffre,Alberta British Columbia,Canadian Rockies,3433,1505,49,"(50.5285, -115.2069)"
349,Sierra de Agalta High Point,Honduras,Sierra de Agalta,2335,1505,122,"(14.9576, -85.9165)"
350,Kitlope Peak,British Columbia,Coast Mountains,1950,1505,16,"(53.0381, -127.6414)"
351,Robertson Peak,British Columbia,Coast Mountains,2252,1502,16,"(49.6460, -122.2502)"
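
The slicing in the cleaning cell depends on the exact position of each character in the scraped strings. As a hedged alternative, the conversions can be written as small functions based on regular expressions. The function names and sample strings below are hypothetical and chosen for illustration, and the sketch uses the exact factor of 1.609344 km per mile instead of the notebook's `1.60924`.

```python
import re

FT_TO_M = 0.3048        # exact: 1 ft = 0.3048 m
KM_PER_MILE = 1.609344  # exact: 1 mi = 1.609344 km


def feet_text_to_meters(text):
    """Convert a string such as '20,310 ft' to whole meters."""
    feet = float(re.sub(r"[^\d.]", "", text))  # keep digits and the decimal point only
    return round(feet * FT_TO_M)


def miles_text_to_km(text):
    """Convert a string such as '387 mi' to whole kilometers."""
    miles = float(re.sub(r"[^\d.]", "", text))
    return round(miles * KM_PER_MILE)


def decimal_coordinates(text):
    """Extract (latitude, longitude) from the decimal part of a coordinate string."""
    match = re.search(r"(\d+(?:\.\d+)?)°([NS])\s+(\d+(?:\.\d+)?)°([EW])", text)
    if match is None:
        raise ValueError(f"no decimal coordinates found in {text!r}")
    lat, ns, lon, ew = match.groups()
    latitude = float(lat) if ns == "N" else -float(lat)
    longitude = float(lon) if ew == "E" else -float(lon)
    return latitude, longitude


print(feet_text_to_meters("20,310 ft"))
print(miles_text_to_km("387 mi"))
print(decimal_coordinates("60°34′02″N 140°24′20″W / 60.5671°N 140.4055°W"))
```

Functions like these could be applied to a whole column with `Series.map`, for example `NAultrasdf["Elevation"].map(feet_text_to_meters)`. This has not been checked against the live page, where the cell strings may contain extra characters.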


## Selecting our mountains

We are interested in mountains with a prominence of at least 2000 m and an isolation of at least 100 km, so let's pick those out.

In [8]:
NAbigmountainsdf = NAultrasdf[(NAultrasdf["Prominence in m"] >= 2000) & (NAultrasdf["Isolation in km"] >= 100)].set_index("Mountain peak")
print(f'There are {len(NAbigmountainsdf["Region"])} mountains in North America with a prominence of at least 2000 m and an isolation of at least 100 km, and they are listed in the table below.')
display(NAbigmountainsdf)

There are 57 mountains in North America with a prominence of at least 2000 m and an isolation of at least 100 km, and they are listed in the table below.


Mountain peak,Region,Mountain range,Elevation in m,Prominence in m,Isolation in km,Location
Denali,Alaska,Alaska Range,6190,6141,7451,"(63.0690, -151.0063)"
Mount Logan,Yukon,Saint Elias Mountains,5956,5247,623,"(60.5671, -140.4055)"
Pico de Orizaba,Puebla Veracruz,Cordillera Neovolcanica,5636,4922,2691,"(19.0305, -97.2698)"
Mount Rainier,Washington,Cascade Range,4394,4026,1176,"(46.8529, -121.7604)"
Volcán Tajumulco,Guatemala,Sierra de las Nubes,4220,3990,721,"(15.0430, -91.9037)"
Mount Fairweather,Alaska British Columbia,Saint Elias Mountains,4671,3961,200,"(58.9064, -137.5265)"
Chirripó Grande,Costa Rica,Cordillera de Talamanca,3819,3755,879,"(9.4843, -83.4889)"
Gunnbjørn Fjeld,Greenland,Island of Greenland,3694,3694,3254,"(68.9184, -29.8991)"
Mount Hayes,Alaska,Alaska Range,4216,3507,202,"(63.6203, -146.7178)"
Mount Waddington,British Columbia,Coast Mountains,4019,3289,562,"(51.3737, -125.2636)"


## Visualisations

We make the visualisations in Tableau Public, so we export the table as a CSV file that Tableau can read.

In [9]:
NAbigmountainsdf.to_csv("NAbigmountains.csv")
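
The `Location` column stores both coordinates in a single string, while mapping tools generally work more easily with separate numeric latitude and longitude columns. The following is a minimal sketch of that split, using only the standard library and two rows copied from the table above (reduced to three columns). The variable names and the `Latitude`/`Longitude` column names are assumptions made for illustration.

```python
import ast
import csv
import io

# Two rows from the table above, in the quoted layout a CSV export produces.
sample_csv = '''Mountain peak,Region,Location
Denali,Alaska,"(63.0690, -151.0063)"
Mount Logan,Yukon,"(60.5671, -140.4055)"
'''

rows = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    # "(63.0690, -151.0063)" is a valid Python tuple literal, so literal_eval parses it safely.
    latitude, longitude = ast.literal_eval(row.pop("Location"))
    row["Latitude"] = latitude
    row["Longitude"] = longitude
    rows.append(row)

# Write the reshaped rows back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Mountain peak", "Region", "Latitude", "Longitude"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```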

Check out [this Tableau Public](https://public.tableau.com/app/profile/arjan.van.denzen/viz/BigMountainsNorthAmerica/Sheet1) page for a map of these mountains and two other visualisations of the same data.

The data used in this notebook are collected and adapted from a Wikipedia page, so they fall under [this license](https://creativecommons.org/licenses/by-sa/3.0/). That means that this notebook also falls under the same license. 