### Maven Everest Challenge

Use data storytelling to visualize the evolution of mankind's pursuit of the world's highest peak.

#### Challenge Objective:

For the Maven Everest Challenge, you’ll play the role of a data journalist tasked with telling the story of mankind’s quest to conquer Mount Everest. Using real expedition data, your goal is to craft a compelling visual narrative that highlights key milestones, shifting strategies, and the climbers who dared to reach the top of the world.

#### About The Data Set:

This dataset, based on the archives of Elizabeth Hawley, provides a comprehensive record of mountaineering expeditions in the Nepalese Himalaya, spanning from 1905 to 2024. It includes detailed information on 89,000+ members across 11,000+ expeditions and 480 mountain peaks, including dates, successes, and significant events.

In [1]:
# importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# show full output
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 1000)

In [2]:
# read all the datasets
# raw strings keep the Windows backslashes from being read as escape sequences
expedition = pd.read_csv(r'E:\Maven_Analytics_Challenges\Everest_Challenge\HimalayanExpeditions\exped.csv')
members = pd.read_csv(r'E:\Maven_Analytics_Challenges\Everest_Challenge\HimalayanExpeditions\members.csv')
peaks = pd.read_csv(r'E:\Maven_Analytics_Challenges\Everest_Challenge\HimalayanExpeditions\peaks.csv')
reference = pd.read_csv('E:\\Maven_Analytics_Challenges\\Everest_Challenge\\HimalayanExpeditions\\refer.csv', encoding="ISO-8859-1")

In [3]:
print('Expedition shape:', expedition.shape)
print('Members shape:', members.shape)
print('Peaks shape:', peaks.shape)
print('Reference shape:', reference.shape)

Expedition shape: (11425, 65)
Members shape: (89000, 61)
Peaks shape: (480, 23)
Reference shape: (15586, 12)


In [4]:
# let's join the members and expedition dataframes on expid
members_expedition = pd.merge(members, expedition, on='expid', how='left')

In [5]:
members_expedition.shape

(89089, 125)
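
The merged frame has 89,089 rows although `members` reports 89,000, which suggests that some `expid` values may occur more than once in `expedition`. A minimal sketch of how that could be checked, using toy frames (the names `members_toy` and `exped_toy` and all values are made up for illustration and are not taken from the dataset):

```python
import pandas as pd

# Toy stand-ins for the real frames; values are illustrative only.
members_toy = pd.DataFrame({"expid": ["A1", "A1", "B2"], "fname": ["x", "y", "z"]})
exped_toy = pd.DataFrame({"expid": ["A1", "B2", "B2"], "year": [1950, 1924, 2024]})

# Keys that appear more than once on the right-hand side of the join.
dup_keys = exped_toy.loc[exped_toy["expid"].duplicated(keep=False), "expid"].unique()

# A left join repeats each member row once per matching expedition row.
merged_toy = members_toy.merge(exped_toy, on="expid", how="left")

# validate="many_to_one" makes pandas raise instead of silently adding rows.
try:
    members_toy.merge(exped_toy, on="expid", how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
```

If the key does turn out to be non-unique, joining on an additional column that disambiguates it would be one option; which column is suitable depends on the data.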

In [6]:
# let's check whether the values in peakid_x and peakid_y are the same
# (vectorized comparison; prints nothing when every row matches)
mismatch = members_expedition.loc[
    members_expedition['peakid_x'] != members_expedition['peakid_y'],
    ['peakid_x', 'peakid_y']
]
for x, y in mismatch.itertuples(index=False):
    print('Not same:', x, y)

In [7]:
# drop the column peakid_y
members_expedition.drop(columns=['peakid_y'], inplace=True)

In [8]:
# now let's join the members_expedition dataframe with the peaks dataframe on peakid
members_expedition = pd.merge(members_expedition, peaks, left_on='peakid_x', right_on='peakid', how='left')

In [9]:
members_expedition.shape

(89089, 147)

In [10]:
# now finally let's join the members_expedition dataframe with the reference dataframe on expid
members_expedition = pd.merge(members_expedition, reference, on='expid', how='left')

In [11]:
members_expedition.shape

(200037, 158)
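
The jump from 89,089 to 200,037 rows indicates that `reference` holds several rows per `expid`, so the join repeats each member row once per reference entry. One possible alternative, sketched on toy data, is to collapse the reference table to one row per expedition before merging. The column names `ref_title` and `n_refs` and all values are hypothetical.

```python
import pandas as pd

# Toy stand-ins; values are illustrative only.
members_toy = pd.DataFrame({"expid": ["A1", "A1", "B2"], "fname": ["x", "y", "z"]})
refs_toy = pd.DataFrame({
    "expid": ["A1", "A1", "A1", "B2"],
    "ref_title": ["book 1", "journal 2", "report 3", "book 4"],
})

# Direct join: each member row is repeated once per matching reference.
exploded = members_toy.merge(refs_toy, on="expid", how="left")

# Aggregate first: one row per expedition carrying a reference count.
ref_counts = refs_toy.groupby("expid").size().rename("n_refs").reset_index()
compact = members_toy.merge(ref_counts, on="expid", how="left", validate="many_to_one")
```

Here `exploded` grows to one row per member and reference pair, while `compact` keeps one row per member.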

In [12]:
# let's keep only the columns that are at least 90% non-null (i.e. drop columns with more than 10% null values)
members_expedition = members_expedition.dropna(thresh=len(members_expedition) * 0.9, axis=1)

In [13]:
members_expedition.shape

(200037, 107)
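
`dropna(thresh=k, axis=1)` keeps a column only if it has at least `k` non-null values, so a threshold of 90% of the row count drops any column with more than 10% missing values. A small sketch on made-up data (the frame `toy` and its column names are illustrative):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "full": range(10),                              # no missing values
    "few_missing": [np.nan] + list(range(9)),       # 10% missing
    "many_missing": [np.nan] * 2 + list(range(8)),  # 20% missing
})

# Keep columns with at least 90% non-null values (here: at least 9 of 10).
kept = toy.dropna(thresh=int(len(toy) * 0.9), axis=1)

# Share of missing values per column, useful to inspect before dropping.
null_share = toy.isnull().mean()
```

Only `many_missing` is removed in this example; inspecting `null_share` first shows which columns a given threshold would discard.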

In [14]:
# duplicate records
members_expedition.duplicated().sum()

110948

In [15]:
# let's remove the duplicate records
# (the count returns to 89089, so these are most likely the rows repeated by the reference join)
members_expedition = members_expedition.drop_duplicates()
members_expedition.shape

(89089, 107)

In [16]:
# columns with null values
members_expedition.columns[members_expedition.isnull().any()]

Index(['fname', 'lname', 'citizen', 'msmtterm', 'route1', 'leaders', 'sponsor',
       'smtdate', 'campsites', 'location', 'pyear', 'pseason', 'pmonth',
       'pday', 'pexpid', 'pcountry', 'psummiters'],
      dtype='object')

In [17]:
# let's fill the null values in the columns with "not available"
members_expedition = members_expedition.fillna('not available')
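
Filling every column with the string "not available" turns numeric columns that contain gaps (for example `pyear` or `pday`) into non-numeric columns, which can get in the way of later aggregation. One possible alternative, shown on toy data, is to fill only the text columns and leave numeric gaps as `NaN`. The toy values are invented for illustration.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "sponsor": ["Club A", None, "Club C"],
    "pyear": [1953.0, np.nan, 1960.0],
})

# Blanket fill: the numeric column now holds a string and is no longer numeric.
blanket = toy.fillna("not available")

# Selective fill: only non-numeric (text) columns get the placeholder.
text_cols = toy.select_dtypes(exclude="number").columns
selective = toy.copy()
selective[text_cols] = selective[text_cols].fillna("not available")
```

With the selective variant, `pyear` stays numeric and can still be averaged or plotted, at the cost of keeping some nulls in the frame.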

In [18]:
members_expedition.duplicated().sum()

0

In [19]:
# columns with null values
members_expedition.columns[members_expedition.isnull().any()]

Index([], dtype='object')

In [20]:
# now that we have cleaned the data, let's save the cleaned data to a csv file
members_expedition.to_csv(r'E:\Maven_Analytics_Challenges\Everest_Challenge\HimalayanExpeditions\members_expedition.csv', index=False)