# DataFrame Indexing and Selection

### Exercises


For this set of exercises, we will use a small dataset listing some of the highest mountains in the world (compiled from [Wikipedia](https://en.wikipedia.org/wiki/List_of_highest_mountains_on_Earth)).

In [1]:
import pandas as pd

In [2]:
mtns = pd.DataFrame([
    {'name': 'Mount Everest',
        'height (m)': 8848,
        'summited': 1953,
        'mountain range': 'Mahalangur Himalaya'},
    {'name': 'K2',
        'height (m)': 8611,
        'summited': 1954,
        'mountain range': 'Baltoro Karakoram'},
    {'name': 'Kangchenjunga',
        'height (m)': 8586,
        'summited': 1955,
        'mountain range': 'Kangchenjunga Himalaya'},
    {'name': 'Lhotse',
        'height (m)': 8516,
        'summited': 1956,
        'mountain range': 'Mahalangur Himalaya'},
    {'name': 'Makalu',
        'height (m)': 8485,
        'summited': 1955,
        'mountain range': 'Mahalangur Himalaya'},
    {'name': 'Annapurna I',
        'height (m)': 8091,
        'summited': 1950,
        'mountain range': 'Annapurna Himalaya'},
    {'name': 'Gyachung Kang',
        'height (m)': 7952,
        'summited': 1964,
        'mountain range': 'Mahalangur Himalaya'},
])
mtns.set_index('name', inplace=True)
mtns

| name | height (m) | summited | mountain range |
|---|---|---|---|
| Mount Everest | 8848 | 1953 | Mahalangur Himalaya |
| K2 | 8611 | 1954 | Baltoro Karakoram |
| Kangchenjunga | 8586 | 1955 | Kangchenjunga Himalaya |
| Lhotse | 8516 | 1956 | Mahalangur Himalaya |
| Makalu | 8485 | 1955 | Mahalangur Himalaya |
| Annapurna I | 8091 | 1950 | Annapurna Himalaya |
| Gyachung Kang | 7952 | 1964 | Mahalangur Himalaya |


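Just for fun, here's another way to create essentially the same DataFrame, this time from a dictionary of lists. Note that the height column is named `height(m)` here (no space), and the exercises that follow use this name: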

In [3]:
mtns_dict = {
    'name': ['Mount Everest', 'K2', 'Kangchenjunga', 'Lhotse', 'Makalu',
             'Annapurna I', 'Gyachung Kang'],
    'height(m)': [8848, 8611, 8586, 8516, 8485, 8091, 7952],
    'summited': [1953, 1954, 1955, 1956, 1955, 1950, 1964],
    'mountain range': ['Mahalangur Himalaya', 'Baltoro Karakoram',
                       'Kangchenjunga Himalaya', 'Mahalangur Himalaya',
                       'Mahalangur Himalaya', 'Annapurna Himalaya',
                       'Mahalangur Himalaya']}
mtns = pd.DataFrame(mtns_dict)
mtns.set_index('name', inplace=True)
mtns

| name | height(m) | summited | mountain range |
|---|---|---|---|
| Mount Everest | 8848 | 1953 | Mahalangur Himalaya |
| K2 | 8611 | 1954 | Baltoro Karakoram |
| Kangchenjunga | 8586 | 1955 | Kangchenjunga Himalaya |
| Lhotse | 8516 | 1956 | Mahalangur Himalaya |
| Makalu | 8485 | 1955 | Mahalangur Himalaya |
| Annapurna I | 8091 | 1950 | Annapurna Himalaya |
| Gyachung Kang | 7952 | 1964 | Mahalangur Himalaya |


Using this ``DataFrame``, extract the following selections from the data:

* Extract the data for 'K2'

In [9]:
# K2
mtns.loc['K2']

height(m)                      8611
summited                       1954
mountain range    Baltoro Karakoram
Name: K2, dtype: object

* What is the height of K2?

In [10]:
# K2 height
mtns.loc['K2', 'height(m)']

8611
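
For a single scalar like this one, pandas also provides the `.at` accessor, and rows can be picked by integer position with `.iloc`. The sketch below is not part of the original exercises. It rebuilds the same data so that it runs on its own, and it assumes K2 stays in the second row.

```python
import pandas as pd

# Rebuild the exercise data so this snippet runs on its own.
mtns = pd.DataFrame({
    'name': ['Mount Everest', 'K2', 'Kangchenjunga', 'Lhotse', 'Makalu',
             'Annapurna I', 'Gyachung Kang'],
    'height(m)': [8848, 8611, 8586, 8516, 8485, 8091, 7952],
    'summited': [1953, 1954, 1955, 1956, 1955, 1950, 1964],
    'mountain range': ['Mahalangur Himalaya', 'Baltoro Karakoram',
                       'Kangchenjunga Himalaya', 'Mahalangur Himalaya',
                       'Mahalangur Himalaya', 'Annapurna Himalaya',
                       'Mahalangur Himalaya'],
}).set_index('name')

# Label-based lookup of one cell, as in the exercise.
height_loc = mtns.loc['K2', 'height(m)']

# .at is a scalar-only accessor for the same row and column labels.
height_at = mtns.at['K2', 'height(m)']

# Position-based lookup: K2 is the second row, so its position is 1.
row_by_position = mtns.iloc[1]

print(height_loc, height_at, row_by_position.name)
```

`.loc` and `.at` both work with labels, while `.iloc` depends on row order, so it breaks silently if the DataFrame is re-sorted.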

* Extract the 'summited' column of the data.

In [11]:
# summited
mtns['summited']

name
Mount Everest    1953
K2               1954
Kangchenjunga    1955
Lhotse           1956
Makalu           1955
Annapurna I      1950
Gyachung Kang    1964
Name: summited, dtype: int64

* Extract all mountains that were summited between 1950 and 1955 (inclusive of both years).

In [16]:
# all mountains summited between 1950 and 1955
mtns[(mtns['summited'] >= 1950) & (mtns['summited'] <= 1955)]

| name | height(m) | summited | mountain range |
|---|---|---|---|
| Mount Everest | 8848 | 1953 | Mahalangur Himalaya |
| K2 | 8611 | 1954 | Baltoro Karakoram |
| Kangchenjunga | 8586 | 1955 | Kangchenjunga Himalaya |
| Makalu | 8485 | 1955 | Mahalangur Himalaya |
| Annapurna I | 8091 | 1950 | Annapurna Himalaya |
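
As an alternative to combining two comparisons with `&`, `Series.between` builds the same boolean mask in one call and includes both endpoints by default. This is a sketch of that variant, not the solution the exercise asks for. It rebuilds the data so that it runs on its own.

```python
import pandas as pd

# Rebuild the exercise data so this snippet runs on its own.
mtns = pd.DataFrame({
    'name': ['Mount Everest', 'K2', 'Kangchenjunga', 'Lhotse', 'Makalu',
             'Annapurna I', 'Gyachung Kang'],
    'height(m)': [8848, 8611, 8586, 8516, 8485, 8091, 7952],
    'summited': [1953, 1954, 1955, 1956, 1955, 1950, 1964],
    'mountain range': ['Mahalangur Himalaya', 'Baltoro Karakoram',
                       'Kangchenjunga Himalaya', 'Mahalangur Himalaya',
                       'Mahalangur Himalaya', 'Annapurna Himalaya',
                       'Mahalangur Himalaya'],
}).set_index('name')

# Explicit mask: the parentheses are required because & binds tighter than >= and <=.
mask_explicit = (mtns['summited'] >= 1950) & (mtns['summited'] <= 1955)

# Equivalent mask: between() is inclusive on both ends by default.
mask_between = mtns['summited'].between(1950, 1955)

selected = mtns[mask_between]
print(selected)
```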


* Construct a ``DataFrame`` that contains the height and summited columns.

In [18]:
# DataFrame that contains the height and summited columns
mtns[['height(m)','summited']]

| name | height(m) | summited |
|---|---|---|
| Mount Everest | 8848 | 1953 |
| K2 | 8611 | 1954 |
| Kangchenjunga | 8586 | 1955 |
| Lhotse | 8516 | 1956 |
| Makalu | 8485 | 1955 |
| Annapurna I | 8091 | 1950 |
| Gyachung Kang | 7952 | 1964 |
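
The double brackets matter here: a single label returns a `Series`, while a list of labels returns a `DataFrame`, even when the list has only one entry. The sketch below illustrates the difference and shows an equivalent `.loc` form. It rebuilds the data so that it runs on its own.

```python
import pandas as pd

# Rebuild the exercise data so this snippet runs on its own.
mtns = pd.DataFrame({
    'name': ['Mount Everest', 'K2', 'Kangchenjunga', 'Lhotse', 'Makalu',
             'Annapurna I', 'Gyachung Kang'],
    'height(m)': [8848, 8611, 8586, 8516, 8485, 8091, 7952],
    'summited': [1953, 1954, 1955, 1956, 1955, 1950, 1964],
    'mountain range': ['Mahalangur Himalaya', 'Baltoro Karakoram',
                       'Kangchenjunga Himalaya', 'Mahalangur Himalaya',
                       'Mahalangur Himalaya', 'Annapurna Himalaya',
                       'Mahalangur Himalaya'],
}).set_index('name')

# A single column label gives a Series.
as_series = mtns['summited']

# A list of labels gives a DataFrame, even for a single column.
as_frame = mtns[['summited']]

# The same two-column selection written with .loc (all rows, listed columns).
two_cols = mtns.loc[:, ['height(m)', 'summited']]

print(type(as_series).__name__, type(as_frame).__name__, two_cols.shape)
```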


* Extract the data on the five tallest mountains. Use the ``DataFrame`` [sort_values](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) method.

In [21]:
# five tallest mountains
mtns.sort_values('height(m)', ascending=False).head(5)

| name | height(m) | summited | mountain range |
|---|---|---|---|
| Mount Everest | 8848 | 1953 | Mahalangur Himalaya |
| K2 | 8611 | 1954 | Baltoro Karakoram |
| Kangchenjunga | 8586 | 1955 | Kangchenjunga Himalaya |
| Lhotse | 8516 | 1956 | Mahalangur Himalaya |
| Makalu | 8485 | 1955 | Mahalangur Himalaya |
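
The exercise asks for `sort_values`, but `DataFrame.nlargest` should give the same rows for this data without sorting the whole table first. This is a sketch of that alternative. It rebuilds the data so that it runs on its own.

```python
import pandas as pd

# Rebuild the exercise data so this snippet runs on its own.
mtns = pd.DataFrame({
    'name': ['Mount Everest', 'K2', 'Kangchenjunga', 'Lhotse', 'Makalu',
             'Annapurna I', 'Gyachung Kang'],
    'height(m)': [8848, 8611, 8586, 8516, 8485, 8091, 7952],
    'summited': [1953, 1954, 1955, 1956, 1955, 1950, 1964],
    'mountain range': ['Mahalangur Himalaya', 'Baltoro Karakoram',
                       'Kangchenjunga Himalaya', 'Mahalangur Himalaya',
                       'Mahalangur Himalaya', 'Annapurna Himalaya',
                       'Mahalangur Himalaya'],
}).set_index('name')

# Approach from the exercise: sort by height, descending, then keep the first five rows.
top5_sorted = mtns.sort_values('height(m)', ascending=False).head(5)

# Alternative: ask directly for the five rows with the largest height.
top5_nlargest = mtns.nlargest(5, 'height(m)')

print(top5_nlargest)
```

The two results agree here because no two mountains share a height; with ties, the two approaches may order or pick the tied rows differently.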


* Find the shortest mountain in the dataset. Use the `idxmin` method of a Series object (`pd.Series.idxmin`).

In [22]:
# least tall
mtns['height(m)'].idxmin()

'Gyachung Kang'
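
`idxmin` returns an index label rather than the row itself, so the result can be passed straight back to `.loc` to get the full record. A minimal sketch, rebuilding the data so that it runs on its own:

```python
import pandas as pd

# Rebuild the exercise data so this snippet runs on its own.
mtns = pd.DataFrame({
    'name': ['Mount Everest', 'K2', 'Kangchenjunga', 'Lhotse', 'Makalu',
             'Annapurna I', 'Gyachung Kang'],
    'height(m)': [8848, 8611, 8586, 8516, 8485, 8091, 7952],
    'summited': [1953, 1954, 1955, 1956, 1955, 1950, 1964],
    'mountain range': ['Mahalangur Himalaya', 'Baltoro Karakoram',
                       'Kangchenjunga Himalaya', 'Mahalangur Himalaya',
                       'Mahalangur Himalaya', 'Annapurna Himalaya',
                       'Mahalangur Himalaya'],
}).set_index('name')

# idxmin gives the index label (here, the mountain name) of the smallest height.
shortest_name = mtns['height(m)'].idxmin()

# Feed that label to .loc to retrieve the whole row for that mountain.
shortest_row = mtns.loc[shortest_name]

print(shortest_name)
print(shortest_row)
```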

* Construct a DataFrame containing all mountains taller than 8000 m, with only the name and height information (the name is already the index).

In [27]:
# mountains above 8000m
mtns[mtns['height(m)'] > 8000][['height(m)']]

| name | height(m) |
|---|---|
| Mount Everest | 8848 |
| K2 | 8611 |
| Kangchenjunga | 8586 |
| Lhotse | 8516 |
| Makalu | 8485 |
| Annapurna I | 8091 |
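
The row filter and the column selection can also be written as a single `.loc` call, which avoids chaining two indexing operations. If the name is needed as an ordinary column rather than as the index, `reset_index` moves it back. This is a sketch of both variants, not part of the original solution. It rebuilds the data so that it runs on its own.

```python
import pandas as pd

# Rebuild the exercise data so this snippet runs on its own.
mtns = pd.DataFrame({
    'name': ['Mount Everest', 'K2', 'Kangchenjunga', 'Lhotse', 'Makalu',
             'Annapurna I', 'Gyachung Kang'],
    'height(m)': [8848, 8611, 8586, 8516, 8485, 8091, 7952],
    'summited': [1953, 1954, 1955, 1956, 1955, 1950, 1964],
    'mountain range': ['Mahalangur Himalaya', 'Baltoro Karakoram',
                       'Kangchenjunga Himalaya', 'Mahalangur Himalaya',
                       'Mahalangur Himalaya', 'Annapurna Himalaya',
                       'Mahalangur Himalaya'],
}).set_index('name')

# One .loc call: boolean mask for the rows, list of labels for the columns.
above_8000 = mtns.loc[mtns['height(m)'] > 8000, ['height(m)']]

# Turn the 'name' index back into a regular column.
as_columns = above_8000.reset_index()

print(above_8000)
print(as_columns.columns.tolist())
```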
