# Pandas

In [1]:
import pandas as pd

## Introduction to the pandas library: Series and DataFrames

The pandas library has two main objects which serve as containers for our data:

- a one-dimensional labeled array called Series
- a two-dimensional labeled array called DataFrame

### Series


In [2]:
my_Series = pd.Series([1, "cat", 10.2, "dog"])
my_Series

0       1
1     cat
2    10.2
3     dog
dtype: object

In [3]:
my_Series[1]

'cat'

In [4]:
ages = pd.Series([20, 53, 68], index=["John", "Allen", "Mary"])
ages

John     20
Allen    53
Mary     68
dtype: int64

In [5]:
ages["John"]

20
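
As a small complement to the cells above, here is a minimal sketch of building the same kind of labeled Series from a dictionary and reading one element both by label and by position. The name ages_from_dict is only illustrative and is not part of the original notebook.

```python
import pandas as pd

# Hypothetical Series mirroring `ages`: the dict keys become the index labels
ages_from_dict = pd.Series({"John": 20, "Allen": 53, "Mary": 68})

# The same element, accessed by its label and by its integer position
by_label = ages_from_dict["Allen"]
by_position = ages_from_dict.iloc[1]

print(by_label, by_position)
```

Using .iloc for positional access keeps the intent explicit when the index labels are not integers.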

### DataFrames

In [7]:
df = pd.DataFrame(
    {
        "user": [1, 2, 3],
        "age": [24, 54, 17],
        "sex": ["F", "F", "M"],
        "occupation": ["technician", "musician", "student"],
    }
)
df

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student


In [8]:
display(df)  # use display() in place of print() to get a rendered table

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student


In [9]:
df.set_index("user")  # I use this for the mountain exercises

Unnamed: 0_level_0,age,sex,occupation
user,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,24,F,technician
2,54,F,musician
3,17,M,student


### In-place changes
When modifying objects in Python, we always have to pay attention to whether the changes are made in-place (the original object is modified) or whether a new copy of the object is returned with the desired changes.

An important feature of the pandas library is that most of its methods return a new, modified copy of the object. We can see this with the set_index() method used above: if we take a look at df now, the original DataFrame is unchanged and still has the default index starting at 0.

If we want the changes to be applied to the original DataFrame, we can pass the inplace=True parameter, as the cells below show.

In [10]:
df

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student


In [11]:
df.set_index("user", inplace=True)
df

Unnamed: 0_level_0,age,sex,occupation
user,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,24,F,technician
2,54,F,musician
3,17,M,student


In [12]:
df.reset_index(inplace=True)
df

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student
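
An alternative to inplace=True is to keep the returned copy under a name of its own. The sketch below assumes a small hypothetical frame called df_users (not the notebook's df) and shows that the original is left untouched while the returned copy carries the new index.

```python
import pandas as pd

# Hypothetical frame, a cut-down version of the user table above
df_users = pd.DataFrame({"user": [1, 2, 3], "age": [24, 54, 17]})

# set_index returns a new DataFrame; df_users itself is not modified
indexed = df_users.set_index("user")

# reset_index on the copy moves "user" back into a regular column
restored = indexed.reset_index()

print(df_users.columns.tolist())  # the original still has both columns
print(indexed.index.name)         # the copy is indexed by "user"
```

Binding the result to a new name makes it easy to compare the frame before and after the change.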


### Summarizing data

In [13]:
df.head()

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user        3 non-null      int64 
 1   age         3 non-null      int64 
 2   sex         3 non-null      object
 3   occupation  3 non-null      object
dtypes: int64(2), object(2)
memory usage: 224.0+ bytes


In [15]:
# extracting the index
df.index

RangeIndex(start=0, stop=3, step=1)

In [16]:
# extracting the labels of all the columns
df.columns

Index(['user', 'age', 'sex', 'occupation'], dtype='object')

In [17]:
df.shape

(3, 4)

In [19]:
# displaying the number of rows
df.shape[0]

3

In [20]:
# displaying the number of columns
df.shape[1]

4

In [21]:
df.dtypes

user           int64
age            int64
sex           object
occupation    object
dtype: object

In [22]:
df.describe()

Unnamed: 0,user,age
count,3.0,3.0
mean,2.0,31.666667
std,1.0,19.655364
min,1.0,17.0
25%,1.5,20.5
50%,2.0,24.0
75%,2.5,39.0
max,3.0,54.0
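
Note that describe() only summarized the numeric columns user and age. As a minimal sketch, passing include="all" asks pandas to summarize the text columns too; df_users below is a hypothetical copy of the user table, used so the example runs on its own.

```python
import pandas as pd

# Hypothetical copy of the user table from earlier in the notebook
df_users = pd.DataFrame(
    {
        "user": [1, 2, 3],
        "age": [24, 54, 17],
        "sex": ["F", "F", "M"],
        "occupation": ["technician", "musician", "student"],
    }
)

# include="all" adds rows such as unique, top and freq for non-numeric columns
summary_all = df_users.describe(include="all")
print(summary_all)
```

Statistics that do not apply to a column (for example the mean of sex) appear as NaN in the combined summary.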


### Selecting columns

In [23]:
df["occupation"]

0    technician
1      musician
2       student
Name: occupation, dtype: object

### Changing a Series to a DataFrame

In [24]:
df["occupation"].to_frame()

Unnamed: 0,occupation
0,technician
1,musician
2,student


In [25]:
# We can rename it
ages.to_frame(name="age")

Unnamed: 0,age
John,20
Allen,53
Mary,68


### Importing data from files

In [26]:
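# Each call below is an independent alternative; file_name.csv is a placeholder
df = pd.read_csv('file_name.csv')  # the first row is used as the header
df = pd.read_csv('file_name.csv', header=None)  # the file has no header row
df = pd.read_csv('file_name.csv', names=['Header1', 'Header2', ...])  # supply one name per column
df = pd.read_csv('file_name.csv', na_values=['?'])  # treat '?' as a missing value
df = pd.read_excel('file_name.xls')  # read an Excel file instead of a CSV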
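
Since file_name.csv is only a placeholder, here is a self-contained sketch that feeds read_csv an in-memory string through io.StringIO instead of a file. The data and the column names are made up for illustration.

```python
import io

import pandas as pd

# Made-up CSV content with no header row and a '?' marking a missing age
raw = "1,24,F\n2,?,F\n3,17,M\n"

# Without header=None the first data row would be consumed as column names
default = pd.read_csv(io.StringIO(raw))

# header=None plus names supplies our own labels; na_values turns '?' into NaN
people = pd.read_csv(
    io.StringIO(raw),
    header=None,
    names=["user", "age", "sex"],
    na_values=["?"],
)

print(default.shape)
print(people)
```

The default call ends up with only two data rows, which is a quick way to spot a file that lacks a header.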

## Manipulating the data

In [27]:
import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(9).reshape(3, 3), columns=["a", "b", "c"])
df

Unnamed: 0,a,b,c
0,0,1,2
1,3,4,5
2,6,7,8


### Dropping values

In [30]:
# So, for example, we can drop the first row as follows:

df.drop(0, axis=0)

Unnamed: 0,a,b,c
1,3,4,5
2,6,7,8


In [31]:
# Or we could drop the first and third row as follows:

df.drop([0, 2], axis=0)

Unnamed: 0,a,b,c
1,3,4,5


In [32]:
# With axis=1 we drop columns instead, here columns b and c:

df.drop(["b", "c"], axis=1)

Unnamed: 0,a
0,0
1,3
2,6


### Arithmetic Operations

In [35]:
df["a"] + df["b"]

0     1
1     7
2    13
dtype: int64

In [37]:
df["a"].add(df["b"])  # the add() method provided by pandas

0     1
1     7
2    13
dtype: int64

Pandas also provides similar methods for the other operations:

- sub()
- div()
- mul()
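
A minimal sketch of the three methods listed above, assuming the same 3x3 frame as in this section (rebuilt here under the hypothetical name df_abc so the block runs on its own):

```python
import numpy as np
import pandas as pd

# Same values as the DataFrame used in this section
df_abc = pd.DataFrame(np.arange(9).reshape(3, 3), columns=["a", "b", "c"])

difference = df_abc["b"].sub(df_abc["a"])  # same as df_abc["b"] - df_abc["a"]
product = df_abc["a"].mul(df_abc["b"])     # same as df_abc["a"] * df_abc["b"]
quotient = df_abc["c"].div(df_abc["b"])    # same as df_abc["c"] / df_abc["b"]

print(difference.tolist())
print(product.tolist())
print(quotient.tolist())
```

The method form is mainly useful because it accepts extra parameters such as fill_value, which the plain operators cannot take.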

What if we tried to perform an operation between two DataFrames of different sizes? Let’s suppose we select the first two rows of our DataFrame and add them to the original DataFrame:

In [38]:
df

Unnamed: 0,a,b,c
0,0,1,2
1,3,4,5
2,6,7,8


In [39]:
df.add(df.loc[0:1, :])

Unnamed: 0,a,b,c
0,0.0,2.0,4.0
1,6.0,8.0,10.0
2,,,


By default, when the two DataFrames in an arithmetic operation have index or column labels that do not overlap, pandas fills the corresponding entries of the result with NaN. We can instead pass the fill_value parameter, whose value stands in for the missing operand before the operation is applied.

For example, if we choose the value 0:


In [40]:

df.add(df.loc[0:1, :], fill_value=0)


Unnamed: 0,a,b,c
0,0.0,2.0,4.0
1,6.0,8.0,10.0
2,6.0,7.0,8.0


### Concatenating DataFrames
Suppose we have two DataFrames that we want to join together. Below we consider the two main cases:

- the DataFrames have the same columns and we want to stack them one on top of the other
- the DataFrames have the same rows and we want to glue them one next to the other

In [42]:
df1 = pd.DataFrame([["Mark", 50], ["Kate", 46]], columns=["name", "age"])
df2 = pd.DataFrame([["Jon", 3], ["David", 4]], columns=["name", "age"])
pd.concat([df1, df2])

Unnamed: 0,name,age
0,Mark,50
1,Kate,46
0,Jon,3
1,David,4


In [44]:
df3 = pd.DataFrame(["writer", "journalist"], columns=["occupation"])
pd.concat([df1, df3], axis=1)

Unnamed: 0,name,age,occupation
0,Mark,50,writer
1,Kate,46,journalist
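
Notice that stacking df1 and df2 kept the original row labels, so the index reads 0, 1, 0, 1. As a hedged sketch, passing ignore_index=True to pd.concat asks pandas to build a fresh index instead:

```python
import pandas as pd

df1 = pd.DataFrame([["Mark", 50], ["Kate", 46]], columns=["name", "age"])
df2 = pd.DataFrame([["Jon", 3], ["David", 4]], columns=["name", "age"])

# Default behaviour: row labels are kept, so label 0 appears twice
duplicated = pd.concat([df1, df2])
rows_labelled_zero = duplicated.loc[0]

# ignore_index=True discards the old labels and renumbers the rows
stacked = pd.concat([df1, df2], ignore_index=True)

print(len(rows_labelled_zero))
print(stacked)
```

Duplicate labels are legal in pandas, but they make label-based lookups return several rows, which is rarely what we want after stacking.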


## Indexing, selecting and filtering
### Selecting data

In [47]:
# We will work on a subset of the columns
columns = [
    "Mountain",
    "Height (m)",
    "Range",
    "Coordinates",
    "Parent mountain",
    "First ascent",
    "Ascents bef. 2004",
    "Failed attempts bef. 2004",
]

# Load the DataFrame, we will work on the first 10 rows (ten highest mountains)
df = pd.read_csv("resources/c1_mountains.csv", nrows=10, usecols=columns)
df

Unnamed: 0,Mountain,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
0,Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953,>>145,121
1,K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿,Mount Everest,1954,45,44
2,Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24
3,Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26
4,Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52
5,Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28
6,Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39
7,Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45
8,Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52,67
9,Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950,36,47


In [48]:
df.set_index("Mountain", inplace=True)
df

Unnamed: 0_level_0,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953,>>145,121
K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿,Mount Everest,1954,45,44
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45
Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52,67
Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950,36,47


In [49]:
df.index

Index(['Mount Everest / Sagarmatha / Chomolungma',
       'K2 / Qogir / Godwin Austen', 'Kangchenjunga', 'Lhotse', 'Makalu',
       'Cho Oyu', 'Dhaulagiri I', 'Manaslu', 'Nanga Parbat', 'Annapurna I'],
      dtype='object', name='Mountain')

In [50]:
df.columns

Index(['Height (m)', 'Range', 'Coordinates', 'Parent mountain', 'First ascent',
       'Ascents bef. 2004', 'Failed attempts bef. 2004'],
      dtype='object')

### The attribute operator . to select columns

In [51]:
df.Range

Mountain
Mount Everest / Sagarmatha / Chomolungma       Mahalangur Himalaya
K2 / Qogir / Godwin Austen                       Baltoro Karakoram
Kangchenjunga                               Kangchenjunga Himalaya
Lhotse                                         Mahalangur Himalaya
Makalu                                         Mahalangur Himalaya
Cho Oyu                                        Mahalangur Himalaya
Dhaulagiri I                                   Dhaulagiri Himalaya
Manaslu                                           Manaslu Himalaya
Nanga Parbat                                 Nanga Parbat Himalaya
Annapurna I                                     Annapurna Himalaya
Name: Range, dtype: object

If we tried to use the attribute operator with the column 'Height (m)', we would get an error, because Python’s syntax does not allow characters such as spaces and parentheses in a direct attribute reference. We can get around this using the getattr function as follows:

In [53]:
getattr(df, "Height (m)")

Mountain
Mount Everest / Sagarmatha / Chomolungma    8848
K2 / Qogir / Godwin Austen                  8611
Kangchenjunga                               8586
Lhotse                                      8516
Makalu                                      8485
Cho Oyu                                     8188
Dhaulagiri I                                8167
Manaslu                                     8163
Nanga Parbat                                8126
Annapurna I                                 8091
Name: Height (m), dtype: int64

### The index operator [ ] to select columns

In [55]:
df["Height (m)"]

Mountain
Mount Everest / Sagarmatha / Chomolungma    8848
K2 / Qogir / Godwin Austen                  8611
Kangchenjunga                               8586
Lhotse                                      8516
Makalu                                      8485
Cho Oyu                                     8188
Dhaulagiri I                                8167
Manaslu                                     8163
Nanga Parbat                                8126
Annapurna I                                 8091
Name: Height (m), dtype: int64

In [56]:
df[["Height (m)", "Range", "Coordinates"]]

Unnamed: 0_level_0,Height (m),Range,Coordinates
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿
K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿
Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿
Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿


### The index operator [ ] to select rows

In [57]:
# a[start:stop:step]
df[2:8]

Unnamed: 0_level_0,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45


In [58]:
df["Lhotse":"Manaslu"]

Unnamed: 0_level_0,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960,51,39
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49,45


### The iloc operator to select rows and columns by position

In [59]:
# df.iloc[rows, columns]
df.iloc[:, 2:6]

Unnamed: 0_level_0,Coordinates,Parent mountain,First ascent,Ascents bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Mount Everest / Sagarmatha / Chomolungma,27°59′17″N 86°55′31″E﻿,,1953,>>145
K2 / Qogir / Godwin Austen,35°52′53″N 76°30′48″E﻿,Mount Everest,1954,45
Kangchenjunga,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38
Lhotse,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26
Makalu,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45
Cho Oyu,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79
Dhaulagiri I,28°41′48″N 83°29′35″E﻿,K2,1960,51
Manaslu,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956,49
Nanga Parbat,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52
Annapurna I,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950,36


In [60]:
# Here we use the slicing notation ::2 for the rows, which selects every second row
# from the first row through to the last.
# We then use the slicing notation 2: for the columns, which selects every column
# from position 2 to the last column of the DataFrame.
df.iloc[::2, 2:]

Unnamed: 0_level_0,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Mount Everest / Sagarmatha / Chomolungma,27°59′17″N 86°55′31″E﻿,,1953,>>145,121
Kangchenjunga,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24
Makalu,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52
Dhaulagiri I,28°41′48″N 83°29′35″E﻿,K2,1960,51,39
Nanga Parbat,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953,52,67


### The loc operator to select rows and columns by label

In [62]:
# df.loc[rows, columns]
df.loc[:, "Height (m)":"First ascent"]

Unnamed: 0_level_0,Height (m),Range,Coordinates,Parent mountain,First ascent
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Mount Everest / Sagarmatha / Chomolungma,8848,Mahalangur Himalaya,27°59′17″N 86°55′31″E﻿,,1953
K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿,Mount Everest,1954
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954
Dhaulagiri I,8167,Dhaulagiri Himalaya,28°41′48″N 83°29′35″E﻿,K2,1960
Manaslu,8163,Manaslu Himalaya,28°33′00″N 84°33′35″E﻿,Cho Oyu,1956
Nanga Parbat,8126,Nanga Parbat Himalaya,35°14′14″N 74°35′21″E﻿,Dhaulagiri,1953
Annapurna I,8091,Annapurna Himalaya,28°35′44″N 83°49′13″E﻿,Cho Oyu,1950


In [63]:
df.loc[:, "Height (m)":"First ascent":2]

Unnamed: 0_level_0,Height (m),Coordinates,First ascent
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Mount Everest / Sagarmatha / Chomolungma,8848,27°59′17″N 86°55′31″E﻿,1953
K2 / Qogir / Godwin Austen,8611,35°52′53″N 76°30′48″E﻿,1954
Kangchenjunga,8586,27°42′12″N 88°08′51″E﻿,1955
Lhotse,8516,27°57′42″N 86°55′59″E﻿,1956
Makalu,8485,27°53′23″N 87°05′20″E﻿,1955
Cho Oyu,8188,28°05′39″N 86°39′39″E﻿,1954
Dhaulagiri I,8167,28°41′48″N 83°29′35″E﻿,1960
Manaslu,8163,28°33′00″N 84°33′35″E﻿,1956
Nanga Parbat,8126,35°14′14″N 74°35′21″E﻿,1953
Annapurna I,8091,28°35′44″N 83°49′13″E﻿,1950
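
One difference between the two operators is easy to miss in the outputs above: a position slice with iloc excludes its stop position, while a label slice with loc includes its end label. The sketch below illustrates this on a small hypothetical frame called peaks with shortened mountain names.

```python
import pandas as pd

# Hypothetical four-row excerpt of the mountains table
peaks = pd.DataFrame(
    {"height_m": [8848, 8611, 8586, 8516]},
    index=["Everest", "K2", "Kangchenjunga", "Lhotse"],
)

# Positions 1 and 2 only: the stop position 3 is excluded
by_position = peaks.iloc[1:3]

# Labels K2 through Lhotse: the end label is included
by_label = peaks.loc["K2":"Lhotse"]

print(by_position.index.tolist())
print(by_label.index.tolist())
```

Label slices are inclusive because there is usually no natural "next label" to name as an exclusive end point.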


### Boolean selection of rows using the [ ] operator

In [65]:
df["Parent mountain"] == "Mount Everest"

Mountain
Mount Everest / Sagarmatha / Chomolungma    False
K2 / Qogir / Godwin Austen                   True
Kangchenjunga                                True
Lhotse                                       True
Makalu                                       True
Cho Oyu                                      True
Dhaulagiri I                                False
Manaslu                                     False
Nanga Parbat                                False
Annapurna I                                 False
Name: Parent mountain, dtype: bool

In [66]:
df[df["Parent mountain"] == "Mount Everest"]

Unnamed: 0_level_0,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
K2 / Qogir / Godwin Austen,8611,Baltoro Karakoram,35°52′53″N 76°30′48″E﻿,Mount Everest,1954,45,44
Kangchenjunga,8586,Kangchenjunga Himalaya,27°42′12″N 88°08′51″E﻿,Mount Everest,1955,38,24
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26
Makalu,8485,Mahalangur Himalaya,27°53′23″N 87°05′20″E﻿,Mount Everest,1955,45,52
Cho Oyu,8188,Mahalangur Himalaya,28°05′39″N 86°39′39″E﻿,Mount Everest,1954,79,28


Although Python provides the keywords and, or, and not, these do not work when combining boolean Series in pandas. Instead, we must use the element-wise operators

- & for and
- | for or
- ~ for not

In addition, we must wrap each boolean condition in parentheses, because & and | bind more tightly than comparison operators such as == and >. Let’s give this a try:

In [67]:
# Mountains with Mount Everest as parent with a first ascent after 1955
df[(df["Parent mountain"] == "Mount Everest") & (df["First ascent"] > 1955)]

Unnamed: 0_level_0,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26
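
The cell above only exercises &. Here is a minimal sketch of | and ~ on a small hypothetical frame called peaks, with simplified column names that are not those of the mountains file.

```python
import pandas as pd

# Hypothetical excerpt: parent mountain and year of first ascent
peaks = pd.DataFrame(
    {
        "parent": [None, "Mount Everest", "Mount Everest", "K2"],
        "first_ascent": [1953, 1954, 1956, 1960],
    },
    index=["Everest", "K2", "Lhotse", "Dhaulagiri I"],
)

is_child = peaks["parent"] == "Mount Everest"
is_late = peaks["first_ascent"] > 1955

# | keeps rows where at least one condition holds
either = peaks[is_child | is_late]

# ~ inverts a boolean Series
not_child = peaks[~is_child]

print(either.index.tolist())
print(not_child.index.tolist())
```

Storing each condition in a named variable first avoids the parentheses entirely and tends to read better for longer filters.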


### Boolean selection of rows and columns using the .loc operator

In [68]:
df.loc[(df["Parent mountain"] == "Mount Everest") & (df["First ascent"] > 1955), :]

Unnamed: 0_level_0,Height (m),Range,Coordinates,Parent mountain,First ascent,Ascents bef. 2004,Failed attempts bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
Lhotse,8516,Mahalangur Himalaya,27°57′42″N 86°55′59″E﻿,Mount Everest,1956,26,26


In [69]:
df.loc[
    (df["Parent mountain"] == "Mount Everest") & (df["First ascent"] > 1955),
    "Height (m)":"Range",
]

Unnamed: 0_level_0,Height (m),Range
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1
Lhotse,8516,Mahalangur Himalaya


In [70]:
# In fact, we can even include a boolean selection for the columns as well.
# Here is an example
col_criteria = [True, False, False, False, True, True, False]
df.loc[df["Height (m)"] > 8300, col_criteria]

Unnamed: 0_level_0,Height (m),First ascent,Ascents bef. 2004
Mountain,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Mount Everest / Sagarmatha / Chomolungma,8848,1953,>>145
K2 / Qogir / Godwin Austen,8611,1954,45
Kangchenjunga,8586,1955,38
Lhotse,8516,1956,26
Makalu,8485,1955,45


## Views vs Copies

In [71]:
df = pd.DataFrame(
    {
        "user": [1, 2, 3],
        "age": [24, 54, 17],
        "sex": ["F", "F", "M"],
        "occupation": ["technician", "musician", "student"],
    }
)
df

Unnamed: 0,user,age,sex,occupation
0,1,24,F,technician
1,2,54,F,musician
2,3,17,M,student


### Part 1: Warning after failed attempt at setting values
