# Indexing, Selecting, and Filtering

## Setup

In [1]:
import pandas as pd

## Creation

In [2]:
data = {
    "Capital": {
        "Spain": "Madrid",
        "Belgium": "Brussels",
        "France": "Paris",
        "Italy": "Roma",
        "Germany": "Berlin",
        "Portugal": "Lisbon",
        "Norway": "Oslo",
        "Greece": "Athens",
    },
    "Population": {
        "Spain": 46733038,
        "Belgium": 11449656,
        "France": 67076000,
        "Italy": 60390560,
        "Germany": 83122889,
        "Portugal": 10295909,
        "Norway": 5391369,
        "Greece": 10718565,
    },
    "Monarch": {
        "Spain": "Felipe VI",
        "Belgium": "Philippe",
        "Norway": "Harald V",
    },
    "Area": {
        "Spain": 505990,
        "Belgium": 30688,
        "France": 640679,
        "Italy": 301340,
        "Germany": 357022,
        "Portugal": 92212,
        "Norway": 385207,
        "Greece": 131957,
    },
}

In [3]:
# For now, let's forget about these steps:
df = pd.DataFrame(data)
df["Capital"] = df["Capital"].astype("string")
df["Monarch"] = df["Monarch"].astype("string")

In [4]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Apple stock data, taken from the [`matplotlib` sample datasets](https://github.com/matplotlib/sample_data/blob/master/aapl.csv):

In [5]:
# For now, let's forget about these steps:
apple = pd.read_csv("AAPL.csv")
apple["Date"] = apple["Date"].astype("datetime64[ns]")
apple = apple.set_index("Date")
apple = apple.sort_index()

In [6]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


## `.loc[]` and `.iloc[]`

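```python
# rows / columns are placeholders: a single value, a list, a slice or a boolean mask
df.loc[rows, columns]   # rows and columns given as labels
df.iloc[rows, columns]  # rows and columns given as integer positions
```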

In [7]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


`.loc[]` expects labels (index labels for the rows, column names for the columns):

In [8]:
df.loc["Germany":"Norway"]

,Capital,Population,Monarch,Area
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207


`.iloc[]` expects integer positions (counting from 0; negative values count from the end):

In [9]:
df.iloc[-3:]

,Capital,Population,Monarch,Area
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957
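
With country names as the index, labels and positions are easy to tell apart. The difference matters most when the index itself holds integers that do not match the row positions. A minimal sketch, using a small hypothetical DataFrame named `scores`:

```python
import pandas as pd

# Hypothetical DataFrame whose integer labels do not match the row positions
scores = pd.DataFrame({"points": [10, 20, 30]}, index=[2, 0, 1])

by_label = scores.loc[0, "points"]  # the row *labelled* 0 -> 20
by_position = scores.iloc[0, 0]     # the *first* row, first column -> 10

print(by_label, by_position)  # 20 10
```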


## Demo 1: Selecting one column (as a `Series`)

In [10]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Select one column:

In [11]:
df["Capital"]

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
Name: Capital, dtype: string

In [12]:
df.loc[:, "Capital"]

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
Name: Capital, dtype: string

Check the type of the object returned:

In [None]:
type(df.loc[:, "Capital"])
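
A single label in the column position returns a `Series` that keeps the row index and takes the column label as its `name`. A minimal sketch on a two-row excerpt of the data (the variable names are illustrative):

```python
import pandas as pd

# Two-row excerpt of the countries data
countries = pd.DataFrame(
    {"Capital": ["Madrid", "Brussels"], "Area": [505990, 30688]},
    index=["Spain", "Belgium"],
)

capital = countries.loc[:, "Capital"]
print(type(capital).__name__)  # Series
print(capital.name)            # Capital
print(list(capital.index))     # ['Spain', 'Belgium']
```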

## Exercise 1

In [13]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


Select the "Volume" column of the `apple` DataFrame:

In [16]:
apple.Volume

Date
1984-09-07     2981600
1984-09-10     2346400
1984-09-11     5444000
1984-09-12     4773600
1984-09-13     7429600
                ...   
2008-10-08    78847900
2008-10-09    57763700
2008-10-10    79260700
2008-10-13    54967000
2008-10-14    70749800
Name: Volume, Length: 6081, dtype: int64

In [17]:
apple.loc[:, "Volume"]

Date
1984-09-07     2981600
1984-09-10     2346400
1984-09-11     5444000
1984-09-12     4773600
1984-09-13     7429600
                ...   
2008-10-08    78847900
2008-10-09    57763700
2008-10-10    79260700
2008-10-13    54967000
2008-10-14    70749800
Name: Volume, Length: 6081, dtype: int64

Check the type of the object returned:

In [18]:
type(apple.loc[:, "Volume"])

pandas.core.series.Series

## Demo 2: Selecting several columns

In [19]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Select several columns:

In [20]:
df.loc[:, ["Capital", "Area"]]

,Capital,Area
Spain,Madrid,505990
Belgium,Brussels,30688
France,Paris,640679
Italy,Roma,301340
Germany,Berlin,357022
Portugal,Lisbon,92212
Norway,Oslo,385207
Greece,Athens,131957


Check the type of the object returned:

In [21]:
type(df.loc[:, ["Capital", "Area"]])

pandas.core.frame.DataFrame
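
With a list of labels, the columns come back in the order of the list rather than in their original order. Any label that does not exist raises a `KeyError` (this is the behaviour of pandas 1.0 and later). A minimal sketch on a small excerpt, where `"Currency"` is a deliberately missing, hypothetical column:

```python
import pandas as pd

countries = pd.DataFrame(
    {
        "Capital": ["Madrid", "Brussels"],
        "Population": [46733038, 11449656],
        "Area": [505990, 30688],
    },
    index=["Spain", "Belgium"],
)

# Columns follow the order of the list
subset = countries.loc[:, ["Area", "Capital"]]
print(list(subset.columns))  # ['Area', 'Capital']

# A label that is not a column raises a KeyError
try:
    countries.loc[:, ["Capital", "Currency"]]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)  # True
```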

## Exercise 2

In [22]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


Select the "Open" and "Close" columns of the `apple` DataFrame:

In [24]:
apple.loc[:, ["Open", "Close"]]

Date,Open,Close
1984-09-07,26.50,26.50
1984-09-10,26.50,26.37
1984-09-11,26.62,26.87
1984-09-12,26.87,26.12
1984-09-13,27.50,27.50
...,...,...
2008-10-08,85.91,89.79
2008-10-09,93.35,88.74
2008-10-10,85.70,96.80
2008-10-13,104.55,110.26


Check the type of the object returned:

In [25]:
type(apple.loc[:, ["Open", "Close"]])

pandas.core.frame.DataFrame

## Demo 3: Selecting one column (as a `DataFrame`)

In [26]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Select one column, and return a DataFrame:

In [27]:
df.loc[:, ["Monarch"]]

,Monarch
Spain,Felipe VI
Belgium,Philippe
France,
Italy,
Germany,
Portugal,
Norway,Harald V
Greece,


Check the type of the object returned:

In [28]:
type(df.loc[:, ["Monarch"]])

pandas.core.frame.DataFrame
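
The only difference from Demo 1 is the extra pair of brackets. A scalar label returns a one-dimensional `Series`, while a one-element list returns a two-dimensional `DataFrame`. An existing `Series` can also be turned into a one-column DataFrame with `Series.to_frame()`. A minimal sketch on a two-row excerpt:

```python
import pandas as pd

countries = pd.DataFrame(
    {"Capital": ["Madrid", "Oslo"], "Monarch": ["Felipe VI", "Harald V"]},
    index=["Spain", "Norway"],
)

as_series = countries.loc[:, "Monarch"]   # scalar label -> Series
as_frame = countries.loc[:, ["Monarch"]]  # one-element list -> DataFrame
converted = as_series.to_frame()          # Series -> one-column DataFrame

print(as_series.shape)             # (2,)
print(as_frame.shape)              # (2, 1)
print(converted.equals(as_frame))  # True
```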

## Exercise 3

In [29]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


Select the "Adj Close" column of the `apple` DataFrame, and return a DataFrame:

In [30]:
apple.loc[:, ["Adj Close"]]

Date,Adj Close
1984-09-07,3.02
1984-09-10,3.01
1984-09-11,3.07
1984-09-12,2.98
1984-09-13,3.14
...,...
2008-10-08,89.79
2008-10-09,88.74
2008-10-10,96.80
2008-10-13,110.26


Check the type of the object returned:

In [31]:
type(apple.loc[:, ["Adj Close"]])

pandas.core.frame.DataFrame

## Demo 4: Slicing rows (using the index)

In [32]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Slice the first few rows, up to and including "Italy":

In [33]:
df.loc[:"Italy"]

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340


Check the shape:

In [34]:
df.loc[:"Italy"].shape

(4, 4)

<div class="alert alert-warning">

<b>Beware:</b> Unlike in <code>Python</code>, <b>the end point is included</b> when slicing in <code>pandas</code> <b>using labels!</b>

</div>
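
A minimal sketch contrasting the two rules on a four-country excerpt. The label-based slice keeps its end label, while a plain Python list slice over the same positions drops its end position. One way to see why labels behave this way: with labels there is often no obvious "next label" to write as an exclusive end point.

```python
import pandas as pd

countries = pd.DataFrame(
    {"Area": [505990, 30688, 640679, 301340]},
    index=["Spain", "Belgium", "France", "Italy"],
)
labels = list(countries.index)

# Label-based slice: the end label "France" is included
label_slice = list(countries.loc["Belgium":"France"].index)
print(label_slice)  # ['Belgium', 'France']

# Plain Python list slice: the end position (2, i.e. "France") is excluded
list_slice = labels[1:2]
print(list_slice)  # ['Belgium']
```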

Slice the last few rows, starting from "Italy":

In [35]:
df.loc["Italy":]

,Capital,Population,Monarch,Area
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Check the shape:

In [36]:
df.loc["Italy":].shape

(5, 4)

Slice the rows from "Belgium" up to and including "Germany":

In [37]:
df.loc["Belgium":"Germany"]

,Capital,Population,Monarch,Area
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022


Check the shape:

In [38]:
df.loc["Belgium":"Germany"].shape

(4, 4)

<div class="alert alert-warning">

<b>Beware:</b> Unlike in <code>Python</code>, <b>the end point is included</b> when slicing in <code>pandas</code> <b>using labels!</b>

</div>

## Exercise 4

In [39]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


Slice the first few rows of the `apple` DataFrame, up to and including 14 September 1984:

In [40]:
apple.loc[:"1984-09-14"]

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14
1984-09-14,27.62,28.5,27.62,27.87,8826400,3.18


Check the shape:

In [41]:
apple.loc[:"1984-09-14"].shape

(6, 6)

Slice the last few rows of the `apple` DataFrame, starting from 1 October 2008:

In [42]:
apple.loc["2008-10-01":]

Date,Open,High,Low,Close,Volume,Adj Close
2008-10-01,111.92,112.36,107.39,109.12,46303000,109.12
2008-10-02,108.01,108.79,100.0,100.1,57477300,100.1
2008-10-03,104.0,106.5,94.65,97.07,81942800,97.07
2008-10-06,91.96,98.78,87.54,98.14,75264900,98.14
2008-10-07,100.48,101.5,88.95,89.16,67099000,89.16
2008-10-08,85.91,96.33,85.68,89.79,78847900,89.79
2008-10-09,93.35,95.8,86.6,88.74,57763700,88.74
2008-10-10,85.7,100.0,85.0,96.8,79260700,96.8
2008-10-13,104.55,110.53,101.02,110.26,54967000,110.26
2008-10-14,116.26,116.4,103.14,104.08,70749800,104.08


Check the shape:

In [43]:
apple.loc["2008-10-01":].shape

(10, 6)

Slice the rows of the `apple` DataFrame for the month of February 2000:

In [44]:
apple.loc["2000-02":"2000-02"]

Date,Open,High,Low,Close,Volume,Adj Close
2000-02-01,104.0,105.0,100.0,100.25,11380000,25.06
2000-02-02,100.75,102.12,97.0,98.81,16588800,24.7
2000-02-03,100.31,104.25,100.25,103.31,16977600,25.83
2000-02-04,103.94,110.0,103.62,108.0,15206800,27.0
2000-02-07,108.0,114.25,105.94,114.06,15770800,28.51
2000-02-08,114.0,116.12,111.25,114.87,14613600,28.72
2000-02-09,114.12,117.12,112.44,112.62,10698000,28.16
2000-02-10,112.87,113.87,110.0,113.5,10832400,28.38
2000-02-11,113.62,114.12,108.25,108.75,7592000,27.19
2000-02-14,109.31,115.87,108.62,115.81,13130000,28.95


Check the shape:

In [45]:
apple.loc["2000-02":"2000-02"].shape

(20, 6)
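
On a `DatetimeIndex`, a partial date string such as `"2000-02"` stands for the whole period it names. That is why `"2000-02":"2000-02"` covers every day of February. A single partial string passed to `.loc[]` selects the same rows. A minimal sketch on a hypothetical daily index built with `pd.date_range` (the `prices` values are made up):

```python
import pandas as pd

# Hypothetical daily prices spanning 30 January to 3 February 2000
dates = pd.date_range("2000-01-30", periods=5, freq="D")
prices = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=dates)

# "2000-02" is expanded to the whole month at both ends of the slice
feb_slice = prices.loc["2000-02":"2000-02"]

# A single partial string selects the same rows
feb_single = prices.loc["2000-02"]

print(feb_slice.shape)               # (3, 1)
print(feb_slice.equals(feb_single))  # True
```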

## Demo 5: Slicing rows (using integers)

In [46]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Slice the first 4 rows:

In [47]:
df.iloc[:4]

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340


Check the shape:

In [48]:
df.iloc[:4].shape

(4, 4)

<div class="alert alert-warning">

<b>Beware:</b> Like in <code>Python</code>, <b>the end point is NOT included</b> when slicing in <code>pandas</code> <b>using integers!</b>

</div>

Slice the last 3 rows:

In [49]:
df.iloc[-3:]

,Capital,Population,Monarch,Area
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Check the shape:

In [50]:
df.iloc[-3:].shape

(3, 4)

Slice the third to the fifth rows (positions 2, 3 and 4):

In [51]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


In [52]:
df.iloc[2:5]

,Capital,Population,Monarch,Area
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022


Check the shape:

In [53]:
df.iloc[2:5].shape

(3, 4)

<div class="alert alert-warning">

<b>Beware:</b> Like in <code>Python</code>, <b>the end point is NOT included</b> when slicing in <code>pandas</code> <b>using integers!</b>

</div>
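
Because `.iloc[]` follows Python's slicing rules, slicing the rows by position gives the same labels as slicing a plain list of the index labels with the same bounds. This includes negative positions. A quick check on a five-country excerpt:

```python
import pandas as pd

countries = pd.DataFrame(
    {"Capital": ["Madrid", "Brussels", "Paris", "Roma", "Berlin"]},
    index=["Spain", "Belgium", "France", "Italy", "Germany"],
)
labels = list(countries.index)

# Same bounds, same rows: position 5 is excluded in both cases
middle = list(countries.iloc[2:5].index)
print(middle == labels[2:5])  # True
print(middle)                 # ['France', 'Italy', 'Germany']

# Negative positions count from the end, as for Python lists
print(list(countries.iloc[-3:].index) == labels[-3:])  # True
```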

## Exercise 5

In [None]:
apple.head()

Slice the first 3 rows of the `apple` DataFrame:

In [54]:
apple.iloc[:3]

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07


Check the shape:

In [55]:
apple.iloc[:3].shape

(3, 6)

Slice the last 6 rows of the `apple` DataFrame:

In [56]:
apple.iloc[-6:]

Date,Open,High,Low,Close,Volume,Adj Close
2008-10-07,100.48,101.5,88.95,89.16,67099000,89.16
2008-10-08,85.91,96.33,85.68,89.79,78847900,89.79
2008-10-09,93.35,95.8,86.6,88.74,57763700,88.74
2008-10-10,85.7,100.0,85.0,96.8,79260700,96.8
2008-10-13,104.55,110.53,101.02,110.26,54967000,110.26
2008-10-14,116.26,116.4,103.14,104.08,70749800,104.08


Check the shape:

In [57]:
apple.iloc[-6:].shape

(6, 6)

Slice the second to the fourth rows of the `apple` DataFrame:

In [58]:
apple.iloc[1:4]

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98


Check the shape:

In [59]:
apple.iloc[1:4].shape

(3, 6)

## Demo 6: Selecting data with a boolean array

In [60]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


In [61]:
df["Population"]

Spain       46733038
Belgium     11449656
France      67076000
Italy       60390560
Germany     83122889
Portugal    10295909
Norway       5391369
Greece      10718565
Name: Population, dtype: int64

Comparisons return a boolean Series, with one `True`/`False` value per row:

In [62]:
df["Population"] < 15_000_000

Spain       False
Belgium      True
France      False
Italy       False
Germany     False
Portugal     True
Norway       True
Greece       True
Name: Population, dtype: bool

Select the rows for which the population is less than 15 million people:

In [63]:
df.loc[df["Population"] < 15_000_000]

,Capital,Population,Monarch,Area
Belgium,Brussels,11449656,Philippe,30688
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Check the shape:

In [64]:
df.loc[df["Population"] < 15_000_000].shape

(4, 4)

Select the rows for which the area is greater than or equal to 400 thousand square km:

In [65]:
df["Area"]

Spain       505990
Belgium      30688
France      640679
Italy       301340
Germany     357022
Portugal     92212
Norway      385207
Greece      131957
Name: Area, dtype: int64

In [66]:
df["Area"] >= 400_000

Spain        True
Belgium     False
France       True
Italy       False
Germany     False
Portugal    False
Norway      False
Greece      False
Name: Area, dtype: bool

In [67]:
df.loc[df["Area"] >= 400_000]

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
France,Paris,67076000,,640679


Check the shape:

In [68]:
df.loc[df["Area"] >= 400_000].shape

(2, 4)

Select the rows for which the area is smaller than 400 thousand square km:

In [69]:
df.loc[df["Area"] < 400_000]

,Capital,Population,Monarch,Area
Belgium,Brussels,11449656,Philippe,30688
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Check the shape:

In [70]:
df.loc[df["Area"] < 400_000].shape

(6, 4)

The two conditions are complementary, so together they split the original DataFrame into two parts (2 + 6 = 8 rows):

In [71]:
df.shape

(8, 4)
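
Since `Area >= 400_000` and `Area < 400_000` are complementary, the second mask is simply the negation of the first, written with `~`. Conditions can also be combined with `&` (and) and `|` (or). These operators bind more tightly than comparisons, so each comparison needs its own parentheses. A minimal sketch on a four-country excerpt:

```python
import pandas as pd

# Four-country excerpt of the data
countries = pd.DataFrame(
    {
        "Population": [46733038, 11449656, 67076000, 5391369],
        "Area": [505990, 30688, 640679, 385207],
    },
    index=["Spain", "Belgium", "France", "Norway"],
)

large = countries["Area"] >= 400_000

# ~ negates a boolean Series, so the two selections split the rows
big = countries.loc[large]
rest = countries.loc[~large]
print(len(big) + len(rest) == len(countries))  # True

# & means "and"; each comparison needs its own parentheses
small_and_few = countries.loc[
    (countries["Area"] < 400_000) & (countries["Population"] < 15_000_000)
]
print(list(small_and_few.index))  # ['Belgium', 'Norway']
```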

Select the rows for which the capital is "Roma":

In [72]:
df["Capital"] == "Roma"

Spain       False
Belgium     False
France      False
Italy        True
Germany     False
Portugal    False
Norway      False
Greece      False
Name: Capital, dtype: boolean

In [None]:
df.loc[df["Capital"] == "Roma"]

Check the shape:

In [73]:
df.loc[df["Capital"] == "Roma"].shape

(1, 4)
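
To match any of several values, `Series.isin()` builds the boolean mask from a list. This avoids chaining several `==` comparisons with `|`. A minimal sketch on a four-country excerpt:

```python
import pandas as pd

countries = pd.DataFrame(
    {"Capital": ["Madrid", "Roma", "Oslo", "Athens"]},
    index=["Spain", "Italy", "Norway", "Greece"],
)

# True where the capital is one of the listed values
mask = countries["Capital"].isin(["Roma", "Oslo"])
selected = countries.loc[mask]
print(list(selected.index))  # ['Italy', 'Norway']
```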

## Exercise 6

In [74]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


Select the rows of the `apple` DataFrame for which the "Open" column was less than or equal to 26.50:

In [75]:
apple.loc[apple.Open <= 26.50]

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.50,26.87,26.25,26.50,2981600,3.02
1984-09-10,26.50,26.62,25.87,26.37,2346400,3.01
1984-09-25,26.50,26.50,26.12,26.12,5977600,2.98
1984-09-26,26.12,27.25,25.75,25.75,3987200,2.94
1984-09-27,25.75,25.87,25.75,25.75,3796000,2.94
...,...,...,...,...,...,...
2004-05-04,25.97,26.55,25.50,26.14,9999400,13.07
2004-05-05,26.20,26.75,25.96,26.65,8503800,13.32
2004-05-06,26.40,26.75,25.90,26.58,9412800,13.29
2004-05-10,26.27,26.60,25.94,26.28,8927800,13.14


Check the shape:

In [76]:
apple.loc[apple.Open <= 26.50].shape

(1757, 6)

Select the rows of the `apple` DataFrame for which the "Volume" column was greater than 100 million:

In [77]:
apple.loc[apple.Volume > 100_000_000]

Date,Open,High,Low,Close,Volume,Adj Close
1997-08-06,25.25,27.75,25.0,26.31,149671200,6.58
1997-08-07,28.75,29.56,28.37,29.19,134124400,7.3
1999-09-21,73.19,73.25,69.0,69.25,119931200,17.31
2000-09-29,28.19,29.0,25.37,25.75,265069000,12.88
2005-01-13,73.71,74.42,69.73,69.8,113025600,34.9
2007-01-09,86.45,92.98,85.15,92.57,119617800,92.57
2007-01-10,94.75,97.8,93.45,97.0,105460000,97.0
2008-01-23,136.19,140.0,126.14,139.07,120463200,139.07


Check the shape:

In [78]:
apple.loc[apple.Volume > 100_000_000].shape

(8, 6)

Using the `apple` DataFrame, find out how many days the "Close" value was exactly 14.00:

In [79]:
apple.loc[apple.Close == 14.00]

Date,Open,High,Low,Close,Volume,Adj Close
2000-12-19,14.38,15.25,14.0,14.0,13367200,7.0


Check the shape:

In [80]:
apple.loc[apple.Close == 14.00].shape

(1, 6)
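
The number of matching days can also be read directly from the mask. `True` counts as 1, so `.sum()` on a boolean Series gives the number of matching rows. Exact `==` comparisons on floats are fragile when the values come from arithmetic (for example, `0.1 + 0.2 == 0.3` is `False`); a tolerance-based test such as `numpy.isclose` is safer there. A minimal sketch with made-up prices:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices; the last one is the result of arithmetic
close = pd.Series([14.0, 13.5, 14.0, 0.1 + 0.2])

# True counts as 1, so summing a boolean mask counts the matching rows
n_days = (close == 14.00).sum()
print(n_days)  # 2

# Exact equality fails for computed floats; np.isclose uses a tolerance
print(close.iloc[3] == 0.3)                  # False
print(bool(np.isclose(close.iloc[3], 0.3)))  # True
```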

<div class="alert alert-info">

<b>Note:</b> Up to this point, everything done with the <code>.loc[]</code> and <code>.iloc[]</code> indexers could also be done with plain <code>[]</code> indexing; the examples below show what they can do beyond that!

</div>

## Demo 7: Selecting one or more columns (using integers)

In [81]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Select one column:

In [82]:
df["Capital"]

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
Name: Capital, dtype: string

In [83]:
df.loc[:, "Capital"]

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
Name: Capital, dtype: string

Select one column using integers:

In [84]:
# Raises an error: with a single value, [] looks up a column label, and no column is labelled 0:
df[0]

KeyError: 0

In [85]:
df.iloc[:, 0]

Spain         Madrid
Belgium     Brussels
France         Paris
Italy           Roma
Germany       Berlin
Portugal      Lisbon
Norway          Oslo
Greece        Athens
Name: Capital, dtype: string
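
When only a column's name is known, `Index.get_loc()` returns its integer position. This lets position-based code such as `.iloc[]` work from a label. A minimal sketch on a two-row excerpt:

```python
import pandas as pd

countries = pd.DataFrame(
    {
        "Capital": ["Madrid", "Brussels"],
        "Population": [46733038, 11449656],
        "Monarch": ["Felipe VI", "Philippe"],
        "Area": [505990, 30688],
    },
    index=["Spain", "Belgium"],
)

# Position of the "Area" column (0-based)
area_pos = countries.columns.get_loc("Area")
print(area_pos)  # 3

# Same column, selected by position and by label
print(countries.iloc[:, area_pos].equals(countries.loc[:, "Area"]))  # True
```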

Select several columns using integers:

In [86]:
df.iloc[:, [1, 3]]

,Population,Area
Spain,46733038,505990
Belgium,11449656,30688
France,67076000,640679
Italy,60390560,301340
Germany,83122889,357022
Portugal,10295909,92212
Norway,5391369,385207
Greece,10718565,131957


In [87]:
df.iloc[:, 1:]

,Population,Monarch,Area
Spain,46733038,Felipe VI,505990
Belgium,11449656,Philippe,30688
France,67076000,,640679
Italy,60390560,,301340
Germany,83122889,,357022
Portugal,10295909,,92212
Norway,5391369,Harald V,385207
Greece,10718565,,131957


## Exercise 7

In [None]:
apple.head()

Select the second column of the `apple` DataFrame:

In [88]:
apple.iloc[:, 1]

Date
1984-09-07     26.87
1984-09-10     26.62
1984-09-11     27.37
1984-09-12     27.00
1984-09-13     27.62
               ...  
2008-10-08     96.33
2008-10-09     95.80
2008-10-10    100.00
2008-10-13    110.53
2008-10-14    116.40
Name: High, Length: 6081, dtype: float64

Select the first and fourth columns of the `apple` DataFrame:

In [89]:
apple.iloc[:, [0, 3]]

Date,Open,Close
1984-09-07,26.50,26.50
1984-09-10,26.50,26.37
1984-09-11,26.62,26.87
1984-09-12,26.87,26.12
1984-09-13,27.50,27.50
...,...,...
2008-10-08,85.91,89.79
2008-10-09,93.35,88.74
2008-10-10,85.70,96.80
2008-10-13,104.55,110.26


Select the first to fourth columns of the `apple` DataFrame:

In [90]:
apple.iloc[:, :4]

Date,Open,High,Low,Close
1984-09-07,26.50,26.87,26.25,26.50
1984-09-10,26.50,26.62,25.87,26.37
1984-09-11,26.62,27.37,26.62,26.87
1984-09-12,26.87,27.00,26.12,26.12
1984-09-13,27.50,27.62,27.50,27.50
...,...,...,...,...
2008-10-08,85.91,96.33,85.68,89.79
2008-10-09,93.35,95.80,86.60,88.74
2008-10-10,85.70,100.00,85.00,96.80
2008-10-13,104.55,110.53,101.02,110.26


## Demo 8: Selecting a slice of columns

In [91]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Select a slice of columns:

In [92]:
df[["Population", "Monarch", "Area"]]

,Population,Monarch,Area
Spain,46733038,Felipe VI,505990
Belgium,11449656,Philippe,30688
France,67076000,,640679
Italy,60390560,,301340
Germany,83122889,,357022
Portugal,10295909,,92212
Norway,5391369,Harald V,385207
Greece,10718565,,131957


In [93]:
# Raises an error, because a single slice refers to rows, and not to columns:
df["Population":"Area"]

KeyError: 'Population'

In [94]:
df.loc[:, "Population":"Area"]

,Population,Monarch,Area
Spain,46733038,Felipe VI,505990
Belgium,11449656,Philippe,30688
France,67076000,,640679
Italy,60390560,,301340
Germany,83122889,,357022
Portugal,10295909,,92212
Norway,5391369,Harald V,385207
Greece,10718565,,131957
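
Column slices with `.loc[]` follow the same rule as row slices: both end labels are included. The positional equivalent with `.iloc[]` therefore needs an end position one past the last column. A minimal sketch on a two-row excerpt with the same four columns:

```python
import pandas as pd

countries = pd.DataFrame(
    {
        "Capital": ["Madrid", "Brussels"],
        "Population": [46733038, 11449656],
        "Monarch": ["Felipe VI", "Philippe"],
        "Area": [505990, 30688],
    },
    index=["Spain", "Belgium"],
)

by_label = countries.loc[:, "Population":"Area"]  # "Area" is included
by_position = countries.iloc[:, 1:4]              # position 4 is excluded

print(list(by_label.columns))        # ['Population', 'Monarch', 'Area']
print(by_label.equals(by_position))  # True
```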


## Exercise 8

In [None]:
apple.head()

Select the slice of columns from "Open" to "Close" of the `apple` DataFrame:

In [95]:
apple.loc[:, "Open":"Close"]

Date,Open,High,Low,Close
1984-09-07,26.50,26.87,26.25,26.50
1984-09-10,26.50,26.62,25.87,26.37
1984-09-11,26.62,27.37,26.62,26.87
1984-09-12,26.87,27.00,26.12,26.12
1984-09-13,27.50,27.62,27.50,27.50
...,...,...,...,...
2008-10-08,85.91,96.33,85.68,89.79
2008-10-09,93.35,95.80,86.60,88.74
2008-10-10,85.70,100.00,85.00,96.80
2008-10-13,104.55,110.53,101.02,110.26


## Demo 9: Selecting specific rows by labels

In [96]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Select a slice of rows (a slice inside plain `[]` refers to rows):

In [97]:
df["France":"Portugal"]

,Capital,Population,Monarch,Area
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212


In [98]:
# Raises an error, because a list refers to columns, and not to rows:
df[["France", "Germany"]]

KeyError: "None of [Index(['France', 'Germany'], dtype='object')] are in the [columns]"

In [99]:
df.loc[["France", "Germany"]]

,Capital,Population,Monarch,Area
France,Paris,67076000,,640679
Germany,Berlin,83122889,,357022
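
A list passed to `.loc[]` must contain only existing labels; otherwise a `KeyError` is raised (pandas 1.0 and later). When missing labels should produce empty rows instead, `DataFrame.reindex()` accepts the same list and fills those rows with missing values. A minimal sketch, where `"Atlantis"` is a deliberately absent, hypothetical label:

```python
import pandas as pd

countries = pd.DataFrame(
    {"Capital": ["Paris", "Berlin"], "Area": [640679, 357022]},
    index=["France", "Germany"],
)

# .loc[] refuses a list containing an unknown label
try:
    countries.loc[["France", "Atlantis"]]
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# reindex keeps the requested order and fills unknown labels with NaN
padded = countries.reindex(["France", "Atlantis"])
print(padded["Area"].isna().tolist())  # [False, True]
```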


## Exercise 9

In [None]:
apple.head()

Select the rows for 18 May 2000 and 18 May 2001 of the `apple` DataFrame using `.loc[]`:

In [100]:
apple.loc[["2000-05-18", "2001-05-18"]]

Date,Open,High,Low,Close,Volume,Adj Close
2000-05-18,103.0,104.94,100.62,100.75,13365600,25.19
2001-05-18,23.36,23.64,23.12,23.53,5680400,11.77


## Demo 10: Selecting on both rows and columns

In [101]:
df

,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Select one or more columns:

In [102]:
df[["Capital", "Area"]]

,Capital,Area
Spain,Madrid,505990
Belgium,Brussels,30688
France,Paris,640679
Italy,Roma,301340
Germany,Berlin,357022
Portugal,Lisbon,92212
Norway,Oslo,385207
Greece,Athens,131957


In [103]:
df.loc[:, ["Capital", "Area"]]

,Capital,Area
Spain,Madrid,505990
Belgium,Brussels,30688
France,Paris,640679
Italy,Roma,301340
Germany,Berlin,357022
Portugal,Lisbon,92212
Norway,Oslo,385207
Greece,Athens,131957


Select rows:

In [104]:
df["Belgium":"Norway"]

,Capital,Population,Monarch,Area
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207


In [105]:
df.loc["Belgium":"Norway"]

,Capital,Population,Monarch,Area
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207


Select on both rows and columns by chaining single selections:

In [106]:
df[["Capital", "Area"]]["Belgium":"Norway"]

,Capital,Area
Belgium,Brussels,30688
France,Paris,640679
Italy,Roma,301340
Germany,Berlin,357022
Portugal,Lisbon,92212
Norway,Oslo,385207


In [107]:
df["Belgium":"Norway"][["Capital", "Area"]]

,Capital,Area
Belgium,Brussels,30688
France,Paris,640679
Italy,Roma,301340
Germany,Berlin,357022
Portugal,Lisbon,92212
Norway,Oslo,385207


Select on both rows and columns in a single `.loc[]` call:

In [108]:
df.loc["Belgium":"Norway", ["Capital", "Area"]]

,Capital,Area
Belgium,Brussels,30688
France,Paris,640679
Italy,Roma,301340
Germany,Berlin,357022
Portugal,Lisbon,92212
Norway,Oslo,385207


<div class="alert alert-info">

<b>Note:</b> Older versions of <code>pandas</code> also had an <code>.ix[]</code> indexer that accepted a mix of labels and integers. It was deprecated in pandas 0.20 and removed in pandas 1.0, so do not use it.

</div>

<div class="alert alert-success">

<b>Best Practice:</b> Use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>view values</b>.

</div>

<div class="alert alert-danger">

<b>Warning:</b> Always use <code>.loc[]</code> / <code>.iloc[]</code> when selecting on both rows and columns to <b>assign values</b>! (See <code>SettingWithCopyWarning</code>)

</div>
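
Here is a minimal sketch of assigning through `.loc[]`. Passing the row mask and the column label in one call writes into the original DataFrame. The chained form `df[mask]["col"] = value` may instead write into a temporary copy and leave the original unchanged; the exact behaviour and warning depend on the pandas version and its copy-on-write setting. The `HasMonarch` column is hypothetical:

```python
import pandas as pd

countries = pd.DataFrame(
    {"Monarch": ["Felipe VI", None, "Harald V"]},
    index=["Spain", "France", "Norway"],
)

# Create the column first, then set values on the selected rows in one .loc[] call
countries["HasMonarch"] = False
countries.loc[countries["Monarch"].notna(), "HasMonarch"] = True

print(countries["HasMonarch"].tolist())  # [True, False, True]
```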

## Exercise 10

In [109]:
apple.head()

Date,Open,High,Low,Close,Volume,Adj Close
1984-09-07,26.5,26.87,26.25,26.5,2981600,3.02
1984-09-10,26.5,26.62,25.87,26.37,2346400,3.01
1984-09-11,26.62,27.37,26.62,26.87,5444000,3.07
1984-09-12,26.87,27.0,26.12,26.12,4773600,2.98
1984-09-13,27.5,27.62,27.5,27.5,7429600,3.14


Select the "Open" and "Close" columns of the `apple` DataFrame using `.loc[]`:

In [111]:
apple.loc[:, ["Open", "Close"]]

Date,Open,Close
1984-09-07,26.50,26.50
1984-09-10,26.50,26.37
1984-09-11,26.62,26.87
1984-09-12,26.87,26.12
1984-09-13,27.50,27.50
...,...,...
2008-10-08,85.91,89.79
2008-10-09,93.35,88.74
2008-10-10,85.70,96.80
2008-10-13,104.55,110.26
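
The same column selection can be written in other ways. The sketch below is built from the first two rows shown by `apple.head()` and checks that `.loc[:, columns]`, plain `[columns]`, and `.iloc[]` with column positions all agree:

```python
import pandas as pd

# First two rows of `apple.head()` (Volume and Adj Close omitted).
prices = pd.DataFrame(
    {
        "Open": [26.5, 26.5],
        "High": [26.87, 26.62],
        "Low": [26.25, 25.87],
        "Close": [26.5, 26.37],
    },
    index=pd.to_datetime(["1984-09-07", "1984-09-10"]),
)

with_loc = prices.loc[:, ["Open", "Close"]]  # ":" keeps every row
with_brackets = prices[["Open", "Close"]]    # column selection only
with_iloc = prices.iloc[:, [0, 3]]           # Open is column 0, Close is column 3

print(with_loc.equals(with_brackets), with_loc.equals(with_iloc))  # True True
```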


Select the rows for the month of February 2000 of the `apple` DataFrame using `.loc[]`:

In [112]:
apple.loc["2000-02"]

Unnamed: 0_level_0,Open,High,Low,Close,Volume,Adj Close
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2000-02-01,104.0,105.0,100.0,100.25,11380000,25.06
2000-02-02,100.75,102.12,97.0,98.81,16588800,24.7
2000-02-03,100.31,104.25,100.25,103.31,16977600,25.83
2000-02-04,103.94,110.0,103.62,108.0,15206800,27.0
2000-02-07,108.0,114.25,105.94,114.06,15770800,28.51
2000-02-08,114.0,116.12,111.25,114.87,14613600,28.72
2000-02-09,114.12,117.12,112.44,112.62,10698000,28.16
2000-02-10,112.87,113.87,110.0,113.5,10832400,28.38
2000-02-11,113.62,114.12,108.25,108.75,7592000,27.19
2000-02-14,109.31,115.87,108.62,115.81,13130000,28.95
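
`.loc["2000-02"]` relies on partial string indexing. On a `DatetimeIndex`, a date string that is less precise than the index (a year, or a year and month) selects every row inside that period. Here is a minimal sketch using a few `Close` values copied from the outputs in this notebook:

```python
import pandas as pd

# A few "Close" values copied from the `apple` outputs in this notebook.
closes = pd.DataFrame(
    {"Close": [26.5, 100.25, 98.81, 100.75]},
    index=pd.to_datetime(["1984-09-07", "2000-02-01", "2000-02-02", "2000-05-18"]),
)

feb_2000 = closes.loc["2000-02"]                  # every row in February 2000
year_2000 = closes.loc["2000"]                    # every row in 2000
explicit = closes.loc["2000-02-01":"2000-02-29"]  # the February rows, as an explicit slice

print(len(feb_2000), len(year_2000))  # 2 3
```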


Combine both selections using `.loc[]`:

In [113]:
apple.loc["2000-02", ["Open", "Close"]]

Unnamed: 0_level_0,Open,Close
Date,Unnamed: 1_level_1,Unnamed: 2_level_1
2000-02-01,104.0,100.25
2000-02-02,100.75,98.81
2000-02-03,100.31,103.31
2000-02-04,103.94,108.0
2000-02-07,108.0,114.06
2000-02-08,114.0,114.87
2000-02-09,114.12,112.62
2000-02-10,112.87,113.5
2000-02-11,113.62,108.75
2000-02-14,109.31,115.81


## Demo 11: Selecting values with `.at[]` and `.iat[]`

In [114]:
df

Unnamed: 0,Capital,Population,Monarch,Area
Spain,Madrid,46733038,Felipe VI,505990
Belgium,Brussels,11449656,Philippe,30688
France,Paris,67076000,,640679
Italy,Roma,60390560,,301340
Germany,Berlin,83122889,,357022
Portugal,Lisbon,10295909,,92212
Norway,Oslo,5391369,Harald V,385207
Greece,Athens,10718565,,131957


Select a specific value by giving both the row and the column, using the `.at[]` (labels) / `.iat[]` (integer positions) indexers:

In [115]:
df.at["Belgium", "Capital"]

'Brussels'

In [116]:
df.iat[1, 0]

'Brussels'
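
`.at[]` and `.iat[]` are the scalar-only counterparts of `.loc[]` and `.iloc[]`, and each takes exactly one row and one column. Here is a minimal sketch on a three-row excerpt of `df`. Renaming the capital to "Bruxelles" only shows that `.at[]` can also write a single cell:

```python
import pandas as pd

# Three-row excerpt of the tutorial's `df`.
df = pd.DataFrame(
    {"Capital": ["Madrid", "Brussels", "Paris"], "Area": [505990, 30688, 640679]},
    index=["Spain", "Belgium", "France"],
)

by_label = df.at["Belgium", "Capital"]  # one row label, one column label
by_position = df.iat[1, 0]              # row 1, column 0

# Same scalar as the equivalent .loc[] / .iloc[] calls.
print(by_label == df.loc["Belgium", "Capital"] == by_position)  # True

# .at[] can also overwrite a single cell in place.
df.at["Belgium", "Capital"] = "Bruxelles"
print(df.iat[1, 0])  # Bruxelles
```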

## Exercise 11

In [None]:
apple.head()

Select the "Close" value for 18 May 2000 from the `apple` DataFrame using `.at[]`:

In [117]:
apple.at["2000-05-18", "Close"]

100.75

Select the "Open" value for 7 September 1984 from the `apple` DataFrame using `.at[]`:

In [118]:
apple.at["1984-09-07", "Open"]

26.5
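
On a `DatetimeIndex`, the string passed to `.at[]` is parsed as a date, so it behaves like an explicit `pd.Timestamp` label. The sketch below uses two `Close` values from the outputs above, and `.iat[]` reaches a cell of the same frame by position:

```python
import pandas as pd

# Two "Close" values taken from the outputs above.
closes = pd.DataFrame(
    {"Close": [26.5, 100.75]},
    index=pd.to_datetime(["1984-09-07", "2000-05-18"]),
)

from_string = closes.at["2000-05-18", "Close"]                  # string parsed as a date
from_timestamp = closes.at[pd.Timestamp("2000-05-18"), "Close"]  # explicit Timestamp label
first_close = closes.iat[0, 0]                                   # first row, first column

print(from_string, from_timestamp, first_close)  # 100.75 100.75 26.5
```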

## Summary

```python
df.loc[..., ...]
df.loc[rows, columns]

df.iloc[..., ...]
df.iloc[rows, columns]

```

 Command                         | Result
:--------------------------------|:------------------------------------------------------
`df["Column"]`                   | Selects one column, and returns a `Series`
`df[["Column_1", "Column_2"]]`   | Selects several columns, and returns a `DataFrame`
`df[["Column"]]`                 | Selects one column, and returns a `DataFrame`
`df[:"Spain"]`                   | Slices rows using index labels (end included), and returns a `DataFrame`
`df[:10]`                        | Slices rows using integer positions (end excluded), and returns a `DataFrame`
`df[df["Column"] > 0]`           | Filters rows with a boolean mask, and returns a `DataFrame`
                                 |
`df.loc[..., ...]`               | Selects on both rows and columns (using labels)
`df.iloc[..., ...]`              | Selects on both rows and columns (using integers)
                                 |
`df.at[..., ...]`                | Selects a single value by row and column (using labels)
`df.iat[..., ...]`               | Selects a single value by row and column (using integers)
                                 |
`df.loc["Spain"]`                | Selects one row, and returns a `Series`
`df.loc[["Spain", "Belgium"]]`   | Selects several rows, and returns a `DataFrame`
`df.loc[["Spain"]]`              | Selects one row, and returns a `DataFrame`
`df.loc["Spain":"Germany"]`      | Selects a slice of rows, and returns a `DataFrame`
                                 |
`df.loc[:, "Capital"]`           | Selects one column, and returns a `Series`
`df.loc[:, ["Capital", "Area"]]` | Selects several columns, and returns a `DataFrame`
`df.loc[:, ["Capital"]]`         | Selects one column, and returns a `DataFrame`
`df.loc[:, "Capital":"Area"]`    | Selects a slice of columns, and returns a `DataFrame`
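
The slice entries in the table differ in whether the end is included. Label slices with `.loc[]` keep the end label, while integer slices with `.iloc[]` stop before the end position. Here is a quick check on a three-row excerpt of `df`:

```python
import pandas as pd

# Three-row excerpt of the tutorial's `df`.
df = pd.DataFrame(
    {
        "Capital": ["Madrid", "Brussels", "Paris"],
        "Population": [46733038, 11449656, 67076000],
        "Monarch": ["Felipe VI", "Philippe", None],
        "Area": [505990, 30688, 640679],
    },
    index=["Spain", "Belgium", "France"],
)

cols_by_label = df.loc[:, "Capital":"Area"]  # end label "Area" is included
cols_by_position = df.iloc[:, 0:3]           # position 3 ("Area") is excluded

rows_by_label = df.loc["Spain":"Belgium"]    # Spain and Belgium
rows_by_position = df.iloc[0:1]              # Spain only

print(list(cols_by_label.columns))     # ['Capital', 'Population', 'Monarch', 'Area']
print(list(cols_by_position.columns))  # ['Capital', 'Population', 'Monarch']
print(len(rows_by_label), len(rows_by_position))  # 2 1
```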
