# Numeric Indexing of DataFrames

This lesson covers:

* Accessing specific elements in DataFrames using numeric indices

Accessing elements in a DataFrame is a common task. To begin this lesson,
clear the workspace and set up some vectors and a $5\times5$ array. The
vectors and the array make it easy to determine which elements are selected
by a command.

Begin by creating:

* A 5-by-5 DataFrame `x_df` containing `np.arange(25).reshape((5,5))`.
* A 5-element Series `y_s` containing `np.arange(5)`.
* A 5-by-5 DataFrame `x_named` that is `x_df` with columns "c0", "c1", ...,
  "c4" and rows "r0", "r1", ..., "r4".
* A 5-element Series `y_named` with index "r0", "r1", ..., "r4". 

In [1]:
import numpy as np
import pandas as pd

x = np.arange(25).reshape((5, 5))
y = np.arange(5)


x_df = pd.DataFrame(x)
x_named = pd.DataFrame(
    x, index=["r0", "r1", "r2", "r3", "r4"], columns=["c0", "c1", "c2", "c3", "c4"]
)
y_s = pd.Series(y)
y_named = pd.Series(y, index=["r0", "r1", "r2", "r3", "r4"])

print(f"x_df = \n{x_df}")
print(f"y_s = \n{y_s}")

print(f"x_named = \n{x_named}")
print(f"y_named = \n{y_named}")

x_df = 
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
y_s = 
0    0
1    1
2    2
3    3
4    4
dtype: int64
x_named = 
    c0  c1  c2  c3  c4
r0   0   1   2   3   4
r1   5   6   7   8   9
r2  10  11  12  13  14
r3  15  16  17  18  19
r4  20  21  22  23  24
y_named = 
r0    0
r1    1
r2    2
r3    3
r4    4
dtype: int64


## Problem: Picking an Element out of a DataFrame

Using double index notation, select the (0,2) and the (2,0) elements of
`x_named`.

In [2]:
x_named.iloc[0, 2]

np.int64(2)

In [3]:
x_named.iloc[2, 0]

np.int64(10)
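
As a minimal sketch of what positional selection means here, the block below rebuilds `x_named` so that it runs on its own and reads the same element three ways. The `.loc` and `.iat` calls are shown only for comparison and are not part of the lesson's solution.

```python
import numpy as np
import pandas as pd

# Rebuild the labeled frame from the setup cell so this block is self-contained
x_named = pd.DataFrame(
    np.arange(25).reshape((5, 5)),
    index=[f"r{i}" for i in range(5)],
    columns=[f"c{i}" for i in range(5)],
)

# Positional access ignores the labels: row position 0, column position 2
by_position = x_named.iloc[0, 2]

# The same element selected by its row and column labels
by_label = x_named.loc["r0", "c2"]

# .iat is a positional accessor that only returns a single scalar
by_iat = x_named.iat[0, 2]

print(by_position, by_label, by_iat)
```

All three expressions pick the same cell; `iloc` is the one that depends only on position, which is why it keeps working when the labels change.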

## Problem: Select Elements from Series

Select the 2nd element of `y_named`.

In [4]:
y_named.iloc[1:2]

r1    1
dtype: int64
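
The slice above returns a one-element Series rather than a bare value. A small sketch of the difference, rebuilding `y_named` so the block is self-contained:

```python
import numpy as np
import pandas as pd

y_named = pd.Series(np.arange(5), index=["r0", "r1", "r2", "r3", "r4"])

# A slice keeps the container: the result is a Series with one element
as_series = y_named.iloc[1:2]

# A single integer drops the container and returns the value itself
as_scalar = y_named.iloc[1]

print(type(as_series).__name__, as_series.shape)
print(type(as_scalar).__name__, as_scalar)
```

Which form is preferable depends on what the next step expects: a Series keeps the label `r1`, while the scalar is just the number.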

## Problem: Selecting Rows as Series

Select the 2nd row of `x_named` using the colon (:) operator.


In [5]:
x_named.iloc[1, :]

c0    5
c1    6
c2    7
c3    8
c4    9
Name: r1, dtype: int64

## Problem: Selecting Rows as DataFrames

1. Select the 2nd row of `x_named` using a slice so that the selection
   remains a DataFrame.
2. Repeat using a list of indices to retain the DataFrame. 


In [6]:
x_named.iloc[1:2, :]

    c0  c1  c2  c3  c4
r1   5   6   7   8   9


In [7]:
x_named.iloc[[1], :]

    c0  c1  c2  c3  c4
r1   5   6   7   8   9
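
The three row selections seen so far differ only in the type and shape of the result. The sketch below, which rebuilds `x_named` so it can run alone, makes that explicit:

```python
import numpy as np
import pandas as pd

x_named = pd.DataFrame(
    np.arange(25).reshape((5, 5)),
    index=[f"r{i}" for i in range(5)],
    columns=[f"c{i}" for i in range(5)],
)

# A scalar row position collapses the row into a Series
row_series = x_named.iloc[1, :]

# A slice or a one-element list keeps a DataFrame with a single row
row_frame_slice = x_named.iloc[1:2, :]
row_frame_list = x_named.iloc[[1], :]

print(type(row_series).__name__, row_series.shape)
print(type(row_frame_slice).__name__, row_frame_slice.shape)
print(type(row_frame_list).__name__, row_frame_list.shape)
```

The Series carries the row label in its `name` attribute, while the one-row DataFrames keep it in the index.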


## Problem: Selecting Entire Columns as Series
Select the 2nd column of `x_named` using the colon (:) operator. 

In [8]:
x_named.iloc[:, 1]

r0     1
r1     6
r2    11
r3    16
r4    21
Name: c1, dtype: int64

## Problem: Selecting Single Columns as DataFrames
Select the 2nd column of `x_named` so that the selection remains a DataFrame,
first using a slice and then using a list of indices.


In [9]:
x_named.iloc[:, 1:2]

    c1
r0   1
r1   6
r2  11
r3  16
r4  21


In [10]:
x_named.iloc[:, [1]]

    c1
r0   1
r1   6
r2  11
r3  16
r4  21


## Problem: Selecting Specific Columns
Select the 2nd and 3rd columns of `x_named` using a slice.

In [11]:
x_named.iloc[:, 1:3]

    c1  c2
r0   1   2
r1   6   7
r2  11  12
r3  16  17
r4  21  22


## Problem: Select Specific Rows

Select the 2nd and 4th rows of `x_named` using a slice.  Repeat the 
selection using a list of integers.

In [12]:
x_named.iloc[1:4:2, :]

    c0  c1  c2  c3  c4
r1   5   6   7   8   9
r3  15  16  17  18  19


In [13]:
x_named.iloc[[1, 3], :]

    c0  c1  c2  c3  c4
r1   5   6   7   8   9
r3  15  16  17  18  19
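
A stepped slice `start:stop:step` covers the same positions as `range(start, stop, step)`, which is why the two selections above agree. A brief sketch, rebuilding `x_named` so the block is self-contained:

```python
import numpy as np
import pandas as pd

x_named = pd.DataFrame(
    np.arange(25).reshape((5, 5)),
    index=[f"r{i}" for i in range(5)],
    columns=[f"c{i}" for i in range(5)],
)

# The positions visited by the slice 1:4:2
positions = list(range(1, 4, 2))

by_slice = x_named.iloc[1:4:2, :]
by_list = x_named.iloc[positions, :]

print(positions)
print(by_slice.equals(by_list))
```

A slice only works when the wanted rows are evenly spaced; a list handles any set of positions.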


## Problem: Select arbitrary rows and columns

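Combine the previous selections to select the 2nd and 3rd columns and the 2nd
and 4th rows of `x_named`.

**Note**: This is the only important difference from NumPy. With
`DataFrame.iloc`, a list of rows and a list of columns select the full block
formed by every listed row and every listed column, whereas NumPy pairs the
two lists element by element. Arbitrary row/column selection using
`DataFrame.iloc` is therefore simpler but less flexible.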

In [14]:
x_named.iloc[1:4:2, 1:3]

    c1  c2
r1   6   7
r3  16  17


In [15]:
x_named.iloc[[1, 3], [1, 2]]

    c1  c2
r1   6   7
r3  16  17


In [16]:
x_named.iloc[[1, 3], 1:3]

    c1  c2
r1   6   7
r3  16  17
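
To illustrate the difference from NumPy mentioned in the note, the sketch below applies the same two lists to a NumPy array and to a DataFrame. The use of `np.ix_` to recover the block in NumPy is an addition for comparison, not part of the lesson.

```python
import numpy as np
import pandas as pd

x = np.arange(25).reshape((5, 5))
x_df = pd.DataFrame(x)

# NumPy pairs the lists element by element: elements (1, 1) and (3, 2)
paired = x[[1, 3], [1, 2]]

# iloc treats the lists independently: rows 1 and 3 crossed with columns 1 and 2
block = x_df.iloc[[1, 3], [1, 2]]

# The NumPy equivalent of the iloc block needs np.ix_
block_np = x[np.ix_([1, 3], [1, 2])]

print(paired)
print(block)
print(block_np)
```

NumPy's pairing can pick out arbitrary scattered elements in one call, which `iloc` cannot; in exchange, `iloc` gives the rectangular block without any helper.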


## Problem: Mixed selection

Select the columns c1 and c2 and the 1st, 3rd and 5th row.

In [17]:
x_named[["c1", "c2"]].iloc[[0, 2, 4]]

    c1  c2
r0   1   2
r2  11  12
r4  21  22


## Problem: Mixed selection 2

Select the rows r1 and r2 and the 1st, 3rd and final column.

In [18]:
x_named.loc[["r1", "r2"]].iloc[:, [0, 2, 4]]

    c0  c2  c4
r1   5   7   9
r2  10  12  14
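
Chaining `loc` and `iloc` is one way to mix labels and positions. As an alternative sketch, the positions can be translated to labels with `x_named.columns[...]`, or the labels to positions with `Index.get_indexer`, so that a single indexer does the work. The variable names below are illustrative only.

```python
import numpy as np
import pandas as pd

x_named = pd.DataFrame(
    np.arange(25).reshape((5, 5)),
    index=[f"r{i}" for i in range(5)],
    columns=[f"c{i}" for i in range(5)],
)

# The chained form used in the lesson
chained = x_named.loc[["r1", "r2"]].iloc[:, [0, 2, 4]]

# Positions -> labels, then a single loc call
col_labels = x_named.columns[[0, 2, 4]]
single_loc = x_named.loc[["r1", "r2"], col_labels]

# Labels -> positions, then a single iloc call
row_positions = x_named.index.get_indexer(["r1", "r2"])
single_iloc = x_named.iloc[row_positions, [0, 2, 4]]

print(single_loc.equals(chained), single_iloc.equals(chained))
```

All three forms return the same DataFrame here; the chained version is the easiest to read, while the single-indexer versions avoid creating an intermediate frame.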


## Exercises

### Exercise: Select fixed length block

Compute the mean return of the momentum data in the first 66
observations and the last 66 observations.

In [19]:
# Setup: Load the momentum data

import pandas as pd

momentum = pd.read_csv("data/momentum.csv", index_col="date", parse_dates=True)
momentum.head()

            mom_01  mom_02  mom_03  mom_04  mom_05  mom_06  mom_07  mom_08  mom_09  mom_10
date
2016-01-04    0.67   -0.03   -0.93   -1.11   -1.47   -1.66    -1.4   -2.08   -1.71   -2.67
2016-01-05   -0.36     0.2   -0.37    0.28    0.16    0.18   -0.22    0.25    0.29    0.13
2016-01-06   -4.97   -2.33    -2.6   -1.16    -1.7   -1.45   -1.15   -1.46   -1.14   -0.45
2016-01-07   -4.91   -1.91   -3.03   -1.87   -2.31    -2.3    -2.7   -2.31   -2.36   -2.66
2016-01-08    -0.4   -1.26   -0.98   -1.26   -1.13   -1.02   -0.96   -1.42   -0.94   -1.32


In [20]:
momentum.iloc[:66].mean()

mom_01    0.124545
mom_02    0.100455
mom_03    0.062879
mom_04    0.035758
mom_05    0.067424
mom_06    0.055455
mom_07   -0.030455
mom_08   -0.051061
mom_09   -0.020606
mom_10   -0.045909
dtype: float64

In [21]:
nobs = momentum.shape[0]
momentum.iloc[nobs - 66 : nobs].mean()

mom_01    0.089545
mom_02    0.045606
mom_03    0.110303
mom_04    0.087576
mom_05    0.095303
mom_06    0.163485
mom_07    0.121818
mom_08    0.168485
mom_09    0.131818
mom_10    0.096364
dtype: float64

In [22]:
# Also correct, but more advanced
momentum.iloc[-66:].mean()

mom_01    0.089545
mom_02    0.045606
mom_03    0.110303
mom_04    0.087576
mom_05    0.095303
mom_06    0.163485
mom_07    0.121818
mom_08    0.168485
mom_09    0.131818
mom_10    0.096364
dtype: float64
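
The explicit and negative-index forms select the same block. The sketch below checks this on synthetic data, since the momentum file may not be at hand; the frame `fake` and its column names are made up for illustration. `head` and `tail` are shown as equivalent convenience methods.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the momentum data (hypothetical values and columns)
rng = np.random.default_rng(0)
dates = pd.date_range("2016-01-04", periods=200, freq="B")
fake = pd.DataFrame(
    rng.standard_normal((200, 3)), index=dates, columns=["a", "b", "c"]
)

nobs = fake.shape[0]

# Three ways to take the last 66 observations
last_explicit = fake.iloc[nobs - 66 : nobs]
last_negative = fake.iloc[-66:]
last_tail = fake.tail(66)

# Two ways to take the first 66 observations
first_slice = fake.iloc[:66]
first_head = fake.head(66)

print(last_explicit.equals(last_negative), last_negative.equals(last_tail))
print(first_slice.equals(first_head))
```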

### Exercise: Compute values using fraction of sample

Compute the correlation of momentum portfolios 1, 5, and 10 in the first half
of the sample and in the second half.

In [23]:
sub = momentum[["mom_01", "mom_05", "mom_10"]]
nobs = sub.shape[0]

# Must use floor division (//) so that the slice bound is an int

first = sub.iloc[0 : nobs // 2]
# Also correct since the start of a slice defaults to 0
first = sub.iloc[: nobs // 2]

second = sub.iloc[nobs // 2 : nobs]
# Also correct since the end of a slice defaults to nobs
second = sub.iloc[nobs // 2 :]


first.corr()

          mom_01    mom_05    mom_10
mom_01       1.0  0.757825  0.445025
mom_05  0.757825       1.0  0.686656
mom_10  0.445025  0.686656       1.0


In [24]:
second.corr()

          mom_01    mom_05    mom_10
mom_01       1.0  0.563911   0.34112
mom_05  0.563911       1.0  0.561871
mom_10   0.34112  0.561871       1.0
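
A small sketch of why floor division matters when splitting a sample, using a toy frame with an odd number of rows (the frame and its columns are made up for illustration). With `//` the two halves cover every row exactly once, while a float bound from `/` is rejected by `iloc`.

```python
import numpy as np
import pandas as pd

# Toy data with an odd number of observations
sub = pd.DataFrame(np.arange(14.0).reshape((7, 2)), columns=["a", "b"])
nobs = sub.shape[0]

# Floor division gives an integer split point
first = sub.iloc[: nobs // 2]
second = sub.iloc[nobs // 2 :]
print(len(first), len(second))

# True division gives 3.5, which is not a valid positional slice bound
try:
    sub.iloc[: nobs / 2]
    float_slice_failed = False
except TypeError:
    float_slice_failed = True
print(float_slice_failed)
```

When `nobs` is odd, the extra observation falls into the second half under this split.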
