# Accessing Elements

This lesson covers:

* Accessing specific elements in NumPy arrays
* Accessing specific elements in Pandas Series and DataFrames

Accessing elements in an array or a DataFrame is a common task. To begin this lesson, clear the
workspace and set up some vectors and a $5\times5$ array. These vectors and the matrix make it easy
to determine which elements a command selects.

In [1]:
import numpy as np
import pandas as pd

x = np.arange(25).reshape((5,5))  
y = np.arange(5)
# The -1 tells numpy to automatically compute the size of
# the dimension using the remaining elements, in this case, 5
z = np.arange(5).reshape((-1, 1))

x_df = pd.DataFrame(x)
x_named = pd.DataFrame(x, index=['r0','r1','r2','r3','r4'],
                       columns=['c0','c1','c2','c3','c4'])
y_s = pd.Series(y)
y_named = pd.Series(y, index=['r0','r1','r2','r3','r4'])

print(f'x = {x}')
print(f'y = {y}')
print(f'z = {z}')

print()
print(f'x_df = \n{x_df}')
print(f'y_s = \n{y_s}')

print()
print(f'x_named = \n{x_named}')
print(f'y_named = \n{y_named}')

x = [[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
y = [0 1 2 3 4]
z = [[0]
 [1]
 [2]
 [3]
 [4]]

x_df = 
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
y_s = 
0    0
1    1
2    2
3    3
4    4
dtype: int32

x_named = 
    c0  c1  c2  c3  c4
r0   0   1   2   3   4
r1   5   6   7   8   9
r2  10  11  12  13  14
r3  15  16  17  18  19
r4  20  21  22  23  24
y_named = 
r0    0
r1    1
r2    2
r3    3
r4    4
dtype: int32



## Zero-based indexing
Python indexing is zero-based: the first element has position `0`, the second has position `1`,
and so on, until the last element, which has position `n-1` in an array that contains `n` elements
in total.
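
As a minimal sketch of the positions described above, the snippet below uses a small array `a` that is made up for this illustration and is not part of the lesson's data. It also shows that negative positions count from the end, so `-1` refers to the same element as `n-1`.

```python
import numpy as np

# Hypothetical array used only for this illustration
a = np.array([10, 20, 30, 40, 50])
n = a.shape[0]

first = a[0]        # the first element is at position 0
last = a[n - 1]     # the last element is at position n-1
also_last = a[-1]   # negative positions count from the end

print(first, last, also_last)
```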

## Problem: Picking an Element out of a Matrix
1. Select the third element of all three, x, y, and z. 
2. Select the 11$^{\text{th}}$ element of x.
3. Using double index notation, select the (0,2) and the (2,0) element of x.

**Issues to ponder**

* Which index is rows and which index is columns?
* Does NumPy count across first then down or down first then across? 

In [2]:
print(x.flat[2])
print(y.flat[2])
print(z.flat[2])

2
2
2


In [3]:
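# Incorrect: a single index selects along the first axis,
# so x[2] is the entire third row and z[2] is a 1-element array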
print(x[2])
print(y[2])
print(z[2])


[10 11 12 13 14]
2
[2]


In [4]:
print(x.flat[10])  # 11th element is position 10

10


In [5]:
print(x[0, 2])
print(x[2, 0])

2
10
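
The results above suggest that the first index is the row, the second is the column, and flat positions count across each row before moving down. The sketch below checks this with `np.unravel_index`, which converts a flat position into a (row, column) pair. The names `row`, `col`, and `flat_pos` are introduced here only for illustration.

```python
import numpy as np

x = np.arange(25).reshape((5, 5))

# Convert flat position 10 into a (row, column) pair for a 5x5 array
row, col = np.unravel_index(10, x.shape)
print(row, col)

# The flat view and double-index notation reach the same element
same = x.flat[10] == x[row, col]
print(same)

# Going the other way: with 5 columns, element (r, c) sits at flat position r * 5 + c
r, c = 0, 2
flat_pos = r * x.shape[1] + c
print(flat_pos, x.flat[flat_pos], x[r, c])
```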


## Problem: Selecting Entire Rows
1. Select the 2nd row of x using the colon (:) operator.
2. Select the 2nd element of z and y using the same syntax.

**Issues to ponder**

* What happens to the output in each case? 


In [6]:
print(x[1, :])
print(x[1])
print(y[1:2])
print(z[1:2, :])
print(z[1:2])

[5 6 7 8 9]


[5 6 7 8 9]
[1]
[[1]]
[[1]]
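
One way to see what happens to the output in each case is to compare shapes. In this sketch, a scalar index drops the dimension it selects from, while a slice keeps that dimension even when it selects a single row. The variable names `row_scalar` and `row_slice` are illustrative only.

```python
import numpy as np

x = np.arange(25).reshape((5, 5))
y = np.arange(5)
z = np.arange(5).reshape((-1, 1))

row_scalar = x[1, :]    # scalar row index: the row dimension is dropped
row_slice = x[1:2, :]   # slice row index: the row dimension is kept

print(row_scalar.shape)  # 1-dimensional
print(row_slice.shape)   # 2-dimensional with a single row
print(y[1:2].shape)      # still 1-dimensional, with one element
print(z[1].shape)        # scalar index on a column vector drops a dimension
print(z[1:2].shape)      # slice keeps both dimensions
```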


## Problem: Selecting Entire Columns
Select the 2nd column of x using the colon (:) operator. 

In [7]:
print(x[:, 1])

[ 1  6 11 16 21]


In [8]:
print(x[:, [1]])
print(x[:, 1:2])


[[ 1]
 [ 6]
 [11]
 [16]
 [21]]
[[ 1]
 [ 6]
 [11]
 [16]
 [21]]
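
The three column selections above differ in more than how they print. The sketch below compares their shapes and uses `np.shares_memory` to check which results are views of `x`. Slicing returns a view, while selecting with a list returns a copy. The variable names are illustrative only.

```python
import numpy as np

x = np.arange(25).reshape((5, 5))

col_1d = x[:, 1]        # scalar column index: 1-dimensional result
col_list = x[:, [1]]    # list column index: 2-dimensional result (a copy)
col_slice = x[:, 1:2]   # slice column index: 2-dimensional result (a view)

print(col_1d.shape, col_list.shape, col_slice.shape)

# A slice shares memory with x; a list-based selection does not
slice_is_view = np.shares_memory(x, col_slice)
list_is_view = np.shares_memory(x, col_list)
print(slice_is_view, list_is_view)
```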


## Problem: Selecting Specific Rows or Columns
1. Select the 2nd and 3rd columns of x using the colon (:) operator.
2. Select the 2nd and 4th rows of x. 
3. Combine these to select columns 2 and 3 and rows 2 and 4.

In [9]:
print(x[:, 1:3])

[[ 1  2]
 [ 6  7]
 [11 12]
 [16 17]
 [21 22]]




In [10]:
print(x[[1, 3], :])
print(x[1:4:2, :])

[[ 5  6  7  8  9]
 [15 16 17 18 19]]


[[ 5  6  7  8  9]
 [15 16 17 18 19]]


In [11]:
print(x[1:4:2, 1:3])

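# Wrong: two lists are paired element by element,
# so this selects only elements (1, 1) and (3, 2)
print(x[[1, 3],[1, 2]])

# Right: a list for the rows and a slice for the columns selects the block
print(x[[1,3], 1:3])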

[[ 6  7]
 [16 17]]
[ 6 17]
[[ 6  7]
 [16 17]]
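
The second result above has only two elements because NumPy pairs the two lists position by position instead of forming a block. A short sketch of this pairing, using illustrative names `paired` and `by_hand`:

```python
import numpy as np

x = np.arange(25).reshape((5, 5))

# Two list selectors are matched up: (1, 1) and (3, 2)
paired = x[[1, 3], [1, 2]]

# The same two elements picked one at a time
by_hand = np.array([x[1, 1], x[3, 2]])

print(paired)
print(np.array_equal(paired, by_hand))
```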


## Problem: Use `ix_` to select arbitrary rows and columns
Use `ix_` to select the 2nd and 4th rows and 2nd and 3rd columns of `x`.

In [12]:
# Must use ix_ when both selectors are "fancy" to get blocks
x[np.ix_([1, 3],[1, 2])]

array([[ 6,  7],
       [16, 17]])

In [13]:
# Also correct, but hard to get right
x[[[1,1],[3,3]],[[1,2],[1,2]]]

array([[ 6,  7],
       [16, 17]])
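
To see why `ix_` produces a block, it can help to inspect what it returns. In this sketch, `np.ix_` gives a column-shaped row selector and a row-shaped column selector, and broadcasting the two together yields the same index arrays as the hand-written version above. The names `rows`, `cols`, `block`, and `manual` are illustrative only.

```python
import numpy as np

x = np.arange(25).reshape((5, 5))

# ix_ returns one index array per dimension, shaped so they broadcast to a grid
rows, cols = np.ix_([1, 3], [1, 2])
print(rows.shape, cols.shape)

block = x[rows, cols]

# The fully written-out index arrays select the same block
manual = x[[[1, 1], [3, 3]], [[1, 2], [1, 2]]]

print(block)
print(np.array_equal(block, manual))
```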

## Problem: Numeric indexing Series and DataFrame
Repeat the previous questions on `y_s` and `x_df` using `.iloc`.   

In [14]:
print(x_df.iloc[1, :])
print(x_df.iloc[1])
print(y_s.iloc[1:2])

print(x_df.iloc[:, 1:3])
print(x_df.iloc[[1, 3], :])

0    5
1    6
2    7
3    8
4    9
Name: 1, dtype: int32
0    5
1    6
2    7
3    8
4    9
Name: 1, dtype: int32
1    1
dtype: int32
    1   2
0   1   2
1   6   7
2  11  12
3  16  17
4  21  22
    0   1   2   3   4
1   5   6   7   8   9
3  15  16  17  18  19
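
The outputs above mix Series and DataFrames. As a rough sketch of the pattern, a scalar position in `.iloc` returns a Series, while a list or slice returns a DataFrame, even when it selects a single row. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

x_df = pd.DataFrame(np.arange(25).reshape((5, 5)))

as_series = x_df.iloc[1]          # scalar position: a Series
as_frame_list = x_df.iloc[[1]]    # list with one position: a DataFrame
as_frame_slice = x_df.iloc[1:2]   # slice: a DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame_list).__name__, as_frame_list.shape)
print(type(as_frame_slice).__name__, as_frame_slice.shape)
```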


## Problem: Selecting by Name in Series and DataFrames
Using `x_named` and `y_named`:

1. Select the (0,2) and the (2,0) element of `x_named`.
2. Select the 2nd row of `x_named` using `.loc`.
3. Select the 2nd column of `x_named` using `.loc`.
4. Select the 2nd element of `y_named` using both `[]` and `.loc`.
5. Select the 2nd and 4th rows and 1st and 3rd columns of `x_named`.

In [15]:
print(x_named.loc['r0', 'c2'])
print(x_named.loc['r2', 'c0'])

2


10


In [16]:
x_named.loc['r1']

c0    5
c1    6
c2    7
c3    8
c4    9
Name: r1, dtype: int32

In [17]:
x_named['c1']

r0     1
r1     6
r2    11
r3    16
r4    21
Name: c1, dtype: int32

In [18]:
x_named.loc[:, 'c1']

r0     1
r1     6
r2    11
r3    16
r4    21
Name: c1, dtype: int32

In [19]:
print(y_named['r1'])
y_named.loc['r1']

1


1
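
One difference between selecting by name and selecting by position is worth a quick check when slicing: a label slice in `.loc` includes its end point, while a position slice in `.iloc` excludes it, as in NumPy. A minimal sketch using `y_named` from the setup, with illustrative result names:

```python
import numpy as np
import pandas as pd

y_named = pd.Series(np.arange(5), index=['r0', 'r1', 'r2', 'r3', 'r4'])

# Label slice: both 'r1' and 'r3' are included
by_label = y_named.loc['r1':'r3']

# Position slice: position 3 is excluded
by_position = y_named.iloc[1:3]

print(by_label)
print(by_position)
```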

In [20]:
x_named.loc[['r1','r3'],['c0','c2']]

    c0  c2
r1   5   7
r3  15  17


In [21]:
# Different behavior from NumPy, often simpler
x_named.iloc[[1,3],[0,2]]

    c0  c2
r1   5   7
r3  15  17
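
The difference from NumPy can be sketched side by side. With two lists, `.iloc` selects the block of rows and columns, which matches `np.ix_` on the underlying array, whereas plain NumPy indexing with two lists pairs them up. The result names below are illustrative only.

```python
import numpy as np
import pandas as pd

x = np.arange(25).reshape((5, 5))
x_named = pd.DataFrame(x, index=['r0', 'r1', 'r2', 'r3', 'r4'],
                       columns=['c0', 'c1', 'c2', 'c3', 'c4'])

block_df = x_named.iloc[[1, 3], [0, 2]]   # pandas: a 2x2 block
block_np = x[np.ix_([1, 3], [0, 2])]      # NumPy needs ix_ for the same block
pairs_np = x[[1, 3], [0, 2]]              # NumPy without ix_: elements (1,0) and (3,2)

print(block_df)
print(block_np)
print(pairs_np)
```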


## Problem: Selecting Data by Date
Load the data in momentum.csv.

In [22]:
# Setup: Load the momentum data

import pandas as pd

momentum = pd.read_csv('data/momentum.csv', index_col='date', parse_dates=True)
momentum.head()

            mom_01  mom_02  mom_03  mom_04  mom_05  mom_06  mom_07  mom_08  mom_09  mom_10
date
2016-01-04    0.67   -0.03   -0.93   -1.11   -1.47   -1.66   -1.40   -2.08   -1.71   -2.67
2016-01-05   -0.36    0.20   -0.37    0.28    0.16    0.18   -0.22    0.25    0.29    0.13
2016-01-06   -4.97   -2.33   -2.60   -1.16   -1.70   -1.45   -1.15   -1.46   -1.14   -0.45
2016-01-07   -4.91   -1.91   -3.03   -1.87   -2.31   -2.30   -2.70   -2.31   -2.36   -2.66
2016-01-08   -0.40   -1.26   -0.98   -1.26   -1.13   -1.02   -0.96   -1.42   -0.94   -1.32


1. Select returns on February 16, 2016.
2. Select returns in March 2016.
3. Select returns between May 1, 2016, and June 15, 2016.

In [23]:
momentum.loc['2016-2-16']


mom_01    4.94
mom_02    2.46
mom_03    2.59
mom_04    2.17
mom_05    2.24
mom_06    1.83
mom_07    1.57
mom_08    1.56
mom_09    1.35
mom_10    1.72
Name: 2016-02-16 00:00:00, dtype: float64

In [24]:
march = momentum.loc['2016-3']
# Use head to show only the top 5 rows
march.head()

            mom_01  mom_02  mom_03  mom_04  mom_05  mom_06  mom_07  mom_08  mom_09  mom_10
date
2016-03-01    1.47    2.37    2.38    2.98    2.61    2.68    2.31    2.12    1.42    2.54
2016-03-02    5.76    3.26    1.53    0.40    0.58    0.56    0.29    0.21    0.39    0.14
2016-03-03    4.09    2.41    1.36    0.99    0.71    0.50    0.38    0.41    0.33   -0.29
2016-03-04    2.69    0.82    1.04    0.54    0.73    0.40    0.09    0.47    0.13    0.02
2016-03-07    3.04    2.27    1.46    0.67    0.63    0.87    0.37   -0.04   -0.17   -1.24


In [25]:
block = momentum.loc['2016-5-1':'2016-6-15']
block.tail()

            mom_01  mom_02  mom_03  mom_04  mom_05  mom_06  mom_07  mom_08  mom_09  mom_10
date
2016-06-09   -1.72   -1.38   -0.70   -0.74   -0.73   -0.55   -0.32   -0.16    0.37    0.33
2016-06-10   -3.72   -2.63   -1.91   -1.98   -1.59   -0.95   -0.80   -0.72   -0.39   -0.79
2016-06-13   -1.08    0.14   -1.24   -1.25   -0.97   -0.72   -0.90   -0.48   -0.91   -0.67
2016-06-14   -0.33   -0.70   -0.42   -1.34   -0.85   -0.15   -0.07   -0.02    0.13    0.28
2016-06-15    0.65    0.58   -0.14    0.27   -0.07   -0.06   -0.44   -0.25   -0.20   -0.26
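
The same three date selections can be tried without `momentum.csv`. The sketch below builds a synthetic stand-in, assuming only that the index is a `DatetimeIndex` of business days. The frame `fake`, its columns `a` and `b`, and the random values are made up for illustration and are not the momentum data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (not the momentum data): business days in 2016
dates = pd.date_range('2016-01-04', '2016-12-30', freq='B')
rng = np.random.default_rng(0)
fake = pd.DataFrame(rng.standard_normal((len(dates), 2)),
                    index=dates, columns=['a', 'b'])

# A full date that matches one row returns that row as a Series
one_day = fake.loc['2016-2-16']

# A year-month string selects every row in that month
march = fake.loc['2016-3']

# A date slice includes both end points when they are present;
# May 1, 2016 is not in the index, so the block starts at the next available date
block = fake.loc['2016-5-1':'2016-6-15']

print(one_day.name)
print(march.index.min(), march.index.max())
print(block.index[0], block.index[-1])
```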
