## Pandas Indexing

This example shows how to index a small DataFrame by *column* or by *row*.

In [3]:
import numpy as np
import pandas as pd
array = np.random.randn(4, 2)  # 4 rows x 2 columns of standard-normal samples (unseeded, so values vary per run)
df = pd.DataFrame(array)
df

,0,1
0,-1.717125,-0.064019
1,-0.191125,0.197177
2,0.84516,-1.28696
3,1.031942,1.598596


Specify **column** names when creating the DataFrame:

In [4]:
columns = ['colA', 'colB']
df = pd.DataFrame(array, columns=columns)
df

,colA,colB
0,-1.717125,-0.064019
1,-0.191125,0.197177
2,0.84516,-1.28696
3,1.031942,1.598596


Specify **index** (row) labels:

In [5]:
index = ['a', 'b', 'c', 'd']
df = pd.DataFrame(array, columns=columns, index=index)
df

,colA,colB
a,-1.717125,-0.064019
b,-0.191125,0.197177
c,0.84516,-1.28696
d,1.031942,1.598596


## Indexing by Columns
Plain bracket indexing with a column label (`df['colA']`) selects that column and returns it as a Series.

In [6]:
df['colA']

a   -1.717125
b   -0.191125
c    0.845160
d    1.031942
Name: colA, dtype: float64
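
As a minimal sketch of column selection, the snippet below contrasts passing a single label with passing a list of labels. The frame here is a hypothetical stand-in with fixed values (the original uses random data), so only the column and index names match the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values; only the labels mirror the example above.
df = pd.DataFrame(np.arange(8.0).reshape(4, 2),
                  columns=['colA', 'colB'], index=['a', 'b', 'c', 'd'])

single = df['colA']    # a single label returns a Series
subset = df[['colA']]  # a list of labels returns a DataFrame

print(type(single).__name__)
print(type(subset).__name__)
```

Passing a list keeps the result two-dimensional, which can be convenient when later steps expect a DataFrame.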

## Indexing by Rows

1. **loc** attribute - access rows via *labels*
  * can take a single label or a list of labels
2. **iloc** attribute - access rows via *integer positions*
3. **slicing**
  * slice by *labels*
  * slice by *integer position*
  * **Note**: slicing by *labels* includes the end label of the slice, whereas slicing by *integer position* excludes the end position.

In [7]:
df.loc['a']

colA   -1.717125
colB   -0.064019
Name: a, dtype: float64

In [9]:
indices=['a','c']
df.loc[indices]

,colA,colB
a,-1.717125,-0.064019
c,0.84516,-1.28696


In [8]:
df.iloc[0]

colA   -1.717125
colB   -0.064019
Name: a, dtype: float64
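
The two accessors can point at the same row, as the outputs above suggest. The sketch below checks this on a hypothetical frame with fixed values and also shows that `iloc`, like `loc`, accepts a list.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values; only the labels mirror the example above.
df = pd.DataFrame(np.arange(8.0).reshape(4, 2),
                  columns=['colA', 'colB'], index=['a', 'b', 'c', 'd'])

row_by_label = df.loc['a']     # row whose index label is 'a'
row_by_position = df.iloc[0]   # row at integer position 0

same_row = row_by_label.equals(row_by_position)
rows = df.iloc[[0, 2]]         # a list of positions returns a DataFrame

print(same_row)
print(list(rows.index))
```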

Row slicing behaves differently for *labels* and for *integer positions*:

* **Label slicing** includes the end label.
* **Integer slicing** excludes the end position.

In [10]:
df['a':'c']

,colA,colB
a,-1.717125,-0.064019
b,-0.191125,0.197177
c,0.84516,-1.28696


In [11]:
df[0:2]

,colA,colB
a,-1.717125,-0.064019
b,-0.191125,0.197177
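
The same slices can be written with the explicit accessors, which makes it clearer whether labels or positions are intended. This is a sketch on a hypothetical frame with fixed values, not the random data used above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with fixed values; only the labels mirror the example above.
df = pd.DataFrame(np.arange(8.0).reshape(4, 2),
                  columns=['colA', 'colB'], index=['a', 'b', 'c', 'd'])

by_label = df.loc['a':'c']    # label slice: the end label 'c' is included
by_position = df.iloc[0:2]    # positional slice: position 2 is excluded

print(list(by_label.index))     # ['a', 'b', 'c']
print(list(by_position.index))  # ['a', 'b']
```

Spelling out `loc` or `iloc` avoids relying on how plain `df[...]` interprets a slice, which depends on whether the bounds are labels or integers.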
