# Indexing, Selection, And Filtering

Series indexing (`obj[...]`) works analogously to NumPy array indexing, except that you can use the Series’s index values instead of only integers. Here are some examples of this:

In [1]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

In [22]:
obj = Series(np.arange(4), index=['a', 'b', 'c', 'd'])

obj

a    0
b    1
c    2
d    3
dtype: int32

In [4]:
obj['b'], obj[1]

(1, 1)

In [6]:
obj[2:4], obj[['b', 'a', 'd']]

(c    2
 d    3
 dtype: int32,
 b    1
 a    0
 d    3
 dtype: int32)

In [7]:
obj[[1, 3]]

b    1
d    3
dtype: int32

In [8]:
obj[obj < 2]

a    0
b    1
dtype: int32
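In recent pandas releases (the deprecation landed around pandas 2.1), integer keys such as `obj[1]` or `obj[[1, 3]]` on a Series with a non-integer index still work but emit a `FutureWarning`, because `[]` is meant to be label-based. The following is a minimal sketch of the explicit alternatives, assuming the same four-element `obj`: `.loc` for labels and `.iloc` for integer positions.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

# Label-based access: .loc looks up the index values
by_label = obj.loc['b']        # 1
# Position-based access: .iloc uses 0-based integer positions (explicit form of obj[1])
by_position = obj.iloc[1]      # 1

# A list of positions, the explicit form of obj[[1, 3]]
picked = obj.iloc[[1, 3]]

# A boolean mask behaves the same with [] and with .loc
small = obj.loc[obj < 2]

print(by_label, by_position)
print(picked)
print(small)
```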

Slicing with labels behaves differently than normal Python slicing in that the endpoint is inclusive:


In [10]:
obj['b':'d']

b    1
c    2
d    3
dtype: int32
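The difference is easiest to see side by side. This sketch, using the same `obj`, compares a label slice with a positional slice that starts at the same element: the label slice keeps `'d'`, while the positional slice stops before position 3, just like a Python list slice.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

# Label slice: both endpoints 'b' and 'd' are included
label_slice = obj.loc['b':'d']
# Positional slice: the stop position 3 is excluded
position_slice = obj.iloc[1:3]

print(label_slice.index.tolist())     # ['b', 'c', 'd']
print(position_slice.index.tolist())  # ['b', 'c']
```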

Setting values using these methods works just as you would expect:

In [14]:
obj['b':'d'] = [4, -2, 1]

obj

a    0
b    4
c   -2
d    1
dtype: int32

In [16]:
# or assign a single scalar, which is broadcast to every label in the slice
obj['b':'d'] = 2

obj

a    0
b    2
c    2
d    2
dtype: int32
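A list on the right-hand side has to match the number of selected labels, while a scalar is broadcast. The sketch below, assuming a freshly built `obj`, also uses a boolean mask as the assignment target and shows that a list of the wrong length is rejected with a `ValueError` rather than being silently truncated.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

# Three labels selected, so the list must have three values
obj.loc['b':'d'] = [4, -2, 1]

# A boolean mask chooses where to write; the scalar 0 is broadcast
obj.loc[obj < 0] = 0

# Two values for a three-label slice: pandas refuses the assignment
rejected = False
try:
    obj.loc['b':'d'] = [1, 2]
except ValueError:
    rejected = True

print(obj.tolist())  # [0, 4, 0, 1]
print(rejected)      # True
```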

As you’ve seen before, indexing into a DataFrame with `[]` retrieves one or more columns, using either a single label or a sequence of labels:


In [18]:
data = DataFrame(np.arange(16).reshape((4, 4)), index=['Loralai', 'Quetta', 'Duki', 'Pshin'], columns=['a', 'b', 'c', 'd'])

data

|         | a  | b  | c  | d  |
|---------|----|----|----|----|
| Loralai | 0  | 1  | 2  | 3  |
| Quetta  | 4  | 5  | 6  | 7  |
| Duki    | 8  | 9  | 10 | 11 |
| Pshin   | 12 | 13 | 14 | 15 |


In [20]:
data['b'], data[['a', 'c']]

(Loralai     1
 Quetta      5
 Duki        9
 Pshin      13
 Name: b, dtype: int32,
           a   c
 Loralai   0   2
 Quetta    4   6
 Duki      8  10
 Pshin    12  14)
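The type of the result depends on the key: a single label returns a Series, while any list of labels, even a one-element list, returns a DataFrame. A short check, rebuilding the same `data` frame:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Loralai', 'Quetta', 'Duki', 'Pshin'],
                    columns=['a', 'b', 'c', 'd'])

col = data['b']              # single label -> Series named 'b'
cols = data[['a', 'c']]      # list of labels -> DataFrame
one_col_frame = data[['b']]  # one-element list -> still a DataFrame

print(type(col).__name__, col.name)                        # Series b
print(type(cols).__name__, cols.columns.tolist())          # DataFrame ['a', 'c']
print(type(one_col_frame).__name__, one_col_frame.shape)   # DataFrame (4, 1)
```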

Indexing like this has a few special cases. First, it selects rows when you pass a slice or a boolean array:

In [21]:
data[:2]

|         | a | b | c | d |
|---------|---|---|---|---|
| Loralai | 0 | 1 | 2 | 3 |
| Quetta  | 4 | 5 | 6 | 7 |


In [32]:
data[data['c'] > 8]

|       | a  | b  | c  | d  |
|-------|----|----|----|----|
| Duki  | 8  | 9  | 10 | 11 |
| Pshin | 12 | 13 | 14 | 15 |
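Both row shortcuts have explicit equivalents: an integer slice inside `[]` selects rows by position like `.iloc`, and a boolean Series inside `[]` filters rows like `.loc` with the same mask. A minimal sketch, assuming the unmodified `data` frame from above:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Loralai', 'Quetta', 'Duki', 'Pshin'],
                    columns=['a', 'b', 'c', 'd'])

# Integer slice in []: rows by position, same result as .iloc
first_two = data[:2]

# Boolean Series in []: rows where the condition holds, same result as .loc
mask = data['c'] > 8
filtered = data[mask]

print(first_two.equals(data.iloc[:2]))  # True
print(filtered.equals(data.loc[mask]))  # True
print(filtered.index.tolist())          # ['Duki', 'Pshin']
```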


This might seem inconsistent to some readers, but this syntax arose out of practicality and nothing more. Another use case is indexing with a boolean DataFrame, such as one produced by a scalar comparison:

In [34]:
data < 9

|         | a     | b     | c     | d     |
|---------|-------|-------|-------|-------|
| Loralai | True  | True  | True  | True  |
| Quetta  | True  | True  | True  | True  |
| Duki    | True  | False | False | False |
| Pshin   | False | False | False | False |


In [36]:
data[data < 5] = 0

data

|         | a  | b  | c  | d  |
|---------|----|----|----|----|
| Loralai | 0  | 0  | 0  | 0  |
| Quetta  | 0  | 5  | 6  | 7  |
| Duki    | 8  | 9  | 10 | 11 |
| Pshin   | 12 | 13 | 14 | 15 |
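Assigning through a boolean DataFrame modifies `data` in place. If you would rather keep the original intact, `DataFrame.mask(cond, other)` returns a new frame with `other` substituted wherever `cond` is True (`DataFrame.where` is the inverse). The sketch below compares the two approaches; the helper `make_data` is hypothetical and only rebuilds the example frame.

```python
import numpy as np
import pandas as pd

def make_data():
    # Hypothetical helper: rebuilds the 4x4 example frame
    return pd.DataFrame(np.arange(16).reshape((4, 4)),
                        index=['Loralai', 'Quetta', 'Duki', 'Pshin'],
                        columns=['a', 'b', 'c', 'd'])

# In place: write 0 wherever the boolean DataFrame is True
in_place = make_data()
in_place[in_place < 5] = 0

# Non-mutating: mask returns a new frame and leaves `original` untouched
original = make_data()
masked = original.mask(original < 5, 0)

print(np.array_equal(in_place.to_numpy(), masked.to_numpy()))  # True
print(original.iloc[1, 0])  # 4, the original value is preserved
```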


> This is intended to make DataFrame syntactically more like an ndarray in this case.

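For indexing on the rows of a DataFrame, I introduce the special indexing attributes `loc` and `iloc`: `loc` selects by axis labels and `iloc` selects by integer position. They let you select a subset of the rows and columns from a DataFrame with NumPy-like notation. As I mentioned earlier, `loc` is also a less verbose way to do reindexing, as long as every requested label exists (unlike `reindex`, `loc` raises a `KeyError` for missing labels instead of filling in NaN):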


In [39]:
data.loc['Duki', ['a', 'b', 'c']]

a     8
b     9
c    10
Name: Duki, dtype: int32

In [44]:
data.loc[['Quetta', 'Pshin'], ['d', 'a', 'b']]

|        | d  | a  | b  |
|--------|----|----|----|
| Quetta | 7  | 0  | 5  |
| Pshin  | 15 | 12 | 13 |
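The "less verbose reindexing" comparison holds only while all labels exist. This sketch, assuming a freshly built `data` frame and a deliberately missing label `'Karachi'` (hypothetical, chosen only because it is not in the index), shows where `loc` and `reindex` agree and where they diverge; the `KeyError` behavior applies to pandas 1.0 and later.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Loralai', 'Quetta', 'Duki', 'Pshin'],
                    columns=['a', 'b', 'c', 'd'])

# All labels present: .loc and reindex select the same sub-frame
by_loc = data.loc[['Quetta', 'Pshin'], ['d', 'a', 'b']]
by_reindex = data.reindex(index=['Quetta', 'Pshin'], columns=['d', 'a', 'b'])
print(by_loc.equals(by_reindex))  # True

# Missing label: reindex inserts a row of NaN...
padded = data.reindex(['Quetta', 'Karachi'])
print(padded.loc['Karachi'].isna().all())  # True

# ...whereas .loc refuses the lookup
loc_failed = False
try:
    data.loc[['Quetta', 'Karachi']]
except KeyError:
    loc_failed = True
print(loc_failed)  # True
```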


In [49]:
data.iloc[[2, 3]]

|       | a  | b  | c  | d  |
|-------|----|----|----|----|
| Duki  | 8  | 9  | 10 | 11 |
| Pshin | 12 | 13 | 14 | 15 |


In [50]:
data.iloc[3]

a    12
b    13
c    14
d    15
Name: Pshin, dtype: int32
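`iloc` accepts positions on both axes at once, and its slices follow Python's exclusive-stop rule, whereas `loc` slices include the end label. A short comparison on a freshly built `data` frame:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Loralai', 'Quetta', 'Duki', 'Pshin'],
                    columns=['a', 'b', 'c', 'd'])

# Rows by position: the same rows as data.loc[['Duki', 'Pshin']]
rows = data.iloc[[2, 3]]
print(rows.equals(data.loc[['Duki', 'Pshin']]))  # True

# Row position 2 ('Duki') and column position 3 ('d')
cell = data.iloc[2, 3]
print(cell)  # 11

# Positional slice stops before 3; label slice includes 'Duki'
print(data.iloc[1:3].index.tolist())             # ['Quetta', 'Duki']
print(data.loc['Quetta':'Duki'].index.tolist())  # ['Quetta', 'Duki']
```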

In [54]:
data.loc[:'Duki', 'c']

Loralai     0
Quetta      6
Duki       10
Name: c, dtype: int32

In [67]:
data.loc[data.c > 5, :'d']

|        | a  | b  | c  | d  |
|--------|----|----|----|----|
| Quetta | 0  | 5  | 6  | 7  |
| Duki   | 8  | 9  | 10 | 11 |
| Pshin  | 12 | 13 | 14 | 15 |
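The same row-mask-plus-column form works for assignment. Writing through a single `.loc` call modifies `data` directly, whereas a chained form such as `data[data['c'] > 5]['d'] = -1` may write to a temporary copy and leave `data` unchanged (pandas warns about this). The sketch also checks that attribute access like `data.c` is shorthand for `data['c']`, which works only when the column name is a valid Python identifier.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Loralai', 'Quetta', 'Duki', 'Pshin'],
                    columns=['a', 'b', 'c', 'd'])

# Attribute access and bracket access return the same column
same_column = data.c.equals(data['c'])
print(same_column)  # True

# Rows where c > 5, column 'd', assigned in one .loc call
data.loc[data['c'] > 5, 'd'] = -1
print(data['d'].tolist())  # [3, -1, -1, -1]
```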


> ### Note
> Use `loc` for label-based indexing and `iloc` for integer position-based indexing instead of `ix`, which was deprecated in pandas 0.20 and removed in pandas 1.0.

![Indexing options with DataFrame](../../Pictures/Indexing%20options%20with%20DataFrame.png)
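Code written for `ix` sometimes mixed labels on one axis with positions on the other. A minimal sketch of two ways to make that conversion explicit, assuming the example `data` frame: turn positions into labels with `data.columns[...]` and use `loc`, or turn labels into positions with `Index.get_loc` / `Index.get_indexer` and use `iloc`.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Loralai', 'Quetta', 'Duki', 'Pshin'],
                    columns=['a', 'b', 'c', 'd'])

# Row by label, columns by position: convert positions to labels, then .loc
mixed = data.loc['Duki', data.columns[[0, 2]]]
print(mixed.tolist())  # [8, 10]

# Or convert labels to positions, then use .iloc for both axes
row_pos = data.index.get_loc('Duki')            # 2
col_pos = data.columns.get_indexer(['a', 'c'])  # positions 0 and 2
same = data.iloc[row_pos, col_pos]
print(same.tolist())  # [8, 10]
```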