In [2]:
import sys
print(sys.version)
import numpy as np
print(np.__version__)
import pandas as pd
print(pd.__version__)

3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
1.12.1
0.20.1


# loc and iloc

We've seen some examples of indexing, but we haven't really addressed the central problem: how do you select rows and columns out of a DataFrame?

You may have seen the `ix` indexer, which was one of the first pandas indexers.  Unfortunately, `ix` tended to confuse people, because it was hard to predict whether it would index by position or by label.  This is especially true when the index labels are themselves integers.  `ix` is deprecated as of pandas 0.20 (and removed entirely in pandas 1.0), so we urge you to stay away from it.

Instead of `ix`, remember two indexers: `loc` and `iloc`.

- `loc` always uses labels to index.
- `iloc` always uses integer position.
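
To make "always" concrete, here is a minimal sketch on a small made-up DataFrame (the name `toy` and its values are illustrative assumptions, not part of the Mice data). It suggests that each indexer rejects the other kind of key instead of guessing.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
toy = pd.DataFrame({'x': [10, 20, 30]}, index=['a', 'b', 'c'])

by_label = toy.loc['b', 'x']     # row label 'b', column label 'x'
by_position = toy.iloc[1, 0]     # second row, first column

# loc does not fall back to positions: 0 is not a label in this index.
try:
    toy.loc[0]
    loc_rejected = False
except KeyError:
    loc_rejected = True

# iloc does not fall back to labels: 'b' is not an integer position.
try:
    toy.iloc['b']
    iloc_rejected = False
except TypeError:
    iloc_rejected = True

print(by_label, by_position, loc_rejected, iloc_rejected)
```

Both lookups reach the same cell, and neither indexer silently switches between labels and positions.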

Here's our DataFrame from before:

In [3]:
np.random.seed(200)
Mice = pd.DataFrame( np.random.geometric(.2, size = (5,5)) , 
             columns = ['test_{}'.format(x) for x in range(5)],
             index = ['mouse_{}'.format(x) for x in range(5)])
Mice

         test_0  test_1  test_2  test_3  test_4
mouse_0      14       2       5       3       7
mouse_1       1       2      11       3      18
mouse_2      10      20      12       2       9
mouse_3       1       7       2       1      13
mouse_4       8       4      10       4       3


Say that you want to pull out the value in the middle (equal to 12).  You can do this in two ways.

In [4]:
Mice.loc['mouse_2', 'test_2']

12

In [5]:
Mice.iloc[2,2]

12

Notice that either way, the format is *rows*, then a comma, then *columns*.

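We can pull out multiple values by providing lists, slices, or boolean Series for the rows, the columns, or both.  Two details are worth noticing in the examples that follow: a label slice in `loc` includes its end label, while a position slice in `iloc` stops before its end position; and boolean Series work with `loc`, whereas `iloc` expects a plain boolean array.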

In [6]:
Mice.loc[['mouse_2','mouse_4'], 'test_2']

mouse_2    12
mouse_4    10
Name: test_2, dtype: int64

In [7]:
Mice.loc['mouse_2', 'test_1':'test_3']

test_1    20
test_2    12
test_3     2
Name: mouse_2, dtype: int64

In [8]:
Mice.iloc[1:3,1:3]

         test_1  test_2
mouse_1       2      11
mouse_2      20      12


In [9]:
Mice.loc[Mice.test_1>5,'test_3':'test_4']

         test_3  test_4
mouse_2       2       9
mouse_3       1      13
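
The slice and boolean examples above hide two differences between the indexers. The sketch below, on a made-up frame named `toy` (an illustrative assumption), shows how slice endpoints are treated and how a boolean mask can be passed to each indexer.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
toy = pd.DataFrame({'p': [1, 2, 3, 4], 'q': [5, 6, 7, 8]},
                   index=['a', 'b', 'c', 'd'])

label_slice = toy.loc['b':'c', 'p']   # label slice: includes both 'b' and 'c'
position_slice = toy.iloc[1:2, 0]     # position slice: stops before position 2

mask = toy['q'] > 6                   # boolean Series, indexed by label
via_loc = toy.loc[mask, 'p']          # loc accepts the boolean Series directly
via_iloc = toy.iloc[mask.values, 0]   # iloc is given the underlying boolean array

print(list(label_slice), list(position_slice))
print(list(via_loc), list(via_iloc))
```

The label slice returns two rows and the position slice returns one, even though both are written with "b to c" or "1 to 2" endpoints.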


In [11]:
Mice.iloc[:4,:]

         test_0  test_1  test_2  test_3  test_4
mouse_0      14       2       5       3       7
mouse_1       1       2      11       3      18
mouse_2      10      20      12       2       9
mouse_3       1       7       2       1      13
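
The cell above picks rows by position and keeps every column. Sometimes rows are easiest to give by position while columns are easiest to give by label. One possible approach, sketched on a made-up frame named `toy` (an illustrative assumption), is to translate one kind of key into the other so that a single indexer does the whole selection.

```python
import pandas as pd

# Hypothetical toy frame, used only for illustration.
toy = pd.DataFrame({'p': [1, 2, 3], 'q': [4, 5, 6]}, index=['a', 'b', 'c'])

# Option 1: translate row positions into labels, then use loc.
# This assumes the index labels are unique.
first_two_q = toy.loc[toy.index[:2], 'q']

# Option 2: translate the column label into a position, then use iloc.
q_position = toy.columns.get_loc('q')
first_two_q_again = toy.iloc[:2, q_position]

print(first_two_q.equals(first_two_q_again))
```

Either option keeps the selection in one step, which avoids chaining two indexing operations.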


In [12]:
first_four = Mice.iloc[:4,:]   # build the mask from the same rows it will filter
first_four[first_four.test_0 > 2]


         test_0  test_1  test_2  test_3  test_4
mouse_0      14       2       5       3       7
mouse_2      10      20      12       2       9


## A Note about Integer Indexes

Here's a tricky situation that you might have wondered about.

In [18]:
s = pd.Series( range(5) , index = range(5,0,-1) )
s

5    0
4    1
3    2
2    3
1    4
dtype: int64

We have a Series with an integer index, but the labels differ from the integer positions.  What would happen if we typed `s[1]`?  Should pandas interpret that as a label or a position?

In [21]:
s[1]

4

You can see that pandas treats the number as a label.  Unfortunately, to make things even more confusing, a slice is interpreted as positions:

In [32]:
s[1:3]

4    1
3    2
dtype: int64

You can try to memorize all the rules, but a good practice is to always specify exactly what you mean by writing `loc` or `iloc`.

In [29]:
s.loc[3:1]

3    2
2    3
1    4
dtype: int64
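
As a side-by-side sketch of that advice, the block below rebuilds the same Series `s` and asks for "1" and for a slice through both indexers. The variable names on the left are illustrative assumptions.

```python
import pandas as pd

# Same Series as above: labels 5, 4, 3, 2, 1 over values 0..4.
s = pd.Series(range(5), index=range(5, 0, -1))

label_one = s.loc[1]                 # the row labelled 1
position_one = s.iloc[1]             # the second row
label_slice = list(s.loc[3:1])       # labels 3 through 1, end label included
position_slice = list(s.iloc[1:3])   # positions 1 and 2, end position excluded

print(label_one, position_one)
print(label_slice, position_slice)
```

Written this way, the code states whether each key is a label or a position, so a reader does not need to recall the rules for bare square brackets.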

The same advice holds when you have an integer index on a DataFrame.  Be explicit.
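
A common way to end up with this situation is filtering a DataFrame that has the default integer index: the surviving rows keep their old labels, so labels and positions drift apart. The following is a minimal sketch with made-up data (the names `df`, `heavy`, and `weight` are illustrative assumptions).

```python
import pandas as pd

# Hypothetical data with the default integer labels 0..3.
df = pd.DataFrame({'weight': [21.5, 19.0, 23.1, 20.4]})

# Filtering keeps the original labels, so heavy is labelled 0, 2, 3.
heavy = df[df['weight'] > 20]

by_label = heavy.loc[2, 'weight']    # the row labelled 2
by_position = heavy.iloc[2, 0]       # the third remaining row (label 3)
has_label_1 = 1 in heavy.index       # label 1 was filtered out

print(by_label, by_position, has_label_1)
```

Here the key `2` refers to different rows depending on the indexer, which is why spelling out `loc` or `iloc` matters.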

Finally, don't worry if you forget some of these rules.  Try something, take a look at your output, and see if it's what you expect.