### Indexing and Selection

| Operation                     | Syntax         | Result    |
|-------------------------------|----------------|-----------|
| Select column                 | df[col]        | Series    |
| Select row by label           | df.loc[label]  | Series    |
| Select row by integer         | df.iloc[loc]   | Series    |
| Select rows                   | df[start:stop] | DataFrame |
| Select rows with boolean mask | df[mask]       | DataFrame |

Documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html
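As a quick check of the table above, the following sketch inspects the type each operation returns. The frame `demo_df` and its labels are hypothetical and not part of this notebook.

```python
import pandas as pd

# Hypothetical frame used only for this illustration
demo_df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

col = demo_df["x"]                   # select column           -> Series
row_label = demo_df.loc["b"]         # select row by label     -> Series
row_pos = demo_df.iloc[1]            # select row by position  -> Series
rows = demo_df[0:2]                  # slice rows              -> DataFrame
masked = demo_df[demo_df["x"] > 1]   # boolean mask            -> DataFrame

results = {"col": col, "row_label": row_label, "row_pos": row_pos,
           "rows": rows, "masked": masked}
for name, obj in results.items():
    print(name, type(obj).__name__)
```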

In [1]:
import pandas as pd
import numpy as np

In [33]:
produce_dict = {
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
}
produce_df = pd.DataFrame(produce_dict)
produce_df

|   | fruits    | veggies  |
|---|-----------|----------|
| 0 | apples    | potatoes |
| 1 | bananas   | onions   |
| 2 | pineapple | peppers  |
| 3 | berries   | carrots  |


##### select a column using a dictionary-like string key

In [7]:
produce_df['fruits']

0       apples
1      bananas
2    pineapple
3      berries
Name: fruits, dtype: object

##### select several columns with a list of strings (note: double square brackets)

In [34]:
produce_df[ ['fruits', 'veggies'] ]

|   | fruits    | veggies  |
|---|-----------|----------|
| 0 | apples    | potatoes |
| 1 | bananas   | onions   |
| 2 | pineapple | peppers  |
| 3 | berries   | carrots  |


##### select row using integer index

In [12]:
produce_df.iloc[2]

fruits     pineapple
veggies      peppers
Name: 2, dtype: object

##### select rows using integer slice

In [13]:
produce_df.iloc[0:2]

|   | fruits  | veggies  |
|---|---------|----------|
| 0 | apples  | potatoes |
| 1 | bananas | onions   |


In [16]:
produce_df.iloc[:-2]

|   | fruits  | veggies  |
|---|---------|----------|
| 0 | apples  | potatoes |
| 1 | bananas | onions   |
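The table at the top of this section also lists `df.loc[label]`, which is hard to tell apart from `iloc` when the index is the default 0..n-1. A minimal sketch, assuming a copy of the produce data with string row labels; the name `labeled_df` and the labels `w` to `z` are invented for illustration.

```python
import pandas as pd

# Hypothetical copy of the produce data with string row labels
labeled_df = pd.DataFrame(
    {"fruits": ["apples", "bananas", "pineapple", "berries"],
     "veggies": ["potatoes", "onions", "peppers", "carrots"]},
    index=["w", "x", "y", "z"],
)

by_label = labeled_df.loc["y"]      # row whose label is "y"
by_position = labeled_df.iloc[2]    # third row, counted from zero

# Label slices include the end label; position slices exclude the end position
label_slice = labeled_df.loc["w":"y"]   # rows w, x, y
pos_slice = labeled_df.iloc[0:2]        # rows w, x

print(label_slice)
print(pos_slice)
```

Both selections of a single row return the same Series here, but the two slices differ in length because `loc` treats the stop value as inclusive.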


##### + is overloaded as the concatenation operator for strings

The row Series is aligned with the column labels, then concatenated onto every row.

In [20]:
produce_df + produce_df.iloc[0]

|   | fruits          | veggies          |
|---|-----------------|------------------|
| 0 | applesapples    | potatoespotatoes |
| 1 | bananasapples   | onionspotatoes   |
| 2 | pineappleapples | pepperspotatoes  |
| 3 | berriesapples   | carrotspotatoes  |
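To make the two behaviors of `+` easier to see, here is a small sketch with hypothetical frames (`mixed_df`, `words_df`, and `suffix` are not part of this notebook). It assumes object-dtype string columns, as in the cells above.

```python
import pandas as pd

# One numeric column and one string column
mixed_df = pd.DataFrame({"count": [1, 2], "name": ["ab", "cd"]})
doubled = mixed_df + mixed_df
# count is added numerically (2, 4); name is concatenated ("abab", "cdcd")

# DataFrame + Series: the Series labels are matched to the column labels,
# and the operation is applied to every row
words_df = pd.DataFrame({"left": ["a", "b"], "right": ["c", "d"]})
suffix = pd.Series({"left": "-L", "right": "-R"})
row_wise = words_df + suffix

print(doubled)
print(row_wise)
```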


### Data alignment and arithmetic
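Arithmetic between DataFrame objects automatically aligns on both the columns and the index (row labels).

Note where NaN appears in the result: column D and rows 7 to 9 exist in only one of the two frames, so there is nothing to add them to.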

In [23]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
sum_df = df + df2
sum_df

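|   | A         | B         | C         | D   |
|---|-----------|-----------|-----------|-----|
| 0 | -0.013287 | 0.131575  | -2.051984 | NaN |
| 1 | 1.292049  | -0.856594 | -0.192517 | NaN |
| 2 | -0.752475 | -0.799066 | -1.253841 | NaN |
| 3 | 1.225071  | 0.596251  | -1.514304 | NaN |
| 4 | -2.444553 | 0.071712  | 3.708595  | NaN |
| 5 | 0.593777  | -0.815971 | 0.896709  | NaN |
| 6 | -0.677878 | -0.335991 | -0.93291  | NaN |
| 7 | NaN       | NaN       | NaN       | NaN |
| 8 | NaN       | NaN       | NaN       | NaN |
| 9 | NaN       | NaN       | NaN       | NaN |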
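Because the random values above change on every run, the same alignment behavior may be easier to follow with fixed numbers. The sketch below uses two hypothetical frames, `left` and `right`, and also shows `DataFrame.add` with `fill_value`, which is one way to avoid the NaN results when a value is missing on only one side.

```python
import pandas as pd

# Hypothetical frames: `right` lacks row 2 and column B
left = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"A": [100.0, 200.0]})

# Plain + aligns on rows and columns; unmatched cells become NaN
plain = left + right

# fill_value=0 substitutes 0 for a value missing on one side before adding
filled = left.add(right, fill_value=0)

print(plain)
print(filled)
```

In `plain`, the last row of A and all of B are NaN. In `filled`, those cells keep the values from `left`.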


### Boolean indexing

In [37]:
sum_df>0

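|   | A     | B     | C     | D     |
|---|-------|-------|-------|-------|
| 0 | False | True  | False | False |
| 1 | True  | False | False | False |
| 2 | False | False | False | False |
| 3 | True  | True  | False | False |
| 4 | False | True  | True  | False |
| 5 | True  | False | True  | False |
| 6 | False | False | False | False |
| 7 | False | False | False | False |
| 8 | False | False | False | False |
| 9 | False | False | False | False |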
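Column D and rows 7 to 9 are all False above because any ordering comparison against NaN evaluates to False. A minimal sketch with a hypothetical one-column frame, `gaps_df`, shows that NaN fails both `> 0` and `<= 0`, so `isna()` is needed to find missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value
gaps_df = pd.DataFrame({"A": [1.0, np.nan, -1.0]})

positive = gaps_df > 0        # NaN row is False
not_positive = gaps_df <= 0   # NaN row is False here as well
missing = gaps_df.isna()      # only the NaN row is True

print(pd.concat(
    [positive["A"], not_positive["A"], missing["A"]],
    axis=1, keys=["> 0", "<= 0", "isna"],
))
```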


In [39]:
sum_df[sum_df>0]

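|   | A        | B        | C        | D   |
|---|----------|----------|----------|-----|
| 0 | NaN      | 0.131575 | NaN      | NaN |
| 1 | 1.292049 | NaN      | NaN      | NaN |
| 2 | NaN      | NaN      | NaN      | NaN |
| 3 | 1.225071 | 0.596251 | NaN      | NaN |
| 4 | NaN      | 0.071712 | 3.708595 | NaN |
| 5 | 0.593777 | NaN      | 0.896709 | NaN |
| 6 | NaN      | NaN      | NaN      | NaN |
| 7 | NaN      | NaN      | NaN      | NaN |
| 8 | NaN      | NaN      | NaN      | NaN |
| 9 | NaN      | NaN      | NaN      | NaN |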


First, build a boolean mask that is True for the rows where column B is less than zero.

Then index the DataFrame with that mask to get those rows with all of their columns.

In [51]:
mask = sum_df['B'] < 0
mask

0    False
1     True
2     True
3    False
4    False
5     True
6     True
7    False
8    False
9    False
Name: B, dtype: bool

In [50]:
sum_df[mask]

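|   | A         | B         | C         | D   |
|---|-----------|-----------|-----------|-----|
| 1 | 1.292049  | -0.856594 | -0.192517 | NaN |
| 2 | -0.752475 | -0.799066 | -1.253841 | NaN |
| 5 | 0.593777  | -0.815971 | 0.896709  | NaN |
| 6 | -0.677878 | -0.335991 | -0.93291  | NaN |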
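Row masks like the one above can be combined before indexing. The following sketch uses a hypothetical frame, `scores_df`, with fixed values; note that pandas expects `&`, `|`, and `~` with parentheses around each comparison rather than the Python keywords `and`, `or`, and `not`.

```python
import pandas as pd

# Hypothetical frame with fixed values
scores_df = pd.DataFrame({
    "A": [1.5, -0.5, 2.0, -1.0],
    "B": [-2.0, 0.5, -0.1, -3.0],
})

both = (scores_df["A"] > 0) & (scores_df["B"] < 0)     # element-wise AND
either = (scores_df["A"] > 0) | (scores_df["B"] < 0)   # element-wise OR

rows_both = scores_df[both]            # rows where both conditions hold
only_a = scores_df.loc[~both, "A"]     # inverted mask, restricted to column A

print(rows_both)
print(only_a)
```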


##### isin function: test whether each element is in a list of values

In [28]:
produce_df.isin(['apples', 'onions'])

|   | fruits | veggies |
|---|--------|---------|
| 0 | True   | False   |
| 1 | False  | True    |
| 2 | False  | False   |
| 3 | False  | False   |
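A common use of `isin` is to build a row mask from a single column. The sketch below rebuilds the produce data under the hypothetical name `produce` so that it runs on its own, and also shows the dictionary form of `DataFrame.isin`, which checks each column against its own list.

```python
import pandas as pd

# Self-contained copy of the produce data (hypothetical name)
produce = pd.DataFrame({
    "fruits": ["apples", "bananas", "pineapple", "berries"],
    "veggies": ["potatoes", "onions", "peppers", "carrots"],
})

# Series.isin gives a row mask that can be used to filter the frame
wanted = produce["fruits"].isin(["apples", "berries"])
selected = produce[wanted]

# A dict restricts each list of values to the named column
per_column = produce.isin({"fruits": ["apples"], "veggies": ["onions"]})

print(selected)
print(per_column)
```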


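##### where function: keep values where the condition is True, NaN elsewhere

Strings are compared lexicographically, so the condition below keeps values that sort after 'k'.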

In [31]:
produce_df.where(produce_df > 'k')

|   | fruits    | veggies  |
|---|-----------|----------|
| 0 | NaN       | potatoes |
| 1 | NaN       | onions   |
| 2 | pineapple | peppers  |
| 3 | NaN       | NaN      |
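`where` also accepts a replacement value as its second argument, and `mask` is its inverse. A minimal sketch with a hypothetical numeric frame, `values_df`:

```python
import pandas as pd

# Hypothetical numeric frame
values_df = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-4.0, 5.0, -6.0]})

kept = values_df.where(values_df > 0)           # NaN where the condition is False
clipped = values_df.where(values_df > 0, 0.0)   # 0.0 where the condition is False
hidden = values_df.mask(values_df > 0)          # NaN where the condition is True

print(kept)
print(clipped)
print(hidden)
```

Indexing a frame with a boolean frame, as in `sum_df[sum_df>0]` earlier, gives the same result as calling `where` with that condition.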
