### Indexing and Selection

| Operation                     | Syntax         | Result    |
|-------------------------------|----------------|-----------|
| Select column                 | df[col]        | Series    |
| Select row by label           | df.loc[label]  | Series    |
| Select row by integer         | df.iloc[loc]   | Series    |
| Select rows                   | df[start:stop] | DataFrame |
| Select rows with boolean mask | df[mask]       | DataFrame |

documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html
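The table can be checked directly. The following is a minimal sketch on a made-up frame (`demo` and its row labels are illustrative and not part of this notebook) that applies each operation and prints the type of the result.

```python
import pandas as pd

# Hypothetical frame with string row labels, used only for illustration.
demo = pd.DataFrame(
    {"x": [1, 2, 3], "y": [10, 20, 30]},
    index=["a", "b", "c"],
)

col = demo["x"]                # select column             -> Series
row_by_label = demo.loc["b"]   # select row by label       -> Series
row_by_pos = demo.iloc[1]      # select row by position    -> Series
row_slice = demo[0:2]          # slice rows                -> DataFrame
masked = demo[demo["x"] > 1]   # rows where mask is True   -> DataFrame

results = {
    "col": col,
    "row_by_label": row_by_label,
    "row_by_pos": row_by_pos,
    "row_slice": row_slice,
    "masked": masked,
}
for name, result in results.items():
    print(name, type(result).__name__)
```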

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Keys are listed in the order the columns appear in the output below.
produce_dict = {
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
}
produce_df = pd.DataFrame(produce_dict)
produce_df

      fruits   veggies
0     apples  potatoes
1    bananas    onions
2  pineapple   peppers
3    berries   carrots


##### select a column with a dictionary-style string key

In [3]:
produce_df['fruits']

0       apples
1      bananas
2    pineapple
3      berries
Name: fruits, dtype: object

##### select several columns with a list of names (note: double square brackets)

In [4]:
produce_df[ ['fruits', 'veggies'] ]

      fruits   veggies
0     apples  potatoes
1    bananas    onions
2  pineapple   peppers
3    berries   carrots
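A short sketch of why the double brackets matter, with the produce data rebuilt here so the block runs on its own: a single string key returns a Series, while a list of names returns a DataFrame, even when the list holds only one name.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
})

as_series = produce_df['fruits']     # single key: one-dimensional Series
as_frame = produce_df[['fruits']]    # list of keys: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```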


##### select a row by integer position

In [5]:
produce_df.iloc[2]

fruits     pineapple
veggies      peppers
Name: 2, dtype: object

##### select rows with an integer slice

In [6]:
produce_df.iloc[0:2]

    fruits   veggies
0   apples  potatoes
1  bananas    onions


In [7]:
produce_df.iloc[:-2]

    fruits   veggies
0   apples  potatoes
1  bananas    onions
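The two slices above return the same rows because `iloc` slices by position and excludes the stop, as Python lists do. A label-based slice with `loc` behaves differently: it includes both endpoints. The sketch below rebuilds the produce data so it runs on its own.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
})

by_position = produce_df.iloc[0:2]   # positions 0 and 1; the stop is excluded
from_end = produce_df.iloc[:-2]      # everything except the last two rows
by_label = produce_df.loc[0:2]       # labels 0, 1 and 2; the stop is included

print(len(by_position), len(from_end), len(by_label))
```

With the default integer index the labels happen to equal the positions, which is why the inclusive stop is the only visible difference here.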


##### + concatenates strings element-wise (the selected row is broadcast to every row)

In [8]:
produce_df + produce_df.iloc[0]

            fruits           veggies
0     applesapples  potatoespotatoes
1    bananasapples    onionspotatoes
2  pineappleapples   pepperspotatoes
3    berriesapples   carrotspotatoes
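In the cell above, `produce_df.iloc[0]` is a Series whose index holds the column names, and pandas matches that index against the DataFrame's columns before applying `+` to each row. A minimal sketch of that matching, with the produce data rebuilt so the block runs on its own (`reordered` is an illustrative name):

```python
import pandas as pd

produce_df = pd.DataFrame({
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
})

first_row = produce_df.iloc[0]                 # Series indexed by column name
reordered = first_row[['veggies', 'fruits']]   # same values, labels in another order

by_row = produce_df + first_row
by_reordered = produce_df + reordered

# Matching is by label, not position, so each column gets the same values.
print(by_row['fruits'].tolist())
print(by_reordered['fruits'].tolist())
```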


### Data alignment and arithmetic
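Arithmetic between DataFrame objects automatically aligns on both the columns and the index (row labels).

The result is NaN wherever a label exists in only one operand: column D (absent from df2's columns) and rows 7 through 9 (absent from df2's index).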

In [9]:
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
sum_df = df + df2
sum_df

          A          B          C    D
0   2.755144  -3.751917  -0.222412  NaN
1   1.431948   1.147275   2.088115  NaN
2   0.550599   0.070135   0.503597  NaN
3  -0.216502  -1.242008  -2.017400  NaN
4  -0.042534  -0.471636   0.925521  NaN
5  -0.531622  -0.682460   0.108012  NaN
6   0.515271  -0.109020   0.114775  NaN
7        NaN        NaN        NaN  NaN
8        NaN        NaN        NaN  NaN
9        NaN        NaN        NaN  NaN
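When the NaN values produced by alignment are not wanted, one option is the `add` method with `fill_value`, which substitutes a value for entries missing from one side before adding. The sketch below uses small made-up frames (`left` and `right` are illustrative names) so the result is deterministic, unlike the random data above.

```python
import pandas as pd

left = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]})
right = pd.DataFrame({"A": [100.0, 200.0]})   # no column B, no row 2

plain = left + right                     # NaN where a label is missing on one side
filled = left.add(right, fill_value=0)   # a value missing on one side counts as 0

print(plain)
print(filled)
```

An entry that is missing from both operands would still come out as NaN, since there is nothing to add the fill value to.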


### Boolean indexing

In [10]:
sum_df > 0

       A      B      C      D
0   True  False  False  False
1   True   True   True  False
2   True   True   True  False
3  False  False  False  False
4  False  False   True  False
5  False  False   True  False
6   True  False   True  False
7  False  False  False  False
8  False  False  False  False
9  False  False  False  False


In [11]:
sum_df[sum_df > 0]

          A         B         C    D
0  2.755144       NaN       NaN  NaN
1  1.431948  1.147275  2.088115  NaN
2  0.550599  0.070135  0.503597  NaN
3       NaN       NaN       NaN  NaN
4       NaN       NaN  0.925521  NaN
5       NaN       NaN  0.108012  NaN
6  0.515271       NaN  0.114775  NaN
7       NaN       NaN       NaN  NaN
8       NaN       NaN       NaN  NaN
9       NaN       NaN       NaN  NaN


First, build a boolean mask that is True for the rows where column B is less than zero.

Then index the DataFrame with that mask to keep those rows, with all of their columns.

In [12]:
mask = sum_df['B'] < 0
mask

0     True
1    False
2    False
3     True
4     True
5     True
6     True
7    False
8    False
9    False
Name: B, dtype: bool

In [14]:
sum_df[mask]

          A          B          C    D
0   2.755144  -3.751917  -0.222412  NaN
3  -0.216502  -1.242008  -2.017400  NaN
4  -0.042534  -0.471636   0.925521  NaN
5  -0.531622  -0.682460   0.108012  NaN
6   0.515271  -0.109020   0.114775  NaN
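Row masks can be combined before indexing. The sketch below uses a small made-up frame (`scores` is an illustrative name) and the element-wise operators `&`, `|` and `~`. Each comparison needs its own parentheses because these operators bind more tightly than `<` and `>`.

```python
import pandas as pd

scores = pd.DataFrame({
    "A": [1.5, -0.2, 0.7, -1.1],
    "B": [-3.0, 1.0, -0.5, -0.1],
})

both = scores[(scores["B"] < 0) & (scores["A"] > 0)]     # B negative AND A positive
either = scores[(scores["B"] < 0) | (scores["A"] > 0)]   # B negative OR A positive
not_negative_b = scores[~(scores["B"] < 0)]              # rows where B is NOT negative

print(both)
print(either)
print(not_negative_b)
```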


##### isin method: element-wise membership test

In [15]:
produce_df.isin(['apples', 'onions'])

   fruits  veggies
0    True    False
1   False     True
2   False    False
3   False    False
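Two common variations, sketched with the produce data rebuilt so the block runs on its own: calling `isin` on a single column gives a row mask that can be used for filtering, and passing a dict restricts the test to the named columns.

```python
import pandas as pd

produce_df = pd.DataFrame({
    'fruits': ['apples', 'bananas', 'pineapple', 'berries'],
    'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
})

# Series.isin returns a boolean Series, usable as a row mask.
wanted = produce_df[produce_df['fruits'].isin(['apples', 'berries'])]

# With a dict, each key names a column and only that column is tested,
# so 'onions' in the veggies column is not matched here.
per_column = produce_df.isin({'fruits': ['apples', 'onions']})

print(wanted)
print(per_column)
```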


##### where method: keep values where the condition is True, NaN elsewhere

In [16]:
produce_df.where(produce_df > 'k')

      fruits   veggies
0        NaN  potatoes
1        NaN    onions
2  pineapple   peppers
3        NaN       NaN
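In the cell above the comparison with 'k' is an ordinary string comparison, so only entries that sort after 'k' are kept. The sketch below uses a small numeric frame (`values` is an illustrative name) to show three related forms: `where` with its default NaN replacement, `where` with an explicit replacement, and `mask`, which blanks the entries where the condition holds instead.

```python
import pandas as pd

values = pd.DataFrame({"A": [1.0, -2.0, 3.0], "B": [-4.0, 5.0, -6.0]})

kept = values.where(values > 0)            # NaN where the condition is False
replaced = values.where(values > 0, 0.0)   # 0.0 where the condition is False
inverse = values.mask(values > 0)          # NaN where the condition is True

# Indexing with a boolean DataFrame gives the same result as where().
same_as_indexing = kept.equals(values[values > 0])

print(kept)
print(replaced)
print(inverse)
print(same_as_indexing)
```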
