### Indexing and Selection

| Operation                      | Syntax         | Result    |
|--------------------------------|----------------|-----------|
| Select column                  | df[col]        | Series    |
| Select row by label            | df.loc[label]  | Series    |
| Select row by integer position | df.iloc[loc]   | Series    |
| Slice rows                     | df[start:stop] | DataFrame |
| Select rows with boolean mask  | df[mask]       | DataFrame |

Documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html
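As a quick check of the table, here is a minimal sketch that applies each row to a small hypothetical frame and prints the type each operation returns. The letter index `a`/`b`/`c` is an assumption added only so that label-based (`loc`) and position-based (`iloc`) lookups are visibly different.

```python
import pandas as pd

# Hypothetical sample frame; the string index is an illustrative assumption.
df = pd.DataFrame(
    {"veggies": ["potatoes", "onions", "peppers"],
     "fruits": ["apples", "bananas", "pineapple"]},
    index=["a", "b", "c"],
)

col = df["fruits"]                        # select column -> Series
row_by_label = df.loc["b"]                # select row by label -> Series
row_by_pos = df.iloc[1]                   # select row by integer position -> Series
rows = df[0:2]                            # slice rows by position, stop excluded -> DataFrame
filtered = df[df["fruits"] != "bananas"]  # boolean mask -> DataFrame

for name, obj in [("df[col]", col), ("df.loc[label]", row_by_label),
                  ("df.iloc[loc]", row_by_pos), ("df[start:stop]", rows),
                  ("df[mask]", filtered)]:
    print(f"{name:15} {type(obj).__name__}")
```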

```python
import pandas as pd
import numpy as np
```

```python
produce_dict = {'veggies': ['potatoes', 'onions', 'peppers', 'carrots'],
                'fruits': ['apples', 'bananas', 'pineapple', 'berries']}
produce_df = pd.DataFrame(produce_dict)
produce_df
```

|   | veggies  | fruits    |
|---|----------|-----------|
| 0 | potatoes | apples    |
| 1 | onions   | bananas   |
| 2 | peppers  | pineapple |
| 3 | carrots  | berries   |


##### select a column using a dictionary-like string key

```python
produce_df['fruits']
```

```
0       apples
1      bananas
2    pineapple
3      berries
Name: fruits, dtype: object
```
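The "dictionary-like" behaviour goes beyond square brackets: the column labels act as the keys. This sketch rebuilds `produce_df` and checks membership, iteration and `.get()`; the missing column name `"grains"` is a hypothetical key chosen for illustration.

```python
import pandas as pd

produce_df = pd.DataFrame({
    "veggies": ["potatoes", "onions", "peppers", "carrots"],
    "fruits": ["apples", "bananas", "pineapple", "berries"],
})

# Like a dict, `in` tests the keys, which for a DataFrame are the column labels.
has_fruits = "fruits" in produce_df      # True
has_potatoes = "potatoes" in produce_df  # False: cell values are not keys

# Iterating over a DataFrame yields its column labels, like iterating a dict.
labels = list(produce_df)                # ['veggies', 'fruits']

# .get() returns a default instead of raising for a missing ("grains" is hypothetical) column.
missing = produce_df.get("grains", default=None)

try:
    produce_df["grains"]
except KeyError as err:
    print("KeyError:", err)
```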

##### select several columns using a list of strings (note the double square brackets)

```python
produce_df[["veggies", "fruits"]]
```

|   | veggies  | fruits    |
|---|----------|-----------|
| 0 | potatoes | apples    |
| 1 | onions   | bananas   |
| 2 | peppers  | pineapple |
| 3 | carrots  | berries   |
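The double brackets matter because the outer pair is the indexing operator and the inner pair is a Python list. A minimal sketch, reusing the produce data, showing that a one-element list still returns a DataFrame and that the list order sets the column order:

```python
import pandas as pd

produce_df = pd.DataFrame({
    "veggies": ["potatoes", "onions", "peppers", "carrots"],
    "fruits": ["apples", "bananas", "pineapple", "berries"],
})

single = produce_df["fruits"]      # a single string key returns a Series
double = produce_df[["fruits"]]    # a list of keys returns a DataFrame, even with one label

# The list also controls the column order of the result.
reordered = produce_df[["fruits", "veggies"]]

print(type(single).__name__, single.shape)   # Series (4,)
print(type(double).__name__, double.shape)   # DataFrame (4, 1)
```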


##### select a row by integer position using iloc

```python
produce_df.iloc[2]
```

```
veggies      peppers
fruits     pineapple
Name: 2, dtype: object
```
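With the default 0 to n-1 index, `iloc[2]` and `loc[2]` happen to return the same row. This sketch uses a hypothetical reversed integer index (an assumption for illustration) so that position 2 and label 2 point at different rows:

```python
import pandas as pd

# Hypothetical: the produce data indexed by reversed integer labels.
produce_df = pd.DataFrame(
    {"veggies": ["potatoes", "onions", "peppers", "carrots"],
     "fruits": ["apples", "bananas", "pineapple", "berries"]},
    index=[3, 2, 1, 0],
)

by_position = produce_df.iloc[2]   # the third row from the top (its label is 1)
by_label = produce_df.loc[2]       # the row whose index label is 2

print(by_position["veggies"])  # peppers
print(by_label["veggies"])     # onions
```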

##### select rows using an integer slice (the stop position is excluded)

```python
produce_df.iloc[0:2]
```

|   | veggies  | fruits  |
|---|----------|---------|
| 0 | potatoes | apples  |
| 1 | onions   | bananas |


```python
produce_df.iloc[:-2]
```

|   | veggies  | fruits  |
|---|----------|---------|
| 0 | potatoes | apples  |
| 1 | onions   | bananas |
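Positional slices with `iloc` follow Python's rule that the stop is excluded, which is why `iloc[0:2]` and `iloc[:-2]` both return two rows here. Label slices with `loc` behave differently: they include the stop label. A short sketch contrasting the two:

```python
import pandas as pd

produce_df = pd.DataFrame({
    "veggies": ["potatoes", "onions", "peppers", "carrots"],
    "fruits": ["apples", "bananas", "pineapple", "berries"],
})

# iloc slices are positional: the stop position is excluded.
first_two = produce_df.iloc[0:2]         # positions 0 and 1
all_but_last_two = produce_df.iloc[:-2]  # a negative stop counts from the end

# loc slices are label-based and include the stop label.
through_label_2 = produce_df.loc[0:2]    # labels 0, 1 and 2

print(len(first_two), len(all_but_last_two), len(through_label_2))  # 2 2 3
```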


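##### + is overloaded as the concatenation operator for string columns

The row Series `produce_df.iloc[0]` is matched to the DataFrame's columns by label and repeated down every row, so each cell gets the first row's value for its column appended.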

```python
produce_df + produce_df.iloc[0]
```

|   | veggies          | fruits          |
|---|------------------|-----------------|
| 0 | potatoespotatoes | applesapples    |
| 1 | onionspotatoes   | bananasapples   |
| 2 | pepperspotatoes  | pineappleapples |
| 3 | carrotspotatoes  | berriesapples   |
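The broadcasting in the cell above can be made explicit with the method form `DataFrame.add(..., axis="columns")`. This sketch checks that both forms agree and that reordering the Series does not change which values are combined, since matching is by label rather than by position:

```python
import pandas as pd

produce_df = pd.DataFrame({
    "veggies": ["potatoes", "onions", "peppers", "carrots"],
    "fruits": ["apples", "bananas", "pineapple", "berries"],
})

first_row = produce_df.iloc[0]   # a Series indexed by the column labels

# Operator form: the Series is aligned to the columns and repeated down the rows.
result = produce_df + first_row

# Equivalent method form with the alignment axis spelled out.
same = produce_df.add(first_row, axis="columns")

# Reversing the Series leaves each cell's value unchanged, because alignment is by label.
reordered = produce_df + first_row[["fruits", "veggies"]]

print(result.loc[1, "veggies"])  # onionspotatoes
```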


### Data alignment and arithmetic
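Arithmetic between DataFrame objects automatically aligns on both the columns and the index (row labels). The result covers the union of both label sets, and any position that exists in only one operand becomes NaN.

Note where NaN appears: `df2` has no column D and no rows 7 to 9, so those cells of the sum are NaN.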

```python
# np.random.randn draws fresh random numbers, so the values below differ on every run
df = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
df
```

|   | A         | B         | C         | D         |
|---|-----------|-----------|-----------|-----------|
| 0 | 0.4457    | -0.624519 | -1.066845 | 1.368374  |
| 1 | -0.745148 | 0.596586  | -0.985204 | -0.582479 |
| 2 | -0.778104 | -1.235924 | -0.505692 | 0.829995  |
| 3 | -0.861253 | -0.266925 | -0.853817 | 0.383789  |
| 4 | 1.482293  | 0.315912  | -0.777849 | 0.785112  |
| 5 | -0.722084 | 1.662929  | -1.129091 | -0.505276 |
| 6 | -0.51177  | 0.757758  | -0.823988 | 2.398775  |
| 7 | -0.643082 | -0.161044 | 0.039024  | -1.022994 |
| 8 | -0.624962 | -0.7932   | -0.261243 | 0.219166  |
| 9 | -0.446993 | -1.583372 | -0.922592 | -0.63362  |


```python
df2
```

|   | A         | B         | C         |
|---|-----------|-----------|-----------|
| 0 | -0.187445 | 1.101176  | -0.188025 |
| 1 | 0.779205  | 1.12287   | 0.164321  |
| 2 | -1.350238 | -0.416154 | 1.729328  |
| 3 | 0.561723  | -1.433174 | -1.017447 |
| 4 | -0.971107 | 0.722949  | 0.232068  |
| 5 | 0.127263  | -0.541952 | -1.875286 |
| 6 | 2.004326  | -1.049462 | -0.203961 |


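```python
sum_df = df + df2
sum_df
```

|   | A         | B         | C         | D   |
|---|-----------|-----------|-----------|-----|
| 0 | 0.258255  | 0.476656  | -1.25487  | NaN |
| 1 | 0.034057  | 1.719455  | -0.820883 | NaN |
| 2 | -2.128342 | -1.652078 | 1.223637  | NaN |
| 3 | -0.29953  | -1.700099 | -1.871264 | NaN |
| 4 | 0.511186  | 1.038861  | -0.545781 | NaN |
| 5 | -0.594821 | 1.120977  | -3.004376 | NaN |
| 6 | 1.492556  | -0.291704 | -1.027949 | NaN |
| 7 | NaN       | NaN       | NaN       | NaN |
| 8 | NaN       | NaN       | NaN       | NaN |
| 9 | NaN       | NaN       | NaN       | NaN |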
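Because the random frames above change on every run, here is a deterministic sketch of the same alignment on two small hypothetical frames, `a` and `b`. It also shows the `fill_value` parameter of `DataFrame.add`, which treats a label missing from one side as the given value (0 here); a cell missing from both sides would still be NaN.

```python
import pandas as pd

# Hypothetical small frames: `b` has fewer rows and no column "B".
a = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
b = pd.DataFrame({"A": [100, 200]})

# The result's index and columns are the union of both; labels missing on either side give NaN.
plain = a + b
print(plain)

# With fill_value=0, a label missing on one side only counts as 0,
# so the sum falls back to the other frame's value.
filled = a.add(b, fill_value=0)
print(filled)
```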


### Boolean indexing

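Comparing a DataFrame with a scalar returns a boolean DataFrame of the same shape. Any comparison involving NaN evaluates to False, which is why column D and rows 7 to 9 are False even though every non-missing value is greater than -20.

```python
sum_df > -20
```

|   | A     | B     | C     | D     |
|---|-------|-------|-------|-------|
| 0 | True  | True  | True  | False |
| 1 | True  | True  | True  | False |
| 2 | True  | True  | True  | False |
| 3 | True  | True  | True  | False |
| 4 | True  | True  | True  | False |
| 5 | True  | True  | True  | False |
| 6 | True  | True  | True  | False |
| 7 | False | False | False | False |
| 8 | False | False | False | False |
| 9 | False | False | False | False |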
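Since NaN fails every ordered comparison, a threshold test cannot tell you which cells are present. A minimal sketch on a hypothetical three-element Series, using `notna()` and `isna()` for that question instead:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, np.nan, -3.0])  # hypothetical data with one missing value

# Any ordered comparison with NaN is False, so NaN never passes a threshold test.
print((s > -20).tolist())   # [True, False, True]

# To ask "is this value present?", use notna()/isna() rather than a comparison.
print(s.notna().tolist())   # [True, False, True]

# A filter that should keep missing values has to say so explicitly.
keep = (s > 0) | s.isna()
print(keep.tolist())        # [True, True, False]
```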


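Indexing with a boolean DataFrame keeps the values where the mask is True and replaces everything else with NaN; the shape of the result is unchanged.

```python
sum_df[sum_df > 0]
```

|   | A        | B        | C        | D   |
|---|----------|----------|----------|-----|
| 0 | 0.258255 | 0.476656 | NaN      | NaN |
| 1 | 0.034057 | 1.719455 | NaN      | NaN |
| 2 | NaN      | NaN      | 1.223637 | NaN |
| 3 | NaN      | NaN      | NaN      | NaN |
| 4 | 0.511186 | 1.038861 | NaN      | NaN |
| 5 | NaN      | 1.120977 | NaN      | NaN |
| 6 | 1.492556 | NaN      | NaN      | NaN |
| 7 | NaN      | NaN      | NaN      | NaN |
| 8 | NaN      | NaN      | NaN      | NaN |
| 9 | NaN      | NaN      | NaN      | NaN |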
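Indexing with a boolean DataFrame is equivalent to calling `DataFrame.where` with the same condition. This sketch checks that on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [0.5, -1.0], "B": [-2.0, 3.0]})  # hypothetical data

# Indexing with a boolean DataFrame keeps values where the mask is True
# and puts NaN everywhere else; the shape does not change.
masked = df[df > 0]

# The same result, written with DataFrame.where.
also_masked = df.where(df > 0)

print(masked)
```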


First build a boolean mask on column B that is True where the value is less than zero (the NaN rows 7 to 9 compare as False). Then index the DataFrame with that mask to keep every column of the matching rows.

```python
mask = sum_df["B"] < 0
mask
```

```
0    False
1    False
2     True
3     True
4    False
5    False
6     True
7    False
8    False
9    False
Name: B, dtype: bool
```

```python
sum_df[mask]
```

|   | A         | B         | C         | D   |
|---|-----------|-----------|-----------|-----|
| 2 | -2.128342 | -1.652078 | 1.223637  | NaN |
| 3 | -0.29953  | -1.700099 | -1.871264 | NaN |
| 6 | 1.492556  | -0.291704 | -1.027949 | NaN |
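Row masks can be combined with `&` (and), `|` (or) and `~` (not), and passed to `.loc` together with a column list to restrict the columns as well. A sketch on hypothetical integer data standing in for `sum_df`; each condition needs its own parentheses because `&` and `|` bind more tightly than comparison operators:

```python
import pandas as pd

# Hypothetical data standing in for sum_df.
df = pd.DataFrame({"A": [1, -2, -3, 4], "B": [5, -6, -7, -8], "C": [-1, 2, -3, 4]})

# Parenthesize each comparison before combining with & or |.
mask = (df["B"] < 0) & (df["A"] > 0)

rows = df[mask]                     # all columns of the matching rows
subset = df.loc[mask, ["A", "C"]]   # matching rows, selected columns only

either = df[(df["B"] < 0) | (df["C"] > 0)]
not_negative_b = df[~(df["B"] < 0)]

print(subset)
```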


##### isin method

`isin` returns a boolean DataFrame that is True wherever a cell's value appears in the given list.

```python
produce_df.isin(["apples", "onions"])
```

|   | veggies | fruits |
|---|---------|--------|
| 0 | False   | True   |
| 1 | True    | False  |
| 2 | False   | False  |
| 3 | False   | False  |
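`isin` also accepts a dict that maps column names to the values to look for in that column only, and on a single column it yields a row mask for filtering. A sketch using the produce data:

```python
import pandas as pd

produce_df = pd.DataFrame({
    "veggies": ["potatoes", "onions", "peppers", "carrots"],
    "fruits": ["apples", "bananas", "pineapple", "berries"],
})

# A list checks every cell against the same set of values.
anywhere = produce_df.isin(["apples", "onions"])

# A dict restricts the lookup per column; columns not named in the dict are all False.
per_column = produce_df.isin({"fruits": ["apples", "onions"]})

# On a single column, isin gives a boolean row mask for filtering.
picked = produce_df[produce_df["fruits"].isin(["apples", "berries"])]

print(picked)
```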


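##### where method

`where` keeps values where the condition is True and replaces the rest with NaN. Here the condition is a lexicographic string comparison, so only strings that sort after "k" survive.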

```python
produce_df.where(produce_df > "k")
```

|   | veggies  | fruits    |
|---|----------|-----------|
| 0 | potatoes | NaN       |
| 1 | onions   | NaN       |
| 2 | peppers  | pineapple |
| 3 | NaN      | NaN       |
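`where` takes an optional `other` argument for the replacement value, and `DataFrame.mask` is its inverse: it replaces values where the condition is True. A sketch with the same condition as above; the placeholder `"-"` is an arbitrary choice for illustration.

```python
import pandas as pd

produce_df = pd.DataFrame({
    "veggies": ["potatoes", "onions", "peppers", "carrots"],
    "fruits": ["apples", "bananas", "pineapple", "berries"],
})

cond = produce_df > "k"   # lexicographic string comparison, cell by cell

kept = produce_df.where(cond)                # NaN where cond is False
filled = produce_df.where(cond, other="-")   # a replacement value instead of NaN
inverse = produce_df.mask(cond, other="-")   # mask() replaces where cond is True

print(filled)
print(inverse)
```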
