### Differences between interactive and production work
Note: while standard Python / NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code, we (the pandas development team) recommend the optimized pandas data access methods, .at, .iat, .loc and .iloc. (The original note also listed .ix, which was deprecated in pandas 0.20 and removed in pandas 1.0.)

Source: http://pandas.pydata.org/pandas-docs/stable/10min.html
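The note names `.at` and `.iat`, which the cells below do not demonstrate. As a minimal sketch, assuming a small frame shaped like the one built later in this notebook (the names `df` and `dates` are illustrative only), scalar access with these two accessors could look like this:

```python
import numpy as np
import pandas as pd

# Illustrative frame; `df` and `dates` are assumed names, not part of the notebook.
dates = pd.date_range('20160101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=list('ABCD'))

# .at: scalar access by row label and column label
by_label = df.at[dates[2], 'C']

# .iat: scalar access by integer row position and column position
by_position = df.iat[2, 2]

# Both address the same cell as the more general .loc / .iloc indexers
same_cell = df.loc[dates[2], 'C'] == df.iloc[2, 2] == by_label == by_position
print(by_label, by_position, same_cell)
```

`.at` and `.iat` only ever return a single value, whereas `.loc` and `.iloc` also accept slices, lists and boolean masks.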

In [1]:
import pandas as pd
import numpy as np

In [2]:
sample_numpy_data = np.arange(24).reshape((6, 4))
dates_index = pd.date_range('20160101', periods=6)
sample_df = pd.DataFrame(sample_numpy_data, index=dates_index, columns=list('ABCD'))
sample_df

| | A | B | C | D |
|---|---|---|---|---|
| 2016-01-01 | 0 | 1 | 2 | 3 |
| 2016-01-02 | 4 | 5 | 6 | 7 |
| 2016-01-03 | 8 | 9 | 10 | 11 |
| 2016-01-04 | 12 | 13 | 14 | 15 |
| 2016-01-05 | 16 | 17 | 18 | 19 |
| 2016-01-06 | 20 | 21 | 22 | 23 |


##### selection using column name

In [3]:
sample_df['C']

2016-01-01     2
2016-01-02     6
2016-01-03    10
2016-01-04    14
2016-01-05    18
2016-01-06    22
Freq: D, Name: C, dtype: int32

##### selection using slice
- remember: an integer slice runs up to, but does not include, the second index

In [4]:
sample_df[1:4]

| | A | B | C | D |
|---|---|---|---|---|
| 2016-01-02 | 4 | 5 | 6 | 7 |
| 2016-01-03 | 8 | 9 | 10 | 11 |
| 2016-01-04 | 12 | 13 | 14 | 15 |


##### selection using date time index
- note: unlike an integer slice, a label slice includes the last index

In [5]:
sample_df['2016-01-01':'2016-01-04']

| | A | B | C | D |
|---|---|---|---|---|
| 2016-01-01 | 0 | 1 | 2 | 3 |
| 2016-01-02 | 4 | 5 | 6 | 7 |
| 2016-01-03 | 8 | 9 | 10 | 11 |
| 2016-01-04 | 12 | 13 | 14 | 15 |
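The two slicing rules above are easy to mix up, so here is a small side-by-side sketch. It rebuilds an equivalent frame under the assumed names `df` and `dates`, and uses the explicit `.iloc` and `.loc` indexers rather than bare `[]`:

```python
import numpy as np
import pandas as pd

# Illustrative frame; `df` and `dates` are assumed names.
dates = pd.date_range('20160101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=list('ABCD'))

# Positional slice: stop position 4 is excluded, so rows 1, 2 and 3 are returned
positional = df.iloc[1:4]

# Label slice: the end label is included, so 2016-01-04 is part of the result
by_label = df.loc['2016-01-02':'2016-01-04']

print(len(positional), len(by_label))
print(positional.equals(by_label))
```

Both expressions return the same three rows, even though the positional stop (4) points one row past the last label.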


### Selection by label
documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html

label-location based indexer for selection by label

In [6]:
sample_df.loc[dates_index[1:3]]

| | A | B | C | D |
|---|---|---|---|---|
| 2016-01-02 | 4 | 5 | 6 | 7 |
| 2016-01-03 | 8 | 9 | 10 | 11 |


##### Selecting using multi-axis by label

In [7]:
sample_df.loc[:,['A','B']]

| | A | B |
|---|---|---|
| 2016-01-01 | 0 | 1 |
| 2016-01-02 | 4 | 5 |
| 2016-01-03 | 8 | 9 |
| 2016-01-04 | 12 | 13 |
| 2016-01-05 | 16 | 17 |
| 2016-01-06 | 20 | 21 |


##### Label slicing, both endpoints are included

In [8]:
sample_df.loc['2016-01-01':'2016-01-03',['A','B']]

| | A | B |
|---|---|---|
| 2016-01-01 | 0 | 1 |
| 2016-01-02 | 4 | 5 |
| 2016-01-03 | 8 | 9 |


##### Reduce number of dimensions for returned object
- notice order of 'D' and 'B'

In [9]:
sample_df.loc['2016-01-03',['D','B']]

D    11
B     9
Name: 2016-01-03 00:00:00, dtype: int32
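Whether `.loc` reduces the dimension depends on how the row is specified. The sketch below, using an assumed frame named `df`, contrasts a scalar row label (which yields a Series) with a one-element list of labels (which keeps a DataFrame):

```python
import numpy as np
import pandas as pd

# Illustrative frame; `df` and `dates` are assumed names.
dates = pd.date_range('20160101', periods=6)
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=list('ABCD'))

# Scalar row label: the result drops to one dimension (a Series)
row_as_series = df.loc[dates[2], ['D', 'B']]

# One-element list of row labels: the result stays two-dimensional
row_as_frame = df.loc[[dates[2]], ['D', 'B']]

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(list(row_as_series.index))  # column order follows the requested list
```

In both cases the columns come back in the order requested, 'D' before 'B'.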

##### using result

In [10]:
sample_df.loc['2016-01-03', ['D','B']].iloc[0] * 4

44

##### select a scalar

In [11]:
sample_df.loc[dates_index[2], 'C']

10

### Selection by Position
documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.iloc.html

integer-location based indexing for selection by position

In [12]:
sample_numpy_data[3]

array([12, 13, 14, 15])

In [13]:
sample_df.iloc[3]

A    12
B    13
C    14
D    15
Name: 2016-01-04 00:00:00, dtype: int32

##### integer slices

In [14]:
sample_df.iloc[1:3, 2:4]

| | C | D |
|---|---|---|
| 2016-01-02 | 6 | 7 |
| 2016-01-03 | 10 | 11 |


##### lists of integers

In [15]:
sample_df.iloc[[0,1,3], [0,2]]

| | A | C |
|---|---|---|
| 2016-01-01 | 0 | 2 |
| 2016-01-02 | 4 | 6 |
| 2016-01-04 | 12 | 14 |


##### slicing rows explicitly
the bare `:` selects all columns

In [16]:
sample_df.iloc[1:3,:]

| | A | B | C | D |
|---|---|---|---|---|
| 2016-01-02 | 4 | 5 | 6 | 7 |
| 2016-01-03 | 8 | 9 | 10 | 11 |


##### slicing columns explicitly
the bare `:` selects all rows

In [17]:
sample_df.iloc[:, 1:3]

| | B | C |
|---|---|---|
| 2016-01-01 | 1 | 2 |
| 2016-01-02 | 5 | 6 |
| 2016-01-03 | 9 | 10 |
| 2016-01-04 | 13 | 14 |
| 2016-01-05 | 17 | 18 |
| 2016-01-06 | 21 | 22 |
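Because `.iloc` is purely positional, its results can be checked against the underlying NumPy array. The following is a minimal sketch under assumed names (`data`, `df`). One difference worth noting is that two integer lists in `.iloc` select the cross product of rows and columns, for which plain NumPy indexing needs `np.ix_`:

```python
import numpy as np
import pandas as pd

# Illustrative array and frame; `data` and `df` are assumed names.
data = np.arange(24).reshape((6, 4))
df = pd.DataFrame(data, index=pd.date_range('20160101', periods=6), columns=list('ABCD'))

# Integer slices behave like NumPy slices on the underlying array
block = df.iloc[1:3, 2:4]
slices_match = np.array_equal(block.to_numpy(), data[1:3, 2:4])

# Lists of integers pick every listed row for every listed column;
# np.ix_ builds the same row-by-column selection in NumPy
picked = df.iloc[[0, 1, 3], [0, 2]]
lists_match = np.array_equal(picked.to_numpy(), data[np.ix_([0, 1, 3], [0, 2])])

print(slices_match, lists_match)
```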


### Boolean Indexing
##### test based upon one column's data

In [18]:
sample_df.C >= 14

2016-01-01    False
2016-01-02    False
2016-01-03    False
2016-01-04     True
2016-01-05     True
2016-01-06     True
Freq: D, Name: C, dtype: bool
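A boolean Series like the one above is usually passed back into the frame to filter rows. As a hedged sketch with an assumed frame `df`, this shows a single-condition filter and a combined one. The second condition on column 'A' is made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative frame; `df` is an assumed name.
df = pd.DataFrame(np.arange(24).reshape((6, 4)),
                  index=pd.date_range('20160101', periods=6),
                  columns=list('ABCD'))

# Use the boolean Series as a row filter
mask = df['C'] >= 14
selected = df.loc[mask]

# Combine conditions with & (and) or | (or); each condition needs parentheses
both = df.loc[(df['C'] >= 14) & (df['A'] < 20)]

print(selected)
print(both)
```

Python's `and` / `or` keywords do not work element-wise on a Series, which is why the `&` and `|` operators are used here.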

##### test based upon entire data set

In [21]:
sample_df[sample_df >= 11]

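| | A | B | C | D |
|---|---|---|---|---|
| 2016-01-01 | NaN | NaN | NaN | NaN |
| 2016-01-02 | NaN | NaN | NaN | NaN |
| 2016-01-03 | NaN | NaN | NaN | 11.0 |
| 2016-01-04 | 12.0 | 13.0 | 14.0 | 15.0 |
| 2016-01-05 | 16.0 | 17.0 | 18.0 | 19.0 |
| 2016-01-06 | 20.0 | 21.0 | 22.0 | 23.0 |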
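Indexing a frame with a boolean frame keeps the original shape and fills the non-matching cells with NaN, which is why the integers above turn into floats. A minimal sketch with an assumed frame `df` shows the equivalent `DataFrame.where` call. The replacement value -1 is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative frame; `df` is an assumed name.
df = pd.DataFrame(np.arange(24).reshape((6, 4)),
                  index=pd.date_range('20160101', periods=6),
                  columns=list('ABCD'))

# Boolean-frame indexing and .where() produce the same masked result
masked = df[df >= 11]
via_where = df.where(df >= 11)
print(masked.equals(via_where))

# NaN cannot be stored in an integer column, so the columns become float64
print(masked.dtypes)

# Supplying a replacement value avoids introducing NaN (-1 is arbitrary here)
filled = df.where(df >= 11, other=-1)
print(filled)
```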


##### isin() method
documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isin.html

Returns a boolean Series showing whether each element in the Series is exactly contained in the passed sequence of values.

In [22]:
sample_df_2 = sample_df.copy()
sample_df_2['Fruits'] = ['apple', 'orange','banana','strawberry','blueberry','pineapple']
sample_df_2

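| | A | B | C | D | Fruits |
|---|---|---|---|---|---|
| 2016-01-01 | 0 | 1 | 2 | 3 | apple |
| 2016-01-02 | 4 | 5 | 6 | 7 | orange |
| 2016-01-03 | 8 | 9 | 10 | 11 | banana |
| 2016-01-04 | 12 | 13 | 14 | 15 | strawberry |
| 2016-01-05 | 16 | 17 | 18 | 19 | blueberry |
| 2016-01-06 | 20 | 21 | 22 | 23 | pineapple |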


Select rows where the 'Fruits' column is either 'banana' or 'pineapple'. Note that 'smoothy' is not in the column, so it simply matches nothing.

In [23]:
sample_df_2[sample_df_2['Fruits'].isin(['banana','pineapple', 'smoothy'])]

| | A | B | C | D | Fruits |
|---|---|---|---|---|---|
| 2016-01-03 | 8 | 9 | 10 | 11 | banana |
| 2016-01-06 | 20 | 21 | 22 | 23 | pineapple |
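The mask returned by `isin()` can also be inverted with `~` to keep the rows that are not in the list. This sketch uses a reduced, assumed frame `fruit_df` containing only the 'Fruits' column:

```python
import pandas as pd

# Illustrative frame with only the 'Fruits' column; `fruit_df` is an assumed name.
fruit_df = pd.DataFrame(
    {'Fruits': ['apple', 'orange', 'banana', 'strawberry', 'blueberry', 'pineapple']},
    index=pd.date_range('20160101', periods=6),
)

wanted = ['banana', 'pineapple', 'smoothy']  # 'smoothy' matches no row

# True where the value is one of the wanted items
is_wanted = fruit_df['Fruits'].isin(wanted)

matches = fruit_df[is_wanted]    # rows whose fruit is in the list
others = fruit_df[~is_wanted]    # ~ inverts the mask: rows not in the list

print(list(matches['Fruits']))
print(list(others['Fruits']))
```

Values in the list that never occur in the column, such as 'smoothy', raise no error.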
