### Differences between interactive and production work
Note: while standard Python / NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code, we (the pandas development team) recommend the optimized pandas data access methods, .at, .iat, .loc, .iloc and .ix. (.ix was deprecated in pandas 0.20 and removed in pandas 1.0; use .loc or .iloc instead.)

from: http://pandas.pydata.org/pandas-docs/stable/10min.html

In [1]:
import pandas as pd
import numpy as np

In [2]:
sample_numpy_data = np.arange(24).reshape((6, 4))
dates_index = pd.date_range('20160101', periods=6)
sample_df = pd.DataFrame(sample_numpy_data, index=dates_index, columns=list('ABCD'))
sample_df

Unnamed: 0,A,B,C,D
2016-01-01,0,1,2,3
2016-01-02,4,5,6,7
2016-01-03,8,9,10,11
2016-01-04,12,13,14,15
2016-01-05,16,17,18,19
2016-01-06,20,21,22,23


##### selection using column name

In [10]:
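sample_df.A  # attribute access returns a pandas Series, not a NumPy array
sample_df.loc[:, 'A']  # single column label also returns a Series
sample_df[['A']]  # list of column labels returns a DataFrame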

Unnamed: 0,A
2016-01-01,0
2016-01-02,4
2016-01-03,8
2016-01-04,12
2016-01-05,16
2016-01-06,20
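The difference between the two return types can be checked directly. A minimal sketch that rebuilds the sample frame; the names `as_series` and `as_frame` are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame so the snippet runs on its own.
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

as_series = sample_df['A']    # single label: one-dimensional Series
as_frame = sample_df[['A']]   # list of labels: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```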


##### selection using slice
- remember: a positional slice runs up to, but does not include, the second index

In [20]:
sample_df.iloc[1:2,2:4] =[3,4]

In [34]:
sample_df[1:5]  # returns a DataFrame

Unnamed: 0,A,B,C,D
2016-01-02,4,5,5,4
2016-01-03,8,9,5,11
2016-01-04,12,13,14,15
2016-01-05,16,17,18,19


In [50]:
sample_df.loc[:,['A','B']]

Unnamed: 0,A,B
2016-01-01,0,1
2016-01-02,4,5
2016-01-03,8,9
2016-01-04,12,13
2016-01-05,16,17
2016-01-06,20,21


In [32]:
pd.Series([3,2])

0    3
1    2
dtype: int64

In [33]:
sample_df.iloc[1:3,2:3]=pd.Series([3,2])

AttributeError: 'Series' object has no attribute 'to_array'
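How a Series is handled on the right-hand side of a positional assignment can depend on the pandas version. One way to sidestep the question, sketched here under the assumption that only the values matter and not the Series index, is to assign a plain list or NumPy array of matching shape:

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame so the snippet runs on its own.
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# Rows 1-2 of column C (position 2), selected as one dimension: a flat list fits.
sample_df.iloc[1:3, 2] = [3, 2]

# The same rows of column D, selected as a 2x1 block: pass a 2x1 array.
sample_df.iloc[1:3, 3:4] = np.array([[30], [20]])

print(sample_df.iloc[1:3, 2:4])
```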

In [24]:
sample_df.iloc[1:2,2:3]  # slices on both axes return a DataFrame

Unnamed: 0,C
2016-01-02,3


In [29]:
sample_df.iloc[1:3,2:3]

Unnamed: 0,C
2016-01-02,5
2016-01-03,5


##### selection using date time index
- note: last index is included
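A small sketch of the contrast, using the same endpoints once as positions and once as dates. The names `by_position` and `by_label` are illustrative only:

```python
import numpy as np
import pandas as pd

dates_index = pd.date_range('20160101', periods=6)
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=dates_index,
    columns=list('ABCD'),
)

# Positional slice: position 3 is excluded, so two rows come back.
by_position = sample_df.iloc[1:3]

# Date slice between the same two endpoints: both are included, so three rows.
by_label = sample_df.loc[dates_index[1]:dates_index[3]]

print(len(by_position), len(by_label))
```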

### Selection by label
documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html

label-location based indexer for selection by label

##### Selecting using multi-axis by label
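A minimal sketch of selecting on both axes by label at once; the particular rows and columns picked here are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

dates_index = pd.date_range('20160101', periods=6)
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=dates_index,
    columns=list('ABCD'),
)

# All rows, two columns chosen by label.
all_rows = sample_df.loc[:, ['A', 'B']]

# Two specific rows and two specific columns, all chosen by label.
some_rows = sample_df.loc[[dates_index[0], dates_index[2]], ['A', 'B']]

print(some_rows)
```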

##### Label slicing, both endpoints are included

In [None]:
sample_df.loc['2016-01-01':'2016-01-03',['A','B']]

##### Reduce number of dimensions for returned object
- notice order of 'D' and 'B'

In [None]:
sample_df.loc['2016-01-03',['D','B']]

##### using result
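When a single row label is combined with a list of columns, the result is a Series indexed by the column names, so individual values can be read from it by label. A sketch on a freshly built frame (the name `row` is illustrative):

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# One row label plus a list of columns: the result drops to a Series.
row = sample_df.loc[pd.Timestamp('2016-01-03'), ['D', 'B']]

# The Series keeps the requested column order and can be indexed by label.
print(list(row.index))
print(row['D'], row['B'])
```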

##### select a scalar

In [None]:
sample_df.loc[dates_index[2], 'C']
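For scalar access, the `.at` and `.iat` accessors mentioned in the introductory note are label-based and position-based respectively. A brief sketch on a freshly built frame, so the value differs from the modified frame used earlier in this notebook:

```python
import numpy as np
import pandas as pd

dates_index = pd.date_range('20160101', periods=6)
sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=dates_index,
    columns=list('ABCD'),
)

# Label-based scalar access: row label and column label.
by_label = sample_df.at[dates_index[2], 'C']

# Position-based scalar access: row position 2, column position 2.
by_position = sample_df.iat[2, 2]

print(by_label, by_position)
```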

### Selection by Position
documentation: http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DataFrame.iloc.html

integer-location based indexing for selection by position

##### integer slices
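A minimal sketch of integer slices on both axes; the slice bounds are arbitrary and `block` is an illustrative name:

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# Rows at positions 3 and 4, columns at positions 0 and 1 (end positions excluded).
block = sample_df.iloc[3:5, 0:2]
print(block)
```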

##### lists of integers
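Lists of positions pick out non-adjacent rows and columns. A sketch with arbitrarily chosen positions (`picked` is an illustrative name):

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# Rows at positions 1, 2 and 4; columns at positions 0 and 2 (A and C).
picked = sample_df.iloc[[1, 2, 4], [0, 2]]
print(picked)
```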

##### slicing rows explicitly
implicitly selecting all columns
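A short sketch: a bare `:` on the column axis keeps every column while the rows are sliced by position (`rows` is an illustrative name):

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# Rows at positions 1 and 2, every column.
rows = sample_df.iloc[1:3, :]
print(rows)
```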

##### slicing columns explicitly
implicitly selecting all rows
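The mirror image, sketched the same way: a bare `:` on the row axis keeps every row while the columns are sliced by position (`cols` is an illustrative name):

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# Every row, columns at positions 1 and 2 (B and C).
cols = sample_df.iloc[:, 1:3]
print(cols)
```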

### Boolean Indexing
##### test based upon one column's data

In [61]:
condition = (sample_df['A'] > 3) & (sample_df['A'] < 10)  # vectorized test: 3 < A < 10
sample_df[condition]

Unnamed: 0,A,B,C,D
2016-01-02,4,5,5,4
2016-01-03,8,9,5,11


##### test based upon entire data set
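Applying a comparison to the whole frame yields a boolean DataFrame; indexing with it keeps the shape and replaces every cell that fails the test with NaN. A sketch on a freshly built frame, with the threshold 10 chosen arbitrarily:

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame(
    np.arange(24).reshape((6, 4)),
    index=pd.date_range('20160101', periods=6),
    columns=list('ABCD'),
)

# Boolean DataFrame with the same shape as sample_df.
mask = sample_df > 10

# Cells where the mask is False become NaN; the shape is unchanged.
filtered = sample_df[mask]
print(filtered)
```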

##### isin() method
documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.isin.html

Returns a boolean Series showing whether each element in the Series is exactly contained in the passed sequence of values.

In [None]:
sample_df_2 = sample_df.copy()
sample_df_2['Fruits'] = ['apple', 'orange','banana','strawberry','blueberry','pineapple']
sample_df_2

select rows where the 'Fruits' column contains either 'banana' or 'pineapple'; notice that 'smoothy', which is not in the column, simply matches nothing

In [None]:
sample_df_2[sample_df_2['Fruits'].isin(['banana','pineapple', 'smoothy'])]