## 1.1 Filter and Select data

* use `pandas`  

### Indexing in pandas 

* **index**: a list of integers or labels that uniquely identifies rows or columns
* select by index with either
    * the `.loc[]` indexer, which is label-based
    * the `.iloc[]` indexer, which is position-based
    
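The difference between the two indexers is easiest to see when the labels are integers that do not match the row positions. The sketch below uses a small hypothetical Series (`s` and its values are made up for illustration, not part of the demo that follows):

```python
import pandas as pd

# Hypothetical Series whose integer labels differ from the row positions.
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_label = s.loc[20]      # the row labelled 20
by_position = s.iloc[2]   # the third row (position 2)

print(by_label, by_position)
```

Here `.loc[20]` looks up the label `20`, while `.iloc[2]` counts rows from zero regardless of their labels.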
### code demo

#### import libraries and define `pd.Series` object

In [1]:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame

In [2]:
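# pass the list of labels directly; wrapping it in another list would build a MultiIndex
series_obj = Series(np.arange(8), index='row1 row2 row3 row4 row5 row6 row7 row8'.split())
series_obj

row1    0
row2    1
row3    2
row4    3
row5    4
row6    5
row7    6
row8    7
dtype: int32

#### label index `df.loc[]`

In [3]:
# a single label returns the scalar stored at that label
series_obj.loc['row7']

6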

#### integer index `df.iloc[]`

In [4]:
series_obj.iloc[[0,7]]

row1    0
row8    7
dtype: int32

#### define `pd.DataFrame` object

In [5]:
np.random.seed(25)
df_obj = DataFrame(np.random.rand(36).reshape((6,6)), 
                   index='row1 row2 row3 row4 row5 row6'.split(), 
                   columns='col1 col2 col3 col4 col5 col6'.split())
df_obj

| | col1 | col2 | col3 | col4 | col5 | col6 |
|---|---|---|---|---|---|---|
| row1 | 0.870124 | 0.582277 | 0.278839 | 0.185911 | 0.411100 | 0.117376 |
| row2 | 0.684969 | 0.437611 | 0.556229 | 0.367080 | 0.402366 | 0.113041 |
| row3 | 0.447031 | 0.585445 | 0.161985 | 0.520719 | 0.326051 | 0.699186 |
| row4 | 0.366395 | 0.836375 | 0.481343 | 0.516502 | 0.383048 | 0.997541 |
| row5 | 0.514244 | 0.559053 | 0.034450 | 0.719930 | 0.421004 | 0.436935 |
| row6 | 0.281701 | 0.900274 | 0.669612 | 0.456069 | 0.289804 | 0.525819 |


#### use label-based `df.loc[ ['rows'], ['columns'] ]` indexer

In [6]:
df_obj.loc[['row2', 'row5'], ['col5', 'col2']]

| | col5 | col2 |
|---|---|---|
| row2 | 0.402366 | 0.437611 |
| row5 | 0.421004 | 0.559053 |

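The same selection can be written with positions instead of labels. A minimal sketch, assuming a stand-in frame `df` that reuses the row and column names above but holds simple integers (an illustrative substitute, not the random `df_obj` values):

```python
import numpy as np
import pandas as pd

# Stand-in frame with the same row/column naming scheme as df_obj.
df = pd.DataFrame(np.arange(36).reshape((6, 6)),
                  index='row1 row2 row3 row4 row5 row6'.split(),
                  columns='col1 col2 col3 col4 col5 col6'.split())

# Label-based: rows row2 and row5, columns col5 and col2 (in that order).
by_label = df.loc[['row2', 'row5'], ['col5', 'col2']]

# Position-based: the same rows and columns, counted from zero.
by_position = df.iloc[[1, 4], [4, 1]]

print(by_label.equals(by_position))  # True
```

Both indexers accept the lists in any order, so the result columns follow the order requested rather than the order in the frame.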

### data slicing

#### label-based `df.loc[]` with `:`

In [7]:
# [starting label : ending label], both endpoints are included
series_obj.loc['row3':'row7']

row3    2
row4    3
row5    4
row6    5
row7    6
dtype: int32

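Label slices include the end label, whereas position slices with `.iloc[]` follow the usual Python rule and stop before the end position. A short sketch of the contrast, rebuilding an equivalent Series under the hypothetical name `s`:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(8),
              index='row1 row2 row3 row4 row5 row6 row7 row8'.split())

label_slice = s.loc['row3':'row7']   # row3 through row7, end label included
position_slice = s.iloc[2:6]         # positions 2, 3, 4, 5; position 6 excluded

print(len(label_slice), len(position_slice))  # 5 4
```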
### comparing with scalars

In [8]:
df_obj < .2

| | col1 | col2 | col3 | col4 | col5 | col6 |
|---|---|---|---|---|---|---|
| row1 | False | False | False | True | False | True |
| row2 | False | False | False | False | False | True |
| row3 | False | False | True | False | False | False |
| row4 | False | False | False | False | False | False |
| row5 | False | False | True | False | False | False |
| row6 | False | False | False | False | False | False |


### filtering with scalars

In [9]:
series_obj[series_obj > 6]

row8    7
dtype: int32

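The same boolean-mask pattern filters the rows of a DataFrame. The sketch below uses a small hypothetical frame (`df` and its values are invented for illustration) and also shows how two conditions might be combined:

```python
import pandas as pd

# Hypothetical data, not the df_obj values used elsewhere in these notes.
df = pd.DataFrame({'col1': [0.9, 0.1, 0.5], 'col2': [0.2, 0.8, 0.6]},
                  index=['row1', 'row2', 'row3'])

# Keep only the rows where col1 exceeds 0.4.
high_col1 = df[df['col1'] > 0.4]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
both = df[(df['col1'] > 0.4) & (df['col2'] > 0.5)]

print(list(high_col1.index), list(both.index))
```

The parentheses matter because `&` binds more tightly than the comparison operators in Python.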
### setting values with scalars

In [10]:
# pass the labels as a list; a bare tuple is not treated as three separate labels
series_obj[['row1', 'row5', 'row8']] = 8
series_obj

row1    8
row2    1
row3    2
row4    3
row5    8
row6    5
row7    6
row8    8
dtype: int32