# Chapter 2 - Data Preparation Basics
## Segment 1 - Filtering and selecting data

In [4]:
import pandas as pd
import numpy as np

from pandas import Series, DataFrame

### Selecting and retrieving data
You can refer to a record by its index value in two ways:
- by label index, or
- by integer (positional) index

In [5]:
series_obj = Series(np.arange(8), index=['alo 1', 'alo 2', 'alo 3', 'alo 4', 'alo 5', 'alo 6', 'alo 7', 'alo 8'])
series_obj

alo 1    0
alo 2    1
alo 3    2
alo 4    3
alo 5    4
alo 6    5
alo 7    6
alo 8    7
dtype: int64

In [8]:
series_obj['alo 7']

6

In [9]:
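# Use .iloc for integer positions; plain [] with integers on a label-indexed Series is deprecated in recent pandas.
series_obj.iloc[[2, 7]]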

alo 3    2
alo 8    7
dtype: int64
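
The two index forms described above can be compared side by side. The following is a minimal sketch, assuming a recent pandas version in which `.loc` handles labels and `.iloc` handles integer positions; the names `by_label` and `by_position` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Rebuild the same Series as above: values 0..7 labelled 'alo 1'..'alo 8'.
labels = [f'alo {i}' for i in range(1, 9)]
series_obj = pd.Series(np.arange(8), index=labels)

# Label index: .loc looks records up by their index label.
by_label = series_obj.loc[['alo 3', 'alo 8']]

# Integer index: .iloc looks records up by their zero-based position.
by_position = series_obj.iloc[[2, 7]]

# Both selections return the same two records.
print(by_label.equals(by_position))
```

Being explicit about `.loc` or `.iloc` avoids ambiguity when an index itself contains integers.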

In [31]:
np.random.seed(42)
DF_obj = DataFrame(np.random.rand(36).reshape((6,6)),
                   index=['row 1', 'row 2', 'row 3', 'row 4','row 5','row 6'],
                   columns=['column 1','column 2','column 3','column 4','column 5','column 6'])
DF_obj

       column 1  column 2  column 3  column 4  column 5  column 6
row 1  0.374540  0.950714  0.731994  0.598658  0.156019  0.155995
row 2  0.058084  0.866176  0.601115  0.708073  0.020584  0.969910
row 3  0.832443  0.212339  0.181825  0.183405  0.304242  0.524756
row 4  0.431945  0.291229  0.611853  0.139494  0.292145  0.366362
row 5  0.456070  0.785176  0.199674  0.514234  0.592415  0.046450
row 6  0.607545  0.170524  0.065052  0.948886  0.965632  0.808397


In [32]:
DF_obj.loc[['row 2', 'row 5'], ['column 5', 'column 2']]

       column 5  column 2
row 2  0.020584  0.866176
row 5  0.592415  0.785176


In [33]:
DF_obj.iloc[2:5, 1:-1]

       column 2  column 3  column 4  column 5
row 3  0.212339  0.181825  0.183405  0.304242
row 4  0.291229  0.611853  0.139494  0.292145
row 5  0.785176  0.199674  0.514234  0.592415


### Data slicing
You can use slicing to select and return several consecutive values from a data set. Slicing relies on index values, so it uses the same square brackets as indexing.

The difference is that with slicing you pass two index values separated by a colon. The value to the left of the colon is the first record you want to select, and the value to the right is the last record you want to retrieve. With label indexes, as in the example below, the indexer finds the first and last records and returns both of them along with every record in between.

In [22]:
series_obj['alo 3': 'alo 7']

alo 3    2
alo 4    3
alo 5    4
alo 6    5
alo 7    6
dtype: int64
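
One detail worth checking is how the end of a slice is treated. The sketch below, which assumes standard pandas behaviour, contrasts a label slice (both endpoints included) with a positional slice (end position excluded); `label_slice` and `position_slice` are illustrative names.

```python
import numpy as np
import pandas as pd

labels = [f'alo {i}' for i in range(1, 9)]
series_obj = pd.Series(np.arange(8), index=labels)

# Label slice: 'alo 3' and 'alo 7' are both part of the result.
label_slice = series_obj.loc['alo 3':'alo 7']

# Positional slice: follows Python rules, so position 6 ('alo 7') is left out.
position_slice = series_obj.iloc[2:6]

print(len(label_slice), len(position_slice))
```

To get the same five records by position, the end of the slice has to be one past the last position wanted, that is `series_obj.iloc[2:7]`.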

### Comparing with scalars
This section covers comparison operators and scalar values. In case you don't know what a scalar value is, it's simply a single numerical value. You can use comparison operators such as greater than (`>`) or less than (`<`) to return a True/False value for every record, indicating how each element compares to the scalar.

In [23]:
DF_obj < .2

       column 1  column 2  column 3  column 4  column 5  column 6
row 1     False     False     False     False      True      True
row 2      True     False     False     False      True     False
row 3     False     False      True      True     False     False
row 4     False     False     False      True     False     False
row 5     False     False      True     False     False      True
row 6     False      True      True     False     False     False
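
A True/False table like the one above can also be summarised. As a hedged sketch, the code below rebuilds the same DataFrame (same seed) and counts the True values per column, relying on the fact that `True` is treated as 1 when summed; `mask`, `count_per_column` and `rows_with_small_value` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame used in this segment with the same seed.
np.random.seed(42)
DF_obj = pd.DataFrame(np.random.rand(36).reshape((6, 6)),
                      index=[f'row {i}' for i in range(1, 7)],
                      columns=[f'column {i}' for i in range(1, 7)])

# Element-wise comparison with a scalar gives a Boolean DataFrame.
mask = DF_obj < .2

# Summing Booleans counts the True entries in each column.
count_per_column = mask.sum()

# any(axis=1) flags rows that contain at least one value below the scalar.
rows_with_small_value = mask.any(axis=1)

print(count_per_column)
print(rows_with_small_value)
```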


### Filtering with scalars

In [24]:
series_obj[series_obj > 6]

alo 8    7
dtype: int64

In [25]:
series_obj[series_obj > 2]

alo 4    3
alo 5    4
alo 6    5
alo 7    6
alo 8    7
dtype: int64
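
Filters like the two above can be combined. The following sketch assumes the original, unmodified Series and uses the element-wise operators `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than the comparisons. The names `in_range` and `extremes` are illustrative.

```python
import numpy as np
import pandas as pd

labels = [f'alo {i}' for i in range(1, 9)]
series_obj = pd.Series(np.arange(8), index=labels)

# Keep records greater than 2 AND less than 6.
in_range = series_obj[(series_obj > 2) & (series_obj < 6)]

# Keep records less than 1 OR greater than 6.
extremes = series_obj[(series_obj < 1) | (series_obj > 6)]

print(in_range)
print(extremes)
```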

### Setting values with scalars

In [27]:
# Pass the labels as a list so that each one is matched as a separate index label.
series_obj[['alo 1', 'alo 5', 'alo 8']] = 8
series_obj

alo 1    8
alo 2    1
alo 3    2
alo 4    3
alo 5    8
alo 6    5
alo 7    6
alo 8    8
dtype: int64

Filtering and selecting with pandas are among the most fundamental tasks in data analysis. Make sure you know how to use indexing to select and retrieve records.

In [34]:
DF_obj

       column 1  column 2  column 3  column 4  column 5  column 6
row 1  0.374540  0.950714  0.731994  0.598658  0.156019  0.155995
row 2  0.058084  0.866176  0.601115  0.708073  0.020584  0.969910
row 3  0.832443  0.212339  0.181825  0.183405  0.304242  0.524756
row 4  0.431945  0.291229  0.611853  0.139494  0.292145  0.366362
row 5  0.456070  0.785176  0.199674  0.514234  0.592415  0.046450
row 6  0.607545  0.170524  0.065052  0.948886  0.965632  0.808397


In [35]:
DF_obj.iloc[2:5, 1:-1] = 8
DF_obj

       column 1  column 2  column 3  column 4  column 5  column 6
row 1  0.374540  0.950714  0.731994  0.598658  0.156019  0.155995
row 2  0.058084  0.866176  0.601115  0.708073  0.020584  0.969910
row 3  0.832443  8.000000  8.000000  8.000000  8.000000  0.524756
row 4  0.431945  8.000000  8.000000  8.000000  8.000000  0.366362
row 5  0.456070  8.000000  8.000000  8.000000  8.000000  0.046450
row 6  0.607545  0.170524  0.065052  0.948886  0.965632  0.808397
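
Setting values with a scalar also works with a Boolean filter instead of a positional slice. The sketch below is an illustration under stated assumptions rather than part of the original notebook: it rebuilds the DataFrame with the same seed and works on a copy (the hypothetical name `df_copy`) so that the original values stay available for comparison.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
DF_obj = pd.DataFrame(np.random.rand(36).reshape((6, 6)),
                      index=[f'row {i}' for i in range(1, 7)],
                      columns=[f'column {i}' for i in range(1, 7)])

# Work on a copy so the assignment does not alter DF_obj itself.
df_copy = DF_obj.copy()

# Replace every value below 0.2 with the scalar 0.
df_copy[df_copy < .2] = 0

print(df_copy)
```

Because the assignment changes data in place, copying first is a cheap way to keep the untouched values around while experimenting.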
