## Chapter - 01 Data Preparation Basics

### Segment 1 - Filtering and selecting data

In [1]:
import numpy as np
import pandas as pd

from pandas import DataFrame, Series

### Selecting and retrieving data

We can refer to an element using either of two kinds of index value:

* Label index: the name assigned to the row, or
* Integer index: the row's zero-based position
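
The two forms can be sketched side by side. This is a minimal illustration that assumes a small, hypothetical Series `s` (not part of the original notebook) and uses the explicit `.loc` and `.iloc` accessors as one way to make clear which kind of index is meant.

```python
import numpy as np
import pandas as pd

# Hypothetical three-element Series, used only for this illustration.
s = pd.Series(np.arange(3), index=['a', 'b', 'c'])

# Label index: look up the row named 'b'.
by_label = s.loc['b']

# Integer index: look up the second row (positions are zero-based).
by_position = s.iloc[1]

print(by_label, by_position)
```

Both lookups reach the same element here; they differ only in how the row is identified.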

In [2]:
series_obj = Series(np.arange(8), index=['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5', 'Row 6', 'Row 7', 'Row 8'])
series_obj

Row 1    0
Row 2    1
Row 3    2
Row 4    3
Row 5    4
Row 6    5
Row 7    6
Row 8    7
dtype: int32

In [3]:
series_obj['Row 7']

6

In [4]:
# Integer (positional) index: use .iloc so the positions are not mistaken for labels
series_obj.iloc[[0, 7]]

Row 1    0
Row 8    7
dtype: int32

In [None]:
np.random.seed(25)

In [6]:
df_obj = DataFrame(np.random.rand(36).reshape((6,6)),
                    index = ['Row 1', 'Row 2', 'Row 3', 'Row 4', 'Row 5', 'Row 6'],
                    columns = ['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 5', 'Column 6']
                  )
df_obj

       Column 1  Column 2  Column 3  Column 4  Column 5  Column 6
Row 1  0.046422  0.847703  0.059907  0.385508  0.719010  0.737680
Row 2  0.683108  0.011868  0.494926  0.250323  0.512605  0.541458
Row 3  0.677192  0.094281  0.451835  0.767879  0.902613  0.273354
Row 4  0.136213  0.594894  0.133821  0.558877  0.355686  0.160324
Row 5  0.060624  0.001447  0.420349  0.560100  0.844170  0.030965
Row 6  0.230078  0.769978  0.471582  0.034863  0.468798  0.757796


In [8]:
df_obj.loc[['Row 2', 'Row 5'], ['Column 5', 'Column 2']]

       Column 5  Column 2
Row 2  0.512605  0.011868
Row 5  0.844170  0.001447
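
The same kind of row-and-column selection can also be written with integer positions. The sketch below assumes a hypothetical 6x6 DataFrame `df` filled with the integers 0 to 35 (rather than the random values above) so that the selected values are easy to verify by eye.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with predictable contents: row r, column c holds 6*r + c.
df = pd.DataFrame(
    np.arange(36).reshape((6, 6)),
    index=[f'Row {i}' for i in range(1, 7)],
    columns=[f'Column {i}' for i in range(1, 7)],
)

# Label-based selection: rows and columns are named.
by_label = df.loc[['Row 2', 'Row 5'], ['Column 5', 'Column 2']]

# Position-based selection of the same cells (zero-based positions).
by_position = df.iloc[[1, 4], [4, 1]]

print(by_label)
print(by_label.equals(by_position))
```

Note that the result follows the order in which the labels are listed, so `Column 5` appears before `Column 2`.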


### Data Slicing

Slicing selects and returns a contiguous run of values from a data set. Because slicing works with index values, it uses the same square brackets as ordinary indexing.

Slicing differs in that you pass two index values separated by a colon. The index value to the left of the colon is the first value you want to select, and the index value to the right is the last value you want to select. When you execute the code, the indexer finds the first record and the last record and returns them along with every record in between. With label indexes, as in the cell that follows, both endpoints are included in the result.

In [9]:
series_obj['Row 3':'Row 7']

Row 3    2
Row 4    3
Row 5    4
Row 6    5
Row 7    6
dtype: int32
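
One detail worth checking is how the endpoints are treated. The sketch below rebuilds a Series like `series_obj` under the hypothetical name `s` and compares a label slice with a positional slice; it assumes the explicit `.loc` and `.iloc` accessors.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for series_obj.
s = pd.Series(np.arange(8), index=[f'Row {i}' for i in range(1, 9)])

# Label slice: both 'Row 3' and 'Row 7' are included.
label_slice = s.loc['Row 3':'Row 7']

# Positional slice: the stop position (7) is excluded, as in plain Python,
# so positions 2..6 are returned, which are the same five rows.
position_slice = s.iloc[2:7]

print(label_slice.equals(position_slice))
```

Label slices include the stop label, while positional slices stop just before the stop position.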

### Comparing with Scalars

Now we turn to comparison operators and scalar values. In case you don't know what a scalar value is, it is simply a single numerical value. You can use comparison operators such as greater than (`>`) or less than (`<`) to return a True/False value for every record, indicating how each element compares to the scalar.

In [10]:
df_obj < 0.2

       Column 1  Column 2  Column 3  Column 4  Column 5  Column 6
Row 1      True     False      True     False     False     False
Row 2     False      True     False     False     False     False
Row 3     False      True     False     False     False     False
Row 4      True     False      True     False     False      True
Row 5      True      True     False     False     False      True
Row 6     False     False     False      True     False     False
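
The True/False result of a comparison is itself a DataFrame, so it can be summarized further. The sketch below assumes a small hypothetical frame `df` with hand-picked values and shows two possible follow-ups: counting matches per column and flagging rows with at least one match.

```python
import pandas as pd

# Hypothetical frame with hand-picked values for easy checking.
df = pd.DataFrame(
    {'A': [0.05, 0.50, 0.15], 'B': [0.90, 0.40, 0.10]},
    index=['r1', 'r2', 'r3'],
)

# Element-wise comparison against the scalar 0.2.
mask = df < 0.2

# True counts as 1, so summing gives the number of matches per column.
per_column = mask.sum()

# Rows in which at least one element is below 0.2.
any_small = mask.any(axis=1)

print(per_column)
print(any_small)
```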


### Filtering with Scalars

In [11]:
series_obj[series_obj > 6]

Row 8    7
dtype: int32
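
Filters can also combine more than one scalar comparison. The sketch below rebuilds a Series like `series_obj` under the hypothetical name `s`; it relies on the element-wise operators `&` (and) and `|` (or), with each comparison wrapped in parentheses because those operators bind more tightly than `>` and `<`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for series_obj.
s = pd.Series(np.arange(8), index=[f'Row {i}' for i in range(1, 9)])

# Keep values strictly between 2 and 6.
middle = s[(s > 2) & (s < 6)]

# Keep values below 1 or above 6.
edges = s[(s < 1) | (s > 6)]

print(middle)
print(edges)
```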

### Setting values with scalars

In [13]:
# Pass the labels as a list so that each one is treated as a separate row label
series_obj[['Row 1', 'Row 5', 'Row 8']] = 8
series_obj

Row 1    8
Row 2    1
Row 3    2
Row 4    3
Row 5    8
Row 6    5
Row 7    6
Row 8    8
dtype: int32
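
Scalar assignment also works with a boolean filter in place of a list of labels. The sketch below uses a hypothetical Series `s` built like `series_obj`, first setting three labelled rows through `.loc` and then setting every value below 3 to 0.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for series_obj.
s = pd.Series(np.arange(8), index=[f'Row {i}' for i in range(1, 9)])

# Set three rows, selected by label, to the scalar 8.
s.loc[['Row 1', 'Row 5', 'Row 8']] = 8

# Set every remaining value below 3 to the scalar 0.
s[s < 3] = 0

print(s)
```

Both assignments modify `s` in place, so work on a copy (`s.copy()`) if the original values are still needed.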