___

<a href='https://www.prosperousheart.com/'> <img src='files/learn to code online.png' /></a>
___

In [1]:
import numpy as np
import pandas as pd
from numpy.random import randn

In [2]:
np.random.seed(101)
df = pd.DataFrame(randn(5,4), ["A", "B", "C", "D", "E"], ["W", "X", "Y", "Z"]) # there will be 5 rows & 4 columns as per randn
df

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


# Conditional Selection

**Conditional selection** means choosing data based on a condition, using bracket notation. Comparing a whole DataFrame against a value returns a DataFrame of boolean values.

## Single Conditional Statements

In [4]:
booldf = df < 0
booldf

Unnamed: 0,W,X,Y,Z
A,False,False,False,False
B,False,True,True,False
C,True,False,False,True
D,False,True,True,False
E,False,False,False,False


What do you think will happen if we pass this boolean DataFrame into **df** using bracket notation?

In [5]:
df[booldf]

Unnamed: 0,W,X,Y,Z
A,,,,
B,,-0.319318,-0.848077,
C,-2.018168,,,-0.589001
D,,-0.758872,-0.933237,
E,,,,


In [6]:
# Condense the above two lines into:
df[df < 0]

Unnamed: 0,W,X,Y,Z
A,,,,
B,,-0.319318,-0.848077,
C,-2.018168,,,-0.589001
D,,-0.758872,-0.933237,
E,,,,


Comparing an entire DataFrame against a value is not common. It is more likely you will compare a single column (a Series). When you pass the resulting boolean Series into the DataFrame, it does not fill the non-matching cells with **NaN** (null); it returns the subset of rows where the condition is True.

In [8]:
# When doing comparison on a column (Series) you will get a series back
print(df)
df['X'] > 0

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509


A     True
B    False
C     True
D    False
E     True
Name: X, dtype: bool

You can now use this Series of boolean values to select rows based on a column's value.

In [9]:
# Pass the series into a DF using bracket notation
df[df['X'] > 0]

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
C,-2.018168,0.740122,0.528813,-0.589001
E,0.190794,1.978757,2.605967,0.683509


You will notice that it only returned rows that were True in the original `df['X'] > 0` series.

No nulls or **NaN** values show up because the rows where the condition is False are not returned at all.
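
To make the contrast concrete, here is a minimal sketch comparing the two kinds of mask on the same `df` built at the top of this notebook. The names `masked`, `filtered` and `same_as_where` are only illustrative. It also shows that indexing with a boolean DataFrame gives the same result as `df.where(...)`.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), ["A", "B", "C", "D", "E"], ["W", "X", "Y", "Z"])

# Boolean DataFrame mask: shape is kept, non-matching cells become NaN
masked = df[df < 0]

# Boolean Series mask: whole rows are kept or dropped
filtered = df[df["X"] > 0]

print(masked.shape)    # same shape as df
print(filtered.shape)  # fewer rows, same columns

# Indexing with a boolean DataFrame matches DataFrame.where
same_as_where = masked.equals(df.where(df < 0))
print(same_as_where)
```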

Because the result is itself a DataFrame, you can keep indexing into it or call methods on it.

In [14]:
# return only the rows where W is > 0
print(df)
resultdf = df[df['W'] > 0]
resultdf

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509


Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [15]:
resultdf['X']

A    0.628133
B   -0.319318
D   -0.758872
E    1.978757
Name: X, dtype: float64

In [16]:
# You can do all the above 2 commands in one step!
#     resultdf = df[df['W'] > 0]
#     resultdf['X']
df[df['W'] > 0]['X']

A    0.628133
B   -0.319318
D   -0.758872
E    1.978757
Name: X, dtype: float64

In [17]:
# can also get back a DF
#     boolSer = df['W'] > 0
#     result = df[boolSer]
#     my_cols = ['Y', 'X']
#     result[my_cols]
df[df['W'] > 0][['Y','X']]

Unnamed: 0,Y,X
A,0.907969,0.628133
B,-0.848077,-0.319318
D,-0.933237,-0.758872
E,2.605967,1.978757
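
As an aside, the same selections can be written in a single indexing step with `.loc`, which takes the row mask and the column labels together. This is only a sketch of an alternative spelling, not something the cells above rely on; the names `mask`, `chained`, `single` and `one_col` are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), ["A", "B", "C", "D", "E"], ["W", "X", "Y", "Z"])

mask = df["W"] > 0

# Two bracket operations chained together, as in the cell above
chained = df[mask][["Y", "X"]]

# One .loc call: rows selected by the mask, columns selected by label
single = df.loc[mask, ["Y", "X"]]
print(single.equals(chained))

# A single column label gives back a Series instead of a DataFrame
one_col = df.loc[mask, "X"]
print(type(one_col).__name__)
```

Both forms return the same data when reading. The `.loc` form is usually the safer habit if you later want to assign values to the selection.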


## Multiple Conditional Statements

Python's built-in **and** and **or** operators cannot compare one Series to another element by element; they only work on single boolean values. To combine boolean Series, use the **&** (and) or **|** (or) operator, and wrap each comparison in parentheses.

`df[(comparison1) & (comparison2) ...]` or `df[(comparison1) | (comparison2) ...]`
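
A quick sketch of what goes wrong with the keyword operators, using the same `df` as above. With `and`, Python asks each Series for a single truth value, which pandas refuses to give, so a `ValueError` is raised. The parentheses matter too, because `&` and `|` bind more tightly than `>` and `<`. The names `raised` and `both` are just for illustration.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), ["A", "B", "C", "D", "E"], ["W", "X", "Y", "Z"])

raised = False
try:
    # `and` needs one True/False per operand, but each side is a whole Series
    df[(df["W"] > 0) and (df["Y"] < 0)]
except ValueError as err:
    raised = True
    print("ValueError:", err)

# `&` combines the two boolean Series element by element
both = df[(df["W"] > 0) & (df["Y"] < 0)]
print(list(both.index))
```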

In [19]:
df['W'] > 0

A     True
B     True
C    False
D     True
E     True
Name: W, dtype: bool

In [20]:
df['Y'] < 0

A    False
B     True
C    False
D     True
E    False
Name: Y, dtype: bool

In [21]:
(df['W'] > 0) & (df['Y'] < 0)

A    False
B     True
C    False
D     True
E    False
dtype: bool

In [18]:
print(df)
df[(df['W'] > 0) & (df['Y'] < 0)]

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509


Unnamed: 0,W,X,Y,Z
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057


In [22]:
(df['W'] > 0) | (df['Y'] < 0)

A     True
B     True
C    False
D     True
E     True
dtype: bool

In [25]:
print(df)
df[(df['W'] > 0) | (df['Y'] < 0)]

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509


Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


# Index

## Resetting Index

To reset the index back to the default (integers from 0 up to the number of rows minus 1), simply call: `df.reset_index()`

<div class="alert alert-block alert-warning">Take note of what happened in the DataFrame: the old index becomes a new column named <b>index</b>.</div>

In [26]:
df

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [27]:
df.reset_index() # returns a new DataFrame; df itself is unchanged unless you pass inplace=True

Unnamed: 0,index,W,X,Y,Z
0,A,2.70685,0.628133,0.907969,0.503826
1,B,0.651118,-0.319318,-0.848077,0.605965
2,C,-2.018168,0.740122,0.528813,-0.589001
3,D,0.188695,-0.758872,-0.933237,0.955057
4,E,0.190794,1.978757,2.605967,0.683509
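
A small sketch of the two common ways to use `reset_index`, assuming the same `df` as above. By default the old labels are kept as a column called `index`; passing `drop=True` discards them instead. The names `reset` and `dropped` are illustrative.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), ["A", "B", "C", "D", "E"], ["W", "X", "Y", "Z"])

# Keep the old labels as a regular column named "index"
reset = df.reset_index()
print(list(reset.columns))

# Throw the old labels away entirely
dropped = df.reset_index(drop=True)
print(list(dropped.columns))

# The original DataFrame still has its A..E labels
print(list(df.index))
```

Assigning the result to a new name (or back to `df`) is how you keep the change without using `inplace=True`.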


## Changing Index To Something Else

If you wish to change the index, you need to ensure the list you use has the same number of entries as the DataFrame has rows.
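
To illustrate the length requirement, here is a hedged sketch that assigns a list directly to the `index` attribute of a copy of `df`. Direct assignment is one way to relabel rows and is not the approach used in the cells that follow; `relabeled` and `mismatch_raised` are illustrative names.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), ["A", "B", "C", "D", "E"], ["W", "X", "Y", "Z"])

new_idx = "TX AL DE GA MI".split()

# Work on a copy so the original df keeps its A..E labels
relabeled = df.copy()
relabeled.index = new_idx  # 5 labels for 5 rows: accepted
print(list(relabeled.index))

mismatch_raised = False
try:
    relabeled.index = ["TX", "AL", "DE"]  # 3 labels for 5 rows
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```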

### Have A Column Already In DataFrame You Want As Index

`df.set_index(col_name)` returns a new DataFrame; it does not change **df** in place unless you set **inplace** to True.

In [31]:
new_idx = "TX AL DE GA MI".split()
new_idx

['TX', 'AL', 'DE', 'GA', 'MI']

In [32]:
df['States'] = new_idx
df

Unnamed: 0,W,X,Y,Z,States
A,2.70685,0.628133,0.907969,0.503826,TX
B,0.651118,-0.319318,-0.848077,0.605965,AL
C,-2.018168,0.740122,0.528813,-0.589001,DE
D,0.188695,-0.758872,-0.933237,0.955057,GA
E,0.190794,1.978757,2.605967,0.683509,MI


In [34]:
df.set_index('States')

States,W,X,Y,Z
TX,2.70685,0.628133,0.907969,0.503826
AL,0.651118,-0.319318,-0.848077,0.605965
DE,-2.018168,0.740122,0.528813,-0.589001
GA,0.188695,-0.758872,-0.933237,0.955057
MI,0.190794,1.978757,2.605967,0.683509
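
Since `set_index` hands back a new DataFrame, a common pattern is to store the result under a name and then select rows by the new labels. The sketch below assumes the same `df` and `States` column as above; `by_state` is an illustrative name.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(101)
df = pd.DataFrame(randn(5, 4), ["A", "B", "C", "D", "E"], ["W", "X", "Y", "Z"])
df["States"] = "TX AL DE GA MI".split()

# Keep the re-indexed result under its own name
by_state = df.set_index("States")

# The column has moved into the index, so it is no longer a regular column
print(by_state.index.name)
print(list(by_state.columns))

# Rows can now be selected by state label
print(by_state.loc["GA"])

# df itself still has the A..E index and the States column
print(list(df.index))
```

Note that the old A..E labels are gone from `by_state`. Unlike `reset_index`, `set_index` does not keep the previous index as a column.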
