# [Pandas Cookbook](https://pandas.pydata.org/pandas-docs/stable/cookbook.html)

In [1]:
import pandas as pd
import numpy as np

## [Idioms](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#idioms)

### [If-then-else](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#if-then)

[If-then and if-then-else on one column, with assignment to one or more other columns:](https://stackoverflow.com/questions/17128302/python-pandas-idiom-for-if-then-else)

In [2]:
df = pd.DataFrame({'AAA' : [4,5,6,7],
                   'BBB' : [10,20,30,40],
                   'CCC' : [100,50,-30,-50]})
df

Unnamed: 0,AAA,BBB,CCC
0,4,10,100
1,5,20,50
2,6,30,-30
3,7,40,-50


Execute an if-then statement on one column:

In [3]:
# If AAA >= 5, BBB = -1
df.loc[df.AAA >= 5, 'BBB'] = -1; df

Unnamed: 0,AAA,BBB,CCC
0,4,10,100
1,5,-1,50
2,6,-1,-30
3,7,-1,-50


If-then with assignment to 2 columns:

In [4]:
df.loc[df.AAA >= 5, ['BBB','CCC']] = 555; df

Unnamed: 0,AAA,BBB,CCC
0,4,10,100
1,5,555,555
2,6,555,555
3,7,555,555


Add another line with different logic to handle the else case (here, the rows where AAA < 5):

In [5]:
df.loc[df.AAA < 5, ['BBB', 'CCC']] = 2000; df

Unnamed: 0,AAA,BBB,CCC
0,4,2000,2000
1,5,555,555
2,6,555,555
3,7,555,555
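
Taken together, the two assignments above form an if-then-else. As a minimal sketch, assuming the same starting frame, the condition can be computed once and its negation reused for the else branch. The name `is_high` is illustrative and not part of the cookbook:

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})

# Compute the condition once; `is_high` is an illustrative name.
is_high = df['AAA'] >= 5

# "then" branch: rows where the condition holds
df.loc[is_high, ['BBB', 'CCC']] = 555
# "else" branch: the remaining rows, selected with the negated mask
df.loc[~is_high, ['BBB', 'CCC']] = 2000
```

Storing the mask first also protects against the case where the "then" assignment changes the column the condition is based on.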


Or use pandas' where() after setting up a mask:

In [6]:
df_mask = pd.DataFrame({'AAA' : [True] * 4, 'BBB' : [False] * 4, 'CCC' : [True,False] * 2})
df_mask

Unnamed: 0,AAA,BBB,CCC
0,True,False,True
1,True,False,False
2,True,False,True
3,True,False,False


In [7]:
df.where(df_mask, -1000)

Unnamed: 0,AAA,BBB,CCC
0,4,-1000,2000
1,5,-1000,-1000
2,6,-1000,555
3,7,-1000,-1000
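
To make the semantics of where() concrete: it keeps values where the mask is True and replaces the rest. The sketch below, which starts from a fresh copy of the original frame rather than the modified one above, also shows the complementary `DataFrame.mask`, which replaces values where the condition is True. The names `kept` and `same` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})
df_mask = pd.DataFrame({'AAA': [True] * 4,
                        'BBB': [False] * 4,
                        'CCC': [True, False] * 2})

# where(): keep values where the mask is True, replace the rest
kept = df.where(df_mask, -1000)

# mask(): replace values where the condition is True, so negating
# the mask gives the same result as where()
same = df.mask(~df_mask, -1000)
```

Both calls return a new frame and leave `df` unchanged.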


[Use numpy's where() to perform an if-then-else operation](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column).  
In other words, the conditional creation of a DataFrame column:

In [8]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,-50]}); df

Unnamed: 0,AAA,BBB,CCC
0,4,10,100
1,5,20,50
2,6,30,-30
3,7,40,-50


In [9]:
# New column -- if AAA > 5, then high, else low:
df['logic'] = np.where(df['AAA'] > 5,'high','low'); df

Unnamed: 0,AAA,BBB,CCC,logic
0,4,10,100,low
1,5,20,50,low
2,6,30,-30,high
3,7,40,-50,high
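
np.where() handles a two-way choice. For more than two outcomes, one possible approach is numpy's select(), sketched below. The `level` column, the thresholds, and the labels are assumptions made for illustration and do not come from the cookbook:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7]})

# Conditions are checked in order; the first one that matches wins.
conditions = [df['AAA'] > 6, df['AAA'] > 4]
choices = ['high', 'medium']

# Rows matching no condition get the default label.
df['level'] = np.select(conditions, choices, default='low')
```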


### [Splitting](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#splitting)

[Split a frame based on a boolean value:](https://stackoverflow.com/questions/14957116/how-to-split-a-dataframe-according-to-a-boolean-criterion)

In [10]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,-50]}); df

Unnamed: 0,AAA,BBB,CCC
0,4,10,100
1,5,20,50
2,6,30,-30
3,7,40,-50


In [11]:
df_low = df[df.AAA <= 5]; df_low

Unnamed: 0,AAA,BBB,CCC
0,4,10,100
1,5,20,50


In [12]:
df_high = df[df.AAA > 5]; df_high

Unnamed: 0,AAA,BBB,CCC
2,6,30,-30
3,7,40,-50
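
The two selections above evaluate complementary conditions separately. As an alternative sketch, not the cookbook's method, the frame can be split in one pass by grouping on the boolean criterion. The `parts` dictionary is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})

# Group rows by the boolean criterion; each group is one part of the split.
parts = {bool(key): part for key, part in df.groupby(df['AAA'] > 5)}

df_low = parts[False]   # rows where AAA <= 5
df_high = parts[True]   # rows where AAA > 5
```

Note that a key is missing from `parts` if no row falls on that side of the criterion, so the plain boolean indexing above is the safer choice when a side may be empty.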


### [Building Criteria](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#building-criteria)

[Select with multi-column criteria:](https://stackoverflow.com/questions/15315452/selecting-with-complex-criteria-from-pandas-dataframe)

In [13]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df

Unnamed: 0,AAA,BBB,CCC
0,4,10,100
1,5,20,50
2,6,30,-30
3,7,40,-50


In [14]:
# And operation without assignment:
newseries = df.loc[(df['BBB'] < 25) & (df['CCC'] >= -40), 'AAA']; newseries

0    4
1    5
Name: AAA, dtype: int64

In [18]:
# Or operation without assignment:
newseries = df.loc[(df['BBB'] > 25) | (df['CCC'] >= -40), 'AAA']; newseries

0    4
1    5
2    6
3    7
Name: AAA, dtype: int64

In [19]:
# Or operation with assignment modifies the dataframe in place
# (column AAA is upcast from int64 to float to hold 0.1):
df.loc[(df['BBB'] > 25) | (df['CCC'] >= 75), 'AAA'] = 0.1; df

Unnamed: 0,AAA,BBB,CCC
0,0.1,10,100
1,5.0,20,50
2,0.1,30,-30
3,0.1,40,-50
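
In the selections above, each comparison is parenthesized because `&` and `|` bind more tightly than comparison operators in Python. As a sketch of an alternative spelling, `DataFrame.query` accepts the criteria as a string using `and` / `or`. It is applied here to a fresh copy of the original frame, and the result names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'AAA': [4, 5, 6, 7],
                   'BBB': [10, 20, 30, 40],
                   'CCC': [100, 50, -30, -50]})

# Same "and" selection as above, written as a query string
and_series = df.query('BBB < 25 and CCC >= -40')['AAA']

# Same "or" selection as above
or_series = df.query('BBB > 25 or CCC >= -40')['AAA']
```

query() only selects rows. To assign to the matching rows, the df.loc form shown above is still needed.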


[Select the rows with data that's closest to a target value:](https://stackoverflow.com/questions/17758023/return-rows-in-a-dataframe-closest-to-a-user-defined-number)