# [Pandas Cookbook](https://pandas.pydata.org/pandas-docs/stable/cookbook.html)

In [102]:
import pandas as pd
import numpy as np

## [Idioms](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#idioms)

### [If-then-else](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#if-then)

[Override calculations and reassign variables:](https://stackoverflow.com/questions/17128302/python-pandas-idiom-for-if-then-else)

In [103]:
df = pd.DataFrame({'AAA' : [4,5,6,7],
                  'BBB' : [10,20,30,40],
                  'CCC' : [100,50,-30,-50]})
df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | -50 |


Execute an if-then statement on one column:

In [104]:
# If AAA >= 5, BBB = -1
df.loc[df.AAA >= 5, 'BBB'] = -1; df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | -1 | 50 |
| 2 | 6 | -1 | -30 |
| 3 | 7 | -1 | -50 |


If-then with assignment to 2 columns:

In [105]:
df.loc[df.AAA >= 5, ['BBB','CCC']] = 555; df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 555 | 555 |
| 2 | 6 | 555 | 555 |
| 3 | 7 | 555 | 555 |


Add another line with different logic to handle the else case (here, the first row, where AAA < 5):

In [106]:
df.loc[df.AAA < 5, ['BBB', 'CCC']] = 2000; df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 2000 | 2000 |
| 1 | 5 | 555 | 555 |
| 2 | 6 | 555 | 555 |
| 3 | 7 | 555 | 555 |


You can also use pandas `where` after setting up a mask:

In [107]:
df_mask = pd.DataFrame({'AAA' : [True] * 4, 'BBB' : [False] * 4, 'CCC' : [True,False] * 2})
df_mask

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | True | False | True |
| 1 | True | False | False |
| 2 | True | False | True |
| 3 | True | False | False |


In [108]:
df.where(df_mask, -1000)

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | -1000 | 2000 |
| 1 | 5 | -1000 | -1000 |
| 2 | 6 | -1000 | 555 |
| 3 | 7 | -1000 | -1000 |
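
`DataFrame.where` keeps each value where the mask is `True` and substitutes the second argument elsewhere; `DataFrame.mask` does the opposite. As a minimal sketch (the names `demo`, `demo_mask`, `kept`, and `same` are illustrative and not part of the cookbook, and the frame is rebuilt from the original values so the block stands alone), the two calls below should produce the same frame:

```python
import pandas as pd

# Hypothetical standalone frame and mask, mirroring the ones above
demo = pd.DataFrame({'AAA': [4, 5, 6, 7],
                     'BBB': [10, 20, 30, 40],
                     'CCC': [100, 50, -30, -50]})
demo_mask = pd.DataFrame({'AAA': [True] * 4,
                          'BBB': [False] * 4,
                          'CCC': [True, False] * 2})

# where(): keep values where the mask is True, use -1000 elsewhere
kept = demo.where(demo_mask, -1000)

# mask(): replace values where the condition is True, so invert the mask
same = demo.mask(~demo_mask, -1000)
```

Which of the two reads better usually depends on whether the condition describes the values to keep or the values to overwrite.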


[Use numpy's where() to perform an if-then-else operation](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column).  
In other words, the conditional creation of a DataFrame column:

In [109]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,-50]}); df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | -50 |


In [110]:
# New column -- if AAA > 5, then high, else low:
df['logic'] = np.where(df['AAA'] > 5,'high','low'); df

|   | AAA | BBB | CCC | logic |
|---|---|---|---|---|
| 0 | 4 | 10 | 100 | low |
| 1 | 5 | 20 | 50 | low |
| 2 | 6 | 30 | -30 | high |
| 3 | 7 | 40 | -50 | high |
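
`np.where` covers a single if-then-else. For more than two branches, one option is `np.select`, which takes a list of conditions and a matching list of choices. The sketch below is an illustration rather than part of the cookbook: the `mid` label, the thresholds, and the name `demo` are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical standalone frame with the same AAA values as above
demo = pd.DataFrame({'AAA': [4, 5, 6, 7]})

# Conditions are checked in order; the first one that matches wins
conditions = [demo['AAA'] < 5, demo['AAA'] < 7]
choices = ['low', 'mid']

# Rows that match no condition receive the default label
demo['logic'] = np.select(conditions, choices, default='high')
```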


### [Splitting](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#splitting)

[Split a frame based on a boolean value:](https://stackoverflow.com/questions/14957116/how-to-split-a-dataframe-according-to-a-boolean-criterion)

In [111]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,-50]}); df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | -50 |


In [112]:
df_low = df[df.AAA <= 5]; df_low

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |


In [113]:
df_high = df[df.AAA > 5]; df_high

|   | AAA | BBB | CCC |
|---|---|---|---|
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | -50 |
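
Writing the two conditions by hand (`<= 5` and `> 5`) works, but the halves only stay complementary if both expressions are kept in sync. A minimal sketch of one alternative, assuming illustrative names (`demo`, `is_high`), is to build the boolean mask once and negate it for the other half:

```python
import pandas as pd

# Hypothetical standalone frame with the same values as above
demo = pd.DataFrame({'AAA': [4, 5, 6, 7],
                     'BBB': [10, 20, 30, 40],
                     'CCC': [100, 50, -30, -50]})

# Build the criterion once
is_high = demo['AAA'] > 5

# Select with the mask and with its negation
demo_high = demo[is_high]
demo_low = demo[~is_high]
```

Because both halves come from the same mask, every row lands in exactly one of them.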


### [Building Criteria](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#building-criteria)

[Select with multi-column criteria:](https://stackoverflow.com/questions/15315452/selecting-with-complex-criteria-from-pandas-dataframe)

In [114]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | -50 |


In [115]:
# And operation without assignment:
newseries = df.loc[(df['BBB'] < 25) & (df['CCC'] >= -40), 'AAA']; newseries

0    4
1    5
Name: AAA, dtype: int64

In [116]:
# Or operation without assignment:
newseries = df.loc[(df['BBB'] > 25) | (df['CCC'] >= -40), 'AAA']; newseries

0    4
1    5
2    6
3    7
Name: AAA, dtype: int64
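
The same And/Or selections can also be written as strings with `DataFrame.query`, which some readers find easier to scan than chained `&` and `|` with parentheses. This is a sketch of an alternative spelling, not the cookbook's method; `demo`, `and_series`, and `or_series` are illustrative names.

```python
import pandas as pd

# Hypothetical standalone frame with the same values as above
demo = pd.DataFrame({'AAA': [4, 5, 6, 7],
                     'BBB': [10, 20, 30, 40],
                     'CCC': [100, 50, -30, -50]})

# And operation: both conditions must hold
and_series = demo.query('BBB < 25 and CCC >= -40')['AAA']

# Or operation: either condition may hold
or_series = demo.query('BBB > 25 or CCC >= -40')['AAA']
```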

In [117]:
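# Or operation with assignment modifies the dataframe in place.
# Assigning 0.1 upcasts the integer column AAA to float64; newer pandas versions warn about (or may reject) this implicit upcast.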
df.loc[(df['BBB'] > 25) | (df['CCC'] >= 75), 'AAA'] = 0.1; df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 0.1 | 10 | 100 |
| 1 | 5.0 | 20 | 50 |
| 2 | 0.1 | 30 | -30 |
| 3 | 0.1 | 40 | -50 |


[Select the rows with data that's closest to a target value:](https://stackoverflow.com/questions/17758023/return-rows-in-a-dataframe-closest-to-a-user-defined-number)

In [118]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40],'CCC' : [100,50,-30,-50]}); df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | -50 |


In [119]:
aValue = 43.0
df.loc[(df.CCC-aValue).abs().argsort()]

|   | AAA | BBB | CCC |
|---|---|---|---|
| 1 | 5 | 20 | 50 |
| 0 | 4 | 10 | 100 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | -50 |
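
Note that `argsort` returns integer positions, so passing its result to `loc` relies on the default index, where labels and positions coincide. A minimal sketch for a frame with a different index, assuming a hypothetical string index and illustrative names (`demo`, `order`, `closest`), uses `iloc` instead:

```python
import pandas as pd

# Hypothetical frame with string labels instead of the default 0..3 index
demo = pd.DataFrame({'AAA': [4, 5, 6, 7],
                     'BBB': [10, 20, 30, 40],
                     'CCC': [100, 50, -30, -50]},
                    index=['a', 'b', 'c', 'd'])
aValue = 43.0

# Positions of the rows, ordered by distance of CCC from the target
order = (demo['CCC'] - aValue).abs().argsort()

# argsort yields positions, so index positionally with iloc
closest = demo.iloc[order.to_numpy()]
```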


[Dynamically reduce a list of criteria using binary operators](https://stackoverflow.com/questions/21058254/pandas-boolean-operation-in-a-python-list/21058331)

In [120]:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,-50]}); df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | -50 |


In [121]:
Crit1 = df.AAA <= 5.5; Crit1

0     True
1     True
2    False
3    False
Name: AAA, dtype: bool

In [122]:
Crit2 = df.BBB == 10.0; Crit2

0     True
1    False
2    False
3    False
Name: BBB, dtype: bool

In [123]:
Crit3 = df.CCC > -40.0; Crit3

0     True
1     True
2     True
3    False
Name: CCC, dtype: bool

If you want to hard code a solution:

In [124]:
AllCrit = Crit1 & Crit2 & Crit3; AllCrit

0     True
1    False
2    False
3    False
dtype: bool

You may want to work with a list of dynamically built criteria:

In [125]:
CritList = [Crit1,Crit2,Crit3]; CritList

[0     True
 1     True
 2    False
 3    False
 Name: AAA, dtype: bool, 0     True
 1    False
 2    False
 3    False
 Name: BBB, dtype: bool, 0     True
 1     True
 2     True
 3    False
 Name: CCC, dtype: bool]

In [126]:
import functools

AllCrit = functools.reduce(lambda x,y: x & y, CritList); AllCrit

0     True
1    False
2    False
3    False
dtype: bool

In [127]:
df[AllCrit]

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
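
The `lambda` passed to `functools.reduce` can be replaced by `operator.and_`, and a pandas-only variant is to place the criteria side by side and require every column to be `True`. The sketch below shows both under illustrative names (`demo`, `crit_list`, `all_reduce`, `all_concat`); it is one possible spelling, not the cookbook's own.

```python
import functools
import operator

import pandas as pd

# Hypothetical standalone frame with the same values as above
demo = pd.DataFrame({'AAA': [4, 5, 6, 7],
                     'BBB': [10, 20, 30, 40],
                     'CCC': [100, 50, -30, -50]})

# The same three criteria, collected in a list
crit_list = [demo['AAA'] <= 5.5, demo['BBB'] == 10.0, demo['CCC'] > -40.0]

# Fold the list with the & operator
all_reduce = functools.reduce(operator.and_, crit_list)

# Or: one boolean column per criterion, then require all to hold per row
all_concat = pd.concat(crit_list, axis=1).all(axis=1)

selected = demo[all_concat]
```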


## [Selection](https://pandas.pydata.org/pandas-docs/stable/cookbook.html#selection)

### DataFrames

Ladies and gentlemen, [The Indexing Documentation](https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing)

[Pandas using row labels in boolean indexing](https://stackoverflow.com/questions/14725068/pandas-using-row-labels-in-boolean-indexing)

In [128]:
# Using both row labels and value conditionals:
df = pd.DataFrame({'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,50]}); df

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | 50 |


In [129]:
df[(df.AAA <= 6) & (df.index.isin([0,2,4]))]

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 2 | 6 | 30 | -30 |


[Use `loc` for label-oriented slicing and `iloc` for positional slicing](https://github.com/pandas-dev/pandas/issues/2904)

In [130]:
data = {'AAA' : [4,5,6,7], 'BBB' : [10,20,30,40], 'CCC' : [100,50,-30,50]}
pd.DataFrame(data)

|   | AAA | BBB | CCC |
|---|---|---|---|
| 0 | 4 | 10 | 100 |
| 1 | 5 | 20 | 50 |
| 2 | 6 | 30 | -30 |
| 3 | 7 | 40 | 50 |


In [131]:
df = pd.DataFrame(data=data,index=['foo','bar','goo','car']); df

|   | AAA | BBB | CCC |
|---|---|---|---|
| foo | 4 | 10 | 100 |
| bar | 5 | 20 | 50 |
| goo | 6 | 30 | -30 |
| car | 7 | 40 | 50 |


There are two explicit slicing methods, plus a third general option:
1. Position-oriented (Python slicing style: exclusive end)
2. Label-oriented (non-Python slicing style: inclusive end)
3. General (either slicing style, depending on whether the slice contains labels or positions)

In [132]:
# Label-oriented:
df.loc['bar' : 'car']

|   | AAA | BBB | CCC |
|---|---|---|---|
| bar | 5 | 20 | 50 |
| goo | 6 | 30 | -30 |
| car | 7 | 40 | 50 |


In [133]:
# Positional-oriented:
df.iloc[0:3]

|   | AAA | BBB | CCC |
|---|---|---|---|
| foo | 4 | 10 | 100 |
| bar | 5 | 20 | 50 |
| goo | 6 | 30 | -30 |


Ambiguity arises when an index consists of integers with a non-zero start or non-unit increment.

In [134]:
# Begin index at 1 instead of 0
df2 = pd.DataFrame(data=data,index=[1,2,3,4]); df2

|   | AAA | BBB | CCC |
|---|---|---|---|
| 1 | 4 | 10 | 100 |
| 2 | 5 | 20 | 50 |
| 3 | 6 | 30 | -30 |
| 4 | 7 | 40 | 50 |


In [135]:
df2.iloc[1:3]

|   | AAA | BBB | CCC |
|---|---|---|---|
| 2 | 5 | 20 | 50 |
| 3 | 6 | 30 | -30 |


In [136]:
df2.loc[1:3]

|   | AAA | BBB | CCC |
|---|---|---|---|
| 1 | 4 | 10 | 100 |
| 2 | 5 | 20 | 50 |
| 3 | 6 | 30 | -30 |
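
The example above covers a non-zero start. The non-unit increment case behaves the same way, as the sketch below suggests; the index values `[10, 20, 30, 40]` and the names `demo`, `by_label`, and `by_position` are assumptions chosen for illustration.

```python
import pandas as pd

data = {'AAA': [4, 5, 6, 7],
        'BBB': [10, 20, 30, 40],
        'CCC': [100, 50, -30, 50]}

# Hypothetical integer index with a step of 10
demo = pd.DataFrame(data=data, index=[10, 20, 30, 40])

# Label-oriented: 10 and 30 are labels, and the end label is included
by_label = demo.loc[10:30]

# Position-oriented: 1 and 3 are positions, and the end position is excluded
by_position = demo.iloc[1:3]
```

Here `by_label` holds the rows labelled 10, 20, and 30, while `by_position` holds the rows labelled 20 and 30.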


[Using the inverse operator `~` to take the complement of a mask](https://stackoverflow.com/questions/14986510/picking-out-elements-based-on-complement-of-indices-in-python-pandas)