### Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
%matplotlib inline
plt.style.use('ggplot')
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 30)

In [None]:
df = pd.read_csv('data/UCI_Credit_Card.csv')

In [None]:
df.head()

In [None]:
function_with_good_docstring(df, column='LIMIT_BAL').head()

In [None]:
df.head()

#### Do you notice how the dataframe is now permanently changed after applying the function?

This is subtle, but we most likely do not want it. We want the original object to stay the same and the function to return a separate, transformed dataframe. How do we achieve this?

Enter `.copy()`

Calling `.copy()` on the dataframe before modifying it means we stop continuously overwriting our original objects.
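
To make the difference concrete, here is a minimal sketch on a toy dataframe. The function names `square_in_place` and `square_on_copy` and the toy values are hypothetical and only for illustration; they are not part of the notebook's own helpers.

```python
import pandas as pd


def square_in_place(df, column="LIMIT_BAL"):
    # Hypothetical helper: assigns into the caller's dataframe directly,
    # so the object passed in is changed as a side effect.
    df[column] = df[column] ** 2
    return df


def square_on_copy(df, column="LIMIT_BAL"):
    # Hypothetical helper: works on a copy, so the caller's dataframe
    # is left untouched.
    df = df.copy()
    df[column] = df[column] ** 2
    return df


# Toy data standing in for the credit card dataset (assumed values).
toy = pd.DataFrame({"LIMIT_BAL": [10, 20, 30]})

safe = square_on_copy(toy)
toy_after_copy = toy["LIMIT_BAL"].tolist()      # still [10, 20, 30]

unsafe = square_in_place(toy)
toy_after_in_place = toy["LIMIT_BAL"].tolist()  # now [100, 400, 900]
```

Note that `unsafe` and `toy` are the same object after the in-place call, which is exactly why a later `df.head()` shows the transformed values.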

### Chaining

Pandas has a great method for chaining, `.pipe()`, that most people don't seem to use.
Let's take a look at some common (and mostly bad) ways of applying multiple functions.

Let's say we need to calculate the following for the balance (x):

`y = 3 * x**2 + 4`
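
The formula breaks down into three steps: square, multiply by 3, then add 4. As a quick sanity check, here is a sketch of those steps on a toy Series (the values are assumed, not taken from the dataset):

```python
import pandas as pd

# Toy values standing in for the LIMIT_BAL column (assumed for illustration).
x = pd.Series([1, 2, 10], name="LIMIT_BAL")

squared = x ** 2      # step 1: square       -> 1, 4, 100
scaled = squared * 3  # step 2: multiply by 3 -> 3, 12, 300
y = scaled + 4        # step 3: add 4         -> 7, 16, 304
```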

In [None]:
def first_transformation(df, column='LIMIT_BAL'):
    """
    Squares a given column. 
    
    Args:
        df (pandas DataFrame): The transactional data
        column (str): The name of the column which we square
        
    Returns:
        df with squared column
    """
    df[column] = df[column]**2
    return df

def second_transformation(df, column='LIMIT_BAL', factor=3):
    """
    Multiplies a given column by a given factor. 
    
    Args:
        df (pandas DataFrame): The transactional data
        column (str): The name of the column which we multiply by the factor.
        factor (float): The factor
        
    Returns:
        df with multiplied column
    """
    df[column] = df[column]*factor
    return df

def third_transformation(df, column='LIMIT_BAL', factor=4):
    """
    Adds a given factor to a given column. 
    
    Args:
        df (pandas DataFrame): The transactional data
        column (str): The name of the column on which we add the factor.
        factor (float): The factor
        
    Returns:
        df with the factor added to the column
    """
    df[column] = df[column] + factor
    return df
    

### Ugly method number 1

In [None]:
df = pd.read_csv('../../data/UCI_Credit_Card.csv')
df = third_transformation(second_transformation(first_transformation(df)))

In [None]:
df.head()

### Ugly (but common) method number 2

In [None]:
df = pd.read_csv('../../data/UCI_Credit_Card.csv')
df = df.copy()
df = first_transformation(df)
df = second_transformation(df)
df = third_transformation(df)

In [None]:
df.head()

### Best practice

In [None]:
def read_data(path):
    # What is this function missing?
    return pd.read_csv(path)

def copy_df(df):
    # What is this function missing?
    return df.copy()

In [None]:
df = read_data(path='../../data/UCI_Credit_Card.csv')

df = (
    copy_df(df)
    .pipe(first_transformation)
    .pipe(second_transformation)
    .pipe(third_transformation)
)

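### Note the difference in how arguments are passed

With `.pipe()`, the dataframe is passed implicitly as the first argument, so each function is referenced by name rather than called with `df`. Any additional arguments are given to `.pipe()` itself.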
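
As a small sketch of how arguments flow through `.pipe()`: the dataframe becomes the first positional argument of the function, and any extra positional or keyword arguments given to `.pipe()` are forwarded. The helper `multiply_column` and the toy data below are hypothetical and only for illustration.

```python
import pandas as pd


def multiply_column(df, column="LIMIT_BAL", factor=3):
    # Hypothetical helper: returns a new dataframe with one column scaled.
    df = df.copy()
    df[column] = df[column] * factor
    return df


# Toy data standing in for the credit card dataset (assumed values).
toy = pd.DataFrame({"LIMIT_BAL": [1, 2, 3], "BILL_AMT1": [10, 20, 30]})

# Direct call: the dataframe is passed explicitly.
nested = multiply_column(toy, column="BILL_AMT1", factor=2)

# Piped call: the dataframe is passed implicitly as the first argument,
# and the keyword arguments are forwarded to multiply_column.
piped = toy.pipe(multiply_column, column="BILL_AMT1", factor=2)
```

Both calls produce the same result; the piped form simply reads top to bottom when several steps are chained.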

#### Exercise
1. Write a function that squares `BILL_AMT1` if `MARRIAGE == 1`. Save this in a column called `BILL_MANIPULATION`.
2. Write another function that divides this new column by `LIMIT_BAL`. Overwrite `BILL_MANIPULATION`.
3. Write a final function that sets `BILL_AMT2` to 0 if `BILL_MANIPULATION > 500`.
4. Chain all of these together.




#### Exercise

Write a function that takes a number as a parameter and returns `True` if it is even and `False` if it is odd.