# Follow the money

In this exercise, you're working with another version of the `banking` DataFrame that contains missing values in both the `cust_id` and `acct_amount` columns.

You want to analyze how many unique customers the bank has, the average amount held by customers, and more. You know that rows with a missing `cust_id` don't really help you, and that `acct_amount` is typically about 5 times `inv_amount`.

You will drop the rows of `banking` with a missing `cust_id`, then use this domain knowledge to impute the missing values of `acct_amount`.

In [1]:
# Imports
import pandas as pd
import numpy as np
from faker import Faker
import datetime as dt
import missingno as msno
import matplotlib.pyplot as plt

fake = Faker()

# Load the data, using the first column as the index and parsing birth dates
path = r'Z:/'
file = 'banking_dirty.csv'
banking = pd.read_csv(path + file, index_col=[0], parse_dates=['birth_date'])

# Add a randomly chosen currency label for each account
acct_cur = [fake.random_element(elements=('dollar', 'euro')) for _ in range(len(banking))]
banking['acct_cur'] = acct_cur

print(banking.head(), '\n')

    cust_id birth_date  Age  acct_amount  inv_amount   fund_A   fund_B  \
0  870A9281 1962-06-09   58     63523.31       51295  30105.0   4138.0   
1  166B05B0 1962-12-16   58     38175.46       15050   4995.0    938.0   
2  BFC13E88 1990-09-12   34     59863.77       24567  10323.0   4590.0   
3  F2158F66 1985-11-03   35     84132.10       23712   3908.0    492.0   
4  7A73F334 1990-05-17   30    120512.00       93230  12158.4  51281.0   

    fund_C   fund_D account_opened last_transaction acct_cur  
0   1420.0  15632.0       02-09-18         22-02-19   dollar  
1   6696.0   2421.0       28-02-19         31-10-18     euro  
2   8469.0   1185.0       25-04-18         02-04-18     euro  
3   6482.0  12830.0       07-11-17         08-11-18   dollar  
4  13434.0  18383.0       14-05-18         19-07-18   dollar   
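
Before cleaning, it can help to count how many values are missing in each column. The following is a minimal sketch on a small hypothetical frame: `toy` and its values are made up for illustration and are not taken from `banking_dirty.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `banking`; all values are invented
toy = pd.DataFrame({
    "cust_id": ["A1", None, "C3", "D4"],
    "acct_amount": [480.0, 250.0, np.nan, 1000.0],
    "inv_amount": [100, 50, 40, 200],
})

# Count missing entries per column
missing_counts = toy.isna().sum()
print(missing_counts)
```

Running the same `.isna().sum()` call on `banking` would presumably show nonzero counts for `cust_id` and `acct_amount`.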



* Use `.dropna()` to drop missing values of the `cust_id` column in `banking` and store the results in `banking_fullid`.
* Use `inv_amount` to compute the estimated account amounts for `banking_fullid` by setting the amounts equal to `inv_amount * 5`, and assign the results to `acct_imp`.
* Impute the missing values of `acct_amount` in `banking_fullid` with the newly created `acct_imp` using `.fillna()`, and store the result in `banking_imputed`.


In [4]:
# Drop missing values of cust_id
banking_fullid = banking.dropna(subset = ['cust_id'])

# Compute estimated acct_amount
acct_imp = banking_fullid['inv_amount'] * 5

# Impute missing acct_amount with corresponding acct_imp
banking_imputed = banking_fullid.fillna({'acct_amount':acct_imp})

# Print number of missing values
print(banking_imputed.isna().sum())

cust_id             0
birth_date          0
Age                 0
acct_amount         0
inv_amount          0
fund_A              0
fund_B              0
fund_C              0
fund_D              0
account_opened      0
last_transaction    0
acct_cur            0
dtype: int64
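
The imputation works because pandas matches the replacement values to rows by index label, so only the missing entries are filled and existing amounts are left untouched. Here is a small sketch of that behavior on a hypothetical frame; the `toy_*` names and all values are invented for illustration, and it uses `Series.fillna()` on the single column rather than the dictionary form shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `banking`; all values are invented
toy = pd.DataFrame({
    "cust_id": ["A1", None, "C3", "D4"],
    "acct_amount": [480.0, 250.0, np.nan, 1000.0],
    "inv_amount": [100, 50, 40, 200],
})

# Drop the row with a missing cust_id; the remaining index labels are 0, 2, 3
toy_fullid = toy.dropna(subset=["cust_id"])

# Estimated amounts, indexed the same way as toy_fullid
toy_imp = toy_fullid["inv_amount"] * 5

# Fill only the missing acct_amount entries; values are matched by index label
toy_imputed = toy_fullid.assign(
    acct_amount=toy_fullid["acct_amount"].fillna(toy_imp)
)

print(toy_imputed)
```

Note that the first row keeps its original `acct_amount` of 480.0 even though `inv_amount * 5` would be 500: `.fillna()` only touches entries that were missing.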
