
# Project objective
- Clean `bank_marketing.csv` and store the result as three DataFrames called `client`, `campaign`, and `economics`, each containing the columns outlined in this notebook, formatted to the data types listed.
- Save the three DataFrames to CSV files, without an index, as `client.csv`, `campaign.csv`, and `economics.csv` respectively.

## Requirements
### client.csv

| column | data type | description | cleaning requirements |
|--------|-----------|-------------|-----------------------|
| `client_id` | `integer` | Client ID | N/A |
| `age` | `integer` | Client's age in years | N/A |
| `job` | `object` | Client's type of job | Change `"."` to `"_"` |
| `marital` | `object` | Client's marital status | N/A |
| `education` | `object` | Client's level of education | Change `"."` to `"_"` and `"unknown"` to `np.NaN` |
| `credit_default` | `bool` | Whether the client's credit is in default | Convert to boolean data type |
| `mortgage` | `bool` | Whether the client has an existing mortgage (housing loan) | Convert to boolean data type |

<br>

### campaign.csv

| column | data type | description | cleaning requirements |
|--------|-----------|-------------|-----------------------|
| `client_id` | `integer` | Client ID | N/A |
| `number_contacts` | `integer` | Number of contact attempts to the client in the current campaign | N/A |
| `contact_duration` | `integer` | Last contact duration in seconds | N/A |
| `previous_campaign_contacts` | `integer` | Number of contact attempts to the client in the previous campaign | N/A |
| `previous_outcome` | `bool` | Outcome of the previous campaign | Convert to boolean data type |
| `campaign_outcome` | `bool` | Outcome of the current campaign | Convert to boolean data type |
| `last_contact_date` | `datetime` | Last date the client was contacted | Create from a combination of `day`, `month`, and a newly created `year` column (which should have a value of `2022`); <br> **Format =** `"YYYY-MM-DD"` |

<br>

### economics.csv

| column | data type | description | cleaning requirements |
|--------|-----------|-------------|-----------------------|
| `client_id` | `integer` | Client ID | N/A |
| `cons_price_idx` | `float` | Consumer price index (monthly indicator) | N/A |
| `euribor_three_months` | `float` | Euro Interbank Offered Rate (euribor) three-month rate (daily indicator) | N/A |
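
The three tables above can also be read as explicit column lists. The following is a minimal sketch of that idea, selecting the columns each output needs rather than dropping the rest. The `TABLE_COLUMNS` dictionary, the `split_tables` helper, and the one-row sample frame are hypothetical and exist only so the snippet runs without `bank_marketing.csv`.

```python
import pandas as pd

# Source columns feeding each output table, taken from the requirement tables.
# last_contact_date is derived later from month and day, so those are listed instead.
TABLE_COLUMNS = {
    "client": ["client_id", "age", "job", "marital", "education",
               "credit_default", "mortgage"],
    "campaign": ["client_id", "number_contacts", "contact_duration",
                 "previous_campaign_contacts", "previous_outcome",
                 "campaign_outcome", "month", "day"],
    "economics": ["client_id", "cons_price_idx", "euribor_three_months"],
}


def split_tables(df):
    """Hypothetical helper: return an independent copy of df for each output table."""
    return {name: df[cols].copy() for name, cols in TABLE_COLUMNS.items()}


# Made-up sample row standing in for bank_marketing.csv
sample = pd.DataFrame({
    "client_id": [0], "age": [56], "job": ["housemaid"], "marital": ["married"],
    "education": ["basic.4y"], "credit_default": ["no"], "mortgage": ["no"],
    "month": ["may"], "day": [13], "contact_duration": [261],
    "number_contacts": [1], "previous_campaign_contacts": [0],
    "previous_outcome": ["nonexistent"], "cons_price_idx": [93.994],
    "euribor_three_months": [4.857], "campaign_outcome": ["no"],
})

tables = split_tables(sample)
print(tables["economics"].columns.tolist())
```

Selecting by name keeps the output column order under explicit control, whereas dropping columns inherits whatever order the source file uses.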

In [90]:
import pandas as pd
import numpy as np

# Read .csv file
df = pd.read_csv("bank_marketing.csv", sep=",")

# Check columns type
print(df.dtypes)

client_id                       int64
age                             int64
job                            object
marital                        object
education                      object
credit_default                 object
mortgage                       object
month                          object
day                             int64
contact_duration                int64
number_contacts                 int64
previous_campaign_contacts      int64
previous_outcome               object
cons_price_idx                float64
euribor_three_months          float64
campaign_outcome               object
dtype: object
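
Several columns come back as `object`, so it can help to list their distinct values before writing any mapping. The sketch below shows one way to do that; the `sample` frame and its values are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Made-up stand-in for a few columns of bank_marketing.csv
sample = pd.DataFrame({
    "client_id": [0, 1, 2],
    "job": ["admin.", "blue-collar", "admin."],
    "education": ["basic.4y", "unknown", "university.degree"],
})

# Collect the distinct values of every non-numeric column
text_columns = [c for c in sample.columns
                if not pd.api.types.is_numeric_dtype(sample[c])]
distinct = {c: sorted(sample[c].dropna().unique()) for c in text_columns}

for column, values in distinct.items():
    print(column, values)
```

Running the same loop on `df` should reveal categories such as `"unknown"` that need an explicit rule in the cleaning steps.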


In [91]:
# client.csv

client = df.drop(columns=['month', 'day', 'contact_duration', 'number_contacts',
                            'previous_campaign_contacts', 'previous_outcome', 'cons_price_idx',
                            'euribor_three_months', 'campaign_outcome'])

# Map yes/no to booleans. 'unknown' is mapped to False explicitly (an assumption:
# the requirement only says "convert to boolean"), so the result does not depend
# on how a missing value happens to be cast by astype('bool').
mapping = {'no': False, 'yes': True, 'unknown': False}
client['mortgage'] = client['mortgage'].map(mapping).astype('bool')
client['credit_default'] = client['credit_default'].map(mapping).astype('bool')

# Replace "." with "_" as a literal string (regex=False), and "unknown" with NaN
client['job'] = client['job'].str.replace('.', '_', regex=False)
client['education'] = client['education'].str.replace('.', '_', regex=False).replace('unknown', np.nan)

# Export
client.to_csv("client.csv", index=False)
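
The boolean conversion above goes through `astype('bool')`, which follows Python truthiness rules, so a `NaN` left in a column is cast to `True` rather than staying missing. The sketch below illustrates the effect on made-up values and shows that mapping every category explicitly avoids it. Treating `"unknown"` as `False` is an assumption, not something the requirements state.

```python
import numpy as np
import pandas as pd

# A float column with a missing value: NaN is truthy, so it becomes True
with_nan = pd.Series([1.0, 0.0, np.nan])
cast_directly = with_nan.astype(bool).tolist()

# Mapping every category explicitly leaves nothing to be cast implicitly.
# Sending "unknown" to False is an assumption made for this illustration.
raw = pd.Series(["yes", "no", "unknown"])  # made-up sample values
explicit = raw.map({"yes": True, "no": False, "unknown": False}).astype(bool).tolist()

print(cast_directly)
print(explicit)
```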



In [92]:
# campaign.csv

campaign = df.drop(columns=['age', 'job', 'marital', 'education', 'credit_default', 'mortgage', 'cons_price_idx', 'euribor_three_months'])

# Map outcomes to booleans. 'nonexistent' (no previous campaign) is mapped to False
# explicitly (an assumption), so the result does not depend on how a missing value is cast.
mapping_po = {'failure': False, 'success': True, 'nonexistent': False}
mapping_co = {'yes': True, 'no': False}
campaign['previous_outcome'] = campaign['previous_outcome'].map(mapping_po).astype('bool')
campaign['campaign_outcome'] = campaign['campaign_outcome'].map(mapping_co).astype('bool')

# Map month abbreviations to numbers, add a year column, and combine year, month, and day into last_contact_date
month_map = {
  'jan': 1,
  'feb': 2,
  'mar': 3,
  'apr': 4,
  'may': 5,
  'jun': 6,
  'jul': 7,
  'aug': 8,
  'sep': 9,
  'oct': 10,
  'nov': 11,
  'dec': 12
}

campaign['month'] = campaign['month'].map(month_map)
campaign['year'] = 2022  # the requirements specify a year of 2022
# to_datetime assembles dates from columns named year, month, and day
campaign['last_contact_date'] = pd.to_datetime(campaign[['year', 'month', 'day']])

campaign.drop(columns=['year', 'month', 'day'], inplace=True)

campaign.to_csv("campaign.csv", index=False)
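
The date assembly in the cell above can be tried in isolation. The sketch below uses made-up rows and a two-entry subset of the month mapping, and builds the year from the value of `2022` given in the requirements; `parts`, `dates`, and `as_text` are names introduced only for this illustration.

```python
import pandas as pd

# Made-up sample rows; the real values come from bank_marketing.csv
sample = pd.DataFrame({"month": ["may", "nov"], "day": [13, 2]})

month_numbers = {"may": 5, "nov": 11}  # subset of the full month mapping

# pd.to_datetime can assemble dates from columns named year, month, and day
parts = pd.DataFrame({
    "year": 2022,
    "month": sample["month"].map(month_numbers),
    "day": sample["day"],
})
dates = pd.to_datetime(parts)

# Render in the "YYYY-MM-DD" format named in the requirements
as_text = dates.dt.strftime("%Y-%m-%d").tolist()
print(as_text)
```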

In [93]:
# Export file as economics.csv
economics = df.drop(columns=['age', 'job', 'marital', 'education', 'credit_default', 'mortgage',
                          'month', 'day', 'contact_duration', 'number_contacts', 'previous_campaign_contacts',
                          'previous_outcome', 'campaign_outcome'])

economics.to_csv("economics.csv", index=False)
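
Before exporting, each DataFrame could be compared against its requirement table. The sketch below is one hedged way to do that for `economics`; `dtype_problems` is a hypothetical helper, and the sample rows are made up, with one column left as strings on purpose to show a failing check.

```python
import pandas as pd
from pandas.api import types as ptypes


def dtype_problems(df, expected):
    """Hypothetical check: list columns that are missing or fail their dtype test."""
    problems = []
    for column, check in expected.items():
        if column not in df.columns:
            problems.append(f"{column}: missing")
        elif not check(df[column]):
            problems.append(f"{column}: unexpected dtype {df[column].dtype}")
    return problems


# Expected types for economics.csv, taken from the requirement table
economics_spec = {
    "client_id": ptypes.is_integer_dtype,
    "cons_price_idx": ptypes.is_float_dtype,
    "euribor_three_months": ptypes.is_float_dtype,
}

# Made-up sample rows standing in for the real economics DataFrame
sample = pd.DataFrame({
    "client_id": [0, 1],
    "cons_price_idx": [93.994, 93.994],
    "euribor_three_months": ["4.857", "4.857"],  # deliberately wrong: strings
})

problems = dtype_problems(sample, economics_spec)
print(problems)
```

An empty list means every listed column is present with the expected kind of dtype; the same pattern extends to `client` and `campaign` with `is_bool_dtype` and `is_datetime64_any_dtype`.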