# CLASSIFICATION

**Data Description**: The CSV file contains data on 5000 customers. The data include customer demographic information (age, income, etc.), the customer's relationship with the bank (mortgage, securities account, etc.), and the customer's response to the last personal loan campaign (Personal Loan). Among these 5000 customers, only 480 (9.6%) accepted the personal loan offered to them in the earlier campaign.
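
The figures above imply a strongly imbalanced target, which is worth quantifying before any modelling. The following is a minimal sketch using only the two numbers quoted in the description; the variable names are illustrative and do not come from the dataset.

```python
# Figures quoted in the data description (not read from the CSV).
n_customers = 5000
n_accepted = 480

# Share of customers who accepted the loan (the positive class).
positive_rate = n_accepted / n_customers

# Accuracy of a trivial model that predicts "no loan" for every customer.
majority_baseline_accuracy = 1 - positive_rate

# Number of non-acceptors for each acceptor.
imbalance_ratio = (n_customers - n_accepted) / n_accepted

print(f"Positive rate: {positive_rate:.1%}")
print(f"Majority-class baseline accuracy: {majority_baseline_accuracy:.1%}")
print(f"Negatives per positive: {imbalance_ratio:.1f}")
```

Because always predicting the majority class already scores about 90% accuracy, accuracy alone says little here; per-class precision and recall, or ROC AUC, are more informative for the minority class.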

**Domain**: Banking

**Context**: This case is about a bank (Thera Bank) whose management wants to explore ways of converting its liability customers to personal loan customers (while retaining them as depositors). A campaign that the bank ran last year for liability customers showed a healthy conversion rate of over 9%. This has encouraged the retail marketing department to devise better-targeted campaigns that increase the success ratio on a minimal budget.

**Attribute Information**
* 1. **`number`**: incident identifier (24,918 different values)
* 2. **`incident state`**: eight levels controlling the incident management process transitions from opening until closing the case;
* 3.  **`active`**: boolean attribute that shows whether the record is active or closed/canceled;
* 4.  **`reassignment_count`**: number of times the incident's assigned group or support analyst changed;
* 5.  **`reopen_count`**: number of times the incident resolution was rejected by the caller;
* 6.  **`sys_mod_count`**: number of incident updates until that moment;
* 7.  **`made_sla`**: boolean attribute that shows whether the incident exceeded the target SLA;
* 8.  **`caller_id`**: identifier of the user affected;
* 9.  **`opened_by`**: identifier of the user who reported the incident;
* 10. **`opened_at`**: incident user opening date and time;
* 11. **`sys_created_by`**: identifier of the user who registered the incident;
* 12. **`sys_created_at`**: incident system creation date and time;
* 13. **`sys_updated_by`**: identifier of the user who updated the incident and generated the current log record;
* 14. **`sys_updated_at`**: incident system update date and time;
* 15. **`contact_type`**: categorical attribute that shows by what means the incident was reported;
* 16. **`location`**: identifier of the location of the place affected;
* 17. **`category`**: first-level description of the affected service;
* 18. **`subcategory`**: second-level description of the affected service (related to the first level description, i.e., to category);
* 19. **`u_symptom`**: description of the user perception about service availability;
* 20. **`cmdb_ci`**: (configuration item) identifier used to report the affected item (not mandatory);
* 21. **`impact`**: description of the impact caused by the incident (values: 1–High; 2–Medium; 3–Low);
* 22. **`urgency`**: description of the urgency informed by the user for the incident resolution (values: 1–High; 2–Medium; 3–Low);
* 23. **`priority`**: calculated by the system based on 'impact' and 'urgency';
* 24. **`assignment_group`**: identifier of the support group in charge of the incident;
* 25. **`assigned_to`**: identifier of the user in charge of the incident;
* 26. **`knowledge`**: boolean attribute that shows whether a knowledge base document was used to resolve the incident;
* 27. **`u_priority_confirmation`**: boolean attribute that shows whether the priority field has been double-checked;
* 28. **`notify`**: categorical attribute that shows whether notifications were generated for the incident;
* 29. **`problem_id`**: identifier of the problem associated with the incident;
* 30. **`rfc`**: (request for change) identifier of the change request associated with the incident;
* 31. **`vendor`**: identifier of the vendor in charge of the incident;
* 32. **`caused_by`**: identifier of the RFC responsible for the incident;
* 33. **`close_code`**: identifier of the resolution of the incident;
* 34. **`resolved_by`**: identifier of the user who resolved the incident;
* 35. **`resolved_at`**: incident user resolution date and time (dependent variable);
* 36. **`closed_at`**: incident user close date and time (dependent variable).

**Learning Outcomes**
* Exploratory Data Analysis
* Preparing the data to train a model
* Training and making predictions using a classification model
* Model evaluation

In [1]:
!pip install -U imbalanced-learn

# Import packages: pandas, NumPy, Matplotlib, seaborn, SciPy
import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns, sys
import matplotlib.style as style; style.use('fivethirtyeight')
from scipy.stats import zscore, norm

np.random.seed(0)

# Modelling: logistic regression, KNN, and metrics
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve, accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Oversampling
from imblearn.over_sampling import SMOTE

# Suppress warnings
import warnings; warnings.filterwarnings('ignore')
pd.options.display.max_rows = 4000



In [None]:
# Read the data into a DataFrame and display the first five rows
bank = pd.read_csv('Bank_Personal_Loan_Modelling.csv')
bank.head()