### Step 1: Set Up the Environment

In this step, I will import the libraries needed for the analysis: Python’s `pandas` and `numpy` for data manipulation, `matplotlib` for visualization, and `sklearn` for data preprocessing, clustering, model training, and evaluation.

In [4]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report


### Step 2: Load the Dataset

In this step, I will load the dataset into a pandas dataframe from the provided path. I will also take a quick look at the first few rows of the dataset to understand its structure and identify the features we need to work with.

In [5]:
data = pd.read_csv('/Users/darienprall/Documents/GitHub/School/Capella/CSC-4030_Introduction_to_Machine_Learning/Assessment_9/credit_card_customers.csv')
print(data.head())

   CLIENTNUM     Attrition_Flag  Customer_Age Gender  Dependent_count  \
0  768805383  Existing Customer            45      M                3   
1  818770008  Existing Customer            49      F                5   
2  713982108  Existing Customer            51      M                3   
3  769911858  Existing Customer            40      F                4   
4  709106358  Existing Customer            40      M                3   

  Education_Level Marital_Status Income_Category Card_Category  \
0     High School        Married     $60K - $80K          Blue   
1        Graduate         Single  Less than $40K          Blue   
2        Graduate        Married    $80K - $120K          Blue   
3     High School        Unknown  Less than $40K          Blue   
4      Uneducated        Married     $60K - $80K          Blue   

   Months_on_book  ...  Credit_Limit  Total_Revolving_Bal  Avg_Open_To_Buy  \
0              39  ...       12691.0                  777          11914.0   
1       
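
Because the printed preview wraps and elides columns, listing the columns by dtype can make it easier to see which features are numeric and which would need encoding. The following is a minimal sketch, not part of the original analysis: the `sample` frame is a toy stand-in built from the first three preview rows and only a subset of the columns, and the variable names are illustrative.

```python
import io

import pandas as pd

# Toy stand-in for the real file: three preview rows, subset of columns.
csv_text = """CLIENTNUM,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category
768805383,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue
818770008,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue
713982108,Existing Customer,51,M,3,Graduate,Married,$80K - $120K,Blue
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Show every column instead of eliding the middle ones with "...".
pd.set_option("display.max_columns", None)
print(sample.head())

# Split the column names by dtype: numeric columns versus everything else.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

Applied to the full `data` frame, the same two `select_dtypes` calls would list every column that a scaler or an encoder could be pointed at.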

### Step 3: Data Preprocessing

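In this step, I will preprocess the data by:
- Separating the features (`X`) from the target variable (`y`), mapping `Attrition_Flag` to 0 (existing customer) and 1 (attrited customer).
- Encoding the categorical features `Marital_Status` and `Card_Category` using `OneHotEncoder`.
- Scaling the numerical features `Customer_Age` and `Credit_Limit` using `StandardScaler`.

I will use a `ColumnTransformer` to apply these transformations; all other columns are passed through unchanged.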


In [None]:
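# Separate the features from the target and encode the target as 0/1.
X = data.drop('Attrition_Flag', axis=1)
y = data['Attrition_Flag'].map({'Existing Customer': 0, 'Attrited Customer': 1})

# Scale two numeric columns and one-hot encode two categorical columns;
# every other column is passed through unchanged.
preprocess = make_column_transformer(
    (StandardScaler(), ['Customer_Age', 'Credit_Limit']),
    (OneHotEncoder(), ['Marital_Status', 'Card_Category']),
    remainder='passthrough'
)

X_processed = preprocess.fit_transform(X)
print(X_processed[:5])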

[[-0.1654055800960332 0.4466219030536973 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0
  768805383 'M' 3 'High School' '$60K - $80K' 39 5 1 3 777 11914.0 1.335
  1144 42 1.625 0.061 9.3448e-05 0.99991]
 [0.33357038345253087 -0.04136665206187948 0.0 0.0 1.0 0.0 1.0 0.0 0.0
  0.0 818770008 'F' 5 'Graduate' 'Less than $40K' 44 6 1 2 864 7392.0
  1.541 1291 33 3.714 0.105 5.6861e-05 0.99994]
 [0.5830583652268129 -0.5736977974168198 0.0 1.0 0.0 0.0 1.0 0.0 0.0 0.0
  713982108 'M' 3 'Graduate' '$80K - $120K' 36 4 1 0 0 3418.0 2.594 1887
  20 2.333 0.0 2.1081e-05 0.99998]
 [-0.7891255345317383 -0.5852510777521379 0.0 0.0 0.0 1.0 1.0 0.0 0.0 0.0
  769911858 'F' 4 'High School' 'Less than $40K' 34 3 4 1 2517 796.0
  1.405 1171 20 2.333 0.76 0.00013366 0.99987]
 [-0.7891255345317383 -0.43087724622403095 0.0 1.0 0.0 0.0 1.0 0.0 0.0
  0.0 709106358 'M' 3 'Uneducated' '$60K - $80K' 21 5 1 0 0 4716.0 2.175
  816 28 2.5 0.0 2.1676e-05 0.99998]]
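
The output above shows that the passed-through columns still contain the `CLIENTNUM` identifier and raw strings such as `'M'` and `'High School'`, which scikit-learn estimators like `RandomForestClassifier` cannot consume directly. One possible alternative, sketched here under assumptions rather than taken from the original analysis, is to drop the identifier, scale every numeric column, and one-hot encode every non-numeric column. The `toy` frame (rows copied from the preview, subset of columns) and the names `X_toy` and `preprocess_all` are hypothetical.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows copied from the data.head() preview (subset of columns only).
toy = pd.DataFrame({
    "CLIENTNUM": [768805383, 818770008, 713982108, 769911858, 709106358],
    "Attrition_Flag": ["Existing Customer"] * 5,
    "Customer_Age": [45, 49, 51, 40, 40],
    "Gender": ["M", "F", "M", "F", "M"],
    "Dependent_count": [3, 5, 3, 4, 3],
    "Education_Level": ["High School", "Graduate", "Graduate",
                        "High School", "Uneducated"],
    "Marital_Status": ["Married", "Single", "Married", "Unknown", "Married"],
    "Income_Category": ["$60K - $80K", "Less than $40K", "$80K - $120K",
                        "Less than $40K", "$60K - $80K"],
    "Card_Category": ["Blue"] * 5,
})

# Assumption: the ID column carries no predictive signal, so drop it
# together with the target.
X_toy = toy.drop(columns=["Attrition_Flag", "CLIENTNUM"])

# Pick the columns for each transformer by dtype instead of by hand.
numeric_cols = X_toy.select_dtypes(include="number").columns.tolist()
categorical_cols = X_toy.select_dtypes(exclude="number").columns.tolist()

preprocess_all = make_column_transformer(
    (StandardScaler(), numeric_cols),
    (OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    sparse_threshold=0,  # always return a dense array
)

X_toy_processed = preprocess_all.fit_transform(X_toy)
print(X_toy_processed.shape)
print(X_toy_processed.dtype)
```

With no `remainder='passthrough'`, any column not listed is dropped, so the result is purely numeric. Whether dropping `CLIENTNUM` and encoding every categorical column suits this dataset is a modeling choice to verify against the full data.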
