# Bank Customer Complaint Classification Project

## Project Overview  
This NLP project aims to automatically categorize banking customer complaints into specific product categories using textual narratives. Developed to streamline complaint resolution processes, this solution helps financial institutions:

- **Reduce manual categorization effort** by 70-80%  
- **Improve complaint routing accuracy**  
- **Identify emerging product-related issues** faster 

**Business Impact**: Automated categorization is intended to enable:  
- 30-40% faster response times  
- Better resource allocation for customer service teams  

# Importing the libraries

In [1]:
import pandas as pd
import numpy as np
import re
import nltk
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
from imblearn.over_sampling import SMOTE
from collections import Counter

# Load dataset

In [2]:
compdataset = pd.read_csv('complaints.csv').drop(columns=['Unnamed: 0'])
print(f"Original shape: {compdataset.shape}")

Original shape: (162421, 2)


In [3]:
compdataset.head()

|   | product          | narrative                                          |
|---|------------------|----------------------------------------------------|
| 0 | credit_card      | purchase order day shipping amount receive pro... |
| 1 | credit_card      | forwarded message date tue subject please inve... |
| 2 | retail_banking   | forwarded message cc sent friday pdt subject f... |
| 3 | credit_reporting | payment history missing credit report speciali... |
| 4 | credit_reporting | payment history missing credit report made mis... |


In [4]:
compdataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 162421 entries, 0 to 162420
Data columns (total 2 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   product    162421 non-null  object
 1   narrative  162411 non-null  object
dtypes: object(2)
memory usage: 2.5+ MB


# Preprocessing

## Remove missing values

In [5]:
compdataset.isnull().sum()

product       0
narrative    10
dtype: int64

In [6]:
compdataset = compdataset.dropna()

## Remove duplicates

In [7]:
compdataset = compdataset.drop_duplicates(subset=['narrative', 'product'])
print(f"After removing duplicates: {compdataset.shape}")

After removing duplicates: (124676, 2)


## Separate features and target

In [8]:
X = compdataset['narrative'].astype(str)  # missing narratives were already dropped, so no fillna is needed
y = compdataset['product']

## Display unique class distribution

In [9]:
product_ratio = y.value_counts(normalize=True) * 100
for product, ratio in product_ratio.items():
    print(f"Product ({product}): %{ratio:.2f}")

Product (credit_reporting): %45.16
Product (debt_collection): %16.94
Product (mortgages_and_loans): %15.05
Product (credit_card): %12.05
Product (retail_banking): %10.81


# Text preprocessing

### Download necessary NLTK resources

In [10]:
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\user\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

### Remove Stopwords and Lemmatization

In [11]:
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

### Clean Text

In [12]:
def clean_text(text):
    text = text.lower()
    text = re.sub(r'[^a-zA-Z\s]', '', text)  # Remove punctuation/numbers
    tokens = text.split()
    tokens = [lemmatizer.lemmatize(word) for word in tokens if word not in stop_words]
    return ' '.join(tokens)

In [13]:
X_cleaned = X.apply(clean_text)
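
To make the effect of `clean_text` concrete, here is a minimal stand-in that runs without NLTK. It is a sketch, not the notebook's function: `DEMO_STOP_WORDS` and `clean_text_demo` are hypothetical names, the stop-word list is a tiny made-up subset, and lemmatization is left out.

```python
import re

# Hypothetical mini stop-word list; the notebook uses NLTK's much larger English list.
DEMO_STOP_WORDS = {"i", "on", "but", "the", "me"}

def clean_text_demo(text):
    """Mirror clean_text without NLTK: lowercase, keep letters/whitespace, drop stop words."""
    text = text.lower()
    # Digits and punctuation are deleted outright, not replaced by spaces.
    text = re.sub(r"[^a-z\s]", "", text)
    tokens = [word for word in text.split() if word not in DEMO_STOP_WORDS]
    return " ".join(tokens)

sample = "I paid $500 on 03/12, but the BANK charged me twice!"
cleaned = clean_text_demo(sample)
print(cleaned)  # paid bank charged twice

# Apostrophes are deleted too, so a contraction collapses into a single new token.
contraction = clean_text_demo("can't")
print(contraction)  # cant
```

Because characters are deleted before stop words are filtered, contractions turn into tokens such as `cant` that may no longer match entries in a stop-word list. Replacing the matched characters with a space instead of an empty string is one possible variation to consider.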

# Split the dataset 

In [14]:
X_train_text, X_test_text, y_train, y_test = train_test_split(
    X_cleaned, y, test_size=0.2, random_state=42, stratify=y
)


## TF-IDF Vectorization 

In [15]:
tfidf = TfidfVectorizer(max_features=5000)
X_train = tfidf.fit_transform(X_train_text)
X_test = tfidf.transform(X_test_text)
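
The weighting behind `TfidfVectorizer` can be sketched by hand. The example below assumes the vectorizer's documented defaults (raw term counts, smoothed IDF, L2 row normalization) and uses a made-up three-document corpus; `idf` and `tfidf_vector` are illustrative helper names, not part of scikit-learn.

```python
import math
from collections import Counter

# Toy corpus (made up for illustration), already cleaned and whitespace-tokenized.
docs = [
    "credit report error credit",
    "mortgage payment late",
    "credit card payment",
]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each term.
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

def idf(term):
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, assumed to match smooth_idf=True.
    return math.log((1 + n_docs) / (1 + doc_freq[term])) + 1

def tfidf_vector(tokens):
    counts = Counter(tokens)
    weights = {term: count * idf(term) for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))  # L2 normalization
    return {term: w / norm for term, w in weights.items()}

vectors = [tfidf_vector(tokens) for tokens in tokenized]
for doc, vec in zip(docs, vectors):
    print(doc, "->", {term: round(w, 3) for term, w in sorted(vec.items())})
```

Terms that occur in fewer documents receive a larger IDF, so within the second document `mortgage` outweighs `payment`. In the notebook, `max_features=5000` additionally limits the vocabulary to the most frequent terms in the training corpus.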

## Encode labels

In [16]:
le = LabelEncoder()
y_train = le.fit_transform(y_train)
y_test = le.transform(y_test)

# Train and Evaluate models

In [None]:
models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier()
}

In [25]:
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"\nEvaluating {name}:")
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred, target_names=le.classes_))
    print("Accuracy:", accuracy_score(y_test, y_pred))



Evaluating Naive Bayes:
                     precision    recall  f1-score   support

        credit_card       0.75      0.74      0.74      3005
   credit_reporting       0.83      0.88      0.86     11261
    debt_collection       0.83      0.63      0.71      4223
mortgages_and_loans       0.79      0.85      0.82      3752
     retail_banking       0.83      0.85      0.84      2695

           accuracy                           0.81     24936
          macro avg       0.80      0.79      0.79     24936
       weighted avg       0.81      0.81      0.81     24936

Accuracy: 0.8136429258902791
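
The `macro avg` and `weighted avg` rows of the report can be reproduced from the per-class rows. The sketch below recomputes both for recall, using the rounded Naive Bayes values printed above, so the results agree with the report only to two decimals.

```python
# Per-class recall and support copied from the Naive Bayes report above.
recall = {
    "credit_card": 0.74,
    "credit_reporting": 0.88,
    "debt_collection": 0.63,
    "mortgages_and_loans": 0.85,
    "retail_banking": 0.85,
}
support = {
    "credit_card": 3005,
    "credit_reporting": 11261,
    "debt_collection": 4223,
    "mortgages_and_loans": 3752,
    "retail_banking": 2695,
}

total = sum(support.values())

# Macro average: every class counts equally, regardless of its size.
macro_recall = sum(recall.values()) / len(recall)

# Weighted average: each class contributes in proportion to its support.
weighted_recall = sum(recall[c] * support[c] for c in recall) / total

print(f"test samples:    {total}")
print(f"macro recall:    {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

The macro average is pulled down by the weak `debt_collection` recall, while the weighted average is dominated by `credit_reporting`. Support-weighted recall is the same quantity as overall accuracy, so the small gap to the printed accuracy comes from using rounded inputs.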

Evaluating Logistic Regression:
                     precision    recall  f1-score   support

        credit_card       0.80      0.78      0.79      3005
   credit_reporting       0.87      0.90      0.89     11261
    debt_collection       0.81      0.76      0.78      4223
mortgages_and_loans       0.85      0.84      0.84      3752
     retail_banking       0.86      0.88      0.87     

In [17]:
# Apply SMOTE only on training data
smote = SMOTE()
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

In [18]:
print("Before SMOTE:", Counter(y_train))
print("After SMOTE:", Counter(y_train_resampled))

Before SMOTE: Counter({1: 45042, 2: 16894, 3: 15007, 0: 12019, 4: 10778})
After SMOTE: Counter({3: 45042, 1: 45042, 2: 45042, 4: 45042, 0: 45042})
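
The integer keys in these counters are the codes assigned by `LabelEncoder`, which orders class names alphabetically. A short sketch, using only the counts printed above, maps the codes back to product names and shows how many synthetic rows SMOTE added per class.

```python
from collections import Counter

# LabelEncoder assigns integer codes in sorted order of the class names.
classes = sorted([
    "credit_reporting", "debt_collection", "mortgages_and_loans",
    "credit_card", "retail_banking",
])

# Counts copied from the output above.
before = Counter({1: 45042, 2: 16894, 3: 15007, 0: 12019, 4: 10778})
after = Counter({3: 45042, 1: 45042, 2: 45042, 4: 45042, 0: 45042})

for code, count in sorted(before.items()):
    added = after[code] - count
    print(f"{code} {classes[code]:<20} {count:>6} real + {added:>6} synthetic")

total_before = sum(before.values())
total_after = sum(after.values())
synthetic = total_after - total_before
print(f"rows: {total_before} -> {total_after} "
      f"({synthetic / total_after:.1%} synthetic)")
```

More than half of the resampled training set is synthetic, which is worth keeping in mind when comparing the reports before and after SMOTE.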


In [27]:
for name, model in models.items():
    model.fit(X_train_resampled, y_train_resampled)
    print(f"\nEvaluating {name}:")
    y_pred_smote = model.predict(X_test)
    print(classification_report(y_test, y_pred_smote, target_names=le.classes_))
    print("Accuracy:", accuracy_score(y_test, y_pred_smote))


Evaluating Naive Bayes:
                     precision    recall  f1-score   support

        credit_card       0.66      0.80      0.72      3005
   credit_reporting       0.92      0.77      0.84     11261
    debt_collection       0.74      0.76      0.75      4223
mortgages_and_loans       0.73      0.88      0.80      3752
     retail_banking       0.79      0.89      0.84      2695

           accuracy                           0.80     24936
          macro avg       0.77      0.82      0.79     24936
       weighted avg       0.81      0.80      0.80     24936

Accuracy: 0.800489252486365

Evaluating Logistic Regression:
                     precision    recall  f1-score   support

        credit_card       0.73      0.80      0.77      3005
   credit_reporting       0.92      0.82      0.86     11261
    debt_collection       0.74      0.81      0.77      4223
mortgages_and_loans       0.79      0.86      0.82      3752
     retail_banking       0.84      0.89      0.86      

In [21]:
from xgboost import XGBClassifier

# use_label_encoder is omitted: the installed XGBoost reports it as an unused parameter
xgb = XGBClassifier(eval_metric='mlogloss')
xgb.fit(X_train_resampled, y_train_resampled)

# Evaluate on the untouched (non-resampled) test set
y_pred_xgb = xgb.predict(X_test)
print("XGBoost Model Evaluation:")
print(classification_report(y_test, y_pred_xgb, target_names=le.classes_))
print("Accuracy:", accuracy_score(y_test, y_pred_xgb))

XGBoost Model Evaluation:
                     precision    recall  f1-score   support

        credit_card       0.77      0.80      0.79      3005
   credit_reporting       0.89      0.86      0.88     11261
    debt_collection       0.77      0.80      0.79      4223
mortgages_and_loans       0.84      0.83      0.83      3752
     retail_banking       0.85      0.88      0.87      2695

           accuracy                           0.84     24936
          macro avg       0.82      0.84      0.83     24936
       weighted avg       0.84      0.84      0.84     24936

Accuracy: 0.842316329804299


In [22]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input

# Build a simple feedforward neural network.
# An explicit Input layer replaces input_shape on the first Dense layer,
# which Keras warns about in Sequential models.
model = Sequential([
    Input(shape=(X_train_resampled.shape[1],)),
    Dense(512, activation='relu'),
    Dropout(0.3),
    Dense(256, activation='relu'),
    Dropout(0.3),
    Dense(len(le.classes_), activation='softmax')
])

# Compile the model (integer labels, hence the sparse categorical loss)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model on dense copies of the sparse TF-IDF matrices.
# Note: the test set doubles as validation data here.
model.fit(X_train_resampled.toarray(), y_train_resampled, epochs=10, batch_size=32, validation_data=(X_test.toarray(), y_test))

# Evaluate
y_pred_nn = model.predict(X_test.toarray()).argmax(axis=1)
print("Neural Network Model Evaluation:")
print(classification_report(y_test, y_pred_nn, target_names=le.classes_))
print("Accuracy:", accuracy_score(y_test, y_pred_nn))


Epoch 1/10
7038/7038 ━━━━━━━━━━━━━━━━━━━━ 135s 19ms/step - accuracy: 0.8661 - loss: 0.3975 - val_accuracy: 0.8442 - val_loss: 0.4680
Epoch 2/10
7038/7038 ━━━━━━━━━━━━━━━━━━━━ 132s 19ms/step - accuracy: 0.9551 - loss: 0.1408 - val_accuracy: 0.8530 - val_loss: 0.5499
Epoch 3/10
7038/7038 ━━━━━━━━━━━━━━━━━━━━ 133s 19ms/step - accuracy: 0.9804 - loss: 0.0614 - val_accuracy: 0.8524 - val_loss: 0.6434
Epoch 4/10
7038/7038 ━━━━━━━━━━━━━━━━━━━━ 134s 19ms/step - accuracy: 0.9882 - loss: 0.0355 - val_accuracy: 0.8539 - val_loss: 0.7790
Epoch 5/10
7038/7038 ━━━━━━━━━━━━━━━━━━━━ 136s 19ms/step - accuracy: 0.9913 - loss: 0.0263 - val_accuracy: 0.8518 - val_loss: 0.8933
Epoch 6/10
7038/7038 ━━━━━━━━━━━━━━━━━━━━ 136s 19ms/step - accuracy: 0.9930 - loss: 0.0218 - val_accuracy: 0.8511 - val_loss: 0.944
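
In this log, training accuracy climbs toward 0.99 while `val_loss` rises every epoch and `val_accuracy` stays near 0.85, which suggests the network starts overfitting after the first epoch. The sketch below applies a simple patience rule to the logged `val_loss` values to show where patience-based early stopping would have halted training. `early_stop_epoch` is a hypothetical helper and `patience=2` is an assumed setting; epoch 6 is left out because its value is cut off in the log.

```python
import math

# val_loss per epoch, copied from the training log above (epochs 1-5 only).
val_loss = [0.4680, 0.5499, 0.6434, 0.7790, 0.8933]

def early_stop_epoch(losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for a patience rule."""
    best, best_epoch, wait = math.inf, 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(losses), best_epoch

stop_epoch, best_epoch = early_stop_epoch(val_loss, patience=2)
print(f"best val_loss at epoch {best_epoch}, training would stop after epoch {stop_epoch}")

# Sanity check on the progress bar: batches per epoch for the
# SMOTE-resampled training set (5 classes x 45042 rows, batch_size=32).
steps_per_epoch = math.ceil(5 * 45042 / 32)
print(f"steps per epoch: {steps_per_epoch}")
```

With these values the rule keeps epoch 1 as the best checkpoint and stops after epoch 3, saving most of the roughly 135 seconds per epoch spent afterwards. Since the validation data here is the test set, a separate validation split would be needed for such a rule to give an unbiased final score.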

# Model Performance Report

In [9]:
from IPython.display import Markdown
import pandas as pd

# Model accuracy data
models = [
    {"Model": "Naive Bayes", "Accuracy": 0.800489252486365, "Rank": 5},
    {"Model": "Logistic Regression", "Accuracy": 0.8295636830285531, "Rank": 4},
    {"Model": "Neural Network", "Accuracy": 0.8460057747834456, "Rank": 2},
    {"Model": "XGBoost", "Accuracy": 0.842316329804299, "Rank": 3},
    {"Model": "Random Forest", "Accuracy": 0.8575152390118704, "Rank": 1}
]

# Create DataFrame and sort by accuracy
df = pd.DataFrame(models).sort_values('Accuracy', ascending=False)

# Generate Markdown
md_text = f"""

## Accuracy Comparison

{df.to_markdown(index=False, tablefmt="github", floatfmt=".3f")}

### Key Insights:
1. **Top Performer**: Random Forest achieved the highest accuracy ({df.iloc[0]['Accuracy']:.1%})
2. **Traditional Models**: Logistic Regression reached {df[df['Model']=='Logistic Regression']['Accuracy'].values[0]:.1%}
3. **Ensemble Advantage**: Random Forest and XGBoost both exceeded 84% accuracy
4. **Baseline**: Naive Bayes served as an effective baseline ({df.iloc[-1]['Accuracy']:.1%})

## Recommendations:
- **Production Deployment**: Random Forest (best accuracy)
- **Balanced Choice**: XGBoost (nearly equal performance)
"""

Markdown(md_text)



## Accuracy Comparison

| Model               |   Accuracy |   Rank |
|---------------------|------------|--------|
| Random Forest       |      0.858 |      1 |
| Neural Network      |      0.846 |      2 |
| XGBoost             |      0.842 |      3 |
| Logistic Regression |      0.830 |      4 |
| Naive Bayes         |      0.800 |      5 |

### Key Insights:
1. **Top Performer**: Random Forest achieved the highest accuracy (85.8%)
2. **Traditional Models**: Logistic Regression reached 83.0%
3. **Ensemble Advantage**: Random Forest and XGBoost both exceeded 84% accuracy
4. **Baseline**: Naive Bayes served as an effective baseline (80.0%)

## Recommendations:
- **Production Deployment**: Random Forest (best accuracy)
- **Balanced Choice**: XGBoost (nearly equal performance)
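
The `Rank` column in the report cell is typed in by hand, so it can drift out of sync if an accuracy changes. As a small sketch, the ranks can instead be derived from the accuracies; the numbers are copied from the report cell, and `accuracies` and `rows` are illustrative names.

```python
# Accuracies copied from the report cell above.
accuracies = {
    "Naive Bayes": 0.800489252486365,
    "Logistic Regression": 0.8295636830285531,
    "Neural Network": 0.8460057747834456,
    "XGBoost": 0.842316329804299,
    "Random Forest": 0.8575152390118704,
}

# Sort by accuracy (highest first) and number the rows from 1.
ranked = sorted(accuracies.items(), key=lambda item: item[1], reverse=True)
rows = [
    {"Model": name, "Accuracy": acc, "Rank": rank}
    for rank, (name, acc) in enumerate(ranked, start=1)
]

for row in rows:
    print(f"{row['Rank']}. {row['Model']:<20} {row['Accuracy']:.3f}")

# Gap between the best and second-best model, in percentage points.
gap = (rows[0]["Accuracy"] - rows[1]["Accuracy"]) * 100
print(f"lead of {rows[0]['Model']} over {rows[1]['Model']}: {gap:.1f} points")
```

The resulting list of dictionaries has the same shape as the hand-written `models` list, so it could be passed to `pd.DataFrame` in the same way.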
