## AI-Powered Complaint Resolution & Root Cause Analysis

An end-to-end, NLP-driven complaint intelligence platform that classifies customer complaints with LSTM and Transformer models, assigns a priority from the detected sentiment, maps each category to a likely root cause, and returns a templated reply. It is deployed as a Flask web application.

Project Objectives :

- Automate complaint classification using deep learning and transformers

- Detect sentiment and assign complaint priority

- Identify recurring issues and root causes

- Generate automated, context-aware replies

- Provide a clean web interface for real-time analysis

- Store complaints and AI decisions in a database for internal departments

The system is designed for banks, financial institutions, customer support teams, and enterprises.
 
#### Dataset :

- Source: Consumer Financial Protection Bureau (CFPB)
- Size: ~228,000 complaints
- Key columns used:
  - Consumer complaint narrative
  - Product
  - Issue
  - Sub-issue
- Links:
  - https://www.kaggle.com/datasets/kennathalexanderroy/ai-powered-complaint-resolution/data
  - https://www.consumerfinance.gov/data-research/consumer-complaints/#get-the-data


#### AI & NLP Techniques Used :

| Task                                      | Model / Technique                     |
| ----------------------------------------- | ------------------------------------- |
| Complaint Classification (Traditional DL) | LSTM (TensorFlow/Keras)               |
| Complaint Classification (Advanced NLP)   | Transformer (BART Zero-Shot)          |
| Text Preprocessing                        | Tokenization, Lemmatization, Cleaning |
| Sentiment Analysis                        | DistilBERT                            |
| Priority Assignment                       | Rule-based on sentiment               |
| Root Cause Analysis                       | Category mapping logic                |
| Automated Replies                         | AI-driven response templates          |
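
The "Rule-based on sentiment" row can be sketched as a small pure function. `assign_priority` mirrors the rule used in the analysis function of this notebook (negative sentiment maps to High, anything else to Medium). `assign_priority_with_score` and its `threshold` parameter are hypothetical and not part of the project; they only illustrate how the classifier's confidence score could refine the rule.

```python
def assign_priority(sentiment_label: str) -> str:
    """Rule used in this notebook: NEGATIVE -> High, otherwise Medium."""
    return "High" if sentiment_label == "NEGATIVE" else "Medium"


def assign_priority_with_score(result: dict, threshold: float = 0.9) -> str:
    """Hypothetical refinement (assumption, not in the original project).

    `result` is shaped like one item returned by a Hugging Face
    sentiment pipeline, e.g. {"label": "NEGATIVE", "score": 0.98}.
    """
    if result["label"] == "NEGATIVE":
        # Only confidently negative complaints are escalated to High.
        return "High" if result["score"] >= threshold else "Medium"
    return "Low"


print(assign_priority("NEGATIVE"))
print(assign_priority_with_score({"label": "NEGATIVE", "score": 0.62}))
```

Keeping the rule in its own function makes it easy to unit test without loading any model.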


#### System Architecture :

1. User complaint
2. Text preprocessing (cleaning + lemmatization)
3. Category classification by two models side by side: LSTM classifier and Transformer classifier
4. Sentiment analysis (DistilBERT)
5. Priority assignment
6. Root cause mapping
7. Automated reply generation
8. SQLite database storage
9. Flask web interface
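
The storage step is not shown in this notebook, so the following is only a minimal sketch of how it might look with Python's built-in `sqlite3` module. The table name `complaints`, its columns, and the helpers `init_db` and `save_result` are assumptions made for illustration. The dictionary keys match those returned by `analyze_complaint`.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) complaints table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS complaints (
            id             INTEGER PRIMARY KEY AUTOINCREMENT,
            complaint_text TEXT NOT NULL,
            category       TEXT,
            sentiment      TEXT,
            priority       TEXT,
            root_cause     TEXT,
            reply          TEXT,
            created_at     TEXT DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    conn.commit()


def save_result(conn: sqlite3.Connection, complaint_text: str, result: dict) -> int:
    """Insert one analysed complaint and return its row id."""
    cur = conn.execute(
        """
        INSERT INTO complaints
            (complaint_text, category, sentiment, priority, root_cause, reply)
        VALUES (?, ?, ?, ?, ?, ?)
        """,
        (
            complaint_text,
            result.get("Transformer Category"),
            result.get("Sentiment"),
            result.get("Priority"),
            result.get("Root Cause"),
            result.get("Automated Reply"),
        ),
    )
    conn.commit()
    return cur.lastrowid


# In-memory database for the demo; a real deployment would use a file path.
conn = sqlite3.connect(":memory:")
init_db(conn)
example_result = {
    "Transformer Category": "Checking or savings account",
    "Sentiment": "NEGATIVE",
    "Priority": "High",
    "Root Cause": "Unauthorized transaction",
    "Automated Reply": "We apologize for the unauthorized transaction.",
}
row_id = save_result(conn, "Unauthorized charge on my savings account.", example_result)
rows = conn.execute("SELECT id, category, priority FROM complaints").fetchall()
print(rows)
```

Parameterized queries (`?` placeholders) keep free-text complaints from being interpreted as SQL.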



In [5]:
import pandas as pd
import numpy as np
import re
import pickle

import nltk
nltk.download('stopwords')
nltk.download('wordnet')

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout

from transformers import pipeline


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Pc\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Pc\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


### Data Collection

In [6]:
df = pd.read_csv(r"C:\Users\Pc\OneDrive\Desktop\Alex..New..Folder\Deep_Learning\NLP_Project\Data\complaints.csv")

df = df[['Consumer complaint narrative', 'Product']]
df.dropna(inplace=True)

df.rename(columns={
    'Consumer complaint narrative': 'text',
    'Product': 'label'
}, inplace=True)

df.head()

|   | text | label |
| - | ---- | ----- |
| 0 | Statement Regarding Unauthorized Online Bankin... | Money transfer, virtual currency, or money ser... |
| 1 | I am filing a complaint a\n\ngainst Cash App (... | Money transfer, virtual currency, or money ser... |
| 2 | This account has already been deleted from my ... | Debt collection |
| 3 | I received the following letter, accepted the ... | Mortgage |
| 4 | I am filing a complaint against capital one re... | Checking or savings account |


In [7]:
df["label"].value_counts()

label
Debt collection                                       100059
Money transfer, virtual currency, or money service     63405
Checking or savings account                            51119
Mortgage                                               14114
Name: count, dtype: int64
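
The counts above are imbalanced: Debt collection makes up roughly 44% of the rows and Mortgage about 6%. This notebook trains without class weights. As a sketch of one common option, the snippet below computes "balanced" weights with the formula `total / (n_classes * count)`. Using these weights is an assumption for illustration, not something the original training does.

```python
# Counts copied from the value_counts() output above.
counts = {
    "Debt collection": 100059,
    "Money transfer, virtual currency, or money service": 63405,
    "Checking or savings account": 51119,
    "Mortgage": 14114,
}

total = sum(counts.values())
n_classes = len(counts)

# "Balanced" heuristic: rarer classes receive proportionally larger weights.
class_weights = {label: total / (n_classes * n) for label, n in counts.items()}

for label, weight in class_weights.items():
    share = counts[label] / total
    print(f"{label}: share={share:.1%}, weight={weight:.2f}")
```

If weights like these were passed to Keras through the `class_weight` argument of `fit`, the keys would need to be the integer class ids rather than the label strings.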

### NLP Preprocessing (Lemmatization)

In [13]:
stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    text = text.lower()
    text = re.sub(r'[^a-z\s]', '', text)
    words = text.split()
    words = [lemmatizer.lemmatize(w) for w in words if w not in stop_words]
    return " ".join(words)

df['clean_text'] = df['text'].apply(clean_text)
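
To see what the regex step in `clean_text` does, the sketch below isolates it from stop-word removal and lemmatization so it runs without NLTK data. `strip_non_letters` is a hypothetical helper name. Characters outside `a-z` and whitespace are deleted rather than replaced with a space, so contractions and hyphenated words merge, and amounts and dates disappear.

```python
import re


def strip_non_letters(text):
    """Lowercase, delete every character that is not a-z or whitespace, then split."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)
    return text.split()


sample = "I can't access my e-mail; $1,250 was charged on 03/14!"
tokens = strip_non_letters(sample)
print(tokens)
```

For complaint text this means monetary values and dates never reach the model, which may or may not be desirable depending on the categories involved.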


### Encode Labels

In [14]:
label_encoder = LabelEncoder()
df['label_encoded'] = label_encoder.fit_transform(df['label'])

num_classes = len(label_encoder.classes_)
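
`LabelEncoder` assigns integer ids following the sorted order of the unique labels. The pure-Python sketch below mimics that behaviour for the four product labels in this dataset. It is an illustration only, not the library's implementation, and `label_to_id` is a hypothetical name.

```python
labels = [
    "Debt collection",
    "Mortgage",
    "Checking or savings account",
    "Money transfer, virtual currency, or money service",
    "Debt collection",
]

# Unique labels in sorted order, the same ordering LabelEncoder stores in classes_.
classes = sorted(set(labels))
label_to_id = {label: i for i, label in enumerate(classes)}

encoded = [label_to_id[label] for label in labels]
print(label_to_id)
print(encoded)
```

Knowing this order helps when reading raw model outputs, since index `i` of the softmax vector corresponds to `label_encoder.classes_[i]`.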

### Train/Test Split

In [15]:
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_text'],
    df['label_encoded'],
    test_size=0.2,
    random_state=42
)


### Tokenization

In [16]:
MAX_WORDS = 20000
MAX_LEN = 150

tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(X_train)

X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)

X_train_pad = pad_sequences(X_train_seq, maxlen=MAX_LEN)
X_test_pad = pad_sequences(X_test_seq, maxlen=MAX_LEN)
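
With its default arguments, `pad_sequences` pads and truncates at the front (`padding='pre'`, `truncating='pre'`). Complaints longer than `MAX_LEN` therefore keep their last 150 tokens, not their first. The sketch below mimics that behaviour in plain Python; `pad_pre` is a hypothetical stand-in, not the Keras function.

```python
def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults: pad at the front, truncate from the front."""
    seq = list(seq)[-maxlen:]                    # keep only the last `maxlen` tokens
    return [value] * (maxlen - len(seq)) + seq   # left-pad short sequences


short = pad_pre([5, 8, 2], maxlen=5)
long = pad_pre([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)
print(long)
```

If the opening sentences of a complaint carry more signal than the ending, passing `truncating='post'` to `pad_sequences` would keep the beginning instead.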


### LSTM Model

In [17]:
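# Embedding -> single LSTM layer -> dropout -> softmax over the product classes.
# Note: `input_length` is deprecated in Keras 3, where it only triggers a warning and can be removed.
lstm_model = Sequential([
    Embedding(MAX_WORDS, 128, input_length=MAX_LEN),
    LSTM(128, return_sequences=False),
    Dropout(0.3),
    Dense(num_classes, activation='softmax')
])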

lstm_model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

lstm_model.summary()
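
The summary output is not included here, but the parameter counts can be worked out by hand. The sketch below assumes the standard Keras layers with bias terms and the four product classes found in this dataset. The constant names `EMBED_DIM`, `LSTM_UNITS`, and `NUM_CLASSES` are introduced only for this calculation.

```python
MAX_WORDS = 20000
EMBED_DIM = 128
LSTM_UNITS = 128
NUM_CLASSES = 4  # assumption: the four Product labels shown earlier

# Embedding: one EMBED_DIM-sized vector per vocabulary index.
embedding_params = MAX_WORDS * EMBED_DIM

# LSTM: 4 gates, each with input weights, recurrent weights, and a bias vector.
lstm_params = 4 * ((EMBED_DIM + LSTM_UNITS) * LSTM_UNITS + LSTM_UNITS)

# Dense softmax layer: weights plus one bias per class. Dropout has no parameters.
dense_params = LSTM_UNITS * NUM_CLASSES + NUM_CLASSES

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```

Almost all of the parameters sit in the embedding table, so `MAX_WORDS` and the embedding width dominate the model size.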




### Train LSTM

In [18]:
lstm_model.fit(
    X_train_pad,
    y_train,
    validation_data=(X_test_pad, y_test),
    epochs=3,
    batch_size=64
)


Epoch 1/3
2859/2859 ━━━━━━━━━━━━━━━━━━━━ 507s 176ms/step - accuracy: 0.9034 - loss: 0.2880 - val_accuracy: 0.9055 - val_loss: 0.2697
Epoch 2/3
2859/2859 ━━━━━━━━━━━━━━━━━━━━ 520s 182ms/step - accuracy: 0.9299 - loss: 0.2075 - val_accuracy: 0.9319 - val_loss: 0.1944
Epoch 3/3
2859/2859 ━━━━━━━━━━━━━━━━━━━━ 506s 177ms/step - accuracy: 0.9429 - loss: 0.1620 - val_accuracy: 0.9372 - val_loss: 0.1777


<keras.src.callbacks.history.History at 0x2d051152c50>
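
The 2859 steps per epoch in the log follow from the dataset size, the 80/20 split, and the batch size. The arithmetic below is a sketch that assumes the test portion is rounded up to a whole number of rows, with the remainder used for training.

```python
import math

n_samples = 100059 + 63405 + 51119 + 14114  # rows per label from value_counts()
test_size = 0.2
batch_size = 64

n_test = math.ceil(test_size * n_samples)   # assumption: test rows rounded up
n_train = n_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)
```

Note that the same held-out split is passed as `validation_data`, so the reported `val_accuracy` is measured on the test set itself.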

### Save Models (Important)

In [19]:
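import os

# Create the output folder first; writing the files fails if "models/" does not exist.
os.makedirs("models", exist_ok=True)

lstm_model.save("models/lstm_model.h5")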

with open("models/tokenizer.pkl", "wb") as f:
    pickle.dump(tokenizer, f)

with open("models/label_encoder.pkl", "wb") as f:
    pickle.dump(label_encoder, f)




### Transformer Models (Real Usage)

In [20]:
# Zero-shot category classification
zero_shot_classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    framework="pt"
)

# Sentiment
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    framework="pt"
)


Device set to use cpu
Device set to use cpu


### Category List

In [21]:
CATEGORIES = list(label_encoder.classes_)

### Final Analysis Function

In [22]:
def analyze_complaint(text):
    cleaned = clean_text(text)

    # Transformer category
    tf_result = zero_shot_classifier(text, CATEGORIES)
    transformer_category = tf_result['labels'][0]

    # LSTM category
    seq = tokenizer.texts_to_sequences([cleaned])
    pad = pad_sequences(seq, maxlen=MAX_LEN)
    lstm_pred = lstm_model.predict(pad)
    lstm_category = label_encoder.inverse_transform(
        [np.argmax(lstm_pred)]
    )[0]

    # Sentiment
    sentiment = sentiment_pipeline(text)[0]['label']
    priority = "High" if sentiment == "NEGATIVE" else "Medium"

    # Root cause + Reply
    # Keys must match the labels in CATEGORIES exactly; otherwise .get() returns None.
    root_cause_map = {
        "Checking or savings account": "Unauthorized transaction",
        "Debt collection": "Incorrect debt collection",
        "Money transfer, virtual currency, or money service": "Transaction failure",
        "Mortgage": "Loan servicing issue",
        "Credit reporting": "Incorrect credit report"  # not among this dataset's labels
    }

    reply_map = {
        "Checking or savings account":
        "We apologize for the unauthorized transaction. Our support team is reviewing your account and will resolve this at the earliest.",

        "Debt collection":
        "We apologize for the debt collection issue. Our team is investigating this urgently.",

        "Money transfer, virtual currency, or money service":
        "We regret the inconvenience caused by the failed transfer. Our team is working on it.",

        "Mortgage":
        "We understand your concern regarding your mortgage. Our team is addressing this issue.",

        "Credit reporting":  # not among this dataset's labels
        "We apologize for the incorrect credit reporting. Our team will coordinate with the bureau."
    }

    return {
        "Transformer Category": transformer_category,
        "LSTM Category": lstm_category,
        "Sentiment": sentiment,
        "Priority": priority,
        "Root Cause": root_cause_map.get(transformer_category),
        "Automated Reply": reply_map.get(transformer_category)
    }
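
Because `dict.get` returns `None` for a missing key, any category label without an exactly matching entry in the root cause or reply mapping produces an empty result. A quick coverage check can catch this before deployment. In the sketch below, `find_unmapped` and `example_map` are hypothetical names, and the shortened key is included on purpose to show a mismatch.

```python
def find_unmapped(categories, mapping):
    """Return the category labels that have no exact key in `mapping`."""
    return [c for c in categories if c not in mapping]


categories = [
    "Checking or savings account",
    "Debt collection",
    "Money transfer, virtual currency, or money service",
    "Mortgage",
]

example_map = {
    "Checking or savings account": "Unauthorized transaction",
    "Debt collection": "Incorrect debt collection",
    "Money transfer": "Transaction failure",  # shortened key: does not match the full label
    "Mortgage": "Loan servicing issue",
}

unmapped = find_unmapped(categories, example_map)
print(unmapped)
```

Running such a check against `CATEGORIES` at startup would surface mismatched keys immediately instead of at the first affected complaint.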


### Test

In [23]:
test_complaint = """
There was an unauthorized charge deducted from my savings account
without my consent. Customer support did not help.
"""

analyze_complaint(test_complaint)


1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 900ms/step


{'Transformer Category': 'Checking or savings account',
 'LSTM Category': 'Checking or savings account',
 'Sentiment': 'NEGATIVE',
 'Priority': 'High',
 'Root Cause': 'Unauthorized transaction',
 'Automated Reply': 'We apologize for the unauthorized transaction. Our support team is reviewing your account and will resolve this at the earliest.'}