# Overview

### This repository contains Python code that classifies consumer complaints into predefined categories. Each label is derived from the product named in the complaint, and the models are trained on the complaint narrative text. There are four main categories: Credit reporting, repair, or other; Debt collection; Consumer Loan; and Mortgage. Complaints whose product matches none of these are labeled Other.

In [1]:
import pandas as pd 
df=pd.read_csv(r"C:\Users\mdeva\Desktop\Data Analysis\complaints.csv")
df.head(50)


Unnamed: 0,Date received,Product,Sub-product,Issue,Sub-issue,Consumer complaint narrative,Company public response,Company,State,ZIP code,Tags,Consumer consent provided?,Submitted via,Date sent to company,Company response to consumer,Timely response?,Consumer disputed?,Complaint ID
0,2024-01-08,Debt collection,Credit card debt,Written notification about debt,Didn't receive enough information to verify debt,,,Experian Information Solutions Inc.,HI,96815,,,Web,2024-01-08,In progress,Yes,,8121088
1,2023-12-07,Credit reporting or other personal consumer re...,Credit reporting,Problem with a company's investigation into an...,Investigation took more than 30 days,,,"TRANSUNION INTERMEDIATE HOLDINGS, INC.",IL,60615,,,Web,2023-12-07,In progress,Yes,,7963560
2,2023-12-07,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Account status incorrect,,,"TRANSUNION INTERMEDIATE HOLDINGS, INC.",IL,604XX,,Other,Web,2023-12-07,In progress,Yes,,7963773
3,2024-01-08,Credit reporting or other personal consumer re...,Credit reporting,Problem with a company's investigation into an...,Their investigation did not fix an error on yo...,,,Experian Information Solutions Inc.,CA,94109,,,Web,2024-01-08,In progress,Yes,,8121175
4,2024-01-08,Debt collection,I do not know,Written notification about debt,Didn't receive enough information to verify debt,,,Experian Information Solutions Inc.,CA,90715,,,Web,2024-01-08,In progress,Yes,,8121453
5,2023-12-26,Credit reporting or other personal consumer re...,Credit reporting,Improper use of your report,Credit inquiries on your report that you don't...,,,Experian Information Solutions Inc.,NY,11375,,,Web,2023-12-26,In progress,Yes,,8061372
6,2023-12-26,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Personal information incorrect,,,Experian Information Solutions Inc.,NY,10977,,,Web,2023-12-26,In progress,Yes,,8061373
7,2023-12-26,Credit card,General-purpose credit card or charge card,Problem with a company's investigation into an...,Was not notified of investigation status or re...,,,Experian Information Solutions Inc.,TX,76018,,,Web,2023-12-26,In progress,Yes,,8061447
8,2023-12-17,Student loan,Federal student loan servicing,Improper use of your report,Reporting company used your report improperly,,,"Maximus Education, LLC dba Aidvantage",OH,43004,,,Web,2023-12-17,In progress,Yes,,8011657
9,2023-12-26,Credit reporting or other personal consumer re...,Credit reporting,Incorrect information on your report,Account status incorrect,,,Experian Information Solutions Inc.,CA,92114,,,Web,2023-12-26,In progress,Yes,,8061466


In [2]:
# Derive a coarse label from 'Product'. The first matching keyword wins, so any
# product containing "credit" (including "Credit card") maps to the first category.
df['Category'] = df['Product'].apply(lambda x: 'Credit reporting, repair, or other' if 'credit' in x.lower() else
                                        'Debt collection' if 'debt' in x.lower() else
                                        'Consumer Loan' if 'loan' in x.lower() else
                                        'Mortgage' if 'mortgage' in x.lower() else 'Other')

# Keep only rows that have a complaint narrative; rows without text cannot be vectorized.
df = df.dropna(subset=['Category', 'Consumer complaint narrative'])
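
The keyword mapping above is order-sensitive: a product is assigned to the first keyword it contains. The standalone sketch below restates that logic as a plain function so the ordering effect is easy to check. The function name `categorize_product` and the sample product strings are illustrative assumptions, not part of the original notebook.

```python
def categorize_product(product: str) -> str:
    """Map a Product string to a coarse category; the first matching keyword wins."""
    text = product.lower()
    # Rules are checked in this order, mirroring the chained conditional in the notebook.
    rules = [
        ("credit", "Credit reporting, repair, or other"),
        ("debt", "Debt collection"),
        ("loan", "Consumer Loan"),
        ("mortgage", "Mortgage"),
    ]
    for keyword, category in rules:
        if keyword in text:
            return category
    return "Other"


# Illustrative product names (assumed examples, not an exhaustive list).
examples = [
    "Debt collection",
    "Credit card",
    "Student loan",
    "Mortgage",
    "Checking or savings account",
]
for product in examples:
    print(f"{product!r} -> {categorize_product(product)}")
```

Note that "Credit card" falls into the credit reporting category because it contains "credit", which may or may not be the intended grouping.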

## Model Evaluation

### The code prints a per-class classification report and the overall accuracy for both the Multinomial Naive Bayes and Linear SVC classifiers, each evaluated on a held-out 20% test split.

In [4]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(
    df[['Consumer complaint narrative']],
    df['Category'],
    test_size=0.2,
    random_state=42
)

vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train['Consumer complaint narrative'])
X_test_tfidf = vectorizer.transform(X_test['Consumer complaint narrative'])


clf = MultinomialNB()
clf.fit(X_train_tfidf, y_train)

y_pred = clf.predict(X_test_tfidf)

print("Classification Report:")
print(classification_report(y_test, y_pred))
print("\nAccuracy:", accuracy_score(y_test, y_pred))

Classification Report:
                                    precision    recall  f1-score   support

                     Consumer Loan       0.69      0.68      0.69     19384
Credit reporting, repair, or other       0.90      0.90      0.90    212111
                   Debt collection       0.80      0.63      0.71     46198
                          Mortgage       0.87      0.90      0.88     22429
                             Other       0.66      0.89      0.76     27825

                          accuracy                           0.85    327947
                         macro avg       0.78      0.80      0.79    327947
                      weighted avg       0.85      0.85      0.85    327947


Accuracy: 0.8463928622612801
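
The `macro avg` and `weighted avg` rows differ because the classes are heavily imbalanced. As a small worked check, the sketch below recomputes both precision averages from the rounded per-class values printed in the report above. The variable names are illustrative, and because the inputs are rounded to two decimals the results are approximate.

```python
# Per-class precision and support, copied from the Multinomial Naive Bayes report.
precision = {
    "Consumer Loan": 0.69,
    "Credit reporting, repair, or other": 0.90,
    "Debt collection": 0.80,
    "Mortgage": 0.87,
    "Other": 0.66,
}
support = {
    "Consumer Loan": 19384,
    "Credit reporting, repair, or other": 212111,
    "Debt collection": 46198,
    "Mortgage": 22429,
    "Other": 27825,
}

total = sum(support.values())

# Macro average: every class counts equally, regardless of size.
macro = sum(precision.values()) / len(precision)

# Weighted average: each class contributes in proportion to its support.
weighted = sum(precision[c] * support[c] for c in precision) / total

print(f"total support:          {total}")
print(f"macro avg precision:    {macro:.2f}")
print(f"weighted avg precision: {weighted:.2f}")
```

The macro figure comes out at 0.78 and the weighted figure at 0.85, matching the report. The weighted average is pulled up by the large credit reporting class, which accounts for roughly two thirds of the test set.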


In [5]:
new_complaint = ["I have an issue with my mortgage payment."]
new_complaint_tfidf = vectorizer.transform(new_complaint)
predicted_category = clf.predict(new_complaint_tfidf)

print("\nPredicted Category for New Complaint:", predicted_category[0])


Predicted Category for New Complaint: Mortgage


In [6]:
from sklearn.svm import LinearSVC
X_train, X_test, y_train, y_test = train_test_split(
    df[['Consumer complaint narrative']],
    df['Category'],
    test_size=0.2,
    random_state=42
)

vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train['Consumer complaint narrative'])
X_test_tfidf = vectorizer.transform(X_test['Consumer complaint narrative'])

linear_svc = LinearSVC()
linear_svc.fit(X_train_tfidf, y_train)

y_pred_linear_svc = linear_svc.predict(X_test_tfidf)

print("Linear SVC Classification Report:")
print(classification_report(y_test, y_pred_linear_svc))
print("\nAccuracy:", accuracy_score(y_test, y_pred_linear_svc))


Linear SVC Classification Report:
                                    precision    recall  f1-score   support

                     Consumer Loan       0.80      0.69      0.74     19384
Credit reporting, repair, or other       0.91      0.95      0.93    212111
                   Debt collection       0.81      0.72      0.77     46198
                          Mortgage       0.90      0.90      0.90     22429
                             Other       0.86      0.87      0.87     27825

                          accuracy                           0.89    327947
                         macro avg       0.86      0.83      0.84    327947
                      weighted avg       0.89      0.89      0.89    327947


Accuracy: 0.8905707324659168


In [7]:
new_complaint = ["Exeptionally high student loan"]
new_complaint_tfidf = vectorizer.transform(new_complaint)
predicted_category_linear_svc = linear_svc.predict(new_complaint_tfidf)

print("\nPredicted Category for New Complaint (Linear SVC):", predicted_category_linear_svc[0])



Predicted Category for New Complaint (Linear SVC): Consumer Loan
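
Both experiments above repeat the same split and vectorization steps before fitting a classifier. One possible way to avoid that repetition is scikit-learn's `make_pipeline`, which bundles the vectorizer and classifier into a single object that accepts raw text. The sketch below is a minimal illustration on a tiny made-up dataset, not the notebook's actual data or the author's method. The helper `build_models` and the sample sentences are assumptions introduced for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in for the complaint narratives and their categories (illustrative only).
texts = [
    "my mortgage payment was misapplied by the servicer",
    "mortgage escrow account was calculated incorrectly",
    "a debt collector keeps calling about a debt I do not owe",
    "the collector never sent written validation of the debt",
    "my student loan balance increased without explanation",
    "the auto loan lender charged fees on my loan",
]
labels = [
    "Mortgage", "Mortgage",
    "Debt collection", "Debt collection",
    "Consumer Loan", "Consumer Loan",
]


def build_models():
    """Return one TF-IDF + classifier pipeline per model name (hypothetical helper)."""
    return {
        "MultinomialNB": make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB()),
        "LinearSVC": make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC()),
    }


predictions = {}
for name, model in build_models().items():
    # Each pipeline fits its own vectorizer, then its classifier, on the raw text.
    model.fit(texts, labels)
    predictions[name] = model.predict(["I have an issue with my mortgage escrow"])[0]
    print(name, "->", predictions[name])
```

With a pipeline, new complaints can be passed to `predict` as plain strings, so there is no separate `vectorizer.transform` call to keep in sync with the classifier.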
