## Problem Statement 

You need to build a model that is able to classify customer complaints based on the products/services. By doing so, you can segregate these tickets into their relevant categories and, therefore, help in the quick resolution of the issue.

You will be doing topic modelling on the <b>.json</b> data provided by the company. Since this data is not labelled, you need to apply NMF to analyse patterns and classify tickets into the following five clusters based on their products/services:

* Credit card / Prepaid card

* Bank account services

* Theft/Dispute reporting

* Mortgages/loans

* Others 


With the help of topic modelling, you will be able to map each ticket onto its respective department/category. You can then use this data to train any supervised model such as logistic regression, decision tree or random forest. Using this trained model, you can classify any new customer complaint support ticket into its relevant department.

## Pipeline steps to be performed:

You need to perform the following eight major tasks to complete the assignment:

1. Data loading

2. Text preprocessing

3. Exploratory data analysis (EDA)

4. Feature extraction

5. Topic modelling 

6. Model building using supervised learning

7. Model training and evaluation

8. Model inference

## Importing the necessary libraries

In [58]:
import json 
import statsmodels.api as sm
import numpy as np
import pandas as pd
import re, nltk, spacy, string
import en_core_web_sm
nlp = en_core_web_sm.load()
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from collections import Counter
from sklearn.model_selection import train_test_split

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer,TfidfTransformer
from nltk.stem import WordNetLemmatizer
import nltk
nltk.download('wordnet')
nltk.download('punkt')
nltk.download('stopwords') 
nltk.download('averaged_perceptron_tagger')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk import bigrams
from nltk import trigrams

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/arunprakash/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /Users/arunprakash/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/arunprakash/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/arunprakash/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


## Loading the data

The data is in JSON format and we need to convert it to a dataframe.
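
As a rough illustration of that conversion, the sketch below flattens two nested records with `pd.json_normalize`. The records are made-up stand-ins shaped like the export (metadata at the top level, complaint fields nested under `_source`); they are not rows from the real file.

```python
import pandas as pd

# Hypothetical records: top-level metadata plus a nested "_source" dict.
records = [
    {"_id": "1", "_source": {"product": "Mortgage", "complaint_what_happened": "late fee"}},
    {"_id": "2", "_source": {"product": "Credit card", "complaint_what_happened": ""}},
]

# Nested keys are joined to their parent key with "." to form flat column names.
flat = pd.json_normalize(records)
print(sorted(flat.columns))
print(flat.shape)
```

This is why the flattened columns carry a `_source.` prefix until they are renamed.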

In [2]:
pd.set_option('display.max_colwidth', None)

# Display the entire DataFrame without truncation
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.width', None)       # Adjust display width

In [3]:
# Write the path to your data file and load it
with open('/Users/arunprakash/Desktop/complaints-2021-05-14_08_16.json', 'r') as file:
    data = json.load(file)  # json.load returns the parsed JSON as Python objects

# Flatten the nested records into a dataframe
df = pd.json_normalize(data)

## Data preparation

In [4]:
df.shape

(78313, 22)

In [5]:
# Inspect the dataframe to understand the given data.
df.head(5).T

Unnamed: 0,0,1,2,3,4
_index,complaint-public-v2,complaint-public-v2,complaint-public-v2,complaint-public-v2,complaint-public-v2
_type,complaint,complaint,complaint,complaint,complaint
_id,3211475,3229299,3199379,2673060,3203545
_score,0.0,0.0,0.0,0.0,0.0
_source.tags,,Servicemember,,,
_source.zip_code,90301,319XX,77069,48066,10473
_source.complaint_id,3211475,3229299,3199379,2673060,3203545
_source.issue,Attempts to collect debt not owed,Written notification about debt,"Other features, terms, or problems",Trouble during payment process,Fees or interest
_source.date_received,2019-04-13T12:00:00-05:00,2019-05-01T12:00:00-05:00,2019-04-02T12:00:00-05:00,2017-09-13T12:00:00-05:00,2019-04-05T12:00:00-05:00
_source.state,CA,GA,TX,MI,NY


In [6]:
#print the column names
df.columns
data = df.copy()

In [7]:
data.shape

(78313, 22)

In [8]:
#Assign new column names
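df.columns = df.columns.str.replace("_source.", "", regex=False)  # literal match, so the dot is not a wildcard
df.columns = df.columns.str.replace(r"^_(\w)", r"\1", regex=True)  # drop a single leading underscore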
df.columns


Index(['index', 'type', 'id', 'score', 'tags', 'zip_code', 'complaint_id',
       'issue', 'date_received', 'state', 'consumer_disputed', 'product',
       'company_response', 'company', 'submitted_via', 'date_sent_to_company',
       'company_public_response', 'sub_product', 'timely',
       'complaint_what_happened', 'sub_issue', 'consumer_consent_provided'],
      dtype='object')
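
The effect of the two renaming steps can be checked on a handful of names without pandas. This is a minimal sketch; `clean_name` is a hypothetical helper that mirrors the two replacements on plain strings.

```python
import re

raw_columns = ["_index", "_id", "_source.zip_code", "_source.complaint_what_happened"]

def clean_name(name: str) -> str:
    # Step 1: remove the literal "_source." prefix (plain string replace, no regex).
    name = name.replace("_source.", "")
    # Step 2: drop one leading underscore, keeping the character that follows it.
    return re.sub(r"^_(\w)", r"\1", name)

renamed = [clean_name(c) for c in raw_columns]
print(renamed)  # ['index', 'id', 'zip_code', 'complaint_what_happened']
```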

In [9]:
#Assign nan in place of blanks in the complaints column
df['complaint_what_happened']=df['complaint_what_happened'].replace(r"^\s*$", np.nan, regex=True)


In [10]:
#Remove all rows where complaints column is nan
df.dropna(subset=['complaint_what_happened'],inplace=True)

## Prepare the text for topic modeling

Once you have removed all the blank complaints, you need to:

* Make the text lowercase
* Remove text in square brackets
* Remove punctuation
* Remove words containing numbers


Once you have done these cleaning operations you need to perform the following:
* Lemmatize the texts
* Extract the POS tags of the lemmatized text and remove all the words which have tags other than NN[tag == "NN"].
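
The four cleaning operations can be sketched with the standard `re` module alone. This is an illustration under assumptions, not the notebook's `preprocess` function: `basic_clean` and the sample sentence are made up, and here a token containing any digit is dropped whole.

```python
import re

def basic_clean(text: str) -> str:
    text = text.lower()                           # make the text lowercase
    text = re.sub(r"\[.*?\]", " ", text)          # remove text in square brackets
    text = re.sub(r"[^\w\s]", " ", text)          # remove punctuation
    text = re.sub(r"\b\w*\d\w*\b", " ", text)     # remove whole words containing a digit
    return " ".join(text.split())                 # collapse the leftover whitespace

sample = "Called [redacted] on 10/12/2018, waited 10minutes!"
print(basic_clean(sample))  # called on waited
```

Replacing matches with a space rather than an empty string keeps neighbouring words from being glued together.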


In [11]:
# Write your function here to clean the text and remove all the unnecessary elements.
def preprocess(document):
    """Lowercase, strip bracketed text, punctuation and digits, remove stop words, then lemmatize."""
    # Make the text lowercase
    document = document.lower()
    # Remove text in square brackets, and in curly brackets (used for masked amounts such as {$300.00})
    document = re.sub(r"\[.*?\]|\{.*?\}", "", document)
    # Remove punctuation
    document = re.sub(r"[^\w\s]", "", document)
    # Remove digits (this strips the digits only, so "10minutes" becomes "minutes")
    document = re.sub(r"\d", "", document)
    # Replace newlines with a space so that words on adjacent lines are not joined
    document = re.sub(r"\n", " ", document)
    # Tokenize into words
    words = word_tokenize(document)
    # Remove English stop words
    stop_words = set(stopwords.words('english'))
    filtered_words = [word for word in words if word not in stop_words]
    # Lemmatize each word, treating it as a verb
    lemmatizer = WordNetLemmatizer()
    words = [lemmatizer.lemmatize(word, pos='v') for word in filtered_words]
    return " ".join(words)

In [12]:
#Create a dataframe('df_clean') that will have only the complaints and the lemmatized complaints 
df_clean=pd.DataFrame()
df_clean['complaints']=df['complaint_what_happened']

In [13]:
#Write your function to Lemmatize the texts
df_clean['lemmatized complaints']= df['complaint_what_happened'].apply(preprocess)

In [14]:
#Write your function to extract the POS tags 

def pos_tag_text(text):
    pos_tags = pos_tag(word_tokenize(text))  # Tokenize and tag words
    nn_tags = []                             # List to store nouns (NN)
    
    for word, tag in pos_tags:               # Loop through tagged words
        if tag == 'NN':                      # Check if the tag is 'NN'
            nn_tags.append((word, tag))      # Append to the list
    
    return nn_tags if nn_tags else None      # Return list or None if empty

# Perform POS tagging
df_clean["complaint_POS_removed"] = df_clean['lemmatized complaints'].apply(pos_tag_text)

#this column should contain lemmatized text with all the words removed which have tags other than NN[tag == "NN"].
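
The cell above stores `(word, tag)` tuples, whereas the note asks for text. One possible way to turn a tagged list into a noun-only string is sketched below; `keep_singular_nouns` is a hypothetical helper and the tags are hand-written for illustration (in the notebook they would come from `pos_tag`).

```python
from typing import List, Tuple

def keep_singular_nouns(tagged: List[Tuple[str, str]]) -> str:
    """Join the words whose tag is exactly 'NN' (plural 'NNS' and all other tags are dropped)."""
    return " ".join(word for word, tag in tagged if tag == "NN")

# Hand-written example tags, not real tagger output.
tagged = [("chase", "NN"), ("closed", "VBD"), ("accounts", "NNS"), ("account", "NN")]
print(keep_singular_nouns(tagged))  # chase account
```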


In [15]:
#The clean dataframe should now contain the raw complaint, lemmatized complaint and the complaint after removing POS tags.
df_clean.shape

(21072, 3)

## Exploratory data analysis to get familiar with the data.

Write the code in this task to perform the following:

*   Visualise the data according to the complaint character length
*   Using a word cloud, find the top 40 words by frequency among all the complaints after processing the text
*   Find the top unigrams, bigrams and trigrams by frequency among all the complaints after processing the text




In [None]:

transformed_data=np.log1p(df_clean.complaints.str.len())
sns.displot(transformed_data)
plt.show()

#### Find the top 40 words by frequency among all the complaints after processing the text.

In [None]:
!pip install wordcloud

In [None]:
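#Using a word cloud find the top 40 words by frequency among all the complaints after processing the text
from wordcloud import WordCloud
stop_words = set(stopwords.words('english'))
# Join the nouns from every complaint; str() on the column would only give its truncated display text
noun_text = " ".join(word for tags in df_clean.complaint_POS_removed.dropna() for word, _ in tags)
wordcloud = WordCloud(stopwords=stop_words,max_words=40).generate(noun_text)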
plt.figure(figsize=(10,6))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
    

In [None]:
#Removing -PRON- from the text corpus
#df_clean['Complaint_clean'] = df_clean['complaint_POS_removed'].str.replace('-PRON-', '')

#### Find the top unigrams,bigrams and trigrams by frequency among all the complaints after processing the text.

#Write your code here to find the top 30 unigrams by frequency among the complaints in the cleaned dataframe (df_clean).


In [None]:
# Drop null rows and ensure all complaints are strings
df_clean = df_clean.dropna(subset=['lemmatized complaints'])
df_clean['lemmatized complaints'] = df_clean['lemmatized complaints'].astype(str)

In [None]:

# Tokenize and flatten the list (a comprehension avoids the quadratic cost of summing lists)
tokens = [token for doc in df_clean['lemmatized complaints'].str.split() for token in doc]

# Count unigram frequencies
counts = Counter(tokens)

# Get the top 30 unigrams
top_30_unigrams = counts.most_common(30)




In [None]:
# Display the results
print("Top 30 Unigrams:")
for word, count in top_30_unigrams:
    print(f"{word}: {count}")

In [None]:
#Print the top 10 words in the unigram frequency
top_10_unigrams = counts.most_common(10)
print("top 10 words in unigram frequency")
print(top_10_unigrams)

In [None]:
#Write your code here to find the top 30 bigrams by frequency among the complaints in the cleaned dataframe (df_clean).

tokens = [token for doc in df_clean['lemmatized complaints'].str.split() for token in doc]
bigram_list = list(bigrams(tokens))  # separate name so that nltk's bigrams() is not overwritten
counts = Counter(bigram_list)

# Get the top 30 bigrams
top_30_bigrams = counts.most_common(30)

In [None]:
# Display the results
print("Top 30 bigrams:")
for word, count in top_30_bigrams:
    print(f"{word}: {count}")

In [None]:
#Print the top 10 words in the bigram frequency
top_10_bigrams = counts.most_common(10)
print("top 10 words in bigram frequency")
print(top_10_bigrams)

In [None]:
#Write your code here to find the top 30 trigrams by frequency among the complaints in the cleaned dataframe (df_clean).

tokens = [token for doc in df_clean['lemmatized complaints'].str.split() for token in doc]
trigram_list = list(trigrams(tokens))  # separate name so that nltk's trigrams() is not overwritten
counts = Counter(trigram_list)

# Get the top 30 trigrams
top_30_trigrams = counts.most_common(30)

In [None]:
# Display the results
print("Top 30 trigrams:")
for word, count in top_30_trigrams:
    print(f"{word}: {count}")

In [None]:
#Print the top 10 words in the trigram frequency
top_10_trigrams = counts.most_common(10)
print("top 10 words in trigram frequency")
print(top_10_trigrams)
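
The n-gram cells above flatten every complaint into one token list, so a few counted n-grams span the end of one complaint and the start of the next. A minimal sketch of counting per complaint instead is shown below; `ngrams`, `top_ngrams` and the three toy documents are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def ngrams(tokens: List[str], n: int) -> Iterable[Tuple[str, ...]]:
    # Zip the token list against itself shifted by 1..n-1 positions.
    return zip(*(tokens[i:] for i in range(n)))

def top_ngrams(docs: Iterable[str], n: int, k: int):
    counts = Counter()
    for doc in docs:                       # count inside each document separately
        counts.update(ngrams(doc.split(), n))
    return counts.most_common(k)

docs = ["chase bank account", "chase bank card", "credit card"]
print(top_ngrams(docs, 2, 1))  # [(('chase', 'bank'), 2)]
```

With this approach the pair `('account', 'chase')`, which only exists across the boundary between the first two documents, is never counted.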

## The customers' personal details have been masked in the dataset with xxxx. Let's remove the masked text, as it is of no use for our analysis

In [16]:
def xremoval(document):
    
    # remove masked text: any run of four or more 'x' characters
    pattern = r"x{4,}"
    document = re.sub(pattern,"",document)
    
    return document

In [17]:
#df_clean['Complaint_clean'] = df_clean['Complaint_clean'].str.replace('xxxx','')
df_clean['lemmatized complaints_clean']= df_clean['lemmatized complaints'].apply(xremoval)

In [19]:
#All masked text has been removed
df_clean.head()

Unnamed: 0,complaints,lemmatized complaints,complaint_POS_removed,lemmatized complaints_clean
1,Good morning my name is XXXX XXXX and I appreciate it if you could help me put a stop to Chase Bank cardmember services. \nIn 2018 I wrote to Chase asking for debt verification and what they sent me a statement which is not acceptable. I am asking the bank to validate the debt. Instead I been receiving mail every month from them attempting to collect a debt. \nI have a right to know this information as a consumer. \n\nChase account # XXXX XXXX XXXX XXXX Thanks in advance for your help.,good morning name xxxx xxxx appreciate could help put stop chase bank cardmember service write chase ask debt verification send statement acceptable ask bank validate debt instead receive mail every month attempt collect debt right know information consumer chase account xxxx xxxx xxxx xxxx thank advance help,"[(morning, NN), (name, NN), (appreciate, NN), (chase, NN), (bank, NN), (cardmember, NN), (service, NN), (chase, NN), (debt, NN), (verification, NN), (statement, NN), (bank, NN), (debt, NN), (mail, NN), (month, NN), (attempt, NN), (debt, NN), (right, NN), (information, NN), (consumer, NN), (chase, NN), (account, NN), (advance, NN), (help, NN)]",good morning name appreciate could help put stop chase bank cardmember service write chase ask debt verification send statement acceptable ask bank validate debt instead receive mail every month attempt collect debt right know information consumer chase account thank advance help
2,I upgraded my XXXX XXXX card in XX/XX/2018 and was told by the agent who did the upgrade my anniversary date would not change. It turned the agent was giving me the wrong information in order to upgrade the account. XXXX changed my anniversary date from XX/XX/XXXX to XX/XX/XXXX without my consent! XXXX has the recording of the agent who was misled me.,upgrade xxxx xxxx card xxxx tell agent upgrade anniversary date would change turn agent give wrong information order upgrade account xxxx change anniversary date xxxxxxxx xxxxxxxx without consent xxxx record agent mislead,"[(card, NN), (tell, NN), (agent, NN), (date, NN), (information, NN), (order, NN), (upgrade, NN), (account, NN), (change, NN), (anniversary, NN), (date, NN), (consent, NN), (xxxx, NN), (record, NN), (agent, NN), (mislead, NN)]",upgrade card tell agent upgrade anniversary date would change turn agent give wrong information order upgrade account change anniversary date without consent record agent mislead
10,"Chase Card was reported on XX/XX/2019. However, fraudulent application have been submitted my identity without my consent to fraudulently obtain services. Do not extend credit without verifying the identity of the applicant.",chase card report xxxx however fraudulent application submit identity without consent fraudulently obtain service extend credit without verify identity applicant,"[(card, NN), (report, NN), (application, NN), (submit, NN), (identity, NN), (consent, NN), (service, NN), (credit, NN), (identity, NN), (applicant, NN)]",chase card report however fraudulent application submit identity without consent fraudulently obtain service extend credit without verify identity applicant
11,"On XX/XX/2018, while trying to book a XXXX XXXX ticket, I came across an offer for {$300.00} to be applied towards the ticket if I applied for a rewards card. I put in my information for the offer and within less than a minute, was notified via the screen that a decision could not be made. I immediately contacted XXXX and was referred to Chase Bank. I then immediately contacted Chase bank within no more than 10minutes of getting the notification on the screen and I was told by the Chase representative I spoke with that my application was denied but she could not state why. I asked for more information about the XXXX offer and she explained that even if I had been approved, the credit offer only gets applied after the first account statement and could not be used to purchase the ticket. I then explicitly told her I was glad I got denied and I was ABSOLUTELY no longer interested in the account. I asked that the application be withdrawn and the representative obliged. This all happened no later than 10mins after putting in the application on XX/XX/2018. Notwithstanding my explicit request not to proceed with the application and contrary to what I was told by the Chase representative, Chase did in fact go ahead to open a credit account in my name on XX/XX/2018. 
This is now being reported in my Credit Report and Chase has refused to correct this information on my credit report even though they went ahead to process an application which I did not consent to and out of their error.",xxxx try book xxxx xxxx ticket come across offer apply towards ticket apply reward card put information offer within less minute notify via screen decision could make immediately contact xxxx refer chase bank immediately contact chase bank within minutes get notification screen tell chase representative speak application deny could state ask information xxxx offer explain even approve credit offer get apply first account statement could use purchase ticket explicitly tell glad get deny absolutely longer interest account ask application withdraw representative oblige happen later mins put application xxxx notwithstanding explicit request proceed application contrary tell chase representative chase fact go ahead open credit account name xxxx report credit report chase refuse correct information credit report even though go ahead process application consent error,"[(try, NN), (book, NN), (ticket, NN), (come, NN), (offer, NN), (apply, NN), (ticket, NN), (card, NN), (information, NN), (offer, NN), (notify, NN), (decision, NN), (refer, NN), (chase, NN), (bank, NN), (chase, NN), (bank, NN), (screen, NN), (speak, NN), (application, NN), (deny, NN), (state, NN), (information, NN), (xxxx, NN), (offer, NN), (credit, NN), (offer, NN), (account, NN), (statement, NN), (purchase, NN), (ticket, NN), (get, NN), (deny, NN), (interest, NN), (account, NN), (application, NN), (representative, NN), (oblige, NN), (application, NN), (xxxx, NN), (request, NN), (proceed, NN), (application, NN), (tell, NN), (chase, NN), (chase, NN), (fact, NN), (credit, NN), (account, NN), (name, NN), (report, NN), (credit, NN), (report, NN), (chase, NN), (information, NN), (credit, NN), (report, NN), (application, NN), (consent, NN), (error, NN)]",try book ticket come across offer apply 
towards ticket apply reward card put information offer within less minute notify via screen decision could make immediately contact refer chase bank immediately contact chase bank within minutes get notification screen tell chase representative speak application deny could state ask information offer explain even approve credit offer get apply first account statement could use purchase ticket explicitly tell glad get deny absolutely longer interest account ask application withdraw representative oblige happen later mins put application notwithstanding explicit request proceed application contrary tell chase representative chase fact go ahead open credit account name report credit report chase refuse correct information credit report even though go ahead process application consent error
14,my grand son give me check for {$1600.00} i deposit it into my chase account after fund clear my chase bank closed my account never paid me my money they said they need to speek with my grand son check was clear money was taking by my chase bank refuse to pay me my money my grand son called chase 2 times they told him i should call not him to verify the check owner he is out the country most the time date happen XX/XX/2018 check number XXXX claim number is XXXX with chase,grand son give check deposit chase account fund clear chase bank close account never pay money say need speek grand son check clear money take chase bank refuse pay money grand son call chase time tell call verify check owner country time date happen xxxx check number xxxx claim number xxxx chase,"[(son, NN), (check, NN), (deposit, NN), (chase, NN), (account, NN), (fund, NN), (bank, NN), (account, NN), (money, NN), (son, NN), (check, NN), (money, NN), (bank, NN), (refuse, NN), (pay, NN), (money, NN), (son, NN), (call, NN), (time, NN), (check, NN), (owner, NN), (country, NN), (time, NN), (date, NN), (number, NN), (claim, NN), (number, NN), (chase, NN)]",grand son give check deposit chase account fund clear chase bank close account never pay money say need speek grand son check clear money take chase bank refuse pay money grand son call chase time tell call verify check owner country time date happen check number claim number chase


## Feature Extraction
Convert the raw texts to a matrix of TF-IDF features

**max_df** is used for removing terms that appear too frequently, also known as "corpus-specific stop words"
max_df = 0.95 means "ignore terms that appear in more than 95% of the complaints"

**min_df** is used for removing terms that appear too infrequently
min_df = 2 means "ignore terms that appear in less than 2 complaints"
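
The two thresholds can be illustrated in plain Python by counting, for each term, how many documents contain it. This sketch only mimics the pruning rule described above on a made-up four-document corpus; it is not scikit-learn's implementation.

```python
from collections import Counter

docs = [
    "chase account closed",
    "chase card charge",
    "chase account deposit",
    "chase mortgage loan",
]
max_df, min_df = 0.95, 2

# Document frequency: the number of documents that contain each term at least once.
doc_freq = Counter(term for doc in docs for term in set(doc.split()))

# Keep terms found in at least min_df documents and in at most max_df of all documents.
kept = sorted(
    term for term, df in doc_freq.items()
    if df >= min_df and df <= max_df * len(docs)
)
print(kept)  # ['account']
```

Here `chase` appears in all four documents and is dropped by `max_df`, while the terms that occur in a single document are dropped by `min_df`.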

In [20]:
#Write your code here to initialise the TfidfVectorizer 
vectorizer = TfidfVectorizer(max_df=0.95, min_df=2, stop_words='english')


#### Create a document term matrix using fit_transform

The document term matrix is sparse: each stored entry is a (complaint_id, token_id) pair together with its tf-idf score.
Pairs that are not stored have a tf-idf score of 0.

In [22]:
#Write your code here to create the Document Term Matrix by transforming the complaints column present in df_clean.
dtm = vectorizer.fit_transform(df_clean['lemmatized complaints_clean'])


## Topic Modelling using NMF

Non-Negative Matrix Factorization (NMF) is an unsupervised technique, so the model is not trained on labelled topics. NMF decomposes (factorizes) the high-dimensional document-term matrix into two lower-dimensional matrices: one holds the weight of each topic in each complaint, and the other holds the weight of each term in each topic. Every entry of both matrices is non-negative.

In this task you have to perform the following:

* Find the best number of clusters 
* Apply the best number to create word clusters
* Inspect & validate the correctness of each cluster with respect to the complaints
* Correct the labels if needed 
* Map the clusters to topics/cluster names
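
To make the factorization concrete, here is a small sketch on a made-up four-document corpus. It assumes scikit-learn is installed; the corpus and the choice of two topics are for illustration only.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "account deposit check bank",
    "bank account check fund",
    "mortgage loan home payment",
    "loan mortgage property home",
]
X = TfidfVectorizer().fit_transform(docs)   # documents x terms, all entries >= 0

model = NMF(n_components=2, random_state=40, max_iter=1000)
W = model.fit_transform(X)                  # documents x topics
H = model.components_                       # topics x terms

print(W.shape, H.shape)
print(bool((W >= 0).all() and (H >= 0).all()))     # both factors are non-negative
print(np.linalg.norm(X.toarray() - W @ H))         # how closely W @ H approximates X
```

Each row of `W` describes one document as a mix of topics, and each row of `H` describes one topic as a mix of terms.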

In [23]:
from sklearn.decomposition import NMF

## Manual Topic Modeling
You need to take a trial-and-error approach to find the best number of topics for your NMF model.

The only parameter that is required is the number of components i.e. the number of topics we want. This is the most crucial step in the whole topic modeling process and will greatly affect how good your final topics are.

In [None]:
#Load your nmf_model with the n_components i.e 5
#Case1

In [None]:
num_topics = 15 #write the value you want to test out

#keep the random_state =40
nmf_model = NMF(n_components=num_topics,max_iter=1000,random_state=40)
W=nmf_model.fit_transform(dtm)
H=nmf_model.components_
#write your code here
df_clean.shape

In [None]:
# Top 15 words per topic

words = np.array(vectorizer.get_feature_names_out())
topic_words = pd.DataFrame(np.zeros((num_topics, 15)), index=[f'Topic {i + 1}' for i in range(num_topics)],
                           columns=[f'Word {i + 1}' for i in range(15)]).astype(str)
for i in range(num_topics):
    ix = H[i].argsort()[::-1][:15]
    topic_words.iloc[i] = words[ix]

topic_words

In [None]:
#Case2
num_topics = 10 #write the value you want to test out

#keep the random_state =40
nmf_model = NMF(n_components=num_topics,max_iter=1000,random_state=40)
W=nmf_model.fit_transform(dtm)
H=nmf_model.components_
#write your code here
df_clean.shape

In [None]:
# Top 15 words per topic

words = np.array(vectorizer.get_feature_names_out())
topic_words = pd.DataFrame(np.zeros((num_topics, 15)), index=[f'Topic {i + 1}' for i in range(num_topics)],
                           columns=[f'Word {i + 1}' for i in range(15)]).astype(str)
for i in range(num_topics):
    ix = H[i].argsort()[::-1][:15]
    topic_words.iloc[i] = words[ix]

topic_words


In [41]:
#Case3
num_topics = 5 #write the value you want to test out

#keep the random_state =40
nmf_model = NMF(n_components=num_topics,max_iter=1000,random_state=40)
W=nmf_model.fit_transform(dtm)
H=nmf_model.components_
len(vectorizer.get_feature_names_out())


12195

In [42]:
#Print the Top15 words for each of the topics
words = np.array(vectorizer.get_feature_names_out())
topic_words = pd.DataFrame(np.zeros((num_topics, 15)), index=[f'Topic {i + 1}' for i in range(num_topics)],
                           columns=[f'Word {i + 1}' for i in range(15)]).astype(str)
for i in range(num_topics):
    ix = H[i].argsort()[::-1][:15]
    topic_words.iloc[i] = words[ix]

topic_words.T

Unnamed: 0,Topic 1,Topic 2,Topic 3,Topic 4,Topic 5
Word 1,account,credit,loan,charge,payment
Word 2,check,report,mortgage,card,late
Word 3,bank,card,chase,chase,pay
Word 4,chase,inquiry,home,dispute,payments
Word 5,deposit,chase,modification,purchase,fee
Word 6,money,account,property,claim,balance
Word 7,close,remove,send,refund,make
Word 8,fund,inquiries,letter,receive,month
Word 9,tell,hard,request,merchant,statement
Word 10,open,apply,time,fraud,account


In [43]:
#Create the best topic for each complaint in terms of integer value 0,1,2,3 & 4

best_topics=W.argmax(axis=1)



In [44]:
#Assign the best topic to each of the complaints in Topic Column
df_clean['Topic'] =W.argmax(axis=1) #write your code to assign topics to each rows.
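
The `argmax` step picks, for every complaint, the topic with the largest weight in its row of `W`. A toy sketch with made-up weights is shown below; the `topic_names` mapping is hypothetical and only shows how integer ids could later be turned into readable labels once the topics have been inspected.

```python
import numpy as np

# Made-up document-topic weights (rows: complaints, columns: topics).
W_demo = np.array([
    [0.9, 0.1, 0.0],
    [0.2, 0.1, 0.7],
    [0.0, 0.8, 0.3],
])
topic_ids = W_demo.argmax(axis=1).tolist()   # index of the largest weight in each row

# Hypothetical id-to-name mapping for this demo only.
topic_names = {0: "Bank account services", 1: "Credit card / Prepaid card", 2: "Mortgages/loans"}
labels = [topic_names[i] for i in topic_ids]
print(topic_ids)  # [0, 2, 1]
print(labels)
```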

In [45]:
#Print the first 5 complaints for each of the topics
First5_comp=df_clean.groupby('Topic').head(5)
First5_comp.sort_values('Topic')


Unnamed: 0,complaints,lemmatized complaints,complaint_POS_removed,lemmatized complaints_clean,Topic
27,"I opened an account with chase bank on XXXX and used a code for XXXX bonus. I called to follow up on XX/XX/XXXX about the terms and was told everything was on the account and once I made XXXX direct deposit the bonus would be paid out in 10 days. As of XXXX I had made the required deposits and was told my account never had the coupon code applied and it was past the 21 days to do so, so no bonus would be paid.",open account chase bank xxxx use code xxxx bonus call follow xxxxxxxx term tell everything account make xxxx direct deposit bonus would pay days xxxx make require deposit tell account never coupon code apply past days bonus would pay,"[(account, NN), (bank, NN), (xxxx, NN), (use, NN), (code, NN), (bonus, NN), (call, NN), (term, NN), (tell, NN), (everything, NN), (account, NN), (xxxx, NN), (deposit, NN), (bonus, NN), (deposit, NN), (tell, NN), (account, NN), (coupon, NN), (code, NN), (bonus, NN)]",open account chase bank use code bonus call follow term tell everything account make direct deposit bonus would pay days make require deposit tell account never coupon code apply past days bonus would pay,0
14,my grand son give me check for {$1600.00} i deposit it into my chase account after fund clear my chase bank closed my account never paid me my money they said they need to speek with my grand son check was clear money was taking by my chase bank refuse to pay me my money my grand son called chase 2 times they told him i should call not him to verify the check owner he is out the country most the time date happen XX/XX/2018 check number XXXX claim number is XXXX with chase,grand son give check deposit chase account fund clear chase bank close account never pay money say need speek grand son check clear money take chase bank refuse pay money grand son call chase time tell call verify check owner country time date happen xxxx check number xxxx claim number xxxx chase,"[(son, NN), (check, NN), (deposit, NN), (chase, NN), (account, NN), (fund, NN), (bank, NN), (account, NN), (money, NN), (son, NN), (check, NN), (money, NN), (bank, NN), (refuse, NN), (pay, NN), (money, NN), (son, NN), (call, NN), (time, NN), (check, NN), (owner, NN), (country, NN), (time, NN), (date, NN), (number, NN), (claim, NN), (number, NN), (chase, NN)]",grand son give check deposit chase account fund clear chase bank close account never pay money say need speek grand son check clear money take chase bank refuse pay money grand son call chase time tell call verify check owner country time date happen check number claim number chase,0
17,"With out notice J.P. Morgan Chase restricted my account by my debit card Tuesday XX/XX/2019. On Thursday XX/XX/2019 I went into A branch after being advised by a customer service representative that my account would actually be closed. I went into the branch to see how I can remove the funds that are in there currently in as well as if my direct deposit from my place of employment would be returned. The bank associate and the customer service representative assured me that the funds would Post but they may take an additional business day. That Saturday I attempted to go inside of a branch yet again to retrieve my funds that actually did post on Friday, XX/XX/2019. Upon looking at my account I realize that the funds have been reversed and no longer were present on my current statement. Ive been called and I was told that it may take two business day stating Tuesday, XX/XX/XXXX would be the date that my funds would be available to withdraw from a bank teller only. Now, Chase is informing me that I will be mailed a check into the three business days to recover the funds that are owed to me and left in my account currently. Unfortunately, This has put me in an additional financial bind do two fees from late rent late car payments, etc. I am not being a short or giving written notice that these things will actually occur so that I have peace of mind in fact Chase has handled my situation grossly I even had a bank teller inform me that my account looks suspicious after giving me a suspicious look myself. Although I know that Chase reserves the right to close my account at any time I do believe that their protocol has been in the past to give notice in the form of a written document. I am not being a shored or giving written notice that these things will actually occur so that I have peace of mind in fact Chase has handled my situation grossly I even had a bank teller inform me that my account looks suspicious after giving me a suspicious look myself. 
Although I know that Chase reserves the right to close my account at any time I do believe that their protocol has been in the past to give notice in the form of a written document. This situation is truly affecting my livelihood and they dont seem to want to deal with Me professionally. Thank you",notice jp morgan chase restrict account debit card tuesday xxxx thursday xxxx go branch advise customer service representative account would actually close go branch see remove fund currently well direct deposit place employment would return bank associate customer service representative assure fund would post may take additional business day saturday attempt go inside branch yet retrieve fund actually post friday xxxx upon look account realize fund reverse longer present current statement ive call tell may take two business day state tuesday xxxxxxxx would date fund would available withdraw bank teller chase inform mail check three business days recover fund owe leave account currently unfortunately put additional financial bind two fee late rent late car payments etc short give write notice things actually occur peace mind fact chase handle situation grossly even bank teller inform account look suspicious give suspicious look although know chase reserve right close account time believe protocol past give notice form write document shore give write notice things actually occur peace mind fact chase handle situation grossly even bank teller inform account look suspicious give suspicious look although know chase reserve right close account time believe protocol past give notice form write document situation truly affect livelihood dont seem want deal professionally thank,"[(jp, NN), (restrict, NN), (account, NN), (debit, NN), (card, NN), (tuesday, NN), (thursday, NN), (branch, NN), (customer, NN), (service, NN), (account, NN), (branch, NN), (fund, NN), (deposit, NN), (place, NN), (employment, NN), (bank, NN), (customer, NN), (service, NN), (assure, NN), (fund, NN), 
(business, NN), (day, NN), (attempt, NN), (branch, NN), (fund, NN), (look, NN), (fund, NN), (reverse, NN), (statement, NN), (call, NN), (tell, NN), (business, NN), (day, NN), (state, NN), (tuesday, NN), (date, NN), (fund, NN), (bank, NN), (teller, NN), (chase, NN), (mail, NN), (check, NN), (business, NN), (fund, NN), (owe, NN), (account, NN), (bind, NN), (fee, NN), (rent, NN), (car, NN), (peace, NN), (mind, NN), (fact, NN), (situation, NN), (bank, NN), (teller, NN), (inform, NN), (account, NN), (look, NN), (time, NN), (protocol, NN), (notice, NN), (form, NN), (document, NN), (notice, NN), (peace, NN), (mind, NN), (fact, NN), (situation, NN), (bank, NN), (teller, NN), (inform, NN), (account, NN), (look, NN), (time, NN), (protocol, NN), (notice, NN), (form, NN), (document, NN), (situation, NN), (livelihood, NN), (dont, NN), (deal, NN)]",notice jp morgan chase restrict account debit card tuesday thursday go branch advise customer service representative account would actually close go branch see remove fund currently well direct deposit place employment would return bank associate customer service representative assure fund would post may take additional business day saturday attempt go inside branch yet retrieve fund actually post friday upon look account realize fund reverse longer present current statement ive call tell may take two business day state tuesday would date fund would available withdraw bank teller chase inform mail check three business days recover fund owe leave account currently unfortunately put additional financial bind two fee late rent late car payments etc short give write notice things actually occur peace mind fact chase handle situation grossly even bank teller inform account look suspicious give suspicious look although know chase reserve right close account time believe protocol past give notice form write document shore give write notice things actually occur peace mind fact chase handle situation grossly even bank teller inform account 
look suspicious give suspicious look although know chase reserve right close account time believe protocol past give notice form write document situation truly affect livelihood dont seem want deal professionally thank,0
24,mishandling of this account by Chase auto and XXXX.,mishandle account chase auto xxxx,"[(account, NN), (auto, NN), (xxxx, NN)]",mishandle account chase auto,0
35,"I opened the saving account for the {$25.00} bonus. I was supposed to received the {$25.00} bonus after 3 consecutive auto transfers from checking to savings. I notice on XX/XX/2019 that automatic transfer was cancelled for not enough funds into my checking 's account. Therefore, I put enough funds in my account on XX/XX/2019 requested that the executive team reactivate my automatic transfer for the month of XXXX. Although Ms. XXXX reached out to me from the executive office, she failed to try to resolve my concerns ( case # XXXX ).",open save account bonus suppose receive bonus consecutive auto transfer check save notice xxxx automatic transfer cancel enough fund check account therefore put enough fund account xxxx request executive team reactivate automatic transfer month xxxx although ms xxxx reach executive office fail try resolve concern case xxxx,"[(save, NN), (account, NN), (bonus, NN), (bonus, NN), (auto, NN), (transfer, NN), (check, NN), (notice, NN), (transfer, NN), (cancel, NN), (fund, NN), (check, NN), (account, NN), (fund, NN), (account, NN), (request, NN), (executive, NN), (team, NN), (transfer, NN), (month, NN), (executive, NN), (office, NN), (try, NN), (resolve, NN), (concern, NN), (case, NN), (xxxx, NN)]",open save account bonus suppose receive bonus consecutive auto transfer check save notice automatic transfer cancel enough fund check account therefore put enough fund account request executive team reactivate automatic transfer month although ms reach executive office fail try resolve concern case,0
10,"Chase Card was reported on XX/XX/2019. However, fraudulent application have been submitted my identity without my consent to fraudulently obtain services. Do not extend credit without verifying the identity of the applicant.",chase card report xxxx however fraudulent application submit identity without consent fraudulently obtain service extend credit without verify identity applicant,"[(card, NN), (report, NN), (application, NN), (submit, NN), (identity, NN), (consent, NN), (service, NN), (credit, NN), (identity, NN), (applicant, NN)]",chase card report however fraudulent application submit identity without consent fraudulently obtain service extend credit without verify identity applicant,1
11,"On XX/XX/2018, while trying to book a XXXX XXXX ticket, I came across an offer for {$300.00} to be applied towards the ticket if I applied for a rewards card. I put in my information for the offer and within less than a minute, was notified via the screen that a decision could not be made. I immediately contacted XXXX and was referred to Chase Bank. I then immediately contacted Chase bank within no more than 10minutes of getting the notification on the screen and I was told by the Chase representative I spoke with that my application was denied but she could not state why. I asked for more information about the XXXX offer and she explained that even if I had been approved, the credit offer only gets applied after the first account statement and could not be used to purchase the ticket. I then explicitly told her I was glad I got denied and I was ABSOLUTELY no longer interested in the account. I asked that the application be withdrawn and the representative obliged. This all happened no later than 10mins after putting in the application on XX/XX/2018. Notwithstanding my explicit request not to proceed with the application and contrary to what I was told by the Chase representative, Chase did in fact go ahead to open a credit account in my name on XX/XX/2018. 
This is now being reported in my Credit Report and Chase has refused to correct this information on my credit report even though they went ahead to process an application which I did not consent to and out of their error.",xxxx try book xxxx xxxx ticket come across offer apply towards ticket apply reward card put information offer within less minute notify via screen decision could make immediately contact xxxx refer chase bank immediately contact chase bank within minutes get notification screen tell chase representative speak application deny could state ask information xxxx offer explain even approve credit offer get apply first account statement could use purchase ticket explicitly tell glad get deny absolutely longer interest account ask application withdraw representative oblige happen later mins put application xxxx notwithstanding explicit request proceed application contrary tell chase representative chase fact go ahead open credit account name xxxx report credit report chase refuse correct information credit report even though go ahead process application consent error,"[(try, NN), (book, NN), (ticket, NN), (come, NN), (offer, NN), (apply, NN), (ticket, NN), (card, NN), (information, NN), (offer, NN), (notify, NN), (decision, NN), (refer, NN), (chase, NN), (bank, NN), (chase, NN), (bank, NN), (screen, NN), (speak, NN), (application, NN), (deny, NN), (state, NN), (information, NN), (xxxx, NN), (offer, NN), (credit, NN), (offer, NN), (account, NN), (statement, NN), (purchase, NN), (ticket, NN), (get, NN), (deny, NN), (interest, NN), (account, NN), (application, NN), (representative, NN), (oblige, NN), (application, NN), (xxxx, NN), (request, NN), (proceed, NN), (application, NN), (tell, NN), (chase, NN), (chase, NN), (fact, NN), (credit, NN), (account, NN), (name, NN), (report, NN), (credit, NN), (report, NN), (chase, NN), (information, NN), (credit, NN), (report, NN), (application, NN), (consent, NN), (error, NN)]",try book ticket come across offer apply 
towards ticket apply reward card put information offer within less minute notify via screen decision could make immediately contact refer chase bank immediately contact chase bank within minutes get notification screen tell chase representative speak application deny could state ask information offer explain even approve credit offer get apply first account statement could use purchase ticket explicitly tell glad get deny absolutely longer interest account ask application withdraw representative oblige happen later mins put application notwithstanding explicit request proceed application contrary tell chase representative chase fact go ahead open credit account name report credit report chase refuse correct information credit report even though go ahead process application consent error,1
15,Can you please remove inquiry,please remove inquiry,"[(inquiry, NN)]",please remove inquiry,1
23,I have a Chase credit card which is incorrectly reporting data on my credit report. The company is not helping resolve the issue.,chase credit card incorrectly report data credit report company help resolve issue,"[(chase, NN), (credit, NN), (card, NN), (credit, NN), (report, NN), (company, NN), (issue, NN)]",chase credit card incorrectly report data credit report company help resolve issue,1
26,I have reached out to XXXX several times in attempt to have this fraudulent inquiry removed I was told that I need to call and contact the original creditor that placed this inquiry on my report. I have made several attempts to get chase bank to contact the bureau and remove this inquiry that was not authorized by me. They seem to not be able to get me to the right person to take care of this issue no matter how many attempts I have made nobody seems to understand what I'm talkin about. I want chase bank to have this fraudulent inquiry removed from my credit report before I take legal action. I I have never requested any kind of credit with chase bank or hold any account with them. I have an iquiry from XX/XX/2019. I want this matter resolved.,reach xxxx several time attempt fraudulent inquiry remove tell need call contact original creditor place inquiry report make several attempt get chase bank contact bureau remove inquiry authorize seem able get right person take care issue matter many attempt make nobody seem understand im talkin want chase bank fraudulent inquiry remove credit report take legal action never request kind credit chase bank hold account iquiry xxxx want matter resolve,"[(reach, NN), (time, NN), (inquiry, NN), (tell, NN), (contact, NN), (creditor, NN), (place, NN), (report, NN), (attempt, NN), (bank, NN), (contact, NN), (bureau, NN), (inquiry, NN), (authorize, NN), (get, NN), (person, NN), (care, NN), (issue, NN), (nobody, NN), (im, NN), (talkin, NN), (chase, NN), (bank, NN), (inquiry, NN), (credit, NN), (report, NN), (action, NN), (kind, NN), (credit, NN), (chase, NN), (bank, NN), (account, NN), (matter, NN)]",reach several time attempt fraudulent inquiry remove tell need call contact original creditor place inquiry report make several attempt get chase bank contact bureau remove inquiry authorize seem able get right person take care issue matter many attempt make nobody seem understand im talkin want chase bank fraudulent inquiry remove 
credit report take legal action never request kind credit chase bank hold account iquiry want matter resolve,1


In [46]:
df_clean.Topic=df_clean.Topic.astype(str)
df_clean.shape

(21072, 5)

#### After evaluating the mapping, if the assigned topics look correct, assign these names to the relevant topics:
* Bank Account services
* Credit card or prepaid card
* Theft/Dispute Reporting
* Mortgage/Loan
* Others
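
One way to evaluate the mapping before naming the topics is to read a few complaints from each cluster. The following is a minimal sketch, not the notebook's own method: `toy` is a made-up stand-in for `df_clean`, and its rows and topic ids are invented for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for df_clean; texts and topic ids are invented.
toy = pd.DataFrame({
    "complaints": [
        "my checking account was closed without notice",
        "the credit card annual fee was charged twice",
        "my mortgage payment was applied late",
        "direct deposit never reached my savings account",
        "a charge on my credit card is not mine",
    ],
    "Topic": [0, 1, 2, 0, 1],
})

# Keep at most two complaints per topic so each cluster can be read by eye.
sample_per_topic = toy.groupby("Topic").head(2).sort_values("Topic")

for topic_id, group in sample_per_topic.groupby("Topic"):
    print(f"Topic {topic_id}")
    for text in group["complaints"]:
        print("  -", text)
```

If the sampled complaints of a topic id do not match the name you intend to give it, adjust the dictionary in the next cell rather than the data.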

In [47]:
print(df_clean['Topic'].unique())

['2' '3' '1' '0' '4']


In [48]:
# Map each numeric topic id to a readable topic name

Topic_names = {
    0: 'Bank account services',
    1: 'Credit card / Prepaid card',
    2: 'Mortgages/loans',
    3: 'Theft/Dispute reporting',
    4: 'Others '  # trailing space kept: the reverse mapping later in the notebook uses this exact string
}
# Topic was cast to str above, so convert it back to numbers before mapping ids to names
df_clean['Topic'] = pd.to_numeric(df_clean['Topic'], errors='coerce')
df_clean['Topic'] = df_clean['Topic'].map(Topic_names)

In [49]:
df_clean.head(2)

Unnamed: 0,complaints,lemmatized complaints,complaint_POS_removed,lemmatized complaints_clean,Topic
1,Good morning my name is XXXX XXXX and I appreciate it if you could help me put a stop to Chase Bank cardmember services. \nIn 2018 I wrote to Chase asking for debt verification and what they sent me a statement which is not acceptable. I am asking the bank to validate the debt. Instead I been receiving mail every month from them attempting to collect a debt. \nI have a right to know this information as a consumer. \n\nChase account # XXXX XXXX XXXX XXXX Thanks in advance for your help.,good morning name xxxx xxxx appreciate could help put stop chase bank cardmember service write chase ask debt verification send statement acceptable ask bank validate debt instead receive mail every month attempt collect debt right know information consumer chase account xxxx xxxx xxxx xxxx thank advance help,"[(morning, NN), (name, NN), (appreciate, NN), (chase, NN), (bank, NN), (cardmember, NN), (service, NN), (chase, NN), (debt, NN), (verification, NN), (statement, NN), (bank, NN), (debt, NN), (mail, NN), (month, NN), (attempt, NN), (debt, NN), (right, NN), (information, NN), (consumer, NN), (chase, NN), (account, NN), (advance, NN), (help, NN)]",good morning name appreciate could help put stop chase bank cardmember service write chase ask debt verification send statement acceptable ask bank validate debt instead receive mail every month attempt collect debt right know information consumer chase account thank advance help,Mortgages/loans
2,I upgraded my XXXX XXXX card in XX/XX/2018 and was told by the agent who did the upgrade my anniversary date would not change. It turned the agent was giving me the wrong information in order to upgrade the account. XXXX changed my anniversary date from XX/XX/XXXX to XX/XX/XXXX without my consent! XXXX has the recording of the agent who was misled me.,upgrade xxxx xxxx card xxxx tell agent upgrade anniversary date would change turn agent give wrong information order upgrade account xxxx change anniversary date xxxxxxxx xxxxxxxx without consent xxxx record agent mislead,"[(card, NN), (tell, NN), (agent, NN), (date, NN), (information, NN), (order, NN), (upgrade, NN), (account, NN), (change, NN), (anniversary, NN), (date, NN), (consent, NN), (xxxx, NN), (record, NN), (agent, NN), (mislead, NN)]",upgrade card tell agent upgrade anniversary date would change turn agent give wrong information order upgrade account change anniversary date without consent record agent mislead,Theft/Dispute reporting


## Supervised model to assign new complaints to the relevant topics

You have now built the model that assigns a topic to each complaint. In the section below, you will use these topics as labels to classify new complaints.

Since you will be using a supervised learning technique, the topic names have to be converted back to numbers (the classifiers expect numeric labels in NumPy arrays).
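
The name-to-number conversion can be sketched as below. This is an illustrative check under stated assumptions rather than part of the original notebook: `labels` is a hypothetical toy series, and `topic_to_id` / `id_to_topic` are names introduced here. The point is that `Series.map` with a dictionary returns `NaN` for any value that is not a key, so a label such as `'Others'` without the trailing space would silently become missing.

```python
import pandas as pd

# Same names the notebook uses; 'Others ' carries a trailing space there.
topic_to_id = {
    "Bank account services": 0,
    "Credit card / Prepaid card": 1,
    "Mortgages/loans": 2,
    "Theft/Dispute reporting": 3,
    "Others ": 4,
}
# Inverse mapping, handy for turning predictions back into names.
id_to_topic = {idx: name for name, idx in topic_to_id.items()}

# Hypothetical labels; the last one lacks the trailing space on purpose.
labels = pd.Series(["Mortgages/loans", "Others ", "Others"])

# map() leaves NaN wherever the label is not a dictionary key.
encoded = labels.map(topic_to_id)
unmatched = labels[encoded.isna()]
print("labels without a numeric id:", unmatched.tolist())
```

Running a check like this right after the mapping makes it easy to catch spelling or whitespace mismatches before they reach the training step.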

In [51]:
# Reverse dictionary: map topic names back to their numeric ids

Topic_names = {
    'Bank account services': 0,
    'Credit card / Prepaid card': 1,
    'Mortgages/loans': 2,
    'Theft/Dispute reporting': 3,
    'Others ': 4
}

# Replace topic names with their numeric ids
df_clean['Topic'] = df_clean['Topic'].map(Topic_names)

In [52]:
df_clean.head()

Unnamed: 0,complaints,lemmatized complaints,complaint_POS_removed,lemmatized complaints_clean,Topic
1,Good morning my name is XXXX XXXX and I appreciate it if you could help me put a stop to Chase Bank cardmember services. \nIn 2018 I wrote to Chase asking for debt verification and what they sent me a statement which is not acceptable. I am asking the bank to validate the debt. Instead I been receiving mail every month from them attempting to collect a debt. \nI have a right to know this information as a consumer. \n\nChase account # XXXX XXXX XXXX XXXX Thanks in advance for your help.,good morning name xxxx xxxx appreciate could help put stop chase bank cardmember service write chase ask debt verification send statement acceptable ask bank validate debt instead receive mail every month attempt collect debt right know information consumer chase account xxxx xxxx xxxx xxxx thank advance help,"[(morning, NN), (name, NN), (appreciate, NN), (chase, NN), (bank, NN), (cardmember, NN), (service, NN), (chase, NN), (debt, NN), (verification, NN), (statement, NN), (bank, NN), (debt, NN), (mail, NN), (month, NN), (attempt, NN), (debt, NN), (right, NN), (information, NN), (consumer, NN), (chase, NN), (account, NN), (advance, NN), (help, NN)]",good morning name appreciate could help put stop chase bank cardmember service write chase ask debt verification send statement acceptable ask bank validate debt instead receive mail every month attempt collect debt right know information consumer chase account thank advance help,2
2,I upgraded my XXXX XXXX card in XX/XX/2018 and was told by the agent who did the upgrade my anniversary date would not change. It turned the agent was giving me the wrong information in order to upgrade the account. XXXX changed my anniversary date from XX/XX/XXXX to XX/XX/XXXX without my consent! XXXX has the recording of the agent who was misled me.,upgrade xxxx xxxx card xxxx tell agent upgrade anniversary date would change turn agent give wrong information order upgrade account xxxx change anniversary date xxxxxxxx xxxxxxxx without consent xxxx record agent mislead,"[(card, NN), (tell, NN), (agent, NN), (date, NN), (information, NN), (order, NN), (upgrade, NN), (account, NN), (change, NN), (anniversary, NN), (date, NN), (consent, NN), (xxxx, NN), (record, NN), (agent, NN), (mislead, NN)]",upgrade card tell agent upgrade anniversary date would change turn agent give wrong information order upgrade account change anniversary date without consent record agent mislead,3
10,"Chase Card was reported on XX/XX/2019. However, fraudulent application have been submitted my identity without my consent to fraudulently obtain services. Do not extend credit without verifying the identity of the applicant.",chase card report xxxx however fraudulent application submit identity without consent fraudulently obtain service extend credit without verify identity applicant,"[(card, NN), (report, NN), (application, NN), (submit, NN), (identity, NN), (consent, NN), (service, NN), (credit, NN), (identity, NN), (applicant, NN)]",chase card report however fraudulent application submit identity without consent fraudulently obtain service extend credit without verify identity applicant,1
11,"On XX/XX/2018, while trying to book a XXXX XXXX ticket, I came across an offer for {$300.00} to be applied towards the ticket if I applied for a rewards card. I put in my information for the offer and within less than a minute, was notified via the screen that a decision could not be made. I immediately contacted XXXX and was referred to Chase Bank. I then immediately contacted Chase bank within no more than 10minutes of getting the notification on the screen and I was told by the Chase representative I spoke with that my application was denied but she could not state why. I asked for more information about the XXXX offer and she explained that even if I had been approved, the credit offer only gets applied after the first account statement and could not be used to purchase the ticket. I then explicitly told her I was glad I got denied and I was ABSOLUTELY no longer interested in the account. I asked that the application be withdrawn and the representative obliged. This all happened no later than 10mins after putting in the application on XX/XX/2018. Notwithstanding my explicit request not to proceed with the application and contrary to what I was told by the Chase representative, Chase did in fact go ahead to open a credit account in my name on XX/XX/2018. 
This is now being reported in my Credit Report and Chase has refused to correct this information on my credit report even though they went ahead to process an application which I did not consent to and out of their error.",xxxx try book xxxx xxxx ticket come across offer apply towards ticket apply reward card put information offer within less minute notify via screen decision could make immediately contact xxxx refer chase bank immediately contact chase bank within minutes get notification screen tell chase representative speak application deny could state ask information xxxx offer explain even approve credit offer get apply first account statement could use purchase ticket explicitly tell glad get deny absolutely longer interest account ask application withdraw representative oblige happen later mins put application xxxx notwithstanding explicit request proceed application contrary tell chase representative chase fact go ahead open credit account name xxxx report credit report chase refuse correct information credit report even though go ahead process application consent error,"[(try, NN), (book, NN), (ticket, NN), (come, NN), (offer, NN), (apply, NN), (ticket, NN), (card, NN), (information, NN), (offer, NN), (notify, NN), (decision, NN), (refer, NN), (chase, NN), (bank, NN), (chase, NN), (bank, NN), (screen, NN), (speak, NN), (application, NN), (deny, NN), (state, NN), (information, NN), (xxxx, NN), (offer, NN), (credit, NN), (offer, NN), (account, NN), (statement, NN), (purchase, NN), (ticket, NN), (get, NN), (deny, NN), (interest, NN), (account, NN), (application, NN), (representative, NN), (oblige, NN), (application, NN), (xxxx, NN), (request, NN), (proceed, NN), (application, NN), (tell, NN), (chase, NN), (chase, NN), (fact, NN), (credit, NN), (account, NN), (name, NN), (report, NN), (credit, NN), (report, NN), (chase, NN), (information, NN), (credit, NN), (report, NN), (application, NN), (consent, NN), (error, NN)]",try book ticket come across offer apply 
towards ticket apply reward card put information offer within less minute notify via screen decision could make immediately contact refer chase bank immediately contact chase bank within minutes get notification screen tell chase representative speak application deny could state ask information offer explain even approve credit offer get apply first account statement could use purchase ticket explicitly tell glad get deny absolutely longer interest account ask application withdraw representative oblige happen later mins put application notwithstanding explicit request proceed application contrary tell chase representative chase fact go ahead open credit account name report credit report chase refuse correct information credit report even though go ahead process application consent error,1
14,my grand son give me check for {$1600.00} i deposit it into my chase account after fund clear my chase bank closed my account never paid me my money they said they need to speek with my grand son check was clear money was taking by my chase bank refuse to pay me my money my grand son called chase 2 times they told him i should call not him to verify the check owner he is out the country most the time date happen XX/XX/2018 check number XXXX claim number is XXXX with chase,grand son give check deposit chase account fund clear chase bank close account never pay money say need speek grand son check clear money take chase bank refuse pay money grand son call chase time tell call verify check owner country time date happen xxxx check number xxxx claim number xxxx chase,"[(son, NN), (check, NN), (deposit, NN), (chase, NN), (account, NN), (fund, NN), (bank, NN), (account, NN), (money, NN), (son, NN), (check, NN), (money, NN), (bank, NN), (refuse, NN), (pay, NN), (money, NN), (son, NN), (call, NN), (time, NN), (check, NN), (owner, NN), (country, NN), (time, NN), (date, NN), (number, NN), (claim, NN), (number, NN), (chase, NN)]",grand son give check deposit chase account fund clear chase bank close account never pay money say need speek grand son check clear money take chase bank refuse pay money grand son call chase time tell call verify check owner country time date happen check number claim number chase,0


In [53]:
# Keep only the "complaints" and "Topic" columns in the new dataframe --> training_data
training_data = df_clean[['complaints', 'Topic']]

In [54]:
training_data

Unnamed: 0,complaints,Topic
1,Good morning my name is XXXX XXXX and I appreciate it if you could help me put a stop to Chase Bank cardmember services. \nIn 2018 I wrote to Chase asking for debt verification and what they sent me a statement which is not acceptable. I am asking the bank to validate the debt. Instead I been receiving mail every month from them attempting to collect a debt. \nI have a right to know this information as a consumer. \n\nChase account # XXXX XXXX XXXX XXXX Thanks in advance for your help.,2
2,I upgraded my XXXX XXXX card in XX/XX/2018 and was told by the agent who did the upgrade my anniversary date would not change. It turned the agent was giving me the wrong information in order to upgrade the account. XXXX changed my anniversary date from XX/XX/XXXX to XX/XX/XXXX without my consent! XXXX has the recording of the agent who was misled me.,3
10,"Chase Card was reported on XX/XX/2019. However, fraudulent application have been submitted my identity without my consent to fraudulently obtain services. Do not extend credit without verifying the identity of the applicant.",1
11,"On XX/XX/2018, while trying to book a XXXX XXXX ticket, I came across an offer for {$300.00} to be applied towards the ticket if I applied for a rewards card. I put in my information for the offer and within less than a minute, was notified via the screen that a decision could not be made. I immediately contacted XXXX and was referred to Chase Bank. I then immediately contacted Chase bank within no more than 10minutes of getting the notification on the screen and I was told by the Chase representative I spoke with that my application was denied but she could not state why. I asked for more information about the XXXX offer and she explained that even if I had been approved, the credit offer only gets applied after the first account statement and could not be used to purchase the ticket. I then explicitly told her I was glad I got denied and I was ABSOLUTELY no longer interested in the account. I asked that the application be withdrawn and the representative obliged. This all happened no later than 10mins after putting in the application on XX/XX/2018. Notwithstanding my explicit request not to proceed with the application and contrary to what I was told by the Chase representative, Chase did in fact go ahead to open a credit account in my name on XX/XX/2018. This is now being reported in my Credit Report and Chase has refused to correct this information on my credit report even though they went ahead to process an application which I did not consent to and out of their error.",1
14,my grand son give me check for {$1600.00} i deposit it into my chase account after fund clear my chase bank closed my account never paid me my money they said they need to speek with my grand son check was clear money was taking by my chase bank refuse to pay me my money my grand son called chase 2 times they told him i should call not him to verify the check owner he is out the country most the time date happen XX/XX/2018 check number XXXX claim number is XXXX with chase,0
...,...,...
78303,"After being a Chase Card customer for well over a decade, was offered multiple solicitations for acquiring new credit cards with Chase - all with bonus airline miles and hotel points. Was approved for all ( 3 ) new cards with No annual fee for 1st year. After less than 2 months with payment always paid as agreed, Chase closed all my cards. One of my ( 3 ) approved new cards was never activated but was sent to me.\n\nChase has done harm to my credit, has charged me an annual fee even though they cancelled my account, failed to credit my points for both airlines and hotel, failed to credit up to {$100.00} for XXXX enrollment and failed to credit airline charge as agreed upon",1
78309,"On Wednesday, XX/XX/XXXX I called Chas, my XXXX XXXX Visa Credit Card provider, and asked how to make a claim under their purchase protection benefit. On XX/XX/XXXX, I purchased three high school textbooks for my XXXX year old daughter because she transferred to a new school. All three books were damaged when a water bottle in her backpack broke. The Chase representative assured me the textbooks would be covered and instructed me to file a claim at XXXX. I immediately went to the website and filed the claim, including uploading all of the requested paperwork which included a copy of my credit card statement, copies of the three receipts and photographic evidence of the damage. The website even had "" books '' as one of the catagories I could list as the type of item they cover and that I could make a claim on. After following up repeatedly on my claim since the insurance provider failed to "" review my information and contact me within 5 business days to outline the next steps of the process, '' as outlined in an email I received acknowledging my claim submission, I called to complain. The representative said claims are not looked at by an examiner "" for eight to ten days '' and then it would take "" two days to actually review the claim. '' I responded that this information was contradictory to the information provided in writing in the email XXXX sent to me, and she said that she is not an adjuster and that is how it works. I then asked to speak with an adjuster and she agreed to connect me to one. I was then put on hold and when she returned, she said my file had "" just been updated while I was on hold and that the claim was being denied because textbooks have finite lives and are undergo revision after courses end. '' I explained that my daughter 's course had not ended and that I was told specifically by Chase that my textbook purchases would be covered and was again told they were refusing my claim. 
'' By the time the call ended I received an email stating that my claim status had been updated and was being denied. I find this completely outrageous and borderline fraudulent.",3
78310,"I am not familiar with XXXX pay and did not understand the great risk this provides to consumers. I believed this to be safe as it was through my Chase bank app. I have been with Chase for almost 20 years and trust their mobile banking and now am sadly regretful. I am being told to deal with the merchant except the merchant has refused to answer my inquiries and now shut down communication. The website of the said merchant looks entirely legitamite and is even using the faces of highly successful brands with individuals linked to their social media without their consent. In performing research of the phone number and other associated information available through PI it is very clear this merchant is continually creating new account title holders to perpetuate this cycle of fraud. Furthermore as this is a non fixed voip being used I believe they are fraudulently using the identity of the real XXXX XXXX Chase Bank told me they wouldnt even investigate, report this to XXXX or allow me to file a report or take any potential recourse for the matter. There isnt even a protocol in place to address this issue yet! The chase mobile app verbiage makes a point to deceptively position this app as under the branch of Chase banking service and as such, imply a degree of entitlement to its customer service protection protocols. Chase has your back which reads on the very same link as the XXXX tab ... .is most certainly not true. This places consumers at risk when using this mobile service and does not flag the concern of Chase in the slightest. At minimum the risk of using XXXX on your mobile banking app must realistically be made aware to the public as it stands to be potentially devastating I have plans to file reports with all corresponding authorities as well as notify and contact the individual whose identity is being misused to inform him. 
I also intend to urge My neighbor who works in television that the news network should perhaps present the risk of using XXXX and XXXX integrated banking apps to the public. I understand fraud and scamming are overwhelming rampant but a banking mogul such as Chase not having any recourse of action is simply a risk that needs to Be disclosed more throughly. I would not have clicked on the to link to the extent I did if I would have been better informed.",3
78311,"I have had flawless credit for 30 yrs. I've had Chase credit cards, "" Chase Freedom '' specifically since XXXX with no problems ever. I've done many balance transfers in my life, so I have plenty of experience. I recently did a balance transfer with Chase Freedom for {$9000.00} ( did many with other Chase cards, but apparently not "" Freeedom '' ) When I got my first bill, my minimum payment was {$470.00}. I was SHOCKED. I immediately called on XX/XX/XXXX. The representative told me that the min payment was so high bc they were making me pay the "" Balance transfer fee '' up front, but my future payments would be around {$90.00}, 1 % of total balance ( which is standard, AND the rate THEY ADVERTISE ) I went to pay the next payment on XX/XX/XXXX & was once again SHOCKED to see my minimum payment was {$440.00}. I paid it, but I have been trying to work this out with Chase ever since. Apparently, the representative was WRONG & I am actually expected to pay 5 %, instead of the standard 1 % they normally charge, bc that was written in my "" user agreement '' paperwork back in XX/XX/XXXX!! 28 years ago!!! They currently charge 1 % to everyone else in the world. My other cards, including Chase, are all 1 %. This {$440.00} is an unreasonable amt to expect someone to pay for 1 credit card. I have kids & many other bills to pay. They never warned me they were so "" off the charts '' with their minimum payment percentage, ( except for my original paperwork in XX/XX/XXXX ). They offer everyone else 1 % terms. They change THEIR "" terms '' anytime it benefits THEM, but won't budge to lower it to make this payment more affordable & reasonable. So, I also asked them ( as soon as I found this out ) to refund my {$360.00} transfer fee so I could use it to transfer the balance to a different card with a more reasonable minimum payment. They refused. I also asked if they would transfer to my CHASE SLATE card, which has 1 % min pmt, they also refused. 
They will not work with me at all. I am a responsible working person. I would like to preserve my good credit. I could easily transfer again to a different card, but I'd have to pay another lump sum of $ again.",4


In [55]:
training_data.shape

(21072, 2)

#### Apply the supervised models to the training data created. In this process, you have to do the following:
* Create the vector counts using CountVectorizer
* Transform the word vectors to tf-idf
* Create the train & test data using train_test_split on the tf-idf features & topics


In [56]:
training_data.Topic.unique()

array([2, 3, 1, 0, 4])

In [59]:

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(training_data['complaints'])

tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
y=training_data.Topic

In [60]:
X_train,X_test,y_train,y_test=train_test_split(X_train_tfidf,y,train_size=0.8,random_state=40)
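
The cells above fit `CountVectorizer` and `TfidfTransformer` on every complaint before splitting, so the vocabulary and IDF weights also reflect the rows that end up in the test set. A minimal sketch of an alternative, splitting the raw text first and fitting the vectorizer on the training part only, is shown here. The toy `texts` and `labels` are hypothetical stand-ins for `training_data['complaints']` and `training_data.Topic`, and the `Pipeline` wrapper and `stratify` argument are assumptions, not part of the original notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for the complaints and their topic ids
texts = [
    "charged an annual fee on my credit card",
    "credit card payment was posted late",
    "my credit card was declined at the store",
    "credit card interest rate was raised",
    "mortgage payment increased without notice",
    "loan modification request was denied",
    "mortgage escrow account was miscalculated",
    "auto loan payoff amount is wrong",
]
labels = [3, 3, 3, 3, 2, 2, 2, 2]

# Split the raw text first; stratify keeps the topic proportions in both parts
text_train, text_test, y_train_toy, y_test_toy = train_test_split(
    texts, labels, train_size=0.75, random_state=40, stratify=labels
)

# Counts followed by tf-idf, fitted on the training text only
vectorizer = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
])
X_train_toy = vectorizer.fit_transform(text_train)
# The test text is only transformed, using the training vocabulary and IDF weights
X_test_toy = vectorizer.transform(text_test)

print(X_train_toy.shape, X_test_toy.shape)
```

Both matrices share the same number of columns because the vocabulary is learned once, from the training text.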

You have to try at least 3 models on the train & test data from these options:
* Logistic regression
* Decision Tree
* Random Forest
* Naive Bayes (optional)

**Using the required evaluation metrics, judge the models you tried and select the best-performing one**
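
One way to compare the candidates on equal terms is to fit each of them on the same split and collect the same metrics side by side. The sketch below is a minimal illustration on hypothetical toy data. `MultinomialNB` is used as the Naive Bayes variant because it accepts sparse tf-idf input, and ranking by macro-averaged F1 is an assumption about which metric matters most here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy split; in the notebook this would be X_train / X_test / y_train / y_test
train_texts = [
    "annual fee charged on my credit card",
    "credit card was declined at checkout",
    "late fee added to my credit card bill",
    "credit card rewards points were removed",
    "mortgage payment went up without notice",
    "loan modification application was denied",
    "escrow shortage on my mortgage statement",
    "auto loan payoff quote is incorrect",
]
train_labels = [3, 3, 3, 3, 2, 2, 2, 2]
test_texts = [
    "credit card fee was not disclosed",
    "mortgage loan servicer lost my payment",
]
test_labels = [3, 2]

# TfidfVectorizer is equivalent to CountVectorizer followed by TfidfTransformer
vec = TfidfVectorizer()
X_tr = vec.fit_transform(train_texts)
X_te = vec.transform(test_texts)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": MultinomialNB(),
}

scores = {}
for name, model in models.items():
    model.fit(X_tr, train_labels)
    pred = model.predict(X_te)
    scores[name] = {
        "accuracy": accuracy_score(test_labels, pred),
        # Macro F1 weights every topic equally, regardless of its support
        "macro_f1": f1_score(test_labels, pred, average="macro", zero_division=0),
    }

best = max(scores, key=lambda n: scores[n]["macro_f1"])
for name, metrics in scores.items():
    print(name, metrics)
print("best by macro F1:", best)
```

Macro F1 is worth reading next to accuracy when the topics are unevenly sized, since a model can score well on accuracy while doing poorly on the smallest topic.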

In [66]:
# Write your code here to build any 3 models and evaluate them using the required metrics
# Imports for the candidate classifiers and the evaluation report
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import classification_report

In [67]:
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)

X_train shape: (16857, 29725)
X_test shape: (4215, 29725)


In [68]:
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)

y_train shape: (16857,)
y_test shape: (4215,)


In [74]:
# Logistic regression classifier
lr = LogisticRegression(max_iter=6000).fit(X_train, y_train)
predicted = lr.predict(X_test)

print(classification_report(y_true=y_test, y_pred=predicted))

              precision    recall  f1-score   support

           0       0.93      0.96      0.95       976
           1       0.94      0.94      0.94       827
           2       0.95      0.93      0.94       840
           3       0.92      0.94      0.93      1099
           4       0.94      0.88      0.91       473

    accuracy                           0.93      4215
   macro avg       0.94      0.93      0.93      4215
weighted avg       0.94      0.93      0.93      4215



In [70]:
# Decision tree classifier
dt = DecisionTreeClassifier().fit(X_train, y_train)
predicted = dt.predict(X_test)

print(classification_report(y_pred=predicted, y_true=y_test))

              precision    recall  f1-score   support

           0       0.78      0.78      0.78       976
           1       0.77      0.78      0.78       827
           2       0.82      0.77      0.79       840
           3       0.72      0.74      0.73      1099
           4       0.64      0.65      0.65       473

    accuracy                           0.75      4215
   macro avg       0.75      0.74      0.74      4215
weighted avg       0.75      0.75      0.75      4215



In [77]:
# Random forest classifier
rfc = RandomForestClassifier(max_depth=100)
rfc.fit(X_train, y_train)
predicted = rfc.predict(X_test)

print(classification_report(y_pred=predicted, y_true=y_test))

              precision    recall  f1-score   support

           0       0.80      0.92      0.86       976
           1       0.83      0.86      0.84       827
           2       0.90      0.86      0.88       840
           3       0.80      0.87      0.83      1099
           4       0.93      0.46      0.61       473

    accuracy                           0.83      4215
   macro avg       0.85      0.79      0.81      4215
weighted avg       0.84      0.83      0.83      4215



## Based on overall accuracy, logistic regression is the best model, with 93% accuracy
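
The ranking above rests on a single 80/20 split. A hedged way to check that it is stable is stratified k-fold cross-validation. The sketch below uses hypothetical toy data, three folds and macro F1 scoring, none of which come from the original notebook; on the real data the same call would take `training_data['complaints']` and `training_data.Topic`.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Hypothetical toy data: six complaints for each of two topics
cv_texts = [
    "annual fee charged on my credit card",
    "credit card was declined at checkout",
    "late fee added to my credit card bill",
    "credit card rewards points were removed",
    "credit card limit was lowered",
    "credit card statement shows a wrong charge",
    "mortgage payment went up without notice",
    "loan modification application was denied",
    "escrow shortage on my mortgage statement",
    "auto loan payoff quote is incorrect",
    "mortgage servicer misapplied my payment",
    "student loan interest was miscalculated",
]
cv_labels = [3] * 6 + [2] * 6

# Keeping the vectorizer inside the pipeline refits it on each training fold
pipe = Pipeline([
    ("counts", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=40)
cv_scores = cross_val_score(pipe, cv_texts, cv_labels, cv=folds, scoring="f1_macro")

print("macro F1 per fold:", cv_scores)
print("mean macro F1:", cv_scores.mean())
```

Running the same loop for each candidate model gives a mean and a spread per model, which is a firmer basis for picking one than a single test score.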

## Model Verification and Inference

In [87]:
with open('/Users/arunprakash/Desktop/test.json', 'r') as test_file:
    test_data = json.load(test_file)

test_df=pd.json_normalize(test_data)

In [88]:
test_df.columns

Index(['_index', '_type', '_id', '_score', '_source.tags', '_source.zip_code',
       '_source.complaint_id', '_source.issue', '_source.date_received',
       '_source.state', '_source.consumer_disputed', '_source.product',
       '_source.company_response', '_source.company', '_source.submitted_via',
       '_source.date_sent_to_company', '_source.company_public_response',
       '_source.sub_product', '_source.timely',
       '_source.complaint_what_happened', '_source.sub_issue',
       '_source.consumer_consent_provided'],
      dtype='object')

In [89]:
test_df.shape

(10, 22)

In [96]:
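# Assign the result back instead of calling replace(inplace=True) on a column selection
test_df['_source.complaint_what_happened'] = test_df['_source.complaint_what_happened'].replace(['', 'N/A', 'null', 'NaN'], np.nan)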
test_df.dropna(subset=['_source.complaint_what_happened'],inplace=True)

In [97]:
test_df['_source.complaint_what_happened']

1    I received a letter from XYZ Financial claiming I owe a debt, but I have no record of this debt.
2                                                   I was charged an annual fee without prior notice.
6                                           There is an incorrect account listed on my credit report.
8                          I am having trouble making my mortgage payments due to financial hardship.
9                                      I was scammed into sending money through a fraudulent website.
Name: _source.complaint_what_happened, dtype: object

In [107]:

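def predict_lr(text):
    """Return the topic name predicted by the logistic regression model for a one-item list of complaint text."""
    # Map topic ids to readable department names
    Topic_names = {0:'Account Services', 1:'Others', 2:'Mortgage/Loan', 3:'Credit card or prepaid card', 4:'Theft/Dispute Reporting'}
    # Reuse the vectorizer and tf-idf transformer fitted on the training data
    test_counts = count_vect.transform(text)
    test_tfidf = tfidf_transformer.transform(test_counts)
    predicted = lr.predict(test_tfidf)
    return Topic_names[predicted[0]]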

In [108]:
test_df['tag'] = test_df['_source.complaint_what_happened'].apply(lambda x: predict_lr([x]))
test_df[['_source.complaint_what_happened','tag']]

Unnamed: 0,_source.complaint_what_happened,tag
1,"I received a letter from XYZ Financial claiming I owe a debt, but I have no record of this debt.",Mortgage/Loan
2,I was charged an annual fee without prior notice.,Credit card or prepaid card
6,There is an incorrect account listed on my credit report.,Others
8,I am having trouble making my mortgage payments due to financial hardship.,Mortgage/Loan
9,I was scammed into sending money through a fraudulent website.,Credit card or prepaid card
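
Short complaints like these give the model very little text to work with, so it can help to look at the predicted class probabilities next to each tag. The sketch below is a minimal illustration that scores a whole batch in one call instead of row by row. The toy training set, the two-topic `topic_names` subset and the variable names are all hypothetical; in the notebook the fitted `count_vect`, `tfidf_transformer` and `lr` would take their place.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data and a two-topic subset of the name mapping
toy_texts = [
    "annual fee charged on my credit card",
    "credit card was declined at checkout",
    "late fee added to my credit card bill",
    "mortgage payment went up without notice",
    "loan modification application was denied",
    "escrow shortage on my mortgage statement",
]
toy_labels = [3, 3, 3, 2, 2, 2]
topic_names = {2: "Mortgage/Loan", 3: "Credit card or prepaid card"}

vec = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(toy_texts), toy_labels)

new_texts = [
    "I was charged an annual fee without prior notice.",
    "I am having trouble making my mortgage payments.",
]

# One call scores the whole batch; each row of proba sums to 1
proba = clf.predict_proba(vec.transform(new_texts))
pred_ids = clf.classes_[proba.argmax(axis=1)]
tags = [topic_names[int(i)] for i in pred_ids]
confidence = proba.max(axis=1)

for text, tag, conf in zip(new_texts, tags, confidence):
    print(f"{conf:.2f}  {tag}  <- {text}")
```

A low top probability is a reasonable signal to route a ticket for manual review rather than trusting the tag outright.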
