## IMPORT LIBRARIES

In [1]:
# data wrangling

import numpy as np
import pandas as pd
import re

# data visualization

import matplotlib.pyplot as plt
import seaborn as sns

# text processing

import nltk
from nltk.stem.porter import PorterStemmer
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn import decomposition

# filter warnings

import warnings
warnings.filterwarnings('ignore')

## OVERVIEW

In [2]:
# show full text in wide columns (None replaces the deprecated -1)

pd.set_option('display.max_colwidth', None)

In [3]:
# load data

df = pd.read_csv('consumer_compliants.csv')

In [4]:
# show top 5

df.head()

Unnamed: 0,Date received,Product,Sub-product,Issue,Sub-issue,Consumer complaint narrative,Company public response,Company,State,ZIP code,Tags,Consumer consent provided?,Submitted via,Date sent to company,Company response to consumer,Timely response?,Consumer disputed?,Complaint ID
0,4/3/2020,Vehicle loan or lease,Loan,Getting a loan or lease,Fraudulent loan,"This auto loan was opened on XX/XX/2020 in XXXX, NC with BB & T in my name. I have NEVER been to North Carolina and I have NEVER been a resident. I have filed a dispute twice through my credit bureaus but both times BB & T has claimed that this is an accurate loan. Which I wasn't aware of until today. I have tried to contact BB & T multiple times but I have never gotten through to a live person. I do n't drive and I have never owned a car before. I didn't have any knowledge of this account until I checked XXXXXXXX XXXX and noticed it. I've tried twice to dispute it. Additionally I never received any bills or information about this account. This is my last resort in trying to remove this fraudulent loan off of my account.",Company has responded to the consumer and the CFPB and chooses not to provide a public response,TRUIST FINANCIAL CORPORATION,PA,,,Consent provided,Web,4/3/2020,Closed with explanation,Yes,,3591341
1,3/12/2020,Debt collection,Payday loan debt,Attempts to collect debt not owed,Debt is not yours,"In XXXX of 2019 I noticed a debt for {$620.00} on my credit which i believed was mine I thought speedy cash had bought one of my old debts and sold it to XXXX XXXX XXXX XXXX. I contacted XXXX XXXX XXXX XXXX and after several attempts of giving my full name, nothing came up in their system. I gave my social and the rep said the account popped up but DID NOT tell me that the account was under someone elses name and continued to let me make a payment. The payment was for {$120.00}. Confirmation number-XXXX. After realizing it was not my account, I called back to get my money back and inform them of the mistake. I was told i needed to mail them an FTC report and dispute letter to get my money back. I completed all of this and when i called again they said they transferred the account back to speedy cash for fraud review and I would need to contact them. After contacting them i was again told that i can not get my money back. The issue im having is this representative at XXXX XXXX played blind to obvious fraud and let an innocent person make a payment on someone elses debt and i want my money back.",,CURO Intermediate Holdings,CO,806XX,,Consent provided,Web,3/12/2020,Closed with explanation,Yes,,3564184
2,2/6/2020,Vehicle loan or lease,Loan,Getting a loan or lease,Credit denial,"As stated from Capital One, XXXX XX/XX/XXXX and XXXX 2018, My wife and I went to several car dealerships to request for a car loan to get a used car. However, according to their credit requirements unfortunately my credit score was insufficient for the car loan approval at that time. It seemed as though they pulled my credit report multiple times.",,CAPITAL ONE FINANCIAL CORPORATION,OH,430XX,,Consent provided,Web,2/6/2020,Closed with explanation,Yes,,3521949
3,3/6/2020,Checking or savings account,Savings account,Managing an account,Banking errors,"Please see CFPB case XXXX. \n\nCapital One, in the letter they provided ( and attached to that case as their response ) said this : "" The funds were reversed and sent back to XXXX XXXX XXXX on XX/XX/XXXX ''. \n\nXXXX XXXX XXXX ( now XXXX XXXX ) has not received these funds. Staff at XXXX XXXX - and also staff at the account-holder 's business - have looked for return of my money ( {$650.00} ) and find nothing. \n\nCapital One needs to document - actually prove - they returned the funds, as stated in their letter. Capital One must provide electronic information, if the return was made that way, or document the paper check they sent back to XXXX XXXX. \n\nI've left 3 messages about this problem for the person who signed the letter ( XXXX ) from Capital One. I have received no call-backs. \n\nSummary : Capital One said they returned my money on XX/XX/XXXX : they did not. If they continue claim they did, then they need to prove that.",,CAPITAL ONE FINANCIAL CORPORATION,CA,,,Consent provided,Web,3/6/2020,Closed with explanation,Yes,,3556237
4,2/14/2020,Debt collection,Medical debt,Attempts to collect debt not owed,Debt is not yours,"This debt was incurred due to medical malpractice ( XXXX XXXX XXXX, XXXX, TX ). I asked the doctor to turn over my claim to his malpractice insurance company. This has cost me thousands of dollars to XXXX XXXX XXXX. I am still trying to collect damages from this doctor. He never responded and turned over me to collections Merchants and Professional Collection Bureau , Inc. I sent them a letter describing exactly this issue and instead of not contacting me and verifying my debt they start reporting this debt to the credit reporting agencies. They never verified the debt, like I asked and they never stopped it from being reported when I specifically told them not to, due to the circumstances above.",Company believes it acted appropriately as authorized by contract or law,"Merchants and Professional Bureau, Inc.",OH,432XX,,Consent provided,Web,2/14/2020,Closed with explanation,Yes,,3531704


In [5]:
# show info()

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 57453 entries, 0 to 57452
Data columns (total 18 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Date received                 57453 non-null  object 
 1   Product                       57453 non-null  object 
 2   Sub-product                   57453 non-null  object 
 3   Issue                         57453 non-null  object 
 4   Sub-issue                     57453 non-null  object 
 5   Consumer complaint narrative  57453 non-null  object 
 6   Company public response       57453 non-null  object 
 7   Company                       57453 non-null  object 
 8   State                         57453 non-null  object 
 9   ZIP code                      57453 non-null  object 
 10  Tags                          57453 non-null  object 
 11  Consumer consent provided?    57453 non-null  object 
 12  Submitted via                 57453 non-null  object 
 13  D

In [6]:
# show null values

df.isna().sum()

Date received                   0    
Product                         0    
Sub-product                     0    
Issue                           0    
Sub-issue                       0    
Consumer complaint narrative    0    
Company public response         0    
Company                         0    
State                           0    
ZIP code                        0    
Tags                            0    
Consumer consent provided?      0    
Submitted via                   0    
Date sent to company            0    
Company response to consumer    0    
Timely response?                0    
Consumer disputed?              57453
Complaint ID                    0    
dtype: int64

In [7]:
# keep only the columns needed and rename the narrative column

df_new = df[['Consumer complaint narrative','Product','Company']].rename(columns={'Consumer complaint narrative':'Complaints'})

In [8]:
# show top 5 rows of the new dataframe

df_new.head()

Unnamed: 0,Complaints,Product,Company
0,"This auto loan was opened on XX/XX/2020 in XXXX, NC with BB & T in my name. I have NEVER been to North Carolina and I have NEVER been a resident. I have filed a dispute twice through my credit bureaus but both times BB & T has claimed that this is an accurate loan. Which I wasn't aware of until today. I have tried to contact BB & T multiple times but I have never gotten through to a live person. I do n't drive and I have never owned a car before. I didn't have any knowledge of this account until I checked XXXXXXXX XXXX and noticed it. I've tried twice to dispute it. Additionally I never received any bills or information about this account. This is my last resort in trying to remove this fraudulent loan off of my account.",Vehicle loan or lease,TRUIST FINANCIAL CORPORATION
1,"In XXXX of 2019 I noticed a debt for {$620.00} on my credit which i believed was mine I thought speedy cash had bought one of my old debts and sold it to XXXX XXXX XXXX XXXX. I contacted XXXX XXXX XXXX XXXX and after several attempts of giving my full name, nothing came up in their system. I gave my social and the rep said the account popped up but DID NOT tell me that the account was under someone elses name and continued to let me make a payment. The payment was for {$120.00}. Confirmation number-XXXX. After realizing it was not my account, I called back to get my money back and inform them of the mistake. I was told i needed to mail them an FTC report and dispute letter to get my money back. I completed all of this and when i called again they said they transferred the account back to speedy cash for fraud review and I would need to contact them. After contacting them i was again told that i can not get my money back. The issue im having is this representative at XXXX XXXX played blind to obvious fraud and let an innocent person make a payment on someone elses debt and i want my money back.",Debt collection,CURO Intermediate Holdings
2,"As stated from Capital One, XXXX XX/XX/XXXX and XXXX 2018, My wife and I went to several car dealerships to request for a car loan to get a used car. However, according to their credit requirements unfortunately my credit score was insufficient for the car loan approval at that time. It seemed as though they pulled my credit report multiple times.",Vehicle loan or lease,CAPITAL ONE FINANCIAL CORPORATION
3,"Please see CFPB case XXXX. \n\nCapital One, in the letter they provided ( and attached to that case as their response ) said this : "" The funds were reversed and sent back to XXXX XXXX XXXX on XX/XX/XXXX ''. \n\nXXXX XXXX XXXX ( now XXXX XXXX ) has not received these funds. Staff at XXXX XXXX - and also staff at the account-holder 's business - have looked for return of my money ( {$650.00} ) and find nothing. \n\nCapital One needs to document - actually prove - they returned the funds, as stated in their letter. Capital One must provide electronic information, if the return was made that way, or document the paper check they sent back to XXXX XXXX. \n\nI've left 3 messages about this problem for the person who signed the letter ( XXXX ) from Capital One. I have received no call-backs. \n\nSummary : Capital One said they returned my money on XX/XX/XXXX : they did not. If they continue claim they did, then they need to prove that.",Checking or savings account,CAPITAL ONE FINANCIAL CORPORATION
4,"This debt was incurred due to medical malpractice ( XXXX XXXX XXXX, XXXX, TX ). I asked the doctor to turn over my claim to his malpractice insurance company. This has cost me thousands of dollars to XXXX XXXX XXXX. I am still trying to collect damages from this doctor. He never responded and turned over me to collections Merchants and Professional Collection Bureau , Inc. I sent them a letter describing exactly this issue and instead of not contacting me and verifying my debt they start reporting this debt to the credit reporting agencies. They never verified the debt, like I asked and they never stopped it from being reported when I specifically told them not to, due to the circumstances above.",Debt collection,"Merchants and Professional Bureau, Inc."


In [9]:
# train test split

X_train, X_test = train_test_split(df_new, test_size=0.3, random_state=42)

In [10]:
# show product counts in the training set

X_train['Product'].value_counts()

Debt collection                15254
Credit card or prepaid card    9237 
Mortgage                       6826 
Checking or savings account    4910 
Student loan                   2071 
Vehicle loan or lease          1919 
Name: Product, dtype: int64

In [11]:
# show product counts in the test set

X_test['Product'].value_counts()

Debt collection                6518
Credit card or prepaid card    3956
Mortgage                       2973
Checking or savings account    2093
Student loan                   879 
Vehicle loan or lease          817 
Name: Product, dtype: int64

In [12]:
# set stemmer using PorterStemmer

stemmer = PorterStemmer()

In [13]:
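# tokenize the text, drop short and redacted tokens, then stem what is left

def tokenize(text):
    # keep tokens longer than 3 characters that still have more than 2 characters
    # after stripping leading/trailing 'X', 'x' and '/' (drops XXXX, XX/XX/XXXX)
    tokens = [word for word in nltk.word_tokenize(text) if (len(word) > 3 and len(word.strip('Xx/')) > 2)]
    # reduce each remaining token to its Porter stem
    stems  = [stemmer.stem(item) for item in tokens]
    return stems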
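
To make the filter inside `tokenize` concrete, here is a minimal sketch that applies the same two length conditions to a hand-written token list instead of calling `nltk.word_tokenize`. The helper name `keep_token` and the sample tokens are illustrative assumptions, not part of the notebook.

```python
def keep_token(word):
    """Same condition as in tokenize(): longer than 3 characters, and more
    than 2 characters left after stripping leading/trailing 'X', 'x', '/'."""
    return len(word) > 3 and len(word.strip('Xx/')) > 2

# Hypothetical tokens, chosen to resemble the redacted complaint text.
sample_tokens = ['XXXX', 'XX/XX/XXXX', 'loan', 'the', 'account', 'XX/XX/2020']

kept = [w for w in sample_tokens if keep_token(w)]
print(kept)  # ['loan', 'account', 'XX/XX/2020']
```

Fully redacted tokens and three-letter words are dropped, while a partially redacted date still passes because `2020` remains after stripping.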

In [14]:
# set vectorizer (use_idf=False and norm=None give raw term counts rather than tf-idf weights)

vectorizer_tf = TfidfVectorizer(tokenizer=tokenize, stop_words='english', max_df=0.75, max_features=10000, use_idf=False, norm=None)
tf_vectors = vectorizer_tf.fit_transform(X_train['Complaints']) 
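
With `use_idf=False` and `norm=None`, `TfidfVectorizer` should produce plain term counts. The sketch below checks this on a toy corpus by comparing against `CountVectorizer` with default settings; the three documents are made up for illustration and are not taken from the complaints data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus, not taken from the complaints data.
docs = ["loan payment late", "credit card payment payment", "loan loan credit"]

tf_only = TfidfVectorizer(use_idf=False, norm=None).fit_transform(docs)
counts = CountVectorizer().fit_transform(docs)

# Compare the dense forms element by element.
same_values = bool((tf_only.toarray() == counts.toarray()).all())
print(same_values)
```

Raw counts are the usual input for LDA, which may be why the idf weighting and normalisation are switched off here.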

In [15]:
# show vectorizer features (on scikit-learn >= 1.2, use get_feature_names_out() instead)

vectorizer_tf.get_feature_names()

["'account",
 "'all",
 "'annual",
 "'back",
 "'busi",
 "'collect",
 "'credit",
 "'debt",
 "'disput",
 "'find",
 "'fraud",
 "'investig",
 "'late",
 "'make",
 "'may",
 "'new",
 "'not",
 "'pend",
 "'process",
 "'secur",
 "'sign",
 "'system",
 "'the",
 "'they",
 "'thi",
 "'you",
 '-.02',
 '-3-501',
 '-and',
 '-call',
 '-origin',
 '-pleas',
 '-proof',
 '-see',
 '-she',
 '-that',
 '-the',
 '-they',
 '-thi',
 '-what',
 '-which',
 '-your',
 '....',
 '.after',
 '.all',
 '.also',
 '.and',
 '.becaus',
 '.but',
 '.call',
 '.even',
 '.for',
 '.gov',
 '.how',
 '.must',
 '.not',
 '.now',
 '.pdf',
 '.pleas',
 '.she',
 '.thank',
 '.that',
 '.the',
 '.then',
 '.there',
 '.they',
 '.thi',
 '.well',
 '.what',
 '.when',
 '.which',
 '/lender',
 '0.00',
 '0.01',
 '0.125',
 '0.25',
 '1,000',
 '1,000,000',
 '1-1/2',
 '1-800',
 '1-month',
 '1-year',
 '1.00',
 '1.25',
 '1.30',
 '1.59',
 '1.75',
 '1.99',
 '1/12',
 '10,000',
 '10-1-36',
 '10-1-36.',
 '10-12',
 '10-14',
 '10-15',
 '10-20',
 '10-busi',
 '10-day',
 '

In [16]:
# topic modelling with LDA

lda = decomposition.LatentDirichletAllocation(n_components=6, max_iter=5, learning_method='online', learning_offset=50, n_jobs=-1, random_state=42)

# fit the model: W1 holds document-topic weights, H1 holds topic-word weights

W1 = lda.fit_transform(tf_vectors)
H1 = lda.components_

In [17]:
# show document-topic weights

W1

array([[5.15548093e-02, 2.14103085e-01, 1.78452342e-03, 5.59450412e-01,
        1.77795960e-03, 1.71329210e-01],
       [3.32130789e-01, 7.19386746e-02, 3.37323471e-01, 2.57570612e-01,
        5.18242839e-04, 5.18210257e-04],
       [2.30933479e-03, 1.02331101e-01, 1.78345205e-01, 6.65983017e-01,
        4.87314191e-02, 2.29992364e-03],
       ...,
       [3.90983093e-03, 7.41005925e-02, 4.37834942e-01, 3.92948260e-03,
        4.76288670e-01, 3.93648185e-03],
       [3.91989310e-03, 3.91886601e-03, 1.34431036e-01, 8.49921117e-01,
        3.91348872e-03, 3.89559945e-03],
       [1.01439328e-01, 8.85910659e-01, 3.16820966e-03, 3.16546662e-03,
        3.16154926e-03, 3.15478726e-03]])

In [18]:
# build the top 10 words per topic (on scikit-learn >= 1.2, use get_feature_names_out() instead)

num_words=10

vocab = np.array(vectorizer_tf.get_feature_names())

top_words = lambda t: [vocab[i] for i in np.argsort(t)[:-num_words-1:-1]]
topic_words = ([top_words(t) for t in H1])
topics = [' '.join(t) for t in topic_words]
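
The slice `[:-num_words-1:-1]` applied to `np.argsort(t)` is compact but easy to misread. As a small sketch with a made-up five-word vocabulary and one hypothetical topic row, it picks the indices of the largest weights, largest first:

```python
import numpy as np

num_words = 3
vocab = np.array(['account', 'card', 'debt', 'loan', 'report'])  # hypothetical vocabulary
weights = np.array([0.5, 2.0, 0.1, 3.0, 1.0])                    # hypothetical topic row

# argsort is ascending: [2, 0, 4, 1, 3]. The slice walks backwards from the
# end and stops after num_words items, so the largest weight comes first.
top_idx = np.argsort(weights)[:-num_words - 1:-1]
top = [str(vocab[i]) for i in top_idx]
print(top)  # ['loan', 'card', 'report']
```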

In [19]:
# show topics

topics

['thi receiv told email time servic said ask phone custom',
 'loan mortgag thi home payment year compani time servic insur',
 'account card bank thi charg credit check money close chase',
 'thi credit debt report collect account compani receiv letter inform',
 'payment late credit account balanc month paid thi statement charg',
 'thi debt report provid inform request document account consum violat']

In [20]:
# build the document-topic table for the training data and pick each document's dominant topic

colnames = ["Topic" + str(i) for i in range(lda.n_components)]
docnames = ["Doc" + str(i) for i in range(len(X_train['Complaints']))]
df_doc_topic = pd.DataFrame(np.round(W1, 2), columns=colnames, index=docnames)
significant_topic = np.argmax(df_doc_topic.values, axis=1)
df_doc_topic['dominant_topic'] = significant_topic
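
One detail of this cell: `np.argmax` runs on the values after rounding to two decimals, so two topics with nearly equal weight can become an exact tie, and `argmax` then returns the first of them. The sketch below uses made-up weights to show how the choice can differ from an `argmax` over the unrounded matrix.

```python
import numpy as np

# Hypothetical document-topic weights; the first row is a near tie.
W = np.array([[0.004, 0.497, 0.499],
              [0.700, 0.200, 0.100]])

rounded_choice = np.argmax(np.round(W, 2), axis=1)  # what the cell above does
exact_choice = np.argmax(W, axis=1)                 # argmax before rounding

print(rounded_choice.tolist())  # [1, 0]
print(exact_choice.tolist())    # [2, 0]
```

If the distinction matters, taking the `argmax` on `W1` directly and rounding only for display would avoid the artificial ties.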

In [21]:
# show topic weights and dominant topic for the training data

df_doc_topic

Unnamed: 0,Topic0,Topic1,Topic2,Topic3,Topic4,Topic5,dominant_topic
Doc0,0.05,0.21,0.00,0.56,0.00,0.17,3
Doc1,0.33,0.07,0.34,0.26,0.00,0.00,2
Doc2,0.00,0.10,0.18,0.67,0.05,0.00,3
Doc3,0.00,0.02,0.88,0.07,0.04,0.00,2
Doc4,0.03,0.03,0.03,0.83,0.03,0.03,3
...,...,...,...,...,...,...,...
Doc40212,0.68,0.29,0.00,0.03,0.00,0.00,0
Doc40213,0.00,0.00,0.50,0.00,0.00,0.49,2
Doc40214,0.00,0.07,0.44,0.00,0.48,0.00,4
Doc40215,0.00,0.00,0.13,0.85,0.00,0.00,3


In [22]:
X_train.head()

Unnamed: 0,Complaints,Product,Company
42027,"After many weeks of receiving hourly phone calls from Navient, I sent them a letter instructing them only to contact me by mail, per the Fair Debt Collection Practices Act, and the calls stopped, although they have emailed me a couple of times. Today, XX/XX/19, the mother of a friend from middle school let me know that Navient had called HER and left a message for me. She is not listed as an emergency or secondary contact in any of my paperwork ; Navient should not have her number unless they dig it up themselves. This is a huge breach of privacy and a clear violation of the FDCPA, given that I had instructed them to only contact me by phone and that they contacted someone completely unrelated to me and told them they were looking for me regarding student loan debt. Per the FDCPA : "" a debt collector may not communicate, in connection with the collection of any debt, with any person other than the consumer, his attorney, a consumer reporting agency if otherwise permitted by law, the creditor, the attorney of the creditor, or the attorney of the debt collector. '' Navient clearly violated this. \n\nThe person who contacted her was named XXXX XXXX, and she gave the reference number XXXX and the phone number XXXX extension XXXX.",Debt collection,"Navient Solutions, LLC."
13307,"As a former customer of XXXX, I swithced my services to another carrier as of XX/XX/2018 after experiencing a period of two weeks in which time we had no service. This was due to a global technical issue however we were not able to make and or receive calls. Prior to leaving XXXX I did reach out to tech support both via phone and via the app on the phone time and time again with an attempt to get this issue resolved however I was not able too. Even though we did not have service XXXX was not forgiving the bill in any way shape or form and I had an infant therefore, it was and is important that I have a cell phone that works in case of an emergency. Finally after getting no place, I took my business else where. \n\nA few days later, after switching my cell phone provider, without even contacting XXXX, I received a return kit in the mail so that I could return the two devices, XXXX, to them using the prepaid return label via XXXX. This was exactly what did. During this time, we were in the processing of moving therefore I misplaced the return tracking number. However working in customer care and with return label, I am fully aware that as a company they have the means to locate the return tracking # and track the return package, IF THEY WANTED TOO!. \n\nAgain, call after call, I got no place. On XXXX day, XX/XX/2018, I received notification from at least 3 cards, one being my bank card that XXXX illegally tried to charge each of the cards {$1800.00}. \n\nThis was an issue and I flipped from the beginning as I NEVER authorized XXXX to keep any card I EVER used to pay a bill with on file! Each time I paid a bill via the IVR and was asked if I wanted my information kept on file I responded NO! Furthermore, the amount was incorrect! \n\nWorking in the XXXX industry full time, there is nothing slow about me, I am fully aware that is ILLEGAL to keep a customers credit card and or banking information on file AND charge it for ANY AMOUNT with out their consent!. 
On the same day, after notifying my card issuers I did not make this purchase and I did not authorize it, to have all my cards cancelled and new ones issued, I contacted XXXX yet again. \n\nI spoke with a nice rep who was only able to assist me to a limited amount, she advised me she would note the account accordingly and have the phones returned located. AT THIS POINT MY ACCOUNT SHOULD HAVE BEEN PLACED ON HOLD AND IT WAS NOT! \n\nI continued to climb the corporate ladder and went to the executive office. Originally I was dealing with XXXX XXXX who was useless. She admitted XXXX DID keep my information on file and that it was okay for them to charge my cards as a letter was sent to me advising it would be done. I NEVER GOT THIS LETTER TO BEGIN WITH AND IT IS NOT OKAY AS I NEVER AUTHORIZED XXXX TO KEEP MY INFORMATION ON FILE! \n\nShe was not willing to help, we had a few words and I started working with XXXX XXXX. Again we went back and forth and originally he too did not want to assist me in locating the phones. I told him what XXXX did is Fraud, even CC executive leadership on each email sent to XXXX and XXXX, though it got me no place! Finally after the 120 days passsed ( this is the length of time XXXX keeps the return tracking information in there system for ) I was told that as a courtesy, he lowered my bill to {$470.00}. \n\nI requested that the account me removed from my credit immediately and XXXX continued to refuse even though THEY COMMITTED FRAUD and they are to blame. I agreed to make a payment, which I did, never committed to an amount and still they would not remove it from my credit. \n\nXXXX has changed their collection companies time and time again, just to keep this invalid debit active. 
\n\nI have continued to research my rights and have even sent XXXX a demand letter requesting {$3000.00} ; in court, I can get 3 times the amount if it is found in my favor and clearly with XXXX ILLEGALLY keeping and attempting to charge several of my cards with out my consent, they clearly committed fraud. \n\nI went back and forth with the collection companies and finally reached a really nice rep several months ago at Sunrise. When I explained the above in detail to her, she did confirm what XXXX did was fraud and that she would put my account on hold. While I appreciate her help, this is not good enough! \n\nI WANT THIS REMOVED FROM MY CREDIT! For over a year, XXXX, the collection companies and their illegal practices have ruined my credit beyond belief. \n\nI pay all my bills on time and XXXX carelessness and fraudlent activity as destroyed my credit. \n\nI am not able to get any new cards, no loans and or refinance my car. \n\nThis is not acceptable and I would like this removed from my credit immediately and {$3000.00} for all that they have put me thru. \n\nThank you",Debt collection,"SUNRISE CREDIT SERVICES, INC"
19649,"I called this collection company after finding out that XXXX XXXX had submitted a {$50.00} debt from our now XXXX year old. At the time she was XXXX or XXXX and this was also around the time she was on our insurance. However, I believe we had declared bankruptcy. This was a copay that XXXX XXXX XXXX never EVER called about or anything. We moved in XXXX and had our mail forwarded but NEVER EVER received a bill!!! When I called this company recently, as soon as I found out this debt existed, via XXXX XXXX alert system, I TOLD them I can easily pay the {$50.00}. They said it was too late because it was over 7 years old BUT THEY SAID IT WOULDNT go on our credit until I just saw in now, XX/XX/XXXX! Why would I not pay {$50.00}??? This XXXX XXXX XXXX is NOTORIOUS for this behavior. One simple phone call or a certified letter would have instantly solved this 7 years ago. We have always paid our co pay at the time of visit but my then minor daughter may have been told she would be billed and didnt tell me .... at least that is the only thing I can come up with. Tell me who to send the {$50.00} to!! I want this off my credit!",Debt collection,"PlusFour, Inc"
37669,"I have reached out multiple times to my local chase in XXXX IL, and they have failed to solve my problem. I recieved my account number from a teller at my local chase on a business card. When I filed my taxes for XXXX on XX/XX/XXXX I provided the account number that chase had provided me. Well I was supposed to receive my direct deposit for my tax return for the amount of {$1100.00} on XX/XX/XXXX. Well XX/XX/XXXX came and no deposit was made when I contacted my local chase to see what the issue was on XXXX XXXX thus my first complaint. it turned out that the teller person had provided me with the incorrect account number. And the account number provided to me by my local chase was somebody else account and my direct deposit went into this other persons account. After reaching out multiple times and going into my local chase in XXXX IL in person chase has refused to reverse that payment back to the IRS or back to my account. Chase has also allowed somebody else to use of my money!!! Chase has refused multiple times to reverse the payment into my account and accept their fault at giving me the wrong account number. And they have allowed another person to use my hard working money that I earned. I have received a letter from the IRS stating that Chase is fully responsible for fixing there mistake. And reversing the payment to my account. This is not right! And I am extremely dissapointed. You are supposed to trust your bank and trust that they take care of your money. And the fact that they have refused and allowed someone else to steal my money is absured. I have filed a police report and this other person wont be so happy to hear that they are being charged with theft for stealing my money. As well as Jp Morgan Chase wont be happy to hear this is my 3rd complaint and nothing has been resolved and it will be posted on the reviews that they have refused to issue my refund back to me.",Checking or savings account,JPMORGAN CHASE & CO.
9392,Robocall Claiming I owe a debt I do not owe any debts,Debt collection,"Duncan Solutions, lnc."
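
Comparing the topic table with these rows by eye only goes so far. One possible way to summarise how dominant topics line up with the `Product` labels is a cross-tabulation. This is a sketch on tiny hypothetical data; in the notebook the inputs would presumably be `X_train['Product'].values` and `df_doc_topic['dominant_topic'].values`, passed as plain arrays because the two frames have different indexes.

```python
import pandas as pd

# Hypothetical stand-ins for the Product labels and the dominant_topic column.
products = pd.Series(['Debt collection', 'Mortgage', 'Debt collection', 'Mortgage'],
                     name='Product')
dominant = pd.Series([3, 1, 3, 1], name='dominant_topic')

# Rows: product label; columns: dominant topic; cells: number of documents.
table = pd.crosstab(products, dominant)
print(table)
```

A product whose documents concentrate in one column suggests that the topic tracks that product; counts spread across columns suggest it does not.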


In [23]:
# apply the fitted vectorizer and LDA model to the test data

WHold = lda.transform(vectorizer_tf.transform(X_test['Complaints']))

In [25]:
# build the document-topic table for the test data and pick each document's dominant topic

colnames = ["Topic" + str(i) for i in range(lda.n_components)]
docnames = ["Doc" + str(i) for i in range(len(X_test['Complaints']))]
df_doc_topic = pd.DataFrame(np.round(WHold, 2), columns=colnames, index=docnames)
significant_topic = np.argmax(df_doc_topic.values, axis=1)
df_doc_topic['dominant_topic'] = significant_topic
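
The same five lines build the table for both the training and the test data. A small helper could remove the duplication; the function name `doc_topic_frame` is a hypothetical choice, and the sketch mirrors the notebook's logic (including the `argmax` over rounded values) on a made-up matrix.

```python
import numpy as np
import pandas as pd

def doc_topic_frame(W):
    """Build the Topic0..TopicK-1 table plus a dominant_topic column
    from a document-topic matrix W (one row per document)."""
    colnames = ["Topic" + str(i) for i in range(W.shape[1])]
    docnames = ["Doc" + str(i) for i in range(W.shape[0])]
    frame = pd.DataFrame(np.round(W, 2), columns=colnames, index=docnames)
    # Same as the notebook: dominant topic taken from the rounded values.
    frame['dominant_topic'] = np.argmax(frame.values, axis=1)
    return frame

# Hypothetical 2-document, 3-topic matrix.
demo = doc_topic_frame(np.array([[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]))
print(demo)
```

With such a helper, the training and test tables could be kept in separate variables instead of overwriting `df_doc_topic`.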

In [27]:
# show topic weights and dominant topic for the test data

df_doc_topic

Unnamed: 0,Topic0,Topic1,Topic2,Topic3,Topic4,Topic5,dominant_topic
Doc0,0.18,0.00,0.08,0.74,0.01,0.00,3
Doc1,0.12,0.64,0.09,0.04,0.07,0.04,1
Doc2,0.22,0.70,0.07,0.00,0.00,0.00,1
Doc3,0.02,0.02,0.02,0.71,0.02,0.21,3
Doc4,0.01,0.01,0.01,0.01,0.01,0.94,5
...,...,...,...,...,...,...,...
Doc17231,0.07,0.00,0.53,0.00,0.38,0.02,2
Doc17232,0.00,0.00,0.00,0.58,0.00,0.41,3
Doc17233,0.00,0.00,0.60,0.38,0.00,0.00,2
Doc17234,0.51,0.06,0.43,0.00,0.00,0.00,0


In [28]:
X_test.head()

Unnamed: 0,Complaints,Product,Company
2710,"On XX/XX/XXXX, I paid off the account in full to the original creditor. During this phone call, the original creditor informed me that they will close the account with the debt collection agency. \n\nOn XX/XX/XXXX, I called the original creditor regarding the account closure because I will still seeing the debt on my credit report. During this phone call, they said they had emailed the supervisor of the collection agency on XX/XX/XXXX to close the account and that the collection account will be closed soon. \n\nOn XX/XX/XXXX, I called the supervisor of the collection agency who informed me that he never received any communication from the original creditor regarding the closure of this account. He informed me that he needed a formal confirmation from the original creditor in order to close the account. I placed the supervisor of the collection agency on a 3-way phone call with the original creditor and myself. During this phone call, the supervisor told the original creditor that no communication has been received from the original creditor regarding this account while the original creditor claims she has submitted it. The supervisor of the collection agency explained the process by which the original creditor must take to close the account and both agreed to take the necessary measures to move forward in the closure of the account. \n\nOn XX/XX/XXXX, I called the supervisor of the collection agency who informed me that he never received any communication from the original creditor regarding the closure of this account even after the phone call on XX/XX/XXXX. \n\nOn XX/XX/XXXX, I called the original creditor and asked to be directed to the woman who handles account closure but got her voicemail twice. On the same day, I spoke to another representative of the original creditor who told me the account has been closed as the debt has been paid off. 
She further suggested that I just needed to call the collection agency and tell them that the debt has been paid off to close my account. I reminded her that I had done so and kept getting nowhere as the communication for closure needed to occur directly between the original creditor and the collection agency. \n\nPlease help me resolve this situation. I paid off my medical debt and just want this off my credit report. I'm not sure how to resolve this situation where the debt has been paid off, but the collection agency refuses to remove the account until the original creditor close it, while the original creditor claims they already closed it.",Debt collection,"CCS Financial Services, Inc."
35615,"I'm in the process of refinancing my exsisting home loan. I've been involved in the process for over a month now. I paid {$500.00} ( on XX/XX/XXXX ) in full for my appraisal. Then my mortgage company came back to me to collect {$300.00} more. So I have TWO issues with Alliant Credit Union -- # 1 that they are charging me more money for an appraisal than what a basic appraisal costs. ( I know this as Ive spoken to the Appraisal company -- XXXX XXXX, XXXX ) # 2. Alliant Credit Union is requiring an "" inside '' appraisal. This should not have to be done for a refinance -- especially for an exsisting loan. AND especially due to the circumstances surrounding the COVID-19 crisis. There are alternative ways to complete the refinance process that does not involve an "" inside '' inspection that would jeopardize the health and safety of my family. I've talked with XXXX XXXX. Who has guidelines published on their website and Selling guides of appraisal exceptions that need to be made. I've been told that a "" drive-by '' ( exterior only ) and/or a desktop appraisal can be done. I've also been told that there is a process titled DRDC -- Directed Remote Data Collection -- which is a process by which an inspection can be completed remotely by an appraiser who guides the homeowner through the inspection aided by video conferencing technology. Again, these are all suitable methods that the Lender - Alliant Credit Union can do to respect the state and federal mandates of "" shelter in place '' and social distancing. I can not have someone come into my home. I, my daughter, my spouse and my XXXX years old mother ( who's immune system is compromised ) lives with me. We can not take any changes of being exposed to this deadly virus. I have clients who have already died from this virus and it should not be taken lightly. I need help here to have my Lender find an alternative way to complete my refinance without having an outside individual in my home. 
Thank you",Mortgage,ALLIANT CREDIT UNION
44033,"I went to quicken loans and Rocket Mortgage to try to refinance my mortgage. I had just gotten word that my mortgage interest was going to increase so I wanted a new loan. I contacted quicken loans on approximately XX/XX/XXXX They said they could help me. In our initial contacts and review of loan I was told that I would not need to put any money down as closing costs. I would need to do an initial deposit for appraisals, etc. During the loan process and the appraisal things changed. I was constantly asked to prove I had XXXX which I don't have. I was then told that the appraisal came in lower so I would have to. I spent over, including the XXXX deposit, XXXX dollars in getting my house ready. I kept continuall y getting letters for proof of money. I finally called and they said I need the XXXX now because my house was appraised lower and basically they didn't think they could get me a loan now. At this time covid has happened and I am unemployed. I begged for my appraisal back. I would not have gone through this process if I was initially told I would need money down. I don't have any money to do that. They said it is through a 3rd party so no. They refused to give me any money back.",Mortgage,"QUICKEN LOANS, INC."
17758,Claiming I had a debt and to contact a know scam debt collection scheme,Debt collection,Terrill Outsourcing Group
6724,I am a victim of identity theft and this debt does not belong to me. Please see the identity theft report and legal affidavit attached.,Debt collection,"PMAB,LLC"
