## Multi-Class Text Classification with Doc2Vec & Logistic Regression

The goal is to classify consumer finance complaints into 12 pre-defined classes using Doc2Vec and Logistic Regression.

Doc2Vec is an NLP technique that represents each document as a fixed-length vector. It is a generalization of the word2vec method from words to whole documents.

### Consumer Complaint Database
> https://catalog.data.gov/dataset/consumer-complaint-database

*Metadata Updated: November 10, 2020*

The Consumer Complaint Database is a collection of complaints about consumer financial products and services that the Consumer Financial Protection Bureau (CFPB) sent to companies for response.

Complaints are published after the company responds, confirming a commercial relationship with the consumer, or after 15 days, whichever comes first.

Complaints referred to other regulators, such as complaints about depository institutions with less than $10 billion in assets, are not published in the Consumer Complaint Database. The database generally updates daily.

In [1]:
import re

import gensim
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from gensim.models import Doc2Vec
from gensim.models.doc2vec import TaggedDocument
from sklearn import utils
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from tqdm import tqdm

# Register progress_apply() on pandas objects
tqdm.pandas(desc="progress-bar")

# Load the complaints and keep only the narrative text and the product label
df = pd.read_csv('./complaints.csv')
df = df[['Consumer complaint narrative', 'Product']]
# Drop complaints that have no narrative text
df = df[pd.notnull(df['Consumer complaint narrative'])]
# Reassign instead of renaming in place to avoid modifying a slice
df = df.rename(columns={'Consumer complaint narrative': 'narrative'})
df.head(10)


| | narrative | Product |
|---|---|---|
| 0 | transworld systems inc. \nis trying to collect... | Debt collection |
| 2 | Over the past 2 weeks, I have been receiving e... | Debt collection |
| 4 | My personal information was used without my kn... | Credit reporting, credit repair services, or o... |
| 5 | XX/XX/2021 I lost my phone and I didn't notice... | Money transfer, virtual currency, or money ser... |
| 10 | Previously, on XX/XX/XXXX, XX/XX/XXXX, and XX/... | Credit reporting, credit repair services, or o... |
| 11 | Hello This complaint is against the three cred... | Credit reporting, credit repair services, or o... |
| 15 | Today XX/XX/XXXX went online to dispute the in... | Credit reporting, credit repair services, or o... |
| 17 | XXXX is reporting incorrectly to Equifax and X... | Credit reporting, credit repair services, or o... |
| 18 | Please reverse the late payments reported on t... | Credit reporting, credit repair services, or o... |
| 19 | Pioneer has committed several federal violatio... | Debt collection |
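
Since the goal is a multi-class problem, it may help to confirm the number of distinct `Product` labels and how the complaints are spread across them once rows without a narrative are dropped. The sketch below shows one way to do that on a small made-up frame; `toy_df` and its rows are hypothetical stand-ins for the real `df`, not actual complaint data.

```python
import pandas as pd

# Hypothetical stand-in for the complaints frame (not real data).
toy_df = pd.DataFrame({
    "narrative": ["text a", "text b", None, "text d", "text e"],
    "Product": ["Debt collection", "Mortgage", "Mortgage",
                "Debt collection", "Debt collection"],
})

# Same filter as the notebook cell: keep only rows that have a narrative.
toy_df = toy_df[toy_df["narrative"].notna()]

# Number of distinct classes and number of complaints per class.
n_classes = toy_df["Product"].nunique()
class_counts = toy_df["Product"].value_counts()
print(n_classes)
print(class_counts)
```

Running the same two calls on the real `df` would show whether the label set matches the expected classes and how imbalanced they are.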
