# Exploratory Data Analysis
## CFPB Complaints

The purpose of this project is to download and explore a dataset using Python and associated libraries.

The dataset contains consumer complaints about financial products and services, collected by the Consumer Financial Protection Bureau (CFPB). It can be downloaded from [data.gov](https://www.data.gov), which hosts the U.S. Government's open data.

### Import Libraries

In [2]:
%matplotlib inline

In [62]:
import pandas as pd
import numpy as np
import json
import requests
from datetime import datetime
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
pd.set_option('display.max_colwidth', 1000)  # Show complete text in DataFrame cells without truncating.

### Gather

I downloaded the dataset manually as a CSV file and saved it locally. The file is too large: downloading it automatically with the requests library caused the app to crash. (To do: investigate why.)
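
One possible reason for a crash like this, offered as an assumption since the notebook does not diagnose it, is that the whole response body was held in memory at once. A minimal sketch of a streamed download with `requests` is shown here. `save_chunks` and `download_csv` are hypothetical helper names, and the download URL is not given in this notebook, so it has to be supplied by the reader.

```python
import requests


def save_chunks(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path and return the bytes written."""
    written = 0
    with open(dest_path, "wb") as fh:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written


def download_csv(url, dest_path, chunk_size=1024 * 1024):
    """Stream url to dest_path in chunk_size pieces instead of loading it all at once."""
    # stream=True defers reading the body until iter_content is consumed.
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        return save_chunks(response.iter_content(chunk_size=chunk_size), dest_path)
```

A call would look like `download_csv(csv_url, 'Consumer_Complaints.csv')`, where `csv_url` stands for whatever download link data.gov lists for the dataset.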

In [76]:
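# Read the CSV file into a pandas DataFrame.
# Note: parse_dates=True only attempts to parse the index, so the date columns are still read as strings.
complaints_df = pd.read_csv('Consumer_Complaints.csv', parse_dates=True)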

In [77]:
col_names = ['Date_received', 'Product', 'Sub_product', 'Issue', 'Sub_issue',
       'Consumer_complaint_narrative', 'Company_public_response', 'Company',
       'State', 'Zip', 'Tags', 'Consumer_consent_provided?',
       'Submitted_via', 'Date_sent_to_company', 'Company_response_to_consumer',
       'Timely_response?', 'Consumer_disputed?', 'Complaint_ID']
complaints_df.columns = col_names
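
Assigning a list to `complaints_df.columns` relies on the CSV columns arriving in exactly this order. A rule-based rename is one alternative. The sketch below assumes the raw headers use spaces or hyphens where the names above use underscores. The notebook does not show the raw headers, so the sample headers and the `tidy_columns` helper are hypothetical.

```python
import pandas as pd


def tidy_columns(df):
    """Return a copy of df whose column names use underscores instead of spaces or hyphens."""
    out = df.copy()
    out.columns = (
        out.columns.str.strip()
        .str.replace(r"[\s-]+", "_", regex=True)
    )
    return out


# Hypothetical raw headers, for illustration only.
raw = pd.DataFrame(columns=["Date received", "Sub-product", "Timely response?"])
tidy = tidy_columns(raw)
print(list(tidy.columns))
```

Any name the rule does not produce as wanted can still be set with an explicit `rename(columns={...})` mapping.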

In [78]:
complaints_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 932473 entries, 0 to 932472
Data columns (total 18 columns):
Date_received                   932473 non-null object
Product                         932473 non-null object
Sub_product                     697303 non-null object
Issue                           932473 non-null object
Sub_issue                       450184 non-null object
Consumer_complaint_narrative    227328 non-null object
Company_public_response         271533 non-null object
Company                         932473 non-null object
State                           922555 non-null object
Zip                             918556 non-null object
Tags                            129864 non-null object
Consumer_consent_provided?      410778 non-null object
Submitted_via                   932473 non-null object
Date_sent_to_company            932473 non-null object
Company_response_to_consumer    932473 non-null object
Timely_response?                932473 non-null object
Consumer_

In [85]:
complaints_df['Date_received'] = pd.to_datetime(complaints_df['Date_received'])
complaints_df['Date_sent_to_company'] = pd.to_datetime(complaints_df['Date_sent_to_company'])
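
With close to a million rows, it may help to pass an explicit format to `pd.to_datetime` so that every value is parsed the same way, and to use `errors='coerce'` so that malformed values become `NaT` rather than raising an error. The sketch below uses toy data. The `%m/%d/%Y` format is an assumption about how the CSV stores dates and should be checked against the file.

```python
import pandas as pd

# Toy values; the "%m/%d/%Y" format is an assumption about the CSV.
sample = pd.DataFrame({
    "Date_received": ["03/12/2014", "10/01/2016", "not a date"],
    "Date_sent_to_company": ["03/17/2014", "10/05/2016", "10/06/2016"],
})

for col in ["Date_received", "Date_sent_to_company"]:
    # Values that do not match the format are converted to NaT.
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y", errors="coerce")

print(sample.dtypes)
print(sample["Date_received"].isna().sum())  # number of values that failed to parse
```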

### Univariate Analysis

In this section, I will explore variables individually.

In [86]:
complaints_df.shape

(932473, 18)

In [87]:
complaints_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 932473 entries, 0 to 932472
Data columns (total 18 columns):
Date_received                   932473 non-null datetime64[ns]
Product                         932473 non-null object
Sub_product                     697303 non-null object
Issue                           932473 non-null object
Sub_issue                       450184 non-null object
Consumer_complaint_narrative    227328 non-null object
Company_public_response         271533 non-null object
Company                         932473 non-null object
State                           922555 non-null object
Zip                             918556 non-null object
Tags                            129864 non-null object
Consumer_consent_provided?      410778 non-null object
Submitted_via                   932473 non-null object
Date_sent_to_company            932473 non-null datetime64[ns]
Company_response_to_consumer    932473 non-null object
Timely_response?                932473 non-null 
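
The non-null counts above show that several columns are sparsely filled. For example, `Consumer_complaint_narrative` is present in about 24% of rows (227328 of 932473) and `Tags` in about 14% (129864 of 932473). A small sketch of computing the missing share per column directly is given here, on toy data. `missing_share` is a hypothetical helper name.

```python
import pandas as pd


def missing_share(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy data, for illustration only.
toy = pd.DataFrame({
    "Product": ["Mortgage", "Credit card", "Mortgage", "Student loan"],
    "Tags": [None, "Servicemember", None, None],
    "State": ["CA", None, "TX", "NY"],
})
print(missing_share(toy))
```

On the real data the call would be `missing_share(complaints_df)`.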

In [91]:
complaints_df.State.value_counts()

CA    131832
FL     88556
TX     74549
NY     62778
GA     46323
NJ     35907
IL     34949
PA     32541
VA     28919
MD     28528
OH     28317
NC     27509
MI     22792
AZ     20568
WA     19070
MA     17568
CO     15810
TN     14969
SC     12945
MO     12442
NV     11406
CT     10849
IN     10680
OR     10488
MN     10412
LA     10294
AL     10180
WI      9948
KY      6781
OK      6063
       ...  
MS      4767
DE      4716
KS      4635
NM      4420
AR      4193
NH      3973
IA      3881
ID      3028
HI      2971
ME      2943
NE      2849
RI      2844
PR      2401
WV      2368
MT      1490
VT      1404
SD      1254
AK      1034
ND       939
WY       855
AE       378
AP       267
VI       200
GU       146
FM        48
MP        31
MH        30
AS        25
AA        15
PW        13
Name: State, Length: 62, dtype: int64
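
The output above lists 62 distinct values, so it covers more than the 50 states: it also includes territories and the military postal codes `AE`, `AP`, and `AA`. The `...` row is pandas shortening the display, not missing data. Below is a hedged sketch, on toy data, of showing shares alongside counts and of printing every row without the gap.

```python
import pandas as pd

# Toy data, for illustration only; None stands in for a missing state.
toy_states = pd.Series(["CA", "CA", "CA", "FL", "FL", "TX", None], name="State")

counts = toy_states.value_counts()                # missing values are excluded by default
shares = toy_states.value_counts(normalize=True)  # fractions of the non-missing rows

top = pd.DataFrame({"count": counts, "share": shares}).head(3)
print(top)

# Temporarily lift the row limit so the full listing prints without "...".
with pd.option_context("display.max_rows", None):
    print(counts)
```

On the real data, `toy_states` would be replaced by `complaints_df.State`.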