# Analysis of Financial Consumer Complaints Database

This mini project shows how Essentia can be used to explore publicly available data and gain insights from it.

The data comes from the open [Consumer Complaint Database](https://catalog.data.gov/dataset/consumer-complaint-database#topic=consumer_navigation) on [Data.gov](https://www.data.gov/).

A 5-minute summary of this EDA project is available on the [PivotBillions blog](https://pivotbillions.com/5-minute-analysis-of-data-gov-datasets-with-pivotbillions/). In this notebook we'll go over how the data wrangling and transformation are done using Essentia.


## Setup

We'll first scan the data, which is stored in an S3 bucket named _essentia-playground_, and create a category for it, then take a brief look at the data's structure and schema.
After that we'll start the EDA using various `aq_` commands.


In [5]:
# First we'll take a look at files in the s3 bucket
ess select s3://essentia-playground
ess ls mini-projects/consumer-complaints/

   0B Nov 19 22:01    /mini-projects/consumer-complaints/
 831M Nov 19 22:01    /mini-projects/consumer-complaints/Consumer_Complaints.csv


In [16]:
# create a category, and display its summary and schemas
ess category add complaints "/mini-projects/consumer-complaints/*.csv"
ess summary complaints | head -n 21

2019-11-19 23:07:06 ip-10-10-1-118 ess[8942]: Fetching file list from datastore.
2019-11-19 23:07:06 ip-10-10-1-118 ess[8942]: Examining largest matched file to determine compression type: /mini-projects/consumer-complaints/Consumer_Complaints.csv
2019-11-19 23:07:06 ip-10-10-1-118 ess[8942]: Probing largest matched file to determine data configuration: /mini-projects/consumer-complaints/Consumer_Complaints.csv
Name:        complaints
Pattern:     /mini-projects/consumer-complaints/*.csv
Exclude:     None
Date Format: auto
Date Regex:  
Archive:     
Delimiter:   Comma
# of files:  1
Total size:  832.0MB
File range:  1970-01-01 - 1970-01-01
# columns:   18
Column Spec: S:Date_received S:Product S:Sub_product S:Issue S:Sub_issue S:Consumer_complaint_narrative S:Company_public_response S:Company S:State S:ZIP_code S:Tags S:Consumer_consent_provided S:Submitted_via S:Date_sent_to_company S:Company_response_to_consumer S:Timely_response S:Consumer_disputed I:Complaint_ID
Pkey: 
Schema: S:D

## EDA

Now we'll move on to exploratory data analysis. Let's first take a look at how the number of complaints changed over the years.

**Numbers of Complaints over the Years**<br>
To do this, we can use the `date_received` column, which looks like this:

In [18]:
ess stream complaints "*" "*"  "aq_pp -f,+1 - -d %cols -c date_received | head -n 10"

"Date_received"
"08/09/2015"
"01/29/2019"
"10/13/2019"
"08/19/2015"
"03/04/2016"
"03/18/2013"
"12/21/2011"
"10/24/2018"
"03/03/2018"


It is a string column containing the month, day and year. To count the number of complaints per year, we'd like to **count the unique `complaint_id` values for each year.**

**1. extract year**<br>
To do this, we will extract the year from the `Date_received` column and store it in a new `year` column.

We're using the [-map(f/c)](../../aq_pp%20-map.ipynb) options to extract the year.

In [27]:
ess stream complaints "*" "*"  \
"aq_pp -f,+1 - -d %cols \
-mapf date_received \"%*/%%YEAR:4-4%%\" -mapc s:year \"%%YEAR%%\" \
-c complaint_id year | head -n 10"

"Complaint_ID","year"
1509954,"2015"
3136759,"2019"
3404213,"2019"
1527601,"2015"
1816726,"2016"
358304,"2013"
7362,"2011"
3054861,"2018"
2831821,"2018"


Now we've successfully extracted the year from the `Date_received` column using the map options. You can also do this with the `-eval` option and one of the built-in functions.
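
As a cross-check of what the extraction does, here is a minimal plain-Python sketch of the same idea. It is not Essentia code; the regular expression and the `extract_year` helper are illustrative assumptions, and the sample values are copied from the `Date_received` output shown earlier.

```python
import re

# Sample values copied from the Date_received output above (MM/DD/YYYY strings).
dates = ["08/09/2015", "01/29/2019", "12/21/2011"]

# Capture the 4-digit year that follows the second slash.
YEAR_RE = re.compile(r"^\d{2}/\d{2}/(\d{4})$")


def extract_year(date_received: str) -> str:
    """Return the year part of an MM/DD/YYYY string, or '' if it does not match."""
    match = YEAR_RE.match(date_received)
    return match.group(1) if match else ""


years = [extract_year(d) for d in dates]
print(years)  # ['2015', '2019', '2011']
```

Like the `s:year` column in the command above, the year stays a string here; it only needs to be a number if you plan to do arithmetic on it.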

**2. count numbers of complaints per year**<br>

Now let's count the number of complaints per year. We'll do this using `aq_cnt`'s `-k` option together with the `-g` option.

**Note that we'll be combining this command with the previous one, piping the previous command's output into this one.**

In [7]:
ess stream complaints "*" "*"  \
"aq_pp -f,+1 - -d %cols \
-mapf date_received \"%*/%%YEAR:4-4%%\" -mapc s:year \"%%YEAR%%\" \
-c  complaint_id year | \
aq_cnt -f,+1 - -d I:complaint_id s:year -g year -k key complaint_id"

"year","row","key"
"2012",72373,72373
"2017",242968,242968
"2014",153045,153045
"2018",257341,257341
"2011",2536,2536
"2013",108217,108217
"2016",191471,191471
"2019",238466,238466
"2015",168476,168476


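The number of complaints rose every year from 2011 through 2018. Note that 2011 and 2019 are partial years: the data starts in December 2011 and ends in November 2019.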

<img src="img/complaints_per_year.png">
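
The `row` and `key` columns in the output above can be read as "rows in the group" and "distinct `complaint_id` values in the group". The following plain-Python sketch illustrates that reading; it is an assumption about the semantics, not Essentia code, and the duplicate row is hypothetical, added only to show where the two numbers would differ.

```python
from collections import defaultdict

# (Complaint_ID, year) pairs taken from the sample output above, plus one
# repeated ID (hypothetical) to show that duplicates are not double counted.
rows = [
    (1509954, "2015"),
    (3136759, "2019"),
    (3404213, "2019"),
    (1527601, "2015"),
    (1509954, "2015"),  # hypothetical duplicate row
]

row_count = defaultdict(int)   # plays the role of the "row" column
unique_ids = defaultdict(set)  # plays the role of the "key" column

for complaint_id, year in rows:
    row_count[year] += 1
    unique_ids[year].add(complaint_id)

summary = {year: (row_count[year], len(unique_ids[year])) for year in sorted(row_count)}
print(summary)  # {'2015': (3, 2), '2019': (2, 2)}
```

In the real output both columns are equal for every year, which suggests that each `Complaint_ID` appears only once in the file.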

**Top Financial Services Associated with Complaints**<br>

We can also dig deeper into which financial products/services received the most complaints over the years.

We'll do this by counting the frequency of each value in the `sub_product` column.

In [10]:
# frequency count of each sub_product; head shows only the first 10 lines of output (unsorted)
ess stream complaints "*" "*"  \
"aq_cnt -f,+1 - -d %cols \
-kX - keys sub_product | head -n 10"

"Sub_product","count"
"Student prepaid card",6
"Pawn loan",121
"Electronic Benefit Transfer / EBT card",12
"Transit card",37
"Credit repair",103
"Gift card",236
"Check cashing service",227
"Traveler’s/Cashier’s checks",88
"Gift or merchant card",402


<img src='img/complaints_by_category.png'>

You can see that credit card, checking account and other mortgages are the top 3 complaint categories.
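
Keep in mind that piping into `head` only truncates the output and does not rank it, so a ranking needs an explicit sort by count. A minimal plain-Python sketch of "count, then sort descending" is shown below; the sample values are a hypothetical handful, not real data.

```python
from collections import Counter

# Hypothetical handful of Sub_product values; the real column is far larger.
sub_products = [
    "Checking account", "Gift card", "Checking account",
    "Other mortgage", "Checking account", "Other mortgage",
]

counts = Counter(sub_products)

# most_common() sorts by descending count, which plain `head` does not do.
top = counts.most_common(2)
print(top)  # [('Checking account', 3), ('Other mortgage', 2)]
```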

**Top financial institutions that receive these complaints**<br>

The recipients of these complaints can be investigated similarly, using the `aq_cnt` command on the `company` column.

In [14]:
# frequency count of each company; head shows only the first 10 lines of output (unsorted)
ess stream complaints "*" "*"  \
"aq_cnt -f,+1 - -d %cols \
-kX - keys company | head -n 10"

"Company","count"
"Ascent Holding Co",1
"Credit Mount",1
"Exchange Finance Company",1
"Law Office of Anthony C. Onwuanibe",1
"Off Lease Only Inc.",1
"NORTHPOINT MORTGAGE, INC.",1
"MECHANICS BANK",1
"Credit Advisors Foundation",1
"Pinnacle Lending Group, Inc.",1


<img src="img/complaints_by_company.png">

From the graph you can observe that the major credit reporting companies and banks are the top recipients of complaints.


**Complaints by States**<br>

We can find the states where the most complaints are reported by using the `state` column with `aq_cnt`'s `-k` and `-g` options, as below.

In [6]:
# counting the number of unique complaint_id values grouped by state; head shows only the first 10 lines of output (unsorted)
ess stream complaints "*" "*"  \
"aq_cnt -f,+1 - -d %cols \
-g state -k key complaint_id | head -n 10"

"State","row","key"
"PW",13,13
"MP",34,34
"AS",27,27
"MH",31,31
"UNITED STATES MINOR OUTLYING ISLANDS",43,43
"FM",115,115
"AA",25,25
"GU",200,200
"AE",531,531


As the chart below shows, the top 3 states for complaints are California, followed by Florida and Texas.

<img src="img/complaints_by_state.png">

**Preferred methods of submission**<br>
Next let's take a look at the preferred methods of submitting complaints. This information is available in the `submitted_via` column.

We'll start with what methods are available.

In [7]:
# methods of complaints submissions
ess stream complaints "*" "*"  \
"aq_cnt -f,+1 - -d %cols \
-kx - key submitted_via"

"Submitted_via"
"Email"
"Postal mail"
"Fax"
"Phone"
"Referral"
"Web"


It's surprising to see Fax on the list. Next, we can take a look at how the frequency of each method changes over time.

**Preferred Methods of Submission Over Time**<br>

First we'd like to extract the month, date and year from the `date_received` column so that we can analyze records daily, monthly and annually.

In [17]:
# extract month, day and year and put them into new columns
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -mapf date_received \"%%MONTH:2-2%%/%%DATE:2-2%%/%%YEAR:4-4%%\" \
-mapc S:date \"%%DATE%%\" -mapc S:month \"%%MONTH%%\" -mapc S:year \"%%YEAR%%\" \
-c month date year submitted_via | head -n 20"

"month","date","year","Submitted_via"
"08","09","2015","Web"
"01","29","2019","Web"
"10","13","2019","Web"
"08","19","2015","Web"
"03","04","2016","Web"
"03","18","2013","Referral"
"12","21","2011","Web"
"10","24","2018","Web"
"03","03","2018","Web"
"01","02","2019","Web"
"12","23","2018","Web"
"09","12","2016","Phone"
"04","13","2018","Web"
"06","05","2014","Web"
"08","07","2015","Web"
"11","01","2016","Web"
"12","28","2016","Web"
"04","01","2017","Web"
"06","26","2017","Web"


With `aq_cnt`, we can get the frequency count of each submission method for each day, month and year.

In [31]:
# Next pass it to aq_cnt to get the frequency count of each submission method per day
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -mapf date_received \"%%MONTH:2-2%%/%%DATE:2-2%%/%%YEAR:4-4%%\" \
-mapc S:date \"%%DATE%%\" -mapc S:month \"%%MONTH%%\" -mapc S:year \"%%YEAR%%\" \
-c month date year submitted_via | \
aq_cnt -f,+1 - -d i:month i:date i:year s:submitted_via \
-g year month date -kX - methods submitted_via | head -n 20"

"year","month","date","submitted_via","count"
2019,11,18,"Phone",1
2019,11,18,"Web",16
2019,11,16,"Web",24
2019,7,4,"Web",393
2012,1,1,"Web",14
2019,11,17,"Web",22
2014,3,9,"Web",5
2012,3,18,"Web",23
2019,6,29,"Referral",26
2019,6,29,"Fax",8
2019,6,29,"Phone",1
2019,6,29,"Postal mail",71
2019,6,29,"Web",432
2011,12,18,"Web",16
2019,11,9,"Referral",1
2019,11,9,"Web",140
2011,12,25,"Web",10
2012,8,12,"Web",35
2012,6,16,"Referral",1


This output is a flat table, which is hard to interpret. Let's pivot it with `aq_rst`, using year, month and date as the row index of the new table and the `submitted_via` values as columns.

In [34]:
# Same pipeline as before, then pivot with aq_rst so that each submission method becomes a column
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -mapf date_received \"%%MONTH:2-2%%/%%DATE:2-2%%/%%YEAR:4-4%%\" \
-mapc S:date \"%%DATE%%\" -mapc S:month \"%%MONTH%%\" -mapc S:year \"%%YEAR%%\" \
-c month date year submitted_via | \
aq_cnt -f,+1 - -d i:month i:date i:year s:submitted_via \
-g year month date -kX - methods submitted_via | \
aq_rst -f,+1 - -d I:year I:month I:date S:submitted_via I:count \
-key year month date -lab submitted_via -val count -ord | head -n 20"

"year","month","date","Email","Fax","Phone","Postal mail","Referral","Web"
2011,12,1,0,0,15,2,63,40
2011,12,2,0,0,11,2,77,48
2011,12,3,0,0,0,0,0,26
2011,12,4,0,0,0,0,3,19
2011,12,5,0,1,10,3,108,42
2011,12,6,0,0,5,1,107,57
2011,12,7,2,1,15,6,41,35
2011,12,8,1,2,11,2,28,122
2011,12,9,0,0,6,2,22,38
2011,12,10,0,0,0,0,0,36
2011,12,11,0,0,0,0,1,58
2011,12,12,0,0,29,4,10,120
2011,12,13,0,0,11,7,15,101
2011,12,14,0,0,19,2,12,59
2011,12,15,1,0,4,2,15,42
2011,12,16,0,0,12,1,24,64
2011,12,17,0,0,1,0,1,26
2011,12,18,0,0,0,0,0,16
2011,12,19,0,3,11,0,26,51


We can observe that complaint submissions via the web have been increasing since data collection began, while the other submission methods stay comparatively low.
<img src='img/submission_via_time.png'>
_The data for this graph was aggregated by month using the mean value, to smooth the curve._

The pivoted output above holds the daily frequency count of each submission method from December 2011 to November 2019, with the date as the row index and the `submitted_via` values as columns.
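
The pivot step can be pictured in plain Python as turning long rows into one row per date, with one column per submission method and 0 for missing combinations. This is only a sketch of the idea, not a description of how `aq_rst` is implemented; the sample rows are copied from the flat `aq_cnt` output shown earlier.

```python
# Long-format rows (year, month, date, method, count) copied from the
# flat aq_cnt output shown earlier.
long_rows = [
    (2019, 6, 29, "Referral", 26),
    (2019, 6, 29, "Fax", 8),
    (2019, 6, 29, "Phone", 1),
    (2019, 6, 29, "Postal mail", 71),
    (2019, 6, 29, "Web", 432),
    (2012, 1, 1, "Web", 14),
]

# Column labels are the sorted distinct methods; missing cells default to 0.
methods = sorted({row[3] for row in long_rows})

wide = {}
for year, month, date, method, count in long_rows:
    key = (year, month, date)
    wide.setdefault(key, dict.fromkeys(methods, 0))[method] = count

# Print one row per date, ordered by (year, month, date).
for key in sorted(wide):
    print(key, [wide[key][m] for m in methods])
# (2012, 1, 1) [0, 0, 0, 0, 14]
# (2019, 6, 29) [8, 1, 71, 26, 432]
```

Sorting the keys plays the same role as ordering the pivoted rows by date, which makes the table easy to plot as a time series.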

**Number of Complaints by Category Over Time**<br>

Let's focus on the top three financial products/services reported on, and observe how their numbers change over time.

**Top 3 sub_product**<br>
Let's start by finding the top 3 product sub-categories using `aq_cnt`.


In [49]:
# getting frequency counts for each sub_product, sorting by count in descending order, then showing the first 10 lines of output
ess stream complaints "*" "*" \
"aq_cnt -f,+1 - -d %cols -kX - key sub_product | \
aq_ord -f,+1 - -d S:sub_product I:count -sort,dec count | head -n 10"

"sub_product","count"
"Credit reporting",298072
,235166
"Checking account",98613
"Other mortgage",86635
"Conventional fixed mortgage",70613
"I do not know",55600
"General-purpose credit card or charge card",47576
"Other (i.e. phone, health club, etc.)",44544
"Other debt",37225


**Empty Sub_product**<br>
Looking at the top 3, we notice that the second sub_product is empty.
Let's check the other columns' values on these rows.
We can use the `-filt` option of the `aq_pp` command to keep only the records with an empty sub_product value.

In [52]:
# displaying other columns for the empty sub_product records
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -filt 'sub_product==\"\"' \
-c sub_product product company | head -n 20"

"Sub_product","Product","Company"
,"Credit reporting","Experian Information Solutions Inc."
,"Credit card","DISCOVER BANK"
,"Credit card","CITIBANK, N.A."
,"Credit reporting","Experian Information Solutions Inc."
,"Credit card","CITIBANK, N.A."
,"Credit reporting","Experian Information Solutions Inc."
,"Credit reporting","TRANSUNION INTERMEDIATE HOLDINGS, INC."
,"Credit reporting","Experian Information Solutions Inc."
,"Credit reporting","EQUIFAX, INC."
,"Credit card","JPMORGAN CHASE & CO."
,"Credit reporting","EQUIFAX, INC."
,"Credit reporting","EQUIFAX, INC."
,"Credit reporting","TRANSUNION INTERMEDIATE HOLDINGS, INC."
,"Credit reporting","TRANSUNION INTERMEDIATE HOLDINGS, INC."
,"Credit card","SYNCHRONY FINANCIAL"
,"Credit reporting","Experian Information Solutions Inc."
,"Credit reporting","EQUIFAX, INC."
,"Credit reporting","EQUIFAX, INC."
,"Credit card","CAPITAL ONE FINANCIAL CORPORATION"


We can see that the first 20 records with an empty sub_product are complaints about credit products.
Let's see which products belong to this empty sub_product using `aq_cnt`.

In [54]:
# unique values of product among the records with an empty sub_product
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -filt 'sub_product==\"\"' \
-c sub_product product | \
aq_cnt -f,+1 - -d S:sub_product S:product \
-kX - s_prod product"

"product","count"
"Payday loan",5544
"Credit card",89190
"Credit reporting",140432


Three products belong to the empty sub_product category. One of them, Credit reporting, is also a sub_product we've already seen, so it looks like sub_product names were recorded as product by mistake. <br>
Let's check whether this is the case by listing all of the products that belong to each sub_product.



In [76]:
# listing all the products grouped by sub_product
ess stream complaints "*" "*" \
"aq_cnt -f,+1 - -d %cols -g Sub_product -kx - key Product | \
aq_pp -f,+1 - -d S:sub_product S:product -o,fix - -c sub_product,n=40 product,n=30"

sub_product                             product                       
Student prepaid card                    Credit card or prepaid card   
Pawn loan                               Payday loan, title loan, or pe
Pawn loan                               Consumer Loan                 
Electronic Benefit Transfer / EBT card  Prepaid card                  
Transit card                            Prepaid card                  
Credit repair                           Other financial service       
Gift card                               Credit card or prepaid card   
Check cashing service                   Money transfer, virtual curren
Traveler’s/Cashier’s checks         Other financial service       
Gift or merchant card                   Prepaid card                  
CD (Certificate of Deposit)             Checking or savings account   
Government benefit payment card         Prepaid card                  
Refund anticipation check               Money transfer, virtual curren
Refund ant

**Imputing Empty Sub_product**<br>
You can see that payday loan, credit card and credit reporting all appear as sub_product values. It seems safe to say that these were recorded in the product column instead of the sub_product column. <br>
With that in mind, we can impute the empty sub_product values with the value from product using the `aq_pp` command.
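
The imputation rule itself is simple: copy `Product` into `Sub_product` only when `Sub_product` is empty, and leave every other row untouched. A minimal plain-Python sketch of that rule follows; the `impute_sub_product` helper is a hypothetical name used for illustration, and the sample records mirror rows shown in the outputs above.

```python
# (Sub_product, Product) records; the first two mirror rows shown earlier with
# an empty Sub_product, the third is a populated row from the listing above.
records = [
    {"Sub_product": "", "Product": "Credit reporting"},
    {"Sub_product": "", "Product": "Credit card"},
    {"Sub_product": "Gift card", "Product": "Credit card or prepaid card"},
]


def impute_sub_product(record: dict) -> dict:
    """Copy Product into Sub_product only when Sub_product is empty."""
    if record["Sub_product"] == "":
        return {**record, "Sub_product": record["Product"]}
    return record


imputed = [impute_sub_product(r) for r in records]
print([r["Sub_product"] for r in imputed])
# ['Credit reporting', 'Credit card', 'Gift card']
```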

In [81]:
# impute the empty sub_product column with value from product column
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -if -filt 'sub_product==\"\"' \
-eval sub_product product -c sub_product product -endif | head -n 20"

"Sub_product","Product"
"Credit reporting","Credit reporting"
"Credit card","Credit card"
"Credit card","Credit card"
"Credit reporting","Credit reporting"
"Credit card","Credit card"
"Credit reporting","Credit reporting"
"Credit reporting","Credit reporting"
"Credit reporting","Credit reporting"
"Credit reporting","Credit reporting"
"Credit card","Credit card"
"Credit reporting","Credit reporting"
"Credit reporting","Credit reporting"
"Credit reporting","Credit reporting"
"Credit reporting","Credit reporting"
"Credit card","Credit card"
"Credit reporting","Credit reporting"
"Credit reporting","Credit reporting"
"Credit reporting","Credit reporting"
"Credit card","Credit card"


**Top sub_products, Imputed**<br>

Now we can investigate the top sub_products that generated the most complaints, using the imputed data.

In [89]:
# impute data, count the frequencies of each sub_product, then sort the result
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -if -filt 'sub_product==\"\"' \
-eval sub_product product -endif -c sub_product | \
aq_cnt -f,+1 - -d S:sub_product -kX - key sub_product | \
aq_ord -f,+1 - -d S:sub_product I:count -sort,dec count | head -n 10"

"sub_product","count"
"Credit reporting",438504
"Credit card",117888
"Checking account",98613
"Other mortgage",86635
"Conventional fixed mortgage",70613
"I do not know",55600
"General-purpose credit card or charge card",47576
"Other (i.e. phone, health club, etc.)",44544
"Other debt",37225


**Change in top sub_product counts over time**<br>

Finally we can count the sub_product frequencies over time. <br>
This combines all the steps we've taken so far:
1. impute the data
2. keep only the top 4 sub_products:
    1. credit reporting
    2. credit card
    3. checking account
    4. other mortgage
3. extract the year, month and date from the `date_received` column
4. count the frequencies per year and month
5. sort the result by year and month
6. pivot the result into a new table, with
    1. year and month as rows
    2. sub_products as columns

In [108]:
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -if -filt 'sub_product==\"\"' -eval sub_product product -endif \
-filt 'sub_product == \"Credit reporting\" || sub_product == \"Credit card\" || sub_product == \"Checking account\" || sub_product == \"Other mortgage\"' \
-mapf date_received \"%%MONTH:2-2%%/%%DATE:2-2%%/%%YEAR:4-4%%\" \
-mapc S:date \"%%DATE%%\" -mapc S:month \"%%MONTH%%\" -mapc S:year \"%%YEAR%%\" \
-c year month sub_product | \
aq_cnt -f,+1 - -d s:year s:month s:sub_product \
-g year month -kX - key sub_product | \
aq_rst -f,+1 - -d I:year I:month S:sub_product I:count \
-key year month -lab sub_product -val count -ord | head -n 20"

"year","month","Checking account","Credit card","Credit reporting","Other mortgage"
2011,12,0,1260,0,388
2012,1,0,1167,0,592
2012,2,0,1212,0,932
2012,3,918,1378,0,1513
2012,4,994,1173,0,1512
2012,5,1330,1468,0,2496
2012,6,1093,1699,0,2340
2012,7,995,1505,0,1676
2012,8,974,1281,0,2155
2012,9,868,994,0,1494
2012,10,943,1358,360,1603
2012,11,745,1086,782,1408
2012,12,731,1032,731,1605
2013,1,1017,1162,911,2755
2013,2,849,1142,1065,2258
2013,3,934,1280,1098,2531
2013,4,772,1216,1151,2238
2013,5,778,1072,1207,2126
2013,6,820,1022,1112,1977


<img src='img/comp_sub_prod_time.png'>

As you can see, the number of complaints about credit reporting increases gradually, then spikes to its peak around September 2017. This is when Equifax disclosed its data breach, which would explain the spike.

**September 2017, Credit Reporting**<br>

We can dig deeper into what happened in September 2017 in the credit reporting sub_product that caused so many complaints.<br>
We will keep only the records that belong to this time period and to the credit reporting sub_product.

In [116]:
# Keeping only credit reporting complaints received in Sep 2017
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -filt 'sub_product == \"Credit reporting\" && PatCmp(date_received, \"09/*/2017\")' \
-c date_received sub_issue company | head -n 20"

"Date_received","Sub_issue","Company"
"09/09/2017","Reporting company used your report improperly","EQUIFAX, INC."
"09/06/2017","Public record information inaccurate","EQUIFAX, INC."
"09/14/2017","Difficulty submitting a dispute or getting information about a dispute over the phone","EQUIFAX, INC."
"09/23/2017","Reporting company used your report improperly","EQUIFAX, INC."
"09/20/2017","Credit inquiries on your report that you don't recognize","Experian Information Solutions Inc."
"09/09/2017","Reporting company used your report improperly","EQUIFAX, INC."
"09/09/2017","Reporting company used your report improperly","EQUIFAX, INC."
"09/30/2017","Difficulty submitting a dispute or getting information about a dispute over the phone","EQUIFAX, INC."
"09/26/2017","Credit inquiries on your report that you don't recognize","JPMORGAN CHASE & CO."
"09/25/2017","Account information incorrect","TRANSUNION INTERMEDIATE HOLDINGS, INC."
"09/08/2017","Was not notified of investigation status or res

Let's zoom in on the time period. We'd like to know exactly which dates the number of complaints spiked on.


In [123]:
# Keeping only credit reporting complaints received in Sep 2017,
# then counting complaints per date and sorting by count
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -filt 'sub_product == \"Credit reporting\" && PatCmp(date_received, \"09/*/2017\")' \
-c date_received | \
aq_cnt -f,+1 - -d s:date_received -kX - comp_date date_received | \
aq_ord -f,+1 - -d s:date_received i:count -sort,dec count | head -n 10"

"date_received","count"
"09/08/2017",3015
"09/09/2017",2445
"09/13/2017",1027
"09/11/2017",754
"09/12/2017",732
"09/14/2017",629
"09/10/2017",499
"09/15/2017",481
"09/26/2017",447


Looks like the spike occurred over 2 days, September 8th and 9th, 2017. What does the distribution of companies look like over those 2 days?

In [129]:
# zooming into two days
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -filt 'sub_product == \"Credit reporting\" && RxCmp(date_received, \"(09\/0[89]\/2017)\", pcre)' \
-c date_received company | \
aq_cnt -f,+1 - -d s:date_received s:company -g date_received -kX - key company | \
aq_ord -f,+1 - -d s:date_received s:company i:count -sort,dec count date_received | head -n 10"

"date_received","company","count"
"09/08/2017","EQUIFAX, INC.",2645
"09/09/2017","EQUIFAX, INC.",2266
"09/08/2017","Experian Information Solutions Inc.",161
"09/08/2017","TRANSUNION INTERMEDIATE HOLDINGS, INC.",107
"09/09/2017","Experian Information Solutions Inc.",92
"09/09/2017","TRANSUNION INTERMEDIATE HOLDINGS, INC.",62
"09/08/2017","CAPITAL ONE FINANCIAL CORPORATION",9
"09/08/2017","SANTANDER CONSUMER USA HOLDINGS INC.",7
"09/08/2017","BANK OF AMERICA, NATIONAL ASSOCIATION",5


The majority of the complaints over these 2 days are directed at EQUIFAX. Let's look at the details of these complaints by checking the `issue` column:

In [135]:
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -filt 'sub_product == \"Credit reporting\" && RxCmp(date_received, \"(09\/0[89]\/2017)\", pcre) && company == \"EQUIFAX, INC.\"' \
-c date_received company issue | head -n 20"

"Date_received","Company","Issue"
"09/09/2017","EQUIFAX, INC.","Improper use of your report"
"09/09/2017","EQUIFAX, INC.","Improper use of your report"
"09/09/2017","EQUIFAX, INC.","Improper use of your report"
"09/08/2017","EQUIFAX, INC.","Problem with a credit reporting company's investigation into an existing problem"
"09/08/2017","EQUIFAX, INC.","Improper use of your report"
"09/09/2017","EQUIFAX, INC.","Improper use of your report"
"09/08/2017","EQUIFAX, INC.","Improper use of your report"
"09/09/2017","EQUIFAX, INC.","Improper use of your report"
"09/09/2017","EQUIFAX, INC.","Improper use of your report"
"09/09/2017","EQUIFAX, INC.","Improper use of your report"
"09/08/2017","EQUIFAX, INC.","Improper use of your report"
"09/08/2017","EQUIFAX, INC.","Problem with a credit reporting company's investigation into an existing problem"
"09/08/2017","EQUIFAX, INC.","Improper use of your report"
"09/08/2017","EQUIFAX, INC.","Improper use of your report"
"09/09/2017","EQUIFAX, INC.","Impr

Most of them have the issue "Improper use of your report". What exactly does this mean? We can further filter the data to complaints with this phrase in the Issue column, and take a look at the Consumer_complaint_narrative column to see what customers said.

In [141]:
ess stream complaints "*" "*" \
"aq_pp -f,+1 - -d %cols -filt 'sub_product == \"Credit reporting\" && RxCmp(date_received, \"(09\/0[89]\/2017)\", pcre) && company == \"EQUIFAX, INC.\" && issue == \"Improper use of your report\"' \
-c consumer_complaint_narrative | head -n 20"

"Consumer_complaint_narrative"

"XXXX of XXXX through XXXX XXXX Kept breach a secret for months that allowed the hackers a head start on using the information gained. 

equifax system breach. Allowed access to my personal information that could/will cause problem for the rest of my life."
"Equifax experienced a security breach that affects 143 million individuals. I checked their website ( https : //www.equifaxsecurity2017.com/potential-impact/ ) to determine whether or not I was potentially impacted, and the results say that I  am. 

I have requested a 90-day fraud alert to be set up so that I am aware of any attempts made at stealing my identity."

"Equifax was the subject of a cyberattack and failed to protect the sensitive, personal information I trusted them with. 

The breach lasted from mid-XX/XX/XXXX The hackers accessed peoples names, Social Security numbers, birth dates, addresses and, in some instances, drivers license numbers. They also stole credit card numbers for about 2

Looking at the narratives, it's pretty clear that the spike in complaints towards EQUIFAX on September 8th and 9th, 2017 was about the data breach.
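
The two date filters used in this drill-down can be illustrated in plain Python. This is only a sketch: it assumes that the wildcard pattern behaves like a glob match and that the regular expression must match the whole date string, and the sample dates are a small hypothetical mix chosen to show what each filter keeps.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical mix of MM/DD/YYYY strings, some inside the period of interest.
dates = ["09/08/2017", "09/09/2017", "09/13/2017", "10/08/2017", "09/08/2016"]

# Glob-style match for "any day in September 2017".
september_2017 = [d for d in dates if fnmatchcase(d, "09/*/2017")]

# Regex match for exactly the 8th or 9th. Inside a character class "|" is a
# literal character, so [89] is the precise way to write "8 or 9".
SPIKE_RE = re.compile(r"09/0[89]/2017")
spike_days = [d for d in dates if SPIKE_RE.fullmatch(d)]

print(september_2017)  # ['09/08/2017', '09/09/2017', '09/13/2017']
print(spike_days)      # ['09/08/2017', '09/09/2017']
```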