# Working with Pandas Data Frames

In this notebook, we'll load a data set into a pandas data frame and perform some basic analysis on it.

## 1.0 Load and clean data

We'll download our data using `wget` and then load it into a pandas data frame.

In [1]:
!wget https://raw.githubusercontent.com/IBM/python-and-analytics/master/data/cfpbciti.csv

--2020-10-19 19:52:45--  https://raw.githubusercontent.com/IBM/python-and-analytics/master/data/cfpbciti.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3948432 (3.8M) [text/plain]
Saving to: ‘cfpbciti.csv’


2020-10-19 19:52:46 (46.7 MB/s) - ‘cfpbciti.csv’ saved [3948432/3948432]
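If `wget` isn't available (for example, outside a Jupyter/IPython environment, where the `!` shell escape doesn't work), the download can be sketched with Python's standard library instead. This is a minimal sketch, not the notebook's method: the helper name `download_csv` is hypothetical, and it simply skips the download when the destination file already exists.

```python
from pathlib import Path
from urllib.request import urlretrieve

DATA_URL = (
    "https://raw.githubusercontent.com/IBM/python-and-analytics/"
    "master/data/cfpbciti.csv"
)


def download_csv(url: str, dest: str) -> Path:
    """Hypothetical helper: save `url` to `dest`, skipping if it already exists."""
    path = Path(dest)
    if not path.exists():
        urlretrieve(url, path)  # writes the response body to `path`
    return path


# Usage (requires network access):
# csv_path = download_csv(DATA_URL, "cfpbciti.csv")
```

Alternatively, pandas' `read_csv` accepts a URL directly, so `pd.read_csv(DATA_URL)` would skip the separate download step.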



By convention, we assign our data frame to the variable `df`.

In [2]:
import pandas as pd
df = pd.read_csv('cfpbciti.csv')
df.head(5)

| | Date received | Product | Sub-product | Issue | Sub-issue | Consumer complaint narrative | Company public response | Company | State | ZIP code | Tags | Consumer consent provided? | Submitted via | Date sent to company | Company response to consumer | Timely response? | Consumer disputed? | Complaint ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/24/20 | Credit card or prepaid card | General-purpose credit card or charge card | Problem with a purchase shown on your statement | Card was charged for something you did not pur... | | Company has responded to the consumer and the ... | CITIBANK, N.A. | NJ | 07302 | | Consent not provided | Web | 01/24/20 | Closed with monetary relief | Yes | | 3508199 |
| 1 | 02/12/20 | Credit card or prepaid card | General-purpose credit card or charge card | Getting a credit card | Delay in processing application | | Company has responded to the consumer and the ... | CITIBANK, N.A. | IL | 600XX | | Consent not provided | Web | 02/12/20 | Closed with monetary relief | Yes | | 3529728 |
| 2 | 05/21/20 | Credit card or prepaid card | Store credit card | Problem with a purchase shown on your statement | Credit card company isn't resolving a dispute ... | | Company has responded to the consumer and the ... | CITIBANK, N.A. | FL | 33020 | | Consent not provided | Web | 05/21/20 | Closed with monetary relief | Yes | | 3661785 |
| 3 | 05/18/20 | Debt collection | Credit card debt | Written notification about debt | Didn't receive notice of right to dispute | Company has wrong information on me and thus n... | Company has responded to the consumer and the ... | CITIBANK, N.A. | CA | 935XX | | Consent provided | Web | 05/18/20 | Closed with explanation | Yes | | 3657603 |
| 4 | 05/21/20 | Credit card or prepaid card | General-purpose credit card or charge card | Other features, terms, or problems | Other problem | | Company has responded to the consumer and the ... | CITIBANK, N.A. | FL | 328XX | Older American | Other | Web | 05/21/20 | Closed with explanation | Yes | | 3661714 |
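One detail worth checking when loading data like this: ZIP codes such as `07302` begin with a zero. In this file the `ZIP code` column also contains masked values like `600XX`, so pandas keeps it as text (`object`), but in a file where every ZIP code is numeric, `read_csv` would parse the column as integers and drop the leading zero. A minimal sketch using a few values copied from the rows above (held in memory with `io.StringIO`, so no download is needed) shows how the `dtype` parameter keeps the column as strings:

```python
import io

import pandas as pd

# A few columns from the first rows shown above, as in-memory CSV text
SAMPLE = (
    "Date received,State,ZIP code,Complaint ID\n"
    "01/24/20,NJ,07302,3508199\n"
    "05/21/20,FL,33020,3661785\n"
)

# Default parsing: the all-numeric ZIP codes become integers (07302 -> 7302)
parsed_default = pd.read_csv(io.StringIO(SAMPLE))

# dtype={"ZIP code": str} keeps the column as text, preserving leading zeros
df_sample = pd.read_csv(io.StringIO(SAMPLE), dtype={"ZIP code": str})

print(parsed_default["ZIP code"].tolist())  # [7302, 33020]
print(df_sample["ZIP code"].tolist())       # ['07302', '33020']
```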


### 1.1 Work with columns and rows
We can perform a variety of actions on the columns and rows of the data frame.
For example, if there is a column we wish to drop, we use the `df.drop()` method.
Passing `errors='ignore'` suppresses the `KeyError` that would otherwise be raised if the column is missing, so this step doesn't depend on a particular data set.

In [3]:
df = df.drop(columns=['Submitted via'], errors='ignore')
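A small sketch on a made-up two-column frame illustrates what `errors='ignore'` changes, and why the result is assigned back to `df`: `drop()` returns a new data frame rather than modifying the original (unless `inplace=True` is passed).

```python
import pandas as pd

# Made-up frame with two of the notebook's column names
df_demo = pd.DataFrame({
    "Product": ["Debt collection", "Credit card or prepaid card"],
    "Submitted via": ["Web", "Web"],
})

# Dropping an existing column returns a new frame without it
dropped = df_demo.drop(columns=["Submitted via"])

# Dropping a missing column raises KeyError by default...
try:
    df_demo.drop(columns=["Not a column"])
except KeyError as err:
    print("KeyError:", err)

# ...but errors="ignore" silently skips labels that aren't present
unchanged = df_demo.drop(columns=["Not a column"], errors="ignore")
```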

We can display the first `n` rows using `df.head(n)`:

In [4]:
df.head(5)

| | Date received | Product | Sub-product | Issue | Sub-issue | Consumer complaint narrative | Company public response | Company | State | ZIP code | Tags | Consumer consent provided? | Date sent to company | Company response to consumer | Timely response? | Consumer disputed? | Complaint ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01/24/20 | Credit card or prepaid card | General-purpose credit card or charge card | Problem with a purchase shown on your statement | Card was charged for something you did not pur... | | Company has responded to the consumer and the ... | CITIBANK, N.A. | NJ | 07302 | | Consent not provided | 01/24/20 | Closed with monetary relief | Yes | | 3508199 |
| 1 | 02/12/20 | Credit card or prepaid card | General-purpose credit card or charge card | Getting a credit card | Delay in processing application | | Company has responded to the consumer and the ... | CITIBANK, N.A. | IL | 600XX | | Consent not provided | 02/12/20 | Closed with monetary relief | Yes | | 3529728 |
| 2 | 05/21/20 | Credit card or prepaid card | Store credit card | Problem with a purchase shown on your statement | Credit card company isn't resolving a dispute ... | | Company has responded to the consumer and the ... | CITIBANK, N.A. | FL | 33020 | | Consent not provided | 05/21/20 | Closed with monetary relief | Yes | | 3661785 |
| 3 | 05/18/20 | Debt collection | Credit card debt | Written notification about debt | Didn't receive notice of right to dispute | Company has wrong information on me and thus n... | Company has responded to the consumer and the ... | CITIBANK, N.A. | CA | 935XX | | Consent provided | 05/18/20 | Closed with explanation | Yes | | 3657603 |
| 4 | 05/21/20 | Credit card or prepaid card | General-purpose credit card or charge card | Other features, terms, or problems | Other problem | | Company has responded to the consumer and the ... | CITIBANK, N.A. | FL | 328XX | Older American | Other | 05/21/20 | Closed with explanation | Yes | | 3661714 |
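`head()` returns a new data frame containing the first `n` rows; `n` defaults to 5, and a negative `n` returns every row except the last `|n|`. A quick check on a made-up ten-row frame:

```python
import pandas as pd

# Made-up frame with ten rows holding the values 0-9
df_demo = pd.DataFrame({"n": range(10)})

first_five = df_demo.head()           # default n=5
first_three = df_demo.head(3)         # first 3 rows
all_but_last_two = df_demo.head(-2)   # every row except the last 2

print(first_three["n"].tolist())      # [0, 1, 2]
```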


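To access a row by its index label, we can use the `loc` indexer. Because this data frame has the default integer index (0, 1, 2, ...), each row's label matches its position, so `df.loc[1]` returns the second row. (To select strictly by position, use `iloc`.)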

In [5]:
row_1 = df.loc[1]
row_1

Date received                                                            02/12/20
Product                                               Credit card or prepaid card
Sub-product                            General-purpose credit card or charge card
Issue                                                       Getting a credit card
Sub-issue                                         Delay in processing application
Consumer complaint narrative                                                  NaN
Company public response         Company has responded to the consumer and the ...
Company                                                            CITIBANK, N.A.
State                                                                          IL
ZIP code                                                                    600XX
Tags                                                                         None
Consumer consent provided?                                   Consent not provided
Date sent to com
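The difference between label and position matters once rows have been filtered, because the remaining rows keep their original index labels. A minimal sketch on a made-up frame holding the `State` values of the first four rows above:

```python
import pandas as pd

df_demo = pd.DataFrame({"State": ["NJ", "IL", "FL", "CA"]})

# With the default RangeIndex, label 1 and position 1 are the same row
print(df_demo.loc[1, "State"], df_demo.iloc[1]["State"])  # IL IL

# Filtering keeps the original labels (2 and 3), which no longer match positions
fl_ca = df_demo[df_demo["State"].isin(["FL", "CA"])]

print(fl_ca.loc[2, "State"])    # FL (the row labelled 2)
print(fl_ca.iloc[0]["State"])   # FL (the first row by position)

try:
    fl_ca.loc[0]                # no row is labelled 0 any more
except KeyError:
    print("no row with label 0")
```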

### 1.2 Examine the data types and statistics of the features
When we run `df.info()`, we see the total number of entries (rows) and, for each feature (column), its name, its number of non-null entries, and its data type (`object`, `float64`, `int64`, etc.).

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3835 entries, 0 to 3834
Data columns (total 17 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Date received                 3835 non-null   object 
 1   Product                       3835 non-null   object 
 2   Sub-product                   3835 non-null   object 
 3   Issue                         3835 non-null   object 
 4   Sub-issue                     3835 non-null   object 
 5   Consumer complaint narrative  1974 non-null   object 
 6   Company public response       3835 non-null   object 
 7   Company                       3835 non-null   object 
 8   State                         3835 non-null   object 
 9   ZIP code                      3835 non-null   object 
 10  Tags                          3835 non-null   object 
 11  Consumer consent provided?    3070 non-null   object 
 12  Date sent to company          3835 non-null   object 
 13  Com
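`info()` prints its report rather than returning it. If you need the same facts as data, `count()` gives the number of non-null entries per column and the `dtypes` attribute gives each column's type; passing a text buffer as `info(buf=...)` also captures the printed report as a string. A sketch on a made-up three-row frame (the narrative text is a placeholder):

```python
import io

import pandas as pd

df_demo = pd.DataFrame({
    "State": ["NJ", "IL", "FL"],
    "Consumer complaint narrative": [None, None, "placeholder narrative"],
    "Complaint ID": [3508199, 3529728, 3661785],
})

# Capture the printed info() report in a string buffer instead of stdout
buffer = io.StringIO()
df_demo.info(buf=buffer)
report = buffer.getvalue()

non_null_counts = df_demo.count()  # non-null entries per column
column_types = df_demo.dtypes      # data type of each column

print(non_null_counts)
print(column_types)
```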

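We can run `describe()` to get summary statistics for the columns (features).
By default, `describe()` summarizes only the numeric columns, so we set the `include` parameter to `'object'` to summarize the text (object) columns instead. For object columns, the statistics are `count`, `unique` (number of distinct values), `top` (most frequent value), and `freq` (how often the top value occurs).
If we use `include='all'` instead, statistics that don't apply to a column (for example, `mean` for an object column) are shown as `NaN`.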

In [7]:
df.describe(include='object')

| | Date received | Product | Sub-product | Issue | Sub-issue | Consumer complaint narrative | Company public response | Company | State | ZIP code | Tags | Consumer consent provided? | Date sent to company | Company response to consumer | Timely response? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3835 | 3835 | 3835 | 3835 | 3835 | 1974 | 3835 | 3835 | 3835 | 3835.0 | 3835.0 | 3070 | 3835 | 3835 | 3835 |
| unique | 170 | 9 | 37 | 56 | 104 | 1956 | 2 | 1 | 55 | 1638.0 | 4.0 | 5 | 170 | 4 | 1 |
| top | 04/21/20 | Credit card or prepaid card | General-purpose credit card or charge card | Problem with a purchase shown on your statement | Credit card company isn't resolving a dispute ... | This particular account situation that is late... | Company has responded to the consumer and the ... | CITIBANK, N.A. | CA | | | Consent provided | 04/21/20 | Closed with explanation | Yes |
| freq | 51 | 2333 | 1796 | 783 | 546 | 8 | 3786 | 3835 | 726 | 545.0 | 3223.0 | 1974 | 53 | 2961 | 3835 |
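To see how `include` changes the output, here is a sketch on a made-up four-row frame built from the `State` and `Complaint ID` values of rows 0, 2, 3, and 4 of the data. The `State` column is created with `object` dtype explicitly, matching what `df.info()` reported for the text columns in this notebook:

```python
import pandas as pd

df_demo = pd.DataFrame({
    # Text column stored as object dtype, like the notebook's string columns
    "State": pd.Series(["NJ", "FL", "CA", "FL"], dtype="object"),
    "Complaint ID": [3508199, 3661785, 3657603, 3661714],
})

numeric_stats = df_demo.describe()                 # numeric columns only
object_stats = df_demo.describe(include="object")  # count, unique, top, freq
all_stats = df_demo.describe(include="all")        # both kinds; NaN where not applicable

print(object_stats)
print(all_stats)
```

In `all_stats`, `mean` for `State` and `top` for `Complaint ID` come out as `NaN`, since neither statistic applies to that column's type.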
