## Q1: Data Wrangling

In [44]:
import pandas as pd

# Load the dataset into a pandas DataFrame
data = pd.read_csv('dataset.csv')

# Examine the first few rows using the .head() method
print("First few rows of the dataset:")
print(data.head())



First few rows of the dataset:
  Date received                      Product  \
0     3/12/2014                     Mortgage   
1     1/19/2017                 Student loan   
2      4/6/2018  Credit card or prepaid card   
3      6/8/2014                  Credit card   
4     9/13/2014              Debt collection   

                                  Sub-product  \
0                              Other mortgage   
1              Federal student loan servicing   
2  General-purpose credit card or charge card   
3                                         NaN   
4                                 Credit card   

                        Consumer complaint narrative Company public response  \
0                                                NaN                     NaN   
1  When my loan was switched over to Navient i wa...                     NaN   
2  I tried to sign up for a spending monitoring p...                     NaN   
3                                                NaN             

In [45]:
# Check the dimensions of the dataset using the .shape attribute
rows, columns = data.shape
print("\nNumber of rows:", rows)
print("Number of columns:", columns)




Number of rows: 99
Number of columns: 12


In [46]:
# Get an overview of data types and missing values using the .info() method
print("\nOverview of data types and missing values:")
data.info()  # .info() prints its own report and returns None, so no print() wrapper is needed


Overview of data types and missing values:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99 entries, 0 to 98
Data columns (total 12 columns):
 #   Column                        Non-Null Count  Dtype 
---  ------                        --------------  ----- 
 0   Date received                 99 non-null     object
 1   Product                       99 non-null     object
 2   Sub-product                   86 non-null     object
 3   Consumer complaint narrative  16 non-null     object
 4   Company public response       33 non-null     object
 5   Company                       99 non-null     object
 6   State                         98 non-null     object
 7   ZIP code                      89 non-null     object
 8   Submitted via                 99 non-null     object
 9   Company response to consumer  99 non-null     object
 10  Timely response?              99 non-null     object
 11  Consumer disputed?            75 non-null     object
dtypes: object(12)
memory usage: 9.4+ KB


In [47]:
missing_values_sum = data.isnull().sum()
print("Missing values in each column:")
print(missing_values_sum)

Missing values in each column:
Date received                    0
Product                          0
Sub-product                     13
Consumer complaint narrative    83
Company public response         66
Company                          0
State                            1
ZIP code                        10
Submitted via                    0
Company response to consumer     0
Timely response?                 0
Consumer disputed?              24
dtype: int64


In [48]:
# The analysis is state-level only, so 'ZIP code' is not needed;
# 'Timely response?' is dropped as well.
data.drop(columns=['ZIP code', 'Timely response?'], inplace=True)


In [49]:
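# Assign the result back instead of calling fillna(..., inplace=True) on a
# selected column, which recent pandas versions warn about (chained assignment).
data['Sub-product'] = data['Sub-product'].fillna('Unknown')
data['Consumer disputed?'] = data['Consumer disputed?'].fillna('N/A')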



In [50]:
data.dropna(subset=['State'], inplace=True)

In [51]:
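# Mark missing free-text fields explicitly; assign back rather than using inplace=True
data['Consumer complaint narrative'] = data['Consumer complaint narrative'].fillna('Not Provided')
data['Company public response'] = data['Company public response'].fillna('Not Provided')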

In [52]:
missing_values_sum = data.isnull().sum()
print("Missing values in each column:")
print(missing_values_sum)

Missing values in each column:
Date received                   0
Product                         0
Sub-product                     0
Consumer complaint narrative    0
Company public response         0
Company                         0
State                           0
Submitted via                   0
Company response to consumer    0
Consumer disputed?              0
dtype: int64
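
The `.info()` output above shows `Date received` stored as `object` (plain strings), which makes any trend-over-time question awkward. A minimal sketch of parsing it follows, assuming every date uses the month/day/year layout seen in the preview (e.g. `3/12/2014`). The function name and the `received_date` / `received_year` columns are hypothetical, not part of the dataset.

```python
import pandas as pd


def add_received_date(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with parsed date columns next to 'Date received'."""
    out = df.copy()
    # Assumption: dates look like month/day/year. errors="coerce" turns any
    # value that does not match into NaT instead of raising an exception.
    out["received_date"] = pd.to_datetime(
        out["Date received"], format="%m/%d/%Y", errors="coerce"
    )
    # Year is a convenient grouping key for trend questions.
    out["received_year"] = out["received_date"].dt.year
    return out
```

Calling `data = add_received_date(data)` would leave the original string column untouched, so the parsing can be checked against it before anything is dropped.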


## Q2: Business Questions

**1. Product Analysis:**
What are the top three products that customers complain about the most? For example, is it credit cards, mortgages, or something else? We want to understand what's bothering customers the most. Are there any specific patterns or trends in consumer complaints for these products?
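
One possible starting point for this question is a simple frequency count on the `Product` column. The sketch below is illustrative only; `top_products` is a hypothetical helper name, and it assumes the cleaned `data` frame from Q1.

```python
import pandas as pd


def top_products(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n most complained-about products with counts and shares."""
    # value_counts() sorts from most to least frequent by default.
    counts = df["Product"].value_counts()
    share = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"complaints": counts, "share_pct": share}).head(n)
```

For example, `top_products(data)` would list the three largest categories. With only 99 rows, small differences between products should be read with caution.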

**2. Geographic Patterns:**
Which states have the highest number of consumer complaints? Are there certain states where specific products or companies receive more complaints? Do people in certain states tend to complain more than people in other states? We're curious if there are states where people are generally happier with their products and services.
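
A rough way to explore this is to count complaints per state and then cross-tabulate state against product. The helper names below are hypothetical. Note that raw counts tend to favor populous states, so comparing how much people complain across states would need population figures that are not in this dataset.

```python
import pandas as pd


def complaints_by_state(df: pd.DataFrame) -> pd.Series:
    """Number of complaints per state, largest first."""
    return df["State"].value_counts()


def state_product_table(df: pd.DataFrame) -> pd.DataFrame:
    """Complaint counts with one row per state and one column per product."""
    return pd.crosstab(df["State"], df["Product"])
```

Swapping `df["Product"]` for `df["Company"]` in the second helper would give the per-company view the question also asks about.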

**3. Consumer Disputes and Company Response:**
Do consumer disputes vary based on the company's response to the complaint? Are there particular company responses that seem to lead to higher or lower consumer disputes? Does the way a company responds to a complaint make a difference?
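
A dispute rate per response type can be sketched with a row-normalized cross-tabulation. This assumes that `Consumer disputed?` holds the literal values `'Yes'` and `'No'` (the actual values are not shown above) and that rows filled with `'N/A'` in Q1 should be left out. `dispute_rate_by_response` is a hypothetical name.

```python
import pandas as pd


def dispute_rate_by_response(df: pd.DataFrame) -> pd.DataFrame:
    """Share of 'Yes' / 'No' disputes for each company response type."""
    # Assumption: known outcomes are coded as 'Yes' or 'No'; anything else
    # (such as the 'N/A' placeholder) is treated as unknown and excluded.
    known = df[df["Consumer disputed?"].isin(["Yes", "No"])]
    # normalize="index" makes each row sum to 1, so cells are proportions.
    return pd.crosstab(
        known["Company response to consumer"],
        known["Consumer disputed?"],
        normalize="index",
    )
```

Because only 75 rows have a known outcome, some response types may rest on a handful of complaints, so the proportions are best shown alongside the raw counts.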

**4. Consumer Narrative Sentiment Analysis:**
After performing sentiment analysis on consumer complaint narratives, which products or companies tend to receive more negative feedback? Are there common themes or issues mentioned in these negative narratives?
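
To make the idea concrete without committing to a particular sentiment library, here is a deliberately crude, dependency-free sketch: it scores each narrative by the fraction of its words that appear in a small negative-word list. The word list, the scoring rule, and the function names are all illustrative assumptions. A real analysis would swap in a proper sentiment model, and only 16 rows actually contain a narrative.

```python
import re

import pandas as pd

# Illustrative only: a tiny hand-picked list, not a validated lexicon.
NEGATIVE_WORDS = {
    "refused", "denied", "fraud", "error", "late", "harassing",
    "unfair", "wrong", "never", "failed", "threatened",
}


def negative_word_share(text: str) -> float:
    """Fraction of word tokens in text that appear in NEGATIVE_WORDS."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in NEGATIVE_WORDS for token in tokens) / len(tokens)


def mean_negativity_by_product(df: pd.DataFrame) -> pd.Series:
    """Average negative-word share per product, most negative first."""
    narrative = df["Consumer complaint narrative"]
    # Skip missing narratives and the 'Not Provided' placeholder from Q1.
    has_text = df[narrative.notna() & (narrative != "Not Provided")].copy()
    has_text["negativity"] = has_text["Consumer complaint narrative"].map(
        negative_word_share
    )
    return (
        has_text.groupby("Product")["negativity"]
        .mean()
        .sort_values(ascending=False)
    )
```

Grouping by `Company` instead of `Product` would answer the company half of the question in the same way.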

**5. How do Consumer Complaints Vary by Submission Method?**
Are there any differences in the types of complaints based on how customers submitted them (e.g., web, phone)? We're curious if certain submission methods lead to more critical complaints or if they're associated with specific products.
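
One way to look at this is the product mix within each submission channel. A minimal sketch, with `product_mix_by_channel` as a hypothetical helper name:

```python
import pandas as pd


def product_mix_by_channel(df: pd.DataFrame) -> pd.DataFrame:
    """Proportion of each product within every submission method."""
    # normalize="index" turns each channel's row into shares summing to 1.
    return pd.crosstab(df["Submitted via"], df["Product"], normalize="index")
```

Rows that look very different from one another would suggest that a channel is associated with particular products. Channels with very few complaints will produce noisy shares.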


## Machine Learning Methodology

**Machine learning could be used to model, and ultimately help reduce, the consumer dispute rate. The following steps could be used:**

1. Data Preprocessing
2. Feature Engineering
3. Model Selection
4. Model Evaluation
5. Text Analysis (optional)
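
The first four steps above can be sketched with scikit-learn as follows. This is an assumption-laden illustration rather than a recommended model. It assumes `Consumer disputed?` is coded `'Yes'` / `'No'`, uses four categorical columns as features, and picks logistic regression purely as a simple baseline. `FEATURES`, `TARGET`, and `fit_dispute_model` are hypothetical names, and the optional text-analysis step is left out.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Assumed feature and target columns, taken from the cleaned frame in Q1.
FEATURES = ["Product", "Company response to consumer", "Submitted via", "State"]
TARGET = "Consumer disputed?"


def fit_dispute_model(df: pd.DataFrame, random_state: int = 0):
    """Fit a baseline dispute classifier; return (model, test accuracy)."""
    # 1. Data preprocessing: keep only rows with a known outcome.
    labeled = df[df[TARGET].isin(["Yes", "No"])]
    X = labeled[FEATURES]
    y = (labeled[TARGET] == "Yes").astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # 2. Feature engineering: one-hot encode the categorical columns;
    #    categories unseen during training are encoded as all zeros.
    encode = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), FEATURES)]
    )

    # 3. Model selection: logistic regression as a simple baseline.
    model = Pipeline(
        [("encode", encode), ("clf", LogisticRegression(max_iter=1000))]
    )
    model.fit(X_train, y_train)

    # 4. Model evaluation: accuracy on the held-out quarter of the data.
    return model, model.score(X_test, y_test)
```

With only 75 labeled rows in this sample, the held-out accuracy would be very noisy, so cross-validation or a larger extract of the data would be a more trustworthy basis for any conclusion.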

