<a href="https://www.kaggle.com/code/carneirofernando/exploratory-data-analysis-of-credit-card-data?scriptVersionId=152180257" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<img src="https://upload.wikimedia.org/wikipedia/commons/2/22/721-credit-card.svg" align="center" alt="credcard-banner" width="150">
            
# Exploratory Data Analysis: Credit Cards
*by [Fernando Carneiro](https://www.linkedin.com/in/fernandohcarneiro/)*

---

## Summary
1.   [**Intro**](#intro)
1.   [**Data Analysis**](#da)
1.   [**Conclusion**](#conclusion)

---
<a id='intro'></a>
## 1. Intro

### 1.1 Objective
Credit cards have become an integral part of the modern financial system, don't you agree? In this exploratory data analysis (EDA) project, we will carefully analyze credit card customer data. Our goal is to better understand these customers by extracting trends and patterns from the data, which will give us valuable insights to help grow and optimize our business. By making informed decisions, we can strengthen customer loyalty, increase revenue, and improve overall performance.

### 1.2 Data description
The data source for our analysis is a comma-separated values (CSV) file with fictional data simulating bank account information, created by [André Perez](https://www.linkedin.com/in/andremarcosperez/). You can find this file (credito.csv) at this [link](https://github.com/andre-marcos-perez/ebac-course-utils/tree/main/dataset).

### 1.3 Data columns/variables
The dataset consists of the columns/variables as follows:

- **age** = customer's age
- **sex** = customer's gender (F or M)
- **dependents** = number of customer's dependents
- **education** = customer's level of education
- **marital_status** = customer's marital status
- **anual_salary** = customer's annual salary range
- **card_type** = customer's card type
- **qty_products** = number of products purchased in the last 12 months
- **iteration_12m** = number of interactions/transactions in the last 12 months
- **inactive_months_12m** = number of months the customer was inactive
- **credit_limit** = customer's credit limit
- **transaction_value_12m** = value of transactions in the last 12 months
- **qty_transactions_12m** = number of transactions in the last 12 months


### 1.4 Tools
For this analysis we will use **only** two Amazon Web Services (**AWS**) products, **S3 and Athena**. The first is a storage service; the second is a query service that uses Structured Query Language (SQL) to work with data.

---
<a id='da'></a>
## 2. Data Analysis

### 2.1 Overview
First, let's take a look at our data to better understand what we have on hand.

#### *What does our data look like? (schema)*

>  **Query:** 

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/01.png?raw=true)

> **Result:** 

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/02.png?raw=true)

Here we have 13 columns, with 5 being categorical and 8 being numerical.

#### *How many rows?*

>  **Query:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/03.png?raw=true)

> **Result:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/04.png?raw=true)

In this dataset, we have 10128 rows.
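Since the two queries above appear only as screenshots, here is a minimal sketch of the same checks. It runs against an in-memory SQLite database from Python rather than Athena, so the table name `credito`, the column types, and the three sample rows are assumptions made for illustration. In Athena the schema check would typically be `DESCRIBE credito;`, while SQLite's closest equivalent is `PRAGMA table_info`.

```python
import sqlite3

# Hypothetical stand-in for the Athena table: only four of the 13 columns
# and three made-up rows, just enough to run both checks.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credito (age INTEGER, sex TEXT, card_type TEXT, credit_limit REAL)"
)
conn.executemany(
    "INSERT INTO credito VALUES (?, ?, ?, ?)",
    [
        (45, "M", "blue", 12691.51),
        (49, "F", "blue", 8256.96),
        (51, "M", "gold", 3418.56),
    ],
)

# Schema check: one (column name, declared type) pair per column.
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(credito)")]

# Row count: this statement is plain SQL and should work unchanged in Athena.
row_count = conn.execute("SELECT COUNT(*) FROM credito").fetchone()[0]

print(schema)
print(row_count)
```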

#### *What is probably the most important data?*

We should pay special attention to the numerical data on length of relationship and credit limit, and to categorical data such as card_type and annual salary, in search of a relationship and, eventually, an insight.

#### *Is our dataset clean (without blank spaces)?*

>**Query:** 

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/05.png?raw=true)

> **Result**: 

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/06.png?raw=true)

As you may notice, we have a problem here: the query returns 10,127 rows with null values, almost the entire dataset. As you can see in the image above, these values are in two columns: credit_limit and transaction_value_12m.

>**Solution**:

To solve this issue, I made some changes to our data source and our table properties. First, in the data source, I changed the field delimiter from "," to ";" and the decimal separator from "," to ".". This eliminates the need for quotes ("") around decimal values and follows Athena's standard number format.

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/07.png?raw=true)
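The conversion described above can be sketched in Python. This is not necessarily how the file was actually edited: the function name `convert_csv`, the sample rows, and the assumption that decimal values were quoted (for example `"12691,51"`) are illustrative.

```python
import csv
import io


def convert_csv(text, decimal_columns):
    """Rewrite a comma-delimited CSV with decimal commas as a
    semicolon-delimited CSV with decimal points."""
    reader = csv.DictReader(io.StringIO(text), delimiter=",", quotechar='"')
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter=";", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for col in decimal_columns:
            # "12691,51" becomes "12691.51"
            row[col] = row[col].replace(",", ".")
        writer.writerow(row)
    return out.getvalue()


# Made-up sample in the assumed original layout.
sample = (
    "age,sex,credit_limit,transaction_value_12m\n"
    '45,M,"12691,51","1144,90"\n'
    '49,F,"8256,96","1291,45"\n'
)
converted = convert_csv(sample, ["credit_limit", "transaction_value_12m"])
print(converted)
```

Once the decimal comma is gone, the values no longer contain the field delimiter, so the writer emits them without quotes.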


Afterward, I updated the table properties to reflect these changes in the data source. These changes are made in the AWS Glue console.

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/08.png?raw=true)

Running the same query as earlier:

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/09.png?raw=true)

We now have 0 rows with null values, as you can see above.

#### *Now let's see which values our categorical columns take*

> **Query**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/10.png?raw=true)

By selecting just one line at a time, we can run the queries separately.

> **Result**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/11.png?raw=true)
 ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/12.png?raw=true)
 ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/13.png?raw=true)
 ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/14.png?raw=true)
 


Even though the table doesn't have any NULL values, the first three columns contain the value 'na' (not applicable), which does not contribute to our analysis.

#### *How many 'na' (not applicable) values do we have?*

> **Query:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/15.png?raw=true)

> **Result:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/16.png?raw=true)

As the third column of the result shows, between 7 and 15 percent of the values in these columns are 'na' (not applicable). We can deal with this later if our analysis depends on one of these categorical columns.
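A count of this kind can be sketched as follows, again using an in-memory SQLite database from Python as a stand-in for Athena. The table name `credito`, the five sample rows, and the choice of `education`, `marital_status`, and `anual_salary` as the three affected columns are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample: only the three categorical columns assumed to hold 'na'.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credito (education TEXT, marital_status TEXT, anual_salary TEXT)"
)
conn.executemany(
    "INSERT INTO credito VALUES (?, ?, ?)",
    [
        ("graduate", "married", "less than $40K"),
        ("na", "single", "na"),
        ("high school", "na", "$40K - $60K"),
        ("na", "married", "$60K - $80K"),
        ("graduate", "single", "less than $40K"),
    ],
)

total = conn.execute("SELECT COUNT(*) FROM credito").fetchone()[0]

na_report = {}
for column in ("education", "marital_status", "anual_salary"):
    # Column names come from the fixed tuple above, never from user input.
    na_count = conn.execute(
        f"SELECT COUNT(*) FROM credito WHERE {column} = 'na'"
    ).fetchone()[0]
    # Store (absolute count, percentage of all rows).
    na_report[column] = (na_count, round(100.0 * na_count / total, 2))

print(na_report)
```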

### 2.2 Analysis
What follows is the analysis of this data in search of patterns that help us obtain insights. I will start by looking at how long customers have been with the bank.

#### *What is the relationship between the time since becoming a customer and the card type?*

Before working with the table, I created a view to help us categorize each customer's relationship with the bank. This view has two columns: months_of_relationship (bigint) and customer_relation (varchar). The latter has three categories of customers: New, Returning and Old.

> **Query**: 

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/17.png?raw=true)

> **Result:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/18.png?raw=true)
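A view like this one can be sketched with a `CASE` expression. The version below runs on an in-memory SQLite database from Python instead of Athena. The view name `customer_relationship`, the sample rows, and the exact cut-offs (under 18 months is New, 18 to 36 months is Returning, above 36 is Old) are assumptions derived from the category descriptions in this analysis, not the original query.

```python
import sqlite3

# Hypothetical base table with only the column the view needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credito (months_of_relationship INTEGER)")
conn.executemany(
    "INSERT INTO credito VALUES (?)", [(12,), (18,), (30,), (36,), (48,)]
)

# Assumed thresholds: 18 months = 1.5 years, 36 months = 3 years.
conn.execute(
    """
    CREATE VIEW customer_relationship AS
    SELECT
        months_of_relationship,
        CASE
            WHEN months_of_relationship < 18 THEN 'New'
            WHEN months_of_relationship <= 36 THEN 'Returning'
            ELSE 'Old'
        END AS customer_relation
    FROM credito
    """
)

result = conn.execute(
    "SELECT months_of_relationship, customer_relation "
    "FROM customer_relationship ORDER BY months_of_relationship"
).fetchall()
print(result)
```

Keeping the categorization in a view means later queries can group by `customer_relation` without repeating the `CASE` expression.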

This view will help in the query below, which shows the percentage of users of each card type by relationship category.

> **Query**:

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/19.png?raw=true)


> **Result:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/20.png?raw=true)

With this data we can conclude that:
1. New customers (less than one and a half years of relationship) do not have a credit card.
1. The majority of customers in the table are returning customers (between 1.5 and 3 years of relationship).
1. Most old customers have Platinum cards.

#### *How is annual salary related to some indicators?*

**Query:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/21.png?raw=true)

**Result:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/22.png?raw=true)

As shown, the only indicator that clearly varies with annual salary is the credit limit. The averages of products consumed, inactive months, and iterations are practically the same for all salary ranges. Note that this is an association observed in the data, not proof of causation.
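A comparison of this kind can be sketched as a `GROUP BY` over the salary range. The example below uses an in-memory SQLite database from Python in place of Athena. The table name `credito`, the salary labels, and the four sample rows are made up for illustration, and the query is an assumption about the shape of the original one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credito ("
    "anual_salary TEXT, credit_limit REAL, qty_products INTEGER, "
    "inactive_months_12m INTEGER, iteration_12m INTEGER)"
)
# Hypothetical rows: the limit differs by salary range, the other
# indicators average out to the same values.
conn.executemany(
    "INSERT INTO credito VALUES (?, ?, ?, ?, ?)",
    [
        ("less than $40K", 2000.0, 4, 2, 3),
        ("less than $40K", 4000.0, 4, 2, 3),
        ("$120K +", 18000.0, 3, 3, 2),
        ("$120K +", 22000.0, 5, 1, 4),
    ],
)

averages = conn.execute(
    """
    SELECT anual_salary,
           ROUND(AVG(credit_limit), 2)        AS avg_credit_limit,
           ROUND(AVG(qty_products), 2)        AS avg_products,
           ROUND(AVG(inactive_months_12m), 2) AS avg_inactive_months,
           ROUND(AVG(iteration_12m), 2)       AS avg_iterations
    FROM credito
    GROUP BY anual_salary
    ORDER BY avg_credit_limit
    """
).fetchall()
print(averages)
```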

#### *Let's see the percentage of customer types*

**Query:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/23.png?raw=true)

**Result:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/24.png?raw=true)

The results show that the majority of customers are neither new nor old: they have between one and a half and three years of relationship. Old customers follow closely, making up just over a third of the total.

#### *How is the distribution of card types?*

**Query:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/25.png?raw=true)

**Result:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/26.png?raw=true)

The data reveals a clear distribution of card types: the majority of customers hold the blue card, the most common and standard tier. In contrast, platinum card holders, our most premium customers, account for a mere 0.2% of total users. We may want to invest more in attracting premium customers.
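A distribution like this can be computed by dividing each group's count by the overall row count. Here is a minimal sketch on an in-memory SQLite database from Python, standing in for Athena. The table name `credito` and the ten sample rows are hypothetical and do not reproduce the real proportions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credito (card_type TEXT)")
# Made-up sample: 7 blue, 2 silver, 1 gold.
conn.executemany(
    "INSERT INTO credito VALUES (?)",
    [("blue",)] * 7 + [("silver",)] * 2 + [("gold",)],
)

distribution = conn.execute(
    """
    SELECT card_type,
           COUNT(*) AS customers,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM credito), 2) AS percent
    FROM credito
    GROUP BY card_type
    ORDER BY customers DESC
    """
).fetchall()
print(distribution)
```

Multiplying by `100.0` rather than `100` keeps the division in floating point, which avoids integer truncation of the percentage.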

#### *How much do they spend and how much profit do they generate?*

**Query:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/27.png?raw=true)

**Result:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/28.png?raw=true)

The average transaction values tell us that the more premium the card, the more the customer tends to spend. However, as we saw before, there is a significant difference in the number of customers for each card type. The next query will help us see the total transaction value of each card type.

**Query:**

> ![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/29.png?raw=true)

**Results:**

>![image.png](https://github.com/carneiro-fernando/EBAC/blob/main/assets/Images/EDA_CreditCard/30.png?raw=true)

Now we can see a huge difference between the total values spent on each card type, ranging from 180 thousand on our most premium card (which has few customers) to over 3 million on our starter card (which has many customers). As we saw previously, the premium segment has an average credit limit of ~20,000 but spends only about half of it, while the standard segment tends to use practically all of its limit.
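The contrast between average and total spending can be sketched in a single grouped query. The example below runs on an in-memory SQLite database from Python rather than Athena. The table name `credito` and the sample rows are made up, and the `share_of_limit_used` column (average spending divided by average limit) is one assumed way of expressing how much of the limit each segment uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credito ("
    "card_type TEXT, credit_limit REAL, transaction_value_12m REAL)"
)
# Hypothetical rows: many low-limit blue customers, one high-limit platinum customer.
conn.executemany(
    "INSERT INTO credito VALUES (?, ?, ?)",
    [("blue", 3000.0, 2700.0)] * 4 + [("platinum", 20000.0, 10000.0)],
)

spending = conn.execute(
    """
    SELECT card_type,
           ROUND(AVG(transaction_value_12m), 2) AS avg_spent,
           ROUND(SUM(transaction_value_12m), 2) AS total_spent,
           ROUND(AVG(credit_limit), 2)          AS avg_limit,
           ROUND(AVG(transaction_value_12m) / AVG(credit_limit), 2)
               AS share_of_limit_used
    FROM credito
    GROUP BY card_type
    ORDER BY total_spent DESC
    """
).fetchall()
print(spending)
```

In this toy sample the platinum customer has the higher average spending, yet the blue segment has the higher total because it has more customers, which mirrors the pattern described above.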

---
<a id='conclusion'></a>
## 3. Conclusion
Based on the results gathered in this project, the insights and the recommended actions are provided below.

### 3.1 Insights

1. **New customers don't have credit cards:** New customers (less than 1 and a half years of relationship) do not have a credit card.

1. **Majority of customers are returning:** The majority of customers in the table are returning customers (between 1.5 and 3 years of relationship).

1. **Older ones are premium:** Most old customers have Platinum cards.

1. **Bigger salary, bigger limit:** The only indicator that clearly varies with annual salary is the credit limit. The averages of products consumed, inactive months, and iterations are practically the same for all salary ranges.

1. **Few new customers:** The majority of customers are neither new nor old: they have between one and a half and three years of relationship. Old customers follow closely, making up just over a third of the total.

1. **Need more Premium:** The majority of customers hold the blue card, the most common and standard tier. In contrast, platinum card holders, our most premium customers, account for a mere 0.2% of total users. We may want to invest more in attracting premium customers.

1. **Huge difference in transactions between customer types:** Average transaction values tell us that the more premium the card, the more the customer tends to spend. However, the number of customers differs significantly between card types. As a result, the total amount spent on each card type ranges from 180 thousand on our most premium card (with few customers) to more than 3 million on our starter card (with many customers).

1. **Low use of credit on Premium:** The premium segment has an average credit limit of 19 thousand but spends only about half of it, while the standard segment tends to use practically all of its limit.

### 3.2 Call to Action


1. **New Customer Engagement:** Start by encouraging new customers to use credit cards since they currently don't have them. Create focused marketing campaigns and offer incentives to promote card adoption in this group.

2. **Customer Loyalty:** Concentrate on retaining your returning and long-standing customers, as they make up a significant part of your customer base. Customize services and engagement strategies to enhance their loyalty.

3. **Attract Premium Customers:** Given that most customers hold blue cards, explore the potential to attract more premium customers. Invest in marketing and services tailored to this group to boost the number of Platinum and Gold cardholders.

4. **Increase Premium Spending:** Premium cardholders tend to use only a fraction of their limit. Develop plans and promotions aimed at encouraging higher spending among your premium customer segment.

In summary, analyzing this data gives us useful information about our customers and presents opportunities for growth and optimization. We can use this information to make smart choices that build customer loyalty, increase revenue, and improve overall business performance. If we keep looking at the data, and perhaps collect more of it, we can learn even more. Creating a data dashboard would be a great next step to expand this project. Thank you for reading this far!