# **Credit EDA**

*My goal is to analyze this data to identify trends and insights.*

## Let's begin: 

This dataset contains information about a bank's customers and has the following columns:

* idade = customer's age
* sexo = customer's sex (F or M)
* dependentes = customer's number of dependents
* escolaridade = customer's education level
* estado_civil = customer's marital status
* salario_anual = customer's annual salary bracket
* tipo_cartao = customer's card type
* qtd_produtos = number of products purchased in the last 12 months
* iteracoes_12m = number of interactions/transactions in the last 12 months
* meses_inativo_12m = number of months the customer was inactive
* limite_credito = customer's credit limit
* valor_transacoes_12m = total value of transactions in the last 12 months
* qtd_transacoes_12m = number of transactions in the last 12 months

The table was created in **AWS Athena** on top of data stored in an **S3 bucket**. A version of the data is available at: https://github.com/arthurx-x/Credit-EDA




## **Project Structure:**

1. Data Exploration;
    * Questions;
    * Info;
2. Data Investigation;
    * Hypotheses;
    * Info;
3. EDA Results;
    * Conclusions;
4. Recommendations;
5. Future Works.

## **1. Data Exploration:**

**To start, I'd like to get a high-level view of the data by running a few summary SQL queries:**

```sql
SELECT * FROM credito LIMIT 10;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/QALL.png?raw=true)

> As we can see, the table contains some NA values. For the scope of this EDA, I opted to filter these values out.
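Before filtering, it can help to know how many values are actually missing. The sketch below counts missing entries per column with conditional aggregation. It runs against a tiny in-memory SQLite table with made-up rows (the income labels are hypothetical), and it assumes a value is missing when it is either SQL NULL or the string 'na', as the income query later in this notebook suggests. The SQL itself should need little or no change to run in Athena.

```python
import sqlite3

# Made-up rows for illustration only; the real data is the Athena table `credito`.
SAMPLE_ROWS = [
    (35, "M", "married", "income_band_1"),
    (42, "F", "na", "na"),
    (51, "M", None, "income_band_2"),
    (29, "F", "single", "income_band_1"),
]

# Assumption: a value counts as missing if it is NULL or the literal string 'na'.
MISSING_COUNTS_SQL = """
    SELECT
        SUM(CASE WHEN estado_civil IS NULL OR estado_civil = 'na' THEN 1 ELSE 0 END)
            AS estado_civil_missing,
        SUM(CASE WHEN salario_anual IS NULL OR salario_anual = 'na' THEN 1 ELSE 0 END)
            AS salario_anual_missing,
        COUNT(*) AS total_rows
    FROM credito
"""


def missing_counts(conn):
    """Return (estado_civil_missing, salario_anual_missing, total_rows)."""
    return conn.execute(MISSING_COUNTS_SQL).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credito (idade INTEGER, sexo TEXT, estado_civil TEXT, salario_anual TEXT)"
)
conn.executemany("INSERT INTO credito VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
print(missing_counts(conn))  # (2, 1, 4)
```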

**Let's have a look at the ranges of the data:**

```sql
SELECT MIN(idade) AS min_age,
       MAX(idade) AS max_age,
       MIN(limite_credito) AS min_credit_limit,
       MAX(limite_credito) AS max_credit_limit
FROM credito;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q1.png?raw=true)

> Info: Age ranges from 26 to 73, and credit limits range from 1438 to 34516.

**Now let's see how customers are distributed by sex:**

```sql
SELECT sexo, COUNT(*) AS num_customers
FROM credito
GROUP BY sexo;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q3.png?raw=true)

> Info: 61% of customers are male and 39% are female.
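The query above returns raw counts, so the 61%/39% split has to be worked out from them. One way to have SQL return the percentage directly is to divide each group's count by the table total, taken from a scalar subquery. A minimal sketch, run against an in-memory SQLite table with made-up rows (so these percentages are not the real ones):

```python
import sqlite3

SHARE_SQL = """
    SELECT sexo,
           COUNT(*) AS num_customers,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM credito), 1) AS pct
    FROM credito
    GROUP BY sexo
    ORDER BY num_customers DESC
"""


def share_by_sex(conn):
    """Return (sexo, num_customers, pct) rows, where pct is a share of all rows."""
    return conn.execute(SHARE_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credito (sexo TEXT)")
# Made-up rows: 3 male and 2 female customers.
conn.executemany("INSERT INTO credito VALUES (?)", [("M",)] * 3 + [("F",)] * 2)
print(share_by_sex(conn))  # [('M', 3, 60.0), ('F', 2, 40.0)]
```

Using `100.0` rather than `100` keeps the division from being truncated to an integer.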

**Right, what about the customers' marital status?**

```sql
SELECT estado_civil, COUNT(*) AS num_customers
FROM credito
WHERE estado_civil IS NOT NULL
GROUP BY estado_civil;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q4.png?raw=true)

> Info: There are more than twice as many married customers as single and divorced customers combined.

## **2. Data Investigation:**

Now I'd like to explore some hypotheses about trends in this customer data:

## Hypothesis 1: Older customers have higher average credit limits than younger customers.

```sql
SELECT AVG(limite_credito) AS avg_limit,
       FLOOR(idade/10)*10 AS age_group
FROM credito
GROUP BY FLOOR(idade/10)*10
ORDER BY age_group;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q5.png?raw=true)

> Info: Hypothesis 1 is not supported. The data shows that customers in their 20s and 30s have the highest avg_limit, with a consistent decrease after their 30s.
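Because the oldest age buckets may hold only a few customers, it is worth printing each group's size next to its average before reading a trend into it. The sketch below adds `COUNT(*)` to the age-group query. It uses SQLite with made-up rows and swaps `FLOOR(idade/10)*10` for integer division, `(idade / 10) * 10`, which gives the same bucket for non-negative integer ages (SQLite only provides `FLOOR` in builds compiled with its math functions enabled).

```python
import sqlite3

AGE_GROUP_SQL = """
    SELECT (idade / 10) * 10 AS age_group,
           COUNT(*) AS num_customers,
           AVG(limite_credito) AS avg_limit
    FROM credito
    GROUP BY (idade / 10) * 10
    ORDER BY age_group
"""


def limit_by_age_group(conn):
    """Return (age_group, num_customers, avg_limit) rows, youngest group first."""
    # With an INTEGER column, idade / 10 is integer division in SQLite,
    # so 26 -> 20, 38 -> 30 and 73 -> 70.
    return conn.execute(AGE_GROUP_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credito (idade INTEGER, limite_credito REAL)")
conn.executemany(
    "INSERT INTO credito VALUES (?, ?)",
    [(26, 9000.0), (29, 7000.0), (38, 6000.0), (73, 3000.0)],  # made-up rows
)
for row in limit_by_age_group(conn):
    print(row)
# (20, 2, 8000.0)
# (30, 1, 6000.0)
# (70, 1, 3000.0)
```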

## Hypothesis 2: Customers with higher education levels have higher credit limits on average.

```sql
SELECT AVG(limite_credito) AS avg_limit,
       escolaridade AS education
FROM credito
GROUP BY escolaridade
ORDER BY avg_limit DESC;
-- Groups customers by education level and computes the average credit limit
-- per group, highest average first.
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q6.png?raw=true)

> Info: Hypothesis 2 is not supported. The data shows that the higher the education level, the lower the avg_limit.

## Hypothesis 3: Higher income customers tend to have higher credit card spending.

```sql
SELECT AVG(valor_transacoes_12m) AS avg_spend,
       salario_anual AS income_group
FROM credito
WHERE salario_anual != 'na'
GROUP BY salario_anual
ORDER BY avg_spend DESC;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q8.png?raw=true)

> Info: Hypothesis 3 is not supported. The data shows that the higher the income_group, the lower the avg_spend.
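A detail worth knowing about the `WHERE salario_anual != 'na'` filter: under SQL's three-valued logic, comparing NULL with anything is neither true nor false, so rows where `salario_anual` is NULL are dropped along with the 'na' rows. The sketch below demonstrates this on an in-memory SQLite table with made-up rows and hypothetical income labels. Athena follows the same NULL semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credito (salario_anual TEXT, valor_transacoes_12m REAL)")
conn.executemany(
    "INSERT INTO credito VALUES (?, ?)",
    [
        ("income_band_high", 1000.0),
        ("income_band_low", 3000.0),
        ("na", 5000.0),   # explicit 'na' marker
        (None, 7000.0),   # true SQL NULL
    ],
)

# NULL != 'na' evaluates to NULL, which WHERE treats as "not true",
# so both the 'na' row and the NULL row are filtered out.
kept = conn.execute(
    "SELECT COUNT(*) FROM credito WHERE salario_anual != 'na'"
).fetchone()[0]

spend_by_income = conn.execute(
    """
    SELECT salario_anual AS income_group,
           AVG(valor_transacoes_12m) AS avg_spend
    FROM credito
    WHERE salario_anual != 'na'
    GROUP BY salario_anual
    ORDER BY avg_spend DESC
    """
).fetchall()

print(kept)             # 2
print(spend_by_income)  # [('income_band_low', 3000.0), ('income_band_high', 1000.0)]
```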

## **Examining relations and providing recommendations:**
Querying the data and testing a few hypotheses made it possible to uncover insights such as how demographics relate to credit limits and spending.

As a next step, I would like to further analyze how credit relates to sex, marital status, and product type, as well as how usage varies by age group and by customer maturity.

After that, I will compile the findings into recommendations on how this credit company could better understand and serve different customer segments.

**Now, how does a customer's sex relate to their credit limit?**

```sql
SELECT sexo,
       ROUND(AVG(limite_credito), 2) AS avg_limit
FROM credito
GROUP BY sexo
ORDER BY avg_limit DESC;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q9.png?raw=true)

> Info: Male customers have double the avg_limit of female customers.
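To put a single number on this gap instead of comparing two rows by eye, the two averages can be divided inside one query with conditional aggregation: a `CASE` without an `ELSE` yields NULL for the other sex, and `AVG` ignores NULLs. A sketch on an in-memory SQLite table with made-up limits:

```python
import sqlite3

RATIO_SQL = """
    SELECT ROUND(
               AVG(CASE WHEN sexo = 'M' THEN limite_credito END)
             / AVG(CASE WHEN sexo = 'F' THEN limite_credito END),
           2) AS male_to_female_ratio
    FROM credito
"""


def male_to_female_limit_ratio(conn):
    """Average male credit limit divided by average female credit limit."""
    return conn.execute(RATIO_SQL).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credito (sexo TEXT, limite_credito REAL)")
conn.executemany(
    "INSERT INTO credito VALUES (?, ?)",
    [("M", 10000.0), ("M", 14000.0), ("F", 5000.0), ("F", 7000.0)],  # made-up rows
)
print(male_to_female_limit_ratio(conn))  # 2.0
```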

**Which marital status uses the most credit?**

```sql
SELECT estado_civil,
       ROUND(AVG(valor_transacoes_12m) / AVG(limite_credito), 2) AS util_ratio
FROM credito
GROUP BY estado_civil
ORDER BY util_ratio DESC;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q11.png?raw=true)

> Info: Married customers have the highest utilization ratio (average 12-month transaction value divided by average credit limit).
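Note that `util_ratio` is a ratio of averages: spending and limits are averaged separately before dividing, so customers with large limits weigh more heavily. An average of per-customer ratios is a common alternative and can rank groups differently. The sketch below computes both on an in-memory SQLite table with made-up rows. `NULLIF` guards against a zero limit, and the `CAST` avoids integer division when the columns are stored as integers (in Athena the equivalent cast would be to `DOUBLE`).

```python
import sqlite3

UTIL_SQL = """
    SELECT estado_civil,
           ROUND(AVG(valor_transacoes_12m) / AVG(limite_credito), 2) AS ratio_of_avgs,
           ROUND(AVG(CAST(valor_transacoes_12m AS REAL) / NULLIF(limite_credito, 0)), 2)
               AS avg_of_ratios
    FROM credito
    GROUP BY estado_civil
    ORDER BY estado_civil
"""


def utilization(conn):
    """Return (estado_civil, ratio_of_avgs, avg_of_ratios) per marital status."""
    return conn.execute(UTIL_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credito (estado_civil TEXT, valor_transacoes_12m INTEGER, limite_credito INTEGER)"
)
conn.executemany(
    "INSERT INTO credito VALUES (?, ?, ?)",
    [
        ("married", 6000, 2000),  # made-up row, per-customer ratio 3.0
        ("married", 4000, 4000),  # made-up row, per-customer ratio 1.0
        ("single", 2000, 4000),   # made-up row, per-customer ratio 0.5
    ],
)
for row in utilization(conn):
    print(row)
# ('married', 1.67, 2.0)
# ('single', 0.5, 0.5)
```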


**Identifying the most popular products:**

```sql
SELECT tipo_cartao, COUNT(*) AS num_customers
FROM credito
GROUP BY tipo_cartao
ORDER BY num_customers DESC;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q12.png?raw=true)

> Info: This shows the most popular credit card types among customers. Note that the most popular product is not necessarily the most profitable per customer.
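The query above ranks card types by number of customers, which measures popularity rather than spending. A minimal sketch that also reports average and total 12-month transaction value per card type, on an in-memory SQLite table with made-up rows and hypothetical card labels. Spending is still only a proxy: assessing profitability would need revenue or cost data that this table does not contain.

```python
import sqlite3

SPEND_BY_CARD_SQL = """
    SELECT tipo_cartao,
           COUNT(*) AS num_customers,
           AVG(valor_transacoes_12m) AS avg_spend,
           SUM(valor_transacoes_12m) AS total_spend
    FROM credito
    GROUP BY tipo_cartao
    ORDER BY avg_spend DESC
"""


def spend_by_card(conn):
    """Return (tipo_cartao, num_customers, avg_spend, total_spend), highest avg_spend first."""
    return conn.execute(SPEND_BY_CARD_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credito (tipo_cartao TEXT, valor_transacoes_12m REAL)")
conn.executemany(
    "INSERT INTO credito VALUES (?, ?)",
    [("card_a", 1000.0)] * 3 + [("card_b", 4000.0), ("card_b", 6000.0)],  # made-up rows
)
for row in spend_by_card(conn):
    print(row)
# ('card_b', 2, 5000.0, 10000.0)
# ('card_a', 3, 1000.0, 3000.0)
```

In this toy data the more popular card (`card_a`) has the lower average spend, which is the kind of mismatch the note above points to.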


**What about the average number of inactive months by age group?**

```sql
SELECT FLOOR(idade/10)*10 AS age_group,
       AVG(meses_inativo_12m) AS avg_inactive_months
FROM credito
GROUP BY FLOOR(idade/10)*10  -- repeat the expression; Athena does not resolve SELECT aliases in GROUP BY
ORDER BY age_group;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q13.png?raw=true)

> Info: The differences in avg_inactive_months across age groups are small, ranging from 2.22 months for customers in their 20s (the lowest) to 2.5 months for those in their 70s (the highest).

**Finally, let's compare the average number of transactions for new vs. existing customers:**

```sql
SELECT CASE WHEN iteracoes_12m = 0
            THEN 'New'
            ELSE 'Existing'
       END AS customer_type,
       AVG(qtd_transacoes_12m) AS avg_transactions
FROM credito
GROUP BY CASE WHEN iteracoes_12m = 0
              THEN 'New'
              ELSE 'Existing'
         END
ORDER BY avg_transactions DESC;
```

![](https://github.com/arthurx-x/Credit-EDA/blob/main/q14.png?raw=true)

> Info: Existing customers make 4.47 more transactions on average than new ones. Here, a customer with no interactions in the last 12 months (iteracoes_12m = 0) is counted as new.
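Two notes on this comparison. Computing `customer_type` once in a common table expression avoids repeating the `CASE` expression in `GROUP BY`. Also, because a NULL `iteracoes_12m` fails the `= 0` test, such rows would silently fall into 'Existing'; the sketch filters them out explicitly. It runs on an in-memory SQLite table with made-up rows (Athena also supports `WITH` clauses).

```python
import sqlite3

CUSTOMER_TYPE_SQL = """
    WITH typed AS (
        SELECT CASE WHEN iteracoes_12m = 0 THEN 'New' ELSE 'Existing' END AS customer_type,
               qtd_transacoes_12m
        FROM credito
        WHERE iteracoes_12m IS NOT NULL  -- keep unknown rows out of 'Existing'
    )
    SELECT customer_type,
           AVG(qtd_transacoes_12m) AS avg_transactions
    FROM typed
    GROUP BY customer_type
    ORDER BY avg_transactions DESC
"""


def transactions_by_customer_type(conn):
    """Return (customer_type, avg_transactions), highest average first."""
    return conn.execute(CUSTOMER_TYPE_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credito (iteracoes_12m INTEGER, qtd_transacoes_12m INTEGER)")
conn.executemany(
    "INSERT INTO credito VALUES (?, ?)",
    # Made-up rows; the last one has an unknown interaction count.
    [(0, 40), (0, 50), (2, 60), (3, 70), (None, 1000)],
)
print(transactions_by_customer_type(conn))  # [('Existing', 65.0), ('New', 45.0)]
```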

 ## **3. EDA Results:**
The conclusions of this Credit EDA are:


**Age and Credit Limit:**
>Age ranges from 26 to 73, and credit limits vary from 1438 to 34516.
Customers in their 20s and 30s have the highest average credit limit, with a consistent decrease afterward.

**Gender Distribution:**
>61% of customers are male, while 39% are female.

**Marital Status:**
>There are more than twice as many married customers as single and divorced customers combined.

**Credit Limit Trend by Age:**
>There is a consistent decrease in average credit limits after customers' 30s.

**Education and Credit Limit:**
>Higher education is associated with lower average credit limits.

**Income Group and Spending:**
>Higher income groups correlate with lower average spending.

**Gender and Credit Limit:**
>Male customers have double the average credit limit compared to female customers.

**Utilization Ratio:**
>Married customers exhibit the highest utilization ratio.

**Popular Credit Card Types:**
>Identified the most popular credit card types, noting that popularity doesn't necessarily align with profitability.

**Inactivity:**
>Minor differences in average inactive months between age groups, with the lowest in the 20s (2.22 months) and the highest in the 70s (2.5 months).

**Transaction Analysis:**
>Existing customers engage in 4.47 more transactions on average than new customers.

 ## **4. Recommendations:**
 
Based on the results of this EDA, some actions can be proposed:

**Targeted Marketing:**
>Tailor marketing for higher-credit-limit products toward younger customers, who hold the highest average credit limits.

**Gender-Specific Campaigns:**
>Consider gender-specific promotions based on the observed credit limit disparities.

**Educational Program:**
>Develop educational initiatives to inform customers about credit management, particularly for those with higher education.

**Income-Driven Products:**
>Introduce credit products aligned with different income groups to encourage spending.

**Utilization Optimization:**
>Explore strategies to optimize utilization ratios, especially among married customers.

 ## **5. Future Works:**
 
Some possible future reviews and improvements are:

**Behavioral Analysis:**
>Conduct a more in-depth analysis of customer spending behavior over time.

**Market Segmentation:**
>Explore further segmentation based on additional factors to refine targeting strategies.

**Profitability Assessment:**
>Evaluate the profitability of each credit card type to align offerings with customer preferences.

**Customer Retention Strategies:**
>Investigate strategies to enhance customer retention, given the observed disparity in transactions between existing and new customers.

**Dynamic Credit Limit Adjustments:**
>Investigate the possibility of dynamic credit limit adjustments based on evolving customer behavior and financial situations.

*These recommendations and future investigation suggestions aim to enhance understanding and guide strategic decisions for improved business outcomes.*