<div style="text-align: center; margin-top: 50px; margin-bottom: 30px;">
    <h1 style="font-size: 3em; color:#f54242;">Payments Data EDA</h1>
    
</div>


## For this EDA I will continue to use the Google Data Analytics steps

<ol style="font-size: 1.2em; color: #F3CF1D; line-height: 1.5;">
  <li><strong>Ask</strong> - Define the questions you want to answer.</li>
  <li><strong>Prepare</strong> - Collect and prepare your data for analysis.</li>
  <li><strong>Process</strong> - Clean and transform the data to ensure accuracy.</li>
  <li><strong>Analyze</strong> - Examine the data to find insights and patterns.</li>
  <li><strong>Share</strong> - Present your findings to stakeholders.</li>
  <li><strong>Act</strong> - Make decisions and take actions based on the analysis.</li>
</ol>

## 1. Ask

**Core Question:** Which features best predict how long a client will take to pay a receivable?

---

### Key EDA Questions

#### Target Variable
1. What is the distribution of payment delay (days between `due_date` and `settled_at`)?
2. What percentage of assets are paid early, on time, or late?
3. How many assets remain unsettled (`settled_at` is NULL)?
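The target-variable questions above can be sketched on a few toy rows. This is a minimal sketch, not the final feature pipeline: `toy_assets`, `add_payment_delay`, `delay_days`, and `payment_status` are hypothetical names, the rows are made up for illustration, and the sign convention (negative means paid early) is an assumption.

```python
import pandas as pd

# Toy rows standing in for df_assets (illustrative values, not real data).
toy_assets = pd.DataFrame({
    "due_date": ["2025-04-10", "2024-12-24", "2024-02-02", "2024-08-08"],
    "settled_at": ["2025-04-06", "2024-12-24", "2024-02-05", None],
})

def add_payment_delay(df):
    """Return a copy with `delay_days` and `payment_status` columns (hypothetical names)."""
    out = df.copy()
    due = pd.to_datetime(out["due_date"])
    settled = pd.to_datetime(out["settled_at"])
    # Assumed convention: negative = early, 0 = on time, positive = late, NaN = unsettled.
    out["delay_days"] = (settled - due).dt.days
    out["payment_status"] = "unsettled"
    out.loc[out["delay_days"] < 0, "payment_status"] = "early"
    out.loc[out["delay_days"] == 0, "payment_status"] = "on_time"
    out.loc[out["delay_days"] > 0, "payment_status"] = "late"
    return out

toy_with_delay = add_payment_delay(toy_assets)

# Share of assets per status (question 2) and count of unsettled assets (question 3).
status_share = toy_with_delay["payment_status"].value_counts(normalize=True)
n_unsettled = toy_with_delay["settled_at"].isna().sum()
```

On the real data, the same function would be called as `add_payment_delay(df_assets)`, followed by `delay_days.describe()` or a histogram for question 1.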

#### Feature Relationships
4. Do higher `face_value` amounts take longer to get paid?
5. Do companies with higher `quod_score` pay faster?
6. Which industries (`main_cnae`) have the best/worst payment behavior?
7. Does `company_size` affect payment speed?
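A rough first pass at the feature-relationship questions might look like the sketch below. It assumes a `delay_days` column has already been derived, and that the buyer is the paying party, so company attributes are joined on `buyer_tax_id`; both are assumptions, and the toy values are invented. Spearman correlation is computed as the Pearson correlation of ranks, which avoids an extra SciPy dependency.

```python
import pandas as pd

# Toy stand-ins for df_assets (with a derived delay) and df_company_data.
toy_assets = pd.DataFrame({
    "buyer_tax_id": [1, 1, 2, 3, 3],
    "face_value": [100.0, 250.0, 900.0, 4000.0, 6500.0],
    "delay_days": [-2, 0, 1, 5, 9],
})
toy_company = pd.DataFrame({
    "tax_id": [1, 2, 3],
    "company_size": ["Micro Empresa", "Demais", "Demais"],
})

# Spearman rank correlation: Pearson correlation computed on the ranks.
# Rank-based measures are less affected by a few very large face values.
rho = toy_assets["face_value"].rank().corr(toy_assets["delay_days"].rank())

# Attach buyer attributes (assumption: the buyer is the payer), then summarise per group.
merged = toy_assets.merge(
    toy_company, left_on="buyer_tax_id", right_on="tax_id", how="left"
)
delay_by_size = merged.groupby("company_size")["delay_days"].agg(["count", "median"])
```

The same group-by pattern applies to `main_cnae`; reporting `count` next to the median helps flag groups that are too small to compare.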

#### Data Coverage & Quality
8. What percentage of buyers/sellers have matching records in `df_company_data` and `df_quod`?
9. What is the missing value rate per column across all datasets?
10. Are there repeat buyers? Does their historical payment behavior predict future payments?
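The coverage and quality checks above could be sketched as follows. The helper `coverage`, the column `prior_mean_delay`, and the toy frames are hypothetical; `delay_days` is assumed to be derived already. Ordering history by `due_date` is a simplification: an earlier asset may not yet be settled when a later one is created, so this is only a first look, not a leak-free feature.

```python
import pandas as pd

# Toy stand-ins for df_assets and df_company_data (illustrative values only).
toy_assets = pd.DataFrame({
    "buyer_tax_id": [1, 1, 2, 3, 3, 3],
    "due_date": pd.to_datetime([
        "2024-01-05", "2024-02-05", "2024-01-10",
        "2024-01-01", "2024-03-01", "2024-02-01",
    ]),
    "delay_days": [2.0, 4.0, 0.0, -1.0, 3.0, 1.0],
})
toy_company = pd.DataFrame({"tax_id": [1, 3], "company_size": ["Demais", None]})

def coverage(ids, reference_ids):
    """Share of distinct non-null ids that also appear in reference_ids."""
    unique_ids = pd.Series(ids).dropna().drop_duplicates()
    return unique_ids.isin(reference_ids).mean()

# Question 8: fraction of distinct buyers with a company record.
buyer_coverage = coverage(toy_assets["buyer_tax_id"], toy_company["tax_id"])

# Question 9: missing-value rate per column.
missing_rate = toy_company.isna().mean()

# Question 10: buyers with more than one asset.
assets_per_buyer = toy_assets["buyer_tax_id"].value_counts()
repeat_buyers = assets_per_buyer[assets_per_buyer > 1].index

# Mean delay over each buyer's earlier assets; shift() excludes the current row.
ordered = toy_assets.sort_values(["buyer_tax_id", "due_date"])
ordered["prior_mean_delay"] = ordered.groupby("buyer_tax_id")["delay_days"].transform(
    lambda s: s.shift().expanding().mean()
)
```

A buyer's first asset has no history, so `prior_mean_delay` is `NaN` there; comparing it against `delay_days` on the remaining rows gives a first read on whether past behavior carries signal.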


Data Glossary:
- `df_assets.parquet`: Asset-level data, including payment dates, due dates, and face value. Each asset represents an installment or payment due, linked to an invoice/order.
- `df_company_data.parquet`: Basic company characteristics.
- `df_quod.parquet`: Credit bureau signals (periodically refreshed).
- `glossary.xlsx`: Dictionary describing each column in the datasets above.

In [1]:
import pandas as pd 


In [3]:
df_assets = pd.read_parquet('../data/raw/df_assets.parquet')
df_company_data = pd.read_parquet('../data/raw/df_company_data.parquet')
df_quod = pd.read_parquet('../data/raw/df_quod.parquet')

In [4]:
df_assets.head()

| | asset_id | invoice_id | face_value | buyer_tax_id | seller_tax_id | maturity_date | due_date | settled_at | created_at | reference_date |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5c764ad8-59dc-e30a-9d73-b051582605fb | 9c3673c7cbbddde1 | 534.41 | 40199547791118 | 60259422305974 | 2025-04-10 | 2025-04-10 | 2025-04-06 | 2025-03-13 | 2025-04-15 |
| 1 | b947a6c4-6e17-a418-091d-c04952f85047 | 23ba3b2f6e5570da | 165.14 | 40519406990678 | 925347588806 | 2024-12-24 | 2024-12-24 | 2024-12-22 | 2024-12-03 | 2025-03-19 |
| 2 | a314edf1-9bf6-6d6c-bca5-55d1349b19d5 | c2d4b28ddd0da663 | 432.55 | 38842652921056 | 14133781557079 | 2024-02-02 | 2024-02-02 | 2024-02-01 | 2024-01-26 | 2025-03-20 |
| 3 | c9877d94-5168-849f-42f7-3bf4ef46bf31 | 2e6921011ecf24f1 | 102.6 | 54929917112999 | 14133781557079 | 2023-03-08 | 2023-03-08 | 2023-03-07 | 2023-02-28 | 2025-03-20 |
| 4 | 3ee1a83eb0ba837d | f32ba85be54f4160 | 6739.7 | 51211364716527 | 87924474238085 | 2024-08-08 | 2024-08-08 | 2024-08-06 | 2024-06-24 | 2025-02-03 |


In [5]:
df_company_data.head()

| | tax_id | company_status | company_status_date | company_creation_date | company_size | main_cnae | main_cnae_description | secondary_cnae_array | legal_nature | is_mei | city | state | zipcode |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 87385337719617 | Ativa | 2006-06-07 | 2006-06-07 | Demais | 2621300 | Fabricação de equipamentos de informática | [{"classe":"26.22-1","descricao":"Fabricação d... | Sociedade Empresária Limitada | False | João Pessoa | Paraíba | 58028870 |
| 9 | 48307533985873 | Ativa | 2016-09-29 | 2016-09-29 | Demais | 4639701 | Comércio atacadista de produtos alimentícios e... | [{"classe":"46.49-4","descricao":"Comércio ata... | Sociedade Empresária Limitada | False | Campo Grande | Mato Grosso do Sul | 79009030 |
| 10 | 2003411654784 | Ativa | 2011-03-25 | 2011-03-25 | Demais | 4651602 | Comércio atacadista de suprimentos para inform... | [{"classe":"46.47-8","descricao":"Comércio ata... | Sociedade Simples Limitada | False | Rio de Janeiro | Rio de Janeiro | 21032900 |
| 12 | 31629860033682 | Ativa | 2015-10-08 | 2015-10-08 | Micro Empresa | 4651601 | Comércio atacadista de equipamentos de informá... | [{"classe":"47.51-2","descricao":"Comércio var... | Sociedade Empresária Limitada | False | Belo Horizonte | Minas Gerais | 30160040 |
| 13 | 85030704409437 | Ativa | 2022-02-28 | 2000-07-13 | Demais | 4651601 | Comércio atacadista de equipamentos de informá... | [{"classe":"46.51-6","descricao":"Comércio ata... | Sociedade Empresária Limitada | False | São José dos Campos | São Paulo | 12231820 |


In [7]:
df_quod.head()

| | tax_id | quod_score | presumed_revenue | created_at |
|---|---|---|---|---|
| 0 | 43778071459800 | 578.0 | 2596484.57 | 2024-12-18 05:43:53.417114 |
| 1 | 59073051153497 | 743.0 | 2549628.85 | 2025-07-14 15:19:55.763181 |
| 2 | 6947646827624 | 448.0 | 54668.1 | 2025-02-14 21:35:02.724279 |
| 4 | 35820924677294 | 890.0 | 55824.92 | 2025-02-13 15:38:28.069201 |
| 5 | 90609740460747 | 675.0 | 59310.96 | 2025-02-14 18:36:39.512346 |
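Because the bureau signals are refreshed periodically, a `tax_id` may appear in `df_quod` with more than one `created_at`. One possible way to attach a score without looking into the future is an as-of join that picks the latest snapshot taken on or before each asset's `created_at`. The sketch below uses invented toy rows; joining on `buyer_tax_id` and using the asset's `created_at` as the reference time are assumptions, and `quod_created_at` is a hypothetical column name introduced to avoid a name clash.

```python
import pandas as pd

# Toy stand-ins for df_assets and df_quod (illustrative values only).
toy_assets = pd.DataFrame({
    "asset_id": ["a1", "a2", "a3"],
    "buyer_tax_id": [10, 10, 20],
    "created_at": pd.to_datetime(["2025-01-15", "2025-03-01", "2025-01-01"]),
})
toy_quod = pd.DataFrame({
    "tax_id": [10, 10, 20],
    "quod_score": [578.0, 743.0, 448.0],
    "created_at": pd.to_datetime(["2025-01-01", "2025-02-01", "2025-02-14"]),
})

# Check whether any company has more than one snapshot.
has_multiple_snapshots = toy_quod["tax_id"].duplicated().any()

# merge_asof needs both frames sorted by their time key.
quod_snapshots = toy_quod.rename(
    columns={"tax_id": "buyer_tax_id", "created_at": "quod_created_at"}
).sort_values("quod_created_at")

# For each asset, take the most recent snapshot at or before its creation time.
point_in_time = pd.merge_asof(
    toy_assets.sort_values("created_at"),
    quod_snapshots,
    left_on="created_at",
    right_on="quod_created_at",
    by="buyer_tax_id",
    direction="backward",
)
```

Assets created before a buyer's first snapshot end up with a missing `quod_score`, which feeds directly into the coverage question: a company can be present in `df_quod` and still have no usable score for a given asset.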
