Managing a condominium takes attention and organization. One key task is collecting rent from tenants, and on-time payments are essential to financial stability. This notebook works through a data challenge: analyzing rental payment delays in a fictional condominium.

- Imports pandas for data handling and numpy for numerical operations.

In [43]:
import pandas as pd
import numpy as np

- Reads a JSON file containing rental data and displays its content in a DataFrame.

In [44]:
dados = pd.read_json('/content/dados_locacao_imoveis.json')
dados

,dados_locacao
0,"{'apartamento': 'A101 (blocoAP)', 'datas_combi..."
1,"{'apartamento': 'A102 (blocoAP)', 'datas_combi..."
2,"{'apartamento': 'B201 (blocoAP)', 'datas_combi..."
3,"{'apartamento': 'B202 (blocoAP)', 'datas_combi..."
4,"{'apartamento': 'C301 (blocoAP)', 'datas_combi..."
5,"{'apartamento': 'C302 (blocoAP)', 'datas_combi..."
6,"{'apartamento': 'D401 (blocoAP)', 'datas_combi..."
7,"{'apartamento': 'D402 (blocoAP)', 'datas_combi..."
8,"{'apartamento': 'E501 (blocoAP)', 'datas_combi..."
9,"{'apartamento': 'E502 (blocoAP)', 'datas_combi..."


- Normalizes the nested dictionaries in the 'dados_locacao' column into a flat tabular format.

In [45]:
dados = pd.json_normalize(dados['dados_locacao'])
dados

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101 (blocoAP),"[01/06/2022, 01/07/2022]","[05/06/2022, 03/07/2022]","[$ 1000,0 reais, $ 2500,0 reais]"
1,A102 (blocoAP),"[02/06/2022, 02/07/2022]","[02/06/2022, 06/07/2022]","[$ 1100,0 reais, $ 2600,0 reais]"
2,B201 (blocoAP),"[03/06/2022, 03/07/2022]","[07/06/2022, 03/07/2022]","[$ 1200,0 reais, $ 2700,0 reais]"
3,B202 (blocoAP),"[04/06/2022, 04/07/2022]","[07/06/2022, 05/07/2022]","[$ 1300,0 reais, $ 2800,0 reais]"
4,C301 (blocoAP),"[05/06/2022, 05/07/2022]","[10/06/2022, 09/07/2022]","[$ 1400,0 reais, $ 2900,0 reais]"
5,C302 (blocoAP),"[06/06/2022, 06/07/2022]","[08/06/2022, 12/07/2022]","[$ 1500,0 reais, $ 1200,0 reais]"
6,D401 (blocoAP),"[07/06/2022, 07/07/2022]","[07/06/2022, 09/07/2022]","[$ 1600,0 reais, $ 1300,0 reais]"
7,D402 (blocoAP),"[08/06/2022, 08/07/2022]","[10/06/2022, 14/07/2022]","[$ 1700,0 reais, $ 1400,0 reais]"
8,E501 (blocoAP),"[09/06/2022, 09/07/2022]","[10/06/2022, 09/07/2022]","[$ 1800,0 reais, $ 1500,0 reais]"
9,E502 (blocoAP),"[10/06/2022, 10/07/2022]","[16/06/2022, 12/07/2022]","[$ 1900,0 reais, $ 1600,0 reais]"
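
To make the flattening step concrete, here is a minimal sketch on two hand-written records shaped like the entries above. The names `records`, `raw`, and `flat` are illustrative and not part of the notebook, and `.tolist()` is used only to hand `json_normalize` a plain list of dicts.

```python
import pandas as pd

# Two illustrative records shaped like the 'dados_locacao' entries (hypothetical subset).
records = [
    {"apartamento": "A101 (blocoAP)",
     "datas_de_pagamento": ["05/06/2022", "03/07/2022"],
     "valor_aluguel": ["$ 1000,0 reais", "$ 2500,0 reais"]},
    {"apartamento": "A102 (blocoAP)",
     "datas_de_pagamento": ["02/06/2022", "06/07/2022"],
     "valor_aluguel": ["$ 1100,0 reais", "$ 2600,0 reais"]},
]

# One column of dicts, as read_json produced in the notebook.
raw = pd.DataFrame({"dados_locacao": records})

# Each dict key becomes a column; list values are left as lists.
flat = pd.json_normalize(raw["dados_locacao"].tolist())
print(flat.columns.tolist())
```

Note that `json_normalize` flattens nested dictionaries but leaves list values untouched, which is why a separate explode step is still needed afterwards.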


- Stores the column names of the DataFrame into the variable cols and prints them.

In [46]:
cols = list(dados.columns)
cols

['apartamento',
 'datas_combinadas_pagamento',
 'datas_de_pagamento',
 'valor_aluguel']

- "Explodes" every list-type column except the first (apartamento), so that each list item becomes its own row. The original row label is repeated for every item.

In [47]:
dados = dados.explode(cols[1:])
dados

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101 (blocoAP),01/06/2022,05/06/2022,"$ 1000,0 reais"
0,A101 (blocoAP),01/07/2022,03/07/2022,"$ 2500,0 reais"
1,A102 (blocoAP),02/06/2022,02/06/2022,"$ 1100,0 reais"
1,A102 (blocoAP),02/07/2022,06/07/2022,"$ 2600,0 reais"
2,B201 (blocoAP),03/06/2022,07/06/2022,"$ 1200,0 reais"
2,B201 (blocoAP),03/07/2022,03/07/2022,"$ 2700,0 reais"
3,B202 (blocoAP),04/06/2022,07/06/2022,"$ 1300,0 reais"
3,B202 (blocoAP),04/07/2022,05/07/2022,"$ 2800,0 reais"
4,C301 (blocoAP),05/06/2022,10/06/2022,"$ 1400,0 reais"
4,C301 (blocoAP),05/07/2022,09/07/2022,"$ 2900,0 reais"
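
The following sketch shows the same multi-column explode on a toy frame, assuming pandas 1.3 or later (where `explode` accepts a list of columns) and lists of equal length in each row. The names `toy`, `exploded`, and `exploded_reset` are illustrative.

```python
import pandas as pd

# Toy frame with two list-valued columns (hypothetical values).
toy = pd.DataFrame({
    "apartamento": ["A101", "A102"],
    "datas_de_pagamento": [["05/06/2022", "03/07/2022"],
                           ["02/06/2022", "06/07/2022"]],
    "valor_aluguel": [["$ 1000,0 reais", "$ 2500,0 reais"],
                      ["$ 1100,0 reais", "$ 2600,0 reais"]],
})

# Explode both list columns together so the i-th items stay paired.
exploded = toy.explode(["datas_de_pagamento", "valor_aluguel"])
print(exploded.index.tolist())        # row labels repeat: [0, 0, 1, 1]

# A fresh 0..n-1 index, as the notebook does later with reset_index.
exploded_reset = exploded.reset_index(drop=True)
print(exploded_reset.index.tolist())  # [0, 1, 2, 3]
```

The repeated labels are the reason the exploded DataFrame reports more entries than distinct index values until the index is reset.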


- Displays the structure and data types of the DataFrame: 30 rows and 4 object-type columns. The index only runs from 0 to 14 because explode repeats each original row label.

In [48]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
Index: 30 entries, 0 to 14
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   apartamento                 30 non-null     object
 1   datas_combinadas_pagamento  30 non-null     object
 2   datas_de_pagamento          30 non-null     object
 3   valor_aluguel               30 non-null     object
dtypes: object(4)
memory usage: 1.2+ KB


- Cleans the valor_aluguel column by removing currency symbols and converting to proper decimal format.

In [49]:
dados['valor_aluguel'] = dados['valor_aluguel'].apply(lambda x: x.replace('$', '').replace(',', '.').replace('reais', '').strip())
dados

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101 (blocoAP),01/06/2022,05/06/2022,1000.0
0,A101 (blocoAP),01/07/2022,03/07/2022,2500.0
1,A102 (blocoAP),02/06/2022,02/06/2022,1100.0
1,A102 (blocoAP),02/07/2022,06/07/2022,2600.0
2,B201 (blocoAP),03/06/2022,07/06/2022,1200.0
2,B201 (blocoAP),03/07/2022,03/07/2022,2700.0
3,B202 (blocoAP),04/06/2022,07/06/2022,1300.0
3,B202 (blocoAP),04/07/2022,05/07/2022,2800.0
4,C301 (blocoAP),05/06/2022,10/06/2022,1400.0
4,C301 (blocoAP),05/07/2022,09/07/2022,2900.0
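
As an alternative to the `apply` with a lambda, the same cleaning can be sketched with pandas' vectorized string methods. This assumes the comma is always the decimal separator and that no thousands separator appears, as in the values shown above. The names `valores` and `limpo` are illustrative.

```python
import pandas as pd

# Illustrative raw values in the notebook's format.
valores = pd.Series(["$ 1000,0 reais", "$ 2500,0 reais"])

limpo = (
    valores
    .str.replace("$", "", regex=False)      # drop the currency symbol (literal match)
    .str.replace("reais", "", regex=False)  # drop the currency name
    .str.replace(",", ".", regex=False)     # decimal comma -> decimal point
    .str.strip()                            # remove leftover whitespace
    .astype("float64")                      # convert the cleaned text to numbers
)
print(limpo.tolist())  # [1000.0, 2500.0]
```

Passing `regex=False` matters for `"$"`, which would otherwise be interpreted as an end-of-string anchor.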


- Resets the DataFrame index after transformation, dropping the old index.

In [50]:
dados.reset_index(inplace = True, drop = True)

- Converts the cleaned valor_aluguel column to float (32-bit) for numerical analysis.

In [51]:
dados['valor_aluguel'] = dados['valor_aluguel'].astype(np.float32)
dados

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101 (blocoAP),01/06/2022,05/06/2022,1000.0
1,A101 (blocoAP),01/07/2022,03/07/2022,2500.0
2,A102 (blocoAP),02/06/2022,02/06/2022,1100.0
3,A102 (blocoAP),02/07/2022,06/07/2022,2600.0
4,B201 (blocoAP),03/06/2022,07/06/2022,1200.0
5,B201 (blocoAP),03/07/2022,03/07/2022,2700.0
6,B202 (blocoAP),04/06/2022,07/06/2022,1300.0
7,B202 (blocoAP),04/07/2022,05/07/2022,2800.0
8,C301 (blocoAP),05/06/2022,10/06/2022,1400.0
9,C301 (blocoAP),05/07/2022,09/07/2022,2900.0
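
A small sketch of what the choice of `float32` implies, using hypothetical rent values with cents (the values in this dataset are whole numbers, which `float32` stores exactly). The names `valores`, `f32`, and `f64` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned strings with a fractional part.
valores = pd.Series(["1000.10", "2500.35"])

f32 = valores.astype(np.float32)  # about 7 significant decimal digits
f64 = valores.astype(np.float64)  # about 15-16 significant decimal digits

# The float32 value is only an approximation of 1000.10.
print(float(f32[0]))
print(float(f64[0]))
```

For monetary amounts that carry cents or will be summed over many rows, `float64` may be the safer default; `float32` mainly saves memory.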


- Shows updated DataFrame info — now the valor_aluguel is a float.

In [52]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 4 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   apartamento                 30 non-null     object 
 1   datas_combinadas_pagamento  30 non-null     object 
 2   datas_de_pagamento          30 non-null     object 
 3   valor_aluguel               30 non-null     float32
dtypes: float32(1), object(3)
memory usage: 972.0+ bytes


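- Removes the substring "(blocoAP)", together with the space before it, from the apartamento column using a regular expression.

In [53]:
# Raw string avoids invalid-escape warnings; \s* also drops the space left before the parenthesis.
dados['apartamento'] = dados['apartamento'].str.replace(r'\s*\(blocoAP\)', '', regex=True)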

- Displays the first 5 rows of the cleaned DataFrame.

In [54]:
dados.head()

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101,01/06/2022,05/06/2022,1000.0
1,A101,01/07/2022,03/07/2022,2500.0
2,A102,02/06/2022,02/06/2022,1100.0
3,A102,02/07/2022,06/07/2022,2600.0
4,B201,03/06/2022,07/06/2022,1200.0
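
The sketch below compares three ways of stripping the suffix on illustrative values: the escaped pattern on its own, a pattern that also consumes the preceding whitespace, and a plain literal replacement. The variable names are hypothetical.

```python
import pandas as pd

aps = pd.Series(["A101 (blocoAP)", "B201 (blocoAP)"])

# Escaped parentheses only: the space before the suffix is left behind.
so_regex = aps.str.replace(r"\(blocoAP\)", "", regex=True)
print(so_regex.tolist())  # ['A101 ', 'B201 ']

# Also consuming leading whitespace gives a clean identifier.
limpo = aps.str.replace(r"\s*\(blocoAP\)", "", regex=True)
print(limpo.tolist())     # ['A101', 'B201']

# Literal replacement needs no escaping at all.
literal = aps.str.replace(" (blocoAP)", "", regex=False)
print(literal.tolist())   # ['A101', 'B201']
```

A trailing space is easy to miss in a rendered table but would make a comparison such as `dados['apartamento'] == 'A101'` fail.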


- Converts both date columns, datas_combinadas_pagamento (scheduled payment dates) and datas_de_pagamento (actual payment dates), from strings to datetime objects.

In [55]:
dados['datas_combinadas_pagamento'] = pd.to_datetime(dados['datas_combinadas_pagamento'], format='%d/%m/%Y')
dados['datas_de_pagamento'] = pd.to_datetime(dados['datas_de_pagamento'], format='%d/%m/%Y')

- Shows the first few rows again, now with proper datetime columns.

In [56]:
dados.head()

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel
0,A101,2022-06-01,2022-06-05,1000.0
1,A101,2022-07-01,2022-07-03,2500.0
2,A102,2022-06-02,2022-06-02,1100.0
3,A102,2022-07-02,2022-07-06,2600.0
4,B201,2022-06-03,2022-06-07,1200.0
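
Why the explicit `format` argument matters can be sketched with two dates from the data. Both strings are valid under a month-first reading as well, so a wrong format would parse silently to the wrong dates. The names `s`, `ok`, and `errado` are illustrative.

```python
import pandas as pd

s = pd.Series(["01/06/2022", "05/06/2022"])

# Day-first, as used in the notebook: 1 June and 5 June 2022.
ok = pd.to_datetime(s, format="%d/%m/%Y")
print(ok.dt.month.tolist())      # [6, 6]

# Month-first reading of the same strings: 6 January and 6 May 2022.
errado = pd.to_datetime(s, format="%m/%d/%Y")
print(errado.dt.month.tolist())  # [1, 5]
```

Stating the format also makes pandas raise an error on any string that does not match it, instead of guessing.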


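- Creates a new column with the difference between the scheduled and the actual payment date. Because the actual date is subtracted from the scheduled one, a negative value means the rent was paid late and zero means it was paid on the due date.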

In [57]:
# Scheduled minus actual date, kept as a Timedelta (negative = paid late).
dados['Days Late on Payment'] = dados['datas_combinadas_pagamento'] - dados['datas_de_pagamento']
dados

,apartamento,datas_combinadas_pagamento,datas_de_pagamento,valor_aluguel,Days Late on Payment
0,A101,2022-06-01,2022-06-05,1000.0,-4 days
1,A101,2022-07-01,2022-07-03,2500.0,-2 days
2,A102,2022-06-02,2022-06-02,1100.0,0 days
3,A102,2022-07-02,2022-07-06,2600.0,-4 days
4,B201,2022-06-03,2022-06-07,1200.0,-4 days
5,B201,2022-07-03,2022-07-03,2700.0,0 days
6,B202,2022-06-04,2022-06-07,1300.0,-3 days
7,B202,2022-07-04,2022-07-05,2800.0,-1 days
8,C301,2022-06-05,2022-06-10,1400.0,-5 days
9,C301,2022-07-05,2022-07-09,2900.0,-4 days
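
A possible variant, sketched here on toy data, subtracts in the opposite order so that positive numbers mean late payments, and stores whole days as integers via `.dt.days`. The column names `combinada`, `paga`, `dias_de_atraso`, and `atrasado` are hypothetical and not used in the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "combinada": pd.to_datetime(["2022-06-01", "2022-06-02"]),  # scheduled dates
    "paga": pd.to_datetime(["2022-06-05", "2022-06-02"]),       # actual dates
})

# Actual minus scheduled: positive = late, zero = on time, negative = early.
toy["dias_de_atraso"] = (toy["paga"] - toy["combinada"]).dt.days

# Boolean flag for late payments.
toy["atrasado"] = toy["dias_de_atraso"] > 0
print(toy[["dias_de_atraso", "atrasado"]])
```

With this convention the column name and the sign agree, which can make later filtering and plotting easier to read.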


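- Groups the data by apartment and calculates the mean payment offset per apartment. Pandas renders a negative Timedelta as a negative day count plus a positive time, so -5 days +12:00:00 means -4.5 days, that is, an average of 4.5 days late.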

In [58]:
media_atraso = dados.groupby(['apartamento'])['Days Late on Payment'].mean()
media_atraso

apartamento,Days Late on Payment
A101,-3 days +00:00:00
A102,-2 days +00:00:00
B201,-2 days +00:00:00
B202,-2 days +00:00:00
C301,-5 days +12:00:00
C302,-4 days +00:00:00
D401,-1 days +00:00:00
D402,-4 days +00:00:00
E501,-1 days +12:00:00
E502,-4 days +00:00:00
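
Values such as `-5 days +12:00:00` are easier to read as fractional days. A minimal sketch, using illustrative offsets for two apartments and hypothetical names (`atrasos`, `media`, `media_dias`), divides the mean Timedelta by one day:

```python
import pandas as pd

# Illustrative per-payment offsets (scheduled minus actual) for two apartments.
atrasos = pd.DataFrame({
    "apartamento": ["C301", "C301", "E501", "E501"],
    "atraso": pd.to_timedelta([-5, -4, -1, 0], unit="D"),
})

# Mean offset per apartment, still a Timedelta.
media = atrasos.groupby("apartamento")["atraso"].mean()

# Dividing by a one-day Timedelta yields plain floats in days.
media_dias = media / pd.Timedelta(days=1)
print(media_dias)
```

Here `C301` comes out as -4.5 and `E501` as -0.5, matching the Timedelta notation in the output above.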
