# PicPay – Technical Case: Financial Revenue

This Databricks notebook presents the solution to the financial revenue technical case sent by PicPay. The goal is to apply business rules to financial transactions, focusing on the revenue that comes from:

- P2P (payments between users via credit card)
- BILLS (payment of bills (boletos), with or without installments)


The pipeline organizes ingestion, processing, and analytical data generation into layers, using native Azure platform resources.

- **Bronze**: ingestion of the raw data in its original format (CSV)
- **Silver**: application of business rules and data normalization
- **Gold**: generation of the detailed installments table (Price table) with compound interest



Authentication with Azure Data Lake Storage (ADLS Gen2)
Data in this notebook is read from and written directly to Azure Data Lake Storage Gen2.
To keep credentials secure and out of the notebook, a Secret Scope (picpay_scope) configured in Azure Databricks is used.

The Storage Account key was stored in Databricks through the CLI and is retrieved securely with dbutils.secrets.get().

In [0]:
spark.conf.set(
    "fs.azure.account.key.stcensodados.dfs.core.windows.net",
    dbutils.secrets.get(scope="picpay_scope", key="stcensodados_key")
)

Reading the raw data (Bronze layer)
In this step, the Bronze-layer data is read directly from the Data Lake.

The transactions.csv file was uploaded to the bronze (raw) layer in ADLS.

The file uses `;` as the field separator (sep = ';').

In [0]:
bronze_path = "abfss://raw@stcensodados.dfs.core.windows.net/picpay_analytics_case_receitas/transactions.csv"

df_raw = spark.read.option("header", True).option("sep", ";").csv(bronze_path)

display(df_raw)

transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate
1,03/01/2021,P2P,400,600,Credit card,12,1.99,2.99,3.49
2,14/09/2021,BILLS,650,300,Credit card,5,1.99,2.99,3.49
3,20/07/2021,BILLS,1200,0,Credit card,8,1.99,2.99,3.49
4,06/08/2021,P2P,350,800,Credit card,9,1.99,2.99,3.49
5,13/04/2021,P2P,3500,0,Credit card,10,1.99,2.99,3.49
6,24/05/2021,P2P,3420,0,Credit card,7,1.99,2.99,3.49
7,31/03/2021,BILLS,5000,0,Credit card,12,1.99,2.99,3.49
8,30/08/2021,P2P,2800,0,Credit card,12,1.99,2.99,3.49
9,28/08/2021,BILLS,6000,0,Credit card,4,1.99,2.99,3.49
10,27/09/2021,P2P,8000,0,Credit card,6,1.99,2.99,3.49


- Initial exploratory analysis
Before the transformation phase, a few points were inspected: the structure of the dataset, incorrect data types, null fields, and the consistency of the values.

In [0]:
df_raw.printSchema()

print(f"Total rows: {df_raw.count()} | Total columns: {len(df_raw.columns)}")


root
 |-- transaction_id: string (nullable = true)
 |-- transaction_date: string (nullable = true)
 |-- transaction_type: string (nullable = true)
 |-- transaction_value: string (nullable = true)
 |-- receiver_used_cc_limit: string (nullable = true)
 |-- payment_method: string (nullable = true)
 |-- installments: string (nullable = true)
 |-- p2p_surcharge_rate: string (nullable = true)
 |-- bills_surcharge_rate: string (nullable = true)
 |-- installment_rate: string (nullable = true)

Total rows: 10 | Total columns: 10


In [0]:
# Basic statistics
df_raw.describe(["transaction_value", "receiver_used_cc_limit"]).show()


+-------+-----------------+----------------------+
|summary|transaction_value|receiver_used_cc_limit|
+-------+-----------------+----------------------+
|  count|               10|                    10|
|   mean|           3132.0|                 170.0|
| stddev|2599.939315531123|      298.328677803526|
|    min|             1200|                     0|
|    max|             8000|                   800|
+-------+-----------------+----------------------+
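The `min` of 1200 for `transaction_value` looks odd given that the Bronze preview contains a transaction of 350. Both columns are still strings at this point, so `min` and `max` follow string ordering rather than numeric ordering. A minimal plain-Python sketch of the same effect, using the ten `transaction_value` entries from the preview (the variable names are illustrative only):

```python
# transaction_value exactly as read from the CSV: every entry is a string
values = ["400", "650", "1200", "350", "3500", "3420", "5000", "2800", "6000", "8000"]

# Strings compare character by character, so "1200" sorts before "350"
string_min, string_max = min(values), max(values)

# Casting first gives the numeric extremes
numbers = [float(v) for v in values]
numeric_min, numeric_max = min(numbers), max(numbers)

print("string ordering: ", string_min, string_max)
print("numeric ordering:", numeric_min, numeric_max)
```

Running the same statistics after the numeric casts of the Silver layer should therefore report the numeric extremes instead.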



Silver layer (processed): data conversion and normalization

In this step, the Bronze-layer data is standardized to make it easier to apply the business rules and run later analysis.

In [0]:
from pyspark.sql.functions import col, to_date, trim

df_silver = df_raw \
    .filter(col("transaction_type") != "") \
    .withColumn("transaction_date", to_date("transaction_date", "dd/MM/yyyy")) \
    .withColumn("transaction_value", col("transaction_value").cast("double")) \
    .withColumn("receiver_used_cc_limit", col("receiver_used_cc_limit").cast("double")) \
    .withColumn("installments", col("installments").cast("int")) \
    .withColumn("p2p_surcharge_rate", col("p2p_surcharge_rate").cast("double") / 100) \
    .withColumn("bills_surcharge_rate", col("bills_surcharge_rate").cast("double") / 100) \
    .withColumn("installment_rate", col("installment_rate").cast("double") / 100) \
    .withColumn("transaction_type", trim(col("transaction_type"))) \
    .withColumn("payment_method", trim(col("payment_method")))

silver_path = "abfss://processed@stcensodados.dfs.core.windows.net/picpay_analytics_case_receitas/transactions_silver"
df_silver.write.format("delta").mode("overwrite").save(silver_path)

df_silver.write \
    .option("header", True) \
    .mode("overwrite") \
    .csv("abfss://processed@stcensodados.dfs.core.windows.net/"
         "picpay_analytics_case_receitas/transctions_processed.csv")

display(df_silver)

transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate
1,2021-01-03,P2P,400.0,600.0,Credit card,12,0.0199,0.0299,0.0349
2,2021-09-14,BILLS,650.0,300.0,Credit card,5,0.0199,0.0299,0.0349
3,2021-07-20,BILLS,1200.0,0.0,Credit card,8,0.0199,0.0299,0.0349
4,2021-08-06,P2P,350.0,800.0,Credit card,9,0.0199,0.0299,0.0349
5,2021-04-13,P2P,3500.0,0.0,Credit card,10,0.0199,0.0299,0.0349
6,2021-05-24,P2P,3420.0,0.0,Credit card,7,0.0199,0.0299,0.0349
7,2021-03-31,BILLS,5000.0,0.0,Credit card,12,0.0199,0.0299,0.0349
8,2021-08-30,P2P,2800.0,0.0,Credit card,12,0.0199,0.0299,0.0349
9,2021-08-28,BILLS,6000.0,0.0,Credit card,4,0.0199,0.0299,0.0349
10,2021-09-27,P2P,8000.0,0.0,Credit card,6,0.0199,0.0299,0.0349
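The conversions in this cell can be illustrated on a single record without Spark. The sketch below is only an illustration: `normalize_row` is a hypothetical helper that is not part of the notebook, and it mirrors the date parsing, the numeric casts, and the percent-to-fraction scaling for the first row of the file.

```python
from datetime import datetime

def normalize_row(row: dict) -> dict:
    """Hypothetical helper mirroring the Silver casts for one raw record."""
    return {
        "transaction_id": row["transaction_id"],
        # Spark's dd/MM/yyyy pattern corresponds to %d/%m/%Y in strptime
        "transaction_date": datetime.strptime(row["transaction_date"], "%d/%m/%Y").date(),
        "transaction_type": row["transaction_type"].strip(),
        "transaction_value": float(row["transaction_value"]),
        "receiver_used_cc_limit": float(row["receiver_used_cc_limit"]),
        "payment_method": row["payment_method"].strip(),
        "installments": int(row["installments"]),
        # Rates arrive as percentages (1.99) and are stored as fractions (0.0199)
        "p2p_surcharge_rate": float(row["p2p_surcharge_rate"]) / 100,
        "bills_surcharge_rate": float(row["bills_surcharge_rate"]) / 100,
        "installment_rate": float(row["installment_rate"]) / 100,
    }

# First record of transactions.csv, with every field still a string
raw = {
    "transaction_id": "1", "transaction_date": "03/01/2021", "transaction_type": "P2P",
    "transaction_value": "400", "receiver_used_cc_limit": "600",
    "payment_method": "Credit card", "installments": "12",
    "p2p_surcharge_rate": "1.99", "bills_surcharge_rate": "2.99", "installment_rate": "3.49",
}
silver = normalize_row(raw)
print(silver)
```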


Surcharge amount for P2P transactions

This step applies the business rule for P2P transactions: a 1.99% fee is charged on the amount exceeding R$800, per receiver, considering the total received via credit card in the month.

The result is stored in a new column called `p2p_surcharge`.


In [0]:
from pyspark.sql import functions as F

# Monthly credit-card receiving limit (R$); only the amount above it is surcharged
LIMITE_P2P = 800

df_silver = df_silver.withColumn(
    "p2p_surcharge",
    F.round(
      F.when(
        (F.col("transaction_type") == "P2P") &
        (F.col("payment_method")    == "Credit card"),
        F.least(
          F.greatest(
            F.col("receiver_used_cc_limit") + F.col("transaction_value") - LIMITE_P2P,
            F.lit(0.0)
          ),
          F.col("transaction_value")
        )
        * F.col("p2p_surcharge_rate")
      )
      .otherwise(0.0),
      2
    )
)

display(df_silver.select(
  "transaction_id","receiver_used_cc_limit","transaction_value",
  "p2p_surcharge_rate","p2p_surcharge"
))


transaction_id,receiver_used_cc_limit,transaction_value,p2p_surcharge_rate,p2p_surcharge
1,600.0,400.0,0.0199,3.98
2,300.0,650.0,0.0199,0.0
3,0.0,1200.0,0.0199,0.0
4,800.0,350.0,0.0199,6.97
5,0.0,3500.0,0.0199,53.73
6,0.0,3420.0,0.0199,52.14
7,0.0,5000.0,0.0199,0.0
8,0.0,2800.0,0.0199,39.8
9,0.0,6000.0,0.0199,0.0
10,0.0,8000.0,0.0199,143.28
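The rule can be checked by hand on a few rows. The sketch below reimplements it in plain Python; `p2p_surcharge`, `P2P_LIMIT`, and `P2P_RATE` are names chosen for this illustration, and `Decimal` with half-up rounding is an assumption made here to keep the cent rounding exact (the notebook itself works on doubles).

```python
from decimal import Decimal, ROUND_HALF_UP

P2P_LIMIT = Decimal("800")    # monthly limit stated in the rule (R$)
P2P_RATE = Decimal("0.0199")  # 1.99% expressed as a fraction

def p2p_surcharge(used_limit: str, value: str) -> Decimal:
    """Fee on the part of `value` that pushes the receiver past the limit."""
    used, amount = Decimal(used_limit), Decimal(value)
    excess = max(used + amount - P2P_LIMIT, Decimal("0"))
    taxable = min(excess, amount)  # never more than the transaction itself
    return (taxable * P2P_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Transaction 1: 600 already used + 400 received -> 200 above the limit
print(p2p_surcharge("600", "400"))
# Transaction 4: the limit was already reached, so the full 350 is surcharged
print(p2p_surcharge("800", "350"))
# Transaction 5: nothing used yet, so only 3500 - 800 = 2700 is surcharged
print(p2p_surcharge("0", "3500"))
```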



Surcharge amount for BILLS transactions

For `BILLS` transactions paid by credit card, a flat fee of 2.99% is applied to the transaction value.

If the payment was split into installments (`installments` > 1), monthly compound interest of 3.49% is also applied, based on the Price table.

The flat fee is added to the transaction value before the interest is calculated.



In [0]:
# BILLS surcharge calculation
from pyspark.sql import functions as F

# BILLS fee: 2.99% of the transaction value
df_silver = df_silver.withColumn(
    "bills_surcharge",
    F.round(
        F.when(
            (F.col("transaction_type") == "BILLS") &
            (F.col("payment_method")    == "Credit card"),
            F.col("transaction_value") * F.col("bills_surcharge_rate")
        )
        .otherwise(0.0),
        2
    )
)

# Adjusted principal, including the surcharge
df_silver = df_silver.withColumn(
    "principal_after_fee",
    F.col("transaction_value")
    + F.when(F.col("transaction_type") == "P2P", F.col("p2p_surcharge"))
       .otherwise(F.col("bills_surcharge"))
)


display(df_silver.select(
    "transaction_id","transaction_type","transaction_value",
    "bills_surcharge_rate","bills_surcharge","principal_after_fee"
))


transaction_id,transaction_type,transaction_value,bills_surcharge_rate,bills_surcharge,principal_after_fee
1,P2P,400.0,0.0299,0.0,403.98
2,BILLS,650.0,0.0299,19.44,669.44
3,BILLS,1200.0,0.0299,35.88,1235.88
4,P2P,350.0,0.0299,0.0,356.97
5,P2P,3500.0,0.0299,0.0,3553.73
6,P2P,3420.0,0.0299,0.0,3472.14
7,BILLS,5000.0,0.0299,149.5,5149.5
8,P2P,2800.0,0.0299,0.0,2839.8
9,BILLS,6000.0,0.0299,179.4,6179.4
10,P2P,8000.0,0.0299,0.0,8143.28
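As a quick hand check of the BILLS rows, the flat fee and the adjusted principal can be recomputed in plain Python. `bills_principal` and `BILLS_RATE` are hypothetical names used only for this sketch, and `Decimal` with half-up rounding is an assumption made here for exact cents.

```python
from decimal import Decimal, ROUND_HALF_UP

BILLS_RATE = Decimal("0.0299")  # 2.99% expressed as a fraction

def bills_principal(value: str) -> tuple:
    """Return (surcharge, principal_after_fee) for a BILLS credit card payment."""
    amount = Decimal(value)
    surcharge = (amount * BILLS_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return surcharge, amount + surcharge

for tx_id, value in [(2, "650"), (3, "1200"), (7, "5000"), (9, "6000")]:
    surcharge, principal = bills_principal(value)
    print(tx_id, surcharge, principal)
```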


Fixed installment (Price table)

In [0]:
from pyspark.sql import functions as F

monthly_rate = 0.0349

df_silver = df_silver.withColumn(
    "due_amount",
    F.round(
        F.when(F.col("installments") > 1,
            # Price: P = S0 * [j / (1 - (1+j)^(-n))]
            F.col("principal_after_fee")
            * (monthly_rate /
               (1 - F.pow(F.lit(1 + monthly_rate), -F.col("installments"))))
        )
        .otherwise(F.col("principal_after_fee")),  # single payment: installment = principal
        2
    )
)

display(df_silver.filter("transaction_type='BILLS'") \
    .select("transaction_id","installments","principal_after_fee","due_amount"))


transaction_id,installments,principal_after_fee,due_amount
2,5,669.44,148.23
3,8,1235.88,179.72
7,12,5149.5,532.58
9,4,6179.4,1681.95
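The Price formula used in this cell, P = S0 * j / (1 - (1 + j)^(-n)), can be evaluated outside Spark to sanity-check the `due_amount` column. A minimal sketch, where `price_installment` is a name introduced only for this illustration:

```python
def price_installment(principal: float, rate: float, n: int) -> float:
    """Fixed installment under the Price system: S0 * j / (1 - (1 + j) ** -n)."""
    if n <= 1:
        return principal  # single payment: nothing is financed
    return principal * rate / (1 - (1 + rate) ** -n)

# BILLS rows shown above: (transaction_id, principal_after_fee, installments)
for tx_id, principal, n in [(2, 669.44, 5), (3, 1235.88, 8), (7, 5149.5, 12), (9, 6179.4, 4)]:
    print(tx_id, round(price_installment(principal, 0.0349, n), 2))
```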


Calculation of installment_fee and individual_installment to complete df_transactions

In [0]:
from pyspark.sql import functions as F

# Calculate installment_fee and individual_installment
df_silver = df_silver \
    .withColumn(
        "installment_fee",
        F.round(
            F.when(F.col("installments") > 1,
                   F.col("due_amount") * F.col("installments") - F.col("principal_after_fee")
            ).otherwise(0.0),
            2
        )
    ) \
    .withColumn(
        "individual_installment",
        F.col("due_amount")
    )

# Show the result for the IDs paid in installments
df_silver.filter(F.col("installments") > 1) \
    .select(
        "transaction_id",
        "installments",
        "principal_after_fee",
        "due_amount",
        "installment_fee",
        "individual_installment"
    ) \
    .orderBy("transaction_id") \
    .show(10, False)


+--------------+------------+-------------------+----------+---------------+----------------------+
|transaction_id|installments|principal_after_fee|due_amount|installment_fee|individual_installment|
+--------------+------------+-------------------+----------+---------------+----------------------+
|1             |12          |403.98             |41.78     |97.38          |41.78                 |
|10            |6           |8143.28            |1527.73   |1023.1         |1527.73               |
|2             |5           |669.44             |148.23    |71.71          |148.23                |
|3             |8           |1235.88            |179.72    |201.88         |179.72                |
|4             |9           |356.97             |46.9      |65.13          |46.9                  |
|5             |10          |3553.73            |427.09    |717.17         |427.09                |
|6             |7           |3472.14            |567.64    |501.34         |567.64                |
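The total interest per transaction is simply what the customer pays across all installments minus the financed principal. A small plain-Python check of two rows from the table, where `total_installment_fee` is a hypothetical helper and `Decimal` is used here only to keep the cents exact:

```python
from decimal import Decimal

def total_installment_fee(due_amount: str, installments: int, principal: str) -> Decimal:
    """Total interest = sum of all installments minus the financed principal."""
    if installments <= 1:
        return Decimal("0.00")  # single payment: no interest
    return Decimal(due_amount) * installments - Decimal(principal)

print(total_installment_fee("41.78", 12, "403.98"))   # transaction 1
print(total_installment_fee("148.23", 5, "669.44"))   # transaction 2
```

Because `due_amount` is already rounded to cents before it is multiplied, the total can differ by a few cents from the value obtained with the unrounded installment.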


In [0]:
from pyspark.sql import functions as F

df_transactions = df_silver.select(
    "transaction_id",
    "transaction_date",
    "transaction_type",
    "transaction_value",
    "receiver_used_cc_limit",
    "payment_method",
    "installments",
    "p2p_surcharge_rate",
    "bills_surcharge_rate",
    "installment_rate",

    # Surcharge amount for P2P transactions
    F.col("p2p_surcharge"),

    # Surcharge amount for BILLS transactions
    F.col("bills_surcharge").alias("bill_surcharge"),

    # Total transaction value including the P2P and BILLS surcharges
    F.col("principal_after_fee").alias("surcharged_transaction_value"),

    # Total interest charged for paying in installments (per transaction, not per installment)
    F.col("installment_fee"),

    # Value of each installment paid by the user who split a credit card payment
    F.col("individual_installment")
)

display(df_transactions)
    
df_transactions.write \
    .option("header", True) \
    .mode("overwrite") \
    .csv("abfss://analytics@stcensodados.dfs.core.windows.net/"
         "picpay_analytics_case_receitas/transactions.csv")


transaction_id,transaction_date,transaction_type,transaction_value,receiver_used_cc_limit,payment_method,installments,p2p_surcharge_rate,bills_surcharge_rate,installment_rate,p2p_surcharge,bill_surcharge,surcharged_transaction_value,installment_fee,individual_installment
1,2021-01-03,P2P,400.0,600.0,Credit card,12,0.0199,0.0299,0.0349,3.98,0.0,403.98,97.38,41.78
2,2021-09-14,BILLS,650.0,300.0,Credit card,5,0.0199,0.0299,0.0349,0.0,19.44,669.44,71.71,148.23
3,2021-07-20,BILLS,1200.0,0.0,Credit card,8,0.0199,0.0299,0.0349,0.0,35.88,1235.88,201.88,179.72
4,2021-08-06,P2P,350.0,800.0,Credit card,9,0.0199,0.0299,0.0349,6.97,0.0,356.97,65.13,46.9
5,2021-04-13,P2P,3500.0,0.0,Credit card,10,0.0199,0.0299,0.0349,53.73,0.0,3553.73,717.17,427.09
6,2021-05-24,P2P,3420.0,0.0,Credit card,7,0.0199,0.0299,0.0349,52.14,0.0,3472.14,501.34,567.64
7,2021-03-31,BILLS,5000.0,0.0,Credit card,12,0.0199,0.0299,0.0349,0.0,149.5,5149.5,1241.46,532.58
8,2021-08-30,P2P,2800.0,0.0,Credit card,12,0.0199,0.0299,0.0349,39.8,0.0,2839.8,684.6,293.7
9,2021-08-28,BILLS,6000.0,0.0,Credit card,4,0.0199,0.0299,0.0349,0.0,179.4,6179.4,548.4,1681.95
10,2021-09-27,P2P,8000.0,0.0,Credit card,6,0.0199,0.0299,0.0349,143.28,0.0,8143.28,1023.1,1527.73


Filter credit card installment purchases and prepare the base columns

In [0]:
from pyspark.sql import functions as F

df_inst_base = (
    df_silver
    .filter(
        (F.col("payment_method") == "Credit card") &
        (F.col("installments") > 1)
    )
    .select(
        "transaction_id",
        "transaction_date",
        "transaction_type",
        "transaction_value",
        "installments",
        F.col("principal_after_fee"),
        F.col("installment_rate"),
        F.col("due_amount")
    )
)

# Preview of the base rows
display(df_inst_base)

transaction_id,transaction_date,transaction_type,transaction_value,installments,principal_after_fee,installment_rate,due_amount
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78
2,2021-09-14,BILLS,650.0,5,669.44,0.0349,148.23
3,2021-07-20,BILLS,1200.0,8,1235.88,0.0349,179.72
4,2021-08-06,P2P,350.0,9,356.97,0.0349,46.9
5,2021-04-13,P2P,3500.0,10,3553.73,0.0349,427.09
6,2021-05-24,P2P,3420.0,7,3472.14,0.0349,567.64
7,2021-03-31,BILLS,5000.0,12,5149.5,0.0349,532.58
8,2021-08-30,P2P,2800.0,12,2839.8,0.0349,293.7
9,2021-08-28,BILLS,6000.0,4,6179.4,0.0349,1681.95
10,2021-09-27,P2P,8000.0,6,8143.28,0.0349,1527.73


Expand the installment schedule from 1 to n

In [0]:
from pyspark.sql.functions import sequence, explode, col, lit

df_inst_expanded = (
    df_inst_base
    .withColumn(
        "installment_number",
        # One row per installment: 1, 2, ..., installments
        explode(sequence(lit(1), col("installments")))
    )
)

display(df_inst_expanded.limit(30))


transaction_id,transaction_date,transaction_type,transaction_value,installments,principal_after_fee,installment_rate,due_amount,installment_number
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,1
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,2
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,3
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,4
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,5
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,6
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,7
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,8
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,9
1,2021-01-03,P2P,400.0,12,403.98,0.0349,41.78,10


Calculate the due date (due_date)

In [0]:
from pyspark.sql.functions import add_months, col

df_inst_due = df_inst_expanded.withColumn(
    "due_date",
    add_months(col("transaction_date"), col("installment_number"))
)

display(
    df_inst_due
      .select("transaction_id", "installment_number", "due_date")
      .orderBy("transaction_id", "installment_number")
)



transaction_id,installment_number,due_date
1,1,2021-02-03
1,2,2021-03-03
1,3,2021-04-03
1,4,2021-05-03
1,5,2021-06-03
1,6,2021-07-03
1,7,2021-08-03
1,8,2021-09-03
1,9,2021-10-03
1,10,2021-11-03
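The two steps above (one row per installment, then a due date shifted by that many months) can be sketched in plain Python. `add_months` and `due_dates` below are illustrative helpers, not the Spark functions; clamping the day to the length of the target month is an assumption about the intended behavior for end-of-month dates such as 2021-03-31, and is worth verifying against Spark's `add_months` on the cluster in use.

```python
import calendar
from datetime import date

def add_months(start: date, months: int) -> date:
    """Shift `start` by whole months, clamping the day to the target month's length."""
    month_index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))

def due_dates(transaction_date: date, installments: int) -> list:
    """One (installment_number, due_date) pair per installment, numbered from 1."""
    return [(k, add_months(transaction_date, k)) for k in range(1, installments + 1)]

# Transaction 1: 12 installments starting from 2021-01-03
for number, due in due_dates(date(2021, 1, 3), 12):
    print(number, due.isoformat())
```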


Calculation of the outstanding balance before each installment

In [0]:
from pyspark.sql import functions as F

df_inst_balance = df_inst_due.withColumn(
    "remaining_balance",
    F.round(
        F.col("principal_after_fee") *
        F.pow(
            F.lit(1) + F.col("installment_rate"),
            F.col("installments") - F.col("installment_number")
        ),
        2
    )
)

display(
    df_inst_balance
      .select("transaction_id", "installment_number", "remaining_balance")
      .orderBy("transaction_id", "installment_number")
)


transaction_id,installment_number,remaining_balance
1,1,589.17
1,2,569.3
1,3,550.1
1,4,531.55
1,5,513.63
1,6,496.31
1,7,479.57
1,8,463.4
1,9,447.77
1,10,432.67


Interest and amortization per installment

In [0]:
from pyspark.sql import functions as F

df_inst_calc = (
    df_inst_balance
    .withColumn(
        "installment_fee",
        F.round(F.col("remaining_balance") * F.col("installment_rate"), 2)
    )
    .withColumn(
        "installment_amortization",
        F.round(F.col("due_amount") - F.col("installment_fee"), 2)
    )
)

display(
    df_inst_calc
      .select("transaction_id", "installment_number", "installment_fee", "installment_amortization")
      .orderBy("transaction_id", "installment_number")
)


transaction_id,installment_number,installment_fee,installment_amortization
1,1,20.56,21.22
1,2,19.87,21.91
1,3,19.2,22.58
1,4,18.55,23.23
1,5,17.93,23.85
1,6,17.32,24.46
1,7,16.74,25.04
1,8,16.17,25.61
1,9,15.63,26.15
1,10,15.1,26.68
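For comparison, in the textbook Price system the interest of installment k is charged on the balance still open before that payment, starting from `principal_after_fee` itself. The `remaining_balance` expression above instead evaluates principal * (1 + rate)^(n - k), which is why the first row shows 589.17 rather than 403.98 and why the per-installment interest does not add up to the per-transaction `installment_fee`. The sketch below, with the hypothetical helper `price_schedule`, shows the textbook schedule for transaction 1 under that assumption; it is not the notebook's method.

```python
def price_schedule(principal: float, rate: float, n: int) -> list:
    """Textbook Price amortization: constant payment, interest on the open balance."""
    payment = principal * rate / (1 - (1 + rate) ** -n)
    balance = principal
    rows = []
    for k in range(1, n + 1):
        interest = balance * rate          # interest accrued on the balance before payment k
        amortization = payment - interest  # the rest of the payment reduces the balance
        rows.append({
            "installment_number": k,
            "balance_before": balance,
            "interest": interest,
            "amortization": amortization,
        })
        balance -= amortization
    return rows

schedule = price_schedule(403.98, 0.0349, 12)  # transaction 1
total_interest = sum(row["interest"] for row in schedule)

for row in schedule:
    print(row["installment_number"], round(row["balance_before"], 2),
          round(row["interest"], 2), round(row["amortization"], 2))
print("total interest:", round(total_interest, 2))
```

With this schedule the amortizations add up to the principal and the interest adds up to the total paid minus the principal, which makes it straightforward to reconcile against `installment_fee`.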


Create df_transactions_installments


In [0]:
# Create the DataFrame to be written
df_transactions_installments = df_inst_calc.select(
    "transaction_id",
    "installment_number",
    "transaction_date",
    "transaction_type",
    "transaction_value",
    "due_date",
    "due_amount",
    "installment_amortization",
    "installment_fee"
)

df_transactions_installments.write \
    .option("header", True) \
    .mode("overwrite") \
    .csv("abfss://analytics@stcensodados.dfs.core.windows.net/"
         "picpay_analytics_case_receitas/transactions_installments.csv")


# Display
display(df_transactions_installments)


transaction_id,installment_number,transaction_date,transaction_type,transaction_value,due_date,due_amount,installment_amortization,installment_fee
1,1,2021-01-03,P2P,400.0,2021-02-03,41.78,21.22,20.56
1,2,2021-01-03,P2P,400.0,2021-03-03,41.78,21.91,19.87
1,3,2021-01-03,P2P,400.0,2021-04-03,41.78,22.58,19.2
1,4,2021-01-03,P2P,400.0,2021-05-03,41.78,23.23,18.55
1,5,2021-01-03,P2P,400.0,2021-06-03,41.78,23.85,17.93
1,6,2021-01-03,P2P,400.0,2021-07-03,41.78,24.46,17.32
1,7,2021-01-03,P2P,400.0,2021-08-03,41.78,25.04,16.74
1,8,2021-01-03,P2P,400.0,2021-09-03,41.78,25.61,16.17
1,9,2021-01-03,P2P,400.0,2021-10-03,41.78,26.15,15.63
1,10,2021-01-03,P2P,400.0,2021-11-03,41.78,26.68,15.1
