
## Payments Mart – Implementation

Based on insights from the EDA, the following transformations were implemented in the **Payments Mart**:

* Aggregated payment records to the **order_id grain** so that orders with multiple payment rows collapse to a single row, ensuring join consistency with the master table.

* Rolled up payment-related metrics at the order level to prevent double counting in downstream joins.

* Encoded the **payment mode** categorical feature, enabling its use in modeling and analytical workflows.

* Exposed order-level payment features for integration into the final master dataset.
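
To illustrate why the order_id grain matters, here is a minimal sketch on made-up data. The `orders` and `payments_toy` frames and their values are hypothetical and do not come from the Olist dataset. Joining a row-per-payment table to a row-per-order table repeats the order row, while aggregating first keeps one row per order.

```python
import pandas as pd

# Hypothetical toy data: order "A" was paid with two payment records.
orders = pd.DataFrame({"order_id": ["A", "B"], "item_price": [100.0, 50.0]})
payments_toy = pd.DataFrame({
    "order_id": ["A", "A", "B"],
    "payment_value": [60.0, 40.0, 50.0],
})

# Joining at payment grain repeats order "A", which inflates order-level sums.
raw_join = orders.merge(payments_toy, on="order_id", how="left")
inflated_price = raw_join["item_price"].sum()

# Aggregating to order grain first keeps exactly one row per order.
order_level = payments_toy.groupby("order_id", as_index=False)["payment_value"].sum()
safe_join = orders.merge(order_level, on="order_id", how="left")
correct_price = safe_join["item_price"].sum()

print(len(raw_join), inflated_price)
print(len(safe_join), correct_price)
```

In this toy case the raw join yields three rows and counts order "A" twice, whereas the aggregated join yields two rows and the true item total.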


In [2]:
import pandas as pd

# Load the raw payments table (one row per payment record).
payments = pd.read_csv("../Source Data/olist_order_payments_dataset.csv")

In [3]:
payments.head()

Unnamed: 0,order_id,payment_sequential,payment_type,payment_installments,payment_value
0,b81ef226f3fe1789b1e8b2acac839d17,1,credit_card,8,99.33
1,a9810da82917af2d9aefd1278f1dcfa0,1,credit_card,1,24.39
2,25e8ea4e93396b6fa0d3dd708e76c1bd,1,credit_card,1,65.71
3,ba78997921bbcdc1373bb41e913ab953,1,credit_card,8,107.78
4,42fdf880ba16b47b59251dd489d4441a,1,credit_card,2,128.45


## AGGREGATIONS ##

* Pivoted `payment_type` into one column per payment mode (encoding), each holding the amount paid through that mode.

* Summed the payment value for each `order_id`.

* Computed the average installments (mean), the total installments (sum), and the total payment value across all payment modes for each `order_id`.
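
As a rough illustration of how the installment metrics differ, the sketch below compares the mean, the sum, and the number of payment records for a single split-payment order. The `toy` frame and the order id "X" are hypothetical, and the `payment_records` column is included only for contrast.

```python
import pandas as pd

# Hypothetical split-payment order: 2 installments on one record, 1 on another.
toy = pd.DataFrame({
    "order_id": ["X", "X"],
    "payment_installments": [2, 1],
})

# Named aggregation: one output column per metric, indexed by order_id.
installment_stats = toy.groupby("order_id")["payment_installments"].agg(
    avg_installments="mean",
    total_installments="sum",
    payment_records="count",
)
print(installment_stats)
```

For this order the mean is 1.5, the sum is 3, and the record count is 2, so a summed total and a record count are not interchangeable.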

In [None]:
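# Encode payment_type: one column per payment mode, holding the amount paid via that mode.
payments_pivot = (
    payments.pivot_table(
        index="order_id",
        columns="payment_type",
        values="payment_value",
        aggfunc="sum",
        fill_value=0
    )
    .reset_index()
)

# Prefix the mode columns so they stay identifiable after joining to the master table.
payment_cols = payments_pivot.columns.difference(["order_id"])
payments_pivot.rename(columns={c: f"pymt_mode_{c}" for c in payment_cols}, inplace=True)

# Mean installments across an order's payment records.
avg_installments = (
    payments.groupby("order_id")["payment_installments"]
    .mean()
    .reset_index(name="avg_payment_installments")
)

# Order-level totals: summed installments and summed payment value.
totals = (
    payments.groupby("order_id")
    .agg(
        total_payment_installments=("payment_installments", "sum"),
        total_payment_value=("payment_value", "sum")
    )
    .reset_index()
)

# Attach the installment and value metrics to the encoded table (one row per order).
payments_pivot = (
    payments_pivot
    .merge(avg_installments, on="order_id", how="left")
    .merge(totals, on="order_id", how="left")
)

payments_pivot.head()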


Unnamed: 0,order_id,pymt_mode_boleto,pymt_mode_credit_card,pymt_mode_debit_card,pymt_mode_not_defined,pymt_mode_voucher,avg_payment_installments,total_payment_installments,total_payment_value
0,00010242fe8c5a6d1ba2dd792cb16214,0.0,72.19,0.0,0.0,0.0,2.0,2,72.19
1,00018f77f2f0320c557190d7a144bdd3,0.0,259.83,0.0,0.0,0.0,3.0,3,259.83
2,000229ec398224ef6ca0657da4fc703e,0.0,216.87,0.0,0.0,0.0,5.0,5,216.87
3,00024acbcdf0a6daa1e931b038114c75,0.0,25.78,0.0,0.0,0.0,2.0,2,25.78
4,00042b26cf59d7ce69dfabb4e55b4fd9,0.0,218.04,0.0,0.0,0.0,3.0,3,218.04


In [4]:
payments_pivot.duplicated("order_id").sum()

0

In [6]:
payments_pivot.to_csv("../Processed Data/prd_payments.csv", index=False)

## VALIDATION ##

In [7]:
payments.loc[payments["order_id"]=='02ec4da9d03014f06d711d60eb37cc22']

Unnamed: 0,order_id,payment_sequential,payment_type,payment_installments,payment_value
3261,02ec4da9d03014f06d711d60eb37cc22,2,voucher,1,75.97
68931,02ec4da9d03014f06d711d60eb37cc22,1,credit_card,2,29.11


In [5]:
payments_pivot.loc[payments_pivot["order_id"]=='02ec4da9d03014f06d711d60eb37cc22']

Unnamed: 0,order_id,pymt_mode_boleto,pymt_mode_credit_card,pymt_mode_debit_card,pymt_mode_not_defined,pymt_mode_voucher,avg_payment_installments,total_payment_installments,total_payment_value
1123,02ec4da9d03014f06d711d60eb37cc22,0.0,29.11,0.0,0.0,75.97,1.5,3,105.08
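
The single-order spot check above could be generalized into table-wide invariants. The following is a minimal sketch, not part of the original notebook: `check_payments_mart` is a hypothetical helper, and the toy frames reuse only the two payment values shown for the validated order plus one made-up order.

```python
import numpy as np
import pandas as pd

def check_payments_mart(source: pd.DataFrame, mart: pd.DataFrame) -> dict:
    """Hypothetical helper: compare an order-level mart with its payment-level source."""
    mode_cols = [c for c in mart.columns if c.startswith("pymt_mode_")]
    return {
        # Exactly one row per order in the mart.
        "one_row_per_order": bool(mart["order_id"].is_unique),
        # Every order in the source appears in the mart.
        "no_orders_lost": mart["order_id"].nunique() == source["order_id"].nunique(),
        # Per-mode amounts add up to the order total (within float tolerance).
        "modes_sum_to_total": bool(
            np.allclose(mart[mode_cols].sum(axis=1), mart["total_payment_value"])
        ),
        # The grand total is unchanged by the aggregation.
        "grand_total_preserved": bool(
            np.isclose(mart["total_payment_value"].sum(), source["payment_value"].sum())
        ),
    }

# Toy data: "o1" mirrors the split payment validated above; "o2" is made up.
source = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "payment_type": ["voucher", "credit_card", "credit_card"],
    "payment_value": [75.97, 29.11, 10.0],
})
mart = pd.DataFrame({
    "order_id": ["o1", "o2"],
    "pymt_mode_credit_card": [29.11, 10.0],
    "pymt_mode_voucher": [75.97, 0.0],
    "total_payment_value": [105.08, 10.0],
})

checks = check_payments_mart(source, mart)
print(checks)
```

Calling the same helper as `check_payments_mart(payments, payments_pivot)` would apply these checks to the full tables, assuming the column names produced in the aggregation cell.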


In [11]:
payments_pivot["order_id"].duplicated().sum()

0

In [9]:
payments_pivot.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99440 entries, 0 to 99439
Data columns (total 7 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   order_id                  99440 non-null  object 
 1   boleto                    99440 non-null  float64
 2   credit_card               99440 non-null  float64
 3   debit_card                99440 non-null  float64
 4   not_defined               99440 non-null  float64
 5   voucher                   99440 non-null  float64
 6   avg_payment_installments  99440 non-null  float64
dtypes: float64(6), object(1)
memory usage: 5.3+ MB
