# Tutorial 1: From ERP Data to an Event Log (Procure-to-Pay)

This tutorial shows how to load relational ERP data for a procure-to-pay (P2P) process, transform it into an event log, and discover a process model from that log.

## 1. Import the Necessary Libraries

In [None]:
import pandas as pd
from IPython.display import display  # available by default in Jupyter; imported explicitly for scripts
from erp_processminer.io_erp import mappings
from erp_processminer.discovery import directly_follows
from erp_processminer.visualization import graphs

## 2. Create Sample ERP Data

For this tutorial, we'll create three small tables in memory: purchase orders, goods receipts, and invoices, all linked by `PO_NUMBER`. In a real-world scenario, you would load this data from CSV files or a database.

In [None]:
po_data = [
    ["PO-001", "2023-01-10", "Vendor A", "User 1"],
    ["PO-002", "2023-01-11", "Vendor B", "User 2"],
    ["PO-003", "2023-01-12", "Vendor A", "User 1"],
]
po_df = pd.DataFrame(po_data, columns=["PO_NUMBER", "CREATION_DATE", "VENDOR", "CREATED_BY"])
po_df['CREATION_DATE'] = pd.to_datetime(po_df['CREATION_DATE'])

gr_data = [
    ["GR-101", "PO-001", "2023-01-15", 100, 1],
    ["GR-102", "PO-002", "2023-01-18", 200, 1],
    ["GR-103", "PO-003", "2023-01-17", 50, 1],
]
gr_df = pd.DataFrame(gr_data, columns=["GR_NUMBER", "PO_NUMBER", "RECEIPT_DATE", "QUANTITY", "ITEM_NUMBER"])
gr_df['RECEIPT_DATE'] = pd.to_datetime(gr_df['RECEIPT_DATE'])

inv_data = [
    ["INV-201", "PO-001", "2023-01-20", 1000.0, "Paid"],
    ["INV-202", "PO-002", "2023-01-22", 2000.0, "Paid"],
    ["INV-203", "PO-003", "2023-01-25", 500.0, "Paid"],
]
inv_df = pd.DataFrame(inv_data, columns=["INVOICE_NUMBER", "PO_NUMBER", "INVOICE_DATE", "AMOUNT", "STATUS"])
inv_df['INVOICE_DATE'] = pd.to_datetime(inv_df['INVOICE_DATE'])

display(po_df.head())
display(gr_df.head())
display(inv_df.head())
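When the tables come from CSV exports instead, pandas can parse the date columns while reading. The sketch below uses an in-memory string in place of a real file so it runs on its own; the file name mentioned in the comment is hypothetical.

```python
import io

import pandas as pd

# Stand-in for a CSV export of the purchase-order table. With a real export
# you would pass a path instead, e.g. "purchase_orders.csv" (hypothetical name).
csv_text = """PO_NUMBER,CREATION_DATE,VENDOR,CREATED_BY
PO-001,2023-01-10,Vendor A,User 1
PO-002,2023-01-11,Vendor B,User 2
PO-003,2023-01-12,Vendor A,User 1
"""

# parse_dates converts CREATION_DATE to datetime64 during the read,
# which replaces the separate pd.to_datetime call used above.
po_df = pd.read_csv(io.StringIO(csv_text), parse_dates=["CREATION_DATE"])

print(po_df.dtypes)
```

For a database, `pd.read_sql` plays the same role and also accepts a `parse_dates` argument.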

## 3. Define the ERP-to-EventLog Mapping

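The mapping configuration is a dictionary that tells the toolkit how to build an event log from the ERP tables. It specifies:
- `case_id`: the column that identifies a process instance (here, the purchase order number).
- `tables`: a dictionary keyed by table name, where each value describes how to extract events from that table:
  - `entity_id`: the column that links each row to its case.
  - `activity`: the activity name; a value wrapped in single quotes, such as `"'Receive Goods'"`, is a static name applied to every row of the table.
  - `timestamp`: the column that holds the event time.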

In [None]:
p2p_mapping_config = {
    "case_id": "PO_NUMBER",
    "tables": {
        "purchase_orders": {
            "entity_id": "PO_NUMBER",
            "activity": "'Create Purchase Order'", # Static activity name
            "timestamp": "CREATION_DATE"
        },
        "goods_receipts": {
            "entity_id": "PO_NUMBER",
            "activity": "'Receive Goods'",
            "timestamp": "RECEIPT_DATE"
        },
        "invoices": {
            "entity_id": "PO_NUMBER",
            "activity": "'Receive Invoice'",
            "timestamp": "INVOICE_DATE"
        }
    }
}
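To make the mapping concrete, here is a minimal pandas sketch of how a configuration shaped like `p2p_mapping_config` could be flattened into a single event table with one row per event. The helper `tables_to_events` is hypothetical and is not the `mappings.apply_mapping` implementation. It assumes the tables are passed as a dict keyed by the names used under `tables`, and that a single-quoted `activity` value is a literal name while any other value is a column name.

```python
import pandas as pd


def tables_to_events(tables, config):
    """Flatten ERP tables into one long event table (hypothetical helper).

    tables: dict mapping table name -> DataFrame (assumed input shape).
    config: dict shaped like p2p_mapping_config.
    """
    frames = []
    for name, spec in config["tables"].items():
        df = tables[name]
        activity = spec["activity"]
        # Assumed convention: a quoted value is a static activity name,
        # anything else is read as a column holding per-row activity names.
        if activity.startswith("'") and activity.endswith("'"):
            activity_values = activity.strip("'")
        else:
            activity_values = df[activity]
        frames.append(pd.DataFrame({
            "case_id": df[spec["entity_id"]],  # entity_id supplies the case
            "activity": activity_values,       # a scalar is broadcast to all rows
            "timestamp": df[spec["timestamp"]],
        }))
    events = pd.concat(frames, ignore_index=True)
    # Order events within each case chronologically.
    return events.sort_values(["case_id", "timestamp"], ignore_index=True)


tables = {
    "purchase_orders": pd.DataFrame({
        "PO_NUMBER": ["PO-001", "PO-002"],
        "CREATION_DATE": pd.to_datetime(["2023-01-10", "2023-01-11"]),
    }),
    "goods_receipts": pd.DataFrame({
        "PO_NUMBER": ["PO-001", "PO-002"],
        "RECEIPT_DATE": pd.to_datetime(["2023-01-15", "2023-01-18"]),
    }),
}
config = {
    "case_id": "PO_NUMBER",
    "tables": {
        "purchase_orders": {"entity_id": "PO_NUMBER",
                            "activity": "'Create Purchase Order'",
                            "timestamp": "CREATION_DATE"},
        "goods_receipts": {"entity_id": "PO_NUMBER",
                           "activity": "'Receive Goods'",
                           "timestamp": "RECEIPT_DATE"},
    },
}

events = tables_to_events(tables, config)
print(events)
```

With the tutorial's own data, the equivalent call would be `tables_to_events({"purchase_orders": po_df, "goods_receipts": gr_df, "invoices": inv_df}, p2p_mapping_config)`.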

## 4. Apply the Mapping to Create an Event Log

In [None]:
# The DataFrames are passed in the same order as the tables in p2p_mapping_config
event_log = mappings.apply_mapping([po_df, gr_df, inv_df], p2p_mapping_config)

print(f"Successfully created an event log with {len(event_log.traces)} traces.")

# Print the first trace for inspection
for event in event_log.traces[0]:
    print(event)
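Conceptually, an event log groups events by case and orders each case's events by timestamp; each ordered sequence of activities is a trace. The pure-Python sketch below shows that grouping with hand-written event tuples that mirror the sample data for PO-001 and PO-002. The `build_traces` helper is hypothetical and returns a plain dict, not the toolkit's event-log object.

```python
from collections import defaultdict
from datetime import datetime

# Flat event records (case_id, activity, timestamp), deliberately unordered.
events = [
    ("PO-002", "Receive Goods", datetime(2023, 1, 18)),
    ("PO-001", "Create Purchase Order", datetime(2023, 1, 10)),
    ("PO-001", "Receive Invoice", datetime(2023, 1, 20)),
    ("PO-002", "Create Purchase Order", datetime(2023, 1, 11)),
    ("PO-001", "Receive Goods", datetime(2023, 1, 15)),
]


def build_traces(events):
    """Group events by case and order each case chronologically."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((ts, activity))
    # Sorting the (timestamp, activity) tuples orders each case by time.
    return {case: [act for _, act in sorted(evs)]
            for case, evs in sorted(by_case.items())}


traces = build_traces(events)
for case, trace in traces.items():
    print(case, trace)
```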

## 5. Discover and Visualize a Directly-Follows Graph (DFG)

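Now that we have an event log, we can apply a discovery algorithm to learn a process model from it. The directly-follows graph (DFG) is among the simplest process models: each node is an activity, and an edge from activity A to activity B records how often B immediately follows A within the same case.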

In [None]:
dfg, start_activities, end_activities = directly_follows.discover_dfg(event_log)

# The visualize_dfg function saves the graph to a file and returns the graphviz object
g = graphs.visualize_dfg(dfg, start_activities, end_activities, output_file='p2p_tutorial_dfg')
g  # as the last expression in the cell, the graphviz object renders inline in Jupyter
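A DFG can be computed by counting, across all traces, each pair of consecutive activities, together with the first and last activity of each trace. The sketch below does this with `collections.Counter`. `count_directly_follows` is a hypothetical helper, not the `directly_follows.discover_dfg` implementation, and the traces are written out by hand.

```python
from collections import Counter


def count_directly_follows(traces):
    """Count directly-follows pairs plus start and end activities.

    traces: iterable of activity sequences, one per case.
    """
    dfg = Counter()
    starts = Counter()
    ends = Counter()
    for trace in traces:
        if not trace:
            continue  # an empty trace contributes nothing
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        # zip pairs each activity with the one immediately after it
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg, starts, ends


traces = [["Create Purchase Order", "Receive Goods", "Receive Invoice"]] * 3
dfg, starts, ends = count_directly_follows(traces)
for (a, b), n in dfg.items():
    print(f"{a} -> {b}: {n}")
```

In the sample data, all three purchase orders pass through Create Purchase Order, Receive Goods, and Receive Invoice in that order, so the discovered DFG should be a simple chain with a count of 3 on each edge.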