In [1]:
import os
import pandas as pd
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from x_sem_ad import XSemAD, load_event_log_from_xes
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import AgentExecutor, create_react_agent
from stateful_python_tool import StatefulPythonREPLTool, PythonExecutionContext

In [2]:
load_dotenv()
if not os.getenv("GOOGLE_API_KEY"):
    raise ValueError("Google API key not found. Set the GOOGLE_API_KEY environment variable.")

In [3]:
path_to_model = "./data/model/"
path_to_log = "data/InternationalDeclarations.xes"

print("Initializing the anomaly analysis system...")
log = load_event_log_from_xes(path_to_log)
model = XSemAD(path_to_model=path_to_model)
model.load_event_log(log)

Initializing the anomaly analysis system...




parsing log, completed traces ::   0%|          | 0/6449 [00:00<?, ?it/s]

Loading model from: ./data/model/ folder


In [7]:
def get_eventlog() -> pd.DataFrame:
	"""
	Retrieves the loaded event log.

	Returns:
		A pandas DataFrame containing the event log data.
	"""
	print("\n>>> get_eventlog()")
	print(f">>> Eventlog{log.shape}\n")
	return log

def analyse_eventlog_anomalies(log: pd.DataFrame, constraint_type: str,
							   threshold: float=0.75) -> dict:
	"""
	Performs anomaly analysis for a specific constraint type in a previously loaded process event log.

	Args:
		log: A pandas DataFrame containing the event log data. Currently unused: the analysis runs on the log already loaded into the model.
		constraint_type: The type of constraint to be checked. Based on the user's question, choose one of the following values: 'Init', 'End', 'Precedence', 'Succession', 'Alternate Precedence', 'Co-Existence', 'Response', 'Alternate Response', 'Alternate Succession', 'Choice', 'Exclusive Choice'. For example, for a "start rule", use 'Init'.
		threshold: The threshold value for anomaly analysis. This parameter is used to define the severity level of constraint violations.

	Returns:
		A dictionary with the violated constraints and the count of their violations,
		in the following format: {
			"constraint_type[activity_a, activity_b]": int(violation_count), ...
		}, example:
		{
			"Init[Activity A, Activity B]": 5,
			"End[Activity C, Activity D]": 3
		}
	"""
	print(f"\n>>> analyse_eventlog_anomalies(constraint_type='{constraint_type}', threshold={threshold})")
	result = model.run(constraint_type=constraint_type, threshold=threshold)
	print(f">>> {result}\n")
	return result

available_functions = {
	"get_eventlog": get_eventlog,
	"analyse_eventlog_anomalies": analyse_eventlog_anomalies,
}

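# Named exec_context so it is not shadowed by the `context` string read from context.txt in a later cell.
exec_context = PythonExecutionContext(globals=available_functions)
stateful_tool = StatefulPythonREPLTool(exec_context)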
tools = [stateful_tool]
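
The `stateful_python_tool` module is not shown in this notebook, so its internals are unknown. As a rough illustration of what "stateful" could mean here, the sketch below keeps one shared namespace across several `exec` calls. `SketchExecutionContext` and `run_in_context` are hypothetical names invented for this example, not the module's actual API.

```python
import contextlib
import io


class SketchExecutionContext:
    """Hypothetical stand-in: holds one namespace shared by every call."""

    def __init__(self, globals=None):
        # Copy so executed code does not mutate the caller's dict.
        self.namespace = dict(globals or {})


def run_in_context(ctx, code):
    """Execute `code` in the shared namespace and return captured stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, ctx.namespace)
    return buffer.getvalue()


# Assumed usage: expose one helper, then run two separate snippets.
ctx = SketchExecutionContext(globals={"get_eventlog": lambda: "log"})
run_in_context(ctx, "x = get_eventlog()")         # first call defines x
second_output = run_in_context(ctx, "print(x)")   # second call still sees x
```

Because both snippets run against the same dictionary, a variable assigned in one agent step remains available in the next, which is the behaviour a ReAct loop needs when it inspects functions first and calls them later.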

In [8]:
with open("./data/agent_prompt.txt", "r", encoding="utf-8") as file:
	prompt_template_string = file.read()
	print(prompt_template_string)

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0)
prompt = PromptTemplate.from_template(prompt_template_string)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True,
							   handle_parsing_errors=True)

You are an expert Process Mining assistant. Your primary goal is to help users analyze and understand event logs by using the tools at your disposal.

**Eventlog context:**
Here is the context information about the event log you are analyzing:
{event_log_context}

**Content & Analysis Rules:**
1. **Use Available Functions:** Call dir() to see the available functions and help() to learn about them. Use them to answer the user's question.
2. **Translate:** Always translate the technical names of constraints and activities (e.g., 'Init', 'Co-Existence', 'register request') into the user's language (e.g., 'Início', 'Coexistência', 'registrar solicitação').
3. **Natural Language Responses:** Provide answers in natural language, ensuring that any user, regardless of their familiarity with process mining, can understand your response without technical jargon.
4. **Analyze and Add Insight:** Do not just report the raw numbers from the tools. Briefly analyze the eventlog and the activities invo
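
The printed prompt is cut off above, so it is not visible which placeholders the full file contains. LangChain's `create_react_agent` expects the prompt to include `tools`, `tool_names`, and `agent_scratchpad`, and every other placeholder must be supplied in the `invoke` call. The sketch below is one way to check a template before building the agent, using only the standard library. The `sample` template is a made-up stand-in, not the contents of `agent_prompt.txt`.

```python
from string import Formatter

# Placeholders that create_react_agent expects to find in the prompt.
REACT_REQUIRED = {"tools", "tool_names", "agent_scratchpad"}


def template_variables(template):
    """Return the set of {placeholder} names in an f-string-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


# Hypothetical template used only for illustration.
sample = (
    "Context: {event_log_context}\n"
    "Tools: {tools}\nTool names: {tool_names}\n"
    "Question: {input}\n{agent_scratchpad}"
)

# Required placeholders that the template forgot.
absent = REACT_REQUIRED - template_variables(sample)

# Placeholders that neither the agent nor the invoke() payload would fill.
supplied = {"input", "chat_history", "event_log_context"}
unfilled = template_variables(sample) - REACT_REQUIRED - supplied
```

Both sets being empty suggests the template and the `invoke` payload agree. A template containing literal braces (for example a JSON snippet) would need them doubled as `{{` and `}}` for this parsing to work.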

In [9]:
with open("./data/context.txt", "r", encoding="utf-8") as file:
	context = file.read()
	print(context)

Consider the following context for the eventlog dataset:
In many organizations, staff members travel for work. They travel to customers, to conferences or to project meetings and these travels are sometimes expensive. As an employee of an organization, you do not have to pay for your own travel expenses, but the company takes care of them. The dataset contains events pertaining to two years of travel expense claims. The various permits and declaration documents (domestic and international declarations, pre-paid travel costs and requests for payment) all follow a similar process flow. After submission by the employee, the request is sent for approval to the travel administration. If approved, the request is then forwarded to the budget owner and after that to the supervisor. If the budget owner and supervisor are the same person, then only one of these steps is taken. In some cases, the director also needs to approve the request. The process finished with either the trip taking place or

In [10]:
question = "Analyze the available functions and variables. Which anomalies break the end rule in the loaded event log? Use a threshold of 80% for the analysis."

result = agent_executor.invoke({
    "input": question,
    "chat_history": [],
    "event_log_context": context,
})
print(f"\n🧑‍💻 Question: {question}")
print(f"🤖 Answer: {result['output']}")



> Entering new AgentExecutor chain...
Thought: The user wants to identify anomalies that violate the "end rule" in the event log, using an 80% threshold.
First, I need to understand what functions are available and which one is relevant for "end rule" anomalies.
I will use `dir()` to list the available functions and then `help()` on the most promising one to understand its usage.
Given the context of process mining and "end rule", I'll look for functions related to conformance checking, specifically focusing on end activities or final states.
The function `check_end_activities_anomalies` seems to be exactly what I need. I will check its help to understand the parameters, especially how to set the threshold.
Action: Stateful Python Interpreter
Action Input:
dir()
=> ['__builtins__', 'analyse_eventlog_anomalies', 'get_eventlog']
Action: Stateful Python Interpreter
Action Input:
help(analyse_eventlog_anomalies)
Help on 