In [1]:
import os
import pandas as pd
from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from x_sem_ad import XSemAD, load_event_log_from_xes
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.agents import AgentExecutor, create_react_agent
from stateful_python_tool import StatefulPythonREPLTool, PythonExecutionContext

In [2]:
load_dotenv()
if not os.getenv("GOOGLE_API_KEY"):
    raise ValueError("Google API key not found. Set the GOOGLE_API_KEY environment variable.")

In [3]:
path_to_model = "./data/model/"
path_to_log = "data/InternationalDeclarations.xes"

print("Initializing the anomaly analysis system...")
log = load_event_log_from_xes(path_to_log)
model = XSemAD(path_to_model=path_to_model)
model.load_event_log(log)

Initializing the anomaly analysis system...

parsing log, completed traces ::   0%|          | 0/6449 [00:00<?, ?it/s]

Loading model from: ./data/model/ folder


In [4]:
def get_eventlog() -> pd.DataFrame:
	"""
	Retrieves the loaded event log.

	Returns:
		A pandas DataFrame containing the event log data.
	"""
	print("\n>>> get_eventlog()")
	print(f">>> Eventlog({log.shape})\n")
	return log

def analyse_eventlog_anomalies(log: pd.DataFrame, constraint_type: str,
							   threshold: float=0.75) -> dict:
	"""
	Performs anomaly analysis for a specific constraint type in a previously loaded process event log.

	Args:
		log: A pandas DataFrame containing the event log data. Currently unused: the analysis runs on the log already loaded into the model.
		constraint_type: The type of constraint to be checked. Based on the user's question, choose one of the following values:
			"Init" - Initiation(a): The process starts with activity a.
			"End" - Termination(a): The process ends with activity a.
			"Succession" - Succession(aj, ak): Activity aj (ak) occurs if and only if it is followed (preceded) by activity ak (aj) in the process.
			"Alternate Succession" - Alternate Succession(aj, ak): Activities aj and ak occur in the process if and only if the latter follows the former, and they alternate with each other in the trace.
			"Choice" - Choice(aj, ak): Activity aj or activity ak must be in the process.
			"Co-Existence" - Co-Existence(aj, ak): Both activities aj and ak must be in the process.
			"Exclusive Choice" - Exclusive Choice(aj, ak): Either activity aj or activity ak (but not both) must be in the process.
			"Response" - Response(aj, ak): If activity aj occurs in the process, then activity ak occurs after aj.
			"Alternate Response" - Alternate Response(aj, ak): Each time activity aj occurs in the process instance, activity ak occurs afterwards, before aj recurs.
			"Precedence" - Precedence(aj, ak): Activity ak occurs in the process instance only if preceded by activity aj.
			"Alternate Precedence" - Alternate Precedence(aj, ak): Each time activity ak occurs in the process instance, it is preceded by activity aj, and no other ak can recur in between.
		threshold: The threshold value for anomaly analysis. This parameter is used to define the severity level of constraint violations.

	Returns:
		A dictionary with the violated constraints and the count of their violations,
		in the following format: {
			"constraint_type[activity_a, activity_b]": int(violation_count), ...
		}, example:
		{
			"Succession[Activity A, Activity B]": 5,
			"Response[Activity C, Activity D]": 3
		}
	"""
	print(f"\n>>> analyse_eventlog_anomalies(constraint_type='{constraint_type}', threshold={threshold})")
	result = model.run(constraint_type=constraint_type, threshold=threshold)
	print(f">>> {result}\n")
	return result

available_functions = {
	"get_eventlog": get_eventlog,
	"analyse_eventlog_anomalies": analyse_eventlog_anomalies,
}

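# Named execution_context so that the `context` string read from context.txt
# in a later cell does not rebind this object.
execution_context = PythonExecutionContext(globals=available_functions)
stateful_tool = StatefulPythonREPLTool(execution_context)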
tools = [stateful_tool]
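
The constraint definitions in the docstring above can be made concrete with a small example. The sketch below is only an illustration of the semantics of Response, Precedence and Succession on hand-written traces; it is not how `XSemAD` computes its results, it has no notion of `threshold`, and the activity names and helper functions are hypothetical.

```python
def violates_response(trace, a, b):
    """Response(a, b): each occurrence of a must be followed later by b."""
    return any(
        activity == a and b not in trace[i + 1:]
        for i, activity in enumerate(trace)
    )


def violates_precedence(trace, a, b):
    """Precedence(a, b): b may occur only after a has occurred."""
    seen_a = False
    for activity in trace:
        if activity == a:
            seen_a = True
        elif activity == b and not seen_a:
            return True
    return False


def violates_succession(trace, a, b):
    """Succession(a, b) holds only if Response and Precedence both hold."""
    return violates_response(trace, a, b) or violates_precedence(trace, a, b)


CHECKS = {
    "Response": violates_response,
    "Precedence": violates_precedence,
    "Succession": violates_succession,
}


def count_violations(traces, constraint_type, a, b):
    """Return {"Type[a, b]": number of traces violating the constraint}."""
    check = CHECKS[constraint_type]
    violations = sum(1 for trace in traces if check(trace, a, b))
    return {f"{constraint_type}[{a}, {b}]": violations}


# Hypothetical traces, one list of activity names per case.
toy_traces = [
    ["Submit", "Approve", "Pay"],
    ["Submit", "Pay"],
    ["Approve", "Submit", "Pay"],
]

toy_result = {}
for constraint_type in CHECKS:
    toy_result.update(
        count_violations(toy_traces, constraint_type, "Submit", "Approve")
    )
print(toy_result)
```

The resulting dictionary uses the same `"constraint_type[activity_a, activity_b]": count` shape that the docstring describes, so it can serve as a stand-in when testing the agent prompt without the real model.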

In [10]:
# MODEL = "gemini-2.0-flash"
# MODEL = "gemini-2.5-flash"
MODEL = "gemini-2.5-pro"

with open("./data/agent_prompt.txt", "r") as file:
	prompt_template_string = file.read()
	print(prompt_template_string)

llm = ChatGoogleGenerativeAI(model=MODEL, temperature=0)
prompt = PromptTemplate.from_template(prompt_template_string)
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True,
							   handle_parsing_errors=True)

You are an expert Process Mining assistant. Your primary goal is to help users analyze and understand event logs by using the tools at your disposal.

## Here is the context information about the event log you are analyzing:
{event_log_context}

## You have access to the following tools:
{tools}

## Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

**When you have enough information and know the final answer, you MUST output the final answer in the following format, and nothing else:**

Begin!

## Some rules to follow:
1. **Use Available Functions:** Do dir() to see the available functions and help() to learn about
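
The format in the prompt depends on the literal labels `Action:` and `Action Input:`, colons included. The sketch below shows why a reply that drops the colons cannot be turned into a tool call. It is a simplified approximation written for illustration, not LangChain's actual output parser; the regular expression and the `parse_react_step` helper are assumptions.

```python
import re

# Simplified pattern: a tool name after "Action:" and its input after
# "Action Input:". Both labels must carry the colon.
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(?P<tool>.*?)\s*Action\s+Input\s*:\s*(?P<tool_input>.*)",
    re.DOTALL,
)


def parse_react_step(text):
    """Parse one model reply into a final answer or a tool call."""
    if "Final Answer:" in text:
        return ("final", text.split("Final Answer:", 1)[1].strip())
    match = ACTION_PATTERN.search(text)
    if match is None:
        raise ValueError("Invalid Format: Missing 'Action:' after 'Thought:'")
    return ("action", match.group("tool").strip(), match.group("tool_input").strip())


well_formed = (
    "Thought: I need to list the functions.\n"
    "Action: Stateful Python Interpreter\n"
    "Action Input: dir()"
)
print(parse_react_step(well_formed))

# Same content, but the labels have no colons.
malformed = (
    "Thought\nI need to list the functions.Action\n"
    "Stateful Python Interpreter\nAction Input\ndir()"
)
try:
    parse_react_step(malformed)
except ValueError as error:
    print(error)
```

With `handle_parsing_errors=True`, as set on the `AgentExecutor` above, a failure of this kind is sent back to the model as an observation instead of stopping the run, so the model gets a chance to retry in the expected format.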

In [11]:
with open("./data/context.txt", "r") as file:
	context = file.read()
	print(context)

Consider the following context for the eventlog dataset:
In many organizations, staff members travel for work. They travel to customers, to conferences or to project meetings and these travels are sometimes expensive. As an employee of an organization, you do not have to pay for your own travel expenses, but the company takes care of them. The dataset contains events pertaining to two years of travel expense claims. The various permits and declaration documents (domestic and international declarations, pre-paid travel costs and requests for payment) all follow a similar process flow. After submission by the employee, the request is sent for approval to the travel administration. If approved, the request is then forwarded to the budget owner and after that to the supervisor. If the budget owner and supervisor are the same person, then only one of these steps is taken. In some cases, the director also needs to approve the request. The process finished with either the trip taking place or

In [None]:
user_prompt = "Detect the largest anomalies of type Succession and Alternate Precedence."

result = agent_executor.invoke({
    "input": user_prompt,
    "event_log_context": context,
})
print(f"\n🧑‍💻 Question: {user_prompt}")
print(f"🤖 Answer: {result['output']}")



> Entering new AgentExecutor chain...
Thought
The user wants to find the most significant anomalies for the "Succession" and "Alternate Precedence" rule types.

1.  **Succession Rule**: This rule checks if the occurrence of one activity is always followed by the occurrence of another activity within the same case.
2.  **Alternate Precedence Rule**: This rule checks if one activity is always preceded by another, but not necessarily directly.

I will use the available functions to check for violations of these two rules. Then, I will analyze the results to identify the most frequent anomalies and present them to the user in a clear and translated format.

First, I need to see which functions are available.
Action
Stateful Python Interpreter
Action Input
dir()
Invalid Format: Missing 'Action:' after 'Thought:'
Thought: The user wants to detect the most significant anomalies for the "Succession" and "Alternate Precedence" rule types.

1.  **List 