LogPipe is a framework that enhances the effectiveness of LLMs in anomaly detection through knowledge-base augmentation. By supplying log patterns to the LLM via a knowledge base, it significantly improves detection performance. LogPipe also incorporates caching, which reduces the operational cost of LLM calls, and provides fault localization, which improves interpretability.
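The caching idea can be sketched as a memo keyed by a normalized log pattern, so that each distinct pattern triggers at most one LLM call. Everything below (the class, the normalization rule, the `llm_judge` callable) is an illustrative assumption, not LogPipe's actual implementation:

```python
import re

def normalize(log_line: str) -> str:
    """Collapse variable fields (numbers, hex ids) so repeated
    log patterns map to the same cache key."""
    return re.sub(r'\b(?:0x[0-9a-f]+|\d+)\b', '<*>', log_line.lower())

class CachedDetector:
    """Wraps an LLM judgment so each distinct pattern is judged once."""
    def __init__(self, llm_judge):
        self.llm_judge = llm_judge   # callable: pattern -> bool (anomalous?)
        self.cache = {}
        self.calls = 0               # number of actual LLM invocations

    def is_anomalous(self, log_line: str) -> bool:
        key = normalize(log_line)
        if key not in self.cache:
            self.calls += 1          # only cache misses hit the LLM
            self.cache[key] = self.llm_judge(key)
        return self.cache[key]
```

Because raw logs repeat a small number of templates with varying parameters, the hit rate of such a cache is typically high, which is where the cost saving comes from.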
The experimental design, including data partitioning, for the newly incorporated baselines KNN, NeuralLog, DeepLog, and LogAnomaly was kept consistent with that of LogPipe. We also strictly followed the hyperparameter configurations specified in each method's official repository to ensure fair reproduction. The experimental results show that LogPipe maintains superior performance even against these additional baselines.
We performed a grid search on the validation set to determine the optimal hyperparameter combination, including the anomaly score and the dynamic pattern threshold T. To assess robustness, we tested two sub-optimal settings by perturbing the optimal values by ±1: (anomaly score − 1, T − 1) and (anomaly score + 1, T + 1). Green, Blue, and Yellow bars in the figure correspond to Threshold −1, Optimal Threshold, and Threshold +1, respectively. The results show that LogPipe's F1-score fluctuates by no more than 3% relative to the optimal threshold, indicating stable performance across the tested ranges.
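The grid search and the ±1 robustness check described above can be sketched as follows. The validation F1 function and the value grids here are placeholders for illustration, not the paper's actual settings:

```python
from itertools import product

def grid_search(f1_on_validation, score_grid, t_grid):
    """Pick the (anomaly_score, T) pair with the best validation F1."""
    return max(product(score_grid, t_grid),
               key=lambda pair: f1_on_validation(*pair))

def robustness_check(f1_on_validation, best_score, best_t):
    """Evaluate the optimum and its +/-1 perturbations,
    matching the three bars per dataset in the figure."""
    settings = [(best_score - 1, best_t - 1),
                (best_score,     best_t),
                (best_score + 1, best_t + 1)]
    return {s: f1_on_validation(*s) for s in settings}
```

With a toy validation function peaked at (3, 10), `grid_search` recovers that optimum and `robustness_check` reports the F1 at the two perturbed settings alongside it.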
We evaluated DeepSeek, GLM4-2.0-Flash, and Qwen3-14B within the LogPipe framework. The experimental results demonstrate that LogPipe is robust across different LLMs and does not depend on the GLM4-2.0-Flash model used in the original LogPipe paper.
You can get an API key from https://bigmodel.cn/.
The code is implemented in Python 3.11. To install the required packages, run the following command:
pip install -r requirements.txt
You can download the dataset and the pretrained model parameters from the following anonymized OSF link: 🔗 https://osf.io/w8sf2/?view_only=7a7b5d9dfc3748d6848875d757c1cae8
To get the sentiment of each log event, run the following command:
python "\LogPipe\preprocess\get_log_event_sentiment.py"
In the script, set data_file = '/dataset/preprocessed/BGL/BGL.csv' (you can modify data_file to specify the dataset you want to run). The script saves the per-event sentiments with:
with open('BGL_sentiment.pkl', 'wb') as f:
    pickle.dump(Eid_sentiment, f)
This produces BGL_sentiment.pkl.
To slice logs into blocks, run:
python "\LogPipe\preprocess\slice_logs_blcok.py"
with the following settings in the script:
file_path = 'dataset/preprocessed/BGL/BGL.csv'
output_path = 'dataset/preprocessed/BGL/100l_BGL.csv'
L = 100  # The desired block length
process_log_to_csv(file_path, output_path, L)
To reduce your workload, we have already included BGL_sentiment.pkl and the pre-sliced log blocks in the dataset, which can be used directly.
Run "logpipe\detect_log_Bgl-Spirit-Thunderbird.py"
Details:
your_api_key = ""  # obtained in Step 1
data_path = '/dataset/preprocessed/100L_BGL.csv'
sent_template_file = '/dataset/preprocessed/BGL/BGL_sentiment.pkl'
For session-based datasets, please run "\LogPipe\detect_session-H2-H3-S3-S4-HDFS.py".
The anomaly detection results are as follows.




