This project provides an advanced log classification system that combines multiple approaches to handle different levels of log complexity.
By integrating regex rules, machine learning, and LLMs, it ensures accuracy and flexibility in log analysis, even when training data is scarce.
- Regex-Based Rules
  - Efficient for structured, repetitive, and predictable log patterns.
  - Uses manually defined regular expressions.
- Sentence Transformer + Logistic Regression
  - Handles more complex log patterns when sufficient labeled data is available.
  - Embeddings are generated using a Sentence Transformer and classified via Logistic Regression.
- Large Language Models (LLMs)
  - Ideal for scenarios with limited or noisy training data.
  - Serves as a fallback method for irregular or unstructured patterns.
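How the three approaches might fit together can be sketched as a simple cascade. This is a minimal illustration, not the project's actual code: it assumes the tiers are tried in the order listed, and the patterns, labels, and the `ml_classifier` and `llm_classifier` callables are hypothetical placeholders for the trained Logistic Regression model and the LLM call.

```python
import re
from typing import Callable, Optional

# Hypothetical patterns and labels, for illustration only.
REGEX_RULES = [
    (re.compile(r"User \w+ logged (in|out)", re.IGNORECASE), "User Action"),
    (re.compile(r"Backup (started|completed)", re.IGNORECASE), "System Notification"),
]

Classifier = Callable[[str], Optional[str]]


def classify_with_regex(log_message: str) -> Optional[str]:
    """Return the label of the first matching rule, or None if nothing matches."""
    for pattern, label in REGEX_RULES:
        if pattern.search(log_message):
            return label
    return None


def classify(
    log_message: str,
    ml_classifier: Optional[Classifier] = None,   # assumed: embeddings + Logistic Regression
    llm_classifier: Optional[Classifier] = None,  # assumed: LLM fallback
) -> str:
    """Try each tier in turn and return the first label produced."""
    for tier in (classify_with_regex, ml_classifier, llm_classifier):
        if tier is None:
            continue
        label = tier(log_message)
        if label is not None:
            return label
    return "Unclassified"
```

Keeping each tier behind the same `str -> Optional[str]` signature means a tier can decline a message by returning `None`, which hands it to the next, more expensive tier.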
├── training/          # Training scripts (Regex + ML models)
├── models/            # Pretrained models and embeddings
├── resources/         # Sample datasets, test files, images, outputs
├── server.py          # FastAPI server code
└── requirements.txt   # Dependencies
- Install Dependencies: Make sure you have Python installed on your system. Install the required Python libraries by running the following command:
pip install -r requirements.txt
- Run the FastAPI Server: To start the server, use the following command:
uvicorn server:app --reload
Once the server is running, you can access the API at:
- http://127.0.0.1:8000/ (Main endpoint)
- http://127.0.0.1:8000/docs (Interactive Swagger documentation)
- http://127.0.0.1:8000/redoc (Alternative API documentation)
Upload a CSV file containing logs to the FastAPI endpoint for classification. Ensure the file has the following columns:
- source
- log_message
The output will be a CSV file with an additional column target_label, which represents the classified label for each log entry.
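Before uploading, it can help to check the file locally. The sketch below uses only the standard library and assumes the two required columns are named `source` and `log_message`; the helper names `missing_columns` and `add_labels` are hypothetical and not part of the project. The second helper only illustrates the shape of the output, with `target_label` appended as the last column.

```python
import csv
import io
from typing import List, Set

# Assumed input columns, as described above.
REQUIRED_COLUMNS = {"source", "log_message"}


def missing_columns(csv_text: str) -> Set[str]:
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED_COLUMNS - set(reader.fieldnames or [])


def add_labels(csv_text: str, labels: List[str]) -> str:
    """Return the CSV with a target_label column appended, one label per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = list(reader.fieldnames or []) + ["target_label"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, label in zip(rows, labels):
        writer.writerow({**row, "target_label": label})
    return out.getvalue()
```

An empty set from `missing_columns` means the header contains both required columns.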