# Use Case Description
Automating the prioritization of critical alerts from a large volume of security alerts is essential because it enables security teams to focus on the most urgent and impactful threats quickly. Without automation, manually sorting through numerous alerts can lead to delays, missed critical incidents, and inefficient use of resources. Automation using LLMs improves response times, reduces alert fatigue, and ensures that critical vulnerabilities are addressed promptly, enhancing overall security posture and operational efficiency.


## Model used for this use case
Both the Instruct Model and the Reasoning Model are suitable for this task. This example uses the Reasoning Model. <br>
The Reasoning Model is currently in preview. If you don't have access, you can use the Instruct Model instead.

## Setup

The setup code below is essentially the same as in the [Quickstart (Reasoning Model)](https://github.com/RobustIntelligence/foundation-ai-cookbook/blob/main/1_quickstarts/Preview_Quickstart_reasoning_model.ipynb).

### Notice
- The code below assumes that users have access to the models via Hugging Face. If you are using API access instead, please replace the inference code with the API version provided in the Quickstart guide.
- This model is currently in preview mode and may receive updates. As a result, outputs can vary even when parameters are configured to ensure reproducibility.

In [1]:
import os
import transformers
import torch
from IPython.display import display, Markdown

HF_TOKEN = os.getenv("HF_TOKEN")

def _get_device():
    if torch.cuda.is_available():
        return "cuda"
    elif torch.backends.mps.is_available():
        return "mps"
    else:
        return "cpu"

DEVICE = _get_device()

In [2]:
MODEL_ID = "" # To be replaced with the final model name

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=HF_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=MODEL_ID,
    device_map="auto",
    torch_dtype=torch.float32, # this model's tensor_type is float32
    token=HF_TOKEN,
)


In [3]:
generation_args = {
    "max_new_tokens": 1024,
    "temperature": None,
    "repetition_penalty": 1.2,
    "do_sample": False,
    "use_cache": True,
    "eos_token_id": tokenizer.eos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}

In [4]:
import re

def inference(prompt):
    messages = [
        {"role": "user", "content": prompt},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # The old model version didn’t include the <think> token in the chat template.
    think_token = "<think>\n"
    if not inputs.endswith(think_token):
        inputs += think_token
    
    inputs = tokenizer(inputs, return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            **generation_args,
        )
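    response = tokenizer.decode(outputs[0], skip_special_tokens=False)

    # Keep everything generated after <think>, up to the end-of-text token.
    match = re.search(r"<think>(.*?)<\|end_of_text\|>", response, re.DOTALL)
    if match is None:
        # Generation may stop at max_new_tokens before end-of-text is emitted.
        match = re.search(r"<think>(.*)", response, re.DOTALL)

    return match.group(1).strip() if match else response.strip()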
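
The `inference` helper above returns everything generated after `<think>`. If your model version closes its reasoning with a `</think>` tag (an assumption here; check the chat template and a sample output first), the reasoning trace and the final answer can be separated with a small helper. `split_reasoning` is a hypothetical name and is not part of the Quickstart.

```python
def split_reasoning(generated: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Assumes the reasoning ends at the first `close_tag`. If the tag is
    absent, the whole text is treated as the answer.
    """
    reasoning, sep, answer = generated.partition(close_tag)
    if not sep:
        return "", generated.strip()
    return reasoning.strip(), answer.strip()


reasoning, answer = split_reasoning("step 1\nstep 2</think>\nFinal answer")
print(answer)
```

Treating tag-less output as the answer keeps the helper safe to call on model versions that do not emit a closing tag.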

## Alerts analysis

Here are several mock security alerts to analyze.

In [5]:
alerts = """
{
"alert_uuid": "3a4bd7e4-f8a2-4b7e-9d3b-7e2fa3f98b6c",
"timestamp": "2025-07-15T10:55:00Z",
"user": "security_admin",
"source_ip": "198.51.100.33",
"message": "The host at 198.51.100.33 has been permanently added to the ssh allowed list.",
"code": "SEC3007"
},
{
"alert_uuid": "a3f9d2b7-4c1e-4f8a-9d3b-7e2f1a6c8b9d",
"timestamp": "2025-07-15T09:45:12Z",
"user": "sysadmin",
"source_ip": "192.168.1.10",
"destination_ip": "10.0.0.5",
"message": "Log Error: Push error for subscription logs_sub_01: Failed to connect to 10.0.0.5: Connection timed out",
"code": "LOG1001"
},
{
"alert_uuid": "b9d3a4b7-e2f1-4c8b-9d3b-7e2fa3f98b6c",
"timestamp": "2025-07-15T10:20:45Z",
"user": "sysadmin",
"source_ip": "192.168.1.10",
"message": "Log Error: Subscription logs_sub_02: Log partition is full.",
"code": "LOG1002"
},
{
"alert_uuid": "d7e4f8a2-9b3c-4f1e-8a7d-2b6c9f1e3a4b",
"timestamp": "2025-07-15T10:05:33Z",
"user": "automation_bot",
"source_ip": "192.168.1.20",
"message": "Startup script backup_script.sh exited with error: Disk quota exceeded. System services failed to start, causing immediate operational outage.",
"code": "SYS2002"
},
{
"alert_uuid": "4c1e9d3b-7e2f-4f8a-9d3b-7e2fa3f98b6c",
"timestamp": "2025-07-15T10:45:55Z",
"user": "system",
"source_ip": "127.0.0.1",
"message": "Dependency cycle detected: processA -> processB -> processA.",
"code": "SYS2010"
},
{
"alert_uuid": "f1a6c8b9-d2b7-4c1e-9d3b-7e2fa3f98b6c",
"timestamp": "2025-07-15T10:15:00Z",
"user": "unknown",
"source_ip": "203.0.113.45",
"message": "The host at 203.0.113.45 has been added to the blocked list because of an SSH DOS attack.",
"code": "SEC3003"
},
{
"alert_uuid": "e2f1a6c8-b9d3-4c1e-9d3b-7e2fa3f98b6c",
"timestamp": "2025-07-15T10:35:00Z",
"user": "jdoe",
"source_ip": "198.51.100.22",
"message": "Account locked due to 5 failed login attempts. Last login attempt from 198.51.100.22.",
"code": "SEC3006"
},
{
"alert_uuid": "9d3b7e2f-a3f9-4c1e-8b6c-7e2fa3f98b6c",
"timestamp": "2025-07-15T10:40:20Z",
"user": "sysadmin",
"source_ip": "192.168.1.10",
"destination_ip": "10.0.0.6",
"message": "Subscription logs_sub_03: Timed out after 30 seconds sending data to syslog server syslog01 (10.0.0.6).",
"code": "LOG1003"
},
{
"alert_uuid": "7e2fa3f9-8b6c-4c1e-9d3b-7e2fa3f98b6c",
"timestamp": "2025-07-15T10:50:10Z",
"user": "reporting_service",
"source_ip": "192.168.1.40",
"message": "A failure occurred while building periodic report ‘Monthly_Security_Audit’.",
"code": "REP5001"
},
{
"alert_uuid": "c8b9d3a4-b7e2-4f1e-9d3b-7e2fa3f98b6c",
"timestamp": "2025-07-15T10:30:12Z",
"user": "app_service",
"source_ip": "192.168.1.30",
"message": "An application fault occurred: segmentation fault in module auth_handler.",
"code": "APP4005"
}
"""

In [6]:
def make_prompt(alerts):
    return f"""You are a security expert in charge of analyzing a set of alerts and prioritizing the critical ones.

    Go over the alerts given, and identify which ones are critical.

    ## ALERTS
    {alerts}

    Respond with:
    - Which ones are critical?
    - Why are they critical?
    - What should we do to address the alerts?

    """

In [7]:
response = inference(make_prompt(alerts))
display(Markdown(response))

Critical Alerts:

1. **Alert UUID:** d7e4f8a2-9b3c-4f1e-8a7d-2b6c9f1e3a4b  
   **Code:** SYS2002  
   **Message:** Startup script `backup_script.sh` exited with error: Disk quota exceeded. System services failed to start, causing immediate operational outage.

   *Why Critical:* This alert indicates a severe issue where system services couldn't start due to disk quotas being reached. Such failures can lead to downtime, loss of critical data, or inability to perform backups, leading to significant business disruptions. Immediate attention is required to resolve this and prevent further issues like data loss or corruption.

   *Action:* Check storage usage across systems; clear unnecessary files, increase quotas if possible, investigate root cause (like rogue processes consuming space), restore service functionality immediately.

2. **Alert UUID:** f1a6c8b9-d2b7-4c1e-9d3b-7e2fa3f98b6c  
   **Code:** SEC3003  
   **Message:** The host at 203.0.113.45 has been added to the blocked list because of an SSH DOS attack.

   *Why Critical:* A Denial-of-Service (DoS) attack on SSH could render the system inaccessible to legitimate users, disrupting operations and potentially leading to data breaches if attackers use it as a smokescreen. Continuous monitoring and swift action are needed to mitigate such attacks and ensure availability.

   *Action:* Investigate source IP's activity, block it temporarily, analyze traffic patterns, implement rate limiting, consider increasing logging granularity, coordinate with network/security teams for deeper analysis and mitigation strategies.

Other Important but Not Critically Urgent Alerts:

3. **Alert UUID:** b9d3a4b7-e2f1-4c8b-9d3b-7e2fa3f98b6c  
   **Code:** LOG1002  
   **Message:** Log Error: Subscription logs_sub_02: Log partition is full.

   *Priority:* High  
   *Why:* Filling up log partitions can hinder forensic investigations and compliance auditing. While not directly impacting uptime, it's crucial to avoid losing important logs.  

   *Action:* Increase log partition size, archive old logs, configure better retention policies, monitor regularly for similar incidents.

4. **Alert UUID:** 4c1e9d3b-7e2f-4f8a-9d3b-7e2fa3f98b6c  
   **Code:** SYS2010  
   **Message:** Dependency cycle detected: processA -> processB -> processA

   *Priority:* Medium-High  
   *Why:* Circular dependencies can lead to system instability, crashes, or resource leaks. If left unaddressed, might result in service degradation or complete stoppage.  

   *Action:* Analyze startup scripts, configuration files, or running processes for circular references. Fix configurations and restart affected services.

Non-Critical Alerts:

Remaining alerts indicate various issues that need addressing but don’t pose immediate threats to system integrity or availability. They include temporary connection timeouts, account lockouts (which have built-in protections against brute force), and reporting errors. These should be addressed within regular maintenance cycles unless part of ongoing incident response efforts.

*Note:* Prioritization may vary based on specific environment, SLAs, and existing controls. Always cross-check with your organization’s incident classification guidelines and playbooks.
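
To route the model's answer into a ticketing or paging workflow, the free-text response has to be mapped back to alert IDs. The sketch below assumes the response follows the layout shown above: a `Critical Alerts:` heading, followed later by other headings that end in `Alerts:`. That layout is not guaranteed, so the hypothetical helper `extract_critical_uuids` returns an empty list when the section is not found.

```python
import re

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def extract_critical_uuids(response: str) -> list[str]:
    """Return UUIDs listed between 'Critical Alerts:' and the next '... Alerts:' heading."""
    section = re.search(
        r"^Critical Alerts:\s*$(.*?)(?=^[^\n]*Alerts:\s*$|\Z)",
        response,
        re.DOTALL | re.MULTILINE,
    )
    if section is None:
        return []
    return UUID_RE.findall(section.group(1))


sample_response = """Critical Alerts:

1. **Alert UUID:** d7e4f8a2-9b3c-4f1e-8a7d-2b6c9f1e3a4b
   **Code:** SYS2002

2. **Alert UUID:** f1a6c8b9-d2b7-4c1e-9d3b-7e2fa3f98b6c
   **Code:** SEC3003

Other Important but Not Critically Urgent Alerts:

3. **Alert UUID:** b9d3a4b7-e2f1-4c8b-9d3b-7e2fa3f98b6c
"""

critical = extract_critical_uuids(sample_response)
print(critical)
```

If downstream automation depends on this step, asking the model for a structured format (for example, a JSON list of UUIDs) would likely be more robust than parsing headings.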