# Use Case Description

Critical cybersecurity incidents often result in lengthy, complex documentation that is difficult to digest, slowing triage, handoffs, and leadership visibility. The goal is to automatically condense these incident reports into concise summaries tailored to analysts or executives — enabling faster response, clearer communication, and more effective decision-making.

**For end-to-end incident investigation, see IncidentInvestigation_EndtoEnd. This notebook covers a simpler use case: summarizing a single, lengthy incident report.**

## Model used for this use case
Either the instruct model or the reasoning model can be used, but the instruct model is likely the better fit because this task involves no complex reasoning.

## Setup

The setup scripts below are essentially the same as those in the [Quickstart (Instruct Model)](https://github.com/RobustIntelligence/foundation-ai-cookbook/blob/main/1_quickstarts/Preview_Quickstart_instruct_model.ipynb).

In [1]:
import os
import transformers
import torch
from IPython.display import display, Markdown

HF_TOKEN = os.getenv("HF_TOKEN")
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

In [2]:
MODEL_ID = "" # To be replaced with the final model name

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, token=HF_TOKEN)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16, # this model's tensor_type is BF16
    token=HF_TOKEN,
)

In [3]:
generation_args = {
    "max_new_tokens": 1024,
    "temperature": None,
    "repetition_penalty": 1.2,
    "do_sample": False,
    "use_cache": True,
    "eos_token_id": tokenizer.eos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
}

In [4]:
import re

def inference(prompt, system_prompt):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    
    inputs = tokenizer(inputs, return_tensors="pt").to(DEVICE)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            **generation_args,
        )
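    response = tokenizer.decode(outputs[0], skip_special_tokens=False)

    # Extract only the assistant response; the markers assume this model's chat template
    match = re.search(r"<\|assistant\|>(.*?)<\|end_of_text\|>", response, re.DOTALL)
    if match is None:
        # No closing marker (e.g. generation stopped at max_new_tokens): decode only the new tokens
        prompt_length = inputs["input_ids"].shape[-1]
        return tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()

    return match.group(1).strip()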

## Incident Summarization

In [5]:
def make_prompt(incident_document: str, summary_type: str = "executive") -> str:

    assert summary_type in ["executive", "tactical"], "Invalid summary_type. Use 'executive' or 'tactical'."
    
    if summary_type == "executive":
        instructions = (
            "Your task is to provide a clear, non-technical summary for security leadership.\n"
            "- Focus on what happened, when it happened, and why it matters\n"
            "- Keep it under 5 sentences\n"
            "- Use clear, plain language"
        )
    elif summary_type == "tactical":
        instructions = (
            "Your task is to provide a concise bullet-point summary for L2/L3 analysts.\n"
            "- List key events and techniques\n"
            "- Use technical language where appropriate"
        )

    return f"""
    {instructions}
    
    ## INCIDENT DOCUMENTATION
    {incident_document}

    Respond with a clear summary.
    """

Assume the following incident has occurred, and have the model summarize it.

In [6]:
long_incident_doc = """
Incident Report: IR-2025-0173

Executive Summary:
On April 4, 2025, the security operations center (SOC) was alerted to anomalous behavior originating from an engineering jump box (ENG-JB01). The incident spanned approximately 18 hours and resulted in the compromise of multiple internal systems, theft of sensitive source code repositories, and signs of attempted exfiltration to a known threat infrastructure host associated with APT29. This report outlines the timeline, impact, attacker techniques, and remediation actions taken.

Timeline of Events:
- 2025-04-04 01:14 UTC: Alert triggered for unusual lateral movement attempt originating from ENG-JB01. Logged by Cisco XDR as “Possible Pass-the-Hash movement attempt.”
- 2025-04-04 01:18 UTC: Multiple failed Kerberos authentication attempts on host HR-LAPTOP-5 from ENG-JB01.
- 2025-04-04 01:23 UTC: Successful logon to FIN-SQL-02 using a privileged service account (svc-backup-prod) outside of usual login times.
- 2025-04-04 01:25 UTC: PowerShell execution detected from FIN-SQL-02 running a Base64-encoded command fetching content from hxxp://185.44.76.100/payload.ps1.
- 2025-04-04 01:30 UTC: Scheduled task created named “WindowsHealth” on FIN-SQL-02 for persistence.
- 2025-04-04 01:33 UTC: Internal network scanning activity detected from FIN-SQL-02 targeting subnet 10.0.42.0/24.
- 2025-04-04 02:01 UTC: SMB and WMI-based authentication attempts observed toward code repository server (ENG-GIT-SRV).
- 2025-04-04 02:11 UTC: Access to sensitive Git repositories verified using server audit logs.
- 2025-04-04 02:15–04:30 UTC: Multiple 50–150MB file transfers from ENG-GIT-SRV to ENG-JB01. Total volume approx. 3.1GB.
- 2025-04-04 04:45 UTC: Outbound connection to 185.44.76.100 established from ENG-JB01 on port 443, observed via egress firewall logs.
- 2025-04-04 04:52 UTC: Data transfer spike of ~3.2GB to external host.
- 2025-04-04 05:02 UTC: DLP system flagged unauthorized data exfiltration attempt. Incident escalated to IR team.
- 2025-04-04 05:13 UTC: Isolation actions initiated on ENG-JB01, FIN-SQL-02, and ENG-GIT-SRV.
- 2025-04-04 05:55 UTC: Memory capture and forensic triage initiated on all impacted systems.

Tactics and Techniques:
- Initial Access: Credential stuffing or lateral movement via Jump Box (ENG-JB01)
- Credential Access: Possible Pass-the-Hash (T1550.002), credential dumping via LSASS access
- Execution: Obfuscated PowerShell scripts (T1059.001)
- Persistence: Scheduled Task creation (T1053.005)
- Discovery: Network scanning (T1046)
- Collection: Staging of large volumes of code repositories
- Exfiltration: HTTPS outbound to threat infrastructure (T1041)

Impacted Assets:
- ENG-JB01 (Jump Box, staging point)
- FIN-SQL-02 (Finance DB server, used for lateral movement)
- ENG-GIT-SRV (Code repository server, source of exfiltrated data)

Impacted Data:
- At least 12 private Git repositories accessed
- Two contained proprietary ML models and customer integration code
- No indication of PII or PCI data exposure

Root Cause:
The attacker appears to have compromised or reused valid credentials on ENG-JB01, which had elevated network permissions. The exact credential acquisition method is undetermined but likely due to reuse or prior phishing compromise. MFA was not enabled on the jump box account.

Remediation Actions:
- Immediate network isolation of all affected systems
- Reset of all privileged and domain service accounts
- Implementation of LAPS and forced password rotation for local accounts
- Blocking outbound traffic to 185.44.76.100 at the firewall
- Deployment of EDR response rules for persistence techniques
- Initiated rollout of MFA to all bastion and admin systems
- Engaged law enforcement and threat intelligence partners for IOCs

Lessons Learned:
- MFA gaps on high-privilege infrastructure allowed credential reuse
- Scheduled task creation alerts were not being monitored
- EDR rules for base64-encoded PowerShell were disabled in engineering OU
- Repository audit logging was insufficient to detect staging early

Recommendations:
- Mandatory MFA on all privileged and remote-access accounts
- Enable audit logging on all code and data staging systems
- Expand scheduled task alerting across all domains
- Conduct purple team simulation to test lateral movement detection

This report has been reviewed and approved by the incident response lead and executive security team. Further updates will be shared in postmortem review.

-- End of Report --
"""


### Executive Summary

In [7]:
executive_system_prompt = "You are a SOC assistant summarizing a lengthy incident report."

summary_exec = inference(make_prompt(long_incident_doc, "executive"), executive_system_prompt)
display(Markdown(summary_exec))

**Summary:** On April 4th, our Engineering Jump Box (ENG-JB01) was breached, leading to an 18-hour-long cyberattack that compromised several internal systems, including those holding sensitive source code repositories. Attackers employed tactics such as pass-the-hash attacks, obfuscated PowerShell commands, and scheduled tasks for persistence. They managed to steal over 3GB worth of confidential information before attempting to exfiltrate it through a known malicious IP address linked to APT29. We've isolated the infected machines, reset compromised credentials, blocked the offending IP, and started rolling out multi-factor authentication across critical systems. Moving forward, we're enhancing monitoring capabilities, improving auditing practices, and enforcing stricter controls around privileged accounts to prevent similar incidents.
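
The executive prompt asks for fewer than 5 sentences, and the model may not always comply. A rough length check can flag summaries that need a retry or manual trimming. This is a minimal sketch: `count_sentences` is a hypothetical helper, and its naive split on end punctuation will overcount text containing abbreviations such as "approx." followed by a space.

```python
import re


def count_sentences(text: str) -> int:
    """Rough sentence count: split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


# Made-up sample text for illustration
sample = "The jump box was breached. Source code was taken. Systems are now isolated."
print(count_sentences(sample))  # 3
```

In the notebook, the same function could be applied to `summary_exec` and compared against the limit stated in the prompt.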

### Tactical Summary

In [8]:
tactical_system_prompt = "You are a SOC assistant extracting key technical findings from a lengthy incident report."

summary_tactical = inference(make_prompt(long_incident_doc, "tactical"), tactical_system_prompt)
display(Markdown(summary_tactical))

**Summary of Incident IR-2025-0173**

*Key Findings:*

- **Initial Compromise:** Anomalous activity started on ENG-JB01 with potential pass-the-hash technique noted at 01:14 UTC.
  
- **Lateral Movement & Privilege Escalation:** Unusual logins occurred on FIN-SQL-02 using svc-backup-prod account without typical usage patterns; successful at 01:23 UTC. 

- **Execution:** Observed obfuscated PowerShell script execution fetching payload from a malicious URL at 01:25 UTC.

- **Persistence Mechanism:** Creation of a suspicious scheduled task "WindowsHealth" on FIN-SQL-02 at 01:30 UTC.

- **Discovery Activity:** Targeted network scanning within subnet 10.0.42.0/24 from FIN-SQL-02 at 01:33 UTC.

- **Data Collection:** Large files (~3.1GB total) transferred from ENG-GIT-SRV to ENG-JB01 between 02:15 - 04:30 UTC.

- **Exfiltration Attempt:** Indication of data exfiltration over HTTPS to a known APT29 IP address starting at 04:45 UTC.

*Techniques Used:* 
- T1550.002 – Credentials from Password Stores: Pass-the-Hash
- T1059.001 – Command and Scripting Interpreter: PowerShell
- T1053.005 – Server Software Component: Scheduled Task
- T1046 – Network Service Scanning
- T1041 – Exfiltration Over C2 Channel: Exfiltration Over Web Service

*Affected Systems:*
- ENG-JB01 (Engineering Jump Box)
- FIN-SQL-02 (Financial SQL Database Server)
- ENG-GIT-SRV (Source Code Repository Server)

*Compromised Data:* Sensitive source code including proprietary ML models and customer integrations.

*Root Cause Analysis:*
- Likely credential reuse or previous phishing attack leading to initial access.
- Lack of Multi-Factor Authentication (MFA) on ENG-JB01 contributed to the breach.

*Response Measures Taken:*
- Isolated affected hosts.
- Account resets performed.
- Implemented additional security measures such as LAPS for local accounts, blocked the identified C2 IP, updated EDR policies, enforced MFA, and engaged relevant stakeholders.

*Recommendations:*
- Enforce MFA on critical systems like ENG-JB01.
- Improve monitoring for scheduled tasks and PowerShell activities.
- Audit logging must cover code and data storage servers more comprehensively.
- Regularly conduct red-team exercises focusing on lateral movement detection.

*Conclusion:*
A sophisticated adversary group, possibly APT29, exploited existing vulnerabilities related to poor credential management and inadequate monitoring to gain significant access and extract valuable intellectual property. Post-incident analysis emphasizes the importance of robust security hygiene practices and continuous improvement through lessons learned cycles.
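
Generated summaries can introduce details that are not in the source report, so a lightweight consistency check may be worth running before a summary is shared. The sketch below compares MITRE ATT&CK technique IDs found in a summary with those in the source text. `unsupported_technique_ids` is a hypothetical helper, and the excerpts are made up for illustration. It compares IDs only, so it would not catch a correct ID paired with an incorrect technique name; that would need a lookup against the ATT&CK catalog.

```python
import re

# Matches IDs such as T1046 or T1059.001
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def unsupported_technique_ids(source: str, summary: str) -> set[str]:
    """Return technique IDs that appear in the summary but not in the source report."""
    return set(TECHNIQUE_ID.findall(summary)) - set(TECHNIQUE_ID.findall(source))


# Made-up excerpts for illustration
source_excerpt = (
    "Execution: Obfuscated PowerShell scripts (T1059.001)\n"
    "Discovery: Network scanning (T1046)"
)
summary_excerpt = "- T1059.001 - PowerShell\n- T1046 - Scanning\n- T1003.001 - LSASS Memory"

print(unsupported_technique_ids(source_excerpt, summary_excerpt))  # {'T1003.001'}
```

In the notebook, the call would take `long_incident_doc` and `summary_tactical` as its two arguments.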