1. Introduction

This notebook demonstrates the architecture and functionality of a CVE-aware LLM triage agent built for contextual prioritization of security incidents.

- Combines semantic search over KEV, NVD, and historical triage records
- Uses an MCP tool-exposing agent to make structured, parsable LLM calls
- Maintains runtime metrics, batched execution, and persistent historical context

Why we do this:
Establishes the overall motivation and scope before diving into implementation. Helps readers orient themselves to why each later component exists.
Goal: Build a CVE-analysis agent for contextual security triage.
Setup: Jupyter notebook, Python 3.10+, Redis via Docker, .env file with OPENAI_API_KEY.


In [1]:
# Install dependencies
# Why we do this:
# Ensures all required Python packages are available in the current environment. This makes the notebook reproducible by others or on fresh systems.
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


c:\Users\Dan Guilliams\OneDrive\Code Projects\MCP_Agents_RADSecurity\.venv\Scripts\python.exe: No module named pip


# 2. Start Redis (for idempotency cache)

Redis is used by the FastAPI server for request ID deduplication and idempotency protection. Running it locally ensures the pipeline behaves consistently and avoids redundant processing.
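
The deduplication idea can be sketched without a running Redis instance. The example below is a minimal illustration, not the server's actual code: `InMemoryIdempotencyCache` and `claim` are hypothetical names, and the dictionary stands in for Redis's `SET key value NX EX ttl`, which stores a key only if it does not already exist and expires it after `ttl` seconds.

```python
import time
from typing import Dict, Optional


class InMemoryIdempotencyCache:
    """Hypothetical stand-in for a Redis-backed request-ID cache."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl_seconds = ttl_seconds
        self._expiry_by_request_id: Dict[str, float] = {}

    def claim(self, request_id: str, now: Optional[float] = None) -> bool:
        """Return True if the request ID is new (or expired), False if it is a duplicate."""
        now = time.monotonic() if now is None else now
        expiry = self._expiry_by_request_id.get(request_id)
        if expiry is not None and expiry > now:
            return False  # duplicate inside the TTL window: skip processing
        self._expiry_by_request_id[request_id] = now + self.ttl_seconds
        return True


cache = InMemoryIdempotencyCache(ttl_seconds=60)
first = cache.claim("req-123", now=0.0)          # new request
second = cache.claim("req-123", now=10.0)        # repeated within the TTL
after_expiry = cache.claim("req-123", now=61.0)  # TTL has elapsed
print(first, second, after_expiry)
```

A server would call `claim` before doing any work and return the earlier result (or a conflict response) when it gets `False`.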

In [2]:
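# Start the Redis container (first run only).
# If a container named local-redis already exists, this command fails with a name conflict;
# in that case run `docker start local-redis` instead.
!docker run -d --name local-redis -p 6379:6379 redis:latest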

docker: Error response from daemon: Conflict. The container name "/local-redis" is already in use by container "d7532e8e07eda16696991be756f16a983f49b82035ed57a286f0bc5f0aaafb22". You have to remove (or rename) that container to be able to reuse that name.

Run 'docker run --help' for more information


3. Project Structure

Display top-level tree
Why we do this:
Helps both the developer and the reader understand the overall folder and file layout of the project. This is especially useful for mapping file roles to architectural components.
```
.
├── AI Engineer Take-Home Exercise_ Gen AI Agent for Contextual CVE Analysis.pdf
├── Context Summaries
├── RADSecurity_Security_Agent-TakeHome.ipynb
├── README.md
├── README_GEMINI.md
├── __init__.py
├── __pycache__
├── archive
├── data
│   ├── dummy_agent_incident_analyses.json # Synthetic incident analyses for demo's sake
│   ├── dummy_incidents.json # Synthetic historical incidents for demo's sake
│   ├── incident_analysis.db # SQLite database
│   ├── incidents.json # Actual input data, from RAD Security
│   ├── kev.json # Retrieved data via setup/download_cve_data.py
│   ├── nvd_subset.json # Isolated data via setup/download_cve_data.py
│   ├── nvdcve-1.1-2025.json # Unzipped via setup/download_cve_data.py
│   ├── nvdcve-1.1-2025.json.zip # Retrieved data via setup/download_cve_data.py
│   └── vectorstore # FAISS indexes
│       ├── historical_incidents # Dummy historical incidents + actual incidents (added upon analysis)
│       │   ├── index.faiss
│       │   └── index.pkl
│       ├── kev # Generated KEV data index
│       │   ├── index.faiss
│       │   └── index.pkl
│       └── nvd # NVD data index
│           ├── index.faiss
│           └── index.pkl
├── dev
│   ├── incident_dashboard.py # Streamlit app to view SQLite database
│   └── query_db.py # Helper to query SQLite database
├── examples # Posterity, early experiments with MCP server usage
├── experimental # Posterity, early experiments with agents using MCP tools
├── logs
│   ├── server.log
│   └── timing_metrics.log
├── main_security_agent_server.py # FastAPI server, hosts the main agent/LLM logic
├── mcp_cve_server.py # Hosts the tools used by the agent, called as a subprocess in main_security_agent_server.py
├── pyproject.toml
├── pytest.ini
├── requirements.txt
├── run_analysis.py # Main script to run the analysis, calls main_security_agent_server.py asynchronously
├── setup
│   ├── build_dummy_analyses_index.py # Builds the dummy analyses index
│   ├── build_faiss_KEV_and_NVD_indexes.py # Builds the KEV and NVD indexes
│   ├── build_historical_incident_analyses_index.py # Builds the historical incidents index
│   ├── download_cve_data.py # Downloads the CVE data
│   └── README.md # General instructions for initial setup
├── tests # Optional, helpful in early discovery and development
│   ├── __init__.py
│   ├── __pycache__
│   ├── test_decorators.py
│   └── test_mcp_cve_server.py
├── tree_structure.txt # This file for sanity's sake
├── utils # Main utility functions that power the project
│   ├── __init__.py
│   ├── __pycache__
│   ├── datastore_utils.py
│   ├── decorators.py
│   ├── flatteners.py
│   ├── logging_utils.py
│   ├── prompt_utils.py
│   └── retrieval_utils.py
└── uv.lock
```


4. Data Ingestion

Load and inspect incidents.json
Why we do this:
Verifies the most essential input dataset is present, correctly formatted, and structured. This provides a foundation for downstream processing like semantic search and matching.
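
A structural check of this kind could look like the sketch below. It is an illustration under assumptions: `REQUIRED_FIELDS` is taken from the fields printed for the first incident, `missing_fields` is a hypothetical helper, and the sample records are invented so the block runs without `incidents.json`.

```python
REQUIRED_FIELDS = {
    "incident_id", "timestamp", "title", "description",
    "affected_assets", "observed_ttps",
    "indicators_of_compromise", "initial_findings",
}


def missing_fields(incident: dict) -> list:
    """Return the required field names absent from one incident record, sorted."""
    return sorted(REQUIRED_FIELDS - set(incident))


# Invented sample data: one complete record and one incomplete record.
sample_incidents = [
    {field: None for field in REQUIRED_FIELDS},
    {"incident_id": "INC-EXAMPLE-001", "title": "Demo"},
]

# Map each problematic record to the fields it lacks.
problems = {
    inc.get("incident_id") or f"index {i}": missing_fields(inc)
    for i, inc in enumerate(sample_incidents)
    if missing_fields(inc)
}
print(problems)
```

Running the same check over the real `incidents` list before indexing would surface malformed records early, rather than as a `KeyError` inside a flattener.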


In [3]:
import json
from pathlib import Path

data_dir = Path('data')
with open(data_dir / 'incidents.json') as f:
    incidents = json.load(f)

# Display number of incidents and first entry keys
print(f"Total incidents: {len(incidents)}")
print("Fields in first incident:", list(incidents[0].keys()))

# Display first incident in pretty format for inspection
print("\nFirst incident details:")
print(json.dumps(incidents[0], indent=2))

Total incidents: 39
Fields in first incident: ['incident_id', 'timestamp', 'title', 'description', 'affected_assets', 'observed_ttps', 'indicators_of_compromise', 'initial_findings']

First incident details:
{
  "incident_id": "INC-2023-08-01-001",
  "timestamp": "2023-08-01T09:15:00Z",
  "title": "Unauthorized Access Attempt on VPN Gateway",
  "description": "Multiple failed login attempts followed by a successful connection from an unusual geographic location on the main VPN gateway.",
  "affected_assets": [
    {
      "hostname": "vpn-gateway-01",
      "ip_address": "203.0.113.1",
      "os": "Cisco IOS XE",
      "installed_software": [
        {
          "name": "Cisco IOS XE",
          "version": "17.3.4a"
        }
      ],
      "role": "VPN Gateway"
    }
  ],
  "observed_ttps": [
    {
      "framework": "MITRE ATT&CK",
      "id": "T1110",
      "name": "Brute Force"
    },
    {
      "framework": "MITRE ATT&CK",
      "id": "T1078",
      "name": "Valid Accounts"
    }

Next Steps:
- Run `setup/download_cve_data.py` to pull KEV and NVD feeds
- Preview kev.json and nvd_subset.json

In [4]:
# Example: load kev.json
# Why we do this:
# The KEV file contains the CISA Known Exploited Vulnerabilities. Ensuring this loads correctly means the semantic index will have meaningful input to match against.
with open(data_dir / 'kev.json') as f:
    kev = json.load(f)
print(f"KEV entries: {len(kev.get('vulnerabilities', []))}")

KEV entries: 1342


In [5]:
# Example: load nvd_subset.json
# Why we do this:
# The NVD subset is a filtered snapshot of a much larger feed. We load it to confirm we have a valid, scoped data source for broader CVE coverage beyond KEV.
with open(data_dir / 'nvd_subset.json') as f:
    nvd = json.load(f)
print(f"NVD subset CVEs: {len(nvd)}")

NVD subset CVEs: 3062


## 5. Index Construction & Semantic Retrieval

In this section, we build and verify the FAISS indexes for KEV, NVD, and historical data, then demonstrate semantic search.    
This shows the robustness of our retrieval layer and ensures the agent has high-signal context.

In [6]:
# 5.1 Import necessary utilities
from utils.flatteners import flatten_kev, flatten_nvd, flatten_incident
from utils.retrieval_utils import initialize_embeddings, initialize_indexes, _search
from langchain.docstore.document import Document

# 5.2 Initialize Embeddings & Indexes
# Why we do this:
# Embeddings and FAISS indexes underpin retrieval-augmentation. Initializing once at startup ensures fast, consistent access during agent execution.

initialize_embeddings()
initialize_indexes()
print("Embeddings and FAISS indexes initialized successfully.")

Embeddings and FAISS indexes initialized successfully.


### 5.3 Inspect Flattener Outputs

Why we do this:    
Flatteners convert complex JSON entries into embedding-ready text. Verifying these transformations ensures that our search corpus is well-formed.
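
To make the idea concrete, here is a minimal sketch of what a flattener might do. `flatten_incident_sketch` is hypothetical and is not the project's `flatten_incident`; the choice of fields and their order are assumptions based on the incident schema shown earlier.

```python
def flatten_incident_sketch(incident: dict) -> str:
    """Join the human-readable parts of an incident into newline-separated text."""
    lines = [
        incident.get("title", ""),
        incident.get("description", ""),
        incident.get("initial_findings", ""),
    ]
    # Software names and versions help match against CVE descriptions.
    for asset in incident.get("affected_assets", []):
        for sw in asset.get("installed_software", []):
            lines.append(f"{sw.get('name', '')} {sw.get('version', '')}".strip())
    # Technique IDs and names add attack-behavior vocabulary.
    for ttp in incident.get("observed_ttps", []):
        lines.append(f"{ttp.get('id', '')} {ttp.get('name', '')}".strip())
    return "\n".join(line for line in lines if line)  # drop empty parts


incident = {
    "title": "Unauthorized Access Attempt on VPN Gateway",
    "description": "Multiple failed login attempts followed by a successful connection.",
    "affected_assets": [
        {"installed_software": [{"name": "Cisco IOS XE", "version": "17.3.4a"}]}
    ],
    "observed_ttps": [{"id": "T1110", "name": "Brute Force"}],
}
flat_text = flatten_incident_sketch(incident)
print(flat_text)
```

Keeping the output as plain text with one fact per line makes it easy to inspect by eye, which is the purpose of the preview cell that follows.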

In [7]:
# Example KEV entry flattening
sample_kev = kev.get('vulnerabilities', [])[0]
doc_kev = flatten_kev(sample_kev)
print("Flattened KEV document preview:")
print(doc_kev.page_content[:200], "...")
print(f"\n{'-'*50}\n")

# Example NVD entry flattening
sample_nvd = list(nvd.values())[0]
doc_nvd = flatten_nvd(sample_nvd)
print("Flattened NVD document preview:")
print(doc_nvd.page_content[:200], "...")
print(f"\n{'-'*50}\n")

# Example Incident flattening
doc_inc = Document(page_content=flatten_incident(incidents[0]), metadata={"incident_id": incidents[0]["incident_id"]})
print("Flattened Incident document preview:")
print(doc_inc.page_content[:200], "...")

Flattened KEV document preview:
CVE CVE-2025-32756
Fortinet
Multiple Products
Fortinet Multiple Products Stack-Based Buffer Overflow Vulnerability
Fortinet FortiFone, FortiVoice, FortiNDR and FortiMail contain a stack-based overflow ...

--------------------------------------------------

Flattened NVD document preview:
CVE CVE-2025-0020
Violation of Secure Design Principles, Hidden Functionality, Incorrect Provision of Specified Functionality vulnerability in ArcGIS (Authentication) allows Privilege Abuse, Manipulat ...

--------------------------------------------------

Flattened Incident document preview:
Unauthorized Access Attempt on VPN Gateway
Multiple failed login attempts followed by a successful connection from an unusual geographic location on the main VPN gateway.
Credential stuffing or brute  ...


### 5.4 Perform a Semantic Search

Why we do this:    
A simple query over KEV and NVD indexes demonstrates that our retrieval layer returns relevant results. This forms the context the agent uses for decision-making.
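
The mechanics of a top-k similarity search can be shown with a toy, dependency-free sketch. Everything here is an assumption made for illustration: word-count vectors replace the real embedding model, a linear scan replaces FAISS, and the document IDs and texts are invented.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, corpus: dict, k: int = 3) -> list:
    """Return up to k (doc_id, similarity) pairs, most similar first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


corpus = {
    "DOC-A": "vpn gateway authentication bypass",
    "DOC-B": "buffer overflow in mail server",
    "DOC-C": "unauthorized access to vpn appliance",
}
results = top_k("Unauthorized Access Attempt on VPN Gateway", corpus, k=2)
for doc_id, score in results:
    print(f"- {doc_id} (similarity: {score:.3f})")
```

A vector index performs the same ranking over dense embeddings, but avoids scanning every document on each query.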


In [8]:
# We import the FAISS indexes that we initialized earlier, and we can now see a simple result from a sample query
from utils.retrieval_utils import KEV_FAISS, NVD_FAISS

query_text = incidents[0]['title']
print(f"Search query: {query_text}")

kev_results = _search(KEV_FAISS, query_text, k=3)
print("Top 3 KEV matches:")
for r in kev_results:
    print(f"- {r['cve_id']} (score: {r['variance']:.3f})")

nvd_results = _search(NVD_FAISS, query_text, k=3)
print("Top 3 NVD matches:")
for r in nvd_results:
    print(f"- {r['cve_id']} (score: {r['variance']:.3f})")

Search query: Unauthorized Access Attempt on VPN Gateway
Top 3 KEV matches:


KeyError: 'similarity'

# Section 4 – Historical Learning & Retrieval

Objective: Demonstrate how the system bootstraps, verifies, and dynamically updates the historical incident analysis FAISS index, enabling context normalization over time.

### 4.1 Bootstrapping Historical Index

**Goal:** Populate `INCIDENT_HISTORY_FAISS` from existing incidents (e.g., `dummy_incidents.json`).    
**Why we do this:** Ensures the agent has an initial set of incident embeddings for similarity search, simulating a production environment with historical data.

In [None]:
# Here we see an example of building the historical index via the CLI. A shell script can simplify the entire setup (downloading the initial JSONs,
# building the KEV and NVD indexes, and building this index), but it is worth seeing how the basic logic works.
!python setup/build_historical_incident_analyses_index.py

### 4.2 Verifying Historical Embeddings

**Goal:** Confirm that semantic search over historical analyses returns relevant past incidents.    
**Why we do this:** Validates the similarity search and ensures metadata (risk scores, summaries) is correctly embedded.

In [None]:
from utils.retrieval_utils import batch_get_historical_context

# Get list of first 5 incident IDs
incident_ids = [incident['incident_id'] for incident in incidents[:5]]
print(f"Incident IDs: {incident_ids}")

# Get historical context for these IDs
hist_hits = batch_get_historical_context(incident_ids)
print("\nHistorical context:")
print(json.dumps(hist_hits, indent=2))

### 4.3 Dynamic Addition of New Analyses

**Goal:** Show how a newly analyzed incident is added to the historical index on-the-fly.    
**Why we do this:** Demonstrates the system's continual learning capability and idempotency safeguards.
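
The idempotency safeguard can be pictured as a guard keyed on `incident_id`. The sketch below is an assumption about the general pattern, not the behavior of `add_incident_to_history`: `HistorySketch` is a hypothetical in-memory class, and a real implementation would also embed the text and write it to the vector index.

```python
class HistorySketch:
    """Hypothetical in-memory stand-in for the historical incident index."""

    def __init__(self):
        self._records = {}  # incident_id -> stored incident/analysis pair

    def add(self, incident: dict, analysis: dict) -> bool:
        """Store the pair once; return False if this incident_id is already present."""
        incident_id = incident["incident_id"]
        if incident_id in self._records:
            return False  # idempotency guard: do not store the same incident twice
        self._records[incident_id] = {"incident": incident, "analysis": analysis}
        return True

    def __len__(self) -> int:
        return len(self._records)


history = HistorySketch()
incident = {"incident_id": "INC-EXAMPLE-001", "title": "Subdomain Takeover Attempt"}
analysis = {"incident_id": "INC-EXAMPLE-001", "incident_risk_level": 0.75}

added_first = history.add(incident, analysis)
added_again = history.add(incident, analysis)  # repeated call is a no-op
print(added_first, added_again, len(history))
```

This is why the cell below generates a fresh `incident_id` on each run: reusing an existing ID would be treated as a duplicate under a guard like this one.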

In [None]:
import random
from time import sleep
from utils.retrieval_utils import add_incident_to_history, INCIDENT_HISTORY_FAISS, search_similar_incidents

# We pick an example incident entry and an example analysis (one actually generated by the agent in a previous run).
# We change the incident_id for each so that the new entry is actually encoded into the index rather than skipped as a duplicate.
random_incident_id = f"INC-{random.randint(1000,9999)}-{random.randint(10,99)}-{random.randint(10,99)}-0{random.randint(10,99)}"

example_incident = {
    "incident_id": f"{random_incident_id}",
    "timestamp": "2023-08-13T11:00:00Z",
    "title": "Subdomain Takeover Attempt",
    "description": "Threat intelligence alert indicates a dangling DNS record pointing to a service that is no longer active, potentially allowing subdomain takeover.",
    "affected_assets": [
      {
        "hostname": "old-blog.example.com",
        "ip_address": "N/A",
        "os": "N/A",
        "installed_software": [],
        "role": "Legacy DNS Entry"
      }
    ],
    "observed_ttps": [
      {
        "framework": "MITRE ATT&CK",
        "id": "T1584",
        "name": "Compromise Infrastructure"
      },
      {
        "framework": "MITRE ATT&CK",
        "id": "T1584.001",
        "name": "Compromise Infrastructure: DNS"
      }
    ],
    "indicators_of_compromise": [
      {
        "type": "dns_record",
        "value": "CNAME old-blog.example.com -> inactive-service.cloudprovider.com",
        "context": "Observed DNS record"
      },
      {
        "type": "threat_intel_alert",
        "value": "Dangling DNS record detected",
        "context": "Threat intel alert"
      }
    ],
    "initial_findings": "Potential subdomain takeover risk due to dangling DNS record."
  }

example_analysis = {
    "incident_id": f"{random_incident_id}",
    "incident_summary": "Subdomain Takeover Attempt",
    "cve_ids": [
      {
        "cve_id": "CVE-2023-41265",
        "cve_summary": "HTTP Tunneling Vulnerability in Qlik Sense which could be exploited if a subdomain is compromised.",
        "cve_relevance": 1.93,
        "cve_risk_level": 0.8
      }
    ],
    "incident_risk_level": 0.75,
    "incident_risk_level_explanation": "Dangling DNS records pose a critical risk for subdomain takeover. The related CVE suggests a known vulnerability in HTTP tunneling that could be leveraged in this context."
  }

print(f"Adding incident {example_incident['incident_id']} to historical index...")
await add_incident_to_history(example_incident, example_analysis)

# We now expect to see entries for this same incident that we just stored to be returned in the search results
similar_incidents = search_similar_incidents(example_incident)
# The incident_ids will likely differ due to running this script, but we can check a very specific field such as the incident_risk_level_explanation to ensure valid results
print(f"Same incident: {example_analysis['incident_risk_level_explanation'] == similar_incidents[0]['incident_risk_level_explanation']}")
print(f"{'-'*50}")
# Print the results 
print(f"Example Incident:\n\tincident_id: {example_incident['incident_id']}\n\tincident_summary: {example_analysis['incident_summary']}\n\nSimilar incidents: {json.dumps(similar_incidents[:3], indent=2)}")


### 4.4 Retrieval-Augmented Prompt Enhancement

**Goal:** Demonstrate how `batch_get_historical_context` fetches structured past analyses for prompt injection.    
**Why we do this:** Shows the exact payload the agent will receive for historical context, ensuring transparency.
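
One way to turn such a payload into prompt text is sketched below. The payload shape (a mapping from incident ID to a list of past analyses) and the helper `format_historical_context` are assumptions for illustration; the actual structure returned by `batch_get_historical_context` may differ.

```python
import json


def format_historical_context(context_by_incident: dict, max_items: int = 2) -> str:
    """Render per-incident historical analyses as a compact text block for a prompt."""
    sections = []
    for incident_id, past_analyses in context_by_incident.items():
        sections.append(f"Historical context for {incident_id}:")
        for past in past_analyses[:max_items]:
            # Compact separators keep the injected JSON short.
            sections.append(json.dumps(past, separators=(",", ":"), sort_keys=True))
    return "\n".join(sections)


# Invented example payload with three past analyses for one incident.
example_context = {
    "INC-EXAMPLE-001": [
        {"incident_id": "INC-PAST-001", "incident_risk_level": 0.75},
        {"incident_id": "INC-PAST-002", "incident_risk_level": 0.4},
        {"incident_id": "INC-PAST-003", "incident_risk_level": 0.1},
    ]
}
prompt_block = format_historical_context(example_context)
print(prompt_block)
```

Capping the number of past analyses per incident bounds the prompt size regardless of how large the history grows.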

In [None]:
from utils.retrieval_utils import batch_get_historical_context

batch_ids = [incidents[0]['incident_id'], incidents[1]['incident_id']]
historical_context = batch_get_historical_context(batch_ids)

print("Historical context for batch:")
import json
print(json.dumps(historical_context, indent=2))

In [None]:

# Next Steps:
# - Section 5: Prompt Construction & Agent Execution
# - Section 6: Persistence & Dashboard Visualization

# %% [markdown]
# Section 5 – Prompt Construction & Agent Execution
# 
# Objective: Show how we assemble the complete LLM prompt with context and then invoke the MCP agent to perform the analysis.

# %% [markdown]
# 5.1 Generate Prompt
# 
# **Goal:** Use `generate_prompt` to combine incident data, KEV/NVD matches, and historical context into a structured System+Human message sequence.
# **Why we do this:** Demonstrates how we package all pre-fetched context into a single, token-efficient prompt for the agent.

# %%
from utils.prompt_utils import generate_prompt, parser
from utils.retrieval_utils import batch_match_incident_to_cves, batch_get_historical_context

# Prepare batch FAISS and historical results for a sample
batch_results = batch_match_incident_to_cves(start_index=0, batch_size=2, top_k=3)
historical_results = batch_get_historical_context(incident_ids=[res['incident_id'] for res in batch_results['results']], top_k=2)

# Generate the prompt messages
prompt_messages = generate_prompt(
    query="Analyze these incidents and output JSON per Pydantic schema.",
    batch_faiss_results=batch_results,
    historical_faiss_results=historical_results
)

# Preview the system and human messages
print("System message preview:")
print(prompt_messages[0].content[:500])
print("Human message preview:")
print(prompt_messages[1].content)

# %% [markdown]
# 5.2 Instantiate and Run Agent
# 
# **Goal:** Create the ReAct agent with loaded MCP tools and invoke it on our prompt messages.
# **Why we do this:** Validates end-to-end integration: tool server, prompt, agent orchestration, and parsing of results.

# %%
import asyncio
import os
from datetime import timedelta

from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from mcp import ClientSession, stdio_client, StdioServerParameters
from utils.logging_utils import setup_logger

# Setup server parameters and model
server_params = StdioServerParameters(command="python", args=["mcp_cve_server.py"])
model = ChatOpenAI(model="gpt-4o-mini", openai_api_key=os.getenv("OPENAI_API_KEY"))

async def run_agent():
    # Launch the MCP tool server as a subprocess and communicate with it over stdio
    async with stdio_client(server_params) as (r, w):
        # The MCP Python SDK types the read timeout as a timedelta
        async with ClientSession(r, w, read_timeout_seconds=timedelta(seconds=15)) as session:
            await session.initialize()
            tools = await load_mcp_tools(session)
            agent = create_react_agent(model, tools, name="CVE_Agent")
            # ainvoke returns the final graph state; the agent's answer is the last message
            full_response = await agent.ainvoke({"messages": prompt_messages})
            final_msg = full_response["messages"][-1]
            # Parse using Pydantic
            analysis = parser.parse(final_msg.content)
            print("Analysis result:")
            print(analysis.json(indent=2))
            return full_response

# Run the agent (in notebook) and keep the full response for later inspection
full_response = await run_agent()

# %% [markdown]
# 5.3 Inspect Agent Metadata
# 
# **Goal:** Examine token counts and tool-call traces from the agent's full response.
# **Why we do this:** Highlights observability and cost metrics captured during execution.

# %%
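# `full_response` is the agent's final state from Section 5.2. Token usage is attached to each
# AI message as `usage_metadata`, so we sum it across the conversation:
usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
for msg in full_response['messages']:
    meta = getattr(msg, 'usage_metadata', None) or {}
    for key in usage:
        usage[key] += meta.get(key, 0)
print(f"Tokens - input: {usage['input_tokens']}, output: {usage['output_tokens']}, total: {usage['total_tokens']}")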
print("Tools used:")
for msg in full_response['messages']:
    if hasattr(msg, 'additional_kwargs') and msg.additional_kwargs.get('tool_calls'):
        for call in msg.additional_kwargs['tool_calls']:
            print(f"- {call['function']['name']}")

# Next: Section 6 – Persistence & Dashboard Visualization



