
# URL Security Report Generator

Description:
This notebook combines the VirusTotal API with a large language model (LLM)
to analyze URLs and generate detailed security reports.

- VirusTotal: scans URLs to detect potential threats such as malware, phishing, and suspicious activity.
- LLM: summarizes the VirusTotal analysis results into a comprehensive, easy-to-understand report.

Use case:
Suited to cybersecurity workflows that need automated URL threat detection and readable report generation.



In [None]:
! pip install langchain==0.3.13
! pip install langchain-core==0.3.63
! pip install langchain-openai==0.2.14
! pip install requests




## 1. Import Required Libraries


In [None]:
# Standard library
import time

# Third-party
import requests
from langchain.agents import AgentExecutor, create_react_agent
from langchain.memory import ConversationBufferMemory
from langchain.tools import tool
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

## 2. Environment Setup

In [None]:
model_name = "gemini-2.5-flash"
# Gemini's OpenAI-compatible endpoint, which lets ChatOpenAI talk to Gemini
base_url = "https://generativelanguage.googleapis.com/v1beta/openai/"
VT_API_KEY = ""  # <-- add your VirusTotal API key here
GEMINI_API_KEY = ""  # <-- add your Gemini API key here
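
Hardcoded keys are easy to leak when a notebook is shared. As one alternative, the sketch below reads them from environment variables. The helper `load_api_key` and the environment variable names are assumptions for illustration, not part of the original notebook.

```python
import os


def load_api_key(env_var: str) -> str:
    """Return the key stored in env_var, or raise a clear error if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return key


# Hypothetical usage; the environment variable names are assumed:
# VT_API_KEY = load_api_key("VT_API_KEY")
# GEMINI_API_KEY = load_api_key("GEMINI_API_KEY")
```

Failing early with a named variable tends to be easier to debug than an authentication error from the API later on.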


## 3. Initialize the LLM

In [None]:
llm = ChatOpenAI(
    api_key=GEMINI_API_KEY,
    base_url=base_url,
    model=model_name,
)

## 4. Define the Tool


In [None]:

@tool
def check_url_virustotal(url: str):
    """
    Check a URL using VirusTotal and return analysis stats.
    """
    headers = {
        "x-apikey": VT_API_KEY
    }

    # Submit the URL for scanning
    submit_response = requests.post(
        "https://www.virustotal.com/api/v3/urls",
        headers=headers,
        data={"url": url},
        timeout=30,  # avoid hanging indefinitely on a stalled connection
    )

    if submit_response.status_code != 200:
        return {"error": "Failed to submit URL", "details": submit_response.text}

    analysis_id = submit_response.json()["data"]["id"]

    # Give VirusTotal a few seconds to run the analysis; it may still be queued after this pause
    time.sleep(5)

    # Fetch the analysis result
    analysis_response = requests.get(
        f"https://www.virustotal.com/api/v3/analyses/{analysis_id}",
        headers=headers,
        timeout=30,
    )

    if analysis_response.status_code != 200:
        return {"error": "Failed to fetch analysis", "details": analysis_response.text}

    result = analysis_response.json()

    # Extract the per-category engine counts
    stats = result["data"]["attributes"]["stats"]

    return {
        "url": url,
        "harmless": stats.get("harmless", 0),
        "malicious": stats.get("malicious", 0),
        "suspicious": stats.get("suspicious", 0),
        "undetected": stats.get("undetected", 0)
    }
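
The tool above pauses for a fixed five seconds, which assumes the analysis has finished by then. A possible refinement is to poll until the analysis reports `completed`. The sketch below assumes the analysis JSON exposes `data.attributes.status`; the helper `wait_for_analysis` and its parameters are hypothetical, and the fetch function is passed in so the loop can be tried without network access.

```python
import time
from typing import Callable, Optional


def wait_for_analysis(
    fetch: Callable[[], dict],
    max_attempts: int = 10,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call fetch() until the analysis status is "completed" or attempts run out.

    fetch is assumed to return the parsed JSON of
    GET /api/v3/analyses/{analysis_id}.
    Returns the completed result, or None if it never completed.
    """
    for attempt in range(max_attempts):
        result = fetch()
        status = result.get("data", {}).get("attributes", {}).get("status")
        if status == "completed":
            return result
        # Do not sleep after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return None
```

Inside the tool, `fetch` could be a small lambda wrapping the existing `requests.get(...).json()` call, with a `None` return mapped to an error dictionary like the ones the tool already produces.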


In [None]:
# Do not click on the sample URL below
check_url_virustotal.invoke("https://canarytokens.com/terms/images/about/tnlqkvs3v6ydbh4mlh0hhtao6/index.html")

{'url': 'https://canarytokens.com/terms/images/about/tnlqkvs3v6ydbh4mlh0hhtao6/index.html',
 'harmless': 64,
 'malicious': 6,
 'suspicious': 0,
 'undetected': 27}
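
The raw counts are easier to interpret as a share of all engines. The sketch below adds a total and a flagged ratio to the dictionary the tool returns. The helper `summarize_stats` is hypothetical, it counts only the four categories shown above, and the ratio is a plain proportion rather than any official VirusTotal score.

```python
def summarize_stats(stats: dict) -> dict:
    """Add a total engine count and the share of engines that flagged the URL."""
    categories = ("harmless", "malicious", "suspicious", "undetected")
    total = sum(stats.get(name, 0) for name in categories)
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    ratio = flagged / total if total else 0.0
    return {"total": total, "flagged": flagged, "flagged_ratio": round(ratio, 3)}


sample = {"harmless": 64, "malicious": 6, "suspicious": 0, "undetected": 27}
print(summarize_stats(sample))
# → {'total': 97, 'flagged': 6, 'flagged_ratio': 0.062}
```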

## 5. Build the Agent
Combine the LLM and tools to create the agent.

In [None]:
prompt = """Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""
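
The `Action:` / `Action Input:` lines in this format are what allow the model's text to be turned into a tool call. The sketch below is a simplified illustration of that idea using a regular expression; it is not LangChain's own output parser, which handles more cases, and the function `parse_react_step` is hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches an "Action:" line followed directly by an "Action Input:" line.
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(?P<tool>.+?)\s*\n\s*Action Input\s*:\s*(?P<tool_input>.+)"
)


def parse_react_step(text: str) -> Optional[Tuple[str, str]]:
    """Return (tool_name, tool_input) for one step, or None for a final answer."""
    if "Final Answer:" in text:
        return None
    match = ACTION_PATTERN.search(text)
    if match is None:
        raise ValueError("No Action / Action Input pair found")
    return match.group("tool").strip(), match.group("tool_input").strip()


step = (
    "Thought: I should scan it\n"
    "Action: check_url_virustotal\n"
    "Action Input: https://example.com"
)
print(parse_react_step(step))
# → ('check_url_virustotal', 'https://example.com')
```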

In [None]:
tools = [check_url_virustotal]

In [None]:
agent = create_react_agent(
    llm,
    tools=tools,
    prompt=PromptTemplate.from_template(prompt),
)

## 6. Run the Agent

In [None]:
ex_agent = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=ConversationBufferMemory(),
    verbose=True,
)

In [None]:
response = ex_agent.invoke({
    # Replace the URL below with the one to analyze. Do not click on the sample URL.
    "input": "Analyze this URL: https://canarytokens.com/terms/images/about/tnlqkvs3v6ydbh4mlh0hhtao6/index.html and generate a structured security report"
})



> Entering new AgentExecutor chain...
Action: check_url_virustotal
Action Input: https://canarytokens.com/terms/images/about/tnlqkvs3v6ydbh4mlh0hhtao6/index.html
{'url': 'https://canarytokens.com/terms/images/about/tnlqkvs3v6ydbh4mlh0hhtao6/index.html', 'harmless': 64, 'malicious': 6, 'suspicious': 0, 'undetected': 27}
Final Answer:
**Structured Security Report for URL Analysis**

**1. URL Analyzed:**
`https://canarytokens.com/terms/images/about/tnlqkvs3v6ydbh4mlh0hhtao6/index.html`

**2. Analysis Tool:**
VirusTotal

**3. Overall Verdict:**
**Potentially Malicious/Suspicious**

**4. Detection Statistics:**
*   **Malicious Detections:** 6 engines
*   **Harmless Detections:** 64 engines
*   **Suspicious Detections:** 0 engines
*   **Undetected:** 27 engines
*   **Total Engines Scanned:** 97

**5. Key Findings/Observations:**
*   **Multiple Malicious Detections:** Despite a large number of harmless detections, 6 security engines flagg

In [None]:
print(response['output'])

**Structured Security Report for URL Analysis**

**1. URL Analyzed:**
`https://canarytokens.com/terms/images/about/tnlqkvs3v6ydbh4mlh0hhtao6/index.html`

**2. Analysis Tool:**
VirusTotal

**3. Overall Verdict:**
**Potentially Malicious/Suspicious**

**4. Detection Statistics:**
*   **Malicious Detections:** 6 engines
*   **Harmless Detections:** 64 engines
*   **Suspicious Detections:** 0 engines
*   **Undetected:** 27 engines
*   **Total Engines Scanned:** 97

**5. Key Findings/Observations:**
*   **Multiple Malicious Detections:** Despite a large number of harmless detections, 6 security engines flagged this URL as malicious. This indicates a non-negligible risk.
*   **Canarytokens Context:** The domain `canarytokens.com` is associated with Canarytokens, a service designed to create "tripwires" or "honeypots" to detect unauthorized access or activity. It is possible that some security engines might flag URLs generated by this service due to their nature as potential traps or indicato