<a href="https://colab.research.google.com/github/KaifAhmad1/code-test/blob/main/Enhanced_Cyber_Security_Copilot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Problem Statement

##### Task
Develop a co-pilot for threat researchers, security analysts, and professionals that addresses the limitations of current AI solutions like ChatGPT and Perplexity.

##### Current Challenges
1. **Generic Data**: Existing AI solutions provide generic information that lacks specificity.
2. **Context Understanding**: These solutions fail to understand and maintain context.
3. **Limited Information**: The data sources are often limited and not comprehensive.
4. **Single Source Dependency**: Relying on a single source of information reduces reliability and accuracy.
5. **Inadequate AI Models**: Current models do not meet the specialized needs of cybersecurity professionals.

##### Requirement
Create a chatbot capable of collecting and curating data from multiple sources, starting with search engines, and expanding to website crawling and Twitter scraping.

###### Technical Specifications
- **No Hallucinations**: Ensure the chatbot provides accurate and reliable information.
- **RAG (Retrieval-Augmented Generation)**: Use RAG to determine which connectors to use based on user inputs.
- **Query Chunking and Distribution**: Optimize the process of breaking down queries and distributing them across different sources.
- **Data Curation Steps**:
  1. Collect links from approximately 50 sources.
  2. Aggregate data from websites and Twitter.
  3. Curate data using a knowledge graph to find relationships and generate responses.
- **Chatbot Capabilities**: Answer queries such as:
  - "List all details on {{BFSI}} security incidents in {{India}}."
  - "List all ransomware attacks targeting the healthcare industry in {{last 7 days/last 3 months/last week/last month}}."
  - "Provide recent incidents related to Lockbit Ransomware gang / BlackBasta Ransomware."

##### Goal
Develop a data collector that integrates multiple specific sources to enrich the knowledge base, enabling the model to better understand context and deliver accurate results. The solution should be modular, allowing customization and configuration of sources.

##### Summary
The goal is to build an advanced, modular chatbot for cybersecurity professionals that overcomes the limitations of existing AI solutions by integrating multiple data sources and ensuring context-aware, accurate responses. The chatbot will utilize state-of-the-art techniques like RAG and knowledge graphs to provide comprehensive, curated information from diverse sources.


**Install Dependencies**

In [1]:
# Install Dependencies
%pip install -q apify-client langchain langchain-community langchain-groq networkx pyvis spacy transformers pandas
%pip install -q sentence-transformers requests beautifulsoup4 ratelimit langgraph pyLDAvis faiss-cpu

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m50.4/50.4 kB[0m [31m3.4 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m70.6/70.6 kB[0m [31m5.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m990.3/990.3 kB[0m [31m21.2 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.3/2.3 MB[0m [31m57.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m756.0/756.0 kB[0m [31m35.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m103.5/103.5 kB[0m [31m7.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m75.6/75.6 kB[0m [31m5.6 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m77.9/77.9 kB[0m [31m5.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

**Import Libraries and Set Up Logging**

In [2]:
import os
from datetime import datetime, timedelta
from typing import List, Dict, Any, TypedDict, Annotated
import logging
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
from concurrent.futures import ThreadPoolExecutor, as_completed
from ratelimit import limits, sleep_and_retry
from bs4 import BeautifulSoup
from apify_client import ApifyClient
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import FAISS
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
from langchain_groq import ChatGroq
import json
from langchain.agents import Tool
from langchain.memory import ConversationBufferMemory
from langchain.callbacks import get_openai_callback
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolExecutor
from textblob import TextBlob

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

**Constants and API Keys**

In [3]:
# Constants and API Keys
APIFY_API_KEY = "apify_api_yUkcz99gMX1pwNckRi7EyXLwhVTd0j3m4Mtt"
NEWS_API_KEY = os.getenv("c50f733b00e34575a7c203c38cd97391")
GROQ_API_KEY = "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"
WEBSITES = [
    "https://www.cisa.gov/uscert/ncas/alerts",
    "https://attack.mitre.org/",
    "https://www.darkreading.com/",
    "https://threatpost.com/",
    "https://krebsonsecurity.com/",
    "https://www.bleepingcomputer.com/",
    "https://www.zdnet.com/topic/security/",
    "https://www.securityweek.com/",
    "https://www.sans.org/newsletters/newsbites/",
    "https://www.cyberscoop.com/",
    "https://www.csoonline.com/",
    "https://www.infosecurity-magazine.com/",
    "https://www.wired.com/category/security/",
    "https://www.schneier.com/",
    "https://www.theregister.com/security/",
    "https://thehackernews.com/",
    "https://www.cyberdefensemagazine.com/",
    "https://www.fireeye.com/blog.html",
    "https://unit42.paloaltonetworks.com/",
    "https://www.microsoft.com/security/blog/",
    "https://www.us-cert.gov/ncas/current-activity",
    "https://nakedsecurity.sophos.com/",
    "https://www.recordedfuture.com/blog/",
    "https://www.cybersecurity-insiders.com/",
    "https://www.malwarebytes.com/blog/",
]
RSS_FEEDS = [
    "https://www.cisa.gov/uscert/ncas/alerts.xml",
    "https://krebsonsecurity.com/feed/",
    "https://threatpost.com/feed/",
    "https://www.darkreading.com/rss_simple.asp"
]

**Initialize Apify Client and Configure Requests Session**

In [4]:
# Initialize Apify client
apify_client = ApifyClient(APIFY_API_KEY)

# Configure requests session with retries and timeouts
session = requests.Session()
retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[429, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))
session.mount('http://', HTTPAdapter(max_retries=retries))

**Rate-Limited GET Request**

In [5]:
# Rate-limited GET request
@sleep_and_retry
@limits(calls=15, period=1)  # 5 calls per second
def rate_limited_get(url: str, **kwargs) -> requests.Response:
    return session.get(url, timeout=10, **kwargs)

**Website Scraping Functions and Fetch Data Functions**

In [6]:
# Website Scraping Functions and Fetch Data Functions
def scrape_website(url: str) -> Dict[str, Any]:
    """Scrape a website using BeautifulSoup."""
    try:
        response = rate_limited_get(url)
        if response.status_code == 403:
            logger.warning(f"Access forbidden for {url}")
            return {"url": url, "text": "", "timestamp": datetime.now().isoformat(), "error": "403 Forbidden"}
        elif response.status_code == 404:
            logger.warning(f"Not found for {url}")
            return {"url": url, "text": "", "timestamp": datetime.now().isoformat(), "error": "404 Not Found"}
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        text = soup.get_text(separator=' ', strip=True)
        return {"url": url, "text": text, "timestamp": datetime.now().isoformat()}
    except Exception as e:
        logger.error(f"Error scraping {url}: {str(e)}")
        return {"url": url, "text": "", "timestamp": datetime.now().isoformat(), "error": str(e)}

def scrape_websites(urls: List[str]) -> List[Dict[str, Any]]:
    """Scrape multiple websites concurrently."""
    logger.info(f"Scraping {len(urls)} websites...")
    with ThreadPoolExecutor(max_workers=10) as executor:
        future_to_url = {executor.submit(scrape_website, url): url for url in urls}
        results = [future.result() for future in as_completed(future_to_url)]
    logger.info(f"Successfully scraped {len(results)} pages.")
    return results

def fetch_tweets(query: str, max_tweets: int = 100) -> List[Dict[str, Any]]:
    """Fetch tweets using Apify's Twitter scraper."""
    logger.info(f"Fetching tweets for query: {query}")
    actor_input = {"searchTerms": [query], "maxTweets": max_tweets, "languageCode": "en"}
    try:
        run = apify_client.actor("apidojo/tweet-scraper").call(run_input=actor_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        logger.info(f"Fetched {len(items)} tweets.")
        return items
    except Exception as e:
        logger.error(f"Error fetching tweets: {str(e)}")
        return []

def fetch_news(query: str, max_results: int = 50) -> List[Dict[str, Any]]:
    """Fetch news articles using NewsAPI."""
    logger.info(f"Fetching news for query: {query}")
    url = "https://newsapi.org/v2/everything"
    params = {"q": query, "language": "en", "pageSize": max_results, "apiKey": NEWS_API_KEY, "sortBy": "publishedAt"}
    try:
        response = rate_limited_get(url, params=params)
        if response.status_code == 401:
            logger.warning("Unauthorized access to NewsAPI")
            return []
        response.raise_for_status()
        articles = response.json().get("articles", [])
        logger.info(f"Fetched {len(articles)} news articles.")
        return articles
    except Exception as e:
        logger.error(f"Error fetching news: {str(e)}")
        return []

def fetch_cve_data() -> List[Dict[str, Any]]:
    """Fetch CVE data from CIRCL API."""
    logger.info("Fetching CVE data")
    url = "https://cve.circl.lu/api/last"
    try:
        response = rate_limited_get(url, timeout=30)  # Increase the timeout
        response.raise_for_status()
        cve_items = response.json()
        logger.info(f"Fetched {len(cve_items)} CVE items.")
        return cve_items
    except Exception as e:
        logger.error(f"Error fetching CVE data: {str(e)}")
        return []

def fetch_rss_feeds(urls: List[str]) -> List[Dict[str, Any]]:
    """Fetch RSS feeds using Apify's RSS scraper."""
    logger.info(f"Fetching RSS feeds from {len(urls)} URLs")
    run_input = {"startUrls": urls, "maxItems": 50}
    try:
        run = apify_client.actor("jupri/rss-xml-scraper").call(run_input=run_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        logger.info(f"Fetched {len(items)} RSS feed items.")
        return items
    except Exception as e:
        logger.error(f"Error fetching RSS feeds: {str(e)}")
        return []

# Test data collection
scraped_data = scrape_websites(WEBSITES)
tweets = fetch_tweets("cybersecurity")
news = fetch_news("cybersecurity")
cve_data = fetch_cve_data()
rss_feeds = fetch_rss_feeds(RSS_FEEDS)
logger.info(f"Scraped data: {scraped_data}")
logger.info(f"Tweets: {tweets}")
logger.info(f"News: {news}")
logger.info(f"CVE data: {cve_data}")
logger.info(f"RSS feeds: {rss_feeds}")

ERROR:__main__:Error fetching CVE data: requests.sessions.Session.get() got multiple values for keyword argument 'timeout'


**Curate Data Function**

In [7]:
# Curate Data Function
def curate_data(website_data, tweets, news, cve_data, rss_feeds):
    """Curate data from various sources."""
    curated_data = []

    for page in website_data:
        curated_data.append({
            "source": "Website",
            "url": page.get("url"),
            "text": page.get("text"),
            "timestamp": page.get("timestamp")
        })

    for tweet in tweets:
        curated_data.append({
            "source": "Twitter",
            "text": tweet.get("text"),
            "user": tweet.get("user"),
            "timestamp": tweet.get("timestamp")
        })

    for article in news:
        curated_data.append({
            "source": "News",
            "url": article.get("url"),
            "title": article.get("title"),
            "description": article.get("description"),
            "timestamp": article.get("publishedAt")
        })

    for cve in cve_data:
        cve_meta = cve.get("cve", {}).get("CVE_data_meta", {})
        description_data = cve.get("cve", {}).get("description", {}).get("description_data", [{}])
        curated_data.append({
            "source": "CVE",
            "cve_id": cve_meta.get("ID"),
            "description": description_data[0].get("value"),
            "timestamp": cve.get("publishedDate")
        })

    for feed in rss_feeds:
        curated_data.append({
            "source": "RSS",
            "url": feed.get("link"),
            "title": feed.get("title"),
            "description": feed.get("description"),
            "timestamp": feed.get("pubDate")
        })

    return curated_data

# Test data curation
curated_data = curate_data(scraped_data, tweets, news, cve_data, rss_feeds)
logger.info(f"Curated data: {curated_data}")

**Process Scraped Data**

In [8]:
# Process Scraped Data
def preprocess_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Preprocess a single data item."""
    processed_item = {
        "source": item["source"],
        "content": "",
        "timestamp": item.get("timestamp", ""),
        "keywords": [],
        "sentiment": 0
    }

    if item["source"] == "Website":
        processed_item["content"] = item.get("text", "")[:500]  # Truncate to first 500 characters
    elif item["source"] == "Twitter":
        processed_item["content"] = item.get("text", "")
    elif item["source"] in ["News", "RSS"]:
        processed_item["content"] = f"{item.get('title', '')} - {item.get('description', '')}"
    elif item["source"] == "CVE":
        processed_item["content"] = f"{item.get('cve_id', '')} - {item.get('description', '')}"

    processed_item["keywords"] = extract_keywords(processed_item["content"])
    processed_item["sentiment"] = perform_sentiment_analysis(processed_item["content"])

    return processed_item

def extract_keywords(text: str, top_n: int = 5) -> List[str]:
    """Extract top keywords from text."""
    words = text.lower().split()
    word_freq = {}
    for word in words:
        if len(word) > 3:  # Ignore short words
            word_freq[word] = word_freq.get(word, 0) + 1
    return sorted(word_freq, key=word_freq.get, reverse=True)[:top_n]

def perform_sentiment_analysis(text: str) -> float:
    """Perform sentiment analysis on text."""
    return TextBlob(text).sentiment.polarity

def process_curated_data(curated_data: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Process all curated data items."""
    return [preprocess_item(item) for item in curated_data]

# Test data processing
processed_texts = process_curated_data(curated_data)
logger.info(f"Processed texts: {processed_texts}")

In [9]:
# Store Data in Vector Database
from typing import List, Dict, Any
from langchain.docstore.document import Document
from langchain.vectorstores.faiss import FAISS
from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings
import logging

# Setup logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def store_in_vector_db(processed_data: List[Dict[str, Any]], file_path: str = "vector_store") -> None:
    """Store processed data in a vector database."""
    embeddings = HuggingFaceBgeEmbeddings(
        model_name="BAAI/bge-small-en",
        model_kwargs={"device": "cpu"},
        encode_kwargs={"normalize_embeddings": True}
    )
    # Convert the processed data to Document objects
    documents = [
        Document(page_content=item["content"], metadata=item)
        for item in processed_data
    ]

    vector_store = FAISS.from_documents(documents, embeddings)
    vector_store.save_local(file_path)
    logger.info(f"Vector store saved at {file_path}")

# Test vector store creation
store_in_vector_db(processed_texts)

# Load the vector store with the embeddings
embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True}
)
vector_store = FAISS.load_local("vector_store", embeddings, allow_dangerous_deserialization=True)
logger.info(f"Vector store loaded")

  from tqdm.autonotebook import tqdm, trange
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: BAAI/bge-small-en
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

README.md:   0%|          | 0.00/90.8k [00:00<?, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/52.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/684 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/133M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/366 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/711k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

1_Pooling/config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

INFO:faiss.loader:Loading faiss with AVX2 support.
INFO:faiss.loader:Successfully loaded faiss with AVX2 support.
INFO:root:Vector store saved at vector_store
INFO:sentence_transformers.SentenceTransformer:Load pretrained SentenceTransformer: BAAI/bge-small-en
INFO:root:Vector store loaded


In [10]:
import json
import logging
from typing import List, Dict, Any
from textblob import TextBlob
from langchain.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.callbacks import get_openai_callback
from langchain.tools import Tool
from langgraph.graph import START, END, StateGraph

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def analyze_cve_severity(cve_description: str) -> str:
    """Analyzes the severity of a CVE based on its description."""
    severity_keywords = ["critical", "high", "medium", "low"]
    severity = "unknown"
    for keyword in severity_keywords:
        if keyword in cve_description.lower():
            severity = keyword
            break
    return f"The CVE severity is {severity}."

def extract_iocs(text: str) -> List[str]:
    """Extracts potential Indicators of Compromise (IOCs) from text."""
    # Implement actual IOC extraction logic here
    iocs = ["example.com", "192.168.1.1"]  # Placeholder
    return [ioc for ioc in iocs if ioc in text]

def trend_analysis(data: List[Dict[str, Any]], timeframe: str) -> str:
    """Analyzes cybersecurity trends over a given timeframe."""
    # Implement actual trend analysis logic here
    return f"Trend analysis for the timeframe {timeframe} shows increasing threats."

def sentiment_analysis(text: str) -> str:
    """Analyzes the sentiment of a given text."""
    sentiment = TextBlob(text).sentiment.polarity
    if sentiment > 0:
        return "Positive sentiment"
    elif sentiment < 0:
        return "Negative sentiment"
    else:
        return "Neutral sentiment"

def topic_modeling(texts: List[str], num_topics: int = 5) -> List[str]:
    """Performs topic modeling on a collection of texts."""
    # Implement actual topic modeling logic here
    topics = ["Cybersecurity", "Threats", "Vulnerabilities", "Attacks", "Defense"]
    return topics[:num_topics]

In [11]:
def define_tools(vector_store: FAISS, scraped_data: List[Dict[str, Any]]) -> List[Tool]:
    return [
        Tool(
            name="Search",
            func=lambda q: vector_store.similarity_search(q, k=3),
            description="Useful for searching information in the knowledge base"
        ),
        Tool(
            name="Summarize",
            func=lambda q: llm.predict(f"Summarize the following text:\n{q}"),
            description="Useful for summarizing long pieces of text"
        ),
        Tool(
            name="Analyze CVE Severity",
            func=analyze_cve_severity,
            description="Analyzes the severity of a CVE based on its description"
        ),
        Tool(
            name="Extract IOCs",
            func=extract_iocs,
            description="Extracts potential Indicators of Compromise (IOCs) from text"
        ),
        Tool(
            name="Trend Analysis",
            func=lambda timeframe: trend_analysis(scraped_data, timeframe),
            description="Analyzes cybersecurity trends over a given timeframe (week, month, or 3months)"
        ),
        Tool(
            name="Sentiment Analysis",
            func=sentiment_analysis,
            description="Analyzes the sentiment of a given text"
        ),
        Tool(
            name="Topic Modeling",
            func=lambda texts: topic_modeling(texts, num_topics=5),
            description="Performs topic modeling on a collection of texts"
        )
    ]
# Test agent tools
tools = define_tools(vector_store, scraped_data)
logger.info(f"Tools defined: {tools}")

INFO:__main__:Tools defined: [Tool(name='Search', description='Useful for searching information in the knowledge base', func=<function define_tools.<locals>.<lambda> at 0x7a6dc3b80b80>), Tool(name='Summarize', description='Useful for summarizing long pieces of text', func=<function define_tools.<locals>.<lambda> at 0x7a6dc3b80c10>), Tool(name='Analyze CVE Severity', description='Analyzes the severity of a CVE based on its description', func=<function analyze_cve_severity at 0x7a6dc3b808b0>), Tool(name='Extract IOCs', description='Extracts potential Indicators of Compromise (IOCs) from text', func=<function extract_iocs at 0x7a6dc3b80790>), Tool(name='Trend Analysis', description='Analyzes cybersecurity trends over a given timeframe (week, month, or 3months)', func=<function define_tools.<locals>.<lambda> at 0x7a6dc3b80ca0>), Tool(name='Sentiment Analysis', description='Analyzes the sentiment of a given text', func=<function sentiment_analysis at 0x7a6dc3b80820>), Tool(name='Topic Model

In [12]:
def create_multi_agent_system(tools: List[Tool]):
    workflow = StateGraph(AgentState)

    # Add agent nodes
    workflow.add_node("researcher", researcher_agent)
    workflow.add_node("analyst", analyst_agent)
    workflow.add_node("advisor", advisor_agent)
    workflow.add_node("threat_hunter", threat_hunter_agent)
    workflow.add_node("incident_responder", incident_responder_agent)

    # Add conditional edges using select_next_agent
    for node in ["researcher", "analyst", "advisor", "threat_hunter", "incident_responder"]:
        workflow.add_conditional_edges(
            node,
            select_next_agent,
            {
                "researcher": "researcher",
                "analyst": "analyst",
                "advisor": "advisor",
                "threat_hunter": "threat_hunter",
                "incident_responder": "incident_responder"
            }
        )

    # Set the entrypoint
    workflow.set_entry_point("researcher")

    # Compile the graph
    return workflow.compile()

# Test multi-agent system
multi_agent_system = create_multi_agent_system(tools)
logger.info(f"Multi-agent system created")

NameError: name 'AgentState' is not defined

In [13]:
# Main Function
def main(scraped_data: List[Dict[str, Any]]):
    try:
        processed_texts = process_curated_data(scraped_data)
        vector_store = store_in_vector_db(processed_texts)
        tools = define_tools(vector_store, scraped_data)
        multi_agent_system = create_multi_agent_system(tools)

        logger.info("Enhanced Cybersecurity Multi-Agent system initialized successfully.")

        memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

        queries = [
            "Assess the vulnerability CVE-2024-12345 in Windows Server.",
            "Provide a security recommendation for mitigating phishing attacks.",
            "List all details on BFSI security incidents in India.",
            "List all ransomware attacks targeting the healthcare industry in the last 7 days.",
            "Provide recent incidents related to Lockbit Ransomware gang.",
            "Provide recent incidents related to BlackBasta Ransomware."
        ]

        for query in queries:
            print(f"\nQuery: {query}")
            with get_openai_callback() as cb:
                initial_state = AgentState(
                    messages=[{"role": "human", "content": query}],
                    current_agent="researcher",
                    scratchpad=[]
                )
                final_state = multi_agent_system.invoke(initial_state)

                # Process and display the final response
                final_response = final_state['messages'][-1]['content']
                print(f"Response: {final_response}")

                logger.info(f"Tokens used: {cb.total_tokens}")
                logger.info(f"Cost of query: ${cb.total_cost:.4f}")

    except Exception as e:
        logger.error(f"An error occurred: {str(e)}")

if __name__ == "__main__":
    try:
        # Load your scraped data here
        with open('scraped_data.json', 'r') as f:
            scraped_data = json.load(f)
        main(scraped_data)
    except FileNotFoundError:
        logger.error("scraped_data.json file not found. Please ensure the file exists in the current directory.")
    except json.JSONDecodeError:
        logger.error("Error decoding JSON from scraped_data.json. Please ensure the file contains valid JSON.")
    except Exception as e:
        logger.error(f"An unexpected error occurred: {str(e)}")

ERROR:__main__:scraped_data.json file not found. Please ensure the file exists in the current directory.
