<a href="https://colab.research.google.com/github/KaifAhmad1/code-test/blob/main/Enhanced_Cyber_Security_Copilot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#### Problem Statement

##### Task
Develop a co-pilot for threat researchers, security analysts, and professionals that addresses the limitations of current AI solutions like ChatGPT and Perplexity.

##### Current Challenges
1. **Generic Data**: Existing AI solutions provide generic information that lacks specificity.
2. **Context Understanding**: These solutions fail to understand and maintain context.
3. **Limited Information**: The data sources are often limited and not comprehensive.
4. **Single Source Dependency**: Relying on a single source of information reduces reliability and accuracy.
5. **Inadequate AI Models**: Current models do not meet the specialized needs of cybersecurity professionals.

##### Requirement
Create a chatbot capable of collecting and curating data from multiple sources, starting with search engines, and expanding to website crawling and Twitter scraping.

###### Technical Specifications
- **No Hallucinations**: Ensure the chatbot provides accurate and reliable information.
- **RAG (Retrieval-Augmented Generation)**: Use RAG to determine which connectors to use based on user inputs.
- **Query Chunking and Distribution**: Optimize the process of breaking down queries and distributing them across different sources.
- **Data Curation Steps**:
  1. Collect links from approximately 50 sources.
  2. Aggregate data from websites and Twitter.
  3. Curate data using a knowledge graph to find relationships and generate responses.
- **Chatbot Capabilities**: Answer queries such as:
  - "List all details on {{BFSI}} security incidents in {{India}}."
  - "List all ransomware attacks targeting the healthcare industry in {{last 7 days/last 3 months/last week/last month}}."
  - "Provide recent incidents related to Lockbit Ransomware gang / BlackBasta Ransomware."

##### Goal
Develop a data collector that integrates multiple specific sources to enrich the knowledge base, enabling the model to better understand context and deliver accurate results. The solution should be modular, allowing customization and configuration of sources.

##### Summary
The goal is to build an advanced, modular chatbot for cybersecurity professionals that overcomes the limitations of existing AI solutions by integrating multiple data sources and ensuring context-aware, accurate responses. The chatbot will utilize state-of-the-art techniques like RAG and knowledge graphs to provide comprehensive, curated information from diverse sources.


In [13]:
%pip install -q apify-client langchain langchain-community langchain-groq networkx pyvis spacy transformers pandas
%pip install -q sentence-transformers requests beautifulsoup4 ratelimit langgraph

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/102.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m102.6/102.6 kB[0m [31m4.8 MB/s[0m eta [36m0:00:00[0m
[?25h

In [14]:
import os
from datetime import datetime, timedelta
from typing import List, Dict, Any, Annotated, TypedDict
import logging
import requests
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.util.retry import Retry
from concurrent.futures import ThreadPoolExecutor, as_completed
from ratelimit import limits, sleep_and_retry
from bs4 import BeautifulSoup
from apify_client import ApifyClient
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
from langchain_groq import ChatGroq
import networkx as nx
from pyvis.network import Network
import spacy
from transformers import pipeline
import json
from langchain.agents import Tool
from langchain.memory import ConversationBufferMemory
from langchain.callbacks import get_openai_callback
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolExecutor

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

In [15]:
# Constants
APIFY_API_KEY = "apify_api_yUkcz99gMX1pwNckRi7EyXLwhVTd0j3m4Mtt"
NEWS_API_KEY = os.getenv("c50f733b00e34575a7c203c38cd97391")
GROQ_API_KEY = "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"
WEBSITES = [
    "https://www.cisa.gov/uscert/ncas/alerts",
    "https://attack.mitre.org/",
    "https://www.darkreading.com/",
    "https://threatpost.com/",
    "https://krebsonsecurity.com/",
    "https://www.bleepingcomputer.com/",
    "https://www.zdnet.com/topic/security/",
    "https://www.securityweek.com/",
    "https://www.sans.org/newsletters/newsbites/",
    "https://www.cyberscoop.com/",
    "https://www.csoonline.com/",
    "https://www.infosecurity-magazine.com/",
    "https://www.wired.com/category/security/",
    "https://www.schneier.com/",
    "https://www.theregister.com/security/",
    "https://thehackernews.com/",
    "https://www.cyberdefensemagazine.com/",
    "https://www.fireeye.com/blog.html",
    "https://unit42.paloaltonetworks.com/",
    "https://www.microsoft.com/security/blog/",
    "https://www.us-cert.gov/ncas/current-activity",
    "https://nakedsecurity.sophos.com/",
    "https://www.recordedfuture.com/blog/",
    "https://www.cybersecurity-insiders.com/",
    "https://www.malwarebytes.com/blog/",
]
RSS_FEEDS = [
    "https://www.cisa.gov/uscert/ncas/alerts.xml",
    "https://krebsonsecurity.com/feed/",
    "https://threatpost.com/feed/",
    "https://www.darkreading.com/rss_simple.asp"
]

In [18]:
# Initialize Apify client
apify_client = ApifyClient(APIFY_API_KEY)

# Configure requests session with retries and timeouts
session = requests.Session()
retries = Retry(total=5, backoff_factor=0.1, status_forcelist=[429, 500, 502, 503, 504])
session.mount('https://', HTTPAdapter(max_retries=retries))
session.mount('http://', HTTPAdapter(max_retries=retries))

In [19]:
# Rate-limited GET request
@sleep_and_retry
@limits(calls=15, period=1)  # 5 calls per second
def rate_limited_get(url: str, **kwargs) -> requests.Response:
    return session.get(url, timeout=10, **kwargs)

In [20]:
# Website scraping using Apify actor
def scrape_website_with_apify(url: str) -> Dict[str, Any]:
    logger.info(f"Scraping {url} with Apify...")
    try:
        actor_input = {
            "url": url,
            "proxyConfiguration": {"useApifyProxy": True}
        }
        run = apify_client.actor("apify/website-content-crawler").call(run_input=actor_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        if items:
            return {"url": url, "text": items[0].get("text", ""), "timestamp": datetime.now().isoformat()}
        else:
            return {"url": url, "text": "", "timestamp": datetime.now().isoformat(), "error": "No content found"}
    except Exception as e:
        logger.error(f"Error scraping {url} with Apify: {str(e)}")
        return {"url": url, "text": "", "timestamp": datetime.now().isoformat(), "error": str(e)}

# Website scraping
def scrape_website(url: str) -> Dict[str, Any]:
    try:
        response = rate_limited_get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        text = soup.get_text(separator=' ', strip=True)
        return {"url": url, "text": text, "timestamp": datetime.now().isoformat()}
    except Exception as e:
        logger.error(f"Error scraping {url}: {str(e)}")
        return {"url": url, "text": "", "timestamp": datetime.now().isoformat(), "error": str(e)}

def scrape_websites(urls: List[str]) -> List[Dict[str, Any]]:
    logger.info(f"Scraping {len(urls)} websites...")
    with ThreadPoolExecutor(max_workers=10) as executor:
        future_to_url = {executor.submit(scrape_website, url): url for url in urls}
        results = [future.result() for future in as_completed(future_to_url)]
    logger.info(f"Successfully scraped {len(results)} pages.")
    return results

# Fetch tweets
def fetch_tweets(query: str, max_tweets: int = 100) -> List[Dict[str, Any]]:
    logger.info(f"Fetching tweets for query: {query}")
    actor_input = {
        "searchTerms": [query],
        "maxTweets": max_tweets,
        "languageCode": "en"
    }
    try:
        run = apify_client.actor("apidojo/tweet-scraper").call(run_input=actor_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        logger.info(f"Fetched {len(items)} tweets.")
        return items
    except Exception as e:
        logger.error(f"Error fetching tweets: {str(e)}")
        return []

# Fetch news articles
def fetch_news(query: str, max_results: int = 50) -> List[Dict[str, Any]]:
    logger.info(f"Fetching news for query: {query}")
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": query,
        "language": "en",
        "pageSize": max_results,
        "apiKey": NEWS_API_KEY,
        "sortBy": "publishedAt"
    }
    try:
        response = rate_limited_get(url, params=params)
        response.raise_for_status()
        articles = response.json().get("articles", [])
        logger.info(f"Fetched {len(articles)} news articles.")
        return articles
    except Exception as e:
        logger.error(f"Error fetching news: {str(e)}")
        return []

# Scrape Reddit
def scrape_reddit(query: str, max_results: int = 100) -> List[Dict[str, Any]]:
    logger.info(f"Scraping Reddit for: {query}")
    actor_input = {
        "searchTerms": [query],
        "maxPosts": max_results
    }
    try:
        run = apify_client.actor("comchat/reddit-api-scraper").call(run_input=actor_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        logger.info(f"Fetched {len(items)} Reddit posts.")
        return items
    except Exception as e:
        logger.error(f"Error scraping Reddit: {str(e)}")
        return []

# Fetch CVE data
def fetch_cve_data() -> List[Dict[str, Any]]:
    logger.info("Fetching CVE data")
    url = "https://cve.circl.lu/api/last"
    try:
        response = rate_limited_get(url)
        response.raise_for_status()
        cve_items = response.json()
        logger.info(f"Fetched {len(cve_items)} CVE items.")
        return cve_items
    except Exception as e:
        logger.error(f"Error fetching CVE data: {str(e)}")
        return []

# Fetch Google News articles
def fetch_google_news(query: str, max_results: int = 50) -> List[Dict[str, Any]]:
    logger.info(f"Fetching Google News for: {query}")
    actor_input = {
        "queries": query,
        "maxPagesPerQuery": max_results
    }
    try:
        run = apify_client.actor("lhotanova/google-news-scraper").call(run_input=actor_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        logger.info(f"Fetched {len(items)} Google News articles.")
        return items
    except Exception as e:
        logger.error(f"Error fetching Google News: {str(e)}")
        return []

# Fetch Bing search results
def fetch_bing_search(query: str, max_results: int = 50) -> List[Dict[str, Any]]:
    logger.info(f"Fetching Bing search results for: {query}")
    actor_input = {
        "queries": query,
        "maxPagesPerQuery": max_results
    }
    try:
        run = apify_client.actor("curious_coder/bing-search-scraper").call(run_input=actor_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        logger.info(f"Fetched {len(items)} Bing search results.")
        return items
    except Exception as e:
        logger.error(f"Error fetching Bing search results: {str(e)}")
        return []

# Fetch LinkedIn posts
def fetch_linkedin_posts(query: str, max_posts: int = 100) -> List[Dict[str, Any]]:
    logger.info(f"Fetching LinkedIn posts for query: {query}")
    actor_input = {
        "searchTerms": [query],
        "maxPosts": max_posts,
        "languageCode": "en"
    }
    try:
        run = apify_client.actor("curious_coder/linkedin-post-search-scraper").call(run_input=actor_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        logger.info(f"Fetched {len(items)} LinkedIn posts.")
        return items
    except Exception as e:
        logger.error(f"Error fetching LinkedIn posts: {str(e)}")
        return []

# Fetch RSS feeds
def fetch_rss_feeds(urls: List[str]) -> List[Dict[str, Any]]:
    logger.info(f"Fetching RSS feeds from {len(urls)} URLs")
    run_input = {
        "startUrls": urls,
        "maxItems": 50
    }
    try:
        run = apify_client.actor("jupri/rss-xml-scraper").call(run_input=run_input)
        dataset_id = run["defaultDatasetId"]
        items = apify_client.dataset(dataset_id).list_items().items
        logger.info(f"Fetched {len(items)} RSS feed items.")
        return items
    except Exception as e:
        logger.error(f"Error fetching RSS feeds: {str(e)}")
        return []

In [21]:
# Curate data from various sources
def curate_data(website_data, tweets, news, reddit_posts, cve_data, google_news, bing_results, linkedin_posts, rss_feeds):
    curated_data = []

    # Process and curate data from websites
    for page in website_data:
        curated_data.append({
            "source": "Website",
            "url": page.get("url"),
            "text": page.get("text"),
            "timestamp": page.get("timestamp")
        })

    # Process and curate data from Twitter
    for tweet in tweets:
        curated_data.append({
            "source": "Twitter",
            "text": tweet.get("text"),
            "user": tweet.get("user"),
            "timestamp": tweet.get("timestamp")
        })

    # Process and curate data from news articles
    for article in news:
        curated_data.append({
            "source": "News",
            "url": article.get("url"),
            "title": article.get("title"),
            "description": article.get("description"),
            "timestamp": article.get("publishedAt")
        })

    # Process and curate data from Reddit posts
    for post in reddit_posts:
        curated_data.append({
            "source": "Reddit",
            "url": post.get("url"),
            "title": post.get("title"),
            "selftext": post.get("selftext"),
            "timestamp": post.get("created_utc")
        })

    # Process and curate data from CVE data
    for cve in cve_data:
        cve_meta = cve.get("cve", {}).get("CVE_data_meta", {})
        description_data = cve.get("cve", {}).get("description", {}).get("description_data", [{}])
        curated_data.append({
            "source": "CVE",
            "cve_id": cve_meta.get("ID"),
            "description": description_data[0].get("value"),
            "timestamp": cve.get("publishedDate")
        })

    # Process and curate data from Google News articles
    for article in google_news:
        curated_data.append({
            "source": "Google News",
            "url": article.get("url"),
            "title": article.get("title"),
            "description": article.get("description"),
            "timestamp": article.get("publishedAt")
        })

    # Process and curate data from Bing search results
    for item in bing_results:
        curated_data.append({
            "source": "Bing",
            "url": item.get("url"),
            "title": item.get("title"),
            "snippet": item.get("snippet"),
            "timestamp": item.get("timestamp")
        })

    # Process and curate data from LinkedIn posts
    for post in linkedin_posts:
        curated_data.append({
            "source": "LinkedIn",
            "text": post.get("text"),
            "user": post.get("user"),
            "timestamp": post.get("timestamp")
        })

    # Process and curate data from RSS feeds
    for feed in rss_feeds:
        curated_data.append({
            "source": "RSS",
            "url": feed.get("link"),
            "title": feed.get("title"),
            "description": feed.get("description"),
            "timestamp": feed.get("pubDate")
        })

    return curated_data

In [22]:
# Define tags and queries
tags = [
    "malware", "ransomware", "threat", "cybersecurity", "phishing",
    "data breach", "DDoS attack", "APT", "zero-day", "exploit",
    "vulnerability", "incident response", "threat intelligence",
    "SIEM", "EDR", "XDR", "cloud security", "IoT security",
    "AI security", "blockchain security", "cryptography",
    "network security", "application security", "DevSecOps",
    "container security", "Kubernetes security", "SOAR",
    "threat hunting", "OSINT", "penetration testing",
    "red teaming", "blue teaming", "purple teaming",
    "cyber insurance", "compliance", "GDPR", "HIPAA",
    "PCI DSS", "NIST", "ISO 27001", "zero trust",
    "passwordless", "biometrics", "MFA", "IAM", "PAM",
    "cyber resilience", "cyber hygiene", "security awareness",
    "social engineering", "insider threat", "supply chain attack",
    "quantum computing", "post-quantum cryptography", "5G security",
    "OT security", "ICS security", "SCADA security", "mobile security",
    "endpoint security", "email security", "web security",
    "API security", "CASB", "CWPP", "CSPM", "CNAPP",
    "cyber warfare", "cyber espionage", "hacktivism", "cyber terrorism",
    "cyber crime", "dark web", "threat actor", "nation-state attack",
    "latest cybersecurity incidents", "recent cyber attacks", "real-time threats",
    "emerging vulnerabilities", "critical infrastructure security", "cyber defense",
    "cybersecurity trends", "cybersecurity news", "cybersecurity alerts",
    "cybersecurity updates", "cybersecurity bulletins", "cybersecurity advisories",
    "cybersecurity reports", "cybersecurity analysis", "cybersecurity research"
]

queries = [
    "cybersecurity threats",
    "vulnerability assessment",
    "latest security updates",
    "List all details on {{BFSI}} security incidents in {{India}}.",
    "List all ransomware attacks targeting the healthcare industry in {{last 7 days/last 3 months/last week/last month}}.",
    "Provide recent incidents related to Lockbit Ransomware gang / BlackBasta Ransomware.",
    "Recent data breaches",
    "Latest phishing campaigns",
    "Real-time cybersecurity alerts",
    "Emerging cyber threats",
    "Critical infrastructure security incidents",
    "Recent DDoS attacks",
    "Latest zero-day vulnerabilities",
    "Recent APT activities",
    "Latest cybersecurity news",
    "Recent cybersecurity trends",
    "Latest cybersecurity advisories",
    "Recent cybersecurity bulletins",
    "Latest cybersecurity reports",
    "Recent cybersecurity research",
    "Latest cybersecurity analysis"
]

# Main function to orchestrate the process
def main():
    for query in queries:
        website_data = scrape_websites(WEBSITES)
        tweets = fetch_tweets(query)
        news = fetch_news(query)
        reddit_posts = scrape_reddit(query)
        cve_data = fetch_cve_data()
        google_news = fetch_google_news(query)
        bing_results = fetch_bing_search(query)
        linkedin_posts = fetch_linkedin_posts(query)
        rss_feeds = fetch_rss_feeds(RSS_FEEDS)

        curated_data = curate_data(
            website_data, tweets, news, reddit_posts, cve_data,
            google_news, bing_results, linkedin_posts, rss_feeds
        )

        for data in curated_data:
            print(json.dumps(data, indent=2))

if __name__ == "__main__":
    main()

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=cybersecurity+threats&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main__:Er

{
  "source": "Website",
  "url": "https://www.bleepingcomputer.com/",
  "text": "",
  "timestamp": "2024-07-30T12:15:48.300594"
}
{
  "source": "Website",
  "url": "https://www.securityweek.com/",
  "text": "",
  "timestamp": "2024-07-30T12:15:48.372711"
}
{
  "source": "Website",
  "url": "https://www.darkreading.com/",
  "text": "Dark Reading | Security | Protect The Business Dark Reading is part of the Informa Tech Division of Informa PLC Informa PLC | ABOUT US | INVESTOR RELATIONS | TALENT This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales and Scotlan. Number 8860726. Black Hat News Omdia Cybersecurity Newsletter Sign-Up Newsletter Sign-Up Cybersecurity Topics Related Topics Application Security Cybersecurity Careers Cloud Security Cyber Risk Cyberattacks & Data Breaches Cybersecurity Analytics Cybersecurity Operations Data Pr

ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=vulnerability+assessment&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main__

{
  "source": "Website",
  "url": "https://www.securityweek.com/",
  "text": "",
  "timestamp": "2024-07-30T12:17:15.691947"
}
{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communica

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=latest+security+updates&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main__:

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=List+all+details+on+%7B%7BBFSI%7D%7D+security+incidents+in+%7B%7BIndia%7D%7D.&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not val

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=List+all+ransomware+attacks+targeting+the+healthcare+industry+in+%7B%7Blast+7+days%2Flast+3+months%2Flast+week%2Flast+month%7D%7D.&language=en&pageSize=50&sortBy=publishedAt
E

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Provide+recent+incidents+related+to+Lockbit+Ransomware+gang+%2F+BlackBasta+Ransomware.&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input i

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Recent+data+breaches&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main__:Err

{
  "source": "Website",
  "url": "https://www.bleepingcomputer.com/",
  "text": "",
  "timestamp": "2024-07-30T12:25:33.018765"
}
{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Commu

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Latest+phishing+campaigns&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main_

{
  "source": "Website",
  "url": "https://www.bleepingcomputer.com/",
  "text": "",
  "timestamp": "2024-07-30T12:26:59.453417"
}
{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Commu

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Real-time+cybersecurity+alerts&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Emerging+cyber+threats&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main__:E

{
  "source": "Website",
  "url": "https://www.bleepingcomputer.com/",
  "text": "",
  "timestamp": "2024-07-30T12:29:51.541696"
}
{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Commu

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Critical+infrastructure+security+incidents&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is requi

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Recent+DDoS+attacks&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main__:Erro

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Latest+zero-day+vulnerabilities&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:_

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Recent+APT+activities&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main__:Er

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Latest+cybersecurity+news&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__main_

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Recent+cybersecurity+trends&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__mai

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Latest+cybersecurity+advisories&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:_

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Recent+cybersecurity+bulletins&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__

{
  "source": "Website",
  "url": "https://www.securityweek.com/",
  "text": "",
  "timestamp": "2024-07-30T12:41:18.884344"
}
{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communica

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Latest+cybersecurity+reports&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__ma

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Recent+cybersecurity+research&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__m

{
  "source": "Website",
  "url": "https://www.bleepingcomputer.com/",
  "text": "",
  "timestamp": "2024-07-30T12:44:11.775386"
}
{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Commu

ERROR:__main__:Error scraping https://www.securityweek.com/: 403 Client Error: Forbidden for url: https://www.securityweek.com/
ERROR:__main__:Error scraping https://www.bleepingcomputer.com/: 403 Client Error: Forbidden for url: https://www.bleepingcomputer.com/
ERROR:__main__:Error scraping https://www.theregister.com/security/: 403 Client Error: Forbidden for url: https://www.theregister.com/security/
ERROR:__main__:Error scraping https://www.cybersecurity-insiders.com/: 403 Client Error: Forbidden for url: https://www.cybersecurity-insiders.com/
ERROR:__main__:Error scraping https://www.us-cert.gov/ncas/current-activity: 404 Client Error: Not Found for url: https://www.cisa.gov/ncas/current-activity
ERROR:__main__:Error fetching news: 401 Client Error: Unauthorized for url: https://newsapi.org/v2/everything?q=Latest+cybersecurity+analysis&language=en&pageSize=50&sortBy=publishedAt
ERROR:__main__:Error scraping Reddit: Input is not valid: Field input.searchList is required
ERROR:__m

{
  "source": "Website",
  "url": "https://www.cisa.gov/uscert/ncas/alerts",
  "text": "Cybersecurity Alerts & Advisories | CISA Skip to main content An official website of the United States government Here\u2019s how you know Here\u2019s how you know Official websites use .gov A .gov website belongs to an official government organization in the United States. Secure .gov websites use HTTPS A lock ( Lock A locked padlock ) or https:// means you\u2019ve safely connected to the .gov website. Share sensitive information only on official, secure websites. Free Cyber Services #protect2024 Secure Our World Shields Up Report A Cyber Issue Search Menu Close Topics Topics Cybersecurity Best Practices Cyber Threats and Advisories Critical Infrastructure Security and Resilience Election Security Emergency Communications Industrial Control Systems Information and Communications Technology Supply Chain Security Partnerships and Collaboration Physical Security Risk Management How can we help? Govern

In [23]:
# Function to process the scraped data
def process_scraped_data(data: List[Dict[str, Any]]) -> List[str]:
    processed_texts = []
    for item in data:
        if item["source"] == "Website":
            processed_texts.append(f"Website Content ({item['url']}): {item['text']}")
        elif item["source"] in ["Twitter", "LinkedIn"]:
            processed_texts.append(f"{item['source']} Post: {item['text']} - Posted by {item['user']} at {item['timestamp']}")
        elif item["source"] in ["News", "Google News", "RSS"]:
            processed_texts.append(f"{item['source']} Article: {item['title']} - {item['description']} (Published: {item['timestamp']})")
        elif item["source"] == "Reddit":
            processed_texts.append(f"Reddit Post: {item['title']} - {item['selftext']} (Posted: {item['timestamp']})")
        elif item["source"] == "CVE":
            processed_texts.append(f"CVE: {item['cve_id']} - {item['description']} (Published: {item['timestamp']})")
        elif item["source"] == "Bing":
            processed_texts.append(f"Bing Search Result: {item['title']} - {item['snippet']} (Indexed: {item['timestamp']})")
    return processed_texts

In [None]:
# Initialize HuggingFace embeddings
embeddings = HuggingFaceBgeEmbeddings(
    model_name="BAAI/bge-small-en",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"normalize_embeddings": True}
)

# Initialize Llama-3.1 from Meta using Groq LPU Inference
llm = ChatGroq(
    temperature=0,
    model="llama-3.1-70b-versatile",
    api_key=GROQ_API_KEY
)

# Define system and human messages
system_message = """You are an expert cybersecurity analyst with extensive knowledge in threat analysis,
vulnerability assessment, and security recommendations. Provide detailed, precise, and actionable insights.
Always consider the latest threat intelligence and best practices in your analysis."""
prompt_template = ChatPromptTemplate.from_messages([("system", system_message), ("human", "{text}")])

In [None]:
import logging
from pyvis.network import Network
import networkx as nx

# Configure logger
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class KnowledgeGraph:
    def __init__(self):
        self.graph = nx.DiGraph()

    def add_node(self, node, **attrs):
        self.graph.add_node(node, **attrs)

    def add_edge(self, u, v, **attrs):
        self.graph.add_edge(u, v, **attrs)

    def visualize(self, output_file):
        net = Network(notebook=True)
        for node in self.graph.nodes(data=True):
            net.add_node(node[0], title=node[1].get('title', node[0]))
        for edge in self.graph.edges(data=True):
            net.add_edge(edge[0], edge[1], title=edge[2].get('relation', ''))
        net.show(output_file)
        logger.info(f"Knowledge graph visualized at {output_file}")

# Initialize knowledge graph
kg = KnowledgeGraph()
# Visualize the graph
kg.visualize("knowledge_graph.html")

In [None]:
# Function to create vector store
def create_vector_store(texts: List[str]) -> FAISS:
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    docs = text_splitter.create_documents(texts)
    vector_store = FAISS.from_documents(docs, embeddings)
    return vector_store

In [None]:
# Advanced cybersecurity analysis tools
def analyze_cve_severity(cve_description: str) -> str:
    severity_keywords = {
        "critical": 10, "high": 7, "medium": 5, "low": 3,
        "remote code execution": 9, "privilege escalation": 8,
        "denial of service": 6, "information disclosure": 4
    }

    description_lower = cve_description.lower()
    max_severity = max(score for keyword, score in severity_keywords.items() if keyword in description_lower)

    if max_severity >= 9:
        return f"Critical (Score: {max_severity})"
    elif max_severity >= 7:
        return f"High (Score: {max_severity})"
    elif max_severity >= 4:
        return f"Medium (Score: {max_severity})"
    else:
        return f"Low (Score: {max_severity})"

def extract_iocs(text: str) -> Dict[str, List[str]]:
    iocs = {
        "ip_addresses": [],
        "domains": [],
        "hashes": []
    }

    words = text.split()
    for word in words:
        if word.count('.') == 3 and all(part.isdigit() for part in word.split('.')):
            iocs["ip_addresses"].append(word)
        elif '.' in word and not word[0].isdigit():
            iocs["domains"].append(word)
        elif len(word) in [32, 40, 64] and all(c in '0123456789abcdefABCDEF' for c in word):
            iocs["hashes"].append(word)

    return iocs

def trend_analysis(data: List[Dict[str, Any]], timeframe: str) -> str:
    keywords = ["ransomware", "phishing", "data breach", "malware", "zero-day"]
    timeframe_days = {"week": 7, "month": 30, "3months": 90}

    if timeframe not in timeframe_days:
        return "Invalid timeframe. Please use 'week', 'month', or '3months'."

    cutoff_date = datetime.now() - timedelta(days=timeframe_days[timeframe])
    recent_data = [item for item in data if datetime.fromisoformat(item['timestamp']) > cutoff_date]

    keyword_counts = {keyword: sum(1 for item in recent_data if keyword in (item.get('text', '') + item.get('title', '') + item.get('description', '')).lower()) for keyword in keywords}

    sorted_trends = sorted(keyword_counts.items(), key=lambda x: x[1], reverse=True)
    trend_report = f"Top cybersecurity trends in the last {timeframe}:\n"
    trend_report += "\n".join(f"- {keyword.capitalize()}: {count} mentions" for keyword, count in sorted_trends)

    return trend_report


In [None]:
# Define the agent's tools
def define_tools(vector_store: FAISS, scraped_data: List[Dict[str, Any]]) -> List[Tool]:
    return [
        Tool(
            name="Search",
            func=lambda q: vector_store.similarity_search(q, k=3),
            description="Useful for searching information in the knowledge base"
        ),
        Tool(
            name="Summarize",
            func=lambda q: llm.predict(f"Summarize the following text:\n{q}"),
            description="Useful for summarizing long pieces of text"
        ),
        Tool(
            name="Analyze CVE Severity",
            func=analyze_cve_severity,
            description="Analyzes the severity of a CVE based on its description"
        ),
        Tool(
            name="Extract IOCs",
            func=extract_iocs,
            description="Extracts potential Indicators of Compromise (IOCs) from text"
        ),
        Tool(
            name="Trend Analysis",
            func=lambda timeframe: trend_analysis(scraped_data, timeframe),
            description="Analyzes cybersecurity trends over a given timeframe (week, month, or 3months)"
        )
    ]

In [None]:
# Define agent types
class AgentState(TypedDict):
    messages: Annotated[List[Dict[str, str]], "The messages in the conversation"]
    current_agent: Annotated[str, "The current agent processing the message"]
    scratchpad: Annotated[List[Dict[str, str]], "The agent's scratchpad"]

# Define agent nodes
def create_agent_node(role: str, system_message: str):
    def agent_function(state: AgentState, tools: List[Tool]):
        messages = state['messages']
        prompt = ChatPromptTemplate.from_messages([
            ("system", system_message),
            ("human", "{input}"),
            ("human", "Thought: {agent_scratchpad}")
        ])
        chain = LLMChain(llm=llm, prompt=prompt)
        result = chain.run(input=messages[-1]['content'], agent_scratchpad=state['scratchpad'])
        return {**state, "messages": messages + [{"role": "assistant", "content": result}]}
    return agent_function

researcher_agent = create_agent_node(
    "researcher",
    "You are a cybersecurity researcher. Your role is to gather and analyze information using the provided tools."
)

analyst_agent = create_agent_node(
    "analyst",
    "You are a cybersecurity analyst. Your role is to interpret data and provide insights based on the information gathered."
)

advisor_agent = create_agent_node(
    "advisor",
    "You are a cybersecurity advisor. Your role is to provide recommendations and action plans based on the analysis."
)

# Define the agent selection function
def select_next_agent(state: AgentState):
    last_message = state['messages'][-1]['content'].lower()
    if "research" in last_message or "information" in last_message:
        return "researcher"
    elif "analyze" in last_message or "interpret" in last_message:
        return "analyst"
    elif "recommend" in last_message or "advise" in last_message:
        return "advisor"
    else:
        return END

In [None]:
# Create the multi-agent system
def create_multi_agent_system(tools: List[Tool]):
    workflow = StateGraph(AgentState)

    # Add agent nodes
    workflow.add_node("researcher", researcher_agent)
    workflow.add_node("analyst", analyst_agent)
    workflow.add_node("advisor", advisor_agent)

    # Add edges
    for node in ["researcher", "analyst", "advisor"]:
        workflow.add_edge(node, select_next_agent)

    # Set the entrypoint
    workflow.set_entry_point("researcher")

    # Compile the graph
    return workflow.compile()

# Main function to run the RAG and Multi-Agent system
def main(scraped_data: List[Dict[str, Any]]):
    try:
        processed_texts = process_scraped_data(scraped_data)
        vector_store = create_vector_store(processed_texts)
        tools = define_tools(vector_store, scraped_data)
        multi_agent_system = create_multi_agent_system(tools)

        logger.info("Enhanced Cybersecurity Multi-Agent system initialized successfully.")

        memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

        while True:
            user_input = input("Ask a question about cybersecurity (or type 'exit' to quit): ")
            if user_input.lower() == 'exit':
                break

            with get_openai_callback() as cb:
                initial_state = AgentState(
                    messages=[{"role": "human", "content": user_input}],
                    current_agent="researcher",
                    scratchpad=[]
                )
                final_state = multi_agent_system.invoke(initial_state)

                # Process and display the final response
                final_response = final_state['messages'][-1]['content']
                print(final_response)

                logger.info(f"Tokens used: {cb.total_tokens}")
                logger.info(f"Cost of query: ${cb.total_cost:.4f}")

    except Exception as e:
        logger.error(f"An error occurred: {str(e)}")

In [None]:
if __name__ == "__main__":
    try:
        # Load your scraped data here
        with open('scraped_data.json', 'r') as f:
            scraped_data = json.load(f)
        main(scraped_data)
    except FileNotFoundError:
        logger.error("scraped_data.json file not found. Please ensure the file exists in the current directory.")
    except json.JSONDecodeError:
        logger.error("Error decoding JSON from scraped_data.json. Please ensure the file contains valid JSON.")
    except Exception as e:
        logger.error(f"An unexpected error occurred: {str(e)}")