<a href="https://colab.research.google.com/github/Hannan2004/Cerebras-AI-Fellowship/blob/main/CerebrasAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# CyberGuard: Multi-Agent LLM Framework for Intelligent Network Security Analysis

Project Overview:
----------------
CyberGuard applies Large Language Models (LLMs) to network security through a multi-agent
system: one agent structures raw network logs, and specialized agents then examine the
structured data for specific attack types. The project demonstrates how this approach can
be applied to cybersecurity, with a focus on real-time threat detection and analysis.

Key Innovations:
--------------
1. Multi-Agent Architecture: Specialized agents for different types of attack detection
2. LLM-Powered Analysis: Utilizing Cerebras' powerful language models for intelligent threat detection
3. Real-Time Processing: Efficient pipeline for processing network logs
4. Adaptive Learning: Context-aware analysis of network patterns
5. Structured Reporting: Comprehensive threat analysis and visualization

Use Cases:
---------
- Enterprise Network Security
- Cloud Infrastructure Protection
- Security Operations Centers (SOC)
- Automated Threat Detection
- Network Anomaly Detection

Technical Implementation:
-----------------------
- Framework: LangChain
- Model: Cerebras LLaMA 3.1 70B
- Processing: Real-time log analysis
- Output: Structured security insights
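
The flow described above, one ingestion step whose output fans out to several specialized detectors, can be sketched without any LLM dependency. This is a minimal illustration only: `fake_llm`, `make_agent`, and `run_pipeline` are hypothetical stand-ins that do not come from LangChain or from this notebook, and the shortened prompts are placeholders for the full templates defined later.

```python
from typing import Callable, Dict

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; reports only the prompt size."""
    return f"[response to {len(prompt)} chars]"

def make_agent(template: str, llm: Callable[[str], str]) -> Callable[..., str]:
    """Bind a prompt template to an LLM callable (illustrative helper)."""
    def agent(**fields: str) -> str:
        return llm(template.format(**fields))
    return agent

def run_pipeline(raw_logs: str, llm: Callable[[str], str] = fake_llm) -> Dict[str, str]:
    # Stage 1: a single ingestion agent turns raw logs into structured text.
    ingest = make_agent("Structure these logs:\n{raw_logs}", llm)
    # Stage 2: each detector receives the same structured text.
    detectors = {
        "ddos_analysis": make_agent("Look for DDoS patterns:\n{structured_data}", llm),
        "port_scan_analysis": make_agent("Look for port scans:\n{structured_data}", llm),
        "web_attack_analysis": make_agent("Look for web attacks:\n{structured_data}", llm),
    }
    structured = ingest(raw_logs=raw_logs)
    results = {"structured_data": structured}
    for name, agent in detectors.items():
        results[name] = agent(structured_data=structured)
    return results
```

Because every detector depends only on the ingestion output, the three detection calls are independent of one another and could be issued concurrently in a real deployment.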

# Installing Necessary Libraries

In [2]:
!pip install langchain
!pip install langchain-cerebras

Collecting langchain-cerebras
  Downloading langchain_cerebras-0.3.0-py3-none-any.whl.metadata (4.9 kB)
Collecting langchain-openai<0.3.0,>=0.2.0 (from langchain-cerebras)
  Downloading langchain_openai-0.2.6-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-core<0.4.0,>=0.3.0 (from langchain-cerebras)
  Downloading langchain_core-0.3.15-py3-none-any.whl.metadata (6.3 kB)
Collecting openai<2.0.0,>=1.54.0 (from langchain-openai<0.3.0,>=0.2.0->langchain-cerebras)
  Downloading openai-1.54.3-py3-none-any.whl.metadata (24 kB)
Collecting tiktoken<1,>=0.7 (from langchain-openai<0.3.0,>=0.2.0->langchain-cerebras)
  Downloading tiktoken-0.8.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Downloading langchain_cerebras-0.3.0-py3-none-any.whl (7.8 kB)
Downloading langchain_openai-0.2.6-py3-none-any.whl (50 kB)
Downloading langchain_core-

# 1. Import Required Libraries

In [3]:
import pandas as pd
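# Import from the specific modules; the top-level `langchain` re-exports are deprecated.
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate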
from langchain_cerebras import ChatCerebras
from google.colab import files, userdata
import time
import logging

# 2. Configure Logging

In [4]:
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# 3. Setting Up API Authentication

In [5]:
def setup_llm():
    """
    Initialize the Cerebras-hosted Llama 3.1 70B chat model with
    deterministic output (temperature 0) and a 2048-token response limit.
    """
    try:
        api_key = userdata.get('CEREBRAS-API-KEY')
        if not api_key:
            raise ValueError("API key not found")

        return ChatCerebras(
            model="llama-3.1-70b",
            temperature=0,
            api_key=api_key,
            max_tokens=2048
        )
    except Exception as e:
        logger.error(f"Error setting up LLM: {str(e)}")
        raise
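
Outside Colab, `google.colab.userdata` is not available, so `setup_llm` cannot read the secret. The following is a minimal sketch of a fallback, assuming the key is also exported as an environment variable. The helper `resolve_api_key` and the variable name `CEREBRAS_API_KEY` are assumptions made for illustration; neither appears in the notebook.

```python
import os
from typing import Callable, Mapping, Optional

def resolve_api_key(
    secret_getter: Optional[Callable[[str], Optional[str]]] = None,
    secret_name: str = "CEREBRAS-API-KEY",
    env_name: str = "CEREBRAS_API_KEY",  # assumed variable name
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from a secret store if possible, else from the environment."""
    environ = os.environ if environ is None else environ
    if secret_getter is not None:
        try:
            key = secret_getter(secret_name)
            if key:
                return key
        except Exception:
            # The getter may raise (for example when the secret is not defined);
            # fall through to the environment lookup.
            pass
    key = environ.get(env_name, "")
    if not key:
        raise ValueError(f"API key not found in secret store or ${env_name}")
    return key
```

In Colab this would be called as `resolve_api_key(userdata.get)`; elsewhere, `resolve_api_key()` reads only the environment.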


# 4. Define Agent Templates

In [6]:
class SecurityAgentTemplates:
    data_ingestion = """
    You are an expert at processing network logs. Analyze the following logs and convert them into a structured format for further analysis.
    Focus on extracting key fields such as timestamp, source IP, destination IP, port numbers, and protocol information.

    Raw Logs:
    {raw_logs}

    Please provide the structured data in a clear, organized format with the following information:
    - Timestamp patterns
    - IP address distributions
    - Port usage patterns
    - Protocol statistics
    """

    ddos_detection = """
    You are an expert in cybersecurity specializing in DDoS attack detection. Analyze the structured network data for patterns indicating DDoS attacks, such as:
    - High frequency of requests from similar source IPs
    - Unusual traffic patterns
    - Suspicious request distributions

    Structured Data:
    {structured_data}

    Provide a detailed analysis including:
    1. Attack indicators found
    2. Source IP addresses involved
    3. Attack patterns and characteristics
    4. Confidence level of detection
    """

    port_scan = """
    You are a cybersecurity expert in network reconnaissance detection. Analyze the given structured network data for port scanning activities, looking for:
    - Sequential port access patterns
    - Multiple ports accessed from single sources
    - Unusual port access frequencies

    Structured Data:
    {structured_data}

    Detail your findings with:
    1. Identified scanning patterns
    2. Source IP addresses involved
    3. Targeted port ranges
    4. Scanning techniques used
    """

    web_attack = """
    You are a web security specialist. Analyze the structured network data for signs of web-based attacks, including:
    - SQL Injection attempts
    - Cross-Site Scripting (XSS)
    - Directory traversal
    - Command injection
    - Authentication bypass attempts

    Structured Data:
    {structured_data}

    Provide comprehensive analysis including:
    1. Attack types detected
    2. Malicious patterns found
    3. Source IP addresses
    4. Targeted endpoints
    5. Potential impact assessment
    """

# 5. Initialize Analysis Pipeline

In [7]:
class SecurityAnalysisPipeline:
    def __init__(self, llm):
        self.llm = llm
        self.templates = SecurityAgentTemplates()
        self._initialize_chains()

    def _initialize_chains(self):
        """Initialize all LLM chains for different analysis types"""
        self.data_ingestion_chain = LLMChain(
            prompt=PromptTemplate(
                template=self.templates.data_ingestion,
                input_variables=["raw_logs"]
            ),
            llm=self.llm
        )

        self.ddos_detection_chain = LLMChain(
            prompt=PromptTemplate(
                template=self.templates.ddos_detection,
                input_variables=["structured_data"]
            ),
            llm=self.llm
        )

        self.port_scan_chain = LLMChain(
            prompt=PromptTemplate(
                template=self.templates.port_scan,
                input_variables=["structured_data"]
            ),
            llm=self.llm
        )

        self.web_attack_chain = LLMChain(
            prompt=PromptTemplate(
                template=self.templates.web_attack,
                input_variables=["structured_data"]
            ),
            llm=self.llm
        )

    def analyze_logs(self, df):
        """
        Run the complete analysis pipeline on the provided logs

        Args:
            df (pandas.DataFrame): DataFrame containing network logs

        Returns:
            dict: Analysis results from all security agents
        """
        logger.info("Starting log analysis pipeline")
        start_time = time.time()

        try:
            # Convert DataFrame to formatted string
            raw_logs = df.to_string(index=False)

            # Run each analysis stage. `invoke` replaces the deprecated `run`;
            # LLMChain returns a dict with the generated text under "text".
            structured_data = self.data_ingestion_chain.invoke({"raw_logs": raw_logs})["text"]
            ddos_results = self.ddos_detection_chain.invoke({"structured_data": structured_data})["text"]
            port_scan_results = self.port_scan_chain.invoke({"structured_data": structured_data})["text"]
            web_attack_results = self.web_attack_chain.invoke({"structured_data": structured_data})["text"]

            execution_time = time.time() - start_time

            return {
                "structured_data": structured_data,
                "ddos_analysis": ddos_results,
                "port_scan_analysis": port_scan_results,
                "web_attack_analysis": web_attack_results,
                "execution_time": execution_time
            }

        except Exception as e:
            logger.error(f"Error in analysis pipeline: {str(e)}")
            raise
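
`analyze_logs` sends the whole DataFrame to the model as one string, so a large capture can exceed the model's context window. One possible mitigation, sketched here under the assumption that the logs can be analyzed in independent batches, is to split the rows into fixed-size chunks and repeat the header in each. `chunk_rows`, the batch size, the column names, and the sample rows are all hypothetical.

```python
from typing import Iterator, Sequence

def chunk_rows(header: str, rows: Sequence[str], max_rows: int = 200) -> Iterator[str]:
    """Yield log text in batches of at most `max_rows` rows, each with the header."""
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    for start in range(0, len(rows), max_rows):
        batch = rows[start:start + max_rows]
        yield "\n".join([header, *batch])

# Made-up sample data for illustration only.
header = "timestamp,src_ip,dst_ip,dst_port"
rows = [f"2017-07-07 03:30,172.16.0.1,192.168.10.50,{80 + i}" for i in range(5)]
chunks = list(chunk_rows(header, rows, max_rows=2))
```

Batching has a cost: patterns that span batches, such as a slow port scan, may be missed unless the per-batch findings are combined in a later step.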

# 6. Main Execution

In [8]:
def main():
    """
    Main function to run the security analysis pipeline
    """
    try:
        # Upload and read log file
        logger.info("Waiting for log file upload...")
        uploaded = files.upload()
        if not uploaded:
            raise ValueError("No log file was uploaded")
        # Use the first uploaded file
        file_path = next(iter(uploaded))

        # Read logs into DataFrame
        df = pd.read_csv(file_path)
        logger.info(f"Successfully loaded log file: {file_path}")

        # Initialize and run pipeline
        llm = setup_llm()
        pipeline = SecurityAnalysisPipeline(llm)
        results = pipeline.analyze_logs(df)

        # Display results
        print("\n=== Security Analysis Results ===")
        print(f"\nExecution Time: {results['execution_time']:.2f} seconds")
        print("\n1. Data Processing Results:")
        print(results['structured_data'])
        print("\n2. DDoS Attack Analysis:")
        print(results['ddos_analysis'])
        print("\n3. Port Scan Analysis:")
        print(results['port_scan_analysis'])
        print("\n4. Web Attack Analysis:")
        print(results['web_attack_analysis'])

    except Exception as e:
        logger.error(f"Error in main execution: {str(e)}")
        raise

if __name__ == "__main__":
    main()

Saving dataset (1).csv to dataset (1).csv





=== Security Analysis Results ===

Execution Time: 3.58 seconds

1. Data Processing Results:
Based on the provided network logs, I have extracted the key fields and organized them into a structured format for further analysis.

**Timestamp Patterns:**

* The logs are from July 7, 2017, with two distinct time ranges:
	+ 3:30 (majority of the logs)
	+ 3:58 (logs from 172.16.0.1 to 192.168.10.50)

**IP Address Distributions:**

* **Source IP Addresses:**
	+ 172.16.0.1 (majority of the logs, 43 entries)
	+ 104.16.28.216 (2 entries)
	+ 104.16.207.165 (1 entry)
	+ 104.20.10.120 (2 entries)
	+ 104.17.241.25 (1 entry)
	+ 104.19.196.102 (1 entry)
	+ 104.28.13.116 (1 entry)
	+ 104.97.123.193 (1 entry)
	+ 104.97.125.160 (3 entries)
	+ 104.97.139.37 (1 entry)
	+ 104.97.140.32 (1 entry)
	+ 121.29.54.141 (5 entries)
	+ 138.201.37.241 (1 entry)
	+ 144.76.121.178 (1 entry)
	+ 145.243.233.163 (1 entry)
	+ 151.101.0.166 (1 entry)
	+ 151.101.0.249 (2 entries)
	+ 151.101.1.108 (1 entry)
	+ 151.101.1.5 