In [None]:
!pip install mistralai

Collecting mistralai
  Downloading mistralai-1.7.0-py3-none-any.whl.metadata (30 kB)
Collecting eval-type-backport>=0.2.0 (from mistralai)
  Downloading eval_type_backport-0.2.2-py3-none-any.whl.metadata (2.2 kB)
Downloading mistralai-1.7.0-py3-none-any.whl (301 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 301.5/301.5 kB 8.0 MB/s eta 0:00:00
Downloading eval_type_backport-0.2.2-py3-none-any.whl (5.8 kB)
Installing collected packages: eval-type-backport, mistralai
Successfully installed eval-type-backport-0.2.2 mistralai-1.7.0


In [None]:
import pandas as pd
from mistralai import Mistral
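from google.colab import userdata  # Colab-only: reads notebook secrets

# The API key is stored as a Colab secret named MISTRAL_API_KEY
api = userdata.get('MISTRAL_API_KEY')

client = Mistral(api_key=api)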
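The cell above only works inside Colab, because `google.colab.userdata` is not importable elsewhere. A minimal sketch of a fallback, assuming the key is exported as an environment variable named `MISTRAL_API_KEY` when running outside Colab; `load_api_key` is a hypothetical helper, not part of either library.

```python
import os


def load_api_key(name: str = "MISTRAL_API_KEY") -> str:
    """Hypothetical helper: prefer Colab secrets, else fall back to an environment variable."""
    try:
        from google.colab import userdata  # importable only inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

With this in place, the client could be created as `client = Mistral(api_key=load_api_key())` in either environment.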

In [None]:
system_message = """
You are a cybersecurity expert trained to analyze network flow logs and determine whether the traffic is benign or part of a Distributed Denial of Service (DDoS) attack.

Use the ReAct (Reasoning + Acting) approach:
1. First, REASON through the meaning of the network features such as IPs, ports, protocols, flow duration, packet counts, and byte rates.
2. Then, ACT by concluding whether the flow is BENIGN or DDOS based on patterns, anomalies, or suspicious behavior.

Be concise but clear in your reasoning, and always end with the final classification in the format: "Final Answer: BENIGN" or "Final Answer: DDOS".
"""

few_shot_example = """
Example:

Input:
Source IP : 172.16.0.5
Destination IP : 149.171.126.12
Source Port : 49855
Destination Port : 443
Protocol : 6
Flow Duration : 1147838
Total Fwd Packets : 12
Total Backward Packets : 15
Flow Bytes/s : 12.2

Analysis:
Let me analyze this network flow:
- Source IP 172.16.0.5 is a private IP address (likely internal network)
- Destination IP 149.171.126.12 is a public IP
- Source Port 49855 is a high ephemeral port (normal for client connections)
- Destination Port 443 indicates HTTPS traffic (normal web browsing)
- Protocol 6 is TCP (standard for web browsing)
- Flow Duration of 1147838 microseconds (~1.15 seconds) is reasonable for a web request
- Packet counts (12 forward, 15 backward) are balanced and typical for normal web traffic
- Flow Bytes/s at 12.2 is relatively low and consistent with regular web browsing

This appears to be normal HTTPS web traffic from an internal network to an external server with balanced packet exchange and reasonable data rates.

Final Answer: BENIGN
"""

In [None]:
df = pd.read_csv('/content/sample_20_rows.csv')
X = df.drop('Label', axis=1)
y = df['Label']
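`y` is loaded here but never used; the loop only prints each response. One way to score the responses, sketched under assumptions: collect each `response.choices[0].message.content` in a list (a hypothetical `responses`), extract the `Final Answer:` line the system prompt requires, and compare it with `y`. The mapping of ground-truth labels is an assumption, since the label values in `sample_20_rows.csv` are not shown: anything other than `BENIGN` is treated as an attack. `parse_final_answer`, `to_binary_label`, and `score` are hypothetical helpers.

```python
import re
from typing import Optional

# Matches e.g. "Final Answer: DDOS" or "Final Answer: **benign**" (markdown bold tolerated)
FINAL_ANSWER_RE = re.compile(r"Final Answer:\s*\**\s*(BENIGN|DDOS)\b", re.IGNORECASE)


def parse_final_answer(text: str) -> Optional[str]:
    """Return 'BENIGN' or 'DDOS' from the last 'Final Answer:' match, or None if absent."""
    matches = FINAL_ANSWER_RE.findall(text)
    return matches[-1].upper() if matches else None


def to_binary_label(label) -> str:
    """Assumption: any ground-truth label other than BENIGN counts as an attack."""
    return "BENIGN" if str(label).strip().upper() == "BENIGN" else "DDOS"


def score(responses, labels) -> dict:
    """Accuracy over (response, label) pairs; unparseable responses count as wrong."""
    n = correct = unparsed = 0
    for text, label in zip(responses, labels):
        n += 1
        pred = parse_final_answer(text)
        if pred is None:
            unparsed += 1
        elif pred == to_binary_label(label):
            correct += 1
    return {"accuracy": correct / n if n else 0.0, "unparsed": unparsed}
```

After the loop, `score(responses, y)` would report accuracy plus a count of replies that ignored the required answer format, which is worth tracking separately at `temperature=0.5`.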

In [None]:
for i in range(len(X)):  # one request per flow record instead of a hard-coded 20
    row = X.iloc[i]  # positional row access, independent of the DataFrame index
    prompt = f"""
Source IP : {row['Source IP']}
Destination IP : {row['Destination IP']}
Source Port : {row['Source Port']}
Destination Port : {row['Destination Port']}
Protocol : {row['Protocol']}
Flow Duration : {row['Flow Duration']}
Total Fwd Packets : {row['Total Fwd Packets']}
Total Backward Packets : {row['Total Backward Packets']}
Flow Bytes/s : {row['Flow Bytes/s']}
"""

    response = client.chat.complete(
        model="mistral-large-latest",
        messages=[
            {
                "role": "system",
                "content": system_message
            },
            {
                "role": "user",
                "content": few_shot_example
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature=0.5,
        top_p=0.5,
        stream=False
    )

    print(f"\nResponse for index {i}:")
    print(response.choices[0].message.content)



Response for index 0:
Let me analyze this network flow:

- Source IP 172.16.0.5 is a private IP address (likely internal network)
- Destination IP 192.168.50.1 is also a private IP address (likely internal network)
- Source Port 842 is a low-numbered port, typically used by system processes
- Destination Port 21484 is a high ephemeral port, usually used by client applications
- Protocol 17 is UDP, which is used for low-latency and loss-tolerating connections
- Flow Duration is negative, which is invalid and indicates a potential error or manipulation
- Total Fwd Packets is a non-integer value, which is invalid and suggests data corruption or manipulation
- Total Backward Packets is also a non-integer and negative, which is invalid and suggests data corruption or manipulation
- Flow Bytes/s is negative, which is invalid and suggests data corruption or manipulation

The invalid and negative values for flow duration, packet counts, and byte rates are highly suspicious and indicate potent