<a href="https://colab.research.google.com/github/Stutiporwal1/Development-of-Interactive-Cyber-Threat-Visualization-Dashboard/blob/main/Infosys(Python_Task_W1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Cyber Data Analysis by Prompting**

**Infosys Springboard Internship**
**Week 1: Performing Data Analysis via Prompting**

**Objective**

Create a dummy cyber security dataset using prompt-based instructions

Load and analyze the dataset in Google Colab

Perform Data Analysis & Model comparison via AI prompting

Execute SQL queries via prompting
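
The notebook loads `cyber_threat_dataset.csv` but does not show how the file was created. The following is a minimal sketch of generating such a dummy dataset, assuming the column names that the later cells rely on (`source_ip`, `protocol`, `port`, `packet_size`, `request_count`, `failed_login_attempts`, `attack_type`, `email_subject`, `url`). The attack labels other than `Phishing`, the value ranges, and the helper name `make_dummy_dataset` are illustrative assumptions, not the original data.

```python
import numpy as np
import pandas as pd


def make_dummy_dataset(n_rows=500, seed=42):
    """Build a synthetic cyber log table (all values are made up)."""
    rng = np.random.default_rng(seed)

    # Assumed label set; only 'Phishing' is confirmed by the notebook's queries
    attack_types = ["Normal", "DDoS", "Brute Force", "Phishing"]

    data = pd.DataFrame({
        "source_ip": [
            ".".join(str(octet) for octet in rng.integers(1, 255, size=4))
            for _ in range(n_rows)
        ],
        "protocol": rng.choice(["TCP", "UDP", "HTTP", "SMTP"], size=n_rows),
        "port": rng.choice([22, 25, 53, 80, 443], size=n_rows),
        "packet_size": rng.integers(64, 1500, size=n_rows),
        "request_count": rng.integers(1, 500, size=n_rows),
        "failed_login_attempts": rng.integers(0, 5, size=n_rows),
        "attack_type": rng.choice(attack_types, size=n_rows),
    })

    # Make the labels loosely learnable: inflate the feature tied to each attack
    is_ddos = data["attack_type"] == "DDoS"
    is_brute = data["attack_type"] == "Brute Force"
    is_phish = data["attack_type"] == "Phishing"
    data.loc[is_ddos, "request_count"] += 1000
    data.loc[is_brute, "failed_login_attempts"] += 10

    # Email fields are left empty except for phishing rows
    data["email_subject"] = ""
    data["url"] = ""
    data.loc[is_phish, "email_subject"] = "Verify your account"
    data.loc[is_phish, "url"] = "http://example.com/login"
    return data


dummy_df = make_dummy_dataset()
dummy_df.to_csv("cyber_threat_dataset.csv", index=False)
```

Fixing the seed keeps the generated file identical across runs, which makes the accuracy figures in the later cells repeatable.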

**Load Dataset in Google Colab**

In [None]:
import pandas as pd

# Load dataset
df = pd.read_csv("cyber_threat_dataset.csv")

# Display dataset
df.head()


**Data Analysis via Prompting**

Exploratory Data Analysis (EDA)

In [None]:
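# A notebook cell only shows its last expression automatically,
# so display each result explicitly
df.info()
display(df.describe())
display(df['attack_type'].value_counts())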


Visual Analysis

In [None]:
import matplotlib.pyplot as plt

df['attack_type'].value_counts().plot(kind='bar')
plt.title("Distribution of Cyber Attacks")
plt.xlabel("Attack Type")
plt.ylabel("Count")
plt.show()


**Model Preparation & Performance Comparison**

Data Preprocessing

In [None]:
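from sklearn.preprocessing import LabelEncoder

# Use one encoder per column; refitting a shared encoder discards the earlier mapping
protocol_encoder = LabelEncoder()
attack_encoder = LabelEncoder()

# Keep the original string columns untouched: later cells filter
# attack_type by name (e.g. 'Phishing'), which fails on integer codes
df['protocol_encoded'] = protocol_encoder.fit_transform(df['protocol'])
y = attack_encoder.fit_transform(df['attack_type'])

X = df[['protocol_encoded', 'port', 'packet_size', 'request_count', 'failed_login_attempts']]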
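
Because `LabelEncoder` replaces strings with integers, model predictions come back as integer codes. The following is a minimal sketch of mapping codes back to readable labels with `inverse_transform`. It uses a toy label list (the values are illustrative assumptions) and a dedicated encoder for the target column.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for the attack_type column (assumed values)
labels = ["Phishing", "DDoS", "Normal", "DDoS", "Phishing"]

target_encoder = LabelEncoder()
encoded = target_encoder.fit_transform(labels)

# classes_ holds the original labels in sorted order; the index is the code
code_to_label = dict(enumerate(target_encoder.classes_))

# Decode integer codes (e.g. model predictions) back to strings
decoded = target_encoder.inverse_transform(encoded)
print(code_to_label)
print(list(decoded))
```

A separate encoder per column matters here: calling `fit_transform` again on the same encoder object overwrites its mapping, so only the most recently fitted column can be decoded.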


Train-Test Split

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

lr = LogisticRegression(max_iter=1000)
lr.fit(X_train, y_train)

lr_pred = lr.predict(X_test)
print("Logistic Regression Accuracy:", accuracy_score(y_test, lr_pred))


Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier

# Fix the seed so the accuracy is reproducible across runs
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

rf_pred = rf.predict(X_test)
print("Random Forest Accuracy:", accuracy_score(y_test, rf_pred))

Support Vector Machine

In [None]:
from sklearn.svm import SVC

svm = SVC()
svm.fit(X_train, y_train)

svm_pred = svm.predict(X_test)
print("SVM Accuracy:", accuracy_score(y_test, svm_pred))


Model Comparison Table

In [None]:
models = ['Logistic Regression', 'Random Forest', 'SVM']
accuracy = [
    accuracy_score(y_test, lr_pred),
    accuracy_score(y_test, rf_pred),
    accuracy_score(y_test, svm_pred)
]

pd.DataFrame({'Model': models, 'Accuracy': accuracy})
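
A single train-test split can give noisy accuracy figures on a small dummy dataset. As a hedged sketch, the same three model families can also be compared with 5-fold cross-validation. The synthetic data from `make_classification` stands in for the notebook's features, the helper name `compare_models` is hypothetical, and the added `StandardScaler` step is an assumption that is not part of the cells above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(features, target, folds=5):
    """Return mean and std of cross-validated accuracy for each model."""
    candidates = {
        # Scaling helps the two scale-sensitive models; trees do not need it
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
        "SVM": make_pipeline(StandardScaler(), SVC()),
    }
    rows = []
    for name, model in candidates.items():
        scores = cross_val_score(model, features, target, cv=folds, scoring="accuracy")
        rows.append({"Model": name, "Mean Accuracy": scores.mean(), "Std": scores.std()})
    return pd.DataFrame(rows).sort_values("Mean Accuracy", ascending=False)


# Synthetic stand-in data: 5 features, 3 classes (not the notebook's dataset)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=42,
)
results = compare_models(X_demo, y_demo)
print(results)
```

Reporting the standard deviation next to the mean shows whether a gap between two models is larger than the fold-to-fold variation.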


**SQL Tasks via Prompting**

Execute SQL queries on cyber security data to analyze attack patterns.

Load Dataset into SQL (SQLite)

In [None]:
import sqlite3

conn = sqlite3.connect(':memory:')
df.to_sql('cyber_logs', conn, index=False)


Count Attack Types

In [None]:
query = """
SELECT attack_type, COUNT(*) AS count
FROM cyber_logs
GROUP BY attack_type
"""
pd.read_sql(query, conn)


Detect High Request Attacks

In [None]:
query = """
SELECT *
FROM cyber_logs
WHERE request_count > 1000
"""
pd.read_sql(query, conn)


Brute Force Detection

In [None]:
query = """
SELECT source_ip, failed_login_attempts
FROM cyber_logs
WHERE failed_login_attempts > 10
"""
pd.read_sql(query, conn)
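
The threshold of 10 is hard-coded in the query above. The following is a minimal sketch of passing it as a bound parameter instead and aggregating the flagged rows per source IP. It runs on a small in-memory table with made-up rows; the table contents and the `threshold` variable are illustrative assumptions.

```python
import sqlite3

import pandas as pd

# Made-up rows standing in for the cyber_logs table
logs = pd.DataFrame({
    "source_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3"],
    "failed_login_attempts": [12, 15, 3, 11],
})

conn_demo = sqlite3.connect(":memory:")
logs.to_sql("cyber_logs", conn_demo, index=False)

threshold = 10  # assumed cut-off for flagging brute force attempts

# '?' is SQLite's positional placeholder; the value is bound via params
query = """
SELECT source_ip,
       COUNT(*) AS flagged_events,
       MAX(failed_login_attempts) AS max_failed_logins
FROM cyber_logs
WHERE failed_login_attempts > ?
GROUP BY source_ip
ORDER BY max_failed_logins DESC
"""
suspects = pd.read_sql(query, conn_demo, params=(threshold,))
print(suspects)
conn_demo.close()
```

Binding the value through `params` keeps the SQL text fixed, so the cut-off can be changed without editing or string-formatting the query.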


Phishing Emails

In [None]:
query = """
SELECT email_subject, url
FROM cyber_logs
WHERE attack_type = 'Phishing'
"""
pd.read_sql(query, conn)
