# Python Libraries for Cybersecurity 🚀

This notebook showcases **5 essential Python libraries** for cybersecurity professionals, with **at least 5 security use cases per library**, each backed by a short code snippet that runs on simulated data.

## 📌 Libraries Covered:
1. **NumPy** – Statistical analysis for anomaly detection
2. **Pandas** – Log analysis and data processing
3. **Scikit-learn** – Machine learning for threat detection
4. **Requests** – API interactions for OSINT & threat intelligence
5. **Yara-Python** – Malware detection and pattern matching

Each section includes **at least 5 practical scenarios** with **Python code examples** for hands-on learning.


## 1️⃣ NumPy – Statistical Analysis

NumPy provides fast array math for detecting anomalies, summarizing distributions, and simulating security-relevant data.

### Scenario 1: Detecting outliers in network traffic logs

In [1]:
import numpy as np

logs = np.random.normal(loc=100, scale=20, size=1000)  # Simulated traffic data
threshold = np.mean(logs) + 2 * np.std(logs)
anomalies = logs[logs > threshold]
print(f"Detected {len(anomalies)} anomalies.")

Detected 17 anomalies.
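
The mean-plus-two-standard-deviations threshold above is itself inflated by the outliers it is meant to catch. A minimal sketch of one common alternative, the modified z-score built on the median absolute deviation (MAD), follows. The function name `mad_outliers`, the 3.5 cutoff, and the injected spike values are illustrative assumptions, not part of the original scenario.

```python
import numpy as np

def mad_outliers(values, cutoff=3.5):
    """Return a boolean mask of points whose modified z-score exceeds cutoff."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # All points sit on the median; nothing can be ranked as an outlier.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the underlying data is roughly normal.
    modified_z = 0.6745 * (values - median) / mad
    return np.abs(modified_z) > cutoff

rng = np.random.default_rng(seed=42)
traffic = rng.normal(loc=100, scale=20, size=1000)  # simulated traffic data
traffic[:5] = [400, 450, 500, 550, 600]             # hypothetical injected spikes

mask = mad_outliers(traffic)
print(f"Flagged {mask.sum()} of {traffic.size} samples.")
```

Because the median and MAD barely move when a few extreme values are added, the injected spikes do not raise the threshold that is used to detect them.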


### Scenario 2: Simulating data encryption efficiency

In [None]:
import numpy as np

data_sizes = np.random.randint(10, 1000, size=100)  # Simulating different file sizes
encryption_times = data_sizes * np.random.uniform(0.5, 1.5, size=100)  # Random encryption times
print("Average encryption time:", np.mean(encryption_times))

### Scenario 3: Analyzing failed login attempts distribution

In [3]:
import numpy as np

login_attempts = np.random.poisson(lam=3, size=500)
unique, counts = np.unique(login_attempts, return_counts=True)
print(dict(zip(unique, counts)))

{0: 30, 1: 69, 2: 115, 3: 107, 4: 92, 5: 45, 6: 26, 7: 12, 8: 4}


### Scenario 4: Detecting port scanning behavior

In [None]:
import numpy as np

ports_scanned = np.random.randint(20, 65536, 1000)  # Simulated destination ports (upper bound is exclusive)
port_threshold = np.percentile(ports_scanned, 95)
suspicious_ports = ports_scanned[ports_scanned > port_threshold]
print(f"Suspicious scans detected: {len(suspicious_ports)}")
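
A percentile cutoff on port numbers always flags about 5% of the values, whatever the traffic looks like. Port scanning is more often characterized by a single source touching many distinct ports. The sketch below counts distinct destination ports per source on simulated data. The source identifiers, port lists, and the threshold of 100 are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Simulated connection log as (source id, destination port) pairs.
# Sources 0-8 hit a handful of common ports; source 9 sweeps many ports.
common_ports = np.array([22, 53, 80, 443])
normal_src = rng.integers(0, 9, size=900)
normal_ports = rng.choice(common_ports, size=900)
scanner_src = np.full(300, 9)
scanner_ports = rng.choice(np.arange(1, 1025), size=300, replace=False)

src = np.concatenate([normal_src, scanner_src])
ports = np.concatenate([normal_ports, scanner_ports])

# Deduplicate (source, port) pairs, then count remaining rows per source.
pairs = np.unique(np.column_stack([src, ports]), axis=0)
sources, distinct_ports = np.unique(pairs[:, 0], return_counts=True)

DISTINCT_PORT_THRESHOLD = 100  # assumed cutoff; tune against real traffic
suspects = sources[distinct_ports > DISTINCT_PORT_THRESHOLD]
print("Suspected scanners:", suspects.tolist())
```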

In [6]:
!brew install graphviz  # macOS; the leading "!" runs this as a shell command, not Python

In [2]:
from graphviz import Digraph

# Create a new directed graph
dot = Digraph()

# Define nodes
dot.node('D', 'Data Collection')
dot.node('P', 'Preprocessing')
dot.node('M', 'Model Training')
dot.node('E', 'Evaluation')
dot.node('O', 'Output & Prediction')

# Define edges
dot.edge('D', 'P', label="Clean & Normalize")
dot.edge('P', 'M', label="Feature Extraction")
dot.edge('M', 'E', label="Validate Model")
dot.edge('E', 'O', label="Generate Insights")

# Render the graph (requires the Graphviz executables, such as `dot`, on the system PATH)
dot.render('ml_pipeline', format='png', view=True)

### Scenario 5: Calculating entropy for data randomness detection

In [None]:
import numpy as np

def entropy(data):
    _, counts = np.unique(data, return_counts=True)
    probabilities = counts / counts.sum()
    return -np.sum(probabilities * np.log2(probabilities))

sample_data = np.random.randint(0, 256, 1000)
print("Entropy:", entropy(sample_data))
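
In practice, entropy is usually measured over the bytes of a file or payload, where values close to 8 bits per byte suggest compressed or encrypted content. A minimal sketch under that assumption follows. The helper name `byte_entropy` and both sample inputs are made up for illustration.

```python
import os
import numpy as np

def byte_entropy(blob: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not blob:
        return 0.0
    counts = np.bincount(np.frombuffer(blob, dtype=np.uint8), minlength=256)
    probabilities = counts[counts > 0] / len(blob)  # drop zero counts to avoid log2(0)
    return float(-np.sum(probabilities * np.log2(probabilities)))

plain_text = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 50
random_like = os.urandom(4096)  # stands in for encrypted or compressed data

print(f"Plain text:  {byte_entropy(plain_text):.2f} bits/byte")
print(f"Random data: {byte_entropy(random_like):.2f} bits/byte")
```

Entropy alone does not prove encryption, since compressed archives and media files score just as high, so it works best as one signal among several.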

## 2️⃣ Pandas – Log Analysis

Pandas is ideal for handling, filtering, and analyzing large security logs.

### Scenario 1: Analyzing firewall logs for suspicious activity

In [None]:
import pandas as pd

df = pd.DataFrame({'IP': ['192.168.1.1', '10.0.0.2', '172.16.0.3', '192.168.1.1'],
                   'Action': ['ALLOW', 'BLOCK', 'ALLOW', 'BLOCK']})
print(df.groupby(['IP', 'Action']).size())

### Scenario 2: Filtering malicious IPs from a threat intelligence feed

In [None]:
import pandas as pd

df = pd.DataFrame({'IP': ['8.8.8.8', '192.168.1.1', '10.0.0.2'],
                   'Reputation': ['safe', 'malicious', 'safe']})
malicious_ips = df[df['Reputation'] == 'malicious']
print(malicious_ips)

### Scenario 3: Tracking brute force attack attempts

In [None]:
import pandas as pd

df = pd.DataFrame({'Username': ['admin', 'root', 'guest', 'admin', 'admin'],
                   'Attempts': [10, 15, 3, 5, 8]})
print(df[df['Attempts'] > 5])
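
The filter above treats every row independently, although `admin` appears in several rows. Brute-force detection usually aggregates failures per account within a time window. The sketch below does that with `groupby` and `pd.Grouper`. The event timestamps, the 5-minute window, and the limit of 3 failures are hypothetical values.

```python
import pandas as pd

# Hypothetical failed-login events, one row per failed attempt.
events = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2024-01-01 10:00:05", "2024-01-01 10:00:20", "2024-01-01 10:00:41",
        "2024-01-01 10:01:02", "2024-01-01 10:03:30", "2024-01-01 10:07:15",
    ]),
    "Username": ["admin", "admin", "admin", "admin", "guest", "root"],
})

WINDOW = "5min"    # assumed window length
MAX_FAILURES = 3   # assumed tolerance per account and window

# Count failures per account inside each fixed 5-minute bucket.
failures = (
    events.groupby(["Username", pd.Grouper(key="Timestamp", freq=WINDOW)])
    .size()
    .rename("Failures")
    .reset_index()
)
flagged = failures[failures["Failures"] > MAX_FAILURES]
print(flagged)
```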

### Scenario 4: Visualizing log events over time

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Time': pd.date_range(start='2024-01-01', periods=10, freq='h'),  # 'H' is deprecated in recent pandas
                   'Events': [5, 9, 15, 7, 10, 14, 8, 6, 3, 12]})
df.plot(x='Time', y='Events', kind='line')
plt.show()

### Scenario 5: Merging threat reports from multiple sources

In [None]:
import pandas as pd

df1 = pd.DataFrame({'IP': ['192.168.1.1', '10.0.0.2'], 'Threat_Level': ['High', 'Low']})
df2 = pd.DataFrame({'IP': ['192.168.1.1', '172.16.0.3'], 'Threat_Type': ['Botnet', 'Trojan']})
merged_df = pd.merge(df1, df2, on='IP', how='outer')
print(merged_df)
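
After an outer merge it is often useful to know which feed each row came from. A short sketch using the `indicator` parameter of `pd.merge` on the same two frames follows. The variable name `corroborated` is an illustrative choice.

```python
import pandas as pd

df1 = pd.DataFrame({'IP': ['192.168.1.1', '10.0.0.2'], 'Threat_Level': ['High', 'Low']})
df2 = pd.DataFrame({'IP': ['192.168.1.1', '172.16.0.3'], 'Threat_Type': ['Botnet', 'Trojan']})

# indicator=True adds a "_merge" column: "both", "left_only", or "right_only".
merged = pd.merge(df1, df2, on='IP', how='outer', indicator=True)

# IPs reported by both feeds are corroborated by two independent sources.
corroborated = merged.loc[merged['_merge'] == 'both', 'IP'].tolist()
print(merged)
print("Reported by both sources:", corroborated)
```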

## 3️⃣ Scikit-learn – Machine Learning for Threat Detection

### Scenario 1: Training a model to detect malicious network traffic
**Using a Random Forest classifier to distinguish between normal and malicious network traffic.**

In [None]:
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Simulated dataset (features: packet size, duration, frequency)
X_train = np.random.rand(100, 3)
y_train = np.random.randint(0, 2, 100)  # 0: Normal, 1: Malicious

model = RandomForestClassifier(n_estimators=10)
model.fit(X_train, y_train)

# Simulated new network traffic
X_test = np.random.rand(1, 3)
prediction = model.predict(X_test)
print("Prediction (0: Normal, 1: Malicious):", prediction)
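
Because the labels above are drawn at random, the classifier has no signal to learn and its prediction is essentially a coin flip. The sketch below uses synthetic data in which malicious flows differ from normal ones, and it holds out a test set so the model can be scored. All feature distributions are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=1)

# Synthetic features: packet size, duration, frequency (values are invented).
normal = rng.normal(loc=[500, 1.0, 10], scale=[50, 0.2, 2], size=(200, 3))
malicious = rng.normal(loc=[900, 0.2, 60], scale=[50, 0.1, 5], size=(200, 3))
X = np.vstack([normal, malicious])
y = np.array([0] * 200 + [1] * 200)  # 0: Normal, 1: Malicious

# Keep 25% of the samples unseen during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

model = RandomForestClassifier(n_estimators=50, random_state=1)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

Real traffic is far less cleanly separated than these two clusters, so the score here says nothing about production performance; the point is the train, hold out, and evaluate workflow.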

### Scenario 2: Phishing Email Classification
**Using a Naive Bayes model to classify emails as phishing or legitimate.**

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["Free money now!", "Your invoice is attached", "Win a prize! Click here"]
labels = [1, 0, 1]  # 1: Phishing, 0: Legitimate

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(emails)
model = MultinomialNB()
model.fit(X_train, labels)

# Predict a new email
new_email = vectorizer.transform(["Your account needs verification"])
print("Prediction (1: Phishing, 0: Legitimate):", model.predict(new_email))

### Scenario 3: Intrusion Detection System (IDS)
**Building an anomaly detection model for network intrusions.**

In [None]:
from sklearn.ensemble import IsolationForest
import numpy as np

# Simulated normal network traffic
X_train = np.random.rand(100, 2)
model = IsolationForest(contamination=0.1)
model.fit(X_train)

# Detect anomalies
X_test = np.random.rand(5, 2)
predictions = model.predict(X_test)
print("Anomaly predictions (-1: Anomalous, 1: Normal):", predictions)
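
In the cell above, training and test points come from the same uniform distribution, so the labels carry little meaning. A sketch with a tightly clustered baseline and one clearly unusual point makes the scores easier to interpret. The feature meanings and values are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(seed=7)

# Invented baseline: two features (e.g. bytes sent, connection count), tightly clustered.
baseline = rng.normal(loc=[0.5, 0.5], scale=0.05, size=(500, 2))
model = IsolationForest(random_state=7).fit(baseline)

candidates = np.array([
    [0.5, 0.5],   # typical of the baseline
    [5.0, 5.0],   # far outside anything seen in training
])
scores = model.decision_function(candidates)  # lower means more anomalous
labels = model.predict(candidates)            # -1: Anomalous, 1: Normal

for point, score, label in zip(candidates, scores, labels):
    print(point, f"score={score:+.3f}", "label:", label)
```

Ranking events by `decision_function` score is often more useful than the hard labels, since it lets an analyst review the most unusual events first.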

### Scenario 4: Predicting Malware Behavior
**Using a decision tree to classify malware behavior based on system calls.**

In [None]:
from sklearn.tree import DecisionTreeClassifier

X_train = [[5, 3, 1], [2, 1, 0], [4, 3, 2], [6, 5, 2]]  # Features: API calls, registry access, network activity
y_train = [1, 0, 1, 1]  # 1: Malicious, 0: Benign

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

X_test = [[3, 2, 1]]
print("Malware Prediction:", model.predict(X_test))

### Scenario 5: Spam Detection Using Logistic Regression
**Classifying spam messages using logistic regression.**

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer

messages = ["Congratulations, you've won!", "Meeting at 3 PM", "Claim your free gift now"]
labels = [1, 0, 1]  # 1: Spam, 0: Ham

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(messages)
model = LogisticRegression()
model.fit(X_train, labels)

# Test a new message
X_test = vectorizer.transform(["Urgent! Update your account"])
print("Spam Prediction:", model.predict(X_test))

## 4️⃣ Requests – API Interactions for OSINT & Threat Intelligence

### Scenario 1: Fetching Threat Intelligence Data
**Checking a suspicious IP against a threat intelligence API**

In [None]:
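import requests

ip = "8.8.8.8"
# Placeholder endpoint: substitute the URL of your threat intelligence provider
response = requests.get(f"https://threatintel.api/check-ip/{ip}", timeout=10)
response.raise_for_status()  # Stop on HTTP errors before parsing the body
print(response.json())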

### Scenario 2: Checking a Domain Reputation
**Querying a domain reputation service to check if a domain is malicious.**

In [None]:
import requests

domain = "suspicious-site.com"
response = requests.get(f"https://api.domainreputation.com/{domain}", timeout=10)
response.raise_for_status()  # Stop on HTTP errors before parsing the body
print(f"Domain Reputation for {domain}:", response.json())

### Scenario 3: Fetching Latest CVEs (Vulnerabilities)
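**Retrieving vulnerability records from NIST's National Vulnerability Database (NVD).**

In [None]:
import requests

# The 1.0 endpoint has been retired; the CVE API is now served under /cves/2.0.
# To limit results to a recent window, add the pubStartDate and pubEndDate parameters.
response = requests.get("https://services.nvd.nist.gov/rest/json/cves/2.0",
                        params={"resultsPerPage": 5}, timeout=30)
response.raise_for_status()
print("Latest CVEs:", response.json())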
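
The NVD response is a large nested JSON document, so printing it whole is rarely useful. The sketch below assumes the 2.0 API endpoint, its `keywordSearch` and `resultsPerPage` parameters, and a `vulnerabilities` list whose items hold a `cve` object with `id` and `descriptions` fields. The helper names and the sample payload are hypothetical, and the payload is hand-written rather than real NVD data.

```python
import requests

NVD_CVE_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def fetch_cves(keyword, limit=5, timeout=30):
    """Query the NVD CVE API and return the decoded JSON payload."""
    params = {"keywordSearch": keyword, "resultsPerPage": limit}
    response = requests.get(NVD_CVE_API, params=params, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors such as rate limiting
    return response.json()

def summarize_cves(payload):
    """Return (CVE id, English description) pairs from an NVD 2.0 style payload."""
    summaries = []
    for item in payload.get("vulnerabilities", []):
        cve = item.get("cve", {})
        # Pick the first English description, or an empty string if none exists.
        description = next(
            (d["value"] for d in cve.get("descriptions", []) if d.get("lang") == "en"),
            "",
        )
        summaries.append((cve.get("id"), description))
    return summaries

# Hand-written payload in the assumed response shape; not real NVD data.
sample_payload = {
    "vulnerabilities": [
        {"cve": {"id": "CVE-0000-0001",
                 "descriptions": [{"lang": "en", "value": "Placeholder description."}]}}
    ]
}
print(summarize_cves(sample_payload))

# Live call (requires network access):
# print(summarize_cves(fetch_cves("openssl")))
```

Keeping the parsing separate from the network call means the parser can be exercised offline against saved responses.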

### Scenario 4: Automating Malware Hash Retrieval
**Fetching malware hashes from an open-source threat feed.**

In [None]:
import requests

response = requests.get("https://malware-api.com/hashes", timeout=10)
response.raise_for_status()  # Stop on HTTP errors before parsing the body
print("Malware Hashes:", response.json())

### Scenario 5: Monitoring Dark Web Data Leaks
**Checking for compromised credentials in a data breach repository.**

In [None]:
import requests

email = "target@example.com"
response = requests.get("https://api.databreach.com/leaks",
                        params={"email": email}, timeout=10)  # params= URL-encodes the address
response.raise_for_status()
print("Leak Status:", response.json())

## 5️⃣ Yara-Python – Malware Detection and Pattern Matching

### Scenario 1: Detecting Malware in Files
**Using Yara rules to scan files for malware signatures.**

In [None]:
import yara
rule = yara.compile(source='rule dummy { strings: $a = "malware" condition: $a }')
matches = rule.match(data="This file contains malware")
print(matches)

### Scenario 2: Scanning Memory for Malicious Patterns
**Using Yara to detect malware signatures in running processes.**

In [None]:
import yara

rule = yara.compile(source='rule MemoryScan { strings: $a = "keylogger" condition: $a }')
# Simulated memory contents; to scan a live process, call rule.match(pid=<process id>) instead
memory_data = "User typed password keylogger detected"
print("Memory Scan Results:", rule.match(data=memory_data))

### Scenario 3: Detecting Ransomware Behavior
**Using Yara to detect suspicious encryption patterns in files.**

In [None]:
import yara

rule = yara.compile(source='rule Ransomware { strings: $a = "encrypt_file" condition: $a }')
matches = rule.match(data="encrypt_file detected")
print("Ransomware Detection:", matches)

### Scenario 4: Analyzing Suspicious Email Attachments
**Scanning email attachments for malware signatures.**

In [None]:
import yara

rule = yara.compile(source='rule EmailMalware { strings: $a = "phishing_payload" condition: $a }')
attachment_data = "Contains phishing_payload"
print("Email Attachment Scan:", rule.match(data=attachment_data))

### Scenario 5: Hunting for Exploit Kits
**Using Yara to scan for exploit kit indicators.**

In [None]:
import yara

rule = yara.compile(source='rule ExploitKit { strings: $a = "exploit_trigger" condition: $a }')
exploit_data = "exploit_trigger detected"
print("Exploit Kit Scan:", rule.match(data=exploit_data))

### Scenario 6: Scanning a directory for suspicious files

In [None]:
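import os
import yara

scan_dir = "/path/to/scan"
rule = yara.compile(source='rule suspicious { strings: $a = "suspicious" condition: $a }')
for name in os.listdir(scan_dir):
    path = os.path.join(scan_dir, name)  # listdir returns bare names, not full paths
    if not os.path.isfile(path):
        continue  # Skip subdirectories
    if rule.match(filepath=path):  # Let YARA read the file itself
        print(f"Suspicious file detected: {path}")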