# Lab 01: Python for Security Fundamentals

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/depalmar/ai_for_the_win/blob/main/notebooks/lab01_python_security.ipynb)

**Difficulty: Easy | Time: 3-4 hours | No Prerequisites**

Welcome to AI for the Win! This introductory lab teaches Python basics through security-focused examples. No prior programming experience required.

## Learning Objectives

By the end of this lab, you will:
1. **Write and run Python scripts** - Your first security-focused code
2. **Work with data types** - Strings, numbers, booleans, lists, dictionaries
3. **Control program flow** - If statements, loops, functions
4. **Read and write files** - Logs, CSVs, JSON
5. **Parse security data** - Regular expressions for IOC extraction
6. **Make HTTP requests** - API interactions for threat intelligence

## Why Python for Security?

```
                  WHY PYTHON FOR SECURITY?
                                                             
   Industry Standard: Widely used in security tooling
   Rich Libraries: pandas, requests, scikit-learn         
   Quick Prototyping: Rapid tool development              
   AI/ML Ready: All major frameworks support Python       
                                                             
   Common Uses:                                              
   - Log parsing and analysis                                
   - IOC extraction and enrichment                           
   - Automation of security tasks                            
   - Machine learning for threat detection                   
   - API integrations (VirusTotal, MISP, etc.)              
```

**Next:** Continue with the ML Concepts lab, or skip ahead to the Phishing Classifier lab if you already know ML basics (see Next Steps at the end of this lab).

---

# Part 1: Python Basics

This section covers the fundamental building blocks of Python programming.

## 1.1 Your First Python Code

Python runs code from top to bottom, one statement at a time. The `print()` function displays output.

In [None]:
# This is a comment - Python ignores lines starting with #
# Comments help explain your code to others (and your future self!)

print("Welcome to AI for the Win!")
print("Let's learn Python for security!")
print()  # Empty print for spacing

# Try changing these messages and running the cell again!

## 1.2 Variables and Data Types

Variables store data. Python figures out the type automatically.

```
┌─────────────────────────────────────────────────────────────┐
│                  PYTHON DATA TYPES                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   STRING (str)      │ Text in quotes    │ "192.168.1.1"    │
│   INTEGER (int)     │ Whole numbers     │ 443, -1, 0       │
│   FLOAT (float)     │ Decimal numbers   │ 7.5, 3.14159     │
│   BOOLEAN (bool)    │ True/False        │ True, False      │
│   LIST (list)       │ Ordered sequence  │ [1, 2, 3]        │
│   DICTIONARY (dict) │ Key-value pairs   │ {"ip": "1.2.3.4"}│
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

In [None]:
# STRINGS - Text data (use quotes, single or double)
ip_address = "192.168.1.100"
hostname = "workstation-01"
alert_message = "Suspicious login detected"

# NUMBERS - Integers (whole) and floats (decimal)
port = 443
failed_attempts = 5
risk_score = 7.5

# BOOLEANS - True or False (note capitalization!)
is_malicious = True
is_whitelisted = False

# f-strings let you embed variables in text (put f before the quotes)
print(f"Alert: {alert_message}")
print(f"Source IP: {ip_address}:{port}")
print(f"Failed attempts: {failed_attempts}")
print(f"Risk score: {risk_score}")
print(f"Is malicious? {is_malicious}")

In [None]:
# Check the type of variables
print(f"ip_address is type: {type(ip_address)}")
print(f"port is type: {type(port)}")
print(f"risk_score is type: {type(risk_score)}")
print(f"is_malicious is type: {type(is_malicious)}")
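
Python picks a variable's type from its value. That means text read from logs or files arrives as `str`, even when it looks like a number. The minimal sketch below converts such strings before doing arithmetic. The `raw_port` and `raw_score` values are made-up examples.

```python
# Values parsed out of a log line are strings, even when they look numeric
raw_port = "443"
raw_score = "7.5"

port_number = int(raw_port)      # "443" -> 443 (int)
score_value = float(raw_score)   # "7.5" -> 7.5 (float)

print(port_number + 1)           # 444: numeric addition
print(raw_port + "1")            # 4431: string concatenation, not addition
print(isinstance(score_value, float))  # True

# int() raises ValueError if the text is not a whole number
try:
    int("not-a-port")
except ValueError as err:
    print(f"Conversion failed: {err}")
```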

## 1.3 Lists - Collections of Items

Lists hold multiple items in order. Use square brackets `[]`.

In [None]:
# List of suspicious IPs (IOCs - Indicators of Compromise)
suspicious_ips = ["10.0.0.5", "192.168.1.100", "172.16.0.50"]

# Access items by index (starts at 0, not 1!)
first_ip = suspicious_ips[0]    # "10.0.0.5"
second_ip = suspicious_ips[1]   # "192.168.1.100"
last_ip = suspicious_ips[-1]    # "172.16.0.50" (negative index = from end)

print(f"First IP: {first_ip}")
print(f"Last IP: {last_ip}")
print(f"All IPs: {suspicious_ips}")

# Add items to the list
suspicious_ips.append("10.10.10.10")
print(f"After append: {suspicious_ips}")

# Check if item exists
if "192.168.1.100" in suspicious_ips:
    print("⚠️ IP 192.168.1.100 is in the suspicious list!")

# Get count
print(f"Total suspicious IPs: {len(suspicious_ips)}")
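
Cleaning IOC feeds often calls for a few more list operations: removing duplicates, sorting, slicing, and removing items. The sketch below uses a made-up `feed` list. Note that sorting IP address strings puts them in alphabetical order, not numeric order.

```python
# Made-up feed that contains a duplicate entry
feed = ["10.0.0.5", "192.168.1.100", "10.0.0.5", "172.16.0.50"]

unique_ips = sorted(set(feed))  # set() drops duplicates; sorted() returns a new list
first_two = unique_ips[:2]      # slicing: items at index 0 and 1 (index 2 is excluded)

feed.remove("10.0.0.5")         # removes only the FIRST matching item

print(unique_ips)  # ['10.0.0.5', '172.16.0.50', '192.168.1.100'] (alphabetical order)
print(first_two)   # ['10.0.0.5', '172.16.0.50']
print(feed)        # ['192.168.1.100', '10.0.0.5', '172.16.0.50']
```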

## 1.4 Dictionaries - Key-Value Pairs

Dictionaries map keys to values (like a lookup table). Use curly braces `{}`.

In [None]:
# Security event as a dictionary
event = {
    "timestamp": "2024-01-15T10:30:00Z",
    "source_ip": "192.168.1.100",
    "destination_ip": "10.0.0.5",
    "port": 443,
    "action": "blocked",
    "severity": "high"
}

# Access values by key
print(f"Event severity: {event['severity']}")
print(f"Source: {event['source_ip']}")

# Add new key
event["analyst"] = "alice"
print(f"Assigned analyst: {event['analyst']}")

# Check if key exists
if "severity" in event:
    print("✓ Severity is defined")

# Loop through keys and values
print("\n📋 Full event details:")
for key, value in event.items():
    print(f"  {key}: {value}")
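
Reading a key that does not exist (for example `event['hostname']`) raises a `KeyError`. `dict.get(key, default)` returns a fallback value instead, which also makes it handy for counting. The minimal sketch below uses a made-up list of alerts, one of which has no `severity` key:

```python
sample_alerts = [
    {"id": 1, "severity": "high"},
    {"id": 2, "severity": "low"},
    {"id": 3, "severity": "high"},
    {"id": 4},  # no "severity" key at all
]

severity_counts = {}
for alert in sample_alerts:
    sev = alert.get("severity", "unknown")  # no KeyError when the key is missing
    severity_counts[sev] = severity_counts.get(sev, 0) + 1  # start at 0 for new keys

print(severity_counts)  # {'high': 2, 'low': 1, 'unknown': 1}
```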

---

# Part 2: Control Flow

Control flow determines what code runs and when.

## 2.1 If Statements - Making Decisions

In [None]:
# Threat classification based on score
threat_score = 8.5

# if-elif-else chain - Python checks each condition in order
# and runs the FIRST one that's True
if threat_score >= 9:
    severity = "CRITICAL"
    color = "🔴"
elif threat_score >= 7:
    severity = "HIGH"
    color = "🟠"
elif threat_score >= 4:
    severity = "MEDIUM"
    color = "🟡"
else:
    severity = "LOW"
    color = "🟢"

print(f"Score {threat_score} -> {color} Severity: {severity}")

# Try changing threat_score to see different results!
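
Only the first true branch runs, so the order of the `elif` checks matters. This sketch reuses the same thresholds and shows what happens when the broadest check comes first. `demo_score` is an illustrative value.

```python
demo_score = 9.5

# Wrong order: ">= 4" is also true for 9.5, so the CRITICAL branch is never reached
if demo_score >= 4:
    wrong_severity = "MEDIUM"
elif demo_score >= 9:
    wrong_severity = "CRITICAL"
else:
    wrong_severity = "LOW"

# Right order: highest (most specific) threshold first
if demo_score >= 9:
    right_severity = "CRITICAL"
elif demo_score >= 4:
    right_severity = "MEDIUM"
else:
    right_severity = "LOW"

print(f"Wrong order: {wrong_severity}")  # Wrong order: MEDIUM
print(f"Right order: {right_severity}")  # Right order: CRITICAL
```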

In [None]:
# Multiple conditions with AND / OR
failed_logins = 10
severity = "critical"

# AND - both conditions must be true
if failed_logins > 5 and severity in ["high", "critical"]:
    print("⚠️ Account lockout recommended")

# OR - either condition can be true
if severity == "critical" or failed_logins > 20:
    print("🚨 Immediate investigation required")

# Ternary (one-liner if/else) - useful for quick assignments
status = "blocked" if failed_logins > 3 else "allowed"
print(f"Login status: {status}")
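
When `and`, `or`, and `not` are mixed in one condition, Python evaluates `not` first, then `and`, then `or`. Parentheses make the intent explicit and can change the result. The sketch below uses made-up values:

```python
severity = "critical"
failed_logins = 25
is_whitelisted = True

# Without parentheses this reads as: critical OR (many failures AND not whitelisted)
implicit = severity == "critical" or failed_logins > 20 and not is_whitelisted

# With parentheses: (critical OR many failures) AND not whitelisted
explicit = (severity == "critical" or failed_logins > 20) and not is_whitelisted

print(f"Implicit grouping: {implicit}")  # True
print(f"Explicit grouping: {explicit}")  # False
```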

## 2.2 Loops - Repeating Actions

Loops let you run code multiple times.

In [None]:
# FOR LOOP - iterate over a sequence (list, string, range)
iocs = ["malware.exe", "evil.dll", "backdoor.ps1"]

print("🔍 Analyzing IOCs:")
for ioc in iocs:
    print(f"  Analyzing: {ioc}")
    if ioc.endswith(".exe"):
        print("    ⚠️ Executable detected!")
    elif ioc.endswith(".ps1"):
        print("    ⚠️ PowerShell script detected!")

# FOR with range - repeat a specific number of times
print("\n📊 Counting failed attempts:")
for i in range(1, 6):  # yields 1, 2, 3, 4, 5 (the stop value 6 is excluded)
    print(f"  Attempt {i}")

# FOR with enumerate - get both index and value
print("\n📋 Alert queue:")
alerts = ["Malware detected", "Port scan", "Brute force"]
for index, alert in enumerate(alerts):
    print(f"  Alert #{index + 1}: {alert}")

In [None]:
# WHILE LOOP - repeat until condition is false
attempts = 0
max_attempts = 3

print("🔐 Login simulation:")
while attempts < max_attempts:
    print(f"  Login attempt {attempts + 1}")
    attempts += 1
print("❌ Max attempts reached - account locked")
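
Two keywords control a loop from inside its body. `continue` skips to the next item, and `break` exits the loop entirely. The sketch below scans made-up log levels, skips routine entries, and stops at the first error:

```python
log_levels = ["INFO", "DEBUG", "WARN", "INFO", "ERROR", "WARN"]
notable = []

for level in log_levels:
    if level in ("INFO", "DEBUG"):
        continue  # skip routine entries and move to the next item
    notable.append(level)
    if level == "ERROR":
        print("🛑 First ERROR found - stopping scan")
        break     # leave the loop; the final "WARN" is never examined

print(f"Notable entries: {notable}")  # Notable entries: ['WARN', 'ERROR']
```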

---

# Part 3: Functions - Reusable Code

Functions let you package code for reuse. They take inputs (arguments), do something, and optionally return outputs.

In [None]:
# DEFINING A FUNCTION
#
# def function_name(argument1, argument2):  <-- name and inputs
#     """Docstring - explains what function does"""  <-- documentation
#     # code here
#     return result  <-- output (optional)

def calculate_risk_score(failed_logins: int, is_admin: bool, is_after_hours: bool) -> float:
    """
    Calculate risk score based on login behavior.

    This docstring explains:
    - What the function does
    - What arguments it takes
    - What it returns

    Args:
        failed_logins: Number of failed login attempts
        is_admin: Whether the account is an admin
        is_after_hours: Whether the attempt is outside business hours

    Returns:
        Risk score from 0.0 to 10.0
    """
    score = 0.0

    # Base score from failed logins (cap at 5 points)
    score += min(failed_logins, 5)

    # Admin accounts are higher risk
    if is_admin:
        score += 3.0

    # After-hours activity is suspicious
    if is_after_hours:
        score += 2.0

    return min(score, 10.0)  # Cap at 10

# USING THE FUNCTION
risk = calculate_risk_score(failed_logins=4, is_admin=True, is_after_hours=True)
print(f"Risk score: {risk}/10")

if risk >= 7:
    print("🚨 HIGH RISK - Investigate immediately")
elif risk >= 4:
    print("⚠️ MEDIUM RISK - Review within 24 hours")
else:
    print("✅ LOW RISK - Log for reference")
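
Functions can also give arguments default values and return more than one result as a tuple. In this minimal sketch, `summarize_logins` and its `"fail"`/`"ok"` markers are hypothetical and exist only for illustration:

```python
def summarize_logins(results: list, threshold: int = 3) -> tuple:
    """Count failed logins and report whether they exceed a threshold.

    Args:
        results: Login outcomes such as "ok" or "fail" (hypothetical format)
        threshold: Number of failures allowed before flagging (default 3)

    Returns:
        (failure_count, should_flag)
    """
    failures = results.count("fail")        # list.count() counts exact matches
    return failures, failures > threshold   # two values packed into a tuple

outcomes = ["fail", "ok", "fail", "fail", "fail"]

count, flagged = summarize_logins(outcomes)               # uses threshold=3
print(f"{count} failures, flag={flagged}")                # 4 failures, flag=True

count, flagged = summarize_logins(outcomes, threshold=5)  # override the default
print(f"{count} failures, flag={flagged}")                # 4 failures, flag=False
```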

In [None]:
# Another example: IP classification function
def is_private_ip(ip: str) -> bool:
    """
    Check if an IP address is in a private range (RFC 1918).

    Private ranges:
        - 10.0.0.0/8     (10.0.0.0 - 10.255.255.255)
        - 172.16.0.0/12  (172.16.0.0 - 172.31.255.255)
        - 192.168.0.0/16 (192.168.0.0 - 192.168.255.255)

    Args:
        ip: IPv4 address as string (e.g., "192.168.1.1"); assumed to be
            well-formed, since this function does not validate its input

    Returns:
        True if private, False if public
    """
    # Split IP into octets (the 4 numbers)
    octets = [int(x) for x in ip.split(".")]

    # Check private ranges
    # 10.0.0.0/8
    if octets[0] == 10:
        return True
    # 172.16.0.0/12 (172.16.x.x through 172.31.x.x)
    if octets[0] == 172 and 16 <= octets[1] <= 31:
        return True
    # 192.168.0.0/16
    if octets[0] == 192 and octets[1] == 168:
        return True

    return False

# Test with different IPs
test_ips = ["192.168.1.1", "8.8.8.8", "10.0.0.1", "172.16.50.1", "1.1.1.1"]
print("🌐 IP Classification:")
for ip in test_ips:
    ip_type = "🏠 Private" if is_private_ip(ip) else "🌍 Public"
    print(f"  {ip}: {ip_type}")
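
The hand-written check above trusts its input. For example, it reports `"10.999.0.1"` as private even though that is not a valid address. Python's standard-library `ipaddress` module can validate the address and test network membership. This sketch assumes invalid input should simply return `False`; `is_rfc1918` is a hypothetical name. The module also offers `ip_address(...).is_private`, but that covers more than RFC 1918, including loopback and link-local ranges.

```python
import ipaddress

# The three RFC 1918 private ranges
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(ip: str) -> bool:
    """Return True only for a valid IPv4 address inside an RFC 1918 range."""
    try:
        addr = ipaddress.IPv4Address(ip)  # raises for malformed input like "10.999.0.1"
    except ValueError:                    # AddressValueError is a subclass of ValueError
        return False
    return any(addr in network for network in RFC1918_NETWORKS)

for candidate in ["172.16.50.1", "172.32.0.1", "8.8.8.8", "10.999.0.1", "not-an-ip"]:
    print(f"  {candidate}: {is_rfc1918(candidate)}")
```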

---

# Part 4: Regular Expressions for Security

Regular expressions (regex) are patterns for matching text. They're essential for extracting IOCs (Indicators of Compromise) from logs and reports.

```
┌─────────────────────────────────────────────────────────────┐
│              COMMON REGEX PATTERNS FOR SECURITY              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   \d       → Any digit (0-9)                               │
│   \w       → Any word character (a-z, A-Z, 0-9, _)         │
│   \s       → Any whitespace (space, tab, newline)          │
│   .        → Any character (except newline)                │
│   +        → One or more of the previous                   │
│   *        → Zero or more of the previous                  │
│   ?        → Zero or one of the previous                   │
│   {n}      → Exactly n of the previous                     │
│   {n,m}    → Between n and m of the previous               │
│   [abc]    → Any character in the set                      │
│   [a-z]    → Any character in the range                    │
│   \b       → Word boundary                                 │
│   ^        → Start of string                               │
│   $        → End of string                                 │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
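
`re.findall()` returns every match. `re.search()` returns only the first match (or `None`), and parentheses in the pattern capture parts of it. The sketch below pulls fields out of one made-up, SSH-style log line using named groups `(?P<name>...)`:

```python
import re

log_line = "Failed password for admin from 203.0.113.7 port 2222"

# (?P<name>...) captures a group you can read back by name
pattern = r"for (?P<user>\w+) from (?P<ip>[\d.]+) port (?P<port>\d+)"

match = re.search(pattern, log_line)  # first match only, or None
if match:
    user_name = match.group("user")
    src_ip = match.group("ip")
    src_port = int(match.group("port"))  # groups are strings; convert as needed
    print(f"user={user_name} ip={src_ip} port={src_port}")
else:
    print("No match")
```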

In [None]:
import re  # The regex module

# EXTRACTING IPs FROM TEXT
# The r"" means "raw string" - treats backslashes literally
# Pattern breakdown:
#   \b           - word boundary (so we don't match partial numbers)
#   (?:\d{1,3}\.){3}  - three groups of 1-3 digits followed by a dot
#   \d{1,3}      - final group of 1-3 digits
#   \b           - word boundary

log_line = "Failed login from 192.168.1.100 to 10.0.0.5 at 2024-01-15 10:30:00"

ip_pattern = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"
ips = re.findall(ip_pattern, log_line)  # findall returns ALL matches

print(f"📍 Log line: {log_line}")
print(f"🔍 Found IPs: {ips}")
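
The IP pattern only checks the shape `digits.digits.digits.digits`, so it also matches impossible addresses such as `999.300.1.1`. The sketch below keeps the regex for finding candidates and uses the standard-library `ipaddress` module to discard invalid ones. `extract_valid_ips` is a hypothetical helper name.

```python
import ipaddress
import re

IP_PATTERN = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"

def extract_valid_ips(text: str) -> list:
    """Find IPv4-looking strings, then keep only real IPv4 addresses."""
    valid = []
    for candidate in re.findall(IP_PATTERN, text):
        try:
            ipaddress.IPv4Address(candidate)  # raises ValueError for invalid addresses, e.g. an octet > 255
        except ValueError:
            continue
        valid.append(candidate)
    return valid

sample = "Traffic from 10.0.0.5 and 999.300.1.1 to 8.8.8.8"
print(extract_valid_ips(sample))  # ['10.0.0.5', '8.8.8.8']
```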

In [None]:
# EXTRACTING FILE HASHES
# Hashes are fixed-length hexadecimal strings:
#   MD5:     32 characters (e.g., d41d8cd98f00b204e9800998ecf8427e)
#   SHA1:    40 characters
#   SHA256:  64 characters

text = """
Malware analysis report:
MD5: d41d8cd98f00b204e9800998ecf8427e
SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
Additional sample: MD5: abc123def456abc123def456abc12345
"""

# Patterns: [a-fA-F0-9] matches hex characters, {32} means exactly 32
md5_pattern = r"\b[a-fA-F0-9]{32}\b"
sha256_pattern = r"\b[a-fA-F0-9]{64}\b"

md5_hashes = re.findall(md5_pattern, text)
sha256_hashes = re.findall(sha256_pattern, text)

print(f"🔐 MD5 hashes found ({len(md5_hashes)}):")
for h in md5_hashes:
    print(f"    {h}")

print(f"\n🔐 SHA256 hashes found ({len(sha256_hashes)}):")
for h in sha256_hashes:
    print(f"    {h}")
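
The comments above list SHA1 (40 hex characters), but the cell only searches for MD5 and SHA256. One way to cover all three is to grab every run of 32 to 64 hex characters and classify it by length. In this sketch, `find_hashes` and `HASH_LENGTHS` are hypothetical names. A hex string of the right length is not guaranteed to be a hash.

```python
import re

# Hex length -> hash algorithm that produces digests of that length
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256"}

def find_hashes(text: str) -> dict:
    """Group hex strings of length 32/40/64 by likely hash type."""
    found = {name: [] for name in HASH_LENGTHS.values()}
    # Match whole runs of 32-64 hex characters bounded by non-word characters
    for candidate in re.findall(r"\b[a-fA-F0-9]{32,64}\b", text):
        algo = HASH_LENGTHS.get(len(candidate))  # None for other lengths, e.g. 50
        if algo:
            found[algo].append(candidate.lower())  # normalize case for comparisons
    return found

sample = """
MD5: d41d8cd98f00b204e9800998ecf8427e
SHA1: da39a3ee5e6b4b0d3255bfef95601890afd80709
SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
"""
for algo, hashes in find_hashes(sample).items():
    print(f"{algo}: {hashes}")
```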

---

# Part 5: Working with Files

Security work involves reading logs, writing reports, and parsing structured data.

```
┌─────────────────────────────────────────────────────────────┐
│                  FILE OPERATIONS                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   OPEN MODES:                                               │
│   "r"  → Read (file must exist)                            │
│   "w"  → Write (creates new or overwrites)                 │
│   "a"  → Append (adds to end of file)                      │
│   "r+" → Read and write                                    │
│                                                             │
│   COMMON FORMATS:                                           │
│   .txt  → Plain text (logs, blocklists)                    │
│   .csv  → Comma-separated values (alerts, events)          │
│   .json → Structured data (API responses, configs)         │
│                                                             │
│   THE 'with' STATEMENT:                                     │
│   with open("file.txt", "r") as f:                         │
│       content = f.read()                                   │
│   # File automatically closed after the block              │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

In [None]:
import json

# =========================================
# WRITING AND READING JSON FILES
# =========================================
# JSON is one of the most common formats for security data (API responses, configs)

# Create sample threat data
threats = [
    {"ip": "45.33.32.156", "type": "c2", "score": 9.5},
    {"ip": "185.220.101.1", "type": "scanner", "score": 6.0},
]

# WRITE to JSON file
# indent=2 makes it human-readable with nice formatting
with open("threats.json", "w") as f:
    json.dump(threats, f, indent=2)
print("✅ Wrote threats.json")

# READ from JSON file
with open("threats.json", "r") as f:
    loaded = json.load(f)

print(f"📂 Loaded {len(loaded)} threats from file:")
print(json.dumps(loaded, indent=2))

# =========================================
# WORKING WITH TEXT FILES
# =========================================

# Write a blocklist
blocked_ips = ["10.0.0.5", "172.16.0.50", "192.168.1.100"]
with open("blocklist.txt", "w") as f:
    for ip in blocked_ips:
        f.write(f"{ip}\n")  # \n adds newline
print("\n✅ Wrote blocklist.txt")

# Read and process line by line
print("📋 Reading blocklist:")
with open("blocklist.txt", "r") as f:
    for line in f:
        ip = line.strip()  # Remove whitespace/newline
        print(f"  Blocking: {ip}")
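
The learning objectives and the box above mention CSV files, but the cell above covers only JSON and plain text. Python's built-in `csv` module handles headers and quoting for you. This minimal sketch writes made-up alerts to a hypothetical `alerts.csv` and reads back only the high-severity rows:

```python
import csv

# Made-up alert records (same shape as the event dictionary in Part 1)
alert_rows = [
    {"timestamp": "2024-01-15T10:30:00Z", "source_ip": "192.168.1.100", "severity": "high"},
    {"timestamp": "2024-01-15T10:31:10Z", "source_ip": "10.0.0.5", "severity": "low"},
]
fieldnames = ["timestamp", "source_ip", "severity"]

# The csv module docs recommend newline="" when opening CSV files
with open("alerts.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()          # first row: column names
    writer.writerows(alert_rows)  # one row per dictionary

# DictReader turns each row into a dict keyed by the header row (all values are strings)
with open("alerts.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    high_severity = [row for row in reader if row["severity"] == "high"]

print(f"High-severity alerts: {len(high_severity)}")
for row in high_severity:
    print(f"  {row['timestamp']} {row['source_ip']}")
```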

---

# Part 6: Making API Requests

APIs (Application Programming Interfaces) let you interact with external services programmatically.

```
┌─────────────────────────────────────────────────────────────┐
│                  HTTP BASICS                                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   HTTP METHODS:                                             │
│   GET    → Retrieve data (read-only)                       │
│   POST   → Submit/create data                              │
│   PUT    → Update existing data                            │
│   DELETE → Remove data                                     │
│                                                             │
│   STATUS CODES:                                             │
│   200 → OK (success)                                       │
│   201 → Created                                            │
│   400 → Bad Request (your error)                           │
│   401 → Unauthorized (need auth)                           │
│   403 → Forbidden (not allowed)                            │
│   404 → Not Found                                          │
│   429 → Too Many Requests (rate limited)                   │
│   500 → Server Error                                       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

In [None]:
import requests

# =========================================
# MAKING API REQUESTS
# =========================================

# SIMPLE GET REQUEST
response = requests.get("https://httpbin.org/ip", timeout=5)
print(f"Status Code: {response.status_code}")
print(f"Response: {response.json()}")

# GET WITH QUERY PARAMETERS
# requests encodes params into the URL query string: ?query=...&limit=10
params = {"query": "python security", "limit": 10}
response = requests.get("https://httpbin.org/get", params=params, timeout=5)
print(f"\n📤 Sent parameters: {response.json()['args']}")

# =========================================
# ERROR HANDLING (IMPORTANT!)
# =========================================

def safe_api_request(url: str, timeout: int = 5) -> dict:
    """
    Make an API request with proper error handling.

    Always handle errors - APIs can fail for many reasons:
    - Network issues
    - Rate limiting
    - Server errors
    - Invalid responses
    """
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # Raises exception for 4xx/5xx status codes
        return {"success": True, "data": response.json()}
    except requests.Timeout:
        return {"success": False, "error": "Request timed out"}
    except requests.HTTPError as e:
        return {"success": False, "error": f"HTTP {e.response.status_code}"}
    except requests.RequestException as e:
        return {"success": False, "error": str(e)}

# Test with a working URL
result = safe_api_request("https://httpbin.org/ip")
print(f"\n✅ Good request: {result}")

# Test with a bad URL
result = safe_api_request("https://httpbin.org/status/404")
print(f"❌ Bad request: {result}")
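
The status-code box lists 429 (rate limited), which threat-intel APIs return when you send requests too fast. A common pattern is to wait and retry with exponential backoff, honoring the `Retry-After` header when the server sends one. This sketch rests on a few assumptions: `get_with_retry` is a hypothetical helper, it retries only on 429, and it handles only the seconds form of `Retry-After`. It takes a session object so you can try it without network access; pass `requests.Session()` for real use.

```python
import time

def get_with_retry(url: str, session, max_retries: int = 3,
                   base_delay: float = 1.0, timeout: int = 5):
    """GET a URL, retrying on HTTP 429 with exponential backoff.

    session: any object with a requests-style .get(url, timeout=...) method,
             e.g. requests.Session(). max_retries must be >= 0.
    """
    for attempt in range(max_retries + 1):
        response = session.get(url, timeout=timeout)
        # Done if not rate limited, or if we are out of retries
        if response.status_code != 429 or attempt == max_retries:
            return response
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)           # server said how long to wait
        else:
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ... by default
        print(f"  429 received, retrying in {delay:.1f}s")
        time.sleep(delay)
```

For example, `get_with_retry("https://httpbin.org/ip", requests.Session())` returns a normal `Response` when no rate limiting occurs. Like `requests.get`, it does not raise on 4xx/5xx codes, so you still check `status_code` (or call `raise_for_status()`).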

---

# Part 7: Putting It All Together - IOC Extractor

Now let's combine everything you've learned to build a real security tool!

In [None]:
# =========================================
# IOC EXTRACTOR - A REAL SECURITY TOOL!
# =========================================
# This combines: functions, regex, dictionaries, and loops

import re  # already imported earlier; repeated so this cell runs on its own

def extract_iocs(text: str) -> dict:
    """
    Extract Indicators of Compromise (IOCs) from text.

    This is a common task in security operations:
    - Parsing threat reports
    - Analyzing phishing emails
    - Processing incident tickets

    Args:
        text: Any text that might contain IOCs

    Returns:
        Dictionary of IOC types and their values
    """
    iocs = {
        # IPv4 addresses
        "ips": re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text),

        # MD5 hashes (32 hex characters)
        "md5": re.findall(r"\b[a-fA-F0-9]{32}\b", text),

        # SHA256 hashes (64 hex characters)
        "sha256": re.findall(r"\b[a-fA-F0-9]{64}\b", text),

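        # Domains: dot-separated labels ending in a letters-only TLD (simplified;
        # it also picks up domains inside emails/URLs and file names like payload.exe)
        "domains": re.findall(r"\b(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,}\b", text),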

        # URLs (http or https)
        "urls": re.findall(r'https?://[^\s<>"]+', text),

        # Email addresses (user@domain.tld)
        "emails": re.findall(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', text),
    }

    # Remove duplicates (keeping first-seen order) and empty categories
    # dict.fromkeys() keeps the first occurrence of each value; list() converts back
    return {k: list(dict.fromkeys(v)) for k, v in iocs.items() if v}


# Test with a sample threat report
threat_report = """
THREAT INTELLIGENCE REPORT
==========================

The APT group deployed a new variant of their malware toolkit.

Initial Access: Phishing email from attacker@malicious-corp.com
Subject: "Invoice #12345 - Payment Required"

The malware connects to these C2 servers:
- 45.33.32.156 (primary)
- 185.220.101.1 (backup)
- malware-c2.evil-domain.com

Malware Hashes:
- MD5: d41d8cd98f00b204e9800998ecf8427e
- SHA256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

The payload downloads additional tools from:
https://malware.example.com/beacon
https://update-service.evil-domain.com/payload.exe

Contact security@your-company.com if you observe this activity.
"""

# Extract and display IOCs
print("🔍 EXTRACTING IOCs FROM THREAT REPORT")
print("=" * 50)

extracted = extract_iocs(threat_report)

for ioc_type, values in extracted.items():
    print(f"\n📌 {ioc_type.upper()} ({len(values)} found):")
    for v in values:
        print(f"    • {v}")
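
Threat reports often publish IOCs in "defanged" form (for example `hxxps://evil[.]example` or `45.33.32[.]156`) so nobody can click them by accident. The patterns in `extract_iocs` miss these. The sketch below is a pre-processing step that covers a few common conventions. `refang` is a hypothetical helper, and real reports use more variants than these.

```python
import re

def refang(text: str) -> str:
    """Undo a few common defanging conventions before IOC extraction."""
    text = re.sub(r"hxxp", "http", text, flags=re.IGNORECASE)  # hxxp(s):// -> http(s)://
    text = re.sub(r"\[\.\]|\(\.\)|\[dot\]", ".", text, flags=re.IGNORECASE)  # [.] (.) [dot] -> .
    text = text.replace("[@]", "@")                              # user[@]example.com -> user@example.com
    return text

defanged = "Beacon to hxxps://evil[.]example[.]com from 45.33.32[.]156, contact bad[@]example[.]net"
print(refang(defanged))
# Beacon to https://evil.example.com from 45.33.32.156, contact bad@example.net
```

Calling `extract_iocs(refang(threat_report))` then runs the same patterns on the cleaned text.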

---

# 🎉 Congratulations!

You've learned Python basics in a security context! You can now:

✅ Write Python scripts with variables, functions, and control flow
✅ Work with lists, dictionaries, and files
✅ Use regular expressions to extract IOCs
✅ Make API requests with error handling

## Quick Reference

```python
# STRINGS
text = "Hello"
text.lower()           # "hello"
text.upper()           # "HELLO"
text.split(",")        # Split into list
"x" in text            # Check if contains
f"Value: {var}"        # f-string formatting

# LISTS
items = [1, 2, 3]
items.append(4)        # Add item
items[0]               # First item
items[-1]              # Last item
len(items)             # Length
[x*2 for x in items]   # List comprehension

# DICTIONARIES
d = {"key": "value"}
d["key"]               # Get value
d.get("key", "default") # Get with default
d.keys()               # All keys
d.values()             # All values
d.items()              # Key-value pairs

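# FILES
import json
with open("file.txt", "r") as f:   # Read ("w" = write, "a" = append)
    content = f.read()
with open("data.json", "r") as f:
    data = json.load(f)            # Read JSON
with open("data.json", "w") as f:
    json.dump(data, f, indent=2)   # Write JSON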

# REGEX
import re
re.findall(pattern, text)  # Find all matches
re.search(pattern, text)   # Find first match
re.sub(pattern, repl, text) # Replace

# API REQUESTS
import requests
response = requests.get(url, timeout=5)
response.status_code   # HTTP status
response.json()        # Parse JSON response
```

## Next Steps

Continue your learning journey:

| Lab | Topic | What You'll Build |
|-----|-------|-------------------|
| **00b** | ML Concepts Primer | Understanding machine learning |
| **00f** | Hello World ML | Your first classifier |
| **00g** | Working with APIs | API integration skills |
| **01** | Phishing Classifier | Email threat detection |

## Practice Exercises

Try these on your own:

1. **Failed Login Analyzer**: Read login events, count failures per user, flag users with >3 failures
2. **IOC Blocklist Generator**: Validate IPs, filter private addresses, write blocklist file
3. **Log Monitor**: Parse log file, extract ERROR/WARN messages, group by hour

---

*You're ready for more advanced security AI labs! 🚀*