
# Introduction to Cybersecurity: Why Security Matters in Code
### Brendan Shea, PhD

Cybersecurity is the practice of protecting systems, networks, and programs from digital attacks. In today's interconnected world, nearly every aspect of our lives involves digital information that needs protection.

As beginning programmers, understanding cybersecurity fundamentals will help you:

* Write safer code that protects user data
* Understand potential vulnerabilities in applications
* Develop habits that prevent security breaches
  * Validating user input
  * Securing sensitive information
  * Testing for common security flaws
* Prepare for careers in an increasingly security-conscious industry

**Cybersecurity** refers to the body of technologies, processes, and practices designed to protect networks, devices, programs, and data from attack, damage, or unauthorized access.

**Malicious actors** are individuals or groups who attempt to exploit vulnerabilities in software and hardware for various purposes, including stealing data, causing disruption, or gaining unauthorized access to systems.

Remember: Security isn't something added at the end of development—it should be considered from the very beginning of any project!

# Understanding Text in Python: Strings, Unicode, and ASCII

Before we can work with cybersecurity concepts, we need to understand how computers store and process text. In Python, text is represented as **strings**, which are sequences of characters.

Each character in a string is represented by a numeric code:

* **ASCII** (American Standard Code for Information Interchange) is an older standard that uses 7 bits to represent 128 different characters
  * Covers only unaccented English letters, digits, basic punctuation, and control codes
  * Cannot represent accented letters or other writing systems

* **Unicode** is a modern standard that can represent virtually every character from all writing systems worldwide
  * Includes characters from all languages, mathematical symbols, emojis, and more
  * Python 3 uses Unicode for all strings by default

* **UTF-8** (Unicode Transformation Format 8-bit) is the most common encoding for Unicode
  * A variable-width encoding that uses between 1 and 4 bytes per character
  * ASCII characters use just 1 byte (efficient for English text)
  * Characters from other languages use 2-4 bytes
  * This makes UTF-8 both compact and universal
  * It's the dominant encoding for the web and most software

In [None]:
# Let's examine how UTF-8 uses different numbers of bytes per character
for char in "ABCйあ😎":
    char_bytes = char.encode('utf-8')
    print(f"Character: {char}, UTF-8 bytes: {char_bytes}, Length: {len(char_bytes)} bytes")


Character: A, UTF-8 bytes: b'A', Length: 1 bytes
Character: B, UTF-8 bytes: b'B', Length: 1 bytes
Character: C, UTF-8 bytes: b'C', Length: 1 bytes
Character: й, UTF-8 bytes: b'\xd0\xb9', Length: 2 bytes
Character: あ, UTF-8 bytes: b'\xe3\x81\x82', Length: 3 bytes
Character: 😎, UTF-8 bytes: b'\xf0\x9f\x98\x8e', Length: 4 bytes


Here's how characters map to their numeric values:

| Character | ASCII Value | Unicode Value (decimal) |
|-----------|-------------|-------------------------|
| 'A'       | 65          | 65                      |
| 'B'       | 66          | 66                      |
| 'Z'       | 90          | 90                      |
| 'a'       | 97          | 97                      |
| 'z'       | 122         | 122                     |
| '!'       | 33          | 33                      |
| 'й' (Cyrillic) | N/A (not in ASCII) | 1081       |
| '東' (Japanese) | N/A (not in ASCII) | 26481      |

Let's explore working with character codes in Python:

In [None]:
# Working with character codes - Spy Communication System Basics

# The ord() function gets the numeric value of a character
print("Character to code conversion:")
print(f"The code for 'A' is: {ord('A')}")
print(f"The code for 'a' is: {ord('a')}")
print(f"The code for '!' is: {ord('!')} \n")


Character to code conversion:
The code for 'A' is: 65
The code for 'a' is: 97
The code for '!' is: 33 



In [None]:

# The chr() function converts a numeric code back to a character
print("Code to character conversion:")
print(f"The character for code 77 is: {chr(77)}")  # M
print(f"The character for code 105 is: {chr(105)}")  # i
print(f"The character for code 54 is: {chr(54)} \n")  # 6


Code to character conversion:
The character for code 77 is: M
The character for code 105 is: i
The character for code 54 is: 6 



In [None]:
# A spy might examine a message character by character
secret_codename = "Agent007"
print(f"Examining codename: {secret_codename}")
for char in secret_codename:
    print(f"Character: {char}, Code: {ord(char)}")

Examining codename: Agent007
Character: A, Code: 65
Character: g, Code: 103
Character: e, Code: 101
Character: n, Code: 110
Character: t, Code: 116
Character: 0, Code: 48
Character: 0, Code: 48
Character: 7, Code: 55


# Essential String Methods for Cryptography

Python's built-in string methods provide powerful tools for manipulating text, which will be essential for our cybersecurity work. Let's explore some key methods through examples:

* **upper()** and **lower()**: Convert text to uppercase or lowercase
  * Useful for normalizing text
  * Example: `"Secret".upper()` returns `"SECRET"`

* **join()**: Combines a list of strings with a specified separator
  * Great for reassembling processed characters
  * Example: `"-".join(['C', 'I', 'A'])` returns `"C-I-A"`

* **split()**: Divides a string into a list based on a delimiter
  * Helpful for breaking messages into processable chunks
  * Example: `"Operation Midnight".split()` returns `['Operation', 'Midnight']`

* **replace()**: Substitutes specified text with new text
  * Essential for substitution operations
  * Example: `"Agent".replace('A', '4')` returns `"4gent"`

Let's explore these methods:

In [None]:
# A classified message
message = "Meet Agent X at the Blue Parrot Cafe"

# 1. Converting case
upper_message = message.upper()
lower_message = message.lower()

print("Original message:", message)
print("Uppercase:", upper_message)
print("Lowercase:", lower_message)

Original message: Meet Agent X at the Blue Parrot Cafe
Uppercase: MEET AGENT X AT THE BLUE PARROT CAFE
Lowercase: meet agent x at the blue parrot cafe


In [None]:
# 2. Splitting strings
words = message.split()
print("\nSplit into words:", words)
print("Number of words:", len(words))

# Split by a specific character
parts = message.split('a')
print("Split by 'a':", parts)


Split into words: ['Meet', 'Agent', 'X', 'at', 'the', 'Blue', 'Parrot', 'Cafe']
Number of words: 8
Split by 'a': ['Meet Agent X ', 't the Blue P', 'rrot C', 'fe']


In [None]:
# 3. Joining strings
code_name = ['S', 'P', 'E', 'C', 'T', 'R', 'E']
joined_name = "".join(code_name)
print("\nJoined characters:", joined_name)

# Join with a separator
dash_name = "-".join(code_name)
print("Joined with dashes:", dash_name)


Joined characters: SPECTRE
Joined with dashes: S-P-E-C-T-R-E


In [None]:
# 4. Replacing text
redacted = message.replace("Agent X", "[REDACTED]")
print("\nRedacted message:", redacted)

# Multiple replacements can be chained
coded = message.replace('e', '3').replace('a', '4').replace('t', '7')
print("Basic letter substitution:", coded)


Redacted message: Meet [REDACTED] at the Blue Parrot Cafe
Basic letter substitution: M337 Ag3n7 X 47 7h3 Blu3 P4rro7 C4f3


**String immutability** means that strings cannot be modified after creation - operations create new strings instead. This is why we need to capture the result when using these methods:


In [None]:
# Demonstrating string immutability
codename = "SKYFALL"
print(f"Original codename: {codename}")

# This doesn't change codename!
codename.replace('S', '$')
print(f"After replace without assignment: {codename}")  # Still "SKYFALL"

Original codename: SKYFALL
After replace without assignment: SKYFALL


In [None]:

# This works because we create a new string and reassign
codename = codename.replace('S', '$')
print(f"After replace with assignment: {codename}")  # Now "$KYFALL"

After replace with assignment: $KYFALL


These string methods provide the building blocks for text manipulation that we'll use in more complex cryptographic operations later.

# Math Operations in Python: The Building Blocks of Encryption

Before we can design encryption algorithms, we need to understand the mathematical operations that make them work. Cryptography relies on several fundamental mathematical concepts that we'll explore in this section. These concepts might seem simple at first, but they form the cornerstone of even the most sophisticated encryption systems used today.

## The Modulo Operation (%)

The **modulo operation** is perhaps the most important mathematical concept in basic cryptography. It gives us the remainder left over when one number is divided by another. This might sound simple, but it allows us to create "wrapping" behavior that's essential for many ciphers.

When we write `a % b`, we're asking "what's the remainder when a is divided by b?" For example:

In [None]:
# Basic modulo examples
print(f"5 % 3 = {5 % 3}")
print(f"17 % 5 = {17 % 5}")

5 % 3 = 2
17 % 5 = 2



The modulo operation is similar to how a clock works. After we reach 12, we "wrap around" to 1 again. In mathematical terms, we're performing calculations in "modulo 12" on a clock.

Imagine we're in a spy movie where agents need to schedule a meeting at a certain number of hours from now. If it's currently 9 o'clock and we need to meet 5 hours later, we'd calculate: (9 + 5) % 12 = 2 o'clock.

In [None]:
# Clock arithmetic example
current_hour = 9
hours_to_add = 5
meeting_time = (current_hour + hours_to_add) % 12
# If result is 0, it means 12 o'clock
meeting_time = 12 if meeting_time == 0 else meeting_time
print(f"Meeting time: {meeting_time} o'clock")  # 2 o'clock

Meeting time: 2 o'clock


This wrapping behavior is crucial for our encryption algorithms, especially when we need to shift letters and wrap around the alphabet. When we shift 'Z' forward, we need to wrap back to 'A', just like hours on a clock.

## Integer Division (//)

While regular division (/) gives us a decimal result, **integer division** (//) gives us only the whole number quotient, discarding any remainder. This is useful when we want to know how many complete units exist without caring about fractional parts.

Integer division and modulo are complementary operations: division tells us how many times one number fits completely into another, while modulo tells us what's left over.


In cryptography, integer division is often used alongside modulo to break numbers into components. For instance, when working with large numbers in advanced encryption, we might need to split them into blocks of a certain size.
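A quick sketch of how the two operators fit together: for positive integers, `(a // b) * b + a % b` rebuilds `a`, and Python's built-in `divmod()` returns both pieces at once. The sample values and the block size of 100 are illustrative choices, not part of any real cipher.

```python
# Integer division and modulo are two halves of the same division
a, b = 17, 5
quotient = a // b      # how many whole times 5 fits into 17
remainder = a % b      # what is left over

print(f"{a} // {b} = {quotient}")
print(f"{a} % {b} = {remainder}")
print(f"Rebuilt: {quotient} * {b} + {remainder} = {quotient * b + remainder}")

# divmod() returns (quotient, remainder) in one call
print(f"divmod({a}, {b}) = {divmod(a, b)}")

# Splitting a number into two-digit blocks (block size 100 is an arbitrary choice)
number = 123456
blocks = []
while number > 0:
    number, block = divmod(number, 100)
    blocks.append(block)
blocks.reverse()
print(f"Blocks of 123456: {blocks}")
```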

## Prime Numbers in Cryptography

**Prime numbers** are integers greater than 1 that are only divisible by 1 and themselves. Examples include 2, 3, 5, 7, 11, and 13. What makes prime numbers special for cryptography is how they behave when multiplied together.

When two prime numbers are multiplied, the result can only be factored back into those original primes. This property becomes extremely important in advanced encryption like RSA, where security relies on the difficulty of factoring large numbers back into their prime components.

While we won't implement advanced prime-based cryptography in this course, understanding the significance of primes provides insight into why modern encryption is secure. Even with the world's fastest computers, factoring the product of two very large primes (a 2048-bit RSA modulus has 617 decimal digits) remains computationally infeasible with publicly known methods.
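To get a feel for the asymmetry, here is a minimal sketch that multiplies two small primes and then recovers them by trial division. The helper names `smallest_factor` and `is_prime` are made up for this illustration. Trial division is only practical for small numbers, which is exactly the point: the search grows far faster than the multiplication does.

```python
def smallest_factor(n):
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    candidate = 2
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate
        candidate += 1
    return n

def is_prime(n):
    """A number is prime if it is greater than 1 and has no smaller factor."""
    return n > 1 and smallest_factor(n) == n

# Multiplying two primes takes a single step...
p, q = 17, 23
product = p * q
print(f"{p} * {q} = {product}")

# ...but recovering them means searching for a divisor
found = smallest_factor(product)
print(f"Factors of {product}: {found} and {product // found}")
print(f"Is {product} prime? {is_prime(product)}")
```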

## Combining Operations for Character Transformations

In cryptography, we often convert characters to their numeric codes, perform mathematical operations on those codes, and then convert back to characters. The operations we've discussed are perfect for this.

Consider a simple transformation where we want to shift the letter 'A' forward by 3 positions to get 'D':

1. Convert 'A' to its ASCII value: 65
2. Add the shift value: 65 + 3 = 68
3. Convert the result back to a character: chr(68) = 'D'

But what if we shift 'Z' (ASCII 90) forward by 3? We'd get ASCII 93, which isn't a standard English letter. This is where modulo helps us wrap around the alphabet:

1. Convert 'Z' to its position in the alphabet: 25 (where A=0, B=1, ..., Z=25)
2. Add the shift: 25 + 3 = 28
3. Apply modulo to wrap around: 28 % 26 = 2
4. Convert position 2 back to a letter: 'C'

This small example demonstrates how we'll combine these mathematical operations to create encryption algorithms in the coming sections.
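The four wrap-around steps above can be sketched directly in Python. The variable names are illustrative.

```python
letter = 'Z'
shift = 3

position = ord(letter) - ord('A')          # 'Z' -> 25
shifted = (position + shift) % 26          # (25 + 3) % 26 = 2
wrapped_letter = chr(shifted + ord('A'))   # 2 -> 'C'

# Adding the shift to the raw code runs past the end of the alphabet
print(f"Without wrapping: {chr(ord(letter) + shift)!r}")
print(f"With wrapping:    {wrapped_letter!r}")
```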

| Operation | Symbol | Example | Result | Use in Cryptography |
|-----------|--------|---------|--------|---------------------|
| Addition | `+` | `5 + 3` | `8` | Shifting characters forward |
| Subtraction | `-` | `10 - 7` | `3` | Shifting characters backward |
| Modulo | `%` | `17 % 5` | `2` | Wrapping around the alphabet |
| Integer Division | `//` | `17 // 5` | `3` | Breaking numbers into blocks |
| Exponentiation | `**` | `2 ** 3` | `8` | Used in advanced encryption |

**Modular arithmetic** forms the mathematical foundation of many encryption techniques. By performing calculations within a fixed range and wrapping around when we exceed that range, we can create reversible transformations that are perfect for encoding and decoding messages.

In the next sections, we'll apply these mathematical principles to implement our first encryption algorithm: the Caesar cipher. This ancient technique will demonstrate how these simple operations can be combined to create a basic encryption system.

In [None]:
# Recap: regular division (/) vs. integer division (//)
print(f"7 / 2 = {7 / 2}")
print(f"7 // 2 = {7 // 2}")

7 / 2 = 3.5
7 // 2 = 3


# The Caesar Cipher: Your First Encryption Algorithm

The **Caesar cipher** is one of the oldest and simplest encryption techniques, named after Julius Caesar who used it to protect military messages. Despite its simplicity, it introduces fundamental concepts of substitution ciphers.

How the Caesar cipher works:

* Each letter in the plaintext is shifted a fixed number of positions in the alphabet
* The number of positions to shift is the **key**
* For example, with a key of 3:
  * 'A' becomes 'D' (shift 3 positions right)
  * 'B' becomes 'E'
  * 'Z' wraps around to 'C'

The mathematical formula for the Caesar cipher is:

* Encryption: `E(x) = (x + k) mod 26`
* Decryption: `D(x) = (x - k) mod 26`

Where `x` is the position of the original letter (0-25), `k` is the key (shift value), and the result is the position of the encrypted letter.
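One detail worth checking before implementing the formulas: in Python, `%` with a positive right-hand side never returns a negative result, so the decryption formula still lands in the 0-25 range when `x - k` drops below zero. A small sketch with illustrative values:

```python
k = 3
for x in [0, 1, 2, 25]:
    encrypted = (x + k) % 26          # E(x)
    decrypted = (encrypted - k) % 26  # D(E(x)) should give x back
    print(f"x={x:2d} -> E(x)={encrypted:2d} -> D(E(x))={decrypted:2d}")

# 0 - 3 is negative, but the result wraps to the end of the alphabet
print((0 - 3) % 26)
```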

Let's look at the key components of implementing a Caesar cipher:

### Shifting Characters
First, let's explore how we can "shift" characters:

In [None]:
def shift_char(char, key):
    """Shift a single character by the key amount"""
    # Only encrypt letters, leave other characters unchanged
    if not char.isalpha():
        return char

    # Determine if character is uppercase or lowercase
    is_upper = char.isupper()

    # Convert to base position (0-25)
    if is_upper:
        base = ord('A')
    else:
        base = ord('a')

    # Calculate position in 0-25 range
    position = ord(char) - base

    # Shift and wrap around if needed using modulo
    new_position = (position + key) % 26

    # Convert back to character
    return chr(new_position + base)

In [None]:
# Test with a few characters
print("Single character encryption (key = 3):")
print(f"'A' → '{shift_char('A', 3)}'")
print(f"'Z' → '{shift_char('Z', 3)}'")
print(f"'a' → '{shift_char('a', 3)}'")
print(f"'?' → '{shift_char('?', 3)}' (non-alphabetic)")

Single character encryption (key = 3):
'A' → 'D'
'Z' → 'C'
'a' → 'd'
'?' → '?' (non-alphabetic)


In [None]:
# Test with different keys
print("\nShifting 'M' with different keys:")
for key in [1, 5, 13, 25]:
    print(f"Key {key}: '{shift_char('M', key)}'")


Shifting 'M' with different keys:
Key 1: 'N'
Key 5: 'R'
Key 13: 'Z'
Key 25: 'L'


### The Caesar Cipher: Shifting Strings
Now let's apply this character-shifting function to a string.

In [None]:
# Encrypting a message one character at a time

def caesar_cipher(text, key):
    """Apply the Caesar cipher to a string"""
    result = ""

    for char in text:
        result += shift_char(char, key)

    return result

In [None]:
# Try changing the message!
message = "Meet me at midnight"
key = 7

encrypted = caesar_cipher(message, key)
print(f"\nOriginal: {message}")
print(f"Encrypted (key={key}): {encrypted}")

# We can use a negative key for decryption
decrypted = caesar_cipher(encrypted, -key)
print(f"Decrypted: {decrypted}")


Original: Meet me at midnight
Encrypted (key=7): Tlla tl ha tpkupnoa
Decrypted: Meet me at midnight



A **substitution cipher** is a method of encryption where units of plaintext are replaced with ciphertext according to a fixed system. The Caesar cipher is the simplest form of a substitution cipher.

The **key space** is the total number of possible keys that can be used. For the Caesar cipher, the key space is very small (only 26 possible keys, and a key of 0 leaves the text unchanged), making it easy to break.

While the Caesar cipher is not secure for modern use, it demonstrates important encryption principles that all advanced encryption methods build upon. If you look at the code, you'll notice the pattern:
1. Convert the text to a numeric form
2. Apply a mathematical transformation (the shift)
3. Convert back to text

This pattern appears in virtually all encryption algorithms, just with increasingly complex transformations!

# Breaking Caesar's Cipher: The Weakness of Simple Substitution

While the Caesar cipher was useful in ancient times, it's extremely vulnerable to modern analysis. Understanding why it's weak helps illustrate the need for stronger encryption methods.

The main weaknesses of the Caesar cipher include:

* **Limited key space**: With only 26 possible shifts (keys 0-25), an attacker can try all possibilities quickly
  * This is called a **brute force attack**
  * A computer can test all 26 keys almost instantly

* **Frequency analysis**: Letters in any language appear with predictable frequencies
  * In English, 'E' is the most common letter, followed by 'T', 'A', 'O', etc.
  * By analyzing the frequencies of letters in the ciphertext, the original message can often be deduced

* **Pattern preservation**: The Caesar cipher preserves patterns in the text
  * Double letters remain double letters (though different ones)
  * Word lengths remain unchanged
  * Common words like "the" and "and" create recognizable patterns
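A brute force attack on the Caesar cipher can be sketched in a few lines. The helper `shift_text` below is a compact, self-contained stand-in written for this example, and it assumes plain English letters. The attacker simply tries all 26 keys and reads the candidates.

```python
def shift_text(text, key):
    """Caesar-shift every letter in text by key, leaving other characters alone."""
    shifted = []
    for char in text:
        if char.isalpha():
            base = ord('A') if char.isupper() else ord('a')
            shifted.append(chr((ord(char) - base + key) % 26 + base))
        else:
            shifted.append(char)
    return "".join(shifted)

intercepted = "Tlla tl ha tpkupnoa"

# Undo every possible shift and collect the candidate plaintexts
candidates = {key: shift_text(intercepted, -key) for key in range(26)}
for key, candidate in candidates.items():
    print(f"Key {key:2d}: {candidate}")
```

Only one of the 26 lines reads as English, so a human (or a dictionary check) can pick out the right key at a glance.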

| English Letter | Frequency % | Caesar Shift (key=3) |
|----------------|-------------|----------------------|
| E | 12.7% | H |
| T | 9.1% | W |
| A | 8.2% | D |
| O | 7.5% | R |

**Cryptanalysis** is the study of analyzing ciphers and other information systems to uncover their hidden aspects, including finding ways to break encryption without knowing the key.

**Frequency analysis** is a technique used to break substitution ciphers by examining the frequency of letters or groups of letters in the encrypted text.
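As a rough sketch of frequency analysis against a Caesar cipher, the code below assumes the most common ciphertext letter stands for 'E' and derives the key from that guess. The helper names and the sample sentence are invented for this example. The guess can fail on short or unusual texts, so treat it as a heuristic.

```python
from collections import Counter

def shift_text(text, key):
    """Caesar-shift every letter in text by key, leaving other characters alone."""
    shifted = []
    for char in text:
        if char.isalpha():
            base = ord('A') if char.isupper() else ord('a')
            shifted.append(chr((ord(char) - base + key) % 26 + base))
        else:
            shifted.append(char)
    return "".join(shifted)

def guess_key(ciphertext):
    """Guess the key by assuming the most common letter is an encrypted 'E'."""
    letters = [char.upper() for char in ciphertext if char.isalpha()]
    most_common_letter, _ = Counter(letters).most_common(1)[0]
    return (ord(most_common_letter) - ord('E')) % 26

plaintext = "Meet me here between seven and eleven, then we leave together"
ciphertext = shift_text(plaintext, 3)

key_guess = guess_key(ciphertext)
print(f"Ciphertext:  {ciphertext}")
print(f"Guessed key: {key_guess}")
print(f"Decrypted:   {shift_text(ciphertext, -key_guess)}")
```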

Breaking the Caesar cipher demonstrates why modern encryption needs to be much more complex. By understanding these vulnerabilities, we can develop better approaches to securing information. The next sections will introduce methods that address these weaknesses and provide stronger security.

# Beyond Caesar: The Vigenère Cipher and Multi-character Keys

The **Vigenère cipher** represents a significant advancement over the Caesar cipher by using multiple shift values instead of just one. This addresses the main weakness of the Caesar cipher by making frequency analysis much more difficult to apply.

## How the Vigenère Cipher Works

The core principle of the Vigenère cipher is using a keyword to determine multiple different shift values:

* Instead of a single numeric key, use a keyword or phrase (like "SPY" or "AGENT")
* Convert each letter of the keyword to a shift value (A=0, B=1, etc.)
* Apply different shifts to different letters of the plaintext
* Repeat the keyword as needed to match the length of the message

For example, with the keyword "KEY", each letter corresponds to a different shift:
* K = 10 (K is the 11th letter, so 10 when zero-indexed)
* E = 4 (E is the 5th letter, so 4 when zero-indexed)
* Y = 24 (Y is the 25th letter, so 24 when zero-indexed)

When encrypting, we apply these shifts in sequence, repeating as needed.

## Building Our Vigenère Cipher

Let's build a Vigenère cipher implementation that reuses our previous functions. First, we need a function to convert a keyword into a sequence of shift values:


In [None]:
# Converting a keyword into shift values

def keyword_to_shifts(keyword):
    """Convert a keyword to a list of shift values (A=0, B=1, etc.)"""
    return [ord(char.upper()) - ord('A') for char in keyword if char.isalpha()]

In [None]:
# Test with a few spy-themed keywords
keywords = ["BOND", "MISSION", "SPECTRE"]

for word in keywords:
    shifts = keyword_to_shifts(word)
    print(f"Keyword: {word}")
    print(f"Shift values: {shifts}\n")

Keyword: BOND
Shift values: [1, 14, 13, 3]

Keyword: MISSION
Shift values: [12, 8, 18, 18, 8, 14, 13]

Keyword: SPECTRE
Shift values: [18, 15, 4, 2, 19, 17, 4]



Now we can combine this with the `shift_char()` function we defined for the Caesar cipher to create the Vigenère cipher:

In [None]:
def vigenere_cipher(text, keyword, mode='encrypt'):
    """
    Encrypt or decrypt text using the Vigenère cipher

    Parameters:
    - text: The message to encrypt or decrypt
    - keyword: The keyword to derive shift values from
    - mode: 'encrypt' or 'decrypt'

    Returns:
    - The processed text
    """
    # Convert keyword to shift values
    shifts = keyword_to_shifts(keyword)

    # If no valid shifts (empty keyword), return original text
    if not shifts:
        return text

    # For decryption, use negative shifts
    if mode == 'decrypt':
        shifts = [-shift for shift in shifts]

    result = ""
    shift_index = 0

    # Process each character
    for char in text:
        if char.isalpha():
            # Get the current shift value
            current_shift = shifts[shift_index % len(shifts)]

            # Apply the shift
            result += shift_char(char, current_shift)

            # Move to the next shift value
            shift_index += 1
        else:
            # Non-alphabetic characters remain unchanged
            result += char

    return result

Let's see a simple example of how this works:

| Plaintext | P | Y | T | H | O | N |
|-----------|---|---|---|---|---|---|
| Keyword   | K | E | Y | K | E | Y |
| Key Value | 10| 4 | 24| 10| 4 | 24|
| Encrypted | Z | C | R | R | S | L |

The cipher works by:
1. Taking a letter from the plaintext (e.g., 'P')
2. Taking the corresponding letter from the keyword (e.g., 'K')
3. Converting the keyword letter to a shift value (K = 10)
4. Applying that shift to the plaintext letter ('P' + 10 = 'Z')
5. Moving to the next letter and the next shift value

## Testing Our Vigenère Cipher

Here's a simple test of our implementation:

In [None]:
# Test the Vigenère cipher with a spy message
message = "Meet me at the usual place"
keyword = "AGENT"

encrypted = vigenere_cipher(message, keyword, 'encrypt')
print(f"Original: {message}")
print(f"Keyword: {keyword}")
print(f"Encrypted: {encrypted}")

decrypted = vigenere_cipher(encrypted, keyword, 'decrypt')
print(f"Decrypted: {decrypted}")

Original: Meet me at the usual place
Keyword: AGENT
Encrypted: Mkig fe gx gae awhtl vpnve
Decrypted: Meet me at the usual place


## Security Advantages

**Polyalphabetic cipher** is the technical term for a substitution cipher that uses multiple substitution alphabets, which is exactly what the Vigenère cipher does.

The Vigenère cipher offers significant advantages over the Caesar cipher:

1. **Multiple shift values** make simple frequency analysis ineffective
2. **Keyword length** determines the complexity - longer keywords provide better security
3. The same letter in the plaintext can be encrypted to **different letters** in the ciphertext

While still vulnerable to more advanced cryptanalysis techniques, the Vigenère cipher represents an important step in the evolution of encryption methods: it uses a shift that varies from letter to letter rather than one fixed shift for the whole message.
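To see the third advantage concretely, here is a minimal sketch that encrypts a run of identical letters both ways. `vigenere_encrypt` is a stripped-down helper written for this example. It assumes the text and keyword contain only letters.

```python
def vigenere_encrypt(text, keyword):
    """Minimal uppercase-only Vigenère encryption for demonstration."""
    shifts = [ord(k) - ord('A') for k in keyword.upper()]
    result = []
    for index, char in enumerate(text.upper()):
        shift = shifts[index % len(shifts)]  # repeat the keyword as needed
        result.append(chr((ord(char) - ord('A') + shift) % 26 + ord('A')))
    return "".join(result)

plaintext = "EEEEEE"
caesar = "".join(chr((ord(c) - ord('A') + 3) % 26 + ord('A')) for c in plaintext)
vigenere = vigenere_encrypt(plaintext, "KEY")

print("Caesar (key=3): ", caesar)    # every E becomes the same letter
print("Vigenère (KEY): ", vigenere)  # the same E becomes different letters
```

The Vigenère output still repeats every three letters, matching the keyword length. That repetition is the kind of pattern more advanced cryptanalysis looks for.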

# Asymmetric Encryption: Public and Private Keys

So far, we've studied symmetric encryption methods like the Caesar cipher and Vigenère cipher, where the same key is used for both encryption and decryption. Now let's explore **asymmetric encryption** (also called public-key cryptography), which uses different keys for encryption and decryption.

Key principles of asymmetric encryption:

* Uses a **key pair**: a public key and a private key
* Messages encrypted with the public key can only be decrypted with the private key
* The public key can be freely shared, while the private key must be kept secret
* Solves the "key exchange problem" of symmetric encryption

## How Asymmetric Encryption Works

In asymmetric encryption, each person generates two mathematically related keys:

* **Public key**: Shared openly with anyone who wants to send you encrypted messages
* **Private key**: Kept secret and used to decrypt messages encrypted with your public key

The fundamental property that makes this work is that it's computationally infeasible to derive the private key from the public key, even though they're mathematically related.

In [None]:
def generate_key_pair(first_prime, second_prime):
    """Return the modulus (public key) and its prime factors (private key)."""
    modulus = first_prime * second_prime
    private_primes = (first_prime, second_prime)
    return modulus, private_primes

def encrypt_message(plaintext, modulus):
    """Toy encryption: multiply each character's code by the modulus."""
    return [ord(char) * modulus for char in plaintext]

def decrypt_message(ciphertext, private_primes):
    """Toy decryption: rebuild the modulus from the private primes and divide."""
    first_prime, second_prime = private_primes
    modulus = first_prime * second_prime
    # Caveat: dividing by the public modulus works just as well, so this toy
    # scheme keeps nothing secret. It only illustrates the two-key workflow.
    return ''.join(chr(value // modulus) for value in ciphertext)

# === Demonstration ===
# 1. Key generation
modulus, private_primes = generate_key_pair(17, 23)
print("Public key (modulus):", modulus)
print("Private key (primes):", private_primes)

# 2. Original message
plaintext = "For your eyes only"
print("Plaintext message:   ", plaintext)

# 3. Encryption
ciphertext = encrypt_message(plaintext, modulus)
print("Ciphertext values:   ", ciphertext)

# 4. Decryption
recovered = decrypt_message(ciphertext, private_primes)
print("Recovered message:   ", recovered)

Public key (modulus): 391
Private key (primes): (17, 23)
Plaintext message:    For your eyes only
Ciphertext values:    [27370, 43401, 44574, 12512, 47311, 43401, 45747, 44574, 12512, 39491, 47311, 39491, 44965, 12512, 43401, 43010, 42228, 47311]
Recovered message:    For your eyes only


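> Note: The example above is a drastic simplification meant only to show the workflow of generating keys, encrypting, and decrypting. It is not secure: anyone who knows the public modulus can recover the message by dividing. Real asymmetric algorithms like RSA rely on modular arithmetic with very large prime numbers, so the public key alone is not enough to decrypt.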
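For a slightly closer look at how RSA-style keys behave, here is a toy sketch of textbook RSA using the same small primes. The function names and the public exponent `e=3` are choices made for this illustration. Real RSA uses primes hundreds of digits long plus padding schemes. With primes this small, anyone can factor the modulus and rebuild the private key. The three-argument `pow()` with a negative exponent needs Python 3.8 or later.

```python
def make_rsa_keys(p, q, e):
    """Build a toy RSA key pair from two small primes and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: modular inverse of e (Python 3.8+)
    return (n, e), (n, d)

def rsa_encrypt(plaintext, public_key):
    """Encrypt each character code m as m**e mod n."""
    n, e = public_key
    return [pow(ord(char), e, n) for char in plaintext]

def rsa_decrypt(ciphertext, private_key):
    """Decrypt each value c as c**d mod n."""
    n, d = private_key
    return "".join(chr(pow(value, d, n)) for value in ciphertext)

public_key, private_key = make_rsa_keys(17, 23, e=3)
message = "For your eyes only"
ciphertext = rsa_encrypt(message, public_key)

print("Public key (n, e): ", public_key)
print("Ciphertext values: ", ciphertext)
print("Recovered message: ", rsa_decrypt(ciphertext, private_key))
```

Unlike the multiplication example, decrypting here requires the private exponent `d`, which is computed from the secret primes rather than from the public key alone.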

## Real-World Applications of Asymmetric Encryption

Asymmetric encryption is used in many security systems:

* **HTTPS**: Secures websites using TLS/SSL protocols
* **SSH**: Secure remote access to servers
* **Digital signatures**: Verifying the authenticity of messages or software
* **Cryptocurrency**: Securing blockchain transactions
* **Secure email**: Systems like PGP (Pretty Good Privacy)

| Aspect | Symmetric Encryption | Asymmetric Encryption |
|--------|----------------------|------------------------|
| Keys | Same key used for encryption and decryption | Different keys for encryption and decryption |
| Key exchange | Must securely share the key beforehand | Public key can be shared openly |
| Speed | Fast, efficient for large data | Slower, typically used for small data or key exchange |
| Example | Caesar cipher, AES | RSA, ECC |
| Common uses | Bulk data encryption, file encryption | Key exchange, digital signatures, identity verification |

**RSA** (Rivest–Shamir–Adleman) is one of the most widely used asymmetric encryption algorithms, based on the mathematical properties of large prime numbers.

**Key exchange problem** refers to the challenge of securely sharing symmetric encryption keys between parties who want to communicate. Asymmetric encryption elegantly solves this problem.

## Hybrid Encryption

In practice, most modern cryptographic systems use a hybrid approach:

1. Use asymmetric encryption to securely exchange a temporary symmetric key
2. Use the symmetric key to encrypt the actual message data (faster for large amounts of data)
3. Include a digital signature created with the sender's private key to verify authenticity

This combines the security advantages of asymmetric encryption with the speed advantages of symmetric encryption.

Asymmetric encryption represents a major advancement in cryptography that has enabled secure communication over insecure channels like the internet.

# What is Hashing? One-Way Transformations

## Understanding Hashing

**Hashing** is a fundamental concept in cybersecurity that's quite different from encryption. While encryption is designed to be reversed (decrypted) with the right key, hashing is intentionally designed as a one-way process - you can convert input data into a hash, but you cannot convert the hash back to the original data.

Think of hashing like cooking an egg. You can easily turn a raw egg into a fried egg, but you can't turn the fried egg back into a raw egg. The cooking process is one-way, just like hashing.

## What Makes a Hash Function?

A good hash function has several key characteristics:

* **One-way transformation**: You can easily compute a hash from input data, but you cannot determine the original input from the hash
* **Fixed output length**: Regardless of whether your input is a single letter or an entire novel, the output hash has the same length
* **Deterministic**: The same input will always produce the same hash output
* **Avalanche effect**: A small change in the input (even just one character) produces a completely different hash
* **Collision resistance**: It should be very difficult to find two different inputs that produce the same hash
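Two of these properties, fixed output length and determinism, are easy to check with SHA-256 from Python's standard `hashlib` module. The helper name `sha256_hex` is just a convenience for this sketch.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_hash = sha256_hex("A")
long_hash = sha256_hex("A" * 100_000)

# Fixed output length: 64 hex characters (256 bits) for any input size
print(len(short_hash), len(long_hash))

# Deterministic: hashing the same input twice gives the same result
print(sha256_hex("spy") == sha256_hex("spy"))
```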

## Everyday Examples of Hashing

Hashing appears in many everyday scenarios:

1. **Library card catalogs**: Books might be organized by taking the first few letters of the author's name and the title to create a short lookup code

2. **Fingerprints**: Your fingerprint is essentially a "hash" of you - it's much smaller than your entire body but can uniquely identify you

3. **Food processing**: When you grind meat, you cannot reconstruct the original cuts from the ground product

## Digital Applications of Hashing

In computing and cybersecurity, hashing serves several critical purposes:

* **Password storage**: Websites store hashes of your passwords, not the actual passwords
* **Data integrity**: Verify files haven't been tampered with by comparing their hashes
* **Digital signatures**: Authenticate the sender of a message
* **Data indexing**: Quickly find information in large databases
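The data integrity use case can be sketched with `hashlib`: compute a checksum for the original data, then recompute it later and compare. The message bytes and the helper `is_intact` are invented for this example.

```python
import hashlib

original = b"Mission briefing: meet at the Blue Parrot Cafe at noon."
published_checksum = hashlib.sha256(original).hexdigest()

def is_intact(data, expected_checksum):
    """Return True if data still hashes to the expected checksum."""
    return hashlib.sha256(data).hexdigest() == expected_checksum

# Simulate tampering by changing a single word
tampered = original.replace(b"noon", b"nine")

print("Original intact?", is_intact(original, published_checksum))
print("Tampered intact?", is_intact(tampered, published_checksum))
```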

## The Avalanche Effect

One of the most important properties of hash functions is that a tiny change in the input causes a completely different output. Let's consider a simple example:

* The hash of "spy" might be "43a8f9"
* The hash of "spz" (just changing the last letter) might be "91c2e7"

Notice how different the hashes are, despite only a single character difference in the input. This property makes hashing excellent for detecting even the smallest changes in data.
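The hash values above are made up for illustration. To see the effect with a real hash function, this sketch compares the SHA-256 hashes of the same two strings and counts how many hex positions differ.

```python
import hashlib

def sha256_hex(text):
    """Return the SHA-256 hash of text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

hash_spy = sha256_hex("spy")
hash_spz = sha256_hex("spz")
print("spy:", hash_spy)
print("spz:", hash_spz)

# Count how many of the 64 hex positions differ between the two hashes
differing = sum(1 for a, b in zip(hash_spy, hash_spz) if a != b)
print(f"{differing} of {len(hash_spy)} hex characters differ")
```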

## Collision Resistance

While in theory many different inputs could produce the same hash (since there are infinite possible inputs but a finite number of possible hashes of a given length), good hash functions make finding such "collisions" extremely difficult.

If you think of a hash function as mapping books to shelves in a library, a collision would be two different books assigned to the same shelf. Good hash functions spread books evenly across shelves, minimizing the chance of collisions.

## Hashing vs. Encryption

To understand the difference between hashing and encryption, consider this table:

| Aspect | Hashing | Encryption |
|--------|---------|------------|
| Reversibility | One-way (cannot be reversed) | Two-way (can be decrypted) |
| Purpose | Verification without revealing original data | Protecting data for later access |
| Keys | No keys required (except for specialized variants) | Requires encryption/decryption keys |
| Output | Fixed length regardless of input size | Output size related to input size |
| Example use | Password storage | Secure messaging |

## Basic Hashing Example

Let's visualize a very basic hash function before we implement one. Imagine we have a simple rule:

1. Convert each character to its numeric value
2. Add all the values together
3. Take the remainder when divided by 100

Using this rule:
* "hello" → (104 + 101 + 108 + 108 + 111) = 532 → 532 % 100 = 32
* "bye" → (98 + 121 + 101) = 320 → 320 % 100 = 20

This is a very basic hash, but it demonstrates the concept. Real hash functions are much more complex to provide better distribution and security properties.
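The three-step rule above translates almost directly into Python. The function name `sum_hash` is just a label for this sketch.

```python
def sum_hash(text: str, size: int = 100) -> int:
    """Add up the character codes and keep the remainder modulo size."""
    return sum(ord(ch) for ch in text) % size

print(sum_hash("hello"))  # 32
print(sum_hash("bye"))    # 20

# Anagrams contain the same characters, so their sums (and hashes) match
print(sum_hash("listen") == sum_hash("silent"))  # True
```

The last line shows one reason this rule is weak: the order of the characters has no effect on the result, so collisions are trivial to construct.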

In the next section, we'll implement a simple hash function in Python to help visualize these principles.

# Building a Simple Hash Function: String to Number Transformation

Now that we understand what hashing is conceptually, let's build a simple hash function in Python to see these principles in action. Our hash function will transform strings into fixed-length numeric values.

## The Components of a Hash Function

A hash function typically processes input data in chunks, gradually building up the final hash value. Let's examine the key components we'll need:

1. **Initialization**: Start with an initial value
2. **Character processing**: Process each character in the input string
3. **Mixing operations**: Use mathematical operations to thoroughly mix the data
4. **Modulo reduction**: Keep the hash value within a specified range

## Implementing a Simple Hash Function

Here's a basic implementation of a hash function that converts strings to numeric values:

In [None]:
def spy_hash(message, size=1000):
    """
    A simple hash function for educational purposes

    Parameters:
    - message: The input string to hash
    - size: The modulus; hash values fall in the range 0 to size-1

    Returns:
    - An integer hash between 0 and size-1
    """
    # Start with a prime number as the initial hash value
    hash_value = 17

    # Process each character in the message
    for char in message:
        # Get ASCII value of the character
        ascii_value = ord(char)

        # Combine current hash with the character's ASCII value
        # Multiply by another prime (31) to distribute values better
        hash_value = (hash_value * 31 + ascii_value) % size

    return hash_value


Let's examine each component:

1. We start with `17` (a prime number) as our initial hash value
2. For each character, we get its numeric code using `ord()` (for plain English text, this is the ASCII value)
3. We multiply the current hash by `31` (a prime number commonly used in hash functions)
4. We add the ASCII value of the current character
5. We take the modulo with `size` to keep the hash within our desired range

Notice how we multiply by 31 before adding the ASCII value. This ensures that the position of each character affects the final hash, not just its value. Prime numbers like 17 and 31 are commonly used in hash functions because they help ensure a good distribution of hash values.
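To see why the multiplication matters, the sketch below compares `spy_hash` (repeated here so the cell runs on its own) with a hypothetical variant, `sum_only_hash`, that skips the multiplication step.

```python
def spy_hash(message, size=1000):
    """Same rule as above: multiply by 31, add the character code, reduce."""
    hash_value = 17
    for char in message:
        hash_value = (hash_value * 31 + ord(char)) % size
    return hash_value

def sum_only_hash(message, size=1000):
    """Hypothetical variant that only adds character codes."""
    hash_value = 17
    for char in message:
        hash_value = (hash_value + ord(char)) % size
    return hash_value

print(sum_only_hash("ab"), sum_only_hash("ba"))  # 212 212
print(spy_hash("ab"), spy_hash("ba"))            # 442 472
```

Without the multiplication, swapping two characters leaves the hash unchanged; with it, each character's contribution depends on where it appears.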

## Testing Our Hash Function

Let's see how our function handles various inputs:

In [None]:
# Test with agent codenames
agent_codes = ["Agent007", "SecretSquirrel", "BlackWidow", "WinterSoldier"]
print("Hash values:")
for code in agent_codes:
    print(f"'{code}': {spy_hash(code)}")

Hash values:
'Agent007': 843
'SecretSquirrel': 386
'BlackWidow': 916
'WinterSoldier': 452


## Demonstrating Hash Properties

### 1. Deterministic Nature

The same input always produces the same output:

In [None]:
message = "TopSecret"
print(f"\nHash of '{message}': {spy_hash(message)}")
print(f"Hash of '{message}' again: {spy_hash(message)}")  # Same result


Hash of 'TopSecret': 308
Hash of 'TopSecret' again: 308


### 2. The Avalanche Effect

Small changes in input produce significantly different hashes:

In [None]:
message1 = "Meet at the safe house"
message2 = "Meet at the safe house."  # Added just a period

print(f"\nAvalanche effect demonstration:")
print(f"Hash of '{message1}': {spy_hash(message1)}")
print(f"Hash of '{message2}': {spy_hash(message2)}")  # Completely different


Avalanche effect demonstration:
Hash of 'Meet at the safe house': 977
Hash of 'Meet at the safe house.': 333


### 3. Fixed Output Range

Regardless of input length, the output remains within our specified range:

In [None]:
short_text = "Hi"
long_text = "This is a much longer message that contains detailed instructions for the covert operation scheduled for next Tuesday. The package will be delivered at exactly midnight near the old clock tower. Bring the usual verification codes and make sure you're not followed."

print(f"\nFixed range demonstration:")
print(f"Hash of short text: {spy_hash(short_text)}")
print(f"Hash of long text: {spy_hash(long_text)}")


Fixed range demonstration:
Hash of short text: 674
Hash of long text: 40


## Limitations of Simple Hash Functions

While our hash function demonstrates the basic principles, it has important limitations:

1. **Poor distribution**: Hash values aren't evenly distributed
2. **Weak avalanche effect**: Changes might not affect all output bits
3. **Low collision resistance**: It's relatively easy to find inputs that hash to the same value
4. **Security vulnerabilities**: Not resistant to cryptographic attacks

For real security applications, always use established cryptographic hash functions from Python's `hashlib` module, which we'll explore later.
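The low collision resistance is easy to demonstrate. With only 1000 possible outputs, any 1001 distinct inputs must contain a repeat (the pigeonhole principle), so a short loop is enough to find one. This is a sketch; `find_collision` and the `agent` prefix are names chosen for the example, and `spy_hash` is repeated so the cell is self-contained.

```python
def spy_hash(message, size=1000):
    """The educational hash from earlier in this chapter."""
    hash_value = 17
    for char in message:
        hash_value = (hash_value * 31 + ord(char)) % size
    return hash_value

def find_collision(prefix="agent", size=1000):
    """Hash prefix0, prefix1, ... until two inputs share a hash value."""
    seen = {}  # maps hash value -> first input that produced it
    for i in range(size + 1):  # size + 1 distinct inputs guarantee a repeat
        candidate = f"{prefix}{i}"
        value = spy_hash(candidate, size)
        if value in seen:
            return seen[value], candidate, value
        seen[value] = candidate
    return None  # not reached: a repeat must occur within size + 1 inputs

first, second, value = find_collision()
print(f"'{first}' and '{second}' both hash to {value}")
```

The same search against SHA-256 would be hopeless, because its output space has 2^256 values rather than 1000.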

# Practical Cybersecurity in Python: Protecting Sensitive Data

In real-world applications, we need to protect two types of sensitive information:
* **Passwords** - should be hashed (one-way transformation)
* **Personal data** (like SSNs) - should be encrypted (reversible)

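Python makes both of these tasks straightforward: hashing is available through the standard library's `hashlib` module, and encryption through the third-party `cryptography` package (preinstalled on Colab; elsewhere, install it with `pip install cryptography`).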

## Hashing Passwords with SHA-256

Passwords should never be stored as plain text. Instead, we use a hash function:

In [None]:
import hashlib
import os

# Generate a random salt for security
# (one salt per session keeps this demo short; real systems create a new salt for every user)
salt = os.urandom(16)  # 16 random bytes

# Hash a password with the salt
def hash_password(password):
    # Convert password to bytes and combine with salt
    salted_password = salt + password.encode()
    # Create the hash
    hash_object = hashlib.sha256(salted_password)
    # Convert to hexadecimal string
    password_hash = hash_object.hexdigest()
    return password_hash

# Example
my_password = "secret123"
hashed = hash_password(my_password)
print(f"Original: {my_password}")
print(f"Hashed: {hashed}")

Original: secret123
Hashed: b1849f65ae1d623a0d5c5359ab814a28394b7552b8bd06968ca590f064319be9


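This simple function:
1. Prepends random "salt" bytes to the password, so the stored hash cannot simply be looked up in a precomputed table of common password hashes
2. Uses SHA-256 (a secure hash algorithm) to create the hash
3. Returns the hash as a hexadecimal string

Because `salt` is generated once when the cell runs, every password hashed in the same session shares it. Production systems generate a fresh salt for each user, which is what keeps identical passwords from producing identical hashes, and they prefer deliberately slow password-hashing functions such as bcrypt or Argon2 over a single round of SHA-256.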
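As a point of comparison, here is a sketch of a variant that generates a new salt for every password and uses `hashlib.pbkdf2_hmac`, a deliberately slow key-derivation function from the standard library. This is a swapped-in technique, not the function defined above; the names `hash_new_password` and `verify_password` and the iteration count are assumptions made for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor for this demo; tune for real use

def hash_new_password(password: str):
    """Return (salt, digest) using a fresh random salt for this password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

salt_a, digest_a = hash_new_password("secret123")
salt_b, digest_b = hash_new_password("secret123")

print(digest_a == digest_b)                            # False: different salts
print(verify_password("secret123", salt_a, digest_a))  # True
print(verify_password("secret124", salt_a, digest_a))  # False
```

Two users who pick the same password end up with different stored digests, and each guess costs an attacker many rounds of hashing instead of one.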

## Encrypting Sensitive Data

For data we need to retrieve later (like SSNs), we use encryption instead of hashing:

In [None]:
from cryptography.fernet import Fernet

# Generate a secret key
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt data
def encrypt_ssn(ssn):
    encrypted = cipher.encrypt(ssn.encode())
    return encrypted

# Decrypt data
def decrypt_ssn(encrypted_ssn):
    decrypted = cipher.decrypt(encrypted_ssn)
    return decrypted.decode()

# Example
my_ssn = "123-45-6789"
encrypted_ssn = encrypt_ssn(my_ssn)
print(f"Original: {my_ssn}")
print(f"Encrypted: {encrypted_ssn}")
print(f"Decrypted: {decrypt_ssn(encrypted_ssn)}")

Original: 123-45-6789
Encrypted: b'gAAAAABoCjlK8EnhCBGkQ_JqqxO8zIOvPDTcFEDUDpl6oeGl9ROK17KEI_59auHtMl1c5g4O0zZEC8pKpPADHgF_UqfhGvkXoA=='
Decrypted: 123-45-6789


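The `Fernet` class provides simple but secure encryption using AES-128 (in CBC mode) together with an HMAC check that rejects tampered ciphertext. Note that `key` exists only in memory here: anything encrypted with it can be decrypted in a later session only if the key itself is saved somewhere safe, separate from the data.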

## Storing User Data Securely

Now let's combine these techniques to store user information in a JSON file:

In [None]:
import json

# Function to register a new user
def register_user():
    # Get user information
    username = input("Enter username: ")
    password = input("Enter password: ")
    ssn = input("Enter SSN (xxx-xx-xxxx): ")

    # Load existing users
    users = {}
    if os.path.exists("users.json"):
        with open("users.json", "r") as f:
            users = json.load(f)

    # Check if username already exists
    if username in users:
        print("Username already exists!")
        return

    # Hash the password using our function
    password_hash = hash_password(password)

    # Encrypt the SSN
    encrypted_ssn = cipher.encrypt(ssn.encode()).decode()

    # Store user data
    users[username] = {
        "salt": salt.hex(),
        "password_hash": password_hash,
        "encrypted_ssn": encrypted_ssn
    }

    # Save to file
    with open("users.json", "w") as f:
        json.dump(users, f, indent=2)

    print(f"User {username} registered successfully!")

register_user()

If you run the above program and enter a user, you should see that the resulting JSON file looks something like this:


```javascript
{
  "Smiley": {    // Record for username "Smiley"
    "salt": "a6e8456a13fed9ec6727d9a4b3735373",
    "password_hash": "c4d5473405fb87b5dd54dd8857a8700977d8445338163b13ec0fcba03baa26fd",
    "encrypted_ssn": "gAAAAABoCjq1YHb_YJmAAuldH6YGBEwz7tbIXg_NJ7OYYiLL6HwQP4A2QK23HWThRXCIVdhRaO5hgvG_ftwmaT9pNeRUOVB_bQ=="
  }
}


```

In [None]:
# View the file
!cat users.json

{
  "Smiley": {
    "salt": "a6e8456a13fed9ec6727d9a4b3735373",
    "password_hash": "c4d5473405fb87b5dd54dd8857a8700977d8445338163b13ec0fcba03baa26fd",
    "encrypted_ssn": "gAAAAABoCjq1YHb_YJmAAuldH6YGBEwz7tbIXg_NJ7OYYiLL6HwQP4A2QK23HWThRXCIVdhRaO5hgvG_ftwmaT9pNeRUOVB_bQ=="
  },
  "TacoCat": {
    "salt": "0e2e11bb15bd325cffd25a743ad358e2",
    "password_hash": "eb59ded1794558a91a4451cb3ea1bcded365d092f29855317393f324fdcba64b",
    "encrypted_ssn": "gAAAAABoCjv6SwMzgif6Uw-Ak6ugB5x-5fG6-kHlB4H_1bfnE1ReHAVhqkohi5gfWouqWPsvDjy5p0kDIu5tHMHMwX8781T3bQ=="
  },
  "brendan": {
    "salt": "0e2e11bb15bd325cffd25a743ad358e2",
    "password_hash": "bb51b702f1cbdecdcddb48be87f91908fdf895b5182cf458ff757965481200e4",
    "encrypted_ssn": "gAAAAABoCjxKWUocUM_ip_WqAkpV1vM04ArgHq2x1XIgf3N6RdZMLZJ60FeUSlR-Rop8ogRTzKpOe31jDKHqSbPLVTT5e5WQQg=="
  }
}
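The registration code stores a salt and a hash, but it does not show how a later login attempt would be checked. A minimal sketch, assuming the record layout shown above (`salt` and `password_hash` as hex strings) and the same `salt + password` SHA-256 scheme: the function `check_password` and the demo record are hypothetical.

```python
import hashlib
import hmac

def check_password(record: dict, password: str) -> bool:
    """Re-hash the attempt with the stored salt and compare to the stored hash."""
    salt = bytes.fromhex(record["salt"])
    candidate = hashlib.sha256(salt + password.encode()).hexdigest()
    return hmac.compare_digest(candidate, record["password_hash"])

# Build a hypothetical record the same way register_user does
demo_salt = bytes.fromhex("00112233445566778899aabbccddeeff")
demo_record = {
    "salt": demo_salt.hex(),
    "password_hash": hashlib.sha256(demo_salt + b"secret123").hexdigest(),
}

print(check_password(demo_record, "secret123"))  # True
print(check_password(demo_record, "guess"))      # False
```

The original password is never stored or recovered; the check works purely by recomputing the hash. Notice also that `TacoCat` and `brendan` share a salt in the file above, because they were registered in the same session.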

# Introduction to Cybersecurity: Comprehensive Chapter Summary

## Core Concepts and Real-World Relevance

Cybersecurity is the practice of protecting systems, networks, and programs from digital attacks. As our lives become increasingly digital, understanding these concepts isn't just academic—it's essential for personal privacy, professional development, and societal well-being. Whether you're developing a mobile app, managing personal finances online, or simply browsing the web, cybersecurity principles affect your daily interactions with technology.

### Text Representation in Computing Systems

Before diving into security methods, understanding how computers store and process text is fundamental:

- **ASCII (American Standard Code for Information Interchange)**
  - 7-bit encoding limited to 128 characters
  - Covers only English letters, numbers, and basic symbols
  - Historically important but limited in global applications

- **Unicode**
  - Modern standard capable of representing characters from all writing systems
  - Includes letters from all languages, mathematical symbols, emojis, etc.
  - The foundation of modern text processing systems

- **UTF-8 (Unicode Transformation Format 8-bit)**
  - Variable-width encoding using 1-4 bytes per character
  - ASCII characters remain efficient (1 byte)
  - Non-English characters use 2-4 bytes
  - The dominant encoding for web content and software

Understanding these encoding systems helps explain how text-based attacks like injection work and why proper character handling is critical for security.
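A quick way to confirm the variable-width behavior of UTF-8 is to encode a few characters and count the bytes. The sample characters are chosen for this sketch and are written as escapes so the cell does not depend on the file's own encoding.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji

byte_counts = []
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_counts.append(len(encoded))
    print(f"{ch!r}: code point {ord(ch)}, {len(encoded)} byte(s) in UTF-8")

print(byte_counts)  # [1, 2, 3, 4]
```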

## Mathematical Foundations of Cryptography

Cryptography relies on several key mathematical operations:

- **Modulo Operation (%)**
  - Provides the remainder after division between two numbers
  - Creates "wrapping" behavior essential for cipher algorithms
  - Similar to how a clock wraps from 12 back to 1
  - Example: Shifting 'Z' forward in the alphabet wraps back to 'A'

- **Prime Numbers**
  - Numbers greater than 1 only divisible by 1 and themselves
  - The building blocks of advanced encryption systems like RSA
  - The security of RSA relies on the difficulty of factoring large numbers
  - When two large primes are multiplied, finding those original primes is computationally difficult

- **Integer Division (//)**
  - Returns only the whole number quotient
  - Often used alongside modulo in cryptographic algorithms
  - Helps break numbers into components for processing
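The two operators can be reviewed side by side. The numbers and variable names here are arbitrary choices for this sketch.

```python
dividend, divisor = 17, 5
quotient = dividend // divisor   # whole-number part
remainder = dividend % divisor   # what is left over
print(quotient, remainder)                          # 3 2
print(quotient * divisor + remainder == dividend)   # True

# Wrapping: shifting 'Z' forward by one lands back on 'A'
position = ord("Z") - ord("A")                      # 25
wrapped = chr((position + 1) % 26 + ord("A"))
print(wrapped)                                      # A

# Breaking a number into components with both operations at once
tens_and_up, last_digit = divmod(532, 10)
print(tens_and_up, last_digit)                      # 53 2
```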

## Encryption Methods: From Ancient to Modern

### Caesar Cipher
This ancient method demonstrates fundamental encryption concepts:
- Each letter shifts a fixed number of positions in the alphabet
- The shift amount is the "key"
- Formula: E(x) = (x + k) mod 26 for encryption
- Formula: D(x) = (x - k) mod 26 for decryption
- Extremely vulnerable due to limited key space (only 26 possible keys)
- Susceptible to frequency analysis attacks

While no longer secure, the Caesar cipher introduces the critical pattern of:
1. Convert text to numeric form
2. Apply mathematical transformation
3. Convert back to text

This pattern appears in virtually all encryption algorithms, including modern ones.
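As a compact restatement of the two formulas, here is a sketch of a Caesar cipher that follows the convert, transform, convert-back pattern. The function name `caesar` is a label for this example; it shifts only unaccented English letters and leaves everything else unchanged.

```python
def caesar(text: str, key: int) -> str:
    """Shift each English letter by key positions, wrapping with modulo 26."""
    result = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            x = ord(ch) - base                         # 1. text -> number
            shifted = (x + key) % 26                   # 2. E(x) = (x + k) mod 26
            result.append(chr(shifted + base))         # 3. number -> text
        else:
            result.append(ch)
    return "".join(result)

secret = caesar("Hello, World!", 3)
print(secret)              # Khoor, Zruog!
print(caesar(secret, -3))  # Hello, World!
```

Decryption is the same function with a negative key, because Python's `%` always returns a non-negative result for a positive modulus.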

### Vigenère Cipher
An advancement over Caesar cipher that introduces key concepts:
- Uses a keyword to determine multiple shift values
- The sequence of shifts repeats once the keyword's letters run out
- More resistant to frequency analysis
- Introduces the concept of polyalphabetic substitution
- Demonstrates the importance of key length in security
- Historically significant for introducing variable keys
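The idea of keyword-driven shifts can be sketched as follows. The function name `vigenere` and its `decrypt` flag are choices made for this example, and the keyword is assumed to contain only English letters.

```python
def vigenere(text: str, keyword: str, decrypt: bool = False) -> str:
    """Shift each letter by the amount given by the matching keyword letter."""
    shifts = [ord(k) - ord("A") for k in keyword.upper()]
    result = []
    letter_index = 0  # counts letters only, so punctuation uses no key letters
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            shift = shifts[letter_index % len(shifts)]  # key repeats cyclically
            if decrypt:
                shift = -shift
            result.append(chr((ord(ch) - base + shift) % 26 + base))
            letter_index += 1
        else:
            result.append(ch)
    return "".join(result)

ciphertext = vigenere("ATTACKATDAWN", "LEMON")
print(ciphertext)                                   # LXFOPVEFRNHR
print(vigenere(ciphertext, "LEMON", decrypt=True))  # ATTACKATDAWN
```

The two `T`s at the start of the plaintext encrypt to different letters (`X` and `F`), which is why simple letter-frequency counts are less effective here than against a Caesar cipher.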

### Asymmetric Encryption
Modern encryption relies on public-key cryptography:
- Uses mathematically related but different keys for encryption and decryption
- Public key (freely shared) encrypts messages; private key (kept secret) decrypts them
- Based on mathematical problems that are easy in one direction but difficult to reverse
- Solves the "key exchange problem" of symmetric encryption
- Enables secure communication between parties who have never met
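To make the public/private key relationship concrete, here is a toy RSA round trip with deliberately tiny textbook primes. The values `p = 61`, `q = 53`, and `e = 17` are illustrative assumptions; real RSA uses primes hundreds of digits long plus padding schemes, and should come from an established library rather than hand-written code.

```python
# Key generation (toy sizes)
p, q = 61, 53
n = p * q                  # 3233, part of both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, shares no factors with phi
d = pow(e, -1, phi)        # private exponent: 2753 (needs Python 3.8+)

# Anyone with the public key (n, e) can encrypt
message = 65
ciphertext = pow(message, e, n)

# Only the holder of the private key (n, d) can decrypt
recovered = pow(ciphertext, d, n)

print(f"public key:  (n={n}, e={e})")
print(f"private key: (n={n}, d={d})")
print(f"ciphertext: {ciphertext}, recovered: {recovered}")
```

Recovering `d` from the public key requires knowing `phi`, which in turn requires factoring `n` into `p` and `q`. That is trivial for 3233 but computationally infeasible for the key sizes used in practice.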

### Real-World Applications of Encryption Methods

| Method | Historical Context | Modern Applications | Security Level |
|--------|-------------------|---------------------|---------------|
| Caesar Cipher | Used by Julius Caesar for military messages (1st century BC) | Educational purposes, simple puzzles | Very Low |
| Vigenère Cipher | Developed in the 16th century, considered unbreakable for 300 years | Educational purposes, very simple applications | Low |
| Symmetric Encryption (AES) | Standardized in 2001 | File encryption, database security, session encryption | High (with proper implementation) |
| Asymmetric Encryption (RSA, ECC) | Developed in the 1970s | HTTPS, digital signatures, cryptocurrencies, secure messaging | High |
| Hybrid Systems | Became standard in the 1990s | TLS/SSL, secure messaging apps, VPNs | High |

## Beyond Encryption: Hashing and Data Protection

### Hashing: One-Way Transformations
Unlike encryption, hashing is intentionally designed to be irreversible:
- Converts input of any size to a fixed-length output
- Same input always produces the same hash (deterministic)
- Small changes in input produce dramatically different hashes (avalanche effect)
- Primary uses include password storage, data integrity verification, and digital signatures
- Passwords should never be stored in plain text or encrypted—they should be hashed

Key properties of effective hash functions:
- One-way transformation (cannot derive input from output)
- Fixed output length regardless of input size
- Deterministic (same input always produces same output)
- Collision resistance (difficult to find two inputs that produce the same hash)
- Avalanche effect (small input changes cause large output changes)

### Real-World Security Best Practices

- **Password Security**
  - Use hashing (not encryption) for password storage
  - Always add random "salt" to prevent identical passwords from having the same hash
  - Use established algorithms like SHA-256 with proper implementation
  - Consider using specialized password hashing functions (bcrypt, Argon2)

- **Data Protection**
  - Use encryption (not hashing) for data that needs to be retrieved
  - Properly manage encryption keys
  - Use established libraries rather than creating custom solutions
  - Consider the sensitivity of data when choosing protection methods

- **System Design**
  - Security should be considered from the beginning, not added later
  - Apply "defense in depth" with multiple security layers
  - Regularly update and patch systems
  - Test for vulnerabilities continuously

### Everyday Applications
The principles you've learned apply to your daily digital life:
- Creating strong, unique passwords
- Recognizing phishing attempts
- Evaluating the security of websites and applications
- Protecting personal data
- Making informed decisions about security trade-offs

## Conclusion

Understanding cybersecurity fundamentals helps you write safer code, identify vulnerabilities, and protect sensitive information. As technology continues to evolve, these principles will remain relevant while specific implementations change. The mathematical and conceptual foundations you've learned in this chapter will serve as building blocks for more advanced security topics and provide a framework for evaluating new security challenges throughout your career.

Remember that security isn't something added at the end of development—it should be considered from the very beginning of any project. By incorporating these principles into your programming practice early, you'll develop habits that naturally lead to more secure and robust systems.

## Python Practice
Here are some Python practice problems related to the chapter's content:

In [None]:
!wget https://github.com/brendanpshea/computing_concepts_python/raw/main/python_code_quiz/pyquiz.py -q -nc
from pyquiz import PracticeTool
practice_tool = PracticeTool(json_url='https://github.com/brendanpshea/computing_concepts_python/raw/main/python_code_quiz/python_07_cyber.json')

## Practice With Quizlet

In [None]:
%%html
<iframe src="https://quizlet.com/1043033391/learn/embed?i=psvlh&x=1jj1" height="700" width="100%" style="border:0"></iframe>

## Glossary

| Term | Definition |
|------|------------|
| Cybersecurity | The practice of protecting systems, networks, and programs from digital attacks, essential for personal privacy and professional development. |
| ASCII | A 7-bit encoding system limited to 128 characters, covering only English letters, numbers, and basic symbols. |
| Unicode | A modern standard capable of representing characters from all writing systems worldwide, including letters from all languages, mathematical symbols, and emojis. |
| UTF-8 | A variable-width encoding that uses between 1-4 bytes per character, with ASCII characters using just 1 byte for efficiency. |
| ord() | Python function that returns the numeric Unicode value of a given character. |
| chr() | Python function that converts a numeric code back to a character. |
| Modulo Operation (%) | Returns the remainder after division between two numbers, creating "wrapping" behavior essential for cipher algorithms. |
| Integer Division (//) | Returns only the whole number quotient without the remainder, often used alongside modulo in cryptographic algorithms. |
| Substitution Cipher | A method of encryption where units of plaintext are replaced with ciphertext according to a fixed system. |
| Caesar Cipher | An ancient encryption method where each letter shifts a fixed number of positions in the alphabet, with the shift amount being the "key". |
| Key Space | The total number of possible keys that can be used in an encryption system. |
| Cryptanalysis | The study of analyzing information systems to examine hidden aspects, including breaking encryption methods. |
| Frequency Analysis | A technique used to break substitution ciphers by examining the frequency of letters or groups of letters in encrypted text. |
| Vigenère Cipher | A polyalphabetic substitution cipher that uses a keyword to determine multiple shift values, making it more resistant to frequency analysis. |
| Polyalphabetic Cipher | A substitution cipher that uses multiple substitution alphabets, typically determined by a keyword. |
| Asymmetric Encryption | Encryption system using different keys for encryption (public key) and decryption (private key). |
| RSA | One of the most widely used asymmetric encryption algorithms, based on the mathematical properties of large prime numbers. |
| Key Exchange Problem | The challenge of securely sharing symmetric encryption keys between parties who want to communicate. |
| Hashing | A one-way transformation process that converts input of any size to a fixed-length output that cannot be reversed. |
| Avalanche Effect | The property where a small change in input produces a dramatically different hash output. |
| Collision Resistance | The property of a hash function that makes it difficult to find two different inputs that produce the same hash output. |
| Salt | Random data added to a password before hashing to prevent identical passwords from producing the same hash value. |
| SHA-256 | A secure hash algorithm that creates a 256-bit (32-byte) hash value, widely used for integrity checks and digital signatures; this chapter combines it with a salt to hash passwords. |
| Fernet | A class in Python's cryptography library that provides simple but secure encryption using AES-128. |
| Defense in Depth | A security approach that uses multiple layers of security controls throughout a system. |
