# HI-TEC 2025
![HI-TEC 2025](https://raw.githubusercontent.com/FSCJ-FacultyDev/HITEC2025/main/images/hitec2025-logo.png)
## Secure Programming with Python

### Prof. Pamela Brauda and David Singletary
### Florida State College at Jacksonville

# Day 1: Introductory Topics
0. Welcome and Course Overview
1. Git and GitHub for Secure Version Control
2. Jupyter Notebooks and Google Colab
3. Secure Coding Basics

# Welcome

Thank you for enrolling in our workshop. The material we’ll be presenting this week draws from a variety of sources, including books, websites, and personal experience gained over eight years of teaching Python at Florida State College at Jacksonville (FSCJ). This instruction has been part of our A.S. in Computer Information Technology curriculum, our A.S. in Data Science Technology curriculum, our (now defunct) ITECH job skills grant, our (now winding down) FinTech grant, and our non-credit (CWE) course offerings.

We've included a notebook containing references, and we loosely cite these throughout the content (apologies in advance to the American Psychological Association and to any workshop attendees who are sticklers for proper source citations).

# 1. Secure Version Control with Git and GitHub
- Let's kick the workshop off by building from the ground up with secure version control using two de facto standard tools: Git and GitHub.
  - These form the backbone of secure software development infrastructure by enabling collaboration, access control, traceability, and early vulnerability detection.
- [Git](https://git-scm.com/) and [GitHub](https://github.com/) allow students to learn secure coding practices and conduct peer reviews. Together, these tools provide the following capabilities:
  - Support collaboration among multiple contributors without conflicts.
  - Enhance reproducibility by offering transparency in both development and research workflows.
  - Enforce access control by restricting unauthorized modifications through protected branches.
  - Encourage structured, versioned, and well-documented coding practices.


![Git logo](https://raw.githubusercontent.com/FSCJ-FacultyDev/SWC-Columbus-2025/main/images/day1-gitlogo.png)
![GitHub logo](https://raw.githubusercontent.com/FSCJ-FacultyDev/SWC-Columbus-2025/main/images/day1-ghlogo.png)

- **Git** ensures code changes are tracked, managed, and reviewed systematically for security and accountability.
  - Branching and merging allow teams to develop features and security patches separately before integrating them into the main codebase.
  - Commit history and diffs provide visibility into changes, helping to identify and prevent security vulnerabilities.
  - Rollback and recovery capabilities enable reverting to previous versions to mitigate security breaches or accidental changes.
- **GitHub** provides cloud-based tools for managing repositories, enforcing security policies, and controlling access.
  - Pull requests and code reviews ensure changes are reviewed by peers before merging, reducing the risk of introducing vulnerabilities.
  - Security tools in GitHub (e.g., Dependabot, secret scanning, code scanning) help detect vulnerabilities in dependencies and source code.
  - Access controls and compliance features, such as protected branches and role-based permissions, enforce secure coding practices and industry standards.

# GitHub Classroom

![GitHub Classroom Overview](https://raw.githubusercontent.com/FSCJ-FacultyDev/SWC-Columbus-2025/main/images/day1-ghclassroom.png)

- We use GitHub Classroom extensively in many of our data science and software development courses to manage coding assignments, encourage collaboration, and teach real-world version control practices.
- The platform allows instructors to automatically generate private repositories for each student or team, streamlining assignment distribution and submission.
- It enables instructors to monitor student progress, provide in-line feedback through pull requests, and even automate testing and grading using GitHub Actions.
- Students gain hands-on experience with industry-standard tools while reinforcing best practices in collaborative and secure software development.
- See our [HITEC 2023 presentation (PDF)](https://github.com/FSCJ-FacultyDev/SWC-Columbus-2025/raw/main/docs/GitHubClassroom-Instructor.pdf) for more information on setting up GitHub Classroom.



# 🛠️ Hands-On: Set Up a Private GitHub Repo for a Python Project

In this exercise, we will create a private GitHub repository for a simple Python project, define a dependency in `requirements.txt`, and write a basic Python script that uses that dependency.

---

### 1. Create a Private GitHub Repository

a. Go to [https://github.com](https://github.com) and log in.  
b. Click the **+** icon in the top-right corner and choose **New repository**.  
c. Fill in:  
   - Repository name: `python-demo-project` (or similar)
   - Description (optional)  
   
d. Under **Visibility**, select **Private**.  
e. (Optional) Check **"Add a README file"**.  
f. Click **Create repository**.  

---

### 2. Add a Dependency File

a. In your new repository, click **Add file** > **Create new file**.  
b. Name the file **requirements.txt**  
c. In the editor, add the following line:  

    requests==2.31.0

d. Press **Commit changes**, enter a commit message in the dialog (e.g., *Create requirements.txt*), and click **Commit changes**.

---

### 3. Add a Python Script

a. In the repository, click **Add file** > **Create new file**.  
b. Name the file **main.py**  
c. Paste in the following code:  

    import requests

    # A timeout keeps the script from hanging if the server never responds
    response = requests.get("https://www.example.com", timeout=10)
    if response.status_code == 200:
        print("Successfully reached example.com")
    else:
        print("Request failed with status:", response.status_code)


d. Commit the file.



# Multi-Factor Authentication
- GitHub supports Multi-Factor Authentication (MFA) to enhance account security by requiring users to provide an additional verification factor beyond their password (Settings > Password and authentication).
- MFA can be enabled via:
  - Time-based one-time passwords (TOTP) generated by authenticator apps like Google Authenticator, Authy, or the GitHub mobile app
  - Security keys that support FIDO2/WebAuthn
  - SMS-based authentication (less recommended due to security concerns).
- Once enabled, MFA is required during login and when performing sensitive actions, such as modifying account settings or accessing repositories with heightened security policies.
- Additionally, GitHub allows organizations to enforce MFA for members, ensuring stronger protection for repositories and codebases.

# Role-Based Access Control (RBAC)
- RBAC is a security model that restricts system access based on predefined roles assigned to users, ensuring they have only the necessary permissions to perform their tasks.
- Instead of granting individual permissions directly, RBAC assigns permissions to roles, which are then assigned to users, simplifying access management and reducing security risks.
- Originally formalized by NIST, RBAC is widely used in enterprise environments, databases, operating systems, and cloud platforms to enforce least privilege and improve compliance with security policies.
- It helps organizations efficiently manage user access, streamline administrative tasks, and minimize the risk of unauthorized actions.
- GitHub uses RBAC by assigning predefined roles (like Read, Write, Admin) at the repository, organization, and enterprise levels to control user access, enabling least-privilege permissions and scalable access management.
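
The role-to-permission indirection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how GitHub implements it: the role names, permission strings, and user names below are hypothetical.

```python
# Hypothetical roles and permissions, loosely modeled on repository roles.
ROLE_PERMISSIONS = {
    "read": frozenset({"view"}),
    "write": frozenset({"view", "push"}),
    "admin": frozenset({"view", "push", "manage_settings"}),
}

# Users are assigned roles, never individual permissions.
USER_ROLES = {
    "student_a": "read",
    "student_b": "write",
    "instructor": "admin",
}


def is_allowed(user, action):
    """Return True if the user's role grants the requested action."""
    role = USER_ROLES.get(user)
    if role is None:
        return False  # Unknown users get no access (deny by default)
    return action in ROLE_PERMISSIONS.get(role, frozenset())


print(is_allowed("student_a", "push"))   # read role does not include push
print(is_allowed("student_b", "push"))   # write role includes push
print(is_allowed("guest", "view"))       # unassigned user is denied
```

Changing what a role may do means editing one entry in `ROLE_PERMISSIONS`, rather than touching every user who holds that role.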

# Student Repositories
- Access control must be balanced with collaborative learning when using GitHub in a classroom environment.
- The goal is to promote collaboration while maintaining the integrity of the repository and preventing accidental or unauthorized modifications.
- Instructors can assign Read access to students who only need to view a repository, Write access for those contributing code without merging, and Maintain or Admin roles for team leads or advanced students managing repository settings.
- Branch protection rules allow instructors to ensure students follow proper version control workflows, such as requiring pull requests and code reviews before merging.
- GitHub Classroom allows use of private (template) repositories to generate private forkable repos for students to support academic integrity.

# Branch Protection Rules
- Branch protection rules in GitHub help enforce version control best practices by restricting direct changes to important branches. To set them up:
  - In the repository on GitHub, click on the "Settings" tab.
  - In the left sidebar, under "Code and automation", click "Branches".
  - In the "Branch protection rules" section, click "Add rule".
  - In the "Branch name pattern" field, enter the branch name you want to protect (e.g., main, develop, or use wildcards like feature/*).
  - Select Protection Options. GitHub provides several protection rules you can enable:
    - Require pull request reviews before merging: set a required number of approvals (e.g., at least one review). Block self-reviews to enforce team feedback.
    - Require status checks to pass before merging: ensure automated tests (CI/CD) pass before merging. Select specific checks (e.g., Linting, Unit Tests, Build).
    - Require commit signatures: enforce cryptographic signatures to verify commit authenticity.
    - Restrict who can push to the branch: allow only instructors or specific team members to push directly.
    - Require branches to be up to date before merging: prevent merging outdated branches to avoid conflicts.
    - Prevent branch deletion: stop accidental or malicious deletion of protected branches.
  - Save the Rule
    - Review the settings.
    - Click "Create" to apply the protection rule.

### Branch Protection Rules for GitHub Projects

---

#### 1. Require Pull Request Reviews Before Merging  
*Prevents direct commits to the main branch and enforces a code review process.*

**Example Rule:**
- Require at least one approved review before merging.  
- Prevent self-approval (someone else must review the changes).  
- Dismiss stale approvals if new commits are added.

---

#### 2. Require Status Checks to Pass Before Merging  
*Prevents merging unless automated tests (CI/CD pipelines) pass.*

**Example Rule:**
- Require GitHub Actions tests to pass before merging.  
- Block merging if tests fail.

---

#### 3. Restrict Who Can Push to a Branch (intro courses)
 *Limits who can make direct changes to critical branches.*

**Example Rule:**
- Only instructors can push directly to main.  
- Students must use feature branches and open pull requests.

---

#### 4. Prevent Deletion of Protected Branches  
*Stops accidental or malicious deletion of important branches.*

**Example Rule:**
- Prevent deletion of the main and develop branches.


# 🛠️ Hands-On: Explore Branch Protection Rules in a GitHub Repo

In this exercise, we will explore branch protection rules for our previous GitHub repository.

---

- Modifying these rules is a common practice in professional workflows.

1. Go to your repository on GitHub.
2. Click the Settings tab (you must have admin access to see this option).
3. In the left sidebar, select Branches.
4. Under Branch protection rules, click **Add classic branch protection rule**
5. In Branch name pattern, type **main**
6. Verify **Allow deletions** is unchecked — this prevents the branch from being deleted.
7. Other rules you can set:
  - Require pull request reviews before merging
  - Require status checks (e.g. automation tool checks) to pass before merging
8. Click Create or Save changes at the bottom.

# Managing Sensitive Data (Secrets)
## Never Commit Secrets!
- API keys, credentials, and other sensitive data should never be committed to version control systems like Git because they can easily be exposed to unauthorized users, especially if the repository is public or shared across teams.
- [Don't be like Dropbox!](https://blog.gitguardian.com/dropbox-breach-hack-github-circleci/)
- Once exposed, these values can be used by attackers to gain access to protected systems, steal data, abuse services (e.g., triggering rate limits or incurring unexpected costs), or compromise application security.
- Even in private repositories, accidental leaks are possible through forks, backups, or misconfigured access controls.
- Best practices dictate storing secrets in environment variables or secure vaults, and using .gitignore to exclude local configuration files that contain sensitive information.
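
As a minimal sketch of the environment-variable approach, the code below reads a secret at run time instead of hard-coding it. The variable name `DEMO_API_KEY` and the helper `get_api_key` are hypothetical, chosen only for this illustration.

```python
import os


def get_api_key(var_name="DEMO_API_KEY"):
    """Read a secret from the environment; fail loudly if it is missing."""
    api_key = os.environ.get(var_name)
    if not api_key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return api_key


# For demonstration only: in practice the variable is set outside the code
# (in the shell, a CI secret store, or a local file excluded by .gitignore).
os.environ["DEMO_API_KEY"] = "not-a-real-key"

key = get_api_key()
print("Loaded a key of length", len(key))  # Never print the secret itself
```

Failing immediately when the variable is missing is a deliberate choice here: a clear error at startup is easier to diagnose than a rejected request later on.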

# Tools for Protecting Secrets
- Tools like [git-secrets](https://github.com/awslabs/git-secrets) and [truffleHog](https://trufflesecurity.com/trufflehog) are designed to detect and prevent the accidental leakage of secrets—such as API keys, passwords, and tokens—into Git repositories.
- These tools scan commit messages, staged files, and repository history for patterns that resemble sensitive information, helping developers catch issues before they are pushed to remote servers.
- **git-secrets** can be integrated as a pre-commit hook to block commits containing known secret patterns using regular expressions for detection.
  - A common example of a known secret pattern is an AWS Access Key ID, e.g.,

```
          AKIAIOSFODNN7EXAMPLE
```

  - detected by the regex

```
          AKIA[0-9A-Z]{16}
```

- **truffleHog** performs deep scans for high-entropy (highly random) strings that may indicate keys or credentials.
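
Both detection ideas can be tried out with the standard library alone. The sketch below applies the regex quoted above and computes Shannon entropy as one possible measure of randomness. It is an illustration only: the function names are hypothetical, and the real tools use more elaborate rules than this.

```python
import math
import re
from collections import Counter

# Pattern quoted in the text above for AWS Access Key IDs.
AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")


def find_aws_key_ids(text):
    """Return every substring that matches the AWS Access Key ID pattern."""
    return AWS_KEY_ID.findall(text)


def shannon_entropy(s):
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    total = len(s)
    counts = Counter(s)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"  # documentation example key'
print(find_aws_key_ids(sample))

# A repetitive string scores low; a random-looking string scores higher.
print(shannon_entropy("aaaaaaaaaaaaaaaa"))
print(shannon_entropy("q8Zr2LxW0vKp9TdY"))
```

A scanner built on entropy alone would need a threshold, and any threshold trades missed secrets against false alarms on ordinary strings such as hashes or identifiers.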


# Static Analysis and Dependency Scanning of Code "at Rest"
- Static analysis and dependency scanning in Python are essential practices for identifying code issues and securing third-party packages early in the development cycle.
- **Static analysis** analyzes code without executing it in order to detect potential errors, code smells (signs of poor design or maintainability issues), security vulnerabilities, and other issues.
- Static analysis tools like [pylint](https://pypi.org/project/pylint/), [flake8](https://flake8.pycqa.org/en/latest/), and [bandit](https://bandit.readthedocs.io/) examine Python code without executing it, flagging syntax errors, code style violations, and potential security flaws such as use of unsafe functions or insecure imports.
- **Dependency scanning** automatically identifies and evaluates third-party libraries used in a project to detect known security vulnerabilities, outdated packages, and other issues.
- Dependency scanning tools like [pip-audit](https://pypi.org/project/pip-audit/), [safety](https://www.getsafety.com/cli), or GitHub's built-in [Dependabot](https://docs.github.com/en/code-security/dependabot) check configured packages for known security vulnerabilities.

# 🛠️ Hands-On: Set Up a Private GitHub Repo with Dependabot Dependency Scanning

- **Dependabot** is a built-in GitHub feature that scans dependency files (like requirements.txt for Python or package.json for Node.js) and opens pull requests to update outdated or insecure packages.
- It helps keep projects secure and up to date with minimal manual effort.
- In this exercise we will configure Dependabot to automatically check our Python project's dependencies for updates and known security vulnerabilities.

1. Go to your Python project's GitHub Repository

2. Enable GitHub Security Features
  - a.	Click the Settings tab of the repository.
  - b.	In the left sidebar, go to **Advanced Security**.
  - c. Confirm the following are enabled (the default) and enable them if necessary:
    - Dependency graph
    - Dependabot alerts
    - Dependabot security updates

3. Add a Dependabot Configuration File
  - a.	Return to the main repo page and create a YAML file named **.github/dependabot.yml** (YAML is a configuration language; we are using it here to tell Dependabot how and when to automatically check for and suggest updates to the project dependencies).
  - b. Paste the following configuration for Python dependencies:

```
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"  # Location of requirements.txt
    schedule:
      interval: "weekly"
```
4. Commit the file

5. Review Alerts and Pull Requests
  - a.	Under “Security/Dependabot alerts”, review any detected vulnerabilities. If issues are found, Dependabot may automatically open pull requests to update affected packages (check your email!)
  - b. Under “Insights/Dependency Graph/Dependabot”, review **Recent update jobs**.


# 🛠️ Hands-on: Clone a repository

**The following step can be done only if you have access to Git and Python**  
**Execution assumes Windows platform**

- You can download and install a "portable" version of Git using the following link (select based on applicable platform)  

  - [Git for Windows/x64 Portable](https://github.com/git-for-windows/git/releases/download/v2.49.0.windows.1/PortableGit-2.49.0-64-bit.7z.exe)

  - [Git for Windows/ARM64 Portable](https://github.com/git-for-windows/git/releases/download/v2.49.0.windows.1/PortableGit-2.49.0-arm64.7z.exe)

- Python can be installed in user mode on a Windows system from [Python.org](https://www.python.org/)

- On your computer, open a command tool and run:

```
    git clone https://github.com/YOUR-USERNAME/python-demo-project.git
    cd python-demo-project
    python -m venv venv
    venv\Scripts\activate
    pip install -r requirements.txt
    python main.py
    deactivate
```

- If the request succeeds, `python main.py` prints **Successfully reached example.com**.

# 2. Jupyter Notebooks and Google Colab
- Jupyter Notebooks and Google Colab provide flexible, interactive environments for secure coding practices, enabling developers to test and refine security-focused scripts in isolated, controlled settings.
  - Support for Python and various security-related libraries, making them useful for tasks like penetration testing, secure coding education, and cryptographic implementations.
  - With built-in execution controls, users can run code one cell at a time, reducing the risk of unintended operations.
- Colab’s cloud-based execution adds a layer of isolation by running code on hosted virtual machines rather than on the local machine, which limits what untrusted code can do to a local system.
- Both platforms facilitate reproducibility and collaboration, allowing security teams to document vulnerabilities and share insights while maintaining strict access controls.

# 3. Secure Coding Basics


# Coding Style
- Good coding style is essential because it promotes clear, consistent, and readable code, making it easier to spot logic errors, unintended behavior, and security flaws during development and review.

# GotoFail: A Case Study in Style-Related Vulnerabilities
The **goto fail** bug was a critical security flaw in Apple’s SSL/TLS implementation (discovered in 2014).
  - It was caused by a duplicate goto statement in the C code.
  - The defect caused a certificate validation operation to prematurely exit, allowing attackers to impersonate secure websites and intercept encrypted communications.


```
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;  // <- This accidental second 'goto fail;' is the bug
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;
```


# PEP 8 and Python Style Guidelines

- A Python Enhancement Proposal (PEP) describes a new feature, process, or guideline for the Python community.
  - PEPs provide a structured way to propose, discuss, and document changes to the language.
    - PEP 1 (https://peps.python.org/pep-0001/) was written in 2000 and defines what PEPs are for and how they should be used.
    - PEPs 2–7 cover early process and style topics: adding new modules to the standard library (PEP 2), handling bug reports (PEP 3), deprecating standard modules (PEP 4), language evolution (PEP 5), bug fix releases (PEP 6), and C code style (PEP 7).
    - PEP 8 (https://peps.python.org/pep-0008/) provides Python coding style guidelines for maintainable code.
  - As instructors of introductory programming courses, we have the opportunity to teach the need for code style discipline and best practices early in a student's learning journey.
  - By teaching established and consistent conventions we can help students develop habits that lead to more reliable and professional software.



# Guidelines

- These guidelines emphasize code readability and foster a mindset that prioritizes security and maintainability—critical skills for aspiring developers.

1. Formatting for Readability and Maintainability
2. Naming Conventions
3. Module Imports
4. Documentation and Comments

# Formatting for Readability and Maintainability

- Consistent Indentation (https://peps.python.org/pep-0008/#indentation)
  - Use 4 spaces (not tabs) per indentation level to avoid confusion that could lead to logical errors.
- Maximum Line Length (https://peps.python.org/pep-0008/#maximum-line-length)
  - Limiting lines to 79 characters avoids horizontal scrolling, making it easier to audit code for security flaws.



# Naming Conventions
- https://peps.python.org/pep-0008/#naming-conventions
- Use meaningful names for variables, functions, and classes.
- Avoid **shadowing**
  - Shadowing occurs when a local variable or function name in your code overrides a built-in name, making the original built-in temporarily inaccessible.
  - Don't use names that overwrite Python built-ins (e.g., don’t name a variable **id**, **list**, or **sum**).
- Use CAPITALIZED_NAMES for constants that shouldn’t be modified.

In [None]:
# shadowing and constants

TAX_RATE = 0.07  # FL sales tax, won't change without legislation
sum = 10  # Shadows the built-in sum function

numbers = [1, 2, 3]
try:
    total = sum(numbers)  # Fails: sum now refers to the integer 10
except TypeError as e:
    print("Error:", e)

del sum  # Remove the shadowing name so the built-in is visible again
total = sum(numbers)
print(total)

print(total * (1 + TAX_RATE))  # Total with tax applied

# Module Imports

# 🛠️ Hands-On: Demonstrating Wildcard Import Issues
- Avoid wildcard imports

```
    from module import *
```

- This can introduce unintended variables and functions into the namespace, leading to unpredictable behavior.

In [None]:
# Create two module files dynamically with conflicting function names

# First module: math_tools with an add() function
# that performs numerical addition
with open('math_tools.py', 'w') as f:
    f.write("""
def add(x, y):
    return x + y
""")

# Second module: string_tools with an add() function
# that performs string concatenation
with open('string_tools.py', 'w') as f:
    f.write("""
def add(x, y):
    return x + " " + y
""")

# Import all contents from math_tools using wildcard import
from math_tools import *
print("Imported math_tools")
# Show memory address of the current add() function
print("id(add) after math_tools import:", id(add))

# Import all contents from string_tools — this silently
# overwrites the previous add()
from string_tools import *
print("Imported string_tools")
# Show memory address has changed: the function was overwritten
print("id(add) after string_tools import:", id(add))

# Test which 'add' function is currently in scope
# (hint: it's the one from string_tools).
print("\nTesting add(2, 3):")
try:
    # Will raise a TypeError because string concatenation expects strings
    result = add(2, 3)
    print("Result of add(2, 3):", result)
except Exception as e:
    # Catch error caused by conflicting function definitions
    print("Error:", e)

# Clean up: delete dynamically created module files
import os
os.remove('math_tools.py')
os.remove('string_tools.py')


In [None]:
# Fix the problem shown in previous cell.
# Import the modules explicitly using aliases to avoid conflicts
import math_tools as mt
import string_tools as st

# Call add() from each module explicitly
print("Calling math_tools.add(2, 3):")
try:
    result_math = mt.add(2, 3)  # This performs numerical addition
    print("Result:", result_math)
except Exception as e:
    print("Error in math_tools.add:", e)

print("\nCalling string_tools.add('hello', 'world'):")
try:
    result_string = st.add("hello", "world")  # This performs string concatenation
    print("Result:", result_string)
except Exception as e:
    print("Error in string_tools.add:", e)

# ==== Complete Day 1 Exercise 1 Here ====
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FSCJ-FacultyDev/SWC-Columbus-2025/blob/main/exercises/Day1Exercise1_WildCardImports.ipynb)


# Whitespace

- https://peps.python.org/pep-0008/#whitespace-in-expressions-and-statements

- Use spaces around operators and after commas to improve readability. Do this:

```
    x = a + b
```

- instead of this:

```
    x=a+b
```

- Avoid extraneous whitespace in expressions; instead of:

```
    x = (a + b ) # (trailing space inside parentheses)
```

- Do this:

```
    x = (a + b)
```

# Exception Handling
- https://peps.python.org/pep-0008/#programming-recommendations
- Don’t use "bare except" statements. Instead of:

```
    try:
        process_data()
    except:
        pass # Silently ignores errors (including critical ones)
```

- Do this:

```
    try:
        process_data()
    except (ValueError, KeyError) as e:
        logger.error(f"Processing failed: {e}")  # Log the issue
```

- Raise exceptions explicitly and use meaningful exception types with clear messages.  
  Instead of:

```
    raise Exception("Error occurred")  # Valid but non-specific
```

- Do this:

```
    raise ValueError("Invalid input")
```
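
The fragments above can be combined into a runnable sketch. The function names and the record format are hypothetical, chosen for illustration; the point is that only the expected exception types are caught and logged, while anything unexpected still propagates.

```python
import logging

logging.basicConfig(level=logging.ERROR, format="%(levelname)s - %(message)s")
logger = logging.getLogger(__name__)


def parse_age(record):
    """Return the age from a record, raising specific exceptions on bad data."""
    # KeyError if "age" is missing, ValueError if the text is not numeric
    age = int(record["age"])
    if age < 0:
        raise ValueError(f"Age must be non-negative, got {age}")
    return age


def safe_parse_age(record):
    """Handle only the errors we expect; let everything else propagate."""
    try:
        return parse_age(record)
    except (ValueError, KeyError) as e:
        logger.error("Processing failed: %s", e)  # Log the issue
        return None


print(safe_parse_age({"age": "42"}))
print(safe_parse_age({"age": "forty-two"}))
print(safe_parse_age({}))
```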

# Documentation and Comments
- https://peps.python.org/pep-0008/#comments
- Use docstrings to document function behavior and highlight security considerations.

In [None]:
def sanitize_input(user_input):
    """
    Cleans user input to prevent injection attacks.

    This function removes potentially dangerous characters
    to protect against code injection vulnerabilities.

    Args:
        user_input (str): The input string provided by the user.

    Returns:
        str: A sanitized version of the input safe for further processing.
    """
    return user_input.replace("<", "").replace(">", "").replace(";", "")

help(sanitize_input)  # help() prints the docstring itself and returns None


- Avoid inline comments disclosing security-sensitive information. Instead of:

```
	# Hashing passwords with MD5 (insecure)
```

- Do this:

```
	# Securely hash passwords
```

# Loops, Lists, Tuples, and Dictionaries

# Loops / Secure Iteration

- Avoid Infinite Loops and Ensure Proper Termination
- Infinite loops can cause a program to become unresponsive, consume excessive system resources, or create security vulnerabilities such as denial-of-service (DoS) risks.
- To ensure proper termination of loops, follow these best practices:

In [None]:
# Use explicit loop conditions: ensure loops have well-defined termination conditions

count = 0
while count < 10:  # Proper termination condition
    print(count)
    count += 1  # Ensure progress towards termination

In [None]:
# Avoid using while True without a break condition

while True:
    # processing steps ...
    #
    user_input = input("Enter 'exit' to stop: ")
    if user_input.lower() == 'exit':
        break

# Implement Timeouts/Iteration Limits
- When processing user input or external data, avoid infinite loops by implementing timeouts or iteration limits.

  ```
  import time

  start_time = time.time()
  timeout = 5  # seconds

  while time.time() - start_time < timeout:
      if some_condition():  # Replace with actual condition
          break
  ```
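
A timeout and an iteration limit can also be combined in one helper. The sketch below is one possible shape for that, with a hypothetical function name `poll_until` and arbitrary default limits. It uses `time.monotonic()`, which is not affected by changes to the system clock.

```python
import time


def poll_until(check, timeout=2.0, max_attempts=50, delay=0.01):
    """Call check() until it returns True, time runs out, or attempts run out.

    Returns the attempt number that succeeded, or None if a limit was hit.
    """
    deadline = time.monotonic() + timeout
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if time.monotonic() >= deadline:
            break  # Time limit reached
        time.sleep(delay)  # Avoid a busy loop between attempts
    return None


# Hypothetical condition that becomes true on the third call
calls = {"count": 0}


def ready_on_third_call():
    calls["count"] += 1
    return calls["count"] >= 3


print(poll_until(ready_on_third_call))
print(poll_until(lambda: False, timeout=0.05))
```

Either limit alone leaves a gap: a pure timeout can still spin through a huge number of iterations, and a pure iteration cap says nothing about how long each iteration takes.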

In [None]:
# Instead of building a large list in memory, use a generator to produce
# values one at a time and avoid excessive memory usage.

# The secure_generator function below produces one value at a time
# instead of returning an entire list

def secure_generator(n):
    for i in range(n):
        yield i  # Generates values on demand

for value in secure_generator(10):
    print(value)

# Security Considerations for Lists and Tuples

| List | Tuple |
|------|-------|
| Mutable | Immutable                  |
| Dynamic collections | Fixed data structures |
| Memory overhead for dynamic resizing | More memory efficient |
| Built-in methods for modification | Fewer built-in methods |
| Slightly slower for dynamic resizing | Slightly faster for large datasets |


# Tuples for Secure Data
- Tuples are hashable (can be used as dictionary keys or set elements) as long as they only contain hashable elements.
- Because a tuple cannot be modified after creation, keys in security-sensitive mappings (e.g., user permissions, access control lists) cannot be altered in place.

In [None]:
# Example: Is an admin with read permissions allowed to perform a write?

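non_hashable_tuple = (["admin", "read"], "write")  # contains a list
try:
    access_rights = {non_hashable_tuple: True}
except TypeError as e:
    print("Error:", e)  # a tuple holding a list cannot be a dict key

hashable_tuple = (("admin", "read"), "write")  # hashable
access_rights = {hashable_tuple: True}
print(access_rights[hashable_tuple])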

# Copying Lists
- Prevent unintended data modifications by using tuples for data that should remain unchanged.
  - Make copies of lists before passing them to functions if modification is not intended.
- Copying ensures that modifications to the copy do not unintentionally affect the original, which is especially important when dealing with mutable and nested data structures.
- A Python list's copy method performs a shallow copy: it only copies the outer list and keeps references to mutable elements inside.
- The **copy** module (https://docs.python.org/3/library/copy.html) allows programmers to create both shallow and deep copies of objects.
  - The copy function in this module behaves similarly to the list's copy method.
  - A **deep copy** creates a new compound object and recursively inserts copies into it of the objects found in the original.
  - This is only relevant for compound objects (objects that contain other objects, like lists or class instances).
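
As a small sketch of the "copy before modifying" advice for a flat list, the hypothetical function below works on its own copy so the caller's list is left alone. A shallow copy is enough here only because the elements are plain numbers.

```python
def normalize_scores(scores):
    """Return scores scaled to 0..1 without changing the caller's list."""
    working = list(scores)  # Shallow copy; enough for a flat list of numbers
    highest = max(working)
    for i, value in enumerate(working):
        working[i] = value / highest
    return working


original = [50, 75, 100]
scaled = normalize_scores(original)
print(original)  # The caller's list is untouched
print(scaled)

# A tuple goes one step further: it rejects modification outright
frozen = tuple(original)
try:
    frozen[0] = 0
except TypeError as e:
    print("Error:", e)
```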

# Copying Lists: The Shallow Copy Problem
- Since a shallow copy only copies references to the inner lists, modifying an element inside the copy also affects the original_list.

In [None]:
import copy

original_list = [[1, 2, 3], [4, 5, 6]]
shallow_copy = copy.copy(original_list)

# Modifying an inner list

shallow_copy[0][0] = 99

print(original_list)  # Output (modified): [[99, 2, 3], [4, 5, 6]]
print(shallow_copy)   # Output (modified): [[99, 2, 3], [4, 5, 6]]

# 🛠️ Hands-On: Make a Deep Copy

- copy.deepcopy() creates a new object and recursively copies all objects within it, ensuring that nested mutable objects are fully duplicated rather than just referenced so the copied object is completely independent.

In [None]:
import copy

original_list = [[1, 2, 3], [4, 5, 6]]
deep_copy = copy.deepcopy(original_list)

# Modifying an inner list
deep_copy[0][0] = 99

print(original_list)  # Output: [[1, 2, 3], [4, 5, 6]]  (Unchanged)
print(deep_copy)      # Output: [[99, 2, 3], [4, 5, 6]]  (Modified)

# ==== Complete Day 1 Exercise 2 Here ====
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FSCJ-FacultyDev/SWC-Columbus-2025/blob/main/exercises/Day1Exercise2_DeepCopies.ipynb)


# Index Errors/Boundary Overflow Best Practices

# 🛠️ Hands-On: Validating Index Values
- Always check if an index is within range before accessing elements.
- Use len() to determine valid index ranges.
- Validate user input before using it as an index.


In [None]:
# Safely retrieve an element from a list
def get_element(lst, indx):
    if not isinstance(indx, int): # Validate user input
        print("Error: Index must be an integer.")
        return None
    if 0 <= indx < len(lst):  # Check if index is in range
        return lst[indx]
    else:
        print("Error: Index out of range.")
        return None

# Example usage
my_list = ["apple", "banana", "cherry"]
# Valid index
print(get_element(my_list, 1))
# Out-of-range index
print(get_element(my_list, 5))
# Invalid input (not an integer)
print(get_element(my_list, "two"))

# 🛠️ Hands-On: Handling Unexpected Errors Using try-except


In [None]:
# handle unexpected errors using try-except

# return a default value or a meaningful message on failure.
# log errors for debugging instead of silently failing.

my_list = [10, 20, 30]
try:
    print(my_list[3])
except IndexError:
    print("Index out of range!")
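
The comments in the cell above mention returning a default value on failure. A minimal sketch of that idea follows; `get_or_default` is a hypothetical helper name, and it deliberately catches only IndexError so that other problems (such as a non-integer index) still surface.

```python
def get_or_default(lst, indx, default=None):
    """Return lst[indx], or default if the index is out of range (hypothetical helper)."""
    try:
        return lst[indx]
    except IndexError:
        return default

my_list = [10, 20, 30]
print(get_or_default(my_list, 1))      # 20
print(get_or_default(my_list, 3, -1))  # -1
```

Catching the narrowest exception that applies avoids hiding unrelated bugs behind the default value.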

# How are errors logged in a real-world Python application?

- Several popular libraries are available for logging in Python applications.
- Third-party options include Loguru (https://github.com/Delgan/loguru), Structlog (https://www.structlog.org/en/stable/), and Sentry (https://docs.sentry.io/platforms/python/).
- The standard library's logging module (https://docs.python.org/3/library/logging.html) is a commonly used tool. By default it writes to the console (stderr); passing filename to basicConfig(), or adding a file handler, sends log records to a file instead.

In [None]:
import logging

# Configure logging (output goes to the console unless filename is set)
logging.basicConfig(
    # filename='app.log',  # uncomment to write to a log file instead of the console
    force=True,  # replace any existing handlers; only needed in a notebook environment
    level=logging.ERROR,              # Minimum level to log
    format='%(asctime)s - %(levelname)s - %(message)s'
)

try:
    # Code that might raise an exception
    x = [1, 2, 3]
    print(x[5])  # Will raise IndexError
except IndexError as e:
    logging.error("An index error occurred.")  # no traceback
    #logging.error("An index error occurred.", exc_info=True)
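
To capture the traceback along with the message, logging.exception() can be called inside the except block. The sketch below is one possible setup, not the only one: the logger name `workshop_demo` is hypothetical, and an in-memory io.StringIO buffer stands in for a log file so the cell leaves nothing on disk.

```python
import io
import logging

# Hypothetical logger name; the StringIO buffer stands in for a log file
log_buffer = io.StringIO()
logger = logging.getLogger("workshop_demo")
logger.setLevel(logging.ERROR)
logger.propagate = False   # keep this demo's records out of the root logger
logger.handlers.clear()    # avoid duplicate handlers if the cell is re-run

handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)

try:
    x = [1, 2, 3]
    print(x[5])  # raises IndexError
except IndexError:
    # logging.exception() logs at ERROR level and appends the traceback
    logger.exception("An index error occurred.")

print(log_buffer.getvalue())
```

Swapping the StreamHandler for logging.FileHandler("app.log") would send the same records to a file.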

# 🛠️ Hands-On: Iteration vs. Direct Indexing
- Use for loops instead of manually tracking indices
  - Manual index tracking is often unnecessary and can introduce errors, making code harder to read and maintain.
  - Use built-in iteration methods that automatically handle indexing.

In [None]:
# instead of this:

my_list = ["apple", "banana", "cherry"]
index = 0  # Manually track index

while index < len(my_list):
    print(my_list[index])
    index += 1  # Manually update

In [None]:
# do this:

my_list = ["apple", "banana", "cherry"]
for item in my_list:
    print(item)  # No need to manage an index

# Need to track an index? Use enumerate()
- E.g., when reading a file line by line, to keep track of the line numbers for logging, debugging, or reporting errors.

In [None]:
# Use enumerate to iterate with index-value pairs
fruits = ["apple", "banana", "cherry"]

for indx, fruit in enumerate(fruits):
    print(f"Index {indx}: {fruit}")
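
For the line-number use case mentioned above, enumerate() accepts a start argument so that counting begins at 1. This is a small sketch with made-up data; io.StringIO stands in for an open file so the example is self-contained.

```python
import io

# io.StringIO stands in for an open file; the records are invented for illustration
fake_file = io.StringIO("alice,25\nbob,thirty\ncarol,41\n")

bad_lines = []
for line_number, line in enumerate(fake_file, start=1):
    name, age = line.strip().split(",")
    if not age.isdigit():
        # Record the human-friendly line number for later reporting
        bad_lines.append(line_number)
        print(f"Line {line_number}: invalid age {age!r} for {name}")

print(bad_lines)  # [2]
```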

# List Comprehensions
- Built-in iteration tools provide both security and efficiency benefits over direct indexing; list comprehensions can offer additional advantages.
- List comprehensions:
	- are often more readable than map() and filter() calls, making code easier to understand.
	- allow inline conditions (if-else) without needing additional filter() calls or complex lambda functions.
	- avoid the repeated append() method lookups of an explicit loop, often making them faster.
	- can be simpler to debug when dealing with complex transformations.

# 🛠️ Hands-On: Using a List Comprehension

In [None]:
# list comprehensions

numbers = [1, 2, 3, 4, 5, 6]

# compare to map/filter for readability
squared_evens = [x**2 for x in numbers if x % 2 == 0]
squared_evens_map = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, numbers)))
print("List Comprehension:", squared_evens)
print("Map + Filter:", squared_evens_map)

# inline conditions
words = ["apple", "banana", "cherry", "date"]
word_lengths = [len(word) if "a" in word else 0 for word in words]
print("Word Lengths:", word_lengths)

# optimized execution
large_numbers = list(range(1000000))
double_numbers = [x * 2 for x in large_numbers]  # Often faster than map() with a lambda

# simple debugging
debug_list = [x * 2 for x in numbers if x % 2 == 0]
print("Debug List:", debug_list)

# Use zip() to Iterate Over Multiple Lists Safely
- zip() allows you to iterate over multiple lists by pairing corresponding elements together.
- This prevents IndexErrors, which can occur when iterating manually with indices if lists have different lengths.
- By default, zip() stops at the shortest list, which avoids out-of-range errors but silently ignores the extra elements in longer lists.

In [None]:
# zip example

names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]
cities = ["New York", "Los Angeles"]

# Safe iteration using zip()
for name, age, city in zip(names, ages, cities):
    print(f"{name}, {age} years old, lives in {city}")

# zip() stops at "Los Angeles" because cities has
# fewer elements than names and ages, preventing
# out-of-bounds errors

# Use zip() to Iterate Over Multiple Lists Safely (cont)
- To handle mismatched lengths differently, itertools.zip_longest() can be used.
- In the following code cell, zip_longest() ensures that "Charlie" is included even though the cities list has fewer elements.
- The fillvalue argument supplies "Unknown" for the missing city; without it, zip_longest() fills the gap with None.

In [None]:
from itertools import zip_longest

names = ["Alice", "Bob", "Charlie"]
ages = [25, 30, 35]
cities = ["New York", "Los Angeles"]

# Safe iteration using zip_longest() with a default value of "Unknown"
for name, age, city in zip_longest(names, ages, cities, fillvalue="Unknown"):
    print(f"{name}, {age} years old, lives in {city}")

# Avoid Modifying Lists While Iterating Over Them
- Modifying a list while iterating over it can lead to unpredictable behavior, including skipped elements, infinite loops, or IndexError exceptions.
- A best practice for more secure coding is to create a copy of the list before iterating or use list comprehensions and filtering to generate a new list instead of modifying the original.
- Use list.copy(), list[:], or list() to be sure that changes do not affect the iteration process.
- If items need to be removed, iterate in reverse by index so that deletions do not shift elements that have not been visited yet, or build a new list with a list comprehension.
- For performance-sensitive operations, consider using filter() or itertools.dropwhile(), which produce results lazily instead of modifying the list in place.
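
The skipped-element problem described above can be reproduced in a few lines. The following sketch uses made-up data and contrasts the unsafe pattern with iteration over a copy.

```python
# Unsafe: removing items shifts the remaining elements left, so the loop
# skips whichever element slides into the position just vacated
unsafe_result = [2, 4, 6, 1]
for n in unsafe_result:
    if n % 2 == 0:
        unsafe_result.remove(n)
print(unsafe_result)  # [4, 1]  (4 was skipped and never removed)

# Safer: iterate over a copy while modifying the original
safe_result = [2, 4, 6, 1]
for n in safe_result.copy():
    if n % 2 == 0:
        safe_result.remove(n)
print(safe_result)  # [1]
```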

# Using Reverse Iteration (Safe Removal)
- Iterating in reverse ensures that index shifts do not affect upcoming elements.

In [None]:
numbers = [1, 2, 3, 4, 5, 6]

# Iterate in reverse
for i in range(len(numbers) - 1, -1, -1):
    if numbers[i] % 2 == 0:
        del numbers[i]

print(numbers)

# Use List Comprehensions
- Using a list comprehension avoids modifying the original list directly.

In [None]:
numbers = [1, 2, 3, 4, 5, 6]

# Remove even numbers
filtered_numbers = [x for x in numbers if x % 2 != 0]
print(filtered_numbers)

# Use filter() for Large Lists
- filter() is memory-efficient as it processes elements "lazily"
  - Generates elements one by one as needed, reducing memory usage for large datasets

In [None]:
numbers = [1, 2, 3, 4, 5, 6]

filtered_numbers = list(filter(lambda x: x % 2 != 0, numbers))
print(filtered_numbers)
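
Wrapping filter() in list() materializes every result at once, so the memory benefit only appears when the filter object is consumed incrementally. A minimal sketch, using itertools.islice to take just the first few matches (the `is_odd` helper is introduced here for illustration):

```python
from itertools import islice

def is_odd(x):
    return x % 2 != 0

# range() and filter() are both lazy: no million-element list is ever built
odd_numbers = filter(is_odd, range(1_000_000))

# Take only the first three matches; the rest of the range is not examined yet
first_three = list(islice(odd_numbers, 3))
print(first_three)  # [1, 3, 5]
```

The filter object remembers its position, so a later next(odd_numbers) would continue from where islice stopped.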

# Use itertools.dropwhile() to Safely and Efficiently Remove Leading Elements
- dropwhile() drops leading elements while a condition is true and stops dropping as soon as the condition fails; every element after that point is kept.
  - This is useful for sorted lists where unwanted items are at the front.

In [None]:
from itertools import dropwhile

numbers = [2, 4, 6, 1, 3, 5]
remaining_numbers = list(dropwhile(lambda x: x % 2 == 0, numbers))
print(remaining_numbers)
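
Note that dropwhile() is not a general-purpose filter. The sketch below, with made-up data, shows that matching elements which appear after the first non-matching one are kept.

```python
from itertools import dropwhile

numbers = [2, 4, 1, 6, 3, 8]

# Only the leading evens (2 and 4) are dropped; 6 and 8 come after
# the first odd number, so they are kept
after_dropwhile = list(dropwhile(lambda x: x % 2 == 0, numbers))
print(after_dropwhile)  # [1, 6, 3, 8]

# To remove every even number, filter the whole list instead
odds_only = [x for x in numbers if x % 2 != 0]
print(odds_only)  # [1, 3]
```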

# Dictionaries
## Use dict.get() for Dictionaries Instead of Direct Key Access
- dict.get() for dictionary access is generally safer than direct key access (e.g., my_dict['key']) because it prevents potential KeyError exceptions that could disrupt program flow or unintentionally expose sensitive error messages.
- A default return value can be specified when the key is missing, avoiding unhandled exceptions and reducing the likelihood of information leakage or crashes due to unexpected input or data manipulation by malicious users.
- This handles edge cases, such as optional fields that are absent, gracefully.

# 🛠️ Hands-On: Using dict.get() to Safely Access a Dictionary Element

In [None]:
user_data = {
    "username": "alice",
    "email": "alice@example.com"
    # Note: 'phone' key is missing
}

# safe access using dict.get()
phone = user_data.get("phone", "Not provided")
print(f"Phone: {phone}")

# unsafe access using direct indexing (will raise KeyError)
try:
    phone_direct = user_data["phone"]
    print(f"Phone (direct): {phone_direct}")
except KeyError:
    print("Error: 'phone' key not found; direct access raised KeyError")

# defaultdict for Missing Keys
- **collections.defaultdict** is a standard-library class that handles missing dictionary keys without raising exceptions, which can improve code reliability.
- Specifying a default factory function, such as int or list, automatically initializes missing keys with a safe, predictable value.
  - This prevents KeyError exceptions that might otherwise expose implementation details or crash the application due to unanticipated input.
- In scenarios where input data is partially controlled by users, using defaultdict reduces error-handling complexity and helps avoid logic flaws that could be exploited by attackers.

# 🛠️ Hands-On: Using defaultdict() to Handle Missing Keys

In [None]:
from collections import defaultdict

# Initialize defaultdict with list as the default factory
user_actions = defaultdict(list)

# Simulated user input (some users may be missing from initial data)
user_actions["alice"].append("login")
user_actions["bob"].append("upload_file")
user_actions["charlie"].append("logout")

# Accessing a non-existent user — will NOT raise KeyError
user_actions["david"].append("download_report")

# Output all user actions
for user, actions in user_actions.items():
    print(f"{user}: {actions}")
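
One behavior worth keeping in mind: merely reading a missing key with square brackets inserts that key into a defaultdict. With user-controlled keys, this can make the dictionary grow unexpectedly. The sketch below uses invented user names to show the side effect and two read-only alternatives.

```python
from collections import defaultdict

user_actions = defaultdict(list)
user_actions["alice"].append("login")

# Reading a missing key with [] inserts it as a side effect
_ = user_actions["mallory"]
print(len(user_actions))            # 2

# Read-only alternatives leave the dictionary unchanged
print("eve" in user_actions)        # False
print(user_actions.get("eve", []))  # []
print(len(user_actions))            # still 2
```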


# 🛠️ Hands-On: Using defaultdict(int) to Count Things
- Use defaultdict(int) to count events like login attempts, API calls, or input errors. This is useful for:
  - Rate limiting
  - Brute force detection
  - Abuse tracking for unregistered users

In [None]:
# count the login attempts per user using defaultdict to avoid manual key checks.

from collections import defaultdict

# Initialize defaultdict with int — default value is 0
login_attempts = defaultdict(int)

# Simulated login attempts (from user input or logs)
attempts = ["alice", "bob", "alice", "charlie", "alice", "bob", "david"]

# Count attempts without checking if the key exists
for user in attempts:
    login_attempts[user] += 1

# print the dictionary
# cast to a true dictionary first so the defaultdict "noise" isn't shown
print(dict(login_attempts))

# "pretty print" output the number of attempts per user
for user, count in login_attempts.items():
    print(f"{user}: {count} login attempt(s)")
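
As a small illustration of the brute force detection use case, the counts can be compared against a limit. This is only a sketch: `THRESHOLD` and its value are hypothetical, chosen to fit the sample data.

```python
from collections import defaultdict

THRESHOLD = 3  # hypothetical limit chosen for illustration

login_attempts = defaultdict(int)
for user in ["alice", "bob", "alice", "charlie", "alice", "bob", "david"]:
    login_attempts[user] += 1

# Flag any user whose attempt count has reached the threshold
flagged_users = sorted(
    user for user, count in login_attempts.items() if count >= THRESHOLD
)
print(flagged_users)  # ['alice']
```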

# ==== Complete Day 1 Exercise 3 Here ====
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/FSCJ-FacultyDev/SWC-Columbus-2025/blob/main/exercises/Day1Exercise3_UsingDefaultDict.ipynb)


# 🛠️ Hands-On: Secure Login Handling with Account Lockout and Event Logging

In [None]:
from collections import defaultdict
from datetime import datetime

# track failed login attempts
login_attempts = defaultdict(int)

# lockout policy
MAX_ATTEMPTS = 3
LOG_FILE = "security_log.txt"

# security_log.txt will contain lockout entries
def log_event(message):
    timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(LOG_FILE, "a") as f:
        f.write(f"[{timestamp}] {message}\n")

# simulated login handler
def login(user, success):
    if login_attempts[user] >= MAX_ATTEMPTS:
        print(f"***Account for '{user}' is locked!***")
        log_event(f"LOCKOUT: {user} attempted login while locked.")
        return

    if success:
        print(f"{user} logged in successfully.")
        login_attempts[user] = 0  # Reset on success
    else:
        login_attempts[user] += 1
        print(f"Failed login attempt for {user} ({login_attempts[user]})")
        if login_attempts[user] >= MAX_ATTEMPTS:
            print(f"{user} has been locked out after {MAX_ATTEMPTS} failed attempts.")
            log_event(f"LOCKOUT: {user} locked out after {MAX_ATTEMPTS} failed attempts.")

# simulated login attempts
login("alice", False)
login("alice", False)
login("alice", False)
login("alice", True)  # rejected: the account is locked, and the attempt is logged
login("bob", False)
login("bob", True)
login("carol", False)
login("carol", False)
login("carol", False)

# Using Safer Data Structures
- Use `collections.deque` when implementing fixed-size buffers.
- Good for scenarios like rolling logs, recent event tracking, or sliding windows, where only the most recent items need to be retained.
- Automatically removes oldest entries when the maximum size is reached, eliminating the need for manual cleanup logic.
- Helps prevent unbounded memory growth in applications that process continuous or untrusted input streams, making it both efficient and safer.

# 🛠️ Hands-On: Using a Deque

In [None]:
from collections import deque

# Create a deque with a fixed max size of 3
recent_inputs = deque(maxlen=3)

# Simulated stream of user inputs
inputs = ["a", "b", "c", "d", "e"]

for item in inputs:
    recent_inputs.append(item)
    print(f"Current buffer: {list(recent_inputs)}")


# If You Must Use Indexes, Use Slicing
- Unlike direct indexing (e.g., my_list[3]), which raises an IndexError if the index is out of bounds, slicing returns whatever elements exist in the requested range (possibly an empty list) instead of raising an exception.

# 🛠️ Hands-On: Using Slices

In [None]:
# uncomment for error example
#my_list = [10, 20, 30]
#print(my_list[5])  # IndexError: index out of range

my_list = [10, 20, 30]
print(my_list[:5])  # [10, 20, 30]: the slice stops at the end of the list
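
Because a slice never raises for an out-of-range position, the caller has to check for an empty result. The sketch below shows one way to do that; `element_or_default` is a hypothetical helper and is only intended for non-negative indexes.

```python
my_list = [10, 20, 30]

# A slice that starts past the end returns an empty list, not an error
print(my_list[5:6])  # []

def element_or_default(lst, indx, default=None):
    """Return lst[indx] for a non-negative indx, or default (hypothetical helper)."""
    window = lst[indx:indx + 1]  # holds zero or one element
    return window[0] if window else default

print(element_or_default(my_list, 1))         # 20
print(element_or_default(my_list, 5, "n/a"))  # n/a
```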