# Week 08 - From Notebook to Script: Project Structure

**Time budget:** ~2 hours  
**Goal:** Move code into .py modules and run from VS Code; introduce argparse.

**Theme (PhD focus):** Human factors of privacy & security — scraping public pages (privacy policies, cookie notices, security help pages, standards/regulator guidance) and extracting *UX-relevant* signals.

---


## Responsible scraping note (important)
We will only scrape **public pages** and keep the volume small.
- Prefer a few pages, not thousands
- Respect robots.txt/Terms of Service when you scale later
- Avoid collecting personal data
- Add delays for politeness when doing multi-page work
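
As a minimal sketch of the last three points, the standard library's `urllib.robotparser` can check a robots.txt rule set before a page is fetched, and `time.sleep` can add a pause between requests. The user-agent name, the 2-second default, and the sample rules below are illustrative assumptions, not values from the course material. The example parses robots.txt text held in a string so it runs without any network access.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "course-notebook-bot"  # hypothetical name, chosen for illustration


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def polite_pause(seconds: float = 2.0) -> None:
    """Sleep between requests; the 2-second default is an arbitrary example."""
    time.sleep(seconds)


# Made-up rules: everything is allowed except paths under /private/.
sample_robots = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_robots, "https://example.org/privacy/"))       # True
print(is_allowed(sample_robots, "https://example.org/private/notes"))  # False
```

In multi-page work, one option is to call `is_allowed(...)` before each request and `polite_pause()` after it.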


## Setup
We'll use `requests` + `BeautifulSoup`. Install if needed:

```bash
pip install requests beautifulsoup4 pandas matplotlib
```


```python
import re
import time
import json
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

import pandas as pd
import matplotlib.pyplot as plt
```

### 🧠 Concept: Notebooks vs. Scripts

| Feature | Jupyter Notebook (`.ipynb`) | Python Script (`.py`) |
| :--- | :--- | :--- |
| **Analogy** | A Lab Notebook / Sketchpad | A Factory Machine |
| **Best For** | Exploration, Charts, Learning | Automation, Scheduling, Production |
| **Execution** | Cell by Cell (Manual) | Top to Bottom (Automatic) |

**The Workflow**: Explore in Notebook → Solidify in Script.
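
To make the "Top to Bottom" row concrete, here is a small sketch of what a script file could look like. The function `count_words` and the sample sentence are hypothetical stand-ins for whatever logic was explored in a notebook. The `if __name__ == "__main__":` guard is the standard Python idiom that lets a file both run as a script and be imported by other modules.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words (a stand-in for notebook logic)."""
    return len(text.split())


def main() -> None:
    # In a script, everything runs top to bottom from this single entry point.
    sample = "We use cookies to improve your experience"
    print(count_words(sample))


if __name__ == "__main__":
    # True only when the file is executed directly (e.g. `python example.py`),
    # not when another module imports it.
    main()
```

Saved as a `.py` file and run directly, this prints `7`. Imported from another file, it prints nothing and only makes `count_words` available.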

## From notebook → scripts (preview)
This week stays mostly conceptual and inside the notebook, but you'll *prepare* to move to VS Code.

You'll design:
- `scraper.py` (fetch/parse)
- `analysis.py` (stats)
- `run.py` (entrypoint)

In this notebook, we mimic a "module" by grouping functions and using a `main()` function.


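```python
def main():
    """Fetch each URL and return one row per page with its title or an error."""
    urls = [
        "https://www.mozilla.org/en-US/privacy/",
        "https://www.nist.gov/privacy-framework",
    ]
    rows = []
    for u in urls:
        try:
            response = requests.get(u, timeout=20)
            response.raise_for_status()  # treat HTTP 4xx/5xx responses as errors
            soup = BeautifulSoup(response.text, "html.parser")
            title = soup.title.get_text(strip=True) if soup.title else None
            rows.append({"url": u, "title": title, "error": None})
        except requests.RequestException as e:
            # Record the failure and keep going so one bad URL does not stop the run.
            rows.append({"url": u, "title": None, "error": str(e)})
    return rows

rows = main()
rows
```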
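
As a rough idea of what could later live in `analysis.py`, the sketch below summarizes a list of rows shaped like the ones `main()` returns. The function name `summarize` and the sample rows are assumptions made for illustration; the actual statistics for the course may differ.

```python
from collections import Counter


def summarize(rows: list[dict]) -> dict:
    """Count successes and failures and collect the titles that were found."""
    outcomes = Counter("error" if row.get("error") else "ok" for row in rows)
    titles = [row["title"] for row in rows if row.get("title")]
    return {"ok": outcomes["ok"], "error": outcomes["error"], "titles": titles}


# Made-up rows in the same shape as the notebook's output, for illustration only.
sample_rows = [
    {"url": "https://example.org/a", "title": "Example A"},
    {"url": "https://example.org/b", "error": "timeout"},
]
print(summarize(sample_rows))
# {'ok': 1, 'error': 1, 'titles': ['Example A']}
```

Keeping the summary in a pure function that takes `rows` as input means it can be tested without any network access.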

## Next step (outside notebook)
You'll paste the key functions into VS Code as `.py` files and run them from the terminal.
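
Since this week's goal includes `argparse`, here is a minimal sketch of how `run.py` might accept its inputs from the terminal. The option names `--delay` and `--out`, their defaults, and the file name `titles.json` are hypothetical choices, not part of the course material. The fetching step is left out so the example runs offline.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Describe the command-line interface for the (hypothetical) run.py."""
    parser = argparse.ArgumentParser(
        description="Fetch page titles from a few public URLs."
    )
    parser.add_argument("urls", nargs="+", help="one or more public page URLs")
    parser.add_argument("--delay", type=float, default=1.0,
                        help="seconds to wait between requests")
    parser.add_argument("--out", default="titles.json",
                        help="file to write results to")
    return parser


def main(argv=None) -> argparse.Namespace:
    # argv=None makes argparse read the real command line (sys.argv[1:]);
    # passing a list instead makes the function easy to try in a notebook.
    args = build_parser().parse_args(argv)
    return args


# Simulate: python run.py https://www.mozilla.org/en-US/privacy/ --delay 2
args = main(["https://www.mozilla.org/en-US/privacy/", "--delay", "2"])
print(args.urls, args.delay, args.out)
```

In an actual `run.py`, the last two lines would typically be replaced by a call to `main()` under an `if __name__ == "__main__":` guard, and the script would be started with something like `python run.py <url> --delay 2`.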
