# Assignment 1 — Runner (FULL, kernel‑safe)

This runner enforces a clean **start → run → stop** sequence for the Spark session.

**What it does**
1. Imports `assignment.py` (your functions) and prints the path configuration.
2. Verifies that **PySpark** is available and raises a `RuntimeError` with setup instructions if it is not.
3. **Reuses** an active Spark session or **starts** one if needed.
4. Runs **Processing → Analysis** using your functions.
5. Always **stops Spark** at the end, even if processing or analysis fails.

> Tip: Use the *same Spark/PySpark kernel* as the original assignment notebook.
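
The reuse-or-start and always-stop steps listed above can be sketched as a small context manager. This is a minimal illustration, not the runner's actual code: `FakeSession` and `managed_session` are hypothetical names, and the stand-in session only records whether `stop()` was called, so the sketch runs without PySpark.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a SparkSession; only tracks whether it was stopped."""

    def __init__(self):
        self.stopped = False

    def stop(self):
        self.stopped = True


@contextmanager
def managed_session(get_active, start):
    """Reuse the active session if one exists, otherwise start one; always stop on exit."""
    session = get_active()
    if session is None:
        session = start()
    try:
        yield session
    finally:
        session.stop()  # runs even if the body of the `with` block raises


def demo_failure_still_stops():
    """Simulate a failure during processing and report whether the session was stopped."""
    session = FakeSession()
    try:
        with managed_session(lambda: None, lambda: session):
            raise ValueError("simulated failure during processing")
    except ValueError:
        pass
    return session.stopped
```

With a real cluster, `get_active` would presumably be `SparkSession.getActiveSession` and `start` a wrapper around the assignment's own start function; both substitutions are assumptions about how you would wire it up.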


In [3]:

# --- Imports & config ---
import sys, pathlib
sys.path.append(str(pathlib.Path.cwd()))  # ensure local module import

# Optional hot‑reload while you iterate
try:
    %load_ext autoreload
    %autoreload 2
except Exception:
    pass

try:
    import assignment as a1
except Exception as e:
    raise RuntimeError("Could not import assignment.py; ensure it is in the SAME folder as this notebook.") from e

base, user, wasbs_data, wasbs_user = a1.setup_paths()
print(f"BASE_NOTEBOOK = {base}")
print(f"USERNAME      = {user}")
print(f"WASBS_DATA    = {wasbs_data}")
print(f"WASBS_USER    = {wasbs_user}")


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
BASE_NOTEBOOK = DATA420-25S2 Assignment 1.ipynb
USERNAME      = dew59
WASBS_DATA    = wasbs://campus-data@madsstorage002.blob.core.windows.net/ghcnd/
WASBS_USER    = wasbs://campus-user@madsstorage002.blob.core.windows.net/dew59/
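
Both printed locations follow the pattern `wasbs://<container>@<account>.blob.core.windows.net/<path>/`. The sketch below shows how such a URL could be assembled from its parts; `wasbs_url` is a hypothetical helper for illustration and is not taken from `assignment.py`, whose `setup_paths()` may build the paths differently.

```python
def wasbs_url(container: str, account: str, path: str = "") -> str:
    """Build a URL of the form wasbs://<container>@<account>.blob.core.windows.net/<path>/."""
    root = f"wasbs://{container}@{account}.blob.core.windows.net/"
    path = path.strip("/")  # tolerate leading/trailing slashes in the input
    return f"{root}{path}/" if path else root


# Reproduce the two locations printed by the cell above.
print(wasbs_url("campus-data", "madsstorage002", "ghcnd"))
print(wasbs_url("campus-user", "madsstorage002", "dew59"))
```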


In [4]:

# --- Environment check (PySpark) ---
try:
    import pyspark
    from pyspark.sql import SparkSession
    print("PySpark        =", pyspark.__version__)
    print("Active session =", SparkSession.getActiveSession())
except ModuleNotFoundError as e:
    raise RuntimeError(
        "PySpark is not available in this kernel.\n"
        "Use **Kernel → Change Kernel…** and select the SAME Spark/PySpark kernel as the original notebook,\n"
        "or install locally (e.g., `pip install pyspark==3.3.5`) with Java 8/11."
    ) from e


RuntimeError: PySpark is not available in this kernel.
Use **Kernel → Change Kernel…** and select the SAME Spark/PySpark kernel as the original notebook,
or install locally (e.g., `pip install pyspark==3.3.5`) with Java 8/11.
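
If this error appears, it can help to confirm whether the current kernel can locate `pyspark` at all before changing kernels. The following is a minimal sketch using the standard library's `importlib.util.find_spec`, which looks a module up without importing it; `module_available` is a hypothetical helper name.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found on the import path."""
    return importlib.util.find_spec(name) is not None


# Show which interpreter this kernel runs and whether it can see pyspark.
print("Interpreter =", sys.executable)
for mod in ("pyspark", "json"):
    print(f"{mod:8s} available = {module_available(mod)}")
```

If `pyspark` is reported as unavailable, the interpreter path printed first indicates which environment the kernel is actually using.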

In [None]:

# --- Start/Reuse Spark, run, and stop ---
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()
if spark is None:
    print("No active SparkSession; starting a new one via a1.start_spark() …")
    spark, sc = a1.start_spark(app_suffix="(runner-full)")
else:
    sc = spark.sparkContext
    print("Reusing active SparkSession:", sc.appName)

try:
    # PROCESSING
    dfs = a1.run_processing(spark, wasbs_data)
    print("Loaded dataframes:", sorted(k for k, v in dfs.items() if v is not None))

    # ANALYSIS
    answers = a1.run_analysis(dfs)
    print("\n--- Answers (partial; fill functions in assignment.py) ---")
    for k, v in answers.items():
        print(f"{k:26s} = {v}")

    # (Optional) VISUALIZATIONS
    # figs = a1.run_visualizations(dfs, save_prefix=wasbs_user + "figs/")
    # figs
finally:
    a1.stop_spark(spark)
    print("Spark stopped.")
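
The two reporting steps in the cell above (listing the dataframes that loaded and printing the answers in an aligned column) can be tried without Spark. This sketch assumes only that `dfs` and `answers` are plain dictionaries, as the runner's loops suggest; the helper names and the sample keys are hypothetical.

```python
def loaded_names(dfs):
    """Return the sorted names of entries whose value is not None."""
    return sorted(name for name, df in dfs.items() if df is not None)


def format_answers(answers, width=26):
    """Format each answer as 'key = value' with the key left-aligned to `width` characters."""
    return [f"{key:{width}s} = {value}" for key, value in answers.items()]


# Hypothetical sample data: one dataframe failed to load (None), one answer is computed.
sample_dfs = {"stations": object(), "daily": None, "countries": object()}
print("Loaded dataframes:", loaded_names(sample_dfs))
for line in format_answers({"n_rows": 3}):
    print(line)
```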
