
# Project 1: Computing in Context (SIPA)

**Dataset:** `2012_SAT_Results_20251105.csv`  
**Overview:** Compute the mean, median, and mode of one numeric column using (a) pandas and (b) only the Python standard library. Then visualize the column's distribution using only the standard library.


In [None]:

import pandas as pd
import csv, re
from pathlib import Path

DATA_PATH = Path('/mnt/data/2012_SAT_Results_20251105.csv')
assert DATA_PATH.exists(), f"Dataset not found at {DATA_PATH}"


## 1) Load & preview (pandas)

In [None]:

df = pd.read_csv(DATA_PATH)
print(df.shape)
df.head(10)


## 2) pandas: mean/median/mode

In [None]:

numeric_col = "SAT Math Avg. Score"
df[numeric_col] = pd.to_numeric(df[numeric_col], errors="coerce")
mean_p = df[numeric_col].mean()
median_p = df[numeric_col].median()
mode_p = df[numeric_col].mode().iloc[0]

print("PANDAS RESULTS")
print(f"Column: {numeric_col}")
print(f"Mean:   {mean_p:.2f}")
print(f"Median: {median_p}")
print(f"Mode:   {mode_p}")


## 3) Hard way (stdlib only)

In [None]:

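values = []
# encoding="utf-8" matches the default used by pd.read_csv above.
with open(DATA_PATH, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        raw = str(row[numeric_col])
        # Keep only digits, decimal points, and minus signs.
        cleaned = re.sub(r"[^0-9\.-]", "", raw)
        # Skip entries with no digits left (for example, non-numeric placeholders).
        if cleaned in {"", "-", "."}:
            continue
        try:
            values.append(float(cleaned))
        except ValueError:
            # Anything still malformed (such as "1-2") is ignored.
            pass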

# Mean
mean_h = sum(values) / len(values) if values else float("nan")

# Median
vals_sorted = sorted(values)
n = len(vals_sorted)
if n == 0:
    median_h = float("nan")
elif n % 2 == 1:
    median_h = vals_sorted[n // 2]
else:
    median_h = (vals_sorted[n // 2 - 1] + vals_sorted[n // 2]) / 2

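# Mode: count occurrences of each value. Because the dict is filled in sorted
# order, max() returns the smallest of the most frequent values on a tie,
# which matches pandas' .mode().iloc[0].
freq = {}
for v in vals_sorted:
    freq[v] = freq.get(v, 0) + 1
mode_h = max(freq, key=freq.get) if freq else float("nan")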

print("HARD-WAY RESULTS (stdlib only)")
print(f"Count:  {n}")
print(f"Mean:   {mean_h:.6f}")
print(f"Median: {median_h}")
print(f"Mode:   {mode_h}")
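As a cross-check on the hand-written calculations, the standard library's `statistics` module offers the same three measures. The sketch below is not part of the assignment code: `clean_scores` and `summarize` are hypothetical helper names, and the sample strings are made up for illustration (the `"s"` entry stands in for any non-numeric marker). Ties for the mode are broken by taking the smallest value, which is assumed here to be the desired behavior because it matches `Series.mode().iloc[0]`.

```python
import re
import statistics


def clean_scores(raw_values):
    """Convert raw strings to floats, skipping entries with no usable number."""
    scores = []
    for raw in raw_values:
        # Same cleaning rule as the cell above: keep digits, '.', and '-'.
        cleaned = re.sub(r"[^0-9.\-]", "", str(raw))
        try:
            scores.append(float(cleaned))
        except ValueError:
            continue  # "", "-", ".", or anything else float() rejects
    return scores


def summarize(raw_values):
    """Return (mean, median, mode) computed with the statistics module."""
    scores = clean_scores(raw_values)
    if not scores:
        nan = float("nan")
        return nan, nan, nan
    mean = statistics.fmean(scores)
    median = statistics.median(scores)
    # multimode() lists every most-frequent value; take the smallest on a tie.
    mode = min(statistics.multimode(scores))
    return mean, median, mode


# Illustrative input only, not rows from the SAT file.
sample = ["404", "s", "380", "404", ""]
print(summarize(sample))
```

If the numbers from this helper differ from the hand-written ones on the real column, the cleaning step is the first place to look, since both versions share the same arithmetic definitions.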


## 4) Visualization (ASCII via stdlib only)

In [None]:

# Lower edges of the 50-point bins: 200, 250, ..., 750.
bins = list(range(200, 800, 50))
# The top bin also includes 800, the maximum section score.
labels = [f"{b}-{b + 49}" for b in bins[:-1]] + ["750-800"]
counts = {lab: 0 for lab in labels}

def label_for(v):
    """Return the bin label for score v, or None if v is outside 200-800."""
    if not 200 <= v <= 800:
        return None
    idx = min(int((v - 200) // 50), len(labels) - 1)
    return labels[idx]

for v in values:
    lab = label_for(v)
    if lab is not None:
        counts[lab] += 1

max_count = max(counts.values())
print(f"ASCII Histogram for: {numeric_col}\n")
for lab in labels:
    # Scale the longest bar to 50 characters.
    bar_len = int(counts[lab] / max_count * 50) if max_count else 0
    print(f"{lab}: {'#' * bar_len} {counts[lab]}")



## 5) Reflection
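pandas computes all three statistics in a few lines and turns non-numeric entries into `NaN` through `errors="coerce"`, so they are skipped automatically. The stdlib version makes each step explicit (cleaning the raw strings, sorting for the median, counting frequencies for the mode), which reinforces how the statistics are computed. The ASCII histogram shows the shape of the distribution without any external plotting library.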
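One way to confirm that the two approaches agree is to compare their outputs numerically instead of by eye. The following is a minimal sketch under stated assumptions: `results_agree` is a hypothetical helper, the dictionaries hold invented numbers and not results from the SAT file, and two `NaN` values are treated as agreeing so that an empty column does not register as a mismatch.

```python
import math


def results_agree(a, b, rel_tol=1e-9):
    """Return True if both dicts have the same keys and numerically close values."""
    if a.keys() != b.keys():
        return False
    for key in a:
        x, y = a[key], b[key]
        if math.isnan(x) and math.isnan(y):
            continue  # NaN vs NaN counts as agreement (empty-column case)
        if not math.isclose(x, y, rel_tol=rel_tol):
            return False
    return True


# Hypothetical numbers for illustration only.
pandas_stats = {"mean": 396.0, "median": 404.0, "mode": 404.0}
stdlib_stats = {"mean": 1188 / 3, "median": 404.0, "mode": 404.0}
print(results_agree(pandas_stats, stdlib_stats))
```

In the notebook, the same check could be fed `mean_p`, `median_p`, `mode_p` on one side and `mean_h`, `median_h`, `mode_h` on the other.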
