# Architecture Data Analysis + Visualization Primer

Designed for complete Python beginners. Load pilot study data, compute simple summaries, and make figures you can reuse in A3/A4 (charts with uncertainty, multi-panel layouts, and a word cloud).

## Setup
```bash
# Run once in a terminal; inside a notebook cell, use `%pip install ...` instead.
pip install pandas numpy matplotlib seaborn scipy wordcloud
```
This assumes the repo root contains `data/` and `scripts/`.

In [None]:
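from pathlib import Path
import sys

# Locate the repo root so `data/` and `scripts/` resolve from a script or a notebook.
if '__file__' in globals():
    # Running as a script: the root is one level above this file's folder.
    ROOT = Path(__file__).resolve().parents[1]
else:
    # Running in a notebook: start from the current working directory.
    candidate = Path.cwd()
    if (candidate / 'scripts').exists():
        ROOT = candidate
    elif candidate.name == 'notebooks' and (candidate.parent / 'scripts').exists():
        ROOT = candidate.parent
    else:
        ROOT = candidate

# Make the helper modules in `scripts/` importable.
if str(ROOT / 'scripts') not in sys.path:
    sys.path.append(str(ROOT / 'scripts'))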

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

from plotting_utils import set_style, plot_error_bars, wordcloud_from_series
import data_simulation as sim
set_style()


## Load pilot observations
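The pilot dataset records observations of one space under two conditions, A and B. The cells below use its `timestamp`, `condition`, `occupancy_count`, `dwell_time_mean_min`, `temp_c`, and `humidity_pct` columns. If the CSV is missing, the fake data is regenerated first.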

In [None]:
pilot_path = ROOT / 'data' / 'pilot_observations_fake.csv'
if not pilot_path.exists():
    sim.save_default_fake_data(ROOT)
pilot = pd.read_csv(pilot_path, parse_dates=['timestamp'])
pilot.head()

### Simple EDA
Summaries and a first line chart.

In [None]:
pilot.describe(include='all')

In [None]:
plt.figure(figsize=(9,4))
sns.lineplot(data=pilot, x='timestamp', y='occupancy_count', hue='condition')
plt.title('Occupancy over time (pilot)')
plt.tight_layout()
plt.show()

## Bar chart with error bars
Mean dwell time by condition with 95% CIs.

In [None]:
_ = plot_error_bars(pilot, group_col='condition', value_col='dwell_time_mean_min', title='Dwell time (mean ± 95% CI)')
plt.show()
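
The internals of `plot_error_bars` live in `scripts/plotting_utils.py` and are not shown here, so the following is only a minimal sketch of one common way to get a "mean ± 95% CI": a t-based interval around the sample mean. The function name `mean_ci` and the example values are made up for illustration, and the helper may compute its intervals differently.

```python
import numpy as np
from scipy import stats

def mean_ci(values, confidence=0.95):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]                      # ignore missing observations
    mean = x.mean()
    sem = stats.sem(x)                       # standard error of the mean (ddof=1)
    # Critical t value for a two-sided interval with n-1 degrees of freedom
    t_crit = stats.t.ppf((1 + confidence) / 2, df=len(x) - 1)
    half_width = t_crit * sem
    return mean, mean - half_width, mean + half_width

# Made-up dwell times in minutes, for illustration only
example = [12.5, 14.0, 11.2, 15.3, 13.8, 12.9]
mean, lo, hi = mean_ci(example)
print(f"mean={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

To try it on the pilot data, pass one condition's column, for example `mean_ci(pilot.loc[pilot.condition == 'A', 'dwell_time_mean_min'])`.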

### Optional: quick statistical comparison
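Welch's independent-samples t-test on dwell time between A and B (`equal_var=False`, so the two groups are not assumed to share a variance). With pilot-sized samples, treat the result as indicative rather than conclusive.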

In [None]:
a = pilot.loc[pilot.condition=='A', 'dwell_time_mean_min']
b = pilot.loc[pilot.condition=='B', 'dwell_time_mean_min']
stats.ttest_ind(a, b, equal_var=False)
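
The raw test output is easier to report when the statistic and p-value sit next to the difference in means. Below is a small sketch of that; `welch_summary` is a hypothetical helper name and the two lists are invented numbers, not pilot results.

```python
import numpy as np
from scipy import stats

def welch_summary(a, b):
    """Summarize a Welch t-test as a dict with mean difference, t, and p."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    result = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
    return {
        "mean_diff": float(a.mean() - b.mean()),     # A minus B, in the data's units
        "t": float(result.statistic),
        "p": float(result.pvalue),
    }

# Invented dwell times in minutes, for illustration only
demo_a = [12.5, 14.0, 11.2, 15.3, 13.8]
demo_b = [16.1, 17.4, 15.0, 18.2, 16.7]
summary = welch_summary(demo_a, demo_b)
print(summary)
```

A negative `mean_diff` means condition A had the shorter average dwell time. The p-value alone says nothing about how large the difference is, which is why the two are reported together.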

## Distribution plots
Box/violin charts communicate variability quickly.

In [None]:
fig, ax = plt.subplots(1,2, figsize=(10,4))
sns.boxplot(data=pilot, x='condition', y='occupancy_count', ax=ax[0])
sns.violinplot(data=pilot, x='condition', y='dwell_time_mean_min', ax=ax[1])
ax[0].set_title('Occupancy distribution')
ax[1].set_title('Dwell time distribution')
plt.tight_layout()
plt.show()

## Bootstrapped confidence intervals (beginner-friendly)
Resample with replacement to estimate uncertainty without heavy math.

In [None]:
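def bootstrap_ci(x, n_boot=2000, ci=0.95, seed=42):
    """Percentile bootstrap CI for the mean; returns (mean of bootstrap means, lo, hi)."""
    x = np.asarray(x)
    rng = np.random.default_rng(seed)  # fixed seed keeps results reproducible
    means = []
    for _ in range(n_boot):
        # Resample with replacement, keeping the original sample size
        sample = rng.choice(x, size=len(x), replace=True)
        means.append(np.mean(sample))
    # CI bounds are the tail percentiles of the bootstrap means (2.5 and 97.5 for ci=0.95)
    lo = np.percentile(means, (1 - ci) / 2 * 100)
    hi = np.percentile(means, (1 + ci) / 2 * 100)
    return float(np.mean(means)), float(lo), float(hi)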

for cond in ['A','B']:
    m, lo, hi = bootstrap_ci(pilot.loc[pilot.condition==cond, 'dwell_time_mean_min'].values)
    print(cond, 'mean≈', round(m,2), 'CI', (round(lo,2), round(hi,2)))
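
The same resampling idea extends to the A vs. B comparison itself. This is a sketch, assuming you want an interval for the difference in means; `bootstrap_diff_ci` is a hypothetical name and the inputs are invented values. It draws all resamples at once instead of looping, which gives the same kind of percentile interval with less code.

```python
import numpy as np

def bootstrap_diff_ci(a, b, n_boot=2000, ci=0.95, seed=42):
    """Percentile bootstrap CI for mean(a) - mean(b).

    Returns (observed difference, lower bound, upper bound).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # One row per bootstrap replicate; each group is resampled independently
    a_means = rng.choice(a, size=(n_boot, len(a)), replace=True).mean(axis=1)
    b_means = rng.choice(b, size=(n_boot, len(b)), replace=True).mean(axis=1)
    diffs = a_means - b_means
    lo, hi = np.percentile(diffs, [(1 - ci) / 2 * 100, (1 + ci) / 2 * 100])
    return float(a.mean() - b.mean()), float(lo), float(hi)

# Invented dwell times in minutes, for illustration only
demo_a = [12.5, 14.0, 11.2, 15.3, 13.8]
demo_b = [16.1, 17.4, 15.0, 18.2, 16.7]
diff, lo, hi = bootstrap_diff_ci(demo_a, demo_b)
print(f"A - B = {diff:.2f} min, 95% CI = ({lo:.2f}, {hi:.2f})")
```

If the interval excludes zero, the resampled data consistently favor one condition. With very small samples the interval can still be misleadingly narrow.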

## Survey text → Word cloud (with fallback)
Turn open-text feedback into a quick visual.

In [None]:
survey_path = ROOT / 'data' / 'survey_responses_fake.csv'
if not survey_path.exists():
    sim.save_default_fake_data(ROOT)
survey = pd.read_csv(survey_path)
_ = wordcloud_from_series(survey['response_text'])
plt.show()
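
If the `wordcloud` package is unavailable, a bar chart of the most frequent words carries much the same information. The sketch below is one way to build it by hand and is not necessarily what `wordcloud_from_series` does internally. `top_words`, the `STOP_WORDS` set, and the sample responses are all hypothetical.

```python
import re
from collections import Counter

import matplotlib.pyplot as plt

# Hypothetical, deliberately short stop-word list; extend it for real survey text
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "in", "was", "but", "too"}

def top_words(texts, n=10):
    """Count words across an iterable of strings and return the n most common."""
    counts = Counter()
    for text in texts:
        if not isinstance(text, str):        # skip NaN or other missing values
            continue
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Invented responses, for illustration only
responses = [
    "The space is bright and quiet",
    "Too noisy in the afternoon",
    "Bright, but the seating was uncomfortable",
    "Quiet and bright in the morning",
]
top = top_words(responses, n=5)
words, freqs = zip(*top)

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(words[::-1], freqs[::-1])            # reversed so the top word is drawn first
ax.set_xlabel("Count")
ax.set_title("Most frequent words")
plt.tight_layout()
plt.show()
```

The function accepts any iterable of strings, so `top_words(survey['response_text'])` should work on the survey column directly.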

## Stakeholder-ready figures
Examples you can export directly to your A4: add thresholds, annotations, and save to PNG.

In [None]:
fig, ax = plt.subplots(figsize=(7,4))
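# `errorbar=('ci', 95)` needs seaborn >= 0.12; on older versions use `ci=95` instead.
sns.barplot(data=pilot, x='condition', y='occupancy_count', estimator=np.mean, errorbar=('ci', 95), ax=ax, color='#64B5F6')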
ax.axhline(24, ls='--', c='red', label='Target ≥ 24 users')
ax.set_title('Occupancy (mean ± 95% CI) with target line')
ax.legend()
plt.tight_layout()
out = ROOT / 'docs' / 'fig_occupancy_bar.png'
out.parent.mkdir(parents=True, exist_ok=True)  # create docs/ if it does not exist yet
plt.savefig(out, dpi=200)
out

## Multi-panel "Object Card" starter
A compact layout combining key visuals. Replace with your own charts later.

In [None]:
fig, axs = plt.subplots(2,2, figsize=(10,7))
sns.lineplot(data=pilot, x='timestamp', y='occupancy_count', hue='condition', ax=axs[0,0])
axs[0,0].set_title('Occupancy over time')
sns.boxplot(data=pilot, x='condition', y='dwell_time_mean_min', ax=axs[0,1])
axs[0,1].set_title('Dwell time distribution')
sns.scatterplot(data=pilot, x='temp_c', y='humidity_pct', hue='condition', ax=axs[1,0])
axs[1,0].set_title('Env. conditions')
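# `errorbar=('ci', 95)` needs seaborn >= 0.12; on older versions use `ci=95` instead.
sns.barplot(data=pilot, x='condition', y='occupancy_count', estimator=np.mean, errorbar=('ci', 95), ax=axs[1,1])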
axs[1,1].set_title('Occupancy mean ± CI')
plt.tight_layout()
out = ROOT / 'docs' / 'object_card_starter.png'
out.parent.mkdir(parents=True, exist_ok=True)  # create docs/ if it does not exist yet
plt.savefig(out, dpi=200)
out

## Recap
- Load data, summarize, and plot with uncertainty
- Use word clouds or top-word bars for survey text
- Save ready-to-drop figures for A4 and presentations

Tip: to reuse these cells without edits, replace the fake CSVs in `data/` with your own files, keeping the same file names and column names.
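
Before swapping in your own file, it can help to confirm that the columns match. This is a minimal sketch: the required set lists the pilot columns the cells in this notebook use, `missing_columns` is a hypothetical helper, and the inline CSV text stands in for your own file.

```python
import io

import pandas as pd

# Pilot columns referenced by the cells in this notebook
REQUIRED_PILOT_COLUMNS = {
    "timestamp", "condition", "occupancy_count",
    "dwell_time_mean_min", "temp_c", "humidity_pct",
}

def missing_columns(df, required):
    """Return the required column names that the DataFrame lacks, sorted."""
    return sorted(set(required) - set(df.columns))

# Stand-in for your own CSV; replace io.StringIO(...) with a file path
csv_text = "timestamp,condition,occupancy_count\n2024-01-01 09:00,A,12\n"
own = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

missing = missing_columns(own, REQUIRED_PILOT_COLUMNS)
print("Missing columns:", missing)
# → Missing columns: ['dwell_time_mean_min', 'humidity_pct', 'temp_c']
```

An empty list means the pilot cells should find the columns they need. The survey file only needs a `response_text` column.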