# Assignment 3 — Task 1: Linked Views

This section presents four interactive views built with Altair/Vega-Lite. Each view consists of two or three linked panels. Shared selections and parameters mean that filtering or highlighting in one panel updates the other panels in the same view. The aim is to compare distributions and relationships without losing context.

In [None]:
import pandas as pd
import altair as alt

alt.data_transformers.disable_max_rows()
print(f"Altair {alt.__version__}")

In [None]:
df = pd.read_parquet('sf_property_data_clean.parquet')
df = df.dropna(subset=['year', 'total_assessed_value', 'neighborhood'])

# Prepare aggregated datasets
nbhd_yearly = df.groupby(['neighborhood', 'year']).agg({
    'total_assessed_value': ['median', 'mean', 'count']
}).reset_index()
nbhd_yearly.columns = ['neighborhood', 'year', 'median_value', 'mean_value', 'property_count']

# Current year stats
current_stats = df[df['year'] == 2023].groupby('neighborhood').agg({
    'total_assessed_value': 'median',
    'land_value_pct': 'mean',
    'building_age': 'mean'
}).reset_index()
current_stats.columns = ['neighborhood', 'median_value', 'land_pct', 'building_age']

# Calculate appreciation
value_2015 = df[df['year'] == 2015].groupby('neighborhood')['total_assessed_value'].median()
value_2023 = df[df['year'] == 2023].groupby('neighborhood')['total_assessed_value'].median()
appreciation = ((value_2023 - value_2015) / value_2015 * 100).reset_index()
appreciation.columns = ['neighborhood', 'appreciation_pct']
current_stats = current_stats.merge(appreciation, on='neighborhood', how='left')

# Sample 2023 records for the scatter plots (capped at 15,000 points)
df_2023 = df[df['year'] == 2023]
df_sample = df_2023.sample(n=min(15000, len(df_2023)), random_state=42)

print(f"Data loaded: {len(df):,} records")
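The appreciation step relies on pandas aligning the 2015 and 2023 median Series on their shared `neighborhood` index. The toy example below uses made-up neighborhoods and values. It shows that a neighborhood present in only one of the two years ends up with `NaN` rather than a misleading percentage.

```python
import pandas as pd

# Hypothetical toy records: neighborhood "C" has no 2015 record
toy = pd.DataFrame({
    'neighborhood': ['A', 'A', 'B', 'B', 'C'],
    'year': [2015, 2023, 2015, 2023, 2023],
    'total_assessed_value': [500_000, 750_000, 1_000_000, 1_100_000, 900_000],
})

v2015 = toy[toy['year'] == 2015].groupby('neighborhood')['total_assessed_value'].median()
v2023 = toy[toy['year'] == 2023].groupby('neighborhood')['total_assessed_value'].median()

# Series arithmetic aligns on the neighborhood index; unmatched labels become NaN
appreciation = ((v2023 - v2015) / v2015 * 100).rename('appreciation_pct').reset_index()
print(appreciation)
```

With `how='left'` in the merge above, such neighborhoods keep their 2023 statistics but show a missing appreciation value.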

### View 1 — Property Value Distribution

**Questions Answered:**
- How are property values distributed across San Francisco?
- What is the relationship between building age and property value?
- Which property types dominate different value ranges?

**Interactions:**
- Brush selection on histogram to filter value range
- Filtered scatter plot (age vs value) and bar chart (property types) update
- Tooltips show exact values on hover

**Why:** A brush suits continuous variables. It selects an arbitrary value range while the histogram keeps the overall distribution in view, and the linked panels respond immediately.
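Conceptually, each `transform_filter(brush)` keeps only the records whose `total_assessed_value` falls inside the brushed interval. An empty brush keeps everything, which is the default `empty` behavior of a selection used as a filter. The plain-Python sketch below models that predicate independently of Altair. The function name is hypothetical, and the inclusive bounds are a simplifying assumption.

```python
from typing import Optional, Sequence, Tuple

def brush_filter(values: Sequence[float],
                 interval: Optional[Tuple[float, float]]) -> list:
    """Model an x-encoding interval filter: keep values inside [lo, hi].

    An empty brush (interval is None) passes every value.
    """
    if interval is None:
        return list(values)
    lo, hi = sorted(interval)  # normalise order so (hi, lo) also works
    return [v for v in values if lo <= v <= hi]

values = [350_000, 800_000, 1_200_000, 2_500_000, 9_000_000]
print(brush_filter(values, None))                  # all five values
print(brush_filter(values, (2_000_000, 700_000)))  # [800000, 1200000]
```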

**Alternatives Rejected:**
- Slider input: lacks distribution overview
- Click on bins: only discrete selection, not continuous ranges
- Small multiples: separates views, prevents cross-filtering

**Encodings:**
- Histogram (mark_bar): effective for frequency distributions (Cleveland & McGill 1984)
- Log scale (scatter Y): handles right-skewed value distribution (Tufte 2001)
- Position (X/Y): most accurate perceptual channel (Mackinlay 1986)
- Conditional color: steelblue/lightgray for selection contrast (Ware 2012)
- Opacity (0.5): mitigates overplotting in scatter
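A log axis places each value at its logarithm, so every factor of ten in assessed value gets the same vertical distance. This is what keeps a right-skewed distribution readable. The logarithm is undefined for zero or negative numbers, so such records cannot be placed on the axis. The stdlib check below uses illustrative values. Whether the cleaned parquet contains non-positive values is not stated here. If it does, filtering on `total_assessed_value > 0` before plotting is one option.

```python
import math

def log_positions(values):
    """Return log10 positions for positive values and the count of values
    a log-scaled axis cannot place (zero or negative)."""
    positions = [math.log10(v) for v in values if v > 0]
    unplottable = sum(1 for v in values if v <= 0)
    return positions, unplottable

positions, unplottable = log_positions([0, 100_000, 1_000_000, 10_000_000])
print(positions, unplottable)  # evenly spaced positions 5, 6, 7 and one unplottable value
```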

**Three Panels:**
1. Histogram (top): Value distribution with brush
2. Scatter (bottom-left): Age vs Value (filtered)
3. Bar (bottom-right): Property type counts (filtered)


In [None]:
brush = alt.selection_interval(encodings=['x'])

hist = alt.Chart(df_sample).mark_bar().encode(
    x=alt.X('total_assessed_value:Q', bin=alt.Bin(maxbins=40), title='Property Value ($)'),
    y=alt.Y('count()', title='Count'),
    color=alt.condition(brush, alt.value('steelblue'), alt.value('lightgray'))
).add_params(brush).properties(width=700, height=120, title='Brush to Select Value Range')

scatter = alt.Chart(df_sample).mark_circle(size=30, opacity=0.5).encode(
    x=alt.X('building_age:Q', title='Building Age (years)', scale=alt.Scale(domain=[0, 150])),
    y=alt.Y('total_assessed_value:Q', title='Value ($)', scale=alt.Scale(type='log')),
    color=alt.condition(brush, alt.Color('property_class_code_definition:N', legend=None), alt.value('lightgray')),
    tooltip=['building_age', 'total_assessed_value', 'property_class_code_definition']
).transform_filter(brush).properties(width=350, height=300, title='Age vs Value (Filtered)')

bar = alt.Chart(df_sample).mark_bar().encode(
    y=alt.Y('property_class_code_definition:N', sort='-x', title='Property Type'),
    x=alt.X('count()', title='Count'),
    color=alt.value('steelblue')
).transform_filter(brush).properties(width=330, height=300, title='Property Types (Filtered)')

hist & (scatter | bar)
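To sanity-check what the linked bar chart should show for a given brush, you can reproduce the same filter-then-count in pandas. The sketch below uses a tiny made-up frame with the notebook's column names. The property type labels are placeholders, and `lo`/`hi` stand in for a hypothetical brushed range.

```python
import pandas as pd

toy = pd.DataFrame({
    'total_assessed_value': [400_000, 900_000, 1_500_000, 2_200_000, 6_000_000],
    'property_class_code_definition': ['Type A', 'Type B', 'Type B', 'Type C', 'Type A'],
})

lo, hi = 800_000, 2_500_000  # hypothetical brushed range
in_brush = toy[toy['total_assessed_value'].between(lo, hi)]

# Counts per property type, largest first (like sort='-x' in the bar chart)
type_counts = in_brush['property_class_code_definition'].value_counts()
print(type_counts)
```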

### View 2 — Neighborhood Comparison

**Questions Answered:**
- Which neighborhoods have highest property values (2023)?
- How have neighborhood values evolved 2015-2023?
- What characteristics (appreciation, land %, age) distinguish high-value neighborhoods?

**Interactions:**
- Click to select neighborhoods (Shift+click for multiple)
- Time series highlights selected neighborhoods
- Characteristics bar shows only selected neighborhoods
- Tooltips provide exact values

**Why:** Clicking is a natural way to pick categories, and multi-select supports side-by-side comparison. Highlighting the selection, rather than removing unselected neighborhoods, keeps context while emphasizing the chosen ones (Shneiderman 1996).
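The click behavior works as follows. A plain click replaces the selection, and Shift+click toggles a neighborhood in or out. Because of `empty=True`, an empty selection counts as "everything selected". The plain-Python model below illustrates that behavior; it is not Vega-Lite's implementation, and the function names and neighborhood labels are hypothetical.

```python
def click(selected: frozenset, item: str, shift: bool = False) -> frozenset:
    """Update a point selection after a click on `item`."""
    if not shift:
        return frozenset({item})   # plain click: replace the selection
    if item in selected:
        return selected - {item}   # Shift+click on a selected item: remove it
    return selected | {item}       # Shift+click on a new item: add it

def is_highlighted(item: str, selected: frozenset, empty_selects_all: bool = True) -> bool:
    """Highlight test for a conditional encoding; an empty selection selects all."""
    if not selected:
        return empty_selects_all
    return item in selected

sel = frozenset()
sel = click(sel, 'Nbhd A')               # {'Nbhd A'}
sel = click(sel, 'Nbhd B', shift=True)   # {'Nbhd A', 'Nbhd B'}
sel = click(sel, 'Nbhd A', shift=True)   # {'Nbhd B'}
print(sorted(sel), is_highlighted('Nbhd C', sel))  # ['Nbhd B'] False
```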

**Alternatives Rejected:**
- Dropdown: hides overview, requires multiple clicks
- Brush on time series: less intuitive for categories
- Small multiples: 20+ panels overwhelm display

**Encodings:**
- Sorted bar chart: facilitates ranking (Cleveland & McGill 1984)
- Viridis color: perceptually uniform for values (Rogowitz & Treinish 1998)
- Conditional encoding: selected (color/opacity) vs unselected (gray) for focus+context (Card et al. 1999)
- Line with points: standard for temporal data (Tufte 2001)
- Stroke width (1px vs 3px): reinforces selection (Bertin 1983)

**Three Panels:**
1. Bar chart (top): Neighborhoods by median value (clickable)
2. Time series (middle): Value trends (highlighted)
3. Characteristics (bottom): Metrics for selected neighborhoods


In [None]:
top_nbhds = current_stats.nlargest(20, 'median_value')['neighborhood'].tolist()
nbhd_subset = nbhd_yearly[nbhd_yearly['neighborhood'].isin(top_nbhds)].copy()
stats_subset = current_stats[current_stats['neighborhood'].isin(top_nbhds)].copy()

selection = alt.selection_point(fields=['neighborhood'], empty=True)

# Bar chart for selection
bar_select = alt.Chart(stats_subset).mark_bar().encode(
    y=alt.Y('neighborhood:N', sort='-x', title='Neighborhood'),
    x=alt.X('median_value:Q', title='Median Value 2023 ($)', axis=alt.Axis(format='$,.0f')),
    color=alt.condition(selection, alt.Color('median_value:Q', scale=alt.Scale(scheme='viridis'), legend=None), alt.value('lightgray')),
    tooltip=['neighborhood', alt.Tooltip('median_value:Q', format='$,.0f'), alt.Tooltip('appreciation_pct:Q', format='.1f')]
).add_params(selection).properties(width=600, height=400, title='Click Neighborhoods (Shift+Click for Multiple)')

# Linked time series
lines = alt.Chart(nbhd_subset).mark_line(point=True).encode(
    x=alt.X('year:O', title='Year'),
    y=alt.Y('median_value:Q', title='Median Value ($)', axis=alt.Axis(format='$,.0f')),
    color=alt.condition(selection, alt.Color('neighborhood:N', legend=None), alt.value('lightgray')),
    strokeWidth=alt.condition(selection, alt.value(3), alt.value(1)),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2)),
    tooltip=['neighborhood', 'year', alt.Tooltip('median_value:Q', format='$,.0f')]
).properties(width=600, height=250, title='Value Trends (Highlighted)')

# Linked characteristics: mean of each metric over the selected neighborhoods
# (aggregating gives one bar per metric even when several neighborhoods are
# selected; note the three metrics have different units)
detail = alt.Chart(stats_subset).transform_filter(selection).transform_fold(
    ['appreciation_pct', 'land_pct', 'building_age'],
    as_=['metric', 'value']
).mark_bar().encode(
    x=alt.X('metric:N', title=None, axis=alt.Axis(labelAngle=0)),
    y=alt.Y('mean(value):Q', title='Mean of Selection (mixed units)'),
    color=alt.Color('metric:N', legend=None),
    tooltip=['metric:N', alt.Tooltip('mean(value):Q', format='.2f')]
).properties(width=600, height=180, title='Selected Characteristics')

bar_select & lines & detail
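To check what the characteristics panel represents, you can reproduce the filter-then-fold pipeline in pandas. Here `pd.melt` plays the role of `transform_fold`, and averaging per metric is one way to summarize several selected neighborhoods as one bar each. The frame and the selected set below are made up, using the notebook's column names.

```python
import pandas as pd

stats = pd.DataFrame({
    'neighborhood': ['Nbhd A', 'Nbhd B', 'Nbhd C'],
    'appreciation_pct': [40.0, 20.0, 5.0],
    'land_pct': [0.6, 0.5, 0.4],
    'building_age': [90.0, 70.0, 30.0],
})
selected = {'Nbhd A', 'Nbhd B'}  # hypothetical click selection

# Filter to the selection, then fold the three metric columns into long form
long_form = pd.melt(
    stats[stats['neighborhood'].isin(selected)],
    id_vars='neighborhood',
    value_vars=['appreciation_pct', 'land_pct', 'building_age'],
    var_name='metric', value_name='value',
)

# One value per metric: the mean across the selected neighborhoods
per_metric = long_form.groupby('metric')['value'].mean()
print(per_metric)
```

Because the three metrics have different units, a shared y axis mostly supports comparing selections within a metric, not comparing across metrics.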

### View 3 — Building Age and Value Explorer

**Questions Answered:**
- How does building age correlate with property value?
- What is the density distribution across age-value space?
- How does land value percentage vary with age?

**Interactions:**
- Pan and zoom (drag/scroll) on scatter plot
- Zoom selection filters heatmap density view
- Tooltips show exact values on hover

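**Why:** With up to 15,000 sampled points, overplotting makes pan/zoom essential. Binding the interval selection to the scales lets one gesture both navigate the scatter and define the region that filters the heatmap. The overview+detail layout follows Shneiderman's mantra (overview first, zoom and filter, then details on demand).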

**Alternatives Rejected:**
- Static scatter: severe overplotting obscures patterns
- Hexbin only: loses individual property resolution
- Brush selection: requires reselection after each exploration

**Encodings:**
- Scatter (mark_circle): standard for bivariate continuous (Cleveland 1993)
- Log Y-scale: linearizes exponential relationships (Tufte 2001)
- Color (land_value_pct): adds third dimension via viridis (Ware 2012)
- Low opacity (0.3): mitigates overplotting (Wickham 2016)
- Heatmap (mark_rect): effective for 2D density (Carr et al. 1987)
- Blues scale: conventional for count/density (Brewer et al. 2003)

**Two Panels:**
1. Scatter (left): Age vs Value with pan/zoom (overview)
2. Heatmap (right): Density in selected region (detail)

In [None]:
zoom = alt.selection_interval(bind='scales', encodings=['x', 'y'])

overview = alt.Chart(df_sample).mark_circle(size=15, opacity=0.3).encode(
    x=alt.X('building_age:Q', title='Building Age', scale=alt.Scale(domain=[0, 150])),
    y=alt.Y('total_assessed_value:Q', title='Value ($)', scale=alt.Scale(type='log')),
    color=alt.Color('land_value_pct:Q', scale=alt.Scale(scheme='viridis'), title='Land %'),
    tooltip=['building_age', 'total_assessed_value', 'land_value_pct']
).add_params(zoom).properties(width=400, height=400, title='Pan/Zoom to Navigate')

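# Bin value on a log10 scale to match the log-scaled overview; linear bins
# on a log axis typically start at 0, which a log scale cannot draw.
heatmap = alt.Chart(df_sample).transform_filter(zoom).transform_filter(
    alt.datum.total_assessed_value > 0
).transform_calculate(
    log_value='log(datum.total_assessed_value) / LN10'
).mark_rect().encode(
    x=alt.X('building_age:Q', bin=alt.Bin(maxbins=20), title='Building Age'),
    y=alt.Y('log_value:Q', bin=alt.Bin(maxbins=20), title='log10 Value ($)'),
    color=alt.Color('count()', scale=alt.Scale(scheme='blues'), title='Count'),
    tooltip=['count()']
).properties(width=400, height=400, title='Density in Selection')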

overview | heatmap
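To cross-check the density panel outside the browser, you can approximate the counts in pandas. The sketch restricts the data to a window that stands in for the current pan/zoom domain. It then bins the value on a log10 scale to match the log-scaled overview and cross-tabulates binned age against binned log-value. The data are synthetic, and the window bounds are hypothetical. The sketch also assumes fixed-width bins via `pd.cut`; Vega-Lite's `maxbins` picks its own "nice" bin boundaries, so per-cell counts may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'building_age': rng.integers(0, 150, size=1_000),
    'total_assessed_value': 10 ** rng.uniform(5, 7, size=1_000),  # synthetic, log-uniform
})

# Hypothetical zoom window (x and y domains after panning/zooming)
age_lo, age_hi = 20, 100
val_lo, val_hi = 300_000, 3_000_000
window = toy[
    toy['building_age'].between(age_lo, age_hi)
    & toy['total_assessed_value'].between(val_lo, val_hi)
]

# Bin age linearly and value on a log10 scale, then count records per cell
age_bin = pd.cut(window['building_age'], bins=8)
log_val_bin = pd.cut(np.log10(window['total_assessed_value']), bins=8)
density = pd.crosstab(log_val_bin, age_bin)
print(density.shape, int(density.values.sum()))
```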

### View 4 — Property Type Value Trends

**Questions Answered:**
- How have values evolved 2015-2023 for different property types?
- How do top 5 types compare between 2015 and 2023?
- Does median vs mean aggregation affect trends?

**Interactions:**
- Radio button toggles median/mean aggregation
- Both panels update simultaneously via shared parameter
- Tooltips show exact values on hover

**Why:** The aggregation toggle reveals whether trends are driven by typical properties (median) or by high-value outliers (mean). Because both panels read the same parameter, they update together without manual coordination.
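The five-value example below shows why the toggle matters: one very expensive property moves the mean far more than the median. The values are illustrative.

```python
import statistics

# Four typical assessments plus one outlier (hypothetical values)
values = [900_000, 950_000, 1_000_000, 1_050_000, 12_000_000]

median_value = statistics.median(values)
mean_value = statistics.mean(values)
print(median_value, mean_value)  # 1000000 3180000
```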

**Alternatives Rejected:**
- Separate median/mean charts: doubles space, requires manual comparison
- Dropdown: hides the options, whereas radio buttons keep both visible
- All property types: severe overplotting with 20+ lines
- Small multiples: harder to compare trends across distant panels

**Encodings:**
- Line chart: standard for temporal data (Tufte 2001)
- Position (X/Y): most accurate for quantitative comparison (Mackinlay 1986)
- Color hue: distinguishes property types (Tableau10)
- Faceted bars: enables direct before/after comparison (Cleveland & McGill 1984)
- Top 5 filter: reduces overplotting while keeping the most common types (by record count)
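You can check whether the top five types cover most records directly from the per-type counts. The stdlib sketch below uses hypothetical counts and labels. In the notebook, the input would be the per-type totals of the `count` column.

```python
from collections import Counter

def top_k_share(counts: Counter, k: int = 5) -> float:
    """Fraction of all records that belong to the k most common categories."""
    total = sum(counts.values())
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total if total else 0.0

# Hypothetical record counts per property type
counts = Counter({'Type A': 500, 'Type B': 200, 'Type C': 120, 'Type D': 80,
                  'Type E': 50, 'Type F': 30, 'Type G': 20})
print(round(top_k_share(counts), 3))  # 0.95
```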

**Two Panels:**
1. Time series (top): 2015-2023 trends for top 5 types
2. Faceted bars (bottom): 2015 vs 2023 comparison

In [None]:
# Yearly median/mean value and record count per property type
type_yearly = (
    df.groupby(['property_class_code_definition', 'year'])
      .agg(median_value=('total_assessed_value', 'median'),
           mean_value=('total_assessed_value', 'mean'),
           count=('total_assessed_value', 'count'))
      .reset_index()
      .rename(columns={'property_class_code_definition': 'property_type'})
)

# Keep the five property types with the most records across all years
top_types = (
    type_yearly.groupby('property_type')['count']
      .sum().nlargest(5).index.tolist()
)
type_subset = type_yearly[type_yearly['property_type'].isin(top_types)].copy()

# Long form so we can toggle mean/median
type_long = pd.melt(
    type_subset,
    id_vars=['property_type', 'year', 'count'],
    value_vars=['median_value', 'mean_value'],
    var_name='agg_type', value_name='value'
)
type_long['agg_type'] = type_long['agg_type'].str.replace('_value', '', regex=False)

year_order = sorted(type_long['year'].unique().tolist())

# Radio-bound point selection; its initial value is a list of {field: value} objects
agg_selection = alt.selection_point(
    fields=['agg_type'],
    bind=alt.binding_radio(options=['median', 'mean'], name='Aggregation: '),
    value=[{'agg_type': 'median'}]
)

# Time-series lines
lines = (
    alt.Chart(type_long, title='Property Type Trends')
      .transform_filter(agg_selection)
      .mark_line(point=True, strokeWidth=2)
      .encode(
          x=alt.X('year:O', title='Year', sort=year_order),
          y=alt.Y('value:Q', title='Property Value ($)', axis=alt.Axis(format='$,.0f')),
          color=alt.Color('property_type:N', title='Property Type'),
          tooltip=[
              alt.Tooltip('property_type:N', title='Type'),
              alt.Tooltip('year:O', title='Year'),
              alt.Tooltip('value:Q', title='Value', format='$,.0f'),
              alt.Tooltip('agg_type:N', title='Aggregate')
          ]
      )
      .properties(width=800, height=300)
)

# Faceted bars comparing first and last available years (e.g., 2015 vs 2023)
first_year, last_year = year_order[0], year_order[-1]
type_compare = type_long[type_long['year'].isin([first_year, last_year])]

bars = (
    alt.Chart(type_compare, title=f'{first_year} vs {last_year}')
      .transform_filter(agg_selection)
      .mark_bar()
      .encode(
          x=alt.X('year:O', title='Year', sort=[first_year, last_year]),
          y=alt.Y('value:Q', title='Value ($)', axis=alt.Axis(format='$,.0f')),
          color=alt.Color('property_type:N', legend=None),
          column=alt.Column('property_type:N', title=None, sort=top_types,
                            header=alt.Header(labelOrient='bottom', labelAngle=0)),
          tooltip=[
              alt.Tooltip('property_type:N', title='Type'),
              alt.Tooltip('year:O', title='Year'),
              alt.Tooltip('value:Q', title='Value', format='$,.0f'),
              alt.Tooltip('agg_type:N', title='Aggregate')
          ]
      )
      .properties(width=130, height=300)
)

# Attach the selection control
(lines.add_params(agg_selection)) & bars
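The radio toggle amounts to choosing one `agg_type` slice of the long-form table. The pandas equivalent below uses a made-up frame with one row per type. It shows the melt producing `median`/`mean` labels and the filter both panels apply for the chosen option. The `toggle` helper is hypothetical.

```python
import pandas as pd

wide = pd.DataFrame({
    'property_type': ['Type A', 'Type B'],
    'year': [2023, 2023],
    'median_value': [1_000_000.0, 750_000.0],
    'mean_value': [1_400_000.0, 800_000.0],
})

# Long form: one row per (type, year, aggregate), as in type_long
long_form = pd.melt(wide, id_vars=['property_type', 'year'],
                    value_vars=['median_value', 'mean_value'],
                    var_name='agg_type', value_name='value')
long_form['agg_type'] = long_form['agg_type'].str.replace('_value', '', regex=False)

def toggle(frame: pd.DataFrame, choice: str) -> pd.DataFrame:
    """Rows shown when the radio button is set to `choice` ('median' or 'mean')."""
    return frame[frame['agg_type'] == choice]

print(toggle(long_form, 'mean'))
```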