# Interactive Visualization Tutorial with Altair

This tutorial will teach you how to create interactive visualizations using Altair, focusing on analyzing system performance data. We'll start with simpler examples using a small dataset, then progress to working with system performance metrics.

## Understanding Altair's Design Philosophy

Altair follows a declarative approach to visualization, which means:
- You describe *what* you want to show, not *how* to show it
- Visualizations are built by mapping data fields to visual properties (color, position, size)
- Interactivity can be added through selections and transformations

For example, instead of manually drawing points, you declare "I want a scatter plot where x is time, y is CPU usage, and color represents server status."

## Setup

First, install required packages:
```bash
pip install altair vega_datasets
```

In [None]:
import altair as alt
import pandas as pd
from vega_datasets import data

# Enable notebook rendering
alt.renderers.enable('default')

# Disable the default 5,000-row limit so larger datasets can be plotted
alt.data_transformers.enable('default', max_rows=None)

# Load example dataset
stocks = data.stocks()
print("Data Shape:", stocks.shape)
print("\nFirst few rows:")
print(stocks.head())

## Part 1: Understanding Basic Chart Components

Every Altair chart has these main components:
1. Data source (DataFrame)
2. Mark type (how to represent data)
3. Encodings (mapping data to visual properties)
4. Properties (chart size, titles, etc.)

Let's build a chart step by step:

In [None]:
# Start with basic chart
basic = alt.Chart(stocks)

# Add mark type
basic = basic.mark_line()

# Add encodings
basic = basic.encode(
    x='date:T',  # :T means temporal (time) data
    y='price:Q'  # :Q means quantitative (numeric) data
)

# Add properties
basic = basic.properties(
    width=600,
    height=300,
    title='Basic Line Chart'
)

# Display chart
basic

### Understanding Encodings

Encodings map data to visual properties. Common encodings include:
- `x`, `y`: Position
- `color`: Color of marks
- `size`: Size of marks
- `tooltip`: Information shown on hover

Data types in encodings:
- `:T` - Temporal (dates, times)
- `:Q` - Quantitative (numbers)
- `:N` - Nominal (categories)
- `:O` - Ordinal (ordered categories)

Let's enhance our chart with more encodings:

In [None]:
enhanced = alt.Chart(stocks).mark_line().encode(
    x='date:T',
    y='price:Q',
    color='symbol:N',  # Color by stock symbol
    tooltip=['date', 'price', 'symbol']  # Show these on hover
).properties(
    width=600,
    height=300,
    title='Enhanced Line Chart'
).interactive()  # .interactive() here enables both zooming and panning

enhanced

When you add `.interactive()` to a chart, you can:
- Zoom in by scrolling your mouse wheel or pinching on a trackpad
- Pan by clicking and dragging
- Double-click to reset the view

# Deep Dive into Interactive Visualization Techniques

Let's explore different ways to make our visualizations interactive in Altair. We'll use our system performance data to understand each technique and see how each one helps us analyze system behavior.

Altair provides three main ways to add interactivity:

-  **Basic Interaction** (zoom/pan)
   ```python
   chart.interactive()
   ```

-  **Selections** (brushing/highlighting)
   ```python
   selection = alt.selection_interval()
   chart.add_params(selection)
   ```

-  **Conditional Encoding** (changing properties based on selection)
   ```python
   color=alt.condition(selection, 'category:N', alt.value('gray'))
   ```


## 1. Simple Zooming and Panning

The simplest form of interactivity in Altair is adding zoom and pan capabilities. This is particularly useful when analyzing time series data, as it allows us to focus on specific time periods or events.

First, let's load and prepare our data:

In [None]:
# Load and prepare system data
df = pd.read_csv('system-1.csv')
df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')
df['memory_used_pct'] = (1 - df['sys-mem-available']/df['sys-mem-total']) * 100

# Sample data for better performance
df_sampled = df.sample(n=5000, random_state=42)

# Or aggregate by time periods (alternative approach)
df_hourly = df.set_index('datetime').resample('1h').mean()
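
If `system-1.csv` is not at hand, a synthetic stand-in with the same column names can be used to try out the cells that follow. This is only a sketch: the values are random, the one-minute sampling interval is an assumption, and treating `server-up` as a 0/1 flag is also an assumption rather than something the dataset guarantees.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Synthetic stand-in for system-1.csv: values are random and purely illustrative.
df = pd.DataFrame({
    'timestamp': 1_700_000_000 + 60 * np.arange(n),        # assumed: one reading per minute
    'load-15m': rng.gamma(shape=2.0, scale=2.0, size=n),
    'sys-mem-total': np.full(n, 16e9),
    'sys-mem-available': rng.uniform(2e9, 14e9, size=n),
    'sys-thermal': rng.normal(0.0, 1.0, size=n),
    'cpu-user': rng.uniform(0.0, 1.0, size=n),
    'server-up': rng.integers(0, 2, size=n),                # assumed: 0/1 flag
})

# Same derived columns as in the cell above.
df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')
df['memory_used_pct'] = (1 - df['sys-mem-available'] / df['sys-mem-total']) * 100

# Same sampling step as in the cell above.
df_sampled = df.sample(n=5000, random_state=42)
print(df_sampled[['datetime', 'load-15m', 'memory_used_pct']].head())
```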

In [None]:
# Let's create a basic zoomable time series plot
basic_zoom = alt.Chart(df_sampled).mark_line().encode(
    x='datetime:T',
    y='load-15m',
    tooltip=['datetime:T', 'load-15m']
).properties(
    width=800,
    height=400,
    title='System Load Over Time (Zoom & Pan Enabled)'
).interactive()  # This enables both zooming and panning

basic_zoom

## 2. Advanced Zoom with Multiple Metrics

Often, we want to see multiple metrics together while maintaining the ability to zoom. Let's create a more sophisticated zoomable visualization that shows both system load and memory usage.

In [None]:
# Create a chart with two metrics and coordinated zooming

# First line for system load
load_line = alt.Chart(df_sampled).mark_line(color='blue').encode(
    x='datetime:T',
    y=alt.Y('load-15m', title='System Load', scale=alt.Scale(domain=[0, 50])),
    tooltip=['datetime', 'load-15m']
)

# Second line for memory usage
memory_line = alt.Chart(df_sampled).mark_line(color='red').encode(
    x='datetime:T',
    y=alt.Y('memory_used_pct', title='Memory Usage %'),
    tooltip=['datetime', 'memory_used_pct']
)

# Combine the charts with a shared zoom
multi_zoom = alt.layer(
    load_line,
    memory_line
).resolve_scale(
    y='independent'  # Each line gets its own y-axis scale
).properties(
    width=800,
    height=400,
    title='System Metrics Over Time (Coordinated Zoom)'
).interactive()

multi_zoom

## 3. Brush Selection with Zoom

Sometimes we want both the ability to zoom and to select specific ranges of data. We can combine these interactions to create a powerful analysis tool.

In [None]:
# Create a selection brush
brush = alt.selection_interval(
    encodings=['x']  # Only select along x-axis
)

# Create the main chart with both zoom and brush selection
main_view = alt.Chart(df_sampled).mark_line().encode(
    x='datetime:T',
    y='load-15m',
    color=alt.condition(
        brush,
        alt.value('blue'),
        alt.value('gray')
    )
).properties(
    width=800,
    height=300
).add_params(brush).interactive()

# Create a detail view that shows selected data
detail_view = alt.Chart(df_sampled).mark_line(color='red').encode(
    x='datetime:T',
    y='memory_used_pct'
).transform_filter(
    brush  # Only show data within the brush selection
).properties(
    width=800,
    height=200
)

# Combine the views
brush_zoom = alt.vconcat(main_view, detail_view)

brush_zoom

## Multi-Metric Dashboard

Let's create a dashboard showing multiple metrics with linked views. We'll build it step by step:

In [None]:
# 1. Create time-based selection
time_brush = alt.selection_interval(
    encodings=['x']  # Only select along x-axis
)

# 2. Create base chart settings
base = alt.Chart(df_sampled).encode(
    x='datetime:T',
    tooltip=['datetime:T', 'load-15m', 'memory_used_pct', 'server-up']
).properties(width=800)

# 3. Create overview chart
overview = base.mark_line(color='gray').encode(
    y='load-15m:Q',
    color=alt.Color('server-up:Q', 
                   scale=alt.Scale(scheme='yellowgreenblue'))
).properties(
    height=100,
    title='System Load Overview (Select Time Range)'
).add_params(time_brush)

# 4. Create detail chart
detail = base.mark_line().encode(
    y='memory_used_pct:Q',
    color=alt.Color('server-up:Q', 
                   scale=alt.Scale(scheme='yellowgreenblue'))
).transform_filter(
    time_brush  # Filter by time selection
).properties(
    height=300,
    title='Memory Usage in Selected Period'
)

# 5. Combine charts
dashboard = alt.vconcat(
    overview,
    detail,
    title='System Performance Dashboard'
)  # Append .interactive() to enable zooming and panning

dashboard

## Understanding the Interaction Types

1. **Basic Zoom (`.interactive()`)**
   - Best for: Exploring time series data
   - Use when: You want to examine specific time periods in detail
   - Limitations: No data filtering or selection capabilities

2. **Brush Selection (`selection_interval()`)**
   - Best for: Selecting ranges of data for detailed analysis
   - Use when: You want to filter or highlight specific data points
   - Can be restricted to specific dimensions (x-axis only, y-axis only, or both)

3. **Combined Interactions**
   - Best for: Complex analysis requiring both overview and detail
   - Use when: You need to both zoom for detail and select data for comparison
   - Allows for the most flexible data exploration

## Tips for Working with Interactive Plots

1. **Performance Considerations**
   ```python
   # Sample data for smoother interactions
   df_sampled = df.sample(n=5000, random_state=42)
   
   # Or aggregate data
   df_hourly = df.set_index('datetime').resample('1h').mean()
   ```

2. **Useful Properties**
   ```python
   # Add these to your charts for better usability
   chart.properties(
       width=800,  # Width in pixels
       height=400,  # Height in pixels
       title='Descriptive Title',
       background='white'  # Background color
   )
   ```

3. **Tooltips for Context**
   ```python
   # Add informative tooltips
   tooltip=['datetime:T', 
           alt.Tooltip('load-15m', format='.2f'),
           alt.Tooltip('memory_used_pct', title='Memory %', format='.1f')]
   ```

## Key Concepts for Assignment

1. **Chart Types**
   - `mark_line()`: For time series (load, CPU, memory over time)
   - `mark_point()`: For scatter plots (relationships between metrics)
   - `mark_bar()`: For histograms (distribution analysis)

2. **Interactivity Features**
   - `.interactive()`: Basic zoom/pan
   - `selection_interval()`: Area selection
   - `transform_filter()`: Filter by selection

3. **Data Types**
   - `:T` for time (datetime)
   - `:Q` for quantitative (load, memory, CPU)
   - `:N` for nominal (categories)

4. **Server Status Visualization**
   - Use `color` encoding, for example with the `yellowgreenblue` scheme
   - Include in tooltips
   - Consider in filters

5. **Performance Tips**
   - Sample data if too large
   - Use appropriate mark types
   - Consider aggregation for large datasets

# System Performance Additional Visualization Examples

This notebook demonstrates interactive visualizations using the system performance dataset. The examples below show ways to visualize the data that differ from what the assignment requires.

## Example 1: Server Status Timeline

This visualization shows server status changes over time as a continuous band, with color indicating status and tooltips for details.

In [None]:
status_timeline = alt.Chart(df_sampled).mark_rect().encode(
    x='datetime:T',
    color=alt.Color('server-up:Q', 
                   scale=alt.Scale(scheme='redyellowgreen'),
                   title='Server Status'),
    tooltip=['datetime:T', 'server-up', 'load-15m']
).properties(
    width=800,
    height=100,
    title='Server Status Timeline'
).interactive()

status_timeline

## Example 2: Temperature and Load Analysis

This visualization combines temperature change rate with system load, allowing us to explore how they relate over time.


In [None]:
# Base chart settings
base = alt.Chart(df_sampled).encode(
    x='datetime:T',
    tooltip=['datetime:T', 'sys-thermal', 'load-15m', 'server-up']
)

# Temperature line
temp_line = base.mark_line(color='orange').encode(
    y=alt.Y('sys-thermal:Q', title='Temperature Change Rate')
)

# Load line using dual axis
load_line = base.mark_line(color='blue', opacity=0.5).encode(
    y=alt.Y('load-15m:Q', title='System Load')
)

# Combine with layering
combined = alt.layer(temp_line, load_line).resolve_scale(
    y='independent'
).properties(
    width=800,
    height=300,
    title='Temperature Change Rate vs System Load'
).interactive()  # Enable zoom and pan on the layered chart

combined

## Example 3: Daily Resource Pattern Analysis

This visualization shows how resource usage varies by hour of day, helping identify daily patterns.

In [None]:
# Add hour information (the datetime column is already a datetime type)
df_sampled['hour'] = df_sampled['datetime'].dt.hour

# Create hour-based visualization
hour_chart = alt.Chart(df_sampled).mark_circle(size=60).encode(
    alt.X('hour:O', 
          axis=alt.Axis(title='Hour of Day'),
          scale=alt.Scale(domain=list(range(24)))),
    alt.Y('cpu-user:Q',
          axis=alt.Axis(title='CPU Time Rate')),
    color=alt.Color('load-15m:Q', 
                    scale=alt.Scale(scheme='viridis'),
                    title='System Load'),
    size=alt.Size('memory_used_pct:Q',
                  scale=alt.Scale(range=[20, 200]),
                  title='Memory Usage %'),
    tooltip=['hour:O', 'cpu-user:Q', 
             'load-15m:Q', 'memory_used_pct:Q']
).properties(
    width=800,
    height=400,
    title='Resource Usage Patterns by Hour'
).interactive()

hour_chart

## Example 4: Resource Correlation Dashboard

An interactive dashboard showing relationships between different metrics.

In [None]:
# Create brush selection
brush = alt.selection_interval()

# Base scatter plot
base = alt.Chart(df_sampled).encode(
    tooltip=['datetime:T', 'load-15m', 
             'memory_used_pct', 'cpu-user']
).properties(width=300, height=300)

# Create three different views
scatter1 = base.mark_point(opacity=0.5).encode(
    x='cpu-user:Q',
    y='load-15m:Q',
    color=alt.condition(brush, 'server-up:Q', 
                       alt.value('lightgray'))
).add_params(brush)

scatter2 = base.mark_point(opacity=0.5).encode(
    x='memory_used_pct:Q',
    y='load-15m:Q',
    color=alt.condition(brush, 'server-up:Q', 
                       alt.value('lightgray'))
)

scatter3 = base.mark_point(opacity=0.5).encode(
    x='cpu-user:Q',
    y='memory_used_pct:Q',
    color=alt.condition(brush, 'server-up:Q', 
                       alt.value('lightgray'))
)

# Combine all views
dashboard = alt.vconcat(
    scatter1,
    alt.hconcat(scatter2, scatter3)
).resolve_scale(
    color='shared'
).properties(
    title=alt.Title('Resource Relationships Dashboard',
                   anchor='middle')
)

dashboard