# Advanced Visualization with Plotly Express (px)

## Interactive dimensionality reduction with t-SNE

Matplotlib gives precise control over static figures, but **it is not primarily designed for interactive, in-browser exploration** of complex, high-dimensional embeddings.
For that purpose, we use **Plotly Express (`px`)**, which provides:

* native interactivity (hover, zoom, rotation),
* WebGL-based (GPU-accelerated) rendering in the browser for 3D plots and large scatter plots,
* tight integration with pandas DataFrames,
* exportable interactive figures (HTML).

## What is Plotly Express (`px`)?

**Plotly Express** is a **high-level plotting API** built on top of Plotly.

Conceptually, it is closer to **seaborn** than to matplotlib:

* one function call → one complete figure,
* semantic mapping via column names,
* automatic legends, color scales, and tooltips.

In [1]:
import plotly.express as px

## Dataset: handwritten digits

We reuse the handwritten digits dataset that ships with scikit-learn (`load_digits`):

* 1797 samples
* 64 features per sample (8×8 image)
* label: digit 0–9
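
The figures in the list above can be checked directly on the loaded dataset. This is only a quick sanity check; the variable name `digits_check` is chosen here to avoid clashing with the notebook's own `digits`.

```python
from sklearn.datasets import load_digits

digits_check = load_digits()

print(digits_check.data.shape)            # flattened pixel features, one row per sample
print(digits_check.images.shape)          # the same samples as 8×8 arrays
print(sorted(set(digits_check.target)))   # distinct class labels

# Each row of `data` is the corresponding image flattened to 64 values.
same = (digits_check.images.reshape(len(digits_check.images), -1) == digits_check.data).all()
print(same)
```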

In [2]:
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import pandas as pd

digits = load_digits()
X = digits.data
y = digits.target

## Step 1: t-SNE parameters

In [3]:
tsne_2d = TSNE(
    n_components=2,      # Output dimensionality (2D embedding)
    perplexity=30,       # Roughly: effective number of nearest neighbors per point
    learning_rate=200,   # Step size in optimization
    random_state=42      # Ensures reproducibility
)

### Parameter intuition

* **`n_components`**

  * 2 → easier to read
  * 3 → requires interaction
* **`perplexity`**

  * typical range: 5–50
  * too small → fragmented clusters
  * too large → over-smoothed structure
* **`learning_rate`**

  * too low → points compressed into a dense cloud with few outliers
  * too high → points spread into a "ball", roughly equidistant from their neighbors
* **`random_state`**

  * t-SNE is stochastic → results differ between runs unless the seed is fixed

## Step 2: Compute embedding

In [4]:
X_2d = tsne_2d.fit_transform(X)

## Step 3: Prepare data for Plotly

Plotly Express **works best with a tidy (long-form) DataFrame**: one row per sample, one column per variable.

In [5]:
df_2d = pd.DataFrame(
    X_2d,
    columns=["tsne_1", "tsne_2"]
)

# Labels as strings → categorical color scale
df_2d["digit"] = y.astype(str)

## Step 4: Interactive 2D visualization

In [6]:
fig_2d = px.scatter(
    df_2d,
    x="tsne_1",
    y="tsne_2",
    color="digit",           # Semantic color mapping
    title="t-SNE projection (2D)",
    
    # Control what appears on hover
    hover_data={
        "digit": True,       # Show class label
        "tsne_1": False,     # Hide raw coordinates
        "tsne_2": False
    },
    
    width=900,
    height=700
)

fig_2d.show()

### Why this is technically better than matplotlib here

* Hover replaces static annotations
* Zoom allows inspection of dense regions
* No manual event handling (callbacks)
* Handles many points well: `px.scatter` can switch to WebGL rendering (`render_mode`) for large point counts

---

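## Step 5: t-SNE in 3D (same data, new embedding)

t-SNE is refit with `n_components=3`. The 3D result comes from a separate optimization; it is not the 2D embedding with an extra axis added.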

In [7]:
tsne_3d = TSNE(
    n_components=3,
    perplexity=30,
    learning_rate=200,
    random_state=42
)

X_3d = tsne_3d.fit_transform(X)

df_3d = pd.DataFrame(
    X_3d,
    columns=["x", "y", "z"]
)

df_3d["digit"] = y.astype(str)

## Step 6: Interactive 3D scatter plot

In [8]:
fig_3d = px.scatter_3d(
    df_3d,
    x="x",
    y="y",
    z="z",
    color="digit",
    title="t-SNE projection (3D)",
    
    hover_data={
        "digit": True,
        "x": False,
        "y": False,
        "z": False
    },
    
    width=1000,
    height=800
)

fig_3d.show()

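## Exporting interactive results

`write_html` saves the figure as a standalone HTML file that keeps hover, zoom, and rotation and opens in any modern browser.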

In [9]:
fig_3d.write_html("tsne_digits_3d.html")