# Visualisation

MDS and t-SNE

	• Stress Function and Distance Preservation (MDS)
	• KL Divergence and Perplexity (t-SNE)
	• Python: Visualization of High-Dimensional Data


⸻

1. Stress Function and Distance Preservation (MDS)

Multidimensional Scaling (MDS) embeds high-dimensional data in a low-dimensional space (typically 2-D or 3-D for plotting) so that the pairwise distances between points are preserved as closely as possible.

Let $d_{ij}$ be the original (high-dimensional) distance between points $i$ and $j$, and let $\delta_{ij}$ be the corresponding distance in the low-dimensional embedding.

A common normalized form of the stress function is:

$$
\text{Stress} = \sqrt{ \frac{ \sum_{i < j} (d_{ij} - \delta_{ij})^2 }{ \sum_{i < j} d_{ij}^2 } }
$$
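The stress above can be computed directly from the two sets of pairwise distances. The sketch below assumes Euclidean distances and uses SciPy's `pdist`, which returns each pair $i < j$ exactly once, matching the sums in the formula. The function name `normalized_stress` is illustrative, not a library API.

```python
import numpy as np
from scipy.spatial.distance import pdist


def normalized_stress(X_high: np.ndarray, Y_low: np.ndarray) -> float:
    """Normalized stress between original data and an embedding.

    d_ij are distances in the original space, delta_ij distances in the
    embedding; pdist lists every pair i < j once, as in the formula.
    """
    d = pdist(X_high)       # original distances d_ij
    delta = pdist(Y_low)    # embedding distances delta_ij
    return float(np.sqrt(np.sum((d - delta) ** 2) / np.sum(d ** 2)))


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))

# A rotation preserves every pairwise distance, so the stress is ~0.
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
print(normalized_stress(X, X @ Q))

# Dropping two coordinates shrinks distances, so the stress is positive.
print(normalized_stress(X, X[:, :2]))

# Collapsing every point onto the origin gives the maximum value of 1.
print(normalized_stress(X, np.zeros((50, 2))))
```

Because the denominator only involves the original distances, the score is scale-free with respect to the input data, which makes it convenient for comparing embeddings of the same dataset.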

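MDS searches for the embedding coordinates that minimize this stress. Because the denominator depends only on the original distances, minimizing the normalized stress is equivalent to minimizing the raw stress $\sum_{i<j} (d_{ij} - \delta_{ij})^2$. The objective is non-convex, so the result depends on the initial configuration; scikit-learn's `MDS` minimizes it iteratively with the SMACOF algorithm and can run several random starts.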
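To make the minimization concrete, here is a minimal sketch of SMACOF (Scaling by MAjorizing a COmplicated Function), the iterative scheme scikit-learn's `MDS` is based on. It assumes unit weights, a single random start and a fixed tolerance; it is a simplified illustration rather than scikit-learn's implementation, and the helper names `raw_stress` and `smacof` are hypothetical. Each step applies the Guttman transform $Y \leftarrow \frac{1}{n} B(Y)\,Y$, which never increases the raw stress.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def raw_stress(D: np.ndarray, Y: np.ndarray) -> float:
    """Raw stress sum_{i<j} (d_ij - delta_ij)^2 for target distances D."""
    Delta = squareform(pdist(Y))
    return 0.5 * float(np.sum((D - Delta) ** 2))  # full matrix counts each pair twice


def smacof(D, n_components=2, n_iter=300, tol=1e-9, seed=0):
    """Minimal metric MDS by SMACOF with unit weights and one random start."""
    n = D.shape[0]
    Y = np.random.default_rng(seed).normal(size=(n, n_components))
    stress = np.inf
    for _ in range(n_iter):
        Delta = squareform(pdist(Y))                   # current embedding distances
        with np.errstate(divide="ignore", invalid="ignore"):
            B = -np.where(Delta > 0, D / Delta, 0.0)   # b_ij = -d_ij / delta_ij
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))            # rows of B sum to zero
        Y = B @ Y / n                                  # Guttman transform
        new_stress = raw_stress(D, Y)
        converged = stress - new_stress < tol
        stress = new_stress
        if converged:
            break
    return Y, stress


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
D = squareform(pdist(X))                                # original distances d_ij
Y, stress = smacof(D)
Y_start = np.random.default_rng(0).normal(size=(40, 2))  # smacof's starting point
print(f"stress at start: {raw_stress(D, Y_start):.2f}, after SMACOF: {stress:.2f}")
```

Running it from several seeds and keeping the lowest-stress result mirrors the multiple-restart idea used to cope with local minima.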

⸻

2. KL Divergence and Perplexity (t-SNE)

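t-Distributed Stochastic Neighbor Embedding (t-SNE) focuses on preserving local neighborhoods: it converts pairwise distances into probabilities of points being neighbors, both in the original space and in the embedding, and moves the embedded points until the two probability distributions match.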
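	•	In the high-dimensional space, the similarity of $x_j$ to $x_i$ is the conditional probability:
$$
p_{j|i} = \frac{\exp\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)}, \qquad p_{i|i} = 0
$$
The bandwidth $\sigma_i$ is chosen per point, by binary search, so that the perplexity of $P_i$ equals a user-specified value:
$$
\mathrm{Perp}(P_i) = 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j|i} \log_2 p_{j|i}
$$
Perplexity can be read as a smooth effective number of neighbors (typical values lie between 5 and 50). The conditional probabilities are then symmetrized into joint probabilities $p_{ij} = (p_{j|i} + p_{i|j}) / 2n$, where $n$ is the number of points.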
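	•	In the low-dimensional space, similarity is modeled with a Student-t distribution with one degree of freedom, whose heavy tails let dissimilar points sit far apart in the map (mitigating the "crowding problem"):
$$
q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}
$$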
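	•	The cost function, minimized by gradient descent over the $y_i$, is the KL divergence between $P$ and $Q$:
$$
\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \left( \frac{p_{ij}}{q_{ij}} \right)
$$
The divergence is asymmetric: placing true neighbors (large $p_{ij}$) far apart (small $q_{ij}$) is heavily penalized, which is why t-SNE emphasizes local structure over global distances.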
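The three quantities above ($P$ with perplexity calibration, $Q$, and the KL cost) can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: squared Euclidean distances, a plain log-scale bisection for each $\sigma_i$, and no gradient-descent step. scikit-learn's `TSNE` computes these internally with its own optimized code, so the helper names here (`calibrate_row`, `joint_p`, `joint_q`, `kl_divergence`) are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def conditional_p(sq_dists_i, sigma):
    """p_{j|i} for one point i, given squared distances to every j != i."""
    logits = -sq_dists_i / (2.0 * sigma ** 2)
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits)
    return p / p.sum()


def calibrate_row(sq_dists_i, perplexity, n_steps=100):
    """Bisect sigma_i (on a log scale) until 2**H(P_i) matches the perplexity."""
    lo, hi = 1e-10, 1e10
    for _ in range(n_steps):
        sigma = np.sqrt(lo * hi)
        p = conditional_p(sq_dists_i, sigma)
        entropy = -np.sum(p * np.log2(np.maximum(p, 1e-300)))
        if 2.0 ** entropy > perplexity:
            hi = sigma                           # too spread out: shrink sigma
        else:
            lo = sigma                           # too peaked: grow sigma
    return p


def joint_p(X, perplexity=30.0):
    """Symmetrized p_ij = (p_{j|i} + p_{i|j}) / 2n."""
    n = X.shape[0]
    S = squareform(pdist(X, "sqeuclidean"))
    P_cond = np.zeros((n, n))                    # row i, column j holds p_{j|i}
    for i in range(n):
        others = np.arange(n) != i
        P_cond[i, others] = calibrate_row(S[i, others], perplexity)
    return (P_cond + P_cond.T) / (2.0 * n)


def joint_q(Y):
    """Student-t (one degree of freedom) similarities q_ij in the embedding."""
    num = 1.0 / (1.0 + squareform(pdist(Y, "sqeuclidean")))
    np.fill_diagonal(num, 0.0)                   # exclude k == l terms
    return num / num.sum()


def kl_divergence(P, Q):
    """KL(P || Q) over pairs with p_ij > 0 (the diagonal is excluded)."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
P = joint_p(X, perplexity=15.0)
Y = rng.normal(scale=1e-4, size=(60, 2))         # small random start, as t-SNE uses
print(P.sum(), kl_divergence(P, joint_q(Y)))
```

A full t-SNE would repeatedly move `Y` along the gradient of this KL cost; the sketch stops at evaluating the objective.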

⸻

3. Python: Visualization of High-Dimensional Data (Iris Dataset)

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.manifold import MDS, TSNE

# Load the Iris data: 150 samples, 4 features, 3 classes
iris = load_iris()
X = iris.data
y = iris.target
labels = iris.target_names

# MDS: metric embedding that minimizes stress on Euclidean distances
mds = MDS(n_components=2, dissimilarity='euclidean', random_state=42)
X_mds = mds.fit_transform(X)

# t-SNE: perplexity must be smaller than the number of samples
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X)

# Plotting function: one scatter series per class, labelled by species
def plot_embedding(X_embedded, title):
    plt.figure(figsize=(6, 5))
    for i, name in enumerate(labels):
        plt.scatter(X_embedded[y == i, 0], X_embedded[y == i, 1], label=name)
    plt.title(title)
    plt.xlabel("Dim 1")
    plt.ylabel("Dim 2")
    plt.legend()
    plt.grid(True)
    plt.show()

# Visualizations
plot_embedding(X_mds, "MDS on Iris Dataset")
plot_embedding(X_tsne, "t-SNE on Iris Dataset")
```
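The plots are a qualitative comparison. A rough quantitative check is sketched below, assuming the same settings as above: it reports the normalized stress from section 1 and scikit-learn's `trustworthiness` score (in $[0, 1]$, higher means local neighborhoods are better preserved). Since MDS optimizes stress directly while t-SNE coordinates live on an arbitrary scale, one would expect t-SNE's stress to be much larger; trustworthiness is the fairer yardstick for its local structure. The choice `n_neighbors=5` is an arbitrary assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris
from sklearn.manifold import MDS, TSNE, trustworthiness

X = load_iris().data

embeddings = {
    "MDS": MDS(n_components=2, random_state=42).fit_transform(X),
    "t-SNE": TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X),
}

d_high = pdist(X)                    # original pairwise distances
results = {}
for name, Y in embeddings.items():
    d_low = pdist(Y)                 # embedding pairwise distances
    stress = np.sqrt(np.sum((d_high - d_low) ** 2) / np.sum(d_high ** 2))
    trust = trustworthiness(X, Y, n_neighbors=5)
    results[name] = (stress, trust)
    print(f"{name:6s} normalized stress = {stress:.3f}  trustworthiness@5 = {trust:.3f}")
```

Exact values vary with the scikit-learn version and random seed, so treat them as relative comparisons rather than fixed results.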



⸻
