# **t-SNE (t-distributed Stochastic Neighbor Embedding)**

t-SNE is a powerful and widely used non-linear dimensionality reduction technique developed by Laurens van der Maaten and Geoffrey Hinton in 2008. It is designed specifically for visualizing high-dimensional data in a low-dimensional space, typically 2D or 3D. It has become popular in machine learning, bioinformatics, and data science because it can reveal clusters and other structure that is hard to see in the original dimensions.

t-SNE builds on Stochastic Neighbor Embedding (SNE), which tries to preserve the local structure of the data in the low-dimensional representation. Compared with SNE, t-SNE uses a symmetrized cost function with simpler gradients and a heavy-tailed Student-t distribution (instead of a Gaussian) to measure similarities in the low-dimensional space. These changes make it easier to optimize and reduce the tendency of points to crowd together in the center of the map.

The basic idea behind t-SNE is to model the similarity between data points in the high-dimensional space with a probability distribution over the neighbors of each point. The algorithm then looks for a lower-dimensional representation whose pairwise similarities match these as closely as possible. The mismatch between the high-dimensional and low-dimensional similarity distributions is measured by a cost function, the Kullback-Leibler (KL) divergence, which is minimized with gradient descent.
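To make these steps concrete, here is a minimal NumPy sketch of exact t-SNE. It assumes small inputs, because it builds dense n × n matrices. The function names `conditional_probabilities`, `student_t_affinities`, and `tsne_sketch` are hypothetical and not part of any library. The optimizer settings are illustrative choices loosely following the original paper and scikit-learn's defaults: early exaggeration of 4 for the first 100 iterations, momentum 0.5 then 0.8, and a learning rate that scales with n. It also omits the adaptive per-parameter gains that production implementations use.

```python
import numpy as np


def conditional_probabilities(X, perplexity=30.0, tol=1e-5, max_steps=100):
    """Return the n x n matrix of p_{j|i}, with p_{i|i} = 0.

    Each row uses its own Gaussian bandwidth, found by binary search on
    beta = 1 / (2 * sigma_i**2) so that the row's perplexity matches the target.
    """
    n = X.shape[0]
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    target_entropy = np.log(perplexity)  # perplexity = exp(entropy in nats)
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(sq_dist[i], i)  # squared distances to all other points
        beta, beta_lo, beta_hi = 1.0, 0.0, np.inf
        for _ in range(max_steps):
            w = np.exp(-(d - d.min()) * beta)  # shifting by d.min() avoids underflow
            p = w / w.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-12)))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:  # too spread out: narrow the Gaussian
                beta_lo = beta
                beta = beta * 2.0 if np.isinf(beta_hi) else (beta + beta_hi) / 2.0
            else:  # too peaked: widen the Gaussian
                beta_hi = beta
                beta = (beta + beta_lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P


def student_t_affinities(Y):
    """Low-dimensional joint probabilities q_ij from a Student-t kernel (1 dof)."""
    diff = Y[:, None, :] - Y[None, :, :]  # diff[i, j] = y_i - y_j
    num = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(num, 0.0)  # q_ii = 0
    Q = np.maximum(num / num.sum(), 1e-12)
    return diff, num, Q


def tsne_sketch(X, n_components=2, perplexity=30.0, n_iter=500,
                early_exaggeration=4.0, seed=0):
    n = X.shape[0]
    P = conditional_probabilities(X, perplexity)
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)  # symmetric p_ij, sums to ~1
    learning_rate = n / (4.0 * early_exaggeration)  # scales with n (assumption)
    rng = np.random.default_rng(seed)
    Y = 1e-4 * rng.standard_normal((n, n_components))  # small random start
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        exaggeration = early_exaggeration if it < 100 else 1.0
        momentum = 0.5 if it < 250 else 0.8
        diff, num, Q = student_t_affinities(Y)
        # dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2)
        weights = (exaggeration * P - Q) * num
        grad = 4.0 * np.einsum("ij,ijk->ik", weights, diff)
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    _, _, Q = student_t_affinities(Y)
    kl = np.sum(P * np.log(P / Q))  # final cost KL(P || Q)
    return Y, kl


# Usage: three well-separated Gaussian blobs of 20 points each in 5D.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 5, [10.0] * 5, [-10.0] * 5])
X = np.vstack([c + rng.standard_normal((20, 5)) for c in centers])
Y, kl = tsne_sketch(X, perplexity=10.0)
print(Y.shape, round(kl, 3))
```

The binary search is what makes the Gaussian bandwidth adapt to local density: points in sparse regions end up with a wider kernel than points in dense regions, so every point has roughly the same effective number of neighbors.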

The main advantage of t-SNE is that, as a non-linear algorithm, it can capture non-linear relationships between data points that linear methods such as PCA miss. Its disadvantages are that it is computationally expensive on large datasets and that its results depend on a good choice of hyperparameters.

## **How does t-SNE work?** 
t-SNE is a non-linear dimensionality reduction algorithm that finds patterns in the data based on the similarity of data points in feature space. The similarity of point B to point A is calculated as the conditional probability that A would pick B as its neighbor if neighbors were picked in proportion to their probability density under a Gaussian centered at A.

It then places the points in the lower-dimensional space so as to minimize the difference between these conditional probabilities (similarities) in the high-dimensional and low-dimensional spaces, producing a representation that preserves each point's neighborhood as faithfully as possible.
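For reference, these quantities can be written out as follows, using the notation of van der Maaten and Hinton's 2008 paper as we understand it. Here $x_i$ are the high-dimensional points, $y_i$ their low-dimensional counterparts, $n$ the number of points, and $\sigma_i$ a per-point Gaussian bandwidth chosen so that the conditional distribution $P_i$ has a user-specified perplexity.

$$
\begin{aligned}
p_{j\mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}, \qquad p_{i\mid i} = 0, \\
\mathrm{Perp}(P_i) &= 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j\mid i} \log_2 p_{j\mid i}, \\
p_{ij} &= \frac{p_{j\mid i} + p_{i\mid j}}{2n}, \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}, \\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}, \\
\frac{\partial C}{\partial y_i} &= 4 \sum_{j} \left(p_{ij} - q_{ij}\right)\left(y_i - y_j\right)\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}.
\end{aligned}
$$

The Gaussian in the first line turns distances into the conditional probabilities described above. The heavy-tailed Student-t kernel (one degree of freedom) in $q_{ij}$ is what distinguishes t-SNE from the original SNE.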


## **What is the difference between PCA and t-SNE?**

PCA and t-SNE are both unsupervised algorithms used to reduce the dimensionality of a dataset, but they work very differently. PCA is a deterministic, linear method, whereas t-SNE is a randomized, non-linear method that maps high-dimensional data to a lower-dimensional space. The output of t-SNE is generally used for visualization only.
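A quick way to see the deterministic versus randomized distinction is to fit each method twice. This sketch uses scikit-learn on a synthetic `make_blobs` dataset, which is an illustrative choice and not the dataset used later in this notebook. It sets `init="random"` so that the t-SNE seed controls the starting layout.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=200, n_features=10, centers=3, random_state=0)

# PCA: fitting twice on the same data gives the same projection.
pca_a = PCA(n_components=2).fit_transform(X)
pca_b = PCA(n_components=2).fit_transform(X)
print("PCA runs identical:", np.allclose(pca_a, pca_b))

# t-SNE: random initialization, so different seeds give different embeddings.
tsne_a = TSNE(n_components=2, init="random", random_state=0).fit_transform(X)
tsne_b = TSNE(n_components=2, init="random", random_state=1).fit_transform(X)
print("t-SNE runs identical:", np.allclose(tsne_a, tsne_b))
```

Fixing `random_state` makes a single t-SNE run reproducible, but it does not make the method deterministic in the sense PCA is.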

Another difference is how the two algorithms react to outliers, because their objectives differ. PCA tries to preserve as much variance in the data as possible, so a few extreme points can dominate its principal components. t-SNE instead tries to retain the local structure of the dataset, which makes it less affected by outliers.
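The outlier sensitivity of PCA can be illustrated with a toy 2D example using hypothetical data. Most of the points vary along the x-axis, and a single extreme point is then added far out on the y-axis.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 points spread mostly along x (std 5) with little spread along y (std 0.5).
X = np.column_stack([rng.normal(0, 5, 200), rng.normal(0, 0.5, 200)])
clean_pc1 = PCA(n_components=1).fit(X).components_[0]

# Add one extreme outlier far out on the y-axis.
X_out = np.vstack([X, [0.0, 200.0]])
outlier_pc1 = PCA(n_components=1).fit(X_out).components_[0]

print("PC1 without outlier:", np.round(clean_pc1, 3))
print("PC1 with one outlier:", np.round(outlier_pc1, 3))
```

Without the outlier, the first principal component points along x. With it, the single point contributes enough variance to swing the component almost entirely onto y.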



**Key Features of t-SNE:**

1. **Non-linear Dimensionality Reduction:** Unlike linear methods such as PCA, t-SNE can capture non-linear relationships in the data, making it particularly effective for complex datasets.

2. **Preservation of Local Structure:** t-SNE focuses on maintaining the relationships between nearby points in the high-dimensional space, allowing it to reveal clusters and patterns effectively.

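3. **Adaptive to Different Scales:** Each point gets its own Gaussian bandwidth, chosen so that its neighborhood has the user-specified perplexity. Dense and sparse regions of the data are therefore treated comparably, which can help reveal both fine and coarse structures at the same time.

4. **Probabilistic Approach:** t-SNE represents similarities between data points as probabilities, using a Gaussian distribution in the high-dimensional space and a heavy-tailed Student-t distribution in the low-dimensional space. The heavy tails let moderately dissimilar points sit farther apart in the embedding, which counteracts the "crowding problem" and helps produce well-separated clusters.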
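In the low-dimensional space, t-SNE measures similarity with a heavy-tailed Student-t kernel (one degree of freedom) instead of a Gaussian. The effect of the heavy tails can be seen by comparing the two kernels directly. This sketch uses an unnormalized Gaussian `exp(-d**2)`, which corresponds to σ² = 1/2. That bandwidth is an arbitrary choice for illustration.

```python
import numpy as np

# Unnormalized similarity as a function of the distance d between two points.
d = np.array([0.5, 1.0, 2.0, 3.0])
gaussian = np.exp(-d ** 2)          # Gaussian kernel with sigma^2 = 1/2 (illustrative)
student_t = 1.0 / (1.0 + d ** 2)    # Student-t kernel with one degree of freedom

# Distance at which each kernel falls to a given similarity level s.
s = np.array([0.5, 0.1, 0.01])
d_gaussian = np.sqrt(-np.log(s))      # solves exp(-d^2) = s
d_student_t = np.sqrt(1.0 / s - 1.0)  # solves 1 / (1 + d^2) = s

print("similarity at d:", np.round(gaussian, 4), np.round(student_t, 4))
print("distance for s: ", np.round(d_gaussian, 2), np.round(d_student_t, 2))
```

At s = 0.01, the Student-t kernel reaches that similarity at a distance of about 9.95, compared with about 2.15 for the Gaussian. Pairs that are only moderately similar can therefore be placed much farther apart in the embedding.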

**Parameter Details of t-SNE:**

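1. **Perplexity:** A smooth measure of the effective number of neighbors each point considers. It is related to the number of nearest neighbors used in other manifold learning algorithms and balances attention between local and global aspects of the data. Larger datasets usually require a larger perplexity. Typical values range from 5 to 50, and the value should be smaller than the number of samples.

2. **Learning Rate (eta):** The learning rate for t-SNE is usually in the range of 10 to 1000. If it is too high, the data may look like a 'ball' with every point approximately equidistant from its nearest neighbors. If it is too low, most points may look compressed in a dense cloud with few outliers, and the optimization can get stuck in a poor local minimum. Recent versions of scikit-learn default to `learning_rate="auto"`, which sets it to max(N / early_exaggeration / 4, 50).

3. **Number of Iterations:** The maximum number of optimization iterations. More iterations can improve convergence but take longer to compute. In scikit-learn this parameter is `max_iter` (named `n_iter` before version 1.5). Its default is 1000, and it should be at least 250.

4. **Initialization:** The initial positions of the points in the low-dimensional space. Common options are 'random' or 'pca'. PCA initialization is usually more globally stable and is the default in recent versions of scikit-learn.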

5. **Metric:** The metric to use when calculating distance between instances in a feature array. Default is 'euclidean'.

**Applications of t-SNE:**

- **Visualization of High-dimensional Data:** t-SNE is extensively used to create 2D or 3D visualizations of complex datasets, enabling researchers and analysts to gain insights that might be difficult to obtain through other means.

- **Image and Text Analysis:** In computer vision and natural language processing, t-SNE is often used to visualize relationships between images or documents in a lower-dimensional space.

- **Bioinformatics:** t-SNE has found applications in visualizing gene expression data, helping researchers identify patterns and relationships in complex genomic datasets.

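- **Anomaly Detection:** By revealing clusters and patterns, t-SNE can help spot points that sit apart from every cluster, which can then be examined as candidate outliers or anomalies. Because t-SNE distorts distances, such visual hints should be confirmed in the original feature space.

- **Feature Engineering:** While primarily used for visualization, t-SNE embeddings can sometimes be used as features for downstream machine learning tasks. However, standard t-SNE learns no mapping that can be applied to new data. For example, scikit-learn's `TSNE` provides `fit_transform` but no `transform` method.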

**Limitations and Considerations:**

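1. **Computational Complexity:** The exact algorithm compares every pair of points, so its time and memory grow quadratically with the number of samples. This makes it expensive for large datasets. Approximations address the issue, such as the Barnes-Hut variant (O(N log N), the default `method` in scikit-learn's `TSNE`) and FFT-accelerated interpolation (FIt-SNE).

2. **Non-deterministic:** The cost function is non-convex and the algorithm typically starts from a random initialization. As a result, different runs can produce different results unless the random seed is fixed.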

3. **Difficulty in Interpreting Global Structure:** While t-SNE excels at preserving local structure, it may distort global relationships. Distances and densities in the t-SNE plot should be interpreted cautiously.

4. **Sensitivity to Hyperparameters:** The results of t-SNE can be sensitive to the choice of perplexity and number of iterations, requiring careful tuning for optimal results.

5. **Curse of Intrinsic Dimensionality:** t-SNE may struggle with very high-dimensional data where the intrinsic dimensionality is also high.
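A back-of-the-envelope estimate shows why the exact algorithm struggles with large datasets. It assumes the exact method keeps dense n × n float64 matrices (pairwise distances, P, and Q). The helper name `pairwise_matrix_gb` and the sample sizes are hypothetical and chosen for illustration, not taken from a benchmark.

```python
def pairwise_matrix_gb(n_samples, bytes_per_value=8):
    """Memory in GB (1 GB = 1e9 bytes) for one dense n x n matrix of float64 values."""
    return n_samples ** 2 * bytes_per_value / 1e9


for n in (1_000, 10_000, 50_000):
    print(f"n={n}: {pairwise_matrix_gb(n):.3f} GB per dense n x n matrix")
```

This prints 0.008 GB, 0.8 GB, and 20 GB per matrix respectively, and exact t-SNE needs several such matrices. Barnes-Hut avoids this cost by keeping only a sparse set of nearest neighbors for P.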

Despite these limitations, t-SNE remains a valuable tool in the data scientist's toolkit, offering unique insights into complex, high-dimensional datasets. Its ability to reveal hidden structures and patterns makes it an essential technique for exploratory data analysis and visualization in many fields of study.

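```python
import plotly.express as px
from sklearn.datasets import make_classification

# Synthetic 3-class dataset with 6 features: 2 informative, 2 redundant
# (linear combinations of the informative ones), and 2 noise features.
X, y = make_classification(
    n_features=6,
    n_classes=3,
    n_samples=1500,
    n_informative=2,
    random_state=5,
    n_clusters_per_class=1,
)

# Plot the first three of the six features in 3D, colored by class label.
fig = px.scatter_3d(x=X[:, 0], y=X[:, 1], z=X[:, 2], color=y, opacity=0.8)
fig.show()
```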

```python
from sklearn.decomposition import PCA

# Linear projection onto the two directions of maximum variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
```

```python
fig = px.scatter(x=X_pca[:, 0], y=X_pca[:, 1], color=y)
fig.update_layout(
    title="PCA visualization of Custom Classification dataset",
    xaxis_title="First Principal Component",
    yaxis_title="Second Principal Component",
)
fig.show()
```

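```python
from sklearn.manifold import TSNE

# Non-linear 2D embedding; fixing random_state makes the run reproducible.
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)
tsne.kl_divergence_  # final KL divergence between P and Q
```

```
1.1273040771484375
```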

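```python
import numpy as np

# Perplexity values to try (all smaller than the 1500 samples).
perplexity = np.arange(50, 1000, 50)
divergence = []

for perp in perplexity:
    # Fit on the original features X, not on the existing 2D embedding X_tsne.
    model = TSNE(n_components=2, init="pca", perplexity=perp, random_state=42)
    model.fit_transform(X)
    divergence.append(model.kl_divergence_)

# Note: KL divergence tends to decrease as perplexity grows, so this curve
# alone does not identify a "best" perplexity.
fig = px.line(x=perplexity, y=divergence, markers=True)
fig.update_layout(xaxis_title="Perplexity Values", yaxis_title="Divergence")
fig.update_traces(line_color="red", line_width=1)
fig.show()
```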