# Deep Learning for Geo/Environmental sciences

<center><img src="../logo_2.png" alt="logo" width="500"/></center>

<em>*Created with ChatGPT</em>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/climate-analytics-lab/sioc209-2024-sp/blob/main/sioc209-2024-sp/06_unsupervised_learning/11_dimensionality_reduction.ipynb)

## Lecture 13: Contrastive Learning

 - [Recap](#Recap)
 - [Tile2Vec](#Tile2Vec)
 - [Contrastive Learning](#Contrastive-Learning)

## Recap

In the last lecture we introduced unsupervised learning and discussed clustering algorithms. We learned about K-means clustering and hierarchical clustering.

As we discussed, unsupervised learning is a type of machine learning that looks for previously undetected patterns and structure in a dataset with no pre-existing labels.

### Dimensionality Reduction

Dimensionality reduction is the process of reducing the number of features in a dataset. Fewer features can help to:

- reduce the computational cost of training and applying a model;
- reduce the risk of overfitting;
- visualize high-dimensional data in a two- or three-dimensional space;
- identify the most important features in a dataset;
- remove noise from a dataset.

There are two main types of dimensionality reduction techniques:

1. Feature selection: Feature selection involves selecting a subset of the original features in the dataset. This can be done using a variety of techniques, such as filter methods, wrapper methods, and embedded methods.

2. Feature extraction: Feature extraction involves transforming the original features into a smaller set of new features in a lower-dimensional space. This can be done using techniques such as Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and autoencoders.
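The difference between the two approaches can be seen on a small toy dataset. In the minimal NumPy sketch below, feature selection keeps the original columns with the largest variance (a simple filter method; the variance criterion is our assumption, one of many possible filters), while feature extraction computes PCA via the singular value decomposition and builds a new feature that mixes all of the original ones. The data and the helper names `select_top_variance` and `pca_extract` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 5 features. Features 0 and 1 are driven
# by a shared signal; features 2-4 are low-variance noise.
n = 200
signal = rng.normal(size=(n, 1))
X = np.hstack([
    3.0 * signal + 0.1 * rng.normal(size=(n, 1)),
    -2.0 * signal + 0.1 * rng.normal(size=(n, 1)),
    0.1 * rng.normal(size=(n, 3)),
])

def select_top_variance(X, k):
    """Feature selection (filter method): keep the k original columns with the largest variance."""
    idx = np.sort(np.argsort(X.var(axis=0))[::-1][:k])
    return X[:, idx], idx

def pca_extract(X, k):
    """Feature extraction: project the centered data onto the top-k principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per component
    return Xc @ Vt[:k].T, explained[:k]

X_sel, kept = select_top_variance(X, 2)  # 2 of the original columns
X_pca, explained = pca_extract(X, 1)     # 1 new feature mixing all columns
print("selected columns:", kept)         # selected columns: [0 1]
print(f"PC1 explains {explained[0]:.1%} of the variance")
```

Selection keeps features that are directly interpretable (they are original measurements), whereas here a single extracted component summarizes the shared signal carried by two redundant features.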

Previously, we focused on feature extraction techniques, in particular PCA and t-SNE. Both are powerful techniques for reducing the dimensionality of a dataset and for visualizing high-dimensional data in a lower-dimensional space.

PCA is a linear projection onto the directions of largest variance, so it tends to preserve the global structure of the data. t-SNE is a nonlinear method that preserves local neighborhoods, which makes it particularly well suited to visualizing clusters in high-dimensional data. Each technique has strengths and weaknesses, and the right choice depends on the characteristics of the data and the goals of the analysis.
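As a reminder of how the two methods are used in practice, here is a minimal scikit-learn sketch on synthetic clustered data. The dataset and parameter values are illustrative assumptions, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic data: 150 samples in 50 dimensions, drawn from 3 clusters.
X, y = make_blobs(n_samples=150, n_features=50, centers=3, random_state=0)

# PCA: linear projection onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)

# t-SNE: nonlinear embedding that preserves local neighborhoods.
# Perplexity (roughly, the effective number of neighbors) must be
# smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X)
print(X_pca.shape, X_tsne.shape)
```

Note a practical difference: a fitted `PCA` object has a `transform` method that can project new samples, whereas scikit-learn's `TSNE` only provides `fit_transform`, so t-SNE is mainly a visualization tool. Distances between well-separated clusters in a t-SNE plot should also not be over-interpreted.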

## Tile2Vec

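Tile2Vec is a technique for learning visual representations of satellite imagery without labels. It is based on the idea that spatial context carries information: image tiles that are geographically close to each other tend to be more similar in content than tiles that are far apart. This is an analogue of the distributional hypothesis behind word2vec, in which words that appear in similar contexts have similar meanings.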

Tile2Vec trains a convolutional neural network (CNN) on triplets of tiles: an anchor tile, a neighbor tile sampled from the area around the anchor, and a distant tile sampled from elsewhere. The network is trained with a triplet loss, a form of contrastive loss, that pulls the embeddings of the anchor and neighbor together and pushes the embeddings of the anchor and distant tile apart.
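The triplet sampling step can be sketched in NumPy as follows. This is a simplified illustration under our own assumptions, not the reference Tile2Vec implementation: all three tiles come from a single image, positions are top-left corners in pixels, and the distant tile is drawn by rejection sampling at least `min_distance` pixels from the anchor. The helper `sample_triplet` and its parameters are hypothetical, and the image is random noise standing in for a real scene.

```python
import numpy as np

def sample_triplet(image, tile_size, neighborhood, min_distance, rng):
    """Sample (anchor, neighbor, distant) tiles from one image of shape (H, W, C).

    Hypothetical helper for illustration; positions are top-left corners in pixels.
    - anchor:   a tile at a uniformly random position
    - neighbor: shifted from the anchor by at most `neighborhood` pixels per axis
    - distant:  at least `min_distance` pixels (Euclidean) from the anchor
    """
    H, W = image.shape[:2]
    max_r, max_c = H - tile_size, W - tile_size
    if min_distance > np.hypot(max_r, max_c) / 2:
        # Guarantees that the rejection loop below can always terminate.
        raise ValueError("min_distance is too large for this image")

    def crop(r, c):
        return image[r:r + tile_size, c:c + tile_size]

    ar, ac = rng.integers(0, max_r + 1), rng.integers(0, max_c + 1)

    # Neighbor: random shift of the anchor, clipped to stay inside the image.
    nr = int(np.clip(ar + rng.integers(-neighborhood, neighborhood + 1), 0, max_r))
    nc = int(np.clip(ac + rng.integers(-neighborhood, neighborhood + 1), 0, max_c))

    # Distant: rejection sampling until the tile is far enough from the anchor.
    while True:
        dr, dc = rng.integers(0, max_r + 1), rng.integers(0, max_c + 1)
        if np.hypot(dr - ar, dc - ac) >= min_distance:
            break
    return crop(ar, ac), crop(nr, nc), crop(dr, dc)


rng = np.random.default_rng(0)
image = rng.random((256, 256, 4))  # random stand-in for a 4-band satellite scene
anchor, neighbor, distant = sample_triplet(image, tile_size=50, neighborhood=25,
                                           min_distance=100, rng=rng)
print(anchor.shape, neighbor.shape, distant.shape)  # (50, 50, 4) for each tile
```

In a real pipeline, the distant tile would often be drawn from a different scene altogether, and many such triplets would be batched and fed to the CNN.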

Once trained, the CNN maps each tile to an embedding, a fixed-length vector that summarizes the tile's content. These embeddings can be used as features for a variety of downstream tasks, such as image retrieval, image classification, and image segmentation.
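For image retrieval, one simple approach is a nearest-neighbor search in embedding space. The sketch below assumes we already have an `(N, D)` array of tile embeddings from a trained encoder; here they are random placeholders, and `retrieve` is a hypothetical helper that ranks tiles by cosine similarity to a query embedding.

```python
import numpy as np

def retrieve(query, embeddings, k=3):
    """Indices of the k embeddings most similar to `query` (cosine similarity)."""
    q = query / np.linalg.norm(query)
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = E @ q                  # cosine similarity of every tile to the query
    return np.argsort(-sims)[:k]  # most similar first

# Placeholder embeddings: random vectors standing in for the outputs of a
# trained encoder on 100 tiles (16-dimensional embeddings).
rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))
print(retrieve(emb[7], emb, k=3))  # the first index is 7: the query tile itself
```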

Because it needs no labels, Tile2Vec can exploit large archives of unlabeled satellite imagery, and its embeddings have been shown to outperform other unsupervised techniques for satellite image retrieval and classification.

### Contrastive Learning

Contrastive learning is a technique for learning representations of data by contrasting similar and dissimilar pairs of data points. The idea is to learn a representation that brings similar data points closer together in the embedding space and pushes dissimilar data points further apart.

Contrastive learning is widely used in unsupervised learning and self-supervised learning, where the goal is to learn representations of data without the need for labeled data.

Contrastive loss functions typically combine two effects: a positive term that pulls similar data points together in the embedding space, and a negative term that pushes dissimilar data points apart.

One widely used form is the InfoNCE loss:

$$
L = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp\left(\mathrm{sim}(f(x_i), f(x_i^+))\right)}{\exp\left(\mathrm{sim}(f(x_i), f(x_i^+))\right) + \sum_{j=1}^{K} \exp\left(\mathrm{sim}(f(x_i), f(x_{i,j}^-))\right)}
$$

where $ N $ is the number of anchor data points, $ x_i $ is an anchor, $ x_i^+ $ is a similar (positive) data point, $ x_{i,j}^- $ for $ j = 1, \dots, K $ are $ K $ dissimilar (negative) data points, $ f $ is the network that maps data points to the embedding space, and $ \mathrm{sim} $ is a similarity score between two embeddings, such as their dot product or cosine similarity.
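Here is a minimal NumPy sketch of this loss, assuming the embeddings have already been computed by the network and taking the similarity score to be the cosine similarity. `info_nce_loss` is a hypothetical helper. The optional `temperature` divides every similarity, as in the NT-Xent loss used by SimCLR; `temperature=1.0` leaves the scores unchanged.

```python
import numpy as np

def info_nce_loss(anchors, positives, negatives, temperature=1.0):
    """InfoNCE loss averaged over N anchors.

    anchors:   (N, D) embeddings of the anchors
    positives: (N, D) embeddings of the positives
    negatives: (N, K, D) embeddings, K negatives per anchor
    """
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a, p, n = normalize(anchors), normalize(positives), normalize(negatives)
    pos = np.sum(a * p, axis=-1) / temperature           # (N,) cosine sims
    neg = np.einsum("nd,nkd->nk", a, n) / temperature     # (N, K) cosine sims
    logits = np.concatenate([pos[:, None], neg], axis=1)  # column 0 = positive
    # Numerically stable log-sum-exp over positive and negatives.
    m = logits.max(axis=1, keepdims=True)
    log_denominator = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    # -log(exp(pos) / sum(exp(all))) = log_denominator - pos
    return float(np.mean(log_denominator - logits[:, 0]))

# When the positive and both negatives are all orthogonal to the anchor, every
# similarity is 0, so the loss is log(1 + K) with K = 2.
a = np.array([[1.0, 0.0]])
p = np.array([[0.0, 1.0]])
n = np.array([[[0.0, 1.0], [0.0, -1.0]]])
print(info_nce_loss(a, p, n))  # ≈ 1.0986, i.e. log(3)
```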

Minimizing the contrastive loss above increases the similarity of each anchor to its positive relative to its similarity to the negatives. Equivalently, it is the cross-entropy loss of a classification problem in which the network must identify the positive among the positive and negative candidates. In this way the network learns a representation that captures the underlying structure of the data without any labels.

Tile2Vec uses a related contrastive objective, the triplet loss, to learn visual representations of satellite imagery. Each training example is a triplet: an anchor tile $ x_i $, a nearby (similar) tile $ x_i^+ $, and a distant (dissimilar) tile $ x_i^- $. The loss is zero once the anchor is closer to the similar tile than to the dissimilar tile by at least a margin, and positive otherwise:

$$
L = \sum_{i=1}^{N} \max(0, \alpha + d(f(x_i), f(x_i^+)) - d(f(x_i), f(x_i^-)))
$$

where $ d $ is a distance function (for example, the Euclidean distance), $ f $ is the network that maps data points to the embedding space, $ x_i $ is an anchor, $ x_i^+ $ is a similar data point, $ x_i^- $ is a dissimilar data point, and $ \alpha > 0 $ is the margin by which the dissimilar point must be farther from the anchor than the similar point. Triplets that already satisfy the margin contribute zero loss.
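The triplet loss translates almost directly into NumPy. This is a minimal sketch using the Euclidean distance for $ d $ and a hypothetical `triplet_loss` helper; the worked example shows that a triplet satisfying the margin contributes nothing, while a violating one contributes the size of the violation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet (margin) loss summed over N triplets.

    anchor, positive, negative: (N, D) embeddings; d is the Euclidean distance.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # d(f(x_i), f(x_i^+))
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # d(f(x_i), f(x_i^-))
    return float(np.sum(np.maximum(0.0, margin + d_pos - d_neg)))

anchor   = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[1.0, 0.0], [1.0, 0.0]])
negative = np.array([[3.0, 0.0], [1.5, 0.0]])
# Triplet 1: 1 + 1 - 3 < 0, so it contributes 0 (margin satisfied).
# Triplet 2: 1 + 1 - 1.5 = 0.5 (dissimilar point too close to the anchor).
print(triplet_loss(anchor, positive, negative, margin=1.0))  # 0.5
```

PyTorch provides the same objective as `torch.nn.TripletMarginLoss`, which by default averages over the batch rather than summing. Note that a network could satisfy the margin simply by scaling all embeddings up, so in practice the embedding norms are typically constrained, for example by normalizing or penalizing them.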

We'll see an example of applying Tile2Vec to satellite imagery in the next lecture.