# unit 6.0 - Unsupervised and self-supervised learning

[See also: unit 4 types of learning](https://githubtocolab.com/culurciello/deep-learning-course-source/blob/main/source/lectures/04-types-learning.ipynb)


## Supervised Learning

Supervised learning is the type of machine learning we have focused on in this course so far. To summarize, here is what we have learned:

- **Labeled Data**: In supervised learning, we have a dataset where each example is paired with a **label** (also known as the **ground truth**). These labels represent the correct output for a given input. For instance, in image classification, each image is associated with a specific class label (e.g., "cat" or "dog").

- **Training the Model**: We train a model (such as a neural network) using this labeled data. The model learns to map inputs (features) to the corresponding labels by minimizing a loss function that measures the difference between its predictions and the ground truth.

- **Prediction**: Once trained, the model can make predictions on new, unseen data. It generalizes from the training examples, so its predictions are most reliable for inputs similar to those it was trained on.
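
The three steps above (labeled data, training, prediction) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the course's implementation: the dataset values, the linear model, the learning rate `lr`, and the function names `predict`, `mean_squared_error`, and `train` are all assumptions chosen to keep the example small.

```python
# Toy labeled dataset: each input x is paired with a ground-truth label y.
# The values are made up for illustration and follow y = 2x + 1 exactly.
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
labels = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(w, b, x):
    """Linear model: maps an input feature to a predicted label."""
    return w * x + b


def mean_squared_error(w, b, xs, ys):
    """Average squared difference between predictions and ground truth."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.05, steps=2000):
    """Fit w and b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train(inputs, labels)

# Prediction on a new input that was not part of the training set.
unseen_prediction = predict(w, b, 10.0)
print(w, b, unseen_prediction)
```

A neural network follows the same loop; only the model and the way gradients are computed change.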


## Unsupervised Learning

Unsupervised learning is a type of machine learning where we work with unlabeled data. Unlike supervised learning, there are no explicit labels (ground truth) associated with the examples. The primary goal of unsupervised learning is to discover patterns, structures, or relationships within the data itself.

Examples:
- Clustering: Algorithms like K-means group similar data points together based on their features.
- Dimensionality Reduction: Techniques like PCA (Principal Component Analysis) and t-SNE reduce the dimensionality of data while preserving important information.
- Density Estimation: Methods that estimate the underlying probability distribution of the data.
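
As an illustration of clustering, here is a minimal K-means sketch in plain Python. The 2-D points are made up, and the fixed `initial_centroids` argument is an assumption made so the result is reproducible; practical implementations usually pick the initial centroids randomly or with a seeding heuristic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, iterations=10):
    """Alternate between assigning points and updating centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Unlabeled data: two visually separate groups, but no labels are given.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, assignments = k_means(points, initial_centroids=[points[0], points[3]])
print(assignments)
print(centroids)
```

Note that the algorithm never sees a label: the cluster indices it returns are discovered from the geometry of the data alone.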

Use Cases: Unsupervised learning is useful for tasks like anomaly detection, recommendation systems, and data visualization.
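
To connect density estimation with anomaly detection, the sketch below fits a single Gaussian to unlabeled measurements and flags points that fall far from the mean. The data values, the 3-standard-deviation `threshold`, and the helper names `gaussian_pdf` and `is_anomaly` are assumptions for illustration; real data often needs a richer density model.

```python
import math
import statistics

# Unlabeled 1-D measurements (made-up values); no example is marked
# as "normal" or "anomalous".
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]

# Density estimation: fit a Gaussian by estimating its mean and
# (sample) standard deviation from the data.
mu = statistics.mean(data)
sigma = statistics.stdev(data)


def gaussian_pdf(x, mu, sigma):
    """Probability density of x under a Gaussian with the given parameters."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def is_anomaly(x, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    return abs(x - mu) / sigma > threshold


print(is_anomaly(10.1), is_anomaly(12.0))
print(gaussian_pdf(10.1, mu, sigma), gaussian_pdf(12.0, mu, sigma))
```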

## Self-Supervised Learning

Now, let's explore self-supervised learning:

- **Unlabeled Data**: Unlike supervised learning, self-supervised learning doesn't rely on labeled datasets. Instead, it operates on **unstructured data** (e.g., images, text, or audio) without explicit labels.

- **Implicit Labels**: Self-supervised models generate their own **implicit labels** from the data itself. These labels are derived from the inherent structure or relationships within the input. For example:
    - In text, one can predict the next word (or character, or token) from the preceding ones.
    - In images, one can reconstruct a missing portion of the image from the rest of the image.
    - In videos, one can predict future frames from past ones.

- **Cost-Effectiveness**: Self-supervised learning is particularly useful when gathering labeled data is challenging or expensive. By avoiding manual annotation, it reduces the need for human experts to painstakingly label vast amounts of training data.

- **Examples**: Self-supervised learning has powered various deep learning architectures:
    - Large language models (e.g., BERT and GPT) for natural language understanding and generation.
    - Image synthesis models (like variational autoencoders and generative adversarial networks), which are trained without human-provided labels.
    - Contrastive computer vision models (such as SimCLR and Momentum Contrast, also known as MoCo).
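
The idea of implicit labels can be made concrete with a small sketch. The two helpers below, `next_token_pairs` and `mask_span`, are hypothetical names introduced here for illustration: the first builds next-token prediction pairs from raw text, and the second hides part of a sequence (standing in for a row of image pixels) so that the hidden values become the reconstruction target. No human annotation is involved in either case.

```python
def next_token_pairs(tokens, context_size):
    """Turn an unlabeled token sequence into (context, target) training pairs.

    The target (the implicit label) is simply the token that follows the context.
    """
    pairs = []
    for i in range(len(tokens) - context_size):
        context = tokens[i:i + context_size]
        target = tokens[i + context_size]
        pairs.append((context, target))
    return pairs


def mask_span(values, start, length, mask_value=0):
    """Hide a span of a sequence; the hidden values become the target."""
    masked = list(values)
    target = values[start:start + length]
    masked[start:start + length] = [mask_value] * length
    return masked, target


# Text: whitespace splitting is a simplification of real tokenization.
tokens = "the cat sat on the mat".split()
pairs = next_token_pairs(tokens, context_size=2)
for context, target in pairs:
    print(context, "->", target)

# Images: a made-up row of pixel intensities with a masked region.
pixels = [5, 7, 9, 4, 2, 8]
masked, target = mask_span(pixels, start=2, length=2)
print(masked, "->", target)
```

A model trained on such pairs is optimized exactly like a supervised one; the difference is only in where the labels come from.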

In summary, supervised learning relies on explicit labels, unsupervised learning looks for structure in data without any labels, and self-supervised learning generates its own labels from unstructured data. All three approaches contribute to advancing artificial intelligence and solving complex problems.