# Machine Learning Big Picture

## Machine Learning: Teaching Computers to Learn

**Machine learning** is a branch of artificial intelligence (AI) that focuses on developing computer systems that can learn and adapt without being explicitly programmed. Instead of following rigid rules, these systems learn from data and improve their performance over time.

A more formal definition is:

A computer program is said to learn from experience `E` with respect to some task
`T` and some performance measure `P`, if its performance on `T`, as measured by `P`,
improves with experience `E`. (Tom Mitchell, 1997)

### How Does it Work?
Imagine teaching a child to recognize a cat. You would show them pictures of different cats and explain what makes them cats. Over time, the child learns to identify cats even without being explicitly told the rules. Machine learning works similarly. 

1. **Data Collection:** Gather relevant data for the task. For example, to teach a computer to recognize images of cats, you would collect a large dataset of cat images.
2. **Data Preparation:** Clean and process the data to make it suitable for machine learning algorithms. 
3. **Model Selection:** Choose an appropriate machine learning algorithm based on the problem. There are various types, including:
   * **Supervised learning:** The algorithm learns from labeled data (e.g., image classification).
   * **Unsupervised learning:** The algorithm finds patterns in unlabeled data (e.g., customer segmentation).
   * **Reinforcement learning:** The algorithm learns by trial and error (e.g., game playing).
4. **Model Training:** The algorithm learns patterns from the data. This process involves adjusting the model's parameters to minimize errors.
5. **Model Evaluation:** Assess the model's performance on held-out data that was not used for training, to estimate how well it generalizes.
6. **Prediction or Decision-Making:** Use the trained model to make predictions or decisions on unseen data.
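The six steps above can be sketched end to end in a few lines. This is a minimal illustration using only the Python standard library: the dataset (hours studied versus pass/fail), the threshold classifier, and all names such as `best_threshold` are assumptions made up for the example, not part of any particular library or method.

```python
import random

random.seed(0)

# Steps 1-2. Data collection and preparation: synthetic (hours_studied, passed)
# pairs. The data is invented purely for illustration.
data = []
for _ in range(200):
    hours = random.uniform(0, 10)
    passed = 1 if hours + random.gauss(0, 1) > 5 else 0
    data.append((hours, passed))

random.shuffle(data)
train, test = data[:150], data[150:]

# Step 3. Model selection: a one-parameter threshold classifier (supervised).
def predict(threshold, hours):
    return 1 if hours > threshold else 0

def accuracy(threshold, rows):
    return sum(predict(threshold, x) == y for x, y in rows) / len(rows)

# Step 4. Training: pick the threshold with the fewest errors on the training set.
candidates = [i / 10 for i in range(0, 101)]
best_threshold = max(candidates, key=lambda t: accuracy(t, train))

# Step 5. Evaluation: measure accuracy on held-out data the model has not seen.
test_accuracy = accuracy(best_threshold, test)

# Step 6. Prediction on a new, unseen input.
new_prediction = predict(best_threshold, 8.5)
```

Real projects replace each step with something much richer, but the shape of the loop (split, fit on one part, evaluate on the other, then predict) stays the same.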

### Real-World Applications
Machine learning is used in countless applications, including:

* **Image and speech recognition:** Facial recognition, voice assistants
* **Natural language processing:** Language translation, sentiment analysis
* **Recommendation systems:** Product recommendations, movie suggestions
* **Medical diagnosis:** Disease detection, drug discovery
* **Financial forecasting:** Stock market prediction, fraud detection
* **Self-driving cars:** Autonomous navigation

**In essence, machine learning empowers computers to learn from experience and make intelligent decisions, transforming industries and our daily lives.**


## Why Use Machine Learning

The classic way of solving a problem with a computer is to write code that follows a set of explicit, hand-written rules:

<img src="./pics/1.png" alt="Machine Learning" width="600" height="400">

While writing explicit rules for a computer can be effective for simple tasks, machine learning often outshines this approach for several reasons:

### 1. Complexity and Unpredictability
* **Complex Patterns:** Many real-world problems involve intricate patterns and relationships that are difficult or impossible to define precisely through rules. Machine learning algorithms can uncover these hidden patterns from data.
* **Dynamic Environments:** In scenarios where conditions change rapidly, like financial markets or online advertising, machine learning models can adapt to new trends and information, whereas rigid rules would become outdated quickly.

<img src="./pics/2.png" alt="Machine Learning" width="600" height="400">

### 2. Efficiency and Scalability
* **Large Datasets:** Machine learning can exploit massive amounts of data that would be impractical to analyze by hand when writing rules.
* **Automation:** Once trained, machine learning models can automate decision-making processes, saving time and resources compared to manually creating and maintaining rules.

### 3. Accuracy and Performance
* **Learning from Data:** Machine learning models can learn from vast amounts of data, improving their accuracy over time.
* **Continuous Improvement:** With new data, models can be retrained to enhance their performance. 

### 4. Human Limitations
* **Subjectivity:** Humans can introduce biases or overlook critical factors when writing rules. Machine learning algorithms can provide a more consistent, data-driven perspective, although they can also inherit biases present in the training data.
* **Cognitive Limits:** It's challenging for humans to comprehend and process the complexity of some problems, while machine learning algorithms can handle these tasks effectively.



## How Machine Learning Helps Humans Learn

**Machine learning models can serve as powerful tools for human learning by:**

### 1. Unveiling Hidden Patterns
* **Data Mining:** ML models excel at sifting through vast amounts of data to identify underlying patterns and correlations that might be overlooked by humans.
* **New Insights:** By examining these patterns, humans can gain new perspectives and insights into complex problems or phenomena.

### 2. Improving Understanding
* **Model Inspection:** Analyzing the components of a trained ML model can reveal the factors it considers most important for making predictions.
* **Knowledge Enhancement:** Understanding these factors can deepen human knowledge about the subject matter. For instance, analyzing a spam filter can provide insights into common spam characteristics. 
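As a rough illustration of inspecting a spam filter, the sketch below trains a tiny word-frequency model and then asks which words it treats as the strongest spam signals. The corpus, the smoothed log-ratio score, and the names `spam_score` and `top_spam_words` are all assumptions chosen for the example; a production filter would be far more elaborate.

```python
from collections import Counter
import math

# Tiny hand-made corpus, invented for illustration only.
spam = ["win free money now", "free prize claim now", "win a free prize"]
ham = ["meeting moved to monday", "lunch on monday", "project notes attached now"]

def word_counts(messages):
    counts = Counter()
    for message in messages:
        counts.update(message.split())
    return counts

spam_counts, ham_counts = word_counts(spam), word_counts(ham)
vocabulary = set(spam_counts) | set(ham_counts)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())

def spam_score(word):
    """Log-ratio of smoothed word frequencies; positive means more typical of spam."""
    p_spam = (spam_counts[word] + 1) / (spam_total + len(vocabulary))
    p_ham = (ham_counts[word] + 1) / (ham_total + len(vocabulary))
    return math.log(p_spam / p_ham)

# Inspect the model: which words does it treat as the strongest spam signals?
top_spam_words = sorted(vocabulary, key=spam_score, reverse=True)[:3]
```

On this toy corpus the highest-scoring word is "free", which is the kind of insight a human can read directly off the trained model.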

### 3. Facilitating Knowledge Discovery
* **Accelerated Learning:** ML can process information and identify trends much faster than humans, accelerating the learning process.
* **New Research Avenues:** The patterns discovered by ML models can suggest new research directions or hypotheses.

**In essence, machine learning acts as a catalyst for human learning by augmenting human cognitive abilities and providing a structured approach to exploring complex datasets.**

<img src="./pics/3.png" alt="Machine Learning" width="600" height="400">

## Types of Machine Learning Systems

Machine learning systems come in many varieties, so it is helpful to categorize them into broad types based on the following criteria:

- The way they are supervised during training (such as supervised, unsupervised, semi-supervised, self-supervised, and others)
- Whether they can learn incrementally in real-time (online learning) or require a complete dataset for training (batch learning)
- Whether they operate by comparing new data points to known data points or by identifying patterns in the training data to build a predictive model, similar to the approach scientists use (instance-based versus model-based learning)

These criteria are not mutually exclusive and can be combined in various ways. For example, a cutting-edge spam filter might learn continuously using a deep neural network trained with human-labeled examples of spam and non-spam messages, making it an online, model-based, supervised learning system.
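To make the instance-based versus model-based distinction more concrete, here is a small sketch on made-up one-dimensional data. The data points and the function names `predict_instance_based` and `predict_model_based` are hypothetical; nearest-neighbor lookup and a least-squares line are simply two convenient representatives of each family.

```python
# Toy training data, invented for illustration: (x, y) with y roughly 2 * x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

def predict_instance_based(x):
    """Instance-based: answer with the label of the most similar stored example."""
    nearest = min(points, key=lambda p: abs(p[0] - x))
    return nearest[1]

def fit_line(rows):
    """Model-based: fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(points)

def predict_model_based(x):
    return slope * x + intercept
```

For an input far from the training data, such as `x = 10.0`, the instance-based predictor returns the value of the nearest stored point (7.8), while the fitted line extrapolates to roughly 19.55. Neither behavior is always right; the point is that the two approaches generalize differently.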

### Supervision

Machine learning systems can be categorized based on the level and type of supervision they receive during training. While there are numerous classifications, we will focus on the primary ones: supervised learning, unsupervised learning, self-supervised learning, semi-supervised learning, and reinforcement learning.

#### Supervised learning

Supervised machine learning is a type of machine learning where the model is trained on labeled data. In this approach, the algorithm learns to map input data (features) to the correct output (labels) based on examples provided in the training dataset. The main goal is to learn a function that, given new input data, can predict the corresponding output.

**Examples:**

1. **Email Spam Filter**: A system that learns from a dataset of emails labeled as "spam" or "not spam" to classify new incoming emails accordingly. It uses features like the email content, sender information, and subject line to make predictions about whether an email is likely to be spam.

2. **Credit Scoring System**: A system that evaluates the creditworthiness of individuals applying for loans by learning from past data containing the applicant's financial history, demographic information, and previous loan repayment behavior. The system predicts whether a new applicant is likely to default on a loan.

3. **Speech Recognition System**: A system that learns to transcribe spoken language into text by training on audio recordings paired with their corresponding transcriptions. It identifies patterns in the sounds associated with particular words or phrases to accurately convert speech to text.

4. **Image Classification System**: A system that identifies objects in images by training on a dataset of images labeled with different categories (e.g., "cat," "dog," "car"). It learns to recognize visual patterns and features that differentiate one category from another.

5. **House Price Prediction System**: A system that estimates the selling price of a house by learning from past sales with known prices. It uses features like the size (square footage), number of bedrooms and bathrooms, location (city, neighborhood), age of the property, and proximity to amenities (schools, parks).

#### Two Important Types of Supervised Learning

1. **Classification**
   - **Purpose**: To categorize input data into discrete classes or categories.
   - **Description**: In classification tasks, the output variable is a categorical label. The algorithm learns to assign inputs to one or more predefined classes based on the training data. The output is a discrete value that represents a specific class or category.
   - **Examples**:
     - **Email Spam Detection**: Classifying emails as "spam" or "not spam."
     - **Image Recognition**: Classifying images into categories like "cat," "dog," "car," etc.
     - **Sentiment Analysis**: Determining if a review or social media post is "positive," "negative," or "neutral."
     - **Medical Diagnosis**: Identifying if a tumor is "benign" or "malignant" based on medical imaging data.

<img src="./pics/4.jpg" alt="Classification" width="600" height="400">
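A classification task like the spam example can be sketched with a k-nearest-neighbors classifier, one of the simplest classification methods. The two features (number of links and number of exclamation marks per email), the training rows, and the function name `classify` are assumptions made up for this illustration.

```python
from collections import Counter
import math

# Made-up labeled emails: (number_of_links, number_of_exclamation_marks) -> label.
training = [
    ((0, 0), "not spam"), ((1, 0), "not spam"), ((0, 1), "not spam"),
    ((5, 4), "spam"), ((6, 7), "spam"), ((4, 6), "spam"),
]

def classify(features, k=3):
    """Return the majority label among the k training examples closest to `features`."""
    nearest = sorted(training, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

label_for_busy_email = classify((5, 5))
label_for_plain_email = classify((1, 1))
```

The output is always one of the discrete labels seen in training, which is what distinguishes classification from regression.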

2. **Regression**
   - **Purpose**: To predict a continuous numeric value.
   - **Description**: In regression tasks, the output variable is continuous rather than categorical. The algorithm learns the relationship between input features and the output continuous value, aiming to predict real numbers as outputs.
   - **Examples**:
     - **House Price Prediction**: Predicting the selling price of a house based on features like size, location, and number of rooms.
     - **Stock Price Forecasting**: Predicting future stock prices based on historical data.
     - **Weather Prediction**: Estimating future temperatures or precipitation levels based on historical weather data.
     - **Sales Forecasting**: Predicting the future sales of a product based on historical sales data, marketing efforts, and economic indicators.


<img src="./pics/1.webp" alt="Regression" width="600" height="400">
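A regression task like house price prediction can be sketched with a single-feature linear model trained by gradient descent on the mean squared error. The data, the learning rate, the iteration count, and the name `predict_price` are illustrative assumptions, not values taken from a real dataset.

```python
# Made-up data: (size in hundreds of square meters, price in units of $100k).
houses = [(0.5, 1.6), (1.0, 2.5), (1.5, 3.6), (2.0, 4.4), (2.5, 5.6)]

weight, bias = 0.0, 0.0
learning_rate = 0.05

for _ in range(5000):
    # Gradients of the mean squared error with respect to weight and bias.
    grad_w = sum(2 * (weight * x + bias - y) * x for x, y in houses) / len(houses)
    grad_b = sum(2 * (weight * x + bias - y) for x, y in houses) / len(houses)
    # Adjust the parameters a small step in the direction that reduces the error.
    weight -= learning_rate * grad_w
    bias -= learning_rate * grad_b

def predict_price(size):
    """Predict a continuous price for a house of the given size."""
    return weight * size + bias
```

On this toy data the parameters converge to approximately `weight = 1.98` and `bias = 0.57`, so the prediction for any size is a real number rather than a class label.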

#### Unsupervised learning

Unsupervised learning is a type of machine learning where the algorithm is trained on data without labeled outputs. Unlike supervised learning, there is no explicit “correct” answer provided during training. The algorithm's goal is to find hidden patterns or structures in the data. 

Unsupervised learning is primarily used for clustering, dimensionality reduction, and association tasks. It is useful when you have data but do not know what to look for, or when the cost of labeling data is high.

**Examples**:

1. **Customer Segmentation in Marketing**:
   - **Purpose**: To group customers into distinct segments based on their behavior and demographics.
   - **Example**: A retail company analyzes customer purchase history, browsing behavior, age, income, and location data to identify different customer segments, such as "frequent buyers," "discount seekers," or "new customers." This helps in targeted marketing campaigns, personalized recommendations, and improving customer retention strategies.

<img src="./pics/5.jpg" alt="Clustering" width="600" height="400">
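Customer segmentation is often approached with clustering. The sketch below implements a bare-bones k-means loop on made-up customer data; the feature choice, the hand-picked initial centroids, and the function name `kmeans` are assumptions for illustration, and real implementations add smarter initialization and convergence checks.

```python
import math

# Made-up customers: (visits per month, average basket value in tens of dollars).
customers = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]

def kmeans(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Initial centroids picked by hand to keep the example deterministic.
centroids, clusters = kmeans(customers, centroids=[(0, 0), (10, 10)])
```

No labels are supplied anywhere: the two groups (occasional small-basket shoppers and frequent large-basket shoppers) emerge from the data alone.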

2. **Anomaly Detection for Fraud Detection**:
   - **Purpose**: To detect unusual patterns or behaviors that could indicate fraudulent activity.
   - **Example**: A financial institution uses an unsupervised learning system to monitor credit card transactions. The system learns the normal spending patterns of each customer and identifies any transaction that deviates significantly from these patterns as potentially fraudulent. For instance, a sudden, large transaction in a foreign country might trigger an alert.

<img src="./pics/6.png" alt="Anomaly" width="600" height="400">
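One very simple way to flag unusual transactions is a z-score test: learn the mean and spread of a customer's past spending and flag amounts that fall far outside it. The text above does not name a specific method, so this is only one possible sketch; the transaction history, the threshold of 3 standard deviations, and the name `is_anomalous` are assumptions.

```python
import statistics

# Made-up past transaction amounts for one customer, assumed to be normal behavior.
history = [23.5, 41.0, 18.2, 35.9, 27.4, 30.1, 22.8, 38.6]

mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_anomalous(amount, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    return abs(amount - mean) / stdev > threshold

large_purchase_flagged = is_anomalous(950.0)
typical_purchase_flagged = is_anomalous(33.0)
```

Real fraud systems combine many features (location, merchant, time of day) and more robust models, but the idea of learning "normal" and measuring deviation from it is the same.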

3. **Market Basket Analysis**:
   - **Purpose**: To identify patterns in customer purchase behavior.
   - **Example**: A supermarket uses unsupervised learning to analyze large datasets of customer transactions. The system identifies which items are frequently bought together (e.g., bread and butter or chips and soda). This information helps in optimizing product placements, designing promotional offers, and bundling products to increase sales.

4. **Document Clustering for News Categorization**:
   - **Purpose**: To automatically organize a large collection of documents or articles into topics or categories.
   - **Example**: A news website uses unsupervised learning to cluster articles into categories such as politics, sports, technology, and entertainment. This helps users easily navigate and find relevant news content and assists the website in recommending related articles to readers.

5. **Image Compression**:
   - **Purpose**: To reduce the size of images by minimizing redundancy in the data.
   - **Example**: An image processing application uses unsupervised learning to compress images by reducing the number of colors or patterns needed to represent the image. This allows for faster transmission and storage of images while maintaining acceptable visual quality.

6. **Genetic Clustering in Bioinformatics**:
   - **Purpose**: To group genes or proteins with similar functions or characteristics.
   - **Example**: Researchers use unsupervised learning to analyze gene expression data from various samples. The algorithm clusters genes with similar expression patterns, which can help identify gene functions, understand diseases, or find potential targets for drug development.

7. **Dimensionality Reduction for Data Visualization**:
   - **Purpose**: To simplify high-dimensional data for easier visualization and interpretation.
   - **Example**: A data scientist uses unsupervised learning to reduce the number of features in a dataset containing many variables, such as customer attributes or product characteristics. This reduction allows for creating 2D or 3D visualizations that make it easier to understand the underlying patterns or structures in the data.

8. **Social Network Analysis**:
   - **Purpose**: To detect communities or clusters of users based on their connections and interactions.
   - **Example**: A social media platform uses unsupervised learning to identify groups of users with similar interests or interaction patterns. This helps in recommending new friends, suggesting groups or pages to follow, and understanding community dynamics.
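The market basket example from the list above can be sketched by counting how often pairs of items appear in the same transaction. The transactions, the 40% support cutoff, and the names `support` and `frequent_pairs` are illustrative assumptions; dedicated association-rule algorithms handle larger item sets and far more data.

```python
from collections import Counter
from itertools import combinations

# Made-up transactions, each a set of purchased items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"chips", "soda"},
    {"bread", "butter", "soda"},
    {"chips", "soda", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    # Sort so that each pair is always stored in the same order.
    pair_counts.update(combinations(sorted(basket), 2))

def support(pair):
    """Fraction of transactions that contain both items of the pair."""
    return pair_counts[tuple(sorted(pair))] / len(transactions)

# Keep pairs that appear together in at least 40% of transactions.
frequent_pairs = [pair for pair, n in pair_counts.items()
                  if n / len(transactions) >= 0.4]
```

On this toy data the frequent pairs are bread with butter and chips with soda, matching the kind of pattern described in the example.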


#### Semi-supervised learning

Semi-supervised learning is a type of machine learning that sits between supervised and unsupervised learning. In this approach, a model is trained using a small amount of labeled data combined with a larger amount of unlabeled data. The goal is to leverage the structure in the unlabeled data to improve the learning process, especially when labeled data is scarce or expensive to obtain.

<img src="./pics/7.png" alt="Semi-supervised learning" width="600" height="400">

##### How It Works
1. **Labeled Data**: A small portion of the data has labels, meaning it is annotated with the correct answers or categories.
2. **Unlabeled Data**: The majority of the data is unlabeled, but the model tries to infer useful information from it to enhance learning.
3. **Learning Process**: The model uses the labeled data to learn a basic structure or relationship, and then it incorporates information from the unlabeled data to refine its predictions.
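The three steps above can be sketched with self-training (also called pseudo-labeling), which is one common semi-supervised technique among several. Everything here is an assumption for illustration: the one-dimensional data, the nearest-centroid "model", and the helper names `centroids` and `predict`.

```python
# Made-up 1-D data: two labeled points and many unlabeled ones.
labeled = [(1.0, "A"), (9.0, "B")]
unlabeled = [1.5, 2.0, 2.5, 3.0, 7.0, 7.5, 8.0, 8.5]

def centroids(rows):
    """Mean feature value per class: a very small 'model'."""
    groups = {}
    for x, label in rows:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def predict(model, x):
    """Assign the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

# Step 1: learn an initial model from the labeled data alone.
model = centroids(labeled)

# Step 2: pseudo-label the unlabeled points with the current model,
# then refit on labeled + pseudo-labeled data (one round of self-training).
pseudo_labeled = [(x, predict(model, x)) for x in unlabeled]
model = centroids(labeled + pseudo_labeled)
```

After the refit, the class centroids move from 1.0 and 9.0 to 2.0 and 8.0, reflecting where the bulk of the (unlabeled) data actually lies. In practice, only confident pseudo-labels are usually kept, and the process is repeated for several rounds.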

**Examples:**

1. **Image Classification**:
   - **Labeled Data**: A small number of images are labeled (e.g., photos of cats and dogs).
   - **Unlabeled Data**: A large collection of images without labels is available.
   - **Process**: The model first learns to distinguish between cats and dogs using the labeled images. Then, it uses the structure or patterns in the unlabeled images (such as shapes, edges, and textures) to improve its understanding of these categories, ultimately improving classification accuracy.
   
2. **Speech Recognition**:
   - **Labeled Data**: A small amount of labeled audio clips where the spoken words are transcribed.
   - **Unlabeled Data**: A large collection of raw audio without transcriptions.
   - **Process**: The model initially learns basic patterns like phonemes and words from the labeled data. The unlabeled audio is then used to refine its predictions by learning additional speech patterns, improving recognition of new words or dialects.

3. **Text Classification**:
   - **Labeled Data**: A small set of text documents is labeled with categories like "sports," "technology," or "entertainment."
   - **Unlabeled Data**: A large corpus of documents without any category labels.
   - **Process**: The model uses the labeled data to learn the distinguishing features of each category. Then, it analyzes the structure and co-occurrence of words in the unlabeled data to improve classification performance on unseen documents.

##### Real-World Application: Google Photos
Google Photos uses semi-supervised learning to identify objects and people in images. While some photos are tagged or labeled by users, many images remain unlabeled. Google Photos learns from the few labeled images and uses information from the large dataset of unlabeled images to improve its image classification capabilities.

##### Advantages of Semi-Supervised Learning
- **Reduces labeling costs**: Less labeled data is required, which can be expensive and time-consuming to obtain.
- **Improved performance**: The model leverages both labeled and unlabeled data, leading to better generalization and more robust predictions.
- **Scalable**: It allows the model to benefit from the abundance of unlabeled data in the real world.

#### Self-supervised learning

Self-supervised learning is a type of machine learning where the system learns to predict part of its input data from other parts of the data. It is often described as a form of unsupervised learning because it needs no human-provided labels; instead, it generates its own labels from the data itself. In essence, the model learns by solving auxiliary tasks where it uses portions of the data to infer or predict other parts.

Self-supervised learning has become particularly important in tasks where labeled data is scarce but unlabeled data is abundant. The approach enables models to learn meaningful representations from the data, which can later be fine-tuned for specific tasks (e.g., classification, regression) using much smaller amounts of labeled data.

A model trained with self-supervised learning is **usually not the end goal**. Typically, you'll want to adjust and fine-tune the model for a related task that is of greater importance to you.

<img src="./pics/8.png" alt="Self-supervised learning" width="700" height="450">

Imagine you want a model that can identify pet species from photos. If you have many unlabeled pet pictures, you can start by training a model to repair damaged images, for example images in which part of the picture has been masked out. To fill in a masked face convincingly, the model has to learn what different pets look like. After that, you can adapt the model to predict pet species instead of repairing images. Finally, you fine-tune it on a set of labeled pictures so that it associates each species with the correct name.


##### How It Works
In self-supervised learning, the model is trained on "pretext tasks" whose labels are derived automatically from the data. These tasks are designed in such a way that solving them requires the model to learn useful representations of the data.

1. **Input Data**: The model is trained on a large amount of unlabeled data.
2. **Generated Labels**: Labels are generated automatically by hiding part of the data and using the hidden part as the prediction target (e.g., masking parts of an image or text).
3. **Learning Process**: The model is trained to predict the hidden or missing part, thereby learning meaningful representations of the data without the need for human-annotated labels.
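The label-generation step can be made concrete with a few lines of code that turn one unlabeled sentence into training pairs by masking a word. The function name `make_masked_examples` is hypothetical, and masking every position in turn is a simplification for illustration; real systems typically mask only a random subset of tokens.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Turn one unlabeled sentence into (masked_input, target_word) pairs."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        # Hide the word at position i; the hidden word becomes the label.
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

examples = make_masked_examples("the cat sat on the mat")
```

The last pair produced is `("the cat sat on the [MASK]", "mat")`: an input and its label, created without any human annotation.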

**Examples:**

1. **Natural Language Processing (NLP) - BERT (Bidirectional Encoder Representations from Transformers)**:
   - **Pretext Task**: Masked language modeling (MLM).
   - **Process**: A portion of words in a sentence is masked, and the model is trained to predict the missing (masked) words based on the context provided by the rest of the sentence. This allows the model to learn rich representations of words and their relationships.
   - **Example**: For the sentence "The cat sat on the [MASK]," the model predicts the missing word ("mat"). By solving this task over a large text corpus, the model learns deep semantic relationships between words.
   - **Application**: After pretraining on this task, BERT can be fine-tuned for specific tasks like sentiment analysis, question-answering, or named entity recognition using relatively small amounts of labeled data.

2. **Computer Vision - Contrastive Learning**:
   - **Pretext Task**: Image augmentation and contrastive loss.
   - **Process**: Two different views of the same image are created using data augmentations (e.g., cropping, rotation, or color jittering). The model learns to identify that these different views represent the same underlying image while distinguishing them from other images.
   - **Example**: A picture of a dog is cropped to focus on the head in one view and rotated in another view. The model learns that these two variations are still representations of the same dog. The goal is to bring these augmented views closer in the feature space while pushing apart views of different images.
   - **Application**: Once the model has learned good image representations, it can be fine-tuned for downstream tasks like object detection or image classification using a much smaller labeled dataset.
   
   Techniques like **SimCLR** and **MoCo** (Momentum Contrast) are widely used contrastive learning methods.

3. **Speech Processing - Audio Representation Learning**:
   - **Pretext Task**: Predict future segments of audio based on previous ones.
   - **Process**: In this task, a model is trained to predict the next part of an audio waveform given previous segments. The model learns meaningful temporal dependencies in the audio, such as patterns of speech, without the need for transcribed labels.
   - **Example**: In a speech clip, the model might be tasked with predicting the next second of audio based on the previous second, learning the flow and rhythm of speech. 
   - **Application**: These learned representations can then be fine-tuned for speech recognition, speaker identification, or emotion detection tasks.

4. **Time-Series Forecasting**:
   - **Pretext Task**: Predict the next value(s) in a time series given previous values.
   - **Process**: A self-supervised model can be trained to predict future data points (e.g., stock prices, temperature) based on past values. This enables the model to capture the underlying temporal patterns and trends in the data.
   - **Example**: For a dataset of daily temperature measurements, the model learns to predict tomorrow's temperature based on past days. By doing this repeatedly, the model captures seasonal patterns, trends, and fluctuations.
   - **Application**: These learned patterns can then be used in applications like forecasting or anomaly detection.
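The contrastive learning example from the list above (two augmented views of the same image pulled together, other images pushed apart) can be sketched as a loss computed on embedding vectors. This is an InfoNCE-style loss for a single anchor, written as an illustration; it is not the exact objective of SimCLR or MoCo, and the 2-D embeddings, temperature value, and function names are assumptions.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor: low when the anchor is closer
    to its positive view than to the negatives."""
    pos = math.exp(cosine_similarity(anchor, positive) / temperature)
    neg = sum(math.exp(cosine_similarity(anchor, n) / temperature)
              for n in negatives)
    return -math.log(pos / (pos + neg))

# Hypothetical 2-D embeddings: two views of the same dog image and one cat image.
dog_view_1, dog_view_2, cat_view = (1.0, 0.1), (0.9, 0.2), (-0.2, 1.0)

# Correct pairing: the two dog views are treated as the positive pair.
good_loss = contrastive_loss(dog_view_1, dog_view_2, [cat_view])
# Wrong pairing: the cat view is (incorrectly) treated as the positive.
bad_loss = contrastive_loss(dog_view_1, cat_view, [dog_view_2])
```

The loss is much smaller for the correct pairing than for the wrong one, so minimizing it during training pushes the encoder to place views of the same image close together in feature space.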

##### Real-World Application: GPT Models (Generative Pretrained Transformers)
GPT models are an example of self-supervised learning in NLP. They are trained using a **causal language modeling** pretext task, where the goal is to predict the next word in a sequence based on the preceding words. Through this task, GPT learns to capture complex language structures, enabling it to generate coherent text and perform tasks like translation, summarization, and conversation.
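The next-word prediction objective can be illustrated with a deliberately tiny stand-in: a bigram model that counts which word follows which. This is not how GPT works internally (GPT uses a transformer that conditions on the whole preceding context, not just one word); the corpus and the name `predict_next` are assumptions for the sketch. What it does share with GPT is that the training labels come for free from the text itself.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real models train on vastly more text.
corpus = "the cat sat on the mat . the cat ate . the dog sat on the rug ."
tokens = corpus.split()

# Every adjacent pair (previous word -> next word) is a free training example.
next_word_counts = defaultdict(Counter)
for previous, following in zip(tokens, tokens[1:]):
    next_word_counts[previous][following] += 1

def predict_next(word):
    """Return the most frequent continuation seen after `word` in the corpus."""
    return next_word_counts[word].most_common(1)[0][0]

after_the = predict_next("the")
after_sat = predict_next("sat")
```

Here "the" is most often followed by "cat" and "sat" by "on". The function raises an `IndexError` for a word it has never seen, a limitation that real language models avoid by working with learned representations instead of raw counts.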

##### Advantages of Self-Supervised Learning
- **Scalability**: It leverages large amounts of unlabeled data, which is much easier to collect than labeled data.
- **Transferability**: The representations learned through self-supervised tasks can be transferred to other downstream tasks, reducing the need for labeled data in those tasks.
- **Label efficiency**: After fine-tuning, self-supervised models often outperform models trained only on the available labeled data, particularly when labeled data is scarce.