# Machine Learning

#### **Supervised Learning**
- **Data**: Labeled data (input features and corresponding output labels).
- **Goal**: Learn a mapping function from input features to output labels.
- **Process**: The algorithm is trained on the labeled data to learn patterns and relationships.


<img src="../../pandas/images/supervise.png" width="200">

`classification` and `regression` are the two most common supervised learning tasks, but they are not the only ones.<br>
Other tasks, such as ranking and sequence labeling, also fall under the umbrella of supervised learning.

**`Classification`**
- **Task**: Predicting a categorical label or class.
- **Examples**:
    - Email spam detection (spam or not spam)
    - Image classification (cat, dog, or other)
    - Sentiment analysis (positive, negative, or neutral)
- **Common algorithms**:
    - Decision trees
    - Random forests
    - Support vector machines (SVMs)
    - Logistic regression
    - Neural networks (especially deep learning for complex problems)
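
As a rough illustration of how a classifier learns a mapping from features to labels, here is a minimal logistic regression sketch trained with batch gradient descent. The toy data (hours studied versus pass/fail), the function names, and the learning-rate and epoch values are all assumptions made up for this example; in practice a library implementation would normally be used instead.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=5000):
    """Fit weights and a bias by batch gradient descent on the mean log loss."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to z
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical data: [hours_studied] -> passed (1) or failed (0)
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic_regression(X, y)
predictions = [predict(w, b, x) for x in X]
```

The model outputs a probability, and the 0.5 threshold turns it into a categorical label, which is what distinguishes this from a regression model.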

**`Regression`**
- **Task**: Predicting a continuous numerical value.
- **Examples**:
    - Predicting house prices
    - Forecasting stock prices
    - Predicting customer lifetime value
- **Common algorithms**:
    - Linear regression
    - Polynomial regression
    - Decision trees
    - Random forests
    - Neural networks
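
To make the regression task concrete, the sketch below fits a straight line to made-up house data using the closed-form least-squares solution for a single feature. The data values and the `fit_line` helper are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size (square meters) -> price (thousands)
sizes = [50.0, 60.0, 80.0, 100.0]
prices = [150.0, 180.0, 240.0, 300.0]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 70.0 + intercept  # continuous output for an unseen size
```

Unlike a classifier, the model returns a number on a continuous scale rather than one of a fixed set of classes.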

---


#### **Unsupervised Learning**
Unsupervised learning is a type of machine learning where the algorithm learns from unlabeled data.<br>
Unlike supervised learning, there's no explicit target variable to predict.<br>
Instead, the goal is to find patterns, structures, or relationships within the data itself.

- **Examples**:
    - Clustering (grouping similar data points)
    - Dimensionality reduction (reducing the number of features)
    - Anomaly detection (identifying unusual data points)
    - Association rule mining (discovering interesting relationships or dependencies between items in a dataset)


<img src="../../pandas/images/unsupervised.png" width="200">

**`Clustering`**
- **Task**: Grouping similar data points together.
- **Techniques**:
    - K-means clustering (Assigns data points to clusters based on their distance to cluster centroids.)
    - Hierarchical clustering (Creates a hierarchy of clusters by merging or splitting them based on similarity.)
    - DBSCAN (Groups data points based on density and distance thresholds.)
    - Gaussian mixture models (Assumes data is generated from a mixture of Gaussian distributions and fits the model to the data.)
- **Applications**:
    - Customer segmentation
    - Image segmentation
    - Anomaly detection
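
The K-means idea described above (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in a few lines. This is a simplified version under stated assumptions: the initial centroids are passed in by hand rather than chosen randomly, the number of iterations is fixed, and the points are made up.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, initial_centroids, iterations=10):
    centroids = list(initial_centroids)
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical 2D points forming two visibly separate groups
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
```

Note that no labels are supplied: the grouping comes entirely from the distances between points.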

**`Association Rule Mining`**
- **Task**: Discovering interesting relationships between items in a dataset.
- **Techniques**:
    - Apriori algorithm (Finds frequent itemsets and generates association rules based on support, confidence, and lift.)
    - FP-growth algorithm (An efficient algorithm for frequent itemset mining.)
- **Applications**:
    - Market basket analysis
    - Recommendation systems

**`Dimensionality Reduction`**
- **Task**: Reducing the number of features in a dataset while preserving important information.
- **Techniques**:
    - Principal Component Analysis (PCA) (Finds a new set of uncorrelated variables (principal components) that capture the most variance in the data.)
    - t-SNE (t-Distributed Stochastic Neighbor Embedding) (Preserves local structure in the data while mapping high-dimensional data to a lower-dimensional space.)
    - UMAP (Uniform Manifold Approximation and Projection) (A more recent technique that is often faster and better preserves global structure than t-SNE.)
- **Applications**:
    - Data visualization
    - Feature engineering
    - Noise reduction
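
As a small illustration of PCA, the sketch below finds only the first principal component by running power iteration on the covariance matrix, then projects the data onto it. Power iteration is one simple way to obtain the leading eigenvector and is an assumption of this example; library implementations typically use a singular value decomposition instead. The data is made up and lies exactly on a line so that one component captures all the variance.

```python
import math

def first_principal_component(data, iterations=100):
    """Return (unit direction of maximum variance, 1-D projection of the data)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    # Assumes the data has nonzero variance and the start vector is not
    # orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    projected = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projected

# Hypothetical 2D data lying exactly on the line y = 2x
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component, projected = first_principal_component(data)
```

Here two features are reduced to a single coordinate per point without losing information, because the second feature is fully determined by the first.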

**`Anomaly Detection`**
- **Task**: Identifying data points that deviate significantly from normal patterns or expected behavior.
- **Techniques**:
    - Autoencoders (Neural networks trained to reconstruct input data. Anomalies are detected when the reconstruction error is high.)
    - Recurrent neural networks (RNNs) (Used for time series data, RNNs can learn patterns and detect anomalies based on temporal dependencies.)
    - Time series forecasting models (Can be used to predict future values and identify anomalies based on deviations from the predicted values.)
- **Applications**:
    - Fraud detection: Identifying unusual transactions or activities.
    - Network intrusion detection: Detecting malicious network traffic.
    - Quality control: Identifying defective products.
    - Medical diagnosis: Detecting anomalies in medical data (e.g., heart rate, blood pressure).
    - Sensor data analysis: Identifying sensor failures or unusual readings.
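
The forecasting-based approach can be sketched without any neural network: predict each value from the recent past and flag points that deviate too far from the prediction. In the example below, the "forecast" is simply the median of the previous few readings, which is a deliberately naive stand-in for a real forecasting model. The window size, threshold, and sensor readings are assumptions chosen for illustration.

```python
import statistics

def detect_anomalies(series, window=3, threshold=3.0):
    """Return indices whose value differs from the rolling-median forecast
    by more than `threshold`."""
    anomalies = []
    for i in range(window, len(series)):
        # Median rather than mean, so one earlier spike does not distort
        # the forecasts for the points that follow it.
        forecast = statistics.median(series[i - window:i])
        if abs(series[i] - forecast) > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical sensor readings with one obvious spike at index 4
readings = [10.0, 10.5, 9.8, 10.2, 25.0, 10.1, 9.9, 10.3]
anomalous_indices = detect_anomalies(readings)
```

The same pattern applies to the autoencoder approach, with reconstruction error taking the place of forecast error.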

**In summary:**

- **Clustering** groups similar data points together.
- **Association rule mining** finds relationships between items.
- **Dimensionality reduction** simplifies data while preserving key information.
- **Anomaly detection** identifies data points that deviate from normal patterns.

---

#### **Reinforcement Learning**
- **Data**: Rewards or punishments based on actions taken in an environment.
- **Goal**: Learn a policy to maximize rewards over time.
- **Process**: The agent interacts with an environment, taking actions and receiving feedback (rewards or punishments). It learns to make decisions that maximize rewards.
- **Examples**:
    - Game playing (e.g., AlphaGo, AlphaZero)
    - Robotics (learning to control robots)
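
The agent-environment loop described above can be illustrated with tabular Q-learning, one common reinforcement learning algorithm (chosen here as an example; the notes above do not prescribe a specific method). The environment is a made-up corridor of five states in which the agent starts at the left end and receives a reward of 1 for reaching the right end. The state layout, reward, and hyperparameters are all assumptions for this sketch.

```python
import random

N_STATES = 5         # states 0..4 on a line; state 4 is the goal
ACTIONS = (-1, +1)   # action index 0 = step left, index 1 = step right

def step(state, action_index):
    """Hypothetical environment: move along the corridor, reward 1 at the goal."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes (and on ties),
            # otherwise take the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update toward reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train()
# Greedy policy for the non-terminal states
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
```

The agent is never told that moving right is correct; it infers this from the rewards it collects over many episodes.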

<img src="../../pandas/images/reinforcementLearning.png" width="200">

**Supervised learning** is like a teacher guiding a student, providing correct answers and feedback.<br>
**Unsupervised learning** is like exploring a new territory without a map, trying to find patterns and structures.<br>
**Reinforcement learning** is like learning through trial and error, receiving rewards or punishments for actions.<br>