### <b><u>Unsupervised learning</u></b>

Unsupervised learning is a type of machine learning where the algorithm is trained on data without labeled responses. The goal is to uncover hidden patterns or intrinsic structure in the input data. It is particularly useful when labels are unavailable or too expensive to collect, so supervised techniques cannot be applied directly. Here are some key concepts, techniques, and applications of unsupervised learning:

### Key Concepts

1. **Clustering**: Grouping a set of objects in such a way that objects in the same group (or cluster) are more similar to each other than to those in other groups. Common clustering algorithms include:
   - **K-means Clustering**: Partitions the data into \( K \) clusters by minimizing the within-cluster variance, that is, the sum of squared distances from each point to the centroid of its cluster.
   - **Hierarchical Clustering**: Builds a hierarchy of clusters either by a bottom-up (agglomerative) or top-down (divisive) approach.
   - **DBSCAN (Density-Based Spatial Clustering of Applications with Noise)**: Forms clusters based on the density of points in the data space, capable of finding arbitrarily shaped clusters and handling noise.

2. **Dimensionality Reduction**: Reducing the number of random variables under consideration by obtaining a set of principal variables. Techniques include:
   - **Principal Component Analysis (PCA)**: Projects the data onto a lower-dimensional space spanned by the orthogonal directions (principal components) along which the variance of the data is greatest.
   - **t-Distributed Stochastic Neighbor Embedding (t-SNE)**: Reduces dimensionality while preserving the local structure of data, useful for visualization in 2D or 3D.
   - **Autoencoders**: Neural networks that learn a compact encoding of the input by compressing it through a low-dimensional bottleneck and then reconstructing it, which encourages the network to keep the informative structure and discard noise.

3. **Association Rule Learning**: Identifying interesting relations between variables in large databases. Techniques include:
   - **Apriori Algorithm**: Identifies frequent itemsets and constructs association rules from them.
   - **Eclat Algorithm**: Uses a depth-first search approach to find frequent itemsets.

4. **Anomaly Detection**: Identifying rare items, events, or observations which raise suspicions by differing significantly from the majority of the data. Techniques include:
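   - **Isolation Forest**: Isolates observations by recursively splitting the data on randomly chosen features and thresholds; anomalies tend to be separated from the rest after fewer splits than normal points.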
   - **One-Class SVM**: Identifies the boundary of the normal data points in the feature space and classifies points outside this boundary as anomalies.
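
To make the association rule learning item above more concrete, here is a minimal, standard-library-only sketch of Apriori-style frequent itemset mining. The function names `frequent_itemsets` and `confidence`, the `min_support` count threshold, and the toy grocery transactions are assumptions made for illustration; they are not taken from any particular library.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates, then prune any
        # candidate that has an infrequent k-item subset (the Apriori property).
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent, from support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical toy data: four shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=2)
rule_confidence = confidence(frequent, {"bread"}, {"milk"})
```

Here `rule_confidence` is the fraction of baskets containing bread that also contain milk. Real datasets usually call for a dedicated package (for example `arules` in R), since this sketch rescans every transaction for every candidate.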

### Techniques

1. **K-means Clustering**:
   - Initialize \( K \) centroids randomly.
   - Assign each data point to the nearest centroid.
   - Recalculate the centroids as the mean of all data points assigned to each centroid.
   - Repeat the assignment and update steps until convergence, that is, until the assignments stop changing or a maximum number of iterations is reached.

2. **Hierarchical Clustering**:
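   - Agglomerative: Start with each data point as a single cluster and iteratively merge the two closest clusters, where "closest" is defined by a linkage criterion such as single, complete, or average linkage.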
   - Divisive: Start with all data points in one cluster and iteratively split the cluster into smaller clusters.

3. **Principal Component Analysis (PCA)**:
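   - Standardize the data: center each feature on its mean and, when features are on different scales, rescale to unit variance.
   - Compute the covariance matrix of the standardized data.
   - Perform eigenvalue decomposition of the covariance matrix; the eigenvectors are the principal components and each eigenvalue gives the variance captured along its component.
   - Project the data onto the components with the largest eigenvalues.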

4. **t-SNE**:
   - Compute pairwise similarities in the high-dimensional space (using Gaussian kernels).
   - Compute pairwise similarities in the low-dimensional space (using a heavy-tailed Student's t-distribution).
   - Minimize the Kullback-Leibler divergence between these two similarity distributions using gradient descent.

5. **Autoencoders**:
   - Train a neural network to map the input to a lower-dimensional latent space (encoding).
   - Reconstruct the input from the latent space (decoding).
   - Minimize the reconstruction error.
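
The K-means and PCA steps listed above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not an optimized implementation: the function names `kmeans` and `first_principal_component`, the fixed `seed`, and the toy data are all made up for this example. The PCA part only centers the data (it does not rescale to unit variance), recovers just the first component, and uses power iteration to approximate the leading eigenvector instead of a full eigenvalue decomposition.

```python
import math
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, assignments) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize K centroids from the data
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:  # converged: nothing changed
            break
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


def first_principal_component(rows, iterations=200):
    """Return (unit direction, projected scores) for the first component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply cov and renormalize. Assumes the data
    # is not constant and the start vector is not orthogonal to the answer.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical toy data: two well-separated blobs, and points on the line y = 2x.
blobs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(blobs, k=2)

line = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, scores = first_principal_component(line)
```

On the blob data the two groups of four points end up in separate clusters, and on the line data `direction` points along the line. For real workloads a library implementation, such as those in scikit-learn, is normally the better choice.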

### Applications

1. **Customer Segmentation**: Grouping customers based on purchasing behavior for targeted marketing.
2. **Image Compression**: Reducing the size of image files while preserving most of the visual quality, using lossy techniques like PCA and autoencoders.
3. **Anomaly Detection**: Identifying fraudulent transactions, network intrusions, or rare diseases.
4. **Market Basket Analysis**: Discovering associations between products in transaction data for cross-selling strategies.
5. **Document Clustering**: Organizing large sets of documents into meaningful clusters for easier retrieval and summarization.
6. **DBSCAN**: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm used in machine learning and data mining. It is designed to identify clusters of varying shapes and sizes in data that contains noise and outliers. Unlike other clustering methods such as k-means, DBSCAN does not require specifying the number of clusters in advance. Instead, it is controlled by two parameters: a neighborhood radius and the minimum number of points needed to form a dense region.
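
The DBSCAN behavior described above can be illustrated with a small brute-force sketch. The parameter names `eps` (neighborhood radius) and `min_pts` (minimum number of neighbors, counting the point itself, for a point to be a core point) follow common usage but are assumptions here, as are the function name `dbscan` and the toy points. The neighbor search is quadratic in the number of points, so this is only suitable for small examples.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    NOISE = -1

    def neighbors(i):
        # Indices of all points within distance eps of point i (including i).
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps
        ]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_nbrs)
    return labels


# Hypothetical toy data: two dense groups plus one isolated point.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (50, 50),
]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Note that the number of clusters is never passed in: it emerges from `eps` and `min_pts`, and the isolated point is labeled as noise instead of being forced into a cluster.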

### Tools and Libraries

- **Python**: Libraries like scikit-learn, TensorFlow, Keras, and PyTorch provide implementations of unsupervised learning algorithms.
- **R**: Packages such as `cluster`, `factoextra`, and `arules`.
- **MATLAB**: Functions and toolboxes for clustering, dimensionality reduction, and other unsupervised techniques.

Unsupervised learning is essential for exploratory data analysis, pattern recognition, and knowledge discovery in datasets without predefined labels. By leveraging these techniques, you can gain insights into the underlying structure of your data and make informed decisions based on these insights.

<img src="https://media.geeksforgeeks.org/wp-content/uploads/20231124111325/Unsupervised-learning.png" width="275" height="200">
<img src="https://www.bombaysoftwares.com/_next/image?url=https%3A%2F%2Fbs-cms-media-prod.s3.ap-south-1.amazonaws.com%2FUnsupervised_Learning_951a02cb59.jpg&w=3840&q=75" width="275" height="200">
<img src="https://cdn-images-1.medium.com/v2/resize:fit:1600/1*C7meM0f0_oy8RPM9HDb5gg.png" width="275" height="200">
<img src="https://eastgate-software.com/wp-content/uploads/2023/10/Unsupervised-Learning-Clustering.png" width="275" height="200">