In [None]:
Q1. Explain the basic concept of clustering and give examples of applications where clustering is useful.
Ans:
Clustering is a fundamental concept in machine learning and data analysis that involves grouping similar data points together based on their inherent characteristics or patterns. 
The goal of clustering is to identify clusters or subgroups within a dataset, where data points within the same cluster are more similar to each other than to those in other clusters. 
Clustering allows for the exploration, organization, and understanding of complex data by revealing underlying structures and relationships.

Here are some examples of applications where clustering is useful:

1. Customer Segmentation: Clustering is often employed to segment customers based on their purchasing behavior, demographics, or preferences. 
This helps businesses tailor marketing strategies, personalize offerings, and understand different customer segments to optimize customer satisfaction and maximize revenue.

2. Image Segmentation: Clustering can be utilized to segment images into meaningful regions based on similarity in color, texture, or other visual attributes.
It finds applications in object recognition, computer vision, and medical imaging, where identifying distinct regions of interest is crucial.

3. Document Clustering: Clustering can group similar documents together based on their content, allowing for document organization, topic discovery, and information retrieval.
It is commonly used in text mining, document classification, and recommendation systems.

4. Anomaly Detection: Clustering can help identify anomalies or outliers in a dataset by considering data points that do not conform to the majority patterns or clusters. 
Anomalies can be indicative of fraudulent activities, network intrusions, or malfunctioning equipment in various domains such as finance, cybersecurity, and industrial monitoring.

5. Social Network Analysis: Clustering can be applied to analyze social networks and identify communities or groups of individuals with similar interests or connections. 
It helps in understanding social structures, influence propagation, and targeted marketing strategies.

6. Gene Expression Analysis: Clustering is used to group genes or samples with similar expression profiles in genomic studies. 
It enables the discovery of gene clusters associated with specific diseases, biomarkers, or treatment responses, aiding in personalized medicine and drug development.

7. Market Segmentation: Clustering can assist in market research and segmentation by grouping products or services based on similar features, pricing, or customer preferences. 
This enables companies to target specific market segments and develop effective marketing strategies.

8. Geographic Data Analysis: Clustering can help identify spatial patterns and groupings in geographic data such as urban planning, crime analysis, or ecological studies. 
It aids in understanding regional characteristics, resource allocation, and decision-making processes.

These examples demonstrate the broad applicability of clustering across various domains, highlighting its utility in data exploration, pattern recognition, and decision support. 
Clustering provides valuable insights into complex datasets and empowers data-driven decision-making processes.
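
As a concrete illustration of the basic idea, here is a minimal sketch that groups synthetic two-dimensional points with k-means from scikit-learn. The dataset, the choice of three clusters, and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 300 points drawn around 3 centers (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Group the points into 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Each point now carries a cluster label; count the members of each cluster
for cluster_id, size in zip(*np.unique(labels, return_counts=True)):
    print(f"Cluster {cluster_id}: {size} points")
```

In a real application such as customer segmentation, the rows of X would be feature vectors (for example, purchase statistics) rather than synthetic points.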

In [None]:
Q2. What is DBSCAN and how does it differ from other clustering algorithms such as k-means and
hierarchical clustering?
Ans:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that groups together data points based on their density and proximity. 
It differs from other clustering algorithms such as k-means and hierarchical clustering in several key ways:

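1. Handling Arbitrary-Shaped Clusters: DBSCAN can identify clusters of arbitrary shape, whereas k-means implicitly favors roughly spherical (convex) clusters. 
For hierarchical clustering the behavior depends on the linkage: Ward, complete, and average linkage favor compact clusters, while single linkage can follow elongated shapes but is sensitive to noise. 
DBSCAN discovers irregularly shaped clusters as long as they are dense and separated by sparser regions. Because it applies a single global density threshold, it tends to merge clusters that touch or overlap and has trouble when clusters differ strongly in density.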

2. No Assumption of Cluster Number: Unlike k-means, DBSCAN does not require specifying the number of clusters in advance.
It automatically determines the number of clusters based on the density and proximity of data points, making it more suitable for datasets where the number of clusters is unknown or variable.

3. Ability to Detect Outliers: DBSCAN is effective at identifying outliers or noise points in the data. 
It labels data points that do not belong to any cluster as noise, allowing for the detection of anomalies or unusual data instances.

4. Parameter Sensitivity: DBSCAN has two important parameters: "epsilon" (ε), which defines the radius of the neighborhood around each point,
and "minPts," which sets the minimum number of points that neighborhood must contain for the point to count as a core point (the seed of a dense region). 
The clustering result is sensitive to both values, and finding good values may require some experimentation and domain knowledge.

5. Density-Based Clustering Approach: DBSCAN operates based on the concept of density.
It defines clusters as regions of high-density separated by regions of low-density.
It aims to find dense areas and connect neighboring points based on their density, disregarding sparser areas. 
In contrast, k-means focuses on minimizing the distance between centroids and data points, 
while hierarchical clustering builds a hierarchical structure by repeatedly merging or splitting clusters based on distance or similarity measures.

6. Computation Efficiency: DBSCAN can be more computationally efficient than hierarchical clustering, especially for large datasets. 
Standard agglomerative clustering works from the distances between all pairs of data points, which costs O(n²) memory.
DBSCAN only needs the neighbors within ε of each point; with a spatial index on low-dimensional data this is roughly O(n log n), although the worst case (no usable index, or high dimensions) is still O(n²).

DBSCAN is particularly effective for datasets with non-convex cluster shapes, noise, and an unknown number of clusters. 
It is robust to outliers, but it still depends on a distance metric and on a single global density threshold (ε and minPts).
As a result, it may struggle with high-dimensional data and when clusters have significantly different densities.
Proper parameter selection is crucial to achieving good clustering results with DBSCAN.
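
The differences listed above can be checked empirically. The following is a minimal sketch, assuming scikit-learn and a synthetic two-moons dataset; the parameter values (eps=0.2, min_samples=5, k=2) are illustrative choices, not recommendations.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: a standard non-convex test case
X, y_true = make_moons(n_samples=200, noise=0.05, random_state=42)

models = {
    "k-means (k=2)": KMeans(n_clusters=2, n_init=10, random_state=42),
    "hierarchical (Ward, k=2)": AgglomerativeClustering(n_clusters=2, linkage="ward"),
    "hierarchical (single, k=2)": AgglomerativeClustering(n_clusters=2, linkage="single"),
    "DBSCAN (eps=0.2, min_samples=5)": DBSCAN(eps=0.2, min_samples=5),
}

# Adjusted Rand index: 1.0 means perfect agreement with the true moons
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = adjusted_rand_score(y_true, labels)
    print(f"{name}: ARI = {scores[name]:.2f}")
```

Note that only DBSCAN runs without being told the number of clusters; the other three models receive k=2 as an input.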

In [None]:
Q3. How do you determine the optimal values for the epsilon and minimum points parameters in DBSCAN
clustering?
Ans:
Determining the optimal values for the epsilon (ε) and minimum points (minPts) parameters in DBSCAN clustering can be a challenging task. 
The optimal values depend on the specific characteristics of the dataset, such as the density and distribution of the data points, as well as the desired clustering outcomes. 
Here are a few approaches that can be helpful in determining the optimal parameter values:

1. Visual Inspection: Plotting the data points and examining the distribution and density can provide insights into suitable parameter values. 
By visually analyzing the dataset, you can estimate the appropriate neighborhood size (epsilon) that captures the local density and 
identify the minimum number of points (minPts) required to consider a region as a dense region.

2. k-Distance Plot (Elbow Method): Although the elbow method is commonly used for determining the optimal number of clusters in k-means clustering, 
a similar idea can be applied to DBSCAN to find an appropriate value for epsilon. 
The idea is to compute the distance to the kth nearest neighbor for each data point (with k usually tied to minPts), sort these distances in ascending order, and plot them. 
The "knee" where the curve starts to rise sharply separates points in dense regions from likely noise, and the distance at that knee is a reasonable starting value for epsilon.

3. Reachability Plot: The reachability plot comes from OPTICS, a density-based algorithm closely related to DBSCAN, and is a useful tool for analyzing the density structure of the dataset. 
OPTICS orders the data points so that points in the same dense region are adjacent and plots each point's reachability distance (roughly, the smallest radius at which the point becomes density-reachable from an already processed core point) against that ordering. 
Valleys in the plot correspond to clusters and peaks to the sparser regions between them, which can guide the selection of epsilon for a given minPts.

4. Silhouette Score: The silhouette score is a metric that quantifies the quality of clustering results. 
It measures how well each data point fits into its assigned cluster, considering both cohesion (similarity to data points within the cluster) and separation (dissimilarity to data points in other clusters). 
By evaluating the silhouette score for different parameter combinations, you can identify the values that yield the highest overall score. 
For DBSCAN, noise points are usually excluded before scoring, and because the score favors compact, convex clusters, it can understate the quality of a good non-convex clustering.

5. Domain Knowledge and Experimentation: Having domain knowledge about the dataset can guide the selection of appropriate parameter values. 
Understanding the underlying characteristics, the expected cluster density, and the specific requirements of the problem can provide valuable insights.
Additionally, conducting iterative experiments by trying different combinations of epsilon and minPts and evaluating the clustering results can help fine-tune the parameter values.

It's important to note that determining the optimal parameter values in DBSCAN may require some trial and error and domain expertise. 
The choice of parameter values can significantly impact the clustering results, so it's recommended to assess the stability and 
consistency of the clustering outcomes across different parameter settings and consider the interpretability and meaningfulness of the resulting clusters.
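
The k-distance approach described above can be sketched as follows. This is a minimal example, assuming scikit-learn's NearestNeighbors and a synthetic two-moons dataset; min_pts = 5 is an assumed candidate value, and the percentiles are printed only to summarize the curve without requiring a plotting library.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)

min_pts = 5  # candidate minPts; k is tied to it
# Querying the training data returns each point as its own nearest neighbor
# (distance 0), so the last column is the distance to the min_pts-th point
# when the point itself is counted, matching scikit-learn's min_samples.
nn = NearestNeighbors(n_neighbors=min_pts).fit(X)
distances, _ = nn.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# Summarize the sorted curve; plotting k_distances against its index
# shows the knee directly.
for q in (50, 75, 90, 95, 99):
    print(f"{q}th percentile of k-distance: {np.percentile(k_distances, q):.3f}")
```

A value of epsilon slightly above the bulk of the curve, just before it turns upward, is a common starting point; it should still be checked against the resulting clusters.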

In [None]:
Q4. How does DBSCAN clustering handle outliers in a dataset?
Ans:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering handles outliers in a dataset by classifying them as noise points or outliers. 
This is one of the advantages of DBSCAN compared to other clustering algorithms.

In DBSCAN, outliers are data points that do not belong to any dense region or cluster. 
These points are typically located in sparser areas of the dataset or far away from any cluster.
DBSCAN identifies outliers by considering the density of data points and their proximity to other points.

The algorithm classifies points into three categories:

1. Core Points: Core points are data points that have a sufficient number of neighboring points within a specified radius (epsilon, ε). 
These core points are considered the foundation of a cluster and play a crucial role in determining cluster membership.

2. Border Points: Border points are data points that are within the proximity of a core point but do not have enough neighboring points to be considered core points themselves. 
Border points are assigned to the cluster of their corresponding core point.

3. Noise Points/Outliers: Noise points, also known as outliers, are data points that are neither core points nor border points.
These points have too few neighbors within the specified radius to be core points, and they do not lie within the radius of any core point. 
DBSCAN labels them as noise points or outliers.

By classifying outliers separately from the clusters, DBSCAN provides flexibility in dealing with datasets that contain noise or irregularly shaped clusters. 
The identification of outliers can be useful for various applications, including anomaly detection, identifying unusual data instances, or removing noise from the dataset.

It's important to note that the detection of outliers in DBSCAN depends on the parameter choices, specifically the epsilon (ε) and minimum points (minPts) parameters. 
Adjusting these parameters can influence the sensitivity to outliers. 
For example, decreasing the epsilon value (or increasing minPts) typically results in more points being labeled as noise, while increasing epsilon may cause some outliers to be absorbed into clusters.

When using DBSCAN, it is recommended to analyze and interpret the outliers in the context of the dataset and the specific problem at hand. 
Depending on the application, outliers may be treated differently, such as investigating their nature, assessing their impact on the clustering results, 
or even considering them as valuable data points with unique characteristics.
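
The three point categories can be read off a fitted model. The following is a minimal sketch, assuming scikit-learn's DBSCAN, which exposes the core points through `core_sample_indices_` and marks noise with the label -1; the two appended outliers and the parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)
# Append two isolated points, far from the moons and from each other
X = np.vstack([X, [[5.0, 5.0], [-5.0, -5.0]]])

db = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = db.labels_

core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True
noise_mask = labels == -1               # scikit-learn marks noise with -1
border_mask = ~core_mask & ~noise_mask  # in a cluster, but not a core point

print(f"core: {core_mask.sum()}, border: {border_mask.sum()}, noise: {noise_mask.sum()}")
print("labels of the two appended points:", labels[-2:])
```

The appended points have no neighbors within eps, so they can be neither core nor border points and are reported as noise.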

In [None]:
Q5. How does DBSCAN clustering differ from k-means clustering?
Ans:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering and k-means clustering are two distinct clustering algorithms with different approaches and characteristics. 
Here are the key differences between DBSCAN and k-means clustering:

1. Clustering Approach:
   - DBSCAN: DBSCAN is a density-based clustering algorithm. 
It groups data points based on their density and proximity, forming clusters where data points are closely packed together. 
It can discover clusters of arbitrary shapes and sizes and is robust to noise and outliers.
   - K-means: K-means is a centroid-based clustering algorithm.
    It aims to partition data points into a predefined number of clusters by minimizing the sum of squared distances between data points and the cluster centroids. 
    It assumes that clusters are spherical and equally sized.

2. Number of Clusters:
   - DBSCAN: DBSCAN does not require specifying the number of clusters in advance. 
It automatically determines the number of clusters based on the density and proximity of data points.
   - K-means: K-means requires the number of clusters to be specified before running the algorithm. 
    The number of clusters is a user-defined parameter.

3. Handling Cluster Shapes and Sizes:
   - DBSCAN: DBSCAN can discover clusters of arbitrary shapes and sizes. 
It can identify irregularly shaped clusters as long as they are separated by sparser regions; because it uses one global density threshold, overlapping clusters tend to be merged and clusters with very different densities remain difficult.
   - K-means: K-means assumes clusters to be spherical and equally sized. 
    It struggles to handle clusters with irregular shapes or varying sizes.

4. Treatment of Outliers:
   - DBSCAN: DBSCAN can identify outliers or noise points as data points that do not belong to any dense region or cluster. 
It classifies such points as noise or outliers, providing a separate category for them.
   - K-means: K-means does not explicitly handle outliers. 
    Outliers can have a significant impact on the centroid computation and cluster assignments, potentially leading to suboptimal results.

5. Parameter Sensitivity:
   - DBSCAN: DBSCAN has two important parameters: epsilon (ε), which defines the radius of the neighborhood around each point,
and minimum points (minPts), which sets the minimum number of points that neighborhood must contain for a point to count as a core point. 
Selecting appropriate parameter values can be crucial for obtaining meaningful clustering results.
   - K-means: K-means has a single parameter, the number of clusters, which is specified in advance. 
    While selecting the number of clusters is important, it does not require fine-tuning multiple parameters like DBSCAN.

Overall, DBSCAN is advantageous for discovering clusters with irregular shapes, working without a preset number of clusters, and automatically identifying outliers.
On the other hand, k-means is simpler to implement, computationally efficient, and suitable for datasets where spherical clusters and a known number of clusters are expected.
The choice between DBSCAN and k-means depends on the characteristics of the data, the desired clustering outcomes, and the presence of prior knowledge about the number and shapes of clusters.
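
The difference in outlier treatment is easy to demonstrate. Below is a minimal sketch, assuming scikit-learn and a synthetic dataset of two compact blobs plus one hand-placed outlier; all values are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Two compact blobs plus one extreme outlier appended as the last row
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [5, 5]], cluster_std=0.5, random_state=0)
X = np.vstack([X, [[20.0, -20.0]]])

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

# k-means has no noise label, so the outlier is forced into some cluster
print("k-means label of the outlier:", kmeans_labels[-1])
# DBSCAN reports the isolated point as noise (-1)
print("DBSCAN label of the outlier:", dbscan_labels[-1])
# Number of clusters DBSCAN found on its own (noise excluded)
print("DBSCAN clusters:", len(set(dbscan_labels) - {-1}))
```

Because k-means assigns every point to a centroid, the outlier also pulls on the mean of whichever cluster it joins, whereas DBSCAN leaves it out of all clusters.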

In [None]:
Q6. Can DBSCAN clustering be applied to datasets with high dimensional feature spaces? If so, what are
some potential challenges?
Ans:
DBSCAN clustering can be applied to datasets with high-dimensional feature spaces, but there are some potential challenges associated with using DBSCAN in such cases:

1. Curse of Dimensionality: High-dimensional spaces suffer from the curse of dimensionality, where data points become increasingly sparse. 
As the number of dimensions increases, the available data points become more spread out, making it harder to define dense regions. 
The effectiveness of density-based clustering algorithms like DBSCAN relies on the local density of data points, which can be challenging to determine accurately in high-dimensional spaces.

2. Distance Metric Selection: DBSCAN uses a distance or similarity metric (e.g., Euclidean distance) to measure the proximity between data points. 
In high-dimensional spaces, traditional distance metrics can become less meaningful due to the increased number of dimensions. 
The phenomenon known as "distance concentration" occurs, where distances between data points tend to become similar, making it difficult to differentiate between dense and sparse regions accurately. 
It is crucial to carefully select or adapt the distance metric to account for the peculiarities of high-dimensional data.

3. Dimensionality Reduction: In high-dimensional spaces, it is often helpful to apply dimensionality reduction techniques to reduce the number of dimensions and mitigate the curse of dimensionality. 
Techniques like Principal Component Analysis (PCA) or t-SNE (t-Distributed Stochastic Neighbor Embedding) can be used to project the data onto lower-dimensional spaces while preserving much of the important structural information. 
Note that t-SNE distorts distances and densities, so clusters found on a t-SNE embedding should be interpreted with care. 
By reducing the dimensionality, DBSCAN can potentially perform better by focusing on the most informative features.

4. Parameter Selection: Choosing suitable parameter values for DBSCAN becomes more challenging in high-dimensional spaces. 
The epsilon (ε) parameter, which defines the neighborhood size, needs to be carefully selected as the notion of distance changes with the increased number of dimensions. 
Likewise, determining the appropriate minimum points (minPts) value becomes more complex, as the concept of density changes in high-dimensional spaces. 
It may require experimentation and fine-tuning of parameters to obtain meaningful clustering results.

5. Interpretability and Visualization: High-dimensional data can be difficult to interpret and visualize.
While DBSCAN can cluster the data, understanding and visualizing the resulting clusters become challenging in high-dimensional spaces. 
Effective visualization techniques like dimensionality reduction or feature selection can aid in understanding the clustering outcomes and revealing underlying patterns.

In summary, while DBSCAN can be applied to datasets with high-dimensional feature spaces, it is important to be aware of the challenges associated with the curse of dimensionality,
distance metric selection, parameter sensitivity, and interpretation of results. 
Preprocessing steps such as dimensionality reduction and careful consideration of distance metrics are often necessary to address these challenges and improve the performance and interpretability of DBSCAN in high-dimensional settings.
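
The distance concentration effect mentioned above can be observed numerically. The following minimal sketch, assuming uniformly random points in a unit hypercube, measures how much farther the farthest point is than the nearest one as the dimension grows; the sample size and dimensions are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 1000

contrast = {}
for dim in (2, 10, 100, 1000):
    points = rng.random((n_points, dim))   # uniform in the unit hypercube
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    # Relative contrast: how much farther the farthest point is than the nearest
    contrast[dim] = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:4d}  relative contrast = {contrast[dim]:.2f}")
```

As the contrast shrinks, a fixed epsilon either captures almost no neighbors or almost all of them, which is why choosing epsilon becomes difficult in high dimensions.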

In [None]:
Q7. How does DBSCAN clustering handle clusters with varying densities?
Ans:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering handles clusters with varying densities only to a limited extent. 
It does not assume that clusters have the same size or shape, but it applies a single global density threshold (ε and minPts) to the whole dataset. 
Clusters whose densities all exceed that threshold are found regardless of their shape; when densities differ strongly, no single setting fits all of them.
Here's how the mechanics of DBSCAN relate to varying densities:

1. Density-Based Clustering: DBSCAN identifies clusters based on the notion of density.
It defines a data point as a core point if there are at least a specified number of neighboring points (minPts) within a specified radius (epsilon, ε). 
A neighborhood is considered dense if it contains a sufficient number of neighboring points. 
DBSCAN starts from a core point and expands the cluster by connecting it to other core points that are reachable within the specified radius. 
This mechanism lets a cluster grow through any region whose density meets the threshold, so clusters denser than the threshold are captured no matter how much denser they are.

2. Core Points and Border Points: In DBSCAN, core points are at the heart of forming clusters. 
They have enough neighboring points within the specified radius and are considered dense. 
Border points are data points that are within the specified radius of a core point but do not have enough neighbors to be considered core points themselves.
Border points are still part of the cluster and contribute to its shape, but they may have a lower density compared to core points.

3. Density-Reachability: DBSCAN uses density-reachability to connect points into clusters.
A point q is density-reachable from a core point p if there is a chain of points from p to q in which each point lies within ε of the previous one and every point except possibly q is a core point. 
Clusters therefore cannot bridge regions whose density falls below the threshold; such regions separate clusters or end up as noise.

4. Flexibility in Parameter Selection: DBSCAN's ability to handle varying densities is influenced by the parameter choices of epsilon (ε) and minimum points (minPts). 
The epsilon parameter defines the radius of the neighborhood around each point, 
while the minPts parameter determines the minimum number of neighboring points required for a point to be considered a core point. 
Adjusting these parameters shifts the density threshold. 
Larger epsilon values can capture clusters with lower densities but may merge nearby dense clusters, while smaller values focus on denser regions but may turn sparser clusters into noise.

By considering local density and connectivity, DBSCAN can effectively identify clusters of different shapes and sizes, provided their densities are compatible with a single (ε, minPts) setting. 
When densities vary widely, related algorithms such as OPTICS or HDBSCAN, which consider a range of density levels, are usually a better fit. 
DBSCAN remains a valuable tool for clustering spatial data, detecting anomalies, and grouping data whose clusters are irregularly shaped.
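
To see how a single global epsilon interacts with clusters of different densities, the sketch below runs scikit-learn's DBSCAN on one tight and one diffuse synthetic cluster for several epsilon values. The cluster spreads and the epsilon grid are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# One tight cluster and one diffuse cluster (assumed, illustrative spreads)
X, y_true = make_blobs(
    n_samples=[200, 200],
    centers=[[0, 0], [10, 0]],
    cluster_std=[0.3, 2.0],
    random_state=0,
)

noise_counts = {}
for eps in (0.2, 0.5, 1.0, 2.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels) - {-1})
    noise_counts[eps] = int(np.sum(labels == -1))
    # Share of the diffuse cluster's points that ended up as noise
    sparse_noise = np.mean(labels[y_true == 1] == -1)
    print(f"eps={eps}: clusters={n_clusters}, noise={noise_counts[eps]}, "
          f"diffuse points labeled noise={sparse_noise:.0%}")
```

For a fixed min_samples, the number of noise points can only stay the same or fall as eps grows, so the output shows the trade-off directly: the setting that suits the tight cluster treats much of the diffuse one as noise.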

In [None]:
Q8. What are some common evaluation metrics used to assess the quality of DBSCAN clustering results?
Ans:
There are several evaluation metrics commonly used to assess the quality of DBSCAN clustering results. 
These metrics provide quantitative measures of how well the clustering algorithm has performed in terms of the compactness, separation, and consistency of the resulting clusters. 
Here are some common evaluation metrics for DBSCAN clustering:

1. Silhouette Coefficient: The Silhouette Coefficient measures the compactness and separation of clusters. 
It considers both the average intra-cluster distance and the average nearest-cluster distance for each data point. 
The coefficient ranges from -1 to 1, with values closer to 1 indicating well-separated and compact clusters, 
values around 0 indicating overlapping or poorly separated clusters, and values close to -1 indicating incorrect clustering.

2. Davies-Bouldin Index (DBI): The DBI evaluates the compactness and separation of clusters similar to the Silhouette Coefficient. 
It measures the average similarity between clusters while considering their spread. 
A lower DBI value indicates better clustering, with values closer to 0 indicating compact, well-separated clusters.

3. Calinski-Harabasz Index: The Calinski-Harabasz Index measures the ratio of between-cluster dispersion to within-cluster dispersion. 
It assesses the compactness and separation of clusters, with higher index values indicating better-defined clusters.

4. Dunn Index: The Dunn Index evaluates the compactness and separation of clusters by considering both the minimum inter-cluster distance and the maximum intra-cluster distance. 
It aims to maximize the inter-cluster distance while minimizing the intra-cluster distance.
Higher Dunn Index values correspond to better clustering results.

5. Rand Index: The Rand Index measures the similarity between the clustering results and a reference clustering (if available). 
It compares the pairwise agreements between the true and predicted cluster assignments.
A higher Rand Index value indicates better agreement between the true and predicted clusters.

6. Jaccard Index: The Jaccard Index is another similarity-based measure that compares the similarity between the true and predicted cluster assignments. 
It considers the number of shared data points between clusters. 
Higher Jaccard Index values indicate better clustering agreement.

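It's important to note that evaluation metrics alone may not provide a complete assessment of clustering quality. 
Internal metrics such as the Silhouette Coefficient, DBI, and Calinski-Harabasz Index favor compact, convex clusters and have no notion of noise, so for DBSCAN they are usually computed on the clustered points only and can undervalue correct non-convex clusters. 
The choice of evaluation metric depends on the specific characteristics of the data, the clustering goals, and the availability of ground truth or reference clustering. 
It is often recommended to use a combination of metrics and also visually inspect the resulting clusters to gain a comprehensive understanding of the clustering performance.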
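
Several of these metrics are available in scikit-learn. The following is a minimal sketch, assuming a synthetic three-blob dataset with known labels; it excludes noise points before computing the internal metrics and uses the adjusted Rand index (a chance-corrected variant of the Rand Index) as the external metric. The Dunn Index is omitted because scikit-learn does not provide it.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    adjusted_rand_score,
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

X, y_true = make_blobs(
    n_samples=300, centers=[[0, 0], [6, 6], [-6, 6]], cluster_std=0.5, random_state=0
)
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Internal metrics are computed on clustered points only (noise excluded)
clustered = labels != -1
n_clusters = len(set(labels[clustered]))
print(f"clusters: {n_clusters}, noise points: {np.sum(~clustered)}")

if n_clusters >= 2:  # these scores are undefined for fewer than two clusters
    sil = silhouette_score(X[clustered], labels[clustered])
    dbi = davies_bouldin_score(X[clustered], labels[clustered])
    ch = calinski_harabasz_score(X[clustered], labels[clustered])
    print(f"silhouette={sil:.3f}  Davies-Bouldin={dbi:.3f}  Calinski-Harabasz={ch:.1f}")

# External metric: needs ground-truth labels; noise is kept as its own group
ari = adjusted_rand_score(y_true, labels)
print(f"adjusted Rand index={ari:.3f}")
```

Reporting the number of excluded noise points alongside the scores matters, since a setting that discards most points as noise can otherwise look deceptively good.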

In [None]:
Q9. Can DBSCAN clustering be used for semi-supervised learning tasks?
Ans:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering is primarily an unsupervised learning algorithm designed to identify clusters in unlabeled data. 
However, DBSCAN can also be utilized in semi-supervised learning tasks by incorporating limited labeled information into the clustering process.
Here are a few ways DBSCAN can be used in semi-supervised learning:

1. Cluster-based labeling: After applying DBSCAN clustering on the unlabeled data, the resulting clusters can be used to infer labels for the unannotated data points. 
This assumes that data points within the same cluster share similar characteristics or belong to the same class. 
The labels of the labeled data points within a cluster can be propagated to the unlabeled points in that cluster, effectively assigning them labels.

2. Seed-based labeling: In semi-supervised variants of DBSCAN, the clustering can be initialized with a small number of labeled seed points. 
These seed points act as anchor points with known labels. 
Such a variant then expands the clusters by considering both the density-based neighborhood relationships and the labels of the seed points. 
The labels of the seed points can influence the assignment of labels to nearby unlabeled points during the clustering process.

3. Incorporating labeled points as constraints: DBSCAN can be modified to incorporate labeled points as constraints during the clustering process. 
Labeled points are treated as "must-link" or "cannot-link" constraints, indicating whether they should be assigned to the same cluster or different clusters. 
This modification guides the clustering algorithm to respect the known labels while discovering clusters.

While DBSCAN can be adapted for semi-supervised learning tasks, it's important to note that it is primarily designed for unsupervised clustering. 
Its effectiveness in semi-supervised scenarios depends on the availability and quality of labeled data, as well as the characteristics of the dataset and the underlying assumptions of the clustering algorithm. 
In some cases, other semi-supervised learning algorithms specifically designed for utilizing labeled and unlabeled data, such as co-training or self-training approaches, may be more suitable.
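
The cluster-based labeling idea (approach 1) can be sketched as follows. This is a minimal illustration, assuming scikit-learn's DBSCAN, a synthetic two-blob dataset, and a majority vote within each cluster; the names `y_partial` and `y_propagated` and the choice of three labeled points per class are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, y_true = make_blobs(
    n_samples=200, centers=[[0, 0], [6, 6]], cluster_std=0.5, random_state=0
)

# Pretend only three points per class are labeled; -1 means "unlabeled"
y_partial = np.full(len(X), -1)
for cls in (0, 1):
    known = np.flatnonzero(y_true == cls)[:3]
    y_partial[known] = cls

cluster_ids = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

# Propagate the majority known label within each cluster
y_propagated = y_partial.copy()
for cid in set(cluster_ids) - {-1}:       # skip DBSCAN noise
    members = cluster_ids == cid
    known_labels = y_partial[members & (y_partial != -1)]
    if known_labels.size:                 # cluster has at least one labeled point
        y_propagated[members] = np.bincount(known_labels).argmax()

assigned = y_propagated != -1
print(f"labeled before: {np.sum(y_partial != -1)}, after: {assigned.sum()}")
print(f"agreement with true labels on assigned points: "
      f"{np.mean(y_propagated[assigned] == y_true[assigned]):.2f}")
```

Noise points and clusters without any labeled member stay unlabeled, which is a deliberate design choice here: the sketch only propagates labels where the clustering offers some evidence.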

In [None]:
Q10. How does DBSCAN clustering handle datasets with noise or missing values?
Ans:
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering handles noise natively, but missing values must be dealt with before clustering.
Here's how DBSCAN handles these scenarios:

1. Noise Handling: DBSCAN is designed to handle noise or outliers in the data. 
It identifies data points that do not belong to any cluster as noise points. 
Noise points are typically isolated and have low density, meaning they do not have enough neighboring points within the specified radius (epsilon, ε) to be considered core points.
DBSCAN explicitly distinguishes noise points from the clustered data, allowing you to identify and exclude them from the resulting clusters if desired.

2. Missing Values: DBSCAN does not handle missing values by itself, so datasets with missing values require some preprocessing steps. 
Since DBSCAN relies on distance or similarity measures, a distance involving a missing value is undefined. 
Here are two common approaches to handle missing values before applying DBSCAN:

   a. Data Imputation: One approach is to impute the missing values before applying DBSCAN. 
There are various techniques available for imputing missing values, such as mean imputation, median imputation, or more sophisticated methods like k-nearest neighbors (KNN) imputation. 
By imputing the missing values, you can ensure that the distance calculations in DBSCAN are performed properly.

   b. Exclude Missing Values: Another approach is to exclude data points with missing values from the DBSCAN analysis. 
    This approach is suitable when only a small fraction of the data points have missing values, or when imputation is not appropriate or feasible. 
    In this case, you can remove the data points with missing values before clustering and report them separately from the clustered points.

It's important to note that the effectiveness of handling noise or missing values in DBSCAN depends on the nature and extent of the noise or missing data in the dataset, as well as the chosen approach for handling them. 
Preprocessing steps such as data imputation or handling missing values appropriately are crucial to ensure accurate and meaningful clustering results.
Additionally, approaches designed to tolerate missing values,
such as clustering based on probabilistic models, may be more suitable for datasets with substantial missing values.
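
Both preprocessing options can be sketched with scikit-learn. This is a minimal example, assuming a synthetic two-blob dataset with about 5% of the entries removed at random and median imputation via SimpleImputer; the missingness rate and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.impute import SimpleImputer

X, _ = make_blobs(
    n_samples=200, centers=[[0, 0], [6, 6]], cluster_std=0.5, random_state=0
)

# Knock out about 5% of the entries at random (synthetic missingness)
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.05] = np.nan

# DBSCAN itself rejects NaN input
try:
    DBSCAN(eps=0.8, min_samples=5).fit(X_missing)
    rejected_nan = False
except ValueError as err:
    rejected_nan = True
    print("DBSCAN on raw data failed:", type(err).__name__)

# Option (a): impute, then cluster
X_imputed = SimpleImputer(strategy="median").fit_transform(X_missing)
labels_imputed = DBSCAN(eps=0.8, min_samples=5).fit_predict(X_imputed)

# Option (b): drop incomplete rows, then cluster the rest
complete_rows = ~np.isnan(X_missing).any(axis=1)
labels_complete = DBSCAN(eps=0.8, min_samples=5).fit_predict(X_missing[complete_rows])

print("rows with missing values:", np.sum(~complete_rows))
print("noise after imputation:", np.sum(labels_imputed == -1))
print("noise after dropping rows:", np.sum(labels_complete == -1))
```

With simple column-wise imputation, a filled-in value may place a point away from its true cluster, so imputed rows can show up as noise; comparing the two options on the same data is a cheap sanity check.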

In [None]:
Q11. Implement the DBSCAN algorithm using the Python programming language, and apply it to a sample
dataset. Discuss the clustering results and interpret the meaning of the obtained clusters.
Ans:

In [3]:
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import NearestNeighbors

def dbscan(X, eps, min_samples):
    """Cluster X with DBSCAN; returns labels (1, 2, ... for clusters, -1 for noise)."""
    n_samples = X.shape[0]
    labels = np.zeros(n_samples, dtype=int)  # 0 = not yet assigned
    visited = np.zeros(n_samples, dtype=bool)  # Keep track of visited points

    # Fit the neighbor index once instead of once per query
    nbrs = NearestNeighbors(radius=eps).fit(X)

    cluster_label = 0

    # Find core points and grow a cluster from each unvisited one
    for i in range(n_samples):
        if visited[i]:
            continue
        visited[i] = True

        neighbors = region_query(nbrs, X, i)
        if len(neighbors) < min_samples:
            labels[i] = -1  # Mark as noise (may later be claimed as a border point)
        else:
            cluster_label += 1
            expand_cluster(nbrs, X, i, neighbors, labels, visited, cluster_label, min_samples)

    return labels

def expand_cluster(nbrs, X, point_index, neighbors, labels, visited, cluster_label, min_samples):
    labels[point_index] = cluster_label

    queue = list(neighbors)  # Python list, so it can grow while we iterate
    i = 0
    while i < len(queue):
        neighbor = queue[i]
        if not visited[neighbor]:
            visited[neighbor] = True
            neighbor_neighbors = region_query(nbrs, X, neighbor)

            # Only core points extend the cluster further
            if len(neighbor_neighbors) >= min_samples:
                queue.extend(neighbor_neighbors)
        # Claim unassigned points (0) and points earlier marked as noise (-1)
        if labels[neighbor] <= 0:
            labels[neighbor] = cluster_label
        i += 1

def region_query(nbrs, X, point_index):
    # Indices of all points within eps of the given point (including the point itself)
    return nbrs.radius_neighbors([X[point_index]], return_distance=False)[0]

In [None]:
# Sample dataset
X, y = make_moons(n_samples=200, noise=0.05, random_state=42)

# Applying DBSCAN
eps = 0.3  # Epsilon parameter
min_samples = 5  # Minimum number of samples
labels = dbscan(X, eps, min_samples)

# Interpretation of the obtained clusters
unique_labels = np.unique(labels)
n_clusters = int(np.sum(unique_labels != -1))  # Count cluster labels, excluding noise (-1) if present

print(f"Number of clusters found: {n_clusters}")

for label in unique_labels:
    if label == -1:
        print(f"Noise points: {np.sum(labels == label)}")
    else:
        print(f"Cluster {label}: {np.sum(labels == label)} points")
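
To help discuss and interpret the result, the following minimal sketch runs scikit-learn's DBSCAN on the same data and parameters as a cross-check and tabulates each found cluster against the moon that generated its points. Using the `y` labels from make_moons for interpretation is an assumption that only works for synthetic data; the names `sk_labels` and `sk_clusters` are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Same data and parameters as in the cell above
X, y = make_moons(n_samples=200, noise=0.05, random_state=42)
sk_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# scikit-learn numbers clusters from 0 and marks noise with -1
sk_clusters = np.unique(sk_labels[sk_labels != -1])
print(f"scikit-learn clusters: {len(sk_clusters)}, noise points: {np.sum(sk_labels == -1)}")

# Cross-tabulate found clusters against the generating moon (y) to interpret them
for c in sk_clusters:
    in_cluster = sk_labels == c
    counts = np.bincount(y[in_cluster], minlength=2)
    print(f"cluster {c}: {counts[0]} points from moon 0, {counts[1]} from moon 1")
```

If each cluster draws its points from a single moon, the clusters can be interpreted as the two crescent-shaped dense regions of the dataset, and any points labeled -1 are isolated samples that did not have enough neighbors within eps. A cluster mixing both moons would suggest that eps is too large for this data.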