Machine_learning_clustering_Credit_Card_Customer

What is clustering in machine learning?

Cluster analysis is a machine learning approach that groups unlabelled data based on shared characteristics. It divides a dataset into clusters so that data points in the same cluster are similar to each other and dissimilar to those in other clusters. To do this, the algorithm looks for patterns in the unlabelled data, such as shape, size, color, or behavior, and groups data points according to the presence or absence of those patterns. Unlike supervised learning methods, cluster analysis is an unsupervised technique that does not require labeled data. Once clustering is complete, each group is assigned a unique cluster ID, which can be used to summarize large, complex datasets and simplify their further processing.

I have applied three clustering approaches to a credit card customer dataset. The steps of the work were as follows:

First, I checked the dataset for missing values and handled them:
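The README does not say how the missing values were rectified. The dependency-free Python sketch below illustrates one common approach under stated assumptions: it counts missing entries per column and fills each gap with that column's median. Median imputation and the column names `BALANCE` and `MINIMUM_PAYMENTS` are assumptions for illustration, not necessarily what the original script does.

```python
from statistics import median
from typing import Dict, List, Optional

# A row maps a column name to a value; None marks a missing entry.
Row = Dict[str, Optional[float]]


def count_missing(rows: List[Row]) -> Dict[str, int]:
    """Count missing (None) entries in each column."""
    counts: Dict[str, int] = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + int(value is None)
    return counts


def fill_missing_with_median(rows: List[Row]) -> List[Row]:
    """Return a copy of rows with each None replaced by its column median.

    Assumption: median imputation; the README does not state the method used.
    """
    columns = list(rows[0].keys())
    medians = {
        col: median(r[col] for r in rows if r[col] is not None) for col in columns
    }
    return [
        {col: medians[col] if r[col] is None else r[col] for col in columns}
        for r in rows
    ]


if __name__ == "__main__":
    # Toy rows with hypothetical column names, not values from the real dataset.
    rows = [
        {"BALANCE": 100.0, "MINIMUM_PAYMENTS": 50.0},
        {"BALANCE": 250.0, "MINIMUM_PAYMENTS": None},
        {"BALANCE": 400.0, "MINIMUM_PAYMENTS": 70.0},
        {"BALANCE": None, "MINIMUM_PAYMENTS": 90.0},
    ]
    print(count_missing(rows))
    print(fill_missing_with_median(rows))
```

With pandas, the same idea is roughly `df.isnull().sum()` to count gaps, followed by `df.fillna(df.median(numeric_only=True))` to fill them.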

Next, I analyzed the correlations between the parameters of the dataset:
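In pandas, `DataFrame.corr()` computes a Pearson correlation matrix by default. As a hedged illustration of what that matrix contains, the sketch below spells out the same computation without dependencies; the column names in the test are hypothetical toy data, not the real dataset.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # A constant column has zero spread, so its correlation is undefined.
    if std_x == 0 or std_y == 0:
        return float("nan")
    return cov / (std_x * std_y)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations, keyed by column name."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Values near +1 or -1 flag strongly related features; such pairs carry overlapping information and can pull distance-based clustering toward that shared direction.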

I then plotted histograms of two of the parameters:

Finally, the dataset was scaled:
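Scaling matters because K-means, agglomerative clustering, and DBSCAN all rely on distances, so a feature measured in large units would otherwise dominate the result. The README does not say which scaler was used; the sketch below assumes z-score standardization (subtract each column's mean, divide by its population standard deviation), which is what scikit-learn's `StandardScaler` does.

```python
from statistics import mean, pstdev
from typing import List


def standardize(column: List[float]) -> List[float]:
    """Z-score a column: subtract the mean, divide by the population std."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


def standardize_columns(data: List[List[float]]) -> List[List[float]]:
    """Standardize each column of a row-major matrix independently.

    Assumption: z-score scaling; the README only says the data was scaled.
    """
    scaled_cols = [standardize(list(col)) for col in zip(*data)]
    return [list(row) for row in zip(*scaled_cols)]
```

After scaling, a column in the hundreds and a column in single digits contribute comparably to Euclidean distances.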

K-means clustering

Now, we are able to apply the clustering methods.

In data mining, K-means is a widely used algorithm for partitioning data into K clusters. It starts by selecting K initial centroids at random, one per cluster. It then alternates between two steps: assigning each data point to its nearest centroid, and moving each centroid to the mean of the points assigned to it. These iterations continue until the centroids stop changing (convergence) or a maximum number of iterations is reached, at which point the clusters are final. An important step is choosing the number of clusters, K; I used the silhouette coefficient for this.
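The loop described above is Lloyd's algorithm. The minimal sketch below implements it directly, assuming random initial centroids drawn from the data points; this is an illustration, not the repository's code. scikit-learn's `KMeans`, for comparison, uses k-means++ initialization by default.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))


def kmeans(
    points: List[Point], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's K-means with random initial centroids drawn from the data."""
    rng = random.Random(seed)
    # Initialization: pick k data points at random as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        # Convergence: stop when no centroid moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Final assignment so the labels always match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids
```

On two well-separated toy groups such as `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, each group ends up with its own label.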

Based on the silhouette analysis, three clusters were chosen.
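The silhouette coefficient of a point is s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its smallest mean distance to the members of any other cluster. Averaging s(i) over all points scores a clustering, and the K with the highest score is kept. The sketch below computes that score for any labelling; scikit-learn offers the same measure as `sklearn.metrics.silhouette_score(X, labels)`. The range of K the README tried is not stated.

```python
import math
from typing import List, Sequence


def silhouette_score(points: List[Sequence[float]], labels: List[int]) -> float:
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from i to the other members of its own cluster.
    b: smallest mean distance from i to the members of any other cluster.
    """
    clusters = set(labels)
    if not 2 <= len(clusters) <= len(points) - 1:
        raise ValueError("need between 2 and n - 1 clusters")
    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        same = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == own]
        if not same:
            scores.append(0.0)  # convention for a singleton cluster
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Scores close to 1 mean compact, well-separated clusters; scores near 0 or below mean points sit between clusters or in the wrong one.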

Clustering results by K-means for two variables:

Clustering results by K-means for all variables:

Agglomerative Hierarchical Clustering

Agglomerative hierarchical clustering starts by treating each data point as its own cluster. At each iteration, the two most similar clusters are merged, until a single cluster, or a chosen number K of clusters, remains.
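How "most similar" is measured depends on the linkage criterion, which the README does not state; scikit-learn's `AgglomerativeClustering` defaults to Ward linkage. The naive sketch below assumes average linkage (mean pairwise distance between two clusters) to show the bottom-up merging in plain Python. It is quadratic in memory-free form but roughly cubic in time, so it is only meant for small examples.

```python
import math
from typing import List, Sequence


def agglomerative(points: List[Sequence[float]], n_clusters: int) -> List[int]:
    """Bottom-up clustering with average linkage (an assumption for illustration)."""
    # Start with every point in its own cluster (each cluster is a list of indices).
    clusters = [[i] for i in range(len(points))]

    def average_linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under average linkage.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge cluster y into cluster x; y > x, so deleting y keeps index x valid.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    # Convert the final clusters into one label per point.
    labels = [0] * len(points)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels
```

Recording the merge distances at each step instead of stopping at `n_clusters` would give the full hierarchy that a dendrogram draws.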

As with K-means, an important step is choosing the number of clusters; I again used the silhouette coefficient.

Based on the silhouette analysis, two clusters were chosen.

Clustering results by agglomerative hierarchical clustering for two variables:

Clustering results by agglomerative hierarchical clustering for all variables:

DBSCAN clustering

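DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is an unsupervised clustering algorithm that groups data points by density. It identifies clusters as dense regions separated by regions of lower density, and labels points in sparse regions as noise instead of forcing them into a cluster. Rather than a number of clusters, it takes two parameters: a neighborhood radius, eps, and the minimum number of points needed to form a dense region.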
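The density idea can be sketched in a few lines. A point with at least `min_samples` neighbors within `eps` (itself included, matching how scikit-learn counts `min_samples`) is a core point; clusters grow outward from core points, border points join the first cluster that reaches them, and everything else is noise (label -1). This is a plain-Python illustration with a brute-force neighbor search, not the repository's implementation.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id, or NOISE (-1)."""
    n = len(points)
    # Brute-force eps-neighborhoods; each point counts as its own neighbor.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is also core: keep expanding
        cluster_id += 1
    return labels
```

With two tight toy groups plus one far-away point, `dbscan(points, eps=1.5, min_samples=3)` finds the two groups and marks the distant point as noise, without being told how many clusters to look for.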

The eps parameter was set to 4500, based on the changing point (the "elbow") of a k-nearest-neighbor distance plot.
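A k-nearest-neighbor distance plot sorts every point's distance to its k-th nearest neighbor. Points inside clusters have small values, outliers have large ones, and eps is read off where the curve bends sharply upward. The README does not state which k was used or how the changing point was located (it may have been read by eye). The sketch below computes the sorted curve and applies one simple knee heuristic as an assumption: rescale both axes to [0, 1] and pick the point lying farthest below the diagonal.

```python
import math
from typing import List, Sequence


def k_distances(points: List[Sequence[float]], k: int) -> List[float]:
    """Sorted distance from each point to its k-th nearest neighbor (excluding itself)."""
    result = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return sorted(result)


def knee_value(sorted_values: List[float]) -> float:
    """Heuristic knee of an increasing curve (an assumption, not the README's method).

    Both axes are rescaled to [0, 1]; the knee is the point where the
    normalized index exceeds the normalized value by the largest margin.
    """
    n = len(sorted_values)
    if n < 3:
        raise ValueError("need at least three values")
    lo, hi = sorted_values[0], sorted_values[-1]
    if hi == lo:
        return lo
    best = max(range(n), key=lambda i: i / (n - 1) - (sorted_values[i] - lo) / (hi - lo))
    return sorted_values[best]
```

For a small grid-like group with two distant outliers, the 2-NN curve stays at about 1 for the grouped points and then jumps, and the heuristic returns 1.0 as a candidate eps.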

Clustering results by DBSCAN for two variables:

Clustering results by DBSCAN for all variables:

Sarvandani/Machine_learning_clustering_Credit_Card_Customer

Folders and files

Latest commit

History

Repository files navigation

Machine_learning_clustering_Credit_Card_Customer

What is clustreing in machine learning?

K-means clustering

Agglomerative Hierarchical Clustering

DBSCAN clustering

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages