Skip to content

Capella35/Clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 

Repository files navigation

Customer Segmentation Using Clustering

Data The dataset contains information on marketing newsletters/e-mail campaigns (e-mail offers sent to customers) and transaction level data from customers. The transactional data shows which offer customers responded to, and what the customer ended up buying. The data is presented as an Excel workbook containing two worksheets. Each worksheet contains a different dataset. Data wrangling We're trying to learn more about how our customers behave, so we can use their behavior (whether or not they purchased something based on an offer) as a way to group similar minded customers together. We can then study those groups to look for patterns and trends which can help us formulate future offers.

The first thing we need is a way to compare customers. To do this, we're going to create a matrix that contains each customer and a 0/1 indicator for whether or not they responded to a given offer. K-Means Clustering Recall that in K-Means Clustering we want to maximize the distance between centroids and minimize the distance between data points and the respective centroid for the cluster they are in. True evaluation for unsupervised learning would require labeled data; however, we can use a variety of intuitive metrics to try to pick the number of clusters K. We will introduce two methods: the Elbow method, the Silhouette method and the gap statistic. Other Clustering Algorithms are also used for clustering: Affinity propagation damping, sample preference Not scalable with n_samples Many clusters, uneven cluster size, non-flat geometry Graph distance (e.g. nearest-neighbor graph) Spectral clustering number of clusters Medium n_samples, small n_clusters Few clusters, even cluster size, non-flat geometry Graph distance (e.g. nearest-neighbor graph) Agglomerative clustering number of clusters, linkage type, distance Large n_samples and n_clusters Many clusters, possibly connectivity constraints, non Euclidean distances Any pairwise distance DBSCAN neighborhood size Very large n_samples, medium n_clusters Non-flat geometry, uneven cluster sizes Distances between nearest points

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published