Skip to content

Use unsupervised machine learning, PCA algorithm, and K-Means clustering to analyze and classify a database of cryptocurrencies.

Notifications You must be signed in to change notification settings

cedoula/Cryptocurrencies

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 

Repository files navigation

Cryptocurrencies

Analysis Overview

The purpose of this project is to use unsupervised machine learning to analyze a database of cryptocurrencies and create a report including the traded cryptocurrencies classified by group according to their features.
This classification report could be used by an investment bank to propose a new cryptocurrency investment portfolio to its clients.
We use the following methods for the analysis:

  • preprocessing the database,
  • reducing the data dimension using Principal Component Analysis,
  • clustering cryptocurrencies using K-Means,
  • visualizing classification results with 2D and 3D scatter plots.

Resources

Results

Following the preprocessing and cleaning phase we have a total of 532 tradable cryptocurrencies.

Clustering Cryptocurrencies using K-Means - Elbow Curve

We don't know what would be the output of the analysis so we are using unsupervised machine learning to identify clusters of the cryptocurrencies.
We produced the elbow curve below using the K-Means method iterating on k values from 1 to 10.

The best k value appears to be 4 so we would conclude on an output of 4 clusters to categorize the crytocurrencies.

Visualizing Cryptocurrencies Results

3D-Scatter plot with clusters

This 3-D scatter plot was obtained using the PCA algorithm to reduce the crytocurrencies dimensions to three principal components.

2D-Scatter plot with clusters

This 2-D scatter plot was obtained using the PCA algorithm to reduce the crytocurrencies dimensions to two principal components.

Both these scatter plots show the distribution and the four clusters of cryptocurrencies.
We can identify the outliers like the unique cryptocurrency in the class #2.

Tradable Cryptocurrencies Table

Most of the cryptocurrencies are part of class #0 and #1.
The snapshot above shows that BitTorrent is the only cryptocurrency in class #2.

2D-Scatter plot with TotalCoinMined vs TotalCoinSupply

Plotting the scatter plot from two cryptocurrency features directly does not efficiently segregate the different classes. Then using the PCA algorithm is the right method for better visualizations.

Summary

We have identified the classification of 532 cryptocurrencies based on similarities of their features.
Particularities of each group need to be analyzed to determined their performance and potential interest for the investment bank's clients.

About

Use unsupervised machine learning, PCA algorithm, and K-Means clustering to analyze and classify a database of cryptocurrencies.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published