# Cryptocurrency Clusters

## Background:

You are on the Advisory Services Team of a financial consultancy. One of your clients, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio for its customers. The bank, however, is lost in the vast universe of cryptocurrencies. It has asked you to create a report that lists the cryptocurrencies currently on the trading market and determines whether they can be grouped to create a classification system for this new investment.

You have been handed raw data, so you will first need to process it before fitting the machine learning models. Since there is no known classification system, you will need to use unsupervised learning. You will use several clustering algorithms to explore whether the cryptocurrencies can be grouped with other, similar cryptocurrencies, and you will use data visualization to share your findings with the investment bank.

In [8]:
# Importing dependencies
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

In [11]:
# Read `crypto_data.csv` into Pandas. The dataset was obtained from [CryptoCompare](https://min-api.cryptocompare.com/data/all/coinlist).
file_path = "C://Users/abrow/Desktop/EMERSON_BOOTCAMP/BOOTCAMP_AMB/Homeworks/hw15_02-12-2022_Unsupervised_Machine_Learning/Cryptocurrency-ChallengeUML/Instructions/crypto_data.csv"

crypto_df = pd.read_csv(file_path)
crypto_df.head()

| | Unnamed: 0 | CoinName | Algorithm | IsTrading | ProofType | TotalCoinsMined | TotalCoinSupply |
|---|---|---|---|---|---|---|---|
| 0 | 42 | 42 Coin | Scrypt | True | PoW/PoS | 41.99995 | 42 |
| 1 | 365 | 365Coin | X11 | True | PoW/PoS | NaN | 2300000000 |
| 2 | 404 | 404Coin | Scrypt | True | PoW/PoS | 1055185000.0 | 532000000 |
| 3 | 611 | SixEleven | SHA-256 | True | PoW | NaN | 611000 |
| 4 | 808 | 808 | SHA-256 | True | PoW/PoS | 0.0 | 0 |


### Data Preparation:

In [13]:
# Discarding all cryptocurrencies that are not being traded. 
crypto_df.drop(crypto_df.loc[crypto_df['IsTrading']==False].index,inplace=True)
crypto_df

| | Unnamed: 0 | CoinName | Algorithm | IsTrading | ProofType | TotalCoinsMined | TotalCoinSupply |
|---|---|---|---|---|---|---|---|
| 0 | 42 | 42 Coin | Scrypt | True | PoW/PoS | 4.199995e+01 | 42 |
| 1 | 365 | 365Coin | X11 | True | PoW/PoS | NaN | 2300000000 |
| 2 | 404 | 404Coin | Scrypt | True | PoW/PoS | 1.055185e+09 | 532000000 |
| 3 | 611 | SixEleven | SHA-256 | True | PoW | NaN | 611000 |
| 4 | 808 | 808 | SHA-256 | True | PoW/PoS | 0.000000e+00 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1243 | SERO | Super Zero | Ethash | True | PoW | NaN | 1000000000 |
| 1244 | UOS | UOS | SHA-256 | True | DPoI | NaN | 1000000000 |
| 1245 | BDX | Beldex | CryptoNight | True | PoW | 9.802226e+08 | 1400222610 |
| 1246 | ZEN | Horizen | Equihash | True | PoW | 7.296538e+06 | 21000000 |


In [16]:
# Dropping the `IsTrading` column from the dataframe.
isTrading_dropped = crypto_df.drop(columns=["IsTrading"])
isTrading_dropped

| | Unnamed: 0 | CoinName | Algorithm | ProofType | TotalCoinsMined | TotalCoinSupply |
|---|---|---|---|---|---|---|
| 0 | 42 | 42 Coin | Scrypt | PoW/PoS | 4.199995e+01 | 42 |
| 1 | 365 | 365Coin | X11 | PoW/PoS | NaN | 2300000000 |
| 2 | 404 | 404Coin | Scrypt | PoW/PoS | 1.055185e+09 | 532000000 |
| 3 | 611 | SixEleven | SHA-256 | PoW | NaN | 611000 |
| 4 | 808 | 808 | SHA-256 | PoW/PoS | 0.000000e+00 | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 1243 | SERO | Super Zero | Ethash | PoW | NaN | 1000000000 |
| 1244 | UOS | UOS | SHA-256 | DPoI | NaN | 1000000000 |
| 1245 | BDX | Beldex | CryptoNight | PoW | 9.802226e+08 | 1400222610 |
| 1246 | ZEN | Horizen | Equihash | PoW | 7.296538e+06 | 21000000 |


In [None]:
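# Finding the rows that have null values: count the nulls in each column
for column in isTrading_dropped.columns:
    print(f"{column}: {isTrading_dropped[column].isnull().sum()} null values")

# Removing all rows that have at least one null value.
crypto_clean = isTrading_dropped.dropna()
crypto_clean.shape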


In [None]:
# Filter for cryptocurrencies that have been mined. That is, the total coins mined should be greater than zero.
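A minimal sketch of this filter, assuming `TotalCoinsMined` is numeric once the null rows are gone. So that it runs on its own, it uses a small hypothetical sample whose values are copied from rows shown in the outputs above; in the notebook the same boolean mask would be applied to the cleaned dataframe.

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the outputs above (nulls already removed)
sample = pd.DataFrame(
    {
        "CoinName": ["42 Coin", "404Coin", "808", "Beldex", "Horizen"],
        "TotalCoinsMined": [41.99995, 1.055185e09, 0.0, 9.802226e08, 7.296538e06],
    }
)


def keep_mined(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the coins whose TotalCoinsMined is strictly greater than zero."""
    return df[df["TotalCoinsMined"] > 0]


mined = keep_mined(sample)
print(mined["CoinName"].tolist())  # "808" is excluded because it has 0 coins mined
```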


In [None]:
# In order for your dataset to be comprehensible to a machine learning algorithm, its data should be numeric. 

In [None]:
# Deleting the `CoinName` from the original dataframe.
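One way to do this, sketched under two assumptions: the coin names are kept in a separate dataframe with the same index so clusters can be labelled later, and the `Unnamed: 0` column (ticker symbols such as `BDX` and `ZEN` in the outputs above) is moved into the index because it is also text. The sample dataframe is hypothetical, and the names `coin_names` and `features` are illustrative.

```python
import pandas as pd

# Hypothetical sample shaped like the notebook's dataframe after filtering
df = pd.DataFrame(
    {
        "Unnamed: 0": ["42", "404", "BDX", "ZEN"],
        "CoinName": ["42 Coin", "404Coin", "Beldex", "Horizen"],
        "Algorithm": ["Scrypt", "Scrypt", "CryptoNight", "Equihash"],
        "ProofType": ["PoW/PoS", "PoW/PoS", "PoW", "PoW"],
        "TotalCoinsMined": [41.99995, 1.055185e09, 9.802226e08, 7.296538e06],
        "TotalCoinSupply": [42, 532000000, 1400222610, 21000000],
    }
)

# Assumption: use the ticker column as the index so it is not treated as a feature
df = df.set_index("Unnamed: 0")

# Keep the coin names aside, sharing the same index, for labelling clusters later
coin_names = df[["CoinName"]]

# Drop the text column that carries no numeric meaning for clustering
features = df.drop(columns=["CoinName"])
print(features.columns.tolist())
```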


In [None]:
# Convert the remaining features with text values, `Algorithm` and `ProofType`, into numerical data. 
# Use Pandas to create dummy variables. Examine the number of rows and columns of your dataset now. How did they change?
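A sketch of the encoding with `pd.get_dummies`, run on a hypothetical four-row sample. The row count does not change; each text column is replaced by one 0/1 column per distinct value, so two numeric columns plus three algorithms and two proof types give seven columns. The sample assumes `TotalCoinSupply` is already numeric; if it was loaded as text, it would need `pd.to_numeric` first.

```python
import pandas as pd

# Hypothetical sample of the feature columns left after CoinName is removed
features = pd.DataFrame(
    {
        "Algorithm": ["Scrypt", "Scrypt", "CryptoNight", "Equihash"],
        "ProofType": ["PoW/PoS", "PoW/PoS", "PoW", "PoW"],
        "TotalCoinsMined": [41.99995, 1.055185e09, 9.802226e08, 7.296538e06],
        "TotalCoinSupply": [42, 532000000, 1400222610, 21000000],
    }
)
print("before:", features.shape)

# One indicator column per distinct Algorithm / ProofType value; dtype=int gives 0/1
encoded = pd.get_dummies(features, columns=["Algorithm", "ProofType"], dtype=int)
print("after:", encoded.shape)
print(encoded.columns.tolist())
```

On the full dataset the same call adds one column for every distinct algorithm and proof type, which is why the feature count grows sharply.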


In [None]:
# Standardize your dataset so that columns that contain larger values do not unduly influence the outcome.
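A minimal sketch with scikit-learn's `StandardScaler`, applied to a hypothetical encoded sample. After scaling, each column has mean 0 and standard deviation 1, so `TotalCoinSupply` (values in the billions) no longer dwarfs the 0/1 dummy columns.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical encoded sample: two numeric columns plus 0/1 dummy columns
encoded = pd.DataFrame(
    {
        "TotalCoinsMined": [41.99995, 1.055185e09, 9.802226e08, 7.296538e06],
        "TotalCoinSupply": [42, 532000000, 1400222610, 21000000],
        "Algorithm_Scrypt": [1, 1, 0, 0],
        "ProofType_PoW": [0, 0, 1, 1],
    }
)

# Rescale every column to mean 0 and (population) standard deviation 1
X_scaled = StandardScaler().fit_transform(encoded)

print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
```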



### Dimensionality Reduction

In [None]:
# Creating dummy variables above dramatically increased the number of features in your dataset. 
# Perform dimensionality reduction with PCA. 
# Rather than specifying the number of principal components when you instantiate the PCA model, you can state the desired explained variance.
# For this project, preserve 90% of the explained variance in dimensionality reduction. How did the number of the features change?
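A sketch of variance-based PCA. Passing a float between 0 and 1 as `n_components` tells scikit-learn's `PCA` to keep the smallest number of components whose cumulative explained variance exceeds that fraction; `svd_solver="full"` is set explicitly because that solver supports this mode. The input here is synthetic data driven by three hidden factors, not the crypto dataset, so the printed counts only illustrate how `n_components_` is read back.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 200 rows, 20 features driven by 3 hidden factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
weights = rng.normal(size=(3, 20))
X = latent @ weights + 0.1 * rng.normal(size=(200, 20))
X_scaled = StandardScaler().fit_transform(X)

# A float in (0, 1) asks PCA for the fewest components reaching that share of variance
pca = PCA(n_components=0.90, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)

print("features before:", X_scaled.shape[1])
print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum().round(3))
```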


In [None]:
# Reduce the dataset dimensions with t-SNE and visually inspect the results. 
# Run t-SNE on the principal components: the output of the PCA transformation. 
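A minimal t-SNE sketch using `sklearn.manifold.TSNE`. The input is synthetic: `make_blobs` output standing in for the principal components, since the real PCA output is not shown here. The `perplexity` and `random_state` values are illustrative choices, not tuned ones; perplexity must be smaller than the number of rows.

```python
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

# Synthetic stand-in for the PCA output: 150 rows, 5 components, 3 groups
X_pca, _ = make_blobs(n_samples=150, centers=3, n_features=5, random_state=42)

# Embed the principal components in two dimensions for plotting
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X_pca)

print(X_tsne.shape)
```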


In [None]:
# Create a scatter plot of the t-SNE output. Observe whether there are distinct clusters or not.
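A sketch of the scatter plot with matplotlib. The points are synthetic stand-ins for the two t-SNE columns; with the real output, each dot would be one cryptocurrency. The `Agg` backend line only makes the block run without a display; in Jupyter, drop it so the figure renders inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside Jupyter
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Synthetic 2-D points standing in for the t-SNE output (one row per coin)
X_tsne, _ = make_blobs(n_samples=150, centers=3, n_features=2, random_state=42)

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(X_tsne[:, 0], X_tsne[:, 1], s=10)
ax.set_xlabel("t-SNE component 1")
ax.set_ylabel("t-SNE component 2")
ax.set_title("Cryptocurrencies in t-SNE space")
```

Visually separated groups of dots suggest clusters; a single diffuse cloud suggests the coins do not separate cleanly.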

### Cluster Analysis with k-Means

In [None]:
# Create an elbow plot to identify the best number of clusters.
# Use a for-loop to determine the inertia for each `k` from 1 through 10. Determine, if possible, where the elbow of the plot is, and at which value of `k` it appears.
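A sketch of the elbow loop with scikit-learn's `KMeans`, run on synthetic data with three planted groups rather than the crypto features. `inertia_` is the within-cluster sum of squared distances; the elbow is where adding another cluster stops reducing it sharply. `n_init=10` and `random_state=0` are illustrative choices, set explicitly so results do not depend on version defaults.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit inside Jupyter
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the reduced crypto features (three planted groups)
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=42)

ks = list(range(1, 11))
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances

fig, ax = plt.subplots()
ax.plot(ks, inertias, marker="o")
ax.set_xticks(ks)
ax.set_xlabel("Number of clusters k")
ax.set_ylabel("Inertia")
ax.set_title("Elbow curve")
```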



### Recommendation

In [None]:
# Based on your findings, make a brief (1-2 sentences) recommendation to your clients. Can the cryptocurrencies be clustered together? If so, into how many clusters? 

