Objective - to simulate an advisor at a major financial advisory firm. Scenario - the advisor proposes a novel approach to assembling cryptocurrency-based investment portfolios, using atypical factors that may affect the market and result in better performance.
Product - combine financial Python programming with unsupervised learning to create a Jupyter notebook that clusters cryptocurrencies by their performance over different time periods. Provide visualizations, i.e. plot the results, to demonstrate performance to the board.
Import the Data from CSV file (provided in the starter code)
Read the “crypto_market_data.csv” file from the Resources folder into a DataFrame
Prepare the Data
Generate the summary statistics, and use hvPlot to visualize the data and inspect the DataFrame contents
Find the Best Value for k Using the Original Data
Cluster Cryptocurrencies with K-means Using the Original Data
Optimize Clusters with Principal Component Analysis (PCA)
Find the Best Value for k Using the Principal Component Analysis (PCA) Data
Cluster the Cryptocurrencies with K-means Using the Principal Component Analysis (PCA) Data
Visualize and Compare the Results
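The steps above can be sketched end to end as follows. This is a minimal illustration rather than the notebook's actual code: it uses a small synthetic DataFrame in place of `crypto_market_data.csv`, and the column names, the helper `inertia_by_k`, the choice of two principal components, and `k = 3` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the market data (assumption: the real notebook
# reads Resources/crypto_market_data.csv instead).
rng = np.random.default_rng(0)
centers = np.array([[-5.0, -10.0, -20.0, -30.0],
                    [0.0, 0.0, 0.0, 0.0],
                    [5.0, 10.0, 20.0, 30.0]])
values = np.vstack([c + rng.normal(scale=1.0, size=(10, 4)) for c in centers])
columns = ["change_24h", "change_7d", "change_30d", "change_1y"]  # hypothetical names
df = pd.DataFrame(values, columns=columns,
                  index=pd.Index([f"coin_{i}" for i in range(30)], name="coin_id"))

# Prepare the data: standardize every column to zero mean and unit variance.
scaled = pd.DataFrame(StandardScaler().fit_transform(df),
                      columns=df.columns, index=df.index)


def inertia_by_k(data, k_values):
    """Fit K-means once per k and return {k: inertia} for an elbow curve."""
    return {k: KMeans(n_clusters=k, n_init=10, random_state=1).fit(data).inertia_
            for k in k_values}


# Find the best k on the original (scaled) data.
k_values = range(1, 8)
inertia_original = inertia_by_k(scaled, k_values)

# Reduce to two principal components, then repeat the search for k.
pca = PCA(n_components=2)
pca_data = pca.fit_transform(scaled)
inertia_pca = inertia_by_k(pca_data, k_values)

# Cluster both representations with the same k (3 suits this synthetic data).
k = 3
labels_original = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(scaled)
labels_pca = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(pca_data)

# Compare the two assignments: count coins for each pair of labels.
comparison = pd.DataFrame({"original": labels_original, "pca": labels_pca},
                          index=df.index)
print(comparison.groupby(["original", "pca"]).size())
```

Plotting `inertia_original` and `inertia_pca` against `k` gives the two elbow curves that the comparison step places side by side.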
Supplemental processing and analysis:
Beyond the scope of the assignment, the author conducted additional analysis of the data obtained. Analysis beyond the scope of the project is noted as 'supplemental' where appropriate.
Elbow Method and Silhouette Analysis - distortion elbow and silhouette plots supplement the analysis
A supplemental CSV file, 'Top 100 Cryptocurrency 2022', was downloaded from Kaggle
This project leverages JupyterLab v3.4.4 and Python v3.9.13 with the following packages:
- pandas - software library written for the Python programming language for data manipulation and analysis.
- hvplot - a high-level plotting API built on HoloViews that provides a general and consistent interface for plotting data in the formats listed in the linked documentation.
- Path - from pathlib, object-oriented filesystem paths; Path instantiates a concrete path for the platform the code is running on.
- K-Means - from scikit-learn's cluster module; K-means clustering is one of the most widely used unsupervised machine learning algorithms and forms clusters of data based on the similarity between data instances.
- PCA - from scikit-learn's decomposition module, principal component analysis (PCA): linear dimensionality reduction that uses Singular Value Decomposition (SVD) of the data to project it to a lower-dimensional space. The input data is centered but not scaled for each feature before the SVD is applied.
- StandardScaler - from scikit-learn's preprocessing module, standardizes features by removing the mean and scaling to unit variance.
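Because PCA centers but does not scale its input, a feature with a much larger numeric range can dominate the first component unless the data is standardized first. The sketch below illustrates this on two independent synthetic features; the data, the seed, and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Two independent synthetic features on very different scales.
rng = np.random.default_rng(42)
small = rng.normal(scale=1.0, size=200)
large = rng.normal(scale=100.0, size=200)
X = np.column_stack([small, large])

# PCA on the raw data: the large-scale feature dominates the first component.
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# Standardize first: each feature now has zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
scaled_ratio = PCA(n_components=2).fit(X_scaled).explained_variance_ratio_

print("explained variance ratio, raw data:   ", raw_ratio)
print("explained variance ratio, scaled data:", scaled_ratio)
```

On the raw data nearly all of the variance lands on the first component, while after scaling the two components share it far more evenly.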
For additional and/or supplemental processing and visualization, this project also makes use of the following packages:
- matplotlib.pyplot - Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python; matplotlib.pyplot is a collection of functions that make Matplotlib work like MATLAB.
- seaborn - software library for making statistical graphics in Python. It builds on top of Matplotlib and integrates closely with pandas data structures.
- KElbowVisualizer - from Yellowbrick, implements the “elbow” method to help data scientists select the optimal number of clusters by fitting the model with a range of values for K.
- SilhouetteVisualizer - from Yellowbrick, uses the Silhouette Coefficient, which applies when the dataset's ground truth is unknown, to gauge the density and separation of the modeled clusters. The score is the mean silhouette coefficient over all samples; each sample's coefficient is the difference between its mean nearest-cluster distance and its mean intra-cluster distance, normalized by the larger of the two. Scores range from -1 to 1, where 1 indicates highly dense clusters and -1 indicates completely incorrect clustering.
- html5lib - a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers.
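To make the Silhouette Coefficient described for SilhouetteVisualizer concrete, the sketch below computes it by hand and checks the result against scikit-learn's `silhouette_score`. It runs on synthetic blobs, not the project's data, and `manual_silhouette` is a hypothetical helper written for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, silhouette_score

# Three well-separated synthetic blobs in two dimensions.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
labels = KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X)


def manual_silhouette(X, labels):
    """Mean over samples of (b - a) / max(a, b).

    a = mean distance to the other members of the sample's own cluster
    b = mean distance to the members of the nearest other cluster
    """
    dist = pairwise_distances(X)
    cluster_ids = np.unique(labels)
    scores = []
    for i, own in enumerate(labels):
        same = labels == own
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = dist[i, same].mean()
        b = min(dist[i, labels == other].mean()
                for other in cluster_ids if other != own)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))


manual = manual_silhouette(X, labels)
reference = silhouette_score(X, labels)
print(f"manual: {manual:.4f}  scikit-learn: {reference:.4f}")
```

Tight, well-separated clusters push the score toward 1; overlapping clusters pull it toward 0 or below.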
MacBook Pro (16-inch, 2021)
Chip Apple M1 Max
macOS Monterey version 12.6
Homebrew 3.5.10
Homebrew/homebrew-core (git revision 0b6b6d9004e; last commit 2022-08-30)
Homebrew/homebrew-cask (git revision 63ae652861; last commit 2022-08-30)
anaconda Command line client 1.10.0
conda 22.9.0
Python 3.9.13
pip 22.2.2 from /opt/anaconda3/envs/jupyterlab_env/lib/python3.9/site-packages/pip (python 3.9)
git version 2.37.2
In the terminal, navigate to the directory where you want to install this application from the repository and enter the following command:
git clone git@github.com:Billie-LS/ex_machina_crypto_learn.git
Operation in a virtual environment is recommended; the environment was created and configured as follows:
> conda create -n <name_env> python=3.9 anaconda
> conda activate <name_env>
> pip install fire
> pip install questionary
> conda update jupyterlab
> pip install python-dotenv
> pip install alpaca-trade-api
> conda install -c pyviz hvplot geoviews
> conda update conda
> conda update SQLAlchemy
> conda install -c conda-forge voila
> pip install pandas_datareader
> pip install yellowbrick
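After running the commands above, a quick check like the following can confirm that the packages the notebook relies on are importable in the active environment. It is an optional sketch: `missing_packages` is a hypothetical helper, and the list of import names is an assumption based on the packages described earlier.

```python
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the names from import_names that cannot be found for import."""
    return [name for name in import_names if find_spec(name) is None]


# Import names assumed from the package list in this README.
required = ["pandas", "hvplot", "sklearn", "matplotlib", "seaborn", "yellowbrick"]
missing = missing_packages(required)
print("missing packages:", missing if missing else "none")
```

Run it with the environment activated; any names it reports can be installed with the corresponding `pip` or `conda` command.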
From the terminal, run the installed application through the JupyterLab web-based interactive development environment (IDE) by typing at the prompt:
> jupyter lab
Version control can be reviewed at:
https://github.com/Billie-LS/ex_machina_crypto_learn
Loki 'billie' Skylizard LinkedIn @GitHub
Vinicio De Sola LinkedIn @GitHub
Santiago Pedemonte LinkedIn @GitHub
Stratis Gavnoudias LinkedIn @GitHub
Roberto Salazar LinkedIn @GitHub
Mounika Mamindla LinkedIn
Charles Twitchell LinkedIn @GitHub
MIT License
Copyright (c) [2022] [Loki 'billie' Skylizard]
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.