This project predicts future Bitcoin price movements using historical Bitcoin price data and data on edits to the Bitcoin Wikipedia page. We first train a random forest model to tell us whether the Bitcoin price will increase or decrease tomorrow. We then switch to a gradient boosting model and improve the predictors to increase accuracy.
We'll develop a backtesting system and use a robust error metric so we can tell if the algorithm is performing well.
This project can be extended to other cryptocurrencies as well.
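As a rough illustration of the backtesting idea described above, the sketch below trains on all earlier rows, predicts the next window of days, and then scores the combined predictions. The column names (`close`, `tomorrow`, `target`), the window sizes, the synthetic random-walk prices, and the use of precision as the error metric are all assumptions made for this example and are not taken from the notebooks.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score


def backtest(data, model, predictors, start=200, step=50):
    """Walk forward in time: fit on rows before each window, predict the window."""
    all_predictions = []
    for i in range(start, data.shape[0], step):
        train = data.iloc[:i]
        test = data.iloc[i:i + step]
        model.fit(train[predictors], train["target"])
        preds = pd.Series(
            model.predict(test[predictors]), index=test.index, name="predictions"
        )
        all_predictions.append(pd.concat([test["target"], preds], axis=1))
    return pd.concat(all_predictions)


# Synthetic stand-in for daily closing prices (a random walk), not real Bitcoin data.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=400, freq="D")
prices = pd.DataFrame({"close": 100 + rng.normal(0, 1, 400).cumsum()}, index=dates)

# Target: 1 if tomorrow's close is higher than today's, else 0.
prices["tomorrow"] = prices["close"].shift(-1)
prices["target"] = (prices["tomorrow"] > prices["close"]).astype(int)
prices = prices.dropna(subset=["tomorrow"])  # the last row has no "tomorrow"

model = RandomForestClassifier(n_estimators=50, min_samples_split=20, random_state=1)
results = backtest(prices, model, predictors=["close"])
print(precision_score(results["target"], results["predictions"], zero_division=0))
```

Because every prediction comes from a model that only saw earlier rows, the score is closer to what the model would achieve on unseen days than a single random train/test split would suggest.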
Project Steps
- Load in data
- Clean and merge data
- Create an initial machine learning model and estimate accuracy
- Switch to a more powerful model and improve our predictors
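The steps above do not spell out how the predictors are improved. One common approach, assumed here purely for illustration, is to add rolling features: the ratio of the current price to its recent average, and the share of recent days on which the price rose. The function name `add_rolling_predictors`, the column names, and the horizons are hypothetical.

```python
import pandas as pd


def add_rolling_predictors(data, horizons=(2, 7, 30)):
    """Add a price-to-rolling-mean ratio and a trailing up-day share per horizon."""
    data = data.copy()
    new_predictors = []
    for horizon in horizons:
        rolling_mean = data["close"].rolling(horizon, min_periods=1).mean()
        ratio_col = f"close_ratio_{horizon}"
        data[ratio_col] = data["close"] / rolling_mean

        # Shift by one day so the trend only uses targets already known today.
        trend_col = f"trend_{horizon}"
        data[trend_col] = (
            data["target"].shift(1).rolling(horizon, min_periods=1).mean()
        )
        new_predictors += [ratio_col, trend_col]
    return data, new_predictors


# Tiny made-up example: target is 1 when the next day's close is higher.
frame = pd.DataFrame({"close": [10.0, 11.0, 12.0, 11.0], "target": [1, 1, 0, 1]})
enriched, predictors = add_rolling_predictors(frame, horizons=(2,))
print(enriched)
```

The one-day shift matters: without it, the trend column for a given day would include that day's own target and leak the answer into the predictors.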
File overview:
Downloading Bitcoin Price Data.ipynb
- a Jupyter notebook that creates our Wikipedia edit dataset.
Prep data for Machine Learning_Training Baseline ML Mode.ipynb
- a Jupyter notebook that contains the code to predict Bitcoin prices
To follow this project, please install the following locally:
- JupyterLab
- Python 3.8+
- Python packages:
  - pandas
  - yfinance
  - scikit-learn
  - xgboost
  - mwclient
  - transformers
Computing the Wikipedia edit data takes time. It can be faster to use the version that's already been generated. It's in this repository, and is called wikipedia_edits.csv. Feel free to download and use the file. You can also get it from here.
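A minimal sketch of loading the pre-generated file and joining it to price data might look like the following. It assumes the first column of the CSV holds the date; the `edit_count` column, the helper names, and the sample values are hypothetical and only stand in for the real file and real prices.

```python
import io

import pandas as pd


def load_wiki_edits(source):
    """Read the edit data, using the first column as a date index."""
    return pd.read_csv(source, index_col=0, parse_dates=True)


def merge_with_prices(prices, wiki):
    """Inner-join on the date index, keeping only days present in both frames."""
    return prices.merge(wiki, left_index=True, right_index=True)


# In-memory stand-in for wikipedia_edits.csv with made-up values.
sample_csv = io.StringIO(
    "date,edit_count\n"
    "2021-01-01,3\n"
    "2021-01-02,5\n"
    "2021-01-03,1\n"
)
wiki = load_wiki_edits(sample_csv)

# Made-up closing prices indexed by date.
prices = pd.DataFrame(
    {"close": [100.0, 110.0]},
    index=pd.to_datetime(["2021-01-02", "2021-01-03"]),
)

merged = merge_with_prices(prices, wiki)
print(merged)
```

With the real file, passing the path `"wikipedia_edits.csv"` to `load_wiki_edits` in place of the in-memory sample should work the same way, provided the date is the first column.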