CryptoPulse leverages data engineering practices to pipe data through an analytics pipeline, analyzing cryptocurrency market sentiment and forecasting trends on an interactive dashboard. By integrating social media analysis with financial data, it provides actionable insights into crypto market dynamics. This project embodies a comprehensive end-to-end pipeline—from data ingestion and sentiment analysis to trend forecasting and interactive visualization—facilitating informed investment decisions in the volatile world of cryptocurrencies.
Note
- Unfortunately the project has not reached its end before the deadline, but it will be continued. Therefore, not all components run in the cloud as intended, but they are at least containerized...
- The project has initially been developed locally to test and debug all components first before migrating to the cloud.
In a nutshell, the project comprises the following components:
- Bitcoin trend metrics of the last year from Binance's API
- Reddit text data from the subreddit r/CryptoCurrency
- the "Fear and Greed Index" from alternative.me, a sentiment-like metric that combines several data sources
- Reddit data is fetched from BigQuery with Spark, pushed through a sentiment model and written back to BigQuery with sentiment metrics
- the same goes for the Binance data, where the closing price is set as the target variable to fine-tune Facebook's trend forecasting model "Prophet" and predict the next 30 days; the results are then written back to the dataset
- the interface runs queries against the data warehouse in BigQuery and fetches all tables from the dataset to map the visualizations to the UI
- possibility to interactively use the resulting plots by:
  - selecting date ranges
  - emphasizing or hiding individual visual metric components
  - zooming in and out, hovering, clicking
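The table fetching described above (the interface querying every table in the BigQuery dataset) can be sketched roughly as below; the project, dataset and table names are illustrative placeholders, not the repo's actual identifiers:

```python
# Hypothetical helper for the per-table queries the interface runs;
# every name in the query is a placeholder, not the repo's real schema.
def table_query(project: str, dataset: str, table: str) -> str:
    return f"SELECT * FROM `{project}.{dataset}.{table}`"

# The dashboard would execute this for every table in the dataset, e.g.:
#   from google.cloud import bigquery
#   df = bigquery.Client().query(
#       table_query("my-project", "cryptopulse", "reddit_sentiment")
#   ).to_dataframe()
print(table_query("my-project", "cryptopulse", "reddit_sentiment"))
# SELECT * FROM `my-project.cryptopulse.reddit_sentiment`
```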
- The content below was the attempt to migrate the locally developed components to the cloud. Due to lack of time, the code still needs to be debugged.
- Alternatively, option "b) Playground" can be used to test the locally developed pipeline and play with the dashboard.
- by running:
git clone https://github.com/Ocean-code-1995/CryptoPulse-Sentiment-Trends
Use terraform to initialize cloud storage.
- create a GCP credential, store the project_id in a newly created variables.tfvars file and the project-id.json in ../secrets/ for the credentials:
project_id = "---your-google-project-id---"
credentials = "../secrets/cryptopulse-secret.json"
region = "europe-west3"
zone = "europe-west3-b"
- or run the bash files after having set the PROJECT_ID and BILLING_ACCOUNT_ID placeholders in config.sh and having created the JSON key for Google Cloud, stored in ../secrets/key.json
then run:
cd terraform
terraform init
terraform validate
terraform apply -var-file="variables.tfvars"
... and keep your fingers crossed that everything goes according to plan.
- In case of failure, run
terraform init -reconfigure
first after fixing the issue, then apply again.
- shut down resources with
terraform destroy
after project review.
To be able to access the data from Reddit and Binance, API credentials need to be initialized as listed below:
- store in: DataEng_Cyrpto_Sentiment/.env_vars
REDDIT_CLIENT_ID="-----------------------------"
REDDIT_CLIENT_SECRET="-----------------------------"
REDDIT_USER_AGENT="-----------------------------"
REDDIT_USERNAME="-----------------------------"
REDDIT_PASSWORD="-----------------------------"
BINANCE_APIKEY="-----------------------------"
BINANCE_SECRET="-----------------------------"
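For illustration, a file in the .env_vars format above can be read with a few lines of plain Python; this is just a sketch assuming simple KEY="value" lines — the project itself may rely on python-dotenv or shell sourcing instead:

```python
# Minimal sketch of parsing KEY="value" lines into a dict; this is an
# illustrative stand-in, not the repo's actual loading code.
def parse_env_vars(text: str) -> dict:
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip().strip('"')
    return creds

sample = 'REDDIT_CLIENT_ID="abc123"\nBINANCE_APIKEY = "xyz"'
print(parse_env_vars(sample))  # {'REDDIT_CLIENT_ID': 'abc123', 'BINANCE_APIKEY': 'xyz'}
```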
To fetch data from Reddit, PRAW (the "Python Reddit API Wrapper") is used to ethically obtain text data and adhere to all requirements and rate limits, as managed by the wrapper.
links:
- Create a Reddit Account: Sign up or log in at reddit.com.
- Create an App: Go to your app preferences and click on “Create App” or “Create Another App”.
- Fill Out the Form: Provide a name, select the "script" option, and describe your usage. Add http://localhost:8080 as the redirect URI.
- Get Credentials: Upon creation, you'll receive your client_id (just under the app name) and client_secret.
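With the credentials from the steps above in place, fetching posts via PRAW could look like the sketch below; the field selection and limit are assumptions, not the repo's actual code, and the praw import is deferred so the snippet stays self-contained:

```python
import os

# Hedged sketch of a PRAW fetch from r/CryptoCurrency; which post fields
# the pipeline actually stores is an assumption.
def fetch_posts(limit: int = 100) -> list:
    import praw  # pip install praw; deferred so the sketch imports without it
    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        user_agent=os.environ["REDDIT_USER_AGENT"],
        username=os.environ["REDDIT_USERNAME"],
        password=os.environ["REDDIT_PASSWORD"],
    )
    # PRAW transparently respects Reddit's rate limits.
    return [{"title": p.title, "text": p.selftext, "created": p.created_utc}
            for p in reddit.subreddit("CryptoCurrency").hot(limit=limit)]
```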
links:
- Create a Binance Account: Register at binance.com and complete any necessary identity verification.
- Enable Two-Factor Authentication (2FA): This is usually required to use the API.
- Create API Key: Navigate to the API Management section of your dashboard, label your new API key, and click “Create”.
- Store API Key and Secret: Note down the API Key and Secret provided; you’ll use these to interact with the Binance API.
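Using the key and secret from the steps above, pulling one year of daily BTCUSDT candles with the python-binance client could look like this sketch; symbol and interval mirror the project's description, the rest is an assumption:

```python
import os

# Hedged sketch of a one-year daily kline fetch with python-binance;
# the import is deferred so the sketch stays self-contained.
def fetch_btc_klines() -> list:
    from binance.client import Client  # pip install python-binance
    client = Client(os.environ["BINANCE_APIKEY"], os.environ["BINANCE_SECRET"])
    # Each kline row contains open time, OHLC prices, volume, close time, ...
    return client.get_historical_klines(
        "BTCUSDT", Client.KLINE_INTERVAL_1DAY, "1 year ago UTC"
    )
```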
The alternative.me Fear & Greed data can immediately be fetched with no worries, since no API key creation etc. is required.
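A sketch of fetching and parsing the index: the URL is alternative.me's public endpoint (no key needed, limit=365 matching the one-year window), but the field handling is an assumption about the payload:

```python
# Public Fear & Greed Index endpoint; no API key required.
FNG_URL = "https://api.alternative.me/fng/?limit=365&format=json"

def parse_fng(payload: dict) -> list:
    """Extract (unix_timestamp, index_value) pairs from the API payload."""
    return [(int(row["timestamp"]), int(row["value"]))
            for row in payload.get("data", [])]

# Live usage (network call, shown as a comment):
#   import requests
#   pairs = parse_fng(requests.get(FNG_URL, timeout=10).json())

# Offline check with a sample shaped like the real response:
sample = {"data": [{"value": "54", "value_classification": "Neutral",
                    "timestamp": "1713139200"}]}
print(parse_fng(sample))  # [(1713139200, 54)]
```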
Mage is used as the pipeline orchestration tool. After the required credentials have been acquired, the data loader fetches the data. The data is then passed to the transformer block for data cleaning, type casting and partitioning operations. Finally, the formatted data hits the exporter block, which writes it to its respective BigQuery tables.
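The transformer block's cleaning, casting and partitioning could look roughly like the function below, written as plain pandas rather than a Mage @transformer block; the column names are assumptions about the Binance kline schema, not the repo's actual code:

```python
import pandas as pd

# Illustrative transformer logic: drop incomplete rows, cast types,
# and derive a date column usable as a BigQuery partitioning key.
def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna(subset=["close_time", "close"])               # cleaning
    df["close"] = df["close"].astype(float)                      # type casting
    df["close_time"] = pd.to_datetime(df["close_time"], unit="ms")
    df["partition_date"] = df["close_time"].dt.date.astype(str)  # partitioning
    return df

raw = pd.DataFrame({"close_time": [1713139200000], "close": ["63500.5"]})
print(transform(raw))
```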
- run
bash build.sh
to fire up the containers for the data ingestion pipeline and the Streamlit dashboard; access Mage via localhost:6789 in a web browser (Chrome recommended)
The following two operations can optionally be run locally in order to enrich the data by reading, processing and writing it back to BigQuery:
- Reddit:
  - enrich the Reddit text data with sentiment scores using a Hugging Face model out of the box
  - create a conda env and install the requirements
  - run NLP/sentiment_main.py
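The Reddit enrichment step could look like the sketch below: the Hugging Face call is shown as comments (the concrete model is left open), while the percentage aggregation the dashboard would display is runnable as-is:

```python
# Scoring posts with an out-of-the-box Hugging Face sentiment model
# (shown as comments; any sentiment-analysis pipeline works):
#   from transformers import pipeline
#   classify = pipeline("sentiment-analysis")
#   labels = [classify(text)[0]["label"] for text in post_texts]

def positive_share(labels: list) -> float:
    """Percentage of posts labelled POSITIVE."""
    if not labels:
        return 0.0
    return 100.0 * sum(label == "POSITIVE" for label in labels) / len(labels)

print(positive_share(["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]))  # 50.0
```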
- Binance:
  - train and fine-tune Facebook's Prophet model to obtain simple trend forecasts for Bitcoin's closing price
  - create a conda env and install the requirements
  - run timeseries_forecasting/prophet2/main.py
  - accessible through localhost:8500
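Prophet expects a two-column frame named ds (timestamp) and y (target); a sketch of reshaping the stored closing prices into that shape follows, where the input column names are assumptions about the BigQuery table:

```python
import pandas as pd

# Reshape stored klines into Prophet's required "ds"/"y" layout;
# "close_time" and "close" are assumed input column names.
def to_prophet_frame(klines: pd.DataFrame) -> pd.DataFrame:
    return (klines.rename(columns={"close_time": "ds", "close": "y"})
                  .loc[:, ["ds", "y"]]
                  .assign(ds=lambda d: pd.to_datetime(d["ds"])))

# Fitting and forecasting 30 days ahead would then look like:
#   from prophet import Prophet
#   m = Prophet()
#   m.fit(to_prophet_frame(klines))
#   forecast = m.predict(m.make_future_dataframe(periods=30))
```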
- the dashboard aims at displaying the following charts if all data has been successfully loaded and enriched:
- candlestick chart for the bitcoin trend analytics including volume bars
- time series 1: positive sentiment score in %
- time series 2: fear and greed metrics in %
- closing price forecast (30 days ahead)
- donut chart 1: sentiment scores (%)
- donut chart 2: fear & greed index (%)
- donut chart 3: correlation of closing price vs fear and greed idx
Note:
- charts 1 to 3 come with a zoom-in mechanism
- charts 4 to 7 can be selected for the current day or a custom date range
- however, if the data is not available, an exception message is shown
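Donut chart 3 reports the correlation between the closing price and the Fear & Greed index; a toy sketch of that computation (the numbers are made up and deliberately perfectly linear):

```python
import pandas as pd

# Toy data standing in for the joined BigQuery tables; real values would
# come from the warehouse, not hard-coded frames.
df = pd.DataFrame({
    "close":      [60000, 61000, 63000, 62000, 64000],
    "fear_greed": [40,    45,    55,    50,    60],
})
corr = df["close"].corr(df["fear_greed"])  # Pearson by default
print(round(corr, 2))  # 1.0 for this perfectly linear toy data
```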
- use the playground to get the test pipeline and dashboard running as per the instructions below:
- run:
git clone https://github.com/Ocean-code-1995/CryptoPulse-Sentiment-Trends
- store it on the Desktop!
- the local testing setup is stored in the playground dir
To be able to access the data from Reddit and Binance, the same API credentials as listed in the cloud section above need to be stored in: DataEng_Cyrpto_Sentiment/.env_vars (see the Reddit and Binance credential walkthroughs there).
- cd into playground first
- conda create --name cryptopulse_env python=3.10
- conda activate cryptopulse_env
- pip install -r requirements.txt
- cd into data_acquisition/batch_processing/alternative_me
- run a_main.py
- cd into data_acquisition/batch_processing/binance
- run b_main.py
- cd into data_acquisition/batch_processing/reddit
- run r_main.py
After the data has been piped, the ML pipelines fetch the data again, process and enrich it with predictions, and finally write them back to the storage.
- cd into NLP
- python sentiment_main.py
- cd into timeseries_forecasting/prophet
- conda create --name prophet_env python=3.8
- conda activate prophet_env
- pip install -r requirements.txt
- python main.py
- cd into dashboard
- conda create --name dashboard_env python=3.10
- conda activate dashboard_env
- pip install -r dashboard_requirements.txt
- streamlit run dashboard_app.py --server.port 8501
- open localhost:8501 in a browser (Chrome preferably)