
Tutor task172 spring2025 ingest and analyze bitcoin prices using apache flink#200

Closed
tharun2k1 wants to merge 10 commits into master from TutorTask172_Spring2025_Ingest_and_Analyze_Bitcoin_Prices_Using_Apache_Flink


tharun2k1 (Collaborator) commented Apr 11, 2025

Link to issue: Issue#172
Link to folder: #folder
Link to README.md
Project Title: Real-Time Bitcoin Price Analysis using PyFlink and InfluxDB

Overview:
This project demonstrates an end-to-end, real-time Bitcoin price analysis pipeline. It streams live prices from the CoinGecko API, logs metrics to InfluxDB, fetches historical data from Yahoo Finance, and forecasts and visualizes future trends with NeuralProphet.

Initial Setup (Prerequisites)

Step 1: Clone the Repository
git clone https://github.com/[your-username]/tutorials.git
cd tutorials/DATA605/Spring2025/projects/TutorTask172_Spring2025_Ingest_and_Analyze_Bitcoin_Prices_Using_Apache_Flink/tutorial_template/docker_data605_style

Step 2: Install Docker (If Not Already Installed)
Make sure Docker (with Docker Compose) is installed and the Docker daemon is running.

Step 3: Build the Container Environment
Make the shell scripts executable and run the build script to build your custom Docker image:
chmod +x docker_*.sh
./docker_build.sh

Key Features:

🔁 Real-time Bitcoin price fetching with rolling metrics (MA, EMA, StdDev, etc.)

📥 Data storage in InfluxDB with Dockerized setup

📈 Forecasting using NeuralProphet with historical Yahoo Finance data

📊 Visual plots and seasonality analysis

✅ Clean modular structure using bitcoin_utils.py API
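The rolling metrics listed above can be illustrated with a small, dependency-free sketch. It is not the code in bitcoin_utils.py (which uses pandas/numpy); the class name RollingMetrics, the window size, and the EMA smoothing factor 2 / (window + 1) are assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean, stdev


class RollingMetrics:
    """Hypothetical helper: rolling MA, EMA, and std dev over the last N prices.

    The window size and EMA smoothing factor are assumed values,
    not the project's actual settings.
    """

    def __init__(self, window: int = 5) -> None:
        self.prices = deque(maxlen=window)  # oldest price drops out automatically
        self.alpha = 2 / (window + 1)       # common EMA smoothing factor
        self.ema = None

    def update(self, price: float) -> dict:
        self.prices.append(price)
        # The EMA starts at the first price, then blends each new price in.
        if self.ema is None:
            self.ema = price
        else:
            self.ema = self.alpha * price + (1 - self.alpha) * self.ema
        return {
            "price": price,
            "ma": mean(self.prices),
            "ema": self.ema,
            # Sample standard deviation needs at least two points.
            "std": stdev(self.prices) if len(self.prices) > 1 else 0.0,
        }


if __name__ == "__main__":
    metrics = RollingMetrics(window=3)
    for p in [100.0, 102.0, 101.0, 105.0]:
        print(metrics.update(p))
```

Each call to update() returns one record that could be written to InfluxDB as a single point; keeping state in a bounded deque means memory stays constant no matter how long the stream runs.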

What's Included:

bitcoin_utils.py: Main API logic

bitcoin.API.ipynb: Shows how to use the streaming API

bitcoin.Fetch.ipynb: Full historical fetch → forecast → visualization pipeline

bitcoin.API.md and bitcoin.fetch.md: Technical documentation

docker-compose.yml, Dockerfile, .env: Complete deployment stack

README.md: Full setup guide with run instructions and explanation

Setup Requirements:

  • Docker + Docker Compose
  • Token setup via InfluxDB UI on first run
  • Port availability: 8086 (InfluxDB), 8888 (Jupyter)
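Because both services need their ports free on the host, a quick pre-flight check can save a confusing container start-up failure. This is a minimal sketch, not part of the project; the function name is_port_free is hypothetical, and it only tests whether the port can be bound on 127.0.0.1.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # typically EADDRINUSE: another process holds the port
            return False
    return True


# Ports listed in the setup requirements.
REQUIRED_PORTS = {8086: "InfluxDB", 8888: "Jupyter"}

if __name__ == "__main__":
    for port, service in REQUIRED_PORTS.items():
        status = "free" if is_port_free(port) else "IN USE"
        print(f"{service:<8} port {port}: {status}")
```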

How to Run:
Described step by step in README.md (including how to generate the token, restart the containers, and launch the notebooks).

Why Two Docker Containers

We run two containers for a clean separation of concerns:

  • influxdb_container: Runs the InfluxDB service to store time-series data.
  • umd_data605_app: Runs the application (Python + Jupyter + PyFlink) that fetches Bitcoin prices and sends metrics to InfluxDB.

Keeping them separate ensures:

  • Each container has a single responsibility.
  • Easier debugging, scaling, and maintenance.
  • Flexibility to replace or upgrade one service without touching the other.

Why a Docker Network

Docker containers are isolated by default. To let them communicate (e.g., the app pushing data into InfluxDB), we connect them through a custom bridge network (flink_influx_network).
This ensures that:

  • The app can reach InfluxDB at http://influxdb_container:8086 (on a user-defined network, Docker's built-in DNS resolves the container name as a hostname).
  • Both services can discover each other but stay isolated from the host unless their ports are explicitly published.
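The hostname behaviour above has a practical consequence: the same code sees a different InfluxDB address depending on where it runs. A minimal sketch, assuming the container name influxdb_container and the published port 8086 from this setup; the helper influx_base_url is hypothetical.

```python
import socket


def influx_base_url(container_host: str = "influxdb_container", port: int = 8086) -> str:
    """Choose the InfluxDB URL based on whether the container name resolves.

    Inside flink_influx_network, Docker's embedded DNS resolves the container
    name. On the host machine it usually does not, so fall back to the port
    published on localhost.
    """
    try:
        socket.gethostbyname(container_host)
        host = container_host
    except OSError:  # socket.gaierror is a subclass of OSError
        host = "localhost"
    return f"http://{host}:{port}"
```

Called from the app container, influx_base_url() should return http://influxdb_container:8086; called from a notebook running directly on the host, it falls back to http://localhost:8086.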

Why Set Up InfluxDB and Generate Tokens

  • InfluxDB 2.x uses token-based authentication for secure access.
  • On first-time setup, we must:
      1. Run only the InfluxDB container.
      2. Open http://localhost:8086 and create an admin user, an organization, and a bucket.
      3. Generate an All-Access Token.
  • The app container needs this token to authenticate and write metrics to InfluxDB securely.
  • Once the token is created, we store it in the .env file, and docker-compose.yml injects it into the app container automatically.
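On the app side, the injected values end up as environment variables. The sketch below reads and validates them so a missing token fails fast with a clear message instead of a later 401 from InfluxDB. The variable names INFLUXDB_URL, INFLUXDB_TOKEN, INFLUXDB_ORG, and INFLUXDB_BUCKET, and the helper load_influx_settings, are assumptions; match them to the names actually used in your .env.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class InfluxSettings:
    url: str
    token: str
    org: str
    bucket: str


def load_influx_settings(env: Mapping[str, str] = os.environ) -> InfluxSettings:
    """Read InfluxDB settings injected from .env (variable names are assumed)."""

    def required(name: str) -> str:
        value = env.get(name, "").strip()
        if not value:
            raise RuntimeError(
                f"{name} is not set; add it to .env "
                "(token, org, and bucket come from the InfluxDB UI at http://localhost:8086)"
            )
        return value

    return InfluxSettings(
        # Default to the container hostname used on the shared Docker network.
        url=env.get("INFLUXDB_URL", "http://influxdb_container:8086"),
        token=required("INFLUXDB_TOKEN"),
        org=required("INFLUXDB_ORG"),
        bucket=required("INFLUXDB_BUCKET"),
    )
```

The url, token, and org fields map directly onto the influxdb-client constructor, InfluxDBClient(url=..., token=..., org=...), and bucket is what you pass when writing points.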

Docker Components

  • Dockerfile – Sets up Python, PyFlink, and all dependencies
  • docker-compose.yml – Runs two containers: one for the app and one for InfluxDB
  • .env – Stores the InfluxDB token and other config variables


Tools & Technologies Used:

  • Python 3.8 – Main programming language
  • Docker & Docker Compose – Containerized environment and multi-service orchestration
  • Jupyter Notebook – For running interactive analysis and forecasts
  • InfluxDB 2.0 – Time-series database to store BTC price and stats
  • Apache Flink (PyFlink) – For real-time Bitcoin price streaming and processing
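A core idea behind stream processing in Flink is windowing: grouping an unbounded stream into finite time buckets before aggregating. The sketch below is plain Python, not the PyFlink API, and only illustrates what a tumbling (fixed, non-overlapping) window average computes; the function name and the one-minute window are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_avg(
    events: Iterable[Tuple[int, float]], window_ms: int
) -> Dict[int, float]:
    """Average price per fixed, non-overlapping time window.

    events: (timestamp_ms, price) pairs in any order.
    Returns {window_start_ms: average_price}, ordered by window start.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, price in events:
        start = ts - ts % window_ms  # align the timestamp to its window boundary
        sums[start] += price
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


if __name__ == "__main__":
    ticks = [(0, 100.0), (20_000, 102.0), (61_000, 104.0), (119_000, 106.0)]
    print(tumbling_window_avg(ticks, window_ms=60_000))
```

With one-minute windows, the four ticks fall into two windows, giving {0: 101.0, 60000: 105.0}. Flink does the same grouping incrementally and emits each window's result once it closes, rather than holding the whole stream in memory.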

Python Libraries Used:

  • neuralprophet – Forecasting future Bitcoin prices
  • influxdb-client – Writing to InfluxDB from Python
  • pandas, numpy – Data manipulation and computation
  • yfinance – Historical BTC data from Yahoo Finance
  • requests – API calls to CoinGecko for live prices
  • matplotlib – Plotting and visualization
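influxdb-client ultimately sends data to InfluxDB in line protocol (its Point class builds this for you). The sketch below shows that wire format for one BTC metrics point. It is simplified (every field is written as a float, and backslashes are not escaped), and the measurement name btc_price and the tag source are hypothetical.

```python
def _escape_measurement(name: str) -> str:
    # Measurement names escape commas and spaces only.
    return name.replace(",", r"\,").replace(" ", r"\ ")


def _escape_key(text: str) -> str:
    # Tag keys, tag values, and field keys also escape '='.
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Serialize one point as: measurement[,tag=value...] field=value[,...] timestamp_ns."""
    # Tags sorted by key, as InfluxDB recommends for write performance.
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_escape_key(k)}={float(v)!r}" for k, v in fields.items())
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {ts_ns}"
```

For example, to_line_protocol("btc_price", {"source": "coingecko"}, {"price": 67000.5, "ma": 66950}, 1700000000000000000) produces btc_price,source=coingecko price=67000.5,ma=66950.0 1700000000000000000.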

🌐 APIs Used:

  • CoinGecko API – Real-time Bitcoin price
  • Yahoo Finance (via yfinance) – Historical BTC price data
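For reference, CoinGecko's public /simple/price endpoint returns JSON shaped like {"bitcoin": {"usd": <price>}}. The sketch below uses the standard library's urllib instead of requests so that it stays dependency-free. The function names and the User-Agent string are hypothetical, and the free API is rate-limited, so a real streaming loop should poll no faster than those limits allow.

```python
import json
import urllib.request

COINGECKO_URL = (
    "https://api.coingecko.com/api/v3/simple/price"
    "?ids=bitcoin&vs_currencies=usd"
)


def parse_btc_usd(payload: str) -> float:
    """Extract the BTC/USD price from a /simple/price JSON response."""
    data = json.loads(payload)
    return float(data["bitcoin"]["usd"])


def fetch_btc_usd(timeout: float = 10.0) -> float:
    """Fetch the current BTC price in USD (performs a network call)."""
    request = urllib.request.Request(
        COINGECKO_URL,
        headers={"Accept": "application/json", "User-Agent": "btc-demo/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_btc_usd(resp.read().decode("utf-8"))
```

Keeping parsing separate from fetching makes the parsing logic testable offline; calling fetch_btc_usd() from the app container returns a single float that can feed the rolling metrics and the InfluxDB write.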

tkpratardan (Contributor) left a comment

Good job!

Keep working in this branch. We can merge a checkpoint after some progress has been made.

This contains changes to bitcoin_flink_job.py and other Docker changes.
tharun2k1 requested a review from Prahar08modi on April 27, 2025 02:00
tharun2k1 added 2 commits May 17, 2025 22:41
… InfluxDB

Included:
- `bitcoin_utils.py`: Core API for fetching BTC prices, computing metrics, writing to InfluxDB
- `bitcoin.API.ipynb`: Demonstrates API class usage (streaming 10 data points)
- `bitcoin.Fetch.ipynb`: Full pipeline including historical data, NeuralProphet forecast, plots
- `docker-compose.yml`: Orchestrates InfluxDB and Jupyter app in a shared Docker network
- `Dockerfile`: Builds the Python environment for Jupyter + required packages
- `.env`: Stores runtime InfluxDB token and config (excluded token value for security)
- `bitcoin.API.md`: Markdown docs explaining API functionality and usage
- `bitcoin.fetch.md`: Markdown documentation for full pipeline demo (stream + forecast)
- `README.md`: Complete setup, run, and usage guide with step-by-step instructions

Excluded:
- `.DS_Store`, `.ipynb_checkpoints/`, `build logs`, and other autogenerated/OS-specific files (via `.gitignore`)

 Ready for instructor review and reproduction on any system with Docker installed.
tharun2k1 (Collaborator, Author)

Hi @tkpratardan @gpsaggese @Prahar08modi,

I have completed the project. Kindly review it when you get a chance.

Issue ID: #72
PR Link: #PR

tharun2k1 requested a review from tkpratardan on May 21, 2025 16:27
tkpratardan and others added 4 commits June 12, 2025 02:34
…es_Using_Apache_Flink' of github.com:causify-ai/tutorials into TutorTask172_Spring2025_Ingest_and_Analyze_Bitcoin_Prices_Using_Apache_Flink