# Bitcoin Market & News Analytics: Data Engineering Pipeline

## Overview

This technical assessment evaluates your data engineering skills. You will implement a dual-pipeline system: a real-time pipeline for Bitcoin market data and a batch pipeline for Bitcoin news analytics. Both pipelines follow the medallion architecture pattern (Bronze-Silver-Gold) and together form a complete data processing workflow.

## Project Requirements

### Data Sources
- **Real-time Data**: Bitcoin price data from any public cryptocurrency API
- **Batch Data**: Bitcoin news CSV (provided at `data/news_btc.csv`)

### Pipeline Architecture
Implement a Bronze-Silver-Gold medallion architecture with the following layers:

#### Bronze Layer
- Raw data ingestion without modification
- Schema validation for consistency
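
As an illustration of bronze-level schema validation, the sketch below checks a raw record against an expected set of fields and types without changing the record. The field names (`symbol`, `price_usd`, `fetched_at`) and the `PRICE_SCHEMA` mapping are hypothetical; the real shape depends on the API you choose.

```python
# Hypothetical bronze schema for one price record: field name -> accepted types.
# The actual fields depend on the API you pick; these names are placeholders.
PRICE_SCHEMA = {
    "symbol": (str,),
    "price_usd": (int, float),
    "fetched_at": (str,),
}


def validate_record(record, schema):
    """Return a list of schema problems; an empty list means the record passes.

    The record itself is never modified, which keeps the bronze layer raw.
    """
    problems = []
    for field, types in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"wrong type for {field}: {type(value).__name__}")
    return problems
```

Records that fail validation could be logged and routed to a rejects table rather than dropped silently, so the raw input stays auditable.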

#### Silver Layer
- Data cleaning and standardization
- Timestamp normalization
- Missing value handling
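
One possible approach to the silver-layer steps is sketched below: timestamps are converted to ISO 8601 strings in UTC, and gaps in a price series are forward-filled. Treating naive timestamps as UTC and choosing forward fill are assumptions made for this sketch; other policies (dropping rows, interpolation) may suit your data better.

```python
from datetime import datetime, timezone


def normalize_timestamp(value):
    """Convert epoch seconds or an ISO 8601 string to an ISO 8601 UTC string.

    Assumption: timestamps without an offset are already in UTC.
    """
    if isinstance(value, (int, float)):
        dt = datetime.fromtimestamp(value, tz=timezone.utc)
    else:
        text = str(value).strip()
        # Older Python versions do not parse a trailing "Z", so map it to +00:00.
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        dt = datetime.fromisoformat(text)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat()


def forward_fill(values):
    """Replace None with the most recent non-None value.

    Leading None entries stay None because there is nothing to carry forward.
    """
    filled = []
    last = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled
```

Storing every timestamp in one format and time zone makes it straightforward to join price rows with news rows later.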

#### Gold Layer
- Feature engineering (price indicators, news sentiment)
- Aggregations and metrics calculation
- Data quality testing
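
A gold-layer aggregation might look like the following sketch, which rolls silver price rows up into daily open/high/low/close values. It assumes each row is a `(timestamp, price)` pair whose timestamp is an ISO 8601 UTC string in a single consistent format; the function name `daily_ohlc` is illustrative only.

```python
from collections import defaultdict


def daily_ohlc(rows):
    """Aggregate (iso_utc_timestamp, price) pairs into per-day OHLC metrics.

    Assumes all timestamps share one ISO 8601 UTC format, so sorting the
    strings sorts the rows chronologically and ts[:10] is the calendar date.
    """
    by_day = defaultdict(list)
    for ts, price in sorted(rows):
        by_day[ts[:10]].append(price)
    return {
        day: {
            "open": prices[0],
            "high": max(prices),
            "low": min(prices),
            "close": prices[-1],
            "n": len(prices),
        }
        for day, prices in by_day.items()
    }
```

Keeping the observation count `n` alongside each aggregate gives the data quality tests something concrete to assert on, such as a minimum number of samples per day.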

### Technical Components
- Real-time pipeline with configurable update frequency
- Batch processing for news data analysis
- SQLite/PostgreSQL database for persistent storage
- Comprehensive logging and error handling
- Documentation of code and architecture decisions
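
To make the update-frequency, logging, and error-handling components concrete, here is a minimal polling loop sketch. The `fetch` and `handle` callables are hypothetical placeholders for your own API client and bronze writer, and the logger name `btc_pipeline` is arbitrary.

```python
import logging
import time

logger = logging.getLogger("btc_pipeline")


def run_polling(fetch, handle, interval_seconds=60.0, max_iterations=None,
                sleep=time.sleep):
    """Call fetch() every interval_seconds and pass each result to handle().

    fetch and handle are supplied by the caller (hypothetical API client and
    storage writer). A failure in either is logged with its traceback and the
    loop continues. Returns the number of iterations that succeeded.
    """
    iteration = 0
    successes = 0
    while max_iterations is None or iteration < max_iterations:
        iteration += 1
        try:
            handle(fetch())
            successes += 1
            logger.info("iteration %d succeeded", iteration)
        except Exception:
            # Log and keep going so one bad response does not stop the pipeline.
            logger.exception("iteration %d failed", iteration)
        sleep(interval_seconds)
    return successes
```

Passing `sleep` and `max_iterations` as parameters keeps the loop testable without real waiting; in a notebook you might run it with a small `max_iterations` to collect a sample.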

## Tasks

### Task 1: Configure Data Ingestion
- Set up API connection for Bitcoin prices
- Load the news CSV into a DataFrame
- Create database schema for both data types
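
The sketch below shows one way the ingestion setup could start, using only the standard library: it creates two bronze tables in SQLite and loads CSV rows unchanged as JSON text. The table names and the JSON `payload` column are assumptions made here so that no particular CSV or API column layout has to be guessed; a pandas-based loader would work equally well.

```python
import csv
import io
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical bronze tables: each raw record is stored as JSON text,
# so no assumption is made about the CSV or API columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bronze_prices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ingested_at TEXT NOT NULL,
    payload TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS bronze_news (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ingested_at TEXT NOT NULL,
    payload TEXT NOT NULL
);
"""


def init_db(path=":memory:"):
    """Open (or create) the SQLite database and ensure the bronze tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def ingest_news_csv(conn, csv_file):
    """Insert every row of an open CSV text file into bronze_news, unmodified.

    Returns the number of rows inserted.
    """
    now = datetime.now(timezone.utc).isoformat()
    rows = [(now, json.dumps(row)) for row in csv.DictReader(csv_file)]
    conn.executemany(
        "INSERT INTO bronze_news (ingested_at, payload) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

For the provided file, the call would be along the lines of `ingest_news_csv(conn, open("data/news_btc.csv", newline="", encoding="utf-8"))`, where the UTF-8 encoding is itself an assumption to verify against the file.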

### Task 2: Build Bronze → Silver → Gold Pipeline
- Implement real-time data processing
- Process historical news data
- Add data quality checks at each stage
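
Per-stage data quality checks can be expressed as named predicates over a batch of rows, as in this sketch. The check names and the row keys `ts` and `price_usd` are hypothetical examples for a silver price table.

```python
def run_quality_checks(rows, checks):
    """Run each named check against rows and return the names of failed checks.

    checks maps a check name to a function taking the row list and
    returning True when the check passes.
    """
    return [name for name, check in checks.items() if not check(rows)]


# Hypothetical checks for silver price rows shaped like
# {"ts": <ISO timestamp string>, "price_usd": <float>}.
SILVER_PRICE_CHECKS = {
    "non_empty": lambda rows: len(rows) > 0,
    "price_positive": lambda rows: all(
        r["price_usd"] is not None and r["price_usd"] > 0 for r in rows
    ),
    "timestamps_unique": lambda rows: len({r["ts"] for r in rows}) == len(rows),
}
```

A pipeline stage could call `run_quality_checks` before writing its output and either log the failed names or stop promotion to the next layer, depending on how strict the stage should be.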

### Task 3: Create Analysis Components
- Calculate technical indicators for price data
- Extract sentiment scores from news headlines
- Correlate price movements with news sentiment
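
The three analysis steps above can be sketched with the standard library alone. The simple moving average stands in for "technical indicators", and the tiny word lists are a toy placeholder for sentiment scoring, not a recommended model; a dedicated sentiment library would normally replace them. All function names and word lists here are illustrative assumptions.

```python
import math


def sma(values, window):
    """Simple moving average; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


# Toy lexicon for illustration only; a real solution would use a proper model.
POSITIVE = {"rally", "surge", "gain", "bullish"}
NEGATIVE = {"crash", "plunge", "loss", "bearish"}


def headline_sentiment(headline):
    """Score a headline in [-1, 1] from counts of lexicon words (0.0 if none)."""
    words = [w.strip(".,!?:;\"'").lower() for w in headline.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; None if either is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)
```

In practice the two series passed to `pearson` would be aligned on the same time grain, for example daily price returns against the mean headline sentiment for that day.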

## Deliverables (GitHub Repository)
A GitHub repository containing the following:

- Fully functional Python code in a Jupyter notebook
- SQLite database with processed data tables
- Brief documentation explaining your approach
- README file with instructions to run the pipeline
- Bonus: implement a sample strategy using the processed data

## Evaluation Criteria

Your solution will be evaluated based on:
- Code quality and organization
- Pipeline architecture and robustness
- Error handling and logging implementation
- Documentation clarity
- Technical approach to data processing
- Working end-to-end pipeline