# 🚀 BTC Price Prediction Pipeline - Colab Edition

This notebook allows you to run the full BTC Price Prediction pipeline in Google Colab.

## Features
- 📥 **Data Collection**: Fetch data from YFinance, CoinGecko, and social media.
- ⚙️ **Processing**: Clean the data and engineer features.
- 🧠 **Training**: Train LSTM, Transformer, and Ensemble models.
- 📊 **Evaluation**: Visualize predictions and metrics.


## 1. Setup & Installation

In [None]:
# @title Clone Repository
import os

# Clone the repository (replace with your repo URL if different)
REPO_URL = "https://github.com/bimoBintang/btc-price-prediction-hybrid-lstm-sentiment-crispdm.git"
REPO_NAME = "btc-price-prediction-hybrid-lstm-sentiment-crispdm"

if not os.path.exists(REPO_NAME):
    !git clone $REPO_URL
else:
    print("Repository already cloned!")

# Change working directory
os.chdir(REPO_NAME)
print(f"Current working directory: {os.getcwd()}")

In [None]:
# @title Install Dependencies
!pip install -r requirements.txt
# Install snscrape from git to fix Python 3.12 compatibility issues
!pip install git+https://github.com/JustAnotherArchivist/snscrape.git
!pip install yfinance ta textblob twscrape

## 2. Configuration
Set up your environment variables and API keys.

In [None]:
# @title Set API Keys
import os
from google.colab import userdata

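# Load the key from Colab secrets if it exists; the key is optional, so nothing is prompted
try:
    os.environ['COINGECKO_API_KEY'] = userdata.get('COINGECKO_API_KEY')
    print("Loaded COINGECKO_API_KEY from secrets")
except Exception:
    # Covers a missing secret as well as a notebook without access to secrets
    print("COINGECKO_API_KEY not found in secrets (Optional)")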

# Create a .env file for the pipeline to use
with open(".env", "w") as f:
    f.write(f"COINGECKO_API_KEY={os.environ.get('COINGECKO_API_KEY', '')}\n")
    # Add other keys as needed
    
print(".env file created successfully")
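
How the pipeline itself loads `.env` is not shown in this notebook. As a quick sanity check, the sketch below reads simple `KEY=VALUE` lines back using only the standard library. `parse_env_text` is a hypothetical helper, not part of the repository, and it ignores quoting and `export` prefixes that full dotenv parsers handle.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper)."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Sample content in the same shape as the file written above
sample = "COINGECKO_API_KEY=demo-key\n# a comment\nEMPTY_KEY=\n"
parsed = parse_env_text(sample)
print(parsed)
```

In the notebook, `parse_env_text(open(".env").read())` would show which keys were written and whether any are empty.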

## 3. Run Pipeline
Execute the pipeline steps separately or all at once.

In [None]:
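import os
import sys
import pandas as pd
import matplotlib.pyplot as plt

# Add the repo root (needed for the `src.*` imports below) and src to the path
repo_root = os.getcwd()
for path in (repo_root, os.path.join(repo_root, 'src')):
    if path not in sys.path:
        sys.path.append(path)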

from src.pipeline.data_collection.data_collector import DataCollector
from src.pipeline.data_processing.data_processor import DataProcessor
from src.pipeline.models.ensemble_model import EnsembleModel
from src.pipeline.evaluation.model_evaluator import ModelEvaluator

In [None]:
# @title Step 1: Data Collection
DAYS_TO_COLLECT = 365 # @param {type:"integer"}

print("Starting Data Collection...")
collector = DataCollector()
# We'll focus on Price data for the demo to ensure stability without complex API keys
price_data = collector.collect_price_data(days=DAYS_TO_COLLECT)
print(f"Collected {len(price_data['historical_ohlcv'])} price records")

# Convert dictionary to dataframe if needed for visualization
df_price = pd.DataFrame(price_data['historical_ohlcv'])
df_price.plot(y='close', title='BTC Price History', figsize=(10, 5))
plt.show()
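
Before moving on to processing, it can help to confirm that the collected frame looks like OHLCV data. The sketch below is one possible check. `check_ohlcv` is a hypothetical helper, and the column names other than `close` are assumptions about what `collect_price_data` returns, so adjust them to match the real frame.

```python
import pandas as pd


def check_ohlcv(df, required=("open", "high", "low", "close", "volume")):
    """Return a list of problems found in an OHLCV frame (hypothetical helper)."""
    problems = []
    missing = [c for c in required if c not in df.columns]
    if missing:
        # Without the expected columns the remaining checks cannot run
        problems.append(f"missing columns: {missing}")
        return problems
    if df[list(required)].isna().any().any():
        problems.append("NaN values present")
    if (df["high"] < df["low"]).any():
        problems.append("high below low")
    return problems


# Tiny synthetic frame; the second row is deliberately inconsistent
demo = pd.DataFrame({
    "open": [100.0, 102.0],
    "high": [105.0, 101.0],
    "low": [99.0, 103.0],
    "close": [102.0, 102.5],
    "volume": [10.0, 12.0],
})
print(check_ohlcv(demo))
```

Calling `check_ohlcv(df_price)` in the notebook would return an empty list when none of these problems are found.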

In [None]:
# @title Step 2: Data Processing
print("Starting Data Processing...")
processor = DataProcessor()

# Normally we would merge sentiment here, but we'll proceed with price-only for baseline
processed_df = processor.process_full_pipeline(
    price_data=df_price,
    sentiment_data=None, # Optional
    add_targets=True
)

print(f"Processed Data Shape: {processed_df.shape}")
processed_df.tail()
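
The internals of `process_full_pipeline` are not shown here. To illustrate the kind of work a processing step like this does, the sketch below derives a daily return, a simple moving average, and a next-step `target` from a `close` column. `add_basic_features` and its feature set are assumptions for illustration only and are not the repository's `DataProcessor` logic.

```python
import pandas as pd


def add_basic_features(df, window=3):
    """Add a return, a moving average, and a next-step target (illustrative only)."""
    out = df.copy()
    out["return_1d"] = out["close"].pct_change()
    out["sma"] = out["close"].rolling(window).mean()
    # Target is the next row's close, so the last row has no target
    out["target"] = out["close"].shift(-1)
    # Drop rows left incomplete by the rolling window and the shift
    return out.dropna().reset_index(drop=True)


demo = pd.DataFrame({"close": [100.0, 110.0, 121.0, 133.1, 146.41]})
features_demo = add_basic_features(demo)
print(features_demo)
```

Note that the rolling window and the shifted target both remove rows, which is one reason a processed frame is typically shorter than the raw one.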

In [None]:
# @title Step 3: Model Training (Ensemble)
import torch
from src.pipeline.models.ensemble_model import EnsembleModel, EnsembleTrainer
from src.pipeline.models.lstm_gru_model import create_sequences, create_data_loaders

# Prepare Data
SEQ_LENGTH = 30
features = [c for c in processed_df.columns if c not in ['target', 'date']]
X = processed_df[features].values
y = processed_df['target'].values.reshape(-1, 1)

X_seq, y_seq = create_sequences(X, y, seq_length=SEQ_LENGTH)
train_loader, val_loader = create_data_loaders(X_seq, y_seq)

print(f"Training with {len(features)} features")

# Initialize Model
model = EnsembleModel(
    input_size=len(features),
    ensemble_method='weighted'
)

# Train
trainer = EnsembleTrainer(model)
history = trainer.train(train_loader, val_loader, epochs=10)

# Plot Loss
plt.figure(figsize=(10, 5))
plt.plot(history['train_loss'], label='Train Loss')
plt.plot(history['val_loss'], label='Validation Loss')
plt.title('Training History')
plt.legend()
plt.show()
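
For readers unfamiliar with sequence preparation, the sketch below shows one common way to turn a feature matrix into fixed-length windows. `make_windows` is a hypothetical stand-in, and pairing each window with the target of the row that follows it is an assumption; the repository's `create_sequences` may align targets differently.

```python
import numpy as np


def make_windows(X, y, seq_length):
    """Pair each window of seq_length rows with the target that follows it."""
    X_windows, y_windows = [], []
    for start in range(len(X) - seq_length):
        X_windows.append(X[start:start + seq_length])
        y_windows.append(y[start + seq_length])
    return np.array(X_windows), np.array(y_windows)


# 10 time steps with 2 features each, and one target per step
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float).reshape(-1, 1)
X_win, y_win = make_windows(X_demo, y_demo, seq_length=3)
print(X_win.shape, y_win.shape)

# Chronological split: keep the most recent windows for validation
split = int(len(X_win) * 0.8)
X_train, X_val = X_win[:split], X_win[split:]
print(len(X_train), len(X_val))
```

For time series, a chronological split like this avoids validating on windows that precede the training data, unlike a shuffled split.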

In [None]:
# @title Step 4: Evaluation
model.eval()

# Note: X_seq holds every sequence (training and validation),
# so the plot below is not a purely out-of-sample comparison
with torch.no_grad():
    predictions = model.predict(X_seq)

# Plot Predictions vs Actual
plt.figure(figsize=(15, 7))
plt.plot(y_seq, label='Actual', alpha=0.7)
plt.plot(predictions, label='Predicted', alpha=0.7)
plt.title('BTC Price Prediction - Ensemble Model')
plt.legend()
plt.show()
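
The plot gives a visual impression only. A few numeric metrics make runs easier to compare. The sketch below computes RMSE, MAE, and directional accuracy with NumPy. `regression_metrics` is a hypothetical helper and does not reflect the API of the repository's `ModelEvaluator`, which is imported above but not used in this notebook.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and directional accuracy for two aligned series."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_pred - y_true
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    mae = float(np.mean(np.abs(errors)))
    # Directional accuracy: share of steps where predicted and actual moves share a sign
    true_move = np.sign(np.diff(y_true))
    pred_move = np.sign(np.diff(y_pred))
    directional = float(np.mean(true_move == pred_move))
    return {"rmse": rmse, "mae": mae, "directional_accuracy": directional}


metrics = regression_metrics([100, 102, 101, 105], [101, 103, 104, 104])
print(metrics)
```

In the notebook this could be called as `regression_metrics(y_seq, predictions)`, assuming `predictions` is array-like and aligned one-to-one with `y_seq`.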