# Bitcoin Price Prediction — Tutorial Notebook

This tutorial walks through an end-to-end example that:

1. Installs required packages (if missing)
2. Downloads **real** Bitcoin (BTC-USD) historical data from Yahoo Finance
3. Creates time-series features (lags and moving averages)
4. Trains a simple Linear Regression model to predict next-day close
5. Evaluates and plots actual vs predicted prices using Matplotlib

> **Note:** This notebook downloads live data from Yahoo Finance. Make sure your computer is online.

## Step 0 — Install required packages

This cell installs any required packages that are missing from your Python environment (for example, an Anaconda environment). It is safe to re-run: a package is passed to pip only if importing it fails, and pip skips anything that is already installed.

In [None]:
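# Install dependencies (safe to run even if packages are already installed)
import importlib
import subprocess
import sys

def install(pkg):
    # Use the current interpreter's pip so packages land in this environment
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])

# Map each pip package name to its import name (they differ for scikit-learn)
packages = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "yfinance": "yfinance",
}
for pip_name, module_name in packages.items():
    try:
        importlib.import_module(module_name)
    except ImportError:
        print(f"Installing {pip_name} ...")
        install(pip_name)

print("All required packages are installed (or already present).")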
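
If you want to record which versions you ended up with, the standard library can report them. The following is a minimal sketch, not part of the original notebook; the helper name `installed_versions` is our own, and it looks packages up by their pip distribution names.

```python
from importlib import metadata

def installed_versions(dist_names):
    """Map each pip distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Report the packages this tutorial relies on
for name, version in installed_versions(
    ["pandas", "numpy", "matplotlib", "scikit-learn", "yfinance"]
).items():
    print(f"{name}: {version or 'not installed'}")
```

Keeping this output next to your results makes it easier to reproduce the notebook later, since library behavior can change between releases.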

## Step 1 — Load real Bitcoin data from Yahoo Finance

We'll download daily BTC-USD history and keep the closing price and trading volume. By default, the notebook pulls the last 5 years of data.

In [None]:
import yfinance as yf
import pandas as pd

def load_bitcoin_data(period='5y', ticker='BTC-USD'):
    print(f"Downloading {ticker} data for period={period} ...")
    df = yf.download(ticker, period=period, progress=False)
    if df.empty:
        raise RuntimeError("No data downloaded. Check your internet connection or ticker symbol.")
    # Some yfinance versions return MultiIndex columns (field, ticker); keep the field level only
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.get_level_values(0)
    df = df[["Close", "Volume"]].copy()
    df.index = pd.to_datetime(df.index)
    print(f"Downloaded {len(df)} rows. Date range: {df.index.min().date()} to {df.index.max().date()}")
    return df

df = load_bitcoin_data(period='5y')
df.head()
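
Before building features, it can help to confirm that the downloaded frame is clean. Here is a small sketch of such a check; the helper `summarize_price_frame` is hypothetical, and the demo frame is synthetic stand-in data, not real prices. With the real data you would pass in `df` from the cell above.

```python
import numpy as np
import pandas as pd

def summarize_price_frame(df):
    """Report basic data-quality facts for a date-indexed price DataFrame."""
    return {
        "rows": len(df),
        "missing_values": int(df.isna().sum().sum()),
        "duplicate_dates": int(df.index.duplicated().sum()),
        "sorted_by_date": bool(df.index.is_monotonic_increasing),
    }

# Synthetic stand-in for the downloaded data (NOT real prices)
dates = pd.date_range("2024-01-01", periods=5, freq="D")
demo = pd.DataFrame(
    {
        "Close": [100.0, 101.5, np.nan, 103.0, 102.0],
        "Volume": [10, 12, 11, 13, 9],
    },
    index=dates,
)

report = summarize_price_frame(demo)
print(report)
```

A non-zero count of missing values or duplicate dates is worth investigating before the rolling windows in the next step, because those windows quietly propagate gaps.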

## Step 2 — Feature engineering

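Create lag features, a one-day return, and moving averages of price and volume. The target is the next day's closing price (close shifted by -1). Rows left incomplete by the shifts and rolling windows, including the final row, which has no next-day close, are dropped.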

In [None]:
def create_features(df):
    df = df.copy()
    df = df.sort_index()
    df['close'] = df['Close']
    df['lag_1'] = df['close'].shift(1)
    df['lag_2'] = df['close'].shift(2)
    df['return_1'] = df['close'].pct_change(1)
    df['ma_7'] = df['close'].rolling(7).mean()
    df['ma_14'] = df['close'].rolling(14).mean()
    df['vol_7'] = df['Volume'].rolling(7).mean()
    df['target'] = df['close'].shift(-1)  # next-day close
    df = df.dropna()
    return df

feat = create_features(df)
feat[['close','lag_1','lag_2','ma_7','ma_14','vol_7','target']].head()
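
To see how the shifts line up, here is a toy illustration on five made-up prices. It is only a sketch: the 3-day window `ma_3` is chosen to keep the table small and is not one of the tutorial's features.

```python
import pandas as pd

# Five made-up closing prices on consecutive days (illustrative only)
close = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
    name="close",
)

demo = pd.DataFrame({
    "close": close,
    "lag_1": close.shift(1),          # yesterday's close (known today)
    "ma_3": close.rolling(3).mean(),  # mean of today and the two prior days
    "target": close.shift(-1),        # tomorrow's close (what we predict)
})
print(demo)

# Rows with any NaN (window warm-up at the start, missing target at the end) are dropped
clean = demo.dropna()
print(clean)
```

Every feature in a row uses only prices up to that row's date, while the target comes from the following day, which is what keeps the setup free of look-ahead leakage.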

## Step 3 — Train a simple model

We'll use a Linear Regression model. The data is split chronologically, without shuffling (first 80% train, last 20% test), so the model is never trained on dates that come after the test period.

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np

FEATURES = ['close','lag_1','lag_2','ma_7','ma_14','vol_7']

def train_model(df):
    X = df[FEATURES].values
    y = df['target'].values
    split = int(len(df) * 0.8)
    X_train, X_test = X[:split], X[split:]
    y_train, y_test = y[:split], y[split:]
    model = LinearRegression()
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    mae = mean_absolute_error(y_test, preds)
    mse = mean_squared_error(y_test, preds)
    r2 = r2_score(y_test, preds)
    print(f"Test MAE: {mae:.4f}, MSE: {mse:.4f}, R2: {r2:.4f}")
    # return predictions aligned with test index
    test_index = df.index[split:]
    return model, pd.Series(preds, index=test_index), pd.Series(y_test, index=test_index)

model, y_pred, y_true = train_model(feat)
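
Error metrics are easier to interpret next to a baseline. One common choice, which is our suggestion rather than part of the original notebook, is a persistence forecast that predicts tomorrow's close to be equal to today's. The sketch below computes its MAE over the same kind of chronological split; the function name `persistence_mae` and the demo prices are illustrative.

```python
import numpy as np

def persistence_mae(close, train_fraction=0.8):
    """MAE of the 'tomorrow equals today' forecast on the chronological test split."""
    close = np.asarray(close, dtype=float)
    current, target = close[:-1], close[1:]      # (today, tomorrow) pairs
    split = int(len(current) * train_fraction)   # same split rule as the tutorial
    return float(np.mean(np.abs(target[split:] - current[split:])))

# Made-up prices for illustration (NOT real data)
demo_close = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0,
              107.0, 110.0, 109.0, 113.0, 111.0]
baseline = persistence_mae(demo_close)
print(f"Persistence baseline MAE: {baseline:.4f}")
```

On the real data you could call `persistence_mae(feat['close'].values)` and compare the result with the model's test MAE. If the two are close, the model has learned little beyond carrying the last price forward.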

## Step 4 — Plot actual vs predicted prices

This plot overlays the predicted next-day close on the actual next-day close for the test period.

In [None]:
import matplotlib.pyplot as plt

def plot_actual_vs_predicted(actual, predicted, figsize=(12,5), out=None):
    plt.figure(figsize=figsize)
    plt.plot(actual.index, actual.values, label='Actual', linewidth=2)
    plt.plot(predicted.index, predicted.values, label='Predicted', linewidth=2)
    plt.xlabel('Date')
    plt.ylabel('Price (USD)')
    plt.title('Bitcoin: Actual vs Predicted Close (next-day prediction)')
    plt.legend()
    plt.tight_layout()
    if out:
        plt.savefig(out)
        print(f"Saved plot to {out}")
    else:
        plt.show()

plot_actual_vs_predicted(y_true, y_pred)
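
Two lines that track each other closely can hide errors that matter, so a numeric summary is a useful companion to the plot. The following is a rough sketch with made-up numbers; `error_summary` is a hypothetical helper, and with the real results you would pass `y_true` and `y_pred`.

```python
import pandas as pd

def error_summary(actual, predicted):
    """Summarize prediction errors for two Series that share the same index."""
    residual = actual - predicted                # positive = prediction was too low
    pct_error = residual.abs() / actual.abs() * 100
    return {
        "mean_error": float(residual.mean()),    # bias: average signed error
        "max_abs_error": float(residual.abs().max()),
        "mape_percent": float(pct_error.mean()), # mean absolute percentage error
    }

# Made-up values for illustration (NOT real prices or model output)
idx = pd.date_range("2024-01-01", periods=4, freq="D")
actual = pd.Series([100.0, 200.0, 100.0, 200.0], index=idx)
predicted = pd.Series([90.0, 210.0, 105.0, 200.0], index=idx)

summary = error_summary(actual, predicted)
print(summary)
```

A mean error far from zero suggests the model is consistently too high or too low, which is hard to spot on a plot whose y-axis spans tens of thousands of dollars.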

In [None]:
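# Uncomment to save the model (creates the models/ directory if it does not exist)
# import os
# import joblib
# os.makedirs('models', exist_ok=True)
# joblib.dump(model, 'models/linear_model.joblib')
# print('Model saved to models/linear_model.joblib')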