# KNN Algorithm for Bitcoin Price Direction Prediction

This notebook implements a K-Nearest Neighbors (KNN) classifier to predict Bitcoin price direction (Up/Down) using treasury and sentiment data.

**Methodology:**
- Load Bitcoin sentiment dataset with treasury indicators
- Create binary classification target: price_direction (Up/Down) by comparing current close with previous day's close
- Remove OHLC features (open, high, low, close) to prevent data leakage
- Normalize numeric features for KNN distance calculations
- Use sklearn's KNeighborsClassifier (standard KNN implementation)
- Perform stratified train/test split (33% test size)
- Evaluate with accuracy, confusion matrix, and per-class metrics (precision, recall, F1)
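
The steps above can be sketched end to end on synthetic data. This is a minimal illustration, not the notebook's actual pipeline: the feature matrix and labels are randomly generated, `k=5` is an assumed value, and sklearn's `train_test_split`, `accuracy_score`, and `confusion_matrix` stand in for the `mysklearn` helpers used in the cells below.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real feature table: 300 rows, 4 numeric features
X = rng.normal(size=(300, 4))
# Hypothetical labels loosely tied to the first feature, so there is some signal
y = np.where(X[:, 0] + rng.normal(scale=0.5, size=300) > 0, "Up", "Down")

# Stratified split with a 33% test set (sklearn stand-in for the mysklearn helper)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0
)

# Fit the scaler on the training rows only, then apply it to both partitions
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# k=5 is an assumed value, not one taken from the notebook
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)

accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred, labels=["Up", "Down"])
print(f"Accuracy: {accuracy:.3f}")
print(matrix)
```

Fitting the scaler on the training partition alone keeps test-set statistics out of the model, which is one reasonable way to order the normalization and split steps.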


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
import random

# Import mysklearn evaluation functions
from mysklearn.myevaluation import (
    stratified_train_test_split,
    confusion_matrix,
    accuracy_score,
    binary_precision_score,
    binary_recall_score,
    binary_f1_score
)

# Set random seed for reproducibility
random.seed(0)
np.random.seed(0)

print("Libraries imported successfully")
print("=" * 70)


## Step 1: Load and Prepare Dataset


In [None]:
# Load the bitcoin sentiment dataset
df = pd.read_csv('input_data/bitcoin_sentiment.csv')

print("Dataset Shape:")
print(f"  Rows: {df.shape[0]}")
print(f"  Columns: {df.shape[1]}")
print()
print("Column Headers:")
print(df.columns.tolist())
print()
print("First 5 Rows:")
print(df.head())
print()


## Step 2: Create Price Direction Target


In [None]:
# Create the price_direction label: 'Up' when the close exceeds the previous
# day's close, otherwise 'Down'. The first row has no previous close (the
# comparison with NaN is False), so it is labeled 'Down' and dropped below.
df['price_direction'] = np.where(df['close'] > df['close'].shift(1), 'Up', 'Down')

# Remove the first row (no previous day to compare)
df = df.iloc[1:].copy()
df = df.reset_index(drop=True)

print("Price Direction Distribution:")
print(df['price_direction'].value_counts())
print()
print("Label Proportions:")
print(df['price_direction'].value_counts(normalize=True))
print()
print(f"Total instances: {len(df)}")
print()


## Step 3: Remove Data Leakage Features

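Remove the OHLC (open, high, low, close) columns to prevent target leakage: `price_direction` is derived from `close`, so same-day price columns would give the model information about the label itself. The model should predict using only volume, sentiment, and macroeconomic indicators.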
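
To make the leakage risk concrete, here is a small sketch with made-up closing prices. The `daily_change` column is hypothetical and not part of the dataset; it stands for any feature computed from same-day price data. Because the label is defined from the same closes, such a feature reproduces the label exactly.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices; the values are illustrative only
prices = pd.DataFrame({"close": [100.0, 102.0, 101.0, 105.0, 104.0]})

# Same labeling rule as the notebook: 'Up' when close exceeds the previous close
prices["price_direction"] = np.where(
    prices["close"] > prices["close"].shift(1), "Up", "Down"
)

# Hypothetical leaky feature: the day-over-day change in close
prices["daily_change"] = prices["close"].diff()

# Drop the first row, which has no previous day to compare against
prices = prices.iloc[1:].reset_index(drop=True)

# A trivial rule on the leaky feature recovers every label
reconstructed = np.where(prices["daily_change"] > 0, "Up", "Down")
leak_accuracy = (reconstructed == prices["price_direction"]).mean()
print(leak_accuracy)  # 1.0
```

A classifier given such a column would score well without learning anything about sentiment or treasury indicators, which is why the price columns are dropped before training.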


In [None]:
# Features to drop: OHLC prices and metadata columns
columns_to_drop = [
    'timestamp', 'open', 'high', 'low', 'close',  # Timestamp and OHLC prices (leakage)
    'datetime_utc', 'merge_date',  # Date metadata
    'sentiment_missing',  # Missing indicator (not a feature)
    'price_direction'  # Target variable (will be extracted separately)
]

# Create feature dataframe
df_features = df.drop(columns=columns_to_drop, errors='ignore')

print("Features after removing leakage columns:")
print(f"  Remaining columns: {df_features.shape[1]}")
print(f"  Feature names: {df_features.columns.tolist()}")
print()
print("Data types:")
print(df_features.dtypes)
print()


## Step 4: Prepare Data for KNN

KNN computes distances between feature vectors, so every feature must be numeric and on a comparable scale. Convert all features to numeric, then normalize.
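
The effect of scale on nearest-neighbor distances can be seen in a small sketch. The rows below are invented `[volume, sentiment]` pairs, not values from the dataset, and the z-score is computed by hand with NumPy (the same population-standard-deviation formula that `StandardScaler` applies).

```python
import numpy as np

# Hypothetical rows: [volume, sentiment score]; values are illustrative only
X = np.array([
    [1_000_000.0, 0.10],
    [1_010_000.0, 0.90],
    [1_050_000.0, 0.15],
    [1_500_000.0, 0.50],
])


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((a - b) ** 2)))


# Raw distances from row 0: the volume column dominates, so row 1 looks
# closest even though its sentiment score is very different
raw_to_1 = euclidean(X[0], X[1])
raw_to_2 = euclidean(X[0], X[2])

# Z-score each column: subtract the column mean, divide by the column std
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# After scaling, both columns contribute comparably and row 2 is closest
scaled_to_1 = euclidean(X_scaled[0], X_scaled[1])
scaled_to_2 = euclidean(X_scaled[0], X_scaled[2])

print(f"raw:    to row 1 = {raw_to_1:.1f}, to row 2 = {raw_to_2:.1f}")
print(f"scaled: to row 1 = {scaled_to_1:.3f}, to row 2 = {scaled_to_2:.3f}")
```

Without scaling, the neighbor ranking is decided almost entirely by whichever feature has the largest numeric range.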


In [None]:
# Convert all features to numeric (KNN requires numeric input)
# Handle any non-numeric columns
for col in df_features.columns:
    if df_features[col].dtype == 'object':
        # Try to convert to numeric
        df_features[col] = pd.to_numeric(df_features[col], errors='coerce')
        # Fill any NaN values created during conversion with 0
        df_features[col] = df_features[col].fillna(0)

# Keep only numeric columns (any remaining non-numeric columns are dropped)
df_features = df_features.select_dtypes(include=[np.number])

print("Numeric features prepared:")
print(f"  Number of features: {df_features.shape[1]}")
print(f"  Feature names: {df_features.columns.tolist()}")
print()
print("Feature statistics (before normalization):")
print(df_features.describe().round(4))
print()
