# Data Preprocessing

In [1]:
import pandas as pd
import requests
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Function to fetch cryptocurrency data from Binance API
def fetch_cryptocurrency_data(symbol, interval, limit):
    base_url = "https://api.binance.com/api/v3/klines"
    params = {
        "symbol": symbol + "USDT",
        "interval": interval,
        "limit": limit
    }
    response = requests.get(base_url, params=params)
    data = response.json()
    return data

# Function to append latest data to a CSV file
def append_latest_data_to_csv(file_path, symbol, interval):
    data = fetch_cryptocurrency_data(symbol, interval, 1)
    latest_data = data[0]
    df = pd.read_csv(file_path)
    new_row = [latest_data[0], latest_data[1], latest_data[2], latest_data[3], latest_data[4], latest_data[5]]
    df.loc[len(df)] = new_row
    df.to_csv(file_path, index=False)

# Function to update master dataset
def update_master_dataset(file_path, symbol, interval):
    append_latest_data_to_csv(file_path, symbol, interval)


# Model Training and Evaluation

## Random Forest Regressor
Random Forest is an ensemble learning method that combines multiple decision trees to make predictions. It's effective for cryptocurrency price prediction due to its ability to handle non-linear relationships and capture complex interactions between features. Random Forest can work well with a variety of input features, including historical price data, trading volume, technical indicators, and sentiment analysis.


In [2]:
# Function to train a Random Forest model and predict next day's price
# Function to train a Random Forest model and predict next day's price
def predict_next_day_price(file_path):
    df = pd.read_csv(file_path)
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')

    # Calculate daily price change
    df['price_change'] = df['close'].diff(-1)
    df.dropna(subset=['price_change'], inplace=True)

    # Prepare features and target
    X = df[['open', 'high', 'low', 'volume']]
    y = df['price_change']

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train a Random Forest model
    rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
    rf_model.fit(X_train, y_train)
    y_rf_pred = rf_model.predict(X_test)
    mse_rf = mean_squared_error(y_test, y_rf_pred)
    print(f"Random Forest Mean Squared Error: {mse_rf:.2f}")

    # Make prediction for the next day's price change
    latest_features = X.tail(1)
    predicted_price_change = rf_model.predict(latest_features)[0]

    # Calculate the next day's predicted price
    latest_close = df['close'].iloc[-1]
    predicted_next_day_price = latest_close + predicted_price_change

    return predicted_next_day_price

# Example usage
update_master_dataset("btc_data.csv", "BTC", "1d")  # Update the dataset
predicted_price = predict_next_day_price("btc_data.csv")
print(f"The predicted price for the next day for BTC is ${predicted_price:.2f}")

Random Forest Mean Squared Error: 1820400.89
The predicted price for the next day for BTC is $26019.84


## Gradient Boosting
Gradient Boosting is another ensemble learning technique that builds an additive model in a forward stage-wise manner. Models like XGBoost (Extreme Gradient Boosting) and LightGBM (Light Gradient Boosting Machine) are popular implementations of this approach. Gradient Boosting can capture both linear and non-linear relationships in the data, making it suitable for predicting cryptocurrency prices using various features.

In [3]:
from sklearn.ensemble import GradientBoostingRegressor
def predict_next_day_price(file_path):
    df = pd.read_csv(file_path)
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')

    # Calculate daily price change
    df['price_change'] = df['close'].diff(-1)
    df.dropna(subset=['price_change'], inplace=True)

    # Prepare features and target
    X = df[['open', 'high', 'low', 'volume']]
    y = df['price_change']

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train a Gradient Boosting model
    gb_model = GradientBoostingRegressor(n_estimators=100, random_state=42)
    gb_model.fit(X_train, y_train)
    y_gb_pred = gb_model.predict(X_test)
    mse_gb = mean_squared_error(y_test, y_gb_pred)
    print(f"Gradient Boosting Mean Squared Error: {mse_gb:.2f}")

    # Make prediction for the next day's price change
    latest_features = X.tail(1)
    predicted_price_change = gb_model.predict(latest_features)[0]

    # Calculate the next day's predicted price
    latest_close = df['close'].iloc[-1]
    predicted_next_day_price = latest_close + predicted_price_change

    return predicted_next_day_price

# Example usage
update_master_dataset("btc_data.csv", "BTC", "1d")  # Update the dataset
predicted_price = predict_next_day_price("btc_data.csv")
print(f"The predicted price for the next day for BTC is ${predicted_price:.2f}")

Gradient Boosting Mean Squared Error: 1589303.91
The predicted price for the next day for BTC is $25934.99


## Support Vector Regression (SVR)
SVR is a machine learning model that aims to find the best-fitting hyperplane that predicts target values within a certain margin of error. It's useful for cryptocurrency price prediction when dealing with datasets where the relationships between features and prices are not straightforward. SVR can handle both linear and non-linear relationships by using kernel functions to map data into higher-dimensional spaces.

In [4]:
from sklearn.svm import SVR
def predict_next_day_price(file_path):
    df = pd.read_csv(file_path)
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')

    # Calculate daily price change
    df['price_change'] = df['close'].diff(-1)
    df.dropna(subset=['price_change'], inplace=True)

    # Prepare features and target
    X = df[['open', 'high', 'low', 'volume']]
    y = df['price_change']

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train an SVR model
    svr_model = SVR(kernel='linear')
    svr_model.fit(X_train, y_train)
    y_svr_pred = svr_model.predict(X_test)
    mse_svr = mean_squared_error(y_test, y_svr_pred)
    print(f"SVR Mean Squared Error: {mse_svr:.2f}")

    # Make prediction for the next day's price change
    latest_features = X.tail(1)
    predicted_price_change = svr_model.predict(latest_features)[0]

    # Calculate the next day's predicted price
    latest_close = df['close'].iloc[-1]
    predicted_next_day_price = latest_close + predicted_price_change

    return predicted_next_day_price

# Example usage
update_master_dataset("btc_data.csv", "BTC", "1d")  # Update the dataset
predicted_price = predict_next_day_price("btc_data.csv")
print(f"The predicted price for the next day for BTC is ${predicted_price:.2f}")

SVR Mean Squared Error: 17871876.23
The predicted price for the next day for BTC is $24737.50


# Results and Discussion:

The models were evaluated based on their MSE values, which provide insight into how closely the predicted values align with the actual values. The results are as follows:

#### Random Forest Regressor:
Mean Squared Error: 1820400.89
Predicted Price for Next Day: $26019.84

#### Gradient Boosting Regressor:
Mean Squared Error: 1589303.91
Predicted Price for Next Day: $25934.99

#### Support Vector Regressor (SVR):
Mean Squared Error: 17871876.23
Predicted Price for Next Day: $24737.50



# Conclusion and Recommendations:
Based on the evaluation results, the Gradient Boosting Regressor exhibited the lowest Mean Squared Error among the three models, closely followed by the Random Forest Regressor. These two models demonstrate their ability to capture the underlying patterns in cryptocurrency price data and make accurate predictions.

While the SVR model had the highest MSE, its performance could potentially improve with hyperparameter tuning or by incorporating additional features. However, the SVR model's current performance suggests that it might not be the best choice for predicting cryptocurrency prices in this context.

## Final Recommendation
For predicting cryptocurrency prices for the next day, it is recommended to use the Gradient Boosting Regressor due to its relatively lower Mean Squared Error and consistent predictive performance. However, it's essential to continuously monitor the model's performance as the cryptocurrency market's dynamics can change rapidly.