**Question**: Can I predict future Bitcoin prices using historical data, and how accurate will my predictions be?

1. Visualize historical Bitcoin prices through line graphs and candlestick charts.
2. Prepare data for machine learning by creating features (lagged prices).
3. Apply a Random Forest model to predict each day's closing price from the five preceding closing prices.
4. Evaluate the model's accuracy by comparing predicted prices to actual prices.

1. Reference Information & Descriptions

In [None]:
!pip install mplfinance

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import mplfinance as mpf

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score


2. Data Ingestion

In [None]:
df = pd.read_csv('/content/BTC-USD(1).csv')
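One possible alternative is to parse dates and set the index while loading, assuming the file follows the usual Yahoo Finance export layout (Date, Open, High, Low, Close, Adj Close, Volume). The inline CSV below is made-up toy data standing in for the real file so the sketch runs on its own; `df_demo` and `subset` are illustrative names only.

```python
import io
import pandas as pd

# Toy stand-in for '/content/BTC-USD(1).csv' (made-up values, not real prices)
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-01,100.0,105.0,98.0,104.0,104.0,1000
2020-01-02,104.0,108.0,101.0,102.0,102.0,1200
2020-01-03,102.0,103.0,95.0,96.0,96.0,1500
"""

# parse_dates converts 'Date' while reading; index_col turns it into a DatetimeIndex
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], index_col='Date')

# With a DatetimeIndex, date-range filtering is a label slice (inclusive at both ends)
subset = df_demo.loc['2020-01-02':'2020-01-03']
print(subset['Close'])
```

With this layout the boolean `mask` used in the plotting cells could be replaced by a single `.loc` slice, and `set_index('Date')` would no longer be needed before calling `mpf.plot`.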

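3. Set Display Options and Inspect the Data

In [None]:
pd.set_option('display.max_columns', 40)
pd.set_option('display.max_rows', 40)

# Only a cell's last expression is rendered automatically, so display each result explicitly
display(df.head())
display(df.tail())
print(df.shape)
df.info()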

4. Data Exploration

In [None]:
df.describe(include='all')
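Before modelling, it can be worth checking for missing values and for days absent from the calendar, since both affect the lag features built later. The sketch below uses a small made-up frame (`demo` is a hypothetical name); in the notebook the same calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing Close and one missing calendar day (illustrative only)
demo = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-05']),
    'Close': [100.0, np.nan, 98.0, 101.0],
})

# Count missing values per column
missing_per_column = demo.isna().sum()

# Compare against a complete daily calendar to find dates that have no row at all
full_range = pd.date_range(demo['Date'].min(), demo['Date'].max(), freq='D')
missing_dates = full_range.difference(pd.DatetimeIndex(demo['Date']))

print(missing_per_column)
print(missing_dates)
```

A gap matters here because `shift(k)` moves by rows, not by days: if a date is missing, `lag_1` for the following row actually refers to two days earlier.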

5. Visualization: Line Graph

In [None]:
# Convert the 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Filter the DataFrame to include only dates within the specified range
start_date = '2014-09-16'
end_date = '2022-03-24'
mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
df_filtered = df.loc[mask]

# Plot the line graph
plt.figure(figsize=(12, 6))
plt.plot(df_filtered['Date'], df_filtered['Close'], label='Bitcoin Price')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.title('Bitcoin Price from 2014-09-16 to 2022-03-24')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.grid(True)
plt.legend()
plt.show()
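Over this period Bitcoin moved from a few hundred dollars to over $60,000, so on a linear axis the early years are compressed into a flat line. A logarithmic y-axis, where equal vertical distances correspond to equal percentage changes, may make the whole history readable. The sketch below uses a synthetic exponential series in place of `df_filtered['Close']`; only `ax.set_yscale('log')` is the point of the example.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs as a script; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic exponentially growing series standing in for df_filtered['Close'] (not real data)
dates = pd.date_range('2015-01-01', periods=365 * 3, freq='D')
prices = 300 * np.exp(np.linspace(0, np.log(200), len(dates)))

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(dates, prices, label='Synthetic price')
ax.set_yscale('log')  # equal vertical distances now mean equal percentage changes
ax.set_xlabel('Date')
ax.set_ylabel('Price (USD, log scale)')
ax.grid(True, which='both')
ax.legend()
```

In the notebook cell itself, adding `plt.yscale('log')` before `plt.show()` would have the same effect.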

6. Visualization: Candlestick Chart

In [None]:
import mplfinance as mpf

# Convert the 'Date' column to datetime format
df['Date'] = pd.to_datetime(df['Date'])

# Filter the DataFrame to include only dates within the specified range
start_date = '2014-09-16'
end_date = '2022-03-24'
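mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
# .copy() gives an independent frame, so the set_index and lag columns below
# modify df_filtered itself instead of triggering SettingWithCopyWarning
df_filtered = df.loc[mask].copy()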

# Set the 'Date' column as the index
df_filtered.set_index('Date', inplace=True)

# Plot the candlestick chart
mpf.plot(df_filtered, type='candle', volume=True,
         title='Bitcoin Candlestick Chart (2014-09-16 to 2022-03-24)',
         ylabel='Price (USD)',
         ylabel_lower='Volume',
         style='yahoo')
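Several thousand daily candles drawn in one chart tend to blur together. One option is to resample the daily bars to weekly OHLCV bars before plotting. The sketch below builds a small synthetic daily frame (standing in for `df_filtered`, values made up) and aggregates it with pandas `resample`; the weekly frame could then be passed to `mpf.plot` in place of the daily one.

```python
import numpy as np
import pandas as pd

# Synthetic daily OHLCV frame standing in for df_filtered (illustrative values only)
idx = pd.date_range('2021-01-04', periods=14, freq='D')  # 2021-01-04 is a Monday
close = pd.Series(np.arange(100.0, 114.0), index=idx)
daily = pd.DataFrame({
    'Open': close - 1,
    'High': close + 2,
    'Low': close - 2,
    'Close': close,
    'Volume': 1000.0,
})

# 'W' groups into weeks ending on Sunday; each OHLCV column gets its own aggregation
weekly = daily.resample('W').agg({
    'Open': 'first',   # first open of the week
    'High': 'max',     # highest high
    'Low': 'min',      # lowest low
    'Close': 'last',   # last close of the week
    'Volume': 'sum',   # total volume
})
print(weekly)
```

For example, `mpf.plot(weekly, type='candle', volume=True, style='yahoo')` would draw roughly one seventh as many candles.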

7. Feature Engineering for Random Forest

In [None]:
# Lag features: lag_k is the closing price k days earlier (k = 1..5).
# Past prices are the model's only inputs, which is the core idea of autoregressive forecasting.
for k in range(1, 6):
    df_filtered[f'lag_{k}'] = df_filtered['Close'].shift(k)

# Drop any rows with NaN values created by the shifting
df_filtered.dropna(inplace=True)
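Rolling statistics are a common addition to plain lags. The key detail is to shift before rolling, so that each window ends at the previous day and never contains the closing price being predicted. The sketch below uses a toy series; the column names `roll_mean_3` and `ret_1` are hypothetical and not part of the notebook's feature set.

```python
import numpy as np
import pandas as pd

# Toy close series (illustrative values only)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
                  index=pd.date_range('2021-01-01', periods=6, freq='D'))
feats = pd.DataFrame({'Close': close})

# shift(1) first, so the 3-day window ends at yesterday and never includes the target day
feats['roll_mean_3'] = feats['Close'].shift(1).rolling(window=3).mean()
# Yesterday's day-over-day percentage change (also lagged by one day)
feats['ret_1'] = feats['Close'].pct_change().shift(1)

feats = feats.dropna()
print(feats)
```

Writing `rolling(3).mean()` without the `shift(1)` would average in the current close, leaking the target into the features and inflating the evaluation scores.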

8. Define features (X) and target (y)

In [None]:
# Use the five lag features as inputs (X) and the same day's closing price as the target (y).
X = df_filtered[['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5']]
y = df_filtered['Close']
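A tree ensemble can only output values inside the range of its training targets, so one common alternative is to predict daily returns instead of price levels and convert back to a price afterwards. The sketch below shows that target construction on a made-up series; `frame`, `X_ret`, `y_ret` and `price_back` are hypothetical names, and whether returns predict better than levels for this dataset is not established here.

```python
import numpy as np
import pandas as pd

# Toy close series (illustrative values only)
close = pd.Series([100.0, 110.0, 99.0, 108.9, 104.0, 112.0],
                  index=pd.date_range('2021-01-01', periods=6, freq='D'))

frame = pd.DataFrame({'Close': close})
# Target: today's simple return relative to yesterday's close
frame['ret'] = frame['Close'].pct_change()
# Features: the returns from 1 and 2 days earlier
frame['ret_lag_1'] = frame['ret'].shift(1)
frame['ret_lag_2'] = frame['ret'].shift(2)
frame = frame.dropna()

X_ret = frame[['ret_lag_1', 'ret_lag_2']]
y_ret = frame['ret']

# A (predicted) return maps back to a price via yesterday's close:
# price_hat = prev_close * (1 + ret_hat). Plugging in the true return recovers the true close.
prev_close = close.shift(1).loc[frame.index]
price_back = prev_close * (1 + y_ret)
print(price_back)
```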

9. Train-Test Split

In [None]:
# Chronological hold-out: shuffle=False keeps the first 80% of rows for training and the last 20%
# for testing, so the model is never trained on data from after the test period.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
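A single hold-out window gives one score that depends heavily on which period happens to fall in the test set. scikit-learn's `TimeSeriesSplit` offers walk-forward validation: each fold trains on an expanding prefix and tests on the block that follows. The sketch below runs it on a synthetic random walk with five lag features (standing in for `X` and `y`); the fold count and `n_estimators=50` are arbitrary choices for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic random-walk prices with 5 lag features (column k-1 holds lag_k); not real data
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 300))
X_demo = np.column_stack([prices[5 - k:len(prices) - k] for k in range(1, 6)])
y_demo = prices[5:]

tscv = TimeSeriesSplit(n_splits=5)
fold_rmse = []
for train_idx, test_idx in tscv.split(X_demo):
    # In every fold, all training rows come before all test rows
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    model.fit(X_demo[train_idx], y_demo[train_idx])
    pred = model.predict(X_demo[test_idx])
    fold_rmse.append(np.sqrt(mean_squared_error(y_demo[test_idx], pred)))

print([round(float(r), 2) for r in fold_rmse])
```

With the notebook's data, `tscv.split(X)` works on the DataFrame directly, with rows selected via `X.iloc[train_idx]` and `y.iloc[train_idx]`.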

10. Fit the Random Forest model

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)
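One property of `RandomForestRegressor` is directly relevant here: each tree predicts an average of training targets, so the forest can never predict a value above the largest (or below the smallest) price it saw during training. With a chronological 80/20 split of 2014-09-16 to 2022-03-24, the test window starts around late 2020, when Bitcoin traded well above its 2017 peak, so this ceiling may show up in the actual-vs-predicted plot. The sketch below demonstrates the effect on a synthetic, steadily rising series (not real data).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic steadily rising series: the test period lies entirely above the training range
prices = np.linspace(100.0, 400.0, 301)
X_all = np.column_stack([prices[5 - k:len(prices) - k] for k in range(1, 6)])
y_all = prices[5:]

split = int(len(y_all) * 0.8)  # chronological 80/20 split, as in the notebook
X_tr, X_te = X_all[:split], X_all[split:]
y_tr, y_te = y_all[:split], y_all[split:]

rf_demo = RandomForestRegressor(n_estimators=100, random_state=42)
rf_demo.fit(X_tr, y_tr)
pred_te = rf_demo.predict(X_te)

# Every prediction is an average of training targets, so it cannot exceed y_tr.max()
print(f"max training target: {y_tr.max():.1f}")
print(f"max test actual:     {y_te.max():.1f}")
print(f"max prediction:      {pred_te.max():.1f}")
```

Models that extrapolate (for example, linear models) or a returns-based target avoid this particular ceiling, though neither guarantees better accuracy.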

11. Make predictions

In [None]:
y_pred = rf.predict(X_test)
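These are one-step-ahead predictions: every row in `X_test` contains the *actual* previous closes, so the model never forecasts more than one day past known data. Forecasting several days into the future instead requires feeding each prediction back in as the newest lag. The sketch below shows this recursive strategy with a hypothetical helper `recursive_forecast`, trained on a synthetic random walk rather than the notebook's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def recursive_forecast(model, last_closes, horizon):
    """Hypothetical helper: forecast `horizon` steps, feeding each prediction back as lag_1.

    `last_closes` holds the most recent closes, oldest first; its length must equal
    the number of lag features the model was trained on.
    """
    window = list(last_closes)
    n_lags = len(window)
    forecasts = []
    for _ in range(horizon):
        # Feature order must match training: lag_1 (most recent) ... lag_n (oldest)
        features = np.array(window[::-1]).reshape(1, n_lags)
        next_value = float(model.predict(features)[0])
        forecasts.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward one day
    return forecasts

# Demo on a synthetic random walk with 5 lags (not real data)
rng = np.random.default_rng(1)
prices = 100 + np.cumsum(rng.normal(0, 1, 200))
X_demo = np.column_stack([prices[5 - k:len(prices) - k] for k in range(1, 6)])
y_demo = prices[5:]
model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)

future = recursive_forecast(model, prices[-5:], horizon=7)
print([round(v, 2) for v in future])
```

Errors compound with each step, so multi-day forecasts are typically much less accurate than the one-step scores. To use the notebook's `rf`, which was fitted on a DataFrame, wrapping `features` in `pd.DataFrame(features, columns=['lag_1', 'lag_2', 'lag_3', 'lag_4', 'lag_5'])` avoids scikit-learn's feature-name warning.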


12. Evaluate the model

In [None]:
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R-squared: {r2}")
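MSE is in squared dollars and R² on price levels is dominated by the long-term trend, so neither says directly whether the model beats a trivial forecast. Since `lag_1` is yesterday's close, `X_test['lag_1']` is exactly the naive "tomorrow equals today" forecast and can serve as a baseline. The sketch below uses a hypothetical `report` helper and made-up numbers; in the notebook one would call it with `y_test`, `y_pred` and `X_test['lag_1']`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def report(name, actual, predicted):
    """Hypothetical helper: return RMSE and MAE (both in USD) plus R2 for one set of predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    rmse = float(np.sqrt(mean_squared_error(actual, predicted)))
    mae = float(mean_absolute_error(actual, predicted))
    r2 = float(r2_score(actual, predicted))
    print(f"{name:>6}: RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
    return rmse, mae, r2

# Toy numbers for illustration only
actual = [100.0, 102.0, 101.0, 105.0]
model_pred = [99.0, 101.0, 103.0, 104.0]
naive_pred = [98.0, 100.0, 102.0, 101.0]  # each entry is the previous day's actual close

model_scores = report('model', actual, model_pred)
naive_scores = report('naive', actual, naive_pred)
```

If the Random Forest's RMSE is not clearly below the naive baseline's on the real test set, the lag features are adding little beyond "tomorrow's price is close to today's".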

13. Plot actual vs. predicted prices

In [None]:
plt.figure(figsize=(12, 6))
# y_test keeps its DatetimeIndex, so it supplies the x-axis dates directly
plt.plot(y_test.index, y_test, label='Actual Prices')
plt.plot(y_test.index, y_pred, label='Predicted Prices', linestyle='--')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.title('Actual vs Predicted Bitcoin Prices')
plt.xticks(rotation=45)
plt.legend()
plt.show()