# **Neuro-Fuzzy Computing Final Project**
## **Bitcoin Price Prediction Using Neural Networks**
* In this project, we aim to develop a neural network-based forecasting model to predict the daily closing price of Bitcoin. The included dataset consists of minute-level Bitcoin price data from January 1, 2021 to March 1, 2022.

* Our goal is to train a model using historical price trends and evaluate its performance by predicting Bitcoin prices for the last ten days of February 2022.

---

* We need to predict the daily closing price of Bitcoin for the last 10 days of February 2022 using a neural network. Bitcoin price data is sequential and time-dependent, so we will use **Time Series Forecasting** techniques.



## **Step 0. Import Dataset**

In [None]:
import pandas as pd

data = pd.read_csv("bitcoin_data.csv")

data.head()

In [None]:
data.info()

In [None]:
data.describe()

## **Step 1. Data Preprocessing**
After inspecting the data, the first step is to preprocess it so that it is ready for modeling.

---

* First, we have to ensure that the datetime column is formatted properly. The data description indicates that the *unix* column holds timestamps in Unix format, so we convert it into a readable datetime format and use it as the index:

In [None]:
data['timestamp'] = pd.to_datetime(data['unix'], unit='s')
data.set_index('timestamp', inplace=True)
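
As a quick sanity check on the conversion, the sketch below decodes a single Unix timestamp with the standard library. The sample value `1609459200` is chosen for illustration (it is not read from the dataset); it corresponds to the first instant of the period covered here, in UTC.

```python
from datetime import datetime, timezone

# Illustrative value, not taken from the dataset: seconds since 1970-01-01 UTC
sample_unix = 1609459200

# Decode with an explicit UTC timezone so the result does not depend on the local clock
decoded = datetime.fromtimestamp(sample_unix, tz=timezone.utc)
print(decoded.isoformat())
```
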

* Since our goal is to predict daily closing prices, we resample the data to *daily closing prices*. This is an important step: the dataset has 610,782 minute-level rows, and if we trained the model on minute-level data while our focus is daily predictions, it might focus on short-term fluctuations instead of learning broader trends. Resampling also reduces the computational cost and yields smoother trends, which should help overall performance.

In [None]:
daily_data = data.resample('D').agg({'close': 'last', 'Volume BTC': 'sum'})

* Next, even though it might not be necessary, we check for missing days and interpolate any gaps.

In [None]:
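# Count missing values per column (printed explicitly, since only the last
# expression of a cell is displayed), then fill gaps by linear interpolation
print(daily_data.isna().sum())
daily_data.interpolate(inplace=True)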
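
To make the interpolation step concrete, here is a minimal NumPy sketch of linear gap filling on equally spaced points, which is what the default `interpolate()` call does. The `closes` values are made up for illustration.

```python
import numpy as np

# Made-up daily closes with gaps (NaN marks a missing day)
closes = np.array([100.0, np.nan, 104.0, np.nan, np.nan, 110.0])

positions = np.arange(len(closes))
missing = np.isnan(closes)

# Fill each gap on the straight line between its nearest known neighbours
filled = closes.copy()
filled[missing] = np.interp(positions[missing], positions[~missing], closes[~missing])
print(filled)
```
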

* Also, it would be beneficial if we added some extra features which are time-based in order to understand the correlation better and overall lead to better analysis and forecasting.

* The extra features that we decided to add are the following:
  * *day_of_week*: Stores the day of the week as an integer, from 0 for Monday to 6 for Sunday (e.g., Monday: 0, Tuesday: 1, ..., Sunday: 6).
  * *month*: Stores the month as an integer, from 1 for January to 12 for December.
  * *year*: Stores the calendar year (e.g., 2021, 2022).

* Adding these features in time series forecasting is useful, as they help capture temporal dependencies and seasonal patterns. We add each one for the following reasons:
  * *day_of_week*: Cryptocurrencies are traded 24 hours a day, 7 days a week, but trading volume and volatility may vary with the day of the week. For example, weekends may see lower trading activity than weekdays, because institutional traders may not be active on Saturday and Sunday. With this feature, our neural network could learn which days tend to have higher or lower price movements.
  * *month*: The cryptocurrency market can show seasonal trends (e.g., a summer slump). This may happen because institutional investors adjust their holdings at particular times of the year, which affects prices. Some events, like the Bitcoin halving, also tend to fall in specific months. For example, if Bitcoin spikes in December due to increased investing activity from extra holiday income, our model could recognise and leverage this.
  * *year*: Longer trends can vary from year to year due to new regulations: cryptocurrency is a young market, its rules and laws are still adjusting, and every year more companies adopt Bitcoin. By adding this feature, the model could pick up macroeconomic changes. For example, if Bitcoin performed better in 2021 than in 2022 because of regulatory concerns, the model can differentiate the price behavior based on the year.

* It is also important to note that, while the dataset already has a date column, raw timestamps do not expose such patterns to a neural network, and time series models do not inherently understand the cycles of the calendar, so providing these features is helpful. In addition, sequence models such as LSTMs (a type of recurrent neural network) and Transformers process numerical inputs rather than raw strings, which is why the new columns hold integers and not strings.

In [None]:
daily_data['day_of_week'] = daily_data.index.dayofweek
daily_data['month'] = daily_data.index.month
daily_data['year'] = daily_data.index.year

* Because the day of the week and the month are cyclic, we apply sine and cosine transformations to them, which lets the model understand the circular relationships.
  * For example, Monday (0) seems far from Sunday (6), but they are only 1 day apart. The same goes for January and December.

In [None]:
import numpy as np

daily_data['day_sin'] = np.sin(2 * np.pi * daily_data['day_of_week'] / 7)
daily_data['day_cos'] = np.cos(2 * np.pi * daily_data['day_of_week'] / 7)

daily_data['month_sin'] = np.sin(2 * np.pi * daily_data['month'] / 12)
daily_data['month_cos'] = np.cos(2 * np.pi * daily_data['month'] / 12)
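
The effect of the cyclic encoding can be checked numerically. The sketch below uses two hypothetical helpers, `encode_day` and `distance`, to show that Monday and Sunday end up as close to each other as Monday and Tuesday, and closer than Monday and Thursday.

```python
import numpy as np

def encode_day(day):
    """Map a day index (0 = Monday ... 6 = Sunday) to a point on the unit circle."""
    angle = 2 * np.pi * day / 7
    return np.array([np.sin(angle), np.cos(angle)])

def distance(day_a, day_b):
    """Euclidean distance between two encoded days."""
    return float(np.linalg.norm(encode_day(day_a) - encode_day(day_b)))

# Monday-Sunday, Monday-Tuesday, Monday-Thursday
print(distance(0, 6), distance(0, 1), distance(0, 3))
```
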

---

* Now that we finished everything needed for the preprocessing of the data, we need to ensure that the dataset is fully processed for the time series forecasting.


In [None]:
print(daily_data.isnull().sum())

* As we can see, there are no null values, so we do not have any missing data.

* Next, we check for stationarity using the ADF (Augmented Dickey-Fuller) test, because many forecasting models assume stationarity (constant mean and variance over time).

In [None]:
from statsmodels.tsa.stattools import adfuller

adf_test = adfuller(daily_data['close'])
print(f"ADF Statistic: {adf_test[0]}")
print(f"P-value: {adf_test[1]}")

* The results indicate that the data is non-stationary, because the p-value is larger than 0.05. Since time-series forecasting models tend to perform better with stationary data, we have to handle it.
---

**How to handle non-stationary data**
* **Differencing** (first order): It works by subtracting the previous value from each value. First, though, we plot the data to see what the trends look like.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12,5))
plt.plot(daily_data.index, daily_data['close'], label='Original Close Prices', color='blue')
plt.title('Bitcoin Closing Price Over Time')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.show()

* Now that we have the plot, we can see that there are long upward and downward trends. Differencing removes long-term trends by converting absolute price levels into day-to-day changes. So, we apply the method and rerun the ADF test.

In [None]:
daily_data['close_diff'] = daily_data['close'].diff()
daily_data.dropna(inplace=True)

In [None]:
adf_test_diff = adfuller(daily_data['close_diff'])
print(f"Differenced Data - ADF Statistic: {adf_test_diff[0]}")
print(f"Differenced Data - P-value: {adf_test_diff[1]}")
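
First-order differencing is easy to invert, which matters later when predicted changes have to be turned back into prices. A minimal sketch on made-up numbers:

```python
import numpy as np

# Made-up closing prices for four consecutive days
prices = np.array([100.0, 103.0, 101.0, 106.0])

# First-order differencing: change from the previous day to the current day
diffs = np.diff(prices)

# Inverting the transform: start from the first price and accumulate the changes
rebuilt = prices[0] + np.cumsum(diffs)
print(diffs, rebuilt)
```
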

* The data is now stationary, as the p-value is lower than the threshold (0.05), so we can continue with the preprocessing.

* One more thing to handle before moving to the model implementation is the normalization of the data. We need all values to be between 0 and 1.

* We scale the data, so we can ensure that all the values fall within consistent range (0,1). This helps us prevent large numerical differences from dominating in the process of the model's training and learning. The scaling method improves training stability, speeds up the convergence and it benefits us overall in the Neural Network that we're trying to build. We use the *MinMaxScaler* because it preserves the shape of the data, while the values are in the range from 0 to 1.

In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))
daily_data[['close_diff_scaled']] = scaler.fit_transform(daily_data[['close_diff']])

* Verify that the scaling was successful.

In [None]:
print(f"Min: {daily_data['close_diff_scaled'].min()}, Max: {daily_data['close_diff_scaled'].max()}")
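
For reference, min-max scaling and its inverse can be written in a few lines. This is a sketch of the formula only, with hypothetical helper names and made-up differences; the notebook itself relies on *MinMaxScaler*.

```python
import numpy as np

def min_max_scale(values):
    """Scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

def min_max_invert(scaled, lo, hi):
    """Undo the scaling using the stored minimum and maximum."""
    return scaled * (hi - lo) + lo

# Made-up daily price differences in USD
diffs = np.array([-500.0, 0.0, 250.0, 1500.0])

scaled, lo, hi = min_max_scale(diffs)
restored = min_max_invert(scaled, lo, hi)
print(scaled, restored)
```
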

Next, we split the data into training and test sets. This is an important step in which the data is divided into two parts: the training set, which the model uses to learn patterns, and the test set, which is used to assess how well the model predicts.
* In time series forecasting the split must be chronological to prevent data leakage, so that the model learns from past data and predicts future values. In our case, the split is defined by the problem description: the training data covers everything before February 19, 2022, and the test data covers February 19 to February 28, 2022.

In [None]:
split_date = '2022-02-19'

train_data = daily_data[daily_data.index < split_date].copy()
test_data = daily_data[daily_data.index >= split_date].copy()
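
The same chronological split can be illustrated on a short synthetic date range. The range below is chosen only to keep the example small.

```python
import numpy as np

# Synthetic daily dates: Feb 15, 2022 through Feb 28, 2022 (end of range is exclusive)
dates = np.arange('2022-02-15', '2022-03-01', dtype='datetime64[D]')
split = np.datetime64('2022-02-19')

# Everything strictly before the split date is training data
train_dates = dates[dates < split]
test_dates = dates[dates >= split]
print(len(train_dates), len(test_dates))
```
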

In [None]:
plt.figure(figsize=(12,5))
plt.plot(daily_data.index, daily_data['close_diff'], label='Original Differenced Close Price', color='blue')
plt.plot(daily_data.index, daily_data['close_diff_scaled'], label='Scaled Differenced Close Price', color='red', linestyle='dashed')
plt.legend()
plt.title('Comparison of Original and Scaled Differenced Closing Prices')
plt.show()

* Checking all values with the min and max method and the plot, we can be sure that the data is normalised.



* Now that all preprocessing steps have been verified, we have to split the data into sequences, as the network expects sequential input.

---

### **Creating sequences for Neural Network training**

Recurrent neural networks, especially LSTMs and GRUs, require the input data in sequential form, since the process is temporal. Instead of treating each price change independently, the model learns patterns from a fixed number of previous days (the *lookback window*) to predict the price movement of the following day.

* Since we have the closing price column in differenced and scaled form, we have to create structures of the data in input-output pairs, where:
  * **X** will be the input, consisting of the past *lookback* days of *close_diff_scaled* values.
  * **Y** will be the output, holding the next day's *close_diff_scaled* value.

We have to select a lookback value that is appropriate for our dataset. The plot in the preprocessing part showed long-term trends, which suggests that a larger lookback would be a better fit. Several factors have to be kept in mind before selecting it, because the quality and accuracy of our predictions depend heavily on this choice.

* Our dataset spans 14 months (January 1, 2021 to March 1, 2022, 426 days), which means we have a relatively long history to analyze. Given this, it is best to refine the lookback choice by also considering other factors, such as market cycles, data volatility and trends. For this reason, we create a table to analyze our approach to choosing the lookback:



In [None]:
df = pd.read_csv("lookback_analysis.csv")
df.head()

Based on the table above, and given that the dataset covers 426 days, we choose a lookback of 30 days. A 60-day lookback would also be a good choice, as it includes more history and aligns with the trends.

Our goal is to predict Bitcoin prices for February 19 to February 28. Thus, we must generate sequences that allow predictions for these exact dates while respecting the lookback requirement. Because we chose a 30-day lookback, the first prediction (February 19) has to be based on the preceding 30 days, starting on January 20, 2022. So we have to include those 30 days when building the test sequences:

* We also print the shapes (number of rows and columns) to make sure that we did not make any mistakes.



In [None]:
import numpy as np

def create_sequences(data, lookback):
    X, y = [], []
    for i in range(len(data) - lookback):
        X.append(data[i:i+lookback])
        y.append(data[i+lookback])
    return np.array(X), np.array(y)

lookback = 30

X_train, y_train = create_sequences(train_data['close_diff_scaled'].values, lookback)

test_start_date = '2022-01-20'
test_data_extended = daily_data[daily_data.index >= test_start_date].copy()

X_test_full, y_test_full = create_sequences(test_data_extended['close_diff_scaled'].values, lookback)

X_test = X_test_full[-10:]
y_test = y_test_full[-10:]

print(f"Train Shape: X={X_train.shape}, y={y_train.shape}")
print(f"Test Shape: X={X_test.shape}, y={y_test.shape}")
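
To see what the windowing produces, the sketch below applies the same idea to a tiny synthetic series with a lookback of 3. The helper name `make_windows` and the explicit reshape to `(samples, lookback, 1)` are illustrative additions, not part of the cell above.

```python
import numpy as np

def make_windows(series, lookback):
    """Build (input window, next value) pairs from a 1-D series."""
    X = np.array([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

# Synthetic series 0, 1, ..., 5 so the windows are easy to read
series = np.arange(6.0)
X_demo, y_demo = make_windows(series, lookback=3)

# Recurrent layers consume (samples, timesteps, features); here features = 1
X_demo_3d = X_demo.reshape(-1, 3, 1)
print(X_demo, y_demo, X_demo_3d.shape)
```
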

---

## **Step 2. LSTM model implementation**

With the preprocessing done and the training set ready (*X_train*, *y_train*), we need to define the **LSTM (Long Short-Term Memory) neural network**.

LSTM neural networks are a good fit for forecasting Bitcoin closing prices because of their ability to capture long-term dependencies in sequential data. Unlike traditional models such as ARIMA, which assume linear relationships, or simple RNNs, which struggle with vanishing gradients, LSTMs can retain past information effectively thanks to the gated architecture of their memory. This makes them well suited to capturing price trends and patterns in volatile markets such as the cryptocurrency market. While LSTMs perform well in short-term forecasting, their accuracy is often limited by external factors (such as regulations and investor sentiment) that are not reflected in our dataset.

Therefore, the LSTM model is a strong candidate for our study: beyond its solid foundation for time-series forecasting, incorporating additional features (like trading volume, sentiment analysis or macroeconomic indicators) could give it even more predictive power.

It is also important to note that a 30-day lookback window is reasonable for an LSTM model predicting Bitcoin prices, as it captures both short-term fluctuations and medium-term trends. Bitcoin often follows roughly monthly cycles influenced by market sentiment, economic events and investor behavior, which makes a 30-day window effective for identifying patterns. A shorter lookback, such as 7 to 14 days, might not provide enough historical context, while a much longer lookback could introduce outdated patterns and increase the risk of overfitting.
* Additionally, a 30-day lookback keeps model complexity manageable and training efficient, as increasing the window significantly would slow computation and require more data to maintain accuracy.

---

**Comparison to other neural network models.**

**1. MLP (Multi-Layer Perceptron)**
* MLPs treat all inputs as independent, so they cannot capture temporal dependencies on their own. They also require manual feature engineering, whereas an LSTM can learn temporal relationships directly from the sequence. Because they cannot handle sequential data effectively, they are a poor choice for time-series data.

**2. Simple RNNs**
* RNNs are designed for sequential data, but they have a major flaw known as *vanishing gradients*: when processing long sequences, gradients shrink during backpropagation, so the network cannot learn long-term dependencies effectively. Standard RNNs also tend to lose important information from earlier time steps. LSTMs mitigate this through their gated architecture: the **Forget Gate** (which information is discarded from memory), the **Input Gate** (which new information is stored) and the **Output Gate** (which controls the final output based on the stored history) allow LSTMs to retain crucial information from past time steps.

**3. GRU (Gated Recurrent Unit)**
* A GRU is a simplified version of the LSTM. It requires fewer parameters, which leads to faster training and lower memory use, and it handles short-term dependencies about as well as an LSTM, but it may retain less information over very long sequences. For Bitcoin forecasting, we prefer the LSTM for its ability to track long-term dependencies. If we focused more on speed, GRUs could be a great alternative.

**4. Transformers (like GPT, Attention models)**
* Transformer models have recently been applied to time-series forecasting. They excel at capturing long-term dependencies and handle time-series data quite well, but they require massive datasets and considerable computational power. Our dataset, although sizeable, is not large enough for those methods, which may lead to overfitting. So, Transformers work best for large-scale financial datasets, while LSTMs work best for small-to-medium datasets like ours.

**5. ARIMA & Traditional Statistical Models**
* Classical time-series models like **ARIMA (AutoRegressive Integrated Moving Average)** and **Exponential Smoothing** work well with stationary data and require less computational power, but they do not handle non-linear data (like ours) well and struggle with highly volatile datasets. Our dataset also contains several features with complex dependencies among them, which these classical methods struggle to handle. As mentioned earlier, Bitcoin prices are highly volatile and non-linear, so we expect the LSTM model to outperform traditional statistical models.
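
For reference, the gates named in the RNN comparison above are commonly written as follows. This is the standard textbook formulation, not something derived in this notebook; the weight matrices $W$, $U$ and biases $b$ are notation introduced here, $\sigma$ is the sigmoid function and $\odot$ denotes element-wise multiplication.

$$
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate memory)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
$$

The additive form of the cell state update is what lets information and gradients flow across many time steps. When a different `activation` is passed to the Keras `LSTM` layer, it takes the place of $\tanh$ in the last three lines, while the gates keep their sigmoid.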

---


### **Model Definition**
We will use the tensorflow library and the architecture we selected consists of:
* 2 LSTM Layers, which help capture long-term dependencies in sequential data (Bitcoin price data).
* Dropout Layers, to prevent overfitting by reducing the reliance on specific neurons.
* 1 Dense Output Layer, so we can convert the learned features into a final price prediction.

---

More specifically:

**1. LSTM Layers**
* The 1st LSTM layer learns short-term patterns and passes the meaningful information on to the next layer.
* The 2nd LSTM layer further refines the learned patterns and captures longer-term dependencies.
* A single LSTM layer would probably not capture the complex dependencies that exist in such a volatile market. More than 2 LSTM layers could help the model learn more patterns but would increase the computational cost, so we settle on 2 layers as a reasonable trade-off.
* We use the *ReLU* activation in the LSTM layers (instead of the default *tanh*), with the aim of helping the model capture the fluctuations of Bitcoin prices and learn non-linear relationships more effectively.

**2. Dropout Layers**
* Dropout is a regularization technique that randomly deactivates a given percentage of the neurons during training, so that the model does not rely on specific neurons more than needed.
  * Due to the high volatility of the Bitcoin dataset, the model might overfit small patterns that do not generalize well.
  * The dropout (20%) forces the model to learn more robust patterns that work across different market conditions.
* Overall, the dropout layers help the model generalize better and improve its real-world performance.
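
The mechanism can be sketched in NumPy as "inverted" dropout: units are zeroed at random during training and the survivors are scaled by 1 / (1 - rate), so the expected activation stays the same. The helper name and the all-ones input are illustrative.

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Zero each unit with probability `rate`; rescale the rest by 1 / (1 - rate)."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(10_000)  # illustrative input: every unit outputs 1.0

dropped = inverted_dropout(activations, rate=0.2, rng=rng)
print(dropped.mean(), (dropped == 0).mean())
```

At inference time the layer is simply skipped, which is why the rescaling is applied during training.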

**3. Dense Output Layer**
* The final dense layer, as already mentioned, converts the features extracted from the time series into a single prediction.
* Since we are predicting one price movement per day, a single dense layer is sufficient.
* We do not use any activation function in the output layer, because we want raw numerical values.


With the model design in place, we compile it using the *adam* optimizer. This optimizer suits our model because it adapts the learning rate of each parameter dynamically, which tends to give faster convergence and more stable training on volatile price data. By combining momentum with adaptive updates, it copes well with noisy gradients and is less prone to stalling in poor regions of the loss surface, making it well-suited for time-series forecasting.
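
A single Adam update can be sketched as below. The hyperparameter defaults mirror the usual Keras defaults (learning rate 0.001, beta1 0.9, beta2 0.999, epsilon 1e-7); the function name and the scalar inputs are illustrative.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a parameter, given its gradient and running moments."""
    m = beta1 * m + (1 - beta1) * grad        # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Even with a large gradient, the first step moves the parameter by about lr
param, m, v = adam_step(param=1.0, grad=250.0, m=0.0, v=0.0, t=1)
print(param)
```

Because the step is normalized by the gradient's running magnitude, its size stays close to the learning rate regardless of the raw gradient scale.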

* Finally, for the loss function we will use the **MSE (Mean Squared Error)**, which is calculated by the following formula:
$$
MSE = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

  where:

  * $n$: Total number of samples

  * $y_i$: Actual value

  * $\hat{y}_i$: Predicted value

* This function calculates the average squared difference between actual and predicted values, penalizing larger errors more significantly.

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential([
    LSTM(50, activation='relu', return_sequences=True, input_shape=(lookback, 1)),
    Dropout(0.2),
    LSTM(50, activation='relu'),
    Dropout(0.2),
    Dense(1)
])

model.compile(optimizer='adam', loss='mse')

model.summary()

* The total number of trainable parameters is 30,651, meaning the model has enough complexity to learn meaningful patterns while remaining computationally efficient.
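
The parameter count can be reproduced by hand. An LSTM layer has four weight blocks (three gates plus the candidate memory), each with input weights, recurrent weights and a bias; the helper names below are illustrative.

```python
def lstm_params(units, input_dim):
    """4 blocks, each with (input_dim + units) weights per unit plus one bias per unit."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """One weight per input per unit, plus one bias per unit."""
    return units * input_dim + units

first_lstm = lstm_params(units=50, input_dim=1)    # input is one feature per time step
second_lstm = lstm_params(units=50, input_dim=50)  # input is the first layer's 50 outputs
output_layer = dense_params(units=1, input_dim=50)

total = first_lstm + second_lstm + output_layer
print(first_lstm, second_lstm, output_layer, total)
```
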

---

### **Model Training**
For the training process, we feed our LSTM model the training sequences **(X_train, y_train)** over **50 epochs**. We choose 50 epochs because this provides a balance between learning efficiency and computational cost. Too few epochs, such as 10-20, would probably lead to underfitting. Conversely, training for too many epochs, such as 100+, increases the risk of overfitting, where the model memorizes training data instead of generalizing to new data. So 50 epochs is a reasonable choice.

* The **batch_size=16** setting determines how many samples are processed before the model's weights are updated. It balances computational efficiency against the frequency of gradient updates.
* The **validation_data=(X_test, y_test)** argument evaluates the model on held-out data after each epoch, which lets us monitor how well it generalizes and spot overfitting.
* The **verbose=1** setting displays the training progress, showing how the loss decreases over the epochs.

Additionally, we record the total training time to analyze how long the model takes to train on our dataset. The training duration is measured by capturing the time before and after model training using Python's time module. This is crucial for evaluating the computational cost of training and assessing how the model scales as we increase the dataset size. The recorded training time is later compared with different input data sizes to observe how training performance changes as more historical data is used.

In [None]:
import time

start_time = time.time()
history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=16,
    validation_data=(X_test, y_test),
    verbose=1
)

end_time = time.time()
training_time = end_time - start_time

print(f"Total Training Time: {training_time:.2f} seconds")

### **Model Performance**

To evaluate our model we use error metrics from the sklearn library. Since we are forecasting Bitcoin closing prices, the metrics we choose have to measure prediction deviations accurately while coping with the volatility of the crypto market. In our opinion, the most suitable choices are:

**Mean Absolute Error (MAE)**
* Formula: $MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$

  where:

  * $y_i$: Actual closing price

  * $\hat{y}_i$: Predicted closing price

  * $n$: Number of predictions

* This metric measures the average absolute difference between the predicted and the actual values. It provides the most straightforward way to assess the accuracy of the model. Because it weights all deviations linearly, it is less sensitive than squared-error metrics to the extreme fluctuations that are common in the crypto market. It is also applicable across different time frames.

**Root Mean Squared Error (RMSE)**
* Formula: $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$
* RMSE is the square root of the mean squared prediction error (comparable to the standard deviation of the errors), and it penalizes larger deviations more than MAE does. Because it emphasizes larger mistakes, it is useful for detecting forecasting errors in volatile Bitcoin prices and for minimizing major mispredictions that could be costly in real-world trading decisions. However, its sensitivity to outliers means it is best used alongside MAE for a more balanced evaluation.

**Mean Absolute Percentage Error (MAPE)**
* Formula: $MAPE = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$
* Finally, MAPE expresses the prediction errors as a percentage of the actual prices, so it is scale-independent and helps compare forecasting accuracy across different price levels. This way it helps traders assess performance regardless of the coin's absolute price. However, it can be unreliable when prices approach zero, as small errors can lead to inflated percentage values, which is why we use it in addition to the other metrics.
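
The three formulas translate directly into NumPy. The sketch below uses made-up prices and hypothetical helper names; it mirrors the formulas above rather than the sklearn functions.

```python
import numpy as np

def mae(actual, predicted):
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mape_percent(actual, predicted):
    # Expressed as a percentage, matching the 100/n factor in the formula
    return float(100 * np.mean(np.abs((actual - predicted) / actual)))

# Made-up actual and predicted prices
actual = np.array([100.0, 200.0, 400.0])
predicted = np.array([110.0, 190.0, 400.0])

print(mae(actual, predicted), rmse(actual, predicted), mape_percent(actual, predicted))
```

RMSE is never smaller than MAE on the same data; the gap between them grows when a few errors are much larger than the rest.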

To further evaluate our model's efficiency, we also measure the inference time, which represents how long the model takes to generate predictions. This is crucial in real-world applications, where rapid forecasting is essential for decision-making in volatile markets like cryptocurrency trading. The total inference time is calculated by recording the time before and after calling **model.predict(X_test)**, giving us the overall duration required to predict the closing prices for the test period. Additionally, we compute the average time per prediction, which helps assess how scalable the model is when applied to larger datasets. A lower inference time is desirable, as it means the model can provide timely price forecasts.






In [None]:
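from sklearn.metrics import mean_absolute_error, root_mean_squared_error, mean_absolute_percentage_error

# Time the forward pass together with the inverse scaling step
start_time = time.time()
predictions_scaled = model.predict(X_test)

# Map the scaled predictions back to price differences in USD
predicted_diff = scaler.inverse_transform(predictions_scaled)

# Rebuild price levels from the predicted differences.
# Note: the base used here is the same-day actual close, so each error equals
# the predicted difference; a strict one-step-ahead forecast would add the
# predicted change to the previous day's close instead.
actual_prices = test_data['close'].iloc[-10:].values
predicted_prices = actual_prices + predicted_diff.flatten()

end_time = time.time()
inference_time = end_time - start_time

mae = mean_absolute_error(actual_prices, predicted_prices)
rmse = root_mean_squared_error(actual_prices, predicted_prices)
# sklearn returns MAPE as a fraction, so it is multiplied by 100 when printed
mape = mean_absolute_percentage_error(actual_prices, predicted_prices)

print(f"MAE: {mae}")
print(f"RMSE: {rmse}")
print(f"MAPE: {mape * 100:.2f}%")

print(f"Total Inference Time: {inference_time:.4f} seconds")
print(f"Average Time per Prediction: {inference_time / len(X_test):.6f} seconds")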
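
For comparison, a one-step-ahead reconstruction adds each predicted change to the previous day's actual close, so that no information from the day being predicted enters the forecast. The sketch below uses made-up numbers and is an illustration of that alternative, not the computation used for the results reported in this notebook.

```python
import numpy as np

# Made-up closes for four consecutive days and predicted changes for days 1..3
closes = np.array([100.0, 102.0, 101.0, 104.0])
predicted_diff = np.array([1.5, -0.5, 2.0])

# Forecast for day t = actual close of day t-1 + predicted change for day t
one_step_ahead = closes[:-1] + predicted_diff
actual = closes[1:]

errors = np.abs(actual - one_step_ahead)
print(one_step_ahead, errors)
```
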

### **Model Performance Evaluation**

**MAE = 160.56**
* On average, the model's predictions deviate by $160.56 from the actual Bitcoin prices. This is a moderate prediction error, but given Bitcoin's volatility, we deem it acceptable.

**RMSE = 160.87**
* Since the RMSE and MAE values are close, large prediction errors are not dominating the performance, which makes the model fairly consistent.

**MAPE = 0.4%**
* The model's error is only 0.4% relative to Bitcoin's actual prices, so while the absolute errors seem high in dollar terms, they are small compared to Bitcoin's price level.

**Time components**
* **Total Inference Time = 0.1971 seconds**
* **Average Time per Prediction = 0.01971 seconds**

The total inference time for forecasting the test period is 0.1971 seconds, while the average time per prediction is just 0.0197 seconds, making the model fast and efficient.



### **Training Scalability Analysis: Impact of Data Size on Training Time**

To analyze how training time scales with the input data size, we train the model on increasing subsets of the dataset (20%, 40%, 60%, 80%, and 100%) and record the corresponding training durations.
* This allows us to observe the relationship between dataset size and computational cost, helping determine how efficiently the model handles larger inputs.
* By plotting training time against data size, we can evaluate whether training scales linearly, exponentially, or with diminishing returns, providing insights into the model's feasibility for larger datasets.

In [None]:
train_sizes = [int(len(X_train) * frac) for frac in [0.2, 0.4, 0.6, 0.8, 1.0]]
training_times = []

for size in train_sizes:
    start_time = time.time()

    model.fit(X_train[:size], y_train[:size], epochs=10, batch_size=16, verbose=0)

    end_time = time.time()
    training_times.append(end_time - start_time)

import matplotlib.pyplot as plt

plt.figure(figsize=(8,5))
plt.plot(train_sizes, training_times, marker='o', linestyle='-')
plt.xlabel("Training Data Size")
plt.ylabel("Training Time (seconds)")
plt.title("Training Time vs. Input Data Size")
plt.grid(True)
plt.show()

**Plot Result explained**

From the plot above we can see that the training time initially increases gradually, but once the dataset size surpasses approximately 200 samples there is a significant jump, indicating that the computational cost rises beyond this point. This suggests that the model requires considerably more resources when handling larger datasets, which is expected due to the increased number of weight updates and backpropagation computations.

However, after 300 samples the training time decreases slightly, which could be due to fluctuations in hardware performance or optimization adjustments in TensorFlow. Overall, this trend highlights how important it is to balance dataset size and training efficiency in order to keep the computational cost practical.

### **Prediction Visualization**

In the previous steps we obtained the model's predictions and evaluated their accuracy. Now we plot the actual vs. predicted Bitcoin closing prices for the test period (February 19 to February 28, 2022). The plot will help us understand how well the model follows the trend of the closing prices and whether there are larger deviations.

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12,5))
plt.plot(test_data.index[-10:], test_data['close'].iloc[-10:].values, label="Actual Prices", marker='o', linestyle='-')
plt.plot(test_data.index[-10:], predicted_prices, label="Predicted Prices", marker='x', linestyle='dashed')

plt.legend()
plt.title("Bitcoin Price Prediction (Feb 19, 2022 - Feb 28, 2022)")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.grid(True)

plt.show()

From the plot we can see a strong alignment between the actual prices (blue line) and the predicted prices (orange line), indicating that the model tracks overall trends well.
* The sharp price increase at the end of the test period is tracked, but the gap between actual and predicted prices is slightly larger than in the period before February 28. This suggests that while the model captures trends, it may struggle with sharp, sudden price jumps.
* There are also some slight deviations in other parts of the test period, but this is expected, as LSTM models cannot capture unpredictable external factors (market shifts, news events, regulations, etc.).
* In general, the model follows price fluctuations quite well, especially between February 19 and February 27, showing that it is capable of learning market patterns effectively.

---

## **Step 3. Comparing LSTM with ARIMA**

To compare these 2 methods, we have to train an ARIMA model on our dataset and evaluate the predictions using the same error metrics as we did for the LSTM (MAE, RMSE, MAPE).

* ARIMA (AutoRegressive Integrated Moving Average), as mentioned earlier, belongs to the traditional statistical models for time-series forecasting. It is commonly used for short-term predictions and relies on linear relationships in the data.
---

### **ARIMA Training**

As mentioned, ARIMA assumes that future values depend linearly on past values. Unlike LSTM, which can model complex non-linear dependencies, ARIMA relies on historical price values, their differences, and past forecast errors to make predictions.

Before training, we select the parameters **(p,d,q)**:
* **p**: number of lagged observations used by the model (order of the AutoRegressive, AR, component)
* **d**: number of times the series is differenced to make it stationary (Integrated component)
* **q**: number of lagged forecast errors used by the model (order of the Moving Average, MA, component)
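
To make the role of **d** more concrete, here is a minimal sketch of what differencing does to a series. It is not part of the original pipeline: the helper name `difference` and the toy prices are hypothetical and chosen only for illustration.

```python
import numpy as np

def difference(series, d=1):
    """Apply first-order differencing d times (hypothetical helper)."""
    values = np.asarray(series, dtype=float)
    for _ in range(d):
        values = np.diff(values)  # x[t] - x[t-1]
    return values

# Toy prices with an accelerating upward trend (illustrative, not real data)
toy_prices = [100.0, 102.0, 105.0, 109.0, 114.0]

first_diff = difference(toy_prices, d=1)   # day-to-day changes
second_diff = difference(toy_prices, d=2)  # changes of the changes

print(first_diff)
print(second_diff)
```

With d = 1, the model works on day-to-day price changes rather than on price levels, and each round of differencing shortens the series by one observation.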

Also, to compare efficiency, we measure the total training time, tracking how long it takes for ARIMA to fit the dataset.
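
As an optional alternative for this kind of measurement, the sketch below wraps the timing logic in a small context manager based on `time.perf_counter`, a monotonic clock intended for measuring intervals. The names `timed` and `timings` are hypothetical, and the summation is only a stand-in for a model fit.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds of a block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with timed("toy_workload", timings):
    # Stand-in for a call such as model.fit()
    total = sum(i * i for i in range(100_000))

print(f"toy_workload took {timings['toy_workload']:.4f} seconds")
```

Collecting all timings in one dictionary can make it easier to build a comparison table later, since training and inference times for both models end up in the same place.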

In [None]:
import time
from statsmodels.tsa.arima.model import ARIMA

p, d, q = 5, 1, 0

start_time = time.time()

arima_model = ARIMA(train_data['close'], order=(p, d, q))
arima_fit = arima_model.fit()

end_time = time.time()
arima_training_time = end_time - start_time

print(f"Total ARIMA Training Time: {arima_training_time:.2f} seconds")


### **ARIMA Performance**
We now use the trained ARIMA model to predict the Bitcoin closing prices for the test period. ARIMA generates multi-step forecasts sequentially: each predicted value is fed back in as an input for the next step. To measure efficiency, we record the inference time, that is, how long it takes to generate predictions for the full test period. This lets us compare how quickly ARIMA and LSTM can produce forecasts in real-world scenarios.
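
As a rough illustration of what sequential multi-step forecasting means, the sketch below implements a toy ARIMA(1,1,0)-style forecast by hand. It is a simplification under stated assumptions: a single autoregressive coefficient `phi` chosen arbitrarily, no constant term, and made-up starting values. The model fitted in this notebook uses p = 5 with estimated coefficients, so this is not its actual computation.

```python
import numpy as np

def recursive_ar1_forecast(last_price, last_change, phi, steps):
    """Multi-step forecast for a toy ARIMA(1,1,0)-style model without a constant.

    Each predicted change is phi times the previous (predicted) change,
    and predicted prices are obtained by accumulating those changes.
    """
    forecasts = []
    price, change = last_price, last_change
    for _ in range(steps):
        change = phi * change   # AR(1) step on the differenced series
        price = price + change  # undo the differencing
        forecasts.append(price)
    return np.array(forecasts)

# Hypothetical inputs, chosen only for illustration
toy_forecast = recursive_ar1_forecast(
    last_price=40000.0, last_change=500.0, phi=0.5, steps=10
)
step_sizes = np.diff(toy_forecast)  # how much the forecast moves per step

print(np.round(toy_forecast, 2))
```

Because each step builds on the previous forecast rather than on a new observation, the predicted changes shrink whenever |phi| < 1 and the forecast levels off. This is one plausible reason why multi-step ARIMA forecasts can look flat next to a volatile series.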

In [None]:
start_time = time.time()

arima_predictions = arima_fit.forecast(steps=len(test_data))

end_time = time.time()
arima_inference_time = end_time - start_time

print(f"Total ARIMA Inference Time: {arima_inference_time:.4f} seconds")
print(f"Average ARIMA Time per Prediction: {arima_inference_time / len(test_data):.6f} seconds")

### **ARIMA Model Performance Evaluation (Error Metrics & Time Analysis)**

As we did before, in this section we calculate error metrics (MAE, RMSE, MAPE) for ARIMA.

* We convert the predictions to a NumPy array (**np.array**) so that they are passed to the metric functions as plain values, without a pandas index.
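
For reference, the three metrics can also be written out by hand. The sketch below uses a hypothetical helper `regression_errors` and toy numbers (not Bitcoin data) to show what each metric computes.

```python
import numpy as np

def regression_errors(actual, predicted):
    """Return (MAE, RMSE, MAPE) for two equally long sequences (hypothetical helper)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mae = np.mean(np.abs(errors))            # average absolute deviation
    rmse = np.sqrt(np.mean(errors ** 2))     # penalizes large errors more strongly
    mape = np.mean(np.abs(errors / actual))  # fraction; multiply by 100 for percent
    return mae, rmse, mape

# Toy values, for illustration only
toy_actual = [100.0, 200.0, 400.0]
toy_predicted = [110.0, 190.0, 400.0]

toy_mae, toy_rmse, toy_mape = regression_errors(toy_actual, toy_predicted)
print(f"MAE={toy_mae:.2f}, RMSE={toy_rmse:.2f}, MAPE={toy_mape:.2%}")
```

Note that scikit-learn's `mean_absolute_percentage_error` also returns a fraction rather than a percentage, so a value of 0.05 corresponds to 5%.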

In [None]:
import numpy as np
from sklearn.metrics import mean_absolute_error, root_mean_squared_error, mean_absolute_percentage_error

# Convert the forecast to a plain NumPy array before computing the metrics
arima_predictions = np.array(arima_predictions)

mae_arima = mean_absolute_error(test_data['close'], arima_predictions)
rmse_arima = root_mean_squared_error(test_data['close'], arima_predictions)
# scikit-learn returns MAPE as a fraction (e.g. 0.05 means 5%)
mape_arima = mean_absolute_percentage_error(test_data['close'], arima_predictions)

print(f"ARIMA MAE: {mae_arima}")
print(f"ARIMA RMSE: {rmse_arima}")
print(f"ARIMA MAPE: {mape_arima}")

### **LSTM vs. ARIMA: Performance Comparison**

Now that we have evaluated both models, we compare their prediction accuracy and computational efficiency. LSTM uses deep learning to capture non-linear dependencies, while ARIMA assumes a linear relationship in time-series data. By analyzing their performance metrics and time efficiency, we determine which model is better suited for Bitcoin price forecasting.

* We will use a comparison table to visualize the differences between the two methods, along with some plots.
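
One possible way to assemble such a table directly in pandas is sketched below. The numbers are the ones reported in the conclusion of this notebook. The row labels, the `ARIMA / LSTM` ratio column, and the name `toy_comparison` are assumptions made for illustration and may not match the layout of the CSV file loaded in the next cell.

```python
import pandas as pd

# Values as reported in this notebook's conclusion; labels are hypothetical
toy_comparison = pd.DataFrame(
    {
        "LSTM": [160.56, 160.87, 0.40, 49.74, 0.1971],
        "ARIMA": [1790.28, 2109.42, 4.53, 1.55, 0.01],
    },
    index=[
        "MAE (USD)",
        "RMSE (USD)",
        "MAPE (%)",
        "Training time (s)",
        "Inference time (s)",
    ],
)

# Ratio > 1 means ARIMA has the larger value for that row
toy_comparison["ARIMA / LSTM"] = (
    toy_comparison["ARIMA"] / toy_comparison["LSTM"]
).round(2)

print(toy_comparison.to_string())
```

In the notebook itself, the literal values could be replaced by the variables computed earlier, such as `mae_arima`, `rmse_arima`, `arima_training_time` and `arima_inference_time`, so that the table always reflects the latest run.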

In [None]:
comparison_df = pd.read_csv("LSTM_vs_ARIMA_Comparison.csv")
print(comparison_df.to_string(index=False))

In [None]:
plt.figure(figsize=(12,5))

plt.plot(test_data.index[-len(predicted_prices):], test_data['close'].iloc[-len(predicted_prices):],
         label="Actual Prices", color="blue", marker='o')

plt.plot(test_data.index[-len(predicted_prices):], predicted_prices,
         label="LSTM Predictions", linestyle='dashed', color="orange", marker='x')

plt.plot(test_data.index[-len(arima_predictions):], arima_predictions,
         label="ARIMA Predictions", linestyle='dashed', color="green", marker='s')

plt.legend()
plt.title("LSTM vs. ARIMA Bitcoin Price Predictions (Feb 19 - Feb 28, 2022)")
plt.xlabel("Date")
plt.ylabel("Price (USD)")
plt.grid(True)
plt.show()

In [None]:
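# Restrict the comparison to the dates covered by the LSTM predictions
actual_prices_trimmed = test_data['close'].iloc[-len(predicted_prices):].values

# np.ravel guards against predictions shaped (n, 1)
lstm_errors = np.abs(actual_prices_trimmed - np.ravel(predicted_prices))
# Slice ARIMA from the end so it covers the same dates as the trimmed actual prices
arima_errors = np.abs(actual_prices_trimmed - arima_predictions[-len(predicted_prices):])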

plt.figure(figsize=(12,5))

plt.plot(test_data.index[-len(lstm_errors):], lstm_errors, label="LSTM Absolute Error", color="orange", marker='x')

plt.plot(test_data.index[-len(arima_errors):], arima_errors, label="ARIMA Absolute Error", color="green", marker='s')

plt.legend()
plt.title("Absolute Errors: LSTM vs. ARIMA")
plt.xlabel("Date")
plt.ylabel("Absolute Error (USD)")
plt.grid(True)
plt.show()

### **Final Comparison and Conclusion**

The results from our evaluation highlight significant differences in both prediction accuracy and computational efficiency between the two methods, LSTM and ARIMA.

* **Predicted vs. actual prices:** the first plot shows that LSTM closely follows the actual Bitcoin price trend, with only minor deviations, whereas the ARIMA predictions remain relatively static and fail to capture Bitcoin's volatile price movements.
* **Absolute errors:** the error plot confirms this. ARIMA exhibits much higher errors, reaching over 3,500 USD on certain days, while LSTM maintains consistently low errors, staying below 200 USD throughout the test period.
* **Error metrics:** LSTM significantly outperforms ARIMA, with an MAE of 160.56 vs. 1790.28, an RMSE of 160.87 vs. 2109.42, and a MAPE of just 0.4% compared to ARIMA's 4.53%. On this test period, this indicates that deep learning is better suited to modeling the complex, non-linear patterns of cryptocurrency prices.
* **Computational cost:** ARIMA is far more efficient, with a training time of just 1.55 seconds vs. 49.74 seconds for LSTM and an inference time of 0.01 seconds vs. 0.1971 seconds for LSTM, making it suitable for applications where speed is prioritized over accuracy.
* Ultimately, **LSTM is the superior model** for forecasting Bitcoin closing prices, capturing trend fluctuations far better than ARIMA, despite its higher computational cost.