Task: Predict the closing stock price of NASDAQ: NVDA for each trading day from 25 October to
7 November (inclusive). There are no restrictions on the data sources used (e.g. you may use
relevant macroeconomic indicators). Predictions must be submitted before 24 October and will be
compared with the actual closing prices after 7 November using RMSE (root mean squared error).
The lower the RMSE, the more accurate the prediction.
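
As a reminder of how the evaluation metric behaves, here is a minimal sketch of RMSE in plain Python. The function name `rmse` and the sample prices are illustrative assumptions, not part of the task statement.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of prices."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical prices, for illustration only
print(rmse([100.0, 102.0], [101.0, 100.0]))
```

Because the errors are squared before averaging, one large miss raises the score more than several small ones.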

In [None]:
%pip install pandas numpy scikit-learn tensorflow

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Read CSV file containing historical price of NVDA from https://finance.yahoo.com/quote/NVDA/history?p=NVDA
nvda_data = pd.read_csv('NVDA.csv')
nvda_data.head()

In [None]:
# Preprocess the data
nvda_data['Date'] = pd.to_datetime(nvda_data['Date'])
nvda_data.set_index('Date', inplace=True)
nvda_data = nvda_data[['Close']]

# Data cleaning: drop rows with a missing closing price
nvda_data = nvda_data.dropna()

nvda_data.head()

In [None]:
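# Split into train and test datasets
# Define training and test periods. Note: 11-23 October 2023 spans at most 9 trading
# days, so with a 7-day input window the test set yields only 2 sequences.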
train_end_date = "2023-10-10"
test_start_date = "2023-10-11"

# Split the data
train_data = nvda_data[nvda_data.index <= train_end_date]
test_data = nvda_data[(nvda_data.index >= test_start_date) & (nvda_data.index <= "2023-10-23")]

In [None]:
# Feature scaling: fit the scaler on the training data only, then reuse it for the test data
from sklearn.preprocessing import MinMaxScaler

# Scale the data
scaler = MinMaxScaler()
train_data_scaled = scaler.fit_transform(train_data)
test_data_scaled = scaler.transform(test_data)

In [None]:
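def prepare_sequences(data, sequence_length):
    """Build sliding windows: each X holds `sequence_length` consecutive rows,
    and the matching y is the first column of the row that follows."""
    X, y = [], []
    for i in range(sequence_length, len(data)):
        X.append(data[i - sequence_length:i, :])
        y.append(data[i, 0])
    return np.array(X), np.array(y)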

# Define the sequence length
sequence_length = 7

# Prepare sequences
X_train, y_train = prepare_sequences(train_data_scaled, sequence_length)
X_test, y_test = prepare_sequences(test_data_scaled, sequence_length)
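
To make the windowing concrete, the sketch below runs the same function on a hypothetical toy series of 10 values (the `toy` array is an assumption for illustration, not NVDA data) and prints the resulting shapes.

```python
import numpy as np

def prepare_sequences(data, sequence_length):
    # Same windowing logic as the cell above, repeated so this example runs on its own
    X, y = [], []
    for i in range(sequence_length, len(data)):
        X.append(data[i - sequence_length:i, :])
        y.append(data[i, 0])
    return np.array(X), np.array(y)

# Hypothetical toy series: 10 "scaled prices" in a single column
toy = np.arange(10, dtype=float).reshape(-1, 1)
X_toy, y_toy = prepare_sequences(toy, sequence_length=7)

print(X_toy.shape)  # (3, 7, 1): samples, time steps, features
print(y_toy)        # [7. 8. 9.]
```

A series of n rows yields n - sequence_length samples, which is why a short test period produces very few test sequences.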

I chose an LSTM (Long Short-Term Memory) network to predict the closing stock price because stock prices are sequential data, and LSTMs are designed to model time series and capture long-term dependencies. In stock markets, historical events and trends can have lasting effects on prices, and an LSTM's gated memory lets it carry that information across time steps. By contrast, a standard SVM treats each input as an independent feature vector, so it captures temporal dependencies only to the extent that they are hand-encoded as lagged features, which makes it a weaker fit for this task.

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Build the LSTM model
model = Sequential([
    LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])),
    LSTM(units=50, return_sequences=False),
    Dense(units=1)
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32)

In [None]:
# Make predictions
y_pred = model.predict(X_test)

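# Inverse transform the scaled predictions back to prices (flattened to 1-D)
y_pred = scaler.inverse_transform(y_pred).ravel()

# Actual closing prices aligned with the predictions: the first
# sequence_length rows only serve as input for the first prediction
y_true = test_data.values[sequence_length:, 0]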

# Calculate RMSE
from sklearn.metrics import mean_squared_error

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"Root Mean Squared Error (RMSE): {rmse:.2f}")
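
The cells above only score one-step-ahead predictions on past data, while the task asks for prices on future dates. One common way to bridge that gap is recursive forecasting, where each prediction is fed back in as the newest input. The sketch below is a minimal illustration of that idea, not the notebook's method: `recursive_forecast`, `predict_fn` and `mean_predictor` are hypothetical names, and the stand-in predictor replaces the trained LSTM so the example runs on its own.

```python
import numpy as np

def recursive_forecast(predict_fn, last_window, steps):
    """Roll a one-step-ahead predictor forward `steps` times.

    predict_fn:  maps an array of shape (1, window, 1) to a one-step prediction
    last_window: 1-D sequence holding the most recent scaled closing prices
    """
    window = np.asarray(last_window, dtype=float).copy()
    forecasts = []
    for _ in range(steps):
        next_value = float(np.ravel(predict_fn(window.reshape(1, -1, 1)))[0])
        forecasts.append(next_value)
        # Drop the oldest value and append the prediction as the newest input
        window = np.append(window[1:], next_value)
    return np.array(forecasts)

# Hypothetical stand-in for the trained model: predicts the mean of the window
def mean_predictor(batch):
    return batch.mean(axis=1)

print(recursive_forecast(mean_predictor, [1.0, 2.0, 3.0], steps=2))
```

With the notebook's objects, `predict_fn` could plausibly be `lambda x: model.predict(x, verbose=0)` and `last_window` the last `sequence_length` scaled closing prices, with the output passed through `scaler.inverse_transform`. Assuming the data ends on 23 October 2023, 11 steps would be needed (24 October plus the 10 trading days from 25 October to 7 November), with the first step discarded. Errors compound over the horizon, since later steps depend on earlier predictions rather than on observed prices.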