<a href="https://colab.research.google.com/github/Ncn914491/stock_prediction/blob/main/stock_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Create a simple model to predict the stock prices of next 10 days and import an open source dataset relating to it from online.

## Data acquisition

### Subtask:
Find and load an open-source stock price dataset from an online source.


**Reasoning**:
Identify a reliable online source for open-source stock price data and load the historical stock price data into a pandas DataFrame.



In [1]:
import pandas as pd
import yfinance as yf

# Define the ticker symbol for the stock you want to analyze (e.g., Apple)
ticker_symbol = "AAPL"

# Define the date range for the data
start_date = "2020-01-01"
end_date = "2024-01-01"

# Download the historical stock data
stock_data = yf.download(ticker_symbol, start=start_date, end=end_date)

# Display the first few rows of the DataFrame
display(stock_data.head())

  stock_data = yf.download(ticker_symbol, start=start_date, end=end_date)
[*********************100%***********************]  1 of 1 completed


Price,Close,High,Low,Open,Volume
Ticker,AAPL,AAPL,AAPL,AAPL,AAPL
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
2020-01-02,72.62085,72.681296,71.373226,71.6271,135480400
2020-01-03,71.91481,72.676439,71.68995,71.84711,146322800
2020-01-06,72.487846,72.526533,70.783248,71.034709,118387200
2020-01-07,72.14695,72.753831,71.926922,72.497537,108872000
2020-01-08,73.307495,73.609729,71.849518,71.849518,132079200


## Data preprocessing

### Subtask:
Prepare the data for modeling, which may include handling missing values, scaling, and creating features.


**Reasoning**:
Select the 'Close' price column, check for missing values, handle them using forward fill, scale the data using MinMaxScaler, and store the scaled data.



In [2]:
from sklearn.preprocessing import MinMaxScaler

# Select the 'Close' price column
close_prices = stock_data['Close']

# Check for missing values
missing_values = close_prices.isnull().sum()
print(f"Number of missing values before handling: {missing_values}")

# Handle missing values using forward fill
close_prices_filled = close_prices.fillna(method='ffill')
print(f"Number of missing values after handling: {close_prices_filled.isnull().sum()}")

# Scale the 'Close' price data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_close_prices = scaler.fit_transform(close_prices_filled.values.reshape(-1, 1))

# Create a new DataFrame or array containing the scaled 'Close' prices
scaled_close_prices_df = pd.DataFrame(scaled_close_prices, index=close_prices_filled.index, columns=['Scaled_Close'])

display(scaled_close_prices_df.head())

Number of missing values before handling: Ticker
AAPL    0
dtype: int64
Number of missing values after handling: Ticker
AAPL    0
dtype: int64


  close_prices_filled = close_prices.fillna(method='ffill')


Unnamed: 0_level_0,Scaled_Close
Date,Unnamed: 1_level_1
2020-01-02,0.128204
2020-01-03,0.123242
2020-01-06,0.127269
2020-01-07,0.124873
2020-01-08,0.133029


## Model selection

### Subtask:
Choose a suitable model for time series prediction.


## Model training

### Subtask:
Train the selected model on the historical stock data.


**Reasoning**:
Prepare the data into sequences suitable for LSTM input, split the data into training and testing sets, build an LSTM model architecture, compile the model, and train the model.



In [3]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from sklearn.model_selection import train_test_split

# Prepare data for LSTM
def create_dataset(dataset, look_back=1):
    X, Y = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        X.append(a)
        Y.append(dataset[i + look_back, 0])
    return np.array(X), np.array(Y)

look_back = 60  # Using past 60 days to predict the next day
X, y = create_dataset(scaled_close_prices_df.values, look_back)

# Reshape input to be [samples, time steps, features]
X = np.reshape(X, (X.shape[0], X.shape[1], 1))

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(look_back, 1)))
model.add(LSTM(50))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=64, verbose=1)

print("Model training complete.")

  super().__init__(**kwargs)


Epoch 1/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 18ms/step - loss: 0.3132
Epoch 2/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 17ms/step - loss: 0.0143
Epoch 3/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 16ms/step - loss: 0.0072
Epoch 4/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 14ms/step - loss: 0.0041
Epoch 5/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 12ms/step - loss: 0.0031
Epoch 6/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - loss: 0.0025
Epoch 7/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - loss: 0.0022
Epoch 8/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - loss: 0.0021
Epoch 9/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - loss: 0.0019
Epoch 10/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - loss

## Prediction

### Subtask:
Use the trained model to predict stock prices for the next 10 days.

**Reasoning**:
Use the trained model to make predictions on the test set, invert the scaling to get actual price predictions, and prepare the last `look_back` days of historical data to predict the next 10 days. Then, iteratively predict the next 10 days' prices, appending each prediction to the input sequence and removing the oldest data point. Finally, inverse transform the predictions to get actual price values.

In [4]:
# Make predictions
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# Invert scaling
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)

# Prepare data for predicting the next 10 days
last_60_days = scaled_close_prices[-look_back:].reshape(1, look_back, 1)
predicted_prices = []

for _ in range(10):
    next_day_prediction = model.predict(last_60_days)
    predicted_prices.append(next_day_prediction[0, 0])
    last_60_days = np.append(last_60_days[:, 1:, :], next_day_prediction.reshape(1, 1, 1), axis=1)

# Invert scaling for the next 10 days predictions
predicted_prices = scaler.inverse_transform(np.array(predicted_prices).reshape(-1, 1))

print("Predicted prices for the next 10 days:")
print(predicted_prices.flatten())

[1m24/24[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 7ms/step 
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 28ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 29ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 30ms/step
Predicted prices for the next 10 days:
[188.72653 188.05486 187.2403  186.35858 185.45433 184.55511 183.67732
 182.83015 182.01816 181

## Finish task

### Subtask:
Summarize the approach and present the predictions.

**Reasoning**:
Summarize the steps taken, including data acquisition, preprocessing, model training, and prediction. Present the predicted stock prices for the next 10 days in a clear and understandable format.

The process involved:
1.  **Data Acquisition**: Historical stock price data for AAPL was downloaded using the `yfinance` library.
2.  **Data Preprocessing**: The 'Close' prices were selected, missing values were handled with forward fill, and the data was scaled using `MinMaxScaler`. The data was then prepared into sequences for LSTM input.
3.  **Model Training**: An LSTM model was built with two LSTM layers and a dense output layer. The model was compiled with the 'adam' optimizer and 'mean_squared_error' loss, and trained on the prepared data.
4.  **Prediction**: The trained model was used to predict stock prices for the next 10 days by iteratively predicting the next day's price and updating the input sequence. The predicted prices were then inverse transformed to their original scale.

The predicted stock prices for the next 10 days are:

### Summary of the Approach and Predictions

The task of predicting the next 10 days of stock prices for AAPL was completed through the following steps:

1.  **Data Acquisition**: Historical stock price data for AAPL was successfully downloaded using the `yfinance` library, covering the period from 2020-01-01 to 2024-01-01.

2.  **Data Preprocessing**: The 'Close' prices were extracted, and it was confirmed that there were no missing values. The data was then scaled using `MinMaxScaler` to normalize the values between 0 and 1, which is beneficial for neural network training. The scaled data was then transformed into sequences using a `look_back` period of 60 days, creating input-output pairs for the LSTM model.

3.  **Model Training**: An LSTM model architecture was defined using `tensorflow.keras`. The model consists of two LSTM layers followed by a dense output layer, designed to capture temporal dependencies in the stock price data. The model was compiled with the 'adam' optimizer and 'mean_squared_error' loss function, and trained on the prepared training data for 100 epochs.

4.  **Prediction**: The trained LSTM model was used to predict the stock prices for the next 10 days. This was done iteratively: the model predicted the next day's price based on the last 60 days of data, and this prediction was then added to the sequence to predict the subsequent day. Finally, the predicted scaled prices were inverse transformed back to their original price scale.

Here are the predicted stock prices for the next 10 days:

In [6]:
print(predicted_prices.flatten())

[188.72653 188.05486 187.2403  186.35858 185.45433 184.55511 183.67732
 182.83015 182.01816 181.24301]
