<a href="https://colab.research.google.com/github/Ncn914491/stock_prediction/blob/main/stocks_predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Create a simple model to predict stock prices for the next 10 days, using an open-source dataset loaded from an online source.

## Data acquisition

### Subtask:
Find and load an open-source stock price dataset from an online source.


**Reasoning**:
Identify a reliable online source for open-source stock price data and load the historical stock price data into a pandas DataFrame.



In [None]:
import pandas as pd
import yfinance as yf

# Define the ticker symbol for the stock you want to analyze (e.g., Apple)
ticker_symbol = "AAPL"

# Define the date range for the data
start_date = "2020-01-01"
end_date = "2024-01-01"

# Download the historical stock data
stock_data = yf.download(ticker_symbol, start=start_date, end=end_date)

# Display the first few rows of the DataFrame
display(stock_data.head())



Ticker: AAPL

| Date | Close | High | Low | Open | Volume |
|---|---|---|---|---|---|
| 2020-01-02 | 72.62085 | 72.681296 | 71.373226 | 71.6271 | 135480400 |
| 2020-01-03 | 71.91481 | 72.676439 | 71.68995 | 71.84711 | 146322800 |
| 2020-01-06 | 72.487854 | 72.526541 | 70.783256 | 71.034717 | 118387200 |
| 2020-01-07 | 72.14695 | 72.753831 | 71.926922 | 72.497537 | 108872000 |
| 2020-01-08 | 73.307518 | 73.609752 | 71.84954 | 71.84954 | 132079200 |
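
The output above has two column levels (`Price` and `Ticker`), so `stock_data['Close']` yields a one-column DataFrame rather than a Series. A minimal sketch of one way to flatten the columns, assuming a single ticker. The toy frame and the names `toy`, `flat`, and `close` are illustrative and not part of the notebook:

```python
import pandas as pd

# Toy frame that mimics the two-level (Price, Ticker) column layout shown above.
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["AAPL"]], names=["Price", "Ticker"]
)
toy = pd.DataFrame(
    [[72.62, 135480400], [71.91, 146322800]],
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
    columns=columns,
)

# Drop the ticker level so each column is addressed by a plain name.
flat = toy.droplevel("Ticker", axis=1)

# With flat columns, selecting 'Close' returns a Series.
close = flat["Close"]
print(type(close).__name__, list(flat.columns))
```

With more than one ticker this would produce duplicate column names, so the sketch only applies to the single-ticker case used here.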




## Data preprocessing

### Subtask:
Prepare the data for modeling, which may include handling missing values, scaling, and creating features.


**Reasoning**:
Select the 'Close' price column, check for missing values, handle them using forward fill, scale the data using MinMaxScaler, and store the scaled data.



In [None]:
from sklearn.preprocessing import MinMaxScaler

# Select the 'Close' price column
close_prices = stock_data['Close']

# Check for missing values
missing_values = close_prices.isnull().sum()
print(f"Number of missing values before handling: {missing_values}")

# Handle missing values using forward fill
# (.ffill() replaces the deprecated fillna(method='ffill'))
close_prices_filled = close_prices.ffill()
print(f"Number of missing values after handling: {close_prices_filled.isnull().sum()}")

# Scale the 'Close' price data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_close_prices = scaler.fit_transform(close_prices_filled.values.reshape(-1, 1))

# Create a new DataFrame or array containing the scaled 'Close' prices
scaled_close_prices_df = pd.DataFrame(scaled_close_prices, index=close_prices_filled.index, columns=['Scaled_Close'])

display(scaled_close_prices_df.head())

Number of missing values before handling: Ticker
AAPL    0
dtype: int64
Number of missing values after handling: Ticker
AAPL    0
dtype: int64




| Date | Scaled_Close |
|---|---|
| 2020-01-02 | 0.128204 |
| 2020-01-03 | 0.123242 |
| 2020-01-06 | 0.127269 |
| 2020-01-07 | 0.124873 |
| 2020-01-08 | 0.13303 |
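
The two preprocessing steps can be traced by hand on a toy list. The sketch below is plain Python rather than the pandas and scikit-learn calls used in the notebook. The helper names `forward_fill`, `min_max_scale`, and `inverse_scale` are hypothetical, and it assumes the first value is present and that the data contains at least two distinct values:

```python
def forward_fill(values):
    """Replace each None with the most recent non-missing value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def min_max_scale(values):
    """Map values linearly onto [0, 1]; also return the min and max used."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def inverse_scale(scaled, lo, hi):
    """Undo min_max_scale using the stored min and max."""
    return [s * (hi - lo) + lo for s in scaled]


prices = [10.0, None, 15.0, 20.0]
filled = forward_fill(prices)
scaled, lo, hi = min_max_scale(filled)
restored = inverse_scale(scaled, lo, hi)

print(filled)    # [10.0, 10.0, 15.0, 20.0]
print(scaled)    # [0.0, 0.0, 0.5, 1.0]
print(restored)  # [10.0, 10.0, 15.0, 20.0]
```

The stored minimum and maximum play the role of the fitted `scaler`: the same pair must be reused later to convert predictions back to prices.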


## Model selection

### Subtask:
Choose a suitable model for time series prediction. The training step uses an LSTM network with two stacked LSTM layers and a dense output layer.


## Model training

### Subtask:
Train the selected model on the historical stock data.


**Reasoning**:
Prepare the data into sequences suitable for LSTM input, split the data into training and testing sets, build an LSTM model architecture, compile the model, and train the model.
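
The windowing step named above can be illustrated on a toy series before it is applied to the real data. This is a minimal sketch with a hypothetical helper `make_windows`. Unlike the notebook's `create_dataset`, whose loop stops one step earlier, this variant iterates up to `len(series) - look_back`, so the final value is also used as a target:

```python
import numpy as np


def make_windows(series, look_back):
    """Return (X, y): each row of X holds look_back consecutive values,
    and y holds the value that immediately follows each row."""
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i + look_back])
        y.append(series[i + look_back])
    return np.array(X), np.array(y)


series = np.arange(6, dtype=float)  # toy data: 0, 1, 2, 3, 4, 5
X, y = make_windows(series, look_back=3)

# Add a trailing feature axis: [samples, time steps, features]
X = X.reshape(X.shape[0], X.shape[1], 1)

print(X.shape)  # (3, 3, 1)
print(y)        # [3. 4. 5.]
```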



In [None]:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from sklearn.model_selection import train_test_split

# Prepare data for LSTM
def create_dataset(dataset, look_back=1):
    X, Y = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        X.append(a)
        Y.append(dataset[i + look_back, 0])
    return np.array(X), np.array(Y)

look_back = 60  # Using past 60 days to predict the next day
X, y = create_dataset(scaled_close_prices_df.values, look_back)

# Reshape input to be [samples, time steps, features]
X = np.reshape(X, (X.shape[0], X.shape[1], 1))

# Split data into training and testing sets.
# shuffle=False keeps chronological order, so every test window comes after the training windows.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# Build the LSTM model
model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(look_back, 1)))
model.add(LSTM(50))
model.add(Dense(1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=64, verbose=1)

print("Model training complete.")



Epoch 1/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m4s[0m 14ms/step - loss: 0.2435
Epoch 2/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 13ms/step - loss: 0.0150
Epoch 3/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - loss: 0.0065
Epoch 4/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - loss: 0.0040
Epoch 5/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 13ms/step - loss: 0.0031
Epoch 6/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 10ms/step - loss: 0.0025
Epoch 7/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 11ms/step - loss: 0.0020
Epoch 8/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - loss: 0.0021
Epoch 9/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - loss: 0.0018
Epoch 10/100
[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 9ms/step - loss:

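Because the input windows are consecutive, overlapping slices of one time series, an order-preserving split keeps every test window later in time than every training window. A minimal sketch, assuming a hypothetical helper `chronological_split` and toy data in place of the real sequences:

```python
def chronological_split(X, y, test_fraction=0.2):
    """Hold out the last test_fraction of the samples without shuffling."""
    n_test = round(len(X) * test_fraction)
    split = len(X) - n_test
    return X[:split], X[split:], y[:split], y[split:]


# Toy samples whose values double as their time order.
X = list(range(10))
y = [value + 100 for value in X]

X_train, X_test, y_train, y_test = chronological_split(X, y)

print(X_train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(X_test)   # [8, 9]
```

The slicing works the same way on NumPy arrays, so the helper could be applied to the windowed `X` and `y` directly.
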
## Prediction

### Subtask:
Use the trained model to predict stock prices for the next 10 days.

**Reasoning**:
Use the trained model to make predictions on the training and test sets, and invert the scaling to recover actual prices. Then take the last `look_back` days of scaled data as the initial input and predict the next 10 days iteratively: each prediction is appended to the input sequence and the oldest data point is dropped. Finally, inverse transform the 10 predictions to get actual price values.

In [None]:
# Make predictions
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# Invert scaling
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)

# Prepare data for predicting the next 10 days
last_60_days = scaled_close_prices[-look_back:].reshape(1, look_back, 1)
predicted_prices = []

for _ in range(10):
    next_day_prediction = model.predict(last_60_days)
    predicted_prices.append(next_day_prediction[0, 0])
    last_60_days = np.append(last_60_days[:, 1:, :], next_day_prediction.reshape(1, 1, 1), axis=1)

# Invert scaling for the next 10 days predictions
predicted_prices = scaler.inverse_transform(np.array(predicted_prices).reshape(-1, 1))

print("Predicted prices for the next 10 days:")
print(predicted_prices.flatten())

[1m24/24[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 14ms/step
[1m6/6[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 8ms/step 
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 34ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 33ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 35ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 36ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 32ms/step
Predicted prices for the next 10 days:
[192.30426 192.44632 192.75932 193.15729 193.5909  194.03146 194.46326
 194.87859 195.2748  195
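
The iterative loop can be isolated from Keras to make the feedback mechanism easier to follow. In this sketch `toy_model` is a stand-in for `model.predict` (it simply averages the window) and `forecast` is a hypothetical helper. Only the rolling-window logic mirrors the notebook:

```python
from collections import deque


def toy_model(window):
    """Stand-in for the trained model: returns the mean of the window."""
    return sum(window) / len(window)


def forecast(history, look_back, steps, predict_fn):
    """Predict `steps` values, feeding each prediction back as input."""
    window = deque(history[-look_back:], maxlen=look_back)
    predictions = []
    for _ in range(steps):
        next_value = predict_fn(list(window))
        predictions.append(next_value)
        # A deque with maxlen drops its oldest entry on append.
        window.append(next_value)
    return predictions


preds = forecast([1.0, 2.0, 3.0, 4.0], look_back=2, steps=3, predict_fn=toy_model)
print(preds)  # [3.5, 3.75, 3.625]
```

After `look_back` steps the window consists entirely of earlier predictions, so any error in the first forecasts is carried into the later ones.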

## Finish task

### Subtask:
Summarize the approach and present the predictions.

**Reasoning**:
Summarize the steps taken, including data acquisition, preprocessing, model training, and prediction. Present the predicted stock prices for the next 10 days in a clear and understandable format.

The process involved:
1.  **Data Acquisition**: Historical AAPL price data from 2020-01-01 to 2024-01-01 was downloaded using the `yfinance` library.
2.  **Data Preprocessing**: The 'Close' prices were selected, missing values were handled with forward fill (none were found), and the data was scaled to the range [0, 1] using `MinMaxScaler`. The scaled series was then split into 60-day sequences for LSTM input.
3.  **Model Training**: An LSTM model was built with two 50-unit LSTM layers and a dense output layer. The model was compiled with the 'adam' optimizer and 'mean_squared_error' loss, and trained for 100 epochs with a batch size of 64 on 80% of the sequences.
4.  **Prediction**: The trained model was used to predict stock prices for the next 10 days by iteratively predicting the next day's price and updating the input sequence. The predicted prices were then inverse transformed to their original scale.

The predicted stock prices for the next 10 days are:

In [None]:
print(predicted_prices.flatten())

[192.30426 192.44632 192.75932 193.15729 193.5909  194.03146 194.46326
 194.87859 195.2748  195.65205]
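
For presentation, each prediction can be paired with a calendar date. The sketch below assumes that the last observed trading day is 2023-12-29 (the download range ends at 2024-01-01, exclusive) and that the "next 10 days" are the next 10 weekdays. Exchange holidays are not excluded, and the helper `next_weekdays` is hypothetical:

```python
from datetime import date, timedelta


def next_weekdays(last_day, count):
    """Return the `count` weekdays (Mon-Fri) that follow `last_day`."""
    days, current = [], last_day
    while len(days) < count:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days.append(current)
    return days


# Values copied from the notebook output above.
predicted = [192.30426, 192.44632, 192.75932, 193.15729, 193.5909,
             194.03146, 194.46326, 194.87859, 195.2748, 195.65205]

last_observed = date(2023, 12, 29)  # assumption: last row of the downloaded data
forecast_days = next_weekdays(last_observed, len(predicted))

for day, price in zip(forecast_days, predicted):
    print(f"{day.isoformat()}  {price:.2f}")
```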

