# Stock Price Prediction Model Training

This notebook contains the complete pipeline for training our LSTM (Long Short-Term Memory) models. For each stock ticker, we will perform the following steps:

1.  **Fetch Historical Data**: Download the latest stock data using the `yfinance` library.
2.  **Preprocess Data**: Scale the data and create sequences suitable for a time-series model.
3.  **Build the LSTM Model**: Define the architecture of our neural network using TensorFlow/Keras.
4.  **Train the Model**: Train a unique model on the historical data of each stock.
5.  **Save the Model**: Save the trained model to a file, ready to be used by our Flask backend.

### Step 1: Install and Import Libraries

In [None]:
!pip install yfinance tensorflow scikit-learn -q

import yfinance as yf
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from sklearn.preprocessing import MinMaxScaler
import os

### Step 2: Define Tickers and Fetch Data

We'll use the same tickers that appear in our `stock_data.csv` file.

In [None]:
tickers = [
    'AAPL', 'MSFT', 'GOOGL', 'AMZN', 'NVDA', 'TSLA', 'JPM', 'JNJ', 'V', 'PG',
    'XOM', 'UNH', 'HD', 'MA', 'PFE', 'KO', 'BAC', 'MCD', 'COST', 'WMT',
    'CVX', 'LLY', 'PEP', 'DIS'
]

# Fetch data for the last 5 years
end_date = pd.Timestamp.now()
start_date = end_date - pd.DateOffset(years=5)

print(f"Fetching data from {start_date.date()} to {end_date.date()}...")
all_data = yf.download(tickers, start=start_date, end=end_date)
print("Data fetching complete.")

# We'll focus on the 'Close' price for our predictions
close_prices = all_data['Close']
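
To make the `'Close'` selection concrete, here is a small sketch on a toy DataFrame. It assumes that a multi-ticker `yfinance` download comes back with two-level columns (price field on top, ticker below), which is its usual layout but may vary by version. The tickers `AAA` and `BBB` and all values are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a multi-ticker download: columns are (field, ticker) pairs.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])
toy_data = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4), index=dates, columns=columns
)

# Selecting the top-level "Close" key leaves one column per ticker.
toy_close = toy_data["Close"]
print(list(toy_close.columns))    # ['AAA', 'BBB']
print(toy_close["AAA"].tolist())  # [0.0, 4.0, 8.0]
```

This is why `close_prices[ticker]` in the training loop can pull out a single ticker's series by name.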

### Step 3: Data Preprocessing

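Two preprocessing steps prepare the data for our LSTM:
1.  Scaling the prices to the range 0 to 1, because neural networks tend to train more reliably on inputs in a small, consistent range. This happens per ticker in the training loop in Step 5.
2.  Creating sequences of data (e.g., use the last 60 days to predict the 61st day). The function below handles this step.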
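
As a rough illustration of both steps, the sketch below scales a made-up six-day price series by hand and slices it into windows of 3 days instead of 60. The manual formula `(x - min) / (max - min)` is what min-max scaling to the 0 to 1 range computes. The sketch keeps every full window that has a following value.

```python
import numpy as np

# Toy closing prices (made up for illustration).
prices = np.array([10.0, 12.0, 11.0, 14.0, 13.0, 15.0]).reshape(-1, 1)

# Min-max scaling to [0, 1]: (x - min) / (max - min).
lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)

# Slide a window of `time_step` values; the value right after it is the target.
time_step = 3
n_windows = len(scaled) - time_step
X = np.array([scaled[i:i + time_step, 0] for i in range(n_windows)])
y = np.array([scaled[i + time_step, 0] for i in range(n_windows)])

# LSTM layers expect input shaped [samples, time steps, features].
X = X.reshape(X.shape[0], X.shape[1], 1)
print(X.shape, y.shape)  # (3, 3, 1) (3,)
```

Six prices with a window of 3 yield three training pairs. With five years of daily closes and a window of 60, the same logic yields well over a thousand pairs per ticker.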

In [None]:
def create_dataset(data, time_step=60):
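    """Creates (X, Y) pairs: each X row holds `time_step` consecutive values, Y the value that follows."""
    X, Y = [], []
    # Index i is valid whenever a full window plus its target fits: i + time_step <= len(data) - 1
    for i in range(len(data) - time_step):
        X.append(data[i:i + time_step, 0])  # input window
        Y.append(data[i + time_step, 0])    # next-step target
    return np.array(X), np.array(Y)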

### Step 4: Build the LSTM Model Architecture

We'll create a function that defines our LSTM model structure. This makes it easy to create a fresh, untrained model for each stock.

In [None]:
def build_model(input_shape):
    """Builds and compiles a Keras LSTM model."""
    model = Sequential([
        LSTM(units=50, return_sequences=True, input_shape=input_shape),
        Dropout(0.2),
        LSTM(units=50, return_sequences=True),
        Dropout(0.2),
        LSTM(units=50),
        Dropout(0.2),
        Dense(units=1)
    ])
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
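
To get a feel for the size of this architecture, the sketch below counts trainable parameters by hand. It assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights, and a bias), so the total should match what `model.summary()` reports for the default Keras layer. The helper names `lstm_params` and `dense_params` are made up for this illustration. Dropout layers add no parameters.

```python
def lstm_params(units, input_dim):
    """Trainable weights in a standard LSTM layer: 4 gates, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Weights plus biases of a fully connected layer."""
    return units * input_dim + units

layer_counts = [
    lstm_params(50, 1),    # first LSTM sees 1 feature per time step
    lstm_params(50, 50),   # second LSTM sees the 50 outputs of the first
    lstm_params(50, 50),   # third LSTM
    dense_params(1, 50),   # output layer
]
total_params = sum(layer_counts)
print(layer_counts, total_params)  # [10400, 20200, 20200, 51] 50851
```

At roughly 51,000 parameters per model, training one model per ticker stays manageable even on modest hardware.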

### Step 5: Train and Save a Model for Each Stock

This is the main loop. We will iterate through each ticker, prepare its specific data, train a model, and save it.

In [None]:
TIME_STEP = 60
MODELS_DIR = 'trained_models'

# Create the output directory if it does not exist yet
os.makedirs(MODELS_DIR, exist_ok=True)

for ticker in tickers:
    print(f'\n--- Processing and Training for {ticker} ---')
    
    # 1. Prepare Data
    ticker_data = close_prices[ticker].dropna().values.reshape(-1, 1)
    if len(ticker_data) < TIME_STEP + 100: # Ensure we have enough data
        print(f"Skipping {ticker} due to insufficient data.")
        continue
        
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled_data = scaler.fit_transform(ticker_data)
    
    X, y = create_dataset(scaled_data, TIME_STEP)
    
    # Reshape input to be [samples, time steps, features]
    X = X.reshape(X.shape[0], X.shape[1], 1)
    
    # 2. Build Model
    model = build_model(input_shape=(TIME_STEP, 1))
    
    # 3. Train Model
    print(f'Starting training for {ticker}...')
    # Only 10 epochs for a quick demonstration. More epochs may improve the fit, at the risk of overfitting.
    model.fit(X, y, epochs=10, batch_size=32, verbose=0)
    print(f'Training finished for {ticker}.')
    
    # 4. Save Model and Scaler
    model_path = os.path.join(MODELS_DIR, f'{ticker}_model.h5')
    model.save(model_path)
    
    # Each stock needs its own saved scaler so that scaled predictions can be inverse-transformed back to prices later
    import pickle
    scaler_path = os.path.join(MODELS_DIR, f'{ticker}_scaler.pkl')
    with open(scaler_path, 'wb') as f:
        pickle.dump(scaler, f)
        
    print(f'Successfully saved model to {model_path} and scaler to {scaler_path}')

print('\n--- All models have been trained and saved! ---')

### Step 6: Next Steps

After running this notebook, you will have a folder named `trained_models` containing:
1.  An `.h5` file for each stock (the trained neural network).
2.  A `.pkl` file for each stock (the scaler used for its data).

**To use these in your project:**

1.  Click the 'Files' icon on the left sidebar in Colab.
2.  You will see the `trained_models` directory. Right-click it and select 'Download'.
3.  This will download a `.zip` file. Unzip it.
4.  Place the entire `trained_models` folder into your Flask project directory, alongside `app.py`.

Our next step will be to update the `get_price_prediction` function in `app.py` to load and use these real models instead of the random placeholder.
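
As a preview of that step, here is a minimal sketch of how a saved model and scaler could be combined at prediction time. The helper name `predict_next_close` is hypothetical and is not the existing `get_price_prediction` function. It assumes that `model` was loaded with `tensorflow.keras.models.load_model`, that `scaler` was loaded with `pickle.load`, and that `recent_closes` holds at least the last 60 closing prices in chronological order.

```python
import numpy as np

def predict_next_close(model, scaler, recent_closes, time_step=60):
    """Hypothetical helper: predict the next close from the latest prices.

    Assumes `model` has a Keras-style `predict` method and `scaler` has
    scikit-learn-style `transform` / `inverse_transform` methods.
    """
    window = np.asarray(recent_closes, dtype=float)[-time_step:].reshape(-1, 1)
    if len(window) < time_step:
        raise ValueError(f"need at least {time_step} closing prices")
    scaled = scaler.transform(window)        # same scaling as during training
    batch = scaled.reshape(1, time_step, 1)  # [samples, time steps, features]
    scaled_pred = np.asarray(model.predict(batch)).reshape(-1, 1)
    # Map the scaled prediction back to a price
    return float(scaler.inverse_transform(scaled_pred)[0, 0])
```

In `app.py`, it would probably make sense to load each model and scaler once at startup rather than on every request, since loading a Keras model is comparatively slow.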