# RNN 

- An RNN (recurrent neural network) is a neural network designed for sequential data (e.g., time series such as stock prices).

- A standard feed-forward network processes each input independently. An RNN also carries a hidden state forward from the previous step, which acts as a memory. This lets it model how data evolves over time, such as the trend in a stock price.

### RNN Equation

#### At every time step $t$, we compute the hidden state $h_t$ of the RNN

- $h_t = \tanh(W_{xh} \cdot x_t + W_{hh} \cdot h_{t-1} + b_h)$

1. $x_t$: current input (e.g., stock price at time $t$)

2. $h_{t-1}$: previous hidden state (memory)

3. $W_{xh}$: input-to-hidden weight matrix

4. $W_{hh}$: hidden-to-hidden weight matrix

5. $b_h$: bias

6. $\tanh$: activation function (output range $-1$ to $1$)

7. Output $o_t$:

- $o_t = W_{ho} \cdot h_t + b_o$
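
The two equations above can be traced by hand on a tiny example. The sketch below uses plain Python lists; the helper names (`rnn_step`, `rnn_output`) and all weight values are made up for illustration and are not part of the notebook's model.

```python
import math

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One vanilla RNN step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(W_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

def rnn_output(h_t, W_ho, b_o):
    """Linear read-out: o_t = W_ho h_t + b_o."""
    return [
        b_o[i] + sum(W_ho[i][k] * h_t[k] for k in range(len(h_t)))
        for i in range(len(b_o))
    ]

# Toy sizes (assumed): 1 input feature, 2 hidden units, 1 output
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.2], [0.0, 0.4]]
b_h = [0.0, 0.1]
W_ho = [[1.0, -1.0]]
b_o = [0.0]

h = [0.0, 0.0]            # initial hidden state (no memory yet)
outputs = []
for x in [0.2, 0.4, 0.1]:  # three time steps of a scaled price
    h = rnn_step([x], h, W_xh, W_hh, b_h)
    outputs.append(rnn_output(h, W_ho, b_o)[0])
```

Because `h` is fed back into `rnn_step` at every iteration, each output depends on all earlier inputs, which is the "memory" described above.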

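- The main problem with a plain RNN is vanishing or exploding gradients. When the sequence is long (e.g., 50 days), backpropagation through time multiplies the gradient by the $\tanh$ derivative and by $W_{hh}$ at every step, so the gradient can shrink toward zero or grow without bound. Either case stalls or destabilizes learning, which is why gated variants such as GRU and LSTM are used instead.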

### GRU (Gated Recurrent Unit)

A GRU is a simpler alternative to the LSTM that uses two gates.

Update gate ($z_t$):
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$
It decides how much of the hidden state is replaced with new content.

Reset gate ($r_t$):
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$
It controls how much of the previous memory is forgotten when forming the candidate state.

New memory:
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
where $\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t])$ and $\odot$ denotes element-wise multiplication.
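
To make the gate arithmetic concrete, here is a minimal sketch of one GRU step that follows the three equations above term by term (no bias terms, as in the formulas). The function names and the toy weights are assumptions for illustration; `nn.GRU` in PyTorch uses its own parameter layout and gate convention.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    concat = h_prev + x_t                                   # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(W_z, concat)]           # update gate
    r = [sigmoid(a) for a in matvec(W_r, concat)]           # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)] + x_t    # [r_t * h_{t-1}, x_t]
    h_tilde = [math.tanh(a) for a in matvec(W_h, gated)]    # candidate state
    # Blend old state and candidate using the update gate
    return [(1 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]

# Toy sizes (assumed): 2 hidden units, 1 input feature
W_zero = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W_h = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]

# With zero gate weights, z = 0.5: the new state is half old, half candidate
h_half = gru_step([1.0], [0.0, 0.0], W_zero, W_zero, W_h)

# With a strongly negative update gate, z is close to 0 and the old state is kept
W_z_closed = [[0.0, 0.0, -50.0], [0.0, 0.0, -50.0]]
h_kept = gru_step([1.0], [0.3, -0.2], W_z_closed, W_zero, W_h)
```

The second call shows why gating helps with long sequences: when $z_t \approx 0$ the state is copied almost unchanged, so information (and gradient) can pass through many steps.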


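### BiRNN (Bidirectional RNN)

It reads the sequence in both directions (past to future and future to past). The hidden state concatenates the two:
$$h_t = [h_t^\rightarrow, h_t^\leftarrow]$$
This is useful for stock trends, because each step in the input window sees context from both sides.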
### Temporal Attention

It assigns a weight to every time step:
$$\alpha_t = \frac{\exp(e_t)}{\sum_{k} \exp(e_k)}, \quad e_t = v^T \cdot \tanh(W_h \cdot h_t)$$
This lets the model focus on important days (e.g., a market crash).
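
The attention formula can be sketched in a few lines of plain Python. The example scores each hidden state with $e_t = v^T \tanh(W_h h_t)$, turns the scores into weights with a softmax, and then forms a weighted sum of the hidden states (the pooling step the model's forward pass uses later in this notebook). The names `attention_pool`, `W_h`, `v` and all numbers are illustrative assumptions.

```python
import math

def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, W_h, v):
    scores = []
    for h in hidden_states:
        proj = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W_h]
        scores.append(sum(vi * pi for vi, pi in zip(v, proj)))   # e_t
    alphas = softmax(scores)                                      # alpha_t
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return alphas, context

# Three time steps with 2-dimensional hidden states (toy values)
hidden_states = [[0.1, 0.0], [0.9, -0.5], [0.2, 0.1]]
W_h = [[1.0, 0.0], [0.0, 1.0]]   # identity projection, for readability
v = [2.0, -1.0]

alphas, context = attention_pool(hidden_states, W_h, v)
```

With these numbers the middle time step gets the largest score, so it receives the largest weight and dominates the context vector.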

In [None]:
%pip install yfinance

In [5]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import matplotlib.pyplot as plt
import logging
from torch.cuda.amp import autocast, GradScaler
from torch.optim.lr_scheduler  import CosineAnnealingLR , ReduceLROnPlateau
import yfinance as yf
import pandas as pd 
import os 
import warnings
from datetime import datetime ,timedelta


from sklearn.metrics import mean_absolute_error , mean_squared_error , r2_score
from sklearn.preprocessing import MinMaxScaler
from torch.utils.data import DataLoader, Dataset

## 📘 Advanced Imports & Concepts

### 🧮 FP16 vs FP32
- **FP32 (32-bit floating point)**  
  - Default precision in ML.  
  - Very accurate but slower and uses more memory.  

- **FP16 (16-bit floating point)**  
  - Half precision, faster, and uses less memory.  
  - Risk of underflow (tiny numbers become `0`) or overflow (big numbers become `∞`).  

---

### ⚡ `autocast`
- Automatically switches between FP16 and FP32 during training.  
- Uses FP16 where it’s safe (fast) and FP32 where stability is needed (accurate).  
- Improves training speed without losing much accuracy.  

---

### ⚖️ `GradScaler`
- Prevents gradient underflow when training in FP16.  
- Scales the loss before backpropagation, then unscales the gradients before the optimizer step.  
- Lets mixed-precision training stay close to FP32 stability while running faster on supported GPUs.  
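
The underflow problem and the effect of loss scaling can be reproduced without a GPU. The sketch below rounds Python floats to IEEE half precision with the standard `struct` module (format code `"e"`). The helper `to_fp16`, the gradient value and the scale factor of 65536 are assumptions chosen for illustration; `GradScaler` adjusts its scale dynamically during training.

```python
import struct

def to_fp16(value):
    """Round a Python float to IEEE half precision and convert it back."""
    return struct.unpack("e", struct.pack("e", value))[0]

tiny_grad = 1e-8
underflowed = to_fp16(tiny_grad)       # too small for FP16: becomes 0.0

scale = 65536.0                        # assumed loss-scale factor (2**16)
scaled = to_fp16(tiny_grad * scale)    # scaled value is representable in FP16
recovered = scaled / scale             # unscale in higher precision

print(underflowed, scaled, recovered)
```

Without scaling the gradient is lost entirely; with scaling it survives the FP16 round trip and is recovered to within FP16 rounding error after unscaling.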

---

### 📉 Learning Rate Schedulers
- **CosineAnnealingLR** → Learning rate follows half a cosine wave, decaying smoothly from its initial value to a minimum over `T_max` epochs (high → low).  
- **ReduceLROnPlateau** → Lowers learning rate when validation loss stops improving.  
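
As a rough illustration of the cosine schedule, the sketch below evaluates the closed-form expression $\eta_t = \eta_{min} + \tfrac{1}{2}(\eta_{max} - \eta_{min})(1 + \cos(\pi t / T_{max}))$. The function name is hypothetical; the values `0.001` and `100` mirror the learning rate and epoch count used in this notebook's `Config`.

```python
import math

def cosine_annealing_lr(step, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max))

schedule = [cosine_annealing_lr(s, base_lr=0.001, t_max=100) for s in range(101)]

print(schedule[0])    # starts at the base learning rate
print(schedule[50])   # half of the base learning rate at the midpoint
print(schedule[100])  # reaches eta_min at the end
```

The rate changes slowly near the start and the end and fastest around the midpoint, which is the characteristic shape of the cosine curve.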

---

### 📊 Metrics
- **MAE (Mean Absolute Error)** → Average absolute difference between predictions and actual values.  
- **MSE (Mean Squared Error)** → Squares errors → penalizes big mistakes more.  
- **R² Score** → How well predictions explain variance in data (1 = perfect).  
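
For intuition, the three metrics can be computed by hand on a handful of numbers. This is a plain-Python sketch with made-up prices; the notebook itself uses the `sklearn.metrics` functions.

```python
def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)           # total variation
    return 1 - ss_res / ss_tot

y_true = [100.0, 102.0, 101.0, 105.0]   # hypothetical actual prices
y_pred = [101.0, 101.0, 103.0, 104.0]   # hypothetical predictions

print(mae(y_true, y_pred))  # 1.25
print(mse(y_true, y_pred))  # 1.75
print(r2(y_true, y_pred))   # 0.5
```

The single error of 2 contributes 4 to the squared-error sum but only 2 to the absolute-error sum, which is why MSE penalizes large mistakes more heavily than MAE.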

---

### 📈 Finance Tools
- **`yfinance`** → Downloads stock market data (prices, volume, etc.) from Yahoo Finance.  
- **`MinMaxScaler`** → Rescales each feature to a fixed range for stable training (default [0, 1]; this notebook uses [-1, 1]).  
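
The scaling itself is a simple linear map. The sketch below reimplements the idea for a single feature and the range [-1, 1]; the function names and prices are illustrative assumptions, not the scikit-learn API.

```python
def fit_min_max(values):
    """Learn the minimum and maximum from the training data only."""
    return min(values), max(values)

def scale(x, lo, hi, a=-1.0, b=1.0):
    """Map x from [lo, hi] to [a, b]."""
    return a + (x - lo) * (b - a) / (hi - lo)

def unscale(s, lo, hi, a=-1.0, b=1.0):
    """Inverse of scale: map s from [a, b] back to price units."""
    return lo + (s - a) * (hi - lo) / (b - a)

train_prices = [100.0, 110.0, 120.0]
lo, hi = fit_min_max(train_prices)

scaled_train = [scale(p, lo, hi) for p in train_prices]   # [-1.0, 0.0, 1.0]
scaled_test = scale(130.0, lo, hi)                        # 2.0, outside the training range
restored = unscale(scaled_test, lo, hi)                   # 130.0
```

Because the minimum and maximum come from the training split only, later prices can fall outside [-1, 1]. That is expected when a stock trends upward, and it is the price of avoiding look-ahead leakage.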

---

### 🛠️ Data Handling
- **`Dataset`** → Defines how to load & preprocess data.  
- **`DataLoader`** → Creates batches, shuffles, and feeds data to the model.  

---

### 📑 Logging & Warnings
- **`logging`** → Professional alternative to `print()`, saves info to log files.  
- **`warnings.filterwarnings("ignore")`** → Hides unnecessary warnings (like deprecation messages).  


In [6]:
warnings.filterwarnings("ignore")

logging.basicConfig(level = logging.INFO , format = 
                    '%(asctime)s - %(name)s - %(levelname)s - %(message)s ',
                    handlers = [logging.StreamHandler(), logging.FileHandler("stock_predictions.log")],
)
logger = logging.getLogger(__name__)


In [None]:

class Config:
    def __init__(self):
        # Data parameters
        self.data_dir = "./stock_data"
        self.sequence_length = 60  # 60 days of historical data for prediction
        self.predict_steps = 1  # single-step prediction (easier to validate)

        # Training parameters - designed to prevent overfitting
        self.epochs = 100
        self.batch_size = 32  # larger batch size for more stable gradients
        self.patience = 15  # early stopping patience

        # Model architecture parameters
        self.hidden_size = 64  # 128 might cause overfitting, so use 64
        self.num_layers = 2
        self.dropout = 0.3

        # Optimization parameters
        self.lr = 0.001
        self.weight_decay = 1e-4
        self.clip_grad_norm = 1.0

        # file paths
        self.model_path = "best_Stock_rnn_model.pt"
        self.scaler_path = "stock_scaler.pkl"

        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        os.makedirs(self.data_dir, exist_ok=True)
        for split in ["train", "val", "test"]:
            os.makedirs(os.path.join(self.data_dir, split), exist_ok=True)

### ⚙️ Model & Optimization Hyperparameters

#### 🏗️ Model Parameters
- **`self.num_layers = 2`**  
  Number of stacked layers in the model (e.g., 2 LSTM layers).  
  → More layers = higher capacity, but risk of overfitting/slow training.  

- **`self.dropout = 0.3`**  
  Probability of "dropping" (ignoring) neurons during training.  
  → Prevents overfitting by making the network less dependent on specific neurons.  

---

#### ⚡ Optimization Parameters
- **`self.lr = 0.001`**  
  Learning rate → how fast weights update per step.  
  - Too high → unstable training.  
  - Too low → very slow learning.  

- **`self.weight_decay = 1e-4`**  
  L2 regularization → adds penalty for large weights.  
  → Helps reduce overfitting.  

- **`self.clip_grad_norm = 1.0`**  
  Gradient clipping → prevents gradients from becoming too large (exploding gradients).  
  → Keeps training stable, especially in RNNs/LSTMs.  
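
Gradient clipping by norm is easy to picture on a two-element gradient. The following is a simplified sketch of the idea (the function name and values are hypothetical; PyTorch's `clip_grad_norm_` operates in place on parameter tensors).

```python
import math

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        factor = max_norm / total_norm
        grads = [g * factor for g in grads]
    return grads, total_norm

big, big_norm = clip_by_norm([3.0, 4.0], max_norm=1.0)       # norm 5.0 -> rescaled
small, small_norm = clip_by_norm([0.3, 0.4], max_norm=1.0)   # norm 0.5 -> unchanged

print(big, big_norm)
print(small, small_norm)
```

The direction of the gradient is preserved; only its length is capped, so one exploding batch cannot push the weights arbitrarily far.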


In [None]:
# Download the stock dataset and build the train/val/test splits
class StockDataDownloader:
    """
    Downloads and preprocesses stock data with proper train/val/test splits.
    Includes data normalization and feature engineering.
    """

    def __init__(self, config):
        self.config = config
        self.scaler = MinMaxScaler(feature_range=(-1, 1))

    def download_stock_data(self, symbol="AAPL", years=5):
        end_date = datetime.now()
        start_date = end_date - timedelta(days=years * 365)

        logger.info(f"Downloading {symbol}  data from {start_date.date()} to {end_date.date()}")

        try:
            stock_data = yf.download(
                symbol,
                start=start_date.strftime("%Y-%m-%d"),
                end=end_date.strftime("%Y-%m-%d"),
                progress=False,
            )
            if stock_data.empty:
                raise ValueError(f"No data found for symbol {symbol}")

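            # Newer yfinance releases may return MultiIndex columns (field, ticker);
            # keep only the field level so that stock_data["Close"] is a plain Series.
            if isinstance(stock_data.columns, pd.MultiIndex):
                stock_data.columns = stock_data.columns.get_level_values(0)

            stock_data.reset_index(inplace=True)  # reset index to make Date a column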

            stock_data = self.add_technical_indicators(stock_data)

            raw_data_path = os.path.join(self.config.data_dir, f"{symbol}_raw.csv")
            stock_data.to_csv(raw_data_path, index=False)

            logger.info(f"downloaded {len(stock_data)} days of data for {symbol}")
            return stock_data

        except Exception as e:
            logger.error(f"Error downloading data for {symbol}: {str(e)}")
            raise

    def add_technical_indicators(self, df):
        """
        Adding technical indicators to improve prediction accuracy.
        These features help the model understand market trends.
        """
        # Moving averages
        df["MA_5"] = df["Close"].rolling(window=5).mean()
        df["MA_10"] = df["Close"].rolling(window=10).mean()
        df["MA_20"] = df["Close"].rolling(window=20).mean()

        df["Returns"] = df["Close"].pct_change()
        # .pct_change() calculates the percentage change between the current value and the previous one

        # .rolling(window=10) means we take a sliding window of 10 days. , .std() means we calculate the standard deviation of returns in that window.
        # Standard deviation in finance = measure of volatility (risk, uncertainty, how much returns fluctuate).
        # Higher volatility = more risky stock.
        df["Volatility"] = df["Returns"].rolling(window=10).std()

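        # Position of the close within the day's range; a zero range (High == Low) becomes NaN
        day_range = (df["High"] - df["Low"]).replace(0, np.nan)
        df["Price_Range"] = (df["Close"] - df["Low"]) / day_range

        df["Volume_MA"] = df["Volume"].rolling(window=10).mean()

        # Fill the NaNs left by the rolling windows
        # (fillna(method=...) is deprecated in pandas 2.x, so use bfill/ffill directly)
        df.bfill(inplace=True)
        df.ffill(inplace=True)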

        return df

    def prepare_data_splits(self, data, symbol):
        """
        Split data into train/validation/test sets with proper temporal ordering.
        This is crucial for time series data - we can't randomly shuffle.
        """
        data = data.sort_values("Date").reset_index(drop=True)

        total_len = len(data)

        train_len = int(total_len * 0.7)

        val_len = int(total_len * 0.15)

        train_data = data[:train_len].copy()
        val_data = data[train_len : train_len + val_len].copy()
        test_data = data[train_len + val_len :].copy()

        feature_columns = [
            "Close",
            "Volume",
            "MA_5",
            "MA_10",
            "MA_20",
            "Volatility",
            "Price_Range",
            "Volume_MA",
        ]

        self.scaler.fit(train_data[feature_columns])

        train_data[feature_columns] = self.scaler.transform(train_data[feature_columns])

        val_data[feature_columns] = self.scaler.transform(val_data[feature_columns])

        test_data[feature_columns] = self.scaler.transform(test_data[feature_columns])

        train_data.to_csv(os.path.join(self.config.data_dir, "train", f"{symbol}.csv"), index=False)

        val_data.to_csv(os.path.join(self.config.data_dir, "val", f"{symbol}.csv"), index=False)

        test_data.to_csv(os.path.join(self.config.data_dir, "test", f"{symbol}.csv"), index=False)

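        import joblib

        # NOTE: the scaler is refit for every symbol and saved to the same path,
        # so the file on disk always holds the scaler of the last symbol processed.
        joblib.dump(self.scaler, self.config.scaler_path)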

        logger.info(
            f"Data split completed: Train={len(train_data)}, val={len(val_data)}, Test={len(test_data)}"
        )

        return train_data, val_data, test_data


### 📊 Example of Volatility Calculation

Suppose the returns over the last 10 days are:  
`[0.01, -0.02, 0.03, 0.01, -0.01, 0.04, -0.02, 0.00, 0.02, -0.01]`

---

#### 1. Mean Return (Average)
Formula:  
Mean return = (Sum of all returns) ÷ (Number of returns)

$$
\bar{R} = \frac{0.01 + (-0.02) + 0.03 + 0.01 + (-0.01) + 0.04 + (-0.02) + 0.00 + 0.02 + (-0.01)}{10}
$$

$$
\bar{R} = 0.005 = 0.5\%
$$

---

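#### 2. Variance (How spread out the returns are)
Formula (population form, dividing by $N = 10$):  

$$
Var = \frac{(R_1 - \bar{R})^2 + (R_2 - \bar{R})^2 + \dots + (R_{10} - \bar{R})^2}{10}
$$  

This measures how much each return deviates from the mean. Note that pandas `.std()` uses the sample form by default (dividing by $N - 1 = 9$), so the `Volatility` column comes out slightly larger than this formula gives.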

---

#### 3. Standard Deviation (Volatility)
Formula:  

$$
\sigma = \sqrt{Var}
$$  

This tells us the *riskiness / fluctuation* of returns.
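
The three steps can be checked numerically on the ten returns listed above. This sketch uses Python's `statistics` module and reports both the population form (divide by 10) and the sample form (divide by 9, the default of pandas `.std()`).

```python
import statistics

returns = [0.01, -0.02, 0.03, 0.01, -0.01, 0.04, -0.02, 0.00, 0.02, -0.01]

mean_return = statistics.mean(returns)   # 0.005, i.e. 0.5%
pop_std = statistics.pstdev(returns)     # divides by N      -> about 0.0196
sample_std = statistics.stdev(returns)   # divides by N - 1  -> about 0.0207

print(f"mean={mean_return:.4f}, population std={pop_std:.4f}, sample std={sample_std:.4f}")
```

Either way the volatility of this example is roughly 2% per day.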

---

### ✅ Interpretation
- If $\sigma = 0.02$ → Volatility = **2% → Low Risk** (stable stock)  
- If $\sigma = 0.08$ → Volatility = **8% → High Risk** (more unpredictable)  


# 📈 Price, Moving Averages, and Price Range — A Deep Dive

---

## 1. Basics of Price Action
In stock/crypto/forex markets, every **candle/bar** (1 day, 1 hour, 5 min, etc.) has 4 main values:

- **Open (O)** → Price at which the period started  
- **High (H)** → Highest price reached in that period  
- **Low (L)** → Lowest price reached in that period  
- **Close (C)** → Final price at end of that period  

This OHLC structure is the foundation of technical analysis. Most price-based indicators are derived from it.

---

## 2. Moving Average (MA)
A **Moving Average (MA)** is a smoothing technique. It removes price noise and gives a "trend line".

### Formula:
For a 20-day Simple Moving Average (SMA):

$$
MA_{20}(t) = \frac{C_{t} + C_{t-1} + \dots + C_{t-19}}{20}
$$

Where $C_t$ = closing price at day $t$.  
Each new day adds the latest close and drops the oldest one → hence “moving”.

### Intuition:
- **Price > MA_20** → Trend is *above average*, bullish momentum.  
- **Price < MA_20** → Trend is *below average*, bearish momentum.  

So MA acts as a dynamic benchmark for strength/weakness.

---

## 3. Bullish vs Bearish Signal
### Condition:
- **If Price (Close) > MA_20 → Bullish**
  - Buyers dominate, price trends upward.  
  - Example: If past 20-day average price = ₹100 and today close = ₹110 → momentum is positive.

- **If Price (Close) < MA_20 → Bearish**
  - Sellers dominate, price trends downward.  
  - Example: MA_20 = ₹100, close = ₹90 → selling pressure dominates.

---

## 4. Price Range
Price Range measures **volatility** — the total movement within a candle.

### Formula:
$$
\text{Range} = High - Low
$$

- **High** = max price buyers paid.  
- **Low** = min price sellers accepted.  
- Large Range → high volatility (strong battle).  
- Small Range → low volatility (calm market).  

---

## 5. Normalized Price Range
To add context, we normalize the close within that day’s range.

### Formula:
$$
Price\_Range = \frac{Close - Low}{High - Low}
$$

- Value is always **0 to 1** (and undefined when High = Low).  
- If Close ≈ High → Price closed near the top → Buyers strong.  
- If Close ≈ Low → Price closed near the bottom → Sellers strong.  
- If Close ≈ Middle → Balance between buyers/sellers.  

This works like a sentiment thermometer inside each candle.

---

## 6. Real-World Analogy
- **MA as Class Average**:  
   - If class average (past 20 tests) = 70.  
   - If you score 80 (Price > MA) → above average (bullish).  
   - If you score 60 (Price < MA) → below average (bearish).  

- **Price Range as Energy Level**:  
   - High−Low = amount of “drama” in the market.  
   - If everyone scored similarly (low range) → calm.  
   - If some scored 100, some scored 20 (high range) → chaos (high volatility).  

---

## 7. Python Code Example

```python
import pandas as pd

# Sample OHLC data
data = {
    "Open":  [100, 102, 105, 103, 106],
    "High":  [105, 107, 108, 106, 110],
    "Low":   [98, 101, 102, 100, 105],
    "Close": [104, 106, 103, 105, 108]
}
df = pd.DataFrame(data)

# Simple Moving Average (3-day window here for the demo instead of 20 days)
df["MA_3"] = df["Close"].rolling(window=3).mean()

# Bullish/Bearish
df["Signal"] = df.apply(lambda row: "Bullish" if row["Close"] > row["MA_3"] else "Bearish", axis=1)

# Price Range
df["Range"] = df["High"] - df["Low"]

# Normalized Price Range
df["Price_Range"] = (df["Close"] - df["Low"]) / (df["High"] - df["Low"])

print(df)
```

### Output Example:
| Open | High | Low | Close | MA_3   | Signal  | Range | Price_Range |
|------|------|-----|-------|--------|---------|-------|-------------|
| 100  | 105  | 98  | 104   | NaN    | Bearish | 7     | 0.857       |
| 102  | 107  | 101 | 106   | NaN    | Bearish | 6     | 0.833       |
| 105  | 108  | 102 | 103   | 104.33 | Bearish | 6     | 0.167       |
| 103  | 106  | 100 | 105   | 104.67 | Bullish | 6     | 0.833       |
| 106  | 110  | 105 | 108   | 105.33 | Bullish | 5     | 0.600       |

The first two rows have no 3-day average yet (`MA_3` is NaN). Any comparison with NaN is False, so the code labels them "Bearish" by default; these two labels are not meaningful signals.

---

## 8. Key Takeaways
1. **Price vs MA** → trend direction (bullish/bearish).  
2. **Range (H−L)** → volatility intensity.  
3. **Normalized Price_Range** → candle sentiment (buyers vs sellers).  
4. These basics lead to advanced tools like **RSI, Bollinger Bands, ATR, Stochastic**.  

---

## 9. Advanced Insight
- **Price > MA and Price_Range ≈ 1** → Very strong bullish trend.  
- **Price < MA and Price_Range ≈ 0** → Very strong bearish trend.  
- **Very small Range** → Market indecision (consolidation, possible breakout).  


In [None]:
class StockDataset(Dataset):
    """
    PyTorch Dataset class for handling stock data sequences.
    Creates sequences of historical data to predict future prices.
    """

    def __init__(self, data_dir, split_type="train", sequence_length=60):
        self.split_type = split_type
        self.sequence_length = sequence_length
        self.sequences = []
        self.targets = []

        split_dir = os.path.join(data_dir, split_type)

        for filename in os.listdir(split_dir):
            if filename.endswith(".csv"):
                filepath = os.path.join(split_dir, filename)
                df = pd.read_csv(filepath)

                sequences, targets = self._create_sequences(df)
                self.sequences.extend(sequences)
                self.targets.extend(targets)

        self.sequences = np.array(self.sequences, dtype=np.float32)
        self.targets = np.array(self.targets, dtype=np.float32)

        logger.info(f"Created {len(self.sequences)} sequences for {split_type} set")

    def _create_sequences(self, df):
        """
        Create input sequences and corresponding targets from dataframe.
        Each sequence contains 'sequence_length' days of historical data.
        """
        sequences = []
        targets = []

        feature_columns = [
            "Close",
            "Volume",
            "MA_5",
            "MA_10",
            "MA_20",
            "Volatility",
            "Price_Range",
            "Volume_MA",
        ]

        for i in range(len(df) - self.sequence_length):
            sequence = df[feature_columns].iloc[i : i + self.sequence_length].values
            target = df["Close"].iloc[i + self.sequence_length]
            sequences.append(sequence)
            targets.append(target)

        return sequences, targets

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return torch.tensor(self.sequences[idx]), torch.tensor(self.targets[idx])
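
The sliding-window logic in `_create_sequences` is easier to see on a short list. The sketch below applies the same indexing to six made-up closing prices with a window of 3; `make_windows` is a hypothetical helper, not part of the notebook.

```python
def make_windows(series, sequence_length):
    """Pair each window of `sequence_length` values with the value that follows it."""
    inputs, targets = [], []
    for i in range(len(series) - sequence_length):
        inputs.append(series[i : i + sequence_length])
        targets.append(series[i + sequence_length])
    return inputs, targets

closes = [10, 11, 12, 13, 14, 15]
X, y = make_windows(closes, sequence_length=3)

print(X)  # [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
print(y)  # [13, 14, 15]
```

A series of length `n` yields `n - sequence_length` samples, so with 60-day windows each split needs more than 60 rows to produce any training examples.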

In [None]:
class AdvancedStockRNN(nn.Module):
    """
    Advanced RNN model combining RNN, GRU, and Bidirectional RNN with attention.
    Includes multiple regularization techniques to prevent overfitting.
    """

    def __init__(self, input_size=8, hidden_size=64, num_layers=2, dropout=0.3):
        super(AdvancedStockRNN, self).__init__()

        self.hidden_size = hidden_size
        self.num_layers = num_layers

        # Layer 1: standard RNN for basic sequential processing
        self.rnn = nn.RNN(
            input_size,
            hidden_size,
            num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0,
        )

        # Layer 2: GRU for better gradient flow and memory
        self.gru = nn.GRU(
            hidden_size,
            hidden_size,
            num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0,
        )

        # Bidirectional GRU for capturing context from both directions within the input window
        self.bi_rnn = nn.GRU(
            hidden_size,
            hidden_size // 2,
            num_layers,
            batch_first=True,
            bidirectional=True,
            dropout=dropout if num_layers > 1 else 0,
        )

        # Attention mechanism for focusing on important time steps
        self.attention = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 2),
            nn.Tanh(),
            nn.Linear(hidden_size // 2, 1),
        )
        self.dropout = nn.Dropout(dropout)
        self.batch_norm = nn.BatchNorm1d(hidden_size)  # normalizes activations over each mini-batch (mean 0, variance 1)

        self.fc_layers = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size // 2, hidden_size // 4),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size // 4, 1),
        )

    def forward(self, x):
        """
        Forward pass through the network.

        Args:
            x: Input tensor of shape (batch_size, sequence_length, input_size)

        Returns:
            Output tensor of shape (batch_size, 1)
        """
        batch_size = x.size(0)

        # Initialize hidden state
        h0_rnn = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        h0_gru = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        h0_bi = torch.zeros(self.num_layers * 2, batch_size, self.hidden_size // 2).to(x.device)

        # Pass through the rnn layers
        rnn_out, _ = self.rnn(x, h0_rnn)  # (batch, seq_len, hidden_size)
        gru_out, _ = self.gru(rnn_out, h0_gru)
        bi_out, _ = self.bi_rnn(gru_out, h0_bi)

        attention_scores = self.attention(bi_out)  # (batch, seq_len, 1)
        attention_weights = torch.softmax(attention_scores, dim=1)  # (batch, seq_len, 1)

        # Weighted sum of hidden states
        context_vector = torch.sum(attention_weights * bi_out, dim=1)  # (batch, hidden_size)

        context_vector = self.dropout(context_vector)

        if context_vector.size(0) > 1:
            context_vector = self.batch_norm(context_vector)

        output = self.fc_layers(context_vector)

        return output

class ModelTrainer:
    def __init__(self, model, config):
        self.model = model
        self.config = config
        self.best_val_loss = float("inf")
        self.patience_counter = 0
        self.train_losses = []
        self.val_losses = []

        self.criterion = nn.MSELoss()

        self.optimizer = optim.Adam(model.parameters(), lr=config.lr, weight_decay=config.weight_decay)

        self.scheduler_cosine = CosineAnnealingLR(self.optimizer, T_max=config.epochs)
        # `verbose` is deprecated in recent PyTorch releases, so it is omitted here
        self.scheduler_plateau = ReduceLROnPlateau(
            self.optimizer, mode="min", factor=0.5, patience=7
        )

        self.scaler = GradScaler()

    def train_epoch(self, train_loader):
        """Train the model for one epoch."""
        self.model.train()
        total_loss = 0.0
        num_batches = 0

        for sequences, targets in train_loader:
            sequences, targets = sequences.to(self.config.device), targets.to(self.config.device)

            self.optimizer.zero_grad()

            with autocast():
                predictions = self.model(sequences)
                # squeeze(-1) keeps the batch dimension even when a batch has a single sample
                loss = self.criterion(predictions.squeeze(-1), targets)

            # Backward pass with gradient scaling
            self.scaler.scale(loss).backward()

            # Gradient clipping to prevent exploding gradients
            self.scaler.unscale_(self.optimizer)
            torch.nn.utils.clip_grad_norm_(self.model.parameters(), self.config.clip_grad_norm)

            self.scaler.step(self.optimizer)
            self.scaler.update()

            total_loss += loss.item()
            num_batches += 1

        return total_loss / num_batches

    def validate_epoch(self, val_loader):
        """Validate the model for one epoch."""
        self.model.eval()
        total_loss = 0.0
        num_batches = 0

        with torch.no_grad():
            for sequences, targets in val_loader:
                sequences, targets = sequences.to(self.config.device), targets.to(self.config.device)

                with autocast():
                    predictions = self.model(sequences)
                    # squeeze(-1) keeps the batch dimension even when a batch has a single sample
                    loss = self.criterion(predictions.squeeze(-1), targets)

                total_loss += loss.item()
                num_batches += 1

        return total_loss / num_batches

    def train(self, train_loader, val_loader):
        logger.info("Starting model training")

        for epoch in range(self.config.epochs):
            train_loss = self.train_epoch(train_loader)

            val_loss = self.validate_epoch(val_loader)

            self.train_losses.append(train_loss)
            self.val_losses.append(val_loss)

            self.scheduler_cosine.step()
            self.scheduler_plateau.step(val_loss)

            # early stopping check
            if val_loss < self.best_val_loss:
                self.best_val_loss = val_loss
                self.patience_counter = 0

                # save best model
                torch.save(
                    {
                        "model_state_dict": self.model.state_dict(),
                        "optimizer_state_dict": self.optimizer.state_dict(),
                        "epoch": epoch,
                        "val_loss": val_loss,
                        "train_loss": train_loss,
                    },
                    self.config.model_path,
                )

                logger.info(f"New best model saved at epoch {epoch + 1}")

            else:
                self.patience_counter += 1

            # Log progress
            if (epoch + 1) % 10 == 0 or self.patience_counter == 0:
                logger.info(
                    f"Epoch {epoch + 1}/{self.config.epochs}, "
                    f"Train Loss: {train_loss:.6f}, "
                    f"Val Loss: {val_loss:.6f}, "
                    f"LR: {self.optimizer.param_groups[0]['lr']:.7f}"
                )

            # Early stopping
            if self.patience_counter >= self.config.patience:
                logger.info(f"Early stopping triggered at epoch {epoch + 1}")
                break

        logger.info("Training completed!")

    def plot_training_history(self):
        """Plot training and validation loss curves."""
        plt.figure(figsize=(10, 6))
        plt.plot(self.train_losses, label="Training Loss", color="blue")
        plt.plot(self.val_losses, label="Validation Loss", color="red")
        plt.xlabel("Epoch")
        plt.ylabel("Loss")
        plt.title("Model Training History")
        plt.legend()
        plt.grid(True)
        plt.savefig("training_history.png", dpi=300, bbox_inches="tight")
        plt.show()

def evaluate_model(model, test_loader, config, scaler):
    """
    Comprehensive model evaluation with multiple metrics.
    """
    model.eval()
    predictions = []
    actuals = []

    with torch.no_grad():
        for sequences, targets in test_loader:
            sequences, targets = sequences.to(config.device), targets.to(config.device)

            with autocast():
                pred = model(sequences)
                predictions.extend(pred.cpu().numpy())
                actuals.extend(targets.cpu().numpy())

    predictions = np.array(predictions).flatten()  # flatten converts a multi-dimensional array into a 1-D array
    actuals = np.array(actuals).flatten()

    # Denormalize predictions and actuals for meaningful metrics
    predictions_denorm = scaler.inverse_transform(
        np.column_stack([predictions] + [np.zeros((len(predictions), 7))])
    )[:, 0]
    actuals_denorm = scaler.inverse_transform(
        np.column_stack([actuals] + [np.zeros((len(actuals), 7))])
    )[:, 0]

    # Calculate metrics
    mse = mean_squared_error(actuals_denorm, predictions_denorm)
    mae = mean_absolute_error(actuals_denorm, predictions_denorm)
    rmse = np.sqrt(mse)
    r2 = r2_score(actuals_denorm, predictions_denorm)

    # Calculate MAPE (Mean Absolute Percentage Error)
    mape = np.mean(np.abs((actuals_denorm - predictions_denorm) / actuals_denorm)) * 100

    logger.info("=== Model Evaluation Results ===")
    logger.info(f"Mean Squared Error (MSE): {mse:.4f}")
    logger.info(f"Root Mean Squared Error (RMSE): {rmse:.4f}")
    logger.info(f"Mean Absolute Error (MAE): {mae:.4f}")
    logger.info(f"Mean Absolute Percentage Error (MAPE): {mape:.2f}%")
    logger.info(f"R-squared Score: {r2:.4f}")

    return {
        "mse": mse,
        "rmse": rmse,
        "mae": mae,
        "mape": mape,
        "r2": r2,
        "predictions": predictions_denorm,
        "actuals": actuals_denorm,
    }

def main():
    """
    Main execution function that orchestrates the entire pipeline.
    """
    # Initialize configuration
    config = Config()

    # Download and prepare data
    logger.info("Step 1: Downloading stock data...")
    downloader = StockDataDownloader(config)

    # Download multiple stocks for diversified training
    stocks = ["AAPL", "GOOGL", "MSFT", "TSLA", "AMZN"]

    for stock in stocks:
        try:
            stock_data = downloader.download_stock_data(stock, years=5)
            downloader.prepare_data_splits(stock_data, stock)
            logger.info(f"Successfully processed {stock}")
        except Exception as e:
            logger.error(f"Failed to process {stock}: {str(e)}")

    # Create datasets and data loaders
    logger.info("Step 2: Creating datasets...")
    train_dataset = StockDataset(config.data_dir, "train", config.sequence_length)
    val_dataset = StockDataset(config.data_dir, "val", config.sequence_length)
    test_dataset = StockDataset(config.data_dir, "test", config.sequence_length)

    train_loader = DataLoader(
        train_dataset,
        batch_size=config.batch_size,
        shuffle=True,
        num_workers=2,
        pin_memory=True,
    )
    val_loader = DataLoader(
        val_dataset,
        batch_size=config.batch_size,
        shuffle=False,
        num_workers=2,
        pin_memory=True,
    )
    test_loader = DataLoader(
        test_dataset,
        batch_size=config.batch_size,
        shuffle=False,
        num_workers=2,
        pin_memory=True,
    )

    # Initialize model
    logger.info("Step 3: Initializing model...")
    model = AdvancedStockRNN(
        input_size=8,  # Number of features
        hidden_size=config.hidden_size,
        num_layers=config.num_layers,
        dropout=config.dropout,
    ).to(config.device)

    # Log model parameters
    total_params = sum(p.numel() for p in model.parameters())
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    logger.info(f"Model parameters: {total_params:,} total, {trainable_params:,} trainable")

    # Train model
    logger.info("Step 4: Training model...")
    trainer = ModelTrainer(model, config)
    trainer.train(train_loader, val_loader)

    # Plot training history
    trainer.plot_training_history()

    # Load best model for evaluation
    logger.info("Step 5: Evaluating model...")
    checkpoint = torch.load(config.model_path, map_location=config.device)
    model.load_state_dict(checkpoint["model_state_dict"])

    # Evaluate on test set
    import joblib

    scaler = joblib.load(config.scaler_path)
    results = evaluate_model(model, test_loader, config, scaler)

    # Plot predictions vs actuals
    plt.figure(figsize=(15, 8))
    plt.plot(results["actuals"][:100], label="Actual Prices", color="blue", alpha=0.7)
    plt.plot(results["predictions"][:100], label="Predicted Prices", color="red", alpha=0.7)
    plt.xlabel("Time Steps")
    plt.ylabel("Stock Price ($)")
    plt.title("Stock Price Predictions vs Actual (First 100 Test Samples)")
    plt.legend()
    plt.grid(True)
    plt.savefig("predictions_comparison.png", dpi=300, bbox_inches="tight")
    plt.show()

    logger.info("Pipeline completed successfully!")

if __name__ == "__main__":
    main()