Moving from **Ridge Regression** to a **time-series LSTM** requires some changes to how the data is prepared and how the model is built. Below is a walkthrough and an updated version of the class that uses an **LSTM**:

---

### **Conversion steps:**

1. **Preprocess the data for the LSTM:**
   - An LSTM expects **3D** input: `(samples, timesteps, features)`.
   - Convert `X_train` and `X_test` into **sequences** (sliding time windows).

2. **Build the LSTM model:**
   - Use Keras (via TensorFlow) to build the LSTM network.
   - The input shape must match the number of timesteps and features.

3. **Train the LSTM model:**
   - Split the data into `train` and `test` in chronological order (`shuffle=False`).
   - Use a callback (for example `EarlyStopping`) to limit overfitting.

4. **Evaluate and visualize the results:**
   - Compute metrics such as `R2`, `MSE`, and `MAPE`.
   - Draw the same kinds of plots as before.
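
To make the 3D shape in step 1 concrete, here is a minimal sketch of the sliding-window step on a toy series. The helper name `make_windows` is hypothetical and only used for illustration; it mirrors the loop in the class but leaves out scaling.

```python
import numpy as np

def make_windows(series, look_back):
    """Turn a 1-D series into (samples, timesteps, features) windows plus next-step targets."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(look_back, len(series)):
        X.append(series[i - look_back:i])  # the look_back values before position i
        y.append(series[i])                # the value to predict
    X = np.array(X).reshape(-1, look_back, 1)  # one feature per timestep
    return X, np.array(y)

X, y = make_windows([0, 1, 2, 3, 4, 5], look_back=3)
print(X.shape)              # (3, 3, 1)
print(X[0, :, 0].tolist())  # [0.0, 1.0, 2.0]
print(y.tolist())           # [3.0, 4.0, 5.0]
```

A series of length `n` yields `n - look_back` samples, so a larger `look_back` leaves fewer samples for training.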

---

### **Updated class using LSTM:**

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import MinMaxScaler
from matplotlib.ticker import FuncFormatter

class TimeSeriesLSTMModel:
    def __init__(self, file_path, look_back=10):
        """
        Initialize the class with the file path of the dataset and look-back window.
        """
        self.file_path = file_path
        self.look_back = look_back
        self.data = None
        self.model = None
        self.scaler = MinMaxScaler(feature_range=(0, 1))

    def load_and_preprocess_data(self):
        """
        Load the dataset and preprocess the data for LSTM.
        """
        # Load data
        self.data = pd.read_csv(self.file_path)
        self.data["close_tomor"] = self.data["close"].shift(-1)
        self.data = self.data.iloc[:-1]
        
        # Scaling data
        self.data_scaled = self.scaler.fit_transform(self.data[['close_tomor']])
        
        # Create sequences
        X, y = [], []
        for i in range(self.look_back, len(self.data_scaled)):
            X.append(self.data_scaled[i - self.look_back:i, 0])  # Sequence of look_back days
            y.append(self.data_scaled[i, 0])  # Target value

        X, y = np.array(X), np.array(y)
        X = X.reshape((X.shape[0], X.shape[1], 1))  # Reshape to (samples, timesteps, features)
        
        # Split into training and testing sets
        train_size = int(len(X) * 0.75)
        X_train, X_test = X[:train_size], X[train_size:]
        y_train, y_test = y[:train_size], y[train_size:]

        return X_train, X_test, y_train, y_test

    def build_model(self):
        """
        Build the LSTM model.
        """
        self.model = Sequential([
            LSTM(50, activation='relu', return_sequences=True, input_shape=(self.look_back, 1)),
            LSTM(50, activation='relu'),
            Dense(1)
        ])
        self.model.compile(optimizer='adam', loss='mse')

    def train_model(self, X_train, y_train, epochs=50, batch_size=32):
        """
        Train the LSTM model.
        """
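        # Monitor validation loss: training loss alone cannot reveal overfitting.
        # validation_split holds out the last 10% of the training windows.
        early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
        self.model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                       validation_split=0.1, callbacks=[early_stop])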

    def evaluate_model(self, y_true, y_pred):
        """
        Evaluate the model performance.
        """
        y_true = self.scaler.inverse_transform(y_true.reshape(-1, 1))
        y_pred = self.scaler.inverse_transform(y_pred.reshape(-1, 1))

        print("R2:", r2_score(y_true, y_pred))
        print("MSE:", mean_squared_error(y_true, y_pred))
        print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))
        print("MAE:", mean_absolute_error(y_true, y_pred))
        print("MAPE%:", f"{mean_absolute_percentage_error(y_true, y_pred) * 100:.2f}%")

    def plot_predictions(self, y_test, y_pred):
        """
        Plot predictions vs actual values.
        """
        y_test = self.scaler.inverse_transform(y_test.reshape(-1, 1))
        y_pred = self.scaler.inverse_transform(y_pred.reshape(-1, 1))

        plt.figure(figsize=(10, 5))
        plt.plot(y_test, label="Actual")
        plt.plot(y_pred, label="Predicted", linestyle="--")
        plt.title("Test Predictions")
        plt.xlabel("Time")
        plt.ylabel("Value")
        plt.legend()
        plt.show()

    def process(self):
        """
        Execute the full pipeline: load, preprocess, train, evaluate, and plot.
        """
        X_train, X_test, y_train, y_test = self.load_and_preprocess_data()
        self.build_model()
        self.train_model(X_train, y_train)
        # Only the test predictions are evaluated and plotted below.
        y_pred_test = self.model.predict(X_test)
        self.evaluate_model(y_test, y_pred_test)
        self.plot_predictions(y_test, y_pred_test)

# Example usage for one file
file_path = "D:/OneDrive - Hanoi University of Science and Technology/GIT/MiniProj_StockPrediction_ML_SpManhGraduationProj_2024/data/raw20192024/FPT_stock_data.csv"
model = TimeSeriesLSTMModel(file_path)
model.process()
```

---

### **Main methods in the class:**
1. **`load_and_preprocess_data`:**
   - Builds the time windows (`look_back`) for the LSTM.
   - Normalizes the data with `MinMaxScaler`.

2. **`build_model`:**
   - Builds the network with 2 LSTM layers and 1 Dense layer.

3. **`train_model`:**
   - Trains the model with `EarlyStopping` to limit overfitting.

4. **`evaluate_model`:**
   - Computes the evaluation metrics and prints the results.

5. **`plot_predictions`:**
   - Plots the actual values against the predicted values.
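
One thing worth knowing about the normalization step: the class fits the scaler on the whole series before splitting, so the test period's minimum and maximum influence the scaling. A minimal sketch of one alternative is to take the min/max from the training portion only. This is written in plain NumPy rather than with `MinMaxScaler`, and the names `fit_minmax`, `apply_minmax`, and `invert_minmax` are hypothetical.

```python
import numpy as np

def fit_minmax(train_values):
    """Take the scaling range from the training portion only."""
    train_values = np.asarray(train_values, dtype=float)
    lo, hi = train_values.min(), train_values.max()
    if hi == lo:
        raise ValueError("training values are constant; cannot scale")
    return lo, hi

def apply_minmax(values, stats):
    """Map values to the training range: (x - min) / (max - min)."""
    lo, hi = stats
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def invert_minmax(scaled, stats):
    """Undo apply_minmax to get back to the original units."""
    lo, hi = stats
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

prices = np.array([10.0, 20.0, 30.0, 40.0])   # toy data, not real prices
split = int(len(prices) * 0.75)               # first 3 values are "train"
stats = fit_minmax(prices[:split])            # range taken from 10..30
scaled = apply_minmax(prices, stats)
print(scaled.tolist())                        # [0.0, 0.5, 1.0, 1.5]
```

Note that test values can land outside `[0, 1]` with this approach (the `1.5` above), which is expected when prices move beyond the range seen during training.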

---

### **Notes:**
- Make sure the CSV file has a `"close"` column, since the time sequences are built from it.
- Check that `look_back` is a sensible value (for example, the 10 most recent days).
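
Both notes can be checked before training. The sketch below is a hypothetical helper (`check_inputs` is not part of the class) that verifies the column exists and reports how many samples the windowing loop would produce, assuming the same shift-and-drop-last-row logic as `load_and_preprocess_data`. The column names `date` and `volume` in the example are made up.

```python
def check_inputs(columns, n_rows, look_back):
    """Return the number of (window, target) samples the class would build."""
    if "close" not in columns:
        raise ValueError('CSV must contain a "close" column')
    usable = n_rows - 1             # the last row is dropped after shift(-1)
    n_samples = usable - look_back  # one sample per i in range(look_back, usable)
    if n_samples < 2:
        # fewer than 2 samples leaves nothing for either the train or the test split
        raise ValueError(f"look_back={look_back} is too large for {n_rows} rows")
    return n_samples

print(check_inputs(["date", "close", "volume"], n_rows=100, look_back=10))  # 89
```

With a pandas DataFrame `df`, the call would be `check_inputs(df.columns, len(df), look_back)`.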

You can run this LSTM model and check the results directly! 😊

### **Update: adding `R2` and `DA` (Directional Accuracy)**
To report **R2** together with **DA (Directional Accuracy)**, we only need to extend the evaluation and plotting methods.

### **Improvements to the class:**

#### **1. Computing `R2` and `DA`:**
- **R2:** Available as `r2_score` in `sklearn.metrics` (the first version already prints it; the updated method also returns it).
- **DA:** Computed by hand, by comparing the direction of change in the actual values (`y_true`) with the direction of change in the predicted values (`y_pred`).
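
As a small worked example of the DA definition above, here is a standalone sketch on made-up numbers. The function name `directional_accuracy` is hypothetical; the logic is the same sign comparison used in `evaluate_model`.

```python
import numpy as np

def directional_accuracy(y_true, y_pred):
    """Share of steps where actual and predicted series move in the same direction."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    true_dir = np.sign(np.diff(y_true))  # +1 up, -1 down, 0 flat
    pred_dir = np.sign(np.diff(y_pred))
    return float(np.mean(true_dir == pred_dir))

y_true = [10.0, 11.0, 10.0, 12.0]   # moves: up, down, up
y_pred = [10.0, 10.5, 10.8, 11.5]   # moves: up, up,   up
da = directional_accuracy(y_true, y_pred)
print(f"{da * 100:.2f}%")           # 66.67%
```

Two of the three moves agree, so DA is 2/3. With this definition a flat actual move only counts as a match if the predicted move is also exactly flat.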

#### **2. Merging the plots:**
- Instead of two separate figures (scatter and line), both can be shown in one window as two subplots.

#### **3. Updated code:**
```python
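import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error, mean_absolute_percentage_error
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.callbacks import EarlyStopping

class TimeSeriesLSTMModel: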
    def __init__(self, file_path, look_back=10):
        """
        Initialize the class with the file path of the dataset and look-back window.
        """
        self.file_path = file_path
        self.look_back = look_back
        self.data = None
        self.model = None
        self.scaler = MinMaxScaler(feature_range=(0, 1))

    def load_and_preprocess_data(self):
        """
        Load the dataset and preprocess the data for LSTM.
        """
        # Load data
        self.data = pd.read_csv(self.file_path)
        self.data["close_tomor"] = self.data["close"].shift(-1)
        self.data = self.data.iloc[:-1]
        
        # Scaling data
        self.data_scaled = self.scaler.fit_transform(self.data[['close_tomor']])
        
        # Create sequences
        X, y = [], []
        for i in range(self.look_back, len(self.data_scaled)):
            X.append(self.data_scaled[i - self.look_back:i, 0])  # Sequence of look_back days
            y.append(self.data_scaled[i, 0])  # Target value

        X, y = np.array(X), np.array(y)
        X = X.reshape((X.shape[0], X.shape[1], 1))  # Reshape to (samples, timesteps, features)
        
        # Split into training and testing sets
        train_size = int(len(X) * 0.75)
        X_train, X_test = X[:train_size], X[train_size:]
        y_train, y_test = y[:train_size], y[train_size:]

        return X_train, X_test, y_train, y_test

    def build_model(self):
        """
        Build the LSTM model.
        """
        self.model = Sequential([
            LSTM(50, activation='relu', return_sequences=True, input_shape=(self.look_back, 1)),
            LSTM(50, activation='relu'),
            Dense(1)
        ])
        self.model.compile(optimizer='adam', loss='mse')

    def train_model(self, X_train, y_train, epochs=50, batch_size=32):
        """
        Train the LSTM model.
        """
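        # Monitor validation loss: training loss alone cannot reveal overfitting.
        # validation_split holds out the last 10% of the training windows.
        early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
        self.model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                       validation_split=0.1, callbacks=[early_stop])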

    def evaluate_model(self, y_true, y_pred):
        """
        Evaluate the model performance.
        """
        y_true_rescaled = self.scaler.inverse_transform(y_true.reshape(-1, 1))
        y_pred_rescaled = self.scaler.inverse_transform(y_pred.reshape(-1, 1))

        # Calculate R2
        r2 = r2_score(y_true_rescaled, y_pred_rescaled)

        # Calculate Directional Accuracy
        da = np.mean(
            np.sign(y_true_rescaled[1:] - y_true_rescaled[:-1]) ==
            np.sign(y_pred_rescaled[1:] - y_pred_rescaled[:-1])
        )

        print("R2:", r2)
        print("Directional Accuracy (DA):", f"{da * 100:.2f}%")
        print("MSE:", mean_squared_error(y_true_rescaled, y_pred_rescaled))
        print("RMSE:", np.sqrt(mean_squared_error(y_true_rescaled, y_pred_rescaled)))
        print("MAE:", mean_absolute_error(y_true_rescaled, y_pred_rescaled))
        print("MAPE%:", f"{mean_absolute_percentage_error(y_true_rescaled, y_pred_rescaled) * 100:.2f}%")

        return r2, da

    def plot_predictions(self, y_test, y_pred, num_samples=50):
        """
        Plot predictions vs actual values with both scatter and line plots.
        """
        y_test_rescaled = self.scaler.inverse_transform(y_test.reshape(-1, 1))
        y_pred_rescaled = self.scaler.inverse_transform(y_pred.reshape(-1, 1))

        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 7))

        # Scatter plot
        ax1.scatter(y_test_rescaled, y_pred_rescaled, alpha=0.5)
        ax1.set_title("Predictions vs Actual")
        ax1.set_xlabel("Actual Values")
        ax1.set_ylabel("Predicted Values")

        # Add y=x line
        min_val = min(y_test_rescaled.min(), y_pred_rescaled.min())
        max_val = max(y_test_rescaled.max(), y_pred_rescaled.max())
        ax1.plot([min_val, max_val], [min_val, max_val], 'r--')

        # Line plot (cap the window so a short test set cannot cause a length mismatch)
        n = min(num_samples, len(y_test_rescaled))
        x_points = range(n)
        ax2.plot(x_points, y_test_rescaled[:n], 'b-', label='Actual')
        ax2.plot(x_points, y_pred_rescaled[:n], color='orange', linestyle='--', label='Predicted')
        
        ax2.set_title("Test Predictions")
        ax2.set_xlabel("Point")
        ax2.set_ylabel("Value")
        ax2.legend()

        plt.tight_layout()
        plt.show()

    def process(self):
        """
        Execute the full pipeline: load, preprocess, train, evaluate, and plot.
        """
        X_train, X_test, y_train, y_test = self.load_and_preprocess_data()
        self.build_model()
        self.train_model(X_train, y_train)
        # Only the test predictions are evaluated and plotted below.
        y_pred_test = self.model.predict(X_test)
        self.evaluate_model(y_test, y_pred_test)
        self.plot_predictions(y_test, y_pred_test)

# Example usage for one file
file_path = "D:/OneDrive - Hanoi University of Science and Technology/GIT/MiniProj_StockPrediction_ML_SpManhGraduationProj_2024/data/raw20192024/FPT_stock_data.csv"
model = TimeSeriesLSTMModel(file_path)
model.process()
```

---

### **Main changes:**
1. **Added `R2` and `Directional Accuracy` to the returned results:**
   - **R2**: Measures how well the model fits the data.
   - **DA**: Measures how often the model predicts the correct direction of change.

2. **Combined figure:**
   - A scatter plot (`y_true` on one axis, `y_pred` on the other) and a line plot (a sample of points) in the same window.

3. **Tidier code:**
   - Uses `min()` and `max()` to draw the y=x line on the scatter plot.
   - The number of points shown on the line plot is configurable (`num_samples`).
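
To make the R2 figure easier to interpret, here is a minimal sketch that computes the standard coefficient of determination by hand, which is the quantity `r2_score` reports for a single output. The function name `r2_manual` and the numbers are made up for illustration.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of always predicting the mean
    return float(1.0 - ss_res / ss_tot)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r2_manual(y_true, y_pred), 4))  # 0.98
```

Here `SS_res = 0.10` and `SS_tot = 5.0`, so R2 is `1 - 0.02 = 0.98`. A value of 1 means a perfect fit, 0 means no better than predicting the mean, and negative values mean worse than that.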

---

### **Expected results:**
- **Scatter plot:** Compares predicted values with actual values, with the y=x line as a reference for accuracy.
- **Line plot:** Shows the actual and predicted values over a sample of points, which makes deviations easy to spot.
- **R2 and DA:** Printed clearly during evaluation.

Give it a try and check the results! 😊