Below is a **class** version that combines all 3 functions you asked for:

---

### **Combined Class**
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import (
    r2_score,
    mean_squared_error,
    mean_absolute_error,
    mean_absolute_percentage_error,
)

class ModelAnalysis:
    def __init__(self, model, X_test, y_test):
        """
        Initialize the class with the trained model, test features, and target values.
        
        Parameters:
        -----------
        model : trained model
            A trained model with methods like `predict` and optional `coef_` attribute.
        X_test : array-like
            Test feature data.
        y_test : array-like
            Test target values.
        """
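        self.model = model
        self.X_test = X_test
        # Store targets and predictions as flat NumPy arrays so that
        # positional slicing and plotting also work for pandas input
        self.y_test = np.asarray(y_test).ravel()
        self.y_pred_test = np.asarray(self.model.predict(self.X_test)).ravel()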

    def evaluate_metrics(self, y_train, y_pred_train):
        """
        Evaluate the model's performance using various metrics for both training and test sets.
        
        Parameters:
        -----------
        y_train : array-like
            True values for training data.
        y_pred_train : array-like
            Predicted values for training data.
        """
        # Train metrics
        r2_train = r2_score(y_train, y_pred_train)
        mse_train = mean_squared_error(y_train, y_pred_train)
        rmse_train = np.sqrt(mse_train)
        mae_train = mean_absolute_error(y_train, y_pred_train)
        mape_train = mean_absolute_percentage_error(y_train, y_pred_train) * 100

        # Test metrics
        r2_test = r2_score(self.y_test, self.y_pred_test)
        mse_test = mean_squared_error(self.y_test, self.y_pred_test)
        rmse_test = np.sqrt(mse_test)
        mae_test = mean_absolute_error(self.y_test, self.y_pred_test)
        mape_test = mean_absolute_percentage_error(self.y_test, self.y_pred_test) * 100

        print("EVALUATE METRICS ON THE TRAIN SET")
        print(f"R2: {r2_train:.4f}")
        print(f"MSE: {mse_train:.4f}")
        print(f"RMSE: {rmse_train:.4f}")
        print(f"MAE: {mae_train:.4f}")
        print(f"MAPE: {mape_train:.2f}%\n")
        
        print("EVALUATE METRICS ON THE TEST SET")
        print(f"R2: {r2_test:.4f}")
        print(f"MSE: {mse_test:.4f}")
        print(f"RMSE: {rmse_test:.4f}")
        print(f"MAE: {mae_test:.4f}")
        print(f"MAPE: {mape_test:.2f}%\n")

    def visualize_predictions(self, num_samples=50):
        """
        Visualize predictions vs actual values using scatter and line plots.
        
        Parameters:
        -----------
        num_samples : int
            Number of samples to display (default is 50).
        """
        num_samples = min(num_samples, len(self.y_test))

        # Scatter plot
        plt.figure(figsize=(10, 5))
        plt.scatter(self.y_test, self.y_pred_test, alpha=0.5, label="Predictions")
        plt.plot(
            [min(self.y_test), max(self.y_test)],
            [min(self.y_test), max(self.y_test)],
            'r--', label="y = x"
        )
        plt.title("Predictions vs Actual Values")
        plt.xlabel("Actual Values")
        plt.ylabel("Predicted Values")
        plt.legend()
        plt.show()

        # Line plot
        plt.figure(figsize=(10, 5))
        plt.plot(range(num_samples), self.y_test[:num_samples], 'b-', label="Actual")
        plt.plot(range(num_samples), self.y_pred_test[:num_samples], 'orange', linestyle='--', label="Predicted")
        plt.title(f"Predictions for First {num_samples} Samples")
        plt.xlabel("Samples")
        plt.ylabel("Values")
        plt.legend()
        plt.show()

    def visualize_coefficients(self, feature_names, top_n=10):
        """
        Visualize the coefficients of a regression model.
        
        Parameters:
        -----------
        feature_names : list
            List of feature names.
        top_n : int
            Number of top features to display (default is 10).
        """
        if not hasattr(self.model, "coef_"):
            print("The model does not have coefficients to visualize.")
            return
        
        # Create a DataFrame of coefficients (flattened in case coef_ is 2-D with a single row)
        coefficients = pd.DataFrame({
            'Feature': list(feature_names),
            'Coefficient': np.ravel(self.model.coef_)
        })

        # Sort by absolute value of coefficients
        coefficients = coefficients.sort_values(by='Coefficient', key=abs, ascending=False)

        # Plot top_n coefficients
        plt.figure(figsize=(10, 5))
        plt.bar(coefficients['Feature'][:top_n], coefficients['Coefficient'][:top_n])
        plt.xticks(rotation=45, ha='right')
        plt.title(f"Top {top_n} Features by Coefficients")
        plt.xlabel("Features")
        plt.ylabel("Coefficient")
        plt.tight_layout()
        plt.show()
```

---

### **How to Use the Class**

1. **Create the object:**
   - After training the model (e.g. Lasso, Ridge), instantiate the class with the model, `X_test`, and `y_test`.
   ```python
   analysis = ModelAnalysis(model=lasso_model, X_test=X_test, y_test=y_test)
   ```

2. **Evaluate the model:**
   - Also pass `y_train` and `y_pred_train` to evaluate on both the train and test sets.
   ```python
   analysis.evaluate_metrics(y_train, lasso_train_pred)
   ```

3. **Visualize the predictions:**
   - Plot the predictions against the actual values.
   ```python
   analysis.visualize_predictions(num_samples=50)
   ```

4. **Visualize the coefficients:**
   - Show the model's largest coefficients by absolute value.
   ```python
   analysis.visualize_coefficients(feature_names=X_train.columns, top_n=10)
   ```
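
The four steps above assume that `lasso_model`, `X_train`, `X_test`, `y_train`, `y_test`, and `lasso_train_pred` already exist. As a rough sketch of one way to produce them, the snippet below uses synthetic data from scikit-learn's `make_regression`; the feature names, `alpha=0.1`, and the 75/25 split are assumptions chosen for illustration, not values from your project.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Synthetic data standing in for a real dataset (assumption for illustration)
X_raw, y_raw = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X = pd.DataFrame(X_raw, columns=[f"feature_{i}" for i in range(5)])
y = pd.Series(y_raw, name="target")

# Hold out 25% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Train the model and compute the train-set predictions needed by evaluate_metrics
lasso_model = Lasso(alpha=0.1).fit(X_train, y_train)
lasso_train_pred = lasso_model.predict(X_train)

# With the ModelAnalysis class defined above, these objects plug into the steps:
# analysis = ModelAnalysis(model=lasso_model, X_test=X_test, y_test=y_test)
# analysis.evaluate_metrics(y_train, lasso_train_pred)
```

Keeping `X` as a DataFrame means `X_train.columns` is available later for `visualize_coefficients`.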

---

### **Advantages**
- **Easy to use**: Combines the 3 main functions in a single class.
- **Broadly compatible**: Works with models such as Lasso, Ridge, and Linear Regression, or any model that provides `predict`; `coef_` is needed only for `visualize_coefficients`.
- **Flexible**: You can change the number of points displayed (`num_samples`) or the number of top coefficients (`top_n`).
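
To make the compatibility point concrete, here is a small sketch of the duck typing the class relies on. `MeanBaseline` is a hypothetical model invented for this example: it has `predict` but no `coef_`, so the evaluation and prediction plots would work while the `hasattr` check in `visualize_coefficients` would skip the coefficient chart.

```python
import numpy as np

class MeanBaseline:
    """Hypothetical model: always predicts the training mean and has no `coef_`."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        # One prediction per input row
        return np.full(len(X), self.mean_)

model = MeanBaseline().fit(np.zeros((4, 2)), [1.0, 2.0, 3.0, 6.0])
preds = model.predict(np.zeros((3, 2)))

# The same check visualize_coefficients performs before plotting
can_plot_coefficients = hasattr(model, "coef_")
print(preds, can_plot_coefficients)
```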

Give it a try and let me know if you need more help! 😊

Below is a class that combines all of your functions and works with a dataset file, a model, and the visualization and evaluation methods:

### **Combined Class**
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import (
    r2_score,
    mean_squared_error,
    mean_absolute_error,
    mean_absolute_percentage_error,
)

class ModelAnalysis:
    def __init__(self, filepath_dataset, model):
        """
        Initialize the class with the dataset filepath and the trained model.
        
        Parameters:
        -----------
        filepath_dataset : str
            Path to the dataset file (CSV format).
        model : trained model
            A trained model with methods like `predict` and attributes like `coef_` (optional).
        """
        self.filepath = filepath_dataset
        self.model = model
        self.data = None
        self.X = None
        self.y = None
        self._load_data()
    
    def _load_data(self):
        """
        Internal method to load the dataset.
        """
        self.data = pd.read_csv(self.filepath)
        print("Dataset loaded successfully.")
    
    def preprocess_data(self, feature_columns, target_column):
        """
        Preprocess the data to separate features and target.
        
        Parameters:
        -----------
        feature_columns : list
            List of feature column names.
        target_column : str
            Target column name.
        """
        self.X = self.data[feature_columns]
        self.y = self.data[target_column]
        print("Data preprocessed successfully.")
    
    def evaluate_model(self, y_train, y_pred_train, y_test, y_pred_test):
        """
        Evaluate the model performance using various metrics.
        """
        # Metrics for train set
        r2_train = r2_score(y_train, y_pred_train)
        mse_train = mean_squared_error(y_train, y_pred_train)
        rmse_train = np.sqrt(mse_train)
        mae_train = mean_absolute_error(y_train, y_pred_train)
        mape_train = mean_absolute_percentage_error(y_train, y_pred_train) * 100

        # Metrics for test set
        r2_test = r2_score(y_test, y_pred_test)
        mse_test = mean_squared_error(y_test, y_pred_test)
        rmse_test = np.sqrt(mse_test)
        mae_test = mean_absolute_error(y_test, y_pred_test)
        mape_test = mean_absolute_percentage_error(y_test, y_pred_test) * 100

        # Print metrics
        print("EVALUATE METRICS ON THE TRAIN SET")
        print("R2:", r2_train)
        print("MSE:", mse_train)
        print("RMSE:", rmse_train)
        print("MAE:", mae_train)
        print("MAPE%:", f"{mape_train:.2f}%\n")
        
        print("EVALUATE METRICS ON THE TEST SET")
        print("R2:", r2_test)
        print("MSE:", mse_test)
        print("RMSE:", rmse_test)
        print("MAE:", mae_test)
        print("MAPE%:", f"{mape_test:.2f}%\n")

    def plot_predictions(self, y_test, y_pred_test, num_samples=50):
        """
        Plot predictions vs actual values and a line plot for the first few samples.
        """
        # Limit the number of samples
        num_samples = min(num_samples, len(y_test), len(y_pred_test))

        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 7))

        # Scatter plot
        ax1.scatter(y_test, y_pred_test, alpha=0.5)
        ax1.set_title("Predictions vs Actual")
        ax1.set_xlabel("Actual Values")
        ax1.set_ylabel("Predicted Values")
        
        # Add the y = x reference line
        min_val = min(min(y_test), min(y_pred_test))
        max_val = max(max(y_test), max(y_pred_test))
        ax1.plot([min_val, max_val], [min_val, max_val], 'r--')
        
        # Line plot
        x_points = range(num_samples)
        ax2.plot(x_points, y_test[:num_samples], 'b-', label='Actual')
        ax2.plot(x_points, y_pred_test[:num_samples], 'orange', linestyle='--', label='Predicted')
        
        ax2.set_title("Test Predictions")
        ax2.set_xlabel("Point")
        ax2.set_ylabel("Value")
        ax2.legend()
        
        plt.tight_layout()
        plt.show()
    
    def visualize_coefficients(self, top_n=10):
        """
        Visualize the model coefficients.
        
        Parameters:
        -----------
        top_n : int
            Number of top features to display (default is 10).
        """
        if not hasattr(self.model, "coef_"):
            print("The model does not have coefficients to visualize.")
            return
        if self.X is None:
            print("Call preprocess_data() first so that feature names are available.")
            return

        # Build a DataFrame pairing each feature name with its coefficient
        coefficients = pd.DataFrame({
            'Feature': self.X.columns,  # Feature names
            'Coefficient': np.ravel(self.model.coef_)  # Coefficients from the trained model
        })

        # Sort by absolute value to find the most influential features
        coefficients = coefficients.sort_values(by='Coefficient', key=abs, ascending=False)

        # Print the most influential features and their coefficients
        print(f'\nTop {top_n} most influential features:')
        print(coefficients.head(top_n))

        # Bar chart of the top coefficients
        plt.figure(figsize=(12, 6))
        plt.bar(
            coefficients['Feature'][:top_n],  # X axis: feature names
            coefficients['Coefficient'][:top_n]  # Y axis: coefficient values
        )
        plt.xticks(rotation=45, ha='right')  # Rotate the labels for readability
        plt.xlabel('Feature')  # X-axis label
        plt.ylabel('Coefficient')  # Y-axis label
        plt.title(f'Top {top_n} Features by Coefficients')  # Chart title
        plt.tight_layout()
        plt.show()
```

---

### **How to Use the Class**

1. **Create the object with the dataset and the model**:
   ```python
   analysis = ModelAnalysis(filepath_dataset="data.csv", model=trained_model)
   ```

2. **Preprocess the data**:
   ```python
   analysis.preprocess_data(feature_columns=["col1", "col2", "col3"], target_column="target")
   ```

3. **Evaluate the model**:
   ```python
   analysis.evaluate_model(y_train, y_train_pred, y_test, y_test_pred)
   ```

4. **Visualize the predictions**:
   ```python
   analysis.plot_predictions(y_test, y_test_pred, num_samples=50)
   ```

5. **Visualize the model coefficients**:
   ```python
   analysis.visualize_coefficients(top_n=10)
   ```
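
Steps 1 and 2 come down to reading a CSV and selecting columns. The sketch below shows that flow on its own, using an in-memory CSV in place of `data.csv`; the file contents are made up, and the missing-column check is an optional extra that the class above does not perform.

```python
import io
import pandas as pd

# In-memory CSV standing in for "data.csv" (hypothetical contents)
csv_text = """col1,col2,col3,target
1.0,0.5,3,10.0
2.0,0.1,1,12.5
3.0,0.9,2,15.0
"""
data = pd.read_csv(io.StringIO(csv_text))

feature_columns = ["col1", "col2", "col3"]
target_column = "target"

# Optional guard: fail early with a clear message if a column name is wrong
missing = [c for c in feature_columns + [target_column] if c not in data.columns]
if missing:
    raise KeyError(f"Columns not found in dataset: {missing}")

# The same selection preprocess_data performs
X = data[feature_columns]
y = data[target_column]
print(X.shape, y.tolist())
```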

Try this class and let me know if you need more help! 😊

### **Class ModelAnalysis - Summary**

**Purpose**: Support the analysis and evaluation of regression models by:
1. **Evaluating the performance** of the model on the train and test sets.
2. **Visualizing the predictions**: Comparing actual and predicted values.
3. **Visualizing the coefficients**: Showing how strongly each input variable influences the prediction.

---

### **Main Features**

1. **`evaluate_metrics(y_train, y_pred_train)`**:
   - Evaluates model performance on the train and test sets.
   - Uses common metrics:
     - **R²**: The proportion of variance in the target that the model explains.
     - **MSE, RMSE**: The mean squared error and its square root.
     - **MAE**: The mean absolute error.
     - **MAPE**: The mean absolute percentage error.
   - **Output**: Prints the metrics for both the train and test sets.

2. **`visualize_predictions(num_samples=50)`**:
   - Compares predicted and actual values.
   - **Scatter plot**: Shows the relationship between actual and predicted values.
   - **Line plot**: Compares the values sample by sample (the number of samples is set with `num_samples`).

3. **`visualize_coefficients(feature_names, top_n=10)`**:
   - Visualizes the model coefficients (applies to models with a `coef_` attribute, such as Lasso, Ridge, and Linear Regression).
   - **Bar chart**: Shows the influence of the input variables, sorted by the absolute value of the coefficient.
   - **Inputs**:
     - `feature_names`: Names of the input variables.
     - `top_n`: Number of most influential variables to show.
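
For reference, the metrics listed under `evaluate_metrics` can be written out directly in NumPy. This is a minimal sketch on made-up numbers, meant only to show what each metric computes. The MAPE line assumes no true value is zero; scikit-learn's `mean_absolute_percentage_error` guards the denominator with a small epsilon instead.

```python
import numpy as np

# Made-up values for illustration
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 380.0])

errors = y_true - y_pred

mse = np.mean(errors ** 2)                      # mean squared error
rmse = np.sqrt(mse)                             # square root of the MSE
mae = np.mean(np.abs(errors))                   # mean absolute error
mape = np.mean(np.abs(errors / y_true)) * 100   # mean absolute percentage error, in %

# R2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"R2: {r2:.4f}  MSE: {mse:.4f}  RMSE: {rmse:.4f}  MAE: {mae:.4f}  MAPE: {mape:.2f}%")
```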

---

### **Applications**
- **Target models**: Linear regression models that can predict and that expose coefficients (`coef_`), for example:
  - Lasso Regression.
  - Ridge Regression.
  - Linear Regression.
- **Benefits**:
  - Simplifies model analysis.
  - Helps identify important factors and assess prediction accuracy.
  - Improves understanding of how the model behaves through visualization.

---

### **Example Output**
- **Performance evaluation** (illustrative values):
  ```
  EVALUATE METRICS ON THE TRAIN SET
  R2: 0.9501
  MSE: 12.3456
  RMSE: 3.5136
  MAE: 2.7894
  MAPE: 5.67%

  EVALUATE METRICS ON THE TEST SET
  R2: 0.9234
  MSE: 14.6789
  RMSE: 3.8313
  MAE: 2.9871
  MAPE: 6.12%
  ```

- **Prediction visualization**:
  - Scatter plot: Points close to the y = x line indicate accurate predictions.
  - Line plot: Predictions that track the actual values closely from sample to sample.

- **Coefficient visualization**:
  - Shows the top 10 most influential variables, which helps identify the variables that matter most for the prediction.
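
The "top N by influence" ranking is a sort on the absolute value of the coefficients. A minimal NumPy sketch of that ranking follows; the feature names and coefficient values are hypothetical.

```python
import numpy as np

# Hypothetical feature names and coefficients
feature_names = np.array(["area", "rooms", "age", "distance"])
coef = np.array([0.8, -2.5, 0.1, 1.2])

# Indices that sort the coefficients by absolute value, largest first
order = np.argsort(-np.abs(coef))

top_n = 2
top_features = feature_names[order][:top_n]
top_coefs = coef[order][:top_n]

for name, value in zip(top_features, top_coefs):
    print(f"{name}: {value:+.2f}")
```

The sign is kept in the output so that negative effects stay visible. Coefficient magnitudes are only comparable across features when the features are on similar scales, for example after standardization.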

---

**In short**: The `ModelAnalysis` class is a convenient, easy-to-follow tool for evaluating performance, visualizing results, and analyzing the influence of each variable in a regression model.