# Credit Risk Modeling Pipeline Overview

This document provides a high-level overview of an automated credit risk modeling pipeline. The pipeline is
designed to be adaptable: a GPT-powered agent orchestrates each step. The goal is to build a risk prediction
model, whether a credit risk scoring model or a market risk model. The work covers data quality evaluation,
univariate analysis, multivariate analysis (including model selection), model building, and calibration.
The key components and flow are described below.

## Key Components:

*   **GPTAgent:**
    *   This is the brain of the pipeline. It's an AI agent that uses a large language model (LLM) to make decisions at each stage.
    *   It generates commentary on analysis results, assesses their quality, creates dynamic action plans to guide the pipeline, and selects the most suitable options.
    *   It maintains a context of the pipeline's progress, which it uses to make informed decisions.
*   **DataQualityChecker:**
    *   This component checks and cleans raw data, identifying missing values, outliers, and inconsistencies.
    *   It stores the results of these checks and returns the cleaned data.
*   **UnivariateAnalyzer:**
    *   Performs individual (univariate) analysis on each feature in the dataset.
    *   It calculates the distribution statistics of features and presents multiple Weight of Evidence (WOE) binning options.
*   **MultivariateAnalyzer:**
    *   Analyzes relationships between different features in the dataset (multivariate analysis).
    *   This step performs feature selection and, most importantly, fits models such as logistic regression, OLS, and XGBoost.
    *   It returns multiple modeling options, allowing the GPT agent to select the best one.
*   **Model:**
    *   An abstract class for representing a model object.
    *   Concrete classes are created to handle different types of models.
*   **RiskModelBuilder:**
    *   Builds and fits the chosen model and stores performance metrics.
    *   It also generates plots of the ROC curve to analyze performance.
*   **ModelCalibrator:**
    *   Calibrates the predicted probabilities to ensure the model produces well-calibrated risk scores.
*   **Documentor:**
    *   Stores the results and assessments from each step and generates a final report using LaTeX.
*   **Assessor:**
    *   Evaluates the results of each step, and flags unfavorable outcomes.
    *   Works hand-in-hand with the `GPTAgent` to decide what the next step should be.
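
The Weight of Evidence binning mentioned under `UnivariateAnalyzer` can be illustrated with a small standard-library sketch. It assumes the convention used later in this notebook, WOE = ln(share of positives / share of negatives), and the same fallback of 0 for bins where the ratio is undefined; some references use the inverse ratio. The function name `weight_of_evidence` and the toy data are hypothetical.

```python
import math

def weight_of_evidence(bins, targets):
    """Return {bin_label: WOE} with WOE = ln(share of positives / share of negatives)."""
    pos, neg = {}, {}
    for label, target in zip(bins, targets):
        pos[label] = pos.get(label, 0) + (1 if target == 1 else 0)
        neg[label] = neg.get(label, 0) + (0 if target == 1 else 1)
    total_pos = sum(pos.values())
    total_neg = sum(neg.values())
    woe = {}
    for label in pos:
        if pos[label] == 0 or neg[label] == 0:
            # Undefined ratio (log of 0 or division by 0): fall back to 0.
            woe[label] = 0.0
        else:
            woe[label] = math.log((pos[label] / total_pos) / (neg[label] / total_neg))
    return woe

# Toy data: "low" has 1 positive and 3 negatives, "high" has 3 positives and 1 negative.
bins = ["low", "low", "low", "high", "high", "high", "high", "low"]
targets = [1, 0, 0, 1, 1, 1, 0, 0]
woe = weight_of_evidence(bins, targets)
print(woe)
```

With this convention, a positive WOE marks a bin that holds a larger share of positives than of negatives.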

## Pipeline Flow:

The pipeline is designed to be non-linear. Here's a typical flow:

1.  **Data Quality Check:** The `DataQualityChecker` cleans the raw data, storing all results and metrics.
2.  **Univariate Analysis:** The `UnivariateAnalyzer` performs the univariate analysis of the features, returning multiple options to the GPT agent.
3.  **Feature Selection and Multivariate Modeling:** The `MultivariateAnalyzer` selects features, and generates multiple model options. The GPT agent chooses the best option.
4.  **Model Building and Evaluation:** The `RiskModelBuilder` fits the selected model, and generates evaluation metrics, including a ROC curve.
5.  **Calibration:** The `ModelCalibrator` calibrates the model's predicted probabilities.
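
The evaluation metrics in step 4 can be sketched without any dependencies. The example below is an illustration only, not the notebook's implementation (which relies on scikit-learn): it builds the ROC points by sweeping the score threshold, integrates them with the trapezoidal rule to get the AUC, and takes the KS statistic as the largest gap between TPR and FPR. The helper names and the four-row toy sample are assumptions.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by lowering the threshold over distinct scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume every observation tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr

def auc_and_ks(y_true, scores):
    """AUC via the trapezoidal rule; KS as the maximum of TPR - FPR."""
    fpr, tpr = roc_points(y_true, scores)
    auc = sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
              for k in range(1, len(fpr)))
    ks = max(t - f for f, t in zip(fpr, tpr))
    return auc, ks

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc, ks = auc_and_ks(y_true, scores)
print(auc, ks)  # 0.75 0.5
```

Both values would pass the mock assessment rules used in this notebook, which flag an AUC below 0.6 or a KS below 0.15.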

Each step starts with the GPT agent creating a dynamic "action plan" that specifies how the step should be conducted
and which functions should be called, based on the current status and historical results. The agent then assesses the
quality of the results and, if needed, triggers backtracking, specifying the next step to take.
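
One way to picture an action plan is as a small dictionary holding a function name and its parameters, which a dispatcher then resolves against a component. The sketch below assumes the `{"action": ..., "params": ...}` shape used by `create_action_plan` in this notebook; the `execute_plan` helper and the `DemoAnalyzer` class are hypothetical, and the notebook itself dispatches with explicit `if` checks rather than `getattr`.

```python
class DemoAnalyzer:
    """Hypothetical stand-in for a pipeline component."""

    def calculate_distribution(self, features):
        # Placeholder result: one entry per requested feature.
        return {feature: "stats placeholder" for feature in features}

def execute_plan(component, plan):
    """Call the method named in plan['action'] with plan['params'] as keyword arguments."""
    action = plan.get("action")
    method = getattr(component, action, None) if action else None
    if not callable(method):
        raise ValueError(f"Unknown action: {action!r}")
    return method(**plan.get("params", {}))

plan = {"action": "calculate_distribution", "params": {"features": ["income", "age"]}}
result = execute_plan(DemoAnalyzer(), plan)
print(result)
```

Rejecting unknown action names explicitly keeps a malformed plan from failing silently, which matters when the plan is produced by an LLM.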

The process is iterative, and if the results of any step are not up to standard (according to the `Assessor`), the GPT agent   
can decide to go back and re-run any of the steps. The feedback mechanisms and decision-making logic are mostly contained within   
the `GPTAgent`, making the process intelligent and adaptable.
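
The backtracking behaviour can be sketched as a loop over step names in which a failed assessment sends control back to an earlier step. This is a simplified illustration under stated assumptions: the step names and the AUC/KS thresholds (0.6 and 0.15) mirror the mock rules in this notebook, while `run`, `assess`, and `demo_step` are hypothetical helpers, and backtracking always targets `feature_selection`.

```python
STEPS = ["data_quality", "univariate_analysis", "feature_selection",
         "model_evaluation", "calibration"]

def assess(results):
    """Mock assessor: fail when AUC < 0.6 or KS < 0.15 and name the step to revisit."""
    if results.get("auc", 1.0) < 0.6 or results.get("ks", 1.0) < 0.15:
        return False, "feature_selection"
    return True, None

def run(step_fn, max_iterations=20):
    """Run steps in order, jumping back whenever an assessment fails."""
    history = []
    step = STEPS[0]
    for _ in range(max_iterations):
        history.append(step)
        ok, back_to = assess(step_fn(step, history))
        if not ok:
            step = back_to  # backtrack and retry from the earlier step
            continue
        position = STEPS.index(step)
        if position == len(STEPS) - 1:
            break  # last step passed: pipeline finished
        step = STEPS[position + 1]
    return history

def demo_step(step, history):
    # The first model evaluation is deliberately weak; the second one passes.
    if step == "model_evaluation":
        first_attempt = history.count("model_evaluation") == 1
        return {"auc": 0.55 if first_attempt else 0.72, "ks": 0.3}
    return {}

history = run(demo_step)
print(history)
```

The `max_iterations` cap bounds the loop so that a step that never passes cannot cause endless backtracking.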

## Key Features

*   **GPT-Driven Decision-Making:** Core logic and control are delegated to the GPT agent rather than fixed in pre-defined logic, which makes the process adaptable.
*   **Non-Linear Flow:** The pipeline is non-linear and iterative, allowing the GPT agent to adapt to different challenges in the dataset.
*   **Detailed Documentation:** Results and GPT comments are stored, and a report is generated with LaTeX.
*   **Modular Design:** The pipeline is created using separate classes, making it easy to extend.
*   **Context Management:** The GPT agent has access to the history of the pipeline and can make informed decisions on that basis.

This framework provides a robust base for building and evaluating credit risk models, using AI to enhance both the intelligence and adaptability of the process.

In [None]:
import pandas as pd
from sklearn.metrics import roc_auc_score, roc_curve, accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from scipy.stats import zscore, ks_2samp
from sklearn.isotonic import IsotonicRegression
import numpy as np
from abc import ABC, abstractmethod
from typing import List, Dict, Any, Tuple, Callable
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import xgboost as xgb
import json


class GPTAgent:
    """
    A GPT-powered agent responsible for orchestrating the credit risk modeling pipeline.

    This agent acts as the decision-making core of the pipeline, leveraging
    a large language model (LLM) to:
        - Generate human-readable commentary on analysis results.
        - Assess the quality and suitability of results against predefined metrics.
        - Create dynamic action plans to guide the pipeline execution.
        - Select the most appropriate options from multiple choices.
        - Provide reasoning for backtracking and course correction.
        - Maintain a context of the pipeline's progress.

    The agent is designed to be adaptable, configurable and maintain its context,
    making it suitable for integration with various model training and data analysis pipelines.
    """
    def __init__(self, model="gpt-4"):
        self.model = model
        self.prompt_templates = self._load_prompt_templates()
        self.knowledge_base = self._load_knowledge_base()  # Placeholder for the knowledge base.

    def _load_prompt_templates(self) -> Dict:
        """Loads prompt templates from a JSON file"""
        try:
            with open('prompt_templates.json', 'r') as f:
                return json.load(f)
        except (FileNotFoundError, json.JSONDecodeError) as e:
            print(f"Error loading prompt templates: {e}")
            return {}

    def _load_knowledge_base(self) -> Dict:
        """
        Loads the knowledge base with rules for analysis and model building.
        """
        # Mock knowledge base
        return {
            "feature_selection_rules": {
                "high_cardinality": "Consider feature selection with VIF to avoid multicollinearity.",
                "high_correlation": "Perform feature selection with correlation to reduce redundancy."
            },
            "model_choice_rules": {
                "binary_classification": ["logistic_regression", "xgboost"],
                "regression": ["ols"],
                "small_dataset": "Consider simpler models or regularization.",
                "large_dataset": "Consider more complex models."
            }
        }

    def _call_llm(self, prompt: str) -> str:
        """
        Placeholder for the actual LLM API call
        """
        # Mock LLM response
        print("LLM called with prompt:", prompt)
        return f"LLM response for prompt: {prompt}"

    def generate_commentary(self, step_name: str, results: Dict, metrics: Dict, template: str) -> str:
        """
        Generates commentary based on the results of a given step.
        """
        prompt = self.prompt_templates.get("commentary", "").format(
            step_name=step_name, results=results, metrics=metrics, template=template
        )
        response = self._call_llm(prompt)
        # Mock response parsing
        return f"GPT Commentary: {response}"

    def assess_results(self, step_name: str, results: Dict, metrics: Dict, template: str) -> Tuple[bool, str]:
        """
        Assesses results of a given step and provides a recommendation.
        """
        prompt = self.prompt_templates.get("assessment", "").format(
            step_name=step_name, results=results, metrics=metrics, template=template
        )
        response = self._call_llm(prompt)
        # Mock assessment logic
        if 'auc' in results and results['auc'] < 0.6:
            return False, "AUC is below 0.6, consider re-running feature selection or data preparation."
        if 'ks' in results and results['ks'] < 0.15:
            return False, "KS is below 0.15, consider re-running feature selection or data preparation."
        return True, "Results are acceptable."

    def create_action_plan(self, step_name: str, context: Dict) -> Dict:
        """
        Generates an action plan for a specific step. The plan includes function names and parameters.
        """
        if step_name == "data_quality":
            return {"action": "run_checks", "params": {}}

        elif step_name == "univariate_analysis":
            features = context.get("features", [])
            return {"action": "calculate_distribution", "params": {"features": features}}

        elif step_name == "feature_selection":
            features = context.get("features", [])

            # Mock feature selection plan logic: request VIF when there are many features.
            rule_based_options = []
            if len(features) > 5:
                rule_based_options.append("vif")

            # After backtracking from a later step, also prune by correlation.
            if context.get("previous_step") in ("model_evaluation", "calibration"):
                rule_based_options.append("correlation")

            if not rule_based_options:
                rule_based_options = ["correlation", "vif"]  # Default: offer both

            return {"action": "feature_selection_options",
                    "params": {"features": features, "options": rule_based_options, "threshold": 0.7}}

        elif step_name == "model_evaluation":
            X = context.get("X", [])
            y = context.get("y", [])
            features = context.get("features", [])

            # Model choice logic. Copy the list so filtering never mutates the knowledge base.
            model_options = list(self.knowledge_base["model_choice_rules"].get("binary_classification", []))

            # The context may hold X=None before the data is prepared, so guard before len().
            if X is not None and len(X) > 100 and "ols" in model_options:
                model_options.remove("ols")

            return {"action": "model_options",
                    "params": {"X": X, "y": y, "features": features, "options": model_options}}

        elif step_name == "calibration":
            preds = context.get("preds", [])
            y_true = context.get("y_true", [])
            return {"action": "fit_calibrator", "params": {"preds": preds, "y_true": y_true}}

        return {}

    def choose_option(self, options: Dict, context: str) -> str:
        """
        Chooses one option from multiple ones based on context.
        """
        prompt = self.prompt_templates.get("choice", "").format(options=options, context=context)
        response = self._call_llm(prompt)
        # Mock choice parsing
        print("LLM choice:", response)
        return list(options.keys())[0]

    def generate_backtracking_comment(self, step_name: str, reason: str) -> str:
        """
        Generates a comment for backtracking.
        """
        prompt = self.prompt_templates.get("backtracking", "").format(step_name=step_name, reason=reason)
        response = self._call_llm(prompt)
        return f"GPT Backtracking: {response}"


class DataQualityChecker:
    """
    Checks and cleans raw data, stores metrics.
    """
    def __init__(self, df: pd.DataFrame, documentor: 'Documentor', gpt_agent: GPTAgent = None):
        self.df = df
        self.documentor = documentor
        self.gpt_agent = gpt_agent
        self.results = {}
        self.assessment = {}

    def run_checks(self) -> Dict:
        """
         Performs a variety of data quality checks and returns the results as a dictionary.
         """
        self.results = {}
        self.results['missing_values'] = self.df.isnull().sum().to_dict()
        self.results['dtypes'] = self.df.dtypes.apply(lambda x: x.name).to_dict()
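        # Count |z| > 3 per numeric column with pandas. scipy's zscore returns a plain
        # ndarray (which has no .abs()) and turns a whole column into NaN if any value is missing.
        numeric = self.df.select_dtypes(include='number')
        z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)
        self.results['outliers'] = (z_scores.abs() > 3).sum().to_dict()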
        
        if not self.df.empty:
          for col in self.df.columns:
              if self.df[col].dtype in ['int64', 'float64']:
                 self.results[f'{col}_stats'] = self.df[col].describe().to_dict()
              elif self.df[col].dtype == 'object':
                 self.results[f'{col}_unique'] = len(self.df[col].unique())
        return self.results

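    def clean_data(self) -> pd.DataFrame:
        """
        Performs cleaning of the dataframe: dropping rows with many NaNs and imputing the rest
        """
        # Keep only rows with at most two missing values.
        self.df.dropna(thresh=len(self.df.columns) - 2, inplace=True)
        for col in self.df.columns:
            if self.df[col].dtype in ['int64', 'float64']:
                # Assign back: a chained inplace fillna may not update self.df in recent pandas.
                self.df[col] = self.df[col].fillna(self.df[col].median())
            elif self.df[col].dtype == 'object':
                mode = self.df[col].mode()
                if not mode.empty:  # mode() is empty when the column is entirely NaN
                    self.df[col] = self.df[col].fillna(mode.iloc[0])
        return self.df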


class UnivariateAnalyzer:
    """
    Performs univariate analysis, returns multiple options and lets GPT comment.
    """
    def __init__(self, df: pd.DataFrame, target_col: str, documentor: 'Documentor', gpt_agent: GPTAgent = None):
      self.df = df
      self.target_col = target_col
      self.documentor = documentor
      self.gpt_agent = gpt_agent
      self.results = {}
      self.assessment = {}

    def calculate_distribution(self, features: List) -> Dict:
        """
        Calculates the distribution statistics of the input features.
        """
        distribution_results = {}
        for feature in features:
            if self.df[feature].dtype in ['int64', 'float64']:
                distribution_results[feature] = self.df[feature].describe().to_dict()
            elif self.df[feature].dtype == 'object':
                value_counts = self.df[feature].value_counts()
                distribution_results[feature] = {
                    'counts': value_counts.to_dict(),
                    'probabilities': (value_counts / len(self.df)).to_dict()
                }
        self.results['distribution'] = distribution_results
        return distribution_results

    def woe_options(self, feature: str, bins: int = 5) -> Tuple[Dict, str]:
        """
         Calculates multiple Weight of Evidence (WOE) binning options and selects one.
         """
        options = {
            "option_1": self._calculate_woe(feature, bins),
            "option_2": self._calculate_woe(feature, bins + 1)
        }
        chosen = "option_1"  # Arbitrary choice to show the functionality
        self.results[f"{feature}_woe_choices"] = options
        self.results[f"{feature}_woe_chosen"] = chosen
        return options, chosen
        
    def _calculate_woe(self, feature: str, bins: int) -> Dict:
        """
        Calculates Weight of Evidence (WOE) values for a given feature and a specific number of bins.
        """
        if self.df[feature].dtype in ['int64', 'float64']:
          # Bin numerical features
          df_bins = pd.cut(self.df[feature], bins=bins, include_lowest=True)
        elif self.df[feature].dtype == 'object':
          df_bins = self.df[feature]
        else:
          raise ValueError(f"Feature '{feature}' has an unsupported dtype: {self.df[feature].dtype}")

        df_temp = pd.DataFrame({'bins': df_bins, 'target': self.df[self.target_col]})
        grouped = df_temp.groupby('bins')['target'].agg(['count', 'sum']).reset_index()
        grouped.rename(columns={'count': 'total', 'sum': 'positives'}, inplace=True)
        
        grouped['negatives'] = grouped['total'] - grouped['positives']

        total_positives = grouped['positives'].sum()
        total_negatives = grouped['negatives'].sum()

        grouped['pos_dist'] = grouped['positives'] / total_positives
        grouped['neg_dist'] = grouped['negatives'] / total_negatives

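        # Bins with no positives or no negatives yield log(0) or a division by zero;
        # map the resulting +/-inf and NaN values to 0.
        with np.errstate(divide='ignore', invalid='ignore'):
            woe = np.log(grouped['pos_dist'] / grouped['neg_dist'])
        grouped['woe'] = woe.replace([np.inf, -np.inf], np.nan).fillna(0)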

        # Format output for clarity
        woe_dict = {}
        for _, row in grouped.iterrows():
          if isinstance(row['bins'], pd.Interval): # Check if row['bins'] is an interval
            label = f"({row['bins'].left:.2f}, {row['bins'].right:.2f}]"  # pd.cut bins are right-closed
          else:
            label = row['bins']
          woe_dict[label] = row['woe']
            
        return woe_dict


class MultivariateAnalyzer:
    """
    Analyzes relationships among features, returns options, lets GPT choose.
    """
    def __init__(self, df: pd.DataFrame, target_col: str, documentor: 'Documentor', gpt_agent: GPTAgent = None):
        self.df = df
        self.target_col = target_col
        self.documentor = documentor
        self.gpt_agent = gpt_agent
        self.results = {}
        self.assessment = {}

    def correlation_analysis(self, features: List) -> pd.DataFrame:
       """
       Calculates and returns the correlation matrix for the specified features
       """
       corr = self.df[features].corr()
       self.results['correlation_matrix'] = corr.to_dict()
       return corr
    
    def vif_analysis(self, features: List) -> Dict:
        """
        Calculates Variance Inflation Factor (VIF) for the given features and returns as a dictionary.
        """
        from statsmodels.stats.outliers_influence import variance_inflation_factor
        
        vif_data = {}
        
        if not features:
            return {}  # Return empty if no features provided

        try:
            X = self.df[features]
            for i, feature in enumerate(features):
                vif = variance_inflation_factor(X.values, i)
                vif_data[feature] = vif
            self.results['vif_analysis'] = vif_data
            return vif_data
        except Exception as e:
            print(f"Error in VIF calculation: {e}")
            return {}

    def feature_selection_options(self, features: List, options: List, threshold: float = 0.7,
                                  vif_threshold: float = 5.0) -> Dict:
        """
        Generates multiple feature selection options for GPT to choose from.

        `threshold` is the absolute-correlation cutoff. VIF is never below 1, so it gets
        its own cutoff, `vif_threshold` (5.0 is a common rule of thumb, used here as an assumption).
        """
        available_options = {}
        if 'correlation' in options:
            available_options['correlation'] = self._select_features_by_correlation(features, threshold)
        if 'vif' in options:
            available_options['vif'] = self._select_features_by_vif(features, vif_threshold)
        self.results['feature_selection_options'] = available_options
        return available_options


    def _select_features_by_correlation(self, features: List, threshold: float) -> List:
        """
        Selects features based on correlation matrix.
        """
        if not features:
          return []
        corr = self.df[features].corr()
        selected_features = []
        considered_features = set()
        
        for col1 in corr.columns:
            if col1 not in considered_features:
              selected_features.append(col1)
              for col2 in corr.columns:
                if col1 != col2 and abs(corr.loc[col1, col2]) > threshold:
                  considered_features.add(col2)
        return selected_features

    def _select_features_by_vif(self, features: List, threshold: float) -> List:
        """
        Selects features based on VIF values.
        """
        vif_values = self.vif_analysis(features)
        if not vif_values:
          return []
        return [feature for feature, vif in vif_values.items() if vif <= threshold]
    
    def model_options(self, X: pd.DataFrame, y: pd.Series, features: List, options: List) -> Dict:
        """
         Generates multiple model options for GPT to choose from.
        """
        available_options = {}
        if 'logistic_regression' in options:
          available_options["logistic_regression"] = self._fit_logistic_regression(X, y, features)
        if 'ols' in options:
          available_options["ols"] = self._fit_ols(X, y, features)
        if 'xgboost' in options:
          available_options["xgboost"] = self._fit_xgboost(X, y, features)
        self.results['model_options'] = available_options
        return available_options

    def _fit_logistic_regression(self, X: pd.DataFrame, y: pd.Series, features: List) -> Dict:
       """
       Fits a logistic regression model using statsmodels.
       """
       X = sm.add_constant(X) # Adding a constant
       model = sm.Logit(y, X).fit(disp=0)
       return {
        "model": model,
        "features": features,
        "summary": model.summary().as_text(),
        "aic": model.aic,
        "bic": model.bic
        }

    def _fit_ols(self, X: pd.DataFrame, y: pd.Series, features: List) -> Dict:
       """
       Fits an OLS model using statsmodels.
       """
       X = sm.add_constant(X) # Adding a constant
       model = sm.OLS(y, X).fit()  # OLS is solved directly; there is no optimizer output to silence
       return {
        "model": model,
        "features": features,
        "summary": model.summary().as_text(),
         "aic": model.aic,
         "bic": model.bic
        }
    
    def _fit_xgboost(self, X: pd.DataFrame, y: pd.Series, features: List) -> Dict:
        """Fits an XGBoost model."""
        model = xgb.XGBClassifier(use_label_encoder=False, eval_metric="logloss")
        model.fit(X, y)
        return {
        "model": model,
        "features": features,
        }


class Model(ABC):
    """Abstract base class for models."""
    @abstractmethod
    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        """Fit the model."""
        pass

    @abstractmethod
    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
        """Predict class probabilities."""
        pass

    @abstractmethod
    def predict(self, X: pd.DataFrame) -> np.ndarray:
      """Make predictions"""
      pass

class LogisticRegressionModel(Model):
  """
  Concrete class that implements Logistic Regression (sklearn)
  """
  def __init__(self, model):
    self.model = model

  def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
    self.model.fit(X, y)

  def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
    return self.model.predict_proba(X)[:, 1]
  
  def predict(self, X: pd.DataFrame) -> np.ndarray:
      return self.model.predict(X)


class StatsModelWrapper(Model):
    """
    Wraps a statsmodels model for consistent interface.
    """
    def __init__(self, model):
        self.model = model

    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        # Statsmodels fit is done in the multivariate step. Nothing to do here.
        pass

    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
      """Get probabilities using statsmodels object."""
      # has_constant='add' forces the intercept column even when X looks constant (e.g. a single row).
      X = sm.add_constant(X, has_constant='add')
      # predict may return a Series or an ndarray, so convert explicitly.
      return np.asarray(self.model.predict(X))

    def predict(self, X: pd.DataFrame) -> np.ndarray:
      """Get labels using statsmodels object."""
      return (self.predict_proba(X) > 0.5).astype(int) # Using default threshold of 0.5
    
class XGBoostModelWrapper(Model):
    """
     Wraps an XGBoost model for consistent interface.
    """
    def __init__(self, model):
        self.model = model

    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        # XGBoost fit is done in the multivariate step. Nothing to do here.
        pass

    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
      """Get probabilities using xgboost object."""
      return self.model.predict_proba(X)[:, 1]
    
    def predict(self, X: pd.DataFrame) -> np.ndarray:
      """Get labels using xgboost object."""
      return self.model.predict(X)


class RiskModelBuilder:
    """
    Builds and fits the model, stores performance metrics.
    """
    def __init__(self, model: Model, documentor: 'Documentor', gpt_agent: GPTAgent = None):
        self.model = model
        self.documentor = documentor
        self.gpt_agent = gpt_agent
        self.results = {}
        self.assessment = {}

    def fit(self, X: pd.DataFrame, y: pd.Series) -> None:
        self.model.fit(X, y)

    def predict_proba(self, X: pd.DataFrame) -> np.ndarray:
        return self.model.predict_proba(X)
    
    def predict(self, X: pd.DataFrame) -> np.ndarray:
        return self.model.predict(X)

    def evaluate(self, y_true: pd.Series, y_pred: np.ndarray) -> Dict:
        """
        Evaluates the model performance.
        """
        auc = roc_auc_score(y_true, y_pred)
        fpr, tpr, thresholds = roc_curve(y_true, y_pred)
        ks = max(tpr - fpr)
        accuracy = accuracy_score(y_true, (y_pred > 0.5).astype(int))
        precision = precision_score(y_true, (y_pred > 0.5).astype(int), zero_division=0)
        recall = recall_score(y_true, (y_pred > 0.5).astype(int), zero_division=0)
        f1 = f1_score(y_true, (y_pred > 0.5).astype(int), zero_division=0)
        self.results['auc'] = auc
        self.results['ks'] = ks
        self.results['accuracy'] = accuracy
        self.results['precision'] = precision
        self.results['recall'] = recall
        self.results['f1'] = f1
        self.results['fpr'] = fpr.tolist()
        self.results['tpr'] = tpr.tolist()
        self.results['thresholds'] = thresholds.tolist()
        return self.results
    
    def plot_roc_curve(self, fpr: List, tpr: List) -> str:
        """Generates and saves ROC curve plot"""
        plt.figure(figsize=(8, 6))
        plt.plot(fpr, tpr, color='darkorange', label='ROC Curve')
        plt.plot([0, 1], [0, 1], color='navy', linestyle='--')
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')
        plt.title('ROC Curve')
        plt.legend()
        plt.grid()
        plt.savefig('roc_curve.png') # Save figure to a file
        plt.close()  # Close the plot
        return "roc_curve.png"


class ModelCalibrator:
    """
    Calibrates predicted probabilities.
    """
    def __init__(self, method: str = 'isotonic', documentor: 'Documentor' = None, gpt_agent: GPTAgent = None):
        self.method = method
        self.calibrator = IsotonicRegression(out_of_bounds='clip') if method == 'isotonic' else None
        self.documentor = documentor
        self.gpt_agent = gpt_agent
        self.results = {}
        self.assessment = {}

    def fit_calibrator(self, preds: np.ndarray, y_true: pd.Series) -> None:
        if self.calibrator:
            self.calibrator.fit(preds, y_true)

    def calibrate(self, preds: np.ndarray) -> np.ndarray:
        if self.calibrator:
            return self.calibrator.transform(preds)
        return preds


class Documentor:
    """
    Handles storage of results and LaTeX report generation.
    """
    def __init__(self):
        self.sections = []
        self.pipeline_results = {}

    def add_section(self, title: str, content: str) -> None:
        self.sections.append((title, content))
    
    def store_results(self, results: Dict, step_name: str) -> None:
        self.pipeline_results[step_name] = results
    
    def get_results(self) -> Dict:
        return self.pipeline_results
    
    def get_step_results(self, step_name: str) -> Dict:
        return self.pipeline_results.get(step_name, {})
    
    def build_latex_document(self) -> str:
        doc = r"\documentclass{article}\usepackage{graphicx}\begin{document}"
        for title, content in self.sections:
            doc += f"\n\\section*{{{title}}}\n{content}\n"
        doc += r"\end{document}"
        return doc
    
    def extract_step_data(self, step_name: str, data_keys: List[str] = None) -> Dict:
        """
        Extracts only selected data from the results of a particular step
        """
        step_results = self.pipeline_results.get(step_name, {})
        if not data_keys:
            return step_results
        # Keep only the requested keys that are present in the step results.
        return {key: step_results[key] for key in data_keys if key in step_results}


class Assessor:
  """
  Evaluates the results of each step, uses GPT commentary, and checks for feedback.
  """
  def __init__(self, documentor: 'Documentor', gpt_agent: GPTAgent, feedback_metrics: Dict = None):
    self.documentor = documentor
    self.gpt_agent = gpt_agent
    self.feedback_metrics = feedback_metrics if feedback_metrics else {}

  def assess_step(self, step_name: str, metrics: Dict = None, template: str = None) -> Tuple[bool, str]:
      """
      Assesses the results of a pipeline step, returns True if assessment is favorable, otherwise False and suggests the next step.
      """
      results = self.documentor.get_step_results(step_name)
      if not results:
         print(f"No results found for {step_name}")
         return True, ""

      assessment_result, assessment_commentary = self.gpt_agent.assess_results(
                  step_name=step_name,
                  results=results,
                  metrics=metrics,
                  template=template
      )
      commentary = self.gpt_agent.generate_commentary(
                  step_name=step_name,
                  results=results,
                  metrics=metrics,
                  template=template
      )
      self.documentor.add_section(f"GPT Commentary on {step_name}", commentary)
      self.documentor.add_section(f"Assessment Result for {step_name}", f"{assessment_result}. Next Steps: {assessment_commentary}")
      if assessment_result is False:
        print(f"Assessment failed for {step_name}: {assessment_commentary}")
      return assessment_result, assessment_commentary


class CreditRiskPipeline:
  """
  Orchestrates the entire credit risk modeling pipeline, data quality to calibration.
  """
  def __init__(self, df: pd.DataFrame, model: Model, target_col: str, gpt_agent: GPTAgent = None,
               test_size: float = 0.2, random_state: int = 42, feedback_metrics: Dict = None):
      self.df = df
      self.target_col = target_col
      self.gpt_agent = gpt_agent
      self.test_size = test_size
      self.random_state = random_state
      self.documentor = Documentor()
      self.assessor = Assessor(self.documentor, gpt_agent, feedback_metrics)

      self.data_quality_checker = DataQualityChecker(df, self.documentor, gpt_agent)
      self.univariate_analyzer = UnivariateAnalyzer(df, target_col, self.documentor, gpt_agent)
      self.multivariate_analyzer = MultivariateAnalyzer(df, target_col, self.documentor, gpt_agent)
      self.model_builder = RiskModelBuilder(model, self.documentor, gpt_agent)
      self.calibrator = ModelCalibrator(documentor=self.documentor, gpt_agent=gpt_agent)

      self.pipeline_results = {}


  def run_pipeline(self, features: List, metrics: Dict = None, template: str = None, max_iterations: int = 2) -> Dict:
    """Runs the entire credit risk pipeline"""
    current_step = "data_quality"  # Start with the data quality check
    
    history = []
    context = {}
    

    for iteration in range(max_iterations):
      print(f"\nRunning Pipeline - Iteration {iteration + 1}. Current Step: {current_step}")
      history.append(current_step)
      context["history"] = history # Adding history
      train_df, test_df = train_test_split(self.df, test_size = self.test_size, random_state=self.random_state)
      
      current_state = {
          "features": features,
           "X": None,
            "y": None,
           "preds": None,
            "y_true": None,
           "clean_train_df": None
      }
      context.update(current_state)
      
      if current_step == "data_quality":
          # Get Action Plan
          action_plan = self.gpt_agent.create_action_plan(current_step, context)
          # Step 1: Data Quality
          if action_plan.get("action") == "run_checks":
            dq_metrics = self.data_quality_checker.run_checks()
            self.documentor.store_results(dq_metrics, "data_quality")
            clean_train_df = self.data_quality_checker.clean_data()
            context["clean_train_df"] = clean_train_df
            context["data_quality_metrics"] = dq_metrics
          assessment_result, assessment_commentary = self.assessor.assess_step("data_quality", metrics, template)
          if not assessment_result:
              current_step = "data_quality"
              comment = self.gpt_agent.generate_backtracking_comment(current_step, assessment_commentary)
              self.documentor.add_section(f"Backtracking to {current_step}", comment)
              context["previous_step"] = current_step
              continue # Try another iteration
          else:
              current_step = "univariate_analysis"

      elif current_step == "univariate_analysis":
          # Get Action Plan
          action_plan = self.gpt_agent.create_action_plan(current_step, context)
          # Step 2: Univariate Analysis
          if action_plan.get("action") == "calculate_distribution":
            params = action_plan.get("params") or {}  # Tolerate plans without a "params" entry
            distribution = self.univariate_analyzer.calculate_distribution(params.get("features"))
            self.documentor.store_results(distribution, "univariate_analysis")
            
            # Note: WOE binning options are generated for the first feature only.
            if features:
              woe_opts, chosen_opt = self.univariate_analyzer.woe_options(features[0])
            else:
              woe_opts, chosen_opt = {}, None
            self.documentor.store_results({"woe_options": woe_opts, "chosen_woe": chosen_opt}, "univariate_analysis_woe")
          
          assessment_result, assessment_commentary = self.assessor.assess_step("univariate_analysis", metrics, template)
          if not assessment_result:
              current_step = "univariate_analysis"
              comment = self.gpt_agent.generate_backtracking_comment(current_step, assessment_commentary)
              self.documentor.add_section(f"Backtracking to {current_step}", comment)
              context["previous_step"] = current_step
              continue # Try another iteration
          else:
              current_step = "feature_selection"

      elif current_step == "feature_selection":
        # Get Action Plan
        action_plan = self.gpt_agent.create_action_plan(current_step, context)
        
        # Step 3: Multivariate Analysis
        if action_plan.get("action") == "feature_selection_options":
           params = action_plan.get("params") or {}  # Tolerate plans without a "params" entry
           corr = self.multivariate_analyzer.correlation_analysis(params.get("features"))
           self.documentor.store_results(corr.to_dict(), "correlation_matrix")

           feature_options = self.multivariate_analyzer.feature_selection_options(
               params.get("features"), params.get("options"), params.get("threshold")
           )
           chosen_feature_method = self.gpt_agent.choose_option(feature_options, "Choose best feature selection method")
           context["chosen_feature_method"] = chosen_feature_method  # Keep the choice available to later steps
        
        assessment_result, assessment_commentary = self.assessor.assess_step("feature_selection", metrics, template)
        if not assessment_result:
            if 're-run feature selection' in assessment_commentary:
                current_step = "feature_selection"
            elif 'data preparation' in assessment_commentary:
                current_step = "data_quality"
            else:
                current_step = "univariate_analysis"
            comment = self.gpt_agent.generate_backtracking_comment(current_step, assessment_commentary)
            self.documentor.add_section(f"Backtracking to {current_step}", comment)
            continue
        else:
            current_step = "model_evaluation"

      elif current_step == "model_evaluation":
          # Get Action Plan
          action_plan = self.gpt_agent.create_action_plan(current_step, context)
          # Step 4: Model Building
          X_train = current_state["X"]
          y_train = current_state["y"]
          X_test, y_test = test_df[current_state["features"]], test_df[self.target_col]
          
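          # Fall back to an empty dict if no multivariate modelling results were stored
          model_results = self.documentor.get_step_results("multivariate_modelling") or {}
          model_type = model_results.get("chosen_model_option")
          chosen_model = self.model  # The model instance passed to __init__ (must be stored there as self.model)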