# Diabetes Readmission Prediction Pipeline Execution

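This notebook provides a convenient interface for running the `src/run.py` pipeline script from a notebook stored inside the project. The setup cell assumes the notebook sits one directory below the project root (for example in `src/`).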

The `src/run.py` script is designed to be run as a Python module using `python -m src.run <args>` from the *project's root directory* (the directory containing the `src/` folder). This ensures that Python correctly resolves module paths and relative imports (like `from . import config`).

Executing `src/run.py` directly (e.g., `python run.py`) from within `src/` or running `python -m src.run` from within `src/` can lead to `ImportError` issues because the script is not run in the correct package context.

To address this, this notebook uses Python's `subprocess` module to execute the `python -m src.run` command, explicitly setting the current working directory (`cwd`) to the project root. This accurately simulates the intended command-line execution environment.
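The difference between the two invocation styles can be reproduced in isolation. The following is a minimal sketch, not part of the pipeline: it builds a throwaway package (the names `demo_pkg`, `config.py`, and `VALUE` are hypothetical) whose `run.py` uses a relative import, then runs it once as a module from the package's parent directory and once as a plain script from inside the package.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> Path:
    """Create a tiny package whose run.py relies on a relative import."""
    pkg = root / "demo_pkg"  # hypothetical stand-in for 'src'
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "config.py").write_text("VALUE = 42\n")
    (pkg / "run.py").write_text("from . import config\nprint(config.VALUE)\n")
    return pkg


def run(cmd, cwd):
    """Run a command in the given directory and capture its text output."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg = build_demo_package(root)
    # Module execution from the package's parent directory: imports resolve.
    as_module = run([sys.executable, "-m", "demo_pkg.run"], cwd=root)
    # Direct script execution from inside the package: no package context.
    as_script = run([sys.executable, "run.py"], cwd=pkg)

print("module run exit code:", as_module.returncode)
print("script run exit code:", as_script.returncode)
print("script run mentions ImportError:", "ImportError" in as_script.stderr)
```

The module run succeeds because `-m` executes `run.py` as part of `demo_pkg`, while the direct run has no parent package for `from . import config` to refer to.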

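The notebook is structured into the following cells:
1.  **Environment Setup:** Imports necessary libraries, determines the project root path, and defines a helper function `run_src_script` to handle execution via `subprocess`.
2.  **Default Execution:** Runs the pipeline with default configuration (equivalent to `$ python -m src.run`).
3.  **Force Train Autoencoder:** Runs the pipeline forcing autoencoder retraining (equivalent to `$ python -m src.run --train-ae`).
4.  **Force Train Predictor:** Runs the pipeline forcing predictor retraining (equivalent to `$ python -m src.run --train-predictor`).
5.  **Force Train Both:** Runs the pipeline forcing both autoencoder and predictor retraining (equivalent to `$ python -m src.run --train-ae --train-predictor`).
6.  **Custom Autoencoder Hyperparameters:** Forces autoencoder retraining with a custom number of epochs and learning rate.
7.  **Custom Predictor Hyperparameters:** Forces predictor retraining with a custom batch size and optimizer.
8.  **Architecture Tuning:** Forces retraining of both models while overriding the hidden dimension, number of RNN layers, dropout, and the GRU and attention settings.
9.  **Batch Size and Data Worker Tuning:** Forces retraining of both models while overriding both batch sizes and the number of data loader workers.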

Execute the cells sequentially or individually as needed.
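The project-root detection in the setup cell assumes the notebook sits exactly one level below the root. As an alternative, here is a minimal sketch that walks up the directory tree until it finds a folder containing `src/`. The function name `find_project_root` and its `marker` parameter are hypothetical and do not appear in the project; the temporary directory layout exists only to make the example self-contained.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = "src") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No directory containing '{marker}' found above {start}")


# Demonstration on a throwaway layout: <root>/src and <root>/scripts/nested
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp).resolve()
    (demo_root / "src").mkdir()
    nested = demo_root / "scripts" / "nested"
    nested.mkdir(parents=True)
    found_from_nested = find_project_root(nested)
    found_from_src = find_project_root(demo_root / "src")
    print(found_from_nested == demo_root, found_from_src == demo_root)
```

In the notebook, a call such as `find_project_root(Path.cwd())` could replace the fixed `'..'` step, at the cost of picking up any ancestor that happens to contain a `src` folder.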

In [1]:
# --- Cell 1: Environment Setup and Execution Helper ---
# Imports the required libraries, resolves the project root path, and defines
# the `run_src_script` helper that executes the main pipeline script.

import os
import sys
import subprocess

print("--- Setting up execution environment ---")

# Get the directory where this notebook file is located.
# In most Jupyter/IPython environments, this is the current working directory.
notebook_dir = os.getcwd()
print(f"Current notebook directory: {notebook_dir}")

# The project root is the parent directory of the 'src' directory.
# We assume this notebook is located in a directory directly under the
# project root (for example 'src'), so the root is one level up.
# abspath normalizes the '..' component into an absolute path.
project_root_dir = os.path.abspath(os.path.join(notebook_dir, '..'))

print(f"Target execution directory (project root): {project_root_dir}")

# Optional but recommended: Verify that the 'src' directory exists at the calculated root.
src_path_check = os.path.join(project_root_dir, 'src')
if not os.path.isdir(src_path_check):
    print(f"ERROR: The 'src' directory was not found at the calculated project root '{project_root_dir}'.")
    print("Please ensure this notebook file is located in a directory directly under your project root (for example 'src').")
    # In a production script, you might sys.exit(1) here. In a notebook, printing an error is usually sufficient.
else:
    print("Project structure confirmed ('src' found in root). Execution helper ready.")


def run_src_script(args: list[str]):
    """
    Executes the src/run.py script as a Python module from the project root.

    This function runs the command `python -m src.run` with the given arguments
    using subprocess, setting the working directory to the project root.
    The script's standard output and standard error are not captured, so
    they are inherited from the notebook kernel process and are intended to
    appear in this cell's output as the script runs.

    Note: This function does *not* capture the output programmatically after
    execution, nor does it save output to a log file. Real-time output is
    prioritized. If file logging is needed, configure it within the
    src/run.py script itself using Python's 'logging' module.

    Examples of `args` lists (matching src/run.py's CLI):
    []
    ['--train-ae']
    ['--ae-epochs', '100']
    ['--hidden-dim', '256', '--use-gru']
    ['--train-predictor', '--predictor-learning-rate', '0.0001']

    Args:
        args: A list of strings representing command-line arguments
              to pass to src/run.py. Each argument and its value (if any)
              should be a separate string in the list.
              E.g., ['--arg1', 'value1', '--flag2', 'value2']
    """
    # Construct the full command list for subprocess
    command = [sys.executable, '-m', 'src.run'] + args
    # Create a string representation of the command for logging/printing
    command_str = ' '.join(command)

    print(f"\n--- Executing Command ---")
    print(f"Working Directory: {project_root_dir}")
    print(f"Command: {command_str}")
    print("---------------------------")
    print("--- Script Output (Real-time) ---") # Indicate where script output starts


    execution_status = "UNKNOWN" # Initial status

    try:
        # Run the subprocess without capturing its output. With stdout=None and
        # stderr=None (the defaults), the child inherits the parent's
        # stdout/stderr. Whether that output appears in the notebook cell
        # depends on the kernel and platform; on some setups it may instead
        # go to the terminal that launched Jupyter.
        subprocess.run(
            command,
            cwd=project_root_dir,  # Run the command from the project root
            check=True,  # Raise CalledProcessError on a non-zero exit status
        )

        # If subprocess.run completes without CalledProcessError, it was a success
        execution_status = "SUCCESS"
        print("\n--- COMMAND EXECUTED SUCCESSFULLY ---") # Print success message after script output finishes

    except FileNotFoundError:
        # Raised when the Python executable cannot be launched; depending on
        # the platform, a missing working directory can raise it as well.
        execution_status = "ERROR - Could Not Launch Python"
        error_msg = (
            f"Error: Could not launch Python executable '{sys.executable}'.\n"
            "Please ensure Python is installed and accessible in your environment (check your PATH), "
            f"and that the working directory '{project_root_dir}' exists."
        )
        print("\n--- EXECUTION ERROR ---")
        print(error_msg)
        print("---------------------------")

    except subprocess.CalledProcessError as e:
        # Handle errors where the subprocess itself exited with a non-zero status
        execution_status = f"ERROR - Command Failed (Return Code: {e.returncode})"
        print(f"\n--- COMMAND FAILED ---")
        print(f"Command: {command_str}")
        print(f"Working Directory: {project_root_dir}")
        print(f"Return Code: {e.returncode}")
        # Note: Because output is not captured, e.stdout and e.stderr are
        # None. The script's error messages should have already been printed
        # in real time, before this handler runs.
        print("\nCheck the real-time output above for script error details.")
        print("---------------------------")


    except Exception as e:
        # Catch any other unexpected exceptions during the execution setup or subprocess handling
        execution_status = f"ERROR - Unexpected Exception: {type(e).__name__}"
        error_msg = f"An unexpected error occurred during execution: {e}"
        print(f"\n--- UNEXPECTED ERROR DURING EXECUTION ---")
        print(error_msg)
        print("---------------------------")


    finally:
        # This block executes regardless of whether an exception occurred or not
        print(f"\n--- Execution finished with status: {execution_status} ---")
        # No log file is handled by this function, so no log path to report here.
        print("\n")  # Add a newline for separation between executions


# Final confirmation message after the setup cell has run
print("Setup complete. Helper function 'run_src_script' is defined and ready.")
print("The script's standard output and error will now stream directly to the cell output in real-time.")
print("Log file generation must be configured within the src/run.py script itself if needed.")
print("Use run_src_script(args=[...]) in subsequent cells.")

--- Setting up execution environment ---
Current notebook directory: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\scripts
Target execution directory (project root): c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction
Project structure confirmed ('src' found in root). Execution helper ready.
Setup complete. Helper function 'run_src_script' is defined and ready.
The script's standard output and error will now stream directly to the cell output in real-time.
Log file generation must be configured within the src/run.py script itself if needed.
Use run_src_script(args=[...]) in subsequent cells.
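
If the script's output does not show up in the cell (this can depend on the kernel and operating system), one option is to read the child's output through a pipe and print it line by line. The sketch below is an assumption-laden alternative to the helper above, not the notebook's own implementation: `stream_command` is a hypothetical name, and it merges stderr into stdout for simplicity.

```python
import subprocess
import sys


def stream_command(command, cwd=None):
    """Run `command`, echo its merged stdout/stderr line by line,
    and return (exit_code, list_of_output_lines)."""
    lines = []
    with subprocess.Popen(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same stream
        text=True,
        bufsize=1,  # line-buffered reads on the parent side
    ) as proc:
        for line in proc.stdout:
            print(line, end="")  # print() goes through the notebook's own stdout
            lines.append(line.rstrip("\n"))
    # Leaving the 'with' block waits for the process, so returncode is set.
    return proc.returncode, lines


# '-u' asks the child interpreter for unbuffered output so lines arrive promptly.
exit_code, output_lines = stream_command(
    [sys.executable, "-u", "-c", "print('hello from child')"]
)
```

Because the parent prints each line itself, the text reaches the cell through the same path as any other `print` call. The trade-off is that stdout and stderr are no longer distinguishable.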


In [None]:
## Execute with Default Settings

# This cell runs the pipeline using the default configuration settings
# specified within the src/config.py file.
# This simulates the command:
# $ python -m src.run

print("--- Running pipeline with default configuration ---")
run_src_script(args=[]) # Pass an empty list as no command-line arguments are needed

--- Running pipeline with default configuration ---

--- Executing Command ---
Working Directory: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction
Command: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\.venv\Scripts\python.exe -m src.run
---------------------------
--- Script Output (Real-time) ---


In [None]:
## Execute Forcing Autoencoder Retraining

# This cell runs the pipeline and explicitly requests that the
# Autoencoder model be retrained, overriding the default configuration
# or any existing saved model.
# This simulates the command:
# $ python -m src.run --train-ae

print("--- Running pipeline forcing Autoencoder training ---")
run_src_script(args=['--train-ae'])

--- Running pipeline forcing Autoencoder training ---

--- Executing Command ---
Working Directory: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction
Command: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\.venv\Scripts\python.exe -m src.run --train-ae
Saving output to log file: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\logs\pipeline_run_20250519_131957.log
---------------------------


In [None]:
## Execute Forcing Predictor Retraining

# This cell runs the pipeline and explicitly requests that the
# Predictor model be retrained, overriding the default configuration
# or any existing saved model.
# This simulates the command:
# $ python -m src.run --train-predictor

print("--- Running pipeline forcing Predictor training ---")
run_src_script(args=['--train-predictor'])

In [None]:
## Execute Forcing Both AE and Predictor Retraining

# This cell runs the pipeline and explicitly requests that *both*
# the Autoencoder and the Predictor models be retrained.
# This simulates the command:
# $ python -m src.run --train-ae --train-predictor

print("--- Running pipeline forcing both AE and Predictor training ---")
run_src_script(args=['--train-ae', '--train-predictor'])

In [None]:
## Execute Forcing AE Retraining with Custom Hyperparameters

# This cell runs the pipeline, forcing the Autoencoder to be retrained,
# and overrides its number of epochs and learning rate.
# This simulates a command like:
# $ python -m src.run --train-ae --ae-epochs 50 --ae-learning-rate 0.0005

print("--- Running pipeline forcing AE training with custom epochs and learning rate ---")
custom_ae_args = [
    '--train-ae',             # Force training the AE
    '--ae-epochs', '50',       # Override AE_EPOCHS to 50
    '--ae-learning-rate', '0.0005' # Override AE_LEARNING_RATE to 0.0005 (pass float as string)
]
run_src_script(args=custom_ae_args)

In [None]:
## Execute Forcing Predictor Retraining with Custom Hyperparameters

# This cell runs the pipeline, forcing the Predictor to be retrained,
# and overrides its batch size and optimizer.
# This simulates a command like:
# $ python -m src.run --train-predictor --predictor-batch-size 128 --predictor-optimizer Adam

print("--- Running pipeline forcing Predictor training with custom batch size and optimizer ---")
custom_predictor_args = [
    '--train-predictor',      # Force training the Predictor
    '--predictor-batch-size', '128', # Override PREDICTOR_BATCH_SIZE to 128
    '--predictor-optimizer', 'Adam' # Override PREDICTOR_OPTIMIZER to 'Adam'
]
run_src_script(args=custom_predictor_args)

In [None]:
## Execute Forcing Both Retraining and Tuning Model Architecture

# This cell runs the pipeline, forces retraining of *both* AE and Predictor,
# and overrides key model architecture parameters: hidden dimension,
# number of RNN layers, and dropout. It also switches to GRU instead of LSTM
# (if the default is LSTM) and disables attention.
# This simulates a command like:
# $ python -m src.run --train-ae --train-predictor --hidden-dim 256 --num-rnn-layers 2 --dropout 0.3 --use-gru --no-use-attention

print("--- Running pipeline forcing both training and tuning model architecture ---")
architecture_tuning_args = [
    '--train-ae',             # Force AE training (architecture change affects AE)
    '--train-predictor',      # Force Predictor training (architecture change affects Predictor)
    '--hidden-dim', '256',    # Override HIDDEN_DIM to 256
    '--num-rnn-layers', '2',  # Override NUM_RNN_LAYERS to 2
    '--dropout', '0.3',       # Override DROPOUT to 0.3 (pass float as string)
    '--use-gru',              # Override USE_GRU to True
    '--no-use-attention'      # Override USE_ATTENTION to False (if you want to disable attention)
    # Or just omit --no-use-attention to keep default USE_ATTENTION from config/CLI if not overridden
]
run_src_script(args=architecture_tuning_args)
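
For readers wondering how paired flags such as `--use-attention` / `--no-use-attention` might be parsed, the following is a sketch using the standard library's `argparse`. It is an assumption about how such a CLI could be defined, not the contents of `src/run.py`, which are not shown here; the default values are placeholders. `argparse.BooleanOptionalAction` requires Python 3.9 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the cell above."""
    parser = argparse.ArgumentParser(prog="python -m src.run")
    parser.add_argument("--train-ae", action="store_true")
    parser.add_argument("--train-predictor", action="store_true")
    parser.add_argument("--hidden-dim", type=int, default=128)      # placeholder default
    parser.add_argument("--num-rnn-layers", type=int, default=1)    # placeholder default
    parser.add_argument("--dropout", type=float, default=0.1)       # placeholder default
    # BooleanOptionalAction registers both --flag and --no-flag forms.
    parser.add_argument("--use-gru", action=argparse.BooleanOptionalAction, default=False)
    parser.add_argument("--use-attention", action=argparse.BooleanOptionalAction, default=True)
    return parser


parsed = build_parser().parse_args([
    "--train-ae", "--train-predictor",
    "--hidden-dim", "256",
    "--num-rnn-layers", "2",
    "--dropout", "0.3",
    "--use-gru",
    "--no-use-attention",
])
print(parsed)
```

Every value arrives on the command line as a string; the `type=` arguments convert them, which is why the notebook passes numbers such as `'256'` and `'0.3'` as strings.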

In [None]:
## Execute Forcing Both Retraining and Tuning Batch Sizes and Data Workers

# This cell runs the pipeline, forces retraining of *both* AE and Predictor,
# and overrides the batch size for both training phases and the number
# of data loader workers.
# This simulates a command like:
# $ python -m src.run --train-ae --train-predictor --ae-batch-size 128 --predictor-batch-size 128 --dataloader-num-workers 4

print("--- Running pipeline forcing both training and tuning batch sizes/data workers ---")
batch_size_tuning_args = [
    '--train-ae',             # Force AE training
    '--train-predictor',      # Force Predictor training
    '--ae-batch-size', '128', # Override AE_BATCH_SIZE to 128
    '--predictor-batch-size', '128', # Override PREDICTOR_BATCH_SIZE to 128
    '--dataloader-num-workers', '4' # Override DATALOADER_NUM_WORKERS to 4
]
run_src_script(args=batch_size_tuning_args)
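
The argument lists in the cells above are written out by hand. When many overrides are involved, a small helper can build the list from a dictionary. This is a sketch only: `to_cli_args` is a hypothetical helper, and it assumes each option name maps to a `--kebab-case` flag as in the examples above.

```python
def to_cli_args(overrides: dict) -> list[str]:
    """Convert {'ae_epochs': 50, 'train_ae': True} into
    ['--ae-epochs', '50', '--train-ae']-style argument lists."""
    args: list[str] = []
    for name, value in overrides.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            args.append(flag)          # boolean switch: flag only
        elif value is False or value is None:
            continue                   # omitted: fall back to the default
        else:
            args.extend([flag, str(value)])  # option with a value, as strings
    return args


example_args = to_cli_args({
    "train_ae": True,
    "ae_epochs": 50,
    "ae_learning_rate": 0.0005,
    "train_predictor": False,
})
print(example_args)
```

The resulting list can be passed as `run_src_script(args=example_args)`. Negated flags such as `--no-use-attention` are not handled by this sketch and would need to be appended explicitly.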