# Diabetes Readmission Prediction Pipeline Execution

This notebook provides a convenient way to run the `src/run.py` pipeline from a notebook stored inside the `src/` directory.

The `src/run.py` script is designed to be run as a Python module using `python -m src.run <args>` from the *project's root directory* (the directory containing the `src/` folder). This ensures that Python correctly resolves module paths and relative imports (like `from . import config`).

Running the script directly from within `src/` (e.g., `python run.py`) typically fails with an `ImportError`, because the relative imports have no parent package. Running `python -m src.run` from within `src/` typically fails as well, because Python cannot find the `src` package from that directory.
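
The difference between these invocations can be reproduced on a throwaway package. The sketch below is only an illustration: the temporary `src/` package, its one-line `config.py`, and the helper name `demo_package_context` are assumptions made for the demo, not the project's real files. It runs the same `run.py` three ways and collects the exit codes.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def demo_package_context() -> dict[str, int]:
    """Build a throwaway 'src' package and run it three ways, returning exit codes."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)  # stands in for the project root
        pkg = root / "src"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "config.py").write_text("NAME = 'demo'\n")
        # run.py uses the same style of relative import as described above
        (pkg / "run.py").write_text("from . import config\nprint(config.NAME)\n")

        def exit_code(cmd: list[str], cwd: Path) -> int:
            # Output is captured so the child's traceback does not clutter the notebook
            return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True).returncode

        return {
            "module_from_root": exit_code([sys.executable, "-m", "src.run"], root),
            "script_from_src": exit_code([sys.executable, "run.py"], pkg),
            "module_from_src": exit_code([sys.executable, "-m", "src.run"], pkg),
        }


codes = demo_package_context()
print(codes)
```

Under these assumptions, `module_from_root` is expected to exit with code 0, while the two invocations from inside `src/` are expected to exit with a non-zero code.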

To address this, this notebook uses Python's `subprocess` module to execute the `python -m src.run` command, explicitly setting the current working directory (`cwd`) to the project root. This accurately simulates the intended command-line execution environment.
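
As a minimal sketch of the `cwd` mechanism on its own (independent of this project), the snippet below starts a child Python process in a temporary directory and checks that the child sees that directory as its working directory while the parent's working directory stays unchanged. The names `child_cwd`, `same_dir`, and `parent_cwd` are illustrative only.

```python
import os
import subprocess
import sys
import tempfile

parent_cwd = os.getcwd()  # the notebook's own working directory

with tempfile.TemporaryDirectory() as tmp:
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=tmp,              # only the child process starts in this directory
        capture_output=True,
        text=True,
        check=True,
    )
    child_cwd = result.stdout.strip()
    # samefile compares the underlying directories, so symlinked temp paths still match
    same_dir = os.path.samefile(child_cwd, tmp)

print(f"Child ran in the requested directory: {same_dir}")
print(f"Parent directory unchanged: {os.getcwd() == parent_cwd}")
```

Because `cwd` applies only to the child, the notebook kernel does not need `os.chdir`, and later cells keep running from `src/`.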

The notebook contains five cells:
1.  **Environment Setup:** Imports necessary libraries, determines the project root path, and defines a helper function `run_src_script` to handle execution via `subprocess`.
2.  **Default Execution:** Runs the pipeline with default configuration (equivalent to `$ python -m src.run`).
3.  **Force Train Autoencoder:** Runs the pipeline forcing autoencoder retraining (equivalent to `$ python -m src.run --train-ae`).
4.  **Force Train Predictor:** Runs the pipeline forcing predictor retraining (equivalent to `$ python -m src.run --train-predictor`).
5.  **Force Train Both:** Runs the pipeline forcing both autoencoder and predictor retraining (equivalent to `$ python -m src.run --train-ae --train-predictor`).

Execute the cells sequentially or individually as needed.
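
The four run configurations listed above differ only in the argument list passed to the helper. As a rough sketch, that mapping can be written out explicitly. `SCENARIOS` and `build_command` are hypothetical names used only for illustration; the command layout mirrors the one `run_src_script` builds.

```python
import sys

# Hypothetical lookup table: scenario name -> command-line arguments
SCENARIOS: dict[str, list[str]] = {
    "default": [],
    "train_ae": ["--train-ae"],
    "train_predictor": ["--train-predictor"],
    "train_both": ["--train-ae", "--train-predictor"],
}


def build_command(args: list[str]) -> list[str]:
    """Return the argument vector for `python -m src.run <args>`."""
    return [sys.executable, "-m", "src.run", *args]


for name, args in SCENARIOS.items():
    # Skip the interpreter path so the printed form matches the shell commands above
    print(f"{name}: python {' '.join(build_command(args)[1:])}")
```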

In [1]:
# --- Cell 1: Environment Setup and Execution Helper ---
# Imports, path configuration, and a helper function for running the script.

import os
import sys
import subprocess
import datetime  # Used to generate timestamps for log file names

print("--- Setting up execution environment ---")

# Get the directory where this notebook file is located.
# In most Jupyter/IPython environments, this is the current working directory.
notebook_dir = os.getcwd()
print(f"Current notebook directory: {notebook_dir}")

# The project root is the parent directory of the 'src' directory.
# We assume this notebook is located directly within the 'src' directory.
# Use abspath to get the absolute path and handle '..' correctly.
project_root_dir = os.path.abspath(os.path.join(notebook_dir, '..'))

print(f"Target execution directory (project root): {project_root_dir}")

# Define the directory where logs will be saved
log_dir = os.path.join(project_root_dir, 'logs')

# Optional but recommended: Verify that the 'src' directory exists at the calculated root.
src_path_check = os.path.join(project_root_dir, 'src')
if not os.path.isdir(src_path_check):
    print(f"ERROR: The 'src' directory was not found at the calculated project root '{project_root_dir}'.")
    print("Please ensure this notebook file is located directly within your project's 'src' directory.")
    # Consider adding 'sys.exit(1)' in a non-notebook script, but print is better for a notebook.
    # In a notebook, you might manually stop execution if this check fails.
else:
    print("Project structure confirmed ('src' found in root).")
    # Create the logs directory if it doesn't exist
    os.makedirs(log_dir, exist_ok=True)
    print(f"Log files will be saved in: {log_dir}")
    print("Execution helper ready.")


def run_src_script(args: list[str]):
    """
    Executes the src/run.py script as a Python module from the project root.

    This function constructs and runs the command `python -m src.run`
    with the given arguments using subprocess, setting the working directory
    to the project root. Captured stdout/stderr are also saved to a
    timestamped log file in the 'logs' directory.

    Args:
        args: A list of strings representing command-line arguments
              to pass to src/run.py (e.g., ['--train-ae']).
    """
    command = [sys.executable, '-m', 'src.run'] + args
    command_str = ' '.join(command)

    # Generate a timestamp for the log file
    timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    # Build the log file name from the timestamp only (arguments are not included)
    log_filename = f"pipeline_run_{timestamp}.log"
    log_filepath = os.path.join(log_dir, log_filename)

    print("\n--- Executing Command ---")
    print(f"Working Directory: {project_root_dir}")
    print(f"Command: {command_str}")
    print(f"Saving output to log file: {log_filepath}")
    print("---------------------------")

    # Defaults for the captured output and final status, used if the run fails early
    stdout_output = ""
    stderr_output = ""
    execution_status = "UNKNOWN" # Will be updated based on try/except

    # Open the log file right away to write initial info
    try:
        with open(log_filepath, 'w', encoding='utf-8') as log_file:
            log_file.write(f"--- Pipeline Execution Log - {timestamp} ---\n")
            log_file.write(f"Working Directory: {project_root_dir}\n")
            log_file.write(f"Command: {command_str}\n")
            log_file.write("-" * 30 + "\n\n") # Separator

            # Run the command using subprocess
            result = subprocess.run(
                command,
                cwd=project_root_dir,
                capture_output=True,
                text=True,
                encoding='utf-8',
                errors='replace',
                check=True # Will raise CalledProcessError on non-zero exit
            )

            stdout_output = result.stdout
            stderr_output = result.stderr
            execution_status = "SUCCESS"

            # Write captured stdout and stderr to the log file
            log_file.write("--- STDOUT ---\n")
            log_file.write(stdout_output)
            log_file.write("\n") # Ensure newline after stdout
            if stderr_output:
                log_file.write("--- STDERR ---\n")
                log_file.write(stderr_output)
                log_file.write("\n")

            log_file.write("-" * 30 + "\n") # Separator
            log_file.write("--- COMMAND EXECUTED SUCCESSFULLY ---\n")

    except FileNotFoundError as e:
        # Raised when the interpreter, the working directory, or the log directory is missing
        execution_status = "ERROR - File Not Found"
        error_msg = (
            f"Error: a required path was not found: {e}\n"
            f"Check that the Python executable '{sys.executable}', the project root, "
            f"and the log directory '{log_dir}' all exist."
        )
        print("\n--- EXECUTION ERROR ---")
        print(error_msg)
        print("---------------------------")

        # Write error details to log (append, since the header may already be written)
        if os.path.isdir(log_dir):
            with open(log_filepath, 'a', encoding='utf-8') as log_file:
                log_file.write(f"\n--- ERROR --- {execution_status}\n")
                log_file.write(error_msg + "\n")


    except subprocess.CalledProcessError as e:
        execution_status = f"ERROR - Command Failed (Return Code: {e.returncode})"
        print("\n--- COMMAND FAILED ---")
        print(f"Command: {command_str}")
        print(f"Working Directory: {project_root_dir}")
        print(f"Return Code: {e.returncode}")
        # Print captured output to console even on failure
        print(f"\nSTDOUT:\n{e.stdout}")
        print(f"\nSTDERR:\n{e.stderr}")
        print("---------------------------")

        # Write captured output and error details to log
        with open(log_filepath, 'a', encoding='utf-8') as log_file:  # Use 'a' to append
            log_file.write(f"\n--- ERROR --- {execution_status}\n")
            log_file.write(f"Command: {command_str}\n")
            log_file.write(f"Working Directory: {project_root_dir}\n")
            log_file.write(f"Return Code: {e.returncode}\n")
            log_file.write("\nSTDOUT:\n")
            log_file.write(e.stdout)
            log_file.write("\nSTDERR:\n")
            log_file.write(e.stderr)
            log_file.write("\n")


    except Exception as e:
        execution_status = f"ERROR - Unexpected Exception: {type(e).__name__}"
        error_msg = f"An unexpected error occurred: {e}"
        print("\n--- UNEXPECTED ERROR DURING EXECUTION ---")
        print(error_msg)
        print("---------------------------")

        # Write error details to log
        with open(log_filepath, 'a', encoding='utf-8') as log_file:  # Use 'a' to append
            log_file.write(f"\n--- ERROR --- {execution_status}\n")
            log_file.write(error_msg + "\n")


    finally:
        # This block ensures the log file path is always reported after an attempt
        print(f"\n--- Execution finished with status: {execution_status} ---")
        print(f"Full output saved to: {log_filepath}")
        print("\n")  # Add a newline for separation between executions


# Indicate that the setup is complete
print("Setup complete. Helper function 'run_src_script' is defined and ready.")

--- Setting up execution environment ---
Current notebook directory: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\src
Target execution directory (project root): c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction
Project structure confirmed ('src' found in root).
Log files will be saved in: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\logs
Execution helper ready.
Setup complete. Helper function 'run_src_script' is defined and ready.
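
The setup cell assumes the notebook's working directory is exactly `src/`. If the notebook might be opened from elsewhere in the repository, one option is to walk up the directory tree until a folder containing `src/` is found. The sketch below shows that idea; `find_project_root` is a hypothetical helper, not part of the project, and the temporary directory layout exists only to exercise it.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "src") -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None  # no ancestor contains the marker directory


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    nested = root / "src" / "notebooks"
    nested.mkdir(parents=True)  # creates root/src and root/src/notebooks

    found_from_root = find_project_root(root)
    found_from_src = find_project_root(root / "src")
    found_from_nested = find_project_root(nested)

print(found_from_root == root, found_from_src == root, found_from_nested == root)
```

A helper like this would be called as `find_project_root(Path.cwd())`; a `None` result corresponds to the error branch of the `src` directory check in the setup cell.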


In [2]:
## Execute with Default Settings

# This cell runs the pipeline using the default configuration settings
# specified within the src/config.py file.
# This simulates the command:
# $ python -m src.run

print("--- Running pipeline with default configuration ---")
run_src_script(args=[]) # Pass an empty list as no command-line arguments are needed

--- Running pipeline with default configuration ---

--- Executing Command ---
Working Directory: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction
Command: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\.venv\Scripts\python.exe -m src.run
Saving output to log file: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\logs\pipeline_run_20250519_101707.log
---------------------------

--- Execution finished with status: SUCCESS ---
Full output saved to: c:\Users\aflon\OneDrive\Documentos\GitHub\IDSS-for-Diabetes-Readmission-Prediction\logs\pipeline_run_20250519_101707.log




In [None]:
## Execute Forcing Autoencoder Retraining

# This cell runs the pipeline and explicitly requests that the
# Autoencoder model be retrained, overriding the default configuration
# or any existing saved model.
# This simulates the command:
# $ python -m src.run --train-ae

print("--- Running pipeline forcing Autoencoder training ---")
run_src_script(args=['--train-ae'])

In [None]:
## Execute Forcing Predictor Retraining

# This cell runs the pipeline and explicitly requests that the
# Predictor model be retrained, overriding the default configuration
# or any existing saved model.
# This simulates the command:
# $ python -m src.run --train-predictor

print("--- Running pipeline forcing Predictor training ---")
run_src_script(args=['--train-predictor'])

In [None]:
## Execute Forcing Both AE and Predictor Retraining

# This cell runs the pipeline and explicitly requests that *both*
# the Autoencoder and the Predictor models be retrained.
# This simulates the command:
# $ python -m src.run --train-ae --train-predictor

print("--- Running pipeline forcing both AE and Predictor training ---")
run_src_script(args=['--train-ae', '--train-predictor'])