# Submit Script to Azure Compute Cluster

This notebook submits the `rerun_model.py` script to an Azure ML compute cluster for execution.

## 1. Import Required Libraries

In [1]:
from azure.ai.ml import MLClient
from azure.ai.ml import command
from azure.ai.ml.entities import Environment
from azure.identity import DefaultAzureCredential
from datetime import datetime

In [2]:
SCRIPT_NAME = "rerun_model.py"  # Script to run, relative to the uploaded code directory
CLUSTER_NAME = "MLDevCluster2"  # Name of the Azure ML compute cluster

## 2. Configuration

In [3]:
# Azure ML workspace configuration
subscription_id = "3ff01651-6d0f-4b11-b79c-7fa6ecbe432f"
resource_group = "AnalysisSvcs_RG"
workspace_name = "concorddevtestml_workspace"

# Compute cluster configuration
compute_name = CLUSTER_NAME  # Set CLUSTER_NAME above to your compute cluster name

# Environment configuration - specify a pre-existing environment
environment_name = "autogluon-env"  # Change this to your pre-existing environment
environment_version = "2"  # Pin a version, or set to None to use the latest

# Experiment configuration
experiment_name = "Loonie-Bankuity-Model-Rerun"
job_name = f"{experiment_name}-{datetime.now().strftime('%Y%m%d-%H%M%S')}"

print(f"Job name: {job_name}")
print(f"Using environment: {environment_name}:{environment_version}")

Job name: Loonie-Bankuity-Model-Rerun-20251124-133611
Using environment: autogluon-env:2


## 3. Initialize Azure ML Client

In [4]:
# Initialize the MLClient
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name,
)

print(f"Connected to workspace: {workspace_name}")



Connected to workspace: concorddevtestml_workspace


## 4. Use Pre-existing Environment

Reference an existing environment from your Azure ML workspace.

In [5]:
# Use pre-existing environment from the workspace
if environment_version:
    environment = f"azureml:{environment_name}:{environment_version}"
else:
    environment = f"azureml:{environment_name}@latest"

print(f"Using environment: {environment}")

# Alternative: define a new environment instead (commented out).
# conda_file must point to a conda YAML specification, not a pip requirements.txt.
# environment = Environment(
#     name="post-onboarding-env",
#     description="Environment for post-onboarding model comparison",
#     conda_file="../conda.yml",
#     image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest"
# )

Using environment: azureml:autogluon-env:2
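The if/else above can be wrapped in a small helper so the same rule is applied wherever an environment is referenced. This is a minimal sketch; `environment_reference` is a hypothetical name, and it assumes the `azureml:<name>:<version>` and `azureml:<name>@latest` reference formats used in the cell above.

```python
from typing import Optional


def environment_reference(name: str, version: Optional[str] = None) -> str:
    """Build an Azure ML environment reference string (hypothetical helper).

    Uses a pinned version when one is given, otherwise the @latest label,
    mirroring the if/else in the cell above.
    """
    if not name:
        raise ValueError("environment name must be non-empty")
    if version:
        return f"azureml:{name}:{version}"
    return f"azureml:{name}@latest"


print(environment_reference("autogluon-env", "2"))  # azureml:autogluon-env:2
print(environment_reference("autogluon-env"))       # azureml:autogluon-env@latest
```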


## 5. Create Command Job

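Create the command job and specify the source directory to upload. Database credentials are read from the local `.env` file with `python-dotenv` and passed to the job as environment variables, together with the run settings the script expects.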

In [6]:
from dotenv import dotenv_values

# Read credentials from the local .env file (requires the python-dotenv package)
env_local = dotenv_values(".env")

# Environment variables passed to the job on the compute cluster
job_env = {
    "DB_SERVER": env_local["DB_SERVER"],
    "DB_USER": env_local["DB_USER"],
    "DB_PASSWORD": env_local["DB_PASSWORD"],
    "ODBC_DRIVER_VERSION": env_local.get("ODBC_DRIVER_VERSION", "ODBC Driver 18 for SQL Server"),
    "DATAPATH": "ibv_status_data/loonie_ibv_shadowV3_dedup.csv",
    "EXPERIMENT_NAME": "loonie_rerun_testV2",
    "CLIENT_NAME": "Loonie",
    "IBV_NAME": "LoonieIBV",
    "TEST": "true",
    "PARALLEL": "false",
    "CONCURRENCY_LIMIT": "30",
    "PYTHONPATH": ".",  # Current directory (src) so local modules such as run_test can be imported
}

# Create the command job
job = command(
    code=".",  # Upload the current directory (src) as the job's code snapshot
    # Install the ODBC driver and Python database packages, then run the script.
    # "|| true" lets the job continue even if the apt-get update or install step fails.
    command=f"bash -lc \"apt-get update && ACCEPT_EULA=Y apt-get install -y msodbcsql18 unixodbc unixodbc-dev || true; pip install --no-input pyodbc SQLAlchemy; python {SCRIPT_NAME}\"",
    environment=environment,
    compute=compute_name,
    experiment_name=experiment_name,
    display_name=job_name,
    description="Run Bankuity model on Loonie IBV IDs",
    environment_variables=job_env,
    # Resource configuration
    instance_count=1,
    # Optional: cancel the job after this many seconds
    # timeout=7200,  # 2 hours
)

print("Command job created")

Command job created
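`env_local["DB_SERVER"]` raises a bare `KeyError` if the `.env` file lacks that key, and a key declared without a value may come back from `dotenv_values` as `None` or an empty string. A hedged sketch of a pre-submission check follows; `REQUIRED_KEYS` and `missing_env_keys` are hypothetical names, and the `example_env` dictionary only stands in for the result of `dotenv_values(".env")`.

```python
from typing import Dict, List, Optional, Sequence

# Keys the job reads from the local .env file (taken from job_env above)
REQUIRED_KEYS = ("DB_SERVER", "DB_USER", "DB_PASSWORD")


def missing_env_keys(
    env: Dict[str, Optional[str]], required: Sequence[str] = REQUIRED_KEYS
) -> List[str]:
    """Return required keys that are absent, None, or empty (hypothetical helper)."""
    return [key for key in required if not env.get(key)]


# Stand-in for dotenv_values(".env"); never hard-code real credentials
example_env = {"DB_SERVER": "myserver.example.com", "DB_USER": "svc_user", "DB_PASSWORD": ""}

missing = missing_env_keys(example_env)
if missing:
    print(f"Missing values in .env: {', '.join(missing)}")  # Missing values in .env: DB_PASSWORD
```

In the notebook, `missing_env_keys(env_local)` would be called before building `job_env`, stopping early if the list is non-empty.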


## 6. Submit Job to Compute Cluster

In [7]:
# Submit the job
submitted_job = ml_client.jobs.create_or_update(job)

print("Job submitted successfully!")
print(f"Job ID: {submitted_job.name}")
print(f"Job Status: {submitted_job.status}")
print(f"Studio URL: {submitted_job.studio_url}")


Job submitted successfully!
Job ID: modest_night_8fqtqndtkw
Job Status: Starting
Studio URL: https://ml.azure.com/runs/modest_night_8fqtqndtkw?wsid=/subscriptions/3ff01651-6d0f-4b11-b79c-7fa6ecbe432f/resourcegroups/AnalysisSvcs_RG/workspaces/concorddevtestml_workspace&tid=a0441342-388e-435c-a487-ed619a0af8d8


## 7. Monitor Job Status

In [11]:
# Get job status
job_status = ml_client.jobs.get(submitted_job.name)
print(f"Current job status: {job_status.status}")
print(f"Job details: {job_status.display_name}")

# You can also stream the logs
# ml_client.jobs.stream(submitted_job.name)

Current job status: Running
Job details: Loonie-Bankuity-Model-Rerun-20251124-133611
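Instead of re-running the status cell by hand, the check can be repeated until the job reaches a terminal state. This is a minimal sketch: `wait_for_terminal_status` is a hypothetical helper, and it assumes `Completed`, `Failed`, and `Canceled` are the terminal status strings (in-progress states such as `Starting` and `Running` appear in the outputs above). Passing the status lookup as a callable keeps the helper independent of the SDK.

```python
import time
from typing import Callable, Iterable

# Assumed terminal job states; anything else is treated as still in progress
TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Canceled"})


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 4 * 60 * 60,
    terminal: Iterable[str] = TERMINAL_STATUSES,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    terminal_set = frozenset(terminal)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        print(f"Job status: {status}")
        if status in terminal_set:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

In the notebook this would be called as `wait_for_terminal_status(lambda: ml_client.jobs.get(submitted_job.name).status)`. Unlike streaming logs, polling keeps the output short and lets you choose the check interval.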


## 8. Optional: Stream Job Logs

Uncomment the line in the cell below to stream logs in real time (this blocks the cell until the job completes).

In [None]:
# Stream job logs (this will block until job completes)
# ml_client.jobs.stream(submitted_job.name)

## 9. Download Job Outputs (After Completion)

After the job completes, you can download any outputs or logs.

In [None]:
# Download job outputs after completion
# ml_client.jobs.download(submitted_job.name, download_path="./job_outputs")
# print("Job outputs downloaded to ./job_outputs")
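The layout of the download folder depends on what the job wrote and on the SDK version. Rather than assuming specific subfolders, a small helper can list every downloaded file. This is a minimal sketch; `list_downloaded_files` is a hypothetical helper built only on `pathlib`.

```python
from pathlib import Path
from typing import List


def list_downloaded_files(download_path: str) -> List[str]:
    """Return all files under download_path as sorted, relative POSIX paths (hypothetical helper)."""
    root = Path(download_path)
    if not root.is_dir():
        return []
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
```

For example, `for f in list_downloaded_files("./job_outputs"): print(f)` shows what the download step produced.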

## 10. Job Management Utilities

In [None]:
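from itertools import islice

# List up to 10 jobs from this experiment. jobs.list() returns the (non-archived) jobs in the
# whole workspace, so filter by experiment_name; islice stops iterating after 10 matches.
# The loop variable is recent_job so the `job` command object from section 5 is not overwritten.
experiment_jobs = (j for j in ml_client.jobs.list() if j.experiment_name == experiment_name)
print(f"Jobs in {experiment_name}:")
for recent_job in islice(experiment_jobs, 10):
    print(f"  {recent_job.name}: {recent_job.status} - {recent_job.display_name}")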

In [None]:
# Cancel a job if needed (uncomment and provide the job name)
# job_to_cancel = "job-name-here"
# ml_client.jobs.begin_cancel(job_to_cancel)
# print(f"Cancellation requested for job {job_to_cancel}")
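Cancelling a job that has already finished is unnecessary, so a guard can check the status first. A hedged sketch: `cancel_if_active` is hypothetical, it reuses the assumed terminal statuses `Completed`, `Failed`, and `Canceled`, and it expects a client exposing `jobs.get(name)` and `jobs.begin_cancel(name)` as `MLClient` does. Because the client is passed in, the logic can be checked with a stand-in object without connecting to Azure.

```python
from typing import Any

# Assumed terminal job states
TERMINAL_STATUSES = frozenset({"Completed", "Failed", "Canceled"})


def cancel_if_active(client: Any, job_name: str) -> bool:
    """Request cancellation unless the job already finished (hypothetical helper).

    Returns True if a cancel request was sent, False otherwise.
    """
    status = client.jobs.get(job_name).status
    if status in TERMINAL_STATUSES:
        print(f"Job {job_name} is already {status}; nothing to cancel")
        return False
    # begin_cancel starts the cancellation; this helper does not wait for it to finish
    client.jobs.begin_cancel(job_name)
    print(f"Cancellation requested for job {job_name} (was {status})")
    return True
```

In the notebook: `cancel_if_active(ml_client, "job-name-here")`.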

## Notes

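1. **Environment Setup**: Make sure the job environment (`autogluon-env`) provides the required packages (pandas, SQLAlchemy, pyodbc, etc.). The job command also installs `pyodbc`, `SQLAlchemy`, and the `msodbcsql18` ODBC driver at startup, but failures in the `apt-get` step are ignored (`|| true`).

2. **Database Access**: Ensure the compute cluster can reach your SQL Server database. You may need to configure network access. Credentials passed through `environment_variables` are stored with the job definition, so consider Azure Key Vault for values such as `DB_PASSWORD`.

3. **File Access**: Make sure the CSV file specified in `DATAPATH` is accessible to the compute cluster. Because the path is relative, it is resolved on the cluster, so the file normally needs to be inside the uploaded code directory (`code="."`).

4. **Environment Variables**: Update `job_env` in section 5 with your actual values for `EXPERIMENT_NAME`, `CLIENT_NAME`, `IBV_NAME`, etc.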

5. **Resource Requirements**: Adjust the compute cluster size based on your data size and processing requirements.

6. **Monitoring**: Use the Azure ML Studio URL to monitor job progress and view detailed logs.
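Because `DATAPATH` is a relative path (`ibv_status_data/loonie_ibv_shadowV3_dedup.csv`), one way to satisfy note 3 is to keep the file inside the directory uploaded with `code="."` and check that before submitting. This is a sketch under that assumption; `check_data_path` is a hypothetical helper, and it assumes the script resolves `DATAPATH` relative to the job's working directory (the uploaded code snapshot). Azure ML can also skip files listed in `.amlignore` or `.gitignore`, which this check does not cover.

```python
from pathlib import Path


def check_data_path(code_dir: str, data_path: str) -> Path:
    """Return the resolved data file path, raising if it is missing or outside code_dir.

    Hypothetical pre-submission check: files outside the code directory are not
    uploaded with the job snapshot, so a relative path to them would fail on the cluster.
    """
    root = Path(code_dir).resolve()
    candidate = (root / data_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{data_path} resolves outside the code directory {root}")
    if not candidate.is_file():
        raise FileNotFoundError(f"{candidate} does not exist")
    return candidate
```

In the notebook: `check_data_path(".", job_env["DATAPATH"])` before calling `ml_client.jobs.create_or_update(job)`.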

## 11. Create New Environment (Optional)

Use this cell to create a new environment from `conda.yml` if you don't have a pre-existing environment.

In [None]:
# Create a new environment from a conda.yml specification.
# Run this cell only if you need a new environment; running it again
# may register a new version of the environment.

new_environment = Environment(
    name="post-onboarding-env-custom",
    description="Custom environment for post-onboarding model comparison created from conda.yml",
    conda_file="../conda.yml",  # Relative to the notebook's working directory (one level up)
    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest"
)

# Register the environment in the workspace
created_env = ml_client.environments.create_or_update(new_environment)
print(f"Environment created: {created_env.name}:{created_env.version}")

# To use this new environment, update the Configuration section (section 2):
# environment_name = "post-onboarding-env-custom"
# environment_version = created_env.version

Environment created: post-onboarding-env-custom:2


## 12. Download Outputs from a Specific Job

In [None]:
# Download the outputs of a previously run job by name (replace with your job name)
ml_client.jobs.download(
    name="lucid_sail_3cf8ld23v3",
    output_name="default",        # or your named output (e.g., "model", "score")
    download_path="outputs"
)
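The job name passed to `jobs.download` is the last path segment of the Studio URL printed at submission (for example, `https://ml.azure.com/runs/modest_night_8fqtqndtkw?...` belongs to job `modest_night_8fqtqndtkw`). A minimal sketch for extracting it with the standard library; `job_name_from_studio_url` is a hypothetical helper and assumes the `/runs/<job-name>` URL shape shown in section 6.

```python
from urllib.parse import urlparse


def job_name_from_studio_url(url: str) -> str:
    """Extract the job name from a Studio URL of the form .../runs/<job-name>?... (hypothetical)."""
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(parts) < 2 or parts[-2] != "runs":
        raise ValueError(f"Not a /runs/<job-name> URL: {url}")
    return parts[-1]


url = "https://ml.azure.com/runs/modest_night_8fqtqndtkw?wsid=/subscriptions/placeholder"
print(job_name_from_studio_url(url))  # modest_night_8fqtqndtkw
```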