# Scheduler Notification Demo

This notebook demonstrates how to:
1. Send Teams notifications with job identification
2. Clean up old scheduler jobs
3. List and inspect scheduler jobs

## Key Concept: Job Identification in Notifications

When scheduling this notebook, pass the job name as a parameter so that each notification can be traced back to its source job. The job ID does not exist until the job has been created, so you do not supply it yourself.

**Scheduling with the Scheduler UI:**
- When you schedule this notebook via the JupyterLab Scheduler UI, two parameters are relevant:
  - `_scheduler_job_id`: made available automatically while the job runs
  - `_scheduler_job_name`: the name you give the job

**Scheduling programmatically:**
```python
from jupyter_scheduler import SchedulerClient
client = SchedulerClient()
job = client.create_job(
    name="My Notification Test",
    path="workflows/POC/Scheduler_Notification_Demo.ipynb",
    parameters={
        "_scheduler_job_name": "'My Notification Test'"
    }
)
# The job_id will be in job['job_id']
```
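
The nested quotes in `"'My Notification Test'"` suggest that parameter values are written into the notebook as Python source text, so a string has to arrive already quoted. The sketch below works under that assumption, which is not verified here against the scheduler's documentation. `quote_parameters` and the `retries` parameter are hypothetical names used only for illustration.

```python
import ast


def quote_parameters(params: dict) -> dict:
    """Render every value as a Python literal (hypothetical helper)."""
    return {name: repr(value) for name, value in params.items()}


quoted = quote_parameters({
    "_scheduler_job_name": "My Notification Test",
    "retries": 3,  # made-up parameter, shown to cover a non-string value
})
print(quoted)

# Evaluating the literal gives back the original value
round_trip = ast.literal_eval(quoted["_scheduler_job_name"])
```

Using `repr()` rather than hand-written quotes also keeps names that contain apostrophes valid.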

## Parameters

These parameters can be overridden when scheduling the notebook.

In [1]:
# Parameters - these will be injected by the scheduler
# When running interactively, these defaults are used

_scheduler_job_id = None  # Will be set when scheduled
_scheduler_job_name = "Interactive Test"  # Override when scheduling

## Setup

In [2]:
import os
import sys
from datetime import datetime

# Import our helpers
from helpers.teams_notification import TeamsNotificationClient
from helpers.notebook_scheduler import (
    cleanup_old_jobs,
    get_cleanup_preview,
    list_jobs,
    get_job_info,
    build_job_notification_message,
    get_job_actions,
    get_current_job_context
)

print("Imports successful!")

Imports successful!


## Get Current Job Context

This shows how to detect if we're running as a scheduled job and get the job details.

In [3]:
# Get the current job context
# This checks for injected parameters from the scheduler

# Method 1: Use the helper function (recommended)
job_context = get_current_job_context()
print(f"Job Context: {job_context}")

# Method 2: Direct access to injected parameters
job_id = _scheduler_job_id
job_name = _scheduler_job_name

print(f"\nJob ID: {job_id}")
print(f"Job Name: {job_name}")
print(f"Is Scheduled Run: {job_id is not None}")

Job Context: {'job_id': 'INTERACTIVE', 'job_name': 'Interactive Test', 'is_scheduled': False}

Job ID: None
Job Name: Interactive Test
Is Scheduled Run: False
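
The implementation of `get_current_job_context()` is not shown in this notebook. The sketch below is one way such a helper could produce the dictionary printed above. `resolve_job_context` is a hypothetical name, and taking the two values as arguments instead of reading injected globals is an assumption made to keep the example self-contained.

```python
from typing import Optional


def resolve_job_context(job_id: Optional[str], job_name: Optional[str]) -> dict:
    """Build a job-context dict from the (possibly missing) injected values."""
    is_scheduled = job_id is not None
    return {
        "job_id": job_id if is_scheduled else "INTERACTIVE",
        "job_name": job_name or "Interactive Session",
        "is_scheduled": is_scheduled,
    }


context = resolve_job_context(None, "Interactive Test")
print(context)
```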


## Send Notification with Job Identification

This demonstrates how to send a Teams notification that includes the job ID so you can easily identify which scheduled job sent the notification.

In [4]:
# Create notification client
client = TeamsNotificationClient()

# Determine if we're running as a scheduled job
is_scheduled = _scheduler_job_id is not None

if is_scheduled:
    # Build message with job identification
    message = build_job_notification_message(
        job_id=_scheduler_job_id,
        job_name=_scheduler_job_name,
        additional_info="This notification includes job identification so you can trace it back to the specific scheduled job."
    )
    
    # Get action buttons that link to the job
    actions = get_job_actions(_scheduler_job_id)
    
    # Add custom actions
    actions.append({
        "title": "View Dashboard",
        "url": os.environ.get('TEAMS_DEFAULT_DASHBOARD_URL', 'http://localhost:8001')
    })
else:
    # Interactive run - simple message
    message = "**Running interactively** (not scheduled)\n\nTo see job identification in notifications, schedule this notebook using the JupyterLab Scheduler."
    actions = None

# Send the notification
print("Sending notification...")
print(f"Message:\n{message}\n")

result = client.send_success(
    title="Scheduler Notification Demo",
    message=message,
    actions=actions
)

print(f"Notification sent: {result}")

Sending notification...
Message:
**Running interactively** (not scheduled)

To see job identification in notifications, schedule this notebook using the JupyterLab Scheduler.

Notification sent: True
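
The exact text produced by `build_job_notification_message()` does not appear in this run, because the interactive branch was taken. As a rough illustration only, a message builder of this kind might look like the following. `format_job_message` and its layout are assumptions, not the helper's actual output.

```python
def format_job_message(job_id: str, job_name: str, additional_info: str = "") -> str:
    """Return a Markdown message that leads with the job name and ID."""
    lines = [f"**Job:** {job_name}", f"**Job ID:** `{job_id}`"]
    if additional_info:
        # Blank line separates the identification header from the details
        lines += ["", additional_info]
    return "\n".join(lines)


message_text = format_job_message(
    "be701053-fc04-43cd-a272-84b79302449f",
    "Display Demo",
    "Processed 100 records successfully",
)
print(message_text)
```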


## Error Notification Pattern

This shows the recommended pattern for sending error notifications with job identification.

In [5]:
def process_with_notification():
    """
    Example function that demonstrates the error notification pattern.
    Wrap your main logic in a try/except and send appropriate notifications.
    """
    client = TeamsNotificationClient()
    
    # Get job context for identification
    job_id = _scheduler_job_id or 'INTERACTIVE'
    job_name = _scheduler_job_name or 'Interactive Session'
    
    try:
        # ============================================
        # Your main processing logic goes here
        # ============================================
        
        # Simulate some work
        result = "Processed 100 records successfully"
        
        # Optionally: Simulate an error for testing
        # raise ValueError("Simulated error for testing")
        
        # ============================================
        # Send success notification
        # ============================================
        message = build_job_notification_message(
            job_id=job_id,
            job_name=job_name,
            additional_info=result
        )
        
        actions = get_job_actions(job_id) if job_id != 'INTERACTIVE' else None
        
        client.send_success(
            title="Processing Complete",
            message=message,
            actions=actions
        )
        
        return result
        
    except Exception as e:
        # ============================================
        # Send error notification with job identification
        # ============================================
        error_message = build_job_notification_message(
            job_id=job_id,
            job_name=job_name,
            additional_info=f"**Error:** {str(e)}"
        )
        
        actions = get_job_actions(job_id) if job_id != 'INTERACTIVE' else None
        
        client.send_error(
            title="Processing Failed",
            message=error_message,
            actions=actions
        )
        
        # Re-raise the exception so the job is marked as failed
        raise

# Run the example
result = process_with_notification()
print(f"Result: {result}")

Result: Processed 100 records successfully
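
The same try/except shape can be factored into a reusable context manager. The sketch below is one possible arrangement, not part of the helpers used in this notebook. `notify_outcome` is a hypothetical name, and the two callbacks stand in for calls such as `client.send_success(...)` and `client.send_error(...)`.

```python
from contextlib import contextmanager


@contextmanager
def notify_outcome(on_success, on_error):
    """Call on_success after a clean exit, or on_error and then re-raise."""
    try:
        yield
    except Exception as exc:
        on_error(exc)
        raise  # keep the failure visible so the job is marked as failed
    else:
        on_success()


events = []

with notify_outcome(lambda: events.append("success"),
                    lambda exc: events.append(f"error: {exc}")):
    records = 100  # main processing logic would go here

try:
    with notify_outcome(lambda: events.append("success"),
                        lambda exc: events.append(f"error: {exc}")):
        raise ValueError("Simulated error for testing")
except ValueError:
    pass  # swallowed here only so the demonstration can continue

print(events)
```

Because the success callback runs in the `else` branch, an exception raised while sending the success notification is not reported a second time as a processing failure.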


---

## Job Cleanup Operations

These cells demonstrate how to clean up old scheduler jobs.

### List Current Jobs

In [8]:
# List all jobs (most recent first)
jobs = list_jobs(limit=2000)

print(f"Found {len(jobs)} jobs:\n")
for job in jobs:
    print(f"  {job['job_id'][:8]}... | {job['status']:12} | {job['name'] or 'Unnamed'} | {job['create_time']}")

Found 275 jobs:

  b93f3603... | COMPLETED    | Display Demo | 2025-11-22T04:04:01.527000
  8ebe294d... | COMPLETED    | Display Demo | 2025-11-22T04:03:09.836000
  1c031ac6... | COMPLETED    | Display Demo | 2025-11-22T04:02:06.530000
  d0d780fc... | COMPLETED    | Display Demo | 2025-11-22T04:01:02.991000
  f1b8c9d7... | COMPLETED    | Display Demo | 2025-11-22T04:00:09.739000
  15156b94... | COMPLETED    | Display Demo | 2025-11-22T03:59:06.263000
  613367c1... | COMPLETED    | Display Demo | 2025-11-22T03:58:02.882000
  ca07c140... | COMPLETED    | Display Demo | 2025-11-22T03:57:09.485000
  df5b68a1... | COMPLETED    | Display Demo | 2025-11-22T03:56:06.208000
  0f642898... | COMPLETED    | Display Demo | 2025-11-22T03:55:13.537000
  46eb8a6e... | COMPLETED    | Display Demo | 2025-11-22T03:23:15.182000
  53838cd3... | COMPLETED    | Display Demo | 2025-11-22T03:03:00.867000
  b0684dc0... | COMPLETED    | Display Demo | 2025-11-22T03:02:07.087000
  042e7538... | COMPLETED    | Dis

In [9]:
# List only failed jobs
failed_jobs = list_jobs(status='FAILED', limit=10)

print(f"Found {len(failed_jobs)} failed jobs:\n")
for job in failed_jobs:
    print(f"  {job['job_id'][:8]}... | {job['name'] or 'Unnamed'} | {job['create_time']}")

Found 0 failed jobs:



### List Jobs by Status

View jobs filtered by different statuses.

In [10]:
# List jobs by different statuses
statuses = ['COMPLETED', 'FAILED', 'IN_PROGRESS', 'STOPPED']

print("Jobs by Status")
print("=" * 60)

for status in statuses:
    # Note: limit=100 caps each count, so "100 jobs" can mean "100 or more"
    status_jobs = list_jobs(status=status, limit=100)
    print(f"\n{status}: {len(status_jobs)} jobs")
    
    # Show first 3 of each status
    for job in status_jobs[:3]:
        name = job['name'] or 'Unnamed'
        if len(name) > 30:
            name = name[:27] + '...'
        print(f"  {job['job_id'][:8]}... | {name:30} | {job['create_time']}")
    
    if len(status_jobs) > 3:
        print(f"  ... and {len(status_jobs) - 3} more")

Jobs by Status

COMPLETED: 100 jobs
  b93f3603... | Display Demo                   | 2025-11-22T04:04:01.527000
  8ebe294d... | Display Demo                   | 2025-11-22T04:03:09.836000
  1c031ac6... | Display Demo                   | 2025-11-22T04:02:06.530000
  ... and 97 more

FAILED: 0 jobs

IN_PROGRESS: 0 jobs

STOPPED: 0 jobs
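
The loop above issues one `list_jobs` call per status, and each count is capped by `limit`. An alternative is to fetch the jobs once and tally them locally. The sketch below runs on made-up sample dictionaries that mimic the `status` key seen in the output. The `FAILED` entry is invented for illustration.

```python
from collections import Counter


def count_by_status(jobs: list) -> Counter:
    """Tally jobs by their 'status' field."""
    return Counter(job["status"] for job in jobs)


# Sample data; in the notebook this would be the list returned by list_jobs(...)
sample_jobs = [
    {"job_id": "b93f3603", "status": "COMPLETED"},
    {"job_id": "8ebe294d", "status": "COMPLETED"},
    {"job_id": "hypothetical-1", "status": "FAILED"},
]

counts = count_by_status(sample_jobs)
for status in ["COMPLETED", "FAILED", "IN_PROGRESS", "STOPPED"]:
    # A Counter returns 0 for statuses that never occurred
    print(f"{status}: {counts[status]} jobs")
```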


In [11]:
# Display all jobs as a formatted table
import pandas as pd

all_jobs = list_jobs(limit=50)

if all_jobs:
    # Convert to DataFrame for better display
    df = pd.DataFrame(all_jobs)
    
    # Reorder and select columns
    columns = ['job_id', 'name', 'status', 'create_time', 'start_time', 'end_time', 'input_filename']
    df = df[columns].copy()  # copy so the assignment below does not modify a view
    
    # Truncate job_id for display
    df['job_id'] = df['job_id'].str[:12] + '...'
    
    # Display
    print(f"All Jobs (showing {len(df)} most recent)")
    print("=" * 100)
    display(df)
else:
    print("No jobs found in the scheduler database.")

All Jobs (showing 50 most recent)


|   | job_id | name | status | create_time | start_time | end_time | input_filename |
|---|--------|------|--------|-------------|------------|----------|----------------|
| 0 | b93f3603-187... | Display Demo | COMPLETED | 2025-11-22T04:04:01.527000 | 2025-11-22T04:04:03.492000 | 2025-11-22T04:04:06.394000 | Display Demo.ipynb |
| 1 | 8ebe294d-ef5... | Display Demo | COMPLETED | 2025-11-22T04:03:09.836000 | 2025-11-22T04:03:11.126000 | 2025-11-22T04:03:13.913000 | Display Demo.ipynb |
| 2 | 1c031ac6-74c... | Display Demo | COMPLETED | 2025-11-22T04:02:06.530000 | 2025-11-22T04:02:08.406000 | 2025-11-22T04:02:11.538000 | Display Demo.ipynb |
| 3 | d0d780fc-b8f... | Display Demo | COMPLETED | 2025-11-22T04:01:02.991000 | 2025-11-22T04:01:03.934000 | 2025-11-22T04:01:06.041000 | Display Demo.ipynb |
| 4 | f1b8c9d7-d62... | Display Demo | COMPLETED | 2025-11-22T04:00:09.739000 | 2025-11-22T04:00:11.662000 | 2025-11-22T04:00:14.833000 | Display Demo.ipynb |
| 5 | 15156b94-379... | Display Demo | COMPLETED | 2025-11-22T03:59:06.263000 | 2025-11-22T03:59:08.182000 | 2025-11-22T03:59:11.538000 | Display Demo.ipynb |
| 6 | 613367c1-97e... | Display Demo | COMPLETED | 2025-11-22T03:58:02.882000 | 2025-11-22T03:58:05.376000 | 2025-11-22T03:58:08.030000 | Display Demo.ipynb |
| 7 | ca07c140-30f... | Display Demo | COMPLETED | 2025-11-22T03:57:09.485000 | 2025-11-22T03:57:10.377000 | 2025-11-22T03:57:12.674000 | Display Demo.ipynb |
| 8 | df5b68a1-3a5... | Display Demo | COMPLETED | 2025-11-22T03:56:06.208000 | 2025-11-22T03:56:06.905000 | 2025-11-22T03:56:08.602000 | Display Demo.ipynb |
| 9 | 0f642898-dbc... | Display Demo | COMPLETED | 2025-11-22T03:55:13.537000 | 2025-11-22T03:55:14.442000 | 2025-11-22T03:55:16.720000 | Display Demo.ipynb |
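
The `start_time` and `end_time` columns make it easy to derive how long each job ran. The sketch below assumes the timestamps are ISO 8601 strings, as they appear in the output above. `job_duration_seconds` is a hypothetical helper, and the sample values are copied from the first row.

```python
from datetime import datetime


def job_duration_seconds(job: dict) -> float:
    """Return the run time of a job in seconds from its ISO timestamps."""
    start = datetime.fromisoformat(job["start_time"])
    end = datetime.fromisoformat(job["end_time"])
    return (end - start).total_seconds()


first_job = {
    "start_time": "2025-11-22T04:04:03.492000",
    "end_time": "2025-11-22T04:04:06.394000",
}
duration = job_duration_seconds(first_job)
print(f"{duration:.3f} s")  # prints "2.902 s"
```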


### Preview Cleanup (Dry Run)

Always preview what will be deleted before actually deleting.

In [17]:
# Preview what would be cleaned up (30 days threshold)
preview = get_cleanup_preview(days_threshold=30)

print("Cleanup Preview (30 days)")
print("=" * 40)
print(f"Jobs to delete: {preview['jobs_count']}")
print(f"Oldest job: {preview['oldest_job_date']}")
print(f"Newest job: {preview['newest_job_date']}")
print(f"\nMessage: {preview['message']}")

if preview['jobs_count'] > 0:
    print(f"\nJob IDs to be deleted:")
    for job_id in preview['job_ids'][:10]:  # Show first 10
        print(f"  - {job_id}")
    if len(preview['job_ids']) > 10:
        print(f"  ... and {len(preview['job_ids']) - 10} more")

Cleanup Preview (30 days)
Jobs to delete: 0
Oldest job: None
Newest job: None

Message: No jobs older than 30 days found
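
How `get_cleanup_preview()` decides which jobs are "old" is not shown here. A plausible reading is a cutoff applied to each job's creation time. The sketch below illustrates that idea under the assumption that `create_time` is the field compared. `select_old_jobs` and the second sample job are hypothetical.

```python
from datetime import datetime, timedelta


def select_old_jobs(jobs: list, days_threshold: int, now: datetime) -> list:
    """Return the IDs of jobs created before now minus days_threshold."""
    cutoff = now - timedelta(days=days_threshold)
    return [
        job["job_id"]
        for job in jobs
        if datetime.fromisoformat(job["create_time"]) < cutoff
    ]


sample_jobs = [
    {"job_id": "be701053-fc04-43cd-a272-84b79302449f",
     "create_time": "2025-11-22T04:07:01.320000"},
    {"job_id": "hypothetical-old-job",  # invented entry, older than 30 days
     "create_time": "2025-10-01T00:00:00"},
]
now = datetime(2025, 11, 22, 4, 8, 34)

old_ids = select_old_jobs(sample_jobs, days_threshold=30, now=now)
print(old_ids)
```

Passing `now` explicitly keeps the selection reproducible, which is convenient when testing a cleanup rule.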


### Actually Delete Old Jobs

**WARNING:** This will permanently delete jobs and their staging files!

In [19]:
# UNCOMMENT THE LINES BELOW TO ACTUALLY DELETE OLD JOBS
# Make sure you've reviewed the preview first!

# result = cleanup_old_jobs(days_threshold=30, dry_run=False)
# print(f"Cleanup Result: {result['message']}")
# print(f"Jobs deleted: {result['jobs_count']}")
# print(f"Staging directories deleted: {result['staging_files_deleted']}")

print("Cleanup is commented out for safety. Uncomment to run.")

Cleanup is commented out for safety. Uncomment to run.
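
An alternative to relying on commented-out code is an explicit confirmation flag, so that a scheduled or "Run All" execution cannot delete anything by accident. The sketch below shows the idea with stub functions standing in for `get_cleanup_preview` and `cleanup_old_jobs`. `guarded_cleanup` and the stubs are hypothetical.

```python
def guarded_cleanup(preview_fn, delete_fn, confirm: bool = False) -> dict:
    """Run delete_fn only if the preview is non-empty and confirm is True."""
    preview = preview_fn()
    if preview["jobs_count"] == 0:
        return {"ran": False, "reason": "nothing to delete", "preview": preview}
    if not confirm:
        return {"ran": False, "reason": "confirmation flag not set", "preview": preview}
    return {"ran": True, "result": delete_fn()}


# Stubs used only for this demonstration
delete_calls = []


def fake_preview() -> dict:
    return {"jobs_count": 2, "job_ids": ["hypothetical-1", "hypothetical-2"]}


def fake_delete() -> dict:
    delete_calls.append("called")
    return {"jobs_count": 2}


dry = guarded_cleanup(fake_preview, fake_delete)                 # flag left off
real = guarded_cleanup(fake_preview, fake_delete, confirm=True)  # flag set
print(dry["reason"], "|", real["ran"])
```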


---

## Downloaded Job Outputs

When you click "Download" on a job's output files in the JupyterLab Scheduler UI, the files are copied to:

```
{workspace}/jobs/{notebook_name}-{job_id}/
```

For example:
```
workspace/jobs/Scheduler_Notification_Demo-abc123def/
├── Scheduler_Notification_Demo-2024-01-15.ipynb
├── Scheduler_Notification_Demo-2024-01-15.html
└── Scheduler_Notification_Demo.ipynb (input copy)
```

**Important:** The `cleanup_old_jobs()` function deletes jobs from the **scheduler database** and **staging area**, but it does NOT delete these downloaded output directories. You need to clean those up separately.

### List Downloaded Output Directories

In [20]:
import os
import shutil
from pathlib import Path

# Get workspace root (parent of 'workflows' directory)
workspace_root = Path(os.getcwd()).parent.parent
jobs_output_dir = workspace_root / "jobs"

print(f"Downloaded outputs directory: {jobs_output_dir}")
print("=" * 60)

if jobs_output_dir.exists():
    # List downloaded job output directories only (skip stray files), newest first
    output_dirs = sorted(
        (d for d in jobs_output_dir.iterdir() if d.is_dir()),
        key=lambda x: x.stat().st_mtime,
        reverse=True,
    )
    
    print(f"\nFound {len(output_dirs)} downloaded job output directories:\n")
    
    total_size = 0
    for d in output_dirs:
        # Collect regular files once; reuse the list for both size and count
        files = [f for f in d.rglob('*') if f.is_file()]
        dir_size = sum(f.stat().st_size for f in files)
        total_size += dir_size
        
        # Get modification time
        mtime = datetime.fromtimestamp(d.stat().st_mtime).strftime('%Y-%m-%d %H:%M')
        
        print(f"  {d.name}")
        print(f"    Size: {dir_size / 1024:.1f} KB | Files: {len(files)} | Modified: {mtime}")
    
    print(f"\nTotal size: {total_size / 1024 / 1024:.2f} MB")
else:
    print("\nNo downloaded outputs directory found.")
    print("This directory is created when you download job outputs from the Scheduler UI.")

Downloaded outputs directory: /home/jovyan/workspace/jobs

No downloaded outputs directory found.
This directory is created when you download job outputs from the Scheduler UI.
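
Since no downloaded outputs existed in this run, the size and file-count logic was never exercised. The sketch below tries the same idea on a temporary directory. `summarize_dir` is a hypothetical helper that walks the tree once and counts regular files only.

```python
import tempfile
from pathlib import Path


def summarize_dir(path: Path) -> tuple:
    """Return (file_count, total_bytes) for all regular files under path."""
    files = [f for f in path.rglob("*") if f.is_file()]
    return len(files), sum(f.stat().st_size for f in files)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_bytes(b"x" * 10)
    (root / "sub" / "b.txt").write_bytes(b"y" * 5)
    file_count, total_bytes = summarize_dir(root)

print(file_count, total_bytes)  # prints "2 15"
```

The subdirectory `sub` is traversed but not counted as a file.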


### Clean Up Orphaned Downloaded Outputs

Find and delete downloaded output directories for jobs that no longer exist in the scheduler database.

In [21]:
def find_orphaned_output_dirs(jobs_output_dir: Path) -> list:
    """
    Find downloaded output directories for jobs that no longer exist in the database.
    
    Returns:
        List of (directory_path, job_id) tuples for orphaned directories
    """
    if not jobs_output_dir.exists():
        return []
    
    # Get all job IDs from database
    # Assumes fewer than 10,000 jobs; any job beyond the limit would be
    # misreported as orphaned
    all_jobs = list_jobs(limit=10000)
    existing_job_ids = {job['job_id'] for job in all_jobs}
    
    orphaned = []
    for d in jobs_output_dir.iterdir():
        if d.is_dir():
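            # Extract job_id from directory name (format: notebook_name-job_id)
            # The job_id is a UUID such as "be701053-fc04-43cd-a272-84b79302449f"
            # rsplit('-', 5) yields at most 6 parts: the notebook name plus the 5 UUID groups
            # Note: the trailing parts are not validated as a UUID here
            name_parts = d.name.rsplit('-', 5)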
            if len(name_parts) >= 5:
                # Reconstruct the job_id (last 5 parts)
                job_id = '-'.join(name_parts[-5:])
                if job_id not in existing_job_ids:
                    orphaned.append((d, job_id))
    
    return orphaned

# Find orphaned directories
orphaned_dirs = find_orphaned_output_dirs(jobs_output_dir)

print("Orphaned Downloaded Output Directories")
print("=" * 60)
print("(These are outputs for jobs that no longer exist in the scheduler database)")
print()

if orphaned_dirs:
    total_size = 0
    for d, job_id in orphaned_dirs:
        dir_size = sum(f.stat().st_size for f in d.rglob('*') if f.is_file())
        total_size += dir_size
        print(f"  {d.name}")
        print(f"    Job ID: {job_id}")
        print(f"    Size: {dir_size / 1024:.1f} KB")
    
    print(f"\nTotal orphaned: {len(orphaned_dirs)} directories, {total_size / 1024 / 1024:.2f} MB")
else:
    print("No orphaned directories found.")

Orphaned Downloaded Output Directories
(These are outputs for jobs that no longer exist in the scheduler database)

No orphaned directories found.
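
Splitting on hyphens treats any directory whose name contains at least four hyphens as if it ended in a job ID, so an unrelated folder such as `my-notebook-with-many-dashes` could be flagged as orphaned. A stricter variant checks that the suffix really has the shape of a UUID. The sketch below assumes job IDs are UUIDs, as in the job listings in this notebook. `extract_job_id` is a hypothetical helper.

```python
import re
from typing import Optional

# A UUID at the end of the name, either alone or preceded by a hyphen
_UUID_SUFFIX = re.compile(
    r"(?:^|-)([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def extract_job_id(dir_name: str) -> Optional[str]:
    """Return the trailing UUID of dir_name in lowercase, or None if absent."""
    match = _UUID_SUFFIX.search(dir_name)
    return match.group(1).lower() if match else None


found = extract_job_id("Display Demo-be701053-fc04-43cd-a272-84b79302449f")
not_a_job = extract_job_id("my-notebook-with-many-dashes")
print(found, not_a_job)
```

Directories for which the function returns `None` can then be skipped rather than deleted.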


In [22]:
# DELETE ORPHANED DOWNLOADED OUTPUT DIRECTORIES
# WARNING: This permanently deletes the directories!

# UNCOMMENT TO DELETE:
# if orphaned_dirs:
#     print("Deleting orphaned directories...")
#     for d, job_id in orphaned_dirs:
#         try:
#             shutil.rmtree(d)
#             print(f"  Deleted: {d.name}")
#         except Exception as e:
#             print(f"  Error deleting {d.name}: {e}")
#     print(f"\nDeleted {len(orphaned_dirs)} orphaned directories")
# else:
#     print("No orphaned directories to delete")

print("Deletion is commented out for safety. Uncomment to delete orphaned directories.")

Deletion is commented out for safety. Uncomment to delete orphaned directories.


### Clean Up Old Downloaded Outputs by Age

Delete downloaded output directories older than a specified number of days.

In [23]:
def cleanup_old_downloaded_outputs(
    jobs_output_dir: Path,
    days_threshold: int = 30,
    dry_run: bool = True
) -> dict:
    """
    Clean up downloaded output directories older than specified days.
    
    Args:
        jobs_output_dir: Path to the jobs output directory
        days_threshold: Delete directories older than this many days
        dry_run: If True, only report what would be deleted
        
    Returns:
        Dictionary with results
    """
    if not jobs_output_dir.exists():
        return {'deleted': 0, 'size_freed': 0, 'dry_run': dry_run, 'message': 'Jobs output directory not found'}
    
    threshold_time = datetime.now().timestamp() - (days_threshold * 24 * 60 * 60)
    
    old_dirs = []
    total_size = 0
    
    for d in jobs_output_dir.iterdir():
        if d.is_dir() and d.stat().st_mtime < threshold_time:
            dir_size = sum(f.stat().st_size for f in d.rglob('*') if f.is_file())
            old_dirs.append((d, dir_size))
            total_size += dir_size
    
    if not old_dirs:
        return {
            'deleted': 0,
            'size_freed': 0,
            'dry_run': dry_run,
            'message': f'No downloaded outputs older than {days_threshold} days'
        }
    
    if dry_run:
        return {
            'deleted': len(old_dirs),
            'size_freed': total_size,
            'directories': [str(d[0].name) for d in old_dirs],
            'dry_run': True,
            'message': f'Would delete {len(old_dirs)} directories ({total_size / 1024 / 1024:.2f} MB)'
        }
    
    # Actually delete, counting only the space that was really freed
    deleted = 0
    size_freed = 0
    for d, dir_size in old_dirs:
        try:
            shutil.rmtree(d)
            deleted += 1
            size_freed += dir_size
        except Exception as e:
            print(f"Error deleting {d.name}: {e}")
    
    return {
        'deleted': deleted,
        'size_freed': size_freed,
        'dry_run': False,
        'message': f'Deleted {deleted} directories ({size_freed / 1024 / 1024:.2f} MB freed)'
    }

# Preview cleanup of downloaded outputs older than 30 days
result = cleanup_old_downloaded_outputs(jobs_output_dir, days_threshold=30, dry_run=True)

print("Downloaded Outputs Cleanup Preview (30 days)")
print("=" * 60)
print(f"Message: {result['message']}")

if result.get('directories'):
    print(f"\nDirectories to delete:")
    for name in result['directories'][:10]:
        print(f"  - {name}")
    if len(result['directories']) > 10:
        print(f"  ... and {len(result['directories']) - 10} more")

Downloaded Outputs Cleanup Preview (30 days)
Message: Jobs output directory not found
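
Because the jobs output directory did not exist in this run, the age filter was never exercised. The sketch below reproduces the selection rule on a temporary directory by backdating one folder's modification time. `find_old_dirs` is a hypothetical helper that mirrors the `st_mtime` comparison used above.

```python
import os
import tempfile
import time
from pathlib import Path


def find_old_dirs(root: Path, days_threshold: int, now: float) -> list:
    """Return names of directories in root last modified before the cutoff."""
    cutoff = now - days_threshold * 24 * 60 * 60
    return sorted(
        d.name for d in root.iterdir()
        if d.is_dir() and d.stat().st_mtime < cutoff
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    old_dir = root / "old-output"
    new_dir = root / "new-output"
    old_dir.mkdir()
    new_dir.mkdir()

    now = time.time()
    forty_days_ago = now - 40 * 24 * 60 * 60
    os.utime(old_dir, (forty_days_ago, forty_days_ago))  # backdate atime and mtime

    old_names = find_old_dirs(root, days_threshold=30, now=now)

print(old_names)  # prints "['old-output']"
```

On most file systems a directory's modification time changes only when entries are added to or removed from it directly, so it approximates the download date rather than the last edit of any file inside.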


In [24]:
# ACTUALLY DELETE OLD DOWNLOADED OUTPUTS
# WARNING: This permanently deletes the directories!

# UNCOMMENT TO DELETE:
# result = cleanup_old_downloaded_outputs(jobs_output_dir, days_threshold=30, dry_run=False)
# print(f"Result: {result['message']}")

print("Deletion is commented out for safety. Uncomment to delete old downloaded outputs.")

Deletion is commented out for safety. Uncomment to delete old downloaded outputs.


### Get Job Details

Look up detailed information about a specific job.

In [25]:
# Get the most recent job and show its details
jobs = list_jobs(limit=1)

if jobs:
    job_id = jobs[0]['job_id']
    job_info = get_job_info(job_id)
    
    if job_info:
        print(f"Job Details for: {job_id}")
        print("=" * 50)
        for key, value in job_info.items():
            print(f"{key}: {value}")
    else:
        print(f"No details found for job: {job_id}")
else:
    print("No jobs found in the scheduler database.")

Job Details for: be701053-fc04-43cd-a272-84b79302449f
job_id: be701053-fc04-43cd-a272-84b79302449f
name: Display Demo
status: COMPLETED
create_time: 2025-11-22T04:07:01.320000
start_time: 2025-11-22T04:07:02.417000
end_time: 2025-11-22T04:07:04.352000
input_filename: Display Demo.ipynb
parameters: None
status_message: None


---

## Scheduling This Notebook Programmatically

This example shows how to schedule this notebook with job identification parameters.

In [26]:
# Example: How to schedule this notebook with parameters
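# The example is wrapped in a triple-quoted string so that it does not run.
# Remove the surrounding triple quotes and modify to actually schedule.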

'''
from jupyter_scheduler import SchedulerClient

client = SchedulerClient()

# Create the job with a name parameter
job_name = "Scheduled Notification Test"

job = client.create_job(
    name=job_name,
    path="workflows/POC/Scheduler_Notification_Demo.ipynb",
    parameters={
        "_scheduler_job_name": repr(job_name)  # Use repr() for string values
    },
    output_formats=["notebook", "html"]
)

print(f"Created job: {job['job_id']}")
print(f"Job name: {job_name}")

# Note: The job_id will be available in the notification
# because we look it up during execution
'''

print("Scheduling example is commented out. Uncomment to create a scheduled job.")

Scheduling example is commented out. Uncomment to create a scheduled job.


---

## Summary

### Key Takeaways

1. **Job Identification in Notifications**
   - Pass `_scheduler_job_name` as a parameter when scheduling
   - Use `build_job_notification_message()` to format messages with job ID
   - Use `get_job_actions()` to create action buttons that link to the job

2. **Cleanup Old Jobs**
   - Use `get_cleanup_preview()` to see what will be deleted
   - Use `cleanup_old_jobs(dry_run=False)` to actually delete
   - Set appropriate `days_threshold` (default: 30 days)

3. **Error Handling Pattern**
   - Wrap main logic in try/except
   - Send success notification on completion
   - Send error notification on failure with job identification
   - Re-raise exception so scheduler marks job as failed

In [27]:
print("\nNotebook execution complete!")
print(f"Timestamp: {datetime.now().isoformat()}")


Notebook execution complete!
Timestamp: 2025-11-22T04:08:34.183545
