# Jupyter Scheduler Job Operations

This notebook helps you do the following:
1. List Jupyter scheduler jobs.
2. Clean up old scheduler jobs and downloaded output files.

## Setup

In [1]:
import pandas as pd
from datetime import datetime
import os
import shutil
from pathlib import Path

from helpers.notebook_scheduler import (
    cleanup_old_jobs,
    list_jobs,
    get_job_info,
    get_downloaded_output_directory,
    find_orphaned_outputs
)

## Input

***How old is "old"? Set the age threshold in days; jobs older than this are treated as old.***

In [2]:
days_threshold = 30
dry_run = False  # Set dry_run=False to DELETE, True to preview
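
To make the threshold concrete, the sketch below shows one way a day count can be turned into a cutoff check. It assumes the age is measured from a job's `create_time` in the ISO format this notebook prints; how `cleanup_old_jobs` actually compares dates is not shown here, and `is_older_than` is a hypothetical name used only for illustration.

```python
from datetime import datetime, timedelta


def is_older_than(create_time: str, days: int, now: datetime) -> bool:
    """Return True if an ISO-8601 timestamp is more than `days` days before `now`."""
    created = datetime.fromisoformat(create_time)
    return created < now - timedelta(days=days)


# Assumption: age is measured from create_time; the reference date is fixed
# here only so the example is reproducible.
reference = datetime(2025, 12, 31)
print(is_older_than("2025-11-24T04:27:00.780000", 30, reference))
```

Passing `now` explicitly keeps the check deterministic; in a live notebook you would likely pass `datetime.now()` instead.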

## Count and List Jobs

### Total Number of Jobs

In [3]:
all_jobs = list_jobs(limit=100000)
print(f"\nTotal: {len(all_jobs)} jobs")


Total: 64 jobs


### List Jobs by Status

View jobs filtered by different statuses.

In [4]:
# List jobs by different statuses
statuses = ['FAILED'] # 'COMPLETED', 'FAILED', 'IN_PROGRESS', 'STOPPED'

print("Jobs by Status")
print("=" * 60)

for status in statuses:
    status_jobs = list_jobs(status=status, limit=1000)
    print(f"\n{status}: {len(status_jobs)} jobs")
    
    # Show first 3 of each status
    for job in status_jobs[:3]:
        job_id = job['job_id']
        job_info = get_job_info(job_id)
        
        if job_info:
            print(f"Job Details for: {job_id}")
            print("=" * 50)
            for key, value in job_info.items():
                print(f"{key}: {value}")    
    if len(status_jobs) > 3:
        print(f"  ... and {len(status_jobs) - 3} more")

Jobs by Status

FAILED: 1 jobs
Job Details for: f90d7a0d-16df-4a3b-a183-fade3856e326
job_id: f90d7a0d-16df-4a3b-a183-fade3856e326
name: Display Demo
status: FAILED
create_time: 2025-11-24T04:27:00.780000
start_time: 2025-11-24T04:27:01.700000
end_time: None
input_filename: Display Demo.ipynb
parameters: None
status_message: Kernel didn't respond in 60 seconds
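
If you only need a per-status overview rather than full details, a small tally can help. The sketch below assumes each record returned by `list_jobs()` carries a `status` key, which this notebook confirms only for `get_job_info()`; `count_by_status` and the sample records are hypothetical.

```python
from collections import Counter


def count_by_status(jobs):
    """Count job records per status; records without a 'status' key count as 'UNKNOWN'."""
    return Counter(job.get("status", "UNKNOWN") for job in jobs)


# Hypothetical records standing in for the return value of list_jobs().
sample_jobs = [
    {"job_id": "a", "status": "COMPLETED"},
    {"job_id": "b", "status": "FAILED"},
    {"job_id": "c", "status": "COMPLETED"},
]

for status, count in count_by_status(sample_jobs).most_common():
    print(f"{status}: {count}")
```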


---



## Clean-up

### Delete Old Jobs

**WARNING:** This will permanently delete jobs and their staging files!

In [5]:
# Run once with dry_run=True and review the preview before deleting!
result = cleanup_old_jobs(days_threshold=days_threshold, dry_run=dry_run)

print(result['message'])
if not dry_run:
    print(f"Jobs deleted: {result['jobs_count']}")
    print(f"Staging directories deleted: {result['staging_files_deleted']}")

No jobs older than 30 days found
Jobs deleted: 0
Staging directories deleted: 0


---

## Downloaded Job Outputs

When you click "Download" on a job's output files in the JupyterLab Scheduler UI, the files are copied to:

```
{jupyter_home}/jobs/{notebook_name}-{job_id}/
```

For example:
```
/home/jovyan/jobs/Scheduler_Notification_Demo-abc123def/
├── Scheduler_Notification_Demo-2024-01-15.ipynb
├── Scheduler_Notification_Demo-2024-01-15.html
└── Scheduler_Notification_Demo.ipynb (input copy)
```

**Important:** The `cleanup_old_jobs()` function deletes jobs from the **scheduler database** and **staging area**, but it does NOT delete these downloaded output directories. You need to clean those up separately.
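
Because the download directories follow the `{notebook_name}-{job_id}` pattern, you can check which ones no longer correspond to a job in the scheduler. The sketch below is one possible approach under that naming assumption; `unmatched_output_dirs` is a hypothetical function and is not the `find_orphaned_outputs` helper imported in Setup, whose behavior is not shown in this notebook.

```python
from pathlib import Path


def unmatched_output_dirs(jobs_dir, known_job_ids):
    """Return subdirectories of jobs_dir whose name does not end in '-<known job id>'."""
    root = Path(jobs_dir)
    if not root.is_dir():
        return []
    # str.endswith accepts a tuple of suffixes; an empty tuple matches nothing,
    # so every directory is reported when no job IDs are known.
    suffixes = tuple(f"-{job_id}" for job_id in known_job_ids)
    return sorted(
        entry
        for entry in root.iterdir()
        if entry.is_dir() and not entry.name.endswith(suffixes)
    )
```

In this notebook the known IDs could be collected with `{job['job_id'] for job in all_jobs}`, and the directory taken from the `jobs_output_directory` value reported by `get_downloaded_output_directory()`.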

### List Downloaded Output Directories

In [6]:
downloaded_output_directory = get_downloaded_output_directory()
downloaded_output_directory


Total size: 0.00 MB


{'jobs_output_directory': '/home/jovyan/jobs',
 'directories': [],
 'total_size': '0.00 MB'}
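
The `total_size` value is reported as a formatted string. As a rough illustration of how such a figure could be derived, the sketch below sums file sizes under a directory. It assumes 1 MB = 1024 × 1024 bytes; the helper may use a different convention, and `directory_size_mb` is a hypothetical name.

```python
from pathlib import Path


def directory_size_mb(path):
    """Sum the sizes of all regular files under path and format the total as megabytes."""
    total_bytes = sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())
    # Assumption: binary megabytes (1024 * 1024 bytes).
    return f"{total_bytes / (1024 * 1024):.2f} MB"
```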

### Clean Up Downloaded Outputs

Delete every downloaded output directory found by the listing above.

**WARNING:** This permanently deletes the directories!

In [7]:
directories = downloaded_output_directory.get('directories', [])

if directories:
    print("Deleting directories...")
    for directory in directories:
        path = directory.get('directory')
        try:
            shutil.rmtree(path)
            print(f"  Deleted: {path}")
        except Exception as e:
            # Report the failure and continue with the remaining directories
            print(f"  Error deleting {path}: {e}")
else:
    print("No output directories to delete")

No output directories to delete
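
Unlike `cleanup_old_jobs`, the cell above has no preview mode. A minimal sketch of a dry-run-aware variant follows, mirroring the `dry_run` flag from the Input section; `remove_directories` is a hypothetical function, not part of the `helpers.notebook_scheduler` module.

```python
import shutil
from pathlib import Path


def remove_directories(paths, dry_run=True):
    """Delete each directory in paths, or only report it when dry_run is True.

    Returns a (removed, failed) pair of lists of Path objects.
    """
    removed, failed = [], []
    for path in map(Path, paths):
        if dry_run:
            print(f"  Would delete: {path}")
            continue
        try:
            shutil.rmtree(path)
            removed.append(path)
            print(f"  Deleted: {path}")
        except OSError as e:
            # Keep going so one failure does not block the remaining deletions.
            failed.append(path)
            print(f"  Error deleting {path}: {e}")
    return removed, failed
```

Defaulting `dry_run` to `True` means an accidental call only prints what would be removed; deletion has to be requested explicitly.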


---