# IRP Notebook Framework - User Guide

## Getting Started

### First Time Setup

The workflow uses a PostgreSQL database to track cycles, stages, steps, and execution history. Before running your first workflow, you need to initialize the database.

**[→ Run Database Administration Tool](./_Tools/Database%20Management/Database%20Administration.ipynb)**

This will:
- Test database connectivity
- Create required tables (`irp_cycle`, `irp_stage`, `irp_step`, `irp_step_run`, `irp_batch`, `irp_job`)
- Set up schema and indexes
- Verify initialization
 
⚠ **Note:** Initializing the database will clear all existing metadata. Only use this for first-time setup or when you want to reset everything.

## Cycle Management

### Creating a New Cycle

Use the **[→ Create a New Cycle](./_Tools/Cycle%20Management/New%20Cycle.ipynb)** notebook to create a new analysis cycle.

A **cycle** represents one complete execution of your workflow (e.g., a quarterly analysis). Only one cycle can be active at a time.

#### Cycle Naming Convention
Follow this pattern: `Analysis-YYYY-QN[-optional-suffix]`

**Valid Examples:**
- `Analysis-2025-Q4`
- `Analysis-2025-Q4-v1`
- `Analysis-2025-Q4-November`
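
A name can be checked against this pattern before creating a cycle. The sketch below is one possible reading of the convention, not the framework's own validator: the function name `is_valid_cycle_name` is hypothetical, and the set of characters allowed in the suffix (letters, digits, underscores, hyphens) is an assumption.

```python
import re

# Pattern for Analysis-YYYY-QN[-optional-suffix].
# Assumption: the quarter is 1-4 and the suffix may contain letters,
# digits, underscores, and hyphens.
CYCLE_NAME_RE = re.compile(r"Analysis-(\d{4})-Q([1-4])(?:-([A-Za-z0-9_-]+))?")


def is_valid_cycle_name(name: str) -> bool:
    """Return True when the whole name follows the documented pattern."""
    return CYCLE_NAME_RE.fullmatch(name) is not None
```

With this reading, all three examples above pass, while names such as `Analysis-2025-Q5` or `analysis-2025-Q4` are rejected.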

#### What Happens When You Create a Cycle?

1. **Archive Previous Cycle** (if one exists)
   - Current `Active_*` directory → moved to `_Archive/`
   - Active cycle in database → marked as `ARCHIVED`

2. **Create New Active Cycle**
   - New directory created: `workflows/Active_<cycle_name>/`
   - Template copied from `_Template/` directory
   - Structure includes: `config/`, `temp/`, `notebooks/`

3. **Register in Database**
   - Cycle registered with status `ACTIVE`
   - All stages and steps from template notebooks are registered
   - Step execution tracking enabled
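
The filesystem part of steps 1 and 2 can be pictured with a short sketch. It is illustrative only: the function name `create_cycle_dirs` is hypothetical, it assumes `_Template/` and `_Archive/` sit directly under the `workflows/` directory, and it keeps the archived directory's name unchanged, which the real tool may not do. Database registration (step 3) is left out.

```python
import shutil
from pathlib import Path


def create_cycle_dirs(workflows: Path, cycle_name: str) -> Path:
    """Archive any Active_* directory, then copy _Template into a new one."""
    archive = workflows / "_Archive"
    archive.mkdir(exist_ok=True)

    # Step 1: move every existing Active_* directory into _Archive/.
    # list() snapshots the matches so moving does not disturb iteration.
    for active in list(workflows.glob("Active_*")):
        if active.is_dir():
            shutil.move(str(active), str(archive / active.name))

    # Step 2: copy the template (config/, temp/, notebooks/) into place.
    new_dir = workflows / f"Active_{cycle_name}"
    shutil.copytree(workflows / "_Template", new_dir)
    return new_dir
```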


### Deleting the Active Cycle

To discard the active cycle and start over, run the **[→ Delete Active Cycle](./_Tools/Cycle%20Management/Delete%20Cycle.ipynb)** notebook.

This permanently removes the active cycle from both the filesystem and database.

⚠ **Warning:** This action cannot be reversed!

#### What Gets Deleted?
- The `Active_<cycle_name>` directory and all contents
- Cycle record from database
- All associated stages, steps, and execution history
- Job batches and job records

#### When to Use This?
- Testing/development when you need to start fresh
- Accidentally created a cycle with the wrong name
- Need to clean up incomplete cycles

**Alternative:** If you want to preserve history, use the "Create New Cycle" tool instead, which automatically archives the current cycle.

---

### Purging Archived Cycles

To clean up the archives, run the **[→ Purge Archive](./_Tools/Cycle%20Management/Purge%20Archive.ipynb)** notebook.

This permanently removes all archived cycles to free up disk space and clean up the database.

#### What Gets Removed?
- All directories under `_Archive/`
- All cycles with `ARCHIVED` status from database
- Associated stages, steps, and execution history

#### When to Use This?
- Quarterly/annual cleanup
- Disk space management
- Before major version upgrades
- After backing up important archived cycles

💡 **Best Practice:** Export/backup any important archived cycle data before purging.

## Database Administration

### Database Administration Tool

**[→ Database Administration](./_Tools/Database%20Management/Database%20Administration.ipynb)**

A comprehensive tool for managing your PostgreSQL database.

#### Features

**1. Database Connectivity Check**
- Tests connection to PostgreSQL
- Displays server version
- Verifies network connectivity
- Shows connection parameters

**2. Initialize Database**
- Creates all required tables
- Sets up relationships and indexes
- Configures schema
- ⚠️ **Warning:** Clears all existing data!

**3. Database Statistics**
View row counts for all tables:
- `irp_cycle` - Workflow cycles
- `irp_stage` - Stages within cycles
- `irp_step` - Individual steps
- `irp_step_run` - Step execution history
- `irp_batch` - Job batches
- `irp_job` - Individual jobs

**4. Recent Activity**
- View last 10 step executions
- See execution duration
- Check success/failure status
- Identify performance bottlenecks

**5. Cleanup Utilities**
Pre-built queries for common maintenance tasks:
```sql
-- Clear failed step runs
DELETE FROM irp_step_run WHERE status = 'FAILED';

-- Clear all step runs (keep step definitions)
DELETE FROM irp_step_run;

-- Remove archived cycles older than 30 days
DELETE FROM irp_cycle
WHERE status = 'ARCHIVED'
  AND archived_ts < NOW() - INTERVAL '30 days';
```

**6. Custom Query Builder**
Execute custom SQL queries for:
- Data analysis
- Troubleshooting
- Custom reports
- Database exploration

#### Database Schema Overview

TODO

---

### Database Connectivity Check

Run the lightweight **[→ Quick Connectivity Test](./_Tools/Database%20Management/Database%20Connectivity%20Check.ipynb)** notebook at any time. It only tests database connectivity.

#### When to Use This?
- Quick health check before starting work
- Troubleshooting connection issues
- Verifying database is running
- Checking after configuration changes

#### What It Shows
- Connection successful / failed
- PostgreSQL version
- Connection parameters (host, port, database)

💡 **Tip:** Bookmark this for quick daily health checks!

## System Health & Monitoring

### System Status Dashboard

**[→ System Health Check](./_Tools/System%20Health/System_Status.ipynb)**

A comprehensive dashboard showing the current state of your workflow system.

#### What's Included?

**1. System Health Checks**
Verifies all critical components:
- Database connectivity
- Template directory exists
- Active cycle status
- Configuration files

**2. Active Cycle Overview**
- Cycle name and creation date
- Created by user
- Metadata information
- Cycle ID

**3. Step Progress Summary**
Visual overview of all steps in the active cycle:
- Total steps registered
- Completed steps
- Running steps
- Failed steps
- Skipped steps
- Not yet started
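
The counts in this summary amount to grouping steps by their latest run status. A minimal sketch of that aggregation follows. The function name `summarize_steps`, the input shape (a mapping from step name to latest status, with `None` for a step that has never run), and the `NOT_STARTED` label are assumptions made for illustration, not the dashboard's actual code.

```python
from collections import Counter

# Buckets shown in the summary. "Running" corresponds to the ACTIVE status;
# NOT_STARTED is an assumed label for steps with no run record yet.
BUCKETS = ["COMPLETED", "ACTIVE", "FAILED", "SKIPPED", "NOT_STARTED"]


def summarize_steps(latest_status_by_step: dict) -> dict:
    """Count steps per status; a value of None means the step never ran."""
    counts = Counter(
        status or "NOT_STARTED" for status in latest_status_by_step.values()
    )
    summary = {bucket: counts.get(bucket, 0) for bucket in BUCKETS}
    summary["TOTAL"] = len(latest_status_by_step)
    return summary
```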

**4. Recent Activity**
Last 10 step executions with:
- Stage and step names
- Run numbers
- Execution status
- Start timestamp
- Duration (formatted as readable time)
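
As an illustration of what "readable time" could mean here, the sketch below turns a duration in seconds into a compact string. The exact format and the function name `format_duration` are assumptions. The dashboard may render durations differently.

```python
def format_duration(seconds: float) -> str:
    """Render a duration in seconds as e.g. '1h 02m 05s', '1m 05s', or '7s'."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}h {minutes:02d}m {secs:02d}s"
    if minutes:
        return f"{minutes}m {secs:02d}s"
    return f"{secs}s"
```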

**5. Directory Structure**
Visual tree of your active cycle

#### When to Use This?
- Start of each work session
- After creating a new cycle
- Before starting step execution
- Troubleshooting issues
- Progress reporting
- Team status updates

💡 **Best Practice:** Check system health before starting any major workflow execution.

---

## Workflow Concepts

### Understanding the Framework

#### Cycles
A **cycle** represents a complete end-to-end workflow execution.

**Characteristics:**
- Named using convention: `Analysis-YYYY-QN[-suffix]`
- Only **one cycle can be active** at a time
- Tracked in database with status: `ACTIVE` or `ARCHIVED`
- Has its own isolated directory: `Active_<cycle_name>/`

**Examples:**
- `Analysis-2025-Q4` - Standard quarterly cycle
- `Analysis-2025-Q4-v2` - Second version/retry
- `Analysis-2025-Q4-Special` - Special run with modifications

**Lifecycle:**
```
CREATE → ACTIVE → ARCHIVED → PURGED
   ↓        ↓         ↓          ↓
  New    Current   Stored    Deleted
```

---

#### Stages
Each cycle is organized into **stages** that group related steps.

**Characteristics:**
- Numbered sequentially: `Stage_01`, `Stage_02`, etc.
- Have descriptive names: `Setup`, `Extract`, `Process`, `Submit`, `Monitor`
- Stored as directories in `notebooks/`
- Tracked in `irp_stage` table

---

#### Steps
Each stage contains **steps** implemented as Jupyter notebooks.

**Characteristics:**
- Numbered within each stage: `Step_01`, `Step_02`, etc.
- Self-contained Jupyter notebooks (`.ipynb`)
- Use helper modules for framework integration
- Automatically tracked in `irp_step` table

**Naming Convention:**
```
Step_<number>_<descriptive_name>.ipynb
```

**Examples:**
- `Step_01_Initialize.ipynb`
- `Step_02_Data_Validation.ipynb`
- `Step_03_Run_Model.ipynb`

**Step Execution Tracking:**
Each time a step runs, it creates a record in `irp_step_run`:
- Run number (increments with each execution)
- Status: `ACTIVE`, `COMPLETED`, `FAILED`, `SKIPPED`
- Timestamps: started, completed
- Error messages (if failed)
- Output data (JSON)

---

#### Execution Flow

**Typical Workflow:**

1. **Create Cycle**
   ```
   Run: New Cycle.ipynb
   → Creates Active_<name>/ directory
   → Registers cycle, stages, and steps in database
   ```

2. **Execute Steps Sequentially**
   ```
   Navigate to: Active_<name>/notebooks/Stage_01_Setup/
   Run: Step_01_Initialize.ipynb
   → Uses Step context for automatic tracking
   → Records execution in irp_step_run
   → Status updates in real-time
   ```

3. **Monitor Progress**
   ```
   Run: System_Status.ipynb
   → View completion status
   → Check for failures
   → Review execution times
   ```

4. **Complete Cycle**
   ```
   When finished, cycle remains active until:
   → New cycle created (auto-archives old one)
   → Manually deleted
   → Archived and purged
   ```
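
To make the "Step context" idea in item 2 concrete, here is a rough sketch of how a context manager can record a run and update its status automatically. It is not the API of `step.py`: the name `track_step`, the record fields, and the in-memory `RUNS` list (standing in for the `irp_step_run` table) are all hypothetical.

```python
from contextlib import contextmanager
from datetime import datetime, timezone

RUNS = []  # in-memory stand-in for the irp_step_run table (assumption)


@contextmanager
def track_step(step_name: str):
    """Record one run of a step, marking it COMPLETED or FAILED on exit."""
    # The run number increments with each execution of the same step.
    run_number = 1 + sum(1 for r in RUNS if r["step"] == step_name)
    run = {
        "step": step_name,
        "run_number": run_number,
        "status": "ACTIVE",
        "started": datetime.now(timezone.utc),
        "completed": None,
        "error": None,
    }
    RUNS.append(run)
    try:
        yield run
        run["status"] = "COMPLETED"
    except Exception as exc:
        # Keep the error message with the run, then let the error propagate.
        run["status"] = "FAILED"
        run["error"] = str(exc)
        raise
    finally:
        run["completed"] = datetime.now(timezone.utc)
```

A notebook cell would wrap its work in `with track_step("Step_01_Initialize"):` so that an exception marks the run `FAILED` and a normal exit marks it `COMPLETED`.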

---

### Key Components

#### Helper Modules
Located in `workspace/helpers/`:

- **`database.py`** - Database operations (cycles, stages, steps, runs)
- **`cycle.py`** - Cycle management (create, archive, validate)
- **`step.py`** - Step execution context and tracking
- **`ux.py`** - User interface utilities (tables, formatting, colors)
- **`context.py`** - Workflow context management
- **`constants.py`** - Configuration and status enums

#### Tool Notebooks
Located in `workflows/_Tools/`:

**Cycle Management:**
- New Cycle.ipynb
- Delete Cycle.ipynb
- Purge Archive.ipynb

**Database Management:**
- Database Administration.ipynb
- Database Connectivity Check.ipynb

**System Health:**
- System_Status.ipynb

#### Template Structure
The `_Template` directory is the master template copied to each new cycle:

```
_Template/
├── config/                        # Configuration files
├── temp/                          # Temporary working files
└── notebooks/
    ├── Stage_01_<stagename>/
    │   ├── Step_01_<stepname>.ipynb
    │   └── Step_02_<stepname>.ipynb
    └── Stage_02_<stagename>/
        └── Step_01_<stepname>.ipynb
```
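
Because stages and steps are encoded in directory and file names, the template can be scanned to list what a new cycle would register. The sketch below shows one way to do that with `pathlib`. The function name `discover_steps` and the returned tuple layout are assumptions, and the framework's own registration logic may differ.

```python
import re
from pathlib import Path

STAGE_RE = re.compile(r"^Stage_(\d+)_(.+)$")
STEP_RE = re.compile(r"^Step_(\d+)_(.+)\.ipynb$")


def discover_steps(template_dir: Path) -> list:
    """Return (stage_num, stage_name, step_num, step_name) tuples in order."""
    found = []
    for stage_dir in (template_dir / "notebooks").iterdir():
        stage = STAGE_RE.match(stage_dir.name)
        if not (stage and stage_dir.is_dir()):
            continue  # ignore anything that is not a Stage_NN_<name> directory
        for notebook in stage_dir.glob("*.ipynb"):
            step = STEP_RE.match(notebook.name)
            if step:  # notebooks outside the naming convention are skipped
                found.append((int(stage.group(1)), stage.group(2),
                              int(step.group(1)), step.group(2)))
    # Sort numerically by stage number, then step number.
    return sorted(found)
```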

**Customizing Templates:**
1. Modify `_Template/` directory structure
2. Add/remove stages and steps
3. Update step notebooks with your logic
4. Next cycle creation will use updated template

---

## Common Tasks

### Daily Workflow

1. **Start of Day**
   - Check system health: [System_Status.ipynb](./_Tools/System%20Health/System_Status.ipynb)
   - Verify database connectivity
   - Review active cycle progress

2. **During Execution**
   - Navigate to: `Active_<cycle>/notebooks/Stage_XX/`
   - Run step notebooks sequentially
   - Monitor for errors
   - Check execution logs

3. **End of Day**
   - Review progress in System Status
   - Document any issues
   - Prepare next steps

### Troubleshooting

**Problem: Database connection failed**
- Check PostgreSQL is running
- Verify connection parameters in `.env`
- Run [Database Connectivity Check](./_Tools/Database%20Management/Database%20Connectivity%20Check.ipynb)
- For local development, set `os.environ['DB_SERVER'] = 'localhost'` at the top of the notebook (after `import os`)

**Problem: No active cycle**
- Create new cycle: [New Cycle.ipynb](./_Tools/Cycle%20Management/New%20Cycle.ipynb)

**Problem: Step execution failed**
- Check error message in step output
- Review step run history in database
- Re-run failed step after fixing issue

**Problem: Template not found**
- Verify `_Template/` directory exists
- Check directory permissions
- Restore from backup if needed

---

## Additional Resources

### Configuration Files

**Environment Variables (`.env`):**
```bash
DB_SERVER=postgres        # Database host (use 'localhost' for local dev)
DB_PORT=5432              # Database port
DB_NAME=irp_db            # Database name
DB_USER=irp_user          # Database user
DB_PASSWORD=irp_pass      # Database password
SYSTEM_USER=notebook_user # Default system user
```
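
For reference, these variables could be read into connection settings along the following lines. This is a sketch, not the code in `database.py`: the function name `load_db_config` is hypothetical, and the fallback defaults simply mirror the example values above.

```python
import os


def load_db_config(env=None) -> dict:
    """Collect connection settings from the DB_* environment variables."""
    env = os.environ if env is None else env
    return {
        "host": env.get("DB_SERVER", "postgres"),   # 'localhost' for local dev
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env.get("DB_NAME", "irp_db"),
        "user": env.get("DB_USER", "irp_user"),
        "password": env.get("DB_PASSWORD", ""),
    }
```

The keys follow the PostgreSQL connection keyword names (`host`, `port`, `dbname`, `user`, `password`), so the dictionary could be handed to a driver that accepts them as keyword arguments.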

### Status Enums

**Cycle Status:**
- `ACTIVE` - Currently active cycle
- `ARCHIVED` - Completed/archived cycle

**Step Status:**
- `ACTIVE` - Currently executing
- `COMPLETED` - Successfully finished
- `FAILED` - Execution failed
- `SKIPPED` - Manually skipped

**Job Status:**
- `INITIATED` - Job created, not yet submitted
- `SUBMITTED` - Submitted to scheduler
- `QUEUED` - In queue waiting to run
- `RUNNING` - Currently executing
- `FINISHED` - Successfully completed
- `FAILED` - Execution failed
- `CANCELLED` - Cancelled
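
In Python, the statuses listed above map naturally onto `Enum` classes. The sketch below is an assumption about how they might be declared. The class names and the `is_terminal` helper are hypothetical and are not taken from `constants.py`. Treating `FINISHED`, `FAILED`, and `CANCELLED` as terminal job states is inferred from their descriptions.

```python
from enum import Enum


class CycleStatus(str, Enum):
    ACTIVE = "ACTIVE"
    ARCHIVED = "ARCHIVED"


class StepStatus(str, Enum):
    ACTIVE = "ACTIVE"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


class JobStatus(str, Enum):
    INITIATED = "INITIATED"
    SUBMITTED = "SUBMITTED"
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Assumption: a job in one of these states will not change state again.
TERMINAL_JOB_STATES = {JobStatus.FINISHED, JobStatus.FAILED, JobStatus.CANCELLED}


def is_terminal(status: JobStatus) -> bool:
    """Return True when the job has reached a final state."""
    return status in TERMINAL_JOB_STATES
```
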

TODO

---

## Getting Help

**Common Questions:**

1. **How do I add a new step?**
   - Add notebook to appropriate stage directory in `_Template/`
   - Follow naming convention: `Step_XX_Name.ipynb`
   - Next cycle creation will include new step

2. **Can I have multiple active cycles?**
   - No, only one cycle can be `ACTIVE` at a time
   - This ensures clarity and prevents conflicts

3. **How do I backup my work?**
   - Archive important cycles before purging
   - Export database: `pg_dump irp_db > backup.sql`
   - Copy `Active_*` directories to backup location

4. **How do I restore a cycle?**
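   - Copy the cycle's directory out of `_Archive/` and rename it to `Active_<cycle_name>` (archive any currently active cycle first, since only one cycle can be active)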
   - Update cycle status in database to `ACTIVE`
   - Use Database Administration tool for manual updates

5. **Can I modify an active cycle's notebooks?**
   - Yes! Modify notebooks in `Active_<cycle>/notebooks/`
   - Changes only affect current cycle
   - Template remains unchanged for future cycles