# Syft Job Submission System - Examples

This notebook walks through the **simplified, bash-script-only** syft-job system: every job is defined by a bash script that you supply.

## Key Features:
- ✅ **Bash-script driven** - Users provide custom bash scripts
- ✅ **Environment variable injection** - CODE_DIR, OUTPUT_DIR, plus one variable per named input (e.g. TRAIN, TEST)
- ✅ **Local & syft:// URL support** - Works with both local paths and syft URLs
- ✅ **Simple job structure** - Each job creates config.yaml and run.sh
- ✅ **Multi-language support** - Python, Go, or any language via bash
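
Environment-variable injection can be pictured with the standard library alone. The sketch below is an assumption about how such a runner might behave, not syft-job's actual implementation: `build_job_env` and `run_script` are hypothetical helpers that export CODE_DIR, OUTPUT_DIR, and one variable per named input before running a script.

```python
import os
import shutil
import subprocess

def build_job_env(code_dir, output_dir, inputs):
    """Return a copy of the current environment plus the job variables.

    Hypothetical helper: it mirrors the variables this notebook relies on,
    not syft-job's internal code.
    """
    env = dict(os.environ)
    env["CODE_DIR"] = str(code_dir)
    env["OUTPUT_DIR"] = str(output_dir)
    # Each named input becomes its own variable, e.g. TRAIN or TEST.
    for name, path in inputs.items():
        env[name] = str(path)
    return env

def run_script(script, env):
    """Run script text with bash (falling back to sh) and return its stdout."""
    shell = shutil.which("bash") or "sh"
    done = subprocess.run([shell, "-c", script], env=env,
                          capture_output=True, text=True, check=True)
    return done.stdout

env = build_job_env("./my_code", "./outputs", {"TRAIN": "./data/train.csv"})
print(run_script('echo "train=$TRAIN code=$CODE_DIR"', env))
```

Because the script only sees environment variables, the same mechanism works whether the script launches Python, Go, or plain shell tools.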

## Job Structure Created:
```
job-{uuid}/
├── config.yaml    # Job configuration
├── run.sh         # Your bash script (processed)
├── inputs/        # Resolved input data
└── outputs/       # Job output files
```
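
The layout above can be checked programmatically, for example before collecting results. This is a minimal sketch: `missing_entries` is a hypothetical helper, and it assumes only the four entries listed in the tree.

```python
import tempfile
from pathlib import Path

# Entries each job workspace is expected to contain, per the layout above.
EXPECTED_FILES = ("config.yaml", "run.sh")
EXPECTED_DIRS = ("inputs", "outputs")

def missing_entries(job_dir):
    """Return the names of expected files/directories absent from job_dir."""
    job_dir = Path(job_dir)
    missing = [n for n in EXPECTED_FILES if not (job_dir / n).is_file()]
    missing += [n + "/" for n in EXPECTED_DIRS if not (job_dir / n).is_dir()]
    return missing

# Demonstrate on a throwaway directory that lacks run.sh and outputs/.
with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp) / "job-example"
    (job / "inputs").mkdir(parents=True)
    (job / "config.yaml").write_text("name: example\n")
    print(missing_entries(job))  # ['run.sh', 'outputs/']
```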

In [None]:
# Setup and imports
import sys
import os
from pathlib import Path
import json
import shutil

# Add src to path so we can import syft_job
sys.path.insert(0, './src')
import syft_job as sj

print("🎉 Syft Job System - Bash-Script Driven")
print("=" * 50)

## Helper Functions
Let's create some sample data and code directories for our examples:

In [None]:
def create_sample_data():
    """Create sample data files for testing."""
    data_dir = Path("./sample_data")
    data_dir.mkdir(exist_ok=True)
    
    # Create train data
    train_data = data_dir / "train.csv" 
    train_data.write_text("id,feature1,feature2,label\n1,0.1,0.2,A\n2,0.3,0.4,B\n3,0.5,0.6,A\n")
    
    # Create test data
    test_data = data_dir / "test.csv"
    test_data.write_text("id,feature1,feature2\n4,0.7,0.8\n5,0.9,1.0\n")
    
    print(f"📁 Created sample data in: {data_dir}")
    return str(data_dir)

def create_python_code():
    """Create a simple Python analysis script."""
    code_dir = Path("./sample_code/python_analysis")
    code_dir.mkdir(parents=True, exist_ok=True)
    
    # Create the main Python script
    main_py = code_dir / "analysis.py"
    main_py.write_text("""
import os
import pandas as pd
import json

print("[ANALYSIS] Starting Python data analysis...")
print(f"[ANALYSIS] Working directory: {os.getcwd()}")

# Read environment variables for inputs
train_file = os.environ.get("TRAIN", "No TRAIN data")
test_file = os.environ.get("TEST", "No TEST data")
output_dir = os.environ.get("OUTPUT_DIR", "./outputs")

print(f"[ANALYSIS] Train data: {train_file}")
print(f"[ANALYSIS] Test data: {test_file}")
print(f"[ANALYSIS] Output directory: {output_dir}")

# Read and process data
results = {"job_type": "python_analysis", "status": "completed"}

if os.path.exists(train_file):
    train_data = pd.read_csv(train_file)
    results["train_rows"] = len(train_data)
    print(f"[ANALYSIS] Loaded training data: {len(train_data)} rows")
    print(train_data.head())
else:
    results["train_rows"] = 0

if os.path.exists(test_file):
    test_data = pd.read_csv(test_file)
    results["test_rows"] = len(test_data)
    print(f"[ANALYSIS] Loaded test data: {len(test_data)} rows")
else:
    results["test_rows"] = 0

# Write results
os.makedirs(output_dir, exist_ok=True)
with open(os.path.join(output_dir, "analysis_results.json"), "w") as f:
    json.dump(results, f, indent=2)

print("[ANALYSIS] Analysis complete! Results saved.")
""")
    
    # Create requirements.txt
    requirements = code_dir / "requirements.txt"
    requirements.write_text("pandas>=1.0.0\n")
    
    print(f"📁 Created Python code in: {code_dir}")
    return str(code_dir)

# Create sample data and code
data_dir = create_sample_data()
python_code_dir = create_python_code()

## Example 1: Simple Python Job

This example shows how to submit a Python analysis job with a custom bash script:

In [None]:
print("\n=== Example 1: Python Analysis Job ===")

# Define our bash script for Python execution
python_script = """#!/bin/bash
set -e

echo "[JOB] Starting Python analysis job..."
echo "[JOB] Code directory: $CODE_DIR"
echo "[JOB] Output directory: $OUTPUT_DIR"

# Install Python dependencies
echo "[JOB] Installing Python dependencies..."
if [ -f "$CODE_DIR/requirements.txt" ]; then
    pip install -r "$CODE_DIR/requirements.txt"
else
    echo "[JOB] No requirements.txt found, skipping dependency installation"
fi

# Run the Python analysis
echo "[JOB] Running Python analysis script..."
python "$CODE_DIR/analysis.py"

echo "[JOB] Python job completed successfully!"
"""

# Submit the job
result = sj.submit_job(
    name="Python Data Analysis",
    code=python_code_dir,  # Local path to code
    run_script=python_script,  # Our custom bash script
    inputs={
        "TRAIN": f"{data_dir}/train.csv",  # Local path
        "TEST": f"{data_dir}/test.csv"     # Local path  
    },
    job_dir="./jobs",
    sync=True
)

print("\n🎯 Job Results:")
print(f"   Job ID: {result.job_id}")
print(f"   Status: {result.status}")
print(f"   Duration: {result.duration:.2f}s")

if result.output:
    print("\n📜 Job Output:")
    print(result.output)

if result.artifacts:
    print(f"\n📁 Artifacts created: {len(result.artifacts)} files")
    for artifact in result.artifacts:
        print(f"   - {artifact}")

python_job_id = result.job_id
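
Once a job finishes, its JSON outputs can be read back for further analysis. The sketch below assumes outputs land in an `outputs/` directory inside the job workspace, as in the layout shown at the top of this notebook; `load_json_outputs` is a hypothetical helper, not part of syft-job.

```python
import json
import tempfile
from pathlib import Path

def load_json_outputs(job_workspace):
    """Map each *.json file name under <job_workspace>/outputs to its parsed content."""
    outputs = Path(job_workspace) / "outputs"
    return {p.name: json.loads(p.read_text()) for p in sorted(outputs.glob("*.json"))}

# Stand-in workspace so the sketch runs without submitting a job.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "outputs"
    out.mkdir()
    (out / "analysis_results.json").write_text(
        json.dumps({"job_type": "python_analysis", "train_rows": 3, "test_rows": 2})
    )
    results = load_json_outputs(tmp)
    print(results["analysis_results.json"]["train_rows"])  # 3
```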

## Example 2: Inline Bash Script Job

This example does all of its data processing in the bash script itself, with no separate program; the data directory doubles as the code directory:

In [None]:
print("\n=== Example 2: Inline Bash Processing ===")

# Define an inline bash script for data processing
bash_processing_script = """#!/bin/bash
set -e

echo "[BASH] Starting bash data processing..."
echo "[BASH] Available data: $TRAIN"
echo "[BASH] Output directory: $OUTPUT_DIR"

# Simple data processing with bash tools
if [ -f "$TRAIN" ]; then
    echo "[BASH] Processing training data..."
    line_count=$(wc -l < "$TRAIN")
    echo "[BASH] Training data has $line_count lines"
    
    # Create data summary
    echo "Data Processing Summary" > "$OUTPUT_DIR/summary.txt"
    echo "======================" >> "$OUTPUT_DIR/summary.txt"
    echo "Input file: $TRAIN" >> "$OUTPUT_DIR/summary.txt"
    echo "Total lines: $line_count" >> "$OUTPUT_DIR/summary.txt"
    echo "Processed at: $(date)" >> "$OUTPUT_DIR/summary.txt"
    
    # Create a JSON result
    echo '{"type": "bash_processing", "status": "completed", "lines_processed": '$line_count'}' > "$OUTPUT_DIR/result.json"
else
    echo "[BASH] No training data found at: $TRAIN"
fi

echo "[BASH] Bash processing completed!"
"""

# Submit the job
result = sj.submit_job(
    name="Bash Data Processing",
    code=data_dir,  # Use data directory as code directory
    run_script=bash_processing_script,  # Inline bash script
    inputs={
        "TRAIN": f"{data_dir}/train.csv"
    },
    job_dir="./jobs",
    sync=True
)

print("\n🎯 Job Results:")
print(f"   Job ID: {result.job_id}")
print(f"   Status: {result.status}")
print(f"   Duration: {result.duration:.2f}s")

if result.output:
    print("\n📜 Job Output:")
    print(result.output)

bash_job_id = result.job_id

## Example 3: Multi-Language Support (Go)

The system supports any language through custom bash scripts. Here's a Go example:

In [None]:
print("\n=== Example 3: Go Programming Job ===")

# Create a simple Go program
def create_go_code():
    code_dir = Path("./sample_code/go_processor")
    code_dir.mkdir(parents=True, exist_ok=True)
    
    # Create main.go
    main_go = code_dir / "main.go"
    main_go.write_text("""
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

type Result struct {
	JobType   string    `json:"job_type"`
	Status    string    `json:"status"`
	Message   string    `json:"message"`
	Timestamp time.Time `json:"timestamp"`
}

func main() {
	fmt.Println("[GO] Starting Go data processor...")
	
	// Read environment variables
	trainFile := os.Getenv("TRAIN")
	testFile := os.Getenv("TEST")
	outputDir := os.Getenv("OUTPUT_DIR")
	
	fmt.Printf("[GO] Train file: %s\\n", trainFile)
	fmt.Printf("[GO] Test file: %s\\n", testFile)
	fmt.Printf("[GO] Output dir: %s\\n", outputDir)
	
	// Create output directory
	os.MkdirAll(outputDir, 0755)
	
	// Create result
	result := Result{
		JobType:   "go_processor",
		Status:    "completed",
		Message:   "Go data processing completed successfully",
		Timestamp: time.Now(),
	}
	
	// Write result to JSON file
	resultFile := filepath.Join(outputDir, "go_results.json")
	file, err := os.Create(resultFile)
	if err != nil {
		fmt.Printf("Error creating result file: %v\\n", err)
		os.Exit(1) // non-zero exit so the bash script (set -e) stops with an error
	}
	defer file.Close()
	
	encoder := json.NewEncoder(file)
	encoder.SetIndent("", "  ")
	if err := encoder.Encode(result); err != nil {
		fmt.Printf("Error writing result: %v\\n", err)
		os.Exit(1)
	}
	
	fmt.Println("[GO] Processing complete! Results saved.")
}
""")
    
    # Create go.mod
    go_mod = code_dir / "go.mod"
    go_mod.write_text("module go_processor\n\ngo 1.21\n")
    
    print(f"📁 Created Go code in: {code_dir}")
    return str(code_dir)

# Create Go code
go_code_dir = create_go_code()

# Define Go execution script
go_script = """#!/bin/bash
set -e

echo "[JOB] Starting Go build and execution..."
echo "[JOB] Code directory: $CODE_DIR"
echo "[JOB] Output directory: $OUTPUT_DIR"

# Build the Go program
echo "[JOB] Building Go program..."
cd "$CODE_DIR"
go build -o "$OUTPUT_DIR/go_processor" main.go

# Run the Go program
echo "[JOB] Running Go program..."
"$OUTPUT_DIR/go_processor"

echo "[JOB] Go job completed successfully!"
"""

# Submit the Go job
result = sj.submit_job(
    name="Go Data Processor", 
    code=go_code_dir,
    run_script=go_script,
    inputs={
        "TRAIN": f"{data_dir}/train.csv",
        "TEST": f"{data_dir}/test.csv"
    },
    job_dir="./jobs",
    timeout=60,
    sync=True
)

print("\n🎯 Job Results:")
print(f"   Job ID: {result.job_id}")
print(f"   Status: {result.status}")
print(f"   Duration: {result.duration:.2f}s")

if result.output:
    print("\n📜 Job Output:")
    print(result.output)

go_job_id = result.job_id

## Example 4: Batch Job Submission

Submit several jobs in one call with `sj.submit_batch_jobs`:

In [None]:
print("\n=== Example 4: Batch Job Submission ===")

# Define a batch processing script
batch_script = """#!/bin/bash
set -e

echo "[BATCH] Starting batch processing job..."
echo "[BATCH] Job ID: $BATCH_ID"
echo "[BATCH] Processing data: $TRAIN"

# Simple processing with bash tools
if [ -f "$TRAIN" ]; then
    line_count=$(wc -l < "$TRAIN")
    echo "[BATCH] Found $line_count lines in data file"
    
    # Create batch result
    echo '{"batch_job": "'$BATCH_ID'", "lines_processed": '$line_count', "status": "completed"}' > "$OUTPUT_DIR/batch_result.json"
else
    echo "[BATCH] No data file found"
    echo '{"batch_job": "'$BATCH_ID'", "status": "no_data"}' > "$OUTPUT_DIR/batch_result.json"
fi

echo "[BATCH] Batch job completed!"
"""

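# Create multiple batch jobs.
# Note: $BATCH_ID is filled in here with str.replace before submission;
# the script does not rely on it being an injected environment variable.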
jobs = []
for i in range(3):
    jobs.append({
        "name": f"Batch Job {i+1}",
        "code": data_dir,  # Use data directory
        "run_script": batch_script.replace("$BATCH_ID", f"batch_{i+1}"),
        "inputs": {
            "TRAIN": f"{data_dir}/train.csv"
        },
        "metadata": {"batch_index": i}
    })

# Submit batch jobs
results = sj.submit_batch_jobs(jobs, job_dir="./jobs")

print("\n🎯 Batch Results:")
print(f"   Submitted {len(results)} batch jobs:")
for i, result in enumerate(results):
    print(f"   Job {i+1}: {result.job_id} - {result.status} ({result.duration:.2f}s)")

batch_job_ids = [r.job_id for r in results]
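
For larger batches, a per-status tally is easier to scan than one line per job. The sketch below relies only on the `status` and `duration` attributes used above; the `SimpleNamespace` objects are stand-ins for real results, and the status strings are illustrative assumptions rather than values syft-job is known to return.

```python
from collections import Counter
from types import SimpleNamespace

def summarize(results):
    """Return (count per status, total duration in seconds) for a list of results."""
    by_status = Counter(r.status for r in results)
    total = sum(r.duration for r in results)
    return by_status, total

# Stand-in results; real ones would come from sj.submit_batch_jobs(...).
fake_results = [
    SimpleNamespace(job_id="a", status="completed", duration=0.5),
    SimpleNamespace(job_id="b", status="completed", duration=0.25),
    SimpleNamespace(job_id="c", status="failed", duration=0.25),
]
by_status, total = summarize(fake_results)
print(dict(by_status), f"{total:.2f}s")  # {'completed': 2, 'failed': 1} 1.00s
```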

## Inspect Job Workspaces

Let's explore the job workspaces that were created:

In [None]:
print("\n=== Job Workspace Inspection ===")

jobs_dir = Path("./jobs")
if jobs_dir.exists():
    job_dirs = [d for d in jobs_dir.iterdir() if d.is_dir()]
    print(f"\n📁 Found {len(job_dirs)} job workspaces:")
    
    for job_dir in sorted(job_dirs, key=lambda p: p.stat().st_mtime):
        print(f"\n🗂️  {job_dir.name}:")
        
        # Show directory structure
        for item in sorted(job_dir.iterdir()):  # sorted for a stable listing order
            if item.is_file():
                if item.name == "config.yaml":
                    print(f"    📄 {item.name} (job configuration)")
                elif item.name == "run.sh":
                    print(f"    📄 {item.name} (bash script)")
                else:
                    print(f"    📄 {item.name}")
            elif item.is_dir():
                if item.name == "inputs":
                    input_files = list(item.glob("*"))
                    print(f"    📁 {item.name}/ ({len(input_files)} files)")
                elif item.name == "outputs":
                    output_files = list(item.glob("*"))
                    print(f"    📁 {item.name}/ ({len(output_files)} files)")
                    for output_file in output_files:
                        print(f"       📄 {output_file.name}")
                else:
                    print(f"    📁 {item.name}/")
else:
    print("No jobs directory found")

## Show Generated Files

Let's examine the config.yaml and run.sh files generated for the most recent job:

In [None]:
print("\n=== Generated Job Files ===")

jobs_dir = Path("./jobs")
if jobs_dir.exists():
    job_dirs = [d for d in jobs_dir.iterdir() if d.is_dir()]
    if job_dirs:
        # Show the most recent job
        latest_job = max(job_dirs, key=lambda p: p.stat().st_mtime)
        print(f"\n📋 Inspecting latest job: {latest_job.name}")
        
        # Show config.yaml
        config_file = latest_job / "config.yaml"
        if config_file.exists():
            print("\n📄 config.yaml:")
            print("-" * 40)
            print(config_file.read_text())
        
        # Show run.sh (first 20 lines to keep it manageable)
        run_file = latest_job / "run.sh" 
        if run_file.exists():
            print("\n📄 run.sh (first 20 lines):")
            print("-" * 40)
            lines = run_file.read_text().splitlines()
            for i, line in enumerate(lines[:20]):
                print(f"{i+1:2d}: {line}")
            if len(lines) > 20:
                print(f"... ({len(lines) - 20} more lines)")
    else:
        print("No job directories found")
else:
    print("No jobs directory found")

## Summary

This notebook demonstrated the key features of the simplified syft-job system:

In [None]:
print("\n" + "=" * 60)
print("🎉 Syft Job System Examples Complete!")
print("\n📖 Key Features Demonstrated:")
print("   ✅ Bash-script driven execution")
print("   ✅ Environment variable injection (CODE_DIR, OUTPUT_DIR, TRAIN, TEST)")
print("   ✅ Multi-language support (Python, Go, Bash)")
print("   ✅ Local path inputs (syft:// URLs are also supported but not shown here)")
print("   ✅ Batch job submission")
print("   ✅ Simple job workspace structure")

print("\n💡 Basic Usage Pattern:")
print("""```python
import syft_job as sj

# Define your bash script
script = '''#!/bin/bash
set -e
# Install dependencies
pip install -r "$CODE_DIR/requirements.txt"
# Run your code  
python "$CODE_DIR/my_script.py"
'''

# Submit the job
result = sj.submit_job(
    name="My Job",
    code="./my_code",
    run_script=script,
    inputs={"DATA": "./data.csv"},
    job_dir="./jobs"
)
```""")

print("\n📁 Each job creates a workspace with:")
print("   📄 config.yaml - Job configuration")
print("   📄 run.sh - Your processed bash script")
print("   📁 inputs/ - Resolved input data files")
print("   📁 outputs/ - Job output files")

print("\n🔧 Environment Variables Available in Scripts:")
print("   • CODE_DIR - Path to your code directory")
print("   • OUTPUT_DIR - Path to job outputs directory")
print("   • {INPUT_NAME} - Paths to your input files (TRAIN, TEST, DATA, etc.)")

In [None]:
# Cleanup (optional)
print("\n🧹 Cleanup: uncomment the lines below and re-run this cell to remove the example files.")

# Uncomment these lines to clean up:
# shutil.rmtree("./sample_data", ignore_errors=True)
# shutil.rmtree("./sample_code", ignore_errors=True)
# shutil.rmtree("./jobs", ignore_errors=True)
# print("✅ Cleanup completed!")