# Background Tasks with Celery - Interactive Learning

This notebook covers:
- Understanding background tasks
- Celery architecture
- Implementing Celery tasks
- Task patterns and best practices

## Part 1: Understanding Background Tasks

### What are background tasks?

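Background tasks let you run slow operations (text extraction, embedding generation, indexing) outside the request-response cycle, so the caller gets a response immediately instead of waiting for the work to finish.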

In [None]:
# Synchronous processing (blocks the request)
def process_document_sync(document):
    print("1. Extracting text...")
    import time
    time.sleep(2)
    print("2. Generating embeddings...")
    time.sleep(3)
    print("3. Saving to vector store...")
    time.sleep(1)
    return "Done"

print("Synchronous processing:")
process_document_sync("doc.pdf")
print("Total time: ~6 seconds (blocks everything)")

### Exercise 1: Convert to Async

Convert the synchronous function above so that it returns immediately while the work runs in the background (simulate this with a worker thread).

In [None]:
import threading
import queue

# TODO: Create an async version using threading
def process_document_async(document):
    """
    TODO: Implement async processing using threading
    Queue the task and return immediately
    """
    pass  # Your code here

print("Synchronous processing:")
print(process_document_sync("doc.pdf"))

print("\nAsynchronous processing:")
print(process_document_async("doc.pdf"))

### Solution

In [None]:
# Solution
task_queue = queue.Queue()

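def worker():
    while True:
        document = task_queue.get()
        try:
            if document == "STOP":  # sentinel: shut the worker down
                break
            process_document_sync(document)
            print(f"\nCompleted: {document}")
        finally:
            # Always mark the item as handled (including the sentinel),
            # so a later task_queue.join() cannot hang.
            task_queue.task_done()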

# Start worker thread
worker_thread = threading.Thread(target=worker, daemon=True)
worker_thread.start()

def process_document_async(document):
    task_queue.put(document)
    return {"status": "queued", "document": document}

print("Asynchronous processing:")
print(process_document_async("doc1.pdf"))
print(process_document_async("doc2.pdf"))
print("\nReturns immediately! Tasks running in background...")
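
The same queue-and-return-immediately idea can also be sketched with the standard library's `concurrent.futures`, which hands back a `Future` that can be polled or waited on later. This is only an illustration: `process_document` and `submit_document` are hypothetical names, and the sleep is shortened so the cell runs quickly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process_document(document):
    """Hypothetical stand-in for the slow pipeline (sleep shortened for the demo)."""
    time.sleep(0.2)
    return f"{document}: done"

executor = ThreadPoolExecutor(max_workers=2)

def submit_document(document):
    """Queue the work and return a handle immediately."""
    future = executor.submit(process_document, document)
    return {"status": "queued", "document": document, "future": future}

start = time.perf_counter()
tickets = [submit_document(name) for name in ("doc1.pdf", "doc2.pdf")]
submit_elapsed = time.perf_counter() - start
print(f"Submitted {len(tickets)} documents in {submit_elapsed:.3f}s")

# result() blocks until the worker thread has finished that document.
results = [ticket["future"].result() for ticket in tickets]
print(results)
executor.shutdown()
```

Unlike the bare queue above, the caller keeps a handle to each job, which is roughly the role an `AsyncResult` plays in Celery.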

## Part 2: Celery Architecture

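Celery consists of three main components:
- **Broker**: Queues task messages until a worker picks them up (Redis, RabbitMQ)
- **Worker**: Pulls messages from the broker and executes the tasks
- **Backend**: Stores task state and results (optional, but needed for progress tracking and chords)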

In [None]:
# Visualizing Celery Architecture
from graphviz import Digraph

dot = Digraph(comment='Celery Architecture')
dot.node('A', 'FastAPI\n(Request)')
dot.node('B', 'Celery Broker\n(Redis)')
dot.node('C', 'Celery Worker\n(Processing)')
dot.node('D', 'Backend\n(Results)')

dot.edge('A', 'B', label='delay()')
dot.edge('B', 'C', label='pull()')
dot.edge('C', 'D', label='store()')
dot.edge('D', 'A', label='get_result()', style='dashed')

dot
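
To make the three roles concrete, here is a minimal standard-library simulation, not Celery itself. It assumes an in-process `queue.Queue` as the "broker", a plain dict as the "backend", and one thread as the "worker"; `send_task` and `worker_loop` are hypothetical names chosen for this sketch. The state strings mirror Celery's built-in `PENDING`, `SUCCESS`, and `FAILURE` states.

```python
import queue
import threading
import uuid

broker = queue.Queue()   # stands in for Redis/RabbitMQ: holds task messages
backend = {}             # stands in for the result backend: task_id -> state
backend_lock = threading.Lock()

def send_task(func, *args):
    """Client side: publish a message and return its id right away."""
    task_id = str(uuid.uuid4())
    with backend_lock:
        backend[task_id] = {"state": "PENDING", "result": None}
    broker.put((task_id, func, args))
    return task_id

def worker_loop():
    """Worker side: pull messages, run them, store the outcome."""
    while True:
        message = broker.get()
        if message is None:      # sentinel: shut down
            broker.task_done()
            break
        task_id, func, args = message
        try:
            outcome = {"state": "SUCCESS", "result": func(*args)}
        except Exception as exc:
            outcome = {"state": "FAILURE", "result": repr(exc)}
        with backend_lock:
            backend[task_id] = outcome
        broker.task_done()

worker = threading.Thread(target=worker_loop, daemon=True)
worker.start()

ok_id = send_task(lambda a, b: a + b, 2, 3)
bad_id = send_task(lambda: 1 / 0)
broker.put(None)
broker.join()            # wait until every message has been processed
worker.join()

print(backend[ok_id])
print(backend[bad_id]["state"])
```

The client only ever touches the broker and the backend, never the worker directly, which is the decoupling the diagram above depicts.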

## Part 3: Defining Celery Tasks

In [None]:
# Basic task definition
from celery import Celery

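# A result backend is needed for update_state(), result lookups, and chords.
# Both URLs assume a Redis server running locally; adjust them for your setup.
app = Celery(
    'tasks',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1',
)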

@app.task(name='index_document')
def index_document(document_id):
    """Index a document in the background."""
    print(f"Indexing document: {document_id}")
    # Simulate work
    import time
    time.sleep(2)
    return {"document_id": document_id, "status": "indexed"}

# Task with retry configuration
@app.task(
    name='process_with_retry',
    autoretry_for=(ConnectionError, TimeoutError),
    retry_backoff=True,
    retry_kwargs={'max_retries': 3}
)
def process_with_retry(document_id):
    """Task that retries on failure."""
    print(f"Processing: {document_id}")
    # Simulate potential failure
    import random
    if random.random() < 0.3:
        raise ConnectionError("Network issue")
    return {"status": "success"}

# Task with timeout
@app.task(name='timeout_task', time_limit=30)
def timeout_task(data):
    """Task with time limit."""
    import time
    time.sleep(10)
    return {"status": "done"}
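
The retry options above (`autoretry_for`, `retry_backoff`, `max_retries`) can be pictured with a small standard-library sketch. It assumes delays that double from 1 second up to a cap, which is how Celery's documentation describes `retry_backoff=True`; Celery also adds random jitter by default, which this sketch leaves out. `backoff_delays` and `call_with_retry` are hypothetical helpers, not Celery functions.

```python
import time

def backoff_delays(max_retries, factor=1, cap=600):
    """Delay before retry n (0-based): factor * 2**n, capped at `cap` seconds."""
    return [min(factor * 2 ** n, cap) for n in range(max_retries)]

def call_with_retry(func, retry_on, max_retries=3, sleep=time.sleep):
    """Call func, retrying on the listed exceptions with exponential backoff."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise          # retries exhausted: surface the error
            sleep(delays[attempt])

# Demo: fail twice, then succeed. Sleeps are recorded instead of waited on.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Network issue")
    return {"status": "success"}

waited = []
result = call_with_retry(flaky, (ConnectionError, TimeoutError), sleep=waited.append)
print(result, waited)
```

Only the exception types passed in `retry_on` trigger a retry; anything else propagates on the first attempt, which matches the intent of listing transient errors in `autoretry_for`.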

### Exercise 2: Create a Task with Monitoring

Create a task that:
- Accepts a list of items
- Processes each item
- Updates progress using `update_state()`
- Returns results

In [None]:
@app.task(bind=True, name='process_with_progress')
def process_with_progress(self, items):
    """
    TODO: Implement a task with progress updates
    - Process each item in the list
    - Update progress with self.update_state()
    - Return results
    """
    results = []
    total = len(items)
    
    # TODO: Process items with progress updates
    # for i, item in enumerate(items):
    #     result = process(item)
    #     self.update_state(
    #         state='PROGRESS',
    #         meta={'current': i+1, 'total': total, 'percent': ((i+1)/total)*100}
    #     )
    
    return {"status": "completed", "results": results}
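
One possible shape for the progress loop, sketched without a running Celery worker. `FakeTask` is a hypothetical stand-in for the bound task instance (`self`) that simply records each `update_state()` call, and `process` is a placeholder for the real per-item work.

```python
class FakeTask:
    """Hypothetical stand-in for the bound task instance (`self`)."""
    def __init__(self):
        self.updates = []

    def update_state(self, state, meta):
        self.updates.append((state, meta))

def process(item):
    """Hypothetical per-item work: here, just upper-case the string."""
    return item.upper()

def process_with_progress_sketch(task, items):
    results = []
    total = len(items)
    for i, item in enumerate(items, start=1):
        results.append(process(item))
        # Report progress after each item so a client can poll it.
        task.update_state(
            state="PROGRESS",
            meta={"current": i, "total": total, "percent": i / total * 100},
        )
    return {"status": "completed", "results": results}

task = FakeTask()
outcome = process_with_progress_sketch(task, ["a", "b", "c", "d"])
print(outcome)
print(task.updates[-1])
```

In a real task the same loop body would call `self.update_state(...)`, and a client could read the `meta` dict through the result backend while the task is still running.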

## Part 4: Task Patterns

In [None]:
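# Pattern 1: Chaining tasks (sequential execution)
from celery import chain

# .si() creates immutable signatures. A chain normally passes each task's
# return value to the next task as its first argument, which index_document
# (a single-parameter task) would not accept.
workflow = chain(
    index_document.si('doc1'),
    index_document.si('doc2'),
    index_document.si('doc3')
)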

# workflow.apply_async()

# Pattern 2: Grouping tasks (parallel execution)
from celery import group

parallel_tasks = group(
    index_document.s('doc1'),
    index_document.s('doc2'),
    index_document.s('doc3')
)

# parallel_tasks.apply_async()

# Pattern 3: Chord (group + callback)
from celery import chord

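# The callback must itself be a task; a plain function has no .s() method.
@app.task(name='summarize_results')
def summarize_results(results):
    return {"total": len(results), "status": "summarized"}

# Chords need a result backend to detect when the whole group has finished.
workflow = chord(
    (index_document.s(f'doc{i}') for i in range(10)),
    summarize_results.s()
)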

# workflow.apply_async()
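
The difference between the three patterns is mostly about how results flow. The sketch below mimics that flow with the standard library only; it is not how Celery implements these primitives, and `run_chain`, `run_group`, `run_chord`, and `index` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def run_chain(first_input, *steps):
    """Sequential: each step receives the previous step's return value."""
    return reduce(lambda value, step: step(value), steps, first_input)

def run_group(func, inputs):
    """Parallel: independent calls, results returned in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(func, inputs))

def run_chord(func, inputs, callback):
    """Group plus a callback that receives the list of all results."""
    return callback(run_group(func, inputs))

def index(doc_id):
    return {"document_id": doc_id, "status": "indexed"}

chained = run_chain("doc1", index, lambda r: {**r, "status": "verified"})
grouped = run_group(index, ["doc1", "doc2", "doc3"])
summary = run_chord(
    index,
    [f"doc{i}" for i in range(10)],
    lambda results: {"total": len(results), "status": "summarized"},
)
print(chained, len(grouped), summary)
```

Note how the second step of the chain receives the first step's output; this forwarding is why chained Celery tasks either accept the previous result as their first argument or use immutable signatures.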

### Exercise 3: Design a Workflow

Design a workflow that:
1. Processes 5 documents in parallel
2. After all complete, generates a summary

In [None]:
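# TODO: Create a workflow with chord
# @app.task(name='summarize_batch')  # the callback must be a task
# def summarize_batch(results):
#     return {"processed": len(results), "status": "complete"}
# 
# batch_workflow = chord(
#     [
#         # TODO: Add five index_document signatures here
#     ],
#     summarize_batch.s()
# )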
# 
# batch_workflow.apply_async()

## Part 5: Implementing RAG Engine Tasks

In [None]:
# Simulating the bulk upload task from tasks.py
class BulkUploadResult:
    def __init__(self):
        self.results = []
        self.success = 0
        self.failure = 0

def simulate_bulk_upload(files):
    """Simulate bulk upload processing."""
    result = BulkUploadResult()
    
    for file_info in files:
        try:
            # Simulate file upload
            print(f"  Uploading: {file_info['filename']}")
            
            # Simulate indexing task
            print(f"    Queued indexing for: {file_info['filename']}")
            
            result.results.append({
                "filename": file_info['filename'],
                "status": "queued"
            })
            result.success += 1
            
        except Exception as e:
            result.results.append({
                "filename": file_info.get('filename', 'unknown'),
                "status": "failed",
                "error": str(e)
            })
            result.failure += 1
    
    return {
        "total": len(files),
        "success": result.success,
        "failures": result.failure,
        "results": result.results
    }

# Test bulk upload
files = [
    {"filename": "doc1.pdf", "content_type": "application/pdf"},
    {"filename": "doc2.pdf", "content_type": "application/pdf"},
    {"filename": "doc3.pdf", "content_type": "application/pdf"},
]

print("Bulk Upload Simulation:")
result = simulate_bulk_upload(files)
print(f"\nResult: {result}")

In [None]:
# Simulating PDF merge task
import io

class MockPDFWriter:
    def __init__(self):
        self.pages = []
    
    def add_page(self, page):
        self.pages.append(page)
    
    def write(self, output):
        output.write(b'MERGED_PDF:' + b','.join(self.pages))

class MockPDFReader:
    def __init__(self, filename):
        self.filename = filename
        self.pages = [f"page_{i+1}".encode() for i in range(3)]

def simulate_pdf_merge(source_docs, merged_filename):
    """Simulate PDF merging."""
    print(f"\nMerging {len(source_docs)} PDFs...")
    
    # Step 1: Read source PDFs
    source_readers = [MockPDFReader(d) for d in source_docs]
    
    # Step 2: Merge pages
    merged_writer = MockPDFWriter()
    for reader in source_readers:
        for page in reader.pages:
            merged_writer.add_page(page)
            print(f"  Added page from {reader.filename}")
    
    # Step 3: Write merged PDF
    merged_bytes = io.BytesIO()
    merged_writer.write(merged_bytes)
    content = merged_bytes.getvalue()
    
    print(f"\nCreated merged PDF: {merged_filename}")
    print(f"Total pages: {len(merged_writer.pages)}")
    print(f"Size: {len(content)} bytes")
    
    return {
        "merged_document_id": "new-doc-123",
        "source_count": len(source_docs),
        "filename": merged_filename,
        "size_bytes": len(content)
    }

# Test PDF merge
result = simulate_pdf_merge(["doc1.pdf", "doc2.pdf"], "merged.pdf")
print(f"\nResult: {result}")

## Part 6: Best Practices Quiz

### Quiz Questions

**Q1:** Why should you use keyword arguments in Celery tasks?
a) They're faster
b) Order-independent, less error-prone
c) Required by Celery
d) Better for type hints

**Q2:** What's the purpose of `bind=True` in a task definition?
a) Bind to specific worker
b) Access task instance for retry and progress
c) Bind to specific queue
d) Enable task chaining

**Q3:** When should you use exponential backoff for retries?
a) Always
b) Never
c) For transient failures (network, timeouts)
d) Only for database errors

In [None]:
# Answer check
quiz_answers = {
    "Q1": "b",
    "Q2": "b",
    "Q3": "c"
}

for q, answer in quiz_answers.items():
    print(f"{q}: {answer}")

## Part 7: Monitoring with Prometheus

In [None]:
# Simulated metrics collection
from collections import defaultdict
import time

class TaskMetrics:
    def __init__(self):
        self.count = defaultdict(lambda: defaultdict(int))
        self.duration = defaultdict(list)
    
    def record(self, task_name, status, duration_ms):
        self.count[task_name][status] += 1
        self.duration[task_name].append(duration_ms)
    
    def get_summary(self):
        summary = {}
        for task in self.count:
            total = sum(self.count[task].values())
            success = self.count[task].get("success", 0)
            failure = self.count[task].get("failure", 0)
            durations = self.duration[task]
            avg_duration = sum(durations) / len(durations) if durations else 0
            
            summary[task] = {
                "total": total,
                "success": success,
                "failure": failure,
                "success_rate": success / total if total > 0 else 0,
                "avg_duration_ms": round(avg_duration, 2)
            }
        return summary

# Simulate task execution with metrics
metrics = TaskMetrics()

def execute_task_with_metrics(task_name, func):
    """Execute task and record metrics."""
    start = time.perf_counter()
    try:
        result = func()
    except Exception:
        metrics.record(task_name, "failure", (time.perf_counter() - start) * 1000)
        raise
    metrics.record(task_name, "success", (time.perf_counter() - start) * 1000)
    return result

# Simulate tasks
import random

def flaky_index():
    """Simulated task: takes about 100 ms and fails roughly 10% of the time."""
    time.sleep(0.1)
    if random.random() < 0.1:
        raise RuntimeError("Failed")

for i in range(10):
    try:
        execute_task_with_metrics("index_document", flaky_index)
    except RuntimeError:
        pass  # the failure has already been recorded in metrics

print("Task Metrics Summary:")
import json
print(json.dumps(metrics.get_summary(), indent=2))
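
The summary above is a plain dict; Prometheus scrapes metrics in a line-based text format instead. The sketch below renders a summary of the same shape into that format by hand. The metric names `celery_task_total` and `celery_task_avg_duration_ms` are assumptions made for this example, and the numbers in `example_summary` are made up for illustration. In a real service a client library such as `prometheus_client` would normally produce this output.

```python
def to_prometheus_text(summary):
    """Render {task: {"success": n, "failure": n, "avg_duration_ms": x}} as exposition text."""
    lines = [
        "# HELP celery_task_total Tasks executed, by task name and status.",
        "# TYPE celery_task_total counter",
    ]
    for task, stats in sorted(summary.items()):
        for status in ("success", "failure"):
            lines.append(
                f'celery_task_total{{task="{task}",status="{status}"}} {stats[status]}'
            )
    lines += [
        "# HELP celery_task_avg_duration_ms Mean task duration in milliseconds.",
        "# TYPE celery_task_avg_duration_ms gauge",
    ]
    for task, stats in sorted(summary.items()):
        lines.append(
            f'celery_task_avg_duration_ms{{task="{task}"}} {stats["avg_duration_ms"]}'
        )
    return "\n".join(lines) + "\n"

# Illustrative values only, in the same shape as metrics.get_summary().
example_summary = {
    "index_document": {"success": 9, "failure": 1, "avg_duration_ms": 101.5}
}
text = to_prometheus_text(example_summary)
print(text)
```

Counts map naturally to a counter with `task` and `status` labels, while the running average is exposed as a gauge because it can go up or down between scrapes.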

## Summary

**Key concepts covered:**

1. **Background tasks** allow non-blocking operations
2. **Celery architecture**: Broker → Worker → Backend
3. **Task patterns**: Chaining, grouping, chords
4. **RAG Engine tasks**: Bulk upload, PDF merge, chat enhancements
5. **Best practices**: Keyword arguments, retry strategies, monitoring
