# Cloud File Storage + Celery Tasks (Deep Dive)

This notebook is a hands-on companion to the detailed guide in:

`docs/learning/infrastructure/03-cloud-filestore-celery.md`

Goals:
- Inspect configuration for file storage backends.
- Create a filestore via the factory.
- Perform a streaming upload (local backend for demo).
- Enqueue a bulk upload task (best effort; requires Redis + Celery worker).
- Read worker status (best effort; requires Celery worker).


## Step 0: Setup and Imports

We add the project root to `sys.path` so imports work when running from the notebook directory.


In [None]:
import os
import sys
from pathlib import Path

# Find rag-engine-mini root by walking upwards
current = Path.cwd().resolve()
repo_root = None
for parent in [current, *current.parents]:
    if (parent / "src").exists() and (parent / "notebooks").exists():
        repo_root = parent
        break

if repo_root is None:
    raise RuntimeError("Could not locate rag-engine-mini root for imports")

sys.path.insert(0, str(repo_root))

print("Repo root:", repo_root)


## Step 1: Force Local Backend for Safe Demo

To avoid cloud dependency errors in a demo environment, we force `FILESTORE_BACKEND=local` *before* importing settings.


In [None]:
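# Assign directly: setdefault() would keep a FILESTORE_BACKEND already set in the shell.
os.environ["FILESTORE_BACKEND"] = "local"
# UPLOAD_DIR is only a fallback, so an existing value is respected.
os.environ.setdefault("UPLOAD_DIR", str(repo_root / "uploads"))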
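
Why the ordering matters: settings objects commonly read environment variables once, at the moment they are constructed (often at import time). The sketch below illustrates that behavior with a hypothetical `DemoSettings` class and a made-up `DEMO_FILESTORE_BACKEND` variable. It is an illustration only, not the project's `settings` implementation.

```python
import os
from dataclasses import dataclass, field


def _read_backend() -> str:
    # Hypothetical variable name, used only for this illustration.
    return os.environ.get("DEMO_FILESTORE_BACKEND", "s3")


@dataclass(frozen=True)
class DemoSettings:
    # The environment is read once, when the instance is created.
    filestore_backend: str = field(default_factory=_read_backend)


os.environ.pop("DEMO_FILESTORE_BACKEND", None)
early = DemoSettings()  # created before the variable is set

os.environ["DEMO_FILESTORE_BACKEND"] = "local"
late = DemoSettings()   # created after the variable is set

print("early:", early.filestore_backend)
print("late:", late.filestore_backend)
```

An instance created before the variable is set keeps the old value, which is why the environment is prepared before `settings` is imported.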


## Step 2: Load Settings and Create FileStore

We use the factory to get the correct backend instance.


In [None]:
from src.core.config import settings
from src.adapters.filestore.factory import create_file_store

print("FILESTORE_BACKEND:", settings.filestore_backend)
file_store = create_file_store(settings)
print("File store:", type(file_store).__name__)
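
The internals of `create_file_store` are not shown in this notebook. As a rough sketch of the general factory pattern, assuming a simple name-to-constructor registry, something like the following could select a backend. `DemoConfig`, `LocalDemoStore`, `CloudDemoStore`, and `create_demo_store` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DemoConfig:
    filestore_backend: str = "local"
    upload_dir: str = "uploads"


class LocalDemoStore:
    def __init__(self, config: DemoConfig) -> None:
        self.root = config.upload_dir


class CloudDemoStore:
    def __init__(self, config: DemoConfig) -> None:
        # A real cloud backend would create an SDK client here.
        self.bucket = "demo-bucket"


# Registry mapping a backend name to a constructor.
_BACKENDS: Dict[str, Callable[[DemoConfig], object]] = {
    "local": LocalDemoStore,
    "cloud": CloudDemoStore,
}


def create_demo_store(config: DemoConfig) -> object:
    """Return the store that matches config.filestore_backend."""
    try:
        builder = _BACKENDS[config.filestore_backend.lower()]
    except KeyError:
        raise ValueError(f"Unknown backend: {config.filestore_backend!r}") from None
    return builder(config)


store = create_demo_store(DemoConfig(filestore_backend="local"))
print(type(store).__name__)
```

Callers depend only on the factory, so switching backends is a configuration change and not a code change.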


## Step 3: Streaming Upload Demo (Local Backend)

We stream bytes in small chunks. `save_upload_stream` returns a `StoredFile` together with the SHA256 hash of the uploaded content.


In [None]:
import asyncio

sample_content = ("hello world\n" * 1024).encode("utf-8")

async def byte_stream(data: bytes, chunk_size: int = 256):
    for i in range(0, len(data), chunk_size):
        yield data[i : i + chunk_size]

async def run_stream_upload():
    stored, sha256 = await file_store.save_upload_stream(
        tenant_id="demo-tenant",
        upload_filename="demo.txt",
        content_type="text/plain",
        data_stream=byte_stream(sample_content),
    )
    return stored, sha256

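# Jupyter already runs an event loop, so asyncio.run() would raise RuntimeError here;
# use top-level await instead (in a plain script, wrap the call in asyncio.run()).
stored_file, sha256 = await run_stream_upload()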
print("Stored path:", stored_file.path)
print("Size bytes:", stored_file.size_bytes)
print("SHA256:", sha256)
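
To make the "hash while streaming" idea concrete, here is a minimal sketch of how a backend might write chunks to disk and update a SHA256 digest in the same loop, so the payload is never held in memory as a whole. `save_stream_with_hash`, `demo_chunks`, and `demo` are hypothetical helpers; the project's `save_upload_stream` may work differently.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import AsyncIterator, Tuple


async def save_stream_with_hash(
    data_stream: AsyncIterator[bytes], dest: Path
) -> Tuple[int, str]:
    """Write chunks to dest while hashing them; return (size_bytes, sha256_hex)."""
    hasher = hashlib.sha256()
    size = 0
    with dest.open("wb") as fh:
        async for chunk in data_stream:
            hasher.update(chunk)  # digest is updated chunk by chunk
            fh.write(chunk)       # blocking write, acceptable for a small demo
            size += len(chunk)
    return size, hasher.hexdigest()


async def demo_chunks(data: bytes, chunk_size: int = 256) -> AsyncIterator[bytes]:
    """Yield data in fixed-size slices, like the byte_stream helper above."""
    for i in range(0, len(data), chunk_size):
        yield data[i : i + chunk_size]


async def demo() -> Tuple[int, str]:
    """Stream a small payload into a temporary directory and return the result."""
    payload = b"hello world\n" * 1024
    with tempfile.TemporaryDirectory() as tmp:
        return await save_stream_with_hash(demo_chunks(payload), Path(tmp) / "demo.txt")
```

In a notebook cell, call it with `size, digest = await demo()`; in a plain script, use `asyncio.run(demo())`. The digest matches `hashlib.sha256(payload).hexdigest()` computed over the full payload.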


## Step 4: Cleanup (Optional)

Remove the stored file created during the demo.


In [None]:
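async def cleanup():
    # Delete the file created in Step 3 and return the backend's result.
    return await file_store.delete(stored_file.path)

# Top-level await: asyncio.run() raises RuntimeError inside Jupyter's running event loop.
print("Deleted:", await cleanup())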


## Step 5: Enqueue a Bulk Upload Task (Best Effort)

This requires:
- Redis running
- Celery worker running

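If Redis is unreachable, the enqueue call raises; the cell below catches the exception and prints it. Without a running worker, the broker typically still accepts the task, but it stays queued until a worker picks it up.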
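
Before enqueueing, it can help to check whether anything is listening on the broker port at all. The sketch below is a plain TCP probe, assuming Redis on its default port 6379 on `localhost`. `broker_reachable` is a hypothetical helper, and a successful connection only shows that the port is open, not that the service behind it is Redis.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 6379, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False


print("Redis port reachable:", broker_reachable())
```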


In [None]:
from src.core.bootstrap import get_container

container = get_container()
queue = container.get("task_queue")

files_payload = [
    {
        "filename": "bulk-demo.txt",
        "content_type": "text/plain",
        "content": b"bulk content",
    }
]

try:
    task_id = queue.enqueue_bulk_upload(
        tenant_id="demo-tenant",
        files=files_payload,
    )
    print("Enqueued bulk upload task:", task_id)
except Exception as exc:
    print("Bulk enqueue failed (expected if broker is offline):", exc)
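
One detail worth checking in the payload above: `content` is raw `bytes`. Celery's default task serializer is JSON, which cannot encode `bytes`. Whether `enqueue_bulk_upload` already handles this is not shown in this notebook. If it does not, one option is to base64-encode the content before enqueueing and decode it in the worker. `encode_file` and `decode_file` below are hypothetical helpers.

```python
import base64
import json
from typing import Any, Dict


def encode_file(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy whose 'content' is base64 text instead of raw bytes."""
    return {**entry, "content": base64.b64encode(entry["content"]).decode("ascii")}


def decode_file(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Inverse of encode_file, intended for the worker side."""
    return {**entry, "content": base64.b64decode(entry["content"])}


original = {
    "filename": "bulk-demo.txt",
    "content_type": "text/plain",
    "content": b"bulk content",
}
wire = json.dumps(encode_file(original))  # every value is now a JSON type
restored = decode_file(json.loads(wire))
print("Round trip intact:", restored == original)
```

Base64 inflates the payload by roughly a third, so for large files it is usually better to pass a storage path or key through the queue than the file content itself.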


## Step 6: Worker Status (Best Effort)

This cell reads Celery worker state through Celery's inspect API. If no worker or broker is available, the error is caught and printed.


In [None]:
try:
    from src.workers.monitoring import get_worker_status
    status = get_worker_status()
    print(status)
except Exception as exc:
    print("Worker status unavailable:", exc)
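
How `get_worker_status` is implemented is not shown here. As a rough sketch: Celery's `app.control.inspect().ping()` returns a dict keyed by worker name (for example `{"celery@host": {"ok": "pong"}}`), or `None` when no worker replies. The hypothetical helper `summarize_ping` below condenses such a reply and can be tried without a broker.

```python
from typing import Any, Dict, Optional


def summarize_ping(replies: Optional[Dict[str, Dict[str, Any]]]) -> Dict[str, Any]:
    """Turn an inspect().ping()-style reply into a small status dict."""
    replies = replies or {}  # None means no worker answered
    alive = sorted(name for name, reply in replies.items() if reply.get("ok") == "pong")
    return {"online": bool(alive), "worker_count": len(alive), "workers": alive}


print(summarize_ping(None))
print(summarize_ping({"celery@host-a": {"ok": "pong"}}))
```

With a real Celery app object, the call would look like `summarize_ping(app.control.inspect(timeout=1.0).ping())`.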
