# Task 2: Build a File Upload and Processing API

## Scenario
Build a FastAPI service that handles file uploads with background processing:
1. Accept file uploads with validation
2. Process files in background tasks
3. Track processing status
4. Use async file operations for efficiency

## Your Tasks
1. **File upload endpoint**: Accept and validate files
2. **Background processing**: Process files asynchronously
3. **Status tracking**: Check processing status
4. **Results retrieval**: Get processed results
5. **Error handling**: Handle file errors gracefully

## Setup (provided)

In [None]:
import json
import time
import uuid
from pathlib import Path
from typing import Dict, List, Optional
from contextlib import asynccontextmanager

from fastapi import FastAPI, File, UploadFile, HTTPException, BackgroundTasks, status
from fastapi.testclient import TestClient
from pydantic import BaseModel, Field
import aiofiles

print("Imports successful!")

### Create temporary directory for file storage

In [None]:
import tempfile
import shutil

# Create temp directory for this session
TEMP_DIR = Path(tempfile.mkdtemp())
UPLOAD_DIR = TEMP_DIR / "uploads"
RESULTS_DIR = TEMP_DIR / "results"

UPLOAD_DIR.mkdir(exist_ok=True)
RESULTS_DIR.mkdir(exist_ok=True)

print(f"Upload directory: {UPLOAD_DIR}")
print(f"Results directory: {RESULTS_DIR}")

---
## Task 1: Setup Job Tracking and App

Create:
- A global dictionary `jobs` to track processing jobs
  - Structure: `{job_id: {"status": str, "filename": str, "result": dict}}`
  - The `"result"` key is only filled in once processing has finished
- An async `lifespan` context manager that handles cleanup on shutdown
- A FastAPI app that uses this `lifespan`

Job statuses: `"pending"`, `"processing"`, `"completed"`, `"failed"`
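
The lifespan pattern can be sketched with the standard library alone. This is a minimal illustration, not the expected solution: `jobs_sketch`, `sketch_dir`, `lifespan_sketch`, and `demo` are hypothetical names, and `sketch_dir` stands in for the notebook's `TEMP_DIR`. In the real app, a function shaped like this would be passed as `FastAPI(lifespan=...)`.

```python
import asyncio
import shutil
import tempfile
from contextlib import asynccontextmanager
from pathlib import Path
from typing import Any, Dict

# In-memory job registry: job_id -> {"status": ..., "filename": ..., "result": ...}
jobs_sketch: Dict[str, Dict[str, Any]] = {}

# Stand-in for the notebook's TEMP_DIR so the sketch runs on its own
sketch_dir = Path(tempfile.mkdtemp())


@asynccontextmanager
async def lifespan_sketch(app: Any):
    # Code before `yield` runs once at startup
    jobs_sketch.clear()
    yield
    # Code after `yield` runs once at shutdown
    jobs_sketch.clear()
    shutil.rmtree(sketch_dir, ignore_errors=True)


async def demo() -> bool:
    # Drive the context manager by hand; a framework would normally do this
    async with lifespan_sketch(app=None):
        jobs_sketch["demo"] = {"status": "pending", "filename": "demo.txt"}
    return sketch_dir.exists()


# In a notebook, use `await demo()` instead, since an event loop is already running
dir_still_exists = asyncio.run(demo())
print(dir_still_exists, jobs_sketch)
```

Putting the cleanup after `yield` keeps startup and shutdown logic in one place, which is why a single context manager is enough for both.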

In [None]:
# YOUR CODE HERE
# 1. Create jobs dict
# 2. Create lifespan with cleanup
# 3. Create FastAPI app



In [None]:
# TEST - Do not modify
assert 'jobs' in dir(), "jobs dict not found"
assert 'lifespan' in dir(), "lifespan function not found"
assert 'app' in dir(), "app not found"
assert isinstance(jobs, dict), "jobs should be a dict"

client = TestClient(app)

print("✓ Task 1 PASSED!")
print("  Job tracking and app initialized")

---
## Task 2: Define Pydantic Models

Create:

1. **`UploadResponse`**:
   - `job_id`: str
   - `filename`: str
   - `status`: str
   - `message`: str

2. **`JobStatus`**:
   - `job_id`: str
   - `status`: str
   - `filename`: str
   - `result`: Optional[Dict] = None

3. **`ProcessingResult`**:
   - `line_count`: int
   - `word_count`: int
   - `char_count`: int
   - `processing_time`: float

In [None]:
# YOUR CODE HERE
# Define all three Pydantic models



In [None]:
# TEST - Do not modify
assert 'UploadResponse' in dir(), "UploadResponse not found"
assert 'JobStatus' in dir(), "JobStatus not found"
assert 'ProcessingResult' in dir(), "ProcessingResult not found"

# Test models
upload_resp = UploadResponse(
    job_id="test-123",
    filename="test.txt",
    status="pending",
    message="File uploaded"
)
assert upload_resp.job_id == "test-123"

job_status = JobStatus(
    job_id="test-123",
    status="completed",
    filename="test.txt",
    result={"lines": 10}
)
assert job_status.result is not None

result = ProcessingResult(
    line_count=10,
    word_count=50,
    char_count=300,
    processing_time=0.5
)
assert result.line_count == 10

print("✓ Task 2 PASSED!")
print("  All Pydantic models defined")

---
## Task 3: Implement Background Processing Function

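Create an async function `process_file(job_id: str, file_path: Path)` that:
1. Updates the job status to "processing"
2. Reads the file asynchronously with `aiofiles`
3. Counts lines, words, and characters
4. Saves the result to the `jobs` dict under `"result"`, with at least the keys `line_count`, `word_count`, and `char_count`
5. Updates the status to "completed" on success or "failed" on error

Wrap the work in `try`/`except` so that any error marks the job as "failed" instead of propagating.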
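
The steps above can be sketched as follows. To stay runnable without third-party packages, this version swaps `aiofiles` for `asyncio.to_thread` around a blocking read; the exercise itself asks for `aiofiles`. The names `jobs_sketch`, `summarize_text`, `process_file_sketch`, and the extra `"error"` key are assumptions made for illustration.

```python
import asyncio
import tempfile
import time
from pathlib import Path
from typing import Any, Dict

jobs_sketch: Dict[str, Dict[str, Any]] = {}


def summarize_text(text: str) -> Dict[str, Any]:
    # splitlines() counts the final line even when it has no trailing newline
    return {
        "line_count": len(text.splitlines()),
        "word_count": len(text.split()),
        "char_count": len(text),
    }


async def process_file_sketch(job_id: str, file_path: Path) -> None:
    jobs_sketch[job_id]["status"] = "processing"
    started = time.perf_counter()
    try:
        # Stand-in for an aiofiles read: run the blocking read in a worker thread
        text = await asyncio.to_thread(file_path.read_text, encoding="utf-8")
        result = summarize_text(text)
        result["processing_time"] = time.perf_counter() - started
        jobs_sketch[job_id]["result"] = result
        jobs_sketch[job_id]["status"] = "completed"
    except Exception as exc:  # e.g. a missing file or undecodable bytes
        jobs_sketch[job_id]["status"] = "failed"
        jobs_sketch[job_id]["error"] = str(exc)  # hypothetical extra key


async def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.txt"
        path.write_text("Hello world\nThis is a test\nThree lines total", encoding="utf-8")
        jobs_sketch["ok"] = {"status": "pending", "filename": path.name}
        jobs_sketch["missing"] = {"status": "pending", "filename": "nope.txt"}
        await process_file_sketch("ok", path)
        await process_file_sketch("missing", Path(tmp) / "nope.txt")


# In a notebook, use `await demo()` instead of asyncio.run(...)
asyncio.run(demo())
print(jobs_sketch["ok"]["status"], jobs_sketch["missing"]["status"])
```

Counting lines with `splitlines()` rather than counting `"\n"` characters matters for files whose last line has no trailing newline.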

In [None]:
# YOUR CODE HERE
# Implement async process_file function



In [None]:
# TEST - Do not modify
assert 'process_file' in dir(), "process_file function not found"

# Create test file
test_file = UPLOAD_DIR / "test_processing.txt"
test_file.write_text("Hello world\nThis is a test\nThree lines total")

# Test processing
test_job_id = "test-job-123"
jobs[test_job_id] = {"status": "pending", "filename": "test_processing.txt"}

# Run async function (use await in Jupyter since event loop is already running)
await process_file(test_job_id, test_file)

# Check results
assert jobs[test_job_id]['status'] == 'completed', f"Expected completed, got {jobs[test_job_id]['status']}"
assert 'result' in jobs[test_job_id], "Result not found in job"
result = jobs[test_job_id]['result']
assert result['line_count'] == 3, f"Expected 3 lines, got {result['line_count']}"
assert result['word_count'] > 0, "Word count should be > 0"
assert result['char_count'] > 0, "Char count should be > 0"

print("✓ Task 3 PASSED!")
print(f"  Processing result: {result}")

---
## Task 4: Implement File Upload Endpoint

Create a POST endpoint `/upload` that:
- Accepts an `UploadFile`
- Validates the file: only `.txt` files are allowed, with a maximum size of 10 MB (an invalid file type must return status 400)
- Saves the file to `UPLOAD_DIR`
- Creates a job entry with status "pending"
- Schedules background processing
- Returns an `UploadResponse`

Use `BackgroundTasks` to schedule the processing.
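
The validation rules can be kept in a small pure function that the endpoint calls before saving anything. This is a sketch under assumptions: `validate_upload`, `safe_storage_name`, and the constants are hypothetical names, and "10 MB" is read here as 10 MiB. In the endpoint, a non-`None` message would typically be turned into an `HTTPException` with status 400.

```python
from pathlib import Path
from typing import Optional

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # "10 MB" interpreted as 10 MiB (an assumption)
ALLOWED_SUFFIXES = {".txt"}


def validate_upload(filename: Optional[str], size: int) -> Optional[str]:
    """Return an error message for an invalid upload, or None if it is acceptable."""
    if not filename:
        return "Missing filename"
    if Path(filename).suffix.lower() not in ALLOWED_SUFFIXES:
        return "Only .txt files are allowed"
    if size > MAX_UPLOAD_BYTES:
        return "File exceeds the 10 MB limit"
    return None


def safe_storage_name(job_id: str, filename: str) -> str:
    # Keep only the final path component so "../" segments are dropped,
    # and prefix the job id so two uploads of "test.txt" do not collide
    return f"{job_id}_{Path(filename).name}"


print(validate_upload("test.txt", 20))
print(validate_upload("test.exe", 8))
print(safe_storage_name("abc123", "../../notes.txt"))
```

Keeping validation separate from the endpoint makes it easy to test without a running app.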

In [None]:
# YOUR CODE HERE
# Implement /upload endpoint



In [None]:
# TEST - Do not modify
from io import BytesIO

client = TestClient(app)

# Create test file content
test_content = b"Line 1\nLine 2\nLine 3"
test_file = ("test.txt", BytesIO(test_content), "text/plain")

# Upload file
response = client.post(
    "/upload",
    files={"file": test_file}
)

assert response.status_code == 200, f"Expected 200, got {response.status_code}"
data = response.json()
assert 'job_id' in data, "Response missing job_id"
assert 'filename' in data, "Response missing filename"
assert 'status' in data, "Response missing status"
assert data['status'] == 'pending', f"Expected pending, got {data['status']}"

job_id = data['job_id']
assert job_id in jobs, "Job not found in jobs dict"

# Test invalid file type
invalid_file = ("test.exe", BytesIO(b"fake exe"), "application/x-msdownload")
response = client.post(
    "/upload",
    files={"file": invalid_file}
)
assert response.status_code == 400, "Invalid file type should return 400"

# Wait a bit for background task
time.sleep(1)

print("✓ Task 4 PASSED!")
print(f"  File uploaded with job_id: {job_id}")

---
## Task 5: Implement Status and Results Endpoints

Create:

1. **GET `/jobs/{job_id}`**: Returns the job's `JobStatus`
   - 404 if the job is not found

2. **GET `/jobs`**: Returns a list of all jobs

3. **DELETE `/jobs/{job_id}`**: Deletes the job and its associated files
   - 404 if the job is not found
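
The logic behind the list and delete endpoints can be sketched without FastAPI. The helpers `list_jobs` and `delete_job` are hypothetical, and the sketch assumes each upload is stored at `upload_dir / filename`; adjust this if files are stored under a different name. In an endpoint, the `KeyError` case would map to a 404 response.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def list_jobs(jobs: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Flatten the {job_id: record} mapping into a list, adding job_id to each entry
    return [{"job_id": job_id, **record} for job_id, record in jobs.items()]


def delete_job(jobs: Dict[str, Dict[str, Any]], job_id: str, upload_dir: Path) -> Dict[str, Any]:
    # pop() raises KeyError for an unknown job_id: the "not found" case
    record = jobs.pop(job_id)
    # missing_ok=True avoids an error if the file was already removed
    (upload_dir / record["filename"]).unlink(missing_ok=True)
    return record


with tempfile.TemporaryDirectory() as tmp:
    upload_dir = Path(tmp)
    (upload_dir / "a.txt").write_text("hello", encoding="utf-8")
    jobs_sketch = {"job-1": {"status": "completed", "filename": "a.txt"}}

    listed = list_jobs(jobs_sketch)
    deleted = delete_job(jobs_sketch, "job-1", upload_dir)
    file_gone = not (upload_dir / "a.txt").exists()

    try:
        delete_job(jobs_sketch, "job-1", upload_dir)
        second_delete_failed = False
    except KeyError:
        second_delete_failed = True

print(listed, file_gone, second_delete_failed)
```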

In [None]:
# YOUR CODE HERE
# Implement status and management endpoints



In [None]:
# TEST - Do not modify
client = TestClient(app)

# First upload a file
test_content = b"Test line 1\nTest line 2"
response = client.post(
    "/upload",
    files={"file": ("status_test.txt", BytesIO(test_content), "text/plain")}
)
job_id = response.json()['job_id']

# Wait for processing
time.sleep(1)

# Test get specific job
response = client.get(f"/jobs/{job_id}")
assert response.status_code == 200, f"Expected 200, got {response.status_code}"
data = response.json()
assert data['job_id'] == job_id
assert data['status'] in ['pending', 'processing', 'completed'], f"Unexpected status: {data['status']}"

# Test get all jobs
response = client.get("/jobs")
assert response.status_code == 200
jobs_list = response.json()
assert isinstance(jobs_list, list), "Expected list of jobs"
assert len(jobs_list) > 0, "Should have at least one job"

# Test delete job
response = client.delete(f"/jobs/{job_id}")
assert response.status_code == 200

# Verify job deleted
response = client.get(f"/jobs/{job_id}")
assert response.status_code == 404, "Deleted job should return 404"

# Test non-existent job
response = client.get("/jobs/fake-job-id")
assert response.status_code == 404, "Non-existent job should return 404"

print("✓ Task 5 PASSED!")
print("  Status and management endpoints working")

---
## Bonus: Test Complete Upload Flow

In [None]:
# Load real test file
with open('../fixtures/input/test_file.txt', 'rb') as f:
    file_content = f.read()

print("=== Complete File Upload Flow ===")
print()

# 1. Upload file
response = client.post(
    "/upload",
    files={"file": ("test_file.txt", BytesIO(file_content), "text/plain")}
)
print("1. File Upload:")
upload_data = response.json()
print(f"   Job ID: {upload_data['job_id']}")
print(f"   Status: {upload_data['status']}")
print()

job_id = upload_data['job_id']

# 2. Check status immediately
response = client.get(f"/jobs/{job_id}")
print("2. Initial Status:")
status_data = response.json()
print(f"   Status: {status_data['status']}")
print()

# 3. Wait for processing
print("3. Waiting for processing...")
time.sleep(2)

# 4. Check final status
response = client.get(f"/jobs/{job_id}")
print("\n4. Final Status:")
final_data = response.json()
print(f"   Status: {final_data['status']}")
if final_data.get('result'):
    print("   Results:")
    for key, value in final_data['result'].items():
        print(f"     - {key}: {value}")
print()

# 5. List all jobs
response = client.get("/jobs")
all_jobs = response.json()
print(f"5. Total Jobs: {len(all_jobs)}")
print()

print("✓ Complete flow test passed!")

---
## Cleanup

In [None]:
# Clean up the temp directory (ignore_errors in case lifespan cleanup already removed it)
shutil.rmtree(TEMP_DIR, ignore_errors=True)
print(f"Cleaned up: {TEMP_DIR}")

---
## Expected Results

After completing all tasks:
- **Task 1**: Job tracking system initialized
- **Task 2**: Pydantic models for upload/status
- **Task 3**: Background processing function
- **Task 4**: File upload with validation
- **Task 5**: Status checking and job management

## Key Concepts

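1. **File uploads**: `UploadFile` spools large uploads to a temporary file instead of holding the whole body in memory
2. **Background tasks**: `BackgroundTasks` runs processing after the response is sent, so requests are not blocked
3. **Async file I/O**: Use `aiofiles` for non-blocking file operations
4. **Job tracking**: Maintain job state in the shared `jobs` dict across requests
5. **Validation**: Check file type and size before processing
6. **Error handling**: Fail gracefully and return appropriate status codes (400 for invalid files, 404 for unknown jobs)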

## Common Pitfalls
- Forgetting to handle file encoding errors
- Not cleaning up uploaded files
- Blocking operations in async context
- Not validating file size before reading
- Race conditions in job status updates
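
One way to avoid the size pitfall is to read the upload in chunks and stop as soon as the limit is exceeded. The sketch below uses a synchronous file-like object so it runs on its own; `read_limited` is a hypothetical helper. With `UploadFile`, the same loop would use `await file.read(chunk_size)`.

```python
from io import BytesIO
from typing import BinaryIO, List


def read_limited(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a stream in chunks, raising ValueError once it exceeds max_bytes."""
    chunks: List[bytes] = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        total += len(chunk)
        if total > max_bytes:
            # Stop early instead of buffering the rest of an oversized upload
            raise ValueError(f"upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


small = read_limited(BytesIO(b"Line 1\nLine 2\n"), max_bytes=1024)

try:
    read_limited(BytesIO(b"x" * 2048), max_bytes=1024, chunk_size=256)
    rejected = False
except ValueError:
    rejected = True

print(len(small), rejected)
```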