# Notebook 18: Backend API Changes - File Upload vs JSON CRUD

## 🎯 What You'll Learn

In our **Todo app**, every API endpoint worked with plain JSON. **File uploads** change that: the upload endpoint receives `multipart/form-data` instead of JSON, so FastAPI needs a different parameter type and your CRUD code gains a storage step. This notebook shows you exactly what changes in your backend code when you go from text-based CRUD to file-based CRUD.

## 📊 Todo API vs PDF API: Key Differences

| Aspect | Todo App | PDF App | Why Different? |
|---------|----------|---------|----------------|
| **Request Format** | JSON (`application/json`) | Multipart (`multipart/form-data`) | JSON has no binary type; files would need base64 encoding |
| **Parameter Type** | Pydantic models | `UploadFile` + Pydantic | Need special file handling |
| **Endpoints** | 4 CRUD endpoints | 5 endpoints (CRUD + upload) | Upload needs separate logic |
| **Error Handling** | Database errors only | Database + AWS + file errors | Multiple services can fail |
| **Data Processing** | Direct to database | File → S3, metadata → database | Two-step storage process |

---

**💡 Key Insight**: File uploads require fundamentally different endpoint patterns because browsers send files and data in a completely different format than JSON.

## Part 1: What We Had - Todo App API Endpoints

### Todo App CRUD Endpoints (Simple JSON)

```python
# Todo app routers/todos.py
from typing import List, Optional

from fastapi import APIRouter, Depends, HTTPException
from sqlalchemy.orm import Session
import schemas, crud
from database import get_db  # assumed location of the session dependency

router = APIRouter(prefix="/todos")

# CREATE - Simple JSON input
@router.post("", response_model=schemas.TodoResponse)
def create_todo(todo: schemas.TodoRequest, db: Session = Depends(get_db)):
    return crud.create_todo(db, todo)

# READ - Simple query parameters
@router.get("", response_model=List[schemas.TodoResponse])
def get_todos(completed: Optional[bool] = None, db: Session = Depends(get_db)):
    return crud.get_todos(db, completed=completed)

# UPDATE - Simple JSON input
@router.put("/{todo_id}", response_model=schemas.TodoResponse)
def update_todo(todo_id: int, todo: schemas.TodoRequest, db: Session = Depends(get_db)):
    return crud.update_todo(db, todo_id, todo)

# DELETE - Just the ID
@router.delete("/{todo_id}")
def delete_todo(todo_id: int, db: Session = Depends(get_db)):
    return crud.delete_todo(db, todo_id)
```

### Todo App Request/Response Models
```python
# Todo app schemas.py
from pydantic import BaseModel, ConfigDict

class TodoRequest(BaseModel):
    name: str
    completed: bool

class TodoResponse(BaseModel):
    id: int
    name: str
    completed: bool
    
    model_config = ConfigDict(from_attributes=True)
```

**Why this worked**: Todo data is simple - just text and boolean values that serialize perfectly to/from JSON.

## Part 2: What Changes - PDF App File Upload Endpoints

### PDF App Endpoints (Mixed JSON + File)

```python
# PDF app routers/pdfs.py
from typing import List, Optional
from uuid import uuid4

from fastapi import APIRouter, Depends, HTTPException, UploadFile, File
from sqlalchemy.orm import Session
import schemas, crud
from database import get_db  # assumed location of the session dependency

router = APIRouter(prefix="/pdfs")

# CREATE - Still JSON (for manual creation)
@router.post("", response_model=schemas.PDFResponse)
def create_pdf(pdf: schemas.PDFRequest, db: Session = Depends(get_db)):
    return crud.create_pdf(db, pdf)

# NEW: UPLOAD - Special file endpoint
@router.post("/upload", response_model=schemas.PDFResponse)
def upload_pdf(file: UploadFile = File(...), db: Session = Depends(get_db)):
    # Prefix a UUID so two uploads with the same name never collide
    file_name = f"{uuid4()}-{file.filename}"
    return crud.upload_pdf(db, file, file_name)

# READ - Same as Todo (JSON response)
@router.get("", response_model=List[schemas.PDFResponse])
def get_pdfs(selected: Optional[bool] = None, db: Session = Depends(get_db)):
    return crud.read_pdfs(db, selected)

# UPDATE - Same as Todo (JSON)
@router.put("/{id}", response_model=schemas.PDFResponse)
def update_pdf(id: int, pdf: schemas.PDFRequest, db: Session = Depends(get_db)):
    return crud.update_pdf(db, id, pdf)

# DELETE - Same as Todo
@router.delete("/{id}")
def delete_pdf(id: int, db: Session = Depends(get_db)):
    return crud.delete_pdf(db, id)
```

### Key Differences Highlighted:

1. **New Import**: `UploadFile, File` - not needed in Todo app
2. **New Endpoint**: `/upload` - Todo app didn't need this
3. **Different Parameter**: `UploadFile = File(...)` vs Pydantic model
4. **UUID Generation**: Unique filenames - Todo app used simple IDs
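
The UUID prefix prevents name collisions, but `file.filename` is supplied by the client, so it may be empty or contain directory parts. A minimal sketch of one way to build a safer storage key, assuming a hypothetical helper `make_object_key` and a hypothetical fallback name `upload.pdf` (neither is part of the original app):

```python
from typing import Optional
from uuid import uuid4


def make_object_key(filename: Optional[str]) -> str:
    """Return a unique storage key such as '<uuid>-document.pdf'."""
    # Treat backslashes as separators too, then keep only the last path component
    base = (filename or "").replace("\\", "/").rsplit("/", 1)[-1].strip()
    if not base:
        base = "upload.pdf"  # hypothetical fallback when no usable name was sent
    return f"{uuid4()}-{base}"


key_a = make_object_key("../../etc/document.pdf")
key_b = make_object_key("document.pdf")
print(key_a.endswith("-document.pdf"))  # True: directory parts were dropped
print(key_a != key_b)                   # True: each key gets its own UUID
```

In the router you could then call `make_object_key(file.filename)` instead of formatting the string inline.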

## Part 3: Request Format Changes - JSON vs Multipart

### Todo App Requests (JSON)
```bash
# Creating a todo - simple JSON
curl -X POST "http://localhost:8000/todos" \
     -H "Content-Type: application/json" \
     -d '{"name": "Learn FastAPI", "completed": false}'

# Response
{
  "id": 1,
  "name": "Learn FastAPI",
  "completed": false
}
```

### PDF App Requests (Multipart for Upload)
```bash
# Creating a PDF manually - still JSON
curl -X POST "http://localhost:8000/pdfs" \
     -H "Content-Type: application/json" \
     -d '{"name": "manual.pdf", "selected": false, "file": "http://example.com/file.pdf"}'

# UPLOADING a PDF - multipart/form-data
curl -X POST "http://localhost:8000/pdfs/upload" \
     -F "file=@/path/to/document.pdf"

# Response (same format)
{
  "id": 1,
  "name": "document.pdf",
  "selected": false,
  "file": "https://pdf-basic-app.s3.amazonaws.com/uuid-document.pdf"
}
```

### Why Multipart is Required
- **JSON limitation**: No native binary type; a file would have to be base64-encoded, which adds about 33% to its size
- **Browser standard**: File inputs send `multipart/form-data`
- **Efficiency**: Streams large files instead of encoding them
- **Metadata**: Can include filename, content type, etc.
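
To make the JSON limitation concrete, here is a small standard-library sketch. The byte string is arbitrary stand-in data, not a real PDF. It shows that raw bytes are rejected by `json.dumps`, and how much larger the payload becomes once the bytes are base64-encoded:

```python
import base64
import json

raw = bytes(range(256)) * 12  # 3072 bytes of arbitrary binary data

# Raw bytes cannot be placed in a JSON document
try:
    json.dumps({"file": raw})
    json_accepts_bytes = True
except TypeError:
    json_accepts_bytes = False

# The workaround is base64, which turns every 3 bytes into 4 characters
encoded = base64.b64encode(raw).decode("ascii")
payload = json.dumps({"name": "document.pdf", "file": encoded})

print(json_accepts_bytes)       # False
print(len(raw), len(encoded))   # 3072 4096
```

Multipart requests avoid this cost because the file bytes travel as-is inside their own part of the request body.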

## Part 4: Parameter Type Changes - Pydantic vs UploadFile

### Todo App Parameter Handling
```python
# Todo app - Simple Pydantic validation
@router.post("", response_model=schemas.TodoResponse)
def create_todo(
    todo: schemas.TodoRequest,  # ← Pydantic model validates JSON
    db: Session = Depends(get_db)
):
    return crud.create_todo(db, todo)

# What FastAPI does automatically:
# 1. Parse JSON from request body
# 2. Validate against TodoRequest schema
# 3. Create TodoRequest object
# 4. Pass to your function
```

### PDF App Parameter Handling
```python
# PDF app - Special file handling
@router.post("/upload", response_model=schemas.PDFResponse)
def upload_pdf(
    file: UploadFile = File(...),  # ← Special FastAPI type for files
    db: Session = Depends(get_db)
):
    file_name = f"{uuid4()}-{file.filename}"
    return crud.upload_pdf(db, file, file_name)

# What FastAPI does with UploadFile:
# 1. Parse multipart/form-data
# 2. Create UploadFile object with metadata
# 3. Provide file stream access
# 4. Handle memory efficiently (small files stay in memory, larger ones spool to a temp file)
```

### UploadFile Properties
```python
# What you get with UploadFile
# Note: `await file.read()` only works inside an `async def` endpoint
async def upload_pdf(file: UploadFile = File(...), db: Session = Depends(get_db)):
    print(file.filename)     # "document.pdf"
    print(file.content_type) # "application/pdf" (client-supplied, may be None)
    print(file.size)         # File size in bytes (may be None)
    
    # Access file content (pick one)
    content = await file.read()  # Read entire file into memory
    # OR
    stream = file.file           # Underlying file object, for streaming large files
```
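
When you read from the stream yourself, reading in chunks lets you stop as soon as a size limit is exceeded instead of loading the whole upload first. A minimal sketch, assuming a hypothetical helper `read_with_limit` and using `io.BytesIO` as a stand-in for `file.file`; the 10MB default mirrors the limit used in Part 6:

```python
import io

MAX_BYTES = 10 * 1024 * 1024  # 10MB, same limit as the size check in Part 6
CHUNK_SIZE = 64 * 1024        # assumed chunk size; tune for your workload


def read_with_limit(stream, max_bytes: int = MAX_BYTES, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Read a binary stream in chunks; raise ValueError once max_bytes is exceeded."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("file too large")
        chunks.append(chunk)
    return b"".join(chunks)


small = io.BytesIO(b"%PDF-1.4 tiny example")
data = read_with_limit(small, max_bytes=1024)

too_big = io.BytesIO(b"x" * 2048)
try:
    read_with_limit(too_big, max_bytes=1024, chunk_size=256)
    rejected = False
except ValueError:
    rejected = True

print(len(data), rejected)  # 21 True
```

This is useful when `file.size` is `None`, since the limit is enforced on the bytes actually received.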

## Part 5: CRUD Logic Changes - Simple vs Dual Storage

### Todo App CRUD (Database Only)
```python
# Todo app crud.py - Simple database operations
from sqlalchemy.orm import Session
import models, schemas

def create_todo(db: Session, todo: schemas.TodoRequest):
    db_todo = models.Todo(
        name=todo.name,
        completed=todo.completed
    )
    db.add(db_todo)      # ← Single step: add to database
    db.commit()          # ← Single step: save to database
    db.refresh(db_todo)
    return db_todo
```

### PDF App CRUD (Database + S3)
```python
# PDF app crud.py - Two-step process
from botocore.exceptions import BotoCoreError, ClientError, NoCredentialsError
from fastapi import HTTPException, UploadFile
from sqlalchemy.orm import Session
import models
from config import Settings  # assumed location of the project's settings class

def upload_pdf(db: Session, file: UploadFile, file_name: str):
    s3_client = Settings.get_s3_client()
    BUCKET_NAME = Settings().AWS_S3_BUCKET
    
    try:
        # STEP 1: Upload file to S3
        s3_client.upload_fileobj(
            file.file,           # ← File stream
            BUCKET_NAME,         # ← S3 bucket
            file_name            # ← Unique filename
        )
        
        # STEP 2: Generate S3 URL
        file_url = f'https://{BUCKET_NAME}.s3.amazonaws.com/{file_name}'
        
        # STEP 3: Save metadata to database
        db_pdf = models.PDF(
            name=file.filename,  # ← Original filename
            selected=False,      # ← Default status
            file=file_url        # ← S3 URL (not the file!)
        )
        db.add(db_pdf)
        db.commit()
        db.refresh(db_pdf)
        return db_pdf
        
    except NoCredentialsError:
        raise HTTPException(status_code=500, detail="AWS credentials error")
    except (BotoCoreError, ClientError) as e:
        # Service-side failures (access denied, missing bucket) typically surface
        # as ClientError, which is not a BotoCoreError subclass
        raise HTTPException(status_code=500, detail=str(e))
```

### Key Differences:
1. **Steps**: 1 step (database) → 3 steps (S3, URL, database)
2. **Error handling**: Database errors → Database + AWS errors
3. **Data stored**: Direct data → URL reference
4. **External dependencies**: None → AWS S3

## Part 6: Error Handling Complexity

### Todo App Error Handling (Simple)
```python
# Todo app - Only database errors to handle
from sqlalchemy.exc import SQLAlchemyError

@router.post("", response_model=schemas.TodoResponse)
def create_todo(todo: schemas.TodoRequest, db: Session = Depends(get_db)):
    try:
        return crud.create_todo(db, todo)
    except SQLAlchemyError:
        raise HTTPException(status_code=500, detail="Database error")
    # That's it - only one thing can go wrong!
```

### PDF App Error Handling (Complex)
```python
# PDF app - Multiple services can fail
from botocore.exceptions import BotoCoreError, ClientError, NoCredentialsError
from sqlalchemy.exc import SQLAlchemyError

@router.post("/upload", response_model=schemas.PDFResponse)
def upload_pdf(file: UploadFile = File(...), db: Session = Depends(get_db)):
    # Validate outside the try block so these 4xx responses are not
    # swallowed by the generic `except Exception` handler below.
    # content_type is client-supplied and may be None.
    if not (file.content_type or "").startswith('application/pdf'):
        raise HTTPException(status_code=400, detail="Only PDF files allowed")

    # Check file size (size may be None if it could not be determined)
    if file.size is not None and file.size > 10 * 1024 * 1024:  # 10MB limit
        raise HTTPException(status_code=413, detail="File too large")

    file_name = f"{uuid4()}-{file.filename}"
    try:
        return crud.upload_pdf(db, file, file_name)
    except HTTPException:
        raise  # Already an HTTP error (e.g. raised inside crud.upload_pdf)
    except NoCredentialsError:
        raise HTTPException(status_code=500, detail="AWS credentials missing")
    except (BotoCoreError, ClientError) as e:
        raise HTTPException(status_code=500, detail=f"AWS error: {str(e)}")
    except SQLAlchemyError:
        # Cleanup: delete file from S3 if database fails
        # (Advanced topic - cleanup strategies)
        raise HTTPException(status_code=500, detail="Database error")
    except Exception:
        raise HTTPException(status_code=500, detail="Upload failed")
```

### Error Types You Need to Handle:
1. **File validation**: Wrong type, too large, corrupted
2. **AWS errors**: Credentials, permissions, network
3. **Database errors**: Connection, constraints, space
4. **Cleanup**: What if S3 succeeds but database fails?
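
For the cleanup question, one common approach is a compensating action: if saving the metadata fails after the upload succeeded, delete the uploaded object and re-raise. The sketch below is an illustration under stated assumptions, not the app's actual code. `FakeS3` is an in-memory stand-in whose two methods mirror the boto3 client calls `upload_fileobj` and `delete_object`, and `store_pdf` and `save_metadata` are hypothetical names:

```python
import io


class FakeS3:
    """In-memory stand-in for an S3 client (illustration only)."""

    def __init__(self):
        self.objects = {}

    def upload_fileobj(self, fileobj, bucket, key):
        self.objects[(bucket, key)] = fileobj.read()

    def delete_object(self, Bucket, Key):
        self.objects.pop((Bucket, Key), None)


def store_pdf(s3, save_metadata, fileobj, bucket, key):
    """Upload to storage, then save metadata; undo the upload if saving fails."""
    s3.upload_fileobj(fileobj, bucket, key)
    try:
        return save_metadata(key)
    except Exception:
        # Compensating action: remove the orphaned object, then re-raise
        s3.delete_object(Bucket=bucket, Key=key)
        raise


def failing_save(key):
    raise RuntimeError("database unavailable")


s3 = FakeS3()
try:
    store_pdf(s3, failing_save, io.BytesIO(b"%PDF-"), "demo-bucket", "abc-test.pdf")
except RuntimeError:
    pass

print(s3.objects)  # {} : the orphaned upload was removed
```

With a real database session you would also call `db.rollback()` before re-raising. The delete itself can fail too, so production code usually logs the orphaned key for later cleanup.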

## Part 7: Schema Changes - Simple vs File Metadata

### Todo App Schemas (Simple)
```python
# Todo app schemas.py
class TodoRequest(BaseModel):
    name: str         # ← User types this
    completed: bool   # ← User sets this

class TodoResponse(BaseModel):
    id: int          # ← Database generates this
    name: str        # ← Same as input
    completed: bool  # ← Same as input
```

### PDF App Schemas (File Metadata)
```python
# PDF app schemas.py
class PDFRequest(BaseModel):
    name: str      # ← Filename (user or extracted)
    selected: bool # ← User selection status
    file: str      # ← S3 URL (not the actual file!)

class PDFResponse(BaseModel):
    id: int        # ← Database generates this
    name: str      # ← Original filename
    selected: bool # ← Selection status
    file: str      # ← S3 URL for download
    
    model_config = ConfigDict(from_attributes=True)
```

### Key Schema Differences:
1. **File field**: Contains URL, not file content
2. **Name field**: Filename instead of user description
3. **Usage**: Metadata only - actual file is in S3
4. **Validation**: Need to validate URLs, not just text
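
As a rough illustration of the URL validation point, the sketch below checks a `file` value with the standard library only. The helper name `is_valid_file_url` is hypothetical, and restricting URLs to the app's own bucket is an assumption. If manually created records may point elsewhere, you would relax the host check:

```python
from urllib.parse import urlparse


def is_valid_file_url(url: str, bucket: str) -> bool:
    """Check for an https URL on the expected bucket host with a non-empty key."""
    parts = urlparse(url)
    expected_host = f"{bucket}.s3.amazonaws.com"
    return (
        parts.scheme == "https"
        and parts.hostname == expected_host
        and len(parts.path) > 1  # more than just "/"
    )


print(is_valid_file_url("https://pdf-basic-app.s3.amazonaws.com/uuid-document.pdf", "pdf-basic-app"))  # True
print(is_valid_file_url("http://example.com/file.pdf", "pdf-basic-app"))                               # False
print(is_valid_file_url("not a url", "pdf-basic-app"))                                                 # False
```

A check like this could run inside a Pydantic validator on `PDFRequest.file`, so invalid URLs are rejected before they reach the database.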

## Part 8: Testing Differences

### Testing Todo App (Simple)
```bash
# Test creating a todo
curl -X POST "http://localhost:8000/todos" \
     -H "Content-Type: application/json" \
     -d '{"name": "Test todo", "completed": false}'

# Test getting todos
curl "http://localhost:8000/todos"

# Test updating
curl -X PUT "http://localhost:8000/todos/1" \
     -H "Content-Type: application/json" \
     -d '{"name": "Updated todo", "completed": true}'
```

### Testing PDF App (More Complex)
```bash
# Test file upload (need actual PDF file)
curl -X POST "http://localhost:8000/pdfs/upload" \
     -F "file=@test.pdf"

# Test getting PDFs
curl "http://localhost:8000/pdfs"

# Test updating (need full object)
curl -X PUT "http://localhost:8000/pdfs/1" \
     -H "Content-Type: application/json" \
     -d '{
       "name": "Updated.pdf",
       "selected": true,
       "file": "https://pdf-basic-app.s3.amazonaws.com/uuid-test.pdf"
     }'

# Test file download (verify S3 URL works)
curl "https://pdf-basic-app.s3.amazonaws.com/uuid-test.pdf" --output downloaded.pdf
```

### Testing Complexity Additions:
1. **Need actual files**: Can't test with just JSON
2. **AWS setup required**: S3 bucket must exist and be configured
3. **Two-step verification**: Database + file accessibility
4. **Error scenarios**: Network failures, permission issues
5. **Cleanup**: Test files accumulate in S3
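
Because upload tests need a file on disk, it can help to generate a small placeholder instead of committing a real document. The sketch below uses hypothetical helper names. It writes a file that begins with the PDF header bytes `%PDF-` and checks for them. The result is not a renderable PDF, only enough to exercise an upload endpoint:

```python
import os
import tempfile

PDF_MAGIC = b"%PDF-"  # every PDF file starts with these bytes


def write_placeholder_pdf(directory: str, name: str = "test.pdf") -> str:
    """Write a tiny file that starts with the PDF header and return its path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(PDF_MAGIC + b"1.4\n%%EOF\n")
    return path


def looks_like_pdf(path: str) -> bool:
    """Cheap sanity check: does the file begin with the PDF magic bytes?"""
    with open(path, "rb") as f:
        return f.read(len(PDF_MAGIC)) == PDF_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    pdf_path = write_placeholder_pdf(tmp)
    result = looks_like_pdf(pdf_path)
    size = os.path.getsize(pdf_path)

print(result, size)  # True 15
```

The same magic-byte check is a reasonable server-side complement to the `content_type` test, since the content type is whatever the client claims.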

## Part 9: Development Workflow Changes

### Todo App Development (Simple)
```text
# Setup
1. Install packages
2. Setup PostgreSQL
3. Configure .env
4. Run migrations
5. Start server

# Development loop
1. Code API endpoint
2. Test with curl/Postman
3. Debug if needed
4. Repeat
```

### PDF App Development (More Steps)
```text
# Setup
1. Install packages (including boto3)
2. Setup PostgreSQL
3. Create AWS account
4. Create S3 bucket
5. Setup IAM user and permissions
6. Configure .env (database + AWS)
7. Run migrations
8. Start server

# Development loop
1. Code API endpoint
2. Test with real PDF files
3. Check S3 bucket for uploads
4. Verify database metadata
5. Test file downloads work
6. Debug AWS/database issues
7. Clean up test files from S3
8. Repeat
```

### Key Workflow Differences:
- **More services**: Database + AWS vs just database
- **External dependencies**: AWS account, internet connection
- **Test data**: Need actual PDF files, not just JSON
- **Debugging**: Multiple failure points to check
- **Cleanup**: Test files accumulate and cost money
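
With more services come more settings, and a missing one usually shows up as a confusing runtime error. A small startup check can report what is absent. In this sketch `DATABASE_URL` is a hypothetical name, `AWS_S3_BUCKET` matches the setting used in `crud.py`, and the two AWS key names are the standard environment variables boto3 reads:

```python
import os

REQUIRED_SETTINGS = [
    "DATABASE_URL",           # hypothetical name for the PostgreSQL connection string
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_S3_BUCKET",
]


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the names of required settings that are absent or empty."""
    return [name for name in required if not env.get(name)]


example_env = {
    "DATABASE_URL": "postgresql://localhost/pdfs",
    "AWS_S3_BUCKET": "pdf-basic-app",
}
print(missing_settings(example_env))  # ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY']

# At startup you would check the real environment instead
problems = missing_settings(os.environ)
```

Failing fast on a non-empty `problems` list keeps AWS misconfiguration from surfacing only on the first upload.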

## 🎯 Key Takeaways

### What Changes in Your API When Adding File Upload:

1. **New Endpoint Pattern**: Standard CRUD + special `/upload` endpoint
2. **Parameter Types**: Pydantic models + `UploadFile` for files
3. **Request Formats**: JSON + `multipart/form-data` for files
4. **Processing Logic**: Simple database → Database + cloud storage
5. **Error Handling**: Single service → Multiple service failures
6. **Testing**: Simple JSON → Files + verification workflows
7. **Development**: Local only → Local + cloud dependencies

### Patterns You Can Reuse:

✅ **Dual endpoint pattern**: Keep JSON endpoints for metadata, add upload endpoints for files  
✅ **UUID naming**: Prevent filename conflicts with unique IDs  
✅ **Error cascade handling**: Handle file upload failures gracefully  
✅ **Metadata storage**: Store URLs in database, not files  
✅ **Validation layers**: File type, size, permissions  

### Why These Changes Are Necessary:

- **Browser behavior**: File inputs are sent as `multipart/form-data`, and JSON has no native binary type
- **Efficiency**: Streaming files vs loading into memory
- **Scalability**: Cloud storage handles large files better
- **User experience**: Progress feedback, error handling
- **Security**: Validate files before processing

### Next Steps:

In **Notebook 19**, we'll see how these backend changes affect your React frontend - spoiler alert: file inputs and FormData require completely different patterns than the simple text inputs we used in the Todo app!

---

**Remember**: The core FastAPI patterns from the Todo app still apply - we're just adding file handling capabilities. Your existing knowledge of routes, dependencies, and schemas is still the foundation!