# Lab 4.6.8.6: Deployment & Documentation

**Capstone Option E:** Browser-Deployed Fine-Tuned LLM (Matcha Expert)  
**Phase:** 6 of 6 (Final)  
**Time:** 6-8 hours  
**Difficulty:** ⭐⭐⭐

---

## Phase Objectives

By completing this phase, you will:
- [ ] Upload model to S3 with proper CORS configuration
- [ ] Deploy static site to Vercel/Netlify
- [ ] Create a complete model card
- [ ] Write technical report outline
- [ ] Prepare presentation slides
- [ ] Record demo video

---

## Phase Checklist

- [ ] S3 bucket created with CORS
- [ ] Model files uploaded
- [ ] Static site deployed
- [ ] Demo working publicly
- [ ] Model card complete
- [ ] Technical report drafted
- [ ] Presentation ready
- [ ] Demo video recorded

---

## Why This Matters

**Documentation is what separates a project from a product.**

A well-documented project:
- Can be understood by others (and your future self)
- Demonstrates professionalism
- Enables reproducibility
- Shows awareness of limitations and ethics
- Is portfolio-ready

---

## Part 1: S3 Deployment

Host your model files on AWS S3 (or any CDN with CORS support).

In [None]:
# S3 Configuration

s3_cors_config = '''
{
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET", "HEAD"],
            "AllowedOrigins": [
                "https://your-domain.vercel.app",
                "http://localhost:5173",
                "http://localhost:3000"
            ],
            "ExposeHeaders": [
                "Content-Length",
                "Content-Type",
                "ETag"
            ],
            "MaxAgeSeconds": 3600
        }
    ]
}
'''

print("📄 S3 CORS Configuration (cors.json)")
print("="*70)
print(s3_cors_config)
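Before applying the rules, it can help to check locally which origins they would admit. The following is a minimal sketch, not S3's actual matching logic: `origin_allowed` is a hypothetical helper, and it approximates S3's single-`*` wildcard with `fnmatchcase`. It assumes the CLI form of the file, where the rules sit under a `CORSRules` key.

```python
import json
from fnmatch import fnmatchcase

# Same rules as the configuration above, in the {"CORSRules": [...]} form
# that `aws s3api put-bucket-cors` reads from cors.json.
CORS_JSON = """
{
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET", "HEAD"],
            "AllowedOrigins": [
                "https://your-domain.vercel.app",
                "http://localhost:5173",
                "http://localhost:3000"
            ],
            "ExposeHeaders": ["Content-Length", "Content-Type", "ETag"],
            "MaxAgeSeconds": 3600
        }
    ]
}
"""


def origin_allowed(rules, origin, method="GET"):
    """Return True if any rule permits this origin and HTTP method.

    Approximation: fnmatchcase handles the `*` wildcard; real S3 matching
    may differ in edge cases.
    """
    for rule in rules:
        if method not in rule.get("AllowedMethods", []):
            continue
        patterns = rule.get("AllowedOrigins", [])
        if any(fnmatchcase(origin, pattern) for pattern in patterns):
            return True
    return False


rules = json.loads(CORS_JSON)["CORSRules"]
for candidate in ["http://localhost:5173", "https://example.com"]:
    print(candidate, "->", origin_allowed(rules, candidate))
```

An origin that is missing from `AllowedOrigins` is the usual cause of a model download that works locally but fails on the deployed site, so add your real production domain before testing.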

In [None]:
# S3 Upload Commands

s3_commands = '''
# Create S3 bucket
aws s3 mb s3://matcha-expert-model --region us-east-1

# Apply CORS configuration
aws s3api put-bucket-cors --bucket matcha-expert-model --cors-configuration file://cors.json

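# Upload model files (from matcha-browser directory)
# Note: --acl public-read only works if the bucket has ACLs enabled and
# Block Public Access relaxed; newer buckets disable both by default.
aws s3 sync ./matcha-browser s3://matcha-expert-model/ --acl public-read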

# Verify upload
aws s3 ls s3://matcha-expert-model/

# Get public URL
# Your model will be at: https://matcha-expert-model.s3.amazonaws.com/
'''

print("🔧 S3 UPLOAD COMMANDS")
print("="*70)
print(s3_commands)

In [None]:
# Python upload script

upload_script = '''
#!/usr/bin/env python3
"""
Upload model files to S3 with proper configuration.
"""
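import boto3
from botocore.exceptions import ClientError
from pathlib import Path

def upload_model_to_s3(
    model_dir: str,
    bucket_name: str,
    region: str = "us-east-1",
):
    """Upload model files to S3 with CORS configuration."""
    
    s3 = boto3.client('s3', region_name=region)
    
    # Create the bucket if it doesn't exist
    try:
        s3.head_bucket(Bucket=bucket_name)
        print(f"Bucket {bucket_name} exists")
    except ClientError as err:
        # A 404 means the bucket is missing; re-raise anything else (e.g. 403)
        if err.response['Error']['Code'] not in ('404', 'NoSuchBucket'):
            raise
        if region == "us-east-1":
            # us-east-1 rejects an explicit LocationConstraint
            s3.create_bucket(Bucket=bucket_name)
        else:
            s3.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={'LocationConstraint': region},
            )
        print(f"Created bucket {bucket_name}")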
    
    # Configure CORS
    cors_config = {
        'CORSRules': [{
            'AllowedHeaders': ['*'],
            'AllowedMethods': ['GET', 'HEAD'],
            'AllowedOrigins': ['*'],  # Restrict in production!
            'ExposeHeaders': ['Content-Length', 'Content-Type', 'ETag'],
            'MaxAgeSeconds': 3600,
        }]
    }
    s3.put_bucket_cors(Bucket=bucket_name, CORSConfiguration=cors_config)
    print("CORS configured")
    
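    # Upload files recursively so nested folders are included
    model_path = Path(model_dir)
    for file_path in sorted(model_path.rglob('*')):
        if file_path.is_file():
            # Key mirrors the path relative to the model directory
            key = file_path.relative_to(model_path).as_posix()
            s3.upload_file(
                str(file_path),
                bucket_name,
                key,
                # Requires ACLs to be enabled on the bucket
                ExtraArgs={'ACL': 'public-read'}
            )
            print(f"Uploaded {key}")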
    
    url = f"https://{bucket_name}.s3.{region}.amazonaws.com/"
    print()
    print(f"Model available at: {url}")
    return url

if __name__ == "__main__":
    upload_model_to_s3(
        model_dir="./matcha-expert/models/matcha-browser",
        bucket_name="matcha-expert-model",
    )
'''

print("📄 scripts/upload_to_s3.py")
print("="*70)
print(upload_script)
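It can be useful to preview what an upload would send before touching the bucket. The sketch below is an illustration, not part of the script above: `build_upload_plan` is a hypothetical helper that walks the model directory and pairs each object key with a guessed content type, falling back to `application/octet-stream` for extensions the standard `mimetypes` table does not know.

```python
import mimetypes
import tempfile
from pathlib import Path


def build_upload_plan(model_dir):
    """Return (key, content_type) pairs for every file under model_dir."""
    root = Path(model_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keys use forward slashes regardless of the local OS
            key = path.relative_to(root).as_posix()
            content_type, _ = mimetypes.guess_type(path.name)
            plan.append((key, content_type or "application/octet-stream"))
    return plan


# Demo on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "onnx").mkdir()
    (demo_root / "onnx" / "model.onnx").write_bytes(b"\x00")
    (demo_root / "config.json").write_text("{}")
    for key, content_type in build_upload_plan(demo_root):
        print(f"{key:30s} {content_type}")
```

If you want the stored objects to carry these types, boto3's `upload_file` accepts a `ContentType` entry in `ExtraArgs` alongside `ACL`.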

---

## Part 2: Static Site Deployment

Deploy your React app to Vercel or Netlify (both have free tiers).

In [None]:
# Vercel Deployment

vercel_deploy = '''
# Install Vercel CLI
npm install -g vercel

# Deploy (from project directory)
cd matcha-chatbot
vercel

# Follow prompts:
# - Link to existing project or create new
# - Accept defaults for build settings
# - Deploy!

# For production deployment:
vercel --prod

# Your app will be at: https://matcha-chatbot-xxxxx.vercel.app
'''

print("🚀 VERCEL DEPLOYMENT")
print("="*70)
print(vercel_deploy)
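If your app relies on the cross-origin isolation headers that the Netlify configuration in this part sets, Vercel needs an equivalent `vercel.json`. The sketch below generates one from Python, in keeping with the rest of this notebook. `vercel_headers_config` is a hypothetical helper, and whether your build actually needs these headers is an assumption to verify against your own app.

```python
import json


def vercel_headers_config():
    """Build a vercel.json dict that sets COOP/COEP on every route."""
    return {
        "headers": [
            {
                "source": "/(.*)",
                "headers": [
                    {"key": "Cross-Origin-Opener-Policy", "value": "same-origin"},
                    {"key": "Cross-Origin-Embedder-Policy", "value": "require-corp"},
                ],
            }
        ]
    }


# Print the file contents; save them as vercel.json in the project root
print(json.dumps(vercel_headers_config(), indent=2))
```

With `require-corp` in place, the model files fetched from S3 must be served with CORS headers, which is what the bucket configuration in Part 1 provides.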

In [None]:
# Netlify Deployment

netlify_deploy = '''
# Install Netlify CLI
npm install -g netlify-cli

# Build the project first
npm run build

# Deploy
netlify deploy --dir=dist

# For production:
netlify deploy --dir=dist --prod

# Your app will be at: https://your-site.netlify.app
'''

# netlify.toml configuration
netlify_toml = '''
[[headers]]
  for = "/*"
  [headers.values]
    Cross-Origin-Opener-Policy = "same-origin"
    Cross-Origin-Embedder-Policy = "require-corp"

[build]
  command = "npm run build"
  publish = "dist"
'''

print("🚀 NETLIFY DEPLOYMENT")
print("="*70)
print(netlify_deploy)
print("\n📄 netlify.toml")
print("-"*70)
print(netlify_toml)
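The two headers in `netlify.toml` are what make a page cross-origin isolated, which browsers require before exposing `SharedArrayBuffer` (used by multi-threaded WASM). Here is a small sketch for checking a set of response headers; `is_cross_origin_isolated` is a hypothetical helper, and the sample header dictionaries are made up for illustration.

```python
def is_cross_origin_isolated(headers):
    """Check whether response headers enable cross-origin isolation.

    Requires COOP `same-origin` plus COEP `require-corp` or `credentialless`.
    Header names are compared case-insensitively.
    """
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    coop = normalized.get("cross-origin-opener-policy")
    coep = normalized.get("cross-origin-embedder-policy")
    return coop == "same-origin" and coep in ("require-corp", "credentialless")


# Hypothetical header sets, as a deployed site might return them
with_headers = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
}
without_headers = {"Content-Type": "text/html"}

print(is_cross_origin_isolated(with_headers))
print(is_cross_origin_isolated(without_headers))
```

To run this against a live deployment, you could pass in the headers of a `HEAD` request made with `urllib.request`; in the browser itself, `window.crossOriginIsolated` reports the same state.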

---

## Part 3: Model Card

A model card documents your model's capabilities, limitations, and ethical considerations.

In [None]:
# Model Card Template

model_card = '''
# Model Card: Matcha Expert

## Model Details

- **Model Name**: Matcha Expert
- **Model Type**: Causal Language Model (Chat)
- **Base Model**: Gemma 3 270M Instruct
- **Fine-tuning Method**: QLoRA (r=16, alpha=16)
- **Training Framework**: Unsloth + HuggingFace Transformers
- **Quantization**: INT4 (ONNX Runtime)
- **Model Size**: ~150-200MB (browser-ready)
- **Version**: 1.0.0
- **Date**: [DATE]
- **Author**: [YOUR NAME]

## Intended Use

### Primary Use Cases
- Educational information about matcha tea
- Preparation guidance and techniques
- Quality assessment help
- Recipe suggestions
- Cultural context about Japanese tea traditions

### Out-of-Scope Uses
- Medical advice (not a substitute for healthcare professionals)
- General conversation beyond matcha topics
- Commercial recommendations or endorsements
- Other types of tea beyond matcha

## Training Data

- **Dataset Size**: [X] examples
- **Data Sources**: Curated domain knowledge
- **Categories**:
  - Matcha grades and quality: [X]%
  - Preparation methods: [X]%
  - Health benefits: [X]%
  - Cultural context: [X]%
  - Recipes: [X]%
  - Storage and buying guide: [X]%

### Data Processing
- All examples reviewed for accuracy
- Balanced across categories
- No PII or sensitive data

## Training Procedure

- **Hardware**: NVIDIA DGX Spark (128GB unified memory)
- **Training Time**: ~[X] minutes
- **Epochs**: 3
- **Batch Size**: 2 (effective 8 with gradient accumulation)
- **Learning Rate**: 2e-4 with cosine schedule
- **Final Training Loss**: [X]
- **Validation Loss**: [X]

## Evaluation

### Quantitative Metrics
| Metric | Value |
|--------|-------|
| Training Loss | [X] |
| Validation Loss | [X] |
| Perplexity (Base) | [X] |
| Perplexity (Fine-tuned) | [X] |

### Qualitative Assessment
- Accuracy on domain questions: [X/10]
- Response quality: [X/10]
- Factual correctness: [X/10]

### Browser Performance
| Device | Backend | Tokens/sec |
|--------|---------|------------|
| [YOUR GPU] | WebGPU | [X] |
| [LAPTOP] | WASM | [X] |

## Limitations

- **Knowledge Cutoff**: Training data reflects knowledge as of [DATE]
- **Domain Scope**: Limited to matcha tea; may not perform well on other topics
- **Hallucination Risk**: May occasionally generate plausible but incorrect information
- **Language**: Primarily trained on English content
- **Performance**: Slower on devices without WebGPU support

## Ethical Considerations

### Potential Benefits
- Privacy-preserving (runs locally)
- No ongoing costs for users
- Educational value
- Accessible without internet (after initial load)

### Potential Risks
- Health information should not replace professional advice
- May perpetuate biases in training data
- Could provide incorrect information if asked beyond training scope

### Mitigations
- Clear disclaimers in the UI
- Focused training on verified information
- Regular evaluation and updates

## How to Use

### Browser (Transformers.js)
```javascript
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline(
  'text-generation',
  'https://your-s3-url/matcha-expert',
  { device: 'webgpu', dtype: 'q4' }
);

const response = await generator([
  { role: 'system', content: 'You are a matcha expert.' },
  { role: 'user', content: 'What is ceremonial grade?' }
]);
```

## Citation

```bibtex
@misc{matcha-expert-2024,
  author = {[YOUR NAME]},
  title = {Matcha Expert: A Browser-Deployed Fine-Tuned LLM},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/yourusername/matcha-expert}
}
```

## License

[Specify license - e.g., MIT, Apache 2.0, or match base model license]

## Contact

- **Author**: [YOUR NAME]
- **Email**: [YOUR EMAIL]
- **GitHub**: [YOUR GITHUB]
'''

print("📄 MODEL CARD")
print("="*70)
print(model_card[:3000] + "\n...")
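Two of the model card's numbers can be derived rather than looked up. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, in which case perplexity is simply its exponential, and it assumes a single device with 4 gradient accumulation steps, which is what "2 (effective 8)" implies. The loss values are placeholders, not results from this project.

```python
import math


def perplexity(mean_loss):
    """Perplexity from a mean per-token cross-entropy loss in nats."""
    return math.exp(mean_loss)


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * accumulation_steps * num_devices


# Placeholder losses for illustration only; substitute your own values
base_loss, tuned_loss = 3.0, 2.0
print(f"Perplexity (base):       {perplexity(base_loss):.2f}")
print(f"Perplexity (fine-tuned): {perplexity(tuned_loss):.2f}")
print(f"Effective batch size:    {effective_batch_size(2, 4)}")
```

Compute both perplexities on the same held-out set so the base versus fine-tuned comparison in the table is like for like.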

---

## Part 4: Technical Report Outline

In [None]:
# Technical Report Outline

report_outline = '''
# Technical Report: Browser-Deployed Fine-Tuned LLM

## Abstract (1 paragraph)
- Brief description of the project
- Key results and contributions

## 1. Introduction (1-2 pages)
- Problem statement: Why browser LLMs?
- Motivation: Zero cost, privacy, edge deployment
- Project goals and scope
- Document structure overview

## 2. Background (1-2 pages)
- QLoRA fine-tuning
- ONNX and quantization
- Browser ML (WebGPU, WASM)
- Related work

## 3. System Design (2-3 pages)
- Overall architecture diagram
- Training pipeline (DGX Spark)
- Optimization pipeline
- Deployment architecture
- Technology choices and rationale

## 4. Implementation (3-4 pages)
### 4.1 Dataset Preparation
- Data collection and curation
- Format and structure
- Quality assurance

### 4.2 Fine-Tuning
- Model selection
- QLoRA configuration
- Training procedure
- MLflow tracking

### 4.3 Model Optimization
- LoRA merging
- ONNX export
- INT4 quantization

### 4.4 Browser Integration
- React application
- Transformers.js integration
- WebGPU optimization

## 5. Evaluation (2-3 pages)
### 5.1 Training Metrics
- Loss curves
- Perplexity comparison

### 5.2 Quality Assessment
- Domain-specific evaluation
- Comparison: Base vs Fine-tuned

### 5.3 Performance Benchmarks
- Inference speed by device
- Memory usage
- Loading time

### 5.4 User Experience
- Browser compatibility
- Loading experience
- Response quality feedback

## 6. Discussion (1-2 pages)
- What worked well
- Challenges and solutions
- Lessons learned
- Limitations

## 7. Conclusion (1 page)
- Summary of achievements
- Future work
- Final thoughts

## References

## Appendices
- A: Complete model card
- B: Sample conversations
- C: Full code listings
- D: Deployment checklist
'''

print("📄 TECHNICAL REPORT OUTLINE")
print("="*70)
print(report_outline)

---

## Part 5: Presentation Outline

In [None]:
# Presentation Outline

presentation_outline = '''
# Matcha Expert: Browser-Deployed Fine-Tuned LLM
## Presentation Outline (15-20 slides, 15-20 minutes)

### Slide 1: Title
- Project name and tagline
- Your name
- Date

### Slide 2: The Problem
- LLMs are expensive to host
- Privacy concerns with cloud APIs
- Not everyone has GPU access

### Slide 3: The Solution
- Train once (DGX Spark)
- Deploy everywhere (browser)
- Run locally (zero cost)

### Slide 4: Architecture Overview
- Visual diagram of the pipeline
- Train → Optimize → Deploy

### Slide 5: Why Matcha?
- Defined domain
- Rich vocabulary
- Safe topic
- Practical value

### Slide 6: Dataset Creation
- 150+ examples
- 8 categories
- Quality over quantity

### Slide 7: QLoRA Fine-Tuning
- Why QLoRA?
- Configuration
- Training on DGX Spark

### Slide 8: Model Optimization
- Merge in BF16 (critical!)
- ONNX export
- INT4 quantization

### Slide 9: Size Comparison
- Visual chart: 2GB → 500MB
- 75% compression

### Slide 10: Browser Integration
- Transformers.js
- WebGPU acceleration
- WASM fallback

### Slide 11: Live Demo
- Show the chatbot
- Ask sample questions
- Highlight local execution

### Slide 12: Evaluation Results
- Training metrics
- Quality comparison
- Performance benchmarks

### Slide 13: Deployment
- S3 for model hosting
- Vercel for app
- Cost: ~$0/month

### Slide 14: Challenges & Solutions
- Technical hurdles faced
- How you solved them

### Slide 15: Lessons Learned
- Key takeaways
- What you'd do differently

### Slide 16: Future Work
- Larger models
- More domains
- Mobile optimization

### Slide 17: Conclusion
- Summary of achievements
- Impact and value

### Slide 18: Questions?
- Contact info
- Demo URL
- GitHub link
'''

print("📄 PRESENTATION OUTLINE")
print("="*70)
print(presentation_outline)
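The size comparison on Slide 9 is simple arithmetic that is worth recomputing from your own file sizes. This sketch uses the outline's 2GB and 500MB figures, treated as 2000 MB and 500 MB; the 50 Mbps connection speed is an assumed value for illustration, and both helper names are hypothetical.

```python
def size_reduction(original_mb, compressed_mb):
    """Fraction of the original size removed by compression."""
    return 1 - compressed_mb / original_mb


def download_seconds(size_mb, bandwidth_mbps):
    """Rough first-load time: megabytes to megabits, divided by link speed."""
    return size_mb * 8 / bandwidth_mbps


original_mb, compressed_mb = 2000, 500  # figures from the slide outline
print(f"Reduction: {size_reduction(original_mb, compressed_mb):.0%}")
print(f"Compression ratio: {original_mb / compressed_mb:.1f}x")
print(f"First download at 50 Mbps: {download_seconds(compressed_mb, 50):.0f} s")
```

Quoting the estimated first-load time next to the size makes the cost of the initial download concrete for the audience.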

---

## Part 6: Demo Video Script

In [None]:
# Demo Video Script

video_script = '''
# Matcha Expert Demo Video Script
## Duration: 5-10 minutes

### Introduction (30 seconds)
"Hi, I'm [NAME] and this is Matcha Expert - an AI chatbot that runs 
entirely in your browser, with zero server costs and complete privacy.

Let me show you how it works."

### The Problem (45 seconds)
"Traditional LLM deployment requires expensive GPU servers. 
Users' data goes to the cloud. And there's a continuous hosting cost.

What if we could train once and let users run the model themselves?"

### Live Demo (2-3 minutes)
1. Open the website
2. Show loading process ("First time downloads ~500MB, then it's cached")
3. Ask: "What's the difference between ceremonial and culinary grade matcha?"
4. Show response quality
5. Ask: "How should I store matcha?"
6. Show Chrome DevTools - Network tab ("See? No API calls!")

### Technical Deep Dive (2 minutes)
1. Show training notebook
2. Highlight QLoRA configuration
3. Show size comparison chart
4. Explain INT4 quantization
5. Show Transformers.js code

### Key Metrics (30 seconds)
- Training: X minutes on DGX Spark
- Model size: 500MB (was 2GB)
- Inference: X tokens/second
- Hosting cost: $0/month

### Conclusion (30 seconds)
"Matcha Expert demonstrates that browser LLMs are practical today.

The same pipeline works for any domain - customer support, education, 
specialized assistants.

Try it yourself at [URL]. Thanks for watching!"

---

## Recording Tips

1. Use screen recording software (OBS, Loom, QuickTime)
2. Use a clean browser window with minimal tabs
3. Pre-load the model to avoid waiting during demo
4. Prepare questions in advance
5. Keep it concise and engaging
6. Add captions for accessibility
'''

print("📄 DEMO VIDEO SCRIPT")
print("="*70)
print(video_script)

---

## Part 7: Final Checklist

In [None]:
# Final Capstone Checklist

final_checklist = '''
# Option E Capstone: Final Checklist

## Artifacts
- [ ] Training dataset (150+ examples)
- [ ] LoRA adapters (safetensors)
- [ ] Merged model (BF16)
- [ ] GGUF model (for Ollama)
- [ ] ONNX INT4 model (for browser)

## Code
- [ ] Dataset preparation notebook
- [ ] Training notebook with MLflow
- [ ] Merge and export script
- [ ] ONNX quantization script
- [ ] React web application
- [ ] S3 upload script

## Deployment
- [ ] S3 bucket with CORS
- [ ] Model files uploaded
- [ ] Static site deployed
- [ ] Working demo URL

## Documentation
- [ ] Model card (complete)
- [ ] Technical report (15-20 pages)
- [ ] README with setup instructions
- [ ] Presentation slides (15-20)
- [ ] Demo video (5-10 min)

## Quality Checks
- [ ] Model generates accurate responses
- [ ] Browser demo works in Chrome/Edge
- [ ] WASM fallback works in Firefox
- [ ] Loading experience is smooth
- [ ] Error handling is user-friendly

## Grading Criteria (Self-Assessment)

| Criteria | Points | Self-Score | Notes |
|----------|--------|------------|-------|
| Dataset Quality | 15 | | |
| Training Pipeline | 20 | | |
| Optimization Pipeline | 15 | | |
| Browser Integration | 20 | | |
| Deployment | 10 | | |
| Documentation | 10 | | |
| Evaluation | 5 | | |
| Innovation | 5 | | |
| **TOTAL** | **100** | | |
'''

print("📋 FINAL CHECKLIST")
print("="*70)
print(final_checklist)
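Once you start ticking items off in the saved checklist, progress can be tallied automatically. The following is a minimal sketch: `checklist_progress` is a hypothetical helper that counts markdown task-list items, treating `- [x]` as done and `- [ ]` as open. The sample text is made up.

```python
import re

# Matches markdown task-list items such as "- [ ] item" or "- [x] item"
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\]", re.MULTILINE)


def checklist_progress(markdown):
    """Return (done, total) for the task-list items in a markdown string."""
    marks = CHECKBOX.findall(markdown)
    done = sum(1 for mark in marks if mark.lower() == "x")
    return done, len(marks)


# Made-up sample; in practice, read FINAL_CHECKLIST.md from disk
sample = """
## Deployment
- [x] S3 bucket with CORS
- [x] Model files uploaded
- [ ] Static site deployed
- [ ] Working demo URL
"""

done, total = checklist_progress(sample)
print(f"{done}/{total} items complete")
```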

---

## Capstone Complete!

Congratulations! You've built a complete browser-deployed LLM:

- ✅ Created a domain-specific training dataset
- ✅ Fine-tuned with QLoRA on DGX Spark
- ✅ Merged and optimized for browser deployment
- ✅ Built a React application with Transformers.js
- ✅ Deployed with zero ongoing costs
- ✅ Documented your work professionally

**You are now AI-ready!**

---

In [None]:
# Save documentation templates
from pathlib import Path

docs_dir = Path("./matcha-expert/docs")
docs_dir.mkdir(parents=True, exist_ok=True)

# Save model card (UTF-8 so arrows and emoji write cleanly on any platform)
with open(docs_dir / "MODEL_CARD.md", 'w', encoding='utf-8') as f:
    f.write(model_card)

# Save report outline
with open(docs_dir / "REPORT_OUTLINE.md", 'w', encoding='utf-8') as f:
    f.write(report_outline)

# Save presentation outline
with open(docs_dir / "PRESENTATION_OUTLINE.md", 'w', encoding='utf-8') as f:
    f.write(presentation_outline)

# Save video script
with open(docs_dir / "VIDEO_SCRIPT.md", 'w', encoding='utf-8') as f:
    f.write(video_script)

# Save checklist
with open(docs_dir / "FINAL_CHECKLIST.md", 'w', encoding='utf-8') as f:
    f.write(final_checklist)

print(f"✅ Documentation templates saved to {docs_dir}")
print("\n📁 Files created:")
for f in sorted(docs_dir.iterdir()):
    print(f"   {f.name}")

print("\n🎉 CAPSTONE COMPLETE!")
print("\n🍵 You've successfully built a browser-deployed fine-tuned LLM!")
print("   Share your demo and inspire others!")