# Tutorial 3: Your First Submission - Get on the Leaderboard in 15 Minutes!

Alright, forget the fancy AI stuff for now. Let's just get you on that leaderboard.

## Today's mission

- Submit SOMETHING that works (even if it's terrible) 
- Learn why notebooks suck for production (and how to fix it)
- Write actual tests (yes, even for a hackathon)
- Understand the scoring by failing fast

**Real talk**: Your first submission will score terribly. Ship it anyway. You learn more from one bad submission than from endless planning.

---

## Part 1: Project Structure (The Right Way™)

Remember Tutorial 2 where we just dumped everything in the notebook? Yeah, that doesn't scale. Let's be slightly more professional.

**Our new structure:**
```
GDSC-8/
├── src/           # Reusable code goes here
│   └── utils.py   # Submission, validation, I/O utilities
├── tests/         # Yes, we're writing tests
│   └── test_utils.py
└── tutorials/     # Notebooks for exploration
```

**Why this matters:**
- Notebooks = great for exploration, terrible for production
- Modules = testable, reusable, version-controllable
- Tests = confidence that your code actually works

Let's import our utilities:

In [None]:
# Setup imports - the professional way
import sys
import random
from pathlib import Path

# Add parent directory to path so we can import from src/
sys.path.append('..')

# Import our utilities
from src.utils import (
    send_results, 
    save_json, 
    read_json,
    validate_submission_format,
    sanity_check
)

print("✅ Imported from src/ like a pro")
print("💡 Production tip: Keep reusable code in modules, not notebooks")
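
The contents of `src/utils.py` are not shown in this tutorial. To give a rough idea of what the two I/O helpers might look like, here is a minimal sketch. The function bodies are assumptions; only the names and the `save_json(path, data)` argument order come from the calls used later in this notebook.

```python
import json
from pathlib import Path


def save_json(path, data):
    """Write `data` as pretty-printed JSON, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. creates submissions/
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


def read_json(path):
    """Load JSON from `path` and return the parsed Python object."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)
```

Because these live in a module rather than a notebook cell, they can be imported by every tutorial and covered by `tests/test_utils.py`.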

## Quick Sanity Check

Before we submit anything, let's make sure our AWS connection works:

In [None]:
# Check API connection
if sanity_check():
    print("\n🎯 Ready to submit!")
else:
    print("\n⚠️ Fix your AWS credentials first!")
    print("Check Tutorial 1 for setup instructions")

## Part 2: The World's Laziest Matcher

Let's build the absolute minimum viable submission. Everyone gets the same job. Zero intelligence. Zero API calls. Maximum speed.

**Why start here?**
- Understand the submission format
- Test the full pipeline
- Get that psychological win of being on the leaderboard
- Establish a baseline (spoiler: it'll be bad)

In [None]:
def lazy_matcher_v1():
    """
    The laziest possible solution that still works.
    Everyone gets job_001. No personalization. No intelligence.
    
    This is your baseline - everything else should beat this.
    """
    results = []
    
    for i in range(1, 101):
        results.append({
            "persona_id": f"persona_{i:03}",
            "predicted_type": "jobs+trainings",
            "jobs": [
                {
                    "job_id": "job_001",
                    "suggested_trainings": []  # No training suggestions
                }
            ]
        })
    
    return results

# Generate our terrible predictions
results_v1 = lazy_matcher_v1()

print("📊 Lazy Matcher v1 Stats:")
print(f"  Predictions: {len(results_v1)}")
print(f"  Unique jobs recommended: 1")
print(f"  API calls made: 0")
print(f"  Cost: $0.00")
print(f"  Expected score: Terrible")
print(f"  Time to implement: 30 seconds")

## Part 3: Validate Before Submitting

**Pro tip**: Always validate your format before submitting. Catching errors locally is free. Debugging failed submissions is painful.

In [None]:
# Always validate before submitting!
try:
    validate_submission_format(results_v1)
    print("✅ Format is valid! Ready to submit")
except ValueError as e:
    print(f"❌ Format error: {e}")
    print("Fix this before submitting!")
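
If you are curious what a format check can catch, here is a hedged sketch of a validator. It is not the real `validate_submission_format`; the name `check_submission_format` and every rule in it are assumptions inferred from the three prediction shapes used in this notebook.

```python
# Hypothetical checks: inferred from the examples in this tutorial only.
VALID_TYPES = {"jobs+trainings", "trainings_only", "awareness"}


def check_submission_format(results):
    """Raise ValueError on the first structural problem found."""
    if not isinstance(results, list):
        raise ValueError("submission must be a list of predictions")
    seen = set()
    for entry in results:
        pid = entry.get("persona_id")
        if not pid:
            raise ValueError("entry is missing persona_id")
        if pid in seen:
            raise ValueError(f"duplicate persona_id: {pid}")
        seen.add(pid)
        ptype = entry.get("predicted_type")
        if ptype not in VALID_TYPES:
            raise ValueError(f"{pid}: unknown predicted_type {ptype!r}")
        if ptype == "jobs+trainings":
            for job in entry.get("jobs", []):
                if "job_id" not in job or "suggested_trainings" not in job:
                    raise ValueError(f"{pid}: job needs job_id and suggested_trainings")
        elif ptype == "trainings_only" and not isinstance(entry.get("trainings"), list):
            raise ValueError(f"{pid}: trainings must be a list")
    return True
```

Raising on the first problem keeps the error message short and points you at one thing to fix at a time.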

## Part 4: Save Your Work

**Best practice**: Always save your submissions. You'll want to compare different approaches later.

In [None]:
# Create submissions directory and save
save_json("../submissions/lazy_v1.json", results_v1)
print("💾 Saved to submissions/lazy_v1.json")
print("\n💡 Tip: Keep all your submissions for comparison")
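
Saved submissions pay off once you start comparing them. The helper below is a small sketch of one way to do that; `diff_submissions` is a hypothetical name, not part of `src/utils.py`.

```python
def diff_submissions(old, new):
    """Return the persona_ids whose prediction differs between two submissions."""
    old_by_id = {entry["persona_id"]: entry for entry in old}
    changed = []
    for entry in new:
        # A persona counts as changed if it is new or its prediction is not identical.
        if old_by_id.get(entry["persona_id"]) != entry:
            changed.append(entry["persona_id"])
    return changed
```

For example, `diff_submissions(read_json("../submissions/lazy_v1.json"), new_results)` would list the personas a newer matcher treats differently, which tells you where a score change could have come from.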

## Part 5: Submit to Leaderboard!

This is it. The moment of truth. Let's get you on that leaderboard.

In [None]:
# Dry run first - always test before the real thing
print("🔍 Dry run (validation only)...")
response = send_results(results_v1, dry_run=True)
print()

In [None]:
# Now for real - submit to the leaderboard!
print("🚀 ACTUAL SUBMISSION...")
response = send_results(results_v1, dry_run=False)

if response and response.status_code == 200:
    print("\n🎉 CONGRATULATIONS! You're on the leaderboard!")
    print("Go check your score at: [leaderboard URL]")
    print("(Yes, it's probably terrible. That's the point!)")
else:
    print("\n😅 Something went wrong. Check the error message above.")

## Part 6: Quick Iteration - Random Matcher

Your lazy matcher probably scored very low (around 1% is a reasonable guess). Let's try something slightly smarter: random assignments. Still no API calls, but at least there's variety.

In [None]:
def random_matcher_v2():
    """
    Slightly less lazy - random assignments.
    
    Some personas are too young (awareness).
    Some need training first.
    Most get random jobs.
    """
    # Sample job and training IDs (in reality, we have 200 jobs and 467 trainings)
    job_ids = [f"job_{i:03}" for i in range(1, 21)]  # Use first 20 jobs
    training_ids = [f"training_{i:03}" for i in range(1, 31)]  # First 30 trainings
    
    results = []
    
    for i in range(1, 101):
        persona_id = f"persona_{i:03}"
        
        # Draw once per persona so the three branches split 10% / 20% / 70%
        roll = random.random()
        
        # 10% chance of being too young (awareness)
        if roll < 0.1:
            results.append({
                "persona_id": persona_id,
                "predicted_type": "awareness",
                "predicted_items": "too_young"
            })
        
        # 20% need training only
        elif roll < 0.3:
            results.append({
                "persona_id": persona_id,
                "predicted_type": "trainings_only",
                "trainings": random.sample(training_ids, k=min(3, len(training_ids)))
            })
        
        # 70% get job recommendations
        else:
            # Pick 1-3 random jobs
            num_jobs = random.randint(1, 3)
            selected_jobs = random.sample(job_ids, k=min(num_jobs, len(job_ids)))
            
            jobs = []
            for job_id in selected_jobs:
                # Sometimes suggest trainings (50% chance)
                if random.random() < 0.5:
                    suggested = random.sample(training_ids, k=random.randint(0, 2))
                else:
                    suggested = []
                    
                jobs.append({
                    "job_id": job_id,
                    "suggested_trainings": suggested
                })
            
            results.append({
                "persona_id": persona_id,
                "predicted_type": "jobs+trainings",
                "jobs": jobs
            })
    
    return results

# Generate random predictions
results_v2 = random_matcher_v2()

# Analyze the distribution
types = [r['predicted_type'] for r in results_v2]
print("📊 Random Matcher v2 Stats:")
print(f"  Total predictions: {len(results_v2)}")
print(f"  Jobs+trainings: {types.count('jobs+trainings')}")
print(f"  Trainings only: {types.count('trainings_only')}")
print(f"  Awareness: {types.count('awareness')}")
print(f"  API calls: 0")
print(f"  Cost: $0.00")
print(f"  Expected score: Still bad, but better than v1")
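
One thing to keep in mind with random matchers: every run produces a different submission, so a score is hard to reproduce unless you fix the seed. A minimal sketch, assuming you are happy to pass a seed around (`seeded_job_picks` is a hypothetical helper, not part of the tutorial code):

```python
import random


def seeded_job_picks(job_ids, n_personas=100, seed=42):
    """Pick 1-3 jobs per persona, identically on every run with the same seed."""
    rng = random.Random(seed)  # private generator; the global one is left untouched
    return [rng.sample(job_ids, k=rng.randint(1, 3)) for _ in range(n_personas)]


job_ids = [f"job_{i:03}" for i in range(1, 21)]
first_run = seeded_job_picks(job_ids)
second_run = seeded_job_picks(job_ids)
print(first_run == second_run)  # both runs used seed 42, so the picks match
```

Changing the seed gives you a different but equally repeatable submission, which makes it easier to tell luck from real improvement.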

In [None]:
# Validate and save v2
validate_submission_format(results_v2)
save_json("../submissions/random_v2.json", results_v2)

# Submit v2
print("\n🚀 Submitting Random Matcher v2...")
response = send_results(results_v2, dry_run=False)

if response and response.status_code == 200:
    print("\n📈 Check if you improved! Even 1% better is progress.")

## Part 7: One API Call Wonder

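Alright, let's use ONE Mistral API call to be slightly smarter. We'll ask the LLM to pick versatile jobs that work for many personas. The prompt only gives the model the ID range, not the job descriptions, so treat its picks as guesses at best.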

In [None]:
# Set up a small Mistral helper (redefined here rather than imported from Tutorial 2)
import os
import dotenv

# Load API key
dotenv.load_dotenv("../.env")

# Check if we have the key
if not os.getenv("MISTRAL_API_KEY"):
    print("⚠️ No API key found. This version needs Mistral API.")
    print("Skipping smart matcher - use random matcher instead.")
else:
    print("✅ API key loaded")
    
    # Quick implementation of call_mistral for this tutorial
    from strands.agent import Agent
    from strands.models.mistral import MistralModel
    
    def quick_mistral_call(prompt: str) -> str:
        """Super simple Mistral call - just get a response."""
        model = MistralModel(
            api_key=os.environ["MISTRAL_API_KEY"],
            model_id="mistral-small-latest",
            stream=False
        )
        agent = Agent(model=model)
        response = agent(prompt)
        return str(response)
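
If you re-run the notebook a few times, "one API call" quietly becomes several. A small on-disk cache keeps it at one. This is a sketch under assumptions: `cached_call` and the cache file location are hypothetical, and any function that maps a prompt string to a response string can be passed as `call_fn`.

```python
import json
from pathlib import Path


def cached_call(prompt, call_fn, cache_path):
    """Return the cached answer for `prompt`, calling `call_fn` only on a miss."""
    cache_file = Path(cache_path)
    if cache_file.exists():
        cache = json.loads(cache_file.read_text(encoding="utf-8"))
    else:
        cache = {}
    if prompt not in cache:
        cache[prompt] = call_fn(prompt)  # the only place an API call can happen
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache[prompt]
```

With the helper above you would write something like `cached_call(prompt, quick_mistral_call, "../cache/mistral.json")`; delete the file whenever you want a fresh answer.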

In [None]:
def smart_lazy_matcher_v3():
    """
    Use ONE API call to pick good default jobs.
    Then assign them semi-randomly.
    
    Still lazy, but at least the jobs might be relevant.
    """
    if not os.getenv("MISTRAL_API_KEY"):
        print("No API key - falling back to random")
        return random_matcher_v2()
    
    # ONE API call to pick versatile jobs
    prompt = """
    We have 200 green jobs in Brazil (job_001 to job_200) for young people.
    Pick 5 job IDs that would work for beginners interested in sustainability.
    Just list the IDs like: job_001, job_002, etc.
    """
    
    print("🤖 Making our ONE API call...")
    response = quick_mistral_call(prompt)
    print(f"Response: {response[:100]}...")
    
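    # Parse job IDs from response (with fallback)
    import re
    # Keep each ID once, in order of appearance, and drop IDs outside job_001..job_200
    found = re.findall(r'job_(\d{3})(?!\d)', response)
    job_matches = list(dict.fromkeys(f"job_{n}" for n in found if 1 <= int(n) <= 200))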
    
    if not job_matches:
        # Fallback if parsing fails
        job_matches = [f"job_{i:03}" for i in [1, 5, 10, 15, 20]]
        print("⚠️ Couldn't parse response, using defaults")
    else:
        print(f"✅ Found {len(job_matches)} job recommendations")
    
    # Similarly for trainings (or just use defaults)
    training_ids = [f"training_{i:03}" for i in range(1, 11)]
    
    # Now distribute these "smart" picks across personas
    results = []
    
    for i in range(1, 101):
        persona_id = f"persona_{i:03}"
        
        # Deterministic split by persona index (only the job and training picks are random)
        if i % 10 == 0:  # Every 10th persona
            # Awareness (too young)
            results.append({
                "persona_id": persona_id,
                "predicted_type": "awareness",
                "predicted_items": "too_young"
            })
        elif i % 5 == 0:  # Every 5th persona
            # Training only
            results.append({
                "persona_id": persona_id,
                "predicted_type": "trainings_only",
                "trainings": random.sample(training_ids, k=2)
            })
        else:
            # Jobs from our "smart" selection
            selected_job = random.choice(job_matches)
            results.append({
                "persona_id": persona_id,
                "predicted_type": "jobs+trainings",
                "jobs": [
                    {
                        "job_id": selected_job,
                        "suggested_trainings": random.sample(training_ids, k=1)
                    }
                ]
            })
    
    return results

# Generate "smart" predictions
results_v3 = smart_lazy_matcher_v3()

print("\n📊 Smart Lazy Matcher v3 Stats:")
print(f"  Total predictions: {len(results_v3)}")
print(f"  API calls: 1")
print(f"  Estimated cost: ~$0.0001")
print(f"  Expected score: Slightly better?")
print(f"  Intelligence level: Barely any")

In [None]:
# Validate, save, and submit v3
validate_submission_format(results_v3)
save_json("../submissions/smart_lazy_v3.json", results_v3)

print("\n🚀 Submitting Smart Lazy Matcher v3...")
response = send_results(results_v3, dry_run=False)

if response and response.status_code == 200:
    print("\n💡 Even tiny improvements count!")

## Part 8: Testing Your Code

**Production mindset**: Even competition code needs tests. They save debugging time and catch silly mistakes.

In [None]:
# Run our test suite
!cd .. && python -m pytest tests/test_utils.py -v

In [None]:
# Run with coverage to see what we're testing
!cd .. && python -m pytest tests/ --cov=src --cov-report=term-missing
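
The contents of `tests/test_utils.py` are not shown here, so as an illustration, this is roughly what a pytest-style test for a matcher could look like. `tiny_matcher` is a stand-in defined only for this example; in your project you would import your own matcher instead.

```python
def tiny_matcher(n_personas=100):
    """Stand-in for a real matcher: one fixed job for every persona."""
    return [
        {
            "persona_id": f"persona_{i:03}",
            "predicted_type": "jobs+trainings",
            "jobs": [{"job_id": "job_001", "suggested_trainings": []}],
        }
        for i in range(1, n_personas + 1)
    ]


def test_one_prediction_per_persona():
    results = tiny_matcher()
    ids = [r["persona_id"] for r in results]
    assert len(ids) == 100
    assert len(set(ids)) == 100  # no persona appears twice


def test_persona_ids_are_zero_padded():
    results = tiny_matcher()
    assert results[0]["persona_id"] == "persona_001"
    assert results[-1]["persona_id"] == "persona_100"
```

pytest collects any function whose name starts with `test_`, so dropping these into a file under `tests/` is enough for the commands above to pick them up.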

## Part 9: The Psychology of Progress

### What we just accomplished:

1. **You're on the leaderboard** - That's huge! Most people are still planning.
2. **You understand the system** - Format, submission process, scoring.
3. **You have a baseline** - Everything from here is improvement.
4. **You built infrastructure** - Tests, utilities, proper structure.

### Your scores so far (rough estimates):

| Version | API Calls | Cost | Expected Score | Lesson |
|---------|-----------|------|----------------|--------|
| Lazy v1 | 0 | $0.00 | ~1% | Baseline established |
| Random v2 | 0 | $0.00 | ~3% | Variety helps |
| Smart Lazy v3 | 1 | ~$0.0001 | ~5% | Tiny intelligence helps |

### Why starting simple matters:

- **Momentum > Perfection**: You've already submitted 3 times while others are still reading docs
- **Learning by doing**: Each submission teaches you something
- **Fail fast, improve faster**: Your 4th submission will be way better
- **Psychological wins**: Being on the leaderboard motivates you to improve

### Common mistakes to avoid:

1. **Over-engineering v1**: Don't build a complex system before understanding the problem
2. **Ignoring tests**: That one typo can waste hours of compute
3. **Not saving submissions**: You'll want to analyze what worked later
4. **Perfectionism**: Ship garbage, then iterate

---

## Your Homework

### Must do:
1. Submit at least 3 different approaches today
2. Track your scores in a spreadsheet
3. Share your (terrible) scores in the Teams channel - embrace it!

### Should do:
1. Write a test for your matching logic
2. Create `src/matchers.py` for your better algorithms
3. Try using 2-3 API calls strategically

### Could do:
1. Analyze which personas you're getting wrong
2. Look at the job/training data files
3. Start thinking about Tutorial 4's intelligent approach
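
If a spreadsheet feels like too much ceremony, a CSV file that you append to after each submission does the same job and opens in any spreadsheet tool. A minimal sketch; `log_score` and the column names are assumptions, not part of the provided utilities.

```python
import csv
from pathlib import Path

FIELDS = ["version", "api_calls", "cost_usd", "score_pct"]


def log_score(csv_path, version, api_calls, cost_usd, score_pct):
    """Append one submission's numbers to a CSV file, adding a header if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "version": version,
            "api_calls": api_calls,
            "cost_usd": cost_usd,
            "score_pct": score_pct,
        })
```

Call it once per submission with the score you read off the leaderboard, and you have a history you can sort and chart later.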

---

## Cost Tracking Summary

Let's be real about costs:

| Approach | API Calls | Total Cost | Score | Cost per % |
|----------|-----------|------------|-------|------------|
| Lazy v1 | 0 | $0.00 | ~1% | $0.00 |
| Random v2 | 0 | $0.00 | ~3% | $0.00 |
| Smart Lazy v3 | 1 | ~$0.0001 | ~5% | $0.00002 |
| Tutorial 4 (preview) | ~200 | ~$0.20 | ~40% | $0.005 |

**Key insight**: Going from 0% to 5% costs almost nothing. Going from 40% to 60% will cost way more.
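
The "Cost per %" column is just total cost divided by score. Here is the arithmetic as a short sketch, using the estimated figures from the table above (they are this tutorial's guesses, not measured results):

```python
def cost_per_point(total_cost_usd, score_pct):
    """Dollars spent per percentage point of score."""
    if score_pct <= 0:
        raise ValueError("score must be positive")
    return total_cost_usd / score_pct


# (approach, estimated total cost in USD, estimated score in %)
rows = [
    ("Lazy v1", 0.0, 1),
    ("Random v2", 0.0, 3),
    ("Smart Lazy v3", 0.0001, 5),
    ("Tutorial 4 (preview)", 0.20, 40),
]

for name, cost, score in rows:
    print(f"{name}: ${cost_per_point(cost, score):.5f} per point")
```

Once you have real scores, swap them in and watch how quickly the cost per point climbs as the easy gains run out.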

---

## What's Next?

### Tutorial 4: Building Real LLM-Based Matching

Now that you understand the basics, Tutorial 4 will teach you:
- **AI Agents**: Conversation agents that actually talk to personas
- **Information Extraction**: Getting structured data from conversations
- **Intelligent Matching**: Real skill-based job matching
- **Cost Optimization**: Being smart about API usage

But for now, celebrate! You're on the leaderboard. You shipped code. You're ahead of everyone who is still planning.

**Remember**: 
- Bad code that ships > Perfect code that doesn't
- Your score will improve dramatically in Tutorial 4
- The real learning happens through iteration

See you in Tutorial 4, where we'll build something actually intelligent! 🚀