# Tutorial 3: Your First Submission - Get on the Leaderboard in 15 Minutes!

Let's create a first submission and get on the leaderboard.

## Today's mission

- Submit SOMETHING that works (even if it's terrible)
- Learn why notebooks suck for production (and how to fix it)
- Write actual tests (yes, even for a hackathon)
- Understand the scoring by failing fast

Your first submission will score terribly. Ship it anyway. You learn more from one bad submission than from endless planning.

---

## Part 1: Project Structure (The Right Way™)

Remember Tutorial 2 where we just dumped everything in the notebook? Yeah, that doesn't scale. Let's be slightly more professional.

**Our structure:**
```
GDSC-8/
├── src/           # Reusable code goes here
│   └── utils.py   # We've created this for you!
├── tests/         # Yes, we're writing tests
│   └── test_utils.py
└── tutorials/     # Notebooks for exploration
```

**📝 Note**: We've created `src/utils.py` for you with submission and validation functions. This keeps the tutorial focused on the matching logic, but definitely peek at the source code to understand what's happening under the hood!

**Why this structure matters:**
- Notebooks = great for exploration, terrible for production
- Modules = testable, reusable, version-controllable
- Tests = confidence that your code actually works
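To make the testing point concrete, here is a minimal sketch of what a file in `tests/` could look like. The `is_valid_record` helper is a hypothetical stand-in written for this example; it is not the real `validate_submission_format` from `src/utils.py`, and the checks it performs are assumed from the submission records used later in this tutorial.

```python
# Hypothetical stand-in validator; the real checks live in src/utils.py.
ALLOWED_TYPES = {"jobs+trainings", "trainings_only", "awareness"}


def is_valid_record(record: dict) -> bool:
    """Return True if a record has a string persona_id and a known predicted_type."""
    if not isinstance(record.get("persona_id"), str):
        return False
    return record.get("predicted_type") in ALLOWED_TYPES


# pytest collects functions whose names start with "test_".
def test_accepts_minimal_jobs_record():
    record = {"persona_id": "persona_001", "predicted_type": "jobs+trainings", "jobs": []}
    assert is_valid_record(record)


def test_rejects_unknown_type():
    record = {"persona_id": "persona_001", "predicted_type": "jobs"}
    assert not is_valid_record(record)


def test_rejects_missing_persona_id():
    assert not is_valid_record({"predicted_type": "awareness"})
```

Saved under `tests/`, a file like this can be run with `pytest tests/`. Three small tests like these are often enough to catch a malformed record before it costs you a submission.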

Let's import our utilities:

In [1]:
# Setup imports - the professional way
import sys

# Add parent directory to path so we can import from src/
sys.path.append('..')

# Import our utilities
from src.utils import (
    make_submission,
    save_json,
    validate_submission_format,
    sanity_check
)

print("✅ Imported from src/ like a pro")
print("💡 Production tip: Keep reusable code in modules, not notebooks")

✅ Imported from src/ like a pro
💡 Production tip: Keep reusable code in modules, not notebooks


## Quick Sanity Check

Before we submit anything, let's make sure our AWS connection works:

In [2]:
# Check API connection
# TODO for testers: If this fails, check AWS credentials and network access

if sanity_check():
    print("\n🎯 Ready to submit!")
else:
    print("\n⚠️ You are not connected to the API")
    print("\nIf you are using a SageMaker notebook, the AWS credentials should be set by default and the sanity check should pass. If not, contact the organizing team.")
    print("\nIf you are running this locally, make sure your AWS credentials are configured correctly. Run `aws sts get-caller-identity` to verify.")

❌ API check failed with status: 403

⚠️ Fix your AWS credentials first!
Check Tutorial 1 for setup instructions
💡 Tip: Use Copilot (Ctrl+I) to help debug connection issues


## Part 2: The World's Laziest Matcher

Let's build the absolute minimum viable submission. Everyone gets the same job. Zero intelligence. Zero API calls. Maximum speed.

**Why start here?**
- Understand the submission format
- Test the full pipeline
- Get that psychological win of being on the leaderboard
- Establish a baseline (spoiler: it'll be bad)

## Understanding the Scoring System (This is Critical!)

Before we build anything, let's understand how your submission is actually scored. This knowledge alone can boost your score by 10-20%.

### The Two-Part Scoring Formula

Your final score = **50% Type Accuracy + 50% Recommendation Accuracy**

**Part 1: Type Accuracy (50% of your score)**
- Did you correctly predict if the persona needs `jobs+trainings`, `trainings_only`, or `awareness`?
- This is binary - you either get it right (1.0) or wrong (0.0)
- **Hidden gotcha**: Minors (age < 16) should ALWAYS get `awareness` type with `predicted_items: "too_young"`

**Part 2: Recommendation Accuracy (50% of your score)**
How this is calculated depends on the type:

For `jobs+trainings`:
- 25% of total score: F1 score on job matches
- 25% of total score: F1 score on training suggestions per job
- Yes, training suggestions matter that much!

For `trainings_only`:
- 50% of total score: F1 score on training recommendations

For `awareness`:
- 50% of total score: Exact match on reason (e.g., "too_young")

### The Critical Insight

**If you get the type wrong, your recommendation score is ZERO!**

Example: You recommend perfect jobs for someone, but they actually needed `trainings_only`. Your score for that persona: 0%.

This is why understanding personas is crucial. Getting the type right is literally half your score.
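The formula can be sketched in code to see how the pieces combine. This is an illustration of the rules described above, not the official scorer. Several details are assumptions: the `trainings` key for `trainings_only` records, averaging the training F1 over the expected jobs, and treating two empty sets as a perfect match.

```python
def f1(predicted, expected):
    """Set-based F1 between predicted and expected IDs."""
    predicted, expected = set(predicted), set(expected)
    if not predicted and not expected:
        return 1.0  # Assumption: two empty sets count as a perfect match
    overlap = len(predicted & expected)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    return 2 * precision * recall / (precision + recall)


def score_persona(pred: dict, gold: dict) -> float:
    """Score one persona on a 0.0-1.0 scale using the 50/50 split."""
    if pred["predicted_type"] != gold["predicted_type"]:
        return 0.0  # Wrong type: no type credit and no recommendation credit

    score = 0.5  # Type accuracy
    kind = gold["predicted_type"]

    if kind == "awareness":
        score += 0.5 * (pred.get("predicted_items") == gold.get("predicted_items"))
    elif kind == "trainings_only":
        # Assumption: trainings are stored under a "trainings" key
        score += 0.5 * f1(pred.get("trainings", []), gold.get("trainings", []))
    else:  # jobs+trainings
        pred_jobs = {j["job_id"]: j["suggested_trainings"] for j in pred["jobs"]}
        gold_jobs = {j["job_id"]: j["suggested_trainings"] for j in gold["jobs"]}
        score += 0.25 * f1(pred_jobs, gold_jobs)
        # Assumption: training F1 is averaged over the expected jobs
        per_job = [f1(pred_jobs.get(job_id, []), trainings)
                   for job_id, trainings in gold_jobs.items()]
        score += 0.25 * (sum(per_job) / len(per_job) if per_job else 1.0)
    return score
```

Under these assumptions, a prediction with the right type and the right job but an empty `suggested_trainings` list scores 0.75 for that persona when the expected job has trainings attached. That is the 25% the training suggestions are worth.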

### Quick Win Strategy

1. **Check ages first** - Anyone under 16 → `awareness` with `predicted_items: "too_young"`
2. **Default to `jobs+trainings`** - Most personas want jobs
3. **Always include training suggestions** - They're 25% of your score for job recommendations!
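
The first two rules can be sketched as a small helper. Where the persona's age comes from is left open here (the `age` argument is an assumption, since this tutorial does not load persona data), and the layout of the `awareness` record is inferred from the `predicted_items: "too_young"` wording above.

```python
from typing import Optional

MIN_AGE = 16  # From the scoring notes: anyone under 16 gets "awareness"


def baseline_record(persona_id: str, age: Optional[int], default_job_id: str) -> dict:
    """Build one submission record using only the age rule and a default job."""
    if age is not None and age < MIN_AGE:
        return {
            "persona_id": persona_id,
            "predicted_type": "awareness",
            "predicted_items": "too_young",
        }
    return {
        "persona_id": persona_id,
        "predicted_type": "jobs+trainings",
        # An empty list earns no training credit; fill in real training IDs later
        "jobs": [{"job_id": default_job_id, "suggested_trainings": []}],
    }
```

An unknown age (`None`) falls through to the `jobs+trainings` default, matching rule 2.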

Now let's build our submissions with this knowledge...

In [3]:
def lazy_matcher_v1():
    """
    The laziest possible solution that still works.
    Everyone gets job_001. No personalization. No intelligence.

    This is your baseline - everything else should beat this.
    """
    results = []

    for i in range(1, 101):
        results.append({
            "persona_id": f"persona_{i:03}",
            "predicted_type": "jobs+trainings",
            "jobs": [
                {
                    "job_id": "job_001",
                    "suggested_trainings": []  # No training suggestions
                }
            ]
        })

    return results

# Generate our terrible predictions
results_v1 = lazy_matcher_v1()

print("📊 Lazy Matcher v1 Stats:")
print(f"  Predictions: {len(results_v1)}")
print(f"  Unique jobs recommended: 1")
print(f"  API calls made: 0")
print(f"  Cost: $0.00")
print(f"  Expected score: Terrible")
print(f"  Time to implement: 30 seconds")

📊 Lazy Matcher v1 Stats:
  Predictions: 100
  Unique jobs recommended: 1
  API calls made: 0
  Cost: $0.00
  Expected score: Terrible
  Time to implement: 30 seconds


## Part 3: Validate Before Submitting

**Pro tip**: Always validate your format before submitting. Catching errors locally is free. Debugging failed submissions is painful.

In [4]:
# Always validate before submitting!
try:
    validate_submission_format(results_v1)
    print("✅ Format is valid! Ready to submit")
except ValueError as e:
    print(f"❌ Format error: {e}")
    print("Fix this before submitting!")

✅ Validated 100 results - format is correct!
✅ Format is valid! Ready to submit


## Part 4: Save Your Work

**Best practice**: Always save your submissions. You'll want to compare different approaches later.

### Understanding the Submission Format

**What's JSONL?** It's just JSON objects, one per line. Perfect for streaming large datasets without loading everything into memory.

```jsonl
{"persona_id": "persona_001", "predicted_type": "jobs+trainings", "jobs": [...]}
{"persona_id": "persona_002", "predicted_type": "jobs+trainings", "jobs": [...]}
```

[Learn more about JSONL format](https://jsonlines.org/) or ask Copilot (Ctrl+I) to explain the difference between JSON and JSONL.

Our `save_json()` function handles the conversion automatically!
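
For intuition, writing and reading JSONL needs nothing beyond the standard library. The sketch below is not the actual `save_json()` implementation from `src/utils.py`; `write_jsonl` and `read_jsonl` are hypothetical names used only for this example.

```python
import json


def write_jsonl(path: str, records: list) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> list:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Because each line is a complete JSON object, a reader can process the file line by line instead of parsing one large array.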

In [5]:
# Create submissions directory and save
save_json("../submissions/lazy_v1.json", results_v1)
print("💾 Saved to submissions/lazy_v1.json")
print("\n💡 Tip: Keep all your submissions for comparison")

💾 Saved to submissions/lazy_v1.json

💡 Tip: Keep all your submissions for comparison


## Part 5: Submit to Leaderboard!

This is it. The moment of truth. Let's get you on that leaderboard.

In [6]:
# Now for real - submit to the leaderboard!
# TODO for testers: Verify AWS endpoint is accessible from your network
# If you get timeout errors, check with the org team on Teams

print("🚀 ACTUAL SUBMISSION...")
response = make_submission(results_v1, dry_run=False)

if response and response.status_code == 200:
    print("\n🎉 CONGRATULATIONS! You're on the leaderboard!")
    print("Go check your score at: [leaderboard URL]")
    print("(Yes, it's probably terrible. That's the point!)")
else:
    print("\n😅 Something went wrong. Check the error message above.")
    print("Common issues:")
    print("  - Network timeout: Check your connection or VPN")
    print("  - 403 error: Invalid AWS credentials, or you've hit the daily submission limit")
    print("  - Format error: Run validation again")
    print("\nReach out on Teams if you need help!")

🚀 ACTUAL SUBMISSION...
✅ Validated 100 results - format is correct!
❌ Submission failed with status: 403
Response: {"message":"The security token included in the request is invalid."}

😅 Something went wrong. Check the error message above.
Common issues:
  - Network timeout: Check your connection or VPN
  - 403 error: Invalid AWS credentials, or you've hit the daily submission limit
  - Format error: Run validation again

Reach out on Teams if you need help!
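
A 403 can mean more than one thing, so it helps to read the response body. The helper below is a hedged sketch: `explain_failure` is a hypothetical function, and the substrings it looks for are taken only from the two 403 response bodies that appear in this tutorial's outputs. The API may return other messages.

```python
def explain_failure(status_code: int, body: str) -> str:
    """Map a failed submission response to a likely cause."""
    text = body.lower()
    if status_code == 403 and "security token" in text:
        return "AWS credentials are invalid or expired"
    if status_code == 403 and "submission limit" in text:
        return "Daily submission limit reached"
    if status_code == 403:
        return "Access denied (check credentials and submission limit)"
    return f"Unexpected status {status_code}"


print(explain_failure(403, '{"message":"The security token included in the request is invalid."}'))
```

With a `requests`-style response object, this would typically be called as `explain_failure(response.status_code, response.text)`.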


### 🎉 Quick Win Celebration!

If your submission worked, **CELEBRATE!** You just:
- Went from idea to submission in ~15 minutes
- Beat everyone still reading documentation
- Got real feedback from a real system
- Established your baseline for improvement

**Your 1% score is a badge of honor** - it means you shipped! Share it in Teams with pride. Remember: every top competitor started with a terrible first submission.

Now let's make it better...

### Getting to know the GDSC API

You just sent your results to an endpoint using the `make_submission()` function from `src/utils.py`. To learn more about the GDSC API and its endpoints, see `GDSC API Documentation.md`.

## Part 6: Real LLM Matcher - Load ALL Job Data

Now let's do something that actually works: load ALL 200 job descriptions and ask the LLM to make informed recommendations.

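**Warning**: This sends ~55,000 tokens in a single API call. At the `mistral-small` prices used below, that is roughly $0.005-0.006 per call, but the cost multiplies quickly if you repeat it per persona. At least it's based on real data!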
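
A quick way to sanity-check any cost figure is to multiply token counts by the per-million prices. The sketch below uses the `mistral-small` prices quoted in this tutorial's code ($0.10 per 1M input tokens, $0.30 per 1M output tokens) and the token counts from the run shown later in this tutorial. `estimate_cost` is a hypothetical helper, and the prices may change.

```python
# mistral-small prices used in this tutorial (USD per 1M tokens)
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one API call."""
    input_cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Token counts from this tutorial's run: 54,703 input, 73 output
print(f"${estimate_cost(54_703, 73):.4f}")  # prints $0.0055
```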

In [7]:
# Import our Mistral helper from Tutorial 2
import os
import dotenv
import time

# Load API key
dotenv.load_dotenv("../.env")

# Check if we have the key
if not os.getenv("MISTRAL_API_KEY"):
    print("⚠️ No API key found. This version needs Mistral API.")
    print("Check Tutorial 2 for setup instructions")
else:
    print("✅ API key loaded")

    # Implementation with actual token tracking (like Tutorial 2)
    from strands import Agent
    from strands.models.mistral import MistralModel

    def call_mistral_with_metrics(prompt: str, model: str = "mistral-small-latest") -> dict:
        """Call Mistral API and track actual token usage and costs"""
        mistral_model = MistralModel(
            api_key=os.environ["MISTRAL_API_KEY"],
            model_id=model,
            stream=False
        )
        agent = Agent(model=mistral_model)
        start_time = time.time()

        try:
            response = agent(prompt)
            end_time = time.time()

            # Extract actual token counts and calculate real costs
            result = {
                "content": response.message['content'][0]['text'],
                "model": model,
                "duration": end_time - start_time,
                "input_tokens": response.metrics.accumulated_usage['inputTokens'],
                "output_tokens": response.metrics.accumulated_usage['outputTokens'],
                "total_tokens": response.metrics.accumulated_usage['totalTokens']
            }

            # Calculate actual costs based on Mistral pricing
            # mistral-small: $0.10 per 1M input tokens, $0.30 per 1M output tokens
            input_cost = (result['input_tokens'] / 1_000_000) * 0.10
            output_cost = (result['output_tokens'] / 1_000_000) * 0.30
            result['total_cost'] = input_cost + output_cost

            return result

        except Exception as e:
            print(f"❌ API call failed: {e}")
            return None

    print("✅ Mistral client ready with token tracking!")

✅ API key loaded
✅ Mistral client ready with token tracking!


In [8]:
def load_all_jobs():
    """Load all 200 job descriptions from the data folder."""
    import glob
    import os

    jobs_data = {}
    job_files = glob.glob("../data/jobs/*.md")

    print(f"📂 Loading {len(job_files)} job files...")

    for filepath in sorted(job_files):
        # Extract job ID from filename (e.g., job_acc_001.md -> job_acc_001)
        # os.path handles both "/" and "\" separators, so this also works on Windows
        job_id = os.path.splitext(os.path.basename(filepath))[0]

        with open(filepath, 'r', encoding='utf-8') as f:
            content = f.read()
            jobs_data[job_id] = content

    print(f"✅ Loaded {len(jobs_data)} job descriptions")

    # Calculate approximate token count (rough estimate for planning)
    total_chars = sum(len(content) for content in jobs_data.values())
    approx_tokens = total_chars // 4  # Rough estimate: 1 token ≈ 4 chars

    print(f"📊 Total content size: {total_chars:,} characters")
    print(f"🎯 Estimated tokens: {approx_tokens:,}")
    print(f"⚠️  We'll see the ACTUAL token count after the API call")

    return jobs_data

In [9]:
def smart_matcher_with_real_data():
    """
    ACTUALLY smart matcher - loads real job data and asks LLM to recommend based on content.

    Yes, this is expensive. But at least it's not blind guessing!
    """
    # Load all job data
    jobs_data = load_all_jobs()

    # Build a massive prompt with all job descriptions
    prompt = """You are a career counselor for Brazilian youth interested in green jobs.

Below are 200 actual job descriptions. Based on these REAL jobs, pick the 5 jobs that would be most suitable for the widest range of young Brazilian job seekers (ages 18-25).

Choose jobs that:
- Have reasonable entry requirements
- Offer growth potential
- Are in the green/sustainability sector
- Are geographically distributed across Brazil

Just respond with the 5 job IDs, one per line, like:
job_xxx_001
job_xxx_002
job_xxx_003
job_xxx_004
job_xxx_005

Here are all 200 job descriptions:

"""

    # Add all job descriptions to prompt
    for job_id, content in jobs_data.items():
        prompt += f"\n=== {job_id} ===\n{content}\n"

    print(f"\n🚨 MASSIVE PROMPT ALERT!")
    print(f"   Prompt length: {len(prompt):,} characters")
    print(f"\n   This is the 'brute force' approach - let's see the actual cost...")

    # Make the API call with token tracking
    print("\n🤖 Making API call with ALL job data...")
    result = call_mistral_with_metrics(prompt)

    if not result:
        print("❌ API call failed!")
        return [], 0

    # Show actual token usage and costs
    print(f"\n📊 ACTUAL API USAGE:")
    print(f"   Input tokens: {result['input_tokens']:,}")
    print(f"   Output tokens: {result['output_tokens']:,}")
    print(f"   Total tokens: {result['total_tokens']:,}")
    print(f"   Duration: {result['duration']:.2f} seconds")
    print(f"\n💰 ACTUAL COST: ${result['total_cost']:.4f}")
    print(f"   (Input: ${(result['input_tokens'] / 1_000_000) * 0.10:.4f} + Output: ${(result['output_tokens'] / 1_000_000) * 0.30:.4f})")

    print(f"\n📝 Response preview: {result['content'][:200]}...")

    # Parse the response to get job IDs
    import re
    selected_jobs = re.findall(r'job_\w+_\d{3}', result['content'])

    print(f"\n✅ LLM recommended these {len(selected_jobs)} universal jobs:")
    for job in selected_jobs:
        print(f"   - {job}")

    # Build results for all 100 personas - everyone gets one of these 5 jobs
    results = []

    for i in range(1, 101):
        persona_id = f"persona_{i:03}"

        # Give everyone one of the 5 recommended jobs (rotate through them)
        if selected_jobs:
            selected_job = selected_jobs[i % len(selected_jobs)]
        else:
            selected_job = "job_001"  # Fallback if parsing failed

        results.append({
            "persona_id": persona_id,
            "predicted_type": "jobs+trainings",
            "jobs": [{
                "job_id": selected_job,
                "suggested_trainings": []  # Keep it simple - no training suggestions
            }]
        })

    return results, result['total_cost']
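
LLM output is free text, so the parsing step deserves a little defense. The sketch below is one possible hardening of the regex step: it keeps IDs in order of first appearance, drops duplicates, and caps the list. `parse_job_ids` is a hypothetical helper, and the pattern assumes IDs look like `job_<letters>_<three digits>`, as in the sample output of this tutorial.

```python
import re

# Assumption: job IDs follow the job_<letters>_<3 digits> shape seen in the data
JOB_ID_PATTERN = re.compile(r"\bjob_[a-z]+_\d{3}\b")


def parse_job_ids(text: str, limit: int = 5) -> list:
    """Extract job IDs in order of first appearance, without duplicates."""
    unique_ids = list(dict.fromkeys(JOB_ID_PATTERN.findall(text)))
    return unique_ids[:limit]


reply = "Here are my picks:\njob_acc_002\njob_cul_003\njob_acc_002"
print(parse_job_ids(reply))
```

Deduplicating matters for the rotation logic: a repeated ID would otherwise be handed to more personas than the others.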

In [10]:
# Generate predictions with REAL data
print("=" * 60)
print("🎯 SMART MATCHER WITH REAL JOB DATA")
print("=" * 60)

results_v2, actual_cost = smart_matcher_with_real_data()

print("\n📊 Smart Matcher v2 Stats:")
print(f"  Total predictions: {len(results_v2)}")
print(f"  Unique jobs recommended: 5 (rotated across all personas)")
print(f"  API calls: 1 (but a HUGE one)")
print(f"  ACTUAL cost: ${actual_cost:.4f}")
print(f"  Expected score: Much better! (10-20%?)")
print(f"  Intelligence level: Knows the jobs, but not the personas")

print("\n🌱 Green Computing Note:")
print(f"  We just used 54k tokens for ONE submission!")
print(f"  For 100 personas individually, that would be 5.4M tokens")
print(f"  This is why optimization matters - both for cost AND environment")


🎯 SMART MATCHER WITH REAL JOB DATA
📂 Loading 200 job files...
✅ Loaded 200 job descriptions
📊 Total content size: 298,459 characters
🎯 Estimated tokens: 74,614
⚠️  We'll see the ACTUAL token count after the API call

🚨 MASSIVE PROMPT ALERT!
   Prompt length: 303,438 characters

   This is the 'brute force' approach - let's see the actual cost...

🤖 Making API call with ALL job data...
Based on the job descriptions provided, here are the 5 most suitable jobs for young Brazilian job seekers (ages 18-25) interested in green jobs:

job_acc_002
job_cul_003
job_des_003
job_fib_003
job_vis_002
📊 ACTUAL API USAGE:
   Input tokens: 54,703
   Output tokens: 73
   Total tokens: 54,776
   Duration: 2.82 seconds

💰 ACTUAL COST: $0.0055
   (Input: $0.0055 + Output: $0.0000)

📝 Response preview: Based on the job descriptions provided, here are the 5 most suitable jobs for young Brazilian job seekers (ages 18-25) interested in green jobs:

job_acc_002
job_cul_003
job_des_003
job_fib_003
job_vi...

✅ L... (output truncated)
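
Note the gap between the estimate (74,614 tokens) and the actual input count (54,703 tokens): the "4 characters per token" rule overshot for this data. One way to tighten future estimates is to calibrate the ratio from a real call. This is a rough sketch with hypothetical helper names. The observed ratio will vary with language, content, and model.

```python
def calibrate_chars_per_token(prompt_chars: int, actual_input_tokens: int) -> float:
    """Characters per token observed in a real API call."""
    return prompt_chars / actual_input_tokens


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate tokens for new text using a chars-per-token ratio."""
    return int(len(text) / chars_per_token)


# Numbers from the run above: 303,438 prompt characters, 54,703 input tokens
ratio = calibrate_chars_per_token(303_438, 54_703)
print(f"Observed ratio: {ratio:.2f} chars per token")
```

For this run the ratio comes out at roughly 5.5 characters per token rather than 4, so passing it as `chars_per_token` should give closer estimates for similar prompts.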

In [11]:
# Validate, save, and submit v2
validate_submission_format(results_v2)
save_json("../submissions/smart_v2.json", results_v2)

print("\n🚀 Submitting Smart Matcher v2...")
response = make_submission(results_v2)

if response and response.status_code == 200:
    print("\n📈 This should score MUCH better than the lazy matcher!")
    print("But look at that cost... time to optimize in Tutorial 4!")

✅ Validated 100 results - format is correct!

🚀 Submitting Smart Matcher v2...
✅ Validated 100 results - format is correct!
❌ Submission failed with status: 403
Response: {"error": "Submission limit of 2 reached for today for this team"}


## Your Homework - Real Exercises This Time!

### Exercise 1: Investigate the LLM's Job Choices
```python
def analyze_llm_job_picks():
    """
    In the smart matcher, the LLM picked specific jobs (in the run above:
    job_acc_002, job_cul_003, job_des_003, job_fib_003, job_vis_002).
    Your task: Actually READ those job files and figure out WHY.
    
    Questions to answer:
    - What do these jobs have in common?
    - Are they really entry-level friendly?
    - Do they mention sustainability/green aspects?
    - What locations do they cover?
    
    Bonus: Compare them to 5 random other jobs. What's different?
    """
    import glob
    
    # The jobs the LLM picked (update this based on your run!)
    llm_picks = ["job_acc_002", "job_cul_003", "job_des_003", "job_fib_003", "job_vis_002"]
    
    # Load and analyze the job files
    for job_id in llm_picks:
        filepath = f"../data/jobs/{job_id}.md"
        # Your analysis code here
        pass
    
    # Compare with random jobs
    # What patterns do you see?
```

### Exercise 2: Cost vs Randomness
```python
def calculate_random_score():
    """
    If you randomly assigned one job to each persona,
    what would your expected score be?
    
    Hint: Better than 1% (not everyone gets the same job)
          Worse than 20% (no intelligence applied)
    
    Assumptions:
    - 200 jobs total
    - Each persona gets 1 random job
    - What's the probability of a random match?
    """
    # Your code here
    # Think about: P(correct domain) * P(correct seniority) * P(location match) * ...
    pass
```

### Exercise 3: Smarter Lazy Matcher
```python
def lazy_matcher_v2():
    """
    Improve the lazy matcher without using any API calls!
    
    Ideas:
    - Assign different jobs based on persona_id
      e.g., persona_001-020 get job_001, persona_021-040 get job_002
    - Or use modulo: the numeric part of persona_id % 10 maps to different jobs
    - Test if this improves your score over everyone getting job_001
    """
    results = []
    
    # Your implementation here
    # Remember: No API calls allowed in this version!
    
    return results
```

### Exercise 4: Token Economics 
```python
def compare_api_strategies():
    """
    Calculate the cost of these approaches:
    
    a) One API call with ALL jobs (what we did): ~$0.006
    b) 100 API calls (one per persona) with 10 jobs each: ~$???
    c) 10 API calls with 10 personas each batched: ~$???
    d) Smart caching: Process jobs once, reuse for all personas: ~$???
    
    Which is most cost-effective? Show your work!
    """
    
    # Constants (from Mistral pricing)
    COST_PER_M_INPUT = 0.10  # Small model
    COST_PER_M_OUTPUT = 0.30
    AVG_JOB_TOKENS = 300
    AVG_PERSONA_TOKENS = 200
    AVG_OUTPUT_TOKENS = 100
    
    # Calculate costs for each approach
    # Your code here
    
    pass
```

### Exercise 5: Submission Analysis (Advanced)
```python
def analyze_my_submissions():
    """
    Load your saved submissions and compare them:
    - How many unique jobs does each recommend?
    - What's the overlap between submissions?
    - Can you predict which will score better?
    
    Bonus: Write a function to merge two submissions,
    taking the best predictions from each!
    """
    import json
    
    # Load submissions/lazy_v1.json and submissions/smart_v2.json
    # Compare and analyze
    
    pass
```

### Must Do Today:
1. **Submit at least 3 different approaches** (you've already done 2!)
2. **Complete Exercise 1** - Understand WHY the LLM chose those jobs
3. **Track your scores** in a spreadsheet or text file
4. **Share your scores in Teams** - embrace the terrible scores!
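
For item 3, a plain CSV file is enough. The sketch below appends one row per submission. `log_score` and the column names are hypothetical choices for this example, and the score is whatever you read off the leaderboard.

```python
import csv
import os
from datetime import datetime


def log_score(path: str, name: str, score: float, cost: float) -> None:
    """Append one submission result to a CSV file, writing a header on first use."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "submission", "score", "cost_usd"])
        writer.writerow([datetime.now().isoformat(timespec="seconds"), name, score, cost])
```

The resulting file opens directly in any spreadsheet tool, which makes it easy to compare approaches later.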

### Should Do:
1. **Complete Exercise 3** - Can you beat 1% without API calls?
2. **Complete Exercise 4** - Understand the cost tradeoffs
3. **Peek at src/utils.py** - Use Copilot to understand the functions

### Could Do:
1. **Read more job files manually** - What patterns do you notice?
2. **Think about Tutorial 4** - How would you get to 40% accuracy?
3. **Help someone else** - Share tips in the Teams channel

## Cost Tracking Summary

Let's be real about costs:

| Approach | API Calls | Actual Cost | Expected Score | Key Learning |
|----------|-----------|-------------|----------------|--------------|
| Lazy v1 | 0 | $0.00 | ~1% | Baseline - no intelligence |
| Smart with Real Data | 1 | ~$0.006 | ~10-20% | Understanding jobs helps a lot |
| Tutorial 4 | ? | ? | ? | We'll explore smarter approaches |

### Key Insights So Far:

1. **Zero to hero for cheap** - One API call with all job data gives huge improvement
2. **The brute force approach works** - But sending 50k tokens isn't elegant
3. **Rotation isn't personalization** - We're giving everyone the same 5 jobs
4. **Room for improvement** - What if we actually understood each persona?

### Ideas for Optimization (Tutorial 4):

- Talk to individual personas to understand their needs
- Cache and reuse information intelligently  
- Use smaller, focused prompts instead of massive ones
- Build actual matching logic based on skills and constraints
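
As a taste of the caching idea, here is a minimal sketch of a prompt-keyed cache stored in a JSON file. It is not the approach Tutorial 4 will necessarily take. `cached_call`, the `call_llm` callable, and the `llm_cache.json` file name are all assumptions made for this example.

```python
import hashlib
import json
import os
from typing import Callable


def cached_call(prompt: str, call_llm: Callable[[str], str],
                cache_path: str = "llm_cache.json") -> str:
    """Return a cached response for this exact prompt, calling the LLM only on a miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "r", encoding="utf-8") as f:
            cache = json.load(f)

    # Hash the prompt so that very long prompts still give short, stable keys
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = call_llm(prompt)  # Only pay for prompts we have not seen
        with open(cache_path, "w", encoding="utf-8") as f:
            json.dump(cache, f, ensure_ascii=False)
    return cache[key]
```

Re-running a notebook cell with an unchanged prompt then costs nothing, which matters when a single prompt is 50k+ tokens.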

### The Real Challenge:

It's not about getting 100% accuracy. It's about finding the sweet spot:
- **Good enough accuracy** 
- **Reasonable cost**
- **Fast execution**

This is exactly what real consultants deal with: balancing quality, cost, and time.

---

## What's Next?

### Tutorial 4: Building Real LLM-Based Matching

Now that you understand the basics, Tutorial 4 will teach you:
- **AI Agents**: Conversation agents that actually talk to personas
- **Information Extraction**: Getting structured data from conversations
- **Intelligent Matching**: Real skill-based job matching
- **Cost Optimization**: Being smart about API usage

But for now, celebrate! You're on the leaderboard. You shipped code. You're ahead of 90% of participants who are still planning.

**Remember**: 
- Bad code that ships > Perfect code that doesn't
- Your score will improve dramatically in Tutorial 4
- The real learning happens through iteration

See you in Tutorial 4, where we'll build something actually intelligent! 🚀