<h1 align='center'>✨ Welcome to the Synthetic Data AI Agents Challenge ✨</h1>
<h2 align='center'>Hosted by AMD, PyTorch, and Unsloth</h2>


---
## Task
You will be building:
1.  **A question agent** that generates $N$ puzzle-based questions on the provided [topics](./assets/topics.json).
    - Create your model in [question_model.py](./agents/question_model.py) (it will be called by [question_agent.py](./agents/question_agent.py) for evaluation)
    - *Your question agent must output questions in the format specified in [sample_question.json](./assets/sample_question.json)*.
2. **An answer agent** that answers questions posed by the opponent's question agent.
    - Create your model in [answer_model.py](./agents/answer_model.py) (it will be called by [answer_agent.py](./agents/answer_agent.py) for evaluation)
    - *Your answer agent must output answers in the format specified in [sample_answer.json](./assets/sample_answer.json)*.
---


## Instructions

1. Read through this README.ipynb for more details on the challenge.
    - **Note:** If members of your team are working from the notebook simultaneously, please coordinate to ensure you do not overwrite each other's work.
1. Check out our [Synthetic Data Generation and Unsloth Tutorial](./tutorial.ipynb) for training tips and tricks.

## 📚 Table of Contents:
- 📝 [Task](#task)
- ⚙️ [Instructions](#instructions)
- 🏏 [Tournament Overview](#tournament-overview)
- 📋 [Guidelines](#guidelines)
    - [Format](#format-overview)
- 🛠️ [Submission](#submission)
- ⚠️ [Restrictions](#restrictions)
- 📂 [Directory & Files overview](#directory--files-overview)
- 🎮 [Getting started](#getting-started)
    - 🚀 [Env Setup](#environment-setup)
    - 🤔 [Q-Agent](#q-agent)
        - ✅ [Basic format-checks for questions from Q-agent](#basic-format-checks-for-questions-from-q-agent)
    - 🤖 [A-agent](#a-agent)
        - ✅ [Basic format-checks for answers from A-agent](#basic-format-checks-for-answers-from-a-agent)
- 🏅 [Evaluation](#evaluation)
    - 📊 [Scoring Criteria](#scoring-criteria)
    - 🧮 [Scoring Example](#scoring-example)
- ⏱ [Time Limit](#time-limit)
<!-- - 🏆 [LeaderBoard UI/UX](#leaderboard-uiux) -->

## Tournament Overview
<!-- 🏏  -->
* All matches in this tournament are **1v1** knockouts in which two teams, Team-A and Team-B, compete with their Q-agent (question agent) and A-agent (answer agent). Think of it as a cricket match or baseball game in which the teams switch sides.
* Like in cricket, each match has two innings:
    -   1st innings:
        *   $N$ questions from the Q-agent (Team-A) and their corresponding $N$ answers from the A-agent (Team-B).
        *   Q-agent score (Team-A): Say, $40$
        *   A-agent score (Team-B): $60$

    -   2nd innings:
        *   $N$ questions from the Q-agent (Team-B) and their corresponding $N$ answers from the A-agent (Team-A).
        *   Q-agent score (Team-B): Say, $70$
        *   A-agent score (Team-A): $30$
    -   Final Score:
        *   Team-A score $=$ 1st innings Q-agent score $+$ 2nd innings A-agent score $= 40 + 30 = 70$
        *   Team-B score $=$ 1st innings A-agent score $+$ 2nd innings Q-agent score $= 60 + 70 = 130$

    -   Winner: **Team-B** with a score of $130$.
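The match arithmetic above can be sketched in a few lines of Python. The function name `match_result` and its parameters are illustrative only and are not part of the challenge code.

```python
def match_result(q_a: float, a_b: float, q_b: float, a_a: float) -> tuple[float, float, str]:
    """Combine the four innings scores of one match into team totals.

    q_a: Team-A Q-agent score (1st innings)
    a_b: Team-B A-agent score (1st innings)
    q_b: Team-B Q-agent score (2nd innings)
    a_a: Team-A A-agent score (2nd innings)
    """
    team_a = q_a + a_a  # Team-A asks in innings 1 and answers in innings 2
    team_b = a_b + q_b  # Team-B answers in innings 1 and asks in innings 2
    if team_a > team_b:
        winner = "Team-A"
    elif team_b > team_a:
        winner = "Team-B"
    else:
        winner = "Tie"
    return team_a, team_b, winner


# Numbers from the example above
print(match_result(q_a=40, a_b=60, q_b=70, a_a=30))  # (70, 130, 'Team-B')
```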

For more info on how scoring is done, refer to the [scoring criteria section](#scoring-criteria).


## Guidelines
<!-- 📋  -->

### Format Overview
We will only consider responses from the Q-agent and the A-agent that follow the format below.

*Note*: Including an explanation/reasoning is a plus, but omitting it does not disqualify a question or answer.

#### Q-Agent
Given a topic, the Q-agent should generate questions in the specified JSON format:

```json
{
    "topic": "<Topic of the Question>",
    "question": "<full question text>",
    "choices": [
        "A) <choice A text>",
        "B) <choice B text>",
        "C) <choice C text>",
        "D) <choice D text>"
    ],
    "answer": "<correct choice letter only>",
    "explanation": "brief explanation within 100 words for why the answer is correct"
}
```

The `topic`, `question`, `choices`, and `answer` fields will be verified for correctness.
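As a minimal sketch, a question in this format can be built as a Python dict and serialized with `json.dumps`. The question content below is made up for illustration, and `looks_like_valid_question` is a hypothetical helper that covers only a few obvious structural checks; it is not the official validator.

```python
import json

# Hypothetical example question in the required format
question = {
    "topic": "Blood Relations and Family Tree",
    "question": "A is the father of B. B is the sister of C. How is A related to C?",
    "choices": [
        "A) Uncle",
        "B) Father",
        "C) Grandfather",
        "D) Brother",
    ],
    "answer": "B",
    "explanation": "B and C are siblings, so B's father A is also C's father.",
}


def looks_like_valid_question(q: dict) -> bool:
    """Hypothetical check: required keys, four lettered choices, one-letter answer."""
    required = ("topic", "question", "choices", "answer")
    if not all(k in q for k in required):
        return False
    choices = q["choices"]
    if not (isinstance(choices, list) and len(choices) == 4):
        return False
    # Each choice should start with its own letter, in order: "A)", "B)", ...
    if not all(isinstance(c, str) and c.startswith(f"{letter})") for c, letter in zip(choices, "ABCD")):
        return False
    return isinstance(q["answer"], str) and q["answer"] in ("A", "B", "C", "D")


print(looks_like_valid_question(question))  # True
print(json.dumps(question, indent=4))
```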

#### A-Agent
Given a question and its choices, the A-agent should produce an answer in the following format:

```json
{
    "answer": "<correct choice letter only>",
    "reasoning": "brief reasoning within 100 words for why the answer is correct"
}
```

The `answer` field will be compared with the `answer` field of the corresponding question from the opponent's Q-agent.
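Language models often wrap their JSON in extra text, such as a Markdown code fence. One possible post-processing step, sketched below under that assumption, is to extract the `{...}` span from the raw output and keep only the `answer` and `reasoning` fields. `parse_answer` is a hypothetical helper, not part of the provided `answer_agent.py`.

```python
import json
import re
from typing import Optional


def parse_answer(raw: str) -> Optional[dict]:
    """Extract {"answer": ..., "reasoning": ...} from raw model text.

    Returns None if no parseable JSON object with a string "answer" is found.
    """
    # Greedy match from the first '{' to the last '}' (assumes a single JSON object)
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not isinstance(obj.get("answer"), str):
        return None
    answer = obj["answer"].strip()
    if len(answer) == 1:
        answer = answer.upper()  # normalize a bare letter, e.g. "b" -> "B"
    return {"answer": answer, "reasoning": str(obj.get("reasoning", ""))}


raw_output = 'Here is my answer:\n```json\n{"answer": "b", "reasoning": "A is C\'s father."}\n```'
print(parse_answer(raw_output))  # {'answer': 'B', 'reasoning': "A is C's father."}
```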

## Submission
<!-- 🛠️  -->
Your submission consists of your agent code. Please note the following:
1. All work must be within the `AIAC` folder. Do NOT change the folder name.
1. You do not need to upload anything; we will collect your agent code from your Jupyter server at the end of the challenge.
   1. The agents will be called by `python -m agents.question_agent` and `python -m agents.answer_agent`, respectively.
1. **Ensure** that your model checkpoint(s) (e.g., `model.safetensors`, `.pt`, or `.pth`) load correctly and that the Q-agent and A-agent generate the expected output files during inference.
   1. Outputs must be saved to `outputs/questions.json` and `outputs/answers.json`, respectively.
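Before the deadline, a quick sanity check like the sketch below can confirm that both output files exist and contain a JSON list. It assumes each file holds a JSON array, as the `questions.json` and `answers.json` files loaded in the Getting Started section do; `check_output_file` is a hypothetical helper, not part of the judging system.

```python
import json
from pathlib import Path


def check_output_file(path: str) -> bool:
    """Return True if `path` exists and contains a non-empty JSON list."""
    p = Path(path)
    if not p.is_file():
        print(f"Missing: {path}")
        return False
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        print(f"Invalid JSON in {path}: {err}")
        return False
    if not isinstance(data, list) or len(data) == 0:
        print(f"Expected a non-empty JSON list in {path}")
        return False
    print(f"OK: {path} ({len(data)} entries)")
    return True


for path in ["outputs/questions.json", "outputs/answers.json"]:
    check_output_file(path)
```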

You can test your submission by running the commands in the [Getting Started](#getting-started) section.


## Restrictions
<!-- ⚠️ -->

1.  **<span style="color: red">NO</span> Last-Minute Submissions**: The submission deadline is strict. Any changes to your code after the deadline may disqualify your submission.
1.  RAG (Retrieval Augmented Generation) techniques are not allowed.
1.  Adversarial approaches, e.g., trying to make the opponent's A-agent hallucinate, will lead to disqualification.
1.  Only English is allowed for both the Q-agent and the A-agent.
1.  Stay strictly within the `max_tokens` limits specified in `agen.yaml` and `qgen.yaml`. Other parameters can be changed.
1.  Questions must pertain to the topics listed in `topics.json`.
1.  Each question must be generated in under 10 seconds. Questions exceeding this limit will not be considered.
1.  Each answer must be generated in under 6 seconds. Answers exceeding this limit will not be considered.
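To check the per-item time limits locally, you can time each generation call. The sketch below wraps a hypothetical `generate_fn` (any zero-argument callable that produces one question or answer). The 10-second and 6-second values come from the restrictions above, but the official timing is measured by the judging system and may differ from a local measurement.

```python
import time
from typing import Any, Callable, Tuple

QUESTION_TIME_LIMIT_S = 10.0  # per-question limit from the restrictions
ANSWER_TIME_LIMIT_S = 6.0     # per-answer limit from the restrictions


def timed_call(generate_fn: Callable[[], Any], limit_s: float) -> Tuple[Any, float, bool]:
    """Run generate_fn once and return (result, elapsed seconds, within_limit)."""
    start = time.perf_counter()
    result = generate_fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed < limit_s


# Example with a dummy generator standing in for a real model call
result, elapsed, ok = timed_call(lambda: '{"answer": "A"}', ANSWER_TIME_LIMIT_S)
print(f"elapsed={elapsed:.3f}s within_limit={ok}")
```

Note that this only measures a call after it finishes; it does not interrupt a slow generation.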

Feel free to reach out in the Discord channel for any clarifications or questions!

## Directory & Files overview
<!-- 📂  -->

```plaintext
.
├── agents
│   ├── question_model.py
│   ├── question_agent.py
│   ├── answer_model.py
│   └── answer_agent.py
├── assets
│   ├── topics_example.json # Example questions for each topic
│   ├── topics.json # Topics for which questions must be generated
│   ├── sample_question.json # Expected format of generated questions
│   └── sample_answer.json # Expected format of generated answers
├── utils
│   └── build_prompt.py # prompt-tuning scripts
├── README.ipynb
├── tutorial.ipynb # Synthetic Data Generation and Unsloth Tutorial
├── tutorial_config.yaml # Config file for tutorial
├── qgen.yaml # Generation specific parameters for Q-agent
├── agen.yaml # Generation specific parameters for A-agent
└── default_requirements.txt # Required packages
```
   

## Getting started
<!-- 🎮  -->
Let's get started with running the Q-agent and A-agent framework.

### Environment Setup
<!-- 🚀 -->

In [None]:
# Install the necessary packages
!pip install -r default_requirements.txt

In [None]:
# Import basic packages
import json
from typing import Dict, Any, List

### Q-Agent
<!-- 🤔 -->
You will update the model in `question_model.py`, which is invoked by `question_agent.py`. The provided skeleton uses the base Qwen3-4B model for the Q-agent, but you should experiment with other models and techniques. Check out our [Synthetic Data Generation and Unsloth Tutorial](./tutorial.ipynb) for training tips and tricks.

Generated questions must pertain to the topics listed in the `topics.json` file. Additional topics will be added for the tournament finals.

__Topics:__
1.  `Puzzles`: Seating Arrangements (Linear, Circular)
2.  `Blood Relations and Family Tree`: Puzzles involving generations and family tree logic

Sample questions and answers are available in the [assets folder](./assets).

In [None]:
# Run the following code to generate questions.
# For demonstration purposes, we use the base Qwen3-4B model for the Q-agent. Participants are expected to improve upon this.
!python -m agents.question_agent \
    --output_file "outputs/questions.json" \
    --num_questions 20 \
    --verbose

#### Basic format-checks for questions from Q-agent

Generated questions must follow the [format instructions](#format-overview). All questions generated by the Q-agent are filtered and validated before being sent to the opponent's A-agent. Two versions of the questions are saved: the raw, unfiltered `questions.json`, and `filtered_questions.json`, which contains only the questions that pass the example filter below. The full filtering and validation process is part of the judging system and is not shown here.


In [None]:
from transformers import AutoTokenizer

# Qwen3-4B tokenizer, used here only to count tokens
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B", padding_side='left')

def count_tokens_q(text: str) -> int:
    """Count the number of tokens using the Qwen3-4B tokenizer."""
    return len(tokenizer.encode(text, add_special_tokens=False))

def filter_questions(questions: List[str | Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep format-correct questions; return [] if fewer than 50% of them pass."""
    def basic_checks(q2: Dict[str, Any]) -> bool:
        # Check required keys
        required_keys = ['topic', 'question', 'choices', 'answer']
        if not all(key in q2 for key in required_keys):
            return False
        # 'question' and 'answer' must be strings (they are tokenized below)
        if not (isinstance(q2['question'], str) and isinstance(q2['answer'], str)):
            return False
        # Check choices format: a list of exactly 4 strings starting with A/B/C/D
        choices = q2['choices']
        if not (isinstance(choices, list) and len(choices) == 4):
            return False
        if not all(isinstance(c, str) and len(c) > 2 and c[0].upper() in 'ABCD' for c in choices):
            return False
        # Check token length of question + answer + choices (15 tokens are deducted)
        check_len = sum(count_tokens_q(q2[k]) for k in ['question', 'answer'])
        check_len += sum(count_tokens_q(c) for c in choices) - 15
        if check_len >= 130:
            return False
        # Including the optional explanation, the total must not exceed 1024 tokens
        # Extra (PLUS) check, not enforced here: len(q2['answer']) == 1 and q2['answer'].upper() in 'ABCD'
        return check_len + count_tokens_q(str(q2.get('explanation', 'None'))) <= 1024

    correct_format_question = []
    for i, q in enumerate(questions):
        if isinstance(q, str):
            try:
                q = json.loads(q)
            except json.JSONDecodeError:
                # If JSON decoding fails, skip this question
                print(f"Skipping invalid JSON at index {i}: {q}")
                continue
        # Anything that is not a JSON object (dict) is skipped
        if isinstance(q, dict) and basic_checks(q):
            correct_format_question.append(q)
    # At least 50% of the questions must pass the checks, otherwise nothing is returned
    if len(correct_format_question) >= 0.5 * len(questions):
        return correct_format_question
    return []

In [None]:
with open("outputs/questions.json", "r") as f:
    questions = json.load(f)

filtered_questions = filter_questions(questions)

with open("outputs/filtered_questions.json", "w") as f:
    json.dump(filtered_questions, f, indent=4)

print(f"Number of questions: {len(questions)}")
print(f"Number of filtered questions: {len(filtered_questions)}")

### A-agent
<!-- 🤖  -->
You will update the model in `answer_model.py`, which is invoked by `answer_agent.py`. The provided skeleton again uses the base Qwen3-4B model for the A-agent, but you should experiment with other models and techniques. Check out our [Synthetic Data Generation and Unsloth Tutorial](./tutorial.ipynb) for training tips and tricks.

In [None]:
# The same instructions apply to the answer agent.
# For demonstration purposes, we use the base Qwen3-4B model for the A-agent. Participants are expected to improve upon this.
!python -m agents.answer_agent \
    --input_file "outputs/filtered_questions.json" \
    --output_file "outputs/answers.json" \
    --verbose

#### Basic format-checks for answers from A-agent
Generated answers must follow the [format instructions](#format-overview). The following filter is also applied inside `answer_agent.py`. As with the questions, two versions are saved, `answers.json` and `filtered_answers.json`; the latter is used for evaluation.

In [None]:
from transformers import AutoTokenizer

# Qwen3-4B tokenizer, used here only to count tokens
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B", padding_side='left')

def count_tokens_a(text: str) -> int:
    """Count the number of tokens in the text using the agent's tokenizer."""
    return len(tokenizer.encode(text, add_special_tokens=False))

def filter_answers(ans: List[str | Dict[str, str]]) -> List[Dict[str, str] | None]:
    """Filter answers to ensure they are in the correct format.

    Invalid entries are replaced by None so the output stays aligned,
    index by index, with the list of questions.
    """
    def basic_checks(a1: Dict[str, Any]) -> bool:
        # Check required keys: 'answer' must be present and be a string
        required_keys = ['answer']
        if not all((key in a1) and isinstance(a1[key], str) for key in required_keys):
            return False
        # A single-character answer must be one of A/B/C/D (either case)
        if len(a1['answer']) == 1 and a1['answer'] not in 'ABCDabcd':
            return False
        # The answer alone must be under 50 tokens...
        check_len = count_tokens_a(a1['answer'])
        if check_len >= 50:
            return False
        # ...and answer + reasoning must be under 512 tokens
        check_len += count_tokens_a(str(a1.get('reasoning', 'None')))
        # EXTRA check, not enforced here: len(a1['answer']) == 1 and a1['answer'].upper() in 'ABCD'
        return check_len < 512

    filtered_answers = []
    for i, a in enumerate(ans):
        if isinstance(a, str):
            # Basic check: the string must at least be valid JSON
            try:
                a = json.loads(a)
            except json.JSONDecodeError:
                print(f"Skipping invalid JSON at index {i}: {a}")
                filtered_answers.append(None)
                continue
        if isinstance(a, dict):
            filtered_answers.append(a if basic_checks(a) else None)
        else:
            # Neither a dict nor a JSON-object string: mark as invalid
            print(f"Skipping unsupported type at index {i}: {type(a)}")
            filtered_answers.append(None)
    return filtered_answers

In [None]:
with open("outputs/answers.json", "r") as f:
    answers = json.load(f)
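filtered_answers = filter_answers(answers)

print(f"Number of answers: {len(answers)}")
# Invalid answers are kept as None placeholders, so count only the valid ones
print(f"Number of valid filtered answers: {sum(a is not None for a in filtered_answers)}")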

## Evaluation
<!-- 🏅  -->

### Scoring Criteria

<!-- 📊  -->

Scores are based on how many of the $N$ questions from a Q-agent the opposing A-agent answers correctly, and vice versa. *There is no negative marking for wrong answers.*

$$\text{A-agent Score} = \dfrac{\#\ \text{of questions correctly answered with expected format}}{N}\times 100$$
$$\text{Q-agent Score} = \dfrac{\#\ \text{of questions incorrectly answered by A-agent}}{N}\times 100$$


$N$ denotes the number of filtered (format-correct) questions. **Teams whose Q-agent fails to generate at least $50\%$ of `num_questions` questions in the correct format (as per [format-checking](#format-overview)) will be automatically disqualified. `num_questions` can range from $2$ to $1000+$.**<br>
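As a worked example with hypothetical numbers: if $N = 20$ questions pass the format checks and the A-agent answers $12$ of them correctly, the formulas above give

$$\text{A-agent Score} = \dfrac{12}{20}\times 100 = 60, \qquad \text{Q-agent Score} = \dfrac{20 - 12}{20}\times 100 = 40$$

Because every format-correct question counts as either answered correctly or not, the two scores of a single innings sum to $100$.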

In case of **TIE**, closed benchmark questions will be used to evaluate the answer agents (A-agent) and rank the teams accordingly.


### Scoring Example

In [None]:
# Calculate scores for innings 1 (Team A's Q-agent vs Team B's A-agent)
N = len(filtered_questions)
assert N == len(filtered_answers), "Number of questions and answers must match."

# Exact match: works only if both sides give a bare option letter, e.g. "B"
num_correct_answers = sum(1 for q, a in zip(filtered_questions, filtered_answers)
                          if a is not None and q['answer'] == a['answer'])

# If q['answer'] is not a bare option letter (e.g. "B) 42"), the exact match above
# fails even when the A-agent is right. One simple fix is to compare leading letters.
def leading_letter(ans: str) -> str:
    """Return the first non-space character, upper-cased ('' for an empty answer)."""
    return ans.strip()[:1].upper()

num_correct_answers = 0
for q, a in zip(filtered_questions, filtered_answers):
    expected = leading_letter(q['answer'])
    # Invalid (None) answers count as incorrect
    if a is not None and expected and expected == leading_letter(a['answer']):
        num_correct_answers += 1

a_score = num_correct_answers * 100 / (N + 1e-9)  # 1e-9 avoids division by zero
q_score = (N - num_correct_answers) * 100 / (N + 1e-9)
# Announce the scores
print(f"Number of questions: {N}")
print(f"Number of correct answers: {num_correct_answers}")
print("Scores:")
print(f"Team B: A-agent score: {a_score:.2f}")
print(f"Team A: Q-agent score: {q_score:.2f}")
print(f"Innings 1 winner: {'Team A' if q_score > a_score else 'Team B' if q_score < a_score else 'Draw'}")