
feat: Add Qwen3-0.6B LLM endpoint for question generation#452

Merged
Aditya062003 merged 5 commits into AOSSIE-Org:main from jayydevs:main
Mar 17, 2026

Conversation

Contributor

@jayydevs jayydevs commented Feb 21, 2026

Description

This PR implements LLM-based question generation using the Qwen3-0.6B model, providing users with an AI-powered alternative to the traditional question generation methods.

Addressed Issues:

Fixes #450

Changes Made

Backend Implementation

  • New Module: backend/Generator/llm_generator.py

    • LLMShortAnswerGenerator class with lazy model loading
    • Robust JSON parsing with fallback line-by-line parsing
    • Automatic context window management (500-word truncation)
    • Comprehensive error handling
  • New Endpoint: POST /get_shortq_llm in server.py

    • Full feature parity with existing short-question endpoints
    • MediaWiki integration support
    • Configurable question count
  • Dependencies: Added llama-cpp-python to requirements.txt

    • Model automatically downloads on first request (~397MB)
    • Uses Q4_K_M quantization for optimal performance
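The lazy loading and 500-word truncation described above can be sketched roughly as follows. This is an illustrative guess at the shape of `backend/Generator/llm_generator.py`, not the PR's actual code; the `repo_id` and `filename` values are assumptions.

```python
class LLMShortAnswerGenerator:
    """Illustrative sketch only -- not the PR's actual implementation."""

    def __init__(self):
        self.llm = None  # loaded lazily on first request, not at server startup

    def _load_model(self):
        # llama-cpp-python can pull a GGUF file from the Hugging Face Hub;
        # the repo and filename pattern below are hypothetical placeholders.
        if self.llm is None:
            from llama_cpp import Llama
            self.llm = Llama.from_pretrained(
                repo_id="Qwen/Qwen3-0.6B-GGUF",   # assumption
                filename="*Q4_K_M.gguf",          # Q4_K_M quantization per the PR
                n_threads=4,
            )
        return self.llm

    @staticmethod
    def _prepare_text(text, max_words=500):
        # Context window management: keep only the first 500 words.
        words = text.split()
        return " ".join(words[:max_words])
```

The import lives inside `_load_model` so that simply starting the server (or instantiating the class) never touches the model; the ~397MB download only happens on the first request.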

Testing

  • Added comprehensive test: test_get_shortq_llm() in test_server.py
  • Tests verify output structure, question/answer presence, and API response validation

Logs:
@jayydevs ➜ /workspaces/EduAid (main) $ curl -X POST http://127.0.0.1:5000/get_shortq_llm \
  -H "Content-Type: application/json" \
  -d '{"input_text":"...","max_questions":3,"use_mediawiki":0}'
{"output":[{"answer":"Human activities such as burning fossil fuels, deforestation, and industrial emissions.","context":"","question":"What is the main cause of global warming?"},{"answer":"Carbon dioxide and methane.","context":"","question":"What greenhouse gases do humans emit?"},{"answer":"Glacier melting, sea level rise, heatwaves, and ecosystem disruption.","context":"","question":"What are some effects of global warming?"}]}

@jayydevs ➜ /workspaces/EduAid (main) $ curl -X POST http://127.0.0.1:5000/get_mcq_llm \
  -H "Content-Type: application/json" \
  -d '{"input_text":"....","max_questions":3,"use_mediawiki":0}'
{"output":[{"correct_answer":"A","options":["A) Human activities such as burning fossil fuels, deforestation, and industrial emissions","B) Natural processes like volcanic activity and weather cycles","C) The increase in carbon dioxide levels","D) The decrease in methane levels"],"question":"What is the main factor contributing to the long-term rise in Earth\u2019s average surface temperature?"},{"correct_answer":"B","options":["A) Carbon dioxide","B) Methane","C) Nitrogen dioxide","D) Sulfur dioxide"],"question":"Which of the following is a greenhouse gas that contributes to global warming?"},{"correct_answer":"B","options":["A) Decreased ocean acidity","B) Increased sea levels","C) Enhanced biodiversity","D) Reduced carbon sinks"],"question":"What effect does the melting of glaciers have on the environment?"}]}

@jayydevs ➜ /workspaces/EduAid (main) $ curl -X POST http://127.0.0.1:5000/get_problems_llm \
  -H "Content-Type: application/json" \
  -d '{"input_text":"....","max_questions_mcq":2,"max_questions_boolq":2,"max_questions_shortq":2,"use_mediawiki":0}'
{"output":[{"correct_answer":"A","options":["A) Fossil fuels","B) Deforestation","C) Industrial emissions","D) Glaciers"],"question":"What is the main cause of global warming?","type":"mcq"},{"correct_answer":"C","options":["A) Glaciers melting","B) Rising sea levels","C) Heatwaves becoming more frequent","D) Ecosystems becoming disrupted"],"question":"Which of the following changes are caused by climate change?","type":"mcq"},{"answer":"carbon dioxide","question":"What is the main greenhouse gas responsible for global warming?","type":"short_answer"},{"answer":"burning fossil fuels, deforestation, and industrial emissions","question":"How does human activity contribute to climate change?","type":"short_answer"}]}

@jayydevs ➜ /workspaces/EduAid (main) $ curl -X POST http://127.0.0.1:5000/get_boolq_llm \
  -H "Content-Type: application/json" \
  -d '{"input_text":".....","max_questions":10,"use_mediawiki":0}'
{"output":[{"answer":true,"question":"Anime is a distinctive style of animation that originated in Japan and has grown into a global cultural phenomenon."},{"answer":false,"question":"Unlike many Western cartoons, anime covers a wide range of genres and themes intended for audiences of all ages."},{"answer":true,"question":"Anime stories can explore action, romance, science fiction, fantasy, horror, slice-of-life experiences, and complex philosophical ideas."},{"answer":false,"question":"Because of this diversity, anime appeals to both younger viewers and adults who are interested in deeper storytelling and character development."},{"answer":true,"question":"The history of anime dates back to the early 20th century when Japanese filmmakers began experimenting with animated techniques inspired by Western animation."},{"answer":true,"question":"Over time, the industry developed its own visual language characterized by expressive characters, detailed backgrounds, dramatic camera angles, and emotionally driven narratives."},{"answer":false,"question":"During the late 20th century, anime studios began producing television series and films that gained significant popularity both within Japan and internationally."},{"answer":true,"question":"Anime is often adapted from manga, which are Japanese comic books or graphic novels."},{"answer":false,"question":"Successful manga series frequently receive anime adaptations because they already have established audiences and compelling narratives."},{"answer":true,"question":"The collaboration between manga artists, animation studios, voice actors, and music composers creates a unique multimedia experience that blends visual art, storytelling, and sound design."}]}

Documentation

  • Updated main README.md with LLM setup instructions
  • Updated desktop app README.md with LLM feature mentions
  • Added new endpoint to Features list
  • Documented parameters and capabilities

How to Test

  1. Install Dependencies:

    cd backend
    pip install -r requirements.txt
  2. Start Backend Server:

    python server.py
  3. Test Endpoint with cURL:

    curl -X POST http://localhost:5000/get_shortq_llm \
      -H "Content-Type: application/json" \
      -d '{
        "input_text": "Artificial intelligence is the simulation of human intelligence processes by machines. It includes learning, reasoning, and self-correction.",
        "max_questions": 3,
        "use_mediawiki": 0
      }'
  4. Run Tests:

    python test_server.py

Expected Response

{
  "output": [
    {
      "question": "What is the primary goal of artificial intelligence?",
      "answer": "to simulate human intelligence processes by machines.",
      "context": ""
    },
    {
      "question": "What includes learning, reasoning, and self-correction?",
      "answer": "artificial intelligence.",
      "context": ""
    }
  ]
}

Performance Metrics

| Metric | Value |
| --- | --- |
| Model Size | ~397MB |
| First Request | ~5-10 seconds (includes download & load) |
| Subsequent Requests | ~2-3 seconds |
| Memory Usage | ~600-800MB during inference |
| CPU Usage | Moderate (4 threads, no GPU required) |

API Reference

Endpoint: POST /get_shortq_llm

Request Parameters:

  • input_text (string, required): Text passage to generate questions from
  • max_questions (integer, optional, default: 4): Number of questions to generate
  • use_mediawiki (integer, optional, default: 0): Set to 1 to fetch content from MediaWiki

Response:

  • output (array): Array of {question, answer, context} objects

Screenshots/Recordings:

N/A - Backend feature, no UI changes

Additional Notes:

  1. First Request Warning: The first request will be slower (~5-10 seconds) as the model downloads from Hugging Face (~397MB). Subsequent requests will be fast (~2-3 seconds).

  2. Lazy Loading: The model loads only when the endpoint is first called, not at server startup. This keeps initial server startup time fast.

  3. Local Processing: All question generation happens locally on the user's machine. No external API calls are made.

  4. Alternative Method: This endpoint provides an alternative to /get_shortq. Users can choose based on their preferences.

  5. Future Extensibility: This implementation is designed to be extended to other question types in future PRs.

Checklist

  • My PR addresses a single issue, fixes a single bug or makes a single improvement.
  • My code follows the project's code style and conventions
  • If applicable, I have made corresponding changes or additions to the documentation
  • If applicable, I have made corresponding changes or additions to tests
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contribution Guidelines
  • Once I submit my PR, CodeRabbit AI will automatically review it and I will address CodeRabbit's comments.

AI Usage Disclosure

Check one of the checkboxes below:

  • This PR does not contain AI-generated code at all.
  • This PR contains AI-generated code. I have tested the code locally and I am responsible for it.
  • For expanding this to other question types beyond short answers (per Aditya's suggestion), I used AI, since the boilerplate I had already written meant the remaining work could be automated.

I have used the following AI models and tools: Gemini and ChatGPT, to research different methods of improving efficiency.

Summary by CodeRabbit

  • New Features

    • LLM-powered generation for short-answer, MCQ, and true/false questions via new POST APIs; combined endpoint for mixed problem sets; configurable counts and fast CPU inference.
  • Documentation

    • README updated with usage, example requests, context/length management, and question-count options.
  • Bug Fixes

    • Improved error handling and standardized error responses for content and LLM endpoints.
  • Tests

    • Added tests covering short-answer, MCQ, boolean, and combined endpoints.
  • Chores

    • Added LLM runtime dependency.


coderabbitai bot commented Feb 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a Qwen3-0.6B LLM integration via llama.cpp: new backend LLMQuestionGenerator with lazy, thread-safe loading; four Flask endpoints for short/MCQ/boolean/all-question generation; tests for the endpoints; README docs; and dependency addition (llama-cpp-python).

Changes

  • Documentation (README.md): Added docs for LLM-based short-answer question generation (Qwen3-0.6B), endpoint POST /get_shortq_llm, examples, behavior (lazy load, CPU inference ~2–3s), configurable counts, and context handling.
  • LLM Generator Implementation (backend/Generator/llm_generator.py): Added LLMQuestionGenerator implementing lazy thread-safe Llama.from_pretrained loading, input preparation, prompt construction, generation methods for short/MCQ/boolean/all questions, JSON parsing with robust fallbacks, boolean normalization, and safe error handling.
  • API Integration (backend/server.py): Added global llm_generator and routes /get_shortq_llm, /get_mcq_llm, /get_boolq_llm, /get_problems_llm; routes process input, call generator methods, and return JSON with try/except logging; improved error logging in /get_content.
  • Tests (backend/test_server.py): Added tests for the four LLM endpoints (test_get_shortq_llm, test_get_mcq_llm, test_get_boolq_llm, test_get_problems_llm) with assertions on structure and non-empty results, invoked in the __main__ test runner with print output.
  • Dependencies (requirements.txt): Added llama-cpp-python to support running Qwen3-0.6B via llama.cpp.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Server
    participant Generator as LLMQuestionGenerator
    participant Model as Qwen3_Model

    Client->>Server: POST /get_shortq_llm (input_text, max_questions)
    Server->>Generator: generate_short_questions(text, max_questions)
    alt Model not loaded
        Generator->>Model: _load_model() (Llama.from_pretrained)
        Model-->>Generator: model instance
    end
    Generator->>Generator: _prepare_text() (truncate / sanitize)
    Generator->>Generator: build prompt (system/user)
    Generator->>Model: invoke model via llama-cpp-python
    Model-->>Generator: raw response text
    Generator->>Generator: _parse_response() JSON parse or fallback parsing
    alt Parsed successfully
        Generator-->>Server: formatted questions JSON
    else Fallback parsed
        Generator-->>Server: best-effort questions JSON
    end
    Server-->>Client: HTTP 200 JSON (or 500 on error)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I nibble on prompts beneath moon's glow,
Qwen3 whispers answers fast and low,
Questions hop out, tidy in line,
Options and truths in neat design,
CPU-warmed and ready — off they go.

🚥 Pre-merge checks: 3 passed, 2 failed (1 warning, 1 inconclusive)

❌ Failed checks:

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 41.38%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Out of Scope Changes check (❓ Inconclusive): The PR includes changes beyond phase 1 scope by implementing MCQ and boolean question generation endpoints, which are listed as phase 2 objectives in the linked issue. While these represent forward progress, they were not part of the phase 1 requirements. Resolution: clarify whether expanding to MCQ and boolean generation in this PR aligns with the intended scope, or if these should be deferred to a separate phase 2 PR as originally planned in the issue.

✅ Passed checks:

  • Title check: The title accurately describes the main feature: adding Qwen3-0.6B LLM endpoints for question generation. It is concise, specific, and directly reflects the primary changes across all modified files.
  • Linked Issues check: The PR implements all core coding objectives from #450: integrates Qwen3-0.6B with lazy loading, ~397MB Q4_K_M quantization, CPU inference (~2-3s), configurable question counts, robust JSON parsing, local processing, and extends beyond phase 1 by adding MCQ and boolean question generation alongside short-answer questions.
  • Description Check: Check skipped - CodeRabbit's high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
README.md (1)

54-73: "LLM-Based Question Generation" section breaks the numbered setup flow.

The section is inserted between ### Troubleshooting and ### 3. Configure Google APIs, interrupting the sequential numbered steps (1 → 2 → [unnumbered LLM section] → 3). Consider nesting it as a subsection of ## 2. Backend Setup (e.g., #### LLM Model Setup) so the numbered flow remains intact for readers following the setup guide.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 54 - 73, The "LLM-Based Question Generation" section
breaks the numbered setup flow by sitting between "### Troubleshooting" and "###
3. Configure Google APIs"; move that block under the backend setup so numbering
remains sequential—specifically cut the "LLM-Based Question Generation" heading
and its bullet list and paste it as a nested subsection (e.g., "#### LLM Model
Setup" or "#### LLM-Based Question Generation") inside the "## 2. Backend Setup"
section, ensuring its content stays unchanged and that "### Troubleshooting" and
"### 3. Configure Google APIs" remain adjacent in the main flow.
backend/Generator/llm_generator.py (1)

63-64: Greedy \[.*\] regex will over-capture when prose contains stray brackets.

re.search(r"\[.*\]", cleaned, re.DOTALL) matches from the first [ to the last ] in the string. When the model's preamble includes brackets — e.g., "Here are [4] questions: [{...}]" — the match spans the entire substring including the preamble, causing json.loads to fail and unnecessarily falling through to the fragile line-based fallback. Anchoring the pattern on [{}] avoids this:

♻️ Proposed fix
-        match = re.search(r"\[.*\]", cleaned, re.DOTALL)
+        match = re.search(r"\[\s*\{.*\}\s*\]", cleaned, re.DOTALL)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/llm_generator.py` around lines 63 - 64, The regex used to
extract JSON arrays is greedy and can over-capture (match = re.search(r"\[.*\]",
cleaned, re.DOTALL)); replace it with a pattern that looks specifically for an
array of objects and uses a non-greedy quantifier and DOTALL, e.g. search for
something like r"\[\s*\{.*?\}\s*\]" with re.DOTALL, so update the match
assignment in llm_generator.py (the line that defines match from cleaned) to use
the anchored/non-greedy pattern to avoid spanning from the first '[' to the last
']' in the input.
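The over-capture described above is easy to demonstrate. In this small example (the input string is illustrative), the greedy pattern swallows the preamble's stray brackets and fails to parse, while the anchored pattern finds the actual array:

```python
import json
import re

# A preamble with a bracketed token, followed by the real JSON array.
cleaned = 'Here are [3] questions: [{"question": "Q?", "answer": "A."}]'

# Greedy: matches from the first '[' to the last ']' -- not valid JSON.
greedy = re.search(r"\[.*\]", cleaned, re.DOTALL).group()
try:
    json.loads(greedy)
    greedy_parsed = True
except json.JSONDecodeError:
    greedy_parsed = False  # over-capture: '[3] questions: [...]' fails to parse

# Anchored on '[{': skips the stray brackets and finds the real array.
anchored = re.search(r"\[\s*\{.*\}\s*\]", cleaned, re.DOTALL).group()
qa_list = json.loads(anchored)
print(greedy_parsed, qa_list[0]["question"])  # False Q?
```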
backend/test_server.py (2)

57-70: Test requires a live ~397 MB model download — no mocking for CI.

test_get_shortq_llm calls the real /get_shortq_llm endpoint, which downloads Qwen3-0.6B on first run. In CI environments without a pre-cached model, this adds minutes of download time and requires internet access. Consider either:

  1. Mocking LLMShortAnswerGenerator.generate_short_questions at the unit-test level (preferred for CI), or
  2. Tagging this test as an integration/slow test so it can be skipped in CI with pytest -m "not llm".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_server.py` around lines 57 - 70, Test currently hits the real
/get_shortq_llm endpoint and triggers a large model download; update the test to
avoid network/model downloads by either (preferred) mocking
LLMShortAnswerGenerator.generate_short_questions to return a small deterministic
list and calling test_get_shortq_llm against that mock, or mark the test with a
pytest marker (e.g., `@pytest.mark.llm` or `@pytest.mark.integration`) so CI can
skip it with -m "not llm"; target the test function test_get_shortq_llm and the
class/method LLMShortAnswerGenerator.generate_short_questions (or the HTTP
client that posts to '/get_shortq_llm') when implementing the mock or marker.
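The mocking approach suggested above could look roughly like this. The class here is a local stand-in for illustration; in the repo you would patch the real class imported from backend/Generator/llm_generator.py instead:

```python
from unittest.mock import patch

# Stand-in for the real generator, whose method would trigger the ~397MB download.
class LLMShortAnswerGenerator:
    def generate_short_questions(self, text, max_questions=4):
        raise RuntimeError("real implementation would download the model")

canned = [{"question": "Q?", "answer": "A.", "context": ""}]
gen = LLMShortAnswerGenerator()

# Patch the method so the test exercises response handling, not inference.
with patch.object(LLMShortAnswerGenerator, "generate_short_questions",
                  return_value=canned):
    out = gen.generate_short_questions("some passage", max_questions=1)

print(out == canned)
```

In a Flask test this patch would wrap the test client's POST to /get_shortq_llm, so CI never touches the network.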

128-128: Move test_get_shortq_llm() to the end of the test runner.

Running the LLM test first forces the ~397 MB model download before the other fast tests (MCQ, BoolQ, ShortQ). Placing it last keeps the fast feedback loop for the majority of tests.

♻️ Proposed fix
 if __name__ == '__main__':
-    test_get_shortq_llm()
     test_get_mcq()
     test_get_boolq()
     test_get_shortq()
     test_get_problems()
     test_root()
     test_get_answer()
     test_get_boolean_answer()
+    test_get_shortq_llm()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/test_server.py` at line 128, The test runner currently calls
test_get_shortq_llm() early which triggers a large model download; move the call
to test_get_shortq_llm() to the end of the test sequence so it runs after the
fast tests (e.g., test_get_mcq(), test_get_boolq(), test_get_shortq()). Locate
the test invocation for test_get_shortq_llm() in backend/test_server.py and
reorder the calls so that test_get_shortq_llm() is the last test executed by the
runner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/Generator/llm_generator.py`:
- Around line 43-56: The call to self.llm.create_chat_completion and the
subsequent access response["choices"][0]["message"]["content"] can raise
exceptions or IndexError when the LLM returns an empty choices list or missing
keys; wrap the create_chat_completion call in a try/except to catch
inference-level errors and log/raise a controlled exception, then validate the
response structure before indexing (check "choices" is present, is a non-empty
list, and that ["message"]["content"] exists) and handle the empty/malformed
case by logging and returning an empty result or raising a clear ValueError;
modify the method containing the create_chat_completion call and the use of
_parse_response to perform these guards and error paths.
- Around line 100-102: The current a_match regex in llm_generator.py (where
a_match is assigned with re.match) incorrectly treats "A." list items as
answers; update the pattern so the dot variant requires the full word "Answer"
while allowing the single-letter "A" only when followed by a colon—i.e., change
the re.match call that assigns a_match to two alternatives: one that matches
"Answer" followed by either ':' or '.' and one that matches the single letter
"A" only when followed by ':'; keep re.IGNORECASE and the group that captures
the answer text intact.
- Around line 14-24: The _load_model method has a TOCTOU race when multiple
threads see self.llm is None and concurrently create Llama instances; fix by
implementing double-checked locking: add a threading.Lock (e.g., self._llm_lock)
on the Generator object (initialized in __init__), then in _load_model re-check
self.llm, and if still None acquire self._llm_lock, re-check self.llm again, and
only then call Llama.from_pretrained to assign self.llm; release the lock
afterward so only one thread downloads/initializes the model.

In `@backend/server.py`:
- Around line 97-105: Wrap the body of get_shortq_llm in a try/except similar to
the /get_content handler: call process_input_text and
llm_shortq.generate_short_questions inside a try block, catch Exception as err,
log the error (using the same logger used elsewhere) and return a JSON error
response with an appropriate HTTP status (e.g., jsonify({"error": "failed to
generate questions", "details": str(err)}), 500). Ensure you still return
jsonify({"output": questions}) on success and reference the functions
get_shortq_llm, process_input_text, and llm_shortq.generate_short_questions so
the change is applied to the correct code.

---

Nitpick comments:
In `@backend/Generator/llm_generator.py`:
- Around line 63-64: The regex used to extract JSON arrays is greedy and can
over-capture (match = re.search(r"\[.*\]", cleaned, re.DOTALL)); replace it with
a pattern that looks specifically for an array of objects and uses a non-greedy
quantifier and DOTALL, e.g. search for something like r"\[\s*\{.*?\}\s*\]" with
re.DOTALL, so update the match assignment in llm_generator.py (the line that
defines match from cleaned) to use the anchored/non-greedy pattern to avoid
spanning from the first '[' to the last ']' in the input.

In `@backend/test_server.py`:
- Around line 57-70: Test currently hits the real /get_shortq_llm endpoint and
triggers a large model download; update the test to avoid network/model
downloads by either (preferred) mocking
LLMShortAnswerGenerator.generate_short_questions to return a small deterministic
list and calling test_get_shortq_llm against that mock, or mark the test with a
pytest marker (e.g., `@pytest.mark.llm` or `@pytest.mark.integration`) so CI can
skip it with -m "not llm"; target the test function test_get_shortq_llm and the
class/method LLMShortAnswerGenerator.generate_short_questions (or the HTTP
client that posts to '/get_shortq_llm') when implementing the mock or marker.
- Line 128: The test runner currently calls test_get_shortq_llm() early which
triggers a large model download; move the call to test_get_shortq_llm() to the
end of the test sequence so it runs after the fast tests (e.g., test_get_mcq(),
test_get_boolq(), test_get_shortq()). Locate the test invocation for
test_get_shortq_llm() in backend/test_server.py and reorder the calls so that
test_get_shortq_llm() is the last test executed by the runner.

In `@README.md`:
- Around line 54-73: The "LLM-Based Question Generation" section breaks the
numbered setup flow by sitting between "### Troubleshooting" and "### 3.
Configure Google APIs"; move that block under the backend setup so numbering
remains sequential—specifically cut the "LLM-Based Question Generation" heading
and its bullet list and paste it as a nested subsection (e.g., "#### LLM Model
Setup" or "#### LLM-Based Question Generation") inside the "## 2. Backend Setup"
section, ensuring its content stays unchanged and that "### Troubleshooting" and
"### 3. Configure Google APIs" remain adjacent in the main flow.
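The double-checked locking fix requested for _load_model above can be sketched as follows; the names (_llm_lock, _load_model) follow the review comment, while _create_llm is a hypothetical stub standing in for Llama.from_pretrained:

```python
import threading

class Generator:
    def __init__(self):
        self.llm = None
        self._llm_lock = threading.Lock()
        self.load_count = 0  # for illustration: how many times the model was built

    def _load_model(self):
        # Fast path: once loaded, skip the lock entirely.
        if self.llm is None:
            with self._llm_lock:
                # Re-check under the lock: another thread may have won the race.
                if self.llm is None:
                    self.llm = self._create_llm()
        return self.llm

    def _create_llm(self):
        # Stand-in for Llama.from_pretrained(...); counts invocations.
        self.load_count += 1
        return object()

# Eight threads race to load; only one should actually build the model.
gen = Generator()
threads = [threading.Thread(target=gen._load_model) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(gen.load_count)  # 1
```

Without the inner re-check, two threads could both pass the outer `is None` test and each trigger a full model download.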


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
backend/Generator/llm_generator.py (2)

49-73: create_chat_completion is outside the try/except — method's error contract is inconsistent.

The try/except at line 64 only guards response parsing. If create_chat_completion raises (e.g., jinja chat-format error, OOM, context-length overflow), the exception escapes generate_short_questions entirely, even though the method implicitly promises to return a list. The server route catches it, but the method's error behavior is inconsistent: some failures return [], others raise.

Additionally, the bare except Exception at line 72 silently discards failures with no logging (Ruff BLE001), making inference errors invisible in production.

Consolidate the inference call and the parsing into a single guarded block:

♻️ Proposed refactor
-        response = self.llm.create_chat_completion(
-            messages=[
-                {
-                    "role": "system",
-                    "content": "You generate short-answer quiz questions as JSON arrays. Output ONLY valid JSON.",
-                },
-                {
-                    "role": "user",
-                    "content": prompt,
-                },
-            ],
-            max_tokens=512,
-            temperature=0.7,
-        )
-
-        try:
-            choices = response.get("choices", [])
-            if not choices:
-                return []
-
-            raw = choices[0].get("message", {}).get("content", "")
-            return self._parse_response(raw, max_questions)
-
-        except Exception:
-            return []
+        try:
+            response = self.llm.create_chat_completion(
+                messages=[
+                    {
+                        "role": "system",
+                        "content": "You generate short-answer quiz questions as JSON arrays. Output ONLY valid JSON.",
+                    },
+                    {
+                        "role": "user",
+                        "content": prompt,
+                    },
+                ],
+                max_tokens=512,
+                temperature=0.7,
+            )
+            choices = response.get("choices", [])
+            if not choices:
+                return []
+            raw = choices[0].get("message", {}).get("content", "")
+            return self._parse_response(raw, max_questions)
+        except Exception as e:
+            print(f"LLM inference failed: {e}")
+            return []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/llm_generator.py` around lines 49 - 73, The call to
create_chat_completion should be moved inside the same guarded block as the
parsing so generate_short_questions consistently returns [] on any
inference/parsing failure; wrap both the create_chat_completion call and the
subsequent parsing/choices logic (the response variable, choices handling, and
_parse_response invocation) in a try/except that catches Exception as e (not a
bare except), log the exception details (use an existing logger like self.logger
or a module logger) including the exception/traceback, and return [] on error to
preserve the method's contract and surface failures in logs.

78-98: Greedy \[.*\] regex will miss valid JSON when LLM output has brackets in surrounding text.

With re.DOTALL, the greedy .* matches from the first [ to the last ] in cleaned. If the model emits any bracketed text before the array (e.g., a preamble like "Note [1]: here is the result: [...]"), the match spans both bracket pairs, producing a string that is not valid JSON, forcing an unnecessary fall-through to _fallback_parse.

The most robust fix is to first attempt json.loads(cleaned) directly (covers the dominant case where the whole response is clean JSON), then fall back to the regex extraction:

♻️ Proposed refactor
-        # Try to extract a JSON array from the text
-        match = re.search(r"\[.*\]", cleaned, re.DOTALL)
-        if match:
-            try:
-                qa_list = json.loads(match.group())
-                result = []
-                for item in qa_list[:max_questions]:
-                    if isinstance(item, dict) and "question" in item and "answer" in item:
-                        result.append(
-                            {
-                                "question": item["question"].strip(),
-                                "answer": item["answer"].strip(),
-                                "context": "",
-                            }
-                        )
-                if result:
-                    return result
-            except json.JSONDecodeError:
-                pass
+        def _extract_qa_list(text):
+            qa_list = json.loads(text)
+            result = []
+            for item in (qa_list if isinstance(qa_list, list) else []):
+                if isinstance(item, dict) and "question" in item and "answer" in item:
+                    result.append(
+                        {
+                            "question": item["question"].strip(),
+                            "answer": item["answer"].strip(),
+                            "context": "",
+                        }
+                    )
+            return result
+
+        # Try the whole cleaned string first (most common success path)
+        try:
+            result = _extract_qa_list(cleaned)
+            if result:
+                return result[:max_questions]
+        except (json.JSONDecodeError, TypeError):
+            pass
+
+        # Fall back to regex extraction for partial/wrapped JSON
+        match = re.search(r"\[.*\]", cleaned, re.DOTALL)
+        if match:
+            try:
+                result = _extract_qa_list(match.group())
+                if result:
+                    return result[:max_questions]
+            except (json.JSONDecodeError, TypeError):
+                pass

As an alternative, create_chat_completion supports a response_format argument to constrain responses to valid JSON objects, which would eliminate most parsing failures at the source.
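The two-stage strategy above can be exercised standalone. A runnable sketch, with illustrative names (`parse_questions` and `extract` are not from the PR), assuming the cleaned model output is a string:

```python
import json
import re

def parse_questions(cleaned, max_questions=4):
    """Two-stage parse: whole-string json.loads first, regex fallback second."""
    def extract(text):
        qa_list = json.loads(text)
        return [
            {"question": item["question"].strip(),
             "answer": item["answer"].strip(),
             "context": ""}
            for item in (qa_list if isinstance(qa_list, list) else [])
            if isinstance(item, dict) and "question" in item and "answer" in item
        ]

    # Stage 1: the dominant case, where the whole response is clean JSON.
    try:
        result = extract(cleaned)
        if result:
            return result[:max_questions]
    except (json.JSONDecodeError, TypeError):
        pass

    # Stage 2: non-greedy extraction for JSON wrapped in surrounding chatter.
    for match in re.finditer(r"\[.*?\]", cleaned, re.DOTALL):
        try:
            result = extract(match.group())
            if result:
                return result[:max_questions]
        except (json.JSONDecodeError, TypeError):
            continue
    return []
```

The non-greedy `\[.*?\]` in stage 2 stops at the first closing bracket, so unlike the greedy variant it cannot swallow a second array later in the response; stage 1 still handles the clean-JSON case without any regex at all.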

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/Generator/llm_generator.py` around lines 78 - 98, The current
extraction uses a greedy regex re.search(r"\[.*\]", cleaned, flags=re.DOTALL)
which can capture too much; first try parsing the entire cleaned string with
json.loads(cleaned) and return if it yields a list of dicts, and only if that
fails fall back to extracting arrays using a non-greedy regex r"\[.*?\]"
(flags=re.DOTALL) and attempt json.loads on each match until one parses;
preserve the existing filtering logic that builds result from
qa_list[:max_questions] (checking isinstance(item, dict) and presence of
"question"/"answer") and keep falling through to the existing _fallback_parse if
no valid JSON is found.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Around line 107-108: The except block in backend.server.py currently returns
jsonify({"error": str(e)}), exposing internals; instead, change the handler to
log the full exception on the server (use logger.exception or
app.logger.exception with the caught exception in the except block) and return a
sanitized, non-sensitive message to the client such as jsonify({"error":
"Internal server error"}) with a 500 status; apply the same change to the
/get_content route handler (same pattern) so both Model loading/inference error
paths use server-side logging and safe external messages.

---

Nitpick comments:
In `@backend/Generator/llm_generator.py`:
- Around line 49-73: The call to create_chat_completion should be moved inside
the same guarded block as the parsing so generate_short_questions consistently
returns [] on any inference/parsing failure; wrap both the
create_chat_completion call and the subsequent parsing/choices logic (the
response variable, choices handling, and _parse_response invocation) in a
try/except that catches Exception as e (not a bare except), log the exception
details (use an existing logger like self.logger or a module logger) including
the exception/traceback, and return [] on error to preserve the method's
contract and surface failures in logs.
- Around line 78-98: The current extraction uses a greedy regex
re.search(r"\[.*\]", cleaned, flags=re.DOTALL) which can capture too much; first
try parsing the entire cleaned string with json.loads(cleaned) and return if it
yields a list of dicts, and only if that fails fall back to extracting arrays
using a non-greedy regex r"\[.*?\]" (flags=re.DOTALL) and attempt json.loads on
each match until one parses; preserve the existing filtering logic that builds
result from qa_list[:max_questions] (checking isinstance(item, dict) and
presence of "question"/"answer") and keep falling through to the existing
_fallback_parse if no valid JSON is found.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/server.py`:
- Line 108: Remove the redundant exception argument passed to
app.logger.exception in the /get_shortq_llm handler (and the two other
logger.exception calls flagged) — logging.exception already captures the
traceback, so change calls like app.logger.exception("Error in /get_shortq_llm:
%s", e) to app.logger.exception("Error in /get_shortq_llm") (or include only
contextual text), leaving exc_info implicit; update the three occurrences (the
call in the /get_shortq_llm handler and the two other app.logger.exception calls
referenced) accordingly.
- Around line 99-103: The handler currently assumes request.get_json() returns a
dict and then calls data.get(...), which raises AttributeError when get_json()
returns None; update the route to validate the JSON body (e.g., check
request.is_json or if data is None after request.get_json()), and return a 400
Bad Request with an explanatory message when no valid JSON is provided instead
of letting the code proceed; ensure the variables referenced (input_text,
use_mediawiki, max_questions) are only accessed after this check. Also apply the
same guard to the other routes that parse JSON (e.g., the /get_mcq and
/get_shortq handlers) so they validate request.get_json() before using data.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b699864 and 929ed63.

📒 Files selected for processing (1)
  • backend/server.py

Comment on lines +99 to +103
    try:
        data = request.get_json()
        input_text = data.get("input_text", "")
        use_mediawiki = data.get("use_mediawiki", 0)
        max_questions = data.get("max_questions", 4)


⚠️ Potential issue | 🟡 Minor

request.get_json() can return None, causing a misleading 500.

When the client sends a request without Content-Type: application/json, or with a malformed body, request.get_json() returns None. The subsequent data.get("input_text", "") call then raises AttributeError, which the except Exception block catches and returns as a 500 Internal Server Error instead of a proper 400 Bad Request. The same unguarded pattern exists in other routes (/get_mcq, /get_shortq, etc.), but those lack try/except entirely, so at least here the failure is contained.

🛡️ Proposed fix
     try:
         data = request.get_json()
+        if data is None:
+            return jsonify({"error": "Invalid or missing JSON body"}), 400
         input_text = data.get("input_text", "")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
     try:
         data = request.get_json()
+        if data is None:
+            return jsonify({"error": "Invalid or missing JSON body"}), 400
         input_text = data.get("input_text", "")
         use_mediawiki = data.get("use_mediawiki", 0)
         max_questions = data.get("max_questions", 4)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 99 - 103, The handler currently assumes
request.get_json() returns a dict and then calls data.get(...), which raises
AttributeError when get_json() returns None; update the route to validate the
JSON body (e.g., check request.is_json or if data is None after
request.get_json()), and return a 400 Bad Request with an explanatory message
when no valid JSON is provided instead of letting the code proceed; ensure the
variables referenced (input_text, use_mediawiki, max_questions) are only
accessed after this check. Also apply the same guard to the other routes that
parse JSON (e.g., the /get_mcq and /get_shortq handlers) so they validate
request.get_json() before using data.
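Since the same guard is needed in several routes, it can be factored into a small helper. A minimal sketch, assuming the caller maps a `None` return to a 400 response (the `parse_body` name is illustrative, not from the PR):

```python
def parse_body(data, max_default=4):
    # Reject a missing or non-dict JSON body up front, instead of letting
    # data.get(...) raise AttributeError inside the broad except block.
    if not isinstance(data, dict):
        return None  # caller turns this into a 400 Bad Request
    return {
        "input_text": data.get("input_text", ""),
        "use_mediawiki": data.get("use_mediawiki", 0),
        "max_questions": data.get("max_questions", max_default),
    }
```

A route would then call `parse_body(request.get_json(silent=True))` and return 400 on `None`, keeping the happy path unchanged.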

        questions = llm_shortq.generate_short_questions(input_text, max_questions)
        return jsonify({"output": questions})
    except Exception as e:
        app.logger.exception("Error in /get_shortq_llm: %s", e)


⚠️ Potential issue | 🟡 Minor

Redundant exception argument in all three logger.exception calls.

logging.exception() implicitly captures exc_info=True and appends the full traceback. Passing e as a format-string argument additionally embeds the exception's str() into the log message, causing the exception message to appear twice in every log entry. Ruff (TRY401) flags all three occurrences.

♻️ Proposed fix
-        app.logger.exception("Error in /get_shortq_llm: %s", e)
+        app.logger.exception("Error in /get_shortq_llm")
-        app.logger.exception("ValueError in /get_content: %s", e)
+        app.logger.exception("ValueError in /get_content")
-        app.logger.exception("Unhandled exception in /get_content: %s", e)
+        app.logger.exception("Unhandled exception in /get_content")

Also applies to: 210-210, 213-213

🧰 Tools
🪛 Ruff (0.15.1)

[warning] 108-108: Redundant exception object included in logging.exception call

(TRY401)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` at line 108, Remove the redundant exception argument
passed to app.logger.exception in the /get_shortq_llm handler (and the two other
logger.exception calls flagged) — logging.exception already captures the
traceback, so change calls like app.logger.exception("Error in /get_shortq_llm:
%s", e) to app.logger.exception("Error in /get_shortq_llm") (or include only
contextual text), leaving exc_info implicit; update the three occurrences (the
call in the /get_shortq_llm handler and the two other app.logger.exception calls
referenced) accordingly.
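The behavior is easy to verify in isolation. A minimal sketch (not from the PR) showing that `logger.exception` appends the traceback, and with it the exception text, without being passed the exception object:

```python
import io
import logging

# Capture log output in memory so the record can be inspected.
buf = io.StringIO()
logger = logging.getLogger("demo")
logger.addHandler(logging.StreamHandler(buf))
logger.propagate = False

try:
    1 / 0
except ZeroDivisionError:
    # No exception argument: the traceback, ending in
    # "ZeroDivisionError: division by zero", is appended automatically.
    logger.exception("Error in /get_shortq_llm")

out = buf.getvalue()
```

With `logger.exception("...: %s", e)` the exception message would appear a second time in the formatted message line, which is exactly the duplication TRY401 flags.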

@Aditya062003
Contributor

Nice work! This looks like a useful addition. Could you expand support to other question types, and also investigate whether we can raise the context limit?

@jayydevs
Contributor Author

Thank you for the valuable feedback. I appreciate the suggestion.

I plan to extend LLM support to other question types as well. At the moment, my primary focus is on stabilizing and completing the backend implementation for short-answer generation while ensuring full backward compatibility with the existing endpoints.

In parallel, I will also explore the feasibility of increasing the context limit and evaluate potential optimizations to support larger inputs efficiently.

Thanks again for the thoughtful input; I’ll keep you updated on the progress.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (2)
backend/server.py (2)

108-108: ⚠️ Potential issue | 🟡 Minor

Remove redundant exception argument in app.logger.exception(...).

logger.exception already captures traceback/exception context; passing e duplicates exception text in logs.

🔧 Proposed cleanup
-        app.logger.exception("Error in /get_shortq_llm: %s", e)
+        app.logger.exception("Error in /get_shortq_llm")
-        app.logger.exception("Error in /get_mcq_llm: %s", e)
+        app.logger.exception("Error in /get_mcq_llm")
-        app.logger.exception("Error in /get_boolq_llm: %s", e)
+        app.logger.exception("Error in /get_boolq_llm")
-        app.logger.exception("Error in /get_problems_llm: %s", e)
+        app.logger.exception("Error in /get_problems_llm")
-        app.logger.exception("ValueError in /get_content: %s", e)
+        app.logger.exception("ValueError in /get_content")
-        app.logger.exception("Unhandled exception in /get_content: %s", e)
+        app.logger.exception("Unhandled exception in /get_content")

Also applies to: 123-123, 138-138, 155-155, 257-257, 260-260

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` at line 108, Remove the redundant exception argument
passed to logger.exception calls — logger.exception already logs the exception
and traceback, so update each call (e.g., in server.py inside the
/get_shortq_llm handler and the other similar handlers referenced around the
occurrences at lines 123, 138, 155, 257, 260) to pass only a descriptive message
string (e.g., app.logger.exception("Error in /get_shortq_llm")) and remove the
trailing ", e" from those invocations.

100-103: ⚠️ Potential issue | 🟡 Minor

Validate request JSON before data.get(...) in all new LLM routes.

If request.get_json() returns None, these handlers raise AttributeError and return a 500 instead of a client-facing 400.

🔧 Proposed fix pattern (apply to all four routes)
-        data = request.get_json()
+        data = request.get_json(silent=True)
+        if not isinstance(data, dict):
+            return jsonify({"error": "Invalid or missing JSON body"}), 400
         input_text = data.get("input_text", "")

Also applies to: 115-118, 130-133, 145-150

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/server.py` around lines 100 - 103, The handlers call data =
request.get_json() and then data.get("input_text", ...) which will raise
AttributeError if get_json() returns None; update each new LLM route to validate
the request JSON immediately after calling request.get_json() (e.g., check if
data is None or if not request.is_json) and return a 400 response with a clear
error message when missing/invalid JSON instead of continuing; apply the same
pattern to the blocks that reference data and variables input_text,
use_mediawiki, max_questions (and the analogous groups in the other routes at
the noted locations) so all four LLM endpoints perform this validation before
accessing data.get(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/Generator/llm_generator.py`:
- Around line 167-172: generate_all_questions currently appends MCQ dicts with a
"correct_answer" key but downstream code expects an "answer" key
(qapair["answer"]); update the MCQ payload in the questions.append call inside
generate_all_questions so that each MCQ includes an "answer" field (set to the
same value as "correct_answer") — i.e., emit both "correct_answer" and "answer"
(or replace "correct_answer" with "answer" if only one should exist) to restore
compatibility with the downstream form generation that reads qapair["answer"].
- Around line 343-349: The code in llm_generator.py is fabricating MCQ keys by
defaulting correct_answer to "A" when the model omitted an answer; update the
block handling q_match/current_q/options to not invent an answer—either omit the
"correct_answer" key or set it to None (e.g., "correct_answer": None) when no
validated answer exists so downstream consumers can detect missing keys and
handle them appropriately; keep the existing questions.append structure and
options truncation (options[:4]) but remove the hardcoded "A" fallback.
- Around line 418-427: The fallback bool parser currently guesses labels using a
negation heuristic (q_match block) and writes inferred True/False into the
questions list; change this to stop inventing ground-truth: when q_match finds a
question but no explicit answer, do not set answer to a guessed boolean—instead
set answer to None (or omit the "answer" key) or add a flag like "parsed": False
so downstream code can handle unlabeled questions; update the code around
q_match and the questions.append call (the q_match variable and the questions
list entry) to preserve the question text but not produce a guessed label.
- Around line 70-79: Replace the overly broad "except Exception" handlers in
Generator.llm_generator (the blocks that call response.get(...) and
self._parse_response(...)) with a narrowed catch for the specific
parsing-related exceptions: except (AttributeError, TypeError, ValueError):
return [] so that parser/type/value issues are handled while other unexpected
exceptions propagate; make this change at all three locations that currently use
"except Exception" (the blocks around self._parse_response) to avoid silently
swallowing bugs.

---

Duplicate comments:
In `@backend/server.py`:
- Line 108: Remove the redundant exception argument passed to logger.exception
calls — logger.exception already logs the exception and traceback, so update
each call (e.g., in server.py inside the /get_shortq_llm handler and the other
similar handlers referenced around the occurrences at lines 123, 138, 155, 257,
260) to pass only a descriptive message string (e.g.,
app.logger.exception("Error in /get_shortq_llm")) and remove the trailing ", e"
from those invocations.
- Around line 100-103: The handlers call data = request.get_json() and then
data.get("input_text", ...) which will raise AttributeError if get_json()
returns None; update each new LLM route to validate the request JSON immediately
after calling request.get_json() (e.g., check if data is None or if not
request.is_json) and return a 400 response with a clear error message when
missing/invalid JSON instead of continuing; apply the same pattern to the blocks
that reference data and variables input_text, use_mediawiki, max_questions (and
the analogous groups in the other routes at the noted locations) so all four LLM
endpoints perform this validation before accessing data.get(...).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 31d384e6-7b72-4060-87d3-66da50725dd7

📥 Commits

Reviewing files that changed from the base of the PR and between 929ed63 and 970a5cf.

📒 Files selected for processing (3)
  • backend/Generator/llm_generator.py
  • backend/server.py
  • backend/test_server.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/test_server.py

@jayydevs
Contributor Author

@Aditya062003 Raised a new revision and addressed all the CodeRabbit comments as well. Please review at your earliest convenience. I have also added the logs for all endpoints to the description.

@Aditya062003
Contributor

LGTM!

@Aditya062003 Aditya062003 merged commit 72dd3a5 into AOSSIE-Org:main Mar 17, 2026
5 checks passed


Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add Qwen3-0.6B LLM for Question Generation

2 participants