feat: Add Qwen3-0.6B LLM endpoint for question generation #452

Aditya062003 merged 5 commits into AOSSIE-Org:main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Adds a Qwen3-0.6B LLM integration via llama.cpp: a new backend LLMQuestionGenerator with lazy, thread-safe loading; four Flask endpoints for short/MCQ/boolean/all-question generation; tests for the endpoints; README docs; and a dependency addition (llama-cpp-python).
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Generator as LLMQuestionGenerator
    participant Model as Qwen3_Model
    Client->>Server: POST /get_shortq_llm (input_text, max_questions)
    Server->>Generator: generate_short_questions(text, max_questions)
    alt Model not loaded
        Generator->>Model: _load_model() (Llama.from_pretrained)
        Model-->>Generator: model instance
    end
    Generator->>Generator: _prepare_text() (truncate / sanitize)
    Generator->>Generator: build prompt (system/user)
    Generator->>Model: invoke model via llama-cpp-python
    Model-->>Generator: raw response text
    Generator->>Generator: _parse_response() JSON parse or fallback parsing
    alt Parsed successfully
        Generator-->>Server: formatted questions JSON
    else Fallback parsed
        Generator-->>Server: best-effort questions JSON
    end
    Server-->>Client: HTTP 200 JSON (or 500 on error)
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 4
🧹 Nitpick comments (4)
README.md (1)
54-73: "LLM-Based Question Generation" section breaks the numbered setup flow.

The section is inserted between `### Troubleshooting` and `### 3. Configure Google APIs`, interrupting the sequential numbered steps (1 → 2 → [unnumbered LLM section] → 3). Consider nesting it as a subsection of `## 2. Backend Setup` (e.g., `#### LLM Model Setup`) so the numbered flow remains intact for readers following the setup guide.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 54-73, the "LLM-Based Question Generation" section breaks the numbered setup flow by sitting between "### Troubleshooting" and "### 3. Configure Google APIs"; move that block under the backend setup so numbering remains sequential: cut the "LLM-Based Question Generation" heading and its bullet list and paste it as a nested subsection (e.g., "#### LLM Model Setup" or "#### LLM-Based Question Generation") inside the "## 2. Backend Setup" section, ensuring its content stays unchanged and that "### Troubleshooting" and "### 3. Configure Google APIs" remain adjacent in the main flow.

backend/Generator/llm_generator.py (1)
63-64: Greedy `\[.*\]` regex will over-capture when prose contains stray brackets.

`re.search(r"\[.*\]", cleaned, re.DOTALL)` matches from the first `[` to the last `]` in the string. When the model's preamble includes brackets, e.g. `"Here are [4] questions: [{...}]"`, the match spans the entire substring including the preamble, causing `json.loads` to fail and unnecessarily falling through to the fragile line-based fallback. Anchoring the pattern on `[{…}]` avoids this:

♻️ Proposed fix

- match = re.search(r"\[.*\]", cleaned, re.DOTALL)
+ match = re.search(r"\[\s*\{.*\}\s*\]", cleaned, re.DOTALL)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@backend/Generator/llm_generator.py` around lines 63-64, the regex used to extract JSON arrays is greedy and can over-capture; replace it with a pattern that looks specifically for an array of objects, e.g. r"\[\s*\{.*\}\s*\]" with re.DOTALL, updating the match assignment so it no longer spans from the first '[' to the last ']' in the input.

backend/test_server.py (2)
57-70: Test requires a live ~397 MB model download; no mocking for CI.

`test_get_shortq_llm` calls the real `/get_shortq_llm` endpoint, which downloads Qwen3-0.6B on first run. In CI environments without a pre-cached model, this adds minutes of download time and requires internet access. Consider either:

- Mocking `LLMShortAnswerGenerator.generate_short_questions` at the unit-test level (preferred for CI), or
- Tagging this test as an integration/slow test so it can be skipped in CI with `pytest -m "not llm"`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_server.py` around lines 57 - 70, Test currently hits the real /get_shortq_llm endpoint and triggers a large model download; update the test to avoid network/model downloads by either (preferred) mocking LLMShortAnswerGenerator.generate_short_questions to return a small deterministic list and calling test_get_shortq_llm against that mock, or mark the test with a pytest marker (e.g., `@pytest.mark.llm` or `@pytest.mark.integration`) so CI can skip it with -m "not llm"; target the test function test_get_shortq_llm and the class/method LLMShortAnswerGenerator.generate_short_questions (or the HTTP client that posts to '/get_shortq_llm') when implementing the mock or marker.
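The mocking option can be sketched without touching the real endpoint. This is a minimal, self-contained illustration: the `LLMShortAnswerGenerator` stub and `handle_request` function below stand in for the project's real class and route body, and exist here only so the patch target is concrete.

```python
from unittest.mock import patch

class LLMShortAnswerGenerator:
    """Stand-in for the real generator in backend/Generator/llm_generator.py."""
    def generate_short_questions(self, text, max_questions=4):
        raise RuntimeError("would download the ~397 MB Qwen3-0.6B model")

def handle_request(generator, text):
    """Stand-in for the /get_shortq_llm route body."""
    return {"output": generator.generate_short_questions(text)}

fake = [{"question": "What is AI?", "answer": "Simulated intelligence.", "context": ""}]

# Patch the expensive method so the "endpoint" returns deterministic data
with patch.object(LLMShortAnswerGenerator, "generate_short_questions",
                  return_value=fake):
    result = handle_request(LLMShortAnswerGenerator(), "some input text")
```

In the real suite the patch would target the imported class; alternatively, a `@pytest.mark.llm` marker on the unmocked test would let CI skip it with `pytest -m "not llm"`.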
128: Move `test_get_shortq_llm()` to the end of the test runner.

Running the LLM test first forces the ~397 MB model download before the other fast tests (MCQ, BoolQ, ShortQ). Placing it last keeps the fast feedback loop for the majority of tests.

♻️ Proposed fix

 if __name__ == '__main__':
-    test_get_shortq_llm()
     test_get_mcq()
     test_get_boolq()
     test_get_shortq()
     test_get_problems()
     test_root()
     test_get_answer()
     test_get_boolean_answer()
+    test_get_shortq_llm()

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/test_server.py` at line 128, The test runner currently calls test_get_shortq_llm() early which triggers a large model download; move the call to test_get_shortq_llm() to the end of the test sequence so it runs after the fast tests (e.g., test_get_mcq(), test_get_boolq(), test_get_shortq()). Locate the test invocation for test_get_shortq_llm() in backend/test_server.py and reorder the calls so that test_get_shortq_llm() is the last test executed by the runner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/Generator/llm_generator.py`:
- Around line 43-56: The call to self.llm.create_chat_completion and the
subsequent access response["choices"][0]["message"]["content"] can raise
exceptions or IndexError when the LLM returns an empty choices list or missing
keys; wrap the create_chat_completion call in a try/except to catch
inference-level errors and log/raise a controlled exception, then validate the
response structure before indexing (check "choices" is present, is a non-empty
list, and that ["message"]["content"] exists) and handle the empty/malformed
case by logging and returning an empty result or raising a clear ValueError;
modify the method containing the create_chat_completion call and the use of
_parse_response to perform these guards and error paths.
- Around line 100-102: The current a_match regex in llm_generator.py (where
a_match is assigned with re.match) incorrectly treats "A." list items as
answers; update the pattern so the dot variant requires the full word "Answer"
while allowing the single-letter "A" only when followed by a colon—i.e., change
the re.match call that assigns a_match to two alternatives: one that matches
"Answer" followed by either ':' or '.' and one that matches the single letter
"A" only when followed by ':'; keep re.IGNORECASE and the group that captures
the answer text intact.
- Around line 14-24: The _load_model method has a TOCTOU race when multiple
threads see self.llm is None and concurrently create Llama instances; fix by
implementing double-checked locking: add a threading.Lock (e.g., self._llm_lock)
on the Generator object (initialized in __init__), then in _load_model re-check
self.llm, and if still None acquire self._llm_lock, re-check self.llm again, and
only then call Llama.from_pretrained to assign self.llm; release the lock
afterward so only one thread downloads/initializes the model.
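The double-checked locking described above can be sketched as follows; `create_model` is a placeholder for the expensive `Llama.from_pretrained` call, and the class name is illustrative:

```python
import threading

class LazyModelHolder:
    """Sketch of thread-safe lazy loading via double-checked locking."""

    def __init__(self, create_model):
        self.llm = None
        self._llm_lock = threading.Lock()
        self._create_model = create_model  # stands in for Llama.from_pretrained

    def _load_model(self):
        if self.llm is None:                  # fast path: no lock once loaded
            with self._llm_lock:
                if self.llm is None:          # re-check under the lock
                    self.llm = self._create_model()
        return self.llm

# Demo: eight threads race to load; the factory must run exactly once
calls = []
def factory():
    calls.append(1)
    return object()

holder = LazyModelHolder(factory)
threads = [threading.Thread(target=holder._load_model) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The second check inside the lock is what prevents a thread that lost the race from initializing the model a second time.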
In `@backend/server.py`:
- Around line 97-105: Wrap the body of get_shortq_llm in a try/except similar to
the /get_content handler: call process_input_text and
llm_shortq.generate_short_questions inside a try block, catch Exception as err,
log the error (using the same logger used elsewhere) and return a JSON error
response with an appropriate HTTP status (e.g., jsonify({"error": "failed to
generate questions", "details": str(err)}), 500). Ensure you still return
jsonify({"output": questions}) on success and reference the functions
get_shortq_llm, process_input_text, and llm_shortq.generate_short_questions so
the change is applied to the correct code.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/Generator/llm_generator.py (2)
49-73: `create_chat_completion` is outside the `try/except`; the method's error contract is inconsistent.

The `try/except` at line 64 only guards response parsing. If `create_chat_completion` raises (e.g., jinja chat-format error, OOM, context-length overflow), the exception escapes `generate_short_questions` entirely, even though the method implicitly promises to return a list. The server route catches it, but the method itself is inconsistent: some failures return `[]`, others raise.

Additionally, the bare `except Exception` at line 72 silently discards failures with no logging (Ruff BLE001), making inference errors invisible in production. Consolidate the inference call and the parsing into a single guarded block:
♻️ Proposed refactor
-        response = self.llm.create_chat_completion(
-            messages=[
-                {
-                    "role": "system",
-                    "content": "You generate short-answer quiz questions as JSON arrays. Output ONLY valid JSON.",
-                },
-                {
-                    "role": "user",
-                    "content": prompt,
-                },
-            ],
-            max_tokens=512,
-            temperature=0.7,
-        )
-
-        try:
-            choices = response.get("choices", [])
-            if not choices:
-                return []
-
-            raw = choices[0].get("message", {}).get("content", "")
-            return self._parse_response(raw, max_questions)
-
-        except Exception:
-            return []
+        try:
+            response = self.llm.create_chat_completion(
+                messages=[
+                    {
+                        "role": "system",
+                        "content": "You generate short-answer quiz questions as JSON arrays. Output ONLY valid JSON.",
+                    },
+                    {
+                        "role": "user",
+                        "content": prompt,
+                    },
+                ],
+                max_tokens=512,
+                temperature=0.7,
+            )
+            choices = response.get("choices", [])
+            if not choices:
+                return []
+            raw = choices[0].get("message", {}).get("content", "")
+            return self._parse_response(raw, max_questions)
+        except Exception as e:
+            print(f"LLM inference failed: {e}")
+            return []

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/llm_generator.py` around lines 49 - 73, The call to create_chat_completion should be moved inside the same guarded block as the parsing so generate_short_questions consistently returns [] on any inference/parsing failure; wrap both the create_chat_completion call and the subsequent parsing/choices logic (the response variable, choices handling, and _parse_response invocation) in a try/except that catches Exception as e (not a bare except), log the exception details (use an existing logger like self.logger or a module logger) including the exception/traceback, and return [] on error to preserve the method's contract and surface failures in logs.
78-98: Greedy `\[.*\]` regex will miss valid JSON when LLM output has brackets in surrounding text.

With `re.DOTALL`, the greedy `.*` matches from the first `[` to the last `]` in `cleaned`. If the model emits any bracketed text before the array (e.g., a preamble like `"Note [1]: here is the result: [...]"`), the match spans both bracket pairs, producing a string that is not valid JSON, forcing an unnecessary fall-through to `_fallback_parse`.

The most robust fix is to first attempt `json.loads(cleaned)` directly (covers the dominant case where the whole response is clean JSON), then fall back to the regex extraction:

♻️ Proposed refactor
-        # Try to extract a JSON array from the text
-        match = re.search(r"\[.*\]", cleaned, re.DOTALL)
-        if match:
-            try:
-                qa_list = json.loads(match.group())
-                result = []
-                for item in qa_list[:max_questions]:
-                    if isinstance(item, dict) and "question" in item and "answer" in item:
-                        result.append(
-                            {
-                                "question": item["question"].strip(),
-                                "answer": item["answer"].strip(),
-                                "context": "",
-                            }
-                        )
-                if result:
-                    return result
-            except json.JSONDecodeError:
-                pass
+        def _extract_qa_list(text):
+            qa_list = json.loads(text)
+            result = []
+            for item in (qa_list if isinstance(qa_list, list) else []):
+                if isinstance(item, dict) and "question" in item and "answer" in item:
+                    result.append(
+                        {
+                            "question": item["question"].strip(),
+                            "answer": item["answer"].strip(),
+                            "context": "",
+                        }
+                    )
+            return result
+
+        # Try the whole cleaned string first (most common success path)
+        try:
+            result = _extract_qa_list(cleaned)
+            if result:
+                return result[:max_questions]
+        except (json.JSONDecodeError, TypeError):
+            pass
+
+        # Fall back to regex extraction for partial/wrapped JSON
+        match = re.search(r"\[.*\]", cleaned, re.DOTALL)
+        if match:
+            try:
+                result = _extract_qa_list(match.group())
+                if result:
+                    return result[:max_questions]
+            except (json.JSONDecodeError, TypeError):
+                pass

As an alternative, `create_chat_completion` supports a `response_format` argument to constrain responses to valid JSON objects, which would eliminate most parsing failures at the source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/llm_generator.py` around lines 78 - 98, The current extraction uses a greedy regex re.search(r"\[.*\]", cleaned, flags=re.DOTALL) which can capture too much; first try parsing the entire cleaned string with json.loads(cleaned) and return if it yields a list of dicts, and only if that fails fall back to extracting arrays using a non-greedy regex r"\[.*?\]" (flags=re.DOTALL) and attempt json.loads on each match until one parses; preserve the existing filtering logic that builds result from qa_list[:max_questions] (checking isinstance(item, dict) and presence of "question"/"answer") and keep falling through to the existing _fallback_parse if no valid JSON is found.
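The parse-first-then-fallback strategy can be sketched as a standalone function. This is illustrative only (the real method also feeds a `_fallback_parse`, omitted here); it also uses a non-greedy scan so a bracketed preamble cannot swallow the real array:

```python
import json
import re

def parse_qa_response(cleaned, max_questions):
    """Try whole-string JSON first, then non-greedy array extraction."""

    def extract(text):
        qa_list = json.loads(text)
        return [
            {"question": item["question"].strip(),
             "answer": item["answer"].strip(),
             "context": ""}
            for item in (qa_list if isinstance(qa_list, list) else [])
            if isinstance(item, dict) and "question" in item and "answer" in item
        ]

    # 1) Dominant case: the whole response is already clean JSON
    try:
        result = extract(cleaned)
        if result:
            return result[:max_questions]
    except (json.JSONDecodeError, TypeError):
        pass

    # 2) Fallback: scan for embedded arrays non-greedily, so stray
    #    brackets in a preamble like "Note [1]:" are skipped over
    for match in re.finditer(r"\[.*?\]", cleaned, re.DOTALL):
        try:
            result = extract(match.group())
            if result:
                return result[:max_questions]
        except (json.JSONDecodeError, TypeError):
            pass
    return []
```

Each candidate match is parsed independently, so a non-JSON bracket pair simply falls through to the next one.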
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Around line 107-108: The except block in backend.server.py currently returns
jsonify({"error": str(e)}), exposing internals; instead, change the handler to
log the full exception on the server (use logger.exception or
app.logger.exception with the caught exception in the except block) and return a
sanitized, non-sensitive message to the client such as jsonify({"error":
"Internal server error"}) with a 500 status; apply the same change to the
/get_content route handler (same pattern) so both Model loading/inference error
paths use server-side logging and safe external messages.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/server.py`:
- Line 108: Remove the redundant exception argument passed to
app.logger.exception in the /get_shortq_llm handler (and the two other
logger.exception calls flagged) — logging.exception already captures the
traceback, so change calls like app.logger.exception("Error in /get_shortq_llm:
%s", e) to app.logger.exception("Error in /get_shortq_llm") (or include only
contextual text), leaving exc_info implicit; update the three occurrences (the
call in the /get_shortq_llm handler and the two other app.logger.exception calls
referenced) accordingly.
- Around line 99-103: The handler currently assumes request.get_json() returns a
dict and then calls data.get(...), which raises AttributeError when get_json()
returns None; update the route to validate the JSON body (e.g., check
request.is_json or if data is None after request.get_json()), and return a 400
Bad Request with an explanatory message when no valid JSON is provided instead
of letting the code proceed; ensure the variables referenced (input_text,
use_mediawiki, max_questions) are only accessed after this check. Also apply the
same guard to the other routes that parse JSON (e.g., the /get_mcq and
/get_shortq handlers) so they validate request.get_json() before using data.
    try:
        data = request.get_json()
        input_text = data.get("input_text", "")
        use_mediawiki = data.get("use_mediawiki", 0)
        max_questions = data.get("max_questions", 4)
request.get_json() can return None, causing a misleading 500.
When the client sends a request without Content-Type: application/json, or with a malformed body, request.get_json() returns None. The subsequent data.get("input_text", "") call then raises AttributeError, which the except Exception block catches and returns as a 500 Internal Server Error instead of a proper 400 Bad Request. The same unguarded pattern exists in other routes (/get_mcq, /get_shortq, etc.), but those lack try/except entirely, so at least here the failure is contained.
🛡️ Proposed fix

     try:
         data = request.get_json()
+        if data is None:
+            return jsonify({"error": "Invalid or missing JSON body"}), 400
         input_text = data.get("input_text", "")

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:

    try:
        data = request.get_json()
        input_text = data.get("input_text", "")
        use_mediawiki = data.get("use_mediawiki", 0)
        max_questions = data.get("max_questions", 4)

After:

    try:
        data = request.get_json()
        if data is None:
            return jsonify({"error": "Invalid or missing JSON body"}), 400
        input_text = data.get("input_text", "")
        use_mediawiki = data.get("use_mediawiki", 0)
        max_questions = data.get("max_questions", 4)
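The guard can be factored into a small helper. This framework-free sketch mirrors the suggested fix; the helper name is illustrative, and a real route would return `jsonify(...)` instead of the plain dict/status tuple used here:

```python
def parse_question_request(data):
    """Validate the body parsed by request.get_json().

    Returns (params, error): `error` is a (payload, status) pair when the
    body is missing or not a JSON object, so the route can answer 400
    instead of crashing into a 500.
    """
    if not isinstance(data, dict):
        return None, ({"error": "Invalid or missing JSON body"}, 400)
    return {
        "input_text": data.get("input_text", ""),
        "use_mediawiki": data.get("use_mediawiki", 0),
        "max_questions": data.get("max_questions", 4),
    }, None
```

Because every LLM route parses the same three fields, one helper keeps the four handlers consistent.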
        questions = llm_shortq.generate_short_questions(input_text, max_questions)
        return jsonify({"output": questions})
    except Exception as e:
        app.logger.exception("Error in /get_shortq_llm: %s", e)
Redundant exception argument in all three logger.exception calls.
logging.exception() implicitly captures exc_info=True and appends the full traceback. Passing e as a format-string argument additionally embeds the exception's str() into the log message, causing the exception message to appear twice in every log entry. Ruff (TRY401) flags all three occurrences.
♻️ Proposed fix

- app.logger.exception("Error in /get_shortq_llm: %s", e)
+ app.logger.exception("Error in /get_shortq_llm")

- app.logger.exception("ValueError in /get_content: %s", e)
+ app.logger.exception("ValueError in /get_content")

- app.logger.exception("Unhandled exception in /get_content: %s", e)
+ app.logger.exception("Unhandled exception in /get_content")

Also applies to: 210, 213
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 108-108: Redundant exception object included in logging.exception call
(TRY401)
Nice work! This looks like a useful addition. Could you expand support to other question types, and also investigate if we can raise the context limit?
Thank you for the valuable feedback. I appreciate the suggestion. I plan to extend LLM support to other question types as well. At the moment, my primary focus is on stabilizing and completing the backend implementation for short-answer generation while ensuring full backward compatibility with the existing endpoints. In parallel, I will also explore the feasibility of increasing the context limit and evaluate potential optimizations to support larger inputs efficiently. Thanks again for the thoughtful input; I'll keep you updated on the progress.
Actionable comments posted: 4
♻️ Duplicate comments (2)
backend/server.py (2)
108: ⚠️ Potential issue | 🟡 Minor: Remove redundant exception argument in `app.logger.exception(...)`.

`logger.exception` already captures traceback/exception context; passing `e` duplicates exception text in logs.

🔧 Proposed cleanup

- app.logger.exception("Error in /get_shortq_llm: %s", e)
+ app.logger.exception("Error in /get_shortq_llm")

- app.logger.exception("Error in /get_mcq_llm: %s", e)
+ app.logger.exception("Error in /get_mcq_llm")

- app.logger.exception("Error in /get_boolq_llm: %s", e)
+ app.logger.exception("Error in /get_boolq_llm")

- app.logger.exception("Error in /get_problems_llm: %s", e)
+ app.logger.exception("Error in /get_problems_llm")

- app.logger.exception("ValueError in /get_content: %s", e)
+ app.logger.exception("ValueError in /get_content")

- app.logger.exception("Unhandled exception in /get_content: %s", e)
+ app.logger.exception("Unhandled exception in /get_content")

Also applies to: 123, 138, 155, 257, 260
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` at line 108, Remove the redundant exception argument passed to logger.exception calls — logger.exception already logs the exception and traceback, so update each call (e.g., in server.py inside the /get_shortq_llm handler and the other similar handlers referenced around the occurrences at lines 123, 138, 155, 257, 260) to pass only a descriptive message string (e.g., app.logger.exception("Error in /get_shortq_llm")) and remove the trailing ", e" from those invocations.
100-103: ⚠️ Potential issue | 🟡 Minor: Validate request JSON before `data.get(...)` in all new LLM routes.

If `request.get_json()` returns `None`, these handlers fall into `AttributeError` and return a 500 instead of a client 400.

🔧 Proposed fix pattern (apply to all four routes)

- data = request.get_json()
+ data = request.get_json(silent=True)
+ if not isinstance(data, dict):
+     return jsonify({"error": "Invalid or missing JSON body"}), 400
  input_text = data.get("input_text", "")

Also applies to: 115-118, 130-133, 145-150
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 100 - 103, The handlers call data = request.get_json() and then data.get("input_text", ...) which will raise AttributeError if get_json() returns None; update each new LLM route to validate the request JSON immediately after calling request.get_json() (e.g., check if data is None or if not request.is_json) and return a 400 response with a clear error message when missing/invalid JSON instead of continuing; apply the same pattern to the blocks that reference data and variables input_text, use_mediawiki, max_questions (and the analogous groups in the other routes at the noted locations) so all four LLM endpoints perform this validation before accessing data.get(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/Generator/llm_generator.py`:
- Around line 167-172: generate_all_questions currently appends MCQ dicts with a
"correct_answer" key but downstream code expects an "answer" key
(qapair["answer"]); update the MCQ payload in the questions.append call inside
generate_all_questions so that each MCQ includes an "answer" field (set to the
same value as "correct_answer") — i.e., emit both "correct_answer" and "answer"
(or replace "correct_answer" with "answer" if only one should exist) to restore
compatibility with the downstream form generation that reads qapair["answer"].
- Around line 343-349: The code in llm_generator.py is fabricating MCQ keys by
defaulting correct_answer to "A" when the model omitted an answer; update the
block handling q_match/current_q/options to not invent an answer—either omit the
"correct_answer" key or set it to None (e.g., "correct_answer": None) when no
validated answer exists so downstream consumers can detect missing keys and
handle them appropriately; keep the existing questions.append structure and
options truncation (options[:4]) but remove the hardcoded "A" fallback.
- Around line 418-427: The fallback bool parser currently guesses labels using a
negation heuristic (q_match block) and writes inferred True/False into the
questions list; change this to stop inventing ground-truth: when q_match finds a
question but no explicit answer, do not set answer to a guessed boolean—instead
set answer to None (or omit the "answer" key) or add a flag like "parsed": False
so downstream code can handle unlabeled questions; update the code around
q_match and the questions.append call (the q_match variable and the questions
list entry) to preserve the question text but not produce a guessed label.
- Around line 70-79: Replace the overly broad "except Exception" handlers in
Generator.llm_generator (the blocks that call response.get(...) and
self._parse_response(...)) with a narrowed catch for the specific
parsing-related exceptions: except (AttributeError, TypeError, ValueError):
return [] so that parser/type/value issues are handled while other unexpected
exceptions propagate; make this change at all three locations that currently use
"except Exception" (the blocks around self._parse_response) to avoid silently
swallowing bugs.
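The narrowed handler can be sketched in isolation; `parser` below stands in for `_parse_response`, and the function name is illustrative:

```python
def parse_choices(response, parser):
    """Sketch of the narrowed exception handling suggested above.

    Only parsing-related errors are swallowed; anything unexpected
    (e.g. MemoryError or a genuine bug) still propagates.
    """
    try:
        choices = response.get("choices", [])
        if not choices:
            return []
        raw = choices[0].get("message", {}).get("content", "")
        return parser(raw)
    except (AttributeError, TypeError, ValueError):
        return []
```

A malformed response (even `None`) yields `[]`, while unrelated failures surface instead of being silently discarded.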
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 31d384e6-7b72-4060-87d3-66da50725dd7
📒 Files selected for processing (3)
- backend/Generator/llm_generator.py
- backend/server.py
- backend/test_server.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/test_server.py
@Aditya062003 Raised the new revision and addressed all the CodeRabbit comments as well. Please review at your earliest convenience. I have also added the logs for all of them in the description.
LGTM!
Description
This PR implements LLM-based question generation using Qwen3-0.6B model, providing users with an AI-powered alternative to traditional question generation methods.
Addressed Issues:
Fixes #450
Changes Made
Backend Implementation
New Module: `backend/Generator/llm_generator.py`
- `LLMShortAnswerGenerator` class with lazy model loading

New Endpoint: `POST /get_shortq_llm` in `server.py`

Dependencies: Added `llama-cpp-python` to `requirements.txt`

Testing
- `test_get_shortq_llm()` in `test_server.py`
- Logs:

Documentation
- `README.md` with LLM setup instructions
- `README.md` with LLM feature mentions

How to Test
Install Dependencies:

    cd backend
    pip install -r requirements.txt

Start Backend Server:
Test Endpoint with cURL:
Run Tests:
Expected Response
    {
      "output": [
        {
          "question": "What is the primary goal of artificial intelligence?",
          "answer": "to simulate human intelligence processes by machines.",
          "context": ""
        },
        {
          "question": "What includes learning, reasoning, and self-correction?",
          "answer": "artificial intelligence.",
          "context": ""
        }
      ]
    }

Performance Metrics
API Reference
Endpoint:
POST /get_shortq_llm

Request Parameters:
- input_text (string, required): Text passage to generate questions from
- max_questions (integer, optional, default: 4): Number of questions to generate
- use_mediawiki (integer, optional, default: 0): Set to 1 to fetch content from MediaWiki

Response:
- output (array): Array of {question, answer, context} objects

Screenshots/Recordings:
N/A - Backend feature, no UI changes
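The parameters documented in the API reference can be exercised with a small stdlib client sketch. `http://localhost:5000` is an assumed local dev address, and the `Content-Type` header matters because `request.get_json()` returns `None` without it:

```python
import json
from urllib import request as urlreq

def build_shortq_llm_request(base_url, input_text, max_questions=4, use_mediawiki=0):
    """Build a POST request for the /get_shortq_llm endpoint."""
    payload = json.dumps({
        "input_text": input_text,
        "max_questions": max_questions,
        "use_mediawiki": use_mediawiki,
    }).encode("utf-8")
    return urlreq.Request(
        base_url.rstrip("/") + "/get_shortq_llm",
        data=payload,
        headers={"Content-Type": "application/json"},  # required for get_json()
        method="POST",
    )

# With a running backend (first call may take ~5-10 s while the model downloads):
# with urlreq.urlopen(build_shortq_llm_request(
#         "http://localhost:5000", "AI simulates human intelligence.")) as resp:
#     questions = json.load(resp)["output"]
```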
Additional Notes:
First Request Warning: The first request will be slower (~ 5-10 seconds) as the model downloads from Hugging Face (~397MB). Subsequent requests will be fast (~2-3 seconds).
Lazy Loading: The model loads only when the endpoint is first called, not at server startup. This keeps initial server startup time fast.
Local Processing: All question generation happens locally on the user's machine. No external API calls are made.
Alternative Method: This endpoint provides an alternative to /get_shortq. Users can choose based on their preferences.

Future Extensibility: This implementation is designed to be extended to other question types in future PRs.
Checklist
AI Usage Disclosure
Check one of the checkboxes below:
I have used the following AI models and tools: Gemini and ChatGPT to research on different methods to improve efficiency.
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests
Chores