Conversation
Extract _build_prompt() and _parse_response() methods from OllamaClassifier and GroqClassifier into the shared BaseClassifier to eliminate code duplication.

Changes:
- Add _build_prompt() to BaseClassifier with comprehensive docstrings
- Add _parse_response() to BaseClassifier with standard format handling
- Remove duplicate _build_prompt() from OllamaClassifier (~18 lines)
- Remove duplicate _build_prompt() from GroqClassifier (~18 lines)
- Remove duplicate _parse_response() from OllamaClassifier (~45 lines)
- Remove duplicate _parse_response() from GroqClassifier (~45 lines)

Benefits:
- DRY principle: a single implementation eliminates ~100 lines of duplicate code
- Maintainability: changes to prompt building or parsing need only one update
- Consistency: both classifiers are guaranteed to use identical logic
- Extensibility: new classifiers can inherit these methods without reimplementation

The shared implementation:
- Uses the prompt_template attribute set during initialization
- Handles the standard Classificatie/Reden/Vertrouwen/Personage format
- Provides graceful fallbacks for invalid responses
- Properly clamps confidence values using ConfidenceDefaults

Related: Part of Refactor #2+#3 (Consolidate Classifier Methods)
Uses: Classification enum from constants.py (Agent A's work)
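The shape of the consolidation can be sketched as follows. The class and attribute names (BaseClassifier, OllamaClassifier, GroqClassifier, prompt_template, _build_prompt) come from the PR description; the bodies are simplified stand-ins, not the project's real code.

```python
class BaseClassifier:
    """Shared base; subclasses set prompt_template during initialization."""

    # Placeholder template -- the real one lives in the project.
    prompt_template = "Classificeer het volgende segment:\n{segment}"

    def _build_prompt(self, segment: str) -> str:
        # Single shared implementation: both subclasses inherit this
        # instead of each carrying its own ~18-line copy.
        return self.prompt_template.format(segment=segment)


class OllamaClassifier(BaseClassifier):
    # No _build_prompt() override needed any more.
    pass


class GroqClassifier(BaseClassifier):
    pass


prompt = OllamaClassifier()._build_prompt("Ik trek mijn zwaard.")
```

A change to the prompt format now touches only BaseClassifier, which is the maintainability point the commit message makes.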
Summary of Changes

Hello @Gambitnl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the classifier architecture by extracting common prompt building and response parsing logic into the shared BaseClassifier.
Code Review
This pull request is a great refactoring that moves duplicated _build_prompt and _parse_response methods into the BaseClassifier, adhering to the DRY principle. This significantly improves maintainability and consistency across the different classifier implementations.
I have one suggestion for the new _parse_response method in BaseClassifier. The current implementation is vulnerable to IndexError if the LLM response is malformed, and it silently ignores some parsing errors. I've proposed a more robust implementation that handles these edge cases gracefully and improves logging, which should make the system more resilient.
Overall, this is a solid improvement to the codebase.
def _parse_response(
    self,
    response: str,
    index: int
) -> ClassificationResult:
    """
    Parse LLM response into ClassificationResult.

    This method handles the standard response format used by all
    classifier implementations:
    - Classificatie: IC/OOC/MIXED
    - Reden: reasoning text
    - Vertrouwen: confidence score (0.0-1.0)
    - Personage: character name (or N/A)

    Args:
        response: Raw text response from the LLM
        index: Segment index for the classification result

    Returns:
        ClassificationResult with parsed values
    """
    classification = Classification.IN_CHARACTER
    confidence = ConfidenceDefaults.DEFAULT
    reasoning = "Could not parse response"
    character = None

    lines = response.strip().split('\n')
    for line in lines:
        line = line.strip()
        if line.startswith("Classificatie:"):
            class_text = line.split(":", 1)[1].strip().upper()
            try:
                classification = Classification(class_text)
            except ValueError:
                if hasattr(self, 'logger'):
                    self.logger.warning(
                        "Invalid classification '%s' for segment %s, defaulting to IC",
                        class_text,
                        index
                    )
                classification = Classification.IN_CHARACTER
        elif line.startswith("Reden:"):
            reasoning = line.split(":", 1)[1].strip()
        elif line.startswith("Vertrouwen:"):
            try:
                conf_text = line.split(":", 1)[1].strip()
                confidence = float(conf_text)
                confidence = ConfidenceDefaults.clamp(confidence)
            except ValueError:
                pass
        elif line.startswith("Personage:"):
            char_text = line.split(":", 1)[1].strip()
            if char_text.upper() != "N/A":
                character = char_text

    return ClassificationResult(
        segment_index=index,
        classification=classification,
        confidence=confidence,
        reasoning=reasoning,
        character=character
    )
This new shared _parse_response method is a great step towards a DRYer codebase. However, the current implementation can be made more robust against malformed LLM responses.
- Potential IndexError: If a line in the response does not contain a `:`, `line.split(":", 1)[1]` will raise an `IndexError`. This is unhandled and would crash the process. For example, a response line of just `"Classificatie"`, or any other malformed line without a colon, would cause this.
- Silent error on confidence parsing: The `try...except ValueError: pass` block for parsing the confidence score silently ignores malformed values (e.g., non-numeric text). This is inconsistent with how invalid classification values are handled (those are logged) and can hide issues with the LLM's output format. It would be better to log a warning here for consistency.
- Parsing robustness: The current `startswith` logic is not robust against extra whitespace around the key (e.g., `"Classificatie : value"`).
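The first point can be reproduced in isolation. This is a minimal sketch contrasting the unguarded indexing with the length-checked split (the variable names are illustrative, not from the PR):

```python
line = "Classificatie"  # malformed response line with no colon

# Original approach: indexing [1] on a colon-less line raises IndexError.
raised = False
try:
    value = line.split(":", 1)[1]
except IndexError:
    raised = True  # this is the crash the review warns about

# Guarded approach: check the number of parts before indexing.
parts = line.split(":", 1)
safe_value = parts[1].strip() if len(parts) == 2 else None

print(raised, safe_value)
```

The guarded version simply skips the malformed line (yields None here), which is the behavior the suggested refactoring adopts with `continue`.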
I'm suggesting a refactoring of this method to address these points. This version is safer against malformed input and improves logging, without changing the core logic.
def _parse_response(
    self,
    response: str,
    index: int
) -> ClassificationResult:
    """
    Parse LLM response into ClassificationResult.

    This method handles the standard response format used by all
    classifier implementations:
    - Classificatie: IC/OOC/MIXED
    - Reden: reasoning text
    - Vertrouwen: confidence score (0.0-1.0)
    - Personage: character name (or N/A)

    Args:
        response: Raw text response from the LLM
        index: Segment index for the classification result

    Returns:
        ClassificationResult with parsed values
    """
    classification = Classification.IN_CHARACTER
    confidence = ConfidenceDefaults.DEFAULT
    reasoning = "Could not parse response"
    character = None

    lines = response.strip().split('\n')
    for line in lines:
        parts = line.strip().split(":", 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].strip(), parts[1].strip()
        if key == "Classificatie":
            try:
                classification = Classification(value.upper())
            except ValueError:
                if hasattr(self, 'logger'):
                    self.logger.warning(
                        "Invalid classification '%s' for segment %s, defaulting to IC",
                        value,
                        index
                    )
                classification = Classification.IN_CHARACTER
        elif key == "Reden":
            reasoning = value
        elif key == "Vertrouwen":
            try:
                confidence = float(value)
                confidence = ConfidenceDefaults.clamp(confidence)
            except ValueError:
                if hasattr(self, 'logger'):
                    self.logger.warning(
                        "Invalid confidence value '%s' for segment %s, using default",
                        value,
                        index
                    )
        elif key == "Personage":
            if value.upper() != "N/A":
                character = value

    return ClassificationResult(
        segment_index=index,
        classification=classification,
        confidence=confidence,
        reasoning=reasoning,
        character=character
    )

Address code review feedback to make response parsing more resilient:

1. Fix potential IndexError when response lines don't contain ':'
   - Now splits first and checks for exactly 2 parts
   - Skips malformed lines with 'continue' instead of crashing

2. Add logging for invalid confidence values
   - Previously a ValueError was silently ignored; now a warning is logged
   - Consistent with how invalid classification values are handled

3. Improve parsing precision
   - Uses exact key matching (==) instead of startswith()
   - Strips whitespace from key and value separately
   - More robust against extra whitespace (e.g., "Classificatie : value")

Benefits:
- Prevents crashes on malformed LLM responses
- Better observability through consistent warning logs
- More precise key matching reduces false positives
- Graceful degradation with clear fallback values

Thanks to gemini-code-assist bot for the review feedback!
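The suggested key/value loop can be exercised stand-alone. In this sketch the enum and the result dict are simplified stand-ins for the project's Classification enum and ClassificationResult (and an inline min/max clamp stands in for ConfidenceDefaults.clamp); only the parsing loop mirrors the suggestion.

```python
from enum import Enum


class Classification(Enum):
    IN_CHARACTER = "IC"
    OUT_OF_CHARACTER = "OOC"
    MIXED = "MIXED"


def parse_response(response: str) -> dict:
    # Fallback values, matching the defaults in the suggested method.
    result = {
        "classification": Classification.IN_CHARACTER,
        "confidence": 0.5,
        "reasoning": "Could not parse response",
        "character": None,
    }
    for line in response.strip().split("\n"):
        parts = line.strip().split(":", 1)
        if len(parts) != 2:
            continue  # malformed lines are skipped instead of crashing
        key, value = parts[0].strip(), parts[1].strip()
        if key == "Classificatie":
            try:
                result["classification"] = Classification(value.upper())
            except ValueError:
                pass  # keep the IC default (the real method logs here)
        elif key == "Reden":
            result["reasoning"] = value
        elif key == "Vertrouwen":
            try:
                # Inline clamp to [0.0, 1.0] as a stand-in for ConfidenceDefaults.clamp
                result["confidence"] = min(1.0, max(0.0, float(value)))
            except ValueError:
                pass  # keep the default (the real method logs here)
        elif key == "Personage":
            if value.upper() != "N/A":
                result["character"] = value
    return result


sample = (
    "Classificatie: OOC\n"
    "Reden: meta discussion about dice rolls\n"
    "Vertrouwen: 0.9\n"
    "Personage: N/A\n"
    "this line has no colon and is skipped"
)
parsed = parse_response(sample)
```

Note how the colon-less trailing line is simply skipped rather than raising, which is exactly the resilience the review asked for.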
* docs: Add comprehensive bug report with 25 identified issues

Created docs/KNOWN_ISSUES.md documenting all bugs found during systematic codebase analysis. Bugs are categorized by severity:
- Critical (3): Security vulnerabilities and data loss risks
- High (4): Crashes and race conditions
- Medium (7): Logic errors and resource leaks
- Low (11): Edge cases and type inconsistencies

Each bug includes:
- File location and line numbers
- Detailed description and impact assessment
- Code examples and suggested fixes
- Priority classification

Also includes a fix priority roadmap with estimated effort (34-46 hours total) and testing recommendations for each category.

* fix: Correct inaccuracies in KNOWN_ISSUES.md based on code review

Fixed issues identified by the code review bot:
1. Corrected bug count from 25 to 23 bugs
2. Fixed BUG #12 location (line 120, not 155-161) and improved description
3. Fixed BUG #13 with proper cleanup suggestion (removed useless finally: pass)
4. Removed BUG #20 (invalid - code already implements suggested fix)
5. Removed BUG #25 (invalid - code uses correct pattern with Optional[]=None)
6. Updated summary counts: Medium 7→8, Low 11→8
7. Renumbered remaining bugs after removals
8. Updated total effort estimate: 34-46h → 32-44h

All bugs are now verified against the actual source code with correct line numbers and accurate descriptions.

Co-authored-by: Claude <noreply@anthropic.com>