Agent H: Refactor classifier methods and pipeline processing#25

Merged
Gambitnl merged 2 commits into main from
claude/refactor-classifier-prompt-parsing-011CV4ptrCX5qdFiw6W5SwbY
Nov 12, 2025
Conversation

@Gambitnl
Owner

Extract _build_prompt() and _parse_response() methods from OllamaClassifier and GroqClassifier into the shared BaseClassifier to eliminate code duplication.

Changes:

  • Add _build_prompt() to BaseClassifier with comprehensive docstrings
  • Add _parse_response() to BaseClassifier with standard format handling
  • Remove duplicate _build_prompt() from OllamaClassifier (~18 lines)
  • Remove duplicate _build_prompt() from GroqClassifier (~18 lines)
  • Remove duplicate _parse_response() from OllamaClassifier (~45 lines)
  • Remove duplicate _parse_response() from GroqClassifier (~45 lines)

Benefits:

  • DRY principle: Single implementation eliminates ~100 lines of duplicate code
  • Maintainability: Changes to prompt building or parsing only need one update
  • Consistency: both classifiers are guaranteed to use identical logic
  • Extensibility: New classifiers can inherit these methods without reimplementation

The shared implementation:

  • Uses prompt_template attribute set during initialization
  • Handles standard Classificatie/Reden/Vertrouwen/Personage format
  • Provides graceful fallbacks for invalid responses
  • Properly clamps confidence values using ConfidenceDefaults

Related: Part of Refactor #2+#3 (Consolidate Classifier Methods)
Uses: Classification enum from constants.py (Agent A's work)
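The inheritance structure described above can be sketched as follows. This is a minimal illustration, not the actual code from src/classifier.py: the constructor signature, the template placeholder name `segment`, and the example template text are all assumptions for demonstration.

```python
class BaseClassifier:
    """Shared base: subclasses inherit prompt building (and parsing)."""

    def __init__(self, prompt_template: str):
        # Template is set once at initialization and reused by _build_prompt()
        self.prompt_template = prompt_template

    def _build_prompt(self, segment: str) -> str:
        # Single shared implementation; '{segment}' is a hypothetical placeholder
        return self.prompt_template.format(segment=segment)


class OllamaClassifier(BaseClassifier):
    # After the refactor, no _build_prompt() / _parse_response() overrides
    # are needed here: both are inherited from BaseClassifier.
    pass


classifier = OllamaClassifier("Classificeer dit segment:\n{segment}")
print(classifier._build_prompt("Hello"))
```

Any future classifier gets the same behavior by subclassing BaseClassifier, which is the extensibility point listed above.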

@gemini-code-assist
Contributor

Summary of Changes

Hello @Gambitnl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request significantly refactors the classifier architecture by extracting common prompt building and response parsing logic into the BaseClassifier. This strategic move centralizes critical functionality, eliminating code duplication in derived classes like OllamaClassifier and GroqClassifier. The change enhances the system's maintainability, ensures consistent behavior across different LLM integrations, and simplifies the process of adding new classifiers in the future.

Highlights

  • Centralized Prompt Building: The _build_prompt method has been moved from OllamaClassifier and GroqClassifier to the BaseClassifier, ensuring a single, consistent way to construct prompts for LLMs.
  • Standardized Response Parsing: The _parse_response method, responsible for interpreting LLM outputs, is now part of BaseClassifier, handling a consistent response format, error cases, and confidence clamping.
  • Significant Code Reduction: This refactoring eliminates approximately 100 lines of duplicate code across OllamaClassifier and GroqClassifier, adhering to the DRY principle.
  • Improved Maintainability and Extensibility: By centralizing these core methods, future changes to prompt building or response parsing logic only require a single update, and new classifiers can easily inherit this functionality.

@gemini-code-assist Bot left a comment

Code Review

This pull request is a great refactoring that moves duplicated _build_prompt and _parse_response methods into the BaseClassifier, adhering to the DRY principle. This significantly improves maintainability and consistency across the different classifier implementations.

I have one suggestion for the new _parse_response method in BaseClassifier. The current implementation is vulnerable to IndexError if the LLM response is malformed, and it silently ignores some parsing errors. I've proposed a more robust implementation that handles these edge cases gracefully and improves logging, which should make the system more resilient.

Overall, this is a solid improvement to the codebase.

Comment thread src/classifier.py
Comment on lines +106 to +168
    def _parse_response(
        self,
        response: str,
        index: int
    ) -> ClassificationResult:
        """
        Parse LLM response into ClassificationResult.

        This method handles the standard response format used by all
        classifier implementations:
        - Classificatie: IC/OOC/MIXED
        - Reden: reasoning text
        - Vertrouwen: confidence score (0.0-1.0)
        - Personage: character name (or N/A)

        Args:
            response: Raw text response from the LLM
            index: Segment index for the classification result

        Returns:
            ClassificationResult with parsed values
        """
        classification = Classification.IN_CHARACTER
        confidence = ConfidenceDefaults.DEFAULT
        reasoning = "Could not parse response"
        character = None

        lines = response.strip().split('\n')
        for line in lines:
            line = line.strip()
            if line.startswith("Classificatie:"):
                class_text = line.split(":", 1)[1].strip().upper()
                try:
                    classification = Classification(class_text)
                except ValueError:
                    if hasattr(self, 'logger'):
                        self.logger.warning(
                            "Invalid classification '%s' for segment %s, defaulting to IC",
                            class_text,
                            index
                        )
                    classification = Classification.IN_CHARACTER
            elif line.startswith("Reden:"):
                reasoning = line.split(":", 1)[1].strip()
            elif line.startswith("Vertrouwen:"):
                try:
                    conf_text = line.split(":", 1)[1].strip()
                    confidence = float(conf_text)
                    confidence = ConfidenceDefaults.clamp(confidence)
                except ValueError:
                    pass
            elif line.startswith("Personage:"):
                char_text = line.split(":", 1)[1].strip()
                if char_text.upper() != "N/A":
                    character = char_text

        return ClassificationResult(
            segment_index=index,
            classification=classification,
            confidence=confidence,
            reasoning=reasoning,
            character=character
        )
Severity: high

This new shared _parse_response method is a great step towards a DRYer codebase. However, the current implementation can be made more robust against malformed LLM responses.

  1. Potential IndexError: If a line in the response does not contain a :, line.split(":", 1)[1] will raise an IndexError. This is unhandled and would crash the process. For example, a response line of just "Classificatie" or any other malformed line without a colon would cause this.

  2. Silent Error on Confidence Parsing: The try...except ValueError: pass block for parsing the confidence score silently ignores malformed values (e.g., non-numeric text). This is inconsistent with how invalid classification values are handled (which are logged) and can hide issues with the LLM's output format. It would be better to log a warning here for consistency.

  3. Parsing Robustness: The current startswith logic is not robust against extra whitespace around the key (e.g., "Classificatie : value").

I'm suggesting a refactoring of this method to address these points. This version is safer against malformed input and improves logging, without changing the core logic.

    def _parse_response(
        self,
        response: str,
        index: int
    ) -> ClassificationResult:
        """
        Parse LLM response into ClassificationResult.

        This method handles the standard response format used by all
        classifier implementations:
        - Classificatie: IC/OOC/MIXED
        - Reden: reasoning text
        - Vertrouwen: confidence score (0.0-1.0)
        - Personage: character name (or N/A)

        Args:
            response: Raw text response from the LLM
            index: Segment index for the classification result

        Returns:
            ClassificationResult with parsed values
        """
        classification = Classification.IN_CHARACTER
        confidence = ConfidenceDefaults.DEFAULT
        reasoning = "Could not parse response"
        character = None

        lines = response.strip().split('\n')
        for line in lines:
            parts = line.strip().split(":", 1)
            if len(parts) != 2:
                continue

            key, value = parts[0].strip(), parts[1].strip()

            if key == "Classificatie":
                try:
                    classification = Classification(value.upper())
                except ValueError:
                    if hasattr(self, 'logger'):
                        self.logger.warning(
                            "Invalid classification '%s' for segment %s, defaulting to IC",
                            value,
                            index
                        )
                    classification = Classification.IN_CHARACTER
            elif key == "Reden":
                reasoning = value
            elif key == "Vertrouwen":
                try:
                    confidence = float(value)
                    confidence = ConfidenceDefaults.clamp(confidence)
                except ValueError:
                    if hasattr(self, 'logger'):
                        self.logger.warning(
                            "Invalid confidence value '%s' for segment %s, using default",
                            value,
                            index
                        )
            elif key == "Personage":
                if value.upper() != "N/A":
                    character = value

        return ClassificationResult(
            segment_index=index,
            classification=classification,
            confidence=confidence,
            reasoning=reasoning,
            character=character
        )
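Both versions of the method rely on ConfidenceDefaults for the default confidence and for clamping. The real helper lives in the repository's constants module and is not shown in this PR, so the following is a hypothetical sketch of how it could behave; the attribute names and the 0.5 default are assumptions.

```python
class ConfidenceDefaults:
    """Hypothetical sketch of the confidence helper referenced above;
    the actual implementation in the repository may differ."""
    DEFAULT = 0.5  # assumed fallback confidence for unparseable responses
    MIN = 0.0
    MAX = 1.0

    @staticmethod
    def clamp(value: float) -> float:
        # Force a parsed confidence into the [MIN, MAX] range
        return max(ConfidenceDefaults.MIN, min(ConfidenceDefaults.MAX, value))


print(ConfidenceDefaults.clamp(1.7))  # clamped to 1.0
```

Clamping matters because an LLM can emit values like "1.5" or "-0.2" that parse as floats but fall outside the documented 0.0-1.0 range.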

Address code review feedback to make response parsing more resilient:

1. Fix potential IndexError when response lines don't contain ':'
   - Now splits first and checks if we have exactly 2 parts
   - Skips malformed lines with 'continue' instead of crashing

2. Add logging for invalid confidence values
   - Previously silently ignored ValueError, now logs warning
   - Consistent with how invalid classification values are handled

3. Improve parsing precision
   - Use exact key matching (==) instead of startswith()
   - Strips whitespace from both key and value separately
   - More robust against extra whitespace (e.g., "Classificatie : value")

Benefits:
- Prevents crashes on malformed LLM responses
- Better observability through consistent warning logs
- More precise key matching reduces false positives
- Graceful degradation with clear fallback values

Thanks to gemini-code-assist bot for the review feedback!
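The split-and-check pattern from point 1 can be shown in isolation. This is a simplified standalone sketch, not the merged method: it only collects raw key/value pairs, using the PR's field names in the example input.

```python
def parse_fields(response: str) -> dict:
    """Collect 'Key: value' pairs, skipping malformed lines instead of crashing."""
    fields = {}
    for line in response.strip().split('\n'):
        parts = line.strip().split(":", 1)
        if len(parts) != 2:
            continue  # no colon on this line: skip rather than raise IndexError
        key, value = parts[0].strip(), parts[1].strip()
        fields[key] = value
    return fields


# 'Reden' has no colon and is skipped; 'Vertrouwen : 0.9' survives the
# extra whitespace because key and value are stripped separately.
print(parse_fields("Classificatie: IC\nReden\nVertrouwen : 0.9"))
```

With the original startswith() approach, the bare "Reden" line would have fallen through harmlessly, but "Reden:" with no value or a colon-less "Classificatie" line could crash on the `[1]` index; splitting first makes every line safe to inspect.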
@Gambitnl Gambitnl merged commit 0048081 into main Nov 12, 2025
3 checks passed
Gambitnl pushed a commit that referenced this pull request Nov 16, 2025
Fixed issues identified by code review bot:

1. Corrected bug count from 25 to 23 bugs
2. Fixed BUG #12 location (line 120, not 155-161) and improved description
3. Fixed BUG #13 with proper cleanup suggestion (removed useless finally: pass)
4. Removed BUG #20 (invalid - code already implements suggested fix)
5. Removed BUG #25 (invalid - code uses correct pattern with Optional[]=None)
6. Updated summary counts: Medium 7→8, Low 11→8
7. Renumbered remaining bugs after removals
8. Updated total effort estimate: 34-46h → 32-44h

All bugs now verified against actual source code with correct line numbers
and accurate descriptions.
Gambitnl added a commit that referenced this pull request Nov 16, 2025
* docs: Add comprehensive bug report with 25 identified issues

Created docs/KNOWN_ISSUES.md documenting all bugs found during
systematic codebase analysis. Bugs are categorized by severity:

- Critical (3): Security vulnerabilities and data loss risks
- High (4): Crashes and race conditions
- Medium (7): Logic errors and resource leaks
- Low (11): Edge cases and type inconsistencies

Each bug includes:
- File location and line numbers
- Detailed description and impact assessment
- Code examples and suggested fixes
- Priority classification

Also includes fix priority roadmap with estimated effort (34-46 hours
total) and testing recommendations for each category.

* fix: Correct inaccuracies in KNOWN_ISSUES.md based on code review

Fixed issues identified by code review bot:

1. Corrected bug count from 25 to 23 bugs
2. Fixed BUG #12 location (line 120, not 155-161) and improved description
3. Fixed BUG #13 with proper cleanup suggestion (removed useless finally: pass)
4. Removed BUG #20 (invalid - code already implements suggested fix)
5. Removed BUG #25 (invalid - code uses correct pattern with Optional[]=None)
6. Updated summary counts: Medium 7→8, Low 11→8
7. Renumbered remaining bugs after removals
8. Updated total effort estimate: 34-46h → 32-44h

All bugs now verified against actual source code with correct line numbers
and accurate descriptions.

---------

Co-authored-by: Claude <noreply@anthropic.com>