Agent H: Refactor classifier methods and pipeline processing#25

Merged
Gambitnl merged 2 commits into main from
claude/refactor-classifier-prompt-parsing-011CV4ptrCX5qdFiw6W5SwbY
Nov 12, 2025
Conversation

@Gambitnl
Owner

Extract _build_prompt() and _parse_response() methods from OllamaClassifier and GroqClassifier into the shared BaseClassifier to eliminate code duplication.

Changes:

  • Add _build_prompt() to BaseClassifier with comprehensive docstrings
  • Add _parse_response() to BaseClassifier with standard format handling
  • Remove duplicate _build_prompt() from OllamaClassifier (~18 lines)
  • Remove duplicate _build_prompt() from GroqClassifier (~18 lines)
  • Remove duplicate _parse_response() from OllamaClassifier (~45 lines)
  • Remove duplicate _parse_response() from GroqClassifier (~45 lines)

Benefits:

  • DRY principle: Single implementation eliminates ~100 lines of duplicate code
  • Maintainability: Changes to prompt building or parsing only need one update
  • Consistency: both classifiers are guaranteed to use identical logic
  • Extensibility: New classifiers can inherit these methods without reimplementation

The shared implementation:

  • Uses prompt_template attribute set during initialization
  • Handles standard Classificatie/Reden/Vertrouwen/Personage format
  • Provides graceful fallbacks for invalid responses
  • Properly clamps confidence values using ConfidenceDefaults

Related: Part of Refactor #2+#3 (Consolidate Classifier Methods)
Uses: Classification enum from constants.py (Agent A's work)
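The inheritance structure described above can be sketched as follows. This is a minimal illustration, not the actual code from src/classifier.py: the constructor signature, the template placeholder name `segment`, and the example template text are all assumptions for demonstration.

```python
class BaseClassifier:
    """Shared base: subclasses inherit prompt building (and parsing)."""

    def __init__(self, prompt_template: str):
        # Template is set once at initialization and reused by _build_prompt()
        self.prompt_template = prompt_template

    def _build_prompt(self, segment: str) -> str:
        # Single shared implementation; '{segment}' is a hypothetical placeholder
        return self.prompt_template.format(segment=segment)


class OllamaClassifier(BaseClassifier):
    # After the refactor, no _build_prompt() / _parse_response() overrides
    # are needed here: both are inherited from BaseClassifier.
    pass


classifier = OllamaClassifier("Classificeer dit segment:\n{segment}")
print(classifier._build_prompt("Hello"))
```

Any future classifier gets the same behavior by subclassing BaseClassifier, which is the extensibility point listed above.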

@gemini-code-assist
Contributor

Summary of Changes

Hello @Gambitnl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request significantly refactors the classifier architecture by extracting common prompt building and response parsing logic into the BaseClassifier. This strategic move centralizes critical functionality, eliminating code duplication in derived classes like OllamaClassifier and GroqClassifier. The change enhances the system's maintainability, ensures consistent behavior across different LLM integrations, and simplifies the process of adding new classifiers in the future.

Highlights

  • Centralized Prompt Building: The _build_prompt method has been moved from OllamaClassifier and GroqClassifier to the BaseClassifier, ensuring a single, consistent way to construct prompts for LLMs.
  • Standardized Response Parsing: The _parse_response method, responsible for interpreting LLM outputs, is now part of BaseClassifier, handling a consistent response format, error cases, and confidence clamping.
  • Significant Code Reduction: This refactoring eliminates approximately 100 lines of duplicate code across OllamaClassifier and GroqClassifier, adhering to the DRY principle.
  • Improved Maintainability and Extensibility: By centralizing these core methods, future changes to prompt building or response parsing logic only require a single update, and new classifiers can easily inherit this functionality.

@gemini-code-assist Bot left a comment

Code Review

This pull request is a great refactoring that moves duplicated _build_prompt and _parse_response methods into the BaseClassifier, adhering to the DRY principle. This significantly improves maintainability and consistency across the different classifier implementations.

I have one suggestion for the new _parse_response method in BaseClassifier. The current implementation is vulnerable to IndexError if the LLM response is malformed, and it silently ignores some parsing errors. I've proposed a more robust implementation that handles these edge cases gracefully and improves logging, which should make the system more resilient.

Overall, this is a solid improvement to the codebase.

Comment thread src/classifier.py
Comment on lines +106 to +168
    def _parse_response(
        self,
        response: str,
        index: int
    ) -> ClassificationResult:
        """
        Parse LLM response into ClassificationResult.

        This method handles the standard response format used by all
        classifier implementations:
        - Classificatie: IC/OOC/MIXED
        - Reden: reasoning text
        - Vertrouwen: confidence score (0.0-1.0)
        - Personage: character name (or N/A)

        Args:
            response: Raw text response from the LLM
            index: Segment index for the classification result

        Returns:
            ClassificationResult with parsed values
        """
        classification = Classification.IN_CHARACTER
        confidence = ConfidenceDefaults.DEFAULT
        reasoning = "Could not parse response"
        character = None

        lines = response.strip().split('\n')
        for line in lines:
            line = line.strip()
            if line.startswith("Classificatie:"):
                class_text = line.split(":", 1)[1].strip().upper()
                try:
                    classification = Classification(class_text)
                except ValueError:
                    if hasattr(self, 'logger'):
                        self.logger.warning(
                            "Invalid classification '%s' for segment %s, defaulting to IC",
                            class_text,
                            index
                        )
                    classification = Classification.IN_CHARACTER
            elif line.startswith("Reden:"):
                reasoning = line.split(":", 1)[1].strip()
            elif line.startswith("Vertrouwen:"):
                try:
                    conf_text = line.split(":", 1)[1].strip()
                    confidence = float(conf_text)
                    confidence = ConfidenceDefaults.clamp(confidence)
                except ValueError:
                    pass
            elif line.startswith("Personage:"):
                char_text = line.split(":", 1)[1].strip()
                if char_text.upper() != "N/A":
                    character = char_text

        return ClassificationResult(
            segment_index=index,
            classification=classification,
            confidence=confidence,
            reasoning=reasoning,
            character=character
        )
Severity: high

This new shared _parse_response method is a great step towards a DRYer codebase. However, the current implementation can be made more robust against malformed LLM responses.

  1. Potential IndexError: If a line in the response does not contain a :, line.split(":", 1)[1] will raise an IndexError. This is unhandled and would crash the process. For example, a response line of just "Classificatie" or any other malformed line without a colon would cause this.

  2. Silent Error on Confidence Parsing: The try...except ValueError: pass block for parsing the confidence score silently ignores malformed values (e.g., non-numeric text). This is inconsistent with how invalid classification values are handled (which are logged) and can hide issues with the LLM's output format. It would be better to log a warning here for consistency.

  3. Parsing Robustness: The current startswith logic is not robust against extra whitespace around the key (e.g., "Classificatie : value").

I'm suggesting a refactoring of this method to address these points. This version is safer against malformed input and improves logging, without changing the core logic.

    def _parse_response(
        self,
        response: str,
        index: int
    ) -> ClassificationResult:
        """
        Parse LLM response into ClassificationResult.

        This method handles the standard response format used by all
        classifier implementations:
        - Classificatie: IC/OOC/MIXED
        - Reden: reasoning text
        - Vertrouwen: confidence score (0.0-1.0)
        - Personage: character name (or N/A)

        Args:
            response: Raw text response from the LLM
            index: Segment index for the classification result

        Returns:
            ClassificationResult with parsed values
        """
        classification = Classification.IN_CHARACTER
        confidence = ConfidenceDefaults.DEFAULT
        reasoning = "Could not parse response"
        character = None

        lines = response.strip().split('\n')
        for line in lines:
            parts = line.strip().split(":", 1)
            if len(parts) != 2:
                continue

            key, value = parts[0].strip(), parts[1].strip()

            if key == "Classificatie":
                try:
                    classification = Classification(value.upper())
                except ValueError:
                    if hasattr(self, 'logger'):
                        self.logger.warning(
                            "Invalid classification '%s' for segment %s, defaulting to IC",
                            value,
                            index
                        )
                    classification = Classification.IN_CHARACTER
            elif key == "Reden":
                reasoning = value
            elif key == "Vertrouwen":
                try:
                    confidence = float(value)
                    confidence = ConfidenceDefaults.clamp(confidence)
                except ValueError:
                    if hasattr(self, 'logger'):
                        self.logger.warning(
                            "Invalid confidence value '%s' for segment %s, using default",
                            value,
                            index
                        )
            elif key == "Personage":
                if value.upper() != "N/A":
                    character = value

        return ClassificationResult(
            segment_index=index,
            classification=classification,
            confidence=confidence,
            reasoning=reasoning,
            character=character
        )
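Both versions of the method rely on ConfidenceDefaults for the default confidence and for clamping. The real helper lives in the repository's constants module and is not shown in this PR, so the following is a hypothetical sketch of how it could behave; the attribute names and the 0.5 default are assumptions.

```python
class ConfidenceDefaults:
    """Hypothetical sketch of the confidence helper referenced above;
    the actual implementation in the repository may differ."""
    DEFAULT = 0.5  # assumed fallback confidence for unparseable responses
    MIN = 0.0
    MAX = 1.0

    @staticmethod
    def clamp(value: float) -> float:
        # Force a parsed confidence into the [MIN, MAX] range
        return max(ConfidenceDefaults.MIN, min(ConfidenceDefaults.MAX, value))


print(ConfidenceDefaults.clamp(1.7))  # clamped to 1.0
```

Clamping matters because an LLM can emit values like "1.5" or "-0.2" that parse as floats but fall outside the documented 0.0-1.0 range.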

Address code review feedback to make response parsing more resilient:

1. Fix potential IndexError when response lines don't contain ':'
   - Now splits first and checks if we have exactly 2 parts
   - Skips malformed lines with 'continue' instead of crashing

2. Add logging for invalid confidence values
   - Previously silently ignored ValueError, now logs warning
   - Consistent with how invalid classification values are handled

3. Improve parsing precision
   - Use exact key matching (==) instead of startswith()
   - Strips whitespace from both key and value separately
   - More robust against extra whitespace (e.g., "Classificatie : value")

Benefits:
- Prevents crashes on malformed LLM responses
- Better observability through consistent warning logs
- More precise key matching reduces false positives
- Graceful degradation with clear fallback values

Thanks to gemini-code-assist bot for the review feedback!
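The split-and-check pattern from point 1 can be shown in isolation. This is a simplified standalone sketch, not the merged method: it only collects raw key/value pairs, using the PR's field names in the example input.

```python
def parse_fields(response: str) -> dict:
    """Collect 'Key: value' pairs, skipping malformed lines instead of crashing."""
    fields = {}
    for line in response.strip().split('\n'):
        parts = line.strip().split(":", 1)
        if len(parts) != 2:
            continue  # no colon on this line: skip rather than raise IndexError
        key, value = parts[0].strip(), parts[1].strip()
        fields[key] = value
    return fields


# 'Reden' has no colon and is skipped; 'Vertrouwen : 0.9' survives the
# extra whitespace because key and value are stripped separately.
print(parse_fields("Classificatie: IC\nReden\nVertrouwen : 0.9"))
```

With the original startswith() approach, the bare "Reden" line would have fallen through harmlessly, but "Reden:" with no value or a colon-less "Classificatie" line could crash on the `[1]` index; splitting first makes every line safe to inspect.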
@Gambitnl Gambitnl merged commit 0048081 into main Nov 12, 2025
3 checks passed
Gambitnl pushed a commit that referenced this pull request Nov 16, 2025
Fixed issues identified by code review bot:

1. Corrected bug count from 25 to 23 bugs
2. Fixed BUG #12 location (line 120, not 155-161) and improved description
3. Fixed BUG #13 with proper cleanup suggestion (removed useless finally: pass)
4. Removed BUG #20 (invalid - code already implements suggested fix)
5. Removed BUG #25 (invalid - code uses correct pattern with Optional[]=None)
6. Updated summary counts: Medium 7→8, Low 11→8
7. Renumbered remaining bugs after removals
8. Updated total effort estimate: 34-46h → 32-44h

All bugs now verified against actual source code with correct line numbers
and accurate descriptions.
Gambitnl added a commit that referenced this pull request Nov 16, 2025
* docs: Add comprehensive bug report with 25 identified issues

Created docs/KNOWN_ISSUES.md documenting all bugs found during
systematic codebase analysis. Bugs are categorized by severity:

- Critical (3): Security vulnerabilities and data loss risks
- High (4): Crashes and race conditions
- Medium (7): Logic errors and resource leaks
- Low (11): Edge cases and type inconsistencies

Each bug includes:
- File location and line numbers
- Detailed description and impact assessment
- Code examples and suggested fixes
- Priority classification

Also includes fix priority roadmap with estimated effort (34-46 hours
total) and testing recommendations for each category.

* fix: Correct inaccuracies in KNOWN_ISSUES.md based on code review

Fixed issues identified by code review bot:

1. Corrected bug count from 25 to 23 bugs
2. Fixed BUG #12 location (line 120, not 155-161) and improved description
3. Fixed BUG #13 with proper cleanup suggestion (removed useless finally: pass)
4. Removed BUG #20 (invalid - code already implements suggested fix)
5. Removed BUG #25 (invalid - code uses correct pattern with Optional[]=None)
6. Updated summary counts: Medium 7→8, Low 11→8
7. Renumbered remaining bugs after removals
8. Updated total effort estimate: 34-46h → 32-44h

All bugs now verified against actual source code with correct line numbers
and accurate descriptions.

---------

Co-authored-by: Claude <noreply@anthropic.com>