Conversation
Enhanced the `CampaignArtifactCounter` class to align with the refactoring documentation specifications. The core extraction was already complete; additional features were added for better usability.

Changes to `src/artifact_counter.py`:
- Enhanced the `ArtifactCounts` dataclass with `session_ids` and `narrative_paths` lists
- Added `session_count` and `narrative_count` properties for backward compatibility
- Added a `total_artifacts` computed property
- Enhanced `to_dict()` to include the new fields
- Updated `_count_session_artifacts` to track session IDs and narrative paths
- Added `count_sessions()` and `count_narratives()` convenience methods
- Added `get_all_campaigns()` to list all campaigns with artifacts
- Added `get_campaign_summary()` for detailed campaign information

Changes to `tests/test_artifact_counter.py`:
- Added tests for the new properties (`session_count`, `narrative_count`, `total_artifacts`)
- Added tests verifying that `session_ids` and `narrative_paths` are tracked correctly
- Added tests for the `count_sessions()` and `count_narratives()` convenience methods
- Added tests for `get_all_campaigns()` and `get_campaign_summary()`

All changes maintain backward compatibility with existing code. The original implementation already had proper error handling, caching, and testability as required by the refactoring task.

Related: docs/refactoring/08-campaign-artifact-counting.md
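Based only on the commit message above, the enhanced dataclass might look roughly like this; the exact field names, defaults, and `to_dict()` layout are assumptions rather than the repository's actual code:

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional


@dataclass
class ArtifactCounts:
    """Sketch of the enhanced counts dataclass (field names assumed)."""

    sessions: int = 0
    narratives: int = 0
    session_ids: List[str] = field(default_factory=list)
    narrative_paths: List[Path] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)
    last_updated: Optional[datetime] = None

    @property
    def session_count(self) -> int:
        # Backward-compatible alias for the original field name
        return self.sessions

    @property
    def narrative_count(self) -> int:
        return self.narratives

    @property
    def total_artifacts(self) -> int:
        # Computed property: combined count across artifact types
        return self.sessions + self.narratives

    def to_dict(self) -> Dict[str, Any]:
        # Serialization including the new fields
        return {
            "sessions": self.sessions,
            "narratives": self.narratives,
            "total_artifacts": self.total_artifacts,
            "session_ids": self.session_ids,
            "narrative_paths": [str(p) for p in self.narrative_paths],
            "error_count": len(self.errors),
            "last_updated": self.last_updated.isoformat() if self.last_updated else None,
        }
```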
Summary of Changes
Hello @Gambitnl, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors and significantly enhances the campaign artifact counting logic.
Code Review
This pull request successfully refactors the artifact counting logic into a dedicated CampaignArtifactCounter class, which significantly improves modularity, testability, and maintainability. The addition of new features like tracking session IDs and narrative paths, along with convenience methods, enhances the utility of the class. The tests are comprehensive and cover the new functionality well. I've included a few suggestions to further improve code quality by reducing duplication and addressing a potential performance concern.
```python
def get_all_campaigns(self) -> List[str]:
    """
    Get list of all campaigns that have artifacts.

    Returns:
        Sorted list of campaign IDs found in the output directory
    """
    campaigns = set()

    if not self.output_dir.exists():
        self.logger.warning(f"Output directory not found: {self.output_dir}")
        return []

    try:
        data_files = list(self.output_dir.glob("**/*_data.json"))
    except Exception as e:
        self.logger.error(f"Failed to glob data files: {e}")
        return []

    for data_path in data_files:
        try:
            payload = json.loads(data_path.read_text(encoding="utf-8"))
            metadata = payload.get("metadata") or {}
            campaign_id = metadata.get("campaign_id")
            if campaign_id:
                campaigns.add(campaign_id)
        except Exception as e:
            self.logger.debug(f"Skipping {data_path.name}: {e}")
            continue

    return sorted(campaigns)
```
This method performs a file system scan (glob) every time it is called. As noted in the refactoring design document, this can be inefficient for directories with a large number of sessions. While the main count_artifacts method is cached, this discovery method is not. For applications that call this frequently (e.g., to populate a UI dropdown), this could become a performance bottleneck.
Consider adding a caching mechanism to this method, similar to the one used in count_artifacts. A short-lived cache would improve performance for repeated calls without returning overly stale data. You could implement this using a separate cache dictionary and a lock to ensure thread safety.
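The suggested short-lived cache could be sketched as follows. This is a minimal illustration of the double-checked locking pattern, not the repository's actual implementation; `CampaignListCache`, the TTL default, and the use of `time.monotonic()` are all assumptions:

```python
import threading
import time
from typing import List, Optional, Tuple


class CampaignListCache:
    """Illustrative TTL cache for an expensive campaign-discovery scan."""

    def __init__(self, ttl_seconds: float = 30.0):
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        # (timestamp, campaigns) pair, or None when nothing is cached
        self._entry: Optional[Tuple[float, List[str]]] = None

    def get(self) -> Optional[List[str]]:
        # First check without taking the lock (fast path)
        entry = self._entry
        if entry is not None and time.monotonic() - entry[0] < self._ttl:
            return list(entry[1])
        # Second check under the lock, in case another thread refreshed
        with self._lock:
            entry = self._entry
            if entry is not None and time.monotonic() - entry[0] < self._ttl:
                return list(entry[1])
        # Expired or empty: the caller should rescan and call set()
        return None

    def set(self, campaigns: List[str]) -> None:
        with self._lock:
            self._entry = (time.monotonic(), list(campaigns))
```

`get_all_campaigns()` would then consult `get()` before globbing and call `set()` after a fresh scan, so repeated calls within the TTL window skip the filesystem entirely.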
```python
return {
    "campaign_id": campaign_id,
    "session_count": counts.sessions,
    "narrative_count": counts.narratives,
    "total_artifacts": counts.total_artifacts,
    "session_ids": counts.session_ids,
    "narrative_paths": [str(p) for p in counts.narrative_paths],
    "error_count": len(counts.errors),
    "errors": counts.errors,
    "last_updated": counts.last_updated.isoformat() if counts.last_updated else None
}
```
The dictionary being constructed here has significant overlap with the output of ArtifactCounts.to_dict(). This creates code duplication, as much of the logic for serialization is repeated. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, this method should leverage counts.to_dict() and then augment the result.
By centralizing the dictionary creation logic within to_dict(), any future changes to the ArtifactCounts data structure will only need to be updated in one place.
```python
summary = counts.to_dict()
summary["campaign_id"] = campaign_id
summary["errors"] = counts.errors
summary["session_count"] = summary.pop("sessions")
summary["narrative_count"] = summary.pop("narratives")
return summary
```
```python
assert len(counts.narrative_paths) == 3
# Check that all paths are Path objects
from pathlib import Path
```
Updated documentation to reflect completion of the campaign artifact counter extraction and enhancement.

Changes to `docs/refactoring/08-campaign-artifact-counting.md`:
- Added a completion status banner at the top
- Added a comprehensive Implementation Notes section documenting:
  - What was already complete (core extraction)
  - Enhancements added (convenience methods, query methods, tracking)
  - Usage examples for all new features
  - Actual vs. estimated effort (2 hours vs. 9-13 hours)
  - Success criteria validation
  - Benefits realized
  - Integration details
  - Files modified
  - Lessons learned

Changes to `docs/refactoring/README.md`:
- Marked task #8 as completed with a checkmark
- Added completion date and status
- Added completion notes summarizing the enhancements
- Updated summary statistics to show 1 of 10 completed
- Updated the "Last Updated" date to 2025-11-14

The documentation now provides:
- Clear completion status for tracking progress
- Detailed implementation notes for future reference
- Usage examples for developers
- Lessons learned for future refactorings
Implemented three improvements based on the Gemini code review:

1. Performance: added caching to `get_all_campaigns()`
   - Previously performed a filesystem scan on every call
   - Now uses a double-checked locking pattern with a TTL-based cache
   - Added a `use_cache` parameter (default: `True`) for cache control
   - Prevents performance bottlenecks when called frequently (e.g., UI dropdowns)

2. Code duplication: refactored `get_campaign_summary()` to use `to_dict()`
   - Eliminated duplicate dictionary construction logic
   - Now leverages `ArtifactCounts.to_dict()` as the base
   - Augments the result with `campaign_id` and `errors` fields
   - Renames keys for API consistency (`sessions` -> `session_count`, etc.)
   - Follows the DRY principle for easier maintenance

3. Code quality: removed a redundant import in the tests
   - Removed the inline `from pathlib import Path` in `test_narrative_paths_tracked`
   - `Path` is already imported at module level (line 6)
   - Improves readability per PEP 8 guidelines

Additional improvements:
- Updated `clear_cache()` to also clear the campaigns list cache
- Enhanced `get_cache_stats()` to include `campaigns_list_cached` status
- Added comprehensive tests for the new caching behavior:
  - `test_get_all_campaigns_caching`: verifies the cache works correctly
  - `test_clear_cache_clears_campaigns_list`: verifies cache clearing
  - `test_get_campaign_summary_uses_to_dict`: verifies the refactored logic

All changes tested and verified with manual integration tests.

Related review: gemini-code-assist bot feedback
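The cache-control behavior described in item 1 can be illustrated with a minimal stand-in counter. `ScanCounter` below is hypothetical; only the `use_cache` flag and `clear_cache()` method are taken from the commit message, and the counting of scans simulates the filesystem glob:

```python
class ScanCounter:
    """Hypothetical stand-in demonstrating use_cache / clear_cache semantics."""

    def __init__(self):
        self.scans = 0       # how many simulated filesystem scans occurred
        self._cached = None  # cached campaign list, or None

    def get_all_campaigns(self, use_cache: bool = True):
        if use_cache and self._cached is not None:
            return self._cached          # served from cache, no scan
        self.scans += 1                  # simulate the expensive glob
        self._cached = ["alpha", "beta"]
        return self._cached

    def clear_cache(self):
        self._cached = None


counter = ScanCounter()
counter.get_all_campaigns()
counter.get_all_campaigns()                 # second call hits the cache
assert counter.scans == 1
counter.get_all_campaigns(use_cache=False)  # explicit bypass forces a rescan
assert counter.scans == 2
counter.clear_cache()
counter.get_all_campaigns()                 # cleared cache also forces a rescan
assert counter.scans == 3
```

The real implementation would layer the TTL and locking on top of this, but the caller-visible contract (cached by default, bypass with `use_cache=False`, invalidate with `clear_cache()`) is the same.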
Resolved merge conflict by acknowledging that both refactorings #6 and #8 have been completed:
- Task #6 (Diarizer Complex Method): completed on main branch
- Task #8 (Campaign Artifact Counting): completed on this branch

Changes:
- Marked task #6 as completed with status and completion notes
- Updated Summary Statistics: 2 of 10 completed (was 1 of 10)
- Added actual effort tracking: ~4 hours total
- Updated the status line to list both completed tasks
- Added "Next Priority" guidance for Phase 1 tasks

This resolves the upcoming merge conflict when this branch is merged to main or when a PR is created.