Skip to content

Task3: Refactor artifact counting into dedicated class#33

Merged
Gambitnl merged 5 commits intomainfrom
claude/extract-campaign-artifact-counter-01SowpfACvL8iZudn6FP6Ews
Nov 14, 2025
Merged

Task3: Refactor artifact counting into dedicated class#33
Gambitnl merged 5 commits intomainfrom
claude/extract-campaign-artifact-counter-01SowpfACvL8iZudn6FP6Ews

Conversation

@Gambitnl
Copy link
Copy Markdown
Owner

Enhanced the CampaignArtifactCounter class to align with refactoring documentation specifications. The core extraction was already complete, but added additional useful features for better usability.

Changes to src/artifact_counter.py:

  • Enhanced ArtifactCounts dataclass with session_ids and narrative_paths lists
  • Added session_count and narrative_count properties for backward compatibility
  • Added total_artifacts computed property
  • Enhanced to_dict() to include new fields
  • Updated _count_session_artifacts to track session IDs and narrative paths
  • Added count_sessions() convenience method
  • Added count_narratives() convenience method
  • Added get_all_campaigns() to list all campaigns with artifacts
  • Added get_campaign_summary() for detailed campaign information

Changes to tests/test_artifact_counter.py:

  • Added tests for new properties (session_count, narrative_count, total_artifacts)
  • Added test for session_ids and narrative_paths tracking
  • Added tests for count_sessions() convenience method
  • Added tests for count_narratives() convenience method
  • Added tests for get_all_campaigns() method
  • Added tests for get_campaign_summary() method
  • Added tests to verify session IDs and narrative paths are tracked correctly

All changes maintain backward compatibility with existing code. The original implementation already had proper error handling, caching, and testability as required by the refactoring task.

Related: docs/refactoring/08-campaign-artifact-counting.md

Enhanced the CampaignArtifactCounter class to align with refactoring
documentation specifications. The core extraction was already complete,
but added additional useful features for better usability.

Changes to src/artifact_counter.py:
- Enhanced ArtifactCounts dataclass with session_ids and narrative_paths lists
- Added session_count and narrative_count properties for backward compatibility
- Added total_artifacts computed property
- Enhanced to_dict() to include new fields
- Updated _count_session_artifacts to track session IDs and narrative paths
- Added count_sessions() convenience method
- Added count_narratives() convenience method
- Added get_all_campaigns() to list all campaigns with artifacts
- Added get_campaign_summary() for detailed campaign information

Changes to tests/test_artifact_counter.py:
- Added tests for new properties (session_count, narrative_count, total_artifacts)
- Added test for session_ids and narrative_paths tracking
- Added tests for count_sessions() convenience method
- Added tests for count_narratives() convenience method
- Added tests for get_all_campaigns() method
- Added tests for get_campaign_summary() method
- Added tests to verify session IDs and narrative paths are tracked correctly

All changes maintain backward compatibility with existing code.
The original implementation already had proper error handling,
caching, and testability as required by the refactoring task.

Related: docs/refactoring/08-campaign-artifact-counting.md
@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello @Gambitnl, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors and significantly enhances the CampaignArtifactCounter class, aligning it with specified refactoring documentation. The core changes involve enriching the ArtifactCounts dataclass with detailed tracking for individual session IDs and narrative paths, introducing convenience properties and methods for easier access to artifact counts, and providing new functionalities to list all campaigns and retrieve comprehensive summaries. These updates improve the usability and data granularity of the artifact counting mechanism while maintaining full backward compatibility.

Highlights

  • Enhanced ArtifactCounts Dataclass: The ArtifactCounts dataclass now includes session_ids (list of strings) and narrative_paths (list of Path objects) for more granular tracking of individual artifacts.
  • New Convenience Properties: Added session_count, narrative_count (for backward compatibility), and total_artifacts properties to the ArtifactCounts dataclass, providing easier access to artifact counts.
  • Extended Serialization: The to_dict() method in ArtifactCounts has been updated to include the newly added session_ids, narrative_paths, and total_artifacts fields in its serialized output.
  • Improved Artifact Tracking Logic: The internal _count_session_artifacts function now actively tracks and populates individual session IDs and narrative file paths during the artifact counting process.
  • New Public Methods for CampaignArtifactCounter: Introduced count_sessions(), count_narratives() for specific counts, get_all_campaigns() to list all campaigns with artifacts, and get_campaign_summary() for a comprehensive overview of a campaign's artifacts.
  • Comprehensive Test Coverage: New tests have been added to test_artifact_counter.py to validate all new properties and methods, ensuring the correctness and backward compatibility of the enhanced functionality.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully refactors the artifact counting logic into a dedicated CampaignArtifactCounter class, which significantly improves modularity, testability, and maintainability. The addition of new features like tracking session IDs and narrative paths, along with convenience methods, enhances the utility of the class. The tests are comprehensive and cover the new functionality well. I've included a few suggestions to further improve code quality by reducing duplication and addressing a potential performance concern.

Comment thread src/artifact_counter.py Outdated
Comment on lines +320 to +350
def get_all_campaigns(self) -> List[str]:
"""
Get list of all campaigns that have artifacts.

Returns:
Sorted list of campaign IDs found in the output directory
"""
campaigns = set()

if not self.output_dir.exists():
self.logger.warning(f"Output directory not found: {self.output_dir}")
return []

try:
data_files = list(self.output_dir.glob("**/*_data.json"))
except Exception as e:
self.logger.error(f"Failed to glob data files: {e}")
return []

for data_path in data_files:
try:
payload = json.loads(data_path.read_text(encoding="utf-8"))
metadata = payload.get("metadata") or {}
campaign_id = metadata.get("campaign_id")
if campaign_id:
campaigns.add(campaign_id)
except Exception as e:
self.logger.debug(f"Skipping {data_path.name}: {e}")
continue

return sorted(campaigns)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This method performs a file system scan (glob) every time it is called. As noted in the refactoring design document, this can be inefficient for directories with a large number of sessions. While the main count_artifacts method is cached, this discovery method is not. For applications that call this frequently (e.g., to populate a UI dropdown), this could become a performance bottleneck.

Consider adding a caching mechanism to this method, similar to the one used in count_artifacts. A short-lived cache would improve performance for repeated calls without returning overly stale data. You could implement this using a separate cache dictionary and a lock to ensure thread safety.

Comment thread src/artifact_counter.py Outdated
Comment on lines +365 to +375
return {
"campaign_id": campaign_id,
"session_count": counts.sessions,
"narrative_count": counts.narratives,
"total_artifacts": counts.total_artifacts,
"session_ids": counts.session_ids,
"narrative_paths": [str(p) for p in counts.narrative_paths],
"error_count": len(counts.errors),
"errors": counts.errors,
"last_updated": counts.last_updated.isoformat() if counts.last_updated else None
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The dictionary being constructed here has significant overlap with the output of ArtifactCounts.to_dict(). This creates code duplication, as much of the logic for serialization is repeated. To improve maintainability and adhere to the DRY (Don't Repeat Yourself) principle, this method should leverage counts.to_dict() and then augment the result.

By centralizing the dictionary creation logic within to_dict(), any future changes to the ArtifactCounts data structure will only need to be updated in one place.

        summary = counts.to_dict()
        summary["campaign_id"] = campaign_id
        summary["errors"] = counts.errors
        summary["session_count"] = summary.pop("sessions")
        summary["narrative_count"] = summary.pop("narratives")
        return summary

Comment thread tests/test_artifact_counter.py Outdated

assert len(counts.narrative_paths) == 3
# Check that all paths are Path objects
from pathlib import Path
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This from pathlib import Path statement is redundant. Path is already imported at the top of the file (line 6). According to PEP 8, imports should be at the top of the file to improve readability and avoid unnecessary re-imports. This line can be safely removed.

claude and others added 4 commits November 14, 2025 14:50
Updated documentation to reflect completion of campaign artifact
counter extraction and enhancement:

Changes to docs/refactoring/08-campaign-artifact-counting.md:
- Added completion status banner at the top
- Added comprehensive Implementation Notes section documenting:
  - What was already complete (core extraction)
  - Enhancements added (convenience methods, query methods, tracking)
  - Usage examples for all new features
  - Actual vs estimated effort (2 hours vs 9-13 hours)
  - Success criteria validation
  - Benefits realized
  - Integration details
  - Files modified
  - Lessons learned

Changes to docs/refactoring/README.md:
- Marked task #8 as completed with checkmark
- Added completion date and status
- Added completion notes summarizing enhancements
- Updated summary statistics to show 1 of 10 completed
- Updated "Last Updated" date to 2025-11-14

The documentation now provides:
- Clear completion status for tracking progress
- Detailed implementation notes for future reference
- Usage examples for developers
- Lessons learned for future refactorings
Implemented three improvements based on Gemini code review:

1. Performance: Added caching to get_all_campaigns()
   - Previously performed filesystem scan on every call
   - Now uses double-checked locking pattern with TTL-based cache
   - Added use_cache parameter (default: True) for cache control
   - Prevents performance bottlenecks when called frequently (e.g., UI dropdowns)

2. Code duplication: Refactored get_campaign_summary() to use to_dict()
   - Eliminated duplicate dictionary construction logic
   - Now leverages ArtifactCounts.to_dict() as base
   - Augments with campaign_id and errors fields
   - Renames keys for API consistency (sessions -> session_count, etc.)
   - Follows DRY principle for easier maintenance

3. Code quality: Removed redundant import in tests
   - Removed inline 'from pathlib import Path' in test_narrative_paths_tracked
   - Path already imported at module level (line 6)
   - Improves readability per PEP 8 guidelines

Additional improvements:
- Updated clear_cache() to also clear campaigns list cache
- Enhanced get_cache_stats() to include campaigns_list_cached status
- Added comprehensive tests for new caching behavior:
  - test_get_all_campaigns_caching: Verifies cache works correctly
  - test_clear_cache_clears_campaigns_list: Verifies cache clearing
  - test_get_campaign_summary_uses_to_dict: Verifies refactored logic

All changes tested and verified with manual integration tests.

Related review: gemini-code-assist bot feedback
Resolved merge conflict by acknowledging that both refactorings #6
and #8 have been completed:

- Task #6 (Diarizer Complex Method) - completed on main branch
- Task #8 (Campaign Artifact Counting) - completed on this branch

Changes:
- Marked task #6 as completed with status and completion notes
- Updated Summary Statistics: 2 of 10 completed (was 1 of 10)
- Added actual effort tracking: ~4 hours total
- Updated status line to list both completed tasks
- Added "Next Priority" guidance for Phase 1 tasks

This resolves the upcoming merge conflict when this branch is
merged to main or when a PR is created.
@Gambitnl Gambitnl merged commit 3390d70 into main Nov 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants