Conversation

@miroslavpojer (Collaborator) commented Sep 30, 2025

Release Notes:

  • Replaced one-by-one logic for getting sub-issues with bulk GraphQL queries.

Summary by CodeRabbit

  • New Features

    • Hierarchy mode now auto-discovers and includes sub-issues (including across repos), nesting them under parents for more complete changelogs.
    • Added an example integration script to scan sub-issues (requires a GITHUB_TOKEN).
  • Refactor

    • Internal collections moved to stable mappings for issues/PRs/commits, improving deduplication and accuracy.
    • Expanded missing-sub-issue discovery and fetching to produce more complete outputs.
  • Tests

    • Test suite updated for hierarchy behavior and new data shapes.
  • Style

    • Linting thresholds relaxed for complex classes/blocks.

Fixed #176

@miroslavpojer self-assigned this Sep 30, 2025

coderabbitai bot commented Sep 30, 2025

Walkthrough

Converts internal collections to dict mappings keyed by items, adds a bulk GraphQL sub-issue collector and mining expansion to fetch missing sub-issues, introduces stable ID utilities, updates record factories to use explicit IDs, integrates hierarchy expansion into the generator, updates tests to the new shapes, and tweaks lint thresholds.
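For orientation, below is a minimal sketch of the kind of batched request cycle the new collector performs. The alias names, the helper function, and the exact query text are illustrative assumptions, not the code shipped in this PR; only the subIssues/pageInfo shape and the in-band error check follow the walkthrough and diffs below.

import requests

# Hedged sketch: one batched GraphQL payload with per-repo/per-issue aliases,
# requesting subIssues nodes and pageInfo as the walkthrough describes.
GQL = """
query {
  r0: repository(owner: "org", name: "repo") {
    i42: issue(number: 42) {
      subIssues(first: 100) {
        nodes { number repository { nameWithOwner } }
        pageInfo { hasNextPage endCursor }
      }
    }
  }
}
"""

def post_query(token: str) -> dict:
    resp = requests.post(
        "https://api.github.com/graphql",
        json={"query": GQL},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    if data.get("errors"):  # GraphQL errors arrive in-band with HTTP 200
        raise RuntimeError(f"GitHub GraphQL errors: {data['errors']}")
    return data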

Changes

  • Lint config (./.pylintrc)
    Increased design thresholds: max-attributes 7→10; max-nested-blocks 5→6.
  • Data model (release_notes_generator/model/mined_data.py)
    Added release and parents_sub_issues; switched issues, pull_requests, commits from lists to dict[item, Repository]; updated emptiness checks and logging. (A shape sketch follows this list.)
  • Filter logic (release_notes_generator/data/filter.py)
    Reworked to produce/consume dict mappings (Issue/PullRequest/Commit → Repository).
  • Miner & sub-issue expansion (release_notes_generator/data/miner.py)
    Added mine_data, mine_missing_sub_issues, _scan_sub_issues_for_parents, _fetch_missing_issues_and_prs, _get_repository/_fetch_repository; moved issues into dict mappings and updated dedupe/filtering to use dicts and PR lists.
  • Bulk sub-issue collector (release_notes_generator/data/utils/bulk_sub_issue_collector.py, integration_test.py)
    New CollectorConfig and BulkSubIssueCollector implementing batched GraphQL queries, pagination, retries/backoff, and parent→sub-issue mapping; the integration test demonstrates iterative scanning and token handling.
  • ID utilities (release_notes_generator/utils/record_utils.py)
    New get_id, parse_issue_id, format_issue_id, memoized _entity_id, and IssueIdParseError for stable ID handling and parsing.
  • Generator integration (release_notes_generator/generator.py)
    Updated imports; when hierarchy is enabled, expands filtered data with miner.mine_missing_sub_issues before record creation.
  • Record factories (release_notes_generator/record/factory/*.py)
    Factories updated to accept/propagate explicit IDs via get_id, removed internal ID caching and get_repository helpers, updated signatures for registering commits/PRs/issues, and simplified hierarchy handling using parents_sub_issues.
  • Tests migration (tests/conftest.py, tests/**/test_*.py)
    Tests updated to use dict-based mined data keyed by objects (via get_id), adjusted import/patch paths to release_notes_generator.data.*, and updated expectations for hierarchy and logging.
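The data-model entry above implies the following shape. This is a minimal sketch under stated assumptions: Any stands in for PyGithub's Issue/PullRequest/Commit/Repository types, and the real MinedData class carries more behavior (e.g. add_repository/get_repository).

from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical sketch of the dict-keyed MinedData shape described above.
@dataclass
class MinedDataSketch:
    home_repository: Any = None
    release: Optional[Any] = None
    issues: dict[Any, Any] = field(default_factory=dict)         # Issue -> Repository
    pull_requests: dict[Any, Any] = field(default_factory=dict)  # PullRequest -> Repository
    commits: dict[Any, Any] = field(default_factory=dict)        # Commit -> Repository
    parents_sub_issues: dict[str, list[str]] = field(default_factory=dict)  # "org/repo#n" -> sub-issue ids

    def is_empty(self) -> bool:
        # Dict truthiness mirrors the updated emptiness check.
        return not (self.issues or self.pull_requests or self.commits)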

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor CI as CI/Runner
  participant Gen as Generator
  participant Miner as DataMiner
  participant Coll as BulkSubIssueCollector
  participant GH as GitHub API

  CI->>Gen: generate(inputs)
  Gen->>Miner: mine_data()
  Miner->>GH: fetch repo, release, issues, PRs, commits
  Miner-->>Gen: MinedData (dict mappings)

  alt hierarchy enabled
    Gen->>Miner: mine_missing_sub_issues(data)
    Miner->>Coll: scan_sub_issues_for_parents(parents)
    loop until no new parents
      Coll->>GH: POST batched GraphQL queries (per-repo aliases)
      GH-->>Coll: aliased blocks with subIssues nodes & pageInfo
      Coll-->>Miner: updated parents_sub_issues, new parents list
    end
    Miner->>GH: fetch missing issues/PRs for discovered parents
    Miner-->>Gen: augmented MinedData (parents_sub_issues populated)
  end

  Gen->>Gen: build records (factories using explicit IDs)
  Gen-->>CI: release notes output
sequenceDiagram
  autonumber
  participant Coll as BulkSubIssueCollector
  participant GH as GitHub GraphQL

  Coll->>Coll: group parents by repo, prepare aliases & cursors
  loop per request cycle
    Coll->>GH: POST batched GraphQL payload
    GH-->>Coll: aliased repo blocks with nodes & pageInfo
    Coll->>Coll: parse alias nodes, append child issue ids, mark new parents
    Coll->>Coll: backoff/retry on errors, sleep(pacing)
  end
  Coll-->>Caller: sorted unique list of new parent IDs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Suggested labels

enhancement

Suggested reviewers

  • Zejnilovic
  • benedeki
  • lsulak

Poem

I hop through graphs in batches, light and spry,
Finding children where their parent-links lie.
With tidy IDs and mappings neat in store,
I stitch hierarchies and fetch a little more.
🥕 Hooray — release notes bloom, tidy and pure.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Out of Scope Changes Check (⚠️ Warning): This pull request includes extensive refactoring unrelated to the bulk sub-issue feature—such as updating the Pylint configuration, converting numerous list-based collections to dicts in filter and model modules, overhauling record factory signatures, adding new record utilities, and modifying multiple test suites—which goes beyond the scope of replacing one-by-one sub-issue logic. Resolution: extract or defer unrelated refactors into separate pull requests and restrict this PR to introducing and integrating the bulk GraphQL sub-issue retrieval logic.
  • Description Check (⚠️ Warning): The pull request description does not follow the repository's required template: it omits the "## Checklist" section with its checklist items and the "## Additional Context" section, and it uses "Fixed #176" instead of the template-specified "Fixes #176" syntax for issue linkage. Resolution: add the "## Checklist" header with the mandatory checkbox tasks, an "## Additional Context" section if needed, and use "Fixes #176" (not "Fixed") to correctly link the issue.
✅ Passed checks (3 passed)
  • Title Check (✅ Passed): The title accurately conveys the primary change of the pull request—introducing bulk logic for fetching sub-issues—and directly maps to the objective of replacing one-by-one sub-issue retrieval with a batch GraphQL solution.
  • Linked Issues Check (✅ Passed): The implementation adds the BulkSubIssueCollector with GraphQL batching, integrates it into the DataMiner via mine_missing_sub_issues, and updates the generator to invoke this bulk workflow when hierarchy support is enabled, thereby replacing the one-by-one sub-issue API calls as specified in issue #176.
  • Docstring Coverage (✅ Passed): No functions found in the changes; the docstring coverage check was skipped.
✨ Finishing touches
  • 📝 Generate Docstrings

🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/176-Introduce-bulk-logic-for-getting-sub-issues


coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
release_notes_generator/record/factory/issue_hierarchy_record_factory.py (1)

206-218: Fix undefined variable reference.

Line 211 references id which is not defined in this scope. This will raise a NameError at runtime. The correct variable name is iid.

Apply this diff:

         skip_record = any(item in issue_labels for item in ActionInputs.get_skip_release_notes_labels())
-        logger.debug("Created record for sub issue %s: %s", id, issue.title)
+        logger.debug("Created record for sub issue %s: %s", iid, issue.title)
         self.__registered_issues.add(iid)
🧹 Nitpick comments (13)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (1)

196-223: Improve error handling patterns.

The retry logic is functional but has minor style issues flagged by static analysis:

  1. Line 210: Use data.get("errors") instead of checking key existence first (RUF019)
  2. Line 212: The long error message string should be defined as a constant or exception class (TRY003)
  3. Line 214: The return statement after successful request could be in an else block for clarity (TRY300)

These are not critical but would improve code quality.

Apply these improvements:

                 resp.raise_for_status()
                 data = resp.json()
-                if "errors" in data and data["errors"]:
+                if data.get("errors"):
                     logger.error("GraphQL errors: %s", data["errors"])
                     raise RuntimeError(f"GitHub GraphQL errors: {data['errors']}")
-                logger.debug("Posted graphql query")
-                return data
+                else:
+                    logger.debug("Posted graphql query")
+                    return data
             except Exception as e:
                 logger.exception("GraphQL query failed")
                 last_exc = e
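
Put together, the loop with these adjustments applied would look roughly like the sketch below. cfg.max_retries and cfg.backoff_seconds are assumed CollectorConfig field names; this is one reasonable arrangement, not the reviewed code verbatim.

import time

def post_with_retries(session, url, payload, cfg):
    # Sketch only: retry a batched GraphQL POST with linear backoff.
    last_exc = None
    for attempt in range(cfg.max_retries):
        try:
            resp = session.post(url, json=payload, timeout=30)
            resp.raise_for_status()
            data = resp.json()
            if data.get("errors"):  # no key-existence check needed (RUF019)
                raise RuntimeError(f"GitHub GraphQL errors: {data['errors']}")
            return data  # success path exits immediately
        except Exception as e:
            last_exc = e
            time.sleep(cfg.backoff_seconds * (attempt + 1))
    raise RuntimeError("GraphQL query failed after retries") from last_exc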
tests/release_notes/data/test_filter.py (1)

85-87: Consider using next(iter(...)) for accessing first dictionary item.

The current pattern list(filtered_data.issues.keys())[0] works but is less idiomatic than next(iter(filtered_data.issues.keys())).

Apply this diff:

-    assert list(filtered_data.issues.keys())[0].closed_at == datetime(2023, 1, 2)
-    assert list(filtered_data.pull_requests.keys())[0].merged_at == datetime(2023, 2, 3)
-    assert list(filtered_data.commits.keys())[0].commit.author.date == datetime(2024, 1, 4)
+    assert next(iter(filtered_data.issues.keys())).closed_at == datetime(2023, 1, 2)
+    assert next(iter(filtered_data.pull_requests.keys())).merged_at == datetime(2023, 2, 3)
+    assert next(iter(filtered_data.commits.keys())).commit.author.date == datetime(2024, 1, 4)
integration_test.py (1)

12-14: Optional: Consider defining a custom exception class.

The static analysis tool suggests avoiding long messages in generic exception classes. For a demo script this is fine, but if this evolves into production code, consider a custom exception.

tests/conftest.py (3)

628-640: Consider removing commented code.

The fixture now correctly uses data.parents_sub_issues to define parent-child relationships instead of the old get_sub_issues approach. However, the commented-out line 637 (# hi_two_sub_issues_no_prs.get_sub_issues.return_value = [sub_issue_1, sub_issue_2]) could be removed if it's no longer needed for reference.

If the commented code is intended as documentation for the migration, consider adding a brief comment explaining the old vs. new approach. Otherwise, remove it to reduce clutter:

-    # hi_two_sub_issues_no_prs.get_sub_issues.return_value = [sub_issue_1, sub_issue_2]
     data.parents_sub_issues[si1 := get_id(sub_issue_1, mock_repo)] = []
     data.parents_sub_issues[si2 := get_id(sub_issue_2, mock_repo)] = []
     data.parents_sub_issues[get_id(hi_two_sub_issues_no_prs, mock_repo)] = [si1, si2]

661-664: Multiple commented lines suggest incomplete cleanup.

Similar to the earlier comment, lines 661, 688, 719-720 contain commented-out get_sub_issues calls. For maintainability, consider either:

  1. Removing these comments if they're no longer needed
  2. Adding a single explanatory comment at the top of the fixture explaining the migration from get_sub_issues to parents_sub_issues

Also applies to: 688-691, 719-724


770-792: Consider extracting issue keys earlier for readability.

The pattern iks = list(data.issues.keys()) followed by indexed access (iks[0], iks[1], etc.) works correctly but is somewhat verbose. This is acceptable for test fixtures, but consider whether named variables for specific issues would improve readability in complex scenarios.

Example refactor (optional):

iks = list(data.issues.keys())
solo_closed_issue = iks[0]
hi_301 = iks[1]
hi_302 = iks[2]
# ... etc

solo_closed_issue.get_labels.return_value = [l_enh]
hi_301.get_labels.return_value = [l_epic]

However, the current approach is fine for test fixtures where index-based access maintains alignment with the construction order.

release_notes_generator/utils/record_utils.py (2)

48-57: Consider merging duplicate helper functions.

Both _issue_id and _pr_id have identical implementations. Since issues and PRs use the same ID format (org/repo#number), consider merging them into a single memoized helper function.

Apply this diff to merge the duplicate functions:

 @lru_cache(maxsize=2048)
-def _issue_id(repo_full_name: str, number: int) -> str:
-    """Format 'org/repo#123' from components."""
+def _entity_id(repo_full_name: str, number: int) -> str:
+    """Format 'org/repo#123' from components (for Issue or PullRequest)."""
     return f"{repo_full_name}#{number}"
-
-
-@lru_cache(maxsize=2048)
-def _pr_id(repo_full_name: str, number: int) -> str:
-    """Format 'org/repo#123' from components."""
-    return f"{repo_full_name}#{number}"

Then update the callers:

     if isinstance(obj, Issue):
         issue = cast(Issue, obj)
-        return _issue_id(repository.full_name, issue.number)
+        return _entity_id(repository.full_name, issue.number)
     elif isinstance(obj, PullRequest):
         pr = cast(PullRequest, obj)
-        return _pr_id(repository.full_name, pr.number)
+        return _entity_id(repository.full_name, pr.number)

60-68: Consider the static analysis hint about exception messages.

The static analysis tool (Ruff TRY003) suggests avoiding long messages outside the exception class. While the current implementation is acceptable, you could consider defining the message format in the exception class for better maintainability.

Optional refactor to address the static analysis hint:

 class IssueIdParseError(ValueError):
     """Raised when an issue ID cannot be parsed."""

-    pass
+    def __init__(self, issue_id: str):
+        super().__init__(f"Invalid issue id '{issue_id}', expected 'org/repo#number'")
+        self.issue_id = issue_id

Then update the caller:

     m = ISSUE_ID_RE.match(issue_id.strip())
     if not m:
-        raise IssueIdParseError(f"Invalid issue id '{issue_id}', expected 'org/repo#number'")
+        raise IssueIdParseError(issue_id)
     return m.group("org"), m.group("repo"), int(m.group("num"))

However, the current implementation is clear and sufficient for this use case.

release_notes_generator/record/factory/default_record_factory.py (2)

59-71: Remove commented-out code.

The commented-out get_repository method should be deleted entirely rather than left as a comment block. Version control preserves the history if needed.

Apply this diff:

-    # TODO - this should not be needed now - delete
-    # def get_repository(self, full_name: str) -> Optional[Repository]:
-    #     """
-    #     Retrieves the specified GitHub repository.
-    #
-    #     Returns:
-    #         Optional[Repository]: The GitHub repository if found, None otherwise.
-    #     """
-    #     repo: Optional[Repository] = self._safe_call(self._github.get_repo)(full_name)
-    #     if repo is None:
-    #         logger.error("Repository not found: %s", full_name)
-    #         return None
-    #     return repo
-

108-115: Simplify redundant repository lookup.

Lines 111-112 check data.get_repository(pi_repo_name) again after line 108 already retrieved and stored parent_repository. The conditional should check parent_repository directly.

Apply this diff:

                    parent_repository = data.get_repository(pi_repo_name)
                    if parent_repository is not None:
                        # cache for subsequent lookups
-                       if data.get_repository(pi_repo_name) is None:
-                           data.add_repository(parent_repository)
                        parent_issue = self._safe_call(parent_repository.get_issue)(pi_number)
                    else:
                        parent_issue = None
release_notes_generator/data/miner.py (2)

24-24: Remove unnecessary import.

In Python 3, list is a built-in type and doesn't need to be imported from builtins. This import is redundant.

Apply this diff:

-from builtins import list
 from typing import Optional

168-179: Eliminate duplicate repository fetching logic.

The _get_repository method (lines 168-179) is identical to the public get_repository method (lines 194-205). This is unnecessary duplication.

Remove _get_repository and update its caller at line 147 to use the public get_repository method instead:

-            if data.get_repository(f"{org}/{repo}") is None:
-                new_repo = self._get_repository(f"{org}/{repo}")
+            if data.get_repository(f"{org}/{repo}") is None:
+                new_repo = self.get_repository(f"{org}/{repo}")
                 if new_repo is not None:

Then remove the _get_repository method entirely:

-    def _get_repository(self, full_name: str) -> Optional[Repository]:
-        """
-        Retrieves the specified GitHub repository.
-
-        Returns:
-            Optional[Repository]: The GitHub repository if found, None otherwise.
-        """
-        repo: Optional[Repository] = self._safe_call(self.github_instance.get_repo)(full_name)
-        if repo is None:
-            logger.error("Repository not found: %s", full_name)
-            return None
-        return repo
-
release_notes_generator/record/factory/issue_hierarchy_record_factory.py (1)

220-260: Use shallow copy instead of deepcopy for string list.

Line 222 uses deepcopy for a list of strings, but since strings are immutable, a shallow copy is sufficient and more efficient.

Apply this diff:

-from copy import deepcopy
+from copy import copy

And update line 222:

     def _re_register_hierarchy_issues(self, sub_issues_ids: list[str], sub_issue_parents: dict[str, str]):
         logger.debug("Re-registering hierarchy issues ...")
-        reduced_sub_issue_ids: list[str] = deepcopy(sub_issues_ids)
+        reduced_sub_issue_ids: list[str] = copy(sub_issues_ids)

Or use list slicing for a more Pythonic approach:

-        reduced_sub_issue_ids: list[str] = deepcopy(sub_issues_ids)
+        reduced_sub_issue_ids: list[str] = sub_issues_ids[:]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 82ea74b and 569c279.

📒 Files selected for processing (16)
  • .pylintrc (2 hunks)
  • integration_test.py (1 hunks)
  • release_notes_generator/data/filter.py (5 hunks)
  • release_notes_generator/data/miner.py (6 hunks)
  • release_notes_generator/data/utils/bulk_sub_issue_collector.py (1 hunks)
  • release_notes_generator/generator.py (2 hunks)
  • release_notes_generator/model/mined_data.py (3 hunks)
  • release_notes_generator/record/factory/default_record_factory.py (10 hunks)
  • release_notes_generator/record/factory/issue_hierarchy_record_factory.py (8 hunks)
  • release_notes_generator/utils/record_utils.py (1 hunks)
  • tests/conftest.py (10 hunks)
  • tests/release_notes/data/test_filter.py (4 hunks)
  • tests/release_notes/data/test_miner.py (7 hunks)
  • tests/release_notes/record/factory/test_default_record_factory.py (7 hunks)
  • tests/release_notes/record/factory/test_issue_hierarchy_record_factory.py (4 hunks)
  • tests/test_release_notes_generator.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-28T14:13:38.979Z
Learnt from: miroslavpojer
PR: AbsaOSS/generate-release-notes#173
File: release_notes_generator/record/factory/default_record_factory.py:109-126
Timestamp: 2025-09-28T14:13:38.979Z
Learning: In release_notes_generator/record/factory/default_record_factory.py, there is a cross-repo fetch bug where parent_issue_id in format "owner/repo#number" is fetched from the wrong repository because the code only queries data.repository instead of parsing the owner/repo from the ID and using the GitHub client to fetch from the correct repository.

Applied to files:

  • release_notes_generator/record/factory/default_record_factory.py
  • release_notes_generator/record/factory/issue_hierarchy_record_factory.py
🧬 Code graph analysis (15)
tests/release_notes/data/test_miner.py (2)
release_notes_generator/data/miner.py (1)
  • DataMiner (46-348)
tests/conftest.py (2)
  • mock_repo (96-99)
  • mock_git_releases (111-120)
release_notes_generator/utils/record_utils.py (2)
release_notes_generator/model/issue_record.py (1)
  • issue (63-68)
release_notes_generator/model/commit_record.py (1)
  • commit (45-50)
tests/release_notes/data/test_filter.py (3)
release_notes_generator/data/filter.py (3)
  • filter (38-45)
  • filter (56-118)
  • FilterByRelease (48-170)
release_notes_generator/model/mined_data.py (2)
  • MinedData (38-79)
  • home_repository (55-57)
tests/conftest.py (1)
  • mock_repo (96-99)
tests/release_notes/record/factory/test_default_record_factory.py (1)
tests/conftest.py (1)
  • mock_repo (96-99)
release_notes_generator/record/factory/default_record_factory.py (3)
release_notes_generator/utils/record_utils.py (1)
  • get_id (26-45)
release_notes_generator/model/mined_data.py (1)
  • get_repository (65-70)
release_notes_generator/record/factory/issue_hierarchy_record_factory.py (1)
  • _create_record_for_issue (199-204)
release_notes_generator/model/mined_data.py (1)
release_notes_generator/data/miner.py (1)
  • get_repository (194-205)
release_notes_generator/generator.py (3)
release_notes_generator/data/filter.py (3)
  • filter (38-45)
  • filter (56-118)
  • FilterByRelease (48-170)
release_notes_generator/data/miner.py (1)
  • mine_missing_sub_issues (91-103)
release_notes_generator/action_inputs.py (2)
  • ActionInputs (64-513)
  • get_hierarchy (172-177)
tests/conftest.py (1)
release_notes_generator/utils/record_utils.py (1)
  • get_id (26-45)
tests/release_notes/record/factory/test_issue_hierarchy_record_factory.py (2)
release_notes_generator/model/hierarchy_issue_record.py (5)
  • sub_issues (47-51)
  • pull_requests_count (60-75)
  • level (30-34)
  • level (37-44)
  • sub_hierarchy_issues (54-58)
release_notes_generator/model/issue_record.py (1)
  • pull_requests_count (223-229)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (1)
release_notes_generator/utils/record_utils.py (2)
  • parse_issue_id (60-68)
  • format_issue_id (71-73)
release_notes_generator/data/filter.py (1)
release_notes_generator/model/mined_data.py (1)
  • MinedData (38-79)
release_notes_generator/record/factory/issue_hierarchy_record_factory.py (3)
release_notes_generator/utils/record_utils.py (2)
  • get_id (26-45)
  • parse_issue_id (60-68)
release_notes_generator/record/factory/default_record_factory.py (1)
  • _create_record_for_issue (195-212)
release_notes_generator/model/mined_data.py (1)
  • MinedData (38-79)
tests/test_release_notes_generator.py (1)
tests/conftest.py (1)
  • mock_git_release (104-107)
integration_test.py (1)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (3)
  • CollectorConfig (21-41)
  • BulkSubIssueCollector (44-233)
  • scan_sub_issues_for_parents (67-192)
release_notes_generator/data/miner.py (4)
release_notes_generator/action_inputs.py (3)
  • ActionInputs (64-513)
  • get_github_repository (113-117)
  • get_github_token (120-124)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (3)
  • CollectorConfig (21-41)
  • BulkSubIssueCollector (44-233)
  • scan_sub_issues_for_parents (67-192)
release_notes_generator/model/mined_data.py (4)
  • MinedData (38-79)
  • get_repository (65-70)
  • home_repository (55-57)
  • add_repository (59-63)
release_notes_generator/utils/record_utils.py (2)
  • get_id (26-45)
  • parse_issue_id (60-68)
🪛 Ruff (0.13.1)
release_notes_generator/utils/record_utils.py

67-67: Avoid specifying long messages outside the exception class

(TRY003)

tests/release_notes/data/test_filter.py

85-85: Prefer next(iter(filtered_data.issues.keys())) over single element slice

Replace with next(iter(filtered_data.issues.keys()))

(RUF015)


86-86: Prefer next(iter(filtered_data.pull_requests.keys())) over single element slice

Replace with next(iter(filtered_data.pull_requests.keys()))

(RUF015)


87-87: Prefer next(iter(filtered_data.commits.keys())) over single element slice

Replace with next(iter(filtered_data.commits.keys()))

(RUF015)

release_notes_generator/data/utils/bulk_sub_issue_collector.py

210-210: Unnecessary key check before dictionary access

Replace with dict.get

(RUF019)


212-212: Abstract raise to an inner function

(TRY301)


212-212: Avoid specifying long messages outside the exception class

(TRY003)


214-214: Consider moving this statement to an else block

(TRY300)


223-223: Avoid specifying long messages outside the exception class

(TRY003)

integration_test.py

14-14: Avoid specifying long messages outside the exception class

(TRY003)

release_notes_generator/data/miner.py

62-62: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (38)
.pylintrc (1)

301-301: LGTM! Pylint threshold adjustments align with PR scope.

The increases to max-attributes (7→10) and max-nested-blocks (5→6) are reasonable given the new data structures (dict-based mappings, BulkSubIssueCollector with multiple config fields) introduced in this PR.

Also applies to: 473-473

release_notes_generator/data/filter.py (3)

23-25: LGTM! Explicit imports added for type hints.

The explicit imports for Issue, PullRequest, and Repository properly support the new dict-based type signatures introduced in this module.


75-99: LGTM! Refactored to dict-based data structures.

The filtering logic now correctly constructs dict[PullRequest, Repository] and dict[Commit, Repository] mappings. The timestamp-based filtering logic is preserved, and deduplication via pulls_seen set prevents duplicate entries.


120-170: LGTM! Updated return types and implementations for dict-based filtering.

All three issue filtering methods now correctly return dict[Issue, Repository] and use dict comprehensions to filter based on timestamp and state criteria. The logic is consistent with the original filtering behavior.

release_notes_generator/data/utils/bulk_sub_issue_collector.py (4)

20-42: LGTM! Well-structured configuration dataclass.

The CollectorConfig provides sensible defaults for API interaction, retry/backoff, pagination, and pacing. Using a frozen dataclass ensures immutability and clear documentation of configuration options.


50-64: LGTM! Clean initialization with optional dependency injection.

The constructor properly accepts optional CollectorConfig and Session parameters, enabling testability and customization while providing sensible defaults.


225-233: LGTM! Simple and effective alias lookup.

The helper method correctly searches through repository blocks to find the issue node matching the given alias.


67-192: Pagination and deduplication logic verified; no further action needed.
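
For reference, the cursor-driven part of that logic reduces to a loop of this shape; the names and the flat alias → cursor map are illustrative, not the collector's actual internals.

def drain_pages(post_fn, build_payload, cursors):
    # cursors: alias -> None for the first page, or the last endCursor.
    # Re-query only aliases whose pageInfo reported another page.
    while cursors:
        data = post_fn(build_payload(cursors))
        next_cursors = {}
        for alias in cursors:
            page = (data["data"].get(alias) or {}).get("subIssues", {})
            yield alias, page.get("nodes", [])
            info = page.get("pageInfo", {})
            if info.get("hasNextPage"):
                next_cursors[alias] = info["endCursor"]
        cursors = next_cursors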

tests/test_release_notes_generator.py (1)

117-117: LGTM! Patch paths updated for module reorganization.

The mock patch targets have been correctly updated from release_notes_generator.miner.DataMiner.get_latest_release to release_notes_generator.data.miner.DataMiner.get_latest_release to reflect the new module structure.

Also applies to: 172-172

tests/release_notes/record/factory/test_issue_hierarchy_record_factory.py (1)

80-80: All hierarchy issue fixtures correctly contain two sub-issues. No updates required.

tests/release_notes/record/factory/test_default_record_factory.py (1)

190-195: LGTM! Test data migration to dictionary-based structures is correct.

The test fixtures have been successfully updated to use the new dictionary-based data model (dict[Issue, Repository], dict[PullRequest, Repository], dict[Commit, Repository]). The pattern is consistent throughout: each mocked GitHub object is mapped to mock_repo, and empty collections use {} instead of [].

Also applies to: 233-235, 275-276, 284-284, 309-310, 318-318, 353-356, 393-396, 424-426

tests/release_notes/data/test_filter.py (1)

23-30: LGTM! Test updated for dictionary-based data structures.

The test correctly migrates to the new dict-based model and updates logger patch paths to match the relocated filter module.

Also applies to: 64-75

release_notes_generator/generator.py (2)

29-30: LGTM! Import paths updated correctly.

The imports have been updated to reflect the new module organization (data.filter and data.miner).


93-95: LGTM! Hierarchy expansion step correctly integrated.

The conditional expansion of issues using mine_missing_sub_issues is correctly placed after release filtering and before record generation. The guard ensures this only runs when hierarchy support is enabled.

integration_test.py (2)

10-10: Verify that disabling SSL warnings is intentional for demo purposes only.

This script disables SSL warnings, which should only be done in controlled testing environments, never in production code.


21-28: LGTM! Iterative sub-issue collection logic is clear.

The loop demonstrates the intended usage pattern: repeatedly scan for new parents until none are found, printing progress at each iteration.
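
Condensed, that pattern looks like this. The constructor arguments and the parents_sub_issues attribute access are assumptions based on the summaries above, and the seed ID is made up.

import os

from release_notes_generator.data.utils.bulk_sub_issue_collector import BulkSubIssueCollector

token = os.getenv("GITHUB_TOKEN")
if token is None:
    raise ValueError("GITHUB_TOKEN environment variable is not set")

collector = BulkSubIssueCollector(token)   # assumed constructor shape
new_parents = ["org/repo#1"]               # hypothetical seed parent id
while new_parents:
    # Feed newly found parents back in until the frontier is empty.
    new_parents = collector.scan_sub_issues_for_parents(new_parents)
    print(f"Discovered {len(new_parents)} new parent(s)")
print(collector.parents_sub_issues)        # accumulated parent -> sub-issue ids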

release_notes_generator/model/mined_data.py (4)

45-52: LGTM! Data model successfully migrated to dictionary-based structures.

The migration from list-based to dictionary-based storage (dict[Issue, Repository], etc.) is well-designed and supports:

  • Cross-repository issue/PR/commit tracking
  • Sub-issue hierarchy via parents_sub_issues mapping
  • Efficient lookups and deduplication

63-63: LGTM! Added helpful debug logging.

The debug log provides visibility into repository additions during data mining.


65-70: LGTM! Clear docstring and implementation.

The updated docstring accurately describes the behavior, and the implementation correctly returns None for missing repositories.


79-79: LGTM! is_empty correctly checks dictionary truthiness.

The method correctly determines emptiness by checking if any of the dictionary collections have items.

tests/release_notes/data/test_miner.py (3)

27-27: LGTM - Import path correctly updated.

The import path change from release_notes_generator.miner to release_notes_generator.data.miner aligns with the module reorganization described in the PR objectives.


44-55: LGTM - Data structure migration to dict-based model.

The transition from list-based to dictionary-based storage (with mock objects as keys and mock_repo as values) correctly implements the new data model. This aligns with the broader refactoring to support bulk sub-issue collection and explicit ID handling.


60-61: LGTM - Logger patch paths consistently updated.

All logger mock patch targets have been correctly updated to reflect the new module path release_notes_generator.data.miner.logger.*. This ensures tests continue to work with the reorganized module structure.

Also applies to: 87-88, 116-116, 138-138, 163-163, 190-191

tests/conftest.py (2)

40-40: LGTM - Required import for ID utilities.

The import of get_id from release_notes_generator.utils.record_utils is necessary to support the new dictionary-based data model where stable IDs are used as keys.


737-750: LGTM - Dictionary-based data structure correctly implemented.

The fixture correctly initializes data.issues, data.pull_requests, and data.commits as dictionaries with mock objects as keys and mock_repo as values. This aligns with the new data model supporting bulk operations and explicit repository tracking.

release_notes_generator/utils/record_utils.py (2)

26-45: LGTM - Robust type dispatching for ID generation.

The get_id function correctly handles Issue, PullRequest, and Commit objects with appropriate fallback to str(obj) for unknown types. The use of cast for type hinting is appropriate.


71-73: LGTM - Clean formatting function.

The format_issue_id function provides a clean inverse operation to parse_issue_id, enabling round-trip conversion of issue identifiers.

release_notes_generator/record/factory/default_record_factory.py (3)

82-82: LGTM!

The updated signature correctly accepts an explicit l_pid parameter, aligning with the new ID handling pattern.


169-193: LGTM!

The updated signature and implementation correctly use the explicit cid parameter for commit identification.


195-212: LGTM!

The updated signature correctly accepts an explicit iid parameter and the implementation properly uses it as the record key. Documentation is clear.

release_notes_generator/data/miner.py (4)

55-89: LGTM!

The mine_data method provides a clear orchestration of the data mining workflow with good separation of concerns. The ValueError message at line 62 is appropriately simple for this context.


105-125: Review TLS verification setting.

Line 115 sets verify_tls=False in the CollectorConfig, which disables SSL/TLS certificate verification. This could expose the system to man-in-the-middle attacks when communicating with the GitHub API.

Verify whether this is:

  1. A temporary setting for development/testing
  2. Required due to self-signed certificates in enterprise environments
  3. An oversight that should be corrected

If TLS verification should remain disabled, consider making it configurable via ActionInputs rather than hardcoded. Otherwise, remove this parameter to use the secure default (verify_tls=True).
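
A minimal way to make that configurable rather than hardcoded, sketched with an environment-variable gate (the variable name GRN_VERIFY_TLS is hypothetical):

import os

from release_notes_generator.data.utils.bulk_sub_issue_collector import CollectorConfig

# Secure default: verification stays on unless explicitly disabled.
verify_tls = os.getenv("GRN_VERIFY_TLS", "true").strip().lower() != "false"
cfg = CollectorConfig(verify_tls=verify_tls)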


247-295: LGTM!

The refactored _get_issues method correctly populates data.issues as a dictionary and properly handles deduplication by issue number.


323-348: LGTM!

The __filter_duplicated_issues method is correctly updated to work with the new dictionary-based data structures.

release_notes_generator/record/factory/issue_hierarchy_record_factory.py (4)

143-147: Ensure repository is not None before use.

Line 147 uses type: ignore[arg-type] to suppress a type error, indicating that r might be None. However, the code at line 145 already checks if r is not None, so within that block r is guaranteed to be non-None. The type checker might not be inferring this correctly.

Consider restructuring to make the type checker happy without suppressing errors:

                    org, repo, num = parse_issue_id(issue_id)
                    r = data.get_repository(f"{org}/{repo}")
-                   parent_issue = self._safe_call(r.get_issue)(num) if r is not None else None
                    if parent_issue is not None:
-                       self._create_record_for_issue(parent_issue, get_id(parent_issue, r))  # type: ignore[arg-type]
+                   if r is not None:
+                       parent_issue = self._safe_call(r.get_issue)(num)
+                       if parent_issue is not None:
+                           self._create_record_for_issue(parent_issue, get_id(parent_issue, r))

169-187: LGTM!

The updated signature correctly accepts an explicit iid parameter and uses it consistently for record registration.


199-204: LGTM!

The method correctly delegates to the parent class with the explicit iid parameter and maintains registration state.


220-260: LGTM!

The hierarchy re-registration logic correctly builds the parent-child relationships bottom-up with proper recursion termination. The tracking of made_progress prevents infinite loops on unresolvable hierarchies.
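
In outline, the termination pattern being approved here is the following; this is schematic, with attach standing in for the actual re-registration step.

def re_register(sub_issues_ids: list[str], sub_issue_parents: dict[str, str], attach) -> None:
    remaining = sub_issues_ids[:]  # shallow copy suffices for str ids
    made_progress = True
    while remaining and made_progress:
        made_progress = False
        for sid in remaining[:]:
            parent = sub_issue_parents.get(sid)
            if parent is not None and attach(parent, sid):  # attach returns True on success
                remaining.remove(sid)
                made_progress = True
    # Ids still in `remaining` have unresolvable parents; the loop exits
    # instead of spinning, which is exactly what made_progress guards against.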

Comment on lines 16 to 17
# If you need to disable TLS verification (to mirror your example):
cfg = CollectorConfig(verify_tls=False)
coderabbitai bot commented Sep 30, 2025

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

TLS verification is disabled—ensure this is for testing only.

Setting verify_tls=False bypasses certificate validation, which is insecure. Ensure this script is used only in controlled testing environments and never in production.

If this is intended only as a local demo, consider adding a comment or docstring warning:

+# WARNING: This script disables TLS verification for testing purposes only.
+# Never use this configuration in production environments.
 cfg = CollectorConfig(verify_tls=False)

🏁 Script executed:

#!/bin/bash
# Locate the integration test file and show lines around TLS verification disabling
fd integration_test.py -x rg -n -C2 "verify_tls" {} +

Length of output: 326


Warning: TLS verification disabled only for testing.
Setting verify_tls=False bypasses certificate validation and must not be used in production.
Apply above cfg = CollectorConfig(verify_tls=False):

+# WARNING: TLS verification is disabled for testing purposes only.
+# Do not use this configuration in production.
 cfg = CollectorConfig(verify_tls=False)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# If you need to disable TLS verification (to mirror your example):
cfg = CollectorConfig(verify_tls=False)
# If you need to disable TLS verification (to mirror your example):
# WARNING: TLS verification is disabled for testing purposes only.
# Do not use this configuration in production.
cfg = CollectorConfig(verify_tls=False)
🤖 Prompt for AI Agents
In integration_test.py around lines 16-17, the test currently sets cfg =
CollectorConfig(verify_tls=False) which disables TLS verification; update the
code to ensure this is only used in test contexts by gating it behind a clear
test-only flag or environment variable (e.g., if os.getenv("TEST_DISABLE_TLS")
== "1"), add a prominent inline comment stating "DO NOT USE IN PRODUCTION — TLS
verification disabled for tests only", and ensure production/default code path
uses verify_tls=True; alternatively replace disabling TLS with a test fixture or
mocking of the network/TLS layer so certificate validation remains enabled in
real runs.

miroslavpojer (Collaborator, Author) commented

Used.

coderabbitai bot commented

@miroslavpojer Thanks for adding the warning comment! 🎉 This helps ensure the TLS verification setting is clearly documented for future reference.



Comment on lines 67 to 106
logger.debug(
"Creation of records started..."
) # NEW: we now have a mapping of which item is a hierarchy issue, which is a SubIssue, and which is a plain Issue
for issue, repo in data.issues.items():
iid = get_id(issue, repo)

issues_expansion = []
for issue in data.issues:
iid = self.get_id(issue)
if iid in self.__registered_issues:
continue
if len(data.parents_sub_issues[iid]) > 0:
# issue has sub-issues - it is either hierarchy issue or sub-hierarchy issue
self._create_record_for_hierarchy_issue(issue, iid)

if iid in self.__sub_issue_parents and iid not in self.__local_sub_issues_checked:
issues_expansion.extend(self._create_issue_record_using_sub_issues_existence(issue, data))
self.__local_sub_issues_checked.append(iid)
elif any(iid in sublist for sublist in data.parents_sub_issues.values()):
# issue has no sub-issues - it is sub-issue
self._create_record_for_sub_issue(issue, iid)

data.issues.extend(issues_expansion)

# Third register all external sub-issues
for ext_sub_issue in self.__external_sub_issues:
if self.get_id(ext_sub_issue) in self.__registered_issues:
continue

self._create_record_for_sub_issue(ext_sub_issue)

# Now register all issues without sub-issues
for issue in data.issues:
if self.get_id(issue) in self.__registered_issues:
continue

self._create_issue_record_using_sub_issues_not_existence(issue)
else:
# issue is not sub-issue and has no sub-issues - it is issue
self._create_record_for_issue(issue, iid)

# dev note: Each issue is now in records dict by its issue number - all on same level - no hierarchy
# This is useful for population by PRs and commits

logger.debug("Registering Commits to Pull Requests and Pull Requests to Issues...")
for pull in data.pull_requests:
self._register_pull_and_its_commits_to_issue(pull, data)
for pull, repo in data.pull_requests.items():
self._register_pull_and_its_commits_to_issue(pull, get_id(pull, repo), data)

logger.debug("Registering direct commits to records...")
for commit in data.commits:
for commit, repo in data.commits.items():
if commit.sha not in self.__registered_commits:
self._records[self.get_id(commit)] = CommitRecord(commit)
self._records[get_id(commit, repo)] = CommitRecord(commit)

# dev note: now we have all PRs and commits registered to issues or as stand-alone records
# let build hierarchy
logger.debug("Building issues hierarchy...")
self._re_register_hierarchy_issues()

self._re_register_hierarchy_issues(
sub_issues_ids=list({iid for sublist in data.parents_sub_issues.values() for iid in sublist}),
sub_issue_parents={
sub_issue: parent for parent, sublist in data.parents_sub_issues.items() for sub_issue in sublist
},
)
coderabbitai bot commented Sep 30, 2025

⚠️ Potential issue | 🟠 Major

Prevent potential KeyError.

Line 73 accesses data.parents_sub_issues[iid] directly, which will raise a KeyError if the issue ID is not present in the dictionary. Use .get() with a default value for safe access.

Apply this diff:

         for issue, repo in data.issues.items():
             iid = get_id(issue, repo)
 
-            if len(data.parents_sub_issues[iid]) > 0:
+            if len(data.parents_sub_issues.get(iid, [])) > 0:
                 # issue has sub-issues - it is either hierarchy issue or sub-hierarchy issue
                 self._create_record_for_hierarchy_issue(issue, iid)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for issue, repo in data.issues.items():
iid = get_id(issue, repo)
if len(data.parents_sub_issues.get(iid, [])) > 0:
# issue has sub-issues - it is either hierarchy issue or sub-hierarchy issue
self._create_record_for_hierarchy_issue(issue, iid)
🤖 Prompt for AI Agents
In release_notes_generator/record/factory/issue_hierarchy_record_factory.py
around lines 67 to 106, replace the direct dict access
data.parents_sub_issues[iid] (around line 73) with a safe lookup using
data.parents_sub_issues.get(iid, []) so missing keys return an empty list;
update any other similar direct accesses in this block (including the
comprehension building sub_issues_ids and sub_issue_parents) to use .get(...,
[]) to avoid KeyError when an issue id is absent.

miroslavpojer (Collaborator, Author) commented

Used.

coderabbitai bot commented

@miroslavpojer, great! Thank you for applying the fix. This will prevent potential KeyError exceptions when an issue ID is not present in the parents_sub_issues dictionary. 🎉



coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
release_notes_generator/record/factory/default_record_factory.py (1)

88-104: Verify redundant repository check in caching logic.

The cross-repo fetch bug mentioned in the learning appears to be addressed—the code now correctly parses pi_repo_name from the parent issue ID and fetches from the appropriate repository.

However, the caching logic (lines 95-98) appears redundant:

parent_repository = data.get_repository(pi_repo_name)
if parent_repository is not None:
    # cache for subsequent lookups
    if data.get_repository(pi_repo_name) is None:  # This check is redundant
        data.add_repository(parent_repository)

Line 97 checks if data.get_repository(pi_repo_name) returns None, but this is redundant since we already know parent_repository is not None from line 95. The inner condition will always be False.

Based on learnings

Apply this diff to fix the redundant check:

 parent_repository = data.get_repository(pi_repo_name)
 if parent_repository is not None:
-    # cache for subsequent lookups
-    if data.get_repository(pi_repo_name) is None:
-        data.add_repository(parent_repository)
     parent_issue = self._safe_call(parent_repository.get_issue)(pi_number)
 else:
     parent_issue = None

Or, if the intent was to cache repositories fetched via GitHub API (not from data), the logic should be:

 parent_repository = data.get_repository(pi_repo_name)
-if parent_repository is not None:
+if parent_repository is None:
+    # Fetch from GitHub API and cache
+    parent_repository = self._github.get_repo(pi_repo_name)
+    data.add_repository(parent_repository)
+
+if parent_repository is not None:
     # cache for subsequent lookups
-    if data.get_repository(pi_repo_name) is None:
-        data.add_repository(parent_repository)
     parent_issue = self._safe_call(parent_repository.get_issue)(pi_number)
 else:
     parent_issue = None
tests/release_notes/data/test_filter.py (1)

28-50: Align test_filter_no_release with dict-based model
Replace list assignments in tests/release_notes/data/test_filter.py (lines 37–39) with dicts keyed by each mock (as in test_filter_with_release) so that data.issues, data.pull_requests, and data.commits conform to the new dict-based API.

🧹 Nitpick comments (1)
integration_test.py (1)

12-14: Consider extracting exception message (optional).

The token validation logic is correct. The static analysis tool suggests extracting the long exception message to a constant or custom exception class, which would align with best practices, but this is optional for a demonstration script.

Apply this diff if you want to address the static analysis hint:

+class MissingTokenError(ValueError):
+    """Raised when GITHUB_TOKEN environment variable is not set."""
+    pass
+
 token = os.getenv("GITHUB_TOKEN")
 if token is None:
-    raise ValueError("GITHUB_TOKEN environment variable is not set")
+    raise MissingTokenError("GITHUB_TOKEN environment variable is not set")
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 569c279 and 02a17e4.

📒 Files selected for processing (8)
  • integration_test.py (1 hunks)
  • release_notes_generator/data/miner.py (6 hunks)
  • release_notes_generator/data/utils/bulk_sub_issue_collector.py (1 hunks)
  • release_notes_generator/record/factory/default_record_factory.py (9 hunks)
  • release_notes_generator/record/factory/issue_hierarchy_record_factory.py (8 hunks)
  • release_notes_generator/utils/record_utils.py (1 hunks)
  • tests/conftest.py (10 hunks)
  • tests/release_notes/data/test_filter.py (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-28T14:13:38.979Z
Learnt from: miroslavpojer
PR: AbsaOSS/generate-release-notes#173
File: release_notes_generator/record/factory/default_record_factory.py:109-126
Timestamp: 2025-09-28T14:13:38.979Z
Learning: In release_notes_generator/record/factory/default_record_factory.py, there is a cross-repo fetch bug where parent_issue_id in format "owner/repo#number" is fetched from the wrong repository because the code only queries data.repository instead of parsing the owner/repo from the ID and using the GitHub client to fetch from the correct repository.

Applied to files:

  • release_notes_generator/record/factory/default_record_factory.py
  • release_notes_generator/record/factory/issue_hierarchy_record_factory.py
🧬 Code graph analysis (8)
tests/conftest.py (1)
release_notes_generator/utils/record_utils.py (1)
  • get_id (26-45)
release_notes_generator/record/factory/default_record_factory.py (3)
release_notes_generator/utils/record_utils.py (1)
  • get_id (26-45)
release_notes_generator/model/mined_data.py (1)
  • get_repository (65-70)
release_notes_generator/record/factory/issue_hierarchy_record_factory.py (1)
  • _create_record_for_issue (198-203)
tests/release_notes/data/test_filter.py (3)
release_notes_generator/data/filter.py (3)
  • filter (38-45)
  • filter (56-118)
  • FilterByRelease (48-170)
release_notes_generator/model/mined_data.py (2)
  • MinedData (38-79)
  • home_repository (55-57)
tests/conftest.py (1)
  • mock_repo (96-99)
integration_test.py (1)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (3)
  • CollectorConfig (21-41)
  • BulkSubIssueCollector (44-234)
  • scan_sub_issues_for_parents (67-192)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (1)
release_notes_generator/utils/record_utils.py (2)
  • parse_issue_id (54-62)
  • format_issue_id (65-67)
release_notes_generator/record/factory/issue_hierarchy_record_factory.py (6)
release_notes_generator/utils/record_utils.py (2)
  • get_id (26-45)
  • parse_issue_id (54-62)
release_notes_generator/record/factory/default_record_factory.py (1)
  • _create_record_for_issue (181-198)
release_notes_generator/model/commit_record.py (2)
  • commit (45-50)
  • CommitRecord (13-78)
release_notes_generator/model/mined_data.py (2)
  • MinedData (38-79)
  • get_repository (65-70)
release_notes_generator/model/hierarchy_issue_record.py (3)
  • HierarchyIssueRecord (16-169)
  • sub_issues (47-51)
  • sub_hierarchy_issues (54-58)
release_notes_generator/model/sub_issue_record.py (1)
  • SubIssueRecord (15-26)
release_notes_generator/data/miner.py (5)
release_notes_generator/action_inputs.py (2)
  • get_github_repository (113-117)
  • get_github_token (120-124)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (3)
  • CollectorConfig (21-41)
  • BulkSubIssueCollector (44-234)
  • scan_sub_issues_for_parents (67-192)
release_notes_generator/model/mined_data.py (4)
  • MinedData (38-79)
  • get_repository (65-70)
  • home_repository (55-57)
  • add_repository (59-63)
release_notes_generator/model/pull_request_record.py (1)
  • PullRequestRecord (15-293)
release_notes_generator/utils/record_utils.py (2)
  • get_id (26-45)
  • parse_issue_id (54-62)
release_notes_generator/utils/record_utils.py (2)
release_notes_generator/model/issue_record.py (1)
  • issue (63-68)
release_notes_generator/model/commit_record.py (1)
  • commit (45-50)
🪛 Ruff (0.13.1)
integration_test.py

14-14: Avoid specifying long messages outside the exception class

(TRY003)

release_notes_generator/data/utils/bulk_sub_issue_collector.py

212-212: Abstract raise to an inner function

(TRY301)


212-212: Avoid specifying long messages outside the exception class

(TRY003)


215-215: Consider moving this statement to an else block

(TRY300)


224-224: Avoid specifying long messages outside the exception class

(TRY003)

release_notes_generator/data/miner.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)

release_notes_generator/utils/record_utils.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (38)
tests/release_notes/data/test_filter.py (2)

23-25: LGTM: Import paths updated correctly.

The import paths have been updated to reflect the module reorganization, and the mock_repo fixture import is appropriately added to support the dict-based test data structure.


53-92: LGTM: Test correctly migrated to dict-based model.

The test data structures have been properly converted to dictionaries keyed by mock objects with repository values, and assertions correctly use next(iter(...).keys()) to access the first item. Logger patch paths are also correctly updated.

integration_test.py (3)

1-10: LGTM: Integration test setup is appropriate.

The imports and InsecureRequestWarning suppression are correctly configured for a demonstration script with disabled TLS verification.


16-18: Warning comment added as requested.

The TLS verification setting is now properly documented with a warning comment indicating this is for testing purposes only, as addressed in the previous review.


20-29: LGTM: Integration test logic is correct.

The script effectively demonstrates the BulkSubIssueCollector's iterative parent discovery workflow, collecting sub-issues and printing progression until no new parents are found.

tests/conftest.py (6)

40-40: LGTM: Import added for ID utility.

The get_id import is correctly added to support the dict-based test data structure migration.


611-747: LGTM: Complex fixture correctly migrated to dict-based model.

The fixture has been properly updated to use:

  • data.parents_sub_issues[get_id(...)] for establishing parent-child relationships
  • Dictionary structures for issues, PRs, and commits with {item: mock_repo} mappings
  • Walrus operator for clean inline ID assignment

The hierarchical test data setup is thorough and well-structured.
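
As a rough illustration of the pattern (mock names here are invented; data and mock_repo stand for the fixture's MinedData instance and repository mock):

    from unittest.mock import MagicMock

    from release_notes_generator.utils.record_utils import get_id

    issue = MagicMock()   # stands in for a github.Issue.Issue
    issue.number = 101

    # dict-based model: each mined item maps to the repository it came from
    data.issues = {issue: mock_repo}

    # parent/child links live in a separate mapping keyed by stable IDs,
    # assigned inline with the walrus operator
    data.parents_sub_issues[(iid := get_id(issue, mock_repo))] = []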


750-788: LGTM: Label assignment uses correct dict key access pattern.

The fixture correctly accesses dictionary keys via list(data.issues.keys()) and assigns labels to specific items using index-based access. This relies on dict insertion order (Python 3.7+), which is appropriate for controlled test fixtures.


791-844: LGTM: Type and label assignment pattern is consistent.

The fixture correctly assigns both types and labels to specific issues using the same dict key access pattern, maintaining consistency with related fixtures.


847-879: LGTM: Combined type and label fixture is correct.

The fixture properly combines type assignment with the existing label configuration, maintaining the consistent dict key access pattern.


143-199: LGTM: Fixture cleanup is appropriate.

The removal of get_sub_issues().return_value stubs is correct, as sub-issue relationships are now managed via data.parents_sub_issues mapping rather than direct issue attributes.

release_notes_generator/utils/record_utils.py (5)

1-15: LGTM: Module setup is correct.

The imports and module structure are appropriate for a utility module handling GitHub object identifiers.


17-23: LGTM: Regex pattern and exception class are well-defined.

The regex correctly parses the org/repo#number format, and the custom exception provides clear error semantics for invalid IDs.


26-45: LGTM: get_id function correctly handles all GitHub object types.

The function appropriately dispatches based on object type, using _entity_id for Issues/PRs, commit.sha for Commits, and a safe fallback for other types. Type casting is correctly applied.


48-51: LGTM: Memoized helper improves performance.

The _entity_id helper is appropriately memoized with a reasonable cache size for formatting efficiency.


54-67: LGTM: ID parsing and formatting functions are correct.

Both parse_issue_id and format_issue_id correctly handle the org/repo#number format. The static analysis hint about the exception message is a style preference—the current message is descriptive and appropriate for this utility function.
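
A minimal sketch of how such utilities typically fit together; the regex, cache size, and exact signatures below are inferred from this review rather than copied from the file:

    import re
    from functools import lru_cache

    _ISSUE_ID_RE = re.compile(r"^(?P<org>[^/]+)/(?P<repo>[^#]+)#(?P<number>\d+)$")

    class IssueIdParseError(ValueError):
        """Raised when an ID does not match the org/repo#number format."""

    @lru_cache(maxsize=2048)
    def _entity_id(full_name: str, number: int) -> str:
        # memoized because the same issue/PR is formatted many times per run
        return f"{full_name}#{number}"

    def parse_issue_id(issue_id: str) -> tuple[str, str, int]:
        m = _ISSUE_ID_RE.match(issue_id)
        if m is None:
            raise IssueIdParseError(f"Invalid issue id: {issue_id!r}")
        return m["org"], m["repo"], int(m["number"])

    def format_issue_id(org: str, repo: str, number: int) -> str:
        return f"{org}/{repo}#{number}"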

release_notes_generator/record/factory/default_record_factory.py (4)

41-41: LGTM: Import added for centralized ID utility.

The import of get_id from the utils module is appropriate and replaces the internal ID generation logic that was removed.


59-153: LGTM: Refactored to use explicit IDs and centralized repository lookup.

The method has been correctly refactored to:

  • Accept and pass explicit IDs throughout the flow
  • Iterate over (item, repo) pairs from dict-based data structures
  • Use get_id(item, repo) for consistent ID generation
  • Rely on data.get_repository() for repository lookups

The explicit ID approach improves maintainability and aligns with the broader dict-based migration.
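
Schematically, the refactored flow reads like this (variable names are illustrative, not the factory's actual ones):

    for issue, repo in data.issues.items():
        iid = get_id(issue, repo)                  # stable "org/repo#number" key
        self._create_record_for_issue(issue, iid)

    for pull, repo in data.pull_requests.items():
        pid = get_id(pull, repo)
        # ...link the PR to its issue records, or create a standalone PR record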


155-179: LGTM: Method signature updated for explicit commit ID.

The register_commit_to_record method correctly accepts an explicit cid parameter and uses it as the key for storing the CommitRecord. This aligns with the refactored explicit ID approach.


181-198: LGTM: Method signature updated for explicit issue ID.

The _create_record_for_issue method correctly accepts an explicit iid parameter, which is properly documented and used as the key for storing the IssueRecord. This completes the refactored explicit ID approach.

release_notes_generator/data/miner.py (7)

54-88: LGTM! Clean refactoring of data mining workflow.

The new mine_data() method provides a clear, well-structured flow that initializes repository context, fetches data, and applies deduplication. The explicit return of MinedData and the separation of concerns (mining vs. filtering) improves maintainability.


90-102: LGTM! Efficient bulk sub-issue mining.

The mine_missing_sub_issues() method cleanly integrates the new bulk GraphQL collector to replace one-by-one fetching. The flow from scanning to fetching is logical and well-documented.
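
In outline, assuming the helper signatures described in this review (argument lists are guesses, not the real API):

    def mine_missing_sub_issues(self, data: MinedData) -> None:
        # build stable parent IDs for every mined issue
        parents = [get_id(issue, repo) for issue, repo in data.issues.items()]
        # bulk GraphQL scan replaces the old one-by-one sub-issue calls
        data.parents_sub_issues.update(self._scan_sub_issues_for_parents(parents))
        # pull in any referenced issues/PRs that were not mined yet
        self._fetch_missing_issues_and_prs(data)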


104-124: Verify TLS configuration and consider making it configurable.

Line 114 hardcodes verify_tls=False, which disables certificate verification. This is a security concern for production environments.

Consider making TLS verification configurable via ActionInputs or environment variables:

-        cfg = CollectorConfig(verify_tls=False)
+        cfg = CollectorConfig(verify_tls=ActionInputs.get_verify_tls())  # Add to action_inputs.py

If verify_tls=False is intentional for testing, please add a comment explaining why.


126-165: LGTM! Robust missing issue fetching with proper error handling.

The method correctly handles cross-repository references, caches repositories to avoid redundant fetches, and logs errors when issues or repositories are not found. The logic aligns well with the bulk sub-issue discovery flow.


167-178: LGTM! Clean repository fetching helper.

The _fetch_repository() method provides proper error handling and logging for repository retrieval failures.


246-294: LGTM! Correct dict-based issue population.

The refactored _get_issues() method properly populates data.issues as a dict mapping Issue to Repository, handles both "all issues" and "since release" scenarios, and deduplicates by issue number. The logic is sound and aligns with the new data model.


322-347: LGTM! Dict-based filtering correctly implemented.

The __filter_duplicated_issues() method has been properly updated to work with dict-based data structures. The comprehension correctly filters out issues that match PR numbers while preserving the Issue -> Repository mapping.
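
In essence the dedup step is a single comprehension along these lines (a sketch mirroring the description above):

    pr_numbers = {pr.number for pr in data.pull_requests}  # dict keys are the PRs
    data.issues = {
        issue: repo
        for issue, repo in data.issues.items()
        if issue.number not in pr_numbers   # drop issues that are really PRs
    }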

release_notes_generator/data/utils/bulk_sub_issue_collector.py (5)

20-42: LGTM! Well-documented configuration with sensible defaults.

The CollectorConfig dataclass provides clear, customizable options for API interaction, retry behavior, pagination, and pacing. The defaults align with GitHub's GraphQL API limits (max 100 per page) and production best practices.


44-64: LGTM! Clean initialization with dependency injection.

The constructor properly accepts optional configuration and session, sets up authorization headers, and initializes the parents_sub_issues dictionary for accumulating results.


66-192: LGTM! Complex but well-structured bulk GraphQL logic.

The scan_sub_issues_for_parents() method implements sophisticated batching and pagination:

  1. Groups parents by repository to optimize GraphQL queries
  2. Maintains per-parent cursors for pagination
  3. Constructs batched GraphQL queries respecting config limits
  4. Handles missing/inaccessible parents gracefully
  5. Identifies new parents from children with sub-issues
  6. Returns deterministic sorted output

The complexity is justified by the performance gains over one-by-one fetching. The logic correctly handles edge cases (empty results, pagination, cross-repo references).
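
For a sense of what one batched request per repository looks like, here is a sketch with two aliased parents; it assumes GitHub's sub-issues GraphQL fields (subIssues, pageInfo) and uses invented issue numbers:

    QUERY = """
    query($owner: String!, $name: String!) {
      repository(owner: $owner, name: $name) {
        p0: issue(number: 101) {
          subIssues(first: 100) {
            nodes { number repository { nameWithOwner } }
            pageInfo { hasNextPage endCursor }
          }
        }
        p1: issue(number: 202) {
          subIssues(first: 100, after: "CURSOR_FROM_LAST_PAGE") {
            nodes { number repository { nameWithOwner } }
            pageInfo { hasNextPage endCursor }
          }
        }
      }
    }
    """

The p0/p1 aliases are what _find_alias_node() later resolves, and the per-parent after: cursor is how pagination state is carried between batches.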


196-224: LGTM! Robust retry logic with exponential backoff.

The _post_graphql() method implements proper error handling with retries, backoff, and logging. It correctly raises GraphQL errors and handles HTTP failures.

The static analysis hints (TRY301, TRY003, TRY300) are overly pedantic for this context—the current error handling is clear and appropriate for a specialized GraphQL client.
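
A reduced sketch of that pattern (parameter names and the backoff base are assumptions; the real method also honors CollectorConfig):

    import time
    import requests

    def _post_graphql(session, url, payload, retries=3, backoff=1.0, verify=True):
        for attempt in range(1, retries + 1):
            try:
                resp = session.post(url, json=payload, timeout=30, verify=verify)
                resp.raise_for_status()
                body = resp.json()
                if "errors" in body:
                    # GraphQL-level failures are raised, not retried
                    raise RuntimeError(f"GraphQL errors: {body['errors']}")
                return body
            except requests.RequestException:
                if attempt == retries:
                    raise
                time.sleep(backoff * 2 ** (attempt - 1))  # exponential backoff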


226-234: LGTM! Simple and effective alias resolution.

The _find_alias_node() helper correctly traverses the GraphQL response structure to locate issue nodes by alias.

release_notes_generator/record/factory/issue_hierarchy_record_factory.py (6)

66-82: LGTM! Clean data-driven classification logic.

The refactored generate() method correctly iterates over dict-based data.issues, uses get_id() for stable identifiers, and classifies each issue as hierarchy, sub-issue, or plain issue based on data.parents_sub_issues. The logic is clear and aligns with the new data model.


88-105: LGTM! Proper dict iteration and hierarchy construction.

The method correctly iterates over dict-based data.pull_requests and data.commits, uses get_id() for identifiers, and builds the hierarchy with the updated _re_register_hierarchy_issues() signature. The comprehensions for sub_issues_ids and sub_issue_parents are correct.


118-166: LGTM! Cross-repo issue handling fixed.

The updated _register_pull_and_its_commits_to_issue() method now correctly handles cross-repository references:

  1. Parses issue IDs to extract org/repo/number (line 142)
  2. Fetches the correct repository from data (line 143)
  3. Retrieves the issue from the resolved repository (line 144)
  4. Uses get_id() with the correct repository (line 146)

This addresses the cross-repo fetch bug mentioned in the retrieved learnings from PR #173.
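
Condensed, those four steps look roughly like this (error handling and logging elided):

    org, repo_name, number = parse_issue_id(issue_id)
    repository = data.get_repository(f"{org}/{repo_name}")  # may differ from home repo
    if repository is not None:
        issue = repository.get_issue(number)   # standard PyGithub lookup
        iid = get_id(issue, repository)        # ID built against the right repo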


168-186: LGTM! Explicit iid parameter improves clarity.

The _create_record_for_hierarchy_issue() method now accepts an explicit iid parameter, eliminating ambiguity about identifier construction and ensuring consistency across the codebase.


198-212: LGTM! Consistent iid propagation.

Both _create_record_for_issue() and _create_record_for_sub_issue() correctly accept and propagate the iid parameter. The cross-repo detection logic in _create_record_for_sub_issue() (lines 214-217) is sound.


219-259: LGTM! Correct bottom-up hierarchy construction.

The _re_register_hierarchy_issues() method implements proper bottom-up hierarchy building:

  1. Accepts explicit sub_issues_ids and sub_issue_parents parameters
  2. Skips sub-issues that are themselves parents (line 229)
  3. Attaches sub-issues/sub-hierarchies to their parents
  4. Removes processed entries from tracking structures
  5. Recurses with reduced lists when progress is made (lines 258-259)

The recursion termination is safe—it only recurses when made_progress is true and items were removed, preventing infinite loops.
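
A schematic of that termination logic (data shapes simplified; attach() stands in for the real sub-issue/sub-hierarchy registration):

    def _attach_leaves(records, sub_issues_ids, sub_issue_parents):
        made_progress = False
        for sid in list(sub_issues_ids):
            if sid in sub_issue_parents.values():   # still a parent itself: defer
                continue
            records[sub_issue_parents[sid]].attach(records.pop(sid))
            sub_issues_ids.remove(sid)
            del sub_issue_parents[sid]
            made_progress = True
        if made_progress and sub_issues_ids:        # recurse only while shrinking
            _attach_leaves(records, sub_issues_ids, sub_issue_parents)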

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
release_notes_generator/data/miner.py (1)

334-335: Consider using set for open_pr_numbers for consistency.

Line 334 uses a set comprehension for pr_numbers, but line 335 uses a list comprehension for open_pr_numbers. Both are used in membership checks (not in), so using a set for both would be more consistent and slightly more efficient for larger datasets.

Apply this diff:

         pr_numbers = {pr.number for pr in data.pull_requests.keys()}
-        open_pr_numbers = [pr.number for pr in open_pull_requests]
+        open_pr_numbers = {pr.number for pr in open_pull_requests}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 02a17e4 and 5ece6a5.

📒 Files selected for processing (1)
  • release_notes_generator/data/miner.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
release_notes_generator/data/miner.py (6)
release_notes_generator/action_inputs.py (2)
  • get_github_repository (113-117)
  • get_github_token (120-124)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (3)
  • CollectorConfig (21-41)
  • BulkSubIssueCollector (44-234)
  • scan_sub_issues_for_parents (67-192)
release_notes_generator/model/issue_record.py (2)
  • IssueRecord (16-334)
  • issue (63-68)
release_notes_generator/model/mined_data.py (4)
  • MinedData (38-79)
  • get_repository (65-70)
  • home_repository (55-57)
  • add_repository (59-63)
release_notes_generator/model/pull_request_record.py (1)
  • PullRequestRecord (15-293)
release_notes_generator/utils/record_utils.py (2)
  • get_id (26-45)
  • parse_issue_id (54-62)
🪛 Ruff (0.13.1)
release_notes_generator/data/miner.py

61-61: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (5)
release_notes_generator/data/miner.py (5)

54-88: LGTM! Well-structured data mining orchestration.

The mine_data method effectively coordinates the data mining workflow, gathering repository information, releases, issues, pull requests, and commits, then applying deduplication. The flow is clear and the logging statements provide good observability.


90-102: LGTM! Efficient sub-issue mining workflow.

The method correctly delegates to _scan_sub_issues_for_parents to build the hierarchy map, then fetches missing issues. The return type in the docstring matches the implementation.


126-165: LGTM! Robust missing issue fetching with caching.

The method efficiently fetches missing issues while caching repositories for subsequent lookups. Error handling is appropriate, and the docstring accurately describes the return type.


167-178: LGTM! Clear repository fetching helper.

The method provides a clean abstraction for fetching repositories with appropriate error logging.


246-294: LGTM! Effective issue fetching with deduplication.

The method correctly handles both scenarios (with and without release) and appropriately deduplicates issues by number. The defensive handling of potential None values from API calls is good practice.

(Excerpt from release_notes_generator/data/miner.py that this thread is anchored to:)

        dict[str, list[str]]: A dictionary mapping parent issue IDs to their sub-issue IDs.
        """
        new_parent_ids: list[str] = parents_to_check
        cfg = CollectorConfig(verify_tls=False)
@coderabbitai coderabbitai bot Sep 30, 2025

⚠️ Potential issue | 🟠 Major

Security concern: TLS verification disabled.

Setting verify_tls=False disables certificate validation, exposing the connection to potential man-in-the-middle attacks. Unless there's a specific reason (e.g., testing environment with self-signed certificates), TLS verification should be enabled.

Consider applying this diff:

-        cfg = CollectorConfig(verify_tls=False)
+        cfg = CollectorConfig(verify_tls=True)

If TLS verification must be disabled for specific environments, consider making it configurable via action inputs rather than hardcoding.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
cfg = CollectorConfig(verify_tls=False)
cfg = CollectorConfig(verify_tls=True)
🤖 Prompt for AI Agents
In release_notes_generator/data/miner.py around line 114, the CollectorConfig is
instantiated with verify_tls=False which disables TLS certificate validation;
change this to enable verification by default (verify_tls=True) and make
disabling TLS validation configurable (e.g., read from a secure configuration,
environment variable, or action input such as VERIFY_TLS=false for
non-production/testing only), validate/parse that config value as a boolean, and
only pass verify_tls=False when the config explicitly requests it so production
defaults remain secure.
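
One hypothetical way to wire that up, with verification on by default and an explicit env-var opt-out (VERIFY_TLS is an invented name, not an existing action input):

    import os

    def _verify_tls_from_env(default: bool = True) -> bool:
        raw = os.getenv("VERIFY_TLS")
        if raw is None:
            return default                      # secure default in production
        return raw.strip().lower() not in ("0", "false", "no")

    cfg = CollectorConfig(verify_tls=_verify_tls_from_env())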

Collaborator Author

Not now.

@coderabbitai coderabbitai bot

@miroslavpojer, understood. I've noted your decision to defer this change. Feel free to revisit the TLS verification setting when appropriate for your environment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
integration_test.py (1)

16-18: Consider defining the error message in the exception class.

The error message could be moved to MissingTokenError as a class attribute or default message to align with the TRY003 guideline, though the current implementation is acceptable for a test script.

If you prefer to address the linter hint, apply this diff:

 class MissingTokenError(ValueError):
-    """Raised when GITHUB_TOKEN environment variable is not set."""
-    pass
+    """Raised when GITHUB_TOKEN environment variable is not set."""
+    
+    def __init__(self):
+        super().__init__("GITHUB_TOKEN environment variable is not set")
 
 token = os.getenv("GITHUB_TOKEN")
 if token is None:
-    raise MissingTokenError("GITHUB_TOKEN environment variable is not set")
+    raise MissingTokenError()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ece6a5 and 49c8170.

📒 Files selected for processing (1)
  • integration_test.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
integration_test.py (1)
release_notes_generator/data/utils/bulk_sub_issue_collector.py (3)
  • CollectorConfig (21-41)
  • BulkSubIssueCollector (44-234)
  • scan_sub_issues_for_parents (67-192)
🪛 Ruff (0.13.1)
integration_test.py

18-18: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (7)
integration_test.py (7)

1-3: LGTM!

Clear and concise module docstring.


5-8: LGTM!

All imports are necessary and correctly used throughout the script.


10-10: LGTM!

Suppressing InsecureRequestWarning is appropriate given that TLS verification is disabled for this test script.


12-14: LGTM!

Custom exception class with clear docstring is a good practice for specific error conditions.


20-22: LGTM!

The warning comment clearly indicates this is for testing only, addressing the previous review feedback.


24-24: LGTM!

Correct instantiation of BulkSubIssueCollector with validated token and configuration.


26-33: LGTM!

The demonstration effectively shows the hierarchical traversal workflow: starting with an initial parent, iteratively discovering new parents from sub-issues, and accumulating results. The loop correctly terminates when no new parents are found.

@miroslavpojer miroslavpojer merged commit 99efda2 into master Oct 1, 2025
7 checks passed
@miroslavpojer miroslavpojer deleted the feature/176-Introduce-bulk-logic-for-getting-sub-issues branch October 1, 2025 05:42