#15-create git is&pr preprocessor, update pinecone sync logic #118
Important: Review skipped
Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.
📝 Walkthrough
Adds Pinecone indexing across multiple trackers: new settings, preprocessors for issues and PRs, core GitHub preprocessing, and management-command hooks that run Pinecone syncs (with CLI options and logging) for the Clang, Boost Library, and Boost Mailing List trackers.
Sequence Diagram
```mermaid
sequenceDiagram
actor User
participant Cmd as Tracker Command\n(e.g., run_clang_github_tracker)
participant Sync as Pinecone Sync\n(run_cppa_pinecone_sync)
participant Pre as Tracker Preprocessor\n(issue/pr_preprocessor)
participant Core as Core GitHub\nPreprocessing
participant Raw as Raw JSON Files\n(workspace/raw)
participant Pine as Pinecone API
User->>Cmd: run command (--pinecone-app-type)
Cmd->>Cmd: fetch/sync raw GitHub data
Cmd->>Sync: _run_pinecone_sync(app_type, namespace, preprocessor_path)
Sync->>Pre: call preprocess_for_pinecone(failed_ids, final_sync_at)
Pre->>Core: delegate to preprocess_* (owner[/repo])
Core->>Raw: iterate JSON files
Raw-->>Core: return issue/PR JSON
Core->>Core: build documents, parse timestamps, dedupe
Core-->>Pre: return documents, chunked_flag
Pre-->>Sync: return documents
Sync->>Pine: upsert documents (namespace=app_type)
Pine-->>Sync: ack
Sync-->>Cmd: complete
Cmd-->>User: report success
```
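A minimal sketch of the "Core GitHub Preprocessing" step in the diagram, assuming each raw JSON file holds one issue or PR with `number`, `title`, `body`, and a GitHub-style ISO 8601 `updated_at`. The function names, directory layout, and document fields are hypothetical illustrations, not the PR's actual code; the returned boolean only mirrors the diagram's `chunked_flag`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Iterable, Optional

_EPOCH = datetime.min.replace(tzinfo=timezone.utc)


def parse_github_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse a GitHub-style ISO 8601 timestamp ('...Z') into an aware datetime."""
    if not value:
        return None
    # fromisoformat() accepts a trailing 'Z' only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def preprocess_github_items(
    raw_dir: str,
    failed_ids: Iterable[Any],
    final_sync_at: Optional[datetime],  # must be timezone-aware when given
) -> tuple[list[dict[str, Any]], bool]:
    """Hypothetical core step: raw issue/PR JSON files -> Pinecone documents.

    Keeps items updated after final_sync_at plus any listed in failed_ids,
    and dedupes by item number, keeping the most recently updated copy.
    """
    retry = {str(i) for i in failed_ids or ()}
    newest: dict[str, tuple[datetime, dict[str, Any]]] = {}
    for path in sorted(Path(raw_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        item_id = str(data["number"])
        updated = parse_github_ts(data.get("updated_at"))
        is_fresh = final_sync_at is None or (
            updated is not None and updated > final_sync_at
        )
        if not (is_fresh or item_id in retry):
            continue
        doc = {
            "id": item_id,
            "content": f"{data.get('title', '')}\n\n{data.get('body') or ''}",
            "metadata": {"source_path": str(path), "updated_at": data.get("updated_at")},
        }
        key = updated or _EPOCH
        if item_id not in newest or newest[item_id][0] <= key:
            newest[item_id] = (key, doc)
    # Second value mirrors the diagram's chunked_flag; assumed False (no pre-chunking).
    return [doc for _, doc in newest.values()], False
```

Passing the previous cursor as `final_sync_at` makes the run incremental, while `failed_ids` lets items from an earlier failed upsert be retried even if they have not changed since.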
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🧹 Nitpick comments (4)
clang_github_tracker/preprocessors/pr_preprocessor.py (1)
23-24: Same recommendation: use Django settings for `APP_TYPE`. For consistency with centralized settings:
♻️ Proposed fix
```diff
-NAMESPACE = "github-clang"
-APP_TYPE = os.getenv("CLANG_GITHUB_PINECONE_APP_TYPE", NAMESPACE)
+NAMESPACE = "github-clang"
+APP_TYPE = settings.CLANG_GITHUB_PINECONE_APP_TYPE
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clang_github_tracker/preprocessors/pr_preprocessor.py` around lines 23 - 24, replace the direct env-var fallback for APP_TYPE with a Django settings-backed value: instead of assigning APP_TYPE = os.getenv("CLANG_GITHUB_PINECONE_APP_TYPE", NAMESPACE), read APP_TYPE from django.conf.settings (e.g., settings.CLANG_GITHUB_PINECONE_APP_TYPE) with NAMESPACE as the default; update imports to include from django.conf import settings and ensure NAMESPACE remains the default constant used when the setting is missing or empty.
boost_library_tracker/preprocessors/pr_preprocessor.py (1)
26-27: Same recommendation: use Django settings for `APP_TYPE`. For consistency with the sibling `issue_preprocessor.py` fix and the centralized settings in `config/settings.py`:
♻️ Proposed fix
```diff
-NAMESPACE = "github-boostorg"
-APP_TYPE = os.getenv("BOOST_GITHUB_PINECONE_APP_TYPE", NAMESPACE)
+NAMESPACE = "github-boostorg"
+APP_TYPE = settings.BOOST_GITHUB_PINECONE_APP_TYPE
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_tracker/preprocessors/pr_preprocessor.py` around lines 26 - 27, replace the environment-variable fallback for APP_TYPE in pr_preprocessor.py with the centralized Django setting used elsewhere: import settings from django.conf (which exposes the values defined in config/settings.py), the same approach applied in issue_preprocessor.py, instead of calling os.getenv; keep NAMESPACE as the default so APP_TYPE falls back to it when the setting is absent, and use settings.BOOST_GITHUB_PINECONE_APP_TYPE (or the existing setting name in config/settings.py) to keep behavior consistent.
clang_github_tracker/management/commands/run_clang_github_tracker.py (1)
170-186: Redundant fallback: `effective_app_type` and `effective_namespace` duplicate logic already in lines 103-108.
The `pinecone_app_type` and `pinecone_namespace` variables are already guaranteed to have values from the settings fallback at lines 103-108, so the additional `or settings.*` checks are unnecessary.
♻️ Proposed simplification
```diff
-        # Phase: upsert issues and PRs to Pinecone
-        effective_app_type = (
-            pinecone_app_type or settings.CLANG_GITHUB_PINECONE_APP_TYPE
-        )
-        effective_namespace = (
-            pinecone_namespace or settings.CLANG_GITHUB_PINECONE_NAMESPACE
-        )
         _run_pinecone_sync(
-            effective_app_type,
-            effective_namespace,
+            pinecone_app_type,
+            pinecone_namespace,
             "clang_github_tracker.preprocessors.issue_preprocessor.preprocess_for_pinecone",
         )
         _run_pinecone_sync(
-            effective_app_type,
-            effective_namespace,
+            pinecone_app_type,
+            pinecone_namespace,
             "clang_github_tracker.preprocessors.pr_preprocessor.preprocess_for_pinecone",
         )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clang_github_tracker/management/commands/run_clang_github_tracker.py` around lines 170 - 186, the variables effective_app_type and effective_namespace redundantly reapply the settings fallback already handled earlier; simplify by removing those local fallbacks and pass the existing pinecone_app_type and pinecone_namespace directly to _run_pinecone_sync. Locate the block that sets effective_app_type/effective_namespace and the two _run_pinecone_sync calls in run_clang_github_tracker.py and replace uses of effective_app_type/effective_namespace with pinecone_app_type/pinecone_namespace (or remove the intermediate variables entirely) so the earlier fallback logic is the single source of truth.
boost_library_tracker/preprocessors/issue_preprocessor.py (1)
28-29: Consider using Django settings for `APP_TYPE` instead of `os.getenv` for consistency.
The code reads `APP_TYPE` directly from `os.getenv`, but `config/settings.py` already defines `BOOST_GITHUB_PINECONE_APP_TYPE` with proper normalization and fallback logic. Using the settings ensures consistent behavior across the codebase.
♻️ Proposed fix
```diff
-NAMESPACE = "github-boostorg"
-APP_TYPE = os.getenv("BOOST_GITHUB_PINECONE_APP_TYPE", NAMESPACE)
+NAMESPACE = "github-boostorg"
+APP_TYPE = settings.BOOST_GITHUB_PINECONE_APP_TYPE
```
You can also remove the `import os` line if it is no longer needed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_tracker/preprocessors/issue_preprocessor.py` around lines 28 - 29, Replace the direct os.getenv usage for APP_TYPE with the Django setting that centralizes normalization/fallback: import settings from django.conf and set APP_TYPE = settings.BOOST_GITHUB_PINECONE_APP_TYPE (keeping the existing NAMESPACE constant), and remove the unused import os if it becomes redundant; refer to the NAMESPACE and APP_TYPE symbols and the config/settings.py definition when making the change.
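A minimal sketch of the pattern the three `APP_TYPE` comments above converge on, assuming `config/settings.py` normalizes the environment variable once (strip whitespace, treat blank as unset) and the preprocessors only read the result through Django settings. The helper names `env_app_type` and `app_type_from_settings` are hypothetical, and `types.SimpleNamespace` stands in for `django.conf.settings` so the sketch runs without a configured Django project.

```python
import os
from types import SimpleNamespace


# --- config/settings.py side (hypothetical helper) ---
def env_app_type(var_name: str, default: str) -> str:
    """Read an app-type env var once, treating unset or blank values as missing."""
    value = os.getenv(var_name, "").strip()
    return value or default


# --- preprocessor side (hypothetical helper) ---
def app_type_from_settings(settings_obj: object, setting_name: str, namespace: str) -> str:
    """Return the configured app type, falling back to the module's NAMESPACE."""
    value = getattr(settings_obj, setting_name, None)
    if isinstance(value, str):
        value = value.strip()
    return value or namespace


NAMESPACE = "github-boostorg"
# In the real module this would be `from django.conf import settings`.
settings = SimpleNamespace(
    BOOST_GITHUB_PINECONE_APP_TYPE=env_app_type("BOOST_GITHUB_PINECONE_APP_TYPE", NAMESPACE)
)
APP_TYPE = app_type_from_settings(settings, "BOOST_GITHUB_PINECONE_APP_TYPE", NAMESPACE)
```

If the settings module already guarantees a non-empty value, the plain `settings.BOOST_GITHUB_PINECONE_APP_TYPE` from the proposed fix is enough; the `getattr` fallback only matters if a setting might be missing.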
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@github_activity_tracker/preprocessors/github_preprocess.py`:
- Around line 181-222: build_pr_document, like build_issue_document, omits the
repository owner from the returned metadata; update the function signature to
accept an owner parameter (e.g., add owner: str to build_pr_document) and
include "owner": owner in the metadata dict it returns, so the PR metadata
matches the owner-aware build_issue_document (locate build_pr_document, its
returned metadata block, and add the owner field there).
- Around line 137-178: The metadata currently returned by build_issue_document
is missing the required "owner" field and the function signature doesn't accept
an owner to populate it; update build_issue_document(path, data, repo) to accept
an additional owner: str parameter, add "owner": owner to the metadata dict
(alongside repo_name), and update all call sites that invoke
build_issue_document to pass the repository owner string so the document shape
matches the docstring.
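A sketch of the signature change requested in the two inline comments above, assuming the existing builders take `(path, data, repo)` and return a dict with a `metadata` key; `owner` is appended as the additional parameter. Apart from `owner` and `repo_name`, every field here (`number`, `state`, `url`, `doc_type`, `merged`, the `id` format) is an illustrative guess, not the PR's real schema. A shared private helper keeps the issue and PR metadata from drifting apart again.

```python
from typing import Any


def _base_metadata(path: str, data: dict[str, Any], repo: str, owner: str) -> dict[str, Any]:
    # Fields common to issues and PRs; "owner" is now always present.
    return {
        "owner": owner,
        "repo_name": repo,
        "number": data.get("number"),
        "state": data.get("state"),
        "url": data.get("html_url"),
        "source_path": path,
    }


def build_issue_document(path: str, data: dict[str, Any], repo: str, owner: str) -> dict[str, Any]:
    metadata = _base_metadata(path, data, repo, owner) | {"doc_type": "issue"}
    return {
        "id": f"{owner}/{repo}#issue-{data['number']}",
        "content": f"{data.get('title', '')}\n\n{data.get('body') or ''}",
        "metadata": metadata,
    }


def build_pr_document(path: str, data: dict[str, Any], repo: str, owner: str) -> dict[str, Any]:
    metadata = _base_metadata(path, data, repo, owner) | {
        "doc_type": "pull_request",
        "merged": bool(data.get("merged_at")),
    }
    return {
        "id": f"{owner}/{repo}#pr-{data['number']}",
        "content": f"{data.get('title', '')}\n\n{data.get('body') or ''}",
        "metadata": metadata,
    }
```

Every call site then has to pass the owner string explicitly, which is the part of the change most likely to be missed.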
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: f824617a-64c6-4215-b7ba-2b307505d570
📒 Files selected for processing (13)
- .env.example
- boost_library_tracker/management/commands/run_boost_library_tracker.py
- boost_library_tracker/preprocessors/__init__.py
- boost_library_tracker/preprocessors/issue_preprocessor.py
- boost_library_tracker/preprocessors/pr_preprocessor.py
- boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py
- clang_github_tracker/management/commands/run_clang_github_tracker.py
- clang_github_tracker/preprocessors/__init__.py
- clang_github_tracker/preprocessors/issue_preprocessor.py
- clang_github_tracker/preprocessors/pr_preprocessor.py
- config/settings.py
- github_activity_tracker/preprocessors/__init__.py
- github_activity_tracker/preprocessors/github_preprocess.py
@coderabbitai continue to review
✅ Actions performed: Review triggered.
@coderabbitai full review
✅ Actions performed: Full review triggered.
Actionable comments posted: 2
🧹 Nitpick comments (1)
boost_library_tracker/management/commands/run_boost_library_tracker.py (1)
245-275: Consider centralizing `_run_pinecone_sync` into a shared utility.
The helper is now duplicated across tracker commands with near-identical behavior. Moving it to a common module will reduce drift and make future error-handling changes safer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@boost_library_tracker/management/commands/run_boost_library_tracker.py` around lines 245 - 275, Extract the duplicated helper _run_pinecone_sync into a single shared utility module (e.g., a new utils or common module) and replace the copies in each tracker command with imports from that module; keep the same signature (app_type: str, namespace: str, preprocessor_dotted_path: str), same calls to call_command("run_cppa_pinecone_sync", ...), and same logging/error handling, then update all files that contained the duplicate (including run_boost_library_tracker's usage) to import the shared _run_pinecone_sync and remove the local definitions so future changes to error handling or behavior are centralized.
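A minimal sketch of such a shared helper, placed in a hypothetical common module (e.g. `core/pinecone_sync.py`). It assumes `run_cppa_pinecone_sync` accepts `app_type`, `namespace`, and `preprocessor` options; check the command's real `add_arguments()` before relying on those names. The `runner` parameter is an addition for testability that defaults to Django's `call_command`, and the log-and-continue error policy is likewise an assumption about what the duplicated copies do.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def run_pinecone_sync(
    app_type: str,
    namespace: str,
    preprocessor_dotted_path: str,
    runner: Optional[Callable[..., object]] = None,
) -> bool:
    """Run the Pinecone sync command once; return True on success.

    Hypothetical shared replacement for the per-tracker _run_pinecone_sync copies.
    """
    if runner is None:
        # Imported lazily so this module can be imported without Django configured.
        from django.core.management import call_command

        runner = call_command
    logger.info(
        "Pinecone sync: app_type=%s namespace=%s preprocessor=%s",
        app_type, namespace, preprocessor_dotted_path,
    )
    try:
        # Option names below are assumptions about run_cppa_pinecone_sync.
        runner(
            "run_cppa_pinecone_sync",
            app_type=app_type,
            namespace=namespace,
            preprocessor=preprocessor_dotted_path,
        )
    except Exception:
        # Assumed policy: a failed sync is logged but does not abort the tracker run.
        logger.exception("Pinecone sync failed for app_type=%s", app_type)
        return False
    return True
```

Each tracker command would then import this one function instead of defining its own copy, so a later change to logging or error handling happens in a single place.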
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py`:
- Around line 338-341: The early return when the API fetch yields no emails
prevents the final Pinecone indexing step (_run_pinecone_sync) from running;
instead of returning immediately when the fetch is empty, change the control
flow so you skip the per-email workspace processing branch but continue to the
Pinecone phase. Concretely: remove or replace the early "return" after the
empty-fetch check with logic that sets a flag (e.g., has_indexable_items) if any
items were persisted during workspace processing, or simply proceed to call
_run_pinecone_sync unconditionally; ensure
_run_pinecone_sync(app_type=pinecone_app_type, namespace=pinecone_namespace) is
reachable at the end of the command even when the API returned zero emails.
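The control-flow change can be sketched as follows, with the fetch, per-email processing, and sync steps passed in as callables. `run_mailing_list_phases` and its parameters are hypothetical stand-ins for the command's real code. An empty fetch now skips only the workspace-processing branch, so the Pinecone phase still runs and can pick up previously persisted or previously failed items.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger(__name__)


def run_mailing_list_phases(
    fetch_emails: Callable[[], Iterable[dict]],
    process_email: Callable[[dict], None],
    run_pinecone_sync: Callable[[], None],
) -> int:
    """Hypothetical outline of the command body; returns how many emails were processed."""
    emails = list(fetch_emails())
    if not emails:
        # Before the fix this branch returned early, which also skipped the Pinecone phase.
        logger.info("API returned no emails; skipping workspace processing.")
    for email in emails:
        process_email(email)
    # Reached on every run, including runs where the fetch was empty.
    run_pinecone_sync()
    return len(emails)
```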
In `@clang_github_tracker/management/commands/run_clang_github_tracker.py`:
- Around line 171-187: Both calls to _run_pinecone_sync pass the same
effective_app_type, so issues and PRs share a single PineconeSyncStatus cursor;
once the issue sync advances final_sync_at, the PR sync reads the advanced
cursor and can skip PRs updated since the previous run. Fix this by invoking
_run_pinecone_sync with distinct app_type keys for issues and PRs (e.g., derive
values like "github-clang-issues" and "github-clang-prs" instead of reusing
effective_app_type), or merge the two preprocessors into a single
_run_pinecone_sync invocation that returns both issue and PR documents; update
any place that reads or writes the sync cursor (e.g., update_sync_status /
PineconeSyncStatus usage) to use the corresponding app_type so each stream
maintains its own final_sync_at.
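A sketch of the first option, which also folds in the earlier nitpick about applying the settings fallback only once: the CLI value is resolved a single time, then each stream gets its own `app_type` key and therefore its own cursor. The `-issues`/`-prs` suffixes follow the reviewer's example; `resolve_stream_app_types` and the in-memory `cursors` dict (standing in for `PineconeSyncStatus`) are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_stream_app_types(cli_app_type: Optional[str], default_app_type: str) -> dict[str, str]:
    """Apply the settings fallback once, then derive one app_type per stream."""
    base = (cli_app_type or "").strip() or default_app_type
    return {"issues": f"{base}-issues", "prs": f"{base}-prs"}


# In-memory stand-in for PineconeSyncStatus: app_type -> final_sync_at.
cursors: dict[str, datetime] = {}


def record_sync(app_type: str, finished_at: datetime) -> None:
    cursors[app_type] = finished_at


def last_sync(app_type: str) -> Optional[datetime]:
    return cursors.get(app_type)
```

In the command, the two calls would then become `_run_pinecone_sync(app_types["issues"], pinecone_namespace, "clang_github_tracker.preprocessors.issue_preprocessor.preprocess_for_pinecone")` and the same with `app_types["prs"]` for `clang_github_tracker.preprocessors.pr_preprocessor.preprocess_for_pinecone`. The namespace can probably stay shared, assuming the sync command keys its cursor on `app_type` alone. Because the new keys start with no stored `final_sync_at`, the first run after the change will likely behave like a fresh sync.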
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: d362c25c-4c66-4e2e-874b-3b80adbb7784
📒 Files selected for processing (13)
- .env.example
- boost_library_tracker/management/commands/run_boost_library_tracker.py
- boost_library_tracker/preprocessors/__init__.py
- boost_library_tracker/preprocessors/issue_preprocessor.py
- boost_library_tracker/preprocessors/pr_preprocessor.py
- boost_mailing_list_tracker/management/commands/run_boost_mailing_list_tracker.py
- clang_github_tracker/management/commands/run_clang_github_tracker.py
- clang_github_tracker/preprocessors/__init__.py
- clang_github_tracker/preprocessors/issue_preprocessor.py
- clang_github_tracker/preprocessors/pr_preprocessor.py
- config/settings.py
- github_activity_tracker/preprocessors/__init__.py
- github_activity_tracker/preprocessors/github_preprocess.py
Summary by CodeRabbit
New Features
Configuration