This repository was archived by the owner on Apr 23, 2026. It is now read-only.
Preprocess already scraped llvm/llvm-project pull request data (including comments) into document objects (content + metadata) for Pinecone upsert. No scraping or GitHub API calls.
Scope
Input: Scraped PR data for llvm/llvm-project (with comments).
Output: Documents with content and metadata (e.g. repo, number, state, author, created_at, labels, url).
In scope: parse/validate, normalize text, include comments in content/chunking, document schema.
Out of scope: GitHub fetch; Pinecone API.
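The in-scope steps (validate, normalize text, fold comments into content) could be sketched roughly as follows. This is a minimal illustration, not the agreed format: the input keys (`title`, `body`, `author`, `comments`, and so on) and the "Title:" / "Comment by" layout of the content string are assumptions made for the example.

```python
import re
from typing import Any, Dict, Optional

REPO = "llvm/llvm-project"


def normalize_text(text: Optional[str]) -> str:
    """Unify line endings, drop trailing spaces, collapse runs of blank lines."""
    if not text:
        return ""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def pr_to_document(pr: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one scraped PR (hypothetical input shape) into {content, metadata}."""
    # Validate: these two fields are assumed to be mandatory in the payload.
    for key in ("number", "title"):
        if key not in pr:
            raise ValueError(f"scraped PR is missing required field: {key}")

    parts = [f"Title: {normalize_text(pr['title'])}", normalize_text(pr.get("body"))]
    # Comments are appended in input order so the content string stays stable.
    for comment in pr.get("comments", []):
        body = normalize_text(comment.get("body"))
        if body:
            parts.append(f"Comment by {comment.get('author', 'unknown')}:\n{body}")

    metadata = {
        "repo": REPO,
        "type": "PR",
        "number": pr["number"],
        "state": pr.get("state", ""),
        "author": pr.get("author", ""),
        "created_at": pr.get("created_at", ""),
        "labels": list(pr.get("labels", [])),
        "url": pr.get("url", ""),
    }
    return {"content": "\n\n".join(p for p in parts if p), "metadata": metadata}
```

Empty or whitespace-only comments are skipped here so they do not add noise to the content; whether that is desirable depends on the agreed format.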
Result
Library or CLI: scraped payload(s) → list of { content, metadata }. Config for field mapping and truncation. Deliverables: code, tests, and a README documenting the document schema.
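One possible shape for the field mapping and truncation config is sketched below. The class name `PreprocessConfig`, the source key names (`user_login`, `html_url`), and the default character limit are all hypothetical; the real values would come from the agreed scrape format and the embedding model's limits.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


def _default_field_map() -> Dict[str, str]:
    # metadata key -> key in the scraped payload (assumed names)
    return {
        "number": "number",
        "state": "state",
        "author": "user_login",
        "url": "html_url",
    }


@dataclass(frozen=True)
class PreprocessConfig:
    field_map: Dict[str, str] = field(default_factory=_default_field_map)
    max_content_chars: int = 8000  # illustrative default, not a known limit


def map_fields(raw: Dict[str, Any], config: PreprocessConfig) -> Dict[str, Any]:
    """Copy mapped fields from the raw payload, skipping any that are absent."""
    return {
        target: raw[source]
        for target, source in config.field_map.items()
        if source in raw
    }


def truncate(text: str, limit: int) -> str:
    """Cut text to at most `limit` characters, preferring a space boundary."""
    if len(text) <= limit:
        return text
    head = text[:limit]
    cut = head.rfind(" ")
    if cut > 0:
        head = head[:cut]
    return head.rstrip()
```

Keeping the mapping in data rather than code means a change in the scraper's key names only requires a config change, not a code change.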
Acceptance criteria
Accepts scraped llvm/llvm-project PR (+ comments) data in agreed format; outputs stable content + metadata.
Metadata includes identifier, repo (llvm/llvm-project), type (PR), filterable fields. Schema documented.
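A test for the metadata criterion might look like the sketch below. The identifier scheme (`repo#number`) and the required key names are assumptions for illustration. The value-type check reflects the metadata types Pinecone is generally documented to accept (strings, numbers, booleans, lists of strings) and should be confirmed against the current Pinecone documentation.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("id", "repo", "type", "number")  # assumed required set


def make_id(repo: str, number: int) -> str:
    """Build a stable identifier; the 'repo#number' scheme is hypothetical."""
    return f"{repo}#{number}"


def validate_metadata(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the metadata passed."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in metadata:
            problems.append(f"missing required key: {key}")
    for key, value in metadata.items():
        if isinstance(value, (str, bool, int, float)):
            continue
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        problems.append(f"unsupported value type for {key!r}: {type(value).__name__}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a batch run report every bad record in a single pass.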