Preprocessor for scraped llvm/llvm-project PRs & comments → documents for Pinecone #88

## Summary
Preprocess **already scraped** **llvm/llvm-project** pull request data (including comments) into document objects (content + metadata) for Pinecone upsert. No scraping or GitHub API calls.

## Scope
- **Input**: Scraped PR data for **llvm/llvm-project**, including comments.
- **Output**: Documents with `content` and `metadata` (e.g. repo, number, state, author, created_at, labels, url).
- **In scope**: parsing and validating input, normalizing text, including comments in content and chunking, defining the document schema.
- **Out of scope**: fetching from GitHub; calling the Pinecone API.
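
To make the scope concrete, here is a minimal sketch of the PR-to-document step. The input format is not fixed yet, so the input keys (`number`, `title`, `body`, `state`, `author`, `created_at`, `labels`, `url`, `comments`) and the function names `normalize_text` and `pr_to_document` are assumptions for illustration only, as is the `id` format.

```python
import re
from typing import Any

REPO = "llvm/llvm-project"


def normalize_text(text: "str | None") -> str:
    """Normalize line endings, drop trailing spaces, collapse blank-line runs."""
    text = (text or "").replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+\n", "\n", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def pr_to_document(pr: "dict[str, Any]") -> "dict[str, Any]":
    """Build one {content, metadata} document from one scraped PR.

    The input keys used here are hypothetical; adjust them to the agreed format.
    """
    parts = [f"# {pr['title']}", normalize_text(pr.get("body"))]
    # Comments are appended in input order so the content stays stable.
    for comment in pr.get("comments", []):
        body = normalize_text(comment.get("body"))
        parts.append(f"Comment by {comment['author']}:\n{body}")
    content = "\n\n".join(part for part in parts if part)

    metadata = {
        "id": f"{REPO}#pr-{pr['number']}",  # assumed identifier format
        "repo": REPO,
        "type": "PR",
        "number": pr["number"],
        "state": pr["state"],
        "author": pr["author"],
        "created_at": pr["created_at"],
        "labels": list(pr.get("labels", [])),
        "url": pr["url"],
    }
    return {"content": content, "metadata": metadata}
```

The function is pure: it reads only the scraped payload and makes no network calls, which keeps it inside the stated scope.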

## Result
A library or CLI that turns scraped payload(s) into a list of `{ content, metadata }` documents, with configuration for field mapping and truncation. Deliverables: code, tests, and a README documenting the document schema.
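
One possible shape for the truncation config is a small frozen dataclass plus a character-based chunker. This is only a sketch: `PreprocessConfig`, `chunk_text`, `chunk_document`, the `chunk_index`/`chunk_count` metadata keys, and the default limits are all hypothetical, and a token-based splitter could be swapped in later.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class PreprocessConfig:
    """Hypothetical settings; the defaults are placeholders, not recommendations."""
    max_chars: int = 2000  # maximum characters per chunk
    overlap: int = 200     # characters shared between neighbouring chunks


def chunk_text(text: str, cfg: PreprocessConfig) -> "list[str]":
    """Split text into fixed-size character windows with overlap."""
    if cfg.max_chars <= 0 or not 0 <= cfg.overlap < cfg.max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    step = cfg.max_chars - cfg.overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + cfg.max_chars])
        if start + cfg.max_chars >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


def chunk_document(doc: "dict[str, Any]", cfg: PreprocessConfig) -> "list[dict[str, Any]]":
    """Expand one document into per-chunk documents that share its metadata."""
    chunks = chunk_text(doc["content"], cfg)
    return [
        {
            "content": chunk,
            "metadata": dict(doc["metadata"], chunk_index=i, chunk_count=len(chunks)),
        }
        for i, chunk in enumerate(chunks)
    ]
```

Because the split depends only on the text and the config, the same input always yields the same chunks, which helps keep downstream document IDs stable.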

## Acceptance criteria
- [ ] Accepts scraped llvm/llvm-project PR (+ comments) data in agreed format; outputs stable `content` + `metadata`.
- [ ] Metadata includes identifier, repo (llvm/llvm-project), type (PR), filterable fields. Schema documented.
- [ ] No GitHub API or scraping in this component.
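
The first two criteria could be checked mechanically in tests with a small validator like the sketch below. The required keys and the `validate_document` name are assumptions, not an agreed schema. The allowed value types (string, number, boolean, list of strings) reflect Pinecone's documented metadata types as I understand them and should be confirmed against the current docs.

```python
from typing import Any

EXPECTED_REPO = "llvm/llvm-project"
# Hypothetical minimum set of metadata keys, based on the criteria above.
REQUIRED_KEYS = {"id": str, "repo": str, "type": str}


def validate_document(doc: "dict[str, Any]") -> "list[str]":
    """Return a list of problems; an empty list means the document passes."""
    errors = []
    content = doc.get("content")
    if not isinstance(content, str) or not content.strip():
        errors.append("content must be a non-empty string")

    metadata = doc.get("metadata")
    if not isinstance(metadata, dict):
        errors.append("metadata must be a dict")
        return errors

    for key, expected_type in REQUIRED_KEYS.items():
        if not isinstance(metadata.get(key), expected_type):
            errors.append(f"metadata[{key!r}] is missing or not {expected_type.__name__}")
    if metadata.get("repo") != EXPECTED_REPO:
        errors.append(f"metadata['repo'] must be {EXPECTED_REPO!r}")

    # Assumed constraint: values are str, number, bool, or a list of str.
    for key, value in metadata.items():
        if isinstance(value, (str, int, float, bool)):
            continue
        if isinstance(value, list) and all(isinstance(item, str) for item in value):
            continue
        errors.append(f"metadata[{key!r}] has unsupported type {type(value).__name__}")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a test report every schema violation for a payload at once.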
