@enitrat enitrat commented Oct 5, 2025

- dpsy_summarizer.py
  - Added optional per-chunk metadata threading (metas) through the recursive summarization.
  - For every generated section (both leaf and non-leaf ToC headers), aggregate chunk source URLs by frequency and inject a formatted Sources: list directly under the section heading.
  - Fixed heading regex in generate_markdown_toc and extract_headings to correctly honor max_level via .
- mdbook_summarizer.py
  - Inserts a Markdown Table of Contents immediately after the top title.
  - Calls massively_summarize with explicit metas=None (behavior unchanged aside from ToC).
- New: doc_dump_summarizer.py
  - Reads python/doc_dump.md (or any similar doc-dump file), splits pages by “**Source URL:** …” markers, chunks per page, and passes metadata with each chunk.
  - Uses massively_summarize to generate hierarchical summaries with per-section Sources lists.
  - Inserts a Table of Contents after the top title.
- summarizer_factory.py
  - Added DocumentationType.DOCDUMP and registered DocDumpSummarizer.
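
For reviewers, here is a minimal sketch of the page-splitting and Sources aggregation described above. It is not the code in this PR: the function names (`split_pages`, `format_sources`, `inject_sources`), the `{"url": ...}` shape of each meta, the `top_n` cutoff, and the exact marker layout (one `**Source URL:** <url>` line per page) are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed marker layout: each page starts with a line such as
# "**Source URL:** https://example.com/page". The real format may differ.
SOURCE_MARKER = re.compile(r"^\*\*Source URL:\*\*[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def split_pages(dump: str) -> list[tuple[str, str]]:
    """Split a doc dump into (url, page_text) pairs, one per marker line."""
    matches = list(SOURCE_MARKER.finditer(dump))
    pages = []
    for i, match in enumerate(matches):
        # A page runs from the end of its marker to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(dump)
        pages.append((match.group(1), dump[match.end():end].strip()))
    return pages


def format_sources(metas: list[dict], top_n: int = 5) -> str:
    """Build a 'Sources:' list, most frequently seen URL first."""
    # Hypothetical meta shape: {"url": "..."}; metas without a URL are skipped.
    counts = Counter(meta["url"] for meta in metas if meta.get("url"))
    lines = ["Sources:"]
    lines += [f"- {url}" for url, _ in counts.most_common(top_n)]
    return "\n".join(lines)


def inject_sources(section: str, metas: list[dict]) -> str:
    """Insert the Sources list directly under the section's heading line."""
    heading, _, body = section.partition("\n")
    return f"{heading}\n\n{format_sources(metas)}\n\n{body.lstrip()}"
```

In this sketch, each chunk cut from a page would carry that page's URL as its meta, so a section summarized from several chunks can list the URLs that contributed the most chunks first.
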
@enitrat enitrat merged commit 7c5050b into main Oct 5, 2025
3 checks passed