A personal knowledge management repository for organizing notes, references, and insights. Includes an automated pipeline for converting PDFs and web pages to markdown, organized by subject area, with sync to Obsidian.
```
knowledge-management/
├── notes/                  # Atomic notes, fleeting thoughts, and evergreen content
├── references/
│   └── papers/             # Converted markdown output, organized by subject
│       ├── transformers/
│       │   ├── 1706.03762.md
│       │   └── 1706.03762_images/
│       └── cuda/
├── resources/
│   └── sources/            # Input sources, organized by subject
│       ├── transformers/
│       │   └── urls.txt
│       └── cuda/
│           └── urls.txt
├── scripts/
│   ├── convert_pdfs.py     # Cloud agent conversion script
│   └── sync_to_vault.py    # Local Obsidian vault sync script
├── projects/               # Project-specific knowledge and documentation
├── templates/              # Reusable document and note templates
├── sync_config.json        # Subject → Obsidian vault path mapping
├── requirements.txt        # Python dependencies (pymupdf4llm, markitdown)
└── README.md
```
The pipeline converts PDFs and web pages to markdown, extracts images from PDFs, and syncs the output to your Obsidian vault with proper frontmatter for the wiki schema.
- Add source URLs to `resources/sources/<subject>/urls.txt`
- Cloud agent downloads PDFs, converts all sources to markdown (with images), and opens a PR
- Merge the PR and `git pull` locally
- Sync to Obsidian with `python scripts/sync_to_vault.py`
The sync script renames files to the wiki schema (`<source_type>-<slug>.md`), prepends YAML frontmatter, and copies extracted images alongside the markdown.
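The rename-and-frontmatter step can be sketched as follows. This is an illustrative reading of the schema above, not the actual `sync_to_vault.py` code, and the frontmatter field names (`source_type`, `source_url`, `date_added`) are assumptions:

```python
import re
from datetime import date

def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumerics to hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def wiki_filename(source_type: str, title: str) -> str:
    """Build the wiki-schema name <source_type>-<slug>.md."""
    return f"{source_type}-{slugify(title)}.md"

def with_frontmatter(body: str, title: str, source_type: str, url: str) -> str:
    """Prepend a YAML frontmatter block to converted markdown."""
    fm = (
        "---\n"
        f"title: {title}\n"
        f"source_type: {source_type}\n"
        f"source_url: {url}\n"
        f"date_added: {date.today().isoformat()}\n"
        "---\n\n"
    )
    return fm + body
```

For example, `wiki_filename("paper", "Attention Is All You Need")` yields `paper-attention-is-all-you-need.md`.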
Each line follows the format `url | title | source_type`:

```
https://arxiv.org/pdf/1706.03762.pdf | Attention Is All You Need | paper
https://docs.nvidia.com/cuda/cuda-programming-guide/index.html | CUDA Programming Guide | doc
```

Valid source types: `paper`, `blog`, `video`, `course`, `code`, `thread`, `pdf`, `doc`.

Blank lines and lines starting with `#` are ignored.
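Parsing the format above can be sketched as a small helper (hypothetical; the actual scripts may parse differently):

```python
def parse_urls_file(text: str) -> list[dict]:
    """Parse 'url | title | source_type' lines, skipping blanks and # comments."""
    VALID = {"paper", "blog", "video", "course", "code", "thread", "pdf", "doc"}
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        url, title, source_type = (part.strip() for part in line.split("|"))
        if source_type not in VALID:
            raise ValueError(f"unknown source type: {source_type}")
        entries.append({"url": url, "title": title, "source_type": source_type})
    return entries
```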
1. Add the URL to `resources/sources/<subject>/urls.txt`
2. Commit and push
3. Run the cloud agent:
   ```
   oz agent run-cloud --environment 3SyIIpxdPQfIyGOFd7ZQTs --prompt "Run: cd knowledge-management && python scripts/convert_pdfs.py. If any new files were generated, create a PR."
   ```
4. Merge the PR, then `git pull`
5. Run `python scripts/sync_to_vault.py`
1. Create the subject folder with a `urls.txt`:
   ```
   mkdir -p resources/sources/<new-subject>
   ```
   Then add URLs to `resources/sources/<new-subject>/urls.txt`.
2. Add the Obsidian vault mapping in `sync_config.json`:
   ```json
   { "subjects": { "<new-subject>": "/path/to/obsidian/vault/Raw" } }
   ```
3. (Optional) Register a NotebookLM notebook for the subject so new markdown is synced to NotebookLM via the `nlm` CLI. Add the notebook ID under `notebooklm` in `sync_config.json`:
   ```json
   { "notebooklm": { "<new-subject>": "<notebook-id>" } }
   ```
   If omitted, only the NotebookLM push is skipped for this subject; the Obsidian vault sync (`sync_to_vault.py`) still runs normally.
4. Create the Obsidian vault for the subject area with the required layer folders:
   ```
   mkdir -p "/path/to/obsidian/vault/<new-subject>"/{Raw,Wiki,"Learning Path"}
   cp "/path/to/obsidian/vault/<existing-subject>/CLAUDE.md" "/path/to/obsidian/vault/<new-subject>/CLAUDE.md"
   ```
   Then adapt the cloned `CLAUDE.md` to the new domain: strip every term, example, and Learning Path stage scope that belongs to the sibling subject, and substitute domain-native terms. A schema clone is a starting point, not a drop-in.
5. Run the vault bootstrap workflow (defined in the vault's own `CLAUDE.md`, section "Bootstrap workflow"). This is a mandatory one-time step for every new subject-area vault. The bootstrap creates `Wiki/index.md`, `Wiki/log.md`, and `Wiki/overview.md`, plus stub entity/concept pages for the domain's core terms and empty `Learning Path/` stage files. Without bootstrap, the first ingest has nothing to link into: summaries end up as orphans and the wiki graph-connectivity invariant is broken.
6. Commit, push, and follow steps 3-5 from the "Adding a source to an existing subject" section above.
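The per-subject lookup implied by the config shape above can be sketched like this. It is a reading of the `sync_config.json` structure, not the actual implementation, and the function name is hypothetical:

```python
import json

def sync_targets(config_path: str, subject: str):
    """Return (vault_path, notebook_id_or_None) for a subject.

    A missing `notebooklm` entry skips only the NotebookLM push;
    the Obsidian vault sync still runs.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    vault = cfg["subjects"][subject]                    # required mapping
    notebook = cfg.get("notebooklm", {}).get(subject)   # optional mapping
    return vault, notebook
```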
When starting a new session — on any subject-area vault in this knowledge base — the LLM has no memory of prior work. Before giving any new instructions, ask it to orient itself by reading the load-bearing state files so it can pick up where the last session left off without guessing.
The three checks, in order:
1. Read `Wiki/log.md` for the vault you're working in. The append-only log tells you what's been ingested, what stubs were created, what's deferred, and what was touched last. This is the single most load-bearing file for session continuity.
2. Check `Raw/` for unprocessed files. Anything newer than the latest log entry is pending ingest.
3. Glance at `git status` and recent commits. This catches anything changed outside the log (config updates, `CLAUDE.md` edits, new subject areas, URL additions).
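Check 2 can be approximated mechanically by comparing file modification times against the log file. This is a heuristic helper, not part of the repo's scripts:

```python
from pathlib import Path

def pending_ingest(vault: Path) -> list[Path]:
    """List markdown files in Raw/ modified after the last write to Wiki/log.md."""
    log = vault / "Wiki" / "log.md"
    cutoff = log.stat().st_mtime if log.exists() else 0.0
    return sorted(
        p for p in (vault / "Raw").glob("*.md")
        if p.stat().st_mtime > cutoff
    )
```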
Suggested resume prompt (vault-agnostic — just substitute the vault path):
```
Review Wiki/log.md in the <vault-name> vault, check Raw/ for unprocessed files, glance at git status / recent commits, and tell me what state we're in before we start. Don't edit anything yet.
```
If you're working across multiple vaults in one session (e.g., bouncing between the Transformer and CUDA vaults), ask for the orientation check on each vault explicitly — the LLM will only check the vault you name.
The vault's own `CLAUDE.md` contains a longer-form version of this resume prompt tailored to that vault. Use the short form for daily interactive sessions; reach for the long form only when the vault is unfamiliar or it's been weeks since the last session.
- PDFs: `pymupdf4llm`. Lightweight; extracts images inline at their original position.
- Web pages: `markitdown` (Microsoft). Converts HTML URLs directly to markdown.
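A minimal dispatch between the two converters might look like this. It is a sketch, not the actual `convert_pdfs.py`: the routing rule is an assumption, and PDF sources are assumed to have been downloaded to a local path first (`pymupdf4llm` reads files, not URLs):

```python
def needs_pdf_converter(url: str, source_type: str) -> bool:
    """True when a source should go through pymupdf4llm rather than markitdown."""
    return source_type in ("paper", "pdf") or url.lower().endswith(".pdf")

def convert_source(path_or_url: str, source_type: str, image_dir: str = "images") -> str:
    """Convert one source to markdown, choosing the converter by type."""
    if needs_pdf_converter(path_or_url, source_type):
        import pymupdf4llm  # local PDF -> markdown, images written to image_dir
        return pymupdf4llm.to_markdown(path_or_url, write_images=True, image_path=image_dir)
    from markitdown import MarkItDown  # web page -> markdown
    return MarkItDown().convert(path_or_url).text_content
```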
- Environment ID: `3SyIIpxdPQfIyGOFd7ZQTs`
- Docker image: `warpdotdev/dev-base:latest-agents`
- Setup command: `cd knowledge-management && pip install --break-system-packages -r requirements.txt`
Clone the repository and install dependencies:

```
git clone https://github.com/aadehamid/knowledge-management.git
cd knowledge-management
pip install -r requirements.txt
```