The library expansion: two surfaces, the atlas and book, and the whole of Perseus #2
Merged
Locks the shape for the three-front expansion: two surfaces with no URL changes, the atlas and Carried Across as the justification layer, and the first Perseus tranche through the existing works.json -> convert.ts methodology via an additive perseus-works.json input. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The site now has two surfaces with no URL changes: a reader-centric default (Read, Atlas, Book, About) and an engine room (/engine hub linking /try, /eval, /thesis, /numbers) with a quiet crossing in the header. Homepage gains the justification block; /about and /thesis gain cross-links; README gains the why-paragraph and launch rows.
The Naql transmission atlas lives at /atlas: dataset under atlas/data (CC BY 4.0), stemma renderer re-skinned to falsafa tokens via an .atlas-scope bridge (light/dark/sepia follow the site), md siblings and /atlas/graph.json ported, validator at scripts/validate-atlas.ts.
Carried Across lives at /book with a new closing chapter, 'Afterword: why Falsafa': the book's argument stated as the reason this project exists, chapter by chapter mapped to design decisions.
Review fixes folded in: chapter bylines honor the frontmatter translator instead of hardcoding Thothica (links only Thothica's own credits), prev/next reader nav falls back when a neighbor lacks the current variant, /eras/ index page added (was 404 from /numbers), footer credit made conditional, .gitignore inline-comment bug fixed so apps/mcp/corpus/ is actually ignored, house-style pass on all new copy.
Fonts: Amiri, Noto Serif Devanagari/Hebrew, Noto Sans Syriac, IBM Plex Mono via fontsource for the atlas original scripts.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…and search
scripts/perseus/ingest.ts fetches each work's __cts__.xml from the PerseusDL canonical repos, picks the English translation URN, credits the real translator (Fowler, Godley, Shorey, Rackham, Lamb, Dale; catalog-silent cases overridden in tranche-1.json: Butler rev. Perseus for Homer, Leonard for Lucretius, Williams for the Aeneid), and flattens TEI to chapters: book-level divisions win when present, shorter dialogues become single chapters.
Content quality, after adversarial review: TEI <reg> gazetteer expansions stripped (Herodotus no longer hails from Bodrum with coordinates), prose reflowed so soft source wraps stop becoming line-level paragraph IDs (Thucydides Book 1: 4,162 fragments -> 616 real paragraphs), words rejoined across source hyphen breaks, punctuation-only paragraphs dropped, verse detection now requires verse lines to dominate decisively (Republic 2-3 back to prose).
scripts/perseus/apply.ts applies the tranche additively: corpus/ carries post-convert pipeline output (chapter splits, wiki cards, sidecars) that regeneration would destroy, so convert.ts gained --works/--audit/--out flags and a per-chapter translator override instead of an implicit merge.
Licensing recorded: TEI encodings CC BY-SA 4.0, translations public domain; every chapter carries its Scaife reader source_url.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
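The reflow, hyphen-rejoin and punctuation-only cleanup described above can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/perseus/ingest.ts: the function name `reflowParagraphs` is hypothetical, and it assumes that blank lines are the only real paragraph breaks in the flattened text.

```typescript
// Hypothetical helper (not the project's actual code): turn soft-wrapped
// source text into real paragraphs.
// Assumption: blank lines are the only genuine paragraph breaks.
function reflowParagraphs(raw: string): string[] {
  return raw
    .split(/\n\s*\n/) // split on blank lines only
    .map((block) =>
      block
        // Rejoin words broken by a hyphen at a line end ("Athe-\nnians").
        // Caveat: this would also fuse a true compound split at the hyphen.
        .replace(/(\p{L})-\n\s*(\p{L})/gu, "$1$2")
        // Remaining soft wraps become single spaces.
        .replace(/\s*\n\s*/g, " ")
        .trim(),
    )
    // Drop paragraphs containing no letter or digit (punctuation-only).
    .filter((p) => /[\p{L}\p{N}]/u.test(p));
}
```

Under these assumptions, a run of wrapped lines collapses into one paragraph and therefore one paragraph ID, which is the effect the Thucydides numbers describe.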
…links
The library now holds two of the atlas's own books: Hippocrates' Aphorisms (tr. Francis Adams) and Euclid's Elements (tr. Thomas Little Heath), via a second Perseus tranche; the ingester now reads every scripts/perseus/tranche-*.json.
Reconciliation runs both ways through a new optional read_slug field on atlas works: atlas work pages link 'Read this work in the library', and reader pages (work index and chapter pages, covering single-chapter redirect works) link 'Trace this book's journey in the atlas'.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
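A two-way reconciliation of this kind could be checked along the following lines. Only the `read_slug` field comes from the commit; the `AtlasWork` shape, the `reconcile` function and the example slugs are assumptions made for illustration.

```typescript
// Hypothetical shapes: only `read_slug` is named in the commit message.
interface AtlasWork {
  id: string;
  title: string;
  read_slug?: string; // optional link from an atlas work to a library work
}

interface Reconciliation {
  // library slug -> atlas work, used for the reader-side back-link
  atlasByReadSlug: Map<string, AtlasWork>;
  // atlas works whose read_slug points at no library work
  dangling: AtlasWork[];
}

function reconcile(atlasWorks: AtlasWork[], librarySlugs: Set<string>): Reconciliation {
  const atlasByReadSlug = new Map<string, AtlasWork>();
  const dangling: AtlasWork[] = [];
  for (const work of atlasWorks) {
    if (!work.read_slug) continue; // atlas works without a readable copy are fine
    if (librarySlugs.has(work.read_slug)) {
      atlasByReadSlug.set(work.read_slug, work);
    } else {
      dangling.push(work);
    }
  }
  return { atlasByReadSlug, dangling };
}
```

An atlas page would render its 'Read this work in the library' link from `read_slug`, a reader page would look itself up in `atlasByReadSlug` for the return link, and a non-empty `dangling` list would fail a link check.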
The atlas now maps the library's own books: Homer (Pilatus for Boccaccio, Chapman, Pope, Butler), Republic (Ficino, Jowett, Shorey), Nicomachean Ethics (the disputed Arabic credit, Grosseteste, Rackham), Thucydides (Valla for Nicholas V, Hobbes, Dale), Herodotus (Valla, Godley), Lucretius (Creech, Leonard, with Poggio's 1417 find in the work detail), Aeneid (Dryden, Williams), Manusmriti (Jones 1794 Calcutta, Buhler 1886), and four modern chains that carry the atlas into the present: Comte's Traite de legislation, Dunoyer's Nouveau traite, Fichte's Zuruckforderung and the Diwan-e-Ghalib, each crossing into English at New Delhi in 2026 with David Hart and Adnan Abbasi recorded as carriers.
Dataset: 27 works, 77 transmissions, 98 people, 13 languages (French, German, Urdu added), 21 places (Florence, Paris, Calcutta, New Delhi, Boston added), 69 sources. Year cap raised to 2100; field enum gains poetry, history, law, economics. All new works carry read_slug, so every chain links to its readable copy and back.
The book gains two chapters written from the corpus through our own MCP: 'Carried by mention' (Comte, Dunoyer, Fichte: books famous in citation and absent in translation, with the wiki layer's cosine arithmetic finding Fichte's nearest neighbor is Comte) and 'The fascination clause' (the modern crossing: Ghalib's radif kept in English, the AI-assisted workshop under Hunayn's Risala standard, the carrier who can read his own atlas entry). Afterword renumbered to 12.
/about#sources now credits David Hart's Digital Library of Liberty and Power and the Perseus Digital Library by name.
Claims authored conservatively pending the adversarial verification pass (running); corrections land as a follow-up.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…d; Hart kept credit-only
Ten research agents audited the new chains and chapters. Corrections
applied: Chapman's Odyssey tightened to 1614-15; Pilatus's Odyssey
finished by 1362; Butler's Perseus revisers named (Timothy Power and
Gregory Nagy, from the TEI headers) in corpus credits and chain notes;
the Arabic Ethics credited as composite (Ishaq books I-IV, Ustath of
al-Kindi's circle V-X; new person record); Grosseteste hedged to first
complete Latin to survive and circulate; Dale corrected to 1848-49 and
upgraded to attested; Creech qualified as first PUBLISHED complete
English (Hutchinson's manuscript stayed unprinted until 1996); the
Lucretius detail no longer implies O and Q enabled the 1417 find;
Jones reframed as proposing the project to Cornwallis himself, with
his 1785 'mercy of our pandits' letter; Republic phrased as 'no
complete Arabic translation survives'.
The big catch: complete English Ghalib translations exist (Niazi 2002,
Rahman 2003), so chapter 11 and the atlas entry now claim what is
true: the form at scale, refrains intact, originals on every line.
Chapter 10 gains the attested Heliopolis false imprint ('in the last
year of the old darkness', actually Danzig) and Mill quoting Dunoyer;
chapter 11 gains Azad fleeing the 1857 sack with Zauq's ghazals.
Hart licensing verdict: no license statement exists anywhere on
davidmhart.com, so his own 2025-26 translations stay OUT (credit-only,
permission email drafted for Adnan). scripts/hart/ingest.ts ingests
public-domain editions only; first: Molinari's Society of Tomorrow
(1904 Lee Warner translation), 32 chapters, 50k words. Corpus: 52
works. apply.ts generalized to any works/audit input pair.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ir own book
The style-and-claims tester caught the chapters inventing
back-references: phantom patrons attributed to chapter four (Hunayn
never had caliphs there; ch3 says his work was privately
commissioned), a 'rarest pattern' chapter five never claimed, a
machines-check-librarians line chapter nine never wrote, London
subscription lists the book never mentioned, and a Comte volume
undercount. All grounded now in what the chapters actually say
(the caliph's rush order for the Topics, the Bakhtishu commissions,
Cambridge's debtors' prison letter, Comte's six volumes), with
negative-parallelism and tricolon density thinned, the Comte quote's
silent truncation marked, and Zauq's radif quoted as the corpus
actually has it ('pain, it struck home').
Molinari's Society of Tomorrow now credits its real translator,
P. H. Lee Warner (1904), instead of thothica: the Hart ingester sets
per-chapter translator credits the way the Perseus one does.
All four testers otherwise green: zero broken links across 6,013
hrefs on the new surfaces, all 13 reconciliation pairs verified both
ways, pagefind serving Aphorisms and Molinari content, corpus
integrity 52/52 works with clean bodies, naql parity build green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… works
Per Adnan: nothing Hart translated himself, only what is originally English or out-of-copyright. Ingested from the Digital Library of Liberty and Power's Guillaumin reader editions: Paine's Common Sense (1776 first-edition text, 4 chapters), Adam Smith's Wealth of Nations (Cannan edition 1904, 36 chapters, 390k words) and Mill's Principles of Political Economy (7th ed. 1871, 76 chapters, 392k words).
The Hart ingester gained an original_english mode (stored as 'original' variants, no translator credit, description without a translation line), a heading_allow regex for the multi-pane reader pages that mix chrome with content, and cleanup passes for Cannan-style margin-note spans, bracketed endnote anchors, and edition page markers ([ I-6 ], [ II-410 ], [ 113 ]); the bracket inventory after ingestion is one [English] and one numeric remnant across 780k words.
ChapterBody's pagefind condition now indexes native-English originals alongside translations, matching the MCP's english search scope.
Build: 2,205 pages indexed, +116 = exactly the new chapters.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
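As an illustration of the page-marker cleanup, a pass shaped like the one below would remove the three marker forms quoted above while leaving editorial brackets such as [English] alone. It is a sketch under assumptions: the function name is hypothetical, and the pattern is inferred only from the three examples in the commit message, so the real ingester may match a wider or narrower set.

```typescript
// Hypothetical cleanup pass, inferred from the examples "[ I-6 ]",
// "[ II-410 ]" and "[ 113 ]": an optional Roman volume prefix, a page
// number, and a space inside each bracket.
const PAGE_MARKER = /\[ (?:[IVXLC]+-)?\d+ \]/g;

function stripPageMarkers(text: string): string {
  return text
    .replace(PAGE_MARKER, "") // remove the markers themselves
    .replace(/ {2,}/g, " ") // collapse the double spaces they leave behind
    .trim();
}
```

Requiring the inner spaces is a deliberate choice in this sketch: it keeps the pattern from touching unspaced brackets, which the commit treats as a separate cleanup (endnote anchors).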
…de, FTS5 search
The corpus now carries every work in the PerseusDL canonical repos with an English translation: 826 archive works ingested (only 5 minor Homeric Hymns yielded no extractable chapters), joining the curated catalog for 879 works, 8,456 chapters, ~751MB of text. 793 works pair the source-language original with the translation; the originals come from the canonical repos and, where those carry no edition (Didache, the Clement epistles, the apostolic-era texts), from OpenGreekAndLatin/First1KGreek. Greek searches hit Greek: the index answers the Odyssey's first words in under a millisecond.
Pipeline: scripts/perseus/enumerate.ts walks blobless clones of both canonical repos; ingest-archive.ts processes the worklist in batches of 25 with applies after every batch, a 2.5GB disk watermark with a resumable remainder file, catalog-driven metadata (textgroup author, edition translator credits), and a URN-to-file fallback that recovered Trachiniae and Athenaeus. Chapter divisions honor book, chapter, poem, act, and now letter, which turned Cicero's two letter collections from single 200k-word slabs into 445 and 454 individually citable letters. Shared TEI flattening lives in scripts/perseus/lib.ts.
Search plumbing (the SQLite FTS5 decision from the Perseus launch eng review, now real): scripts/build-search-index.ts builds corpus/search.db (705,114 paragraphs, 424MB, gitignored, rebuilt by script) over the paragraph sidecars; the MCP's search_corpus uses it through node:sqlite in production and bun:sqlite from source, phrase match first, token AND second, legacy filesystem scan as the fallback for index misses and case-sensitive queries. Measured: 597ms scan to 2.6ms FTS on the same query. Pagefind on the site side: 9,592 indexed fragments, up from 2,107.
Tests: corpus integrity sampled at scale (zero TEI artifacts in 461 sampled variants), MCP smoke 8/8 including Greek original search, work-scoped queries, and paragraph round-trips.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
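The 'phrase match first, token AND second' ordering can be sketched as a small query planner that emits FTS5 MATCH strings in the order they would be tried. The function names are hypothetical and this is not the search_corpus implementation; it relies only on standard FTS5 query syntax (double-quoted strings with doubled inner quotes, and the AND operator).

```typescript
// Quote one FTS5 string; an embedded double quote is escaped by doubling it.
function quoteFts(term: string): string {
  return `"${term.replace(/"/g, '""')}"`;
}

// Hypothetical planner: returns MATCH expressions in the order to try them.
// An empty result means the caller should go straight to the fallback scan.
function ftsQueryPlan(query: string): string[] {
  const tokens = query.trim().split(/\s+/).filter(Boolean);
  if (tokens.length === 0) return [];
  const phrase = quoteFts(tokens.join(" ")); // exact phrase first
  if (tokens.length === 1) return [phrase]; // phrase and AND coincide
  const allTokens = tokens.map(quoteFts).join(" AND "); // every token, any position
  return [phrase, allTokens];
}
```

A caller would run each expression against the index in turn, stop at the first one that returns rows, and fall back to the filesystem scan if none does.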
… first run
The corpus outgrew the tarball, so the package now ships code only (104 KB, was ~104 MB). On first run the server downloads the corpus-2026-06-11 release asset (185 MB compressed, 879 works), verifies its sha256, extracts to the user cache (XDG/macOS Caches), and builds the FTS5 search index locally via node:sqlite, falling back to the filesystem scan where sqlite is unavailable. All progress on stderr; stdout stays MCP-clean.
Cold-tested end to end from the packed tarball with an empty cache: download, checksum, extraction, index build, then over live stdio: initialize (0.2.0), list_works genre=Classics -> 824, search_corpus 'wrath of Achilles' -> fts5 engine, 3 hits, 1 ms, get_passage round-trip, and Greek 'μῆνιν ἄειδε' -> the Iliad's original variant.
prepack no longer copies the corpus; READMEs tell the truth about the download size.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
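The checksum step might look like the sketch below, which uses Node's built-in node:crypto. The function names are hypothetical, and for brevity it hashes a buffer already in memory; for a 185 MB asset the real server more plausibly hashes a stream, which this sketch does not show.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Hypothetical verification step: throws if the downloaded bytes do not
// match the expected digest. Progress goes to stderr via console.error so
// that stdout carries nothing but MCP protocol traffic.
function verifyChecksum(data: Uint8Array, expectedHex: string): void {
  const actual = sha256Hex(data);
  if (actual !== expectedHex.toLowerCase()) {
    throw new Error(`sha256 mismatch: expected ${expectedHex}, got ${actual}`);
  }
  console.error(`sha256 verified (${data.byteLength} bytes)`);
}
```

Failing before extraction means a truncated or tampered download never reaches the user cache.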
…Livy mended
One regex family caused three classes of damage: some Perseus catalog files write <ti:edition> and some write plain <edition> in the same namespace, and the parsers only matched the prefixed form. Where a catalog used the bare form we lost the source-language edition (54 Latin works without originals, Virgil's Eclogues among them), the textgroup author (52 archive works credited to 'Unknown' when the catalog knew Petronius, Phaedrus, Livy), and translator credits (Epictetus' Fragments are Thomas Wentworth Higginson's, the Eclogues James Rhoades's). All catalog regexes now accept both forms.
53 affected works re-ingested with corrected authors, credits and originals; 63 stale slugs pruned (the author is part of the slug, so a fixed author means a new address) via the new scripts/perseus/prune.ts, including ten per-book Livy fragments that duplicated Ab urbe condita books 1-10. Livy now stands as Titus Livius: Ab urbe condita, ten books, 275k words, Latin paired with the Roberts translation, with the periochae of the lost books and books 21-45 properly attributed alongside.
Corpus: 860 works, 8,426 chapters; search.db rebuilt (709,276 paragraphs). Verified: 'Tityre tu patulae' hits the Eclogues Latin original through FTS5.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
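The fix amounts to making the namespace prefix optional in each catalog pattern. A minimal sketch of that idea follows; the helper name is hypothetical and the project's actual regexes are not shown in the commit, so treat the pattern as one plausible form rather than the shipped one.

```typescript
// Hypothetical helper: collect the inner text of every <ti:NAME> or bare
// <NAME> element. `localName` is expected to be a plain element name such
// as "edition"; it is interpolated into the pattern unescaped.
function extractElements(xml: string, localName: string): string[] {
  const pattern = new RegExp(
    // (?:ti:)? accepts both the prefixed and the bare form; \b stops
    // "edition" from matching a longer name such as "editionStmt".
    `<(?:ti:)?${localName}\\b[^>]*>([\\s\\S]*?)</(?:ti:)?${localName}>`,
    "g",
  );
  return [...xml.matchAll(pattern)].map((m) => (m[1] ?? "").trim());
}
```

With a pattern that demanded the `ti:` prefix, the second element in a catalog mixing both forms would be silently skipped, which is the failure the commit describes.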
…ll 'Ancient'
The full-archive ingester labelled every Perseus work 'Ancient' as a placeholder. A philologist agent classified all 80 authors by floruit into five periods, an adversarial checker verified the dates against reference scholarship (one correction: Dinarchus is early Hellenistic, not Classical, since he kept working after 321 BCE), and the result is applied here.
Boundaries: Classical (to 323 BCE), Hellenistic (323-31 BCE, the Hellenistic Greek world and the Roman Republic), Imperial (31 BCE-284 CE, Augustan Rome, the high Empire, the Second Sophistic, the New Testament and Apostolic Fathers), Late Antiquity (284-600 CE), Medieval (after 600). Distribution: Classical 308, Hellenistic 120, Imperial 364, Late Antiquity 26, Medieval 15; the Indic smritis keep their own 'Ancient' (a different civilization, correctly not folded into the Greek scheme).
Two 'Unknown' Latin works got their real authors back along the way: the Ecclesiastical History of the English Nation is Bede (Medieval), and the Loeb 'Select Letters' (stoa0040.stoa011) are Augustine's Epistulae (Late Antiquity).
scripts/perseus/reclassify-eras.ts patches index.md frontmatter and the manifest facet from era-map.json. The homepage strip and /eras order the new periods chronologically, each with an editorial intro.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
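The boundaries above translate into a simple lookup. The sketch below is illustrative only: the commit applies a reviewed per-author map (era-map.json) rather than computing eras from a number, the function name is hypothetical, and which side each boundary year falls on is an assumption, since the message does not say.

```typescript
type Era = "Classical" | "Hellenistic" | "Imperial" | "Late Antiquity" | "Medieval";

// Signed years: negative is BCE, positive is CE.
// Assumption: each boundary year belongs to the later period, except 600 CE,
// which stays in Late Antiquity because Medieval is given as "after 600".
function eraForFloruit(year: number): Era {
  if (year < -323) return "Classical";
  if (year < -31) return "Hellenistic";
  if (year < 284) return "Imperial";
  if (year <= 600) return "Late Antiquity";
  return "Medieval";
}
```

Under these assumptions an author still active in 321 BCE (`eraForFloruit(-321)`) falls in the Hellenistic period, in line with the Dinarchus correction.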
Nine commits, one day's work, three phases.
Two surfaces, one site. The reader surface (Read, Atlas, Book, About) is the default; the technical artifact lives behind /engine with the existing URLs untouched. The homepage leads with the library and closes with why it exists.
The justification layer. The Naql transmission atlas renders at /atlas (27 works, 77 transmissions, 99 people, isnad-graded, reaching the year 2026 with the project's own crossings as entries) and the book Carried Across reads at /book, thirteen chapters including two written from the corpus through the project's own MCP. Atlas and library link each other both ways. All historical claims passed a ten-agent adversarial verification pass; the test fleet's findings (including the chapters misquoting their own earlier chapters) were repaired before landing.
The whole of Perseus. Every work in the PerseusDL canonical repos with an English translation: 879 works, 8,456 chapters, with the Greek or Latin original paired beside the translation on 793 of them (First1KGreek fills where the canonical repos lack editions). Cicero's letter collections are 445 and 454 individually citable letters. Plus David Hart's public-domain shelf: Smith's Wealth of Nations (Cannan 1904), Mill's Principles, Paine's Common Sense, the 1904 Molinari (his own 2025-26 translations deliberately excluded; no license on the site).
Search at scale. SQLite FTS5 over 705k paragraphs (the plumbing the launch eng review locked): MCP queries at 1-9ms against ~600ms scans, Greek queries hitting Greek, legacy scan preserved as fallback. corpus/search.db is gitignored and rebuilt by scripts/build-search-index.ts. Pagefind: 9,592 fragments.
Merging deploys falsafa.ai. Two things to know first: @falsafa/mcp can no longer bundle this corpus in its tarball (distribution needs the CDN/lazy plan before the next publish), and the Vercel build must run scripts/build-search-index.ts or skip it gracefully (the MCP falls back to scan without it).
🤖 Generated with Claude Code