@internetarchive

Internet Archive

The Internet Archive is "the library of the Internet", and a big supporter of Free Software.

Pinned

  1. openlibrary Public

    One webpage for every book ever published!

    Python · 5.8k stars · 1.6k forks

  2. bookreader Public

    The Internet Archive BookReader

    JavaScript · 1.1k stars · 440 forks

  3. heritrix3 Public

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Java · 3k stars · 766 forks

  4. cicd Public

    Build and test using the GitHub registry; deploy to Nomad clusters

    19 stars · 2 forks

Repositories

Showing 10 of 263 repositories
  • Zeno Public

    State-of-the-art web crawler 🔱

    Go · 283 stars · AGPL-3.0 · 41 forks · 29 open issues (3 need help) · 4 open pull requests · Updated Jul 22, 2025
  • brozzler Public

    brozzler: distributed browser-based web crawler

    Python · 725 stars · Apache-2.0 · 103 forks · 34 open issues · 17 open pull requests · Updated Jul 21, 2025
  • warcprox Public

    WARC writing MITM HTTP/S proxy

    Python · 415 stars · 60 forks · 19 open issues · 2 open pull requests · Updated Jul 22, 2025
  • iaridash Public

    IARI Dashboard

    JavaScript · 2 stars · AGPL-3.0 · 0 forks · 0 open issues · 0 open pull requests · Updated Jul 21, 2025
  • iari Public

    Import workflows for the Wikipedia Citations Database

    Python · 12 stars · GPL-3.0 · 9 forks · 56 open issues · 0 open pull requests · Updated Jul 21, 2025
  • wayback-machine-webextension Public

    A web browser extension for Chrome, Firefox, Edge, and Safari 14.

    JavaScript · 721 stars · AGPL-3.0 · 218 forks · 78 open issues · 8 open pull requests · Updated Jul 21, 2025
  • gocrawlhq Public

    Go client for Crawl HQ v3

    Go · 2 stars · AGPL-3.0 · 2 forks · 0 open issues · 0 open pull requests · Updated Jul 21, 2025
  • openlibrary Public

    One webpage for every book ever published!

    Python · 5,752 stars · AGPL-3.0 · 1,578 forks · 775 open issues (18 need help) · 120 open pull requests · Updated Jul 21, 2025
  • HTML · 7 stars · 2 forks · 1 open issue · 1 open pull request · Updated Jul 21, 2025
  • openlibrary-bots Public

    A repository of cleanup bots implementing the openlibrary-client

    Python · 72 stars · 54 forks · 27 open issues (3 need help) · 9 open pull requests · Updated Jul 21, 2025