Collaborative collection development for web archives

Cobweb is an effort by the California Digital Library (CDL), Harvard Library, and UCLA Library to build a collaborative collection development platform that supports the creation of comprehensive web archives by coordinating the independent activities of the web archiving community.

For more context about Cobweb, visit

To follow active development of Cobweb, visit

Our Vision for the Future

Imagine a fast-moving news event unfolding online via news reports, videos, blogs, and social media. Recognizing the importance of recording this event, a curator immediately creates a new Cobweb project and issues an open call for nominations of relevant web sites. Scholars, subject area specialists, interested members of the public, and event participants themselves quickly respond, contributing to a site list that is more comprehensive than could be created by any one curator or institution. Archiving institutions review the site list and publicly claim responsibility for capturing portions of it that are consistent with local collection development policies and technical capacities. After capture, the institutions’ holdings information is updated in Cobweb to disclose the various collections containing newly available content. By distributing the responsibility, more content is captured more quickly with less overall effort than would otherwise be possible.

Current Challenges

The demands of archiving the web in comprehensive breadth or thematic depth exceed the technical and financial capacity of any single institution. This makes community-based cooperation all the more desirable, but such cooperation depends on automated support for coordinating distributed responsibilities. However, as identified in a recent environmental scan by the Harvard Library, there currently are no effective means for curators or researchers to know what is or is not being captured and archived by others, resulting in “duplication or gaps in coverage and siloed collections” (Harvard Library, 2016). Even the Internet Archive (IA) currently supports search by known URL only. This means that IA “will allow you to find a needle in a haystack, but only if you already know approximately where the needle is” (Broussard, The Atlantic, November 20, 2015). The Memento protocol is another initiative that aids discovery, but again, only if a desired URL is known in advance.

The International Internet Preservation Consortium (IIPC), of which all three project partners are members, has tried collaborative collecting using a nomination tool from the University of North Texas (UNT) along with other ad hoc methods such as spreadsheets and email. While a valuable resource, the UNT tool supports nomination only and does not support other critical collecting activities; in particular, it has no mechanisms for indicating either an institution’s collecting intentions or its actual holdings. Archive-It (AIT), IA’s subscription service, has often been used for cross-institutional projects; however, IA does not have the legal, managerial, or technical infrastructure to support large-scale, cross-institutional collecting, especially when the collaborating institutions do not already have formal AIT agreements.

Collaborative Collection Development

While there are a number of tools that address some aspects of the collaborative collection development problem, they do not form a single integrated system as is envisioned with the Cobweb platform. As a centralized catalog of aggregated collection- and seed-level descriptive metadata, Cobweb will enable a range of desirable collaborative, coordinated, and complementary collecting activities by supporting three key functions: nominating, claiming, and holdings. The nomination function will let curators and stakeholders suggest web sites pertinent to specific thematic areas and provide seed-level descriptive metadata; the claiming function will allow archival programs to indicate an intention to capture some subset of nominated sites; and the holdings function will allow programs to document captured sites along with their collection-level description, structural and temporal scope, preservation policies, and terms of use. Cobweb will leverage existing tools and sources of archival information, exploiting, for example, the APIs being developed for AIT to retrieve holdings information for over 3,500 collections from 350 institutions.
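The relationship between the three functions can be illustrated with a minimal sketch. It is not the Cobweb data model: the `Project` class, its method names, and its fields are hypothetical, chosen only to show how nominations, claims, and holdings for the same seed URL could be tracked together so that coverage gaps become visible.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical collecting project; not the actual Cobweb schema."""
    title: str
    nominations: dict = field(default_factory=dict)  # seed URL -> descriptive metadata
    claims: dict = field(default_factory=dict)       # seed URL -> set of institutions
    holdings: dict = field(default_factory=dict)     # seed URL -> list of holding records

    def nominate(self, url, nominator, description=""):
        # Keep the first nomination's metadata if the seed is nominated again.
        self.nominations.setdefault(
            url, {"nominator": nominator, "description": description}
        )

    def claim(self, url, institution):
        # An institution states its intention to capture a nominated seed.
        if url not in self.nominations:
            raise KeyError(f"{url} has not been nominated")
        self.claims.setdefault(url, set()).add(institution)

    def record_holding(self, url, institution, collection, terms_of_use=""):
        # An institution documents that a capture now exists in one of its collections.
        if url not in self.nominations:
            raise KeyError(f"{url} has not been nominated")
        self.holdings.setdefault(url, []).append(
            {
                "institution": institution,
                "collection": collection,
                "terms_of_use": terms_of_use,
            }
        )

    def unclaimed(self):
        # Nominated seeds that no institution has offered to capture.
        return sorted(u for u in self.nominations if u not in self.claims)

    def claimed_but_not_held(self):
        # Seeds with a stated intention to capture but no documented holding yet.
        return sorted(u for u in self.claims if u not in self.holdings)


project = Project("Example news event")
project.nominate("https://example.org/a", "curator", "Local news coverage")
project.nominate("https://example.org/b", "scholar")
project.nominate("https://example.org/c", "member of the public")
project.claim("https://example.org/a", "Institution X")
project.claim("https://example.org/b", "Institution Y")
project.record_holding("https://example.org/a", "Institution X", "Example collection")

print(project.unclaimed())
print(project.claimed_but_not_held())
```

In this sketch, the two query methods surface what the text describes as gaps in coverage: seeds nobody has claimed, and seeds that were claimed but have no documented holding yet.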

The platform will further IMLS’s efforts towards developing a national digital platform for managing our digital heritage, helping libraries and archives make better informed decisions regarding the allocation of their resources, and promoting effective institutional collaboration and sharing. It also addresses IMLS’s strategic goals by facilitating learning through more effective discovery, and ultimate use, of relevant content; permitting libraries and archives to be more responsive to the needs of their constituencies by letting them scale their efforts to their capabilities; and increasing the overall efficiency of collaborative solutions to common problems.

Partners and Stakeholders

The Cobweb project is a partnership of the CDL, Harvard Library, and UCLA Library, which have extensive expertise in web archiving, digital library infrastructure and services, collection development policy, and software development. An external advisory board will review and provide input throughout the project. The partners will also consult an informal but engaged stakeholder group, which will provide input and feedback throughout an iterative development process. Stakeholders include the IIPC, IA/AIT, Library of Congress, George Washington University Libraries, MIT, the New York Art Resources Consortium (NYARC), Old Dominion University, Stanford University Libraries, UNT, and others interested in adopting and contributing to the platform.


This project was made possible in part by the Institute of Museum and Library Services, #LG-70-16-0093-16. The views, findings, conclusions or recommendations expressed in this wiki do not necessarily represent those of the Institute of Museum and Library Services.