Cobweb is an effort by the California Digital Library (CDL), Harvard Library, and UCLA Library to create a collaborative collection development platform. It supports the creation of comprehensive web archives by coordinating the independent activities of the web archiving community.
For more context about Cobweb, visit https://www.cdlib.org/services/collections/webarchiving/cobweb/
To follow active development of Cobweb, visit https://github.com/CobwebOrg/cobweb-django
Our Vision for the Future
Imagine a fast-moving news event unfolding online via news reports, videos, blogs, and social media. Recognizing the importance of recording this event, a curator immediately creates a new Cobweb project and issues an open call for nominations of relevant web sites. Scholars, subject area specialists, interested members of the public, and event participants themselves quickly respond, contributing to a site list that is more comprehensive than could be created by any one curator or institution. Archiving institutions review the site list and publicly claim responsibility for capturing portions of it that are consistent with local collection development policies and technical capacities. After capture, the institutions’ holdings information is updated in Cobweb to disclose the various collections containing newly available content. By distributing the responsibility, more content is captured more quickly with less overall effort than would otherwise be possible.
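The scenario above implies three kinds of records: nominations, claims, and holdings. The following is a minimal sketch of that workflow, not the actual Cobweb data model; the `Project` class, its method names, and the example URLs and institution name are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical collecting project; every name here is an assumption."""

    title: str
    # Seed URLs nominated by anyone responding to the open call.
    nominations: set[str] = field(default_factory=set)
    # URL -> institutions that have publicly claimed responsibility for it.
    claims: dict[str, set[str]] = field(default_factory=dict)
    # URL -> institutions whose collections now contain a capture of it.
    holdings: dict[str, set[str]] = field(default_factory=dict)

    def nominate(self, url: str) -> None:
        self.nominations.add(url)

    def claim(self, institution: str, url: str) -> None:
        # Only nominated sites can be claimed in this sketch.
        if url not in self.nominations:
            raise ValueError(f"{url} has not been nominated")
        self.claims.setdefault(url, set()).add(institution)

    def record_holding(self, institution: str, url: str) -> None:
        self.holdings.setdefault(url, set()).add(institution)

    def unclaimed(self) -> set[str]:
        # Nominated URLs that no institution has yet claimed.
        return self.nominations - set(self.claims)


project = Project("Breaking news event")
project.nominate("https://example.org/live-blog")
project.nominate("https://example.org/video")
project.claim("Library A", "https://example.org/live-blog")
project.record_holding("Library A", "https://example.org/live-blog")
print(project.unclaimed())  # nominated sites still open for another institution to claim
```

Keeping claims separate from holdings mirrors the distinction the scenario draws between an institution's stated intention to capture a site and the content it actually holds afterward.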
The demands of archiving the web in comprehensive breadth or thematic depth exceed the technical and financial capacity of any single institution. This strengthens the case for community-based cooperation, which in turn depends on automated support for coordinating distributed responsibilities. However, as identified in a recent environmental scan by the Harvard Library, there are currently no effective means for curators or researchers to know what is or is not being captured and archived by others, resulting in “duplication or gaps in coverage and siloed collections” (Harvard Library, 2016). Even the Internet Archive (IA) currently supports search by known URL only. This means that IA “will allow you to find a needle in a haystack, but only if you already know approximately where the needle is” (Broussard, The Atlantic, November 20, 2015). The Memento protocol is another initiative that aids discovery, but again, only if a desired URL is known in advance.
The International Internet Preservation Consortium (IIPC), of which all three project partners are members, has attempted collaborative collecting using a nomination tool from the University of North Texas (UNT) along with ad hoc methods such as spreadsheets and email. While a valuable resource, the UNT tool supports nomination only and does not support other critical collecting activities; in particular, it has no mechanisms for indicating either an institution’s collecting intentions or its actual holdings. Archive-It (AIT), IA’s subscription service, has often been used for cross-institutional projects; however, IA does not have the legal, managerial, or technical infrastructure to support large-scale, cross-institutional collecting, especially when the collaborating institutions do not already have formal AIT agreements.
Collaborative Collection Development
The platform will further the efforts of the Institute of Museum and Library Services (IMLS) toward developing a national digital platform for managing our digital heritage, helping libraries and archives make better-informed decisions regarding the allocation of their resources, and promoting effective institutional collaboration and sharing. It also addresses IMLS’s strategic goals by facilitating learning through more effective discovery, and ultimate use, of relevant content; permitting libraries and archives to be more responsive to the needs of their constituencies by letting them scale their efforts to their capabilities; and increasing the overall efficiency of collaborative solutions to common problems.
Partners and Stakeholders
The Cobweb project is a partnership of the CDL, Harvard Library, and UCLA Library, which have extensive expertise in web archiving, digital library infrastructure and services, collection development policy, and software development. An external advisory board will review and provide input throughout the project. The partners will also consult an informal but engaged stakeholder group for input and feedback as part of an iterative development process. Stakeholders include the IIPC, IA/AIT, Library of Congress, George Washington University Libraries, MIT, the New York Art Resources Consortium (NYARC), Old Dominion University, Stanford University Libraries, UNT, and others interested in adopting and contributing to the platform.
This project was made possible in part by the Institute of Museum and Library Services, #LG-70-16-0093-16. The views, findings, conclusions or recommendations expressed in this wiki do not necessarily represent those of the Institute of Museum and Library Services.