Design Principles for System Features & Capabilities
We are collaborating across the Library of Congress to support this vision: creating new pathways of engagement, scholarship, and serving the public—while making history with people, staff, collections and systems—by inviting public participation in a transformative project that improves discovery of the treasures of the Library of Congress.
Two principles guide these proposed features and the ways we can launch, manage, and sustain a crowdsourced transcription and tagging platform at the Library of Congress:
Trust and Approachability
The system features and capabilities envisioned below will support a user-centered crowdsourcing initiative through functionality and a program of engagement.
Participation by many audiences in the same place, at the same time
We have a unique opportunity to support many audiences including, but not limited to, a curious public, researchers, teachers, students, those seeking volunteer opportunities, and staff. This crowdsourcing platform presents a digital space in which audiences with different, but possibly overlapping, interests, skill levels, and familiarity with the Library of Congress may gather together. This vision builds on the work Library of Congress staff have invested in crafting programs that serve a range of audiences based on their specific needs. We are likely to have a dynamic and constantly changing atmosphere of participants and collections; here are some essential approaches and features to accommodate many participants in the same place, at the same time.
Lowest barrier to entry: No accounts necessary (but possible)
To ensure that every visitor can contribute and begin their Library of Congress crowdsourcing experience immediately, it would be best that accounts are not required to participate. Accounts may still be possible but not set as a prerequisite to contribute. Entries by "anonymous" users can be assessed separately or in a wider pool of contributions. As discussed below, it will be important to understand activity on the site; therefore, care should be taken to inform even anonymous users about the ways their behavior is captured. It is essential that these details are explained with clear language about the need for and use of that information.
Library of Congress audiences and visitors have a range of needs. This project should at all times meet the needs of all audiences. Examples of ways to create a welcoming and accommodating space include making it clear how to customize font sizes, interface adjustments, font styles and color choices, and plain language for instructions and project contexts. There are opportunities to invite participation in the system and use of the collections in many ways including reviewing and integrating voice to text capabilities, particularly for tagging. By meeting and thoughtfully surpassing compliance standards, the Library of Congress can create pathways for increased accessibility within the transcription workflow - such as the simple addition of a field to create alt text for the images. Captioned video content to support collections contexts, blog posts about discoveries and activity updates, and other featured content would also foreground that all audiences, even those who are not directly transcribing or tagging, are invited.
Multiple entry points
We should anticipate visitors entering from many different paths. A visitor may be routed through the homepage into a collection and then select an individual page to transcribe and tag; be referred by a post on a social network; find a page in internet search engine results; be introduced via https://loc.gov or simply word of mouth; or return to transcribe and tag again via their own account page. It should always be clear how a visitor would navigate to the instructions, to an overview of the crowdsourcing program, to one's account pages (if one has created an account), and "up" to the project or campaign page(s) when entering on an individual transcription or tagging page from a persistent unique URL.
Many Paths through Digital Collections
Heterogeneous subjects, forms, stories, and tasks are key to sustaining a long-term and representative crowdsourcing project. It is likely that many visitors will follow the sequential presentation of images, just as if they were reading a diary or letter, or following other documentation as it unfolds. However, other visitors may wish to explore pages across collections with minimal reorientation to the home or collection-level pages. Still other volunteers may wish to change tasks between transcribing and tagging, or performing review passes, while participating. Community managers may also wish to lead visitors to targeted activity. There are unique opportunities to use the tool with prepared and simultaneous programming around the collections and campaigns. It may be helpful to have "tracks" for the multiple audiences interacting in this dynamic space.
Make Sense Quickly: Invitations to Contribute through Information Architecture
Upon entering the platform, a visitor should understand what tasks are requested and the goals of the program. It should be easy to explore featured collections by tags identified by community managers and curators; it should also be possible to explore collections based on community or self-created tags. Some tag categories that might connect to audience interests include location, era, and event. It should also be easy to understand the historical contexts of the collections, including difficult topics and subject matter.
Make it Easy to Participate
From entry into the site, volunteers should feel welcomed and within only a few steps of getting started with contributing. Clear instructions should be accessible from many different points and incorporate illustrative examples, perhaps even offering instruction or participation support in varying formats, such as video. Helpful tips about material culture and practice, such as handwriting or paleography resources, may further help. Technical features that support accuracy and ease of contribution include annotation tools and editors. Providing a mechanism for volunteers to elect to receive timely feedback on their efforts, and to offer similar feedback to others, is one community management approach that can support accuracy and ease of contribution.
Serving Completed Data in Multiple Formats
Text and tags that are created by volunteers as they participate will be used to improve the ways collections can be found and connected. These forms of data should also be made available as project level JSON, CSV, and XML files, as well as a full corpus of completed transcription text. Ideally, these forms of data would be presented within the site on a "data" or research page. It should also be made clear how visitors might use the loc.gov interface to download individual images, and the loc.gov JSON API to download images and data from digital collections.
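As one possible illustration of serving the same completed data in several formats, the sketch below writes a small set of records as JSON, CSV, XML, and a plain-text corpus using only the Python standard library. The record shape (asset_id, transcription, tags), the XML element names, and the "|" tag separator are assumptions made for this example, not a specification of the platform's data model.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical completed records; field names are assumptions for illustration.
records = [
    {"asset_id": "page-001", "transcription": "Dear Sir, I write to you", "tags": ["letter", "1863"]},
    {"asset_id": "page-002", "transcription": "Received your note", "tags": ["letter"]},
]


def to_json(rows):
    """Serialize the records as a JSON array."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv(rows):
    """Serialize the records as CSV, one row per asset."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["asset_id", "transcription", "tags"])
    writer.writeheader()
    for row in rows:
        # Flatten the tag list into a single cell, separated by "|".
        writer.writerow({**row, "tags": "|".join(row["tags"])})
    return buffer.getvalue()


def to_xml(rows):
    """Serialize the records as XML with one <asset> element per record."""
    root = ET.Element("project")
    for row in rows:
        asset = ET.SubElement(root, "asset", id=row["asset_id"])
        ET.SubElement(asset, "transcription").text = row["transcription"]
        tags = ET.SubElement(asset, "tags")
        for tag in row["tags"]:
            ET.SubElement(tags, "tag").text = tag
    return ET.tostring(root, encoding="unicode")


def to_corpus(rows):
    """Join all transcription text into one corpus, separated by blank lines."""
    return "\n\n".join(row["transcription"] for row in rows)
```

Keeping one in-memory record shape and deriving every download format from it is one way to make sure the JSON, CSV, XML, and corpus files never drift apart.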
Responsibly Share Code
Thoroughly documenting and sharing the design decisions that inform the crafting of the codebase will ensure that other libraries, cultural heritage organizations, and educational bodies can best decide if this tool is appropriate for their needs and matches the skills and resources they have to apply to a web-application based participatory project. We should appropriately document and make available the source code for the tool, as well as other technical considerations and design decisions. Furthermore, lessons learned in the process of developing and improving the underlying tool, as well as the program(s) of engagement made possible by the affordances of the tool, should be shared openly as supporting documentation and stand-alone considerations.
There are many ways to signal respect for the contributions of volunteers while making the experience of participating most rewarding. One way is to maximize the extent of collaboration by better integrating opportunities to "positively compound" participation within the process of transcription, tagging, and review. Most existing tools use a blend of asynchronous transcription and either algorithmic review (matching) or volunteer review. Showcasing interpretation and activity by other volunteers can help participants craft a shared, agreed upon, and quality version of transcribed text. Displaying tags as categorization in the form of arranged and connected knowledge allows participants to crosswalk understanding, opening possibilities of new discovery and connection to the project mission and one another.
Analyze activity & assess participant motivations to improve experiences
It will be imperative to responsibly gather, analyze, and share information about activity in the crowdsourcing platform. This information will be used to best communicate, improve upon existing capabilities, and extend future possibilities of the tool and program of engagement. This section discusses features and capabilities to support better experiences for all participants, from volunteers to staff.
Privacy & Ethical Use of Data
It is imperative that we responsibly and carefully define the need to collect data - whether that is about activity, location, or other forms of information - and connect that explicitly to intended uses of that data to understand audience needs and improve service. However, it should not be required or possible to track users through their visits. Furthermore, it should be possible for users to have their accounts deleted and account information purged from the site, while retaining their anonymized contributions.
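The account deletion requirement above can be sketched as follows. This is a minimal in-memory illustration, assuming that a contribution stores an optional user_id and that a value of None marks an anonymous or anonymized contribution; the class and field names are hypothetical and a real system would apply the same idea in its database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Contribution:
    asset_id: str
    text: str
    user_id: Optional[str]  # None marks an anonymous or anonymized contribution


@dataclass
class Store:
    accounts: Dict[str, dict] = field(default_factory=dict)
    contributions: List[Contribution] = field(default_factory=list)

    def delete_account(self, user_id: str) -> int:
        """Purge the account record and detach the user's contributions.

        Returns the number of contributions that were anonymized.
        """
        self.accounts.pop(user_id, None)
        detached = 0
        for contribution in self.contributions:
            if contribution.user_id == user_id:
                contribution.user_id = None  # keep the work, drop the link
                detached += 1
        return detached
```

The contribution text itself is untouched, so completed transcriptions remain available after the account information is gone.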
No Accounts Necessary... but also creating an Account is possible
As noted above, it should be as easy as possible for visitors to become volunteers. However, creating an account within the system will allow participants to recall their activity, customize their experience, and fulfill needs for reporting should their motivations relate to formal volunteering or school assignments. We should explain to volunteers that creating an account affords additional possibilities including estimating participation. Accounts would also allow volunteers to experience other benefits: invitations to webinars with curators, alerts about new or related collections, and notifications of completion of projects or subjects of interest perhaps based on tags and other self-selection.
Understanding for Community Management
For the health and future of the crowdsourcing initiative, gathering information about the efforts and communication of volunteers is imperative. Presenting a clear snapshot of activity, recent discussion, possible roadblocks, and upcoming campaigns or communication would aid community management through quick assessment of the ecosystem's state. Easy access to "live data," rapidly gathering the pulse on high traffic projects or energetic discussion, and performance of ongoing campaigns are key needs for community managers. This scope of activity information would also be useful for community managers in discreetly and sparingly offering volunteers opportunities to re-engage with new collections.
Motivations meet Behavior
We can help identify needed capabilities or features by surfacing the reasons people wish to participate and mapping patterns of activity in the system to these goals. The experience of participating in this project will be dynamic. Our visitors will not cleanly map onto single personas because their reasons for participating may change from visit to visit and over time. Common motivations include pursuing personal learning objectives, contributing to something greater and to access to open knowledge, and fulfilling course or volunteer requirements. In 2014, researchers and project managers from the Zooniverse described the benefits of designing digital citizen science projects for participants with limited time and commitment; Eveleigh et al. termed this approach "designing for dabblers." Crafting workflows that support bite-sized or small targeted tasks can meet the needs of this type of volunteer, who may fit participation in the crowdsourcing platform in amid their other interests and activities. Other volunteers may seek more immersive experiences. Furthermore, students and other volunteers may have participation targets to meet during their time on site.
From user interfaces to tracks, there are opportunities to design a site that enables immersive, deep engagement with stories in the collections, and to find flow in serialized tasks; yet also makes it possible to step out of the workflow quickly and with confidence that one's contributions will be retained and valued. Success may not always mean pace of transcription but rather the attainment of a visitor's goals, whether to learn, fulfill volunteering hours, or make a meaningful contribution to the Library of Congress.
Volunteers with accounts should be able to understand their own activity
Participants in the crowdsourcing program who have created accounts should be able to quickly access, understand, filter or scope, print or download summaries of their activity. This information should be presented to them in a visually dynamic and customizable (or filtered) manner. It should also represent their cumulative activity as well as recent tasks, allow them to pick up where they left off, share or engage others with their own contributions, and perhaps suggest related content via tags or campaigns.
Ability to gather feedback from participants about their needs
Responses, questions, feedback, and other means of communication between participants and community managers will be essential to the health of the project. Beyond a feedback button or comment form, community managers will want to be able to assess participation activity data as feedback on complexities of the collections, spot barriers and drop offs, identify stickiness, and sight opportunities in content and behavior.
Tell a Public Story by Displaying Activity
Presenting information about system-wide activity, as well as individual activity, creates opportunities for shared understanding of the progress toward collective goals, as well as individualized approaches. This information can also be presented on an About or Homepage to quickly convey a sense of participation--via dynamic and perhaps interactive charts, torque maps, timelines, and more--and encourage visitors to join the activity.
Reporting for Staff and Organization
Being able to provide reporting that integrates with or corresponds to staff workflows will help this project become more closely connected to regular activity in the organization. Information gathered about project performance might include the number of pages per collection, the number of volunteers that contributed, the number of views and engagement with the object in loc.gov/Project One, and details about visits including generalized time on site and return visits that engage with collections they've shared. It should be possible to query this data for custom reports, and include filters or other bounding options for specific time frames. Careful consideration should be taken around visitor information: reporting should never amount to tracking of users, and the data should at all times be generalized and disconnected from any cross-walkable search.
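A custom report bounded by a time frame might look like the sketch below. It assumes a hypothetical event log of (collection, contributor key, date) rows and returns only per-collection counts, so that no contributor key ever appears in the report itself. The function name and row layout are illustrative assumptions.

```python
from collections import defaultdict
from datetime import date


def collection_report(events, start: date, end: date):
    """Count contributions and distinct volunteers per collection.

    Only events dated between start and end (inclusive) are counted.
    """
    contributions = defaultdict(int)
    contributors = defaultdict(set)
    for collection, contributor, day in events:
        if start <= day <= end:
            contributions[collection] += 1
            contributors[collection].add(contributor)
    # Only aggregate counts leave this function; contributor keys are not returned.
    return {
        name: {
            "contributions": contributions[name],
            "volunteers": len(contributors[name]),
        }
        for name in sorted(contributions)
    }


# Hypothetical sample data for illustration.
events = [
    ("Letters", "a", date(2018, 1, 5)),
    ("Letters", "b", date(2018, 1, 6)),
    ("Letters", "a", date(2018, 1, 7)),
    ("Diaries", "a", date(2018, 2, 1)),
]
january = collection_report(events, date(2018, 1, 1), date(2018, 1, 31))
```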
Manage & Match Collections to Tasks to Data and more
These activities focus on engaging audiences with Library of Congress collections. Simultaneously, we strive to design a series of tasks and support that result in data that may be applied to improve access to those Library collections. This section recommends tasks, design decisions, and processes to achieve these goals.
Tasks: Transcription AND Tagging
Of a range of possible tasks, we've identified transcription and tagging to be most relevant to our goals to improve search and identification within Library of Congress collections. These two tasks can be mapped to the digital content lifecycle in the description phase. These tasks are envisioned as a means of creating asset or page level text that can enhance discovery of and access to the Library of Congress digital collections.
Based on the goals of engaging audiences and creating useful text for discovery, legibility, and access, the recommended transcription process is one that facilitates asynchronous but rapid access transcription; asks for minimal interaction with the collection asset image; situates the transcription window adjacent to the collection asset image; offers an adjustable and immersive presentation, if desired; "positively compounds" the efforts of participants; honors the time and contributions of participants; visually supports identification of possible errors or areas needing additional work; and connects to the context of the object in focus, whether through smooth transition between the pages before and after or through links to the catalog information and project description.
The goals of tagging are to classify or categorize the content in such a way as to make it discoverable for future use. It is recommended that the tagging feature be built to accommodate set(s) of collection level and platform-wide controlled vocabularies, as well as to accept crowd-generated tags. The former set of tags would allow the platform to leverage existing subject headings, known and popular loc.gov search terms, and other forms of metadata already associated with the collections. The latter tagging capability would allow volunteers to customize their experiences with collections once these tags were made available to search or surface content across the platform; it would also allow community managers to organize campaigns at the point of import and feature this content both on the homepage and highlighted throughout the experience of transcription and tagging. Finally, a visual coding of the tags would offer an opportunity for a subtle contextualization of staff and Library of Congress generated tags and those created by participants.
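One way to accept both kinds of tags while keeping them distinguishable for visual coding is to record a source alongside each label. The sketch below is a minimal illustration under assumed names (Tag, classify_tags, and the "controlled" and "crowd" source values); matching against the vocabulary is case-insensitive here, which is a design choice of this example rather than a stated requirement.

```python
from dataclasses import dataclass
from typing import Iterable, List

CONTROLLED = "controlled"  # term found in a staff-maintained vocabulary
CROWD = "crowd"            # free-text tag created by a participant


@dataclass(frozen=True)
class Tag:
    label: str
    source: str


def classify_tags(labels: Iterable[str], vocabulary: Iterable[str]) -> List[Tag]:
    """Normalize submitted labels and mark each as controlled or crowd-generated."""
    known = {term.casefold(): term for term in vocabulary}
    seen = set()
    result = []
    for raw in labels:
        label = " ".join(raw.split())  # trim and collapse whitespace
        key = label.casefold()
        if not label or key in seen:
            continue  # skip empty labels and case-insensitive duplicates
        seen.add(key)
        if key in known:
            # Use the vocabulary's own spelling for controlled terms.
            result.append(Tag(known[key], CONTROLLED))
        else:
            result.append(Tag(label, CROWD))
    return result
```

An interface could then style tags by their source value, giving the subtle visual distinction between Library-generated and participant-created tags.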
Task Ecosystem: Tasks in the Workflow
Projects and pilots including Beyond Words have demonstrated that the transcription and tagging tasks may successfully be presented together in the crowdsourced workflow; however, it should be possible for participants to elect to only complete one task, if they prefer. It is also possible that these tasks might be best presented in separate interfaces or as distinct workflows. The tasks should also be introduced to participants by describing the goals of the information that is being created, with examples of how and where it will be discoverable after volunteers have contributed. Specifically, care should be taken to communicate the ways that the information created will be used (whether for search, for research, for publication and display, or to improve the features of the system), as these details will shape the ways the tasks are undertaken by volunteers.
Agreement around the final version of transcription text should be achieved by displaying the work of volunteers and facilitating a process in which they can negotiate to reach consensus on the completeness and quality of the text. The collections we will ask the public to transcribe will be of varying format, even changing in format from asset image to asset image. Therefore, there are likely to be many points at which interpretation will occur. As a result, workflows that allow participants to work together and create a shared understanding of the asset image (at object/collection level) are most likely to be successful when opportunities to exchange interpretations are presented throughout the transcription and tagging processes. Features in support of negotiated consensus include an ability to see, mark, report, and/or correct errors in the text; to discuss an asset image or page in a forum; and generalized discussion at the collection level. Another way to permit negotiated consensus is to allow for a peer-review workflow.
Complete the Cycle
It is imperative that the crowd-generated transcription text be returned and served with the loc.gov presentation of the object and at the asset image level. It may be possible to marry this information as metadata or perhaps as data supporting the collection or object. This urgency and responsibility in completing the cycle connects to a user-centered design; building trust with public participants by meeting motivations that relate to contributing to greater access to knowledge, while honoring their time and contribution.
Connect Activity to Collections
At all times it should be possible to navigate from the crowdsourcing project, campaign, or transcription and tagging page to the source object. In the volunteer's account view, it should be possible to navigate from the project to which they contributed to the record of that collection in loc.gov. Furthermore, there are opportunities to connect volunteers to the Ask a Librarian service from the crowdsourcing platform, such as in the discussion section, the project and collection pages, the homepage, and even in the footer.
These key features and approaches should be available for the long-term sustainability of the project:
- Continuity with existing staff workflows, including work within CTS (content transfer system)
- A pipeline for the queue of projects, as well as a monthly mechanism for proposing, identifying, and planning new collections for transcription including a recurring forum for nominating projects - perhaps even a sandbox or test space in which details of nominated collections may be stored or prepared
- Collections management in bulk to support creating an extensive queue; storing offline in advance of campaign, collection, or other programming needs
- Enabling Community Managers to post, launch and queue projects and campaigns via an administrative interface. Over time, this functionality can be expanded with a permissions-based self-service mechanism for curators and collections staff to identify, flag, or queue projects for transcription. This will likely require coherence with Library of Congress web services, Design & Development, as well as the CTS workflow
Applying our APIs
As with other features recommended here, it would be best to seize and map to existing workflows and technologies. A goal for the platform should be to identify and deliver collections via loc.gov API to become transcription projects. Furthermore, allowing the transcription results to be queried via the loc.gov JSON API once associated with the source images or record would be valuable.
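As a starting point for identifying collections through the loc.gov JSON API, the sketch below only builds a request URL and makes no network call. The fo=json, c (items per page), and sp (page number) query parameters reflect the public loc.gov API documentation as understood here and should be verified against it before use; the helper name and the "example-collection" slug are hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://www.loc.gov"


def collection_json_url(slug: str, page: int = 1, per_page: int = 25) -> str:
    """Build a URL requesting one page of a collection's items as JSON.

    Assumes the loc.gov query parameters fo (format), c (count per page),
    and sp (page number); check the API documentation before relying on them.
    """
    query = urlencode({"fo": "json", "c": per_page, "sp": page})
    return f"{BASE}/collections/{slug}/?{query}"


# "example-collection" is a placeholder slug, not a real collection.
url = collection_json_url("example-collection", page=2)
```

An import step could request each page in turn and queue the returned items as candidate transcription assets.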
The greatest opportunity exists to develop this tool and platform in the open, as open source with appropriate licensing. Documenting development in a repository like GitHub is advisable.
Identify, Articulate, Acknowledge Content Created by Volunteers
Identify content created by participants within loc.gov as volunteer-created. Allow volunteers with an account to retain a pointer to the work they created, with a persistent URL or URI. Actively acknowledge the contributions of volunteers in publications, presentations, social media, and communications outreach. The license for content created in the transcription and tagging workflow should be public domain or CC0; it should also be explicitly stated and incorporated as metadata when the content is presented as a dataset and in loc.gov. These contexts should be clearly communicated to volunteers contributing the content as well as researchers engaging with the content.
Data Available for Download
To complement research and exploration at the object and collection level, providing transcription and tagging results as a bulk data set would allow researchers in a range of disciplines to explore patterns and connections across collections. The transcription text and tags should be licensed as openly as possible and volunteers should be kept informed of the ways their efforts support discovery in the Library of Congress systems, as well as the role their work plays in other scholarly inquiry.
Navigating options to get started
Visitors and volunteers should be able to navigate the available collections in several ways. For example, presenting available projects as a list or set of tiles and incorporating filtering and sorting capability would allow volunteers to swiftly make sense of the active opportunities. Incorporating existing metadata offers other ways to orient volunteers to opportunities, such as offering selection based on the object time period or era, location metadata, or subject heading. Finally, as participation increases, it may be possible to estimate time to complete a project based on participation data. This approach serves the needs of visitors who wish to dedicate a specific amount of time to their visit to the project.
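The time estimate mentioned above could be approached as in the following sketch, which assumes that the platform records how many minutes past volunteers spent per completed page. It uses the median so that a few unusually long sessions do not distort the estimate; the function names and the choice of the median are assumptions of this example.

```python
from statistics import median
from typing import Optional, Sequence


def estimate_project_minutes(remaining_pages: int,
                             past_minutes_per_page: Sequence[float]) -> Optional[float]:
    """Estimate total minutes of work left, or None without participation data."""
    if not past_minutes_per_page:
        return None
    return remaining_pages * median(past_minutes_per_page)


def pages_that_fit(minutes_available: float,
                   past_minutes_per_page: Sequence[float]) -> int:
    """Estimate how many pages a visitor could complete in the time they have."""
    if not past_minutes_per_page:
        return 0
    typical = median(past_minutes_per_page)
    if typical <= 0:
        return 0
    return int(minutes_available // typical)


# Hypothetical durations in minutes; the 20-minute outlier does not move the median.
past = [4, 6, 5, 20, 5]
```

A project tile could then show something like an estimated number of pages per half hour, helping visitors pick work that fits the time they have set aside.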
Sustaining and Improving through Workflows and Spaces of Participation
Once the system has been designed, developed, and launched and once visitors become volunteers, there are endless opportunities to improve and sustain the platform and the program based on the ways people use it, their needs and challenges, and information gathered in the process of participation.
Show the Work!
Displaying the work of others in the system allows participants, whether new or seasoned, to more quickly understand that others are contributing to shared goals. The visibility of content & contribution also signals active collaboration. Furthermore, visual indications of collaboration, whether intended or even in conflict, can be reflected in an interface that shows the efforts of editing. Allowing content to be editable, then marking out those edits—with font, size or other indicator—allows individuals to better understand who and what has come before them on the page.
Create an Atmosphere
Creating a shared understanding of expectations around behavior and communication can help shape expectations of respect and civility in this space. Inviting participants to acknowledge and commit to a Code of Conduct provides them the opportunity to reflect on how they will engage with others, as well as offers them support for framing communication they receive. Examples of public community codes of conduct include the Coral Project and Airbnb; the former articulates acceptable and unacceptable behavior and expectations, while the latter asks customers (guests) to sign a code of conduct pledge as part of the booking process.
As participants create transcription and tagged content and more collections are added, plenty of examples will emerge that best represent what is needed when encountering decision-making moments. In the interim, creating clear and annotated examples may be useful to those who are just getting started with transcription and tagging.
Collate and Connect Extraneous Knowledge
As volunteers participate in the crowdsourcing initiative, they will acquire, recall, and perhaps seek to share information and knowledge with others; perhaps to be helpful to other volunteers and, at other times, as an expression of their interest and engagement with Library of Congress collections. They may wish to share examples and further non-Library of Congress resources. A discussion board or forum can help volunteers achieve their motivations of learning, contributing to wider knowledge (a greater good), and to build community or a sense of shared purpose through discussion. It may also be useful to create an asset level discussion for specific or nuanced questions about the asset image or page that is receiving transcription and tagging activity. Impressive examples of discussion spaces in crowdsourcing projects include the Zooniverse talk pages and the Discourse implementation in use by In the Spotlight at the British Library.
Sustain Interest and Increase the Capability of Participants
As described above, visitors may have many different motivations when they first arrive at the crowdsourcing platform; they may also have different motivations each time they arrive on site. Their movement to becoming volunteers may be catalyzed by offering a range of tasks, heterogeneous collections, and oscillating levels of complexity. Furthermore, the system may be built with tracks that prompt a participant to continue to the next step, or ask them to contribute to a new task. It is also possible to blend these approaches, and particularly helpful if these are cyclical or distributed opportunities - either naturally based on variety in the collections or designed by community management approaches. Furthermore, offering opportunities to problem-solve, support other volunteers in other roles, level up in transcription and tagging tasks, or self-select for more complex (or pilot) tasks can create a dynamic experience that is rewarding to volunteers and the program ecosystem.
Build Knowledge for Outcomes
It is recommended that Community Managers and staff with collections expertise work together closely to build programming and engagement in relation to the transcription and tagging tasks. This programming could take the shape of essays, webinars, edited video content, chats hosted in discussion pages or social networks, and in-person events. In these activities, persistent URLs for the page, collection, and campaign (or collection of tags) would be required. In the previous section, recommendations were shared for connecting activity to collections. It is also possible to connect collections to catalyze activity; for example, a prompt or pathway from the collection or object Project One page to the crowdsourcing platform.
One way to encourage activity is to offer a low risk point of entry in which a volunteer might practice and keep notes, such as in a sandbox. This space may also make it possible for a volunteer to elect to receive feedback that can increase their confidence, coherence and consistency in transcription and tagging, and connection between their goals and those of the program. Another means of training could be a practice page, potentially offered after a volunteer has made a few contributions or after they have created an account.
The Library of Congress Labs team is well-situated to partner with staff in the Office of the Chief Information Officer and Library Services to continue to run crowdsourcing experiments focused on tasks, cataloging and metadata, and machine learning. Examples include working with Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) outputs to assess quality of transcription & assistance with tagging; parsing task workflows; repeated passes on the same material for thematic, semantic, or other tagging; named entity recognition and more. Furthermore, there are opportunities to integrate other collections and materials into a crowdsourced workflow including audio-visual and time-based media objects. Creating an extensible tool or system into which features that emerge from experiments could be added would support the evolution of the project, as well as create opportunities for volunteers to grow their skills and knowledge.