The Mismatches project studies how mismatched conceptualizations between upstream maintainers and downstream users of a Free and Open Source Software (FOSS) digital infrastructure project interact to affect the community health, and thus the sustainability, of such projects.
It is part of the Ford/Sloan cohort on digital infrastructure research and is based at RIT. Steve Jacobs and Mel Chua lead the research team.
This webpage summarizes the project's current status. We are wrapping up our third and final round of interviews and preparing for an in-person analysis deep-dive in early November with consultants from our project community. For up-to-date access to interview transcripts and project files, see our GitHub repo.
What is your research question?
How do mismatched conceptualizations between upstream maintainers and downstream users of a Free and Open Source Software (FOSS) digital infrastructure project interact to affect the community health, and thus the sustainability, of such projects?
More specifically: how do developers who maintain commonly-used FOSS projects compare to developers who use those same projects in terms of how they conceptualize:
- ontologies of an ideal, well-maintained and sustainable FOSS project community, i.e. which elements are present, how they relate, etc.
- which parties are responsible for which elements, and
- the state of that FOSS project relative to their ontologies of ideal projects?
Why is it important to answer?
To address systems-level problems together, we must first have shared conceptualizations of what those systems and their problems might be. Unknowingly holding different conceptualizations can lead to problems being overlooked, dismissed, or addressed with the wrong resources. In our combined two decades of work with FOSS community "upstreams" and the corporate, government, and NGO "downstreams" that rely on their products, we've noticed many mismatches of this type, and have spent a great deal of time addressing misconceptions on both sides.
These communication issues stem in part from fundamentally different ontologies, or conceptions of reality, regarding "how FOSS communities work." These are not simply disagreements over how project communities are doing relative to a shared scale, or even what those shared scales might be, but completely disjoint conceptualizations of what FOSS project communities are and how they operate in the first place.
This project investigates how the diversity of conceptualizations both helps and hinders efforts to improve FOSS community health. Studying these interacting ontologies can inform how upstreams and downstreams develop shared understandings of what improvements are needed, and can also make visible which inefficiencies still require intervention after conceptualization and communication gaps have been addressed.
What methods will you use to answer it?
We are using narrative interview methodology with a publicly viewable dialogue structured over multiple rounds. Interviews are situated within a critical FOSS infrastructure project: something that many other projects use, and whose failure would be disruptive. We chose PyPI, the Python Package Index, as the case study. It is the package index for one of the most widely used programming languages in the world (as of 2019), and had disruptive outages until a recent rewrite specifically aimed at addressing the project's sustainability concerns.
We are interviewing both (A) maintainers of PyPI and (B) technical downstream users of PyPI regarding their conceptualizations of (1) how healthy FOSS communities work, (2) who is responsible for aspects of that health, and (3) the state of that particular FOSS project's community. The entire research project design, including the interview protocol, consent forms, and recruitment text, is available in our GitHub repo.
Interview excerpts answering these questions are being circulated amongst participants in subsequent interview rounds, both within groups (maintainers see other maintainers' responses, downstreams see other downstreams' responses) and across groups (maintainers see downstream responses, downstreams see maintainer responses). To foster public dialogue, transcript excerpts will be made available under an open license as the interviews progress, with participant consent.
Transcripts will be analyzed for ontologies, that is, the underlying conceptualizations of FOSS projects presupposed by interview narratives. Ontological shifts will be tracked as participants respond to each other. Analysis will also be public and open-licensed, giving the FOSS community an opportunity to see how research of this sort is done. Interim analytical notes are posted in our GitHub repo as they are created.
What data or resources will you use to answer it?
This project relies on three sets of data/resources.
The first and most important source is the corpus of interviews with upstream and downstream developers that we are collecting as part of the project. We are interviewing three upstream and three downstream developers for the target FOSS infrastructure project, which is PyPI. The raw data is being posted to our GitHub repo as these six narrators open-license their transcripts. They have consented to being publicly identified as part of our radically transparent research approach, and we would not be able to do this project without them. Thank you, Naomi Ceder, Terri Oda, Jackie Kazil, Nick Coghlan, Donald Stufft, and Ernest Durbin III!
The second is existing literature on FOSS community dynamics, both scholarly and non-scholarly (e.g. books and blogs). These serve as sources of additional conceptualizations of FOSS communities as well as venues to engage in dialogues about them.
The third consists of analytics and metrics on software projects, such as those run by Libraries.io, Bitergia, Black Duck, and CHAOSS. These will be used at the outset to identify which project communities to approach regarding participation, and as a means for discussing project health and growth during the study/intervention.
What is your vision of success?
Our initial vision was to identify different conceptualizations of what it means for a project to be sustainable and well-maintained, and then to identify and codify any mismatches in terms of how easy they are to address and what might be effective in resolving them.
After collecting more than half the data and beginning analysis, we've found that the issue seems not to be conflicting conceptualizations of sustainability and well-maintainedness, but rather incomplete or ill-defined ones. Furthermore, there seems to be something particular about infrastructure projects as opposed to non-infrastructure software projects, and something particular about community-driven projects (as many FOSS projects are) as opposed to corporate-driven projects (which may have FOSS licenses, but are still backed by a corporate entity).
We still plan to identify the overall types of problems and challenges these projects face, and then to provide “recipes” that help them achieve greater stability. We continue to sense that these “recipes” lie in the areas of human communication, management, and social engineering rather than in technological solutions. However, software tools for organization, scheduling, etc. might be part of the suggested best practices that emerge.