Abstract:

This talk explores what it takes to build and maintain sustainable open source communities, using scikit-learn to illustrate successes and lessons. It raises awareness of the challenges and complexities in contributor expectations, community building, volunteer burnout, governance tensions, and company involvement. Key issues are discussed to highlight what is needed for long-term health in OSS ecosystems.

#### Ecosystems, Not Just Code
##### Building Sustainable OSS Communities

- why communities (can) outlast code
    - volunteer code versus sustained ecosystem
    - 101 of Community Building
    - Why all the effort?
    - Healthy OSS Community
    - OSS Community and Enterprise

##### 101 of Community Building
- just happens vs. deliberate act
- scikit-learn's community does not seem to need building (it is already there)
    - BUT it needs to be continuously nurtured and renewed
- smaller projects in the ecosystem (fairlearn, joblib, etc.) need deliberate effort in community building
- involves a lot more than just putting a project on GitHub

###### exchange and discussion
- in comparison to most company code, scikit-learn code doesn't need to come fast, but it needs to be durable
- we have a test suite that is as large as the actual code
- 26100 Python files, 1174 reStructuredText, and more
- the project has grown since 2007, so there has been a lot of time to ensure stable code and integration with a few other Python libraries (and to introduce complexity)
- a lot of code maintenance
- adding something as stable as the rest takes time; a feature is delayed by a few releases rather than rushed
- high number of comments per issue/PR
    - total number of PRs: 19796 with an average of 6 comments per PR
- typical path of a change:
    - an idea is discussed in an issue over the course of a month
    - a PR is created and then goes through several review cycles, far beyond "passing the tests"
    - two maintainers need to approve (and nobody vetoes) to merge a PR, even if another maintainer made the PR
    - in the process we might discover a blocker to this PR that needs to be taken care of first
    - if the PR exceeds a certain level of complexity, a new point of view may come up during discussion and the approach is changed, which might result in another PR that supersedes the first one, and the process starts anew
- complex PRs are typically a few hundred changed lines long and mostly change existing code rather than add new code

###### coordination and decision making 
- a lot slower than top-down
- ["The Hard Parts of Open Source" by Evan Czaplicki](https://www.youtube.com/watch?v=o_4EX4dPppA)
    - thinking about communication design of GitHub
    - calling the Somebody-Store
    - misconception that all discussion is constructive

- "Random people from the internet can add work to my ToDo list"

- nothing is ever set in stone
- a project still needs to find ways to move forward and reach conclusions

- scikit-learn introduced a governance structure in 2019
    - defining voting rights, what is needed to merge a PR and a process that uncouples that from a consensus rule (SLEPs)
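
The merge rule mentioned in these notes (two maintainer approvals, no veto, and the same bar when a maintainer is the author) can be sketched in a few lines. This is only an illustration of the rule as described here, not scikit-learn's actual tooling: the `PullRequest` class, the `can_merge` function, and the assumption that an author's own approval does not count are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical, minimal model of a PR for illustrating the merge rule."""
    author: str
    approvals: set[str] = field(default_factory=set)
    vetoes: set[str] = field(default_factory=set)


def can_merge(pr: PullRequest, maintainers: set[str], required: int = 2) -> bool:
    # Only approvals from maintainers count; the author's own approval is
    # ignored (an assumption based on "even if another maintainer made the PR").
    valid_approvals = (pr.approvals & maintainers) - {pr.author}
    # A single veto from any maintainer blocks the merge.
    blocking_vetoes = pr.vetoes & maintainers
    return len(valid_approvals) >= required and not blocking_vetoes


maintainers = {"alice", "bob", "carol"}
pr = PullRequest(author="alice", approvals={"bob", "carol"})
print(can_merge(pr, maintainers))  # True: two other maintainers approved, no veto
```

The point of writing the rule down like this is that merging no longer depends on full consensus: a clear threshold plus a veto path lets the project reach conclusions.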

###### roles in the scikit-learn team

###### mentoring
- mentoring people on grants (via NumFOCUS and CZI)
    - me and a few others
    - maintainers can tell you which issues to work on
    - help when stuck
    - reviews

- community sprints via PyLadies and at PyData, EuroSciPy, and other conferences
    - have easy issues prepared
    - helping people set up their development environments is needed
    - lowers the entry barrier

- mentoring outside of such structures is far harder
    - identify contributors that are hooked to stay
    - expose them to incrementally more complex tasks
    - take care their work is reviewed on time and they get the 
    - bottlenecks: 
        - delays in reviewing contributors' PRs presumably turn many away
        - hard to identify issues that new contributors can help with without deep project knowledge (of context, past discussions, goals)
    - insight from [Charlie's blog](https://ossd-s23.github.io/Charlie-XIAO-weekly) (who blogged on his first encounters with scikit-learn and later became a maintainer)

###### communication between strangers
- mismatch in expectations
    - (a lot of) people trying really hard to find tasks to work on
    - not enough tasks for new contributors, especially not at a junior level
        - even more experienced new contributors mostly need to read through long
          discussions to get a feeling for all the trade-offs of a task
- issue with "Can I work on it?" comments, discussed here: https://github.com/scikit-learn/scikit-learn/issues/32655 and an attempt at prevention here: https://github.com/scikit-learn/scikit-learn/issues/32650

- measures of expectation management
    - communicate what to expect when working on PR
    - explaining communication rules: [Getting Answers to Your Technical Questions](https://betatim.github.io/posts/getting-answers/)
    - Tim's autocomment

- people with rude social behaviour will be excluded in accordance with our [Code of Conduct](https://github.com/scikit-learn/scikit-learn/blob/main/CODE_OF_CONDUCT.md)


- problems specific to large, old projects: very recent work on expectation management for new contributors
    - https://github.com/scikit-learn/scikit-learn/pull/32660
    - https://github.com/scikit-learn/scikit-learn/pull/32504
    - https://github.com/scikit-learn/scikit-learn/pull/32715
- recent discussion about no longer being as welcoming to everyone

###### pushes from industry
- for example for publishing and maintaining wheels for their hardware
    https://github.com/scientific-python/summit-2025-nov/issues/4

###### cross-project collaboration
- Scientific Python Summit and topics discussed there: interoperability, common regimes, shared tools

###### new challenges arising with gen-AI
- current problem: flood of AI-generated contributions
    - https://github.com/scikit-learn/scikit-learn/issues/31679
    - https://github.com/scikit-learn/scikit-learn/pull/32566



- requires time to nurture a community

##### why put in all that "expensive" effort of community building?
- alias: what the community brings to an OSS project (beyond code)
- risk of OSS burnout and path dependency of longer-lived projects
- nothing is ever set in stone
- prerequisite for the OSS project to exist
- workforce
- innovation
- feedback
    - on relevance of specific features (reports on use case)
    - on project stability (bug reports)
    - in the form of requests for how to develop the project further (strategic direction)
- sense of reliability/importance; serves as a motivation for anyone more deeply
    involved
- community voices latest needs that will be relevant in the immediate future
- independent critique of the project or certain features that employees working for
    a company could never bring
- "I don't know, it is more fun!"

##### anatomy of a healthy OSS community
- OSS community as a social movement
    - what moves people
    - institutionalisation of larger/older projects
- individuals are looking for
    - atmosphere and tone
    - expectations and predictability
    - feeling welcome / included
    - recognition of their contributions
    - mentorship
    - impact, making a difference
    - being heard on strategic decisions (depending on seniority)
    - psychological safety
    - fix your own problems upstream (or report)
- company-paid developers who maintain the project
    - reliability on roadmap goals
- community/project as a whole should care about
    - inclusivity, low entry threshold to keep a high supply of potential new members
    - flexibility / renewability
    - clarity in governance
    - content-wise independence from enterprises
    - sustainability
    - transparency on decisions
    - governance between meritocracy and do-ocracy
        - consensus, majority vote, veto rules
        - escalation path
        - merit also based on social interaction
        - action drives decisions (hidden labour and risk of burnout)
- indicators of community health / measure
    - response latency / lag
        - provide actual stats from scikit-learn
    - ability to grow newcomers into maintainers
        - pyramid from first contribution -> recurring contributor -> being
          included in (pinged in or invited to) discussions -> obtaining triage rights
          -> becoming a maintainer
    - diversity of contribution types
        - docs, examples, algorithms, etc
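
One way the response-latency indicator could be measured is the median time from opening an issue or PR to its first reply. The sketch below assumes a hypothetical list of `(opened, first_reply)` ISO timestamps; the sample values are made up for illustration and are not scikit-learn statistics.

```python
from datetime import datetime
from statistics import median


def first_response_hours(opened: str, first_reply: str) -> float:
    """Hours between opening an issue/PR and its first reply (ISO timestamps)."""
    delta = datetime.fromisoformat(first_reply) - datetime.fromisoformat(opened)
    return delta.total_seconds() / 3600


def median_latency_hours(records):
    """Median first-response time in hours; items without a reply are skipped."""
    latencies = [
        first_response_hours(opened, reply)
        for opened, reply in records
        if reply is not None
    ]
    return median(latencies) if latencies else None


# Made-up sample data, for illustration only.
sample = [
    ("2025-01-01T10:00:00", "2025-01-01T12:00:00"),  # answered after 2 hours
    ("2025-01-02T09:00:00", "2025-01-03T09:00:00"),  # answered after 24 hours
    ("2025-01-03T08:00:00", "2025-01-03T14:00:00"),  # answered after 6 hours
    ("2025-01-04T08:00:00", None),                   # still unanswered
]
print(median_latency_hours(sample))  # 6.0
```

The median is used rather than the mean so that a few very old, unanswered-for-months threads do not dominate the figure; the share of items with no reply at all is worth reporting separately.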

##### OSS community and enterprise
- pictures of Labs team mixed with other probabl team members
- factor of a mutualistic relationship (symbiosis), but requires a healthy community
    - probabl as a facilitator and beneficiary
    - also applies to other companies that scikit-learn cooperates with
- what OSS community needs from enterprises 
    - company involvement can strengthen an OSS community if they help to keep OSS
      communities healthy
    - several funding and sponsoring plans
        - sponsor recurring contributors and maintainers for their work (even if only
          partially)
        - pay for mentoring programmes
        - project based
    - providing infrastructure
        - SAS (VMs)
        - hardware for OSS developers
        - trainings/counselling
        - outreach and marketing for amplification
        - neutral conflict mitigation
    - enabling planning and decision making by freeing up developer time for it,
      though without disturbing internal processes
    - publicly share how you use an OSS project (specific features, use case,
      combination with other tools)
    - allocate a percentage of your developers' time to innovation experiments through
      OSS participation
    - ideally, several companies are involved
    - well-defined rules in case of conflicts of interest
    - boundaries
        - separate trademark
        - keep governance decision and roadmap community driven
        - all discussions need to stay public
        - no proprietary gate to develop or push features (company contributions
          should be reviewed like any other)
- what OSS community can give to enterprises
    - work on features relevant to many users in the future
    - try out new ideas (experimentation lab)
    - a constant funnel for finding talent
    - partnership for specific topics

##### closing
- healthy OSS communities aren't accidents
- continuously renewed by deliberate action 
- your next small act can extend the life of critical public infrastructure

- study ["Psychological Safety Sustains Participation in Pull-based Open Source
    Projects"](https://arxiv.org/abs/2504.17510)
- [Small-World Phenomenon of Global Open-Source Software Collaboration on
    Github](https://www.researchgate.net/publication/394469418_Small-World_Phenomenon_of_Global_Open-Source_Software_Collaboration_on_Github)