
Defining and Measuring the Universe of Open Source Software Innovation

Data Science for Public Good summer program 2021

Research Questions

  • In what ways can repositories be efficiently classified into “types” (e.g., operating systems, network services, database management, development tools, blockchain, etc.)? What information (e.g., tags, repository statistics, or READMEs) is most helpful for classifying existing repositories?

  • How does GitHub activity change based on the type of software being developed? Which types of software have the most contributors? Which types of software require more commits, additions, or deletions?

  • How do different types of software affect collaboration tendencies? How do these tendencies change across the academic, business, or government sectors?

Collect Access Token

This should take no more than 5 minutes.

We are collecting access tokens to speed up the process of scraping GitHub repositories. A single access token can scrape only about 5,000 repositories per hour, and our goal is to scrape about 10 million repositories. Having more access tokens would help us tremendously.
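To see why extra tokens matter, here is a rough back-of-the-envelope estimate in Python. It is only a sketch: `hours_needed` is a hypothetical helper (not part of the project code), and it assumes every token is used in parallel at the full 5,000 repositories per hour, with no downtime, retries, or other overhead.

```python
import math

REPOS_PER_TOKEN_PER_HOUR = 5_000   # per-token hourly limit quoted above
TARGET_REPOS = 10_000_000          # approximate project goal quoted above


def hours_needed(num_tokens: int,
                 target: int = TARGET_REPOS,
                 per_token_per_hour: int = REPOS_PER_TOKEN_PER_HOUR) -> int:
    """Whole hours needed to scrape `target` repos with `num_tokens` tokens.

    Assumption: tokens run in parallel and each one is fully used every hour.
    """
    if num_tokens < 1:
        raise ValueError("need at least one token")
    return math.ceil(target / (num_tokens * per_token_per_hour))


if __name__ == "__main__":
    for n in (1, 10, 50):
        h = hours_needed(n)
        print(f"{n:>3} token(s): {h} hours (~{h / 24:.1f} days)")
```

Under these assumptions, one token needs about 2,000 hours (roughly 83 days), while 10 tokens bring that down to about 200 hours.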

For steps 1-5, please refer to this document for detailed instructions on creating a personal access token.

For steps 6 and 7, please refer to the following image.

[Image: plot]

Please send the team a private message with your:

  • access token
  • username

Afterwards, make sure you delete the message (not the access token). For example, if you messaged us on Teams, delete that message.

We appreciate your help!

Project Sponsor

National Center for Science and Engineering Statistics (NCSES)

  • Carol Robbins, Senior Economist
  • Ledia Guci, Science Resources Analyst

