yochannah/sustainable-communities-tracker

Sustainable communities tracker

This code was used in a year-long study of CHAOSS and other sustainability metrics that might affect open communities that use version control hosting sites for collaboration. In practice these were mostly open source and/or open science communities, and mostly on GitHub.

Code components

There are four main components within this code.

  1. One is the data visualisation component, which runs on Jekyll with lots of JavaScript-based processing, since Jekyll can generate static sites but doesn't always allow advanced data manipulation. The entry point for this component is in the `view` folder.
  2. The rest are data collection components.
    1. General: written entirely in JavaScript. The entry point is `index.js` in this folder. Runs the entire suite of metrics gathering.
    2. Individual methods: JavaScript-based. The entry point is `singleMethod.js`. Useful for gathering a single metric across all repos.
    3. Local methods: clone repos with git and run some metrics locally rather than infuriating the GitHub API.

Data collection

To set up

  1. You'll need a recent version of Node.js. Run `npm install` to get the dependencies.
  2. In your `.bashrc` or `.zshrc`, set up a GitHub API access token. It should look something like this: `export github_sustain_sw_token=123456678sdfsdfsdfsdfsdfsdf`
  3. Run `node index.js` in your terminal.
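Before a long run it can help to confirm that the token from step 2 is actually visible to Node. The following is a minimal sketch, not code from this repo: `readToken` is a hypothetical helper, and it assumes the scripts read the token from the environment variable named above.

```javascript
// Hypothetical helper, not part of this repo: checks that the token from
// step 2 is visible to Node. Assumes the variable name github_sustain_sw_token.
function readToken(env = process.env) {
  const token = env.github_sustain_sw_token;
  if (!token || token.trim() === "") {
    throw new Error(
      "github_sustain_sw_token is not set. Add an export line to your " +
        ".bashrc or .zshrc and open a new terminal."
    );
  }
  return token.trim();
}

// Usage: report the outcome without ever printing the token itself.
try {
  readToken();
  console.log("GitHub token found.");
} catch (err) {
  console.error(err.message);
}
```

If the check fails in a terminal that was already open, the shell has probably not re-read your `.bashrc` or `.zshrc` yet.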

To grab various GitHub API stats

  1. Full run, on ALL the repos you have in a text file, separated by newlines:
    node index.js --month 12 --urlList /path/to/urllist.txt
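To make the list format concrete, here is a minimal sketch of how a newline-separated file could be parsed. It illustrates the format only and is not how `index.js` necessarily reads it: `parseUrlList` is a hypothetical helper, and it assumes each line is a full `https://github.com/owner/repo` URL.

```javascript
// Hypothetical helper, not part of this repo: parses newline-separated text
// into { owner, repo } pairs. Assumes lines look like
// https://github.com/owner/repo; blank lines are skipped.
function parseUrlList(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map((line) => {
      const url = new URL(line);
      const [owner, repo] = url.pathname.split("/").filter(Boolean);
      if (!owner || !repo) {
        throw new Error(`Expected an owner/repo URL, got: ${line}`);
      }
      return { owner, repo: repo.replace(/\.git$/, "") };
    });
}

// Usage with an in-memory example; a real file could be read with
// fs.readFileSync(path, "utf8").
const example = "https://github.com/yochannah/sustainable-communities-tracker\n";
console.log(parseUrlList(example));
```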

Local git-based stats

In some cases, the GitHub API is not the most efficient way to handle things, generally for repos with LOTS of commits. Cloning a repo locally and analysing its logs works for anything that's git-specific rather than GitHub-specific.
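As an illustration of a git-specific metric, here is a minimal sketch that counts commits per month from a local clone. It is not the repo's `localMethods.sh` script: `countByMonth` and `commitsPerMonth` are hypothetical names, and the sketch assumes `git` is on your PATH.

```javascript
const { execFileSync } = require("child_process");

// Hypothetical helper: tallies "YYYY-MM-DD" lines into { "YYYY-MM": count }.
function countByMonth(logOutput) {
  const counts = {};
  for (const line of logOutput.split("\n")) {
    const date = line.trim();
    if (date === "") continue;
    const month = date.slice(0, 7);
    counts[month] = (counts[month] || 0) + 1;
  }
  return counts;
}

// Hypothetical helper: reads author dates from a local clone. One git
// process replaces what would be many paginated API requests.
function commitsPerMonth(repoPath) {
  const output = execFileSync(
    "git",
    ["log", "--date=short", "--pretty=format:%ad"],
    { cwd: repoPath, encoding: "utf8", maxBuffer: 64 * 1024 * 1024 }
  );
  return countByMonth(output);
}

// Usage: commitsPerMonth("/path/to/a/local/clone")
```

Keeping the parsing in its own function means it can be checked against a saved log without touching a real repository.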

To run the local scripts:

Setup:

  1. Create a directory in the parent directory of this repo, and call it `localData`.
  2. Copy `sample.tsv` into `localData`.
  3. Tweak the repo names and dates, and add lines for any further repos you need.
  4. You may also need to make the script executable using `chmod +x`.

The above setup steps, as one copy-pastable block:

```bash
# Go to the folder containing this code.
cd sustainable-communities-tracker
# Make the script executable.
chmod +x src/localMethods/localMethods.sh

# Make a folder to store all the output data (-p: no error if it already exists).
mkdir -p ../localData
# Copy the sample data to the data folder.
cp templates/sample.tsv ../localData/sample.tsv
```

To run the local script, after the above setup is complete:

npm run localMethods

Data visualisation component

Once you've run stats on a repo, there's a minimal UI to view the JSON and data more visually. To use it:

  1. Copy the files generated by the script(s) to `view/_data`.
  2. Set up Jekyll if you haven't already (`gem install jekyll bundler`). I use rvm to manage Ruby versions, which makes things easier.
  3. Once it's all set up, run `cd view` to move into the view directory, then run `bundle exec jekyll serve`. Presto, you'll be serving the visualisations.
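Step 1 can also be scripted. Here is a minimal sketch, assuming the generated files are JSON and sit together in one output folder; both are assumptions, since this README does not say where the scripts write their output. `copyToViewData` is a hypothetical helper.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: copies every .json file from sourceDir into dataDir
// (for example view/_data), creating dataDir if needed.
function copyToViewData(sourceDir, dataDir) {
  fs.mkdirSync(dataDir, { recursive: true });
  const copied = [];
  for (const entry of fs.readdirSync(sourceDir, { withFileTypes: true })) {
    // Skip subfolders and anything that is not a .json file.
    if (!entry.isFile()) continue;
    if (path.extname(entry.name).toLowerCase() !== ".json") continue;
    fs.copyFileSync(
      path.join(sourceDir, entry.name),
      path.join(dataDir, entry.name)
    );
    copied.push(entry.name);
  }
  return copied.sort();
}

// Usage: copyToViewData("/path/to/script/output", "view/_data")
```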

Note that a copy of this repo, containing sanitised data and ready for deployment, is available here: https://github.com/Sustainable-Open-Science-and-Software/survey-datavis/actions