Creates a corpus for publicly accessible repositories in a GitLab instance.

DLR-SC/GitLab-Corpus

GitLab-Corpus

This tool creates a corpus for accessible repositories in a GitLab instance. The corpus will primarily contain information about software projects.

Relevant information includes:

  • number of authors or commits
  • merge requests
  • programming languages used
  • CI usage
    … and more.

The output corpus is written in JSON, because the format is widely used and is compatible with Neo4j.
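Because the corpus is plain JSON, it can be inspected with standard tooling. The following is a minimal sketch that only reports the top-level structure of a JSON file. It makes no assumption about the corpus schema, which is described in the docs directory. The function name `summarize_json` and the file name `corpus.json` are hypothetical and are not part of this tool.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and describe its top-level structure.

    Hypothetical helper: it does not know the corpus schema and
    only reports whether the top level is an object or an array.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # For an object, list its keys in sorted order.
        return {"type": "object", "keys": sorted(data)}
    if isinstance(data, list):
        # For an array, report the number of elements.
        return {"type": "array", "length": len(data)}
    # Scalars (string, number, boolean, null) are reported by Python type name.
    return {"type": type(data).__name__}


if __name__ == "__main__":
    # "corpus.json" is an assumed output path, used here for illustration only.
    if Path("corpus.json").exists():
        print(summarize_json("corpus.json"))
```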

Install

We assume that you have installed Python >= 3.8 and a recent Git client.

Please follow these steps to install the required dependencies and to make the corpus command-line tool available:

git clone <URL of this Git repository> corpus
cd corpus
pip install --editable .

NOTE

  • The required dependencies to use this tool are listed in the install_requires section of the setup.cfg file.
  • The requirements.txt file additionally lists recommended development tools.

Usage

NOTE To use this tool, you first need to write a configuration file that describes the GitLab instance you want to run the tool against.

Here is an example:

[global]
default = gitlab-1
ssl_verify = true
timeout = 15

[gitlab-1]
url = https://gitlab.example.com
private_token = 123abc
api_version = 4
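The example above uses an INI-style layout, so it can be checked with Python's standard `configparser` module before running the tool. The following is a minimal sketch for validating such a file. It is not how the corpus tool itself reads its configuration, and the function name `read_default_instance` is hypothetical.

```python
import configparser

# The example configuration from this README, embedded for illustration.
EXAMPLE_CONFIG = """
[global]
default = gitlab-1
ssl_verify = true
timeout = 15

[gitlab-1]
url = https://gitlab.example.com
private_token = 123abc
api_version = 4
"""


def read_default_instance(config_text):
    """Return the settings of the instance named by [global] default.

    Hypothetical helper for sanity-checking a config file; the private
    token is deliberately left out of the returned dictionary.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    global_section = parser["global"]
    name = global_section["default"]
    instance = parser[name]  # raises KeyError if the section is missing
    return {
        "name": name,
        "url": instance["url"],
        "api_version": instance.get("api_version"),
        "ssl_verify": global_section.getboolean("ssl_verify"),
        "timeout": global_section.getint("timeout"),
    }


if __name__ == "__main__":
    print(read_default_instance(EXAMPLE_CONFIG))
```

A `KeyError` from this sketch indicates that the section named by `default` does not exist or that a required key such as `url` is missing.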

The tool can be run using the command corpus.

Running the command with the --help parameter, or without any parameter, prints the help page.

Documentation

The documentation is available in the docs directory.
