github-crawler

Extract GitHub repositories metadata and README content.

STEPS:

  1. Environment setup and package installation

Virtual env

```sh
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# when finished using
deactivate
```

Conda

```sh
conda env create -f conda.yaml
conda activate crawler
# when finished using
conda deactivate
```
  2. Update the .env file with the correct parameters

    cp .env.example .env
    code .env
  3. Run the following scripts:

    i. `python crawl_repos.py <topic-name> <stars-size>` to crawl all the repos with the given topic and at least `<stars-size>` stars. If `<stars-size>` is omitted, repos with 0+ stars are considered.

    ii. `python get_contributors.py` to crawl all the users who contributed to the repos crawled in step 3.i.

    iii. `python get_stargazers.py` to crawl all the users who starred the repos crawled in step 3.i.
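The topic and star threshold from step 3.i map naturally onto GitHub's repository search qualifiers. A minimal sketch of how such a query string could be built (the function name and its use here are illustrative assumptions, not the repo's actual code):

```python
def build_search_query(topic: str, min_stars: int = 0) -> str:
    """Build a GitHub repository search qualifier string for a
    given topic and a minimum star count (0 by default, matching
    the crawl_repos.py behavior when <stars-size> is omitted)."""
    return f"topic:{topic} stars:>={min_stars}"

# Illustrative usage; the resulting string would be passed as the
# `q` parameter of GitHub's repository search endpoint:
#   https://api.github.com/search/repositories?q=<query>
print(build_search_query("machine-learning", 100))
# topic:machine-learning stars:>=100
```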
