github-crawler

Extract GitHub repositories metadata and README content.

STEPS:

  1. Environment setup and package installation

Virtual env

```sh
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# when finished using
deactivate
```

Conda

```sh
conda env create -f conda.yaml
conda activate crawler
# when finished using
conda deactivate
```
  2. Update the .env file with the correct parameters

    cp .env.example .env
    code .env
  3. Run the following scripts:

    i. `python crawl_repos.py <topic-name> <stars-size>` to crawl all the repos with the given topic and at least `<stars-size>` stars. If `<stars-size>` is omitted, repos with 0+ stars are considered.

    ii. `python get_contributors.py` to crawl all the users who contributed to the repos crawled in step 3.i.

    iii. `python get_stargazers.py` to crawl all the users who starred the repos crawled in step 3.i.
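The topic and star threshold from step 3.i map naturally onto GitHub's repository search qualifiers. A minimal sketch of how such a query string could be built (the function name and its use here are illustrative assumptions, not the repo's actual code):

```python
def build_search_query(topic: str, min_stars: int = 0) -> str:
    """Build a GitHub repository search qualifier string for a
    given topic and a minimum star count (0 by default, matching
    the crawl_repos.py behavior when <stars-size> is omitted)."""
    return f"topic:{topic} stars:>={min_stars}"

# Illustrative usage; the resulting string would be passed as the
# `q` parameter of GitHub's repository search endpoint:
#   https://api.github.com/search/repositories?q=<query>
print(build_search_query("machine-learning", 100))
# topic:machine-learning stars:>=100
```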
