How to Dev and Use:

THE PROJECT IS FOR EDUCATIONAL PURPOSES ONLY. Anyone using this scraper is expected to adhere to GitHub regulations.

Components

Scraper

The scraper searches GitHub for repos that contain the wanted file at any depth.

~~Only the first batch of 1000 results will be returned.~~ The sample scraper shows a way to overcome this limitation.
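The README does not say how the sample scraper gets past the 1000-result cap. One common workaround, shown here only as a sketch and not necessarily what this repo does, is to split a single search into several narrower ones using the `size:` qualifier, so that each bucket stays under the cap. The function name and the bucket boundaries below are hypothetical.

```python
def size_partitioned_queries(filename, boundaries):
    """Yield one search query per file-size bucket (sizes in bytes).

    `boundaries` is an ascending list of upper bounds; a final open-ended
    bucket covers everything larger than the last bound.
    """
    lower = 0
    for upper in boundaries:
        yield f"filename:{filename} size:{lower}..{upper}"
        lower = upper + 1
    yield f"filename:{filename} size:>={lower}"


# Hypothetical boundaries; in practice they would be tuned so that
# every bucket returns fewer than 1000 hits.
queries = list(size_partitioned_queries("MLProject", [500, 2000]))
for q in queries:
    print(q)
```

Each generated query can then be paged through separately, and the results merged with duplicates removed.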

Example file target: MLProject

GitHub Query - filename:MLProject
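As a rough illustration of how this query maps onto the GitHub REST code search endpoint, the sketch below builds the URL and headers for one page of results without sending anything. The helper name `build_search_request` and the placeholder token are assumptions made for this example; the repo's own scrapers may structure the request differently.

```python
from urllib.parse import urlencode

SEARCH_CODE_URL = "https://api.github.com/search/code"


def build_search_request(filename, token, page=1, per_page=100):
    """Return (url, headers) for one page of a filename code search."""
    query = f"filename:{filename}"
    params = urlencode({"q": query, "per_page": per_page, "page": page})
    headers = {
        "Accept": "application/vnd.github+json",
        # Code search requires an authenticated request.
        "Authorization": f"token {token}",
    }
    return f"{SEARCH_CODE_URL}?{params}", headers


# "<your-token>" is a placeholder for a personal access token.
url, headers = build_search_request("MLProject", token="<your-token>")
print(url)
```

The returned pair can be passed to any HTTP client, for example `requests.get(url, headers=headers)`.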

Install

Install dependencies - pip install -r requirements.txt

Run

Rename sample_credentials.py to credentials.py after cloning, then fill in your GitHub personal access token.

Run main.py to filter through repos and paths.

Run miner_selenium.py to run the chrome-based selenium scraper.

Run miner_requests.py to run the GitHub v3 API scraper.

Run pickle_loader.py to see the first 1000 results collected.
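For readers who want to inspect collected results by hand, the sketch below loads a pickle file and keeps the first 1000 entries. It assumes the pickle holds a list-like collection; the file name `results.pickle`, the helper `load_first_results`, and the demo data are all hypothetical and are not taken from pickle_loader.py.

```python
import os
import pickle
import tempfile
from itertools import islice


def load_first_results(path, limit=1000):
    """Load a pickled iterable and return at most `limit` items as a list."""
    with open(path, "rb") as fh:
        results = pickle.load(fh)
    return list(islice(results, limit))


# Demo with a throwaway file standing in for real scraper output.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "results.pickle")
    with open(demo_path, "wb") as fh:
        pickle.dump([f"repo-{i}" for i in range(1500)], fh)
    first = load_first_results(demo_path)

print(len(first))
```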

Superskyyy/yet-another-github-miner
