This Python script is a proof of concept (POC) web scraper that demonstrates the risks of carelessly committing secrets to GitHub. It lets you search public GitHub repositories for commits that contain sensitive credentials.
Before using this script, make sure you have Python installed on your system.
Clone this repository to your local machine:
git clone https://github.com/Amit-Katz/github-credentials-scraper.git
cd github-credentials-scraper
Install the required Python packages using pip:
pip install -r requirements.txt
The script accepts the following command-line arguments:
- --query, -q (optional): A list of commit messages to search for. The default queries are ["deleted .env", "delete .env", "hide .env"].
- --terms, -t (optional): A list of terms to search for in the commit messages. The default term is "mongodb".
- --output, -o (optional): Path to the output directory where the results will be saved.
- --verbose, -v (optional): Enable verbose mode for more detailed output.
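For readers who want to see how a command-line interface like this can be declared, here is a minimal sketch using Python's standard argparse module. It is not taken from scraper.py: the build_parser function, the DEFAULT_QUERIES and DEFAULT_TERMS constants, the use of nargs="+" for the list-valued options, and the None default for --output are all assumptions made for illustration. Only the flag names and default values come from the list above.

```python
import argparse

# Defaults as documented above; the constant names are hypothetical.
DEFAULT_QUERIES = ["deleted .env", "delete .env", "hide .env"]
DEFAULT_TERMS = ["mongodb"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Search public GitHub commits for leaked credentials (POC)."
    )
    # nargs="+" collects one or more values into a list (assumed behavior).
    parser.add_argument(
        "--query", "-q", nargs="+", default=DEFAULT_QUERIES,
        help="Commit messages to search for.",
    )
    parser.add_argument(
        "--terms", "-t", nargs="+", default=DEFAULT_TERMS,
        help="Terms to look for in the commit messages.",
    )
    # Assumption: no output directory means results are not written to disk.
    parser.add_argument(
        "--output", "-o", default=None,
        help="Directory where results will be saved.",
    )
    # store_true makes --verbose a boolean switch that defaults to False.
    parser.add_argument(
        "--verbose", "-v", action="store_true",
        help="Enable more detailed output.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.query, args.terms, args.output, args.verbose)
```

With a declaration like this, every option stays optional, and the short and long forms of each flag are interchangeable.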
To search for the default commit messages and terms, simply run:
python scraper.py
You can specify custom queries and terms using the --query and --terms options. For example:
python scraper.py --query "add secret key" "remove password" --terms "api_key" "password"
To save the results to a directory, use the --output option:
python scraper.py --output results
Enable verbose mode to see detailed output:
python scraper.py --verbose
This script is intended for educational purposes only and should not be used to violate GitHub's terms of service or any applicable laws. Always obtain proper authorization before scraping or accessing any website or service.
This project is licensed under the MIT License - see the LICENSE file for details.