```
├── targets
│ ├── Target
│ │ ├── github-users.txt # User accounts collected from multiple sources
│ │ ├── github-repos.txt # GitHub repositories owned by the collected users
│ │ ├── github-repos-shell.txt # GitHub repositories that use `Shell` as a primary language - according to our statistics, these are the most likely to expose secrets
│ │ ├── github-raw.json # JSON file containing all users/repos data
│ │ └── README.md # Markdown file containing multiple statistics describing the collected data
```
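The tree does not spell out the file formats, so the following Python sketch of consuming a target's output makes two assumptions: each `.txt` file holds one entry per line, and entries in `github-repos.txt` look like `owner/repo`. `github-raw.json` is left out because its schema is not described here. The helper names `parse_entries`, `read_entries`, and `repos_per_owner` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path
from typing import Iterable


def parse_entries(text: str) -> list[str]:
    """Split file contents into entries, assuming one entry per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_entries(path: Path) -> list[str]:
    """Read a file such as github-users.txt or github-repos.txt."""
    return parse_entries(path.read_text(encoding="utf-8"))


def repos_per_owner(repos: Iterable[str]) -> Counter[str]:
    """Count repositories per owner, assuming entries of the form 'owner/repo'."""
    counts: Counter[str] = Counter()
    for entry in repos:
        owner, sep, name = entry.partition("/")
        if sep and owner and name:  # skip entries that are not owner/repo
            # GitHub logins are case-insensitive, so fold case before counting.
            counts[owner.lower()] += 1
    return counts
```

For example, `repos_per_owner(read_entries(Path("targets/Target/github-repos.txt"))).most_common(10)` would list the ten owners with the most repositories, where `Target` stands for an actual target directory.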
A CVEDB workflow collects a list of targets, enumerates their employees, collects their public GitHub data, cleans it up, and pushes it to this repository. The workflow performs the following steps:
- Get the initial list of target names from ProjectDiscovery's Chaos dataset (Thanks, ProjectDiscovery!)
- Use a slightly modified version of CrossLinked to collect employee names and usernames from LinkedIn (Thanks, m8r0wn!)
- Generate username permutations based on the collected names/usernames.
- Enumerate public GitHub organization members using the GitHub CLI (Thanks, GitHub!)
- Merge the collected potential usernames and pass them to our own enumerepo, which validates the usernames and enumerates their public repositories.
- Pass all of the collected orgs/usernames/repos/gists to TruffleHog to find exposed secrets/credentials (Thanks, Truffle Security!), as highlighted above in the Secrets workflow. Note that the results of this step are not pushed to this repository for obvious reasons. They are only accessible to our users, who can edit/customize this workflow to view the secrets, receive notifications about new ones, or export them using one of our integrations.
- Finally, parse and organize the collected data and push it here (except for the Secrets part).
- Run the workflow on a regular schedule to keep the data up to date at all times.
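The username generation and merge steps above can be sketched in Python as follows. This is not the workflow's actual logic: the permutation patterns in the hypothetical `username_permutations` are illustrative assumptions, and `merge_candidates` only checks GitHub's login syntax (alphanumerics or single hyphens, no leading or trailing hyphen, at most 39 characters). Checking whether an account actually exists is left to enumerepo in the real workflow.

```python
from __future__ import annotations

import re
import unicodedata
from typing import Iterable

# GitHub login syntax: alphanumerics or single hyphens, no leading/trailing
# hyphen, 39 characters at most. Used with fullmatch, so no anchors needed.
GITHUB_LOGIN = re.compile(r"[a-z\d](?:[a-z\d]|-(?=[a-z\d])){0,38}", re.IGNORECASE)


def normalize(part: str) -> str:
    """Lowercase, strip accents, and drop characters a GitHub login cannot contain."""
    ascii_part = unicodedata.normalize("NFKD", part).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]", "", ascii_part.lower())


def username_permutations(first: str, last: str) -> set[str]:
    """Hypothetical permutation patterns for one employee's first and last name."""
    first_n, last_n = normalize(first), normalize(last)
    if not first_n or not last_n:
        # Bare first or last names alone would add too many false positives.
        return set()
    return {
        first_n + last_n, f"{first_n}-{last_n}",
        last_n + first_n, f"{last_n}-{first_n}",
        first_n[0] + last_n, first_n + last_n[0], f"{first_n[0]}-{last_n}",
    }


def merge_candidates(*sources: Iterable[str]) -> list[str]:
    """Merge candidate lists case-insensitively, keeping only syntactically valid logins."""
    merged = {c.strip().lower() for source in sources for c in source}
    return sorted(c for c in merged if GITHUB_LOGIN.fullmatch(c))
```

A call such as `merge_candidates(username_permutations("Jane", "Doe"), linkedin_usernames)`, with `linkedin_usernames` being a hypothetical list of scraped handles, yields one deduplicated candidate list. Every pattern added here widens coverage but also raises the chance of matching an unrelated account.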
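The organization-member and TruffleHog steps above can be sketched in Python by building the CLI invocations explicitly. This is an illustration under assumptions, not the workflow's real code: it assumes `gh` and `trufflehog` are installed, uses the GitHub REST endpoint `orgs/{org}/public_members` through `gh api`, and only covers orgs and `owner/repo` entries (gists are not handled). The function names are hypothetical, and `run` is injectable so the parsing can be exercised without network access.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Iterable


def gh_public_members_cmd(org: str) -> list[str]:
    """GitHub CLI call that prints an org's public members, one login per line."""
    return ["gh", "api", f"orgs/{org}/public_members", "--paginate", "--jq", ".[].login"]


def trufflehog_cmds(orgs: Iterable[str], repos: Iterable[str]) -> list[list[str]]:
    """One TruffleHog GitHub scan per org and per 'owner/repo' entry."""
    cmds = [["trufflehog", "github", f"--org={org}", "--only-verified", "--json"] for org in orgs]
    cmds += [
        ["trufflehog", "github", f"--repo=https://github.com/{repo}", "--only-verified", "--json"]
        for repo in repos
    ]
    return cmds


def list_public_members(
    org: str, run: Callable[..., subprocess.CompletedProcess] = subprocess.run
) -> list[str]:
    """Run the gh command and return the logins it prints."""
    result = run(gh_public_members_cmd(org), capture_output=True, text=True, check=True)
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Keeping command construction separate from execution makes it easy to log or review exactly what will be scanned before any scan runs.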
Note: The username generation process consists of multiple steps to maximize coverage, but this can also lead to a few false positives. We carefully designed the workflow (and continue to develop it) to make the results as accurate as possible, but please verify the validity of this data before taking action on it.
All contributions/ideas/suggestions are welcome! If you want to add/edit a target/workflow, feel free to open a GitHub issue, tweet at us @cvedb, or join the conversation on Discord.
We believe in the value of tinkering. Sign up for a demo on cvedb.github.io to customize this workflow to your use case, get access to many more workflows, or build your own from scratch!