tap-github is a Singer tap for GitHub.
Built with the Singer SDK.
pipx install git+https://github.com/MeltanoLabs/tap-github.git
Or better yet, please pin to a release version for a stable experience:
pipx install git+https://github.com/MeltanoLabs/tap-github.git@vX.Y.Z
A list of release versions is available at https://github.com/MeltanoLabs/tap-github/releases
This tap accepts the following configuration options:
- Required: One and only one of the following modes:
  1. repositories: an array of strings specifying the GitHub repositories to be included. Each element of the array should be of the form owner/repository, e.g. MeltanoLabs/tap-github.
  2. organizations: an array of strings containing the GitHub organizations to be included.
  3. searches: an array of search descriptor objects with the following properties:
     - name: a human-readable name for the search query
     - query: a GitHub search string (generally the same as would come after ?q= in the URL)
  4. user_usernames: a list of GitHub usernames
  5. user_ids: a list of GitHub user IDs (integers)
- Highly recommended:
  auth_token - GitHub token to authenticate with.
  additional_auth_tokens - List of GitHub tokens to authenticate with. Streams will loop through them when hitting rate limits.
  - Alternatively, you can supply authentication tokens through any environment variables whose names start with GITHUB_TOKEN.
  - Or authenticate as a GitHub App by setting a private key in GITHUB_APP_PRIVATE_KEY, formatted as follows: :app_id:;;-----BEGIN RSA PRIVATE KEY-----\n_YOUR_P_KEY_\n-----END RSA PRIVATE KEY-----. You can generate it from the Private keys section on https://github.com/organizations/:organization_name/settings/apps/:app_name. Read more about GitHub App quotas here.
  rate_limit_buffer - A buffer to avoid consuming all query points for the auth_token at hand. Defaults to 1000.
Note that modes 1-3 are repository modes and 4-5 are user modes and will not run the same set of streams.
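The "one and only one mode" rule can be checked before handing a config to the tap. The following is a minimal sketch, not the tap's own validation: `detect_mode` is a hypothetical helper, and the search name, query, and token in `example_config` are made-up placeholder values.

```python
# Mode keys as listed above: the first three are repository modes,
# the last two are user modes.
REPOSITORY_MODES = ("repositories", "organizations", "searches")
USER_MODES = ("user_usernames", "user_ids")


def detect_mode(config: dict) -> str:
    """Return the single mode key present in config, or raise ValueError.

    Hypothetical helper for illustration; not part of tap-github.
    """
    present = [key for key in REPOSITORY_MODES + USER_MODES if config.get(key)]
    if len(present) != 1:
        raise ValueError(f"expected exactly one mode, found: {present or 'none'}")
    return present[0]


# Example config using the "searches" mode. The name and query are invented
# for illustration, and the token is a placeholder.
example_config = {
    "searches": [
        {"name": "python singer projects", "query": "language:python topic:singer"}
    ],
    "auth_token": "<your-token>",
    "rate_limit_buffer": 1000,
}
```

Serializing `example_config` with `json.dump` would give a file that can be passed to the tap via `--config`, assuming the remaining settings are left at their defaults.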
A full list of supported settings and capabilities for this tap is available by running:
tap-github --about
A small number of records may be pulled without an auth token. However, a GitHub auth token should generally be considered "required" since it gives more realistic rate limits. (See GitHub API docs for more info.)
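To make the token options concrete, here is a rough sketch of how tokens could be gathered and rotated. It illustrates the behaviour described above under stated assumptions and is not the tap's actual implementation: `collect_tokens` and `pick_token` are hypothetical names, and the remaining-quota numbers would have to come from GitHub's rate limit responses.

```python
import os


def collect_tokens(config: dict, environ=os.environ) -> list:
    """Gather candidate tokens from config and GITHUB_TOKEN* variables."""
    tokens = []
    if config.get("auth_token"):
        tokens.append(config["auth_token"])
    tokens.extend(config.get("additional_auth_tokens", []))
    # Any environment variable whose name starts with GITHUB_TOKEN counts.
    for name, value in sorted(environ.items()):
        if name.startswith("GITHUB_TOKEN") and value:
            tokens.append(value)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(tokens))


def pick_token(remaining_by_token: dict, rate_limit_buffer: int = 1000):
    """Return the first token whose remaining quota exceeds the buffer.

    Returns None when every token is at or below the buffer.
    """
    for token, remaining in remaining_by_token.items():
        if remaining > rate_limit_buffer:
            return token
    return None
```

Keeping a buffer rather than draining a token to zero leaves headroom for other clients that share the same token.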
The GitHub API is limited for some resources such as /events. For some resources, users might encounter the following error:
In order to keep the API fast for everyone, pagination is limited for this resource. Check the rel=last link relation in the Link response header to see how far back you can traverse.
To avoid this, the GitHub streams exit early, i.e. when no next page is available. If you are fetching /events at the repository level, beware of leaving the tap disabled for longer than a few days, or you will have gaps in your data.
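The early-exit behaviour can be pictured as following the `rel="next"` link in the `Link` response header until it disappears. The sketch below shows that general technique only; it is not taken from the tap's source, and `fetch` is a hypothetical callable that returns a page of records together with the raw `Link` header value.

```python
import re
from typing import Optional

# Matches entries such as: <https://api.github.com/...?page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Link header, or None if absent."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def paginate(fetch, first_url: str):
    """Yield pages until the server stops advertising a next page.

    `fetch` is a hypothetical callable: url -> (records, link_header).
    """
    url = first_url
    while url:
        records, link_header = fetch(url)
        yield records
        url = next_page_url(link_header)
```

Stopping when the header no longer offers a next page avoids requesting pages beyond the limit, which is what triggers the error quoted above.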
You can easily run tap-github by itself or in a pipeline using Meltano.
- For the traffic_* streams, you will need write access to the repository. You can enable extraction for these streams by selecting them in the catalog.
tap-github --config CONFIG --discover > ./catalog.json
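Selecting streams in the discovered catalog can be scripted. This is a small sketch that assumes the catalog follows the usual Singer layout, where each stream carries a `metadata` list and the entry with an empty `breadcrumb` holds stream-level settings such as `selected`. `select_streams` and `rewrite_catalog` are hypothetical helpers, not part of the tap.

```python
import json


def select_streams(catalog: dict, prefix: str = "traffic_") -> list:
    """Mark streams whose ID starts with prefix as selected; return their IDs."""
    selected = []
    for stream in catalog.get("streams", []):
        if not stream.get("tap_stream_id", "").startswith(prefix):
            continue
        metadata = stream.setdefault("metadata", [])
        # Stream-level metadata is the entry with an empty breadcrumb.
        entry = next((m for m in metadata if m.get("breadcrumb") == []), None)
        if entry is None:
            entry = {"breadcrumb": [], "metadata": {}}
            metadata.append(entry)
        entry.setdefault("metadata", {})["selected"] = True
        selected.append(stream["tap_stream_id"])
    return selected


def rewrite_catalog(path: str = "catalog.json", prefix: str = "traffic_") -> list:
    """Load a catalog file, select matching streams, and write it back."""
    with open(path) as f:
        catalog = json.load(f)
    selected = select_streams(catalog, prefix)
    with open(path, "w") as f:
        json.dump(catalog, f, indent=2)
    return selected
```

After calling `rewrite_catalog()`, the edited file can be passed back to the tap with `--catalog ./catalog.json`, assuming the standard Singer SDK command-line options.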
This project uses parent-child streams. Learn more about them here.
pipx install poetry
Create tests within the tap_github/tests subfolder and then run:
poetry run pytest
You can also test the tap-github CLI interface directly using poetry run:
poetry run tap-github --help
Testing with Meltano
Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.
Your project comes with a custom meltano.yml project file already created. Open meltano.yml and follow any "TODO" items listed in the file.
Next, install Meltano (if you haven't already) and any needed plugins:
# Install meltano
pipx install meltano
# Initialize meltano within this directory
meltano install
Now you can test and orchestrate using Meltano:
# Test invocation:
meltano invoke tap-github --version
# OR run a test `elt` pipeline:
meltano elt tap-github target-jsonl
One-liner to recreate output directory, run elt, and write out state file:
# Update this when you want a fresh state file:
TESTJOB=testjob1  # example job ID; any unique name works
# Run everything in one line
mkdir -p .output && meltano elt tap-github target-jsonl --job_id $TESTJOB && meltano elt tap-github target-jsonl --job_id $TESTJOB --dump=state > .output/state.json
See the dev guide for more instructions on how to use the Singer SDK to develop your own taps and targets.