Skip to content

Query git repositories with SQL. Generate reports, perform status checks, analyze codebases. πŸ” πŸ“Š

License

Notifications You must be signed in to change notification settings

Malyaka-Imran/askgit

Β 
Β 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Go Reference BuildStatus Go Report Card TODOs codecov

askgit AskGit Logo

askgit is a command-line tool for running SQL queries on git repositories. It's meant for ad-hoc querying of git repositories on disk through a common interface (SQL), as an alternative to patching together various shell commands. It can execute queries that look like:

-- how many commits have been authored by user@email.com?
SELECT count(*) FROM commits WHERE author_email = 'user@email.com'

You can try queries on public git repositories without installing anything at https://try.askgit.com/

More in-depth examples and documentation can be found below. Also checkout our newsletter to stay up to date with feature releases and interesting queries and use cases.

Installation

Homebrew

brew tap askgitdev/askgit
brew install askgit

Pre-Built Binaries

The latest releases should have pre-built binaries for Mac and Linux. You can download and add the askgit binary somewhere on your $PATH to use. libaskgit.so is also available to be loaded as a SQLite run-time extension.

Go

libgit2 is a build dependency (used via git2go) and must be available on your system for linking.

The following (long 😬) go install commands can be used to install a binary via the go toolchain.

On Mac:

CGO_CFLAGS=-DUSE_LIBSQLITE3 CGO_LDFLAGS=-Wl,-undefined,dynamic_lookup go install -tags="sqlite_vtable,vtable,sqlite_json1,static,system_libgit2" github.com/askgitdev/askgit@latest

On Linux:

CGO_CFLAGS=-DUSE_LIBSQLITE3 CGO_LDFLAGS=-Wl,--unresolved-symbols=ignore-in-object-files go install -tags="sqlite_vtable,vtable,sqlite_json1,static,system_libgit2" github.com/askgitdev/askgit@latest

See the Makefile for more context. Checking out this repository and running make in the root will produce two files in the .build directory:

  1. askgit - the CLI binary (which can then be moved into your $PATH for use)
  2. libaskgit.so - a shared object file SQLite extension that can be used by SQLite directly

Using Docker

Build an image locally using docker

docker build -t askgit:latest .

Or use an official image from docker hub

docker pull augmentable/askgit:latest

Running commands

askgit operates on a git repository. This repository needs to be attached as a volume. This example uses the (bash) built-in command pwd for the current working directory

[pwd] Print the absolute pathname of the current working directory.

docker run --rm -v `pwd`:/repo:ro augmentable/askgit "SELECT * FROM commits"

Running commands from STDIN

For piping commands via STDIN, the docker command needs to be told to run non-interactively, as well as attaching the repository at /repo.

cat query.sql | docker run --rm -i -v `pwd`:/repo:ro augmentable/askgit

Usage

askgit -h

Will output the most up to date usage instructions for your version of the CLI. Typically the first argument is a SQL query string:

askgit "SELECT * FROM commits"

Your current working directory will be used as the path to the git repository to query by default. Use the --repo flag to specify an alternate path, or even a remote repository reference (http(s) or ssh). askgit will clone the remote repository to a temporary directory before executing a query.

You can also pass a query in via stdin:

cat query.sql | askgit

By default, output will be an ASCII table. Use --format json or --format csv for alternatives. See -h for all the options.

Tables and Functions

Local Git Repository

The following tables access a git repository in the current directory by default. If the --repo flag is specified, they will use the path provided there instead. A parameter (usually the first) can also be provided to any of the tables below to override the default repo path. For instance, SELECT * FROM commits('https://github.com/askgitdev/askgit') will clone this repo to a temporary directory on disk and return its commits.

commits

Similar to git log, the commits table includes all commits in the history of the currently checked out commit.

Column Type
hash TEXT
message TEXT
author_name TEXT
author_email TEXT
author_when DATETIME
committer_name TEXT
committer_email TEXT
committer_when DATETIME
parents INT

Params:

  1. repository - path to a local (on disk) or remote (http(s)) repository
  2. rev - return commits starting at this revision (i.e. branch name or SHA), defaults to HEAD
-- return all commits starting at HEAD
SELECT * FROM commits

-- specify an alternative repo on disk
SELECT * FROM commits('/some/path/to/repo')

-- clone a remote repo and use it
SELECT * FROM commits('https://github.com/askgitdev/askgit')

-- use the default repo, but provide an alternate branch
SELECT * FROM commits('', 'some-ref')
refs
Column Type
name TEXT
type TEXT
remote TEXT
full_name TEXT
hash TEXT
target TEXT

Params:

  1. repository - path to a local (on disk) or remote (http(s)) repository
stats
Column Type
file_path TEXT
additions INT
deletions INT

Params:

  1. repository - path to a local (on disk) or remote (http(s)) repository
  2. rev - commit hash (or branch/tag name) to use for retrieving stats, defaults to HEAD
  3. to_rev - commit hash to calculate stats relative to
-- return stats of HEAD
SELECT * FROM stats

-- return stats of a specific commit
SELECT * FROM stats('', 'COMMIT_HASH')

-- return stats for every commit in the current history
SELECT commits.hash, stats.* FROM commits, stats('', commits.hash)
files
Column Type
path TEXT
executable BOOL
contents TEXT

Params:

  1. repository - path to a local (on disk) or remote (http(s)) repository
  2. rev - commit hash (or branch/tag name) to use for retrieving files in, defaults to HEAD
blame

Similar to git blame, the blame table includes blame information for all files in the current HEAD.

Column Type
line_no INT
commit_hash TEXT

Params:

  1. repository - path to a local (on disk) or remote (http(s)) repository
  2. rev - commit hash (or branch/tag name) to use for retrieving blame information from, defaults to HEAD
  3. file_path - path of file to blame

Utilities

JSON

The SQLite JSON1 extension is included for working with JSON data.

toml_to_json

Scalar function that converts toml to json.

SELECT toml_to_json('[some-toml]')

-- +-----------------------------+
-- | TOML_TO_JSON('[SOME-TOML]') |
-- +-----------------------------+
-- | {"some-toml":{}}            |
-- +-----------------------------+
xml_to_json

Scalar function that converts xml to json.

SELECT xml_to_json('<some-xml>hello</some-xml>')

-- +-------------------------------------------+
-- | XML_TO_JSON('<SOME-XML>HELLO</SOME-XML>') |
-- +-------------------------------------------+
-- | {"some-xml":"hello"}                      |
-- +-------------------------------------------+
yaml_to_json and yml_to_json

Scalar function that converts yaml to json.

SELECT yaml_to_json('hello: world')

-- +------------------------------+
-- | YAML_TO_JSON('HELLO: WORLD') |
-- +------------------------------+
-- | {"hello":"world"}            |
-- +------------------------------+
str_split

Helper for splitting strings on some separator.

SELECT str_split('hello,world', ',', 0)

-- +----------------------------------+
-- | STR_SPLIT('HELLO,WORLD', ',', 0) |
-- +----------------------------------+
-- | hello                            |
-- +----------------------------------+
SELECT str_split('hello,world', ',', 1)

-- +----------------------------------+
-- | STR_SPLIT('HELLO,WORLD', ',', 1) |
-- +----------------------------------+
-- | world                            |
-- +----------------------------------+

Enry Functions

Functions from the enry project are also available as SQL scalar functions

enry_detect_language

Supply a file path and some source code to detect the language.

SELECT enry_detect_language('some/path/to/file.go', '<contents of file>')
enry_is_binary

Given a blob, determine if it's a binary file or not (returns 1 or 0).

SELECT enry_is_binary('<contents of file>')
enry_is_configuration

Detect whether a file path is to a configuration file (returns 1 or 0).

SELECT enry_is_configuration('some/path/to/file/config.json')
enry_is_documentation

Detect whether a file path is to a documentation file (returns 1 or 0).

SELECT enry_is_documentation('some/path/to/file/README.md')
enry_is_dot_file

Detect whether a file path is to a dot file (returns 1 or 0).

SELECT enry_is_dot_file('some/path/to/file/.gitignore')
enry_is_generated

Detect whether a file path is generated (returns 1 or 0).

SELECT enry_is_generated('some/path/to/file/generated.go', '<contents of file>')
enry_is_image

Detect whether a file path is to an image (returns 1 or 0).

SELECT enry_is_image('some/path/to/file/image.png')
enry_is_test

Detect whether a file path is to a test file (returns 1 or 0).

SELECT enry_is_test('some/path/to/file/image.png')
enry_is_vendor

Detect whether a file path is to a vendored file (returns 1 or 0).

SELECT enry_is_vendor('vendor/file.go')

GitHub API

You can use askgit to query the GitHub API (v4). Constraints in your SQL query are pushed to the GitHub API as much as possible. For instance, if your query includes an ORDER BY clause and if items can be ordered in the GitHub API response (on the specified column), your query can avoid doing a full table scan and rely on the ordering returned by the API.

Authenticating

You must provide an authentication token in order to use the GitHub API tables. You can create a personal access token following these instructions. askgit will look for a GITHUB_TOKEN environment variable when executing, to use for authentication. This is also true if running as a runtime loadable extension.

github_stargazers

Table-valued-function that returns a list of users who have starred a repository.

Column Type
login TEXT
email TEXT
name TEXT
bio TEXT
company TEXT
avatar_url TEXT
created_at TEXT
updated_at TEXT
twitter TEXT
website TEXT
location TEXT
starred_at TEXT

Params:

  1. fullNameOrOwner - either the full repo name askgitdev/askgit or just the owner askgit (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
SELECT * FROM github_stargazers('askgitdev', 'askgit');
SELECT * FROM github_stargazers('askgitdev/askgit'); -- both are equivalent
github_starred_repos

Table-valued-function that returns a list of repositories a user has starred.

Column Type
name TEXT
url TEXT
description TEXT
created_at TEXT
pushed_at TEXT
updated_at TEXT
stargazer_count INT
name_with_owner TEXT
starred_at TEXT

Params:

  1. login - the login of a GitHub user
SELECT * FROM github_starred_repos('patrickdevivo')
github_stargazer_count

Scalar function that returns the number of stars a GitHub repository has.

Params:

  1. fullNameOrOwner - either the full repo name askgitdev/askgit or just the owner askgit (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
SELECT github_stargazer_count('askgitdev', 'askgit');
SELECT github_stargazer_count('askgitdev/askgit'); -- both are equivalent
github_user_repos and github_org_repos

Table-valued function that returns all the repositories belonging to a user or an organization.

Column Type
created_at TEXT
database_id INT
default_branch_ref_name TEXT
default_branch_ref_prefix TEXT
description TEXT
disk_usage INT
fork_count INT
homepage_url TEXT
is_archived INT
is_disabled INT
is_fork INT
is_mirror INT
is_private INT
issue_count INT
latest_release_author TEXT
latest_release_created_at TEXT
latest_release_name TEXT
latest_release_published_at TEXT
license_key TEXT
license_name TEXT
name TEXT
open_graph_image_url TEXT
primary_language TEXT
pull_request_count INT
pushed_at TEXT
release_count INT
stargazer_count TEXT
updated_at TEXT
watcher_count INT

Params:

  1. login - the login of a GitHub user or organization
SELECT * FROM github_user_repos('patrickdevivo')
SELECT * FROM github_org_repos('askgitdev')
github_repo_issues

Table-valued-function that returns all the issues of a GitHub repository.

Column Type
owner TEXT
reponame TEXT
author_login TEXT
author_url TEXT
body TEXT
body_text TEXT
closed INT
closed_at TEXT
comment_count INT
created_at TEXT
created_via_email INT
database_id TEXT
editor_login TEXT
editor_url TEXT
includes_created_edit INT
is_read_by_viewer INT
label_count INT
last_edited_at TEXT
locked INT
milestone_count INT
milestone_progress FLOAT
issue_number INT
participant_count INT
published_at TEXT
reaction_count INT
state TEXT
title TEXT
updated_at TEXT
url TEXT
user_edits_count INT
viewer_can_react INT
viewer_can_subscribe INT
viewer_can_update INT
viewer_did_author INT
viewer_subscription TEXT

Params:

  1. fullNameOrOwner - either the full repo name askgitdev/askgit or just the owner askgit (which would require the second argument)
  2. name - optional if the first argument is a "full" name, otherwise required - the name of the repo
SELECT * FROM github_repo_issues('askgitdev/askgit');
SELECT * FROM github_repo_issues('askgitdev', 'askgit'); -- both are equivalent

Example Queries

This will return all commits in the history of the currently checked out branch/commit of the repo.

SELECT * FROM commits

Return the (de-duplicated) email addresses of commit authors:

SELECT DISTINCT author_email FROM commits

Return the commit counts of every author (by email):

SELECT author_email, count(*) FROM commits GROUP BY author_email ORDER BY count(*) DESC

Same as above, but excluding merge commits:

SELECT author_email, count(*) FROM commits WHERE parents < 2 GROUP BY author_email ORDER BY count(*) DESC

Outputs the set of files in the current tree:

SELECT * FROM files

Returns author emails with lines added/removed, ordered by total number of commits in the history (excluding merges):

SELECT count(DISTINCT commits.hash) AS commits, SUM(additions) AS additions, SUM(deletions) AS deletions, author_email
FROM commits LEFT JOIN stats('', commits.hash)
WHERE commits.parents < 2
GROUP BY author_email ORDER BY commits

Returns commit counts by author, broken out by day of the week:

SELECT
    count(*) AS commits,
    count(CASE WHEN strftime('%w',author_when)='0' THEN 1 END) AS sunday,
    count(CASE WHEN strftime('%w',author_when)='1' THEN 1 END) AS monday,
    count(CASE WHEN strftime('%w',author_when)='2' THEN 1 END) AS tuesday,
    count(CASE WHEN strftime('%w',author_when)='3' THEN 1 END) AS wednesday,
    count(CASE WHEN strftime('%w',author_when)='4' THEN 1 END) AS thursday,
    count(CASE WHEN strftime('%w',author_when)='5' THEN 1 END) AS friday,
    count(CASE WHEN strftime('%w',author_when)='6' THEN 1 END) AS saturday,
    author_email
FROM commits GROUP BY author_email ORDER BY commits

Exporting

You can use the askgit export sub command to save the output of queries into a sqlite database file. The command expects a path to a db file (which will be created if it doesn't already exist) and a variable number of "export pairs," specified by the -e flag. Each pair represents the name of a table to create and a query to generate its contents.

askgit export my-export-file -e commits -e "SELECT * FROM commits" -e files -e "SELECT * FROM files"

This can be useful if you're looking to use another tool to examine the data emitted by askgit. Since the exported file is a plain SQLite database, queries should be much faster (as the original git repository is no longer traversed) and you should be able to use any tool that supports querying SQLite database files.

About

Query git repositories with SQL. Generate reports, perform status checks, analyze codebases. πŸ” πŸ“Š

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Languages

  • Go 98.1%
  • Makefile 1.6%
  • Other 0.3%