ajnavarro/gitbase-spark-connector

gitbase-spark-connector

gitbase-spark-connector is a Scala library that lets you expose gitbase tables as Spark SQL DataFrames, so you can run scalable analysis and processing pipelines on source code.

Pre-requisites

Import as a dependency

For the moment, the library is served through JitPack; see the project's JitPack page for examples of how to import it into your project.

Usage

First of all, you'll need a gitbase instance running. It will expose your repositories through a SQL interface.

docker run -d --name gitbase -p 3306:3306 -v /path/to/repos/directory:/opt/repos srcd/gitbase:v0.17.0

Note that you must replace /path/to/repos/directory with the actual path where your git repositories are located.

Some operations on UASTs also need a bblfsh server:

docker run -d --name bblfshd --privileged -p 9432:9432 bblfsh/bblfshd:v2.9.1-drivers

You can tell the connector where gitbase and bblfsh are listening through the following environment variables:

  • BBLFSH_HOST (default: "0.0.0.0")
  • BBLFSH_PORT (default: "9432")
  • GITBASE_SERVERS (default: "0.0.0.0:3306")
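As an illustration of how these variables and their defaults fit together, here is a minimal sketch that resolves them from the process environment. The names ConnectorSettings and SettingsResolver are hypothetical and are not part of the connector; only the variable names and default values come from the list above.

```scala
// Hypothetical sketch: ConnectorSettings and SettingsResolver are illustrative
// names, not the connector's API. Only the variable names and defaults are
// taken from the README.
final case class ConnectorSettings(
  bblfshHost: String,
  bblfshPort: Int,
  gitbaseServers: String // kept as the raw string; its format is not specified here
)

object SettingsResolver {
  // Read each variable from `env`, falling back to the documented default.
  def fromEnv(env: Map[String, String] = sys.env): ConnectorSettings =
    ConnectorSettings(
      bblfshHost = env.getOrElse("BBLFSH_HOST", "0.0.0.0"),
      bblfshPort = env.getOrElse("BBLFSH_PORT", "9432").toInt,
      gitbaseServers = env.getOrElse("GITBASE_SERVERS", "0.0.0.0:3306")
    )
}
```

Calling SettingsResolver.fromEnv() with no arguments reads the real environment, so exporting, for example, GITBASE_SERVERS before starting the application would change the resolved value. Passing an explicit Map makes the lookup easy to exercise in tests.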

Finally, add the gitbase DataSource and its configuration by calling registerGitbaseSource() on the Spark session builder:

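import org.apache.spark.sql.SparkSession
// Provides the registerGitbaseSource() extension used on the builder below
import tech.sourced.gitbase.spark.util.GitbaseSessionBuilder

val spark = SparkSession.builder().appName("test")
  .master("local[*]")
  .config("spark.driver.host", "localhost")
  .registerGitbaseSource()
  .getOrCreate()

// gitbase tables are available by name once the source is registered
val refs = spark.table("ref_commits")
val commits = spark.table("commits")

// Join each reference with its commits and keep only the entry at
// history_index 0 of each reference's history
val df = refs
  .join(commits, Seq("repository_id", "commit_hash"))
  .filter(refs("history_index") === 0)

// show(false) prints the columns without truncating long values
df.select("ref_name", "commit_hash", "committer_when").show(false)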

Output:

+-------------------------------------------------------------------------------+----------------------------------------+-------------------+
|ref_name                                                                       |commit_hash                             |committer_when     |
+-------------------------------------------------------------------------------+----------------------------------------+-------------------+
|refs/heads/HEAD/015dcc49-9049-b00c-ba72-b6f5fa98cbe7                           |fff7062de8474d10a67d417ccea87ba6f58ca81d|2015-07-28 08:39:11|
|refs/heads/HEAD/015dcc49-90e6-34f2-ac03-df879ee269f3                           |fff7062de8474d10a67d417ccea87ba6f58ca81d|2015-07-28 08:39:11|
|refs/heads/develop/015dcc49-9049-b00c-ba72-b6f5fa98cbe7                        |880653c14945dbbc915f1145561ed3df3ebaf168|2015-08-19 01:02:38|
|refs/heads/HEAD/015da2f4-6d89-7ec8-5ac9-a38329ea875b                           |dbfab055c70379219cbcf422f05316fdf4e1aed3|2008-02-01 16:42:40|
+-------------------------------------------------------------------------------+----------------------------------------+-------------------+
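The query above joins ref_commits with commits on repository_id and commit_hash and keeps the rows where history_index is 0. For readers who prefer SQL, the same join could probably be written as below. This is a sketch that assumes the tables resolved by spark.table are equally visible to spark.sql; HeadCommits is a hypothetical helper name, not part of the connector.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not part of gitbase-spark-connector): the same join
// and filter as the DataFrame example, expressed as a SQL string.
object HeadCommits {
  def query(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT r.ref_name, r.commit_hash, c.committer_when
        |FROM ref_commits r
        |JOIN commits c
        |  ON r.repository_id = c.repository_id
        | AND r.commit_hash = c.commit_hash
        |WHERE r.history_index = 0""".stripMargin)
}
```

With a session built as shown earlier, HeadCommits.query(spark).show(false) should print the same three columns as the DataFrame version.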
