Babelfish documentation (GitBook)


Babelfish - Universal Code Parser

Introduction

Babelfish is a self-hosted server for source code parsing. The Babelfish service can parse any file, in any supported language, extracting an Abstract Syntax Tree (AST) from it and converting it into a Universal Abstract Syntax Tree (UAST). The UAST enables further analysis and transformations with either the included tools or your own tools by providing a standard open format. Jump to the Getting Started section to start using it!
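
To make the AST-to-UAST idea concrete, here is a minimal sketch of normalizing two language-specific trees into one shared shape. Everything in it is an assumption made for illustration: the dict-based node layout, the `ROLE_MAP` table, the role names, and the `to_universal` helper are hypothetical and do not reflect Babelfish's actual drivers or UAST schema.

```python
# Toy illustration only: node shapes and role names are hypothetical,
# not Babelfish's real AST or UAST format.

# Hypothetical mapping from language-specific node types to shared roles.
ROLE_MAP = {
    "python": {"Import": "Import", "FunctionDef": "Function", "For": "Loop"},
    "java": {
        "ImportDeclaration": "Import",
        "MethodDeclaration": "Function",
        "ForStatement": "Loop",
    },
}


def to_universal(native_node, language):
    """Recursively convert a native AST node (a dict) into a uniform node."""
    native_type = native_node["type"]
    return {
        # Node types without a known mapping get a placeholder role.
        "role": ROLE_MAP[language].get(native_type, "Unannotated"),
        "native_type": native_type,
        "token": native_node.get("token"),
        "children": [
            to_universal(child, language)
            for child in native_node.get("children", [])
        ],
    }


# Two tiny hand-written "native" trees, one per language.
python_ast = {"type": "Module", "children": [{"type": "Import", "token": "os"}]}
java_ast = {
    "type": "CompilationUnit",
    "children": [{"type": "ImportDeclaration", "token": "java.util.List"}],
}

py_uast = to_universal(python_ast, "python")
java_uast = to_universal(java_ast, "java")
```

After conversion, the import node in both trees carries the same `role`, so a tool written against the uniform shape does not need to know which language produced it.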

Motivation & Scope

Babelfish was created as a solution for large-scale code analysis: analyzing the source code of millions of repositories, at each revision.

The current scope is to enable parsing of single files in any popular programming language and producing a Universal Abstract Syntax Tree (UAST).

This scope is expected to expand in the near future to full project analysis, where source code can be analyzed with its full context rather than one file at a time.

For more information about how Babelfish compares to similar systems, see the alternatives page (alternatives.md).

Use Cases

Some of the use cases that we aim to support with UAST are:

  • Feature extraction for Machine Learning on Code: For example, extracting a list of all tokens for every file, or a list of all function calls.
  • Language-agnostic static analysis: Making it easy to write static analyzers in any language that analyze any supported language.
  • UAST diffs: Understanding changes made to code at a finer granularity. Is this commit changing variable names? Is it adding a loop?
  • Uniform import extraction: Extracting all imports from every language in a uniform way.
  • Statistical analysis of language features: For example, how many people use for-comprehensions in Python?
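
Several of the use cases above reduce to walking a tree and filtering nodes by role. The sketch below shows that pattern on a hand-written toy tree. The node layout (`roles`, `token`, `children`), the role names, and the helper functions are all hypothetical and chosen for illustration; they are not the Babelfish client API or the real UAST schema.

```python
from collections import Counter

# Toy UAST-like tree; the structure and role names are hypothetical.
tree = {
    "roles": ["File"],
    "token": None,
    "children": [
        {"roles": ["Import"], "token": "os"},
        {"roles": ["Import"], "token": "sys"},
        {
            "roles": ["Function", "Declaration"],
            "token": "main",
            "children": [
                {
                    "roles": ["Loop"],
                    "token": "for",
                    "children": [{"roles": ["Call"], "token": "print"}],
                }
            ],
        },
    ],
}


def walk(node):
    """Yield every node in the tree in pre-order."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def tokens_with_role(root, role):
    """Collect the tokens of all nodes annotated with the given role."""
    return [n["token"] for n in walk(root) if role in n["roles"]]


def role_histogram(root):
    """Count how often each role occurs, e.g. for language-feature statistics."""
    return Counter(role for n in walk(root) for role in n["roles"])


imports = tokens_with_role(tree, "Import")  # uniform import extraction
calls = tokens_with_role(tree, "Call")      # feature extraction: function calls
histogram = role_histogram(tree)            # statistical analysis of features
```

Because the helpers depend only on roles, the same functions would apply unchanged to a tree produced from any other language, provided it uses the same hypothetical node shape.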

Current status

Babelfish is currently transitioning to the v2 protocol, a new node representation, and Semantic UAST.

All drivers at beta status or above support these new features in their latest versions and require bblfshd >= 2.6.1.

Libuast has not yet been updated to support the new node format, so all clients still work in v1 compatibility mode in order to execute XPath queries.

See v2 transition options for details.

Further Reading

This repository contains the project documentation, which you can also see properly rendered at https://docs.sourced.tech/babelfish.