This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

describe how Babelfish compares to other projects; addresses #35
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Denys Smirnov authored and bzz committed Oct 17, 2018
1 parent cbe5eba commit 32ee704
Showing 3 changed files with 117 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -12,6 +12,9 @@ The current **scope is to enable parsing of single files in any popular programm

This current scope is expected to expand in the near future to full project analysis, where the source code can be analyzed with its full context, and not just per-file.

For more information about how Babelfish compares to other similar systems,
see [this page](alternatives.md).

### Use Cases

Some of the use cases that we aim to support with UAST are:
1 change: 1 addition & 0 deletions SUMMARY.md
@@ -3,6 +3,7 @@
* [Babelfish - Universal Code Parser](README.md)
* [Architecture](architecture.md)
* [Languages](languages.md)
* [Alternatives](alternatives.md)
* [Join the Community](join-the-community.md)
* [Babelfish Improvement Proposals](babelfish-improvement-proposals.md)

113 changes: 113 additions & 0 deletions alternatives.md
@@ -0,0 +1,113 @@
# Babelfish vs Other Software

<!-- TODO: https://github.com/oracle/opengrok/wiki/Comparison-with-Similar-Tools -->

## [Kythe](https://kythe.io/)

> The best way to view Kythe is as a “hub” for connecting tools for various languages, clients and build systems. By defining language-agnostic protocols and data formats for representing, accessing and querying source code information as data, Kythe allows language analysis and indexing to be run as services.
>
> Some tools (e.g., static analyzers) already have expressive purpose-built internal representations for code. Kythe is not meant to be a universal replacement for such IRs — instead, our goal is to provide a way for such tools to capture “interesting subsets” of an analysis for sharing with other tools.

Babelfish and Kythe share the goal of defining a common representation
for concepts from different programming languages and of providing a way
to query it. Both provide a unified data format, and both are
language-independent.

The main difference is that Babelfish preserves all AST nodes, including
control flow and expressions, while Kythe focuses on class hierarchy,
dependencies, etc.

Also, Babelfish provides a unified IR across languages, which is defined
as a non-goal of Kythe.

Kythe requires instrumenting language compilers and build systems, while
Babelfish uses native language parsers to obtain an AST, which makes it
possible to develop a new language driver in less time.

Kythe also provides a way to process the whole project, while for now
Babelfish is focused on processing individual files.

## [Language Server Protocol](https://microsoft.github.io/language-server-protocol/)

> A Language Server is meant to provide the language-specific smarts and communicate with development tools over a protocol that enables inter-process communication.
>
> The idea behind the Language Server Protocol (LSP) is to standardize the protocol for how such servers and development tools communicate. This way, a single Language Server can be re-used in multiple development tools, which in turn can support multiple languages with minimal effort.

LSP defines a common protocol and RPC for queries such as Go-To-Definition
and Usages. However, it does not define a common representation of the AST,
because the goal of the project is to enable easy access to analysis that
is done by compilers and language SDKs.

Babelfish provides a common representation for ASTs, so other tools can
use its output for static analysis. Queries over the UAST structure can
still be used to find usages and similar information.
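As a rough illustration of how usages might be recovered from a tree rather than from a dedicated RPC call, the sketch below walks a hand-built, UAST-like structure of nested dictionaries and collects the line of every identifier with a given name. The node layout (`type`, `name`, `line`, `children`) and the helpers `walk` and `find_usages` are assumptions made for this example. They are not the actual UAST schema or the Babelfish client API.

```python
from typing import Iterator, List

# Hypothetical, simplified tree; the real UAST schema differs.
tree = {
    "type": "File",
    "children": [
        {"type": "Assign", "line": 1, "children": [
            {"type": "Identifier", "name": "total", "line": 1, "children": []},
            {"type": "Number", "value": "0", "line": 1, "children": []},
        ]},
        {"type": "Call", "line": 2, "children": [
            {"type": "Identifier", "name": "print", "line": 2, "children": []},
            {"type": "Identifier", "name": "total", "line": 2, "children": []},
        ]},
    ],
}


def walk(node: dict) -> Iterator[dict]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def find_usages(root: dict, name: str) -> List[int]:
    """Return the line numbers of all identifiers called `name`."""
    return [
        n["line"]
        for n in walk(root)
        if n.get("type") == "Identifier" and n.get("name") == name
    ]


print(find_usages(tree, "total"))  # → [1, 2]
```

Because the query only depends on the shared node vocabulary, the same function would work on a tree produced from any language that maps its identifiers to the same node type.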

## [srclib](https://srclib.org/)

> srclib makes developer tools like code search and static analyzers better. It supports things like jump to definition, find usages, type inference, and documentation generation.
>
> srclib handles: package detection, global dependency resolution, type inference, querying the graph of definitions and references in code, versioning using different VCS systems, and semantic blaming.

<!-- TODO -->

## [ctags](http://ctags.sourceforge.net/)

> Ctags generates an index (or tag) file of language objects found in source files that allows these items to be quickly and easily located by a text editor or other utility. A tag signifies a language object for which an index entry is available (or, alternatively, the index entry created for that object).

Both Babelfish and Ctags provide positional information for identifiers,
classes, directives, etc.

The main difference is that Ctags does not provide any form of AST, while
Babelfish provides a native language AST as well as a language-independent
UAST.

Also, the Babelfish ecosystem allows more complex queries over the AST.
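To make the distinction concrete, the following sketch uses Python's standard `ast` module as a stand-in for a native language AST. It uses neither Babelfish nor Ctags. It first derives a tag-style index (name, kind, line) and then runs a structural query that a flat tag file could not answer: which functions contain an addition expression. The sample source and the helper `has_addition` are illustrative assumptions.

```python
import ast

source = """\
class Greeter:
    def greet(self, name):
        return "Hello, " + name

def main():
    print(Greeter().greet("world"))
"""

module = ast.parse(source)

# Tag-style index: only names, kinds, and line numbers.
tags = sorted(
    (node.name, type(node).__name__, node.lineno)
    for node in ast.walk(module)
    if isinstance(node, (ast.ClassDef, ast.FunctionDef))
)


def has_addition(func: ast.FunctionDef) -> bool:
    """Return True if the function body contains a `+` binary operation."""
    return any(
        isinstance(n, ast.BinOp) and isinstance(n.op, ast.Add)
        for n in ast.walk(func)
    )


# Structural query that needs the tree, not just positions.
adders = [
    node.name
    for node in ast.walk(module)
    if isinstance(node, ast.FunctionDef) and has_addition(node)
]

print(tags)
print(adders)  # → ['greet']
```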

## [ANTLR](https://github.com/antlr/antlr4)

> ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface (or visitor) that makes it easy to respond to the recognition of phrases of interest.

ANTLR is a parser generator with support for most existing languages
and data formats. However, the generated parsers produce parse trees that
usually differ from the native language AST, so additional processing is
required before they can be used for analysis. The structure of the parse
tree is also language-dependent.

Babelfish provides a correct native language AST as well as a
language-independent UAST for all supported languages.

## [Tree-sitter](https://github.com/tree-sitter/tree-sitter)

> Tree-sitter is a C library for incremental parsing, intended to be used via bindings to higher-level languages. It can be used to build a concrete syntax tree for a program and efficiently update the syntax tree as the program is edited. This makes it suitable for use in text-editing programs.

Tree-sitter provides a simplified AST that supports a limited set of
queries. It usually does not provide enough detail for static analysis.
Instead, it focuses on performance and implements real-time tree diffing.

Babelfish provides a full native language AST as well as a
language-independent UAST for all supported languages. It is suitable for
static analysis since it preserves all features of the original AST.

## [srcML](https://www.srcml.org/)

> The srcML format is an XML representation for source code, where the markup tags identify elements of the abstract syntax for the language. The srcml program is a command line application for the conversion source code to srcML, an interface for the exploration, analysis, and manipulation of source code in this form, and the conversion of srcML back to source code. The current parsing technologies supports C/C++, C#, and Java.

srcML defines an XML schema to annotate source code files with AST structure.
It also provides tools to query and analyze files in this format. The AST
structure is language-dependent.

Babelfish provides a native language AST with positional information, which
can be used to generate the same markup. It also provides a
language-independent UAST for all supported languages, which allows
performing the same queries and analysis for different programming languages.
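The idea of generating markup from positional information can be sketched as follows: given the source text and a list of non-overlapping character spans with a node kind, wrap each span in an XML element and escape everything else. The `annotate` helper, the span format, and the tag names are assumptions chosen for this example. They are not the srcML schema or Babelfish output.

```python
from typing import List, Tuple
from xml.sax.saxutils import escape

Span = Tuple[int, int, str]  # (start offset, end offset, tag name)


def annotate(source: str, spans: List[Span]) -> str:
    """Wrap non-overlapping character spans of `source` in XML tags."""
    out = []
    pos = 0
    for start, end, tag in sorted(spans):
        out.append(escape(source[pos:start]))  # text between annotated spans
        out.append(f"<{tag}>{escape(source[start:end])}</{tag}>")
        pos = end
    out.append(escape(source[pos:]))  # trailing text after the last span
    return "".join(out)


source = "x = a < b"
# Hypothetical positions, as an AST with offsets might report them.
spans = [(0, 1, "name"), (4, 5, "name"), (8, 9, "name"), (6, 7, "operator")]

print(annotate(source, spans))
# → <name>x</name> = <name>a</name> <operator>&lt;</operator> <name>b</name>
```

Nested nodes would need a recursive variant. The flat version is kept here only to show that offsets plus node kinds are enough to reconstruct inline markup.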

## [SmaCC](http://www.refactoryworkers.com/SmaCC.html)

> SmaCC (Smalltalk Compiler-Compiler) is a freely available parser generator for Smalltalk. It generates LR parsers and is a replacement for the T-Gen parser generator. SmaCC overcomes many of T-Gen's limitations that make it difficult to produce parsers. SmaCC can generate parsers for ambiguous grammars and grammars with overlapping tokens. Both of these are not possible using T-Gen. In addition to handling more grammars than T-Gen, SmaCC has a smaller runtime than T-Gen and is faster than T-Gen. The latest version of SmaCC has support for GLR parsing, generating abstract syntax trees (ASTs), and transforming code.

SmaCC provides tools to generate parsers that can produce ASTs that are
close to the native ASTs of programming languages. It also allows
applying transformations and rewrites to the tree. However, the generated
AST is still language-dependent.

Babelfish also generates a native AST and provides an SDK to perform tree
rewrites. In addition, it provides a language-independent UAST for all
supported languages.
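A tree rewrite that moves from language-dependent node types to a shared vocabulary might look roughly like the sketch below. The `COMMON_TYPES` table, the `normalize` function, and the dictionary-based node layout are hypothetical and exist only for this example. The Babelfish SDK expresses such transformations differently.

```python
# Hypothetical mapping from native node types to a shared vocabulary.
COMMON_TYPES = {
    ("python", "FunctionDef"): "Function",
    ("java", "MethodDeclaration"): "Function",
    ("python", "Name"): "Identifier",
    ("java", "SimpleName"): "Identifier",
}


def normalize(node: dict, language: str) -> dict:
    """Return a copy of `node` with native types replaced by shared ones.

    Types without a mapping are kept unchanged.
    """
    return {
        "type": COMMON_TYPES.get((language, node["type"]), node["type"]),
        "token": node.get("token"),
        "children": [normalize(c, language) for c in node.get("children", [])],
    }


py_tree = {
    "type": "FunctionDef",
    "token": "add",
    "children": [{"type": "Name", "token": "x"}],
}
java_tree = {
    "type": "MethodDeclaration",
    "token": "add",
    "children": [{"type": "SimpleName", "token": "x"}],
}

# After the rewrite, both trees have the same shape and node types.
print(normalize(py_tree, "python") == normalize(java_tree, "java"))  # → True
```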
