Mimir Conceptual Overview
The goal of this document is to provide a more conceptual overview of Mimir than simple code documentation can accomplish. Topics covered include Mimir's internal algebraic query representation, Mimir's C-Tables-based data model for ambiguous, incomplete, and probabilistic data, and the two main constructs in Mimir: Models and Lenses.
You may find it convenient to follow along with the documentation for the class mimir.Database. This class serves as the central exchange for everything that happens in Mimir. Different components of Mimir are modularized and farmed out to different sub-packages, but Database includes references to all of them and convenience methods for interacting with multiple components at once. Database also includes the two main methods for running queries in Mimir:
- db.query(q): Compile, optimize, and run a query through the Mimir wrapper.
- db.backend.execute(q): Run a query directly on the backend database, without invoking the Mimir wrapper.
Below, when we refer to components defined in the Database class, we'll mention how they are referenced. By convention, the Database class appears throughout the Mimir codebase with the name db, so for example, the view manager would typically be referenced as
Table of Contents
Internally, Mimir represents queries in an adapted form of Codd's Relational Algebra. This section gives an overview of the Abstract Syntax Trees (ASTs) used to represent queries and primitive-valued expressions.
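To make the idea of a query AST concrete, here is a minimal sketch of a relational-algebra tree with a separate tree for primitive-valued expressions. All class and function names (Var, Const, GreaterThan, Table, Select, Project, eval_expr, eval_query) are hypothetical and chosen for illustration; they are not Mimir's actual classes, and the sketch is written in Python rather than Mimir's implementation language.

```python
from dataclasses import dataclass
from typing import Union


# --- Primitive-valued expressions (hypothetical node types) ---
@dataclass(frozen=True)
class Var:
    name: str  # refers to a column of the current row


@dataclass(frozen=True)
class Const:
    value: object


@dataclass(frozen=True)
class GreaterThan:
    lhs: "Expression"
    rhs: "Expression"


Expression = Union[Var, Const, GreaterThan]


# --- Relational operators (hypothetical node types) ---
@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class Select:
    condition: Expression
    child: "Operator"


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: "Operator"


Operator = Union[Table, Select, Project]


def eval_expr(expr, row):
    """Evaluate an expression tree against one row (a dict)."""
    if isinstance(expr, Var):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    if isinstance(expr, GreaterThan):
        return eval_expr(expr.lhs, row) > eval_expr(expr.rhs, row)
    raise TypeError(f"unknown expression node: {expr!r}")


def eval_query(op, tables):
    """Evaluate an operator tree over in-memory tables (name -> list of dicts)."""
    if isinstance(op, Table):
        return list(tables[op.name])
    if isinstance(op, Select):
        return [r for r in eval_query(op.child, tables) if eval_expr(op.condition, r)]
    if isinstance(op, Project):
        return [{c: r[c] for c in op.columns} for r in eval_query(op.child, tables)]
    raise TypeError(f"unknown operator node: {op!r}")


# Equivalent to: SELECT b FROM R WHERE a > 2
tables = {"R": [{"a": 1, "b": 10}, {"a": 5, "b": 20}]}
query = Project(("b",), Select(GreaterThan(Var("a"), Const(2)), Table("R")))
result = eval_query(query, tables)
```

Keeping operators and expressions as separate immutable trees means a rewrite pass can be an ordinary recursive function that returns a new tree.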
Mimir's internal code needs to interact with the database in a number of ways, from querying the backend to managing internal Mimir state. This section outlines the tools that Mimir includes for doing so.
A number of components of Mimir share the need for obtaining certain kinds of statistics or summary structures of tables in the backend database. The functionality needed to assemble these can be found in the
- An (approximate) Functional Dependency graph for a de-normalized table.
- Detecting columns that are likely to represent sequential identifiers for data (i.e., that are likely to define an ordering over the table).
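As a rough illustration of the two kinds of summary listed above, the sketch below scores how well one column determines another and checks whether a column looks like a sequential identifier. The function names (fd_strength, looks_sequential) and both heuristics are assumptions made for this example; they are not the algorithms Mimir uses.

```python
from collections import Counter, defaultdict


def fd_strength(rows, lhs, rhs):
    """Fraction of rows consistent with the dependency lhs -> rhs.

    For each lhs value, the most common rhs value is treated as correct.
    A score of 1.0 means the dependency holds exactly.
    """
    if not rows:
        return 1.0
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    consistent = sum(counts.most_common(1)[0][1] for counts in groups.values())
    return consistent / len(rows)


def looks_sequential(values):
    """True if the values are distinct integers forming one unbroken run."""
    if len(values) < 2:
        return False
    if not all(isinstance(v, int) for v in values):
        return False
    if len(set(values)) != len(values):
        return False
    return max(values) - min(values) == len(values) - 1


rows = [
    {"zip": "14260", "city": "Buffalo"},
    {"zip": "14260", "city": "Buffalo"},
    {"zip": "14260", "city": "Bufalo"},  # a typo breaks the exact dependency
    {"zip": "10001", "city": "New York"},
]
score = fd_strength(rows, "zip", "city")
```

An approximate dependency graph could then keep only the column pairs whose score exceeds some threshold, which is what makes it tolerant of dirty data.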
While rare, it may occasionally be necessary to edit the parser to add new SQL commands. This section details the organization of the parser, and gives some examples of how to properly add new commands. TL;DR: Do not edit any file in src/main/java/mimir/parser except for
To capture ambiguity and uncertainty in data, Mimir uses an encoding strategy called Virtual C-Tables. This section begins by introducing principles of incomplete databases, starting with the high-level conceptual Possible Worlds Semantics, before introducing successively more refined and practical representations (V-Tables, C-Tables, and Virtual C-Tables).
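The relationship between C-Tables and Possible Worlds Semantics can be sketched in a few lines. In this toy encoding, a cell may hold a variable, each row carries a condition over the variables, and every assignment of values to variables yields one possible world. The names (Variable, possible_worlds, certain_rows) and the encoding itself are assumptions for illustration, not Mimir's representation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variable:
    name: str  # placeholder for an unknown value


def instantiate(cell, assignment):
    """Replace a variable cell with its assigned value; leave constants alone."""
    return assignment[cell.name] if isinstance(cell, Variable) else cell


def possible_worlds(ctable, domains):
    """Enumerate one world (a set of tuples) per assignment of the variables.

    ctable:  list of (cells, condition) pairs; condition takes an assignment dict.
    domains: variable name -> list of candidate values.
    """
    names = sorted(domains)
    worlds = []
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        world = frozenset(
            tuple(instantiate(c, assignment) for c in cells)
            for cells, condition in ctable
            if condition(assignment)
        )
        worlds.append(world)
    return worlds


def certain_rows(worlds):
    """Rows that appear in every possible world."""
    return frozenset.intersection(*worlds) if worlds else frozenset()


ctable = [
    (("alice", 30), lambda a: True),             # always present
    (("bob", Variable("x")), lambda a: True),    # present, age unknown
    (("carol", 41), lambda a: a["x"] > 25),      # present only in some worlds
]
worlds = possible_worlds(ctable, {"x": [20, 30]})
certain = certain_rows(worlds)
```

Enumerating worlds grows exponentially with the number of variables, which is one reason practical systems work with the compact conditional representation instead of materializing worlds.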
This section brings everything together, introducing the two key components of Mimir: (1) Models, wrappers around existing ML tools, frameworks, and techniques, and (2) Lenses, structural wrappers that allow Models to dictate how data should be transformed.
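The division of labor between the two constructs can be sketched as follows: a model produces guesses, and a lens decides where in the data those guesses are applied. Everything here (MeanModel, best_guess, missing_value_lens, the _uncertain flag) is a hypothetical stand-in written for this example and should not be read as Mimir's Model or Lens interfaces.

```python
from statistics import mean


class MeanModel:
    """Toy model: guesses a missing value as the mean of the observed ones."""

    def __init__(self, rows, column):
        observed = [r[column] for r in rows if r[column] is not None]
        self.guess = mean(observed) if observed else None

    def best_guess(self, row):
        # A richer model could condition on the other fields of the row.
        return self.guess


def missing_value_lens(rows, column, model):
    """Return copies of the rows with missing values filled in by the model.

    Each output row gets an extra "_uncertain" flag marking guessed values.
    """
    repaired_rows = []
    for row in rows:
        repaired = dict(row)
        uncertain = row[column] is None
        if uncertain:
            repaired[column] = model.best_guess(row)
        repaired["_uncertain"] = uncertain
        repaired_rows.append(repaired)
    return repaired_rows


rows = [
    {"id": 1, "age": 20},
    {"id": 2, "age": None},
    {"id": 3, "age": 40},
]
repaired = missing_value_lens(rows, "age", MeanModel(rows, "age"))
```

Because the lens only depends on the best_guess method, the mean could be swapped for any other estimator without changing how the data is restructured.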