Concepts

Oliver Kennedy edited this page Dec 19, 2017 · 17 revisions

Mimir Conceptual Overview

The goal of this document is to provide a more conceptual overview of Mimir than simple code documentation can accomplish. Topics covered include Mimir's internal algebraic query representation, Mimir's C-Tables-based data model for ambiguous, incomplete, and probabilistic data, and the two main constructs in Mimir: Models and Lenses.

You may find it convenient to follow along with the documentation for the class mimir.Database. This class serves as the central exchange for everything that happens in Mimir. Different components of Mimir are modularized and farmed out to different sub-packages, but Database includes references to all of them and convenience methods for interacting with multiple components at once. Database also includes the two main methods for running queries in Mimir:

  • db.query(q): Compile, optimize, and run a query through the Mimir wrapper.
  • db.backend.execute(q): Run a query directly on the backend database, without invoking the Mimir wrapper.

Below, when we refer to components defined in the Database class, we'll mention how they are referenced. By convention, instances of the Database class appear throughout the Mimir codebase under the name db, so, for example, the view manager would typically be referenced as db.views.

Table of Contents

Relational Algebra and Expressions

Internally, Mimir represents queries in an adapted form of Codd's Relational Algebra. This section gives an overview of the Abstract Syntax Trees (ASTs) used to represent queries and primitive-valued expressions.
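As a rough illustration of what an algebraic AST of this kind can look like, the sketch below models a few operators and expressions as Scala case classes. Every name here (AlgebraSketch, Expr, Op, Table, Select, Project, schema) is a hypothetical stand-in chosen for the example, not Mimir's actual class hierarchy.

```scala
object AlgebraSketch {
  // Primitive-valued expressions (hypothetical names, not Mimir's classes).
  sealed trait Expr
  case class Var(name: String) extends Expr
  case class IntLit(value: Long) extends Expr
  case class Cmp(op: String, lhs: Expr, rhs: Expr) extends Expr

  // Relational operators; each non-leaf operator wraps its source operator.
  sealed trait Op
  case class Table(name: String, columns: Seq[String]) extends Op
  case class Select(condition: Expr, source: Op) extends Op
  case class Project(columns: Seq[String], source: Op) extends Op

  // Output schema of an operator, computed by recursion over the tree.
  def schema(op: Op): Seq[String] = op match {
    case Table(_, cols)   => cols
    case Select(_, src)   => schema(src)
    case Project(cols, _) => cols
  }

  // Roughly: SELECT a FROM r WHERE b > 5
  val example: Op =
    Project(Seq("a"), Select(Cmp(">", Var("b"), IntLit(5)), Table("r", Seq("a", "b"))))
}
```

Representing the query as an immutable tree makes rewrites such as optimization a matter of pattern matching on operators and rebuilding the tree.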

Database Programming

Mimir's internal code needs to interact with the database in a number of ways, from querying the backend to managing internal Mimir state. This section outlines the tools that Mimir includes for doing so.

Unified Statistics Tools

A number of Mimir components need statistics or summary structures describing tables in the backend database. The functionality needed to assemble these can be found in the mimir.statistics package. Examples include:

  • An (approximate) Functional Dependency graph for a de-normalized table.
  • Detection of columns that are likely to represent sequential identifiers for data (i.e., that are likely to define an ordering over the table).
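To make the first of these structures concrete, here is a minimal sketch of one way to score approximate functional dependencies between columns. It is not the algorithm used in mimir.statistics; the names FdSketch, fdStrength, and fdEdges, the row representation, and the threshold parameter are all assumptions made for this example.

```scala
object FdSketch {
  type Row = Map[String, String]

  // Fraction of rows consistent with lhs -> rhs: within each group of rows
  // sharing an lhs value, count the rows carrying the most common rhs value.
  def fdStrength(rows: Seq[Row], lhs: String, rhs: String): Double =
    if (rows.isEmpty) 1.0
    else {
      val agreeing = rows.groupBy(_(lhs)).values.map { group =>
        group.groupBy(_(rhs)).values.map(_.size).max
      }.sum
      agreeing.toDouble / rows.size
    }

  // Edges (lhs, rhs) of an approximate FD graph: all ordered column pairs
  // whose strength meets the (assumed) threshold.
  def fdEdges(rows: Seq[Row], columns: Seq[String], threshold: Double): Set[(String, String)] =
    (for {
      a <- columns
      b <- columns
      if a != b && fdStrength(rows, a, b) >= threshold
    } yield (a, b)).toSet
}
```

A strength of 1.0 means the dependency holds exactly; lowering the threshold tolerates a proportion of dirty rows, which is what makes the resulting graph approximate.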

Editing the Parser

While rare, it may occasionally be necessary to edit the parser to add new SQL commands. This section details the organization of the parser, and gives some examples of how to properly add new commands. TLDR: Do not edit any file in src/main/java/mimir/parser except for CCJSqlParser.jj.

C-Tables and Incomplete Databases

To capture ambiguity and uncertainty in data, Mimir uses an encoding strategy called Virtual C-Tables. This section begins by introducing principles of incomplete databases, starting with the high-level conceptual Possible Worlds Semantics, before introducing successively more refined and practical representations (V-Tables, C-Tables, and Virtual C-Tables).
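The relationship between a C-Table and its possible worlds can be sketched in a few lines. The code below is an illustrative toy over integer cells with finite variable domains, not Mimir's Virtual C-Tables encoding; CTableSketch, Cell, CRow, world, and worlds are hypothetical names introduced for the example.

```scala
object CTableSketch {
  type Assignment = Map[String, Int]

  // A cell is either a known constant or a labeled null (a variable).
  sealed trait Cell
  case class Const(value: Int) extends Cell
  case class Variable(name: String) extends Cell

  // A C-Table row: cells plus a local condition over the variables.
  // A V-Table is the special case where every condition is always true.
  case class CRow(cells: Seq[Cell], condition: Assignment => Boolean)

  // Instantiate one possible world: keep the rows whose condition holds
  // and substitute the assignment into their cells.
  def world(table: Seq[CRow], assignment: Assignment): Seq[Seq[Int]] =
    table.filter(_.condition(assignment)).map(_.cells.map {
      case Const(v)    => v
      case Variable(n) => assignment(n)
    })

  // Enumerate the distinct possible worlds for finite variable domains by
  // building every assignment and instantiating the table under each one.
  def worlds(table: Seq[CRow], domains: Map[String, Seq[Int]]): Set[Seq[Seq[Int]]] = {
    val assignments = domains.foldLeft(Seq(Map.empty[String, Int])) {
      case (partial, (name, values)) =>
        for (a <- partial; v <- values) yield a + (name -> v)
    }
    assignments.map(world(table, _)).toSet
  }
}
```

Enumerating worlds explicitly is exponential in the number of variables, which is why practical systems work with the compact conditional representation rather than the worlds themselves.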

Wrapping ML Tools

This section brings everything together, introducing the two key components of Mimir: (1) Models, wrappers around existing ML tools, frameworks, and techniques, and (2) Lenses, structural wrappers that allow Models to dictate how data should be transformed.
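The division of labor between the two constructs can be sketched as follows: a model supplies guesses, and a lens decides where in the data those guesses are applied. The interface below is a simplified assumption made for illustration; LensSketch, Model, MeanModel, bestGuess as declared here, and missingValueLens are hypothetical and do not reproduce Mimir's actual signatures.

```scala
object LensSketch {
  type Row = Map[String, Option[Double]]

  // Hypothetical model interface: wraps some estimator and offers a best
  // guess for a missing value, given the rest of the row.
  trait Model {
    def bestGuess(row: Row): Double
  }

  // A trivial stand-in for an ML tool: the mean of the observed values in
  // one column (0.0 if nothing was observed).
  class MeanModel(column: String, training: Seq[Row]) extends Model {
    private val observed = training.flatMap(_.get(column).flatten)
    private val mean = if (observed.isEmpty) 0.0 else observed.sum / observed.size
    def bestGuess(row: Row): Double = mean
  }

  // A lens-like wrapper: rewrites the table so that missing values in
  // `column` are replaced by the model's guess; other rows pass through.
  def missingValueLens(column: String, model: Model)(table: Seq[Row]): Seq[Row] =
    table.map { row =>
      row.get(column) match {
        case Some(None) => row + (column -> Some(model.bestGuess(row)))
        case _          => row
      }
    }
}
```

Because the lens depends only on the Model interface, the mean estimator could be swapped for any other technique without changing how the data is transformed.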
