This repository has been archived by the owner on Sep 27, 2019. It is now read-only.

Adding an SQL Operator

Ivan.Yang edited this page May 1, 2017 · 8 revisions

Introduction

The current architecture of Peloton contains a mix of both PostgreSQL components and our own components. In particular, we use the SQL parser and planner of Postgres, and then we use our own execution engine to execute these generated plans. We also use our own storage engine to store the databases. More information on the tile-based architecture of our execution and storage engines is available here.

Execution Engine

Before adding an operator, look at the existing Peloton operators in src/executor. The limit operator is a particularly straightforward example.

All operators inherit from the abstract operator class. In particular, each operator has an Init function and an Execute function, which initialize (or reinitialize) and execute the operator, respectively.

When a parent operator invokes the Execute function of a child operator, the child returns true if it has produced a new logical tile, which the parent can then obtain with GetOutput. The child returns false if and only if it has already returned all of its logical tiles to the parent; it never returns an empty logical tile. A parent operator can therefore invoke the child's Execute function repeatedly to obtain every logical tile the child produces.
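The pull-based contract described above can be sketched as follows. This is a minimal illustration, not Peloton's actual code: LogicalTile, AbstractExecutor, VectorScanExecutor, SetOutput, and CountRows are simplified, hypothetical stand-ins, and only the Init/Execute/GetOutput names come from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a logical tile; here it only carries row ids.
struct LogicalTile {
  std::vector<int> rows;
};

// Hypothetical, simplified operator interface following the
// Init/Execute/GetOutput contract described in the text.
class AbstractExecutor {
 public:
  virtual ~AbstractExecutor() = default;
  virtual bool Init() = 0;
  virtual bool Execute() = 0;

  // Hands the most recently produced tile over to the caller.
  std::unique_ptr<LogicalTile> GetOutput() { return std::move(output_); }

 protected:
  void SetOutput(std::unique_ptr<LogicalTile> tile) {
    // An operator never returns an empty logical tile.
    assert(tile != nullptr && !tile->rows.empty());
    output_ = std::move(tile);
  }

 private:
  std::unique_ptr<LogicalTile> output_;
};

// Toy leaf operator that emits one pre-built tile per Execute() call.
class VectorScanExecutor : public AbstractExecutor {
 public:
  explicit VectorScanExecutor(std::vector<std::vector<int>> tiles)
      : tiles_(std::move(tiles)) {}

  bool Init() override {
    next_ = 0;  // (re)initialize: start again from the first tile
    return true;
  }

  bool Execute() override {
    if (next_ >= tiles_.size()) return false;  // all tiles already returned
    auto tile = std::make_unique<LogicalTile>();
    tile->rows = tiles_[next_++];
    SetOutput(std::move(tile));
    return true;
  }

 private:
  std::vector<std::vector<int>> tiles_;
  std::size_t next_ = 0;
};

// Parent-side loop: call Execute() until the child reports exhaustion,
// fetching each produced tile with GetOutput().
std::size_t CountRows(AbstractExecutor &child) {
  std::size_t total = 0;
  child.Init();
  while (child.Execute()) {
    std::unique_ptr<LogicalTile> tile = child.GetOutput();
    total += tile->rows.size();
  }
  return total;
}
```

Because false means "exhausted" and an empty tile is never handed back, the parent loop needs no extra check to tell an empty result from the end of the stream.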

Peloton Query Processing

We intercept the plan trees generated by the Postgres planner and execute them with our own executor component. The rest of this section covers the Peloton side of that process.

Queries are classified into data definition language (DDL) queries and data manipulation language (DML) queries. These two categories take different processing paths, both within the Postgres frontend and within Peloton.

In Postgres, DML queries are executed in four stages. Take a look at the entry point of the Postgres executor module here. ExecutorStart() performs some initialization that sets up the dynamic plan state tree from the static plan tree. ExecutorRun() invokes the plan state tree. ExecutorFinish() and ExecutorEnd() take care of cleaning things up, but they are not relevant to us. Peloton takes over query execution when queries reach ExecutorRun(), and we therefore only make use of ExecutorStart() in our system.

In case of DDL queries, Peloton intercepts them in the ProcessUtility function here.

Peloton cannot directly execute the Postgres plan state tree as our executors can only understand our own Peloton query plan tree. So, we need to transform the Postgres plan state tree into a Peloton plan tree before execution. We refer to this process as plan mapping or plan transformation. After mapping the plan, Peloton executes the plan tree by recursively executing the plan tree nodes. We obtain Peloton tuples after query processing. We then transform them back into Postgres tuples before sending them back to the client via the Postgres frontend.

Plan Mapping

After Peloton takes over from Postgres, DDL queries are handled by peloton_ddl(), whereas DML queries are handled by peloton_dml(). These functions are located here within the peloton module.

Plan mapping is done only for DML queries, since DDL queries do not require any planning. The high-level idea is to recursively map each plan node in the Postgres plan state tree to a corresponding plan node in the Peloton plan tree. The plan mapper module first preprocesses the plan state tree and extracts the critical information from each Postgres plan node. This preprocessing is performed by functions in the peloton::bridge::DMLUtils namespace. The main PlanTransformer then transforms the preprocessed plan by recursively invoking sub-transformers based on the type of each node in the tree. An entry point for this module is peloton::bridge::PlanTransformer::TransformPlan().
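The recursive, type-dispatched mapping can be sketched roughly as below. All types here (PgNodeTag, PgPlanState, PelotonPlanNode) and the helper MakeNode are hypothetical simplifications invented for illustration; the real transformer works on Postgres plan state structures and the preprocessed information from DMLUtils, which this sketch omits.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the two plan representations.
enum class PgNodeTag { SeqScan, Limit };

struct PgPlanState {
  PgNodeTag tag;
  std::vector<std::unique_ptr<PgPlanState>> children;
};

struct PelotonPlanNode {
  std::string name;
  std::vector<std::unique_ptr<PelotonPlanNode>> children;
};

std::unique_ptr<PelotonPlanNode> TransformPlan(const PgPlanState &state);

// Shared helper (an assumption of this sketch): create the mapped node,
// then map every child recursively and attach it.
std::unique_ptr<PelotonPlanNode> MakeNode(const std::string &name,
                                          const PgPlanState &state) {
  auto node = std::make_unique<PelotonPlanNode>();
  node->name = name;
  for (const auto &child : state.children) {
    assert(child != nullptr);
    node->children.push_back(TransformPlan(*child));
  }
  return node;
}

// Entry point: dispatch to a sub-transformer based on the node type.
std::unique_ptr<PelotonPlanNode> TransformPlan(const PgPlanState &state) {
  switch (state.tag) {
    case PgNodeTag::SeqScan:
      return MakeNode("SeqScanPlan", state);
    case PgNodeTag::Limit:
      return MakeNode("LimitPlan", state);
  }
  throw std::invalid_argument("unsupported Postgres plan node");
}
```

In a sketch like this, supporting a new plan node type means adding one case to the switch plus the sub-transformer it dispatches to.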

Peloton Plan Execution

Peloton then builds an executor tree based on the Peloton query plan tree. It then runs the executor tree recursively.

Execution context is the state associated with an instance of the plan execution, such as parameters and transaction information. By separating the execution context from the query plan, we can support prepared statements. A planned and then mapped query plan can be reused with different execution contexts. This saves time spent for query planning and mapping.
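To illustrate why this separation enables plan reuse, here is a small sketch in which one immutable plan is run with two different contexts. FilterPlan, ExecutorContext, and RunFilter are hypothetical names chosen for this example and do not mirror Peloton's actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical immutable plan for a query shaped like
// "SELECT v FROM t WHERE v > $1": it stores only which parameter to read.
struct FilterPlan {
  std::size_t param_index;
};

// Hypothetical per-execution state: transaction information and parameters.
struct ExecutorContext {
  int txn_id;
  std::vector<int> params;
};

// The plan is passed as const, so nothing execution-specific can leak into it.
std::vector<int> RunFilter(const FilterPlan &plan, const ExecutorContext &ctx,
                           const std::vector<int> &table) {
  assert(plan.param_index < ctx.params.size());
  const int threshold = ctx.params[plan.param_index];
  std::vector<int> out;
  for (int v : table) {
    if (v > threshold) out.push_back(v);
  }
  return out;
}
```

Planning and mapping happen once to produce the FilterPlan; each later execution only supplies a fresh ExecutorContext with its own parameter values.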

After that, query execution consists of two stages. The execution tree has to be initialized (DInit()), and then it is executed (DExecute()). An entry point for this module is here.
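A rough sketch of building an executor tree from a plan tree and then running the two stages is shown below. It is an assumption-laden toy: PlanNode, Executor, ScanExecutor, LimitExecutor, BuildExecutorTree, and ExecutePlan are hypothetical, and for brevity each DExecute() call yields a single integer instead of a logical tile. Only the DInit/DExecute names are taken from the text.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical plan node: a scan over literal values, or a limit on a child.
struct PlanNode {
  enum Kind { kScan, kLimit } kind;
  std::vector<int> values;          // used by kScan
  std::size_t limit = 0;            // used by kLimit
  std::unique_ptr<PlanNode> child;  // used by kLimit
};

class Executor {
 public:
  virtual ~Executor() = default;
  virtual bool DInit() = 0;
  // Produces one value per call; returns false once exhausted.
  virtual bool DExecute(int &out) = 0;
};

class ScanExecutor : public Executor {
 public:
  explicit ScanExecutor(const PlanNode &plan) : plan_(plan) {}
  bool DInit() override { pos_ = 0; return true; }
  bool DExecute(int &out) override {
    if (pos_ >= plan_.values.size()) return false;
    out = plan_.values[pos_++];
    return true;
  }

 private:
  const PlanNode &plan_;
  std::size_t pos_ = 0;
};

class LimitExecutor : public Executor {
 public:
  LimitExecutor(const PlanNode &plan, std::unique_ptr<Executor> child)
      : plan_(plan), child_(std::move(child)) {}
  // Initialization recurses into the child before execution starts.
  bool DInit() override { emitted_ = 0; return child_->DInit(); }
  bool DExecute(int &out) override {
    if (emitted_ >= plan_.limit || !child_->DExecute(out)) return false;
    ++emitted_;
    return true;
  }

 private:
  const PlanNode &plan_;
  std::unique_ptr<Executor> child_;
  std::size_t emitted_ = 0;
};

// Build one executor per plan node, bottom-up.
std::unique_ptr<Executor> BuildExecutorTree(const PlanNode &plan) {
  if (plan.kind == PlanNode::kScan) return std::make_unique<ScanExecutor>(plan);
  assert(plan.child != nullptr);
  return std::make_unique<LimitExecutor>(plan, BuildExecutorTree(*plan.child));
}

// Stage 1: initialize the whole tree. Stage 2: pull results from the root.
std::vector<int> ExecutePlan(const PlanNode &plan) {
  std::unique_ptr<Executor> root = BuildExecutorTree(plan);
  std::vector<int> rows;
  if (!root->DInit()) return rows;
  int value = 0;
  while (root->DExecute(value)) rows.push_back(value);
  return rows;
}
```

The executors hold only a const reference to their plan node and keep all mutable state (positions, counters) in themselves, which keeps the plan tree reusable across executions.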

Expression System

We have our own expression system in Peloton. We transform the Postgres expressions into Peloton expressions, and evaluate them. All the expressions are based on the abstract expression class.

The code related to our expression system is located under src/backend/expression. There are several files containing utility functions, like this one containing date-related functions.

The expression system is tightly coupled with our type system. The type system is based on an abstract data type called Value. The associated code is located here.

Adding an operator

Adding an operator will involve working with our plan mapper, execution engine, and the expression system.

In most cases, this means making changes in the plan transformer, the plan executor, and the execution engine.

Example operator

In this section, we show how to support the like operator in Peloton with a small change to the source code. Peloton reuses Postgres' expression system and transforms Postgres expressions into Peloton expressions. The related functions are implemented in expr_transformer.cpp. In particular, TransformExpr() is responsible for performing the expression transformation. Because the plan node associated with the like operator is tagged with T_OpExpr, TransformExpr() invokes TransformOp(), which looks up the corresponding Peloton expression type using pg_func_id, the operator's unique ID in Postgres. The mapping is recorded in an unordered map called kPgFuncMap, defined in pg_func_map.cpp. The value of pg_func_id for the like operator can be either 850 (char type) or 1631 (varchar type and text type), so we add the following two lines to pg_func_map.cpp:

{850, {EXPRESSION_TYPE_COMPARE_LIKE, 2}},
{1631, {EXPRESSION_TYPE_COMPARE_LIKE, 2}},

EXPRESSION_TYPE_COMPARE_LIKE is the unique expression identifier for the like operator in Peloton, and it must be defined in types.h. With these entries in place, the like operator is mapped from a Postgres expression to a Peloton expression. The next step is to tell Peloton how to execute like. Since like is a comparison operator, ExpressionUtil::ComparisonFactory() in expression_util.cpp is called to build the expression that carries out the comparison. As with the other comparison operators, the only change needed is to add EXPRESSION_TYPE_COMPARE_LIKE to the switch-case statement in ExpressionUtil::ComparisonFactory(); the detailed execution logic is already implemented in the class CmpLike.
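The two steps above (the map lookup, then the factory switch) can be sketched end to end as follows. This is a self-contained approximation under several assumptions: the struct PltFuncMetaInfo, the Comparator alias, the reduced ExpressionType enum, and the LikeMatch helper are hypothetical and written only for this example. In particular, LikeMatch is a simple illustrative matcher for % and _, not the logic in Peloton's CmpLike, and the real ComparisonFactory() builds expression objects rather than returning a callable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Reduced, hypothetical version of the expression type enum.
enum ExpressionType {
  EXPRESSION_TYPE_INVALID,
  EXPRESSION_TYPE_COMPARE_EQUAL,
  EXPRESSION_TYPE_COMPARE_LIKE
};

// Hypothetical map value: Peloton expression type plus argument count.
struct PltFuncMetaInfo {
  ExpressionType exprtype;
  int nargs;
};

// Maps a Postgres function id to Peloton expression metadata.
// The two entries are the ones added in the text.
const std::unordered_map<int, PltFuncMetaInfo> kPgFuncMap = {
    {850, {EXPRESSION_TYPE_COMPARE_LIKE, 2}},
    {1631, {EXPRESSION_TYPE_COMPARE_LIKE, 2}},
};

// Illustrative matcher: '%' matches any run of characters, '_' exactly one.
bool LikeMatch(const std::string &s, std::size_t i, const std::string &p,
               std::size_t j) {
  if (j == p.size()) return i == s.size();
  if (p[j] == '%') {
    for (std::size_t k = i; k <= s.size(); ++k) {
      if (LikeMatch(s, k, p, j + 1)) return true;
    }
    return false;
  }
  if (i < s.size() && (p[j] == '_' || p[j] == s[i])) {
    return LikeMatch(s, i + 1, p, j + 1);
  }
  return false;
}

using Comparator = std::function<bool(const std::string &, const std::string &)>;

// Simplified factory: supporting a new comparison means adding one case.
Comparator ComparisonFactory(ExpressionType type) {
  switch (type) {
    case EXPRESSION_TYPE_COMPARE_EQUAL:
      return [](const std::string &l, const std::string &r) { return l == r; };
    case EXPRESSION_TYPE_COMPARE_LIKE:
      return [](const std::string &l, const std::string &r) {
        return LikeMatch(l, 0, r, 0);
      };
    default:
      throw std::invalid_argument("unsupported comparison type");
  }
}

// Simplified lookup step: resolve pg_func_id, then ask the factory.
Comparator TransformOp(int pg_func_id) {
  auto it = kPgFuncMap.find(pg_func_id);
  if (it == kPgFuncMap.end()) throw std::invalid_argument("unmapped pg_func_id");
  assert(it->second.nargs == 2);  // like is a binary operator
  return ComparisonFactory(it->second.exprtype);
}
```

With this shape, an operator that is missing from the map fails at the lookup step, and one that is mapped but missing from the switch fails in the factory, which helps locate the step that was skipped.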

By following the same steps, you can add support for other operators, often with only a few lines of code.
