Skip to content

RandomQueryGeneratorTransforms

philip-stoev edited this page Jul 17, 2012 · 1 revision

Table of Contents

Transforms Overview

Testing a database server requires that the results returned by queries are checked against a validated source of correct results. Usually, this is will be a human, as in а mysqltest test case, but it can be another database server (as are the comparison tests run with the RQG).

The third option is to compare each query to a rewritten version of it that is known to preserve the original properties (e.g. the rewritten query returns the same rows as the original one, or a subset or a superset of those rows). This way, the server can be checked against itself and one optimization can be verified against another.

Transformers are also useful to make sure that a particular SELECT can function properly in various circumstances, for example as a stored procedure, prepared statement or a subselect in a bigger query.

Running

If you use --validator=Transformer, then (almost) all available Transforms will be tried against each generated query.

Warning: You likely want to avoid executing a Transformer RQG run with anything more than 1 thread. What may happen (if you do not use --threads=1) is that a difference set/testcase is created due to the underlying data being modified by another thread.

Available Transforms

The following Transforms are currently available and most are enabled by default when you run the test with --validator=Transformer :

  • ConvertSubqueriesToViews - converts each subselect into a view;
  • Count - Converts a normal SELECT query to a SELECT COUNT(*) and vice versa and validates that the COUNT(*) returns the same number as there are rows in the original query;
  • Distinct - Adds or removes DISTINCT and checks that the new result set is correct with respect to DISTINCT;
  • FromSubquery - Converts the FROM of the query into a subquery.
  • Having - removes the HAVING part of the query and checks that the new result is a superset of the original result;
  • InlineSubqueries - takes non-correlated subqueries, executes them separately and in-lines the values in the original query;
  • InlineVirtualColumns - inlines any virtual column expressions. Due to various issues around casting, the concept of negative zero and data types, the resulting query may not be 100% identical and false positives may be observed; Not enabled by default.
  • LimitDecrease - converts the query into a LIMIT 1 query;
  • LimitIncrease - converts the query into a LIMIT 4294836225 query;
  • OrderBy - removes ORDER BY which should not change the result set, only its ordering;
  • StraightJoin - adds or removes STRAIGHT_JOIN from each SELECT.
  • DisableIndexes - re-executes each query under ALTER TABLE DISABLE KEYS in order to force a different query execution plan. Primary keys will remain operational. ALTER TABLE DISABLE KEYS works only with MyISAM tables and should only be used with --threads=1. May cause the test to slow down due to the extra processing around the numerous ALTERs.
  • RemoveIndexHints - removes FORCE|USE|IGNORE KEY and runs the query again, hoping that the query plan will change. Since adding a FORCE|USE|IGNORE KEY in the Transform is difficult, it is much easier to have the grammar contain FORCE|USE|IGNORE KEY initially and then use the Transform to remove such hints.
  • ExecuteAsView - defines each select as a VIEW and then executes SELECT * against the view;
  • ExecuteAsSPTwice - converts a query into a stored procedure, then calls the procedure twice. The result sets should be an unordered match. This transform is best used with a grammar that contains views or semijoin-using subqueries.
  • ExecuteAsPreparedTwice - converts a query into a prepared statement, then calls it twice. The result sets should be an unordered match of the original query. This transform is best used with a grammar that contains views or semijoin-using subqueries
  • ExecuteAsInsertSelect - executes the query in CREATE ... SELECT , INSERT ... SELECT and REPLACE ... SELECT contexts and verifies that the table was populated with the same data that will be returned by the standalone SELECT.
  • ExecuteAsSelectItem - executes the query as a subquery in the item list. For example SELECT 1 will be converted into SELECT ( SELECT 1 ) AS s1 . This transform is only possible for queries that return exactly 1 row and one column, so it will be applied infrequently.
  • ExecuteAsUnion - runs the query as a UNION and UNION ALL, both the first and the second (last) query of the union.

Selecting specific Transforms

To run the Transformer Validator with only certain Transforms available, you can edit randgen/lib/GenTest/Validator/Transformer.pm

Creating New Transforms

Each Transform is specified in a .pm package in the GenTest::Transform containing a transfrom($original_query, $executor, $original_result) method that takes three arguments:

  • the original query that is to be transformed;
  • an GenTest::Executor object that can be used to obtain metadata and run any additional SQL queries that Transform class may need in order to transform the original query;
  • a reference to the original result from the original query.
The transform() method returns either a numeric status code, or an arrayref of query strings. If the return code is STATUS_WONT_HANDLE this signals that the Transform is not willing to process this particular query.

If an arrayref of query strings is included, those queries will be executed in order, and each query tagged with a TRANSFORM... tag will be will be used for validation purposes. Queries that do not return a result set (e.g. any DDL queries that create or destroy temporary database objects) or are not tagged will simply be executed and their result code will be disregarded. For example, the ExecuteAsView Transform uses such DDL queries to construct a temporary view that is then destroyed.

This returned queries that are to be used for the comparison/validation have to be tagged with one of the following tags inside a /* */ comment, to signal the way the result set from this query needs to be compared to the result set from the original query:

  • TRANSFORM_OUTCOME_EXACT_MATCH - the result set from the transformed query must be an exact match to the result set of the original query. This is a very strict requirement that is very difficult to meet if the execution strategy for the two queries is different. It requires that both queries have a strict ORDER BY clause, involving either the unique keys of all participating tables, or all participating columns;
  • TRANSFORM_OUTCOME_UNORDERED_MATCH - the result set from the transformed query must match the result set of the original query except for the ordering. This is the tag to use when your transformed query may not have a strict ORDER BY and may have a different execution plan from the original query. This is also the tag to use when you are removing the ORDER BY clause from the original query.
  • TRANSFORM_OUTCOME_SUPERSET - the result set from the transformed query must be a superset of the result set of the original query. This should be the tag in case you removed the WHERE or HAVING from the original query.
  • TRANSFORM_OUTCOME_SUBSET - the result set from the transformed query must be a subset of the original result set. This is useful in case you add extra WHERE, HAVING or LIMIT clauses to the original query.
  • TRANSFORM_OUTCOME_SINGLE_ROW - the result set from the transformed query is zero or one rows, and this row is one of the rows from the original result set. This is useful if you add LIMIT 1 to a query that does not have a strict ORDER BY clause.
  • TRANSFORM_OUTCOME_FIRST_ROW - the result set from the transformed query is zero or one rows, and this row is identical to the first row from the original result set. This is useful if you are adding LIMIT 1 to a query that does have a strict ORDER BY clause.
  • TRANSFORM_OUTCOME_DISTINCT - the result set from the transformed query matches the result set of the original query, except for any duplicate rows. This is used when you are adding DISTINCT to the original query.
  • TRANSFORM_OUTCOME_COUNT - the result set from the transformed query has the number of rows in the original result set in its first column of the first row, or vice versa. This is used when transforming an existing query into a COUNT(*) or the other way around.