Skip to content

Commit

Permalink
MONDRIAN: Schema extensions and optimizations to allow elimination of…
Browse files Browse the repository at this point in the history
… GROUP BY in specific situations, as well as to leave individual property columns out of the group by if their values are known to be functionally dependent on the values of the associated level columns. The latter enhancement is currently MySQL-only. Submitted on behalf of Marin Software.

[git-p4: depot-paths = "//open/mondrian/": change = 12915]
  • Loading branch information
Eric McDermid committed Jul 9, 2009
1 parent 518c54a commit 65ab468
Show file tree
Hide file tree
Showing 15 changed files with 500 additions and 37 deletions.
172 changes: 172 additions & 0 deletions doc/schema.html
Expand Up @@ -45,6 +45,7 @@
<li><a href="#Member_properties_and_formatters">Member properties and formatters</a></li>
<li><a href="#Approximate_level_cardinality">Approximate level cardinality</a></li>
<li><a href="#Default_Measure_Attribute">Default Measure Attribute</a></li>
<li><a href="#Functional_dependency_optimizations">Functional Dependency Optimizations</a></li>
</ol>
</ol>
</li>
Expand Down Expand Up @@ -970,6 +971,177 @@ <h1>3.3.10 Default Measure Attribute<a name="Default_Measure_Attribute">&nbsp;</
<div style="padding-left:20px">&lt;/<a href="#XML_VirtualCube">VirtualCube</a>&gt;</div>
</code>
</blockquote>
<!--
####################################################
## 3.3.11 Functional Dependency Optimizations #####
#################################################### -->
<h1>3.3.11 Functional Dependency Optimizations<a name="Functional_dependency_optimizations">&nbsp;</a></h1>

<p>In some circumstances, it may be possible to optimize performance by taking advantage
of known functional dependencies in the data being processed. Such dependencies
are typically the result of business rules associated with the systems producing
the data, and often cannot be inferred just by looking at the data itself.</p>

<p>Functional dependencies are declared to Mondrian using the
<a href="#dependsOnLevelValue">&lt;dependsOnLevelValue&gt</a> attribute of the
<a href="#XML_Property">&lt;Property&gt</a> element and the
<a href="#uniqueKeyLevelName">&lt;uniqueKeyLevelName&gt</a> attribute of the
<a href="#XML_Hierarchy">&lt;Hierarchy&gt</a> element.</p>

<p>The <a href="#dependsOnLevelValue">&lt;dependsOnLevelValue&gt</a> attribute of a
<a href="#Member_properties">member property</a> is used to indicate
that the value of the member property is functionally dependent on the value
of the <a href="#XML_Level">&lt;Level&gt</a> in which the member property is
defined. In other words, for a given value of the level, the value of the
property is invariant.</p>

<p>The <a href="#uniqueKeyLevelName">&lt;uniqueKeyLevelName&gt</a> attribute of a
<a href="#XML_Hierarchy">&lt;Hierarchy&gt</a> is used to indicate that the given
level (if any) taken together with all higher levels in the hierarchy acts as a
unique alternate key, ensuring that for any unique combination of those level values,
there is exactly one combination of values for all levels below it.

<p>To illustrate, consider a hierarchy modeling cars built and licensed in the
United States:</p>

<blockquote style="text-indent: -20px">
<code>
<div style="padding-left:20px;">&lt;<a href="#XML_Dimension">Dimension</a> name="Automotive" foreignKey="auto_dim_id"&gt;</div>
<div style="padding-left:40px;">&lt;<a href="#XML_Hierarchy">Hierarchy</a> hasAll="true" primaryKey="auto_dim_id" uniqueKeyLevelName="Vehicle Identification Number"&gt;</div>
<div style="padding-left:60px;">&lt;<a href="#XML_Table">Table</a> name="automotive_dim"/&gt;</div>
<div style="padding-left:60px">&lt;<a href="#XML_Level">Level</a> name="Make" column="make_id" type="Numeric"/&gt;</div>
<div style="padding-left:60px">&lt;<a href="#XML_Level">Level</a> name="Model" column="model_id" type="Numeric"/&gt;</div>
<div style="padding-left:60px">&lt;<a href="#XML_Level">Level</a> name="ManufacturingPlant" column="plant_id" type="Numeric"/&gt;</div>
<div style="padding-left:80px">&lt;<a href="#XML_Property">Property</a> name="State" column="plant_state_id" type="Numeric" dependsOnLevelValue="true"/&gt;</div>
<div style="padding-left:80px">&lt;<a href="#XML_Property">Property</a> name="City" column="plant_city_id" type="Numeric" dependsOnLevelValue="true"/&gt;</div>
<div style="padding-left:60px">&lt;<a href="#XML_Level">Level</a> name="Vehicle Identification Number" column="vehicle_id" type="Numeric"/&gt;</div>
<div style="padding-left:80px">&lt;<a href="#XML_Property">Property</a> name="Color" column="color_id" type="Numeric" dependsOnLevelValue="true"/&gt;</div>
<div style="padding-left:80px">&lt;<a href="#XML_Property">Property</a> name="Trim" column="trim_id" type="Numeric" dependsOnLevelValue="true"/&gt;</div>
<div style="padding-left:60px">&lt;<a href="#XML_Level">Level</a> name="LicensePlateNum" column="license_id" type="String"/&gt;</div>
<div style="padding-left:80px">&lt;<a href="#XML_Property">Property</a> name="State" column="license_state_id" type="Numeric" dependsOnLevelValue="true"/&gt;</div>
<div style="padding-left:40px">&lt;/<a href="#XML_Hierarchy">Hierarchy</a>&gt;</div>
<div style="padding-left:20px;">&lt;/<a href="#XML_Dimension">Dimension</a>&gt;</div>
</code>
</blockquote>

<p>In the above example, we know that a given manufacturing plant exists only in
a single city and state, that a given car only has one color scheme and
one trim level, and that the license number is associated with a single state.
Therefore, we can state that all of these member properties are functionally
dependent on the associated level values.</p>

<p>Additionally, we know that the Vehicle Identification Number uniquely identifies
each car, and that each car only has one license. Thus, we know that the
combination of Make, Model, Manufacturing Plant, and Vehicle Ientification
Number uniquely identify each vehicle; the license number is redundant.</p>

<p>These attributes enable optimization of the GROUP BY clause in the SQL statements Mondrian generates. Absent any
functional dependency information, a typical query on the Automotive dimension
would look something like:</p>


<blockquote style="text-indent: -20px">
<code>
<div style="padding-left:20px;">SELECT</div>
<div style="padding-left:40px;">`automotive_dim`.`make_id` AS c0,</div>
<div style="padding-left:40px;">`automotive_dim`.`model_id` AS c1,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_id` AS c2,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_state_id` AS c3,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_city_id` AS c4,</div>
<div style="padding-left:40px;">`automotive_dim`.`vehicle_id` AS c5,</div>
<div style="padding-left:40px;">`automotive_dim`.`color_id` AS c6,</div>
<div style="padding-left:40px;">`automotive_dim`.`trim_id` AS c7,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_id` AS c8,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_state_id` AS c9</div>
<div style="padding-left:20px;">FROM</div>
<div style="padding-left:40px;">`automotive_dim` AS `automotive_dim`,</div>
<div style="padding-left:20px;">GROUP BY</div>
<div style="padding-left:40px;">`automotive_dim`.`make_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`model_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_state_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_city_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`vehicle_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`color_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`trim_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_state_id`</div>
<div style="padding-left:20px;">ORDER BY</div>
<div style="padding-left:40px;">`...</div>
</code>
</blockquote>

<p>Given the functional dependence attributes in the schema example above, however,
we know that the query is selecting at a depth that includes the "unique key" level,
and that all properties in the query are also functionally dependent on their levels.
In such cases the GROUP BY clause is redundant and may be eliminated completely,
reducing SQL query performance significantly on some databases:</p>

<blockquote style="text-indent: -20px">
<code>
<div style="padding-left:20px;">SELECT</div>
<div style="padding-left:40px;">`automotive_dim`.`make_id` AS c0,</div>
<div style="padding-left:40px;">`automotive_dim`.`model_id` AS c1,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_id` AS c2,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_state_id` AS c3,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_city_id` AS c4,</div>
<div style="padding-left:40px;">`automotive_dim`.`vehicle_id` AS c5,</div>
<div style="padding-left:40px;">`automotive_dim`.`color_id` AS c6,</div>
<div style="padding-left:40px;">`automotive_dim`.`trim_id` AS c7,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_id` AS c8,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_state_id` AS c9</div>
<div style="padding-left:20px;">FROM</div>
<div style="padding-left:40px;">`automotive_dim` AS `automotive_dim`,</div>
<div style="padding-left:20px;">ORDER BY</div>
<div style="padding-left:40px;">`...</div>
</code>
</blockquote>


<p>Had the query not been deep enough to include the "unique key" level, or
had any of the member properties not been functionally dependent on their level,
this optimization would not be possible.</p>

<p>In some cases, a different optimization can be made where there is no "unique key"
level, but some or all of the member properties are functionally dependent on
their level. Some databases (notably MySQL) permit columns to be listed in the
SELECT clause that do not also appear in the GROUP BY clause. On such databases,
Mondrian can simply leave the functionally dependent member properties out of
the GROUP BY, which may reduce SQL query processing time substantially:</p>

<blockquote style="text-indent: -20px">
<code>
<div style="padding-left:20px;">SELECT</div>
<div style="padding-left:40px;">`automotive_dim`.`make_id` AS c0,</div>
<div style="padding-left:40px;">`automotive_dim`.`model_id` AS c1,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_id` AS c2,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_state_id` AS c3,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_city_id` AS c4,</div>
<div style="padding-left:40px;">`automotive_dim`.`vehicle_id` AS c5,</div>
<div style="padding-left:40px;">`automotive_dim`.`color_id` AS c6,</div>
<div style="padding-left:40px;">`automotive_dim`.`trim_id` AS c7,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_id` AS c8,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_state_id` AS c9</div>
<div style="padding-left:20px;">FROM</div>
<div style="padding-left:40px;">`automotive_dim` AS `automotive_dim`,</div>
<div style="padding-left:20px;">GROUP BY</div>
<div style="padding-left:40px;">`automotive_dim`.`make_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`model_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`plant_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`vehicle_id`,</div>
<div style="padding-left:40px;">`automotive_dim`.`license_id`,</div>
<div style="padding-left:20px;">ORDER BY</div>
<div style="padding-left:40px;">`...</div>
</code>
</blockquote>

<p>Please note that significant changes are anticipated to the schema syntax
in Mondrian 4.0, including a new approach to declaring functional dependencies.
While the expectation is that the 4.0 schema processor will maintain backward
compatibility with schemas developed for Mondrian 3.1, these are transitional
attributes introduced to allow support in the interim, and 4.0 will not be
backward compatible with them. Therefore, any schema using these attributes will
need to be migrated to the new syntax as part of upgrading to Mondrian 4.0</p>

<!--
#######################################
Expand Down
16 changes: 16 additions & 0 deletions src/main/mondrian/olap/Mondrian.xml
Expand Up @@ -509,6 +509,13 @@ Revision is $Id$
Can be localized from Properties file using #{propertyname}.
</Doc>
</Attribute>
<Attribute name="uniqueKeyLevelName" required="false">
<Doc>
Should be set to the level (if such a level exists) at which depth it is known
that all members have entirely unique rows, allowing SQL GROUP BY clauses to
be completely eliminated from the query.
</Doc>
</Attribute>

<Object name="relation" type="RelationOrJoin">
<Doc>
Expand Down Expand Up @@ -784,6 +791,15 @@ Revision is $Id$
Can be localized from Properties file using #{propertyname}.
</Doc>
</Attribute>
<Attribute name="dependsOnLevelValue" required="false" type="Boolean">
<Doc>
Should be set to true if the value of the property is functionally dependent on
the level value. This permits the associated property column to be omitted from
the GROUP BY clause (if the database permits columns in the SELECT that are not
in the GROUP BY). This can be a significant performance enhancement on some
databases, such as MySQL.
</Doc>
</Attribute>
</Element>

<Element type="Measure">
Expand Down
16 changes: 16 additions & 0 deletions src/main/mondrian/olap/Mondrian_SW.xml
Expand Up @@ -521,6 +521,13 @@
Can be localized from Properties file using #{propertyname}.
</Doc>
</Attribute>
<Attribute name="uniqueKeyLevelName" required="false">
<Doc>
Should be set to the level (if such a level exists) at which depth it is known
that all members have entirely unique rows, allowing SQL GROUP BY clauses to
be completely eliminated from the query.
</Doc>
</Attribute>

<Object name="relation" type="RelationOrJoin">
<Doc>
Expand Down Expand Up @@ -809,6 +816,15 @@
Can be localized from Properties file using #{propertyname}.
</Doc>
</Attribute>
<Attribute name="dependsOnLevelValue" required="false" type="Boolean">
<Doc>
Should be set to true if the value of the property is functionally dependent on
the level value. This permits the associated property column to be omitted from
the GROUP BY clause (if the database permits columns in the SELECT that are not
in the GROUP BY). This can be a significant performance enhancement on some
databases, such as MySQL.
</Doc>
</Attribute>
</Element>

<Element type="Measure">
Expand Down
16 changes: 16 additions & 0 deletions src/main/mondrian/rolap/RolapHierarchy.java
Expand Up @@ -55,6 +55,7 @@ public class RolapHierarchy extends HierarchyBase {
private RolapNullMember nullMember;

private String sharedHierarchyName;
private String uniqueKeyLevelName;

private Exp aggregateChildrenExpression;

Expand Down Expand Up @@ -156,6 +157,7 @@ public class RolapHierarchy extends HierarchyBase {
getRolapSchema().getDialect());
}
this.memberReaderClass = xmlHierarchy.memberReaderClass;
this.uniqueKeyLevelName = xmlHierarchy.uniqueKeyLevelName;

// Create an 'all' level even if the hierarchy does not officially
// have one.
Expand Down Expand Up @@ -892,6 +894,20 @@ public void setDefaultMember(Member defaultMember) {
}


/**
* Gets "unique key level name" attribute of this Hierarchy, if set.
* If set, this property indicates that all level properties are
* functionally dependent (invariant) on their associated levels,
* and that the set of levels from the root to the named level (inclusive)
* effectively defines an alternate key.
*
* This allows the GROUP BY to be eliminated from associated queries.
*/
public String getUniqueKeyLevelName() {
return uniqueKeyLevelName;
}


/**
* A <code>RolapNullMember</code> is the null member of its hierarchy.
* Every hierarchy has precisely one. They are yielded by operations such as
Expand Down
7 changes: 5 additions & 2 deletions src/main/mondrian/rolap/RolapLevel.java
Expand Up @@ -338,7 +338,7 @@ private static RolapProperty[] createProperties(
list.add(
new RolapProperty(
Property.NAME.name, Property.Datatype.TYPE_STRING,
nameExp, null, null, true));
nameExp, null, null, null, true));
}
for (int i = 0; i < xmlLevel.properties.length; i++) {
MondrianDef.Property property = xmlLevel.properties[i];
Expand All @@ -347,7 +347,10 @@ private static RolapProperty[] createProperties(
property.name,
convertPropertyTypeNameToCode(property.type),
xmlLevel.getPropertyExp(i),
property.formatter, property.caption, false));
property.formatter,
property.caption,
xmlLevel.properties[i].dependsOnLevelValue,
false));
}
return list.toArray(new RolapProperty[list.size()]);
}
Expand Down
19 changes: 19 additions & 0 deletions src/main/mondrian/rolap/RolapProperty.java
Expand Up @@ -34,6 +34,7 @@ class RolapProperty extends Property {

private final PropertyFormatter formatter;
private final String caption;
private final boolean dependsOnLevelValue;

/** The column or expression which yields the property's value. */
private final MondrianDef.Expression exp;
Expand All @@ -56,12 +57,16 @@ class RolapProperty extends Property {
MondrianDef.Expression exp,
String formatterDef,
String caption,
Boolean dependsOnLevelValue,
boolean internal)
{
super(name, type, -1, internal, false, false, null);
this.exp = exp;
this.caption = caption;
this.formatter = makePropertyFormatter(formatterDef);
this.dependsOnLevelValue =
dependsOnLevelValue == null ? false
: dependsOnLevelValue.booleanValue();
}

private PropertyFormatter makePropertyFormatter(String formatterDef) {
Expand Down Expand Up @@ -102,6 +107,20 @@ public String getCaption() {
}
return caption;
}

/**
* @return Returns the dependsOnLevelValue setting (if unset,
* returns false). This indicates whether the property is
* functionally dependent on the level with which it is
* associated.
*
* If true, then the property column can be eliminated from
* the GROUP BY clause for queries on certain databases such
* as MySQL.
*/
public boolean dependsOnLevelValue() {
return dependsOnLevelValue;
}
}

// End RolapProperty.java
4 changes: 2 additions & 2 deletions src/main/mondrian/rolap/RolapStar.java
Expand Up @@ -924,9 +924,9 @@ public String generateSql(
columnExpr = measure.getAggregator().getExpression(columnExpr);
}
final String columnName = columnNameList.get(k);
query.addSelect(columnExpr, columnName);
String alias = query.addSelect(columnExpr, columnName);
if (!(column instanceof Measure)) {
query.addGroupBy(columnExpr);
query.addGroupBy(columnExpr, alias);
}
}
// remove whitespace from query - in particular, the trailing newline
Expand Down

0 comments on commit 65ab468

Please sign in to comment.