Commit

MONDRIAN: Re-organize documents,
fix Javadoc 1.4 warnings.

[git-p4: depot-paths = "//open/mondrian/": change = 166]
julianhyde committed Sep 25, 2002
1 parent 86dc14d commit 2a17dee
Showing 32 changed files with 1,916 additions and 1,506 deletions.
6 changes: 4 additions & 2 deletions build.xml
@@ -77,6 +77,8 @@ ${lib.dir}/xml-apis.jar"/>
<!-- Weblogic must be after xml-apis.jar and xercesImpl.jar, because it
contains an incompatible version of xerces. -->
<pathelement location="${weblogic.home}/lib/weblogic.jar"/>
+<pathelement location="${ant.home}/lib/ant.jar"/>
+<pathelement location="${ant.home}/lib/optional.jar"/>
</path>

<path id="project.boot.classpath">
@@ -455,10 +457,10 @@ mondrian/resource/**/*.class"/>
classpathref="project.classpath"
destdir="${javadoc.dir}"
packagenames="mondrian.*"
-overview="${doc.dir}/overview.html"
+overview="${java.dir}/overview.html"
footer="&lt;a href=&quot;http://sourceforge.net/projects/mondrian&quot;&gt;&lt;img src=&quot;http://sourceforge.net/sflogo.php?group_id=35302&#38;type=1&quot; width=&quot;88&quot; height=&quot;31&quot; border=&quot;0&quot; alt=&quot;SourceForge.net_Logo&quot;&gt;&lt;/a&gt;"
author="true">
-<link href="http://www.javasoft.com/j2se/1.3/docs/api/"/>
+<link href="http://java.sun.com/j2se/1.4/docs/api/"/>
<link href="http://www.junit.org/junit/javadoc/3.7/"/>
<link href="http://java.sun.com/products/servlet/2.2/javadoc/"/>
</javadoc>
223 changes: 223 additions & 0 deletions doc/architecture.html
@@ -0,0 +1,223 @@
<html>
<!--
== $Id$
== This software is subject to the terms of the Common Public License
== Agreement, available at the following URL:
== http://www.opensource.org/licenses/cpl.html.
== (C) Copyright 2001-2002 Kana Software, Inc. and others.
== All Rights Reserved.
== You must accept the terms of that agreement to use this software.
== jhyde, 24 September, 2002
-->

<head>
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Mondrian architecture</title>
<link rel="stylesheet" href="style.css" type="text/css" />


</head>

<!--
This sentence is here to fool javadoc (which is looking for a period, and otherwise finds one
inside one of our header tables).
-->

<body>
<h1><font size="7">Mondrian</font></h1>
<hr>

<table cellSpacing="0" cellPadding="0" border="0">
<tr>
<td valign="top">
<p dir="ltr">
<a href="index.html">Home</a><br>
<a href="install.html">Download</a><br>
Overview<br>
<a href="olap.html">&nbsp;&nbsp;&nbsp;What&nbsp;is&nbsp;OLAP?</a><br>
<a href="architecture.html">&nbsp;&nbsp;&nbsp;Architecture</a><br>
<a href="architecture.html">&nbsp;&nbsp;&nbsp;</a><a href="faq.html">FAQ</a><br>
Design<br>
<a href="components.html">&nbsp;&nbsp;&nbsp;Components</a><br>
<a href="components.html">&nbsp;&nbsp;&nbsp;</a><a href="api/index.html">API</a><br>
<a href="links.html">Links</a><br>
<a href="components.html">&nbsp;&nbsp;&nbsp;</a><a href="piet.html">The artist</a><br>
<a href="components.html">&nbsp;&nbsp;&nbsp;</a><a href="http://sourceforge.net/projects/mondrian/">Project</a><br>
&nbsp;&nbsp; <a href="help.html">Help</a><br>
&nbsp;</td>
<td width="5"><img height="1" src="http://graphics7.nytimes.com/images/misc/spacer.gif" width="5" border="0"></td>
<td width="1" bgcolor="#999999"><img height="1" src="http://graphics7.nytimes.com/images/misc/spacer.gif" width="1" border="0"></td>
<td width="5"><img height="1" src="http://graphics7.nytimes.com/images/misc/spacer.gif" width="5" border="0"></td>
<td valign="top">

<h1>Architecture</h1>
<p>A Mondrian OLAP System consists of four layers; working from the eyes of the
end-user to the bowels of the data center, these are the presentation layer, the
calculation layer, the aggregation layer, and the storage layer.</p>
<p>The <dfn><font face="Verdana">presentation layer</font></dfn> determines what
the end-user sees on his or her monitor, and how he or she can interact to ask
new questions. There are many ways to present multidimensional datasets,
including pivot tables (an interactive version of the table shown above), pie,
line and bar charts, and advanced visualization tools such as clickable maps and
dynamic graphics. These might be written in Swing or JSP, rendered as charts in
JPEG or GIF format, or transmitted to a remote application via XML. What all of
these forms of presentation have in common is the multidimensional 'grammar' of
dimensions, measures and cells in which the presentation layer asks the question
and the OLAP server returns the answer.</p>
<p>The second layer is the <dfn><font face="Verdana">calculation layer</font></dfn>.
The calculation layer parses, validates and executes MDX queries. A query is
evaluated in multiple phases. The axes are computed first, then the values of the
cells within the axes. For efficiency, the calculation layer sends cell-requests
to the aggregation layer in batches. A <dfn>
<font face="Verdana">query transformer</font></dfn> allows the application to
manipulate existing queries, rather than building an MDX statement from scratch
for each request. And <dfn>
<font face="Verdana">metadata</font></dfn> describes the dimensional model,
and how it maps onto the relational model.</p>
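<p>To illustrate why batching helps, here is a minimal Java sketch. It is not
Mondrian's actual implementation. It assumes that each cell request is a map
from column name to required value, and it groups requests which constrain the
same columns, so that each group could be sent to the aggregation layer as one
request. The class <code>BatchingSketch</code> and its method are hypothetical
names.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; none of these names come from Mondrian's API.
final class BatchingSketch {
    /**
     * Groups cell requests by the set of columns they constrain.
     * Each request maps a column name to the value required for that column.
     */
    static Map<Set<String>, List<Map<String, String>>> batchByColumns(
            List<Map<String, String>> requests) {
        Map<Set<String>, List<Map<String, String>>> batches = new LinkedHashMap<>();
        for (Map<String, String> request : requests) {
            // Copy the key set so the batch key is independent of the request map.
            Set<String> columns = new TreeSet<>(request.keySet());
            batches.computeIfAbsent(columns, k -> new ArrayList<>()).add(request);
        }
        return batches;
    }
}
```

<p>Requests for <code>{gender, state}</code> cells end up in one batch and
requests for <code>{state}</code> cells in another, whatever order they arrive
in.</p>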
<p>The third layer is the <dfn><font face="Verdana">aggregation layer</font></dfn>.
An aggregation is a set of measure values ('cells') in memory, qualified by a
set of dimension column values. The calculation layer sends requests for sets of
cells. If the requested cells are not in the cache, and cannot be derived by rolling up
an aggregation in the cache, the aggregation manager sends a request to the
storage layer.</p>
<p>The <dfn><font face="Verdana">storage layer</font></dfn> is an RDBMS. It is
responsible for providing aggregated cell data, and members from dimension
tables. I describe <a href="#Storage_and_aggregation_strategies">below</a> why I
decided to use the features of the RDBMS rather than developing a storage system
optimized for multidimensional data.</p>
<p>All four of these components can exist on the same machine. Layers 2 and 3,
which comprise the Mondrian server, must be on the same machine. The storage
layer could be on another machine, accessed via remote JDBC connection. In a
multi-user system, the presentation layer would exist on each end-user's machine
(except in the case of JSP pages generated on the server).</p>
<h2><a name="Storage_and_aggregation_strategies">Storage and aggregation
strategies</a></h2>
<p>OLAP Servers are generally categorized according to how they store their
data:</p>
<ul>
<li>A <font face="Verdana"><dfn>MOLAP (multidimensional OLAP)</dfn></font>
server stores all of its data on disk in structures optimized for
multidimensional access. Typically, data is stored in dense arrays, requiring
only 4 or 8 bytes per cell value.</li>
<li>A <font face="Verdana"><dfn>ROLAP (relational OLAP)</dfn></font> server
stores its data in a relational database. Each row in a fact table has a
column for each dimension and measure.</li>
</ul>
<p>Three kinds of data need to be stored: fact table data (the transactional
records), aggregates, and dimensions.</p>
<p>MOLAP databases store fact data in multidimensional format, but if there are
more than a few dimensions, this data will be sparse, and the multidimensional
format does not perform well. A <font face="Verdana"><dfn>HOLAP (hybrid OLAP)</dfn></font>
system solves this problem by leaving the most granular data in the relational
database, but stores aggregates in multidimensional format.</p>
<p>Pre-computed aggregates are necessary for large data sets; otherwise, certain
queries could not be answered without reading the entire contents of the fact
table. MOLAP aggregates are often an image of the in-memory data structure,
broken up into pages and stored on disk. ROLAP aggregates are stored in tables.
In some ROLAP systems these are explicitly managed by the OLAP server; in other
systems, the tables are declared as materialized views, and they are implicitly
used when the OLAP server issues a query with the right combination of columns
in the <code>group by</code> clause.</p>
<p>The final component of the aggregation strategy is the cache. The cache holds
pre-computed aggregations in memory so subsequent queries can access cell values
without going to disk. If the cache holds the required data set at a lower level
of aggregation, it can compute the required data set by rolling up.</p>
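<p>The lookup described above can be sketched as follows. This is an
illustration under simplifying assumptions, not Mondrian's code: an aggregation
is identified only by its set of column names (value constraints are ignored),
and the cache prefers the usable aggregation with the fewest columns, because it
needs the least rolling up. <code>CacheLookupSketch</code> and
<code>findSource</code> are hypothetical names.</p>

```java
import java.util.Collection;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration; none of these names come from Mondrian's API.
final class CacheLookupSketch {
    /**
     * Finds a cached aggregation from which the requested columns can be
     * computed: one whose columns include every requested column.
     * An exact match wins automatically because it has the fewest columns.
     */
    static Optional<Set<String>> findSource(
            Collection<Set<String>> cachedColumnSets, Set<String> requestedColumns) {
        Set<String> best = null;
        for (Set<String> cached : cachedColumnSets) {
            boolean usable = cached.containsAll(requestedColumns);
            if (usable && (best == null || cached.size() < best.size())) {
                best = cached;
            }
        }
        // Empty means the request has to go to the storage layer.
        return Optional.ofNullable(best);
    }
}
```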
<p>The cache is arguably the most important part of the aggregation strategy
because it is <em><font face="Verdana">adaptive</font></em>. It is difficult to
choose a set of aggregations to pre-compute that speeds up the system without
using huge amounts of disk, particularly if the data set has high dimensionality
or the users are submitting unpredictable queries. And in a system where data is
changing in real-time, it is impractical to maintain pre-computed aggregates. A
reasonably sized cache can allow a system to perform adequately in the face of
unpredictable queries, with few or no pre-computed aggregates.</p>
<p>Mondrian's aggregation strategy is as follows:</p>
<ul>
<li>Fact data is stored in the RDBMS. Why develop a storage manager when the
RDBMS already has one?</li>
<li>Aggregate data is read into the cache by submitting <code>group by</code>
queries. Again, why develop an aggregator when the RDBMS has one?</li>
<li><em><font face="Verdana">If</font></em> the RDBMS supports materialized
views, <em><font face="Verdana">and </font></em>the database administrator
chooses to create materialized views for particular aggregations, then
Mondrian will use them implicitly. Ideally, Mondrian's aggregation manager
should be aware that these materialized views exist and that those particular
aggregations are cheap to compute. It should even offer tuning suggestions to
the database administrator.</li>
</ul>
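<p>As a rough illustration of the second point, the following Java sketch
builds the kind of <code>group by</code> query which would load one aggregation
into the cache. It is a simplification, not the SQL Mondrian generates: it
assumes a single <code>sum</code> measure and that every column lives in one
table, and the names <code>GroupBySqlSketch</code>, <code>sales_fact</code> and
<code>unit_sales</code> are made up for the example.</p>

```java
import java.util.List;

// Hypothetical illustration; none of these names come from Mondrian's API.
final class GroupBySqlSketch {
    /**
     * Builds a query which lets the RDBMS do the aggregation.
     * Identifiers are concatenated as-is, so callers must pass trusted names.
     */
    static String aggregateQuery(
            String factTable, List<String> columns, String measureColumn) {
        String columnList = String.join(", ", columns);
        return "select " + columnList + ", sum(" + measureColumn + ")"
                + " from " + factTable
                + " group by " + columnList;
    }
}
```

<p>If a materialized view exists for the same combination of columns, an RDBMS
which supports query rewrite can answer such a query from the view without any
change to the SQL.</p>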
<p>The general idea is to delegate unto the database what is the database's.
This places additional burden on the database, but once those features are added
to the database, all clients of the database will benefit from them.
Multidimensional storage would reduce I/O and result in faster operation in some
circumstances, but I don't think it warrants the complexity at this stage.</p>
<p>A wonderful side-effect is that because Mondrian requires no storage of its
own, it can be installed by adding a JAR file to the class path and be up and
running immediately. Because there are no redundant data sets to manage, the
data-loading process is easier, and Mondrian is ideally suited to do OLAP on
data sets which change in real time.</p>
<p><i>Note to self</i>: The cache manager ought to distinguish between data which is being
pulled into the cache to be rolled up immediately into some other aggregation,
and an aggregation which is explicitly needed.</p>
<h2>Components</h2>
<h3>Query transformer</h3>
<p>See {@link mondrian.olap.Parser}.</p>
<h3>Metadata</h3>
<p>The metadata is represented as an XML file. It is loaded into memory the
first time you reference a dimensional model. You can modify the model at
runtime by creating instances of classes such as <code>{@link
mondrian.rolap.RolapHierarchy}</code>.</p>
<h3>Calculation layer</h3>
<p><i>todo</i>: See {@link mondrian.olap.Query} and {@link mondrian.olap.Result}.</p>
<p><i>todo</i>: The <code>package {@link mondrian.rolap}</code> is the one and
only implementation of the API. The DriverManager (<code>class {@link
mondrian.olap.DriverManager}</code>) acts as a class factory.</p>
<p><i>todo</i>: How members are calculated...</p>
<p><i>todo</i>: How aggregations are batched...</p>
<p><i>todo</i>: MDX functions. See <a href="#User_defined_functions">user-defined functions</a>.</p>
<h3>Aggregation manager</h3>
<p>Aggregations are based upon the relational model: as far as the aggregation
manager is concerned, there is no relationship between the columns <code>city</code>
and <code>state</code>. This means that all roll-ups are the same: you just drop
a column. Consider the 3 roll-ups possible by dropping a column from the
aggregation {<code>gender</code>, <code>city</code>, <code>state</code>}:
dropping <code>gender</code> is equivalent to removing the <code>[Gender]</code>
dimension; dropping <code>city</code> is equivalent to rolling up to a higher
level in the <code>[Geography]</code> hierarchy; and dropping <code>state</code>
is not even allowed in the dimensional model (no, sorry, you can't ask about
products sold in cities called 'Portland'). This approach will also allow us
to implement 'drill anywhere'.</p>
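<p>A roll-up of this kind can be sketched in a few lines of Java. The sketch
assumes that the measure is additive (a sum, such as unit sales) and that an
aggregation is held as a map from column values to a cell value.
<code>RollupSketch</code> is a hypothetical helper, not a Mondrian class.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these names come from Mondrian's API.
final class RollupSketch {
    /**
     * Rolls up an aggregation by dropping one column and summing the cells
     * whose remaining column values are equal. Only valid for additive measures.
     */
    static Map<List<String>, Double> dropColumn(
            Map<List<String>, Double> cells, int columnIndex) {
        Map<List<String>, Double> rolledUp = new LinkedHashMap<>();
        for (Map.Entry<List<String>, Double> cell : cells.entrySet()) {
            List<String> key = new ArrayList<>(cell.getKey());
            key.remove(columnIndex); // remove by position, not by value
            rolledUp.merge(key, cell.getValue(), Double::sum);
        }
        return rolledUp;
    }
}
```

<p>With columns in the order {<code>gender</code>, <code>city</code>,
<code>state</code>}, dropping index 1 rolls cities up to states, and dropping
index 0 removes the gender breakdown; the code is the same in both cases.</p>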
<p>An aggregation is defined by a search condition, for example, <code>{state in
('CA', 'OR', 'WA'), city = <i>any</i>, gender = 'M', measure = 'Unit sales'}</code>.
The <i><code>any</code></i> value is important; if we had asked for a specific
set of cities, we would not later be able to roll-up by dropping the <code>city</code>
column.</p>
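<p>One way to picture such a search condition is as a map from column to either
a set of allowed values or the marker <i><code>any</code></i>. The Java sketch
below is illustrative only. <code>AggregationSpecSketch</code> and its methods
are hypothetical names, and an empty <code>Optional</code> is used here to stand
for <i><code>any</code></i>.</p>

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration; none of these names come from Mondrian's API.
final class AggregationSpecSketch {
    // An empty Optional stands for 'any'; a present one lists the allowed values.
    private final Map<String, Optional<Set<String>>> constraints = new LinkedHashMap<>();

    /** Declares that the column is present but unconstrained. */
    AggregationSpecSketch any(String column) {
        constraints.put(column, Optional.<Set<String>>empty());
        return this;
    }

    /** Restricts the column to the given values. */
    AggregationSpecSketch in(String column, String... values) {
        Set<String> allowed = new LinkedHashSet<>(Arrays.asList(values));
        constraints.put(column, Optional.of(allowed));
        return this;
    }

    /** A column can be dropped only if the aggregation holds all of its values. */
    boolean canRollUpByDropping(String column) {
        Optional<Set<String>> constraint = constraints.get(column);
        return constraint != null && !constraint.isPresent();
    }
}
```

<p>For the example condition above, <code>city</code> can be dropped but
<code>state</code> and <code>gender</code> cannot, because the aggregation holds
only some of their values.</p>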
<p>The caching strategy is to throw out the aggregation with the lowest
benefit/cost ratio. The 'benefit' of an item is the effort it took to produce
(effort which it is saving future queries) multiplied by its 'usefulness', which
declines exponentially the longer the item goes unused. The 'cost' of an item is
its size.</p>
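<p>The following Java sketch shows one way this policy could be scored. It
reads the rule as: evict the item which gives the least benefit per unit of
cost. The one-minute half-life, the units, and the names
<code>EvictionSketch</code>, <code>Entry</code> and <code>chooseVictim</code>
are all assumptions made for the example.</p>

```java
import java.util.List;

// Hypothetical illustration; none of these names come from Mondrian's API.
final class EvictionSketch {
    /** Assumed half-life of an unused item's usefulness. */
    static final double HALF_LIFE_MILLIS = 60_000.0;

    static final class Entry {
        final String name;
        final double effortMillis;  // effort it took to produce the item
        final long sizeBytes;       // the item's cost; assumed to be positive
        final long lastUsedMillis;

        Entry(String name, double effortMillis, long sizeBytes, long lastUsedMillis) {
            this.name = name;
            this.effortMillis = effortMillis;
            this.sizeBytes = sizeBytes;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    /** Usefulness is 1 when just used and halves every HALF_LIFE_MILLIS. */
    static double usefulness(Entry entry, long nowMillis) {
        double idleMillis = Math.max(0L, nowMillis - entry.lastUsedMillis);
        return Math.pow(0.5, idleMillis / HALF_LIFE_MILLIS);
    }

    /** Benefit (effort times usefulness) per unit of cost (size). */
    static double benefitPerCost(Entry entry, long nowMillis) {
        return entry.effortMillis * usefulness(entry, nowMillis) / entry.sizeBytes;
    }

    /** Returns the entry to evict, or null if there are no entries. */
    static Entry chooseVictim(List<Entry> entries, long nowMillis) {
        Entry victim = null;
        double lowest = Double.POSITIVE_INFINITY;
        for (Entry entry : entries) {
            double score = benefitPerCost(entry, nowMillis);
            if (score < lowest) {
                lowest = score;
                victim = entry;
            }
        }
        return victim;
    }
}
```

<p>A large item which was cheap to produce is evicted before a small item which
was expensive to produce, and an item which has not been used for a while is
evicted before an otherwise identical item which was used recently.</p>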
</td>
</tr>
</table>

<hr>

<table border="0" class="clsStd" width="100%" style="border-collapse: collapse" bordercolor="#111111" cellpadding="0" cellspacing="0">
<tr>
<td>
<a href="http://apoptosis.dyndns.org:8080/open/mondrian/doc/components.html">$Id$
</a>(<a href="http://apoptosis.dyndns.org:8080/open/mondrian/doc/components.html?ac=22">log</a>)</td>
<td align="right">
<a href="http://sourceforge.net">
<img src="http://sourceforge.net/sflogo.php?group_id=35302&type=1" width="88" height="31" border="0" alt="SourceForge.net Logo">
</a>
</td>
</tr>
</table>

</body>
</html>
120 changes: 120 additions & 0 deletions doc/components.html
@@ -0,0 +1,120 @@
<html>
<!--
== $Id$
== This software is subject to the terms of the Common Public License
== Agreement, available at the following URL:
== http://www.opensource.org/licenses/cpl.html.
== (C) Copyright 2001-2002 Kana Software, Inc. and others.
== All Rights Reserved.
== You must accept the terms of that agreement to use this software.
== jhyde, 24 September, 2002
-->

<head>
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Mondrian home page</title>
<link rel="stylesheet" href="style.css" type="text/css" />


</head>

<!--
This sentence is here to fool javadoc (which is looking for a period, and otherwise finds one
inside one of our header tables).
-->

<body>
<h1><font size="7">Mondrian</font></h1>
<hr>

<table cellSpacing="0" cellPadding="0" border="0">
<tr>
<!-- Start links -->
<td valign="top">
<p dir="ltr">
<a href="index.html">Home</a><br>
<a href="install.html">Download</a><br>
Overview<br>
&nbsp;&nbsp;&nbsp;<a href="olap.html">What&nbsp;is&nbsp;OLAP?</a><br>
&nbsp;&nbsp;&nbsp;<a href="architecture.html">Architecture</a><br>
&nbsp;&nbsp;&nbsp;<a href="faq.html">FAQ</a><br>
Design<br>
&nbsp;&nbsp;&nbsp;<a href="components.html">Components</a><br>
&nbsp;&nbsp;&nbsp;<a href="api/index.html">API</a><br>
<a href="links.html">Links</a><br>
&nbsp;&nbsp;&nbsp;<a href="people.html">People</a><br>
&nbsp;&nbsp;&nbsp;<a href="http://sourceforge.net/projects/mondrian/">Project</a><br>
&nbsp;&nbsp;&nbsp;<a href="help.html">Help</a><br>
</td>
<!-- End links -->
<td width="5"><img height="1" src="spacer.gif" width="5" border="0"></td>
<td width="1" bgcolor="#999999"><img height="1" src="spacer.gif" width="1" border="0"></td>
<td width="5"><img height="1" src="spacer.gif" width="5" border="0"></td>
<td valign="top">

<h2>Introduction</h2>
<p>See <a href="architecture.html">architecture</a>.</p>
<h2>Components</h2>
<h3>Query transformer</h3>
<p>See {@link mondrian.olap.Parser}.</p>
<h3>Metadata</h3>
<p>The metadata is represented as an XML file. It is loaded into memory the
first time you reference a dimensional model. You can modify the model at
runtime by creating instances of classes such as <code>{@link
mondrian.rolap.RolapHierarchy}</code>.</p>
<h3>Calculation layer</h3>
<p><i>todo</i>: See {@link mondrian.olap.Query} and {@link mondrian.olap.Result}.</p>
<p><i>todo</i>: The <code>package {@link mondrian.rolap}</code> is the one and
only implementation of the API. The DriverManager (<code>class {@link
mondrian.olap.DriverManager}</code>) acts as a class factory.</p>
<p><i>todo</i>: How members are calculated...</p>
<p><i>todo</i>: How aggregations are batched...</p>
<p><i>todo</i>: MDX functions. See <a href="#User_defined_functions">user-defined functions</a>.</p>
<h3>Aggregation manager</h3>
<p>Aggregations are based upon the relational model: as far as the aggregation
manager is concerned, there is no relationship between the columns <code>city</code>
and <code>state</code>. This means that all roll-ups are the same: you just drop
a column. Consider the 3 roll-ups possible by dropping a column from the
aggregation {<code>gender</code>, <code>city</code>, <code>state</code>}:
dropping <code>gender</code> is equivalent to removing the <code>[Gender]</code>
dimension; dropping <code>city</code> is equivalent to rolling up to a higher
level in the <code>[Geography]</code> hierarchy; and dropping <code>state</code>
is not even allowed in the dimensional model (no, sorry, you can't ask about
products sold in cities called 'Portland'). This approach will also allow us
to implement 'drill anywhere'.</p>
<p>An aggregation is defined by a search condition, for example, <code>{state in
('CA', 'OR', 'WA'), city = <i>any</i>, gender = 'M', measure = 'Unit sales'}</code>.
The <i><code>any</code></i> value is important; if we had asked for a specific
set of cities, we would not later be able to roll-up by dropping the <code>city</code>
column.</p>
<p>The caching strategy is to throw out the aggregation with the lowest
benefit/cost ratio. The 'benefit' of an item is the effort it took to produce
(effort which it is saving future queries) multiplied by its 'usefulness', which
declines exponentially the longer the item goes unused. The 'cost' of an item is
its size.</p>
</td>
</tr>
</table>

<hr>

<table border="0" class="clsStd" width="100%" style="border-collapse: collapse" bordercolor="#111111" cellpadding="0" cellspacing="0">
<tr>
<td>
<a href="http://apoptosis.dyndns.org:8080/open/mondrian/doc/components.html">$Id$
</a>(<a href="http://apoptosis.dyndns.org:8080/open/mondrian/doc/components.html?ac=22">log</a>)</td>
<td align="right">
<a href="http://sourceforge.net">
<img src="http://sourceforge.net/sflogo.php?group_id=35302&type=1" width="88" height="31" border="0" alt="SourceForge.net Logo">
</a>
</td>
</tr>
</table>

</body>
</html>
