MONDRIAN: Re-organize documents, fix Javadoc 1.4 warnings.

[git-p4: depot-paths = "//open/mondrian/": change = 166]
pentaho · Sep 25, 2002 · commit 2a17dee
1 parent 86dc14d

Showing 32 changed files with 1,916 additions and 1,506 deletions.
diff --git a/build.xml b/build.xml
@@ -77,6 +77,8 @@ ${lib.dir}/xml-apis.jar"/>
     <!-- Weblogic must be after xml-apis.jar and xercesImpl.jar, because it
          contains an incompatible version of xerces. -->
     <pathelement location="${weblogic.home}/lib/weblogic.jar"/>
+    <pathelement location="${ant.home}/lib/ant.jar"/>
+    <pathelement location="${ant.home}/lib/optional.jar"/>
   </path>
 
   <path id="project.boot.classpath">
@@ -455,10 +457,10 @@ mondrian/resource/**/*.class"/>
         classpathref="project.classpath"
         destdir="${javadoc.dir}"
         packagenames="mondrian.*"
-        overview="${doc.dir}/overview.html"
+        overview="${java.dir}/overview.html"
         footer="&lt;a href=&quot;http://sourceforge.net/projects/mondrian&quot;&gt;&lt;img src=&quot;http://sourceforge.net/sflogo.php?group_id=35302&#38;type=1&quot; width=&quot;88&quot; height=&quot;31&quot; border=&quot;0&quot; alt=&quot;SourceForge.net_Logo&quot;&gt;&lt;/a&gt;"
         author="true">
-        <link href="http://www.javasoft.com/j2se/1.3/docs/api/"/>
+        <link href="http://java.sun.com/j2se/1.4/docs/api/"/>
         <link href="http://www.junit.org/junit/javadoc/3.7/"/>
         <link href="http://java.sun.com/products/servlet/2.2/javadoc/"/>
     </javadoc>

diff --git a/doc/architecture.html b/doc/architecture.html
@@ -0,0 +1,223 @@
+<html>
+<!--
+  == $Id$
+  == This software is subject to the terms of the Common Public License
+  == Agreement, available at the following URL:
+  == http://www.opensource.org/licenses/cpl.html.
+  == (C) Copyright 2001-2002 Kana Software, Inc. and others.
+  == All Rights Reserved.
+  == You must accept the terms of that agreement to use this software.
+  == jhyde, 24 September, 2002
+  -->
+
+<head>
+<meta http-equiv="Content-Language" content="en-us">
+<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
+<meta name="ProgId" content="FrontPage.Editor.Document">
+<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
+<title>Mondrian architecture</title>
+<link rel="stylesheet" href="style.css" type="text/css" />
+
+
+</head>
+
+<!-- 
+
+This sentence is here to fool javadoc (which is looking for a period, and otherwise finds one
+inside one of our header tables).
+
+ -->
+
+<body>
+<h1><font size="7">Mondrian</font></h1>
+<hr>
+
+<table cellSpacing="0" cellPadding="0" border="0">
+<tr>
+<td valign="top">
+<p dir="ltr">
+<a href="index.html">Home</a><br>
+<a href="install.html">Download</a><br>
+Overview<br>
+&nbsp;&nbsp;&nbsp;<a href="olap.html">What&nbsp;is&nbsp;OLAP?</a><br>
+&nbsp;&nbsp;&nbsp;<a href="architecture.html">Architecture</a><br>
+&nbsp;&nbsp;&nbsp;<a href="faq.html">FAQ</a><br>
+Design<br>
+&nbsp;&nbsp;&nbsp;<a href="components.html">Components</a><br>
+&nbsp;&nbsp;&nbsp;<a href="api/index.html">API</a><br>
+<a href="links.html">Links</a><br>
+&nbsp;&nbsp;&nbsp;<a href="piet.html">The artist</a><br>
+&nbsp;&nbsp;&nbsp;<a href="http://sourceforge.net/projects/mondrian/">Project</a><br>
+&nbsp;&nbsp;&nbsp;<a href="help.html">Help</a><br>
+&nbsp;</td>
+<td width="5"><img height="1" src="spacer.gif" width="5" border="0"></td>
+<td width="1" bgcolor="#999999"><img height="1" src="spacer.gif" width="1" border="0"></td>
+<td width="5"><img height="1" src="spacer.gif" width="5" border="0"></td>
+<td valign="top">
+
+<h1>Architecture</h1>
+<p>A Mondrian OLAP system consists of four layers. Working from the eyes of the 
+end-user to the bowels of the data center, these are the presentation layer, the 
+calculation layer, the aggregation layer, and the storage layer.</p>
+<p>The <dfn><font face="Verdana">presentation layer</font></dfn> determines what 
+the end-user sees on his or her monitor, and how he or she can interact to ask 
+new questions. There are many ways to present multidimensional datasets, 
+including pivot tables, pie, line and bar charts, and advanced visualization 
+tools such as clickable maps and dynamic graphics. These might be written in 
+Swing or JSP, rendered as charts in JPEG or GIF format, or transmitted to a 
+remote application via XML. What all of these forms of presentation have in 
+common is the multidimensional 'grammar' of dimensions, measures and cells in 
+which the presentation layer asks the question and the OLAP server returns the 
+answer.</p>
+<p>The second layer is the <dfn><font face="Verdana">calculation layer</font></dfn>. 
+The calculation layer parses, validates and executes MDX queries. A query is 
+evaluated in multiple phases. The axes are computed first, then the values of the 
+cells within the axes. For efficiency, the calculation layer sends cell-requests 
+to the aggregation layer in batches. A <dfn>
+<font face="Verdana">query transformer</font></dfn> allows the application to 
+manipulate existing queries, rather than building an MDX statement from scratch 
+for each request. And <dfn>
+<font face="Verdana">metadata</font></dfn> describes the dimensional model, 
+and how it maps onto the relational model.</p>
+<p>The third layer is the <dfn><font face="Verdana">aggregation layer</font></dfn>. 
+An aggregation is a set of measure values ('cells') in memory, qualified by a 
+set of dimension column values. The calculation layer sends requests for sets of 
+cells. If the requested cells are neither in the cache nor derivable by rolling 
+up an aggregation in the cache, the aggregation manager sends a request to the 
+storage layer.</p>
+<p>The <dfn><font face="Verdana">storage layer</font></dfn> is an RDBMS. It is 
+responsible for providing aggregated cell data, and members from dimension 
+tables. I describe <a href="#Storage_and_aggregation_strategies">below</a> why I 
+decided to use the features of the RDBMS rather than developing a storage system 
+optimized for multidimensional data.</p>
+<p>All four of these components can exist on the same machine. Layers 2 and 3, 
+which comprise the Mondrian server, must be on the same machine. The storage 
+layer could be on another machine, accessed via a remote JDBC connection. In a 
+multi-user system, the presentation layer would exist on each end-user's machine 
+(except in the case of JSP pages generated on the server).</p>
+<h2><a name="Storage_and_aggregation_strategies">Storage and aggregation 
+strategies</a></h2>
+<p>OLAP servers are generally categorized according to how they store their 
+data:</p>
+<ul>
+  <li>A <font face="Verdana"><dfn>MOLAP (multidimensional OLAP)</dfn></font> 
+  server stores all of its data on disk in structures optimized for 
+  multidimensional access. Typically, data is stored in dense arrays, requiring 
+  only 4 or 8 bytes per cell value.</li>
+  <li>A <font face="Verdana"><dfn>ROLAP (relational OLAP)</dfn></font> server 
+  stores its data in a relational database. Each row in a fact table has a 
+  column for each dimension and measure.</li>
+</ul>
+<p>Three kinds of data need to be stored: fact table data (the transactional 
+records), aggregates, and dimensions.</p>
+<p>MOLAP databases store fact data in multidimensional format, but if there are 
+more than a few dimensions, this data will be sparse, and the multidimensional 
+format does not perform well. A <font face="Verdana"><dfn>HOLAP (hybrid OLAP)</dfn></font> 
+system solves this problem by leaving the most granular data in the relational 
+database while storing aggregates in multidimensional format.</p>
+<p>Pre-computed aggregates are necessary for large data sets; otherwise, certain 
+queries could not be answered without reading the entire contents of the fact 
+table. MOLAP aggregates are often an image of the in-memory data structure, 
+broken up into pages and stored on disk. ROLAP aggregates are stored in tables. 
+In some ROLAP systems these are explicitly managed by the OLAP server; in other 
+systems, the tables are declared as materialized views, and they are implicitly 
+used when the OLAP server issues a query with the right combination of columns 
+in the <code>group by</code> clause.</p>
+<p>The final component of the aggregation strategy is the cache. The cache holds 
+pre-computed aggregations in memory so subsequent queries can access cell values 
+without going to disk. If the cache holds the required data set at a lower level 
+of aggregation, it can compute the required data set by rolling up.</p>
+<p>The cache is arguably the most important part of the aggregation strategy 
+because it is <em><font face="Verdana">adaptive</font></em>. It is difficult to 
+choose a set of aggregations to pre-compute that speeds up the system without 
+using huge amounts of disk, particularly for systems with high dimensionality or 
+whose users submit unpredictable queries. And in a system where data is 
+changing in real-time, it is impractical to maintain pre-computed aggregates. A 
+reasonably sized cache can allow a system to perform adequately in the face of 
+unpredictable queries, with few or no pre-computed aggregates.</p>
+<p>Mondrian's aggregation strategy is as follows:</p>
+<ul>
+  <li>Fact data is stored in the RDBMS. Why develop a storage manager when the 
+  RDBMS already has one?</li>
+  <li>Read aggregate data into the cache by submitting <code>group by</code> 
+  queries. Again, why develop an aggregator when the RDBMS has one?</li>
+  <li><em><font face="Verdana">If</font></em> the RDBMS supports materialized 
+  views, <em><font face="Verdana">and </font></em>the database administrator 
+  chooses to create materialized views for particular aggregations, then 
+  Mondrian will use them implicitly. Ideally, Mondrian's aggregation manager 
+  should be aware that these materialized views exist and that those particular 
+  aggregations are cheap to compute. It should even offer tuning suggestions to 
+  the database administrator.</li>
+</ul>
+<p>The general idea is to delegate unto the database what is the database's. 
+This places additional burden on the database, but once those features are added 
+to the database, all clients of the database will benefit from them. 
+Multidimensional storage would reduce I/O and result in faster operation in some 
+circumstances, but I don't think it warrants the complexity at this stage.</p>
+<p>A wonderful side-effect is that because Mondrian requires no storage of its 
+own, it can be installed by adding a JAR file to the class path and be up and 
+running immediately. Because there are no redundant data sets to manage, the 
+data-loading process is easier, and Mondrian is ideally suited to do OLAP on 
+data sets which change in real time.</p>
+<p><i>Note to self</i>: The cache manager ought to distinguish between data which is being 
+pulled into the cache to be rolled up immediately into some other aggregation, 
+and an aggregation which is explicitly needed.</p>
+<h2>Components</h2>
+<h3>Query transformer</h3>
+<p>See {@link mondrian.olap.Parser}.</p>
+<h3>Metadata</h3>
+<p>The metadata is represented as an XML file. It is loaded into memory the 
+first time you reference a dimensional model. You can modify the model at 
+runtime by creating instances of classes such as <code>{@link 
+mondrian.rolap.RolapHierarchy}</code>.</p>
+<h3>Calculation layer</h3>
+<p><i>todo</i>: See {@link mondrian.olap.Query} and {@link mondrian.olap.Result}.</p>
+<p><i>todo</i>: The <code>package {@link mondrian.rolap}</code> is the one and 
+only implementation of the API. The DriverManager (<code>class {@link 
+mondrian.olap.DriverManager}</code>) acts as the class factory.</p>
+<p><i>todo</i>: How members are calculated...</p>
+<p><i>todo</i>: How aggregations are batched...</p>
+<p><i>todo</i>: MDX functions. See <a href="#User_defined_functions">user-defined functions</a>.</p>
+<h3>Aggregation manager</h3>
+<p>Aggregations are based upon the relational model: as far as the aggregation 
+manager is concerned, there is no relationship between the columns <code>city</code> 
+and <code>state</code>. This means that all roll-ups are the same: you just drop 
+a column. Consider the 3 roll-ups possible by dropping a column from the 
+aggregation {<code>gender</code>, <code>city</code>, <code>state</code>}: 
+dropping <code>gender</code> is equivalent to removing the <code>[Gender]</code> 
+dimension; dropping <code>city</code> is equivalent to rolling up to a higher 
+level in the <code>[Geography]</code> hierarchy; and dropping <code>state</code> 
+is not even allowed in the dimensional model (no, sorry, you can't ask about 
+products sold in cities called 'Portland'). This approach will also allow us 
+to implement 'drill anywhere'.</p>
+<p>An aggregation is defined by a search condition, for example, <code>{state in 
+('CA', 'OR', 'WA'), city = <i>any</i>, gender = 'M', measure = 'Unit sales'}</code>. 
+The <i><code>any</code></i> value is important; if we had asked for a specific 
+set of cities, we would not later be able to roll up by dropping the <code>city</code> 
+column.</p>
+<p>The caching strategy is to throw out the aggregation with the lowest 
+benefit/cost ratio. The 'benefit' of an item is the effort it took to produce 
+(effort which it is saving future queries) multiplied by its 'usefulness', which 
+decays exponentially for as long as the item goes unused. The 'cost' of an item 
+is its size.</p>
+</td>
+</tr>
+</table>
+
+<hr>
+
+<table border="0" class="clsStd" width="100%" style="border-collapse: collapse" bordercolor="#111111" cellpadding="0" cellspacing="0">
+  <tr>
+    <td>
+      <a href="http://apoptosis.dyndns.org:8080/open/mondrian/doc/architecture.html">$Id$
+      </a>(<a href="http://apoptosis.dyndns.org:8080/open/mondrian/doc/architecture.html?ac=22">log</a>)</td>
+    <td align="right">
+      <a href="http://sourceforge.net">
+        <img src="http://sourceforge.net/sflogo.php?group_id=35302&amp;type=1" width="88" height="31" border="0" alt="SourceForge.net Logo">
+      </a>
+    </td>
+  </tr>
+</table>
+
+</body>
+</html>
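The aggregation manager's roll-up rule in the new pages treats an aggregation as purely relational: rolling up means dropping one column and summing the cells that then share the same key, and this is only valid when that column was requested as `any`. The Java sketch below illustrates that rule under stated assumptions. The class `AggregationSketch`, its fields, and the choice of `Map<List<String>, Double>` for cells are hypothetical, chosen for illustration only; they are not Mondrian's classes or API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Mondrian's API): an in-memory aggregation of one
 * measure, keyed by the values of an ordered list of columns.
 */
final class AggregationSketch {
    /** Column names, e.g. [gender, city, state]. */
    final List<String> columns;
    /** Maps a tuple of column values to the measure value for that tuple. */
    final Map<List<String>, Double> cells;
    /** Columns restricted to specific values; a column absent here is 'any'. */
    final Map<String, Set<String>> constraints;

    AggregationSketch(List<String> columns, Map<List<String>, Double> cells,
                      Map<String, Set<String>> constraints) {
        this.columns = List.copyOf(columns);
        this.cells = cells;
        this.constraints = new HashMap<>(constraints);
    }

    /**
     * Rolls up by dropping one column: cells whose remaining values coincide
     * are summed. The column must have been requested as 'any'; otherwise the
     * sums would cover only the requested subset of its values.
     */
    AggregationSketch dropColumn(String column) {
        int index = columns.indexOf(column);
        if (index < 0) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        if (constraints.containsKey(column)) {
            throw new IllegalStateException(
                "cannot roll up: column '" + column + "' is not 'any'");
        }
        List<String> newColumns = new ArrayList<>(columns);
        newColumns.remove(index); // remove(int): drops the element at this position
        Map<List<String>, Double> newCells = new LinkedHashMap<>();
        for (Map.Entry<List<String>, Double> cell : cells.entrySet()) {
            List<String> key = new ArrayList<>(cell.getKey());
            key.remove(index);
            // Cells that now share a key are summed into one rolled-up cell.
            newCells.merge(key, cell.getValue(), Double::sum);
        }
        return new AggregationSketch(newColumns, newCells, constraints);
    }
}
```

For example, starting from {`gender`, `city`, `state`} with `state` constrained to ('CA', 'OR', 'WA'), dropping `gender` and then `city` yields per-state totals, while dropping `state` is rejected because it was not requested as `any`. Like the aggregation manager described in the text, the sketch knows nothing about hierarchies, so it would mechanically allow dropping `state` from an unconstrained aggregation; forbidding that is left to the dimensional model.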
diff --git a/doc/components.html b/doc/components.html
@@ -0,0 +1,120 @@
+<html>
+<!--
+  == $Id$
+  == This software is subject to the terms of the Common Public License
+  == Agreement, available at the following URL:
+  == http://www.opensource.org/licenses/cpl.html.
+  == (C) Copyright 2001-2002 Kana Software, Inc. and others.
+  == All Rights Reserved.
+  == You must accept the terms of that agreement to use this software.
+  == jhyde, 24 September, 2002
+  -->
+
+<head>
+<meta http-equiv="Content-Language" content="en-us">
+<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
+<meta name="ProgId" content="FrontPage.Editor.Document">
+<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
+<title>Mondrian components</title>
+<link rel="stylesheet" href="style.css" type="text/css" />
+
+
+</head>
+
+<!-- 
+
+This sentence is here to fool javadoc (which is looking for a period, and otherwise finds one
+inside one of our header tables).
+
+ -->
+
+<body>
+<h1><font size="7">Mondrian</font></h1>
+<hr>
+
+<table cellSpacing="0" cellPadding="0" border="0">
+<tr>
+<!-- Start links -->
+<td valign="top">
+<p dir="ltr">
+<a href="index.html">Home</a><br>
+<a href="install.html">Download</a><br>
+Overview<br>
+&nbsp;&nbsp;&nbsp;<a href="olap.html">What&nbsp;is&nbsp;OLAP?</a><br>
+&nbsp;&nbsp;&nbsp;<a href="architecture.html">Architecture</a><br>
+&nbsp;&nbsp;&nbsp;<a href="faq.html">FAQ</a><br>
+Design<br>
+&nbsp;&nbsp;&nbsp;<a href="components.html">Components</a><br>
+&nbsp;&nbsp;&nbsp;<a href="api/index.html">API</a><br>
+<a href="links.html">Links</a><br>
+&nbsp;&nbsp;&nbsp;<a href="people.html">People</a><br>
+&nbsp;&nbsp;&nbsp;<a href="http://sourceforge.net/projects/mondrian/">Project</a><br>
+&nbsp;&nbsp;&nbsp;<a href="help.html">Help</a><br>
+</td>
+<!-- End links -->
+<td width="5"><img height="1" src="spacer.gif" width="5" border="0"></td>
+<td width="1" bgcolor="#999999"><img height="1" src="spacer.gif" width="1" border="0"></td>
+<td width="5"><img height="1" src="spacer.gif" width="5" border="0"></td>
+<td valign="top">
+
+<h2>Introduction</h2>
+<p>See <a href="architecture.html">architecture</a>.</p>
+<h2>Components</h2>
+<h3>Query transformer</h3>
+<p>See {@link mondrian.olap.Parser}.</p>
+<h3>Metadata</h3>
+<p>The metadata is represented as an XML file. It is loaded into memory the 
+first time you reference a dimensional model. You can modify the model at 
+runtime by creating instances of classes such as <code>{@link 
+mondrian.rolap.RolapHierarchy}</code>.</p>
+<h3>Calculation layer</h3>
+<p><i>todo</i>: See {@link mondrian.olap.Query} and {@link mondrian.olap.Result}.</p>
+<p><i>todo</i>: The <code>package {@link mondrian.rolap}</code> is the one and 
+only implementation of the API. The DriverManager (<code>class {@link 
+mondrian.olap.DriverManager}</code>) acts as the class factory.</p>
+<p><i>todo</i>: How members are calculated...</p>
+<p><i>todo</i>: How aggregations are batched...</p>
+<p><i>todo</i>: MDX functions. See <a href="#User_defined_functions">user-defined functions</a>.</p>
+<h3>Aggregation manager</h3>
+<p>Aggregations are based upon the relational model: as far as the aggregation 
+manager is concerned, there is no relationship between the columns <code>city</code> 
+and <code>state</code>. This means that all roll-ups are the same: you just drop 
+a column. Consider the 3 roll-ups possible by dropping a column from the 
+aggregation {<code>gender</code>, <code>city</code>, <code>state</code>}: 
+dropping <code>gender</code> is equivalent to removing the <code>[Gender]</code> 
+dimension; dropping <code>city</code> is equivalent to rolling up to a higher 
+level in the <code>[Geography]</code> hierarchy; and dropping <code>state</code> 
+is not even allowed in the dimensional model (no, sorry, you can't ask about 
+products sold in cities called 'Portland'). This approach will also allow us 
+to implement 'drill anywhere'.</p>
+<p>An aggregation is defined by a search condition, for example, <code>{state in 
+('CA', 'OR', 'WA'), city = <i>any</i>, gender = 'M', measure = 'Unit sales'}</code>. 
+The <i><code>any</code></i> value is important; if we had asked for a specific 
+set of cities, we would not later be able to roll up by dropping the <code>city</code> 
+column.</p>
+<p>The caching strategy is to throw out the aggregation with the lowest 
+benefit/cost ratio. The 'benefit' of an item is the effort it took to produce 
+(effort which it is saving future queries) multiplied by its 'usefulness', which 
+decays exponentially for as long as the item goes unused. The 'cost' of an item 
+is its size.</p>
+</td>
+</tr>
+</table>
+
+<hr>
+
+<table border="0" class="clsStd" width="100%" style="border-collapse: collapse" bordercolor="#111111" cellpadding="0" cellspacing="0">
+  <tr>
+    <td>
+      <a href="http://apoptosis.dyndns.org:8080/open/mondrian/doc/components.html">$Id$
+      </a>(<a href="http://apoptosis.dyndns.org:8080/open/mondrian/doc/components.html?ac=22">log</a>)</td>
+    <td align="right">
+      <a href="http://sourceforge.net">
+        <img src="http://sourceforge.net/sflogo.php?group_id=35302&amp;type=1" width="88" height="31" border="0" alt="SourceForge.net Logo">
+      </a>
+    </td>
+  </tr>
+</table>
+
+</body>
+</html>
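The cache eviction rule in the aggregation manager section discards the cached aggregation that delivers the least benefit per unit of cost, where benefit is the effort it took to produce multiplied by a usefulness that decays exponentially while the aggregation goes unused, and cost is its size. The Java sketch below is one way to express that rule. The class names, the half-life parameter used to model the exponential decay, and the unit of "effort" are all assumptions for illustration, not Mondrian's implementation.

```java
import java.util.List;

/**
 * Hypothetical cache entry for the eviction sketch. The 'effort' unit and the
 * half-life decay model are assumptions, not taken from Mondrian.
 */
final class CachedAggregation {
    final String name;
    final double effort;       // work it took to produce, e.g. query milliseconds (assumed)
    final long sizeBytes;      // the 'cost' of keeping it in memory
    final long lastUsedMillis; // time of the most recent use

    CachedAggregation(String name, double effort, long sizeBytes, long lastUsedMillis) {
        if (sizeBytes <= 0) {
            throw new IllegalArgumentException("sizeBytes must be positive");
        }
        this.name = name;
        this.effort = effort;
        this.sizeBytes = sizeBytes;
        this.lastUsedMillis = lastUsedMillis;
    }

    /** Usefulness halves for every halfLifeMillis that the entry goes unused. */
    double usefulness(long nowMillis, double halfLifeMillis) {
        double idleMillis = Math.max(0L, nowMillis - lastUsedMillis);
        return Math.pow(0.5, idleMillis / halfLifeMillis);
    }

    /** Benefit (effort times usefulness) divided by cost (size in bytes). */
    double benefitPerByte(long nowMillis, double halfLifeMillis) {
        return effort * usefulness(nowMillis, halfLifeMillis) / sizeBytes;
    }
}

final class EvictionSketch {
    /** Returns the entry with the lowest benefit/cost ratio, or null if there is none. */
    static CachedAggregation chooseVictim(List<CachedAggregation> entries,
                                          long nowMillis, double halfLifeMillis) {
        if (halfLifeMillis <= 0) {
            throw new IllegalArgumentException("halfLifeMillis must be positive");
        }
        CachedAggregation victim = null;
        double lowestRatio = Double.POSITIVE_INFINITY;
        for (CachedAggregation entry : entries) {
            double ratio = entry.benefitPerByte(nowMillis, halfLifeMillis);
            if (ratio < lowestRatio) {
                lowestRatio = ratio;
                victim = entry;
            }
        }
        return victim;
    }
}
```

A half-life turns "declines exponentially" into a single tunable parameter. Unlike a plain LRU policy, this rule keeps a small, expensive-to-compute aggregation in preference to a large, cheap one even if both were last used at the same time.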