
This project revolves around developing an optimization technique to improve the responsiveness of iBATIS cache by introducing the notion of a cache warming harness in iBATIS framework. iBATIS is a persistence framework which automates the mapping between SQL databases and objects in Java. It allows programmers to map the parameters and results …

balak1986/prime-cache-support-in-ibatis2


See http://balak1986.github.com/prime-cache-support-in-ibatis2 for a better look.

CHAPTER 1
Introduction

1.1 Overview
iBATIS is a persistence framework that automates the mapping between SQL databases and Java objects. It lets programmers map JavaBeans objects to PreparedStatement parameters and ResultSets, using XML descriptors to map JavaBeans to SQL statements. In iBATIS, result maps finish the job by mapping the result of a database query (a set of columns) to object properties. A property that represents a composite object can be loaded by associating the result map property with a mapped statement that specifies how to load the appropriate data and class for that complex property. The results of a query mapped statement can be cached by an iBATIS cache model.
The problem with populating fully formed complex-type properties is that whenever iBATIS loads a complex type, two SQL statements are actually run (one for the composite object and one for its complex property). This seems trivial when loading a single composite object, but a query that loads N composite objects runs a separate query for each of them to load its complex property. The result is N+1 queries in total: one for the list of composite objects and one for each composite object returned, to load its related complex property.
The result set might contain N objects but only M (where M<N) unique complex property values, in which case N-M of the nested queries are redundant. One way to mitigate the problem is to cache the mapped statement that fetches the complex property. Instead of running the SQL query N times, the framework returns the complex property object from the cache, so the query hits the database only M times. Although iBATIS provides caching, the default cache model does not completely solve the N+1 problem, because the cache is loaded slowly over time on a demand basis and each unique cache miss is costly. This dissertation intends to solve this problem.
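The query counts discussed above can be checked with a small counting model. The sketch below is an illustration only: it does not use iBATIS, the class and method names are hypothetical, and the primed case assumes that a single priming query covers every parent that is later requested.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical counting model; it only counts database round trips.
class NPlusOneCount {
    /** No cache: 1 list query plus 1 nested query per composite object (N+1). */
    static int withoutCache(int[] parentIds) {
        return 1 + parentIds.length;
    }

    /** Demand cache: 1 list query plus 1 nested query per unique parent id (M+1). */
    static int withDemandCache(int[] parentIds) {
        Map<Integer, String> cache = new HashMap<>();
        int queries = 1; // the list query
        for (int id : parentIds) {
            if (!cache.containsKey(id)) {
                queries++; // cache miss: one nested select
                cache.put(id, "parent-" + id);
            }
        }
        return queries;
    }

    /** Primed cache: 1 list query plus 1 priming query (assumes priming covers all parents). */
    static int withPrimedCache(int[] parentIds) {
        return 2;
    }

    public static void main(String[] args) {
        int[] parentIds = {1, 2, 1, 3, 2, 1}; // N = 6 objects, M = 3 unique parents
        System.out.println(withoutCache(parentIds));    // 7
        System.out.println(withDemandCache(parentIds)); // 4
        System.out.println(withPrimedCache(parentIds)); // 2
    }
}
```

With N = 6 and M = 3, the uncached run issues 7 queries, the demand cache saves the N-M = 3 redundant ones, and the primed cache needs 2 regardless of N.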

1.2 Objectives
The goal is to develop an optimization technique that improves the responsiveness of the iBATIS cache by introducing a cache warming harness into the iBATIS framework. The harness is used to solve the N+1 query problem, helps bring the application to a steady state right after start-up, and makes the performance contracts more deterministic.

1.3 Approach
Cache warming fires a set of specified query identifiers (usually queries without parameter variables), and the data cached this way is then used to fetch the complex properties of composite objects.

Briefly, this dissertation work involves the following tasks:
1.	Understand iBATIS caching model.
2.	Analyze various caching techniques.
3.	Add a wrapper on top of iBATIS caching model to improve the current caching technique.
4.	Develop ‘cache warming’ feature that can be used to solve N+1 query problem.
5.	Test the new caching strategy with simulation and measure the performance.

CHAPTER 2
Technology Background

2.1 Persistence Framework
A persistence framework is responsible for interfacing with the database driver. It acts as a layer of abstraction between the application and the database, typically bridging any conceptual differences between the two. It maps the objects in the application domain to data that needs to be persisted in a database. The mappings can be defined using either XML files or metadata annotations. It provides a simple API for storing and retrieving Java objects directly to and from the database.

2.2 iBATIS
iBATIS is a data mapper framework that provides a very simple framework for using XML descriptors to map JavaBeans, Map implementations, primitive wrapper types (String, Integer…) and even XML documents to an SQL statement. In application architecture, iBATIS fits in at the persistence layer. It uses an approach called SQL mapping to persist objects to a relational database.

Lifecycle of the iBATIS [4]
•	Provide an object as a parameter (a JavaBean, Map or primitive wrapper). The parameter object will be used to set input values in an update statement, or where clause values in a query, ...
•	Execute the mapped statement. The Data Mapper framework will create a PreparedStatement instance, set any parameters using the provided parameter object, execute the statement and build a result object from the ResultSet.
•	In the case of an update, the number of rows affected is returned. In the case of a query, a single object, or a collection of objects is returned. Like parameters, result objects can be a JavaBean, a Map, a primitive type wrapper or XML.

2.3 JavaBeans
iBATIS supports many types for parameter and result mappings: JavaBeans, Maps (such as HashMap), XML, and of course primitive types. JavaBeans provide the highest performance, the greatest flexibility, and type safety. They are fast because they use simple, low-level method calls for property mappings. The JavaBean specification is a set of rules for defining components for use with Java; the only rules that apply to iBATIS are the ones that concern property naming. Property names are defined in a JavaBean by a pair of methods that the specification refers to as accessor methods. JavaBeans do not degrade performance as more properties are added, and they are more memory efficient than the alternatives. A more important consideration is that JavaBeans are type safe, which allows iBATIS to determine the appropriate type of value that should be returned from the database and to bind it tightly.

2.4 Mapped Statements
iBATIS does not directly tie classes to tables or fields to columns, but instead maps the parameters and results (i.e., the inputs and outputs) of a SQL statement to a class.  iBATIS is an additional layer of indirection between the classes and the tables, allowing it more flexibility in how classes and tables can be mapped, without requiring any changes to the data model or the object model.

A mapped statement [1] can be viewed as a set of inputs and outputs. The inputs are the parameters, typically found in the WHERE clause of the SQL statement. The outputs are the columns found in the SELECT clause. iBATIS maps the inputs and outputs of the statement using a simple XML descriptor file.

2.5 Cache Model
The iBATIS cache focuses on caching results within the persistence layer [1]. As such, it is independent of the service and presentation layers, and is not based on object identity. The iBATIS caching mechanism is completely configuration based. A cache model is defined within a SQL Map configuration and can be used by more than one query mapped statement. The cache model defines how the cache stores fresh results and clears stale data. A mapped statement that wants to use the cache only needs to reference it through its cacheModel attribute. A cache model can specify the cache type to use; iBATIS provides four default cache implementations (MEMORY, LRU, FIFO, and OSCACHE).

CHAPTER 3
Methodology

3.1 Caching Fundamentals
The repetitive acquisition and release of the same resource affect the overall performance of the system. Usually, the resource acquired is information from a remote data source. The overhead of multiple acquisitions affects both CPU and I/O operations. The system performance can be improved by reducing the resource management cost. 

Caching is a technique that can drastically improve the performance of any database application. Due to caching, multiple read operations for the same data are avoided. Usually, the I/O operations are the most expensive operations that an application can perform, and therefore it is beneficial to limit their use as much as possible.

Caching the data that the application is accessing will increase the memory usage, sometimes beyond acceptable limits. Therefore it is very important to obtain a proper balance between the I/O accesses and the memory usage. The quantity of data being cached and the moment when to load vary depending on the requirements of each application. It can either be in the beginning when the application initializes or whenever it is required for the first time.

Cache patterns define strategies for integrating caching into applications and middleware components. These patterns concentrate on improving data access performance and resource utilization by eliminating redundant data access operations.


3.2 Demand Cache
Demand Cache [2] describes a strategy for populating a cache incrementally as applications request data. A demand cache is useful for data that is read frequently but unpredictably.

The cache begins empty, so the first reference always requires a physical database query operation. However, as the system runs, the cache accessor gradually populates the cache to contain most of the data that it references frequently.

In this model, application cache initialization is immediate because the cache begins empty. The cache accessor only issues database read operations for data requested by the client, so the cache contains precisely the minimal set of data required by the application. The drawback is that a demand cache is populated slowly, through many data access operations. Application performance improves during execution, as the probability of a cache hit grows with each data access operation.
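As a minimal sketch of the demand cache strategy described in this section, the class below starts empty and loads each key on its first request. It is not taken from iBATIS or from [2]; the class name, the loader function (which stands in for a physical database read), and the load counter are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical demand cache: populated incrementally, one key per miss.
class DemandCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader; // stands in for a physical database read
    private int loads = 0;

    DemandCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        V value = store.get(key);
        if (value == null) {           // miss: go to the data source once
            value = loader.apply(key);
            loads++;
            store.put(key, value);
        }
        return value;
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        DemandCache<Integer, String> cache = new DemandCache<>(id -> "category-" + id);
        cache.get(7);
        cache.get(7); // served from the cache
        cache.get(8);
        System.out.println(cache.loads()); // 2
    }
}
```

Three reads of two distinct keys cost two loads, which is the slow, per-key population the text describes.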

3.3 Primed Cache
A Primed Cache [2] should be considered whenever it is possible to predict a subset or the entire set of data that the client will request, and to load the cache. Primed cache defines a complementary caching strategy that alleviates the problem of slow, incremental cache population. 

Primed cache has minimal access overheads as issuing a single priming operation that stores many data items in the cache is significantly faster than the analogous combination of individual read operations. Also, if the data requirements of the application are correctly guessed, then the cache will also occupy an optimum quantity of memory, containing only relevant data. 
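The priming operation described here can be sketched as one bulk read whose rows are stored under individual keys. This is an illustrative model under stated assumptions, not the pattern's reference implementation: the class name, the bulk query supplier, and the key extractor function are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical primed cache: one bulk read fills many entries.
class PrimedCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private int bulkReads = 0;

    /** Runs one bulk read and stores every returned item under its own key. */
    void prime(Supplier<List<V>> bulkQuery, Function<V, K> keyOf) {
        bulkReads++;
        for (V item : bulkQuery.get()) {
            store.put(keyOf.apply(item), item);
        }
    }

    V get(K key) {
        return store.get(key);
    }

    int bulkReads() {
        return bulkReads;
    }

    public static void main(String[] args) {
        PrimedCache<Integer, String> cache = new PrimedCache<>();
        // The list stands in for the rows returned by a single priming query.
        cache.prime(() -> Arrays.asList("a", "bb", "ccc"), String::length);
        System.out.println(cache.get(2));      // bb
        System.out.println(cache.bulkReads()); // 1
    }
}
```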

3.4 Flexible Caching Design
If the data requirements of the application are known from the beginning of the run, loading the data with the Primed cache pattern is the best strategy. However, if the data requirements are unpredictable, the only option is to load data the first time it is required, using the Demand cache pattern.

Most enterprise applications have use cases for loading data at start-up as well as on demand. A combination of the Primed and Demand cache patterns can be used to obtain the benefits of both designs: the minimal data access overhead of the Primed cache pattern and the fast initialization of the Demand cache pattern.

In iBATIS, it may take several database read operations before the cache starts to improve performance. The iBATIS cache design gives slower responses initially, with gradual improvement as the cache is populated. Adding cache warming behavior to the iBATIS framework using the Primed Cache pattern improves the responsiveness of the cache and also solves the N+1 query problem by incurring that cost up front, at application load time.



CHAPTER 4
Design Flow and Implementation

4.1 iBATIS Cache Design
The iBATIS cache design follows the demand cache pattern. iBATIS caches data at the granularity of mapped statement and parameter: the query name and the query parameter are combined into a specific key that is used to store and retrieve cached data.

MappedStatement is responsible for providing data for client requests and is the base for all types of mapped statement, including the caching, select, update, and delete statements. CachingStatement, which plays the role of the CacheAccessor in the demand cache pattern, is the entry point for all of the client's data access operations when a cache model is enabled for the mapped statement in the sql-map configuration. CachingStatement manages both cache and database interactions. CacheModel is a wrapper for caches; it implements the ExecuteListener interface so that the cache can be flushed when mapped statements execute. CacheModel holds a reference to a CacheController, which implements the cache storage mechanism. SqlExecutor is the DataAccessor and is responsible for issuing physical database operations.

The CachingStatement forms the cache key from the read operation's parameters and the mapped statement. If it finds the data in the cache, it returns the data without any database interaction; otherwise it delegates the task to SqlExecutor.
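The idea of a key built from the mapped statement and its parameter can be illustrated with a small value class. This is a simplified stand-in written for this explanation, not the key class that iBATIS itself uses; the class name and the sample values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: statement id plus query parameter.
class StatementCacheKey {
    private final String statementId;
    private final Object parameter;

    StatementCacheKey(String statementId, Object parameter) {
        this.statementId = Objects.requireNonNull(statementId);
        this.parameter = parameter;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StatementCacheKey)) return false;
        StatementCacheKey other = (StatementCacheKey) o;
        return statementId.equals(other.statementId)
                && Objects.equals(parameter, other.parameter);
    }

    @Override
    public int hashCode() {
        return Objects.hash(statementId, parameter);
    }

    public static void main(String[] args) {
        Map<StatementCacheKey, String> cache = new HashMap<>();
        cache.put(new StatementCacheKey("getParentCategory", 42), "Books");

        // Same statement and parameter: hit.
        System.out.println(cache.get(new StatementCacheKey("getParentCategory", 42))); // Books
        // Same statement, different parameter: miss.
        System.out.println(cache.get(new StatementCacheKey("getParentCategory", 43))); // null
    }
}
```

Because both components take part in equals and hashCode, the same parameter under a different statement id is also a miss.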

4.2 Enhanced Cache Design
PrimeCachingStatement is a newly introduced mapped statement, a child of CachingStatement, that performs cache warming using the primed cache pattern. It applies when it is possible to predict a relevant subset of data to prime in anticipation of subsequent client requests; this subset must be small enough to make efficient use of cache storage, but large enough to cover likely future requests. In that case, the user can set the primedCache attribute of the cacheModel element in the sql-map configuration to load data into the cache through the primed cache, as in the configuration below.

<cacheModel type="MEMORY" id="ProductCategoryCache" primedCache="true">
</cacheModel>

Cache model configuration
 
Figure 3: The static structure of the primed cache design

PrimeCachingStatement holds a reference to the mapped statement that must be executed to load data for the primed cache. When the user sets the primedCache attribute to true, the primedCacheQuery attribute must also be set on every select element that wants to take advantage of the primed cache. The primedCacheQuery attribute specifies which query is run to warm the cache.

<select id="getParentCategoryForId" parameterClass="int" resultMap="ResultMapName" cacheModel="ProductCategoryCache" primedCacheQuery="getAllParentCategories" keyProperty="productCategoryId">
	<!-- select query SQL -->
</select>
Select statement that uses the primed cache

The primed cache uses partial keys and specific keys. A partial key corresponds to a discrete set of data with common characteristics, whereas a specific key corresponds to an exact data item in the cache. The mapped statement named by primedCacheQuery is used as the partial key. The mapped statement of the select query, combined with the parameter of that query, is used as the specific key. The user provides the keyProperty attribute on the select element to say which property of each result should be used as the parameter component of the specific key.
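One plausible way to turn a keyProperty name into the parameter component of a specific key is to call the matching JavaBean getter by reflection. The sketch below assumes that approach; the source does not show how the project reads the property, and the class names, the "|" separator, and the string form of the key are all hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: derives specific keys from priming-query rows.
class KeyPropertyReader {
    /** Reads a JavaBean property via its public getter, e.g. "productCategoryId" -> getProductCategoryId(). */
    static Object readProperty(Object bean, String property) {
        String getter = "get" + Character.toUpperCase(property.charAt(0)) + property.substring(1);
        try {
            Method method = bean.getClass().getMethod(getter);
            return method.invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No readable property: " + property, e);
        }
    }

    /** Builds one specific key per primed row: select statement id plus the keyProperty value. */
    static List<String> specificKeys(String statementId, List<?> rows, String keyProperty) {
        List<String> keys = new ArrayList<>();
        for (Object row : rows) {
            keys.add(statementId + "|" + readProperty(row, keyProperty));
        }
        return keys;
    }

    /** Minimal bean standing in for a product category result object. */
    public static class Category {
        private final int productCategoryId;

        public Category(int productCategoryId) {
            this.productCategoryId = productCategoryId;
        }

        public int getProductCategoryId() {
            return productCategoryId;
        }
    }

    public static void main(String[] args) {
        List<Category> rows = Arrays.asList(new Category(1), new Category(2));
        System.out.println(specificKeys("getParentCategory", rows, "productCategoryId"));
        // [getParentCategory|1, getParentCategory|2]
    }
}
```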

4.3 Interaction Diagrams
The primed cache uses primedCacheQuery to read the entire set of matching data from the database and store it in the cache, whereas the caller uses specific keys when reading data from the cache.

When the user queries for data, PrimeCachingStatement forms a specific key and tries to get the data from the cache. If the data is not available in the cache, the caching logic checks the cacheModel property to determine whether the requested mapped statement uses a primed cache. If the primedCache flag is set, it calls the prime method to issue a database read operation that selects all the corresponding data. It generates a specific key for each data item that it reads during the priming operation and stores the item under that key [2]. This behavior tends to depend heavily on specific domain object types. The keyProperty attribute is used to retrieve the parameter object from each result object of the prime query.

On the first cache miss, the cache is warmed with all required data using the primed cache query. The next time the user asks for data with a specific key, PrimeCachingStatement forms the specific key from the details of the read operation, finds the data in its cache, and returns it without any physical database interaction.
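The interaction above, where the first miss triggers priming and later reads are served from the cache, can be sketched as follows. This is a model written for illustration, not the PrimeCachingStatement source: the class and field names are hypothetical, and the fallback to a single-row read for keys that priming did not cover is an assumption based on the combined primed and demand design.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical model of priming on the first cache miss.
class PrimeOnMissCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Supplier<List<V>> primeQuery; // stands in for primedCacheQuery
    private final Function<V, K> keyOf;         // stands in for keyProperty
    private final Function<K, V> singleQuery;   // stands in for the parameterized select
    private boolean primed = false;
    private int databaseReads = 0;

    PrimeOnMissCache(Supplier<List<V>> primeQuery, Function<V, K> keyOf, Function<K, V> singleQuery) {
        this.primeQuery = primeQuery;
        this.keyOf = keyOf;
        this.singleQuery = singleQuery;
    }

    V get(K key) {
        V value = store.get(key);
        if (value != null) {
            return value; // cache hit, no database interaction
        }
        if (!primed) { // first miss: warm the whole set with one read
            databaseReads++;
            for (V row : primeQuery.get()) {
                store.put(keyOf.apply(row), row);
            }
            primed = true;
            value = store.get(key);
            if (value != null) {
                return value;
            }
        }
        // Assumed fallback: key not covered by priming, load it on demand.
        databaseReads++;
        value = singleQuery.apply(key);
        if (value != null) {
            store.put(key, value);
        }
        return value;
    }

    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        PrimeOnMissCache<Integer, String> cache = new PrimeOnMissCache<>(
                () -> Arrays.asList("a", "bb"), String::length, k -> "single-" + k);
        System.out.println(cache.get(1));           // a (triggers priming)
        System.out.println(cache.get(2));           // bb (hit)
        System.out.println(cache.get(5));           // single-5 (demand fallback)
        System.out.println(cache.databaseReads());  // 2
    }
}
```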

4.4 Cache Warming
In the enhanced cache model, the primed cache is built when the query is first invoked by the user. This is the default behavior, so that caches are not built for unused queries. However, the enhanced cache model also provides a way to load all primed caches when the application server starts, or at any other time.

iBATIS exposes the SqlMapClient interface to users for interacting with iBATIS functionality. The loadPrimedCache method is newly added to the SqlMapClient interface so that the user can start cache warming at any point in time. The SqlMapClient hands the task of loading the primed cache to SqlMapExecutorDelegate, which maintains the list of mapped statements. When loadPrimedCache is invoked, the delegate iterates over all mapped statements and finds all prime caching statements defined in the sql map configuration. It then calls the queryForList method for every primed cache query to warm the cache with all primed cache data.
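The eager warm-up loop can be modelled as an iteration over registered statements that warms only the primed ones. The sketch below is a self-contained stand-in, not the project's SqlMapExecutorDelegate: the registry and statement classes are hypothetical, and the warm method only sets a flag where the real code would run the priming query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical registry modelling eager cache warming at start-up.
class WarmUpRegistry {
    /** Minimal stand-in for a mapped statement; not the iBATIS class. */
    static class Statement {
        final String id;
        final boolean primed; // true when the statement declares a priming query
        boolean warmed = false;

        Statement(String id, boolean primed) {
            this.id = id;
            this.primed = primed;
        }

        void warm() {
            warmed = true; // a real implementation would run the priming query here
        }
    }

    private final List<Statement> statements = new ArrayList<>();

    void register(Statement statement) {
        statements.add(statement);
    }

    /** Warms every primed statement that is still cold; returns how many were warmed. */
    int loadPrimedCache() {
        int count = 0;
        for (Statement statement : statements) {
            if (statement.primed && !statement.warmed) {
                statement.warm();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        WarmUpRegistry registry = new WarmUpRegistry();
        registry.register(new Statement("getParentCategory", true));
        registry.register(new Statement("getAllSubCategories", false));
        registry.register(new Statement("getParentCategoryForId", true));
        System.out.println(registry.loadPrimedCache()); // 2
        System.out.println(registry.loadPrimedCache()); // 0 (already warm)
    }
}
```

Skipping statements that are already warm keeps a repeated call cheap, which suits a method that may be invoked at any time rather than only at start-up.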

CHAPTER 5
Experiment Results and Discussions

5.1 Performance Impact
The performance test for the enhanced cache model was conducted using the product_category table. This table has 25000 parent categories, for which the parent_product_category_id column value is zero, and 200000 subcategories.

The Apache JMeter [5] tool was used to measure responsiveness and the JetProfiler [6] tool to measure the traffic on the MySQL server, for both the default and the enhanced iBATIS cache models. Responsiveness was tested in a simulation in which a client issued 1000 requests, each fetching a parent category object by a randomly generated parent category id. The sql map configuration below shows the primed cache settings.

<cacheModel type="MEMORY" id="ProductCategoryCache"
	primedCache="true">
</cacheModel>
<resultMap class="ProductCategory" id="ParentProductCategoryResultMap">
	<result property="productCategoryId" column="product_category_id" />
	<result property="productCategoryName" column="product_category_name" />
	<result property="productCategoryDescription" column="product_category_description" />
	<result property="productCategoryImage" column="product_category_image" />
</resultMap>
<select id="getAllParentCategories" resultMap="ParentProductCategoryResultMap">
	SELECT * FROM product_category WHERE parent_product_category_id = 0
</select>
<select id="getParentCategory" parameterClass="int"
	resultMap="ParentProductCategoryResultMap" cacheModel="ProductCategoryCache"
	primedCacheQuery="getAllParentCategories" keyProperty="productCategoryId">
	SELECT * FROM product_category WHERE product_category_id = #productCategoryId#
</select>


The experiment shows that, with the default iBATIS cache model, the response time for client requests improves slowly over time as the cache is built on demand, one cache miss at a time. For 1000 client requests, the default iBATIS cache model delivers a throughput of 49261 requests per minute.

 
Figure 9: Throughput for the enhanced iBATIS cache model
When primedCache is enabled in the cache model, throughput during the initial period is low compared with the default cache model. However, once the primed cache has been built, responsiveness improves. Calling loadPrimedCache during server start moves the cost of loading the primed cache from the first client request to the server boot process. For 1000 client requests, the enhanced iBATIS cache model delivers a throughput of 63157 requests per minute, about 28% more than the 49261 requests per minute of the default cache model.

The load on the MySQL server was measured using JetProfiler while a client issued 50000 requests via JMeter. In the graphs below, the y-axis shows the number of SELECTs performed and the x-axis shows the snap time.

Figure 10: Load on database for the default iBATIS cache model

Figure 11: Load on database for the enhanced iBATIS cache model

These database load graphs depict the efficiency of the enhanced cache. The load on the database for the default iBATIS cache model is far higher than for the enhanced cache model. For 50000 requests, the primed cache model reduced the peak database load by roughly a factor of 1000 (the maximum value on the y-axis is 1 for the enhanced cache, whereas it is 1000 for the default cache).

5.2 Solving N+1 Query Problem
The subcategory object is a JavaBean for the product_category table, and it holds a parent category object. To populate composite objects of this kind, iBATIS provides a way to load data from the database using a nested select statement. The user can configure the result map of the composite object to specify the query that is run to populate the parent category object, as in the configuration below. However, this results in the classic N+1 query problem.

<resultMap class="ProductCategory" id="ProductCategoryResultMap">
	<result property="productCategoryId" column="product_category_id" />
	<result property="productCategoryName" column="product_category_name" />
	<result property="parentProductCategory" column="parent_product_category_id"
		select="getParentCategory" />
</resultMap>
<!-- This statement executes 1 time when we call getAllSubCategories() -->
<select id="getAllSubCategories" resultMap="ProductCategoryResultMap">
	SELECT * FROM product_category WHERE parent_product_category_id != 0 
</select>
<!-- This statement executes N times (once for each sub category returned above) -->
<select id="getParentCategory" parameterClass="int"
	resultMap="ParentProductCategoryResultMap">
	SELECT * FROM product_category WHERE product_category_id = #productCategoryId#
</select>

The enhanced cache model provides a way to specify the query that is run to load the cache, either at start-up or on the first client request. If the user enables the primedCache flag in the cacheModel element, the cache is warmed. Whenever the user then calls the getAllSubCategories method, only one query is triggered to list all subcategories, and the primed cache is used to get the parent category for each subcategory. Table 1 shows the cache hit ratio comparison for getting all subcategories.

Table 1: Cache hit ratio comparison
Cache model	Cache hit ratio (%)
Default iBATIS Cache	12.5
Primed Cache	99.995

CHAPTER 6
Summary

In summary, by combining the Primed and Demand cache patterns, we were able to solve the N+1 query problem and to design a flexible caching model for iBATIS. Starting the application with a cold cache gave somewhat non-deterministic performance contracts, and periodic or explicit cache flushes made the situation even less predictable. We therefore added a cache warming harness to iBATIS. It helps bring the application to a steady state right after start-up and makes the performance contracts more deterministic.

A detailed experimental evaluation, presented in this dissertation and carried out with performance testing tools, validated the efficiency and efficacy of the proposed enhanced cache. We compared the impact of the new cache model on system throughput, database load, and cache hit ratio against the default iBATIS cache model. In all cases, the new cache model outperformed the old one.

CHAPTER 7
Directions for Future Work

This dissertation has outlined an approach to, and implemented a solution for, one of the major problems in iBATIS: the need for both on-demand and primed caching. In the near future, there is the exciting prospect of integrating this approach into the MyBatis framework. It would be useful if primed cache properties could be supplied by means of annotations, as the world moves towards convention over configuration. Extensions to this work include an implementation of the primed cache for list queries and the creation of annotations.

The present enhanced cache served us well for the N+1 query problem and cache warming, but as data volume increases, better techniques will be needed for cleaning out unused cache data. At present the enhanced cache uses the MEMORY cache controller, which relies purely on the garbage collector to handle the cached objects. This is one of the research topics for the near future.

References
  
1.	Clinton Begin, Brandon Goodin, and Larry Meadors. iBATIS in Action. New York: Manning Publications, 2007.

2.	Nock C. Data Access Patterns – Database Interactions in Object-Oriented Applications. Boston: Addison Wesley, 2004.

3.	Kircher M., Jain P. Pattern-Oriented Software Architecture, Patterns for Resource Management. Wiley, 2004.

4.	iBATIS – Developer Guide. URL: http://ibatis.apache.org/docs/java/pdf/iBATIS-SqlMaps-2_en.pdf

5.	JMeter – JUnit Sampler Tutorial. URL: http://jakarta.apache.org/jmeter/usermanual/junitsampler_tutorial.pdf

6.	JetProfiler – User Guide. URL: http://www.jetprofiler.com/doc/

Product by Balamurugan Krishnamurthy
