DataSpread: A Spreadsheet-Database Hybrid System
DataSpread is a spreadsheet-database hybrid system, with a spreadsheet frontend, and a database backend. Thus, DataSpread inherits the flexibility and ease-of-use of spreadsheets, as well as the scalability and power of databases. A paper describing DataSpread's architecture, design decisions, and optimization can be found here. DataSpread is a multi-year project, supported by the National Science Foundation via award number 1633755.
The current version is 0.1.
DataSpread's version 0.1 enables users to scale to billions of cells and return results for common spreadsheet operations within seconds. It does so via on-demand loading of spreadsheet data.
Like traditional spreadsheet software, DataSpread supports standard spreadsheet book and sheet operations like Load, Rename, Delete, and Import (via XLS and XLSX, and CSV). Any updates to the spreadsheets are automatically saved.
Like traditional spreadsheet software, DataSpread supports the use of 225+ spreadsheet functions, along with formatting and styling operations. It also supports row and column operations like insert, delete, cut, copy, and paste; during insertion and deletion, formulae are updated as is the case in traditional spreadsheet software.
It supports all these operations while scaling to arbitrarily large spreadsheets.
In future releases, DataSpread will support SQL on the spreadsheet frontend, along with other relational algebra-based interactions. It will also support joint formula evaluation and optimization.
Key Design Innovations
- DataSpread employs a flexible hybrid data model to represent spreadsheet data within a database.
- DataSpread uses positional indexing techniques to both locate data by position, and keep it up-to-date as the data is updated.
- DataSpread also employs a LRU caching mechanism to retrieve and keep in memory data from the database on demand.
- DataSpread also employs speculative fetching to fetch additional data beyond the user's current spreadsheet window.
You can directly use DataSpread via our cloud-hosted site.
To host DataSpread locally you can either use one of the pre-build war files, available here, or build the war file yourself from the source.
- Apache Ant >= 1.6
- Java Platform (JDK) >= 8
- PostgreSQL >= 9.5
- PostgreSQL JDBC driver - 9.4.1208
- Apache Tomcat >= 8.5.4
- Apache Maven >= 3.5.0
Building Instructions (To generate a war file)
Clone the DataSpread repository. Alternatively, you can download the source as a zip or tar.gz.
Use maven to build the
warfile using the following command. After the build completes the war is available at
mvn clean install
Deploying DataSpread locally.
Create a database and an user who has access to the database. Note the database name, username and password. Typically when you have PostgreSQL installed locally the password is blank.
Install Apache Tomcat. You can use the guide here. Make a note of the directory where tomcat is installed. This is known as
TOMCAT_HOMEin all documentation.
Update the Tomcat configuration. You need to update the following two files, which are present in
web.xmlby adding the following text at the end of the file before the closing XML tag.
<listener> <listener-class>org.model.DBHandler</listener-class> </listener>
context.xmlby adding the following text at the end of the file before the closing XML tag.
<Resource name="jdbc/ibd" auth="Container" type="javax.sql.DataSource" driverClassName="org.postgresql.Driver" url="jdbc:postgresql://127.0.0.1:5432/<database_name>" username="<username>" password="<password>" maxTotal="20" maxIdle="10" maxWaitMillis="-1" defaultAutoCommit="false" accessToUnderlyingConnectionAllowed="true"/>
<password>with your PostgreSQL's database name, user name and password respectively.
postgresql-9.4.1208(Download from here) to
TOMCAT_HOME. It is crucial to have the exact version of this file.
Deploy the war file within Tomcat. This can be done via Tomcat's web interface or by manually copying the war file in the
Now you are ready to run the program. Visit the url where Tomcat is installed. It will be typically http://localhost:8080/DataSpread_war/ for a local install.