DataSeries is a toolset for processing and analyzing large sets of tabular data. It uses a data model similar to SQL. Data is grouped by types (a la SQL tables), each of which consists of a number of rows. Each row consists of a number of fields (a la SQL columns) of various types. DataSeries includes a C++ interface for quickly reading and analyzing DataSeries files. We have used it to store and analyze datasets containing more than 100 billion rows. The DataSeries User Guide describes how to use DataSeries.
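As a rough illustration of the data model described above (types holding rows, rows holding typed fields), here is a minimal C++17 sketch. It is not the DataSeries C++ interface: `FieldValue`, `FieldDef`, `TypedRows`, `addRow`, and `sumInt64` are hypothetical names invented for this example, and the set of field types is an assumption. See the User Guide for the real API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical illustration of the data model only; none of these names
// come from the DataSeries C++ interface.
using FieldValue =
    std::variant<bool, std::int32_t, std::int64_t, double, std::string>;

struct FieldDef {
    std::string name;      // column name
    std::string typeName;  // descriptive label, e.g. "int64"
};

struct TypedRows {
    std::string typeName;                       // plays the role of an SQL table name
    std::vector<FieldDef> fields;               // plays the role of SQL columns
    std::vector<std::vector<FieldValue>> rows;  // one inner vector per row

    // Append a row; reject it if it does not have one value per field.
    bool addRow(std::vector<FieldValue> row) {
        if (row.size() != fields.size()) return false;
        rows.push_back(std::move(row));
        return true;
    }
};

// Sum one int64 column across all rows, skipping values of other types.
std::int64_t sumInt64(const TypedRows& t, std::size_t column) {
    assert(column < t.fields.size());
    std::int64_t total = 0;
    for (const auto& row : t.rows) {
        if (const auto* v = std::get_if<std::int64_t>(&row[column])) total += *v;
    }
    return total;
}
```

In this sketch, a caller would create one `TypedRows` per type, append rows with `addRow`, and run simple analyses such as `sumInt64` over a chosen column. A production format would store rows far more compactly than a vector of variants; the sketch only mirrors the table/row/column vocabulary.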
A brief article describing DataSeries was published in Operating Systems Review (January 2009, Vol 43, Issue 1). The technical report that is part of the DataSeries distribution describes many of the experiments we have performed with DataSeries and evaluates several of its features.
You have several options for getting DataSeries:
- You can install DataSeries from the binary package releases.
- You can build DataSeries from the tar file source releases.
- You can build DataSeries from the current git repository.
The most recent release was on 2011-06-13. The big change since previous releases is that we now generate binary packages for Debian, Ubuntu, CentOS, Fedora, Scientific Linux, and OpenSuSE. The packages have not been extensively tested; if you have any difficulty with them, please send mail to lintel-users (at) lists.sourceforge.net. A collection of smaller extensions to the packages is described in the Lintel-NEWS and DataSeries-NEWS files. We have also introduced sourceforge.net-based mailing lists for the projects.
A new release is currently (2012-04) being prepared. If you want to test it, please send mail to lintel-users. It will include all recent changes to Lintel and DataSeries, packages for additional operating systems, and support for building on FreeBSD, OpenBSD, and MacOS.
The SNIA IOTTA working group also makes data available in DataSeries and several other formats.