ekgregg (author)
Fri Apr 10 14:29:46 -0700 2009
OSCON 2008, Session 4: LucidDB
- Lucidera
- LucidDB is an open-source column store
Why
- Boatload of data
- Need to analyze
- You are lazy, cheap, and smart
- Not like Bigtable or Hypertable: a vanilla DB accelerated for analytics
- Complex star joins and stuff
- LucidDB's sweet spot is data sizes between tens of GB and terabytes.
Benchmarks
Assumptions
- TPC-H Scale Factor 10
- LucidDB 0.7.4
- 6GB Buffer Pool
- libaio and O_DIRECT
- MySQL 5.0.22, MyISAM
- Scale factor 10 = 10GB flat file data = 60M lineitems
- same schema, primary and foreign keys indexed
- Machine used: AMD64 2 GHz, RHEL5, 2.6.18-8, JRockit R27.4, 8GB RAM, 1MB L2, SATA 10K RPM, ext3
- Dramatic differences (factors of 2 and better are average)
- Loading takes more time; creating indexes is a LOT faster
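
As a rough sanity check on the sizing above, the scale-factor arithmetic can be sketched in Python. The per-unit figures (about 1 GB of flat files and about 6M lineitem rows per scale-factor unit) are simply derived from the SF 10 numbers in these notes, and the function name is hypothetical.

```python
# Per-unit figures derived from the notes: SF 10 = 10GB flat files = 60M lineitems.
GB_PER_SF = 10 / 10
LINEITEMS_PER_SF = 60_000_000 // 10


def tpch_estimate(scale_factor):
    """Approximate flat-file size and lineitem row count for a scale factor."""
    return {
        "flat_file_gb": scale_factor * GB_PER_SF,
        "lineitem_rows": scale_factor * LINEITEMS_PER_SF,
    }


estimate = tpch_estimate(10)
# Fraction of the raw data that the 6GB buffer pool could hold, ignoring compression.
buffer_pool_fraction = 6 / estimate["flat_file_gb"]
```

With these assumptions the 6GB buffer pool covers a bit over half of the raw flat-file data, before any compression is taken into account.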
Architecture
- Read what you need
- Aggressive compression
- Optimal use of IO
- Larger effective data cache
- Uses index semijoin to handle star joins
- Make every disk read count: high selectivity, fragmentation, page reads may be non-contiguous.
- Java? What the… "If you’re not doing Java, there’s not a very good solution"
- C++ heavy lifting.
- Never do single-row inserts or updates into a column data store
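
The "read what you need" and index-semijoin points can be illustrated with a toy sketch in Python. This is not LucidDB's implementation: the tables, column names, and helper functions below are all hypothetical, and a plain dictionary stands in for a real index. The idea is to filter the small dimension table first, use an index on the fact table's foreign key to find the matching rows, and then read only the one column the query needs.

```python
from collections import defaultdict

# Hypothetical toy data: a fact table stored column by column, plus one dimension.
fact = {
    "store_id": [1, 2, 1, 3, 2, 1],
    "amount": [10, 20, 30, 40, 50, 60],
    "note": ["a", "b", "c", "d", "e", "f"],  # never read by the query below
}
store_dim = {1: "CA", 2: "NY", 3: "CA"}  # store_id -> state


def build_index(column):
    """Map each distinct value to the list of row ids that hold it."""
    index = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return index


def semijoin_sum(fact, fk_index, dim, predicate, measure):
    # 1. Filter the small dimension table first.
    keys = [key for key, attr in dim.items() if predicate(attr)]
    # 2. Use the foreign-key index to find the matching fact rows.
    row_ids = sorted(r for key in keys for r in fk_index.get(key, []))
    # 3. Read only the measure column, and only at those row ids.
    column = fact[measure]
    return sum(column[r] for r in row_ids)


fk_index = build_index(fact["store_id"])
total_ca = semijoin_sum(fact, fk_index, store_dim, lambda s: s == "CA", "amount")
```

Here the "note" column is never touched, and only the fact rows belonging to the selected stores are visited, which is the effect the bullets above describe.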











