= OSCON 2008, Session 4: LucidDB
- Lucidera
- LucidDB is an open-source column store

== Why
- Boatload of data
- Need to analyze it
- You are lazy, cheap, smart
- Not like Bigtable or Hypertable: a vanilla DB accelerated for analytics
- Complex star joins and stuff
- LucidDB addresses data sizes between tens of GB and terabytes (sweet spot)

== Benchmarks
=== Assumptions
- TPC-H Scale Factor 10
- LucidDB 0.7.4
- 6 GB buffer pool
- libaio and O_DIRECT
- MySQL 5.0.22, MyISAM
- Scale factor 10 = 10 GB flat file data = 60M lineitems
- Same schema, primary and foreign keys indexed
- Machine used: AMD64 2 GHz, RHEL5, kernel 2.6.18-8, JRockit R27.4, 8 GB RAM, 1 MB L2, SATA 10K RPM, ext3

=== Results
- Dramatic differences (a factor of 2 or better on average)
- Loading takes more time; creating indexes is a LOT faster

== Architecture
- Read what you need
- Aggressive compression
- Optimal use of IO
- Larger effective data cache
- Uses index semijoin to handle star joins
- Make every disk read count: high selectivity, fragmentation, page reads may be non-contiguous
- Java? What the... "If you're not doing Java, there's not a very good solution"
- C++ does the heavy lifting
- Never do single-row inserts or updates into a column data store
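The talk did not go into how the index semijoin works, so here is a minimal sketch of the general idea, not LucidDB's actual implementation. It filters each dimension first, uses an index on the fact table's foreign key column to find matching row ids, intersects them, and only then reads the measure column. The toy schema (+fact+, +dim_date+, +dim_product+) and the helpers +build_index+, +semijoin+ and +star_query+ are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy star schema: the fact table is stored one column at a time.
fact = {
    "date_id":    [1, 1, 2, 3, 3, 3],
    "product_id": [10, 11, 10, 12, 10, 11],
    "amount":     [5.0, 7.5, 3.0, 9.0, 4.0, 6.0],
}
dim_date = {1: {"quarter": "Q1"}, 2: {"quarter": "Q1"}, 3: {"quarter": "Q2"}}
dim_product = {
    10: {"category": "books"},
    11: {"category": "toys"},
    12: {"category": "books"},
}


def build_index(column):
    """Map each distinct key to the set of fact row ids holding that key."""
    index = defaultdict(set)
    for row_id, key in enumerate(column):
        index[key].add(row_id)
    return index


def semijoin(index, dimension, predicate):
    """Fact row ids whose key refers to a dimension row passing the predicate."""
    row_ids = set()
    for key, attrs in dimension.items():
        if predicate(attrs):
            row_ids |= index.get(key, set())
    return row_ids


def star_query():
    """Sum 'amount' for books sold in Q2, touching only the qualifying rows."""
    date_index = build_index(fact["date_id"])
    product_index = build_index(fact["product_id"])

    by_date = semijoin(date_index, dim_date, lambda d: d["quarter"] == "Q2")
    by_product = semijoin(
        product_index, dim_product, lambda p: p["category"] == "books"
    )

    # Intersect the row-id sets before reading any measure values.
    qualifying = sorted(by_date & by_product)
    total = sum(fact["amount"][row_id] for row_id in qualifying)
    return qualifying, total


if __name__ == "__main__":
    print(star_query())
```

The point of the sketch is that the +amount+ column is read only for rows that survive every dimension filter, which fits the "read what you need" and "make every disk read count" bullets. A real engine would presumably use compressed bitmaps rather than Python sets.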