ralreein (author)
Sun Aug 23 19:55:40 -0700 2009
= OSCON 2008, Session 4: LucidDB
- LucidEra
- LucidDB is an open-source column store

== Why
- Boatload of data
- Need to analyze it
- You are lazy, cheap, smart
- Not like Bigtable or Hypertable: a vanilla DB accelerated for analytics
- Complex star joins and the like
- LucidDB addresses sizes between tens of GB and terabytes (sweet spot).

== Benchmarks
=== Assumptions
- TPC-H Scale Factor 10
- LucidDB 0.7.4
- 6GB buffer pool
- libaio and O_DIRECT
- MySQL 5.0.22, MyISAM
- Scale factor 10 = 10GB flat file data = 60M lineitems
- Same schema, primary and foreign keys indexed
- Machine used: AMD64 2GHz, RHEL5, kernel 2.6.18-8, JRockit R27.4, 8GB RAM, 1MB L2, SATA 10K RPM, ext3

=== Results
- Dramatic differences (factors of 2 and better are average)
- Loading takes more time, creating indexes is a LOT faster

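The scale-factor arithmetic in the benchmark setup can be sketched as below. This is only a rough illustration: the per-table base row counts are the approximate scale factor 1 cardinalities from the TPC-H specification, not figures from the talk, and the name `tpch_rows` is hypothetical.

```python
# Approximate TPC-H row counts at scale factor 1 (taken from the TPC-H
# specification, not from the talk). The lineitem count is rounded.
BASE_ROWS = {
    "lineitem": 6_000_000,
    "orders": 1_500_000,
    "partsupp": 800_000,
    "part": 200_000,
    "customer": 150_000,
    "supplier": 10_000,
}

# nation and region have a fixed size and do not grow with the scale factor.
FIXED_ROWS = {"nation": 25, "region": 5}


def tpch_rows(scale_factor):
    """Return approximate row counts per table for a given scale factor."""
    rows = {table: n * scale_factor for table, n in BASE_ROWS.items()}
    rows.update(FIXED_ROWS)
    return rows


rows = tpch_rows(10)
print(rows["lineitem"])  # 60000000, matching the 60M lineitems in the notes
```

At scale factor 10 this gives roughly 60M lineitem rows and 15M orders, which is the data set both engines were loaded with.
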
== Architecture
- Read what you need
- Aggressive compression
- Optimal use of IO
- Larger effective data cache
- Uses index semijoin to handle star joins
- Make every disk read count: high selectivity, fragmentation, page reads may be non-contiguous.
- Java? What the... "If you're not doing Java, there's not a very good solution"
- C++ does the heavy lifting.
- Never do single-row inserts or updates into a column data store

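A toy sketch of two of the ideas above, "read what you need" and the index semijoin for star joins. It illustrates the general technique only and is not LucidDB's actual implementation; the tables, column names, and functions are all hypothetical.

```python
from collections import defaultdict

# Toy star schema stored column by column: one Python list per column.
# All table and column names are hypothetical.
fact = {
    "customer_id": [1, 2, 1, 3, 2, 3],
    "amount": [10, 20, 30, 40, 50, 60],
    "comment": ["a", "b", "c", "d", "e", "f"],  # never touched by the query
}
dim_customer = {
    "customer_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
}


def build_index(column):
    """Map each value to the list of row ids where it occurs."""
    index = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return index


# Index on the fact table's foreign key, built once ahead of time.
fk_index = build_index(fact["customer_id"])


def sum_amount_for_region(region):
    # 1. Filter the small dimension table first.
    keys = [
        key
        for key, reg in zip(dim_customer["customer_id"], dim_customer["region"])
        if reg == region
    ]
    # 2. Semijoin: use the foreign-key index to find the matching fact
    #    row ids without scanning the whole fact table.
    row_ids = sorted(rid for key in keys for rid in fk_index.get(key, []))
    # 3. Read only the one column the query needs, and only at those rows.
    amounts = fact["amount"]
    return sum(amounts[rid] for rid in row_ids)


print(sum_amount_for_region("EU"))  # 140
```

The `comment` column is never read, which is the point of a column layout: a query pays only for the columns and rows it actually touches.
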
