I tested Alenka on TPC-H on my GPU GTX 460SE, 1GB, 288 Cores, SM2.0. #33

AlexeyAB · 2013-06-03T11:20:41Z

Hi Anton!
I tested Alenka on TPC-H on my GPU GTX 460SE, 1GB, 288 Cores, SM2.0.

Can you show the results on your GPU and on the others GPU, on which you tested?

I have such results:

TPC-H	GPU SM2.0 GTX 460SE 1GB 288 Cores
Volume of DB:	10GB		100GB
Launch number:	1st launch	2nd launch	1st launch	2nd launch
q1.sql	5,3	3,8	90	90
q2.sql	1	0,36	9,3	2,3
q3.sql	1,8	1,7	71	67
q4.sql	4,25	1,3	45	13
q5.sql	-	-	-	-
q6.sql	1,77	0,9	39	8
q7.sql	8,7	2	-	-

Regards, Alexey

AlexeyAB · 2013-06-03T11:24:02Z

An additional, i can't run q5 and get an error.

For 100 GB TPC-H:

F:\GPU_test_100GB_new>AlenkaDB.exe q5.sql
GeForce GTX 460 SE : 1296.000 Mhz   (Ordinal 0)
6 SMs enabled. Compute Capability sm_21
FreeMem:    867MB   TotalMem:   1023MB.
Mem Clock: 1700.000 Mhz x 256 bits   (108.800 GB/s)
ECC Disabled

Process count = 6200000
BINARY LOAD: R region
Reading 5 records
...
JOIN J1 l_orderkey o_orderkey 829939712
segment 96 560193536
join count 18210128
join time 56.555 801366016
JOIN J2 l_suppkey s_suppkey 829939712
Cuda error in file 'bison.y' in line 1461 : out of memory.

And for TPC-H 10GB i get an exception:

F:\GPU_test_10GB_new>AlenkaDB.exe q5.sql
GeForce GTX 460 SE : 1296.000 Mhz (Ordinal 0)
6 SMs enabled. Compute Capability sm_21
FreeMem: 858MB TotalMem: 1023MB.
Mem Clock: 1700.000 Mhz x 256 bits (108.800 GB/s)
ECC Disabled

Process count = 6200000
BINARY LOAD: R region
Reading 5 records
FILTER RF R 900055040
MAP CHECK segment 0 R
filter is finished 1 899006464
filter time 0.004 900055040
BINARY LOAD: N nation
Reading 25 records
BINARY LOAD: O orders
Reading 15000000 records
FILTER OFI O 900055040
MAP CHECK segment 2 R
filter is finished 2275919 861650944
filter time 0.065 886554624
BINARY LOAD: S supplier
Reading 100000 records
BINARY LOAD: C customer
Reading 1500000 records
BINARY LOAD: L lineitem
Reading 59986052 records
JOIN J11 n_regionkey r_regionkey 886554624
segment 0 886554624
join count 5
join time 0.007 886554624
JOIN J12 c_nationkey n_nationkey 886554624
segment 0 874496000
join count 300270
join time 0.035 876593152
JOIN J13 o_custkey c_custkey 886554624
segment 2 801226752
join count 456771
join time 0.154 822460416
JOIN J1 l_orderkey o_orderkey 828882944
segment 9 723501056
join count 1825856
join time 1.253 793100288
JOIN J2 l_suppkey s_suppkey 820625408
segment 0 803848192

<- got an exception

antonmks · 2013-06-03T12:10:49Z

Hi Alexey
I do not have a 100GB database right now, only 300GB so I can tell you my 300x results.
I do not count the disk read time because it depends only on your disk subsystem so these are run from a host memory ("second run") results :
So here it is for 300GB (in seconds) :
Q1 - 33
Q2 - 4
Q3 -16
Q5
Q6 - 2
Q7 -30

Q5 doesn't quite work at this scale because of a large intermediate data set. It is easy to fix, I just need to add partitioning routine to a join, just like in groupby.
My setup is :
Pentium G620
GTX 580 3GB of DDR5
16GB of DDR3

Before creating datafiles I presorted the lineitem.tbl on date_ordered column, this way Alenka can skip entire segments in filter ops.
I should be able to run the tests next week on GTX Titan, I expect the results to be significantly faster.

Regards, Anton

AlexeyAB · 2013-06-03T12:44:48Z

Now Alenka uses segment's (storage) indexes in DSM, like in Oracle ExatadaV2 in PAX?

You'll get access to Titan at some university, in a cloud or will buy it?

And why you exclude q4.sql?

Regards, Alexey

antonmks · 2013-06-03T12:55:24Z

Alenka doesn't use any indexes.
Alenka stores min and max values of every column for every segment so when running operations it is often possible to figure out that none of the values in a segment match, even without reading the segment itself.
GTX Titan - I got it as a present from Israeli guys who use Alenka :-)
Q4 - I exclude a lot of queries, supporting all TPC-H would require implementing many additional features ...

AlexeyAB · 2013-06-03T13:25:34Z

This method in Oracle is called the Storage Index, which uses in Smart Scans :)
http://www.oracle.com/technetwork/issue-archive/2011/11-may/o31exadata-354069.html

Storage Index

Figure 1 shows how the data inside a storage index is maintained.
...
Each storage index entry covers a physical region of the table, contains minimum and maximum values 
of the columns in that region, and also indicates whether any of the rows in that region contain nulls.

And If interested in HCC Mechanics, three paragraphs about it (half a page), the structure of the hybrid columnar compression (what sounds a whole lot like PAX) in Oracle ExadataV2.
They use a sorting of data in the first place, for the effective compression ratio.
http://books.google.ru/books?id=1Aqw6mrxz5AC&lpg=PA104&ots=T9pNopWGoA&hl=ru&pg=PA72&redir_esc=y#v=onepage&q&f=false

Discussion and what said Kevin Closson about HCC:
http://www.dbms2.com/2009/09/03/oracle-11g-exadata-hybrid-columnar-compression/

antonmks · 2013-06-03T13:47:00Z

Yeah, hybrid compression becomes popular. If you are into this, take a look
at

http://www.dbms2.com/2013/05/27/ibm-blu

Pretty interesting system and it executes queries on compressed data too !

AlexeyAB · 2013-06-03T21:35:49Z

Thanks!

About presorted the lineitem.tbl on date_ordered column. Can you write that SQL?

This has no effect:

A := LOAD 'lineitem.tbl' USING ('|') AS (orderkey{1}:int,  l_suppkey{3}:int, qty{5}:int, 
 price{6}:decimal, discount{7}:decimal, tax{8}:decimal, rf{9}:varchar(1),
 lf{10}:varchar(1), shipdate{11}:int, commitdate{12}:int, receiptdate{13}:int);

OA := ORDER A BY shipdate ASC;

STORE OA INTO 'lineitem' BINARY;

Outpup:
Process count = 6200000
LOAD: A lineitem.tbl 11 |
order: OA A
STORE: OA lineitem
LOADING lineitem.tbl |
SQL scan parse worked
cycle time 0.81

And no files created.

antonmks · 2013-06-04T02:15:08Z

Well, you need to presort the entire file :
sort -t "|" -k11 lineitem.tbl > lineitems.tbl

Regards, Anton

AlexeyAB · 2013-06-04T09:20:24Z

Thanks!
This is bash, but I on Windows, here sort can't do this :)
Will have to run a virtual machine.

antonmks · 2013-06-04T09:29:27Z

Just install Cygwin, it is really convenient.

AlexeyAB mentioned this issue Jun 10, 2013

Whether to do short description of the architecture and implementation? #31

Closed

antonmks closed this as completed Jun 22, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

I tested Alenka on TPC-H on my GPU GTX 460SE, 1GB, 288 Cores, SM2.0. #33

I tested Alenka on TPC-H on my GPU GTX 460SE, 1GB, 288 Cores, SM2.0. #33

AlexeyAB commented Jun 3, 2013

AlexeyAB commented Jun 3, 2013

antonmks commented Jun 3, 2013

AlexeyAB commented Jun 3, 2013

antonmks commented Jun 3, 2013

AlexeyAB commented Jun 3, 2013

antonmks commented Jun 3, 2013

AlexeyAB commented Jun 3, 2013

antonmks commented Jun 4, 2013

AlexeyAB commented Jun 4, 2013

antonmks commented Jun 4, 2013

I tested Alenka on TPC-H on my GPU GTX 460SE, 1GB, 288 Cores, SM2.0. #33

I tested Alenka on TPC-H on my GPU GTX 460SE, 1GB, 288 Cores, SM2.0. #33

Comments

AlexeyAB commented Jun 3, 2013

AlexeyAB commented Jun 3, 2013

antonmks commented Jun 3, 2013

AlexeyAB commented Jun 3, 2013

antonmks commented Jun 3, 2013

AlexeyAB commented Jun 3, 2013

antonmks commented Jun 3, 2013

AlexeyAB commented Jun 3, 2013

antonmks commented Jun 4, 2013

AlexeyAB commented Jun 4, 2013

antonmks commented Jun 4, 2013