Faster joins #38
Comments
Good news! So it sorts the data in every segment separately, and it can be used in addition to "sort -t "|" -k11 lineitem.tbl > lineitems.tbl", which sorts all the data by the "shipdate" field and reduces the time for FILTER? And by calling thrust::is_sorted() before a join we can tell whether the data needs to be sorted or not, is that correct? |
Yes. Anton. Podolsk? On Thu, Jul 4, 2013 at 3:44 PM, AlexeyAB notifications@github.com wrote:
|
Does the "*.sort" file store indexes that are used with thrust::gather/scatter to produce a sorted segment? Yes, and you, Moscow? |
The sort file simply contains one or more field numbers, such as 0, meaning that the segment is already presorted on the first field. Yes. |
So .sort is a metadata file. Why not use a bitmask, for example 5 (binary 101) meaning that the segment is already sorted on the 1st and 3rd fields? And how did you guess that I am from Podolsk? :) |
Yes, it is possible to use a bitmask, although the effect on performance would be nil. It is just a list of sorted fields. I looked at Google Analytics, it shows the visitors' cities. But why Podolsk? With interests like these, San Francisco would be more interesting and better :-) |
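The "check before sorting" idea from this exchange can be sketched on the host side. This is only a minimal sketch under assumptions: `ensure_sorted` is a hypothetical helper name, the key column is a plain `std::vector<int>`, and `std::is_sorted` / `std::sort` stand in for `thrust::is_sorted` and a GPU sort. It is not the project's actual code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sorts the key column only when it is not already
// in ascending order. Returns true if a sort was actually performed.
bool ensure_sorted(std::vector<int>& keys) {
    if (std::is_sorted(keys.begin(), keys.end())) {
        return false;  // presorted segment: the expensive sort is skipped
    }
    std::sort(keys.begin(), keys.end());
    return true;
}
```

The check is a single linear pass, so it costs little compared with a full sort. Reading the presorted field numbers from the .sort metadata file would avoid even that pass.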
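To illustrate why the two encodings are equivalent, here is a small sketch that converts between a list of sorted field numbers and a bitmask. The function names `fields_to_mask` and `mask_to_fields` are hypothetical, and the sketch assumes 0-based field numbers below 32, so fields 0 and 2 (the 1st and 3rd) give the mask 5 from the example above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical: pack 0-based field numbers into a bitmask (assumes f < 32).
std::uint32_t fields_to_mask(const std::vector<unsigned>& fields) {
    std::uint32_t mask = 0;
    for (unsigned f : fields) {
        mask |= (std::uint32_t{1} << f);
    }
    return mask;
}

// Hypothetical: unpack a bitmask back into ascending 0-based field numbers.
std::vector<unsigned> mask_to_fields(std::uint32_t mask) {
    std::vector<unsigned> fields;
    for (unsigned f = 0; f < 32; ++f) {
        if (mask & (std::uint32_t{1} << f)) {
            fields.push_back(f);
        }
    }
    return fields;
}
```

Either form carries the same information for a handful of fields, which fits the remark that the choice has no effect on performance.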
I can't read it :) |
Updated. |
That is an option. But my spoken English is not that good, and right now I am not far from Moscow. For now I am looking to see whether there is a worthwhile use for this here. Have you thought about San Francisco yourself? :) |
English is no big deal, it is learned quickly. As for me, I cannot go, for family reasons :-( |
I added an option to sort database segments locally on a join key. The best performance can be reached by sorting the data file on a filter key and then sorting segments locally on a join key. This way the expensive sorting of the datasets can be avoided in join operations.
So, for example, the line
STORE A INTO 'lineitem' BINARY;
now becomes
STORE A INTO 'lineitem' BINARY SORT SEGMENTS BY orderkey;
The running time of Q3 dropped from 15 to 7 seconds, and that of Q5 from 68 to 17 seconds.
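To show why presorted join keys help, here is a minimal host-side sketch of a merge join over two key columns that are already sorted. The function name `merge_join` and the use of plain `int` keys are assumptions made for illustration. The project's actual GPU join is not shown in this issue and may work differently.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: joins two key columns that are ALREADY sorted in
// ascending order. Returns (left index, right index) pairs for equal keys.
std::vector<std::pair<std::size_t, std::size_t>>
merge_join(const std::vector<int>& left, const std::vector<int>& right) {
    std::vector<std::pair<std::size_t, std::size_t>> out;
    std::size_t i = 0, j = 0;
    while (i < left.size() && j < right.size()) {
        if (left[i] < right[j]) {
            ++i;
        } else if (right[j] < left[i]) {
            ++j;
        } else {
            // Pair this left row with every right row holding the same key.
            for (std::size_t k = j; k < right.size() && right[k] == left[i]; ++k) {
                out.emplace_back(i, k);
            }
            ++i;  // j stays put so duplicate left keys match the same run
        }
    }
    return out;
}
```

Apart from emitting the matching pairs, the inputs are only scanned forward, so no sort is needed at join time when both sides were stored with segments sorted on the join key.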