Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

TAJO-1407 Minor performance improvement of MemSortExec#426

Closed
navis wants to merge 1 commit intoapache:masterfrom
navis:TAJO-1407
Closed

TAJO-1407 Minor performance improvement of MemSortExec#426
navis wants to merge 1 commit intoapache:masterfrom
navis:TAJO-1407

Conversation

@navis
Copy link
Contributor

@navis navis commented Mar 17, 2015

Simple micro benchmark shows 15~30% performance improvement, but this uses more memory. (Also can make little difference on appearance order of non-key values, caused by different sort implementation)

@navis
Copy link
Contributor Author

navis commented Mar 18, 2015

Removed JDK7 only method

@jihoonson
Copy link
Contributor

Looks interesting.
I'll review tonight.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding some descriptions on this class?

@navis
Copy link
Contributor Author

navis commented Mar 19, 2015

Added comments and now it supports interval type.

@jihoonson
Copy link
Contributor

I'll review this weekend.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TajoDataTypes.Type is the enum type of PB. Each type already has a corresponding integer.
type.getNumber() returns an integer value corresponding to the given type, and Type.valueOf() returns a type corresponding to the given integer value.
How about use these functions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible but I prefer switch-case block to be compact as possible as it can be. And most of types in PB enum cannot be supported by this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. It looks good.

@jihoonson
Copy link
Contributor

@navis, this is truly impressive work.
With my laptop, the vectorized sorter is faster than Collections.sort() about 25%!
I left some trivial comments.
In addition, it will be great if you add some tests to verify sort result.

@navis
Copy link
Contributor Author

navis commented Mar 24, 2015

Revised test case, which can be fail sometimes

@jihoonson
Copy link
Contributor

I run the test 5 times, and it passed every time.
Would you share when the test fails?

@navis
Copy link
Contributor Author

navis commented Mar 24, 2015

2M rows sometimes made GCs, causing inversion of elapsed time.

@jihoonson
Copy link
Contributor

Oh, I think it's ok. The revised test doesn't contain any comparisons of elapsed times.
In addition, MemSortExec is executed only when there remains a sufficient amount of memory.
Here is my +1, and I'll commit it shortly.
Thanks for your great work!

@jihoonson
Copy link
Contributor

Oops, sorry. I have to wait the CI result.
Would you put the latest patch on Jira?
I'd like to commit this patch as soon as possible, and Jenkins will be much faster than Travis CI.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants