Skip to content
This repository was archived by the owner on May 12, 2021. It is now read-only.

TAJO-1950: Query master uses too much memory during range shuffle#884

Closed
jihoonson wants to merge 43 commits intoapache:masterfrom
jihoonson:TAJO-1950
Closed

TAJO-1950: Query master uses too much memory during range shuffle#884
jihoonson wants to merge 43 commits intoapache:masterfrom
jihoonson:TAJO-1950

Conversation

@jihoonson
Copy link
Contributor

No description provided.

@jihoonson jihoonson changed the title TAJO-1950: UniformRangePartition::partition() takes too long time when sorting large data TAJO-1950: Query master uses too much memory during range shuffle Nov 27, 2015
@jihoonson
Copy link
Contributor Author

This patch is not ready for review. I need more tests including CI test.

@jihoonson
Copy link
Contributor Author

Hi, this patch is ready for review.
Here are highlights of this patch.

  • To reduce the number of fetches created in Repartitioner, I've merged fetches with their pull server host name. As a result, the total number of fetches is H * R, where H is the number of hosts and R is the number of ranges.
  • To reduce the size of fetches created in Repartitioner, I've changed to keep FetchProto instead of FetchImpl.
  • To reduce the number of index opens and closes, I've added an index reader cache to PullServerService.

@jihoonson
Copy link
Contributor Author

I've additionally added some codes to clean up index cache upon eb completion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be future.channel().close()

@jihoonson
Copy link
Contributor Author

@jinossy thanks for your review.
Actually, I found more problems related to the http message length. I'll fix it soon.

@jinossy
Copy link
Member

jinossy commented Dec 10, 2015

I see, please go ahead.

@jihoonson
Copy link
Contributor Author

@jinossy thanks! I've updated my patch.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jihoonson
zero length chunk should be removed

@jinossy
Copy link
Member

jinossy commented Dec 15, 2015

Looks good to me!
@hyunsik
Have you any comments?

@hyunsik
Copy link
Member

hyunsik commented Dec 16, 2015

I have a concern about too long of MAXIMUM HTTP REQUEST. It may cause too large granularity for repeat fetch.

@jinossy
Copy link
Member

jinossy commented Dec 17, 2015

@hyunsik
You're right. it can cause read timeout.

@jihoonson
Could you change to configurable value ?

@jihoonson
Copy link
Contributor Author

@hyunsik and @jinossy, thanks for your comment. I make the max url length configurable. The default value is 2 KB.

@jinossy
Copy link
Member

jinossy commented Dec 18, 2015

+1 LGTM!

@jihoonson
Copy link
Contributor Author

@jinossy thanks for your review. I'm testing this patch against 10TB dataset. I'll share the result when it finishes and then commit this patch.

@jihoonson
Copy link
Contributor Author

Well, there remain some issues around fetch timeout. When I tested this patch against 10TB dataset, a lot of fetch timeouts occurred while transferring intermediate data between stages. The main reason looks that index lookup takes a lot of time (over 30 seconds with cache miss). So, I think the fundamental solution is to improve index search performance which need to be handled in another jira.
So, I'd like to suggest to increase fetch timeout temporarily. With doubled timeout (that is 120 seconds), everything was ok against 10TB data.
@hyunsik, @jinossy what do you think?

@jihoonson
Copy link
Contributor Author

I've decreased the default max url length to 1KB. Maybe we can find more optimized value of max url length and fetch timeout.

@jinossy
Copy link
Member

jinossy commented Dec 24, 2015

@jihoonson
Thanks for your sharing. I guess, decreasing uri parameter will be better
Ship it!

@jihoonson
Copy link
Contributor Author

@jinossy thanks for your comment. I'll commit shortly.
Merry christmas!

@asfgit asfgit closed this in 1f9ae1d Dec 24, 2015
combineads pushed a commit to combineads/tajo that referenced this pull request Dec 29, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants