HAWQ-274. Add check for segments' temporary directories #274

Closed
wants to merge 4 commits into from

linwen commented Jan 17, 2016

  1. Add two columns to gp_segment_config: failed_tmpdir_num and failed_tmpdir.
  2. Load each segment's temporary directory information into shared memory.
  3. The segment's RM process checks its temporary directories and reports the number and paths of the failed ones in the IMAlive message.
  4. The master's RM process updates the segment's status in the catalog table; if the number of failed temporary directories exceeds
    the GUC value, the segment is considered down.
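
The flow in steps 3 and 4 can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the `TmpDirReport` struct, the functions `check_tmp_dirs` and `segment_is_down`, and the `max_failed_tmpdirs` parameter are hypothetical names chosen to mirror the two new catalog columns and the GUC threshold. It also assumes that a directory counts as failed when it is missing, is not a directory, or is not readable, writable, and searchable by the current process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_FAILED_PATHS_LEN 1024

/* Hypothetical report; its fields mirror the two new catalog columns. */
typedef struct TmpDirReport {
    int  failed_tmpdir_num;                    /* number of failed directories */
    char failed_tmpdir[MAX_FAILED_PATHS_LEN];  /* comma-separated failed paths */
} TmpDirReport;

/* Segment side: probe every temporary directory and record the unusable ones. */
void check_tmp_dirs(const char *const *dirs, size_t ndirs, TmpDirReport *report)
{
    assert(report != NULL);
    assert(dirs != NULL || ndirs == 0);

    report->failed_tmpdir_num = 0;
    report->failed_tmpdir[0] = '\0';

    for (size_t i = 0; i < ndirs; i++) {
        struct stat st;
        bool usable = stat(dirs[i], &st) == 0 &&
                      S_ISDIR(st.st_mode) &&
                      access(dirs[i], R_OK | W_OK | X_OK) == 0;
        if (usable)
            continue;

        report->failed_tmpdir_num++;

        /* Append the path; snprintf truncates silently if the buffer is full. */
        size_t used = strlen(report->failed_tmpdir);
        snprintf(report->failed_tmpdir + used,
                 sizeof(report->failed_tmpdir) - used,
                 "%s%s", used > 0 ? "," : "", dirs[i]);
    }
}

/* Master side: the segment is treated as down once the count exceeds the threshold. */
bool segment_is_down(const TmpDirReport *report, int max_failed_tmpdirs)
{
    return report->failed_tmpdir_num > max_failed_tmpdirs;
}
```

In this sketch the segment would fill a `TmpDirReport` on each check and attach it to its heartbeat, and the master would call `segment_is_down` with the configured threshold before updating the catalog.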

Please review, Thanks!

Wen Lin added 4 commits January 17, 2016 18:46
1. add two columns in gp_segment_config, failed_tmpdir_num and failed_tmpdir;
2. segments' temporary directory information is loaded in shared memory;
3. segment's RM process checks and reports failed tmp dir number and path in IMAlive message;
4. master's RM process updates segment's status in catalog table, if failed tmp dir number exceeds
the guc values, this segment is considered as down.

jiny2 commented Jan 19, 2016

The implementation looks good to me. +1

@wengyanqing

+1

@@ -151,6 +152,7 @@ int initializeSocketServer_RMSEG(void)
}
#define SEGMENT_HEARTBEAT_INTERVAL (3LL * 1000000LL)
#define SEGMENT_HOSTCHECK_INTERVAL (5LL * 1000000LL)
#define SEGMENT_TMPDIRCHECK_INTERVAL (10 * 60LL * 1000000LL)

Using three GUCs for these three parameters might be better. It would make future tuning and testing much easier.

10 minutes also looks like a very long time.
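
One way to read the two review comments above is to replace the compile-time macros with runtime settings. The sketch below is only an illustration under that assumption. It does not use HAWQ's actual GUC machinery, and the `SegmentIntervals` struct, its field names, `interval_to_usec`, and `check_is_due` are all hypothetical. The defaults simply restate the values in the diff (3 s, 5 s, and 10 min), expressed in seconds and converted to microseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical runtime settings, in seconds, standing in for three GUCs. */
typedef struct SegmentIntervals {
    int heartbeat_interval_sec;    /* SEGMENT_HEARTBEAT_INTERVAL in the diff: 3 s */
    int hostcheck_interval_sec;    /* SEGMENT_HOSTCHECK_INTERVAL in the diff: 5 s */
    int tmpdircheck_interval_sec;  /* SEGMENT_TMPDIRCHECK_INTERVAL in the diff: 600 s */
} SegmentIntervals;

/* Defaults that reproduce the hard-coded macro values. */
const SegmentIntervals DEFAULT_INTERVALS = {3, 5, 600};

/* Convert seconds to microseconds in 64-bit arithmetic to avoid int overflow. */
int64_t interval_to_usec(int seconds)
{
    return (int64_t) seconds * 1000000LL;
}

/* True when at least interval_sec has passed since the check last ran. */
bool check_is_due(int64_t now_usec, int64_t last_run_usec, int interval_sec)
{
    return now_usec - last_run_usec >= interval_to_usec(interval_sec);
}
```

With settings like these, shortening the temporary directory check for a test run would be a configuration change rather than a rebuild.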


cwelton commented Feb 17, 2016

This PR has now been open for over 30 days.

I see that HAWQ-274 was marked as resolved. Did you simply forget to close the PR?

@linwen linwen closed this Feb 18, 2016
@linwen linwen deleted the HAWQ_274 branch August 5, 2016 02:37
5 participants