Replace pg_usleep with WaitLatch, check for postmaster death #15

Merged on Jan 27, 2017 (1 commit)

Conversation

marcocitus
Member

pg_usleep might not return promptly if the postmaster crashes, so this replaces it with WaitLatch, which can also wake up on postmaster death. The poll call still needs to be addressed separately.

keithf4 commented Dec 22, 2016

I don't mean to derail this pull request, but I've been trying to do this exact thing with pg_partman on 9.6 and getting nowhere. This latch method works perfectly well with background workers on 9.4 and 9.5, but something changed in 9.6 and I can't get it to work. I'm responding here both to follow along and see how it's done if this is incorporated into pg_cron successfully, and to see whether you hit the same issues and might be able to help debug the cause.

Michael Paquier narrowed down the commit that causes the issue; the thread is linked below. The thread title is misleading, as it only reflects my first guess at what the problem was. My latest reply there shows the debugging attempts that confirmed it's the SIGUSR1 signal causing the issues.

https://www.postgresql.org/message-id/CAB7nPqTrRjdcV_e4gJdNRqgHSdoX15vVwMqGpukFU6AfdS8mqg%40mail.gmail.com

keithf4 commented Dec 23, 2016

So, I applied this patch to pg_cron to see if it would have the same problem as pg_partman. I first had to fix waitFlags so that the timeout value is actually used (otherwise it just sits and waits forever whenever there are no tasks to run):

int waitFlags = WL_LATCH_SET | WL_POSTMASTER_DEATH | WL_TIMEOUT;

With that change, I found that it does not have the same problems. Good news for you!
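
To illustrate why the timeout bit matters, here is a minimal standalone POSIX sketch. It is not PostgreSQL code and not what the patch does: `demo_wait` and the `DEMO_*` flags are hypothetical names, and a pipe reaching end-of-file stands in for postmaster death as an assumption made for the analogy. The point is only that a timeout argument has no effect unless the matching flag is requested.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical flags, loosely modeled on the WL_* bits (not PostgreSQL's). */
enum {
    DEMO_WAKE_SET     = 1 << 0, /* wake-up pipe became readable */
    DEMO_PARENT_DEATH = 1 << 1, /* parent-alive pipe reached end-of-file */
    DEMO_TIMEOUT      = 1 << 2  /* timeout elapsed */
};

/*
 * Block until one of the requested events occurs and return the bits that
 * fired. If DEMO_TIMEOUT is not requested, timeout_ms is ignored and the
 * call waits indefinitely for one of the other events.
 */
static int
demo_wait(int wake_fd, int alive_fd, int wake_events, int timeout_ms)
{
    struct pollfd fds[2];
    int           result = 0;
    int           rc;

    if (wake_events & DEMO_TIMEOUT)
        assert(timeout_ms >= 0);

    /* poll() skips entries whose fd is negative. */
    fds[0].fd = (wake_events & DEMO_WAKE_SET) ? wake_fd : -1;
    fds[0].events = POLLIN;
    fds[0].revents = 0;
    fds[1].fd = (wake_events & DEMO_PARENT_DEATH) ? alive_fd : -1;
    fds[1].events = POLLIN;
    fds[1].revents = 0;

    rc = poll(fds, 2, (wake_events & DEMO_TIMEOUT) ? timeout_ms : -1);
    if (rc == 0)
        return DEMO_TIMEOUT;    /* nothing happened before the deadline */
    if (rc < 0)
        return 0;               /* interrupted or failed; caller may retry */

    if (fds[0].revents & POLLIN)
        result |= DEMO_WAKE_SET;
    /* A closed write end shows up as POLLHUP and/or POLLIN at end-of-file. */
    if (fds[1].revents & (POLLIN | POLLHUP))
        result |= DEMO_PARENT_DEATH;
    return result;
}
```

A worker loop built on this would call `demo_wait` with all three flags, exit when `DEMO_PARENT_DEATH` is set, and otherwise run whatever tasks are due.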

So, that made me go back to digging around the background worker code some more, and I think I may have found the source of my problem. It has to do with one background worker starting another, dynamic one that has to report its status back to the caller.

https://www.postgresql.org/message-id/CAG1_KcDdckBorU3B7H-eGmpcGM9HQMnYfSNOKMU3HWsx-rPcrg%40mail.gmail.com

So, thanks for putting this pull request in because it forced me to dig deeper and find the real cause of my problem :)

I also looked at how to add this WaitLatch() code to the polling section, but I'm not quite sure how to do that yet either.

@marcocitus
Member Author

Thanks for sharing your investigation; that's good to know. I've also seen some other problems when starting multiple background workers, in particular invalidations not arriving.

@marcocitus marcocitus merged commit eee57b3 into master Jan 27, 2017
@marcocitus marcocitus deleted the bugfix/wait_latch branch January 27, 2017 16:08