Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding support for using background workers instead of libpq + adding support for an audit table #111

Merged
merged 5 commits into from
Sep 15, 2020

Conversation

bdrouvotAWS
Copy link
Collaborator

This PR is mainly made of 4 commits:

  1. background worker support
  2. audit table support
  3. job and task wording
  4. renamed bgworker scheduler to launcher

Let's explain each commit

background worker support

This commit is adding support for using background workers instead of libpq sessions to execute the jobs.
This behavior is guarded behind a GUC and disabled by default so that it does not impact backward compatibility.
The background workers use a similar technique to the pg_background extension to run commands, and things like cron.host, nodename, and nodeport have no effect in this mode.

audit table support

This commit is adding support for an audit table.
It works for either libpq or background-worker mode.

content looks like:

postgres=# select * from cron.job_run_details;
 jobid | runid | job_pid | database | username |                  command                   |  status   |         return_message         |          start_time           |           end_time
-------+-------+---------+----------+----------+--------------------------------------------+-----------+--------------------------------+-------------------------------+-------------------------------
     6 |     2 |   15073 | postgres | bdt      | insert into bdtcron values(1)              | succeeded | INSERT 0 1                     | 2020-08-18 12:43:01.004976+00 | 2020-08-18 12:43:01.009695+00
     2 |     3 |   15074 | postgres | bdt      | update bdtcron set a = a -1                | succeeded | UPDATE 0                       | 2020-08-18 12:43:01.005747+00 | 2020-08-18 12:43:01.010155+00
     4 |     4 |   15075 | postgres | bdt      | create table tutu as select * from bdtcron | succeeded | SELECT 1                       | 2020-08-18 12:43:01.006427+00 | 2020-08-18 12:43:01.011015+00
     5 |     5 |   15076 | postgres | bdt      | select * from bdtcron                      | succeeded | SELECT 1                       | 2020-08-18 12:43:01.00724+00  | 2020-08-18 12:43:01.011435+00
     1 |     6 |   15077 | postgres | bdt      | delete from bdtcron                        | succeeded | DELETE 1                       | 2020-08-18 12:43:01.008254+00 | 2020-08-18 12:43:01.011857+00
     3 |     7 |   15078 | postgres | bdt      | vacuum bdtcron                             | succeeded | VACUUM                         | 2020-08-18 12:43:01.00887+00  | 2020-08-18 12:43:01.022352+00
     6 |     8 |   15080 | postgres | bdt      | insert into bdtcron values(1)              | succeeded | INSERT 0 1                     | 2020-08-18 12:44:01.004167+00 | 2020-08-18 12:44:01.009513+00
     2 |     9 |   15081 | postgres | bdt      | update bdtcron set a = a -1                | succeeded | UPDATE 0                       | 2020-08-18 12:44:01.004883+00 | 2020-08-18 12:44:01.010053+00
     4 |    10 |   15082 | postgres | bdt      | create table tutu as select * from bdtcron | failed    | relation "tutu" already exists | 2020-08-18 12:44:01.005524+00 | 2020-08-18 12:44:01.010601+00
     5 |    11 |   15083 | postgres | bdt      | select * from bdtcron                      | succeeded | SELECT 1                       | 2020-08-18 12:44:01.006221+00 | 2020-08-18 12:44:01.011112+00
     1 |    12 |   15084 | postgres | bdt      | delete from bdtcron                        | succeeded | DELETE 1                       | 2020-08-18 12:44:01.00723+00  | 2020-08-18 12:44:01.011554+00

job and task wording

This commit is adding some wording about job and task.

renamed bgworker scheduler to launcher

This commit is renaming the current background worker from "scheduler" to "launcher" (to go with autovac nomenclature) to avoid confusion with the new workers.

@ghost
Copy link

ghost commented Aug 26, 2020

CLA assistant check
All CLA requirements met.

src/pg_cron.c Outdated Show resolved Hide resolved
src/pg_cron.c Outdated Show resolved Hide resolved
src/pg_cron.c Outdated Show resolved Hide resolved
src/job_metadata.c Outdated Show resolved Hide resolved
src/job_metadata.c Outdated Show resolved Hide resolved
src/job_metadata.c Outdated Show resolved Hide resolved
src/pg_cron.c Outdated Show resolved Hide resolved
@@ -3,6 +3,9 @@
* src/pg_cron.c
*
* Implementation of the pg_cron task scheduler.
* Wording:
* - A job is a scheduling definition of a task
* - A task is what is actually executed within the database engine
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit that this wasn't the greatest use of words.

src/pg_cron.c Outdated Show resolved Hide resolved
src/pg_cron.c Outdated Show resolved Hide resolved
Copy link
Member

@marcocitus marcocitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very polished already.

The only issue I found so far was that a WARNING results in "failed" status. We could also do a little more to restrict concurrency within the bounds of max_worker_processes.

Copy link
Member

@marcocitus marcocitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a bit mean. It hits an assert failure.

SELECT cron.schedule('* * * * *', 'drop extension pg_cron');

2020-09-01 16:15:00.007 CEST [3775] LOG:  cron job 2 starting: drop extension pg_cron
TRAP: FailedAssertion("!(update_result)", File: "src/pg_cron.c", Line: 1647)


appendStringInfo(&querybuf,
"update %s.%s set status = '%s', return_message = 'server restarted' where status in ('%s','%s')"
, CRON_SCHEMA_NAME, JOB_RUN_DETAILS_TABLE_NAME, GetCronStatus(CRON_STATUS_FAILED), GetCronStatus(CRON_STATUS_STARTING), GetCronStatus(CRON_STATUS_RUNNING));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's usually nicer to use quote_qualified_identifier to generate qualified table names.

@@ -275,6 +281,43 @@ NextJobId(void)
return jobId;
}

int64
NextRunId(void)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some style things like having comments on all functions, but I think it's easiest to merge and adjust them later.

if (edata.elevel >= ERROR)
UpdateJobRunDetail(task->runId, NULL, GetCronStatus(CRON_STATUS_FAILED), display_msg.data, NULL, &end_time);
else
UpdateJobRunDetail(task->runId, NULL, GetCronStatus(CRON_STATUS_SUCCEEDED), display_msg.data, NULL, &end_time);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of neat to see the notices appear in the table. They do get overwritten once the task is done. Is that intentional? (I don't mind this behaviour)

ereport(WARNING,
(errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
errmsg("out of background worker slots"),
errhint("You might need to increase max_worker_processes.")));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason to not put this relevant info in the errorMessage?

Copy link
Member

@marcocitus marcocitus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all seems fine and I'm happy to merge as is and address remaining comments myself.

I do want to confirm: Is the current warning/notice behaviour as intended?

@bdrouvotAWS
Copy link
Collaborator Author

Hi Marco,

Thanks for the review!
The intention was to lay the foundation to display intermediate feedback (specially for long running tasks).
But as it is right now, I agree that we can update job_run_details only when elevel >= ERROR.

Bertrand

@marcocitus
Copy link
Member

marcocitus commented Sep 15, 2020

Found some compilation issues on PG13:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -ggdb -Og -g3 -fno-omit-frame-pointer -fPIC -std=c99 -Wall -Wextra -Werror -Wno-unused-parameter -Wno-implicit-fallthrough -Iinclude -I/home/marco/pgsql-13/include -I. -I./ -I/home/marco/pgsql-13/include/server -I/home/marco/pgsql-13/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/pg_cron.o src/pg_cron.c -MMD -MP -MF .deps/pg_cron.Po
src/pg_cron.c: In function ‘ExecuteSqlString’:
src/pg_cron.c:2000:4: error: too many arguments to function ‘set_ps_display’
 2000 |    set_ps_display(GetCommandTagName(commandTag), false);
      |    ^~~~~~~~~~~~~~
In file included from src/pg_cron.c:72:
/home/marco/pgsql-13/include/server/utils/ps_status.h:21:13: note: declared here
   21 | extern void set_ps_display(const char *activity);
      |             ^~~~~~~~~~~~~~
src/pg_cron.c:2026:54: error: passing argument 3 of ‘pg_plan_queries’ makes integer from pointer without a cast [-Werror=int-conversion]
 2026 |   plantree_list = pg_plan_queries(querytree_list, 0, NULL);
      |                                                      ^~~~
      |                                                      |
      |                                                      void *
In file included from /home/marco/pgsql-13/include/server/tcop/utility.h:18,
                 from src/pg_cron.c:82:
/home/marco/pgsql-13/include/server/tcop/tcopprot.h:58:14: note: expected ‘int’ but argument is of type ‘void *’
   58 | extern List *pg_plan_queries(List *querytrees, const char *query_string,
      |              ^~~~~~~~~~~~~~~
src/pg_cron.c:2026:19: error: too few arguments to function ‘pg_plan_queries’
 2026 |   plantree_list = pg_plan_queries(querytree_list, 0, NULL);
      |                   ^~~~~~~~~~~~~~~
In file included from /home/marco/pgsql-13/include/server/tcop/utility.h:18,
                 from src/pg_cron.c:82:
/home/marco/pgsql-13/include/server/tcop/tcopprot.h:58:14: note: declared here
   58 | extern List *pg_plan_queries(List *querytrees, const char *query_string,
      |              ^~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make: *** [/home/marco/pgsql-13/lib/pgxs/src/makefiles/../../src/Makefile.global:920: src/pg_cron.o] Error 1

Will merge into a separate branch to fix.

@marcocitus marcocitus changed the base branch from master to bgw_audit_patch September 15, 2020 08:51
@marcocitus marcocitus merged commit 5f1c56d into citusdata:bgw_audit_patch Sep 15, 2020
@marcocitus
Copy link
Member

marcocitus commented Sep 15, 2020

Other issue I'll try to fix is to degrade more gracefully when the schema is outdated. pg_cron starts reading from the metadata as soon as the server starts, so even if the user knows to do ALTER EXTENSION pg_cron UPDATE immediately after installing new binaries, it might crash and burn before then.

@rauanmayemir
Copy link

These are exciting changes, when can a new release be expected?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants