Feat/support singlenode mode #77
Conversation
Hiiii, @wfnuser welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!
What's the difference from utility mode?
@gfphoenix78 Let's break your question into two levels: the user side and the developer side. From the user's perspective, singlenode mode is necessary because there are natural differences between what utility mode is for and what a singlenode deployment needs. From the developer's perspective, it's true that utility mode could support more functionality. However, when GPDB development started, nobody foresaw that one day we would need a singlenode mode supporting features like gpfdist, parallel, autovacuum, resource queues (maybe) and so on. So it's not easy to make all of these features work in utility mode now; for example, we never prepared the parallel DSM or the resource queue space for utility mode. It's easier to keep the original behavior of utility mode and just add a new branch for singlenode mode in the cases found to be broken. For now, let me add some comments on each part where gp_role_singlenode and gp_role_utility behave differently. I will supplement a list below later too.
I have added comments in my code at the places where gp_role_singlenode and gp_role_utility behave differently.
I have not written comments for every place that behaves differently when the reasons are similar. For example, resource queue support in utility mode is yet to be determined, including its semantics, so resource queues behave differently in singlenode and utility mode, and I only added a comment about it in part of the related code. For the parts that already have a comment, I didn't list them here.

diff --git a/gpcontrib/gp_exttable_fdw/extaccess.c b/gpcontrib/gp_exttable_fdw/extaccess.c
index d0fe7e31f0..e096551b55 100644
--- a/gpcontrib/gp_exttable_fdw/extaccess.c
+++ b/gpcontrib/gp_exttable_fdw/extaccess.c
@@ -192,6 +192,12 @@ external_beginscan(Relation relation, uint32 scancounter,
uri = (char *) strVal(v);
}
}
+ /*
+ * SINGLENODE_FIXME:
+ * In utility mode, this cannot pass the "at least one entry and one segment database" check.
+ * In singlenode mode, we have one entry database. We might support this in utility mode
+ * in the future too.
+ */
else if ((Gp_role == GP_ROLE_DISPATCH && isMasterOnly) || Gp_role == GP_ROLE_SINGLENODE)
{
/* this is a ON COORDINATOR table. Only get uri if we are the master */
diff --git a/src/backend/access/table/table.c b/src/backend/access/table/table.c
index 77b5fcda68..7da1096443 100644
--- a/src/backend/access/table/table.c
+++ b/src/backend/access/table/table.c
@@ -209,6 +209,17 @@ CdbTryOpenTable(Oid relid, LOCKMODE reqmode, bool *lockUpgraded)
* update or delete does not hold locks on catalog on segments, so
* we do not need to consider lock-upgrade for DML on catalogs.
*/
+ /*
+ * SINGLENODE_FIXME:
+ * Since we don't need to dispatch in singlenode mode, we don't need to keep
+ * the ExclusiveLock there.
+ * However, dropping it causes a behavior difference in `parallel_update_readcommitted`:
+ * the second process in that test, which wants to update the table, is blocked and then
+ * hits a `duplicate key value violates unique constraint "pg_aovisimap_95435_index"`
+ * error after the first process commits.
+ * We need to figure this out later. Currently, let's just use the stronger
+ * lock as cbdb does.
+ */
if (reqmode == RowExclusiveLock &&
(Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE) &&
relid >= FirstNormalObjectId)
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index f220a9dbe7..adc5b7ab58 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -610,6 +610,7 @@ ExecuteGrantStmt(GrantStmt *stmt)
* To make that a little bit less confusing, emit a NOTICE, when
* REVOKE find no permissions to remove.
*/
+ /* SINGLENODE: apparently, we don't need grant in utility mode. */
if (!revoked_something && !stmt->is_grant && (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE))
{
ereport(NOTICE,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 51d549563d..1f65adb3f2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3123,6 +3123,7 @@ index_build(Relation heapRelation,
* Note that planner considers parallel safety for us.
*
* SINGLENODE_FIXME: disable parallel index building for now.
+ * Also consider enabling parallel index builds in utility mode too.
*/
if (parallel && GP_ROLE_SINGLENODE != Gp_role && IsNormalProcessingMode() &&
indexRelation->rd_rel->relam == BTREE_AM_OID)
diff --git a/src/backend/cdb/cdbutil.c b/src/backend/cdb/cdbutil.c
index 15e82960f2..617765a46d 100644
--- a/src/backend/cdb/cdbutil.c
+++ b/src/backend/cdb/cdbutil.c
@@ -433,6 +433,9 @@ getCdbComponentInfo(void)
* Validate that there exists at least one entry and one segment database
* in the configuration
*/
+ /*
+ * In singlenode deployment, total_segment_dbs is zero and it should still work.
+ */
if (component_databases->total_segment_dbs == 0 && Gp_role != GP_ROLE_SINGLENODE)
{
ereport(ERROR,
@@ -449,6 +452,10 @@ getCdbComponentInfo(void)
/*
* Now sort the data by segindex, isprimary desc
*/
+ /*
+ * In singlenode mode, total_segment_dbs is always zero, so we can't (and don't need to)
+ * sort component_databases.
+ */
if (Gp_role != GP_ROLE_SINGLENODE)
{
qsort(component_databases->segment_db_info,
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 15f132bed8..030d6f4572 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1585,6 +1585,7 @@ CopyFrom(CopyFromState cstate)
ReportSrehResults(cstate->cdbsreh, total_rejected);
}
}
+ /* In the dispatcher, we report the rejected row count; we should have it in singlenode mode too. */
else if (Gp_role == GP_ROLE_SINGLENODE && cstate->cdbsreh)
{
ReportSrehResults(cstate->cdbsreh, cstate->cdbsreh->rejectcount);
diff --git a/src/backend/commands/exttablecmds.c b/src/backend/commands/exttablecmds.c
index 235ae44cb5..dd7d78fb61 100644
--- a/src/backend/commands/exttablecmds.c
+++ b/src/backend/commands/exttablecmds.c
@@ -487,6 +487,11 @@ transformExecOnClause(List *on_clause)
ListCell *exec_location_opt;
char *exec_location_str = NULL;
+ /*
+ * In single-node mode, we only have one node, which behaves like the QD in many aspects.
+ * Setting exec_location_str to "COORDINATOR_ONLY" forces execution on the only node in
+ * singlenode mode, which is what we want.
+ */
if (Gp_role == GP_ROLE_SINGLENODE)
exec_location_str = "COORDINATOR_ONLY";
else if (on_clause == NIL)
diff --git a/src/backend/commands/queue.c b/src/backend/commands/queue.c
index d85562a977..4e31350479 100644
--- a/src/backend/commands/queue.c
+++ b/src/backend/commands/queue.c
@@ -938,6 +938,10 @@ CreateQueue(CreateQueueStmt *stmt)
* Create the in-memory resource queue, if resource scheduling is on,
* otherwise don't - and gripe a little about it.
*/
+ /*
+ * SINGLENODE_FIXME: Do we need to support resource queues in utility mode? If so,
+ * what are the semantics? Should it be a globally shared one?
+ */
if (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE)
{
if (IsResQueueEnabled())
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2084376bee..778db794d1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2844,6 +2844,11 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
* current transaction, such as being used in some manner by an
* enclosing command.
*/
+ /*
+ * SINGLENODE_FIXME: We should reconsider this check for singlenode mode;
+ * I didn't have time to go through all of the details. Currently, let's
+ * keep the same behavior as CBDB.
+ */
if (is_partition && (Gp_role != GP_ROLE_DISPATCH && Gp_role != GP_ROLE_SINGLENODE))
CheckTableNotInUse(relation, "CREATE TABLE .. PARTITION OF or ALTER TABLE ADD PARTITION");
diff --git a/src/backend/commands/tablecmds_gp.c b/src/backend/commands/tablecmds_gp.c
index 902e07a15c..83e49b659a 100644
--- a/src/backend/commands/tablecmds_gp.c
+++ b/src/backend/commands/tablecmds_gp.c
@@ -1477,6 +1477,7 @@ ATExecGPPartCmds(Relation origrel, AlterTableCmd *cmd)
* pg_stat_last_operation for GPDB specific alter partition
* commands.
*/
+ /* In singlenode mode, we should track the same metadata as cluster mode does. */
if (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE)
GpAlterPartMetaTrackUpdObject(RelationGetRelid(rel), cmd->subtype);
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 625c5be6d0..73d35e26ba 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -5626,6 +5626,10 @@ ExistingIndex: USING INDEX name { $$ = $3; }
DistributedBy: DISTRIBUTED BY '(' distributed_by_list ')'
{
+ /*
+ * We only forbid it in singlenode mode so that the isolation tests using utility mode can
+ * accept the DISTRIBUTED BY grammar as before.
+ */
if (GP_ROLE_SINGLENODE == Gp_role)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index bcace98e92..7f397a5c0a 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -683,6 +683,13 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
ALLOCSET_DEFAULT_INITSIZE,
ALLOCSET_DEFAULT_MAXSIZE);
+ /*
+ * SINGLENODE_FIXME: we should enable orca in the future.
+ * It's necessary to init the orca optimizer even though we don't allow orca to optimize
+ * any query yet, because some cases need to set orca-related parameters
+ * (I can't recall the details).
+ * We will bring it back anyway, so let's just keep the change here.
+ */
if (!bootstrap && (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE))
InitGPOPT();
#endif
#99, #100, and #92 are issues that @yjhjstz and @HuSen8891 have mentioned. We will solve them in another PR.
After discussion, @yjhjstz will review the parallel_schedule changes, @gfphoenix78 will review the greenplum_schedule changes, and @HuSen8891 will review the isolation2 changes. Thanks a lot for your time.
to support
Actually, the natural way is to treat it as an entrydb; let me discuss with @my-ship-it again. After discussion, we decided to support the distributed by clause. Let me change the code; there is some work that needs to be done. @yjhjstz And it was fixed on July 31st. About the separate tests: in greenplum testing there are many features that are supported in singlenode mode but not in cluster mode. Furthermore, some SQL uses fault_inject(xxx, {a dbid that does not exist in singlenode mode}), and many specific tests are only expected to work in cluster mode. Thus, differences in the .sql files are inevitable.
Waiting for the merge. @my-ship-it
I don't think this PR is ready to merge. I did a quick test and it shows:
gpstop --singlenodeMode couldn't stop the standby and left dangling processes:
make create-demo-singlenode
gpstop --singlenodeMode
pp
gpadmin 20358 1 0 08:46 ? 00:00:00 /home/gpadmin/install/gp9/bin/postgres -D /home/gpadmin/dbhouse/gp9/standby -p 9001 -c gp_role=dispatch
gpadmin 20359 20358 0 08:46 ? 00:00:00 postgres: 9001, master logger process
gpadmin 20360 20358 0 08:46 ? 00:00:00 postgres: 9001, startup recovering 000000010000000000000003
gpadmin 20361 20358 0 08:46 ? 00:00:00 postgres: 9001, checkpointer
gpadmin 20362 20358 0 08:46 ? 00:00:00 postgres: 9001, background writer
gpadmin 21028 18606 0 08:50 pts/2 00:00:00 grep post
make destroy-demo-singlenode couldn't stop the standby either.
Hot_standby can't handle queries:
make create-demo-singlenode
gpconfig -c hot_standby -v on
20230821:08:55:52:025769 gpconfig:i-v3roa8c2:gpadmin-[INFO]:-completed successfully with parameters '-c hot_standby -v on'
gpstop -u
psql -p 9001
psql: error: connection to server on socket "/tmp/.s.PGSQL.9001" failed: FATAL: the database system is in recovery mode
DETAIL: last replayed record at 0/C000060
- VERSION: PostgreSQL 14.4 (Cloudberry Database 1.0.0+2ef0effbc1 build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, 64-bit compiled on Aug 21 2023 08:44:56 (with assert checking)
Let me check the standby logic.
Now gpstop and make destroy-demo-singlenode can stop the standby node. I didn't notice that before; thanks for mentioning it. @avamingli
This problem seems not to be caused by my branch. I checked on commit 9979adb (HEAD) and @lss602726449 checked on commit 4a9c401 (HEAD -> main, origin/main, origin/HEAD, anti_semi_join). Both have the same issue.
check_agg_arguments_walker() supposed that it needn't descend into the arguments of a lower-level aggregate function, but this is just wrong in the presence of multiple levels of sub-select. The oversight would lead to executor failures on queries that should be rejected. (Prior to v11, they actually were rejected, thanks to a "redundant" execution-time check.) Per bug #17835 from Anban Company. Back-patch to all supported branches. Discussion: https://postgr.es/m/17835-4f29f3098b2d0ba4@postgresql.org
Commit 3e310d837 taught isAssignmentIndirectionExpr() to look through CoerceToDomain nodes. That's not sufficient, because since commit 04fe805 it's been possible for the planner to simplify CoerceToDomain to RelabelType when the domain has no constraints to enforce. So we need to look through RelabelType too. Per bug #17897 from Alexander Lakhin. Although 3e310d837 was back-patched to v11, it seems sufficient to apply this change to v12 and later, since 04fe805 came in in v12. Dmitry Dolgov Discussion: https://postgr.es/m/17897-4216c546c3874044@postgresql.org
After cherry-picking b4f0e8db26 and 4027f6cda5 from PG, we should add some related cases to *_optimizer.out.
We're going to enable single-node mode in cbdb, so we need to rewrite a bunch of tests prepared for cbdb cluster mode. After careful consideration, we decided to use the postgres test code as the base, since the behavior of single-node mode is more similar to postgres than to cluster mode. In this commit we port all the postgres parallel_schedule-related tests as of pg 14.4 to the cbdb code base. We create a new folder, singlenode_regress, for the singlenode-mode tests and update the makefile to reuse the regression scripts. The main reason we separate this work into an independent commit is to make code review easier, so that you can see what we changed in the test files; we're going to do that in the following several commits.
For the same reason as the last commit (cadfa1393a0), we need to prepare tests for parallel_schedule in single-node mode too, and we copy the base tests from cbdb. The most recent commit at the time of copying was 5e85de8.
In most cases, single-node mode behaves more like postgres than the cbdb cluster does. But there are some features cbdb does not support (for example, parallel operators were not supported yet when I started this job), and there are many differences between the cbdb and postgres optimizer modules, so the selected plan differs in many cases. When dealing with these differences, I simply follow what pg does, what cbdb cluster mode does, or what cbdb utility mode does, to keep the change minimal.
Cases in greenplum_schedule are designed to test features provided in greenplum, and many of those features are naturally not supported in single-node mode, so we only select a few of those tests to run. In this commit, we select 75 cases from greenplum_schedule and make them pass.
In single-node mode, many cases in the parallel retrieve cursor and resgroup schedules have problems, so we currently only port the parallel_schedule tests for single node. To avoid copying more code from test/isolation2, we change the Makefile and reuse the isolation2 binary via a soft link.
Pass all the selected isolation2_schedule tests in single node mode.
Before this commit, CBDB only supported distributed deployment. We now enable singlenode-mode deployment. You can create a demo singlenode CBDB with `make create-demo-singlenode`. We also introduce a new gp_role, `singlenode`, to handle the code divergence between singlenode mode and cluster mode. In most cases singlenode behaves like utility mode, but there are some scattered differences. Overall, the code branching is carefully determined by gp_role. In addition to the core changes in CBDB, I also changed several scripts used for CBDB control. Tools like `gpstart` and `gpstop` accept a new parameter, `--singlenodeMode`. The script changes are all about skipping QE-related parts; I use the singlenode flag to decide whether the QE-related work should be done.
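For reference, a rough sketch of the workflow these commands describe, based on the commit message and the test transcripts in this thread (it is assumed that the demo target also starts the instance; other gpstart/gpstop options may still apply):

make create-demo-singlenode   # create (and presumably start) a demo singlenode deployment
gpstop --singlenodeMode       # stop it, skipping the QE-related steps
gpstart --singlenodeMode      # start it again in singlenode mode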
There is a known result diff (caused by gp_warehouse) in singlenode regression after the recent rebase. I'd prefer not to repeatedly address these issues until the review process is nearing completion. Instead, I'll rebase once more and address all of them collectively.
Server crash when logging into a segment primary with gp_role=singlenode in a normal cluster:
gpadmin=# select * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port | hostname   | address    | datadir                                                | warehouseid
------+---------+------+----------------+------+--------+------+------------+------------+--------------------------------------------------------+-------------
    1 |      -1 | p    | p              | n    | u      | 9000 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/qddir/demoDataDir-1          |           0
    8 |      -1 | m    | m              | s    | u      | 9001 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/standby                      |           0
    2 |       0 | p    | p              | s    | u      | 9002 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast1/demoDataDir0         |           0
    5 |       0 | m    | m              | s    | u      | 9005 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast_mirror1/demoDataDir0  |           0
    3 |       1 | p    | p              | s    | u      | 9003 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast2/demoDataDir1         |           0
    6 |       1 | m    | m              | s    | u      | 9006 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast_mirror2/demoDataDir1  |           0
    4 |       2 | p    | p              | s    | u      | 9004 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast3/demoDataDir2         |           0
    7 |       2 | m    | m              | s    | u      | 9007 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast_mirror3/demoDataDir2  |           0
(8 rows)
gpadmin@i-v3roa8c2:~$ PGOPTIONS='-c gp_role=singlenode' psql -p 9002
psql: error: connection to server on socket "/tmp/.s.PGSQL.9002" failed: FATAL: Unexpected internal error (assert.c:48)
DETAIL: FailedAssertion("ResScheduler", File: "resscheduler.c", Line: 200)
while utility mode is OK:
gpadmin@i-v3roa8c2:~$ PGOPTIONS='-c gp_role=utility' psql -p 9002
psql (14.4, server 14.4)
Type "help" for help. |
Another crash was found:
make create-demo-singlenode
gpconfig -c hot_standby -v on
gpstop -ar
gpadmin@i-v3roa8c2:~$ PGOPTIONS='-c gp_role=utility' psql -p 9001
psql (14.4, server 14.4)
Type "help" for help.
gpadmin=# show gp_role;
gp_role
---------
utility
(1 row)
gpadmin@i-v3roa8c2:~$ PGOPTIONS='-c gp_role=singlenode' psql -p 9001
psql (14.4, server 14.4)
Type "help" for help.
gpadmin=# show gp_role;
gp_role
------------
singlenode
(1 row)
gpadmin=# \d
FATAL: Unexpected internal error (assert.c:48)
DETAIL: FailedAssertion("gp_session_id > -1", File: "memquota.c", Line: 970)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)
FATAL: Unexpected internal error (assert.c:48)
DETAIL: FailedAssertion("gp_session_id > -1", File: "memquota.c", Line: 970)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)
FATAL: Unexpected internal error (assert.c:48)
DETAIL: FailedAssertion("gp_session_id > -1", File: "memquota.c", Line: 970)
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)
Indeed. But I am already in the process of departing from this company and might not have the necessary enthusiasm to address all of these issues. After conversing with @my-ship-it, someone else will oversee the handling of this pull request. I'll retain my branch in my repository for future reference. For the time being, I will close this pull request.
FYI, the outcome of the previous discussion with @my-ship-it regarding the singlenode role is that it shouldn't be used to establish connections to any of the cluster nodes. This rule was partially addressed in issue #123 on the other side. I believe handling this matter could follow a similar approach, although I didn't personally tackle that aspect. My initial focus for this task was to ensure all designated tests pass without impacting cluster mode. Certainly, some issues remain unresolved; only half of the regression tests have been attended to so far, and we'll need to tackle these problems in a series of steps. This scope was discussed and agreed with @my-ship-it, and I really need to take a breath from it. However, it does sound more reasonable to resolve all the issues before merging. Please continue with the efforts.
Close #69
Change logs
This PR introduces a new deployment mode: singlenode mode. In singlenode mode, cbdb is deployed as a single-node service and gp_role is set to 'singlenode'.
We introduce a bunch of new tests and scripts for singlenode mode in this PR.
For the core part, we branch the logic carefully on the gp_role flag. These changes are scattered throughout the project.
We also cherry-pick two bug-fix commits from upstream, since our tests for cbdb singlenode mode are based on a slightly newer PG 14 than the one cbdb v1.0 was rebased from.
Why are the changes needed?
CBDB users need a singlenode deployment mode.
Does this PR introduce any user-facing change?
Yes. Users can now deploy cbdb as a single node. You can create a demo service with the following command:
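Presumably, per the commit messages above, the command meant here is:

make create-demo-singlenode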
More details can be found in the commit messages.
How was this patch tested?
Passed the ICW tests in cluster mode.
Passed 'make installcheck-singlenode' in single-node mode; it runs the singlenode-mode tests introduced in this PR.
Contributor's Checklist
Here are some reminders before you submit the pull request:
make installcheck
make -C src/test installcheck-cbdb-parallel