
Feat/support singlenode mode #77

Closed
wants to merge 10 commits into from

Conversation

wfnuser
Contributor

@wfnuser wfnuser commented Jul 24, 2023


Closes #69

Change logs

This PR introduces a new deploy mode: singlenode mode. In singlenode mode, CBDB is deployed as a single-node service and gp_role is set to 'singlenode'.

This PR also introduces a set of new tests and scripts for singlenode mode.
For the core part, we branch the logic carefully on the gp_role flag; these changes are scattered across the whole project. A sketch of the recurring pattern follows.
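Roughly, code paths that used to run only on the dispatcher (QD) now also run when gp_role is singlenode. A minimal standalone sketch of that check, not code from this PR (Gp_role and the GP_ROLE_* values come from the CBDB sources, e.g. cdb/cdbvars.h; the enum below only models them for illustration):

#include <stdbool.h>
#include <stdio.h>

/* Standalone model of the gp_role values; the real definitions live in the
 * CBDB sources and are only mimicked here. */
typedef enum
{
	GP_ROLE_UTILITY,
	GP_ROLE_EXECUTE,
	GP_ROLE_DISPATCH,
	GP_ROLE_SINGLENODE
} GpRoleValue;

static GpRoleValue Gp_role = GP_ROLE_SINGLENODE;

/* Coordinator-only behavior (GRANT notices, resource queues, partition
 * metadata tracking, ...) is extended to the single node in this PR. */
static bool
runs_coordinator_only_path(void)
{
	return Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE;
}

int
main(void)
{
	printf("coordinator-only path runs: %s\n",
	       runs_coordinator_only_path() ? "yes" : "no");
	return 0;
}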

We also cherry-pick two bug-fix commits from upstream, since our tests for CBDB singlenode are based on a slightly newer PG 14 than the one CBDB v1.0 was rebased from.

Why are the changes needed?

CBDB users need singlenode deploy mode.

Does this PR introduce any user-facing change?

Yes. Users can now deploy CBDB as a single node. You can create a demo service with the following command:

make create-demo-singlenode

More details can be found in the commit messages.

How was this patch tested?

Passes the ICW test in cluster mode.
Passes 'make installcheck-singlenode' in single-node mode, which runs the singlenode tests introduced in this PR.

Contributor's Checklist

Here are some reminders before you submit the pull request:

  • Document changes
  • Communicate in the GitHub Issues or Discussions (list them if needed)
  • Add tests for the change
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel

@CLAassistant

CLAassistant commented Jul 24, 2023

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ wfnuser
❌ tglsfdc
You have signed the CLA already but the status is still pending? Let us recheck it.


@github-actions github-actions bot left a comment


Hiiii, @wfnuser welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!

@gfphoenix78
Collaborator

What's the difference between singlenode and utility mode? They both look like running the database as plain PG.

@wfnuser
Contributor Author

wfnuser commented Jul 25, 2023

What's the difference between singlenode and utility mode? They both look like running the database as plain PG.

@gfphoenix78 Let's break your question into two levels: user side and developer side.

From the user's perspective, singlenode mode is necessary because there are natural differences between what utility mode needs and what a single-node deployment needs.
For example, utility mode should not support resource queues in the original way when connecting to a QE node of a CBDB cluster, but as a way to deploy CBDB, singlenode mode definitely should support them.
And if you create a table while using utility mode to connect to a QE of a cluster, you will have trouble creating a table with the same name again through a normal connection. If utility mode also took on the responsibility of being a deployment mode, that should never happen.
Furthermore, features like privilege management are useless in utility mode but necessary in singlenode mode.
These are the natural differences between a mode of operation and a way to deploy a single-node service. To sum up, we need a singlenode role.

From the developer's perspective, it's true that utility mode should arguably support more of these functions. However, when GPDB development started, nobody foresaw that one day we would need a singlenode mode supporting features like gpfdist, parallel execution, autovacuum, resource queues (maybe), and so on. So it's not easy to support all of these features in utility mode now; for example, we never prepared the parallel DSM or the resource queue space for utility mode. It's easier to keep the original behavior of utility mode and just add a new branch for singlenode mode in the cases found to be broken.
However, in the future we might catch up and support some of these features in utility mode too.
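To make the developer-side point concrete: where singlenode mode is not ready for a feature yet (for example parallel index build, in the index.c hunk quoted in the next comment), the PR disables it for that role only and leaves utility mode untouched. A minimal standalone sketch of that guard shape, not code from this PR (names below are illustrative):

#include <stdbool.h>
#include <stdio.h>

/* Standalone model only; the real check sits in index_build() and uses
 * Gp_role from the CBDB sources. */
typedef enum { ROLE_UTILITY, ROLE_DISPATCH, ROLE_SINGLENODE } Role;

/* SINGLENODE_FIXME-style guard: a planner-requested parallel index build is
 * allowed in the other roles but stays disabled in singlenode mode until the
 * supporting infrastructure (parallel DSM setup, etc.) is ported. */
static bool
allow_parallel_index_build(Role role, bool planner_requested)
{
	return planner_requested && role != ROLE_SINGLENODE;
}

int
main(void)
{
	printf("parallel index build in singlenode: %s\n",
	       allow_parallel_index_build(ROLE_SINGLENODE, true) ? "allowed" : "disabled");
	return 0;
}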

And for now, let me add some comments on each part where gp_role_singlenode and gp_role_utility behave differently. I will also supplement a list below later.

@wfnuser
Contributor Author

wfnuser commented Jul 25, 2023

I have added some comments in my code about the places where gp_role_singlenode and gp_role_utility behave differently.

  • Some of them are temporary differences: we branch only to keep singlenode and cluster mode behaving the same so that some test cases pass. I haven't had time to go through every detail, and in these cases there is no reason to change the behavior of utility mode.
  • Some of them are real differences. For example, we should have GRANT logic in singlenode mode but never in utility mode.

I have not written comments for every place that behaves differently when the reasons are similar. For example, resource queue support in utility mode is yet to be determined, including its semantics, so resource queues behave differently in singlenode and utility mode, and I only added comments in part of the related code.

Places that already had comments are not listed here.

diff --git a/gpcontrib/gp_exttable_fdw/extaccess.c b/gpcontrib/gp_exttable_fdw/extaccess.c
index d0fe7e31f0..e096551b55 100644
--- a/gpcontrib/gp_exttable_fdw/extaccess.c
+++ b/gpcontrib/gp_exttable_fdw/extaccess.c
@@ -192,6 +192,12 @@ external_beginscan(Relation relation, uint32 scancounter,
 				uri = (char *) strVal(v);
 		}
 	}
+	/*
+	 * SINGLENODE_FIXME:
+	 * In utility mode, it cannot pass "at least one entry and one segment database" check.
+	 * In singlenode mode, we have one entry database. We might support it in utility mode
+	 * in the future too.
+	 */
 	else if ((Gp_role == GP_ROLE_DISPATCH && isMasterOnly) || Gp_role == GP_ROLE_SINGLENODE)
 	{
 		/* this is a ON COORDINATOR table. Only get uri if we are the master */
diff --git a/src/backend/access/table/table.c b/src/backend/access/table/table.c
index 77b5fcda68..7da1096443 100644
--- a/src/backend/access/table/table.c
+++ b/src/backend/access/table/table.c
@@ -209,6 +209,17 @@ CdbTryOpenTable(Oid relid, LOCKMODE reqmode, bool *lockUpgraded)
 	 * update or delete does not hold locks on catalog on segments, so
 	 * we do not need to consider lock-upgrade for DML on catalogs.
 	 */
+	/*
+	 * SINGLENODE_FIXME:
+	 * Since we don't need to dispatch in singlenode, we don't need to keep
+	 * the ExclusiveLock in singlenode mode.
+	 * However, it causes a behavior difference in `parallel_update_readcommitted`:
+	 * the second process in that case, which wants to update the table, is blocked, but a
+	 * `duplicate key value violates unique constraint "pg_aovisimap_95435_index"`
+	 * error is raised after the first process commits.
+	 * Need to figure it out later. Currently, let's just use the stronger
+	 * lock as cbdb does.
+	 */
 	if (reqmode == RowExclusiveLock &&
 		(Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE) &&
 		relid >= FirstNormalObjectId)
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index f220a9dbe7..adc5b7ab58 100644
--- a/src/backend/catalog/aclchk.c
+++ b/src/backend/catalog/aclchk.c
@@ -610,6 +610,7 @@ ExecuteGrantStmt(GrantStmt *stmt)
 	 * To make that a little bit less confusing, emit a NOTICE, when
 	 * REVOKE find no permissions to remove.
 	 */
+	/* SINGLENODE: apparently, we don't need grant in utility mode. */
 	if (!revoked_something && !stmt->is_grant && (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE))
 	{
 		ereport(NOTICE,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 51d549563d..1f65adb3f2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -3123,6 +3123,7 @@ index_build(Relation heapRelation,
 	 * Note that planner considers parallel safety for us.
 	 *
 	 * SINGLENODE_FIXME: disable parallel index building for now.
+	 * And consider enabling parallel in utility mode too.
 	 */
 	if (parallel && GP_ROLE_SINGLENODE != Gp_role && IsNormalProcessingMode() &&
 		indexRelation->rd_rel->relam == BTREE_AM_OID)
diff --git a/src/backend/cdb/cdbutil.c b/src/backend/cdb/cdbutil.c
index 15e82960f2..617765a46d 100644
--- a/src/backend/cdb/cdbutil.c
+++ b/src/backend/cdb/cdbutil.c
@@ -433,6 +433,9 @@ getCdbComponentInfo(void)
 	 * Validate that there exists at least one entry and one segment database
 	 * in the configuration
 	 */
+	/*
+	 * In singlenode deployment, total_segment_dbs is zero and it should still work.
+	 */
 	if (component_databases->total_segment_dbs == 0 && Gp_role != GP_ROLE_SINGLENODE)
 	{
 		ereport(ERROR,
@@ -449,6 +452,10 @@ getCdbComponentInfo(void)
 	/*
 	 * Now sort the data by segindex, isprimary desc
 	 */
+	/*
+	 * In singlenode mode, total_segment_dbs is always zero, so we can't and don't need to
+	 * sort component_databases.
+	 */
 	if (Gp_role != GP_ROLE_SINGLENODE)
 	{
 		qsort(component_databases->segment_db_info,
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 15f132bed8..030d6f4572 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1585,6 +1585,7 @@ CopyFrom(CopyFromState cstate)
 			ReportSrehResults(cstate->cdbsreh, total_rejected);
 		}
 	}
+	/* In dispatcher, we have the report for rejected count. We should have it in singlenode too. */
 	else if (Gp_role == GP_ROLE_SINGLENODE && cstate->cdbsreh)
 	{
 		ReportSrehResults(cstate->cdbsreh, cstate->cdbsreh->rejectcount);
diff --git a/src/backend/commands/exttablecmds.c b/src/backend/commands/exttablecmds.c
index 235ae44cb5..dd7d78fb61 100644
--- a/src/backend/commands/exttablecmds.c
+++ b/src/backend/commands/exttablecmds.c
@@ -487,6 +487,11 @@ transformExecOnClause(List *on_clause)
 	ListCell   *exec_location_opt;
 	char	   *exec_location_str = NULL;
 
+	/*
+	 * In single-node mode, we only have one node, which behaves like a QD in many aspects.
+	 * Setting exec_location_str to "COORDINATOR_ONLY" forces execution onto the only
+	 * node in singlenode mode, which is what we want.
+	 */
 	if (Gp_role == GP_ROLE_SINGLENODE)
 		exec_location_str = "COORDINATOR_ONLY";
 	else if (on_clause == NIL)
diff --git a/src/backend/commands/queue.c b/src/backend/commands/queue.c
index d85562a977..4e31350479 100644
--- a/src/backend/commands/queue.c
+++ b/src/backend/commands/queue.c
@@ -938,6 +938,10 @@ CreateQueue(CreateQueueStmt *stmt)
 	 * Create the in-memory resource queue, if resource scheduling is on,
 	 * otherwise don't - and gripe a little about it.
 	 */
+	/* 
+	 * SINGLENODE_FIXME: Do we need to support resource queues in utility mode? If so,
+	 * what are the semantics? Should it be a globally shared one?
+	 */
 	if (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE)
 	{
 		if (IsResQueueEnabled())
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2084376bee..778db794d1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2844,6 +2844,11 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 		 * current transaction, such as being used in some manner by an
 		 * enclosing command.
 		 */
+		/*
+		 * SINGLENODE_FIXME: We should reconsider this check for singlenode mode;
+		 * I didn't have time to go through all of these details. Currently, let's
+		 * keep the same behavior as CBDB.
+		 */
 		if (is_partition && (Gp_role != GP_ROLE_DISPATCH && Gp_role != GP_ROLE_SINGLENODE))
 			CheckTableNotInUse(relation, "CREATE TABLE .. PARTITION OF or ALTER TABLE ADD PARTITION"); 
 
diff --git a/src/backend/commands/tablecmds_gp.c b/src/backend/commands/tablecmds_gp.c
index 902e07a15c..83e49b659a 100644
--- a/src/backend/commands/tablecmds_gp.c
+++ b/src/backend/commands/tablecmds_gp.c
@@ -1477,6 +1477,7 @@ ATExecGPPartCmds(Relation origrel, AlterTableCmd *cmd)
 	 * pg_stat_last_operation for GPDB specific alter partition 
 	 * commands. 
 	 */
+	/* In singlenode mode, we should track the same metadata as cluster. */
 	if (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE)
 		GpAlterPartMetaTrackUpdObject(RelationGetRelid(rel), cmd->subtype);
 }
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 625c5be6d0..73d35e26ba 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -5626,6 +5626,10 @@ ExistingIndex:   USING INDEX name					{ $$ = $3; }
 
 DistributedBy:   DISTRIBUTED BY  '(' distributed_by_list ')'
 			{
+				/*
+				 * We only forbid it in singlenode so that the isolation test using utility mode can
+				 * accept the DISTRIBUTED BY grammar as before.
+				 */
 				if (GP_ROLE_SINGLENODE == Gp_role)
 					ereport(ERROR,
 							(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index bcace98e92..7f397a5c0a 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -683,6 +683,13 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
 												   ALLOCSET_DEFAULT_INITSIZE,
 												   ALLOCSET_DEFAULT_MAXSIZE);
 
+	/*
+	 * SINGLENODE_FIXME: we should enable orca in the future.
+	 * It's necessary to init the orca optimizer even though we don't allow orca to
+	 * optimize any query yet, because some cases need to set orca-related parameters,
+	 * though I can't recall the details.
+	 * We will bring it back anyway, so let's just keep the change here.
+	 */
 	if (!bootstrap && (Gp_role == GP_ROLE_DISPATCH || Gp_role == GP_ROLE_SINGLENODE))
 		InitGPOPT();
 #endif

@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch 3 times, most recently from 060b9b1 to 72d0914 on July 26, 2023 10:46
@wfnuser
Contributor Author

wfnuser commented Jul 27, 2023

#99, #100, and #92 are some issues that @yjhjstz and @HuSen8891 have mentioned. We will address them in another PR.

@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch from 72d0914 to 280df96 on July 27, 2023 06:22
@wfnuser
Contributor Author

wfnuser commented Jul 27, 2023

After discussion, @yjhjstz will review the parallel_schedule changes, @gfphoenix78 will review the greenplum_schedule changes, and @HuSen8891 will review the isolation2 changes. Thanks a lot for your time.

@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch 2 times, most recently from 8c80815 to 6848942 on July 27, 2023 07:00
@yjhjstz
Collaborator

yjhjstz commented Jul 27, 2023

To support DISTRIBUTED BY (id), we need to treat it as DISTRIBUTED REPLICATED, so users don't need to change SQL or code.

@wfnuser
Contributor Author

wfnuser commented Jul 27, 2023

To support DISTRIBUTED BY (id), we need to treat it as DISTRIBUTED REPLICATED, so users don't need to change SQL or code.

Actually the natural way is to treat it as an entrydb; let me discuss with @my-ship-it again.

After discussion, we decided to support the DISTRIBUTED BY clause. Let me change the code; there is some work that needs to be done.

@yjhjstz Fixed it on July 31st. About the separated tests: in the greenplum tests there are many features that are supported in singlenode mode but not in cluster mode. Furthermore, some SQL uses fault_inject(xxx, {a dbid that does not exist in singlenode mode}) and many specific tests are only expected to work in a cluster. Thus the differences in the .sql files are inevitable,
unless we redesign the regress files and update all of the SQL with branching logic to control variables like dbid. We can do that in the future; for now, let's just separate the test cases.

@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch 3 times, most recently from ecda88e to 75391e9 on August 1, 2023 03:30
@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch 2 times, most recently from dfbb1ab to 00ff2c8 on August 1, 2023 09:08
@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch from 00ff2c8 to 44c84a9 on August 3, 2023 02:12
@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch 2 times, most recently from 6653f4e to 2ef0eff on August 18, 2023 07:59
@wfnuser
Contributor Author

wfnuser commented Aug 18, 2023

Waiting for the merge. @my-ship-it

Collaborator

@avamingli avamingli left a comment


I don't think this PR is ready to merge. I did a quick test and it shows:

gpstop --singlenodeMode couldn't stop the standby, leaving dangling processes

make create-demo-singlenode
gpstop --singlenodeMode

gpadmin    20358       1  0 08:46 ?        00:00:00 /home/gpadmin/install/gp9/bin/postgres -D /home/gpadmin/dbhouse/gp9/standby -p 9001 -c gp_role=dispatch
gpadmin    20359   20358  0 08:46 ?        00:00:00 postgres:  9001, master logger process
gpadmin    20360   20358  0 08:46 ?        00:00:00 postgres:  9001, startup recovering 000000010000000000000003
gpadmin    20361   20358  0 08:46 ?        00:00:00 postgres:  9001, checkpointer
gpadmin    20362   20358  0 08:46 ?        00:00:00 postgres:  9001, background writer
gpadmin    21028   18606  0 08:50 pts/2    00:00:00 grep post

make destroy-demo-singlenode couldn't stop the standby either

Hot_standby can't handle queries

make create-demo-singlenode 
gpconfig -c hot_standby -v on
 20230821:08:55:52:025769 gpconfig:i-v3roa8c2:gpadmin-[INFO]:-completed successfully with parameters '-c hot_standby -v on'
gpstop -u
 psql -p 9001
psql: error: connection to server on socket "/tmp/.s.PGSQL.9001" failed: FATAL:  the database system is in recovery mode
DETAIL:  last replayed record at 0/C000060
- VERSION: PostgreSQL 14.4 (Cloudberry Database 1.0.0+2ef0effbc1 build dev) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, 64-bit compiled on Aug 21 2023 08:44:56 (with assert checking)

Review comments were left on src/test/regress/pg_regress.c, src/backend/catalog/index.c, and src/backend/catalog/aclchk.c (now outdated and resolved).
@wfnuser
Contributor Author

wfnuser commented Aug 21, 2023

Let me check standby logic.

@wfnuser wfnuser dismissed stale reviews from yjhjstz and HuSen8891 via aec9c15 August 21, 2023 06:23
@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch 2 times, most recently from aec9c15 to 3ce6a71 on August 21, 2023 10:05
@wfnuser
Contributor Author

wfnuser commented Aug 21, 2023

Now gpstop and make destroy-demo-singlenode can stop the standby node. I didn't notice that before; thanks for mentioning it. @avamingli
And I'm starting to work on the hot standby part. I need to understand the relevant feature first.

@wfnuser
Contributor Author

wfnuser commented Aug 21, 2023

[screenshot attached]
It seems there is some regression in the main branch; I hadn't hit this problem before I rebased recently, and I have seen this issue in some recent failed CI tasks on branches besides mine.
@my-ship-it @avamingli

@wfnuser
Contributor Author

wfnuser commented Aug 22, 2023

Hot_standby can't handle queries

This problem does not seem to be caused by my branch. I checked commit 9979adb (HEAD) and @lss602726449 checked commit 4a9c401 (HEAD -> main, origin/main, origin/HEAD, anti_semi_join); both have the same issue.

tglsfdc and others added 10 commits August 22, 2023 10:51
check_agg_arguments_walker() supposed that it needn't descend into
the arguments of a lower-level aggregate function, but this is
just wrong in the presence of multiple levels of sub-select.  The
oversight would lead to executor failures on queries that should
be rejected.  (Prior to v11, they actually were rejected, thanks
to a "redundant" execution-time check.)

Per bug #17835 from Anban Company.  Back-patch to all supported
branches.

Discussion: https://postgr.es/m/17835-4f29f3098b2d0ba4@postgresql.org
Commit 3e310d837 taught isAssignmentIndirectionExpr() to look through
CoerceToDomain nodes.  That's not sufficient, because since commit
04fe805 it's been possible for the planner to simplify
CoerceToDomain to RelabelType when the domain has no constraints
to enforce.  So we need to look through RelabelType too.

Per bug #17897 from Alexander Lakhin.  Although 3e310d837 was
back-patched to v11, it seems sufficient to apply this change
to v12 and later, since 04fe805 came in in v12.

Dmitry Dolgov

Discussion: https://postgr.es/m/17897-4216c546c3874044@postgresql.org
After we cherry-pick b4f0e8db26 and 4027f6cda5 from PG, we should add
some related cases to *_optimizer.out.
We're going to enable single-node mode in CBDB, so we need to rewrite
a bunch of tests prepared for CBDB cluster mode.

After careful consideration, we decide to use the postgres test code
as the base code since the behavior of single-node mode is more similar
to postgres than cluster mode.

In this commit we port all the postgres parallel_schedule related tests
before pg 14.4 to cbdb code base. We create a new folder singlenode_regress
for the singlenode mode tests and update the makefile to reuse the
regression scripts.

The main reason we separate this work into an independent commit is to
make code review easier, so that you can see what we changed in the
test files. We're going to do the same in the following several
commits.
For the same reason as the last commit (cadfa1393a0), we need to
prepare tests for parallel_schedule in single-node mode too, and we
copy the base tests from CBDB. The most recent commit at the time of
copying was 5e85de8.
In most cases, single-node mode behaves more like postgres than CBDB
cluster mode. But there are some features not yet supported in CBDB
(for example, parallel operators were not supported when I started
this work), and there are many differences between the CBDB and
postgres optimizer modules, so the selected plan differs in many cases.

When dealing with these differences, I either follow what PG does,
what CBDB cluster mode does, or what CBDB utility mode does, so that I
can keep the changes minimal.
Cases in greenplum_schedule are designed to test features provided by
greenplum, and many of those features are naturally not supported in
single-node mode, so we only select a few of those tests to run.

In this commit, we select 75 cases from greenplum_schedule and
make them pass.
In single-node mode, many cases have problems with parallel retrieve
cursors and the resgroup schedule, so we currently only port the tests
in parallel_schedule for single node.

To avoid copying more code from test/isolation2, we change the
Makefile and reuse the isolation2 binary via a soft link.
Pass all the selected isolation2_schedule tests in single node mode.
Before this commit, CBDB only supported distributed deployment.
We now enable singlenode deployment.

You can create demo singlenode CBDB by `make create-demo-singlenode`.

We also introduce a new gp_role, `singlenode`, to handle the code
divergence between singlenode mode and cluster mode. In most cases,
singlenode behaves like utility mode, but there are some scattered
differences. Overall, the code branching is carefully determined by
gp_role.

In addition to the core changes in CBDB, I also change several scripts
used to control CBDB. You can use tools like `gpstart` and `gpstop`
with a new parameter, `--singlenodeMode`.

The script changes are all about skipping the QE-related parts; I use
the singlenode flag to control whether we should do the QE-related work.
@wfnuser wfnuser force-pushed the feat/support-singlenode-mode branch from 3ce6a71 to 65ff828 on August 22, 2023 02:51
@wfnuser
Contributor Author

wfnuser commented Aug 22, 2023

There is a known result diff (caused by gp_warehouse) in singlenode regression after the recent rebase. I'd prefer not to repeatedly address these issues until the review process is nearing completion. Instead, I'll rebase once more and address all of them collectively.

@avamingli
Collaborator

The server crashes when logging into a segment primary with gp_role=singlenode in a normal cluster.

gpadmin=# select * from gp_segment_configuration ;
 dbid | content | role | preferred_role | mode | status | port |  hostname  |  address   |                        datadir                        | warehouseid
------+---------+------+----------------+------+--------+------+------------+------------+-------------------------------------------------------+-------------
    1 |      -1 | p    | p              | n    | u      | 9000 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/qddir/demoDataDir-1         |           0
    8 |      -1 | m    | m              | s    | u      | 9001 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/standby                     |           0
    2 |       0 | p    | p              | s    | u      | 9002 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast1/demoDataDir0        |           0
    5 |       0 | m    | m              | s    | u      | 9005 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast_mirror1/demoDataDir0 |           0
    3 |       1 | p    | p              | s    | u      | 9003 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast2/demoDataDir1        |           0
    6 |       1 | m    | m              | s    | u      | 9006 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast_mirror2/demoDataDir1 |           0
    4 |       2 | p    | p              | s    | u      | 9004 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast3/demoDataDir2        |           0
    7 |       2 | m    | m              | s    | u      | 9007 | i-v3roa8c2 | i-v3roa8c2 | /home/gpadmin/dbhouse/gp9/dbfast_mirror3/demoDataDir2 |           0
(8 rows)

gpadmin@i-v3roa8c2:~$ PGOPTIONS='-c gp_role=singlenode' psql -p 9002
psql: error: connection to server on socket "/tmp/.s.PGSQL.9002" failed: FATAL:  Unexpected internal error (assert.c:48)
DETAIL:  FailedAssertion("ResScheduler", File: "resscheduler.c", Line: 200)

[screenshot attached]

while utility mode is ok

gpadmin@i-v3roa8c2:~$ PGOPTIONS='-c gp_role=utility' psql -p 9002
psql (14.4, server 14.4)
Type "help" for help.

@avamingli
Collaborator

I didn't take too much time on this, but it doesn't seem right.

make create-demo-singlenode
gpconfig -c hot_standby -v on
gpstop -ar

[screenshot attached]

The gp_role of the primary process is singlenode, but the standby process runs with dispatch, and when logging into the standby it reports utility.

@avamingli
Collaborator

Another crash was found.
This PR has too many serious issues, and I'm not sure all of the crashes have been addressed; the above issues were found by me with some usual tests, as in utility mode.
It shows high risk without enough testing. I object to merging this version; we need to reconsider this, @my-ship-it.

make create-demo-singlenode
gpconfig -c hot_standby -v on
gpstop -ar
gpadmin@i-v3roa8c2:~$ PGOPTIONS='-c gp_role=utility' psql -p 9001
psql (14.4, server 14.4)
Type "help" for help.
gpadmin=# show gp_role;
 gp_role
---------
 utility
(1 row)
gpadmin@i-v3roa8c2:~$ PGOPTIONS='-c gp_role=singlenode' psql -p 9001
psql (14.4, server 14.4)
Type "help" for help.
gpadmin=# show gp_role;
  gp_role
------------
 singlenode
(1 row)
gpadmin=# \d
FATAL:  Unexpected internal error (assert.c:48)
DETAIL:  FailedAssertion("gp_session_id > -1", File: "memquota.c", Line: 970)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)
FATAL:  Unexpected internal error (assert.c:48)
DETAIL:  FailedAssertion("gp_session_id > -1", File: "memquota.c", Line: 970)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)
FATAL:  Unexpected internal error (assert.c:48)
DETAIL:  FailedAssertion("gp_session_id > -1", File: "memquota.c", Line: 970)
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (14.4, server 14.4)

[screenshot attached]

@wfnuser
Contributor Author

wfnuser commented Aug 23, 2023

Another crash was found. This PR has too many serious issues, and I'm not sure all of the crashes have been addressed; the above issues were found by me with some usual tests, as in utility mode. It shows high risk without enough testing. I object to merging this version; we need to reconsider this, @my-ship-it.

Indeed. But I am already in the process of departing from this company and might not have the enthusiasm necessary to address all of these issues. After conversing with @my-ship-it, someone else will oversee this pull request. I'll retain my branch in my repository for future reference. For the time being, I will close this pull request.

The server crashes when logging into a segment primary with gp_role=singlenode in a normal cluster.

FYI, the outcome of a previous discussion with @my-ship-it regarding the singlenode role is that it shouldn't be used to establish connections to any of the cluster nodes. This rule was partially addressed in issue #123 on the other side. I believe handling this matter could follow a similar approach, although I didn't personally tackle that aspect.

My initial focus for this task was to ensure the successful execution of all designated tests without impacting the cluster mode. Certainly, there are some issues that remain unresolved. Only half of the regression tests have been attended to so far. We'll need to tackle these problems in a series of steps to rectify the situation. This scope was discussed and agreed with @my-ship-it . And I really need to take a breath from it.

However, it does sound more reasonable to resolve all the issues before merging. Please continue with the efforts.

@wfnuser wfnuser closed this Aug 23, 2023
@Ray-Eldath Ray-Eldath mentioned this pull request Sep 19, 2023