
Cherry-pick Resgroup related code from GreenPlum [Mar 2, 2022 - Feb 7, 2023] #448

Merged
merged 46 commits into cloudberrydb:main from cherry-pick-gp
Jun 24, 2024

Conversation

foreyes
Collaborator

@foreyes foreyes commented May 24, 2024

Normal Conflict:

  • Backport PARALLEL RETRIEVE CURSOR changes
  • Add GUC gp_log_endpoints to print endpoints information to server log
  • recalculate QE's query_mem proportionally (#13160)
  • [7X] Feat: Find the pids of overflowed subtransaction (#13992)
  • Add a GUC to control the output of suboverflow transaction sql statement (#14019)
  • Add a GUC to display create gang time while executing statements
  • Order items in sync/unsync_guc_name.h

Code file paths changed:

  • CdbComponentDatabaseInfo: add an active segdb list
  • Add gp_backend_info() for runtime introspection/debugging
  • Use hostname instead of ip to compute the host_segments.
  • Support both unicast and wildcard address binding

Skipped:

  • INSERT/COPY on AO/CO tables with unique indexes
  • ao/co: Retire dev guc hiding unique index feature

@CLAassistant

CLAassistant commented May 24, 2024

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 22 committers have signed the CLA.

✅ foreyes
❌ dgkimura
❌ divyeshddv
❌ adam8157
❌ ashwinstar
❌ huansong
❌ jchampio
❌ SmartKeyerror
❌ kainwen
❌ thedanhoffman
❌ Tao-T
❌ Aegeaner
❌ soumyadeep2007
❌ dreamedcheng
❌ fairyfar
❌ dh-cloud
❌ yaowangm
❌ gpopt
❌ zxuejing
❌ charliettxx
❌ yanwr1
❌ airfan1994

@foreyes foreyes force-pushed the cherry-pick-gp branch 5 times, most recently from 00835ec to 634fd0a Compare May 31, 2024 03:05
@foreyes foreyes force-pushed the cherry-pick-gp branch 2 times, most recently from 9df5076 to afe910e Compare June 18, 2024 01:46
@foreyes foreyes changed the title [WIP] Cherry-pick GUC related code base from GreenPlum [Mar 8, 2022 - Nov 25, 2022] Cherry-pick Resgroup related code from GreenPlum [Mar 2, 2022 - Feb 7, 2023] Jun 18, 2024
ashwinstar and others added 21 commits June 21, 2024 15:06
The FATAL error "writer segworker group shared snapshot collision" happens when
gp_vmem_idle_time is reached: the QD cleans up the idle writer and reader
gangs and closes the connections to the QEs, and the QEs quit asynchronously
while the QD process remains. If a QE cannot quit before the QD starts a new
command, the new command finds the same session id in the shared snapshot and
a collision occurs. A QE session may take a while to quit due to ProcArrayLock
contention.

Hence, this commit cleans up only the reader gangs, not the writer gang, on
the idle session cleanup timeout. That way there is no need to remove and
re-add the shared snapshot slot on the QEs, which avoids the possibility of
a collision.

(cherry picked from commit cc58ac6afec2587ae7afb489f59fc7c1d1949325)
These changes are backported from the 6X_STABLE branch. Besides refining code
and wording, the names of the UDFs have changed:

```
pg_catalog.gp_endpoints() -> pg_catalog.gp_get_endpoints()
pg_catalog.gp_segment_endpoints() -> pg_catalog.gp_get_segment_endpoints()
pg_catalog.gp_session_endpoints() -> pg_catalog.gp_get_session_endpoints()
```

And views are created for convenience:

```
CREATE VIEW pg_catalog.gp_endpoints AS
    SELECT * FROM pg_catalog.gp_get_endpoints();

CREATE VIEW pg_catalog.gp_segment_endpoints AS
    SELECT * FROM pg_catalog.gp_get_segment_endpoints();

CREATE VIEW pg_catalog.gp_session_endpoints AS
    SELECT * FROM pg_catalog.gp_get_session_endpoints();
```

Co-Authored-By: Jian Guo <gjian@vmware.com>
Co-Authored-By: Xuejing Zhao <zxuejing@vmware.com>
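A minimal usage sketch of the renamed UDFs and the convenience views, assuming a PARALLEL RETRIEVE CURSOR named c1 over a hypothetical table t:

```
BEGIN;
-- Declare a parallel retrieve cursor inside a transaction (t is a hypothetical table).
DECLARE c1 PARALLEL RETRIEVE CURSOR FOR SELECT * FROM t;

-- Endpoints of all parallel retrieve cursors (renamed UDF and its convenience view).
SELECT * FROM pg_catalog.gp_get_endpoints();
SELECT * FROM pg_catalog.gp_endpoints;

-- Endpoints created by the current session only.
SELECT * FROM pg_catalog.gp_get_session_endpoints();
END;
```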
In GPDB, we do not allow users to use PREPARE TRANSACTION in regular
and utility-mode connections to prevent any conflicts/issues with
GPDB's distributed transaction manager that heavily utilizes two-phase
commit. As part of the Postgres 10 merge into GPDB, a regression was
introduced that allowed PREPARE TRANSACTION to be run in utility-mode
connections. The error check was being bypassed because the
TransactionStmt was not being properly obtained. The cause was an upstream
Postgres refactor that introduced RawStmt, which wraps the TransactionStmt,
so the TransactionStmt typecast was being done on the wrong parse node (it
needs to be done on RawStmt->stmt). Added a simple regression test to make
sure this regression doesn't reoccur in future Postgres merges. Also disabled
some recovery TAP tests which use PREPARE TRANSACTION in utility-mode
connections.

Postgres commit reference (RawStmt refactor):
postgres/postgres@ab1f0c8
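A sketch of the behavior the new regression test guards against; the table name is illustrative and the error wording approximate:

```
BEGIN;
CREATE TABLE prep_tx_check(a int);
-- In GPDB, two-phase commit is reserved for the distributed transaction
-- manager, so this is expected to fail in regular and utility-mode sessions:
PREPARE TRANSACTION 'manual_prepared_tx';
-- ERROR:  PREPARE TRANSACTION is not supported in Greenplum Database  (wording approximate)
```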
Use optimizer_enable_nljoin to disable all xforms that produce nestloop
join alternatives.

Co-authored-by: Orhan Kislal <okislal@vmware.com>
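A hedged example of toggling the new GUC; t1 and t2 are hypothetical tables and the resulting plan shape depends on schema and statistics:

```
SET optimizer = on;                    -- use ORCA
SET optimizer_enable_nljoin = off;     -- suppress xforms that produce nested loop join alternatives
EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.id = t2.id;
```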
- Assert that interconnect_address is always set, in order to get rid of
  conditional code
- Remove AI_PASSIVE flag from socket setup functions (it was being
  ignored anyway as we always pass a unicast address to getaddrinfo)
- Comment cleanup
- Added a regress test for checking motion socket creation

Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Add a new state and corresponding error message for RESET, and let FTS ignore
it when it detects that a primary is down.

Detailed rationale behind the change: the RESET period is when a primary has
crashed but has not yet started recovery. Normally this is a short period, but
we've seen cases where the primary's postmaster waits a long time (40 to 50
seconds) for backends to exit. Previously the PM would send an "in recovery"
response to FTS during that time, and FTS, sensing no recovery progress, would
panic and issue a failover. Now we just let FTS ignore that state. We could add
a new FTS timeout to guard against a primary being stuck waiting in that state,
but we think that should be very rare, so we aren't doing that until we see a
need.
There's a 5-second timeout `SIGKILL_CHILDREN_AFTER_SECS` on the
PM side, after which PM will send `SIGKILL` to its children.

Also make the new mode respected by certain retry mechanisms, such as in the
isolation2 framework and in segment_failure_due_to_recovery().
In the current resource group implementation, query_mem in the plan tree is
calculated using the QD's system memory and number of primary segments, not
the QE's own system memory and number of primary segments. This can eventually
result in the wrong amount of memory being allocated at the execution stage,
which can lead to various problems such as OOM, under-utilization of QE
resources, etc.

query_mem is linearly proportional to the system memory and the number of
primary segments when resource group is enabled; the approximate calculation
formula is as follows:

query_mem = (total_memory * gp_resource_group_memory_limit * memory_limit / nsegments) *
                          memory_spill_ratio / concurrency

Only total_memory and nsegments differ between the QD and a QE, so we can
dispatch these two parameters to the QE and then calculate the QE's own
query_mem proportionally.

At the same time, we use the GUC gp_resource_group_enable_recalculate_query_mem
to let the client decide whether to recalculate query_mem proportionally on the
QE and repopulate operatorMemKB in the plan tree according to that value.
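A rough worked example of the formula under assumed numbers (all values illustrative), followed by the GUC that enables the per-QE recalculation:

```
-- Assume on the QD host: total_memory = 64 GB, nsegments = 4,
-- gp_resource_group_memory_limit = 0.7, memory_limit = 20%,
-- memory_spill_ratio = 10%, concurrency = 5:
--   query_mem = (64 GB * 0.7 * 0.20 / 4) * 0.10 / 5 ≈ 45.9 MB
-- A QE host with a different total_memory/nsegments would plug its own values
-- into the same formula once the recalculation is enabled:
SET gp_resource_group_enable_recalculate_query_mem = on;
```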
...to help with debugging and introspection. This will allow us to pull
information about the active segments during execution, and it will form
the basis of the gp_backend_info() function.
To debug into the master backend for a given Postgres session, you can
SELECT pg_backend_pid() and attach a debugger to the resulting process
ID. We currently have no corresponding function for the segment
backends, however -- developers have to read the output of `ps` and try
to correlate their connected session to the correct backends. This is
error-prone, especially if there are many sessions in flight.

gp_backend_info() is an attempt to fill this gap. Running

    SELECT * FROM gp_backend_info();

will return a table of the following format:

   id | type | content |   host    | port  |  pid
  ----+------+---------+-----------+-------+-------
   -1 | Q    |      -1 | pchampion | 25431 | 50430
    0 | w    |       0 | pchampion | 25432 | 50431
    1 | w    |       1 | pchampion | 25433 | 50432
    2 | w    |       2 | pchampion | 25434 | 50433

This allows developers to jump directly to the correct host and PID for
a given backend. This patch supports backends for writer gangs (type 'w'
in the table), reader gangs ('r'), master QD backend ('Q') and
master singleton readers ('R').

Co-authored-by: Soumyadeep Chakraborty <soumyadeep2007@gmail.com>
Co-authored-by: Divyesh Vanjare <vanjared@vmware.com>
The global variable host_segments is **only** used on QEs under resource
group mode, and its value is dispatched to the QEs from the QD. Previously, in
the function getCdbComponentInfo(), the QD built a hashtable and counted
host_segments grouped by ip address as the key. This is not correct: a typical
Greenplum deployment may have different ip addresses pointing to the same
machine. Using the ip address as the hash key leads to a wrong host_segments
count, and to a segment getting a higher memory limit than the user intended.

This commit uses the hostname as a machine's unique identifier to fix the
issue. It also changes some names to better convey their meanings.
… tainted replicated (#13177)

Previously, CPhysicalJoin derived the outer distribution when it was tainted
replicated. It checked only for strict replicated and universal replicated and
returned the inner distribution in these cases (in this case, it satisfies
random). Tainted replicated wasn't considered and was causing an undercount (the
JOIN derived tainted replicated instead of random, which was causing the number
of columns to be undercounted, because it wrongly assumed that one segment
contained all output columns).

Co-authored-by: Daniel Hoffman <hoffmand@vmware.com>
In the planner, if a SegmentGeneral path contains volatile expressions, it
cannot be treated as General, and we try to make it SingleQE by adding a
motion (if the motion is not needed, it will be removed later). A corner case
is that if the path references outer Params, it cannot be motion-ed. This
commit fixes the issue by not trying to bring a SegmentGeneral path that
references outer Params to SingleQE.

See Github Issue 13532 for details.
The 5-digit date string was invalid and would be rejected on GPDB5. But then
upstream pg modified the date parsing logic, which makes such a string parse
as YYYMMMDD. As that is not a standard time format and the change makes gp6+
behave differently from previous versions, this commit lets gp reject it by
default. If the pg-like date parsing is required, the GUC
gp_allow_date_field_width_5digits can be set to true.
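A hedged sketch of the intended behavior; the literal and any error text are illustrative only:

```
-- A 5-digit date field is rejected by default:
SELECT '12320'::date;   -- expected to ERROR out
-- Opt back into the upstream-style parsing when needed:
SET gp_allow_date_field_width_5digits = true;
SELECT '12320'::date;   -- parsed under the upstream rules described above
```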
…rt (#12694)

According to the reported issue "PolicyEagerFreeAssignOperatorMemoryKB makes
query end without calling mppExecutorCleanup" (#12690), the code path in
`standard_ExecutorStart` didn't handle exceptions thrown while calling
`PolicyAutoAssignOperatorMemoryKB` and `PolicyEagerFreeAssignOperatorMemoryKB`.
This may cause an OOM exception not to be handled in `standard_ExecutorStart`
but to be thrown up to `PortalStart`; `PortalStart` has its own exception
handling mechanism, but `mppExecutorCleanup` will not be called there because
`portal->queryDesc` will be `NULL` in certain transaction states.

This commit fixes it.
790c7ba changed our address binding strategy to use a unicast
address (segment's gp_segment_configuration.address) instead of the
wildcard address, to reduce port usage on segment hosts and to ensure
that we don't inadvertently use a slower network interface for
interconnect traffic.

In some cases, inter-segment communication using the unicast address
mentioned above may not be possible. One such example is if the source
segment's address field and the destination segment's address field are
on different subnets and/or existing routing rules don't allow for such
communication. In these cases, using a wildcard address for address
binding is the only available fallback, enabling the use of any
network interface compliant with routing rules.

Thus, this commit introduces the gp_interconnect_address_type GUC to
support both kinds of address binding.

We pick the default to be "unicast", as that is the only reasonable way
to ensure that the segment's address field is used for fast interconnect
communication and to keep port usage manageable on large clusters with
highly concurrent workloads.

Testing notes:

VM setup: one coordinator node, two segment nodes. All nodes are
connected through three networks.

Gp segment config: the coordinator node has one coordinator. Each segment
node has two primaries. No mirrors. The coordinator uses a dedicated network.
The two primaries on a segment node each use one of the other two networks.

With 'unicast', we fail to send packets due to the network structure:
WARNING:  interconnect may encountered a network error, please check your network

Falling back to 'wildcard', we see that packets can be sent successfully
across motions.

Co-authored-by: Huansong Fu <fuhuansong@gmail.com>
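A hedged example of switching the binding mode; this GUC is most likely system-level, so the sketch uses gpconfig plus a restart rather than a session-level SET:

```
-- Inspect the current binding mode ('unicast' by default).
SHOW gp_interconnect_address_type;
-- Fall back to wildcard binding cluster-wide, e.g. when segment address
-- fields sit on different subnets:
--   gpconfig -c gp_interconnect_address_type -v wildcard
--   gpstop -ar
```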
`ResGroupActivated = true` is set at the end of InitPostgres() by
InitResManager(). If, inside InitPostgres(), some code before the
InitResManager() call does a palloc() that fails, the call trace is:

    gp_failed_to_alloc()
    -> VmemTracker_GetAvailableVmemMB()
    -> VmemTracker_GetNonNegativeAvailableVmemChunks
    -> VmemTracker_GetVmemLimitChunks

It will trigger:

VmemTracker_GetVmemLimitChunks()
{
    AssertImply(vmemTrackerInited && IsResGroupEnabled(),
        IsResGroupActivated());
}

Like commit c1cdb99 does, remove the AssertImply and add a TODO comment.
Removed a meaningless code line in resgroup_helper.c.
This adds a GUC optimizer_enable_replicated_table, which defaults any DML
operation on a replicated table to fall back to the Postgres planner.
optimizer_enable_replicated_table is on by default.

Co-authored-by: Daniel Hoffman <hoffmand@vmware.com>
The GUC replacement_sort_tuples was introduced in GP 9.6 to indicate the
threshold for using replacement selection rather than quicksort. In PG12, the
GUC was removed along with all code related to replacement selection sort, and
it doesn't appear in GPDB7. However, GPDB7 still has one line mentioning
replacement_sort_tuples in sync_guc_name.h (without any other related code),
which should be treated as a mistake. The fix is to simply remove that line;
it doesn't impact any existing behavior.
…r 100 (#13668)

Setting runaway_detector_activation_percent to 0 or 100 means disabling
runaway detection, and this should apply to both the Vmem Tracker and Resource
Group.

However, in the current implementation, if resource group is enabled we still
invoke IsGroupInRedZone() even when runaway_detector_activation_percent is set
to 0 or 100, and IsGroupInRedZone() performs some atomic reads of variables.
Since RedZoneHandler_IsVmemRedZone is a very frequently called function, this
wastes a lot of CPU resources.

When we initialize the Red-Zone Handler, redZoneChunks is set to INT32_MAX if
runaway detection is disabled, so we can use it to quickly judge whether we are
in the Red-Zone.

No more tests are needed, since the current unit tests already cover this
situation.
SmartKeyerror and others added 25 commits June 21, 2024 15:06
…pipeline tests (#13974)

The resource group pipeline uses ORCA as the optimizer by default. But for a
resource management tool, it's unimportant which optimizer we use.

So use the postgres query optimizer instead of ORCA to run the resource group
pipeline tests. After that, we can remove the files
resgroup_bypass_optimizer.source and resgroup_bypass_optimizer_1.source.
[7X] Feat: Identify backends with suboverflowed txs

Subtransaction overflow is a chronic problem for Postgres and Greenplum,
which arises when a backend creates more than PGPROC_MAX_CACHED_SUBXIDS
(64) subtransactions. This is often caused by the use of plpgsql
EXCEPTION blocks, SAVEPOINT etc.

Overflow implies that pg_subtrans needs to be consulted and the
in-memory XidCache is no longer sufficient.

The lookup cost is particularly felt when there are long running
transactions in the system, in addition to backends with suboverflow.

Long running transactions increase the xmin boundary, leading to more
lookups, especially older pages in pg_subtrans. Looking up older pages
while we are constantly generating new pg_subtrans pages (with the
suboverflowed backend(s)) leads to pg_subtrans LRU misses, exacerbating
the slowdown in overall system query performance.

Terminating the backends with suboverflow, or the backends with long running
transactions, can help alleviate the potential performance problems. This
commit provides an extension and a view which can help DBAs identify
suboverflowed backends, which they can subsequently terminate. Please note
that backends should be terminated from the master (which will automatically
terminate the corresponding backends on the segments).
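A hedged usage sketch; the commit above only states that an extension and a view are provided, so the view name gp_toolkit.gp_suboverflowed_backend and the pid are assumptions for illustration:

```
-- Find backends whose subtransaction cache has overflowed (view name assumed).
SELECT * FROM gp_toolkit.gp_suboverflowed_backend;
-- Terminate an offending session from the master; the corresponding segment
-- backends are terminated automatically, per the commit message.
SELECT pg_terminate_backend(12345);   -- 12345 is an illustrative pid
```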
…ent (#14019)

We might want to also consider adding a log message to print the query string
that caused the overflow. This is important as only 1 statement out of thousands
executed in a backend may trigger the overflow, or the backend can come out of
the overflow state before it is inspected with our view/UDF. Logging the
statement will ensure that customers can pinpoint the offending statements.
Note that lc_monetary and lc_time are related to formatting
output data. Besides, formatting functions will be pushed down
to QEs in some common cases. So to keep the output data consistent
with the locale value set in QD, we need to sync these GUCs
between QD and QEs.

Co-authored-by: wuchengwen <wcw190496@alibaba-inc.com>
Add GUC gp_print_create_gang_time to control whether to print information
about gang creation time. We print the create gang time for both DDL and DML.

If all the segDescs of a gang are from the cached pool, we regard the gang as
reused. We only display the shortest and longest connection establishment
times and their segindexes for a gang.

The info for the shortest and the longest establish conn time is the same for
a 1-gang.

DDL:
```
create table t(tc1 int);
INFO: The shortest establish conn time: 4.48 ms, segindex: 2,
      The longest  establish  conn time: 8.13 ms, segindex: 1

set optimizer=off;
INFO: (Gang) is reused
```
DML:
we can use DML or explain analyze to get create gang time.
```
select * from t_create_gang_time t1, t_create_gang_time t2 where t1.tc1=2;
INFO: (Slice1) is reused
INFO: (Slice2) The shortest establish conn time: 4.80 ms, segindex: 0,
               The longest  establish conn time: 4.80 ms, segindex: 0
tc1 | tc2 | tc1 | tc2
-----+-----+-----+-----
(0 rows)

explain analyze select * from t_create_gang_time t1, t_create_gang_time t2 where t1.tc1=2;
INFO: (Slice1) is reused
INFO: (Slice2) is reused
QUERY PLAN
......
```
Items in these two files should be ordered.
Currently, if we enable the resource group, the postmaster process will be
added to the parent cgroup, and all the auxiliary processes, such as BgWriter
and SysLogger, will be added to the cgroup of user.slice. We cannot control
the resource usage of the user.slice cgroup, and it's difficult to calculate
the proportion between the resource usage of the parent group and the child
group; the Linux Cgroup documentation doesn't explain it either.

So this PR creates a new control group, named "system_group", to control the
resource usage of the postmaster process and all other auxiliary processes.

And this PR follows the principle below:

When a process forks a child process, the new process is born into the
cgroup that the forking process belongs to at the time of the operation.
After exit, a process stays associated with the cgroup that it belonged to
at the time of exit until it's reaped;
Added a new view to the resource manager tool gp_toolkit to cover a
frequently used function:

gp_toolkit.gp_resgroup_role: the resource group assigned to each role.
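A minimal usage example of the new view (its columns are not listed above, so only a plain SELECT is sketched):

```
-- Show which resource group each role is assigned to.
SELECT * FROM gp_toolkit.gp_resgroup_role;
```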
Fix the failed pipeline due to #13880
…rface (#14343)

This PR is the second step in refactoring the resource group. The first one
is #14256.

In this PR, we do not change any behavior of the resource group and we do not
change the interface exposed to the resource manager; it just abstracts all
the fundamental functions into the struct CGroupOpsRoutine and uses this
abstract handler to manipulate the underlying Linux Cgroup files. There are
two purposes for this:

1. Make the code more readable.
2. Provide the base interface for Linux Cgroup v2.

The second one is our main motivation for doing this.

Of course, this is a relatively large change, so it's not all done, and more
details need to be fixed.
    NEW SYNTAX of resource group cpuset for different master and segment
    settings: using syntax like cpuset="1;3-4", the cpuset of the master and
    the segments can be differentiated by a semicolon. With cpuset="1;3-4",
    the master applies the first cpu core, while the segments apply the third
    and fourth cores at the same time. Differentiate master and segment by
    separating the cpuset with a semicolon, then apply the first half of it
    to the master and the second half to the segments.
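A hedged example of the new syntax; other group options are omitted and the group name is illustrative:

```
-- Reserve core 1 for the coordinator (master) and cores 3-4 for the segments.
CREATE RESOURCE GROUP rg_cpuset_demo WITH (cpuset='1;3-4');
-- The same split applied to an existing group:
ALTER RESOURCE GROUP rg_cpuset_demo SET cpuset '1;3-4';
```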
Fix link problems on macOS and Windows which were introduced by #14343.
Fix dev pipeline failure of the previous PR #14332.
My linker complains that there are multiple definitions of cgroupOpsRoutine
and cgroupSystemInfo.

We should declare the variables in the header file with an extern tag and
initialize them in one of the .c files.

Since cgroupSystemInfo and cgroupOpsRoutine are required on multiple
platforms, I initialize them in resgroup.c.
Simplify and refactor some code for the RG cpuset separated by
coordinator/segment. This commit is for enhancing the previous PR
https://github.com/greenplum-db/gpdb/pull/14332.

authored-by: chaotian <chaotian@vmware.com>
After #14343, it's time to remove all the relevant codes and test cases about the
resource group memory manager.

1. What this PR has done

This PR did two important things:

First, and most important, it removes all the code and test cases for the
resource group memory model, which includes the related functions, variables,
GUCs, etc.

Second, it adds new semantics to resource groups while removing the memory
model.

Since pg_resgroupcapability.reslimittype is consistent with the enumerated type
ResGroupLimitType, when we delete `RESGROUP_LIMIT_TYPE_MEMORY` and other content,
there will be "holes". In order to avoid more PR and review work, this PR
deleted the memory model and added the new semantics together.

The GUCs this PR removed:

gp_resource_group_memory_limit, gp_resgroup_memory_policy,
memory_spill_ratio, gp_log_resgroup_memory, gp_resgroup_memory_policy_auto_fixed_mem,
gp_resource_group_cpu_ceiling_enforcement, gp_resgroup_print_operator_memory_limits,
gp_resource_group_enable_recalculate_query_mem.

2. New Resource Group Attributes and Limits

New Resource group attributes and limits:

- concurrency. The maximum number of concurrent transactions, including active and
idle transactions, that are permitted in the resource group.
- cpu_hard_quota_limit. The hard limit on the percentage of CPU resources for this
resource group. This value indicates the maximum CPU ratio that the current group can use.
- cpu_soft_priority. The CPU priority of the current group: the larger the value, the
higher the priority and the more likely it is to be scheduled by the CPU. The default value is 100.
- cpuset. The CPU cores to reserve for this resource group.

First, let's take a look at the new resource management view of the resource group:

```
postgres=# select * from gp_toolkit.gp_resgroup_config;
 groupid |   groupname   | concurrency | cpu_hard_quota_limit | cpu_soft_priority | cpuset
---------+---------------+-------------+----------------------+-------------------+--------
    6437 | default_group | 20          | 20                   | 100               | -1
    6438 | admin_group   | 10          | -1                   | 300               | -1
    6441 | system_group  | 0           | 10                   | 100               | -1
(3 rows)
```

2.1 What's the meaning of cpu_hard_quota_limit

It can be seen that cpu_rate_limit is removed and replaced by cpu_hard_quota_limit, which
indicates the upper limit of CPU resources that the current group can use. This is a percentage;
taking 20 as an example, it means that the CPU resources used by the current group cannot exceed
20% of the total CPU resources of the host.

The sum of cpu_hard_quota_limit of all groups can exceed 100, and the range of this value is
[1, 100] or -1, where 100 and -1 both mean that all CPU resources can be used and no CPU resource
limit is imposed on the group.

When we change the value of cpu_hard_quota_limit, we will write

cpu.cfs_period_us * ncores * cpu_hard_quota_limit / 100

to the file cpu.cfs_quota_us.

2.2 What's the meaning of cpu_soft_priority

We have added a new field, cpu_soft_priority, which indicates the CPU priority of the current
group, corresponding to the dynamic load weight in the Linux CFS. The larger the value, the greater
the weight of the group, and the more preferentially it will be scheduled by the Linux scheduler.

The value range is [1, +∞); currently, the value cannot exceed 2^64 - 1. The default value is 100.

When we change the value of cpu_soft_priority, we will write

(int64)(cpu_soft_priority * 1024 / 100)

to the file cpu.shares.
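A hedged sketch of using the new attributes, with the cgroup arithmetic above worked through under assumed numbers (the group name, option syntax, and host parameters are illustrative):

```
-- Create a group with the new CPU attributes listed above.
CREATE RESOURCE GROUP rg_demo WITH (concurrency=10, cpu_hard_quota_limit=20, cpu_soft_priority=200);
-- Assuming cpu.cfs_period_us = 100000 and ncores = 8:
--   cpu.cfs_quota_us = 100000 * 8 * 20 / 100 = 160000
-- For cpu_soft_priority = 200:
--   cpu.shares = (int64)(200 * 1024 / 100) = 2048
ALTER RESOURCE GROUP rg_demo SET cpu_soft_priority 300;
```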
In DistributedLog_AdvanceOldestXmin() we advance DLOG's idea of
the oldestXmin to the "globalxmin" value, and also truncate all
DLOG segments that only hold xids older than the oldestXmin.

The oldestXmin can be xmax, i.e. the "latestCompletedXid" + 1,
when e.g. there's no other concurrent running transactions.
However, during postmaster restart we initialize the oldestXmin
to be up to only latestCompletedXid. As a result, when we try to
advance it again, we could try to access the segment that holds
latestCompletedXid, which had been truncated before the restart.

Fix it by initializing oldestXmin properly, and add a test for the same. The
test file had to be moved to isolation/input in order to import regress.so
for the test_consume_xids() function we need.
Since #14562 removed some GUCs and the memory model of the resource group,
there are some legacy test cases and useless code left in the project; this PR
removes that code and those files.

No more tests are needed; it's a clean-up process.
Currently, we create a distributed snapshot in the function GetSnapshotData()
if we are the QD, and we iterate over procArray again to get the global
xmin/xmax/xip.

But if the current query can be dispatched to a single segment directly, which
means it's a direct dispatch, there is no need to create a distributed
snapshot; the local snapshot is enough.
Change totalQueued and totalExecuted from int to int64 to avoid overflow after running for a long time.

Co-authored-by: huaxi.shx <huaxi.shx@alibaba-inc.com>
When gpdb calls InitResGroups to init a postgres backend, readStr is called
to read the cpuset assigned to gpdb. However, the size of the data buffer in
readStr is too small, so the cpuset string read by gpdb is truncated.

This commit changes the buffer size from MAX_INT_STRING_LEN (20) to
MAX_CGROUP_CONTENTLEN (1024) to fix the resgroup init error when there are a
lot of cores in cpuset.cpus.

Co-authored-by: huaxi.shx <huaxi.shx@alibaba-inc.com>
A GUC's name must be added to `sync_guc_name.h` if its value needs to be synced between the QD and QEs.
The QD dispatches its current synced GUC values (as startup options) when creating QEs. Otherwise, the
settings will not take effect on a newly created QE.

An example of GUC inconsistency between QD and QE:
```
CREATE OR REPLACE FUNCTION cleanupAllGangs() RETURNS BOOL
AS '@abs_builddir@/../regress/regress.so', 'cleanupAllGangs' LANGUAGE C;

CREATE OR REPLACE FUNCTION public.segment_setting(guc text) RETURNS SETOF text EXECUTE ON ALL SEGMENTS AS
$$ BEGIN RETURN NEXT pg_catalog.current_setting(guc); END $$ LANGUAGE plpgsql;

postgres=# show allow_segment_DML;
 allow_segment_DML
-------------------
 off
(1 row)

postgres=# set allow_segment_DML = on;
SET
postgres=# show allow_segment_DML;
 allow_segment_DML
-------------------
 on
(1 row)

postgres=# select public.segment_setting('allow_segment_DML');
 segment_setting
-----------------
 on
 on
 on
(3 rows)

postgres=# select cleanupAllGangs();
 cleanupallgangs
-----------------
 t
(1 row)

postgres=# show allow_segment_DML;
 allow_segment_DML
-------------------
 on
(1 row)

postgres=# select public.segment_setting('allow_segment_DML');
 segment_setting
-----------------
 off
 off
 off
(3 rows)

```
- Move GUCs `application_name` and `vacuum_cost_limit` back to `unsync_guc_name.h`
to fix a pipeline failure.
Pipeline link: https://prod.ci.gpdb.pivotal.io/teams/main/pipelines/gpdb_main_without_asserts/jobs/gpconfig_rocky8/builds/19
- Remove several deprecated GUCs.
1. Support create/alter resource group with memory_limit, and add the removed
GUCs (which are used to enforce the memory limit) back.

2. Support acquiring the amount of memory reserved for the query in resource
group mode.

3. Add a GUC gp_resgroup_memory_query_fixed_mem to allow users to set the
memory limit for a query.
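A hedged sketch of the re-added memory controls; the group name, option values, and the unit of gp_resgroup_memory_query_fixed_mem are assumptions for illustration:

```
-- Re-added memory limit on a resource group (other options follow the DDL shown earlier in this PR).
CREATE RESOURCE GROUP rg_mem_demo WITH (concurrency=5, cpu_hard_quota_limit=10, memory_limit=15);
-- Per-query fixed memory override for the current session (unit assumed to be kB, value illustrative).
SET gp_resgroup_memory_query_fixed_mem = 512000;
```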
@my-ship-it my-ship-it merged commit c8271af into cloudberrydb:main Jun 24, 2024
9 of 10 checks passed