Skip to content

[BUGFIX] Fix segment crash on deserialization of incoming internal executor record-type parameters#106

Merged
Stolb27 merged 1 commit intoadb-6.xfrom
ADBDEV-337
Oct 16, 2020
Merged

[BUGFIX] Fix segment crash on deserialization of incoming internal executor record-type parameters#106
Stolb27 merged 1 commit intoadb-6.xfrom
ADBDEV-337

Conversation

@maksm90
Copy link

@maksm90 maksm90 commented Aug 28, 2020

Problem description

The master node dispatches external and internal (gathered from plan tree) parameters along with query plan. Internal ones might include not initialized values, e.g., used for storing results of not yet evaluated initPlans. Those parameters are transmitted as zero values that cases segfault on segments under deserialization of complex type value.

The current fix intercepts on QE side handling of zero value parameter before further deserialization process.

Scenario to reproduce

CREATE TABLE users_unmasked (
  user_id bigint NOT NULL,
  params text
) DISTRIBUTED BY (user_id);

ALTER TABLE ONLY users_unmasked
ADD CONSTRAINT users_20171219_pkey PRIMARY KEY (user_id);

-- faulty query
SELECT cte."userId" FROM ( SELECT 7 as "userId" ) cte WHERE (
  SELECT
    row(u.user_id)
  FROM
    test_schema.users_unmasked u
  WHERE
    u.user_id = 7
) IS NOT NULL;

Faulty query have to has the following plan:

                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Result
   One-Time Filter: ($0 IS NOT NULL)
   InitPlan 1 (returns $0)  (slice2)
     ->  Gather Motion 1:1  (slice1; segments: 1)
           ->  Index Only Scan using users_20171219_pkey on users_unmasked u
                 Index Cond: (user_id = 7)
   ->  Result
 Optimizer: Postgres query optimize

Internal executor parameter here is $0 that is used in “One-Time Filter” of Result node.

Affected versions

On master branch a huge refactoring was done in place of dispatching executor parameters - commit 53d12bd . In particular, it segregates external and internal parameters in serialized representation of transmitted data and segregates de-serialization procedures of these parameters. The master works perfectly in this context.
6X and 5X are sensitive to this error.

Copy link

@leskin-in leskin-in left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note https://github.com/greenplum-db/gpdb/pull/10450 merges the commit you mentioned into 6X.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose an extra if (value == 0) should be added before the call to BuildTupleRemapInfo(). In the case in question, we do not need the result of the method, and it does not have any side-effects (except for draining space from remapper->mycontext).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks, fixed.

@maksm90
Copy link
Author

maksm90 commented Aug 28, 2020

Note greenplum-db#10450 merges the commit you mentioned into 6X.

Hmm, I cannot find this commit. Could you point to it?

Copy link

@darthunix darthunix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maksm90 can I ask you to make some additional research about the absence of current error on master?
You are right, the storage of external parameters was refactored in master branch since 6X - now we store them in QueryDispatchDesc. As a result, serialization was also refactored: on 6X we use serializeParamListInfo(), on master - serializeParamsForDispatch(). But I don't see big difference in the code here. The same situation about deserialization: deserializeParamListInfo() on 6X and deserializeExternParams() on master are very much alike while TRRemapDatum() is identical on 6X and master (everything works without your current patch). So may be we should find out the problem in serialization rather than make additional check on deserialization side?

@maksm90
Copy link
Author

maksm90 commented Aug 31, 2020

@maksm90 can I ask you to make some additional research about the absence of current error on master?

Sure

You are right, the storage of external parameters was refactored in master branch since 6X - now we store them in QueryDispatchDesc. As a result, serialization was also refactored: on 6X we use serializeParamListInfo(), on master - serializeParamsForDispatch(). But I don't see big difference in the code here.

Serialized representation of external and internal parameters are stored separately

The same situation about deserialization: deserializeParamListInfo() on 6X and deserializeExternParams() on master are very much alike while TRRemapDatum() is identical on 6X and master (everything works without your current patch).

Similarly deserialization of external and internal parameters on QE side are separated. But deserialization procedure for internal parameters doesn't use buggy TRRemapDatum() and because of this GP7 doesn't fail on query.

@darthunix darthunix self-requested a review September 1, 2020 00:48
Copy link

@darthunix darthunix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maksm90 thank you for explanation above. I think it would be a very invasive modification to port master branch refactoring (https://github.com/greenplum-db/gpdb/commit/53d12bd56fd124fa1b0bcd0d72ff7cf69f0bd441), so it is better to fix the problem with small modifications as you have suggested.

@leskin-in
Copy link

Note greenplum-db#10450 merges the commit you mentioned into 6X.

Hmm, I cannot find this commit. Could you point to it?

Yes, I was incorrect about that. The code in question is not in the current 6X_STABLE.

The master node dispatches external and internal (gathered from plan
tree) parameters along with query plan. Internal parameters might
include not initialized zero values, e.g., used for storing results of
not yet evaluated initPlans. Those parameters are trasmitted as zero
values that cases segfault on segments under deserialization of complex
type value.

The current fix intercepts on QE side handling of zero value parameter
before further deserialization process.
@Stolb27 Stolb27 merged commit 6762b75 into adb-6.x Oct 16, 2020
@Stolb27 Stolb27 deleted the ADBDEV-337 branch October 16, 2020 05:08
RekGRpth pushed a commit that referenced this pull request Dec 3, 2025
Problem description:
After sequential execution of isolation2 tests 'standby_replay_dtx_info' and
'ao_unique_index' the coordinator's standby postmaster process together with
its children processes were terminated.

Root cause:
Test 'standby_replay_dtx_info' sets fault injection 'standby_gxacts_overflow'
on coordinator's standby, which updates the global var 'max_tm_gxacts' (the
limit of distributed transactions) to 1, but at the reset of this fault the
value of 'max_tm_gxacts' was not updated to its original value. Therefore, on
any next test that created more than 2 distributed transactions that were
replayed on the standby, the standby encountered the fatal error "the limit of 1
distributed transactions has been reached" and it was terminated.

Fix:
Set 'max_tm_gxacts' to its original value when fault injection
'standby_gxacts_overflow' is not set.

(cherry picked from commit 423cc57b779bfb8f048f47425b428091a7d959a9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments