
Latency glitch fix #558

Closed
wants to merge 37 commits into from

Conversation

mronstro

No description provided.

Nisha Gopalakrishnan and others added 30 commits May 22, 2024 09:31
Analysis

Tables using the REDUNDANT row format that have indexes larger than
767 bytes cannot be accessed after a successful upgrade to 8.0.

Prior to version 5.7.35, an index larger than 767 bytes was permitted to
be created on a table with the REDUNDANT row format. When such tables
are upgraded to 8.0, they become inaccessible. When the table is opened,
i.e. in dd_open_table(), the index size is validated, resulting in an
error since the REDUNDANT row format does not support indexes larger
than 767 bytes.

Fix

Mark such indexes of invalid size as corrupt during dd_open_table().
This involves moving the index size check to a later point in
dd_open_table_one(), i.e. after dict_table_add_to_cache(). All
operations that use the index will fail with an index-corrupted
error until the index is dropped.
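
An illustrative scenario (table, column and index names are hypothetical)
that a pre-5.7.35 server would have accepted:

  CREATE TABLE t_redundant (c1 VARCHAR(1024) CHARACTER SET utf8mb4)
    ENGINE=InnoDB ROW_FORMAT=REDUNDANT;

  -- 300 utf8mb4 characters can occupy up to 1200 bytes, which exceeds
  -- the 767-byte index limit of the REDUNDANT row format.
  CREATE INDEX idx_big ON t_redundant (c1(300));

  -- After an upgrade to 8.0 with this fix, statements that use idx_big
  -- report the index as corrupted; dropping it restores access:
  DROP INDEX idx_big ON t_redundant;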

Change-Id: Iff3915028f1d3631af2fe85669891144ab048856
Follow-up patch to remove an unwanted zip file.

Change-Id: Iff3915028f1d3631af2fe85669891144ab048856
Fixed the build failure due to an unused variable (it is used only in a
debug-only execute-if statement).

Change-Id: Iff3915028f1d3631af2fe85669891144ab048856
fields_with_changed_order did not handle tables with prefix_len keys
correctly. col->get_phy_pos() returns the raw prefix phy_pos;
we need to use field->get_phy_pos() here to get the actual phy_pos.

Change-Id: If13449f9e6e6191cd0f2e7102f62ac28024727f8
…xcom/gcs_xcom_networking

Description:
Fixed a memory leak in Gcs_ip_allowlist_entry_hostname::get_value.
Previously, the function returned nullptr on error without freeing the allocated
memory for a vector.

This commit modifies Gcs_ip_allowlist_entry_hostname::get_value to properly
delete the vector before returning nullptr in error cases.

Change-Id: Iab0076839cccb6c4af7feb3640788ee977f0e9ce
Follow-up fix for a mismatched result due to file path separators.

Change-Id: Iec5b01659e12840f30748beb81dbed2dd33471b6
Removing dead code.
Contribution by Huaxiong Song.

Change-Id: I2965e659b715d5f0b3f920a8a016429a33e07f25
The mysql_server_mock sometimes doesn't shut down when it receives a
SIGTERM.

Stacktraces show that the mock-server waits on:

  server_mock::MySQLServerMock::run () at mock_server/src/mysql_server_mock.cc:318

which is a condition-variable that waits for all connections to be
closed.

Background
==========

The hang appears as:

1. a connection is held open by the client
2. the SIGTERM leads to close_all_connections being called, which calls
   cancel() on each connection, aborting any waiting IO, in the
   expectation that this leads to each connection being closed.

If the connection is _not_ waiting on IO, it does NOT notice the request
for shutdown and waits for the client to send something, while the
client waits for a shutdown of the mock-server. A deadlock.

Change
======

- add an is_terminated_ flag to track that a connection shall be closed
- check is_terminated_ before entering an async_read()

Change-Id: Idc0cb32fb5b0d67a42ac30644f4a80abbd4a2117
              function json_extract

 Description:
 ------------
 When the source has a table with a stored generated column
 populated using a JSON function, and update or delete operations
 are subsequently executed on the underlying base column, replication
 fails with the error message: "Invalid JSON text in argument 1 to
 function json_extract: 'The document is empty.'".

 Analysis:
 ------------
 Issue Scenario:
  The source table includes:
    - A stored generated column populated using a JSON function
    - binlog_row_image=MINIMAL
  Replication fails upon update/delete operations on
  the underlying base column.
 Root cause:
  - The replica attempts to re-evaluate the generated column during
    the update/delete before image (BI) process.
  - Re-evaluation of the generated column fails with the error
    "document is empty" while executing the JSON function,
    as no underlying base column is passed as part of the BI.

 Fix:
 ----
 Do not re-evaluate stored generated columns when the base columns
 are unavailable.
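
 A minimal sketch of the scenario on the source (table and column names
 are illustrative, not taken from the patch):

   SET SESSION binlog_row_image = 'MINIMAL';

   CREATE TABLE t1 (
     id   INT PRIMARY KEY,
     doc  JSON,
     name VARCHAR(64) AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$.name'))) STORED
   );
   INSERT INTO t1 (id, doc) VALUES (1, '{"name": "a"}');

   -- Before the fix, applying these on the replica re-evaluated `name`
   -- without `doc` being present in the before image and failed with
   -- "The document is empty.".
   UPDATE t1 SET doc = JSON_SET(doc, '$.name', 'b') WHERE id = 1;
   DELETE FROM t1 WHERE id = 1;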

Change-Id: I6e41c29a6d87024f59a645eb6e4596e731087c90
                          from Optimizer

During optimization, MySQL chooses an initial plan based on different
parameters like access cost, index usability, coverage, etc. During
this phase ORDER BY and LIMIT are not considered for plan
creation and access choice. Later, during ORDER BY optimization,
test_if_cheaper_ordering() chooses an index scan as a replacement
for the sort to get the results in order. This change in the plan is not
reflected in the cost data present in the plan. As a result, a simple
query with ORDER BY and LIMIT carries the cost of sorting the
entire table to fetch a few LIMIT rows. This cost value is compared
with secondary_engine_cost_threshold to decide on HeatWave
offloading. This could lead to offloading queries even when
they would perform much better on the InnoDB engine.

Fix: Update the table access cost if an index is chosen in
test_if_cheaper_ordering() to provide order, and update the query cost
accordingly for single-table queries present in a secondary engine.
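
An illustrative query shape (table and index names are hypothetical)
where the plan switches to an ordered index scan:

  CREATE TABLE orders (
    id         INT PRIMARY KEY,
    created_at DATETIME,
    KEY idx_created (created_at)
  ) ENGINE=InnoDB;

  -- test_if_cheaper_ordering() replaces the sort with a scan of
  -- idx_created, but before this fix the plan cost still reflected
  -- sorting the whole table, which could push the estimate over
  -- secondary_engine_cost_threshold and trigger offloading.
  SELECT * FROM orders ORDER BY created_at LIMIT 10;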

Change-Id: Ica649c9db687ef1162964d865a2b5e4ec4c0d069
                          from Optimizer

Post-push test fix for hypergraph optimizer.

Change-Id: Ia0155170e54c4644471e4da1c86925d781617225
…during MySQL 8 upgrade

Summary

Upgrading a table

  CREATE TABLE t1 (c1 int GENERATED ALWAYS AS (0),
                   c2 int) ENGINE=MyISAM;

from 5.7 to 8.x would fail. This would only occur for a
mix of regular columns and generated columns and when using
MyISAM.

The reason for this was that during that particular type
of upgrade into the data dictionary, we would align the
NULL info one way, while it is aligned the other way everywhere
else. This was caught by our sanity checks.

Analysis

Using our above table as an example: We always use two loops
to create columns; the first loop processes regular columns,
while the second loop processes generated columns.
Therefore, we'd process c2 (a natural column) before c1 (a
generated column).

In older code, we'd count the NULL position up in order of
column creation so that c2 (which is the second column in the
CREATE statement, but the first column to be processed on
account of being the first regular column) will be processed
first and get the lower NULL position; c1 will be processed
second (on account of being a generated column and getting
created in the second loop) and get the higher NULL position.

In short, we process all regular columns first, then all
generated columns, and count up NULL position the entire
time.

In fill_columns_from_dd(), we processed columns in the same
order (first all natural columns, then all generated columns),
but would increase the NULL position even when no field was
processed (e.g. when skipping a generated column in the first
loop that only processes regular columns). Therefore, the
NULL positions ended up wrong and tripped our sanity check.

This patch aligns the NULL ordering so that the positions
generated in fill_columns_from_dd() are compatible with the
ordering created by earlier servers and indeed with the
behavior elsewhere in contemporary servers.

Change-Id: I7bd18e0822c1d86b9dd322a5d994200078ebf84c
Issue:
 In the fix for bug#35183686, we started logging the fields whose column
 order changed. But while calculating the size needed in the redo log to
 log the index information, these columns were missing from the
 calculation.

Fix:
 Make sure these columns are also considered when calculating the size
 needed to log the index entry.

Change-Id: Ic8752c72a8f5beddfc5739688068b9c32b02a700
Extend testing of the ndb_optimized_node_selection variable in
MySQL Server, and fix a few issues.

ndb_optimized_node_selection MTR test is extended to give some
coverage of :
 - Table types : Read Primary, Read Backup, Fully Replicated
 - O_N_S variants : 0,1,2,3

Surfaced issues :

 - The optimized_node_selection option is read and set on
   (one of) the ndb_cluster_connection objects and/or on
   the session's Ndb object, with implications for thread
   safety, testability and behaviour
 - Two round-robin unhinted transaction issues :
   - A logic error in the proximity-ignoring round-robin
     algorithm, resulting in imbalance.
   - Proximity info used in some cases even when
     optimized_node_selection is disabled.

Changes
 - Avoid setting/clearing ndb_cluster_connection
   optimized_node_selection value
   - Rename the variable to indicate that it's a
     connection default
   - Modify NdbApi level code to consistently use
     the Ndb session object's value for decisions
   - Modify ha_ndbcluster code to consistently set
     the Ndb object's optimized_node_selection value
     prior to starting a transaction

 - Fix Round-robin bugs.

Change-Id: I7205837bc3672110490d23c2470a95a963237e3d
Unhinted transactions, or hintable transactions where hinting
is disabled, use a round-robin (RR) algorithm, balancing
the transaction load across a set of candidate data nodes.

The ndb_optimized_node_selection option controls which nodes
are candidates.

                        ndb_optimized_node_selection & 1
  All data nodes        0
  Local data nodes      1

Locality is defined or inferred by a number of NdbApi
level concepts such as :
  Config   : LocationDomain
  Api      : DataNodeNeighbour
  Internal : tryBind remote address->Group
  Config   : Group

In both cases there can be multiple candidates, so NdbApi
attempts to spread the load across the candidates.

However the state memory required to do this is part of the
Ndb (session) object, and is reset (to the same 'next' position)
in every new Ndb object.

In some use cases new Ndb objects are always used, such as
when a SQL user connects, issues an unhinted query and then
disconnects.

In this case there is no RR balance, resulting in resource
usage imbalances and limiting system scalability and capacity.

This is fixed so that in these cases the first transaction
issued by a connection is also handled in a round-robin
way, avoiding persistent imbalance.

The MTR testcase ndb_optimized_node_selection is extended
to cover this scenario.

Change-Id: Ia491b3bc4ae96e5e57a27baa375d2d84b3068d60
Testcase ndb_read_balance aims to capture some of the behaviour of
the different table distribution types (READ_PRIMARY, READ_BACKUP,
FULLY_REPLICATED) in test output.

However it was noted that it was not stable.

Some reasons for this :
 - Read Backup + Fully Replicated tables may read a primary or
   backup fragment replica depending on :
   - The lock mode (long-locking reads must go to the primary)
   - Whether the chosen TC is local to the primary replica or not
 - Unhinted transactions are non-deterministic other than that
   over some number of iterations they should exhibit round-robin
   behaviour over relatively 'local' data nodes
 - Hinted transactions for read-backup and fully replicated tables
   are also liable to read Primary or Backup nodes depending on
   locality.

The test is modified to :
 - Show that for READ PRIMARY tables, the primary replicas are
   effectively always chosen
 - Show that for READ BACKUP + FULLY REPLICATED tables, the primary
   replicas are always chosen for locking reads
   - For non locking reads, a balance of PRIMARY + BACKUP replicas
     are chosen.

This modification should remove the test's dependencies on preceding
transactions, endianness, etc.

Change-Id: I5f6e41ae888a0279f0b9b316e214e8643996410f
Change-Id: Ieb72bf9e2112184467652926d9536ef5bfbffb97
innodb_directories to datadir after ALTER TABLE

In the case of ALTER TABLE .. FORCE and ALTER TABLE .. ENGINE, tables
are recreated. The default location for recreated tables is the default
data directory. If the table was created with a DATA DIRECTORY clause,
the table is recreated in the same source directory.

Now, if the source table was moved to a different location and the
above ALTER statements are then run, the table gets recreated in the
default data directory. This is not correct, as the user would expect
it to be recreated in the original directory. Furthermore, the ALTER
TABLE statement does not allow changing the data directory, so it is
important for the table to be recreated at the same location.

The solution is to update the data directory flag for the corresponding
dd_table during restart, after the ibd file has just been moved to its
new location, then get the current path of that ibd file and reuse this
path during recreation.

Also, ibd files that were moved while running previous versions of
the server are recognized as moved by checking the following
conditions in fil_tablespace_path_equals():
1. The file must have the .ibd extension
2. old_path and new_path must point to the same location (because in
previous versions, when the file has just been moved, old_path is
already updated to the new location)
3. new_path must be different from the default data directory
4. A dd_table should already exist for this tablespace
5. The dd_table data directory flag is not set (in previous versions,
before this fix, the dd_table data directory flag was not updated for
moved ibd files. This check is added because the flag is true for
tables created with a DATA DIRECTORY clause; we want to exclude such
cases and be sure that the ibd file was really moved)

For ibd files which satisfy the above conditions, we recognize
them as moved and populate their tablespace information separately in
m_moved_previously_before_90000, which is later used to update the
dd_table. Note that the variable is named after version 9.0.0 because
this handling is needed for data directories from before 9.0.0. We can
remove this code in 10.0.0.

We added a common method
dd_update_table_and_partitions_after_dir_change() which updates the
dd_table data directory flag for moved ibd files in the above cases.
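
An illustrative sequence (paths are hypothetical) of the scenario being
fixed:

  CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;

  -- Server stopped; t1.ibd moved under /data/moved, a directory listed
  -- in --innodb-directories; server restarted.

  -- Before the fix, either of these rebuilds recreated t1 in the default
  -- data directory instead of /data/moved:
  ALTER TABLE t1 FORCE;
  ALTER TABLE t1 ENGINE=InnoDB;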

Change-Id: I5d4fc2e1f7d23f6a8a9c152449d9d5ecf2344833
…N_SIZE_INLINE'

Limit the scale used when computing averages of decimal numbers.

Simplified patch backported.
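
Purely as an illustration (hypothetical table), the kind of aggregate
affected; the scale used for the intermediate average result is now
capped:

  CREATE TABLE d1 (v DECIMAL(65,30));
  INSERT INTO d1 VALUES (1.5), (2.25);
  SELECT AVG(v) FROM d1;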

Change-Id: If32143ad7ed55f841e3973e78fa8e98cc245e775
Bug#36347647  Contribution by Tencent: Resiliency issue in fts_sync_commit

Symptoms:
During various operations on tables containing FTS indexes,
the state of FTS as committed to the database may become
inconsistent, affecting the following scenarios:
 - the server terminates while synchronizing the FTS cache,
 - synchronization of the FTS cache occurs concurrently with
   another FTS operation.
This inconsistency may lead to various negative effects,
including incorrect query results.
An example operation which forces the synchronization of the
FTS cache is OPTIMIZE TABLE with
innodb_optimize_fulltext_only set to ON.
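
For illustration (the table name is hypothetical), that synchronization
can be forced with:

  SET GLOBAL innodb_optimize_fulltext_only = ON;
  OPTIMIZE TABLE articles;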

Root Cause:
The functions 'fts_cmp_set_sync_doc_id' and 'fts_sql_commit' use
different trx_t objects in function 'fts_sync_commit'.
This causes a state in which 'synced_doc_id' in the config
table is already committed but the remaining FTS data is not yet,
leading to issues in the scenarios described above - the
server terminating between the commits, or concurrent access
seeing the intermediate state.

Fix:
When 'fts_cmp_set_sync_doc_id' is called from 'fts_sync_commit'
it will use the transaction provided by the caller.

Patch based on contribution by Tencent.

Change-Id: I65fa5702db5e7b6b2004a7311a6b0aa97449034f
Bug#36347647  Contribution by Tencent: Resiliency issue in fts_sync_commit

Post-push fix:
Add doxygen documentation to function parameter.

Simplified patch backported.

Change-Id: Ie3dbf78b4e84ae22c85d1ce873b58ee926566ff8
Change-Id: Ie99c337b7d407db70bf1480df99f5116737556e1
Change-Id: I86408ecad4a991e217b8a07be62cbba3907cd505
Change-Id: Ic57173150c9c9d9069882f09d8c07152e5752475
Change-Id: Ib9dafeb2c373496f2ff1e86b81df3fea7923a26b
ON SHUTDOWN

Note: This commit in 8.0 is different from 8.4 and trunk due to the
testcase.

Background:
-----------
  The trx_sys->mysql_trx_list maintains a list of trx_t objects created
  for user transactions. When the `USE <database>` statement is run, a
  trx_t object is created for the THD running this query and added to
  this list. When the connection is closed, the trx_t object of the THD
  is freed and removed from this list.

  During shutdown, all the connections are closed before trx_sys is
  closed. When closing trx_sys, it is expected that all trx_t objects
  in trx_sys->mysql_trx_list are freed and the list is empty.

  In replication, when the applier sees an `XA START`, it detaches the
  THD from the handlertons by storing a backup in the engine's ha_data.
  In case of InnoDB, the trx_t created for the THD is stored in ha_data
  and is made nullptr; but not removed from the mysql_trx_list.

  In replication, the detached engine ha_data is reattached after the
  XA transaction is processed. During the reattach, the trx_t is
  restored from the ha_data back to the THD.

  By design, the detach happens when `XA START` is encountered by the
  applier. When the applier sees an `XA PREPARE`, the reattach is done
  via `applier_reset_xa_trans` calling `attach_native_trx`. When the
  applier sees an `XA COMMIT .. ONE PHASE`, the reattach is done in
  `ha_commit_low`. If the server is shutdown, the applier performs the
  reattach in `ha_rollback_low`.

  The binlog applier can be simulated by enabling `pseudo_replica_mode`
  and running a `BINLOG '0'`.
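
  A rough sketch of the problematic window, assuming the session is made
  to act like the applier as described above (the XID is illustrative):

    SET SESSION pseudo_replica_mode = 1;
    BINLOG '0';           -- make the session behave like the applier
    XA START 'empty_xa';  -- the engine ha_data is detached here
    -- Nothing touches InnoDB, so ha_list stays empty; shutting the
    -- server down at this point previously hit the assert in trx_sys
    -- because the detached trx_t was never reattached and freed.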

Issue:
------
  When the XA transaction is empty, the ha_list is nullptr. The reattach
  logic is performed only when ha_list is non-empty. Hence, if the
  server is shut down after the applier detaches the engine (i.e., after
  running an `XA START`), the applier never reattaches the ha_data,
  which leads to the assert when InnoDB closes trx_sys. Due to the
  design, the problem is seen only with the reattach in `ha_commit_low`
  and `ha_rollback_low`.

Fix:
----
  Ensured that reattach is done even if ha_list is empty.

Change-Id: Ida617d067d112914c9d55ef4cde9f91956d19a08
Follow-up to disable the ndb_optimized_node_selection test when running
with the query cache, as the cache avoids communication with Ndb
altogether.

Approved by : Magnus Blaudd <magnus.blaudd@oracle.com>

Change-Id: Ia9f2dee674f88e15a0b19da88fcc049f29311cb0
frazerclement and others added 7 commits May 31, 2024 15:07
Change-Id: I1713d7edb3fa0bd64f47847dcd65e88c4f33091d
- Post push fix for memory leak

Change-Id: I034bb20c71dfe3ae467e762b2dd7c7f95fb0679b
Approved by: Erlend Dahl <erlend.dahl@oracle.com>
Approved by: Erlend Dahl <erlend.dahl@oracle.com>
Approved by: Erlend Dahl <erlend.dahl@oracle.com>
…is repeated

Follow-up patch. Build breakage on OpenSUSE 15 and Ubuntu 20.04 when
compiling with GCC 9.

AsyncFile.cpp:1172:52: error: use of deleted function 'AsyncFile::odirect_set_log_state::odirect_set_log_state()'
 1172 |     AsyncFile::odirect_set_log_bp[FsOpenReq::BP_MAX];
      |                                                    ^
In file included from AsyncFile.cpp:32:
AsyncFile.hpp:161:10: note: 'AsyncFile::odirect_set_log_state::odirect_set_log_state()' is implicitly deleted because the default definition would be ill-formed:
  161 |   struct odirect_set_log_state {
      |          ^~~~~~~~~~~~~~~~~~~~~
AsyncFile.hpp:161:10: error: use of deleted function 'std::atomic<_Tp>::atomic() [with _Tp = NDB_TICKS]'
In file included from AsyncFile.hpp:29,
                 from AsyncFile.cpp:32:
/usr/include/c++/9/atomic:198:7: note: 'std::atomic<_Tp>::atomic() noexcept [with _Tp = NDB_TICKS]' is implicitly deleted because its exception-specification does not match the implicit exception-specification ''
  198 |       atomic() noexcept = default;
      |       ^~~~~~

Change-Id: I111c041825f193dec145bec00dd6428c554e6785
(cherry picked from commit 5e99e2ddd92874fcba6b6bcd8f4ddc6254baa1d9)
When sending to another node and not all the data could be sent,
it is important to make sure the rest gets sent anyway. Currently this
relies on the wakeup timer in the send thread. However, this might
delay the next send on a ready transporter by as much as 1 millisecond,
and possibly even 10 milliseconds, depending on how the OS handles
timer interrupts.

A better approach is to immediately wake up the responsible send
thread and have this thread take care of the sending, thus
ensuring the data is sent as soon as possible by the send thread.
@mronstro
Author

The patch is against MySQL 8.0; it was targeted at mysql-trunk by mistake.

@mysql-oca-bot

Hi, thank you for your contribution. Please confirm this code is submitted under the terms of the OCA (Oracle's Contribution Agreement) you have previously signed by cutting and pasting the following text as a comment:
"I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it."
Thanks

@mronstro
Author

I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it."

@mysql-oca-bot

Hi, thank you for your contribution. Your code has been assigned to an internal queue. Please follow
bug http://bugs.mysql.com/bug.php?id=115614 for updates.
Thanks
