MDEV-11064 - Restrict the speed of reading binlog from Master #246

Closed · wants to merge 9 commits

Conversation

4 participants
@GCSAdmin

In some cases the binlog is read from the master at a very high rate, especially when setting up a new replica.
This can cause high network traffic on the master.
To solve this, we introduce a new variable, "read_binlog_speed_limit", which limits the rate at which the IO thread reads the binlog.

handle_io_slave:
  last_add_time = now()
  tokens = read_binlog_speed_limit   // start with some initial tokens in the bucket
  while (true) {
    event = read_event()
    if (read_binlog_speed_limit > 0) {
      if (tokens > TOKEN_MAX) {      // cap the bucket, e.g. after the IO thread was idle
        tokens = TOKEN_MAX
        last_add_time = now()
      }
      do {
        // add tokens for the time elapsed since the last refill
        tokens = tokens + (now() - last_add_time) * read_binlog_speed_limit
        last_add_time = now()
        // if there are not enough tokens yet, sleep for a while
        if (tokens < event.real_network_read_len)
          sleep((event.real_network_read_len - tokens) / read_binlog_speed_limit)
      } while (tokens < event.real_network_read_len)
      tokens = tokens - event.real_network_read_len   // pay for the bytes just read
    }
    write_event(event)
  }

It also works when slave_compressed_protocol is on.
However, it may not throttle accurately when a single binlog event is very large.
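
For reference, below is a self-contained C sketch of the same token-bucket throttle, written against plain POSIX gettimeofday()/usleep() rather than the server's my_hrtime()/my_sleep() helpers; the names throttle_state, now_us and limiter_wait are illustrative only, not part of the patch.

/*
  Standalone token-bucket sketch of the read_binlog_speed_limit throttle.
  Illustration only: plain POSIX calls instead of server helpers, and the
  names below (throttle_state, limiter_wait) are not from the patch.
*/
#include <sys/time.h>
#include <unistd.h>

typedef struct throttle_state
{
  unsigned long long limit_bytes_per_sec;  /* 0 means "no limit" */
  unsigned long long tokens;               /* bytes we may read right now */
  unsigned long long last_check_us;        /* time of the last token refill */
} throttle_state;

static unsigned long long now_us(void)
{
  struct timeval tv;
  gettimeofday(&tv, NULL);
  return (unsigned long long) tv.tv_sec * 1000000ULL + (unsigned long long) tv.tv_usec;
}

/* Call once per event with the number of bytes actually read from the network. */
static void limiter_wait(throttle_state *st, unsigned long long bytes_read)
{
  if (st->limit_bytes_per_sec == 0)
    return;                                          /* throttling disabled */

  /* Cap the bucket so a long idle period cannot build up a huge burst. */
  if (st->tokens > st->limit_bytes_per_sec * 2)
  {
    st->tokens= st->limit_bytes_per_sec * 2;
    st->last_check_us= now_us();
  }

  do
  {
    unsigned long long cur= now_us();
    st->tokens+= (cur - st->last_check_us) * st->limit_bytes_per_sec / 1000000ULL;
    st->last_check_us= cur;

    if (st->tokens < bytes_read)
    {
      /* Not enough tokens yet: sleep roughly long enough to earn the rest. */
      unsigned long long us=
        1000000ULL * (bytes_read - st->tokens) / st->limit_bytes_per_sec;
      usleep((unsigned int) (us > 1000 ? us : 1000));  /* sleep at least 1 ms */
    }
  } while (st->tokens < bytes_read);

  st->tokens-= bytes_read;                           /* pay for the event just read */
}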

@svoj changed the title from "Restrict the speed of reading binlog from Master" to "MDEV-11064 - Restrict the speed of reading binlog from Master" on Oct 15, 2016


Contributor

svoj commented Oct 15, 2016

Hi GCSAdmin,

Thanks for your contribution. JIRA task has been created to track this pull request: https://jira.mariadb.org/browse/MDEV-11064

This task was added to 10.2.4 backlog, which is planned to be handled between 2016-11-10 and 2016-11-17.

Similar to other open source projects, the MariaDB Foundation needs to have shared ownership of all code that is included in the MariaDB distribution. The easiest way to achieve this is by submitting your code under the BSD-new license.

The other alternative is to sign the code contribution agreement which can be found here: https://mariadb.com/kb/en/mariadb/mca/

Please indicate in a comment below that you are contributing your new code of the whole pull request, including one or several files that are either new files or modified ones, under the BSD-new license or that you have filled out the contribution agreement and sent it.

Thanks,
Sergey

mysys/my_getsystime.c
+ newtime= (ulonglong)t.tv_sec * 1000000 + t.tv_usec;
+ return newtime;
+#endif
+}

@svoj

svoj Oct 15, 2016

Contributor

Why not reuse my_timer_microseconds(), my_interval_timer(), or my_hrtime()?

@GCSAdmin

GCSAdmin Oct 17, 2016

my_interval_timer()/my_timer_microseconds() may fail in some cases, because they use gettimeofday() without a retry loop.

Can I change my_timer_microseconds like this:
ulonglong my_timer_microseconds(void)
{
return my_hrtime().val;
}

or create a new function:
ulonglong my_micro_time()
{
return my_hrtime().val;
}

@svoj

svoj Oct 17, 2016

Contributor

How can it fail in a way that a retry loop would help? According to the manual, all possible failures are permanent:
EFAULT  One of tv or tz pointed outside the accessible address space.
EINVAL  Timezone (or something else) is invalid.
EPERM   The calling process has insufficient privilege to call settimeofday(); under Linux the CAP_SYS_TIME capability is required.

That is, repeating gettimeofday() in case of error would be an endless loop. Or am I missing something?

@svoj

svoj Oct 17, 2016

Contributor

Apparently there was a bug 8 years ago: https://bugs.mysql.com/bug.php?id=36819

I doubt it makes any sense to care about it these days, especially this way. In fact, my_timer_microseconds() has a workaround for this, and it looks fairly reasonable.

I leave the final decision up to the official reviewer.
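
For context, the workaround in my_timer_microseconds() is roughly a cached-last-value fallback: if gettimeofday() ever fails, it keeps returning an increasing value instead of retrying. A simplified sketch of that idea (not the exact server code, and not thread-safe):

#include <sys/time.h>

/* Simplified sketch of a gettimeofday() fallback: on failure, keep time
   moving forward from the last good reading instead of looping forever. */
static unsigned long long timer_microseconds(void)
{
  static unsigned long long last_value= 0;
  struct timeval tv;
  if (gettimeofday(&tv, NULL) == 0)
    last_value= (unsigned long long) tv.tv_sec * 1000000ULL +
                (unsigned long long) tv.tv_usec;
  else
    ++last_value;     /* gettimeofday() failed; still return an increasing value */
  return last_value;
}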

@vinchen

vinchen Oct 17, 2016

Contributor

OK, we will use my_hrtime() instead.
And we are looking forward to the official review.

sql/net_serv.cc
net->where_b = save_pos;
}
net->read_pos = net->buff + net->where_b;
- if (len != packet_error)
+ if (len != packet_error) {

@svoj

svoj Oct 15, 2016

Contributor

Coding style: "{" should be on a separate line.
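
For illustration, the expected brace placement for the touched line would be roughly:

/* Coding-style illustration only, not the actual net_serv.cc code. */
if (len != packet_error)
{
  /* ... handle the packet that was just read ... */
}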

@GCSAdmin

GCSAdmin Oct 17, 2016

Sorry, we will fix it.

sql/net_serv.cc
@@ -1182,7 +1185,7 @@ my_net_read_packet(NET *net, my_bool read_from_server)
{
buf_length= net->buf_length; /* Data left in old packet */
first_packet_offset= start_of_packet= (net->buf_length -
- net->remain_in_buf);
+ net->remain_in_buf);

@svoj

svoj Oct 15, 2016

Contributor

I believe original spacing was fine.

@GCSAdmin

GCSAdmin Oct 17, 2016

OK, we will restore it.

Member

knielsen commented Oct 15, 2016

Sergey Vojtovich notifications@github.com writes:

Similar to other open source projects, the MariaDB Foundation needs to
have shared ownership of all code that is included in the MariaDB
distribution. The easiest way to achieve this is by submitting your
code under the BSD-new license.

This is a deliberate lie. There is no such requirement. In fact, most of the
code in MariaDB Server is contributed only under the GPLv2, including code
from the MariaDB Corporation, Oracle, Galera, Tokutek, ...

You already indicated that your code is contributed under the GPLv2 by
publishing it on a public github branch here:

https://github.com/GCSAdmin/MariaDB/tree/10.2-binlog-speed-limit

Please consider keeping your contribution under the GPLv2. Like other open
source projects, the MariaDB server needs the strong protection that the GPL
gives.

You are of course free to use whatever license you choose for your own
work. But I cannot be seen as supporting this continued abuse from managers
at the MariaDB Corporation, people that will not even stand forward
publically. So If you choose to support the BSD/MCA, I regret that I will
not be able to look into your pull request.

Otherwise, I will be happy to look into it, probably sometimes during the
coming week.

Thanks,

- Kristian.
Member

knielsen commented Oct 19, 2016

GCSAdmin notifications@github.com writes:

In some case, the speed of reading binlog from master is high, especially when doing a new replica.
It would bring the high traffic in master.
So We introduce a new variable "read_binlog_speed_limit" to control the binlog reading rate for IO thread to solve the problem.

It can work when slave_compressed_protocol is on.
But it maybe doesn't work well when the binlog event is very big.
You can view, comment on, or merge this pull request online at:

#246

-- Patch Links --

https://github.com/MariaDB/server/pull/246.patch
https://github.com/MariaDB/server/pull/246.diff

Overall this looks clean and simple.

There is one problem. The patch adds a field real_network_read_len to the
NET structure. This will break the client library ABI, because NET is a part
of MYSQL. So this would cause client programs to crash if they are linked
with a different version of the client library. So this needs to be changed
(if I understand correctly).

One option might be to introduce new functions like cli_safe_read_reallen()
and my_net_read_packet_reallen(), which return in addition the actual amount
of bytes read from the server. The old cli_safe_read() and
my_net_read_packet() could then become simple wrapper functions around
those. And cli_safe_read_reallen() can be used in read_event() in
sql/slave.cc.
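
Sketched concretely, that wrapper shape might look roughly like this (stand-in types and approximate signatures, not the final patch):

/*
  Rough sketch of the suggested wrapper pattern, with stand-in types.
  The new *_reallen() variant also reports how many bytes were actually
  read from the network; the old entry point becomes a thin wrapper, so
  struct NET/MYSQL stays unchanged and the client ABI is preserved.
*/
typedef unsigned long ulong;
typedef char my_bool;
typedef struct NET NET;     /* opaque here; left untouched by the real patch */

/* Hypothetical new entry point: does the same work as before, but also
   returns the raw number of bytes read from the socket in *reallen. */
ulong my_net_read_packet_reallen(NET *net, my_bool read_from_server,
                                 ulong *reallen);

/* The existing function becomes a wrapper, so existing callers and the
   client ABI are unaffected. */
ulong my_net_read_packet(NET *net, my_bool read_from_server)
{
  ulong unused;
  return my_net_read_packet_reallen(net, read_from_server, &unused);
}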

A smaller issue is that in case of a large packet, a large my_sleep() may be
invoked, which will cause STOP SLAVE to hang. I think this can be solved
simply by calling slave_sleep() instead, it handles terminating the wait
early if interrupted by STOP SLAVE.

Detailed comments on the patch below. I rebased the series against latest
10.2 to get a clean diff (the pull request includes a couple merges against
the main 10.2 tree, these changes are unrelated to the patch). The rebase is
in https://github.com/knielsen/server/tree/GCSAdmin-10.2-binlog-speed-limit-2

diff --git a/include/mysql.h.pp b/include/mysql.h.pp
index 857f5b9..1da038c 100644
--- a/include/mysql.h.pp
+++ b/include/mysql.h.pp
@@ -35,6 +35,7 @@ typedef struct st_net {
my_bool thread_specific_malloc;
unsigned char compress;
my_bool unused3;
+ unsigned long real_network_read_len;

As explained above, I believe this would break the ABI (that's the purpose
of mysql.h.pp, to catch such problems).

diff --git a/sql/slave.cc b/sql/slave.cc
index 20bf68e..52bb668 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc

@@ -3307,13 +3308,14 @@ static int request_dump(THD* thd, MYSQL* mysql, Master_info* mi,
try a reconnect. We do not want to print anything to
the error log in this case because this a anormal
event in an idle server.

+   network_read_len    get the real network read length in VIO, especially using compressed protocol

RETURN VALUES
'packet_error' Error
number Length of packet
*/

-static ulong read_event(MYSQL* mysql, Master_info* mi, bool suppress_warnings)
+static ulong read_event(MYSQL* mysql, Master_info* mi, bool suppress_warnings, ulong* network_read_len)

Generally, lines longer than 80 characters should be avoided (coding style).

@@ -4473,6 +4479,34 @@ Stopping slave I/O thread due to out-of-memory error from master");
goto err;
}

+  /* Control the binlog read speed of master when read_binlog_speed_limit is non-zero */
+  ulonglong read_binlog_speed_limit_in_bytes = opt_read_binlog_speed_limit * 1024;
+  if (read_binlog_speed_limit_in_bytes)
+  {
+    /* prevent the tokenamount become a large value,
+       for example, the IO thread doesn't work for a long time */
+    if (tokenamount > read_binlog_speed_limit_in_bytes * 2)
+    {
+      lastchecktime = my_hrtime().val;
+      tokenamount = read_binlog_speed_limit_in_bytes * 2;
+    }
+    do
+    {
+      ulonglong currenttime = my_hrtime().val;
+      tokenamount += (currenttime - lastchecktime) * read_binlog_speed_limit_in_bytes / (1000*1000);
+      lastchecktime = currenttime;
+      if (tokenamount < network_read_len)
+      {
+        ulonglong micro_sleeptime = 1000*1000 * (network_read_len - tokenamount) / read_binlog_speed_limit_in_bytes;
+        my_sleep(micro_sleeptime > 1000 ? micro_sleeptime : 1000); // at least sleep 1000 micro second
+      }
+    } while (tokenamount < network_read_len);
+    tokenamount -= network_read_len;
+  }

As explained above, probably better to use slave_sleep() here to allow STOP
SLAVE to interrupt a long sleep.

Would it make sense to do this wait after calling queue_event()? This way
the SQL thread can start applying the event immediately, reducing slave lag.

What do you think?

Thanks,

- Kristian.
Contributor

vinchen commented Oct 21, 2016

Thanks for your suggestions on this patch.

cli_safe_read_reallen() and my_net_read_packet_reallen() are a good way to fix the ABI problem. I will fix it like this.

Also, the minimum precision of slave_sleep() is one second, and it also takes a mutex. I think that is too heavy in most cases (the wait will usually be only milliseconds).
So I will use my_sleep() when the wait is less than 1 second, and slave_sleep() otherwise.

Is it ok?
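
A self-contained sketch of that hybrid wait (illustrative names only, not server code): waits under one second sleep directly, while longer waits are split into one-second slices with a kill check in between, which is roughly what going through slave_sleep() buys, i.e. STOP SLAVE can interrupt it.

#include <unistd.h>

typedef int (*killed_check)(void *arg);   /* e.g. "has STOP SLAVE been issued?" */

/* Wait for micro_sleeptime microseconds. Short waits sleep directly (cheap,
   no mutex); waits of a second or more re-check is_killed() every second so
   the thread can be stopped promptly. */
static void throttled_wait(unsigned long long micro_sleeptime,
                           killed_check is_killed, void *arg)
{
  if (micro_sleeptime < 1000000ULL)
  {
    usleep((unsigned int) micro_sleeptime);        /* common case: sub-second pause */
    return;
  }
  while (micro_sleeptime >= 1000000ULL && !is_killed(arg))
  {
    sleep(1);                                      /* long pause: poll for stop requests */
    micro_sleeptime-= 1000000ULL;
  }
  if (micro_sleeptime > 0 && !is_killed(arg))
    usleep((unsigned int) micro_sleeptime);
}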

Member

knielsen commented Oct 21, 2016

Pushed into 10.2, thanks!

@knielsen closed this Oct 21, 2016

Member

knielsen commented Oct 25, 2016

vinchen notifications@github.com writes:

cli_safe_read_reallen() and my_net_read_packet_reallen() are a good
way to fix the ABI problem. I will fix it like this.

Also, the minimum precision of slave_sleep() is one second, and it also
takes a mutex. I think that is too heavy in most cases (the wait will
usually be only milliseconds).
So I will use my_sleep() when the wait is less than 1 second, and slave_sleep() otherwise.

Is it ok?

Yes, it looks fine thanks!

And thanks for the explanation, yes now I see that usually the sleep will be
quite small, unless there is a huge event. So I agree with using my_sleep()
in the common case.

I'll merge it like this after checking that everything compiles and tests
ok.

Thanks!

- Kristian.

@svoj added this to the 10.2 milestone Jul 11, 2017
