Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MyCat1.4 AIO meet MySQL errno 1156 Got packets out of order #700

Open
ZzzCrazyPig opened this issue Jan 4, 2016 · 16 comments
Open

MyCat1.4 AIO meet MySQL errno 1156 Got packets out of order #700

ZzzCrazyPig opened this issue Jan 4, 2016 · 16 comments

Comments

@ZzzCrazyPig
Copy link
Contributor

#1. 错误描述

在对MyCat1.4版本AIO网络模型并发测试下,发现经常发生Mysql errno 1156 Got packets out of order

以下是MyCat日志报告信息:

WARN [$_AIO3] (SingleNodeHandler.java:222) -execute  sql err : errno:1156 Got packets out of order con:MySQLConnection [id=19, lastTime=1451891972458, user=root, schema=test, old shema=test, borrowed=true, fromSlaveDB=false, threadId=205, charset=utf8, txIsolation=3, autocommit=true, attachment=dn1{SHOW WARNINGS}, respHandler=SingleNodeHandler [node=dn1{SHOW WARNINGS}, packetId=1], host=localhost, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false] frontend host:127.0.0.1/63900/root

#2. 错误复现

2.1 测试环境

  • win7 64位
  • jdk7 64位
  • mysql 5.6

2.2 测试表结构

mysql> use test;
Database changed
mysql> desc employee;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type | Null | Key | Default | Extra  |
+---------+--------------+------+-----+---------+----------------+
| id  | int(11)  | NO   | PRI | NULL| auto_increment |
| name| varchar(20)  | YES  | | NULL||
| address | varchar(255) | YES  | | NULL||
| phone   | varchar(50)  | YES  | | NULL||
+---------+--------------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

2.3 MyCat配置

(1) schema.xml配置

<?xml version="1.0"?>
<!DOCTYPE mycat:schema SYSTEM "schema.dtd">
<mycat:schema xmlns:mycat="http://org.opencloudb/">

    <schema name="testdb" checkSQLschema="false" sqlMaxLimit="100">
        <table name="employee" primaryKey="id" autoIncrement="true" dataNode="dn1" />
    </schema>

    <dataNode name="dn1" dataHost="localhost" database="test" />


    <dataHost name="localhost" maxCon="1000" minCon="5" balance="0" writeType="0" dbType="mysql" dbDriver="native">
        <heartbeat>select user()</heartbeat>
        <writeHost host="hostM1" url="localhost:3306" user="root" password="mysql">
        </writeHost>
    </dataHost>

</mycat:schema>

(2) server.xml配置

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE mycat:server SYSTEM "server.dtd">
<mycat:server xmlns:mycat="http://org.opencloudb/">
    <system>
        <property name="defaultSqlParser">druidparser</property>
        <property name="serverPort">8066</property>
        <property name="managerPort">9066</property>
        <property name="usingAIO">1</property>
    </system>
    <user name="root">
        <property name="password">mysql</property>
        <property name="schemas">testdb</property>
    </user>


</mycat:server>

(3) sequence_conf.properties配置

#default global sequence
GLOBAL.HISIDS=
GLOBAL.MINID=10001
GLOBAL.MAXID=20000
GLOBAL.CURID=10000

# self define sequence
COMPANY.HISIDS=
COMPANY.MINID=1001
COMPANY.MAXID=2000
COMPANY.CURID=1000

CUSTOMER.HISIDS=
CUSTOMER.MINID=1001
CUSTOMER.MAXID=2000
CUSTOMER.CURID=1000

ORDER.HISIDS=
ORDER.MINID=1001
ORDER.MAXID=2000
ORDER.CURID=1000

HOTNEWS.HISIDS=
HOTNEWS.MINID=1001
HOTNEWS.MAXID=2000
HOTNEWS.CURID=1000

EMPLOYEE.HISIDS=
EMPLOYEE.MINID=100000000
EMPLOYEE.MAXID=200000000
EMPLOYEE.CURID=100099999

2.4 测试程序配置

测试程序下载地址 : https://github.com/ZzzCrazyPig/TestMyCatErr1156.git

(1) 测试程序配置随机sql

config目录 : randomSql.txt

[insert into employee(name,address,phone) values('unknown','unknown','unknown')]
[select * from employee where id = 1]
[insert into employee(name,address,phone) values('UNKNOWN','UNKNOWN','UNKNOWN')]
[select id,name from employee limit 1]
[insert into employee(name,address,phone) values('EMP','GZ','unknown')]
[update employee set address = 'UNKNOWN' where id = 4]
[select * from employee limit 5]
[update employee set address = 'UNKNOWN' where id = 1234]
[select id,name,address from employee limit 10]

(2) 配置TestMyCatErr1156.java启动参数示例:

-uroot -pmysql -Dtestdb -P8066 -c100 -t20

-u :  MyCat登录用户
-p : MyCat登录密码
-D : 需要连接的MyCat数据库
-P : MyCat端口
-c : 测试的并发访问数量,即一次有多少个sql语句同时通往MyCat
-t : 测试维护的线程池大小

**【注意】**测试程序一直循环测试,直到遇到错误码1156即停止测试

@songwie
Copy link
Contributor

songwie commented Jan 5, 2016

aio 不推荐使用了

@magicdoom
Copy link
Contributor

后续考虑废弃AIO

@magicdoom magicdoom reopened this May 22, 2016
@magicdoom
Copy link
Contributor

你可以考虑跟踪下具体原因,另外有同学反馈是原因就是同一个buffer被多个io线程同时写数据导致,你可以具体分析下

@ZzzCrazyPig
Copy link
Contributor Author

我们最终也是怀疑bufferpool里的buffer由于多线程原因导致buffer里面的数据乱掉,但是能力有限,没排查出来,但是我们测试过如果不通过bufferpool取buffer,而每次new bytebuffer来使用,那么是不会发生那个ERROR的。

@HashZhang
Copy link
Contributor

试试新版的bytebufferarena还有directbuffer

@magicdoom
Copy link
Contributor

每次new出来就失去了缓存池的意义了 应该是aio的实现有问题 导致同一个buffer被多个线程同时获取到了

@ZzzCrazyPig
Copy link
Contributor Author

恩 我懂 我们只是为了将问题排查尽量缩小范围,并发问题比较难排查,我们之前也在怀疑是不是DirectByteBuffer的问题,但是也没试验过。这个问题确实隔了太久,也从来没人提过遇到这个问题,并且这么多前辈都说用NIO,要考虑废弃AIO,所以我们这边也没出人力去debug这个问题了

@danielzheng
Copy link

最近大量碰到这个问题,抓包看回包内容是最近一次收到包的拷贝。怀疑也是buffer问题。

@huangyiminghappy
Copy link
Contributor

huangyiminghappy commented Mar 2, 2017

我们在对mycat压测过程中也发现了该问题,不过我们使用的是NIO,,buffer可能还是有问题:
2017-03-02 03:38:59.453 WARN [$_NIOREACTOR-1-RW] (io.mycat.backend.mysql.nio.handler.SingleNodeHandler.backConnectionErr(SingleNodeHandler.java:248)) - execute sql err : errno:1156 Got packets out of order con:MySQLConnection [id=795, lastTime=1488425939443, user=mycat_user, schema=xxx, old shema=xxx, borrowed=true, fromSlaveDB=false, threadId=1138, charset=utf8, txIsolation=3, autocommit=true, attachment=dn1{select * from xxx where id =59210245}, respHandler=SingleNodeHandler [node=dn1{select * from xxx where id =59210245}, packetId=1], host=xxxx, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false

@songwie
Copy link
Contributor

songwie commented Mar 2, 2017

最近发现的包乱序是,keepalived 心跳导致的,你们看看你们的对mycat的心跳是不是频率太大,或者有问题

@ZzzCrazyPig
Copy link
Contributor Author

@huangyiminghappy 你的版本是?这个issue是我在1.4版本里面发现的,当时换成NIO测试,没发现包乱序问题

@huangyiminghappy
Copy link
Contributor

@ZzzCrazyPig 我是1.6开发版

@huangyiminghappy
Copy link
Contributor

huangyiminghappy commented Mar 2, 2017

https://github.com/mysqljs/mysql/issues/1363 出现1156 Got packets out of order con有多种可能,有一种可能是mysq限制了server接受的数据包大小。
MySQL根据配置文件会限制server接受的数据包大小。
有时候大的插入和更新会被max_allowed_packet 参数限制掉,导致失败。
查看目前配置
show VARIABLES like '%max_allowed_packet%';
显示的结果为:

+--------------------+---------+
| Variable_name | Value |
+--------------------+---------+
| max_allowed_packet | 1048576 |
+--------------------+---------+

以上说明目前的配置是:1M
处理方式:
http://stackoverflow.com/questions/8062496/how-to-change-max-allowed-packet-size

但是,目前遇到的不是mysql限制的问题,我们配置的允许大小有1G

@huangyiminghappy
Copy link
Contributor

huangyiminghappy commented Mar 2, 2017

经过定位,发现是MySQL服务端响应客户端查询请求的第一阶段handleErrorPacket这里引发的错误,在这里收到mysql服务端发来的错误包:
@Override protected void handleData(byte[] data) { switch (resultStatus) { case RESULT_STATUS_INIT: switch (data[4]) { case OkPacket.FIELD_COUNT: handleOkPacket(data); break; case ErrorPacket.FIELD_COUNT: handleErrorPacket(data); break; case RequestFilePacket.FIELD_COUNT: handleRequestPacket(data); break; default: resultStatus = RESULT_STATUS_HEADER; header = data; fields = new ArrayList<byte[]>((int) ByteUtil.readLength(data, 4)); } break;

实际上确实冤枉,这简单的语句怎么会部分有错部分没错,id是某范围内随机的(select * from sbtest1 where id =71822855)
2017-03-02 08:40:52.665 WARN [$_NIOREACTOR-0-RW] (io.mycat.backend.mysql.nio.handler.SingleNodeHandler.backConnectionErr(SingleNodeHandler.java:253)) - execute sql err : errno:1156 Got packets out of order con:MySQLConnection [id=108, lastTime=1488444052648, user=mycat_user, schema=XXXX, old shema=XXXX, borrowed=true, fromSlaveDB=false, threadId=15222, charset=utf8, txIsolation=3, autocommit=true, attachment=dn1{select * from sbtest1 where id =71822855}, respHandler=SingleNodeHandler [node=dn1{select * from sbtest1 where id =71822855}, packetId=1], host=XXXX, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false] frontend host:XXXX/38672/XXX

@wangcr2015
Copy link
Contributor

wangcr2015 commented May 19, 2017

我在生产环境中的1.5release版本上,NIO,也偶尔有1156 Got packets out of order 的错误,另外,mycat貌似对MySQL的报文没有做拆包处理,所以mycat端能够处理的最大的报文是16M还是8M。不过生产上SQL都很短小,应该不是报文过长导致的1156error

magicdoom added a commit that referenced this issue May 25, 2017
此处在高并发情况下会存在并发问题, fixed #1072  极有可能解决了 #700
@funnyAnt
Copy link
Collaborator

funnyAnt commented Aug 24, 2019

@ZzzCrazyPig 根据你提供的代码已经找到问题所在,buffer被2次重复释放导致。 这个问题在1.6版本NIO 模式下,已在2018/9/7 修改全局序列db方式,生成的主键因为数据库压力 导致主键重复,sql超时导致串包问题 这个提交里面 修复

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants