Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race Condition with multiple Databases #20

Closed
bvdeenen opened this issue Feb 1, 2012 · 10 comments
Closed

Race Condition with multiple Databases #20

bvdeenen opened this issue Feb 1, 2012 · 10 comments
Assignees

Comments

@bvdeenen
Copy link
Collaborator

bvdeenen commented Feb 1, 2012

Hi all

I'm new here, but I seem to have stumbled upon a race condition where simultaneously executing queries on a certain pool end up using a connection that actually belongs to another pool. I've narrowed it down to a simple testcase using two sql tables in two database

mysqldump -udemo -pdemo test2
CREATE TABLE `test` ( `b` int(11) NOT NULL )
mysqldump -udemo -pdemo test1
CREATE TABLE `test` ( `a` int(11) NOT NULL )

I this erl file:

-module(em).

-include_lib("eunit/include/eunit.hrl").
-include_lib("emysql/include/emysql.hrl").

-compile(export_all).

t_test() ->

    crypto:start(),
    application:start(emysql),

    try emysql:remove_pool(test1) catch _:_ -> ok end,
    try emysql:remove_pool(test2) catch _:_ -> ok end,
    try emysql:remove_pool(test3) catch _:_ -> ok end,

    emysql:add_pool(test1, 1, "demo", "demo", "localhost", 3306, "test1", utf8),
    emysql:add_pool(test2, 1, "demo", "demo", "localhost", 3306, "test2", utf8),
    emysql:add_pool(test3, 1, "demo", "demo", "localhost", 3306, "test1", utf8),

    ?assertMatch( #result_packet{}, emysql:execute(test1, "select a from test")),

    ?assertMatch( #result_packet{}, emysql:execute(test2, "select b from test")),

    ?assertMatch( #result_packet{}, emysql:execute(test3, "select a from test")),

    F=fun() ->
        timer:sleep(100+random:uniform(500)),
        ?assertMatch( #result_packet{}, emysql:execute(test2, "select b from test")),
        ?assertMatch( #result_packet{}, emysql:execute(test3, "select a from test")),
        ok
    end,
    [spawn(F) || _ <- lists:seq(1,5)],
    timer:sleep(1000),
    ok.

Running em:test() with values of 1 for the nconnections in the emysql:add_pool(...) causes an assertMatch failure:

=ERROR REPORT==== 1-Feb-2012::14:53:06 ===
Error in process <0.10468.0> with exit value: {{assertMatch_failed,[{module,em},{line,29},{expression,"emysql : 
execute ( test2 , \"select b from test\" )"},{pattern,"# result_packet { }"},
{value,{error_packet,1,1054,<<5 bytes>>,"Unknown column 'b' in 'field list'"}}]},[{em,'-t_test/0-fun-3-'... 

From other tests in more complex situations with actual data in the columns, I know that it was actually talking on another connection than the one specified with add_pool. I also notice that when the error occurs, if you do a emysql_conn_mgr:pools() call, the order of the pools is changed from the initial order.

If you increase the connections in the add_pool call, the whole thing works fine and you get no errors.

@ghost ghost assigned hdiedrich Feb 1, 2012
@Eonblast
Copy link
Owner

Eonblast commented Feb 1, 2012

Thanks a lot, can you say this in other words:

" I know that it was actually talking on another connection than the one specified with add_pool. I also notice that when the error occurs, if you do a emysql_conn_mgr:pools() call, the order of the pools is changed from the initial order."

I am not sure I understand what you mean.

Regardless, thanks for the error reproduction code, I'll check it out.

Henning

@bvdeenen
Copy link
Collaborator Author

bvdeenen commented Feb 1, 2012

I found this bug in quite complex production code, where upon getting the error (field so-and-so not defined), I finally did a query "select * from preferences" in my production code, and found that I was talking to the 'preferences' table on the WRONG database (that table actually had a different schema).

Calling emysql_conn_mgr:pools() shows a different order of pools when the error occurs, than after the initial initialization of the pools with emysql:add_pool().

@bvdeenen
Copy link
Collaborator Author

bvdeenen commented Feb 1, 2012

It seems I've stumbled upon a bug in the pool management mechanism, where if you run into limits of the available connections, you might actually end up at the wrong poolname.

If for instance I increase the poolsize parameter to 20 in the add_pool() call, but increase the size of my spawned functions list to 50, I trigger the same bug.

So I've changed the title, because it has nothing to do with a specific connection size of 1 in emysql:add_pool()

@Eonblast
Copy link
Owner

Eonblast commented Feb 1, 2012

I found it. You're probably using Emysql in an unforeseen way, surprising as that may sound. I will research that out of curiosity, if that was not envisioned or broke somewhere along the way.

We added automatic connection recovery a year ago and a lot of things had to be moved around for this and some things broke.

But the fundamental architecture is that there is only one wait queue for all pools. This seems wrong.

Are you in a hurry? There is an obvious quick fix, if I am right. The error probably happens in emysql_conn_mgr.erl, in pass_on_or_queue_as_available the line {{value, Pid}, Waiting1} = queue:out(Waiting) should check for the PoolId of the queued execution process.

But I am interested in getting it right and adding queues per pool, as I think it should be.

Why this is as it is, I had a conversation with Nick once where I remember I was surprised about his answer as to why they had multiple pools of multiple connections in the first place. Maybe they originally solved the problem of broken connections by adding new pools.

So the actual headline might be something about connections running out when pools go to different databases.

Anyway, good find, thanks.

@Eonblast
Copy link
Owner

Eonblast commented Feb 1, 2012

Could you check out branch issue20 and give it a try?

git fetch
git checkout issue20

This is a verbose commit that should help pinpointing any remaining issues.

Here is the changelog if you're interested: 88a9bb6

Thanks,
Henning

@Eonblast
Copy link
Owner

Eonblast commented Feb 1, 2012

The bug was not fixed in place but an architectural anomaly straightened out:

the execution process wait queue mechanism has been changed to have one queue per pool, instead of one query per connection manager. Reason for this architecture may have been that it was projected to have multiple connection managers, one per pool.

Fixed with 88a9bb6 (tracing source).

@Eonblast Eonblast closed this as completed Feb 1, 2012
hdiedrich pushed a commit that referenced this issue Feb 1, 2012
hdiedrich pushed a commit that referenced this issue Feb 1, 2012
hdiedrich pushed a commit that referenced this issue Feb 1, 2012
@Eonblast
Copy link
Owner

Eonblast commented Feb 1, 2012

Added common test suite to cover this issue.

See README for databases needed for the test.

Run the test from root with make test2.

@bvdeenen
Copy link
Collaborator Author

bvdeenen commented Feb 1, 2012

Wow that was quick!
I'll give it a try tomorrow. Thanks for the good work :-)

@bvdeenen
Copy link
Collaborator Author

bvdeenen commented Feb 2, 2012

Ok it works great! Thanks for the very quick work.

@Eonblast
Copy link
Owner

Eonblast commented Feb 2, 2012

Welcome!

Since I saw you are using eunit, maybe you are curious to check out the pool_SUITE I added in test/, which is your test code but worked for Common Test, using plain Erlang matches to test.

Cheers,
Henning

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants