Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reusing connections in a transaction state (?) #132

Open
sgoudelis opened this issue May 17, 2018 · 5 comments
Open

reusing connections in a transaction state (?) #132

sgoudelis opened this issue May 17, 2018 · 5 comments

Comments

@sgoudelis
Copy link

sgoudelis commented May 17, 2018

Good morning,

I am using:

PyPy 5.10.0
txredisapi 1.4.4
Twisted 17.5.0

When under load (lots of redis traffic) there are occasions where, when doing a burst of SCANs to get a list of keys from Redis, instead of getting back cursor information and data, we get the string "QUEUED". From what I understand so far the string "QUEUED" is a reply from Redis when I am issuing commands into a transaction. I am using transactions to periodically update a group of Redis keys asynchronously.

We are using makeUnixConnection() with a pool size of 20.

Here is how I am doing SCANs:

    while not endoflist:
        try:
            result = yield globalredisconnection.scan(cursor, pattern, limit)

            cursor = int(result[0])
            keys += result[1]

            if cursor == 0:
                # reached end of list
                endoflist = True
        except Exception as e:
            logger.error("Something went wrong while doing SCAN (%s): %s" % (result, e))
            logger.exception(e)

And here is a sample print out of the error occurring:

2018-05-11 01:28:34,278 ERROR: Something went wrong while doing SCAN (QUEUED): invalid literal for int() with base 10: 'Q'
2018-05-11 01:28:34,278 ERROR: invalid literal for int() with base 10: 'Q'
Traceback (most recent call last):
  File "/opt/backend/backend.py", line 3187, in getKeyList
    cursor = int(result[0])
ValueError: invalid literal for int() with base 10: 'Q'

Can anyone help to shed some light onto this issue ?

@IlyaSkriblovsky
Copy link
Owner

This certainly seems like a bug in txredisapi. Looks like when you call scan(), it takes a connection from the pool, but this connection is within a transaction. If I recall design correctly, connections that are in transaction state should not be available to get from the pool. So we need to understand how transactioned connection might get back in the pool before transaction is completed.

From a quick glance on the source code, I haven't found any clues. But txredisapi's implementation of transactions is far from ideal, so it is easy to miss something...

Do you use .pipeline() or .unwatch()? If your code using transaction is short and may be published, can you please post it here?

@sgoudelis
Copy link
Author

We use .pipeline() quite a bit in our code. We also use .multi() far less than .pipeline() but still periodically. We do not use .unwatch().

Here is a sample code using .multi()

    try:
        keylist = yield getKeyList("mykey:%s:%s:*" % (var1, var2))

        if len(keylist) > 0:
            # starting a transaction
            transaction = yield globalredisconnection.multi()

            for fetchedkey in keylist:
                transaction.delete(fetchedkey)

            deleteall = yield transaction.commit()

    except Exception as e:
        logger.error("Something went wrong while deleting keys from redis (%s)" % e)
        logger.exception(e)

@IlyaSkriblovsky
Copy link
Owner

IlyaSkriblovsky commented May 17, 2018

I still can't find the bug that makes connection that is currently inside transaction to be available in the pool :( Seems like it somehow gets back to RedisFactory.connectionQueue before transaction is committed or rolled back 😕

BTW, in this particular code you don't need transaction at all — you may just call globalredisconnection.delete(keylist). .delete() (as well as some other methods) supports list of keys to be passed to it.

BTW 2: .pipeline() is almost never needed when using txredisapi because all queries are implicitly pipelined due to async nature of Twisted if you don't use yield on them. These two snippets have almost identical effects:

pipe = yield redis.pipeline()
pipe.op1(args1)
pipe.op2(args2)
pipe.op3(args3)
results = yield pipe.execute_pipeline()
results = yield defer.gatherResults([
    redis.op1(args1),
    redis.op2(args2),
    redis.op3(args3)
])

But using .pipeline() can have negative effect: if you call samething that can take some time between .pipeline() and .execute_pipeline(), connection will not be available in the pool that may be suboptimal.

pipe = redis.pipeline()
pipe.op1(args1)
yield treq.get("http://...")   # ← the connection to redis will not be available to another code while request is in progress
pipe.op2(args2)
yield redis.execute_pipeline()

My strong opinion is that pipelines should not have been implemented in txredisapi at all. I think this is the result of misconception about how async network calls is different from sync ones :)

@sgoudelis
Copy link
Author

Thank you very much for your effort and time. I will look into this further. Regarding you comments about pipelining. I have a different opinion although not as educated as yours. Reading this https://redis.io/topics/pipelining specifically where it says:

$ (printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379
+PONG
+PONG
+PONG

I see benefit in grouping commands (in some cases) in that way due to the reduced RTT. It is also stated below. I do not know how this library is implemented. Is it using something this like this ? How is pipelining implement in this library ?

Thank you again for you time

@IlyaSkriblovsky
Copy link
Owner

IlyaSkriblovsky commented May 17, 2018

Since txredisapi is based on Twisted that is all about asynchrony, you may choose yourself whether you want to stop and wait for the end of operation or want to do something else while it is in progress.

If you do:

redis.set('a', 1)
redis.set('b', 2)
redis.set('c', 3)

then all three commands will be actually sent simultaneously, without waiting for each other's responses. When interpeter hits set('b', 2) call, it doesn't yet received response to the set('a', 1). This is not like when you use synchronous redis-py library.

But if you actually want to wait for the previous operation result before sending the next one, Twisted gives you the very convenient @inlineCallbacks:

@inlineCallbacks
def f():
    yield redis.set('a', 1)
    yield redis.set('b', 2)

yield makes the execution to pause on the first set and continue to run code of f() only after response is received (actually, execution is not stopped completely, but instead Twisted's reactor switches to running another coroutines).

So, pipelining Redis requests is fantastic thing and one will only receive a little fraction of Redis performance without it. But with Twisted you can receive pipelining for free without special support from txredisapi, just by not waiting on responses of previous commands before sending next ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants