sqs write_batch creates garbled messages #831

Open
sbailey opened this Issue Jun 20, 2012 · 11 comments

Comments

Projects
None yet
8 participants
@sbailey

sbailey commented Jun 20, 2012

When using sqs.queue.Queue.write_batch(), some of my message come back garbled. I do not have this problem with normal Queue.write().

boto version 2.5.1

Example code:

import boto.sqs
sqs = boto.sqs.connect_to_region('us-west-1')
q = sqs.create_queue('blat')

messages = list()
messages.append( (1, 'echo "hello"', 0) )
messages.append( (2, 'echo "hi"', 0) )
messages.append( (3, 'echo "goodbye"', 0) )
messages.append( (4, 'echo "done"', 0) )    #- Garbled return message
messages.append( (5, 'done', 0) )           #- Garbled return message
messages.append( (6, 'don', 0) )            #- OK
messages.append( (7, 'echo "hello"', 0) )

q.write_batch(messages)

while True:
    m = q.read()
    if m is not None:
        q.delete_message(m)
        print m.get_body()
    else:
        break

Results:

FF:qdo $ python test_boto.py 
echo "goodbye"
v??
echo "hello"
echo "hello"
y?hv??
echo "hi"
don

messages with "done" in the text come back garbled. In a separate test yesterday, messages with "goodbye" would fail but I can't reproduce that today.

@garnaat

This comment has been minimized.

Show comment
Hide comment
@garnaat

garnaat Jun 22, 2012

Member

I am able to reproduce this. I'm looking into it now and will report back when I find more information.

Member

garnaat commented Jun 22, 2012

I am able to reproduce this. I'm looking into it now and will report back when I find more information.

@pasc

This comment has been minimized.

Show comment
Hide comment
@pasc

pasc Feb 28, 2013

Contributor

This is happening because boto.sqs.queue.Queue.read() assumes that the messages are base 64 encoded, and tries to decode them as follows. When decoding fails, the original value is returned:

def decode(self, value):
    try:
        value = base64.b64decode(value)
    except:
        boto.log.warning('Unable to decode message')
        return value
    return value

Out of all the sample strings, only 'done' and 'echo "done"' are valid base-64 strings:

>>> base64.b64decode('echo "done"')
'y\xc8hv\x89\xde'
>>> base64.b64decode('done')
'v\x89\xde'

Which matches the decoded value is returned instead of the original value.

Fixing this is going to be a pain. If people are already using the existing API they are either encoding their values or sometimes losing messages. Maybe we should just update the documentation to indicate that these messages should be base-64 encoded already?

Contributor

pasc commented Feb 28, 2013

This is happening because boto.sqs.queue.Queue.read() assumes that the messages are base 64 encoded, and tries to decode them as follows. When decoding fails, the original value is returned:

def decode(self, value):
    try:
        value = base64.b64decode(value)
    except:
        boto.log.warning('Unable to decode message')
        return value
    return value

Out of all the sample strings, only 'done' and 'echo "done"' are valid base-64 strings:

>>> base64.b64decode('echo "done"')
'y\xc8hv\x89\xde'
>>> base64.b64decode('done')
'v\x89\xde'

Which matches the decoded value is returned instead of the original value.

Fixing this is going to be a pain. If people are already using the existing API they are either encoding their values or sometimes losing messages. Maybe we should just update the documentation to indicate that these messages should be base-64 encoded already?

@pasc

This comment has been minimized.

Show comment
Hide comment
@pasc

pasc Feb 28, 2013

Contributor

I have also just realised the same issue exists in the existing integration test as none of the test messages are valid base64-encoded messages. If a test is done with "This is message 100" instead of something like "This is message 1" then the same error will occur.

Contributor

pasc commented Feb 28, 2013

I have also just realised the same issue exists in the existing integration test as none of the test messages are valid base64-encoded messages. If a test is done with "This is message 100" instead of something like "This is message 1" then the same error will occur.

@garnaat

This comment has been minimized.

Show comment
Hide comment
@garnaat

garnaat Feb 28, 2013

Member

What's happening is that the default message class associated with a Queue is the Message class which always encodes the payload using base64 and then decodes it when read from the queue. Since the original messages are not written using a Message object, they are not encoded prior to writing to SQS. And then the base64 decode fails.

One way to get around this is to set the Queue message class to be RawMessage:

from boto.sqs.message import RawMessage
q.set_message_class(RawMessage)

Then, the messages will be read correctly. We could document that in the method.

Another way to fix it would be to do the base64 encoding in the batch write method. Or, we could require that the messages be base64 encoded and document that.

The base64 encoding is kind of an anachronism. In the early days of the SQS service, any non-ASCII characters in a message would cause problems so the recommendation was to base64-encode all messages. So, that became the default behavior of boto. Somewhere along the way, however, this changed on the service side. The range of acceptable characters is much more accommodating now and the default behavior of encoding the content probably no longer makes sense. But then we start worrying about backwards-compatibility. So, it remains.

I think the best solution is to document the fact that the messages are not being encoded in anyway by boto and that the user must remember to set the appropriate message class on the queue when reading the messages.

Member

garnaat commented Feb 28, 2013

What's happening is that the default message class associated with a Queue is the Message class which always encodes the payload using base64 and then decodes it when read from the queue. Since the original messages are not written using a Message object, they are not encoded prior to writing to SQS. And then the base64 decode fails.

One way to get around this is to set the Queue message class to be RawMessage:

from boto.sqs.message import RawMessage
q.set_message_class(RawMessage)

Then, the messages will be read correctly. We could document that in the method.

Another way to fix it would be to do the base64 encoding in the batch write method. Or, we could require that the messages be base64 encoded and document that.

The base64 encoding is kind of an anachronism. In the early days of the SQS service, any non-ASCII characters in a message would cause problems so the recommendation was to base64-encode all messages. So, that became the default behavior of boto. Somewhere along the way, however, this changed on the service side. The range of acceptable characters is much more accommodating now and the default behavior of encoding the content probably no longer makes sense. But then we start worrying about backwards-compatibility. So, it remains.

I think the best solution is to document the fact that the messages are not being encoded in anyway by boto and that the user must remember to set the appropriate message class on the queue when reading the messages.

@kopertop

This comment has been minimized.

Show comment
Hide comment
@kopertop

kopertop Feb 28, 2013

Member

I agree, it should be documented what the behavior is. I currently greatly rely on the fact that batch_write does NOT b64encode it's messages. I typically use raw messages everywhere, since they are required to handle SNS messages.


Sent from my iPhone
Chris Moyer

On Feb 28, 2013, at 8:35 AM, Mitch Garnaat notifications@github.com wrote:

What's happening is that the default message class associated with a Queue is the Message class which always encodes the payload using base64 and then decodes it when read from the queue. Since the original messages are not written using a Message object, they are not encoded prior to writing to SQS. And then the base64 decode fails.

One way to get around this is to set the Queue message class to be RawMessage:

from boto.sqs.message import RawMessage
q.set_message_class(RawMessage)
Then, the messages will be read correctly. We could document that in the method.

Another way to fix it would be to do the base64 encoding in the batch write method. Or, we could require that the messages be base64 encoded and document that.

The base64 encoding is kind of an anachronism. In the early days of the SQS service, any non-ASCII characters in a message would cause problems so the recommendation was to base64-encode all messages. So, that became the default behavior of boto. Somewhere along the way, however, this changed on the service side. The range of acceptable characters is much more accommodating now and the default behavior of encoding the content probably no longer makes sense. But then we start worrying about backwards-compatibility. So, it remains.

I think the best solution is to document the fact that the messages are not being encoded in anyway by boto and that the user must remember to set the appropriate message class on the queue when reading the messages.


Reply to this email directly or view it on GitHub.

Member

kopertop commented Feb 28, 2013

I agree, it should be documented what the behavior is. I currently greatly rely on the fact that batch_write does NOT b64encode it's messages. I typically use raw messages everywhere, since they are required to handle SNS messages.


Sent from my iPhone
Chris Moyer

On Feb 28, 2013, at 8:35 AM, Mitch Garnaat notifications@github.com wrote:

What's happening is that the default message class associated with a Queue is the Message class which always encodes the payload using base64 and then decodes it when read from the queue. Since the original messages are not written using a Message object, they are not encoded prior to writing to SQS. And then the base64 decode fails.

One way to get around this is to set the Queue message class to be RawMessage:

from boto.sqs.message import RawMessage
q.set_message_class(RawMessage)
Then, the messages will be read correctly. We could document that in the method.

Another way to fix it would be to do the base64 encoding in the batch write method. Or, we could require that the messages be base64 encoded and document that.

The base64 encoding is kind of an anachronism. In the early days of the SQS service, any non-ASCII characters in a message would cause problems so the recommendation was to base64-encode all messages. So, that became the default behavior of boto. Somewhere along the way, however, this changed on the service side. The range of acceptable characters is much more accommodating now and the default behavior of encoding the content probably no longer makes sense. But then we start worrying about backwards-compatibility. So, it remains.

I think the best solution is to document the fact that the messages are not being encoded in anyway by boto and that the user must remember to set the appropriate message class on the queue when reading the messages.


Reply to this email directly or view it on GitHub.

@sbailey

This comment has been minimized.

Show comment
Hide comment
@sbailey

sbailey Feb 28, 2013

From the user standpoint, the asymmetry between Queue.write() and Queue.batch_write() is confusing. Here's an updated example:

import boto.sqs
import base64
sqs = boto.sqs.connect_to_region('us-west-1')
q = sqs.create_queue('blat')

commands = list()
commands.append( 'echo "hello"' )
commands.append( 'echo "goodbye"' )
commands.append( 'echo "done"' )    #- Garbled return message
commands.append( 'done' )           #- Garbled return message
commands.append( 'don' )            #- OK
commands.append( 'echo "hello"' )

def read_queue(q):
    while True:
        m = q.read()
        if m is not None:
            q.delete_message(m)
            print m.get_body()
        else:
            break

print "Let's try using batch write"
batch = list()
for i, cmd in enumerate(commands):
    batch.append( (i+1, cmd, 0) )   #- This fails for some strings
    # batch.append( (i+1, base64.b64encode(cmd), 0) )  #- This works
q.write_batch(batch)
read_queue(q)

print "Let's try using single write"
for cmd in commands:
    q.write(q.new_message(cmd))  #- This works
read_queue(q)

The same string which works with q.write(q.new_message(cmd)) may not work when putting it in a list to pass to write_batch. Shouldn't write_batch just be a faster version of batch, with the same encoding requirements for both? And as a non-expert user, I would certainly prefer the default to "just work" on python strings of ASCII characters without requiring the user to do any special encoding/decoding/set_message_class stuff. But as mentioned above, whatever is done, the most important thing to to document it clearly.

sbailey commented Feb 28, 2013

From the user standpoint, the asymmetry between Queue.write() and Queue.batch_write() is confusing. Here's an updated example:

import boto.sqs
import base64
sqs = boto.sqs.connect_to_region('us-west-1')
q = sqs.create_queue('blat')

commands = list()
commands.append( 'echo "hello"' )
commands.append( 'echo "goodbye"' )
commands.append( 'echo "done"' )    #- Garbled return message
commands.append( 'done' )           #- Garbled return message
commands.append( 'don' )            #- OK
commands.append( 'echo "hello"' )

def read_queue(q):
    while True:
        m = q.read()
        if m is not None:
            q.delete_message(m)
            print m.get_body()
        else:
            break

print "Let's try using batch write"
batch = list()
for i, cmd in enumerate(commands):
    batch.append( (i+1, cmd, 0) )   #- This fails for some strings
    # batch.append( (i+1, base64.b64encode(cmd), 0) )  #- This works
q.write_batch(batch)
read_queue(q)

print "Let's try using single write"
for cmd in commands:
    q.write(q.new_message(cmd))  #- This works
read_queue(q)

The same string which works with q.write(q.new_message(cmd)) may not work when putting it in a list to pass to write_batch. Shouldn't write_batch just be a faster version of batch, with the same encoding requirements for both? And as a non-expert user, I would certainly prefer the default to "just work" on python strings of ASCII characters without requiring the user to do any special encoding/decoding/set_message_class stuff. But as mentioned above, whatever is done, the most important thing to to document it clearly.

@kopertop

This comment has been minimized.

Show comment
Hide comment
@kopertop

kopertop Feb 28, 2013

Member

IMHO in botocore we should do away with b64 encoding entirely. it's not
really needed anymore.

On Thu, Feb 28, 2013 at 12:58 PM, sbailey notifications@github.com wrote:

From the user standpoint, the asymmetry between Queue.write() and
Queue.batch_write() is confusing. Here's an updated example:

import boto.sqs
import base64
sqs = boto.sqs.connect_to_region('us-west-1')
q = sqs.create_queue('blat')

commands = list()
commands.append( 'echo "hello"' )
commands.append( 'echo "goodbye"' )
commands.append( 'echo "done"' ) #- Garbled return message
commands.append( 'done' ) #- Garbled return message
commands.append( 'don' ) #- OK
commands.append( 'echo "hello"' )

def read_queue(q):
while True:
m = q.read()
if m is not None:
q.delete_message(m)
print m.get_body()
else:
break

print "Let's try using batch write"
batch = list()
for i, cmd in enumerate(commands):
batch.append( (i+1, cmd, 0) ) #- This fails for some strings
# batch.append( (i+1, base64.b64encode(cmd), 0) ) #- This works
q.write_batch(batch)
read_queue(q)

print "Let's try using single write"
for cmd in commands:
q.write(q.new_message(cmd)) #- This works
read_queue(q)

The same string which works with q.write(q.new_message(cmd)) may not work
when putting it in a list to pass to write_batch. Shouldn't write_batch
just be a faster version of batch, with the same encoding requirements for
both? And as a non-expert user, I would certainly prefer the default to
"just work" on python strings of ASCII characters without requiring the
user to do any special encoding/decoding/set_message_class stuff. But as
mentioned above, whatever is done, the most important thing to to document
it clearly.


Reply to this email directly or view it on GitHubhttps://github.com/boto/boto/issues/831#issuecomment-14247014
.

Chris Moyer

Member

kopertop commented Feb 28, 2013

IMHO in botocore we should do away with b64 encoding entirely. it's not
really needed anymore.

On Thu, Feb 28, 2013 at 12:58 PM, sbailey notifications@github.com wrote:

From the user standpoint, the asymmetry between Queue.write() and
Queue.batch_write() is confusing. Here's an updated example:

import boto.sqs
import base64
sqs = boto.sqs.connect_to_region('us-west-1')
q = sqs.create_queue('blat')

commands = list()
commands.append( 'echo "hello"' )
commands.append( 'echo "goodbye"' )
commands.append( 'echo "done"' ) #- Garbled return message
commands.append( 'done' ) #- Garbled return message
commands.append( 'don' ) #- OK
commands.append( 'echo "hello"' )

def read_queue(q):
while True:
m = q.read()
if m is not None:
q.delete_message(m)
print m.get_body()
else:
break

print "Let's try using batch write"
batch = list()
for i, cmd in enumerate(commands):
batch.append( (i+1, cmd, 0) ) #- This fails for some strings
# batch.append( (i+1, base64.b64encode(cmd), 0) ) #- This works
q.write_batch(batch)
read_queue(q)

print "Let's try using single write"
for cmd in commands:
q.write(q.new_message(cmd)) #- This works
read_queue(q)

The same string which works with q.write(q.new_message(cmd)) may not work
when putting it in a list to pass to write_batch. Shouldn't write_batch
just be a faster version of batch, with the same encoding requirements for
both? And as a non-expert user, I would certainly prefer the default to
"just work" on python strings of ASCII characters without requiring the
user to do any special encoding/decoding/set_message_class stuff. But as
mentioned above, whatever is done, the most important thing to to document
it clearly.


Reply to this email directly or view it on GitHubhttps://github.com/boto/boto/issues/831#issuecomment-14247014
.

Chris Moyer

@spulec spulec referenced this issue in spulec/moto Mar 23, 2013

Closed

SQS doesn't round trip messages #4

@mholt

This comment has been minimized.

Show comment
Hide comment
@mholt

mholt Apr 4, 2013

Ah. This explains why I'm having troubles... I'm publishing messages to the queue in batches of 10 using Queue.write_batch(), but I'm consuming them one at a time using read(). I figure, from reading this issue, that read() is decoding, but write_batch() isn't encoding... hence why most of my messages are garbled.

Are there plans to standardize this behavior? It probably ought to happen soon.

mholt commented Apr 4, 2013

Ah. This explains why I'm having troubles... I'm publishing messages to the queue in batches of 10 using Queue.write_batch(), but I'm consuming them one at a time using read(). I figure, from reading this issue, that read() is decoding, but write_batch() isn't encoding... hence why most of my messages are garbled.

Are there plans to standardize this behavior? It probably ought to happen soon.

@saintsjd

This comment has been minimized.

Show comment
Hide comment
@saintsjd

saintsjd Jan 18, 2014

Hi all. Thank you for the helpful discussion. I just ran into this issue today using boto 2.23.0

The solution to my issue seems to be:
from boto.sqs.message import RawMessage
q.set_message_class(RawMessage)

I no longer see a recommendation from Amazon that messages be base64 encoded. Our .write_batch() and other amazon sdks, such as C#, appear to not be encoding by default. This incompatibility with our boto SDK would seem to be very confusing to users.

Is it time we do away with the b64 encoding by default?

This would standardize our own sdk and make us more compatible with other sdks. We would need to document this change very clearly and provide clear path for backwards compatibility.

Hi all. Thank you for the helpful discussion. I just ran into this issue today using boto 2.23.0

The solution to my issue seems to be:
from boto.sqs.message import RawMessage
q.set_message_class(RawMessage)

I no longer see a recommendation from Amazon that messages be base64 encoded. Our .write_batch() and other amazon sdks, such as C#, appear to not be encoding by default. This incompatibility with our boto SDK would seem to be very confusing to users.

Is it time we do away with the b64 encoding by default?

This would standardize our own sdk and make us more compatible with other sdks. We would need to document this change very clearly and provide clear path for backwards compatibility.

@jessedhillon

This comment has been minimized.

Show comment
Hide comment
@jessedhillon

jessedhillon Apr 7, 2015

IMO the Queue.write_batch method should instantiate an instance of Queue.message_class and then call Message.get_encoded_body, passing the return of that to SQSConnection.send_message_batch

IMO the Queue.write_batch method should instantiate an instance of Queue.message_class and then call Message.get_encoded_body, passing the return of that to SQSConnection.send_message_batch

@dustinboswell

This comment has been minimized.

Show comment
Hide comment
@dustinboswell

dustinboswell Jun 19, 2015

What's the status of removing the base64 en/decoding? This bug caused my company a lot of headaches. (A message written by non-boto was being read by boto and most of the time, the failed-b64decode would result in the original message being returned, but every once in a while a message would "successfully" b64decode and garbage binary was returned.)

What's the status of removing the base64 en/decoding? This bug caused my company a lot of headaches. (A message written by non-boto was being read by boto and most of the time, the failed-b64decode would result in the original message being returned, but every once in a while a message would "successfully" b64decode and garbage binary was returned.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment