event hub body cannot be decompressed, when use gzipped event hub as trigger #415

Closed
AEYWang opened this issue May 10, 2019 · 9 comments

Comments

@AEYWang AEYWang commented May 10, 2019

Actual behavior

I use an Event Hub binding as a trigger. The event content is compressed with gzip.
The message body of the input event object, as returned by azure.functions.EventHubEvent.get_body(),
cannot be decompressed.

Known workarounds

When I read the message using azure-eventhub, it can be decompressed.

Example Code:

The function TimerTrigger.py sends a gzipped string message to the event hub.
The function EventHubTrigger.py is triggered by that event and reads the message body. But the message content differs from what was sent and cannot be un-gzipped.

TimerTrigger.py

import datetime
import logging
import gzip
import io

import azure.functions as func
from azure.eventhub import EventHubClient, Sender, EventData

ADDRESS = "Removed"
USER = "Removed"
KEY = "Removed"

def main(mytimer: func.TimerRequest) -> None:
    utc_timestamp = datetime.datetime.utcnow().replace(
        tzinfo=datetime.timezone.utc).isoformat()

    if mytimer.past_due:
        logging.info('The timer is past due!')

    client = EventHubClient(ADDRESS, debug=False, username=USER, password=KEY)
    sender = client.add_sender(partition="0")
    client.run()
    message = gZipString(('Test Event @ '+ utc_timestamp).encode('utf-8'))
    sender.send(EventData(message))
    logging.info('TimerTrigger: sent message %s', message )

def gZipString(stringtoZip):
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb") as f:
        f.write(stringtoZip)
    return out.getvalue()

EventHubTrigger.py

import logging
import gzip
import io
import azure.functions as func

def main(event: func.EventHubEvent):
    logging.info('EventHubEvent: SN = %s, Partition = %s', event.sequence_number, event.partition_key)
    logging.info('  EventHubEvent: Body= %s', event.get_body() )
    decompressed_data = ''
    try: 
        decompressed_data = gunzip_bytes_obj(event.get_body())
    except Exception as error:
        logging.info('EventHubEvent: Uzip Error = %s', error )
        decompressed_data = 'Failed'
        pass
    logging.info('  EventHubEvent: Decompressed Data= %s', decompressed_data )
    
def gunzip_bytes_obj(bytes_obj):
    in_ = io.BytesIO()
    in_.write(bytes_obj)
    in_.seek(0)
    with gzip.GzipFile(fileobj=in_, mode='rb') as fo:
        gunzipped_bytes_obj = fo.read()

    return gunzipped_bytes_obj
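
As an aside, the two helpers above can be written more compactly with the standard library's one-shot functions. This is only a sketch of an equivalent in-memory round trip, not the code that produced the logs in this post; the sample payload is made up.

```python
import gzip


def gzip_bytes(data: bytes) -> bytes:
    """Compress bytes in memory (same effect as gZipString above)."""
    return gzip.compress(data)


def gunzip_bytes(data: bytes) -> bytes:
    """Decompress gzip bytes in memory (same effect as gunzip_bytes_obj above)."""
    return gzip.decompress(data)


# Made-up sample payload, for illustration only.
payload = "Test Event @ 2019-05-28T00:00:00+00:00".encode("utf-8")
compressed = gzip_bytes(payload)
round_tripped = gunzip_bytes(compressed)
```

Every gzip stream starts with the magic bytes `\x1f\x8b`, which is what the "Not a gzipped file" check looks at.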

Example Output:

Here is an example from the function logs.
From the TimerTrigger function log:

TimerTrigger: sent message b'\x1f\x8b\x08\x00\xa1Q\xed\\\x02\xff\x0bI-.Qp-K\xcd+QpP020\xb4\xd450\xd55\xb2\x0814\xb522\xb020\xd03107\xb60\xd56\x00q\x00b|\x10\xfc-\x00\x00\x00'

From the EventHubTrigger function log:

EventHubEvent: Body= b'\x1f\xef\xbf\xbd\x08\x00\xef\xbf\xbdQ\xef\xbf\xbd\\\x02\xef\xbf\xbd\x0bI-.Qp-K\xef\xbf\xbd+QpP020\xef\xbf\xbd\xef\xbf\xbd50\xef\xbf\xbd5\xef\xbf\xbd\x0814\xef\xbf\xbd22\xef\xbf\xbd20\xef\xbf\xbd3107\xef\xbf\xbd0\xef\xbf\xbd6\x00q\x00b|\x10\xef\xbf\xbd-\x00\x00\x00'
EventHubEvent: Uzip Error = Not a gzipped file (b'\x1f\xef')

The received message body is different from what was sent. The sequence '\xef\xbf\xbd' was not in the original message. Could it come from a different encoding (e.g. Unicode)?

Related information

azure-functions==1.0.0b4
azure-functions-worker==1.0.0b6
grpcio==1.20.1
grpcio-tools==1.20.1
protobuf==3.7.1
six==1.12.0
@AEYWang AEYWang changed the title event hub body cannot be uncompressed, when use gzipped event hub as trigger event hub body cannot be decompressed, when use gzipped event hub as trigger May 10, 2019
@asavaritayal asavaritayal added this to the Active Questions milestone May 13, 2019
@asavaritayal asavaritayal removed this from the Active Questions milestone May 13, 2019
@asavaritayal asavaritayal added this to the Triaged milestone May 13, 2019
@asavaritayal asavaritayal removed this from the Triaged milestone May 13, 2019
@asavaritayal asavaritayal added this to the Functions Sprint 50 milestone May 13, 2019
Member

@anirudhgarg anirudhgarg commented May 30, 2019

Can you share your function code if possible? A small repro app will also work. It would also be good to let us know, step by step, what you are trying to do.

Author

@AEYWang AEYWang commented May 31, 2019

@anirudhgarg Yes, of course. I edited my post to add demo code and example output.
The code can be deployed to a function app. An event hub is needed, and the connection strings need to be added to the function settings.

The example output of the functions may give you an idea about the issue, I hope.

Member

@anirudhgarg anirudhgarg commented Jun 4, 2019

I researched this a bit and it does seem to be an encoding issue. The extra characters that you see seem to be the UTF-8 BOM. To remove them, you could decode your file contents to Unicode and then encode them back to UTF-8, which might strip the BOM. Have a look at this: https://stackoverflow.com/questions/18664712/split-function-add-xef-xbb-xbf-n-to-my-list/18664752

Let us know if that removes the marker and things start working. You might have to experiment a little with different decode/encode options.

Author

@AEYWang AEYWang commented Jun 5, 2019

@anirudhgarg I am afraid that is a different byte sequence. What I see in the event hub message is '\xef\xbf\xbd', but the link you sent is about '\xef\xbb\xbf'.
Could you take a look at my code example and try to reproduce my result? Then you may have a better idea whether I did something wrong or whether something needs to be fixed.

Member

@anirudhgarg anirudhgarg commented Jun 6, 2019

Yes, you are right. This is not the BOM. \xEF\xBF\xBD appears to be the UTF-8 encoding of the Unicode character U+FFFD, a special character also known as the "replacement character". Have a look at this: https://stackoverflow.com/questions/11159118/incorrect-string-value-xef-xbf-xbd-for-column
The suggestion there is to just strip out that byte sequence. Can you please try that?
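
For what it is worth, the received bytes are consistent with the body having been decoded as UTF-8 with invalid bytes replaced by U+FFFD, then re-encoded. The sketch below reproduces that on the 10-byte gzip header from the TimerTrigger log. It assumes that such a lossy decode happens somewhere between the event hub and the function, which is a hypothesis rather than something confirmed at this point, and `lossy_utf8_round_trip` is a name made up for illustration.

```python
def lossy_utf8_round_trip(raw: bytes) -> bytes:
    # Hypothetical helper: mimics a component that treats binary data as text.
    # Each byte that is not valid UTF-8 becomes U+FFFD, which encodes to EF BF BD.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# First 10 bytes (the gzip header) of the message in the TimerTrigger log.
sent_header = b"\x1f\x8b\x08\x00\xa1Q\xed\\\x02\xff"
# Corresponding leading bytes of the body in the EventHubTrigger log.
received_header = b"\x1f\xef\xbf\xbd\x08\x00\xef\xbf\xbdQ\xef\xbf\xbd\\\x02\xef\xbf\xbd"

mangled = lossy_utf8_round_trip(sent_header)
print(mangled == received_header)  # True
```

Bytes such as `\x8b`, `\xa1`, `\xed` and `\xff` are not valid UTF-8 in those positions, so each one is replaced, while the ASCII bytes pass through unchanged.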

Author

@AEYWang AEYWang commented Jun 6, 2019

@anirudhgarg No, just taking the special character out does not work. Take a look at the first few bytes of the sent and received message bodies in the example output in the main post.
It is '\x1f\x8b\x08' in the original message and '\x1f\xef\xbf\xbd\x08' in the received message. If I take '\xef\xbf\xbd' out of the latter, the second byte of the original, '\x8b', is still missing.

The problem I see is that the message body bytes I receive from the Event Hub trigger differ from what is actually in the Event Hub event. This does not depend on which encoding I use in the event sender. There could be an extra encoding step applied in the Event Hub binding/trigger of azure-functions-worker.
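
The point about stripping can be illustrated with a short sketch. It assumes, as a hypothesis only, that the mangling is a UTF-8 decode with replacement followed by a re-encode; `mangle` and the sample payload are made up for illustration. Because different invalid bytes all collapse onto the same U+FFFD, the original bytes cannot be recovered afterwards.

```python
import gzip

REPLACEMENT = "\ufffd".encode("utf-8")  # b'\xef\xbf\xbd'


def mangle(raw: bytes) -> bytes:
    # Hypothetical stand-in for whatever re-encodes the body in transit.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# Two different invalid bytes become indistinguishable after mangling.
assert mangle(b"\x8b") == mangle(b"\xff") == REPLACEMENT

original = gzip.compress(b"Test Event")  # made-up payload
stripped = mangle(original).replace(REPLACEMENT, b"")

try:
    gzip.decompress(stripped)
    recovered = True
except (OSError, EOFError):
    # The stream now starts with b'\x1f\x08' instead of the gzip magic b'\x1f\x8b'.
    recovered = False

print(recovered)  # False
```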

@maiqbal11 maiqbal11 removed this from the Active Questions milestone Jun 17, 2019
@maiqbal11 maiqbal11 added this to the Functions Sprint 52 milestone Jun 17, 2019
Contributor

@maiqbal11 maiqbal11 commented Jun 21, 2019

Hi @AEYWang, you should be able to change the function app configuration and code to unblock your scenario:

  1. Modify your function.json to add "dataType": "binary". This will transmit the raw gzipped bytes without applying any encoding to them.
  2. Change the function signature from def main(event: func.EventHubEvent) to def main(event). You should still be able to access all the properties associated with the EventHubEvent but this annotation change is needed to match up the function.json with the function definition.

I was able to make these changes and run your function successfully. Please try and feel free to circle back with the results.
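
A rough sketch of what the two changes might look like. The function.json fragment in the comment only shows where "dataType": "binary" goes; the eventHubName and connection values are placeholders, not values from this thread. `decompress_body` and `FakeEvent` are made-up names: the first is a small helper, the second a stand-in so the snippet runs outside the Functions host, where the real event object is supplied by the runtime.

```python
import gzip
import logging

# function.json binding (placeholder values; "dataType" is the actual change):
# {
#   "type": "eventHubTrigger",
#   "name": "event",
#   "direction": "in",
#   "eventHubName": "<your-event-hub>",
#   "connection": "<your-connection-setting>",
#   "dataType": "binary"
# }


def decompress_body(event) -> str:
    """Un-gzip the event body and decode it as UTF-8 text."""
    return gzip.decompress(event.get_body()).decode("utf-8")


def main(event):  # no func.EventHubEvent annotation, per step 2
    logging.info("EventHubEvent: Decompressed Data= %s", decompress_body(event))


class FakeEvent:
    """Hypothetical stand-in for the event object passed in by the host."""

    def __init__(self, body: bytes):
        self._body = body

    def get_body(self) -> bytes:
        return self._body


# Local check with a made-up payload.
sample = FakeEvent(gzip.compress("Test Event".encode("utf-8")))
main(sample)
```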

Author

@AEYWang AEYWang commented Jun 24, 2019

@maiqbal11 Your suggestion worked for me, thanks!
I am still curious: why do I have to remove the annotation from the function, as in part 2 of your suggestion? Could you tell me more about it, for example where I should and should not use annotations in Azure Functions?

Contributor

@maiqbal11 maiqbal11 commented Jun 25, 2019

Hi @AEYWang, the behavior that you are encountering is not fully documented. When you specify binary in the function.json, we try to match it against the annotation that you have provided (in your case EventHubEvent). If you requested a particular dataType, this does not match up with the annotation of EventHubEvent and we would error out. This is our way of ensuring that users do not specify inconsistent settings. However, the only way to request raw bytes is to use the dataType so the annotation needs to be removed to avoid the error. In the future, we will most likely make this a warning since it is not an inconsistency in every case. Thanks for pointing out the issue and asking all the right questions. 😄 Closing this issue for now. Please re-open if you have further concerns.
