Event Hub message body cannot be decompressed when a gzipped event hub is used as a trigger #415
Comments
Can you share your function code, if possible? A small repro app would also work. It would also help to know, step by step, what you are trying to do.
@anirudhgarg Yes, of course. I edited my post to add demo code and example output. I hope the example output from the functions gives you an idea of the issue.
I researched this a bit, and it does seem to be an encoding issue. The extra characters you see appear to be the UTF-8 BOM marker. To remove them, you can decode your file contents to Unicode and then encode them back to UTF-8, which might remove the BOM markers. Have a look at this: https://stackoverflow.com/questions/18664712/split-function-add-xef-xbb-xbf-n-to-my-list/18664752 Let us know if that removes the marker and things start working. You might have to experiment a little with different decode/encode options.
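A caveat on the decode/encode round trip: with Python's plain `utf-8` codec the BOM survives decoding as the character U+FEFF, while the `utf-8-sig` codec strips a leading BOM. A minimal sketch of the suggestion above, assuming the payload is UTF-8 text (the helper name `strip_utf8_bom` is hypothetical):

```python
# The UTF-8 byte order mark (BOM) is the three bytes EF BB BF.
BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Hypothetical helper: return UTF-8 text bytes without a leading BOM."""
    # 'utf-8-sig' drops a leading BOM while decoding;
    # re-encoding with plain 'utf-8' does not add it back.
    return data.decode("utf-8-sig").encode("utf-8")


raw = BOM + b"hello"
print(ascii(raw.decode("utf-8")))  # '\ufeffhello' (plain utf-8 keeps the BOM)
print(strip_utf8_bom(raw))         # b'hello'
```

This only applies to text. Gzip output is not valid UTF-8 (its second byte, 0x8B, cannot start a UTF-8 sequence), so compressed bytes should not be decoded at all.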
@anirudhgarg I'm afraid that is a different encoding. What I see in the Event Hub message is '\xef\xbf\xbd', but the link you sent is about '\xef\xbb\xbf'.
Yes, you are right. This is not the BOM marker. \xEF\xBF\xBD is the UTF-8 encoding of the Unicode character U+FFFD, a special character also known as the "Replacement character". Have a look at this: https://stackoverflow.com/questions/11159118/incorrect-string-value-xef-xbf-xbd-for-column
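The following sketch uses only Python's standard codecs to show how U+FFFD typically gets into a byte stream: a lenient decoder substitutes it for every invalid UTF-8 sequence. Whether the Event Hub binding does exactly this is an assumption, not something confirmed in this thread. Note that the gzip magic number 1F 8B turns into exactly the bytes reported here:

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# In UTF-8, U+FFFD is the three bytes EF BF BD.
print(REPLACEMENT.encode("utf-8"))  # b'\xef\xbf\xbd'

# Every gzip stream starts with the bytes 1F 8B. 0x8B is a continuation
# byte and cannot start a UTF-8 sequence, so a lenient decode replaces it.
gzip_magic = b"\x1f\x8b"
text = gzip_magic.decode("utf-8", errors="replace")
print(ascii(text))            # '\x1f\ufffd'
print(text.encode("utf-8"))   # b'\x1f\xef\xbf\xbd'
```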
@anirudhgarg No, removing only the special character does not work. Take a look at the first few bytes of the sent and received message bodies in the example output in the main post. The problem I see is that the message body bytes I receive from the Event Hub trigger differ from what is actually in the Event Hub event. This does not depend on which encoding I use in the event sender. There could be an extra encoding step applied in the Event Hub binding/trigger in azure-functions-python-worker.
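The symptom described here (the received bytes differ from the sent bytes, regardless of the sender's text encoding) is consistent with the compressed payload having passed through a lossy bytes-to-text conversion somewhere before it reached the function. This is a hypothesis, not a confirmed diagnosis of the worker. A minimal sketch of that failure mode with the standard `gzip` module:

```python
import gzip

original = "hello event hub"
compressed = gzip.compress(original.encode("utf-8"))

# Hypothetical lossy path: treat the compressed bytes as UTF-8 text
# (substituting U+FFFD for invalid sequences), then encode back to bytes.
mangled = compressed.decode("utf-8", errors="replace").encode("utf-8")

print(compressed[:3])  # b'\x1f\x8b\x08' (gzip magic, then deflate method)
print(mangled[:4])     # b'\x1f\xef\xbf\xbd'

# The untouched bytes decompress; the mangled ones do not.
print(gzip.decompress(compressed).decode("utf-8"))  # hello event hub
try:
    gzip.decompress(mangled)
except OSError as exc:  # gzip.BadGzipFile subclasses OSError (Python 3.8+)
    print("cannot decompress:", exc)
```

Because the replacement discards the original byte values, stripping U+FFFD afterwards cannot restore the payload, which matches the observation that removing the special character does not help.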
Hi @AEYWang, you should be able to change the function app configuration and code to unblock your scenario:
I was able to make these changes and run your function successfully. Please try them and feel free to circle back with the results.
@maiqbal11 Your suggestion worked for me, thanks!
Hi @AEYWang, the behavior that you are encountering is not fully documented. When you specify
Actual behavior
I use the Event Hub binding as a trigger. The event content is compressed with gzip.
The message body of the input object, obtained with azure.functions.EventHubEvent.get_body(), cannot be decompressed.
Known workarounds
When I read the same message with the azure-eventhub SDK, it can be decompressed.
Example Code:
The TimerTrigger.py function sends a gzipped string message to the event hub.
The EventHubTrigger.py function is triggered by that event hub and reads the message body. However, the received content differs from what was sent and cannot be un-gzipped.
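For reference, the round trip the two functions are meant to perform can be sketched independently of Azure. This is not the code from TimerTrigger.py or EventHubTrigger.py; the helper names `compress_message`, `decompress_event_body` and `FakeEvent` are hypothetical, and it assumes only that the event object exposes `get_body()` returning the raw bytes, as `azure.functions.EventHubEvent` does:

```python
import gzip
from typing import Protocol


class HasBody(Protocol):
    """Anything with get_body() -> bytes, e.g. azure.functions.EventHubEvent
    (assumed here, not imported)."""

    def get_body(self) -> bytes: ...


def compress_message(message: str) -> bytes:
    """Sender side (hypothetical): encode text as UTF-8, then gzip it."""
    return gzip.compress(message.encode("utf-8"))


def decompress_event_body(event: HasBody) -> str:
    """Receiver side (hypothetical): gunzip the raw body, then decode it.

    Only works if the trigger delivers exactly the bytes that were sent.
    """
    return gzip.decompress(event.get_body()).decode("utf-8")


class FakeEvent:
    """Local stand-in for an Event Hub event, for testing only."""

    def __init__(self, body: bytes) -> None:
        self._body = body

    def get_body(self) -> bytes:
        return self._body


payload = compress_message("temperature=21.5")
print(decompress_event_body(FakeEvent(payload)))  # temperature=21.5
```

The order matters on the receiving side: decompress the bytes first and decode text afterwards. Decoding the compressed bytes as text at any point in the pipeline makes the payload unrecoverable.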
TimerTrigger.py
EventHubTrigger.py
Example Output:
Here is an example from the function logs:
From TimerTrigger function log:
From EventHubTrigger function log:
The received message body is different from what was sent. The '\xef\xbf\xbd' was not in the original message. Could it come from a different encoding (e.g. Unicode)?
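A quick way to tell which case a received body falls into is to compare its first bytes with the gzip magic number 1F 8B (RFC 1952). A minimal sketch; the function name and the "mangled" heuristic are hypothetical, and the heuristic assumes the corruption is a lossy UTF-8 decode as discussed in this thread:

```python
GZIP_MAGIC = b"\x1f\x8b"
# What the gzip magic becomes if 0x8B is replaced by U+FFFD and the
# resulting text is re-encoded as UTF-8.
MANGLED_GZIP_MAGIC = b"\x1f\xef\xbf\xbd"


def classify_body(body: bytes) -> str:
    """Rough, hypothetical check of a received Event Hub body."""
    if body.startswith(GZIP_MAGIC):
        return "gzip"  # header intact; gzip.decompress should work
    if body.startswith(MANGLED_GZIP_MAGIC):
        return "mangled-gzip"  # likely went through a lossy UTF-8 decode
    return "unknown"


print(classify_body(b"\x1f\x8b\x08\x00"))      # gzip
print(classify_body(b"\x1f\xef\xbf\xbd\x08"))  # mangled-gzip
```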
Related information