Body stored as binary for text/html content #64

Closed
cressie176 opened this Issue Sep 25, 2012 · 8 comments

@cressie176

The following code creates a binary tape rather than a text one you can edit:

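```groovy
// Assumes Betamax 1.1.x package names
import co.freeside.betamax.Betamax
import co.freeside.betamax.Recorder
import co.freeside.betamax.httpclient.BetamaxRoutePlanner
import groovyx.net.http.HTTPBuilder
import org.junit.Rule
import spock.lang.Specification

class TapeBodySpec extends Specification {
    @Rule Recorder recorder = new Recorder()  // @Betamax only takes effect with this rule

    @Betamax(tape='bbc')
    def "Tape body"() {
        setup:
            HTTPBuilder http = new HTTPBuilder()
            BetamaxRoutePlanner.configure(http.client)  // route HttpClient traffic via Betamax

        when:
            http.get(uri: 'http://www.bbc.co.uk/news')

        then:
            noExceptionThrown()  // the test exists only to record the tape
    }
}
```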

This is because MemoryTape.isPrintable(...) returns false, which makes Betamax use the binary format. This behaviour might be correct, but it seems a little odd since the page's content type is "text/html".

Any idea if there is something genuinely unprintable in the response or is this a bug?
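The sketch below shows how a printable-body check might behave. It assumes the check decodes the body with a charset and rejects the result if decoding fails or produces control characters other than whitespace. `PrintableCheck` and its `isPrintable` method are hypothetical and are not Betamax's actual `MemoryTape.isPrintable`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a check like MemoryTape.isPrintable(...);
// NOT Betamax's real implementation.
public class PrintableCheck {

    /** True if body decodes strictly in charset and has no non-whitespace control chars. */
    static boolean isPrintable(byte[] body, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer chars;
        try {
            chars = decoder.decode(ByteBuffer.wrap(body));
        } catch (CharacterCodingException e) {
            return false; // bytes are not valid in this charset
        }
        for (int i = 0; i < chars.length(); i++) {
            char c = chars.charAt(i);
            if (Character.isISOControl(c) && !Character.isWhitespace(c)) {
                return false; // e.g. U+0080..U+009F produced by misdecoded UTF-8
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf8 = "BBC News \u2014 Home".getBytes(StandardCharsets.UTF_8);
        System.out.println(isPrintable(utf8, StandardCharsets.UTF_8));      // true
        System.out.println(isPrintable(utf8, StandardCharsets.ISO_8859_1)); // false
    }
}
```

The same UTF-8 bytes pass under one charset and fail under the other. So the charset a body is interpreted with can decide whether it gets recorded as text.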

@robfletcher
Collaborator

The BBC are being a bit rubbish and not declaring a charset on their Content-Type header. Without that, it's easy to misinterpret the data. The servlet spec says the default should be ISO-8859-1, but some sites encode as UTF-8 and include multi-byte characters, which come out garbled when parsed as ISO-8859-1.

I can't see anything obvious on that page but I'll do some digging & see if I can figure out what's doing it.
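Decoding as ISO-8859-1 never actually throws, because every byte value 0x00 to 0xFF maps to a character. UTF-8 multi-byte sequences simply come out as the wrong characters, and some of those fall in the U+0080 to U+009F control range. That is a plausible, though unconfirmed, reason a printable check would reject the page. `Mojibake.misread` below is a hypothetical illustration.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {

    /** Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 maps each byte 0xNN to U+00NN, so this never fails
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // EM DASH is three bytes in UTF-8: E2 80 94
        for (char c : misread("\u2014").toCharArray()) {
            System.out.printf("U+%04X control=%b%n", (int) c, Character.isISOControl(c));
        }
        // Prints:
        // U+00E2 control=false
        // U+0080 control=true
        // U+0094 control=true
    }
}
```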

@robfletcher
Collaborator

Ok. Looking at the data, it appears that it's really UTF-8. It occurs to me that it might make sense to assume UTF-8 instead of ISO-8859-1 when there's no declared charset. Data that really is UTF-8 gets garbled when interpreted as ISO-8859-1, whereas plain ASCII decodes identically either way, since it's a pure subset of both.

If I change AbstractMessage.DEFAULT_CHARSET to UTF-8 the page is recorded as text and doesn't cause any problems when played back.
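A quick check of both directions with strict decoding shows what is and isn't a subset. UTF-8 can represent every ISO-8859-1 character, since those are the first 256 Unicode code points. The byte encodings differ, though. Only ASCII bytes mean the same thing in both, so genuine ISO-8859-1 text with accented characters can fail to decode as UTF-8. `StrictDecode` is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictDecode {

    /** True if the bytes are valid in the given charset under strict (REPORT) decoding. */
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] ascii  = "plain ASCII".getBytes(StandardCharsets.US_ASCII);
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1); // e-acute as the single byte 0xE9
        System.out.println(decodesCleanly(ascii, StandardCharsets.UTF_8));       // true: ASCII is valid UTF-8
        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));      // false: lone 0xE9 is malformed UTF-8
        System.out.println(decodesCleanly(latin1, StandardCharsets.ISO_8859_1)); // true
    }
}
```

Defaulting to UTF-8 is still reasonable. It just trades a silent garbling risk for a detectable decoding failure on undeclared Latin-1 content.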

@cressie176

What about having the ability to override the default charset in the tape options?

@robfletcher
Collaborator

@cressie176 The problem with that is that you may have multiple requests going on in that one tape.

@cressie176

In principle, is this not just a limitation of the current tape format? Do you not currently support a mixture of binary and text in the same tape?

@robfletcher
Collaborator

Yes, you can have a mixture in the same tape. The issue is that the same tape can be used for multiple requests, which might each have a different character encoding. If the @Betamax annotation specified the default encoding, that would apply across all requests in the tape and might be appropriate for some and not for others. See #52 for more on the fun & games that can ensue.

I think for anything that isn't standard ASCII or UTF-8 it would be madness not to declare a charset so falling back to UTF-8 as a default is probably the best option.
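Because the encoding can differ per request, one option is to resolve the charset separately for each recorded message. Each message would use its own Content-Type header and fall back to UTF-8 only when nothing usable is declared. This is a minimal sketch that assumes a simple `;`-separated header parse. `CharsetResolver`, `charsetFor` and its `DEFAULT_CHARSET` are hypothetical names, not Betamax's API.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

public class CharsetResolver {

    // Hypothetical fallback used when a message declares no usable charset
    static final Charset DEFAULT_CHARSET = StandardCharsets.UTF_8;

    /** Picks the charset for one message from that message's own Content-Type header. */
    static Charset charsetFor(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                        break; // unusable declaration: fall back to the default
                    }
                }
            }
        }
        return DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("text/html"));                           // UTF-8
        System.out.println(charsetFor("text/html; charset=ISO-8859-1"));       // ISO-8859-1
        System.out.println(charsetFor("text/html; charset=x-no-such-charset")); // UTF-8
    }
}
```

Keeping the decision per message means a tape-wide setting is never needed. Each interaction is decoded using what its own headers say.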

@robfletcher
Collaborator

This should be fixed by 34cb4aa

@cressie176

Nice one.
