Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content.Text adds invisible whitespace to output #1105

Closed
panesofglass opened this issue Jun 10, 2020 · 10 comments · Fixed by #1106
Closed

Content.Text adds invisible whitespace to output #1105

panesofglass opened this issue Jun 10, 2020 · 10 comments · Fixed by #1106

Comments

@panesofglass
Copy link
Contributor

The tests for the-benchmarker/web-frameworks#2888 are failing. When printed out, they look correct, e.g. "" = "" or "0" = "0". However, there is evidence of additional characters or at least mismatched encoding. According to the project, encoding should be UTF-8, which should match Content.Text's output. However, something is clearly added that doesn't appear to affect normal parsing or processing.

@granicz
Copy link
Member

granicz commented Jun 10, 2020

What content-type is the benchmark expecting?

@panesofglass
Copy link
Contributor Author

I think it expects text/plain, though that is not explicitly stated anywhere.

@granicz
Copy link
Member

granicz commented Jun 10, 2020

By default, Content.Text uses UTF-8, and it also takes an optional argument for the encoding. See the function here: https://github.com/dotnet-websharper/core/blob/master/src/sitelets/WebSharper.Sitelets/Content.fs#L427

I think the problem might be that it sends empty headers. Can you try replacing Content.Text with the following Text function? If it works, I can change Context.Text accordingly and re-release.

let Text (msg: string) =
    Content.Custom(
        Headers = [Http.Header.Custom "content-type" "text/plain"],
        WriteBody = fun s ->
            let encoding = System.Text.Encoding.UTF8
            use w = new System.IO.StreamWriter(s, encoding)
            w.Write msg
    )

@Tarmil
Copy link
Member

Tarmil commented Jun 10, 2020

Alternately, this should be equivalent: Content.Text msg |> Content.WithContentType "text/plain"

@panesofglass
Copy link
Contributor Author

Strangely, the above results in the same issue. In Postman, I'm able to download the the response, even though when I open the file it appears empty. However, with the following, I get a true, empty body:

let Text (msg: string) =
    Content.Custom(
        WriteBody = fun s ->
            let bytes = System.Text.Encoding.UTF8.GetBytes(msg)
            s.Write(bytes, 0, bytes.Length))

@granicz
Copy link
Member

granicz commented Jun 11, 2020

You mentioned extra characters - can you paste a screenshot of the difference between the two functions above?

@panesofglass
Copy link
Contributor Author

panesofglass commented Jun 11, 2020

I deleted the file I downloaded from Postman. It's some sort of non-printable whitespace character. When writing "", the body contains some kind of byte. I didn't decode it to find out which. I cannot comprehend why the StreamWriter would add it.

original:

static member Text (text: string, ?encoding: System.Text.Encoding) : Async<Content<'Endpoint>> =
    let encoding = System.Text.Encoding.UTF8
    Content.Custom(
        WriteBody = fun s ->
            use w = new System.IO.StreamWriter(s, encoding)
            w.Write msg
    )

updated:

let Text (msg: string) =
    Content.Custom(
        WriteBody = fun s ->
            let bytes = System.Text.Encoding.UTF8.GetBytes(msg)
            s.Write(bytes, 0, bytes.Length)
    )

The only differences are that I get the bytes and write them directly, whereas the original uses a StreamWriter to write to the stream. You can see the direct comparison in the PR: https://github.com/dotnet-websharper/core/pull/1106/files#diff-ada0621f6daf8e515b65bb44ba7192fa

@Tarmil
Copy link
Member

Tarmil commented Jun 11, 2020

It looks like StreamWriter actually writes the 3-byte UTF-8 byte order mark EF BB BF. It would be a better fix to prevent it from doing that, rather than use GetBytes which creates an intermediary byte array. The trick is to use new UTF8Encoding(false) instead of the static Encoding.UTF8. I can confirm that the following writes 0 bytes:

let defaultEncoding = new System.Text.Encoding.UTF8Encoding(false)

let Text (msg: string) =
    Content.Custom(
        WriteBody = fun s ->
            use w = new System.IO.StreamWriter(s, defaultEncoding)
            w.Write(msg)
    )

@panesofglass
Copy link
Contributor Author

Awesome! I didn’t know about that!

@panesofglass
Copy link
Contributor Author

panesofglass commented Jun 12, 2020

As Content.Text already accepts an encoding parameter, it seems this is not a bug. Should I just close this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants