Feature Request - Create a setter method to set the encoding when parsing the response as a Document #997

julianomqs · 2017-12-22T11:28:01Z

Hi,

I'm using your library in some projects here, and it's great.

It would be great to have a setter method to set the encoding when parsing a response as a document.

For example, if I need to execute a POST request and parse the response as a Document with the ISO-8859-1 encoding, I have to do this:

private Document executeRequest(String value2) throws IOException {
    return Jsoup.connect(DEFAULT_URL)
        .timeout(DEFAULT_TIMEOUT)
        .data("param1", "value1")
        .data("param2", value2)
        .userAgent(DEFAULT_USER_AGENT)
        .method(Method.POST)
        .execute()
        .charset(CHARSET_ISO_8859_1)
        .parse();
}

Something like this would be great:

private Document executeRequest(String value2) throws IOException {
    return Jsoup.connect(DEFAULT_URL)
        .timeout(DEFAULT_TIMEOUT)
        .data("param1", "value1")
        .data("param2", value2)
        .userAgent(DEFAULT_USER_AGENT)
        .responseEncoding("ISO-8859-1") // <-- This is the setter method I'm suggesting, something like that
        .post();
}

The postDataCharset method sets the charset when sending a POST request, but not for parsing the response as a document.

Of course the method name is your choice.

What do you think?

P.S: @krystiangorecki This is the issue with the correct description.


jhy · 2017-12-22T18:02:01Z

Thanks, makes sense. Is the site not setting the response encoding in a header or meta though, or is jsoup parsing it incorrectly? Trying to understand the root issue.

julianomqs · 2017-12-22T19:10:21Z

In my tests, the site I was scraping didn't declare the encoding in the response headers or in the HTML, but I knew beforehand that the proper encoding was ISO-8859-1.

As far as I know, jsoup parses documents as UTF-8 when it can't detect the document encoding, right?

I don't think it is a jsoup bug, more likely a site problem.
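To illustrate why the fallback matters, here is a minimal sketch using only the JDK (no jsoup calls). The class name `CharsetFallbackDemo`, the helper `decode`, and the sample word are hypothetical and only for illustration. It shows that bytes written as ISO-8859-1 are not recovered when they are decoded as UTF-8, which is roughly what happens when no charset is declared and the wrong default is applied:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFallbackDemo {

    // Hypothetical helper: decodes raw response bytes with the given charset,
    // similar in spirit to what a parser must do before building a document.
    public static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        // Sample text with two accented letters, written with Unicode escapes
        // so the source file encoding does not matter.
        String original = "a\u00e7\u00e3o";

        // In ISO-8859-1 each accented letter is a single byte (0xE7, 0xE3).
        byte[] body = original.getBytes(StandardCharsets.ISO_8859_1);

        String asLatin1 = decode(body, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(body, StandardCharsets.UTF_8);

        // Decoding with the right charset round-trips the text.
        System.out.println("ISO-8859-1 matches original: " + asLatin1.equals(original));

        // 0xE7 and 0xE3 are not valid UTF-8 sequences here, so the decoder
        // substitutes the replacement character U+FFFD.
        System.out.println("UTF-8 contains U+FFFD: " + asUtf8.contains("\uFFFD"));
    }
}
```

This is why being able to state the charset up front helps for sites like this one: the bytes themselves carry no hint of which encoding produced them.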

This is the site if you want to check it out.

julianomqs mentioned this issue Dec 22, 2017

Feature Request - Pass encoding in methods that return a Document #989

Closed
