HTTP "POST" request with UTF-8 non latin [feature] #1315

Closed
Churator opened this issue Jan 17, 2023 · 7 comments · Fixed by #2523
Labels
enhancement New feature or request

Comments

@Churator

I'm trying to send a POST request whose body contains UTF-8 characters.
It fails because the body is encoded as latin-1.
I couldn't find where to change the encoding.

'latin-1' codec can't encode characters in position 57-63: Body ('בדיקה') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

@Churator Churator added the enhancement New feature or request label Jan 17, 2023
@dgtlmoon dgtlmoon changed the title Post UTF-8 [feature] HTTP "POST" with UTF-8 non latin [feature] Jan 18, 2023
@dgtlmoon dgtlmoon changed the title HTTP "POST" with UTF-8 non latin [feature] HTTP "POST" request with UTF-8 non latin [feature] Jan 20, 2023
@dgtlmoon
Owner

Can you paste the full HTTP request settings here?

@Churator
Author

Sure

Url : https://www.bezeq.co.il/umbraco/api/FormWebApi/CheckAddress

Method: POST

Data:
{"CityId":"1111","StreetId":"1111","House":"11111","Street":"בדיקה","City":"בדיקה","Entrance":""}

@dgtlmoon
Owner

Thanks, I can confirm this one.

@leiless

leiless commented Jun 20, 2024

I'm having the same issue here.

$ docker exec -it changedetection_io_app_1 bash
$ python3 -c "import requests; r = requests.post('http://httpbin.org/post', data='你好')"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
  File "/usr/local/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/usr/local/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/urllib3/connectionpool.py", line 416, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.10/http/client.py", line 1283, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.10/http/client.py", line 1328, in _send_request
    body = _encode(body, 'body')
  File "/usr/local/lib/python3.10/http/client.py", line 166, in _encode
    raise UnicodeEncodeError(
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('你好') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

https://stackoverflow.com/questions/55887958/what-is-the-default-encoding-when-python-requests-post-data-is-string-type/56120372#56120372

--

If body is a string, it is encoded as ISO-8859-1, the default for HTTP. If it is a bytes-like object, the bytes are sent as is. If it is a file object, the contents of the file is sent; this file object should support at least the read() method.

ISO-8859-1 is commonly known as latin-1.

https://docs.python.org/3/library/http.client.html#http.client.HTTPConnection.request
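To make the quoted behaviour concrete, here is a minimal sketch that needs no network access. It only uses `str.encode`, which is what raises the error in the traceback above; the helper name `can_encode` and the sample body are made up for illustration.

```python
# Sketch: a str body must survive encoding as latin-1 (ISO-8859-1),
# otherwise http.client raises UnicodeEncodeError. Bytes are sent as is.
# `can_encode` and `body` are hypothetical names used only for this example.

def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

body = '{"City": "בדיקה"}'

# Hebrew characters are outside latin-1, so a str body would be rejected.
latin1_ok = can_encode(body, "latin-1")

# Encoding explicitly yields bytes, which are passed through unchanged.
utf8_bytes = body.encode("utf-8")
```

Passing `utf8_bytes` as `data=` instead of the `str` should avoid the implicit latin-1 step, as the error message itself suggests.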

@leiless

leiless commented Jun 20, 2024

Possible solution

https://github.com/dgtlmoon/changedetection.io/blob/0.45.24/changedetectionio/content_fetchers/requests.py#L49

        r = requests.request(method=request_method,
-                            data=request_body,
+                            data=request_body.encode('utf-8') if isinstance(request_body, str) else request_body,
                             url=url,
                             headers=request_headers,
                             timeout=timeout,
                             proxies=proxies,
                             verify=False)
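The same idea can be pulled out into a small helper so the conversion is easy to test on its own. This is only a sketch of the proposed change, not the code that was eventually merged; the function name `to_utf8_bytes` is hypothetical.

```python
from typing import Optional, Union

def to_utf8_bytes(request_body: Optional[Union[str, bytes]]) -> Optional[bytes]:
    """Encode a str body as UTF-8; pass bytes and None through unchanged.

    Hypothetical helper illustrating the one-line change proposed above.
    """
    if isinstance(request_body, str):
        return request_body.encode("utf-8")
    return request_body

encoded = to_utf8_bytes("你好")
```

The result could then be passed as `data=to_utf8_bytes(request_body)`. Whether the receiving server also needs a `charset=utf-8` hint in the `Content-Type` header depends on the server and is not covered here.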

@dgtlmoon
Owner

dgtlmoon commented Jun 20, 2024

@leiless isn't this a duplicate of #2309?

If you are using JSON for your posts:// notifications, make sure you use | tojson when building your JSON message. This should escape anything non-ASCII and bypass this error. For example, it will turn the smiley ツ into \u30c4
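As a rough illustration of why escaping sidesteps the error, the sketch below uses the standard library's `json.dumps` as a stand-in for the template filter (an assumption: the filter's exact output may differ in details such as HTML-safe escaping). The `payload` values are made up.

```python
import json

# Stand-in for `| tojson`: json.dumps escapes non-ASCII by default
# (ensure_ascii=True), so the result contains only ASCII characters.
payload = {"smiley": "ツ", "City": "בדיקה"}
escaped = json.dumps(payload)

# An ASCII-only string is always valid latin-1, so http.client accepts it.
as_latin1 = escaped.encode("latin-1")

# The receiver recovers the original text when it parses the JSON.
round_trip = json.loads(escaped)
```

This only helps when the body is JSON; a plain-text body still has to be encoded to bytes explicitly.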

@leiless

leiless commented Jun 20, 2024

@dgtlmoon No, it's not. I'm using the Basic fast Plaintext/HTTP Client to POST a body that contains Chinese characters encoded in UTF-8.

#2309 is about delivering notifications with UTF-8 characters.

3 participants