Add support for parsing JSON in GBK format. #3347

Open
AlwaysOnlineSuperMan opened this issue Mar 6, 2025 · 5 comments

@AlwaysOnlineSuperMan

By default, request bodies with the application/json content type are parsed as UTF-8. How can parsing also be made compatible with charset=GBK, so that if UTF-8 parsing fails, GBK parsing is attempted?
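
The fallback being asked for can be sketched as follows. This is a minimal Python illustration of the idea, not ModSecurity code; the function name `parse_json_with_fallback` and the order of the encodings are assumptions made for the example.

```python
import json


def parse_json_with_fallback(raw: bytes, encodings=("utf-8", "gbk")):
    """Try each encoding in order; return (parsed_value, encoding_used)."""
    last_error = None
    for encoding in encodings:
        try:
            return json.loads(raw.decode(encoding)), encoding
        except ValueError as exc:
            # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
            last_error = exc
    raise ValueError(f"body is not valid JSON in any of {encodings}") from last_error


gbk_body = '{"gbk": "中国欢迎你"}'.encode("gbk")
value, used = parse_json_with_fallback(gbk_body)
print(used, value)
```

Such a fallback is a heuristic: a pure-ASCII body decodes identically under both encodings, and any byte sequence that happens to be valid UTF-8 is accepted as UTF-8 without GBK ever being tried.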

@airween
Member

airween commented Mar 6, 2025

Hi @AlwaysOnlineSuperMan,

Could you explain the root cause of this issue? E.g., could you share an example of JSON with the GBK charset?

@AlwaysOnlineSuperMan
Author

AlwaysOnlineSuperMan commented Mar 7, 2025

> Hi @AlwaysOnlineSuperMan,
>
> could you explain the root cause of this issue? Eg. could you share an example of JSON with GBK charset?

POST http://127.0.0.1/case/gbk
Content-Type: application/json;charset=GBK

{
  "gbk": "中国欢迎你"
}
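
A small Python check, added here for illustration, shows why a UTF-8-only JSON parser rejects this body: the GBK encoding of the string is not a valid UTF-8 sequence.

```python
text = "中国欢迎你"

gbk_bytes = text.encode("gbk")
print(gbk_bytes.hex(" "))  # two bytes per character in GBK

try:
    gbk_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError as exc:
    valid_utf8 = False
    print("not valid UTF-8:", exc.reason)
```

The first byte, 0xD6, opens a two-byte UTF-8 sequence, but the next byte, 0xD0, is not a continuation byte, so decoding fails immediately.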

"transaction":{"client_ip":"127.0.0.1","time_stamp":"Fri Mar 7 10:44:06 2025","server_id":"ab3506138b340c17b7e7ca7cc54bb28bf63c0013","client_port":55178,"host_ip":"127.0.0.1","host_port":80,"unique_id":"174131544673.294671","request":{"method":"POST","http_version":1.0,"uri":"/case/gbk","body":"{\n \"gbk\":\"Öйú»¶ӭÄã\"\n}","headers":{"X-Real-IP":"127.0.0.1","Host":"127.0.0.1:16880","Connection":"close","Content-Length":"24","Content-Type":"application/json;charset=GBK","User-Agent":"Apache-HttpClient/4.5.14 (Java/17.0.8)","Accept-Encoding":"br,deflate,gzip,x-gzip"}},"response":{"body":"<html>\r\n<head><title>400 Bad Request</title></head>\r\n<body>\r\n<center><h1>400 Bad Request</h1></center>\r\n<hr><center>openresty/1.27.1.1</center>\r\n</body>\r\n</html>\r\n","http_code":400,"headers":{"Server":"openresty/1.27.1.1","Date":"Fri, 07 Mar 2025 02:44:06 GMT","Content-Length":"163","Content-Type":"text/html","Connection":"close"}},"producer":{"modsecurity":"ModSecurity v3.0.14 (Linux)","connector":"ModSecurity-nginx v1.0.3","secrules_engine":"Enabled","components":["OWASP_CRS/4.13.0-dev\""]},"messages":[{"message":"Failed to parse request body.","details":{"match":"Matched \"Operator Eq' with parameter 0' against variable REQBODY_ERROR' (Value: 1' )","reference":"v241,1","ruleId":"200002","file":"/usr/local/nginx/conf/modsecurity/modsecurity.conf","lineNumber":"77","data":"JSON parsing error: lexical error: invalid bytes in UTF8 string.\n","severity":"2","ver":"","rev":"","tags":[],"maturity":"0","accuracy":"0"}}]}}

@AlwaysOnlineSuperMan
Author

AlwaysOnlineSuperMan commented Mar 7, 2025

> Hi @AlwaysOnlineSuperMan,
>
> could you explain the root cause of this issue? Eg. could you share an example of JSON with GBK charset?

As a workaround, we currently add a prefix location in nginx in which a block of Lua code converts the request body from GBK to UTF-8, based on the charset in the Content-Type header, before passing it to ModSecurity. We hope that ModSecurity can support parsing charset=gbk directly.

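location /case {
    # Run in the access phase: content_by_lua_block and proxy_pass are both
    # content handlers and should not be used in the same location.
    access_by_lua_block {
        ngx.req.read_body()
        -- nil if the body is empty or was buffered to a temporary file
        local body = ngx.req.get_body_data()
        local headers = ngx.req.get_headers()
        local content_type = headers["Content-Type"] or ""
        -- a repeated header arrives as a table; use its first value
        if type(content_type) == "table" then
            content_type = content_type[1]
        end
        local lower_content_type = string.lower(content_type)
        if body and lower_content_type:find("charset=gbk", 1, true) then
            local iconv = require "resty.iconv"
            local i, err = iconv:new("UTF-8", "GBK")
            if not i then
                ngx.log(ngx.ERR, "failed to create iconv converter: ", err)
                return
            end
            local utf8_body, count = i:convert(body)
            if utf8_body then
                -- replace the body only when the conversion succeeded
                ngx.req.set_body_data(utf8_body)
                ngx.log(ngx.ERR, "NewReqBody: ", utf8_body)
            else
                ngx.log(ngx.ERR, "GBK to UTF-8 conversion failed")
            end
        end
    }
    proxy_pass http://modsecurity;
}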
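
For readers who want to try the same transcoding step outside nginx, here is a rough Python equivalent of the Lua logic above. It is a sketch under assumptions: `normalize_json_body` is a hypothetical helper name, and the standard-library `email.message.Message` class is used only as a convenient parser for Content-Type parameters.

```python
from email.message import Message


def normalize_json_body(content_type: str, body: bytes) -> bytes:
    """Return the body re-encoded as UTF-8 if the header declares another charset."""
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset()  # lowercased, or None if absent
    if charset is None or charset in ("utf-8", "utf8"):
        return body
    try:
        return body.decode(charset).encode("utf-8")
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name, or bytes that do not match it: pass through unchanged.
        return body


gbk_body = '{"gbk": "中国欢迎你"}'.encode("gbk")
print(normalize_json_body("application/json;charset=GBK", gbk_body).decode("utf-8"))
```

Passing the body through unchanged on failure leaves the decision to the downstream parser, which will still report the invalid bytes.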

@airween
Member

airween commented Mar 10, 2025

@AlwaysOnlineSuperMan,

first, thanks for pointing out a possible workaround.

Regarding the JSON payload that you mentioned: I can't reproduce the issue.

Here is what I did:

$ cat 3347.json 
{"gbk": "中国欢迎你"}

Then I sent this JSON file to the server:

curl -v -H "Content-Type: application/json; charset=GBK" -X POST http://localhost/post -d @3347.json

Note that I use my Nginx with Albedo as a backend - for more info please read the relevant documentation.

Here are the relevant parts of my debug.log.

Rule 200002:

[174161108542.217980] [/post] [4] (Rule: 200002) Executing operator "Eq" with param "0" against REQBODY_ERROR.
[174161108542.217980] [/post] [9] Target value: "0" (Variable: REQBODY_ERROR)
[174161108542.217980] [/post] [4] Rule returned 0.

Rule 920230, which checks ARGS:

[174161108542.217980] [/post] [4] (Rule: 920230) Executing operator "Rx" with param "%[0-9a-fA-F]{2}" against ARGS.
[174161108542.217980] [/post] [9] Target value: "\xe4\xb8\xad\xe5\x9b\xbd\xe6\xac\xa2\xe8\xbf\x8e\xe4\xbd\xa0" (Variable: ARGS:json.gbk)
[174161108542.217980] [/post] [4] Rule returned 0.

So the JSON parser parses the JSON content.
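
The encoding of the target value logged for ARGS:json.gbk can be checked directly. A minimal Python check, added for illustration, shows that the escaped bytes decode cleanly as UTF-8 and differ from the GBK encoding of the same text:

```python
# Bytes copied from the "Target value" line of the debug log.
logged = b"\xe4\xb8\xad\xe5\x9b\xbd\xe6\xac\xa2\xe8\xbf\x8e\xe4\xbd\xa0"

decoded = logged.decode("utf-8")
print(decoded)

# Compare with the GBK form of the same text.
matches_gbk = logged == decoded.encode("gbk")
print(matches_gbk)
```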

I use the current master branch of libmodsecurity3:

$ git describe
v3.0.14

and the libyajl packages from the Debian repository:

$ dpkg -l "*libyajl*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version      Architecture Description
+++-=================-============-============-============================================
ii  libyajl-dev:amd64 2.1.0-5+b2   amd64        Yet Another JSON Library - development files
ii  libyajl2:amd64    2.1.0-5+b2   amd64        Yet Another JSON Library

@AlwaysOnlineSuperMan
Author

After testing, I found that curl does not convert the encoding of the request body, even when charset=GBK is specified in the Content-Type header. The content of 3347.json was therefore sent as UTF-8 in the test above. To reproduce the issue, the request body must first be converted to GBK manually:

iconv -f UTF-8 -t GBK -o gbk.json 3347.json
curl -v -H "Content-Type: application/json; charset=GBK" -X POST http://localhost/post -d @gbk.json
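
The same two steps can be done in Python when iconv and curl are not at hand. This is a sketch: `build_gbk_request` is a hypothetical helper, the URL is the one from the curl command above, and the request is only constructed here, not sent.

```python
import urllib.request


def build_gbk_request(url: str, json_text: str) -> urllib.request.Request:
    """Encode the JSON text as GBK and attach a matching Content-Type header."""
    return urllib.request.Request(
        url,
        data=json_text.encode("gbk"),
        headers={"Content-Type": "application/json; charset=GBK"},
        method="POST",
    )


req = build_gbk_request("http://localhost/post", '{"gbk": "中国欢迎你"}')
print(len(req.data), req.get_header("Content-type"))
```

Passing `req` to `urllib.request.urlopen` would send it; the point of the sketch is that the body bytes are encoded explicitly, since the header alone changes nothing about them.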
