-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
check is valid character in uri #1506
Conversation
你们的原始url里面含有不可打印字符,也没有做UrlEncode的转换,然后签名计算的时候出的问题? |
是签名以后,sign=xxx,这个参数要追加到url中,xxx是openssl计算的。但是openssl是C库,它是和char*,unsigned char*打交道的,要自己转成string,转string的时候多了一位。比如xxx = string(x, l),应该是string(x,l-1) |
自己转string为什么会多转一位呢,这个听起来是转化的问题,做完转化直接把Hex格式的数据追加到url应该不会有问题吧。或者把代码贴上来看看是什么问题 |
多一位是我这边使用openssl不当导致的问题。加这个检验主要是方便debug。后面如果其他人再有给URL不小心引入不可见字符的时候,可以通过错误信息,快速定位问题。 具体我那个问题是用openssl计算base64,用出了问题。调用的那个外部接口要求计算完hmac_sha256后,对结果做一次base64 BIO *bmem, *b64;
BUF_MEM *bptr;
b64 = BIO_new(BIO_f_base64());
bmem = BIO_new(BIO_s_mem());
b64 = BIO_push(b64, bmem);
BIO_write(b64, input, length); // input 是unsigned char* 类型,也就是待编码的二进制数据
BIO_flush(b64);
BIO_get_mem_ptr(b64, &bptr);
// 这里,一开始我写成了std::string output(bptr->data, bptr->length); debug了很久
std::string output(bptr->data, bptr->length-1);
BIO_free_all(b64);
output 会作为签名追加到url中 &sign=output 我这个确实是我使用openssl不当造成的,但是我觉得brpc应该帮助做一些url字符合法性上的检验。我这个PR只是简单处理检验的可打印字符,我也可以再改下,严格起来排除一切不合法的URL字符 |
好的明白了。URI的允许的字符应该在 https://datatracker.ietf.org/doc/html/rfc3986#section-2.2 和 https://datatracker.ietf.org/doc/html/rfc3986#section-2.3 定义了,最好可以封装个函数来检查URI字符的合法性,可以把这个来源放在注释里。 |
好的,我改下 |
@zyearn 代码修改了,另外除了2.3和2.4里面的字符外,还有一个%字符也是合法的,2.1节有描述。一并加到注释里面了。 |
src/brpc/uri.cpp
Outdated
static bool is_valid_char(const char* p) { | ||
static const std::unordered_set<char> other_valid_char = { | ||
':', '/', '?', '#', '[', ']', '@', '!', '$', '&', | ||
'\'', '(', ')'/ '*', '+', ',', ';', '='/ '-', '.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/
-> ,
,是不是补个单测比较好?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
已有的单测,应该就够了吧。我看写的挺全的。
https://github.com/apache/incubator-brpc/blob/master/test/brpc_uri_unittest.cpp
我的修改只要能跑过这些例子就ok。当然也可以补一个非法的字符来check一下,让SetHttpURL失败。或者是其他case(暂时没想到别的case),看owner们是否需要吧。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/ 现成的例子里是有的
, 好像是没有。也可以加一个单测
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
复制时两个/
忘改了吧?是说这个改成,
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
哦哦 。。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
改了。另外发现这里要补一个空格,否则已有的单例会有失败。因为brpc之前允许url里有空格,有skip逻辑
CI好像卡住了,不知道怎么回事 @zyearn |
看了下CI job在排队 |
src/brpc/uri.cpp
Outdated
// https://datatracker.ietf.org/doc/html/rfc3986#section-2.3 | ||
// https://datatracker.ietf.org/doc/html/rfc3986#section-2.4 | ||
// space is not allowed by rfc3986, but allowed by brpc | ||
static bool is_valid_char(const char* p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个函数传char c感觉更直观一点?
src/brpc/uri.cpp
Outdated
'_', '~', '%', ' ' | ||
}; | ||
|
||
return (isalnum(*p) || other_valid_char.find(*p) != other_valid_char.end()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可以直接other_valid_char.count(*p)
@zyearn 改好了,辛苦review |
好的,感谢贡献 :) |
这个PR的目的检查url中是否有不可打印字符。
可打印字符的维基百科:
https://en.wikipedia.org/wiki/ASCII#Printable_characters
之前遇到过一个问题,在调用一个HTTP API接口的时候,要加一个签名字段作为query参数。用openssl计算签名的时候,最后意外多复制了一位,这一位是不可打印的字符。
让后请求接口一直失败,日志打印URL查看也没有问题,因为是不可打印字符,所以无法输出,其实是多了一个隐藏字符。
导致调试了很长时间。
当然可打印的字符中也不是全部都是URL的合法字符,但是其他的非法字符,只要是可打印的都可以很快定位到问题,因此不做考虑。因为判断合法字符的比较繁琐(也容易错漏,引发不必要问题),以下这些都是合法字符:
A-Z
,a-z
,0-9
,-
,.
,_
,~
,:
,/
,?
,#
,[
,]
,@
,!
,$
,&
,'
,(
,)
,*
,+
,,
,;
,%
,=
当然如果你们觉得判断是不是合法字符更合适的话,我也可以改成 is_valid 函数。