Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check is valid character in uri #1506

Merged
merged 6 commits into from
Aug 6, 2021

Conversation

guodongxiaren
Copy link
Member

@guodongxiaren guodongxiaren commented Aug 1, 2021

这个PR的目的检查url中是否有不可打印字符。

可打印字符的维基百科:
https://en.wikipedia.org/wiki/ASCII#Printable_characters

之前遇到过一个问题,在调用一个HTTP API接口的时候,要加一个签名字段作为query参数。用openssl计算签名的时候,最后意外多复制了一位,这一位是不可打印的字符。
让后请求接口一直失败,日志打印URL查看也没有问题,因为是不可打印字符,所以无法输出,其实是多了一个隐藏字符。
导致调试了很长时间。

当然可打印的字符中也不是全部都是URL的合法字符,但是其他的非法字符,只要是可打印的都可以很快定位到问题,因此不做考虑。因为判断合法字符的比较繁琐(也容易错漏,引发不必要问题),以下这些都是合法字符:

A-Z, a-z, 0-9, -, ., _, ~, :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, %, =

当然如果你们觉得判断是不是合法字符更合适的话,我也可以改成 is_valid 函数。

@guodongxiaren
Copy link
Member Author

@zyearn

@zyearn
Copy link
Member

zyearn commented Aug 2, 2021

你们的原始url里面含有不可打印字符,也没有做UrlEncode的转换,然后签名计算的时候出的问题?

@guodongxiaren
Copy link
Member Author

你们的原始url里面含有不可打印字符,也没有做UrlEncode的转换,然后签名计算的时候出的问题?

是签名以后,sign=xxx,这个参数要追加到url中,xxx是openssl计算的。但是openssl是C库,它是和char*,unsigned char*打交道的,要自己转成string,转string的时候多了一位。比如xxx = string(x, l),应该是string(x,l-1)

@zyearn
Copy link
Member

zyearn commented Aug 3, 2021

自己转string为什么会多转一位呢,这个听起来是转化的问题,做完转化直接把Hex格式的数据追加到url应该不会有问题吧。或者把代码贴上来看看是什么问题

@guodongxiaren
Copy link
Member Author

guodongxiaren commented Aug 3, 2021

自己转string为什么会多转一位呢,这个听起来是转化的问题,做完转化直接把Hex格式的数据追加到url应该不会有问题吧。或者把代码贴上来看看是什么问题

多一位是我这边使用openssl不当导致的问题。加这个检验主要是方便debug。后面如果其他人再有给URL不小心引入不可见字符的时候,可以通过错误信息,快速定位问题。

具体我那个问题是用openssl计算base64,用出了问题。调用的那个外部接口要求计算完hmac_sha256后,对结果做一次base64

    BIO *bmem, *b64;
    BUF_MEM *bptr;

    b64 = BIO_new(BIO_f_base64());
    bmem = BIO_new(BIO_s_mem());
    b64 = BIO_push(b64, bmem);
    BIO_write(b64, input, length); // input 是unsigned char* 类型,也就是待编码的二进制数据
    BIO_flush(b64);
    BIO_get_mem_ptr(b64, &bptr);

    // 这里,一开始我写成了std::string output(bptr->data, bptr->length); debug了很久
    std::string output(bptr->data, bptr->length-1);

    BIO_free_all(b64);

output 会作为签名追加到url中 &sign=output

我这个确实是我使用openssl不当造成的,但是我觉得brpc应该帮助做一些url字符合法性上的检验。我这个PR只是简单处理检验的可打印字符,我也可以再改下,严格起来排除一切不合法的URL字符
@zyearn

@zyearn
Copy link
Member

zyearn commented Aug 4, 2021

好的明白了。URI的允许的字符应该在 https://datatracker.ietf.org/doc/html/rfc3986#section-2.2https://datatracker.ietf.org/doc/html/rfc3986#section-2.3 定义了,最好可以封装个函数来检查URI字符的合法性,可以把这个来源放在注释里。

@guodongxiaren
Copy link
Member Author

好的,我改下

@guodongxiaren
Copy link
Member Author

guodongxiaren commented Aug 5, 2021

@zyearn 代码修改了,另外除了2.3和2.4里面的字符外,还有一个%字符也是合法的,2.1节有描述。一并加到注释里面了。
另外发现这里除了合法字符以外,还要补一个空格,否则已有的单例会有失败。因为brpc之前允许url里有空格,有skip掉空格的逻辑

src/brpc/uri.cpp Outdated
static bool is_valid_char(const char* p) {
static const std::unordered_set<char> other_valid_char = {
':', '/', '?', '#', '[', ']', '@', '!', '$', '&',
'\'', '(', ')'/ '*', '+', ',', ';', '='/ '-', '.',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ -> ,,是不是补个单测比较好?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已有的单测,应该就够了吧。我看写的挺全的。
https://github.com/apache/incubator-brpc/blob/master/test/brpc_uri_unittest.cpp

我的修改只要能跑过这些例子就ok。当然也可以补一个非法的字符来check一下,让SetHttpURL失败。或者是其他case(暂时没想到别的case),看owner们是否需要吧。

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ 现成的例子里是有的
, 好像是没有。也可以加一个单测

Copy link
Member

@wasphin wasphin Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

复制时两个/忘改了吧?是说这个改成,.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

哦哦 。。

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

改了。另外发现这里要补一个空格,否则已有的单例会有失败。因为brpc之前允许url里有空格,有skip逻辑

@guodongxiaren
Copy link
Member Author

CI好像卡住了,不知道怎么回事 @zyearn

@zyearn
Copy link
Member

zyearn commented Aug 5, 2021

看了下CI job在排队

src/brpc/uri.cpp Outdated
// https://datatracker.ietf.org/doc/html/rfc3986#section-2.3
// https://datatracker.ietf.org/doc/html/rfc3986#section-2.4
// space is not allowed by rfc3986, but allowed by brpc
static bool is_valid_char(const char* p) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个函数传char c感觉更直观一点?

src/brpc/uri.cpp Outdated
'_', '~', '%', ' '
};

return (isalnum(*p) || other_valid_char.find(*p) != other_valid_char.end());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以直接other_valid_char.count(*p)

@guodongxiaren
Copy link
Member Author

guodongxiaren commented Aug 6, 2021

@zyearn 改好了,辛苦review

@guodongxiaren guodongxiaren changed the title check is printable character in uri check is validcharacter in uri Aug 6, 2021
@guodongxiaren guodongxiaren changed the title check is validcharacter in uri check is valid character in uri Aug 6, 2021
@zyearn
Copy link
Member

zyearn commented Aug 6, 2021

好的,感谢贡献 :)

@zyearn zyearn merged commit b8af1f1 into apache:master Aug 6, 2021
@guodongxiaren guodongxiaren deleted the printable_uri_check branch August 7, 2021 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants