New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework Content-Disposition parsing totally #461
Conversation
This is great! @DoumanAsh could you review this pr. |
let (param_name, new_left) = split_once_and_trim(left, '='); | ||
if param_name.len() == 0 { | ||
return Err(::error::ParseError::Header); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we check for new_left != 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but not necessary. Line 294 header::parse_extended_value
will error out if ext_value
is empty, which includes cases where new_left.len() == 0
. The nested if block starting at L305 falls into the branch starting at L334, if new_left
is empty, which accepts empty values. But check in L342 will ensure it error out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if new_left.len() == 0
it wouldn't fall into the line 289 branch.
I just feel like it would be better to leave earlier in bad case instead of checking it out.
Though I'm not sure should we just ignore such parameter or error....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't? Whether falling into L289 only dependent on the presence of trailing asterisk of param_name
.
Fine, I will add that check in a new commit.
I am also not sure and fail to find related statements in those standards or elsewhere. But I do not think it is really a problem in practice because that is generated by browsers and does not rely on end-users input. Besides, what the lenient parsing implementation wants to do is just ensuring whatever directly submitted by end-users (typically, file names) are parsed correctly as long as browsers produce headers correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you think that the current version is ok as it is unlikely to happen then leave it as it is 😄
left = new_left; | ||
if param_name.ends_with('*') { | ||
// extended parameters | ||
let (param_name, _) = param_name.split_at(param_name.len() - 1); // trim asterisk |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why split_at
instead of slicing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I just forget to use...
escaping = false; | ||
quoted_string.push(c); | ||
} else { | ||
if c == 0x5c // backslash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it would be better to use char literals here instead of raw byte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the following comments.
let mut quoted_string = vec![]; | ||
let mut end = None; | ||
// search for closing quote | ||
for (i, &c) in left.as_bytes().iter().skip(1).enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why use reference for &c
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To dereference &u8
into u8
? I think as_bytes().iter()
produces &u8 instead of u8, am I right?
I am not sure how to write that idiomatically in Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, I see.
If I'm not mistaken though you can omit &
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After change &c
to c
in line 311:
error[E0277]: can't compare `&u8` with `{integer}`
--> src/header/common/content_disposition.rs:316:34
|
316 | if c == 0x5c // backslash
| ^^ no implementation for `&u8 == {integer}`
|
= help: the trait `std::cmp::PartialEq<{integer}>` is not implemented for `&u8`
error[E0277]: can't compare `&u8` with `{integer}`
--> src/header/common/content_disposition.rs:319:41
|
319 | } else if c == 0x22 // double quote
| ^^ no implementation for `&u8 == {integer}`
|
= help: the trait `std::cmp::PartialEq<{integer}>` is not implemented for `&u8`
error[E0308]: mismatched types
--> src/header/common/content_disposition.rs:331:59
|
331 | let quoted_string = String::from_utf8(quoted_string)
| ^^^^^^^^^^^^^ expected u8, found &u8
|
= note: expected type `std::vec::Vec<u8>`
found type `std::vec::Vec<&u8>`
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I was sure Rust would copy it as it is, but it seems you went for right syntax originally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I am often messed up by those rules/implications too.
let mut quoted_string = vec![]; | ||
let mut end = None; | ||
// search for closing quote | ||
for (i, &c) in left.as_bytes().iter().skip(1).enumerate() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe use .chars()
here?
This way you could just collect it into string, instead of going through parsing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what I attempt at first, and then I find that the following split_at
panics. Then I get to know Char
, which is produced by the .chars()
iterator, is different from u8
in Rust. But split_at
takes byte offset instead of Char offset. Seems that this is also not idiomatic... Any suggestions to improve?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see... Yes, in this case chars are nasty as they are themself four bytes and doesn't represent actual character's byte len
} | ||
left = left.split_at(end.ok_or(::error::ParseError::Header)? + 1).1; | ||
left = split_once(left, ';').1.trim_left(); | ||
// In fact, tt should not be Err if the above code is correct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tt
-> it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is quite a work you did! Thank you.
I'm feeling that it would be better to use full fledged parser for Content-Disposition
but since our parsing is quite lenient I suppose it is not really necessary.
One thing I'd like to ask:
How should we handle cases when param's value is missing?
I'd like to add such tests to be sure that we would react correctly
@DoumanAsh Thanks for your patient review! I assume cases where param's value is missing will only appear in |
@Gowee I see that code covers the case, but I didn't notice unit tests for that. |
Several test cases are added and tweaks have been made accordingly. : ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only two minor comments, but overall it is good :)
|
||
impl<'a> From<&'a str> for DispositionType { | ||
fn from(origin: &'a str) -> DispositionType { | ||
match origin.as_bytes().to_ascii_lowercase().as_slice() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it makes it more elegant, but I think it would be better to use non-allocating comparison using eq_ignore_ascii_case
}; | ||
} | ||
cd.parameters | ||
.push(match param_name.to_ascii_lowercase().as_str() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please use eq_ignore_ascii_case
? Know it is not so elegant but even more effectiveness 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. but I am not sure what is the idiomatic way to write that. is a long if
block fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the only way sadly, you may wish to move if block outside of cd.parameters.push
for a better read-ability
|
||
impl fmt::Display for DispositionParam { | ||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | ||
// TODO: More special charaters in filename should be escaped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw we already use regex for example, so if you think we'd need some more characters you could add it here too using regex crate I guess.
@fafhrd91 The commit makes breaking changes to |
Codecov Report
@@ Coverage Diff @@
## master #461 +/- ##
==========================================
+ Coverage 74.05% 74.34% +0.29%
==========================================
Files 107 107
Lines 16790 17050 +260
==========================================
+ Hits 12433 12675 +242
- Misses 4357 4375 +18
Continue to review full report at Codecov.
|
…Dispostion display in content_disposition.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor comment. We also need to consider breaking changes, if @fafhrd91 says it is a bit earlier, we might need to make DispositionParam
without breaking changes.
@@ -467,17 +475,23 @@ impl fmt::Display for DispositionType { | |||
|
|||
impl fmt::Display for DispositionParam { | |||
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result { | |||
// TODO: More special charaters in filename should be escaped. | |||
// All ASCII control charaters (0-30, 127) excepting horizental tab, double quote, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
horizental
-> horizontal
|
I think breaking change is fine for this |
If necessary, could you please help me write CHANGES.md instead? I might fall into a sleep soon 😴 and be a little busy tomorrow. |
@Gowee ahah, ok, I'll add changes myself. Thank you for your work |
This PR is going to solve several bugs that exist in content_disposition.rs and improve the ergonomics of extracting data out of
Content-Disposition
header with actix-web.The previous implementation:
Content-Disposition
if it contains special characters like " (double quote) and ; (semicolon) due to its implementation limits;Content-Disposition
if it contains parameter values in UTF-8 asquoted-string
(e.g.form-data; name=upload; filename="文件.webp"
). UTF-8 inquoted-string
is not compliant with old RFCs, however:Content-Disposition
at all from HTML forms submitted using Chrome and Firefox if non-ASCII characters are present in file name.Content-Disposition
header without escaping some characters, resulting in unexpected file name truncation. For example, if a file called quote"here.zip is served withfs::StaticFiles
, then browsers will see it asquote
with all characters after the double quote stripped.Besides solving these problems listed, this commit also:
Content-Disposition
as response header used to indicate how to open files and file names to be saved as) and [RFC7578] (Content-Disposition
generated for submitting HTML forms ofmultipart/form-data
).Content-Disposition
generated by implementations compliant with RFC6266 and RFC7578 under most circumstances. (But, it could also possibly accept some malformed headers.)filename
andfilename*
equally, and users cannot distinguish between them, which is not wanted in some cases).https://github.com/Gowee/actix-web/blob/c4e29eb1dc0bb3339207f0ca22cbcca48bb46012/src/header/common/content_disposition.rs#L240