Normalize upload file name #606
server/handlers.go
Outdated
func sanitize(fileName string) string {
	return path.Base(fileName)
	t := transform.Chain(norm.NFD, runes.Remove(runes.In(unicode.Mn)), norm.NFC)
I'm not sure about unicode.Mn: we should allow diacritics, and as far as I can see this will drop them.
In general I'm a little lost in Unicode composition/decomposition and compatibility/canonical equivalence.
Can you please explain what the transformation will look like for the following:
straße
tëst
Also, we should apply sanitize only on input, every time, to remove extraneous characters upon saving or processing.
It is especially important not to introduce a breaking change when users try to download files already affected by the sanitization issue.
You're right, my normalization was too aggressive.
With unicode.Mn characters removed we get straße; without removal, straße.
With unicode.Mn characters removed we get test; without removal, tëst.
List of Mn characters: https://www.fileformat.info/info/unicode/category/Mn/list.htm
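The two results above can be reproduced with a minimal stdlib-only sketch. It assumes the input is already in decomposed (NFD) form, because the norm.NFD step from golang.org/x/text is left out here; stripMarks is a hypothetical helper, not the function from the PR.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripMarks (hypothetical name) drops every rune in the Mn
// (nonspacing mark) category. It stands in for
// runes.Remove(runes.In(unicode.Mn)) applied after norm.NFD.
func stripMarks(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.Is(unicode.Mn, r) {
			return -1 // a negative value tells strings.Map to drop the rune
		}
		return r
	}, s)
}

func main() {
	// "te\u0308st" is tëst in decomposed form: 'e' followed by U+0308.
	fmt.Println(stripMarks("te\u0308st")) // test
	// ß (U+00DF) is a letter, not a mark, so it is left alone.
	fmt.Println(stripMarks("straße")) // straße
}
```

This is why removing Mn loses the diaeresis in tëst while straße is unaffected either way.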
> Also, we should apply sanitize only on input, every time, to remove extraneous characters upon saving or processing
As far as I can see, you do this only on input (put, post, zip, gzip, etc.). So a previously uploaded file, even one with '\n\r' in its name, can still be downloaded after this commit.
Filenames can contain a lot of unusual encodings, even UTF-16 or CESU-8, and I'm afraid that this can cause problems in some cases, maybe not with transfer.sh itself but with HTTP servers (frontends). I'm a little bit paranoid.
I will do as you say, but removing newline characters is the most important part.
And I found a better way than using a regexp:
t := transform.Chain(
	norm.NFD,
	runes.Remove(runes.In(unicode.Cc)),
	runes.Remove(runes.In(unicode.Cf)),
	runes.Remove(runes.In(unicode.Co)),
	runes.Remove(runes.In(unicode.Other)),
	runes.Remove(runes.In(unicode.Zl)),
	norm.NFC)
https://pkg.go.dev/unicode#pkg-variables
How about this list?
That's good.
What about adding unicode.Cs and unicode.Zp?
> As far as I can see, you do this only on input (put, post, zip, gzip, etc.).
Please let me double-check; when I looked briefly I was not able to catch every usage :)
> That's good.
> What about adding unicode.Cs and unicode.Zp?
I agree, it's a good suggestion.
great, fine :)
> That's good.
> What about adding unicode.Cs and unicode.Zp?
Added these to the transform chain.
Double-checked the usage of the sanitize function: I see it used only in virustotal, clamav, postHandler, putHandler, zipHandler and tarGzHandler.
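For readers who want to try the category filter without pulling in golang.org/x/text, here is a rough stdlib-only sketch. It covers only the rune-removal part for the categories named in this thread (Cc, Cf, Co, Cs, Zl, Zp) and skips the norm.NFD / norm.NFC steps; dropUnsafe and the removed table list are hypothetical names, not code from the PR.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// removed lists the Unicode categories discussed in the review.
var removed = []*unicode.RangeTable{
	unicode.Cc, // control characters, including \n and \r
	unicode.Cf, // format characters, e.g. zero width space
	unicode.Co, // private use
	unicode.Cs, // surrogates
	unicode.Zl, // line separator
	unicode.Zp, // paragraph separator
}

// dropUnsafe (hypothetical name) removes every rune that belongs to
// one of the categories above and keeps everything else unchanged.
func dropUnsafe(name string) string {
	return strings.Map(func(r rune) rune {
		if unicode.In(r, removed...) {
			return -1 // drop the rune
		}
		return r
	}, name)
}

func main() {
	fmt.Printf("%q\n", dropUnsafe("re\nport\r.txt"))  // "report.txt"
	fmt.Printf("%q\n", dropUnsafe("a\u200bb\u2028c")) // "abc"
	fmt.Printf("%q\n", dropUnsafe("straße tëst.txt")) // "straße tëst.txt"
}
```

Letters with diacritics and ordinary spaces pass through untouched, which matches the requirement raised earlier in the review.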
Hi @rumanzo, thanks for the PR. Please see my comments.
We found a problem caused by overly simple input normalization.
If we look at the filename variable at https://github.com/dutchcoders/transfer.sh/blob/main/server/handlers.go#L253 when we use the GET method with headers, we will see "\n\r" in the variable, and this leads to a runtime error.
I implemented filename normalization and trimming of all newlines in user input in the sanitize function.