@dag10
Contributor

@dag10 dag10 commented May 16, 2025

Currently, if a requested URL has a tilde in its path, clen() reports the length of the tilde as 0, which causes parsing of the request to stop.

Tildes are valid in URIs and don't have to be percent-encoded. It's less clear to me whether ~ is a valid symbol in header values, but no standard seems to preclude it.
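To illustrate the kind of boundary bug described above, here is a minimal sketch. It assumes the check is a character-range comparison with an exclusive upper bound; the real clen() may be structured differently, and the names `char_len_strict`, `char_len_fixed`, and `accepted_prefix` are hypothetical helpers made up for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the check described above: returns the byte
 * length of an accepted character, or 0 if the byte is rejected.
 * Only single-byte ASCII is handled in this sketch. */
static size_t char_len_strict(unsigned char c) {
  /* Exclusive upper bound: '~' (0x7E) falls outside the range. */
  return (c > ' ' && c < '~') ? 1 : 0;
}

static size_t char_len_fixed(unsigned char c) {
  /* Inclusive upper bound: accepts 0x21..0x7E, including '~'. */
  return (c > ' ' && c <= '~') ? 1 : 0;
}

/* Counts how many leading bytes of s a scanner accepts before it meets
 * a byte whose reported length is 0. */
static size_t accepted_prefix(const char *s,
                              size_t (*len_fn)(unsigned char)) {
  size_t i = 0;
  while (s[i] != '\0') {
    size_t n = len_fn((unsigned char) s[i]);
    if (n == 0) break; /* scanner gives up here */
    i += n;
  }
  return i;
}
```

With a path such as "/~user/index.html", the strict variant stops right after the leading slash, while the fixed variant accepts the whole string.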

@dag10
Contributor Author

dag10 commented May 16, 2025

One thing I'm wondering is if tilde being excluded from the valid character set in clen() also happened to prevent unintentionally serving files from the home directory. I'm not familiar enough with the codebase to ensure that such behavior wouldn't be possible as a result of this bugfix, and we'd want to be confident there's no such bug elsewhere in the program before merging this.

That said, if this fix does somehow make such a bug possible, I'd argue that an undocumented, load-bearing bug inside clen() isn't the right place to prevent that behavior anyway.

@scaprile scaprile self-assigned this May 16, 2025
@scaprile
Collaborator

Mongoose runs in several environments; it does not access the filesystem by itself, only at your command. AFAIK we "sanitize" only the double dots, and only in some file-related functions; I don't know the exact reasons for excluding the tilde. I'll check.
We don't claim to be RFC compliant. That said, can you please point to normative references on the use of the tilde? If you have them handy, I mean; otherwise I'll do my own research.

@dag10
Contributor Author

dag10 commented May 16, 2025 via email

@dag10
Contributor Author

dag10 commented May 16, 2025

For URIs, it looks like tildes are explicitly valid, and are even encouraged to be sent without percent-encoding, since server implementations might not normalize percent-encoding before comparing strings (RFC 3986 section 2.3).

They're also valid in the syntax for header field-name and field-value as per RFC 9110 sections 5.1 and 5.5:

  token          = 1*tchar

  tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*"
                 / "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
                 / DIGIT / ALPHA
                 ; any VCHAR, except delimiters

  field-name     = token

  field-value    = *field-content
  field-content  = field-vchar
                   [ 1*( SP / HTAB / field-vchar ) field-vchar ]
  field-vchar    = VCHAR / obs-text
  obs-text       = %x80-FF

VCHAR is defined as any visible ASCII character in the range 0x21-0x7E (! through ~, inclusive) per RFC 5234 appendix B.1.
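The grammar quoted above translates fairly directly into character predicates. This is only a sketch of the ABNF rules themselves, not of how Mongoose validates headers; the function names `is_vchar`, `is_tchar`, and `is_field_vchar` are hypothetical.

```c
#include <stdbool.h>

/* VCHAR = %x21-7E (RFC 5234 appendix B.1) */
static bool is_vchar(unsigned char c) {
  return c >= 0x21 && c <= 0x7E;
}

/* tchar: DIGIT / ALPHA plus the listed punctuation, which includes '~' */
static bool is_tchar(unsigned char c) {
  if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
      (c >= 'a' && c <= 'z')) {
    return true;
  }
  switch (c) {
    case '!': case '#': case '$': case '%': case '&': case '\'':
    case '*': case '+': case '-': case '.': case '^': case '_':
    case '`': case '|': case '~':
      return true;
    default:
      return false;
  }
}

/* field-vchar = VCHAR / obs-text, where obs-text = %x80-FF */
static bool is_field_vchar(unsigned char c) {
  return is_vchar(c) || c >= 0x80;
}
```

Under these rules '~' passes both `is_tchar` (so it may appear in a field name) and `is_field_vchar` (so it may appear in a field value), while delimiters such as ':' are rejected by `is_tchar`.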

@cpq
Member

cpq commented May 19, 2025

@dag10 thank you, this PR looks good
could you sign the CLA at https://cesanta.com/cla.html to have it integrated, please?

@scaprile scaprile merged commit 662cc27 into cesanta:master May 21, 2025
scaprile added a commit that referenced this pull request May 21, 2025
We prepend current path to the URI, so a tilde could not be the first
char in a path. However, the same would happen for double dots, and
since we're already checking for that, it doesn't hurt to be on the safe
side for future's sake.
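
A rough sketch of the kind of check this commit message describes, assuming it means rejecting a path whose first character is a tilde in addition to rejecting ".." segments. The helper name `path_looks_safe` is hypothetical, and the actual check in the codebase may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns false if the path starts with '~' or
 * contains a ".." segment, true otherwise. */
static bool path_looks_safe(const char *path) {
  if (path[0] == '~') return false; /* leading tilde */
  const char *seg = path;
  for (;;) {
    /* Length of the current segment, up to the next separator. */
    size_t n = strcspn(seg, "/\\");
    if (n == 2 && seg[0] == '.' && seg[1] == '.') return false;
    if (seg[n] == '\0') return true; /* last segment checked */
    seg += n + 1;                    /* skip the separator */
  }
}
```

A tilde inside a later segment, as in "./web/~user/index.html", is still accepted; only a path that begins with '~' or contains a ".." segment is refused.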