Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Dumping a string as properly escaped Crystal string literal #9953

Open
straight-shoota opened this issue Nov 21, 2020 · 15 comments
Open

Comments

@straight-shoota
Copy link
Member

The spec command generates Crystal source code looking like this (simplified): %(require "./#{path}").
path is an arbitrary path and may include characters or combinations of characters that lead to the resulting code being either illegal or messed up (for example if it contains ", #{ or {{).
Some problems can be fixed using String#dump. That would escape double quotes, string interpolation and any illegal characters. But it does not excape macro expressions. This could theoretically offer an oportunity for code injection. For the spec runner that's maybe not a relevant threat, but would still cause issues when you're working with funny paths.
I'm not aware of any solutions in the compiler or stdlib that make this safe to work. Except appling manual gsubs which are prone to being incomplete.

(This specific use case could probably also make use of Compiler access to construct the respective AST nodes programmatically instead of letting the parser do that. This should avoid problems with string literal transformation.)

I've faced this problem before with baked_file_system. Since then it has a specific encoder which should make sure the generated string literals are sane and don't try to expand macro expressions: https://github.com/schovi/baked_file_system/blob/master/src/loader/string_encoder.cr

There surely are other cases as well, either in the compiler or in user code (macro run programs would come to mind) where a string is supposed to be written as a string literal in Crystal code and not cause any unexpected effect.

Maybe we could add handling for macro delimiters to String#dump. But I don't think that's good. I it my understanding that String#dump is a pretty generic method to escape a string using common escape sequences and not specific to Crystal syntax. So even escaped interpolation might be unexpected behaviour when you're using this method for other context than Crystal code (or Ruby which has the same syntax).

So perhaps it would be better to have a separate method specifcally for escaping a Crystal string literal. Or it could be a flag on String#dump. I'm not sure what's the best API.

Comments?

@jhass
Copy link
Member

jhass commented Nov 21, 2020

What about a string literal that is immune to any interpretation, wouldn't that solve all these usecases? Something like %l(a literal \ and two literal {{ with two literal }} coming after, literal #{ too followed by a literal }), l for literal, but totally up to discussion of course.

Of course to then interpolate values you would have to use concatenation or String.build directly, but maybe that's not so bad for mostly generated code.

@jhass
Copy link
Member

jhass commented Nov 21, 2020

As a workaround for now I guess you could wrap the generated code into {% verbatim do %} {% end %}

@oprypin
Copy link
Member

oprypin commented Nov 21, 2020

This is just String.inspect, and it should simply be fixed to be safe against macros

@oprypin
Copy link
Member

oprypin commented Nov 21, 2020

it my understanding that String#dump is a pretty generic method to escape a string using common escape sequences and not specific to Crystal syntax.

inspect is made to be specific to Crystal, though

@straight-shoota
Copy link
Member Author

What about a string literal that is immune to any interpretation, wouldn't that solve all these usecases?

So %l would essentially be %q wrapped in verbatim 🤔
It still needs to escape the end delimiter, and currently that's not possible in percent string literals. But can probably be fixed relatively easily (#5403). And the escape method would need to support a configurable end delimiter.

@oprypin Yeah, that's actually a good point 🤔 🤔 I'd really like inspect to return what could be used as crystal code to recreate that exact value. At least where that's possible, primitives, simple data structures. Unfortunately #7525 has lead us in a different direction 😞

But with this use case in mind, it actually fits perfectly I think. Just add macro expression escapes.

@asterite
Copy link
Member

Can you provide an example of code injection? I also don't understand how this can happen given that macros are compile time things. Security concerns with code injection only happen at runtime.

@oprypin
Copy link
Member

oprypin commented Nov 22, 2020

Generally: security vulnerabilities always find a way. Someone will see how to exploit it even if you don't see right now.

Using Crystal compiler & source code as an implementation detail of some service turns this into a run-time injection.

Silly example: there's an online service that lets you download a binary which only prints the text that you supplied to it. It happens to be implemented by creating source code puts #{input.inspect}. You call it with download_binary?text=Hello {%`echo pwned`%}.

@jhass
Copy link
Member

jhass commented Nov 22, 2020

I don't think @straight-shoota's motivation here is runtime security though, but code generation safety.

@asterite
Copy link
Member

But macro code isn't executed inside string literals...

@oprypin
Copy link
Member

oprypin commented Nov 22, 2020

Hmmmm seems so :o

https://carc.in/#/r/9zi8

@straight-shoota
Copy link
Member Author

Both security and safety are a concern here.

Can you provide an example of code injection?

Say I'd run a service that provides ready-built custom hello world programs. Users can enter a name to be greeted and the service then compiles a binary with the user-provided name embedded. Obviously, it uses Crystal as a programming language because it's simply the best for this kind of software ;) When the user-provided name contains unescaped macro expressions, it could be used for code injections. A name like {{ `run_some_program` }} for example could compromise the build service.

Macro expressions in string literals do work inside a macro block:

{% begin %}
  puts "{{`echo ok`}}" # => ok
{% end %}

https://carc.in/#/r/9zi9

@asterite
Copy link
Member

Yes, but how can an string passed by a user get into that macro.

@asterite
Copy link
Member

Oh, I just read it. If you provide user input to a program, I think you have other concerns too, like using a sandbox, etc. At that point, having a library or something that escapes that code sounds good. I'm not sure it belongs to the std.

That said, I wouldn't mind a macro_escape method for String, but we'd have to be careful about explaining when to use it.

@asterite
Copy link
Member

Plus you already have this implemented. I just don't think this is a common use case, to be honest.

@straight-shoota
Copy link
Member Author

There isn't actually much that needs to change. I believe we can agree that the return value of String#inspect should be a valid and self-contained string literal in Crystal. This is almost the case except for macro expressions.
To fix this we have basically two options:

  • escape macro expression delimiters in String#inspect
  • don't evaluate macro expressions in string literals

I'd actually prefer the latter. I don't think it's useful to have macro expansion in literals, it can certainly be surprising. It's also not consistent because it doesn't even work outside of macro bodies (so perhaps it's just a mistake that it works inside?). If you need to expand a macro expression inside a string literal, you can just wrap it in string interpolation. The compiler could even resolve that interpolation at compiler time, so that "#{ {{ foo }} }" is completely equivalent to "{{ foo }}". Reads a bit mor ugly, but this is really a very rare use case. I'd rather have that than lots of escaped curly braces in String#inspect.

Furthermore we should also clarify the use case and behaviour of #dump. I'm actually not sure anymore what it is about.
Currently, it's exactly the same except that all non-ascii characters are also escaped. Crystal code must be UTF-8 and doesn't need unicode characters escaped. So the combunation of unicode escapes + string interpolation escapes doesn't seem to make sense.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants