Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

form_prep() strips backslashes (\) and keeps htmlentities unescaped. #2477

Closed
Dumk0 opened this issue Jun 6, 2013 · 14 comments
Closed

form_prep() strips backslashes (\) and keeps htmlentities unescaped. #2477

Dumk0 opened this issue Jun 6, 2013 · 14 comments

Comments

@Dumk0
Copy link
Contributor

Dumk0 commented Jun 6, 2013

  1. I would like to edit in form field the exact text "foo bar", but when I pass this text to form_input(), then there is field with text "foo bar" on page.
  2. As long as Input class handles all POST, GET, etc superglobals and strips slashes on CodeIgniter load then it is unnecessary to stripslashes on form_prep() again or what was the purpose of that action?

I have made two test cases for form_prep() helper function which fail for me.

        $this->assertEquals(
            'Here is a string containing slashes / and \.',
            form_prep('Here is a string containing slashes / and \.')
        );

        $this->assertEquals(
            'Here is a string containing ampersand &.',
            form_prep('Here is a string containing ampersand &.')
        );

I guess the right way to process the form_prep() values is to convert all chars to entities and escape double quotes. Maybe You have also comments and/or solutions on this.

@narfbg
Copy link
Contributor

narfbg commented Jun 24, 2013

The first one is ... natural, I guess. The second is just not proper - the ampersand doesn't need escaping in a form field.

@AkenRoberts
Copy link
Contributor

Is there a specific problem you're having with the way form_prep() works? Because everything you've said so far is normal behavior for that function. If you have a specific use case where form_prep() is causing problems in your application, that's more helpful.

@Dumk0
Copy link
Contributor Author

Dumk0 commented Jul 9, 2013

Sorry,
case 1. has no "no break space" code shown in description, my problem was with this text foo bar as input string not foo bar.

I can explain my problem with another example, when putting this string into form field using form_field() helper:
<b> html test "double" 'single' /slash/ \backslash\ & char and entity &amp; </b>
as result I get the following text visible in the field on my HTML page:
<b> html test "double" 'single' /slash/ backslash & char and entity & </b>
in other words I would like to edit some text where html entity codes must remain as is, the same with backslashes.

I hope it is more clear now.

@narfbg
Copy link
Contributor

narfbg commented Jul 9, 2013

&amp; will always be visible as & in HTML, unless you put it inside a <pre> tag - there's nothing you can do about that. Also (I'm not 100% sure about that) you'd probably have to write \\ or maybe even a triple backslash in order for it to appear as a backslash, otherwise it's an escape character.

Otherwise, if you don't want backslashes being stripped - then don't use form_prep(). This is a framework, it is supposed to help you with processing input data. And here it seems to me that you want the input data to remain as is. Well, if you don't want it to help you, then don't use the helper functions.

@Dumk0
Copy link
Contributor Author

Dumk0 commented Jul 9, 2013

&amp; is ambiguous but &nbsp; is better example, users don't want to have it replaced to simple space when editing the string again.
Why to use a backslash as escape character in html tag attribute? The answer is you need it is to escape double quotes like <input value="some \"quoted\" value" />, why else, right?

Especially, there is a rule to prevent XSS:

escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute

What problems you'll have if the form_prep will convert htmlentities before placing it to the value attribute?

@narfbg
Copy link
Contributor

narfbg commented Jul 9, 2013

To answer your second question - it's simply not necessary, the only thing that needs escaping in an input field are quotes, since they can break the page.
This brings me to your first - you just noted that quotes need to be escaped. They are, only form_prep() escapes quotes by replacing them with their respective HTML entities - &#39; for a single and &quot; for a double quote, no backslashes needed here.

@Dumk0
Copy link
Contributor Author

Dumk0 commented Jul 11, 2013

And here it seems to me that you want the input data to remain as is.

No, I want the input string to be shown as is, but in html tag attribute it must be escaped securely to avoid breaking the tag.

And if you say that quotes already escaped using replacement to HTML entities why not to escape other symbols as well? In my opinion form_prep() must prepare PHP $var value to be inserted into HTML value attribute. So why I have to do some extra preparation on top of it to have backslashes and htmlentities shown on html page as they are in my $var?

What are your fears about this change? I can do pool request if you wish.

@narfbg
Copy link
Contributor

narfbg commented Jul 11, 2013

Because, that may be what you want, but this doesn't mean it's what everybody wants. It should do what it is supposed to do - escape quotes, not less and not more.

@narfbg
Copy link
Contributor

narfbg commented Jul 11, 2013

Btw, you do realise that you could just use htmlspecialchars() instead of form_prep() to do what you want, don't you?

@Dumk0
Copy link
Contributor Author

Dumk0 commented Jul 11, 2013

Yes, I do. form_prep() is always called from form_input() helper, so I had to overload the function

function form_prep($str = '', $is_textarea = FALSE)
{
    if (is_array($str))
    {
        foreach (array_keys($str) as $key)
        {
            $str[$key] = form_prep($str[$key], $is_textarea);
        }

        return $str;
    }
//  This is how it works in core helper
//  if ($is_textarea === TRUE)
//  {
//      return str_replace(array('<', '>'), array('&lt;', '&gt;'), stripslashes($str));
//  }
//  return str_replace(array("'", '"'), array('&#39;', '&quot;'), stripslashes($str));
    return htmlspecialchars($str);
}

I can go on with this solution, but for me the logic of form_prep() is not what it should be. I understand that it might be a surprise for those who was doing htmlspecialchars() by their own before posting it to the form_input() functions. But if this change will go to next major release, then users anyways have to check migration manual and do changes in their apps accordingly when they upgrade, right.

@AkenRoberts
Copy link
Contributor

After investigating this some more, I believe @Dumk0 is right. form_prep() isn't where it should be.

If I want to enter the literal text &amp; into a form field, that's what I expect to be shown next time. Instead, it turns into & by itself -- which is normal and expected for HTML parsing, but it's not what I want to be shown in my form field. Same reason for the given example of foo&nbsp;bar -- if that was the literal value entered, that's what should be shown again.

So ampersands should be encoded. There should be little concern for double encoding upon submission, since foo&amp;nbsp;bar will be shown literally in the input field as foo&nbsp;bar, which is how it will be sent if the form is submitted again.

Upon testing the encoding of ampersands, however, there is a problem with double encoding when using set_value() and form_input() together. Both functions run the value through form_prep() which does cause double encoding issues.

Regarding stripslashes(), I don't see why it's there. Andrey, you yourself don't even know. ;-) I looked through the history of the form helper file, and you had done a major "removal" of form_prep() prior to reverting it back and making it in its current state. Maybe there is a good reason for stripslashes, but it isn't very evident.

@narfbg
Copy link
Contributor

narfbg commented Jul 12, 2013

I've never said that form_prep() is where it should be, in terms of where the framework uses it under the hood. I just don't agree that it should escape all HTML entities - that isn't the same thing. If we want it to act like htmlspecialchars(), then we shouldn't have it at all - we don't need native function aliases, we need features.

If you want to change what set_value(), form_input() and whatever else there is that uses form_prep(), then based on your reasoning that would most likely be fine. This issue here however is about form_prep() does, not where.

As for stripslashes() - yes, I didn't put it there personally and I wouldn't think of doing it, but also - yes, I'm not changing something that I don't know why somebody put there in the first place.

@ivantcholakov
Copy link
Contributor

A case:

You have a CKEditor attached to a textarea element created with form_textarea(). By using the editor you enter a HTML fragment, on saving this HTML is sanitized (allowed tags) and stored within a database field.

Now let us edit the stored HTML fragment the same way with CKEditor. We take this fragment from the database field and put it as an argument of form_textarea(). As a result, an escaped HTML fragment will be inserted inside the textarea tag. And the editor will show garbage.

Please, reconsider this change, form_prep() with the optional parameter $is_textarea was fine in this case.

@narfbg
Copy link
Contributor

narfbg commented Jan 20, 2015

CKEditor is made to work with raw HTML fields, it doesn't care about form_textarea().

And more importantly, we can't support every edge case, let alone third-party tools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants