
charset in iframe is not utf8 #184

Closed
davidshen84 opened this Issue Sep 23, 2012 · 29 comments

3 participants

@davidshen84

I created an EpicEditor instance with the default settings and entered the text below:

#empty

   content

Then I got the content from the editor using:

var data = editor.exportFile('myfile');

When I posted the data to my server, I saw:

"#empty\n\n  content"

The spaces are incorrectly encoded.

@OscarGodson
Owner

Hmm, I can't reproduce this on my end, and I use it every day with spaces. Do you think adding the UTF-8 meta tag will fix this? Could you send a pull request with a fix? If not, I'll try adding the UTF-8 meta tag in another branch and let you try it out.

@mxswd
mxswd commented Sep 23, 2012

Yeah I get this too. It's really annoying. I had to filter unicode out to fix it.

When I use 4 spaces and export, I get 2 unicode chars at the start of the line. So when I pass it to be formatted, it doesn't see the code block.

I can write up a test tomorrow...

@OscarGodson
Owner

Yeah, tried even changing languages and no luck. What browser, OS, and keyboard setup are you guys on?

@mxswd
mxswd commented Sep 23, 2012

Ok here is the bug.

maxpow4h/EpicEditor@c4969d2

Get the gh-pages branch and apply that patch / make those 2 changes.
Run the Ruby app.rb, or set up a localhost server of your own.

In the editor write up something like

asdf

    asdf

asdf

Then hit the try button. It does an AJAX POST to /api so you can inspect the chars properly (I think WebKit filters the console.log?...)

You will see something like this:

But if you inspect the chars, you should see

This is on Mac, Chrome, Japanese Keyboard in Romaji.

@OscarGodson
Owner

Trying to test it but when I try to install Sinatra:

⨀_⨀ gh-pages EpicEditor $ gem install sinatra
Fetching: rack-protection-1.2.0.gem (100%)
Fetching: sinatra-1.3.3.gem (100%)
Successfully installed rack-protection-1.2.0
Successfully installed sinatra-1.3.3
2 gems installed
/usr/local/rvm/gems/ruby-1.9.3-p194/gems/json-1.7.4/lib/json/ext/parser.bundle: [BUG] Segmentation fault
ruby 1.8.7 (2012-02-08 patchlevel 358) [universal-darwin12.0]

Not much of a ruby guy. looking into it tho

@OscarGodson
Owner

So, I did a test with a node framework (Geddy). I checked my DB and see:

No special characters. This is with the following code:

I then tried looking at what gets output into the browser if I empty the contents raw, but I just see actual spaces:

I'm thinking this is either a Ruby or Japanese keyboard specific thing. Here's a Stack Overflow thread on something similar, if not the same issue:

http://stackoverflow.com/questions/11512592/ruby-markdown-string-processing-issue-with-encoding

The guy there said it was not an EpicEditor issue, but if there's a fix I can make on the front-end side to fix Ruby I'd love to do it. Unfortunately I'm not well versed in Ruby and Sinatra doesn't seem to want to install on my machine. Any help debugging that error or if you use another framework or just raw ruby would help :)

UPDATE:
Oh good, the new Skitch Evernote links don't work at all, creating broken images. Fantastic.

@mxswd
mxswd commented Sep 23, 2012

Ok, can you try using this

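var express = require('express');
var app = express();

// Parse request bodies into req.body.
app.use(express.bodyParser());
app.post('/api', function(req, res) {
  var r = req.body.text;
  // Force the posted text through ASCII so non-ASCII characters show up garbled.
  var b = new Buffer(r, 'ascii');
  console.log(b.toString());
  res.send("ok");
});

// Serve the editor page and its assets from this directory.
app.use("/", express.static(__dirname));

app.listen(3000);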

And using a POST like this

$.ajax({
    url: '/api',
    type: 'POST',
    contentType: 'application/json',
    data: JSON.stringify({"text": editor.exportFile()}),
});

That will treat it like ASCII, so if you see this in your terminal.

Then we both have the bug.

I don't think it's a Ruby bug, but it could be something with my keyboard. I'm still testing that out.

EDIT: The post on Stack Overflow has a server-side fix:

bad = "#{194.chr}#{160.chr}".force_encoding('utf-8')
good = 32.chr
self.body = body.gsub(bad, good)

So I think these are the chars we should be looking at debugging.
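The two bytes in that Ruby snippet, 194 and 160, are the UTF-8 encoding of U+00A0 (the no-break space). A minimal sketch to confirm this, assuming an environment that provides the standard `TextEncoder` (modern browsers or a recent Node); `utf8Bytes` is a hypothetical helper name:

```javascript
// Encode a string as UTF-8 and return its bytes as plain numbers.
function utf8Bytes(str) {
  return Array.from(new TextEncoder().encode(str));
}

console.log(utf8Bytes('\u00a0')); // no-break space: [ 194, 160 ]
console.log(utf8Bytes(' '));      // ordinary space: [ 32 ]
```

So a server that sees bytes 194 160 where it expected 32 is most likely receiving a no-break space rather than a corrupted regular space.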

@davidshen84

@OscarGodson my system configuration is pure English with language supporting Chinese. And I was inputting English using English keyboard setting.

@mxswd
mxswd commented Sep 24, 2012

I just tried the U.S. keyboard on Safari and got the same bug.

@OscarGodson
Owner

Well, in the example you have above you're converting the encoding to ASCII, right? So, no matter what the encoding was before, it'll be messed up because you're converting a rich encoding down to ASCII. This is why, by default, this works out of the box in Node (express, geddy, etc). You're manually trying to convert to ASCII.

The original ticket says "charset in iframe is not utf8". There's a simple way to test this:

editor.getElement('editor').characterSet

Put that in your console on http://epiceditor.com or locally or wherever. For me it says UTF-8. Does it for you two?

I believe your Rails(?) app or other kind of Ruby app is not setting the character set header correctly or at all... I think.

When you load the app that's giving you this issue, is Content-Type: text/html; charset=UTF-8 set in the headers? In Node, if you don't force ASCII, UTF-8 is the default, which is why I'm not seeing this in my apps.

Can you put that JS in your console and also see what headers your app is sending out?
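For the header half of that check, here is a small sketch that pulls the charset out of a Content-Type header value once you have copied it from the network panel. The function name `charsetOf` is hypothetical and the parsing is deliberately simple; it is not a full header parser.

```javascript
// Extract the charset parameter from a Content-Type header value.
// Returns the charset in lower case, or null when none is declared.
function charsetOf(contentType) {
  var match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : null;
}

console.log(charsetOf('text/html; charset=UTF-8')); // "utf-8"
console.log(charsetOf('text/html'));                // null
```

A `null` result would mean the app is not declaring a character set at all, which is the situation described above.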

@OscarGodson
Owner

@davidshen84 what do you see when you do the characterSet call in your app in Chrome's console? Is it the same or different?

@mxswd
mxswd commented Sep 24, 2012

I think it is a problem with the editor. Try this on the default docs page.

tryItBtn.onclick = function () {
  var str = editor.exportFile();
  var i = 0;
  for (i = 0; i < str.length; i++) {
    console.log(str.charCodeAt(i));
  }
}

Where it should have 4 "32"'s for the 4 spaces, it has "32" "160" "32" "160". The "160"'s are the broken chars...
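The same inspection can be narrowed to just the suspicious characters. This is a sketch only; `findNonAscii` is a hypothetical helper, and the sample string stands in for whatever `editor.exportFile()` returns:

```javascript
// Return the position and char code of every character outside the ASCII range.
function findNonAscii(str) {
  var hits = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code > 127) {
      hits.push({ index: i, code: code });
    }
  }
  return hits;
}

// Stand-in for an exported line whose 4-space indent came back as 32 160 32 160.
var sample = ' \u00a0 \u00a0asdf';
console.log(findNonAscii(sample)); // two hits, at index 1 and 3, both with code 160
```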

@OscarGodson
Owner

Awh, 160 is &nbsp; for the editor pane. I think I figured it out... WebKit converts non-breaking spaces to Unicode characters when you ask for innerText. Firefox gives the actual HTML entity in our code. I need to replace \u00a0 with ' '.
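A minimal sketch of that replacement, assuming it runs on the text read from the editor pane before it is returned to the caller. `normalizeSpaces` is a hypothetical name, not necessarily what the actual commit uses:

```javascript
// Replace every no-break space (U+00A0) with an ordinary space (U+0020).
function normalizeSpaces(text) {
  return text.replace(/\u00a0/g, ' ');
}

var exported = normalizeSpaces(' \u00a0 \u00a0asdf');
console.log(exported === '    asdf'); // true
```

The `g` flag matters here: without it only the first no-break space in the text would be replaced.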

OscarGodson added a commit that referenced this issue Sep 24, 2012
Ticket #184 - Fixed character encoding bug where weird characters would appear in some cases on some platforms because they didnt understand no-break spaces (such as ascii)
5a239dd
@OscarGodson
Owner

The fix is in develop. Can you guys check it out and see if that actually does fix it? Now the tryItBtn only returns 4 32 character codes.

Thanks a lot @maxpow4h !!!

@mxswd
mxswd commented Sep 24, 2012

Awesome! Thanks for the fix!

@davidshen84

I think the issue has not been fixed completely. When I edit the content,

  • then preview, the preview is correct;
  • then I export the file and post the data to my service, the data is correct;

But when I try to edit the content again, the 4 spaces are gone. This issue does not reproduce on http://epiceditor.com/, which does not have the latest developer version.

I think the editor needs to retain the char(160) for HTML to render the content in the editor view correctly.

@OscarGodson
Owner

I think it has something to do with how you're saving it or giving it back. The develop branch is working as expected:
http://screencast.com/t/6oHLdyi5

Tested on Firefox too. You can't test against epiceditor.com because you're passing it through a backend and DB. Could you show me with jing maybe? Maybe let me see your logs or give me the exact steps to repro it?

@maxpow4h Did you ever test this? Did the fix work for you or not?

In the meantime I'll setup a quick geddy + mongo "app" and see if I can reproduce this.

@OscarGodson
Owner

Awh, yep. I was able to repro this by doing some stuff with geddy. Working on it! Something with the importFile method

OscarGodson added a commit that referenced this issue Sep 30, 2012
Ticket #184 - Fixed importing content where it wasn't readding the no-break space in chrome which is a unicode character \u00a0. Firefox was fine because we already were replacing spaces with &nbsp;s
c680e28
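One way to picture the import direction is the sketch below. It is an assumption for illustration, not EpicEditor's actual importFile code: it turns each pair of ordinary spaces into a space followed by U+00A0, which matches the 32 160 32 160 pattern seen earlier in the thread and leaves no two ordinary spaces adjacent for HTML to collapse. `spacesToEditorText` is a hypothetical name.

```javascript
// Hypothetical helper: prepare plain text for display in an HTML editor pane
// by replacing the second space of every pair with a no-break space.
function spacesToEditorText(text) {
  return text.replace(/ {2}/g, ' \u00a0');
}

var imported = spacesToEditorText('    asdf');
var codes = [];
for (var i = 0; i < 4; i++) {
  codes.push(imported.charCodeAt(i));
}
console.log(codes); // [ 32, 160, 32, 160 ]
```

Converting the no-break spaces back to ordinary spaces on export then gives the original text again, as long as the original contained no U+00A0 of its own.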
@OscarGodson
Owner

OK @davidshen84 could you pull and try again :) *fingers crossed* - it worked for me in my quick little blogging app I just did.

@johnmdonahue we need some tests to test that spaces are preserved correctly. Not sure how to test that tho off the top of my head. If you ever get time let me know if you have any good ideas on how to test all these spacing cases.

@mxswd
mxswd commented Sep 30, 2012

@OscarGodson ah, I only export file, I don't import, so I never noticed.

edit: pulled latest version and added an importFile, works for me fine too.
