Revealing the magic: How to properly convert HTML string to a DOM element #6

Open
HuarenYu opened this issue Mar 23, 2016 · 0 comments

This seems like a trivial task, but it is not. There are a few popular solutions, but they share a serious problem that you will run into if you use them often.
Let's say that we have the following HTML markup:

<section><a href="#">Link label</a></section>

If you google the problem, you will find that the most popular solution sets the .innerHTML property of a newly created element.

var notWorking = function(html) {
    var el = document.createElement('div');
    el.innerHTML = html;
    return el.childNodes[0];
}
console.log(notWorking('<section><a href="#">Link label</a></section>'));

The result of the above script is actually correct: you get the section element with the link inside it. Everything seems fine and it looks like the job is done. Now let's try something else:

console.log(notWorking('<tr><td>Text Here</td></tr>'));

The result is only:

Text Here

Hm ... the <tr> and <td> tags are missing. That seems wrong, doesn't it? Actually, it is more or less the correct behaviour for the browser. I searched for similar problems and landed on an article by Eric Vasilik. Here is what he said back in 2006:
"When one sets the innerHTML property of an element, the string containing the HTML is run through the parser."
"Now, parsing something like "<tr><td>Foo" where there is no TABLE tag preceding the TR causes the parser to ignore the TR tag altogether."
So I realized that the string passed to .innerHTML shouldn't be just an arbitrary mix of tags. It should be a valid DOM representation. If I want to use the function above, I should pass the <table> tag as well. So I tried the following code:

console.log(notWorking('<table><tr><td>Text Here</td></tr></table>'));

The result:

<table>
    <tbody>
        <tr>
            <td>Text Here</td>
        </tr>
    </tbody>
</table>

This confirms what Eric Vasilik wrote. My method should somehow examine the passed HTML and automatically add the necessary wrapper tags. It should also return not the first child node but the correctly nested element. That looks like a lot of work.
OK, but wait a minute. jQuery doesn't have this problem. You may pass a bare <tr> string like the one above and still get the desired result. I started digging into jQuery's code and found this:

// We have to close these tags to support XHTML (#13200)
wrapMap = {
    option: [ 1, "<select multiple='multiple'>", "</select>" ],
    legend: [ 1, "<fieldset>", "</fieldset>" ],
    area: [ 1, "<map>", "</map>" ],
    param: [ 1, "<object>", "</object>" ],
    thead: [ 1, "<table>", "</table>" ],
    tr: [ 2, "<table><tbody>", "</tbody></table>" ],
    col: [ 2, "<table><tbody></tbody><colgroup>", "</colgroup></table>" ],
    td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ],
    // IE6-8 can't serialize link, script, style, or any html5 (NoScope) tags,
    // unless wrapped in a div with non-breaking characters in front of it.
    _default: jQuery.support.htmlSerialize ? [ 0, "", "" ] : [ 1, "X<div>", "</div>"  ]
},
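
To make the shape of these entries concrete, here is a minimal sketch of how one of them could be applied to a string. It copies only two entries from the map above, and wrapFragment is a hypothetical helper name for this illustration, not a jQuery function. Each entry is read as [depth, prefix, suffix].

```javascript
// A subset of the map above. Each entry is [depth, prefix, suffix].
var wrapMap = {
    tr: [ 2, "<table><tbody>", "</tbody></table>" ],
    td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ],
    _default: [ 0, "", "" ]
};

// wrapFragment is a hypothetical helper used only for this sketch.
// It surrounds the HTML with the parents that the given tag requires.
function wrapFragment(tagName, html) {
    var entry = Object.prototype.hasOwnProperty.call(wrapMap, tagName)
        ? wrapMap[tagName]
        : wrapMap._default;
    return { depth: entry[0], html: entry[1] + html + entry[2] };
}

var wrapped = wrapFragment('td', '<td>Text Here</td>');
console.log(wrapped.depth); // 3
console.log(wrapped.html);  // <table><tbody><tr><td>Text Here</td></tr></tbody></table>
```

The wrapped string is what the parser can accept without dropping the td, and the depth records how many wrapper levels were added around it.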

This looks like the magic that solves the problem. I cheated a bit and borrowed the code above, extracting that functionality into a single method:

var str2DOMElement = function(html) {
    /* code taken from jQuery */
   var wrapMap = {
        option: [ 1, "<select multiple='multiple'>", "</select>" ],
        legend: [ 1, "<fieldset>", "</fieldset>" ],
        area: [ 1, "<map>", "</map>" ],
        param: [ 1, "<object>", "</object>" ],
        thead: [ 1, "<table>", "</table>" ],
        tr: [ 2, "<table><tbody>", "</tbody></table>" ],
        col: [ 2, "<table><tbody></tbody><colgroup>", "</colgroup></table>" ],
        td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ],

        // Anything without a special parent requirement is wrapped in a plain div
        _default: [ 1, "<div>", "</div>"  ]
    };
    wrapMap.optgroup = wrapMap.option;
    wrapMap.tbody = wrapMap.tfoot = wrapMap.colgroup = wrapMap.caption = wrapMap.thead;
    wrapMap.th = wrapMap.td;
    var element = document.createElement('div');
    var match = /<\s*\w.*?>/g.exec(html);
    if(match != null) {
        // Keep only the tag name: drop the angle brackets and any attributes
        var tag = match[0].replace(/</g, '').replace(/>/g, '').split(' ')[0].toLowerCase();
        var map = wrapMap[tag] || wrapMap._default;
        html = map[1] + html + map[2];
        element.innerHTML = html;
        // Descend through wrappers to the right content
        var j = map[0]+1;
        while(j--) {
            element = element.lastChild;
        }
    } else {
        // if only text is passed
        element.innerHTML = html;
        element = element.lastChild;
    }
    return element;
}
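
Finding the root tag name is the step most sensitive to variations in the input, such as attributes, upper-case tag names, or leading whitespace. As a minimal sketch, the lookup could be isolated in a small helper. The name rootTagName and the regular expression are assumptions made for this illustration and are not part of jQuery or of the function above.

```javascript
// rootTagName is a hypothetical helper used only for this sketch.
// It returns the lower-cased name of the tag that opens the string,
// or null when the string does not start with a tag.
function rootTagName(html) {
    // Optional whitespace, "<", optional whitespace, then the tag name.
    var match = /^\s*<\s*([a-zA-Z][a-zA-Z0-9-]*)/.exec(html);
    return match ? match[1].toLowerCase() : null;
}

console.log(rootTagName('<tr class="row"><td>Text Here</td></tr>')); // tr
console.log(rootTagName('<TD>Text Here</TD>'));                      // td
console.log(rootTagName('just text'));                               // null
```

Unlike a search for the first tag anywhere in the string, this sketch is anchored at the start, so a string that begins with plain text is treated as having no root tag. That is a design choice of the sketch rather than something the original code does.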

jQuery's map nicely shows exactly what I should wrap my string in. A few lines of code find the root tag and look up its entry. At first I wondered what the numbers in the wrapMap object were for. Later I found this:

j = wrap[0];
while ( j-- ) {
    tmp = tmp.lastChild;
}

That is the code that returns the needed DOM element from the built tree, and the numbers are the levels of nesting. It is pretty simple, but I think it is one of the most used features of jQuery.
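
The descent can be traced without a browser. The sketch below uses plain objects in place of DOM nodes and models only the lastChild property. The helpers node and descend are hypothetical names for this illustration, and it assumes the walk starts at the container element whose innerHTML was set.

```javascript
// Plain objects stand in for DOM nodes; only lastChild is modelled.
function node(name, lastChild) {
    return { nodeName: name, lastChild: lastChild || null };
}

// The tree produced by wrapping a <tr> string with the tr entry:
// div > table > tbody > tr
var tr = node('TR');
var container = node('DIV', node('TABLE', node('TBODY', tr)));

// descend is a hypothetical helper: follow lastChild a fixed number of times.
function descend(start, steps) {
    var current = start;
    while (steps--) {
        current = current.lastChild;
    }
    return current;
}

var depth = 2; // the first number in the tr entry of wrapMap
console.log(descend(container, depth).nodeName);     // TBODY
console.log(descend(container, depth + 1).nodeName); // TR
```

Descending exactly depth levels stops at the innermost wrapper, which is the parent of the parsed content. One more step reaches the element itself, which is why str2DOMElement starts its counter at map[0]+1.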

There is a special case when you want to create a new body tag. The function above doesn't work then, because a div element cannot have a body inside it. Here is the fixed version:

var str2DOMElement = function(html) {
   var wrapMap = {
        option: [ 1, "<select multiple='multiple'>", "</select>" ],
        legend: [ 1, "<fieldset>", "</fieldset>" ],
        area: [ 1, "<map>", "</map>" ],
        param: [ 1, "<object>", "</object>" ],
        thead: [ 1, "<table>", "</table>" ],
        tr: [ 2, "<table><tbody>", "</tbody></table>" ],
        col: [ 2, "<table><tbody></tbody><colgroup>", "</colgroup></table>" ],
        td: [ 3, "<table><tbody><tr>", "</tr></tbody></table>" ],
        body: [0, "", ""],
        _default: [ 1, "<div>", "</div>"  ]
    };
    wrapMap.optgroup = wrapMap.option;
    wrapMap.tbody = wrapMap.tfoot = wrapMap.colgroup = wrapMap.caption = wrapMap.thead;
    wrapMap.th = wrapMap.td;
    var match = /<\s*\w.*?>/g.exec(html);
    var element = document.createElement('div');
    if(match != null) {
        var tag = match[0].replace(/</g, '').replace(/>/g, '').split(' ')[0];
        if(tag.toLowerCase() === 'body') {
            // Create the body element directly, since a div cannot contain one
            var body = document.createElement("body");
            // keeping the attributes
            element.innerHTML = html.replace(/<body/g, '<div').replace(/<\/body>/g, '</div>');
            var attrs = element.firstChild.attributes;
            body.innerHTML = html;
            for(var i=0; i<attrs.length; i++) {
                body.setAttribute(attrs[i].name, attrs[i].value);
            }
            return body;
        } else {
            var map = wrapMap[tag.toLowerCase()] || wrapMap._default;
            html = map[1] + html + map[2];
            element.innerHTML = html;
            // Descend through wrappers to the right content
            var j = map[0]+1;
            while(j--) {
                element = element.lastChild;
            }
        }
    } else {
        element.innerHTML = html;
        element = element.lastChild;
    }
    return element;
}