Children and self-closing tags #598

Closed
Yomguithereal opened this issue Nov 13, 2014 · 2 comments

@Yomguithereal

Hello,
I've stumbled upon a small issue while parsing XML files with cheerio, and I do not know whether this behaviour is intended or whether there is an underlying problem.

If you try to retrieve the children of a tag that contains self-closing children, you only get the children up to and including the first self-closing child.

However, I noticed that some tags that are commonly self-closed, such as img, do not suffer from the same problem. I therefore think a whitelist is used for those tags, and that self-closing tags outside it are not handled.

Here is a tiny script illustrating the problem (using cheerio 0.18.0):

var cheerio = require('cheerio');

var $scxml = cheerio.load('<div><folder></folder><one /><two /><three /></div>'),
    $imgxml = cheerio.load('<div><folder></folder><img /><one /><two /><three /></div>'),
    $noscxml = cheerio.load('<div><folder></folder><one></one><two></two><three></three></div>');

console.log('Starting test...\n');

// With self-closing tags
console.log('With self-closing tags:');
$scxml('div').first().children().each(function() {
  console.log('--' + $scxml(this)[0].name);
});

// With self-closing tags and probably whitelisted ones
console.log('\nWith img and self-closing tags:');
$imgxml('div').first().children().each(function() {
  console.log('--' + $imgxml(this)[0].name);
});

// Without self-closing tags
console.log('\nWithout self-closing tags:');
$noscxml('div').first().children().each(function() {
  console.log('--' + $noscxml(this)[0].name);
});

console.log('\nDone');

And here is the console output of that script:

Starting test...

With self-closing tags:
--folder
--one

With img and self-closing tags:
--folder
--img
--one

Without self-closing tags:
--folder
--one
--two
--three

Done
@sharmalalit

Self-closing tags are not working as expected in cheerio.
Consider HTML that contains a self-closing tag:

var cheerio = require('cheerio'),
    $ = cheerio.load('<div> <img src="a.jpg" /> </div>');

// Serialize the loaded document (the instance returned by load, not the module)
var output = $.html();

The output here is <div> <img src="a.jpg"> </div>: the self-closing slash is dropped.

If the HTML content contains some XML, the browser does not load the broken markup.

@Yomguithereal
Author

Never mind, I found that setting the xmlMode option to true solves this problem.

var $ = cheerio.load('XMLSTRING', {xmlMode: true});
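
For anyone wondering why xmlMode changes the result, here is a rough sketch of what is probably going on. It is a toy model only, not cheerio's or htmlparser2's actual code: the directChildren helper and the abbreviated VOID_TAGS list are made up for illustration. The assumption is that in HTML mode the trailing slash is ignored on non-void tags, so <one /> is treated as an opening tag and the following siblings end up nested inside it, whereas in XML mode every self-closing tag really closes.

```javascript
// Toy model only: NOT cheerio's or htmlparser2's real implementation.
// Abbreviated, illustrative list of HTML void elements (hypothetical constant).
const VOID_TAGS = new Set(['img', 'br', 'hr', 'input', 'meta', 'link']);

// Hypothetical helper: returns the tag names of the direct children
// of the first top-level element found in `markup`.
function directChildren(markup, xmlMode) {
  const root = { name: '#root', children: [] };
  const stack = [root];
  // Captures: optional leading slash, tag name, optional trailing slash.
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*?(\/?)>/g;
  let match;
  while ((match = tagPattern.exec(markup)) !== null) {
    const isClosing = match[1] === '/';
    const name = match[2];
    const hasSlash = match[3] === '/';
    if (isClosing) {
      // Pop back to the matching open element, if there is one.
      for (let i = stack.length - 1; i > 0; i--) {
        if (stack[i].name === name) {
          stack.length = i;
          break;
        }
      }
      continue;
    }
    const node = { name: name, children: [] };
    stack[stack.length - 1].children.push(node);
    // HTML mode: only void elements close themselves; the slash is ignored.
    // XML mode: any tag written with a trailing slash closes itself.
    const selfCloses = VOID_TAGS.has(name) || (xmlMode && hasSlash);
    if (!selfCloses) {
      stack.push(node);
    }
  }
  const first = root.children[0];
  return first ? first.children.map(function (child) { return child.name; }) : [];
}

const selfClosing = '<div><folder></folder><one /><two /><three /></div>';
console.log(directChildren(selfClosing, false)); // [ 'folder', 'one' ]
console.log(directChildren(selfClosing, true));  // [ 'folder', 'one', 'two', 'three' ]
```

Under these assumptions the toy model reproduces the three outputs from the original report, including the img case, which suggests the "whitelist" is simply the set of HTML void elements.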
