I am using Goutte to scrape a couple of sites. A few of them serve UTF-8 content but send only "text/html" as the Content-Type, with no charset. DomCrawler then assumes ISO-8859-1, which results in double-encoded UTF-8 strings in the returned DOMDocument (and in the results of text() and so on).
Right now I am working around this by extending Goutte\Client and overriding createCrawlerFromContent, calling the parent method with ";charset=UTF-8" appended to the type when it has no charset attribute. That is probably not a very good way to do it, so I didn't want to open a pull request just yet.
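For reference, a minimal sketch of that workaround. The class name `Utf8DefaultClient` and the helper `withDefaultCharset` are made up for illustration and are not part of Goutte. The sketch assumes a Goutte version in which `createCrawlerFromContent($uri, $content, $type)` is a protected method inherited from the BrowserKit client, and a Composer autoloader at `vendor/autoload.php`.

```php
<?php

require 'vendor/autoload.php'; // assumption: Goutte installed via Composer

use Goutte\Client;

/**
 * Hypothetical helper: append a charset to a Content-Type value that
 * does not declare one. An empty type is returned unchanged so that
 * DomCrawler can still apply its own default for it.
 */
function withDefaultCharset($type, $charset = 'UTF-8')
{
    $type = (string) $type;

    // Leave the value alone if it is empty or already names a charset.
    if ($type === '' || stripos($type, 'charset=') !== false) {
        return $type;
    }

    return $type . '; charset=' . $charset;
}

/**
 * Hypothetical subclass: treats responses without a declared charset as UTF-8.
 */
class Utf8DefaultClient extends Client
{
    protected function createCrawlerFromContent($uri, $content, $type)
    {
        // Hand the (possibly amended) Content-Type on to the stock implementation.
        return parent::createCrawlerFromContent($uri, $content, withDefaultCharset($type));
    }
}
```

It is then used like the stock client, e.g. `$crawler = (new Utf8DefaultClient())->request('GET', $url);`. Note that this forces UTF-8 for every response that omits a charset, so it is only appropriate when the scraped sites are known to serve UTF-8.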
My main point is that this took me quite a while to figure out, and Goutte could probably be more convenient, and save other new users from falling into the same trap, by letting users specify an encoding. Besides that, thanks for a great library!
@aleksblendwerk, this can be what you're after:
Maybe Goutte/DomCrawler can already do that and I just don't know what the relevant setting is called.