Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Newsfeed module cannot show description right. Wrong HTML handling? #2712

Closed
Jochbart opened this issue Nov 18, 2021 · 37 comments · Fixed by #2736
Closed

Newsfeed module cannot show description right. Wrong HTML handling? #2712

Jochbart opened this issue Nov 18, 2021 · 37 comments · Fixed by #2736
Labels

Comments

@Jochbart
Copy link

Hi together,
somehow, the newsfeed module cannot handle the RSS feed right. I tried different sources but every time the same at my MagicMirror:

image

Thank you in advance.

@sdetweil
Copy link
Collaborator

can u post the rss url

@Jochbart
Copy link
Author

Jochbart commented Nov 18, 2021

@sdetweil
Copy link
Collaborator

thanks.. the data is encoded, but the module does not attempt to decode it

<item>
<title>
<![CDATA[ &raquo;Thomaidis Gewerbepark Hanau&laquo; fast fertig ]]>
</title>
<link>https://www.main-echo.de/7421848?utm_source=rssfeed&utm_medium=rss&utm_campaign=rss-aschaffenburg</link>
<dc:creator>
<![CDATA[ Pressestelle der Stadt Hanau ]]>
</dc:creator>
<pubDate>Thu, 18 Nov 2021 16:54:09 +0000</pubDate>
<description>
<![CDATA[ &raquo;Da&shy;mit er&shy;f&auml;hrt der Ha&shy;nau&shy;er Ha&shy;fen den gr&ouml;&szlig;&shy;ten Ent&shy;wick&shy;lungs&shy;schub seit lan&shy;gem&laquo;, freut sich Ha&shy;n&shy;aus Ober&shy;b&uuml;r&shy;ger&shy;meis&shy;ter Claus Ka&shy;mins&shy;ky &uuml;ber die An&shy;sied&shy;lung des in&shy;ha&shy;ber&shy;ge&shy;f&uuml;hr&shy;ten Spe&shy;di&shy;ti&shy;ons&shy;un&shy;ter&shy;neh&shy;mens Gebr. Tho&shy;mai&shy;dis GmbH auf dem ehe&shy;ma&shy;li&shy;gen Ca&shy;bot-Ge&shy;l&auml;n&shy;de im Ha&shy;nau&shy;er Ha&shy;fen. Der Im&shy;mo&shy;bi&shy;lien&shy;in&shy;ves&shy;tor und Pro&shy;jekt&shy;ent&shy;wick&shy;ler Al&shy;pha In&shy;du&shy;s&shy;trial schafft dort bis En&shy;de 2021 den &raquo;Tho&shy;mai&shy;dis Ge&shy;wer&shy;be&shy;park Ha&shy;nau&laquo; auf dem 3,5 Hektar gro&shy;&szlig;en Areal in der Jo&shy;sef-Bautz-Stra&shy;&szlig;e, das er vom Main-Kin&shy;zig-Kreis er&shy;warb. ]]>
</description>
<content:encoded>
<![CDATA[ &raquo;Da&shy;mit er&shy;f&auml;hrt der Ha&shy;nau&shy;er Ha&shy;fen den gr&ouml;&szlig;&shy;ten Ent&shy;wick&shy;lungs&shy;schub seit lan&shy;gem&laquo;, freut sich Ha&shy;n&shy;aus Ober&shy;b&uuml;r&shy;ger&shy;meis&shy;ter Claus Ka&shy;mins&shy;ky &uuml;ber die An&shy;sied&shy;lung des in&shy;ha&shy;ber&shy;ge&shy;f&uuml;hr&shy;ten Spe&shy;di&shy;ti&shy;ons&shy;un&shy;ter&shy;neh&shy;mens Gebr. Tho&shy;mai&shy;dis GmbH auf dem ehe&shy;ma&shy;li&shy;gen Ca&shy;bot-Ge&shy;l&auml;n&shy;de im Ha&shy;nau&shy;er Ha&shy;fen. Der Im&shy;mo&shy;bi&shy;lien&shy;in&shy;ves&shy;tor und Pro&shy;jekt&shy;ent&shy;wick&shy;ler Al&shy;pha In&shy;du&shy;s&shy;trial schafft dort bis En&shy;de 2021 den &raquo;Tho&shy;mai&shy;dis Ge&shy;wer&shy;be&shy;park Ha&shy;nau&laquo; auf dem 3,5 Hektar gro&shy;&szlig;en Areal in der Jo&shy;sef-Bautz-Stra&shy;&szlig;e, das er vom Main-Kin&shy;zig-Kreis er&shy;warb. ]]>
</content:encoded>
</item>

@sdetweil
Copy link
Collaborator

sdetweil commented Nov 18, 2021

the spec says html-entities are allowed.. these seem language specific.

&shy;
&auml;
&ouml;
&laquo;
&uuml;
&szlig;

but they display correctly here (5 of the 6)
­
ä
ö
«
ü
ß

none of the decoders I can find handle those.

there is an approach it seems

create a textarea html object,
post this as the html content
extract the text content
https://stackoverflow.com/questions/7394748/whats-the-right-way-to-decode-a-string-that-has-special-html-entities-in-it/7394787


function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

@rejas
Copy link
Collaborator

rejas commented Nov 18, 2021

ah, good old german umlauts :-)

@rejas
Copy link
Collaborator

rejas commented Nov 18, 2021

but the "­" dont get decoded neither :-(

@Jochbart
Copy link
Author

the spec says html-entities are allowed.. these seem language specific.

&shy;
&auml;
&ouml;
&laquo;
&uuml;
&szlig;

but they display correctly here (5 of the 6) ­ ä ö « ü ß

none of the decoders I can find handle those.

there is an approach it seems

create a textarea html object, post this as the html content extract the text content https://stackoverflow.com/questions/7394748/whats-the-right-way-to-decode-a-string-that-has-special-html-entities-in-it/7394787


function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Hi sdetweil,
thanks for your help. I will try to realize your mentioned solution, but I am still a big noob if it is about programming and its logic. But I cannot imagine that I am the only German user of this module and has this problem. How can this be?

@sdetweil
Copy link
Collaborator

I was not suggesting YOU solve this.

I confirmed the problem and the cause.

The newsreader module does not handle the encoded description.

I do not know how to resolve it. The few things I tried do not provide a fix.

@rejas
Copy link
Collaborator

rejas commented Nov 19, 2021

@Jochbart you need to add

encoding: "ISO-8859-1",

to the section of that newsfeed in your config.js. Then the umlaute etc should appear correctly.

@Jochbart
Copy link
Author

@Jochbart you need to add

encoding: "ISO-8859-1",

to the section of that newsfeed in your config.js. Then the umlaute etc should appear correctly.

I added this already and my news headline is right.

@rejas
Copy link
Collaborator

rejas commented Nov 20, 2021

Happy to hear that. So this issue can be closed @Jochbart ? Or would you say that the documentation would need some clarification regarding this?

@Jochbart
Copy link
Author

Happy to hear that. So this issue can be closed @Jochbart ? Or would you say that the documentation would need some clarification regarding this?

No, we miss understood us. I added this already before I opened this issue. It helped me to encode the news headline right, but I have this problem in the news description. You can see it in the screenshot I posted: „wächst“ is right, but the description is still wrong with html commands.

@rejas
Copy link
Collaborator

rejas commented Nov 20, 2021

Ah I see, yes, the description still has the encoding issues.

Not sure if it is something the MM should fix (replacing the special html entities with "correct" chars) or if that is something the used iconv-lite library needs to adress.

What doy ou think @sdetweil ?

@sdetweil
Copy link
Collaborator

sdetweil commented Nov 20, 2021

iconv is used to decode the entire fetch payload, so it 'should' have worked on the description too..

response.body.pipe(iconv.decodeStream(encoding)).pipe(parser);

but adding debug shows title not converted correctly

"title": " �PNV-Ausschuss:

which matches iconv doc

Untranslatable characters are set to � or ?. No transliteration is currently supported.

so I think something else would have to be done.
and iconv has already mangled the content

@rejas
Copy link
Collaborator

rejas commented Nov 28, 2021

I noticed, that using something like "ä" (ä) in the title field of the newsfeed-config isnt displayed correctly, while using the same chars in the title field of a calendar module entry is displayed as "ä".

Maybe this has something to do with the newsfeed module using the nunjuck templates while the calendar module creates it DOM "by hand" (with javascript)?

Will have to investigate further...

@rejas
Copy link
Collaborator

rejas commented Nov 28, 2021

... so adding " | safe" to the tags in the nunjuck template would solve the issue and display the umlauts like they are supposed to.

Question is: Does this open up security issues in the case of a malicious newsfeed? What are your opinions about that @sdetweil @MichMich @khassel ?

@MichMich
Copy link
Collaborator

Yeah that's a good one. I'm not a big fan of adding thousands of option flags. But maybe this should be based on a toggle: allowSuperUnsafeContentToBeDisplayedWhileIThrowAwayAllMySecurityAndIPromiseNotToSueAnyMagicMirrorContributor = true

😅

@sdetweil
Copy link
Collaborator

I have no experience with the issues raised, so no idea

@khassel
Copy link
Collaborator

khassel commented Nov 28, 2021

my knowledge about nunjuck tends to zero, so can't say what implications the "safe" tag would have ...

@rejas
Copy link
Collaborator

rejas commented Dec 21, 2021

Yeah that's a good one. I'm not a big fan of adding thousands of option flags. But maybe this should be based on a toggle: allowSuperUnsafeContentToBeDisplayedWhileIThrowAwayAllMySecurityAndIPromiseNotToSueAnyMagicMirrorContributor = true

😅

Updated my PR with a new config option "dangerouslyDisableAutoEscaping" :-)

@JoeSouthern
Copy link

JoeSouthern commented Jan 2, 2022

There was a similar question here - https://forum.magicmirror.builders/topic/16100/character-set-for-news-fed-text-apos/9

Here is how I solved it -

I actually made a fix for this that works fine for me - Not sure how to present it - but in the newsfeed.js make the following changes

  1. At the end of the defaults of the newsfeed.js add the line replaceMe: [] as shown below -

    logFeedWarnings: false,
    replaceMe: []       
    

    },

  2. In the getTemplateData: function () { add the following before the return { loaded: true,

    basically everything in the jep section all between the //*******
    return item;

     });
     //*******
     //jep  to fix title for various translations such as
     // a simple ' instead of showing &apos;
     // also replace things like Seattle with Seattle, WA
     var tempTitle = item.title; // jepFixTitle(item.title);
    
     for(let i = 0; i <  this.config.replaceMe.length; i+= 2)
     {
         tempTitle = tempTitle.toString().replaceAll(this.config.replaceMe[i], this.config.replaceMe[i+1]);
     }
     //**********************
    
     return {
    
         loaded: true,
    

2A. In the return section - change the jep line as shown below

publishDate: moment(new Date(item.pubdate)).fromNow(),
title: tempTitle, //jep see above
description: item.description,

  1. Finally - in your config.js - the area for the newsfeed - at the end add this and edit it for the items you want changed...

replaceMe: [ "'", "'", "Seattle", "Seattle, WA", "Biden", "(Pres) Biden", "Zuckerberg", "Zuckerberg [DATA]"]

This will replace the 1st item with the 2nd item, etc.... Add whatever translations you want (: I've had some fun with the replacements.....

(it's (the forum) reformating the 1st item in the array - it should be without space - & a p o s ; )

@MichMich
Copy link
Collaborator

MichMich commented Jan 2, 2022

In that case it might be smarter to let the user set an optional transform callback. That callback can then be set in the config and can do anything you want. :)

@JoeSouthern
Copy link

Not sure what you mean. If you don't want it to transform though, your array in the config file can be blank (or not exist) and it won't transform anything - the default in the newsfeed.js is a blank array. But, I'm probably not understanding. I'm kind of a newbie on the MM.

@sdetweil
Copy link
Collaborator

sdetweil commented Jan 2, 2022

he is suggesting to let you specify that replace routine and its data IN the config.js, and MM would call it if present..(optional)

that way everything is outside MM, and the routine can be as fancy or not as needed (up to the user)

transform means change (which u are doing)

@JoeSouthern
Copy link

I'm not sure exactly how to do that.
That was my first idea to contain it in the config but I had to intercept the call for the title of the newsfeed prior to being pushed to MM and replace characters at that point.
Any way you do it I think you're stuck with an edit in the newsfeed file. I am 'no' expert though!!! (:

@sdetweil
Copy link
Collaborator

sdetweil commented Jan 2, 2022

yes, have to change newsfeed..

but it could look like this in config.js

module:
config: {
   newStrings:[]  // your array
   replaceFunc: function(old_title){ 
              let newtitle=oldtitle
              for(let i in newStrings){
                newtitle=newtitle.replaceAll(newStrings[i], newStrings[i+1])
              }
              return newtitle
              }

same code u had in newsfeed.js
replaceFunc is whatever is the code in mm change

its code is

   if(module.config.replaceFunc)
      title=module.config.replaceFunc(old_title)

@JoeSouthern
Copy link

Interesting. Yeah, I worked it a bit to try to minimize changes to the newsfeed, but this reduces it further. Nice Sam.

@sdetweil
Copy link
Collaborator

sdetweil commented Jan 2, 2022

Mich's idea, i was just commenting on how it might look..

do we need 2 handlers,

one for title,
one for contents?

@JoeSouthern
Copy link

And yes - thanks Mitch - I'm still learning what I can get away with in the config file (:

Yes, two handlers I'd suspect - for me - I'm just displaying titles so that's as far as I went, but I did think about that for the contents. Should be just a duplicate of the process I set up I'd think. Run it through the same array. The description is at the same place as where I intercepted the title.

I actually thought about having another array that was a 'notification' - same principal as what I already did.

So you have an another array that has trigger words - so say the headline has 'Market Drops', or 'Inflation' etc - then you have it display a notification (at the top of MM) that would then display it longer - say 1 minute. I have my headlines at about 15 seconds I think. It seems that would also be a cool feature that would only require slight additions to what I showed.

I hope what I set up (and perhaps modified as you two showed/suggested) works for people.

@MichMich
Copy link
Collaborator

MichMich commented Jan 2, 2022

I would recommend one. Which could be something like:

config: {
    // ...
    itemTransformer: function (item) {
        item.title = item.title.toUpperCase()
        item.description = item.description.replace('foo', 'bar')

        return item
    }
}

(disclaimer: code not tested. Just as an example)

If a method becomes huge, it could always be extracted to an external file.

This same principle could be used for the calendar module to solve the issues like reoccurrence count.

@sdetweil
Copy link
Collaborator

sdetweil commented Jan 2, 2022

@MichMich yes, cool.. I am loathe to expose object cause you don't know what other trouble you will get into.. but
whatever..

@MichMich
Copy link
Collaborator

MichMich commented Jan 2, 2022

The implementation in the module itself is super simple, the default transformer is just:

(item) => item

After all the transformations we should filter out any item that is null, that way, if a transformer returns null, the item gets removed. This way a user can use the transformer to filter the array.

@MichMich
Copy link
Collaborator

MichMich commented Jan 2, 2022

@MichMich yes, cool.. I am loathe to expose object cause you don't know what other trouble you will get into.. but
whatever..

Oh it will definitely bring users in trouble, but it's an awesome way to learn javascript. And the answer is simple: "your transformer is messed up" :)

Of course it's optional. So it won't bother new users.

It might be smart to do to make it part of the feeds module, or allow a generic itemTransformer, and one per feed. That way you have the option to apply the transformer on specific feeds.

@rejas
Copy link
Collaborator

rejas commented Jan 2, 2022

js code in config... that would be the first here, right?

the answer to people who run into this might be simple. but until then they ask in the forum, scratch their head, then get the answer, but then have to fiure out how to correct it... i see more work then less work :-(

@sdetweil
Copy link
Collaborator

sdetweil commented Jan 2, 2022

yeh, fun times...

i will add the new field (function) to the newsfeed form for MMM-Config..

and we should probably make a pinned forum topic, if we publish this kind of thing.

@MichMich
Copy link
Collaborator

MichMich commented Jan 2, 2022

I think since it's optional people will only know about it when they read the docs. It might be worth adding a big note there: this is for advanced users and needs some self-solving ability. :)

@stale
Copy link

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants