Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Newsfeed: Convert HTML entities to characters #3092

Conversation

peterkappelt
Copy link

Hey everyone, a local newsfeed includes HTML entities (like " and some Umlauts) rather than normal characters in their titles.

Looks quite strange:
image

This PR adds some code to properly convert those entities (basically anything starting with & and ending with ;) in a safe manner, while not requiring dangerouslyDisableAutoEscaping to be set.

Let me know if you've got any thoughts!

@khassel
Copy link
Collaborator

khassel commented Apr 17, 2023

I'm not sure but reading this issue and this PR this should already be possible by setting the newsfeed config option dangerouslyDisableAutoEscaping: true.

@peterkappelt can you test this? Or provide the feed url so I can test this?

@rejas maybe we should add the missing config option to the docs?

@peterkappelt
Copy link
Author

peterkappelt commented Apr 17, 2023

Hi,
yes, dangerouslyDisableAutoEscaping does indeed fix the issue - but at the cost of allowing any HTML and making the whole think prone to XSS and stuff (hence the dangerously...).

I'd prefer to not enable that flag on a production-like system. The current pull request is just making sure that the special HTML entities are displayed properly.

This is a feed that includes " and sometimes some other HTML entities for Umlauts and ticks: https://feeds.feedburner.com/blogspot/rkEL

@khassel
Copy link
Collaborator

khassel commented Apr 21, 2023

@rejas if you find some time (and because you implemented #2736):

I'm not very happy having 2 implementations to render html in the newsfeed title so I like to know your thoughts about this.

If we accept this PR we should revert the dangerouslyDisableAutoEscaping option.

On the other side we already have the | safe command in several nunjucks templates in other default modules and the whole mm project is not really safe.

@codecov-commenter
Copy link

Codecov Report

Merging #3092 (14ed764) into develop (7e58b38) will increase coverage by 0.01%.
The diff coverage is 0.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##           develop    #3092      +/-   ##
===========================================
+ Coverage    25.94%   25.96%   +0.01%     
===========================================
  Files           53       53              
  Lines        11572    11583      +11     
===========================================
+ Hits          3002     3007       +5     
- Misses        8570     8576       +6     
Impacted Files Coverage Δ
modules/default/newsfeed/newsfeed.js 0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes

@rejas
Copy link
Collaborator

rejas commented May 5, 2023

My 2 cents in a hurry:
Maybe have some kind of "level" (like in the log function) for displaying newsfeed content instead of the boolean "dangerouslyDisableAutoEscaping" option only?
so something like "convertContent" with options for

  • using the method here
  • using dangerouslyDisableAutoEscaping method

@peterkappelt
Copy link
Author

Maybe have some kind of "level" (like in the log function)

If everyone agrees that that's the preferred solution I could definitely implement it that way, if it would help getting this PR merged.

Honestly, I don't really see the purpose of keeping an option that is neither documented nor recommended and prefixed dangerously. I assume it was created for some edge cases like the one mentioned in the initial comment, where some unusual characters are not displayed properly. IMHO those cases should be handled individually though (this PR handling all &quot;-like entities should already cover a lot of edge cases). Just telling people to set dangerouslyDisableAutoEscaping (which allows arbitrary code injection, like the feed owner injecting JS) when those kind of rendering errors show up is not the right way.

@sdetweil
Copy link
Collaborator

sdetweil commented Jun 5, 2023

@peterkappelt the problem is embedded chars in the newsfeed titles in sone countries, German umlats in particular. the default handling drops them.

to get it to work, requires enabling some translation function which does NOT CHECK for things that could be code, which could be a backdoor into your system.

which ugly choice do you prefer?

@rejas
Copy link
Collaborator

rejas commented Sep 13, 2023

hi @peterkappelt could you check out #3191 if that would solve your problem too?

rejas pushed a commit that referenced this pull request Sep 13, 2023
related to PR [#3092](#3092)

maybe best way is using `html-to-text` library

sample: `&quotHello World&quot` will be transformed to `"Hello World"`
@sdetweil
Copy link
Collaborator

sdetweil commented Oct 4, 2023

Maybe have some kind of "level" (like in the log function)

If everyone agrees that that's the preferred solution I could definitely implement it that way, if it would help getting this PR merged.

Honestly, I don't really see the purpose of keeping an option that is neither documented nor recommended and prefixed dangerously. I assume it was created for some edge cases like the one mentioned in the initial comment, where some unusual characters are not displayed properly. IMHO those cases should be handled individually though (this PR handling all &quot;-like entities should already cover a lot of edge cases). Just telling people to set dangerouslyDisableAutoEscaping (which allows arbitrary code injection, like the feed owner injecting JS) when those kind of rendering errors show up is not the right way.

the reason not documented is we don't want everyone using it.. someone will report on the forum and we will advise them and explain the risks..

@rejas
Copy link
Collaborator

rejas commented Oct 4, 2023

Closing this. We have a similar PR merged. If @peterkappelt needs, he can reopen this to keep the discussion running again

@rejas rejas closed this Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants