Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search result shows hidden content parts #377

Open
mariofischer opened this issue Mar 19, 2021 · 10 comments
Open

Search result shows hidden content parts #377

mariofischer opened this issue Mar 19, 2021 · 10 comments

Comments

@mariofischer
Copy link

We updated flowpack/elasticsearch-contentrepositoryadaptor to 7.0.6 (from 6.0.6). In the search result there are now content-parts (elements from the content tree) shown which are hidden in Neos (Before that the result did not show the hidden parts).
These elements are part of the indexed fulltext.

@mariofischer
Copy link
Author

Additional clarification: The problem seems to be that the content of a node is being indexed into the page fulltext even if the parent node is hidden.

@dfeyer
Copy link
Contributor

dfeyer commented Apr 9, 2021

This issue duplicate this one:
#214 (comment)

@dfeyer dfeyer closed this as completed Apr 9, 2021
@kdambekalns
Copy link
Member

Am no longer sure this is a duplicate. Even though #214 is quite old, I am seeing this issue now after an upgrade of ESCR from 6 to 7 – now hidden nodes are returned that were not found before.

Fact: Both indexes (old and new) do contain data about the nodes, and the data itself does not differ (both are not marked as hidden in the index.)

The question thus is: Did the code fetching the nodes from the ES result change in that regard? It does not look like it for me: 6.0.6...7.0.9#diff-5310f1e603b84f1f7463df8670be1896e0cf5f7c3a32b4aa0a8f2a374c5eb6c4L833-L872

Does that mean the CR changed it's behaviour? But I did not update that… 🤔

@daniellienert
Copy link
Contributor

Hey @kdambekalns, thats quite interesting. Obviously we have several related problems here:

  • Hidden nodes, found by ES are resolved by the CR (The hack, why it kind of worked before)
  • Nodes, hidden by there parent are found in general
  • Fulltext extracts of nodes, hidden by there parent is indexed and even worse, displayed in result snippets

I am also pretty sure, that this can be solved with current CR as somehow flowquery is also capable to handle it correctly.

Wat do you think about a brainstorming session to discuss some approaches?

@kitsunet
Copy link
Member

I would also be interested then :)

@kdambekalns
Copy link
Member

Wat do you think about a brainstorming session to discuss some approaches?

It would be a pleasure. Suggest a date/time starting this Friday that suits you. :)

@TheLalaMan
Copy link
Contributor

Any news on this? Our editors also noticed this recently...

@mficzel
Copy link
Member

mficzel commented Nov 10, 2022

I wonder wether the issue is in the elasticsearch or in the convertHitsToNodes method.

In older versions we often had the issue that we wanted the first 10 results but only got 8 because during mapping of the elastic result to a list of nodes the nodes with hidden parents were thrown out. This was a problem aswell but way harder to spot.

Checked further ... the method used internally is the $contextNode->getNode($nodePathFromElasticSearch) ... imho getNode should return null for nodes that are not accessible and am pretty sure it did in the past.

@mficzel
Copy link
Member

mficzel commented Nov 10, 2022

As a hotfix the aspect here https://gist.github.com/mficzel/292d6dab09ea0ef9017b804ad49f97a2 removes the items with hidden parents from the results. Totally not nice but it kinda works ...

@dlubitz
Copy link
Contributor

dlubitz commented Mar 23, 2023

Is there already some solution discussed? Our customers also complain about "hidden (by parent) content nodes" which appear in search results.

I had another look into the diff 6.0.6...7.0.9#diff-5310f1e603b84f1f7463df8670be1896e0cf5f7c3a32b4aa0a8f2a374c5eb6c4L833-L872 as @kdambekalns already had. It might be the case that before these changes the hidden nodes never were indexed, because the context was using the default of invisibleContentShown instead of (now) setting it to invisibleContentShown => true.

See:

I guess this was necessary to handle the time controlled visibility of nodes. But then IMHO it misses some check if some node in the parent path is hidden.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants