We are trying the Crawler, and we noticed that our Next 14 site is not being indexed.
The problem is probably that many of our nested components render text inside <div> instead of <p>.
I realize this isn't ideal in terms of accessibility and semantics, but we have this need.
Looking at the source code (general-purpose.ts, https://github.com/askorama/crawly/blob/2892e473775a408495d07a0dea016ec23a85d362/src/general-purpose.ts#L34-L51), we realized that the contents of the <div>s are ignored entirely.
In fact, @gioboa and I ran a test modifying your function to add <div>s to the query, but noisy, non-useful DOM elements were indexed as well, so that doesn't seem like a viable solution.
Proposed Solution
We thought an interesting idea might be to let users decide what content to index outside of your rules.
A very simple hypothetical solution could be to add a data-orama attribute to the elements you want indexed on your site, and extend the crawler to also query those elements:
<div data-orama> content </div>
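On the crawler side, the opt-in query could be sketched roughly like this (a hypothetical sketch; `DEFAULT_SELECTORS`, `ORAMA_ATTR`, and `buildContentSelector` are illustrative names, not Crawly's actual code):

```typescript
// Illustrative defaults standing in for the crawler's current query.
const DEFAULT_SELECTORS: string[] = ["p", "h1", "h2", "h3", "li", "td"];
const ORAMA_ATTR = "data-orama";

// Build the CSS selector string the crawler would pass to querySelectorAll().
function buildContentSelector(includeOptIn: boolean): string {
  const selectors = [...DEFAULT_SELECTORS];
  if (includeOptIn) {
    // Only elements explicitly marked by the site author are added,
    // avoiding the noise that querying all <div>s would bring in.
    selectors.push(`[${ORAMA_ATTR}]`);
  }
  return selectors.join(", ");
}
```

The key point is that `[data-orama]` is an attribute selector, so it matches only elements the site author has opted in, regardless of tag name.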
I think it might be a simple, clean, and powerful way to extend it.
What do you think?
Alternatives
Another future solution could be to allow the crawler function to be completely customized by users.
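For instance, the crawler could accept a user-supplied extraction hook. This is a minimal sketch under assumed names (`CrawlerOptions`, `extract`, `crawlPage` are hypothetical, not part of Crawly's current API):

```typescript
// Hypothetical types standing in for the crawler's internals.
type Page = { url: string; html: string };
type Doc = { url: string; content: string };

interface CrawlerOptions {
  // Users supply their own extraction logic; the crawler only fetches pages.
  extract: (page: Page) => Doc[];
}

function crawlPage(page: Page, options: CrawlerOptions): Doc[] {
  return options.extract(page);
}

// Example: a user-supplied extractor that naively strips tags,
// so <div> content is indexed without any special-casing.
const docs = crawlPage(
  { url: "https://example.com", html: "<div data-orama>Hello</div>" },
  {
    extract: (p) => [
      { url: p.url, content: p.html.replace(/<[^>]+>/g, "").trim() },
    ],
  }
);
```

This would make the selector question moot, since each site could decide exactly what gets indexed.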
Additional Context
No response
This is a great idea! I made a PR to the repo to add custom selectors, and I will ping you when it's merged and these options are also added on the website.