Proof of concept using elastic suite for product listing #1228

Closed

Swahjak
Contributor

@Swahjak Swahjak commented Dec 10, 2018

References #1205

As stated in #1205, it could be useful to use Elasticsuite (Elasticsearch) as a backend for the product listing. For me there are two points of interest:

  • Possible performance gain: limiting the number of queries to the database, or bypassing the database altogether.
  • This could to some extent replace the product flat index: when too many attributes are used for product listing, the flat tables can no longer be used.

This is not a pull request that should be reviewed as a feature request, but rather as a proof of concept. I'm trying to figure out the direction that is preferred if this would ever be considered as a serious feature. I tried to use a minimal amount of code to get to a point where the full collection is provided by Elastic Suite.

Some initial insights / questions:

  • The index source should have a list of attributes that are used for product listing. The proposal is based on attributes_indexed, which is far from optimal.
  • Is there a specific reason for having the attributes values as an array in the source document?
  • Should this be part of ElasticsuiteCatalog or should it be developed in a separate folder?

Please share your thoughts on how, and if, this could become a serious feature.

@romainruaud
Collaborator

Hello @Swahjak

Wow, thank you for taking some time to write it. It's not that far from what I would have done myself.

So again, thank you.

Do you have any metrics with/without this one ?

I see some things that will probably not work very well, or which remain to be discussed:

  • for composite products, attributes are merged inside ES, even those that "should not" support multiple values: e.g. name is an array for a bundle product. Given this, I'm not sure how this could work. Maybe we should have another "version" of the fields containing the data to be used in product listing?

  • actually I'm not sure about proper price injection inside product data, maybe this should remain as a join on the price index table. We need to think about it imho.

  • same thing for stock data, should we fetch it from ES or keep the join, I'm unsure.

  • people may encounter strange issues when using the fulltext collection and trying to retrieve data for attributes that are not used in product listing. This can be confusing, and since the data will not exist in the ES index in this case, there is no proper solution (or we could have a hybrid fetch from ES with a fallback to the DB... but I'm not a huge fan of this solution).

  • and last but not least: is it still relevant to do such things when the new standard is PWA/GraphQL and we will soon have headless Magento directly triggering the Search API?
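The first point above (attributes of composite products being merged into arrays) can be illustrated with a minimal sketch. This is not Elasticsuite code: the `merge_composite` function, the sample products, and the merge rule (collect distinct values per attribute) are all assumptions made only to show why a single-valued attribute such as name ends up as an array.

```python
# Hypothetical illustration, not Elasticsuite's actual indexing logic.
def merge_composite(parent, children):
    """Merge child attribute values into the parent document.

    Every attribute becomes a list of distinct values, in order of appearance.
    """
    merged = {}
    for doc in [parent, *children]:
        for attr, value in doc.items():
            values = merged.setdefault(attr, [])
            if value not in values:
                values.append(value)
    return merged


# Made-up sample data: a bundle and two of its options.
bundle = {"name": "Starter Kit", "color": "black"}
options = [
    {"name": "Cable", "color": "black"},
    {"name": "Charger", "color": "white"},
]

doc = merge_composite(bundle, options)
print(doc["name"])   # a list holding the bundle name and the option names
print(doc["color"])  # a list of the distinct colors
```

With data shaped like this, a listing page cannot tell which entry of name belongs to the bundle itself, which is why a separate set of fields holding only listing data is being considered.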

So there are still many points to discuss on this one.

Regards, thank you again for sharing your point of view !

@Swahjak
Contributor Author

Swahjak commented Dec 10, 2018

@romainruaud

You are welcome, thank you for your efforts in making this request possible 😉

Do you have any metrics with/without this one ?
Not yet

  • for composite products, attributes are merged inside ES, even those that "should not" support multiple values: e.g. name is an array for a bundle product. Given this, I'm not sure how this could work. Maybe we should have another "version" of the fields containing the data to be used in product listing?

I see. I haven't gotten that far yet; so far I have only worked with simple products.

  • actually I'm not sure about proper price injection inside product data, maybe this should remain as a join on the price index table. We need to think about it imho.
  • same thing for stock data, should we fetch it from ES or keep the join, I'm unsure.

I'm not sure. For starters, I think we would be better off leaving it out. I think observers / plugins would go a long way, but maybe not far enough. We could hook into both Magento index processes, which should give us near-realtime stock and pricing data.

  • people may encounter strange issues when using the fulltext collection and trying to retrieve data for attributes that are not used in product listing. This can be confusing, and since the data will not exist in the ES index in this case, there is no proper solution (or we could have a hybrid fetch from ES with a fallback to the DB... but I'm not a huge fan of this solution).

I agree that it might be a hard sell for a broader audience; maybe this could / should be more of a side project outside of this repo.

  • and last but not least: is it still relevant to do such things when the new standard is PWA/GraphQL and we will soon have headless Magento directly triggering the Search API?

APIs will still heavily rely on the database, and they might actually be better suited for use with ES since they are more predictable. Look at https://github.com/DivanteLtd/vue-storefront for instance, which heavily relies on Elasticsearch. I'm not sure about GraphQL though. I'm quite sure that Magento's current API is hardly suited for any serious PWA (though I hope GraphQL will change that). The other question is how long it will take for the general public to start using PWA as their general storefront. This could easily take a (couple of) year(s).

@romainruaud
Collaborator

romainruaud commented Jan 15, 2019

During the Christmas holidays I thought about this one.

Here is another approach, which I prefer:

  • create a new DataSource
  • this data source will automatically compute indexing for all attributes that are marked as "used_in_product_listing"
  • it will wrap this data inside a meta field on the ES index. Something like _listing, allowing us to have _listing.name and so on... this way, there is no collision with the legacy search index data.
  • _listing sub fields should have a lightweight mapping, something like index : false or not_analyzed or both (to be checked). This should allow proper indexing, and since they are used only for data loading, there is no need to mix them with the analyzed fields.
  • and finally, the same logic for the _loadAttributes part, to extract the data contained in the _listing field for each attribute.
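A minimal sketch of the mapping and loading steps described above, written in Python for brevity rather than as Elasticsuite PHP code. The function names `build_listing_mapping` and `load_listing_attributes`, the choice of the `keyword` type, and the sample search hit are assumptions for illustration; only the `_listing` field name and the `index: false` idea come from the comment itself.

```python
# Hypothetical sketch; Elasticsuite does not ship these functions.
def build_listing_mapping(attribute_codes):
    """Build a mapping fragment for a `_listing` object field.

    Each sub field is stored in the source document but not indexed for
    search (`"index": False`), since it is only used for data loading.
    The `keyword` type is an assumption; real attributes may need other types.
    """
    return {
        "_listing": {
            "type": "object",
            "properties": {
                code: {"type": "keyword", "index": False}
                for code in attribute_codes
            },
        }
    }


def load_listing_attributes(hit, attribute_codes):
    """Extract listing values from a search hit's `_source._listing` object.

    Attributes missing from the document come back as None.
    """
    listing = hit.get("_source", {}).get("_listing", {})
    return {code: listing.get(code) for code in attribute_codes}


# Made-up search hit, shaped like an Elasticsearch response entry.
hit = {
    "_id": "42",
    "_source": {
        "name": ["Starter Kit", "Cable"],  # legacy, merged search field
        "_listing": {"name": "Starter Kit", "small_image": "/s/k/kit.jpg"},
    },
}

print(build_listing_mapping(["name", "small_image"]))
print(load_listing_attributes(hit, ["name", "small_image", "price"]))
```

Keeping the listing values under their own object means the legacy `name` field can stay an analyzed, possibly multi-valued search field while `_listing.name` holds the single display value.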

Let me know your opinion @Swahjak

@romainruaud
Collaborator

I'm closing this for now since it will not be merged as is. But I'll keep in mind that there are a lot of good ideas in this one.

@romainruaud romainruaud closed this Oct 9, 2019
@francescosalvi

Hi @romainruaud
I know the issue is closed, but I stumbled upon it while looking for information about using the ES index to feed all catalog data into a headless solution's frontend, where I don't require all catalog attributes to be searchable/filterable, or indexed at all.

something like index : false

From my limited knowledge of Elasticsearch, this is exactly the approach I gathered I should follow... Can you confirm that, at the moment, ElasticSuite does not offer any way (e.g. via configuration) to generate such field mappings?

Thank you
