
Possibility to easily import data mapped by playOrm to SOLR #7

Open
mvallebr opened this issue Oct 20, 2012 · 0 comments

@mvallebr

We have an old system that uses DataImportHandler to import data from PostgreSQL into Solr. This is how we use it:

We configure an XML file with the SELECTs that define what will be indexed into Solr from our database. Take a look at this: http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
We call URLs to trigger the import: http://localhost:8983/solr/db/dataimport?command=full-import performs a full import, and http://localhost:8983/solr/dataimport?command=delta-import imports only what has changed since the last import.
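For reference, the two DataImportHandler calls above can be scripted from Java. This is a minimal sketch using only the JDK's `HttpURLConnection`; the `DihCommand` and `DataImportTrigger` names are hypothetical, and it assumes a DIH request handler is registered at `<core>/dataimport`, as in the URLs above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// The two DIH commands used above.
enum DihCommand {
    FULL_IMPORT("full-import"),
    DELTA_IMPORT("delta-import");

    final String param;

    DihCommand(String param) {
        this.param = param;
    }
}

// Hypothetical helper; assumes the handler is mounted at <coreBaseUrl>/dataimport.
final class DataImportTrigger {
    private DataImportTrigger() {}

    // Builds e.g. http://localhost:8983/solr/db/dataimport?command=full-import
    static String commandUrl(String coreBaseUrl, DihCommand command) {
        String base = coreBaseUrl.endsWith("/")
                ? coreBaseUrl.substring(0, coreBaseUrl.length() - 1)
                : coreBaseUrl;
        return base + "/dataimport?command=" + command.param;
    }

    // Sends the GET request and returns the HTTP status code.
    static int trigger(String coreBaseUrl, DihCommand command) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(commandUrl(coreBaseUrl, command)).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

DIH starts the import in a background thread, so a 200 status only means the command was accepted; `command=status` on the same handler reports progress.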

I am not saying we need exactly this solution; I only mention DataImportHandler because we are used to it. However, your idea of Listeners sounds like something that would help with real-time indexing, so it could be even better. We would just need some documentation or an explanation of how to do it.
The most important thing is that we need to choose what gets indexed by Solr and how. For instance, I might have two column families (CFs): User and City.

I might want to import just one document type into Solr, the User document, with each user carrying his/her associated cities. I might want to import the address column but not the birthDate column.
I might want to import two document types into Solr, a User document and a City document, each one independently.
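To make these two options concrete, here is a minimal Java sketch of the projection we would want: one denormalized User document that embeds the user's cities and copies `address` but skips `birthDate`, plus an independent City document. The entity shapes and the Solr field names (`id`, `type`, `city_names`) are assumptions, and a plain `Map` stands in for a SolrJ document to keep the sketch dependency-free (Java 16+ for records).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity shapes; the real playOrm entities are not shown in this issue.
record City(String id, String name) {}
record User(String id, String name, String address, LocalDate birthDate, List<City> cities) {}

// Builds Solr-style documents; a Map stands in for a real Solr document class.
final class UserDocumentMapper {
    private UserDocumentMapper() {}

    // Option 1: one document per user, with the user's cities embedded.
    static Map<String, Object> toDocument(User user) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "user:" + user.id());
        doc.put("type", "user");
        doc.put("name", user.name());
        doc.put("address", user.address());   // selected for indexing
        // birthDate is deliberately not copied: it should not be indexed
        List<String> cityNames = new ArrayList<>();
        for (City c : user.cities()) {
            cityNames.add(c.name());            // cities travel inside the user document
        }
        doc.put("city_names", cityNames);       // multi-valued field
        return doc;
    }

    // Option 2: City indexed as its own, independent document.
    static Map<String, Object> toDocument(City city) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "city:" + city.id());
        doc.put("type", "city");
        doc.put("name", city.name());
        return doc;
    }
}
```

Prefixing the ids with the type keeps User and City documents from colliding when both live in the same Solr core.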


Use case:

Imagine a very big and complex data model.
I have the following entities:
User (has id, name, birthDate, List...)
UserRequests
City
UserLikes
UserInterests
etc.

All of that goes to Solr, but as a single document: User.
A second scenario would be City as a separate document in Solr too.
What if I receive a new user interest? I will add one more interest to the UserInterests CF, and that is the only change in Cassandra.
In Solr, however, I would need to reindex the entire user because, AFAIK, Solr doesn't let you reindex only part of a document: you can either delete the indexed document or replace it, but you can't update it.

In the example below, the savingEntity object would receive the UserInterest entity, wouldn't it? But I want to reindex the User.
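What we are after is something like the following sketch: whatever child entity is saved, the listener resolves the owning user and rebuilds the whole User document. The `SaveListener` interface, the entity records and the `reindexUser` callback are hypothetical (this is not playOrm's actual listener API), and the sketch assumes each child row stores its user's id.

```java
import java.util.function.Consumer;

// Hypothetical child entities: each one carries the id of the user it belongs to.
record UserInterest(String id, String userId, String topic) {}
record UserLike(String id, String userId, String target) {}

// Hypothetical callback shape; playOrm's real listener API may differ.
interface SaveListener {
    void onSave(Object savingEntity);
}

// Maps whatever entity was saved to the User document that must be rebuilt,
// then reindexes the WHOLE user, since the document is replaced, not patched.
final class UserReindexListener implements SaveListener {
    // e.g. load the user and its children, build the document, send it to Solr
    private final Consumer<String> reindexUser;

    UserReindexListener(Consumer<String> reindexUser) {
        this.reindexUser = reindexUser;
    }

    @Override
    public void onSave(Object savingEntity) {
        String userId = owningUserId(savingEntity);
        if (userId != null) {
            reindexUser.accept(userId);
        }
    }

    static String owningUserId(Object entity) {
        if (entity instanceof UserInterest i) return i.userId();
        if (entity instanceof UserLike l) return l.userId();
        return null; // not part of the user document (e.g. City, indexed separately)
    }
}
```

With this shape, saving a UserInterest triggers a rebuild of the owning User document rather than an index of the UserInterest row itself.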

Real-time indexing is good, but being able to index in batches is also desirable. Suppose the following scenario, indexing in real time:
I add a new interest to the user.
I add a new like to the user.
I add a new request to the user.

Reindexing the entire user on every event could be problematic, so in some cases it might be better to run a delta index every hour, for instance. End users would still have near-real-time data, but the processing load on the server would drop considerably.
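The batching alternative can be sketched as a buffer of "dirty" user ids that is flushed on a schedule, so the three events above collapse into a single reindex of that user. `DeltaIndexBuffer` and the `reindexBatch` callback are hypothetical names; an hourly schedule would be `scheduleEvery(1, TimeUnit.HOURS)`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Collects ids of users whose data changed and reindexes each one at most once per flush.
final class DeltaIndexBuffer {
    private final Set<String> dirtyUserIds = new LinkedHashSet<>();
    // Hypothetical: rebuild these users' documents and send them to Solr.
    private final Consumer<List<String>> reindexBatch;

    DeltaIndexBuffer(Consumer<List<String>> reindexBatch) {
        this.reindexBatch = reindexBatch;
    }

    // Called on every change event instead of reindexing immediately.
    synchronized void markDirty(String userId) {
        dirtyUserIds.add(userId); // a Set ignores repeated events for the same user
    }

    // Reindexes everything marked since the last flush; returns how many users were sent.
    int flush() {
        List<String> batch;
        synchronized (this) {
            batch = List.copyOf(dirtyUserIds); // snapshot in insertion order
            dirtyUserIds.clear();
        }
        if (!batch.isEmpty()) {
            reindexBatch.accept(batch);
        }
        return batch.size();
    }

    // Runs flush() periodically on a background thread.
    ScheduledExecutorService scheduleEvery(long period, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::flush, period, period, unit);
        return scheduler;
    }
}
```

One design caveat: if `reindexBatch` throws, that batch's ids are already cleared, and a task scheduled with `scheduleAtFixedRate` stops running after an uncaught exception, so a real implementation would catch errors and re-mark the failed ids.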

easility added a commit that referenced this issue Jan 10, 2013
merging deanhiller with easility