Solr: support update/delete/atomic update operations and batch those #1164
Conversation
Thank you for making the Solr connector non-blocking! Now that the blocking call is executed asynchronously, you should take care of selecting a proper execution context for it - the global one is not a good choice.
I've now understood that you can make the async stuff much simpler.
This should make the code much simpler.
@ennru : Not sure I see what you mean. Here's a code snippet of the onPush method:
}
Ah ok, I finally see how you are trying to add asynchronous updates to Solr. I'm afraid your solution will not be able to guarantee the order of updates to Solr, as Futures are not necessarily executed in the order they are created. Are you confident the client library is even thread-safe? For Elasticsearch, the client API is prepared for this use-case; it is not added by Alpakka. If you can ignore the order of updates to Solr in your use-case, you might construct a flow using the Graph DSL that runs several Solr stages in parallel.
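The ordering concern can be shown with a minimal standalone Scala sketch (illustration only, not Alpakka code): two Futures created in order may complete out of order, depending only on how long each body runs.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

// Futures are created in order, but completion order depends on how long
// each body takes, not on creation order.
val completionOrder = new ConcurrentLinkedQueue[String]()
val slow = Future { Thread.sleep(200); completionOrder.add("slow") }
val fast = Future { Thread.sleep(10); completionOrder.add("fast") }
Await.result(Future.sequence(Seq(slow, fast)), 5.seconds)
println(completionOrder) // "fast" almost always precedes "slow"
```

This is why simply wrapping each Solr call in its own Future is not enough to preserve update order without extra coordination.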
That is why I use getAsyncCallback on the completion (success or failure) of my future. Since my batch is ordered and the buffer is managed by onPush, all my updates from a source will be ordered in my sink. There is no doubt. For me the client is thread-safe (we have used it in multithreaded applications for some years), and thus, if we cannot guarantee the order as you said, the Elasticsearch implementation can't do it either. This is the same code.
I'm not sure what you're referring to in the Elasticsearch way of doing this. The Alpakka Elasticsearch connector sends all operations as one JSON using the ES client's …
The Solr client sends documents as a batch too...
Yes, the Solr client sends batches. But your proposal first runs a …
Ok, I see what you mean now. Thank you! I will try to make it.
To get the best performance with Solr, you need to batch; that's what I do. To batch documents, I need to enqueue them before calling Solr. If I do not use futures, and instead use the IO dispatcher as you described, I perhaps scale horizontally, but the batch size is always 1 element, and the performance is the same as the initial version. I say that because I tested it. So I keep going with futures, but now the order is respected (takeWhile).
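The takeWhile/dropWhile idea can be sketched in plain Scala (Msg and Operation here are hypothetical stand-ins, not the actual Alpakka classes): split the ordered buffer into maximal runs of the same operation, so each Solr call gets a batch while concatenating the runs preserves the original order.

```scala
sealed trait Operation
case object Update extends Operation
case object Delete extends Operation
final case class Msg(op: Operation, id: Int)

// Split the buffer into maximal runs of messages with the same operation.
// Each run can be sent to Solr as one batched request; concatenating the
// runs reproduces the original message order.
def batches(buffer: List[Msg]): List[List[Msg]] = buffer match {
  case Nil => Nil
  case head :: _ =>
    buffer.takeWhile(_.op == head.op) :: batches(buffer.dropWhile(_.op == head.op))
}

val in  = List(Msg(Update, 1), Msg(Update, 2), Msg(Delete, 3), Msg(Update, 4))
val out = batches(in)
// out groups ids as List(List(1, 2), List(3), List(4))
```

The recursion depth is bounded by the number of operation changes in one buffer, which in practice is small.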
…o keep exact ordering fo the incoming messages
Ok, if the Solr API does much better with batches of data, I'd suggest this stage should accept …
Ok. I think we are not so far now.
…d be done with groupedWithin.
…s and deletes have to be done with documents, beans and typed methods.
Yes, this approach is better.
You are introducing some unnecessary breaking API changes; please try not to break the API. (You may add @deprecated.)
Please add documentation.
solr/src/main/scala/akka/stream/alpakka/solr/SolrFlowStage.scala
return IncomingMessage.create(doc);
List<IncomingMessage<SolrInputDocument, NotUsed>> list = new ArrayList<>();
list.add(IncomingUpdateMessage.create(doc));
return list;
Instead of building the lists in user code, the examples should show groupedWithin.
Yes, it should.
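For context on the suggestion: Akka Streams' groupedWithin(n, d) emits batches of up to n elements, or whatever has accumulated within duration d, whichever comes first. The size-based half of that behaviour can be demonstrated dependency-free with grouped on plain collections (the time bound has no collection equivalent; this is only an analogy, not the stream operator itself).

```scala
// Size-based batching only: every batch has at most 10 elements.
// groupedWithin would additionally flush a partial batch after the timeout,
// so slow upstreams still make progress.
val grouped = (1 to 25).grouped(10).toList
// batch sizes: 10, 10, 5
```

In the examples, a user would put .groupedWithin(...) in front of the Solr flow instead of constructing the Seq of messages by hand.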
@@ -67,18 +68,20 @@ class SolrSpec extends WordSpecLike with Matchers with BeforeAndAfterAll {
  .map { tuple: Tuple =>
    val book: Book = tupleToBook(tuple)
    val doc: SolrInputDocument = bookToDoc(book)
    Seq(IncomingUpdateMessage(doc))
Same about groupedWithin.
    collection = "collection2",
    settings = SolrUpdateSettings(commitWithin = 5)
  )
)(cluster.getSolrClient)
Why are you passing the Solr client explicitly?
Because the implicit was another Solr client, and we should not instantiate multiple clients. We could lazily instantiate an implicit with this instance.
createCollection("collection7") //create a new collection
val stream = getTupleStream("collection1")

//#run-document
Duplicated #run-document here as well.
@ennru I've done my best to address your main requirements in this last code review, but I won't be working on this project for the next few weeks. However, I will follow this pull request to help people if necessary, and I hope it will be successfully merged soon. Thank you for your ideas and support. See you.
@ennru Hi, I've just pushed some Solr features and improved the documentation. Please have a look.
Did a quick review and noticed that it might drop messages now since they are no longer enqueued.
Let's go for the merge? ;-)
//Now take the remaining
val remaining = toSend.dropWhile(m => m.operation == operation)
if (remaining.nonEmpty) {
  send(remaining) //Important: Not really recursive, because the future breaks the recursion
I think this comment is outdated, is it not?
Thx
case Finished => handleSuccess()
case _ => state = Idle
doc.addField(message.idFieldOpt.get, message.idFieldValueOpt.get)
if (client.isInstanceOf[CloudSolrClient]) {
Replace isInstanceOf with a pattern match:
client match {
  case c: CloudSolrClient => ...
  case _ => ...
}
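A self-contained illustration of the suggestion, using hypothetical stand-in types (the real code matches on SolrClient and CloudSolrClient):

```scala
// Hypothetical client hierarchy standing in for SolrClient / CloudSolrClient.
sealed trait Client
final class CloudClient extends Client
final class HttpClient extends Client

// A pattern match replaces the isInstanceOf test and the asInstanceOf cast
// in one step, and gives the typed value a name.
def route(client: Client): String = client match {
  case _: CloudClient => "cloud"
  case _              => "other"
}
```

`route(new CloudClient)` yields `"cloud"`, while any other client falls through to `"other"`; the compiler can also warn about unhandled cases, which isInstanceOf chains cannot.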
Thx
      messageBinder(source)
    }
  )
  .flatten
map + flatten = flatMap. Replace with:
messages.flatMap(_.sourceOpt.map(messageBinder))
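The equivalence is easy to check with plain collections (Message and messageBinder here are hypothetical stand-ins for the types in the PR):

```scala
final case class Message(sourceOpt: Option[String])
def messageBinder(source: String): String = source.toUpperCase

val messages = Seq(Message(Some("a")), Message(None), Message(Some("b")))

// map followed by flatten...
val viaMapFlatten = messages.map(_.sourceOpt.map(messageBinder)).flatten
// ...is the same as a single flatMap, without the intermediate
// Seq[Option[String]].
val viaFlatMap = messages.flatMap(_.sourceOpt.map(messageBinder))
// both are Seq("A", "B"); the None is dropped by flatten/flatMap
```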
Thx
}
responses.filter(r => r.getStatus != 0).headOption.getOrElse(responses.head)
filter + headOption = find
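The same rewrite in isolation, with a hypothetical Response stand-in for Solr's response type:

```scala
// Hypothetical stand-in for the Solr response, exposing only the status.
final case class Response(getStatus: Int)

val responses = Seq(Response(0), Response(400), Response(0))

// filter(...).headOption builds an intermediate collection of all failures;
// find(...) stops at the first non-zero status.
val reported = responses.find(_.getStatus != 0).getOrElse(responses.head)
```

Here `reported` is the first failing response, `Response(400)`; when every status is 0, the head of the list is returned instead.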
Thx
Almost there. :) If you could fix the last nitpicks, then this can go in!
Thx @2m for your help. Do you think we could merge? ;-)
LGTM! Thanks for pushing this one through!
At the moment, there's a small performance problem with the Solr component because there's no asynchronous part in the code, so the batch size is always 1.
I propose an alternative, keeping the inspiration from the Elasticsearch component.
The Solr update is now encapsulated in a Future, and async callbacks are used to keep everything coherent (on failure and on success).
Still following the Elasticsearch code, I will provide a notion of operation, to allow the removal of documents.
Please, have a look.