Solr: support update/delete/atomic update operations and batch those #1164

Merged 12 commits into akka:master on Sep 26, 2018
Conversation

@giena (Contributor) commented on Aug 24, 2018

At the moment there is a performance problem with the Solr connector because there is no asynchronous part in the code, so the batch size is always 1.
I propose an alternative, keeping the inspiration from the Elasticsearch connector.
The Solr update is now wrapped in a Future, and async callbacks are used to keep everything coherent (on failure and on success).
Still following the Elasticsearch code, I will provide a notion of operation to allow the removal of documents.
Please have a look.

@giena changed the title from "Based on alpakka elastic search, for better performance" to "Based on alpakka elastic search, for better performance with SolR" on Aug 24, 2018
@ennru (Member) commented on Aug 27, 2018

Thank you for making the Solr connector non-blocking!

Now that the blocking call is executed asynchronously, you should take care to select a proper execution context for it; the global one is not a good choice.
A better default is Akka's akka.stream.default-blocking-io-dispatcher, possibly made configurable. For inspiration on how to use it, look into JmsConsumerStage.
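
For illustration only, a minimal sketch of running a blocking call on that dispatcher instead of the global ExecutionContext; the ActorSystem name and blockingCall are placeholders, not connector code:

```scala
import akka.actor.ActorSystem
import scala.concurrent.{ExecutionContext, Future}

object BlockingDispatcherSketch extends App {
  val system: ActorSystem = ActorSystem("sketch")

  // Akka's dedicated dispatcher for blocking IO, instead of ExecutionContext.global.
  val blockingEc: ExecutionContext =
    system.dispatchers.lookup("akka.stream.default-blocking-io-dispatcher")

  // Stand-in for the blocking Solr client call.
  def blockingCall(): Int = { Thread.sleep(100); 42 }

  val result: Future[Int] = Future(blockingCall())(blockingEc)
}
```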

@ennru (Member) commented on Aug 28, 2018

I've now understood that you can simplify the async part considerably:

1. Select the IODispatcher for the stage:

   ```scala
   override protected def initialAttributes: Attributes = Attributes(ActorAttributes.IODispatcher)
   ```

2. Remove the ExecutionContext, Future and blocking from the client calls (the stage will execute on the IODispatcher).

This should make the code much simpler.
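
For illustration only, a minimal sketch of a stage along these lines: the whole stage runs on the blocking IO dispatcher and calls the client directly, without Futures. The stage and its send function are placeholders, not the actual connector code:

```scala
import akka.stream._
import akka.stream.stage._

class BlockingSolrStage[T](send: Seq[T] => Unit)
    extends GraphStage[FlowShape[Seq[T], Seq[T]]] {

  val in: Inlet[Seq[T]] = Inlet("BlockingSolrStage.in")
  val out: Outlet[Seq[T]] = Outlet("BlockingSolrStage.out")
  override val shape: FlowShape[Seq[T], Seq[T]] = FlowShape(in, out)

  // Run the whole stage on Akka's blocking IO dispatcher.
  override protected def initialAttributes: Attributes =
    Attributes(ActorAttributes.IODispatcher)

  override def createLogic(attrs: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with InHandler with OutHandler {
      override def onPush(): Unit = {
        val batch = grab(in)
        send(batch) // direct, blocking call; no Future needed here
        push(out, batch)
      }
      override def onPull(): Unit = pull(in)
      setHandlers(in, out, this)
    }
}
```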

@giena (Contributor, author) commented on Aug 28, 2018

@ennru: I'm not sure I see what you mean. Here is a code snippet of the onPush method:

```scala
override def onPush(): Unit = {
  queue.enqueue(grab(in))

  state match {
    case Idle => {
      state = Sending
      val messages = (1 to settings.bufferSize).flatMap { _ =>
        queue.dequeueFirst(_ => true)
      }
      sendBulkToSolr(messages)
    }
    case _ => ()
  }

  tryPull()
}
```
Even if I use the IODispatcher, I will send messages one by one if sendBulkToSolr does not use Futures...
The Elasticsearch code does not use the IODispatcher either.
The best way to get the best performance with Solr is to batch updates, not to scale horizontally across many threads with batches of one update.

@ennru (Member) commented on Aug 29, 2018

Ah OK, I finally see how you are trying to add asynchronous updates to Solr. I'm afraid your solution will not be able to guarantee the order of updates to Solr, as Futures are not necessarily executed in the order they are created. Are you confident the client library is even thread-safe?

For Elasticsearch, the client API is prepared for this use case; it is not added by Alpakka.

If you can ignore the order of updates to Solr in your use case, you might construct a flow using the Graph DSL that runs several Solr stages in parallel.
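
For illustration only, a sketch of such a fan-out/fan-in flow with the Graph DSL; solrFlow is a placeholder for a single Solr stage, and the element order at the output is not preserved across branches:

```scala
import akka.NotUsed
import akka.stream.FlowShape
import akka.stream.scaladsl.{Balance, Flow, GraphDSL, Merge}

// Runs `parallelism` copies of the given flow side by side; output order
// depends on which branch finishes first.
def parallel[T](solrFlow: Flow[T, T, NotUsed], parallelism: Int): Flow[T, T, NotUsed] =
  Flow.fromGraph(GraphDSL.create() { implicit builder =>
    import GraphDSL.Implicits._
    val balance = builder.add(Balance[T](parallelism))
    val merge = builder.add(Merge[T](parallelism))
    for (i <- 0 until parallelism)
      balance.out(i) ~> solrFlow.async ~> merge.in(i)
    FlowShape(balance.in, merge.out)
  })
```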

@giena (Contributor, author) commented on Aug 29, 2018

That is why I use getAsyncCallback on the completion (success or failure) of my Future. Since my batch is ordered and the buffer is managed by onPush, all the updates from a source will stay ordered in my sink; there is no doubt about that. As far as I can tell the client is thread-safe (we have used it in multithreaded applications for some years), and so if we cannot guarantee the order, as you say, then the Elasticsearch implementation cannot do it either. It is the same code.
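
For illustration only, a minimal, self-contained sketch of the pattern described here: the Future completes on another thread and getAsyncCallback routes the result back into the stage, so stage state is only touched on the stage's own thread. The stage and its send function are placeholders, not the connector's code:

```scala
import akka.stream._
import akka.stream.stage._
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success, Try}

class AsyncSendStage[T](send: Seq[T] => Unit)(implicit ec: ExecutionContext)
    extends GraphStage[SinkShape[Seq[T]]] {

  val in: Inlet[Seq[T]] = Inlet("AsyncSendStage.in")
  override val shape: SinkShape[Seq[T]] = SinkShape(in)

  override def createLogic(attrs: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with InHandler {

      // Future completions re-enter the stage through this thread-safe callback.
      private val onSent = getAsyncCallback[Try[Unit]] {
        case Success(_)  => pull(in)      // batch acknowledged, ask for the next one
        case Failure(ex) => failStage(ex) // propagate the failure to the stream
      }

      override def preStart(): Unit = pull(in)

      override def onPush(): Unit = {
        val batch = grab(in)
        // One batch in flight at a time keeps the original order.
        Future(send(batch)).onComplete(onSent.invoke)
      }

      setHandler(in, this)
    }
}
```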

@ennru (Member) commented on Aug 29, 2018

I'm not sure what you're referring to in the Elasticsearch way of doing this. The Alpakka Elasticsearch connector sends all operations as one JSON request using the ES client's performRequestAsync (see https://github.com/akka/alpakka/blob/master/elasticsearch/src/main/scala/akka/stream/alpakka/elasticsearch/ElasticsearchFlowStage.scala#L299).

@giena (Contributor, author) commented on Aug 29, 2018

The Solr client sends documents as a batch too:
https://github.com/worldline-messaging/alpakka/blob/master/solr/src/main/scala/akka/stream/alpakka/solr/SolrFlowStage.scala#L180
I don't understand why you see a difference.

@ennru (Member) commented on Aug 31, 2018

Yes, the Solr client sends batches. But your proposal first runs a groupBy(operation) and then calls the client with batches per operation, wrapped in Futures. The order in which Solr sees the messages will differ from what the incoming stream contained.
It would be more correct to implement some takeWhile(Operation) and update Solr with those messages before taking the next batch.
You should not wrap the call to the client in Futures; the stage's asynchronicity should be enough.
(Side note: blocking should not be used in Futures.)
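
For illustration only, a minimal, self-contained sketch of the takeWhile idea: send consecutive messages that share the same operation as one batch, so Solr sees them in stream order. Operation and Message are illustrative stand-ins, not the connector's API:

```scala
sealed trait Operation
case object Update extends Operation
case object Delete extends Operation

final case class Message(operation: Operation, id: String)

// Split off the longest prefix with a single operation; the remainder is
// processed only after this batch has been sent.
def nextBatch(messages: Seq[Message]): (Seq[Message], Seq[Message]) =
  messages.headOption match {
    case Some(first) =>
      val batch = messages.takeWhile(_.operation == first.operation)
      val remaining = messages.dropWhile(_.operation == first.operation)
      (batch, remaining)
    case None => (Seq.empty, Seq.empty)
  }
```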

@giena (Contributor, author) commented on Aug 31, 2018

OK, I see what you mean now. Thank you! I will try to implement it.

@giena (Contributor, author) commented on Aug 31, 2018

To get the best performance with Solr, you need to batch, and that is what I do. To batch documents, I need to enqueue them before calling Solr. If I do not use Futures and instead use the IODispatcher as you described, I perhaps scale horizontally, but the batch size is always one element and the performance is the same as in the initial version. I say this because I tested it. So I keep going with Futures, but now the order is respected (takeWhile).

…o keep exact ordering of the incoming messages
@ennru (Member) commented on Sep 3, 2018

OK, if the Solr API does much better with batches of data, I'd suggest this stage should accept Seq[IncomingMessage] and make it the user's responsibility to find the best level of batching with e.g. groupedWithin. The stage should not apply batching itself.
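
For illustration only, a minimal, self-contained sketch of user-side batching with groupedWithin; solrSeqFlow below is just a placeholder that prints batch sizes, not the connector's flow:

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.duration._

object GroupedWithinSketch extends App {
  implicit val system: ActorSystem = ActorSystem("sketch")
  implicit val mat: ActorMaterializer = ActorMaterializer()

  // Placeholder for a Solr flow that accepts a whole batch per stream element.
  val solrSeqFlow = Flow[Seq[Int]].map { batch =>
    println(s"sending batch of ${batch.size} documents")
    batch
  }

  Source(1 to 1000)
    .groupedWithin(50, 10.millis) // up to 50 elements, or whatever arrived within 10 ms
    .via(solrSeqFlow)
    .runWith(Sink.ignore)
}
```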

@ennru added the p:solr label on Sep 3, 2018
@giena (Contributor, author) commented on Sep 3, 2018

OK. I think we are not far off now.

Jean-Noel Allart added 2 commits September 3, 2018 20:33
…s and deletes have to be done with documents, beans and typed methods.
@ennru (Member) left a review comment:

Yes, this approach is better.
You are introducing some unnecessary breaking API changes; please try not to break the API. (You may add @deprecated.)

Please add documentation.

```java
return IncomingMessage.create(doc);
List<IncomingMessage<SolrInputDocument, NotUsed>> list = new ArrayList<>();
list.add(IncomingUpdateMessage.create(doc));
return list;
```
Member commented on the diff:

Instead of building the lists in user code, the examples should show groupedWithin.

giena (Contributor, author) replied:

Yes, it should.

```
@@ -67,18 +68,20 @@ class SolrSpec extends WordSpecLike with Matchers with BeforeAndAfterAll {
  .map { tuple: Tuple =>
    val book: Book = tupleToBook(tuple)
    val doc: SolrInputDocument = bookToDoc(book)
    IncomingMessage(doc)
    Seq(IncomingUpdateMessage(doc))
```
Member commented on the diff:

Same about groupedWithin.

```scala
  collection = "collection2",
  settings = SolrUpdateSettings(commitWithin = 5)
)
)(cluster.getSolrClient)
```
Member commented on the diff:

Why are you passing the Solr client explicitly?

giena (Contributor, author) replied:

Because the implicit was another Solr client, and we should not have to instantiate multiple clients. We could instantiate an implicit lazily with this instance.

```scala
createCollection("collection7") //create a new collection
val stream = getTupleStream("collection1")

//#run-document
```
Member commented on the diff:

Duplicated #run-document here as well.

@ennru changed the title from "Based on alpakka elastic search, for better performance with SolR" to "Solr: support update/delete/atomic update operations and batch those" on Sep 5, 2018
@giena (Contributor, author) commented on Sep 5, 2018

@ennru I've done my best to address your main requirements from this last code review, but I won't be working on this project for the next few weeks. However, I will follow this pull request to help people if necessary, and I hope it will be merged successfully soon. Thank you for your ideas and support. See you.

@giena (Contributor, author) commented on Sep 19, 2018

@ennru Hi, I've just pushed some Solr features and improved the documentation. Please have a look.

@2m (Member) left a review comment:

Did a quick review and noticed that it might drop messages now since they are no longer enqueued.

@giena (Contributor, author) commented on Sep 25, 2018

Let's go for the merge? ;-)

```scala
//Now take the remaining
val remaining = toSend.dropWhile(m => m.operation == operation)
if (remaining.nonEmpty) {
  send(remaining) //Important: Not really recursive, because the future breaks the recursion
```
Member commented on the diff:

I think this comment is outdated, is it not?

giena (Contributor, author) replied:

Thx

```scala
case Finished => handleSuccess()
case _ => state = Idle
```

```scala
doc.addField(message.idFieldOpt.get, message.idFieldValueOpt.get)
if (client.isInstanceOf[CloudSolrClient]) {
```
Member commented on the diff:

Replace isInstanceOf with a pattern match:

```scala
client match {
  case c: CloudSolrClient => ...
  case _ => ...
}
```

giena (Contributor, author) replied:

Thx

```scala
    messageBinder(source)
  }
)
.flatten
```
Member commented on the diff:

map + flatten = flatMap

Replace with:

```scala
messages.flatMap(_.sourceOpt.map(messageBinder))
```

giena (Contributor, author) replied:

Thx

```scala
}
responses.filter(r => r.getStatus != 0).headOption.getOrElse(responses.head)
```
Member commented on the diff:

filter + headOption = find
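
For illustration only, the suggested simplification would presumably read:

```scala
responses.find(r => r.getStatus != 0).getOrElse(responses.head)
```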

giena (Contributor, author) replied:

Thx

@2m (Member) commented on Sep 25, 2018

Almost there. :) If you could fix the last nitpicks, then this can go in!

@giena (Contributor, author) commented on Sep 26, 2018

Thx @2m for your help. Do you think we could merge? ;-)

@2m (Member) left a review comment:

LGTM! Thanks for pushing this one through!

@2m merged commit c75a0a3 into akka:master on Sep 26, 2018
@2m added this to the 1.0-M1 milestone on Sep 26, 2018
@giena deleted the PR branch on October 12, 2018