With Write-Once Buckets, Riak Search Is Asynchronously Written To [JIRA: RIAK-1904] #512

Closed
wbrown-lg opened this issue Jun 25, 2015 · 7 comments

Comments

@wbrown-lg

Symptom
Data written to Write-Once buckets that are indexed by Riak Search sometimes takes a very long time to show up in search results. Write speeds are very fast, but there is potential for missing index data because the index operation seems to be happening completely asynchronously.

Reproduce
Sequentially load a large number of keys into a Write-Once bucket over some period of time, and perform queries on the index for that bucket during and after the load. Note how the total record count continues to grow well after data has stopped flowing into the cluster.
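
For anyone trying to reproduce this, the polling side can be sketched roughly as below. This is only a sketch under assumptions: it assumes the Riak HTTP interface is reachable at a base URL such as `http://localhost:8098`, that the index is queried through `/search/query/<index>`, and that the Solr JSON response carries the total in `response.numFound`. The index name and all helper names are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request


def search_count_url(base_url, index):
    """Build a match-all query URL that asks for the hit count only (rows=0)."""
    query = urllib.parse.urlencode({"wt": "json", "q": "*:*", "rows": 0})
    return f"{base_url}/search/query/{urllib.parse.quote(index)}?{query}"


def parse_num_found(body):
    """Extract the total hit count from a Solr-style JSON response body."""
    return json.loads(body)["response"]["numFound"]


def poll_counts(base_url, index, samples=10, interval=5.0):
    """Sample the index count `samples` times, `interval` seconds apart."""
    counts = []
    for _ in range(samples):
        with urllib.request.urlopen(search_count_url(base_url, index)) as resp:
            counts.append(parse_num_found(resp.read()))
        time.sleep(interval)
    return counts


def still_growing(counts):
    """True if the last sample exceeds the first, i.e. indexing lags the writes."""
    return len(counts) >= 2 and counts[-1] > counts[0]


if __name__ == "__main__":
    # Run this after the loader has stopped writing; "my_index" is a placeholder.
    observed = poll_counts("http://localhost:8098", "my_index")
    print(observed, "still growing:", still_growing(observed))
```

If `still_growing` reports true for samples taken after the loader has finished, the index is still catching up with data that was already acknowledged.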

Assessment
Write-Once buckets combined with Riak Search cannot be used in a high throughput environment due to the lack of backpressure exerted by asynchronous PUTs to Riak Search. This was discovered with @drewkerrigan and further details may be obtained from him and me as needed.
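
To illustrate what "lack of backpressure" means here, the following is a toy model, not Riak code: writers offer a fixed number of puts per tick and an indexer drains a smaller fixed number. The rates, the `max_backlog` bound, and the function name are all made up for illustration.

```python
def simulate(write_rate, index_rate, ticks, max_backlog=None):
    """Toy model: returns (accepted_writes, final_backlog) after `ticks` steps.

    max_backlog=None models fire-and-forget indexing (no backpressure);
    a number models writers that stall while the index queue is full.
    """
    backlog = 0
    accepted = 0
    for _ in range(ticks):
        offered = write_rate
        if max_backlog is not None:
            # Backpressure: only accept what still fits in the index queue.
            offered = min(offered, max_backlog - backlog)
        accepted += offered
        backlog += offered
        # The indexer drains at most index_rate objects per tick.
        backlog -= min(backlog, index_rate)
    return accepted, backlog


if __name__ == "__main__":
    print("no backpressure:", simulate(100, 40, 10))
    print("bounded queue:  ", simulate(100, 40, 10, max_backlog=100))
```

With these made-up rates, the unbounded run accepts all 1000 writes but ends with 600 objects still unindexed, and that backlog keeps growing for as long as the load continues. The bounded run accepts only 460 writes, but its backlog never exceeds 100.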

System Configuration

  • 32-node cluster, n=3 replication

@Basho-JIRA changed the title from "With Write-Once Buckets, Riak Search Is Asynchronously Written To" to "With Write-Once Buckets, Riak Search Is Asynchronously Written To [JIRA: RIAK-1904]" on Jun 25, 2015

@zeeshanlakhani
Contributor

@fadushin guessing you may want to take a look at this when you have some time going forward.

@wbrown-lg
Author

@zeeshanlakhani @fadushin @drewkerrigan Any updates on this? Any guesses as to when this will be fixed?

The lack of backpressure exerted by Riak Search against a write-once bucket will cause the cluster to crash under load, and this is a fairly serious bug that precludes the use of write-once buckets in our environment.

@zeeshanlakhani
Contributor

@wbrown-lg no updates currently. @fadushin will take a look after he finishes some other write-once work that he's currently on.

@fadushin
Contributor

@wbrown-lg I think we have a handle on what's wrong. The fix looks pretty straightforward: make sure we are indexing in the write-once path. We believe the only reason we are indexing now is because of yokozuna AAE, which explains the lag and overload (which I have verified on a local setup).

For a preliminary fix, have a look at:

basho/riak_kv@2.1...bugfix/fd/RIAK-1937

With the fix, I have observed about a 3x increase in put latency when indexing (and a corresponding reduction in throughput), which is what you would expect, given your above suggestions. (Note: not a "real" test environment -- just a couple of VMs)
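
As a rough sanity check on why a 3x latency increase implies a matching drop in throughput, here is a small sketch. It assumes a closed-loop load generator in which a fixed number of workers each issue one put at a time, so throughput is roughly workers divided by latency (Little's law). The worker count and latencies are hypothetical, not measurements from this issue.

```python
def closed_loop_throughput(workers, latency_ms):
    """Ops/sec for `workers` clients that each issue one put at a time."""
    return workers * 1000.0 / latency_ms


# Hypothetical numbers: 16 writers, 2 ms per put before the fix, 3x that after.
before = closed_loop_throughput(workers=16, latency_ms=2.0)
after = closed_loop_throughput(workers=16, latency_ms=6.0)
print(f"before: {before:.0f} ops/s, after: {after:.0f} ops/s, ratio: {after / before:.2f}")
```

Under that assumption, tripling the per-put latency cuts throughput to about a third. That cost is the backpressure: the writers now wait for the index operation instead of outrunning it.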

I do not have a timeline for inclusion in a release yet, but I hope this gets the ball rolling.

@zeeshanlakhani
Contributor

#529 is currently in review and will fix this issue.

@zeeshanlakhani
Contributor

@wbrown-lg I'm closing this, as #529 has been completed and reviewed. It will be in 2.1.2. https://github.com/basho/yokozuna/wiki/WriteOnceBucketIndexingBug may be helpful as well.

@wbrown-lg
Author

@zeeshanlakhani Appreciate the fix, and I look forward to 2.1.2.
