Fixed issue GORA-443 #95

Merged
merged 2 commits into from Feb 23, 2017

Conversation

cloudysunny14
Contributor

@cloudysunny14 cloudysunny14 commented Feb 16, 2017

#86

All of the tests pass when run individually, but when run as a whole some of them fail.

I also encountered the same problem.
I think this is because AsyncProcess processes a series of mutations asynchronously.
I made this fix as a possible solution.
More investigation is required.

Anyway, if there is anything I can contribute, I would definitely like to work on GORA or NUTCH :)

@lewismc
Member

lewismc commented Feb 16, 2017

I tried this locally and I am still getting error messages

Results :

Failed tests:   testUpdate(org.apache.gora.hbase.store.TestHBaseStore)
  testGetWebPage(org.apache.gora.hbase.store.TestHBaseStore)
  testQuery(org.apache.gora.hbase.store.TestHBaseStore)
  testQueryStartKey(org.apache.gora.hbase.store.TestHBaseStore)
  testQueryWebPageSingleKey(org.apache.gora.hbase.store.TestHBaseStore)
  testDeleteByQueryFields(org.apache.gora.hbase.store.TestHBaseStore)

Tests run: 42, Failures: 6, Errors: 0, Skipped: 1

}
}
bufMutator.flush();
bufMutator.close();
Contributor

Yeah, I tried this as well. It would mean that with millions of operations buffered we would flush each and every one of them individually, and even with this we couldn't get the tests to pass. IMO the bufMutator.flush() should remain where it is, and we should find out why the mutations get applied asynchronously when we call flush.

@cloudysunny14
Contributor Author

cloudysunny14 commented Feb 22, 2017

Thank you for the comments.

I think the cause is as follows:

First, BufferedMutator#flush is processed synchronously,
and a batch request (MultiRequest) is created from the buffered mutations.

Then the RegionServer processes the MultiRequest as a mini-batch (HRegion#doMiniBatchMutation), which sets the timestamp of each cell to the current time if the Mutation has HConstants.LATEST_TIMESTAMP (the default).
This operation applies all mutations in the mini-batch together, so all cells get the same timestamp.

Since HBaseStore#put creates the Delete and the Put in the same MultiRequest, both receive the same timestamp, and the Delete masks the Put. As a result, the Puts are invisible.
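
To make the masking rule concrete, here is a minimal toy model in plain Java. It is not HBase code and not the change in this pull request. The class `DeleteMasksPutDemo` and its methods are hypothetical names made up for illustration, and the model assumes only the documented rule that a delete marker hides every version whose timestamp is less than or equal to the marker's timestamp.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a single column (hypothetical, not an HBase API). */
class DeleteMasksPutDemo {
    // Put versions keyed by timestamp; the newest version is the last entry.
    private final TreeMap<Long, String> puts = new TreeMap<>();
    // Timestamp of the newest delete marker; Long.MIN_VALUE means "none".
    private long deleteMarkerTs = Long.MIN_VALUE;

    void put(long ts, String value) {
        puts.put(ts, value);
    }

    void delete(long ts) {
        deleteMarkerTs = Math.max(deleteMarkerTs, ts);
    }

    /** Returns the newest value not masked by the delete marker, or null. */
    String get() {
        Map.Entry<Long, String> newest = puts.lastEntry();
        if (newest == null || newest.getKey() <= deleteMarkerTs) {
            return null; // masked: the version is not newer than the marker
        }
        return newest.getValue();
    }

    public static void main(String[] args) {
        long now = 1000L; // stand-in for the server's current time

        // Delete and Put stamped with the same time, as in one mini-batch.
        DeleteMasksPutDemo sameTs = new DeleteMasksPutDemo();
        sameTs.delete(now);
        sameTs.put(now, "v1");
        System.out.println("same timestamp: " + sameTs.get());

        // Put stamped strictly later than the Delete.
        DeleteMasksPutDemo laterPut = new DeleteMasksPutDemo();
        laterPut.delete(now);
        laterPut.put(now + 1, "v1");
        System.out.println("put one tick later: " + laterPut.get());
    }
}
```

In this model the first case returns no value because the put is not newer than the delete marker, while the second case returns `v1`. The order in which `delete` and `put` are called does not matter, only the timestamps do.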

See Also:
https://hbase.apache.org/book.html#_deletes_mask_puts
https://issues.apache.org/jira/browse/HBASE-2256
https://hbase.apache.org/book.html#version.delete

I'm sorry for my poor English :(

I made this fix as a possible solution (a hack):
b0cd195

However, testDeleteByQueryFields does not pass yet.
This is the known issue GORA-472. I will create (reopen) the pull request for GORA-472 later,
and I am trying to run all the tests.

Kiyonari Harigae

@lewismc
Member

lewismc commented Feb 22, 2017

OK thank you @cloudysunny14

@asfgit asfgit merged commit b0cd195 into apache:master Feb 23, 2017
@alfonsonishikawa
Member

I know this thread is old, but I don't understand why this has not surfaced before (the issue at HBase is from 2014). It seems that HBase will not fix it until 2.0.0: https://issues.apache.org/jira/browse/HBASE-8770
Thank you for the hack-fix! 👍

@renato2099
Contributor

Thanks for finding this out @alfonsonishikawa ! This makes much more sense now!
I do agree with you that it is weird that this hasn't surfaced before. My guesses would be: (1) we were using a slower machine before, which would make the deletes come after the inserts, whereas with a faster machine they could be swapped (we are also using the same writer, btw); (2) a multi-threaded HBase client? I think this came later than our previous releases, but I don't know whether the client we are using is actually multi-threaded or not. Another thing to try would be limiting the HBase client by reducing the connection pool, to see what happens.

5 participants