NUTCH-2373 Index writer plugin for hbase implemented #184

Merged
merged 16 commits on May 22, 2017



@kaidul kaidul commented Apr 19, 2017

An index writer for HBase, similar to the existing index writers for Solr, Elasticsearch, etc. The expected HBase table description and the NutchDocument-to-HBase mapping are read from a mapping file similar to the one used by indexer-solr, and the NutchDocument fields are then written into the table on an HBase server.

TODO: Add functionality to set and send the Kerberos authentication configuration for secure HDFS.

@kaidul
Author

kaidul commented Apr 22, 2017

I've used the HBase 0.98.18-hadoop2 version and its APIs. I was really unsure which HBase version I should use. Should I use HBase 1.x? Please review the changes.

@kaidul
Author

kaidul commented Apr 30, 2017

Very frustrating for newcomers like me. Not even a single initial response :( :'(

@sebastian-nagel
Contributor

Hi @kaidul, sorry about the delay. We are a small project and a small community, and a careful review including a test setup takes time. I hope to find a few hours this week but cannot promise anything.

Regarding the HBase version: Gora 0.6.1 uses HBase 0.98.8, so to avoid conflicts (if HBase is also used for storage via Gora) it is certainly not a bad idea to use the same version. But I do not feel really competent on questions regarding Gora.

@kaidul
Author

kaidul commented May 2, 2017

Hi @sebastian-nagel, thank you very much for your response. I always appreciate your effort, and I'm sorry for my impatience. Without any initial response, I thought my PR might seem irrelevant to Nutch. Now I know this PR is in the queue, however many days the wait may be :)

Fortunately, my current PR uses HBase 0.98.8-hadoop2 (sorry, I mistakenly mentioned 0.98.18 earlier, which is not correct), so I think this is the same version that Gora 0.6.1 uses.

@sju

sju commented May 2, 2017

Hi @kaidul, I'm currently porting this to the Nutch 1.x branch. During testing I found that a commit() call is missing in the close() function. Because of this, the last batch isn't committed. Can you change this as well?
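
For readers following along, the missing-commit problem can be illustrated with a minimal, self-contained sketch. This is not the HBaseIndexWriter code from this PR: `BatchingWriterSketch`, `BatchingWriter`, and the in-memory `committed` list (a stand-in for the HBase table) are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual HBaseIndexWriter from this PR. */
public class BatchingWriterSketch {

    /** Buffers documents and sends them to the store in batches. */
    static class BatchingWriter implements AutoCloseable {
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();
        // In-memory stand-in for the remote table (assumption for this sketch).
        private final List<String> committed = new ArrayList<>();

        BatchingWriter(int batchSize) {
            this.batchSize = batchSize;
        }

        void write(String doc) {
            buffer.add(doc);
            if (buffer.size() >= batchSize) {
                commit(); // a full batch is flushed right away
            }
        }

        void commit() {
            committed.addAll(buffer);
            buffer.clear();
        }

        @Override
        public void close() {
            // Without this call, a final partial batch would be silently dropped.
            commit();
        }

        List<String> committed() {
            return committed;
        }
    }

    public static void main(String[] args) {
        BatchingWriter writer = new BatchingWriter(2);
        writer.write("doc1");
        writer.write("doc2"); // batch is full, so it is committed
        writer.write("doc3"); // stays in the buffer until the next commit
        System.out.println("before close: " + writer.committed().size()); // prints "before close: 2"
        writer.close();
        System.out.println("after close: " + writer.committed().size()); // prints "after close: 3"
    }
}
```

With a batch size of 2 and three documents, the third document only reaches the store because close() calls commit().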

@kaidul
Author

kaidul commented May 2, 2017

Hi @sju, thanks for your participation. I've added the commit() call to close() and pushed the change.

@lewismc
Member

lewismc commented May 19, 2017

Hi @kaidul, yes, I echo @sebastian-nagel's comments; apologies for not getting around to this earlier. It is not that the committers don't care; it may have to do with the fact that not everyone has HBase running. Regardless, thank you for the patch. I managed to try it out with HBase 0.98.X tonight and I am glad to report that it performed flawlessly.
The only comment I really have is whether you could use parameterized logging for the SLF4J logger instances.
Thanks again. If you are able to address the logging issue, I will more than happily commit this to the master branch.

Member

@lewismc lewismc left a comment

@kaidul Please see my comments.

@kaidul
Author

kaidul commented May 19, 2017

Hi @lewismc, thanks for taking the time to test my patch. I am very glad that everything worked perfectly and that the PR is now more likely to be merged :D I've added parameterized logging where possible and pushed the changes. However, here and here, the log messages have to be built in order to throw a RuntimeException whether or not logging is enabled, so I used String.format there.

Please let me know what you're thinking :)
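
The String.format choice described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `LogAndThrowSketch` and `requireColumnFamily` are hypothetical names, not code from the PR, and java.util.logging stands in for SLF4J so that the sketch runs without extra jars.

```java
import java.util.logging.Logger;

/** Hypothetical sketch; names are illustrative and not taken from the PR. */
public class LogAndThrowSketch {
    // java.util.logging stands in for SLF4J here (assumption for this sketch).
    private static final Logger LOG = Logger.getLogger(LogAndThrowSketch.class.getName());

    /** Validates a (hypothetical) mapping entry and fails loudly when it is missing. */
    static String requireColumnFamily(String field, String family) {
        if (family == null || family.isEmpty()) {
            // The message is needed for the exception whether or not logging is
            // enabled, so it is built once with String.format and reused.
            String message = String.format("No column family mapped for field '%s'", field);
            LOG.severe(message);
            throw new RuntimeException(message);
        }
        // With SLF4J, a parameterized call would fit here instead, e.g.
        // LOG.debug("Field {} mapped to family {}", field, family);
        return family;
    }

    public static void main(String[] args) {
        System.out.println(requireColumnFamily("title", "cf"));
        try {
            requireColumnFamily("content", null);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Parameterized logging defers message formatting until the logger knows the level is enabled, which is why it only pays off when the message is used for logging alone.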

There was no change in ivy/ivy.xml, but the existing four-space indentation was accidentally replaced with two-space indentation when the file was opened on my local machine. The original ivy.xml is now restored.
…commit(), made HBaseIndexWriter.LOG identifier public.
@lewismc lewismc merged commit 1216411 into apache:2.x May 22, 2017
@lewismc
Member

lewismc commented May 22, 2017

Thank you @kaidul

@kaidul
Author

kaidul commented May 23, 2017

You're welcome, @lewismc. I really love this project and hope to be active in this community :)
