This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-2312 solr scripts fixup #1558

Open
wants to merge 2 commits into
base: master
from

Conversation

tigerquoll
Contributor

Contributor Comments

Issue being fixed:

  1. Install HDP Search 4 or greater.
  2. source /etc/defaults/metron
  3. $METRON_HOME/bin/add-collection bro

The command returns

Node does not exist: /live_nodes
curl: (6) Could not resolve host: WatchedEvent state:SyncConnected type:None path; Unknown error
curl: (6) Could not resolve host: WatchedEvent state:SyncConnected type:None path; Unknown error

and the SOLR collection is not created.

The readme in Metron-solr-common notes that, by default, the scripts do not work outside of Ambari if SOLR uses chrooted Zookeeper addressing, and that you need to manually set the $ZOOKEEPER environment variable to the chrooted Zookeeper address where SOLR stores its information.

The fix:

  1. Test the return value of the zkcli command that polls Zookeeper for SOLR Cloud configuration information.
    a. If zkcli did not return a SOLR Cloud node, return 1.
    b. If zkcli did not return a SOLR Cloud node and $ZOOKEEPER does not end in '/solr', suggest to the user that it should.
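
The check described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `check_solr_zk_lookup` and its three arguments are hypothetical, and the sketch assumes the script has already run zkcli and captured its exit status and output. The two messages are the ones shown in the Testing section.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the PR's actual code): decide what to report
# after the zkcli lookup has finished.
#   $1 = exit status of the zkcli command
#   $2 = value of $ZOOKEEPER
#   $3 = SOLR Cloud node text returned by zkcli (may be empty)
check_solr_zk_lookup() {
  local zk_status="$1" zookeeper="$2" solr_node="$3"

  # Success: zkcli exited cleanly and returned a node.
  if [ "$zk_status" -eq 0 ] && [ -n "$solr_node" ]; then
    return 0
  fi

  echo "Error occurred while attempting to read SOLR Cloud configuration data from Zookeeper." >&2

  # Only suggest the chroot when the address does not already end in /solr.
  case "$zookeeper" in
    */solr) ;;
    *) echo "Warning! Environment variable ZOOKEEPER=${zookeeper} does not contain a chrooted zookeeper ensemble address - are you sure you do not mean ZOOKEEPER=${zookeeper}/solr?" >&2 ;;
  esac

  return 1
}
```

A calling script would pass in the exit status and output of its existing zkcli invocation and stop when the function returns non-zero, so that the later curl calls never run against an empty host name.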

Testing:

echo $ZOOKEEPER
localhost:2181
[root@node1 bin]# ./delete_collection.sh bro
Node does not exist: /live_nodes
Error occurred while attempting to read SOLR Cloud configuration data from Zookeeper.
Warning! Environment variable ZOOKEEPER=localhost:2181 does not contain a chrooted zookeeper ensemble address - are you sure you do not mean ZOOKEEPER=localhost:2181/solr?
[root@y134 bin]# export ZOOKEEPER=hdpsearchHost:2181/solr
[root@y134 bin]# ./delete_collection.sh foo . 
{
  "responseHeader":{
    "status":400,
    "QTime":90},
  "Operation delete caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Could not find collection : foo",
  "exception":{
    "msg":"Could not find collection : foo",
    "rspCode":400},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"Could not find collection : foo",
    "code":400}}
{
  "responseHeader":{
    "status":400,
    "QTime":31},
  "Operation delete caused exception:":"org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: ConfigSet does not exist to delete: foo",
  "exception":{
    "msg":"ConfigSet does not exist to delete: foo",
    "rspCode":400},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"ConfigSet does not exist to delete: foo",
    "code":400}}

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?
  • Have you included steps or a guide to how the change may be verified and tested manually?

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

@mmiklavc
Contributor

FYI, Apache Metron does not depend directly on HDP Search (that's a vendor-specific support thing) - we actually use the OSS version of Solr by default. We should verify that Apache Solr works by default first using the provided install instructions in the dev env, and then check that we can still accommodate other 3rd party packaging. To wit, I'm not 100% sure if the zookeeper "/solr" endpoint is universal or not, so this check might need to be adapted a bit.

@tigerquoll
Contributor Author

Quote from jira

When installing SOLR Cloud, it has been highly recommended to use the cluster Zookeeper ensemble rather than installing your own mini-ZK cluster just for SOLR. For the past several years, it has been standard practice to use a chrooted / namespaced environment for storing SOLR information in Zookeeper. The practical effect of this is the need to append '/solr' to any Zookeeper ensemble URL. The use of chrooted Zookeeper configurations is the default in both Lucidworks/HWX SOLR (from 4.0) and Cloudera SOLR (not sure which version, but for many years). It has also been the documented recommendation for Apache SOLR Cloud since approximately version 6.6.

End result is, if Metron is dealing with a SOLR cluster that has been installed or updated any time in the past couple of years, it is dealing with a SOLR configuration stored in a chrooted Zookeeper environment.
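
To make the address format concrete, here is a minimal sketch of appending a chroot to a Zookeeper connection string. The helper name `with_zk_chroot` is hypothetical and is not part of the Metron scripts; the '/solr' default is the chroot discussed here and may differ per installation. The chroot is written once, after the last host:port pair, even for a multi-host ensemble.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a chroot to a Zookeeper connection string
# unless one is already present.
#   $1 = connection string, e.g. host1:2181,host2:2181
#   $2 = chroot path (assumed default: /solr)
with_zk_chroot() {
  local zk="$1" chroot="${2:-/solr}"
  case "$zk" in
    */*) printf '%s\n' "$zk" ;;             # already carries a chroot; leave it alone
    *)   printf '%s\n' "${zk}${chroot}" ;;  # chroot goes once, after the last host:port
  esac
}
```

For example, `export ZOOKEEPER=$(with_zk_chroot "$ZOOKEEPER")` would turn `localhost:2181` into `localhost:2181/solr` and leave an already chrooted value unchanged.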

The problem is that the Metron SOLR collection create/destroy scripts assume we are not using a chrooted environment, and fail badly when the SOLR configuration is not present at the expected location in Zookeeper. Buried in the readme is an instruction on how to modify the Zookeeper environment variable to add the chrooted address before running the scripts. When the scripts are used by Ambari, they are called with the correct chrooted quorum URL, because there is a separate configuration item that can be set to indicate the chroot Zookeeper address for SOLR.

Having just been burnt by this, I think we should at least:

  • Cleanly catch the failure of the zkcli command in the collection scripts when it queries for Zookeeper state that is not present.
  • If the zkcli error is caught, make a suggestion in the error message to check for a chrooted SOLR Cloud Zookeeper configuration.

@mmiklavc
Contributor

Quote from jira

Thanks for adding that. Just a heads up, it's probably worth adding a link and a note on the PR when putting that level of detail in the Jira, at least for now. There has been some scattered discussion about getting better descriptions in Jiras, but for the most part we rely almost exclusively on the gitbot copying PR contents into the Jira activity history currently. Most folks, myself included, typically look primarily at the PR description when reviewing PRs.

@tigerquoll
Contributor Author

Will link to Jira in future.
cypress seems to be timing out trying to detect chrome?

@tigerquoll
Contributor Author

retest this please
