This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-2327 - Support for SOLR time-based arrays #1572

Draft: wants to merge 7 commits into base branch feature/METRON-2088-support-hdp-3.1

tigerquoll
Contributor

Contributor Comments

Updated schemas and instructions on how to enable SOLR time-based arrays.

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not, one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For documentation related changes:

  • Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, run the following commands and then verify the changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    
  • Have you ensured that any documentation diagrams have been updated, along with their source files, using draw.io? See Metron Development Guidelines for instructions.

@nickwallen
Contributor

Please create a new JIRA rather than reusing 2088.

@tigerquoll changed the title from "METRON-2088 - Support for SOLR time-based arrays" to "METRON-2327 - Support for SOLR time-based arrays" on Nov 26, 2019
@tigerquoll
Contributor Author

Interestingly enough, when I ran the integration tests on my laptop, I got the following error:

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 19.962 sec <<< FAILURE! - in org.apache.metron.indexing.integration.SolrIndexingIntegrationTest
test(org.apache.metron.indexing.integration.SolrIndexingIntegrationTest)  Time elapsed: 19.907 sec  <<< ERROR!
java.lang.NoSuchMethodError: org.apache.storm.flux.parser.FluxParser.parseFile(Ljava/lang/String;ZZLjava/lang/String;Z)Lorg/apache/storm/flux/model/TopologyDef;
	at org.apache.metron.integration.components.FluxTopologyComponent.loadYaml(FluxTopologyComponent.java:282)
	at org.apache.metron.integration.components.FluxTopologyComponent.startTopology(FluxTopologyComponent.java:252)
	at org.apache.metron.integration.components.FluxTopologyComponent.submitTopology(FluxTopologyComponent.java:248)
	at org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:133)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)

# test for presence of datetime field in schema collection
DQT='"'
DATETIME_SCHEMA="<field name=${DQT}datetime${DQT} type=${DQT}datetime${DQT}"
grep --quiet --fixed-strings $DATETIME_SCHEMA $METRON_HOME/config/schema/$1/schema.xml; rc=$?
Contributor

I see this when I run the create_configset script...

[root@solrtra-5 schema]# pwd
/usr/metron/0.7.2/config/schema
[root@solrtra-5 schema]# ../../bin/create_configset.sh bro1
grep: name="datetime": No such file or directory
grep: type="datetime": No such file or directory
  adding: solrconfig.xml (deflated 72%)
  adding: schema.xml (deflated 88%)
{
  "responseHeader":{
    "status":0,
    "QTime":96}}
Configset bro1 successfully uploaded

Looks like the regex pattern needs fixing?
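
The two `grep: ... No such file or directory` messages are consistent with `$DATETIME_SCHEMA` being expanded unquoted: the shell splits the value on spaces, so grep takes `<field` as the pattern and treats the remaining words as file names. A minimal sketch of a quoted version is below. `has_datetime_field` is a hypothetical helper name used only for illustration, not something defined in this PR.

```shell
# Sketch: succeed (exit 0) if the given schema file declares a "datetime" field.
# has_datetime_field is a hypothetical name, not part of create_configset.sh.
has_datetime_field() {
  # $1 = path to schema.xml
  # Single quotes let the pattern contain double quotes without the DQT workaround.
  pattern='<field name="datetime" type="datetime"'
  # Quoting "$pattern" passes it to grep as a single argument; "--" ends option parsing.
  grep --quiet --fixed-strings -- "$pattern" "$1"
}
```

In the script this could be called as `has_datetime_field "$METRON_HOME/config/schema/$1/schema.xml"; rc=$?`, which keeps the rest of the logic unchanged.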


1. Add to the sensor parser config json field the following transformation:
```
"fieldTransformations" : [{
Contributor

@anandsubbu, Dec 3, 2019

I'm guessing the markdown changed some of the double quotes. This should be something like:

{
  "parserClassName":"org.apache.metron.parsers.bro.BasicBroParser",
  "sensorTopic":"bro",
  "parserConfig": {
    "fieldTransformations" : [{
      "input" : [ ]
      ,"transformation" : "STELLAR"
      ,"output" : [ "datetime" ]
      ,"config" : {
        "datetime" : "DATE_FORMAT(\"yyyy-MM-dd'T'HH:mm:ss.SSSX\",timestamp)"
      }
    }]
  }
}

Assuming the following values:
* SOLR_HOST: Host SOLR is installed on

* ALIAS_NAME: Name of the new alias
Contributor

@anandsubbu, Dec 4, 2019

In my case, I created the alias with the name bro1, but I notice the indices are not being written. In the Solr admin UI, I see the name of the collection as bro1_2019-12-03.

Can you provide an example for "name of the new alias"?

Contributor Author

In this case the alias, "bro1", is what Metron is configured to read from and write to. Behind the scenes, SOLR routes each event to a dated collection such as "bro1_2019-12-03". Had the event been collected the next day, it might be stored in the collection "bro1_2019-12-04" instead. Both events are still retrievable when searching via the alias "bro1".
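
To make the alias-versus-collection naming concrete, here is a small sketch that derives the dated collection name from an alias and an event time. It only illustrates the naming pattern seen in this thread and is not SOLR's routing code. It assumes a one-day routing interval aligned to UTC midnight and GNU `date`; `tra_collection_name` is a hypothetical helper name.

```shell
# Sketch: print the day-bucketed collection name for an alias.
# Assumes GNU date and a one-day interval aligned to UTC midnight.
tra_collection_name() {
  # $1 = alias name, $2 = event time in seconds since the epoch
  # -u selects UTC; -d "@N" interprets N as epoch seconds (GNU date).
  printf '%s_%s\n' "$1" "$(date -u -d "@$2" +%Y-%m-%d)"
}
```

For example, `tra_collection_name bro1 1575374400` corresponds to an event at noon UTC on 2019-12-03, and an event one day later lands in the next day's collection, while queries continue to go through the alias "bro1".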

#
METRON_VERSION=${project.version}
METRON_HOME=/usr/metron/$METRON_VERSION
ZOOKEEPER=${ZOOKEEPER:-localhost:2181}
Contributor

We need to source /etc/default/metron before setting ZOOKEEPER. This step will fail in a multinode deployment.

Contributor

And in addition, we need to do something like this before invoking the script so that the ZK variable is set properly with the /solr suffix.

The README can be updated with this info.
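
One possible shape for these two suggestions, offered as a sketch rather than the change the reviewers had in mind: source `/etc/default/metron` when it exists, then make sure the ZooKeeper connect string ends in `/solr`. It assumes that file sets `ZOOKEEPER` and that SOLR is registered under a `/solr` chroot; `with_solr_chroot` is a hypothetical helper name.

```shell
# Sketch: pick up cluster settings, then ensure ZOOKEEPER carries the /solr suffix.
# Assumption: /etc/default/metron, when present, sets ZOOKEEPER for this deployment.
if [ -f /etc/default/metron ]; then
  . /etc/default/metron
fi

# with_solr_chroot is a hypothetical helper: append /solr unless already present.
with_solr_chroot() {
  case "$1" in
    */solr) printf '%s\n' "$1" ;;
    *)      printf '%s/solr\n' "$1" ;;
  esac
}

# Fall back to localhost only when nothing was set by the sourced file or the caller.
ZOOKEEPER="$(with_solr_chroot "${ZOOKEEPER:-localhost:2181}")"
```

Sourcing before applying the default matters because `${ZOOKEEPER:-localhost:2181}` silently falls back to localhost when the variable is unset, which is the multinode failure described above.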


Then the following command will create a time-routed alias:
```
curl "http://$SOLR_HOST:8983/solr/admin/collections?action=CREATEALIAS\
Contributor

Minor nit: can we remove the spaces at the beginning of this block of code? Copy-pasting it as-is causes the command to fail.

Also, in my case I had to set numShards to 1 for the command to complete successfully. Not sure if that has something to do with the way my Solr is configured.
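
A sketch of how the request URL could be assembled so that no leading whitespace can creep in from line continuations. This is not the command from the README under review. The parameter names follow the Solr Collections API documentation for time-routed aliases, while the values (`datetime` as the routing field, a one-day interval, `numShards=1`) are assumptions taken from this thread. `tra_create_url` and `$CONFIGSET` are hypothetical names.

```shell
# Sketch: print a CREATEALIAS URL for a time-routed alias as one unbroken string.
# tra_create_url is a hypothetical helper; the parameter values are illustrative.
tra_create_url() {
  # $1 = Solr host, $2 = alias name, $3 = configset name
  printf '%s' "http://$1:8983/solr/admin/collections?action=CREATEALIAS"
  printf '%s' "&name=$2"
  printf '%s' "&router.name=time"
  printf '%s' "&router.field=datetime"
  printf '%s' "&router.start=NOW/DAY"
  # %2B is the URL-encoded "+" in "+1DAY"
  printf '%s' "&router.interval=%2B1DAY"
  printf '%s' "&create-collection.collection.configName=$3"
  printf '%s' "&create-collection.numShards=1"
  printf '\n'
}
```

It could then be used as `curl "$(tra_create_url "$SOLR_HOST" "$ALIAS_NAME" "$CONFIGSET")"`. Whether `numShards=1` is required likely depends on how many Solr nodes are available in the cluster.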

```
${HDP_HOME}/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server $BROKERLIST --topic $PARSER_NAME
```

Contributor

Would it be good to include the step to start the parser topology? :)

3 participants