This repository has been archived by the owner on May 12, 2021. It is now read-only.

METRON-1708 Run the Batch Profiler in Spark [Feature Branch] #1161

Conversation

@nickwallen (Contributor) commented Aug 13, 2018

This adds the ability to run the Batch Profiler from the command line. This also packages up the Batch Profiler into a tarball.

This is a pull request against the METRON-1699-create-batch-profiler feature branch.

This is dependent on the following PRs. By filtering on the last commit, this PR can be reviewed before the others are reviewed and merged.

Testing

  1. Start up the development environment. Allow Metron to run for a while so that a fair amount of telemetry is archived in HDFS.

  2. Stop all Metron services.

  3. Install Spark2 using Ambari.

    • Use Add Service > Spark2, then follow the prompts.
  4. Deploy the Batch Profiler to the development environment.

    From the host machine (outside the development VM), run the following.

    cd metron-deployment/development/centos6
    vagrant scp ../../../metron-analytics/metron-profiler-spark/target/metron-profiler-spark-0.5.1-archive.tar.gz /tmp
    

    Then from the development VM, run the following.

    source /etc/default/metron
    cd $METRON_HOME
    tar -xvf /tmp/metron-profiler-spark-0.5.1-archive.tar.gz
    
  5. Create a profile by editing $METRON_HOME/config/zookeeper/profiler.json as follows.

    [root@node1 0.5.1]# cat $METRON_HOME/config/zookeeper/profiler.json
    {
      "profiles": [
        {
          "profile": "hello-world",
          "foreach": "'global'",
          "init":    { "count": "0" },
          "update":  { "count": "count + 1" },
          "result":  "count"
        }
      ],
      "timestampField": "timestamp"
    }
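    The `hello-world` profile above simply counts messages per period: `init` seeds `count` at 0, `update` increments it once per message, and `result` emits the final count. The same logic, sketched in shell over a few sample messages:

    ```shell
    # Sketch of the hello-world profile semantics over sample messages.
    count=0                      # "init":   { "count": "0" }
    for msg in msg1 msg2 msg3; do
      count=$((count + 1))       # "update": { "count": "count + 1" }
    done
    echo "$count"                # "result": "count"
    ```

    Running this prints `3`, one count per sample message; the Batch Profiler does the same thing per period and entity.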
    
  6. Count the number of messages in the 'indexing' topic. With all Metron services stopped, this count should not change.

    [root@node1 ~]# /usr/hdp/current/kafka-broker/bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
       --broker-list $BROKERLIST \
       --topic indexing \
       --time -1
    
    indexing:0:8130
    

    In this case there are 8,130 messages.
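    Here `--time -1` asks for the latest offset in each partition. With a single partition the total can be read directly; a multi-partition topic would print one `topic:partition:offset` line per partition, which would need to be summed. A sketch over sample output lines (the offsets below are made up for illustration, not taken from the run above):

    ```shell
    # Sum per-partition offsets from GetOffsetShell-style output lines.
    # Sample lines only; a real check would pipe in the tool's output.
    total=0
    for line in "indexing:0:5000" "indexing:1:3130"; do
      offset="${line##*:}"       # strip everything up to the last ':'
      total=$((total + offset))
    done
    echo "$total"                # prints 8130 for these sample offsets
    ```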

  7. Delete any previously written profile measurements from HBase.

    [root@node1 ~]# hbase shell
    ...
    
    hbase(main):001:0> truncate 'profiler'
    Truncating 'profiler' table (it may take a while):
     - Disabling table...
     - Truncating table...
    0 row(s) in 4.1070 seconds
    
  8. Confirm that all of the messages were successfully indexed in HDFS.

    [root@node1 ~]# hdfs dfs -cat /apps/metron/indexing/indexed/*/* | wc -l
    8130
    
    • Remember that we found 8,130 in the indexing topic previously. This shows that all of them were indexed in HDFS successfully.
  9. Alter $METRON_HOME/config/batch-profiler.properties as follows. The period settings here must match those used when the measurements are later read back.

    [root@node1 0.5.1]# cat config/batch-profiler.properties
    spark.master=local
    spark.app.name=Batch Profiler
    spark.sql.shuffle.partitions=8
    
    profiler.period.duration=1
    profiler.period.duration.units=MINUTES
    
    profiler.batch.input.path=hdfs://localhost:8020/apps/metron/indexing/indexed/*/*
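    
    As noted during review, a mismatch between `profiler.period.duration` here and the period used by the Profiler Client (step 15) makes `PROFILE_GET` return nothing, so it is worth checking both values. A minimal shell sketch that pulls the settings out of sample property text (inlined here; a real check would read the file itself):

    ```shell
    # Extract the period settings from batch-profiler.properties-style text.
    props="profiler.period.duration=1
    profiler.period.duration.units=MINUTES"
    duration=$(printf '%s\n' "$props" | sed -n 's/^ *profiler\.period\.duration=//p')
    units=$(printf '%s\n' "$props" | sed -n 's/^ *profiler\.period\.duration\.units=//p')
    echo "period=${duration} ${units}"   # prints period=1 MINUTES
    ```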
    
  10. Fix up some of the Spark configuration.

    SPARK_HOME=/usr/hdp/current/spark2-client
    cp  /usr/hdp/current/hbase-client/conf/hbase-site.xml $SPARK_HOME/conf/
    cp $SPARK_HOME/conf/log4j.properties.template $SPARK_HOME/conf/log4j.properties
    echo "log4j.logger.org.apache.metron.profiler.spark=DEBUG" >> $SPARK_HOME/conf/log4j.properties
    
  11. You may need to create the Spark history directory in HDFS (if doing this in Full Dev).

    export HADOOP_USER_NAME=hdfs
    hdfs dfs -mkdir /spark2-history
    
  12. You may want to edit the log4j.properties file in $SPARK_HOME/conf, or create one, as follows.

    # Set everything to be logged to the console
    log4j.rootCategory=WARN, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    # Set the default spark-shell log level to WARN. When running the spark-shell, the
    # log level for this class is used to overwrite the root logger's log level, so that
    # the user can have different defaults for the shell and regular Spark apps.
    log4j.logger.org.apache.spark.repl.Main=WARN
    
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.spark_project.jetty=WARN
    log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
    log4j.logger.org.apache.parquet=ERROR
    log4j.logger.parquet=ERROR
    
    # SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
    log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
    log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
    
    #log4j.logger.org.apache.metron.profiler=DEBUG
    log4j.logger.org.apache.metron.profiler.spark=DEBUG
    
  13. Run the Batch Profiler.

    source /etc/default/metron
    cd $METRON_HOME
    $METRON_HOME/bin/start_batch_profiler.sh
    
  14. You should see something like the following.

    [root@node1 0.5.1]# ./bin/start_batch_profiler.sh
    Warning: Ignoring non-spark config property: profiler.batch.input.path=hdfs://localhost:8020/apps/metron/indexing/indexed/*/*
    Warning: Ignoring non-spark config property: profiler.period.duration=1
    Warning: Ignoring non-spark config property: profiler.period.duration.units=MINUTES
    18/08/13 18:23:50 WARN Utils: Your hostname, node1 resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
    18/08/13 18:23:50 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    18/08/13 18:23:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/08/13 18:23:51 INFO BatchProfilerCLI: Loading profiler properties from '/usr/metron/0.5.1/config/batch-profiler.properties'
    18/08/13 18:23:51 INFO BatchProfilerCLI: Properties = {spark.app.name=Batch Profiler, profiler.batch.input.path=hdfs://localhost:8020/apps/metron/indexing/indexed/*/*, profiler.period.duration.units=MINUTES, profiler.period.duration=1, spark.sql.shuffle.partitions=8, spark.master=local}
    18/08/13 18:23:51 INFO BatchProfilerCLI: Loading profiles from '/usr/metron/0.5.1/config/zookeeper/profiler.json'
    18/08/13 18:23:51 INFO BatchProfilerCLI: Loaded 1 profile(s)
    18/08/13 18:23:52 DEBUG BatchProfiler: Building 1 profile(s)
    18/08/13 18:23:52 DEBUG BatchProfiler: Loading telemetry from 'hdfs://localhost:8020/apps/metron/indexing/indexed/*/*'
    18/08/13 18:23:53 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
    18/08/13 18:23:58 DEBUG BatchProfiler: Found 8130 telemetry record(s)
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7
    ANTLR Tool version 4.5 used for code generation does not match the current runtime version 4.7
    ANTLR Runtime version 4.5 used for parser compilation does not match the current runtime version 4.7
    18/08/13 18:24:00 DEBUG BatchProfiler: Generated 8130 message route(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552359' from 320 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552359, value=320
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552363' from 210 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552363, value=210
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552365' from 260 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552365, value=260
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552367' from 250 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552367, value=250
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552369' from 280 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552369, value=280
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552375' from 270 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552375, value=270
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552374' from 240 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552374, value=240
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552378' from 320 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552378, value=320
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552355' from 340 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552355, value=340
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552361' from 280 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552361, value=280
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552372' from 250 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552372, value=250
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552373' from 280 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552373, value=280
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552379' from 320 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552379, value=320
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552352' from 180 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552352, value=180
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552353' from 240 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552353, value=240
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552360' from 320 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552360, value=320
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552362' from 290 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552362, value=290
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552376' from 310 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552376, value=310
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552357' from 330 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552357, value=330
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552381' from 130 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552381, value=130
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552356' from 320 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552356, value=320
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552366' from 260 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552366, value=260
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552370' from 260 message(s)
    18/08/13 18:24:02 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552370, value=260
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552368' from 280 message(s)
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552368, value=280
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552377' from 230 message(s)
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552377, value=230
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552380' from 290 message(s)
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552380, value=290
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552354' from 310 message(s)
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552354, value=310
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552358' from 330 message(s)
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552358, value=330
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552364' from 220 message(s)
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552364, value=220
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Building a profile for group 'hello-world-global-25552371' from 210 message(s)
    18/08/13 18:24:03 DEBUG ProfileBuilderFunction: Profile measurement created; profile=hello-world, entity=global, period=25552371, value=210
    18/08/13 18:24:03 DEBUG BatchProfiler: Produced 30 profile measurement(s)
    18/08/13 18:24:03 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: 6 profile measurement(s) written to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: 2 profile measurement(s) written to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: 5 profile measurement(s) written to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: 5 profile measurement(s) written to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
    18/08/13 18:24:04 DEBUG HBaseWriterFunction: 2 profile measurement(s) written to HBase
    18/08/13 18:24:05 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
    18/08/13 18:24:05 DEBUG HBaseWriterFunction: 3 profile measurement(s) written to HBase
    18/08/13 18:24:05 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
    18/08/13 18:24:05 DEBUG HBaseWriterFunction: 3 profile measurement(s) written to HBase
    18/08/13 18:24:05 DEBUG HBaseWriterFunction: About to write profile measurement(s) to HBase
    18/08/13 18:24:05 DEBUG HBaseWriterFunction: 4 profile measurement(s) written to HBase
    18/08/13 18:24:05 DEBUG BatchProfiler: 30 profile measurement(s) written to HBase
    18/08/13 18:24:05 INFO BatchProfilerCLI: Profiler produced 30 profile measurement(s)
    
  15. Fetch the profile measurements created by the Profiler.

    ./bin/stellar -z $ZOOKEEPER
    

    Change the period duration for the Profiler Client to match the 1 minute duration that was used by the Batch Profiler.

     %define profiler.client.period.duration := 1
     %define profiler.client.period.duration.units := "MINUTES"
    

    The Profiler counted a couple hundred messages each minute.

    [Stellar]>>> m := PROFILE_GET("hello-world","global", PROFILE_FIXED(30, "DAYS"))
    [180, 240, 310, 340, 320, 330, 330, 320, 320, 280, 290, 210, 220, 260, 260, 250, 280, 280, 260, 210, 250, 280, 240, 270, 310, 230, 320, 320, 290, 130]
    

    Overall, there were 30 measurements captured from the archived telemetry.

    [Stellar]>>> LENGTH(m)
    30
    

    The Profiler counted a total of 8,130 messages.

    [Stellar]>>> REDUCE(m, (l,r)->l+r, 0)
    8130
    
  16. Validate the range of time over which we have telemetry.

    Launch the Spark Shell.

    export HADOOP_USER_NAME=hdfs
    /usr/hdp/current/spark2-client/bin/spark-shell
    

    In the spark shell, run the following.

    scala> val msgs = spark.read.json("hdfs://localhost:8020/apps/metron/indexing/indexed/*/*")
    msgs: org.apache.spark.sql.DataFrame = [AA: boolean, RA: boolean ... 91 more fields]
    
    scala> msgs.count()
    res15: Long = 8130
    
    scala> msgs.select(max("timestamp") - min("timestamp")).show()
    +---------------------------------+
    |(max(timestamp) - min(timestamp))|
    +---------------------------------+
    |                          1769980|
    +---------------------------------+
    

    We see that 1,769,980 milliseconds is roughly 30 minutes of telemetry, which matches the 30 one-minute measurements captured by the Profiler.
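
    The period ids in the debug log (e.g. `25552359`) are the message timestamp divided by the period duration, so with a 1 minute period they are just minutes since the epoch. Likewise, the telemetry span divided by the period length gives the expected number of measurements. Both checks, sketched in shell (the timestamp below is an illustrative Aug 2018 value, not taken from the run above):

    ```shell
    # Period id = epoch millis / period duration in millis (1 minute here).
    period_ms=$((1 * 60 * 1000))
    timestamp_ms=1534184630000          # illustrative epoch millis, Aug 2018
    echo "period id: $((timestamp_ms / period_ms))"

    # Telemetry span / period length = expected measurement count
    # (rounded up, since a partial period still produces a measurement).
    span_ms=1769980
    echo "periods spanned: ~$(( (span_ms + period_ms - 1) / period_ms ))"   # prints ~30
    ```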

Pull Request Checklist

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?
  • Have you included steps or a guide to how the change may be verified and tested manually?
  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
  • Have you written or updated unit tests and or integration tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

@nickwallen nickwallen changed the title METRON-1708 Run the Batch Profiler in Spark METRON-1708 Run the Batch Profiler in Spark [Feature Branch] Aug 14, 2018
@nickwallen nickwallen changed the base branch from master to feature/METRON-1699-create-batch-profiler August 14, 2018 13:59
@merrimanr (Contributor)

I had to change the period settings in batch-profiler.properties to match what's in zookeeper. Everything worked as expected after that. Assuming the README will be updated in a future PR. +1

@nickwallen (Contributor, Author)

Thanks @merrimanr ! I updated the testing steps in the PR description to account for what you found.

I will add a README for the Spark Profiler in #1163 . That will allow it to include some installation steps.

@nickwallen (Contributor, Author)

This has been merged.

@nickwallen nickwallen closed this Aug 27, 2018