From 842670a30a59b3498b3d626b4507499a611f3954 Mon Sep 17 00:00:00 2001
From: merrimanr
Date: Tue, 14 Aug 2018 14:52:20 -0500
Subject: [PATCH 1/2] initial commit

---
 metron-interface/metron-rest/README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/metron-interface/metron-rest/README.md b/metron-interface/metron-rest/README.md
index d19d8c387a..f93bcad151 100644
--- a/metron-interface/metron-rest/README.md
+++ b/metron-interface/metron-rest/README.md
@@ -222,6 +222,17 @@ Out of the box it is a simple wrapper around the tshark command to transform raw
 REST will supply the script with raw pcap data through standard in and expects PDML data serialized as XML.
 
 Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
+It is highly recommended that a dedicated YARN queue be created and configured for Pcap queries to prevent a job from consuming too many cluster resources.
+
+Pcap query results are stored in HDFS. The location of query results when run through the REST app is determined by a couple factors. The root of Pcap query results defaults to `/apps/metron/pcap/output` but can be changed with the
+Spring property `pcap.final.output.path`. Assuming the default Pcap query output directory, the path to a result page will follow this pattern:
+```
+/apps/metron/pcap/output/{username}/MAP_REDUCE/{job id}/page-{page number}.pcap
+```
+Over time Pcap query results will accumulate in HDFS. Currently these results are not cleaned up automatically so cluster administrators should be aware of this and monitor them. It is highly recommended that a process be put in place to
+periodically delete files and directories under the Pcap query results root.
+
+Users should also be mindful of date ranges used in queries so they don't produce result sets that are too large. Currently there are no limits enforced on date ranges.
 
 ## API
 

From c5ed82479305b07d026ad6a86269e86b231747e7 Mon Sep 17 00:00:00 2001
From: merrimanr
Date: Wed, 15 Aug 2018 15:37:38 -0500
Subject: [PATCH 2/2] added link to YARN docs

---
 metron-interface/metron-rest/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/metron-interface/metron-rest/README.md b/metron-interface/metron-rest/README.md
index f93bcad151..cc18e5763c 100644
--- a/metron-interface/metron-rest/README.md
+++ b/metron-interface/metron-rest/README.md
@@ -222,7 +222,7 @@ Out of the box it is a simple wrapper around the tshark command to transform raw
 REST will supply the script with raw pcap data through standard in and expects PDML data serialized as XML.
 
 Pcap query jobs can be configured for submission to a YARN queue. This setting is exposed as the Spring property `pcap.yarn.queue`. If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
-It is highly recommended that a dedicated YARN queue be created and configured for Pcap queries to prevent a job from consuming too many cluster resources.
+It is highly recommended that a dedicated YARN queue be created and configured for Pcap queries to prevent a job from consuming too many cluster resources. More information about setting up YARN queues can be found [here](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Setting_up_queues).
 
 Pcap query results are stored in HDFS. The location of query results when run through the REST app is determined by a couple factors. The root of Pcap query results defaults to `/apps/metron/pcap/output` but can be changed with the
 Spring property `pcap.final.output.path`. Assuming the default Pcap query output directory, the path to a result page will follow this pattern: