
Edit Solution Configuration


Edit .conf

Copy the template config file to /etc and then edit it for your machine configuration. We recommend changing only the HUSER, UINODE, LUSER, and NODES variables.

[soluser@node-04]$ git clone https://github.com/Open-Network-Insight/oni-setup.git
[soluser@node-04]$ cd oni-setup
[soluser@node-04 oni-setup]$ sudo cp <solution>.conf /etc/.
[soluser@node-04 oni-setup]$ sudo vim /etc/<solution>.conf

Once the file has been edited, copy it to the three nodes designated as UINODE, MLNODE, and GWNODE in the config file.
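For example, assuming the hostnames from the default template below (node03, node04, and node16) and SSH access from the setup node, the file could be distributed like this:

[soluser@node-04]$ scp /etc/<solution>.conf soluser@node03:/tmp/    # UINODE
[soluser@node-04]$ scp /etc/<solution>.conf soluser@node04:/tmp/    # MLNODE
[soluser@node-04]$ scp /etc/<solution>.conf soluser@node16:/tmp/    # GWNODE

Then, on each of those nodes, move it into place:

[soluser@node03]$ sudo cp /tmp/<solution>.conf /etc/.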

Below is what the default configuration file looks like:

#node configuration
NODES=('node-01' 'node-02')
UINODE='node03'
MLNODE='node04'
GWNODE='node16'
DBNAME='oni'

#hdfs - base user and data source config
HUSER='/user/oni'
DSOURCES='flow'
DFOLDERS=('binary' 'csv' 'hive' 'stage')
DNS_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
PROXY_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
FLOW_PATH=${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/
HPATH=${HUSER}/${DSOURCE}/scored_results/${FDATE}

#impala config
IMPALA_DEM='node04'

KRB_AUTH=false
KINITPATH=
KINITOPTS=
KEYTABPATH=
KRB_USER=

#local fs base user and data source config
LUSER='/home/oni'
LPATH=${LUSER}/ml/${DSOURCE}/${FDATE}
RPATH=${LUSER}/ipython/user/${FDATE}
LDAPATH=${LUSER}/ml/oni-lda-c
LIPATH=${LUSER}/ingest

SPK_EXEC='400'
SPK_EXEC_MEM='2048m'
TOL='1e-6'


# MPI configuration

# command to run MPI
MPI_CMD='mpiexec'

# command to prepare system for MPI, eg. load environment variables
MPI_PREP_CMD=''

# number of processes to run in MPI
PROCESS_COUNT=20
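Several of the paths above (DNS_PATH, PROXY_PATH, FLOW_PATH, LPATH, RPATH, and HPATH) are templates that the pipeline fills in at run time with the data source and the date being processed. As a rough illustration, assuming the default values above and a hypothetical run date:

HUSER='/user/oni'
DSOURCE='flow'            # one of the entries in DSOURCES
YR=2016; MH=09; DY=21     # hypothetical run date supplied by the pipeline
echo "${HUSER}/${DSOURCE}/hive/y=${YR}/m=${MH}/d=${DY}/"
# prints: /user/oni/flow/hive/y=2016/m=09/d=21/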

The following variables are needed throughout the ML pipeline:

  • NODES – a space-delimited list of the Data Nodes that will run the C/MPI part of the pipeline. Be very careful to keep the variable in the format (‘host1’ ‘host2’ ‘host3’ …). The first node is the same node as the MLNODE.
  • UINODE – the node that runs the Operational Analytics (aka, user interface node).
  • MLNODE – the node that runs the ML pipeline and controls the other nodes. The MLNODE must be the first node in the NODES list.
  • GWNODE – the node that runs the ingest process.
  • DBNAME – the name of the database used by the solution
  • HUSER – HDFS user path that will be the base path for the solution; this is usually the same user that you created to run the solution
  • DSOURCES – data sources enabled in this installation
  • DFOLDERS – built-in paths for the directory structure in HDFS
  • FLOW_PATH – the path to the flow records in Hive; this will be dynamically built within the pipeline with values for ${YR}, ${MH} and ${DY}
  • DNS_PATH – the path to the DNS records in Hive; this will be dynamically built within the pipeline with values for ${YR}, ${MH} and ${DY}
  • PROXY_PATH – the path to the proxy records in Hive; this will be dynamically built within the pipeline with values for ${YR}, ${MH} and ${DY}
  • HPATH – path where output from the ML analysis will be stored.
  • KRB_AUTH – (default: false) turn Kerberos authentication features on/off
  • KINITPATH – path to the kinit binary file
  • KINITOPTS – additional options passed to the kinit command
  • KEYTABPATH – path to the keytab file
  • KRB_USER – Kerberos user
  • LUSER – the local filesystem path for the solution, ‘/home/solution-user/’
  • LPATH – the local path for the ML intermediate and final results, dynamically built when the pipeline runs
  • RPATH – the path on the Operational Analytics node where the pipeline output will be delivered
  • LDAPATH – path to the directory containing the oni-lda-c executable and configuration files.
  • LIPATH – local ingest path
  • SPK_EXEC – number of Spark executors
  • SPK_EXEC_MEM – memory (in MB) allocated to each Spark executor, e.g. '2048m'
  • MPI_CMD – command used to run MPI
  • MPI_PREP_CMD – optional command to prepare the system for MPI execution, such as sourcing environment variables; can be empty
  • PROCESS_COUNT – total number of processes across all workers to be used in the MPI execution (see the sketch after this list)
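A minimal sketch of how the Kerberos and MPI variables might fit together at run time, assuming the defaults above. This is not the solution's actual launch script: the 'lda' executable name is hypothetical, and spreading processes across the NODES hosts may additionally require a hostfile or host option specific to your MPI distribution.

#!/bin/bash
source /etc/<solution>.conf

# Obtain a Kerberos ticket first when KRB_AUTH is enabled
# (kinit -kt <keytab> <principal> is the standard keytab form).
if [ "${KRB_AUTH}" = "true" ]; then
    ${KINITPATH} ${KINITOPTS} -kt ${KEYTABPATH} ${KRB_USER}
fi

# Run any preparation command (may be empty), then launch the C/MPI stage
# with the configured number of processes.
${MPI_PREP_CMD}
${MPI_CMD} -n ${PROCESS_COUNT} ${LDAPATH}/lda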