# dig-workflows

DIG workflow processing for the EFFECT project.
## Installation Instructions

1. Download and install conda - https://www.continuum.io/downloads
2. Install conda-env: `conda install -c conda conda-env`
3. Create the environment: `conda env create`. This will create a virtual environment named `effect-env` (the name is defined in `environment.yml`)
4. Switch to the environment: `source activate effect-env`

NOTE: You should build the environment on the same hardware/OS you are going to run the job on.
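Put together, a fresh setup looks roughly like the sketch below. The Miniconda installer URL is an assumption (pick the installer for your OS from the downloads page above); the last three commands come straight from the steps above.

```
# Hedged example of the full installation sequence on a fresh Linux machine.
# The installer URL is an assumption - use the one matching your OS/arch.
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh
conda install -c conda conda-env
conda env create            # reads environment.yml in the repo root
source activate effect-env
```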
## Running the script to convert PostgreSQL data to CDR

1. Follow the Installation Instructions above to create the conda environment - Steps 1-3
2. Switch to the effect-env: `source activate effect-env`
3. Execute:

```
python postgresToCDR.py --host <postgreSQL hostname> --user <db username> --password <db password> \
    --database <databasename> --table <tablename> \
    --output <output filename> --team <Name of team providing data>
```
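For example, a hypothetical invocation (the hostname, credentials, database, table, and team name below are placeholders, not values from this project) might look like:

```
python postgresToCDR.py --host db.example.com --user dbuser --password secret \
    --database effectdb --table attacks \
    --output attacks-cdr.jl --team example-team
```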
## Running the script to convert CSV, JSON, XML, or CDR data into a format for Karma modeling

1. Follow the Installation Instructions above to create the conda environment - Steps 1-3
2. Switch to the effect-env: `source activate effect-env`
3. Execute:

```
python generateDataForKarmaModeling.py --input <input filename> --output <output filename> \
    --format <input format: csv/json/xml/cdr> --source <a name for the source> \
    --separator <column separator for CSV files>
```

Example invocations:

```
python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/nvd/sample/nvdcve-2.0-2003.xml \
    --output nvd.jl --format xml --source nvd

python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/hackmageddon/sample/hackmageddon_20160730.csv \
    --output hackmageddon.jl --format csv --source hackmageddon

python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/hackmageddon/sample/hackmageddon_20160730.jl \
    --output hackmageddon.jl --format json --source hackmageddon
```
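For CSV sources that do not use a comma, pass the separator explicitly. The invocation below is a hypothetical sketch for a tab-separated file (the input path and source name are placeholders, and how the script interprets the separator argument should be verified against its help output):

```
# Hypothetical tab-separated input; path and source name are placeholders.
python generateDataForKarmaModeling.py --input data/example.tsv \
    --output example.jl --format csv --source example-source --separator $'\t'
```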
## Accessing Hue on AWS

- Login to AWS and create a tunnel: `ssh -L 8888:localhost:8888 hadoop@ec2-52-42-169-124.us-west-2.compute.amazonaws.com`
- Access Hue at http://localhost:8888
- See hiveQueries.sql for example queries (an illustrative query follows this list)
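hiveQueries.sql is the authoritative set of examples; purely as an illustration, a quick sanity check against the Hive table `cdr` used by the Karma workflow below (run from the Hue query editor, or via the hive CLI as sketched here) could be:

```
# Hedged example only - the real queries live in hiveQueries.sql.
hive -e "SELECT COUNT(*) FROM cdr;"
```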
## Running the Karma workflow on AWS

To build the python libraries required by the workflows:

- Edit `make.sh` and update the path to `dig-workflows`
- Run `./make.sh`. This will create `effect-env.zip`, which can be attached with the `--archives` option to the Spark workflow
- Copy the `effect-env.zip` file to AWS: `scp effect-env.zip hadoop@ec2-52-42-169-124.us-west-2.compute.amazonaws.com:/home/hadoop/effect-workflows/lib`
- Zip your Karma home folder into `karma.zip` and copy it to AWS: `scp karma.zip hadoop@ec2-52-42-169-124.us-west-2.compute.amazonaws.com:/home/hadoop/effect-workflows/`
- Build a shaded karma-spark jar and copy it to AWS:

```
cd karma-spark
mvn clean install -P shaded -Denv=hive
scp lib/karma-spark-0.0.1-SNAPSHOT-shaded.jar hadoop@ec2-52-42-169-124.us-west-2.compute.amazonaws.com:/home/hadoop/effect-workflows/lib
```

- Login to AWS and run the workflow using the script `run_karma_workflow.sh`. This will load data from the Hive table `cdr`, apply the Karma models to it, and save the output to HDFS (a hypothetical sketch of the underlying command follows this list).
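For orientation only, `run_karma_workflow.sh` presumably wraps a `spark-submit` call along these lines; the driver script name and the exact flags below are assumptions based on the artifacts built above, not the script's actual contents:

```
# Hypothetical sketch of what run_karma_workflow.sh might invoke;
# consult the actual script for the authoritative command and flags.
spark-submit --deploy-mode client \
    --archives /home/hadoop/effect-workflows/lib/effect-env.zip,/home/hadoop/effect-workflows/karma.zip \
    --jars /home/hadoop/effect-workflows/lib/karma-spark-0.0.1-SNAPSHOT-shaded.jar \
    /home/hadoop/effect-workflows/effectWorkflow.py   # placeholder driver script name
```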
## Loading the data into Elasticsearch

To load the data to ES:

- Create an index, say `effect-2`, with the mappings from https://raw.githubusercontent.com/usc-isi-i2/effect-alignment/master/es/es-mappings.json (a curl sketch follows this list)
- Run the Spark workflow to load data from HDFS into the `effect-2` index:

```
spark-submit --deploy-mode client \
    --executor-memory 5g \
    --driver-memory 5g \
    --jars "/home/hadoop/effect-workflows/jars/elasticsearch-hadoop-2.4.0.jar" \
    --py-files /home/hadoop/effect-workflows/lib/python-lib.zip \
    /home/hadoop/effect-workflows/effectWorkflow-es.py \
    --host 172.31.19.102 \
    --port 9200 \
    --index effect-2 \
    --doctype attack \
    --input hdfs://ip-172-31-19-102/user/effect/data/cdr-framed/attack
```

  This example loads the attack frame; it needs to be executed for all the available frames.

- Change the alias `effect` in ES to point to the new index `effect-2`:

```
POST _aliases
{
  "actions": [
    { "add":    { "index": "effect-2", "alias": "effect" } },
    { "remove": { "index": "effect-1", "alias": "effect" } }
  ]
}
```
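As referenced in the first step above, one way to create the index with those mappings is the sketch below; it assumes the mappings file is valid as an index-creation body and that the ES host matches the `--host` used by the workflow (adjust the curl flags for your ES version):

```
# Hedged sketch: create the effect-2 index from the published mappings file.
curl -O https://raw.githubusercontent.com/usc-isi-i2/effect-alignment/master/es/es-mappings.json
curl -XPUT 'http://172.31.19.102:9200/effect-2' --data-binary @es-mappings.json
```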
## Running the extractor workflow

- Follow the Installation Instructions to install conda and conda-env if you don't have them installed
- Create the effect environment: `conda env create`
- Switch to the environment: `source activate effect-env`
- Run `./make-extractor.sh`. This bundles up the entire environment, including the Python used to run the workflow
- If Spark is not installed in the default `/usr/lib/spark/`, change the paths in `run-extractor.sh` (see the sketch after this list)
- Run `./run-extractor.sh`
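The Spark-location change in `run-extractor.sh` is presumably a matter of editing a hardcoded path; the excerpt below is a hypothetical illustration (the variable name is an assumption, so edit whatever path the real script defines):

```
# Hypothetical excerpt: point the script at a non-default Spark install.
SPARK_HOME=/opt/spark                      # default assumed: /usr/lib/spark/
"$SPARK_HOME/bin/spark-submit" --version   # sanity check that the path resolves
```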
## Removing the conda environment

- To remove the environment, run `conda env remove -n effect-env`
- To see all environments, run `conda env list`