# dig-workflows

DIG workflow processing for the EFFECT project.
## Installation Instructions

1. Download and install conda - https://www.continuum.io/downloads
2. Install conda-env: `conda install -c conda conda-env`
3. Create the environment: `conda env create`. This will create a virtual environment named `effect-env` (the name is defined in `environment.yml`)
4. Switch to the environment: `source activate effect-env`

NOTE: You should build the environment on the same hardware/OS you are going to run the job on.
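Put together, a fresh setup looks roughly like the sketch below. The Miniconda installer URL is an assumption (pick the installer for your OS from the downloads page above); the last three commands come straight from the steps above.

```
# Hedged example of the full installation sequence on a fresh Linux machine.
# The installer URL is an assumption - use the one matching your OS/arch.
wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
bash Miniconda2-latest-Linux-x86_64.sh
conda install -c conda conda-env
conda env create            # reads environment.yml in the repo root
source activate effect-env
```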
## Running the script to convert PostgreSQL data to CDR

1. Follow the Installation Instructions above to create the conda environment - Steps 1-3
2. Switch to the effect-env: `source activate effect-env`
3. Execute:

```
python postgresToCDR.py --host <postgreSQL hostname> --user <db username> --password <db password> \
    --database <databasename> --table <tablename> \
    --output <output filename> --team <Name of team providing data>
```
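For example, a hypothetical invocation (the hostname, credentials, database, table, and team name below are placeholders, not values from this project) might look like:

```
python postgresToCDR.py --host db.example.com --user dbuser --password secret \
    --database effectdb --table attacks \
    --output attacks-cdr.jl --team example-team
```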
## Running the script to convert CSV, JSON, XML, or CDR data into a format for Karma modeling

1. Follow the Installation Instructions above to create the conda environment - Steps 1-3
2. Switch to the effect-env: `source activate effect-env`
3. Execute:

```
python generateDataForKarmaModeling.py --input <input filename> --output <output filename> \
    --format <input format: csv/json/xml/cdr> --source <a name for the source> \
    --separator <column separator for CSV files>
```

Example invocations:

```
python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/nvd/sample/nvdcve-2.0-2003.xml \
    --output nvd.jl --format xml --source nvd

python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/hackmageddon/sample/hackmageddon_20160730.csv \
    --output hackmageddon.jl --format csv --source hackmageddon

python generateDataForKarmaModeling.py --input ~/github/effect/effect-data/hackmageddon/sample/hackmageddon_20160730.jl \
    --output hackmageddon.jl --format json --source hackmageddon
```
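For CSV sources that do not use a comma, pass the separator explicitly. The invocation below is a hypothetical sketch for a tab-separated file (the input path and source name are placeholders, and how the script interprets the separator argument should be verified against its help output):

```
# Hypothetical tab-separated input; path and source name are placeholders.
python generateDataForKarmaModeling.py --input data/example.tsv \
    --output example.jl --format csv --source example-source --separator $'\t'
```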
## Accessing Hue on AWS

- Login to AWS and create a tunnel: `ssh -L 8888:localhost:8888 hadoop@ec2-52-42-169-124.us-west-2.compute.amazonaws.com`
- Access Hue at http://localhost:8888
- See hiveQueries.sql for example queries (an illustrative query follows this list)
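hiveQueries.sql is the authoritative set of examples; purely as an illustration, a quick sanity check against the Hive table `cdr` used by the Karma workflow below (run from the Hue query editor, or via the hive CLI as sketched here) could be:

```
# Hedged example only - the real queries live in hiveQueries.sql.
hive -e "SELECT COUNT(*) FROM cdr;"
```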
## Running the Karma workflow on AWS

To build the python libraries required by the workflows:

- Edit `make.sh` and update the path to `dig-workflows`
- Run `./make.sh`. This will create `effect-env.zip`, which can be attached with the `--archives` option to the Spark workflow
- Copy the `effect-env.zip` file to AWS: `scp effect-env.zip hadoop@ec2-52-42-169-124.us-west-2.compute.amazonaws.com:/home/hadoop/effect-workflows/lib`
- Zip your Karma home folder into `karma.zip` and copy it to AWS: `scp karma.zip hadoop@ec2-52-42-169-124.us-west-2.compute.amazonaws.com:/home/hadoop/effect-workflows/`
- Build a shaded karma-spark jar and copy it to AWS:

```
cd karma-spark
mvn clean install -P shaded -Denv=hive
scp lib/karma-spark-0.0.1-SNAPSHOT-shaded.jar hadoop@ec2-52-42-169-124.us-west-2.compute.amazonaws.com:/home/hadoop/effect-workflows/lib
```

- Login to AWS and run the workflow using the script `run_karma_workflow.sh`. This will load data from the Hive table `cdr`, apply the Karma models to it, and save the output to HDFS (a hypothetical sketch of the underlying command follows this list).
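For orientation only, `run_karma_workflow.sh` presumably wraps a `spark-submit` call along these lines; the driver script name and the exact flags below are assumptions based on the artifacts built above, not the script's actual contents:

```
# Hypothetical sketch of what run_karma_workflow.sh might invoke;
# consult the actual script for the authoritative command and flags.
spark-submit --deploy-mode client \
    --archives /home/hadoop/effect-workflows/lib/effect-env.zip,/home/hadoop/effect-workflows/karma.zip \
    --jars /home/hadoop/effect-workflows/lib/karma-spark-0.0.1-SNAPSHOT-shaded.jar \
    /home/hadoop/effect-workflows/effectWorkflow.py   # placeholder driver script name
```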
## Loading the data into Elasticsearch

To load the data to ES:

- Create an index, say `effect-2`, with the mappings from https://raw.githubusercontent.com/usc-isi-i2/effect-alignment/master/es/es-mappings.json (a curl sketch follows this list)
- Run the Spark workflow to load data from HDFS into the `effect-2` index:

```
spark-submit --deploy-mode client \
    --executor-memory 5g \
    --driver-memory 5g \
    --jars "/home/hadoop/effect-workflows/jars/elasticsearch-hadoop-2.4.0.jar" \
    --py-files /home/hadoop/effect-workflows/lib/python-lib.zip \
    /home/hadoop/effect-workflows/effectWorkflow-es.py \
    --host 172.31.19.102 \
    --port 9200 \
    --index effect-2 \
    --doctype attack \
    --input hdfs://ip-172-31-19-102/user/effect/data/cdr-framed/attack
```

  This example loads the attack frame; it needs to be executed for all the available frames.

- Change the alias `effect` in ES to point to the new index `effect-2`:

```
POST _aliases
{
  "actions": [
    { "add":    { "index": "effect-2", "alias": "effect" } },
    { "remove": { "index": "effect-1", "alias": "effect" } }
  ]
}
```
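As referenced in the first step above, one way to create the index with those mappings is the sketch below; it assumes the mappings file is valid as an index-creation body and that the ES host matches the `--host` used by the workflow (adjust the curl flags for your ES version):

```
# Hedged sketch: create the effect-2 index from the published mappings file.
curl -O https://raw.githubusercontent.com/usc-isi-i2/effect-alignment/master/es/es-mappings.json
curl -XPUT 'http://172.31.19.102:9200/effect-2' --data-binary @es-mappings.json
```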
## Running the extractor workflow

- Follow the Installation Instructions to install conda and conda-env if you don't have them installed
- Create the effect environment: `conda env create`
- Switch to the environment: `source activate effect-env`
- Run `./make-extractor.sh`. This bundles up the entire environment, including the Python used to run the workflow
- If Spark is not installed in the default `/usr/lib/spark/`, change the paths in `run-extractor.sh` (see the sketch after this list)
- Run `./run-extractor.sh`
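The Spark-location change in `run-extractor.sh` is presumably a matter of editing a hardcoded path; the excerpt below is a hypothetical illustration (the variable name is an assumption, so edit whatever path the real script defines):

```
# Hypothetical excerpt: point the script at a non-default Spark install.
SPARK_HOME=/opt/spark                      # default assumed: /usr/lib/spark/
"$SPARK_HOME/bin/spark-submit" --version   # sanity check that the path resolves
```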
## Removing the conda environment

- To remove the environment, run `conda env remove -n effect-env`
- To see all environments, run `conda env list`