To start the cluster (HDFS + YARN + Spark), run the 'hdfs_yarn_init.sh' script from the phase1 folder.
Run the hdfs-start.sh script to prepare HDFS.
The hdfs-start.sh script creates a volume that exposes the contents of $HADOOP_HOME, so that NiFi can read the Hadoop configuration files. Inside /data you will find the contents of the $PWD/nifi folder.
Once the containers have started, run the initialize.sh script located in the /data/ folder.
DO NOT CLOSE THE CONTAINER'S BASH SESSION, OR THE CONTAINER WILL BE STOPPED.
Run the nifi-start.sh script from the local system. Copy the dataset file (d14_filtered.csv) into the /nifi/spooldir directory so that NiFi can start ingesting the data.
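The copy step can be wrapped in a small helper that fails early if the dataset or the spool directory is missing. This is only a sketch: the function name stage_dataset is hypothetical, and the paths are the ones mentioned above, so adjust them if the spool directory is mounted somewhere else on your machine.

```shell
# Hypothetical helper: copy the dataset into the NiFi spool directory.
# Returns non-zero (and prints a message) if either path is missing.
stage_dataset() {
    src=$1
    dest_dir=$2
    if [ ! -f "$src" ]; then
        echo "dataset not found: $src" >&2
        return 1
    fi
    if [ ! -d "$dest_dir" ]; then
        echo "spool directory not found: $dest_dir" >&2
        return 1
    fi
    cp "$src" "$dest_dir/"
}
```

For example: stage_dataset d14_filtered.csv /nifi/spooldir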
Start a Redis cluster (on localhost:6379).
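Before submitting the Spark job it can help to confirm that Redis is answering on the expected address. The sketch below assumes redis-cli is installed and on the PATH; the function name redis_ready and the REDIS_CLI override variable are hypothetical and exist only for this example.

```shell
# Command used to talk to Redis; override REDIS_CLI if redis-cli lives elsewhere.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# Hypothetical helper: succeeds only if Redis replies PONG to a PING.
redis_ready() {
    host=${1:-localhost}
    port=${2:-6379}
    reply=$("$REDIS_CLI" -h "$host" -p "$port" ping 2>/dev/null) || return 1
    [ "$reply" = "PONG" ]
}
```

For example: redis_ready localhost 6379 && echo "Redis is up"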
Now you are ready to submit the Spark job:
$SPARK_HOME/bin/spark-submit --class it.uniroma2.sabd.mjolnir.MjolnirSparkSession path-to/mjolnir-1.0-jar-with-dependencies.jar hdfs=localhost:54310 houseid=*houseID*
Because the final step of query3 is delegated to Redis, you can submit one execution per house.
Then, running:
ZRANGE mjolnir/results/query3/plugsrank 0 -1 WITHSCORES
you will get the ordered values for each plug, identified by houseID_householdID_plugID.
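The per-house submissions described above can be sketched as a loop around the spark-submit command. The helper names submit_house and submit_all, the JAR, HDFS and DRY_RUN variables, and the house IDs in the usage line are all assumptions made for illustration; the jar path is the same placeholder used in the command above. By default the sketch only prints the commands it would run; set DRY_RUN=0 to actually execute them.

```shell
# Placeholders taken from the spark-submit command above; override as needed.
JAR=${JAR:-path-to/mjolnir-1.0-jar-with-dependencies.jar}
HDFS=${HDFS:-localhost:54310}
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: submit (or print) one execution for a single house.
submit_house() {
    house_id=$1
    if [ "$DRY_RUN" = "1" ]; then
        echo "$SPARK_HOME/bin/spark-submit --class it.uniroma2.sabd.mjolnir.MjolnirSparkSession $JAR hdfs=$HDFS houseid=$house_id"
    else
        "$SPARK_HOME/bin/spark-submit" \
            --class it.uniroma2.sabd.mjolnir.MjolnirSparkSession \
            "$JAR" "hdfs=$HDFS" "houseid=$house_id"
    fi
}

# Hypothetical helper: one execution per house ID, stopping at the first failure.
submit_all() {
    for id in "$@"; do
        submit_house "$id" || return 1
    done
}
```

For example, submit_all 0 1 2 prints three spark-submit commands (the IDs are illustrative). Once the real runs have finished, the ZRANGE command above can be issued from the shell as: redis-cli ZRANGE mjolnir/results/query3/plugsrank 0 -1 WITHSCORES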