#Prediction for IoT Traffic Simulation

###Authors: Neeraj Asthana and Professor Robert Brunner

###Collaborators: Anchal, Rishabh, Vaishali

___

##Problem Statement
Thanks to recent advances in sensor technology, infrastructure for measuring temperature, pressure, humidity, precipitation, etc. is readily available on roadways and can help predict traffic states. The goal and novelty of our project is to combine streamed GPS traffic estimation algorithms with dynamic road sensor data to accurately predict traffic states and efficiently advise and direct drivers.

![alt text](car_road.png "Cars and Sensors")

##Cluster Details

All of these resources are on the NCSA ACX cluster. 

###Spark Details

Master: spark://10.0.3.70:7077

Version: 1.5.0 (the `spark-1.5.0-bin-hadoop2.6` build used with `spark-submit`)


###Kafka Details

Nodes (brokers): 141.142.236.172, 141.142.236.194

Port: 9092

Zookeeper: 10.0.3.130:2181,10.0.3.131:2181

###Cassandra Details

Table name: traffic

Schema:

traffic (id uuid, time_stamp timestamp, latitude decimal, longitude decimal, PRIMARY KEY (id, time_stamp));

____

Visual of all resources:

![alt text](Data Pipeline.png "Data Pipeline")

Details of the cluster:

![alt text](Cluster Details.png "Cluster Details")

##Time Decay Model for Speed Estimation

This model estimates average road speeds for specific road segments by weighting speed observations.

####Observations
* Higher speeds are more valuable than lower speeds
* Recent observations are favored over historical observations
* Each data point has the form ($t_i, v_i, l_i$), i.e. (time, velocity, location)


####Time Weight
An observation's timestamp ($t_i$) is weighted by: $w(i,t) = \frac{f(t_i - L)}{f(t - L)}$

*f* is some positive, monotonically non-decreasing function

*L* is some Landmark time (starting time)

*t* is the most recent timestamp
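
As a minimal sketch of the time weight, assume an exponential choice $f(x) = e^{\alpha x}$. The model above does not fix *f*, so this function, the rate `ALPHA`, and the helper name `time_weight` are illustrative assumptions only. With this choice an observation taken at the current time gets weight 1, and older observations get progressively less.

```python
import math

ALPHA = 0.01  # assumed decay rate per second (hypothetical value)

def f(x):
    """Assumed time function: positive and monotonically non-decreasing."""
    return math.exp(ALPHA * x)

def time_weight(t_i, t, landmark):
    """w(i, t) = f(t_i - L) / f(t - L)."""
    return f(t_i - landmark) / f(t - landmark)

landmark = 0.0   # L: starting time
now = 600.0      # t: most recent timestamp, in seconds after the landmark

w_now = time_weight(600.0, now, landmark)  # observation made at time t
w_old = time_weight(300.0, now, landmark)  # observation made 300 s earlier
print(w_now, w_old)
```

Any other positive, monotonically non-decreasing function (for example a polynomial in $t_i - L$) could be swapped in for `f` without changing `time_weight`.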

####Velocity Weight
An observation's velocity ($v_i$) is weighted by: $w^v (i) = g (v_i)$

*g* is some positive, monotonically non-decreasing function

####Combination of Weights
The combined weight is the product of the time and velocity weights:

$w^* (i,t) = w(i,t) \cdot w^v (i) = \frac{f(t_i - L)}{f(t - L)} \cdot g(v_i)$
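
The combined weight can be illustrated with a short sketch. Both functions here are assumptions chosen for illustration (an exponential `f` and a linear `g(v) = 1 + v`), and `combined_weight` is a hypothetical helper name. It shows that, for two observations with the same timestamp, the faster one receives the larger weight.

```python
import math

ALPHA = 0.01  # assumed time-decay rate (hypothetical value)

def f(x):
    # Assumed time function (exponential); not specified by the model
    return math.exp(ALPHA * x)

def g(v):
    # Assumed velocity function: linear and positive for v >= 0
    return 1.0 + v

def combined_weight(t_i, v_i, t, landmark=0.0):
    """w*(i, t) = f(t_i - L) / f(t - L) * g(v_i)."""
    return f(t_i - landmark) / f(t - landmark) * g(v_i)

# Two observations taken at the same time with different speeds
slow = combined_weight(t_i=500.0, v_i=10.0, t=600.0)
fast = combined_weight(t_i=500.0, v_i=30.0, t=600.0)
print(slow, fast)
```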

####Aggregating Observations
Calculates the average velocity of a road segment (*r*) by aggregating recent and historical observations. To compute these values efficiently and update them incrementally, I will persist two quantities in Spark, $X$ and $Y$, which are then used to calculate $\overline{V}(r) = \frac{X}{Y}$. The normalizing factor $f(t - L)$ is common to every weight and cancels in the ratio, so neither sum depends on the current time *t*. Both sums run over the *m* observations on the segment:

$$X = \sum_{i=1}^{m} f(t_i - L) \cdot g(v_i) \cdot v_i$$

$$Y = \sum_{i=1}^{m} f(t_i - L) \cdot g(v_i)$$

Visual of the Time Decay Model:

![alt text](Time Decay Visual.png "Time Decay")

In [2]:
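def f(x):
    # time function: positive, monotonically non-decreasing (not yet chosen)
    pass

def g(v):
    # velocity function: positive, monotonically non-decreasing (not yet chosen)
    pass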
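
One possible way to fill in these functions and maintain $X$ and $Y$ incrementally is sketched below in plain Python. The choices of `f` and `g`, the constant `ALPHA`, the segment key `"segment-1"`, and the helper names are all assumptions made for illustration, not the project's final definitions. Because each observation only adds one term to each sum, a new data point can be folded in without revisiting older ones.

```python
import math
from collections import defaultdict

ALPHA = 0.01  # assumed time-decay rate (hypothetical value)

def f(x):
    # Assumed time function (exponential)
    return math.exp(ALPHA * x)

def g(v):
    # Assumed velocity function (linear, positive for v >= 0)
    return 1.0 + v

def add_observation(state, segment, t_i, v_i, landmark=0.0):
    """Fold one (time, velocity) observation into the running (X, Y) of a segment."""
    w = f(t_i - landmark) * g(v_i)
    x, y = state[segment]
    state[segment] = (x + w * v_i, y + w)

def average_speed(state, segment):
    """Return X / Y for a segment, or None if it has no observations yet."""
    x, y = state[segment]
    return x / y if y > 0 else None

# Running sums per road segment, starting at (X, Y) = (0, 0)
state = defaultdict(lambda: (0.0, 0.0))
for t_i, v_i in [(0.0, 20.0), (300.0, 25.0), (600.0, 30.0)]:
    add_observation(state, "segment-1", t_i, v_i)

v_bar = average_speed(state, "segment-1")
print(v_bar)
```

In this example the estimate lands well above the unweighted mean of 25, since the most recent observation is also the fastest and both weights favor it.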

##Spark Kafka Integration

Steps:

1. Write the Kafka script to a file (using IPython's `%%writefile` magic)

2. Use `spark-submit` to submit the file, with the appropriate JARs, to the Spark cluster

In [3]:
%%writefile timedecaykafka.py

##Spark Kafka
import pyspark
from pyspark import SparkContext, SparkConf
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# create SparkContext
conf = SparkConf().setAppName("Kafka-Spark")
sc = SparkContext(conf=conf)

# create StreamingContext (updates every 30 seconds)
ssc = StreamingContext(sc, 30)

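# Kafka topic(s) to subscribe to and the broker list from the cluster details
topic = ["mytopic"]
brokers = "141.142.236.172:9092,141.142.236.194:9092"
directKafkaStream = KafkaUtils.createDirectStream(ssc, topic, {"metadata.broker.list": brokers})

# each Kafka record is a (key, value) pair; keep only the message value
lines = directKafkaStream.map(lambda x: x[1])
# count occurrences of each space-separated word within the batch
counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a+b)
counts.pprint()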

ssc.start()
ssc.awaitTermination()

Writing timedecaykafka.py


In [None]:
!../../../spark/spark-1.5.0-bin-hadoop2.6/bin/spark-submit --master spark://10.0.3.70:7077 --packages org.apache.spark:spark-streaming-kafka_2.10:1.5.0 timedecaykafka.py
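
The script above counts words in the incoming messages. As a hedged sketch of how the time decay sums might be kept up to date in the same job, the function below has the shape that PySpark's `DStream.updateStateByKey` expects: a list of new values for a key plus the previous state for that key. The message layout (`segment,timestamp,speed`), the helper names `parse_line` and `update_xy`, and the choices of `f`, `g`, `ALPHA`, and `LANDMARK` are all assumptions for illustration; the actual message format is not specified here.

```python
import math

ALPHA = 0.01     # assumed time-decay rate (hypothetical value)
LANDMARK = 0.0   # assumed landmark time L

def f(x):
    # Assumed time function (exponential)
    return math.exp(ALPHA * x)

def g(v):
    # Assumed velocity function (linear, positive for v >= 0)
    return 1.0 + v

def parse_line(line):
    """Parse 'segment,timestamp,speed' (assumed format) into (segment, (t_i, v_i))."""
    segment, t_i, v_i = line.split(",")
    return segment, (float(t_i), float(v_i))

def update_xy(new_obs, state):
    """Merge a batch of (t_i, v_i) observations into the running (X, Y) state."""
    x, y = state if state is not None else (0.0, 0.0)
    for t_i, v_i in new_obs:
        w = f(t_i - LANDMARK) * g(v_i)
        x += w * v_i
        y += w
    return (x, y)

# In the streaming job this might be wired up roughly as follows
# (stateful operations also need ssc.checkpoint(<directory>) to be set):
#   sums = lines.map(parse_line).updateStateByKey(update_xy)
#   sums.mapValues(lambda xy: xy[0] / xy[1]).pprint()

# Local check without Spark: two batches for the same segment
key, obs = parse_line("segment-1,600,30")
state = update_xy([obs], None)
state = update_xy([(600.0, 30.0)], state)
print(key, state[0] / state[1])
```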