### Import dependencies and create datastore connection 

In [1]:
import org.locationtech.jts.geom._
import org.apache.spark.sql.types._
import org.locationtech.geomesa.spark.jts._

// Read the "orbcomm" feature type from the GeoMesa HBase catalog into a DataFrame.
val dataFrame = (spark.read
  .format("geomesa")
  .option("hbase.zookeepers", "hbase.optix-ons-local:2181")
  .option("hbase.catalog", "ons-historical")
  .option("geomesa.feature", "orbcomm")
  .load())
dataFrame.createOrReplaceTempView("ais")

Starting Spark application


ID,YARN Application ID,Kind,State,Spark UI,Driver log,Current session?
337,application_1580146381176_0338,spark,idle,Link,Link,✔


SparkSession available as 'spark'.
import org.locationtech.jts.geom._
import org.apache.spark.sql.types._
import org.locationtech.geomesa.spark.jts._
dataFrame: org.apache.spark.sql.DataFrame = [__fid__: string, mmsi: string ... 54 more fields]


### Create a temp view with all observations, for the week of 2018-11-13 to 2018-11-20, of ships within 100km of Dubai. Calculate the distance from each ship to the Dubai port (in degrees, as returned by st_distance).

In [2]:
%%sql
create or replace temp view travel_log as (
    select 
        mmsi, 
        dtg, 
        geom,
        st_distance(geom, st_makePoint(55.31, 25.26)) as distance
    from ais
    where st_contains(st_bufferPoint(st_makePoint(55.31, 25.26), 100000), geom) 
        and dtg > cast('2018-11-13' as timestamp) 
        and dtg < cast('2018-11-20' as timestamp)
    order by
        mmsi,
        dtg
)
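
Note that `st_bufferPoint` takes its radius in meters, while `st_distance` on lon/lat geometries returns a planar distance in degrees, not kilometres. The sketch below is a rough sanity check of what half a degree means near the port. It uses a plain haversine formula with a mean Earth radius; the `DegreesToKm` object, its member names, and the radius constant are assumptions of this example and not part of GeoMesa. At this latitude, 0.5 degrees comes out at about 50 km east-west and about 56 km north-south.

```scala
object DegreesToKm {
  // Mean Earth radius in km (spherical approximation; an assumption of this sketch).
  val EarthRadiusKm = 6371.0088

  /** Great-circle distance in km between two lon/lat points (haversine formula). */
  def haversineKm(lon1: Double, lat1: Double, lon2: Double, lat2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.asin(math.sqrt(a))
  }

  // Reference point used for the Dubai port in the query above.
  val portLon = 55.31
  val portLat = 25.26

  // Length in km of 0.5 degrees due east and due north of the port.
  val halfDegreeEastKm = haversineKm(portLon, portLat, portLon + 0.5, portLat)
  val halfDegreeNorthKm = haversineKm(portLon, portLat, portLon, portLat + 0.5)
}
```

If distances in meters are needed directly in SQL, GeoMesa's Spark SQL function list also documents `st_distanceSphere`; whether it is available depends on the GeoMesa version in use.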


#### What the temp view looks like now

In [3]:
spark.sql("""
select * from travel_log order by distance desc
""").show()

+---------+-------------------+--------------------+------------------+
|     mmsi|                dtg|                geom|          distance|
+---------+-------------------+--------------------+------------------+
|354171000|2018-11-15 18:49:36|POINT (55.9450683...| 0.994242896951686|
|354171000|2018-11-15 20:49:57|POINT (55.9453733...|0.9942403103444776|
|354171000|2018-11-15 19:07:49|POINT (55.94515 2...| 0.994109142236288|
|354171000|2018-11-15 19:08:18|POINT (55.9451416...|0.9941089465348132|
|354171000|2018-11-17 00:34:29|POINT (55.945425 ...| 0.994077212545771|
|375128000|2018-11-14 18:00:14|POINT (54.361235 ...|0.9940395976142009|
|538004025|2018-11-15 13:13:11|POINT (54.8778666...|0.9940313807700221|
|576215000|2018-11-14 18:23:26|POINT (54.3609983...|0.9939780639235012|
|354171000|2018-11-16 10:58:38|POINT (55.945225 ...|0.9939609192658294|
|354171000|2018-11-16 16:34:47|POINT (55.9451233...|0.9939574850132558|
|376668000|2018-11-16 23:14:09|POINT (54.3461499...|0.9939366742

### If we treat coming within 50km of the port (roughly 0.5 degrees in the distance column) as 'arriving', we can count ships that have moved from outside that radius to inside it and disregard passing ships.

In [4]:
spark.sql("""
select count(*) 
from (
    select 
        *, 
        lag(distance, 1) over (order by mmsi, dtg) as last_distance
    from travel_log
)
where 
    last_distance > .5 
    and distance < .5
""").show()

+--------+
|count(1)|
+--------+
|     647|
+--------+
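
One caveat about the window above: `over (order by mmsi, dtg)` is a single global window, so the first observation of each ship is compared with the last observation of the previous ship. The count may therefore include transitions that no single ship actually made. A per-ship window, `lag(distance, 1) over (partition by mmsi order by dtg)`, avoids this. The sketch below illustrates the difference in plain Scala on a tiny, hypothetical in-memory log; the `ArrivalCount` object, the `Obs` case class, and the sample rows are invented for illustration and are not taken from the dataset.

```scala
object ArrivalCount {
  // One observation: ship id, timestamp string (sorts lexicographically), distance in degrees.
  final case class Obs(mmsi: String, dtg: String, distance: Double)

  // Same cutoff as the query above.
  val Threshold = 0.5

  // Hypothetical data: ship A stays far out, ship B is already close when first seen.
  val log = Seq(
    Obs("A", "2018-11-14 10:00:00", 0.9),
    Obs("A", "2018-11-14 11:00:00", 0.8),
    Obs("B", "2018-11-14 09:00:00", 0.2),
    Obs("B", "2018-11-14 10:00:00", 0.1)
  )

  /** Counts outside-to-inside transitions between consecutive rows of an ordered sequence. */
  def crossings(ordered: Seq[Obs]): Int =
    ordered.sliding(2).count {
      case Seq(prev, cur) => prev.distance > Threshold && cur.distance < Threshold
      case _              => false // fewer than two observations: nothing to compare
    }

  // Single global window, mirroring `over (order by mmsi, dtg)`.
  val globalCount = crossings(log.sortBy(o => (o.mmsi, o.dtg)))

  // One window per ship, mirroring `over (partition by mmsi order by dtg)`.
  val perShipCount = log.groupBy(_.mmsi).values
    .map(obs => crossings(obs.sortBy(_.dtg)))
    .sum
}
```

Here the global window reports one arrival (the jump from ship A's last row to ship B's first row), while the per-ship windows report none, since neither ship crossed the threshold itself.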

