# Final Project

Author: Andy Sollish

Date: December 8, 2017

Platform: Jupyter Notebook, Python 3 (Windows 10)

## Project Task

The goal of the project was to create a means of identifying co-occurrence among vessels.  We define a co-occurrence as an event in which two or more vessels appear in the same location at the same time.  The size of the location and the length of the time window are chosen arbitrarily in this case, but can be adjusted as needed. Our starting point for finding co-occurrences was limited to a single vessel.  That is, given a vessel of interest, find all the locations where the vessel reported its position and then find all **other** vessels that reported their position at the same place and time.

We used data generated by the Automatic Identification System (AIS), which is a transponder-based system for vessels in the maritime domain.  Vessels equipped with VHF transceivers transmit their position (and several other pieces of data) to shore and satellite stations.  AIS was initially designed as a collision-avoidance system, so the position of a vessel can also be seen by other vessels equipped with AIS.

Because the data set can be very large, looking for co-occurrences could require many comparisons between locations and times.  To expedite the process, we create a space-time-box for each of the AIS records.  In short, a space-time-box works like a hash key that combines the location of the vessel with the date it was in that location.  It permits very fast lookup and avoids having to do lots of comparisons.  The process is explained in more detail below.

Lastly, we wanted to create a simple visualization that would allow the user to see the locations where the vessel of interest appeared and the other vessels that had co-occurrence events there. 

## Key Terms 

AIS - Automatic Identification System (a transponder-based system for reporting vessel positions and identities) <br />
MMSI - Maritime Mobile Service Identity (a unique nine-digit transponder code for a vessel)<br />
IMO - International Maritime Organization (here, the unique seven-digit IMO number associated with a vessel hull)<br />
VOI - vessel of interest <br />
Space-time-box - an identifier for a vessel's location in space and time, with adjustable spatial and temporal resolution.

Import the necessary Python modules.  We'll be using pymongo to interact with MongoDB, pygeohash to convert latitude/longitude pairs to geohash strings, and folium to visually display our data.

In [3]:
import pymongo
import json
import pprint as p
import pygeohash as pgh
import time
import folium

The code below is necessary to work with MongoDB.  A MongoDB instance has to be started before the code is executed. After the instance is started, the CSV-formatted AIS data must be loaded into the database using the prescribed MongoDB commands.  It's important to note that raw AIS data is not formatted to work with the project code.  The raw AIS data first has to be converted to a more usable format (CSV) with the correct headers.  If this were an ongoing process in which new AIS data is continually added, we would write another application, or at least a codebook, for converting raw AIS data into the right format for inclusion in MongoDB.

In [4]:
client = pymongo.MongoClient()
db = client.AIS_data_test
collection = db.summaries
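
The CSV preparation step described above could be sketched roughly as follows. This is only an illustration: the column names `mmsi`, `latitude`, `longitude`, and `time` match the fields the rest of this notebook reads, but the helper `rowsToDocuments` and the sample rows are hypothetical, and a real AIS export may carry more columns.

```python
import csv
import io

def rowsToDocuments(csvText):
    '''Parse CSV text with a header row into a list of dicts with numeric fields.
    Hypothetical helper; the column names are assumptions.'''
    documents = []
    for row in csv.DictReader(io.StringIO(csvText)):
        documents.append({
            'mmsi': int(row['mmsi']),
            'latitude': float(row['latitude']),
            'longitude': float(row['longitude']),
            'time': int(row['time']),           # epoch seconds
        })
    return documents

# made-up sample rows, for illustration only
sample = (
    'mmsi,latitude,longitude,time\n'
    '111111111,4.10,109.60,1418256000\n'
    '222222222,4.12,109.61,1418259600\n'
)

docs = rowsToDocuments(sample)
print(docs[0])
```

A list like `docs` could then be handed to pymongo's `collection.insert_many(docs)` as one way of loading the data.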

AIS data includes a date/time value in the form of an epoch number, also known as Unix time.  This is a numerical value that encodes the number of seconds that have elapsed since January 1, 1970.  

The function below converts the epoch number to a more usable format.  In its current form, it converts the epoch time to a year, month, day string.  For the purpose of our analysis, we're only interested in vessels that have appeared in the same place sometime during the same day.  If we wanted a more fine-grained analysis, we could alter the function to return a string with the date *and* time. For example, if we wanted to find vessels that appeared near each other during the same hour, we would merely append the hour value to the string that contains the year, month, and day.

What we produce with the function is not a hash in the true sense of the word, since multiple vessels will have the same date/time string.  However, it will provide us with a very fast lookup capability when doing comparisons.  We no longer have to compare dates and times; we merely grab all vessels that have a specific date/time string.

In [5]:
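def convertTime(date):
    ''' Convert an epoch time to a zero-padded year, month, day string.
    Input: a single epoch date (seconds since January 1, 1970 UTC)
    Output: a single string in the form '20171113' '''
    
    # strftime zero-pads the month and day, so every date maps to exactly eight
    # characters (the unpadded form '2017111' could mean January 11 or November 1)
    dateHash = time.strftime('%Y%m%d', time.gmtime(int(date)))
    
    return dateHash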
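
If hour-level co-occurrence were wanted, the date string could be extended as described above. The following is a minimal sketch, not part of the original analysis; `convertTimeHourly` is a hypothetical name.

```python
import calendar
import time

def convertTimeHourly(date):
    '''Convert an epoch time to a zero-padded 'YYYYMMDDHH' string (UTC).'''
    return time.strftime('%Y%m%d%H', time.gmtime(int(date)))

# build an example epoch value for 2017-11-13 14:30:00 UTC
exampleEpoch = calendar.timegm((2017, 11, 13, 14, 30, 0))
hourHash = convertTimeHourly(exampleEpoch)
print(hourHash)
```

Because the hour is appended at the end, the day-level string remains a prefix of the hour-level string.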

## Create Space-time-boxes

The code below **has to be run only once** in order to add a value to each document in the MongoDB collection.  The added value is a 'space-time-box'.  It's essentially a mashup of a 5-character location hash (e.g. w8th9) with a date hash (e.g. 20171113).  The final product would look something like this: 'w8th920171113'. That means every AIS record includes a space-time-box.  We'll use these space-time-boxes to do very fast and efficient lookups in order to grab information of interest.

Using pygeohash ("pgh"), we convert the latitude and longitude contained in each AIS record to a sort of hash value (a geohash) as mentioned above.  We've set a precision of 5, which corresponds to a cell roughly 4.9 km on a side at the equator, but this can be adjusted to higher or lower values.  Higher values give a more specific location (a smaller cell), and lower values give a coarser one.

If more AIS data is added to the MongoDB, you would need to run those AIS records through the code below in order to create space-time-boxes for the new records.

In [None]:
for vessel in collection.find():
    lat = vessel['latitude']
    lon = vessel['longitude']
    docId = vessel['_id']                                        # avoid shadowing the built-in id()
    dateHash = convertTime(vessel['time'])                       # convert date/time to a sort of hash value
    locationHash = pgh.encode(lat, lon, precision=5)             # convert location to a sort of hash value
    spaceTimeBox = str(locationHash) + str(dateHash)             # combine the location and date/time hash values
    collection.update_one({'_id': docId}, {'$set': {'spaceTime': spaceTimeBox}})
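
To make the "sort of hash" idea concrete, here is a standalone sketch of how a geohash and a space-time-box are built. The notebook itself relies on `pgh.encode`; the `geohashEncode` function below is a simplified re-implementation of the standard geohash algorithm, written only for illustration, and the date string is passed in directly rather than derived from an epoch value.

```python
BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz'    # geohash alphabet (no a, i, l, o)

def geohashEncode(lat, lon, precision=5):
    '''Encode a latitude/longitude pair as a geohash string.'''
    latRange = [-90.0, 90.0]
    lonRange = [-180.0, 180.0]
    chars = []
    bits = 0
    bitCount = 0
    useLon = True                              # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lonRange, lon) if useLon else (latRange, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid                       # keep the upper half of the range
        else:
            bits = bits << 1
            rng[1] = mid                       # keep the lower half of the range
        useLon = not useLon
        bitCount += 1
        if bitCount == 5:                      # every 5 bits become one character
            chars.append(BASE32[bits])
            bits = 0
            bitCount = 0
    return ''.join(chars)

# first position from the results listed later in this notebook
lat, lon = 4.141513, 109.640753
box = geohashEncode(lat, lon, precision=5) + '20141211'
print(box)
print(geohashEncode(lat, lon, precision=3))    # a coarser cell is a prefix of the finer one
```

Each halving step narrows the cell, which is why a higher precision gives a more specific location and why nearby positions tend to share the same leading characters.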

## Vessel of Interest

The code below allows you to enter an MMSI number of interest.  Once it is entered, the code extracts all space-time-boxes for the MMSI of interest and adds them to a list.  We do this so that we have a data structure (list) that contains all the locations/times our vessel of interest self-reported.  Later we'll iterate through this list of space-time-boxes and find **other** vessels that were located in the same space-time-box, in effect identifying co-occurrence events with our vessel of interest.

In [7]:
mmsi = 636016072

voiBoxes = []
for voi in collection.find({'mmsi': mmsi}):
    voiBoxes.append(voi['spaceTime'])

## Find Co-occurrences

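The code below does a quick lookup of documents (AIS records) using the previously created space-time-boxes.  When matches are found, we add the vessel's MMSI and location data to a data structure (list).  Note that the matches include the vessel of interest's own records. Also, because `voiBoxes` holds one entry per report, a box the vessel reported from several times is looked up, and its matches listed, once per report.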

In [8]:
plotVessels = []
for st in voiBoxes:                                      # iterate through our vessel of interest space-time-boxes
    for match in collection.find({'spaceTime': st}):     # find other vessels that have same space-time-box
        plotVessels.append([match['mmsi'], match['latitude'], match['longitude'], match['spaceTime']])

for item in plotVessels:                                 # this gives us a quick sanity check to make sure we 
    print(item)                                          # grabbed the right matches

[636016072, 4.141513, 109.640753, 'w2tzw20141211']
[352108000, 4.169983, 109.6041, 'w2tzw20141211']
[636016072, 4.458592, 110.057035, 'w2y3220141211']
[636016072, 7.000632, 114.412925, 'w91rg20141212']
[636016072, 7.270563, 115.003885, 'w93c220141212']
[636016072, 7.270435, 115.003495, 'w93c220141212']
[636016072, 7.270563, 115.003885, 'w93c220141212']
[636016072, 7.270435, 115.003495, 'w93c220141212']
[636016072, 8.93532, 116.728415, 'w9e4b20141213']
[636016072, 38.577992, 120.992077, 'wwy5320141218']
[636016072, 38.576733, 120.994523, 'wwy5320141218']
[636016072, 38.576963, 120.994078, 'wwy5320141218']
[636016072, 38.577992, 120.992077, 'wwy5320141218']
[636016072, 38.576733, 120.994523, 'wwy5320141218']
[636016072, 38.576963, 120.994078, 'wwy5320141218']
[636016072, 38.577992, 120.992077, 'wwy5320141218']
[636016072, 38.576733, 120.994523, 'wwy5320141218']
[636016072, 38.576963, 120.994078, 'wwy5320141218']
[636016072, 38.87727, 118.289502, 'wwuj520141219']
[229743000, 38.847663, 11
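
The same lookup can be sketched in plain Python, which also shows one way to skip repeated space-time-boxes and the vessel of interest's own reports. This is an illustrative sketch under stated assumptions rather than the notebook's method: the records are a few rows copied from the output above, and `findCoOccurrences` is a hypothetical helper.

```python
from collections import defaultdict

# (mmsi, latitude, longitude, spaceTime) rows taken from the output above
records = [
    (636016072, 4.141513, 109.640753, 'w2tzw20141211'),
    (352108000, 4.169983, 109.6041, 'w2tzw20141211'),
    (636016072, 4.458592, 110.057035, 'w2y3220141211'),
    (636016072, 7.270563, 115.003885, 'w93c220141212'),
    (636016072, 7.270435, 115.003495, 'w93c220141212'),
]

def findCoOccurrences(records, voiMmsi):
    '''Map each space-time-box of the vessel of interest to the set of other MMSIs seen there.'''
    byBox = defaultdict(set)                    # space-time-box -> MMSIs reported in it
    for mmsi, lat, lon, box in records:
        byBox[box].add(mmsi)
    # a set keeps each of the vessel of interest's boxes only once
    voiBoxes = {box for mmsi, lat, lon, box in records if mmsi == voiMmsi}
    result = {}
    for box in voiBoxes:
        others = byBox[box] - {voiMmsi}         # drop the vessel of interest itself
        if others:
            result[box] = others
    return result

matches = findCoOccurrences(records, 636016072)
print(matches)
```

The dictionary plays the role of an index here. In MongoDB, a comparable speed-up for `find({'spaceTime': st})` would usually come from an index on the `spaceTime` field, for example via pymongo's `collection.create_index('spaceTime')`; the notebook does not show whether one was created.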

## Visualization

We use folium to depict all the locations where our vessel of interest appeared (red markers) and all the locations where **other** vessels appeared (green markers).  When you click on a green marker, you are presented with the MMSI number of the vessel that appeared there.  **If more information is wanted after clicking on a green marker, it can easily be added to the popup.**

The map can also be adjusted to different styles and scales using folium's built-in features.  The map we produced below allows a user to zoom in and out of areas of interest.

In [21]:
loi = [plotVessels[0][1], plotVessels[0][2]]                              # give a starting point for initializing map

m = folium.Map(location=loi,zoom_start=3, control_scale=True)             # initialize map 

for record in plotVessels:
    mmsi_num = str(record[0])
    if record[0] == mmsi:
        folium.Marker([record[1], record[2]], popup='VOI', icon=folium.Icon(color='red')).add_to(m)
    else:
        folium.Marker([record[1], record[2]], popup=mmsi_num, icon=folium.Icon(color='green')).add_to(m)

m                                                                          # display the map
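
As one example of adding more information to the popup, the text could be assembled from several fields of a record. The sketch below is an assumption about what might be useful, not part of the original notebook; `popupText` is a hypothetical helper, and it relies on the record layout `[mmsi, latitude, longitude, spaceTime]` used above, with an eight-character date at the end of the space-time-box.

```python
def popupText(record):
    '''Build an HTML popup string from a [mmsi, latitude, longitude, spaceTime] record.'''
    mmsiNum, lat, lon, box = record
    dateHash = box[-8:]                         # last 8 characters are assumed to be YYYYMMDD
    date = dateHash[0:4] + '-' + dateHash[4:6] + '-' + dateHash[6:8]
    return 'MMSI: {}<br>Date: {}<br>Lat/Lon: {:.4f}, {:.4f}'.format(mmsiNum, date, lat, lon)

# a row taken from the co-occurrence output above
example = [352108000, 4.169983, 109.6041, 'w2tzw20141211']
text = popupText(example)
print(text)
```

The resulting string could be passed as the `popup` argument in place of `mmsi_num` when creating the green markers.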

## Discussion

This approach works well for quick analysis.  The space-time-boxes replace repeated comparisons of positions and times with simple string lookups, which saves quite a bit of time when speed is important.  However, some upfront time is still needed to convert each AIS record's lat/long/time to a space-time-box ('hash' string).  There is also the data engineering side of things, where AIS data has to be converted to CSV and headers added before the data is loaded into MongoDB.

MongoDB was a good fit for this project.  It was very convenient to upload and access the data in MongoDB.  The various attributes of an AIS record lend themselves to the flexible storage paradigm of MongoDB.  Because we're working with historical AIS data, we are not concerned with the velocity of incoming data.  We're also not concerned with variety, since all the AIS records contain the same fields.

This process could be extended in several ways.  AIS data comes with more information than we used here; including it in the visualization could be valuable, depending on the research question.  It would also be useful to be able to search for more than one vessel of interest at a time.