## Homework 7 - Visual Analytics
#### Team Members: Anmol Singh Suag and Sanuj Bhatia
#### VAST Challenge Selected - VAST 2015 (MC1)
We worked on a data set similar to the one provided in MC1 of VAST 2015 and applied the same clustering technique to it.

In [2]:
#Importing commonly used ML Libraries
from bokeh.io import output_notebook, show
from bokeh.layouts import column, row, widgetbox
from bokeh.plotting import figure
from bokeh.models import HoverTool, ColumnDataSource, LabelSet, CustomJS, Slider, Range1d
from bokeh.models.widgets import Select, Panel, Tabs
import pandas as pd
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit
import math
import copy
import warnings
warnings.filterwarnings('ignore')
from sklearn.cluster import KMeans, DBSCAN
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold   #For K-fold cross validation
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics
from random import randint
import pickle

In [3]:
output_notebook()

### Data Set
1. We have GPS data of people who enter an amusement park called DinoFunWorld. They have an app installed on their phones which provides a unique ID for each visitor and tracks their location around the park at 1-second intervals.
2. These locations are either of type 'movement', meaning the visitor moved to a new X, Y coordinate in the park, or of type 'check-in', meaning the visitor has turned up at a ride or restaurant or any other attraction and has checked in to it. While checked-in, they are not tracked, and tracking begins again when they start moving.
3. The data set contains the timestamp, unique ID, type of log, and X and Y coordinates of the person. There are three files, one for each day, Friday through Sunday. The combined data is around 26 million rows. A glimpse of the data is shown below.
4. We try to find groups of people who came together and roamed the park together, under the assumption that their check-in behaviours will be similar. To this end, we create a feature matrix for each day in which each row is a unique visitor and each column holds the check-in count for one of the locations that offers check-ins.
5. We then run K-Means clustering on the feature matrix, with the number of clusters chosen through trial and error.
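
The feature-matrix construction described in step 4 can be sketched on a toy log as follows. This is a minimal, standard-library illustration rather than the notebook's exact code: the sample records, the tuple layout, and the helper name `build_feature_matrix` are all hypothetical, and locations are keyed as `X_Y` strings in the same way as in the cells below.

```python
from collections import defaultdict

def build_feature_matrix(records):
    """Count check-ins per visitor per location.

    records: iterable of (visitor_id, log_type, x, y) tuples (assumed layout).
    Returns (locations, matrix), where matrix[visitor_id] is a list of counts
    aligned with the sorted list of check-in locations.
    """
    # Keep only check-in rows and build an "X_Y" key for each location.
    checkins = [(vid, f"{x}_{y}") for vid, log_type, x, y in records
                if log_type == "check-in"]
    locations = sorted({loc for _, loc in checkins})
    column = {loc: i for i, loc in enumerate(locations)}  # constant-time column lookup
    matrix = defaultdict(lambda: [0] * len(locations))
    for vid, loc in checkins:
        matrix[vid][column[loc]] += 1
    return locations, dict(matrix)

# Hypothetical sample: two visitors, two check-in locations, one movement row.
sample = [
    (1, "check-in", 63, 99),
    (1, "movement", 62, 98),
    (1, "check-in", 0, 67),
    (2, "check-in", 63, 99),
    (2, "check-in", 63, 99),
]
locations, matrix = build_feature_matrix(sample)
print(locations)
print(matrix)
```

Looking up the column through a dictionary instead of `list.index` avoids a linear scan per row, which may matter when the check-in log has millions of rows.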

In [4]:
df_fri = pd.read_csv('park-movement-Fri-FIXED-2.0.csv',infer_datetime_format=True, sep = ',',header=0)
df_sat = pd.read_csv('park-movement-Sat.csv',infer_datetime_format=True, sep = ',',header=0)
df_sun = pd.read_csv('park-movement-Sun.csv',infer_datetime_format=True, sep = ',',header=0)

In [5]:
# Stack the three days into one frame; pd.concat replaces the deprecated DataFrame.append
df = pd.concat([df_fri, df_sat, df_sun])
print ("Number of Rows:",len(df))
df.head()


Number of Rows: 26021963


Unnamed: 0,Timestamp,id,type,X,Y
0,2014-6-06 08:00:16,1591741,check-in,63,99
1,2014-6-06 08:00:16,825652,check-in,63,99
2,2014-6-06 08:00:19,179386,check-in,63,99
3,2014-6-06 08:00:19,531348,check-in,63,99
4,2014-6-06 08:00:31,1483004,check-in,0,67


In [6]:
#Finding all Check-In locations for the 3 days 
check_in_ids = {}
df_checkin=df.loc[df['type'] == 'check-in']

for i in range(len(df_checkin)):
    check_in_ids[str(int(df_checkin.iloc[i,3])) + '_' + str(int(df_checkin.iloc[i,4]))] = True
    
print("Unique Locations in the Park:", len(check_in_ids))

Unique Locations in the Park: 42


In [7]:
movement_ids = {}
df_movement=df.loc[df['type'] == 'movement']

for i in range(200000):
    movement_ids[str(int(df_movement.iloc[i,3])) + '_' + str(int(df_movement.iloc[i,4]))] = True
    
print("Unique Movement Locations in the Park:", len(movement_ids))

Unique Movement Locations in the Park: 1259


In [8]:
check_in_ids = list(check_in_ids)
df_checkin_fri = df_fri.loc[df_fri['type'] == 'check-in']
df_checkin_sat = df_sat.loc[df_sat['type'] == 'check-in']
df_checkin_sun = df_sun.loc[df_sun['type'] == 'check-in']
feature_matrix_fri = {}
feature_matrix_sat = {}
feature_matrix_sun = {}

for i in range(len(df_checkin_fri)):
    userId = df_checkin_fri.iloc[i, 1]
    index = check_in_ids.index(str(int(df_checkin_fri.iloc[i,3])) + '_' + str(int(df_checkin_fri.iloc[i,4])))
    if userId not in feature_matrix_fri:
        feature_matrix_fri[userId] = np.zeros((len(check_in_ids),), dtype=int)  # np.int was removed from NumPy
    
    feature_matrix_fri[userId][index]+=1
    
for i in range(len(df_checkin_sat)):
    userId = df_checkin_sat.iloc[i, 1]
    index = check_in_ids.index(str(int(df_checkin_sat.iloc[i,3])) + '_' + str(int(df_checkin_sat.iloc[i,4])))
    if userId not in feature_matrix_sat:
        feature_matrix_sat[userId] = np.zeros((len(check_in_ids),), dtype=int)  # np.int was removed from NumPy
    
    feature_matrix_sat[userId][index]+=1
    
for i in range(len(df_checkin_sun)):
    userId = df_checkin_sun.iloc[i, 1]
    index = check_in_ids.index(str(int(df_checkin_sun.iloc[i,3])) + '_' + str(int(df_checkin_sun.iloc[i,4])))
    if userId not in feature_matrix_sun:
        feature_matrix_sun[userId] = np.zeros((len(check_in_ids),), dtype=int)  # np.int was removed from NumPy
    
    feature_matrix_sun[userId][index]+=1


In [9]:
print("Number of Unique Users in the Park on Friday: ",len(feature_matrix_fri))
print("Number of Unique Users in the Park on Saturday: ",len(feature_matrix_sat))
print("Number of Unique Users in the Park on Sunday: ",len(feature_matrix_sun))

Number of Unique Users in the Park on Friday:  3557
Number of Unique Users in the Park on Saturday:  6410
Number of Unique Users in the Park on Sunday:  7650


### Figure 1 - Map of DinoFunWorld
We have plotted the check-in locations around the amusement park, as well as the movement paths that are traversable and trackable. There are 81 locations in total around the park, but some of them do not implement check-ins, so those are not drawn here. <br>
This map helps us visualize the movements of visitors, the connectivity in the park and check-in preferences in a later visualization.

In [10]:
# Figure 1: Visualising Locations and paths in the Park
loc_x = []
loc_y = []
mov_x = []
mov_y = []

for location in check_in_ids:
    loc = location.split('_')
    # Convert the "X_Y" key back to integers so the points land on a numeric axis
    loc_x.append(int(loc[0]))
    loc_y.append(int(loc[1]))

for location in movement_ids:
    loc = location.split('_')
    mov_x.append(int(loc[0]))
    mov_y.append(int(loc[1]))

fig1=figure(title='Dino World Map', plot_height = 600,plot_width=600)
fig1_c=fig1.annulus(x=loc_x,y=loc_y,color="#E74C3C",inner_radius=0.5, outer_radius=1,legend="Ride")
fig1_c2=fig1.circle(x=mov_x,y=mov_y,color="green",size=2,legend = "Path")

fig1.background_fill_alpha = 0.4
fig1.ygrid.grid_line_alpha = 0.8
fig1.ygrid.grid_line_dash = [5, 3]
fig1.xgrid.grid_line_alpha = 0.8
fig1.xgrid.grid_line_dash = [5, 3]

fig1.legend.click_policy="hide"

show(fig1)
    

In [11]:
user_checkIn_matrix_fri = []
user_checkIn_matrix_sat = []
user_checkIn_matrix_sun = []

userIds_fri = []
userIds_sat = []
userIds_sun = []

for key,val in feature_matrix_fri.items():
    user_checkIn_matrix_fri.append(val)
    userIds_fri.append(key)
for key,val in feature_matrix_sat.items():
    user_checkIn_matrix_sat.append(val)
    userIds_sat.append(key)
for key,val in feature_matrix_sun.items():
    user_checkIn_matrix_sun.append(val)
    userIds_sun.append(key)

In [12]:
kmeans_fri = KMeans(n_clusters=1500, n_jobs=-3, n_init=5).fit(user_checkIn_matrix_fri)
kmeans_groups_fri= kmeans_fri.labels_

kmeans_sat = KMeans(n_clusters=2000, n_jobs=-3, n_init=5).fit(user_checkIn_matrix_sat)
kmeans_groups_sat= kmeans_sat.labels_

kmeans_sun = KMeans(n_clusters=2000, n_jobs=-3, n_init=5).fit(user_checkIn_matrix_sun) 
kmeans_groups_sun= kmeans_sun.labels_

In [13]:
groups_fri = {}
groups_sat = {}
groups_sun = {}


group_sizes_fri = {}
group_sizes_sat = {}
group_sizes_sun = {}
group_sizes = {}


group_sizes_users_fri = {}
group_sizes_users_sat = {}
group_sizes_users_sun = {}
group_sizes_users = {}

group_size_fri = []
group_count_fri = []
group_size_sat = []
group_count_sat = []
group_size_sun = []
group_count_sun = []
group_size= []
group_count= []


for i in range(len(kmeans_groups_fri)):
    if kmeans_groups_fri[i] in groups_fri:
        groups_fri[kmeans_groups_fri[i]].append(userIds_fri[i])
    else:
        groups_fri[kmeans_groups_fri[i]] = [userIds_fri[i]]

        
for i in range(len(kmeans_groups_sat)):
    if kmeans_groups_sat[i] in groups_sat:
        groups_sat[kmeans_groups_sat[i]].append(userIds_sat[i])
    else:
        groups_sat[kmeans_groups_sat[i]] = [userIds_sat[i]]
        
        
        
for i in range(len(kmeans_groups_sun)):
    if kmeans_groups_sun[i] in groups_sun:
        groups_sun[kmeans_groups_sun[i]].append(userIds_sun[i])
    else:
        groups_sun[kmeans_groups_sun[i]] = [userIds_sun[i]]
        
        

for key, val in groups_fri.items():
    size = len(val)
    # Count how many groups have this size (per day and overall)
    group_sizes_fri[size] = group_sizes_fri.get(size, 0) + 1
    group_sizes[size] = group_sizes.get(size, 0) + 1
    # Extend fresh lists so the per-size user lists never alias groups_fri[key]
    group_sizes_users_fri.setdefault(size, []).extend(val)
    group_sizes_users.setdefault(size, []).extend(val)
        
for key, val in groups_sat.items():
    size = len(val)
    # Count how many groups have this size (per day and overall)
    group_sizes_sat[size] = group_sizes_sat.get(size, 0) + 1
    group_sizes[size] = group_sizes.get(size, 0) + 1
    # Extend fresh lists so the per-size user lists never alias groups_sat[key]
    group_sizes_users_sat.setdefault(size, []).extend(val)
    group_sizes_users.setdefault(size, []).extend(val)
        
        
for key, val in groups_sun.items():
    size = len(val)
    # Count how many groups have this size (per day and overall)
    group_sizes_sun[size] = group_sizes_sun.get(size, 0) + 1
    group_sizes[size] = group_sizes.get(size, 0) + 1
    # Extend fresh lists so the per-size user lists never alias groups_sun[key]
    group_sizes_users_sun.setdefault(size, []).extend(val)
    group_sizes_users.setdefault(size, []).extend(val)
        
        


for key, val in group_sizes_fri.items():
    group_size_fri.append(key)
    group_count_fri.append(val)
        
for key, val in group_sizes_sat.items():
    group_size_sat.append(key)
    group_count_sat.append(val)
        
for key, val in group_sizes_sun.items():
    group_size_sun.append(key)
    group_count_sun.append(val)
        
for key, val in group_sizes.items():
    group_size.append(key)
    group_count.append(val)
        

In [14]:

fig2_dict = {}
fig2_dict['group_size_fri']=group_size_fri
fig2_dict['group_size_sat']=group_size_sat
fig2_dict['group_size_sun']=group_size_sun
fig2_dict['group_size_all']=group_size
fig2_dict['group_size_curr']=group_size_fri

fig2_dict['group_count_fri']=group_count_fri
fig2_dict['group_count_sat']=group_count_sat
fig2_dict['group_count_sun']=group_count_sun
fig2_dict['group_count_all']=group_count
fig2_dict['group_count_curr']=group_count_fri

fig2_source=ColumnDataSource(data=fig2_dict)

In [15]:
#Figure 2: Visualising Number of Groups of a Group Size on different days
fig2_menu=[('fri','Friday'),('sat','Saturday'),('sun','Sunday'),('all','All Days')]
fig2_dd=Select(title="Choose Day",value="fri", options=fig2_menu,width=150)

fig2=figure(title='Group Statistics', plot_height = 400,plot_width=600)

fig2.vbar(source=fig2_source,x='group_size_curr',top='group_count_curr',width=1,color="white",fill_color="#E74C3C")
fig2_c=fig2.circle(source=fig2_source,x='group_size_curr',y='group_count_curr',color="#E74C3C")
fig2.add_tools(HoverTool(tooltips=[
    ("Group Size", "@group_size_curr"),
    ("Count", "@group_count_curr")
],renderers=[fig2_c]))

update_curve = CustomJS(args=dict(source=fig2_source,fig2_dd=fig2_dd), code="""

    day=fig2_dd.value
    source.data['group_size_curr']=source.data['group_size_'+day]
    source.data['group_count_curr']=source.data['group_count_'+day]
    source.trigger('change');

""")

fig2_dd.js_on_change('value', update_curve)


fig2.background_fill_alpha = 0.4
fig2.xaxis.axis_label = 'Group Size'
fig2.yaxis.axis_label = 'Number of Groups'
fig2.axis.major_label_text_color = "black"
fig2.ygrid.grid_line_alpha = 0.8
fig2.ygrid.grid_line_dash = [5, 3]
fig2.xgrid.grid_line_alpha = 0.8
fig2.xgrid.grid_line_dash = [5, 3]


fig2.legend.location = "top_right"
fig2.legend.click_policy="hide"
show(row(fig2,fig2_dd))




### Figure 2 - Groups and group sizes
Here we visualize the size of each type of group found through clustering. The x-axis represents the group size, and the y-axis is the number of groups of that size. This visualization answers a couple of questions from MC1.1:
1. Characterize the attendance at DinoFun World on this weekend. Describe up to twelve different types of groups at the park on this weekend.
    1. How big is this type of group?
    2. Where does this type of group like to go in the park?
    3. How common is this type of group?
    4. What are your other observations about this type of group?
    5. What can you infer about this type of group?
    6. If you were to make one improvement to the park to better meet this group's needs, what would it be?
    
We answer these questions as we come across more visualizations. <br>
Sub-questions (A) and (C), the first and third in the list above, are answered here for all three days of the park's operation. There are many small groups of size 2 to 6, and the number of groups decreases as the group size increases. Some groups are huge (sizes 25-45); we think these are tour groups or school outings. Some other observations are:
1. The number of people visiting the park increases every day from Friday to Sunday, leading up to the show on Sunday afternoon.
2. There are fewer large groups on Sunday, and many more small groups of 3 and 4 instead. These could be families wanting to have fun on the weekend.
3. Large groups on Friday and Saturday could be colleagues from work who arrive at the park together.
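
The group-size counts plotted in Figure 2 come from tallying cluster labels twice: once to get the size of each cluster, and once to count how many clusters share each size. A minimal sketch of that idea, using only the standard library; the helper name `group_size_histogram` and the sample labels are hypothetical and stand in for the `labels_` array returned by K-Means.

```python
from collections import Counter

def group_size_histogram(labels):
    """Map each group size to the number of clusters of that size."""
    members_per_cluster = Counter(labels)          # cluster label -> member count
    return Counter(members_per_cluster.values())   # group size -> number of groups

# Hypothetical cluster labels for seven visitors.
labels = [0, 0, 1, 2, 2, 2, 3]
histogram = group_size_histogram(labels)
print(sorted(histogram.items()))
```

Here clusters 1 and 3 each hold one visitor, cluster 0 holds two, and cluster 2 holds three, so the histogram has two groups of size 1 and one group each of sizes 2 and 3.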

In [94]:
# Fig 3: Visualising Locations visited by Group Types on Map
fig3_dict = {}
fig3_dict_temp = {}

for groupSize in group_sizes_users_fri:
    users = set(group_sizes_users_fri[groupSize])  # set for fast membership tests
    indices = [i for i,x in enumerate(userIds_fri) if x in users]
    check_ins = np.sum(np.array(user_checkIn_matrix_fri)[indices],axis=0)
    key = 'size_'+str(groupSize)
    # Accumulate check-ins across days under the same 'size_N' key
    if key in fig3_dict:
        fig3_dict[key] = fig3_dict[key] + check_ins
    else:
        fig3_dict[key] = check_ins
        
for groupSize in group_sizes_users_sat:
    users = set(group_sizes_users_sat[groupSize])  # set for fast membership tests
    # Use Saturday's user IDs and matrix (the original reused the Friday ones)
    indices = [i for i,x in enumerate(userIds_sat) if x in users]
    check_ins = np.sum(np.array(user_checkIn_matrix_sat)[indices],axis=0)
    key = 'size_'+str(groupSize)
    # Accumulate check-ins across days under the same 'size_N' key
    if key in fig3_dict:
        fig3_dict[key] = fig3_dict[key] + check_ins
    else:
        fig3_dict[key] = check_ins
        
for groupSize in group_sizes_users_sun:
    users = set(group_sizes_users_sun[groupSize])  # set for fast membership tests
    # Use Sunday's user IDs and matrix (the original reused the Friday ones)
    indices = [i for i,x in enumerate(userIds_sun) if x in users]
    check_ins = np.sum(np.array(user_checkIn_matrix_sun)[indices],axis=0)
    key = 'size_'+str(groupSize)
    # Accumulate check-ins across days under the same 'size_N' key
    if key in fig3_dict:
        fig3_dict[key] = fig3_dict[key] + check_ins
    else:
        fig3_dict[key] = check_ins


In [97]:
for key in fig3_dict:
    if(key == 'size_curr'):
        continue
    size = key.split('_')[1]
    if(np.sum(fig3_dict[key])==0):
        continue
    fig3_dict_temp[key]=(fig3_dict[key]/np.sum(fig3_dict[key])*500)+5

fig3_dict = fig3_dict_temp

In [98]:
fig3_menu=[]
fig3_keys = []
fig3_dict['size_curr']=fig3_dict['size_1']
fig3_source=ColumnDataSource(data=fig3_dict)

for key in fig3_dict:
    if(key == 'size_curr'):
        continue
    split = key.split('_')[1]
    fig3_keys.append(int(split))

for key in sorted(fig3_keys):
    fig3_menu.append((str(key),str(key)))
    


fig3_dd=Select(title="Choose Group Size",value="1", options=fig3_menu,width=150)
fig3=figure(title='Group Most Visited Locations', plot_height = 600,plot_width=600)
fig3_c=fig3.circle(x=loc_x,y=loc_y,color="#E74C3C",size=4,legend="Ride")
fig3_c2=fig3.circle(x=mov_x,y=mov_y,color="green",size=2,legend = "Path")
fig3_c3=fig3.circle(x=loc_x,y=loc_y,color="#3F51B5",size='size_curr',fill_alpha=0.35,source=fig3_source,legend="Frequency")

update_curve = CustomJS(args=dict(source=fig3_source,fig3_dd=fig3_dd), code="""

    size=fig3_dd.value
    source.data['size_curr']=source.data['size_'+size]
    source.trigger('change');

""")

fig3_dd.js_on_change('value', update_curve)
fig3.background_fill_alpha = 0.4
fig3.ygrid.grid_line_alpha = 0.8
fig3.ygrid.grid_line_dash = [5, 3]
fig3.xgrid.grid_line_alpha = 0.8
fig3.xgrid.grid_line_dash = [5, 3]

fig3.legend.click_policy="hide"
fig3.legend.location = "top_left"


show(row(fig3,fig3_dd))


### Figure 3: Most visited locations
In this figure, we draw the map of DinoFunWorld to show which locations each group size prefers to visit. The size of each translucent blue circle represents the frequency of visits to that location. This graph answers questions (B), (D), and (E), the second, fourth, and fifth sub-questions of MC1.1:
1. Groups of small size 2-8 seldom go to kiddie land (top right part of the park), but are very often seen in thrill rides around the park, and also rides for everyone (rides on the left). These are inferred to be either young-adult thrill junkies, or couples looking for a pleasant time.
2. Groups from size 10-20 visit kiddie land much more frequently, and these could be large groups of kids from school with their parents or teachers who ride the kiddie rides with them.
3. Groups of size 20 seem to have visited the thrill rides frequently and subsequently gone on to the restroom, which suggests that some of them felt unwell after the hard-to-digest thrill rides.
4. Groups of size 3 and 4 go to almost all the rides in the park.
5. The group of size 47 is a huge group of thrill riders who check in at all the thrill rides across the park, and also visit the restroom quite frequently.
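
The circle sizes in Figure 3 are each location's share of a group size's total check-ins, scaled up and given a small offset so that unvisited locations stay visible. A minimal sketch of that scaling in plain Python; the function name `marker_sizes` and the sample counts are hypothetical, while the factor 500 and offset 5 are the constants used in the cell above.

```python
def marker_sizes(checkin_counts, scale=500, min_size=5):
    """Turn raw check-in counts into marker sizes proportional to their share."""
    total = sum(checkin_counts)
    if total == 0:
        # No check-ins at all for this group size: nothing to draw.
        return None
    return [count / total * scale + min_size for count in checkin_counts]

# Hypothetical check-in counts at four locations for one group size.
sizes = marker_sizes([10, 30, 60, 0])
print(sizes)
```

Because the sizes are shares rather than raw counts, maps for different group sizes are comparable in shape but not in absolute volume.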

In [None]:
#Figure 4: Heatmap of Check-ins vs time

In [90]:
checkin_id_hour_fri = {x:{l:0 for l in range(8,24)} for x in check_in_ids}
checkin_id_hour_sat = {x:{l:0 for l in range(8,24)} for x in check_in_ids}
checkin_id_hour_sun = {x:{l:0 for l in range(8,24)} for x in check_in_ids}

for i in range(len(df_checkin_fri)):
    checkinLoc = str(int(df_checkin_fri.iloc[i,3])) + '_' + str(int(df_checkin_fri.iloc[i,4]))
    checkinHour = pd.to_datetime(df_checkin_fri.iloc[i,0]).hour
    checkin_id_hour_fri[checkinLoc][checkinHour]+=1
    
for i in range(len(df_checkin_sat)):
    checkinLoc = str(int(df_checkin_sat.iloc[i,3])) + '_' + str(int(df_checkin_sat.iloc[i,4]))
    checkinHour = pd.to_datetime(df_checkin_sat.iloc[i,0]).hour
    checkin_id_hour_sat[checkinLoc][checkinHour]+=1
    
for i in range(len(df_checkin_sun)):
    checkinLoc = str(int(df_checkin_sun.iloc[i,3])) + '_' + str(int(df_checkin_sun.iloc[i,4]))
    checkinHour = pd.to_datetime(df_checkin_sun.iloc[i,0]).hour
    checkin_id_hour_sun[checkinLoc][checkinHour]+=1

In [91]:
x_labels = list(range(8,24))
y_labels = check_in_ids
x = []
y = []
colors_fri = []
colors_sat = []
colors_sun = []
for cid in y_labels:
    x.extend(x_labels)
    y.extend([cid] * len(x_labels))  # one y entry per hour, without reusing the name x
    for l in x_labels:
        colors_fri.append('#%02x%02x%02x' % (255,int(255*(1-(math.pow(checkin_id_hour_fri[cid][l],0.4)/40))),int(255*(1-(math.pow(checkin_id_hour_fri[cid][l],0.4)/30)))))
        colors_sat.append('#%02x%02x%02x' % (255,int(255*(1-(math.pow(checkin_id_hour_sat[cid][l],0.4)/40))),int(255*(1-(math.pow(checkin_id_hour_sat[cid][l],0.4)/30)))))
        colors_sun.append('#%02x%02x%02x' % (255,int(255*(1-(math.pow(checkin_id_hour_sun[cid][l],0.4)/40))),int(255*(1-(math.pow(checkin_id_hour_sun[cid][l],0.4)/30)))))

s_fri = ColumnDataSource(data=dict(colors=colors_fri))
s_sat = ColumnDataSource(data=dict(colors=colors_sat))
s_sun = ColumnDataSource(data=dict(colors=colors_sun))
s_toPlot = ColumnDataSource(data=dict(colors=colors_fri))

In [106]:
heatmap = figure(title='Heatmap of Check-ins vs Time of day', plot_height=500, plot_width=700, y_range=y_labels)
heatmap.rect(x, y, fill_color='colors', source=s_toPlot, width=1, height =1,color="white")
callback_heatmap = CustomJS(args=dict(s_toPlot=s_toPlot, s_fri=s_fri, s_sat=s_sat,s_sun=s_sun), code='''
    var selection = cb_obj.value;
    switch(selection){
        case 'Friday':
            s_toPlot.data.colors = s_fri.data.colors;
            break;
        case 'Saturday':
            s_toPlot.data.colors = s_sat.data.colors;
            break;
        case 'Sunday':
            s_toPlot.data.colors = s_sun.data.colors;
            break;
    }
    s_toPlot.trigger('change')
''')
heatmap_select = Select(title="Select Day", value='Friday', options=['Friday', 'Saturday', 'Sunday'], callback=callback_heatmap)
heatmap.background_fill_alpha = 0.4
heatmap.ygrid.grid_line_alpha = 0.8
heatmap.ygrid.grid_line_dash = [5, 3]
heatmap.xgrid.grid_line_alpha = 0.8
heatmap.xgrid.grid_line_dash = [5, 3]

heatmap.xaxis.axis_label = 'Hour of Day'
heatmap.xaxis.axis_label_text_font='times'
heatmap.yaxis.axis_label_text_font='times'
heatmap.yaxis.axis_label = 'Location X_Y'

heatmap.legend.click_policy="hide"
heatmap.legend.location = "top_left"

show(row(heatmap,heatmap_select))



### Figure 4: Heatmap of Checkins
This visualization, coupled with the ones before, is intended to answer MC1.2 and MC1.3. <br>
MC1.2: Are there notable differences in the patterns of activity in the park across the three days? Please describe the notable difference you see. <br>
MC1.3: What anomalies or unusual patterns do you see? Describe them. <br>
This visualization is a heatmap of check-in frequencies, with locations on the Y axis and hour of day on the X axis, for each of the three days (which can be selected from the dropdown on the right). White cells have no check-ins, and a more intensely red cell means more check-in activity at that location and hour. We discuss our observations below:
1. The first thing we immediately notice is that Friday's attendance is much lower than that of the other two days: the whole plot is more intensely coloured for Saturday and Sunday.
2. The north gate of entry (63_99 on Y, meaning 63,99 coordinates) sees heavy traffic every day from 8 a.m. to 9 a.m. The other two gates, (0,67) and (99,77) see moderate traffic throughout the weekend. All gates see little traffic throughout the day after that, except a little around afternoon.
3. The Creighton Pavilion (32,33) seems to be closed from 10 to 11 every day. Manually checking the check-in times of visitors reveals that it has no check-ins from 9:30 to 11:30 in the morning. On Friday and Saturday, it is also closed from 14:30 to 16:30. On Sunday, however, we notice an anomaly: it closes at 12 and remains closed for the remainder of the day. A plausible explanation is that the vandalism occurred around that time.
4. People seem to leave early on Friday, or go to the restaurants or shops after 8 p.m., as we see the check-ins at rides decrease. Saturday sees a steady check-in rate until after 11 p.m. for some of the rides. On Sunday, people tend to leave earlier as well, and no one checks in to rides at the park after 11 p.m.
5. Rides (78,48) - TerroSaur, (69,44) - Flight of the Swingodon, and (45,24) - Atmosfear are all thrill rides, and can clearly be seen as the busiest on all days. Other busy rides that we observe, like (47,11), (86,44), etc are also all Thrill rides. This tells us that people who ride the Thrill rides keep coming back for them, and they must be really, really thrilling to be so busy all day, every day.
6. At (76,22), the Grinosaurus Stage, the venue seems to open twice for the Scott Jones show on Friday and Saturday, and once on Sunday. People check in ahead of time and leave as the show ends, shown clearly by the fading rectangles in the heatmap.
7. The ride at (73,84), Flying TyrAndrienkos, is a kiddie ride that seems to be the least favourite among visitors. This could be because not many small children come to the park (perhaps because parents are worried about safety), or because the ride itself is so boring that even kids do not ride it. It also looks like this ride was closed for an hour on Saturday from 10 to 11.
8. The location (50,57) is the least crowded, and is the first aid station. This sees little activity that decreases as the day goes on, and is most frequently visited on Sunday, which makes sense as there are the most people visiting the park on Sunday.
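
The heatmap colours come from compressing each hourly check-in count with a power of 0.4 and using the result to reduce the green and blue channels, so busier cells move from white towards red. A minimal sketch of that mapping; the function name `checkin_color` is hypothetical, and the clamp to 0 is our own addition (an assumption, not part of the cell above) to keep very large counts from producing an invalid hex string.

```python
def checkin_color(count):
    """Map an hourly check-in count to a hex colour from white (0) towards red."""
    intensity = count ** 0.4                            # compress the range of counts
    green = max(0, int(255 * (1 - intensity / 40)))     # clamp added in this sketch
    blue = max(0, int(255 * (1 - intensity / 30)))      # blue fades faster than green
    return "#%02x%02x%02x" % (255, green, blue)

# Hypothetical counts: an empty cell, a quiet hour, and a busy hour.
for count in (0, 10, 1000):
    print(count, checkin_color(count))
```

Since blue is divided by 30 and green by 40, moderately busy cells drift towards orange before very busy ones approach pure red.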

We conclude this assignment with the observation that the data set is large, complex, and requires many more hours of work and analysis coupled with appropriate visualizations to glean everything it has to offer. We plan some future tasks which we hope to get to as we work on MC2 and the Grand Challenge for VAST 2015.
1. Finding more anomalies and analysing the paths of individuals and groups in more detail, so as to find patterns in movement.
2. Finding clues relevant to the crime by analysing activity around the area and time, using the communication data as well as the movement data.
3. Using other clustering techniques and features to find more discernible qualities of the groups we obtained.