## Banner placement for the Carnival Cruise Line agency

The cruise agency **Carnival Cruise Line** plans to promote its services with ad banners. As a first step, the agency has decided to place 20 banners across various geographic locations. The placements need to be close to the agency's offices, which are also located around the globe.
The objective is to identify the best placements for the first 20 test promotional banners. Our task is to pick the locations in a way that maximizes the commercial effect of the ads, so each placement needs to satisfy the following requirements:
1 - the banner is placed close to an agency office (so that it is easier to arrange the placement and to monitor the banner's effectiveness);
2 - the location attracts as many visiting tourists as possible.
To search for locations we will use a database from *Foursquare*, a large location-based social network that publishes check-ins at various tourist locations (the data is available here, use the checkins.dat file: https://archive.org/details/201309_foursquare_dataset_umn).


The coordinates of the Carnival Cruise Line agency offices we are going to look at are listed below (map: https://www.google.com/maps/d/viewer?mid=1n07TQwkq65xLiIsmQKEYravjAmguehgG&ll=35.781965009786845%2C-59.31815504999997&z=2):


33.751277, -118.188740 (Los Angeles)

25.867736, -80.324116 (Miami)

51.503016, -0.075479 (London)

52.378894, 4.885084 (Amsterdam)

39.366487, 117.036146 (Beijing)

-33.868457, 151.205134 (Sydney)

After clustering, we may find that some clusters contain very few objects and therefore do not meet our requirement of being well visited by tourists. The threshold is set at 15 elements, so any cluster with fewer than 15 objects will be excluded from consideration.

Once the cluster centers for the most promising banner locations have been identified, we can use mapcustomizer.com to visualize them and to check the suggested test banner placements visually.

First, let's read all the data into a pandas DataFrame and remove the rows that do not contain latitude and longitude.

In [72]:
import pandas as pd
import numpy as np
from sklearn.cluster import MeanShift
import warnings
warnings.filterwarnings("ignore")

In [73]:
raw_data = pd.read_csv('checkins.csv')
raw_data.shape

(1021967, 7)

In [74]:
raw_data.head()

| | id | user_id | venue_id | latitude | longitude | create | d_at |
|---|---|---|---|---|---|---|---|
| 0 | 984301 | 2041916 | 5222.0 | NaN | NaN | 4/21/2012 | 17:39:01 |
| 1 | 984222 | 15824 | 5222.0 | 38.895112 | -77.036366 | 4/21/2012 | 17:43:47 |
| 2 | 984315 | 1764391 | 5222.0 | NaN | NaN | 4/21/2012 | 17:37:18 |
| 3 | 984234 | 44652 | 5222.0 | 33.800745 | -84.41052 | 4/21/2012 | 17:43:43 |
| 4 | 984249 | 2146840 | 5222.0 | NaN | NaN | 4/21/2012 | 17:42:58 |


In [75]:
data = raw_data.drop(['id','venue_id', 'user_id', 'create', 'd_at'], axis = 1)

In [76]:
data.dropna(inplace=True)

In [77]:
data.isnull().any()

latitude     False
longitude    False
dtype: bool

In [78]:
data.shape

(396634, 2)

This is the number of rows we expected, so we can move on to identifying clusters with the MeanShift method. MeanShift is a good choice here because it aims to discover “blobs” in a smooth density of samples.
It is a centroid-based algorithm that works by updating centroid candidates to be the mean of the points within a given region. These candidates are then filtered in a post-processing stage to eliminate near-duplicates and form the final set of centroids. In our case, the cluster centers will correspond to the centers of the geographic locations most visited by tourists, and we will then find the cluster centers that lie closest to the agency offices.
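
To make the centroid update and the near-duplicate filtering concrete, here is a toy flat-kernel mean shift in plain Python. It is only an illustrative sketch, not scikit-learn's implementation: the function name `mean_shift`, the `max_iter` and `tol` parameters and the sample points are all assumptions made for this example.

```python
import math

def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Toy flat-kernel mean shift; returns the deduplicated centroids."""
    centroids = []
    for seed in points:  # every point starts out as a centroid candidate
        cx, cy = seed
        for _ in range(max_iter):
            # Points inside the window of radius `bandwidth` around the candidate
            window = [p for p in points
                      if math.hypot(p[0] - cx, p[1] - cy) <= bandwidth]
            if not window:  # defensive guard against floating-point edge cases
                break
            # Move the candidate to the mean of the points in its window
            nx = sum(p[0] for p in window) / len(window)
            ny = sum(p[1] for p in window) / len(window)
            shift = math.hypot(nx - cx, ny - cy)
            cx, cy = nx, ny
            if shift < tol:  # the candidate has stopped moving
                break
        # Post-processing: skip candidates that landed next to a kept centroid
        if all(math.hypot(cx - kx, cy - ky) > bandwidth for kx, ky in centroids):
            centroids.append((cx, cy))
    return centroids

# Two hypothetical, well-separated "blobs" of four points each
sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
centers = mean_shift(sample, bandwidth=1.0)
print(centers)
```

Every point in a blob converges to the same mean, so only one centroid per blob survives the filtering step. Seeding from every point is the simplest option but scales poorly, which is why a binned seeding strategy is attractive for a data set of this size.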

To limit the area a single cluster can cover, the 'bandwidth' parameter will be set to 0.1 (in degrees), which restricts cluster formation to the equivalent of roughly 5-10 kilometers (if we consider mid latitudes, not close to the poles :) ). To avoid identifying small, unpopular locations as clusters, we also set the 'min_bin_freq' parameter to 15, so that only bins with at least 15 points are accepted as seeds for the clusters in our further analysis.
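
As a rough check of the bandwidth estimate, the sketch below converts a bandwidth in degrees into kilometers at the latitudes of some of the offices. It assumes about 111 km per degree of latitude and scales the east-west extent by the cosine of the latitude; the helper name `bandwidth_in_km` and the constant are illustrative choices, not part of the original analysis.

```python
import math

KM_PER_DEGREE_LAT = 111.0  # assumed approximate length of one degree of latitude

def bandwidth_in_km(bandwidth_deg, latitude_deg):
    """Return the (north-south, east-west) extent in km of a bandwidth in degrees."""
    north_south = bandwidth_deg * KM_PER_DEGREE_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west

for city, lat in [("Miami", 25.867736), ("Los Angeles", 33.751277),
                  ("London", 51.503016), ("Amsterdam", 52.378894)]:
    ns, ew = bandwidth_in_km(0.1, lat)
    print(f"{city}: {ns:.1f} km north-south, {ew:.1f} km east-west")
```

Under these assumptions a bandwidth of 0.1 corresponds to about 11 km north-south everywhere, while the east-west extent shrinks to roughly 7 km at London's latitude, so the window is not circular on the ground.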

In [79]:
offices = pd.read_csv('ccl_offices.csv')
offices.drop(['City'], axis = 1, inplace = True)
offices.head()

| | latitude | longitude |
|---|---|---|
| 0 | 33.751277 | -118.18874 |
| 1 | 25.867736 | -80.324116 |
| 2 | 51.503016 | -0.075479 |
| 3 | 52.378894 | 4.885084 |
| 4 | 39.366487 | 117.036146 |


In [80]:
ms = MeanShift(bandwidth = 0.1, bin_seeding = True, min_bin_freq = 15)
ms.fit(data)
labels = ms.labels_
cluster_centers = ms.cluster_centers_

labels_unique = np.unique(labels)
n_clusters_ = len(labels_unique)

print("number of estimated clusters : %d" % n_clusters_)

number of estimated clusters : 1170
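
Since 'min_bin_freq' applies to the bins used for seeding, it may be worth double-checking the size of the final clusters against the 15-element threshold as well. The following is a minimal sketch of such a check; the helper `clusters_with_min_size` and the toy labels are hypothetical and only illustrate the counting step.

```python
from collections import Counter

def clusters_with_min_size(labels, min_size=15):
    """Return the sorted cluster labels that have at least `min_size` members."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n >= min_size)

# Toy labels: cluster 0 has 20 members, cluster 1 has 15, cluster 2 only 3
toy_labels = [0] * 20 + [1] * 15 + [2] * 3
kept = clusters_with_min_size(toy_labels)
print(kept)  # [0, 1]
```

With the fitted model, the same helper could be applied as `clusters_with_min_size(labels.tolist())`, and the cluster centers could then be restricted to the labels it returns.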


Now let's find the 20 locations closest to our offices worldwide. The distance from a placement candidate to an agency office is the only selection criterion, since the clustering has already eliminated places with few tourist visits. We will also identify the single placement candidate that is closest to one of our offices.
The distances are calculated as simple Euclidean distances in degrees: over short distances the Earth's curvature hardly matters, and candidates far from an office will not make the top 20 anyway, so the error at large distances does not affect the result.
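
As a cross-check on the Euclidean approximation, a great-circle distance can be computed with the haversine formula. The sketch below is not part of the original analysis: the function name `haversine_km` and the mean Earth radius of 6371 km are assumptions of this example, and the two points are the London office and the nearby cluster center reported in the notebook output.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# London office and the nearby cluster center from the notebook output
office = (51.503016, -0.075479)
center = (51.50305542, -0.1271134)
london_km = haversine_km(*office, *center)
print(round(london_km, 2))
```

Unlike the distance in degrees, this value accounts for the fact that a degree of longitude is shorter at higher latitudes, so candidates near London or Amsterdam and candidates near Miami are compared on the same scale.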

In [81]:
import math

# Map each office-to-center distance (in degrees) to its cluster center.
dist = {}

for oix in range(len(offices.index)):
    office_loc = offices.iloc[oix].values  # [latitude, longitude]
    for c in range(len(cluster_centers)):
        # Euclidean distance between the office and the cluster center
        distance = math.sqrt((office_loc[0] - cluster_centers[c][0])**2 + (office_loc[1] - cluster_centers[c][1])**2)
        dist[distance] = cluster_centers[c]
# Keep the 20 smallest distances; the first one belongs to the closest location.
closest_20_keys = sorted(dist.keys())[:20]
the_closest = dist[closest_20_keys[0]]
top_20_locations = [dist[key] for key in closest_20_keys]
for i, key in enumerate(closest_20_keys):
    print("Distance " + str(i+1) + " : " + str(round(key, 3)) + " from location at: " + str(dist[key]))
print("The closest location is at: " + str(the_closest))


Distance 1 : 0.003 from location at: [-33.86614607 151.20708242]
Distance 2 : 0.01 from location at: [52.37248935  4.89226825]
Distance 3 : 0.039 from location at: [ 25.89689645 -80.29771155]
Distance 4 : 0.052 from location at: [51.50305542 -0.1271134 ]
Distance 5 : 0.075 from location at: [  33.81127536 -118.14433437]
Distance 6 : 0.136 from location at: [ 25.7870861  -80.21512757]
Distance 7 : 0.174 from location at: [  33.87632837 -118.06740971]
Distance 8 : 0.181 from location at: [ 26.00505198 -80.20559812]
Distance 9 : 0.218 from location at: [  33.87201291 -118.37034494]
Distance 10 : 0.261 from location at: [ 26.11945702 -80.39255995]
Distance 11 : 0.299 from location at: [  33.70280018 -117.89332878]
Distance 12 : 0.301 from location at: [  33.81172271 -117.89365104]
Distance 13 : 0.303 from location at: [ 26.12204445 -80.15977558]
Distance 14 : 0.316 from location at: [51.48562471 -0.39104649]
Distance 15 : 0.323 from location at: [  34.06497839 -118.26547835]
Distance 16 : 