# Donut Geomasking

## Introduction

### What is Donut Geomasking?

<img src='images/Glazed-Donut.jpg' width='300' alt="Donut Geomasking with Folium" style="float:left; padding-right:20px;"/> 

**Figure 1.** More about geomasking...

## Beginning with the end in mind

<img src='./images/Donut-Geomasking.png' width='600' alt="Donut Geomasking with Folium" style="float:left; padding-right:20px;"/> 

**Figure 2.** Folium visualization of donut geomasked Cholera death locations (points), with a random set of points highlighted to show the donut, the origin point and destination (geomasked) point. The mean centers of origin and destination points are also shown as larger circles with yellow fill color.



### Steps

1. Prepare Cholera data set
2. Geomask Cholera death points using Donut Geomask class
3. Compute Weighted Mean Center values for original and geomasked points
4. Visualize donut geomasking and weighted mean center using Folium

## Prepare Cholera Case DataFrame for Geomasking

In [1]:
import pandas as pd
#pd.set_option('precision', 6)

In [2]:
# Read the comma-delimited text file (.csv) of cholera deaths and create a dataframe.
deaths_df = pd.read_csv('data/cholera_deaths.csv')

deaths_df

Unnamed: 0,FID,DEATHS,LON,LAT
0,0,3,-0.137930,51.513418
1,1,2,-0.137883,51.513361
2,2,1,-0.137853,51.513317
3,3,1,-0.137812,51.513262
4,4,4,-0.137767,51.513204
...,...,...,...,...
245,245,3,-0.137108,51.514526
246,246,2,-0.137065,51.514706
247,247,1,-0.138474,51.512311
248,248,1,-0.138123,51.511998


## The `donut_geomask()` and other functions

The `donut_geomask()`, `random_donuts_records()`, `distance_between_points()` and `weighted_mean_center()` functions are written to disk as a Python file in the folder `geoprivacy` as `donut_geomask.py`. In Python code, the functions above can be invoked as follows:

`from geoprivacy.donut_geomask import donut_geomask, random_donuts_records, distance_between_points, weighted_mean_center`

Explanation:
1. `from geoprivacy.donut_geomask` loads the `donut_geomask.py` file
2. `import donut_geomask, random_donuts_records, distance_between_points, weighted_mean_center` imports the specific functions below.

In [3]:
%%writefile geoprivacy/donut_geomask.py

import pandas as pd
from random import randrange
import numpy as np
import math
import geopy
from geopy import distance
from geopy.distance import geodesic
from geopy.distance import great_circle 
import time

def donut_geomask(band_range, orig_point):
    """
    The solution below is based on:
    https://stackoverflow.com/questions/24427828/calculate-point-based-on-distance-and-direction

    This computes a destination (geomasked) point based on random bearing and distance 
        from origin point.

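    Parameters:
    band_range (tuple of two tuples): first tuple is the range (meters) from which
        the inner donut radius is drawn; second tuple is the range (meters) from
        which the outer donut radius is drawn
    orig_point (tuple): Tuple of latitude and longitude of origin point

    Returns:
    return_dictionary (dict): dictionary containing:
        latitude: destination latitude
        longitude: destination longitude
        bearing (in degrees)
        distance (in kilometers)
        elapsed_time (in seconds)
        final_band_range: (inner, outer) donut radii in meters
    """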
    # get start time
    tick = time.perf_counter()

    # get random number between 0 and 360 and store as bearing
    bearing = randrange(0, 360)

    # get random number between lower_value and upper_value in meters 
    #    and store as distance in kilometers 
    lower, upper = band_range
    min_distance = randrange(lower[0],lower[1])
    max_distance = randrange(upper[0],upper[1])
    distance_kilometers = (randrange(min_distance, max_distance)) / 1000

    # convert orig_point to a geopy Point (lat, lon)
    origin = geopy.Point(orig_point)

    # create a geopy distance object (measurement unit in kilometers)
    d = distance.distance(kilometers=distance_kilometers)

    # store destination point of distance object in destination point
    destination = d.destination(origin, bearing)

    # get end time
    tock = time.perf_counter()
    elapsed_time = tock-tick

    # create return dictionary
    return_dictionary = {'latitude':destination.latitude,'longitude':destination.longitude,\
            'bearing':bearing, 'distance':distance_kilometers, 'elapsed_time':elapsed_time,\
             'final_band_range':(min_distance, max_distance)}

    return return_dictionary
    
def random_donuts_records(range_max, max_random):
    """    
    This returns a random list of numbers to use for randomly selecting which
    record to draw a donut for, to decrease the visual noise in a folium map
    of geomasked points (and make it easier to see how donut geomasking works).

    Parameters:
    range_max (int): max number of records in dataframe to be displayed in folium 
    max_random (int): number of random donuts to display

    Returns:
    random_record_list (list): list of random records to display donuts for
    """
    total_rows = range_max
    random_record_list = []
    track_list = []
    counter = 1
    max_records = max_random
    for i in range(0, total_rows):
        random_number = randrange (0, total_rows)
        if random_number in track_list:
            continue
        else:
            random_record_list.append(random_number)
        track_list.append(random_number)
        counter = counter + 1
        if counter > max_records:
            break
    return random_record_list
    
def distance_between_points(orig, dest):
    """    
    This returns a dictionary of distance values in kilometers and meters using the
        geodesic and great circle methods. The Vincenty distance method was
        removed in geopy 2.0, so it is excluded from this function.

    Parameters:
    orig (tuple): pair of floats representing lat, lon values of origin point 
    dest (tuple): pair of floats representing lat, lon values of destination point
        (computed after geomasking)

    Returns:
    return_dict (dict): dictionary of values representing distance computation using
        geodesic and great_circle methods of geopy. Distance values are in km and m.
    """
    geod_km = geodesic(orig, dest).km
    geod_m = geod_km*1000
    #geod_mi = geodesic(orig, dest).miles
    gcircle_km = great_circle(orig, dest).km
    gcircle_m = gcircle_km*1000
    #gcircle_mi = great_circle(orig, dest).miles
    return_dict = {'geodesic_km': geod_km, 'great_circle_km': gcircle_km,\
                  'geodesic_m':geod_m, 'great_circle_m': gcircle_m}
    return return_dict

def weighted_mean_center(weights, lats, lons):
    """    
    This computes the weighted mean center given a list each of weights, latitude and longitude.

    Parameters:
    weights (list of ints or floats): weighting factor for points 
        (see product_lat and product_lon below)
    lats (list of floats): list of latitudes
    lons (list of floats): list of longitudes

    Returns:
    mean_center_dictionary (dict): dictionary containing:
        mean center latitude
        mean center longitude
    """
    # create data frame from dictionary of lists
    
    df = pd.DataFrame({'weight':weights, 'lat':lats, 'lon':lons})

    # add weighted lat and lon columns
    df['product_lat'] = df['lat'] * df['weight']
    df['product_lon'] = df['lon'] * df['weight']

    # compute for mean lat and lon
    mean_lon = np.sum(df['product_lon'])/np.sum(df['weight'])
    mean_lat = np.sum(df['product_lat'])/np.sum(df['weight'])

    # create return dictionary
    mean_center_dictionary = {'latitude':mean_lat, 'longitude':mean_lon}

    return mean_center_dictionary

Overwriting geoprivacy/donut_geomask.py
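
To make the geometry behind `donut_geomask()` easier to follow, here is a minimal standard-library sketch of the same idea: pick a random bearing and a random distance inside the donut, then project the origin point along that bearing. It is not the notebook's implementation. It assumes a spherical Earth (the notebook uses geopy's geodesic model), draws the distance uniformly from fixed inner and outer radii rather than randomizing the band edges, and the names `spherical_destination`, `donut_mask_sketch` and `EARTH_RADIUS_M` are hypothetical.

```python
import math
import random

# Assumption: spherical Earth with a mean radius of 6371.009 km.
EARTH_RADIUS_M = 6371009.0

def spherical_destination(lat, lon, bearing_deg, distance_m):
    """Return (lat, lon) in degrees reached by travelling distance_m
    from (lat, lon) along bearing_deg (clockwise from north)."""
    lat1 = math.radians(lat)
    lon1 = math.radians(lon)
    bearing = math.radians(bearing_deg)
    delta = distance_m / EARTH_RADIUS_M  # angular distance in radians

    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(bearing))
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(delta) * math.cos(lat1),
        math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)

def donut_mask_sketch(lat, lon, inner_m, outer_m, rng=None):
    """Displace a point by a random distance in [inner_m, outer_m]
    along a random bearing. Hypothetical helper for illustration."""
    rng = rng or random.Random()
    bearing_deg = rng.uniform(0.0, 360.0)
    distance_m = rng.uniform(inner_m, outer_m)
    new_lat, new_lon = spherical_destination(lat, lon, bearing_deg, distance_m)
    return {'latitude': new_lat, 'longitude': new_lon,
            'bearing_deg': bearing_deg, 'distance_m': distance_m}

# Usage: mask the first cholera death point with a 50 m to 350 m donut.
masked = donut_mask_sketch(51.513418, -0.137930, 50.0, 350.0)
```

Drawing the distance uniformly places points more densely near the inner radius than near the outer one, which is also true of the integer draw in `donut_geomask()`.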


## Geomasking Cholera Death Points

### Analytic Columns

The following analytic columns are added to a copy of the original Cholera deaths dataframe.

1. `gmLAT`: Geomasked point latitude
2. `gmLON`: Geomasked point longitude
3. `gmBEARING`: Random bearing (integer degrees, from 0 to 359)
4. `gmDISTANCE`: Random distance in kilometers, drawn between `gmBANDlo` and `gmBANDhi` (both stored in meters)
5. `gmPERF_noID`: Seconds it took to perform a geomasking operation on one point
6. `gmBANDlo`: Inner donut radius in meters
7. `gmBANDhi`: Outer donut radius in meters
8. `gmIDtries`: Maximum number of tries to reidentify original point

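If `reidentify` is set to `True`, the following analytic columns are also added:
1. `gmIDstatus`: Reidentification status, True or False
2. `gmIDruns`: Number of runs to reidentification (if not reidentified, this will be equal to `gmIDtries`)
3. `gmPERF_wID`: Seconds elapsed since the start of the `data_geomask()` run when this point's reidentification attempts finished. The timer starts once before the loop, so the value is cumulative across points.
4. `gmIDlat`: Reidentified point latitude
5. `gmIDlon`: Reidentified point longitude

The code also creates `gmIDrate` (initialized to 0.0 and never updated) and `gmIDeffort` (`gmIDruns` divided by `gmIDtries`).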

### `data_geomask()` function

The `data_geomask()` function is written to disk as a Python file in the folder `geoprivacy` as `data_geomask.py`. In Python code, the `data_geomask()` function below can be used as follows:

`from geoprivacy.data_geomask import data_geomask`

Explanation:
1. `from geoprivacy.data_geomask` loads the `data_geomask.py` file
2. `import data_geomask` imports the specific `data_geomask()` function below.

In [4]:
%%writefile geoprivacy/data_geomask.py
import pandas as pd
import time
from geoprivacy.donut_geomask import donut_geomask, distance_between_points

#pd.set_option('precision', 6)

def data_geomask(df, band_range, reidentify=False, tries=1, min_distance_risk=20):
    """    
    This returns an extended dataframe based on the original. It adds analytic columns
        used for computing performance evaluations and reidentification. 
        Note: The function can be adapted for other dataframes.
    
    Parameters:
    df (Pandas dataframe): Cholera dataframe containing deaths, lat and lon values
    band_range (tuple of integer tuples): Tuple of tuples of format ((low1, low2), (high1, high2)). 
        (low1, low2): tuple of integers (meters) representing randomization range for inner donut radius.
        (high1, high2) tuple of integers (meters) representing randomization range for outer donut radius.
    reidentify (bool): Boolean representing use (or not) of reidentification algorithm. True=use
        reidentification, False=do not use reidentification.
    tries (integer): Number of tries to do for reidentification, defaults to 1
    min_distance_risk (integer): Distance in meters between the original point and a reverse-geomasked
        point, used as the threshold for reidentification status. If it is 0 or negative, the
        sampled inner donut radius is used as the threshold instead.
            reidentified=True if distance_between_points(origin, reverse_geomasked) < min_distance_risk
            reidentified=False otherwise
    
    Returns:
    gm_df (dataframe): Modified copy of the original dataframe with the added and computed analytic columns
    
    """    
    # work on a copy so the caller's dataframe is left unchanged
    gm_df = df.copy()

    # add analytic columns, initialized to placeholder values
    gm_df['gmLAT'] = 0.0
    gm_df['gmLON'] = 0.0
    gm_df['gmBEARING'] = 0.0
    gm_df['gmDISTANCE'] = 0.0
    gm_df['gmPERF_noID'] = 0.0
    gm_df['gmBANDlo'] = 0
    gm_df['gmBANDhi'] = 0
    if reidentify == True:
        gm_df['gmIDstatus'] = False
        gm_df['gmIDruns'] = 0
        gm_df['gmPERF_wID'] = 0.0
        gm_df['gmIDlat'] = 0.0
        gm_df['gmIDlon'] = 0.0
        gm_df['gmIDrate'] = 0.0
        tick = time.perf_counter()
    # loop through deaths_df and create geomasked_deaths_df
    for index, row in df.iterrows(): 

        # band_range is passed unchanged to donut_geomask(), which draws the donut radii

        #set the point to geomask
        orig_point = (row['LAT'],row['LON'])

        gm_data = donut_geomask(band_range=band_range, orig_point=orig_point)

        #add the geomasked measures to the deaths_df
        gm_df.at[index,'gmLAT'] = gm_data['latitude']
        gm_df.at[index,'gmLON'] = gm_data['longitude']
        gm_df.at[index,'gmBEARING'] = gm_data['bearing']
        gm_df.at[index,'gmDISTANCE'] = gm_data['distance']
        gm_df.at[index,'gmPERF_noID'] = gm_data['elapsed_time']
        final_band_range = gm_data['final_band_range']
        gm_df.at[index,'gmBANDlo'] = final_band_range[0]
        gm_df.at[index,'gmBANDhi'] = final_band_range[1]
        gm_df.at[index,'gmIDtries'] = tries
        
        if reidentify == True:
            orig_point2 = (gm_data['latitude'], gm_data['longitude'])
            min_range = (min_distance_risk/1000 if min_distance_risk > 0 else final_band_range[0]/1000) 
            for i in range(0,tries):
                gm_data2 = donut_geomask(band_range=band_range, orig_point=orig_point2)
                p1 = orig_point
                p2 = (gm_data2['latitude'], gm_data2['longitude'])
                d1 = distance_between_points(orig=p1, dest=p2)
                if d1['geodesic_km'] < min_range:
                    gm_df.at[index, 'gmIDstatus'] = True
                    gm_df.at[index, 'gmIDlat'] = gm_data2['latitude']
                    gm_df.at[index, 'gmIDlon'] = gm_data2['longitude']
                    break
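            # get end time (tick was taken before the loop, so this is cumulative)
            tock = time.perf_counter()
            elapsed_time = tock-tick
            gm_df.at[index, 'gmIDruns'] = i+1 # actual runs needed to reidentify
            gm_df['gmIDtries'] = tries
            gm_df.at[index,'gmPERF_wID'] = elapsed_time

    # compute reidentification effort once, after all rows are processed;
    # gmIDruns only exists when reidentify is True
    if reidentify:
        gm_df['gmIDeffort'] = gm_df['gmIDruns'] / gm_df['gmIDtries']
    return gm_df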

Overwriting geoprivacy/data_geomask.py


### Reidentification using the Reverse Geomask approach

Note that to reidentify the original point, we just execute a reverse geomask, using the `donut_geomask` function but using the geomasked point as the origin.
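
The reverse geomask idea can be explored without any map data. The sketch below is a simplified simulation, not the notebook's code: it works in flat (planar) meters with the original point at the origin, uses fixed inner and outer radii, and the function names and the `seed` parameter are hypothetical. For each simulated point it applies one forward mask, then up to `tries` reverse masks from the masked location, and counts a reidentification when a reverse-masked point falls within `risk_m` of the original.

```python
import math
import random

def planar_donut_offset(inner_m, outer_m, rng):
    """Return a random (dx, dy) offset in meters whose length lies
    in [inner_m, outer_m]."""
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    distance = rng.uniform(inner_m, outer_m)
    return distance * math.sin(bearing), distance * math.cos(bearing)

def estimate_reidentification_rate(inner_m, outer_m, risk_m, tries,
                                   n_points, seed=0):
    """Estimate the fraction of points that a reverse geomask lands
    within risk_m of, given at most `tries` attempts per point."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        # Forward mask: the original point sits at (0, 0).
        mx, my = planar_donut_offset(inner_m, outer_m, rng)
        for _ in range(tries):
            # Reverse mask: start from the masked point.
            dx, dy = planar_donut_offset(inner_m, outer_m, rng)
            if math.hypot(mx + dx, my + dy) < risk_m:
                hits += 1
                break
    return hits / n_points

# Usage: a 50 m to 350 m donut, 5 m risk threshold, 20 tries per point.
rate = estimate_reidentification_rate(50, 350, 5, 20, n_points=10_000)
```

Varying `risk_m`, `tries` and the radii in this simulation gives a rough feel for how each setting affects reidentification before running the full dataframe workflow.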

### Variables to manipulate for reidentification evaluation

1. `band_range`: The first tuple is the lower range from which to pick a random distance value in meters for the radius of the inner circle of the donut. The second tuple is the upper range from which to pick a random distance value in meters for the radius of the outer circle of the donut.
2. `tries`: Maximum number of tries to attempt for reidentification. Reidentification here means being able to generate a point that is less than `min_distance_risk` from the original point.
3. `min_distance_risk`: Distance in meters from the original point within which a reverse-geomasked point counts as a reidentification.
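
Because `donut_geomask()` draws both radii and the final distance with `random.randrange()`, a badly formed `band_range` raises a `ValueError` partway through a run. The check below is a hypothetical helper (not part of the `geoprivacy` package) that sketches how the tuple could be validated up front, assuming integer meters as in the examples here.

```python
def validate_band_range(band_range):
    """Raise ValueError if band_range cannot be sampled with
    random.randrange() the way donut_geomask() does."""
    lower, upper = band_range
    for name, (start, stop) in (("lower", lower), ("upper", upper)):
        if not (isinstance(start, int) and isinstance(stop, int)):
            raise ValueError(f"{name} bounds must be integers (meters)")
        if start < 0 or start >= stop:
            raise ValueError(f"{name} range must satisfy 0 <= start < stop")
    # randrange(a, b) draws from a..b-1, so the largest possible inner
    # radius is lower[1] - 1 and the smallest possible outer radius is
    # upper[0]. The inner radius must stay strictly below the outer one.
    if lower[1] - 1 >= upper[0]:
        raise ValueError("inner radius range must end below the outer radius range")
    return band_range

# Usage: the setting used in this notebook passes the check.
validate_band_range(((50, 60), (200, 350)))
```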

In [5]:
band_range = ((50,60), (200,350))
tries = 20
min_distance_risk = 5

#band_range = ((20,30), (120,200))
#tries = 10
#min_distance_risk = 20

### Run `data_geomask()` with `reidentify` set to `True`

In [None]:
import geopy

In [6]:
from geoprivacy.data_geomask import data_geomask

In [7]:
deaths2_df = data_geomask(df=deaths_df, band_range=band_range, \
                             reidentify=True, tries=tries, min_distance_risk=min_distance_risk)

### Computing reidentification rate

In [8]:
true_df = deaths2_df[deaths2_df['gmIDstatus']==True]
true_df

Unnamed: 0,FID,DEATHS,LON,LAT,gmLAT,gmLON,gmBEARING,gmDISTANCE,gmPERF_noID,gmBANDlo,gmBANDhi,gmIDstatus,gmIDruns,gmPERF_wID,gmIDlat,gmIDlon,gmIDrate,gmIDtries,gmIDeffort
38,38,2,-0.139317,51.512893,51.512069,-0.139772,199.0,0.097,0.000208,52,313,True,8,0.360575,51.512911,-0.139359,0.0,20,0.4
44,44,1,-0.138096,51.512526,51.513741,-0.135856,49.0,0.206,0.000207,59,218,True,10,0.411704,51.51249,-0.138083,0.0,20,0.5
157,157,1,-0.134885,51.514032,51.513466,-0.132394,110.0,0.184,0.000197,53,286,True,14,1.435056,51.514035,-0.134899,0.0,20,0.7
194,194,2,-0.135357,51.514145,51.515112,-0.138399,297.0,0.237,0.000201,55,264,True,20,1.764037,51.514133,-0.135319,0.0,20,1.0


In [9]:
false_df = deaths2_df[deaths2_df['gmIDstatus']==False]
false_df

Unnamed: 0,FID,DEATHS,LON,LAT,gmLAT,gmLON,gmBEARING,gmDISTANCE,gmPERF_noID,gmBANDlo,gmBANDhi,gmIDstatus,gmIDruns,gmPERF_wID,gmIDlat,gmIDlon,gmIDrate,gmIDtries,gmIDeffort
0,0,3,-0.137930,51.513418,51.512934,-0.138095,192.0,0.055,0.001795,50,223,False,20,0.014047,0.0,0.0,0.0,20,1.0
1,1,2,-0.137883,51.513361,51.512438,-0.136808,144.0,0.127,0.000219,57,289,False,20,0.023916,0.0,0.0,0.0,20,1.0
2,2,1,-0.137853,51.513317,51.511198,-0.136047,152.0,0.267,0.000221,59,274,False,20,0.033204,0.0,0.0,0.0,20,1.0
3,3,1,-0.137812,51.513262,51.514109,-0.136673,40.0,0.123,0.000218,57,334,False,20,0.042626,0.0,0.0,0.0,20,1.0
4,4,4,-0.137767,51.513204,51.513249,-0.135680,88.0,0.145,0.000209,50,296,False,20,0.052054,0.0,0.0,0.0,20,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
245,245,3,-0.137108,51.514526,51.515986,-0.138352,332.0,0.184,0.000205,50,219,False,20,2.219584,0.0,0.0,0.0,20,1.0
246,246,2,-0.137065,51.514706,51.515100,-0.138360,296.0,0.100,0.000201,55,252,False,20,2.228433,0.0,0.0,0.0,20,1.0
247,247,1,-0.138474,51.512311,51.513297,-0.138363,4.0,0.110,0.000198,55,311,False,20,2.237610,0.0,0.0,0.0,20,1.0
248,248,1,-0.138123,51.511998,51.511405,-0.139107,226.0,0.095,0.000204,54,218,False,20,2.246856,0.0,0.0,0.0,20,1.0


In [None]:
## So all output comes through from Ipython
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [10]:
true_df['gmBANDlo'].min(), true_df['gmBANDlo'].max()
false_df['gmBANDlo'].min(), false_df['gmBANDlo'].max()
true_df['gmBANDhi'].min(), true_df['gmBANDhi'].max()
false_df['gmBANDhi'].min(), false_df['gmBANDhi'].max()
true_df['gmBANDlo'].mean(), true_df['gmBANDhi'].mean()
false_df['gmBANDlo'].mean(), false_df['gmBANDhi'].mean()

(52, 59)

In [16]:
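# Note: this divides by the number of non-reidentified records, so it is the ratio of
# reidentified to non-reidentified records (in percent), not the share of all records.
x_pct_ID = (len(true_df) * 100) / len(false_df)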
x_pct_ID

1.6260162601626016
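
The value above compares reidentified records with non-reidentified ones. If the share of all records is wanted instead, the denominator should be the total record count. The sketch below shows both calculations on a plain list of status flags. The function names and the example counts are illustrative only and are not taken from this run.

```python
def reidentification_rate_pct(statuses):
    """Percent of all records whose reidentification status is True."""
    statuses = list(statuses)
    if not statuses:
        raise ValueError("statuses must not be empty")
    return 100.0 * sum(bool(s) for s in statuses) / len(statuses)

def reidentified_to_not_ratio_pct(statuses):
    """Percent ratio of reidentified to non-reidentified records."""
    statuses = list(statuses)
    n_true = sum(bool(s) for s in statuses)
    n_false = len(statuses) - n_true
    if n_false == 0:
        raise ValueError("no non-reidentified records to divide by")
    return 100.0 * n_true / n_false

# Illustrative counts: 4 reidentified records out of 100.
flags = [True] * 4 + [False] * 96
share_of_all = reidentification_rate_pct(flags)        # 4.0
ratio_to_rest = reidentified_to_not_ratio_pct(flags)   # about 4.17
```

With the dataframe from this notebook, the same share could be obtained by passing `deaths2_df['gmIDstatus']` to the first function.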

In [17]:
reidentified_records = true_df['FID'].to_list()
reidentified_records

[38, 44, 157, 194]

In [18]:
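# Use new names so the `tries` setting defined earlier is not overwritten.
reidentified_df = deaths2_df[deaths2_df['gmIDstatus']==True]
mean_runs = reidentified_df['gmIDruns'].mean()
mean_tries = reidentified_df['gmIDtries'].mean()
effort_pct = mean_runs*100 / mean_tries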

effort_pct

65.0

## Generating evaluation metrics

### Weighted Mean Center values for non-geomasked and geomasked points

In [19]:
from geoprivacy.donut_geomask import weighted_mean_center, random_donuts_records, \
    distance_between_points, donut_geomask

deaths = list(deaths2_df['DEATHS'])
lats = list(deaths2_df['LAT'])
lons = list(deaths2_df['LON'])
gmlats = list(deaths2_df['gmLAT'])
gmlons = list(deaths2_df['gmLON'])

mean_center = weighted_mean_center(weights=deaths, lats=lats, lons=lons)
gm_mean_center = weighted_mean_center(weights=deaths, lats=gmlats, lons=gmlons)

In [20]:
mean_center

{'latitude': 51.51339831083845, 'longitude': -0.1364029734151329}

In [21]:
gm_mean_center

{'latitude': 51.51324382754777, 'longitude': -0.13638469150484692}
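
The weighted mean center is the weight-averaged latitude and longitude: each coordinate is multiplied by its weight (here, the number of deaths), the products are summed, and the sum is divided by the total weight. The pure-Python sketch below mirrors that arithmetic without pandas. The function name is hypothetical and the small input is made up so the result can be checked by hand.

```python
def weighted_mean_center_sketch(weights, lats, lons):
    """Return the weighted mean latitude and longitude as a dict."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    mean_lat = sum(w * lat for w, lat in zip(weights, lats)) / total
    mean_lon = sum(w * lon for w, lon in zip(weights, lons)) / total
    return {'latitude': mean_lat, 'longitude': mean_lon}

# Hand-checkable example: the point with weight 3 pulls the center
# three quarters of the way toward itself.
center = weighted_mean_center_sketch(weights=[3, 1],
                                     lats=[0.0, 4.0],
                                     lons=[0.0, 8.0])
# center == {'latitude': 1.0, 'longitude': 2.0}
```

Averaging latitude and longitude directly is a reasonable approximation for points spread over a few hundred meters, as in the Soho data, but it would not be appropriate for points spread across large distances or across the antimeridian.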

### Distance in meters between original and geomasked weighted mean center values

In [22]:
orig_lat = mean_center['latitude']
orig_lon = mean_center['longitude']
dest_lat = gm_mean_center['latitude']
dest_lon = gm_mean_center['longitude']

d1 = distance_between_points(orig=(orig_lat, orig_lon), dest=(dest_lat, dest_lon))

d1

{'geodesic_km': 0.017234308618875448,
 'great_circle_km': 0.017224306190022964,
 'geodesic_m': 17.23430861887545,
 'great_circle_m': 17.224306190022965}
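
As a cross-check that does not depend on geopy, the great-circle figure can be approximated with the haversine formula. This is a sketch under a stated assumption: a spherical Earth with a mean radius of 6371.009 km. The function name `haversine_m` is hypothetical.

```python
import math

# Assumption: spherical Earth with a mean radius of 6371.009 km.
EARTH_RADIUS_M = 6371009.0

def haversine_m(p1, p2):
    """Great-circle distance in meters between two (lat, lon) points
    given in decimal degrees."""
    lat1, lon1 = (math.radians(v) for v in p1)
    lat2, lon2 = (math.radians(v) for v in p2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Guard against rounding pushing a slightly above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))

# Usage: the two weighted mean centers printed earlier in this notebook.
original_center = (51.51339831083845, -0.1364029734151329)
geomasked_center = (51.51324382754777, -0.13638469150484692)
shift_m = haversine_m(original_center, geomasked_center)
```

The result should land close to the `great_circle_m` value above. The geodesic value differs slightly because it models the Earth as an ellipsoid.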

## Visualizing Donut Geomasked Death Points with Folium

### Generate `max_random` number of donuts in Folium

We do not generate donuts for all points because this will present a very cluttered visualization. The following variables are important for generating random donuts:

1. `max_random`: Number of donuts to draw in Folium
2. `random_record_list`: List that holds Cholera death records we will draw donuts for

In [23]:
range_max = len(deaths_df)
max_random = 2

random_record_list = random_donuts_records(range_max=range_max, max_random=max_random)

random_record_list

[220, 110]
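
Picking a few distinct record numbers can also be done with `random.sample()` from the standard library, which always returns exactly the requested number of unique values. The sketch below is an alternative to `random_donuts_records()`, not a copy of it. The function name and the optional `seed` parameter are hypothetical additions that make a map reproducible between runs.

```python
import random

def sample_record_indices(n_records, n_samples, seed=None):
    """Return n_samples distinct row positions in range(n_records).
    If n_samples exceeds n_records, every position is returned."""
    rng = random.Random(seed)
    n_samples = min(n_samples, n_records)
    return rng.sample(range(n_records), n_samples)

# Usage: choose two records out of 249 to draw donuts for.
donut_rows = sample_record_indices(n_records=249, n_samples=2, seed=42)
```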

### Draw the Folium map visualization for donut geomasking

In [24]:
import folium
from folium import plugins

random_record_list = random_donuts_records(range_max=range_max, max_random=max_random)

SOHO_COORDINATES = (51.513578, -0.136722)
m1 = folium.Map(SOHO_COORDINATES,
              zoom_start=17,
              tiles='cartodbpositron',
              prefer_canvas=True)

for index,row in deaths2_df.iterrows():
    
    # Draw random donuts first
    if index in random_record_list:
        folium.Circle(location=[row['LAT'],row['LON']],
                    radius= row['gmBANDlo'],
                    color="black",
                    opacity=0.6,
                    weight=2,
                    fill=False).add_to(m1)
        folium.Circle(location=[row['LAT'],row['LON']],
                    radius= row['gmBANDhi'],
                    color="black",
                    opacity=0.6,
                    weight=2,
                    fill=False).add_to(m1)

    # orig point popup text
    orig_text = '<b>Orig Point (lat/lon):</b> '+'('+str(row['LAT'])+', '+\
                 str(row['LON'])+')'+'<br/>'+\
                 '<b>Bearing (degrees):</b> '+str(row['gmBEARING'])+'<br/>'+\
                 '<b>Distance (km):</b> '+str(row['gmDISTANCE']) +'<br/>'+\
                 '<b>Feature ID:</b> '+str(row['FID'])
    # popup for orig point
    popup_orig = folium.Popup(orig_text, max_width=200)
    
    folium.Circle(location=[row['LAT'],row['LON']],
                radius=(2 if index in random_record_list else 1),
                color="red",
                opacity=(1 if index in random_record_list else 0.3),
                popup=popup_orig,
                fill_color='red',
                fill_opacity=(1 if index in random_record_list else 0.4),
                fill=True).add_to(m1)
    
    # dest point popup text
    dest_text = '<b>Dest Point (lat/lon):</b> '+'('+str(row['gmLAT'])+', '+\
                 str(row['gmLON'])+')'+'<br/>'+\
                 '<b>Feature ID:</b> '+str(row['FID'])
    # popup for dest point
    popup_dest = folium.Popup(dest_text, max_width=200)

    folium.Circle(location=[row['gmLAT'],row['gmLON']],
                radius= (2 if index in random_record_list else 1),
                color="blue",
                opacity=(1 if index in random_record_list else 0.3),
                fill_color='blue',
                fill_opacity=(1 if index in random_record_list else 0.4),
                popup=popup_dest,
                fill=True).add_to(m1)
    
    origin = [row['LAT'], row['LON']]
    destination = [row['gmLAT'], row['gmLON']]
    folium.PolyLine([origin, destination],
                    opacity=(1 if index in random_record_list else 0.3) , 
                    color='black', 
                    weight=(2 if index in random_record_list else 0.5)
                   ).add_to(m1)

# mean center popup text
mean_center_text = '<b>WEIGHTED MEAN CENTER, ORIGINAL POINTS</b><br/>'+\
             '<b>Point (lat/lon):</b> '+'('+str(mean_center['latitude'])+', '+\
             str(mean_center['longitude'])+')'+'<br/>'+\
             '<b>Geodesic Distance from Dest Pt (km):</b> '+str(d1['geodesic_km'])
# popup for mean center
popup_mean_center = folium.Popup(mean_center_text, max_width=200)

folium.Circle(location=[mean_center['latitude'],mean_center['longitude']],
            radius= 4,
            color="red",
            weight=3,
            fill_color='yellow',
            fill_opacity=1,
            popup=popup_mean_center,
            fill=True).add_to(m1)

# geomasked mean center popup text
gm_mean_center_text = '<b>WEIGHTED MEAN CENTER, GEOMASKED POINTS</b><br/>'+\
             '<b>Point (lat/lon):</b> '+'('+str(gm_mean_center['latitude'])+', '+\
             str(gm_mean_center['longitude'])+')'+'<br/>'+\
             '<b>Geodesic Distance from Orig Pt (km):</b> '+str(d1['geodesic_km'])
# popup for geomasked mean center
popup_gm_mean_center = folium.Popup(gm_mean_center_text, max_width=200)

folium.Circle(location=[gm_mean_center['latitude'],gm_mean_center['longitude']],
            radius= 4,
            color="blue",
            weight=3,
            fill_color='yellow',
            fill_opacity=1,
            popup=popup_gm_mean_center,
            fill=True).add_to(m1)

plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True
).add_to(m1)

plugins.MiniMap().add_to(m1)


In [None]:
m1

### Draw Folium map visualization for reidentification after geomasking

In [25]:
import folium
from folium import plugins

SOHO_COORDINATES = (51.513578, -0.136722)
m2 = folium.Map(SOHO_COORDINATES,
              zoom_start=17,
              tiles='cartodbpositron',
              prefer_canvas=True)

for index,row in deaths2_df.iterrows():
    
    # orig point popup text
    orig_text = '<b>Orig Point (lat/lon):</b> '+'('+str(row['LAT'])+', '+\
                 str(row['LON'])+')'+'<br/>'+\
                 '<b>Bearing (degrees):</b> '+str(row['gmBEARING'])+'<br/>'+\
                 '<b>Distance (km):</b> '+str(row['gmDISTANCE']) +'<br/>'+\
                 '<b>Feature ID:</b> '+str(row['FID'])
    # popup for orig point
    popup_orig = folium.Popup(orig_text, max_width=200)
    
    folium.Circle(location=[row['LAT'],row['LON']],
                radius=(2 if index in reidentified_records else 1),
                color="red",
                opacity=(1 if index in reidentified_records else 0.3),
                popup=popup_orig,
                fill_color='red',
                fill_opacity=(1 if index in reidentified_records else 0.4),
                fill=True).add_to(m2)
    
    # dest point popup text
    dest_text = '<b>Geomasked Point (lat/lon):</b> '+'('+str(row['gmLAT'])+', '+\
                 str(row['gmLON'])+')'+'<br/>'+\
                 '<b>Feature ID:</b> '+str(row['FID'])
    # popup for dest point
    popup_dest = folium.Popup(dest_text, max_width=200)

    folium.Circle(location=[row['gmLAT'],row['gmLON']],
                radius= (2 if index in reidentified_records else 1),
                color="blue",
                opacity=(1 if index in reidentified_records else 0.3),
                fill_color='blue',
                fill_opacity=(1 if index in reidentified_records else 0.4),
                popup=popup_dest,
                fill=True).add_to(m2)
    
    origin = [row['LAT'], row['LON']]
    destination = [row['gmLAT'], row['gmLON']]
    folium.PolyLine([origin, destination],
                    opacity=(1 if index in reidentified_records else 0.3) , 
                    color='black', 
                    weight=(2 if index in reidentified_records else 0.5)
                   ).add_to(m2)

    if index in reidentified_records:
        # reidentified point popup text
        id_text = '<b>Reidentified Point (lat/lon):</b> '+'('+str(row['gmIDlat'])+', '+\
                     str(row['gmIDlon'])+')'+'<br/>'+\
                     '<b>Feature ID:</b> '+str(row['FID'])
        # popup for reidentified point
        popup_id = folium.Popup(id_text, max_width=200)

        folium.Circle(location=[row['gmIDlat'],row['gmIDlon']],
                    radius=(2 if index in reidentified_records else 1),
                    color="black",
                    opacity=(1 if index in reidentified_records else 0.3),
                    popup=popup_id,
                    fill_color='black',
                    fill_opacity=(1 if index in reidentified_records else 0.4),
                    fill=True).add_to(m2)

        origin = [row['gmLAT'], row['gmLON']]
        destination = [row['gmIDlat'], row['gmIDlon']]
        folium.PolyLine([origin, destination],
                        opacity=(1 if index in reidentified_records else 0.3) , 
                        color='black', 
                        weight=(2 if index in reidentified_records else 0.5)
                       ).add_to(m2)

# mean center popup text
mean_center_text = '<b>WEIGHTED MEAN CENTER, ORIGINAL POINTS</b><br/>'+\
             '<b>Point (lat/lon):</b> '+'('+str(mean_center['latitude'])+', '+\
             str(mean_center['longitude'])+')'+'<br/>'+\
             '<b>Geodesic Distance from Dest Pt (km):</b> '+str(d1['geodesic_km'])
# popup for mean center
popup_mean_center = folium.Popup(mean_center_text, max_width=200)

folium.Circle(location=[mean_center['latitude'],mean_center['longitude']],
            radius= 4,
            color="red",
            weight=3,
            fill_color='yellow',
            fill_opacity=1,
            popup=popup_mean_center,
            fill=True).add_to(m2)

# geomasked mean center popup text
gm_mean_center_text = '<b>WEIGHTED MEAN CENTER, GEOMASKED POINTS</b><br/>'+\
             '<b>Point (lat/lon):</b> '+'('+str(gm_mean_center['latitude'])+', '+\
             str(gm_mean_center['longitude'])+')'+'<br/>'+\
             '<b>Geodesic Distance from Orig Pt (km):</b> '+str(d1['geodesic_km'])
# popup for geomasked mean center
popup_gm_mean_center = folium.Popup(gm_mean_center_text, max_width=200)

folium.Circle(location=[gm_mean_center['latitude'],gm_mean_center['longitude']],
            radius= 4,
            color="blue",
            weight=3,
            fill_color='yellow',
            fill_opacity=1,
            popup=popup_gm_mean_center,
            fill=True).add_to(m2)

plugins.Fullscreen(
    position='topright',
    title='Expand me',
    title_cancel='Exit me',
    force_separate_button=True
).add_to(m2)

plugins.MiniMap().add_to(m2)



In [None]:
m2

In [26]:
deaths2_df.to_pickle('resources/deaths2_df.pickle')

In [27]:
!ls -la resources/*.pickle

-rw-r--r--@ 1 hermantolentino  staff      37461 May  4 05:35 resources/deaths2_df.pickle
-rw-r--r--@ 1 hermantolentino  staff  103276378 Jan 24  2022 resources/gm_df.pickle
-rw-r--r--@ 1 hermantolentino  staff     303427 Jan 24  2022 resources/reidentify_eval_df.pickle
-rw-r--r--@ 1 hermantolentino  staff      16890 Jan 24  2022 resources/wmc_eval_df.pickle


## References

### Donut Geomasking

### Geopy
1. GeoPy Documentation: https://geopy.readthedocs.io/en/stable/
2. GeoPy GitHub Repository: https://github.com/geopy/geopy