# Semantic region using Y@N data

## WeNet project - exploratory notebook

We will use user "374554f1-a18c-4636-8be1-d290dc16f55e", selected based on our previous conclusions.

In [1]:
import pandas as pd
import gmaps
from wenet_models import LocationPoint, UserPlaceTimeOnly
from wenet_algo import estimate_stay_points, estimate_stay_regions, labelize_stay_region
from wenet_tools import time_difference_ms
from datetime import datetime

### The locations can be extracted from the sensors

In [2]:
user = '374554f1-a18c-4636-8be1-d290dc16f55e'
df = pd.read_csv(f'/idiap/temp/wdroz/locations/{user}_location.csv')

We can check how many records we have for this user:

In [3]:
len(df)

8489

With the `head` method, we can get the first rows of the dataframe. This is useful to get an idea of the data.

In [4]:
df.head()

| Unnamed: 0 | userid | night | type | timestamp | timezone | local_time | source | latitude | longitude | speed | accuracy | provider | bearing |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 374554f1-a18c-4636-8be1-d290dc16f55e | 20140927 | Location | 1411842910 | 7200 | 20140927203510 | 358270055852872 | 46.520883 | 6.638972 | 0.0 | 986.0 | network | 0.0 |
| 1 | 374554f1-a18c-4636-8be1-d290dc16f55e | 20140927 | Location | 1411842927 | 7200 | 20140927203527 | 358270055852872 | 46.520883 | 6.638972 | 0.0 | 986.0 | network | 0.0 |
| 2 | 374554f1-a18c-4636-8be1-d290dc16f55e | 20140927 | Location | 1411842953 | 7200 | 20140927203553 | 358270055852872 | 46.53472 | 6.538107 | 0.0 | 52.0 | gps | 0.0 |
| 3 | 374554f1-a18c-4636-8be1-d290dc16f55e | 20140927 | Location | 1411842960 | 7200 | 20140927203600 | 358270055852872 | 46.534637 | 6.538219 | 0.0 | 20.0 | gps | 0.0 |
| 4 | 374554f1-a18c-4636-8be1-d290dc16f55e | 20140927 | Location | 1411842965 | 7200 | 20140927203605 | 358270055852872 | 46.534596 | 6.538231 | 0.0 | 20.0 | gps | 0.0 |


By checking the accuracy distribution, we can choose a threshold to filter out imprecise points.

In [5]:
df['accuracy'].describe()

count    8489.000000
mean       77.449671
std       230.766469
min         3.000000
25%        15.000000
50%        24.000000
75%        36.000000
max      3147.000000
Name: accuracy, dtype: float64

Since 75% of the accuracy values are at most 36, we keep the points with an accuracy of at most 37 and remove the others.
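Instead of hard-coding the value, the threshold can also be derived from the distribution itself. A minimal sketch with `Series.quantile` and a boolean mask, on a toy DataFrame whose four rows reuse values from the `head()` output above:

```python
import pandas as pd

# Toy data standing in for the location dataframe (four rows from head())
df = pd.DataFrame({
    'latitude': [46.520883, 46.534720, 46.534637, 46.534596],
    'longitude': [6.638972, 6.538107, 6.538219, 6.538231],
    'accuracy': [986.0, 52.0, 20.0, 20.0],
})

# Use the 75th percentile of the accuracy values as the threshold
threshold = df['accuracy'].quantile(0.75)

# Keep only the points whose accuracy value is at most the threshold
filtered = df[df['accuracy'] <= threshold]
```

With only four rows, the percentile is pulled up by the 986.0 outlier (it gives 285.5). On the full dataset, `quantile(0.75)` returns the same 36 that `describe()` reports, since `describe()` computes its percentiles with `quantile`.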

There are many possible ways to transform the data from the dataframe into our custom Python objects. This time, `df.iterrows` does the job.

In [6]:
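locations = []
for index, row in df.iterrows():
    accuracy = row['accuracy']
    # Skip imprecise points (accuracy value above the threshold chosen above)
    if accuracy > 37:
        continue
    # fromtimestamp() converts to the time zone of the machine running the notebook
    pts_t = datetime.fromtimestamp(row['timestamp'])
    location = LocationPoint(pts_t, row['latitude'], row['longitude'], accuracy)
    locations.append(location)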
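`df.iterrows` builds a Series for every row, which gets slow on larger data; `df.itertuples` is usually faster. The sketch below also avoids relying on the time zone of the machine running the notebook. Judging from the `head()` output, the `timezone` column appears to be the offset from UTC in seconds (1411842910 + 7200 s gives the `local_time` 20140927203510), so the sketch builds timezone-aware datetimes from it. `LocationPoint` here is a hypothetical stand-in dataclass, not the `wenet_models` class, and `to_locations` is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

import pandas as pd


@dataclass
class LocationPoint:
    """Hypothetical stand-in for wenet_models.LocationPoint."""
    t: datetime
    lat: float
    lng: float
    accuracy: float


def to_local_datetime(ts, offset_s):
    """Convert a Unix timestamp to an aware datetime using a UTC offset in seconds."""
    tz = timezone(timedelta(seconds=int(offset_s)))
    return datetime.fromtimestamp(int(ts), tz=tz)


def to_locations(df, max_accuracy=37):
    """Drop imprecise rows and convert the remaining ones to LocationPoint objects."""
    kept = df[df['accuracy'] <= max_accuracy]
    return [
        LocationPoint(to_local_datetime(r.timestamp, r.timezone), r.latitude, r.longitude, r.accuracy)
        for r in kept.itertuples(index=False)
    ]


# Two rows taken from the head() output above
sample = pd.DataFrame({
    'timestamp': [1411842910, 1411842960],
    'timezone': [7200, 7200],
    'latitude': [46.520883, 46.534637],
    'longitude': [6.638972, 6.538219],
    'accuracy': [986.0, 20.0],
})
locations = to_locations(sample)
```

If the `wenet_models` classes expect naive datetimes, `.replace(tzinfo=None)` keeps the local wall-clock time while dropping the time zone.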

When the user has been recorded at least twice near the same location within a relatively short time interval, we can derive a stay point from these records. Here, `time_min_ms` is set to 2 minutes.

In [7]:
stay_points = estimate_stay_points(locations, time_min_ms=2*60*1000)
len(stay_points)

198

From the 8489 location rows, we extracted 198 "stay points".
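`estimate_stay_points` comes from the project's `wenet_algo` module, and its exact rules are not shown here. For intuition, below is a minimal sketch of the classic stay-point detection approach: a run of time-ordered points that stays within a distance threshold of its first point for at least a minimum duration becomes one stay point at the mean position. The `Point` type, the `distance_max_m` parameter and the haversine helper are assumptions for illustration, not the wenet API.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    t_ms: int  # timestamp in milliseconds
    lat: float
    lng: float


def haversine_m(a, b):
    """Great-circle distance between two points in meters."""
    r = 6371000.0
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lng - a.lng)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def stay_points(points, distance_max_m=50, time_min_ms=2 * 60 * 1000):
    """Classic stay-point detection on time-ordered points (hypothetical sketch)."""
    result = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        # Extend the window while points stay close to the anchor point i
        while j < n and haversine_m(points[i], points[j]) <= distance_max_m:
            j += 1
        window = points[i:j]
        if window[-1].t_ms - window[0].t_ms >= time_min_ms:
            lat = sum(p.lat for p in window) / len(window)
            lng = sum(p.lng for p in window) / len(window)
            result.append(Point(window[0].t_ms, lat, lng))
            i = j  # continue after the detected stay
        else:
            i += 1
    return result


# Made-up points: three close records over 3 minutes, then a far one
pts = [
    Point(0, 46.5347, 6.5381),
    Point(60_000, 46.5348, 6.5382),
    Point(180_000, 46.5347, 6.5382),
    Point(240_000, 46.5200, 6.6300),
]
sps = stay_points(pts)
```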

From these stay points, we can extract clusters of points, which we call "stay regions".

In [8]:
stay_regions = estimate_stay_regions(stay_points, distance_threshold_m=20)
len(stay_regions)

7

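For comparison, we also compute the stay regions with `accuracy_aware=False` (stored as `old_stay_regions`):

In [9]: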
old_stay_regions = estimate_stay_regions(stay_points, distance_threshold_m=20, accuracy_aware=False)
len(old_stay_regions)

7

From the 198 stay points, we extracted 7 stay regions with both settings.
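The clustering inside `estimate_stay_regions` is also project code not shown here. A minimal sketch of one simple way to group stay points into regions: greedily assign each point to the first region whose centroid lies within `distance_threshold_m`, otherwise start a new region, and keep a bounding box per region (the notebook later draws regions as rectangles from their top-left and bottom-right corners). This greedy scheme and the `Region`/`greedy_regions` names are assumptions for illustration; the sketch ignores point accuracy and is not necessarily what `wenet_algo` does.

```python
import math
from dataclasses import dataclass, field


def approx_distance_m(lat1, lng1, lat2, lng2):
    """Equirectangular approximation of the distance in meters (fine for short distances)."""
    r = 6371000.0
    x = math.radians(lng2 - lng1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return r * math.hypot(x, y)


@dataclass
class Region:
    members: list = field(default_factory=list)  # (lat, lng) tuples

    @property
    def center(self):
        n = len(self.members)
        return (sum(p[0] for p in self.members) / n, sum(p[1] for p in self.members) / n)

    @property
    def bbox(self):
        """((top_left_lat, top_left_lng), (bottom_right_lat, bottom_right_lng)).
        Assumes the region does not cross the antimeridian."""
        lats = [p[0] for p in self.members]
        lngs = [p[1] for p in self.members]
        return (max(lats), min(lngs)), (min(lats), max(lngs))


def greedy_regions(points, distance_threshold_m=20):
    """Group (lat, lng) points: join the first region whose center is close enough."""
    regions = []
    for lat, lng in points:
        for region in regions:
            c_lat, c_lng = region.center
            if approx_distance_m(lat, lng, c_lat, c_lng) <= distance_threshold_m:
                region.members.append((lat, lng))
                break
        else:
            # No region close enough: start a new one
            regions.append(Region([(lat, lng)]))
    return regions


# Made-up stay points: two about 5 m apart, one about 7 km away
points = [(46.53489, 6.53810), (46.53492, 6.53815), (46.52050, 6.63011)]
regions = greedy_regions(points, distance_threshold_m=20)
```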

Users had to fill in a survey about what they drank, so we can select the survey data for our selected user.

Depending on your needs, you can add more logic and rules when extracting the relevant data.

In [15]:
import json
drink_places = []
with open('/idiap/home/wdroz/Downloads/drinkplace_senmantic.json', 'r') as f:
    # One JSON object per line
    for line in f:
        line = line.strip()
        if not line:
            continue
        try:
            drink_places.append(json.loads(line))
        except json.JSONDecodeError:
            # Skip malformed lines
            pass

In [16]:
drink_places = [d for d in drink_places if d['user'] == user]

In [23]:
for d in drink_places:
    print(d['video_timestamp'])

2014-09-27 21:12:46
2014-09-27 23:00:40
2014-09-28 02:25:06
2014-10-04 00:10:29
2014-10-03 20:05:08
2014-10-04 21:25:25
2014-10-05 02:38:50
2014-10-10 20:16:44
2014-10-10 23:16:01
2014-10-11 21:03:41
2014-10-11 23:55:27
2014-10-17 20:21:58
2014-10-18 02:16:32
2014-10-18 20:53:33

2014-10-25 03:58:21
2014-10-25 20:55:26
2014-10-25 21:24:54
2014-10-25 21:45:49
2014-10-25 22:32:12

2014-10-26 02:26:24
2014-10-31 20:44:48
2014-11-01 20:50:25
2014-11-01 23:11:47
2014-11-02 02:59:20
2014-11-07 20:55:12
2014-11-08 20:46:15
2014-11-08 21:17:44
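The blank lines in the output suggest that some entries have an empty `video_timestamp`, and the entries are not in chronological order. A minimal sketch with `pd.to_datetime(..., errors='coerce')`, which turns empty or unparseable values into `NaT` so they can be dropped before sorting (the sample list reuses a few values from the output above, plus an empty string):

```python
import pandas as pd

raw = ['2014-10-04 00:10:29', '2014-10-03 20:05:08', '', '2014-10-04 21:25:25']

# Empty or unparseable values become NaT instead of raising
times = pd.to_datetime(pd.Series(raw), format='%Y-%m-%d %H:%M:%S', errors='coerce')

# Drop missing timestamps and sort chronologically
clean = times.dropna().sort_values().reset_index(drop=True)
```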


Each survey entry is converted to a place label using its `ptype_rec` code and matched with the stay points using a maximum time difference of 3 minutes (`max_delta_time_ms`). The resulting user places are then used to label the stay regions.

In [24]:
# Mapping from the survey's `ptype_rec` codes to place labels
PLACES = {
    1: 'Bar',
    2: 'Club',
    3: 'Restaurant',
    4: 'Private',
    5: 'School/Uni',
    6: 'Street/Urban',
    7: 'Indoor Recreational',
    8: 'Events',
    9: 'Culture',
    10: 'Traveling',
    11: 'Other',
    12: 'Unknown',
    13: 'Outdoor/Park',
}
user_places = []
for drink_place in drink_places:
    try:
        pts_t = datetime.strptime(drink_place['video_timestamp'], "%Y-%m-%d %H:%M:%S")
        place = PLACES[drink_place['ptype_rec']]
        user_place_time_only = UserPlaceTimeOnly(pts_t, place, user)
        user_place = user_place_time_only.to_user_place_from_stay_points(stay_points, max_delta_time_ms=1000*60*3)
        if user_place is not None:
            user_places.append(user_place)
    except Exception:
        # Skip entries with a missing/invalid timestamp or an unknown place type
        continue
labelled_stay_regions = labelize_stay_region(stay_regions, user_places)
# Stay regions that did not receive a label
stay_regions_set = set(stay_regions) - labelled_stay_regions
labelled_stay_regions

{<wenet_models.LabelledStayRegion at 0x7f1098649898>,
 <wenet_models.LabelledStayRegion at 0x7f1098649a58>,
 <wenet_models.LabelledStayRegion at 0x7f1098649d68>,
 <wenet_models.LabelledStayRegion at 0x7f1098649be0>}
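`to_user_place_from_stay_points` is part of `wenet_models`; judging from its `max_delta_time_ms` argument, it attaches a survey entry to a stay point that is close enough in time. A minimal sketch of that idea with the standard `bisect` module, assuming the stay-point times are sorted (the `closest_in_time` name and the made-up times are hypothetical). The sketch treats each stay point as a single timestamp; stay points often carry an arrival and a departure time, in which case the interval would be checked instead.

```python
import bisect
from datetime import datetime


def closest_in_time(times, target, max_delta_ms):
    """Return the index of the time in the sorted list `times` closest to `target`,
    or None if the closest one is more than `max_delta_ms` away."""
    pos = bisect.bisect_left(times, target)
    # Only the neighbours around the insertion point can be the closest
    candidates = [k for k in (pos - 1, pos) if 0 <= k < len(times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda k: abs((times[k] - target).total_seconds()))
    if abs((times[best] - target).total_seconds()) * 1000 > max_delta_ms:
        return None
    return best


# Made-up stay-point times
stay_times = [
    datetime(2014, 9, 27, 20, 35, 10),
    datetime(2014, 9, 27, 21, 11, 0),
    datetime(2014, 9, 27, 23, 30, 0),
]
survey_time = datetime(2014, 9, 27, 21, 12, 46)
idx = closest_in_time(stay_times, survey_time, max_delta_ms=1000 * 60 * 3)
```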

In [25]:
for user_place in user_places:
    print(user_place)

46.53486, 6.53813 [2014-09-27 21:12:46]
46.47458, 6.42432 [2014-10-17 20:21:58]
46.52997, 6.56387 [2014-10-25 20:55:26]
46.52039, 6.63015 [2014-10-25 22:32:12]
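`labelize_stay_region` is also project code. One plausible way to label regions, sketched here under that assumption: a region gets the most frequent label among the user places that fall inside its bounding box. Regions and places are plain tuples and the `label_regions` name is hypothetical; the coordinates below are made up.

```python
from collections import Counter


def label_regions(regions, places):
    """regions: {name: ((top_left_lat, top_left_lng), (bottom_right_lat, bottom_right_lng))}
    places: list of (lat, lng, label).
    Returns {name: majority label} for the regions that contain at least one place."""
    labels = {}
    for name, ((tl_lat, tl_lng), (br_lat, br_lng)) in regions.items():
        inside = [
            label for lat, lng, label in places
            if br_lat <= lat <= tl_lat and tl_lng <= lng <= br_lng
        ]
        if inside:
            labels[name] = Counter(inside).most_common(1)[0][0]
    return labels


regions = {
    'A': ((46.5350, 6.5380), (46.5347, 6.5383)),
    'B': ((46.5206, 6.6300), (46.5203, 6.6303)),
}
places = [
    (46.53486, 6.53813, 'Private'),
    (46.53488, 6.53812, 'Private'),
    (46.53487, 6.53811, 'Bar'),
]
result = label_regions(regions, places)
```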


We need to convert our data into the format expected by the `gmaps` module:

In [26]:
all_stay_regions_center = [[s._lat, s._lng] for s in stay_regions_set]
all_stay_regions_center

[[41.4161011157307, 2.1319896648474193],
 [46.5290947959957, 6.5664955046249585],
 [41.415259622482694, 2.131734805555073]]

In [27]:
all_labelled_stay_regions_center = [[s._lat, s._lng] for s in labelled_stay_regions]
all_labelled_stay_regions_center

[[46.47467576797251, 6.424412376108274],
 [46.53489243668288, 6.538105543136973],
 [46.52050499938503, 6.6301146789473036],
 [46.53011505511413, 6.563756780453434]]

In [28]:
all_labelled_stay_regions_label = [s._label for s in labelled_stay_regions]
all_labelled_stay_regions_label

['Restaurant', 'Private', 'Bar', 'Private']

In [29]:
def region_rect(s, fill_color, stroke_color):
    """Build a gmaps rectangle polygon from a stay region's bounding box."""
    return gmaps.Polygon(
        [(s._topleft_lat, s._topleft_lng), (s._topleft_lat, s._bottomright_lng),
         (s._bottomright_lat, s._bottomright_lng), (s._bottomright_lat, s._topleft_lng)],
        fill_color=fill_color, stroke_color=stroke_color, stroke_weight=6)

all_stay_regions_rect = [region_rect(s, '#FF8C00', '#FF4500') for s in stay_regions_set]

In [30]:
all_old_stay_regions_rect = [region_rect(s, '#000000', '#000000') for s in old_stay_regions]

In [31]:
all_labelled_stay_regions_rect = [region_rect(s, '#FF8CF0', '#FF45F0') for s in labelled_stay_regions]

Don't write your keys or passwords directly in your notebooks. Store them in an external file with restrictive permissions (`chmod 600`), and be careful not to commit that file.

By the way, if you search GitHub's latest commits for "google api key" ([here](https://github.com/search?o=desc&q=google+api+key&s=committer-date&type=Commits)), you will find many inexperienced users pushing Google API keys to their public repositories. Be careful: many attackers scan GitHub for this kind of credential.

In [32]:
# Read the key from an external file; strip() removes the trailing newline
with open('google_api_key.txt', 'r') as f:
    key = f.read().strip()
gmaps.configure(api_key=key)
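A minimal sketch of the advice above: read the key from an environment variable if it is set, otherwise from a file, and refuse the file if its group or other permission bits are set. The variable name `GOOGLE_API_KEY` and the `load_api_key` helper are assumptions for illustration.

```python
import os
import stat


def load_api_key(path='google_api_key.txt', env_var='GOOGLE_API_KEY'):
    """Return the key from `env_var` if set, else from `path` (mode must be like 600)."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Reject files that group members or others can read, write or execute
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f'{path} has mode {oct(mode)}; run: chmod 600 {path}')
    with open(path, 'r') as f:
        # strip() drops the trailing newline most editors add
        return f.read().strip()
```

Usage: `gmaps.configure(api_key=load_api_key())`. The permission check only makes sense on POSIX systems.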

In [33]:
raw_locations_df = df[['latitude', 'longitude']]
all_stay_regions_center_layer = gmaps.marker_layer(all_stay_regions_center)
all_labelled_stay_regions_center_layer = gmaps.marker_layer(all_labelled_stay_regions_center, label=all_labelled_stay_regions_label)
all_stay_regions_rect_layer = gmaps.drawing_layer(features=all_stay_regions_rect)
all_old_stay_regions_rect_layer = gmaps.drawing_layer(features=all_old_stay_regions_rect)

all_labelled_stay_regions_rect_layer = gmaps.drawing_layer(features=all_labelled_stay_regions_rect)
stay_points_layer = gmaps.symbol_layer([[p._lat, p._lng] for p in stay_points], fill_color='#00FFFF', stroke_color='#00FFFF', scale=2)
place_points_layer = gmaps.symbol_layer([[p._lat, p._lng] for p in user_places], fill_color='#0000FF', stroke_color='#0000FF', scale=4)

locations_layer = gmaps.heatmap_layer(raw_locations_df)
fig = gmaps.figure()
fig.add_layer(stay_points_layer)
fig.add_layer(all_stay_regions_center_layer)
fig.add_layer(all_labelled_stay_regions_center_layer)
fig.add_layer(place_points_layer)
fig.add_layer(all_stay_regions_rect_layer)
fig.add_layer(all_old_stay_regions_rect_layer)
fig.add_layer(all_labelled_stay_regions_rect_layer)
fig.add_layer(locations_layer)

locations_layer.max_intensity = 100
locations_layer.point_radius = 10
fig.map_type = 'SATELLITE'
fig
