# Semantic region using Y@N data

## WeNet project - exploratory notebook

We will use the user "374554f1-a18c-4636-8be1-d290dc16f55e" from our previous conclusions.

In [1]:
import pandas as pd
import gmaps
from wenet_models import LocationPoint, UserPlaceTimeOnly
from wenet_algo import estimate_stay_points, estimate_stay_regions, labelize_stay_region
from wenet_tools import time_difference_ms
from datetime import datetime

### The locations can be extracted from the sensors

In [2]:
user = '374554f1-a18c-4636-8be1-d290dc16f55e'
df = pd.read_csv(f'/idiap/temp/wdroz/locations/{user}_location.csv')

We can check how many records we have for that user.

In [3]:
len(df)

8489

With the `head` method, we can get the first rows of the dataframe. That is useful when you need to get an idea of the data.

In [4]:
df.head()

Unnamed: 0,userid,night,type,timestamp,timezone,local_time,source,latitude,longitude,speed,accuracy,provider,bearing
0,374554f1-a18c-4636-8be1-d290dc16f55e,20140927,Location,1411842910,7200,20140927203510,358270055852872,46.520883,6.638972,0.0,986.0,network,0.0
1,374554f1-a18c-4636-8be1-d290dc16f55e,20140927,Location,1411842927,7200,20140927203527,358270055852872,46.520883,6.638972,0.0,986.0,network,0.0
2,374554f1-a18c-4636-8be1-d290dc16f55e,20140927,Location,1411842953,7200,20140927203553,358270055852872,46.53472,6.538107,0.0,52.0,gps,0.0
3,374554f1-a18c-4636-8be1-d290dc16f55e,20140927,Location,1411842960,7200,20140927203600,358270055852872,46.534637,6.538219,0.0,20.0,gps,0.0
4,374554f1-a18c-4636-8be1-d290dc16f55e,20140927,Location,1411842965,7200,20140927203605,358270055852872,46.534596,6.538231,0.0,20.0,gps,0.0


There are many possible ways to transform the data from the dataframe into our custom Python objects. `df.iterrows` does the job this time.

In [5]:
locations = []
for index, row in df.iterrows():
    pts_t = datetime.fromtimestamp(row['timestamp'])
    location = LocationPoint(pts_t, row['latitude'], row['longitude'])
    locations.append(location)
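One detail worth keeping in mind: `datetime.fromtimestamp` without a `tz` argument converts to the local timezone of the machine running the notebook, so the resulting times depend on where it runs. The sketch below is a possible alternative, assuming that the `timezone` column holds the UTC offset in seconds (7200 in the rows shown above, which is consistent with the `local_time` column). The helper name `to_local_datetime` is hypothetical, and whether `LocationPoint` accepts timezone-aware datetimes has not been checked here.

```python
from datetime import datetime, timedelta, timezone


def to_local_datetime(timestamp_s, utc_offset_s):
    """Convert a Unix timestamp (seconds) to an aware datetime at the recorded offset."""
    tz = timezone(timedelta(seconds=utc_offset_s))
    return datetime.fromtimestamp(timestamp_s, tz=tz)


# First row of the dataframe shown above: timestamp=1411842910, timezone=7200
first = to_local_datetime(1411842910, 7200)
print(first.isoformat())  # 2014-09-27T20:35:10+02:00
```

The printed value matches the `local_time` of the first row (20140927203510), whatever the timezone of the host.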

When the user has been recorded at least twice near the same location within a relatively short time interval, we can derive a stay point from these records.

In [6]:
stay_points = estimate_stay_points(locations, time_min_ms=2*60*1000)
len(stay_points)

303

From the 8489 location rows, we have extracted 303 "stay points".
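The implementation of `estimate_stay_points` is not shown in this notebook. To give an intuition of what a stay-point extraction can look like, here is a minimal sketch of a common approach: consecutive records that remain close to an anchor record for long enough are merged into one averaged point. The `Point` class, the `sketch_stay_points` function, the `distance_max_m` parameter and the reading of `time_min_ms` as a minimum duration are all assumptions made for illustration, not the actual `wenet_algo` code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Point:
    """Hypothetical stand-in for LocationPoint."""
    t: datetime
    lat: float
    lng: float


def haversine_m(a, b):
    """Great-circle distance in metres between two points."""
    lat1, lng1, lat2, lng2 = map(radians, (a.lat, a.lng, b.lat, b.lng))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))


def sketch_stay_points(points, time_min_ms=2 * 60 * 1000, distance_max_m=50):
    """Merge runs of consecutive points that stay within distance_max_m of the
    first point of the run for at least time_min_ms (assumed semantics)."""
    stay_points = []
    i = 0
    while i < len(points):
        j = i + 1
        # Extend the run while the next point is still close to the anchor
        while j < len(points) and haversine_m(points[i], points[j]) <= distance_max_m:
            j += 1
        group = points[i:j]
        duration_ms = (group[-1].t - group[0].t).total_seconds() * 1000
        if len(group) >= 2 and duration_ms >= time_min_ms:
            stay_points.append(Point(
                group[0].t,
                sum(p.lat for p in group) / len(group),
                sum(p.lng for p in group) / len(group),
            ))
        i = j
    return stay_points


# Made-up track: four records at the same place over 3 minutes, then a distant one
start = datetime(2014, 9, 27, 20, 36)
track = [Point(start + timedelta(minutes=m), 46.5346, 6.5382) for m in range(4)]
track.append(Point(start + timedelta(minutes=5), 46.5209, 6.6390))
print(len(sketch_stay_points(track)))  # 1
```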

From these stay points, it's possible to extract clusters of points. We call them "stay regions".

In [7]:
stay_regions = estimate_stay_regions(stay_points, distance_threshold_m=20)
len(stay_regions)

17

From the 303 stay points, we have extracted 17 regions.
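How `estimate_stay_regions` clusters the points is not shown here either. As an illustration only, the sketch below uses a simple greedy scheme: each point joins the first region whose centroid lies within `distance_threshold_m`, otherwise it starts a new region. The function name `sketch_stay_regions` and the choice of a greedy centroid-based rule are assumptions; the real function may use a different clustering method.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) pairs in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(h))


def sketch_stay_regions(stay_points, distance_threshold_m=20):
    """Greedy clustering of (lat, lng) tuples around region centroids."""
    regions = []  # each region is a list of (lat, lng) tuples
    for lat, lng in stay_points:
        for region in regions:
            c_lat = sum(p[0] for p in region) / len(region)
            c_lng = sum(p[1] for p in region) / len(region)
            if haversine_m(lat, lng, c_lat, c_lng) <= distance_threshold_m:
                region.append((lat, lng))
                break
        else:
            # No existing region is close enough: start a new one
            regions.append([(lat, lng)])
    return regions


# Two points a few metres apart and one point several kilometres away
points = [(46.52097, 6.62658), (46.52100, 6.62660), (46.47470, 6.42448)]
regions = sketch_stay_regions(points)
print(len(regions))  # 2
```

A greedy scheme like this depends on the order of the input, which is one reason a density-based method could be preferred in practice.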

The users had to fill in surveys about what they drank, so we can select the answers of our chosen user.

In [8]:
df_ambiance = pd.read_csv('/idiap/temp/wdroz/wenet/surveys/ambiance_survey.csv', sep=',', encoding="ISO-8859-1")
df_ambiance = df_ambiance[df_ambiance['user'] == user]

In [9]:
len(df_ambiance)

29

In [10]:
df_ambiance.head()

Unnamed: 0,user,source,timezone_id,timestamp,timezone_display_name,timezone_raw_offset,env_artsy_percent,env_dingy_percent,env_formal_percent,env_loud_percent,env_oldfashioned_percent,env_romantic_percent,env_sophisticated_percent,env_trendy_percent,env_upscale_percent,place_city,place_id_name,place_type,imageFileName
28,374554f1-a18c-4636-8be1-d290dc16f55e,358270055852872,Europe/Zurich,2014-10-05 02:38:26,CET,7200000,2,0,0,0,0,1,1,1,0,other,home_no_parents,personal,IMG_20141005_023623_1507282669.jpg
45,374554f1-a18c-4636-8be1-d290dc16f55e,358270055852872,Europe/Zurich,2014-10-24 21:01:01,CET,7200000,1,0,1,0,0,0,1,1,0,other,other,restaurant,IMG_20141024_210002_-1453837232.jpg
46,374554f1-a18c-4636-8be1-d290dc16f55e,358270055852872,Europe/Zurich,2014-10-25 03:57:59,CET,7200000,2,1,0,0,2,0,1,0,0,other,home_no_parents,personal,IMG_20141025_035705_-1453837232.jpg
47,374554f1-a18c-4636-8be1-d290dc16f55e,358270055852872,Europe/Zurich,2014-10-25 20:55:08,CET,7200000,2,1,0,0,0,0,2,1,0,other,friend_home,personal,IMG_20141025_205317_78126693.jpg
48,374554f1-a18c-4636-8be1-d290dc16f55e,358270055852872,Europe/Zurich,2014-10-25 21:24:36,CET,7200000,0,1,0,0,0,0,1,0,0,other,public_transport,onboard,IMG_20141025_212330_429033244.jpg


Depending on your needs, you can add logic and rules when extracting the relevant data.

In [11]:
user_places = []
for index, row in df_ambiance.iterrows():
    pts_t = datetime.strptime(row['timestamp'], "%Y-%m-%d %H:%M:%S")
    if row['place_type'] == 'personal':
        place = row['place_id_name']
    else:
        place = row['place_type']
    user_place_time_only = UserPlaceTimeOnly(pts_t, place, user)
    user_place = user_place_time_only.to_user_place_from_stay_points(stay_points, max_delta_time_ms=1000*60*3)
    if user_place is not None:
        user_places.append(user_place)
labelled_stay_regions = labelize_stay_region(stay_regions, user_places)
stay_regions_set = set(stay_regions) - labelled_stay_regions
labelled_stay_regions

{<wenet_models.LabelledStayRegion at 0x7f6e68bca2b0>,
 <wenet_models.LabelledStayRegion at 0x7f6e68bca470>,
 <wenet_models.LabelledStayRegion at 0x7f6e68bca2e8>,
 <wenet_models.LabelledStayRegion at 0x7f6e68bca390>,
 <wenet_models.LabelledStayRegion at 0x7f6e68bca3c8>,
 <wenet_models.LabelledStayRegion at 0x7f6e68bca438>}
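The cell above relies on `to_user_place_from_stay_points` to attach a position to a survey answer, which only carries a time. A plausible reading, sketched below, is that the answer is matched to the stay point closest in time and dropped when that stay point is more than `max_delta_time_ms` away. The function `nearest_in_time` and the tuple layout are hypothetical, and the stay-point times are invented for the example; only the survey timestamps come from the data shown above.

```python
from datetime import datetime


def nearest_in_time(target, candidates, max_delta_time_ms=3 * 60 * 1000):
    """Return the (time, lat, lng) candidate closest in time to target,
    or None if even the closest one is further than max_delta_time_ms."""
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs((c[0] - target).total_seconds()))
    delta_ms = abs((best[0] - target).total_seconds()) * 1000
    return best if delta_ms <= max_delta_time_ms else None


# Illustrative stay points: the times are made up for this example
example_stay_points = [
    (datetime(2014, 10, 24, 20, 59, 30), 46.47470, 6.42448),
    (datetime(2014, 10, 25, 21, 24, 0), 46.52842, 6.56646),
]

match = nearest_in_time(datetime(2014, 10, 24, 21, 1, 1), example_stay_points)
print(match[1:])  # (46.4747, 6.42448)

no_match = nearest_in_time(datetime(2014, 10, 5, 2, 38, 26), example_stay_points)
print(no_match)  # None
```

If the real method works along these lines, survey answers given when no stay point was recorded nearby in time are simply discarded.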

In [12]:
for user_place in user_places:
    print(user_place)

46.47470, 6.42448 [2014-10-24 21:01:01]
46.52842, 6.56646 [2014-10-25 21:24:36]
46.52097, 6.62658 [2014-10-25 21:45:24]
46.52097, 6.62658 [2014-10-25 22:31:54]
46.53015, 6.56378 [2014-10-26 01:51:53]
46.53484, 6.53807 [2014-10-18 20:53:12]
46.52167, 6.62687 [2014-09-27 23:00:02]
46.47458, 6.42432 [2014-10-17 20:21:38]
41.41568, 2.13216 [2014-10-10 20:16:06]
41.40727, 2.14501 [2014-10-11 23:55:04]


We need to prepare our data in the format expected by the `gmaps` module.

In [13]:
all_stay_regions_center = [[s._lat, s._lng] for s in stay_regions_set]
all_stay_regions_center

[[41.405598299999994, 2.1400308],
 [41.40740783, 2.1468246875],
 [41.415291749999994, 2.131747425],
 [41.405926171666664, 2.1394147583333334],
 [41.407684034999996, 2.1440890925],
 [46.531140483, 6.573449763000001],
 [41.405750616666666, 2.1396933666666667],
 [46.52188098333334, 6.626651383333333],
 [46.5210080575, 6.6294321825],
 [46.5317286, 6.5446421],
 [46.52237549523809, 6.6254748547619045]]

In [14]:
all_labelled_stay_regions_center = [[s._lat, s._lng] for s in labelled_stay_regions]
all_labelled_stay_regions_center

[[46.47468093975206, 6.424404014297521],
 [41.41568143375, 2.1320549349999998],
 [46.521694389, 6.626904997999999],
 [41.40726467420455, 2.144614930795454],
 [46.530101032500006, 6.563769611875],
 [46.53487522033333, 6.538115216333334]]

In [15]:
all_labelled_stay_regions_label = [s._label for s in labelled_stay_regions]
all_labelled_stay_regions_label

['restaurant', 'bar', 'bar', 'other_hôtel', 'friend_home', 'home_no_parents']

In [16]:
# Orange rectangles: bounding boxes of the unlabelled stay regions
all_stay_regions_rect = [
    gmaps.Polygon(
        [
            (s._topleft_lat, s._topleft_lng),
            (s._topleft_lat, s._bottomright_lng),
            (s._bottomright_lat, s._bottomright_lng),
            (s._bottomright_lat, s._topleft_lng),
        ],
        fill_color='#FF8C00',
        stroke_color='#FF4500',
        stroke_weight=6,
    )
    for s in stay_regions_set
]

In [17]:
# Pink rectangles: bounding boxes of the labelled stay regions
all_labelled_stay_regions_rect = [
    gmaps.Polygon(
        [
            (s._topleft_lat, s._topleft_lng),
            (s._topleft_lat, s._bottomright_lng),
            (s._bottomright_lat, s._bottomright_lng),
            (s._bottomright_lat, s._topleft_lng),
        ],
        fill_color='#FF8CF0',
        stroke_color='#FF45F0',
        stroke_weight=6,
    )
    for s in labelled_stay_regions
]
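The two cells above build the same four corners for every region. If that logic is needed in more places, it could be factored into a small helper such as the one sketched here. The name `rectangle_corners` is hypothetical and the coordinates in the example are made up; the corner order follows the cells above.

```python
def rectangle_corners(topleft_lat, topleft_lng, bottomright_lat, bottomright_lng):
    """Return the corners of a bounding box as (lat, lng) tuples, in the order
    top-left, top-right, bottom-right, bottom-left."""
    return [
        (topleft_lat, topleft_lng),
        (topleft_lat, bottomright_lng),
        (bottomright_lat, bottomright_lng),
        (bottomright_lat, topleft_lng),
    ]


# Made-up bounding box around a small region
corners = rectangle_corners(46.5220, 6.6265, 46.5216, 6.6271)
print(corners[1])  # (46.522, 6.6271)
```

The returned list can be passed as the first argument of `gmaps.Polygon`, as in the cells above.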

Don't write your keys or passwords directly in your notebooks. Store them in an external file with restrictive permissions (mode 600), and be careful not to commit that file.

By the way, searching GitHub for "google api key" in the latest commits shows many inexperienced people pushing Google API keys to their public repositories: [here](https://github.com/search?o=desc&q=google+api+key&s=committer-date&type=Commits). Be careful, many attackers scan GitHub for this kind of credential.

In [18]:
with open('google_api_key.txt', 'r') as f:
    key = f.read().strip()  # drop the trailing newline, if any
gmaps.configure(api_key=key)
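To make the advice about file permissions harder to forget, the key can be loaded through a helper that refuses files readable by the group or by others. This is a minimal sketch for POSIX systems (permission bits behave differently on Windows); the helper name `read_secret` is hypothetical and the key used in the example is a placeholder.

```python
import os
import stat
import tempfile


def read_secret(path):
    """Read a secret from a file, refusing files accessible by group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f'{path} has mode {oct(mode)}, expected 0o600 or stricter')
    with open(path, 'r') as f:
        return f.read().strip()


# Demonstration with a temporary file and a placeholder key
with tempfile.TemporaryDirectory() as tmp_dir:
    key_path = os.path.join(tmp_dir, 'google_api_key.txt')
    with open(key_path, 'w') as f:
        f.write('not-a-real-key\n')
    os.chmod(key_path, 0o600)
    print(read_secret(key_path))  # not-a-real-key
```

Adding the key file name to `.gitignore` also helps to avoid committing it by accident.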

In [19]:
raw_locations_df = df[['latitude', 'longitude']]
all_stay_regions_center_layer = gmaps.marker_layer(all_stay_regions_center)
all_labelled_stay_regions_center_layer = gmaps.marker_layer(all_labelled_stay_regions_center, label=all_labelled_stay_regions_label)
all_stay_regions_rect_layer = gmaps.drawing_layer(features=all_stay_regions_rect)
all_labelled_stay_regions_rect_layer = gmaps.drawing_layer(features=all_labelled_stay_regions_rect)
stay_points_layer = gmaps.symbol_layer([[p._lat, p._lng] for p in stay_points], fill_color='#00FFFF', stroke_color='#00FFFF', scale=2)
place_points_layer = gmaps.symbol_layer([[p._lat, p._lng] for p in user_places], fill_color='#0000FF', stroke_color='#0000FF', scale=4)

locations_layer = gmaps.heatmap_layer(raw_locations_df)
fig = gmaps.figure()
fig.add_layer(stay_points_layer)
fig.add_layer(all_stay_regions_center_layer)
fig.add_layer(all_labelled_stay_regions_center_layer)
fig.add_layer(place_points_layer)
fig.add_layer(all_stay_regions_rect_layer)
fig.add_layer(all_labelled_stay_regions_rect_layer)
fig.add_layer(locations_layer)

locations_layer.max_intensity = 100
locations_layer.point_radius = 10
fig.map_type = 'SATELLITE'
fig

Figure(layout=FigureLayout(height='420px'))