# Association of the *compressed* trajectories with the events

In the end, I need each point of the users' trajectories to be associated with its corresponding event. This notebook performs the operations needed for that. It takes as input the (fixed) timetables, the (fixed) GIS polygons, and the (reduced) connections.

This notebook first filters the connection data and then associates each point with its corresponding event.

In [1]:
import os

import pandas as pd
import geopandas as gpd
import h3

from shapely.geometry import Point

In [2]:
# Importing a custom module in a different file
import sys
# Raw string so that the backslashes in the Windows path are not read as escape sequences
sys.path.append(r'C:\Camilo\Estudio\Padova\Master thesis\master-thesis-code')
import constants

Assigning the path to read the preprocessed trajectories files:   

In [3]:
path_trajectories_preprocessed = r'..\..\Datasets\Processed\trajectories_preprocessed'

Assigning the path to write the trajectories files with their associated events (the result of this script):   

In [4]:
path_trajectories_events = r'..\..\Datasets\Processed\trajectories_events'

For both Sónar by night and Sónar by day, I first need to read the timetables.

In [5]:
# Reading the timetables and renaming two columns for clarity
sonar_timetables = pd.read_csv(r'..\..\Datasets\Processed\sonar_timetables_preprocessed.csv',
                               parse_dates = ['start_datetime','end_datetime'])
sonar_timetables.rename(columns={'title':'event_title','activity':'activity_type'}, inplace=True)

# Adding the timezone information so that the times are handled correctly
sonar_timetables['start_datetime'] = sonar_timetables['start_datetime'].dt.tz_localize('Europe/Madrid')
sonar_timetables['end_datetime'] = sonar_timetables['end_datetime'].dt.tz_localize('Europe/Madrid')
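
As a small illustration of what the localization step does, the sketch below uses two made-up naive timestamps (they are not read from the timetables file). It shows that `tz_localize` attaches the timezone without shifting the wall-clock time, which is the behaviour assumed above.

```python
import pandas as pd

# Hypothetical naive timestamps, as they would look right after read_csv(parse_dates=...)
naive = pd.Series(pd.to_datetime(['2024-06-13 15:00:00', '2024-06-16 04:00:00']))

# tz_localize attaches a timezone and keeps the wall-clock time unchanged
localized = naive.dt.tz_localize('Europe/Madrid')

# Converting to UTC makes the attached summer offset (+02:00) visible
as_utc = localized.dt.tz_convert('UTC')

print(localized.iloc[0])
print(as_utc.iloc[0])
```

Once the columns are timezone-aware, they can be compared directly with the (also timezone-aware) trajectory datetimes.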

In [6]:
# Selecting only the relevant columns
sonar_timetables = sonar_timetables[['sonar_type', 'day_label', 'start_datetime', 'end_datetime',
                                     'event_title', 'activity_type', 'stage', 'music_type',
                                     'genre','genre_grouped','views_youtube']]

In [7]:
sonar_timetables.dtypes

sonar_type                               object
day_label                                object
start_datetime    datetime64[ns, Europe/Madrid]
end_datetime      datetime64[ns, Europe/Madrid]
event_title                              object
activity_type                            object
stage                                    object
music_type                               object
genre                                    object
genre_grouped                            object
views_youtube                           float64
dtype: object

## Association - Sónar by night process

### Associating the timetables with the polygons

As a starting point, I need an intermediate table that associates the timetables with their geographical information (contained in the polygons).

In [8]:
# Reading the night polygons into a single GeoDataFrame
night_polygons_clipped = gpd.read_file(r'..\..\Datasets\Processed\Zonas SONAR clipped\sonar_night_polygons_clipped.json')

In [9]:
sonar_timetables

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube
0,Sónar by Day,Thursday 13 June,2024-06-13 15:00:00+02:00,2024-06-13 16:00:00+02:00,Rumbler,Music,SonarVillage,DJ,,,
1,Sónar by Day,Thursday 13 June,2024-06-13 16:05:00+02:00,2024-06-13 16:50:00+02:00,Huda,Music,SonarVillage,LIVE,,,
2,Sónar by Day,Thursday 13 June,2024-06-13 17:00:00+02:00,2024-06-13 18:10:00+02:00,Olof Dreijer & Diva Cruz (DJ + Percussion set),Music,SonarVillage,LIVE,,,
3,Sónar by Day,Thursday 13 June,2024-06-13 18:20:00+02:00,2024-06-13 19:00:00+02:00,Toya Delazy,Music,SonarVillage,LIVE,,,
4,Sónar by Day,Thursday 13 June,2024-06-13 19:05:00+02:00,2024-06-13 20:30:00+02:00,Surusinghe,Music,SonarVillage,DJ,,,
...,...,...,...,...,...,...,...,...,...,...,...
141,Sónar by Night,Friday 15 June,2024-06-16 01:05:00+02:00,2024-06-16 01:55:00+02:00,Club Cringe,Music,SonarCar,DJ,Experimental,electronic_hypnotic,55831.0
142,Sónar by Night,Friday 15 June,2024-06-16 02:05:00+02:00,2024-06-16 02:55:00+02:00,Julietta Ferrari,Music,SonarCar,DJ,Experimental,electronic_hypnotic,0.0
143,Sónar by Night,Friday 15 June,2024-06-16 03:05:00+02:00,2024-06-16 03:55:00+02:00,Soto Asa,Music,SonarCar,LIVE,Reggaeton,other_genres,153053094.0
144,Sónar by Night,Friday 15 June,2024-06-16 04:00:00+02:00,2024-06-16 04:50:00+02:00,Drazzit,Music,SonarCar,DJ,Trance/Techno,electronic_hypnotic,22948.0


In [10]:
# I use an outer join here because some events have no geographic information
# (e.g. those at Room+D, for which I did not find a polygon), and some polygons
# are not related to any event (e.g. cashless areas, restaurants, etc.).
# I do not want to discard either of them yet.
night_timetables_polygons = pd.merge(sonar_timetables.loc[sonar_timetables['sonar_type']=='Sónar by Night'], 
                                     night_polygons_clipped[['polygon_name','source_gis_file','stage','stage_area_m2','geometry']],
                                     how='outer', on='stage')
night_timetables_polygons.sort_values(by=['sonar_type','day_label','event_title'], inplace=True)
night_timetables_polygons = gpd.GeoDataFrame(night_timetables_polygons)
night_timetables_polygons.drop(columns='geometry').head()

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2
51,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,General electronic,electronic_accessible,28697840.0,SONAR NIT - Zona VIP Club,av1-2,14914
52,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,General electronic,electronic_accessible,28697840.0,SONAR NIT - Zona VIP Club Barra,av1-2,14914
53,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,General electronic,electronic_accessible,28697840.0,SONAR NIT - SonarClub,p2,14914
54,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,General electronic,electronic_accessible,28697840.0,SONAR NIT - SonarClub Barra la Nueva,p2,14914
55,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,General electronic,electronic_accessible,28697840.0,SONAR NIT - SonarClub Barra,p2,14914
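
To make the effect of the outer join easier to see, here is a minimal sketch on two tiny made-up tables (the event and polygon names are hypothetical). It shows the three situations that appear in the real table: a stage with several polygons, an event without a polygon, and a polygon without an event.

```python
import pandas as pd

# Hypothetical miniature versions of the timetable and polygon tables
events = pd.DataFrame({
    'event_title': ['Set A', 'Talk B'],
    'stage': ['StageX', 'RoomY'],                    # RoomY has no polygon
})
polygons = pd.DataFrame({
    'polygon_name': ['StageX main', 'StageX bar', 'Cashless'],
    'stage': ['StageX', 'StageX', 'NA-cashless'],    # NA-cashless hosts no event
})

# The outer join keeps unmatched rows from both sides
merged = pd.merge(events, polygons, how='outer', on='stage')
print(merged)
```

'Set A' is repeated once per polygon of its stage, while 'Talk B' and 'Cashless' survive with missing values on the side that did not match.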


I need to add a start_datetime and an end_datetime for the polygons that are not in the timetables, so that I do not lose the observations that fall in these zones when filtering by time.

In [11]:
night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna()].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2
0,,,NaT,NaT,,,NA-Cashless1,,,,,SONAR NIT - Cashless 1,p2,947
1,,,NaT,NaT,,,NA-Entrada,,,,,SONAR NIT - Entrada,p1,5438
2,,,NaT,NaT,,,NA-Restauración,,,,,SONAR NIT - Restauración,p3,5695
3,,,NaT,NaT,,,NA-autos_choques,,,,,SONAR NIT - Autos de choques,p3,1522
4,,,NaT,NaT,,,NA-autos_choques_barra,,,,,SONAR NIT - Autos de choques Barra,p3,1609


In [12]:
# Adding the start time and the end_datetime as the minimum and maximum times considered for the festival
# These were defined in the 3.preprocessing_filtering_splitting file and stored in the constants.py file

start_night_1 = pd.Timestamp(constants.START_NIGHT_1_STRING, tz='Europe/Madrid')
end_night_2 = pd.Timestamp(constants.END_NIGHT_2_STRING, tz='Europe/Madrid')

night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna(),'start_datetime'] = start_night_1
night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna(),'end_datetime'] = end_night_2

# I also add some explicit labels for clarity
night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna(),'sonar_type'] = 'Sónar by Night'
night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna(),'event_title'] = 'No event'

# Print to visualize the changes
night_timetables_polygons.loc[night_timetables_polygons['event_title']=='No event'].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2
0,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-Cashless1,,,,,SONAR NIT - Cashless 1,p2,947
1,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-Entrada,,,,,SONAR NIT - Entrada,p1,5438
2,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-Restauración,,,,,SONAR NIT - Restauración,p3,5695
3,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-autos_choques,,,,,SONAR NIT - Autos de choques,p3,1522
4,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-autos_choques_barra,,,,,SONAR NIT - Autos de choques Barra,p3,1609
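
The same filling step can be sketched on a toy table. The window bounds below are hypothetical placeholders (the real ones come from `constants.py`). The point of the sketch is that the mask can be computed once, and that `event_title` has to be overwritten last, because the mask depends on it.

```python
import pandas as pd

tz = 'Europe/Madrid'

# Hypothetical window bounds; the notebook reads the real ones from constants.py
window_start = pd.Timestamp('2024-06-14 20:00', tz=tz)
window_end = pd.Timestamp('2024-06-16 08:00', tz=tz)

toy = pd.DataFrame({
    'event_title': ['Set A', None],
    'stage': ['StageX', 'NA-cashless'],
    'start_datetime': pd.to_datetime(['2024-06-15 01:00', None]).tz_localize(tz),
    'end_datetime': pd.to_datetime(['2024-06-15 02:00', None]).tz_localize(tz),
})

# Compute the mask once, before event_title is overwritten
no_event = toy['event_title'].isna()

toy.loc[no_event, 'start_datetime'] = window_start
toy.loc[no_event, 'end_datetime'] = window_end
toy.loc[no_event, 'event_title'] = 'No event'
print(toy)
```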


### Associating each trajectory point with its corresponding stage

I can read the trajectories dataframe without scikit-mobility (I do not need any of its functionalities).

In [13]:
trajectories_night = pd.read_csv(os.path.join(path_trajectories_preprocessed, 'tdf_night_preprocessed_compressed.csv'), dtype={'vendor_name':str})

Converting to a GeoDataFrame with the adequate characteristics.

In [14]:
# Getting the geometry and converting to a GeoDataFrame
trajectories_night['geometry'] = gpd.points_from_xy(trajectories_night['lng'], trajectories_night['lat'])
trajectories_night = gpd.GeoDataFrame(trajectories_night, geometry='geometry', crs=night_timetables_polygons.crs)

# Converting the date
trajectories_night['datetime'] = pd.to_datetime(trajectories_night['datetime'])
trajectories_night['datetime'] = trajectories_night['datetime'].dt.tz_convert('Europe/Madrid')  
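
As a hedged aside, the sketch below shows how `tz_convert` behaves on made-up datetime strings that carry an explicit UTC offset. I am assuming here that the stored strings include an offset. The `utc=True` argument is not used in the cell above; it is only a defensive option for the case where a file mixes different offsets.

```python
import pandas as pd

# Hypothetical datetime strings with explicit (and different) UTC offsets
raw = pd.Series(['2024-06-15 00:56:55+00:00', '2024-06-15 02:56:55+02:00'])

# utc=True puts every value on a common UTC axis, giving a tz-aware column
parsed = pd.to_datetime(raw, utc=True)

# tz_convert changes the displayed timezone, not the instant in time
local = parsed.dt.tz_convert('Europe/Madrid')
print(local)
```

Both strings describe the same instant, so both rows end up with the same local time.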

In [15]:
trajectories_night.dtypes

uid                                                         object
macaddr_randomized                                           int64
tid                                                          int64
datetime                             datetime64[ns, Europe/Madrid]
timestamp_ap                                                 int64
lat                                                        float64
lng                                                        float64
vendor_name                                                 object
h3_cell_original                                            object
stage_original                                              object
observations_user_night_original                             int64
timespan_minutes_night_original                            float64
num_distinct_stage_night_original                            int64
minutes_per_stage_original                                 float64
geometry                                                  geom

Performing a spatial join with just the polygons to check that the join works correctly (before performing the actual join with night_timetables_polygons). I check both the inner join and the left join to see if there are differences.

In [16]:
night_trajs_sjoin_left = gpd.sjoin(trajectories_night, night_polygons_clipped[['polygon_name','source_gis_file','stage','stage_area_m2','geometry']], how='left', predicate='within')

print('Shape after left join:')
night_trajs_sjoin_left.shape

Shape after left join:


(630782, 20)

In [17]:
night_trajs_sjoin_inner = gpd.sjoin(trajectories_night, night_polygons_clipped[['polygon_name','source_gis_file','stage','stage_area_m2','geometry']], how='inner', predicate='within')

print('Shape after inner join:')
night_trajs_sjoin_inner.shape

Shape after inner join:


(630781, 20)

In [18]:
# Only a few points are not joined, which might be due to a small change of position introduced by the trajectory compression
no_points_in_polygon = night_trajs_sjoin_left[night_trajs_sjoin_left['polygon_name'].isna()]
no_points_in_polygon.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,datetime,timestamp_ap,lat,lng,vendor_name,h3_cell_original,stage_original,observations_user_night_original,timespan_minutes_night_original,num_distinct_stage_night_original,minutes_per_stage_original,index_right,polygon_name,source_gis_file,stage,stage_area_m2
62120,1b18a57e83edac30d0fb7fb8edccefb6e04a8bf0544e46...,1,1,2024-06-15 02:56:55+02:00,1718413015,41.354855,2.132505,,8d394461e86c43f,SonarPub,1108,339.93,7,48.56,,,,,


### Associating each trajectory point to an event

As practically all the points were joined, for a cleaner output, I apply the inner join with night_timetables_polygons and obtain the stages and their corresponding event timetables.

In [19]:
trajectories_events_night = gpd.sjoin(trajectories_night, night_timetables_polygons, how='inner', predicate='within')
trajectories_events_night.shape

(6898356, 30)

To make the association with the actual events, I need to filter by the start and end times of the events.

In [20]:
# Keep only rows where the datetime is within the event's start and end time
# There is no overlap between the events that happen in the same stage, so I can use the <= condition on the upper bound
trajectories_events_matched_night = trajectories_events_night.loc[(trajectories_events_night['datetime'] >= trajectories_events_night['start_datetime']) &
                                                                 (trajectories_events_night['datetime'] <= trajectories_events_night['end_datetime'])]
trajectories_events_matched_night.shape

(573394, 30)
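
The spatial join pairs each point with every event of the stage it falls in, which is why the joined table is so much larger than the filtered one. A minimal sketch of the time filter on made-up rows (one hypothetical point paired with three hypothetical events) could look like this. `Series.between` with `inclusive='both'` is used here as an equivalent of the `>=` / `<=` pair.

```python
import pandas as pd

tz = 'Europe/Madrid'

# Hypothetical result of the spatial join: one point paired with every event of its stage
candidates = pd.DataFrame({
    'uid': ['u1', 'u1', 'u1'],
    'datetime': pd.to_datetime(['2024-06-15 01:30'] * 3).tz_localize(tz),
    'event_title': ['Set A', 'Set B', 'Set C'],
    'start_datetime': pd.to_datetime(['2024-06-15 00:00', '2024-06-15 01:05',
                                      '2024-06-15 02:05']).tz_localize(tz),
    'end_datetime': pd.to_datetime(['2024-06-15 00:55', '2024-06-15 01:55',
                                    '2024-06-15 02:55']).tz_localize(tz),
})

# Row-wise check: start_datetime <= datetime <= end_datetime
in_window = candidates['datetime'].between(candidates['start_datetime'],
                                           candidates['end_datetime'],
                                           inclusive='both')
matched = candidates.loc[in_window]
print(matched[['uid', 'datetime', 'event_title']])
```

With closed bounds on both sides, a point lying exactly on a boundary shared by two consecutive events would match both, so the approach relies on the events of a stage not touching each other in time.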

Some points were matched geographically but discarded by the event time filter. In order to keep those trajectory points, I can find the difference between the dataframes.

Since there are no duplicates of uid and datetime, I can find the unmatched trajectory points and add them back to the matched trajectory points (with a specific label) and obtain the filtered trajectories_events_night.

In [21]:
# Checking that there are no duplicated trajectory points based on 'uid' and 'datetime'
duplicates_traj_night = night_trajs_sjoin_inner.groupby(['uid', 'datetime']).size().reset_index(name='duplicate_count')
print(f'Duplicate count based on uid and datetime: {len(duplicates_traj_night[duplicates_traj_night["duplicate_count"] > 1])}')

Duplicate count based on uid and datetime: 0
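
An alternative way of running the same check is `DataFrame.duplicated`. The sketch below uses a made-up table to show the difference between the two counts: the groupby version counts duplicated keys, while `duplicated(keep=False)` counts the rows involved.

```python
import pandas as pd

# Hypothetical points; the last two rows share the same (uid, datetime) key
toy = pd.DataFrame({
    'uid': ['u1', 'u1', 'u2', 'u2'],
    'datetime': ['t1', 't2', 't1', 't1'],
})

# Number of keys that appear more than once (same logic as the groupby above)
duplicated_keys = (toy.groupby(['uid', 'datetime']).size() > 1).sum()

# Number of rows that belong to a duplicated key
duplicated_rows = toy.duplicated(subset=['uid', 'datetime'], keep=False).sum()

print(duplicated_keys, duplicated_rows)
```

Both counts are zero exactly when the key is unique, so either one works as a uniqueness check. Note that `groupby` skips rows with missing keys by default.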


In [22]:
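def get_unmatched_points(reference_df, other_df, columns_join):
    """Return the rows of reference_df whose columns_join keys do not appear in other_df."""
    # The indicator column '_merge' marks each row as 'both' or 'left_only'
    merged = pd.merge(reference_df, other_df,
                      on=columns_join,
                      indicator=True, how='left')
    # Keep only the rows without a match and drop the helper column
    # (drop returns a new object, so later assignments do not act on a slice)
    diff_df = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')
    return diff_df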
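
The function above is an anti-join. As a minimal sketch of the idea on made-up data (the helper name `anti_join` and all the values are hypothetical):

```python
import pandas as pd

def anti_join(reference_df, other_df, keys):
    """Return the rows of reference_df whose key combination is absent from other_df."""
    merged = reference_df.merge(other_df, on=keys, how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Hypothetical points and the subset of them that was matched to an event
all_points = pd.DataFrame({'uid': ['u1', 'u1', 'u2'],
                           'datetime': [1, 2, 1],
                           'stage': ['S', 'S', 'T']})
matched = pd.DataFrame({'uid': ['u1'], 'datetime': [2], 'event_title': ['Set A']})

unmatched = anti_join(all_points, matched, ['uid', 'datetime'])
print(unmatched)
```

The extra columns of the second table (here `event_title`) come back empty for the unmatched rows, which is why they can be filled with an explicit label afterwards.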

In [23]:
unmatched_points_events_night = get_unmatched_points(reference_df=night_trajs_sjoin_inner,
                                                     other_df=trajectories_events_matched_night[['uid','datetime', 'event_title']],
                                                     columns_join=['uid','datetime'])
unmatched_points_events_night.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,datetime,timestamp_ap,lat,lng,vendor_name,h3_cell_original,stage_original,observations_user_night_original,timespan_minutes_night_original,num_distinct_stage_night_original,minutes_per_stage_original,index_right,polygon_name,source_gis_file,stage,stage_area_m2,event_title
29,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:33:11+02:00,1718487191,41.354105,2.129860,,8d394461e82927f,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
30,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:33:25+02:00,1718487205,41.354046,2.129898,,8d394461e8292ff,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
31,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:33:36+02:00,1718487216,41.353951,2.129868,,8d394461e87693f,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
32,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:33:42+02:00,1718487222,41.353923,2.129927,,8d394461e87683f,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
228,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-16 03:57:09+02:00,1718503029,41.356073,2.130682,,8d394461e94c7bf,SonarCar,586,340.90,7,48.70,14,SONAR NIT - SonarCar,p3,SonarCar,2688,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
630578,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 01:48:07+02:00,1718408887,41.354516,2.130759,,8d394461e8664bf,SonarLab x Printworks,714,510.43,5,102.09,2,SONAR NIT - SonarLab,av2-3,SonarLab x Printworks,9171,
630629,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 03:21:29+02:00,1718414489,41.354137,2.130741,,8d394461e8757bf,SonarLab x Printworks,714,510.43,5,102.09,2,SONAR NIT - SonarLab,av2-3,SonarLab x Printworks,9171,
630630,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 03:23:09+02:00,1718414589,41.354479,2.130749,,8d394461e87587f,SonarLab x Printworks,714,510.43,5,102.09,2,SONAR NIT - SonarLab,av2-3,SonarLab x Printworks,9171,
630631,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 03:23:57+02:00,1718414637,41.354137,2.130741,,8d394461e8757bf,SonarLab x Printworks,714,510.43,5,102.09,2,SONAR NIT - SonarLab,av2-3,SonarLab x Printworks,9171,


Assigning an explicit label for the event title and concatenating the two dataframes into the final `trajectories_events_night`.

In [24]:
unmatched_points_events_night['event_title'] = 'No event'

**The points were correctly associated with their corresponding events.** The difference in size between this table and the original trajectories is due only to the points that no longer fall within the polygons after the trajectory compression.

In [25]:
trajectories_events_night = pd.concat([trajectories_events_matched_night, unmatched_points_events_night]).sort_values(by=['uid','datetime'])
trajectories_events_night.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,datetime,timestamp_ap,lat,lng,vendor_name,h3_cell_original,stage_original,...,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2
0,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:38:47+02:00,1718483927,41.353501,2.129162,,8d394461e82b5bf,NA-Entrada,...,No event,,NA-Entrada,,,,,SONAR NIT - Entrada,p1,5438
1,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:39:19+02:00,1718483959,41.353440,2.129005,,8d394461e82a67f,NA-Entrada,...,No event,,NA-Entrada,,,,,SONAR NIT - Entrada,p1,5438
2,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:40:37+02:00,1718484037,41.353501,2.129162,,8d394461e82b5bf,NA-Entrada,...,No event,,NA-Entrada,,,,,SONAR NIT - Entrada,p1,5438
3,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:49:15+02:00,1718484555,41.353426,2.128963,,8d394461e82a67f,NA-Entrada,...,No event,,NA-Entrada,,,,,SONAR NIT - Entrada,p1,5438
4,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:49:21+02:00,1718484561,41.353365,2.128970,,8d394461e82a6ff,NA-Entrada,...,No event,,NA-Entrada,,,,,SONAR NIT - Entrada,p1,5438
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
630777,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:26:21+02:00,1718425581,41.354558,2.130617,,8d394461e95b2ff,SonarLab x Printworks,...,DJ Flight & MC Chickaboo,Music,SonarLab x Printworks,DJ,Drum and bass,electronic_hypnotic,0.0,SONAR NIT - SonarLab,av2-3,9171
630778,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:26:35+02:00,1718425595,41.354326,2.130864,,8d394461e875abf,SonarLab x Printworks,...,DJ Flight & MC Chickaboo,Music,SonarLab x Printworks,DJ,Drum and bass,electronic_hypnotic,0.0,SONAR NIT - SonarLab,av2-3,9171
630779,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:27:57+02:00,1718425677,41.354300,2.130898,,8d394461e87533f,SonarLab x Printworks,...,DJ Flight & MC Chickaboo,Music,SonarLab x Printworks,DJ,Drum and bass,electronic_hypnotic,0.0,SONAR NIT - SonarLab Barra 1,av2-3,9171
630780,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:28:41+02:00,1718425721,41.354254,2.130618,,8d394461e875cff,SonarLab x Printworks,...,DJ Flight & MC Chickaboo,Music,SonarLab x Printworks,DJ,Drum and bass,electronic_hypnotic,0.0,SONAR NIT - SonarLab,av2-3,9171


### Associating each updated trajectory point with its corresponding H3 cell

For plotting purposes, I also get the updated H3 cell from the h3 API. As the trajectory compression might have slightly changed some locations, I recompute the h3_cell for safety.

In [26]:
trajectories_events_night['h3_cell'] = [h3.latlng_to_cell(lat, lng, 13) for lat, lng in zip(trajectories_events_night['lat'], trajectories_events_night['lng'])]

In [27]:
print(f"There are {(trajectories_events_night['h3_cell_original'] != trajectories_events_night['h3_cell']).sum()} points that changed their h3_cell after the trajectory preprocessing.")


There are 5981 points that changed their h3_cell after the trajectory preprocessing.


## Association - Sónar by day process

#### Adjusting the timetables

I found that some Sónar by Day events overlap in time and happen in the same space, and the polygons are not granular enough to distinguish the areas where these events happened. For this reason, I keep only the more general (and correct) 'Project Area' label for these cases.

In [28]:
sonar_timetables = sonar_timetables.loc[(sonar_timetables['stage'] != 'Project Area') |
                                        ((sonar_timetables['stage'] == 'Project Area') & (sonar_timetables['event_title'] == 'Project Area'))]
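
As a sketch of what this filter keeps, the toy table below uses made-up event titles. It also shows that the condition can be written in a shorter form: once a row is known to be in 'Project Area', only the title check matters.

```python
import pandas as pd

# Hypothetical timetable rows: two events in 'Project Area', one elsewhere
toy = pd.DataFrame({
    'stage': ['Project Area', 'Project Area', 'SonarHall'],
    'event_title': ['Project Area', 'Some installation', 'Some concert'],
})

is_area = toy['stage'] == 'Project Area'
is_generic = toy['event_title'] == 'Project Area'

# Same logic as the cell above
original_mask = ~is_area | (is_area & is_generic)

# Equivalent shorter form
shorter_mask = ~is_area | is_generic

print(toy.loc[shorter_mask])
```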

As a starting point, I need an intermediate table that associates the timetables with their geographical information (contained in the polygons).

In [29]:
# Reading the day polygons into a single GeoDataFrame
day_polygons_clipped = gpd.read_file(r'..\..\Datasets\Processed\Zonas SONAR clipped\sonar_day_polygons_clipped.json')

#### Adding the floor number to the stages in day_polygons_clipped

In [30]:
# Mapping the 'source_gis_file' column to the corresponding floor values (the default is floor 0)

floor_assignment = {'p5.2': 2, 'p5.1': 1}

day_polygons_clipped['polygon_floor'] = day_polygons_clipped['source_gis_file'].map(floor_assignment).fillna(0).astype('Int8')
day_polygons_clipped.drop(columns='geometry')

Unnamed: 0,id,polygon_name,index,source_gis_file,stage,stage_area_m2,polygon_floor
0,88259806-b543-45e9-b0e9-87b0ac826ce6,SONAR DIA - SonarPark,0,p1,SonarPark,1914,0
1,036a9d19-b2a9-45a9-aa74-0569fc82ba8c,SONAR DIA - SonarPark Barra,1,p1,SonarPark,1914,0
2,aa598ed7-0b28-444d-8564-b434e0e34b82,SONAR DIA - SonarHall Paso,0,p2,NA-sonar_hall_paso,6973,0
3,cfcf08c7-1230-4df9-9309-6ef436090d99,SONAR DIA - SonarHall,1,p2,SonarHall,1319,0
4,97acc96a-1160-456d-abe7-4046ec78fc41,SONAR DIA - Food Trucks,2,p2,NA-food_trucks,1714,0
5,a03c6c6f-b506-4915-bf7f-fc609b2e20e9,SONAR DIA - Stage+D,3,p2,Stage+D,443,0
6,f10d8100-ef6d-468b-acc4-09f8497cac7c,SONAR DIA - SonarVillage,0,p3,SonarVillage,9366,0
7,0e9222e8-38be-49bf-bbdd-46a37ce4a899,SONAR DIA - SonarVillage VIP,1,p3,SonarVillage,9366,0
8,dc79c74e-b7e6-4a8d-9b6d-f8d20138e8c9,SONAR DIA - SonarVillage Barra 2,2,p3,SonarVillage,9366,0
9,fd73b9fb-8b96-473e-80ff-b648b8a18005,SONAR DIA - SonarVillage Barra 1,3,p3,SonarVillage,9366,0
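
The floor mapping relies on `map` returning a missing value for every key that is not in the dictionary, which `fillna(0)` then turns into the default floor. A minimal sketch on a made-up Series of `source_gis_file` values:

```python
import pandas as pd

floor_assignment = {'p5.2': 2, 'p5.1': 1}

# Hypothetical source_gis_file values; 'p1' and 'p5.0' are not in the dictionary
source = pd.Series(['p1', 'p5.1', 'p5.2', 'p5.0'])

# Unmapped keys become NaN, then 0; Int8 is the nullable integer dtype
floors = source.map(floor_assignment).fillna(0).astype('Int8')
print(floors.tolist())
```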


### Associating the timetables with the polygons

In [31]:
# I use an outer join here because some events have no geographic information
# (e.g. those at Room+D, for which I did not find a polygon), and some polygons
# are not related to any event (e.g. cashless areas, restaurants, etc.).
# I do not want to discard either of them yet.
day_timetables_polygons = pd.merge(sonar_timetables.loc[sonar_timetables['sonar_type']=='Sónar by Day'], 
                                   day_polygons_clipped[['polygon_name','source_gis_file','stage','stage_area_m2','polygon_floor','geometry']],
                                   how='outer', on='stage')
day_timetables_polygons.sort_values(by=['sonar_type','day_label','event_title'], inplace=True)
day_timetables_polygons = gpd.GeoDataFrame(day_timetables_polygons)
day_timetables_polygons.drop(columns='geometry').head(5)

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
25,Sónar by Day,Friday 14 June,2024-06-14 10:00:00+02:00,2024-06-14 14:00:00+02:00,AI & WEB3 Creative Summit,Talk,Room+D,,,,,,,,
54,Sónar by Day,Friday 14 June,2024-06-14 16:55:00+02:00,2024-06-14 17:40:00+02:00,AMORE,Music,SonarPark,LIVE,,,,SONAR DIA - SonarPark,p1,1914.0,0.0
55,Sónar by Day,Friday 14 June,2024-06-14 16:55:00+02:00,2024-06-14 17:40:00+02:00,AMORE,Music,SonarPark,LIVE,,,,SONAR DIA - SonarPark Barra,p1,1914.0,0.0
3,Sónar by Day,Friday 14 June,2024-06-14 16:15:00+02:00,2024-06-14 17:00:00+02:00,Adelaida presents 'Muérdago',Music,Complex+D,LIVE,,,,SONAR DIA - SonarComplex,p5.2,1092.0,2.0
52,Sónar by Day,Friday 14 June,2024-06-14 15:45:00+02:00,2024-06-14 16:45:00+02:00,Akazie,Music,SonarPark,DJ,,,,SONAR DIA - SonarPark,p1,1914.0,0.0


I need to add a start_datetime and an end_datetime for the polygons that are not in the timetables, so that I do not lose the observations that fall in these zones when filtering by time.

In [32]:
day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna()].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
12,,,NaT,NaT,,,NA-cashless,,,,,SONAR DIA - Cashless,p4,854.0,0
13,,,NaT,NaT,,,NA-food_trucks,,,,,SONAR DIA - Food Trucks,p2,1714.0,0
14,,,NaT,NaT,,,NA-lounge+d,,,,,SONAR DIA - Lounge+D,p5.0,608.0,0
15,,,NaT,NaT,,,NA-lounge_barra,,,,,SONAR DIA - Lounge Barra,p5.0,101.0,0
16,,,NaT,NaT,,,NA-sonar_hall_paso,,,,,SONAR DIA - SonarHall Paso,p2,6973.0,0


In [33]:
# Adding the start time and the end_datetime as the minimum and maximum times considered for the festival
# These were defined in the 3.preprocessing_filtering_splitting file and stored in the constants.py file

start_day_1 = pd.Timestamp(constants.START_DAY_1_STRING, tz='Europe/Madrid')
end_day_3 = pd.Timestamp(constants.END_DAY_3_STRING, tz='Europe/Madrid')

day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna(),'start_datetime'] = start_day_1
day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna(),'end_datetime'] = end_day_3

# I also add some explicit labels for clarity
day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna(),'sonar_type'] = 'Sónar by Day'
day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna(),'event_title'] = 'No event'

# Print to visualize the changes
day_timetables_polygons.loc[day_timetables_polygons['event_title']=='No event'].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
12,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-cashless,,,,,SONAR DIA - Cashless,p4,854.0,0
13,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-food_trucks,,,,,SONAR DIA - Food Trucks,p2,1714.0,0
14,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-lounge+d,,,,,SONAR DIA - Lounge+D,p5.0,608.0,0
15,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-lounge_barra,,,,,SONAR DIA - Lounge Barra,p5.0,101.0,0
16,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-sonar_hall_paso,,,,,SONAR DIA - SonarHall Paso,p2,6973.0,0


In Sónar by Day, there are some events that will not be geographically matched because there is no exact reference of where they happened. 

In [34]:
day_timetables_polygons.loc[day_timetables_polygons['polygon_name'].isna()].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
25,Sónar by Day,Friday 14 June,2024-06-14 10:00:00+02:00,2024-06-14 14:00:00+02:00,AI & WEB3 Creative Summit,Talk,Room+D,,,,,,,,
26,Sónar by Day,Friday 14 June,2024-06-14 18:00:00+02:00,2024-06-14 19:00:00+02:00,AlphaTheta presents 'Euphonia' Workshop,Networking,Room+D,,,,,,,,
10,Sónar by Day,Friday 14 June,2024-06-14 15:00:00+02:00,2024-06-14 21:00:00+02:00,Espai Oníric,Exhibition,Espai Oníric,,,,,,,,
27,Sónar by Day,Friday 14 June,2024-06-14 16:00:00+02:00,2024-06-14 18:00:00+02:00,Future of Music with Revelator Labs & MUSIC x:...,Talk,Room+D 2,,,,,,,,
11,Sónar by Day,Saturday 15 June,2024-06-15 15:00:00+02:00,2024-06-15 21:00:00+02:00,Espai Oníric,Exhibition,Espai Oníric,,,,,,,,
24,Sónar by Day,Thursday 13 June,2024-06-13 18:00:00+02:00,2024-06-13 19:00:00+02:00,All Our Minds Workshop,Workshop,Room+D,,,,,,,,
9,Sónar by Day,Thursday 13 June,2024-06-13 15:00:00+02:00,2024-06-13 21:00:00+02:00,Espai Oníric,Exhibition,Espai Oníric,,,,,,,,
23,Sónar by Day,Thursday 13 June,2024-06-13 11:00:00+02:00,2024-06-13 13:30:00+02:00,Music Tech Sessions,Networking,Room+D,,,,,,,,


### Associating each trajectory point with its corresponding stage

I can read the trajectories dataframe without scikit-mobility (I do not need any of its functionalities).

In [35]:
trajectories_day = pd.read_csv(os.path.join(path_trajectories_preprocessed, 'tdf_day_preprocessed_compressed.csv'), 
                               dtype={'floor_num_added':'Int8', 'vendor_name':str})
trajectories_day.shape

(1191053, 16)

Converting to a GeoDataFrame with the adequate characteristics.

In [36]:
# Getting the geometry and converting to a GeoDataFrame
# The CRS is taken from the day polygons, which are the ones used in the joins below
trajectories_day['geometry'] = gpd.points_from_xy(trajectories_day['lng'], trajectories_day['lat'])
trajectories_day = gpd.GeoDataFrame(trajectories_day, geometry='geometry', crs=day_polygons_clipped.crs)

# Converting the date
trajectories_day['datetime'] = pd.to_datetime(trajectories_day['datetime'])
trajectories_day['datetime'] = trajectories_day['datetime'].dt.tz_convert('Europe/Madrid')  

In [37]:
trajectories_day.dtypes

uid                                                       object
macaddr_randomized                                         int64
tid                                                        int64
floor_num_added                                             Int8
label_day_floor_change_id                                 object
datetime                           datetime64[ns, Europe/Madrid]
timestamp_ap                                               int64
lat                                                      float64
lng                                                      float64
vendor_name                                               object
h3_cell_original                                          object
stage_original                                            object
observations_user_day_original                             int64
timespan_minutes_day_original                            float64
num_distinct_stage_day_original                            int64
minutes_per_stage_origina

Performing a spatial join with just the polygons to check that the join works correctly (before performing the actual join with day_timetables_polygons). I check both the inner join and the left join to see if there are differences. At this point, a point can be associated with multiple polygons because the polygons overlap in the multi-floor area. This will be filtered later.

In [38]:
day_trajs_sjoin_left = gpd.sjoin(trajectories_day, day_polygons_clipped[['polygon_name','source_gis_file','polygon_floor','stage','stage_area_m2','geometry']], how='left', predicate='within')


print('Shape after left join:')
day_trajs_sjoin_left.shape

Shape after left join:


(1255049, 23)

In [39]:
day_trajs_sjoin_inner = gpd.sjoin(trajectories_day, day_polygons_clipped[['polygon_name','source_gis_file','polygon_floor','stage','stage_area_m2','geometry']], how='inner', predicate='within')

print('Shape after inner join:')
day_trajs_sjoin_inner.shape

Shape after inner join:


(1255045, 23)
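
The two shapes can be reconciled with a little arithmetic. The sketch below only uses the row counts printed above; the variable names are illustrative.

```python
# Row counts reported in the outputs above
n_points = 1_191_053   # rows in trajectories_day
n_left = 1_255_049     # rows after the left join
n_inner = 1_255_045    # rows after the inner join

# A left join keeps one row (with empty polygon columns) for each point without a match,
# so the difference between the two joins is the number of points outside every polygon
n_outside = n_left - n_inner

# Every matched point contributes one row per polygon that contains it,
# so the surplus over the matched points comes from overlapping polygons
n_extra_rows = n_inner - (n_points - n_outside)

print(n_outside, n_extra_rows)  # 4 63996
```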

In [40]:
# A few points were not joined, probably because the trajectory compression shifted their position slightly
no_points_in_polygon = day_trajs_sjoin_left[(day_trajs_sjoin_left['polygon_name'].isna())]
no_points_in_polygon.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,floor_num_added,label_day_floor_change_id,datetime,timestamp_ap,lat,lng,vendor_name,...,observations_user_day_original,timespan_minutes_day_original,num_distinct_stage_day_original,minutes_per_stage_original,index_right,polygon_name,source_gis_file,polygon_floor,stage,stage_area_m2
412132,544216fa2250cf4925003407403cba9c1d0c46019a29ba...,1,3,0,3_4,2024-06-15 17:20:53+02:00,1718464853,41.372194,2.151736,,...,1190,354.87,7,50.7,,,,,,
412137,544216fa2250cf4925003407403cba9c1d0c46019a29ba...,1,3,0,3_4,2024-06-15 17:22:43+02:00,1718464963,41.372194,2.151736,,...,1190,354.87,7,50.7,,,,,,
925297,c1f1d408cdda39d3a7d46b0395afae9f8fd48649cae223...,1,2,0,2_0,2024-06-14 23:03:57+02:00,1718399037,41.373066,2.151757,,...,2480,490.32,4,122.58,,,,,,
925300,c1f1d408cdda39d3a7d46b0395afae9f8fd48649cae223...,1,2,0,2_0,2024-06-14 23:05:46+02:00,1718399146,41.373066,2.151757,,...,2480,490.32,4,122.58,,,,,,


### Associating each trajectory point to an event

As practically all the points were joined, for a cleaner output I apply the inner join with `day_timetables_polygons` and obtain the stages and their corresponding event timetables.

In [41]:
trajectories_events_day = gpd.sjoin(trajectories_day, day_timetables_polygons, how='inner', predicate='within')
trajectories_events_day.shape

(17881279, 33)

The join pairs each point with every event scheduled in its polygon, so to associate the points with the actual events I need to filter by the event hours and the floor number.

In [42]:
# Keep only rows where the datetime is within the event's start and end time
# There is no overlap between the events that happen in the same stage, so I can use the <= condition on the upper bound
trajectories_events_matched_day = trajectories_events_day.loc[(trajectories_events_day['datetime'] >= trajectories_events_day['start_datetime']) &
                                                                 (trajectories_events_day['datetime'] <= trajectories_events_day['end_datetime']) &
                                                                 (trajectories_events_day['floor_num_added'] == trajectories_events_day['polygon_floor'])]
trajectories_events_matched_day.shape

(1029566, 33)
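
The filter above can be reproduced on a toy table. This is a minimal sketch with made-up rows; the column names follow the notebook, while the event titles and values are hypothetical.

```python
import pandas as pd

tz = 'Europe/Madrid'

# Hypothetical joined table: point 'a' is paired with two events of its stage,
# point 'b' falls in the time window but on a different floor than the polygon
joined = pd.DataFrame({
    'uid': ['a', 'a', 'b'],
    'datetime': pd.to_datetime(['2024-06-14 18:00', '2024-06-14 18:00', '2024-06-14 18:30']).tz_localize(tz),
    'start_datetime': pd.to_datetime(['2024-06-14 17:00', '2024-06-14 19:00', '2024-06-14 17:00']).tz_localize(tz),
    'end_datetime': pd.to_datetime(['2024-06-14 19:00', '2024-06-14 20:00', '2024-06-14 19:00']).tz_localize(tz),
    'floor_num_added': [0, 0, 1],
    'polygon_floor': [0, 0, 0],
    'event_title': ['Event A', 'Event B', 'Event A'],
})

# between() is inclusive on both ends by default, like the >= / <= pair used above
in_window = joined['datetime'].between(joined['start_datetime'], joined['end_datetime'])
same_floor = joined['floor_num_added'] == joined['polygon_floor']

matched = joined.loc[in_window & same_floor]
print(matched[['uid', 'event_title']])
```

Only the first row survives: the second is outside the event window and the third is on the wrong floor.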

Some points fall inside a polygon but were discarded by the time filter because no event was taking place at that moment. To keep those trajectory points, I compute the difference between the two dataframes.

Since there are no duplicates of `uid` and `datetime` (checked below, leaving aside the multi-floor polygons), I can find the unmatched trajectory points, add them back to the matched trajectory points (with a specific label), and obtain the final `trajectories_events_day`.

In [43]:
# Outside the multi-floor files (p5.0, p5.1, p5.2), there are no duplicated trajectory points based on 'uid' and 'datetime'
duplicates_traj_day = day_trajs_sjoin_inner.loc[~day_trajs_sjoin_inner['source_gis_file'].isin(['p5.0','p5.1','p5.2'])].groupby(['uid', 'datetime']).size().reset_index(name='duplicate_count')
print(f'Duplicate count based on uid and datetime: {len(duplicates_traj_day[duplicates_traj_day["duplicate_count"] > 1])}')

Duplicate count based on uid and datetime: 0
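
An equivalent check can be written with `DataFrame.duplicated`. This is a sketch on made-up rows, not the notebook's data; it shows why the multi-floor files are excluded before counting.

```python
import pandas as pd

# Hypothetical join result: point ('a', 18:00) falls inside two stacked polygons
points = pd.DataFrame({
    'uid': ['a', 'a', 'b'],
    'datetime': pd.to_datetime(['2024-06-14 18:00', '2024-06-14 18:00', '2024-06-14 18:05']),
    'source_gis_file': ['p5.0', 'p5.1', 'p3'],
})

multi_floor_files = ['p5.0', 'p5.1', 'p5.2']
keys = ['uid', 'datetime']

# Rows that repeat a (uid, datetime) key already seen in an earlier row
dup_all = points.duplicated(subset=keys).sum()
dup_single_floor = points.loc[~points['source_gis_file'].isin(multi_floor_files)].duplicated(subset=keys).sum()

print(dup_all, dup_single_floor)  # 1 0
```

`duplicated` counts the surplus rows, whereas the `groupby` above counts the keys that appear more than once; both are zero exactly when every key is unique.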


In Sónar by day, I also need to filter out the unmatched points that do not belong to the actual floor (this must be done after associating the trajectories with the polygons).

In [44]:
# Getting the unmatched points with the function defined above
unmatched_points_events_day = get_unmatched_points(reference_df=day_trajs_sjoin_inner,
                                                     other_df=trajectories_events_matched_day[['uid','datetime', 'event_title']],
                                                     columns_join=['uid','datetime'])

# Filtering out the points whose polygon floor does not correspond to the trajectory's floor number
unmatched_points_events_day = unmatched_points_events_day.loc[unmatched_points_events_day['floor_num_added'] == unmatched_points_events_day['polygon_floor']]

unmatched_points_events_day.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,floor_num_added,label_day_floor_change_id,datetime,timestamp_ap,lat,lng,vendor_name,...,timespan_minutes_day_original,num_distinct_stage_day_original,minutes_per_stage_original,index_right,polygon_name,source_gis_file,polygon_floor,stage,stage_area_m2,event_title
5,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:00:22+02:00,1718380822,41.373378,2.152060,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
6,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:02:21+02:00,1718380941,41.373387,2.151835,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
7,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:02:34+02:00,1718380954,41.373582,2.151956,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
8,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:04:08+02:00,1718381048,41.373375,2.152077,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
9,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:04:23+02:00,1718381063,41.373582,2.151956,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1255031,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,1,2,1_1,2024-06-13 18:48:13+02:00,1718297293,41.372453,2.151916,"Apple, Inc.",...,482.65,4,120.66,16,SONAR DIA - SonarComplex,p5.2,2,Complex+D,1092,
1255032,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,1,0,1_2,2024-06-13 20:18:48+02:00,1718302728,41.372510,2.151637,"Apple, Inc.",...,482.65,4,120.66,12,SONAR DIA - SonarÀgora,p5.0,0,SonarÀgora,583,
1255038,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_0,2024-06-14 17:29:18+02:00,1718378958,41.372568,2.151624,"Apple, Inc.",...,736.40,4,184.10,12,SONAR DIA - SonarÀgora,p5.0,0,SonarÀgora,583,
1255041,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,1,2_1,2024-06-14 23:06:18+02:00,1718399178,41.372263,2.152061,"Apple, Inc.",...,736.40,4,184.10,14,SONAR DIA - Project Area,p5.1,1,Project Area,2603,
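
The idea behind recovering the unmatched points is an anti-join on the key columns. The sketch below is an assumption about how such a helper could work, not the actual `get_unmatched_points` defined earlier in the notebook; `anti_join` and the toy frames are hypothetical.

```python
import pandas as pd

def anti_join(reference_df, other_df, columns_join):
    """Return the rows of reference_df whose key is absent from other_df (hypothetical helper)."""
    # Deduplicating the keys avoids multiplying the rows of reference_df in the merge
    keys = other_df[columns_join].drop_duplicates()
    merged = reference_df.merge(keys, on=columns_join, how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Toy data: three spatially joined points, of which only one was matched to an event
reference = pd.DataFrame({'uid': ['a', 'a', 'b'], 'datetime': [1, 2, 1], 'stage': ['X', 'X', 'Y']})
matched = pd.DataFrame({'uid': ['a'], 'datetime': [1], 'event_title': ['Event A']})

unmatched = anti_join(reference, matched, ['uid', 'datetime'])
print(unmatched)
```

Note that `merge` resets the index, so this sketch does not preserve the original row labels.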


Assigning an explicit label for the event title and concatenating the two dataframes into the final `trajectories_events_day`.

In [45]:
unmatched_points_events_day['event_title'] = 'No event'

In [46]:
trajectories_events_day = pd.concat([trajectories_events_matched_day, unmatched_points_events_day]).sort_values(by=['uid','datetime'])
trajectories_events_day.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,floor_num_added,label_day_floor_change_id,datetime,timestamp_ap,lat,lng,vendor_name,...,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
0,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:45:49+02:00,1718379949,41.373153,2.151547,,...,Music,SonarVillage,DJ,,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
1,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:56:08+02:00,1718380568,41.373399,2.151526,,...,Music,SonarVillage,DJ,,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
2,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:57:26+02:00,1718380646,41.373175,2.151470,,...,Music,SonarVillage,DJ,,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
3,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:57:57+02:00,1718380677,41.373399,2.151526,,...,Music,SonarVillage,DJ,,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
4,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:59:15+02:00,1718380755,41.373175,2.151470,,...,Music,SonarVillage,DJ,,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1191048,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_0,2024-06-14 20:26:30+02:00,1718389590,41.372121,2.151921,"Apple, Inc.",...,,NA-lounge+d,,,,,SONAR DIA - Lounge+D,p5.0,608.0,0
1255041,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,1,2_1,2024-06-14 23:06:18+02:00,1718399178,41.372263,2.152061,"Apple, Inc.",...,,Project Area,,,,,SONAR DIA - Project Area,p5.1,2603.0,1
1191050,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_2,2024-06-14 23:07:24+02:00,1718399244,41.372204,2.151938,"Apple, Inc.",...,,NA-lounge+d,,,,,SONAR DIA - Lounge+D,p5.0,608.0,0
1255043,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,1,2_3,2024-06-14 23:08:08+02:00,1718399288,41.372263,2.152061,"Apple, Inc.",...,,Project Area,,,,,SONAR DIA - Project Area,p5.1,2603.0,1
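
The labelling and concatenation steps can be sketched on toy frames. The data below is made up, and `assign` is used here as an alternative to in-place assignment; it is not what the cells above do.

```python
import pandas as pd

# Hypothetical matched and unmatched points
matched = pd.DataFrame({'uid': ['a', 'b'], 'datetime': [2, 1], 'event_title': ['Event A', 'Event B']})
unmatched = pd.DataFrame({'uid': ['a'], 'datetime': [1], 'event_title': [None]})

# assign returns a new frame, so the label is set without writing into a filtered view
unmatched_labelled = unmatched.assign(event_title='No event')

# reset_index is optional and only used to get tidy row labels in this toy output
combined = (pd.concat([matched, unmatched_labelled])
              .sort_values(by=['uid', 'datetime'])
              .reset_index(drop=True))
print(combined)
```

Using `assign` on a frame obtained through `.loc` filtering sidesteps the `SettingWithCopyWarning` that in-place column assignment can trigger.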


**The points were correctly associated with their corresponding events.** The difference in size between this table and the original trajectories is just the points that do not fall within the polygons anymore due to the trajectory compression. 

Also, there is a small difference between the number of geographically unmatched points at the end of the process and the points not matched above (`no_points_in_polygon`), because the latter did not take into account the floor number; once the polygon floor has to agree with the trajectories' floor number, a few more points are left without a polygon (manually checked).

In [None]:
no_points_in_polygon_updated =  get_unmatched_points(trajectories_day, 
                                                     trajectories_events_day,
                                                     columns_join=['uid','datetime'])

no_points_in_polygon_updated.drop(columns=['geometry_x','geometry_y'])

Unnamed: 0,uid,macaddr_randomized_x,tid_x,floor_num_added_x,label_day_floor_change_id_x,datetime,timestamp_ap_x,lat_x,lng_x,vendor_name_x,...,activity_type,stage,music_type,genre,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
412132,544216fa2250cf4925003407403cba9c1d0c46019a29ba...,1,3,0,3_4,2024-06-15 17:20:53+02:00,1718464853,41.372194,2.151736,,...,,,,,,,,,,
412137,544216fa2250cf4925003407403cba9c1d0c46019a29ba...,1,3,0,3_4,2024-06-15 17:22:43+02:00,1718464963,41.372194,2.151736,,...,,,,,,,,,,
470447,61b1ce6e61415207dcc8fe54e9577c4ed6e067d823ba08...,1,3,0,3_0,2024-06-15 20:12:33+02:00,1718475153,41.37223,2.151837,,...,,,,,,,,,,
560204,770fa04ab23e364f9de417304185dcdbe291ac94c5a338...,1,1,0,1_9,2024-06-13 14:53:06+02:00,1718283186,41.372223,2.15182,,...,,,,,,,,,,
796784,a359cd7daae5899256efe4c5d1342a86dfd2d85fa15a06...,1,2,0,2_4,2024-06-14 20:54:13+02:00,1718391253,41.372219,2.151811,,...,,,,,,,,,,
925297,c1f1d408cdda39d3a7d46b0395afae9f8fd48649cae223...,1,2,0,2_0,2024-06-14 23:03:57+02:00,1718399037,41.373066,2.151757,,...,,,,,,,,,,
925300,c1f1d408cdda39d3a7d46b0395afae9f8fd48649cae223...,1,2,0,2_0,2024-06-14 23:05:46+02:00,1718399146,41.373066,2.151757,,...,,,,,,,,,,
1129356,f2a02bc67a50072c12d3fd6d7d58307426681dd95ff02e...,1,2,0,2_0,2024-06-14 20:35:21+02:00,1718390121,41.372218,2.151805,,...,,,,,,,,,,


### Associating each updated trajectory point with its corresponding H3 cell

For plotting purposes, I also get the updated H3 cell from the h3 API. As the trajectory compression might have slightly changed some locations, I recompute the `h3_cell` for safety.

In [51]:
trajectories_events_day['h3_cell'] = [h3.latlng_to_cell(lat, lng, 13) for lat, lng in zip(trajectories_events_day['lat'], trajectories_events_day['lng'])]

In [52]:
print(f"There are {(trajectories_events_day['h3_cell_original'] != trajectories_events_day['h3_cell']).sum()} points that changed their h3_cell after the trajectory preprocessing.")

There are 6120 points that changed their h3_cell after the trajectory preprocessing.


## Writing the trajectories with their associated events 

### Writing the Sónar by night files.

Selecting the final columns that will be analyzed.

In [53]:
trajectories_events_night.columns

Index(['uid', 'macaddr_randomized', 'tid', 'datetime', 'timestamp_ap', 'lat',
       'lng', 'vendor_name', 'h3_cell_original', 'stage_original',
       'observations_user_night_original', 'timespan_minutes_night_original',
       'num_distinct_stage_night_original', 'minutes_per_stage_original',
       'geometry', 'index_right', 'sonar_type', 'day_label', 'start_datetime',
       'end_datetime', 'event_title', 'activity_type', 'stage', 'music_type',
       'genre', 'genre_grouped', 'views_youtube', 'polygon_name',
       'source_gis_file', 'stage_area_m2', 'h3_cell'],
      dtype='object')

In [54]:
selected_columns_night = ['uid', 'macaddr_randomized',
                          'tid',                               # Corresponds to the renamed label_night 
                          'datetime', 'timestamp_ap',          # Both formats if I need to do quick computations with the trajectories' timestamps
                          'lat', 'lng', 
                          'vendor_name', 
                          'sonar_type',                        # 'day_label', I discard the day_label column from the timetables to avoid confusions 
                          'start_datetime', 'end_datetime',    # Start and end of the events 
                          'event_title','music_type',          # activity_type is always Music in Sónar by night
                          'genre_grouped','views_youtube', 
                          'polygon_name', 'stage', 'stage_area_m2', 'h3_cell', # Relative location columns
                          'timespan_minutes_night_original', 'num_distinct_stage_night_original', 'minutes_per_stage_original',   # Old metrics obtained before trajectory preprocessing
                          'geometry'
                          ]
trajectories_events_night = trajectories_events_night[selected_columns_night]
trajectories_events_night.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,datetime,timestamp_ap,lat,lng,vendor_name,sonar_type,start_datetime,...,music_type,genre_grouped,views_youtube,polygon_name,stage,stage_area_m2,h3_cell,timespan_minutes_night_original,num_distinct_stage_night_original,minutes_per_stage_original
0,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:38:47+02:00,1718483927,41.353501,2.129162,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82b5bf,340.90,7,48.70
1,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:39:19+02:00,1718483959,41.353440,2.129005,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82a67f,340.90,7,48.70
2,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:40:37+02:00,1718484037,41.353501,2.129162,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82b5bf,340.90,7,48.70
3,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:49:15+02:00,1718484555,41.353426,2.128963,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82a67f,340.90,7,48.70
4,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:49:21+02:00,1718484561,41.353365,2.128970,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82a6ff,340.90,7,48.70
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
630777,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:26:21+02:00,1718425581,41.354558,2.130617,,Sónar by Night,2024-06-15 05:30:00+02:00,...,DJ,electronic_hypnotic,0.0,SONAR NIT - SonarLab,SonarLab x Printworks,9171,8d394461e95b2ff,510.43,5,102.09
630778,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:26:35+02:00,1718425595,41.354326,2.130864,,Sónar by Night,2024-06-15 05:30:00+02:00,...,DJ,electronic_hypnotic,0.0,SONAR NIT - SonarLab,SonarLab x Printworks,9171,8d394461e875abf,510.43,5,102.09
630779,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:27:57+02:00,1718425677,41.354300,2.130898,,Sónar by Night,2024-06-15 05:30:00+02:00,...,DJ,electronic_hypnotic,0.0,SONAR NIT - SonarLab Barra 1,SonarLab x Printworks,9171,8d394461e87533f,510.43,5,102.09
630780,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:28:41+02:00,1718425721,41.354254,2.130618,,Sónar by Night,2024-06-15 05:30:00+02:00,...,DJ,electronic_hypnotic,0.0,SONAR NIT - SonarLab,SonarLab x Printworks,9171,8d394461e875cff,510.43,5,102.09


Writing the files.

In [55]:
trajectories_events_night.to_csv(os.path.join(path_trajectories_events,'trajectories_events_night_compressed.csv'),index=False)

### Writing the Sónar by day files.

Selecting the final columns that will be analyzed.

In [56]:
trajectories_events_day.columns

Index(['uid', 'macaddr_randomized', 'tid', 'floor_num_added',
       'label_day_floor_change_id', 'datetime', 'timestamp_ap', 'lat', 'lng',
       'vendor_name', 'h3_cell_original', 'stage_original',
       'observations_user_day_original', 'timespan_minutes_day_original',
       'num_distinct_stage_day_original', 'minutes_per_stage_original',
       'geometry', 'index_right', 'sonar_type', 'day_label', 'start_datetime',
       'end_datetime', 'event_title', 'activity_type', 'stage', 'music_type',
       'genre', 'genre_grouped', 'views_youtube', 'polygon_name',
       'source_gis_file', 'stage_area_m2', 'polygon_floor', 'h3_cell'],
      dtype='object')

In [57]:
selected_columns_day = ['uid', 'macaddr_randomized',
                        'tid',                                                              # Corresponds to the renamed label_night
                        'floor_num_added', 'label_day_floor_change_id',                     # Columns related to the movement between floors
                        'datetime', 'timestamp_ap',                                         # Both formats if I need to do quick computations with the trajectories' timestamps
                        'lat', 'lng', 
                        'vendor_name', 
                        'sonar_type',                                                       # 'day_label', I discard the day_label column from the timetables to avoid confusions 
                        'start_datetime', 'end_datetime',                                   # Start and end of the events 
                        'event_title', 'activity_type', 'music_type', 
                        'genre_grouped','views_youtube',                            
                        'polygon_name', 'stage', 'h3_cell', 'stage_area_m2',                # Location related columns
                        'timespan_minutes_day_original', 'num_distinct_stage_day_original', 'minutes_per_stage_original',   # Old metrics obtained before trajectory preprocessing
                        'geometry'
                        ]
trajectories_events_day = trajectories_events_day[selected_columns_day]
trajectories_events_day.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,floor_num_added,label_day_floor_change_id,datetime,timestamp_ap,lat,lng,vendor_name,...,music_type,genre_grouped,views_youtube,polygon_name,stage,h3_cell,stage_area_m2,timespan_minutes_day_original,num_distinct_stage_day_original,minutes_per_stage_original
0,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:45:49+02:00,1718379949,41.373153,2.151547,,...,DJ,,,SONAR DIA - SonarVillage VIP,SonarVillage,8d394461ca7267f,9366.0,256.27,4,64.07
1,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:56:08+02:00,1718380568,41.373399,2.151526,,...,DJ,,,SONAR DIA - SonarVillage VIP,SonarVillage,8d394461ca72bbf,9366.0,256.27,4,64.07
2,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:57:26+02:00,1718380646,41.373175,2.151470,,...,DJ,,,SONAR DIA - SonarVillage VIP,SonarVillage,8d394461ca7273f,9366.0,256.27,4,64.07
3,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:57:57+02:00,1718380677,41.373399,2.151526,,...,DJ,,,SONAR DIA - SonarVillage VIP,SonarVillage,8d394461ca72bbf,9366.0,256.27,4,64.07
4,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:59:15+02:00,1718380755,41.373175,2.151470,,...,DJ,,,SONAR DIA - SonarVillage VIP,SonarVillage,8d394461ca7273f,9366.0,256.27,4,64.07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1191048,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_0,2024-06-14 20:26:30+02:00,1718389590,41.372121,2.151921,"Apple, Inc.",...,,,,SONAR DIA - Lounge+D,NA-lounge+d,8d394461ca520bf,608.0,736.40,4,184.10
1255041,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,1,2_1,2024-06-14 23:06:18+02:00,1718399178,41.372263,2.152061,"Apple, Inc.",...,,,,SONAR DIA - Project Area,Project Area,8d394461ca52abf,2603.0,736.40,4,184.10
1191050,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_2,2024-06-14 23:07:24+02:00,1718399244,41.372204,2.151938,"Apple, Inc.",...,,,,SONAR DIA - Lounge+D,NA-lounge+d,8d394461ca5203f,608.0,736.40,4,184.10
1255043,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,1,2_3,2024-06-14 23:08:08+02:00,1718399288,41.372263,2.152061,"Apple, Inc.",...,,,,SONAR DIA - Project Area,Project Area,8d394461ca52abf,2603.0,736.40,4,184.10


Writing the files.

In [58]:
trajectories_events_day.to_csv(os.path.join(path_trajectories_events,'trajectories_events_day_compressed.csv'),index=False)