# Association of the *filtered* trajectories with the events

In the end, I need each point of the users' trajectories to be associated with its corresponding event. This notebook performs the operations needed for that. It takes as input the (fixed) timetables, the (fixed) GIS polygons, and the (reduced) connections.

This file first filters the connection data and then associates each record with its corresponding event.

In [6]:
import os

import pandas as pd
import geopandas as gpd
import h3

from shapely.geometry import Point

In [7]:
# Importing a custom module in a different file
import sys
# Raw string so that the backslashes in the Windows path are not read as escape sequences
sys.path.append(r'C:\Camilo\Estudio\Padova\Master thesis\master-thesis-code')
import constants

Assigning the path to read the preprocessed trajectories files:   

In [8]:
path_trajectories_preprocessed = r'..\..\Datasets\Processed\trajectories_preprocessed'

Assigning the path to write the trajectories files with their associated events (the result of this script):   

In [9]:
path_trajectories_events = r'..\..\Datasets\Processed\trajectories_events'

For both Sónar by night and Sónar by day, I first need to read the timetables.

In [10]:
# Reading the timetables and renaming two columns for clarity
sonar_timetables = pd.read_csv(r'..\..\Datasets\Processed\sonar_timetables_preprocessed.csv',
                               parse_dates = ['start_datetime','end_datetime'])
sonar_timetables.rename(columns={'title':'event_title','activity':'activity_type'}, inplace=True)

# Adding the timezone information so that the times are handled correctly
sonar_timetables['start_datetime'] = sonar_timetables['start_datetime'].dt.tz_localize('Europe/Madrid')
sonar_timetables['end_datetime'] = sonar_timetables['end_datetime'].dt.tz_localize('Europe/Madrid')
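The localization step above attaches a timezone to naive timestamps, which is different from converting between timezones. A minimal sketch with toy values (the timestamps below are hypothetical, not taken from the timetables) illustrates the difference between `tz_localize` and `tz_convert`:

```python
import pandas as pd

# Naive timestamps, as they come out of a CSV without offset information (toy values)
naive = pd.Series(pd.to_datetime(['2024-06-14 23:30:00', '2024-06-15 02:30:00']))

# tz_localize attaches a timezone and keeps the wall-clock time unchanged
localized = naive.dt.tz_localize('Europe/Madrid')

# tz_convert needs tz-aware input and re-expresses the same instant in another timezone
as_utc = localized.dt.tz_convert('UTC')

print(localized.iloc[0])
print(as_utc.iloc[0])
```

Localizing is the right call for the timetables because the CSV stores local festival times; the trajectories, which are already tz-aware, only need `tz_convert`.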

In [11]:
# Selecting only the relevant columns
sonar_timetables = sonar_timetables[['sonar_type', 'day_label', 'start_datetime', 'end_datetime',
                                     'event_title', 'activity_type', 'stage', 'music_type',
                                     'genre_grouped','views_youtube']]

In [12]:
sonar_timetables.dtypes

sonar_type                               object
day_label                                object
start_datetime    datetime64[ns, Europe/Madrid]
end_datetime      datetime64[ns, Europe/Madrid]
event_title                              object
activity_type                            object
stage                                    object
music_type                               object
genre_grouped                            object
views_youtube                           float64
dtype: object

## Association - Sónar by night process

### Associating the timetables with the polygons

As a starting point, I need an intermediate table that associates the timetables with their geographical information (contained in the polygons).

In [13]:
# Reading the clipped night polygons into a GeoDataFrame
night_polygons_clipped = gpd.read_file(r'..\..\Datasets\Processed\Zonas SONAR clipped\sonar_night_polygons_clipped.json')

In [14]:
sonar_timetables

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube
0,Sónar by Day,Thursday 13 June,2024-06-13 15:00:00+02:00,2024-06-13 16:00:00+02:00,Rumbler,Music,SonarVillage,DJ,,
1,Sónar by Day,Thursday 13 June,2024-06-13 16:05:00+02:00,2024-06-13 16:50:00+02:00,Huda,Music,SonarVillage,LIVE,,
2,Sónar by Day,Thursday 13 June,2024-06-13 17:00:00+02:00,2024-06-13 18:10:00+02:00,Olof Dreijer & Diva Cruz (DJ + Percussion set),Music,SonarVillage,LIVE,,
3,Sónar by Day,Thursday 13 June,2024-06-13 18:20:00+02:00,2024-06-13 19:00:00+02:00,Toya Delazy,Music,SonarVillage,LIVE,,
4,Sónar by Day,Thursday 13 June,2024-06-13 19:05:00+02:00,2024-06-13 20:30:00+02:00,Surusinghe,Music,SonarVillage,DJ,,
...,...,...,...,...,...,...,...,...,...,...
141,Sónar by Night,Friday 15 June,2024-06-16 01:05:00+02:00,2024-06-16 01:55:00+02:00,Club Cringe,Music,SonarCar,DJ,electronic_hypnotic,55831.0
142,Sónar by Night,Friday 15 June,2024-06-16 02:05:00+02:00,2024-06-16 02:55:00+02:00,Julietta Ferrari,Music,SonarCar,DJ,electronic_hypnotic,0.0
143,Sónar by Night,Friday 15 June,2024-06-16 03:05:00+02:00,2024-06-16 03:55:00+02:00,Soto Asa,Music,SonarCar,LIVE,other_genres,153053094.0
144,Sónar by Night,Friday 15 June,2024-06-16 04:00:00+02:00,2024-06-16 04:50:00+02:00,Drazzit,Music,SonarCar,DJ,electronic_hypnotic,22948.0


In [15]:
# In this case I use an outer join because there are events 
# with no geographic information associated to them (e.g. they happen at Room+D -I did not find the corresponding polygon-),
# or are places that are not related to events (e.g. cashless areas, restaurants, etc.)
# and I do not want to discard any of them yet
night_timetables_polygons = pd.merge(sonar_timetables.loc[sonar_timetables['sonar_type']=='Sónar by Night'], 
                                     night_polygons_clipped[['polygon_name','source_gis_file','stage', 'stage_area_m2','geometry']],
                                     how='outer', on='stage')
night_timetables_polygons.sort_values(by=['sonar_type','day_label','event_title'], inplace=True)
night_timetables_polygons = gpd.GeoDataFrame(night_timetables_polygons)
night_timetables_polygons.drop(columns='geometry').head()

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2
51,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,electronic_accessible,28697840.0,SONAR NIT - Zona VIP Club,av1-2,14914
52,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,electronic_accessible,28697840.0,SONAR NIT - Zona VIP Club Barra,av1-2,14914
53,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,electronic_accessible,28697840.0,SONAR NIT - SonarClub,p2,14914
54,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,electronic_accessible,28697840.0,SONAR NIT - SonarClub Barra la Nueva,p2,14914
55,Sónar by Night,Friday 14 June,2024-06-15 02:30:00+02:00,2024-06-15 04:00:00+02:00,Adriatique,Music,SonarClub,DJ,electronic_accessible,28697840.0,SONAR NIT - SonarClub Barra,p2,14914


I need to add a `start_datetime` and an `end_datetime` for the polygons that are not in the timetables, so that I do not lose the observations that fall in these zones when filtering by time.

In [16]:
night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna()].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2
0,,,NaT,NaT,,,NA-Cashless1,,,,SONAR NIT - Cashless 1,p2,947
1,,,NaT,NaT,,,NA-Entrada,,,,SONAR NIT - Entrada,p1,5438
2,,,NaT,NaT,,,NA-Restauración,,,,SONAR NIT - Restauración,p3,5695
3,,,NaT,NaT,,,NA-autos_choques,,,,SONAR NIT - Autos de choques,p3,1522
4,,,NaT,NaT,,,NA-autos_choques_barra,,,,SONAR NIT - Autos de choques Barra,p3,1609


In [17]:
# Adding the start time and the end_datetime as the minimum and maximum times considered for the festival
# These were defined in the 3.preprocessing_filtering_splitting file and stored in the constants.py file

start_night_1 = pd.Timestamp(constants.START_NIGHT_1_STRING, tz='Europe/Madrid')
end_night_2 = pd.Timestamp(constants.END_NIGHT_2_STRING, tz='Europe/Madrid')

night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna(),'start_datetime'] = start_night_1
night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna(),'end_datetime'] = end_night_2

# I also add some explicit labels for clarity
night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna(),'sonar_type'] = 'Sónar by Night'
night_timetables_polygons.loc[night_timetables_polygons['event_title'].isna(),'event_title'] = 'No event'

# Print to visualize the changes
night_timetables_polygons.loc[night_timetables_polygons['event_title']=='No event'].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2
0,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-Cashless1,,,,SONAR NIT - Cashless 1,p2,947
1,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-Entrada,,,,SONAR NIT - Entrada,p1,5438
2,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-Restauración,,,,SONAR NIT - Restauración,p3,5695
3,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-autos_choques,,,,SONAR NIT - Autos de choques,p3,1522
4,Sónar by Night,,2024-06-14 19:50:00+02:00,2024-06-16 08:00:00+02:00,No event,,NA-autos_choques_barra,,,,SONAR NIT - Autos de choques Barra,p3,1609
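The outer join and the filling of the "No event" rows can be sketched on toy data. This is only an illustration under assumed values: the event title `Some talk` and the fill timestamp are placeholders, and only two polygons are used.

```python
import pandas as pd

tz = 'Europe/Madrid'

# Toy timetable: one stage with a polygon, one stage without (hypothetical rows)
timetable = pd.DataFrame({
    'stage': ['SonarClub', 'Room+D'],
    'event_title': ['Adriatique', 'Some talk'],
    'start_datetime': pd.to_datetime(['2024-06-15 02:30', '2024-06-14 10:00']).tz_localize(tz),
})

# Toy polygons: one stage with events, one area that hosts no events
polygons = pd.DataFrame({
    'stage': ['SonarClub', 'NA-Entrada'],
    'polygon_name': ['SONAR NIT - SonarClub', 'SONAR NIT - Entrada'],
})

# The outer join keeps events without polygons and polygons without events
merged = pd.merge(timetable, polygons, how='outer', on='stage')

# Compute the mask once, before any assignment changes event_title
no_event = merged['event_title'].isna()
merged.loc[no_event, 'start_datetime'] = pd.Timestamp('2024-06-14 19:50', tz=tz)
merged.loc[no_event, 'event_title'] = 'No event'

print(merged)
```

Storing the mask in a variable avoids depending on the order of the assignments: once `event_title` is filled, `isna()` on it no longer selects those rows.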


### Associating each trajectory point to their corresponding stage

I can read the trajectories dataframe without scikit-mobility (I do not need any of the functionalities).

In [18]:
trajectories_night = pd.read_csv(os.path.join(path_trajectories_preprocessed, 'tdf_night_preprocessed_filtered.csv'), dtype={'vendor_name':str})
trajectories_night.shape

(1244622, 14)

Converting to a GeoDataFrame with the adequate characteristics.

In [19]:
# Getting the geometry and converting to a GeoDataFrame
trajectories_night['geometry'] = gpd.points_from_xy(trajectories_night['lng'], trajectories_night['lat'])
trajectories_night = gpd.GeoDataFrame(trajectories_night, geometry='geometry', crs=night_timetables_polygons.crs)

# Converting the date
trajectories_night['datetime'] = pd.to_datetime(trajectories_night['datetime'])
trajectories_night['datetime'] = trajectories_night['datetime'].dt.tz_convert('Europe/Madrid')  

In [20]:
trajectories_night.dtypes

uid                                                         object
macaddr_randomized                                           int64
tid                                                          int64
datetime                             datetime64[ns, Europe/Madrid]
timestamp_ap                                                 int64
lat                                                        float64
lng                                                        float64
vendor_name                                                 object
h3_cell_original                                            object
stage_original                                              object
observations_user_night_original                             int64
timespan_minutes_night_original                            float64
num_distinct_stage_night_original                            int64
minutes_per_stage_original                                 float64
geometry                                                  geometry
dtype: object

#### Final duplicates removal

The preprocessing pipeline for the filtered version of the night dataset has not explicitly excluded the duplicates based on `uid` and `datetime`. The previous steps only removed duplicates based on other columns, and duplicates based on `uid` and `datetime` with impossible speeds. A few duplicates might still exist, so I check for them.

In [21]:
duplicate_counts = trajectories_night.groupby(['uid', 'datetime']).size().reset_index(name='duplicate_count')
duplicate_counts = duplicate_counts[duplicate_counts['duplicate_count'] > 1]
duplicates_traj_night = pd.merge(trajectories_night, duplicate_counts[['uid', 'datetime']], on=['uid', 'datetime'], how='inner')


print(f'Duplicate count based on uid and datetime: {len(duplicate_counts)}')
duplicates_traj_night.drop(columns='geometry')

Duplicate count based on uid and datetime: 161


Unnamed: 0,uid,macaddr_randomized,tid,datetime,timestamp_ap,lat,lng,vendor_name,h3_cell_original,stage_original,observations_user_night_original,timespan_minutes_night_original,num_distinct_stage_night_original,minutes_per_stage_original
0,0285d1774a266b5c763a5c57e9d7ebb18a854a16d33b22...,1,2,2024-06-16 04:14:18+02:00,1718504058,41.354400,2.131886,,8d394461e8630bf,NA-autos_choques_barra,205,262.95,5,52.59
1,0285d1774a266b5c763a5c57e9d7ebb18a854a16d33b22...,1,2,2024-06-16 04:14:18+02:00,1718504058,41.354400,2.131886,,8d394461e8630bf,NA-autos_choques_barra,205,262.95,5,52.59
2,039d812282ae7d8464e3623a15d4b24a76ed63a7b54522...,0,2,2024-06-16 07:06:39+02:00,1718514399,41.354403,2.132555,"Apple, Inc.",8d394461e86e63f,SonarPub,31,187.12,3,62.37
3,039d812282ae7d8464e3623a15d4b24a76ed63a7b54522...,0,2,2024-06-16 07:06:39+02:00,1718514399,41.354403,2.132555,"Apple, Inc.",8d394461e86e63f,SonarPub,31,187.12,3,62.37
4,03ae7b0cd085d67effd22e8a74e2680b3221a094d0f01c...,1,2,2024-06-16 06:57:29+02:00,1718513849,41.353738,2.130372,,8d394461e87043f,SonarClub,2319,549.05,7,78.44
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
317,fa11c39cf01ed3a366fc7d0557d21771e3d45c7255de3e...,1,1,2024-06-15 00:10:07+02:00,1718403007,41.354425,2.133170,,8d394461e8686ff,SonarPub,13,192.37,2,96.18
318,fc248522ce71b87a066e6a944de1ece5acad3d3e16435f...,1,2,2024-06-16 07:03:47+02:00,1718514227,41.354119,2.129764,,8d394461e82933f,SonarClub,97,474.77,5,94.95
319,fc248522ce71b87a066e6a944de1ece5acad3d3e16435f...,1,2,2024-06-16 07:03:47+02:00,1718514227,41.354119,2.129764,,8d394461e82933f,SonarClub,97,474.77,5,94.95
320,fe37a7c0bffbf522a0a0f9dcff03d92e1131a759a8badd...,1,2,2024-06-16 07:18:50+02:00,1718515130,41.352954,2.129544,,8d394461e80817f,NA-Entrada,3053,511.23,8,63.90


As the remaining duplicates are exact copies, I can simply remove them.

In [22]:
print(f'Filtered trajectory shape with duplicates: {len(trajectories_night)}')

trajectories_night = trajectories_night.drop_duplicates(subset=['uid', 'datetime']) # Drop duplicates based on 'uid' and 'datetime'

print(f'Filtered trajectory shape without duplicates: {len(trajectories_night)}')

Filtered trajectory shape with duplicates: 1244622
Filtered trajectory shape without duplicates: 1244461
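The statement that the remaining duplicates are exact copies can also be verified programmatically rather than by inspection. A minimal sketch on toy data (the rows below are hypothetical) counts the distinct full rows behind each duplicated (`uid`, `datetime`) key:

```python
import pandas as pd

# Toy points: user 'a' has an exact duplicate, user 'b' has none (hypothetical rows)
points = pd.DataFrame({
    'uid': ['a', 'a', 'b', 'b'],
    'datetime': pd.to_datetime(['2024-06-15 01:00:00', '2024-06-15 01:00:00',
                                '2024-06-15 01:00:00', '2024-06-15 01:05:00']),
    'lat': [41.3544, 41.3544, 41.3530, 41.3531],
})

key = ['uid', 'datetime']

# keep=False marks every member of a duplicated group, not only the repeats
dup_rows = points[points.duplicated(subset=key, keep=False)]

# One distinct full row per key means the duplicates are exact copies
distinct_per_key = dup_rows.drop_duplicates().groupby(key).size()
all_exact = bool((distinct_per_key == 1).all())

deduplicated = points.drop_duplicates(subset=key)
print(all_exact, len(deduplicated))
```

If `all_exact` were false, dropping on the key alone would silently discard rows with different coordinates, so the check is worth running before `drop_duplicates`.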


In [23]:
print('Counts by MAC address type:')

trajectories_night.groupby('macaddr_randomized').size()

Counts by MAC address type:


macaddr_randomized
0     134850
1    1109611
dtype: int64

#### Join for the association

Performing a spatial join with just the polygons to check that the join works correctly (before performing the actual join with `night_timetables_polygons`). I check both the inner join and the left join to see if there are differences.

In [18]:
night_trajs_sjoin_left = gpd.sjoin(trajectories_night, night_polygons_clipped[['polygon_name','source_gis_file','stage','stage_area_m2','geometry']], how='left', predicate='within')

print('Shape after left join:')
night_trajs_sjoin_left.shape

Shape after left join:


(1244461, 20)

In [19]:
night_trajs_sjoin_inner = gpd.sjoin(trajectories_night, night_polygons_clipped[['polygon_name','source_gis_file','stage','stage_area_m2','geometry']], how='inner', predicate='within')

print('Shape after inner join:')
night_trajs_sjoin_inner.shape

Shape after inner join:


(1244461, 20)

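All points were correctly joined spatially: the left and inner joins both return 1,244,461 rows, the same as the deduplicated input, so every point falls within exactly one polygon.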

### Associating each trajectory point to an event

I apply the inner join with the night_timetables_polygons and obtain the stages and their corresponding event timetables.

In [20]:
trajectories_events_night = gpd.sjoin(trajectories_night, night_timetables_polygons, how='inner', predicate='within')
trajectories_events_night.shape

(13652505, 29)

To make the association with the actual events, I need to filter with the hour of the events.

In [21]:
# Keep only rows where the datetime is within the event's start and end time
# There is no overlap between the events that happen in the same stage, so I can use the <= condition on the upper bound
trajectories_events_matched_night = trajectories_events_night.loc[(trajectories_events_night['datetime'] >= trajectories_events_night['start_datetime']) &
                                                                 (trajectories_events_night['datetime'] <= trajectories_events_night['end_datetime'])]
trajectories_events_matched_night.shape

(1127063, 29)
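The two steps above (pair each point with every event of its stage, then keep the pairs whose time matches) can be sketched without geometry. This is a toy illustration: a cross join stands in for the spatial join, and the event names and times are hypothetical.

```python
import pandas as pd

tz = 'Europe/Madrid'

# Three toy points assumed to lie in the same stage polygon
points = pd.DataFrame({
    'uid': ['a', 'b', 'c'],
    'datetime': pd.to_datetime(['2024-06-15 02:45', '2024-06-15 05:00',
                                '2024-06-15 04:02']).tz_localize(tz),
})

# Two non-overlapping toy events in that stage, with a 5-minute gap between them
events = pd.DataFrame({
    'event_title': ['Event 1', 'Event 2'],
    'start_datetime': pd.to_datetime(['2024-06-15 02:30', '2024-06-15 04:05']).tz_localize(tz),
    'end_datetime': pd.to_datetime(['2024-06-15 04:00', '2024-06-15 05:30']).tz_localize(tz),
})

# Every point is paired with every event of the stage (this is why the row count grows)
candidates = points.merge(events, how='cross')

# Keep only the pairs where the point falls inside the event window (both ends inclusive)
in_window = candidates['datetime'].between(candidates['start_datetime'],
                                           candidates['end_datetime'])
matched = candidates.loc[in_window]
print(matched[['uid', 'event_title']])
```

Point `c` falls in the gap between the two events and disappears from `matched`, which is the situation handled next. With both bounds inclusive, a point stamped exactly at a shared boundary of two back-to-back events would match both, so the approach relies on the events of a stage not touching.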

Some points were matched to a polygon spatially but fall outside the time window of every event scheduled there, so the time filter discarded them. To keep those trajectory points, I take the difference between the two dataframes.

Since there are no more duplicates of `uid` and `datetime`, I can find the unmatched trajectory points, add them back to the matched trajectory points (with a specific label), and obtain the filtered `trajectories_events_night`.

In [22]:
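def get_unmatched_points(reference_df, other_df, columns_join):
    """Return the rows of reference_df whose key (columns_join) is absent from other_df."""
    # indicator=True adds a '_merge' column that flags rows found only in reference_df as 'left_only'
    merged = pd.merge(reference_df, other_df,
                      on=columns_join,
                      indicator=True, how='left')
    # .drop returns a new dataframe, so later column assignments do not act on a slice
    diff_df = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')
    return diff_df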
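The function above is an anti-join built on the `indicator` option of `pd.merge`. A minimal standalone sketch on toy data (all rows are hypothetical) shows the mechanics, including the relabelling of the unmatched rows:

```python
import pandas as pd

times = pd.to_datetime(['2024-06-15 02:45', '2024-06-15 05:00', '2024-06-15 04:02'])

# All points that were spatially joined to a stage
all_points = pd.DataFrame({
    'uid': ['a', 'b', 'c'],
    'datetime': times,
    'stage': ['SonarClub', 'SonarClub', 'SonarClub'],
})

# Points that also matched an event in time (point 'c' did not)
matched = pd.DataFrame({
    'uid': ['a', 'b'],
    'datetime': times[:2],
    'event_title': ['Event 1', 'Event 2'],
})

# Left merge with an indicator: 'left_only' rows exist in all_points but not in matched
merged = all_points.merge(matched, on=['uid', 'datetime'], how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

# event_title is NaN for these rows, so it gets an explicit label
unmatched = unmatched.assign(event_title='No event')
print(unmatched)
```

The key columns must be unique in both frames for this to be a clean difference; with repeated keys the left merge would multiply rows, which is why the duplicates on `uid` and `datetime` are removed beforehand.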

In [23]:
unmatched_points_events_night = get_unmatched_points(reference_df=night_trajs_sjoin_inner,
                                                     other_df=trajectories_events_matched_night[['uid','datetime', 'event_title']],
                                                     columns_join=['uid','datetime'])
unmatched_points_events_night.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,datetime,timestamp_ap,lat,lng,vendor_name,h3_cell_original,stage_original,observations_user_night_original,timespan_minutes_night_original,num_distinct_stage_night_original,minutes_per_stage_original,index_right,polygon_name,source_gis_file,stage,stage_area_m2,event_title
80,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:33:11+02:00,1718487191,41.354105,2.129860,,8d394461e82927f,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
81,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:33:25+02:00,1718487205,41.354046,2.129898,,8d394461e8292ff,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
82,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:33:36+02:00,1718487216,41.353951,2.129868,,8d394461e87693f,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
83,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:33:42+02:00,1718487222,41.353923,2.129927,,8d394461e87683f,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
84,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 23:34:31+02:00,1718487271,41.353923,2.129927,,8d394461e87683f,SonarClub,586,340.90,7,48.70,9,SONAR NIT - SonarClub,p2,SonarClub,14914,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1244186,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 03:22:10+02:00,1718414530,41.354137,2.130741,,8d394461e8757bf,SonarLab x Printworks,714,510.43,5,102.09,2,SONAR NIT - SonarLab,av2-3,SonarLab x Printworks,9171,
1244187,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 03:23:09+02:00,1718414589,41.354479,2.130749,,8d394461e87587f,SonarLab x Printworks,714,510.43,5,102.09,2,SONAR NIT - SonarLab,av2-3,SonarLab x Printworks,9171,
1244188,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 03:23:57+02:00,1718414637,41.354137,2.130741,,8d394461e8757bf,SonarLab x Printworks,714,510.43,5,102.09,2,SONAR NIT - SonarLab,av2-3,SonarLab x Printworks,9171,
1244189,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 03:23:59+02:00,1718414639,41.354137,2.130741,,8d394461e8757bf,SonarLab x Printworks,714,510.43,5,102.09,2,SONAR NIT - SonarLab,av2-3,SonarLab x Printworks,9171,


Assigning an explicit label for the event title and concatenating the two dataframes into the final `trajectories_events_night`.

In [24]:
unmatched_points_events_night['event_title'] = 'No event'

**The points were correctly associated with their corresponding events.**

In [25]:
trajectories_events_night = pd.concat([trajectories_events_matched_night, unmatched_points_events_night]).sort_values(by=['uid','datetime'])
trajectories_events_night.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,datetime,timestamp_ap,lat,lng,vendor_name,h3_cell_original,stage_original,...,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2
0,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:38:47+02:00,1718483927,41.353501,2.129162,,8d394461e82b5bf,NA-Entrada,...,2024-06-16 08:00:00+02:00,No event,,NA-Entrada,,,,SONAR NIT - Entrada,p1,5438
1,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:39:19+02:00,1718483959,41.353440,2.129005,,8d394461e82a67f,NA-Entrada,...,2024-06-16 08:00:00+02:00,No event,,NA-Entrada,,,,SONAR NIT - Entrada,p1,5438
2,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:40:37+02:00,1718484037,41.353501,2.129162,,8d394461e82b5bf,NA-Entrada,...,2024-06-16 08:00:00+02:00,No event,,NA-Entrada,,,,SONAR NIT - Entrada,p1,5438
3,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:49:15+02:00,1718484555,41.353426,2.128963,,8d394461e82a67f,NA-Entrada,...,2024-06-16 08:00:00+02:00,No event,,NA-Entrada,,,,SONAR NIT - Entrada,p1,5438
4,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:49:21+02:00,1718484561,41.353365,2.128970,,8d394461e82a6ff,NA-Entrada,...,2024-06-16 08:00:00+02:00,No event,,NA-Entrada,,,,SONAR NIT - Entrada,p1,5438
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1244617,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:29:12+02:00,1718425752,41.354254,2.130618,,8d394461e875cff,SonarLab x Printworks,...,2024-06-15 07:00:00+02:00,DJ Flight & MC Chickaboo,Music,SonarLab x Printworks,DJ,electronic_hypnotic,0.0,SONAR NIT - SonarLab,av2-3,9171
1244618,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:29:14+02:00,1718425754,41.354254,2.130618,,8d394461e875cff,SonarLab x Printworks,...,2024-06-15 07:00:00+02:00,DJ Flight & MC Chickaboo,Music,SonarLab x Printworks,DJ,electronic_hypnotic,0.0,SONAR NIT - SonarLab,av2-3,9171
1244619,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:31:02+02:00,1718425862,41.354254,2.130618,,8d394461e875cff,SonarLab x Printworks,...,2024-06-15 07:00:00+02:00,DJ Flight & MC Chickaboo,Music,SonarLab x Printworks,DJ,electronic_hypnotic,0.0,SONAR NIT - SonarLab,av2-3,9171
1244620,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:31:04+02:00,1718425864,41.354254,2.130618,,8d394461e875cff,SonarLab x Printworks,...,2024-06-15 07:00:00+02:00,DJ Flight & MC Chickaboo,Music,SonarLab x Printworks,DJ,electronic_hypnotic,0.0,SONAR NIT - SonarLab,av2-3,9171


### Associating each updated trajectory point to their corresponding H3 cells

I also compute the updated H3 cell directly with the `h3` library and compare it with the original values to check that everything is correct.

In [26]:
trajectories_events_night['h3_cell'] = [h3.latlng_to_cell(lat, lng, 13) for lat, lng in zip(trajectories_events_night['lat'], trajectories_events_night['lng'])]

In [27]:
print(f"There are {(trajectories_events_night['h3_cell_original'] != trajectories_events_night['h3_cell']).sum()} points that changed their h3_cell after the trajectory filtering.")


There are 0 points that changed their h3_cell after the trajectory filtering.


## Association - Sónar by day process

#### Adjusting the timetables

I found that some Sónar by Day events overlap in time and take place in the same space, and the location data is not granular enough to distinguish the areas where each of them happened. For these cases, I keep only the more general (and still correct) 'Project Area' entry.

In [25]:
sonar_timetables = sonar_timetables.loc[(sonar_timetables['stage'] != 'Project Area') |
                                        ((sonar_timetables['stage'] == 'Project Area') & (sonar_timetables['event_title'] == 'Project Area'))]
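The condition in this filter reads "keep every row outside the Project Area stage, plus the generic Project Area row". A small sketch on toy rows (the event titles other than 'Project Area' are hypothetical) shows which rows survive and that a shorter, logically equivalent mask gives the same result:

```python
import pandas as pd

# Toy timetable: two rows in the Project Area stage, one elsewhere, one without a stage
timetable = pd.DataFrame({
    'stage': ['Project Area', 'Project Area', 'SonarHall', None],
    'event_title': ['Project Area', 'Hypothetical demo',
                    'Hypothetical concert', 'Hypothetical talk'],
})

is_project_stage = timetable['stage'] == 'Project Area'
is_generic_row = timetable['event_title'] == 'Project Area'

# Mask written as in the cell above
mask_original = (timetable['stage'] != 'Project Area') | (is_project_stage & is_generic_row)

# Equivalent shorter form: (not A) or (A and B) is the same as (not A) or B
mask_short = ~is_project_stage | is_generic_row

filtered = timetable.loc[mask_short]
print(filtered)
```

Rows with a missing stage are kept by both forms, because a comparison against a missing value evaluates to "not equal".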

As a starting point, I need an intermediate table that associates the timetables with their geographical information (contained in the polygons).

In [26]:
# Reading the clipped day polygons into a GeoDataFrame
day_polygons_clipped = gpd.read_file(r'..\..\Datasets\Processed\Zonas SONAR clipped\sonar_day_polygons_clipped.json')

#### Adding the floor number to the stages in day_polygons_clipped

In [27]:
# Mapping the 'source_gis_file' column to the corresponding floor values (the default is floor 0)

floor_assignment = {'p5.2': 2, 'p5.1': 1}

day_polygons_clipped['polygon_floor'] = day_polygons_clipped['source_gis_file'].map(floor_assignment).fillna(0).astype('Int8')
day_polygons_clipped.drop(columns='geometry')

Unnamed: 0,id,polygon_name,index,source_gis_file,stage,stage_area_m2,polygon_floor
0,88259806-b543-45e9-b0e9-87b0ac826ce6,SONAR DIA - SonarPark,0,p1,SonarPark,1914,0
1,036a9d19-b2a9-45a9-aa74-0569fc82ba8c,SONAR DIA - SonarPark Barra,1,p1,SonarPark,1914,0
2,aa598ed7-0b28-444d-8564-b434e0e34b82,SONAR DIA - SonarHall Paso,0,p2,NA-sonar_hall_paso,6973,0
3,cfcf08c7-1230-4df9-9309-6ef436090d99,SONAR DIA - SonarHall,1,p2,SonarHall,1319,0
4,97acc96a-1160-456d-abe7-4046ec78fc41,SONAR DIA - Food Trucks,2,p2,NA-food_trucks,1714,0
5,a03c6c6f-b506-4915-bf7f-fc609b2e20e9,SONAR DIA - Stage+D,3,p2,Stage+D,443,0
6,f10d8100-ef6d-468b-acc4-09f8497cac7c,SONAR DIA - SonarVillage,0,p3,SonarVillage,9366,0
7,0e9222e8-38be-49bf-bbdd-46a37ce4a899,SONAR DIA - SonarVillage VIP,1,p3,SonarVillage,9366,0
8,dc79c74e-b7e6-4a8d-9b6d-f8d20138e8c9,SONAR DIA - SonarVillage Barra 2,2,p3,SonarVillage,9366,0
9,fd73b9fb-8b96-473e-80ff-b648b8a18005,SONAR DIA - SonarVillage Barra 1,3,p3,SonarVillage,9366,0
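The floor assignment relies on `map` returning a missing value for every key that is not in the dictionary, which `fillna(0)` then turns into the default floor. A minimal sketch on a toy column (only the `source_gis_file` values are reused from the output above):

```python
import pandas as pd

polygons = pd.DataFrame({'source_gis_file': ['p1', 'p5.2', 'p5.1', 'p5.0']})

# Only the upper floors are listed; everything else defaults to floor 0
floor_assignment = {'p5.2': 2, 'p5.1': 1}

# Keys missing from the dictionary map to NaN
mapped = polygons['source_gis_file'].map(floor_assignment)

# Fill the default and cast to the nullable small-integer dtype
polygons['polygon_floor'] = mapped.fillna(0).astype('Int8')
print(polygons)
```

A side effect of this pattern is that a misspelled file name would silently end up on floor 0, so it is worth checking the distinct values of `source_gis_file` once.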


### Associating the timetables with the polygons

In [31]:
# In this case I use an outer join because there are events 
# with no geographic information associated to them (e.g. they happen at Room+D -I did not find the corresponding polygon-),
# or are places that are not related to events (e.g. cashless areas, restaurants, etc.)
# and I do not want to discard any of them yet
day_timetables_polygons = pd.merge(sonar_timetables.loc[sonar_timetables['sonar_type']=='Sónar by Day'],
                                   day_polygons_clipped[['polygon_name','source_gis_file','stage','stage_area_m2','polygon_floor','geometry']],
                                   how='outer', on='stage')
day_timetables_polygons.sort_values(by=['sonar_type','day_label','event_title'], inplace=True)
day_timetables_polygons = gpd.GeoDataFrame(day_timetables_polygons)
day_timetables_polygons.drop(columns='geometry').head(5)

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
25,Sónar by Day,Friday 14 June,2024-06-14 10:00:00+02:00,2024-06-14 14:00:00+02:00,AI & WEB3 Creative Summit,Talk,Room+D,,,,,,,
54,Sónar by Day,Friday 14 June,2024-06-14 16:55:00+02:00,2024-06-14 17:40:00+02:00,AMORE,Music,SonarPark,LIVE,,,SONAR DIA - SonarPark,p1,1914.0,0.0
55,Sónar by Day,Friday 14 June,2024-06-14 16:55:00+02:00,2024-06-14 17:40:00+02:00,AMORE,Music,SonarPark,LIVE,,,SONAR DIA - SonarPark Barra,p1,1914.0,0.0
3,Sónar by Day,Friday 14 June,2024-06-14 16:15:00+02:00,2024-06-14 17:00:00+02:00,Adelaida presents 'Muérdago',Music,Complex+D,LIVE,,,SONAR DIA - SonarComplex,p5.2,1092.0,2.0
52,Sónar by Day,Friday 14 June,2024-06-14 15:45:00+02:00,2024-06-14 16:45:00+02:00,Akazie,Music,SonarPark,DJ,,,SONAR DIA - SonarPark,p1,1914.0,0.0


I need to add a `start_datetime` and an `end_datetime` for the polygons that are not in the timetables, so that I do not lose the observations that fall in these zones when filtering by time.

In [32]:
day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna()].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
12,,,NaT,NaT,,,NA-cashless,,,,SONAR DIA - Cashless,p4,854.0,0
13,,,NaT,NaT,,,NA-food_trucks,,,,SONAR DIA - Food Trucks,p2,1714.0,0
14,,,NaT,NaT,,,NA-lounge+d,,,,SONAR DIA - Lounge+D,p5.0,608.0,0
15,,,NaT,NaT,,,NA-lounge_barra,,,,SONAR DIA - Lounge Barra,p5.0,101.0,0
16,,,NaT,NaT,,,NA-sonar_hall_paso,,,,SONAR DIA - SonarHall Paso,p2,6973.0,0


In [33]:
# Adding the start time and the end_datetime as the minimum and maximum times considered for the festival
# These were defined in the 3.preprocessing_filtering_splitting file and stored in the constants.py file

start_day_1 = pd.Timestamp(constants.START_DAY_1_STRING, tz='Europe/Madrid')
end_day_3 = pd.Timestamp(constants.END_DAY_3_STRING, tz='Europe/Madrid')

day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna(),'start_datetime'] = start_day_1
day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna(),'end_datetime'] = end_day_3

# I also add some explicit labels for clarity
day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna(),'sonar_type'] = 'Sónar by Day'
day_timetables_polygons.loc[day_timetables_polygons['event_title'].isna(),'event_title'] = 'No event'

# Print to visualize the changes
day_timetables_polygons.loc[day_timetables_polygons['event_title']=='No event'].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
12,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-cashless,,,,SONAR DIA - Cashless,p4,854.0,0
13,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-food_trucks,,,,SONAR DIA - Food Trucks,p2,1714.0,0
14,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-lounge+d,,,,SONAR DIA - Lounge+D,p5.0,608.0,0
15,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-lounge_barra,,,,SONAR DIA - Lounge Barra,p5.0,101.0,0
16,Sónar by Day,,2024-06-13 09:30:00+02:00,2024-06-16 00:00:00+02:00,No event,,NA-sonar_hall_paso,,,,SONAR DIA - SonarHall Paso,p2,6973.0,0


In Sónar by Day, there are some events that will not be geographically matched because there is no exact reference of where they happened. 

In [34]:
day_timetables_polygons.loc[day_timetables_polygons['polygon_name'].isna()].drop(columns='geometry')

Unnamed: 0,sonar_type,day_label,start_datetime,end_datetime,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
25,Sónar by Day,Friday 14 June,2024-06-14 10:00:00+02:00,2024-06-14 14:00:00+02:00,AI & WEB3 Creative Summit,Talk,Room+D,,,,,,,
26,Sónar by Day,Friday 14 June,2024-06-14 18:00:00+02:00,2024-06-14 19:00:00+02:00,AlphaTheta presents 'Euphonia' Workshop,Networking,Room+D,,,,,,,
10,Sónar by Day,Friday 14 June,2024-06-14 15:00:00+02:00,2024-06-14 21:00:00+02:00,Espai Oníric,Exhibition,Espai Oníric,,,,,,,
27,Sónar by Day,Friday 14 June,2024-06-14 16:00:00+02:00,2024-06-14 18:00:00+02:00,Future of Music with Revelator Labs & MUSIC x:...,Talk,Room+D 2,,,,,,,
11,Sónar by Day,Saturday 15 June,2024-06-15 15:00:00+02:00,2024-06-15 21:00:00+02:00,Espai Oníric,Exhibition,Espai Oníric,,,,,,,
24,Sónar by Day,Thursday 13 June,2024-06-13 18:00:00+02:00,2024-06-13 19:00:00+02:00,All Our Minds Workshop,Workshop,Room+D,,,,,,,
9,Sónar by Day,Thursday 13 June,2024-06-13 15:00:00+02:00,2024-06-13 21:00:00+02:00,Espai Oníric,Exhibition,Espai Oníric,,,,,,,
23,Sónar by Day,Thursday 13 June,2024-06-13 11:00:00+02:00,2024-06-13 13:30:00+02:00,Music Tech Sessions,Networking,Room+D,,,,,,,


### Associating each trajectory point to their corresponding stage

I can read the trajectories dataframe without scikit-mobility (I do not need any of the functionalities).

In [35]:
trajectories_day = pd.read_csv(os.path.join(path_trajectories_preprocessed, 'tdf_day_preprocessed_filtered.csv'), 
                               dtype={'floor_num_added':'Int8', 'vendor_name':str})
trajectories_day.shape

(2345995, 16)

Converting to a GeoDataFrame with the same CRS as the polygons and a timezone-aware `datetime` column.

In [36]:
# Getting the geometry and converting to a GeoDataFrame
trajectories_day['geometry'] = gpd.points_from_xy(trajectories_day['lng'], trajectories_day['lat'])
# Using the CRS of the day polygons, since these points are later joined with them
trajectories_day = gpd.GeoDataFrame(trajectories_day, geometry='geometry', crs=day_timetables_polygons.crs)

# Converting the date
trajectories_day['datetime'] = pd.to_datetime(trajectories_day['datetime'])
trajectories_day['datetime'] = trajectories_day['datetime'].dt.tz_convert('Europe/Madrid')  

In [37]:
trajectories_day.dtypes

uid                                                       object
macaddr_randomized                                         int64
tid                                                        int64
floor_num_added                                             Int8
label_day_floor_change_id                                 object
datetime                           datetime64[ns, Europe/Madrid]
timestamp_ap                                               int64
lat                                                      float64
lng                                                      float64
vendor_name                                               object
h3_cell_original                                          object
stage_original                                            object
observations_user_day_original                             int64
timespan_minutes_day_original                            float64
num_distinct_stage_day_original                            int64
minutes_per_stage_origina

Performing a spatial join with just the polygons to check that the join works as expected (before performing the actual join with `day_timetables_polygons`). I check both the inner join and the left join to see whether they differ. At this point, a point can be associated with multiple polygons because polygons overlap in the multi-floor area. These extra matches are filtered out later.

In [38]:
day_trajs_sjoin_left = gpd.sjoin(trajectories_day, day_polygons_clipped[['polygon_name','source_gis_file','polygon_floor','stage','stage_area_m2','geometry']], how='left', predicate='within')


print('Shape after left join:')
day_trajs_sjoin_left.shape

Shape after left join:


(2548150, 23)

In [39]:
day_trajs_sjoin_inner = gpd.sjoin(trajectories_day, day_polygons_clipped[['polygon_name','source_gis_file','polygon_floor','stage','stage_area_m2','geometry']], how='inner', predicate='within')

print('Shape after inner join:')
day_trajs_sjoin_inner.shape

Shape after inner join:


(2548150, 23)

Both joins return the same number of rows, so every point falls within at least one polygon: all points were correctly joined spatially.
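
The same check can be made explicit by counting the points that found no polygon. Below is a minimal sketch on toy data, using a plain pandas merge as a stand-in for the spatial join; the `zone` key and the two small frames are hypothetical and only illustrate the logic.

```python
import pandas as pd

# Toy stand-ins: three points tagged with the zone they fall in, and a
# polygon table in which zone "B" exists on two floors (overlapping polygons).
points = pd.DataFrame({"point_id": [1, 2, 3], "zone": ["A", "B", "C"]})
polygons = pd.DataFrame({"zone": ["A", "B", "B"], "polygon_floor": [0, 0, 1]})

left = points.merge(polygons, on="zone", how="left", indicator=True)
inner = points.merge(polygons, on="zone", how="inner")

# Rows that exist only on the left side are points without any polygon.
n_unmatched = int((left["_merge"] == "left_only").sum())
same_shape = len(left) == len(inner)

print(f"Left rows: {len(left)}, inner rows: {len(inner)}, unmatched points: {n_unmatched}")
```

In this toy case the two joins differ by exactly the one unmatched point, while the overlapping zone duplicates point 2 in both joins. With `gpd.sjoin(..., how='left')`, unmatched points would instead show up as missing values in `index_right`.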

### Associating each trajectory point to an event

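I apply the inner join with `day_timetables_polygons` to obtain, for each point, its stage and the event timetable of that stage. Each point is paired with every event row of the polygon it falls in, so the number of rows grows considerably at this step.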

In [40]:
trajectories_events_day = gpd.sjoin(trajectories_day, day_timetables_polygons, how='inner', predicate='within')
trajectories_events_day.shape

(33641971, 32)

To associate each point with the actual event, I need to filter by the time of the events and by the floor number.

In [41]:
# Keep only rows where the datetime is within the event's start and end time
# There is no overlap between the events that happen in the same stage, so I can use the <= condition on the upper bound
trajectories_events_matched_day = trajectories_events_day.loc[(trajectories_events_day['datetime'] >= trajectories_events_day['start_datetime']) &
                                                                 (trajectories_events_day['datetime'] <= trajectories_events_day['end_datetime']) &
                                                                 (trajectories_events_day['floor_num_added'] == trajectories_events_day['polygon_floor'])]
trajectories_events_matched_day.shape

(1987452, 32)
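
The filter above can be illustrated on a handful of rows. This is a minimal sketch with hypothetical candidate pairs (the `point_id` column and all values are made up); it keeps a pair only when the point's timestamp lies inside the event window and the floors agree.

```python
import pandas as pd

tz = "Europe/Madrid"

# Hypothetical candidate pairs, as produced by joining points with event rows:
# point 1 is paired with two consecutive events of the same stage.
candidates = pd.DataFrame({
    "point_id": [1, 1, 2, 3],
    "datetime": pd.to_datetime(["2024-06-14 17:45", "2024-06-14 17:45",
                                "2024-06-14 18:30", "2024-06-14 17:50"]).tz_localize(tz),
    "start_datetime": pd.to_datetime(["2024-06-14 17:00", "2024-06-14 18:00",
                                      "2024-06-14 18:00", "2024-06-14 17:00"]).tz_localize(tz),
    "end_datetime": pd.to_datetime(["2024-06-14 18:00", "2024-06-14 19:00",
                                    "2024-06-14 19:00", "2024-06-14 18:00"]).tz_localize(tz),
    "floor_num_added": pd.array([0, 0, 0, 1], dtype="Int8"),
    "polygon_floor": [0, 0, 0, 0],
})

# Inclusive on both bounds, like the >= and <= conditions in the cell above.
in_time = candidates["datetime"].between(candidates["start_datetime"],
                                         candidates["end_datetime"])
# A missing floor number never counts as a match.
same_floor = (candidates["floor_num_added"] == candidates["polygon_floor"]).fillna(False)

matched = candidates.loc[in_time & same_floor]
print(matched["point_id"].tolist())
```

Point 1 keeps only the event whose window contains 17:45, point 2 is kept, and point 3 is dropped because it is on a different floor than the polygon.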

There are points that were matched geographically but were discarded by the event-time filter. To keep those trajectory points, I can take the difference between the two dataframes.

Since there are no longer duplicates of `uid` and `datetime` (these were explicitly removed in the preprocessing-trajectories script for Sónar by Day), I can find the unmatched trajectory points and add them back to the matched trajectory points (with a specific label) to obtain the filtered `trajectories_events_day`.

In [42]:
# There are no duplicated trajectory points based on 'uid' and 'datetime' (outside the multi-floor files p5.0, p5.1 and p5.2)
duplicates_traj_day = day_trajs_sjoin_inner.loc[~day_trajs_sjoin_inner['source_gis_file'].isin(['p5.0','p5.1','p5.2'])].groupby(['uid', 'datetime']).size().reset_index(name='duplicate_count')
print(f'Duplicate count based on uid and datetime: {len(duplicates_traj_day[duplicates_traj_day["duplicate_count"] > 1])}')

Duplicate count based on uid and datetime: 0
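
The same uniqueness check can also be written with `DataFrame.duplicated`. The sketch below uses hypothetical rows and is only an alternative formulation, not the notebook's own code.

```python
import pandas as pd

# Hypothetical joined points: the key (u1, 10:00) appears twice.
joined = pd.DataFrame({
    "uid": ["u1", "u1", "u2"],
    "datetime": pd.to_datetime(["2024-06-14 10:00", "2024-06-14 10:00",
                                "2024-06-14 10:05"]),
    "polygon_name": ["P1", "P2", "P1"],
})

# keep=False flags every member of a duplicated (uid, datetime) group.
dup_mask = joined.duplicated(subset=["uid", "datetime"], keep=False)
n_duplicated_keys = joined.loc[dup_mask, ["uid", "datetime"]].drop_duplicates().shape[0]

print(f"Keys with more than one row: {n_duplicated_keys}")
```

A result of 0 on the real data would correspond to the "Duplicate count" of 0 printed above.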


In Sónar by Day, I also need to filter out the unmatched points that do not belong to the actual floor (this must be done after associating the trajectories with the polygons).

In [43]:
# Getting the unmatched points with the function defined above
unmatched_points_events_day = get_unmatched_points(reference_df=day_trajs_sjoin_inner,
                                                     other_df=trajectories_events_matched_day[['uid','datetime', 'event_title']],
                                                     columns_join=['uid','datetime'])

# Filtering out the points whose floor does not correspond to the floor of the joined polygon
unmatched_points_events_day = unmatched_points_events_day.loc[unmatched_points_events_day['floor_num_added'] == unmatched_points_events_day['polygon_floor']]

unmatched_points_events_day.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,floor_num_added,label_day_floor_change_id,datetime,timestamp_ap,lat,lng,vendor_name,...,timespan_minutes_day_original,num_distinct_stage_day_original,minutes_per_stage_original,index_right,polygon_name,source_gis_file,polygon_floor,stage,stage_area_m2,event_title
17,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:00:22+02:00,1718380822,41.373378,2.152060,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
18,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:02:12+02:00,1718380932,41.373378,2.152060,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
19,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:02:21+02:00,1718380941,41.373387,2.151835,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
20,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:02:23+02:00,1718380943,41.373387,2.151835,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
21,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 18:02:34+02:00,1718380954,41.373582,2.151956,,...,256.27,4,64.07,6,SONAR DIA - SonarVillage,p3,0,SonarVillage,9366,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2548131,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_0,2024-06-14 11:57:42+02:00,1718359062,41.372575,2.151591,"Apple, Inc.",...,736.40,4,184.10,12,SONAR DIA - SonarÀgora,p5.0,0,SonarÀgora,583,
2548133,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_0,2024-06-14 11:59:31+02:00,1718359171,41.372575,2.151591,"Apple, Inc.",...,736.40,4,184.10,12,SONAR DIA - SonarÀgora,p5.0,0,SonarÀgora,583,
2548135,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_0,2024-06-14 17:29:18+02:00,1718378958,41.372568,2.151624,"Apple, Inc.",...,736.40,4,184.10,12,SONAR DIA - SonarÀgora,p5.0,0,SonarÀgora,583,
2548141,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,1,2_1,2024-06-14 23:06:18+02:00,1718399178,41.372263,2.152061,"Apple, Inc.",...,736.40,4,184.10,14,SONAR DIA - Project Area,p5.1,1,Project Area,2603,
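
The idea of "finding the difference between the dataframes" is an anti-join on `uid` and `datetime`. A minimal sketch follows; `anti_join_sketch` is a hypothetical helper written for illustration, and the actual `get_unmatched_points` defined earlier in the notebook may be implemented differently.

```python
import pandas as pd

def anti_join_sketch(reference_df, other_df, columns_join):
    """Return the rows of reference_df whose key is absent from other_df.

    Hypothetical stand-in; the notebook's get_unmatched_points may differ.
    """
    keys = other_df[columns_join].drop_duplicates()
    merged = reference_df.merge(keys, on=columns_join, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Toy data: three joined points, only one of which was matched to an event.
reference = pd.DataFrame({"uid": ["u1", "u1", "u2"],
                          "datetime": [1, 2, 1],
                          "stage": ["S1", "S1", "S2"]})
matched = pd.DataFrame({"uid": ["u1"], "datetime": [2], "event_title": ["Coco Em"]})

unmatched = anti_join_sketch(reference, matched, ["uid", "datetime"])
unmatched["event_title"] = "No event"  # explicit label for points outside any event
print(unmatched)
```

This only works cleanly when the key is unique in the reference dataframe, which is why the duplicate check on `uid` and `datetime` comes first.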


Assigning an explicit label for the event title and concatenating the two dataframes into the final `trajectories_events_day`.

In [44]:
unmatched_points_events_day['event_title'] = 'No event'

In [45]:
trajectories_events_day = pd.concat([trajectories_events_matched_day, unmatched_points_events_day]).sort_values(by=['uid','datetime'])
trajectories_events_day.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,floor_num_added,label_day_floor_change_id,datetime,timestamp_ap,lat,lng,vendor_name,...,event_title,activity_type,stage,music_type,genre_grouped,views_youtube,polygon_name,source_gis_file,stage_area_m2,polygon_floor
0,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:45:49+02:00,1718379949,41.373153,2.151547,,...,Coco Em,Music,SonarVillage,DJ,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
1,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:46:19+02:00,1718379979,41.373153,2.151547,,...,Coco Em,Music,SonarVillage,DJ,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
2,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:47:11+02:00,1718380031,41.373153,2.151547,,...,Coco Em,Music,SonarVillage,DJ,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
3,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:47:20+02:00,1718380040,41.373153,2.151547,,...,Coco Em,Music,SonarVillage,DJ,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
4,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:47:35+02:00,1718380055,41.373153,2.151547,,...,Coco Em,Music,SonarVillage,DJ,,,SONAR DIA - SonarVillage VIP,p3,9366.0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2548145,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,1,2_3,2024-06-14 23:08:08+02:00,1718399288,41.372263,2.152061,"Apple, Inc.",...,No event,,Project Area,,,,SONAR DIA - Project Area,p5.1,2603.0,1
2345991,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_4,2024-06-14 23:09:13+02:00,1718399353,41.372204,2.151938,"Apple, Inc.",...,No event,,NA-lounge+d,,,,SONAR DIA - Lounge+D,p5.0,608.0,0
2345992,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_4,2024-06-14 23:09:36+02:00,1718399376,41.372204,2.151938,"Apple, Inc.",...,No event,,NA-lounge+d,,,,SONAR DIA - Lounge+D,p5.0,608.0,0
2345993,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_4,2024-06-14 23:09:54+02:00,1718399394,41.372204,2.151938,"Apple, Inc.",...,No event,,NA-lounge+d,,,,SONAR DIA - Lounge+D,p5.0,608.0,0


**The points were correctly associated with their corresponding events.**

### Associating each updated trajectory point with its corresponding H3 cell

I also get the updated H3 cell directly from the h3 API and compare it with the original values to check that everything is correct.

In [46]:
trajectories_events_day['h3_cell'] = [h3.latlng_to_cell(lat, lng, 13) for lat, lng in zip(trajectories_events_day['lat'], trajectories_events_day['lng'])]

In [47]:
print(f"There are {(trajectories_events_day['h3_cell_original'] != trajectories_events_day['h3_cell']).sum()} points that changed their h3_cell after the trajectory filtering.")

There are 0 points that changed their h3_cell after the trajectory filtering.
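
One caveat of comparing with `!=` is that a pair of missing values counts as a change, because NaN is not equal to itself. That did not happen here (the count is 0), but a comparison that treats two missing values as equal could be sketched as follows, on a small hypothetical frame.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one unchanged cell, one changed cell, one pair of missing values.
cells = pd.DataFrame({
    "h3_cell_original": ["8d394461ca7267f", "8d394461ca5203f", np.nan],
    "h3_cell":          ["8d394461ca7267f", "8d394461ca52abf", np.nan],
})

# Naive comparison: the pair of missing values is counted as a change.
naive_changes = int((cells["h3_cell_original"] != cells["h3_cell"]).sum())

# Comparison that ignores rows where both values are missing.
both_missing = cells["h3_cell_original"].isna() & cells["h3_cell"].isna()
changed = (cells["h3_cell_original"] != cells["h3_cell"]) & ~both_missing
real_changes = int(changed.sum())

print(f"Naive: {naive_changes}, ignoring missing pairs: {real_changes}")
```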


## Writing the trajectories with their associated events 

### Writing the Sónar by night files

Selecting the final columns that will be analyzed.

In [48]:
trajectories_events_night.columns

Index(['uid', 'macaddr_randomized', 'tid', 'datetime', 'timestamp_ap', 'lat',
       'lng', 'vendor_name', 'h3_cell_original', 'stage_original',
       'observations_user_night_original', 'timespan_minutes_night_original',
       'num_distinct_stage_night_original', 'minutes_per_stage_original',
       'geometry', 'index_right', 'sonar_type', 'day_label', 'start_datetime',
       'end_datetime', 'event_title', 'activity_type', 'stage', 'music_type',
       'genre_grouped', 'views_youtube', 'polygon_name', 'source_gis_file',
       'stage_area_m2', 'h3_cell'],
      dtype='object')

In [None]:
selected_columns_night = ['uid', 'macaddr_randomized',
                          'tid',                               # Corresponds to the renamed label_night 
                          'datetime', 'timestamp_ap',          # Both formats if I need to do quick computations with the trajectories' timestamps
                          'lat', 'lng', 
                          'vendor_name', 
                          'sonar_type',                        # 'day_label', I discard the day_label column from the timetables to avoid confusions 
                          'start_datetime', 'end_datetime',    # Start and end of the events 
                          'event_title','music_type',          # activity_type is always Music in Sónar by night
                          'genre_grouped','views_youtube', 
                          'polygon_name','stage', 'stage_area_m2', 'h3_cell', # Location related columns
                          'observations_user_night_original','timespan_minutes_night_original', # Old metrics obtained before trajectory preprocessing
                          'num_distinct_stage_night_original', 'minutes_per_stage_original',    # Old metrics obtained before trajectory preprocessing
                          'geometry'
                          ]
trajectories_events_night = trajectories_events_night[selected_columns_night]
trajectories_events_night.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,datetime,timestamp_ap,lat,lng,vendor_name,sonar_type,start_datetime,...,genre_grouped,views_youtube,polygon_name,stage,stage_area_m2,h3_cell,observations_user_night_original,timespan_minutes_night_original,num_distinct_stage_night_original,minutes_per_stage_original
0,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:38:47+02:00,1718483927,41.353501,2.129162,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82b5bf,586,340.90,7,48.70
1,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:39:19+02:00,1718483959,41.353440,2.129005,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82a67f,586,340.90,7,48.70
2,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:40:37+02:00,1718484037,41.353501,2.129162,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82b5bf,586,340.90,7,48.70
3,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:49:15+02:00,1718484555,41.353426,2.128963,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82a67f,586,340.90,7,48.70
4,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,2024-06-15 22:49:21+02:00,1718484561,41.353365,2.128970,,Sónar by Night,2024-06-14 19:50:00+02:00,...,,,SONAR NIT - Entrada,NA-Entrada,5438,8d394461e82a6ff,586,340.90,7,48.70
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1244617,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:29:12+02:00,1718425752,41.354254,2.130618,,Sónar by Night,2024-06-15 05:30:00+02:00,...,electronic_hypnotic,0.0,SONAR NIT - SonarLab,SonarLab x Printworks,9171,8d394461e875cff,714,510.43,5,102.09
1244618,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:29:14+02:00,1718425754,41.354254,2.130618,,Sónar by Night,2024-06-15 05:30:00+02:00,...,electronic_hypnotic,0.0,SONAR NIT - SonarLab,SonarLab x Printworks,9171,8d394461e875cff,714,510.43,5,102.09
1244619,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:31:02+02:00,1718425862,41.354254,2.130618,,Sónar by Night,2024-06-15 05:30:00+02:00,...,electronic_hypnotic,0.0,SONAR NIT - SonarLab,SonarLab x Printworks,9171,8d394461e875cff,714,510.43,5,102.09
1244620,fff1c4c048bd5253bb7c3996ed466e303c6b8253a93dbe...,1,1,2024-06-15 06:31:04+02:00,1718425864,41.354254,2.130618,,Sónar by Night,2024-06-15 05:30:00+02:00,...,electronic_hypnotic,0.0,SONAR NIT - SonarLab,SonarLab x Printworks,9171,8d394461e875cff,714,510.43,5,102.09


Writing the files.

In [50]:
trajectories_events_night.to_csv(os.path.join(path_trajectories_events,'trajectories_events_night_filtered.csv'),index=False)

### Writing the Sónar by day files

Selecting the final columns that will be analyzed.

In [51]:
trajectories_events_day.columns

Index(['uid', 'macaddr_randomized', 'tid', 'floor_num_added',
       'label_day_floor_change_id', 'datetime', 'timestamp_ap', 'lat', 'lng',
       'vendor_name', 'h3_cell_original', 'stage_original',
       'observations_user_day_original', 'timespan_minutes_day_original',
       'num_distinct_stage_day_original', 'minutes_per_stage_original',
       'geometry', 'index_right', 'sonar_type', 'day_label', 'start_datetime',
       'end_datetime', 'event_title', 'activity_type', 'stage', 'music_type',
       'genre_grouped', 'views_youtube', 'polygon_name', 'source_gis_file',
       'stage_area_m2', 'polygon_floor', 'h3_cell'],
      dtype='object')

In [None]:
selected_columns_day = ['uid', 'macaddr_randomized',
                        'tid',                                                              # Corresponds to the renamed label_day
                        'floor_num_added', 'label_day_floor_change_id',                     # Columns related to the movement between floors
                        'datetime', 'timestamp_ap',                                         # Both formats if I need to do quick computations with the trajectories' timestamps
                        'lat', 'lng', 
                        'vendor_name', 
                        'sonar_type',                                                       # 'day_label', I discard the day_label column from the timetables to avoid confusions 
                        'start_datetime', 'end_datetime',                                   # Start and end of the events 
                        'event_title', 'activity_type', 'music_type', 
                        'genre_grouped','views_youtube',
                        'polygon_name','stage', 'stage_area_m2', 'h3_cell',                 # Location related columns
                        'observations_user_day_original', 'timespan_minutes_day_original',  # Old metrics obtained before trajectory preprocessing
                        'num_distinct_stage_day_original', 'minutes_per_stage_original',    # Old metrics obtained before trajectory preprocessing
                        'geometry'
                        ]
trajectories_events_day = trajectories_events_day[selected_columns_day]
trajectories_events_day.drop(columns='geometry')

Unnamed: 0,uid,macaddr_randomized,tid,floor_num_added,label_day_floor_change_id,datetime,timestamp_ap,lat,lng,vendor_name,...,genre_grouped,views_youtube,polygon_name,stage,stage_area_m2,h3_cell,observations_user_day_original,timespan_minutes_day_original,num_distinct_stage_day_original,minutes_per_stage_original
0,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:45:49+02:00,1718379949,41.373153,2.151547,,...,,,SONAR DIA - SonarVillage VIP,SonarVillage,9366.0,8d394461ca7267f,1017,256.27,4,64.07
1,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:46:19+02:00,1718379979,41.373153,2.151547,,...,,,SONAR DIA - SonarVillage VIP,SonarVillage,9366.0,8d394461ca7267f,1017,256.27,4,64.07
2,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:47:11+02:00,1718380031,41.373153,2.151547,,...,,,SONAR DIA - SonarVillage VIP,SonarVillage,9366.0,8d394461ca7267f,1017,256.27,4,64.07
3,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:47:20+02:00,1718380040,41.373153,2.151547,,...,,,SONAR DIA - SonarVillage VIP,SonarVillage,9366.0,8d394461ca7267f,1017,256.27,4,64.07
4,00154bc5831501b8bd95273b1181d9330c3bf5f34b1961...,1,2,0,2_0,2024-06-14 17:47:35+02:00,1718380055,41.373153,2.151547,,...,,,SONAR DIA - SonarVillage VIP,SonarVillage,9366.0,8d394461ca7267f,1017,256.27,4,64.07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2548145,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,1,2_3,2024-06-14 23:08:08+02:00,1718399288,41.372263,2.152061,"Apple, Inc.",...,,,SONAR DIA - Project Area,Project Area,2603.0,8d394461ca52abf,25,736.40,4,184.10
2345991,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_4,2024-06-14 23:09:13+02:00,1718399353,41.372204,2.151938,"Apple, Inc.",...,,,SONAR DIA - Lounge+D,NA-lounge+d,608.0,8d394461ca5203f,25,736.40,4,184.10
2345992,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_4,2024-06-14 23:09:36+02:00,1718399376,41.372204,2.151938,"Apple, Inc.",...,,,SONAR DIA - Lounge+D,NA-lounge+d,608.0,8d394461ca5203f,25,736.40,4,184.10
2345993,ffea2cb3179e7305ccb7b75a3f11c5a226710387fd72c5...,0,2,0,2_4,2024-06-14 23:09:54+02:00,1718399394,41.372204,2.151938,"Apple, Inc.",...,,,SONAR DIA - Lounge+D,NA-lounge+d,608.0,8d394461ca5203f,25,736.40,4,184.10


Writing the files.

In [53]:
trajectories_events_day.to_csv(os.path.join(path_trajectories_events,'trajectories_events_day_filtered.csv'),index=False)
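
When these CSV files are read back, the timezone-aware columns arrive as plain text and have to be parsed again. A minimal sketch of the round trip is shown below, using an in-memory buffer instead of a file and two hypothetical rows.

```python
import io
import pandas as pd

# Two hypothetical rows with a timezone-aware datetime column.
df = pd.DataFrame({
    "uid": ["u1", "u2"],
    "datetime": pd.to_datetime(["2024-06-14 17:45:49",
                                "2024-06-14 23:09:54"]).tz_localize("Europe/Madrid"),
})

# Writing to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer)
# Parsing as UTC first is robust to mixed offsets; then convert back to local time.
restored["datetime"] = pd.to_datetime(restored["datetime"], utc=True).dt.tz_convert("Europe/Madrid")

print(restored.dtypes)
```

Parsing with `utc=True` is a design choice for the sketch: it avoids an object-typed column if a file ever mixes UTC offsets, which cannot happen within these June dates but could with other data.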