# Groups of stops

The original dataset distinguishes between the individual platforms at each stop, and between stops that serve opposite directions. That level of detail is not needed here and gets in the way of the analysis, so the stops are grouped using a mapping file provided by the data supplier.

A second file lists the ticket types present in the dataset together with the user type each one belongs to. Only validations whose ticket type appears in that file are kept.

In [6]:
import json

import pandas as pd

# Full dataset
df = pd.read_csv('../data/full_data_set.csv')
# Ticket types and the user type each one belongs to
df_titolo = pd.read_csv('../data/kind_of_tickets_segmentation_or.csv')
# Keep only validations whose ticket type is listed in that file
df = df[df.titolo.isin(df_titolo.titolo.to_list())]

# rutas[n][seriale] -> array of (lon, lat) points for each seriale with n validations
rutas = dict()
seriales = set(df.seriale.to_list())
for i in seriales:
    aux = df[df['seriale'] == i][['lon', 'lat']].to_numpy()
    if len(aux) not in rutas:
        rutas[len(aux)] = dict()
    rutas[len(aux)][i] = aux

# Mapping from original stop id to aggregated stop id (JSON keys are strings).
# Note that this path uses 'Data' while the CSV paths above use 'data'.
with open('../Data/stop_aggr.json', 'r') as f:
    new_stops = json.load(f)
new_stops = {int(k): int(v) for k, v in new_stops.items()}
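
The loop above filters the whole DataFrame once per seriale, which can be slow on a dataset of this size. A minimal sketch of the same grouping done in a single pass with `groupby` follows. The toy values are made up for illustration, and the name `routes` is hypothetical.

```python
import pandas as pd

# Toy validations; the values are illustrative, not taken from the dataset
toy = pd.DataFrame({
    "seriale": [10, 10, 20, 30, 30],
    "lon": [12.31, 12.32, 12.24, 12.35, 12.36],
    "lat": [45.43, 45.44, 45.49, 45.39, 45.40],
})

# routes[n][seriale] -> array of (lon, lat) points for each seriale with n validations
routes = {}
for seriale, group in toy.groupby("seriale", sort=False):
    points = group[["lon", "lat"]].to_numpy()
    routes.setdefault(len(points), {})[seriale] = points
```

Each group is visited once, so the cost no longer grows with the number of distinct seriale values multiplied by the number of rows.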


Next, the name and coordinates of every aggregated stop are loaded from the stops file. Five aggregated stops are not included in that file, so they are defined by hand.

In [7]:

# Aggregated stops that are missing from stops.csv, stored as [name, lat, lon]
name_new_stops = {
    -2: ["Piazzale Roma", 45.438038, 12.318223],
    -3: ["Stazione Mestre", 45.482675, 12.231809],
    -4: ["Aeroporto", 45.504976, 12.339106],
    -5: ["Lido bus", 45.390778, 12.353001],
    -1: ["TERRA", 45.491853, 12.242548],
}
df_s = pd.read_csv('../data/stops.csv')
st = set(new_stops.values())
for i in st:
    if i not in name_new_stops:
        aux = df_s[df_s.fermata == i]
        # Ids with no row in stops.csv are left out of the lookup
        if len(aux) > 0:
            name_new_stops[i] = [aux.descrizione.iloc[0], aux.lat.iloc[0], aux.lon.iloc[0]]
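
As a rough sketch of the same lookup, the stops table can be indexed by stop id once instead of being filtered inside the loop. The table excerpt, the names `stops`, `lookup`, `first` and `wanted`, and all values other than the TERRA entry are assumptions for illustration.

```python
import pandas as pd

# Hypothetical excerpt of a stops table with the columns used above
stops = pd.DataFrame({
    "fermata": [5001, 5024, 5024],
    "descrizione": ["Stop A", "Stop B", "Stop B (duplicate)"],
    "lat": [45.41, 45.44, 45.44],
    "lon": [12.36, 12.30, 12.30],
})

# Manually defined aggregated stop, stored as [name, lat, lon]
lookup = {-1: ["TERRA", 45.491853, 12.242548]}

# Keep the first row per stop id, mirroring the .iloc[0] choice in the loop
first = stops.drop_duplicates("fermata").set_index("fermata")

wanted = {5001, 5024, 9999, -1}  # 9999 is absent from the table and is skipped
for stop_id in wanted:
    if stop_id not in lookup and stop_id in first.index:
        row = first.loc[stop_id]
        lookup[stop_id] = [row["descrizione"], row["lat"], row["lon"]]
```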


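Next, every validation recorded at a specific platform or dock is reassigned to its aggregated ("clustered") stop. Two new columns, DESCRIZIONE and FERMATA, hold the aggregated stop's name and id, and the lat and lon columns are overwritten with the aggregated stop's coordinates.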

In [8]:

for i in new_stops:
    mask = df.fermata == i
    # Skip stop ids that do not occur in the filtered data
    if mask.any():
        ax = name_new_stops[new_stops[i]]  # [name, lat, lon]
        df.loc[mask, 'DESCRIZIONE'] = ax[0]
        df.loc[mask, 'lat'] = ax[1]
        df.loc[mask, 'lon'] = ax[2]
        df.loc[mask, 'FERMATA'] = new_stops[i]
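
The same reassignment can be sketched without a Python loop by composing the two dictionaries and applying `Series.map`. The toy rows and the names "Aggregated stop A" and "Aggregated stop B" are illustrative assumptions. Unlike the loop above, this sketch silently skips aggregated ids that have no entry in `name_new_stops` instead of raising a `KeyError`.

```python
import pandas as pd

# Illustrative mappings; only the Piazzale Roma entry is taken from the text above
new_stops = {5031: 5501, 511: -2, 5094: 5001}
name_new_stops = {
    5501: ["Aggregated stop A", 45.4387, 12.3195],
    -2: ["Piazzale Roma", 45.438038, 12.318223],
    5001: ["Aggregated stop B", 45.4180, 12.3687],
}

toy = pd.DataFrame({
    "fermata": [5031, 511, 5094, 7777],  # 7777 has no mapping
    "lat": [45.0, 45.0, 45.0, 45.5],
    "lon": [12.0, 12.0, 12.0, 12.5],
})

# Compose the lookups so each raw stop id points directly at its target values
agg_id = {raw: agg for raw, agg in new_stops.items() if agg in name_new_stops}
name_of = {raw: name_new_stops[agg][0] for raw, agg in agg_id.items()}
lat_of = {raw: name_new_stops[agg][1] for raw, agg in agg_id.items()}
lon_of = {raw: name_new_stops[agg][2] for raw, agg in agg_id.items()}

toy["FERMATA"] = toy["fermata"].map(agg_id)  # NaN where no mapping exists
toy["DESCRIZIONE"] = toy["fermata"].map(name_of)
# Keep the original coordinates for stops without a mapping
toy["lat"] = toy["fermata"].map(lat_of).fillna(toy["lat"])
toy["lon"] = toy["fermata"].map(lon_of).fillna(toy["lon"])
```

Because unmapped rows receive NaN, the FERMATA column ends up with a floating-point dtype, which is consistent with values such as 5501.0 and -2.0 in the notebook output.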
        


The data_validazione column is converted to a datetime type so that it can be compared against dates.

In [9]:
df['data_validazione']=pd.to_datetime(df['data_validazione'])
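
A small sketch of the conversion on toy strings is shown below. It assumes the timestamps follow the `YYYY-MM-DD HH:MM:SS` pattern and uses `errors="coerce"`, which is an optional choice not made in the cell above, to turn unparseable values into `NaT` instead of raising.

```python
import pandas as pd

# Toy input: two well-formed timestamps and one invalid value
raw = pd.Series(["2023-01-13 07:33:00", "2023-03-14 23:59:00", "not a date"])

# An explicit format avoids per-value format inference; invalid entries become NaT
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M:%S", errors="coerce")

n_invalid = int(parsed.isna().sum())
```

Checking `n_invalid` after the conversion is a cheap way to notice malformed timestamps before filtering by date.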

## Save Data

The data is then saved to three CSV files: one with all the filtered data, one with only the carnival period (4 February 2023 to 21 February 2023), and one with the remaining data.

In [10]:
start = pd.to_datetime('2023-02-04')
end = pd.to_datetime('2023-02-21')
# The end bound is 2023-02-21 00:00:00, so validations later on 21 February
# are written to the non-carnival file
in_carnival = (df['data_validazione'] >= start) & (df['data_validazione'] <= end)
out_carnival = (df['data_validazione'] < start) | (df['data_validazione'] > end)
df[in_carnival].to_csv('data_filter_carnival.csv', index=False)
df[out_carnival].to_csv('data_filter_no_carnival.csv', index=False)
df.to_csv('data_filter_all.csv', index=False)
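
One detail of the date filter is worth checking on a toy example: a date string without a time parses to midnight, so the upper bound in the cell above is 2023-02-21 00:00:00. The sketch below reproduces that condition and contrasts it with a hypothetical alternative that counts the whole of 21 February. Which behaviour is intended is not stated in the notebook.

```python
import pandas as pd

toy = pd.DataFrame({"data_validazione": pd.to_datetime([
    "2023-02-03 23:59:00",
    "2023-02-04 00:00:00",
    "2023-02-21 00:00:00",
    "2023-02-21 18:30:00",
])})

start = pd.Timestamp("2023-02-04")
end = pd.Timestamp("2023-02-21")

# Same condition as the cell above: inclusive bounds, end at midnight of 21 February
closed_mask = toy["data_validazione"].between(start, end)

# Hypothetical alternative: half-open interval that covers all of 21 February
whole_day_mask = (toy["data_validazione"] >= start) & (
    toy["data_validazione"] < end + pd.Timedelta(days=1)
)
```

Only the last toy row differs between the two masks: the 18:30 validation on 21 February is excluded by the first and included by the second.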


In [16]:
df

Unnamed: 0,id,data_validazione,seriale,fermata,descrizione,titolo,descrizione_titolo,lon,lat,DESCRIZIONE,FERMATA
0,9938501,2023-01-13 07:33:00,-3619126015,5031,"P.le Roma ""G",21402,Supp Mens.navigazione,12.319465,45.438667,P.le Roma (Hotel S. Chiar,5501.0
1,9938502,2023-01-13 07:33:00,-2854866114,511,VENEZIA,24101,Mensile Ordinario extra,12.318223,45.438038,Piazzale Roma,-2.0
2,9938503,2023-01-13 07:33:00,-2824220595,1017,Castellana S,11209,Bigl RETE UNICA 75',12.242548,45.491853,TERRA,-1.0
3,9938733,2023-01-13 07:36:00,-2824216539,5094,Lido S.M.E.,11209,Bigl RETE UNICA 75',12.368725,45.417992,"Lido (S.M.E.) ""B""",5001.0
4,9938886,2023-01-13 07:38:00,-2824204947,5031,"P.le Roma ""G",21402,Supp Mens.navigazione,12.319465,45.438667,P.le Roma (Hotel S. Chiar,5501.0
...,...,...,...,...,...,...,...,...,...,...,...
1780810,15470881,2023-03-14 23:58:00,-2864643315,162,Stazione MES,11209,Bigl RETE UNICA 75',12.231809,45.482675,Stazione Mestre,-3.0
1780811,15470882,2023-03-14 23:58:00,-2854956628,5026,Tronchetto F,11209,Bigl RETE UNICA 75',12.306348,45.440094,Tronchetto 'B' SX,5024.0
1780812,15470883,2023-03-14 23:59:00,-2850025054,384,Mestre Centr,23101,Mensile ordinario Rete Unica,12.242548,45.491853,TERRA,-1.0
1780813,15470884,2023-03-14 23:59:00,-2824225710,5024,"Tronchetto """,23101,Mensile ordinario Rete Unica,12.306348,45.440094,Tronchetto 'B' SX,5024.0
