# Package import

First of all, we import all the necessary libraries. Apart from the fundamental ones, there are some additional ones:

- `cvxpy` for convex optimization, needed for trend filtering, which is a [Generalized Lasso problem](https://www.stat.cmu.edu/~ryantibs/papers/genlasso.pdf).
- `folium` to build the map of lines and stops.

In [37]:
import pandas as pd
import numpy as np
import networkx as nx
import cvxpy as cp
import folium
from typing import Optional
import pytz

# The static data
- `stop_times` is a dataframe with the stops (and their `stop_id`) for each `trip_id`, together with the `stop_sequence` number. It also has the cumulative distance traveled by the bus/subway/tram up to each stop (along the _shape_ describing the trip).
- `trip_info` is a dataframe keeping the association between `route_id` and `trip_id`.
- `stop_list` is a dataframe listing the stops. It is useful since it has latitude and longitude of every stop.
- `routes` is a dataframe linking each route to its agency (ATAC, TPL or even Trenitalia) and to its type (bus, subway or tram).

The `low_memory=False` option is important here: it makes pandas infer each column's type from the whole file rather than chunk by chunk, so you don't end up with mixed-type columns when performing joins ;).

In [38]:
stop_times = pd.read_csv('data/static/stop_times.txt', low_memory=False)
trip_info = pd.read_csv('data/static/trips.txt', low_memory=False)
stop_list = pd.read_csv('data/static/stops.txt', low_memory=False)
routes = pd.read_csv('data/static/routes.txt', low_memory=False)
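
As a complement to `low_memory=False`, here is a minimal sketch of pinning the type of a key column explicitly at read time. The inline CSV and its values are made up for illustration and are not taken from the actual feed.

```python
import io
import pandas as pd

# Hypothetical excerpt of a stops file: identifiers may mix digits and letters.
csv_text = "stop_id,stop_name\n00123,Termini\nA12,Colosseo\n"

# Forcing the key column to string keeps every identifier comparable in joins
# and preserves leading zeros.
stops_demo = pd.read_csv(io.StringIO(csv_text), dtype={"stop_id": str})
stop_ids = stops_demo["stop_id"].tolist()
print(stop_ids)
```

Declaring the type up front is more verbose, but it does not depend on what pandas happens to infer from the data.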

# Graphs and maps

## The first sequence of (inner) joins

We start with the first of many inner joins, between the `trip_info` table and the `stop_times` table on `trip_id`, to put together information about trips and stop sequences. This way we recover the sequence of stops for each trip, distinguishing between the two directions through `direction_id`, which is just a binary field. The integer `stop_sequence` is the variable that allows us to recover the ordering.
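
To make the idea concrete, the following is a small sketch on toy tables that mimic `trips.txt` and `stop_times.txt`. All identifiers and values are invented; only the column names follow the GTFS files used here.

```python
import pandas as pd

# Toy tables mimicking trips.txt and stop_times.txt (all values are made up).
trips_demo = pd.DataFrame({
    "route_id": ["64", "64"],
    "trip_id": ["t1", "t2"],
    "direction_id": [0, 1],
})
stop_times_demo = pd.DataFrame({
    "trip_id": ["t1", "t1", "t1", "t2", "t2", "t2"],
    "stop_id": ["B", "A", "C", "C", "B", "A"],
    "stop_sequence": [2, 1, 3, 1, 2, 3],
})

joined = trips_demo.merge(stop_times_demo, how="inner", on="trip_id")

# Sorting by stop_sequence and grouping recovers the ordered stops per direction.
ordered = (joined.sort_values("stop_sequence")
                 .groupby(["route_id", "direction_id"])["stop_id"]
                 .apply(list))
stops_dir0 = ordered.loc[("64", 0)]
stops_dir1 = ordered.loc[("64", 1)]
```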

### What's a trip? 

Everything else is more or less clear, but we want to stress the difference between a _route_ and a _trip_ according to the [GTFS standard](https://developers.google.com/transit/gtfs/reference?hl=en). Specifically: 
> A trip is a sequence of two or more stops that occur during a specific time period.

While: 
> A route is a group of trips that are displayed to riders as a single service.

Therefore $\text{trip} \in \text{route}$

In [39]:
complete = trip_info.merge(stop_times, how = 'inner', on = 'trip_id')

In [40]:
complete.columns

Index(['route_id', 'service_id', 'trip_id', 'trip_headsign', 'trip_short_name',
       'direction_id', 'block_id', 'shape_id', 'wheelchair_accessible',
       'exceptional', 'arrival_time', 'departure_time', 'stop_id',
       'stop_sequence', 'stop_headsign', 'pickup_type', 'drop_off_type',
       'shape_dist_traveled', 'timepoint'],
      dtype='object')

We drop everything that is not needed at this stage. We keep `shape_id` since we want to use shapes later on.

In [41]:
complete = complete[['route_id', 'trip_id', 'stop_id', 'stop_sequence', 'direction_id', 'shape_id']]

The next inner join adds route-specific (not trip-specific) information. A route is what we call a _bus line_. This way we can keep only the bus lines handled by ATAC, filtering out the other agencies and the subway/tram lines.

In [42]:
complete = complete.merge(routes, on = 'route_id', how = 'inner')
complete = complete[(complete['agency_id'] == 'OP1')&(complete['route_type'] == 3)]

With the next inner join we are recovering stop related information, like name, latitude and longitude.

In [43]:
complete = complete.merge(stop_list, how = 'inner', on = 'stop_id')

Here we do not need `trip_id`, so we remove the column and drop the duplicates w.r.t. route, stop sequence and direction. At this stage, we only want to build (and plot) an undirected graph of the ATAC public transport relying on buses.

In [44]:
complete = (complete.drop('trip_id', axis = 1).drop_duplicates(['route_id', 'stop_sequence', 'direction_id']).reset_index())

## Building the graph

After initializing the graph, we add the nodes by simply reading the `stop_list` dataframe row by row. Then, we add the edges by sorting by `stop_sequence` and grouping by `route_id` and `direction_id` (`groupby` in Pandas preserves the row ordering within each group). Once that is done, we can iterate group by group, adding the edges in sequential order if they are not already present in the graph.

In [45]:
init_graph = nx.Graph()

In [46]:
for _, row in stop_list.iterrows():
    init_graph.add_node(row['stop_id'], name = row['stop_name'], latitude = row['stop_lat'], longitude = row['stop_lon'])

In [47]:
routes_grouped = complete.sort_values(by='stop_sequence').groupby(['route_id', 'direction_id'])
for _, group in routes_grouped:
    stops = group['stop_id'].tolist()
    edges = [(stops[i], stops[i+1]) for i in range(len(stops)-1) if not init_graph.has_edge(stops[i], stops[i+1])]
    init_graph.add_edges_from(edges)
    
init_graph.remove_nodes_from(list(nx.isolates(init_graph))) # if present, we remove isolated nodes
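
The edge construction above boils down to pairing each stop with its successor. A minimal sketch in plain Python, assuming two made-up ordered stop lists (one per direction of a hypothetical line):

```python
# Hypothetical ordered stop lists for the two directions of one line.
stops_by_group = {
    ("64", 0): ["A", "B", "C"],
    ("64", 1): ["C", "B", "A"],
}

# An undirected edge is an unordered pair, so a frozenset deduplicates
# (A, B) and (B, A), the same way an undirected nx.Graph does.
edges = set()
for stops in stops_by_group.values():
    for u, v in zip(stops, stops[1:]):
        edges.add(frozenset((u, v)))

edge_list = sorted(tuple(sorted(e)) for e in edges)
print(edge_list)
```

Since an undirected graph stores each edge once, the two directions of a line collapse onto the same set of edges.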

## Plotting the graph over the map (with Folium)

We plot the edges (not the vertexes, to avoid overplotting) with Folium; the map is centered on the Campidoglio (XD), with the initial zoom level set through `zoom_start`. To do this, we add the edges as `PolyLine` objects on top of the base map. The map is then saved locally and can be opened with a browser.

In [48]:
rome = folium.Map(location = (41.89, 12.48), zoom_start=20)

# for _, row in stop_list.iterrows():
#     if init_graph.has_node(row['stop_id']):
#         folium.Marker(
#             location=[row['stop_lat'], row['stop_lon']],
#             popup=f"{row['stop_name']}",
#             icon = folium.Icon(prefix='fa', icon = 'bus', icon_size=(5, 5))
#         ).add_to(rome)

for _, group in routes_grouped:
    # One polyline per (route, direction), following the ordered stop coordinates
    coords = list(zip(group['stop_lat'], group['stop_lon']))
    folium.PolyLine(locations=coords, color='blue', weight=1, opacity=0.5).add_to(rome)

In [49]:
rome.save('maps/test.html')

## An improvement: use _shape_ data

Instead of relying on the latitude and longitude of the stops to visualize things, we use what Google Maps actually uses, i.e. [_shape_](https://developers.google.com/transit/gtfs/reference?hl=en#shapestxt) data. This is not difficult to do, since we have a `shapes.txt` file storing that information.

This is what we could do when developing a possible application.

In [50]:
shapes = pd.read_csv('data/static/shapes.txt')

We remove the shapes of what does not belong to our target data (ATAC + buses).

In [51]:
shapes = shapes[shapes['shape_id'].isin(complete['shape_id'])]

In [52]:
rome = folium.Map(location = (41.89, 12.48), zoom_start=20)
grouped = shapes.sort_values(by = 'shape_pt_sequence').groupby('shape_id')
for _, group in grouped:
    coords = [(lat, lon) for lat, lon in zip(group['shape_pt_lat'], group['shape_pt_lon'])]
    folium.PolyLine(locations=coords, color = 'blue', weight = 1, opacity = 0.5).add_to(rome)
    
rome.save('maps/improved_test.html')

# Handling live data

Now we read the real-time data, i.e. the scraped data, if scraping is the right word for it, since it was just a series of GET calls to the Rome GTFS feed.

In [53]:
trip_live = pd.read_feather('data/trip-updates.feather')

In [54]:
pd.read_feather('data/trip-updates.feather')

Unnamed: 0,trip_id,start_time,start_date,route_id,stop_sequence,delay,time,uncertainty,stop_id
0,0#785-17,16:39:00,20230506,115,12,46,1683384892,21,20323
1,0#785-17,16:39:00,20230506,115,13,36,1683384964,21,20215
2,0#785-17,16:39:00,20230506,115,14,56,1683385076,21,79756
3,0#785-17,16:39:00,20230506,115,15,90,1683385168,21,72993
4,0#785-17,16:39:00,20230506,115,16,56,1683385222,21,20295
...,...,...,...,...,...,...,...,...,...
25968156,VJfe5102acfc4e0719a2adf07b4737b5a00084fec5,14:52:00,20230618,24,20,0,1687094142,0,75863
25968157,VJfe5102acfc4e0719a2adf07b4737b5a00084fec5,14:52:00,20230618,24,21,0,1687094173,0,78828
25968158,VJfe5102acfc4e0719a2adf07b4737b5a00084fec5,14:52:00,20230618,24,22,0,1687094253,0,71460
25968159,VJfe5102acfc4e0719a2adf07b4737b5a00084fec5,14:52:00,20230618,24,23,0,1687094322,0,77865


In [55]:
trip_live = trip_live.drop_duplicates(subset=['trip_id', 'start_time', 'start_date', 'stop_sequence'], keep = 'last', ignore_index=True)

The 0 problem: the first stop of every trip has 0 as its timestamp. We need to remove those rows, since they are basically useless.

In [56]:
trip_live = trip_live[trip_live['time'] != 0]

Again, we consider only the bus lines belonging to ATAC.

In [57]:
trip_live = trip_live.merge(routes, on = 'route_id', how = 'inner')
trip_live = trip_live[(trip_live['agency_id'] == 'OP1')&(trip_live['route_type'] == 3)]

## Add weather

Now we add the weather data, collected with the [Open Weather history API](https://openweathermap.org/history).

In [58]:
weather = pd.read_feather('data/weather_df.feather')
weather['weather_main'].value_counts()

weather_main
Clouds          461
Clear           414
Rain             98
Thunderstorm     24
Mist              6
Drizzle           3
Fog               2
Name: count, dtype: int64

We encode the weather as a binary variable where clear or cloudy conditions are 0, and anything else is 1.

In [None]:
def weather_to_binary(weather: str):
    condition_to_binary = {"Clouds": 0, "Clear": 0, "Rain": 1, "Drizzle": 1, "Mist": 1, "Thunderstorm": 1, "Fog": 1}
    return condition_to_binary[weather]

weather.weather_main = weather.weather_main.apply(weather_to_binary)
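
An alternative sketch of the same encoding with `Series.isin`, shown on a made-up sample of labels. It assumes that "Clear" and "Clouds" are the only labels to be treated as 0, as in the mapping above.

```python
import pandas as pd

# Made-up sample of weather labels.
weather_demo = pd.Series(["Clear", "Clouds", "Rain", "Fog"])

# Assumption: "Clear" and "Clouds" are the only labels encoded as 0.
dry_conditions = {"Clear", "Clouds"}
weather_binary = (~weather_demo.isin(dry_conditions)).astype(int)
binary_values = weather_binary.tolist()
print(binary_values)
```

Unlike the dictionary lookup, this version maps a label that was never seen before to 1 instead of raising a `KeyError`; whether that is desirable depends on how strict the preprocessing should be.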

Weather data is hourly, so we use floor division to move from the UNIX timestamp (in seconds) to integer hours, for both the real-time data and the weather data.

In [59]:
trip_live['hourly'] = trip_live['time'] // 3600
weather['hourly'] = weather['timestamp'] // 3600
weather.drop('timestamp', axis = 1, inplace=True)
trip_live = trip_live.merge(weather, how='inner', on='hourly')
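
A small sketch of the hourly bucketing on toy data. The live timestamp is the first one shown in the table above, while the two weather rows (and their binary values) are invented for illustration.

```python
import pandas as pd

# One live record and two made-up hourly weather rows.
live_demo = pd.DataFrame({"time": [1683384892]})
weather_demo = pd.DataFrame({
    "timestamp": [1683381600, 1683385200],  # exact hour boundaries
    "weather_main": [0, 1],                 # invented values
})

# Floor division by 3600 maps every second of an hour to the same integer key.
live_demo["hourly"] = live_demo["time"] // 3600
weather_demo["hourly"] = weather_demo["timestamp"] // 3600

merged = live_demo.merge(weather_demo.drop(columns="timestamp"), on="hourly", how="inner")
hour_bucket = int(merged.loc[0, "hourly"])
matched_weather = int(merged.loc[0, "weather_main"])
```

Note that, with an inner join, live records falling in an hour with no weather observation are silently dropped.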

## Put together sequential updates

Now we put together the information of sequential updates: each pair of consecutive records/stops in a trip ends up in the same row.

The trick is copying the dataset and then shifting the `stop_sequence` of the copy by -1. At this point, an inner join between the two datasets (the copy and the original one) is enough. The suffixes `_pre` and `_post` distinguish the two sets of fields.

In [60]:
trip_live_shifted = trip_live.copy()
trip_live_shifted['stop_sequence'] = trip_live_shifted['stop_sequence'] - 1
trip_live = trip_live.merge(trip_live_shifted, how = 'inner', on = ['route_id', 'trip_id', 'start_time', 'start_date', 'stop_sequence'], suffixes=('_pre', '_post'))
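
The shift-and-join trick can be checked on a toy trip. This is a minimal sketch with invented stops and timestamps, not the actual feed data.

```python
import pandas as pd

# Made-up updates for one trip: three stops with their predicted timestamps.
updates = pd.DataFrame({
    "trip_id": ["t1", "t1", "t1"],
    "stop_sequence": [1, 2, 3],
    "stop_id": ["A", "B", "C"],
    "time": [100, 160, 250],
})

shifted = updates.copy()
shifted["stop_sequence"] -= 1  # the record of stop k+1 now carries the key k

pairs = updates.merge(shifted, on=["trip_id", "stop_sequence"],
                      how="inner", suffixes=("_pre", "_post"))
pair_list = list(zip(pairs["stop_id_pre"], pairs["stop_id_post"]))
elapsed = (pairs["time_post"] - pairs["time_pre"]).tolist()
```

The last stop of the trip has no successor, so it appears only on the `_post` side and the result has one row fewer than the input.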

## Add stop info and (actual) distances

Now we add stop information, both the stop-specific fields and the stop sequence. With `stop_times` we also get the [travelled distance in-between stops](https://developers.google.com/transit/gtfs/reference?hl=en#stop_timestxt). The problem is that the foreign key of `stop_times` is trip-related, and the static data on trips is updated continuously (once a day); therefore, we cannot rely on an inner join on `trip_id` with the dynamic/scraped data, in **any case**.

The idea is the following:

- We first perform an inner join between `stop_times` and `trip_info`, based on `trip_id`; we are more or less sure that this is consistent, since the text files come from the same GET request.
- We again use the trick of subtracting 1 from the stop sequence in order to put two consecutive stops in the same row. This is basically an inner join of the dataframe with itself.
- We end up with a dataframe linking the `route_id`, the `stop_id` of the departure stop, the `stop_id` of the arrival stop and the distance travelled along the shape for each piece of route/edge.
- At this point, finally, we can perform an inner join between the `trip_live` dataframe and the one we have been building here, using three attributes as key: [`route_id`, `stop_id_pre`, `stop_id_post`].

After everything, we simply compute the actual distance between stops by taking the difference between the cumulative traveled distance fields.

In [61]:
route_stop = trip_info.merge(stop_times, on='trip_id')
route_stop_shifted = route_stop.copy()
route_stop_shifted['stop_sequence'] -= 1

In [62]:
route_stop = route_stop.merge(route_stop_shifted, on = ['trip_id', 'route_id', 'stop_sequence'], suffixes=('_pre', '_post'))

In [63]:
route_stop.rename(columns={'stop_sequence':'stop_sequence_pre'}, inplace=True)
route_stop = route_stop[['route_id', 'stop_id_pre', 'stop_id_post', 'shape_dist_traveled_pre', 'shape_dist_traveled_post']]
route_stop = route_stop.drop_duplicates(subset=['route_id', 'stop_id_pre', 'stop_id_post'], ignore_index=True)

In [64]:
trip_live = trip_live.merge(route_stop, on=['route_id', 'stop_id_pre', 'stop_id_post'])

In [65]:
trip_live['stop_distance'] = trip_live['shape_dist_traveled_post']-trip_live['shape_dist_traveled_pre']
del route_stop, route_stop_shifted
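
As a cross-check of the distance computation, the same segment lengths can be obtained by differencing the cumulative distance within each trip. This is an alternative sketch on invented values, not the approach used in the cells above.

```python
import pandas as pd

# Made-up cumulative distances along the shape for one trip.
stop_times_demo = pd.DataFrame({
    "trip_id": ["t1", "t1", "t1"],
    "stop_sequence": [1, 2, 3],
    "shape_dist_traveled": [0.0, 340.0, 900.0],
})

ordered = stop_times_demo.sort_values(["trip_id", "stop_sequence"])
# Difference with the previous stop of the same trip; the first stop has no predecessor.
ordered["stop_distance"] = ordered.groupby("trip_id")["shape_dist_traveled"].diff()
segment_lengths = ordered["stop_distance"].dropna().tolist()
print(segment_lengths)
```

One design difference: `diff` pairs each stop with the previous row of the trip, so it also works when `stop_sequence` values increase without being consecutive, whereas the shift-by-one join only pairs stops whose sequence numbers differ by exactly 1.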

## Add day of the week

We add the day of the week (as integer) in order to be more specific when building the (filtered) graph.

In [66]:
trip_live['time_pre_datetime'] = pd.to_datetime(trip_live['time_pre'], origin='unix', unit = 's', utc=True).dt.tz_convert('Europe/Rome')
trip_live['day_of_week'] = trip_live.time_pre_datetime.dt.weekday
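
A quick sketch of the conversion on a single value, using the first timestamp shown in the live-data table above. Monday is 0 in the pandas convention, so Saturday is 5.

```python
import pandas as pd

# Timestamp of the first live record shown earlier.
ts = pd.Series([1683384892])
local = pd.to_datetime(ts, unit="s", utc=True).dt.tz_convert("Europe/Rome")

local_string = local.iloc[0].strftime("%Y-%m-%d %H:%M:%S%z")
weekday = int(local.dt.weekday.iloc[0])
print(local_string, weekday)
```

Converting to the local time zone before extracting the weekday matters for late-evening trips, which would otherwise be assigned according to UTC.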

## Add time
We take the difference between (prediction) timestamps to get the elapsed time between stops.

In [67]:
trip_live['elapsed'] = trip_live['time_post']-trip_live['time_pre']
trip_live = trip_live[trip_live['elapsed'] > 0]

Before continuing, we remove from `trip_live` the columns which are not necessary.

In [68]:
trip_live = trip_live[['weather_main_post', 'day_of_week', 'time_pre_datetime', 'elapsed', 'stop_distance', 'stop_id_post']]

In [69]:
trip_live.reset_index(inplace=True, drop=True)

In [70]:
trip_live.to_feather('trip_live.feather')

In [71]:
trip_live

Unnamed: 0,weather_main_post,day_of_week,time_pre_datetime,elapsed,stop_distance,stop_id_post
0,Clouds,5,2023-05-06 16:54:52+02:00,72,340,20215
1,Clear,5,2023-05-06 17:32:40+02:00,86,340,20215
2,Clear,5,2023-05-06 18:15:46+02:00,76,340,20215
3,Clear,5,2023-05-06 18:40:48+02:00,54,340,20215
4,Clear,5,2023-05-06 19:00:56+02:00,40,340,20215
...,...,...,...,...,...,...
15041192,Clear,3,2023-06-15 03:00:20+02:00,388,5208,71212
15041193,Clear,1,2023-06-13 03:07:00+02:00,88,5208,71212
15041194,Clear,1,2023-06-13 03:36:22+02:00,580,5208,71212
15041195,Clear,6,2023-05-07 01:00:42+02:00,216,1468,71460


# Trend filtering, validation and application
## Function to build the signal over the graph vertexes
The following function defines the signal over the vertex set, in this way:
> The signal for each vertex is the average (over the observations selected by a filtering option) of the elapsed time needed to reach that vertex from the previous one, divided by the (shape) distance between the two vertices.

The resulting graph is thus tied to a specific filtering option (based on weather, day of the week or time of the day). We allow only one filtering option at a time, in order to have enough observations to estimate a meaningful average of the signal.

In [72]:
def vertex_signal(complete_df: pd.DataFrame, routes_graph: nx.Graph, *, weather: Optional[int] = None,
                  day: Optional[int] = None, time: Optional[tuple[int, int]] = None) -> nx.Graph:
    """
    Function assigning signal over the public transport graph vertexes by averaging across the inbound edges
    elapsed time, according to a specific filtering option, passed as keyword argument.
    :param complete_df: The dataframe containing the preprocessed data.
    :param routes_graph: The graph, already built.
    :param weather: Main weather conditions, either 0 or 1.
    :param day: The day of the week, as integer in the range [0, 6].
    :param time: The daytime, as an interval specified by a tuple of two integers.
    :return: The graph with the signal defined over the vertex set.
    """
    routes_graph = routes_graph.copy()

    if sum([(weather is None), (day is None), (time is None)]) != 2:
        raise TypeError(
            'This function builds the graph according to only one filtering option, you have to pass one and only one.')
    if weather is not None:
        mask = (complete_df['weather_main_post'] == weather)
    elif day is not None:
        mask = (complete_df['day_of_week'] == day)
    else:
        start_time, end_time = pd.to_datetime(time[0]).time(), pd.to_datetime(time[1]).time()
        mask = ((complete_df.time_pre_datetime.dt.time >= start_time) & (complete_df.time_pre_datetime.dt.time <= end_time))

    complete_df = complete_df[mask].copy()  # work on a copy, so the in-place division below is safe
    if not len(complete_df):
        raise ValueError('The filtering option you passed is wrong since no observation has matching fields.')

    complete_df['elapsed'] /= complete_df['stop_distance']

    complete_df = complete_df[['elapsed', 'stop_id_post']].groupby('stop_id_post').mean()
    nx.set_node_attributes(routes_graph, complete_df.to_dict('index'))

    delete_vx = [x[0] for x in routes_graph.nodes('elapsed') if x[1] is None]
    routes_graph.remove_nodes_from(delete_vx)
    routes_graph.remove_nodes_from(list(nx.isolates(routes_graph)))
    return routes_graph
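
Stripped of the filtering and of the graph bookkeeping, the signal is a per-stop average of time over distance. A minimal sketch with invented observations (the distance is in whatever unit the feed uses for `shape_dist_traveled`):

```python
import pandas as pd

# Made-up observations: seconds elapsed and distance travelled to reach each stop.
obs = pd.DataFrame({
    "stop_id_post": ["B", "B", "C"],
    "elapsed": [60.0, 120.0, 90.0],
    "stop_distance": [300.0, 300.0, 450.0],
})

# Time per unit of distance for each observation, then averaged per arrival stop.
obs["pace"] = obs["elapsed"] / obs["stop_distance"]
signal = obs.groupby("stop_id_post")["pace"].mean().to_dict()
print(signal)
```

Higher values mean that more time is needed to cover the same distance, which is why the signal can be read as a congestion measure.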

In [73]:
for node in nx.isolates(init_graph):
    print(node)

In [74]:
trip_live.columns

Index(['weather_main_post', 'day_of_week', 'time_pre_datetime', 'elapsed',
       'stop_distance', 'stop_id_post'],
      dtype='object')

In [75]:
test_graph = vertex_signal(trip_live, init_graph, weather = 0)  # 0 = clear or cloudy, per the binary encoding above

## Trend filtering test, piecewise linear case ($D = \Delta^{(2)}$)

After having built the graph, we define the function producing the difference operator from the graph according to [Tibshirani R. et al. (2015)](https://jmlr.org/papers/volume17/15-147/15-147.pdf).

In [76]:
import scipy.sparse
def difference_op(graph: nx.Graph, order: int) -> scipy.sparse.csr_matrix:
    """
    Produces a linear difference operator for graph trend filtering according to Tibshirani R. et al. (2015).
    :param graph: The graph from which to build the difference linear operator.
    :param order: The order of the difference operator.
    :return: The difference operator as a SciPy sparse row matrix.
    """
    # networkx returns the incidence matrix as (nodes x edges); the first-order
    # difference operator acts on vertex signals, so it must be (edges x nodes).
    incidence = scipy.sparse.csr_matrix(nx.incidence_matrix(graph, oriented=True)).T.tocsr()
    laplacian = scipy.sparse.csr_matrix(nx.laplacian_matrix(graph))
    if order == 1:
        out = incidence
    elif order == 2:
        out = laplacian
    elif not order % 2:
        out = laplacian ** (order // 2)  # integer matrix power
    else:
        out = incidence @ laplacian ** ((order - 1) // 2)

    return out
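
To see what the first two operators do, here is a dense NumPy sketch on a three-vertex path graph. The graph and the signal are toy assumptions, and the edge incidence matrix is built by hand with one row per edge.

```python
import numpy as np

# Path graph 0 - 1 - 2, given as an edge list (a made-up toy example).
edges = [(0, 1), (1, 2)]
n_nodes = 3

# Oriented edge incidence matrix: one row per edge, -1 at the tail, +1 at the head.
D = np.zeros((len(edges), n_nodes))
for row, (u, v) in enumerate(edges):
    D[row, u] = -1.0
    D[row, v] = 1.0

# The graph Laplacian equals D^T D, i.e. degree matrix minus adjacency matrix.
L = D.T @ D

ramp = np.array([0.0, 1.0, 2.0])
first_order = D @ ramp    # differences across edges
second_order = L @ ramp   # vanishes at the interior vertex of a linear ramp
```

On the ramp, the first-order operator returns the constant slope on every edge, while the Laplacian is zero at the interior vertex and nonzero only at the two endpoints.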

Now, [remember](https://jmlr.org/papers/volume17/15-147/15-147.pdf) that what we are aiming at is a non-parametric regression in a Generalized Lasso problem fashion:
$$
\begin{align}
\min_{\beta} & \quad \frac{1}{2} \lVert Y - \beta \rVert_2^2 + \lambda \lVert D \beta \rVert_1 \\
\end{align}
$$
where $D$ is an arbitrary (linear) difference operator specifying the signal structure we enforce through the L1 penalization. The first term of the loss function is strictly convex in $\beta$, while the second term is convex in $\beta$, being the composition of a convex mapping with a linear mapping in $\beta$. Therefore the problem is strictly convex and, since the loss is coercive in $\beta$, it has a unique solution.
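
As a quick sanity check on the role of $\lambda$, the two limiting regimes of the problem can be written out. This is a sketch that assumes a connected graph with $n$ vertices and $D$ equal to the graph Laplacian, whose null space then contains only constant signals; the symbols $n$, $\bar{Y}$ and $\mathbf{1}$ (the all-ones vector) are introduced here for illustration:

$$
\begin{align}
\lambda = 0: & \quad \hat{\beta} = Y \\
\lambda \to \infty: & \quad D\hat{\beta} = 0 \;\Rightarrow\; \hat{\beta} = \bar{Y}\,\mathbf{1}, \quad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i
\end{align}
$$

Intermediate values of $\lambda$ trade fidelity to the observed signal for smoothness over the graph.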

In [77]:
difference = difference_op(test_graph, 2)
vector_time = np.array([x[1] for x in test_graph.nodes(data = 'elapsed')])

Since this is a convex problem, we solve it using CVXPY with the CVXOPT solver; more info can be found [here](https://github.com/elsonidoq/py-l1tf/blob/master/l1tf/impl.py).

In [78]:
vlambda = 0.1 # Choosing the regularization hyperparameter
x = cp.Variable(shape=len(vector_time)) # Variable
obj = cp.Minimize((1/2) * cp.sum_squares(vector_time - x)
                  + vlambda * cp.norm(difference @ x, 1) ) # defining the optimization problem
prob = cp.Problem(obj)

In [81]:
prob.solve(solver = cp.CVXOPT, verbose = True)
print('Solver status: {}'.format(prob.status))

                                     CVXPY                                     
                                     v1.3.2                                    
(CVXPY) Sep 07 12:56:29 PM: Your problem has 5688 variables, 0 constraints, and 0 parameters.
(CVXPY) Sep 07 12:56:29 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Sep 07 12:56:29 PM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Sep 07 12:56:29 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
-------------------------------------------------------------------------------
                                  Compilation                                  
-------------------------------------------------------------------------------
(CVXPY) Sep 07 12:56:29 PM: Compiling problem (target solver=CVXOPT).
(CVXPY) Sep 07 12:56:29 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffin

In [86]:
x.value

array([0.32698475, 0.36641572, 0.25845404, ..., 0.33481919, 0.34364668,
       0.23764177])

In [87]:
congestion_dict = dict(zip(test_graph.nodes, x.value))

## Validate the trend-filtering mess

Our idea to validate the model in order to get a good $\lambda$ is the following:
- First of all, we split the data into a train and a validation set along the time axis. The holdout set consists of 8 days, i.e. 20% of the overall data, from June 9th onwards.
- Once this is done, we perform trend filtering on the graph built with a single filter specification, with a specific choice of $\lambda$. What we get back is a dictionary mapping each stop to its congestion value, i.e. the signal we are modelling, defined as above.
- Now we take the holdout set, and we filter it with the same filter we used to build the graph to be trend filtered.
- For each resulting row, we consider the arrival stop and the distance between the two stops. We multiply the distance by the estimated congestion signal of the arrival stop, which gives an estimated elapsed time. Then we take the difference between the estimated elapsed time and the actual (forecasted...) elapsed time. The average of the absolute values of that quantity is our validation metric.
- We end up with a different $\lambda$ for each filtering option. This makes sense, since the assumption that a unique $\lambda$ would work for all filtering options cannot hold: think about the intervals of time when it rains. They are of course sparser than sunny or cloudy periods, and it is thus more than reasonable that the information estimated when it rains is generally less representative.

In [88]:
val_mask = trip_live['time_pre_datetime'] >= pd.to_datetime('2023-06-09').tz_localize("Europe/Rome")
train_data = trip_live[~val_mask]
val_data = trip_live[val_mask]

In [89]:
from typing import Dict
def trend_filter_validate(train: pd.DataFrame, val: pd.DataFrame, routes_graph: nx.Graph, lambda_seq: tuple[float, ...],
                          cond_filter: tuple) -> Dict[float, np.ndarray]:
    """Runs a validation using trend filtering on a given train-test split.

    :arg
        train (pd.DataFrame): the training data.
        val (pd.DataFrame): the validation data.
        routes_graph (nx.Graph): the networkx graph of bus routes.
        lambda_seq (tuple[float, ...]): the sequence of lambda values to try.
        cond_filter (tuple): the filter used to select validation data. A tuple with a key that is either
        "weather", "day" or "time", and a corresponding value, e.g. ("day", 0) for Monday.

    :return
        (dict) a dictionary mapping each lambda to the absolute validation errors.
    """
    cond_filter_dict = {cond_filter[0]: cond_filter[1]}

    # Building the filtered graph on training data
    train_graph = vertex_signal(train, routes_graph, **cond_filter_dict)

    difference_operator = difference_op(train_graph, 2)
    time_vec = np.array([x[1] for x in train_graph.nodes(data='elapsed')])
    metric_dict = {}

    # Filtering the validation data
    if cond_filter[0] == 'weather':
        mask = (val['weather_main_post'] == cond_filter[1])
    elif cond_filter[0] == 'day':
        mask = (val['day_of_week'] == cond_filter[1])
    elif cond_filter[0] == 'time':
        start_time, end_time = pd.to_datetime(cond_filter[1][0]).time(), pd.to_datetime(cond_filter[1][1]).time()
        mask = ((val.time_pre_datetime.dt.time >= start_time) & (val.time_pre_datetime.dt.time <= end_time))
    else:
        raise ValueError('Illegal filtering option.')
    val = val[mask]

    for value_lambda in lambda_seq:
        # Filtering on training data
        x = cp.Variable(shape=len(time_vec))
        loss = cp.Minimize((1 / 2) * cp.sum_squares(time_vec - x)
                           + value_lambda * cp.norm(difference_operator @ x, 1))
        problem = cp.Problem(loss)
        problem.solve(solver=cp.CVXOPT, verbose=False)
        congestion_df = pd.DataFrame(zip(train_graph.nodes, x.value), columns=['stop_id_post', 'congestion'])

        # Compute validation metric for specific lambda
        val_congestion = val.merge(congestion_df, on='stop_id_post')
        error = np.absolute(val_congestion['congestion'] * val_congestion['stop_distance'] - val_congestion['elapsed'])
        metric_dict[value_lambda] = error

    return metric_dict
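
Once the per-row errors are available, picking $\lambda$ amounts to averaging them and taking the minimiser. A minimal sketch, assuming a dictionary shaped like the one returned by `trend_filter_validate`; the error values below are invented.

```python
import numpy as np

# Hypothetical output: one array of absolute errors per candidate lambda.
metric_demo = {
    0.01: np.array([12.0, 8.0, 10.0]),
    0.1: np.array([9.0, 7.0, 8.0]),
    1.0: np.array([11.0, 9.0, 13.0]),
}

# Mean absolute error per lambda, then the lambda that minimises it.
mae = {lam: float(np.mean(err)) for lam, err in metric_demo.items()}
best_lambda = min(mae, key=mae.get)
print(mae, best_lambda)
```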

## The application

Once we have chosen the right $\lambda$ for each filter option, we can query the system in real time according to the required conditions (time, weather and day of the week). The system then retrieves the corresponding trend-filtered sets of congestion values (three sets each time, one per filter). In order to get a unique value for each inbound stop/vertex, we take a convex combination of the values coming from the three sets. If a stop is missing from any of the sets (it may well be), we simply rescale the convex combination weights and consider only the sets where the stop is present.
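
The combination step can be sketched as follows. The function name `combine_congestion`, the weights and the congestion values are all hypothetical and only illustrate the rescaling of the weights when a stop is missing from one of the sets.

```python
from typing import Dict, Optional

def combine_congestion(stop_id: str,
                       signals: Dict[str, Dict[str, float]],
                       weights: Dict[str, float]) -> Optional[float]:
    """Convex combination of the per-filter congestion values of one stop.

    Filters where the stop is missing are skipped and the remaining
    weights are rescaled so that they sum to one.
    """
    available = {name: values[stop_id] for name, values in signals.items() if stop_id in values}
    if not available:
        return None  # the stop appears in none of the filtered graphs
    total = sum(weights[name] for name in available)
    return sum(weights[name] / total * value for name, value in available.items())

# Made-up congestion values and weights, purely illustrative.
signals = {
    "weather": {"A": 0.30, "B": 0.20},
    "day": {"A": 0.50},
    "time": {"A": 0.40, "B": 0.40},
}
weights = {"weather": 0.2, "day": 0.3, "time": 0.5}

value_a = combine_congestion("A", signals, weights)  # all three sets available
value_b = combine_congestion("B", signals, weights)  # "day" set missing, weights rescaled
```

How the weights themselves should be chosen is left open here; any non-negative weights summing to one fit this scheme.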