# Unify events with identical timestamps

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/temporian/blob/last-release/docs/src/recipes/aggregate_duplicated.ipynb)

This recipe shows how to avoid having duplicated timestamps in an `EventSet`. Events with identical timestamps are aggregated with a moving window operation (e.g., sum, average, max, or min), preserving the original timestamp values (which may be non-uniform).


For example, assume we have asynchronous sensor measurements, potentially from different sources. If two measurements share exactly the same timestamp, we want to unify them and take their average value.

## Example data

Let's define some events with non-uniform timestamps to illustrate the use case. Some of the timestamps are repeated; those are the ones that we'll unify.

However, we have to be careful: some events are very close in time but not actually duplicated, and we don't want to interfere with those.

In [None]:
import temporian as tp

sensor_evset = tp.event_set(timestamps=[1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0],
                            features={"y": [1., 2., 3., 4., 5., 6., 7., 8., 9.],
                                      "z": [10., 20., 30., 40., 50., 60., 70., 80., 90.]
                                     }
                           )
sensor_evset.plot()

## Solution

In order to unify only the events with the exact same timestamp, we need to:
1. Get the list of unique timestamps.
2. Aggregate events at the exact same timestamp, making sure the moving window doesn't overlap with nearby measurements.
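
To make the target result concrete, here is a minimal plain-Python sketch of what the two steps should produce for feature `y`. It does not use Temporian at all; the helper name `average_by_exact_timestamp` is hypothetical and only serves as a reference to compare against.

```python
from collections import defaultdict

# Same timestamps and "y" values as the example data above.
timestamps = [1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]


def average_by_exact_timestamp(timestamps, values):
    """Average the values whose timestamps are exactly equal (hypothetical helper)."""
    groups = defaultdict(list)
    for t, v in zip(timestamps, values):
        groups[t].append(v)
    # Step 1: the dict keys are the unique timestamps (sorted here).
    # Step 2: each unique timestamp gets the mean of its own events only.
    return {t: sum(vs) / len(vs) for t, vs in sorted(groups.items())}


expected_y = average_by_exact_timestamp(timestamps, y)
print(expected_y)
```

Only `2.02` and `3.51` are merged; the nearby events at `2.01` and `3.5` keep their own values.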

### 1. Get unique timestamps

The first step is to create a new sampling that drops the duplicated timestamps at `2.02` and `3.51`:

In [None]:
# Remove duplicated timestamps
unique_t = sensor_evset.unique_timestamps()
unique_t

### 2. Moving window with shortest length

For the moving window to never span two different timestamps, its length must be smaller than the smallest possible step between timestamps. But we want a solution that works for any resolution, from daily sales to nanosecond sensor measurements.

`tp.duration.shortest` holds the shortest possible interval that can be represented with a `float64` timestamp at maximum resolution:

In [None]:
shortest_length = tp.duration.shortest
shortest_length

Pretty small, right? Since zero-length durations are not allowed, this is as close to zero as we can get. With this window length, the window is guaranteed never to cover two different timestamps.
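
The effect of the window length can be illustrated with a small plain-Python sketch. It assumes a trailing window that keeps the events `s` with `0 <= t - s < window_length` for each sampling time `t`; the `moving_average` function below is a hypothetical stand-in, not Temporian's implementation, and `sys.float_info.min` is used only as an example of a tiny positive duration (it is not claimed to be the value of `tp.duration.shortest`).

```python
import sys

# Same timestamps and "y" values as the example data above.
timestamps = [1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]


def moving_average(timestamps, values, sampling, window_length):
    """For each t in sampling, average the values with 0 <= t - s < window_length.

    Hypothetical stand-in for a trailing moving average.
    """
    result = []
    for t in sampling:
        in_window = [
            v for s, v in zip(timestamps, values) if 0 <= t - s < window_length
        ]
        result.append(sum(in_window) / len(in_window))
    return result


sampling = sorted(set(timestamps))

# A window wider than the 0.01 gap also pulls in the neighbors at 2.01 and 3.5.
too_wide = moving_average(timestamps, y, sampling, window_length=0.05)

# A tiny positive window only matches events at exactly the same timestamp.
tiny = moving_average(timestamps, y, sampling, window_length=sys.float_info.min)

print(too_wide)
print(tiny)
```

With the `0.05` window, the value at `2.02` becomes the mean of three events instead of two, which is exactly the interference we want to avoid.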

Now we just run the aggregation function of our choice, passing this small number as `window_length` and the unique timestamps as `sampling`:

In [None]:
unified_evset = sensor_evset.simple_moving_average(window_length=shortest_length, sampling=unique_t)
unified_evset

Of course, instead of the average value, other moving window operations like `moving_min` or `moving_max` could make more sense depending on the use case. If multiple measurements are expected at each timestamp, you might also want the moving standard deviation to estimate a confidence interval.
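
As a rough reference for what those alternatives would return on the example data, the following plain-Python sketch computes the minimum, maximum, and standard deviation per exact timestamp. It is independent of Temporian, and the choice of the population standard deviation (`statistics.pstdev`) is an assumption made here so that timestamps with a single event yield `0.0`.

```python
from collections import defaultdict
from statistics import pstdev

# Same timestamps and "y" values as the example data above.
timestamps = [1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Group the values that share exactly the same timestamp.
groups = defaultdict(list)
for t, v in zip(timestamps, y):
    groups[t].append(v)

# One summary per unique timestamp.
# Assumption: population standard deviation, so a single event gives 0.0.
summary = {
    t: {"min": min(vs), "max": max(vs), "std": pstdev(vs)}
    for t, vs in sorted(groups.items())
}

for t, stats in summary.items():
    print(t, stats)
```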

Also, keep in mind that this exact procedure would work well in an `EventSet` with multiple indexes, removing the duplicated timestamps in each index separately.

But let's keep the example simple for now 🙂