First, it should be clear that to identify the home location you only need a stop table and a column holding a location id, a geohash, or a cell id from any tessellation. Why don't we make a stop table the starting point? Load a stop table that contains a full month of trajectories.
7:38
Or more, for, say, 5 users.
7:39
So the notebook can be about the visit_attribution module in general (which is more or less what you are doing there anyway).
7:43
The first step is to show how point_in_polygon can be used to attribute visits to locations. Second, show that there is another method that uses the mode of the pings in the stop (for which the labels would also have to be loaded), and visualize that.
Part two is how that output can be used very easily to estimate the home:
Compute the duration at night of each stop. It is equivalent to the "duration" column but counts only the minutes that fall between the two time parameters. You explain that.
Then filter, and count days and weeks. There we can show the output you have in the last cell.
Finally, a very simple function that, given parameters such as dusk_time, dawn_time, min_nights, min_weeks, min_night_dwell, window_weeks, returns the most frequent nighttime location that satisfies those constraints.
7:44
As in the other notebook, the most important thing is to have a short paragraph where we explain a bit of what is going on.
7:44
Just like you did in the stop detection one.
7:45
And there will be some visualizations or something, yes.

# Tutorial X: Home Attribution from Stop Table

This notebook shows how to process a stop table to detect home locations using `nomad`.



Stop detection is an important step in pre-processing trajectory data: it groups together pings that reflect stationary behavior, which makes trajectories easier to interpret. The output of a stop-detection algorithm is commonly a "stop table" that records when each stop started, how long it lasted, and a pair of coordinates approximating the location of the group of pings (typically the centroid). Alternatively, `nomad` allows users to retrieve a cluster label for each ping, which is useful for plotting, for example.
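To make the structure concrete, here is a minimal sketch of a stop table built by hand with pandas. The column names (`id`, `start_datetime`, `duration` in minutes, `location`) follow the ones the helper functions in this notebook expect; the values, the coordinates, and the location ids (`home_1`, `work_1`) are made up for illustration and do not come from the Garden City data.

```python
import pandas as pd

# Hypothetical stop table: one row per stop, duration in minutes.
stop_table = pd.DataFrame({
    "id": ["user_1", "user_1", "user_1"],
    "start_datetime": pd.to_datetime([
        "2024-01-01 22:10",
        "2024-01-02 09:00",
        "2024-01-02 19:30",
    ]),
    "duration": [470, 480, 600],
    "x": [12.5, 40.0, 12.6],   # illustrative centroid coordinates
    "y": [8.0, 21.5, 8.1],
    "location": ["home_1", "work_1", "home_1"],
})

# The end of each stop follows from its start and duration.
stop_table["end_datetime"] = (
    stop_table["start_datetime"]
    + pd.to_timedelta(stop_table["duration"], unit="m")
)

print(stop_table[["start_datetime", "end_datetime", "location"]])
```

Everything that follows only needs the start time, the duration, and the `location` column, so any stop table with those three pieces of information is a valid starting point.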

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
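import numpy as np
import pandas as pd
from datetime import datetime, time, timedelta
from dateutil.parser import parse

def dawn_time(day_part, dawn_hour=6):
    # Minutes of a same-day (start, end) time interval that fall before dawn_hour
    s,e = day_part
    return np.min([(e.hour*60 + e.minute),dawn_hour*60]) - np.min([(s.hour*60 + s.minute),dawn_hour*60]) 

def dusk_time(day_part, dusk_hour=19):
    # Minutes of a same-day (start, end) time interval that fall after dusk_hour
    s,e = day_part
    return np.max([(e.hour*60 + e.minute)-dusk_hour*60,0]) - np.max([(s.hour*60 + s.minute)-dusk_hour*60, 0])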

def slice_datetimes_interval_fast(start, end):
    full_days = (datetime.combine(end, time.min) - datetime.combine(start, time.max)).days
    if full_days >= 0:
        day_parts = [(start.time(), time.max), (time.min, end.time())]
    else:
        full_days = 0
        day_parts = [(start.time(), end.time()), (start.time(), start.time())]
    return full_days, day_parts

def duration_at_night_fast(start, end, dawn_hour = 6, dusk_hour = 19):
    full_days, (part1, part2) = slice_datetimes_interval_fast(start, end)
    total_dawn_time = dawn_time(part1, dawn_hour)+dawn_time(part2, dawn_hour)
    total_dusk_time = dusk_time(part1, dusk_hour)+dusk_time(part2, dusk_hour)
    return int(total_dawn_time + total_dusk_time + full_days*(dawn_hour + (24-dusk_hour))*60)

def clip_stays_date(traj, dates, dawn_hour = 6, dusk_hour = 19):
    start = pd.to_datetime(traj['start_datetime'])
    duration = traj['duration']

    # Ensure timezone-aware clipping bounds
    tz = start.dt.tz
    date_0 = pd.Timestamp(parse(dates[0]), tz=tz)
    date_1 = pd.Timestamp(parse(dates[1]), tz=tz)

    end = start + pd.to_timedelta(duration, unit='m')

    # Clip to date range
    start_clipped = start.clip(lower=date_0, upper=date_1)
    end_clipped = end.clip(lower=date_0, upper=date_1)

    # Recompute durations
    duration_clipped = ((end_clipped - start_clipped).dt.total_seconds() // 60).astype(int)
    duration_night = [duration_at_night_fast(s, e, dawn_hour, dusk_hour) for s, e in zip(start_clipped, end_clipped)]

    return pd.DataFrame({
        'id': traj['id'].values,
        'start': start_clipped,
        'duration': duration_clipped,
        'duration_night': duration_night,
        'location': traj['location']
    })

def count_nights(usr_polygon, dawn_hour = 6, dusk_hour = 19, min_dwell = 10):   
    nights = set()
    weeks = set()

    for _, row in usr_polygon.iterrows():
        d = row['start']
        d = pd.to_datetime(d)
        full_days, (part1, part2) = slice_datetimes_interval_fast(d, d + pd.to_timedelta(row['duration'], unit='m'))

        dawn1 = dawn_time(part1, dawn_hour)
        dusk1 = dusk_time(part1, dusk_hour)
        dawn2 = dawn_time(part2, dawn_hour)
        dusk2 = dusk_time(part2, dusk_hour)

        if full_days == 0:
            if dawn1 >= min_dwell:
                night = d - timedelta(days=1)
                nights.add(night.date())
                weeks.add((night - timedelta(days=night.weekday())).date())

            if (dusk1 + dawn2) >= min_dwell:
                night = d
                nights.add(night.date())
                weeks.add((night - timedelta(days=night.weekday())).date())

            if dusk2 >= min_dwell:
                night = d + timedelta(days=1)
                nights.add(night.date())
                weeks.add((night - timedelta(days=night.weekday())).date())
        else:
            if dawn1 >= min_dwell:
                night = d - timedelta(days=1)
                nights.add(night.date())
                weeks.add((night - timedelta(days=night.weekday())).date())

            for t in range(full_days + 1):
                night = d + timedelta(days=t)
                nights.add(night.date())
                weeks.add((night - timedelta(days=night.weekday())).date())

            if dusk2 >= min_dwell:
                night = d + timedelta(days=full_days + 1)
                nights.add(night.date())
                weeks.add((night - timedelta(days=night.weekday())).date())

    identifier = usr_polygon['id'].iloc[0]
    location = usr_polygon['location'].iloc[0]

    return pd.DataFrame([{
        'id': identifier,
        'location': location,
        'night_count': len(nights),
        'week_count': len(weeks)
    }])


def night_stops(stop_table, user='user', dawn_hour = 6, dusk_hour = 19, min_dwell = 10):
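    # Work on a copy so the caller's table is not modified
    stop_table = stop_table.copy()

    # Convert to datetime before using any datetime accessors
    stop_table['start_datetime'] = pd.to_datetime(stop_table['start_datetime'])

    # Date range covered by the stops, counted in calendar weeks
    start_date = str(stop_table['start_datetime'].min().date())
    weeks = stop_table['start_datetime'].dt.strftime('%Y-%U')
    num_weeks = weeks.nunique()

    if 'id' not in stop_table.columns:
        stop_table['id'] = user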

    end_date = (parse(start_date) + timedelta(weeks=num_weeks)).date().isoformat()
    dates = (start_date, end_date)
    df_clipped = clip_stays_date(stop_table, dates, dawn_hour, dusk_hour)
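    # Keep stops that survive clipping and include at least 15 minutes at night
    df_clipped = df_clipped[(df_clipped['duration'] > 0) & (df_clipped['duration_night'] >= 15)]

    # Pass the function itself to apply; its parameters go in as keyword arguments
    return df_clipped.groupby(['id', 'location'], group_keys=False).apply(
        count_nights, dawn_hour=dawn_hour, dusk_hour=dusk_hour, min_dwell=min_dwell
    ).reset_index(drop=True)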
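
The night-duration logic above can be cross-checked with a slower but more direct formulation: for every calendar day a stop touches, intersect the stop with the two night windows (midnight to dawn, and dusk to midnight) and add up the overlaps. The sketch below is not part of `nomad`; `night_minutes` is a hypothetical name, and it assumes timezone-naive `datetime` objects. Results may differ by a minute from `duration_at_night_fast` for stops that cross midnight, because that helper represents the end of the day as 23:59.

```python
from datetime import datetime, time, timedelta

def night_minutes(start, end, dawn_hour=6, dusk_hour=19):
    """Minutes of [start, end) that fall before dawn_hour or after dusk_hour."""
    total = timedelta(0)
    day = start.date()
    while day <= end.date():
        midnight = datetime.combine(day, time.min)
        # The two night windows of this calendar day.
        windows = [
            (midnight, midnight + timedelta(hours=dawn_hour)),
            (midnight + timedelta(hours=dusk_hour), midnight + timedelta(days=1)),
        ]
        for w_start, w_end in windows:
            overlap = min(end, w_end) - max(start, w_start)
            if overlap > timedelta(0):
                total += overlap
        day += timedelta(days=1)
    return int(total.total_seconds() // 60)

# An overnight stop: 2 hours after dusk plus 6 hours before dawn.
overnight = night_minutes(datetime(2024, 1, 1, 22, 0), datetime(2024, 1, 2, 7, 30))
# A daytime stop contributes nothing.
daytime = night_minutes(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 17, 0))
# An evening stop only counts the part after 19:00.
evening = night_minutes(datetime(2024, 1, 1, 18, 0), datetime(2024, 1, 1, 20, 30))
print(overnight, daytime, evening)
```

The loop over days makes this unsuitable for large tables, but it is easy to reason about and useful for spot-checking a handful of stops.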

In [None]:
import geopandas as gpd

In [None]:
import nomad.io.base as loader
import nomad.visit_attribution as va
import nomad.stop_detection.lachesis as LACHESIS

In [None]:
traj_cols = {'uid':'uid',
             'x':'x',
             'y':'y',
             'timestamp':'timestamp'}

diaries_df = loader.from_file("../nomad/data/diaries", format="parquet", traj_cols=traj_cols,
                       parse_dates=True)
sparse_df = loader.from_file("../nomad/data/sparse_traj/", format="parquet", traj_cols=traj_cols,
                      parse_dates=True)
poi_table = gpd.read_file('garden_city.geojson')

# Reproject from gc_coords to web mercator
sparse_df.loc[:,'x'] = (sparse_df['x'] - 4265699)/15
sparse_df.loc[:,'y'] = (sparse_df['y'] + 4392976)/15

diaries_df.loc[:,'x'] = (diaries_df['x'] - 4265699)/15
diaries_df.loc[:,'y'] = (diaries_df['y'] + 4392976)/15

# Select data from 1 user
user = diaries_df.uid.unique()[0]
user_sample = sparse_df.loc[sparse_df['uid'] == user]

user_sample

In [None]:
DUR_MIN=5
DT_MAX=60
DELTA_ROAM=100

traj_cols = {'uid':'uid',
             'x':'x',
             'y':'y',
             'datetime':'local_timestamp'}

stop_table_lachesis = LACHESIS.lachesis(traj=user_sample,
                                        dur_min=DUR_MIN,
                                        dt_max=DT_MAX,
                                        delta_roam=DELTA_ROAM,
                                        traj_cols=traj_cols,
                                        keep_col_names=False,
                                        complete_output=True,
                                        datetime = 'local_timestamp')

labels_lachesis = LACHESIS._lachesis_labels(traj=user_sample,
                                            dur_min=DUR_MIN,
                                            dt_max=DT_MAX,
                                            delta_roam=DELTA_ROAM,
                                            traj_cols=traj_cols,
                                            datetime = 'local_timestamp')
labels_lachesis.name = 'cluster'

pred_lachesis = va.point_in_polygon(traj=user_sample,
                 labels=labels_lachesis,
                 stop_table=stop_table_lachesis,
                 poi_table=poi_table,
                 traj_cols=traj_cols,
                 is_datetime=True,
                 is_long_lat=False)

pred_lachesis
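
Besides point-in-polygon on the stop itself, the planning notes for this notebook mention a second attribution method: assign each stop the most common location among its pings. The sketch below illustrates that idea with plain pandas. It is not the `nomad` implementation; `attribute_by_ping_mode` is a hypothetical helper, and it assumes that each ping already carries a location id (or a missing value) and that unclustered pings are labelled `-1`.

```python
import pandas as pd

def attribute_by_ping_mode(ping_locations, labels, noise_label=-1):
    """Return the most frequent ping-level location for each cluster label."""
    df = pd.DataFrame({
        "cluster": list(labels),
        "location": list(ping_locations),
    })
    # Drop unclustered pings and pings that fell outside every location.
    df = df[df["cluster"] != noise_label].dropna(subset=["location"])
    # Series.mode() returns sorted values, so ties resolve to the first one.
    return df.groupby("cluster")["location"].agg(lambda s: s.mode().iloc[0])

# Illustrative input: six pings, two stops, one noise ping.
ping_locations = ["home_1", "home_1", "cafe_3", "work_1", "work_1", None]
labels = [0, 0, 0, 1, 1, -1]

stop_locations = attribute_by_ping_mode(ping_locations, labels)
print(stop_locations)
```

Compared with testing the stop centroid against the polygons, the mode is less sensitive to a centroid that lands just outside a building, at the cost of requiring the ping-level labels.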

In [None]:
va.night_stops(pred_lachesis, user)
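
The planning notes describe a last step: a simple function that returns the most frequent nighttime location satisfying some minimum constraints. A minimal sketch of that selection is shown below, assuming a table with the columns returned by `count_nights` (`id`, `location`, `night_count`, `week_count`). The name `select_home` and the default thresholds are hypothetical; `min_nights` and `min_weeks` are taken from the parameter list in the notes.

```python
import pandas as pd

def select_home(night_counts, min_nights=3, min_weeks=2):
    """Pick, per user, the location with the most nights among eligible ones."""
    eligible = night_counts[
        (night_counts["night_count"] >= min_nights)
        & (night_counts["week_count"] >= min_weeks)
    ]
    # Best candidate first within each user, then keep one row per user.
    ranked = eligible.sort_values(
        ["id", "night_count", "week_count"],
        ascending=[True, False, False],
    )
    return ranked.drop_duplicates("id")[["id", "location"]].reset_index(drop=True)

# Illustrative night counts for two users.
night_counts = pd.DataFrame({
    "id": ["u1", "u1", "u2"],
    "location": ["home_1", "work_1", "home_7"],
    "night_count": [10, 2, 1],
    "week_count": [4, 1, 1],
})

homes = select_home(night_counts)
print(homes)
```

Users with no location meeting both thresholds are left out of the result, which makes it explicit that no home could be inferred for them from the available data.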