This program calculates the distance travelled by each user across various activities. It uses multiprocessing to handle large datasets.


The dataset is assumed to have the columns latitude, longitude, altitude, time-stamp, trajectory_id and individual_id, where each row corresponds to a location point in a particular trajectory for an individual. Trajectories correspond to outdoor movements, including daily routines such as commuting and non-routine activities like leisure and sports.
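
Because the code relies on these column names, a quick check before the heavy computation can save a failed run. The sketch below is illustrative only: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, the sample rows are made up, and only the columns that the distance code in this notebook actually reads are checked.

```python
import pandas as pd

# Columns the distance code in this notebook actually reads (assumed list).
REQUIRED_COLUMNS = ["individual_id", "trajectory_id", "latitude", "longitude"]

def missing_columns(df, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from df."""
    return [name for name in required if name not in df.columns]

# Made-up two-row sample; the 'longitude' column is deliberately left out.
sample = pd.DataFrame({
    "individual_id": [1, 1],
    "trajectory_id": [10, 10],
    "latitude": [39.98, 39.99],
})
print(missing_columns(sample))
```

An empty list means the dataframe has everything the distance functions need.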

In [None]:
import pandas as pd
import numpy as np
from math import radians, sin, cos, sqrt, atan2
from multiprocessing import Pool
import time



calculate_trajectory_distance(trajectory) calculates the total distance travelled in one trajectory. The function takes a dataframe as its parameter (in this case, the rows of one trajectory_id) and creates the lat1, lat2, lng1 and lng2 series. lat1 and lng1 are the original points, and lat2 and lng2 are the immediately following points. The haversine formula is used to calculate the distance between each pair of consecutive points, and the distances are then summed to get the total distance for one trajectory. Latitude and longitude are expected in radians, and the result is in kilometres because the Earth radius is taken as 6371 km.

In [None]:
def calculate_trajectory_distance(trajectory):
    #lat1 holds each point, lat2 the point that immediately follows it
    lat1 = trajectory['latitude'].iloc[:-1]
    lat2 = trajectory['latitude'].shift(-1).iloc[:-1]
    #same pairing for longitude
    lng1 = trajectory['longitude'].iloc[:-1]
    lng2 = trajectory['longitude'].shift(-1).iloc[:-1]
    
    #haversine formula
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    distance = 6371 * c  #6371 km is the mean Earth radius, so distances are in km
    
    return distance.sum()
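
A scalar version of the same formula is handy as a sanity check on the vectorised function. This is a minimal sketch, not part of the original program: `haversine_km` and `EARTH_RADIUS_KM` are hypothetical names, and unlike the function above it takes degrees and converts them itself.

```python
from math import radians, sin, cos, sqrt, atan2

EARTH_RADIUS_KM = 6371  # same mean Earth radius used in the function above

def haversine_km(lat1_deg, lng1_deg, lat2_deg, lng2_deg):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1_deg, lng1_deg, lat2_deg, lng2_deg))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return EARTH_RADIUS_KM * c

# One degree of longitude along the equator spans 6371 * pi / 180 km.
one_degree_km = haversine_km(0.0, 0.0, 0.0, 1.0)
print(f"{one_degree_km:.2f} km")
```

Comparing this value with the output of the vectorised function on a two-point trajectory is a cheap way to confirm that the inputs were converted to radians.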

The calculate_total_distances() function loads the data from a CSV file (Dataset.csv in the code below) into a dataframe and converts the latitudes and longitudes to radians. It uses multiprocessing to call calculate_trajectory_distance on each ('individual_id', 'trajectory_id') group. It stores the trajectory distances and uses them to calculate the total distance for every individual.



In [None]:
def calculate_total_distances():
    df = pd.read_csv("Dataset.csv")
    
    df['latitude'] = np.radians(df['latitude']) #convert to radians
    df['longitude'] = np.radians(df['longitude'])

    grouped = df.groupby(['individual_id', 'trajectory_id'])
    
    #multiprocess by calling calculate_trajectory_distance on each group simultaneously
    pool = Pool()
    trajectory_distances = pool.map(calculate_trajectory_distance, [grouped.get_group(x) for x in grouped.groups])
    pool.close()
    pool.join()
    
    #create dataframe with all trajectory distances
    individual_ids = [key[0] for key in grouped.groups.keys()]
    trajectory_ids = [key[1] for key in grouped.groups.keys()]
    trajectory_distance_df = pd.DataFrame({'individual_id': individual_ids, 'trajectory_id': trajectory_ids, 'total_distance': trajectory_distances})
    
    # Create a new dataframe with the total distance for each individual_id
    total_distances_df = trajectory_distance_df.groupby(['individual_id'])['total_distance'].sum().reset_index()
    return total_distances_df
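
The split, apply and combine steps of calculate_total_distances() can be tried on a tiny made-up dataframe before running on the full dataset. The sketch below is an assumption-laden illustration: `path_length_km` is a hypothetical stand-in for calculate_trajectory_distance, the data is synthetic, and the apply step uses the built-in `map` so it runs serially. `Pool.map` accepts the same (function, iterable) arguments.

```python
import numpy as np
import pandas as pd

def path_length_km(trajectory):
    """Sum of haversine distances between consecutive points (inputs in radians)."""
    lat = trajectory["latitude"].to_numpy()
    lng = trajectory["longitude"].to_numpy()
    dlat = np.diff(lat)
    dlng = np.diff(lng)
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:-1]) * np.cos(lat[1:]) * np.sin(dlng / 2) ** 2
    return float((6371 * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))).sum())

# Made-up data: individual 1 has two trajectories, individual 2 has one.
df = pd.DataFrame({
    "individual_id": [1, 1, 1, 1, 2, 2],
    "trajectory_id": [10, 10, 11, 11, 20, 20],
    "latitude": np.radians([0.0, 0.0, 0.0, 0.0, 0.0, 0.0]),
    "longitude": np.radians([0.0, 1.0, 0.0, 2.0, 5.0, 5.0]),
})

# Split: iterating the groupby yields (key, sub-frame) pairs,
# so keys and groups come from the same pass and stay aligned.
keys, groups = zip(*df.groupby(["individual_id", "trajectory_id"]))

# Apply: serial here; Pool.map takes the same (function, iterable) arguments.
distances = list(map(path_length_km, groups))

# Combine: one row per trajectory, then sum per individual.
per_trajectory = pd.DataFrame(list(keys), columns=["individual_id", "trajectory_id"])
per_trajectory["total_distance"] = distances
per_individual = per_trajectory.groupby("individual_id")["total_distance"].sum().reset_index()
print(per_individual)
```

Individual 1 covers one degree plus two degrees of longitude along the equator, and individual 2 does not move, so the totals are easy to verify by hand.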

In [None]:
start_time = time.time()
total_distance = calculate_total_distances()
end_time = time.time()
print(total_distance)
print(f"Total runtime for the function: {end_time - start_time:.3f} seconds.")
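
For repeated timing runs, the measurement can be wrapped in a small helper. This is a sketch under stated assumptions: `timed` is a hypothetical helper, and it uses `time.perf_counter`, a monotonic high-resolution clock from the standard library, rather than the `time.time` call used above. The workload here is a stand-in for calculate_total_distances.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; in the notebook this would be calculate_total_distances.
result, elapsed = timed(sum, range(1_000_000))
print(f"Result {result}, runtime {elapsed:.3f} seconds.")
```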