This repository has been archived by the owner on May 27, 2024. It is now read-only.

An application to analyze travel behavior data from OneBusAway users

CUTR-at-USF/onebusaway-travel-behavior-analysis

onebusaway-travel-behavior-analysis Python application

Python utilities to process and analyze travel behavior data collected by OneBusAway and exported as CSV files by oba-firebase-export.

matchAndMerge utility

This utility matches ground truth (GT) activities recorded in an Excel file with the activity data generated by oba-firebase-export in CSV format. The algorithm matches each activity in the ground truth dataset with the nearest activity in the file generated by oba-firebase-export.
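A minimal sketch of nearest-activity matching, assuming that "nearest" means the smallest absolute difference between a GT activity's start time (in UTC) and the OBA "Activity Start Date and Time* (UTC)", and that the closest candidate is rejected when it is farther away than the --tolerance value (milliseconds). The helper name match_nearest is hypothetical and this is not the code in matchAndMerge.py; the example timestamps come from the sample output rows.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def match_nearest(gt_start: datetime,
                  oba_starts: Sequence[datetime],
                  tolerance_ms: int) -> Optional[int]:
    """Return the index of the OBA start time closest to gt_start.

    Hypothetical helper: returns None when there are no candidates or when
    the closest one differs from gt_start by more than tolerance_ms.
    """
    best_idx: Optional[int] = None
    best_diff: Optional[timedelta] = None
    for i, start in enumerate(oba_starts):
        diff = abs(start - gt_start)  # absolute time gap as a timedelta
        if best_diff is None or diff < best_diff:
            best_idx, best_diff = i, diff
    if best_diff is None or best_diff > timedelta(milliseconds=tolerance_ms):
        return None
    return best_idx


if __name__ == "__main__":
    utc = timezone.utc
    gt = datetime(2021, 8, 24, 18, 39, 0, tzinfo=utc)
    obas = [datetime(2021, 8, 24, 18, 43, 49, tzinfo=utc),
            datetime(2021, 8, 24, 18, 47, 21, tzinfo=utc)]
    print(match_nearest(gt, obas, 300_000))  # closest start is 289 s away -> 0
    print(match_nearest(gt, obas, 3_000))    # 289 s exceeds 3 s -> None
```

A linear scan keeps the sketch simple; for large files, sorting the OBA start times and using bisection would find the nearest candidate faster.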

Run

To run the matchAndMerge script, invoke it with python and pass the desired arguments: python matchAndMerge.py --obaFile obaFile.csv --gtFile gtFile.xlsx
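The command-line options documented in this README could be declared with Python's argparse roughly as below. This is a hypothetical reconstruction from the README text, not the actual matchAndMerge.py source; in particular, using argparse.BooleanOptionalAction (Python 3.9+) for --iterateOverTol and --removeStillMode is an assumption suggested by the --no-removeStillMode spelling, and parse_device_list is a hypothetical helper for the --deviceList file format.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options documented in this README."""
    p = argparse.ArgumentParser(prog="matchAndMerge.py")
    p.add_argument("--obaFile", required=True, help="CSV file from oba-firebase-export")
    p.add_argument("--gtFile", required=True, help="ground truth XLSX file")
    p.add_argument("--outputDir", default="merger_output")
    p.add_argument("--minActivityDuration", type=float, default=5.0, help="minutes")
    p.add_argument("--minTripLength", type=float, default=50.0, help="meters")
    p.add_argument("--tolerance", type=int, default=3000, help="milliseconds")
    # BooleanOptionalAction (Python 3.9+) creates both --flag and --no-flag.
    p.add_argument("--iterateOverTol", action=argparse.BooleanOptionalAction, default=False)
    p.add_argument("--removeStillMode", action=argparse.BooleanOptionalAction, default=True)
    p.add_argument("--mergeOneToOne", action="store_true")
    p.add_argument("--repeatGtRows", action="store_true")
    p.add_argument("--deviceList", default=None, help="txt file with device IDs")
    return p


def parse_device_list(text: str) -> List[str]:
    """Return the comma-separated device IDs on the first line of a --deviceList file."""
    first_line = text.splitlines()[0] if text else ""
    return [d.strip() for d in first_line.split(",") if d.strip()]
```

With this parser, build_parser().parse_args(["--obaFile", "obaFile.csv", "--gtFile", "gtFile.xlsx"]) returns a namespace in which every omitted option holds its documented default, for example tolerance == 3000.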

Required Input Data

matchAndMerge expects two input data files: one generated by oba-firebase-export and one containing the ground truth data:

  • --obaFile <oba csv file>: A CSV file generated by oba-firebase-export. The CSV file must include the following columns:
    • Activity Start Date and Time* (UTC): Date and time recorded for the start of the activity, in UTC.
    • Origin location Date and Time (*best) (UTC): Date and time recorded for the location where the activity started, in UTC.
    • Duration* (minutes): Duration of the activity in minutes.
    • Origin-Destination Bird-Eye Distance* (meters): Straight-line (Euclidean) distance in meters between the origin and the destination recorded for the activity.
    • Google Activity: Detected activity, covering the activities supported by Android plus the additional activities defined by oba-firebase-export ('IN_VEHICLE', 'ON_BICYCLE', 'RUNNING', 'WALKING', 'WALKING/RUNNING', 'STILL').
  • --gtFile <ground truth xlsx file>: An XLSX file that must be formatted as shown below. The main (required) columns are:
    • GT_Collector: User name of the GT data collector.
    • GT_Mode: Activity mode ('WALKING', 'IN_VEHICLE', 'STILL', 'ON_BICYCLE', 'IN_BUS').
    • GT_Date: Date of the recorded activity.
    • GT_TimeOrig: Time recorded at the origin of the activity.
    • GT_TimeMinuteRounded: One (1) if the GT_TimeOrig value was rounded to the nearest minute while recording the activity, zero (0) otherwise.
    • GT_TimeZone: Time zone of the recorded activity.
    • GT_TimeDest: Time recorded at the destination of the activity.
    • GT_TimeDestMinuteRounded: One (1) if the GT_TimeDest value was rounded to the nearest minute while recording the activity, zero (0) otherwise.

| GT_Collector | GT_TourID | GT_TripID | GT_Mode | GT_Date | GT_TimeOrig | GT_TimeMinuteRounded | GT_TimeZone | GT_LatOrig | GT_LonOrig | GT_LocationOrig | GT_TimeDest | GT_TimeDestMinuteRounded | GT_LatDest | GT_LonDest | GT_LocDest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DoeJohn | 1 | 1 | IN_VEHICLE | 3/4/2021 | 3:28:15 PM | 0 | America/Chicago | 33.588713 | -76.33308 | 2045 Small St | 3:40:10 PM | 1 | 35.617885 | -76.312499 | 305 Large Dr |
| DoeJohn | 1 | 2 | WALKING | 3/4/2021 | 3:41:51 PM | 0 | America/Chicago | 23.617885 | -86.312499 | 305 Holly Dr | 3:58:01 PM | 0 | 43.615829 | -86.305452 | Red Pen River |
| DoeJohn | 1 | 3 | STILL | 3/4/2021 | 3:58:20 PM | 0 | America/Chicago | 35.615829 | -61.305452 | Red Pen River | 4:19:05 PM | 0 | 56.615829 | -61.305452 | Red Pen River |
| DoeJohn | 1 | 4 | WALKING | 3/4/2021 | 4:20:00 PM | 1 | America/Chicago | 43.615829 | -67.305452 | Red Pen River | 4:59:15 PM | 0 | 65.617885 | -67.312499 | 305 Holly Dr |
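The GT times are local to GT_TimeZone, while the OBA timestamps are in UTC, so a GT date and time have to be interpreted in their time zone before they can be compared with OBA data. A minimal sketch, assuming the cells arrive as strings in the M/D/YYYY and h:mm:ss AM/PM formats of the example rows (real XLSX cells may instead be read as datetime objects); gt_local_to_utc is a hypothetical name, and zoneinfo requires IANA time zone data on the system.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs IANA tz data on the system


def gt_local_to_utc(gt_date: str, gt_time: str, gt_timezone: str) -> datetime:
    """Combine a GT date, a GT time and GT_TimeZone into a UTC datetime.

    Assumes the M/D/YYYY date and h:mm:ss AM/PM time formats shown in the
    example rows.
    """
    naive = datetime.strptime(f"{gt_date} {gt_time}", "%m/%d/%Y %I:%M:%S %p")
    local = naive.replace(tzinfo=ZoneInfo(gt_timezone))  # attach the IANA zone
    return local.astimezone(timezone.utc)


# First example row: 3:28:15 PM in Chicago (CST, UTC-6) on 3/4/2021.
print(gt_local_to_utc("3/4/2021", "3:28:15 PM", "America/Chicago"))
# 2021-03-04 21:28:15+00:00
```

Using an IANA zone name rather than a fixed offset lets daylight saving time be applied automatically, so an August time in America/New_York is converted with UTC-4 and a March 4 time in America/Chicago with UTC-6.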

Additional Optional Command Line Arguments

  • --outputDir <data folder> Name of the folder where the merged data and log files will be stored. If the folder does not exist, the application will try to create it. The default value is merger_output. Example usage: --outputDir outputData stores the output in the folder outputData.
  • --minActivityDuration <minutes> Minimum activity duration (in minutes); shorter activities are dropped before merging. The default value is 5 minutes. For example, --minActivityDuration 3 removes from the OBA-generated data the activities whose duration is less than 3 minutes.
  • --minTripLength <meters> Minimum distance (in meters) for a trip; shorter trips are dropped before merging. The default value is 50 meters. Example usage: --minTripLength 60 removes from the OBA-generated data the activities whose Origin-Destination Bird-Eye Distance* (meters) is less than 60 meters.
  • --tolerance <milliseconds> Maximum tolerated difference (in milliseconds) between the start of a ground truth activity and the start of the matched OBA activity. The default value is 3000 milliseconds. Example usage: --tolerance 5000 accepts a match only when the start times of the ground truth activity and the OBA activity differ by 5000 milliseconds or less.
  • --iterateOverTol When used, the merging process is applied repeatedly, with the tolerance increasing from 30000 milliseconds up to the value of --tolerance in steps of 30000 milliseconds. By default, the merging process is applied only once, using the value of --tolerance. Example usage: --iterateOverTol.
  • --no-removeStillMode When used, preprocessing does not remove the records whose activity mode is STILL. By default, preprocessing removes the records with activity mode STILL from the input datasets. Example usage: --no-removeStillMode.
  • --mergeOneToOne Forces the merger to match each Ground Truth trip with one and only one OBA record, according to the other command-line parameters. By default, this flag is False; in that case, the merger matches each Ground Truth trip with all the OBA records that start after the Ground Truth trip starts and before it ends. Example usage: --mergeOneToOne
  • --repeatGtRows Forces the merger to repeat a GT trip row once for each match found before exporting the output. By default, this flag is False; in that case, the merger includes only one GT row per trip when merging with a device. Example usage: --repeatGtRows
  • --deviceList <User ID txt file> Name of a txt file containing the IDs of the devices to use for matching and merging. All device IDs must be on the first line of the file, separated by commas. Example usage: --deviceList "fileWithDeviceIDs.txt".
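The preprocessing filters and the default (non one-to-one) matching rule described in these options can be sketched as follows. This is a hypothetical illustration, not the matchAndMerge.py implementation: filter_oba_rows and match_window are made-up names, rows are assumed to be plain dicts keyed by the documented OBA column names, only the OBA side of the STILL filter is shown, "starts after ... and before ..." is read as a strict inequality, and any interaction with --tolerance is not modeled.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Sequence

# Column names as documented for the --obaFile input.
DURATION_COL = "Duration* (minutes)"
DISTANCE_COL = "Origin-Destination Bird-Eye Distance* (meters)"
MODE_COL = "Google Activity"


def filter_oba_rows(rows: Iterable[Dict[str, str]],
                    min_activity_duration: float = 5.0,
                    min_trip_length: float = 50.0,
                    remove_still_mode: bool = True) -> List[Dict[str, str]]:
    """Drop short, nearby and (optionally) STILL activities before merging."""
    kept = []
    for row in rows:
        if float(row[DURATION_COL]) < min_activity_duration:
            continue  # shorter than --minActivityDuration
        if float(row[DISTANCE_COL]) < min_trip_length:
            continue  # shorter than --minTripLength
        if remove_still_mode and row[MODE_COL] == "STILL":
            continue  # default behavior unless --no-removeStillMode is given
        kept.append(row)
    return kept


def match_window(gt_start: datetime, gt_end: datetime,
                 oba_starts: Sequence[datetime]) -> List[int]:
    """Indices of OBA activities that start strictly inside (gt_start, gt_end)."""
    return [i for i, s in enumerate(oba_starts) if gt_start < s < gt_end]
```

Filtering before matching keeps very short or stationary OBA records from being attached to a GT trip; with --mergeOneToOne, a single nearest candidate would be chosen instead of every record in the window.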

Output file format

The output csv file generated by the matchAndMerge.py script has the following format:

GT_Collector GT_Date GT_TimeOrig GT_TimeOrigMinuteRounded GT_TimeZone GT_LatOrig GT_LonOrig GT_LocationOrig GT_TimeDest GT_TimeDestMinuteRounded GT_LatDest GT_LonDest GT_LocDest GT_Comments GT_DateTimeCombined GT_DateTimeDestCombined GT_TourID GT_TripID GT_Mode GT_DateTimeOrigUTC_Backup GT_DateTimeDestUTC Google Activity Activity Start Date and Time* (UTC) Activity Destination Date and Time* (UTC) Manual Assignment Trip ID User ID Device Trip ID Google Activity Confidence Time_Difference Distance_Difference Vehicle type Region ID Origin location Date and Time (*best) (UTC) Activity Start/Origin Time Diff* (minutes) Origin latitude (*best) Origin longitude (*best) Origin Horizontal Accuracy (meters) (*best) Origin Location Provider (*best) Destination Location Date and Time (*best) (UTC) Activity End/Destination Time Diff* (minutes) Destination latitude (*best) Destination longitude (*best) Destination Horizontal Accuracy (meters) (*best) Destination Location Provider (*best) Duration* (minutes) Origin-Destination Bird-Eye Distance* (meters) Chain ID Chain Index Tour ID Tour Index Ignoring Battery Optimizations Talk Back Enabled Power Save Mode Enabled Origin fused Date and Time (UTC) Origin fused latitude Origin fused longitude Origin fused Horizontal Accuracy (meters) Origin gps Date and Time (UTC) Origin gps latitude Origin gps longitude Origin gps Horizontal Accuracy (meters) Origin network Date and Time (UTC) Origin network latitude Origin network longitude Origin network Horizontal Accuracy (meters) Destination fused Date and Time (UTC) Destination fused latitude Destination fused longitude Destination fused Horizontal Accuracy (meters) Destination gps Date and Time (UTC) Destination gps latitude Destination gps longitude Destination gps Horizontal Accuracy (meters) Destination network Date and Time (UTC) Destination network latitude Destination network longitude Destination network Horizontal Accuracy (meters) GT_DateTimeOrigUTC
DoeJohn 8/14/21 14:39:00 1 America/New_York 36.147942 -82.476045 NoHo Flats - Home 14:46:00 1 36.1475913 -82.4718322 N Newport and W Fig 2021-08-24 14:39:00-04:00 2021-08-24 14:46:00-04:00 1 1 WALKING 2021-08-24 18:39:00+00:00 2021-08-24 18:46:00+00:00 WALKING 2021-08-24 18:43:49+00:00 2021-08-24T18:46:52Z 234 asieEWEfej2aejfh3r4wsp0s343q 289 0.83 289 209.7216397 0 2021-08-24 18:44:27+00:00 0.6166667 36.1481099 -82.4739184 30.714 network 2021-08-24 18:50:33+00:00 3.6666667 36.15018044 -82.46762726 4.7475247 gps 3.0333333 660.2517 73 3 FALSE FALSE FALSE 2021-08-24T18:42:27Z 36.1476199 -82.4746442 15.102 2021-08-24T18:50:33Z 36.15018044 -82.46762726 4.7475247 2021-08-24T18:44:27Z 36.1481099 -82.4739184 30.714 2021-08-24T18:42:27Z 36.1476199 -82.4746442 15.102 2021-08-24T18:50:33Z 36.15018044 -82.46762726 4.7475247 2021-08-24T18:44:27Z 36.1481099 -82.4739184 30.714 2021-08-24 18:39:00+00:00
DoeJohn 8/14/21 14:46:00 1 America/New_York 36.1475913 -82.4718322 N Newport and W Fig 14:57:00 1 36.1522225 -70.4284092 Publix Channelside 2021-08-24 14:46:00-04:00 2021-08-24 14:57:00-04:00 1 2 SCOOTER 2021-08-24 18:46:00+00:00 2021-08-24 18:57:00+00:00 ON_BICYCLE 2021-08-24 18:47:21+00:00 2021-08-24T18:58:08Z 235 asieEWEfej2aejfh3r4wsp0s343q 307 0.99 81 503.471453 0 2021-08-24 18:50:33+00:00 3.1833334 36.15018044 -82.46762726 4.7475247 gps 2021-08-24 18:58:41+00:00 0.53333336 36.152163 -70.4276277 17.765 network 10.783334 1980.3054 73 5 FALSE FALSE FALSE 2021-08-24T18:46:28Z 36.1480456 -82.4719809 79.973 2021-08-24T18:50:33Z 36.15018044 -82.46762726 4.7475247 2021-08-24T18:48:30Z 36.1502028 -82.4716052 87.6 2021-08-24T18:56:40Z 36.152351 -70.4286857 32.03 2021-08-24T19:00:33Z 36.15139218 -70.42861502 8.689676 2021-08-24T18:58:41Z 36.152163 -70.4276277 17.765 2021-08-24 18:46:00+00:00
DoeJohn 8/14/21 14:57:00 1 America/New_York 36.1522225 -70.4284092 Publix Channelside 15:00:00 1 36.1512789 -70.4287438 Grand Central 2021-08-24 14:57:00-04:00 2021-08-24 15:00:00-04:00 1 3 WALKING 2021-08-24 18:57:00+00:00 2021-08-24 19:00:00+00:00 WALKING 2021-08-24 18:58:08+00:00 2021-08-24T19:01:08Z 236 asieEWEfej2aejfh3r4wsp0s343q 308 0.76 68 77.04583302 0 2021-08-24 18:58:41+00:00 0.53333336 36.152163 -70.4276277 17.765 network 2021-08-24 19:00:41+00:00 0.43333334 36.1513342 -70.4287049 10.911 fused 2.9833333 140.25807 73 6 FALSE FALSE FALSE 2021-08-24T18:56:40Z 36.152351 -70.4286857 32.03 2021-08-24T19:00:33Z 36.15139218 -70.42861502 8.689676 2021-08-24T18:58:41Z 36.152163 -70.4276277 17.765 2021-08-24T18:56:40Z 36.152351 -70.4286857 32.03 2021-08-24T19:00:33Z 36.15139218 -70.42861502 8.689676 2021-08-24T18:58:41Z 36.152163 -70.4276277 17.765 2021-08-24 18:57:00+00:00
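In the three sample rows, the Time_Difference values (289, 81 and 68) equal the number of seconds between GT_DateTimeOrigUTC and Activity Start Date and Time* (UTC). Assuming that is what the column records (the README does not define it, and the samples do not show whether negative values can occur, so an absolute value is used here), it could be reproduced as follows; time_difference_seconds is a hypothetical name.

```python
from datetime import datetime


def time_difference_seconds(gt_orig_utc: str, oba_start_utc: str) -> float:
    """Absolute gap in seconds between GT_DateTimeOrigUTC and
    Activity Start Date and Time* (UTC), both as 'YYYY-MM-DD HH:MM:SS+00:00'."""
    gt = datetime.fromisoformat(gt_orig_utc)
    oba = datetime.fromisoformat(oba_start_utc)
    return abs((oba - gt).total_seconds())


# Values from the three sample rows; the results match their Time_Difference column.
for gt, oba in [("2021-08-24 18:39:00+00:00", "2021-08-24 18:43:49+00:00"),
                ("2021-08-24 18:46:00+00:00", "2021-08-24 18:47:21+00:00"),
                ("2021-08-24 18:57:00+00:00", "2021-08-24 18:58:08+00:00")]:
    print(time_difference_seconds(gt, oba))  # 289.0, 81.0, 68.0
```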

Acknowledgements

This project was funded under the National Institute for Congestion Reduction (NICR).

License

/*
 * Copyright (C) 2021 University of South Florida
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
