Setup: The purpose of this code is to clean GPS device point data in recreation and tourism settings. In order to successfully run the code you will need:

(i) GPS device data
(ii) a shapefile of your study location

If necessary, you may also need:

(i) a shapefile of a buffer area for GPS device dropbox areas
(ii) a shapefile of a buffer area for GPS device data download areas

The setup assumes that all GPS tracks and files are saved and stored in the same folder with the appropriate projection system.
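Before running the notebook, it can help to confirm that each shapefile in the folder is complete. A shapefile is stored as several files that share one base name: `.shp`, `.shx`, and `.dbf` are mandatory, and `.prj` carries the projection. The sketch below is a minimal check under those assumptions. The helper name `missing_shapefile_parts` is hypothetical and not part of arcpy, and it only tests that the files exist, not that the projection is the one you intend.

```python
from pathlib import Path

# Mandatory shapefile components plus the projection file.
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf", ".prj")


def missing_shapefile_parts(base_path):
    """Return the suffixes that are missing for a shapefile.

    base_path is the shapefile path WITHOUT an extension,
    e.g. os.path.join(base_dir, "GRCA_SRim_Boundary").
    """
    base = Path(base_path)
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        # Append the suffix to the full name so dots in the name are preserved.
        if not base.with_name(base.name + suffix).exists()
    ]
```

An empty list means all four parts were found; anything else names the parts to track down before importing.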

In [1]:
import os
import arcpy
import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor

# Get and print the version of arcpy (version used 3.1.2)
print(f"arcpy version: {arcpy.GetInstallInfo()['Version']}")

# Get and print the version of pandas (version used 1.4.4)
print(f"pandas version: {pd.__version__}")

arcpy version: 3.1.2
pandas version: 1.4.4


In [2]:
###Set up key paths

##Base directory folder to retrieve files from and save completed files to for the project
base_dir = r"C:\Users\colby\OneDrive\Documents\Research\Grand Canyon\Raw_GPS_23"

##Folder name for the folder containing the GPS data (must be within base directory)
folder_with_data = "GRCA_GPS_TRACKS_2023"

##The geodatabase you wish to use to temporarily store and work with files as you clean the data
target_gdb = r"C:\Users\colby\OneDrive - The Pennsylvania State University\Documents\ArcGIS\Projects\TTRA_Presentation\TTRA_Presentation.gdb"

In [3]:
##Identify location of relevant shapefiles

##Shapefile of the destination boundary for your study site. To build it, we created a polygon around National Park Service-provided roads and trails on the South Rim, then removed by hand the entrances and Forest Service roads outside the park that the polygon had captured.
destination_boundary = os.path.join(base_dir, "GRCA_SRim_Boundary")

#For some studies, drop boxes may be located within the study boundary. If necessary, create an appropriately sized polygon for removing those points so that they do not generate noise, either by hand or using the buffer tool from the point. In this case, we used the buffer tool set at 150m.
dropbox_buffer = os.path.join(base_dir, "Dropbox_Buffer")

#For some studies, data in GPS devices may be uploaded within the study site and devices may record "ghost" points when they are uploaded at researcher facilities. If necessary, create an appropriately sized polygon for removing points from those locations so that they do not generate noise, either by hand or using the buffer tool from the point. In this case, we used the buffer tool set at 75m from the worker cabins we downloaded points from.
download_buffer = os.path.join(base_dir, "Cabin_Buffer")

In [4]:
##Set your arcpy environment and permit overwrite

arcpy.env.workspace = target_gdb
arcpy.env.overwriteOutput = True

Section 1: In this section you will do the high level importing and cleaning of your data. This will include importing all of your data into your geodatabase, pre-processing your fields, and removing points based on spatial parameters (i.e., destination boundary, dropbox boundary, download area boundary) relevant to your study site, as applicable.

In [5]:
###Import data. For this to work, all data must be stored as shapefiles.
in_data = os.path.join(base_dir, folder_with_data)
##Named file_filter rather than filter to avoid shadowing Python's built-in filter()
file_filter = "*.shp"

arcpy.intelligence.BatchImportData(in_data, target_gdb, file_filter)

In [6]:
### Combine all separate shapefiles for each unique device/participant into one file

##Set the workspace
workspace = target_gdb

#Making an empty list to put filepaths to feature classes in
feature_classes = []

#Walking workspace recursively checking type, and appending filepath to list
for dirpath, dirnames, filenames in arcpy.da.Walk(workspace,datatype="FeatureClass", type="Point"):
    for filename in filenames:
        desc = arcpy.Describe(os.path.join(dirpath, filename))
        if desc.shapeType == "Point":
            feature_classes.append(os.path.join(dirpath, filename))

#Filename for saving
output = "raw_points"

#merging my list of feature classes to a new dataset
arcpy.Merge_management(feature_classes, output)

In [7]:
##It is unnecessary to retain most fields recorded by GPS devices, however, specific needs for each project may vary. Below are the minimum necessary fields to run the code.

process_raw = os.path.join(target_gdb, "raw_points")

raw_points = pd.DataFrame.spatial.from_featureclass(process_raw)

##Select the field for the object identification (OID) created by ArcGIS
OID = "OBJECTID"

##Select the field for group identification (GID), which refers to the participant's unique ID
GID = "tident"

##Select the field(s) for time
time = "ltime"

##Select the field that stores the point geometry
shape = "SHAPE"

##Retained fields
retained_fields = [OID,GID,time,shape]

##Create dataframe of necessary points only
raw_points = raw_points[retained_fields]

##Export datapoints back to geodatabase
raw_points.spatial.to_featureclass(os.path.join(target_gdb, "processed_points"))

##Print pandas dataframe
raw_points

Unnamed: 0,OBJECTID,tident,ltime,SHAPE
0,1,010112,2023/05/23 09:40:05,"{""x"": 425226.1964999996, ""y"": 3988277.36350000..."
1,2,010112,2023/05/23 09:40:20,"{""x"": 425227.1490000002, ""y"": 3988277.47619999..."
2,3,010112,2023/05/23 09:40:34,"{""x"": 425226.8783, ""y"": 3988277.6180000007, ""s..."
3,4,010112,2023/05/23 09:40:49,"{""x"": 425226.35730000027, ""y"": 3988277.6317, ""..."
4,5,010112,2023/05/23 09:41:05,"{""x"": 425225.7078999998, ""y"": 3988277.63719999..."
...,...,...,...,...
1456027,1456028,239922,2023/06/24 17:30:16,"{""x"": 397641.2697999999, ""y"": 3990550.77590000..."
1456028,1456029,239922,2023/06/24 17:30:31,"{""x"": 397639.67449999973, ""y"": 3990549.2974999..."
1456029,1456030,239922,2023/06/24 17:30:46,"{""x"": 397638.59970000014, ""y"": 3990549.0776000..."
1456030,1456031,239922,2023/06/24 17:31:01,"{""x"": 397637.48840000015, ""y"": 3990550.9037999..."


In [8]:
# Use Convert Time Field to convert “ltime” to Date format (labeling may vary). Input time format may vary by device.

in_table = "processed_points"
input_time_field = time
input_time_format = "yyyy/MM/dd HH:mm:ss"
output_time_field = "ltime_converted"

arcpy.management.ConvertTimeField(in_table, input_time_field, input_time_format, output_time_field)
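Because the input time format varies by device, it may be worth checking a few raw timestamp strings before running the conversion. The sketch below uses Python's `datetime.strptime` with `%Y/%m/%d %H:%M:%S`, which is the Python-style equivalent of the `yyyy/MM/dd HH:mm:ss` pattern used here. The helper name `matches_time_format` is hypothetical and is only meant as a quick sanity check.

```python
from datetime import datetime

# Python strptime equivalent of the ArcGIS pattern "yyyy/MM/dd HH:mm:ss".
PY_TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def matches_time_format(value, fmt=PY_TIME_FORMAT):
    """Return True if the string parses cleanly with the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False
```

For example, `matches_time_format("2023/05/23 09:40:05")` accepts a value in the format shown in the raw data, while a month-first string such as `"05/23/2023 09:40:05"` is rejected, signaling that `input_time_format` needs to be adjusted for that device.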

In [9]:
# Clip new merged shapefile using a polygon of the study area (with the survey locations cut out of the polygon – use continue feature tool)

in_features = "processed_points"
clip_features = destination_boundary
out_feature_class = "processed_points_clip"

arcpy.analysis.Clip(in_features, clip_features, out_feature_class)

In [10]:
## Remove points around dropbox using 150m buffer

in_features = "processed_points_clip"
erase_features = dropbox_buffer
out_feature_class = "processed_points_clip_dropbox"

arcpy.analysis.PairwiseErase(in_features, erase_features, out_feature_class)

In [11]:
###Remove points around cabin using 75m buffer

in_features = "processed_points_clip_dropbox"
erase_features = download_buffer
out_feature_class = "processed_points_clip_dropbox_cabin"

arcpy.analysis.PairwiseErase(in_features, erase_features, out_feature_class)

Section 2: The preceding section provides the highest level of data cleaning. The subsequent sections allow the researcher to dig deeper into errors that may occur from working with GPS devices. Section 2 relies on automatic commands to systematically remove points that likely result from error (e.g., devices being turned off, noise). It requires defining assumptions that are pertinent to the setup of your GPS devices and the reasonable behaviors of your participants.

In [12]:
# Use the Points to Track Segments tool to create a new set of features where all points are converted to track segments. Use the participant identification number (in this case, “tident”) as the group field. Deselect Error On Duplicate Timestamps.

in_features = "processed_points_clip_dropbox_cabin"
date_field = "ltime_converted"
out_feature_class = "processed_lines"
group_field = GID
include_velocity = "INCLUDE_VELOCITY"
error_on_duplicate_timestamps = "ALLOW_DUPLICATE_TIMESTAMPS"

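##Pass the variables defined above so the velocity and duplicate-timestamp settings are explicit
arcpy.intelligence.PointsToTrackSegments(in_features, date_field, out_feature_class, group_field, include_velocity=include_velocity, error_on_duplicate_timestamps=error_on_duplicate_timestamps)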

In [13]:
##Assumptions:

##The GPS devices used in this study recorded at 15-second intervals and the fastest mode of travel was by car; therefore, we assumed that distances between points above 465 meters (i.e., roughly 70 mph) were likely due to error. Change the field preceding the ">" sign to match the field indicating distance between points.

speed_assumption = "distance_m > 465"

##The GPS devices used in this study relied on 15 second intervals, therefore, we assumed that recordings with intervals above 20 seconds were indicators of error. Change the field preceding the ">" sign to match the field indicating time between points.

time_assumption = "dt_sec > 20"
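The distance threshold follows from the logging interval and the assumed top speed: distance = speed × interval. The sketch below works through that arithmetic so the threshold can be recomputed for a different device interval or speed limit. The function names are hypothetical; the conversion factor (1 mph = 0.44704 m/s) is exact.

```python
# Exact conversion: 1 mile = 1609.344 m and 1 hour = 3600 s.
MPS_PER_MPH = 0.44704


def max_distance_m(max_speed_mph, interval_s):
    """Farthest distance (meters) coverable in one logging interval."""
    return max_speed_mph * MPS_PER_MPH * interval_s


def implied_speed_mph(distance_m, interval_s):
    """Speed (mph) implied by covering distance_m in one logging interval."""
    return distance_m / interval_s / MPS_PER_MPH
```

With a 15-second interval, `max_distance_m(70, 15)` is about 469 m, and `implied_speed_mph(465, 15)` is a little over 69 mph, so the 465 m cutoff corresponds to just under 70 mph.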

In [14]:
# Select all attributes (line segments) with a distance greater than 465 meters (traveling greater than 70 miles per hour). Delete the selection.

in_layer_or_view = "processed_lines"
selection_type = "NEW_SELECTION"
where_clause = speed_assumption

Distance_Selection = arcpy.management.SelectLayerByAttribute(in_layer_or_view, selection_type, where_clause)


##Delete selected attributes.
arcpy.management.DeleteFeatures(Distance_Selection)

In [15]:
# Create new text fields equal to your group identification plus the unique point identification for each entry for further examination. First make sure your fields, such as identification, are text. These expressions should occur by default in your data by now, but revise them (i.e., the group ID, object ID) if they do not.

in_table = "processed_lines"
field = "Text_OID"
expression = "!OBJECTID!"

arcpy.management.CalculateField(in_table, field, expression, field_type = "TEXT")

##Now make a field that combines Group Identification+OBJECTID.

in_table = "processed_lines"
field = "ID_2"
expression = "!group_id! + !Text_OID!"

arcpy.management.CalculateField(in_table, field, expression)

###Remove Unnecessary Fields (again)

in_table = "processed_lines"
fields = ["Text_OID"]

arcpy.DeleteField_management(in_table, fields)

In [16]:
# Use the Feature Vertices to Points tool to transform the line segments of participant feature classes to points. Again, stay working in the same geodatabase. Use start vertex as the Point Type. Add the new point-based feature classes to the map.

in_features = "processed_lines"
out_feature_class = "processed_points"
point_location = "START"

arcpy.management.FeatureVerticesToPoints(in_features, out_feature_class, point_location)

In [17]:
# Use select by attributes to delete all rows (points) containing duration values greater than 20. This won't remove all errors, but it is a good start.

in_layer_or_view = "processed_points"
selection_type = "NEW_SELECTION"
where_clause = time_assumption

Time_Selection = arcpy.management.SelectLayerByAttribute(in_layer_or_view, selection_type, where_clause)

arcpy.management.DeleteFeatures(Time_Selection)

Section 3: The preceding section is helpful for automatic removal of GPS tracks within ArcGIS Pro. However, GPS data can be noisy and may require some manual effort to identify lingering errors. For example, in the Grand Canyon study, some people left the study site for some time (e.g., to shop or to camp outside the park boundaries) and then returned. Their data is still valuable, but it can be mixed up with data that stems from device error (e.g., devices that recorded points in illogical destinations due to noise or topography), user error (e.g., a participant who appeared to carry a device during a helicopter tour of the park), or researcher error (e.g., points recorded on GPS devices retrieved from visitor centers). The code below is intended to facilitate manual examination of data points that express unique or illogical behavior, to help determine whether they are valuable or warrant removal because they derive from some form of error.

In [18]:
##Load the processed points into a dataframe and rename fields for clarity: point ID (FID), participant identification (GID), and time (dtime)

processed_points_2 = os.path.join(target_gdb, "processed_points")

df = pd.DataFrame.spatial.from_featureclass(processed_points_2)
df.rename(columns={'OBJECTID':'FID','group_id': 'GID', 'd_start':'dtime'},inplace=True)

##Convert time to proper time field format
df['dtime'] = pd.to_datetime(df['dtime'])

##Sort data by participant ID and time
df.sort_values(by=['GID', 'dtime'], inplace=True)
df

##Dataframe copy

seq_timedf = df.copy()
seq_timedf

Unnamed: 0,FID,dtime,d_start_s,d_end,d_end_s,dt_sec,dt_min,distance_m,oid_start,oid_end,speed_mps,speed_mph,speed_kph,speed_knt,GID,ID_2,ORIG_FID,SHAPE
120,121,2023-05-23 09:43:05.000000,2023-05-23 09:43:05 AM,2023-05-23 09:43:20.000000,2023-05-23 09:43:20 AM,15,0.25,175.969048,318011,317746,11.73127,26.242147,42.232571,22.803712,010112,010112121,121,"{""x"": 425322.9512, ""y"": 3988475.3912000004, ""s..."
123,124,2023-05-23 09:43:20.000000,2023-05-23 09:43:20 AM,2023-05-23 09:43:35.000002,2023-05-23 09:43:35 AM,15,0.25,252.31622,317746,317011,16.821081,37.62775,60.555893,32.697491,010112,010112124,124,"{""x"": 425241.1732999999, ""y"": 3988631.13770000..."
126,127,2023-05-23 09:43:35.000002,2023-05-23 09:43:35 AM,2023-05-23 09:43:50.000000,2023-05-23 09:43:50 AM,14,0.233333,294.266755,317011,316746,21.019054,47.018362,75.668594,40.857678,010112,010112127,127,"{""x"": 424998.49340000004, ""y"": 3988562.3785999..."
129,130,2023-05-23 09:43:50.000000,2023-05-23 09:43:50 AM,2023-05-23 09:44:05.000000,2023-05-23 09:44:05 AM,15,0.25,194.721218,316746,316207,12.981415,29.038645,46.733092,25.233793,010112,010112130,130,"{""x"": 424826.56900000013, ""y"": 3988323.6786, ""..."
132,133,2023-05-23 09:44:05.000000,2023-05-23 09:44:05 AM,2023-05-23 09:44:20.000000,2023-05-23 09:44:20 AM,15,0.25,231.525945,316207,311746,15.435063,34.52731,55.566227,30.003293,010112,010112133,133,"{""x"": 424646.99230000004, ""y"": 3988398.8019999..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119342,123004,2023-05-30 17:03:58.000000,2023-05-30 05:03:58 PM,2023-05-30 17:04:13.000002,2023-05-30 05:04:13 PM,15,0.25,287.925486,6798,6799,19.195032,42.938136,69.102117,37.312072,Current Track: 30 MAY 2,Current Track: 30 MAY 2126042,126042,"{""x"": 398834.1743000001, ""y"": 3988251.39489999..."
119345,123007,2023-05-30 17:04:13.000002,2023-05-30 05:04:13 PM,2023-05-30 17:04:28.000000,2023-05-30 05:04:28 PM,14,0.233333,294.047063,6799,6164,21.003362,46.98326,75.612102,40.827174,Current Track: 30 MAY 2,Current Track: 30 MAY 2126045,126045,"{""x"": 398816.4205, ""y"": 3987964.0963000003, ""s..."
119348,123010,2023-05-30 17:04:28.000000,2023-05-30 05:04:28 PM,2023-05-30 17:04:43.000000,2023-05-30 05:04:43 PM,15,0.25,282.096477,6164,6165,18.806432,42.06886,67.703155,36.556694,Current Track: 30 MAY 2,Current Track: 30 MAY 2126048,126048,"{""x"": 398812.02660000045, ""y"": 3987670.1625999..."
119351,123013,2023-05-30 17:04:43.000000,2023-05-30 05:04:43 PM,2023-05-30 17:04:58.000000,2023-05-30 05:04:58 PM,15,0.25,303.484176,6165,5228,20.232278,45.258393,72.836202,39.328312,Current Track: 30 MAY 2,Current Track: 30 MAY 2126051,126051,"{""x"": 398819.33550000004, ""y"": 3987388.2380999..."


In [19]:
##Use the shift command to record the subsequently recorded point for each point from each participant

seq_timedf.sort_values(by=['GID', 'dtime'], inplace=True)
seq_timedf[['GID_2', 'dtime_2']] = seq_timedf.groupby('GID')[['GID', 'dtime']].shift(-1)
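##Note: pandas treats the multi-character pattern '.0' as a regular expression (any character followed by "0"), so this step truncates GID_2 (e.g., "010112" becomes "0"). GID_2 is rebuilt from GID in the next cell, so later steps are unaffected.
seq_timedf['GID_2'] = seq_timedf['GID_2'].astype(str).str.split('.0')
seq_timedf['GID_2'] = seq_timedf['GID_2'].str.get(0)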
seq_timedf

Unnamed: 0,FID,dtime,d_start_s,d_end,d_end_s,dt_sec,dt_min,distance_m,oid_start,oid_end,speed_mps,speed_mph,speed_kph,speed_knt,GID,ID_2,ORIG_FID,SHAPE,GID_2,dtime_2
120,121,2023-05-23 09:43:05.000000,2023-05-23 09:43:05 AM,2023-05-23 09:43:20.000000,2023-05-23 09:43:20 AM,15,0.25,175.969048,318011,317746,11.73127,26.242147,42.232571,22.803712,010112,010112121,121,"{""x"": 425322.9512, ""y"": 3988475.3912000004, ""s...",0,2023-05-23 09:43:20.000000
123,124,2023-05-23 09:43:20.000000,2023-05-23 09:43:20 AM,2023-05-23 09:43:35.000002,2023-05-23 09:43:35 AM,15,0.25,252.31622,317746,317011,16.821081,37.62775,60.555893,32.697491,010112,010112124,124,"{""x"": 425241.1732999999, ""y"": 3988631.13770000...",0,2023-05-23 09:43:35.000002
126,127,2023-05-23 09:43:35.000002,2023-05-23 09:43:35 AM,2023-05-23 09:43:50.000000,2023-05-23 09:43:50 AM,14,0.233333,294.266755,317011,316746,21.019054,47.018362,75.668594,40.857678,010112,010112127,127,"{""x"": 424998.49340000004, ""y"": 3988562.3785999...",0,2023-05-23 09:43:50.000000
129,130,2023-05-23 09:43:50.000000,2023-05-23 09:43:50 AM,2023-05-23 09:44:05.000000,2023-05-23 09:44:05 AM,15,0.25,194.721218,316746,316207,12.981415,29.038645,46.733092,25.233793,010112,010112130,130,"{""x"": 424826.56900000013, ""y"": 3988323.6786, ""...",0,2023-05-23 09:44:05.000000
132,133,2023-05-23 09:44:05.000000,2023-05-23 09:44:05 AM,2023-05-23 09:44:20.000000,2023-05-23 09:44:20 AM,15,0.25,231.525945,316207,311746,15.435063,34.52731,55.566227,30.003293,010112,010112133,133,"{""x"": 424646.99230000004, ""y"": 3988398.8019999...",0,2023-05-23 09:44:20.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119342,123004,2023-05-30 17:03:58.000000,2023-05-30 05:03:58 PM,2023-05-30 17:04:13.000002,2023-05-30 05:04:13 PM,15,0.25,287.925486,6798,6799,19.195032,42.938136,69.102117,37.312072,Current Track: 30 MAY 2,Current Track: 30 MAY 2126042,126042,"{""x"": 398834.1743000001, ""y"": 3988251.39489999...",Current Track:,2023-05-30 17:04:13.000002
119345,123007,2023-05-30 17:04:13.000002,2023-05-30 05:04:13 PM,2023-05-30 17:04:28.000000,2023-05-30 05:04:28 PM,14,0.233333,294.047063,6799,6164,21.003362,46.98326,75.612102,40.827174,Current Track: 30 MAY 2,Current Track: 30 MAY 2126045,126045,"{""x"": 398816.4205, ""y"": 3987964.0963000003, ""s...",Current Track:,2023-05-30 17:04:28.000000
119348,123010,2023-05-30 17:04:28.000000,2023-05-30 05:04:28 PM,2023-05-30 17:04:43.000000,2023-05-30 05:04:43 PM,15,0.25,282.096477,6164,6165,18.806432,42.06886,67.703155,36.556694,Current Track: 30 MAY 2,Current Track: 30 MAY 2126048,126048,"{""x"": 398812.02660000045, ""y"": 3987670.1625999...",Current Track:,2023-05-30 17:04:43.000000
119351,123013,2023-05-30 17:04:43.000000,2023-05-30 05:04:43 PM,2023-05-30 17:04:58.000000,2023-05-30 05:04:58 PM,15,0.25,303.484176,6165,5228,20.232278,45.258393,72.836202,39.328312,Current Track: 30 MAY 2,Current Track: 30 MAY 2126051,126051,"{""x"": 398819.33550000004, ""y"": 3987388.2380999...",Current Track:,2023-05-30 17:04:58.000000


In [20]:
##Identify a length of time between points that is likely to have occurred because of error, manually readjusting based on the subsequent steps until you consistently stop identifying errors. In our case, this was 16 hours.

seq_timedf2 = seq_timedf.copy()
seq_timedf2['POI_timedelta'] = seq_timedf2['dtime_2'] - seq_timedf2['dtime']
seq_time_filt = seq_timedf2['POI_timedelta'] > '0 days 16:00:00'

##Create a field to serve as a filter to identify when this occurs for each recorded point
seq_timedf2['POI_timedelta_filt'] = seq_time_filt
seq_timedf2['POI_timedelta_filt'] = seq_timedf2['POI_timedelta_filt'].astype(int)
seq_timedf2['POI_timedelta_filt'] = seq_timedf2.groupby('GID')['POI_timedelta_filt'].cumsum()
seq_timedf2['GID_2'] = seq_timedf2['GID'].astype(str)+'.'+seq_timedf2['POI_timedelta_filt'].astype(str)
seq_timedf2

Unnamed: 0,FID,dtime,d_start_s,d_end,d_end_s,dt_sec,dt_min,distance_m,oid_start,oid_end,...,speed_kph,speed_knt,GID,ID_2,ORIG_FID,SHAPE,GID_2,dtime_2,POI_timedelta,POI_timedelta_filt
120,121,2023-05-23 09:43:05.000000,2023-05-23 09:43:05 AM,2023-05-23 09:43:20.000000,2023-05-23 09:43:20 AM,15,0.25,175.969048,318011,317746,...,42.232571,22.803712,010112,010112121,121,"{""x"": 425322.9512, ""y"": 3988475.3912000004, ""s...",010112.0,2023-05-23 09:43:20.000000,0 days 00:00:15,0
123,124,2023-05-23 09:43:20.000000,2023-05-23 09:43:20 AM,2023-05-23 09:43:35.000002,2023-05-23 09:43:35 AM,15,0.25,252.31622,317746,317011,...,60.555893,32.697491,010112,010112124,124,"{""x"": 425241.1732999999, ""y"": 3988631.13770000...",010112.0,2023-05-23 09:43:35.000002,0 days 00:00:15.000002,0
126,127,2023-05-23 09:43:35.000002,2023-05-23 09:43:35 AM,2023-05-23 09:43:50.000000,2023-05-23 09:43:50 AM,14,0.233333,294.266755,317011,316746,...,75.668594,40.857678,010112,010112127,127,"{""x"": 424998.49340000004, ""y"": 3988562.3785999...",010112.0,2023-05-23 09:43:50.000000,0 days 00:00:14.999998,0
129,130,2023-05-23 09:43:50.000000,2023-05-23 09:43:50 AM,2023-05-23 09:44:05.000000,2023-05-23 09:44:05 AM,15,0.25,194.721218,316746,316207,...,46.733092,25.233793,010112,010112130,130,"{""x"": 424826.56900000013, ""y"": 3988323.6786, ""...",010112.0,2023-05-23 09:44:05.000000,0 days 00:00:15,0
132,133,2023-05-23 09:44:05.000000,2023-05-23 09:44:05 AM,2023-05-23 09:44:20.000000,2023-05-23 09:44:20 AM,15,0.25,231.525945,316207,311746,...,55.566227,30.003293,010112,010112133,133,"{""x"": 424646.99230000004, ""y"": 3988398.8019999...",010112.0,2023-05-23 09:44:20.000000,0 days 00:00:15,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
119342,123004,2023-05-30 17:03:58.000000,2023-05-30 05:03:58 PM,2023-05-30 17:04:13.000002,2023-05-30 05:04:13 PM,15,0.25,287.925486,6798,6799,...,69.102117,37.312072,Current Track: 30 MAY 2,Current Track: 30 MAY 2126042,126042,"{""x"": 398834.1743000001, ""y"": 3988251.39489999...",Current Track: 30 MAY 2.0,2023-05-30 17:04:13.000002,0 days 00:00:15.000002,0
119345,123007,2023-05-30 17:04:13.000002,2023-05-30 05:04:13 PM,2023-05-30 17:04:28.000000,2023-05-30 05:04:28 PM,14,0.233333,294.047063,6799,6164,...,75.612102,40.827174,Current Track: 30 MAY 2,Current Track: 30 MAY 2126045,126045,"{""x"": 398816.4205, ""y"": 3987964.0963000003, ""s...",Current Track: 30 MAY 2.0,2023-05-30 17:04:28.000000,0 days 00:00:14.999998,0
119348,123010,2023-05-30 17:04:28.000000,2023-05-30 05:04:28 PM,2023-05-30 17:04:43.000000,2023-05-30 05:04:43 PM,15,0.25,282.096477,6164,6165,...,67.703155,36.556694,Current Track: 30 MAY 2,Current Track: 30 MAY 2126048,126048,"{""x"": 398812.02660000045, ""y"": 3987670.1625999...",Current Track: 30 MAY 2.0,2023-05-30 17:04:43.000000,0 days 00:00:15,0
119351,123013,2023-05-30 17:04:43.000000,2023-05-30 05:04:43 PM,2023-05-30 17:04:58.000000,2023-05-30 05:04:58 PM,15,0.25,303.484176,6165,5228,...,72.836202,39.328312,Current Track: 30 MAY 2,Current Track: 30 MAY 2126051,126051,"{""x"": 398819.33550000004, ""y"": 3987388.2380999...",Current Track: 30 MAY 2.0,2023-05-30 17:04:58.000000,0 days 00:00:15,0
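The labeling logic in this step can be hard to follow in pandas form, so here is a minimal plain-Python sketch of the same idea, written for illustration only. It assumes each point is a `(gid, timestamp)` pair; the function name `label_segments` is hypothetical. As in the cell above, the counter increments on the last point before a long gap (the cumulative sum includes the flagged row itself), so that point already carries the new label.

```python
from datetime import datetime, timedelta
from itertools import groupby


def label_segments(points, max_gap=timedelta(hours=16)):
    """Label each (gid, time) point as 'gid.n', where n counts long gaps.

    Mirrors the cumsum approach: the point whose gap to the NEXT point
    exceeds max_gap is the first to receive the incremented counter.
    """
    labeled = []
    ordered = sorted(points)  # sort by participant, then by time
    for gid, group in groupby(ordered, key=lambda p: p[0]):
        times = [t for _, t in group]
        segment = 0
        for i, t in enumerate(times):
            has_next = i + 1 < len(times)
            if has_next and times[i + 1] - t > max_gap:
                segment += 1
            labeled.append((gid, t, f"{gid}.{segment}"))
    return labeled
```

Because the point just before the gap is labeled with the new segment, any later step that wants only the points recorded after the gap needs to drop the first row of each flagged segment.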


In [21]:
##Create dataframe containing filter for time delta for manual evaluation

seq_time_filt2 = seq_time_filt==True
df10 = seq_timedf2.copy()
df10 = pd.DataFrame(df10.loc[seq_time_filt2])
df10

Unnamed: 0,FID,dtime,d_start_s,d_end,d_end_s,dt_sec,dt_min,distance_m,oid_start,oid_end,...,speed_kph,speed_knt,GID,ID_2,ORIG_FID,SHAPE,GID_2,dtime_2,POI_timedelta,POI_timedelta_filt
34407,36425,2023-05-24 07:57:32.000000,2023-05-24 07:57:32 AM,2023-05-24 07:57:47.000000,2023-05-24 07:57:47 AM,15,0.25,0.175203,127008,127009,...,0.042049,0.022704,010812,01081236451,36451,"{""x"": 399288.4274000004, ""y"": 3990139.48399999...",010812.1,2023-05-25 08:02:54.000000,1 days 00:05:22,1
16102,16530,2023-05-23 15:09:04.000000,2023-05-23 03:09:04 PM,2023-05-23 15:09:19.000000,2023-05-23 03:09:19 PM,15,0.25,275.858711,97025,97026,...,66.206091,35.748346,015012,01501216544,16544,"{""x"": 398945.14630000014, ""y"": 3985072.8684, ""...",015012.1,2023-05-24 11:42:19.000002,0 days 20:33:15.000002,1
25033,25836,2023-05-23 18:40:22.000000,2023-05-23 06:40:22 PM,2023-05-23 18:40:40.000000,2023-05-23 06:40:40 PM,18,0.3,9.733768,137761,137011,...,1.946754,1.05116,015512,01551225861,25861,"{""x"": 399509.0744000003, ""y"": 3990058.9836, ""s...",015512.1,2023-05-24 12:19:30.000000,0 days 17:39:08,1
1661,1672,2023-05-23 10:47:21.000000,2023-05-23 10:47:21 AM,2023-05-23 10:47:36.000000,2023-05-23 10:47:36 AM,15,0.25,34.165912,281975,281976,...,8.199819,4.427538,016712,0167121672,1672,"{""x"": 412952.9735000003, ""y"": 3980319.8103, ""s...",016712.1,2023-05-24 10:43:06.000000,0 days 23:55:45,1
80439,83858,2023-05-28 17:38:28.000000,2023-05-28 05:38:28 PM,2023-05-28 17:38:39.000000,2023-05-28 05:38:39 PM,11,0.183333,187.457698,97124,97125,...,61.349792,33.126161,055522,05552285196,85196,"{""x"": 398940.4781999998, ""y"": 3985005.9911, ""s...",055522.1,2023-05-29 17:40:10.000002,1 days 00:01:42.000002,1
75584,78874,2023-05-28 15:01:17.000000,2023-05-28 03:01:17 PM,2023-05-28 15:01:32.000000,2023-05-28 03:01:32 PM,15,0.25,309.265574,97409,97129,...,74.223738,40.07752,059512,05951279254,79254,"{""x"": 398957.8125, ""y"": 3985174.9508999996, ""s...",059512.1,2023-05-29 17:40:17.000000,1 days 02:39:00,1
132517,136283,2023-06-03 16:40:42.000000,2023-06-03 04:40:42 PM,2023-06-03 16:40:57.000000,2023-06-03 04:40:57 PM,15,0.25,291.62944,97497,97142,...,69.991066,37.792065,091112,091112139328,139328,"{""x"": 398966.62600000016, ""y"": 3985243.603, ""s...",091112.1,2023-06-04 11:06:01.000000,0 days 18:25:19,1
186663,192997,2023-06-08 16:12:11.000002,2023-06-08 04:12:11 PM,2023-06-08 16:12:26.000000,2023-06-08 04:12:26 PM,14,0.233333,307.911854,97631,97160,...,79.177334,42.752241,131712,131712196082,196082,"{""x"": 398967.5290000001, ""y"": 3985121.78160000...",131712.1,2023-06-09 12:22:46.000000,0 days 20:10:34.999998,1
261136,268081,2023-06-14 16:41:37.000000,2023-06-14 04:41:37 PM,2023-06-14 16:41:52.000000,2023-06-14 04:41:52 PM,15,0.25,58.405264,317919,317920,...,14.017263,7.568699,170912,170912271221,271221,"{""x"": 425279.1261, ""y"": 3988629.2709, ""spatial...",170912.1,2023-06-15 13:04:53.000000,0 days 20:23:16,1
247477,253936,2023-06-14 12:00:20.000002,2023-06-14 12:00:20 PM,2023-06-14 12:00:35.000000,2023-06-14 12:00:35 PM,14,0.233333,317.228095,5345,5346,...,81.572939,44.045761,171712,171712257049,257049,"{""x"": 398821.19770000037, ""y"": 3987343.3882, ""...",171712.1,2023-06-15 14:32:19.000000,1 days 02:31:58.999998,1


Determine which errant points and tracks need to be deleted based on time and place. Use your best judgment, considering (i) the length of time from preceding points and (ii) visual examination of the points in ArcGIS. You can do this by looking up seemingly errant points in ArcGIS and confirming whether they show a spatial pattern that may result from error. In our case, for example, such points occurred near the cabins or the GPS device drop-off location, or far away from preceding points in time and space.

To conduct the manual examination, follow the code blocks below for each participant ID that was identified above as being susceptible to error. Record the entire tracks and individual points that caused issues below. Depending on your comfort with Python, you can follow the code below to delete these tracks, or, if you prefer, you can delete them by hand in ArcGIS.
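If you choose to delete whole tracks with Python, one option is to build a SQL `IN` clause from the participant IDs you recorded and pass it to the same select-and-delete pattern used earlier in the notebook. The sketch below only builds the clause string. The helper name `build_in_clause` is hypothetical, and it assumes the ID field is a text field, so values are quoted.

```python
def build_in_clause(field, values):
    """Build a SQL clause such as: field IN ('a', 'b') for a text field."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    # Double any single quotes so the values stay valid SQL string literals.
    quoted = ", ".join("'" + str(v).replace("'", "''") + "'" for v in values)
    return f"{field} IN ({quoted})"


# Hypothetical IDs for illustration; use the exact values stored in your data.
where_clause = build_in_clause("group_id", ["111111", "222222"])
print(where_clause)
```

The resulting string can be supplied as the `where_clause` for `arcpy.management.SelectLayerByAttribute`, followed by `arcpy.management.DeleteFeatures`, as in Section 2. Check the selection count before deleting, since the values must match the stored IDs exactly.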

In [22]:
##Identify tracks with errors. This deeper dive may sometimes help you identify errors from other steps in your process. For example, technicians may save errant tracks or mislabel them. Some of these tracks are entirely the result of error and warrant removal, whereas others were simply mislabeled. Below are mislabeled tracks for deletion, with justification based on visual examination:

##Current Track: 22 Jun is 239812 and researcher error and should be deleted
##Current Track: 05 Jun is 111212 and researcher error and should be deleted

##Below are acceptable mislabeled tracks:
##Current Track: 28 May 2 is 059712
##Current Track: May 30 2 is 079812
##Current Track: 10 Jun 2 is 152212
##Current Track: 16 Jun 2 is 192212

In [23]:
##Identify points with errors.

##Participants whose final point at the Main Entrance is erroneous
GID_last_points = [172012, 172112, 175812, 172412]

##Participants whose final two points at the Main Entrance are erroneous
GID_two_last_points = [179812]

##Participants whose final three points at the Main Entrance are erroneous
GID_three_last_points = [171712, 172312]

##Participants whose final four points at the Main Entrance are erroneous
GID_four_last_points = [175112]


##Questionable points:
##179912 has a series of points three days after the rest that follow a drive from the GPS dropbox up Desert View (they don't end at the cabins)
##059512 has a series of points from the Main Entrance that lead to the cabin about a day after all other points
##055522 has a series of points from the Main Entrance that lead to the cabin about a day after all other points
##231712 has much of its device error around the GPS dropbox

##Odd behaviors, but data seems worth retaining:
##010812 has lapses and stays in a cabin for a long time but doesn't seem like researcher error
##016712 appears to have camped outside the park and returned
##176312 appears to have camped outside the park and returned
##015012 appears to have camped outside the park and returned
##170912 appears to have camped outside the park and returned
##131712 appears to have camped outside the park and returned
##091112 appears to have camped outside the park and returned
##015512 odd lag/error with normal behavior before and after (perhaps device turned off while in campground)
##177012 appears to have camped outside the park and returned
odd_data = ["010812", "016712", "176312", "015012", "170912", "131712", "091112", "015512","177012"]

##Note on when cleaning stopped: Inspection ended at timedelta of 16:35 because . Therefore, all GIDs with time delta greater than that were inspected in this dataset.
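##The lists above record how many trailing points to drop per participant. A minimal
##sketch of applying them follows; it is one possible approach, not the procedure used
##in this study. The helper names build_trim_plan and trim_tracks are hypothetical, and
##tracks are assumed to be time-ordered lists of point IDs keyed by participant ID (text).

```python
def build_trim_plan(groups):
    """Turn {n_points_to_drop: [gid, ...]} into {gid_as_text: n}."""
    plan = {}
    for n, gids in groups.items():
        for gid in gids:
            # IDs are stored as text in the data, so convert integers to strings.
            plan[str(gid)] = n
    return plan


def trim_tracks(tracks, plan):
    """Drop the last n point IDs from each track listed in the plan."""
    trimmed = {}
    for gid, point_ids in tracks.items():
        n = plan.get(gid, 0)
        keep = max(len(point_ids) - n, 0)  # never slice with a negative bound
        trimmed[gid] = list(point_ids[:keep])
    return trimmed
```

##Converting with str() is safe for the IDs listed here, but an ID with a leading zero
##(e.g., "010812") must be written as text, because an integer would lose the zero.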

In [24]:
###Method for searching for errors within a GID. Use the code below to create a dataframe listing all points for one participant identification that was flagged as having potential error. Open ArcGIS and use the "Search by Attribute Function" to highlight all points from that participant identification. Visually examine all points to identify a pattern that may explain the apparent error. If the pattern suggests behavior worth retaining (e.g., the visitor left the destination but later returned), note it above for the record. If the pattern appears to reveal an error instead (e.g., one or two points appear at the end of the track in a distant location, or the device appears to have been dropped off at the dropbox and recorded the researcher driving through the destination), record it in the appropriate list above.

error_GID = "177012"
seq_timedf10 = df.copy()
seq_timedf_10 = seq_timedf10['GID'] == error_GID
seq_timedf_10 = pd.DataFrame(seq_timedf10.loc[seq_timedf_10])
seq_timedf_10

Unnamed: 0,FID,dtime,d_start_s,d_end,d_end_s,dt_sec,dt_min,distance_m,oid_start,oid_end,speed_mps,speed_mph,speed_kph,speed_knt,GID,ID_2,ORIG_FID,SHAPE
245818,252263,2023-06-14 11:37:38.000000,2023-06-14 11:37:38 AM,2023-06-14 11:37:53.000000,2023-06-14 11:37:53 AM,15,0.25,109.994881,318557,318558,7.332992,16.403463,26.398771,14.254163,177012,177012255376,255376,"{""x"": 425325.73220000044, ""y"": 3988545.7146000..."
245836,252281,2023-06-14 11:37:53.000000,2023-06-14 11:37:53 AM,2023-06-14 11:38:08.000000,2023-06-14 11:38:08 AM,15,0.25,145.989589,318558,319428,9.732639,21.77133,35.037501,18.918693,177012,177012255394,255394,"{""x"": 425301.44600000046, ""y"": 3988652.9574999..."
245854,252299,2023-06-14 11:38:08.000000,2023-06-14 11:38:08 AM,2023-06-14 11:38:23.000000,2023-06-14 11:38:23 AM,15,0.25,123.005572,319428,324120,8.200371,18.343739,29.521337,15.94021,177012,177012255412,255412,"{""x"": 425388.2432000004, ""y"": 3988770.28219999..."
245872,252317,2023-06-14 11:38:23.000000,2023-06-14 11:38:23 AM,2023-06-14 11:38:38.000000,2023-06-14 11:38:38 AM,15,0.25,32.6954,324120,325474,2.179693,4.875843,7.846896,4.236975,177012,177012255430,255430,"{""x"": 425467.7326999996, ""y"": 3988864.0999, ""s..."
245890,252335,2023-06-14 11:38:38.000000,2023-06-14 11:38:38 AM,2023-06-14 11:38:53.000000,2023-06-14 11:38:53 AM,15,0.25,40.222481,325474,333662,2.681499,5.998352,9.653395,5.212404,177012,177012255448,255448,"{""x"": 425497.8269999996, ""y"": 3988876.8519, ""s..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
274137,281172,2023-06-15 14:29:23.000000,2023-06-15 02:29:23 PM,2023-06-15 14:29:38.000002,2023-06-15 02:29:38 PM,15,0.25,331.555486,6338,6339,22.103699,49.444649,79.573317,42.966054,177012,177012284325,284325,"{""x"": 398812.6365, ""y"": 3987783.3194999993, ""s..."
274139,281174,2023-06-15 14:29:38.000002,2023-06-15 02:29:38 PM,2023-06-15 14:29:53.000000,2023-06-15 02:29:53 PM,14,0.233333,348.520783,6339,5381,24.894342,55.687149,89.61963,48.390617,177012,177012284327,284327,"{""x"": 398817.8108999999, ""y"": 3987451.89519999..."
274141,281176,2023-06-15 14:29:53.000000,2023-06-15 02:29:53 PM,2023-06-15 14:30:08.000000,2023-06-15 02:30:08 PM,15,0.25,330.365918,5381,5382,22.024395,49.267249,79.28782,42.811899,177012,177012284329,284329,"{""x"": 398831.4801000003, ""y"": 3987103.73809999..."
274148,281183,2023-06-15 14:32:23.000000,2023-06-15 02:32:23 PM,2023-06-15 14:32:38.000002,2023-06-15 02:32:38 PM,15,0.25,300.496389,97807,97808,20.033093,44.812826,72.119133,38.941127,177012,177012284337,284337,"{""x"": 399007.3848000001, ""y"": 3986329.22419999..."
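
When reviewing a flagged participant, it can help to see where in the track the timestamps jump before opening ArcGIS. The following is a minimal sketch on invented toy data, not the workflow used for this study: the column names mirror the output above, and the one-hour `gap_threshold` is a hypothetical value chosen only for illustration.

```python
import pandas as pd

# Toy stand-in for the points dataframe; values are invented for illustration
toy = pd.DataFrame({
    "GID": ["177012"] * 4 + ["000001"] * 2,
    "dtime": pd.to_datetime([
        "2023-06-14 11:37:38", "2023-06-14 11:37:53", "2023-06-14 11:38:08",
        "2023-06-15 14:29:23",  # point recorded after a long gap
        "2023-06-14 09:00:00", "2023-06-14 09:00:15",
    ]),
})

error_GID = "177012"
track = toy.loc[toy["GID"] == error_GID].sort_values("dtime").copy()

# Time elapsed since the previous point of the same participant
track["gap"] = track["dtime"].diff()

# Hypothetical threshold: list points that follow a gap longer than one hour
gap_threshold = pd.Timedelta(hours=1)
suspect = track.loc[track["gap"] > gap_threshold]
print(suspect)
```

The rows in `suspect` mark where the track resumes after a long pause, which is a reasonable place to start the visual inspection.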


In [25]:
##Create a dataframe of all points that come after a large time delta, excluding the GIDs listed in odd_data (i.e., tracks where the gap reflects odd user behavior rather than error). What remains is only the data attributed to errors.

df11 = seq_timedf2.copy()
seq_time_filt3 = df11["POI_timedelta_filt"] == 1
df11 = df11.loc[seq_time_filt3]
df11 = df11.groupby('GID',group_keys=False).apply(lambda x:x[1:])
df11 = df11[~df11["GID"].isin(odd_data)]
df11["GID"].value_counts()

179912                     2529
055522                       33
059512                       21
231712                       14
175112                        4
171712                        3
172312                        3
Current Track: 22 JUN 2       3
179812                        2
172012                        1
172112                        1
172412                        1
175812                        1
Name: GID, dtype: Int64
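
The filtering logic in this cell can be sketched on a small invented dataframe. Here `flag` stands in for `POI_timedelta_filt`, and the contents of `odd_data` are hypothetical. The sketch uses `cumcount()` as an assumed alternative to `.apply(lambda x: x[1:])`; both drop the first flagged row of each participant.

```python
import pandas as pd

# Invented rows: 'flag' plays the role of POI_timedelta_filt
toy = pd.DataFrame({
    "GID": ["A", "A", "A", "B", "B", "C"],
    "FID": [1, 2, 3, 4, 5, 6],
    "flag": [1, 1, 1, 1, 1, 1],
})
odd_data = ["B"]  # hypothetical: participant B was judged to show real behavior

flagged = toy.loc[toy["flag"] == 1]

# cumcount() numbers rows within each GID starting at 0, so "> 0" drops the
# first flagged row of every participant
after_first = flagged.loc[flagged.groupby("GID").cumcount() > 0]

# Exclude participants whose flagged points reflect behavior, not error
to_delete = after_first.loc[~after_first["GID"].isin(odd_data)]
print(to_delete["GID"].value_counts())
```

Participant C disappears from the result because a group with a single flagged row has nothing left once its first row is dropped.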

In [26]:
##Create a dataframe of points that should be deleted.

Delete_Points = pd.DataFrame(df11[['FID','SHAPE']])
Delete_Points

Unnamed: 0,FID,SHAPE
99616,103050,"{""x"": 398931.87959999964, ""y"": 3984743.9547000..."
99619,103053,"{""x"": 398943.8317999998, ""y"": 3984988.8246, ""s..."
99622,103056,"{""x"": 398972.0845999997, ""y"": 3985250.4955, ""s..."
99625,103059,"{""x"": 399010.15780000016, ""y"": 3985485.2457999..."
99628,103062,"{""x"": 399045.32239999995, ""y"": 3985812.1413000..."
...,...,...
350224,360549,"{""x"": 399057.4068, ""y"": 3986551.712200001, ""sp..."
350226,360552,"{""x"": 399059.1699000001, ""y"": 3986549.33940000..."
351459,362255,"{""x"": 399043.1338999998, ""y"": 3986555.37260000..."
351460,362256,"{""x"": 399066.91530000046, ""y"": 3986546.2654999..."


In [27]:
##Create a dataframe of GIDs that need to be deleted (i.e., full participant tracks)

Bad_GID = df.copy()
Bad_GIDs = ["Current Track: 22 JUN 2","Current Track: 05 JUN 2"]
Bad_GID2 = pd.DataFrame(Bad_GID[Bad_GID["GID"].isin(Bad_GIDs)])
Bad_GID2

Unnamed: 0,FID,dtime,d_start_s,d_end,d_end_s,dt_sec,dt_min,distance_m,oid_start,oid_end,speed_mps,speed_mph,speed_kph,speed_knt,GID,ID_2,ORIG_FID,SHAPE
178006,184126,2023-06-06 21:15:10.000002,2023-06-06 09:15:10 PM,2023-06-06 21:15:25,2023-06-06 09:15:25 PM,14,0.233333,184.983593,317869,317358,13.213114,29.556943,47.56721,25.684179,Current Track: 05 JUN 2,Current Track: 05 JUN 2187196,187196,"{""x"": 425304.7335000001, ""y"": 3988429.78250000..."
178007,184127,2023-06-06 21:15:25.000000,2023-06-06 09:15:25 PM,2023-06-06 21:15:40,2023-06-06 09:15:40 PM,15,0.25,267.285187,317358,317359,17.819012,39.860062,64.148445,34.637309,Current Track: 05 JUN 2,Current Track: 05 JUN 2187197,187197,"{""x"": 425268.85309999995, ""y"": 3988611.1905000..."
178008,184128,2023-06-06 21:15:40.000000,2023-06-06 09:15:40 PM,2023-06-06 21:15:55,2023-06-06 09:15:55 PM,15,0.25,277.03749,317359,316874,18.469166,41.314416,66.488998,35.901104,Current Track: 05 JUN 2,Current Track: 05 JUN 2187198,187198,"{""x"": 425005.6963999998, ""y"": 3988564.90179999..."
178009,184129,2023-06-06 21:15:55.000000,2023-06-06 09:15:55 PM,2023-06-06 21:16:10,2023-06-06 09:16:10 PM,15,0.25,169.947093,316874,316519,11.329806,25.344097,40.787302,22.023331,Current Track: 05 JUN 2,Current Track: 05 JUN 2187199,187199,"{""x"": 424842.26159999985, ""y"": 3988341.3215999..."
178010,184130,2023-06-06 21:16:10.000000,2023-06-06 09:16:10 PM,2023-06-06 21:16:25,2023-06-06 09:16:25 PM,15,0.25,192.130733,316519,312443,12.808716,28.652328,46.111376,24.898094,Current Track: 05 JUN 2,Current Track: 05 JUN 2187200,187200,"{""x"": 424672.69489999954, ""y"": 3988351.8133000..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
343197,351582,2023-06-22 17:01:54.000000,2023-06-22 05:01:54 PM,2023-06-22 17:02:09,2023-06-22 05:02:09 PM,15,0.25,304.700704,6438,5558,20.31338,45.439813,73.128169,39.485961,Current Track: 22 JUN 2,Current Track: 22 JUN 2354784,354784,"{""x"": 398813.0422, ""y"": 3987621.3333, ""spatial..."
343202,351587,2023-06-22 17:02:09.000000,2023-06-22 05:02:09 PM,2023-06-22 17:02:24,2023-06-22 05:02:24 PM,15,0.25,295.932619,5558,5559,19.728841,44.132234,71.023829,38.349711,Current Track: 22 JUN 2,Current Track: 22 JUN 2354789,354789,"{""x"": 398822.0549999997, ""y"": 3987316.84940000..."
351459,362255,2023-06-24 11:16:53.000000,2023-06-24 11:16:53 AM,2023-06-24 11:17:08,2023-06-24 11:17:08 AM,15,0.25,25.472541,97015,97978,1.698169,3.798703,6.11341,3.30097,Current Track: 22 JUN 2,Current Track: 22 JUN 2365466,365466,"{""x"": 399043.1338999998, ""y"": 3986555.37260000..."
351460,362256,2023-06-24 11:17:08.000000,2023-06-24 11:17:08 AM,2023-06-24 11:17:23,2023-06-24 11:17:23 AM,15,0.25,25.389538,97978,97979,1.692636,3.786325,6.093489,3.290213,Current Track: 22 JUN 2,Current Track: 22 JUN 2365467,365467,"{""x"": 399066.91530000046, ""y"": 3986546.2654999..."


In [28]:
##Simplify the dataframe

Bad_GID3 = pd.DataFrame(Bad_GID2[['FID','SHAPE']])
Bad_GID3

Unnamed: 0,FID,SHAPE
178006,184126,"{""x"": 425304.7335000001, ""y"": 3988429.78250000..."
178007,184127,"{""x"": 425268.85309999995, ""y"": 3988611.1905000..."
178008,184128,"{""x"": 425005.6963999998, ""y"": 3988564.90179999..."
178009,184129,"{""x"": 424842.26159999985, ""y"": 3988341.3215999..."
178010,184130,"{""x"": 424672.69489999954, ""y"": 3988351.8133000..."
...,...,...
343197,351582,"{""x"": 398813.0422, ""y"": 3987621.3333, ""spatial..."
343202,351587,"{""x"": 398822.0549999997, ""y"": 3987316.84940000..."
351459,362255,"{""x"": 399043.1338999998, ""y"": 3986555.37260000..."
351460,362256,"{""x"": 399066.91530000046, ""y"": 3986546.2654999..."


In [29]:
##Concatenate the dataframes of points and tracks that must be deleted from the points layer

Delete_Points2 = pd.concat([Delete_Points, Bad_GID3])
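
Some FIDs (e.g., 362255 and 362256 in the outputs above) appear in both dataframes, so the concatenated result contains repeated rows. Duplicates should not change which points a spatial selection picks up, but removing them keeps the exported layer smaller. A minimal sketch on toy data, assuming `FID` uniquely identifies a point (the SHAPE column is omitted here for brevity):

```python
import pandas as pd

# Toy stand-ins for Delete_Points and Bad_GID3, with two shared FIDs
delete_points = pd.DataFrame({"FID": [103050, 362255, 362256]})
bad_gid = pd.DataFrame({"FID": [184126, 362255, 362256]})

combined = pd.concat([delete_points, bad_gid])

# Keep one row per FID and rebuild a clean 0..n-1 index
deduped = combined.drop_duplicates(subset="FID").reset_index(drop=True)
print(len(combined), len(deduped))
```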

In [30]:
##Export the dataframe to the ArcGIS geodatabase as a feature class

Delete_Points2.spatial.to_featureclass(location=os.path.join(target_gdb, "delete_points"))

'C:\\Users\\colby\\OneDrive - The Pennsylvania State University\\Documents\\ArcGIS\\Projects\\TTRA_Presentation\\TTRA_Presentation.gdb\\delete_points'

In [31]:
##Select the points in the original points layer that intersect the points flagged for deletion

in_layer = "processed_points"
overlap_type = "INTERSECT"
select_features = "delete_points"

Error_Selection = arcpy.management.SelectLayerByLocation(in_layer, overlap_type, select_features, selection_type="NEW_SELECTION")

##Delete the points

arcpy.management.DeleteFeatures(Error_Selection)

In [32]:
##With the erroneous points deleted, load the points layer back into a dataframe and do some final housekeeping by examining descriptive attributes

processed_points_3 = pd.DataFrame.spatial.from_featureclass(os.path.join(target_gdb, "processed_points"))
processed_points_3.head()

Unnamed: 0,OBJECTID,d_start,d_start_s,d_end,d_end_s,dt_sec,dt_min,distance_m,oid_start,oid_end,speed_mps,speed_mph,speed_kph,speed_knt,group_id,ID_2,ORIG_FID,SHAPE
0,1,2023-05-23 09:20:06.000002,2023-05-23 09:20:06 AM,2023-05-23 09:20:21,2023-05-23 09:20:21 AM,14,0.233333,148.485239,318087,317083,10.606089,23.725184,38.181919,20.616539,16712,167121,1,"{""x"": 425325.01520000026, ""y"": 3988489.8688999..."
1,2,2023-05-23 09:20:21.000000,2023-05-23 09:20:21 AM,2023-05-23 09:20:36,2023-05-23 09:20:36 AM,15,0.25,167.024317,317083,317084,11.134954,24.908225,40.085836,21.64457,16712,167122,2,"{""x"": 425258.99170000013, ""y"": 3988622.8131000..."
2,3,2023-05-23 09:20:36.000000,2023-05-23 09:20:36 AM,2023-05-23 09:20:51,2023-05-23 09:20:51 AM,15,0.25,206.065967,317084,317085,13.737731,30.73048,49.455832,26.703951,16712,167123,3,"{""x"": 425092.5296, ""y"": 3988609.8121000007, ""s..."
3,4,2023-05-23 09:20:51.000000,2023-05-23 09:20:51 AM,2023-05-23 09:21:06,2023-05-23 09:21:06 AM,15,0.25,226.657126,317085,316774,15.110475,33.801226,54.39771,29.372346,16712,167124,4,"{""x"": 424924.0088999998, ""y"": 3988491.33899999..."
4,5,2023-05-23 09:21:06.000000,2023-05-23 09:21:06 AM,2023-05-23 09:21:21,2023-05-23 09:21:21 AM,15,0.25,172.77394,316774,316275,11.518263,25.765663,41.465746,22.38966,16712,167125,5,"{""x"": 424796.6535, ""y"": 3988303.9354, ""spatial..."


In [33]:
##Update group_id values as needed so that they follow the proper labeling scheme

processed_points_3.loc[processed_points_3["group_id"] == "Current Track: 28 MAY 2", "group_id"] = "059712"
processed_points_3.loc[processed_points_3["group_id"] == "Current Track: 10 JUN 2", "group_id"] = "152212"
processed_points_3.loc[processed_points_3["group_id"] == "Current Track: 30 MAY 2", "group_id"] = "079812"
processed_points_3.loc[processed_points_3["group_id"] == "Current Track: 16 JUN 2", "group_id"] = "192212"
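
When many labels need correcting, the same relabeling can be expressed as a single lookup table. The sketch below is an assumed alternative to the four `.loc` assignments above, shown on a toy dataframe; the mapping pairs are the ones used in this cell.

```python
import pandas as pd

# Device default track names mapped to the study's participant labels
relabel = {
    "Current Track: 28 MAY 2": "059712",
    "Current Track: 10 JUN 2": "152212",
    "Current Track: 30 MAY 2": "079812",
    "Current Track: 16 JUN 2": "192212",
}

# Toy stand-in for processed_points_3
toy = pd.DataFrame({
    "group_id": ["Current Track: 28 MAY 2", "16712", "Current Track: 16 JUN 2"],
})

# replace() swaps exact matches and leaves every other value untouched
toy["group_id"] = toy["group_id"].replace(relabel)
print(toy)
```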

In [34]:
##Check trip length for consistency (both minimum and maximum).

trip_length = processed_points_3.groupby('group_id')['d_start'].agg(['min', 'max'])

# Compute the elapsed time (as a Timedelta) between the max and min 'd_start'
trip_length['trip_length'] = (trip_length['max'] - trip_length['min'])

trip_length.sort_values(by='trip_length')

Unnamed: 0_level_0,min,max,trip_length
group_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
155012,2023-06-10 10:50:45.000000,2023-06-10 10:51:30,0 days 00:00:45
095612,2023-06-03 16:11:31.000000,2023-06-03 16:19:01,0 days 00:07:30
191912,2023-06-16 16:36:35.000000,2023-06-16 17:04:10,0 days 00:27:35
029812,2023-05-26 11:05:32.000000,2023-05-26 11:38:47,0 days 00:33:15
175012,2023-06-14 11:07:58.000000,2023-06-14 11:54:13,0 days 00:46:15
...,...,...,...
027212,2023-05-26 09:35:16.000000,2023-05-28 02:54:16,1 days 17:19:00
215112,2023-06-20 12:54:41.000000,2023-06-22 06:38:18,1 days 17:43:37
052412,2023-05-28 16:59:01.000002,2023-05-30 11:19:51,1 days 18:20:49.999998
136312,2023-06-08 17:23:34.000000,2023-06-10 11:45:07,1 days 18:21:33
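
To shortlist trips for inspection in ArcGIS Pro, the trip lengths can be compared against lower and upper bounds. This is a minimal sketch: the timestamps are taken from three of the rows above, while `min_len` and `max_len` are hypothetical thresholds that each study would need to set for itself.

```python
import pandas as pd

# First and last timestamps for three participants from the table above
toy = pd.DataFrame({
    "group_id": ["155012", "155012", "175012", "175012", "027212", "027212"],
    "d_start": pd.to_datetime([
        "2023-06-10 10:50:45", "2023-06-10 10:51:30",
        "2023-06-14 11:07:58", "2023-06-14 11:54:13",
        "2023-05-26 09:35:16", "2023-05-28 02:54:16",
    ]),
})

trip = toy.groupby("group_id")["d_start"].agg(["min", "max"])
trip["trip_length"] = trip["max"] - trip["min"]

# Hypothetical bounds for a plausible visit; adjust to the study site
min_len = pd.Timedelta(minutes=10)
max_len = pd.Timedelta(days=3)

# Trips outside the bounds are candidates for visual review, not automatic deletion
review = trip.loc[(trip["trip_length"] < min_len) | (trip["trip_length"] > max_len)]
print(review.index.tolist())
```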


In [35]:
# Delete illogically long or short trips after examining them in ArcGIS Pro (here, the two shortest trips, lasting 45 seconds and 7.5 minutes)
processed_points_3 = processed_points_3[~processed_points_3['group_id'].isin(['155012', '095612'])]

In [36]:
##Check sample size and number of total observations

uniqGID = processed_points_3['group_id'].nunique()
print("unique GIDs:" + str(uniqGID))
waypoints = processed_points_3.shape[0]
print("# of waypoints:" + str(waypoints))

unique GIDs:222
# of waypoints:348229


In [37]:
##Export the cleaned dataframe to the geodatabase as a new feature class

processed_points_3.spatial.to_featureclass(location=os.path.join(target_gdb, "cleaned_points"))

'C:\\Users\\colby\\OneDrive - The Pennsylvania State University\\Documents\\ArcGIS\\Projects\\TTRA_Presentation\\TTRA_Presentation.gdb\\cleaned_points'

In [38]:
##Manually delete obviously erroneous points (Delete key in ArcGIS):
# 61 points around the GPS device dropbox location, outside the buffer range, were manually deleted

# COMMENT: Below are possible errors missed after performing the above steps (I identified fewer than 100 such points, which is great for a dataset of 350,000+):
#   1. 11 points appear in a cluster in the canyon, far off trail and with no apparent lead-up, on Bright Angel Trailhead near Powell Point. These may be deleted, but that decision may be better left to the individual researcher based on their needs/analysis.
#   2. 6 points next to Navajo overlook seem excessively far into the canyon, even accounting for error by the same device. These may be deleted, but that decision may be better left to the individual researcher based on their needs/analysis.