# Master Segment Shapefile

The cells below join together the segment shapefiles of all the planning areas across Utah into one master segment shapefile. Please run the cells in order and follow the instructions as you go to ensure correct results.

In [14]:
# import libraries
import pandas as pd
import numpy as np
import arcpy
from arcgis.gis import *
import datetime
gis = GIS()

arcpy.env.overwriteOutput = True
# TODO: enable parallel processing in a future version, e.g. arcpy.env.parallelProcessingFactor = "50%"

## Input Data

Before getting started, we need to gather each region's shapefile (USTM, Wasatch Front, Cache, Dixie, Summit Wasatch, and Iron) as well as a forecast area lookup file. Please update the paths below to match their names and file locations.

In [15]:
# Set the workspace environment to the folder where the output shapefile will be stored
arcpy.env.workspace = 'C:/Users/mshah/Documents/GitHub/UDOT-Master-Segments/outputs/'

# List of input shapefile paths
usPath = '../data/0_USTM_v3.0 - 2023-08-17_DRAFT/Segments_UD_20230729/Segments_UD_20220729b.shp'
wfPath = '../data/1_WF/Segments_WF_20231101/Segments_WF_20231101.shp'
caPath = '../data/2_Cache/Segments_CA_20231023/Segments_CA_20231023a.shp'
dxPath = '../data/3_Dixie/Segments_DX_20220915b/Segments_DX_20220915b.shp'
swPath = '../data/4_SuWsv2_2023-09-13_DRAFT/Segments_SW_20230913/Segments_SW_20230913.shp'
irPath = '../data/5_IronCo - v1.0 - 2023-09-13_DRAFT/Segments_IR_20230912b/Segments_IR_20230912b.shp'

# forecast area lookup file
forecastPath = r'../data/cofips_subareaid_forecastarea.csv'
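
A mistyped path otherwise only surfaces later as an arcpy error, so it can help to confirm up front that every input exists. The helper below is a minimal sketch using only the standard library; `find_missing_paths` is a hypothetical name, not something this notebook defines.

```python
from pathlib import Path

def find_missing_paths(paths):
    """Return the subset of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

# Example: the current directory exists, the placeholder path presumably does not.
print(find_missing_paths([".", "no_such_folder_xyz/no_such_file.shp"]))
```

In this notebook it could be called with the six segment paths plus `forecastPath`; an empty list means all inputs were found.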

Now that you have the location of each segment shapefile, visually inspect each one to confirm that it meets the following criteria:

 1. Each shapefile only includes those segments whose centroid falls within their corresponding subarea.
 2. Each shapefile includes a SUBAREAID field that is correct.
 
You may want to manually add the counties and subarea polygon layers to help with visual inspection.
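
The second criterion can also be checked programmatically as a complement to the visual inspection. The sketch below works on plain dictionaries so it runs without arcpy; in practice the rows could be read with `arcpy.da.SearchCursor`. The function name and the sample SEGIDs are hypothetical, and the expected value of 3 is the Dixie SUBAREAID used later in this notebook.

```python
def rows_with_wrong_subarea(rows, expected_id):
    """Return the SEGIDs whose SUBAREAID is missing or differs from expected_id."""
    return [r["SEGID"] for r in rows if r.get("SUBAREAID") != expected_id]

# Hypothetical attribute rows for one region's shapefile.
sample_rows = [
    {"SEGID": "A_001", "SUBAREAID": 3},
    {"SEGID": "A_002", "SUBAREAID": 0},  # wrong value
    {"SEGID": "A_003"},                  # field missing entirely
]
print(rows_with_wrong_subarea(sample_rows, expected_id=3))
```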

If any of the shapefiles do not meet the criteria, you must fix that shapefile before moving forward. Below is an example of how to add SUBAREAID to the Cache and Iron shapefiles.

In [16]:
# The CacheCo Segment file does not include a SUBAREAID column, so we add one:
caPathNew = caPath.replace('a.shp', 'b.shp')
arcpy.management.CopyFeatures(caPath, caPathNew)
arcpy.management.AddField(caPathNew, "SUBAREAID", "SHORT")
arcpy.management.CalculateField(caPathNew, "SUBAREAID", 2)

# The IronCo Segment file does not include a SUBAREAID column, so we add one: 
irPathNew = irPath.replace('b.shp', 'c.shp')
arcpy.management.CopyFeatures(irPath, irPathNew)
arcpy.management.AddField(irPathNew, "SUBAREAID", "SHORT")
arcpy.management.CalculateField(irPathNew, "SUBAREAID", 5)

Once the shapefiles have been visually inspected and fixed to meet the criteria, create a list of all the input shapefiles to be merged and processed.

In [17]:
# List of input shapefile paths
shpPaths = [
    usPath,
    wfPath,
    caPathNew, #notice the updated path
    dxPath,
    swPath,
    irPathNew #notice the updated path
]

## Merge Segments

Now that all segment shapefiles contain the correct segments and correct SUBAREAID values, we can merge them together.

In [18]:
# Output merged shapefile path
inputShp = '../intermediate/Merged_Segments.shp'

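# Build the field mappings from every input so that no field is dropped.
# (mergeRule is a property of individual FieldMap objects, not of FieldMappings,
# so it is not set here.)
fieldMappings = arcpy.FieldMappings()
for shpPath in shpPaths:
    fieldMappings.addTable(shpPath)
arcpy.management.Merge(shpPaths, inputShp, fieldMappings)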
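
By default the merged output carries the fields of all inputs, so a field that exists in only some shapefiles ends up empty or zero-filled for the others. The sketch below shows one way to spot such gaps before merging. It takes plain lists of field names (which could come from `arcpy.ListFields`); `schema_gaps` and the sample field lists are hypothetical.

```python
def schema_gaps(fields_by_file):
    """Map each file to the fields it lacks relative to the union of all files."""
    all_fields = set().union(*fields_by_file.values())
    gaps = {}
    for name, fields in fields_by_file.items():
        missing = sorted(all_fields - set(fields))
        if missing:
            gaps[name] = missing
    return gaps

# Hypothetical field lists for two inputs.
sample_fields = {
    "WF": ["SEGID", "SUBAREAID", "AADT2019"],
    "CA": ["SEGID", "AADT2019"],
}
print(schema_gaps(sample_fields))
```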

We now check the CRS and look for duplicate SEGIDs within the merged file. If the CRS is not correct, or if duplicate SEGIDs exist, we must fix those before performing the other steps. These fixes should be made to the individual subarea segment shapefiles rather than to the merged file.

(Technically, these checks should be performed on each segment shapefile before this script is run, but we double-check here anyway.)

### Check CRS

In [19]:
try:
    # Use arcpy.Describe to get information about the shapefile
    desc = arcpy.Describe(inputShp)
    
    # Check if the shapefile has a spatial reference (projection)
    if desc.spatialReference is not None:
        # Get the name of the coordinate system
        coordinate_system_name = desc.spatialReference.name
        
        # Print the coordinate system information
        print(f"Coordinate System: {coordinate_system_name}")
    else:
        print("The shapefile does not have a defined coordinate system.")
except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))
except Exception as e:
    print(str(e))

Coordinate System: NAD_1983_UTM_Zone_12N
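
Since the CRS should really be verified for each input, a per-file comparison can be useful. This is a small sketch under the assumption that the coordinate system names have already been collected (for example from `arcpy.Describe(path).spatialReference.name`); `crs_mismatches` and the sample values are hypothetical.

```python
def crs_mismatches(crs_by_file, expected="NAD_1983_UTM_Zone_12N"):
    """Return the files whose coordinate system name differs from `expected`."""
    return {name: crs for name, crs in crs_by_file.items() if crs != expected}

# Hypothetical names, as they might be reported for two inputs.
sample_crs = {
    "WF": "NAD_1983_UTM_Zone_12N",
    "CA": "GCS_WGS_1984",
}
print(crs_mismatches(sample_crs))
```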


### Check for Duplicates

In [20]:
# Create a set to store unique SEGID values
uniqueSegIDs = set()

# Create a list to store duplicate SEGID values
duplicateSegIDs = []

# Use a SearchCursor to iterate through the SEGID field
with arcpy.da.SearchCursor(inputShp, ['SEGID']) as cursor:
    for row in cursor:
        segid = row[0]
        if segid in uniqueSegIDs:
            duplicateSegIDs.append(segid)
        else:
            uniqueSegIDs.add(segid)

# Check if there are any duplicate SEGIDs
if len(duplicateSegIDs) > 0:
    print("Duplicate SEGIDs found in Merged_Segments.shp:")
    for segid in duplicateSegIDs:
        print(segid)
else:
    print("No duplicate SEGIDs found in Merged_Segments.shp.")

No duplicate SEGIDs found in Merged_Segments.shp.
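
When duplicates do turn up, it helps to know which input each one came from, because the fix belongs in the individual shapefile. The sketch below groups sources by SEGID. It assumes `(segid, source)` pairs have been collected by reading each input separately; the function name and sample values are hypothetical.

```python
from collections import defaultdict

def duplicate_segids(pairs):
    """Given (segid, source) pairs, return {segid: [sources]} for repeated SEGIDs."""
    sources = defaultdict(list)
    for segid, source in pairs:
        sources[segid].append(source)
    return {segid: src for segid, src in sources.items() if len(src) > 1}

# Hypothetical SEGIDs read from two inputs.
sample_pairs = [("0015_001.0", "US"), ("0015_001.0", "WF"), ("0089_010.5", "WF")]
print(duplicate_segids(sample_pairs))
```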


## Prepare Final Output

Within the following cells, we clean up the fields of the merged segment file. This includes selecting the necessary fields as well as recalculating certain fields to ensure they are correct.

### Select Fields

In [21]:
# Fields to keep in the merged shapefile
# (not kept at this stage: 'CO_FIPS', 'FAC_WDAVG', 'FAC_SPR', 'FAC_SUM', 'FAC_FAL', 'FAC_WIN')
fields_to_keep = ['FID', 'Shape', 'SEGID', 'PLANAREA', 'SUBAREAID', 'BMP', 'EMP', 'AADT2019', 'DISTANCE']
outputShp = r'../intermediate/Merged_Segments.shp'

try:
    # Delete unwanted fields from the merged shapefile (edited in place)
    fields_to_delete = [field.name for field in arcpy.ListFields(outputShp) if field.name not in fields_to_keep]
    if fields_to_delete:
        arcpy.management.DeleteField(outputShp, fields_to_delete)

    print("Columns deleted successfully.")
except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))
except Exception as e:
    print(str(e))

Columns deleted successfully.


### Recalculate Distance

Although the DISTANCE field already exists in the dataset, we recalculate it to ensure every SEGID in the state has a correct value. The following code overwrites DISTANCE with each segment's geometry length, converted from meters to miles.

In [23]:
# Specify the input shapefile path
outputShp = r'../intermediate/Merged_Segments.shp'

try:
    # Write each segment's length in meters to DISTANCE.
    # The third argument is the length unit (area units do not apply to LENGTH).
    arcpy.management.CalculateGeometryAttributes(outputShp, "DISTANCE LENGTH", "METERS")
    # Convert meters to miles
    arcpy.management.CalculateField(outputShp, "DISTANCE", "!DISTANCE! * 0.000621371192", "PYTHON3")
    
    # print statements for checking
    distanceField = "DISTANCE"
    print(f"{distanceField} field calculated successfully.")
    df = pd.DataFrame.spatial.from_featureclass(outputShp)
    print(df[['DISTANCE']].head(5))
    
except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))
except Exception as e:
    print(str(e))

DISTANCE field calculated successfully.
    DISTANCE
0   0.666642
1   15.36987
2  30.002021
3  14.194335
4  17.323272
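
The constant 0.000621371192 used above is the number of miles in one meter, that is, 1 / 1609.344. The short sketch below makes that relationship explicit; `meters_to_miles` is a hypothetical helper, not part of the notebook.

```python
METERS_PER_MILE = 1609.344  # international mile, exact by definition

def meters_to_miles(length_m):
    """Convert a length in meters to miles."""
    return length_m / METERS_PER_MILE

# One mile expressed in meters converts back to one mile.
print(meters_to_miles(1609.344))

# The notebook's multiplier agrees with 1 / 1609.344 to the digits given.
print(abs(1 / METERS_PER_MILE - 0.000621371192) < 1e-12)
```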


### Recalculate CO_FIPS

As with the DISTANCE field, CO_FIPS already exists, but we recalculate it to ensure it is correct.

In [24]:
# County boundaries (from the UGRC website) and intermediate shapefiles
countyShp = r'../data/Utah_County_Boundaries/Counties.shp'
copyShp = r'../intermediate/Copy_Segments.shp'
outputShp = r'../intermediate/Merged_Segments.shp'
int_centroids = r'../intermediate/Centroids.shp'
int_cofip_centroids = r'../intermediate/Centroids_with_CO_FIPS.shp'

# Fields to keep after the spatial joins
co_fields_to_keep = ['FID', 'Shape', 'SEGID', 'PLANAREA', 'DISTANCE', 'SUBAREAID', 'BMP', 'EMP', 'AADT2019', 'FIPS']

# Copy the merged shapefile so that the final join can overwrite the original
arcpy.management.CopyFeatures(outputShp, copyShp)

try: 
    # Perform FeatureToPoint to create centroids
    arcpy.management.FeatureToPoint(copyShp, int_centroids, "INSIDE")
    
    # Perform spatial join between the centroids and the county shapefile
    arcpy.analysis.SpatialJoin(target_features=int_centroids, join_features=countyShp, out_feature_class=int_cofip_centroids, join_type="KEEP_COMMON", match_option="WITHIN")

    # Perform spatial join between the segments and the county-tagged centroids
    arcpy.analysis.SpatialJoin(target_features=copyShp, join_features=int_cofip_centroids, out_feature_class=outputShp, join_type="KEEP_COMMON", match_option="INTERSECT")

    # Delete unwanted fields from the joined output
    co_fields_to_delete = [field.name for field in arcpy.ListFields(outputShp) if field.name not in co_fields_to_keep]
    if co_fields_to_delete:
        arcpy.management.DeleteField(outputShp, co_fields_to_delete)

    # Delete the intermediate files (copyShp is kept because the manual overrides cell still needs it)
    arcpy.management.Delete(int_centroids)
    arcpy.management.Delete(int_cofip_centroids)
    
except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))
except Exception as e:
    print(str(e))
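
Conceptually, the first spatial join assigns each segment the FIPS code of the county that contains its centroid, and `KEEP_COMMON` drops any centroid that falls inside no county. The sketch below illustrates that idea with a standard ray-casting point-in-polygon test in plain Python. It is not how arcpy implements the join, and the square "counties" and function names are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def assign_fips(centroid, counties):
    """Return the FIPS of the first county containing the centroid, else None."""
    for fips, polygon in counties.items():
        if point_in_polygon(centroid[0], centroid[1], polygon):
            return fips
    return None

# Two hypothetical square counties side by side.
sample_counties = {
    49: [(0, 0), (10, 0), (10, 10), (0, 10)],
    53: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
for centroid in [(5, 5), (15, 5), (25, 5)]:
    print(centroid, assign_fips(centroid, sample_counties))
```

A result of `None` corresponds to a segment that would be dropped by `KEEP_COMMON` and therefore needs manual handling.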

### Manual Overrides

A few segments need manual overrides. (If any are missing, please add them and adjust the code to handle them.) The cell below applies only the Dixie_5134 override.
 - Dixie_5134 should have SUBAREAID 3 and CO_FIPS 53
 - 1822_000.0 should have SUBAREAID 1 and CO_FIPS 49

In [25]:
# Define the input shapefiles
copyShp = r'../intermediate/Copy_Segments.shp'  # Source shapefile with the specific segment
outputShp = r'../intermediate/Merged_Segments.shp'
outputShpDixie = r'../intermediate/Dixie_Segment.shp'

# Create a feature layer and apply a SQL query to select the desired row
arcpy.management.MakeFeatureLayer(copyShp, "Temp_Layer")
arcpy.management.SelectLayerByAttribute("Temp_Layer", "NEW_SELECTION", "SEGID = 'Dixie_5134'")

# Copy the selected features to a new shapefile
arcpy.management.CopyFeatures("Temp_Layer", outputShpDixie)
arcpy.management.Delete("Temp_Layer")

# Manually Calculate the fields for Dixie_5134
arcpy.management.AddField(outputShpDixie, "FIPS", "SHORT")
arcpy.management.CalculateField(outputShpDixie, "SUBAREAID", 3)
arcpy.management.CalculateField(outputShpDixie, "FIPS", 53)

# Use the Append tool to merge the Dixie segment onto the shapefile with all other segments
arcpy.management.Append(outputShpDixie, outputShp, "NO_TEST")

# Delete Dixie Segment & Copy Segment
arcpy.management.Delete(outputShpDixie)
arcpy.management.Delete(copyShp)
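
As the list of overrides grows, keeping them in one table makes them easier to review and extend. The sketch below applies such a table to plain dictionaries. With arcpy the same table could drive an `arcpy.da.UpdateCursor`, and a segment that was dropped from the merged file would still need to be appended first. `OVERRIDES` and `apply_overrides` are hypothetical names; the values are the two overrides listed in this section.

```python
OVERRIDES = {
    "Dixie_5134": {"SUBAREAID": 3, "FIPS": 53},
    "1822_000.0": {"SUBAREAID": 1, "FIPS": 49},
}

def apply_overrides(rows, overrides=OVERRIDES):
    """Return copies of the rows with override values applied by SEGID."""
    return [{**row, **overrides.get(row["SEGID"], {})} for row in rows]

# Hypothetical rows before the overrides are applied.
sample_rows = [
    {"SEGID": "1822_000.0", "SUBAREAID": 0, "FIPS": 0},
    {"SEGID": "0015_001.0", "SUBAREAID": 1, "FIPS": 35},
]
for row in apply_overrides(sample_rows):
    print(row)
```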

### Calculate Forecast Area

Here we calculate the forecast area from a CSV lookup table.

In [26]:
# Define the input feature layer (shapefile)
outputShp = r'../intermediate/Merged_Segments.shp'

# Add the new field to the shapefile
new_field_name = 'F_KEY'
arcpy.management.AddField(outputShp, new_field_name, 'TEXT')

# Build the lookup key as '<SUBAREAID>_<FIPS>' (any decimal part of FIPS is dropped)
expression = """str(!SUBAREAID!) + '_' + str(!FIPS!).split('.')[0]"""
arcpy.management.CalculateField(outputShp, new_field_name, expression, 'PYTHON3')

# Join the forecast area CSV to the shapefile on F_KEY
join_field = 'F_KEY'
arcpy.management.JoinField(outputShp, join_field, forecastPath, join_field)

# Delete unwanted fields from the merged shapefile
f_fields_to_keep = ['FID', 'Shape', 'SEGID', 'PLANAREA', 'DISTANCE','SUBAREAID', 'BMP','EMP', 'AADT2019', 'CO_FIPS', 'F_AREA']
f_fields_to_delete = [field.name for field in arcpy.ListFields(outputShp) if field.name not in f_fields_to_keep]
if f_fields_to_delete:
    arcpy.management.DeleteField(outputShp, f_fields_to_delete)
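
The key logic can be tried outside arcpy to see which SUBAREAID and FIPS combinations would find no match in the lookup. The sketch below mirrors the F_KEY expression and reads the lookup with the standard `csv` module. It assumes the CSV has `F_KEY` and `F_AREA` columns, as the join field and the kept fields suggest; the sample rows and area names are made up.

```python
import csv
import io

def forecast_key(subareaid, fips):
    """Mirror the F_KEY expression: '<SUBAREAID>_<FIPS without decimals>'."""
    return str(subareaid) + "_" + str(fips).split(".")[0]

def load_forecast_lookup(csv_file):
    """Read F_KEY -> F_AREA pairs from an open CSV file."""
    return {row["F_KEY"]: row["F_AREA"] for row in csv.DictReader(csv_file)}

# Hypothetical lookup contents standing in for the real CSV.
sample_csv = io.StringIO("F_KEY,F_AREA\n1_49,Area A\n3_53,Area B\n")
lookup = load_forecast_lookup(sample_csv)

print(lookup.get(forecast_key(3, 53.0)))  # key with a match
print(lookup.get(forecast_key(9, 1)))     # key with no match
```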


## Pandas Double Check

Now that all the processing is done, let's double-check with pandas that nothing looks off.

In [27]:
# Load the merged shapefile into pandas to check that every field is populated and its values look reasonable
inputShp = r'../intermediate/Merged_Segments.shp'
df_merged = pd.DataFrame.spatial.from_featureclass(inputShp)
df_merged = df_merged.drop(columns=['SHAPE'])

In [28]:
# check if there are any na values
print(df_merged.isna().any().any())

False


In [29]:
# some basic checks
df_merged.describe()

| | FID | BMP | EMP | DISTANCE | AADT2019 | SUBAREAID | CO_FIPS |
|---|---|---|---|---|---|---|---|
| count | 8716.0 | 8716.0 | 8716.0 | 8716.0 | 8716.0 | 8716.0 | 8716.0 |
| mean | 4357.5 | 26.777619 | 28.384945 | 1.752572 | 9440.789123 | 1.146397 | 34.473153 |
| std | 2516.236807 | 75.986861 | 76.177823 | 3.569012 | 20164.178718 | 1.208746 | 16.912874 |
| min | 0.0 | 0.0 | 0.0 | 0.025089 | 0.0 | 0.0 | 1.0 |
| 25% | 2178.75 | 0.0 | 0.8775 | 0.371951 | 500.0 | 0.0 | 21.0 |
| 50% | 4357.5 | 1.5265 | 2.879 | 0.648408 | 2917.5 | 1.0 | 35.0 |
| 75% | 6536.25 | 7.94425 | 10.581 | 1.376615 | 10967.0 | 1.0 | 49.0 |
| max | 8715.0 | 499.375 | 502.577 | 55.338206 | 287320.0 | 5.0 | 57.0 |
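
Because shapefile attribute tables generally store 0 or an empty string rather than a true null, a check for missing values alone may not catch bad data. A few explicit range rules can complement it. The sketch below uses plain dictionaries so it runs anywhere; the same rules translate directly to boolean masks on `df_merged`. The rules themselves are assumptions: the SUBAREAID range of 0 to 5 is taken from the summary above, and EMP not being smaller than BMP is assumed from the field names.

```python
def failed_checks(row):
    """Return a list of rule violations for one attribute row (a dict)."""
    problems = []
    if row["DISTANCE"] <= 0:
        problems.append("DISTANCE is not positive")
    if row["EMP"] < row["BMP"]:
        problems.append("EMP is smaller than BMP")
    if not 0 <= row["SUBAREAID"] <= 5:
        problems.append("SUBAREAID is outside 0-5")
    if row["AADT2019"] < 0:
        problems.append("AADT2019 is negative")
    return problems

# Hypothetical rows: the first passes every rule, the second fails three.
sample_rows = [
    {"SEGID": "X_001", "BMP": 0.0, "EMP": 1.2, "DISTANCE": 1.2, "SUBAREAID": 1, "AADT2019": 500},
    {"SEGID": "X_002", "BMP": 3.0, "EMP": 2.0, "DISTANCE": 0.0, "SUBAREAID": 7, "AADT2019": 100},
]
for row in sample_rows:
    print(row["SEGID"], failed_checks(row))
```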


## Final Shapefile

Now that our merged segment shapefile looks good and has been double-checked, we can write it to the outputs folder.

In [30]:
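# Output the final merged segment file.
# 'Final_Segments' has no folder, so it is written to arcpy.env.workspace (the outputs folder set earlier).
inputShp = r'../intermediate/Merged_Segments.shp'
outputShp = r'Final_Segments'

arcpy.management.CopyFeatures(inputShp, outputShp)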
    