# Tutorial 02-01 - ArcPy and Cursors

Our colleagues at GeoNinjas PythonAnalytics are doing some analysis with highways in California.  They've asked us to help them develop a repeatable process to clean up the attributes of some of their data.  They currently have the route number of their highways as a text field but would like to make it numeric.

## Explore a feature class with ArcPy

#### 1.  Import arcpy
First, start by importing arcpy.  This will give you access to the tools in the package.

```python
import arcpy
```

#### 2.  Define the location for a feature class

Now you'll define the path to a feature class.  In this case, the feature class is stored in a file geodatabase inside a sub-folder of the folder this notebook is in.  You can use a relative path: the leading `./` refers to the folder you're starting in, and the rest of the path leads from there to the feature class.

```python
fc = './Chapter 02 Files/Chapter 02 - Working with Maps.gdb/Highways_Intersect'
```

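If you'd rather not type the separators by hand, Python's standard `pathlib` module can assemble the same relative path.  This is a sketch that isn't part of the original workflow.  It assumes the notebook runs from the folder that contains *Chapter 02 Files*, and it converts the result to a plain string, which is the safe form to pass to arcpy functions.

```python
from pathlib import Path

# Assumes the notebook's working directory is the folder that contains
# "Chapter 02 Files"; "." refers to that folder.
fc_path = Path(".") / "Chapter 02 Files" / "Chapter 02 - Working with Maps.gdb" / "Highways_Intersect"

# Convert the Path object to a plain string before handing it to arcpy
fc = str(fc_path)

# Show the folder that "." resolves against, and the resulting relative path
print(Path.cwd())
print(fc)
```

Note that `fc_path.exists()` is not a substitute for `arcpy.Exists`.  The feature class is stored inside the geodatabase rather than as its own file or folder, so pathlib can't see it.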
#### 3.  Confirm that the feature class exists

Now that you have a path to a feature class, there are a couple of things you can do to get an idea of what the data in that feature class looks like without opening it in ArcGIS Pro or reading all the data.  You can start by making sure that it exists.

```python
arcpy.Exists(fc)
```

```
True
```

#### 4.  Get the count of features in the feature class

Now that you know the feature class exists, you can start to inquire about the shape of the data.  You might be interested in how many records or features are in this feature class.  You can start by getting a count.

```python
arcpy.management.GetCount(fc)
```

It's worth noting that arcpy returns a **Result** object here instead of just a number.  If you want to actually use that number for anything, index into the **Result** object: `results[0]` holds the count as a string, so convert it with `int()`.

```python
results = arcpy.management.GetCount(fc)

print(type(results))
print(type(results[0]))
int(results[0])
```

```
<class 'arcpy.arcobjects.arcobjects.Result'>
<class 'str'>
6191
```

#### 5.  List the fields of the feature class

Now that you've got an idea of how many features are in the feature class, you can also find out some information about the fields in the feature class.  Start by listing the fields in the feature class.

```python
arcpy.ListFields(fc)
```

```
[<Field object at 0x15d58354e80[0x15d581c0a70]>,
 <Field object at 0x15d58354e20[0x15d59519e10]>,
 <Field object at 0x15d58354a00[0x15d59519fb0]>,
 <Field object at 0x15d583549a0[0x15d59519e90]>,
 <Field object at 0x15d58354940[0x15d59519470]>,
 <Field object at 0x15d583548e0[0x15d59519550]>,
 <Field object at 0x15d58354880[0x15d59519510]>,
 <Field object at 0x15d58354250[0x15d595194f0]>]
```

The list of **Field** objects isn't super useful as-is, but you can turn it into human-readable information by reading each field's `name` and `type` properties.

```python
for field in arcpy.ListFields(fc):
    print(field.name, field.type, sep='\t')
```

```
OBJECTID	OID
Shape	Geometry
NAMELSAD	String
NAME	String
HWY_NUM	String
TYPE	String
Shape_Length	Double
HWY_NUM_INT	SmallInteger
```


## Use a SearchCursor to identify unique values

Now that we've got an idea of what the shape of our data is, let's take a look at some of the attribute data.  Since our task is to clean up the highway numbers, we should start by finding the field containing the highway numbers.  In our previous step, it looks like there's a field called *HWY_NUM*.  That's probably a pretty good place to start.  

If you want to find out what's in that field pythonically, though, you'll need to access the attribute data using Python.  This is where you can use one of the most powerful tools arcpy has to offer, the **SearchCursor**.  The **SearchCursor** class is in arcpy's data access (da) module.

NOTE - There is a legacy version of SearchCursor that can be accessed by calling `arcpy.SearchCursor`.  We don't recommend using this class as it's older functionality and only remains in arcpy to support legacy scripts.  We always recommend using `arcpy.da.SearchCursor` going forward.

#### 1.  Read all values in a field using a SearchCursor

In the following cell, you'll create an empty list.  Then you'll use a SearchCursor to read a field and put all the values from that field into your list.

```python
# create an empty list to add values to
all_values = []

# iterate through rows using a Search Cursor
for row in arcpy.da.SearchCursor(fc, ['HWY_NUM']):

    # add each value into the list
    all_values.append(row[0])
```

```python
len(all_values)
```

```
6191
```

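Each row a SearchCursor returns is a tuple holding one value per field you asked for, which is why the loop reads `row[0]`.  The same collection step can also be written as a list comprehension.  In this sketch, a hard-coded list of one-item tuples is a hypothetical stand-in for the cursor so the code runs without ArcGIS; with real data you would iterate over `arcpy.da.SearchCursor(fc, ['HWY_NUM'])` instead.

```python
# Hypothetical sample rows standing in for arcpy.da.SearchCursor(fc, ['HWY_NUM']):
# each row is a tuple with one item per requested field
sample_rows = [("101",), ("5",), ("86S",), (" ",), ("101",), (None,)]

# Loop version, as in the tutorial
all_values = []
for row in sample_rows:
    all_values.append(row[0])

# Equivalent list comprehension
all_values_comp = [row[0] for row in sample_rows]

print(all_values_comp)
```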
#### 2.  Get the unique values from a list

Now you have gathered all the values in the *HWY_NUM* field.  You can iterate through them, but it would probably be easier to remove any duplicates.  You can use a handy Python data type called a **set** to do this.  Sets are similar to lists or tuples but cannot contain duplicate values.  So if you turn your list into a set, it will drop any duplicates.

```python
unique_set = set(all_values)
len(unique_set)
```

```
200
```

It's often easier to work with a list than a set, though.  It's a pretty common pattern to turn a list into a set and then back into a list to remove duplicates.  Keep in mind that sets are unordered, so the resulting list won't be in any particular order.

```python
unique_values = list(set(all_values))
print(len(unique_values))
unique_values[0:5]
```

```
200
['144', '37', '170', '32', '243']
```

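If you'd rather keep the values in the order they were first read, instead of the arbitrary order a set produces, one alternative (not what this tutorial uses) is `dict.fromkeys`.  Dictionary keys are unique and, since Python 3.7, keep their insertion order.  The sample values below are hypothetical.

```python
# Hypothetical highway numbers, in the order a cursor might return them
all_values = ["144", "37", "144", "170", "37", "32"]

# Set-based de-duplication: duplicates are removed, order is not guaranteed
unique_values = list(set(all_values))

# dict.fromkeys keeps the first occurrence of each value, in order
ordered_unique = list(dict.fromkeys(all_values))

print(ordered_unique)  # → ['144', '37', '170', '32']
```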
## Convert string values to integers

For the purposes of our use case, we need to convert those unique values to integers.  They're currently strings.  We can start by just trying to convert each value.

#### 1.  Try to convert all route numbers to integers

Now you'll create an empty list to store your integers.  Then you'll iterate through each of your unique values, convert them to integers, then put them in your new list.

```python
# create an empty list
unique_ints = []

# iterate through unique values
for str_val in unique_values:

    # convert to integer and append to list
    unique_ints.append(
        int(str_val)
    )
```

```
ValueError: invalid literal for int() with base 10: '86S'
```

Looks like you hit a value that can't be converted to an integer.  Since you're dealing with highways, the "S" probably marks a southbound segment of an interstate or large highway.  For our purposes, you can just drop the letter and keep the route number.

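Fixing values one `ValueError` at a time can take several passes.  As an alternative that isn't part of the original workflow, you could wrap `int()` in `try`/`except` and collect every value it rejects, so you see all the problem cases at once.  The sample list is hypothetical; in the notebook you would loop over `unique_values`.

```python
# Hypothetical sample of unique HWY_NUM strings
unique_values = ["144", "37", "86S", " ", "99N", "101"]

converted = []
problem_values = []

for str_val in unique_values:
    try:
        converted.append(int(str_val))
    except (TypeError, ValueError):
        # int() raises ValueError for strings like "86S" or " ",
        # and TypeError for None
        problem_values.append(str_val)

print(problem_values)  # → ['86S', ' ', '99N']
```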
#### 2.  Remove letter characters from route numbers

You should also future-proof your code by handling any "N" (northbound) values, in case they show up.  You can use the Python string's built-in **replace** method to remove these letters.

Now repeat the previous code and add a line that replaces the characters "N" and "S" with nothing.

```python
# create an empty list
unique_ints = []

# iterate through unique values
for str_val in unique_values:

    # replace problem characters with nothing
    str_val = str_val.replace("S","").replace("N","")

    # convert to integer and append to list
    unique_ints.append(
        int(str_val)
    )
```

```
ValueError: invalid literal for int() with base 10: ' '
```

#### 3.  Ignore blank values

It looks like you got another error in your conversion.  This one comes from values that should be empty: some records contain a blank space (" ") instead of actually being null.  You can modify your code to skip those records.

```python
# create an empty list
unique_ints = []

# iterate through unique values
for str_val in unique_values:

    # exclude any nulls or blank spaces
    if str_val is not None and str_val != ' ':

        # replace problem characters with nothing
        str_val = str_val.replace("S","").replace("N","")

        # convert to integer and append to list
        unique_ints.append(
            int(str_val)
        )
```

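Since the same cleaning rules are needed again when you write values back to the feature class, it can help to wrap them in a function.  `clean_hwy_num` is a hypothetical helper, not part of the original tutorial.  It follows the same rules (skip nulls and blanks, drop "N" and "S") but uses `strip()`, so any whitespace-only value, not just a single space, is treated as blank.

```python
def clean_hwy_num(value):
    """Hypothetical helper: convert a HWY_NUM string such as '86S' to an int.

    Returns None for nulls and whitespace-only strings.
    """
    # Treat None and whitespace-only strings as missing values
    if value is None or not value.strip():
        return None
    # Drop the direction letters, as in the tutorial's replace() calls
    cleaned = value.replace("S", "").replace("N", "")
    return int(cleaned)


# Hypothetical sample values
unique_ints = []
for str_val in ["144", "86S", " ", None, "99N"]:
    int_val = clean_hwy_num(str_val)
    if int_val is not None:
        unique_ints.append(int_val)

print(unique_ints)  # → [144, 86, 99]
```

With a helper like this, the body of an UpdateCursor loop could shrink to `row[1] = clean_hwy_num(row[0])` followed by `cursor.updateRow(row)`.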
Now you can use Python's built-in **sorted()** function to view the list of integers in order.

```python
sorted(unique_ints)
```

```
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16,
 17, 18, 19, 20, 22, 23, 24, 25, 26, 27, 29, 31, 32, 33, 34,
 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 49, 50, 52,
 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 65, 66, 67, 68,
 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 82, 83, 84, 85,
 86, 86, 87, 88, 89, 90, 91, 92, 94, 95, 97, 98, 99, 99, 101,
 103, 105, 107, 108, 110, 111, 113, 114, 115, 116, 118, 119, 120, 121, 123,
 125, 126, 128, 129, 130, 131, 132, 133, 134, 135, 137, 138, 139, 140, 142,
 144, 145, 149, 150, 152, 154, 155, 156, 160, 162, 163, 165, 166, 168, 170,
 174, 178, 180, 183, 184, 185, 186, 187, 188, 190, 191, 192, 198, 199, 200,
 202, 204, 205, 209, 210, 213, 215, 216, 217, 221, 223, 225, 227, 232, 233,
 237, 238, 241, 242, 243, 245, 246, 247, 255, 259, 260, 261, 262, 263, 273,
 274, 275, 280, ...
```

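Notice that 86 and 99 each appear twice in the sorted output.  Once the letters are removed, different strings (for example `'86S'` and another value for route 86) can clean to the same integer, so the converted list can contain repeats even though the string list didn't.  If you only want each route number once, de-duplicate again after converting.  The sample list below is hypothetical.

```python
# Hypothetical converted values; 86 and 99 appear twice because
# different strings cleaned to the same integer
unique_ints = [144, 86, 37, 86, 99, 99, 5]

# Convert to a set to drop the repeats, then sort the result
route_numbers = sorted(set(unique_ints))

print(route_numbers)  # → [5, 37, 86, 99, 144]
```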

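## Add a new field

Now that you've figured out the logic for converting your highway numbers to integers, you can add a new field to store those values in the feature class.  This way you can return the results of your processing to your colleagues.  You'll use the **AddField** geoprocessing tool in arcpy's *management* module.

#### 1.  Use the arcpy AddField tool to add a field

Use the `arcpy.management.AddField` geoprocessing tool to add a short integer field to the feature class.  The code first checks whether the field already exists, so the cell can be re-run without trying to add it twice.  If you've run this notebook before, *HWY_NUM_INT* may already appear in the field listing from earlier; a `SHORT` field is reported there as `SmallInteger`.

```python
# list the names of the fields already in the feature class
existing_fields = [field.name for field in arcpy.ListFields(fc)]

# only add the field if it isn't already there, so this cell can be re-run
if 'HWY_NUM_INT' not in existing_fields:
    arcpy.management.AddField(
        in_table=fc,
        field_name='HWY_NUM_INT',
        field_type='SHORT'  # short integer
    )
```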

## Use UpdateCursor to calculate the new field

Now that you've got your logic down for cleaning up the highway numbers, you can update the dataset with the cleaned integer values.  You'll use the **UpdateCursor** from the *data access (da)* module.  It's similar to the "Calculate Field" geoprocessing tool that you might be familiar with, but it lets you apply any Python logic you need to each row, and it can be considerably faster than the geoprocessing tool.

NOTE - We're going to use the UpdateCursor as a **context manager** by opening it in a `with` statement.  This is a helpful Python concept: the `with` line sets up a context for the indented code that follows, and when that code finishes, even if it stops because of an error, the context manager runs its cleanup step.  For arcpy cursors, that cleanup releases the cursor's locks on the dataset, which helps us avoid leaving cursors active and locking up our data.

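To see what a context manager guarantees, here is a pure-Python sketch.  `FakeCursor` is a hypothetical stand-in class, not part of arcpy; its `__exit__` method plays the role of the cleanup step that arcpy's cursors perform.  It shows that `__exit__` runs when the `with` block ends, even if an error is raised inside the block.

```python
class FakeCursor:
    """Hypothetical stand-in for a cursor that must release a lock when done."""

    def __init__(self):
        self.locked = False

    def __enter__(self):
        # Runs at the start of the with block
        self.locked = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, whether or not an error occurred
        self.locked = False
        return False  # don't suppress the exception


cursor = FakeCursor()
try:
    with cursor:
        raise ValueError("bad value partway through the loop")
except ValueError:
    pass

print(cursor.locked)  # → False: the "lock" was released despite the error
```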
The syntax and logic for the UpdateCursor are very similar to the SearchCursor.  With the UpdateCursor, though, you can also change the values in each row and save them back to the feature class.  In the code block below, you'll do the following:
- set up your context with the UpdateCursor
- iterate through each row of the cursor
- use the logic you developed in a previous step to convert each row's *HWY_NUM* string to an integer
- use the **updateRow()** method on the cursor to write the integers to the *HWY_NUM_INT* field

```python
# using the UpdateCursor as a context manager
with arcpy.da.UpdateCursor(fc, ['HWY_NUM', 'HWY_NUM_INT']) as cursor:

    # Iterate through each row.  Each row is a list with two values:
    # one for HWY_NUM and one for HWY_NUM_INT
    for row in cursor:

        # read this row's HWY_NUM string
        str_val = row[0]

        # exclude any nulls or blank spaces
        if str_val is not None and str_val != ' ':

            # replace problem characters with nothing
            str_val = str_val.replace("S", "").replace("N", "")

            # convert to integer
            int_val = int(str_val)

        # handle the nulls
        else:
            int_val = None

        # set the "HWY_NUM_INT" value to the converted integer
        row[1] = int_val

        # use the cursor to update the row
        cursor.updateRow(row)
```
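Because an UpdateCursor writes straight to the feature class, you may want to check the conversion first.  This is a sketch under stated assumptions: the rows below are hypothetical `[HWY_NUM, HWY_NUM_INT]` lists shaped like the ones the cursor yields, and the loop applies the same cleaning rules in memory (with `strip()` so any whitespace-only value counts as blank), counting how many rows would be set to null without touching any data.

```python
# Hypothetical rows in the [HWY_NUM, HWY_NUM_INT] shape an UpdateCursor yields
rows = [["101", None], ["86S", None], [" ", None], ["5", None], [None, None]]

null_count = 0
for row in rows:
    str_val = row[0]
    # Same rules as the cursor loop: skip nulls/blanks, drop N and S
    if str_val is not None and str_val.strip():
        row[1] = int(str_val.replace("S", "").replace("N", ""))
    else:
        row[1] = None
        null_count += 1

print(rows)
print(f"{null_count} of {len(rows)} rows would be set to null")  # → 2 of 5
```

Once the counts look reasonable, the same logic can run inside the UpdateCursor loop against the real feature class.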