# Tutorial 10-01 - Concurrency with Threads

Back at our job at GeoNinjas PythonAnalytics, our colleagues have become interested in wildfire damage to structures in California.  They've asked us to set up a process to find out how many structures could be impacted by wildfires this year.  They'd like the process to be as fast and repeatable as possible to account for changing conditions.

Let's develop a process to use ArcGIS Online to query a building footprints layer with wildfire boundaries.  We can find a dataset with current wildfire boundaries that we can use later.  For now, we can use a representative dataset like 2020 wildfires to work out the process.

## Set up GIS object and gather data

#### 1.  Import packages and set up a GIS object.

The first thing you'll need to do is import the packages you're going to use and set up a GIS object.  This will allow you to connect to ArcGIS Online and get data from the Living Atlas.

In [None]:
import arcgis
import time

gis = arcgis.GIS('home')

#### 2.  Find structure and wildfire data.

Now that you're logged in, you can search the Living Atlas and get the data you need to start your analysis.  You're going to want a layer that represents structures.  Luckily, there's a dataset called **USA Structures**.  This is a simplified polygon representation of each structure footprint in the United States greater than 450 sq ft.  You'll also need a layer of wildfires to work with.  You'll use the Living Atlas's **California Fire Perimeters 2020** layer for this.  You're going to identify both datasets by their Item IDs, which are globally unique and the most consistent and repeatable way to identify a dataset in ArcGIS Online.

In [None]:
# get the layer for USA structures
item_id_structures = '0ec8512ad21e4bb987d7e848d14e7e24'
item_structures = gis.content.get(item_id_structures)
lyr_structures = item_structures.layers[0]

# get the layer for 2020 wildfires
item_id_wildfires = '37ab7a4a05ff485aba40a53deaa20ca1'
item_wildfires = gis.content.get(item_id_wildfires)
lyr_wildfires = item_wildfires.layers[1]

## Query the structures in a single wildfire

It's a good idea to set up your logic on a single wildfire first.  Then you can repeat that logic for all the wildfires.  To begin, you'll need to query a specific feature from the wildfire layer.

#### 1.  Query a single wildfire feature.

Pick a wildfire to start with.  In this case, we've chosen an example fire from the beginning alphabetically.  Picking the first one that comes up would be totally valid too.  You just need one to start with.

In [None]:
fset_single_wildfire = lyr_wildfires.query("FIRE_NAME = 'AVILA'")
fset_single_wildfire

Now that you've got a single wildfire feature to start with, you can create a geometry filter that you can use to query the structures layer.

#### 2.  Access a single feature

When you queried for a single feature, what you got as a response was a `FeatureSet` object.  This object has some descriptive information about the dataset such as the spatial reference and fields.  In this specific use case, however, you're just interested in the first feature.

In [None]:
# get the feature from the FeatureSet
wildfire_feature = fset_single_wildfire.features[0]

#### 3.  Access the geometry and attributes of a feature

You can access both the geometry and attribute data of a feature.  To access the geometry (which you'll use for a spatial filter), you can read the `.geometry` property of a feature.  To access attribute data of a feature, you can call the `.get_value()` method and provide a field name.

In [None]:
# get the geometry from the single feature
wildfire_geom = wildfire_feature.geometry

# get the wildfire name
wildfire_name = wildfire_feature.get_value("FIRE_NAME")

print(wildfire_name)
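
To make the shape of a feature more concrete, here is a minimal sketch that uses a plain dictionary as a stand-in for a feature.  It assumes the Esri JSON layout of an `attributes` dictionary plus a `geometry` dictionary that carries a `spatialReference`.  The helper `read_attribute`, the field values, the ring coordinates, and the `wkid` are all made up for illustration and do not come from the wildfire layer.

```python
# A hypothetical stand-in for one feature, laid out like Esri JSON.
# Every value here is invented for illustration.
feature_json = {
    "attributes": {"FIRE_NAME": "EXAMPLE"},
    "geometry": {
        "rings": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]],
        "spatialReference": {"wkid": 3857},
    },
}


def read_attribute(feature, field_name):
    """Illustrative helper: look up one attribute value by field name."""
    return feature["attributes"][field_name]


# The attribute lookup and the spatial reference both come from the same feature
fire_name = read_attribute(feature_json, "FIRE_NAME")
spatial_ref = feature_json["geometry"]["spatialReference"]

print(fire_name)
print(spatial_ref)
```

Seen this way, reading the name and reading the geometry are two lookups into the same record, which is why a single feature is enough to drive the spatial query that follows.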

#### 4.  Create a spatial filter

You can use that geometry to create a spatial filter.  You'll use the **intersects** filter in the ArcGIS API for Python's geometry module.

In [None]:
# Create a spatial filter to find structures that intersect the wildfire
wildfire_filter = arcgis.geometry.filters.intersects(
    wildfire_geom, sr = wildfire_geom['spatialReference']
)
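
As a rough mental model, a spatial filter is a bundle of query parameters: the geometry to test against, its type, the spatial relationship, and the input spatial reference.  The sketch below builds such a bundle by hand, using parameter names from the ArcGIS REST API query operation (`geometry`, `geometryType`, `spatialRel`, `inSR`).  The helper `build_intersects_params` and the example polygon are hypothetical; this only illustrates the idea and is not the implementation of `arcgis.geometry.filters.intersects`.

```python
def build_intersects_params(polygon_geom):
    """Hypothetical helper: bundle a polygon into REST-style filter parameters."""
    return {
        "geometry": polygon_geom,
        "geometryType": "esriGeometryPolygon",
        "spatialRel": "esriSpatialRelIntersects",
        "inSR": polygon_geom["spatialReference"],
    }


# An invented square polygon, for illustration only
example_geom = {
    "rings": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]],
    "spatialReference": {"wkid": 3857},
}

params = build_intersects_params(example_geom)
print(params["spatialRel"])
```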

#### 5.  Use the spatial filter to query the structures layer.

You can use this geometry filter to query the structures layer.  You're going to pass the geometry filter as a query parameter and only return the structures that intersect with that geometry.  As we only want the count here, you'll use the parameter *return_count_only*.

In [None]:
# Query the structures layer for structures that intersect the wildfire
structures = lyr_structures.query(
    geometry_filter = wildfire_filter,
    return_count_only=True
)
print(structures)

## Create a repeatable function to query

#### 1.  Re-structure your code as a function.

You've worked out the logic we want for a single feature, but you're going to want to repeat that on many features.  It'll be good practice (especially for what's coming up) to turn that logic into a function that you can re-use.  You'll directly copy some of the code we already wrote and just change some variables to local variables.

It's also a good idea to wrap all the logic in a try/except block in case anything goes wrong with a single wildfire.

In [None]:
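def query_structures_by_wildfire(wildfire_feature,
                                 structures_layer):
    
    # Define the name up front so the except block can always reference it,
    # even if the error happens before the name is read from the feature
    wildfire_name = None

    try:
        # Get the wildfire geometry and name
        wildfire_geom = wildfire_feature.geometry
        wildfire_name = wildfire_feature.attributes['FIRE_NAME']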

        # Create a spatial filter to find structures that intersect the wildfire
        wildfire_filter = arcgis.geometry.filters.intersects(
            wildfire_geom, sr = wildfire_geom['spatialReference']
        )

        # Query the structures layer for structures that intersect the wildfire
        structures = structures_layer.query(
            geometry_filter = wildfire_filter,
            return_count_only=True
        )

        # Return the wildfire name and the number of structures
        return {
            'Wildfire': wildfire_name,
            'Structures': structures
            }
    
    # If an error occurs, return the wildfire name and None for the structures
    except Exception as e:

        # print the error so we know which wildfire failed
        print(wildfire_name, e)

        return {
            'Wildfire': wildfire_name,
            'Structures': None
            }
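
The value of returning a placeholder instead of raising becomes clearer in a batch.  The sketch below uses a made-up `fake_count` function in place of the real layer query, and deliberately fails for one input, to show that the remaining items are still processed.  The function names and the input names are assumptions for illustration only.

```python
def fake_count(name):
    """Hypothetical stand-in for a layer query; fails for one specific name."""
    if name == "BROKEN":
        raise ValueError("simulated server error")
    return len(name)


def safe_count(name):
    """Return a result dict, using None for the count when the query fails."""
    try:
        return {"Wildfire": name, "Structures": fake_count(name)}
    except Exception as e:
        # Report which item failed, then keep going with a placeholder
        print(name, e)
        return {"Wildfire": name, "Structures": None}


batch_results = [safe_count(n) for n in ["ALPHA", "BROKEN", "GAMMA"]]
print(batch_results)
```

One failed item yields a `None` count rather than stopping the whole run, so a single problematic wildfire won't cost you the results for all the others.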

#### 2.  Test the function.

Now you can try out your function and ensure it returns the results you expect.

In [None]:
fset_single_wildfire = lyr_wildfires.query("FIRE_NAME = 'OAK'")

query_structures_by_wildfire(
    wildfire_feature = fset_single_wildfire.features[0],
    structures_layer = lyr_structures
)

## Repeat the query for multiple wildfires.

Let's repeat this query sequentially for multiple wildfires and note the time that it takes for each.  We'll start by querying all the wildfires.  Then we'll run our function on each wildfire feature and time the results.

#### 1.  Query all the wildfires

Similarly to how you queried a single wildfire, you can query all the wildfires in one operation.  Then you can iterate through the `.features` of the resulting `FeatureSet`.

In [None]:
# query all the wildfires
fset_wildfires = lyr_wildfires.query(
    where = "1=1"
)


#### 2.  Loop through each wildfire feature and query structure data.

Now you'll use a for loop to repeat this query for each wildfire.  In the code block below, there are some extra lines of code to time the operation.  In addition to timing the entire operation, there's some code to time the query operation for each wildfire.

In [None]:

all_results = []

# start a timer for the total time
total_start = time.time()

# iterate through the wildfires
for wildfire in fset_wildfires.features:
    
    # timer for individual features
    loop_start = time.perf_counter()
    
    # run the query for each wildfire
    results = query_structures_by_wildfire(
                            wildfire_feature = wildfire,
                            structures_layer = lyr_structures
                        )
    
    all_results.append(results)
    
    # close out the timer
    loop_end = time.perf_counter()
    
    print(results, loop_end - loop_start)
    
# close out the timer for total time
total_end = time.time()
print(total_end - total_start)

## Use Threads to operate concurrently

In the previous step you iterated through each wildfire and got the results sequentially.  Now let's say you want to speed up that process.  Since the computing of all this information is occurring on a server (and not locally) and the server is optimized for dealing with requests from many requestors, you can take advantage of that and send multiple requests concurrently.
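
To see why this helps when most of the time is spent waiting, here is a minimal sketch that replaces the server query with `time.sleep`.  The `simulated_request` function, the 0.1 second delay, and the eight inputs are assumptions chosen for illustration.  It uses `ThreadPoolExecutor`, which the following steps introduce in more detail, and its `map` method as a compact way to run the same function over every input.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_request(i):
    """Stand-in for a server query: wait briefly, then return a value."""
    time.sleep(0.1)  # pretend this is network and server latency
    return i * 2


inputs = list(range(8))

# Sequential: each wait happens one after another
seq_start = time.perf_counter()
seq_results = [simulated_request(i) for i in inputs]
seq_elapsed = time.perf_counter() - seq_start

# Threaded: the waits overlap, so the total is closer to a single wait
thr_start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    thr_results = list(pool.map(simulated_request, inputs))
thr_elapsed = time.perf_counter() - thr_start

print(f"sequential: {seq_elapsed:.2f}s, threaded: {thr_elapsed:.2f}s")
```

The results are identical either way; only the waiting overlaps.  Real speedups depend on the server and the network, so treat the numbers from this sketch as illustrative only.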

####  1.  Import the concurrent package

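You can start by importing a package that's included with Python's standard library.  The **concurrent** package contains tools for handling thread-based concurrency in its **futures** module.  Import the module explicitly as `concurrent.futures`, because `import concurrent` on its own doesn't guarantee that the submodule is loaded.

In [None]:
import concurrent.futures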

#### 2.  Implement multithreading with the concurrent package

There are a couple important things going on in the code block below that you might not have seen before.  The first two lines creating a timestamp and an empty list are familiar enough.  You'll use that timestamp to time your workflow and see if it's faster than iterating sequentially.  The empty list is for collecting all our results.

The third line of code below is where we introduce a new concept.  You're going to use a **ThreadPoolExecutor** from the concurrent package's **futures** module.  The ThreadPoolExecutor is what will handle your requests.  In our case you're allowing the ThreadPoolExecutor to manage up to ten requests concurrently (set by the `max_workers` parameter).

Next, you'll iterate through each wildfire again, but this time you'll `submit` your function and parameters to the ThreadPoolExecutor (aliased as *executor* in this script).  This is worth paying attention to because the function isn't necessarily being executed right at that moment in the iteration.  The `submit` method adds your job to the ThreadPoolExecutor's job list and returns a **future** object.  You'll append each of those futures to the empty futures list (*exec_futures*) so you can check on their status and retrieve the results.

Once you've submitted all your requests to the ThreadPoolExecutor, you can check whether they've finished.  You can use the `as_completed` function from the concurrent package's futures module.  That will allow you to iterate over your list of futures and gather the results as they complete.

In [None]:
# start a timer to time the whole operation
mt_start = time.time()

# create a list to collect all the results
all_results = []

# Use a ThreadPoolExecutor to query structures for each wildfire
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    
    # Create a list to store the future objects
    exec_futures = []

    # Iterate through each wildfire feature
    for wildfire in fset_wildfires.features:

        # Submit a query task for each wildfire
        exec_result = executor.submit(
            query_structures_by_wildfire, # our function
            wildfire_feature = wildfire, # parameters for our function
            structures_layer = lyr_structures
        )

        # Append the future object to the list
        exec_futures.append(exec_result)

    # Iterate through the future objects as they complete
    for f in concurrent.futures.as_completed(exec_futures):
        all_results.append(f.result())

# End timer and print the total time
mt_end = time.time()
print(mt_end - mt_start)

That was a bit more complicated than just iterating through all the wildfires sequentially.  It might not have seemed like it was worth it as you were getting into it.  After all, each of these query operations doesn't necessarily take all that long.  After comparing the durations of the sequential (for loop) and concurrent executions though, you can see that operating concurrently made a significant reduction in runtime.  
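
One detail worth keeping in mind is that `as_completed` yields futures in the order they finish, which is not necessarily the order they were submitted.  The sketch below illustrates this with a made-up `slow_upper` function and invented delays, and keeps a dictionary from each future back to its input so results can still be matched to what produced them.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_upper(name, delay):
    """Stand-in for a query: wait `delay` seconds, then return the name in upper case."""
    time.sleep(delay)
    return name.upper()


# Hypothetical jobs: the first one submitted is the slowest
jobs = [("slow", 0.3), ("medium", 0.15), ("fast", 0.01)]

completion_order = []

with ThreadPoolExecutor(max_workers=3) as pool:
    # Map each future back to the input that produced it
    future_to_name = {
        pool.submit(slow_upper, name, delay): name for name, delay in jobs
    }

    # Collect (input, result) pairs in the order the futures finish
    for f in as_completed(future_to_name):
        completion_order.append((future_to_name[f], f.result()))

print(completion_order)
```

In the wildfire workflow this ordering doesn't cause problems, because each result dictionary already carries its own wildfire name.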



#### 3.  Package results as a DataFrame

As a final step, you can turn your results into a DataFrame and pass that on to whatever downstream consumers you might have.  Check out the chapter on Data Engineering for potential next steps.

In [None]:
import pandas
df = pandas.DataFrame(all_results)

In [None]:
df.head()
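
Because the query function returns `None` for the count when a query fails, it can be worth separating failures from successes before handing results downstream.  The following is a minimal sketch using plain Python and invented example values in the same shape as the function's return value.

```python
# Hypothetical results in the same shape the query function returns
example_results = [
    {"Wildfire": "ALPHA", "Structures": 12},
    {"Wildfire": "BETA", "Structures": None},  # a failed query
    {"Wildfire": "GAMMA", "Structures": 0},    # a valid count of zero
]

# Compare against None explicitly so a count of 0 is not mistaken for a failure
failed = [r["Wildfire"] for r in example_results if r["Structures"] is None]
succeeded = [r for r in example_results if r["Structures"] is not None]

total_structures = sum(r["Structures"] for r in succeeded)

print(failed)
print(total_structures)
```

The list of failed names gives you a natural set of wildfires to retry.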