# Devlog 2023-07-14

_author: Tyler Coles_

Demonstrating the new `filter_geo` utility function.

First we prepare a simulation as usual by selecting a GEO/IPM/MM and defining our simulation parameters.

In [1]:
import time
from datetime import date

from epymorph.data import geo_library_static, ipm_library, mm_library
from epymorph.simulation import Simulation

geo = geo_library_static['us_counties_2015']()
ipm_builder = ipm_library['sirs']()
mvm_builder = mm_library['sparsemod']()

param = {
    # MM
    'phi': 40.0,
    # IPM
    'beta': 0.4,
    'gamma': 0.25,
    'xi': 0.011,
    # Initializer
    'location': 0,
    'seed_size': 1000
}

print(f"Geo contains {geo.nodes} nodes.")


Geo contains 3220 nodes.


`us_counties_2015` is a big geo! Using the `sparsemod` movement model is going to take a while...

In [2]:
sim = Simulation(geo, ipm_builder, mvm_builder)

t0 = time.perf_counter()
sim.run(param=param, start_date=date(2010, 1, 1), duration_days=30)
t1 = time.perf_counter()
print(f"Complete in {(t1 - t0):.3f} s")


Complete in 270.819 s
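Since the same timing boilerplate gets repeated for every run, it could be wrapped in a small helper. The following is only a sketch using the standard library; `timed` is a hypothetical name, not something epymorph provides, and the workload in the usage line is a stand-in for the `sim.run(...)` call.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="Complete"):
    """Print the wall-clock time taken by the enclosed block (hypothetical helper)."""
    t0 = time.perf_counter()
    try:
        yield
    finally:
        # Report the elapsed time even if the block raises.
        t1 = time.perf_counter()
        print(f"{label} in {(t1 - t0):.3f} s")


# Stand-in workload; in the notebook the body would be the sim.run(...) call.
with timed():
    sum(i * i for i in range(100_000))
```

Using `perf_counter` keeps the measurement consistent with the cells in this devlog.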


Whew. This is fine if we _need_ to run a full simulation, but what if we're working in some kind of iterative loop?

Maybe we're tweaking parameters little by little based on what we observe in the results. The faster the simulation runs, the more parameter values we can try. And maybe we don't need a full simulation to know whether our parameters are good. For this purpose, it would be handy to simulate on a subset of the geo's nodes, but it would be a pain to create a whole new geo that is just a subset of another one.

Now we can easily pull this off with geo filtering!

The first step is to select which nodes should be in our subset; then we can create a filtered geo.

A selection is nothing more than an array of indices. The utility function `top` is a convenient way to get the top-N nodes, but any method of choosing a set of indices will work. (There's also `bottom`, for instance.)
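To make "an array of indices" concrete, here is a minimal NumPy sketch of two ways a selection might be built. `top_indices` is a hypothetical stand-in written for illustration; it is not epymorph's `top`, whose implementation and result ordering are not shown in this devlog. The population values are made up.

```python
import numpy as np


def top_indices(n, values):
    """Hypothetical stand-in: indices of the n largest values, largest first."""
    values = np.asarray(values)
    # argsort sorts ascending, so reverse it and keep the first n indices.
    return np.argsort(values)[::-1][:n]


# Made-up per-node populations for a five-node toy geo.
population = np.array([120, 45_000, 980, 310_000, 7_500])

selection = top_indices(2, population)
print(selection.tolist())              # [3, 1]
print(population[selection].tolist())  # [310000, 45000]

# Any other way of producing indices works too, e.g. a threshold.
big_enough = np.flatnonzero(population > 5_000)
print(big_enough.tolist())             # [1, 3, 4]
```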

In [3]:
from epymorph.util import top

# Here we're selecting the 200 nodes with the most population.
selection = top(200, geo['population'])

# Filtering then returns a new geo by applying the selection
# to every attribute in the original geo.
geo_filtered = geo.filter(selection)

print(f"Original has {geo.nodes} nodes; filtered has {geo_filtered.nodes}.")


Original has 3220 nodes; filtered has 200.
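What "applying a selection to every attribute" could mean is sketched below on a plain dictionary of NumPy arrays. This is an illustration under assumptions, not epymorph's implementation: `filter_attributes` is a hypothetical function, the `commuters` attribute is invented for the example, and treating node-by-node matrices by selecting both rows and columns is a guess at reasonable behavior.

```python
import numpy as np


def filter_attributes(attributes, selection, n_nodes):
    """Apply a node selection to every attribute in a dict of arrays (hypothetical)."""
    filtered = {}
    for name, values in attributes.items():
        if values.shape == (n_nodes, n_nodes):
            # Node-by-node matrix: keep the selected rows and columns.
            filtered[name] = values[np.ix_(selection, selection)]
        elif values.ndim >= 1 and values.shape[0] == n_nodes:
            # Per-node attribute: keep the selected entries along the first axis.
            filtered[name] = values[selection]
        else:
            # Not indexed by node: pass through unchanged.
            filtered[name] = values
    return filtered


# Toy four-node data; both attributes are made up for illustration.
toy = {
    "population": np.array([10, 40, 20, 30]),
    "commuters": np.arange(16).reshape(4, 4),
}
selection = np.array([1, 3])

small = filter_attributes(toy, selection, n_nodes=4)
print(small["population"].tolist())  # [40, 30]
print(small["commuters"].tolist())   # [[5, 7], [13, 15]]
```

The point of the sketch is that the filtered data stays consistent with itself: the same indices are applied everywhere, so row `i` of a matrix still refers to the same node as entry `i` of a per-node array.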


Running the simulation on the filtered geo is much faster: about 16 seconds below, versus about 271 seconds for the full geo.

In [4]:
sim = Simulation(geo_filtered, ipm_builder, mvm_builder)

t0 = time.perf_counter()
sim.run(param=param, start_date=date(2010, 1, 1), duration_days=30)
t1 = time.perf_counter()
print(f"Complete in {(t1 - t0):.3f} s")


Complete in 16.025 s
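For a rough sense of scale, the two timings and node counts reported above can be compared directly. This is plain arithmetic on one pair of runs, so it should not be read as a general statement about how run time scales with geo size.

```python
# Figures copied from the outputs in this devlog.
full_nodes, filtered_nodes = 3220, 200
full_seconds, filtered_seconds = 270.819, 16.025

speedup = full_seconds / filtered_seconds
node_ratio = full_nodes / filtered_nodes

print(f"Speedup: {speedup:.1f}x")        # Speedup: 16.9x
print(f"Node ratio: {node_ratio:.1f}x")  # Node ratio: 16.1x
```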
