
Implement StructArrays as core for agent data #492

Closed
wants to merge 2 commits

Conversation

fbanning
Member

StructArrays in Agents.jl?

The mission

This PR attempts to solve #396 via a complete reimplementation of agent data storage in the ABM struct, replacing the model.agents Dict with a StructArray. The current API should remain "as is" as far as possible.
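
For context, here is a minimal standalone sketch of the two storage layouts (the variable names are illustrative, not the PR's actual internals):

using Agents, StructArrays

mutable struct SAAgent <: AbstractAgent
    id::Int
    money::Float64
end

# Current core: one heap-allocated struct per agent, looked up by id.
agents_dict = Dict{Int,SAAgent}(1 => SAAgent(1, 0.5), 2 => SAAgent(2, 1.0))
agents_dict[2].money           # every field access goes through the Dict

# This PR's idea: agent fields stored as contiguous columns.
agents_sa = StructArray([SAAgent(1, 0.5), SAAgent(2, 1.0)])
agents_sa.money                # Vector{Float64}, cheap to aggregate over
LazyRow(agents_sa, 2).money    # row-style access without materialising a struct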

The caveats

Most tests work fine, but not all of them, of course, because so much of the underlying structure of Agents.jl has to be changed to accommodate the move away from the battle-tested dictionary approach.

Mixed models don't work right now (not the focus of this first iteration). Those tests have been commented out because the thrown errors were annoying me.

The benchmarks

Arbitrary benchmark code:

using BenchmarkTools, Agents, Random

mutable struct SAAgent <: AbstractAgent
    id::Int
    money::Float64
end

function init(n)
    Random.seed!(42)
    p = Dict(
        :abs_growth => 0.0,
    )
    m = ABM(SAAgent, properties = p)
    for i in 1:n
        add_agent!(SAAgent(i, rand(m.rng)), m)
    end
    return m
end

function m_s!(m)
    m.abs_growth = 0 
    for a in allagents(m)
        gains = a.money * rand(m.rng, [-0.005, -0.001, 0.01])
        m.abs_growth += gains
        a.money += gains
    end
end

@benchmark begin
    m = init(2_500)
    adata = [:money]
    mdata = [:abs_growth]
    run!(m, dummystep, m_s!, 2_500; adata=adata, mdata=mdata);
end

Dict:

BenchmarkTools.Trial:
  memory estimate:  3.58 GiB
  allocs estimate:  43936674
  --------------
  minimum time:     3.347 s (14.65% GC)
  median time:      3.535 s (12.17% GC)
  mean time:        3.535 s (12.17% GC)
  maximum time:     3.724 s (9.95% GC)
  --------------
  samples:          2
  evals/sample:     1

StructArrays:

BenchmarkTools.Trial:
  memory estimate:  7.12 GiB
  allocs estimate:  198792134
  --------------
  minimum time:     13.096 s (6.24% GC)
  median time:      13.096 s (6.24% GC)
  mean time:        13.096 s (6.24% GC)
  maximum time:     13.096 s (6.24% GC)
  --------------
  samples:          1
  evals/sample:     1

The conclusion

Either my implementation is borked (highly likely, since this is a major overhaul) or using StructArrays the way I tried here just isn't more performant.

Eager to hear your thoughts. :)

@fbanning fbanning marked this pull request as draft April 22, 2021 16:22
src/core/model.jl (review comment, outdated)
@fbanning
Member Author

Improvements have been made with this naive implementation of a Dict{Int,Int}(id => index):

BenchmarkTools.Trial: 
  memory estimate:  7.16 GiB
  allocs estimate:  201602564
  --------------
  minimum time:     10.755 s (8.20% GC)
  median time:      10.755 s (8.20% GC)
  mean time:        10.755 s (8.20% GC)
  maximum time:     10.755 s (8.20% GC)
  --------------
  samples:          1
  evals/sample:     1
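
For reference, the id => index indirection amounts to roughly the following (a sketch with made-up helper names, not the code in this PR):

using Agents, StructArrays

mutable struct SAAgent <: AbstractAgent
    id::Int
    money::Float64
end

# Agents sit densely in the StructArray; the Dict only maps an agent id to its
# current row, restoring O(1) lookup by id.
agents = StructArray{SAAgent}(undef, 0)
id_to_index = Dict{Int,Int}()

function sketch_add!(agents, id_to_index, a::SAAgent)
    push!(agents, a)
    id_to_index[a.id] = length(agents)
    return a
end

sketch_lookup(agents, id_to_index, id::Int) = LazyRow(agents, id_to_index[id])

sketch_add!(agents, id_to_index, SAAgent(7, 1.0))
sketch_lookup(agents, id_to_index, 7).money    # 1.0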

@fbanning
Member Author

fbanning commented Apr 23, 2021

Extending the benchmark, this version also aggregates some data across all agents and should be a better way to judge the potential benefit of directly accessing columns of the StructArray:

using BenchmarkTools, Agents, Random
using StatsBase: mean

mutable struct SAAgent <: AbstractAgent
    id::Int
    money::Float64
end

function init(n)
    Random.seed!(42)
    p = Dict(
        :abs_growth => 0.0,
        :mean_wealth => 0.0,
    )
    m = ABM(SAAgent, properties = p)
    for i in 1:n
        add_agent!(SAAgent(i, rand(m.rng)), m)
    end
    return m
end

function m_s!(m)
    m.abs_growth = 0 
    for a in allagents(m)
        gains = a.money * rand(m.rng, [-0.005, -0.001, 0.01])
        m.abs_growth += gains
        a.money += gains
    end
    m.mean_wealth = mean(a.money for a in allagents(m)) # use this line with the Dict-based model
    m.mean_wealth = mean(m.agents.money)                 # use this line with the StructArray-based model
end

@benchmark begin
    m = init(2_500)
    adata = [:money]
    mdata = [:abs_growth, :mean_wealth]
    run!(m, dummystep, m_s!, 2_500; adata=adata, mdata=mdata);
end

Dict:

BenchmarkTools.Trial: 
  memory estimate:  3.59 GiB
  allocs estimate:  43956686
  --------------
  minimum time:     3.820 s (10.55% GC)
  median time:      3.894 s (13.87% GC)
  mean time:        3.894 s (13.87% GC)
  maximum time:     3.967 s (17.07% GC)
  --------------
  samples:          2
  evals/sample:     1

StructArrays:

BenchmarkTools.Trial: 
  memory estimate:  7.17 GiB
  allocs estimate:  201630070
  --------------
  minimum time:     10.408 s (8.25% GC)
  median time:      10.408 s (8.25% GC)
  mean time:        10.408 s (8.25% GC)
  maximum time:     10.408 s (8.25% GC)
  --------------
  samples:          1
  evals/sample:     1

As you can see, aggregating data across all agents costs some time (about a 10% increase) in the regular Agents.jl approach.
With the StructArray approach, this aggregation does not seem to have any measurable cost at all.

So while in this benchmark the StructArray version is overall ~3x as slow, takes up twice the memory, and needs roughly 5 times as many allocations, the expected speedup of aggregation over columns is real. I'm still wondering why the memory estimate is doubled, though...

@AayushSabharwal
Collaborator

This piqued my interest. Currently, I'm a bit tight on time as my semester is coming to a close, so I haven't been able to explore the code properly. However, if I understand correctly, adding/deleting an agent is an O(n) operation, since the subsequent elements have to be shifted back an index and the corresponding entries in the Dict updated? If so, I have a couple of ideas, but I'm not sure of their feasibility. Just throwing them out to see if they're valid:

  • To delete an agent, it could just be swapped with the last one and popped off the end? That way shifting of all higher indices is avoided, and only one (the former last index) needs to be updated.
  • Deletes could possibly be pooled and avoid even swapping if the indices are simply invalidated. All invalid indices are stored in a SortedSet. Any access to these indices throws an error/does whatever is appropriate. When an agent is added, it simply takes the smallest invalid index. This would maintain the sorted order of ids in the array. However, this approach will leave a lot of invalid indices if the agent population falls drastically.

It's possible I've misunderstood something. If so, please correct me.

@Libbum
Member

Libbum commented Apr 26, 2021

Decent ideas @AayushSabharwal. We're looking at any way we might leverage this without causing major slow-downs in other regions of the codebase. It's all 'is this possible' at the moment, so any further ideas are welcome.

@fbanning
Member Author

fbanning commented Apr 26, 2021

To delete an agent, it could just be swapped with the last one and popped off the end

Definitely feasible, I think.
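
Roughly, it could look like this (just a sketch on top of the Dict{Int,Int}(id => index) mapping mentioned above; sketch_kill! is a made-up name, not code in this PR):

using Agents, StructArrays

mutable struct SAAgent <: AbstractAgent
    id::Int
    money::Float64
end

# Illustrative sketch: swap the doomed row with the last row, shrink the array
# by one, and fix up the single moved entry in the id => index Dict.
# O(1) per deletion instead of shifting every subsequent row.
function sketch_kill!(agents::StructVector{SAAgent}, id_to_index::Dict{Int,Int}, id::Int)
    i = id_to_index[id]
    n = length(agents)
    if i != n
        moved = agents[n]            # materialise the last agent
        agents[i] = moved            # overwrite the deleted row
        id_to_index[moved.id] = i    # only this one index changes
    end
    resize!(agents, n - 1)           # drop the now-duplicated tail row
    delete!(id_to_index, id)
    return nothing
end

Note that this gives up the sorted order of ids inside the array, which the SortedSet variant would preserve.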

Keep in mind that the current benchmark only adds agents during initialisation and doesn't remove any at all. So at least from my point of view, the process of deleting agents is of lower priority right now.

All invalid indices are stored in a `SortedSet`

Possible, yes. However, it would leave a big memory footprint because we would have to keep the rows in the StructArray so that the indexing functions as before.
If we delete an agent, this will leave an unused row in our agent data until a new agent is added there. Furthermore, we would constantly need to check whether a given id is valid in the current context of the model. It's basically introducing another lookup which can be avoided with an up-to-date dict in the first place.
Or imagine an empty model where the user adds an agent with id = 1 and another one with id = 101 (for whatever reason). Having static indices with a set of invalid indices would then require us to add 99 placeholder rows to the StructArray, to be filled later whenever the user decides to add an agent with exactly one of those ids. But the model still needs to store all these "placeholder agents".
I would rather not go down that route with respect to memory efficiency.

It's all 'is this possible' at the moment

Currently I'm inclined to say "yes, but it isn't really more performant".

Compared to our regular "id => struct" implementation, performance is bad enough as it is. Just lazily accessing the rows (i.e. accessing a single agent's field values via model[id].property) takes so much computational time that it just doesn't seem sensible to opt for StructArrays instead of regular structs.
The benefit of aggregation isn't needed often enough in standard ABMs (which mostly alter the microdata of heterogeneous agents) to really let us leverage the power of a StructArray. The downside seems just too big to me at the moment. If we can somehow avoid that downside of LazyRow access, then we might actually have a trade-off worth considering. Right now (with my naive implementation here), this doesn't seem feasible.
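
To make that trade-off concrete, the two access patterns being weighed are roughly these (an illustrative snippet, not the PR's internals):

using Agents, StructArrays
using StatsBase: mean

mutable struct SAAgent <: AbstractAgent
    id::Int
    money::Float64
end

agents = StructArray([SAAgent(i, rand()) for i in 1:1_000])

# Row-wise access, roughly what model[id].property boils down to in this draft:
# every field read goes through a LazyRow into the underlying columns.
row_total = sum(LazyRow(agents, i).money for i in 1:length(agents))

# Column-wise access: the whole field is one contiguous Vector{Float64},
# which is where the aggregation speedup seen in the benchmark comes from.
col_mean = mean(agents.money)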

so any further ideas are welcome

Most definitely!

@Datseris
Member

I guess we can close this? Thanks a lot Frederik for the effort, but the conclusion is that it doesn't really offer that much benefit?

@fbanning
Member Author

I guess we can close this? Thanks a lot Frederik for the effort, but the conclusion is that it doesn't really offer that much benefit?

I agree because nobody has commented on this PR since April and nobody seems to have found any better ways of implementing StructArrays. It was a nice idea but sadly didn't fulfill its promise. I'll just close this draft PR.

If at some point somebody wants to dive back into this topic, I'd be very happy to chime back in. For now we got other things to focus on. :)
