
Undefined behavior due to item_factory::m_templates map scrambling #19566

Closed
Coolthulhu opened this issue Nov 30, 2016 · 50 comments · Fixed by #19583
Assignees
Labels
<Crash / Freeze> Fatal bug that results in hangs or crashes.

Comments

@Coolthulhu
Contributor

Introduced in #19058

That pointer indirection wasn't needless after all.

Hard to replicate due to high dependency on system, compiler and possibly runtime, but it can be reproduced like this:

  • Start game
  • Create some items (preferably of different types)
  • Create new item types (artifacts)
  • m_templates is scrambled to keep the tree structure
  • Items' pointers to their itypes now point at undefined locations
  • Crash - or worse

Fixing this would require reverting #19058 or getting rid of artifacts and prohibiting future runtime types.
Runtime types would be a good thing to have, and artifacts - while rare - are still an important part of the game.

@Coolthulhu Coolthulhu added the <Crash / Freeze> Fatal bug that results in hangs or crashes. label Nov 30, 2016
@mugling
Contributor

mugling commented Nov 30, 2016

Good find. We should allow runtime types - they should just be stored in a separate map, which avoids a performance regression and, more importantly, the above crash.

@Coolthulhu
Contributor Author

That sounds potentially complex, and it would certainly slow down find_template( possibly_runtime_type ) - most likely even more than the double indirection did.

@mugling
Contributor

mugling commented Nov 30, 2016

No, you only need to check the second map if the first one returns no match. That way you only pay the price for the indirection when you actually need it.

@mutability
Contributor

unique_ptr + a single map is much clearer and simpler. I eyeballed the original PR and I don't see a hotspot where the (marginal) performance difference between a unique_ptr and a raw pointer matters (isn't the cost dominated by the map lookup?). Is there a hotspot?

@mugling
Contributor

mugling commented Dec 1, 2016

Profiling shows item::find_template is a hotspot. Can't do much about the need to call it or the cost of the hash function.

Currently:

    auto found = m_templates.find( id );
    if( found != m_templates.end() ) {
        return &found->second;
    }

After:

    auto found = m_templates.find( id );
    if( found != m_templates.end() ) {
        return &found->second;
    }

    // now lookup in the runtime types hash

Isn't much more complex and retains the better performance - I agree it's dwarfed by the hash function, but given it's the only possible optimisation I'd rather take what we can get.

@Coolthulhu
Contributor Author

How much overhead does the unique_ptr dereference actually add?

@mugling
Contributor

mugling commented Dec 1, 2016

How much overhead does the unique_ptr dereference actually add?

Indeterminate, but it's the only possible optimisation, so we should take it - given that pushing runtime types into their own map isn't complex and is arguably conceptually better (for example, it makes finding them for serialization much easier).

@mugling
Contributor

mugling commented Dec 1, 2016

I'm going to PR something along those lines, as I don't think it's a particularly tricky fix - but it is an important one.

@mutability
Contributor

mutability commented Dec 1, 2016 via email

@mugling
Contributor

mugling commented Dec 1, 2016

NB std::map doesn't hash

True - we should actually investigate whether std::unordered_map has better performance.

@mutability
Contributor

I think that's a better path; I am extremely sceptical that unique_ptr has measurable overhead. Look at the code: g++'s implementation of operator-> calls get(), which calls std::get<0>(); that can all be resolved at compile time, and getting the pointer should be no more expensive than accessing a struct member. I see no fences or conditionals in that path.

@mugling
Contributor

mugling commented Dec 1, 2016

I'd prefer the JSON loaded and the runtime types to be separate anyway - it makes serialization of the latter easier.

@mugling
Contributor

mugling commented Dec 1, 2016

Intended performance test harness:

    #include "catch/catch.hpp"

    #include "item.h"

    TEST_CASE( "perf_test" ) {
        int n = 1000;

        std::vector<std::string> res;
        const auto opts = item_controller->get_all_itype_ids();

        res.reserve( opts.size() * n );
        for( int i = 0; i != n; ++i ) {
            for( const auto &e : opts ) {
                res.push_back( item::find_type( e )->nname( 1 ) );
            }
        }
        INFO( res.size() );
    }

Then time ./test/cata_test perf_test for std::map vs std::unordered_map

Any suggestions?

@mutability
Contributor

Put the timing inside the testcase so you're not measuring test harness startup cost, which is large.

@mugling
Contributor

mugling commented Dec 1, 2016

Put the timing inside the testcase so you're not measuring test harness startup cost, which is large.

How?

@mutability
Contributor

clock_gettime(CLOCK_MONOTONIC) or clock_gettime(CLOCK_PROCESS_CPUTIME_ID) or std::chrono::high_resolution_clock ...

@mugling
Contributor

mugling commented Dec 1, 2016

Going to go with std::chrono, as that should be vaguely cross-platform.

@mugling
Contributor

mugling commented Dec 1, 2016

    #include "catch/catch.hpp"

    #include "item.h"
    #include "item_factory.h"

    #include <chrono>

    TEST_CASE( "perf_test" ) {
        int n = 5000;

        std::vector<const itype *> res;
        const auto opts = item_controller->get_all_itype_ids();

        const auto begin = std::chrono::high_resolution_clock::now();

        res.reserve( opts.size() * n );
        for( int i = 0; i != n; ++i ) {
            for( const auto &e : opts ) {
                res.push_back( item::find_type( e ) );
            }
        }

        const auto finish = std::chrono::high_resolution_clock::now();

        std::cout << "Fetched " << res.size() << " entries in "
                  << std::chrono::duration_cast<std::chrono::milliseconds>( finish - begin ).count() << "ms"
                  << std::endl;
    }

@mugling
Contributor

mugling commented Dec 1, 2016

Huge difference:

std::unordered_map is 1684ms
std::map is 12824ms

That's an order of magnitude in the most performance-critical function in the game!

@Coolthulhu
Contributor Author

Good improvement, but is it really the most performance-critical?
Where is it used so much?

@mugling
Contributor

mugling commented Dec 1, 2016

When I profile, item::find_type is ~2% of CPU time.

The pointer indirection is also significant: it increases std::map to 16386ms.

@mugling
Contributor

mugling commented Dec 1, 2016

So std::unordered_map without a separate map for runtime types is literally 10x faster. It's possible we could provide a custom hash function, given that our JSON ids are fairly predictable.

@mugling
Contributor

mugling commented Dec 1, 2016

Some strategic local caching could also help:

    void func() {
        static itype *foo = item::find_type( "foo" );
        item obj( foo );

        // do something with obj
    }

@Coolthulhu
Contributor Author

Now that I think about it, having two maps - one for load time and one for runtime types - solves nothing regarding runtime types.
The problem is not the split, the problem is that runtime types must not be stored directly in a structure that can move the data around.

It would be better to bring back the old double-indirection maps, then add a std::unordered_map<itype_id, itype *> cache on top.
This would have multiple advantages:

  • No crashes
  • Performance advantages of raw pointers in all cases
  • No performance regression for runtime types
  • Only one hashing per invocation
  • "Free" collision checking between load time and runtime types

Such a structure will be needed for runtime types anyway, unless we want to go back to full double-indirection with all the degraded performance it implies.

@mugling
Contributor

mugling commented Dec 1, 2016

Yes, you're right. I'm going to work on an implementation of that, along with looking for any calls to item::find_type that occur within a loop.

Any thoughts (yourself or @mutability) about a better hashing function? For example, we know that almost all identifiers start with [a-z]?

@mugling
Contributor

mugling commented Dec 1, 2016

Why not just store the actual definitions in a std::list then implement the cache on top of that?

@Coolthulhu
Contributor Author

Why not just store the actual definitions in a std::list then implement the cache on top of that?

Sometimes we may want to access only one category of definition (load-time/runtime/abstract). Then the cache alone would not be enough.

The one place I'd expect to look only at uncached data is load time: regenerating the cache before loading finishes would cost a lot of time, much more than the double dereference.

@mugling
Contributor

mugling commented Dec 1, 2016

No, I mean: why store the definitions in an associative container when you can use std::list? The latter doesn't invalidate existing definitions when new ones are added. This lets you cache itype * in other parts of the code. This could help a lot, for example in mapgen?

@mugling
Contributor

mugling commented Dec 1, 2016

Aside: you need stable storage anyway, because item::type depends on this.

@Coolthulhu
Contributor Author

No, I mean: why store the definitions in an associative container when you can use std::list? The latter doesn't invalidate existing definitions when new ones are added.

Associative+unique_ptr would be easier to work with when something inevitably needs to be added to it.
The cost of associative storage would be negligible since we'd be caching it all anyway, while the extra effort to maintain a non-standard approach would not be negligible.

This could help a lot for example in mapgen?

Mapgen currently discards all JSON data after using it. I assume that's to avoid keeping 50k lines of JSON in memory in some form.

@mutability
Contributor

Using std::list for ownership plus a map of pointers into the list is kind of pointless - why not use unique_ptr in the map directly if you're going to do that?

Aside: it's safe to cache the result of unique_ptr::get if you want (for the lifetime of the underlying object).

@mutability
Contributor

So you could, e.g., go back to the obviously-correct single map of unique_ptrs, make sure you only ever add to the map, and where the lookup + unique_ptr overhead is an issue, cache the raw pointer.

@mugling
Contributor

mugling commented Dec 1, 2016

I'm going to try implementing a separate container for runtime types and then update item::find_type to check if no match is found in the JSON types

@mugling mugling self-assigned this Dec 1, 2016
@mugling
Contributor

mugling commented Dec 1, 2016

@Coolthulhu I've almost got proper runtime type support implemented. It allows loading of arbitrary types per world.

One of the problems is that artifacts have two different formats - the one loaded from artifact_data via Item_factory, and the one loaded by itype_artifact_foo::deserialize(). These don't have the same JSON format, as it appears nobody has kept the code in sync.

I'd like to move to just one JSON loading function - can support for existing artifacts be lossy? The payoff here is removal of a lot of ugly code and a true system for runtime item types.

EDIT: I gave up on this but might implement it further in a future PR

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@Coolthulhu
Contributor Author

The second does not reallocate its contents on insertion.

Is that actually guaranteed? Or at least guaranteed for all feasible compilers?

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@mugling
Contributor

mugling commented Dec 7, 2016

Performance testing showed std::unordered_map was much faster. Why do we need two maps in this case if the reference-invalidation bug doesn't exist - and, more importantly, have we actually solved the problem?

@mutability
Contributor

mutability commented Dec 7, 2016 via email

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@mugling
Contributor

mugling commented Dec 7, 2016

Actually, I was under the impression that references to unordered_map elements were invalidated on rehash; since they aren't, a single unordered_map with no additional indirection should be sufficient.

This would seem to be the case

This circles back around to being an assertion, not something we should be checking in release builds on every access.

I'd prefer no assertions in release builds, but I think mine is the minority viewpoint amongst the other developers.

@mutability
Contributor

This circles back around to being an assertion, not something we should be checking in release builds on every access.

Please see e.g. #19661 as an example of why we need these in release builds; it is the release builds that are being used, and if the checks don't trigger in those builds then we lose useful information.

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@mutability
Contributor

mutability commented Dec 7, 2016 via email

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@mutability
Contributor

Can you explain why you want to remove the current asserts from release builds? I still do not understand that.

@mugling
Contributor

mugling commented Dec 7, 2016

Can you explain why you want to remove the current asserts from release builds?

That's not really my goal - I'd prefer instead to encourage much wider use of assert throughout the codebase (we currently have <100 instances in a 250 Kloc project), which isn't likely if those checks remain in RELEASE builds.

That said, the decision we ultimately reach needs to have broad support among all of the current developers, so different opinions are equally relevant and we should aim for consensus.

@mutability
Contributor

My current preference would be something like:

    #define release_assert assert
    #ifdef RELEASE
    #  define debug_assert(x)
    #else
    #  define debug_assert assert
    #endif

since that retains the platform-specific bits of assert (alertboxes etc) but still lets you liberally use debug_assert for more expensive checks.
