
Undefined behavior due to item_factory::m_templates map scrambling #19566

Closed
Coolthulhu opened this issue Nov 30, 2016 · 50 comments · Fixed by #19583
Assignees
Labels
<Crash / Freeze> Fatal bug that results in hangs or crashes.

Comments

@Coolthulhu
Contributor

Introduced in #19058

That pointer indirection wasn't needless after all.

Hard to replicate due to high dependency on system, compiler and possibly runtime, but it can be reproduced like this:

  • Start game
  • Create some items (preferably of different types)
  • Create new item types (artifacts)
  • m_templates is scrambled to keep the tree structure
  • Items' pointers to their itypes now point at undefined locations
  • Crash - or worse

Fixing this would require reverting #19058 or getting rid of artifacts and prohibiting future runtime types.
Runtime types would be a good thing to have, and artifacts - while rare - are still an important part of the game.

@Coolthulhu Coolthulhu added the <Crash / Freeze> Fatal bug that results in hangs or crashes. label Nov 30, 2016
@mugling
Contributor

mugling commented Nov 30, 2016

Good find. We should allow runtime types - they should just be stored in a separate map, which avoids a performance regression and, more importantly, the above crash.

@Coolthulhu
Contributor Author

That sounds potentially complex, and it would certainly slow down find_template( possibly_runtime_type ) - most likely even more than the double indirection did.

@mugling
Contributor

mugling commented Nov 30, 2016

No, you only need to check the second map if the first one returns no match. That way you only pay the price for the indirection when you actually need it.

@mutability
Contributor

unique_ptr + a single map is much clearer and simpler. I eyeballed the original PR and I don't see a hotspot where the (marginal) performance difference between a unique_ptr and a raw pointer matters (isn't the cost dominated by the map lookup?). Is there a hotspot?

@mugling
Contributor

mugling commented Dec 1, 2016

Profiling shows item::find_template is a hotspot. Can't do much about the need to call it or the cost of the hash function.

Currently:

    auto found = m_templates.find( id );
    if( found != m_templates.end() ) {
        return &found->second;
    }

After:

    auto found = m_templates.find( id );
    if( found != m_templates.end() ) {
        return &found->second;
    }

    // now lookup in the runtime types hash

Isn't much more complex and retains the better performance - I agree it's dwarfed by the hash function, but given it's the only possible optimisation I'd rather take what we can get.

@Coolthulhu
Contributor Author

How much overhead does the unique_ptr dereference actually add?

@mugling
Contributor

mugling commented Dec 1, 2016

How much overhead does the unique_ptr dereference actually add?

Indeterminate, but it's the only possible optimisation, so we should take it - given that pushing runtime types into their own map isn't complex and is arguably conceptually better (for example, it makes finding them for serialization much easier).

@mugling
Contributor

mugling commented Dec 1, 2016

I'm going to PR something along those lines, as I don't think it's a particularly tricky fix - but it is an important one.

@mutability
Contributor

mutability commented Dec 1, 2016 via email

@mugling
Contributor

mugling commented Dec 1, 2016

NB std::map doesn't hash

True - we should actually investigate whether std::unordered_map has better performance.

@mutability
Contributor

I think that's a better path; I am extremely sceptical that unique_ptr has measurable overhead. Look at the code: g++'s implementation of operator-> calls get(), which calls std::get<0>(); that can all be resolved at compile time, and getting the pointer should be no more expensive than accessing a struct member. I see no fences or conditionals in that path.

@mugling
Contributor

mugling commented Dec 1, 2016

I'd prefer the JSON loaded and the runtime types to be separate anyway - it makes serialization of the latter easier.

@mugling
Contributor

mugling commented Dec 1, 2016

Intended performance test harness:

    #include "catch/catch.hpp"

    #include "item.h"

    TEST_CASE( "perf_test" ) {
        int n = 1000;

        std::vector<std::string> res;
        const auto opts = item_controller->get_all_itype_ids();

        res.reserve( opts.size() * n );
        for( int i = 0; i != n; ++i ) {
            for( const auto &e : opts ) {
                res.push_back( item::find_type( e )->nname( 1 ) );
            }
        }
        INFO( res.size() );
    }

Then time ./test/cata_test perf_test for std::map vs std::unordered_map

Any suggestions?

@mutability
Contributor

Put the timing inside the testcase so you're not measuring test harness startup cost, which is large.

@mugling
Contributor

mugling commented Dec 1, 2016

Put the timing inside the testcase so you're not measuring test harness startup cost, which is large.

How?

@mutability
Contributor

clock_gettime(CLOCK_MONOTONIC) or clock_gettime(CLOCK_PROCESS_CPUTIME_ID) or std::chrono::high_resolution_clock ...

@mugling
Contributor

mugling commented Dec 1, 2016

Going to go with std::chrono, as that should be vaguely cross-platform.

@mugling
Contributor

mugling commented Dec 1, 2016

    #include "catch/catch.hpp"

    #include "item.h"
    #include "item_factory.h"

    #include <chrono>

    TEST_CASE( "perf_test" ) {
        int n = 5000;

        std::vector<const itype *> res;
        const auto opts = item_controller->get_all_itype_ids();

        const auto begin = std::chrono::high_resolution_clock::now();

        res.reserve( opts.size() * n );
        for( int i = 0; i != n; ++i ) {
            for( const auto &e : opts ) {
                res.push_back( item::find_type( e ) );
            }
        }

        const auto finish = std::chrono::high_resolution_clock::now();

        std::cout << "Fetched " << res.size() << " entries in "
                  << std::chrono::duration_cast<std::chrono::milliseconds>( finish - begin ).count() << "ms"
                  << std::endl;
    }

@mugling
Contributor

mugling commented Dec 1, 2016

Huge difference:

std::unordered_map is 1684ms
std::map is 12824ms

That's an order of magnitude in the most performance-critical function in the game!

@Coolthulhu
Contributor Author

Good improvement, but is it really the most performance-critical?
Where is it used so much?

@mugling
Contributor

mugling commented Dec 1, 2016

When I profile, item::find_type is ~2% of CPU time.

The pointer indirection is also significant: it increases std::map to 16386ms.

@mugling
Contributor

mugling commented Dec 1, 2016

So std::unordered_map without a separate map for runtime types is literally 10x faster. It's possible we could provide a custom hash function, given that our JSON ids are fairly predictable.

@mugling
Contributor

mugling commented Dec 1, 2016

Some strategic local caching could also help:

    void func() {
        static itype *foo = item::find_type( "foo" );
        item obj( foo );

        // do something with obj
    }

@Coolthulhu
Contributor Author

Now that I think about it, having two maps - one for load time and one for runtime types - solves nothing regarding runtime types.
The problem is not the split, the problem is that runtime types must not be stored directly in a structure that can move the data around.

It would be better to bring back the old double-indirection maps, then add a std::unordered_map<itype_id, itype *> cache on top.
This would have multiple advantages:

  • No crashes
  • Performance advantages of raw pointers in all cases
  • No performance regression for runtime types
  • Only one hashing per invocation
  • "Free" collision checking between load time and runtime types

Such a structure will be needed for runtime types anyway, unless we want to go back to full double-indirection with all the degraded performance it implies.

@mugling
Contributor

mugling commented Dec 1, 2016

Yes, you're right. I'm going to work on an implementation of that, along with looking for any calls to item::find_type that occur within a loop.

Any thoughts (yourself or @mutability) about a better hashing function? For example, we know that almost all identifiers start with [a-z]?

@mugling
Contributor

mugling commented Dec 1, 2016

Why not just store the actual definitions in a std::list then implement the cache on top of that?

@Coolthulhu
Contributor Author

Why not just store the actual definitions in a std::list then implement the cache on top of that?

Sometimes we may want to access only one category of definition (load-time/runtime/abstract). Then the cache alone would not be enough.

The one place I'd expect to look only at uncached data is load time: regenerating the cache before loading finishes would cost a lot of time, much more than the double dereference.

@mugling
Contributor

mugling commented Dec 1, 2016

No, I mean: why store the definitions in an associative container when you can use std::list? The latter doesn't invalidate existing definitions when new ones are added. This lets you cache itype * in other parts of the code. This could help a lot, for example in mapgen?

@mugling
Contributor

mugling commented Dec 1, 2016

Aside: you need stable storage anyway, because item::type depends on this.

@Coolthulhu
Contributor Author

No, I mean: why store the definitions in an associative container when you can use std::list? The latter doesn't invalidate existing definitions when new ones are added.

Associative+unique_ptr would be easier to work with when something inevitably needs to be added to it.
The cost of associative storage would be negligible since we'd be caching it all anyway, while the extra effort to maintain a non-standard approach would not be negligible.

This could help a lot for example in mapgen?

Mapgen currently discards all JSON data after using it. I assume that's to avoid keeping 50k lines of JSON in memory in some form.

@mutability
Contributor

Using std::list for ownership plus a map of pointers into the list is kind of pointless - why not use unique_ptr in the map directly if you're going to do that?

Aside: it's safe to cache the result of unique_ptr::get if you want (for the lifetime of the underlying object).

@mutability
Contributor

So you could, e.g., go back to the obviously-correct single map of unique_ptrs, make sure you only ever add to the map, and where the lookup + unique_ptr overhead is an issue, cache the raw pointer.

@mugling
Contributor

mugling commented Dec 1, 2016

I'm going to try implementing a separate container for runtime types and then update item::find_type to check if no match is found in the JSON types

@mugling mugling self-assigned this Dec 1, 2016
@mugling
Contributor

mugling commented Dec 1, 2016

@Coolthulhu I've almost got proper runtime type support implemented. It allows loading of arbitrary types per world.

One of the problems is that artifacts have two different formats - the one loaded from artifact_data via Item_factory, and the one loaded by itype_artifact_foo::deserialize(). These don't have the same JSON format, as it appears nobody has kept the code in sync.

I'd like to move to just one JSON loading function - can support for existing artifacts be lossy? The payoff here is removal of a lot of ugly code and a true system for runtime item types.

EDIT: I gave up on this but might implement it further in a future PR

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@Coolthulhu
Contributor Author

The second does not reallocate its contents on insertion.

Is that actually guaranteed? Or at least guaranteed for all feasible compilers?

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@mugling
Contributor

mugling commented Dec 7, 2016

Performance testing showed std::unordered_map was much faster. Why do we need two maps in this case if the reference-invalidation bug doesn't exist - and, more importantly, have we actually solved the problem?

@mutability
Contributor

mutability commented Dec 7, 2016 via email

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@mugling
Contributor

mugling commented Dec 7, 2016

Actually, I was under the impression that references to unordered_map elements were invalidated on rehash; since they aren't, a single unordered_map with no additional indirection should be sufficient.

This would seem to be the case

This circles back around to being an assertion, not something we should be checking in release builds on every access.

I'd prefer no assertions in release builds, but I think mine is the minority viewpoint amongst the other developers.

@mutability
Contributor

This circles back around to being an assertion, not something we should be checking in release builds on every access.

Please see e.g. #19661 as an example of why we need these in release builds; it is the release builds that are being used, and if the checks don't trigger in those builds then we lose useful information.

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@mutability
Contributor

mutability commented Dec 7, 2016 via email

@kevingranade
Member

kevingranade commented Dec 7, 2016 via email

@mutability
Contributor

Can you explain why you want to remove the current asserts from release builds? I still do not understand that.

@mugling
Contributor

mugling commented Dec 7, 2016

Can you explain why you want to remove the current asserts from release builds?

That's not really my goal - I'd prefer instead to encourage much wider use of assert throughout the codebase (we currently have <100 instances in a 250 Kloc project), which isn't likely if those checks remain in RELEASE builds.

That said, the decision we ultimately reach needs to have broad support among all of the current developers, so different opinions are equally relevant and we should aim for consensus.

@mutability
Contributor

My current preference would be something like:

    #define release_assert assert
    #ifdef RELEASE
    #  define debug_assert(x)
    #else
    #  define debug_assert assert
    #endif

since that retains the platform-specific bits of assert (alertboxes etc) but still lets you liberally use debug_assert for more expensive checks.
