Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VideoCommon:FramebufferManager: Mark cache as valid after refresh #11172

Merged
merged 2 commits into from Nov 15, 2022

Conversation

K0bin
Copy link
Contributor

@K0bin K0bin commented Oct 16, 2022

Small thing I noticed after #11098 got merged.

Otherwise we might never hit the early return if either depth or color doesnt have any active tiles.

dvessel added a commit to dvessel/dolphin that referenced this pull request Oct 24, 2022
dvessel added a commit to dvessel/dolphin that referenced this pull request Oct 25, 2022
@K0bin
Copy link
Contributor Author

K0bin commented Nov 3, 2022

Is anything blocking this? The change is fairly trivial.

@iwubcode
Copy link
Contributor

iwubcode commented Nov 3, 2022

I'd like @Pokechu22 or someone else familiar with this stuff to look it over. Basically, we never set valid to true directly, we always call PopulateEFBCache so it feels kind of "wrong" to just set it by itself here.

@Pokechu22
Copy link
Contributor

So, what I see is that PopulateEFBCache always sets valid to true, and presumably the concern is that in tiled mode, if there are no active tiles (frame_access_mask is 0 for all of them), PopulateEFBCache will never be called, so valid will not be updated, and it'll need to repeatedly check all tiles to confirm that frame_access_mask is 0. I think, at least? That does seem like a bug.

However, I'm less sure of what "valid" even means, since if one call to PopulateEFBCache sets valid to true, that affects all tiles. Thus I'm not sure if the condition (m_efb_color_cache.tiles[i].frame_access_mask != 0 && (!m_efb_color_cache.valid || !m_efb_color_cache.tiles[i].present)) makes sense, as once one tile calls PopulateEFBCache, the valid check no longer applies - if setting valid to false is supposed to be shorthand for marking all tiles as not present, that doesn't behave correctly. I think "valid" probably needs to be renamed to more clearly indicate what it actually controls.

@iwubcode
Copy link
Contributor

iwubcode commented Nov 3, 2022

I see what you're saying, a rename sounds like a good idea. I have no idea what we should call it however.

I would still hesitate to sign off on this PR until we know what the preconditions are for the valid replacement. If valid = true means we called PopulateEFBCache then that would not be true and so I would think another change would make more sense.

I think renaming valid is a good candidate for another PR.

@K0bin
Copy link
Contributor Author

K0bin commented Nov 3, 2022

I think valid is straight up redundant if tiled efb is enabled since it's only set to false in InvalidatePeekCache which also sets all tiles as missing. It only exists to make sure we don't have to iterate over all tiles over and over again.

@Pokechu22
Copy link
Contributor

Would it make sense to remove the non-tiled path entirely (instead treating the non-tiled case as one full-screen tile), and also replace valid with something like num_present (incremented/decremented as needed)?

@iwubcode
Copy link
Contributor

iwubcode commented Nov 3, 2022

with something like num_present (incremented/decremented as needed)?

Maybe active_tile_count would be more explicit?

@K0bin
Copy link
Contributor Author

K0bin commented Nov 4, 2022

active_tile_count would lead to the exact problem that this PR is trying to solve:

Calling RefreshPeekCache repeatedly when none of the efb tiles have been accessed would make it iterate over the tiles every single time.

EDIT:

I've solved it by adding a separate field for the refresh. Much clearer now.

Otherwise we might never hit the early return
if either depth or color doesnt have any active
tiles.
@iwubcode
Copy link
Contributor

iwubcode commented Nov 4, 2022

Well, I'll look at what you provided but my thinking looking at the code in master (I hadn't really looked at this much) is that we're using the wrong construct. tiles is a vector with a max size. It only has two values, something to denote present and something to get its frame_access_mask.

This feels like it'd be more appropriate to be a map of index to frame_access_mask. Then:

  • Indexing is near constant
  • The map will be small, only the available tiles will be present
  • Knowing a tile exists is as simple as checking if the map contains the key
  • No need to loop during invalidation, just call clear()

@K0bin
Copy link
Contributor Author

K0bin commented Nov 4, 2022

The problem with making it a map is that tiles that aren't present still have a frame_access_mask.

RefreshPeekCache specifically looks for those: Tiles that have been accessed in the last 8 frames (frame_access_mask != 0) but don't exist in the cache because they were invalidated (!present).

@AdmiralCurtiss
Copy link
Contributor

AdmiralCurtiss commented Nov 4, 2022

@iwubcode Just as a side note, a std::map for a small set (< a few hundred or so) of small values (single digit bytes) is often slower than just a std::vector or std::array on modern CPUs. Your performance bottleneck tends to be fetching data from RAM, and contiguous memory has much more predictable (and thus cache-friendly) access patterns than a std::map that heap-allocates its individual elements. Yes, even with the logically speaking O(n) lookup time instead of the O(log n).

You should obviously profile this on a case-by-case basis, but be careful with such ideas that look fine on paper; modern CPUs are weird!

@iwubcode
Copy link
Contributor

iwubcode commented Nov 4, 2022

Ah good point, missed that. That's unfortunate as it would have simplified things greatly but what you have works. Will be curious to hear @Pokechu22 's thoughts but I'm signing off.

@iwubcode
Copy link
Contributor

iwubcode commented Nov 4, 2022

@AdmiralCurtiss - oh is the sample size here that small? I would opt for an array (or possibly a vector) with very tiny sizes for reasons you stated. But I thought a couple hundred to 10'000 is a better case for a map. 10'000+ is a better case for unordered_map (if it's actually unordered data, the hashing can benefit larger sets but smaller sets it's not worth the cost). Obviously this is a generalization and profiling is the best way to go. I do forget about cache friendliness though. I tend to deal with larger data sets. I'll have to keep that in mind. Thanks!

Still, part of the reason I stated this was I thought the size was substantial enough. Of course, the other part was design. I ended up being wrong about both I guess :D

@AdmiralCurtiss
Copy link
Contributor

Our default is 64 tiles.

@iwubcode
Copy link
Contributor

iwubcode commented Nov 4, 2022

Ahha, I was thinking of the accuracy setting, which still isn't very large (in my head I thought 1024+). 64 is definitely tiny and I agree a map is not necessary there. Thanks @AdmiralCurtiss

Copy link
Contributor

@iwubcode iwubcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Untested.

@JMC47 JMC47 merged commit 8a1c28b into dolphin-emu:master Nov 15, 2022
11 checks passed
@K0bin K0bin deleted the efb-refresh branch November 15, 2022 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants