feat: proactive weather IV switching #295
Conversation
|
Are you still ignoring our concerns about load? RM could simply account for the weather change in its visualization, and Poracle already handles this successfully. What's the reason behind this? It's good you added a toggle for DB writes, but I'm not sure this is the correct place for that feature in general... Metrics will tell :) |
```go
}
if scanParameters.ProcessPokemon {
	decoder.UpdatePokemonBatch(ctx, dbDetails, scanParameters, newWildPokemon, newNearbyPokemon, newMapPokemon, decodedGmo.ClientWeather, protoData.Account)
	if scanParameters.ProcessWeather && scanParameters.ProactiveIVSwitching {
```
Why is this handled as a scan parameter, not just as a config check?
So that it could be configured per area?
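To illustrate the "per area" point above, here is a minimal sketch of what a per-area parameter lookup could look like. The struct and field names are assumptions modeled on the `scanParameters.ProactiveIVSwitching` check quoted in the diff; the actual Golbat config plumbing may differ.

```go
package main

import "fmt"

// Hypothetical per-area scan parameters; field names are illustrative,
// loosely following the scanParameters fields seen in the diff above.
type ScanParameters struct {
	ProcessPokemon       bool
	ProcessWeather       bool
	ProactiveIVSwitching bool
}

// areaParams maps an area name to its parameters, so the feature can be
// toggled per area instead of via a single global config flag.
var areaParams = map[string]ScanParameters{
	"downtown": {ProcessPokemon: true, ProcessWeather: true, ProactiveIVSwitching: true},
	"suburbs":  {ProcessPokemon: true, ProcessWeather: true, ProactiveIVSwitching: false},
}

func paramsFor(area string) ScanParameters {
	if p, ok := areaParams[area]; ok {
		return p
	}
	// Fallback default for unconfigured areas: process, but no switching.
	return ScanParameters{ProcessPokemon: true, ProcessWeather: true}
}

func main() {
	fmt.Println(paramsFor("downtown").ProactiveIVSwitching)
	fmt.Println(paramsFor("unknown").ProactiveIVSwitching)
}
```

The trade-off is the usual one: a global config check is simpler, but a per-area parameter lets operators enable the feature only where the extra load is acceptable.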
|
Someone send me data so I can collect performance metrics. :) |
|
Also, yes, IV prediction is available sometimes, and that data is only available in Golbat, not in ReactMap. |
|
Some preliminary test data. 👀 |
|
Some thoughts (without full review):

I remain unconvinced of the value of this, given that new encounters will come along in short order with any reasonable scanning strategy, and most of the Pokemon impacted are total junk. You state that IV can be predicted - under what circumstances, and for what proportion of Pokemon?

We need to work out how the large number of internal locks that will be taken out reduces concurrency of processing the Pokemon being delivered by the scanner (the very rows being locked will be the ones we see in early GMOs and receive encounters for). So the timings for the change alone aren't enough to understand the impact - it's the overall flow through the system that we need to find some measure for. Perhaps we can measure the time for processing of GMOs and encounters to complete and see if there is a statistical impact - I think this is already logged. Clearly this is going to be hard to see on lightly loaded systems, so we'll need to consider the load volume required to gather evidence.

I also have some concern about the webhook volume generated. |
|
Fixed an important bug. Now the performance looks like this:

@jfberry We can certainly look into having a single masterfile if we are to merge this. Also, while ohbemgo has a local file cache, it only writes to it and never reads from it. (I copied this implementation from ohbemgo.)

As you can see, the DB updates are only about 0-5 rows per cell change on average (the last number). These are also exactly the entries for which IVs are non-trivially predicted (usually because it's a Ditto or something similar). The number of locked rows could probably be optimized further. |
|
Re webhooks: also fixed, so that only the "cp updated" Pokemon will get a webhook - so 0-5 per weather cell update. |
|
Regarding "I remain unconvinced of the value of this given that new encounters will be along in short order with any reasonable scanning strategy", @lenisko please explain why you want WatWowMap/ReactMap#1120. Are you not using a reasonable scanning strategy? |
|
After a bit more optimization: |
Since it sounded like an easy change that would notify users about outdated data. Considering that most are using routes that land in the same place within 10 minutes, that's at worst 10 minutes of misleading data on the map. Short answer to the question without any knowledge of the PR (I'm busy next week). |
The issue is not so much whether it is easy - it's more the total volume of Pokemon that will have their IV stripped at the same time (on the hour). The questions are therefore:
Obviously this also has a downstream impact on other tools, but the webhook issue may have been mitigated above; I haven't had a chance to review yet |
|
What is the most important metric you want measured? It sounds like these are just indirect measures that in the end do not amount to anything. If everything finishes instantaneously with negligible effect on the system, who cares?

"The total volume of Pokemon that will have their IV stripped": that depends on the weather and the current spawn pool. It could be 0% or 100% for community days. For the current spawn pool, it looks like around 10-20%.

"How many Pokemon can be proactively updated": this also depends on the event. As I mentioned above, for the current event (and most events) it's about 0-5 Pokemon per weather cell. |
It's a proxy to help understand the quantum of change. If DB updates are entirely disallowed, that mitigates a large proportion of the potential problems, and if webhooks are restricted to those which carry a value, rather than switching to 'no IV', that also reduces a category of contention. I am absolutely in favour of having an accurate representation of the Pokemon in memory, but bulk changes have a history of causing problems, and so I am seeking to proactively avoid that |
|
DB updates, even when enabled (they are disabled by default), are restricted to the same Pokemon as webhooks (those with a CP set). The performance I'm currently seeing is honestly better than I expected, but I haven't got a lot of data. By the way, currently about 50% of the Pokemon scans are wasted, because the data structure only supports queries by lat/lon bounding box, and only ~50% of the bounding box area actually contains the weather cell (the percentage varies by location). If our in-memory data structure used S2 cells instead, this would be more efficient. :) |
That's encouraging - your provisional results are also better than I expected. The database is the largest concern, for obvious reasons.
Yes, when I made my original comments about memory size, it was with the assumption that a cell-to-Pokemon mapping would be needed. Your solution sidesteps this using the current data structures, so it becomes a pure perf/concurrency question
|
Yes, I was actually suggesting using an S2 cell data structure for the entire Pokemon tree. Now under a much higher load: Thoughts? |
When I did my (desktop) research on data structures, I considered an S2 index (and quadkeys) before settling on the rtree structure. The general consensus seemed to be that the S2 structures did not react well to rapid change and suited a more static set of objects being searched. This would potentially map quite well to spawnpoints (if you accepted that inactive spawnpoints could be present), but didn't map well to active Pokemon.

The other consideration was read/write concurrency - the tidwall rtree library has a copy-on-write model, which allows scanning without blocking concurrent change.

I'd be very happy for an alternative implementation to be proposed if it either gained us substantially on performance (less likely) or memory, or facilitated something new (like instance sharding, or simplification of the interface to the extent that it became straightforward to roll out to other object types). I'd encourage that to be a separate PR unless this one ultimately becomes blocked due to performance issues
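The copy-on-write model mentioned above can be sketched in miniature: readers atomically load a snapshot pointer and iterate it lock-free, while writers build a new version and swap it in. This is a deliberately simplified slice-backed illustration of the general technique, not the tidwall rtree implementation:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// index is a minimal copy-on-write container (illustrative only): readers
// never block, because writers replace the whole snapshot atomically.
type index struct {
	snapshot atomic.Pointer[[]string]
}

func newIndex() *index {
	idx := &index{}
	empty := []string{}
	idx.snapshot.Store(&empty)
	return idx
}

// Scan returns the current snapshot without taking any lock.
func (idx *index) Scan() []string {
	return *idx.snapshot.Load()
}

// Add copies the current snapshot, appends, and swaps the new version in;
// the CAS loop retries if another writer got there first.
func (idx *index) Add(item string) {
	for {
		old := idx.snapshot.Load()
		next := append(append([]string{}, *old...), item)
		if idx.snapshot.CompareAndSwap(old, &next) {
			return
		}
	}
}

func main() {
	idx := newIndex()
	snap := idx.Scan() // snapshot taken before the write
	idx.Add("pikachu") // writer swaps in a new version
	fmt.Println(len(snap))       // old snapshot is unchanged: 0
	fmt.Println(len(idx.Scan())) // new snapshot sees the insert: 1
}
```

The key property for this discussion is that an in-flight scan (e.g. a GMO being processed) keeps reading a consistent old version while a bulk weather-cell update swaps in new data underneath it.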
What constitutes "higher"? It's hard to total this without writing a script (which I don't have time for right now), but going down the list it looks like approximately 50 cells were modified (out of how many?), resulting in perhaps 25,000 Pokemon being updated, although this would only issue a few hundred webhooks/database updates. My original question was around the CPU impact, and whether GMO processing or encounter handling running concurrently saw a change in performance.
|
How many workers are sending you data? It may be that we need to understand any other constraints before trusting your readings. Go 1.25 has an experimental change to the GC which can be activated and may give instant feedback on whether these are GC peaks (as they'd be different in shape, at least!). From the logs you can clearly see the impact of the concurrent changes on the hour: as they start to build up, everything slows right down. I would expect users to notice if they are in ReactMap, and I also suspect that the processing times of other work would be impacted. |
|
Yes, they are GC peaks. They were there before weather switches; they were there when I was testing the previous PR (#244) as well. There are apparently lots of workers sending me data, though I'm not sure how many. I have at least 729 active weather cells. Probably even the biggest mapper wouldn't have as many. :) |
I'm not sure that's true - large mappers have many thousands of workers, and each one could be across multiple weather cells |
|
Apparently some people have many more weather cells. 😠 y u guys map so much |
|
Got it, since I poked about it.. count me in :) |
|
Okay, I added a concurrency limit. From my testing over the past few days, I can no longer see any weather cell switching back and forth: the fastest update interval was 3 seconds, and this only happened once, for one cell. I think it's fine not to add a mutex, since we have a mutex on the Pokemon row, and furthermore the updates are now also gated by timestamp. @jfberry What do you think? Would you like to review the code, or should testers start testing in production, or both simultaneously? :) |
|
Seems sensible for you to test with some real users alongside other reviews. First read through:
|
|
|
It is Golbat which loads the cached copy into ohbem when the initial retrieval fails, giving the cached functionality. |
|
That doesn't seem like a good design but okay. Fixed both. |
|
I argued golbat should take care of it all and just give the masterfile to ohbem in all cases (which makes sense now) but this was the way Len/Fabio agreed to divide the work. |
|
I was referring to the fact that if the cache path is going to be passed to Gohbem, then Gohbem should handle reading the cache file as well, instead of just writing it. :) |
|
Summoning testers @lenisko @M3G4THEKING |
|
Swapped on prod. I will provide an update in a day or two. |
Are you still on this branch? Any negatives from this change? @lenisko |
|
@Fabio1988 Been on it for a week or so, saw no issues. Switched back when bans hit. |
|
So as far as I can see, there is nothing blocking merging. It would be very useful to have this. Hit that merge button when? |

This implements proactive Pokemon IV switching upon weather changes. By default, all entries are only updated in memory, unless configured to also write to the DB (in that case, only new entries with valid IV will be written).
Needs testing. Anyone? :) @lenisko @M3G4THEKING
Fixes #131.