Feature/save bulk transactions #827
Conversation
zone/client.cpp
Outdated
@@ -606,10 +607,118 @@ void Client::RemoveExpendedAA(int aa_id)
	database.QueryDatabase(StringFormat("DELETE from `character_alternate_abilities` WHERE `id` = %d and `aa_id` = %d", character_id, aa_id));
}

//bool Client::Save(uint8 iCommitNow) {
Don't do this. This code is saved in the version control software :P
Ahh, I left that there for ease of switching back and forth between the old and new code. I was assuming that before this was even accepted I would do a full cleanup. This is my first experience with a PR to this project.
zone/tasks.cpp
Outdated
@@ -280,7 +280,138 @@ bool TaskManager::LoadTasks(int singleTask)

	return true;
}
std::string TaskManager::SaveClientStateQuery(Client *c, ClientTaskState *state) |
Ignoring the fact that VS lets this compile for some reason, this is also written in a way that may possibly fail NRVO, and since this is a performance-focused commit, that is a very relevant point :P
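To illustrate the concern, a minimal sketch (hypothetical helpers, not the code in this diff): returning different named locals on different paths generally prevents the compiler from constructing the result directly in the caller's storage, while a single named return object keeps NRVO on the table.

```cpp
#include <string>

// NRVO-hostile: two different named objects can be returned, so the
// compiler generally cannot build either one directly in the caller's
// storage; the return is a move at best.
std::string BuildQueryTwoLocals(bool use_replace) {
	std::string insert_sql  = "INSERT INTO `t` VALUES (1);";
	std::string replace_sql = "REPLACE INTO `t` VALUES (1);";
	if (use_replace)
		return replace_sql;
	return insert_sql;
}

// NRVO-friendly: one named object returned on every path, so the
// compiler may construct it in place in the caller.
std::string BuildQueryOneLocal(bool use_replace) {
	std::string sql = use_replace ? "REPLACE INTO `t` VALUES (1);"
	                              : "INSERT INTO `t` VALUES (1);";
	return sql;
}
```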
Hopefully this is resolved with pass-by-reference.
From a performance standpoint, I think it might be better for all of these to take an out parameter by reference. This would allow us to keep growing a single std::string object instead of having a bunch of independently growing std::string objects that ultimately get combined into one.
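Something like the following sketch, where the helper names and SQL are purely illustrative rather than the PR's actual code: each save helper appends into one caller-owned buffer, so there is a single growing allocation instead of many temporaries concatenated at the end.

```cpp
#include <cstdint>
#include <string>

// Illustrative helpers (not the PR's actual functions): each appends
// its statements to a shared buffer instead of returning a new string.
void AppendCurrencySave(uint32_t char_id, std::string &out) {
	out += "UPDATE `character_currency` SET `platinum` = 0 WHERE `id` = ";
	out += std::to_string(char_id);
	out += ";";
}

void AppendBindSave(uint32_t char_id, std::string &out) {
	out += "UPDATE `character_bind` SET `zone_id` = 0 WHERE `id` = ";
	out += std::to_string(char_id);
	out += ";";
}

std::string BuildBulkSave(uint32_t char_id) {
	std::string bulk;
	bulk.reserve(4096);  // one up-front allocation covers the common case
	AppendCurrencySave(char_id, bulk);
	AppendBindSave(char_id, bulk);
	return bulk;  // single named object, NRVO-friendly
}
```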
I fully agree and was already thinking about that. Think of this as an initial rough draft / proof of concept. I wrote most of this last night. I will have edits up shortly.
@livermana, thank you so much for spending the time and putting together a PR; we always appreciate folks who take the initiative to contribute to the project. There are quite a few things I want to highlight about this PR that definitely won't make it to mainline, for the following reasons:
On the Performance Standpoint

Since I'm assuming this is related to your chiming in on the lag problems thread, I will restate the following: with PEQ at 800-1000 toons daily, we barely break 100 IOPS, with barely 1MB/s in writes and minimal spikes here and there. That is virtually nothing.

Also, with your benchmark, the current stock code on some middle-of-the-line server hardware produces the following timings:
This is a sub-1ms operation that is not going to hurt even when it is synchronous. These operations also happen so infrequently that it is not going to matter.

There are many, many factors that play into the overall performance of a server, and since the server is essentially an infinite loop, anything within that loop can influence the amount of time that the CPU is not spent idling (network, I/O, overly CPU-intensive operations, etc.). Hardware is of influence, your MySQL server configuration is of influence, and most of all software is of influence.

EQEmu used to be way, way more resource intensive and we've come a long way to where that is not even an issue anymore. We have one outstanding bug that is isolated to the networking layer and made its way through because we never saw it on PEQ during our normal QA routines. We are currently working on code to measure application-layer network stats so folks can give us metric dumps and we can put together a proper fix. We've seen what happens during the network anomaly under a CPU profile, and there's not much a profile alone is going to show beyond where the time is being spent. We folks at EQEmu definitely have jobs that have been a deterrent to resolving said bug, but we will have it on lockdown soon enough, as we know exactly what we need to do; the only thing in our way is time as a resource.

As far as your other stuck-characters PR, that could definitely be useful, but I defer to @KimLS to review it.

As far as this PR is concerned, though, the footprint of these queries is so incredibly small that it is not worth trying to optimize further, unless you wanted to wrap a handful of saves in two transaction begin and end calls, but that is about it.
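For what that last suggestion could look like in practice, a minimal sketch; it assumes database.QueryDatabase executes raw SQL as in the diff above, and the grouped statements and table/column names are purely illustrative, not the actual save set.

```cpp
#include <cstdint>

// Group several per-character saves into one transaction so InnoDB
// performs a single log flush at COMMIT instead of one per
// autocommitted statement. Statements shown are illustrative only.
void SaveCharacterBundled(uint32_t character_id) {
	database.QueryDatabase("START TRANSACTION");

	database.QueryDatabase(StringFormat(
		"UPDATE `character_data` SET `last_login` = UNIX_TIMESTAMP() WHERE `id` = %u",
		character_id));
	database.QueryDatabase(StringFormat(
		"DELETE FROM `character_pet_info` WHERE `char_id` = %u",
		character_id));
	// ... remaining save statements for this character ...

	database.QueryDatabase("COMMIT");
}
```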
A well-thought-out and detailed reply, though with a touch of sting in it.
This was a new method for QueryMulti; other statements were not impacted (though I apologize if I am wrong about this). I checked that the statements in question were not parameterized, just raw SQL sent down the stack.
This was very much understood; I was just trying to limit the network hits. There is no reason to do multiple network round trips when you can do one, between crossing kernel boundaries, network switches, etc. (see the multi-statement sketch after this reply).
Thank you for the charts, very nice.
This was stated in the PR itself as an issue that I would resolve; no need to bring it back up, I would think. Ultimately I think I stepped into a sore point and apologize for any interference on my part. This was simply created last night: I saw a post about how disabling some of the save queries helped with someone's lag, so I looked into said function. I saw the missing indexes, saw the multiple transactions being done (which is usually bad practice for throughput and can leave data inconsistent), and saw the lock on a single database connection.
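To make the single-round-trip idea concrete, a minimal sketch using the generic CLIENT_MULTI_STATEMENTS pattern from the MySQL C API; this is not EQEmu's actual QueryMulti implementation, and the function name is hypothetical.

```cpp
#include <mysql.h>
#include <string>

// Send several statements in one packet and drain every result set:
// one network round trip instead of one per statement.
bool RunBulkQuery(MYSQL *conn, const std::string &multi_sql) {
	// Multi-statement support must be enabled on the connection,
	// either with CLIENT_MULTI_STATEMENTS at connect time or here:
	mysql_set_server_option(conn, MYSQL_OPTION_MULTI_STATEMENTS_ON);

	if (mysql_real_query(conn, multi_sql.c_str(), multi_sql.length()) != 0)
		return false;

	// Every result set must be consumed, even for statements that
	// return no rows, or the connection is left in an unusable state.
	int status = 0;
	do {
		MYSQL_RES *result = mysql_store_result(conn);
		if (result)
			mysql_free_result(result);
		else if (mysql_field_count(conn) != 0)
			return false;  // a statement that should have produced rows failed
	} while ((status = mysql_next_result(conn)) == 0);

	return status == -1;  // -1 means all results were consumed cleanly
}
```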
No worries, I just wanted to bring things into context, as I've spent ridiculous amounts of time on the low-level aspects of our codebase and performance; it is easy to focus in on things that aren't exactly glaring performance issues.

If you have obvious indexes on columns we hit a lot and adding them could save some time, by all means present the case, but I would also want to see it backed up with numbers: how often the query is being called, the depth of the records involved, benchmarks, etc. Adding indexes also carries the overhead of maintaining them, but since we don't insert intensively, and when we do we insert in bulk, it wouldn't be much overhead to maintain the new indexes you would most likely introduce.

For example, here is some work done last year where we ran a 50+ toon burn against a raid mob with massive amounts of packets flying around and realized there was indeed a glaring performance issue at the network level, where we were needlessly sending way too many packet updates. Hard numbers clearly illustrated a massive benefit where the zone would previously choke and collapse upon itself from the sheer number of packets it was trying to keep up with: https://gist.github.com/Akkadius/28b7ad2fdd82bdd15ea737c68f404346
You can also review just about any changelog notation that has a

I apologize if I came off with a sting in any way; I just have a very good understanding of where our current technical expenditures lie, and have spent way more time than I care to have spent so that folks can have a performant and stable server experience.

By all means hit up our Discord channel; we are happy to discuss other things as well if you are thinking of certain things to improve upon, or even open a GitHub issue.
Note: I am sorry about this also including the 'stuck' characters fix, but I forgot to make a branch before the pull request, so the PR is on master and I didn't want to mess it up. I can strip that feature out if need be.
For this feature, however: I have enabled multi-statement results on the MySQL connection and changed the Save method in client.cpp to bulk up all operations into one query.
In very small benchmarks this is over 2x faster, and I believe under load the gain will be much higher. Right now my testing is under zero load and it's already 2x faster.
Example:
ZoneDatabase::SaveCharacterData_OldWay 685273, done... Took 0.013000 seconds
ZoneDatabase::SaveCharacterData_NewWay 685273, done... Took 0.005000 seconds
Note, two indexes need to be added to the pet tables, as part of the update hits these columns and the queries are currently doing full table scans.
ALTER TABLE `character_pet_inventory` ADD INDEX `char_idIndex` (`char_id`);
ALTER TABLE `character_pet_info` ADD INDEX `char_idIndex` (`char_id`);

Note, this is only a stop-gap measure, and while it limits fsyncs during transactions, this may have to be rethought. IMO a disruptor ring buffer and a 1-second heartbeat is a better way to handle this: publish all bulk events to the ring buffer, where they are serialized and appended to a queue per character; either the 1-second event or the size of the buffer 'flushes' the data to MySQL. If multiple saves happen during that period (for whatever reason, simply camping a character causes 3x saves!), the last character save wins. So instead of hitting the database 3x, it is hit only 1x.
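As a rough illustration of that last idea, here is a simplified coalescing buffer (not a true disruptor ring; all class and member names are hypothetical) where the newest save per character wins and the batch is flushed on a one-second heartbeat or when it grows too large. With this shape, camping a character that triggers three saves in quick succession produces a single database write.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Simplified save coalescer: the newest pending save per character
// replaces any older one, and the whole batch is flushed either on a
// 1-second heartbeat or when the buffer grows too large.
class SaveCoalescer {
public:
	void Publish(uint32_t char_id, std::string bulk_sql) {
		pending_[char_id] = std::move(bulk_sql);  // last save wins
		if (pending_.size() >= kMaxPending)
			Flush();
	}

	// Call once per main-loop tick.
	void OnHeartbeat() {
		auto now = std::chrono::steady_clock::now();
		if (now - last_flush_ >= std::chrono::seconds(1))
			Flush();
	}

private:
	void Flush() {
		for (auto &entry : pending_) {
			// Hand the batched SQL to the database layer; with
			// multi-statement support this is one round trip each.
			// database.QueryDatabase(entry.second);
			(void)entry;
		}
		pending_.clear();
		last_flush_ = std::chrono::steady_clock::now();
	}

	static constexpr std::size_t kMaxPending = 64;
	std::unordered_map<uint32_t, std::string> pending_;
	std::chrono::steady_clock::time_point last_flush_ =
		std::chrono::steady_clock::now();
};
```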