Feature/save bulk transactions #827

Closed · RekkasGit wants to merge 3 commits

Conversation

RekkasGit (Contributor)

Note: I'm sorry this PR also includes the 'stuck' characters feature. I forgot to make a branch before opening the pull request, so the PR is on master and I didn't want to mess it up. I can strip that feature out if need be.

For this feature: I have enabled multi-statement results on the MySQL connection and changed the Save method in client.cpp to batch all operations into one query.
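(For reference, a minimal sketch of what enabling multi-statement support typically looks like with the MySQL C API; the PR's actual diff may differ, and the function name here is hypothetical:)

#include <mysql.h>
#include <string>

// Sketch: run a ';'-separated batch and drain every result it produces.
// Assumes the connection was opened with the CLIENT_MULTI_STATEMENTS flag, e.g.
//   mysql_real_connect(conn, host, user, pass, db, port, nullptr, CLIENT_MULTI_STATEMENTS);
bool RunBulkQuery(MYSQL *conn, const std::string &bulk_sql) {
	if (mysql_real_query(conn, bulk_sql.c_str(), bulk_sql.size()) != 0)
		return false;
	do {
		MYSQL_RES *result = mysql_store_result(conn);
		if (result)
			mysql_free_result(result); // UPDATE/DELETE statements return no row data
	} while (mysql_next_result(conn) == 0); // 0 means another result is pending
	return true;
}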

In very small benchmarks this is over 2x faster, and I believe under load it will be much higher. Right now my testing is under zero load and it's already 2x faster.
Example:
ZoneDatabase::SaveCharacterData_OldWay 685273, done... Took 0.013000 seconds
ZoneDatabase::SaveCharacterData_NewWay 685273, done... Took 0.005000 seconds

Note: two indexes need to be added to the pet tables, as part of the update hits these columns and the queries are currently doing full table scans.

ALTER TABLE character_pet_inventory
ADD INDEX char_idIndex (char_id);

ALTER TABLE character_pet_info
ADD INDEX char_idIndex (char_id);

Note: this is only a stop-gap measure, and while it limits "fsyncs" during transactions, it may have to be rethought. IMO a disruptor ring buffer plus a 1-second heartbeat is a better way to handle this: publish all bulk events to the ring buffer, where they are serialized and appended to a queue per character; either the 1-second event or the size of the buffer will 'flush' the data to MySQL. If multiple saves happen during that period (for whatever reason, simply camping a character causes 3x saves!), the last character save wins. So instead of hitting the database 3x, only 1x does.
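A rough sketch of that idea (all names hypothetical; a mutex-guarded map stands in for a real lock-free disruptor ring):

#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

class SaveCoalescer {
public:
	// Publish a serialized bulk-save, keyed by character so the last save wins.
	void Publish(uint32_t char_id, std::string query) {
		std::lock_guard<std::mutex> lock(mutex_);
		pending_[char_id] = std::move(query);
		if (pending_.size() >= kMaxPending)
			FlushLocked();
	}

	// Driven by a 1-second heartbeat from the zone's main loop.
	void OnHeartbeat() {
		std::lock_guard<std::mutex> lock(mutex_);
		FlushLocked();
	}

private:
	void FlushLocked() {
		if (pending_.empty())
			return;
		// Send everything in one transaction so the batch costs one fsync,
		// e.g. database.QueryDatabase("START TRANSACTION"); ... "COMMIT";
		for (auto &entry : pending_) {
			// database.QueryDatabase(entry.second);
		}
		pending_.clear();
	}

	static const size_t kMaxPending = 256; // size threshold that also triggers a flush
	std::mutex mutex_;
	std::unordered_map<uint32_t, std::string> pending_;
};

With this in place, the three saves fired by a single camp collapse into one database write per character per flush window.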

zone/client.cpp Outdated
@@ -606,10 +607,118 @@ void Client::RemoveExpendedAA(int aa_id)
database.QueryDatabase(StringFormat("DELETE from `character_alternate_abilities` WHERE `id` = %d and `aa_id` = %d", character_id, aa_id));
}

//bool Client::Save(uint8 iCommitNow) {
Member

Don't do this. This code is saved in the version control software :P

RekkasGit (Author)

Ahh, I left that there for ease of switching back and forth between the old and new code. I assumed I would do a full cleanup before this was even accepted. This is my first PR to this project.

zone/tasks.cpp Outdated
@@ -280,7 +280,138 @@ bool TaskManager::LoadTasks(int singleTask)

return true;
}
std::string TaskManager::SaveClientStateQuery(Client *c, ClientTaskState *state)
Member

Ignoring the fact that VS lets this compile for some reason: this is also written in a way that may fail NRVO, and since this is a performance-focused commit, that is a very relevant point :P
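(For illustration only, not the PR's actual code: a common way NRVO is lost, and the shape that keeps it.)

#include <string>

// Returning different named locals on different paths blocks NRVO; the
// compiler must move/copy the (potentially large) string on return.
std::string BuildQueryNoNRVO(bool use_new_path) {
	std::string a = "...";
	std::string b = "...";
	if (use_new_path)
		return a;
	return b;
}

// A single named return object keeps NRVO applicable on every path; the
// string is constructed directly in the caller's storage.
std::string BuildQueryNRVO(bool use_new_path) {
	std::string query;
	query = use_new_path ? "..." : "...";
	return query;
}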

RekkasGit (Author)

Hopefully this is resolved with pass-by-reference.

mackal (Member) commented Mar 8, 2019

From a performance standpoint, I think it might be better for all of these to take an out parameter taken by reference. This would allow us to just continue to grow a single std::string object instead of having a bunch of independently growing std::string objects that ultimately get combined into a single std::string.
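A minimal sketch of that pattern (helper names hypothetical; StringFormat and the pet-table names appear elsewhere in this PR):

// Each helper appends into a caller-owned buffer instead of returning a string.
void AppendPetInfoQuery(std::string &out, uint32_t char_id) {
	out += StringFormat("DELETE FROM `character_pet_info` WHERE `char_id` = %u; ", char_id);
}

void AppendPetInventoryQuery(std::string &out, uint32_t char_id) {
	out += StringFormat("DELETE FROM `character_pet_inventory` WHERE `char_id` = %u; ", char_id);
}

// The caller reserves once, and every helper grows the same allocation:
std::string bulk;
bulk.reserve(16 * 1024); // avoid repeated reallocation as the batch grows
AppendPetInfoQuery(bulk, character_id);
AppendPetInventoryQuery(bulk, character_id);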

RekkasGit (Author) commented Mar 8, 2019

From a performance standpoint, I think it might be better for all of these to take an out parameter taken by reference. This would allow us to just continue to grow a single std::string object instead of having a bunch of independently growing std::string objects that ultimately get combined into a single std::string.

I fully agree and was already thinking about that. Think of this as an initial rough draft / proof of concept. I wrote most of this last night.

I will have edits up shortly.

Akkadius (Member) commented Mar 8, 2019

@livermana, thank you so much for spending the time and putting together a PR; we always appreciate folks who take the initiative to contribute to the project.

There are quite a few things I want to highlight about this PR and why it definitely won't make it to mainline:

  1. We don't want to enable multiple statements per query; this leaves our non-parameterized statements open to SQL injection and is a security risk
  2. You don't need multiple statements in one query to perform a transaction; we have methods to start transactions and commit them, with examples in the code where it makes sense
  3. Multiple statements can be wrapped inside said transaction to reduce the overall parsing and execution time, because the server does it all in one fell atomic swoop (see the sketch after this list)
  4. EQEmu is not database heavy at all. There was once a time when we did all kinds of stupid things, but hundreds of hours have been poured into reducing our performance bottlenecks across the board in CPU, I/O and even network. To illustrate, here is PEQ: http://peq.akkadius.com:19999/#menu_disk_submenu_sda;theme=slate;help=true
  5. Don't mix two separate PRs together
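A minimal sketch of points 2 and 3, assuming raw SQL through the QueryDatabase() call seen elsewhere in this diff (table and column names illustrative; the dedicated transaction helpers referenced above may look different):

database.QueryDatabase("START TRANSACTION");
database.QueryDatabase(StringFormat("UPDATE `character_data` SET `level` = %u WHERE `id` = %u", level, character_id));
database.QueryDatabase(StringFormat("DELETE FROM `character_pet_inventory` WHERE `char_id` = %u", character_id));
// ... the remaining per-table saves, each still a single statement ...
database.QueryDatabase("COMMIT"); // the whole batch becomes one atomic flush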

On the Performance Standpoint

Since I'm assuming this is related to your chiming in on the lag-problems thread, I will just restate the following:

With PEQ at 800-1,000 toons daily, we barely break 100 IOPS, with barely 1MB/s in writes and minimal spikes here and there. That is virtually nothing.

Also, against your benchmark, the current stock code on some middle-of-the-line server hardware produces the following timings:

[Debug] ZoneDatabase::SaveCharacterData 1, done... Took 0.000107 seconds
[Debug] ZoneDatabase::SaveCharacterData 1, done... Took 0.000091 second

This is a sub-1ms operation that is not going to hurt even when it is synchronous.

These operations also happen so infrequently that it is not going to even matter.

There are many, many factors that play into the overall performance of a server. Since the server is essentially an infinite loop, anything within that loop can influence the amount of time the CPU is not spent idling (network, I/O, overly CPU-intensive operations, etc.). Hardware is an influence, your MySQL server configuration is an influence, and most of all software is an influence.

EQEmu used to be way, way more resource intensive, and we've come a long way to where that is not even an issue anymore. We have one outstanding bug isolated to the networking layer that made its way through because we never saw it on PEQ during our normal QA routines.

We are currently working on code to measure application-layer network stats so folks can give us metric dumps and we can produce a proper fix. We've seen what happens during the network anomaly under a CPU profile, and there's not much it is going to show on its own beyond where the time is being spent.

We folks at EQEmu definitely have jobs that have been a deterrent to resolving said bug, but we will have it on lockdown soon enough, as we know exactly what we need to do; the only thing in our way is time as a resource.

As far as your other 'stuck' PR goes, that could definitely be useful, but I defer to @KimLS to review that PR.

As far as this PR is concerned, though, the footprint of these queries is so incredibly small that it is not worth trying to optimize further, unless you wanted to wrap a handful of saves between transaction begin and end calls, but that is about it.

Akkadius closed this Mar 8, 2019
RekkasGit (Author)

A well thought out and detailed reply, though with a touch of sting in it.
I would still suggest adding the indexes to the two tables.

  1. We don't want to enable multiple statements per query; this leaves our non-parameterized statements open to SQL injection and is a security risk

This was a new method for QueryMulti; other statements were not impacted (though I apologize if I am wrong about this). I checked that the statements in question were not parameterized, just raw SQL sent down the stack.

  2. You don't need multiple statements in one query to perform a transaction; we have methods to start transactions and commit them, with examples in the code where it makes sense

That was very much understood; I was just trying to limit the network hits. There's no reason to make multiple network round trips when you can make one, given the cost of crossing kernel boundaries, network switches, etc.

  4. EQEmu is not database heavy at all. There was once a time when we did all kinds of stupid things, but hundreds of hours have been poured into reducing our performance bottlenecks across the board in CPU, I/O and even network. To illustrate, here is PEQ: http://peq.akkadius.com:19999/#menu_disk_submenu_sda;theme=slate;help=true

Thank you for the charts, very nice.

  5. Don't mix two separate PRs together

I stated in the PR itself that this was an issue and that I would resolve it, so no need to bring it back up, I would think.

Ultimately I think I stepped on a sore point, and I apologize for any interference on my part. I simply created this last night after seeing a post about how disabling some of the save queries helped with someone's lag, so I looked into said function. I saw the missing indexes, saw the multiple transactions being issued (which is usually bad practice for throughput and data consistency), and saw the lock on a single database connection.
So I tried a proof of concept, saw a marked improvement (the database was separate from the eqemu instance) while keeping data consistency, and thus made the PR to see if it could help out.

Akkadius (Member) commented Mar 8, 2019

No worries. I just wanted to bring things into context; I've spent ridiculous amounts of time on the low-level aspects of our codebase and its performance, so it is easy to focus in on things that aren't exactly glaring performance issues.

If there are obvious indexes that we hit a lot and they could save some time, by all means present the case, but I would also want to see it backed up with numbers: how often the query is called, the depth of the records involved, benchmarks, etc. Indexes also carry the overhead of being maintained, but since we don't insert intensively, and when we do we insert in bulk, the new indexes you would likely introduce wouldn't be much overhead to maintain.

For example, here is some work done last year where we ran a 50+ toon burn against a raid mob with massive amounts of packets flying around and realized there were indeed some glaring performance issues at the network level: we were negligently sending way too many packet updates. Hard numbers clearly illustrated a massive benefit, where previously the zone would choke and collapse upon itself from the sheer number of packets it was trying to keep up with.

https://gist.github.com/Akkadius/28b7ad2fdd82bdd15ea737c68f404346

	- Before and after packet differences under similar load/tests (packets per second):
		- OP_Animation:    before 7,000 - 8,000 pps,   after 600-800 pps
		- OP_MobHealth:    before 13,000 - 17,000 pps, after 1-10 pps
		- OP_ClientUpdate: before 15,000 - 20,000 pps, after 500-1,000 pps

You can also review just about any changelog entry that has a performance tag to see other work done there as well.

I apologize if I came off with a sting in any way. I just have a very good understanding of where our current technical expenditures lie, and I have spent way more time than I care to admit so folks can have a performant and stable server experience.

By all means hit up our Discord channel; we are happy to discuss other ideas for improvement as well, or you can open a GitHub issue.
