Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pcsx2: Improve GameDB handling #2567

Merged
merged 6 commits into from Aug 29, 2018

Conversation

Projects
None yet
3 participants
@turtleli
Copy link
Member

commented Aug 26, 2018

Changes:

  • Remove GameDB write code - I removed the dialog, so nothing uses this code now.
  • Simplify Game_Data member functions - The overloads were kinda unnecessary and it fixes an infinite recursion warning on FreeBSD (basically, wxWidgets 3 wxString conversion fun).
  • Simplify hashing - There's no point in making everything lowercase when the actual key is case-sensitive, and we can just use std::hash instead. It improves load time a bit (more so on Windows than on Linux) and fixes a FreeBSD compile error.
  • Remove the Blocktable - it hardly improves performance compared to directly using the unordered_map. The blocktable was only marginally better on Windows and it made no difference on Linux (and before I removed the lowercase conversion in the hash function the blocktable was more than marginally worse on Windows). Note that this slightly changes the behaviour if there are duplicate serial entries in the GameDB (current behaviour is to use only the last duplicate entry, PR behaviour is to merge the entries, with the last duplicate entry replacing any fields that are present in previous duplicate entries). Though this shouldn't matter much because we shouldn't really be keeping/introducing any duplicate GameDB entries.
  • Don't store the GameDB serial in the Game_Data key list - it's already stored in the id member variable, so there's no real need to store it again. It improves load time a little on Windows (didn't notice Linux improvements).
  • Remove HashMap from Utilities - It's now unused due to the hashing change.

On my Windows system the GameDB load time on a VS2017 release build goes from ~134ms to ~118ms (debug build went from something like ~2200-2300ms to ~1800ms). On Linux the load time goes from ~132ms to ~128ms.

turtleli added some commits Aug 26, 2018

pcsx2: Remove GameDB write related code
There's no use for it now that the database editor has been removed.

Also remove the Game_Data POD comment because I don't think it makes
sense and remove an unused variable.
pcsx2: Simplify Game_Data member functions
There's no need to have a lot of overloads accepting wxChar*, char* cand
const wxString&. Keep only the const wxString& versions and remove the
rest. This fixes an infinite recursion warning on FreeBSD.

Also simplify sectionExists and getSection to avoid unnecessary
conversion to and from wxString.
pcsx2: Simplify GameDB hashing
Converting the string to lowercase is unnecessary when the actual entry
is still case sensitive.

Also just use std::hash of std::string and std::wstring instead, which
fixes a FreeBSD compile error (cannot convert to const char*).
utilities: Remove HashMap
It's now unused.
@avih

This comment has been minimized.

Copy link
Member

commented Aug 27, 2018

Nice.

Simplify hashing - There's no point in making everything lowercase when the actual key is case-sensitive

Not sure I understand this one. Does this mean the serial (or CRC?) is now case sensitive at GameIndex.dbf ?

@turtleli

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2018

The serial is case sensitive, but it's like that on master as well.

@turtleli

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2018

I should probably provide a better explanation.

Essentially, I have no clue why it's converting the string to lowercase in the hash function. It increases the hashing time but I don't see what for. If the intention was to make the key lookup case-insensitive then no, it doesn't do that.

@avih

This comment has been minimized.

Copy link
Member

commented Aug 27, 2018

Please be patient with me as I don't recall the details.

A hash is usually used when you want a fast access to some already-stored data by some key.

I'm guessing the stored data is the GameIndex.dbf items and the keys are the serials, yes?

When is it being accessed? and where is the access key taken from while accessing it by key?

@turtleli

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2018

I'm guessing the stored data is the GameIndex.dbf items and the keys are the serials, yes?

Yes.

When is it being accessed? and where is the access key taken from while accessing it by key?

The data is accessed when loading a game. The key is provided by the GameDB file when creating the database or from the disc serial (ELF filename modified into serial form) when loading a game.

Actually, now that I think about it, what was probably missing was a case insensitive equality operator like

struct StringCompareNoCase
{
    bool operator()(const wxString &lhs, const wxString &rhs) const
    {
        return lhs.CmpNoCase(rhs) == 0;
    }
}

which actually would have required a case insensitive hash. I assume that's what was intended, otherwise the lowercase really makes no sense.

@avih

This comment has been minimized.

Copy link
Member

commented Aug 27, 2018

The data is accessed when loading a game

Right, so performance isn't really a concern here, like it would have been, e.g. if it was tested every frame at GSdx (and even there it'd be completely negligible to convert to lower case, but that'd be at least at a more performance critical path).

Of course, unnecessary code should be dropped even if it's fast.

The key is provided by the GameDB file when creating the database or from the disc serial

Right, so if the case is not identical between the games db serial (which is entered manually at GameIndex.dbf) and the generated serial then there will be no match.

Obviously if only one of them is converted to lower case then it's no good, and it seems the one at GameIndex.dbf was converted to lower case.

what was probably missing was a case insensitive equality operator like ..

Is it possible that the ELF-derived serial is already converted to lower case elsewhere?
Otherwise, if the games db serial is converted and the ELF-derived one isn't, then there would have never been any matches... (unless the ELF-derived key is only lower-case string to begin with).

@turtleli

This comment has been minimized.

Copy link
Member Author

commented Aug 27, 2018

Right, so performance isn't really a concern here,

Yes, main thing was reducing the compile errors/warnings on FreeBSD, but I got sidetracked questioning why some of the stuff was necessary...

it seems the one at GameIndex.dbf was converted to lower case.
Is it possible that the ELF-derived serial is already converted to lower case elsewhere?
Otherwise, if the games db serial is converted and the ELF-derived one isn't, then there would have never been any matches... (unless both key sources only had lower-case string to begin with).

No. Both serials are uppercase. The hash function in master calculates the hash based on the lower case version of the serial but doesn't modify the serial itself.

The hash is used only for the first part of the lookup to find the bucket the data is in (if it exist). The key is then compared with the other keys in the bucket to find a match based on what the equality criteria is.

Converting the serial to lowercase in the hash only changes what bucket the data is accessed/stored in. If the serial comparison was case insensitive then that would make sense as it'd put all the hash for all upper/lower/mixed case variations of the serial would be the same into the same bucket and would prevent duplicate keys from being stored.

@@ -139,8 +110,10 @@ void DBLoaderHelper::ReadGames()
if (!extractMultiLine()) extract();

if (!m_keyPair.IsOk()) continue;
if (m_keyPair.CompareKey(m_gamedb.getBaseKey()))
if (m_keyPair.CompareKey(m_gamedb.getBaseKey())) {

This comment has been minimized.

Copy link
@lightningterror

lightningterror Aug 28, 2018

Member

I'd prefer if the bracket started in the next line under if. To keep it consistent with the rest of the code in the file.

Edit: Actually nvm, it's 50-50 so it doesn't matter.

@avih

This comment has been minimized.

Copy link
Member

commented Aug 28, 2018

Thanks for the explanation.

The hash function in master calculates the hash based on the lower case version

"in master" means before this patchset, yes?

Anyway, yes, that's what I meant, though I didn't realize it's used both converted and in original case:

The hash is used only for the first part of the lookup to find the bucket the data is in (if it exist). The key is then compared with the other keys in the bucket to find a match based on what the equality criteria is.

Converting the serial to lowercase in the hash only changes what bucket the data is accessed/stored in

Which I find a bit weird. I agree that if lower case is only used for locating the bucket then it's indeed useless.

I also assume that all the serials both at GameIndex.dbf and the ones derived from the ELF are all upper case, though that might require some closer inspection, at least of the DB.

Is there any reason not to treat all serials as case-less (specifically convert all to lower case both for the task of choosing a bucket and for the final comparison)?

It seems to me that completely ignoring the case would be more resilient to user errors at the DB, and I find it extremely hard to believe that two serials which differ only by case actually refer to genuinely different disks, right?

And who knows, it might fix some lower-case serial at the DB which were never matched till now ;)

@turtleli

This comment has been minimized.

Copy link
Member Author

commented Aug 28, 2018

"in master" means before this patchset, yes?

Yes.

Is there any reason not to treat all serials as case-less (specifically convert all to lower case both for the task of choosing a bucket and for the final comparison)?

It seems to me that completely ignoring the case would be more resilient to user errors at the DB, and I find it extremely hard to believe that two serials which differ only by case actually refer to genuinely different disks, right?

There is a good reason. I would have to redo my commits. ;)

Nah, it's probably fine to make it case insensitive unless someone else knows a reason for why it should be case sensitive. I'll update the PR soon(ish).

@avih

This comment has been minimized.

Copy link
Member

commented Aug 28, 2018

There is a good reason. I would have to redo my commits

Actually, I also find it a good reason. It also keeps the code simpler, and doesn't take away any robustness (which the original code didn't have anyway).

I'd be fine with just taking the serials as is everywhere.

I'm guessing that if someone adds a serial to the DB, it's their responsibility to make sure that it's actually recognized when running the game disk. It worked so far ;)

@turtleli

This comment has been minimized.

Copy link
Member Author

commented Aug 28, 2018

Well, it's not really much more complex, but yeah, I guess having people make sure any serials they add are in uppercase shouldn't be asking too much of them.

Update aborted. ;)

@avih

This comment has been minimized.

Copy link
Member

commented Aug 28, 2018

Now push, push!

turtleli added some commits Aug 26, 2018

pcsx2: Remove GameDB blocktable
It doesn't provides much of a performance improvement over directly
using an unordered map. This change also means that if there are
duplicate GameDB entries then they'll be merged together instead of
having only the last entry take effect.

Also increase the unordered map reserve size.
pcsx2: Don't store the GameDB serial in the key list
It's already available in the id field, so storing it again is a waste
of space and CPU cycles.

@turtleli turtleli force-pushed the turtleli:improve-gamedb-handling branch from c9f7686 to 10740b2 Aug 29, 2018

@turtleli

This comment has been minimized.

Copy link
Member Author

commented Aug 29, 2018

Updated to increase minimum bucket count. No other changes.

Will merge after buildbots are done.

@turtleli turtleli merged commit a922a0b into PCSX2:master Aug 29, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@turtleli turtleli deleted the turtleli:improve-gamedb-handling branch Aug 29, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.