New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New TheGamesDB API support for the scraper #520

Open
wants to merge 8 commits into
base: master
from

Conversation

Projects
None yet
4 participants
@acrummyidea
Copy link

acrummyidea commented Jan 10, 2019

Adding new TheGamesDB JSON API scraper, addressing #504 an #516 .
The scraper has all functionality the old scraper had (search by ID search by name, result list etc).
The new api does not directly return the name of publishers and game developers instead it relies on a separate call to get the full list. To avoid downloading such lists I have included two maps containing the relevant information.
To handle JSON responses I have used RapidJson which is now a dependency for the build (I have updated CMake configura files to this end). RapidJson is a headers only library which simplifies things (no new runtime dependencies)
The changes were significant enough that I have created a new class and removed the old scraper to avoid potential reviewers to have to deal with an unwieldy diff.

@acrummyidea acrummyidea changed the title Json scraper New TheGamesDB API support for the scraper Jan 10, 2019

@5schatten

This comment has been minimized.

Copy link

5schatten commented Jan 10, 2019

I struggled a bit to apply this PR as .patch file but once I've forked the repo & merged the PR it works quite fine for me. Tried to scrape some games (Doom / Doom 2 /Quake), could search for custom names (Doom II instead of Doom 2) got the metadata & artwork so I guess it should work.

@acrummyidea

This comment has been minimized.

Copy link

acrummyidea commented Jan 10, 2019

Thanks for testing it!

@5schatten

This comment has been minimized.

Copy link

5schatten commented Jan 10, 2019

No problem I guess we have to thank you for the patch 👍

I was wondering if it possible to add more artwork to the gamelist? Here is an example of a gamelist created by UXS and in addition to the image it downloads an marquee too.

But I guess TGDB lacks additional media in many cases https://thegamesdb.net/game.php?id=50175 ?

@cmitu

This comment has been minimized.

Copy link

cmitu commented Jan 10, 2019

Thank you for submitting this. I've compiled the PR on a PI and did a few tests by

  • individually some ROMs, it seems to work fine.
  • batch scraping seems to be slow - first entry is scraped fine, then the next one just stalls. I'll try to do more testing for this.
  • when the game's description contains ( something ), the text between the parenthesis is replaced with spaces - see https://imgur.com/a/CxjJTjb .
    EDIT: It seems it's only happening with the game in question because of the Japanese chars - https://thegamesdb.net/game.php?id=48705, I scraped Donkey Kong (arcade) and it's not cutting the description between the parenthesis.
@joolswills

This comment has been minimized.

Copy link
Member

joolswills commented Jan 10, 2019

Thanks.

The code styling / indentation does not match that of the rest of the code base. I believe this will need more work before it can be accepted (Feedback from other devs welcome).

It may be better for us to just include the screenscraper PR for now (Which I need to review also).

@acrummyidea

This comment has been minimized.

Copy link

acrummyidea commented Jan 10, 2019

I used the default settings on clang format. I can clean it up tonight. I won't deny getting this in would be a big motivator for me.

@5schatten

This comment has been minimized.

Copy link

5schatten commented Jan 10, 2019

@cmitu @acrummyidea
IMHO it would be great if you could combine your efforts & melt your code = profit for us 👍

An overhauled built-in scraper with the ability to either scrape TGDB or screescraper would be... wait for it... legendary 🥇

@acrummyidea

This comment has been minimized.

Copy link

acrummyidea commented Jan 10, 2019

It's very easy to do since the current code supports multiple scrapers. The PRs can be basically merged sequentially

@cmitu

This comment has been minimized.

Copy link

cmitu commented Jan 10, 2019

@5schatten as @acrummyidea said, you can have both and select between them, ES already supports this.

@5schatten

This comment has been minimized.

Copy link

5schatten commented Jan 10, 2019

Ofc ES supports multiple sources if they are available. But I can just select TGDB atm & as I said earlier it would be great if we could not only scrape some information & an image but also the marquee and maybe the video preview too if it's available.

@cmitu

This comment has been minimized.

Copy link

cmitu commented Jan 10, 2019

@acrummyidea I've re-tested a bit the batch scraping (scraping a whole system at once) and it manages to finish, it's just that after getting the first 2-3 entries, the next ones seem to be retrieved slowly (a couple minutes ?). The actual info comes into screen (metadata plus thumbnail), it looks like the image download & saving happens slowly before advancing to the next entry. It might be some throttling on the server side though.
All tests were on a stock RetroPie install, Raspberry Pi 3B.

@acrummyidea

This comment has been minimized.

Copy link

acrummyidea commented Jan 10, 2019

@cmitu I suspect throttling. I have reloaded all my ROMs to test but in interactive mode. Did you do it disabling interactive?
@5schatten I'll take a look at what else I can fill in in the gameslist. The original scraper was only getting a thumbnail and boxart. The API gives a list of boxart and fanart per game. If there's room in the ES data structure I can add them.

@acrummyidea

This comment has been minimized.

Copy link

acrummyidea commented Jan 11, 2019

@joolswills I have changed the code to adopt the project's coding style. It should be good to go now
I have prepared a clang-format config as close as I could get it to the style I have seen in the .cpp files in the project. I prepared a gist with it: https://gist.github.com/acrummyidea/0cc539b72459d9b5186e57781dfd50cf
I have used clang-format 7 (the stock clang-format in raspbian is older).
@cmitu the config file may be useful to format you PR as well :)
Also regarding the performance I have tried scraping a whole platform without user input and although it's not fast it did not hang, taking between 1 and 2 seconds per game. I'm pretty sure that TheGamesDB throttles accesses with the public API key.
Down the line we should/could add parallel requests in the scraper to speed things up.

@acrummyidea

This comment has been minimized.

Copy link

acrummyidea commented Jan 11, 2019

@cmitu the issue with the missing text in parenthesis is a UI font issue, I tried the very same rom and it work for me (does show Japanese characters). I checked the corresponding gamelist.xml and it’s also ok.

@cmitu

This comment has been minimized.

Copy link

cmitu commented Jan 11, 2019

@acrummyidea Yes, I switched the font and the chars are shown, so it's not a scraper problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment