Giant Bomb database cache and updater

Joshua Marketis edited this page May 3, 2016 · 1 revision

Giant Bomb rate-limit their API, so GWL is designed to cache data and update it periodically rather than make live requests. The one exception is search, which currently always makes a live request.

Giant Bomb say...

We restrict the number of requests made per user/hour. We officially support 200 requests per resource, per hour. In addition, we implement velocity detection to prevent malicious use. If too many requests are made per second, you may receive temporary blocks to resources. These features are included to help keep the API healthy for all of our users. If you have a question regarding rate limiting, please comment in our API Developer forum. It is suggested that you cache responses in your app to prevent duplicated requests from making unique requests.

When a game page loads, the site checks whether the game exists in the database. If it does, the page is rendered from the cached data; if it doesn't, a request is made to Giant Bomb's Game API resource and the result is cached in the database.
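The page-load behaviour is a cache-aside lookup. The sketch below illustrates it in Python under stated assumptions: the `games` table with `id` and `data` columns, the function name `get_game`, and the `fetch_game` callback (standing in for the live request to the Game resource) are all hypothetical and are not taken from the GWL code, which may be structured quite differently.

```python
import json
import sqlite3
from typing import Callable


def get_game(conn: sqlite3.Connection, game_id: str,
             fetch_game: Callable[[str], dict]) -> dict:
    """Return a game from the local cache, fetching and caching it on a miss.

    Hypothetical schema: games(id TEXT PRIMARY KEY, data TEXT holding JSON).
    """
    row = conn.execute(
        "SELECT data FROM games WHERE id = ?", (game_id,)
    ).fetchone()
    if row is not None:
        # Cache hit: serve the stored copy without calling the API.
        return json.loads(row[0])
    # Cache miss: one live request to the Game resource, then store the result.
    game = fetch_game(game_id)
    conn.execute(
        "INSERT INTO games (id, data) VALUES (?, ?)", (game_id, json.dumps(game))
    )
    conn.commit()
    return game
```

With a stub fetcher, a second call for the same id is answered from the table, so the stub (and therefore the API) is only hit once per game.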

Update

The updater requests 100 games from the Games API resource and logs the returned JSON in the apiLog table. It runs every 5 minutes (12 times an hour, 288 times a day). At the time of writing there are 49,361 games in the Giant Bomb database, requiring 494 requests (49,361 / 100, rounded up) to update every game. At 288 requests a day, the entire database is refreshed roughly every 1.7 days, i.e. just under 2 days. The updater could run much faster within the current rate limits, since 12 requests an hour is well below the 200 requests per resource per hour that Giant Bomb support, but we prefer to be good citizens.
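The refresh-cycle figures can be checked with a few lines of arithmetic. The constants come from the paragraph above and the quoted rate limit; the variable names are only illustrative.

```python
import math

GAMES_IN_DATABASE = 49_361  # count at the time of writing
GAMES_PER_REQUEST = 100     # games returned per updater request
RUNS_PER_HOUR = 60 // 5     # one run every 5 minutes
HOURLY_LIMIT = 200          # officially supported requests per resource, per hour

requests_per_cycle = math.ceil(GAMES_IN_DATABASE / GAMES_PER_REQUEST)
runs_per_day = RUNS_PER_HOUR * 24
days_per_cycle = requests_per_cycle / runs_per_day
hourly_share = RUNS_PER_HOUR / HOURLY_LIMIT

print(requests_per_cycle)        # 494
print(runs_per_day)              # 288
print(round(days_per_cycle, 2))  # 1.72
print(f"{hourly_share:.0%}")     # 6%
```

The updater therefore uses about 6% of the supported hourly budget for the Games resource.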

This task can be scheduled in crontab using the provided Python script.

Crontab: */5 * * * * cd /var/www/html/gwl/cron/ && python3 updateGameCache.py >> output_logs.op
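One way an updater run could page through the Games resource is sketched below. This is not the contents of updateGameCache.py: the `updaterState` table, the `body` column of apiLog, the function names, and the injected `fetch` callback are hypothetical. The `api_key`, `format`, `limit` and `offset` query parameters are assumed from the Giant Bomb API documentation, and `total_games` is passed in rather than read from the API.

```python
import sqlite3
from typing import Callable
from urllib.parse import urlencode

API_URL = "https://www.giantbomb.com/api/games/"
PAGE_SIZE = 100  # games per request, as described above


def games_page_url(api_key: str, offset: int) -> str:
    """Build the request URL for one page of the Games resource."""
    query = urlencode({"api_key": api_key, "format": "json",
                       "limit": PAGE_SIZE, "offset": offset})
    return f"{API_URL}?{query}"


def next_offset(offset: int, total_games: int) -> int:
    """Advance by one page, wrapping back to 0 after the last page."""
    offset += PAGE_SIZE
    return 0 if offset >= total_games else offset


def run_once(conn: sqlite3.Connection, api_key: str, total_games: int,
             fetch: Callable[[str], str]) -> int:
    """Fetch one page, queue its raw JSON in apiLog, and save the next offset.

    Returns the offset that was requested on this run.
    """
    row = conn.execute("SELECT next_offset FROM updaterState").fetchone()
    offset = row[0] if row else 0
    body = fetch(games_page_url(api_key, offset))
    with conn:  # log the response and advance the offset in one transaction
        conn.execute("INSERT INTO apiLog (body) VALUES (?)", (body,))
        conn.execute("DELETE FROM updaterState")
        conn.execute("INSERT INTO updaterState (next_offset) VALUES (?)",
                     (next_offset(offset, total_games),))
    return offset
```

Persisting the offset between runs lets each cron invocation stay short and stateless; with 49,361 games the offsets run 0, 100, ..., 49,300 (494 pages) before wrapping to 0.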

Process

The processor runs every minute. Each run takes the oldest entry in the apiLog table and processes it, updating existing games and adding new ones found in the JSON returned by the API. In addition to the logs added by the updater, this task also processes the results of searches made by site users. Once an API response has been processed, its JSON is deleted from the database.
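A single processor step might look like the following sketch. It assumes each logged body is a JSON object whose `results` key holds a list of game objects with an `id` field; the `games(id, data)` table, the apiLog `id`/`body` columns, and the name `process_oldest` are hypothetical, and processAPILog.py may work differently.

```python
import json
import sqlite3


def process_oldest(conn: sqlite3.Connection) -> int:
    """Apply the oldest queued API response to the games table, then delete it.

    Returns the number of games written (0 if the queue is empty).
    """
    row = conn.execute(
        "SELECT id, body FROM apiLog ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return 0  # nothing queued
    log_id, body = row
    results = json.loads(body).get("results", [])
    with conn:  # the game writes and the log deletion commit (or roll back) together
        for game in results:
            # Insert new games; replace the stored copy of games we already have.
            conn.execute(
                "INSERT OR REPLACE INTO games (id, data) VALUES (?, ?)",
                (game["id"], json.dumps(game)),
            )
        conn.execute("DELETE FROM apiLog WHERE id = ?", (log_id,))
    return len(results)
```

Doing the writes and the deletion in one transaction means a crash mid-run leaves the log entry in place to be retried on the next minute, rather than losing it.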

This task can be scheduled in crontab using the provided Python script.

Crontab: * * * * * cd /var/www/html/gwl/cron/ && python3 processAPILog.py >> output_logs.op
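The minute field of a crontab entry decides how often each task fires. The simplified expander below (a hypothetical helper, not a full cron parser: it ignores the other four fields and named values) shows that `*/5` matches 12 minutes an hour and `*` matches all 60.

```python
def expand_minute_field(field: str) -> list[int]:
    """Expand a cron minute field into the sorted minutes (0-59) it matches.

    Handles '*', '*/N', 'A', 'A-B', 'A-B/N' and comma-separated lists of those.
    """
    minutes: set[int] = set()
    for part in field.split(","):
        range_part, _, step_part = part.partition("/")
        step = int(step_part) if step_part else 1
        if range_part == "*":
            start, end = 0, 59
        elif "-" in range_part:
            low, high = range_part.split("-")
            start, end = int(low), int(high)
        elif step_part:
            raise ValueError(f"unsupported minute syntax: {part!r}")
        else:
            start = end = int(range_part)
        if not 0 <= start <= end <= 59 or step < 1:
            raise ValueError(f"out-of-range minute field: {part!r}")
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


for field in ("*/5", "*"):
    print(field, "->", len(expand_minute_field(field)), "runs per hour")
# */5 -> 12 runs per hour
# * -> 60 runs per hour
```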