[contents] Add server side caching for all requests (If-Modified-Since) #889

Merged
merged 1 commit into from
Nov 19, 2018

logmanoriginal
Member

@logmanoriginal logmanoriginal commented Oct 26, 2018

This PR adds a cache for 'getContents' to '/cache/server'. All contents are cached by default (even in debug mode). If debug mode is enabled, the cached data is overwritten on each request.

In normal mode RSS-Bridge adds the 'If-Modified-Since' header with the timestamp from the previously cached data (if available) to the request.

Find more information on 'If-Modified-Since' here:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since

  • If the server responds with "304 Not Modified", the cached data is returned.
  • If the server responds with "200 OK", the received data is written to the cache (a new cache file is created if none exists yet).
  • Handling of all other response codes is unchanged.
  • Servers that don't support the 'If-Modified-Since' header will respond with "200 OK".

For servers that respond with "304 Not Modified", the required bandwidth decreases and RSS-Bridge responds faster.
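
For illustration, the request flow described above can be sketched roughly as follows. This is a minimal sketch and not the code in lib/contents.php: the function names `buildIfModifiedSinceHeader`, `resolveResponse` and `fetchWithCache` are hypothetical, and the cache entry is a plain array rather than a FileCache instance.

```php
<?php
// Sketch of the conditional-request flow. All names are hypothetical;
// the cache entry is a plain array: ['time' => int, 'data' => string].

/** Format a Unix timestamp as an HTTP-date for the If-Modified-Since header. */
function buildIfModifiedSinceHeader(int $timestamp): string
{
    return 'If-Modified-Since: ' . gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

/** Pick the data to return for a response and update the cache entry on 200. */
function resolveResponse(int $status, string $body, ?array &$entry, int $now): string
{
    if ($status === 304 && $entry !== null) {
        return $entry['data']; // Not Modified: serve the cached data
    }
    if ($status === 200) {
        $entry = ['time' => $now, 'data' => $body]; // OK: create or overwrite the entry
    }
    return $body; // any other status: cache left untouched
}

/** Fetch $url, sending If-Modified-Since only in normal mode with a cached entry. */
function fetchWithCache(string $url, ?array &$entry, bool $debug = false): string
{
    $headers = [];
    if (!$debug && $entry !== null) {
        $headers[] = buildIfModifiedSinceHeader($entry['time']);
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return resolveResponse($status, $body === false ? '' : $body, $entry, time());
}
```

In debug mode no header is sent, so a server that answers "200 OK" overwrites the cached entry on every request, which matches the behaviour described above.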

Files in the cache are forcefully removed after 24 hours.
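
As a rough sketch of what the 24-hour purge amounts to (assuming one file per cached URL; `purgeOldFiles` is a hypothetical helper, not the FileCache::purgeCache implementation):

```php
<?php
// Hypothetical helper: delete files in $dir whose modification time is
// older than $maxAge seconds. Returns the number of files removed.
function purgeOldFiles(string $dir, int $maxAge = 86400): int
{
    $removed = 0;
    $cutoff = time() - $maxAge;

    // glob() may return false on error, so fall back to an empty list.
    foreach (glob(rtrim($dir, '/') . '/*') ?: [] as $file) {
        if (is_file($file) && filemtime($file) < $cutoff && unlink($file)) {
            $removed++;
        }
    }

    return $removed;
}
```

The purge is based on file age only, so an entry is dropped after 24 hours even if the server would still answer "304 Not Modified" for it.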

Notice: Only a few servers actually support 'If-Modified-Since'. Thus, most bridges won't be affected by this change.


I have only tested a few bridges (maybe 10) and so far only "Bastamag" and "Bundesbank" are responding with "304 Not Modified".

I did some timing for "Bundesbank" on 10 consecutive requests (debug mode enabled, but not skipping 'If-Modified-Since'). Compared to the current master it shows an improvement of two seconds (approx. 7 seconds on master, 5 seconds on this PR). Bridges that load more contents might show even better results.

Let me know if you have any suggestions for improvement.

@logmanoriginal added the New-Feature label Oct 26, 2018
@logmanoriginal self-assigned this Oct 26, 2018
@em92
Contributor

em92 commented Nov 12, 2018

I would suggest adding a $cache_opts parameter to getContents:

diff --git a/lib/contents.php b/lib/contents.php
index de36bbd..5493123 100644
--- a/lib/contents.php
+++ b/lib/contents.php
@@ -1,13 +1,26 @@
 <?php
-function getContents($url, $header = array(), $opts = array()){
+function getContents($url, $header = array(), $opts = array(), $cache_opts = array()){
+       $cache_opts = array_merge(array(
+               'path' => CACHE_DIR . '/server',
+               'purge_cache_time' => 86400, // 24 hours
+               'force_cache' => false,
+       ), $cache_opts);
+
        // Initialize cache
        $cache = Cache::create('FileCache');
-       $cache->setPath(CACHE_DIR . '/server');
-       $cache->purgeCache(86400); // 24 hours (forced)
+       $cache->setPath($cache_opts['path']);
+       $cache->purgeCache($cache_opts['purge_cache_time']);
 
        $params = [$url];
        $cache->setParameters($params);
 
+       if ($cache_opts['force_cache']) {
+               $result = $cache->loadData();
+               if (!is_null($result)) {
+                       return $result;
+               }
+       }
+
        debugMessage('Reading contents from "' . $url . '"');
 
        $ch = curl_init($url);

One use case is fetching YouTube videos (to get upload dates) from large unordered playlists. In combination with certain other changes, it can fully fix #647 (which was incorrectly closed and should be reopened).

Example usage (run more than 1 time to see latency difference):

<?php
ini_set('display_errors', '1');
error_reporting(E_ALL);
define('DEBUG', true);

require_once __DIR__ . '/lib/RssBridge.php';

define('CACHE_DIR', __DIR__ . '/cache');

Cache::setDir(__DIR__ . '/caches/');

echo getContents("https://www.youtube.com/watch?v=3QwR8FBhq3Q", [], [], [
  'force_cache' => true,
  'path' => CACHE_DIR . '/youtube',
  'purge_cache_time' => 60*60*24*2 // 48 hours
]);

@logmanoriginal
Member Author

Please correct me if I'm wrong. What you are asking for is basically a function similar to getSimpleHTMLDOMCached, but without the SimpleHTMLDOM (i.e. getContentsCached), right?
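
To make the idea concrete, such a function could look roughly like the sketch below. `getContentsCached` does not exist in this PR; the signature, the injected `$fetch` callable and the file naming are assumptions made purely for illustration.

```php
<?php
// Hypothetical getContentsCached: a purely time-based cache around any
// fetch function. Unlike the If-Modified-Since flow, no request is sent
// at all while the cached copy is younger than $duration seconds.
function getContentsCached(string $url, callable $fetch, string $cacheDir, int $duration = 86400): string
{
    if (!is_dir($cacheDir)) {
        mkdir($cacheDir, 0755, true);
    }

    // One cache file per URL, named after a hash of the URL.
    $file = $cacheDir . '/' . sha1($url) . '.cache';

    if (is_file($file) && time() - filemtime($file) < $duration) {
        return file_get_contents($file); // still fresh: skip the request
    }

    $data = $fetch($url);
    file_put_contents($file, $data);

    return $data;
}
```

Assuming getContents can be called with only a URL, the playlist use case could then be written as `getContentsCached($url, 'getContents', CACHE_DIR . '/youtube', 60 * 60 * 24 * 2)`.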

@em92
Contributor

em92 commented Nov 13, 2018

Thanks for getSimpleHTMLDOMCached. Didn't know about that. Ignore my previous suggestion.

@logmanoriginal merged commit 7b261d1 into RSS-Bridge:master Nov 19, 2018
@em92 mentioned this pull request Nov 26, 2018
@logmanoriginal deleted the ModificationCaching branch November 26, 2018 16:52
infominer33 pushed a commit to web-work-tools/rss-bridge that referenced this pull request Apr 17, 2020