Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] added HeiseBridge #744

Merged
merged 8 commits into from
Mar 23, 2019
Merged

[WIP] added HeiseBridge #744

merged 8 commits into from
Mar 23, 2019

Conversation

Dreckiger-Dan
Copy link
Contributor

@Dreckiger-Dan Dreckiger-Dan commented Jul 12, 2018

The bridge expands the RSS-feed of the german news site www.heise.de

Running this on my server throws a PHP memory exception. I was too lazy to change the memory limit so I just did it in the code. (Commented out in this PR). Is there anything I should do to try to catch this exception or something like that?

I did this for personal use and I'm in no way a PHP-expert. So, if you have suggestions on how to improve the code please leave a comment and I will try to implement it.

TODO:

  • remove more ads
  • support multi-page articles
  • support reviews

@Dreckiger-Dan Dreckiger-Dan changed the title added HeiseBridge [WIP] added HeiseBridge Jul 13, 2018
Copy link
Member

@logmanoriginal logmanoriginal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice bridge, I like it a lot. 👍

Find below a few suggestions

protected function parseItem($feedItem) {
$item = parent::parseItem($feedItem);

$article = getSimpleHTMLDOMCached($item['uri']) or returnServerError('Could not open article: ' . $url);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think $url works in this context, as that variable is never initialized. $item['uri'] should work

}
$article = $article->find('div.article-content', 0);

$item['author'] = $author;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is no other place where $author is used, you can assign the value directly (without variable).

@logmanoriginal
Copy link
Member

Is there anything I should do to try to catch this exception or something like that?

Definitely yes. Currently the bridge throws an error on each request, which is not desirable. You can reduce the likelihood of it happening by adding a user limit for the number of articles to return (default should be a reasonable value like 3 or 5).

The way this needs to be done is a bit awkward (because you need to know the inner workings of RSS-Bridge):

1) Introduce a parameter to specify the optional user limit

'limit' => array(
	'name' => 'Limit',
	'type' => 'number',
	'required' => false,
	'title' => 'Specify number of full articles to return',
	'defaultValue' => 5
)

Notice: The limit is not required, so you should handle situations where it isn't set (use default value).

2) Check if enough items have been added (only add titles for the rest)

This is how your parseItem function could implement the user limit. In this example all titles get returned, but articles are only loaded until the limit is reached (at ... follows your remaining code)

protected function parseItem($feedItem) {
	$item = parent::parseItem($feedItem);

	$limit = $this->getInput('limit') ?: 5;

	if(count($this->items) >= $limit) {
		return $item;
	}
...

Notice: If the limit is not set, the default limit (5) is used. You could use a constant to make sure the same limits are used in the parameters and the code.

This is what I see for a limit of 1:

image

@logmanoriginal
Copy link
Member

remove more ads
support reviews

Request: Support multi-page articles 😁

Example: https://www.heise.de/developer/artikel/LaTeX3-Neue-Mechanismen-fuer-die-naechste-Generation-von-Dokumenten-4108699.html


You can load the print preview instead of the regular article to get rid of ads (and load multi-page articles in one go!). This should work on all articles:

https://www.heise.de/developer/artikel/LaTeX3-Neue-Mechanismen-fuer-die-naechste-Generation-von-Dokumenten-4108699.html

https://www.heise.de/developer/artikel/LaTeX3-Neue-Mechanismen-fuer-die-naechste-Generation-von-Dokumenten-4108699.html?view=print

Just add ?view=print when loading the contents

@logmanoriginal
Copy link
Member

I rebased on master and made some necessary changes to receive contents (pagination and no-ads). This works for the most part. Instead of filtering the stuff that isn't needed, it now only takes elements that are of interest. There is of course a chance that something is still missing, but that can be added later anyway.

@logmanoriginal logmanoriginal merged commit 281eaac into RSS-Bridge:master Mar 23, 2019
@logmanoriginal
Copy link
Member

The current bridge is now part of the repository. If you are still interested in adding support for reviews, please open a new PR.

infominer33 pushed a commit to web-work-tools/rss-bridge that referenced this pull request Apr 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants