We get a lot of requests about content-fetching config issues. It would help us A LOT if you tried to fix it on your own first, following the doc.
If you failed to fix it yourself, tick the following boxes:
I've tried myself without success
I've replaced HOST in the issue title with the host of the URL that can't be fetched (e.g. nytimes.com, 20minutes.fr, bbc.com, etc.)
Describe what's wrong:
I found the URL of an article on web.archive.org and would like to save it. Instead, I got only the web.archive.org banner, with no article content:
f43.me can't parse it either:
This is a difficult one, and I haven't found the time to take a deeper look yet. web.archive.org archives many websites, and my guess is that it keeps most of the original site's HTML. So we can't provide a single site-specific config that fits all archived websites such as Golem, Spiegel, etc. Maybe I can find a way to strip that header and get the main content in a more or less nice view for some websites. No promises!
The <div class="golemContentoHide"> obviously comes from Golem and not from web.archive.org, and we can't use the green <!--comments--> to target the content. It has to be real HTML elements.
The use of JavaScript by web.archive.org could also be tricky. I don't know whether I can look at it next week or at the end of the month.
But of course, it would be very nice to be able to fetch archived pages in general.
Sorry, I didn't find a way to cut the content out of the original site. For Golem it helps to set body: //article[1], but that breaks fetching for faz.net, which works quite well without a config.
So this is more of a feature request for the devs. Maybe a new site_config keyword could help, for example: try_webarchive: //div[@class='old-topic']
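To illustrate why a single `body: //article[1]` rule for the whole archive host is fragile, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the two page snippets are invented and are not the real Golem or faz.net markup, `archive-banner` is a made-up id, and `extract_body` is a hypothetical helper, not wallabag's extraction code. It shows only one way such a conflict could arise: the same XPath picks the article on one page and an unrelated teaser on another.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified stand-ins for two archived pages.
# They are NOT copies of the real Golem or faz.net markup.
GOLEM_LIKE = """
<html><body>
  <div id="archive-banner">Archive banner</div>
  <article><p>Golem-like article text</p></article>
</body></html>
"""

FAZ_LIKE = """
<html><body>
  <div id="archive-banner">Archive banner</div>
  <article><p>Teaser for another story</p></article>
  <div class="main"><p>Main story text</p></div>
</body></html>
"""


def extract_body(html, xpath=".//article[1]"):
    """Return the normalized text of the first node matching xpath, or None.

    Hypothetical helper: it only mimics the idea of a `body:` rule.
    """
    root = ET.fromstring(html.strip())
    node = root.find(xpath)
    if node is None:
        return None
    # Join all text below the node and collapse runs of whitespace.
    return " ".join("".join(node.itertext()).split())


print(extract_body(GOLEM_LIKE))  # the article, as intended
print(extract_body(FAZ_LIKE))    # the teaser, not the main story
```

Because both pages are served from the same host, one host-wide rule cannot tell them apart, which is why a per-original-site hint (such as the proposed keyword) might be needed.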
Before submitting the issue, please read:
If wallabag can't parse / extract content for a given link, please first read the documentation about it:
http://doc.wallabag.org/en/user/errors_during_fetching.html#how-can-i-help-to-fix-that