Default text selection on popular sites #6

Open
georgjaehnig opened this issue Mar 7, 2014 · 15 comments

Comments

@georgjaehnig

When there is no text selected, the extension could select a default text based on the current domain and a given XPath set in the extension. For instance on http://www.bbc.com/news/world-europe-26465962, the news text could be selected automatically. And on Wikipedia, the article text.

@dpash

dpash commented Mar 7, 2014

Readability.com might be a sensible method for stripping the content out. This is how OpenSpritz does it.

@ds300
Owner

ds300 commented Mar 7, 2014

Readability.com would be nice, but I don't like the idea of waiting on external API calls, and I've noticed OpenSpritz can sometimes take several seconds to load. After all, 'jetzt' means 'now' in German.

I propose the following:

  • a function which takes a DOM node and compiles its content to jetzt instructions, similar to how I've done it for plain strings.
  • a function for best-guessing the parent node of an article, e.g. by finding the node with the most <p> children.
  • a map from URL patterns to XPath/CSS selector/node extraction functions, for popular sites where the best guess doesn't work well enough.
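
The second and third points could look roughly like the sketch below. Everything in it is an assumption for illustration: `SITE_SELECTORS`, `selectorForHost`, `guessArticleNode` and `findArticleNode` are hypothetical names, not part of jetzt, and the example selectors have not been checked against the live sites.

```javascript
// Hypothetical sketch; none of these names exist in the jetzt codebase.

// Per-site overrides, consulted before the best guess.
// The selectors are illustrative and not verified against the real pages.
const SITE_SELECTORS = [
  { pattern: /(^|\.)wikipedia\.org$/, selector: "#mw-content-text" },
  { pattern: /(^|\.)bbc\.com$/, selector: "article" },
];

// Return the CSS selector registered for a hostname, or null if there is none.
function selectorForHost(hostname) {
  const entry = SITE_SELECTORS.find((e) => e.pattern.test(hostname));
  return entry ? entry.selector : null;
}

// Best guess: the element with the most direct <p> children.
// On a tie, the element visited first wins.
function guessArticleNode(root) {
  let best = null;
  let bestCount = 0;
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    const kids = Array.from(node.children || []);
    const count = kids.filter(
      (k) => String(k.tagName).toUpperCase() === "P"
    ).length;
    if (count > bestCount) {
      best = node;
      bestCount = count;
    }
    stack.push(...kids);
  }
  return best;
}

// Try the site-specific selector first, then fall back to the best guess.
function findArticleNode(doc, hostname) {
  const selector = selectorForHost(hostname);
  const hit = selector ? doc.querySelector(selector) : null;
  return hit || guessArticleNode(doc.body);
}
```

In a content script this would be called as `findArticleNode(document, location.hostname)`. Counting only direct `<p>` children keeps the guess cheap, but it will miss articles whose paragraphs are each wrapped in their own container.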

Thoughts?

@Gyran
Contributor

Gyran commented Mar 7, 2014

Or, when the user starts jetzt without any text selected, the element the mouse is hovering over could light up, and if the user clicks, that element's text will be read (like the browser's inspect element function).
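
A minimal sketch of such a picker, assuming plain capture-phase listeners on the document. `startPicker` and its `onPick` callback are hypothetical names, and the orange outline is an arbitrary choice:

```javascript
// Hypothetical sketch; not taken from the jetzt source.
// Highlights whatever element the mouse is over and reports the clicked one.
function startPicker(doc, onPick) {
  let current = null;
  let previousOutline = "";

  // Restore the outline of the element that was highlighted last.
  function clearHighlight() {
    if (current) current.style.outline = previousOutline;
    current = null;
  }

  function onMouseOver(event) {
    clearHighlight();
    current = event.target;
    previousOutline = current.style.outline;
    current.style.outline = "2px solid orange";
  }

  // Remove the highlight and both listeners.
  function stop() {
    clearHighlight();
    doc.removeEventListener("mouseover", onMouseOver, true);
    doc.removeEventListener("click", onClick, true);
  }

  function onClick(event) {
    // Keep the click from following links or triggering page handlers.
    event.preventDefault();
    event.stopPropagation();
    const picked = event.target;
    stop();
    onPick(picked);
  }

  // Capture phase, so the picker sees events before the page's own handlers.
  doc.addEventListener("mouseover", onMouseOver, true);
  doc.addEventListener("click", onClick, true);
  return stop;
}
```

Usage would be something like `startPicker(document, (el) => read(el))`, where `read` stands in for whatever starts the reader on an element. The returned `stop` function lets the caller cancel the picker, for example on Escape.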

@rtuin

rtuin commented Mar 7, 2014

Great idea Gyran!

Re: using an external service for this: one reason I prefer Jetzt over OpenSpritz is that Jetzt can read local documents. Because of how OpenSpritz works, it can only read publicly available documents/pages.

@Anahkiasen

👍

@georgjaehnig
Author

I quite like the current solution, which lets the reader select a text block with the mouse.

However, there's a little bug: it seems that HTML comments are included in this automatic selection. See http://www.spiegel.de/politik/deutschland/krim-krise-ex-kanzler-gerhard-schroeder-kritisiert-eu-a-957728.html as an example. When selecting the whole article text, an HTML comment is jetzted right after the first paragraph.

(But BTW: Really great work, this extension!!)

@ds300
Owner

ds300 commented Mar 9, 2014

Oh man that's annoying. Thanks for pointing it out though. Just further motivation to get a proper dom parser on the go :)

@h0ru5
Collaborator

h0ru5 commented Mar 9, 2014

@georgjaehnig the comments that appear in that page example are inside script tags, so this seems to be the same issue as #29 - I confirmed that this works with the PR in #31
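
For illustration only (this is not the code from that PR), a text collector that avoids the problem could skip `<script>` and `<style>` elements as well as real comment nodes. Note that a `<!-- ... -->` inside a script tag is part of the script's text node rather than a comment node, so the element itself has to be skipped. `visibleText`, `collectText` and `SKIPPED_TAGS` are hypothetical names:

```javascript
// Hypothetical sketch; not the fix from the PR mentioned above.
const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
const TEXT_NODE = 3; // same value as Node.TEXT_NODE
const SKIPPED_TAGS = new Set(["SCRIPT", "STYLE", "NOSCRIPT"]);

// Collect the text nodes under `node`, ignoring skipped elements.
function visibleText(node, out = []) {
  if (node.nodeType === TEXT_NODE) {
    out.push(node.nodeValue);
  } else if (
    node.nodeType === ELEMENT_NODE &&
    !SKIPPED_TAGS.has(String(node.nodeName).toUpperCase())
  ) {
    for (const child of Array.from(node.childNodes || [])) {
      visibleText(child, out);
    }
  }
  // Comment nodes (nodeType 8) and any other node types are ignored.
  return out;
}

// Join the collected pieces and collapse runs of whitespace.
function collectText(node) {
  return visibleText(node).join(" ").replace(/\s+/g, " ").trim();
}
```

This only looks at the node tree, so it does not know about elements hidden with CSS; those would still be read.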

@ecsplendid

I appreciate that the Readability API can take several seconds, and I also noticed that OpenSpritz doesn't work at all on Guardian articles. That said, it would be an excellent additional feature. The default alt-s behaviour can be slightly clunky depending on the underlying HTML structure, and it might also pick up images and their captions in the middle of the text, which might not make sense. It's also an annoying extra step which, in my opinion, adds little value over simply selecting the text manually. When you are on a website and press alt-r, it could query the Readability API and at least take a best guess.

@h0ru5
Collaborator

h0ru5 commented Mar 10, 2014

I think this could be a way to go: https://github.com/fb55/readabilitySAX
fb55 offers a readability port that can be used inside the browser

@ecsplendid

^ h0ru5 has the best idea :>

@peteruithoven
Collaborator

I agree with h0ru5. Other reasons: privacy, offline usage, and probably speed.
Here are some other JS-based readability scripts I posted earlier:
#26 (comment)

@h0ru5
Collaborator

h0ru5 commented Mar 10, 2014

@peteruithoven I skimmed through the list, but I found the best starting point in the SAX-based one.
I think https://github.com/BaNzounet/readability/blob/master/src/helpers.js might be possible to use as well; they separated the node stuff from a helpers.js in pure JS.

@peteruithoven
Collaborator

Another reason not to use an external API like Readability is that you can't use it on services that require a login, like mail.

@peteruithoven
Collaborator

So it has been quite a while, but this might be an interesting development:
https://hacks.mozilla.org/2017/04/fathom-a-framework-for-understanding-web-pages/
https://mozilla.github.io/fathom/
