@cdharris
Contributor

@cdharris cdharris commented May 14, 2020

This PR is a work-in-progress start on prototyping support for parsing and rendering 'readable' versions of a webpage that are lightweight both to consume and to sync between devices.

STATUS:
Development has paused because supporting annotations here requires the refactored annotation-InPageUI-sidebar work. The work done to refactor this linking will move this issue forward.

_Milestone 1, a simple reader prototype on desktop, is nearly complete. A few more nuanced areas remain to investigate, described below in Currently Planning (Milestone 1), before moving on to the items in Currently Planning (Milestone 2) and transferring the agreed-upon Milestone 2 specs here in full._

Currently Planning (Milestone 2)

  • Investigate and plan allowing the creation of annotations from reader view.
  • How best to handle images in reader
  • Plan the extent of our indexing pipeline refactoring and when

Currently Planning (Milestone 1)

  • Prototype parsing reader view from document copy from the tab itself (triggered via sidebar or indexing pipeline)
  • Investigate using the jsdom parser for consistency: use it to get tests working, profile it against the native browser implementation, and validate it for mobile use.

Milestone 1: Preparation & Prototype done

A Simple reader implementation on Desktop

Reader View on Desktop

  • In dashboard view, add a button to each result item that opens it in the reader view.
  • The reader view is a full screen modal if clicked on normally.
  • The reader view can open in a new tab if ctrl+clicked.
  • The reader view displays the HTML output from readability parser.
  • The reader view applies its own styles to the HTML.
  • Confirm we have the ability to add styles to the readable version in a way that allows all the customization we currently foresee.

Readability Parser (v1)

  • Check dom-distill works in Firefox.
  • Add the Chromium dom-distill readability parsing library to the extension build.
  • Now using mozilla/readability instead, due to better extensibility, a better build and better isolation.
  • Expose library through a background function.
  • In the process of opening the reader view, the page content is fetched and parsed for displaying.
  • A Loading screen is shown for fetching and parsing.
  • Investigate annotation support (do existing annotations continue to work in reader view?). Yes, annotations anchor in the reader view with the simple find-text-occurrence strategy.
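
A minimal sketch of what a "find text occurrence" anchoring strategy could look like, for illustration only. This is not the extension's actual anchoring code: the `TextRange` type and both function names are hypothetical, and the whitespace normalization is an assumption about how a quote captured on the original page might be matched against re-flowed reader text.

```typescript
// Hypothetical shape, not the extension's actual annotation type.
interface TextRange {
    start: number // inclusive offset into the normalized reader text
    end: number // exclusive offset
}

// Collapse runs of whitespace so a quote captured on the original page
// can still match text that the reader view has re-flowed.
function normalizeWhitespace(text: string): string {
    return text.replace(/\s+/g, ' ').trim()
}

// Return the first occurrence of `quote` in `readerText`, or null if the
// quote is empty or cannot be found. Offsets refer to the normalized text.
function findTextOccurrence(
    readerText: string,
    quote: string,
): TextRange | null {
    const haystack = normalizeWhitespace(readerText)
    const needle = normalizeWhitespace(quote)
    if (needle.length === 0) {
        return null
    }
    const start = haystack.indexOf(needle)
    return start === -1 ? null : { start, end: start + needle.length }
}
```

A first-occurrence match like this is simple but ambiguous when the same quote appears more than once, which is one reason creating new annotations from the reader view needs separate planning.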

Save Readable output from Desktop (Manually)

  • Add parsing library (via Worldbrain wrapper) to desktop build.
  • Add database schema change to be able to save the Readable text output.
  • Use a new table, and include when the version was generated, and through which parsing scheme.
  • Add storage module for new readable version table
  • Setup initial background script infrastructure for parsing the document and saving the data
  • Save readable version after parsing when opening readable version from dashboard link.
  • Reader View, load from database if exists, parse via background script if it doesn't.
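
The "load from database if it exists, otherwise parse and save" flow above could be sketched roughly as follows. Every name here (`ReadableVersion`, `ReadableStore`, `ParseFn`, `loadOrParseReadable`, `InMemoryReadableStore`) is hypothetical and only mirrors the fields listed in this section; the real storage module and background script interfaces may look quite different.

```typescript
// Hypothetical record for the new readable-version table.
interface ReadableVersion {
    url: string
    html: string // output of the readability parser
    createdWhen: number // when this version was generated (ms since epoch)
    strategy: string // which parsing scheme produced it
}

// Hypothetical storage interface; a real one would wrap the new table.
interface ReadableStore {
    get(url: string): Promise<ReadableVersion | null>
    put(version: ReadableVersion): Promise<void>
}

// Stand-in for the background function that fetches and parses a page.
type ParseFn = (url: string) => Promise<{ html: string; strategy: string }>

// Return the stored readable version if one exists;
// otherwise parse the page, save the result and return it.
async function loadOrParseReadable(
    url: string,
    store: ReadableStore,
    parse: ParseFn,
    now: () => number = Date.now,
): Promise<ReadableVersion> {
    const existing = await store.get(url)
    if (existing !== null) {
        return existing
    }
    const { html, strategy } = await parse(url)
    const version: ReadableVersion = { url, html, createdWhen: now(), strategy }
    await store.put(version)
    return version
}

// In-memory store, useful for tests of the flow above.
class InMemoryReadableStore implements ReadableStore {
    private versions = new Map<string, ReadableVersion>()

    async get(url: string): Promise<ReadableVersion | null> {
        return this.versions.get(url) ?? null
    }

    async put(version: ReadableVersion): Promise<void> {
        this.versions.set(version.url, version)
    }
}
```

Passing the store and the parser in as parameters keeps the flow testable without a browser, and recording `strategy` alongside `createdWhen` would make it possible to regenerate versions produced by an older parsing scheme.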

@cdharris
Contributor Author

The first simple reader prototype on desktop is nearly complete. A few more nuanced technical areas remain to investigate, described above in Currently Planning (Milestone 1):

  • Prototype parsing reader view from document copy from the tab itself (triggered via sidebar or indexing pipeline)
  • Investigate using the jsdom parser for consistency: use it to get tests working, profile it against the native browser implementation, and validate it for mobile use.

After that, we move on to the planning items in Currently Planning (Milestone 2) and transfer the agreed-upon Milestone 2 specs here in full.

  • Investigate and plan allowing the creation of annotations from reader view.
  • How best to handle images in reader
  • Plan the extent of our indexing pipeline refactoring and when

(OP status updated too)

@blackforestboi
Member

Apart from the things we already discussed, here are notes from some work on the reader yesterday, in order of priority:

  • don't use the modal, because of its convoluted purpose and styles; make a separate page type, and make the top bar sticky
  • close reader with "esc" if nothing is selected or sidebar is not open
  • force image URLs to be absolute, not relative. Images don't load and links don't work here, though they do on other pages such as the Guardian: https://en.wikipedia.org/wiki/University_of_North_Carolina_at_Chapel_Hill
  • Make sure to show placeholder image in case it can't be loaded due to missing internet connection.
  • we can't use "dangerouslySetInnerHTML" to inject HTML
  • reader page should have its own querystring so reload is possible without losing page
  • ensure security recommendations are met: https://github.com/mozilla/readability#security
  • "report page" mechanism hooks into analytics, see new mockups
  • when the reader view fails, show an error. Example: https://web.archive.org/web/20030426064425/http://www.fac.unc.edu/FacilityInfo/index.asp
  • change tab title to article title: "Reader: "
  • make sure we have the ability to easily add modular parsing templates for different websites that users can extend.
  • add ways of adding
  • twitter.com doesn't work somehow
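
Two of the notes above (absolute image URLs, and a query string so the reader page survives a reload) can be sketched with the standard `URL` API. This is an illustrative sketch, not code from this PR: the function names, the `url` query parameter and the reader page address used in the usage note are all assumptions.

```typescript
// Resolve a possibly relative URL (e.g. an <img src> or <a href> value)
// against the URL of the page it came from.
// Returns null when the inputs cannot be parsed as a URL.
function toAbsoluteUrl(raw: string, pageUrl: string): string | null {
    try {
        return new URL(raw, pageUrl).href
    } catch {
        return null
    }
}

// Hypothetical reader address scheme: the article URL travels in a `url`
// query parameter, so reloading the reader page can recover the article.
function buildReaderUrl(readerPage: string, articleUrl: string): string {
    const target = new URL(readerPage)
    target.searchParams.set('url', articleUrl)
    return target.href
}

// Read the article URL back out of a reader page address, or null if absent.
function readArticleUrl(readerUrl: string): string | null {
    return new URL(readerUrl).searchParams.get('url')
}
```

For example, `toAbsoluteUrl('/static/logo.png', 'https://en.wikipedia.org/wiki/University_of_North_Carolina_at_Chapel_Hill')` resolves against the page's origin, and `readArticleUrl(buildReaderUrl('https://example.org/reader.html', articleUrl))` returns `articleUrl` again, because `searchParams` handles the encoding and decoding.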

@poltak poltak closed this Aug 4, 2020
@poltak poltak deleted the feature/reader branch August 4, 2020 03:43
@poltak poltak restored the feature/reader branch August 4, 2020 03:44
@poltak poltak reopened this Aug 4, 2020
@poltak
Member

poltak commented Aug 4, 2020

Sorry. I checked the wrong checkbox in my git client when removing my local branch of this, which resulted in the remote branch getting deleted too. Lucky GH has a restore branch feature :D

@Victor239

Hi, what's the status of this? Asking because I don't notice it on the roadmap.

@blackforestboi
Member

Hey!

Sorry, we tabled this for now as we realised we were tackling too many things. No estimate of when it will land back on the roadmap.

@Victor239 Victor239 mentioned this pull request Jan 20, 2021
11 tasks
@KyleFN

KyleFN commented May 25, 2021

What does it mean that this PR is closed?

@blackforestboi
Member

We may open it up again once we start working on it again. It has been dormant for too long.

We know it's a wanted feature, but we currently don't have the capacity to finish it.

Right now we're working on the multi-device sync and cloud access. As part of that we'll also prepare a document storage that would enable saving, syncing and viewing (reader) archives in a more robust way.

@KyleFN

KyleFN commented May 25, 2021

In all honesty, Memex feels like a bait and switch from all its previous promises, so I'm not holding out hope for this feature anymore.

Initially it was all about offline-first, end-to-end encryption and being an actual memex (AKA store of all previous browsing history), but now it's just about social sharing of select snippets.

I don't see why I would use this when Hypothesis is much better for social annotations, if I even cared for that feature. But more to the point I don't see why you've abandoned your USP to become the same sort of social media company as others instead of an actually useful and private productivity tool.

@blackforestboi
Member

Sorry folks to disappoint.

As you can see from my previous message from January 20, stopping this work for now was not news.
We're a young startup and still trying to find our way to a product that is used enough to support its continuous development. Sometimes we were just too optimistic about what we could tackle, didn't think through the implications well enough before starting a feature, or didn't prioritise well enough. And ultimately we also need to earn money with this, otherwise we end up again with just some hacky, unmaintained piece of code.

The only thing we really cut out was the HISTORY full-text search. You can still search all the pages you bookmark or annotate.
Offline-first and e2e encryption were necessary because no one wants to upload their history to some random startup, not because we are privacy purists. We highly value privacy and would never look at user data if it was available to us, but we also need to be practical. One of the reasons Memex is not as far along as it could be IS the implications of a privacy/offline-first product: slow iterations, difficult migrations, difficult sync, difficult search, and expensive development and maintenance.

From all the onboarding calls and user interviews we had in the past year, it's clear that the full-text history search is not what most people really need, even though for some it's a useful feature.
What we found is that people need ways of better collaborating with people in curating, annotating and discussing content on granular levels, organise and search the best things they find and get them into their creator workflows (Roam, Notion, Blogs, Twitter etc). That's why we have this focus now, which is also in line with our long-term mission.

The workflows we focus on now are:

  1. Multi-device support, cloud accessibility and API (will also make the reader feature more feasible again)
  2. Collaboratively annotating and discussing locally stored PDFs to support academic and business R&D workflows
  3. Multi-Content support. First: saving/annotating images

I am sorry that the product and its iterations don't serve you as much anymore, or not yet.
