MTNI-199 ⁃ Checklist for finishing up releasable product. #1

Closed
17 of 18 tasks
blackforestboi opened this issue Apr 3, 2017 · 10 comments

@blackforestboi
Member

blackforestboi commented Apr 3, 2017

These are the things that still need to be done in order to make the upgraded WebMemex extension releasable:



@blackforestboi
Member Author

blackforestboi commented Oct 23, 2017

so, now that @poltak hustled through and replaced the search TWICE, we are on the finish line!

The last things to do are:
(@everyone: Any other ideas?)

Features

PRIO 1:

  • Prefiltering URLs in import MTNI-141 ⁃ Url filter to reduce number of indexed documents #112 (@arpit, do you think you can tackle this by the end of the week? Otherwise @poltak may be quicker.)
  • Copywriting on Acknowledgements page & settings (@oliversauter)
  • Import from Old Extension (@poltak)
  • Menu bar icon in smaller size (@oliversauter)
  • Change text in blacklist dialogue and include the url/domain that is about to be deleted. "Do you want to delete all visits to asana.com?"
  • change wording from "recording" to "indexing" pages in the drop down (@oliversauter)
  • Adding uninstall survey link (see old extension: https://github.com/WorldBrain/Research-Engine/blob/master/src/js/background.js#L366)
  • Adding new onboarding screen for first time users.
  • Making a feature switch for all the console logs. Most of the things are not important to a user, and sometimes can even be cluttering for them. We should discuss what to show and what is only important to devs.
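
For the URL prefiltering item above, a minimal TypeScript sketch of what such a filter could look like. The rules used here (keep only http/https URLs, drop exact duplicates) and the name `prefilterUrls` are assumptions for illustration; the actual rules are what #112 is meant to settle.

```typescript
// Hypothetical prefilter: the rules below are assumptions, not the project's actual filter.
function prefilterUrls(urls: string[]): string[] {
  const seen = new Set<string>();
  return urls.filter((url) => {
    // Skip anything that is not a regular web page (e.g. chrome://, file://).
    if (!/^https?:\/\//i.test(url)) return false;
    // Skip exact duplicates so each page is only queued once.
    if (seen.has(url)) return false;
    seen.add(url);
    return true;
  });
}
```

Running the filter before the import queue is built would keep the number of documents to fetch and index down, which is the stated goal of the item.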

PRIO 2:

  • Filtering by bookmarks MTNI-210 ⁃ Filter by bookmarks #26 (taken on by @mukeshkharita)
  • Putting previously failed URLs on the back of the import queue. With each pause, or cancelling, and then restarting, all failed URLs are fetched again.
  • First loading bar in import process needs a bit of explanatory text: "Please wait while we analyze & prepare your browsing history & bookmarks"
  • Handling of Cyrillic, Chinese and Japanese characters?
  • When first using extension with empty list, show text "You have not made any history yet, either visit some pages and come back, or import your existing history" Image of moonlanding drawing.
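
The "failed URLs to the back of the queue" item could be sketched roughly as below. The `ImportItem` shape and the `failedBefore` field are hypothetical; how the importer actually records failures is not described in this thread.

```typescript
// Hypothetical queue entry; `failedBefore` marks URLs whose last fetch attempt failed.
interface ImportItem {
  url: string;
  failedBefore: boolean;
}

// Stable reorder: untried items keep their relative order and come first,
// previously failed items are only retried once everything else is done.
function orderImportQueue(items: ImportItem[]): ImportItem[] {
  const fresh = items.filter((item) => !item.failedBefore);
  const failed = items.filter((item) => item.failedBefore);
  return [...fresh, ...failed];
}
```

Applying this each time the import is resumed would avoid refetching all failed URLs first after every pause or restart.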

Bugs

  • Sometimes the screenshot and favicons are not properly saved. Reproducing by closing some tabs earlier, shortly after the indexing is kicked off.
  • PDFs are not indexed in imports
  • The term search is an "OR" search; it should be "AND".
  • Reducing size of screenshots (currently 1.2MB per page) (potential library?) Target: 50kb?
  • when searching after searching something else, the loading bar is not shown anymore. Therefore there is no indication for a user that an actual search happens. It just updates the results.
  • Quick Blacklist does not work -> it deletes everything, if you choose to, but does not add the page to the blacklist.
  • If no favicon is available, don't show anything > right now it shows a missing-image placeholder.
  • some words weirdly indexed (see comment)
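
To make the "OR" versus "AND" bug concrete, here is a small sketch against a hypothetical in-memory term index (term mapped to the set of page IDs containing it). The real index layout is not shown in this thread, so `TermIndex` and both function names are assumptions.

```typescript
// Hypothetical index shape: term -> IDs of pages that contain the term.
type TermIndex = Map<string, Set<string>>;

// AND semantics: a page must contain every query term.
function searchAllTerms(index: TermIndex, terms: string[]): string[] {
  const postings = terms.map((term) => index.get(term) || new Set<string>());
  // Start from the smallest set so the intersection stays cheap.
  postings.sort((a, b) => a.size - b.size);
  let result: string[] | undefined;
  for (const set of postings) {
    result = result === undefined ? Array.from(set) : result.filter((id) => set.has(id));
  }
  return result ?? [];
}

// OR semantics (the current behaviour described in the bug): any term is enough.
function searchAnyTerm(index: TermIndex, terms: string[]): string[] {
  const union = new Set<string>();
  terms.forEach((term) => {
    (index.get(term) || new Set<string>()).forEach((id) => union.add(id));
  });
  return Array.from(union);
}
```

With "AND", adding more terms narrows the result list; with "OR" it grows, which is the behaviour reported as a bug.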

@blackforestboi
Member Author

@poltak Weird search times:
I assume in this case, though, that the time it takes to load all of the results (almost 500) could be the dominant factor in the "total search" time. For the same query in the address bar it's about half that time. The bug with the screenshot size might be related?
[Image: screen shot 2017-10-23 at 00:48:53]

@blackforestboi
Member Author

Testrun on Firefox:

When trying to run the importer, it is stuck in the preparation mode and throws this error:
[Image: screen shot 2017-10-23 at 17:04:49]

Also when opening pages, after 10 seconds Firefox freezes. I assume because of the same error.

@poltak
Member

poltak commented Oct 24, 2017

@oliversauter Need a bit more details about the first one. Is it reproducible at all and with different types of search?
The search branch is working great in FF for me as well. Is this FF problem on a build of master or search branch?

If on the search branch, make sure to reinstall the ext from scratch and wipe your data whenever you update as well.

RE listed bugs:

Sometimes the screenshot and favicons are not properly saved. Reproducing by closing some tabs earlier, shortly after the indexing is kicked off.

This is the expected behaviour of the WebExt API. If the user closes a tab while it's extracting tab data, it will crash as the tab won't exist anymore in memory.

Reducing size of screenshots

We currently store the full res PNG screenshot. We could change to JPEG format and specify to the WebExt API to capture them at lower quality. Relevant API. We only use these as thumbnails for the Overview UI right now, so we can easily get them down to ~50Kb with lower quality JPEGs, but depends if you have any future plans (like displaying them larger somewhere).
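
A rough sketch of the JPEG idea, assuming the options object accepted by `tabs.captureVisibleTab` (a `format` of `'jpeg'` or `'png'` and a `quality` from 0 to 100). The `CaptureOptions` interface mirrors that shape locally; `thumbnailOptions` and `dataUrlByteSize` are hypothetical helpers, and the default quality of 50 is only a starting point to experiment with, not a measured value.

```typescript
// Local mirror of the image options accepted by tabs.captureVisibleTab.
interface CaptureOptions {
  format?: 'jpeg' | 'png';
  quality?: number; // 0-100, only meaningful for JPEG
}

// Builds capture options for a lower-quality JPEG thumbnail.
function thumbnailOptions(quality: number = 50): CaptureOptions {
  const clamped = Math.min(100, Math.max(0, Math.round(quality)));
  return { format: 'jpeg', quality: clamped };
}

// Decoded size in bytes of a base64 data URL, to check captures against a size target.
function dataUrlByteSize(dataUrl: string): number {
  const base64 = dataUrl.slice(dataUrl.indexOf(',') + 1);
  const padding = (base64.match(/=+$/) || [''])[0].length;
  return (base64.length * 3) / 4 - padding;
}
```

The result of `thumbnailOptions` would be passed as the options argument of the capture call, and `dataUrlByteSize` could be used while trying different quality values to see which one lands near the ~50Kb target.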

when searching after searching something else, the loading bar is not shown anymore

It should still be there, it's just below the results. Either we can stop rendering the results list view immediately as a new query is detected (just leaving the spinner), or we can move the spinner somewhere, like overlaying on top of the results list.

Quick Blacklist does not work

This one already fixed in my new search branch. Will come later.

If no favicon is available, don't show anything > right now it shows a missing-image placeholder.

In the results list? For pages that don't have favicon data, it should not render any image element there. Can you give me a URL of a page where I could reproduce this behaviour and see if I can fix it?

@blackforestboi
Member Author

Need a bit more details about the first one. Is it reproducible at all and with different types of search?

What I wondered about at first was why the total search time was so high even though the term search time was rather low. My assumption is that the gap of about 3000ms came from loading all those documents from pouch. It was especially noticeable for terms that appear in many pages.
An idea to solve this: if the page docs related to a term-key are stored in order of their initial saving, we could use the same pagination approach of only loading 20 at a time?
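
A minimal sketch of that pagination idea, assuming the page IDs for a term are already available in saving order. The `Page` type and `paginate` function are hypothetical names; the page size of 20 comes from the suggestion above.

```typescript
// One page of results plus a flag telling the UI whether to offer "load more".
interface Page<T> {
  items: T[];
  hasMore: boolean;
}

// Takes only `limit` entries starting at `skip`, so only those docs need to be loaded.
function paginate<T>(ordered: T[], skip: number, limit: number = 20): Page<T> {
  const items = ordered.slice(skip, skip + limit);
  return { items, hasMore: skip + items.length < ordered.length };
}
```

Only the IDs in `items` would then be fetched from pouch, for example with `db.allDocs({ keys, include_docs: true })`, instead of loading all ~500 documents for a common term.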

so we can easily get them down to ~50Kb with lower quality JPEGs, but depends if you have any future plans (like displaying them larger somewhere).

What storage-size would be reasonable so we could display them a bit larger, maybe 3-4 times the size of right now?

or we can move the spinner somewhere, like overlaying on top of the results list.

Why not remove the previous results when a new search is made, so the regular spinner is on top again?
If a person changes the query, they don't want to see the old results anymore anyway.
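
As a sketch of what I mean, assuming a Redux-style reducer for the overview state (the state shape and action names here are made up for illustration, not taken from the codebase):

```typescript
// Hypothetical overview search state.
interface SearchState {
  query: string;
  results: string[];
  isLoading: boolean;
}

type SearchAction =
  | { type: 'queryChanged'; query: string }
  | { type: 'resultsLoaded'; query: string; results: string[] };

function searchReducer(state: SearchState, action: SearchAction): SearchState {
  switch (action.type) {
    case 'queryChanged':
      // Drop the old results right away so only the spinner is visible.
      return { query: action.query, results: [], isLoading: true };
    case 'resultsLoaded':
      // Ignore responses that belong to an outdated query.
      if (action.query !== state.query) return state;
      return { ...state, results: action.results, isLoading: false };
  }
}
```

Because the list is emptied as soon as the query changes, the spinner never ends up below stale results.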

Can you give me a URL of a page where I could reproduce this behaviour and see if I can fix it?

I saw it a couple of times with twitter pages.

This is the expected behaviour of the WebExt API. If the user closes a tab while it's extracting tab data, it will crash as the tab won't exist anymore in memory.

Should we maybe then delete the page completely?

@poltak
Member

poltak commented Oct 24, 2017

With the first one, need to be able to reproduce it first or else everything is just speculation. Will see if I can try get it reproducible with larger data set later today.

What storage-size would be reasonable so we could display them a bit larger, maybe 3-4 times the size of right now?

All depends on the amount of visual artefacts you want visible. That would be the main result of changing to JPEG and lowering quality. I'll try a few quality values later and send you some screenshots of what it would look like.

I saw it a couple of times with twitter pages.

Really need specific examples of where it's happening for you so I can reproduce.

Should we maybe then delete the page completely?

Yeah, it would be nice, but as the entire page visit scenario is a bunch of async stuff, there's no guarantee of what stage it got up to before the user cancelled the tab (the state of the DBs, both for index and pouch, is unknown). This one would be better as its own issue to work towards and see how we can handle different scenarios.

@blackforestboi
Member Author

Will see if I can try get it reproducible with larger data set later today.

Hack: looked for terms in DB that have 300+ urls in the map.

Really need specific examples of where it's happening for you so I can reproduce.

I figured it is for twitter pages that I import, where this problem appears.

This one would be better as its own issue to work towards and see how we can handle different scenarios.

Ok defo something for later.

@poltak
Member

poltak commented Nov 2, 2017

When first using extension with empty list, show text "You have not made any history yet, either visit some pages and come back, or import your existing history" Image of moonlanding drawing.

This one would be a bit messy to implement. We'd either need to do:

  • have an initial query on both DBs at the start of every search, seeing if there's anything in there
  • have a one-time flag stored somewhere that says "no data yet", but this would need to be checked every time we index a file (and swap it if not set)

Open to other suggestions.
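
For the second option, a rough sketch of how the flag check could be kept cheap by caching it in memory, so the store is written at most once. Everything here is hypothetical: the `FlagStore` interface stands in for whatever storage would be used (real extension storage is asynchronous, which this sketch ignores), and `hasIndexedData` is a made-up key.

```typescript
const HAS_DATA_KEY = 'hasIndexedData'; // hypothetical storage key

// Minimal synchronous stand-in for a persistent key-value store.
interface FlagStore {
  get(key: string): boolean | undefined;
  set(key: string, value: boolean): unknown;
}

class HasDataFlag {
  private store: FlagStore;
  private hasData: boolean;

  constructor(store: FlagStore) {
    this.store = store;
    // Read the persisted flag once at startup.
    this.hasData = store.get(HAS_DATA_KEY) === true;
  }

  // Call after each indexed page; only the first call touches the store.
  markIndexed(): void {
    if (this.hasData) return;
    this.hasData = true;
    this.store.set(HAS_DATA_KEY, true);
  }

  // True while nothing has been indexed yet, i.e. show the onboarding text.
  shouldShowEmptyState(): boolean {
    return !this.hasData;
  }
}
```

The per-index check is then a single in-memory boolean comparison, which may make this option less messy than it first looks.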

Making a feature switch for all the console logs. Most of the things are not important to a user, and sometimes can even be cluttering for them. We should discuss what to show and what is only important to devs.

Anything logged to console should only really be a concern to devs; shouldn't really need to concern ourselves with standard users opening the console - they can if they want. At the moment it should be mostly timers for various events and various errors that are caught. None are needed at all, but the errors are nicer than hiding them, IMO. Any ideas with what you want/don't want in there?

@blackforestboi
Member Author

This one would be a bit messy to implement.

Ok, was just an idea. Let's do it later. I think it is a good onboarding feature. But defo for later improvement

Any ideas with what you want/don't want in there?

In the old extension I often got the feedback that in the console of a page (not the extension console) it was cluttered with (error) messages, thus being annoying for developers who use the console to investigate other things unrelated to the extension.

So the priority would be to remove all logs that appear in the page's own console. I currently only see one though:
html-pipeline: 1101.947021484375ms
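
One way the feature switch could look, as a sketch only: debug output such as timers is gated behind a flag, while errors always get through. The `createLogger` name, the `debugEnabled` flag and the injectable `sink` are assumptions for illustration.

```typescript
type Sink = (message: string) => void;

interface Logger {
  debug: Sink;
  error: Sink;
}

// `debugEnabled` is a hypothetical switch, e.g. a build-time constant or a stored setting.
function createLogger(debugEnabled: boolean, sink: Sink = (m) => console.log(m)): Logger {
  return {
    // Timers and other dev-only output are dropped unless the switch is on.
    debug: (message) => {
      if (debugEnabled) sink(message);
    },
    // Errors are always reported rather than hidden.
    error: (message) => sink('[error] ' + message),
  };
}
```

With the switch off in content scripts, a line like the `html-pipeline` timer would no longer show up in the console of every visited page.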

poltak added a commit that referenced this issue Nov 6, 2017
- this stuff gets run in content script, hence pollutes every page's console that it runs on
- #1 (comment)
@blackforestboi
Member Author

THANK YOU all for your awesome work in the past months. You made this possible.
Yesterday we pushed the new version of the tool into the Chrome and Firefox Addon Stores.
worldbrain.io/download_extensions.

Closing this as all things for the release have been done.

@blackforestboi blackforestboi changed the title Checklist for finishing up releasable product. MTNI-199 ⁃ Checklist for finishing up releasable product. Apr 19, 2018
poltak pushed a commit that referenced this issue Jun 2, 2020
Pull in changes from upstream

3 participants