Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added bookmark creation listener #46

Merged
merged 6 commits into from
Jul 31, 2017
Merged

Added bookmark creation listener #46

merged 6 commits into from
Jul 31, 2017

Conversation

bohrium272
Copy link
Member

@bohrium272 bohrium272 commented Jul 4, 2017

In reference to #39
Hope I added it in the correct file 馃槄

@poltak
Copy link
Member

poltak commented Jul 6, 2017

Hope I added it in the correct file聽馃槄

Yes, the global background listeners like this have just been going into src/background.js so far. That's fine for now (maybe much later we'll reorganise it all a bit).

For the actual data transformation logic, I'd probably do that somewhere near where the page visits are handled (src/activity-logger/background/log-bookmark-creation.js maybe?).

For the bookmark doc transformation, you should be able to reuse the transformToBookmarkDoc fn in src/imports/background/imports-preparation. And yes, that doesn't really make sense there; planning to refactor bookmarks stuff out of there when I can figure out a good place.

Important to note though, is that for every bookmark doc there is an associated page doc. Bookmark doc is essentially just the bookmark-related metadata for a page, similar to visit docs.
So I'd have a look at how the logPageVisit fn in log-page-visit module currently works with creating a page doc then creating a visit doc based on it. One important thing to note is that log-page-visit gets page data from a tab, but with bookmarks you can't assume a tab since you can add bookmarks through the manager or whatever.

One of the many modules implemented with the imports feature is the fetch-page-data module, which essentially affords the same page data fetch but via XHR rather than tab, so that should be of use.

@bohrium272
Copy link
Member Author

So if I understand correctly, following should be the flow:
Bookmark listener triggered -> Get page data using fetch-page-data.js -> Get page doc and visit doc using log-page-visit.js -> Call transformToBookmarkDoc
Question: I don't see storeVisit returning a page doc, I see it returning a visit doc.

@poltak
Copy link
Member

poltak commented Jul 8, 2017

That flow seems mostly right, apart from:

Get page doc and visit doc using log-page-visit.js

You will have to create the page doc from the data returned by fetch-page-data. I think the easiest way is to look at a page doc in the DB to see how it is structured, and match that structure using what you get from fetch-data-ouput. There is some existing page doc creation code but they are specific to their use-cases (depend on a tab ID or an existing page stub in DB). To be unified somehow later. Visits don't fit into the use-case of creating a new bookmark, so you can ignore that.

One thing, however, is I think the page doc creation step should be skipped if there is already a matching page doc in the DB. Match would have to be done via URL. Then you would just create a new bookmark doc from the BookmarkTreeNode and reference to that existing page doc (all in the transformToBookmarkDoc params).

So the flow would be something like:
Bookmark listener triggered -> Get existing page doc OR create using data from fetch-page-data.js -> Call transformToBookmarkDoc -> store produced docs

What do you think?

@bohrium272
Copy link
Member Author

Yes the flow seems appropriate. We should think about documenting all of this somewhere. For a new developer page, visit, and bookmark docs are a whole new concept. I can work on the documentation part.

@poltak
Copy link
Member

poltak commented Jul 10, 2017

We should think about documenting all of this somewhere

Yes, agreed. Codetour.md contains a high level overview of what each module does, although it needs to be updated a bit for some new things. That may be best for now, unless you want to detail the flows of certain use-cases. Code-wise, I encourage and have been using JSDoc, at least for exported module interfaces, as its flexible and fairly wide-spread and can be parsed by a variety of tools and editors/IDEs. If we want to fully document the code later, these annotations will help a lot.

@poltak
Copy link
Member

poltak commented Jul 10, 2017

One more minor thing: had a chat with @oliversauter the other day briefly regarding limiting the freeze-dry (tl;dr: it generates a local HTML version of the page for offline viewing/viewing at visit time) (issue #42).

For new bookmarks, we want to automatically store the freeze-dry version EDIT: IF user has specified; freeze-dry-bookmarks boolean in local storage will denote this

I'll update the fetch-page-data module later to optionally do this, so when you call it with a URL it will return { favIconURI, content, freezeDryHTML }. Currently just returns favIconURI and content.

EDIT: Done. Use the opts arg doc'd in fetch-page-data to specify wanted page data.

@poltak poltak mentioned this pull request Jul 10, 2017
3 tasks
poltak added a commit that referenced this pull request Jul 14, 2017
Limit freeze dry versions. Addresses most of #42, but option 1 will be addressed further in #46
@bohrium272
Copy link
Member Author

bohrium272 commented Jul 15, 2017

Get existing page doc OR create using data from fetch-page-data.js

What if I call reIdentifyOrStorePage() with a null tabId from https://github.com/WorldBrain/WebMemex/blob/master/src/page-storage/store-page.js#L42 ?

@bohrium272
Copy link
Member Author

@poltak added some code for creating page and bookmark docs. Would like your review 馃槄

//https://developer.mozilla.org/en-US/Add-ons/WebExtensions/API/bookmarks/onCreated
//Check if there already exists a page with this URL.
const samePageCandidates = (await findPagesByUrl({url})).rows.map(row => row.doc);
const reusablePage = await tryReidentifyPage({null, bookmarkInfo.url, samePageCandidates})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tryReidentifyPage, while still not implemented, is intended for the flow of a page visit, as it requires a browser tab. I think easier here is to just set reusablePage to the latest page from samePageCandidates results, if any. If reusablePage not null, it should skip the fetchPageData() and related logic below.

//Check if there already exists a page with this URL.
const samePageCandidates = (await findPagesByUrl({url})).rows.map(row => row.doc);
const reusablePage = await tryReidentifyPage({null, bookmarkInfo.url, samePageCandidates})
const pageData = fetchPageData(bookmarkInfo.url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetchPageData requires an object with keys url and opts flags specifying which data to attempt to fetch.

Also use await as it's an asynchronous function (Promise returning function), so you'll need to wait for the result before continuing on. Without await, the call to fetchPageData will return and set pageData to a Promise object and continue on immediately.

When you use await, you can surround the call with try { } catch (err) { } finally { } to handle any errors thrown during the Promise's life, which fetchPageData takes advantage of as there's many cases where certain data can't be fetched from a URL. In the case of a fetch error, a minimal page doc can still be made.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I missed await by mistake. Anything specific to do if an error occurs?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, still make the page doc but only with url, title, and timestamp + id data from the BookmarkTreeNode as the doc _id's timestamp and nonce values, respectively. The generatePageDocId from src/page-storage can help you create the _id.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that fetchPageData has some default options set. Any particular requirement for a bookmark?
EDIT: Nevermind, I saw they are all set to false 馃槄

url: bookmarkInfo.url,
content: pageData.content,
_id: getPageId(pageData),
_attachments: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good here with the doc structure and fields. With _attachments, Pouch will expect each one to be an object of at least shape: { content_type: string, data: Blob }. For content_type, if you already have a Blob instance, you can use .type property to get the content_type. So here content_type: freezeDryBlob.type.

Also instead of importing formatFavIconAttachment, that function just wraps around a call to dataURLToBlob from blob-util package. So I think better to use that directly here on the pageData.favIconURI.

}
const bookmarkDoc = transformToBookmarkDoc(pageDoc, bookmarkInfo)
db.put(bookmarkDoc);
db.put(pageDoc);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have multiple docs to insert, just use db.bulkDocs([pageDoc, bookmarkDoc]) and leave scheduling to Pouch. As these calls return Promise, putting those two calls without await queues both doc insert Promises on the event loop around the same time.

Overall the general flow of what happens on bookmark create, and your understanding of how all the different parts works seems fine! These are just little things.

const favIconBlob = await dataURLToBlob(pageData.favIconURI)
const freezeDryBlob = new Blob([pageData.freezeDryHTML], {type: 'text/html;charset=UTF-8'})
try {
const pageData = await fetchPageData(bookmarkInfo.url, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike var, const and let aren't hoisted so here pageData is just declared within the scope of the try. Means that the two lines above that get the blobs should throw reference errors trying to access things on undefined pageData.

I would think they should be inside the try, as they can only be acted on if fetchPageData doesn't reject.

}
} catch (err) {
console.log("Error occurred while fetching page data: " + err.toString())
} finally {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finally blocks happen regardless of whether an error is thrown in the try block, i.e., always, even if you return from try/catch. They're generally a good place to do stuff that you want to do regardless of outcome. So the db.bulkDocs() call might be good here and in the try and catch you just alter some pageDoc and bookmarkDoc variables that are declared in the outer scope?

const freezeDryBlob = new Blob([pageData.freezeDryHTML], {type: 'text/html;charset=UTF-8'})
try {
const pageData = await fetchPageData(bookmarkInfo.url, {
includeFreezeDry: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Freeze-dry should only be included and attached to the page doc if the freeze-dry-bookmarks boolean is set in local storage (handled in Options page). But maybe just leave that to last as it should just be some little conditions. Apart from that, looking good so far!

}
} finally {
const bookmarkDoc = await transformToBookmarkDoc(pageDoc, bookmarkInfo)
db.bulkDocs([bookmarkDoc, pageDoc])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pageDoc is still not in scope here. try, catch, finally blocks all have their own scope, and don't overlap. Best thing just declare it before the try block (in the inherited parent scope) as let pageDoc = { url: ..., ... }, then add content and attachments to the object in the try block. catch block shouldn't need anything - if it fails, we can't and don't need to do anything.

Also you should put this entire addListener callback in its own function somewhere and then you could test it out in dev by importing it from the UI and making a simple button to call it.

@bohrium272
Copy link
Member Author

I separated the function, I guess that's what the conflict is about 馃槄

Copy link
Member

@poltak poltak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. The idea and flow seems to be all right, but just some used interfaces are wrong. But you should be able to test this in your browser and see where the errors are.

This branch is far behind master now as well, so I think it would be wise to rebase on master, making sure it works with latest changes. Should only be one simple conflict in src/background.js.

const samePageCandidates = (await findPagesByUrl({urlToFind})).rows.map(row => row.doc)
if (samePageCandidates.length > 0) {
let pageDoc = {}
for (var i = samePageCandidates.length - 1; i >= 0; i--) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating over the results isn't necessary here. If you look at the code for findPagesByUrl it uses exact match of the url in the query selector, hence this loop is always going to break on the first value. Basically you should only need the last page doc it finds. They should be sorted by time, but not sure which order. Either the first or last result should be the one you want to assign pageDoc to

}
try {
console.log("Trying to get page data")
const freezeDryFlag = window.localStorage.getItem('freeze-dry-bookmarks')
Copy link
Member

@poltak poltak Jul 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

window.localStorage and browser.storage.local are different. All of our stuff is in browser.storage.local, so you will want to access it via that API. Have a look around other parts of the code to see examples of fetching values from browser.storage.local

EDIT: here is a discussion about it

}
if (freezeDryFlag) {
fetchPageDataOptions.includeFreezeDry = true
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this condition should be part of fetchPageDataOptions declaration. eg const fetchPageDataOptions = { includeFreezeDry: freezeDryFlag, ... }

if (freezeDryFlag) {
fetchPageDataOptions.includeFreezeDry = true
}
const pageData = await fetchPageData(bookmarkInfo.url, fetchPageDataOptions)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a look at the function signature for fetchPageData. It accepts an object of arguments, so you would call it something like fetchPageData({ url: bookmarkInfo.url, opts: fetchPageDataOptions })

}, {
content_type: favIconBlob.type,
data: favIconBlob,
}]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a read of how to store attachments in Pouch here. The doc._attachments field when inserting should be an object with named keys rather than array.

@poltak poltak merged commit 1c3d80d into WorldBrain:master Jul 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants