Interest in adding autocomplete widget? #70

Closed
KyleAMathews opened this Issue Dec 30, 2011 · 22 comments


I wrote a JS fuzzymatching library for a GTD app I'm building. I'm using it to power an autocomplete widget for quickly jumping between places within the app.

I've been following (and learning from) this project with interest for a while and would like to help out if I can. If there's interest, it'd be pretty easy to wire up an autocomplete widget that searches across all loaded pages.

You can see a demo of the library at http://kyleamathews.github.com/Fuzzymatcher.js/

Owner

WardCunningham commented Jan 3, 2012

Kyle -- The demo is awesome. What is the best way to integrate it? Some ideas come to mind:

  • Search on page names, not page content.
  • Load a list of page names from the origin server (new function)
  • Possibly load and merge similar lists from nearby federated servers

A server might be judged nearby when a page citing it is in the dom. As more pages are loaded, the breadth of the search would expand. This seems natural enough, no?

Sounds perfect!

It also makes sense to group results, as I'm doing in my GTD app. The screengrab below shows a sample search. We could do the same here but group by server, with the origin server always first.

Another part of my design that might have application here is the metadata I'm adding to the results. The yellow stuff are tags and the green are projects. I'm not familiar enough with this project to know but there might be metadata which would be helpful to add to the results.

sample autocomplete
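
A minimal sketch of the grouping idea above: results grouped by server, with the origin server always first. The hit shape (`{site, slug}`) and the function name `groupBySite` are assumptions for illustration, not part of the existing client.

```javascript
// Group search hits by site, listing the origin server's group first and
// the remaining sites alphabetically. The {site, slug} hit shape is assumed.
function groupBySite(hits, origin) {
  const groups = new Map();
  for (const hit of hits) {
    if (!groups.has(hit.site)) groups.set(hit.site, []);
    groups.get(hit.site).push(hit);
  }
  const sites = [...groups.keys()].sort((a, b) => {
    if (a === origin) return -1;
    if (b === origin) return 1;
    return a.localeCompare(b);
  });
  return sites.map((site) => ({ site, hits: groups.get(site) }));
}
```

Sorting the non-origin sites alphabetically is only one choice; ordering them by how recently they were visited would fit the "nearby server" idea just as well.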

Owner

WardCunningham commented Jan 3, 2012

Each federated server has a gradient based favicon that looks good even when squeezed to font size proportions. Each page has a Title which includes mixed case and punctuation. This is converted to a "slug" which appears in urls. We sometimes include the first paragraph of a page with its title and favicon to make a bibliographic style citation. All of these would be good choices.
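
A rough sketch of how a Title might be turned into a slug. The exact rule used by the project is not stated here, so this conversion (whitespace to hyphens, other punctuation dropped, lower-cased) is an assumption, and `slugify` is a hypothetical name.

```javascript
// Assumed conversion: whitespace becomes "-", anything other than letters,
// digits and "-" is dropped, and the result is lower-cased.
function slugify(title) {
  return title
    .replace(/\s/g, "-")
    .replace(/[^A-Za-z0-9-]/g, "")
    .toLowerCase();
}
```

Because the conversion discards case and punctuation, a slug cannot be turned back into the original Title; search results that want the real title need it from the page itself.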

A good way to proceed is for you to fork the repo and then code away. If you'd like my help, let me know how I can support you.

Sounds great. When I have some free time again, I'll put together a little prototype to see how things come together.

One quick question -- is there a list anywhere of publicly available servers? That'd be helpful for testing this.

Owner

WardCunningham commented Jan 4, 2012

There isn't a public list yet. Since login logic doesn't exist yet, any server is quite vulnerable. Here is one that doesn't save edits: http://home.c2.com:1111

Update: There are several public (but not yet advertised) sites with suggestive content:

Owner

WardCunningham commented Jan 20, 2012

Kyle -- Let me know what would help you get started on a prototype. I'm excited to see this capability work before we have too many pages to find by browsing. -- Ward

This little thing called paid work has been getting in the way :) Nasty business, making money.

There are two things I could use help with.

One, how to get and maintain a record of what pages are available on each server the client is connected to. You mentioned above the possibility of creating a url that would return that information. That'd work great, as would converting to Backbone and creating models for each wiki page on page load. Backbone would be especially ideal, as the autocomplete widget could just watch for events on the page collection and keep itself updated that way.

Two, how to open a page once someone selects it from the autocomplete widget. I'm assuming there's a function of some sort that's called?

Owner

WardCunningham commented Jan 23, 2012

In client.coffee:

doInternalLink = wiki.doInternalLink = (name, page) ->
    << function to open new page >>

and

resolveFrom = wiki.resolveFrom = (addition, callback) ->
    << function that manages resolutionContext for links >>

in server/sinatra/server.rb:

get '/system/slugs.json' do
    << function to return list of pages as array of slugs >>

I just pushed this server side route. Try http://ddd.fed.wiki.org/system/slugs.json for a demo.

None of this is perfect. I know.

I suggest you add a search box in the footer next to the login stuff. Search only the origin server using /system/slugs.json. Don't worry that these aren't true pages titles, we'll get there. Don't worry about other sites mentioned in the dom yet. But resolutionContext is the sort of thing you will eventually want.
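
A minimal sketch of searching the origin's slugs, assuming the array returned by /system/slugs.json is already in hand. It uses a plain in-order subsequence test as a stand-in for real fuzzy matching; it is not the Fuzzymatcher.js algorithm, and the names `isSubsequence` and `searchSlugs` are hypothetical.

```javascript
// True when every character of `query` appears in `slug` in order.
// A simple subsequence test, not the Fuzzymatcher.js algorithm.
function isSubsequence(query, slug) {
  let i = 0;
  for (const ch of slug) {
    if (i < query.length && ch === query[i]) i++;
  }
  return i === query.length;
}

// Filter an array of slugs (as returned by /system/slugs.json).
// Shorter slugs rank first as a crude relevance measure.
function searchSlugs(query, slugs) {
  const q = query.toLowerCase().replace(/[^a-z0-9]/g, "");
  if (q === "") return [];
  return slugs
    .filter((slug) => isSubsequence(q, slug.toLowerCase()))
    .sort((a, b) => a.length - b.length || a.localeCompare(b));
}
```

For example, the query "wel vis" would match both `welcome-visitors` and `well-visited` but not `indie-web-camp`. A selected result could then be handed to wiki.doInternalLink to open the page.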

We'll have Backbone soon enough. The issues search surfaces will help guide the Backbone models.

Contributor

asolove commented Jan 23, 2012

In the touch client I am pondering how to load in the page data. I am leaning towards populating a collection of pages with just their title and site attributes, then fetching the actual page content only when they are displayed. Does that sound like a reasonable approach?

I am curious about how we see the search widget fitting with the rest of the UI, specifically

  • Is it part of a page (perhaps the welcome page?) or in the chrome outside any of the pages?
  • When opening a link, do we close all the existing pages as if we are starting over afresh? Or should we open the new page to the right of existing pages, as if we are following along the existing path?

@wardcunningham looks good!

@asolove I don't see any reason not to load everything. Most pages are quite small so loading everything in one JSON object wouldn't take that long and the interface feels much more responsive when the wiki pages load instantly vs. having to wait for a roundtrip to the server. Also, the code is a lot simpler when you're not doing lazy loading.

Contributor

asolove commented Jan 23, 2012

@KyleAMathews Right now, all images are stored as base64-encoded strings inside the json for the article. So I don't think pulling everything down is a real option performance-wise. (We should, of course, just allow image uploading whenever we can convince all the server implementations to do so.) As far as the code is concerned, the view is already binding to change events on the model, so it is as simple as model.fetch() and waiting for things to update.

@asolove fair enough.

Owner

WardCunningham commented Jan 23, 2012

I suspect many federated sites will be modest. How much can one person write?

But then, one could easily clone ten pages for every page they write themselves. Let's call that the fork-ratio. It might be 100 or more. So if you're thinking you'd have 100 pages then you probably need room for 10000.

I've also found it convenient to generate pages from other databases using scripts like this. One site I use at work has 40,000 pages.

I'd be content to know that a place has been preserved for pre-fetching page titles or even whole pages. This would have to be considered an optimization that might not always work.

Aside: I abandoned the notion of a "red link" that is known to not resolve. It was more valuable when turn-around was really low and the "you write it" page seemed a slap-in-the-face when it completely replaced the originating page. I've considered turning a blue-link red should it be found to not exist via 404. Maybe then a second click would mean go ahead and make the new page.

Owner

WardCunningham commented Jan 23, 2012

Drag-and-drop'd images are currently stored in base64 encoded data urls. It would be possible for a smart server to quietly move this data into an asset-management system based on only the current protocols. This would mean that when you fork the pages you get the asset reference instead of the asset. But maybe a smart fork could move related assets between asset management systems. Let's stick with the simple data urls for now even if it means avoiding huge pictures. It has the right semantics and is already working.

Our very first images were hand-crafted into the JSON as http references. For example, the Indie Web Camp page calls out a remote asset rather than embedding it as a data url.

@wardcunningham My Saturday is looking nice and empty so I'm going to tackle getting this built. Has anything changed since #70 (comment) that'll affect what I need to do?

Owner

WardCunningham commented Feb 10, 2012

The client now uses the information collected in the resolutionContext when following internal links. This goes a long way toward making a corner of the federation feel like a single site. The current client-side complexity led me to dynamically construct a related fetchContext which is searched, 404 at a time, until an extant page is found.

Note, if I had the slugs.json for each potential server in client-side memory then I wouldn't have to ping a sequence of servers looking for the page. Your search logic must anticipate where my links might lead. That would have you doing a breadth-first search over the space that I'm just threading through. Maybe we can work together.
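
A small sketch of that idea: resolving a link from cached slug lists instead of pinging servers one 404 at a time. The `slugCache` map (site to array of slugs) and the function name `resolveFromCache` are assumptions; `context` stands in for the ordered list of sites the client would otherwise probe.

```javascript
// `slugCache` is a Map of site -> array of slugs already fetched from that
// site's /system/slugs.json. `context` is the ordered list of candidate sites.
// Returns the first site known to have the page, or null if none is known.
function resolveFromCache(slug, context, slugCache) {
  for (const site of context) {
    const slugs = slugCache.get(site);
    if (slugs && slugs.includes(slug)) return site;
  }
  return null; // not in any cached list: fall back to probing servers
}
```

A null result does not prove the page is missing, only that no cached list mentions it, so the existing sequential probing would still be needed as a fallback.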

Suppose that we keep a list of servers for which we've retrieved the slugs.json. Then, when I follow a link, and it takes me to a server not on that list, I pull the slugs.json for it. Maybe in an hour of browsing I touch ten sites. Then you would have ten lists to search, all already in memory. Here is what that might seem like to the user:

  • When I search for something I may not find it.
  • Disappointed by search, I go looking on my own, still no luck.
  • So I try search again, this time I get lots more hits, all from the places I'd been looking.

This seems rather natural. When I search for food after browsing dog sites, I'm likely to find dog food.
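
The behaviour described above could be sketched as a per-site cache that grows as links are followed. The class name `SlugCache` and its methods are hypothetical; the actual fetch of each site's slugs.json is left to the caller so the sketch runs without a network.

```javascript
// Hypothetical cache of slug lists, keyed by site.
class SlugCache {
  constructor() {
    this.bySite = new Map();
  }

  // True when a followed link lands on a site whose slugs.json
  // has not been stored yet, so the caller knows to fetch it.
  needs(site) {
    return !this.bySite.has(site);
  }

  // Record the array of slugs fetched from a site's /system/slugs.json.
  store(site, slugs) {
    this.bySite.set(site, slugs);
  }

  // Plain substring search across every cached list.
  search(term) {
    const hits = [];
    for (const [site, slugs] of this.bySite) {
      for (const slug of slugs) {
        if (slug.includes(term)) hits.push({ site, slug });
      }
    }
    return hits;
  }
}
```

Searching for "food" with only the origin cached finds nothing; after a dog site's list has been stored, the same search returns its `dog-food` page.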

A good start would be to assume that the slugs.json cache is present whenever searching. The client should preload this cache with the slugs.json from the origin. Call it wiki.searchCache and put objects in there that you'd like to search. Maybe objects like:

site: wiki.example.com
slug: welcome-visitors
title: "Welcome Visitors"
icon: http://wiki.example.com/favicon.png
synopsis: "Welcome to the Smallest ..."

The title: and synopsis: would be optional but could be present if the file has been read. Perhaps icon: is redundant since it can be constructed from site:. This depends upon you and when you think that construction should happen. Maybe the server could deliver title and synopsis with a different protocol than slugs.json. Lots depends on how fast your code wants to work.
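
One way to build such objects, as a sketch: derive icon: from site: at construction time and attach title: and synopsis: only when the page has been read. The function name `makeEntry` and the optional `page` argument are assumptions; the favicon URL pattern follows the example object above.

```javascript
// Build a searchCache entry. `page` is optional and, when present, is assumed
// to carry `title` and `synopsis` strings from a page that has been read.
function makeEntry(site, slug, page) {
  const entry = {
    site,
    slug,
    icon: `http://${site}/favicon.png`, // constructed from site, per the example
  };
  if (page && page.title) entry.title = page.title;
  if (page && page.synopsis) entry.synopsis = page.synopsis;
  return entry;
}
```

Constructing the icon up front keeps the rendering code simple; deferring it until display would save a little memory per entry when the cache holds many thousands of slugs.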

(If we add a modification timestamp then we can generate a Recent Changes report from the same information. See 009694e)

Didn't get as far as I hoped Saturday. But I did get the development environment setup and read through the code so hopefully later this week I can jump on this again.

Owner

WardCunningham commented Feb 13, 2012

Cool. Any mysteries?

Nope, the code is pretty straightforward and it looks like it won't be too bad to do as you suggest and fetch the slugs.json file from each server you connect to and store that locally.

Things would definitely be easier with Backbone -- is anyone actively working on that? Backbone is a perfect fit for SFW so the transition would be smooth.

Contributor

asolove commented Feb 14, 2012

I am actively working on the Backbone rewrite here: https://github.com/asolove/Smallest-Federated-Wiki/commits/master Right now, it is capable of displaying things exactly like the current version but does so in an MVC control flow for everything except the plugins. The next step is converting the existing plugins to each have a Backbone view, which will also mean standardizing the way that plugins specify what actions, buttons, etc. are available. If you are interested, definitely feel free to work on something and send back pull requests.

Ah, found it -- the backbone stuff is all under the touch directory. I knew you'd said you were working on Backbone-ifying things but couldn't find anything before when I went looking.

Cool, I'll take a gander through your code then.

Owner

WardCunningham commented Sep 19, 2012

We haven't done the incremental searching or the fuzzy matching. But we do have a pretty good approach to federated search so it seems like a good time to close this issue. Of course we're always interested in seeing something new work. And now there is data on the client to be searched.
