Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

[express] handle the "unclaimed" case in authentication & "asynchronize" owner read/write #107

Merged
merged 2 commits into from Jan 27, 2012

Conversation

Projects
None yet
3 participants
Contributor

thomaslee commented Jan 27, 2012

Ward & I walked through these changes pretty extensively, but feel free to hit me up if you have any questions.

Essentially just addressing a few bugs in the node/express port of the SFW server.

Collaborator

nrn commented Jan 27, 2012

Thanks Tom. I appreciate the contribution, but I do have a couple of questions though. I know I've got a lot of bugs still, but in the interest of learning as I go here it would help to know what bugs a commit fixes.

The changes to authenticated also worry me. It's route middleware that could potentially get called a lot, and now requires a hit to the file system where it didn't before. I'm also not sure we want the default behavior of a wiki that thinks it's unclaimed to be treat everyone like they own it. The idea now is that an unclaimed wiki can be claimed, but to exploit that on an established wiki you would have to break it in such a way that the wiki thought it wasn't claimed, but it wasn't so broken that you couldn't reclaim it yourself. With this new setup anything falsy happens in the path.exists call and the wiki is accepting all writes.

The setOwner call back stuff is probably a good idea though. I hadn't run into a problem, but that should be more robust going forward.

thanks
--nick

Contributor

thomaslee commented Jan 27, 2012

Hey Nick,

Quick response before I crawl into bed -- I've already stayed up far too late :)

As you've identified, there's room for improvement on the performance of authenticated() -- we can cache the "claimed" state in memory, for example.

wrt these specific changes, we were just trying to address what I think might just be a bit of a misunderstanding: if I understood Ward correctly (and I'm sure he'll correct me if I'm lying!) an "unclaimed" Wiki should be editable by anyone. The ruby/sinatra code base reflects this (see server.rb:84 and server.rb:208 as at b24872c...).

https://github.com/WardCunningham/Smallest-Federated-Wiki/blob/master/server/sinatra/server.rb#L84
https://github.com/WardCunningham/Smallest-Federated-Wiki/blob/master/server/sinatra/server.rb#L208

Once a Wiki has been claimed, the openid filesystem "tag" gets created. The logic in master for the express server seems to be overlooking the "unclaimed" case.

Hope I'm making sense there!

I don't understand most of what you were saying from "The idea now is ..." onwards, sorry -- it must be getting too late.

And yup, the setOwner thing was more a simple "correctness" thing. Wasn't causing issues, wasn't likely to cause issues, but theoretically it could have caused issues. :)

In any case, thanks for taking a look.

Cheers,
Tom

Collaborator

nrn commented Jan 27, 2012

Yeah, as you can tell I was up way too late as well :)

So instead of having separate claimed logic like the ruby implementation does, I've instead been using the owner variable to show that the wiki is unclaimed if it is falsy. The whole
weird dance with owner and setOwner was so that we didn't have to read the file system every time a request came in to see if a wiki was claimed, and if so, by who. We could cache the claimed status, but I think a falsy owner covers all the use cases I can see.

Good point on how the ruby works, I had missed that. So I'll have to champion the idea that current behavior is wrong :)
Edit: Whoops, I put fake tags around this and it disappeared them.
(fictional_user_docs)When started, a wiki by default is a read only place. Any changes a visitor makes are stored only to their local computer, for them to own on their local file system. Only when someone steps up and claims a wiki as their own can it be edited on the server, and then only by them.(/fictional_user_docs)

I think that this behavior would improve security, making the default action if a wiki is broken and can't tell who owns it to be read only instead of accept all writes. It would also prevent heartbreak from users in the case where someone starts editing a wiki thinking "i'll claim this later, for whatever inane reason i'm too lazy right now, even though I'm putting in time, effort, and important data", and then someone else just comes along and claims it out from under them.

Looking back up I appear to have mostly been babbling incoherently after "The idea now is", sorry. I think I covered my arguments in a less crazy way this time... I hope.

I am absolutely glad for "correctness" changes, my ideas on best practices in node are still evolving, so that is actually quite helpful :)

Thanks,
--nick

Owner

WardCunningham commented Jan 27, 2012

I'm going to hold off on this merge until Nick has a chance to think through the logic of the changes we made. I'll explain myself here.

Why make unclaimed sites writable?

Stephen and I made this choice when we first added authentication. I encouraged it because it preserved the existing behavior and kept our integration tests working. Its also how the original wiki worked: 100% open.

Nick correctly points out that anyone could claim a shared wiki. That doesn't make sense. It is one more way that our federated codebase does not support effective sharing. This doesn't concern me. I'd rather find more ways to get people into their own servers.

I've used this "write first and claim later" workflow to amaze people when I demo. Faced with the protest: "I don't want to setup a server" I just say, "You already have one". We then go to "their site" on my farm and start editing. Two edits into the experience they start wondering how this could possibly happen and what makes it theirs. Then I say, one more step: claim this site by pressing this (G) button right here. Done.

This is an instance of the Have-Fun-First pattern for encouraging viral behavior. Here is what the viral loop could look like:

  • read something
  • edit something (privately)
  • edit something (publicly)
  • claim the edits
  • recruit readers (the inner-loop)
  • develop frustration with particular farm
  • move content between farms (see below)
  • launch personally owned site
  • turn personally owned site into farm
  • recruit writers to new farm (the outer-loop)

I'm suspicious of most viral loops as get rich quick schemes. However, since we can return to the author (or anyone else) the complete contents of the site as one large JSON, we give everyone the opportunity to defect. Git has this property too. GitHub does not.

Why touch the filesytem with each request?

Touching the filesystem with each request is foolish in a high volume environment. And, of course, high volume is where node.js is known to shine.

However, when I first got the farm logic working in sinatra I blew an important demo because I had cached information incorrectly flowing between sites. I've since de-optimized the sinatra server in order to get the logic right first.

My suggestion is to write node.js code in the style that confines de-optimizations and follows all the performance patterns one would expect of node.js code, even if it isn't currently optimized.

Now what?

More discussion? Do the pull? What do you think?

Collaborator

nrn commented Jan 27, 2012

I definitely agree that the do things fist and set it up later workflow amazes people. However I think a typical scenario will see this demoed virally with the demoers own wiki, making the magic editing locally, and then pushing the demoee's changes up to a farmed server they then own. The other way still works with an unclaimed site and default-data, but the demo is local until the claim happens. I think this setup does more to funnel visitors into contributors then an open site, however I'm willing to give up this argument to you're judgement here.

However, I'm still going to argue about how we handle the claimed/owned state internally in the express port :)
So I see two options:
1) we hit the file system every time to see if the wiki is owned, and if so by who
2) we could keep track of it in memory and only hit the file system when needed.

This commit moves us to what I see as a bad combination of the two where we track ownership in memory, but then don't trust that owner information as to the claimed status. While I agree that emphasizing the logic over the performance is key at this stage,I don't think that tracking the owner in memory and persisting to disk is premature optimization.

Collaborator

nrn commented Jan 27, 2012

If i'm not overstepping my bounds what I'd like to do is merge in the setOwner cb changes, but remove isClaimed, and instead trust that a truthy owner is the same as the wiki being claimed, and go with your judgment on changing an unclaimed wiki until I can come up with more compelling arguments :)

Owner

WardCunningham commented Jan 27, 2012

Nick, I like your plan. Can you pull the mods from Tom's repo and put together a pull request?

One thing more you could do: We thought that the claim logic modifications would make the integration tests pass again. Unfortunately we had trouble getting RSpec/Sinatra running on Tom's machine. It was late. Would you see if our work makes the tests pass again. It would make us feel good to know that. Thanks.

Collaborator

nrn commented Jan 27, 2012

Ohh, good point, that is probably the main problem! I was thinking the tests were using my view of how an unclaimed wiki worked. I'll go ahead and do that.

@WardCunningham WardCunningham merged commit 4278d66 into WardCunningham:master Jan 27, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment