Legal documents fathead plugin #37

Closed
wants to merge 2 commits into from

3 participants

@megamattron

Hi, here's a plugin for Docracy's complete database of open source legal contracts. Since I work at Docracy I just made a url that generates a fathead compatible output.txt, so you'll see that the fetch.sh is basically all you need, it just saves the output of the url straight to output.txt.

Wasn't sure what to do about the category, I kept it a generic "legal documents" rather than getting more specific. We do have finer grained categorization though if that would be interesting.

Hope I did this right!

@megamattron megamattron referenced this pull request in duckduckgo/zeroclickinfo-longtail
Closed

Longtail defined enough for development? #1

@rpicard

Thanks for your work on this, but I don't quite see what the use case here is. How would this information help someone searching for "NDA"?

NDA A           legal_documents                                     A quick and dirty one-way (disclosing party-to-recipient) NDA that can be easily modified to suit your particular circumstance.    http://www.docracy.com/0fj9jsndtwt/nda
@megamattron
@yegg
Owner

Fathead is meant to explain the topic and then link to someplace with more information, like

https://duckduckgo.com/Non-disclosure_agreement

does now. Maybe you just want to change the keywords to

NDA example
convertible note example

i.e. add example and then it would make more sense?

@megamattron
@yegg
Owner

Yup, the title is the keyword :). Fathead is essentially a keyword mapping with a bit of fuzziness around the edges.

@megamattron
@yegg
Owner

Yup, duplicate it. Usually we have triggers where you don't have to put it, but in this case it doesn't make too much sense to add example and sample as triggers, so I'd just duplicate it.

@yegg
Owner

Actually, if you want to get fancy you can make a redirect from one to the other :)
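A minimal sketch of the two options discussed here (duplicating an entry per suffix versus redirecting one title to another). It assumes a 13-column, tab-separated output.txt with the title, type, redirect target, categories, abstract, and source URL at the positions shown; those positions are an assumption inferred from the sample row quoted earlier in this thread, and the helper names are hypothetical.

```python
# Assumed layout of one output.txt row: 13 tab-separated fields.
# The column positions below are an assumption, not confirmed in this thread.
FIELDS = 13
TITLE, TYPE, REDIRECT, CATEGORIES, ABSTRACT, SOURCE_URL = 0, 1, 2, 4, 11, 12


def article_row(title, categories, abstract, url):
    """Build a full article ("A") row."""
    row = [""] * FIELDS
    row[TITLE] = title
    row[TYPE] = "A"
    row[CATEGORIES] = categories
    row[ABSTRACT] = abstract
    row[SOURCE_URL] = url
    return "\t".join(row)


def redirect_row(alias, target):
    """Build a redirect ("R") row pointing alias at an existing title."""
    row = [""] * FIELDS
    row[TITLE] = alias
    row[TYPE] = "R"
    row[REDIRECT] = target
    return "\t".join(row)


def rows_for(title, categories, abstract, url, suffixes=("example", "sample")):
    """One article row for the first suffix, redirects for the rest."""
    first = f"{title} {suffixes[0]}"
    rows = [article_row(first, categories, abstract, url)]
    rows += [redirect_row(f"{title} {s}", first) for s in suffixes[1:]]
    return rows
```

With this shape, "NDA sample" carries no abstract of its own and simply points at "NDA example", so the text only has to be maintained in one place.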

@megamattron

Ok, I've made the change to the data export to add sample and example to all the entries, and I've updated the example queries to reflect that.

Do you guys have a way to test different plugins for relevancy, i.e. by click rates or something like that? I ask because I suspect that a lot of people are actually looking for an example of a document, rather than a description of a document when they search for something like "NDA". Would be nice to see if that was the case.

In any case, let me know if you need anything else!

@yegg
Owner

I definitely want to do what we think is best, though there aren't any great tests for relevancy on this small a slice of the query space.

You'd be surprised -- the primary use case is generally to get a quick definition of what the thing is, though in this case it could be 50/50.

So I could see a path to a third (better) way where you have a first sentence that defines the term and then a second that describes the example doc and then you click for more.

@megamattron
@yegg
Owner

We do have some hybrids internally, e.g. https://duckduckgo.com/?q=blink+182 is a fathead plugin but the schedule info is coming from spice.

However, none of this stuff is yet exposed in an easy to use fashion.

I think a better approach would be to rewrite this request to be the third option and then we'd make it preferential. OR we could launch with what we have (the example stuff) and then work toward it for v2.

@megamattron
@yegg
Owner

It would be easier to do that insert for a Spice plugin, which is designed to add additional information.

@megamattron

Ok, do you need anything else from me for this v1 fathead plugin, or can we put it up? Happy to extend it to provide additional info to a separate spice plugin in the future too.

@yegg
Owner

I'll let @rpicard take over now!

@rpicard

@megamattron Hey Matt, I'm still having trouble understanding the use case that this solves. Who is searching for "Y Combinator Series AA Investors' Rights Agreement sample" or "Old Docracy Terms Of Service (Beta Version) sample"?

I can sort of see it for something like this:

Consulting Agreement sample A           legal_documents                                     A standard consulting agreement for a Delaware corporation that needs to hire a consultant. http://www.docracy.com/49/consulting-agreement

...but even for something like that or NDA example, which I can imagine someone searching for, it seems like this is just a search result, which doesn't really make sense as Zero-Click Info. ZCI are supposed to provide the user with information right on the results page. I don't see what "Zero-click" information this provides the user on the results page.

Maybe there's some way to modify the purpose of the plugin to make it more useful as ZCI, rather than just a result pinned to the top of the page.

@megamattron
@rpicard

1) I think that if you just have a row for each one, like "intern offer letter" and "bill of sale," you won't need to have a separate one for " * example" and " * sample." I can add those words as triggers on the back end.

2) For the abstract, maybe prefixing it with something like "Document example: " would make it more clear that you're talking about a specific document, and not giving a description of "consulting forms" in general.

3) Also, you have some documents with titles like "Consulting Agreement for a Fixed Price Contract (1)." Maybe dropping the "(1)" would make it more useful. Of course, that means you would end up with conflicts after you also dropped the "(1) (1)" from "Consulting Agreement for a Fixed Price Contract (1) (1)," so you'd have to drop one altogether. I'd say one with a title that might actually get searched is better than two that are just taking up space.

4) Looking at the data, it seems to me that you might be able to do some processing to generalize some of those really specific items. For example:

  • "Barebones Contracting Agreement" -> "contracting agreement"
  • "Basic Hosting Agreement" -> "hosting agreement"
  • "Generic shortform NDA" -> "shortform NDA"
  • "Simple release for events" -> "release for events"
  • "Y Combinator Series AA Termsheet" -> "series AA termsheet"
  • "Foundry Group standard Bylaws" -> "standard bylaws"

Automatically dropping the leading word(s) when they appear on a blacklist (barebones, basic, generic, simple, Y Combinator, Foundry Group, etc.), and then dropping duplicates, would improve the quality of the data. :sparkles:

5) I like the idea of action links. One that says [customize] and points to this page could be useful.

What do you think?
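Points 3 and 4 above could be sketched roughly as follows. The blacklist contents are seeded from the words named in the comment, the function names are hypothetical, and casing is left untouched rather than lowercased, so treat this as an illustration and not as the processing Docracy actually ran.

```python
import re

# Hypothetical blacklist of leading words, taken from the comment above.
LEADING_BLACKLIST = (
    "y combinator", "foundry group", "barebones", "basic", "generic", "simple",
)

# Matches one or more trailing branch markers such as " (1)" or " (1) (1)".
BRANCH_SUFFIX = re.compile(r"(\s*\(\d+\))+\s*$")


def generalize(title):
    """Drop trailing "(n)" markers, then any blacklisted leading words."""
    title = BRANCH_SUFFIX.sub("", title).strip()
    changed = True
    while changed:
        changed = False
        lowered = title.lower()
        for prefix in LEADING_BLACKLIST:
            if lowered.startswith(prefix + " "):
                title = title[len(prefix) + 1:].lstrip()
                changed = True
                break
    return title


def dedupe(titles):
    """Generalize each title and keep the first of any case-insensitive duplicates."""
    seen = set()
    kept = []
    for title in titles:
        general = generalize(title)
        key = general.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(general)
    return kept
```

Keeping the first occurrence is an arbitrary choice here; a real export might prefer the most viewed or most recently updated document among the duplicates.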

@megamattron

1) Sounds good.

2) No problem. But keep the same abstract that we have now, just prefix with that text, correct?

3) Most of those "(1)" type documents are branches of documents with limited changes, so I think I'll probably just drop them entirely. I'll check to make sure there's nothing good there, but I think that's a safe route to go.

4) I'll look into this and see what's possible. Am I correct in thinking you need a perfect match with the title for it to be a match and show ZCI? Let me take a crack at simplifying the title if that's the case, as you suggested. We also have a tagging system which indicates some generic categories, that might be useful.

5) Is there a way in the format I should specify those links, or just append them to the abstract as links?

Thanks!

@megamattron

I've updated the feed to handle points 1-4, and I'll add in 5 as soon as I can understand where to put the links. I'm not including a range of documents and have generified the titles a lot so they should come up more readily on different keyword searches. Take a look and see what you think.

@rpicard

@megamattron See the links in brackets here: https://duckduckgo.com/?q=oil+production+in+saudi+arabia

I'm thinking that's what we should do for this one.

@megamattron
@rpicard

@megamattron Yep! That's all you need to do.

@megamattron

Great, I added the links. Let me know if you need anything else!

@rpicard

It doesn't look like those commits have been added to this pull request yet.

@megamattron
@rpicard

Oh, right! My bad, I'll give it a look now.

@rpicard

It looks like there are some stray newlines in there. For example, see "Community Garden Lease" (line 581 / 582).
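One way stray newlines like these could be kept out of the export is to normalize whitespace in every field before the row is written. This is only a sketch with a hypothetical helper name; it assumes that collapsing line breaks into spaces is acceptable for titles and abstracts.

```python
def clean_field(text):
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    return " ".join(text.split())
```

Because tabs are collapsed as well, a cleaned field also cannot introduce an extra column into a tab-separated row.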

@megamattron
@rpicard

Content

These would be helpful if altered a little. It looks like they fell through some cracks in your processing:

  • [42] LinkedIn CEO Employment Letter
  • [43] Groupon Restricted Stock Unit Award
  • [44] Privacy Policy of .com
  • [55] Employment Offer Letter for CTO of

These (and others like them) probably won't be helpful in there, but it's tough to weed out everything, so I'm not going to let them hold up the Fathead. Still though, I want to mention that there are still some in there:

  • [121] Indemnification Agreement (BK) (#LegalHack Edition)
  • [133] Trademark License Agreement (#LegalHack Edition)
  • [148] #HacktheAct Pitch: DMCA for Trademarks (2.0) (DMTA)
  • [112] #HacktheAct Example Pitch by BLIP Students: DMCA for Trademarks

There are a lot of "Terms of Service" related items. I think it might be good to have the specific terms of service for searches like "Twitter terms of service" (which is in there) and others. That said, I think it would be better to use a more specific data set for those searches, such as tos;dr. I think I'll spend a little time with that, so if you want to remove the "Terms of Service" items for specific companies, it may be a good call.

Format

I wasn't able to run output.txt through the processing scripts. It looks like you may have too many tabs between the "categories" section and the "abstract" section. Please go through your processing scripts to be sure you're following the correct format. :dart:
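A quick check along these lines can catch rows with too many or too few tabs before the file is handed over. The expected count of 13 fields per row is an assumption about the Fathead format, and `find_bad_rows` is a hypothetical helper, not one of the project's processing scripts.

```python
EXPECTED_FIELDS = 13  # assumed number of tab-separated columns per row


def find_bad_rows(text, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for every non-empty line
    whose number of tab-separated fields differs from expected."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if not line:
            continue  # ignore blank lines, e.g. the one after a trailing newline
        count = len(line.split("\t"))
        if count != expected:
            problems.append((number, count))
    return problems
```

A stray newline inside an abstract shows up here too, since it splits one row into two lines that each have the wrong field count.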

@megamattron
@rpicard

@megamattron It looks like you guys are experiencing some downtime right now, so I can't fetch just yet. I wouldn't worry about the "too specific" things. You got most of them already, so the plugin isn't bloated or anything. Thanks for the note on the README. I'll take a look at that now.

@rpicard

Never mind, it looks like I can fetch again. :+1:

@megamattron
@rpicard

@megamattron Okay, I've got it deployed here: https://robert.duckduckgo.com/?q=30+day+move+out+notice

I played around with the formatting a little in Firebug, and I think it would look better if you removed "customize and" from each of the links, so they were just [download] and [sign online], and stuck a <br /> between the text and the links.

@megamattron
@rpicard

The latest changes are up on http://robert.duckduckgo.com. This is definitely getting there! It seems like this one is a little long though. The whole "Other licensing available" bit could probably be dropped. It actually seems like several of these are a little hefty. Here's another example.

What do you think the easiest way to cut out some of the fluff would be?

@rpicard

FYI: I updated the README to fix that issue you pointed out. Thanks for mentioning it!

9518c87

@megamattron

Hmm, I see what you're saying. In the first example, on our site those are actually links to other related contracts. Should I turn those back into real links? And about summaries that are too long in general, should I ellipsize them? A lot of these contracts are user submitted examples so I don't directly control the text.

Glad to help with the read me!

@rpicard

I think that using ellipses to cut them off at a certain point is a good idea. ~400 chars seems like a good starting place. You'll need to watch out for punctuation though (e.g. if the 400th character is a comma, an ellipsis would look weird coming next).

The problem with turning the URLs into links there is that it breaks the formatting of the ZCI box a little. I think that once we trim the abstracts, the URLs won't seem like so much clutter.

@megamattron

Ok, I've added a max 400 character abbreviation to the export now. I'm actually breaking on word boundaries so it's a little nicer to read, and I don't end on punctuation as you suggested. So I'm willing to go a little under the 400 max to get a nicer break more or less. Take a look and see what you think.
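The abbreviation described here might look something like the sketch below. It is not Docracy's actual export code. It assumes the 400-character limit includes the ellipsis and that any trailing punctuation, not just commas, is stripped before the ellipsis is added.

```python
import string


def abbreviate(text, limit=400, ellipsis="..."):
    """Shorten text to at most limit characters, breaking on a word
    boundary and never ending on punctuation before the ellipsis."""
    text = " ".join(text.split())  # normalize whitespace first
    if len(text) <= limit:
        return text
    budget = limit - len(ellipsis)
    head = text[:budget]
    # If the cut landed inside a word, back up to the previous space.
    if not text[budget].isspace() and " " in head:
        head = head[:head.rfind(" ")]
    # Drop trailing punctuation and spaces so "word,..." never appears.
    head = head.rstrip(string.punctuation + string.whitespace)
    return head + ellipsis
```

The result can come out somewhat shorter than the limit, which matches the trade-off described above: a cleaner break in exchange for a few characters.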

@rpicard

I've deployed your changes and I think it looks pretty good. I'll pass it along internally and get back to you with any more feedback we have.

@megamattron
@rpicard

Yes, just to robert.duckduckgo.com for now.

@rpicard

@megamattron After some discussion, we're thinking that the information provided right now just isn't very helpful. I think that we need to take a look at the information we're using here.

Sample document: Note: These documents are from http://ycombinator.com/seriesaa.html and include the following disclaimer (from that site): Y Combinator and Wilson Sonsini Goodrich & Rosati are happy to announce the Series AA Equity Financing Documents. Their goal is to make angel funding rounds for startups easier for both sides. These documents were originally created for YC-funded startups to...

There just isn't any real information in there. If we could modify the data to show something like:

Document description: Y Combinator Series AA Termsheet is an open source legal document (startup, funding).

Basically this template: $title is a an open source [...] ($categories).

I think what we could do here is use the original title (before stripping to make it more general) in the description, but keep the current title as the title element, e.g. a search for "series AA termsheet" brings up the above description.
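As a rough illustration of that template, the sketch below fills the description from the original title while keeping the generalized title as the searchable one. The function names and the dictionary shape are hypothetical, and the wording of the sentence is taken from the example description above.

```python
from string import Template

# Sentence pattern based on the example description in this comment.
DESCRIPTION = Template(
    "Document description: $title is an open source legal document ($categories)."
)


def describe(original_title, categories):
    """Render the abstract from the original (unstripped) title."""
    return DESCRIPTION.substitute(
        title=original_title, categories=", ".join(categories)
    )


def build_entry(original_title, search_title, categories):
    """Pair the generalized search title with the templated abstract."""
    return {
        "title": search_title,
        "abstract": describe(original_title, categories),
    }
```

For instance, `build_entry("Y Combinator Series AA Termsheet", "series AA termsheet", ["startup", "funding"])` keeps "series AA termsheet" as the title while the abstract names the full document.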

We could also look at incorporating some more meta data like number of pages and author.

What are your thoughts here?

@rpicard

@megamattron Hey, are you still up for working on this?

@megamattron

Yes! Sorry, we got derailed over here on a few things due to the storm. Yup, let me give this a shot and we'll see how it looks, shouldn't take long.

@rpicard

@megamattron Hope the storm didn't cause too much damage for you. Let me know if you have any questions!

@megamattron

@rpicard Ok, if you run the fetch again you'll get the new format. I also made those category links to lists of more docs of that type, let me know if that makes sense otherwise I can take it out. Thanks!

@rpicard

Thanks for getting back so fast. I'll pass it along for internal review.

Just a note, there's a small typo: "is a an open source legal document."

@megamattron
@rpicard

No problem. It's easy enough to fix with Vim. :thumbsup:

@rpicard

@megamattron The feedback I've received is that we should get rid of the [sign online] link since it's not likely to get a lot of use (Who's going to decide they want to sign the document without seeing it first?), and to move the [download] link onto the same line as the abstract (i.e. get rid of the <br>).

@megamattron
@rpicard

@megamattron Hey, sorry about the delay there. Final exams kept me busy this past week.

It looks like long titles are breaking the formatting of the ZCI box. [1] That might be something we want to account for on our end, so I'm going to ask about that and get back to you.

[1] Here's an example: https://robert.duckduckgo.com/?q=macromedia+software+eula+enduser+license+agreement

@megamattron
@rpicard

:rocket: We are live!

https://duckduckgo.com/?q=Series+AA+Investors%27+Rights+Agreement

The issue with long titles still stands, but we're going to fix that with some CSS on our side so it doesn't really affect the Fathead. Since the long titles are much less likely to actually be searched, we can go ahead and deploy without worrying about too many people coming across them in the meantime.

Thanks for working with me for this long! We're glad to see it on the site.

@rpicard rpicard closed this
@megamattron
Commits on Sep 18, 2012
  1. @megamattron
Commits on Sep 19, 2012
  1. @megamattron
5 legal_docs/README.txt
@@ -0,0 +1,5 @@
+Legal docs fathead plugin for DDG from Docracy.
+
+Dependencies:
+
+None.
1  legal_docs/data.url
@@ -0,0 +1 @@
+http://www.docracy.com/application/duckduckgo
2  legal_docs/fetch.sh
@@ -0,0 +1,2 @@
+#!/bin/bash
+curl -o output.txt 'http://www.docracy.com/application/duckduckgo'
20 legal_docs/meta.txt
@@ -0,0 +1,20 @@
+# This is the name of the source as people would refer to it,
+# e.g. Wikipedia or PerlDoc -- gets displayed on Web site.
+Name: Docracy
+
+# This is the base domain where the source pages are located.
+# Get used to get the favicon.
+Domain: www.docracy.com
+
+# This is what gets put in quotes next to the source
+# It can be blank if it is a source with completely
+# general info spanning many types of topics like Facebook.
+Type: Docracy
+
+# Whether the source is from MediaWiki (1) or not (0).
+# Processing happens a bit differently on MediaWiki.
+MediaWiki: 0
+
+# Keywords uses to trigger (or prefer) the source over others.
+# Can seperate multiple keywords with,
+Keywords: legal document,contract,legal contract,legal,law
4 legal_docs/parse.py
@@ -0,0 +1,4 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+# We do nothing here because the fetch.sh is actually grabbing something in the correct format.
4 legal_docs/parse.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+
+cd $(dirname -- "$0")
+# We do nothing here because the fetch.sh is actually grabbing something in the correct format.
4 legal_docs/queries.txt
@@ -0,0 +1,4 @@
+Generic NDA example
+convertible note example
+design contract sample
+employment offer letter sample