Legal documents fathead plugin #37

Closed
wants to merge 2 commits

Contributor

megamattron commented Sep 18, 2012

Hi, here's a plugin for Docracy's complete database of open source legal contracts. Since I work at Docracy I just made a url that generates a fathead compatible output.txt, so you'll see that the fetch.sh is basically all you need, it just saves the output of the url straight to output.txt.
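
For readers who want to see what "save the output of the url straight to output.txt" amounts to, here is a minimal Python sketch of the same idea. It is not the fetch.sh from this pull request, and `EXPORT_URL` is a hypothetical placeholder because the real export URL is not given in this thread.

```python
import shutil
import urllib.request

# Hypothetical placeholder: the real Docracy export URL is not shown in the thread.
EXPORT_URL = "https://example.com/fathead-export.txt"


def fetch(url=EXPORT_URL, dest="output.txt"):
    """Stream the response body of `url` into `dest` without modifying it."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

All formatting happens on the server side in this design, so the fetch step stays a plain copy.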

Wasn't sure what to do about the category, I kept it a generic "legal documents" rather than getting more specific. We do have finer grained categorization though if that would be interesting.

Hope I did this right!

megamattron referenced this pull request in duckduckgo/zeroclickinfo-longtail Sep 18, 2012

Closed

Longtail defined enough for development? #1

Contributor

rpicard commented Sep 18, 2012

Thanks for your work on this, but I don't quite see what the use case here is. How would this information help someone searching for "NDA"?

NDA A           legal_documents                                     A quick and dirty one-way (disclosing party-to-recipient) NDA that can be easily modified to suit your particular circumstance.    http://www.docracy.com/0fj9jsndtwt/nda
Contributor

megamattron commented Sep 18, 2012

The data here is a summary of a legal contract and a link to the full
contract. Maybe I've misunderstood the purpose of the fathead plugin? I was
discussing with yegg over at
duckduckgo/zeroclickinfo-longtail#1 about making
a longtail plugin, and he suggested I do a fathead instead.

Owner

yegg commented Sep 19, 2012

Fathead is meant to explain the topic and then link to someplace with more information, like

https://duckduckgo.com/Non-disclosure_agreement

does now. Maybe you just want to change the keywords to

NDA example
convertible note example

i.e. add example and then it would make more sense?

Contributor

megamattron commented Sep 19, 2012

Yeah, that makes a lot more sense that way. Would I basically modify the
titles of the entries to include "example"? I don't remember seeing an
explicit spot for keywords.

Owner

yegg commented Sep 19, 2012

Yup, the title is the keyword :). Fathead is essentially a keyword mapping with a bit of fuzziness around the edges.

Contributor

megamattron commented Sep 19, 2012

Great, thanks. Some people search for "NDA sample" and some people use "NDA
example" - would it then make sense to duplicate the entries to handle
those cases?

Owner

yegg commented Sep 19, 2012

Yup, duplicate it. Usually we have triggers where you don't have to put it, but in this case it doesn't make too much sense to add example and sample as triggers, so I'd just duplicate it.

Owner

yegg commented Sep 19, 2012

Actually, if you want to get fancy you can make a redirect from one to the other :)
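
The duplicate-or-redirect idea can be sketched as follows. This is only an illustration: the 13-column order in `FIELDS` and the `R` (redirect) type are my understanding of the Fathead output format and should be checked against the repository README, and `make_row` and `rows_for` are hypothetical helper names.

```python
# ASSUMPTION: output.txt uses 13 tab-separated columns in this order.
# Verify against the Fathead README before relying on it.
FIELDS = (
    "title", "type", "redirect", "otheruses", "categories", "references",
    "see_also", "further_reading", "external_links", "disambiguation",
    "images", "abstract", "source_url",
)


def make_row(**values):
    """Build one tab-separated row; columns that are not supplied stay empty."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return "\t".join(values.get(name, "") for name in FIELDS)


def rows_for(title, abstract, url, category="legal_documents", aliases=("sample",)):
    """One article row for '<title> example' plus a redirect row per alias."""
    primary = f"{title} example"
    rows = [make_row(title=primary, type="A", categories=category,
                     abstract=abstract, source_url=url)]
    for alias in aliases:
        # ASSUMPTION: type "R" marks a redirect whose target is in the redirect column.
        rows.append(make_row(title=f"{title} {alias}", type="R", redirect=primary))
    return rows
```

Calling `rows_for("NDA", "...", "http://www.docracy.com/0fj9jsndtwt/nda")` would yield an "NDA example" article row and an "NDA sample" redirect row pointing at it.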

Contributor

megamattron commented Sep 19, 2012

Ok, I've made the change to the data export to add sample and example to all the entries, and I've updated the example queries to reflect that.

Do you guys have a way to test different plugins for relevancy, i.e. by click rates or something like that? I ask because I suspect that a lot of people are actually looking for an example of a document, rather than a description of a document when they search for something like "NDA". Would be nice to see if that was the case.

In any case, let me know if you need anything else!

Owner

yegg commented Sep 20, 2012

I definitely want to do what we think is best, though there aren't any great tests for relevancy on this small a part of the query space.

You'd be surprised -- the primary use case is generally to get a quick definition of what the thing is, though in this case it could be 50/50.

So I could see a path to a third (better) way where you have a first sentence that defines the term and then a second that describes the example doc and then you click for more.

Contributor

megamattron commented Sep 20, 2012

Yeah, I think the third option sounds like the best. I guess that would
require a hybrid plugin that returned the combined definition and document
summary correct? There's no way to show results from two matching plugins,
correct?

Owner

yegg commented Sep 20, 2012

We do have some hybrids internally, e.g. https://duckduckgo.com/?q=blink+182 is a fathead plugin but the schedule info is coming from spice.

However, none of this stuff is yet exposed in an easy to use fashion.

I think a better approach would be to rewrite this request to be the third option and then we'd make it preferential. Or we could launch with what we have (the example stuff) and then work on it for v2.

Contributor

megamattron commented Sep 20, 2012

Oh, that example search is awesome. It'd be easy to imagine a similar
search for "nondisclosure agreement" returning the definition, then
"Examples: Generic NDA, Technology NDA, New York NDA, etc". For now though
let's go with the example/sample version as a starting point and then
improve it for a v2 however you think is best.

It will be nice to cut through the noise a bit for these kinds of searches,
the top results are pretty awful on Google for most of these kinds of
searches and there's pretty high search volume for many of the most common
docs.

Owner

yegg commented Sep 20, 2012

It would be easier to do that insert for a Spice plugin, which is designed to add additional information.

Contributor

megamattron commented Sep 25, 2012

Ok, do you need anything else from me for this v1 fathead plugin, or can we put it up? Happy to extend it to provide additional info to a separate spice plugin in the future too.

Owner

yegg commented Sep 26, 2012

I'll let @rpicard take over now!

Contributor

rpicard commented Sep 27, 2012

@megamattron Hey Matt, I'm still having trouble understanding the use case that this solves. Who is searching for Y Combinator Series AA Investors' Rights Agreement sample or Old Docracy Terms Of Service (Beta Version) sample?

I can sort of see it for something like this:

Consulting Agreement sample A           legal_documents                                     A standard consulting agreement for a Delaware corporation that needs to hire a consultant. http://www.docracy.com/49/consulting-agreement

...but even for something like that or NDA example, which I can imagine someone searching for, it seems like this is just a search result, which doesn't really make sense as Zero-Click Info. ZCI are supposed to provide the user with information right on the results page. I don't see what "Zero-click" information this provides the user on the results page.

Maybe there's some way to modify the purpose of the plugin to make it more useful as ZCI, rather than just a result pinned to the top of the page.

Contributor

megamattron commented Oct 9, 2012

True, some of those documents are pretty specific, we're just exporting our
entire database which contains a real range of stuff from super specific to
pretty general. There are a bunch of general purpose docs in there though
including the ones you mentioned. i.e.:

intern offer letter
employment offer letter
bill of sale
power of attorney
roommate agreement
assignment of copyright
collaboration agreement
sublease agreement
etc.

Maybe I could improve the keywording for some of these results though and
include associated tags?

Regarding "zero-click", you're right - you don't get the entire contract
right away, which I'm guessing would be too much text correct? Would it be
more useful as a ZCI if it included an excerpt of the document, or the
entire document? We could include some actions too like download and share
if that's useful.

Contributor

rpicard commented Oct 10, 2012

  1. I think that if you just have a row for each one, like "intern offer letter" and "bill of sale," you won't need to have a separate one for " * example" and " * sample." I can add those words as triggers on the back end.

  2. For the abstract, maybe prefixing it with something like "Document example: " would make it more clear that you're talking about a specific document, and not giving a description of "consulting forms" in general.

  3. Also, you have some documents with titles like "Consulting Agreement for a Fixed Price Contract (1)." Maybe dropping the "(1)" would make it more useful. Of course, that means you would end up with conflicts after you also dropped the "(1) (1)" from "Consulting Agreement for a Fixed Price Contract (1) (1)," so you'd have to drop one altogether. I'd say one with a title that might actually get searched is better than two that are just taking up space.

  4. Looking at the data, it seems to me that you might be able to do some processing to generalize some of those really specific items. For example:

  • "Barebones Contracting Agreement" -> "contracting agreement"
  • "Basic Hosting Agreement" -> "hosting agreement"
  • "Generic shortform NDA" -> "shortform NDA"
  • "Simple release for events" -> "release for events"
  • "Y Combinator Series AA Termsheet" -> "series AA termsheet"
  • "Foundry Group standard Bylaws" -> "standard bylaws"

Automatically dropping the first word(s) if it's/they're on a blacklist, including (barebones, basic, generic, simple, Y combinator, Foundry Group, etc.), then dropping duplicates would improve the quality of the data.

  5. I like the idea of action links. One that says [customize] and points to this page could be useful.

What do you think?
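
The cleanup suggested in points 3 and 4 could look roughly like the sketch below. It is an illustration only, not Docracy's export code: the blacklist holds just the words named above, `generalize_title` and `dedupe_titles` are hypothetical names, and casing is left untouched on the assumption that matching is case-insensitive.

```python
import re

# Leading words to drop, taken from the examples in this comment.
PREFIX_BLACKLIST = ("barebones", "basic", "generic", "simple",
                    "y combinator", "foundry group")

# One or more trailing copy markers such as " (1)" or " (1) (1)".
COPY_SUFFIX = re.compile(r"(\s*\(\d+\))+\s*$")


def generalize_title(title):
    """Strip trailing "(n)" markers, then any blacklisted leading words."""
    title = COPY_SUFFIX.sub("", title).strip()
    stripped = True
    while stripped:  # repeat so stacked prefixes like "Basic Simple ..." all go
        stripped = False
        lowered = title.lower()
        for prefix in PREFIX_BLACKLIST:
            if lowered.startswith(prefix + " "):
                title = title[len(prefix):].lstrip()
                stripped = True
                break
    return title


def dedupe_titles(titles):
    """Generalize every title and keep the first of each case-insensitive duplicate."""
    seen = set()
    kept = []
    for raw in titles:
        title = generalize_title(raw)
        key = title.lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(title)
    return kept
```

Keeping the first occurrence is an arbitrary choice; a real export might prefer the most viewed or most recently updated document among duplicates.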

Contributor

megamattron commented Oct 10, 2012

  1. Sounds good.

  2. No problem. But keep the same abstract that we have now, just prefix with that text, correct?

  3. Most of those "(1)" type documents are branches of documents with limited changes, so I think I'll probably just drop them entirely. I'll check to make sure there's nothing good there, but I think that's a safe route to go.

  4. I'll look into this and see what's possible. Am I correct in thinking you need a perfect match with the title for it to be a match and show ZCI? Let me take a crack at simplifying the title if that's the case, as you suggested. We also have a tagging system which indicates some generic categories, that might be useful.

  5. Is there a way in the format I should specify those links, or just append them to the abstract as links?

Thanks!

Contributor

megamattron commented Oct 11, 2012

I've updated the feed to handle points 1-4, and I'll add in 5 as soon as I can understand where to put the links. I'm not including a range of documents and have generified the titles a lot so they should come up more readily on different keyword searches. Take a look and see what you think.

Contributor

rpicard commented Oct 12, 2012

@megamattron See the links in brackets here: https://duckduckgo.com/?q=oil+production+in+saudi+arabia

I'm thinking that's what we should do for this one.

Contributor

megamattron commented Oct 12, 2012

Ok cool, so I can just add the appropriate HTML for the links into the
summary then, correct?

Contributor

rpicard commented Oct 13, 2012

@megamattron Yep! That's all you need to do.

Contributor

megamattron commented Oct 15, 2012

Great, I added the links. Let me know if you need anything else!

Contributor

rpicard commented Oct 16, 2012

It doesn't look like those commits have been added to this pull request yet.

Contributor

megamattron commented Oct 17, 2012

All the changes actually happen on our server, I just update the stuff being output. The code in this pull request really just hits that URL and saves the results to a file.

Contributor

rpicard commented Oct 18, 2012

Oh, right! My bad, I'll give it a look now.

Contributor

rpicard commented Oct 18, 2012

It looks like there are some stray newlines in there. For example, see "Community Garden Lease" (line 581 / 582).
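
One way to avoid this class of problem is to collapse whitespace in every field before it is written, since a literal newline or tab inside a field breaks a line-oriented, tab-separated file. A minimal sketch, assuming the fields are plain strings (`clean_field` is a hypothetical helper, not part of the export):

```python
import re

_WHITESPACE_RUN = re.compile(r"\s+")


def clean_field(value):
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    return _WHITESPACE_RUN.sub(" ", value).strip()
```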

Contributor

megamattron commented Oct 18, 2012

Good catch, thanks. Take a look now, should be fixed.

Contributor

rpicard commented Oct 20, 2012

Content

These would be helpful if altered a little. It looks like they fell through some cracks in your processing:

  • [42] LinkedIn CEO Employment Letter
  • [43] Groupon Restricted Stock Unit Award
  • [44] Privacy Policy of .com
  • [55] Employment Offer Letter for CTO of

These (and others like them) probably won't be helpful in there, but it's tough to weed out everything, so I'm not going to let them hold up the Fathead. Still though, I want to mention that there are still some in there:

  • [121] Indemnification Agreement (BK) (#LegalHack Edition)
  • [133] Trademark License Agreement (#LegalHack Edition)
  • [148] #HacktheAct Pitch: DMCA for Trademarks (2.0) (DMTA)
  • [112] #HacktheAct Example Pitch by BLIP Students: DMCA for Trademarks

There are a lot of "Terms of Service" related items. I think it might be good to have the specific terms of service for searches like "Twitter terms of service" (which is in there) and others. That said, I think it would be better to use a more specific data set for those searches, such as tos;dr. I think I'll spend a little time with that, so if you want to remove the "Terms of Service" items for specific companies, it may be a good call.

Format

I wasn't able to run output.txt through the processing scripts. It looks like you may have too many tabs between the "categories" section and the "abstract" section. Please go through your processing scripts to be sure you're following the correct format. 🎯
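
A quick check for this kind of format error is to count the tab-separated fields on every line before submitting. The sketch below is not one of the repository's processing scripts, and the expected count of 13 is an assumption about the Fathead output format that should be confirmed against the README.

```python
# ASSUMPTION: each row of output.txt should have exactly 13 tab-separated fields.
EXPECTED_FIELDS = 13


def find_bad_lines(lines, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for every line with the wrong field count."""
    problems = []
    for number, line in enumerate(lines, start=1):
        count = len(line.rstrip("\n").split("\t"))
        if count != expected:
            problems.append((number, count))
    return problems
```

Running it as `find_bad_lines(open("output.txt", encoding="utf-8"))` would flag both extra tabs and lines split by stray newlines, since each shows up as a wrong field count.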

Contributor

megamattron commented Oct 22, 2012

Hi Robert, I've fixed up most of those first set of results, although I've
left the second set for the moment. They won't show up in any search result
anyways and it's hard to filter for "too specific". I'll see if I can come
up with something based on tags or categories though.

I think I've also fixed the output format, I had mistakenly replaced fields
I wasn't providing data for with extra tabs. One small note about the
output format doc, the example output line at the end of the code block
references "internal_links" but that's not mentioned above. I'm guessing
you mean "see_also", but it's not clear.

Thanks again for helping me get through this.

Contributor

rpicard commented Oct 22, 2012

@megamattron It looks like you guys are experiencing some downtime right now, so I can't fetch just yet. I wouldn't worry about the "too specific" things. You got most of them already, so the plugin isn't bloated or anything. Thanks for the note on the README. I'll take a look at that now.

Contributor

rpicard commented Oct 22, 2012

Nevermind, it looks like I can fetch again. 👍

Contributor

megamattron commented Oct 22, 2012

Ok great, let me know if you need anything else!

Contributor

rpicard commented Oct 22, 2012

@megamattron Okay, I've got it deployed here: https://robert.duckduckgo.com/?q=30+day+move+out+notice

I played around with the formatting a little in Firebug, and I think it would look better if you removed "customize and" from each of the links, so they were just [download] and [sign online], and stuck a <br /> between the text and the links.

Contributor

megamattron commented Oct 23, 2012

Looks awesome! I've made the changes to the links and inserted the <br />,
see how that looks now if you get the latest data.

Contributor

rpicard commented Oct 23, 2012

The latest changes are up on http://robert.duckduckgo.com. This is definitely getting there! It seems like this one is a little long though. The whole "Other licensing available" bit could probably be dropped. It actually seems like several of these are a little hefty. Here's another example.

What do you think the easiest way to cut out some of the fluff would be?

Contributor

rpicard commented Oct 24, 2012

FYI: I updated the README to fix that issue you pointed out. Thanks for mentioning it!

9518c87

Contributor

megamattron commented Oct 24, 2012

Hmm, I see what you're saying. In the first example, on our site those are actually links to other related contracts. Should I turn those back into real links? And about summaries that are too long in general, should I ellipsize them? A lot of these contracts are user submitted examples so I don't directly control the text.

Glad to help with the read me!

Contributor

rpicard commented Oct 24, 2012

I think that using ellipses to cut them off at a certain point is a good idea. ~400 chars seems like a good starting place. You'll need to watch out for punctuation though (e.g. if the 400th character is a comma, an ellipsis would look weird coming next).

The problem with turning the URLs into links there is that it breaks the formatting of the ZCI box a little. I think that once we trim the abstracts, the URLs won't seem like so much clutter.

Contributor

megamattron commented Oct 24, 2012

Ok, I've added a max 400 character abbreviation to the export now. I'm actually breaking on word boundaries so it's a little nicer to read, and I don't end on punctuation as you suggested. So I'm willing to go a little under the 400 max to get a nicer break more or less. Take a look and see what you think.
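
The behaviour described here (cap at 400 characters, break on a word boundary, never end on punctuation) can be sketched as below. This is a guess at the approach, not the actual export code; `truncate_abstract` and its parameters are hypothetical.

```python
def truncate_abstract(text, limit=400, ellipsis="..."):
    """Shorten text to at most `limit` characters, ending on a whole word."""
    text = " ".join(text.split())  # normalize whitespace first
    if len(text) <= limit:
        return text
    budget = limit - len(ellipsis)
    # Look one character past the budget so a word that ends exactly at the
    # budget is kept whole.
    window = text[:budget + 1]
    if " " in window:
        cut = window[:window.rfind(" ")]
    else:
        cut = window[:budget]  # a single very long word: hard cut
    # Avoid results like "agreement,..." by trimming trailing punctuation.
    cut = cut.rstrip(" ,;:.!?-")
    return cut + ellipsis
```

Because the cut backs up to the previous word and then trims punctuation, the result can come in a little under the limit, which matches the trade-off described above.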

Contributor

rpicard commented Oct 24, 2012

I've deployed your changes and I think it looks pretty good. I'll pass it along internally and get back to you with any more feedback we have.

Contributor

megamattron commented Oct 24, 2012

Ok great, by deployed I'm guessing you mean just to your test server,
correct?

Thanks for all your help figuring this stuff out.

Matt Hall
Co-Founder, Docracy http://www.docracy.com/

Contributor

rpicard commented Oct 24, 2012

Yes, just to robert.duckduckgo.com for now.

Contributor

rpicard commented Oct 29, 2012

@megamattron After some discussion, we're thinking that the information provided right now just isn't very helpful. I think that we need to take a look at the information we're using here.

Sample document: Note: These documents are from http://ycombinator.com/seriesaa.html and include the following disclaimer (from that site): Y Combinator and Wilson Sonsini Goodrich & Rosati are happy to announce the Series AA Equity Financing Documents. Their goal is to make angel funding rounds for startups easier for both sides. These documents were originally created for YC-funded startups to...

There just isn't any real information in there. If we could modify the data to show something like:

Document description: Y Combinator Series AA Termsheet is an open source legal document (startup, funding).

Basically this template: $title is a an open source [...] ($categories).

I think what we could do here is use the original title (before stripping to make it more general) in the description, but keep the current title as the title element, e.g. a search for "series AA termsheet" brings up the above description.

We could also look at incorporating some more meta data like number of pages and author.

What are your thoughts here?
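
The proposed template, with the generalized title as the lookup key and the original title inside the description, could be sketched like this. The function name and return shape are hypothetical, and the sketch writes "is an" rather than repeating the doubled article in the template line above.

```python
from string import Template

DESCRIPTION = Template(
    "Document description: $title is an open source legal document ($categories)."
)


def build_entry(search_title, original_title, categories):
    """Return (title used for matching, abstract built from the original title)."""
    abstract = DESCRIPTION.substitute(
        title=original_title,
        categories=", ".join(categories),
    )
    return search_title, abstract
```

For example, `build_entry("series AA termsheet", "Y Combinator Series AA Termsheet", ["startup", "funding"])` keeps the short title for matching while the abstract still names the specific document.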

Contributor

rpicard commented Nov 19, 2012

@megamattron Hey, are you still up for working on this?

Contributor

megamattron commented Nov 19, 2012

Yes! Sorry, we got derailed over here on a few things due to the storm. Yup, let me give this a shot and we'll see how it looks, shouldn't take long.

Contributor

rpicard commented Nov 19, 2012

@megamattron Hope the storm didn't cause too much damage for you. Let me know if you have any questions!

Contributor

megamattron commented Nov 19, 2012

@rpicard Ok, if you run the fetch again you'll get the new format. I also made those category links to lists of more docs of that type, let me know if that makes sense otherwise I can take it out. Thanks!

Contributor

rpicard commented Nov 19, 2012

Thanks for getting back so fast. I'll pass it along for internal review.

Just a note, there's a small typo: "is a an open source legal document."

Contributor

megamattron commented Nov 19, 2012

Whoops, typo will be fixed in our next release, should be ~1 hour. Thanks.

Contributor

rpicard commented Nov 19, 2012

No problem. It's easy enough to fix with Vim. 👍

Contributor

rpicard commented Nov 26, 2012

@megamattron The feedback I've received is that we should get rid of the [sign online] link since it's not likely to get a lot of use (Who's going to decide they want to sign the document without seeing it first?), and to move the [download] link onto the same line as the abstract (i.e. get rid of the <br>).

Contributor

megamattron commented Nov 26, 2012

Ok, made those changes so go ahead and pull a new copy and see how that
works for you.

Contributor

rpicard commented Dec 7, 2012

@megamattron Hey, sorry about the delay there. Final exams kept me busy this past week.

It looks like long titles are breaking the formatting of the ZCI box. [1] That might be something we want to account for on our end, so I'm going to ask about that and get back to you.

[1] Here's an example: https://robert.duckduckgo.com/?q=macromedia+software+eula+enduser+license+agreement

Contributor

megamattron commented Dec 7, 2012

Ok let me know. Happy to ellipsize if needed.

Hope the exams went well. :)

Contributor

rpicard commented Dec 11, 2012

🚀 We are live!

https://duckduckgo.com/?q=Series+AA+Investors%27+Rights+Agreement

The issue with long titles still stands, but we're going to fix that with some CSS on our side so it doesn't really affect the Fathead. Since the long titles are much less likely to actually be searched, we can go ahead and deploy without worrying about too many people coming across them in the mean time.

Thanks for working with me for this long! We're glad to see it on the site.

rpicard closed this Dec 11, 2012

Contributor

megamattron commented Dec 11, 2012

Sweet! Thanks for all your help.
