# How to access WordNet?

[WordNet](https://wordnet.princeton.edu/) is a large, freely available lexical database of English maintained by Princeton University.

More about WordNet can be found [on Wikipedia](https://en.wikipedia.org/wiki/WordNet).

## Main concepts

[Source](https://github.com/wordnet/wordnet)

<img src="https://raw.githubusercontent.com/wordnet/wordnet/master/doc/class_diagram.png" width=55%>

Viewed as a relational database, the WordNet entities are described by the following attributes:

### senses

* `id`: The UUID identifier
* `synset_id`:  The UUID of connected Synset
* `external_id`:  The ID from external database, used for importing
* `lemma`: The lemma of Lexeme that Sense belongs to (e.g. car)
* `sense_index`: The index of sense in context of its Synset (e.g. 1)
* `comment`: The short comment, used in UI (e.g. transporting machine)
* `language`: Currently can be `en_GB` or `pl_PL`
* `part_of_speech`: The part of speech of Sense (noun etc.)
* `domain_id`: The ID of the Domain of Sense (not used yet)

### synsets

* `id`: The UUID identifier
* `external_id`:  The ID from external database, used for importing
* `comment`: The short comment by Słowosieć, used in UI
* `definition`: The short comment by Princeton Wordnet, used in UI
* `examples`: The examples of usage of synset from Princeton Wordnet

### relation_types

* `name`: Name of the relation
* `reverse_relation`: Name of reverse relation (see: normalisation)
* `parent_id`: Name of parent RelationType (inheritance-like)
* `priority`: Used for sorting relation types in the UI (lower is better)
* `description`: Description of the relation (not used yet)

### sense\_relations and synset\_relations

* `parent_id`: UUID of base sense (or synset)
* `child_id`: UUID of the related sense (or synset)
* `relation_id`: UUID of the relation that the child holds toward the parent (e.g. the UUID of the hyponymy relation means the child is a hyponym of the parent)
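
The attribute lists above can be read as a small data model. The following is a minimal sketch of that model, assuming Python dataclasses; only a subset of the listed attributes is included, the `id` field on `RelationType`, the helper `children_of` and all sample values are hypothetical, and nothing here is taken from the actual database code.

```python
from dataclasses import dataclass, field
from typing import List, Optional
from uuid import UUID, uuid4


@dataclass
class Synset:
    id: UUID
    definition: str = ""
    examples: List[str] = field(default_factory=list)


@dataclass
class Sense:
    id: UUID
    synset_id: UUID          # the synset this sense belongs to
    lemma: str               # e.g. "car"
    sense_index: int         # index of the sense within its synset
    language: str = "en_GB"
    part_of_speech: str = "noun"


@dataclass
class RelationType:
    id: UUID                 # assumed key, referenced by relation_id below
    name: str
    reverse_relation: Optional[str] = None


@dataclass
class SynsetRelation:
    parent_id: UUID          # base synset
    child_id: UUID           # related synset
    relation_id: UUID        # relation that the child holds toward the parent


def children_of(parent: Synset, relation: RelationType,
                relations: List[SynsetRelation]) -> List[UUID]:
    """Return the ids of all synsets related to `parent` by `relation`."""
    return [r.child_id for r in relations
            if r.parent_id == parent.id and r.relation_id == relation.id]


# Toy data (invented for illustration): "car" is a hyponym of "vehicle".
vehicle = Synset(uuid4(), "a conveyance that transports people or objects")
car = Synset(uuid4(), "a motor vehicle with four wheels")
hyponymy = RelationType(uuid4(), "hyponymy", reverse_relation="hypernymy")
relations = [SynsetRelation(vehicle.id, car.id, hyponymy.id)]
car_sense = Sense(uuid4(), car.id, "car", 1)

print(children_of(vehicle, hyponymy, relations) == [car.id])
```

Keeping relations in their own table, keyed by the relation type, is what lets the same structure hold hyponymy, meronymy and any other relation without changing the schema.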


### Illustration

<img src="https://www.w3.org/2001/sw/BestPractices/WNET/wordnet-sw-20040713-fig01.png" width=60%>


## WordNet on the web

Princeton provides a web-based interface for querying the lexical graph, which contains the `synsets` and the individual entries in them.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/b/b8/WordNet.PNG/600px-WordNet.PNG" width=65%>

This is the ["WordNet Search" interface](http://wordnetweb.princeton.edu/perl/webwn).


## Via NLTK

<img src="https://i0.wp.com/clay-atlas.com/wp-content/uploads/2019/08/python_nltk.png?resize=592%2C644&ssl=1" width=20%>

[NLTK](https://www.nltk.org/) is a research- and teaching-oriented NLP library with a very low barrier to entry, so it has been a "go-to" tool for students of NLP since its [release in 2001](https://en.wikipedia.org/wiki/Natural_Language_Toolkit).

As well as being a pipeline itself, it provides interfaces to well-known (mostly "practice-sized") corpora and linguistic resources. A recurring way to demonstrate NLP problems is to load a corpus with NLTK.

In this context it also exposes an interface to WordNet, detailed below.

There are numerous introductions to NLTK; [the first chapter of the NLTK book](https://www.nltk.org/book/ch01.html) is one of them.

## Integration with spaCy

Conveniently, there is an integration library that exposes the NLTK-based WordNet interface to spaCy and integrates it into the pipeline.

The library is `spacy-wordnet`, available [on PyPI](https://pypi.org/project/spacy-wordnet/).

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- saved from url=(0038)http://www.nltk.org/howto/wordnet.html -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

<meta name="generator" content="Docutils 0.12: http://docutils.sourceforge.net/">
<title>WordNet Interface</title>
</head>
<body>
<div class="document" id="wordnet-interface">
<h1 class="title">WordNet Interface</h1>

<!-- Copyright (C) 2001-2015 NLTK Project -->
<!-- For license information, see LICENSE.TXT -->
<p>WordNet is just another NLTK corpus reader, and can be imported like this:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; from nltk.corpus import wordnet
</pre>
</blockquote>
<p>For more compact code, we recommend:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; from nltk.corpus import wordnet as wn
</pre>
</blockquote>
<div class="section" id="words">
<h1>Words</h1>
<p>Look up a word using <tt class="docutils literal">synsets()</tt>; this function has an optional <tt class="docutils literal">pos</tt> argument
which lets you constrain the part of speech of the word:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; wn.synsets('dog') # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'),
Synset('frank.n.02'), Synset('pawl.n.01'), Synset('andiron.n.01'), Synset('chase.v.01')]
&gt;&gt;&gt; wn.synsets('dog', pos=wn.VERB)
[Synset('chase.v.01')]
</pre>
</blockquote>
<p>The other parts of speech are <tt class="docutils literal">NOUN</tt>, <tt class="docutils literal">ADJ</tt> and <tt class="docutils literal">ADV</tt>.
A synset is identified with a 3-part name of the form: word.pos.nn:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; wn.synset('dog.n.01')
Synset('dog.n.01')
&gt;&gt;&gt; print(wn.synset('dog.n.01').definition())
a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds
&gt;&gt;&gt; len(wn.synset('dog.n.01').examples())
1
&gt;&gt;&gt; print(wn.synset('dog.n.01').examples()[0])
the dog barked all night
&gt;&gt;&gt; wn.synset('dog.n.01').lemmas()
[Lemma('dog.n.01.dog'), Lemma('dog.n.01.domestic_dog'), Lemma('dog.n.01.Canis_familiaris')]
&gt;&gt;&gt; [str(lemma.name()) for lemma in wn.synset('dog.n.01').lemmas()]
['dog', 'domestic_dog', 'Canis_familiaris']
&gt;&gt;&gt; wn.lemma('dog.n.01.dog').synset()
Synset('dog.n.01')
</pre>
</blockquote>
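
The three-part `word.pos.nn` naming scheme can be taken apart with plain string handling. The following is a minimal sketch that does not use NLTK; the helper names `parse_synset_name` and `parse_lemma_name` are hypothetical, and the lemma parser assumes that the word part of the synset name contains no dot.

```python
from typing import NamedTuple, Tuple


class SynsetName(NamedTuple):
    word: str
    pos: str
    index: int


def parse_synset_name(name: str) -> SynsetName:
    """Split a name such as 'dog.n.01' into word, part of speech and sense number."""
    # Split from the right so that a word containing dots stays intact.
    word, pos, index = name.rsplit(".", 2)
    return SynsetName(word, pos, int(index))


def parse_lemma_name(name: str) -> Tuple[SynsetName, str]:
    """Split a name such as 'dog.n.01.domestic_dog' into synset name and lemma."""
    # Assumption: the synset word itself contains no dot.
    word, pos, index, lemma = name.split(".", 3)
    return SynsetName(word, pos, int(index)), lemma


print(parse_synset_name("dog.n.01"))
print(parse_lemma_name("dog.n.01.domestic_dog"))
```
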
<p>The WordNet corpus reader gives access to the Open Multilingual
WordNet, using ISO-639 language codes.</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; sorted(wn.langs())
['als', 'arb', 'cat', 'cmn', 'dan', 'eng', 'eus', 'fas',
'fin', 'fra', 'fre', 'glg', 'heb', 'ind', 'ita', 'jpn', 'nno',
'nob', 'pol', 'por', 'spa', 'tha', 'zsm']
&gt;&gt;&gt; wn.synsets(b'\xe7\x8a\xac'.decode('utf-8'), lang='jpn')
[Synset('dog.n.01'), Synset('spy.n.01')]
&gt;&gt;&gt; wn.synset('spy.n.01').lemma_names('jpn')
['\u3044\u306c', '\u307e\u308f\u3057\u8005', '\u30b9\u30d1\u30a4', '\u56de\u3057\u8005',
'\u56de\u8005', '\u5bc6\u5075', '\u5de5\u4f5c\u54e1', '\u5efb\u3057\u8005',
'\u5efb\u8005', '\u63a2', '\u63a2\u308a', '\u72ac', '\u79d8\u5bc6\u635c\u67fb\u54e1',
'\u8adc\u5831\u54e1', '\u8adc\u8005', '\u9593\u8005', '\u9593\u8adc', '\u96a0\u5bc6']
&gt;&gt;&gt; wn.synset('dog.n.01').lemma_names('ita')
['cane', 'Canis_familiaris']
&gt;&gt;&gt; wn.lemmas('cane', lang='ita')
[Lemma('dog.n.01.cane'), Lemma('hammer.n.01.cane'), Lemma('cramp.n.02.cane'),
Lemma('bad_person.n.01.cane'), Lemma('incompetent.n.01.cane')]
&gt;&gt;&gt; sorted(wn.synset('dog.n.01').lemmas('dan'))
[Lemma('dog.n.01.hund'), Lemma('dog.n.01.k\xf8ter'),
Lemma('dog.n.01.vovhund'), Lemma('dog.n.01.vovse')]
&gt;&gt;&gt; sorted(wn.synset('dog.n.01').lemmas('por'))
[Lemma('dog.n.01.cachorro'), Lemma('dog.n.01.c\xe3es'),
Lemma('dog.n.01.c\xe3o'), Lemma('dog.n.01.c\xe3o')]
&gt;&gt;&gt; dog_lemma = wn.lemma(b'dog.n.01.c\xc3\xa3o'.decode('utf-8'), lang='por')
&gt;&gt;&gt; dog_lemma
Lemma('dog.n.01.c\xe3o')
&gt;&gt;&gt; dog_lemma.lang()
'por'
&gt;&gt;&gt; len(wordnet.all_lemma_names(pos='n', lang='jpn'))
66027
</pre>
</blockquote>
</div>
<div class="section" id="synsets">
<h1>Synsets</h1>
<p><cite>Synset</cite>: a set of synonyms that share a common meaning.</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog = wn.synset('dog.n.01')
&gt;&gt;&gt; dog.hypernyms()
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
&gt;&gt;&gt; dog.hyponyms()  # doctest: +ELLIPSIS
[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'), Synset('dalmatian.n.02'), ...]
&gt;&gt;&gt; dog.member_holonyms()
[Synset('canis.n.01'), Synset('pack.n.06')]
&gt;&gt;&gt; dog.root_hypernyms()
[Synset('entity.n.01')]
&gt;&gt;&gt; wn.synset('dog.n.01').lowest_common_hypernyms(wn.synset('cat.n.01'))
[Synset('carnivore.n.01')]
</pre>
</blockquote>
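
To make the idea behind `lowest_common_hypernyms` concrete, here is a minimal sketch on a hand-written toy hierarchy. It does not use NLTK: the `HYPERNYMS` table is a heavily simplified assumption rather than WordNet data, and the function names are chosen for illustration only.

```python
# Toy hypernym graph: node -> list of direct hypernyms (simplified assumption).
HYPERNYMS = {
    "dog": ["canine", "domestic_animal"],
    "canine": ["carnivore"],
    "cat": ["feline"],
    "feline": ["carnivore"],
    "carnivore": ["animal"],
    "domestic_animal": ["animal"],
    "animal": [],
}


def ancestors(node):
    """Return the node together with everything reachable via hypernym links."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in HYPERNYMS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def max_depth(node):
    """Length of the longest hypernym path from the node up to a root."""
    parents = HYPERNYMS.get(node, [])
    return 0 if not parents else 1 + max(max_depth(p) for p in parents)


def lowest_common_hypernyms(a, b):
    """Return the deepest nodes that are ancestors of both a and b."""
    common = ancestors(a) & ancestors(b)
    if not common:
        return []
    deepest = max(max_depth(n) for n in common)
    return sorted(n for n in common if max_depth(n) == deepest)


print(lowest_common_hypernyms("dog", "cat"))
```

In this toy graph both `carnivore` and `animal` are shared ancestors of `dog` and `cat`; the deeper one is kept, which mirrors the `carnivore.n.01` result shown above.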
<p>Each synset contains one or more lemmas, each of which represents a
specific sense of a specific word.</p>
<p>Note that some relations are defined by WordNet only over Lemmas:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; good = wn.synset('good.a.01')
&gt;&gt;&gt; good.antonyms()
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
AttributeError: 'Synset' object has no attribute 'antonyms'
&gt;&gt;&gt; good.lemmas()[0].antonyms()
[Lemma('bad.a.01.bad')]
</pre>
</blockquote>
<p>The relations that are currently defined in this way are <cite>antonyms</cite>,
<cite>derivationally_related_forms</cite> and <cite>pertainyms</cite>.</p>
</div>
<div class="section" id="lemmas">
<h1>Lemmas</h1>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; eat = wn.lemma('eat.v.03.eat')
&gt;&gt;&gt; eat
Lemma('feed.v.06.eat')
&gt;&gt;&gt; print(eat.key())
eat%2:34:02::
&gt;&gt;&gt; eat.count()
4
&gt;&gt;&gt; wn.lemma_from_key(eat.key())
Lemma('feed.v.06.eat')
&gt;&gt;&gt; wn.lemma_from_key(eat.key()).synset()
Synset('feed.v.06')
&gt;&gt;&gt; wn.lemma_from_key('feebleminded%5:00:00:retarded:00')
Lemma('backward.s.03.feebleminded')
&gt;&gt;&gt; for lemma in wn.synset('eat.v.03').lemmas():
...     print(lemma, lemma.count())
...
Lemma('feed.v.06.feed') 3
Lemma('feed.v.06.eat') 4
&gt;&gt;&gt; for lemma in wn.lemmas('eat', 'v'):
...     print(lemma, lemma.count())
...
Lemma('eat.v.01.eat') 61
Lemma('eat.v.02.eat') 13
Lemma('feed.v.06.eat') 4
Lemma('eat.v.04.eat') 0
Lemma('consume.v.05.eat') 0
Lemma('corrode.v.01.eat') 0
</pre>
</blockquote>
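
The string printed by `key()` is a WordNet sense key of the form `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`. The following minimal sketch splits such a key into its fields without using NLTK; the function name `parse_sense_key` and the returned dictionary layout are assumptions made for illustration.

```python
# Synset type codes used in WordNet sense keys.
SS_TYPES = {
    1: "noun",
    2: "verb",
    3: "adjective",
    4: "adverb",
    5: "adjective satellite",
}


def parse_sense_key(key):
    """Split a sense key such as 'eat%2:34:02::' into named fields."""
    lemma, rest = key.split("%", 1)
    ss_type, lex_filenum, lex_id, head_word, head_id = rest.split(":")
    return {
        "lemma": lemma,
        "ss_type": SS_TYPES[int(ss_type)],
        "lex_filenum": int(lex_filenum),
        "lex_id": int(lex_id),
        # The head fields are only filled in for adjective satellites.
        "head_word": head_word or None,
        "head_id": int(head_id) if head_id else None,
    }


print(parse_sense_key("eat%2:34:02::"))
```
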
<p>Lemmas can also have relations between them:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; vocal = wn.lemma('vocal.a.01.vocal')
&gt;&gt;&gt; vocal.derivationally_related_forms()
[Lemma('vocalize.v.02.vocalize')]
&gt;&gt;&gt; vocal.pertainyms()
[Lemma('voice.n.02.voice')]
&gt;&gt;&gt; vocal.antonyms()
[Lemma('instrumental.a.01.instrumental')]
</pre>
</blockquote>
<p>The three relations above exist only on lemmas, not on synsets.</p>
</div>
<div class="section" id="verb-frames">
<h1>Verb Frames</h1>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; wn.synset('think.v.01').frame_ids()
[5, 9]
&gt;&gt;&gt; for lemma in wn.synset('think.v.01').lemmas():
...     print(lemma, lemma.frame_ids())
...     print(" | ".join(lemma.frame_strings()))
...
Lemma('think.v.01.think') [5, 9]
Something think something Adjective/Noun | Somebody think somebody
Lemma('think.v.01.believe') [5, 9]
Something believe something Adjective/Noun | Somebody believe somebody
Lemma('think.v.01.consider') [5, 9]
Something consider something Adjective/Noun | Somebody consider somebody
Lemma('think.v.01.conceive') [5, 9]
Something conceive something Adjective/Noun | Somebody conceive somebody
&gt;&gt;&gt; wn.synset('stretch.v.02').frame_ids()
[8]
&gt;&gt;&gt; for lemma in wn.synset('stretch.v.02').lemmas():
...     print(lemma, lemma.frame_ids())
...     print(" | ".join(lemma.frame_strings()))
...
Lemma('stretch.v.02.stretch') [8, 2]
Somebody stretch something | Somebody stretch
Lemma('stretch.v.02.extend') [8]
Somebody extend something
</pre>
</blockquote>
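
The frame strings above look like templates with the lemma substituted in. The sketch below reproduces that behaviour without NLTK; the four templates are inferred from the outputs shown above rather than copied from the WordNet frame list, and `render_frames` is a hypothetical helper name.

```python
# Frame templates inferred from the outputs above (an assumption, not the full list).
FRAME_TEMPLATES = {
    2: "Somebody %s",
    5: "Something %s something Adjective/Noun",
    8: "Somebody %s something",
    9: "Somebody %s somebody",
}


def render_frames(lemma_name, frame_ids):
    """Fill the lemma into each frame template, in the order of frame_ids."""
    return [FRAME_TEMPLATES[frame_id] % lemma_name for frame_id in frame_ids]


print(" | ".join(render_frames("stretch", [8, 2])))
print(" | ".join(render_frames("think", [5, 9])))
```
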
</div>
<div class="section" id="similarity">
<h1>Similarity</h1>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog = wn.synset('dog.n.01')
&gt;&gt;&gt; cat = wn.synset('cat.n.01')
</pre>
<pre class="doctest-block">&gt;&gt;&gt; hit = wn.synset('hit.v.01')
&gt;&gt;&gt; slap = wn.synset('slap.v.01')
</pre>
</blockquote>
<p><tt class="docutils literal">synset1.path_similarity(synset2):</tt>
Return a score denoting how similar two word senses are, based on the
shortest path that connects the senses in the is-a (hypernym/hyponym)
taxonomy. The score is in the range 0 to 1. By default, a fake root
node is now added for verbs, so cases where previously no path could be
found (and None was returned) should now return a value.
The old behavior can be achieved by setting simulate_root to False.
A score of 1 represents identity, i.e. comparing a sense with itself
will return 1.</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog.path_similarity(cat)  # doctest: +ELLIPSIS
0.2...
</pre>
<pre class="doctest-block">&gt;&gt;&gt; hit.path_similarity(slap)  # doctest: +ELLIPSIS
0.142...
</pre>
<pre class="doctest-block">&gt;&gt;&gt; wn.path_similarity(hit, slap)  # doctest: +ELLIPSIS
0.142...
</pre>
<pre class="doctest-block">&gt;&gt;&gt; print(hit.path_similarity(slap, simulate_root=False))
None
</pre>
<pre class="doctest-block">&gt;&gt;&gt; print(wn.path_similarity(hit, slap, simulate_root=False))
None
</pre>
</blockquote>
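
The path-based score and the effect of `simulate_root` can be illustrated on a toy taxonomy. The sketch below does not use NLTK. It assumes the score is `1 / (d + 1)` for a shortest path of `d` edges, which is consistent with the `0.2` shown above for a path of four edges; the `PARENT` table, the `ROOT` marker and the verb entries are invented for illustration.

```python
# Toy is-a taxonomy: child -> parent (a simplified assumption, not WordNet data).
PARENT = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "slap": "strike",   # "strike" and "hit" are separate roots in this toy data
}
ROOT = "*ROOT*"


def ancestor_distances(node, simulate_root):
    """Map the node and each of its ancestors to its distance in edges."""
    distances = {}
    steps = 0
    while node is not None:
        distances[node] = steps
        node = PARENT.get(node)
        steps += 1
    if simulate_root:
        # A fake root sits one edge above the top of every tree.
        distances[ROOT] = steps
    return distances


def path_similarity(a, b, simulate_root=True):
    """Return 1 / (shortest path length + 1), or None if no path exists."""
    dist_a = ancestor_distances(a, simulate_root)
    dist_b = ancestor_distances(b, simulate_root)
    common = set(dist_a) & set(dist_b)
    if not common:
        return None
    shortest = min(dist_a[n] + dist_b[n] for n in common)
    return 1.0 / (shortest + 1)


print(path_similarity("dog", "cat"))
print(path_similarity("hit", "slap"))
print(path_similarity("hit", "slap", simulate_root=False))
```

Without the fake root, `hit` and `slap` live in disconnected trees, so no path exists and the function returns `None`; with it, every pair of nodes is connected.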
<p><tt class="docutils literal">synset1.lch_similarity(synset2):</tt>
Leacock-Chodorow Similarity:
Return a score denoting how similar two word senses are, based on the
shortest path that connects the senses (as above) and the maximum depth
of the taxonomy in which the senses occur. The relationship is given
as -log(p / (2d)), where p is the shortest path length and d is the
taxonomy depth.</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog.lch_similarity(cat)  # doctest: +ELLIPSIS
2.028...
</pre>
<pre class="doctest-block">&gt;&gt;&gt; hit.lch_similarity(slap)  # doctest: +ELLIPSIS
1.312...
</pre>
<pre class="doctest-block">&gt;&gt;&gt; wn.lch_similarity(hit, slap)  # doctest: +ELLIPSIS
1.312...
</pre>
<pre class="doctest-block">&gt;&gt;&gt; print(hit.lch_similarity(slap, simulate_root=False))
None
</pre>
<pre class="doctest-block">&gt;&gt;&gt; print(wn.lch_similarity(hit, slap, simulate_root=False))
None
</pre>
</blockquote>
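
The Leacock-Chodorow formula is easy to evaluate by hand. The sketch below applies it to plain numbers. The inputs are assumptions chosen to be consistent with the outputs above: a path of 5 nodes in a taxonomy of depth 19 for the noun pair, and a path of 7 nodes in a taxonomy of depth 13 for the verb pair. These values are inferred from the printed scores, not read from WordNet.

```python
import math


def lch_similarity(path_length, taxonomy_depth):
    """Leacock-Chodorow score: -log(p / (2d)), using the natural logarithm."""
    return -math.log(path_length / (2.0 * taxonomy_depth))


# Assumed inputs, inferred from the scores shown above.
print(round(lch_similarity(5, 19), 3))
print(round(lch_similarity(7, 13), 3))
```

Because the taxonomy depth differs per part of speech, scores for nouns and verbs are not directly comparable.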
<p><tt class="docutils literal">synset1.wup_similarity(synset2):</tt>
Wu-Palmer Similarity:
Return a score denoting how similar two word senses are, based on the
depth of the two senses in the taxonomy and that of their Least Common
Subsumer (most specific ancestor node). Note that at this time the
scores given do not always agree with those given by Pedersen's Perl
implementation of WordNet::Similarity.</p>
<p>The LCS does not necessarily feature in the shortest path connecting the
two senses, as it is by definition the common ancestor deepest in the
taxonomy, not the one closest to the two senses. Typically, however, it
does. Where multiple candidates for the LCS exist, the one whose
shortest path to the root node is the longest will be selected. Where
the LCS has multiple paths to the root, the longer path is used for
the purposes of the calculation.</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog.wup_similarity(cat)  # doctest: +ELLIPSIS
0.857...
</pre>
<pre class="doctest-block">&gt;&gt;&gt; hit.wup_similarity(slap)
0.25
</pre>
<pre class="doctest-block">&gt;&gt;&gt; wn.wup_similarity(hit, slap)
0.25
</pre>
<pre class="doctest-block">&gt;&gt;&gt; print(hit.wup_similarity(slap, simulate_root=False))
None
</pre>
<pre class="doctest-block">&gt;&gt;&gt; print(wn.wup_similarity(hit, slap, simulate_root=False))
None
</pre>
</blockquote>
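
The Wu-Palmer score is commonly written as `2 * depth(lcs) / (depth(s1) + depth(s2))`. The sketch below computes it on a toy single-parent tree without NLTK. The `PARENT` table is a simplified assumption, and depth is counted in nodes from the root (the root has depth 1), so the numbers differ from the WordNet scores shown above.

```python
# Toy is-a tree: child -> parent (a simplified assumption, not WordNet data).
PARENT = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def path_to_root(node):
    """Return the list of nodes from the given node up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def least_common_subsumer(a, b):
    """Walk up from a and return the first node that is also an ancestor of b."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    return None


def wup_similarity(a, b):
    lcs = least_common_subsumer(a, b)
    if lcs is None:
        return None
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(least_common_subsumer("dog", "cat"), wup_similarity("dog", "cat"))
```
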
<p><tt class="docutils literal">wordnet_ic</tt>
Information Content:
Load an information content file from the wordnet_ic corpus.</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; from nltk.corpus import wordnet_ic
&gt;&gt;&gt; brown_ic = wordnet_ic.ic('ic-brown.dat')
&gt;&gt;&gt; semcor_ic = wordnet_ic.ic('ic-semcor.dat')
</pre>
</blockquote>
<p>Or you can create an information content dictionary from a corpus (or
anything that has a words() method).</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; from nltk.corpus import genesis
&gt;&gt;&gt; genesis_ic = wn.ic(genesis, False, 0.0)
</pre>
</blockquote>
<p><tt class="docutils literal">synset1.res_similarity(synset2, ic):</tt>
Resnik Similarity:
Return a score denoting how similar two word senses are, based on the
Information Content (IC) of the Least Common Subsumer (most specific
ancestor node).  Note that for any similarity measure that uses
information content, the result is dependent on the corpus used to
generate the information content and the specifics of how the
information content was created.</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog.res_similarity(cat, brown_ic)  # doctest: +ELLIPSIS
7.911...
&gt;&gt;&gt; dog.res_similarity(cat, genesis_ic)  # doctest: +ELLIPSIS
7.204...
</pre>
</blockquote>
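
Information content is usually defined as `IC(c) = -log(P(c))`, where `P(c)` is the relative frequency of a concept together with everything below it in the taxonomy. The sketch below computes IC and the Resnik score on toy data without NLTK; the tree, the word counts and the helper names are all invented for illustration, so the resulting numbers are unrelated to the corpus-based scores above.

```python
import math

# Toy is-a tree and toy corpus counts (both invented for illustration).
PARENT = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}
WORD_COUNTS = {"dog": 30, "cat": 20, "canine": 5, "feline": 5,
               "carnivore": 10, "animal": 30}


def concept_count(node):
    """Count of the node itself plus the counts of all its descendants."""
    own = WORD_COUNTS.get(node, 0)
    return own + sum(concept_count(child)
                     for child, parent in PARENT.items() if parent == node)


TOTAL = concept_count("animal")  # the root subsumes every counted word


def information_content(node):
    return -math.log(concept_count(node) / TOTAL)


def least_common_subsumer(a, b):
    """Walk up from a and return the first node that is also an ancestor of b."""
    ancestors_of_b = {b}
    while b in PARENT:
        b = PARENT[b]
        ancestors_of_b.add(b)
    while a not in ancestors_of_b:
        a = PARENT[a]
    return a


def res_similarity(a, b):
    """Resnik score: the information content of the least common subsumer."""
    return information_content(least_common_subsumer(a, b))


print(TOTAL, res_similarity("dog", "cat"))
```

The root has probability 1 and therefore IC 0, which is why pairs whose only shared ancestor is the root get a Resnik score of 0.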
<p><tt class="docutils literal">synset1.jcn_similarity(synset2, ic):</tt>
Jiang-Conrath Similarity:
Return a score denoting how similar two word senses are, based on the
Information Content (IC) of the Least Common Subsumer (most specific
ancestor node) and that of the two input Synsets. The relationship is
given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog.jcn_similarity(cat, brown_ic)  # doctest: +ELLIPSIS
0.449...
&gt;&gt;&gt; dog.jcn_similarity(cat, genesis_ic)  # doctest: +ELLIPSIS
0.285...
</pre>
</blockquote>
<p><tt class="docutils literal">synset1.lin_similarity(synset2, ic):</tt>
Lin Similarity:
Return a score denoting how similar two word senses are, based on the
Information Content (IC) of the Least Common Subsumer (most specific
ancestor node) and that of the two input Synsets. The relationship is
given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog.lin_similarity(cat, semcor_ic)  # doctest: +ELLIPSIS
0.886...
</pre>
</blockquote>
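
The Jiang-Conrath and Lin equations quoted above can be evaluated directly once the three information content values are known. The sketch below does so on plain numbers; the IC values are hypothetical, and returning infinity for a zero Jiang-Conrath distance is an assumption of this sketch.

```python
def jcn_similarity(ic_s1, ic_s2, ic_lcs):
    """Jiang-Conrath: 1 / (IC(s1) + IC(s2) - 2 * IC(lcs))."""
    distance = ic_s1 + ic_s2 - 2.0 * ic_lcs
    if distance == 0:
        # Identical information content; treated as infinitely similar here.
        return float("inf")
    return 1.0 / distance


def lin_similarity(ic_s1, ic_s2, ic_lcs):
    """Lin: 2 * IC(lcs) / (IC(s1) + IC(s2))."""
    return 2.0 * ic_lcs / (ic_s1 + ic_s2)


# Hypothetical information content values for two senses and their LCS.
ic_a, ic_b, ic_lcs = 9.0, 9.0, 8.0
print(jcn_similarity(ic_a, ic_b, ic_lcs))
print(lin_similarity(ic_a, ic_b, ic_lcs))
```

Both measures use the same three quantities; Jiang-Conrath turns them into an inverse distance, while Lin normalizes them into a ratio that is at most 1.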
</div>
<div class="section" id="access-to-all-synsets">
<h1>Access to all Synsets</h1>
<p>Iterate over all the noun synsets:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; for synset in list(wn.all_synsets('n'))[:10]:
...     print(synset)
...
Synset('entity.n.01')
Synset('physical_entity.n.01')
Synset('abstraction.n.06')
Synset('thing.n.12')
Synset('object.n.01')
Synset('whole.n.02')
Synset('congener.n.03')
Synset('living_thing.n.01')
Synset('organism.n.01')
Synset('benthos.n.02')
</pre>
</blockquote>
<p>Get all synsets for this word, possibly restricted by POS:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; wn.synsets('dog') # doctest: +ELLIPSIS
[Synset('dog.n.01'), Synset('frump.n.01'), Synset('dog.n.03'), Synset('cad.n.01'), ...]
&gt;&gt;&gt; wn.synsets('dog', pos='v')
[Synset('chase.v.01')]
</pre>
</blockquote>
<p>Walk through the noun synsets looking at their hypernyms:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; from itertools import islice
&gt;&gt;&gt; for synset in islice(wn.all_synsets('n'), 5):
...     print(synset, synset.hypernyms())
...
Synset('entity.n.01') []
Synset('physical_entity.n.01') [Synset('entity.n.01')]
Synset('abstraction.n.06') [Synset('entity.n.01')]
Synset('thing.n.12') [Synset('physical_entity.n.01')]
Synset('object.n.01') [Synset('physical_entity.n.01')]
</pre>
</blockquote>
</div>
<div class="section" id="morphy">
<h1>Morphy</h1>
<p>Look up forms not in WordNet, with the help of Morphy:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; wn.morphy('denied', wn.NOUN)
&gt;&gt;&gt; print(wn.morphy('denied', wn.VERB))
deny
&gt;&gt;&gt; wn.synsets('denied', wn.NOUN)
[]
&gt;&gt;&gt; wn.synsets('denied', wn.VERB) # doctest: +NORMALIZE_WHITESPACE
[Synset('deny.v.01'), Synset('deny.v.02'), Synset('deny.v.03'), Synset('deny.v.04'),
Synset('deny.v.05'), Synset('traverse.v.03'), Synset('deny.v.07')]
</pre>
</blockquote>
<p>Morphy uses a combination of inflectional ending rules and exception
lists to handle a variety of different possibilities:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; print(wn.morphy('dogs'))
dog
&gt;&gt;&gt; print(wn.morphy('churches'))
church
&gt;&gt;&gt; print(wn.morphy('aardwolves'))
aardwolf
&gt;&gt;&gt; print(wn.morphy('abaci'))
abacus
&gt;&gt;&gt; print(wn.morphy('book', wn.NOUN))
book
&gt;&gt;&gt; wn.morphy('hardrock', wn.ADV)
&gt;&gt;&gt; wn.morphy('book', wn.ADJ)
&gt;&gt;&gt; wn.morphy('his', wn.NOUN)
&gt;&gt;&gt;
</pre>
</blockquote>
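
The combination of an exception list and inflectional ending rules can be sketched in a few lines. The example below does not use NLTK: the rule list is an assumed set of English noun suffix rules, while the exception list and the lexicon are tiny stand-ins for the real WordNet files, and `morphy_sketch` is a hypothetical name.

```python
# Noun suffix rules, tried in order: (old ending, replacement). Assumed rule set.
NOUN_RULES = [
    ("s", ""), ("ses", "s"), ("ves", "f"), ("xes", "x"), ("zes", "z"),
    ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y"),
]
# Tiny stand-ins for the exception list and the set of known noun lemmas.
NOUN_EXCEPTIONS = {"abaci": "abacus"}
LEXICON = {"dog", "church", "aardwolf", "abacus", "book"}


def morphy_sketch(form):
    """Return a base form found in LEXICON, or None if there is none."""
    # 1. Irregular forms are looked up directly.
    if form in NOUN_EXCEPTIONS:
        return NOUN_EXCEPTIONS[form]
    # 2. The form may already be a lemma.
    if form in LEXICON:
        return form
    # 3. Otherwise try each ending rule and keep the first known result.
    for old, new in NOUN_RULES:
        if form.endswith(old):
            candidate = form[:-len(old)] + new
            if candidate in LEXICON:
                return candidate
    return None


for word in ["dogs", "churches", "aardwolves", "abaci", "book", "his"]:
    print(word, "->", morphy_sketch(word))
```

Checking every candidate against the lexicon is what prevents a rule such as stripping a final "s" from turning "his" into a non-word.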
</div>
<div class="section" id="synset-closures">
<h1>Synset Closures</h1>
<p>Compute transitive closures of synsets:</p>
<blockquote>
<pre class="doctest-block">&gt;&gt;&gt; dog = wn.synset('dog.n.01')
&gt;&gt;&gt; hypo = lambda s: s.hyponyms()
&gt;&gt;&gt; hyper = lambda s: s.hypernyms()
&gt;&gt;&gt; list(dog.closure(hypo, depth=1)) == dog.hyponyms()
True
&gt;&gt;&gt; list(dog.closure(hyper, depth=1)) == dog.hypernyms()
True
&gt;&gt;&gt; list(dog.closure(hypo))
[Synset('basenji.n.01'), Synset('corgi.n.01'), Synset('cur.n.01'),
 Synset('dalmatian.n.02'), Synset('great_pyrenees.n.01'),
 Synset('griffon.n.02'), Synset('hunting_dog.n.01'), Synset('lapdog.n.01'),
 Synset('leonberg.n.01'), Synset('mexican_hairless.n.01'),
 Synset('newfoundland.n.01'), Synset('pooch.n.01'), Synset('poodle.n.01'), ...]
&gt;&gt;&gt; list(dog.closure(hyper))
[Synset('canine.n.02'), Synset('domestic_animal.n.01'), Synset('carnivore.n.01'),
Synset('animal.n.01'), Synset('placental.n.01'), Synset('organism.n.01'),
Synset('mammal.n.01'), Synset('living_thing.n.01'), Synset('vertebrate.n.01'),
Synset('whole.n.02'), Synset('chordate.n.01'), Synset('object.n.01'),
Synset('physical_entity.n.01'), Synset('entity.n.01')]
</pre>
</blockquote>
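
A transitive closure of this kind is a breadth-first walk that keeps applying a relation function to every newly found node. The sketch below implements that idea on a toy hyponym table without NLTK; the `HYPONYMS` data and the standalone `closure` function are assumptions for illustration, and a negative `depth` is taken to mean "no limit".

```python
from collections import deque

# Toy hyponym table: node -> direct hyponyms (invented for illustration).
HYPONYMS = {
    "dog": ["hunting_dog", "lapdog"],
    "hunting_dog": ["hound", "terrier"],
    "hound": ["beagle"],
}


def closure(start, rel, depth=-1):
    """Yield every node reachable from start via rel, breadth first.

    A non-negative depth limits how many relation steps are followed.
    """
    seen = set()
    queue = deque([(start, 0)])
    while queue:
        node, level = queue.popleft()
        if depth >= 0 and level >= depth:
            continue
        for related in rel(node):
            if related not in seen:
                seen.add(related)
                yield related
                queue.append((related, level + 1))


hypo = lambda s: HYPONYMS.get(s, [])

print(list(closure("dog", hypo, depth=1)))
print(list(closure("dog", hypo)))
```

Tracking visited nodes keeps the walk finite even when a node can be reached along more than one path.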
</div>

</div>
</body></html>