### Representing the meaning of a word the linguistic way  

#### Definition of meaning (from Webster dictionary)  
* the idea that is represented by a word, phrase, etc.  
* the idea that a person wants to express by using words, signs, etc.  
* the idea that is expressed in a work of writing, art, etc.  

The most common linguistic way of thinking about meaning is denotation: a signifier (the word) stands for the signified (the idea or thing it refers to).

__Source__:  
[Deep Learning for Natural Language Processing, Stanford University School of Engineering](https://www.youtube.com/watch?v=ERibwqs9p38&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6&index=2)

### What is WordNet?  
WordNet is a lexical database that organizes words into a taxonomy: it groups them into synonym sets (synsets) and links those sets through hypernym (is-a) relationships. It can be accessed through the `nltk.corpus` module (see the code below for an example).

In [6]:
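import importlib.util
import subprocess
import sys

# Install nltk only if it is not already available in this environment.
# (pip.main was removed in pip 10; calling pip as a subprocess is the supported way.)
if importlib.util.find_spec('nltk') is None:
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'nltk'])

import nltk
nltk.download('wordnet')  # fetch only the WordNet corpus instead of opening the interactive downloader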

In [7]:
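from nltk.corpus import wordnet as wn

# First noun sense of "monkey"
monkey = wn.synset('monkey.n.01')
# Function returning the direct hypernyms (is-a parents) of a synset
hyper = lambda s: s.hypernyms()
# Transitive closure: every ancestor of "monkey" up to the root synset
list(monkey.closure(hyper))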

[Synset('primate.n.02'),
 Synset('placental.n.01'),
 Synset('mammal.n.01'),
 Synset('vertebrate.n.01'),
 Synset('chordate.n.01'),
 Synset('animal.n.01'),
 Synset('organism.n.01'),
 Synset('living_thing.n.01'),
 Synset('whole.n.02'),
 Synset('object.n.01'),
 Synset('physical_entity.n.01'),
 Synset('entity.n.01')]
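
The `closure` call above follows the is-a links transitively until it reaches the root. A minimal sketch of that idea, using a hand-written toy taxonomy (the `TOY_HYPERNYMS` dictionary and the `hypernym_closure` helper are hypothetical and not part of nltk, whose own implementation may differ):

```python
from collections import deque

# Hypothetical toy taxonomy: each word maps to its direct hypernyms (is-a parents).
TOY_HYPERNYMS = {
    "monkey": ["primate"],
    "primate": ["mammal"],
    "mammal": ["animal"],
    "animal": ["entity"],
    "entity": [],
}

def hypernym_closure(word, hypernyms):
    """Return every ancestor of `word`, nearest first (breadth-first walk)."""
    seen = []
    queue = deque(hypernyms.get(word, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # skip ancestors reached through more than one path
        seen.append(current)
        queue.extend(hypernyms.get(current, []))
    return seen

ancestors = hypernym_closure("monkey", TOY_HYPERNYMS)
print(ancestors)
```

Each entry in the result is a discrete symbol: the taxonomy says that a monkey is a primate, but it does not say how similar the two are.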

### Problems with discrete representations such as WordNet  
* Also called a localist representation
* Great as a resource, but misses nuances. For example, words grouped as synonyms are often interchangeable only in some contexts
* Misses new words, since it is hard to keep up to date
* Subjective
* Requires human labour to create and adapt
* Hard to compute accurate word similarity
* The vast majority of rule-based and statistical NLP work regards words as atomic symbols
* In vector-space terms, each word is a vector with a single 1 and zeroes everywhere else *(known as one-hot encoding/representation)*
* Any two distinct one-hot vectors are orthogonal (their dot product is 0), so there is no natural notion of similarity in a set of one-hot vectors.
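
The orthogonality point can be checked directly. The sketch below assumes a three-word vocabulary chosen only for illustration; `one_hot` and `dot` are hypothetical helper names:

```python
def one_hot(word, vocab):
    """Return a vector with a single 1 at the index of `word` in `vocab`."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

vocab = ["hotel", "motel", "banana"]  # illustrative vocabulary
hotel = one_hot("hotel", vocab)
motel = one_hot("motel", vocab)
banana = one_hot("banana", vocab)

# Every pair of distinct words gets the same score, 0,
# no matter how related the words are.
print(dot(hotel, motel), dot(hotel, banana), dot(hotel, hotel))
```

With this encoding, "hotel" is exactly as far from "motel" as it is from "banana", which is why a different representation is needed to capture similarity.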

__Source__:  
[Deep Learning for Natural Language Processing, Stanford University School of Engineering](https://www.youtube.com/watch?v=ERibwqs9p38&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6&index=2)

### Distributional similarity is a solution to the above problems  
* Represent a word by means of its neighbouring words. 
* This lets us build a dense vector for each word type, chosen so that it is good at predicting the other words that appear in its context.  
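
To make "represent a word by its neighbours" concrete, here is a minimal count-based sketch. It is not the learned dense-vector approach itself: it simply counts which words occur within a fixed window and compares those counts with cosine similarity. The toy corpus, the window size, and the helper names `context_counts` and `cosine` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def context_counts(tokens, window=2):
    """For each word, count the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    shared = c1.keys() & c2.keys()
    dot = sum(c1[k] * c2[k] for k in shared)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Tiny illustrative corpus; a real corpus would be far larger.
corpus = "the cat drinks milk . the dog drinks water . the cat chases the dog".split()
counts = context_counts(corpus, window=2)

sim_cat_dog = cosine(counts["cat"], counts["dog"])
sim_cat_milk = cosine(counts["cat"], counts["milk"])
print(sim_cat_dog > sim_cat_milk)
```

On this toy corpus "cat" and "dog" share more of their contexts than "cat" and "milk" do, so they come out as more similar, without any hand-built taxonomy.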

__Source__:  
[Deep Learning for Natural Language Processing, Stanford University School of Engineering](https://www.youtube.com/watch?v=ERibwqs9p38&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6&index=2)

### Neural Network Word Embeddings  
We define a model that, given a center word w<sub>t</sub>, predicts the words in its context, with every word represented by a vector   
> p(context|w<sub>t</sub>) = ...  

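which has a loss function, e.g.,
> J = 1 - p(w<sub>-t</sub>|w<sub>t</sub>)

where w<sub>-t</sub> denotes the context around position t, that is, every word in the window except w<sub>t</sub> itself.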

We look at many positions t in a big language corpus. We keep adjusting the vector representations of words to minimize this loss.
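
The notes leave the form of p(context|w<sub>t</sub>) open. As a sketch only, the code below assumes the softmax over dot products used by skip-gram models (Mikolov et al., 2013) and applies one gradient step to the center vector for the loss J = 1 - p above. The vectors, the learning rate, and the function names are hypothetical, and a real trainer would also update the context vectors and iterate over many positions.

```python
import math

def softmax_probs(center_vec, context_vecs):
    """Return p(o | center) for every candidate context word o."""
    scores = {w: sum(a * b for a, b in zip(u, center_vec))
              for w, u in context_vecs.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}

def loss(center_vec, context_vecs, observed):
    """J = 1 - p(observed | center)."""
    return 1.0 - softmax_probs(center_vec, context_vecs)[observed]

def sgd_step(center_vec, context_vecs, observed, lr=1.0):
    """One gradient-descent step on the center vector only."""
    probs = softmax_probs(center_vec, context_vecs)
    dim = len(center_vec)
    # Expected context vector under the current model distribution.
    expected = [sum(probs[w] * context_vecs[w][k] for w in context_vecs)
                for k in range(dim)]
    p_o = probs[observed]
    # dJ/dv = -p_o * (u_o - expected)
    grad = [-p_o * (context_vecs[observed][k] - expected[k]) for k in range(dim)]
    return [v - lr * g for v, g in zip(center_vec, grad)]

# Hypothetical 2-dimensional vectors for a 3-word vocabulary.
v_cat = [0.1, 0.2]                      # center vector for "cat"
u = {"cat": [0.0, 0.1],                 # context vectors
     "drinks": [0.3, -0.1],
     "milk": [-0.2, 0.2]}

before = loss(v_cat, u, "drinks")
v_cat_new = sgd_step(v_cat, u, "drinks")
after = loss(v_cat_new, u, "drinks")
print(before, after)
```

After the step, the observed context word "drinks" becomes more probable given "cat", so the loss goes down. Repeating this over many positions t is what gradually shapes the vectors.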

__Source__:  
[Deep Learning for Natural Language Processing, Stanford University School of Engineering](https://www.youtube.com/watch?v=ERibwqs9p38&list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6&index=2)

### References  
* [Rumelhart et al. (1986), Learning representations by back-propagating errors](https://www.iro.umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf)  
* [Bengio et al. (2003), A Neural Probabilistic Language Model](https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
* [Collobert et al. (2011), Natural Language Processing (Almost) from Scratch](https://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf)
* [Mikolov et al. (2013), Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/pdf/1310.4546)
