# The distributional hypothesis

The core idea of computational semantics rests on the [distributional hypothesis](https://en.wikipedia.org/wiki/Distributional_semantics) of Harris and Firth, namely that the meaning of a word can be induced from its neighboring words.
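
As a rough illustration of what "induced from its neighboring words" can mean in practice, the sketch below counts the words that occur within a small window around each word and compares two words by the cosine of their count vectors. The function names `context_counts` and `cosine`, the window size, and the toy sentence are all assumptions made for this example; this is one simple way to operationalize the hypothesis, not a method attributed to Harris or Firth.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_counts(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the word itself
                contexts[word][tokens[j]] += 1
    return contexts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy sentence, invented for illustration only.
toy_tokens = "the cat chased the mouse and the dog chased the cat".split()
vectors = context_counts(toy_tokens, window=2)

print(cosine(vectors["cat"], vectors["dog"]))  # both occur near "the" and "chased"
print(cosine(vectors["cat"], vectors["and"]))  # share only "the" as a neighbor
```

On this toy input, words that share more of their neighbors score higher. With a real corpus the same counting would be done over many thousands of tokens.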

A simple way to test this hypothesis is via concordance lists, in which we query for a word and its neighborhood.

In [1]:
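import nltk
from nltk.corpus import gutenberg

# Requires the Gutenberg corpus; fetch it once with nltk.download('gutenberg').
# Wrap each Gutenberg file in an nltk.Text so we can query concordances later.
corpora = {}
for f in gutenberg.fileids():
    print("Processing: ", f)
    text = nltk.Text(gutenberg.words(f))
    corpora[f] = text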


Processing:  austen-emma.txt
Processing:  austen-persuasion.txt
Processing:  austen-sense.txt
Processing:  bible-kjv.txt
Processing:  blake-poems.txt
Processing:  bryant-stories.txt
Processing:  burgess-busterbrown.txt
Processing:  carroll-alice.txt
Processing:  chesterton-ball.txt
Processing:  chesterton-brown.txt
Processing:  chesterton-thursday.txt
Processing:  edgeworth-parents.txt
Processing:  melville-moby_dick.txt
Processing:  milton-paradise.txt
Processing:  shakespeare-caesar.txt
Processing:  shakespeare-hamlet.txt
Processing:  shakespeare-macbeth.txt
Processing:  whitman-leaves.txt


In [2]:
for k, t in corpora.items():
    # Up to 25 concordance lines per text (NLTK's default); matching ignores case.
    cl = t.concordance_list('mouse')
    print("======={}=======".format(k))
    for c in cl:
        print(c.line)

pon the earth ; the weasel , and the mouse , and the tortoise after his kind , 
lesh , and the abomination , and the mouse , shall be consumed together , saith
e you had quiet Guard ? Fran . Not a Mouse stirring Barn . Well , goodnight . I
hat do you call the Play ? Ham . The Mouse - trap : Marry how ? Tropically : Th
Wanton on your cheeke , call you his Mouse , And let him for a paire of reechie
bit , Jumper the Hare , Danny Meadow Mouse , Johnny Chuck , Jimmy Skunk , Unc '
Even timid little Whitefoot the Wood Mouse was where he could peer out and see 
 d head surpasses any statue , And a mouse is miracle enough to stagger sextill
tch a bat , and that ' s very like a mouse , you know . But do cats eat bats , 
she soon made out that it was only a mouse that had slipped in like herself . '
' thought Alice , ' to speak to this mouse ? Everything is so out - of - the - 
harm in trying .' So she began : ' O Mouse , do you know the way out of this po
ery tired of swimming about here , O Mou
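
To make explicit what a concordance line contains, here is a minimal pure-Python sketch of a keyword-in-context lookup. It is not NLTK's implementation: the helper name `kwic` and its `width` parameter (counted in tokens rather than characters) are assumptions for this example. Like the output above, which matches both `Mouse` and `mouse`, it compares tokens case-insensitively.

```python
def kwic(tokens, target, width=4):
    """Return (left, keyword, right) triples for every case-insensitive match of `target`."""
    target = target.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = tokens[max(0, i - width):i]      # up to `width` tokens before the match
            right = tokens[i + 1:i + 1 + width]     # up to `width` tokens after the match
            hits.append((" ".join(left), tok, " ".join(right)))
    return hits


# Tokens copied from one of the concordance lines above.
sample = "she soon made out that it was only a mouse that had slipped in like herself .".split()

for left, word, right in kwic(sample, "mouse", width=3):
    print(f"{left:>20} [{word}] {right}")
```

The left and right contexts returned here are exactly the "neighboring words" the hypothesis refers to.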

In [3]:
# f is the loop variable left over from the first cell, i.e. the last file processed.
gutenberg.words(f)[120:130]

["'", 's', '-', 'Self', 'I', 'Sing', 'One', "'", 's', '-']

In [4]:
f

'whitman-leaves.txt'