# Generating Concordances

This notebook shows how to generate a concordance (every occurrence of a word shown together with the words around it) from a list of tokens.

First, we check which text files are in the current folder.

```
ls *.txt
```

```
FullText.txt                performanceConcordance.txt
Hume Enquiry.txt            theWritingStory.txt
StoryOfWriting.txt          truthConcordance.txt
bigdata.txt
```
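`ls` is an IPython shell shortcut, so it will not run in a plain Python script. A minimal alternative using the standard library's `pathlib` might look like this (the helper name `list_text_files` is hypothetical):

```python
from pathlib import Path

def list_text_files(folder="."):
    """Hypothetical helper: return the names of the .txt files in `folder`, sorted."""
    return sorted(path.name for path in Path(folder).glob("*.txt"))

print(list_text_files())
```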


We are going to use "Hume Enquiry.txt" from Project Gutenberg, but you can use any text you like. We print the length of the text and its first 50 characters to check that the file loaded correctly.

```python
theText2Use = "Hume Enquiry.txt"
with open(theText2Use, "r") as fileToRead:
    fileRead = fileToRead.read()  # the whole file as one string

print("This string has", len(fileRead), "characters.")
print(fileRead[:50])
```

```
This string has 366798 characters.
The Project Gutenberg EBook of An Enquiry Concerni
```
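The cell above opens the file with the operating system's default encoding, which can fail or garble accented characters on some machines. Here is a hedged sketch of a more explicit reader, assuming the file is UTF-8 (common for Gutenberg downloads, but not guaranteed). The helper name `read_text` is hypothetical:

```python
def read_text(path):
    """Hypothetical helper: read a whole text file as one string.

    Tries UTF-8 first ("utf-8-sig" also drops a leading byte-order mark, if any)
    and falls back to Latin-1, which can decode any sequence of bytes.
    """
    try:
        with open(path, "r", encoding="utf-8-sig") as file_to_read:
            return file_to_read.read()
    except UnicodeDecodeError:
        with open(path, "r", encoding="latin-1") as file_to_read:
            return file_to_read.read()
```

In the notebook this would replace the `with` block: `fileRead = read_text("Hume Enquiry.txt")`. The Latin-1 fallback never raises, but it may decode a file in some other encoding incorrectly, so it is a convenience rather than a guarantee.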


### Cleaning the Text

Coming from Gutenberg
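As the first 50 characters show, the file begins with Project Gutenberg front matter, and Gutenberg files also end with the project's license text. One possible cleaning step, sketched under the assumption that the file uses the common `*** START OF ...` and `*** END OF ...` marker lines (marker wording varies between files, so check yours first), is to keep only the lines between them. The function name is hypothetical:

```python
import re

# Assumed marker patterns; Gutenberg files vary, so inspect your file first.
START_MARKER = re.compile(r"^\*{3}\s*START OF", re.IGNORECASE)
END_MARKER = re.compile(r"^\*{3}\s*END OF", re.IGNORECASE)

def strip_gutenberg_boilerplate(text):
    """Hypothetical helper: keep only the lines between the START and END markers.

    If a marker is not found, the text on that side is kept unchanged.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if START_MARKER.match(line.strip()):
            start = i + 1  # the body begins on the line after the START marker
        elif END_MARKER.match(line.strip()):
            end = i  # the body stops just before the END marker
            break
    return "\n".join(lines[start:end])
```

You could then tokenize `strip_gutenberg_boilerplate(fileRead)` instead of `fileRead`, so that words from the Gutenberg header and license do not turn up in the concordance.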

## Tokenization

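Now we tokenize the text, producing a list called "listOfTokens", and check the first ten words. The text is lowercased first, so "Truth" and "truth" become the same token, and the regular expression keeps only runs of word characters (which may contain internal hyphens), so punctuation is dropped.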

```python
import re
listOfTokens = re.findall(r'\b\w[\w-]*\b', fileRead.lower())  # lowercase, then extract words
print(listOfTokens[:10])
```

```
['the', 'project', 'gutenberg', 'ebook', 'of', 'an', 'enquiry', 'concerning', 'human', 'understanding']
```
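To see what the pattern keeps and what it drops, here is the same expression applied to a short made-up sentence (the sample string is illustrative, not from Hume):

```python
import re

sample = "Truth, and well-founded TRUTH; _scepticism_ (p. 124)."
tokens = re.findall(r'\b\w[\w-]*\b', sample.lower())
print(tokens)
# ['truth', 'and', 'well-founded', 'truth', '_scepticism_', 'p', '124']
```

Hyphenated compounds stay as one token, and because `\w` also matches underscores, Gutenberg's `_italic_` markup survives inside tokens (as in `_criteria_` in the concordance output further on).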


## Input

Now we ask for the word you want a concordance for and for how much context you want, that is, how many words to show on either side of it.

```python
word2find = input("What word do you want concordances for? ").lower()  # Ask for the word to search for
context = input("How much context do you want? ")  # Words to grab on either side (a string; converted with int() later)
```

```
What word do you want concordances for? 
How much context do you want? 
```
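`input()` always returns a string, which is why `context` is converted with `int()` when the main function is called. If the answer is empty or not a number, that conversion raises a `ValueError`. A small hypothetical helper that falls back to a default instead (5 here, an arbitrary choice) might look like:

```python
def parse_context(answer, default=5):
    """Hypothetical helper: turn a typed answer into a non-negative number of words."""
    try:
        value = int(answer.strip())
    except ValueError:
        # Empty or non-numeric input: use the default instead of crashing.
        return default
    # A negative context makes no sense, so treat it as zero.
    return max(value, 0)

print(parse_context("4"), parse_context(""), parse_context("lots"))
# 4 5 5
```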


## Main function

Here is the main function that does the work: it populates a new list with one concordance line per occurrence of the word, each prefixed with the token's position in the list. We then check the last 5 concordance lines.

```python
def makeConc(word2conc, list2FindIn, context2Use, concList):
    """Append a concordance line to concList for every occurrence of word2conc.

    Each line is the token position, a colon, and the word with up to
    context2Use tokens on either side.
    """
    end = len(list2FindIn)
    for location in range(end):
        if list2FindIn[location] == word2conc:
            # Clamp the start at 0, because a negative slice index would wrap
            # around to the end of the list; clamp the end at the list length.
            beginCon = max(0, location - context2Use)
            endCon = min(end, location + context2Use + 1)
            theContext = list2FindIn[beginCon:endCon]
            concordanceLine = ' '.join(theContext)
            concList.append(str(location) + ": " + concordanceLine)

theConc = []
makeConc(word2find, listOfTokens, int(context), theConc)
theConc[-5:]
```

```
['49120: ever hope to reach truth and attain a proper',
 '49395: the proper _criteria_ of truth and falsehood there are',
 '50639: and undoubtedly with great truth to have composed his',
 '56851: passion except love of truth and so has few',
 '57892: and space 124 f truth 8 17 v _scepticism_']
```
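These lines are hard to scan because the keyword falls at a different column in each one. A common alternative is a keyword-in-context (KWIC) layout, where the left context is right-aligned so the keyword lines up vertically. The sketch below is one way to do it; the name `kwic` and the `width` parameter are hypothetical, and it uses `enumerate` rather than indexing by position:

```python
def kwic(tokens, keyword, context=4, width=40):
    """Hypothetical helper: return aligned keyword-in-context lines.

    Each line shows the token position and up to `context` tokens on either
    side; the left context is trimmed to its last `width` characters and
    right-aligned, so the keyword always starts in the same column.
    """
    lines = []
    for position, token in enumerate(tokens):
        if token != keyword:
            continue
        left = " ".join(tokens[max(0, position - context):position])
        right = " ".join(tokens[position + 1:position + 1 + context])
        line = f"{position:>6}: {left[-width:]:>{width}} {token} {right[:width]}"
        lines.append(line.rstrip())
    return lines

for line in kwic("a b truth c d truth e".split(), "truth", context=2, width=5):
    print(line)
```

In the notebook, `kwic(listOfTokens, word2find, int(context))` would find the same matches at the same positions as `makeConc`, only laid out differently.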

## Output

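Finally, we write the concordance lines to a text file named after the word (for "truth", `Truth.Concordance.txt`), one line per occurrence.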

```python
nameOfResults = word2find.capitalize() + ".Concordance.txt"

with open(nameOfResults, "w") as fileToWrite:
    for line in theConc:
        fileToWrite.write(line + "\n")  # one concordance line per line of the file

print("Done")
```

```
Done
```
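Because each concordance line starts with the token position, the results could also be saved as a two-column CSV file for use in a spreadsheet. This is a sketch with hypothetical function names, using Python's standard `csv` module:

```python
import csv

def conc_to_rows(conc_lines):
    """Hypothetical helper: split 'position: context' strings into (int, str) pairs."""
    rows = []
    for line in conc_lines:
        position, _, context_text = line.partition(": ")
        rows.append((int(position), context_text))
    return rows

def write_conc_csv(conc_lines, path):
    """Hypothetical helper: write the concordance as a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["position", "context"])
        writer.writerows(conc_to_rows(conc_lines))
```

For example, `write_conc_csv(theConc, word2find.capitalize() + ".Concordance.csv")` would write the same results alongside the text file.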


Here we check that the file was created.

```
ls
```

```
Basic CSV Handling.ipynb             Truth.Concordance.txt
Concordances.ipynb                   Truths.Concordance.txt
ExampleTable.csv                     Untitled.ipynb
Exploring a text with NLTK.ipynb     Untitled1.ipynb
Hume Enquiry.txt                     Untitled2.ipynb
Python language notes.ipynb          theText.txt
Teaching IPython to Humanists.ipynb
```
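Outside IPython, where `ls` is unavailable, a plain-Python check could confirm that the file exists and count its lines. This is a sketch; the helper name is hypothetical:

```python
from pathlib import Path

def count_output_lines(path):
    """Hypothetical helper: return the number of lines in `path`, or None if it is missing."""
    file_path = Path(path)
    if not file_path.exists():
        return None
    # errors="replace" keeps counting even if the file's encoding differs from UTF-8.
    with file_path.open("r", encoding="utf-8", errors="replace") as results:
        return sum(1 for _ in results)
```

After the output cell, `count_output_lines(nameOfResults)` should equal `len(theConc)`, since each concordance line was written on its own line.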


---
[CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) From [The Art of Literary Text Analysis](ArtOfLiteraryTextAnalysis.ipynb) by [Stéfan Sinclair](http://stefansinclair.name) & [Geoffrey Rockwell](http://geoffreyrockwell.com)<br >Created September 30th, 2016 (Jupyter 4.2.1)