# Extracting Dictionaries

This notebook extracts dictionaries from a spreadsheet for use in concept analysis. It uses the General Inquirer spreadsheet "inquirerbasic.csv", which sorts words into classes (categories). It extracts a dictionary (a list of words) for one category and saves it to a text file for use in dictionary-based analysis.

In [3]:
import csv

## Checking files

Here we check where our CSV file is.

In [4]:
ls

Basic CSV Handling.ipynb             Truths.Concordance.txt
Concordances.ipynb                   Untitled.ipynb
Counting Word Types.ipynb            Untitled1.ipynb
ExampleTable.csv                     Untitled2.ipynb
Extracting Dictionaries.ipynb        Untitled3.ipynb
Hume Enquiry.txt                     Using TextBlob.ipynb
Python language notes.ipynb          Web Scraping.ipynb
ScrapeResults.txt                    Web Scraping.ipynb 2
ScrapeResults.xml                    countdocsmatrix.ipynb*
Sentiment Over a Text.ipynb          fortmactweets.may3-4.2016.txt
Teaching IPython to Humanists.ipynb  inquirerbasic.csv
Truth.Concordance.txt                theText.txt


## Getting the data out of the CSV

Here we open the named CSV file, read its rows, and then display the headings in the first row, which are the possible categories. The idea is that you can then pick the category that you want.

In [7]:
csvFile = "inquirerbasic.csv"

In [51]:
listOfRows = []
with open(csvFile, 'r', newline='') as file: # The with block closes the file for us after reading
    data = csv.reader(file)
    for row in data:
        listOfRows.append(row)

categories = listOfRows[0]
for i in range(2,len(categories)): # Note that we start at 2 as the first two labels are not categories
    print(str(i), ": ", categories[i])

2 :  Positiv
3 :  Negativ
4 :  Pstv
5 :  Affil
6 :  Ngtv
7 :  Hostile
8 :  Strong
9 :  Power
10 :  Weak
11 :  Submit
12 :  Active
13 :  Passive
14 :  Pleasur
15 :  Pain
16 :  Feel
17 :  Arousal
18 :  EMOT
19 :  Virtue
20 :  Vice
21 :  Ovrst
22 :  Undrst
23 :  Academ
24 :  Doctrin
25 :  Econ@
26 :  Exch
27 :  ECON
28 :  Exprsv
29 :  Legal
30 :  Milit
31 :  Polit@
32 :  POLIT
33 :  Relig
34 :  Role
35 :  COLL
36 :  Work
37 :  Ritual
38 :  SocRel
39 :  Race
40 :  Kin@
41 :  MALE
42 :  Female
43 :  Nonadlt
44 :  HU
45 :  ANI
46 :  PLACE
47 :  Social
48 :  Region
49 :  Route
50 :  Aquatic
51 :  Land
52 :  Sky
53 :  Object
54 :  Tool
55 :  Food
56 :  Vehicle
57 :  BldgPt
58 :  ComnObj
59 :  NatObj
60 :  BodyPt
61 :  ComForm
62 :  COM
63 :  Say
64 :  Need
65 :  Goal
66 :  Try
67 :  Means
68 :  Persist
69 :  Complet
70 :  Fail
71 :  NatrPro
72 :  Begin
73 :  Vary
74 :  Increas
75 :  Decreas
76 :  Finish
77 :  Stay
78 :  Rise
79 :  Exert
80 :  Fetch
81 :  Travel
82 :  Fall
83 :  Think
84 :  Kno
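
If you would rather pick a category by its name than by its number, a small helper can look the column up for you. This is only a minimal sketch: the function name `category_index` and the sample header row are hypothetical, and the first two labels in the sample ("Entry" and "Source") are placeholders rather than the spreadsheet's actual labels. The match is exact, since headings such as Econ@ and ECON are distinct.

```python
def category_index(categories, name):
    """Return the column number of the heading that matches name exactly."""
    try:
        return categories.index(name)
    except ValueError:
        raise ValueError("No category named " + repr(name)) from None

# Hypothetical header row; in the notebook this would be listOfRows[0]
sample_categories = ["Entry", "Source", "Positiv", "Negativ", "Pstv"]
print(category_index(sample_categories, "Negativ"))  # prints 3
```

In the notebook itself the result could be assigned to `category` in place of a hand-typed number.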

## Extract category words

Here we set the column number of the category we want and collect every word whose row has an entry in that column. We then print the number of words and show the first 50.

In [55]:
category = 2

words = []
for row in listOfRows[1:]: # We iterate over the rows skipping the header row
    if row[category] != "":
        words.append(row[0])

print(str(len(words)) + " words in category:" + " " + categories[category])
words[:50]

1915 words in category: Positiv


['ABIDE',
 'ABILITY',
 'ABLE',
 'ABOUND',
 'ABSOLVE',
 'ABSORBENT',
 'ABSORPTION',
 'ABUNDANCE',
 'ABUNDANT',
 'ACCEDE',
 'ACCENTUATE',
 'ACCEPT',
 'ACCEPTABLE',
 'ACCEPTANCE',
 'ACCESSIBLE',
 'ACCESSION',
 'ACCLAIM',
 'ACCLAMATION',
 'ACCOLADE',
 'ACCOMMODATE',
 'ACCOMMODATION',
 'ACCOMPANIMENT',
 'ACCOMPLISH',
 'ACCOMPLISHMENT',
 'ACCORD#2',
 'ACCORD#3',
 'ACCORD#5',
 'ACCORDANCE',
 'ACCOUNTABLE',
 'ACCRUE',
 'ACCURACY',
 'ACCURATE',
 'ACCURATENESS',
 'ACHIEVE',
 'ACHIEVEMENT',
 'ACKNOWLEDGEMENT',
 'ACQUAINT',
 'ACQUAINTANCE',
 'ACQUIT',
 'ACQUITTAL',
 'ACTUAL#1',
 'ACTUAL#2',
 'ACTUALITY',
 'ADAMANT',
 'ADAPTABILITY',
 'ADAPTABLE',
 'ADAPTATION',
 'ADAPTIVE',
 'ADEPT',
 'ADEPTNESS']
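
The extraction step can also be wrapped in a function so that it is easy to rerun for other categories. The sketch below assumes the same layout as above (the word in the first column, a non-empty cell marking membership in a category). The name `extract_category` and the miniature in-memory spreadsheet are made up for illustration and are not part of inquirerbasic.csv.

```python
import csv
import io

def extract_category(rows, column):
    """Return the first-column entry of every data row marked in the given column."""
    # rows[1:] skips the header row; the length check guards against short rows
    return [row[0] for row in rows[1:] if len(row) > column and row[column] != ""]

# Hypothetical miniature spreadsheet standing in for inquirerbasic.csv
sample_csv = io.StringIO(
    "Entry,Source,Positiv,Negativ\n"
    "ABILITY,Demo,Positiv,\n"
    "ABANDON,Demo,,Negativ\n"
    "ACCORD#2,Demo,Positiv,\n"
)
sample_rows = list(csv.reader(sample_csv))
print(extract_category(sample_rows, 2))  # prints ['ABILITY', 'ACCORD#2']
```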

## Cleaning the list

Some words appear more than once because they have several senses, each marked with a "#" and a sense number (for example ACCORD#2 and ACCORD#3). Here we strip the sense marker and keep just one copy of each word. We then print the new number of words in the category and show the first 50.

In [57]:
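cleanedWords = []
theLastWord = ""
for word in words:
    if "#" in word: # A "#" marks a numbered sense, e.g. ACCORD#2
        # Substring test, so a word that merely contains the previous word is also skipped
        if theLastWord not in word:
            cleanedWords.append(word[:-2]) # Drops the last two characters, so assumes a single-digit sense number
            theLastWord = word[:-2]
    else:
        cleanedWords.append(word)
        theLastWord = word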

print(str(len(cleanedWords)) + " words in cleaned category:" + " " + categories[category])
cleanedWords[:50]

1631 words in cleaned category: Positiv


['ABIDE',
 'ABILITY',
 'ABLE',
 'ABOUND',
 'ABSOLVE',
 'ABSORBENT',
 'ABSORPTION',
 'ABUNDANCE',
 'ABUNDANT',
 'ACCEDE',
 'ACCENTUATE',
 'ACCEPT',
 'ACCEPTABLE',
 'ACCEPTANCE',
 'ACCESSIBLE',
 'ACCESSION',
 'ACCLAIM',
 'ACCLAMATION',
 'ACCOLADE',
 'ACCOMMODATE',
 'ACCOMMODATION',
 'ACCOMPANIMENT',
 'ACCOMPLISH',
 'ACCOMPLISHMENT',
 'ACCORD',
 'ACCORDANCE',
 'ACCOUNTABLE',
 'ACCRUE',
 'ACCURACY',
 'ACCURATE',
 'ACCURATENESS',
 'ACHIEVE',
 'ACHIEVEMENT',
 'ACKNOWLEDGEMENT',
 'ACQUAINT',
 'ACQUAINTANCE',
 'ACQUIT',
 'ACQUITTAL',
 'ACTUAL',
 'ACTUALITY',
 'ADAMANT',
 'ADAPTABILITY',
 'ADAPTABLE',
 'ADAPTATION',
 'ADAPTIVE',
 'ADEPT',
 'ADEPTNESS',
 'ADEQUATE',
 'ADHERENCE',
 'ADHERENT']
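
The cleaning cell relies on two assumptions: sense numbers have a single digit, and repeated senses sit next to each other. A slightly more defensive variant is sketched below. It is an alternative rather than the notebook's own method: it splits on the "#" instead of slicing, and it tracks the words already seen in a set, so it removes duplicates anywhere in the list. It may therefore give a different count from the one shown above. The name `strip_senses` and the sample list (including the made-up entry MAKE#10) are hypothetical.

```python
def strip_senses(words):
    """Drop '#n' sense markers and keep the first copy of each word, in order."""
    seen = set()
    cleaned = []
    for word in words:
        base = word.split("#", 1)[0]  # everything before the first '#', or the whole word
        if base not in seen:
            seen.add(base)
            cleaned.append(base)
    return cleaned

# Hypothetical sample, including a two-digit sense number
sample_words = ["ACCORD#2", "ACCORD#3", "ACCORDANCE", "ACT", "ACTUAL#1", "ACTUAL#2", "MAKE#10"]
print(strip_senses(sample_words))
# prints ['ACCORD', 'ACCORDANCE', 'ACT', 'ACTUAL', 'MAKE']
```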

## Saving the list of words

Finally, we save the list of words to a text file named after the category, one word per line. Note that we lowercase the words.

In [49]:
nameOfDict = categories[category] + ".dictionary.txt"

with open(nameOfDict, "w") as fileToWrite:
    for word in cleanedWords:
        fileToWrite.write(word.lower() + "\n")
    
print("Done")

Done


## Next steps

You can now use this list with a dictionary-based sentiment analysis tool.
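
As a rough illustration of what such a tool does with the list, the sketch below counts how many words in a passage appear in a dictionary. It is a simplified assumption about the approach, not a description of any particular tool: the tokenizer just pulls out runs of the letters a to z, and the function name, sample dictionary, and sample text are all hypothetical.

```python
import re

def count_dictionary_words(text, dictionary_words):
    """Count how many tokens in the text appear in the dictionary."""
    dictionary = set(dictionary_words)  # a set makes each lookup fast
    tokens = re.findall(r"[a-z]+", text.lower())  # lowercase to match the saved file
    return sum(1 for token in tokens if token in dictionary)

# Hypothetical stand-in for the lines read from the saved dictionary file
sample_dictionary = ["able", "accept", "accord"]
sample_text = "We accept that she is able, and able to accept more."
print(count_dictionary_words(sample_text, sample_dictionary))  # prints 4
```

With the file saved above, the dictionary could be loaded by reading `nameOfDict` and splitting its contents into lines.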