In [2]:
%load_ext autoreload
%autoreload 2

import pickle
import numpy as np


Sunday 10/21:

- Set deadlines for when certain things should be done by



Monday 10/22:

- Created the repository and the ReadMe



Sunday 10/28:

- Tried to figure out what we should do about dangling links when making the matrices (which occur when people produce a target that was never tested as a cue) 
- Decided on two options: 
1. a matrix that just ignores dangling links (and in the case of a stochastic matrix normalizes the remaining values to sum to 1)
2. a matrix with entries for the dangling links. We talked about just having no out-edges from nodes corresponding to the dangling links. But now I'm thinking we could just give them links to all other nodes (with weights uniformly distributed in the case of a stochastic matrix)

- I wrote the functions to create the matrices for the first option

- We also discussed what form to load the data in. We decided on a dict mapping cues to targets, a list of the cues, and a list of the unnormed targets.
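
A minimal sketch of option 1, not the repository's actual functions. It assumes the cue dict maps each cue to a dict of `{target: count}`; the function name `build_matrices_ignoring_dangling` and that layout are assumptions for illustration only.

```python
import numpy as np

def build_matrices_ignoring_dangling(cue_to_targets, cues):
    """Return (boolean, stochastic) matrices over the normed cues only.

    cue_to_targets: dict mapping each cue to {target: count} (assumed layout)
    cues: list of normed cues; fixes the row/column order
    """
    index = {cue: i for i, cue in enumerate(cues)}
    n = len(cues)
    counts = np.zeros((n, n))
    for cue, targets in cue_to_targets.items():
        for target, count in targets.items():
            if target in index:  # skip dangling links (targets never tested as cues)
                counts[index[cue], index[target]] = count
    boolean = counts > 0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Renormalize each row over the surviving links; rows with no links stay zero.
    stochastic = np.divide(counts, row_sums,
                           out=np.zeros_like(counts), where=row_sums > 0)
    return boolean, stochastic

# Toy data: BONE was produced as a target but never tested as a cue.
toy = {"DOG": {"CAT": 3, "BONE": 1}, "CAT": {"DOG": 2}}
boolean, stochastic = build_matrices_ignoring_dangling(toy, ["CAT", "DOG"])
```

In the toy example the DOG row keeps only its link to CAT, so after renormalizing that single link carries all of the row's weight.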

Monday 10/29:

- Wrote methods to create the matrices (boolean and stochastic) for the second option from above. So the matrix has rows/cols for the dangling links, but right now these nodes have no out-going edges. 
- Immediate Next Step: we still have to decide what we'll do with dangling links and code it up

Wednesday 10/31:

- Decided to give the nodes corresponding to unnormed cues out-edges to all other nodes (all equally weighted in the case of the stochastic matrix)
- Updated the createFullMatrix methods to reflect this decision
- Noticed that these unnormed nodes shouldn't have out-edges to themselves, so updated method to reflect this
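
The decision above can be sketched roughly as follows. This is an illustration under assumptions, not the real createFullMatrix code: `add_uniform_rows` is a hypothetical helper, and it assumes the matrix already has (all-zero) rows for the unnormed nodes and more than one node in total.

```python
import numpy as np

def add_uniform_rows(stochastic, unnormed_rows):
    """Give each listed row equal-weight links to every node except itself."""
    n = stochastic.shape[0]
    full = stochastic.copy()
    for i in unnormed_rows:
        full[i, :] = 1.0 / (n - 1)  # spread the weight over the other n - 1 nodes
        full[i, i] = 0.0            # no out-edge from a node to itself
    return full

# Toy 3-node matrix in which node 2 is unnormed (no out-edges yet).
m = np.zeros((3, 3))
m[0, 1] = 0.5
m[0, 2] = 0.5
m[1, 0] = 1.0
full = add_uniform_rows(m, unnormed_rows=[2])
```

Dividing by `n - 1` rather than `n` is what keeps each filled row summing to 1 once the self-loop is removed.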

Sunday 11/4:

- Discussed how to get the files storing the matrices to be small enough to be pushed to github
- Decided to just re-compute these matrices each time for now, since it only takes a few minutes and we weren't able to find an easy way of compressing the files further
- Checked to make sure the matrices being computed were correct
- Carefully read through the PageRank chapters of $\textit{Google's PageRank and Beyond: The Science of Search Engine Rankings}$ (https://muse.jhu.edu/book/36229/)
- Next Step: Begin Implementing PageRank

Sunday 11/11:
    
- Wrote a function to load the two types of retrievability data. The K&F frequency is the frequency with which words showed up in a text sample of about a million words, due to Kucera & Francis (1967). The accessibility index of a word x is the number of cues for which at least 1 person produced x. 
- Wrote some tests below to make sure the dictionaries are actually being loaded correctly
- Realized that the Griffiths et al. paper doesn't have their full data set and that the K&F data is word frequency in text, not from a similar procedure to Griffiths et al.'s. This is concerning because we then don't have anything to evaluate our models' performance against. Searched Griffiths's page and elsewhere on the internet for similar data sets (keyword: phonemic fluency task), but wasn't able to find any trials done on healthy adults. 
- Next Steps: Need to find a way of evaluating our models. Possibilities: 
    - Use the partial data in Griffiths paper, which only has 7 letters. It has the human frequencies for the top 10 words, and the frequencies for the top 10 words predicted by each model. 
    - Find some other dataset somewhere online
    - Run the experiment ourselves and gather our own data
    - Compare against some other metric besides the human production frequency (I hate this idea unless we can find something really closely related.)
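
To make the accessibility index defined above concrete, here is a small sketch that computes it from cue/target data. It is not the code in `accessibility_loader` (which loads precomputed values); the function name and the dict-of-lists layout are assumptions for illustration.

```python
from collections import Counter

def accessibility_index(cue_to_targets):
    """Count, for each word, how many distinct cues elicited it at least once."""
    index = Counter()
    for cue, targets in cue_to_targets.items():
        for target in set(targets):  # set(): each cue counts once per target
            index[target] += 1
    return index

# Toy data: CAT is produced for two different cues (twice for DOG).
toy = {"DOG": ["CAT", "BONE", "CAT"], "MOUSE": ["CAT", "CHEESE"]}
acc = accessibility_index(toy)
```

A `Counter` returns 0 for words that were never produced, whereas the loaded dictionaries tested below return `None` from `.get()` for missing words.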

In [3]:
import accessibility_loader

kf_frequencies, accessibility_indices = accessibility_loader.loadDicts()
#print("food:", kf_frequencies['food'], accessibility_indices['food'])
#print("money:", kf_frequencies['money'], accessibility_indices['money'])
#print("water:", kf_frequencies['water'], accessibility_indices['water'])

assert(kf_frequencies['night'] == 411)
assert(accessibility_indices['night'] == 55)

assert(kf_frequencies['bar'] == 82)
assert(accessibility_indices['bar'] == 46)

assert(kf_frequencies['fake'] == 10)
assert(accessibility_indices['fake'] == 46)

assert(kf_frequencies['grandparents'] == 3)
assert(accessibility_indices['grandparents'] == 2)

assert(kf_frequencies['man'] == 1207)
assert(accessibility_indices['man'] == 171)

# Words missing from either dictionary should come back as None
assert(kf_frequencies.get('asdfasdf') is None)
assert(accessibility_indices.get('asdfasdfas') is None)
assert(kf_frequencies.get('devices') is None)
assert(kf_frequencies.get('peal') is None)


Monday 11/12:

- Went to office hours to talk the problem through with Anna. She suggested using Griffiths's partial data. She said we could run the experiment ourselves, but she wouldn't recommend it unless we wanted to. Decided to run the experiment ourselves.  
- Created a google form to run the experiment (with Luna)
- After a trial on a few people, decided on the final experimental procedure: 
    - record name and gender
    - run through all the letters in a random order, twice
    - skipped the same letters as Griffiths et al. (K, Q, X, Y, Z)
    - "You're going to see a letter on the screen, and for each one, tell me the first word starting with the letter that comes into your head."
    - Decided that any first word is ok, even if it is "inappropriate", a proper noun, or a repetition from the first trial. However, non-English words are not allowed.
    - Let the participant read the letter off the screen so as not to confuse them. People kept thinking "eye" when I would say "I", "You" when I would say "U", etc. Zoom in to 500% in the browser so the participant can only see 1 letter at a time. 
    - When they say a word, I type it in and then tab to the next letter.
- Ran the experiment on roughly 25 people. 

Tuesday 11/13:

- Ran the experiment on roughly 5 more people to get to a total of 30 full trials 
- Thought about a couple of things we'll have to decide for cleaning the data: 
    - Do we count both entries if someone said the same thing in both trials?
    - Do we count the plural forms as the same as the singular? What about verb tenses? What about like apple and applesauce?
- Wrote code to load in the data from a CSV and put it into dictionaries. A dict maps each letter to another dict. Each of these inner dicts maps words (starting with that letter) to their frequencies.
- Resolved the cleaning questions above by just going through the preliminary dictionaries and deciding case-by-case which changes to make. The only changes we made were: 
    - inhibited to inhibit
    - lemons to lemon
    - males to male
    - rhino to rhinoceros
    - t-rex to tyrannosaurus
    - wins to win
    - (I made these changes by directly modifying the CSV)
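
A rough sketch of the letter-to-word-to-frequency structure described above. This is not the actual `experimental_data_loader` code: the CSV layout (one row per trial, one column per letter, plus a `Name` column) and the function name `count_responses` are assumptions, and the sample is written for Python 3's `csv` module.

```python
import csv
import io
from collections import defaultdict

def count_responses(csv_file, letters):
    """Map each letter to a dict of {word: number of times it was produced}."""
    frequencies = {letter: defaultdict(int) for letter in letters}
    for row in csv.DictReader(csv_file):
        for letter in letters:
            # Normalize case and whitespace so "Apple" and "apple " match.
            word = (row.get(letter) or "").strip().upper()
            if word:
                frequencies[letter][word] += 1
    return frequencies

# Hypothetical sample: P1 completed two trials, P2 one.
sample = io.StringIO("Name,A,B\nP1,apple,book\nP1,apple,boy\nP2,Apple,book\n")
freqs = count_responses(sample, ["A", "B"])
```

Here `freqs["A"]` counts APPLE three times because both of P1's trials are included, which is the behavior of the first loader below.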


## Load and Process Experimental Data:

In [4]:
import experimental_data_loader

frequencies = experimental_data_loader.load_data("data/ExperimentalData.csv")
tested_letters = ["A","B",'C','D','E','F','G','H','I','J','L','M','N','O','P','R','S','T','U','V','W']

for letter in tested_letters:
    print(sorted(frequencies[letter].items()))
    print("\n")

[('A', 1), ('AARDVARK', 1), ('ABERCROMBIE', 1), ('ACE', 1), ('AGGREGATE', 1), ('AIR', 1), ('ALBEIT', 1), ('ALEXANDER', 1), ('ALLEHANDRO', 1), ('ALLIGATOR', 1), ('ALPACCA', 1), ('ALPHABET', 9), ('AM', 1), ('AN', 1), ('AND', 1), ('ANDROGYNOUS', 1), ('ANGLE', 1), ('ANIMAL', 3), ('ANSWER', 6), ('ANT', 1), ('ANY', 1), ('ANYA', 1), ('APOSTROPHE', 1), ('APPLE', 14), ('APPLESAUCE', 2), ('ASSHOLE', 1), ('ATHLETE', 1), ('ATTACK', 1), ('ATTEMPT', 1), ('ATTRIBUTE', 1), ('AWESOME', 1)]


[('BABY', 2), ('BACHELOR', 1), ('BACK', 4), ('BAD', 1), ('BARBER', 1), ('BARELY', 1), ('BASEBALL', 1), ('BASKETBALL', 2), ('BASTARD', 1), ('BAT', 1), ('BATMAN', 1), ('BEAR', 2), ('BEAUTIFUL', 1), ('BECAUSE', 1), ('BEE', 3), ('BEGIN', 2), ('BELL', 2), ('BEN', 1), ('BERRY', 1), ('BIG', 1), ('BIRTHDAY', 2), ('BLACK', 2), ('BLUE', 1), ('BOAT', 2), ('BOB', 1), ('BOG', 1), ('BONITIS', 1), ('BONY', 1), ('BOOK', 5), ('BOTANIST', 1), ('BOY', 4), ('BRAG', 1), ('BRAIN', 2), ('BRING', 2), ('BROWN', 2), ('BUCK', 1), ('BUILDING'

Tuesday 11/13 (continued):

- Wrote the code below to sort each letter's dictionary by the values (frequencies)
- Wrote the code below to remove words that were only produced once
- Next Step: If we have time, write code to remove results that were produced only twice and both by the same person
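
One possible way to approach the next step above is to count each participant at most once per word, so a word said twice by the same person ends up as a singleton and is filtered out. The sketch below assumes the responses for one letter are available as `(participant, word)` pairs, which is a hypothetical layout.

```python
from collections import defaultdict

def count_distinct_participants(responses):
    """responses: iterable of (participant, word) pairs for one letter."""
    seen = defaultdict(set)
    for participant, word in responses:
        seen[word].add(participant)  # a set ignores a participant's repeat
    return {word: len(people) for word, people in seen.items()}

responses = [("P1", "APPLE"), ("P1", "APPLE"), ("P2", "APPLE"),
             ("P3", "ANT"), ("P3", "ANT")]
counts = count_distinct_participants(responses)
# Drop words produced by only one participant.
kept = {word: c for word, c in counts.items() if c > 1}
```

In this toy data ANT was produced twice, but both times by P3, so it is removed along with ordinary singletons.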
    

In [5]:
ordered_frequencies = []
for letter in tested_letters:
    ordered_frequencies.append(sorted(frequencies[letter].items(), key=lambda x: x[1])[::-1])
    
for i in range(len(ordered_frequencies)):
    print("Most commonly produced words for " + tested_letters[i] + ":")
    for pair in ordered_frequencies[i]:
        print(pair[0]+ " ({0} times)".format(pair[1]))
    print("\n")

Most commonly produced words for A:
APPLE (14 times)
ALPHABET (9 times)
ANSWER (6 times)
ANIMAL (3 times)
APPLESAUCE (2 times)
ALEXANDER (1 times)
ANY (1 times)
APOSTROPHE (1 times)
ATTACK (1 times)
ANT (1 times)
AWESOME (1 times)
ATTRIBUTE (1 times)
ALPACCA (1 times)
AND (1 times)
ABERCROMBIE (1 times)
AGGREGATE (1 times)
AN (1 times)
ANDROGYNOUS (1 times)
ATHLETE (1 times)
AM (1 times)
A (1 times)
ANYA (1 times)
AARDVARK (1 times)
ALBEIT (1 times)
ALLEHANDRO (1 times)
ASSHOLE (1 times)
ANGLE (1 times)
AIR (1 times)
ALLIGATOR (1 times)
ACE (1 times)
ATTEMPT (1 times)


Most commonly produced words for B:
BOOK (5 times)
BACK (4 times)
BOY (4 times)
BEE (3 times)
BRING (2 times)
BABY (2 times)
BEAR (2 times)
BLACK (2 times)
BRAIN (2 times)
BASKETBALL (2 times)
BEGIN (2 times)
BROWN (2 times)
BOAT (2 times)
BIRTHDAY (2 times)
BELL (2 times)
BASEBALL (1 times)
BRAG (1 times)
BECAUSE (1 times)
BUCK (1 times)
BUILDING (1 times)
BOTANIST (1 times)
BERRY (1 times)
BACHELOR (1 times)
BONY (1 t

In [6]:
ordered_frequencies_no_singles = []
#for letter_list in ordered_frequencies: 
#    ordered_frequencies_no_singles.append(item for item in letter_list if item[1] != 1)
for letter_list in ordered_frequencies:
    to_append = []
    for item in letter_list:
        if item[1] != 1:
            to_append.append(item)
    ordered_frequencies_no_singles.append(to_append)
    
    
for i in range(len(ordered_frequencies_no_singles)):
    print("Most commonly produced words for " + tested_letters[i] + " (singletons removed):")
    for pair in ordered_frequencies_no_singles[i]:
        print(pair[0]+ " ({0} times)".format(pair[1]))
    print("\n")

Most commonly produced words for A (singletons removed):
APPLE (14 times)
ALPHABET (9 times)
ANSWER (6 times)
ANIMAL (3 times)
APPLESAUCE (2 times)


Most commonly produced words for B (singletons removed):
BOOK (5 times)
BACK (4 times)
BOY (4 times)
BEE (3 times)
BRING (2 times)
BABY (2 times)
BEAR (2 times)
BLACK (2 times)
BRAIN (2 times)
BASKETBALL (2 times)
BEGIN (2 times)
BROWN (2 times)
BOAT (2 times)
BIRTHDAY (2 times)
BELL (2 times)


Most commonly produced words for C (singletons removed):
CAT (15 times)
CUP (3 times)
CAR (3 times)
CLAY (2 times)
CLOSE (2 times)
COLIN (2 times)
COMPLEX (2 times)
CONTROLLER (2 times)


Most commonly produced words for D (singletons removed):
DOG (16 times)
DICK (5 times)
DAD (3 times)
DINOSAUR (2 times)
DANDELION (2 times)
DRAIN (2 times)


Most commonly produced words for E (singletons removed):
ELEPHANT (17 times)
EAT (7 times)
EGO (2 times)
EXPERIENCE (2 times)
ENTER (2 times)
ELEMENT (2 times)
EAR (2 times)
EDWARD (2 times)
EXPERIMENT (2 ti

Tuesday 11/13 (continued):

- Wrote a new method to load the data but discount the second time a subject says the same word. (the previous Possible Next Step)
- Re-ran the two follow-up steps from above on the new dictionary. Code Below:

In [7]:
import experimental_data_loader

frequencies_no_repeats = experimental_data_loader.load_data_no_repeats("data/ExperimentalData.csv")
for letter in tested_letters:
    print(sorted(frequencies_no_repeats[letter].items()))
    print("\n")
    

[('A', 1), ('AARDVARK', 1), ('ABERCROMBIE', 1), ('ACE', 1), ('AGGREGATE', 1), ('AIR', 1), ('ALBEIT', 1), ('ALEXANDER', 1), ('ALLEHANDRO', 1), ('ALLIGATOR', 1), ('ALPACCA', 1), ('ALPHABET', 7), ('AM', 1), ('AN', 1), ('AND', 1), ('ANDROGYNOUS', 1), ('ANGLE', 1), ('ANIMAL', 3), ('ANSWER', 5), ('ANT', 1), ('ANY', 1), ('ANYA', 1), ('APOSTROPHE', 1), ('APPLE', 11), ('APPLESAUCE', 2), ('ASSHOLE', 1), ('ATHLETE', 1), ('ATTACK', 1), ('ATTEMPT', 1), ('ATTRIBUTE', 1), ('AWESOME', 1)]


[('BABY', 2), ('BACHELOR', 1), ('BACK', 3), ('BAD', 1), ('BARBER', 1), ('BARELY', 1), ('BASEBALL', 1), ('BASKETBALL', 2), ('BASTARD', 1), ('BAT', 1), ('BATMAN', 1), ('BEAR', 1), ('BEAUTIFUL', 1), ('BECAUSE', 1), ('BEE', 3), ('BEGIN', 1), ('BELL', 2), ('BEN', 1), ('BERRY', 1), ('BIG', 1), ('BIRTHDAY', 1), ('BLACK', 2), ('BLUE', 1), ('BOAT', 1), ('BOB', 1), ('BOG', 1), ('BONITIS', 1), ('BONY', 1), ('BOOK', 4), ('BOTANIST', 1), ('BOY', 3), ('BRAG', 1), ('BRAIN', 2), ('BRING', 1), ('BROWN', 2), ('BUCK', 1), ('BUILDING'

In [8]:
ordered_frequencies_no_repeats = []
for letter in tested_letters: 
    ordered_frequencies_no_repeats.append(sorted(frequencies_no_repeats[letter].items(), key=lambda x: x[1])[::-1])


for i in range(len(ordered_frequencies_no_repeats)):
    print("Most commonly produced words for " + tested_letters[i] + ":")
    for pair in ordered_frequencies_no_repeats[i]:
        print(pair[0]+ " ({0} times)".format(pair[1]))
    print("\n")

Most commonly produced words for A:
APPLE (11 times)
ALPHABET (7 times)
ANSWER (5 times)
ANIMAL (3 times)
APPLESAUCE (2 times)
ALEXANDER (1 times)
ANY (1 times)
APOSTROPHE (1 times)
ATTACK (1 times)
ANT (1 times)
AWESOME (1 times)
ATTRIBUTE (1 times)
ALPACCA (1 times)
AND (1 times)
ABERCROMBIE (1 times)
AGGREGATE (1 times)
AN (1 times)
ANDROGYNOUS (1 times)
ATHLETE (1 times)
AM (1 times)
A (1 times)
ANYA (1 times)
AARDVARK (1 times)
ALBEIT (1 times)
ALLEHANDRO (1 times)
ASSHOLE (1 times)
ANGLE (1 times)
AIR (1 times)
ALLIGATOR (1 times)
ACE (1 times)
ATTEMPT (1 times)


Most commonly produced words for B:
BOOK (4 times)
BEE (3 times)
BACK (3 times)
BOY (3 times)
BABY (2 times)
BLACK (2 times)
BRAIN (2 times)
BASKETBALL (2 times)
BROWN (2 times)
BELL (2 times)
BASEBALL (1 times)
BRAG (1 times)
BRING (1 times)
BECAUSE (1 times)
BUCK (1 times)
BUILDING (1 times)
BOTANIST (1 times)
BERRY (1 times)
BACHELOR (1 times)
BEAR (1 times)
BONY (1 times)
BEAUTIFUL (1 times)
BLUE (1 times)
BATMAN (1

WARM (1 times)
WEDNESDAY (1 times)
WYOMING (1 times)
WAYLAID (1 times)
WHOA (1 times)
WALRUS (1 times)
WONDERWALL (1 times)
WENT (1 times)
WEATHER (1 times)
WHEN (1 times)
WORK (1 times)
WATER BOTTLE (1 times)
WORD (1 times)
WILL (1 times)
WONDER (1 times)
WALK (1 times)
WET (1 times)
WHETHER (1 times)
WE (1 times)




In [9]:
ordered_frequencies_no_repeats_no_singles = []
for letter_list in ordered_frequencies_no_repeats:
    to_append = []
    for item in letter_list:
        if item[1] != 1:
            to_append.append(item)
    ordered_frequencies_no_repeats_no_singles.append(to_append)
    
for i in range(len(ordered_frequencies_no_repeats_no_singles)):
    print("Most commonly produced words for " + tested_letters[i] + " (singletons removed):")
    for pair in ordered_frequencies_no_repeats_no_singles[i]:
        print(pair[0]+ " ({0} times)".format(pair[1]))
    print("\n")
    
print(type(ordered_frequencies_no_repeats_no_singles))
print(type(ordered_frequencies_no_repeats_no_singles[0]))


Most commonly produced words for A (singletons removed):
APPLE (11 times)
ALPHABET (7 times)
ANSWER (5 times)
ANIMAL (3 times)
APPLESAUCE (2 times)


Most commonly produced words for B (singletons removed):
BOOK (4 times)
BEE (3 times)
BACK (3 times)
BOY (3 times)
BABY (2 times)
BLACK (2 times)
BRAIN (2 times)
BASKETBALL (2 times)
BROWN (2 times)
BELL (2 times)


Most commonly produced words for C (singletons removed):
CAT (11 times)
CUP (3 times)
CAR (2 times)


Most commonly produced words for D (singletons removed):
DOG (11 times)
DICK (5 times)
DINOSAUR (2 times)
DANDELION (2 times)
DRAIN (2 times)
DAD (2 times)


Most commonly produced words for E (singletons removed):
ELEPHANT (12 times)
EAT (5 times)
EXPERIENCE (2 times)
EAR (2 times)


Most commonly produced words for F (singletons removed):
FUCK (8 times)
FRIEND (3 times)
FROG (2 times)
FIVE (2 times)
FLUENCY (2 times)
FAT (2 times)
FLOWER (2 times)


Most commonly produced words for G (singletons removed):
GOD (5 times)
GREAT

In [10]:
with open('top_words.pickle', 'wb') as f:
    pickle.dump(ordered_frequencies_no_singles, f)
    
with open('top_words_no_repeats.pickle', 'wb') as f:
    pickle.dump(ordered_frequencies_no_repeats_no_singles, f)
    

Tuesday 11/13 (continued):

- Next Steps: If time, get the pickles above to work
- Create code to evaluate the hits and pagerank models against the most popular responses above

- Wrote the code below to load in the data from the rankings
- Wrote the code below that, creating dicts for each model that map letters to lists. Each list has the words starting with that letter, ordered by importance according to the model. This will be very useful for evaluating the models. 
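
As an illustration of the letter-to-ordered-list dicts described above, here is a minimal sketch. It assumes a 1-d score array aligned with a list of words; `words_by_letter` and the toy scores are made up for the example.

```python
import numpy as np

def words_by_letter(scores, words):
    """Map each initial letter to its words, most important first."""
    order = np.argsort(scores)[::-1]  # indices from highest to lowest score
    by_letter = {}
    for i in order:
        word = words[i]
        by_letter.setdefault(word[0], []).append(word)
    return by_letter

words = ["APPLE", "ANT", "BOOK", "BEE"]
scores = np.array([0.4, 0.1, 0.2, 0.3])
ranked = words_by_letter(scores, words)
```

Because the words are visited in global score order, each per-letter list comes out already sorted by importance.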

## Model Evaluation and Visualization:

In [33]:
# Load in the data from the experiment we ran

with open('lib/top_words.pickle', 'rb') as f:
    top_words = pickle.load(f)
    
with open('lib/top_words_no_repeats.pickle', 'rb') as f:
    top_words_no_repeats = pickle.load(f)

[('APPLE', 11), ('ALPHABET', 7), ('ANSWER', 5), ('ANIMAL', 3), ('APPLESAUCE', 2)]


In [29]:
#Load in data from models

# Open each pickle in binary mode; the with-blocks close the files afterwards
with open("lib/pageRankRankings.pickle", 'rb') as f:
    page_rank_rankings = pickle.load(f)

with open("lib/hitsAuthRankings.pickle", 'rb') as f:
    hits_auth_rankings = pickle.load(f)

with open("lib/hitsHubRankings.pickle", 'rb') as f:
    hits_hub_rankings = pickle.load(f)

with open("lib/normedItems.pickle", 'rb') as f:
    normed_list = pickle.load(f)

with open("lib/unnormedItems.pickle", 'rb') as f:
    unnormed_list = pickle.load(f)

all_indices = np.append(normed_list, unnormed_list)

# each of the np (1-d) ndarrays below contains lists of indices in order of importance according to the relevant model
# the values stored in the array correspond to indices in all_indices
page_rank_ordered = np.argsort(page_rank_rankings)[::-1]
hits_auth_ordered = np.argsort(hits_auth_rankings)[::-1]
hits_hub_ordered = np.argsort(hits_hub_rankings)[::-1]

[u'A' u'AARDVARK' u'ABDOMEN' ... u'ZODIAC' u'ZOMBIE' u'ZOOLOGY']


Tuesday 11/13 (continued):

Next Step:

For each model and each letter, we have a list of the words starting with that letter, ordered by importance according to the model. Additionally, we have the experimental data in a usable format: for each letter we used as a cue, we have a list of the words that were produced by multiple participants, ordered by frequency. However, this data can be printed but not otherwise accessed, for reasons (relating to generators, I think) that I don't understand. Assuming the data is accessible, we simply need to loop through the words in the experimental production data and find the importance ranking that each model gave to each word (normalized by the total number of words starting with that letter). The aggregate of these scores is how Griffiths et al. evaluated their model. 
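
A small sketch of the per-word normalized ranking described above. The exact normalization and aggregation used by Griffiths et al. are not reproduced here; dividing the 1-based rank by the number of words for that letter is an assumption, and `normalized_rank` is a hypothetical helper.

```python
def normalized_rank(word, ranked_words):
    """Rank of `word` within its letter's list, scaled to (0, 1]; None if absent."""
    if word not in ranked_words:
        return None  # the word is not in the semantic network
    return (ranked_words.index(word) + 1) / float(len(ranked_words))

# Toy model ranking for the letter A, most important first.
ranked_a = ["ANIMAL", "APPLE", "ANSWER", "ALPHABET"]
scores = [normalized_rank(w, ranked_a) for w in ["APPLE", "APPLESAUCE"]]
```

Lower values mean the model placed the word nearer the top of its list for that letter.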

Wednesday 11/14:

- Edited the code above to turn all the generators into regular lists by just ripping out the code and rewriting it with vanilla for loops. I'm still not really sure why those were generators and not just list comprehensions, but everything seems to be working now. In particular, we have two lists of lists (top_words and top_words_no_repeats). Each has one inner list for each letter we ran the experiment on. The inner lists contain tuples of the words the participants produced and their frequencies, ordered by frequency. Neither list contains words that only showed up once. The second list only counts a word once if a single participant produced it twice. 
- Now that the lists are working, it was trivial to get the pickle code to work as well. 
- Wrote the code below. For each word the participants in the experiment produced multiple times, it prints out the ranking that each of the 3 measures gave to that word (or N/A if it wasn't in the semantic network)
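
A likely explanation for the generator confusion above: an expression such as `item for item in letter_list if item[1] != 1` passed directly to `append(...)` is a generator expression, not a list comprehension, because it has no square brackets. The short example below, using made-up data, shows the difference and why pickling failed.

```python
import pickle

pairs = [("APPLE", 14), ("ALPHABET", 9), ("ACE", 1)]

as_generator = (item for item in pairs if item[1] != 1)  # parentheses: lazy generator
as_list = [item for item in pairs if item[1] != 1]       # brackets: a real list

try:
    pickle.dumps(as_generator)
    generator_pickled = True
except TypeError:
    generator_pickled = False  # generator objects cannot be pickled

# A list of tuples round-trips through pickle without trouble.
restored = pickle.loads(pickle.dumps(as_list))
```

Wrapping the expression in square brackets (or in `list(...)`) would have been enough to get indexable, picklable lists.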

In [46]:
# Convert the arrays to plain Python lists so .index() can be used for lookups
page_rank_ordered_list = page_rank_ordered.tolist()
hits_auth_ordered_list = hits_auth_ordered.tolist()
hits_hub_ordered_list = hits_hub_ordered.tolist()
indices_list = all_indices.tolist()

results = []
for letter_index in range(len(tested_letters)):
    letter_results = []
    letter = tested_letters[letter_index]
    target_words = top_words_no_repeats[letter_index]
    
    for target, freq in target_words:
        target_results = [target]
        if target in indices_list:
            target_index = indices_list.index(target)
            target_results.append(page_rank_ordered_list.index(target_index))
            target_results.append(hits_auth_ordered_list.index(target_index))
            target_results.append(hits_hub_ordered_list.index(target_index))
        else: 
            target_results += (["N/A", "N/A", "N/A"])
        letter_results.append(target_results)
    results.append(letter_results)

for letter_result in results:
    print("\nTarget Word, PageRank, AuthScore, HubScore")
    for word_result in letter_result:
        print(word_result)
    


Target Word, PageRank, AuthScore, HubScore
['APPLE', 188, 274, 6991]
['ALPHABET', 1508, 2466, 9419]
['ANSWER', 264, 192, 9817]
['ANIMAL', 23, 20, 5621]
['APPLESAUCE', 'N/A', 'N/A', 'N/A']

Target Word, PageRank, AuthScore, HubScore
['BOOK', 41, 19, 7704]
['BEE', 921, 1098, 10281]
['BACK', 205, 256, 8844]
['BOY', 32, 209, 9446]
['BABY', 97, 76, 6465]
['BLACK', 25, 26, 9448]
['BRAIN', 307, 329, 7078]
['BASKETBALL', 484, 414, 8631]
['BROWN', 423, 387, 6221]
['BELL', 509, 1024, 9265]

Target Word, PageRank, AuthScore, HubScore
['CAT', 43, 150, 10591]
['CUP', 399, 402, 9486]
['CAR', 12, 5, 5713]

Target Word, PageRank, AuthScore, HubScore
['DOG', 3, 22, 10448]
['DICK', 3101, 5041, 5575]
['DINOSAUR', 3975, 2329, 6411]
['DANDELION', 5753, 4684, 10313]
['DRAIN', 2071, 1724, 6524]
['DAD', 459, 721, 10485]

Target Word, PageRank, AuthScore, HubScore
['ELEPHANT', 507, 1173, 6450]
['EAT', 36, 34, 9251]
['EXPERIENCE', 3798, 1977, 6517]
['EAR', 420, 847, 8196]

Target Word, PageRank, AuthScore, Hub

In [66]:
percentile_results = []
count_by_letter = [0 for i in range(len(tested_letters))]
for word in np.nditer(all_indices):
    print(word)
    print(type(word))
    #index = tested_letters.index(word[0])
    #count_by_letter[index] += 1
print(count_by_letter)

'''
for letter_index in range(len(tested_letters)):
    letter_results = []
    letter = tested_letters[letter_index]
    target_words = top_words_no_repeats[letter_index]
    
    for target, freq in target_words:
        target_results = [target]
        if target in indices_list:
            target_index = indices_list.index(target)
            target_results.append(page_rank_ordered_list.index(target_index))
            target_results.append(hits_auth_ordered_list.index(target_index))
            target_results.append(hits_hub_ordered_list.index(target_index))
        else: 
            target_results += (["N/A", "N/A", "N/A"])
        letter_results.append(target_results)
    results.append(letter_results)

for letter_result in results:
    print("\nTarget Word, PageRank, AuthScore, HubScore")
    for word_result in letter_result:
        print(word_result)
'''

A
<type 'numpy.ndarray'>
AARDVARK
<type 'numpy.ndarray'>
ABDOMEN
<type 'numpy.ndarray'>
ABDUCT
<type 'numpy.ndarray'>
ABILITY
<type 'numpy.ndarray'>
ABLE
<type 'numpy.ndarray'>
ABNORMAL
<type 'numpy.ndarray'>
ABOVE
<type 'numpy.ndarray'>
ABSENCE
<type 'numpy.ndarray'>
ABSENT
<type 'numpy.ndarray'>
ABSTRACT
<type 'numpy.ndarray'>
ABSURD
<type 'numpy.ndarray'>
ABUNDANCE
<type 'numpy.ndarray'>
ABUSE
<type 'numpy.ndarray'>
ACCELERATE
<type 'numpy.ndarray'>
ACCEPT
<type 'numpy.ndarray'>
ACCIDENT
<type 'numpy.ndarray'>
ACCOMPLISH
<type 'numpy.ndarray'>
ACCOMPLISHED
<type 'numpy.ndarray'>
ACCOUNT
<type 'numpy.ndarray'>
ACCUMULATE
<type 'numpy.ndarray'>
ACCURATE
<type 'numpy.ndarray'>
ACCUSE
<type 'numpy.ndarray'>
ACE
<type 'numpy.ndarray'>
ACHE
<type 'numpy.ndarray'>
ACHIEVE
<type 'numpy.ndarray'>
ACHIEVEMENT
<type 'numpy.ndarray'>
ACID
<type 'numpy.ndarray'>
ACKNOWLEDGE
<type 'numpy.ndarray'>
ACORN
<type 'numpy.ndarray'>
ACQUIRE
<type 'numpy.ndarray'>
ACRE
<type 'numpy.ndarray'>
ACROBAT
<typ

<type 'numpy.ndarray'>
CHRISTIAN
<type 'numpy.ndarray'>
CHRISTMAS
<type 'numpy.ndarray'>
CHROMOSOMES
<type 'numpy.ndarray'>
CHUCK
<type 'numpy.ndarray'>
CHUNK
<type 'numpy.ndarray'>
CHURCH
<type 'numpy.ndarray'>
CIGAR
<type 'numpy.ndarray'>
CIGARETTE
<type 'numpy.ndarray'>
CINEMA
<type 'numpy.ndarray'>
CINNAMON
<type 'numpy.ndarray'>
CIRCLE
<type 'numpy.ndarray'>
CIRCUMSTANCE
<type 'numpy.ndarray'>
CIRCUS
<type 'numpy.ndarray'>
CITIZEN
<type 'numpy.ndarray'>
CITRUS
<type 'numpy.ndarray'>
CITY
<type 'numpy.ndarray'>
CLAIM
<type 'numpy.ndarray'>
CLAIMS
<type 'numpy.ndarray'>
CLAM
<type 'numpy.ndarray'>
CLAMP
<type 'numpy.ndarray'>
CLARIFY
<type 'numpy.ndarray'>
CLARINET
<type 'numpy.ndarray'>
CLASS
<type 'numpy.ndarray'>
CLAW
<type 'numpy.ndarray'>
CLAY
<type 'numpy.ndarray'>
CLEAN
<type 'numpy.ndarray'>
CLEANER
<type 'numpy.ndarray'>
CLEANING
<type 'numpy.ndarray'>
CLEAR
<type 'numpy.ndarray'>
CLENCH
<type 'numpy.ndarray'>
CLERK
<type 'numpy.ndarray'>
CLEVER
<type 'numpy.ndarray'>
CLICK

DETAIL
<type 'numpy.ndarray'>
DETECTIVE
<type 'numpy.ndarray'>
DETERGENT
<type 'numpy.ndarray'>
DETERIORATE
<type 'numpy.ndarray'>
DETERMINE
<type 'numpy.ndarray'>
DEVELOP
<type 'numpy.ndarray'>
DEVELOPMENT
<type 'numpy.ndarray'>
DEVICE
<type 'numpy.ndarray'>
DEVIL
<type 'numpy.ndarray'>
DEW
<type 'numpy.ndarray'>
DIAGRAM
<type 'numpy.ndarray'>
DIAL
<type 'numpy.ndarray'>
DIAMETER
<type 'numpy.ndarray'>
DIAMOND
<type 'numpy.ndarray'>
DIAPER
<type 'numpy.ndarray'>
DIARY
<type 'numpy.ndarray'>
DICE
<type 'numpy.ndarray'>
DICTATOR
<type 'numpy.ndarray'>
DICTIONARY
<type 'numpy.ndarray'>
DIE
<type 'numpy.ndarray'>
DIET
<type 'numpy.ndarray'>
DIFFER
<type 'numpy.ndarray'>
DIFFERENCE
<type 'numpy.ndarray'>
DIFFERENT
<type 'numpy.ndarray'>
DIFFICULT
<type 'numpy.ndarray'>
DIFFICULTY
<type 'numpy.ndarray'>
DIG
<type 'numpy.ndarray'>
DIGEST
<type 'numpy.ndarray'>
DIGESTION
<type 'numpy.ndarray'>
DIGGER
<type 'numpy.ndarray'>
DIGIT
<type 'numpy.ndarray'>
DIGNITY
<type 'numpy.ndarray'>
DILEMMA
<t

<type 'numpy.ndarray'>
FUZZ
<type 'numpy.ndarray'>
FUZZY
<type 'numpy.ndarray'>
GAG
<type 'numpy.ndarray'>
GAIN
<type 'numpy.ndarray'>
GAL
<type 'numpy.ndarray'>
GALAXY
<type 'numpy.ndarray'>
GALLON
<type 'numpy.ndarray'>
GALLOP
<type 'numpy.ndarray'>
GALOSHES
<type 'numpy.ndarray'>
GAMBLE
<type 'numpy.ndarray'>
GAME
<type 'numpy.ndarray'>
GAMES
<type 'numpy.ndarray'>
GANDER
<type 'numpy.ndarray'>
GANG
<type 'numpy.ndarray'>
GANGSTER
<type 'numpy.ndarray'>
GARAGE
<type 'numpy.ndarray'>
GARBAGE
<type 'numpy.ndarray'>
GARDEN
<type 'numpy.ndarray'>
GARLIC
<type 'numpy.ndarray'>
GAS
<type 'numpy.ndarray'>
GATE
<type 'numpy.ndarray'>
GATHER
<type 'numpy.ndarray'>
GATHERING
<type 'numpy.ndarray'>
GAUGE
<type 'numpy.ndarray'>
GAUZE
<type 'numpy.ndarray'>
GAVEL
<type 'numpy.ndarray'>
GAZELLE
<type 'numpy.ndarray'>
GEAR
<type 'numpy.ndarray'>
GEESE
<type 'numpy.ndarray'>
GEM
<type 'numpy.ndarray'>
GENDER
<type 'numpy.ndarray'>
GENE
<type 'numpy.ndarray'>
GENERAL
<type 'numpy.ndarray'>
GENEROUS


<type 'numpy.ndarray'>
LOSE
<type 'numpy.ndarray'>
LOSER
<type 'numpy.ndarray'>
LOSS
<type 'numpy.ndarray'>
LOST
<type 'numpy.ndarray'>
LOT
<type 'numpy.ndarray'>
LOTS
<type 'numpy.ndarray'>
LOTTERY
<type 'numpy.ndarray'>
LOUD
<type 'numpy.ndarray'>
LOUNGE
<type 'numpy.ndarray'>
LOVE
<type 'numpy.ndarray'>
LOVER
<type 'numpy.ndarray'>
LOVERS
<type 'numpy.ndarray'>
LOVING
<type 'numpy.ndarray'>
LOW
<type 'numpy.ndarray'>
LOWER
<type 'numpy.ndarray'>
LOYAL
<type 'numpy.ndarray'>
LOYALTY
<type 'numpy.ndarray'>
LUBRICATE
<type 'numpy.ndarray'>
LUCK
<type 'numpy.ndarray'>
LUGGAGE
<type 'numpy.ndarray'>
LUMBER
<type 'numpy.ndarray'>
LUMP
<type 'numpy.ndarray'>
LUNCH
<type 'numpy.ndarray'>
LUNG
<type 'numpy.ndarray'>
LUST
<type 'numpy.ndarray'>
LUXURY
<type 'numpy.ndarray'>
MACARONI
<type 'numpy.ndarray'>
MACHINE
<type 'numpy.ndarray'>
MAD
<type 'numpy.ndarray'>
MADE
<type 'numpy.ndarray'>
MAFIA
<type 'numpy.ndarray'>
MAGAZINE
<type 'numpy.ndarray'>
MAGGOT
<type 'numpy.ndarray'>
MAGIC
<type '

PORT
<type 'numpy.ndarray'>
PORTION
<type 'numpy.ndarray'>
PORTRAIT
<type 'numpy.ndarray'>
PORTRAY
<type 'numpy.ndarray'>
POSITION
<type 'numpy.ndarray'>
POSITIVE
<type 'numpy.ndarray'>
POSSESS
<type 'numpy.ndarray'>
POSSESSION
<type 'numpy.ndarray'>
POSSIBILITY
<type 'numpy.ndarray'>
POSSIBLE
<type 'numpy.ndarray'>
POSSUM
<type 'numpy.ndarray'>
POST
<type 'numpy.ndarray'>
POSTAGE
<type 'numpy.ndarray'>
POT
<type 'numpy.ndarray'>
POTATO
<type 'numpy.ndarray'>
POTATOES
<type 'numpy.ndarray'>
POTENTIAL
<type 'numpy.ndarray'>
POTTERY
<type 'numpy.ndarray'>
POUCH
<type 'numpy.ndarray'>
POUNCE
<type 'numpy.ndarray'>
POUND
<type 'numpy.ndarray'>
POUR
<type 'numpy.ndarray'>
POVERTY
<type 'numpy.ndarray'>
POWDER
<type 'numpy.ndarray'>
POWER
<type 'numpy.ndarray'>
POWERFUL
<type 'numpy.ndarray'>
PRACTICE
<type 'numpy.ndarray'>
PRAIRIE
<type 'numpy.ndarray'>
PRAISE
<type 'numpy.ndarray'>
PRANK
<type 'numpy.ndarray'>
PRAY
<type 'numpy.ndarray'>
PRAYER
<type 'numpy.ndarray'>
PREACHER
<type 'numpy.

SLY
<type 'numpy.ndarray'>
SMALL
<type 'numpy.ndarray'>
SMART
<type 'numpy.ndarray'>
SMASH
<type 'numpy.ndarray'>
SMEAR
<type 'numpy.ndarray'>
SMELL
<type 'numpy.ndarray'>
SMELT
<type 'numpy.ndarray'>
SMILE
<type 'numpy.ndarray'>
SMOG
<type 'numpy.ndarray'>
SMOKE
<type 'numpy.ndarray'>
SMOKING
<type 'numpy.ndarray'>
SMOKY
<type 'numpy.ndarray'>
SMOOTH
<type 'numpy.ndarray'>
SMOTHER
<type 'numpy.ndarray'>
SMUDGE
<type 'numpy.ndarray'>
SNACK
<type 'numpy.ndarray'>
SNAIL
<type 'numpy.ndarray'>
SNAKE
<type 'numpy.ndarray'>
SNAP
<type 'numpy.ndarray'>
SNATCH
<type 'numpy.ndarray'>
SNEAK
<type 'numpy.ndarray'>
SNEAKER
<type 'numpy.ndarray'>
SNEAKERS
<type 'numpy.ndarray'>
SNEAKY
<type 'numpy.ndarray'>
SNEEZE
<type 'numpy.ndarray'>
SNIFF
<type 'numpy.ndarray'>
SNOB
<type 'numpy.ndarray'>
SNOOZE
<type 'numpy.ndarray'>
SNORE
<type 'numpy.ndarray'>
SNORKEL
<type 'numpy.ndarray'>
SNOT
<type 'numpy.ndarray'>
SNOTTY
<type 'numpy.ndarray'>
SNOW
<type 'numpy.ndarray'>
SNUGGLE
<type 'numpy.ndarray'>
S

<type 'numpy.ndarray'>
TOW
<type 'numpy.ndarray'>
TOWEL
<type 'numpy.ndarray'>
TOWER
<type 'numpy.ndarray'>
TOWN
<type 'numpy.ndarray'>
TOY
<type 'numpy.ndarray'>
TOYS
<type 'numpy.ndarray'>
TRACE
<type 'numpy.ndarray'>
TRACK
<type 'numpy.ndarray'>
TRACTOR
<type 'numpy.ndarray'>
TRADE
<type 'numpy.ndarray'>
TRADITION
<type 'numpy.ndarray'>
TRAFFIC
<type 'numpy.ndarray'>
TRAGEDY
<type 'numpy.ndarray'>
TRAIL
<type 'numpy.ndarray'>
TRAILER
<type 'numpy.ndarray'>
TRAIN
<type 'numpy.ndarray'>
TRAIT
<type 'numpy.ndarray'>
TRAITOR
<type 'numpy.ndarray'>
TRAMP
<type 'numpy.ndarray'>
TRANCE
<type 'numpy.ndarray'>
TRANQUIL
<type 'numpy.ndarray'>
TRANSPARENT
<type 'numpy.ndarray'>
TRANSPLANT
<type 'numpy.ndarray'>
TRANSPORTATION
<type 'numpy.ndarray'>
TRAP
<type 'numpy.ndarray'>
TRASH
<type 'numpy.ndarray'>
TRAUMA
<type 'numpy.ndarray'>
TRAVEL
<type 'numpy.ndarray'>
TRAY
<type 'numpy.ndarray'>
TREAD
<type 'numpy.ndarray'>
TREASON
<type 'numpy.ndarray'>
TREASURE
<type 'numpy.ndarray'>
TREAT
<type 

<type 'numpy.ndarray'>
ACADEMY
<type 'numpy.ndarray'>
ACCENT
<type 'numpy.ndarray'>
ACCENTUATE
<type 'numpy.ndarray'>
ACCEPTABLE
<type 'numpy.ndarray'>
ACCEPTANCE
<type 'numpy.ndarray'>
ACCEPTED
<type 'numpy.ndarray'>
ACCESS
<type 'numpy.ndarray'>
ACCIDENTS
<type 'numpy.ndarray'>
ACCOMMODATION
<type 'numpy.ndarray'>
ACCOMPANY
<type 'numpy.ndarray'>
ACCOMPLICE
<type 'numpy.ndarray'>
ACCOMPLISHMENT
<type 'numpy.ndarray'>
ACCOUNTANT
<type 'numpy.ndarray'>
ACCOUNTING
<type 'numpy.ndarray'>
ACCUSED
<type 'numpy.ndarray'>
ACES
<type 'numpy.ndarray'>
ACHILLES
<type 'numpy.ndarray'>
ACHOO
<type 'numpy.ndarray'>
ACNE
<type 'numpy.ndarray'>
ACQUAINT
<type 'numpy.ndarray'>
ACQUAINTANCE
<type 'numpy.ndarray'>
ACROBATICS
<type 'numpy.ndarray'>
ACROBATS
<type 'numpy.ndarray'>
ACROSS
<type 'numpy.ndarray'>
ACTING
<type 'numpy.ndarray'>
ACTIVATE
<type 'numpy.ndarray'>
ACTS
<type 'numpy.ndarray'>
ACTUAL
<type 'numpy.ndarray'>
ACUTE
<type 'numpy.ndarray'>
ADAM
<type 'numpy.ndarray'>
ADAMS
<type 'numpy.n

<type 'numpy.ndarray'>
BREVARD
<type 'numpy.ndarray'>
BREW
<type 'numpy.ndarray'>
BRIAR
<type 'numpy.ndarray'>
BRICKLAYER
<type 'numpy.ndarray'>
BRICKS
<type 'numpy.ndarray'>
BRIEFING
<type 'numpy.ndarray'>
BRIEFS
<type 'numpy.ndarray'>
BRIGHTENS
<type 'numpy.ndarray'>
BRIGHTER
<type 'numpy.ndarray'>
BRIGHTNESS
<type 'numpy.ndarray'>
BRIM
<type 'numpy.ndarray'>
BRING BACK
<type 'numpy.ndarray'>
BRIQUETTES
<type 'numpy.ndarray'>
BRITAIN
<type 'numpy.ndarray'>
BRITISH
<type 'numpy.ndarray'>
BRITTANICA
<type 'numpy.ndarray'>
BROACH
<type 'numpy.ndarray'>
BROADWAY
<type 'numpy.ndarray'>
BROILED
<type 'numpy.ndarray'>
BROKER
<type 'numpy.ndarray'>
BRONCO
<type 'numpy.ndarray'>
BRONTOSAURUS
<type 'numpy.ndarray'>
BRONX
<type 'numpy.ndarray'>
BROOD
<type 'numpy.ndarray'>
BROTHER'S
<type 'numpy.ndarray'>
BROTHERHOOD
<type 'numpy.ndarray'>
BROTHERS
<type 'numpy.ndarray'>
BROUGHT
<type 'numpy.ndarray'>
BROW
<type 'numpy.ndarray'>
BROWNIE
<type 'numpy.ndarray'>
BRUISES
<type 'numpy.ndarray'>
BRUN

COMPROMISE
<type 'numpy.ndarray'>
COMPULSIVE
<type 'numpy.ndarray'>
COMRADE
<type 'numpy.ndarray'>
CON ARTIST
<type 'numpy.ndarray'>
CON CARNE
<type 'numpy.ndarray'>
CONCEAL
<type 'numpy.ndarray'>
CONCEDE
<type 'numpy.ndarray'>
CONCEPTION
<type 'numpy.ndarray'>
CONCH
<type 'numpy.ndarray'>
CONCISE
<type 'numpy.ndarray'>
CONCOCTION
<type 'numpy.ndarray'>
CONCORD
<type 'numpy.ndarray'>
CONDENSE
<type 'numpy.ndarray'>
CONDIMENT
<type 'numpy.ndarray'>
CONDITIONED
<type 'numpy.ndarray'>
CONDITIONS
<type 'numpy.ndarray'>
CONDO
<type 'numpy.ndarray'>
CONDOLENCE
<type 'numpy.ndarray'>
CONDOMINIUM
<type 'numpy.ndarray'>
CONDONE
<type 'numpy.ndarray'>
CONDUCT
<type 'numpy.ndarray'>
CONES
<type 'numpy.ndarray'>
CONFEDERACY
<type 'numpy.ndarray'>
CONFEDERATE
<type 'numpy.ndarray'>
CONFIDENTIAL
<type 'numpy.ndarray'>
CONFINE
<type 'numpy.ndarray'>
CONFUSED
<type 'numpy.ndarray'>
CONFUSING
<type 'numpy.ndarray'>
CONGESTION
<type 'numpy.ndarray'>
CONGRATULATE
<type 'numpy.ndarray'>
CONGRESSMAN
<type 

<type 'numpy.ndarray'>
DISMAL
<type 'numpy.ndarray'>
DISMISSAL
<type 'numpy.ndarray'>
DISNEY
<type 'numpy.ndarray'>
DISNEY WORLD
<type 'numpy.ndarray'>
DISOBEY
<type 'numpy.ndarray'>
DISORDERLY
<type 'numpy.ndarray'>
DISORGANIZED
<type 'numpy.ndarray'>
DISORIENTED
<type 'numpy.ndarray'>
DISPEL
<type 'numpy.ndarray'>
DISPENSE
<type 'numpy.ndarray'>
DISPLEASE
<type 'numpy.ndarray'>
DISPOSAL
<type 'numpy.ndarray'>
DISPOSITION
<type 'numpy.ndarray'>
DISPUTE
<type 'numpy.ndarray'>
DISRESPECT
<type 'numpy.ndarray'>
DISROBE
<type 'numpy.ndarray'>
DISRUPT
<type 'numpy.ndarray'>
DISRUPTIVE
<type 'numpy.ndarray'>
DISSATISFACTION
<type 'numpy.ndarray'>
DISSECT
<type 'numpy.ndarray'>
DISSERTATION
<type 'numpy.ndarray'>
DISSUADE
<type 'numpy.ndarray'>
DISTASTEFUL
<type 'numpy.ndarray'>
DISTINCTION
<type 'numpy.ndarray'>
DISTINGUISH
<type 'numpy.ndarray'>
DISTINGUISHED
<type 'numpy.ndarray'>
DISTRACT
<type 'numpy.ndarray'>
DISTRAUGHT
<type 'numpy.ndarray'>
DISTRIBUTE
<type 'numpy.ndarray'>
DISTRIBUT

<type 'numpy.ndarray'>
G I JOE
<type 'numpy.ndarray'>
GAB
<type 'numpy.ndarray'>
GADGET
<type 'numpy.ndarray'>
GADGETS
<type 'numpy.ndarray'>
GAGGED
<type 'numpy.ndarray'>
GAINS
<type 'numpy.ndarray'>
GALA
<type 'numpy.ndarray'>
GALL
<type 'numpy.ndarray'>
GALLANT
<type 'numpy.ndarray'>
GALLERY
<type 'numpy.ndarray'>
GALLEY
<type 'numpy.ndarray'>
GALS
<type 'numpy.ndarray'>
GANGS
<type 'numpy.ndarray'>
GAP
<type 'numpy.ndarray'>
GARMENT
<type 'numpy.ndarray'>
GARNISH
<type 'numpy.ndarray'>
GARTER
<type 'numpy.ndarray'>
GASP
<type 'numpy.ndarray'>
GATORAID
<type 'numpy.ndarray'>
GATORS
<type 'numpy.ndarray'>
GAUDY
<type 'numpy.ndarray'>
GAY
<type 'numpy.ndarray'>
GAYS
<type 'numpy.ndarray'>
GAZE
<type 'numpy.ndarray'>
GEARS
<type 'numpy.ndarray'>
GEEK
<type 'numpy.ndarray'>
GEL
<type 'numpy.ndarray'>
GELATIN
<type 'numpy.ndarray'>
GEMINI
<type 'numpy.ndarray'>
GENERATION
<type 'numpy.ndarray'>
GENERIC
<type 'numpy.ndarray'>
GENES
<type 'numpy.ndarray'>
GENETICS
<type 'numpy.ndarray'>
GE

INEBRIATE
<type 'numpy.ndarray'>
INEPT
<type 'numpy.ndarray'>
INEXPERIENCE
<type 'numpy.ndarray'>
INEXPERIENCED
<type 'numpy.ndarray'>
INFAMOUS
<type 'numpy.ndarray'>
INFANTILE
<type 'numpy.ndarray'>
INFATUATION
<type 'numpy.ndarray'>
INFERIORITY
<type 'numpy.ndarray'>
INFESTED
<type 'numpy.ndarray'>
INFIDELITY
<type 'numpy.ndarray'>
INFLATE
<type 'numpy.ndarray'>
INFLEXIBLE
<type 'numpy.ndarray'>
INFORMAL
<type 'numpy.ndarray'>
INFREQUENT
<type 'numpy.ndarray'>
INGENIOUS
<type 'numpy.ndarray'>
INGENUITY
<type 'numpy.ndarray'>
INGEST
<type 'numpy.ndarray'>
INGESTION
<type 'numpy.ndarray'>
INGREDIENT
<type 'numpy.ndarray'>
INGREDIENTS
<type 'numpy.ndarray'>
INGROWN
<type 'numpy.ndarray'>
INHERIT
<type 'numpy.ndarray'>
INHERITANCE
<type 'numpy.ndarray'>
INHUMANE
<type 'numpy.ndarray'>
INITIATION
<type 'numpy.ndarray'>
INJUSTICE
<type 'numpy.ndarray'>
INNATE
<type 'numpy.ndarray'>
INNER
<type 'numpy.ndarray'>
INNER TUBE
<type 'numpy.ndarray'>
INNOVATIVE
<type 'numpy.ndarray'>
INPUT
<type 

<type 'numpy.ndarray'>
MILLIONAIRE
<type 'numpy.ndarray'>
MIME
<type 'numpy.ndarray'>
MINCE
<type 'numpy.ndarray'>
MINERALS
<type 'numpy.ndarray'>
MING
<type 'numpy.ndarray'>
MINGLE
<type 'numpy.ndarray'>
MINGLED
<type 'numpy.ndarray'>
MINI
<type 'numpy.ndarray'>
MINIATURE
<type 'numpy.ndarray'>
MINIMIZE
<type 'numpy.ndarray'>
MINISCULE
<type 'numpy.ndarray'>
MINIVAN
<type 'numpy.ndarray'>
MINNESOTA
<type 'numpy.ndarray'>
MIRACLE
<type 'numpy.ndarray'>
MIRAMAK
<type 'numpy.ndarray'>
MIRE
<type 'numpy.ndarray'>
MISCHIEVOUS
<type 'numpy.ndarray'>
MISDEMEANOR
<type 'numpy.ndarray'>
MISER
<type 'numpy.ndarray'>
MISFORTUNE
<type 'numpy.ndarray'>
MISGUIDED
<type 'numpy.ndarray'>
MISHAP
<type 'numpy.ndarray'>
MISLEAD
<type 'numpy.ndarray'>
MISPLACE
<type 'numpy.ndarray'>
MISSISSIPPI
<type 'numpy.ndarray'>
MISSPELL
<type 'numpy.ndarray'>
MISTAKES
<type 'numpy.ndarray'>
MISTERS
<type 'numpy.ndarray'>
MISTRESS
<type 'numpy.ndarray'>
MISTRUST
<type 'numpy.ndarray'>
MISTY
<type 'numpy.ndarray'>
MI

<type 'numpy.ndarray'>
POMPOUS
<type 'numpy.ndarray'>
POODLE
<type 'numpy.ndarray'>
POODLES
<type 'numpy.ndarray'>
POOF
<type 'numpy.ndarray'>
POOP
<type 'numpy.ndarray'>
POOPY
<type 'numpy.ndarray'>
POPPINS
<type 'numpy.ndarray'>
POPS
<type 'numpy.ndarray'>
POPSICLE STICKS
<type 'numpy.ndarray'>
POPTART
<type 'numpy.ndarray'>
PORES
<type 'numpy.ndarray'>
PORNOGRAPHY
<type 'numpy.ndarray'>
PORRIDGE
<type 'numpy.ndarray'>
PORSCHE
<type 'numpy.ndarray'>
PORTAL
<type 'numpy.ndarray'>
POSE
<type 'numpy.ndarray'>
POSH
<type 'numpy.ndarray'>
POSITRON
<type 'numpy.ndarray'>
POSSE
<type 'numpy.ndarray'>
POSSESSED
<type 'numpy.ndarray'>
POSSESSIONS
<type 'numpy.ndarray'>
POST OFFICE
<type 'numpy.ndarray'>
POSTED
<type 'numpy.ndarray'>
POSTER
<type 'numpy.ndarray'>
POSTMAN
<type 'numpy.ndarray'>
POSTPONE
<type 'numpy.ndarray'>
POSTULATE
<type 'numpy.ndarray'>
POSTURE
<type 'numpy.ndarray'>
POTASSIUM
<type 'numpy.ndarray'>
POTATO SALAD
<type 'numpy.ndarray'>
POTION
<type 'numpy.ndarray'>
POTPIE
<

<type 'numpy.ndarray'>
SENIOR
<type 'numpy.ndarray'>
SENIOR CITIZEN
<type 'numpy.ndarray'>
SENSATION
<type 'numpy.ndarray'>
SENSES
<type 'numpy.ndarray'>
SENSIBLE
<type 'numpy.ndarray'>
SENSITIVITY
<type 'numpy.ndarray'>
SENT
<type 'numpy.ndarray'>
SENTENCES
<type 'numpy.ndarray'>
SENTIMENTAL
<type 'numpy.ndarray'>
SEPARATED
<type 'numpy.ndarray'>
SEPTEMBER
<type 'numpy.ndarray'>
SEQUENCE
<type 'numpy.ndarray'>
SERENITY
<type 'numpy.ndarray'>
SERIOUSNESS
<type 'numpy.ndarray'>
SERVE
<type 'numpy.ndarray'>
SERVICES
<type 'numpy.ndarray'>
SET DOWN
<type 'numpy.ndarray'>
SET GO
<type 'numpy.ndarray'>
SET UP
<type 'numpy.ndarray'>
SETTER
<type 'numpy.ndarray'>
SETTING
<type 'numpy.ndarray'>
SETTLE
<type 'numpy.ndarray'>
SETTLEMENT
<type 'numpy.ndarray'>
SEVENTEEN
<type 'numpy.ndarray'>
SEVENTIES
<type 'numpy.ndarray'>
SEWER
<type 'numpy.ndarray'>
SEWING
<type 'numpy.ndarray'>
SEXIST
<type 'numpy.ndarray'>
SEXUAL
<type 'numpy.ndarray'>
SEXUALITY
<type 'numpy.ndarray'>
SHADED
<type 'numpy.nd

TUNED
<type 'numpy.ndarray'>
TUNER
<type 'numpy.ndarray'>
TURBO
<type 'numpy.ndarray'>
TURF
<type 'numpy.ndarray'>
TURMOIL
<type 'numpy.ndarray'>
TURN AGAINST
<type 'numpy.ndarray'>
TURN DOWN
<type 'numpy.ndarray'>
TURNED
<type 'numpy.ndarray'>
TURNING
<type 'numpy.ndarray'>
TUT
<type 'numpy.ndarray'>
TUX
<type 'numpy.ndarray'>
TV
<type 'numpy.ndarray'>
TWEET
<type 'numpy.ndarray'>
TWENTIETH
<type 'numpy.ndarray'>
TWENTY
<type 'numpy.ndarray'>
TWENTY FIRST
<type 'numpy.ndarray'>
TWENTY FOUR
<type 'numpy.ndarray'>
TWENTY ONE
<type 'numpy.ndarray'>
TWENTY SIX
<type 'numpy.ndarray'>
TWENTY THIRD
<type 'numpy.ndarray'>
TWENTY-FOUR
<type 'numpy.ndarray'>
TWINKLES
<type 'numpy.ndarray'>
TWIRL
<type 'numpy.ndarray'>
TWISTED
<type 'numpy.ndarray'>
TWISTING
<type 'numpy.ndarray'>
TWITCH
<type 'numpy.ndarray'>
TYPES
<type 'numpy.ndarray'>
TYPICAL
<type 'numpy.ndarray'>
TYPING
<type 'numpy.ndarray'>
TYPO
<type 'numpy.ndarray'>
TYRANNOSAURUS
<type 'numpy.ndarray'>
TYRANT
<type 'numpy.ndarray'>
TYS

'\nfor letter_index in range(len(tested_letters)):\n    letter_results = []\n    letter = tested_letters[letter_index]\n    target_words = top_words_no_repeats[letter_index]\n    \n    for target, freq in target_words:\n        target_results = [target]\n        if target in indices_list:\n            target_index = indices_list.index(target)\n            target_results.append(page_rank_ordered_list.index(target_index))\n            target_results.append(hits_auth_ordered_list.index(target_index))\n            target_results.append(hits_hub_ordered_list.index(target_index))\n        else: \n            target_results += (["N/A", "N/A", "N/A"])\n        letter_results.append(target_results)\n    results.append(letter_results)\n\nfor letter_result in results:\n    print("\nTarget Word, PageRank, AuthScore, HubScore")\n    for word_result in letter_result:\n        print(word_result)\n'

In [48]:
# For each of the 3 models, build one list per tested letter.
# Each list holds the words starting with that letter, ordered by importance according to the model.

def group_by_letter(ordered_indices):
    """Map each tested letter to its words, keeping the model's ranking order."""
    by_letter = {letter: [] for letter in tested_letters}
    for index in ordered_indices:
        word = all_indices[index]
        # Skip words whose first letter was not one of the tested letters
        if word[0] in tested_letters:
            by_letter[word[0]].append(word)
    return by_letter

page_rank_by_letter = group_by_letter(page_rank_ordered)
hits_auth_by_letter = group_by_letter(hits_auth_ordered)
hits_hub_by_letter = group_by_letter(hits_hub_ordered)
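
The per-letter grouping in the cell above can be checked on toy data. The sketch below is only an illustration: every `toy_` name and all of the values are made up, and it assumes the ordered lists hold node indices sorted from most to least important.

```python
# Hypothetical stand-ins for tested_letters, all_indices and page_rank_ordered.
toy_letters = ["A", "B"]
toy_words = ["BREW", "ACNE", "SNOW", "ACES", "BRIM"]  # position = node index
toy_ranking = [3, 0, 2, 4, 1]  # assumed: indices from most to least important


def toy_group_by_letter(ordered_indices, index_to_word, letters):
    """Group words by first letter, preserving the order of the ranking."""
    by_letter = {letter: [] for letter in letters}
    for index in ordered_indices:
        word = index_to_word[index]
        # Keep only non-empty words that start with a tested letter
        if word and word[0] in by_letter:
            by_letter[word[0]].append(word)
    return by_letter


toy_by_letter = toy_group_by_letter(toy_ranking, toy_words, toy_letters)
print(toy_by_letter["A"])
print(toy_by_letter["B"])

# Position of a word within its own letter's list (0 = most important)
toy_rank_of_brim = toy_by_letter["B"].index("BRIM")
print(toy_rank_of_brim)
```

Looking up `.index(word)` in a per-letter list gives a word's rank among words with the same first letter, which may be more comparable across models than its rank in the full vocabulary.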