### Introduction

In this project, we'll be looking at the questions that the TV show Jeopardy typically asks to try and discern some patterns that could offer us a competitive edge on the show.

We'll be working with a dataset called jeopardy.csv, containing 20,000 rows of Jeopardy questions and related information. We can see the data and columns below.

In [56]:
import pandas as pd

data = pd.read_csv("jeopardy.csv")
data.columns = ["Show Number", "Air Date", "Round", "Category", "Value", "Question", "Answer"]
data.head()

| | Show Number | Air Date | Round | Category | Value | Question | Answer |
|---|---|---|---|---|---|---|---|
| 0 | 4680 | 2004-12-31 | Jeopardy! | HISTORY | $200 | For the last 8 years of his life, Galileo was ... | Copernicus |
| 1 | 4680 | 2004-12-31 | Jeopardy! | ESPN's TOP 10 ALL-TIME ATHLETES | $200 | No. 2: 1912 Olympian; football star at Carlisl... | Jim Thorpe |
| 2 | 4680 | 2004-12-31 | Jeopardy! | EVERYBODY TALKS ABOUT IT... | $200 | The city of Yuma in this state has a record av... | Arizona |
| 3 | 4680 | 2004-12-31 | Jeopardy! | THE COMPANY LINE | $200 | In 1963, live on "The Art Linkletter Show", th... | McDonald's |
| 4 | 4680 | 2004-12-31 | Jeopardy! | EPITAPHS & TRIBUTES | $200 | Signer of the Dec. of Indep., framer of the Co... | John Adams |


### Cleaning the Data

Before we analyze the data, it's a good idea to normalize some of the columns. We'll remove capitalization and punctuation from the Question and Answer columns.

In [57]:
import re

def remove_grammar(string):
    # Lowercase the text, then drop every character that is not a letter, digit, or whitespace.
    string = string.lower()
    string = re.sub(r"[^A-Za-z0-9\s]", "", string)
    return string

data["Question"] = data["Question"].apply(remove_grammar)
data["Answer"] = data["Answer"].apply(remove_grammar)
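
As a quick sanity check, the normalization can be tried on a couple of strings before applying it to the whole column. This is a minimal sketch: the sample strings are taken from the preview table above, and the function is a condensed restatement of the one in the cell, not a replacement for it.

```python
import re

def remove_grammar(string):
    # Lowercase first, then keep only letters, digits, and whitespace.
    string = string.lower()
    return re.sub(r"[^a-z0-9\s]", "", string)

# Sample strings copied from the preview of the dataset.
examples = ["McDonald's", "No. 2: 1912 Olympian"]
cleaned = [remove_grammar(text) for text in examples]
print(cleaned)  # ['mcdonalds', 'no 2 1912 olympian']
```

Note that apostrophes disappear without leaving a space, so possessives such as "McDonald's" become single tokens like "mcdonalds".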

We also want to clean up the Value column and convert it to integers. Not every value in the column is numeric, so non-numeric strings have to be replaced manually. Fortunately, the only such value is "None", so we can simply replace any non-numeric string with 0.

In [58]:
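def value_to_integer(string):
    # Strip characters such as "$" and "," so that strings like "$200" can be parsed.
    string = re.sub(r"[^A-Za-z0-9\s]", "", string)
    try:
        string = int(string)
    except ValueError:
        # Non-numeric entries such as "None" become 0.
        string = 0
    return string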

data["Value"] = data["Value"].apply(value_to_integer)
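
The conversion can be checked on a few representative strings. This is a sketch under assumptions: `"$200"` appears in the preview above, while `"$2,000"` is a hypothetical value included only to show that thousands separators are handled.

```python
import re

def value_to_integer(string):
    # Remove "$", ",", and other punctuation before parsing.
    string = re.sub(r"[^A-Za-z0-9\s]", "", string)
    try:
        return int(string)
    except ValueError:
        # Anything that still is not a number (for example "None") maps to 0.
        return 0

# "$2,000" is a hypothetical sample value, not taken from the dataset preview.
samples = ["$200", "$2,000", "None"]
converted = [value_to_integer(s) for s in samples]
print(converted)  # [200, 2000, 0]
```

Mapping "None" to 0 keeps the column numeric, but it also means a 0 cannot be told apart from a genuinely missing value later on.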

Finally, let's convert the values in the Air Date column to datetime objects.

In [59]:
data["Air Date"] = pd.to_datetime(data["Air Date"])

### Devising a Jeopardy Strategy

To devise a strategy for Jeopardy, we want to begin by thinking about what kinds of things we want to study. If the same questions appear on Jeopardy often, then it would be good to start by studying the questions that have appeared on past episodes of Jeopardy. Or, if the answer can be inferred from the question, we may not need to study at all!

Let's start by looking at the latter of these possibilities. While a question can hint at its answer in many ways, we'll only look at question and answer pairs in which the question contains part or all of the answer.

We'll do this by comparing the list of words in each question with the list of words in that question's answer. For each pair, we'll count how many answer words also appear in the question, and then compute the proportion of the answer that matches by dividing the number of matches by the number of words in the answer. We'll do this for every question and answer pair, adding the results to a new column.

In [60]:
def answer_matches_question(row):
    # Split the cleaned question and answer into words.
    split_question = row["Question"].split()
    split_answer = row["Answer"].split()
    # Ignore "the", which would otherwise produce many meaningless matches.
    split_answer = [x for x in split_answer if x != "the"]
    # Guard against division by zero when nothing is left of the answer.
    if len(split_answer) == 0:
        return 0
    match_count = 0
    for word in split_answer:
        if word in split_question:
            match_count += 1
    # Proportion of answer words that also appear in the question.
    return match_count / len(split_answer)

In [61]:
data["answer_in_question"] = data.apply(answer_matches_question, axis = 1)
mean_answer_in_question = data["answer_in_question"].mean()
mean_answer_in_question

0.058347444789267004

It looks like the strategy of not studying at all would not go well! On average, only about 5.8% of the words in an answer also appear in its question. Given that most answers are one or two words, that's a pretty low hit rate.
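
To make the proportion concrete, here is a small worked example. It is a sketch only: the question and answer pairs are hypothetical, already-cleaned strings, and the function takes two strings rather than a DataFrame row so that it runs on its own.

```python
def answer_in_question(question, answer):
    # Words of the question, stored in a set for quick lookups.
    question_words = set(question.split())
    # Drop "the" from the answer, as in the notebook's function.
    answer_words = [w for w in answer.split() if w != "the"]
    if not answer_words:
        return 0
    matches = sum(1 for w in answer_words if w in question_words)
    return matches / len(answer_words)

# Hypothetical cleaned (question, answer) pairs.
pairs = [
    ("the city of yuma is in this state", "arizona"),        # no overlap
    ("this new york city park opened in 1858", "central park"),  # "park" matches
    ("this article is a very common english word", "the"),   # answer is empty after filtering
]
results = [answer_in_question(q, a) for q, a in pairs]
print(results)  # [0.0, 0.5, 0]
```

The column mean reported above is the average of these per-row proportions, so a single fully matching answer counts the same whether it has one word or five.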

Let's now look at how often questions are repeats of old ones, to see whether it's worth studying former Jeopardy questions. Rather than look for exact matches of questions, we'll look for questions which contain similar terms to previous ones.

In [68]:
sorted_dates = data["Air Date"].sort_values(ascending = True)
# Note: sorted_dates is not used below, so the loop visits rows in their existing order, which may not be chronological.

terms_used = set()
# We use a set rather than a list because a set holds only unique items and supports fast membership tests.
# This avoids storing duplicate words and speeds up the "word in terms_used" check inside the loop.
question_overlap = []

for i, row in data.iterrows():
    words = row["Question"].split()
    # To save time, keep only words of six or more letters, which are more likely to be significant to the meaning of the question.
    words = [x for x in words if len(x) > 5]
    match_count = 0
    for word in words:
        if word in terms_used:
            match_count += 1
    for word in words:
        terms_used.add(word)
    if len(words) > 0:
        match_count /= len(words)
    question_overlap.append(match_count)

data["question_overlap"] = question_overlap
mean_overlap = data["question_overlap"].mean()
mean_overlap

0.6908737315671878

Roughly 70% of the content of new questions (counting only words with 6 or more letters) has been used in previous questions. At first glance this seems like quite a high proportion, but there are a few things that we need to note about the figure.

The first is that questions which appear earlier in our dataset are less likely to match previous questions, simply because fewer questions have been seen at that point. The very first question we iterate through cannot match anything at all. As the set of seen terms grows, later questions become more likely to register as repeats. The mean is therefore not a very good figure for estimating the likelihood of a repeat question on future shows, because it is pulled down by the early rows, where the chance of a match was significantly lower. This factor makes our mean an underestimate.

Secondly, the same words may appear in questions that occur in the same show, but even if some of the words are the same, we can be quite certain (unless the makers of the show have made a significant error) that the semantic content of that question is distinct. Currently, however, our code doesn't account for this, and so this factor leads to an overestimate of the mean.

Thirdly, words used in previous questions can appear in current questions even though the semantic content of the questions is distinct. For example, words like defeated or winner can appear across a wide variety of categories and refer to totally different content. This is the major issue with the figure we have above. How many of the repeated words actually indicate that a question with the same semantic content is being asked? My guess is that the number is relatively low, and that most of the matches come from words which are frequently used in questions of a particular category, like country, capital, etc. This factor makes the mean an overestimate with regard to the information we'd hoped it might give us.

Fourthly, our threshold for meaningful words is hard to evaluate. We leave out words like for, in, the, of, etc., which would greatly inflate the mean and which we rightly wish to disregard. However, we also leave out words like Italy, game, and so on, which indicate the category a question belongs to and which are relevant if repeated.

All this being said, 70% is still quite high, and even if our mean is a drastic overestimate, studying old Jeopardy questions may well be a good strategy.
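
The caveats about ordering and same-show matches suggest one possible refinement: process shows in air-date order and only add a show's words to the set of seen terms once the whole show has been processed. The sketch below illustrates that idea on plain `(air_date, question)` pairs. The function name, the toy data, and the assumption that each air date corresponds to exactly one show are all illustrative, and the result on the real dataset is not computed here.

```python
from collections import defaultdict
from datetime import date

def chronological_overlap(rows, min_length=6):
    """Return the overlap proportion for each (air_date, question) pair, in input order."""
    rows = list(rows)
    # Group row positions by air date (assumed here to identify a single show).
    by_date = defaultdict(list)
    for position, (air_date, question) in enumerate(rows):
        by_date[air_date].append((position, question))

    terms_used = set()
    overlap = [0.0] * len(rows)
    for air_date in sorted(by_date):  # oldest show first
        show_terms = set()
        for position, question in by_date[air_date]:
            words = [w for w in question.split() if len(w) >= min_length]
            if words:
                overlap[position] = sum(w in terms_used for w in words) / len(words)
            show_terms.update(words)
        # Only now do this show's words count as "previously used".
        terms_used |= show_terms
    return overlap

# Hypothetical cleaned questions, deliberately out of date order.
toy_rows = [
    (date(2005, 1, 2), "this country borders france"),
    (date(2004, 12, 31), "this country has a famous capital"),
    (date(2004, 12, 31), "the capital of this country"),
    (date(2005, 1, 2), "a famous president"),
]
toy_overlap = chronological_overlap(toy_rows)
print(toy_overlap)
```

With the DataFrame from this notebook, the pairs could be produced with `zip(data["Air Date"], data["Question"])`. The two questions from the earlier toy show score 0 even though they share words with each other, which is the same-show behavior described in the second caveat.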

### Analyzing Question Categories

A simpler, and possibly better, measure is to count how often each word appears across all questions and then manually inspect the results to see which topics come up frequently. Let's do this now.

We'll lower the threshold for significant words to words with more than 3 letters. We'll still keep words like which and from, but we'll remove words like for, of, the, and in, hopefully at a lower cost in lost information. The best way to filter out articles, prepositions, etc. would be to build a list of such stop words and remove them from the questions, but we won't be doing that here.
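
For reference, the stop-word approach mentioned above could be sketched as follows. The stop-word set is a short, hand-picked illustration rather than a complete list, and `count_words` and the sample questions are hypothetical names and data introduced only for this example.

```python
from collections import Counter

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {
    "a", "is", "the", "of", "in", "for", "this", "from", "with",
    "these", "that", "when", "about", "have", "were", "which",
}

def count_words(questions, stop_words=STOP_WORDS):
    """Count every word that is not a stop word, regardless of its length."""
    counts = Counter()
    for question in questions:
        counts.update(w for w in question.split() if w not in stop_words)
    return counts

# Hypothetical cleaned questions.
sample_questions = [
    "this city is the capital of this country",
    "the name of this river in this country",
]
counts = count_words(sample_questions)
print(counts.most_common(1))  # [('country', 2)]
```

Unlike a length threshold, this keeps short but informative words such as "city" while still dropping "this" and "from".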

In [78]:
word_counts = {}

for i, row in data.iterrows():
    words = row["Question"].split()
    words = [x for x in words if len(x) > 3]
    for word in words:
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1

for w in sorted(word_counts, key = word_counts.get, reverse = True):
    print(w, word_counts[w])

this 11676
from 1851
with 1658
these 1389
that 1348
name 1025
first 949
city 581
when 577
about 549
called 521
named 513
like 489
have 483
country 476
were 450
seen 446
state 443
type 440
after 426
clue 420
film 414
made 401
used 386
title 370
your 368
crew 368
also 365
which 361
known 352
here 327
into 326
word 324
years 316
only 311
said 297
played 297
wrote 287
became 287
term 275
over 267
before 267
novel 265
world 261
president 258
capital 257
american 257
part 251
than 249
famous 246
targetblankherea 244
french 243
last 242
some 241
book 236
their 235
king 234
been 231
more 230
most 227
show 224
home 220
time 218
island 216
john 212
they 210
song 210
means 209
where 209
great 208
found 205
what 198
born 193
just 191
play 190
river 188
make 186
people 184
national 183
many 182
work 180
largest 179
little 178
life 176
group 173
there 171
around 169
comes 169
once 169
star 168
back 168
whose 166
british 166
house 165
south 165
author 164
meaning 162
during 161
while 161
dont 161
gre

certain 28
sure 28
code 28
legendary 28
important 28
targetblankhi 28
titles 28
rose 28
organ 28
equal 28
department 28
lower 28
remember 28
production 28
founder 28
ford 28
hold 28
control 28
signed 28
medical 28
knight 28
coin 28
hours 27
glass 27
launched 27
basketball 27
4letter 27
foot 27
tony 27
youd 27
pennsylvania 27
becomes 27
womens 27
raised 27
spot 27
cars 27
murder 27
test 27
wore 27
outside 27
received 27
speed 27
bush 27
write 27
ruler 27
christian 27
1957 27
instead 27
appropriately 27
mile 27
holy 27
structure 27
1951 27
jane 27
temple 27
pirate 27
cape 27
evil 27
action 27
return 27
lewis 27
francisco 27
least 27
theory 26
snow 26
1920s 26
divided 26
bring 26
francis 26
discovery 26
skin 26
machine 26
student 26
salt 26
coined 26
course 26
flowers 26
refer 26
tail 26
annual 26
1948 26
count 26
bell 26
cluea 26
soldiers 26
artists 26
titled 26
kill 26
slang 26
dress 26
cold 26
walk 26
devoted 26
wanted 26
owner 26
cathedral 26
ring 26
pope 26
deal 26
succeeded 26
orlea

publication 11
experiments 11
peanut 11
attorney 11
democrat 11
celebration 11
muslim 11
wales 11
whiskey 11
wave 11
separated 11
potable 11
stamp 11
telescope 11
causes 11
underground 11
entered 11
bass 11
boxer 11
monarch 11
englands 11
failed 11
onto 11
marilyn 11
3word 11
fever 11
crane 11
drop 11
executive 11
maria 11
priest 11
adapted 11
talking 11
lucy 11
wagon 11
1927 11
problem 11
increase 11
cooked 11
apply 11
bacon 11
hanks 11
signs 11
estimated 11
dame 11
sight 11
pull 11
receive 11
fresh 11
exercise 11
hong 11
mistress 11
happens 11
camera 11
dear 11
burned 11
faith 11
1903 11
carter 11
interest 11
sheep 11
sells 11
finish 11
arent 11
direct 11
pulled 11
prior 11
tribute 11
teenage 11
keeps 11
lisa 11
2letter 11
jose 11
zealand 11
eaten 11
shortly 11
ernest 11
thick 11
fact 11
1857 11
presented 11
crystal 11
assassination 11
baked 11
bergman 11
ordered 11
quiet 11
vessel 11
cardinal 11
myth 11
potent 11
greeks 11
commercial 11
swing 11
beast 11
booth 11
idol 11
jungle 11
s

cafe 8
dressed 8
fate 8
birthstone 8
rump 8
fords 8
suicide 8
defensive 8
ulysses 8
charlemagne 8
remake 8
virginias 8
beware 8
passion 8
briefly 8
duncan 8
avenue 8
merged 8
leonard 8
fastener 8
phil 8
problems 8
1850 8
jets 8
alive 8
ideas 8
dwarf 8
flew 8
martini 8
chancellor 8
pigs 8
canary 8
nixon 8
nest 8
anatomy 8
filmed 8
puerto 8
merry 8
enemy 8
sudden 8
boulevard 8
kenya 8
gothic 8
verne 8
bunny 8
brands 8
expensive 8
gorge 8
pluto 8
joan 8
feed 8
corner 8
designated 8
flies 8
1492 8
readers 8
closed 8
nick 8
ambassador 8
hate 8
meets 8
rachel 8
equator 8
dash 8
walking 8
particle 8
falling 8
wonderful 8
frogs 8
sunset 8
rolled 8
eisenhower 8
bloody 8
edmund 8
delicacy 8
sally 8
visitors 8
cowboys 8
aromatic 8
hood 8
oman 8
herb 8
essential 8
wilkes 8
cotton 8
chaplin 8
classical 8
stroke 8
1804 8
establish 8
eden 8
strikes 8
hugo 8
naught 8
fatal 8
spacecraft 8
bilbao 8
galileo 7
sunshine 7
basically 7
headlines 7
agreed 7
creating 7
seats 7
buddy 7
pointed 7
organizations 7

obtained 6
slain 6
1874 6
jamaica 6
suburb 6
tunnel 6
rage 6
isaac 6
lovely 6
annually 6
secrets 6
animation 6
goods 6
cubs 6
essay 6
brief 6
1832 6
dining 6
hybrid 6
thrilled 6
inspiration 6
twist 6
denmarks 6
100000 6
slowly 6
targetblanksofia 6
spaina 6
lara 6
lethal 6
billions 6
bust 6
familys 6
listen 6
delight 6
spielberg 6
wilde 6
oldfashioned 6
maple 6
waltz 6
sudan 6
1818 6
maintains 6
monty 6
uniform 6
heroic 6
paints 6
buying 6
quoted 6
trust 6
tasty 6
storage 6
prominent 6
librarian 6
nassau 6
canterbury 6
retreat 6
busiest 6
contained 6
buddhist 6
offensive 6
100th 6
assume 6
worms 6
virtue 6
scholars 6
available 6
scoring 6
websters 6
driver 6
300000 6
conflict 6
invaders 6
carthage 6
tools 6
playboy 6
failure 6
joining 6
cajun 6
abstract 6
priests 6
hitchcock 6
louise 6
dominican 6
pursue 6
jackie 6
universitya 6
extra 6
messenger 6
accessory 6
hyde 6
practical 6
melville 6
slightly 6
leaf 6
aside 6
demonstrated 6
ivan 6
danced 6
succession 6
fixed 6
warned 6
monaco 6
ac

knighted 5
collegiate 5
groove 5
woodward 5
griffith 5
heels 5
suggest 5
chester 5
italys 5
managed 5
families 5
expo 5
desperate 5
looney 5
gustav 5
stuffing 5
steer 5
buds 5
drugstore 5
gallons 5
substances 5
inherited 5
toms 5
winters 5
patented 5
kenneth 5
veil 5
spends 5
powered 5
gateway 5
1839 5
plains 5
bowler 5
fission 5
roaring 5
hitler 5
foundationa 5
parker 5
connection 5
memphis 5
isolated 5
dipped 5
tongues 5
buddha 5
bark 5
opposition 5
camels 5
commissioner 5
strategic 5
kitty 5
dots 5
entirely 5
eruption 5
suspended 5
cigar 5
doll 5
baldwin 5
prom 5
seek 5
rolls 5
murray 5
folding 5
bucket 5
washingtons 5
amazon 5
imported 5
explored 5
dental 5
dare 5
hermosa 5
defended 5
pine 5
carlo 5
projects 5
prizewinning 5
maurice 5
cyrus 5
moss 5
allied 5
largely 5
1838 5
calcutta 5
surgeon 5
florentine 5
terrific 5
arrival 5
classified 5
lebanon 5
winfrey 5
influenced 5
massive 5
punishment 5
portable 5
cricket 5
usual 5
guerrilla 5
dantes 5
mick 5
ruling 5
collect 5
yield 5
co

explains 4
grateful 4
meetings 4
homage 4
novelists 4
secreted 4
peruvian 4
toured 4
eldest 4
bloodsucking 4
cactus 4
bubble 4
chin 4
bison 4
1493 4
sweater 4
claudius 4
empires 4
cornelius 4
splash 4
logically 4
newborns 4
measuring 4
argenteuil 4
vulcanization 4
vancouver 4
spectrum 4
gear 4
costa 4
hearings 4
ease 4
pigment 4
thinker 4
shield 4
16000 4
inauguration 4
bacall 4
obviously 4
select 4
celebs 4
1821 4
leaguers 4
commemorate 4
manned 4
1763 4
aeschylus 4
daphne 4
hindi 4
vines 4
nurses 4
bengal 4
nowhere 4
conversion 4
introducing 4
costarred 4
summary 4
tunney 4
lengths 4
clam 4
interview 4
audiences 4
diplomatic 4
tonywinning 4
synonymous 4
baghdad 4
hester 4
oddly 4
tennyson 4
latvia 4
theft 4
compete 4
pepper 4
tomato 4
sleigh 4
wool 4
oratorio 4
spears 4
latvian 4
detect 4
raids 4
isle 4
valens 4
chic 4
sacked 4
meriwether 4
penguins 4
absolute 4
assisi 4
judiciary 4
medici 4
recruited 4
hide 4
distress 4
interred 4
enemys 4
peaceful 4
sherlock 4
documents 4
regiment 

gumbo 3
loans 3
operated 3
saloon 3
automatic 3
aviator 3
clowns 3
dona 3
fragments 3
1996s 3
deepest 3
ramblin 3
multiple 3
joey 3
1519 3
targetblankjames 3
lipton 3
enlisted 3
libretto 3
floyd 3
totem 3
ology 3
selection 3
weber 3
fairies 3
hapsburgs 3
vivaldi 3
museo 3
algae 3
7letter 3
marathon 3
ballroom 3
exact 3
saga 3
identification 3
venerable 3
legit 3
soups 3
pockets 3
illustrations 3
hartford 3
mcdonalds 3
casa 3
choices 3
caviar 3
perdu 3
gunsmoke 3
illegally 3
taylors 3
impressionists 3
rococo 3
matrix 3
routes 3
brunei 3
franchise 3
fete 3
yogurt 3
mariners 3
allnut 3
sabin 3
bert 3
sara 3
permitted 3
jurors 3
debated 3
chronicled 3
sweep 3
rallies 3
sparks 3
lennon 3
archery 3
doctrine 3
cubist 3
mcbeal 3
crypt 3
ives 3
reflecting 3
aliens 3
laughing 3
sculptures 3
stout 3
salzburg 3
graves 3
intake 3
exhale 3
songbird 3
talkin 3
booster 3
milligrams 3
racket 3
minerals 3
checked 3
vanishing 3
klondike 3
1810 3
bacteriologist 3
bucks 3
mendelssohn 3
thru 3
dolly 3
supre

phillips 3
duos 3
airborne 3
apparel 3
awkward 3
marionettes 3
policeman 3
pinocchio 3
paralysis 3
glenn 3
canberra 3
shoppers 3
farrell 3
trials 3
blanks 3
presentday 3
nitrogen 3
nohitters 3
mixture 3
permission 3
ruthless 3
spinach 3
mood 3
bout 3
structural 3
targetblankitas 3
greatgrandson 3
andes 3
herd 3
proceedings 3
hypothetical 3
constructed 3
italianborn 3
eroica 3
triumvirate 3
debussy 3
mourned 3
hagar 3
melt 3
quills 3
abroad 3
diner 3
struggling 3
pathos 3
liberalism 3
portrays 3
turkeys 3
proposal 3
panels 3
netherland 3
ambitious 3
pear 3
mound 3
scuba 3
yonder 3
reclaimed 3
benny 3
eataly 3
clown 3
topselling 3
shannon 3
actresses 3
gate 3
highfrequency 3
consent 3
donated 3
1619 3
cosby 3
volleyball 3
1415 3
yields 3
communications 3
raided 3
pins 3
8ball 3
havent 3
glands 3
humorous 3
abel 3
noise 3
italians 3
targetblankin 3
jung 3
samson 3
pray 3
osborne 3
delawares 3
ancestors 3
proverbs 3
chances 3
aussie 3
weatherman 3
grandpa 3
stirred 3
acclaim 3
highlands 3


hunters 2
ducks 2
gaul 2
schweitzer 2
viral 2
hines 2
terrain 2
quietly 2
brideshead 2
kayes 2
nosehorned 2
confusing 2
bully 2
cariou 2
overnight 2
horsemen 2
dennehy 2
fran 2
garage 2
liffey 2
fastening 2
ancestry 2
pharaohs 2
exxon 2
mending 2
longrange 2
hess 2
taurus 2
introductory 2
civilians 2
espionage 2
mandel 2
maxwell 2
darn 2
vulgar 2
scopes 2
tennessees 2
sensational 2
stealth 2
musicals 2
murfreesboro 2
lonesome 2
commanding 2
victorias 2
benchley 2
widowhood 2
shylock 2
prick 2
editorinchief 2
ladys 2
enclosure 2
braid 2
tread 2
ssns 2
facilities 2
tacky 2
malibu 2
evangelists 2
competitors 2
chronicle 2
wharves 2
highlighted 2
connie 2
tensions 2
locks 2
precinct 2
valentines 2
reaching 2
investigated 2
chefs 2
platt 2
intervention 2
iill 2
sault 2
annexed 2
indonesia 2
tides 2
technically 2
3560 2
navigating 2
decorate 2
caracas 2
erasmus 2
lumpur 2
martinez 2
1564 2
1540 2
rumors 2
1816 2
superlative 2
14letter 2
fork 2
otis 2
redding 2
18yearold 2
goth 2
viscera 2
pi

suspicion 2
overtook 2
tuskegee 2
amnesty 2
rebels 2
appeal 2
churchyard 2
18011835 2
seminary 2
pomegranates 2
pippa 2
punjabi 2
households 2
wasteland 2
botanical 2
nails 2
herbivore 2
bartolommeo 2
houghton 2
simile 2
meter 2
incomprehensible 2
hominid 2
skeleton 2
dropout 2
autopsy 2
microphone 2
englishlanguage 2
shoeless 2
hostile 2
allwhite 2
reborn 2
criticized 2
mating 2
ruff 2
wctu 2
goodfellas 2
cautious 2
horticulturist 2
arrakis 2
melange 2
gospels 2
maltin 2
sickness 2
progesterone 2
batch 2
examiner 2
embedded 2
hawthornes 2
justine 2
illusion 2
theons 2
madhya 2
pradesh 2
encarta 2
1590 2
seems 2
michel 2
douglass 2
grads 2
dilbert 2
bumppo 2
keynote 2
pell 2
satirizes 2
operates 2
hospitals 2
croquet 2
loretta 2
newer 2
forbidding 2
1621 2
hails 2
violent 2
marge 2
crimea 2
poke 2
pollinate 2
highkicking 2
hawkeye 2
bastille 2
ripen 2
rien 2
skill 2
bows 2
zhivago 2
limelight 2
benchmark 2
brigade 2
lieutenants 2
solving 2
flown 2
talon 2
saab 2
confusion 2
peshtigo 2


leahy 2
triglycerides 2
accessories 2
philby 2
jong 2
stabbed 2
westerns 2
1644 2
breakout 2
1483 2
cezanne 2
targetblankhisa 2
coups 2
iberia 2
strokes 2
ashram 2
sticksa 2
lame 2
roanoke 2
assoc 2
minsk 2
notea 2
applause 2
funloving 2
mama 2
olives 2
gwyneth 2
paltrow 2
varied 2
carols 2
ezekiel 2
hides 2
allens 2
katmandu 2
spurs 2
chasing 2
knocks 2
simian 2
towel 2
meadow 2
pond 2
whoville 2
purify 2
encased 2
sparky 2
hotels 2
300th 2
1740 2
basmati 2
millet 2
souvenir 2
retains 2
egypts 2
membership 2
montreals 2
remedies 2
1992s 2
muddah 2
fadduh 2
wyeth 2
engineering 2
chimney 2
countess 2
confessions 2
stepsister 2
brewing 2
orchid 2
exulted 2
wrigley 2
vasa 2
transylvania 2
alphabetic 2
margot 2
fonteyn 2
hays 2
reservation 2
moores 2
shamrock 2
morrow 2
particularly 2
lender 2
soho 2
peep 2
preferring 2
hatter 2
stella 2
kermit 2
refugee 2
scissorhands 2
merchants 2
keyboard 2
ignored 2
void 2
erebus 2
gunter 2
licensed 2
vostok 2
amorous 2
dreamed 2
stagg 2
gulping 2
bowl

excalibur 2
tossing 2
1400s 2
fanfare 2
jolla 2
playhouse 2
aurantifolia 2
beryl 2
educating 2
turners 2
brightened 2
screens 2
firstborn 2
blessing 2
ruffles 2
bullock 2
wakin 2
laborers 2
interviewed 2
entertained 2
motown 2
bathtub 2
alkaline 2
batteries 2
fyodor 2
artistry 2
smashed 2
silence 2
dissatisfied 2
zacatecas 2
depart 2
melvilles 2
subtitle 2
vida 2
marius 2
sundance 2
madisons 2
playful 2
connecticuts 2
partially 2
independencia 2
zathura 2
leavenworth 2
handsa 2
coverage 2
relics 2
staten 2
eliza 2
attendant 2
rational 2
shakira 2
oppose 2
yams 2
repellent 2
venture 2
deltoid 2
tenderfoot 2
summertime 2
whitcomb 2
keeper 2
selftitled 2
grader 2
huygens 2
dispatch 2
ramses 2
rests 2
cork 2
fainted 2
businessmen 2
unusually 2
potency 2
flor 2
urbane 2
crockett 2
lineup 2
chests 2
andersen 2
existentialist 2
shapely 2
melodrama 2
executives 2
eligibility 2
liberia 2
socks 2
hosiery 2
sophomore 2
belgrade 2
dewey 2
janitor 2
sock 2
dixon 2
calle 2
whalers 2
robertson 2
love

ranging 1
bounces 1
aredeeper 1
6pronged 1
deserts 1
fallin 1
tedder 1
nodes 1
previous 1
1035 1
springboard 1
wolfgang 1
9995 1
chula 1
taipei 1
hrefhttpwwwjarchivecommedia20100706dj26jpg 1
outpouring 1
pleated 1
glucose 1
zealands 1
piha 1
hrefhttpwwwjarchivecommedia20100706dj27jpg 1
widespread 1
corollas 1
inclined 1
mosely 1
hrefhttpwwwjarchivecommedia20100706dj14jpg 1
hrefhttpwwwjarchivecommedia20100706dj28jpg 1
hrefhttpwwwjarchivecommedia20100706dj21wmvjimmy 1
moutha 1
ashlyn 1
hrefhttpwwwjarchivecommedia20100706dj29jpg 1
under17 1
haris 1
seferovic 1
hrefhttpwwwjarchivecommedia20100706dj30jpg 1
heene 1
mullaney 1
mcgarry 1
hitmaker 1
marlows 1
yawl 1
untamed 1
heifer 1
sadism 1
cruelty 1
butalso 1
fiercest 1
assaults 1
langham 1
kirshner 1
vh1s 1
spirited 1
teri 1
shipshape 1
mermaids 1
cruisin 1
jomo 1
kenyatta 1
readable 1
approachable 1
palladios 1
1554 1
antiquities 1
execs 1
qualify 1
proprietor 1
rustling 1
saintpierre 1
miquelon 1
vivienne 1
westwood 1
refinery 1
maintain

1490 1
5300 1
swashbuckler 1
wildes 1
nicaraguan 1
harbors 1
bergmans 1
178085 1
178793 1
mckeesport 1
hrefhttpwwwjarchivecommedia20060213j01wmvjon 1
sita 1
commands 1
listenokay 1
rhythmic 1
disciplines 1
hrefhttpwwwjarchivecommedia20060213j02wmvjon 1
pedigreea 1
collars 1
regulator 1
hrefhttpwwwjarchivecommedia20060213j03wmvjon 1
fingermounted 1
brusha 1
tartarregular 1
brushing 1
vmas 1
husk 1
guerre 1
fireroasted 1
sundried 1
romas 1
hrefhttpwwwjarchivecommedia20060213j04wmvrocko 1
pawa 1
hrefhttpwwwjarchivecommedia20060213j04ajpg 1
targetblankpadsa 1
011331 1
pourraisje 1
parler 1
borgwarner 1
hrefhttpwwwjarchivecommedia20060213j05wmvrocko 1
frisky 1
crewa 1
shiitakes 1
morels 1
chanterelles 1
racketeers 1
genetrix 1
flagmaker 1
marvell 1
multiplane 1
rhetoric 1
molon 1
pouchmouthed 1
daddies 1
nannies 1
pharnaces 1
zela 1
accuses 1
steaing 1
attract 1
microbiological 1
sciences 1
cisalpine 1
hrefhttpwwwjarchivecommedia20060213dj25mp3wi 1
monie 1
embrace 1
tendera 1
duplicate 1
ju

greetings 1
noriega 1
hrefhttpwwwjarchivecommedia20040723dj02jpg 1
singeractor 1
feydrautha 1
escorted 1
juntas 1
provinciales 1
campari 1
pernod 1
premeal 1
pincers 1
moliere 1
usbrokered 1
ohlsson 1
halite 1
winona 1
aung 1
godchild 1
nigerianled 1
freetown 1
ousting 1
radiate 1
floridaborn 1
clovis 1
11200 1
cellophane 1
flatbottomed 1
somber 1
taxis 1
gamaa 1
folsom 1
10900 1
thrower 1
westinghouse 1
loaned 1
repaid 1
haiphong 1
anasazi 1
hydramatic 1
stakes 1
adenahopewell 1
effigy 1
freight 1
handbeaten 1
ecuadoran 1
computerenhanced 1
5100 1
rubbles 1
habanera 1
pepys 1
1660 1
footrests 1
accommodate 1
swollen 1
highwire 1
bodysuit 1
trapeze 1
retiring 1
obiwan 1
kenobi 1
comingtocalifornia 1
kalifornia 1
coughs 1
sediment 1
clapping 1
imports 1
texasbased 1
artillery 1
journals 1
trainspotting 1
taxonomic 1
infringing 1
bighaired 1
nothings 1
tehran 1
aryan 1
hrefhttpwwwjarchivecommedia20070112j17jpg 1
1100s 1
crashes 1
furtrading 1
estimate 1
salisbury 1
wiltshire 1
manhours 1

pontiac 1
vernewilkie 1
scifidetective 1
historically 1
freemason 1
shan 1
chungyang 1
rollout 1
angola 1
novelpoem 1
hitchhike 1
diverges 1
annihilation 1
migrate 1
erwin 1
rommel 1
subsaharan 1
langs 1
genome 1
footrace 1
delovely 1
conchiglioni 1
ladders 1
hemorrhagic 1
cordate 1
chanter 1
drones 1
hexahedron 1
toil 1
foys 1
dendroid 1
dentiform 1
speckled 1
nidre 1
chanted 1
atonement 1
bracketed 1
maya 1
spiegelman 1
toralv 1
maurstad 1
hrefhttpwwwjarchivecommedia20071204j30jpg 1
foursided 1
nonparallel 1
capella 1
papals 1
emboldened 1
maxentius 1
gollly 1
mayberry 1
mercalli 1
fearful 1
trill 1
longed 1
dinh 1
105room 1
sleeket 1
cowran 1
timrous 1
panics 1
breastie 1
palefaces 1
hrefhttpwwwjarchivecommedia20071204dj23jpg 1
pointyhaired 1
nerd 1
sajak 1
herobut 1
accumulation 1
roper 1
1544 1
rented 1
mosses 1
dangrous 1
pierian 1
beauties 1
hrefhttpwwwjarchivecommedia20071204dj20jpg 1
austrias 1
mozarts 1
hardcore 1
hinkley 1
6milewide 1
romanced 1
camilla 1
broadlands 1
greatu

flamborough 1
tictactoelike 1
contestants 1
ecological 1
seaward 1
karn 1
manassas 1
kennesaw 1
query 1
verizons 1
weaken 1
theodor 1
geisel 1
judgement 1
contests 1
retablo 1
maese 1
peformed 1
macready 1
hrefhttpwwwjarchivecommedia20050128dj07mp3thus 1
famine 1
followsa 1
intern 1
lagerfeld 1
ioalus 1
iphicles 1
automedusa 1
strawberries 1
hrefhttpwwwjarchivecommedia20050128dj23jpg 1
blackboarda 1
reordering 1
implementing 1
pontifical 1
newmans 1
hrefhttpwwwjarchivecommedia20050128dj14jpg 1
goggles 1
foresta 1
forested 1
conservationist 1
soncronus 1
reine 1
bothered 1
sized 1
flosshilde 1
rhinemaiden 1
rheingold 1
stuntman 1
hrefhttpwwwjarchivecommedia20050128dj16jpg 1
hrefhttpwwwjarchivecommedia20050128dj25jpg 1
tyrolean 1
chaleta 1
hrefhttpwwwjarchivecommedia20050128dj25mp3an 1
overturea 1
bodily 1
hiccup 1
dripper 1
okrent 1
166263 1
167071 1
gamblers 1
yeti 1
2story 1
pavarottis 1
buchan 1
thirtynine 1
sandalwood 1
aquarelle 1
opaque 1
klees 1
mataveri 1
approve 1
sanctity 1
mc

knicksa 1
bleecker 1
alcoholic 1
stylin 1
flatly 1
jungers 1
longago 1
13foot 1
caribou 1
rangifer 1
bogey 1
bacalls 1
zalmay 1
khalizad 1
talibandied 1
brobdingnagian 1
hrefhttpwwwjarchivecommedia20070221dj14jpg 1
brendon 1
operational 1
shrimping 1
batre 1
hrefhttpwwwjarchivecommedia20070221dj06jpg 1
flaska 1
hrefhttpwwwjarchivecommedia20070221dj06ajpg 1
concern 1
rabelaisian 1
tornados 1
alnico 1
hrefhttpwwwjarchivecommedia20070221dj12jpg 1
homegrown 1
amazement 1
climatologist 1
evacuates 1
hrefhttpwwwjarchivecommedia20070221dj08jpg 1
electromagnets 1
horseshoeshaped 1
hrefhttpwwwjarchivecommedia20070221dj08ajpg 1
immeasurably 1
jest 1
poemes 1
barbares 1
bumpkin 1
nary 1
tutu 1
meshugge 1
ashkenazic 1
vigee 1
durant 1
buick 1
derain 1
monsieurs 1
montand 1
heist 1
perls 1
psychotherapy 1
donovan 1
paulina 1
porizkova 1
emlyn 1
admiring 1
dodd 1
funding 1
erskine 1
arthuritative 1
taxfree 1
withdrawals 1
tubeway 1
gaflach 1
seniority 1
wiedlin 1
goblin 1
dowds 1
teethfact 1
lossmt 

facade 1
supersport 1
ninja 1
zx14 1
tussle 1
portly 1
crank 1
segels 1
breakup 1
dessau 1
oldtime 1
jeong 1
1703 1
holiest 1
odoriferous 1
accurately 1
confound 1
chancre 1
insectivore 1
corulers 1
rivals 1
jackal 1
lather 1
rinse 1
undetected 1
presumed 1
colts 1
hallucinogenic 1
exdiplomat 1
rove 1
plame 1
forfeit 1
grief 1
jowls 1
caramba 1
conquistador 1
girdle 1
chancellors 1
helter 1
skelter 1
shouldve 1
incubation 1
3040 1
dollpuss 1
puddnhead 1
hrefhttpwwwjarchivecommedia20100528dj01jpg 1
shipa 1
roskildethe 1
mend 1
yikes 1
indigenous 1
australians 1
initiatory 1
hrefhttpwwwjarchivecommedia20100528dj02jpg 1
imaid 1
mistia 1
atlantabased 1
zealander 1
paksu 1
chipping 1
seedeating 1
ticks 1
hrefhttpwwwjarchivecommedia20100528dj03jpg 1
aislea 1
tsukijia 1
gasp 1
foyer 1
yenisey 1
millionsqmi 1
northcentral 1
ixodes 1
dammini 1
whitefooted 1
whitetailed 1
hrefhttpwwwjarchivecommedia20100528dj04jpg 1
intersectiona 1
thrill 1
warao 1
shamanic 1
hrefhttpwwwjarchivecommedia20100528d

mckellen 1
lotr 1
mcewan 1
keira 1
knightley 1
hrefhttpwwwjarchivecommedia20080520dj20wmvsarah 1
rodere 1
gnaw 1
uppercrust 1
numeral 1
iras 1
charlize 1
theron 1
armorplated 1
thurio 1
catherick 1
javan 1
hrefhttpwwwjarchivecommedia20080520dj30jpg 1
flier 1
lefthanders 1
furnace 1
galbraith 1
indispensable 1
anythingtheres 1
boardroom 1
socked 1
tiled 1
rivarol 1
hrefhttpwwwjarchivecommedia20070115j15jpg 1
filler 1
hrefhttpwwwjarchivecommedia20070115j04jpg 1
georgiaa 1
raffia 1
binder 1
duvalier 1
spingarn 1
secondbusiest 1
hottie 1
hrefhttpwwwjarchivecommedia20070115j07jpg 1
draftsmans 1
toola 1
hrefhttpwwwjarchivecommedia20070115j07bjpg 1
draftsmen 1
preset 1
plug 1
godard 1
morial 1
nora 1
ephron 1
knotty 1
embroidered 1
dulce 1
decorum 1
patria 1
seemly 1
hrefhttpwwwjarchivecommedia20070115j28jpg 1
treehopper 1
galagidae 1
mari 1
zora 1
neale 1
hurston 1
hrefhttpwwwjarchivecommedia20070115dj06jpg 1
winder 1
winders 1
90degree 1
sims 1
genuflect 1
1707 1
eaves 1
masked 1
furbearer 

rightwing 1
hrefhttpwwwjarchivecommedia20080416dj15jpg 1
steep 1
1620s 1
izzie 1
swearingin 1
pelosi 1
linoleum 1
reintroduced 1
trimethylxanthine 1
checkout 1
districts 1
replayed 1
sideways 1
handspring 1
elbows 1
hrefhttpwwwjarchivecommedia19841204j18mp3the 1
induces 1
successive 1
8track 1
alligator 1
nonchocolate 1
blondies 1
supremacy 1
bumblen 1
congressmen 1
apiculturist 1
unifying 1
honeycreeper 1
vibes 1
deported 1
predominanty 1
leaping 1
baldric 1
porthos 1
handkerchief 1
aramis 1
stratocaster 1
blackie 1
959500 1
reddi 1
hrefhttpwwwjarchivecommedia20111024j26jpg 1
hrefhttpwwwjarchivecommedia20111024j26ajpg 1
targetblankparmesan 1
cheesea 1
hrefhttpwwwjarchivecommedia20111024j26bjpg 1
volcanos 1
hrefhttpwwwjarchivecommedia20111024j27jpg 1
hrefhttpwwwjarchivecommedia20111024j27ajpg 1
targetblankbread 1
enthused 1
peles 1
hrefhttpwwwjarchivecommedia20111024j11jpg 1
4700 1
outlets 1
hrefhttpwwwjarchivecommedia20111024j28jpg 1
tuscan 1
microclimates 1
hrefhttpwwwjarchivecommedi

shorttailed 1
rottnest 1
imper 1
eggy 1
hrefhttpwwwjarchivecommedia20080926dj04jpg 1
branagh 1
shingleback 1
skink 1
hamas 1
palestinian 1
backsides 1
manatee 1
sirenia 1
aroma 1
closets 1
ascend 1
hrefhttpwwwjarchivecommedia20050323j02jpg 1
smugglers 1
jangle 1
followin 1
preakness 1
shoebill 1
whaleheaded 1
deliverers 1
500000member 1
criticize 1
kinshasa 1
guamaninan 1
hrefhttpwwwjarchivecommedia20050323j08jpg 1
confiscated 1
shame 1
scrub 1
hoochie 1
nide 1
torched 1
chertoffs 1
admin 1
brasilia 1
belmopan 1
zodiacs 1
tariffs 1
hrefhttpwwwjarchivecommedia20050323j17jpg 1
fieldif 1
rangoon 1
barrets 1
wimpole 1
circuses 1
symobol 1
juggling 1
lovelier 1
hrefhttpwwwjarchivecommedia20050323dj02jpg 1
freeholdborn 1
whitewater 1
trampled 1
underfoot 1
pointless 1
hrefhttpwwwjarchivecommedia20050323dj07jpg 1
targetblankthosea 1
exnixon 1
maladythis 1
incompetence 1
jeffers 1
hrefhttpwwwjarchivecommedia20050323dj13jpg 1
coroners 1
inquest 1
hrefhttpwwwjarchivecommedia20050323dj16mp3herea 

hrefhttpwwwjarchivecommedia20080408j10ajpg 1
targetblankrelativea 1
choreographing 1
hrefhttpwwwjarchivecommedia20080408j26jpg 1
targetblankeastern 1
shorea 1
delmarva 1
hrefhttpwwwjarchivecommedia20080408j26ajpg 1
poltergeist 1
totenberg 1
reopen 1
cornus 1
carpathian 1
hrefhttpwwwjarchivecommedia20080408dj18jpg 1
hrefhttpwwwjarchivecommedia20080408dj18ajpg 1
affix 1
harajuku 1
mcdavid 1
detachable 1
lampoons 1
hrefhttpwwwjarchivecommedia20080408dj19jpg 1
hrefhttpwwwjarchivecommedia20080408dj19ajpg 1
targetblankfootball 1
helmeta 1
uttering 1
hrefhttpwwwjarchivecommedia20080408dj20jpg 1
platea 1
hrefhttpwwwjarchivecommedia20080408dj20ajpg 1
targetblankkutania 1
superseded 1
naca 1
aeronautics 1
hrefhttpwwwjarchivecommedia20080408dj21jpg 1
hrefhttpwwwjarchivecommedia20080408dj21ajpg 1
mollet 1
evaluate 1
udon 1
buckwheatflour 1
assertion 1
hrefhttpwwwjarchivecommedia20080408dj24jpg 1
menoraha 1
hrefhttpwwwjarchivecommedia20080408dj24ajpg 1
menorah 1
hrefhttpwwwjarchivecommedia20080408d

embattled 1
glued 1
flapsticks 1
survives 1
kross 1
kreated 1
kraze 1
klothes 1
inferno 1
traywick 1
riverboat 1
lawford 1
greer 1
garson 1
rognoni 1
cervello 1
kirsch 1
wholesale 1
ladyfingers 1
fingershaped 1
gauls 1
kiefer 1
deforested 1
schonbrunn 1
slash 1
emissions 1
bahamian 1
roddolfo 1
adbandoned 1
258foottall 1
australianamerican 1
senta 1
canisters 1
alphatrack 1
detectors 1
bolton 1
frenchcanadian 1
hearth 1
narrate 1
moneymaking 1
bullfighting 1
pyrotechnics 1
destinations 1
earns 1
tackling 1
hamlisch 1
streisandredford 1
rockthrower 1
herrmann 1
hrefhttpwwwjarchivecommedia19980402j03wmvherea 1
hearst 1
seawolf 1
humanpowered 1
hrefhttpwwwjarchivecommedia19980402j24wmvherea 1
horners 1
prudential 1
skywalk 1
darts 1
beetlejuice 1
wees 1
nicene 1
vegetarian 1
ordains 1
clerks 1
overseers 1
smear 1
slander 1
sans 1
choked 1
motorists 1
frasiers 1
sideshow 1
kelsey 1
grammer 1
siddur 1
mahzor 1
lists 1
windiest 1
moes 1
garb 1
hrefhttpwwwjarchivecommedia19980402dj26jpgherea 

kryptonite 1
poteen 1
hrefhttpwwwjarchivecommedia20070305dj23jpg 1
hrefhttpwwwjarchivecommedia20070305dj23ajpg 1
mustache 1
mustax 1
debts 1
thesenot 1
styron 1
hrefhttpwwwjarchivecommedia20070305dj28jpg 1
freestanding 1
plinth 1
cartier 1
balked 1
laurence 1
sternes 1
tristram 1
petain 1
yakov 1
smirnoff 1
psych 1
hrefhttpwwwjarchivecommedia20070305dj25jpg 1
hrefhttpwwwjarchivecommedia20070305dj25ajpg 1
telfair 1
dufourspitze 1
hrefhttpwwwjarchivecommedia20060601j02jpg 1
hrefhttpwwwjarchivecommedia20060601j02ajpg 1
mcconnell 1
gallega 1
bridgestone 1
hrefhttpwwwjarchivecommedia20060601j03jpg 1
hrefhttpwwwjarchivecommedia20060601j18jpg 1
hrefhttpwwwjarchivecommedia20060601j18ajpg 1
hanford 1
selfdescribed 1
amsterdambased 1
hrefhttpwwwjarchivecommedia20060601j04jpg 1
townsend 1
hrefhttpwwwjarchivecommedia20060601j08jpg 1
hrefhttpwwwjarchivecommedia20060601j08wmvthesea 1
salvage 1
hrefhttpwwwjarchivecommedia20060601j05jpg 1
hrefhttpwwwjarchivecommedia20060601j05ajpg 1
targetblankmaska 1

gneiss 1
ciotte 1
hrefhttpwwwjarchivecommedia20061124dj08mp3herea 1
garps 1
lupus 1
tanner 1
buttermaker 1
nankipoo 1
shreds 1
schoolhouse 1
gambles 1
wonderboykaput 1
maputo 1
unconditional 1
ascribe 1
predetermined 1
characteristics 1
3footlong 1
9foot 1
spied 1
insult 1
outie 1
gunnison 1
pottos 1
tarsiers 1
madagascar 1
pushers 1
pleshette 1
pecked 1
goon 1
bunnies 1
leverets 1
marnie 1
bandmate 1
pillowed 1
adherants 1
prolongs 1
zenodotus 1
strewn 1
coffeecake 1
crumbly 1
ballot 1
hrefhttpwwwjarchivecommedia19980617dj26mp3herea 1
iwake 1
workmans 1
ballparks 1
stacks 1
consisted 1
iommi 1
geezer 1
ballandclaw 1
eratosthenes 1
fiendishly 1
eurovision 1
flaws 1
buggies 1
bikes 1
critiqued 1
provisional 1
therapeutic 1
lipsticksporting 1
17801830 1
furiniture 1
bertolt 1
brechts 1
threepenny 1
presbytery 1
halsman 1
inhalation 1
tetrachloride 1
boozing 1
dirge 1
hrefhttpwwwjarchivecommedia20081222j26jpg 1
targetblankliliuokalania 1
hrefhttpwwwjarchivecommedia20081222j26ajpg 1
target

ypsilanti 1
eisenach 1
reformer 1
wartburg 1
hrefhttpwwwjarchivecommedia20071106j23wmvherea 1
hacksaw 1
bechamel 1
materialism 1
bobo 1
1729 1
binghi 1
hrefhttpwwwjarchivecommedia20071106dj06jpgalex 1
picchua 1
hrefhttpwwwjarchivecommedia20071106dj06ajpg 1
targetblankthat 1
ledgea 1
aligns 1
mcentires 1
18month 1
1667 1
gravitation 1
oreos 1
janeites 1
hrefhttpwwwjarchivecommedia20071106dj07jpgalex 1
hrefhttpwwwjarchivecommedia20071106dj07ajpg 1
targetblankhiram 1
binghama 1
affectionately 1
undiscovered 1
ornaments 1
kidds 1
rails 1
hrefhttpwwwjarchivecommedia20071106dj08jpgalex 1
hrefhttpwwwjarchivecommedia20071106dj08ajpg 1
lovett 1
tuba 1
refute 1
uncertainty 1
hrefhttpwwwjarchivecommedia20071106dj23jpg 1
zoya 1
hrefhttpwwwjarchivecommedia20071106dj09jpgalex 1
sedona 1
hrefhttpwwwjarchivecommedia20071106dj09ajpg 1
mckechnie 1
cassie 1
damboise 1
1628 1
alzheimers 1
disapprovingly 1
wagonmaker 1
hrefhttpwwwjarchivecommedia20071106dj10jpgalex 1
talons 1
hrefhttpwwwjarchivecommedia200

bloomberg 1
activitiesone 1
awaytheres 1
encore 1
foisthe 1
manoa 1
arranges 1
embargo 1
mentors 1
mariology 1
cousine 1
crooning 1
hrefhttpwwwjarchivecommedia20080509j24jpg 1
sather 1
californiaberkeleya 1
hrefhttpwwwjarchivecommedia20080509j24ajpg 1
targetblankfiat 1
luxa 1
pushes 1
boomers 1
daccord 1
hairdos 1
metaphorically 1
lending 1
parlezvous 1
allemand 1
hrefhttpwwwjarchivecommedia20080509dj17jpg 1
hrefhttpwwwjarchivecommedia20080509dj17ajpg 1
pressed 1
nittany 1
lovin 1
litter 1
faraway 1
evenly 1
stadiums 1
buckeyes 1
hrefhttpwwwjarchivecommedia20080509dj02jpg 1
hrefhttpwwwjarchivecommedia20080509dj28jpg 1
hrefhttpwwwjarchivecommedia20080509dj19jpg 1
shorten 1
hrefhttpwwwjarchivecommedia20080509dj19ajpg 1
fern 1
warts 1
corkscrew 1
reworked 1
kroc 1
hrefhttpwwwjarchivecommedia20080509dj14jpg 1
hrefhttpwwwjarchivecommedia20080509dj30jpg 1
hrefhttpwwwjarchivecommedia20080509dj21jpg 1
illustrate 1
hrefhttpwwwjarchivecommedia20080509dj21ajpg 1
newtonian 1
marcy 1
vestments 1
pl

droplets 1
bakery 1
creamy 1
cakelike 1
unicorn 1
drey 1
teaberry 1
shrivers 1
hrefhttpwwwjarchivecommedia20110712dj25jpg 1
hrefhttpwwwjarchivecommedia20110712dj25ajpg 1
byre 1
booklet 1
punches 1
bootlegger 1
dampier 1
inhabitant 1
221b 1
hrefhttpwwwjarchivecommedia20091117j30wmvalex 1
4972a 1
muppeteer 1
clash 1
giggle 1
brangus 1
valid 1
melvin 1
dummars 1
googlyeyed 1
gobbling 1
hermes 1
starving 1
heartland 1
scramble 1
squabble 1
prostitute 1
serum 1
accidentwhoa 1
spleen 1
budds 1
ethelreds 1
hrefhttpwwwjarchivecommedia20091117j12jpg 1
targetblankthereas 1
intestate 1
sweatinducing 1
bath 1
hrefhttpwwwjarchivecommedia20091117dj01jpg 1
sauteed 1
cambrai 1
meck 1
preferred 1
symbolic 1
hrefhttpwwwjarchivecommedia20091117dj20jpg 1
outgunned 1
sovietmade 1
ludovico 1
sforza 1
grandest 1
oncenomadic 1
hrefhttpwwwjarchivecommedia20091117dj03jpg 1
lawrys 1
misnomer 1
keni 1
keto 1
shinhoto 1
chiha 1
kessler 1
hrefhttpwwwjarchivecommedia20091117dj27jpg 1
alvar 1
aalto 1
tartan 1
marinad

qaywayn 1
iskenderun 1
islahiye 1
fessenden 1
abbreviations 1
omitted 1
equally 1
offenders 1
clayand 1
occupational 1
pastorelli 1
cracker 1
eldin 1
hrefhttpwwwjarchivecommedia19991217j21jpg 1
alexakis 1
evercleara 1
alienated 1
semicolon 1
doldrums 1
yarmulke 1
mid80s 1
geddes 1
ellie 1
glidin 1
onehorse 1
paragraphs 1
compost 1
penzance 1
morel 1
gorshin 1
pirating 1
trumpets 1
cursus 1
hedgehog 1
hemiechinus 1
sensory 1
toasted 1
eureka 1
hrefhttpwwwjarchivecommedia20060725j01mp3thisa 1
nasal 1
hrefhttpwwwjarchivecommedia20060725j02mp3thisa 1
offically 1
cheneys 1
flecked 1
nationsnot 1
irelandsfirst 1
1142 1
hrefhttpwwwjarchivecommedia20060725j03mp3herea 1
offenbachs 1
genevieve 1
hrefhttpwwwjarchivecommedia20060725j18jpg 1
setter 1
hrefhttpwwwjarchivecommedia20060725j18ajpg 1
targetblankstylea 1
mozambique 1
decolletage 1
fortuitously 1
hrefhttpwwwjarchivecommedia20060725j05mp3herea 1
deacon 1
2masted 1
rigged 1
hrefhttpwwwjarchivecommedia20060725j06mp3followinga 1
proven 1
terps

ulcer 1
ranitidine 1
hydrochloride 1
smedley 1
peel 1
hrefhttpwwwjarchivecommedia20040707dj16jpg 1
targetblankmerlina 1
victorians 1
invincible 1
planner 1
immunosuppressant 1
transplants 1
fellowsoldiers 1
ivanhoe 1
uncover 1
2340mile 1
talleyrand 1
intriguing 1
sawatch 1
epperson 1
lemonade 1
hatchetwielding 1
smasher 1
hatchetface 1
ortega 1
kisses 1
113mile 1
pesticides 1
oleancontaining 1
unmake 1
licentious 1
pence 1
westphall 1
ehrlich 1
pfennigs 1
unwanted 1
meanie 1
spaghettini 1
mcdermott 1
1670s 1
ruccolo 1
traylor 1
kopecks 1
firewood 1
bind 1
gala 1
berkshires 1
darned 1
urich 1
centessimi 1
squaredoff 1
eartoear 1
biblicalera 1
lovd 1
wisely 1
participants 1
1504 1
hrefhttpwwwjarchivecommedia20060127j20jpg 1
replicas 1
trophies 1
hrefhttpwwwjarchivecommedia20060127j20ajpg 1
targetblank97 1
mastersa 1
hrefhttpwwwjarchivecommedia20060127j20bjpg 1
targetblank2000 1
opena 1
manley 1
abercrombie 1
zambia 1
noyes 1
excommunicated 1
hrefhttpwwwjarchivecommedia20060127j21jpg 1
hr

There are a little under 20,000 questions in the dataset, and no single term comes up with great regularity. However, we can pick out some of the more common terms from the above, namely: city, country, state, film, crew, played, president, american, french, king, song, and so on. These point to broad categories like history, geography, music, and sports/games.

Each of those words appears in at least 1% of the questions in the dataset. That means any one of them is still unlikely to appear in a given question, but they're better bets for study than most other terms.
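To make the "share of questions" idea concrete, here is a minimal sketch of how such a share could be computed. The `question_share` helper and the toy questions are hypothetical and only stand in for the cleaned Question column; they are not taken from jeopardy.csv.

```python
# Toy stand-in for the cleaned Question column (hypothetical rows, not from jeopardy.csv).
questions = [
    "this city is the capital of france",
    "this president signed the act",
    "the largest city in this state",
    "this king ruled england",
]

def question_share(word, questions):
    """Return the fraction of questions that contain `word` as a whole token."""
    hits = sum(1 for q in questions if word in q.split())
    return hits / len(questions)

city_share = question_share("city", questions)
king_share = question_share("king", questions)
print(city_share, king_share)
```

Assuming the Question column has already been cleaned as above, the same helper should work on the real data by passing `data["Question"]` in place of the toy list.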

### Maximizing Question Value

Let's opt for a different strategy now. We want to get the most value out of our studying hours, so let's direct them towards categories or questions that have a high monetary value.

We can get a rough sense of this by looking at how individual words from our terms_used set are distributed across high and low value questions. We'll begin by classifying each question as either high or low value. Next, we'll define a function that counts how many times a word appears in high and low value questions respectively. We'll then apply this function to a small selection of words, giving us two observed counts for each word. Finally, we'll compare these observed counts with the expected counts for each word using a chi-squared test. The results of this test will indicate whether there is an association between the word and the value of the question.

In [90]:
def find_worth(row):
    # Flag questions worth more than $800 as high value (1), all others as low value (0).
    return 1 if row["Value"] > 800 else 0

data["high_value"] = data.apply(find_worth, axis=1)

In [97]:
def word_worth(word):
    # Count the high and low value questions that contain `word` as a whole token.
    low_count = 0
    high_count = 0
    for i, row in data.iterrows():
        if word in row["Question"].split():
            if row["high_value"] == 1:
                high_count += 1
            else:
                low_count += 1
    return high_count, low_count

# To make computation quicker, we'll only take the first 5 terms from terms_used.
terms = list(terms_used)[:5]
observed_counts = []
for term in terms:
    observed_counts.append(word_worth(term))

observed_counts

[(0, 2), (0, 1), (1, 1), (0, 1), (0, 1)]
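Scanning every row once per word is what makes this step slow. As a rough alternative, the counts for all words can be gathered in a single pass over the questions. The sketch below uses hypothetical toy rows and a hypothetical `word_worth_fast` helper; it is one possible approach, not the method used in this notebook.

```python
from collections import Counter

# Toy (question, high_value) pairs standing in for the real rows (hypothetical data).
rows = [
    ("this city is large", 1),
    ("this king ruled", 0),
    ("the city of light", 0),
]

high_counts = Counter()
low_counts = Counter()
for question, is_high in rows:
    # Use a set so each word is counted once per question,
    # matching the `word in question.split()` check.
    words = set(question.split())
    if is_high == 1:
        high_counts.update(words)
    else:
        low_counts.update(words)

def word_worth_fast(word):
    """Return (high_count, low_count) for `word` from the precomputed counters."""
    return high_counts[word], low_counts[word]

city_counts = word_worth_fast("city")
king_counts = word_worth_fast("king")
print(city_counts, king_counts)
```

On the real data, the pairs could come from `zip(data["Question"], data["high_value"])`, after which looking up any number of terms costs almost nothing.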

We now have the high and low value counts for each of the five words we've selected. Next, we need to find the expected counts so that we can work out whether any of these results are significant.
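The expected counts follow from a simple proportion: if a word is unrelated to question value, its appearances should split between high and low value questions in the same ratio as the dataset as a whole. A small worked sketch, using made-up totals (20 high value and 60 low value questions) and a hypothetical `expected_counts` helper:

```python
def expected_counts(observed_high, observed_low, high_total, low_total):
    """Split a word's total appearances in proportion to the overall high/low totals."""
    total = observed_high + observed_low          # times the word appears overall
    share = total / (high_total + low_total)      # fraction of all questions containing it
    return share * high_total, share * low_total

# Hypothetical numbers: the word appears in 8 high and 12 low value questions,
# in a dataset of 20 high and 60 low value questions.
exp_high, exp_low = expected_counts(8, 12, high_total=20, low_total=60)
print(exp_high, exp_low)
```

Here the word appears in a quarter of all questions, so we would expect it in a quarter of the high value questions (5) and a quarter of the low value ones (15). The two expected counts always add up to the word's observed total.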

In [103]:
import numpy as np
from scipy.stats import chisquare

high_value_count = data[data["high_value"] == 1].shape[0]
low_value_count = data[data["high_value"] == 0].shape[0]

chi_squared = []

for observed in observed_counts:
    total = observed[0] + observed[1] # Number of times each word appears in the dataset
    total_proportion = total / data.shape[0] # Proportion of times each word appears in dataset
    high_value_expected = total_proportion * high_value_count
    low_value_expected = total_proportion * low_value_count
    
    observed = np.array([observed[0], observed[1]])
    expected = np.array([high_value_expected, low_value_expected])
    
    chi_squared.append(chisquare(observed, expected))

chi_squared
# "statistic" below is the chi-squared value.

[Power_divergenceResult(statistic=0.803925692253768, pvalue=0.3699222378079571),
 Power_divergenceResult(statistic=0.401962846126884, pvalue=0.5260772985705469),
 Power_divergenceResult(statistic=0.4448774816612795, pvalue=0.5047776487545996),
 Power_divergenceResult(statistic=0.401962846126884, pvalue=0.5260772985705469),
 Power_divergenceResult(statistic=0.401962846126884, pvalue=0.5260772985705469)]

All of the p-values for these words are well above 0.05 (or 5%), the conventional threshold for statistical significance. In other words, if a word's appearances were split between high and low value questions purely by chance, we would quite often see counts at least as lopsided as the ones we observed. As such, we have no evidence of an association between the words we've investigated and the value of the question.

Similarly, the chi-squared statistics are all low, meaning there is little difference between the observed and expected counts, which is consistent with there being no association between our selected words and the value of the question.

Now, it should be noted that chi-squared tests become less reliable as the sample size decreases; a common rule of thumb is that every expected count should be at least 5. Each word that we've selected appears only once or twice, so our expected counts fall far below that. This makes chi-squared an almost useless measure in this situation. As expected, we're unable to conclude much from this line of investigation.
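One way to avoid this problem would be to screen words before testing them, keeping only those whose expected counts are large enough for the test to mean something. The sketch below applies the common "expected count of at least 5" rule of thumb. The `has_enough_data` helper, the threshold parameter, and the totals of 300 high and 700 low value questions are all assumptions for illustration.

```python
def has_enough_data(observed, high_total, low_total, min_expected=5):
    """Return True if both expected counts reach the rule-of-thumb minimum."""
    total = observed[0] + observed[1]
    share = total / (high_total + low_total)
    expected_high = share * high_total
    expected_low = share * low_total
    return expected_high >= min_expected and expected_low >= min_expected

# Hypothetical (high_count, low_count) pairs for a rare word and a common word.
rare_word = (0, 2)
common_word = (40, 60)

rare_ok = has_enough_data(rare_word, high_total=300, low_total=700)
common_ok = has_enough_data(common_word, high_total=300, low_total=700)
print(rare_ok, common_ok)
```

A word seen only twice fails the check, while one seen 100 times passes. Filtering the terms this way before running the test would leave far fewer words, but the resulting p-values would be much easier to trust.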

### Conclusion

To conclude, our best strategy appears to be either studying past Jeopardy questions or studying the specific categories that Jeopardy questions tend to fall into.