Commit
Merge branch 'master' of github.com:chriskelvinlee/trivial_pursuit
chriskelvinlee committed Dec 9, 2011
2 parents 2ba56a2 + 30357bc, commit f8d65c2
Showing 1 changed file with 6 additions and 1 deletion.
trivialpursuitfunctions.py: 7 changes (6 additions, 1 deletion)
@@ -84,8 +84,13 @@ def getAnswerKeywords(answers):

 def getTokens(urls):
     combinedtokens = []
 
+    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
+    headers={'User-Agent':user_agent,}
+
     for url in urls:
-        html = urlopen(url).read()
+        req = urllib2.Request(url,None,headers)
+        html = urlopen(req).read()
         raw = nltk.clean_html(html)
         combinedtokens += nltk.word_tokenize(raw)
     # may need to adjust 2 value here depending on answer choices
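For comparison, here is a minimal Python 3 sketch of what `getTokens` does after this commit: every page is requested with a browser-like `User-Agent` header instead of urllib's default one (presumably because some sites refuse the default agent; the commit does not say why). The commit itself targets Python 2 (`urllib2`) and `nltk.clean_html`, which NLTK 3 no longer supports, so in this sketch the standard-library `html.parser` and a simple regex tokenizer stand in for `nltk.clean_html` and `nltk.word_tokenize`. The names `build_request`, `html_to_text`, `fetch`, and `get_tokens` are hypothetical, and the regex tokenizer is only a rough approximation of NLTK's.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Same browser-like User-Agent string the commit adds.
USER_AGENT = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) '
              'Gecko/2009021910 Firefox/3.0.7')


def build_request(url, user_agent=USER_AGENT):
    """Wrap url in a Request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth inside script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical stand-in for nltk.clean_html: strip markup, keep text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.parts)


def fetch(url):
    """Download url with the custom User-Agent and decode it as UTF-8."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return resp.read().decode('utf-8', errors='replace')


def get_tokens(urls, fetcher=fetch):
    """Concatenate word/punctuation tokens from every page in urls.

    fetcher is injectable so the function can be exercised offline.
    """
    combined = []
    for url in urls:
        text = html_to_text(fetcher(url))
        # Rough tokenizer: runs of word characters, or single punctuation marks.
        combined += re.findall(r"\w+|[^\w\s]", text)
    return combined
```

Passing a fake `fetcher` shows the behavior without touching the network: `get_tokens(['u'], fetcher=lambda u: '<p>Who wrote <b>Hamlet</b>?</p>')` yields `['Who', 'wrote', 'Hamlet', '?']`.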