Shouldn't be getting 403 from Wikipedia anymore

commit 30357bc64b7e134a52c04c9633e7dcaeaa31ca1b 1 parent 8f64ddc
Michael Chen authored December 07, 2011

Showing 1 changed file with 6 additions and 1 deletion.

trivialpursuitfunctions.py
@@ -84,8 +84,13 @@ def getAnswerKeywords(answers):
 
 def getTokens(urls):
     combinedtokens = []
+
+    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
+    headers={'User-Agent':user_agent,}
+
     for url in urls:
-        html = urlopen(url).read()
+        req = urllib2.Request(url,None,headers)
+        html = urlopen(req).read()
         raw = nltk.clean_html(html)
         combinedtokens += nltk.word_tokenize(raw)
     # may need to adjust 2 value here depending on answer choices
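
The change above works by sending an explicit User-Agent header with each request, presumably because Wikipedia was rejecting urllib2's default one with a 403. A minimal sketch of the same idea follows, assuming Python 3, where `urllib2.Request` lives at `urllib.request.Request`. The names `DEFAULT_USER_AGENT`, `build_request`, and `fetch_html` are hypothetical and not part of this repository, and the user agent string is a placeholder rather than the browser string used in the commit.

```python
import urllib.request

# Hypothetical placeholder; not the browser string from the commit.
DEFAULT_USER_AGENT = "trivia-helper/0.1 (example script)"


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Return a Request for `url` that carries an explicit User-Agent header."""
    return urllib.request.Request(url, data=None, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Fetch `url` and return the response body as bytes (performs network I/O)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Building the request separately from fetching it keeps the header logic checkable without network access. Inside `getTokens`, a call such as `fetch_html(url)` would stand in for `urlopen(req).read()`.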
