## Set up Wikipedia API

In [2]:
!pip install wikipedia

Collecting wikipedia
  Using cached https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py): started
  Building wheel for wikipedia (setup.py): finished with status 'done'
  Stored in directory: C:\Users\tiffl\AppData\Local\pip\Cache\wheels\87\2a\18\4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0




In [26]:
import wikipedia
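# workaround for the SSL certificate error: the lines below silence the
# insecure-request warning and clear the CA bundle, which appears to turn off
# certificate verification rather than fix it (avoid outside of local testing)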
import os
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
os.environ['CURL_CA_BUNDLE'] = ""
os.environ['PYTHONWARNINGS']="ignore:Unverified HTTPS request"

## Testing Wikipedia API Functions

In [31]:
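# search returns a list of matching page titles
print(wikipedia.search("Biden"))
# note: a microsecond-scale timing suggests repeat calls are served from a cache,
# not from a fresh network request
%timeit wikipedia.search("Biden")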

['Joe Biden', 'Hunter Biden', 'Jill Biden', 'Biden–Ukraine conspiracy theory', 'Family of Joe Biden', 'Beau Biden', 'Ashley Biden', 'Presidency of Joe Biden', 'Neilia Hunter Biden', 'Cabinet of Joe Biden']
1.01 µs ± 33.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [36]:
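# get the summary for a search term
# sentences=2 limits the summary to the first two sentences
# (a period ends a sentence, so abbreviations like "Jr." can shorten the result)
result = wikipedia.summary("Biden", sentences=2)
# WARNING: an ambiguous term raises a DisambiguationError (e.g., "corona": the beer? Corona Borealis? a song?)
# slight misspellings appear to still work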

print(result)

Joseph Robinette Biden Jr. ( BY-dən; born November 20, 1942) is an American politician who is the 46th and current president of the United States.
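
The disambiguation warning above could be handled along these lines. This is a minimal sketch, not part of the original notebook: `safe_summary` and its `wiki` parameter are hypothetical names, and the `wiki` argument exists only so the helper can be tried with a stand-in object instead of the real library. It assumes the library exposes `wikipedia.exceptions.DisambiguationError` (with an `options` list) and `wikipedia.exceptions.PageError`.

```python
def safe_summary(name, sentences=2, wiki=None):
    """Return a short summary for `name`, or None if no single page matches."""
    if wiki is None:
        import wikipedia as wiki  # the real library, imported lazily
    try:
        return wiki.summary(name, sentences=sentences)
    except wiki.exceptions.DisambiguationError as err:
        # err.options holds the candidate page titles for an ambiguous query
        print(f"{name!r} is ambiguous; first candidates: {err.options[:5]}")
    except wiki.exceptions.PageError:
        print(f"No page found for {name!r}")
    return None
```

Calling `safe_summary("corona")` would then print the candidate titles and return `None` instead of stopping the notebook.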


In [19]:
"""The page() function returns a WikipediaPage object that exposes the content, categories,
coordinates, images, links and other metadata of a Wikipedia page."""

# wikipedia page object is created
page_object = wikipedia.page("Biden")
 
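# html is a method: without parentheses this prints only the bound method;
# call page_object.html() to get the actual HTML of the page
print(page_object.html)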
 
# printing title
print(page_object.original_title)
 
# printing links on that page object
print(page_object.links[0:10])

<bound method WikipediaPage.html of <WikipediaPage 'Joe Biden'>>
Joe Biden
['100 Days Masking Challenge', '100th United States Congress', '101st United States Congress', '102nd United States Congress', '103rd United States Congress', '104th United States Congress', '105th United States Congress', '106th United States Congress', '107th United States Congress', '108th United States Congress']


## Using Wikipedia API on Speaker Names

In [51]:
def find_speaker(name, exact=True):
    """Given a speaker name (str), return the first two sentences of its Wikipedia summary.

    Extracting the speaker type from this text is still to be done.
    """
    if not exact:
        # assume the first search result is the page we want
        # (raises IndexError if the search returns no results)
        name = wikipedia.search(name)[0]
    # the API counts a period as the end of a sentence
    first_sentences = wikipedia.summary(name, sentences=2)
    return first_sentences

find_speaker("President Trump")

"Donald John Trump (born June 14, 1946) is an American politician, media personality, and businessman who served as the 45th president of the United States from 2017 to 2021.\nBorn and raised in Queens, New York City, Trump graduated from the Wharton School of the University of Pennsylvania with a bachelor's degree in 1968."

In [52]:
%timeit find_speaker("President Trump")

1.6 µs ± 57.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
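
The microsecond timing above suggests the repeated calls are answered from memory, not by a new request. If lookups should also be reused across sessions, one option is an explicit store on disk. The following is a rough sketch under assumptions: `cached_lookup`, its `fetch` argument, and the file name `speaker_cache.json` are all hypothetical and do not appear in the original notebook.

```python
import json
from pathlib import Path


def cached_lookup(name, fetch, cache_path="speaker_cache.json"):
    """Return fetch(name), reusing a result stored on disk when one exists."""
    path = Path(cache_path)
    # load previously stored results, if any
    cache = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    if name not in cache:
        cache[name] = fetch(name)  # only call the lookup function on a miss
        path.write_text(json.dumps(cache, ensure_ascii=False, indent=2), encoding="utf-8")
    return cache[name]
```

With the function defined earlier, a call might look like `cached_lookup("President Trump", find_speaker)`. Rewriting the whole file on every miss is simple but would become slow for a large number of names.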


In [62]:
#testing a variety of names
print(find_speaker("Biden"))
print("\n",find_speaker("Senator John Barrasso"))
print("\n",find_speaker("Sen Roger Wicker"))
print("\n",find_speaker("Kathleen H. Hicks"))
print("\n",find_speaker("Navy Adm. Christopher W."))

Joseph Robinette Biden Jr. ( BY-dən; born November 20, 1942) is an American politician who is the 46th and current president of the United States.

 John Anthony Barrasso III ( bə-RAH-soh; born July 21, 1952) is an American physician and politician serving as the senior United States senator from Wyoming. A member of the Republican Party, he previously served in the Wyoming State Senate.

 Roger Frederick Wicker (born July 5, 1951) is an American attorney and politician serving as the senior United States senator from Mississippi, in office since 2007. A member of the Republican Party, Wicker previously served as a member of the United States House of Representatives and the Mississippi State Senate.

 Kathleen Holland Hicks (born September 25, 1970) is an American government official who has served as the United States deputy secretary of defense since February 9, 2021, where she will lead the modernization of the country's nuclear triad. Hicks is the first Senate-confirmed woman in t

Considerations:

- How should the speaker type be extracted from the Wikipedia summary: regex, NER, or keyword matching?
- Is a Wikipedia lookup worth it compared to extracting the speaker type from the article itself? (speed concerns)
- What are the speaker types: broad categories or specific titles?
- Should Wikipedia results be stored, so that the stored list is checked first before querying the API?
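
As one possible starting point for the keyword option, here is a small sketch that maps a summary string to a broad category. The categories and keyword lists in `SPEAKER_TYPES` are invented for illustration and are not a decided taxonomy. `classify_speaker` is a hypothetical helper; regex or NER approaches may well work better.

```python
import re

# Hypothetical categories and keywords, checked in this order
SPEAKER_TYPES = {
    "politician": ["president", "senator", "politician", "representative", "governor"],
    "military": ["admiral", "general", "navy", "army", "officer"],
    "government official": ["government official", "secretary"],
}


def classify_speaker(summary):
    """Return the first speaker type with a keyword found in the summary text."""
    text = summary.lower()
    for speaker_type, keywords in SPEAKER_TYPES.items():
        for keyword in keywords:
            # match whole words only, so "officer" does not match "official"
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                return speaker_type
    return "unknown"
```

Because the first matching category wins, the order of `SPEAKER_TYPES` matters. A summary mentioning both "senator" and "secretary" would be labeled "politician" here, which may or may not be the desired behavior.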