<a href="https://colab.research.google.com/github/dscoool/dataincontext/blob/main/Wikipedia_API_for_Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Wikipedia API for Python


## In this tutorial, let us understand how to use the Wikipedia API.


![alt text](https://miro.medium.com/max/1400/1*1FHnsWYdcfxoygKxkTdJew.png)

# Introduction

Wikipedia is the world’s largest free encyclopedia, a land full of information. Who hasn’t used Wikipedia at some point in their life? (If you say you haven’t, you are most probably lying.) The Python library called `wikipedia` allows us to easily access and parse data from Wikipedia. In other words, you can use this library as a little scraper that pulls a limited amount of information from Wikipedia. We will see how to do that in this tutorial.




---




# Installation

The first step in using the API is installing it. Because this is an external library and not part of Python’s standard library, type the following command to install it.

* If you are using a [Jupyter notebook](https://colab.research.google.com/notebooks/intro.ipynb), make sure you use the command below with the leading `!`. The `!` tells the notebook environment to run the rest of the line as a shell command rather than as Python code.


In [1]:
!pip install wikipedia

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11678 sha256=fcfe90dd745d2641519d96faaad374b463d6a68734462d265bbb6aa28207a3b4
  Stored in directory: /root/.cache/pip/wheels/5e/b6/c5/93f3dec388ae76edc830cb42901bb0232504dfc0df02fc50de
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0


* If you are using an IDE or editor such as [Microsoft Visual Studio Code](https://code.visualstudio.com/), [PyCharm](https://www.jetbrains.com/pycharm/), or [Sublime Text](https://www.sublimetext.com/3), enter the command below in the terminal:


In [None]:
pip install wikipedia

After you enter the command, in either of the two cases you will see a success message like the one shown below. This indicates that the library was installed successfully.


In [None]:
!pip install wikipedia

Collecting wikipedia
  Downloading https://files.pythonhosted.org/packages/67/35/25e68fbc99e672127cc6fbb14b8ec1ba3dfef035bf1e4c90f78f24a80b7d/wikipedia-1.4.0.tar.gz
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-cp36-none-any.whl size=11686 sha256=d0d5cc5f62e177020a96252ea5991ec3839cb9e4a302f3d217426ce0a2c406d5
  Stored in directory: /root/.cache/pip/wheels/87/2a/18/4e471fd96d12114d16fe4a446d00c3b38fb9efcb744bd31f4a
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
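
If you would rather confirm the installation from Python than read through pip's log, one option is to ask the import system whether the package can be found. This is only a minimal sketch using the standard library; `is_installed` is a helper name made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    # find_spec returns None when no importable module has that name.
    return importlib.util.find_spec(package_name) is not None


# Should print True once `pip install wikipedia` has succeeded.
print(is_installed("wikipedia"))
```

The check only looks the package up; it does not import it or contact Wikipedia.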




---



# Search and Suggestion

Now let us look at some of the built-in functions provided by the Wikipedia API. The first two are search and suggestion. Their names already give a good idea of what they do.

## Search

The `search` function returns the search results for a query. Just like many other sites, Wikipedia has its own search engine, which you can try at the link below:

[Wikipedia Search](https://en.wikipedia.org/w/index.php?search)

Now let us see how to retrieve the search results of a query using Python. I will use **“Human”** as the query in the examples that follow. Before you can use the API, you first need to import it.


In [3]:
import wikipedia
print(wikipedia.search("Human"))

['Human', 'Human rights', 'Human sexuality', 'Human body', 'Human trafficking', 'Human evolution', 'Human brain', 'Human resources', 'Human Development Index', 'Human Race']


The list above contains the titles of Wikipedia articles that match the query. If you don’t believe me, go to the search link given above, search for the same topic, and compare the results. Keep in mind that the results can change over time as Wikipedia is updated.


You can refine the search with the parameters `results` and `suggestion`. `results` sets the maximum number of results returned. `suggestion`, if `True`, makes the function return a tuple containing the results and a suggested title (if any).


In [4]:
print(wikipedia.search("Human", results = 5, suggestion = True))

(['Human', 'Human rights', 'Human sexuality', 'Human body', 'Human trafficking'], 'man')
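
Because the call returns a tuple when `suggestion=True`, it is convenient to unpack it into two names right away. The sketch below wraps that in a small helper; the helper name and its `wiki` parameter are assumptions made for this example (the parameter lets you pass a stand-in object instead of the real library, which is imported lazily).

```python
def search_with_suggestion(query, limit=5, wiki=None):
    """Return (titles, suggestion) for a query.

    `wiki` is a hypothetical hook for this example: any object with a
    compatible `search` function works. By default the real library is used.
    """
    if wiki is None:
        import wikipedia as wiki  # lazy import, only needed for real lookups

    # With suggestion=True the library returns a (results, suggestion) tuple.
    titles, suggestion = wiki.search(query, results=limit, suggestion=True)
    return titles, suggestion
```

A call such as `titles, suggestion = search_with_suggestion("Human")` then gives you the list and the suggested title separately; `suggestion` may be `None` when Wikipedia has nothing to suggest.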


## Suggestion

The `suggest` function, as the name suggests, returns the suggested Wikipedia title for the query, or `None` if there isn't one.


In [5]:
print(wikipedia.suggest('Human'))

man




---



# Summary

To get the summary of an article use the **“summary”** method as shown below:


In [6]:
print(wikipedia.summary("Human"))

A man is an adult male human. Prior to adulthood, a male human is referred to as a boy (a male child or adolescent).
Like most other male mammals, a man's genome usually inherits an X chromosome from the mother and a Y chromosome from the father. Sex differentiation of the male fetus is governed by the SRY gene on the Y chromosome. During puberty, hormones which stimulate androgen production result in the development of secondary sexual characteristics, thus exhibiting greater differences between the sexes. These include greater muscle mass, the growth of facial hair and a lower body fat composition. Male anatomy is distinguished from female anatomy by the male reproductive system, which includes the penis, testicles, sperm duct, prostate gland and the epididymis, and by secondary sex characteristics, including a narrower pelvis, narrower hips, and smaller breasts.
Throughout human history, traditional gender roles have often defined and limited men's activities and opportunities. Men 
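
The summary above describes "Man" even though the query was "Human". This is most likely because `summary` has `auto_suggest` enabled by default and swaps the query for the suggested title, which was "man" in the suggestion example earlier. The sketch below switches that off and also shortens the text with the `sentences` parameter. The helper name and the `wiki` hook are assumptions made for this example.

```python
def exact_summary(title, sentences=2, wiki=None):
    """Fetch a short summary for exactly the given title.

    `wiki` is a hypothetical hook for this example; by default the real
    `wikipedia` package is imported and used.
    """
    if wiki is None:
        import wikipedia as wiki  # lazy import, only needed for real lookups

    # auto_suggest=False asks the library not to replace the title
    # with a search suggestion before loading the page.
    return wiki.summary(title, sentences=sentences, auto_suggest=False)
```

Calling `exact_summary("Human")` should then return the first two sentences of the article titled "Human" rather than those of a suggested page.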

But be careful: sometimes you might run into a `DisambiguationError`. It is raised when the query leads to a disambiguation page, that is, when the same word has several different meanings. For example, the word **“bass”** can refer to a fish, a kind of sound, and more. In that case the `summary` function raises an error as shown below.



> **Hint**: Be specific in your approach




In [7]:
print(wikipedia.summary("bass"))



  lis = BeautifulSoup(html).find_all('li')


DisambiguationError: ignored
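
Rather than letting the error stop the notebook, you can catch it and look at the candidate titles it carries in its `options` attribute. The following is a minimal sketch, not the only way to handle it. The helper name, the return shape, and the `wiki` hook are assumptions made for this example.

```python
def summary_or_options(title, max_options=5, wiki=None):
    """Return (summary, []) on success or (None, candidate_titles) if ambiguous.

    `wiki` is a hypothetical hook for this example; by default the real
    `wikipedia` package is imported and used.
    """
    if wiki is None:
        import wikipedia as wiki  # lazy import, only needed for real lookups

    try:
        return wiki.summary(title), []
    except wiki.exceptions.DisambiguationError as err:
        # err.options lists the titles the ambiguous term may refer to.
        return None, list(err.options[:max_options])
```

With this helper, `summary_or_options("bass")` would give back `None` plus a handful of more specific titles, and you can pick one of them for a second call.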

The Wikipedia API also lets us change the language in which we read the articles. All you have to do is set the language you want with `set_lang`. **For any French readers in the house, I will use French as the example.** In the run shown below, “Human” turned out to be ambiguous on the French Wikipedia, so the call raised a `DisambiguationError`.


In [8]:
wikipedia.set_lang("fr")
wikipedia.summary("Human")

DisambiguationError: ignored
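
Keep in mind that `set_lang` changes a module-wide setting, so every later call keeps using French until the language is set again. One way to limit the effect is a small context manager that switches the language and then switches it back. This is only a sketch: the name `temporary_language`, the `restore` default of `"en"`, and the `wiki` hook are assumptions made for this example.

```python
from contextlib import contextmanager


@contextmanager
def temporary_language(prefix, restore="en", wiki=None):
    """Use `prefix` as the Wikipedia language inside a `with` block.

    `wiki` is a hypothetical hook for this example; by default the real
    `wikipedia` package is imported and used.
    """
    if wiki is None:
        import wikipedia as wiki  # lazy import, only needed for real lookups

    wiki.set_lang(prefix)
    try:
        yield wiki
    finally:
        # This sketch does not read back the previous language, so the
        # caller states which one to return to (English by default).
        wiki.set_lang(restore)
```

Inside `with temporary_language("fr") as wiki:` the calls go to the French Wikipedia, and once the block ends the language is set back to English even if an error occurred.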



---



# Languages supported

Now let us see which languages Wikipedia supports, a question that people commonly ask. At the time of writing, Wikipedia supported **444 different languages**. To list them, see the code below:


In [None]:
wikipedia.languages()

{'aa': 'Qafár af',
 'ab': 'Аҧсшәа',
 'abs': 'bahasa ambon',
 'ace': 'Acèh',
 'ady': 'адыгабзэ',
 'ady-cyrl': 'адыгабзэ',
 'aeb': 'تونسي/Tûnsî',
 'aeb-arab': 'تونسي',
 'aeb-latn': 'Tûnsî',
 'af': 'Afrikaans',
 'ak': 'Akan',
 'aln': 'Gegë',
 'als': 'Alemannisch',
 'am': 'አማርኛ',
 'an': 'aragonés',
 'ang': 'Ænglisc',
 'anp': 'अङ्गिका',
 'ar': 'العربية',
 'arc': 'ܐܪܡܝܐ',
 'arn': 'mapudungun',
 'arq': 'جازايرية',
 'ary': 'Maġribi',
 'arz': 'مصرى',
 'as': 'অসমীয়া',
 'ase': 'American sign language',
 'ast': 'asturianu',
 'atj': 'Atikamekw',
 'av': 'авар',
 'avk': 'Kotava',
 'awa': 'अवधी',
 'ay': 'Aymar aru',
 'az': 'azərbaycanca',
 'azb': 'تۆرکجه',
 'ba': 'башҡортса',
 'ban': 'Bali',
 'bar': 'Boarisch',
 'bat-smg': 'žemaitėška',
 'bbc': 'Batak Toba',
 'bbc-latn': 'Batak Toba',
 'bcc': 'جهلسری بلوچی',
 'bcl': 'Bikol Central',
 'be': 'беларуская',
 'be-tarask': 'беларуская (тарашкевіца)\u200e',
 'be-x-old': 'беларуская (тарашкевіца)\u200e',
 'bg': 'български',
 'bgn': 'روچ کپتین بلوچی',
 'bh': 

To check whether a language is supported, write a condition as shown below:


In [None]:
'en' in wikipedia.languages()

True

Here **‘en’** stands for **‘English’**. The expression evaluates to either `True` or `False`; here it’s `True`.


Also, to get the name of a language from its prefix, try:


In [None]:
wikipedia.languages()['en']

'English'
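
The lookup above goes from prefix to name. To go the other way and find the prefix for a language name, you can scan the dictionary. The sketch below does this on a small excerpt copied from the outputs in this section; `find_prefixes` is a helper name made up for this example, and in real use you would pass it the result of `wikipedia.languages()`.

```python
def find_prefixes(name, languages):
    """Return every prefix whose language name equals `name`, ignoring case."""
    wanted = name.casefold()
    return sorted(
        code for code, label in languages.items() if label.casefold() == wanted
    )


# A small excerpt of the mapping; in real use, pass wikipedia.languages().
sample = {
    "en": "English",
    "af": "Afrikaans",
    "ady": "адыгабзэ",
    "ady-cyrl": "адыгабзэ",
}

print(find_prefixes("english", sample))   # ['en']
print(find_prefixes("адыгабзэ", sample))  # ['ady', 'ady-cyrl']
```

The function returns a list because, as the excerpt shows, more than one prefix can share the same language name.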



---



# Page Access


The API also gives us full access to a Wikipedia page, so that we can read its title, URL, content, images, and links. To access a page, you first need to load it as shown below:

**Just a heads up, I will use a single article (“politics”) as a reference in this example. Because the language was set to French earlier, the page is loaded from the French Wikipedia:**



In [10]:
politics = wikipedia.page("politics")

## Title

To access the title of the above-provided page use:


In [11]:
print(politics.title)

Élection orageuse


## URL
To get the URL of the page use:

In [12]:
print(politics.url)

https://fr.wikipedia.org/wiki/%C3%89lection_orageuse


## Content
To access the content of the page use:


In [13]:
print(politics.content)

Élection orageuse (titre original : Politics) est un film américain réalisé par Charles Reisner, sorti en 1931.


== Synopsis ==
Le crime et la corruption est endémique à Lake City et alors que le maire corrompu se présente pour un nouveau mandat, dont il a beaucoup de chance qu'il remportera, les femmes de la ville, dégoûtées de la situation, décident d'agir. Elles demande à Hattie Burns de se présenter contre lui qui accepte. Très vite, Le groupe des femmes et le groupe des hommes politique devient une véritable bataille des sexes. En fin de compte, après une fastidieuses campagne où tous les coups sont permit Hattie glisse vers la victoire.


== Fiche technique ==
Titre original : Politics
Titre français : Élection orageuse
Réalisation : Charles Reisner
Scénario : Zelda Sears, Malcolm Stuart Boylan, Wells Root et Robert E. Hopkins
Direction artistique : Cedric Gibbons
Photographie : Clyde De Vinna
Montage : William S. Gray
Pays d'origine :  États-Unis
Format : Noir et blanc — 35 mm 



> **Hint**: You can get the content of the entire page using the above method



## Images

Yes, you read that right: we can get the images from a Wikipedia article. The catch is that the images themselves are not rendered here; we get them as URLs, as shown below:


In [14]:
print(politics.images)

['https://upload.wikimedia.org/wikipedia/commons/7/7c/1930s.png', 'https://upload.wikimedia.org/wikipedia/commons/7/73/Blue_pencil.svg', 'https://upload.wikimedia.org/wikipedia/commons/a/aa/Circle-icons-filmreel.svg', 'https://upload.wikimedia.org/wikipedia/commons/a/a4/Flag_of_the_United_States.svg', 'https://upload.wikimedia.org/wikipedia/commons/3/35/Information_icon.svg', 'https://upload.wikimedia.org/wikipedia/commons/5/53/Nuvola_USA_flag.svg', 'https://upload.wikimedia.org/wikipedia/commons/7/77/United_States_film_clapperboard.svg']
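
As the output shows, the list mixes real pictures with icons and flags, many of them SVG files. If you only want certain file types, you can filter the URLs by extension. This is a minimal sketch; `filter_images` and its default extensions are assumptions made for this example, and the sample URLs are copied from the output above.

```python
import os
from urllib.parse import urlparse


def filter_images(urls, extensions=(".png", ".jpg", ".jpeg")):
    """Keep only the URLs whose path ends with one of the given extensions."""
    kept = []
    for url in urls:
        # Look at the path part only, so query strings do not interfere.
        extension = os.path.splitext(urlparse(url).path)[1].lower()
        if extension in extensions:
            kept.append(url)
    return kept


sample_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/7/7c/1930s.png",
    "https://upload.wikimedia.org/wikipedia/commons/7/73/Blue_pencil.svg",
]
print(filter_images(sample_urls))
```

In a notebook you would pass `politics.images` instead of `sample_urls`.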


## Links
Similarly, we can get the titles of the other Wikipedia pages that the article links to. (External sources cited by the article are available separately through the `references` attribute.)

In [15]:
print(politics.links)

['1931', '1931 au cinéma', 'Ann Dvorak', 'Aventures au harem', 'Brotherly Love (film, 1928)', "Cadet d'eau douce", 'Ce bon monsieur Hunter', 'Cedric Gibbons', 'Charles Reisner', 'Chasing Rainbows (film, 1930)', 'China Bound', 'Cinéma', 'Cinéma américain', 'Claire Du Brey', 'Clyde De Vinna', 'Comédie (cinéma)', 'DeWitt Jennings', 'Everybody Dance', 'Flying High (film, 1931)', "Format d'image", 'Format de pellicule photographique', 'Grand Cœur', "Harrigan's Kid", 'Herbert Prior', 'Hollywood chante et danse', 'In This Corner', "It's in the Air", 'Joan Marsh', 'John Miljan', 'Karen Morley', 'Lee Phelps', 'Les Marx au grand magasin', 'Love in the Rough', 'Malcolm Stuart Boylan', 'Manhattan Merry-Go-Round', 'Marie Dressler', 'Mary Alden', 'Meet the People', 'Monophonique', 'Murder Goes to College', 'Noir et blanc', 'Pension de famille (film)', 'Polly Moran', 'Reducing', 'Robert Dudley (acteur)', 'Robert E. Hopkins', 'Roscoe Ates', "Sophie Lang s'évade", 'Stepping Out (film, 1931)', 'Student 
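
Printing the full content or link list can flood the notebook, as the truncated outputs above show. A compact alternative is to collect the attributes covered in this section into one small dictionary. The sketch below assumes only that the page object has the `title`, `url`, `content`, `images`, and `links` attributes used above; `page_overview` is a helper name made up for this example.

```python
def page_overview(page, preview_chars=80):
    """Summarise a loaded page as a small dictionary."""
    return {
        "title": page.title,
        "url": page.url,
        # Only the first few characters of the content, to keep output short.
        "preview": page.content[:preview_chars],
        "image_count": len(page.images),
        "link_count": len(page.links),
    }
```

In the notebook, `page_overview(politics)` would give a quick look at the page without printing thousands of characters.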



---



So, there you go: you have reached the end of this tutorial on the Wikipedia API for Python. To learn about more methods, visit the [Wikipedia API documentation](https://wikipedia.readthedocs.io/en/latest/code.html#api). I hope you had a lot of fun learning and implementing. If you have any comments or concerns, let me know via the comment section below. Until then, goodbye.


# Be Safe.

