### **A snippet on how to use Wikipedia resources in a Jupyter Notebook**

In [1]:
# First things first: let's install the wikipedia module and import it.
!pip install wikipedia
import wikipedia
print("\n\n\nWikipedia is ready for use.")

Collecting wikipedia
  Downloading wikipedia-1.4.0.tar.gz (27 kB)
Building wheels for collected packages: wikipedia
  Building wheel for wikipedia (setup.py) ... done
  Created wheel for wikipedia: filename=wikipedia-1.4.0-py3-none-any.whl size=11685 sha256=ab8681b49fcf5a7104ec379745d3c079022ee46ca1c85d525b2b6c8668dd9f03
  Stored in directory: /root/.cache/pip/wheels/15/93/6d/5b2c68b8a64c7a7a04947b4ed6d89fb557dcc6bc27d1d7f3ba
Successfully built wikipedia
Installing collected packages: wikipedia
Successfully installed wikipedia-1.4.0
You should consider upgrading via the '/opt/conda/bin/python3.7 -m pip install --upgrade pip' command.



Wikipedia is ready for use.


We will search for a certain page and explore it first.

In [2]:
# It's easy to find a specific topic.
page_capitals = wikipedia.page("List of national capitals")
# So what type of objects are we dealing with?
type(page_capitals)

wikipedia.wikipedia.WikipediaPage

The good thing is that this [link](https://wikipedia.readthedocs.io/en/latest/code.html#indices-and-tables) gives us all the information we need to use the wikipedia module.
Let's see which URL the module generated for this page. We could also navigate there manually.

In [3]:
# This should have the url we need.
page_capitals.url

'https://en.wikipedia.org/wiki/List_of_national_capitals'

Time to save the page contents into a variable.

In [4]:
# Now that we have the URL to work with, let's grab the contents.
page_content = page_capitals.content
print(type(page_content))  # This tells us the object type.
print(page_capitals.title)  # This should bring up the correct page title.

<class 'str'>
List of national capitals


Let's check the entire page_content.

In [5]:
print(page_content)

This is a list of national capitals, including capitals of territories and dependencies, non-sovereign states including associated states and entities whose sovereignty is disputed.  
The capitals included on this list are those associated with states or territories listed by the international standard ISO 3166-1, or that are included in the list of states with limited recognition.
Sovereign states and observer states within the United Nations are shown in bold text.


== Notes ==


== References ==


Well, that wasn't very useful. What about the tables? Calling `page_capitals.html()` will definitely work (and pull everything), but it returns the raw HTML of the page rather than formatted text. This is where [BeautifulSoup](https://pypi.org/project/beautifulsoup4/) comes in (not covered in this article). But let's keep exploring.
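As a rough illustration of what pulling table rows out of raw HTML involves, here is a minimal sketch that uses only the standard library's `html.parser` rather than BeautifulSoup. The class `TableRowParser`, the helper `extract_rows`, and the `sample_html` string are hypothetical names and data made up for this example. In the notebook you would pass the string returned by `page_capitals.html()` instead. It ignores nested tables and other real-world quirks, so treat it as a starting point only.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html_text):
    """Return all table rows found in html_text as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


# Hypothetical sample markup, used so the sketch runs without network access.
sample_html = """
<table>
  <tr><th>City</th><th>Country</th></tr>
  <tr><td><a href="/wiki/Kabul">Kabul</a></td><td>Afghanistan</td></tr>
  <tr><td>Tirana</td><td>Albania</td></tr>
</table>
"""
print(extract_rows(sample_html))
```

Link text inside a cell is kept because the parser collects every text fragment between the opening and closing cell tags.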

In [6]:
page_capitals.categories

['Articles containing Spanish-language text',
 'Articles with short description',
 'Lists of capitals',
 'Lists of countries',
 'Short description is different from Wikidata']

We can use the .images attribute to capture all the image URLs, and possibly download the flags? That would be nice. :)

In [7]:
capture_the_flags = page_capitals.images
# Let's check for any inconsistencies.
capture_the_flags

['https://upload.wikimedia.org/wikipedia/commons/9/9a/Flag_of_Afghanistan.svg',
 'https://upload.wikimedia.org/wikipedia/commons/3/36/Flag_of_Albania.svg',
 'https://upload.wikimedia.org/wikipedia/commons/7/77/Flag_of_Algeria.svg',
 'https://upload.wikimedia.org/wikipedia/commons/8/87/Flag_of_American_Samoa.svg',
 'https://upload.wikimedia.org/wikipedia/commons/1/19/Flag_of_Andorra.svg',
 'https://upload.wikimedia.org/wikipedia/commons/9/9d/Flag_of_Angola.svg',
 'https://upload.wikimedia.org/wikipedia/commons/b/b4/Flag_of_Anguilla.svg',
 'https://upload.wikimedia.org/wikipedia/commons/8/89/Flag_of_Antigua_and_Barbuda.svg',
 'https://upload.wikimedia.org/wikipedia/commons/1/1a/Flag_of_Argentina.svg',
 'https://upload.wikimedia.org/wikipedia/commons/2/2f/Flag_of_Armenia.svg',
 'https://upload.wikimedia.org/wikipedia/commons/3/3d/Flag_of_Artsakh.svg',
 'https://upload.wikimedia.org/wikipedia/commons/f/f6/Flag_of_Aruba.svg',
 'https://upload.wikimedia.org/wikipedia/commons/6/65/Flag_of_Asc

In [8]:
print(f"We have flags of total {len(capture_the_flags)} countries here.")


We have flags of total 243 countries here.
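Since `.images` returns every image URL on the page, the count may include images that are not flags. That is an assumption worth checking rather than a confirmed fact. The sketch below shows one way to keep only URLs whose file name starts with `Flag_of_` and to read a display name from each. The helper `flag_name` and the third sample URL are hypothetical and exist only for illustration.

```python
import posixpath
from urllib.parse import unquote, urlparse


def flag_name(url):
    """Return the name encoded in a 'Flag_of_<name>.<ext>' URL, else None."""
    filename = posixpath.basename(urlparse(url).path)
    stem, _ext = posixpath.splitext(unquote(filename))
    prefix = "Flag_of_"
    if not stem.startswith(prefix):
        return None
    return stem[len(prefix):].replace("_", " ")


sample_urls = [
    "https://upload.wikimedia.org/wikipedia/commons/9/9a/Flag_of_Afghanistan.svg",
    "https://upload.wikimedia.org/wikipedia/commons/8/87/Flag_of_American_Samoa.svg",
    # Hypothetical non-flag image, included only to show the filter at work.
    "https://upload.wikimedia.org/wikipedia/commons/0/00/Example_map.svg",
]

names = [name for name in map(flag_name, sample_urls) if name is not None]
print(names)
```

In the notebook, passing `capture_the_flags` instead of `sample_urls` would give the number of actual flag images via `len(names)`.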


To download the files, we will use the wget module, which is extremely easy to use.

In [9]:
!pip install wget
import wget
print("\n\n\n\nwget is ready for use")

Collecting wget
  Downloading wget-3.2.zip (10 kB)
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9681 sha256=a938333e64c8fdeb9c801bff620889c26ec6c5d144a426b4718bcf3c7ea8048e
  Stored in directory: /root/.cache/pip/wheels/a1/b6/7c/0e63e34eb06634181c63adacca38b79ff8f35c37e3c13e3c02
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2
You should consider upgrading via the '/opt/conda/bin/python3.7 -m pip install --upgrade pip' command.




wget is ready for use


In [10]:
# Build a list of the downloaded file names while saving each flag to the working directory.
flags = []

# Loop over the first 5 image URLs only.

for image_url in capture_the_flags[:5]:
    # wget.download fetches the given URL and returns the name of the saved file.
    image_filename = wget.download(image_url)

    print('Image Successfully Downloaded: ', image_filename)
    flags.append(image_filename)

Image Successfully Downloaded:  Flag_of_Afghanistan.svg
Image Successfully Downloaded:  Flag_of_Albania.svg
Image Successfully Downloaded:  Flag_of_Algeria.svg
Image Successfully Downloaded:  Flag_of_American_Samoa.svg
Image Successfully Downloaded:  Flag_of_Andorra.svg


In [11]:
flags

['Flag_of_Afghanistan.svg',
 'Flag_of_Albania.svg',
 'Flag_of_Algeria.svg',
 'Flag_of_American_Samoa.svg',
 'Flag_of_Andorra.svg']

### Success!!!

#### In case you are like me: 'wget' does not overwrite an existing file by default, so re-running the download cell leaves the earlier copies in place. Use this code to delete them manually (in my case on Kaggle), or simply delete them from your local directory if you have direct access to it. You could also write a simple function that checks for an existing file and skips the download when it is already there. The possibilities are unlimited.

In [12]:
# import os
# os.remove("./Flag_of_Afghanistan.svg")  # file name
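The "check for an existing file" idea could look roughly like the sketch below. The helper `download_if_missing` is a hypothetical name, and it takes the downloader as an argument so the example can run without network access. In the notebook you would pass `wget.download`, which accepts an `out` path. The `fake_downloader` stand-in just writes a placeholder file.

```python
import os
import posixpath
import tempfile
from urllib.parse import unquote, urlparse


def download_if_missing(url, downloader, directory="."):
    """Download url into directory unless that file name already exists.

    downloader is any callable taking (url, out=path) and returning the
    saved path; wget.download is assumed to fit this shape.
    Returns (path, downloaded) where downloaded is False if skipped.
    """
    filename = unquote(posixpath.basename(urlparse(url).path))
    target = os.path.join(directory, filename)
    if os.path.exists(target):
        return target, False
    saved = downloader(url, out=target)
    return saved, True


def fake_downloader(url, out):
    # Stand-in for wget.download so the sketch runs offline.
    with open(out, "w") as handle:
        handle.write("placeholder")
    return out


with tempfile.TemporaryDirectory() as workdir:
    url = "https://upload.wikimedia.org/wikipedia/commons/3/36/Flag_of_Albania.svg"
    _, first = download_if_missing(url, fake_downloader, workdir)
    _, second = download_if_missing(url, fake_downloader, workdir)
    print(first, second)
```

The second call is skipped because the file from the first call is still on disk, so no duplicate copy is created.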

### Have a great day :)