# Setup
Source: https://pypi.org/project/Wikipedia-API/


In [None]:
!pip install wikipedia-api

Collecting wikipedia-api
  Downloading Wikipedia-API-0.5.4.tar.gz (18 kB)
Building wheels for collected packages: wikipedia-api
  Building wheel for wikipedia-api (setup.py) ... done
  Created wheel for wikipedia-api: filename=Wikipedia_API-0.5.4-py3-none-any.whl size=13475 sha256=c26daeafc9c030e39df235270c8ce38e10ebe7c55c347a661e7b75682fc02453
  Stored in directory: /root/.cache/pip/wheels/d3/24/56/58ba93cf78be162451144e7a9889603f437976ef1ae7013d04
Successfully built wikipedia-api
Installing collected packages: wikipedia-api
Successfully installed wikipedia-api-0.5.4


In [None]:
import wikipediaapi
import pandas as pd

pd.set_option('display.max_columns', None)
#pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)

# Wikipedia

Go to Wikipedia and pick a page you would like to explore further. This example uses the "Spice" page: https://en.wikipedia.org/wiki/Spice

![picture](https://drive.google.com/uc?id=1UeGzVI9xT6kQ6Rr9HvDYoCz0-TpZYrWI)



In [None]:
# Define language and page
wiki = wikipediaapi.Wikipedia('en')
page_spice = wiki.page('Spice')
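
`wiki.page(...)` returns a page object even when the title is misspelled or missing, so it can help to check `exists()` before reading attributes. The sketch below assumes only that the object offers `exists()`, `title`, and `summary`, as Wikipedia-API pages do. `FakePage` and `describe_page` are hypothetical names introduced here so the example runs without network access.

```python
class FakePage:
    """Hypothetical stand-in mimicking the few page attributes used below."""

    def __init__(self, title, summary="", found=True):
        self.title = title
        self.summary = summary
        self._found = found

    def exists(self):
        return self._found


def describe_page(page, max_chars=80):
    """Return a short description, or a notice when the page is missing."""
    if not page.exists():
        return f"{page.title}: page not found"
    return f"{page.title}: {page.summary[:max_chars]}"


print(describe_page(FakePage("Spice", "A spice is a seed, fruit, root, bark, or other plant substance.")))
print(describe_page(FakePage("Spcie", found=False)))
```

With the real library, `describe_page(page_spice)` should behave the same way, since it only touches those three members.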

In [None]:
# Get title
page_spice.title

'Spice'

In [None]:
# Get summary: short description
page_spice.summary

'A spice is a seed, fruit, root, bark, or other plant substance primarily used for flavoring or coloring food. Spices are distinguished from herbs, which are the leaves, flowers, or stems of plants used for flavoring or as a garnish. Spices are sometimes used in medicine, religious rituals, cosmetics or perfume production.'

In [None]:
print(page_spice.text)

A spice is a seed, fruit, root, bark, or other plant substance primarily used for flavoring or coloring food. Spices are distinguished from herbs, which are the leaves, flowers, or stems of plants used for flavoring or as a garnish. Spices are sometimes used in medicine, religious rituals, cosmetics or perfume production.

History
Early history
The spice trade developed throughout the Indian subcontinent by at earliest 2000 BCE with cinnamon and black pepper, and in East Asia with herbs and pepper. The Egyptians used herbs for mummification and their demand for exotic spices and herbs helped stimulate world trade. The word spice comes from the Old French word espice, which became epice, and which came from the Latin root spec, the noun referring to "appearance, sort, kind": species has the same root. By 1000 BCE, medical systems based upon herbs could be found in China, Korea, and India. Early uses were connected with magic, medicine, religion, tradition, and preservation.Cloves were u

In [None]:
print(page_spice.sections)

[Section: History (1):

Subsections (3):
Section: Early history (2):
The spice trade developed throughout the Indian subcontinent by at earliest 2000 BCE with cinnamon and black pepper, and in East Asia with herbs and pepper. The Egyptians used herbs for mummification and their demand for exotic spices and herbs helped stimulate world trade. The word spice comes from the Old French word espice, which became epice, and which came from the Latin root spec, the noun referring to "appearance, sort, kind": species has the same root. By 1000 BCE, medical systems based upon herbs could be found in China, Korea, and India. Early uses were connected with magic, medicine, religion, tradition, and preservation.Cloves were used in Mesopotamia by 1700 BCE. The ancient Indian epic Ramayana mentions cloves. The Romans had cloves in the 1st century CE, as Pliny the Elder wrote about them.The earliest written records of spices come from ancient Egyptian, Chinese, and Indian cultures. The Ebers Papyrus 

In [None]:
def print_sections(sections, level=0):
    """Recursively print each section title and the first 100 characters of its text."""
    for s in sections:
        print(level * "\t*", (s.title + ': '), s.text[0:100])
        print_sections(s.sections, level + 1)


In [None]:
print_sections(page_spice.sections, level=0)

 History:  
	* Early history:  The spice trade developed throughout the Indian subcontinent by at earliest 2000 BCE with cinnamon a
	* Middle Ages:  Spices were among the most demanded and expensive products available in Europe in the Middle Ages,[5
	* Early Modern Period:  Spain and Portugal were interested in seeking new routes to trade in spices and other valuable produ
 Function:  Spices are primarily used as food flavoring. They are also used to perfume cosmetics and incense. At
 Classification and types:  
	* Culinary herbs and spices:  
	* Botanical basis:  
	* Common spice mixtures:  
 Handling:  A spice may be available in several forms: fresh, whole dried, or pre-ground dried. Generally, spice
	* Salmonella contamination:  A study by the Food and Drug Administration of shipments of spices to the United States during fisca
 Nutrition:  Because they tend to have strong flavors and are used in small quantities, spices tend to add few ca
 Production:  India contributes 75% of glo
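
The same recursive walk can return data instead of printing it, which may be handier for later analysis. This is a minimal sketch, assuming each section exposes `title`, `text`, and `sections` as in `print_sections` above. `flatten_sections` is a hypothetical helper, and the `demo` objects are stand-ins for `page_spice.sections` so the code runs offline.

```python
from types import SimpleNamespace


def flatten_sections(sections, level=0):
    """Walk the section tree depth-first and return one dict per section."""
    rows = []
    for s in sections:
        rows.append({"level": level, "title": s.title, "preview": s.text[:100]})
        rows.extend(flatten_sections(s.sections, level + 1))
    return rows


# Hypothetical stand-in for page_spice.sections.
demo = [
    SimpleNamespace(title="History", text="", sections=[
        SimpleNamespace(title="Early history", text="The spice trade developed...", sections=[]),
    ]),
    SimpleNamespace(title="Function", text="Spices are primarily used as food flavoring.", sections=[]),
]

for row in flatten_sections(demo):
    print(row)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(...)` to get one row per section.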

In [None]:
# Print the categories a page belongs to, sorted by title
def page_categories(page):
    categories = page.categories
    for title in sorted(categories.keys()):
        print("%s: %s" % (title, categories[title]))


print("Categories")
page_categories(page_spice)

Categories
Category:All articles needing examples: Category:All articles needing examples (id: ??, ns: 14)
Category:All articles with unsourced statements: Category:All articles with unsourced statements (id: ??, ns: 14)
Category:Articles needing examples from December 2018: Category:Articles needing examples from December 2018 (id: ??, ns: 14)
Category:Articles with BNF identifiers: Category:Articles with BNF identifiers (id: ??, ns: 14)
Category:Articles with GND identifiers: Category:Articles with GND identifiers (id: ??, ns: 14)
Category:Articles with LCCN identifiers: Category:Articles with LCCN identifiers (id: ??, ns: 14)
Category:Articles with NDL identifiers: Category:Articles with NDL identifiers (id: ??, ns: 14)
Category:Articles with short description: Category:Articles with short description (id: ??, ns: 14)
Category:Articles with unsourced statements from December 2009: Category:Articles with unsourced statements from December 2009 (id: ??, ns: 14)
Category:Articles with 
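
Most of the categories listed above are maintenance categories rather than topics. One possible way to separate them is a simple prefix filter. The prefixes below are an assumed heuristic based only on the titles visible in this output, not an official list, and `topical_categories` is a hypothetical helper.

```python
# Assumed heuristic: these prefixes mark maintenance categories, not topics.
MAINTENANCE_PREFIXES = (
    "Category:All articles",
    "Category:Articles needing",
    "Category:Articles with",
)


def topical_categories(category_titles):
    """Keep the titles that do not start with a known maintenance prefix."""
    return sorted(t for t in category_titles if not t.startswith(MAINTENANCE_PREFIXES))


# Titles in the style of the output above; "Category:Spices" is the topical one.
demo_titles = [
    "Category:All articles needing examples",
    "Category:Articles with GND identifiers",
    "Category:Articles with short description",
    "Category:Spices",
]
print(topical_categories(demo_titles))
```

With a real page, the call would look like `topical_categories(page_spice.categories.keys())`.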

# Collecting (labelled) data
Let's build a data frame of spices.
In an earlier exercise we built a spice recommender whose spice list was entered by hand. This time we will build the list from Wikipedia.
Link to the "Spices" category page: https://en.wikipedia.org/wiki/Category:Spices

In [None]:
# Get the members of a category together with a short description
def members_collector(category):
    mdict = {}
    categorymembers = category.categorymembers
    for c in categorymembers.values():
        if c.ns == 0:  # Keep articles only; exclude categories within the category
            mdict[c.title] = c.summary
    return mdict
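
`members_collector` skips subcategories, so articles that sit only in a nested category are missed. Below is a sketch of a depth-limited variant, assuming members expose `ns`, `title`, `summary`, and `categorymembers` as above, and that namespace 14 denotes categories (as the `ns: 14` in the earlier output suggests). `collect_members`, the stub helpers, and "Category:Peppers" are hypothetical and only serve to make the example runnable offline.

```python
from types import SimpleNamespace

ARTICLE_NS = 0    # article namespace, as in the `c.ns == 0` check above
CATEGORY_NS = 14  # category namespace, matching "ns: 14" in the earlier output


def collect_members(category, max_depth=1, depth=0, seen=None):
    """Collect {title: summary} for articles, descending into subcategories."""
    seen = {} if seen is None else seen
    for member in category.categorymembers.values():
        if member.ns == ARTICLE_NS:
            seen.setdefault(member.title, member.summary)
        elif member.ns == CATEGORY_NS and depth < max_depth:
            collect_members(member, max_depth, depth + 1, seen)
    return seen


def _article(title, summary):
    return SimpleNamespace(title=title, summary=summary, ns=ARTICLE_NS)


def _category(title, members):
    return SimpleNamespace(title=title, ns=CATEGORY_NS,
                           categorymembers={m.title: m for m in members})


# Hypothetical category tree standing in for wiki.page("Category:Spices").
demo = _category("Category:Spices", [
    _article("Allspice", "Dried unripe berry of Pimenta dioica."),
    _category("Category:Peppers", [_article("Aleppo pepper", "A variety of Capsicum annuum.")]),
])
print(collect_members(demo, max_depth=1))
print(collect_members(demo, max_depth=0))
```

Keeping `max_depth` small is a deliberate choice: category trees can be large, and every extra level means more pages to fetch.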

In [None]:
# Let's collect spices
spices = wiki.page("Category:Spices")
spices_dict = members_collector(spices)

In [None]:
# Create dataframe with one row per spice
spices_df = pd.DataFrame(list(spices_dict.items()), columns=['Name', 'Description'])
# Left-align cells and headers. Note: spices_df is now a pandas Styler; the underlying DataFrame is spices_df.data
spices_df = spices_df.style.set_properties(**{'text-align': 'left'}).set_table_styles([dict(selector='th', props=[('text-align', 'left')])])
spices_df

Unnamed: 0,Name,Description
0,Spice,"A spice is a seed, fruit, root, bark, or other plant substance primarily used for flavoring or coloring food. Spices are distinguished from herbs, which are the leaves, flowers, or stems of plants used for flavoring or as a garnish. Spices are sometimes used in medicine, religious rituals, cosmetics or perfume production."
1,Acorus calamus,"Acorus calamus (also called sweet flag, sway or muskrat root, among many common names) is a species of flowering plant with psychoactive chemicals. It is a tall wetland monocot of the family Acoraceae, in the genus Acorus. Although used in traditional medicine over centuries to treat digestive disorders and pain, there is no clinical evidence for its safety or efficacy – and ingested calamus may be toxic – leading to its commercial ban in the United States."
2,Adobo,"Adobo or adobar (Spanish: marinade, sauce, or seasoning) is the immersion of raw food in a stock (or sauce) composed variously of paprika, oregano, salt, garlic, and vinegar to preserve and enhance its flavor. The Portuguese variant is known as Carne de vinha d'alhos. The practice, native to Iberia (Spanish cuisine and Portuguese cuisine), was widely adopted in Latin America, as well as Spanish and Portuguese colonies in Africa and Asia. In the Philippines, the name adobo was given by colonial-era Spaniards on the islands to a different indigenous cooking method that also uses vinegar. Although similar, this developed independently of Spanish influence."
3,Aframomum corrorima,"Aframomum corrorima is a species of flowering plant in the ginger family, Zingiberaceae. It's a herbaceous perennial that produces leafy stems 1–2 meters tall from rhizomatous roots. The alternately-arranged leaves are dark green, 10–30 cm long and 2.5–6 cm across, elliptical to oblong in shape. Pink flowers are borne near the ground and give way to red, fleshy fruits containing shiny brown seeds, which are typically 3–5 mm in diameter.The spice, known as Ethiopian cardamom, false cardamom, or korarima, is obtained from the plant's seeds (usually dried), and is extensively used in Ethiopian and Eritrean cuisine. It is an ingredient in berbere, mitmita, awaze, and other spice mixtures, and is also used to flavor coffee. Its flavor is comparable to that of the closely related Elettaria cardamomum or green cardamom. In Ethiopian herbal medicine, the seeds are used as a tonic, carminative, and laxative.The plant is native to Tanzania, western Ethiopia (in the vicinity of Lake Tana and Gelemso), southwestern Sudan, western Uganda. It is cultivated in both Ethiopia and Eritrea, although the fruits are typically harvested from wild plants. The dried fruits are widely sold in markets and are relatively expensive, while fresh fruits are sold in production areas.In dried seeds and pods, the major oil components are 1,8-cineole (eucalyptol) and (E)-nerolidol. In fresh seeds, the major component of the essential oil is 1,8-cineole, followed by sabinene and geraniol. In fresh pods, the major oil constituents are γ-terpinene, β-pinene, α-phellandrene, 1,8-cineole, and p-cymene."
4,Aframomum melegueta,"Aframomum melegueta is a species in the ginger family, Zingiberaceae, and closely related to cardamom. Its seeds are used as a spice (ground or whole); it imparts a pungent, black-pepper-like flavor with hints of citrus. It is commonly known as grains of paradise, melegueta pepper, alligator pepper, Guinea grains, ossame, or fom wisa. The term Guinea pepper has also been used, but is most often applied to Xylopia aethiopica (grains of Selim). It is native to West Africa, which is sometimes named the Pepper Coast (or Grain Coast) because of this commodity. It is also an important cash crop in the Basketo district of southern Ethiopia."
5,Ajwain,"Ajwain, ajowan (), or Trachyspermum ammi—also known as ajowan caraway, thymol seeds, bishop's weed, or carom—is an annual herb in the family Apiaceae. Both the leaves and the seed‑like fruit (often mistakenly called seeds) of the plant are consumed by humans. The name ""bishop's weed"" also is a common name for other plants. The ""seed"" (i.e., the fruit) is often confused with lovage ""seed""."
6,Aleppo pepper,"The Aleppo pepper (Arabic: فلفل حلبي‎ / ALA-LC: fulful Ḥalabī) ,Turkish: Halep biberi Pronounciation:(halep bibeɾi) is a variety of Capsicum annuum used as a spice, particularly in Middle Eastern, and Mediterranean cuisine. Also known as the Halaby pepper, it starts as pods, which ripen to a burgundy color, and then are semi-dried, de-seeded, then crushed or coarsely ground. The pepper flakes are known in Turkey as pul biber (pul = flake, biber = pepper), and in Armenia as Halebi bibar. In Turkey, pul biber is the third most commonly used spice, after salt and black pepper. In Arabic, the pepper is named after Aleppo, a long-inhabited city along the Silk Road in northern Syria, and is grown in Syria and Turkey. Although a common condiment, its use in Europe and the United States outside Armenian, Syrian and Turkish immigrant communities was rare until the 20th century, with one source (Los Angeles magazine) dating its rise in use among the broader U.S. population according to the 1994 publication of The Cooking of the Eastern Mediterranean by Paula Wolfert."
7,Alleppey Green Cardamom,"Alleppey Green Cardamom is a green variety of kiln dried Cardamom capsule grown in Cardamom Hills of Idukki district in Kerala. This cardamom variety is called Alleppey cardamom not because it's grown in Alleppey, Kerala rather it's because Alleppey was the main depot through which this cardamom was processed in erstwhile Travancore. [1] In 18th century kings of Travancore brought state monopoly over trade and export of cardamom. The understanding between kings with British Raj, Travancore–Dutch Wars,consolidation of power under Marthanda Varma with British aid etc. were the reasons why such monopoly had come. This made all produce of cardamom in Travancore state to be sold solely to the state depot at Alleppey. Back then Alappuzha was the major port in Travancore. This led to development of cardamom sorting and processing in Alleppey which resulted the naming of the most high quality cardamom produce from the region as Alleppey Green Cardamom. Post rise of Kochi market and port, Independence of India, Travancore–Cochin merger the relevance and existence of a port in Alleppey became a thing of past. Today the development of better processing facilities, abundance of raw material, skilled cheap labour and allied infrastructure etc. in Idukki had resulted in complete shifting of the processing industry out of Alappuzha. Alleppey cardamom has a unique quality. It is kiln dried and has superior sensory quality, uniform shade of green and a three cornered ribbed appearance."
8,Alligator pepper,"Alligator pepper (also known as mbongo spice or hepper pepper) is a West African spice made from the seeds and seed pods of Aframomum danielli, A. citratum or A. exscapum. It is a close relative of grains of paradise, obtained from the closely related species, Aframomum melegueta or ""grains of paradise"". Unlike grains of paradise, which are generally sold as only the seeds of the plant, alligator pepper is sold as the entire pod containing the seeds (in the same manner to another close relative, black cardamom). The plants which provide alligator pepper are herbaceous perennial flowering plants of the ginger family (Zingiberaceae), native to swampy habitats along the West African coast. Once the pod is open and the seeds are revealed, the reason for this spice's common English name becomes apparent as the seeds have a papery skin enclosing them and the bumps of the seeds within this skin is reminiscent of an alligator's back. As mbongo spice, the seeds of alligator pepper are often sold as the grains isolated from the pod and with the outer skin removed. Mbongo spice is most commonly either A. danielli or A. citratum, and has a more floral aroma than A. exscapum (which is the commonest source of the entire pod). It is a common ingredient in West African cuisine, where it imparts both pungency and a spicy aroma to soups and stews."
9,Allspice,"Allspice, also known as Jamaica pepper, myrtle pepper, pimenta, or pimento, is the dried unripe berry of Pimenta dioica, a midcanopy tree native to the Greater Antilles, southern Mexico, and Central America, now cultivated in many warm parts of the world. The name ""allspice"" was coined as early as 1621 by the English, who valued it as a spice that combined the flavours of cinnamon, nutmeg, and clove.Several unrelated fragrant shrubs are called ""Carolina allspice"" (Calycanthus floridus), ""Japanese allspice"" (Chimonanthus praecox), or ""wild allspice"" (Lindera benzoin)."


In [None]:
# Let's check a favourite spice (.data is the DataFrame behind the Styler)
spices_df.data.query("Name=='Poppy seed'")

Unnamed: 0,Name,Description
126,Poppy seed,"Poppy seed is an oilseed obtained from the opium poppy (Papaver somniferum). The tiny, kidney-shaped seeds have been harvested from dried seed pods by various civilizations for thousands of years. It is still widely used in many countries, especially in Central Europe and South Asia, where it is legally grown and sold in shops. The seeds are used whole or ground into meal as an ingredient in many foods – especially in pastry and bread – and they are pressed to yield poppyseed oil."


# ✨ Your turn✨
According to Wikipedia, herbs are not spices, but we use them in much the same way when cooking. Can you extract all herbs as well, to make our (imagined) recommender better?

In [None]:
# Let's collect herbs
# Add your code here

In [None]:
# Let's check out the herbs' dataframe
# Add your code here

In [None]:
# Let's check favourite herb(s)
# Add your code here
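
Once both collections exist, they can be merged into a single labelled table. This sketch leaves the collection step to the exercise and only shows the merge, assuming dictionaries shaped like `spices_dict` (name mapped to description). `label_rows` and the tiny demo dictionaries are hypothetical.

```python
def label_rows(groups):
    """Turn {label: {name: description}} into a flat list of row dicts."""
    rows = []
    for label, members in groups.items():
        for name, description in members.items():
            rows.append({"Name": name, "Description": description, "Label": label})
    return rows


# Tiny hypothetical dictionaries standing in for spices_dict and a herbs dictionary.
demo_groups = {
    "spice": {"Allspice": "Dried unripe berry of Pimenta dioica."},
    "herb": {"Basil": "A culinary herb."},
}
rows = label_rows(demo_groups)
print(rows)
```

Passing `rows` to `pd.DataFrame(...)` would give a frame with `Name`, `Description`, and `Label` columns.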
