# Setup
Source: https://pypi.org/project/Wikipedia-API/


In [18]:
!pip install wikipedia-api

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [1]:
import wikipediaapi
import pandas as pd

pd.set_option('display.max_columns', None)
#pd.set_option('display.expand_frame_repr', False)
pd.set_option('max_colwidth', None)

# Wikipedia

Let's go to Wikipedia and find a page you would like to understand better. For this example I will pick the "Spice" page: https://en.wikipedia.org/wiki/Spice

![picture](https://drive.google.com/uc?id=1UeGzVI9xT6kQ6Rr9HvDYoCz0-TpZYrWI)



In [2]:
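# Define language and page
# Recent Wikipedia-API versions require a descriptive user agent;
# the string below is a placeholder, replace it with your own project name and contact
wiki = wikipediaapi.Wikipedia(user_agent='SpiceRecommender/0.1 (your-email@example.com)', language='en')
page_spice = wiki.page('Spice')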
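
`wiki.page()` returns a page object even for a title that has no article, so it can be worth checking `page.exists()` before reading `summary` or `text`. Below is a minimal sketch; the helper `fetch_page` and the `_FakeWiki`/`_FakePage` stand-ins are hypothetical, added only so the snippet runs without network access.

```python
def fetch_page(wiki, title):
    """Return the page for `title`, or None if Wikipedia has no such article."""
    page = wiki.page(title)
    # exists() is False for titles that have no article
    return page if page.exists() else None


# Hypothetical stand-ins that mimic page() and exists(), so this runs offline
class _FakePage:
    def __init__(self, title, exists):
        self.title = title
        self._exists = exists

    def exists(self):
        return self._exists


class _FakeWiki:
    known = {"Spice"}

    def page(self, title):
        return _FakePage(title, title in self.known)


fake_wiki = _FakeWiki()
print(fetch_page(fake_wiki, "Spice").title)  # Spice
print(fetch_page(fake_wiki, "Spicee"))       # None
```

With the real client you would call `fetch_page(wiki, 'Spice')` instead.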

In [3]:
# Get title
page_spice.title

'Spice'

In [4]:
# Get summary: short description
page_spice.summary

"A spice is a seed, fruit, root, bark, or other plant substance primarily used for flavoring or coloring food. Spices are distinguished from herbs, which are the leaves, flowers, or stems of plants used for flavoring or as a garnish. Spices are sometimes used in medicine, religious rituals, cosmetics or perfume production. For example, vanilla is commonly used as an ingredient in fragrance manufacturing.A spice may be available in several forms: fresh, whole dried, or pre-ground dried. Generally, spices are dried. Spices may be ground into a powder for convenience. A whole dried spice has the longest shelf life, so it can be purchased and stored in larger amounts, making it cheaper on a per-serving basis. A fresh spice, such as ginger, is usually more flavorful than its dried form, but fresh spices are more expensive and have a much shorter shelf life. Some spices are not always available either fresh or whole, for example turmeric, and often must be purchased in ground form. Small see

In [5]:
print(page_spice.text)


A spice is a seed, fruit, root, bark, or other plant substance primarily used for flavoring or coloring food. Spices are distinguished from herbs, which are the leaves, flowers, or stems of plants used for flavoring or as a garnish. Spices are sometimes used in medicine, religious rituals, cosmetics or perfume production. For example, vanilla is commonly used as an ingredient in fragrance manufacturing.A spice may be available in several forms: fresh, whole dried, or pre-ground dried. Generally, spices are dried. Spices may be ground into a powder for convenience. A whole dried spice has the longest shelf life, so it can be purchased and stored in larger amounts, making it cheaper on a per-serving basis. A fresh spice, such as ginger, is usually more flavorful than its dried form, but fresh spices are more expensive and have a much shorter shelf life. Some spices are not always available either fresh or whole, for example turmeric, and often must be purchased in ground form. Small seed

In [6]:
print(page_spice.sections)

[Section: Etymology (1):
The word spice originated in Middle English which came from the Old French words espece, espis(c)e, and espis(c)e. According to the Middle English Dictionary, the Old French words came from Anglo-French spece; according to Merriam Webster, the Old-French words came from Anglo-French espece, and espis. Both publications agree that the Anglo-French words derived from Latin species. Middle English spice had its first known use as a noun in the 13th century.
Subsections (0):
, Section: History (1):

Subsections (3):
Section: Early history (2):
The spice trade developed throughout the Indian subcontinent by at earliest 2000 BCE with cinnamon and black pepper, and in East Asia with herbs and pepper. The Egyptians used herbs for mummification and their demand for exotic spices and herbs helped stimulate world trade. By 1000 BCE, medical systems based upon herbs could be found in China, Korea, and India. Early uses were connected with magic, medicine, religion, traditi

In [7]:
def print_sections(sections, level=0):
    """Recursively print each section title with the first 100 characters of its text."""
    for s in sections:
        print(level * "\t*", s.title + ': ', s.text[0:100])
        print_sections(s.sections, level + 1)


In [8]:
print_sections(page_spice.sections, level=0)

 Etymology:  The word spice originated in Middle English which came from the Old French words espece, espis(c)e, 
 History:  
	* Early history:  The spice trade developed throughout the Indian subcontinent by at earliest 2000 BCE with cinnamon a
	* Middle Ages:  Spices were among the most demanded and expensive products available in Europe in the Middle Ages,[5
	* Early modern period:  Spain and Portugal were interested in seeking new routes to trade in spices and other valuable produ
 Function:  Spices are primarily used as food flavoring or to create variety. They are also used to perfume cosm
	* Preservative claim:  It is often claimed that spices were used either as food preservatives or to mask the taste of spoil
 Classification and types:  
	* Culinary herbs and spices:  
	* Botanical basis:  
	* Common spice mixtures:  
 Handling:  For ground spices, to grind a whole spice, the classic tool is mortar and pestle. Less labor-intensi
	* Salmonella contamination:  A study by the Foo
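
`print_sections` walks the section tree depth-first: it prints a section, then recurses into that section's `sections` with `level + 1`. The same traversal can return data instead of printing, which is handier if you want the outline in a DataFrame. A sketch, assuming only that each section has `title` and `sections` attributes (as the Wikipedia-API section objects used above do); `Section` and `flatten_sections` are hypothetical names, and the stand-in lets the example run offline.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    # Hypothetical stand-in exposing the same attributes print_sections uses
    title: str
    text: str = ""
    sections: list = field(default_factory=list)


def flatten_sections(sections, depth=0):
    """Depth-first walk returning (depth, title) pairs in reading order."""
    rows = []
    for s in sections:
        rows.append((depth, s.title))
        rows.extend(flatten_sections(s.sections, depth + 1))
    return rows


toc = [
    Section("Etymology"),
    Section("History", sections=[Section("Early history"), Section("Middle Ages")]),
    Section("Function"),
]
for depth, title in flatten_sections(toc):
    print("  " * depth + title)
```

On the real page, `flatten_sections(page_spice.sections)` should give the outline printed above as `(depth, title)` pairs, ready for `pd.DataFrame(..., columns=['depth', 'title'])`.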

In [9]:
# Get categories for a page
def page_categories(page):
    """Print every category of `page`, sorted by title."""
    categories = page.categories
    for title in sorted(categories.keys()):
        print("%s: %s" % (title, categories[title]))


print("Categories")
page_categories(page_spice)

Categories
Category:All articles with unsourced statements: Category:All articles with unsourced statements (id: ??, ns: 14)
Category:Articles containing Latin-language text: Category:Articles containing Latin-language text (id: ??, ns: 14)
Category:Articles containing Middle English (1100-1500)-language text: Category:Articles containing Middle English (1100-1500)-language text (id: ??, ns: 14)
Category:Articles containing Old French (842-ca. 1400)-language text: Category:Articles containing Old French (842-ca. 1400)-language text (id: ??, ns: 14)
Category:Articles with BNF identifiers: Category:Articles with BNF identifiers (id: ??, ns: 14)
Category:Articles with GND identifiers: Category:Articles with GND identifiers (id: ??, ns: 14)
Category:Articles with HDS identifiers: Category:Articles with HDS identifiers (id: ??, ns: 14)
Category:Articles with J9U identifiers: Category:Articles with J9U identifiers (id: ??, ns: 14)
Category:Articles with LCCN identifiers: Category:Articles wi
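
Many of the categories above are Wikipedia maintenance or tracking categories (unsourced statements, authority-control identifiers) rather than topics. A rough filter, sketched below, strips the `Category:` prefix and drops titles that start with a few maintenance prefixes. The prefix list is an assumption based only on the titles printed above, not an official list, so expect to extend it; `topical_categories` is a hypothetical helper.

```python
# Assumed prefixes, taken from the maintenance categories printed above (not exhaustive)
MAINTENANCE_PREFIXES = ("All articles", "Articles with", "Articles containing")


def topical_categories(category_titles):
    """Strip the 'Category:' prefix and drop titles that look like maintenance categories."""
    names = []
    for title in sorted(category_titles):
        name = title[len("Category:"):] if title.startswith("Category:") else title
        if not name.startswith(MAINTENANCE_PREFIXES):
            names.append(name)
    return names


titles = [
    "Category:All articles with unsourced statements",
    "Category:Articles containing Latin-language text",
    "Category:Articles with GND identifiers",
    "Category:Spices",
]
print(topical_categories(titles))  # ['Spices']
```

With the real page this would be `topical_categories(page_spice.categories.keys())`.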

# Collecting (labelled) data
Let's build a data frame of spices. In an earlier exercise we built a spice recommender whose spice list was entered by hand; this time we will build that list from Wikipedia.
Link to "Spices" category page: https://en.wikipedia.org/wiki/Category:Spices

In [10]:
# Get members for a category together with a short description
def members_collector(category):
    mdict = {}
    categorymembers = category.categorymembers
    for c in categorymembers.values():
        # Keep only articles (namespace 0); this skips subcategories (namespace 14) and other pages
        if c.ns == 0:
            mdict[c.title] = c.summary
    return mdict
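
`members_collector` only looks one level deep: subcategories (namespace 14, the `ns: 14` in the category output above) are skipped, so articles filed only under a subcategory of Spices are missed. The Wikipedia-API README includes a similar recursive walk over `categorymembers`; the sketch below follows that idea with a depth limit and a `seen` set so a subcategory is not visited twice. `collect_members`, `fake_page`, and the example subcategory are hypothetical, and the `SimpleNamespace` stand-ins only exist so the code runs offline.

```python
from types import SimpleNamespace

ARTICLE_NS = 0    # main (article) namespace
CATEGORY_NS = 14  # Category: namespace


def collect_members(category, max_depth=1, depth=0, seen=None):
    """Map article title -> summary, descending into subcategories up to max_depth levels."""
    if seen is None:
        seen = set()
    mdict = {}
    for member in category.categorymembers.values():
        if member.ns == ARTICLE_NS:
            mdict[member.title] = member.summary
        elif member.ns == CATEGORY_NS and depth < max_depth and member.title not in seen:
            seen.add(member.title)
            mdict.update(collect_members(member, max_depth, depth + 1, seen))
    return mdict


def fake_page(title, ns, summary="", members=()):
    # Hypothetical stand-in exposing the attributes collect_members reads
    return SimpleNamespace(title=title, ns=ns, summary=summary,
                           categorymembers={m.title: m for m in members})


sub = fake_page("Category:Example subcategory", CATEGORY_NS,
                members=[fake_page("Ajwain", ARTICLE_NS, "Ajwain summary")])
root = fake_page("Category:Spices", CATEGORY_NS,
                 members=[fake_page("Allspice", ARTICLE_NS, "Allspice summary"), sub])

print(sorted(collect_members(root, max_depth=0)))  # ['Allspice']
print(sorted(collect_members(root, max_depth=1)))  # ['Ajwain', 'Allspice']
```

With `max_depth=0` this behaves like `members_collector`. Against the live API, `collect_members(wiki.page('Category:Spices'), max_depth=1)` would make noticeably more requests, so keep `max_depth` small.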

In [14]:
# Let's collect the spices
spices = wiki.page("Category:Spices")
spices_dict = members_collector(spices)
spices_dict

{'Spice': "A spice is a seed, fruit, root, bark, or other plant substance primarily used for flavoring or coloring food. Spices are distinguished from herbs, which are the leaves, flowers, or stems of plants used for flavoring or as a garnish. Spices are sometimes used in medicine, religious rituals, cosmetics or perfume production. For example, vanilla is commonly used as an ingredient in fragrance manufacturing.A spice may be available in several forms: fresh, whole dried, or pre-ground dried. Generally, spices are dried. Spices may be ground into a powder for convenience. A whole dried spice has the longest shelf life, so it can be purchased and stored in larger amounts, making it cheaper on a per-serving basis. A fresh spice, such as ginger, is usually more flavorful than its dried form, but fresh spices are more expensive and have a much shorter shelf life. Some spices are not always available either fresh or whole, for example turmeric, and often must be purchased in ground form.

In [12]:
spices.categorymembers

{'Spice': Spice (id: 26897, ns: 0),
 'Acorus calamus': Acorus calamus (id: 371540, ns: 0),
 'Adobo': Adobo (id: 1242340, ns: 0),
 'Aframomum corrorima': Aframomum corrorima (id: 25998011, ns: 0),
 'Aframomum melegueta': Aframomum melegueta (id: 294127, ns: 0),
 'Ajwain': Ajwain (id: 346860, ns: 0),
 'Aleppo pepper': Aleppo pepper (id: 8250188, ns: 0),
 'Alleppey Green Cardamom': Alleppey Green Cardamom (id: 38291751, ns: 0),
 'Alligator pepper': Alligator pepper (id: 12395897, ns: 0),
 'Allspice': Allspice (id: 194873, ns: 0),
 'Alpinia caerulea': Alpinia caerulea (id: 18414662, ns: 0),
 'Alpinia galanga': Alpinia galanga (id: 5863248, ns: 0),
 'Alpinia nigra': Alpinia nigra (id: 38529280, ns: 0),
 'Alpinia officinarum': Alpinia officinarum (id: 9167550, ns: 0),
 'Amchoor': Amchoor (id: 3767418, ns: 0),
 'Amomum ovoideum': Amomum ovoideum (id: 52307157, ns: 0),
 'Angostura bitters': Angostura bitters (id: 1919281, ns: 0),
 'Anise': Anise (id: 68674, ns: 0),
 'Annatto': Annatto (id: 304

In [35]:
spices_dict['White mustard']

'White mustard (Sinapis alba) is an annual plant of the family Brassicaceae. It is sometimes also referred to as Brassica alba or B. hirta. Grown for its seeds, used to make the condiment mustard, as fodder crop, or as a green manure, it is now widespread worldwide, although it probably originated in the Mediterranean region.'

In [43]:
# Create dataframe: one row per (Name, Description) pair
spices_df = pd.DataFrame(list(spices_dict.items()), columns=['Name', 'Description'])
# Optional: left-align text for display only. .style returns a Styler, not a DataFrame, so don't reassign spices_df
# spices_df.style.set_properties(**{'text-align': 'left'}).set_table_styles([dict(selector='th', props=[('text-align', 'left')])])
# spices_df.to_csv('data/spices.csv')
spices_df

,Name,Description
0,Spice,"A spice is a seed, fruit, root, bark, or other plant substance primarily used for flavoring or coloring food. Spices are distinguished from herbs, which are the leaves, flowers, or stems of plants used for flavoring or as a garnish. Spices are sometimes used in medicine, religious rituals, cosmetics or perfume production. For example, vanilla is commonly used as an ingredient in fragrance manufacturing.A spice may be available in several forms: fresh, whole dried, or pre-ground dried. Generally, spices are dried. Spices may be ground into a powder for convenience. A whole dried spice has the longest shelf life, so it can be purchased and stored in larger amounts, making it cheaper on a per-serving basis. A fresh spice, such as ginger, is usually more flavorful than its dried form, but fresh spices are more expensive and have a much shorter shelf life. Some spices are not always available either fresh or whole, for example turmeric, and often must be purchased in ground form. Small seeds, such as fennel and mustard seeds, are often used both whole and in powder form.\nAlthough health benefits are often claimed for spices, there is currently not enough research conducted to prove these benefits.India contributes to 75% of global spice production. This is reflected culturally through their cuisine; historically, the spice trade developed throughout the Indian subcontinent, as well as in East Asia and the Middle East. Europe's demand for spices was among the economic and cultural factors that encouraged exploration in the early modern period."
1,Acorus calamus,"Acorus calamus (also called sweet flag, sway or muskrat root, among many common names) is a species of flowering plant with psychoactive chemicals. It is a tall wetland monocot of the family Acoraceae, in the genus Acorus. Although used in traditional medicine over centuries to treat digestive disorders and pain, there is no clinical evidence for its safety or efficacy – and ingested calamus may be toxic – leading to its commercial ban in the United States."
2,Adobo,"Adobo or adobar (Spanish: marinade, sauce, or seasoning) is the immersion of cooked food in a stock (or sauce) composed variously of paprika, oregano, salt, garlic, and vinegar to preserve and enhance its flavor. The Portuguese variant is known as Carne de vinha d'alhos. The practice, native to Iberia (Spanish cuisine and Portuguese cuisine), was widely adopted in Latin America, as well as Spanish and Portuguese colonies in Africa and Asia.\nIn the Philippines, the name adobo was given by colonial-era Spaniards on the islands to a different indigenous cooking method that also uses vinegar. Although similar, this developed independently of Spanish influence."
3,Aframomum corrorima,"Aframomum corrorima is a species of flowering plant in the ginger family, Zingiberaceae. It's a herbaceous perennial that produces leafy stems 1–2 meters tall from rhizomatous roots. The alternately-arranged leaves are dark green, 10–30 cm long and 2.5–6 cm across, elliptical to oblong in shape. Pink flowers are borne near the ground and give way to red, fleshy fruits containing shiny brown seeds, which are typically 3–5 mm in diameter.The spice, known as Ethiopian cardamom, false cardamom, or korarima, is obtained from the plant's seeds (usually dried), and is extensively used in Ethiopian and Eritrean cuisine. It is an ingredient in berbere, mitmita, awaze, and other spice mixtures, and is also used to flavor coffee. Its flavor is comparable to that of the closely related Elettaria cardamomum or green cardamom. In Ethiopian herbal medicine, the seeds are used as a tonic, carminative, and laxative.The plant is native to Tanzania, western Ethiopia (in the vicinity of Lake Tana and Gelemso), southwestern Sudan, western Uganda. It is cultivated in both Ethiopia and Eritrea, although the fruits are typically harvested from wild plants. The dried fruits are widely sold in markets and are relatively expensive, while fresh fruits are sold in production areas.In dried seeds and pods, the major oil components are 1,8-cineole (eucalyptol) and (E)-nerolidol. In fresh seeds, the major component of the essential oil is 1,8-cineole, followed by sabinene and geraniol. In fresh pods, the major oil constituents are γ-terpinene, β-pinene, α-phellandrene, 1,8-cineole, and p-cymene."
4,Aframomum melegueta,"Aframomum melegueta is a species in the ginger family, Zingiberaceae, and closely related to cardamom. Its seeds are used as a spice (ground or whole); it imparts a pungent, black-pepper-like flavor with hints of citrus. It is commonly known as grains of paradise, melegueta pepper, Guinea grains, ossame, or fom wisa, and is confused with alligator pepper. The term Guinea pepper has also been used, but is most often applied to Xylopia aethiopica (grains of Selim).\nIt is native to West Africa, which is sometimes named the Pepper Coast (or Grain Coast) because of this commodity. It is also an important cash crop in the Basketo district of southern Ethiopia."
...,...,...
168,Xylopia aethiopica,"Xylopia aethiopica is an evergreen, aromatic tree, of the Annonaceae family that can grow up to 20m high. It is a native to the lowland rainforest and moist fringe forests in the savanna zones of Africa.\nThe dried fruits of X. aethiopica (grains of Selim) are used as a spice and an herbal medicine."
169,Zanthoxylum acanthopodium,"Zanthoxylum acanthopodium, or andaliman, is a species of flowering plant in the family Rutaceae. Its range includes southern western China (Guangxi, Guizhou, Sichuan, Tibet, and Yunnan), Bangladesh, Bhutan, northern India and northeastern India (Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Sikkim, Uttar Pradesh, and West Bengal), Nepal, Laos, Burma, northern Thailand, Vietnam, Indonesia (northern Sumatran highlands), and Peninsular Malaysia.Much like the closely related Sichuan pepper (Z. piperitum), the seed pericarps are used as spices in cooking and have a similar tongue-numbing characteristic. However, in cooking, the flavour of andaliman has lemon-like notes (similar to those of lemon-grass) as well as a hint of the aromatic pandan leaf.\n\n\n== References =="
170,Zanthoxylum armatum,"Zanthoxylum armatum, also called winged prickly ash or rattan pepper in English, is a species of plant in the family Rutaceae. It is an aromatic, deciduous, spiny shrub growing to 3.5 metres (11 ft) in height, endemic from Pakistan across to Southeast Asia and up to Korea and Japan. It is one of the sources of the spice Sichuan pepper, and also used in folk medicine, essential oil production and as an ornamental garden plant."
171,Zanthoxylum piperitum,"Zanthoxylum piperitum, also known as Japanese pepper or Japanese prickly-ash is a deciduous aromatic spiny shrub or small tree of the citrus and rue family Rutaceae, native to Japan and Korea.\nIt is called sanshō (山椒) in Japan and chopi (초피) in Korea. Both the leaves and fruits (peppercorns) are used as an aromatic and flavoring in these countries. It is closely related to the Chinese Szechuan peppers, which come from plants of the same genus."
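
Some summaries carry leftover wiki markup, for example the trailing `== References ==` heading in the Zanthoxylum acanthopodium row above. A minimal cleaning sketch with pandas string methods, assuming the markup only appears as a heading at the very end of the text; the two descriptions are shortened excerpts from the output above, not new data.

```python
import pandas as pd

# Two descriptions shortened from the spices_df output above
sample = {
    "Zanthoxylum acanthopodium": (
        "Zanthoxylum acanthopodium, or andaliman, is a species of flowering plant "
        "in the family Rutaceae.\n\n\n== References =="
    ),
    "Zanthoxylum armatum": (
        "Zanthoxylum armatum, also called winged prickly ash or rattan pepper in English, "
        "is a species of plant in the family Rutaceae."
    ),
}
df = pd.DataFrame(list(sample.items()), columns=["Name", "Description"])

# Remove a trailing "== Heading ==" line (plus the blank lines before it), then trim whitespace
df["Description"] = (
    df["Description"]
    .str.replace(r"\s*==[^=]+==\s*$", "", regex=True)
    .str.strip()
)
print(df["Description"].tolist())
```

Applied to `spices_df['Description']`, the same chain should clean rows like this one; headings in the middle of a summary would need a different pattern.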


In [26]:
type(spices_df)

pandas.core.frame.DataFrame

In [28]:
# Let's check favourite spice(s)
spices_df.query("Name == 'Jakhya'")

,Name,Description
87,Jakhya,"Jakhya (Hindi: जख्या; Urdu: زخیا) (also called dog mustard or wild mustard) is the seed of the Cleome viscosa plant used for tempering on culinary dishes. It is mostly grown and consumed in Uttarakhand and in the Terai regions of India and Nepal.The seeds are dark brown in color, and crackles on being heated in oil. It is used in the Garhwali and Kumaoni styles of cuisines.\n\n\n== References =="
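
`query` needs the exact name. To find spices by a keyword in their description instead, a case-insensitive substring match works. The sketch below is one way to do it; `search_spices` is a hypothetical helper and the three-row sample is shortened from the descriptions above.

```python
import pandas as pd


def search_spices(df, keyword, column="Description"):
    """Return the rows whose `column` contains `keyword`, ignoring case."""
    mask = df[column].str.contains(keyword, case=False, regex=False, na=False)
    return df[mask]


# Three descriptions shortened from the spices_df output above
sample_df = pd.DataFrame(
    {
        "Name": ["Adobo", "Aframomum melegueta", "Zanthoxylum piperitum"],
        "Description": [
            "Adobo or adobar (Spanish: marinade, sauce, or seasoning) is the immersion of cooked food in a stock.",
            "Its seeds are used as a spice; it imparts a pungent, black-pepper-like flavor with hints of citrus.",
            "Zanthoxylum piperitum, also known as Japanese pepper or Japanese prickly-ash, is a deciduous aromatic shrub.",
        ],
    }
)
print(search_spices(sample_df, "Pepper")["Name"].tolist())
# ['Aframomum melegueta', 'Zanthoxylum piperitum']
```

On the full data this would be `search_spices(spices_df, 'pepper')`.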


# ✨ Try at home ✨
Wikipedia does not classify herbs as spices, but we use them in much the same way when cooking. Can you also extract all the herbs to make our (imagined) recommender better?

In [None]:
# Let's collect herbs
# Add your code here

In [None]:
# Let's check out the herbs' dataframe
# Add your code here

In [None]:
# Let's check favourite herb(s)
# Add your code here
