## Scraping Wine Prices from Jumia Kenya

Jumia.co.ke is one of the largest online retailers in Kenya, and wine is among the assortment of goods it sells. As an enthusiast keen on comparing prices, I wrote the code below to scrape details on more than 450 wine products listed on the platform.

In [1]:
# Importing the packages/libraries I will need

from bs4 import BeautifulSoup # for parsing the page
import requests # for downloading the page
import pandas as pd # for creating and rendering the dataframe

In [2]:
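# Start with the first listing page only.
# The '#catalog-listing' fragment is never sent to the server, so it is harmless here.
url = 'https://www.jumia.co.ke/wine/?page=1#catalog-listing'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')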
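
A download can fail or come back as an error page, and `requests.get` has no timeout by default. The sketch below is one possible way to wrap the fetch step. The name `fetch_page`, the 10-second timeout, and the `User-Agent` string are my own illustrative choices, not anything the site is known to require.

```python
import requests
from bs4 import BeautifulSoup


def fetch_page(url, get=requests.get, timeout=10):
    """Download `url` and return the parsed soup, or None if the request fails.

    `get` can be swapped out, which makes the function testable offline.
    """
    headers = {'User-Agent': 'wine-price-notebook/0.1'}  # hypothetical identifier
    try:
        response = get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    except requests.RequestException:
        return None
    return BeautifulSoup(response.content, 'html.parser')
```

With this helper, `soup = fetch_page('https://www.jumia.co.ke/wine/?page=1')` would replace the two lines above, and a `None` result signals that the page should be skipped or retried.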

In [3]:
df = pd.DataFrame(columns = ['Product', 'Price', 'Old Price', 'Percentage Drop', 'Stars', 'Reviews'])
df

|   | Product | Price | Old Price | Percentage Drop | Stars | Reviews |
|---|---|---|---|---|---|---|
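
The scraping cells in this notebook wrap each optional field in its own `try`/`except`. A small helper can express the same idea once. This is only a sketch: `text_or_empty` is a hypothetical name, and the sample markup is made up, loosely modelled on the class names used in this notebook rather than copied from the live site.

```python
from bs4 import BeautifulSoup


def text_or_empty(parent, tag, class_name):
    """Return the stripped text of the first matching child, or '' if absent."""
    node = parent.find(tag, class_=class_name)
    return node.get_text(strip=True) if node is not None else ''


# Hypothetical markup for one product card (not real Jumia HTML).
sample = BeautifulSoup(
    '<article class="prd">'
    '<h3 class="name">Sample Red 750ml</h3>'
    '<div class="prc">KSh 1,200</div>'
    '</article>',
    'html.parser',
)
card = sample.find('article', class_='prd')

print(text_or_empty(card, 'h3', 'name'))   # the name is present
print(text_or_empty(card, 'div', 'old'))   # no old price on this card, so ''
```

Checking for `None` explicitly avoids catching unrelated errors, which a bare `except` would silently hide.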


In [4]:
# Locate the product grid, then every product card inside it
div = soup.find('div', class_ = "-paxs row _no-g _4cl-3cm-shs")
wine_elems = div.find_all('article', class_ = "prd _fb col c-prd")

for elem in wine_elems:
    name = elem.find('h3', class_ = 'name').get_text()
    price = elem.find('div', class_ = 'prc').get_text().replace('KSh ', '')
    # Optional fields: find() returns None when the tag is absent, so
    # .get_text() raises AttributeError and the field is left blank
    try:
        old_price = elem.find('div', class_ = 'old').get_text().replace('KSh ', '')
    except AttributeError:
        old_price = ''
    try:
        price_drop = elem.find('div', class_ = "tag _dsct _sm").get_text().replace('%', '')
    except AttributeError:
        price_drop = ''
    try:
        # The rating and the review count share the 'rev' block; split them on '#'
        rev_parts = elem.find('div', class_ = "rev").get_text('#').split('#')
    except AttributeError:
        rev_parts = []
    # Keep only the first character of the rating text, i.e. its leading digit
    stars = rev_parts[0][0] if rev_parts and rev_parts[0] else ''
    # The review count arrives wrapped in parentheses
    revs = rev_parts[1].strip('()') if len(rev_parts) > 1 else ''

    data = {
        'Product': [name],
        'Price': [price],
        'Old Price': [old_price],
        'Percentage Drop': [price_drop],
        'Stars': [stars],
        'Reviews': [revs]
    }

    # Append this product as a one-row frame
    temp_df = pd.DataFrame(data)
    df = pd.concat([df, temp_df], ignore_index = True)

df

|   | Product | Price | Old Price | Percentage Drop | Stars | Reviews |
|---|---|---|---|---|---|---|
| 0 | Robertson Winery Natural Sweet Red Wine 750ml | 1200 | 1500 | 20.0 | 4.0 | 2.0 |
| 1 | Four Cousins Red Wine - 750 ML | 1500 |  |  | 4.0 | 76.0 |
| 2 | Cellar Cask Johannisberger Red Natural sweet... | 950 |  |  | 4.0 | 22.0 |
| 3 | Cellar Cask Jhb Red Wine 750ml Cellar Cask | 868 |  |  | 4.0 | 10.0 |
| 4 | 4th Street Natural Sweet Red Wine - 5L | 3750 | 3950 | 5.0 | 5.0 | 1.0 |
| 5 | Cellar Cask White Wine - 750 Ml | 950 | 1500 | 37.0 | 5.0 | 1.0 |
| 6 | Caprice Red Sweet Tetra Pack 1l | 680 |  |  | 4.0 | 25.0 |
| 7 | Robertson Winery Natural Sweet White Wine | 1340 | 1700 | 21.0 |  |  |
| 8 | Carrefour Delush Sweet Red 750 Ml | 833 |  |  | 5.0 | 1.0 |
| 9 | Pierre Marcel Sweet Red 750ml | 1150 |  |  | 5.0 | 1.0 |
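
The scraper stores every field as text and leaves missing values blank, so the columns need converting before prices can be compared. Below is a minimal sketch of that cleaning step using only the standard library. The input formats (`'KSh 1,200'`, `'4.3 out of 5'`, `'(76)'`) are assumptions inferred from the `replace` and `strip` calls above, not verified against the live page, and the function names are hypothetical.

```python
import re


def parse_price(text):
    """Convert text such as 'KSh 1,200' to a float; None if no number is found."""
    match = re.search(r'\d[\d,]*(?:\.\d+)?', text)
    if match is None:
        return None
    return float(match.group().replace(',', ''))


def parse_rating(text):
    """Take the leading number from text such as '4.3 out of 5'; None if absent."""
    match = re.match(r'\s*(\d+(?:\.\d+)?)', text)
    return float(match.group(1)) if match else None


def parse_review_count(text):
    """Convert text such as '(76)' to an int; None if it contains no digits."""
    digits = re.sub(r'\D', '', text)
    return int(digits) if digits else None


print(parse_price('KSh 1,200'))      # 1200.0
print(parse_rating('4.3 out of 5'))  # 4.3
print(parse_review_count('(76)'))    # 76
```

Unlike taking the first character of the rating text, `parse_rating` keeps any decimal part, and all three functions return `None` rather than an empty string when a value is missing.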


In [5]:
# Scrape all 12 listing pages instead of just the first.
# Re-run the cell that creates the empty df before this one; otherwise the
# page 1 rows collected above are appended a second time.

for i in range(1, 13):
    url = 'https://www.jumia.co.ke/wine/?page={}#catalog-listing'.format(i)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')

    div = soup.find('div', class_ = "-paxs row _no-g _4cl-3cm-shs")
    try:
        wine_elems = div.find_all('article', class_ = "prd _fb col c-prd")
    except AttributeError:
        # No product grid on this page (div is None), so skip it
        wine_elems = []
    for elem in wine_elems:
        name = elem.find('h3', class_ = 'name').get_text()
        price = elem.find('div', class_ = 'prc').get_text().replace('KSh ', '')
        # Optional fields: a missing tag makes find() return None, so
        # .get_text() raises AttributeError and the field is left blank
        try:
            old_price = elem.find('div', class_ = 'old').get_text().replace('KSh ', '')
        except AttributeError:
            old_price = ''
        try:
            price_drop = elem.find('div', class_ = "tag _dsct _sm").get_text().replace('%', '')
        except AttributeError:
            price_drop = ''
        try:
            # The rating and the review count share the 'rev' block; split them on '#'
            rev_parts = elem.find('div', class_ = "rev").get_text('#').split('#')
        except AttributeError:
            rev_parts = []
        # Keep only the first character of the rating text, i.e. its leading digit
        stars = rev_parts[0][0] if rev_parts and rev_parts[0] else ''
        # The review count arrives wrapped in parentheses
        revs = rev_parts[1].strip('()') if len(rev_parts) > 1 else ''

        data = {
            'Product': [name],
            'Price': [price],
            'Old Price': [old_price],
            'Percentage Drop': [price_drop],
            'Stars': [stars],
            'Reviews': [revs]
        }

        # Append this product as a one-row frame
        temp_df = pd.DataFrame(data)
        df = pd.concat([df, temp_df], ignore_index = True)

df

|   | Product | Price | Old Price | Percentage Drop | Stars | Reviews |
|---|---|---|---|---|---|---|
| 0 | Robertson Winery Natural Sweet Red Wine 750ml | 1200 | 1500 | 20 | 4 | 2 |
| 1 | Four Cousins Red Wine - 750 ML | 1500 |  |  | 4 | 76 |
| 2 | Cellar Cask Johannisberger Red Natural sweet... | 950 |  |  | 4 | 22 |
| 3 | Cellar Cask Jhb Red Wine 750ml Cellar Cask | 868 |  |  | 4 | 10 |
| 4 | 4th Street Natural Sweet Red Wine - 5L | 3750 | 3950 | 5 | 5 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 409 | Generic 100% New Indoor Room LCD Electronic Te... | 833 | 1666 | 50 |  |  |
| 410 | Generic 100% New ANENE A830L Digital Voltmeter... | 1068 | 2136 | 50 |  |  |
| 411 | Drostdy Hof Claret Select Medium Bodied Red Wi... | 3800 | 7600 | 50 | 5 | 1 |
| 412 | Martini Bianco White Wine 750ml | 2294 | 2500 | 8 |  |  |
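
Calling `pd.concat` once per product copies the whole frame on every iteration. A common alternative is to collect plain dictionaries in a list and build the DataFrame once at the end. The sketch below shows that pattern, together with dropping exact duplicate rows and converting the numeric columns. The rows are placeholders I made up for illustration, not scraped Jumia data, and `wines` is a hypothetical variable name.

```python
import pandas as pd

columns = ['Product', 'Price', 'Old Price', 'Percentage Drop', 'Stars', 'Reviews']

# Placeholder rows standing in for scraped values (not real Jumia data).
rows = [
    {'Product': 'Sample Red 750ml', 'Price': '1200', 'Old Price': '1500',
     'Percentage Drop': '20', 'Stars': '4', 'Reviews': '2'},
    {'Product': 'Sample White 750ml', 'Price': '950', 'Old Price': '',
     'Percentage Drop': '', 'Stars': '', 'Reviews': ''},
    # The same product again, as happens when a page is scraped twice.
    {'Product': 'Sample Red 750ml', 'Price': '1200', 'Old Price': '1500',
     'Percentage Drop': '20', 'Stars': '4', 'Reviews': '2'},
]

wines = pd.DataFrame(rows, columns=columns)        # build the frame once
wines = wines.drop_duplicates(ignore_index=True)   # drop exact repeat rows

# Turn the numeric columns into numbers; blanks become NaN.
numeric_cols = columns[1:]
wines[numeric_cols] = wines[numeric_cols].apply(pd.to_numeric, errors='coerce')

print(wines)
```

In the scraping loop, this would mean appending one dictionary per product to `rows` and creating the DataFrame after the loop finishes.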
