# Web scraping project of Jumia products

## Importing Libraries
For this web scraping project, the libraries used are:
* Requests, to send a GET request to the Jumia site and retrieve the HTML of a particular page
* Beautiful Soup, to parse the retrieved page
* Pandas, to store the scraped data
* Array, for passing an array as an argument to a function

In [16]:
import requests
import pandas as pd
from bs4 import BeautifulSoup
from array import *

The first order of business was to get the major product categories on Jumia, as shown in the code below.

In [5]:
def get_category():
    topic_url = 'https://jumia.com'
    # Create the result lists first so the return statement always has them
    category_hrefs = []
    categories = []
    try:
        response = requests.get(topic_url)
    except requests.RequestException:
        # The request itself failed: return an empty dataframe
        return pd.DataFrame({'category_href': category_hrefs, 'categories': categories})
    if response:
        print(response.status_code)
        doc = BeautifulSoup(response.text, 'html.parser')
        category_links = doc.find_all('a', {'class': 'itm', 'role': 'menuitem'})
        for link in category_links:
            href = link.get('href')
            if not href:
                continue  # skip menu items that carry no link
            if href.startswith('https'):
                category_hrefs.append(href)
            else:
                category_hrefs.append('https://jumia.com.ng' + href)
            categories.append(link.text)
    return pd.DataFrame({'category_href': category_hrefs, 'categories': categories})
    

In [6]:
category=get_category()

200


In [7]:
category

| | category_href | categories |
|---|---|---|
| 0 | https://www.jumia.com.ng/groceries/ | Supermarket |
| 1 | https://jumia.com.ng/health-beauty/ | Health & Beauty |
| 2 | https://jumia.com.ng/home-office/ | Home & Office |
| 3 | https://jumia.com.ng/phones-tablets/ | Phones & Tablets |
| 4 | https://jumia.com.ng/computing/ | Computing |
| 5 | https://jumia.com.ng/electronics/ | Electronics |
| 6 | https://jumia.com.ng/category-fashion-by-jumia/ | Fashion |
| 7 | https://jumia.com.ng/baby-products/ | Baby Products |
| 8 | https://jumia.com.ng/video-games/ | Gaming |
| 9 | https://jumia.com.ng/sporting-goods/ | Sporting Goods |
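
The category links are a mix of hrefs that were already absolute and hrefs that had the `https://jumia.com.ng` prefix added by hand. A minimal sketch of an alternative, assuming the standard library's `urllib.parse.urljoin` is acceptable here; `absolute_link` and `BASE_URL` are hypothetical names that do not appear in the notebook:

```python
from urllib.parse import urljoin

BASE_URL = 'https://jumia.com.ng'  # hypothetical constant; same prefix the scraper uses


def absolute_link(href, base=BASE_URL):
    """Return href unchanged if it is already absolute, else resolve it against base."""
    return urljoin(base, href)


print(absolute_link('/health-beauty/'))
print(absolute_link('https://www.jumia.com.ng/groceries/'))
```

Because `urljoin` leaves absolute URLs untouched, the `startswith('https')` branch would no longer be needed with this approach.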


Next was to get the list of products in a specific category. The function below takes the category link as an argument and uses a while loop to scrape each page of the category. Jumia's pagination only goes as far as 50 pages at 40 items per page, so at most around 2,000 products can be retrieved per category.
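
The page-by-page loop can be illustrated without touching the network. The sketch below is only an illustration of the pattern: `collect_pages`, `fetch_page` and `FAKE_SITE` are hypothetical stand-ins, and the `max_pages` guard is an added assumption rather than something the notebook's function does.

```python
def collect_pages(first_url, fetch_page, max_pages=50):
    """Follow 'next' links from first_url until there is no next page or max_pages is hit."""
    items = []
    url = first_url
    pages_seen = 0
    while url is not None and pages_seen < max_pages:
        # fetch_page returns (items on this page, next URL or None)
        page_items, url = fetch_page(url)
        items.extend(page_items)
        pages_seen += 1
    return items


# Stand-in for the website: three fake pages of two items each
FAKE_SITE = {
    'page1': (['a', 'b'], 'page2'),
    'page2': (['c', 'd'], 'page3'),
    'page3': (['e', 'f'], None),
}

print(collect_pages('page1', FAKE_SITE.get))               # walks all three pages
print(collect_pages('page1', FAKE_SITE.get, max_pages=2))  # stops after two pages
print(50 * 40)  # upper bound implied by 50 pages of 40 items
```

The guard protects against a "Next Page" link that never disappears, which would otherwise keep a `while True` loop running indefinitely.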

In [8]:
def category_lists(category_href):
    header = 'https://jumia.com.ng'
    columns = ['product links', 'product name', 'price', 'product_rating', 'rating count']
    category_page = category_href
    rows = []  # one dict per product keeps every column the same length
    while True:
        cat_res = requests.get(category_page)
        print(cat_res.status_code)
        cat_doc = BeautifulSoup(cat_res.text, 'html.parser')
        # Each product card is the parent of a <div class="info">
        cat_list = [info.parent for info in cat_doc.find_all('div', {'class': 'info'})]

        for item in cat_list:
            href = item.get('href')
            if not href:
                continue  # skip cards that carry no link
            prod_tag = item.find('h3', {'class': 'name'})
            stars_tag = item.find('div', {'class': 'rev'})
            price_tag = item.find('div', {'class': 'prc'})
            # Review text looks like '4.5 out of 5(247)'; the count keeps its trailing ')'
            rating, sep, count = stars_tag.text.partition('(') if stars_tag else ('nil_rating', '', '')
            rows.append({
                'product links': href if href.startswith('https') else header + href,
                'product name': prod_tag.text if prod_tag else 'no name',
                'price': price_tag.text if price_tag else 'no price',
                'product_rating': rating,
                'rating count': count if sep else 'Nil count',
            })
        next_page = cat_doc.find('a', {'aria-label': 'Next Page'})
        if next_page is None:
            break
        category_page = header + next_page['href']

    return pd.DataFrame(rows, columns=columns)

In [10]:
health_category_link=category['category_href'][1]
health_category_link

'https://jumia.com.ng/health-beauty/'

In [11]:
health_category_list=category_lists(health_category_link)
health_category_list

200
(printed 50 times in total, once per page)


| | product links | product name | price | product_rating | rating count |
|---|---|---|---|---|---|
| 0 | https://jumia.com.ng/andrea-fast-powerful-oil-... | Andrea Fast Powerful Oil Hair And Quick Beard ... | ₦ 1,000 | nil_rating | Nil count |
| 1 | https://jumia.com.ng/andrea-fast-powerful-oil-... | Andrea Fast Powerful Oil Hair And Quick Beard ... | ₦ 1,000 | 2 out of 5 | 2) |
| 2 | https://jumia.com.ng/uv-aqua-rich-watery-essen... | Biore UV Aqua Rich Watery Essence Sunscreen SP... | ₦ 2,800 | 4.5 out of 5 | 247) |
| 3 | https://jumia.com.ng/braided-ghana-weaving-hai... | Braided Ghana Weaving Hair Wig | ₦ 10,350 | nil_rating | Nil count |
| 4 | https://jumia.com.ng/narciso-rodriguez-for-him... | Narciso Rodriguez For Him Men Perfume 100ML EDT | ₦ 51,500 | nil_rating | Nil count |
| ... | ... | ... | ... | ... | ... |
| 1995 | https://jumia.com.ng/generic-buttercup-setting... | Buttercup Setting Powder/Finishing Powder For ... | ₦ 2,999 | 4.2 out of 5 | 31) |
| 1996 | https://jumia.com.ng/acne-stress-control-toner... | Neutrogena Acne Stress Control Toner | ₦ 7,600 | 4.3 out of 5 | 11) |
| 1997 | https://jumia.com.ng/beauty-formulas-tea-tree-... | Beauty Formulas Tea Tree Cleansing Wipes | ₦ 2,699 | 4.5 out of 5 | 2) |
| 1998 | https://jumia.com.ng/kirkland-signature-minoxi... | Kirkland Signature Minoxidil 5% Hair & Beard G... | ₦ 7,300 | 3.9 out of 5 | 13) |
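
The scraped columns are still raw strings: prices carry the naira sign and thousands separators, ratings read "4.5 out of 5", and counts keep a trailing parenthesis. A possible clean-up step is sketched below using only the standard library. The helper names are hypothetical, and the sketch assumes whole-naira prices and the placeholder strings shown in the table.

```python
import re


def parse_price(text):
    """'₦ 1,000' -> 1000; None when no number is present (e.g. 'no price')."""
    match = re.search(r'\d[\d,]*', text)
    return int(match.group().replace(',', '')) if match else None


def parse_rating(text):
    """'4.5 out of 5' -> 4.5; None for placeholders such as 'nil_rating'."""
    match = re.match(r'\s*(\d+(?:\.\d+)?)\s+out of', text)
    return float(match.group(1)) if match else None


def parse_count(text):
    """'247)' -> 247; None for placeholders such as 'Nil count'."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else None


print(parse_price('₦ 10,350'), parse_rating('4.5 out of 5'), parse_count('247)'))
print(parse_price('no price'), parse_rating('nil_rating'), parse_count('Nil count'))
```

Returning `None` for the placeholders lets pandas treat them as missing values once the helpers are applied to each column.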


The link for a category's product list differs from the link produced when searching for a particular product in the Jumia search bar. To handle that, a function was drawn up that takes the search criteria as an argument and returns a dataframe of the products matching the search. As with category listings, search results come 40 per page for a maximum of 50 pages, which caps each scrape at 2,000 products.
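
A search link is the catalog URL followed by the query, with spaces turned into `+`. A minimal sketch of building it, assuming `urllib.parse.quote_plus` is used for the encoding; `build_search_url` and `SEARCH_URL` are hypothetical names:

```python
from urllib.parse import quote_plus

SEARCH_URL = 'https://www.jumia.com.ng/catalog/?q='  # catalog search endpoint used in this notebook


def build_search_url(search_term):
    """Encode the search term so spaces become '+' and special characters are escaped."""
    return SEARCH_URL + quote_plus(str(search_term).strip())


print(build_search_url('Phones'))
print(build_search_url('wireless ear buds'))
print(build_search_url('health & beauty'))
```

Unlike joining words with `+` by hand, `quote_plus` also escapes characters such as `&` that would otherwise be read as the start of a new query parameter.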

In [12]:
from urllib.parse import quote_plus

def list_frm_search(search_crt):
    header = 'https://jumia.com.ng'
    columns = ['product links', 'product name', 'price', 'product_rating', 'rating count']
    # Encode the search term: spaces become '+', special characters are escaped
    search_href = 'https://www.jumia.com.ng/catalog/?q=' + quote_plus(str(search_crt))
    rows = []  # one dict per product keeps every column the same length
    while True:
        search_res = requests.get(search_href)
        print(search_res.status_code)
        search_doc = BeautifulSoup(search_res.text, 'html.parser')
        # Each product card is the parent of a <div class="info">
        search_list = [info.parent for info in search_doc.find_all('div', {'class': 'info'})]

        for item in search_list:
            href = item.get('href')
            if not href:
                continue  # skip cards that carry no link
            prod_tag = item.find('h3', {'class': 'name'})
            stars_tag = item.find('div', {'class': 'rev'})
            price_tag = item.find('div', {'class': 'prc'})
            # Review text looks like '4.5 out of 5(247)'; the count keeps its trailing ')'
            rating, sep, count = stars_tag.text.partition('(') if stars_tag else ('nil_rating', '', '')
            rows.append({
                'product links': href if href.startswith('https') else header + href,
                'product name': prod_tag.text if prod_tag else 'no name',
                'price': price_tag.text if price_tag else 'no price',
                'product_rating': rating,
                'rating count': count if sep else 'Nil count',
            })
        next_page = search_doc.find('a', {'aria-label': 'Next Page'})
        if next_page is None:
            break
        search_href = header + next_page['href']

    return pd.DataFrame(rows, columns=columns)

In [13]:
Phone_search_list=list_frm_search('Phones')
Phone_search_list

200
(printed 50 times in total, once per page)


| | product links | product name | price | product_rating | rating count |
|---|---|---|---|---|---|
| 0 | https://jumia.com.ng/xiaomi-redmi-note-12-6.67... | XIAOMI Redmi Note 12 6.67'' 8GB 128GB ROM Andr... | ₦ 150,000 | nil_rating | Nil count |
| 1 | https://jumia.com.ng/samsung-galaxy-a03-core-a... | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | ₦ 56,000 | 4.4 out of 5 | 58) |
| 2 | https://jumia.com.ng/xiaomi-a1-plus-6.52-4g-lt... | XIAOMI A1 Plus, 6.52" 4G LTE, 2GB/32GB Memory,... | ₦ 51,510 | 4.5 out of 5 | 95) |
| 3 | https://jumia.com.ng/infinix-hot-20i-x665c-6.6... | Infinix HOT 20i- (X665C) 6.6" 2GB/64GB 4G - Blue | ₦ 68,478 | 4 out of 5 | 8) |
| 4 | https://jumia.com.ng/samsung-galaxy-a14-6.6-4g... | Samsung Galaxy A14 - 6.6" (4GB RAM, 128GB ROM)... | ₦ 121,570 | 4.1 out of 5 | 17) |
| ... | ... | ... | ... | ... | ... |
| 1995 | https://jumia.com.ng/generic-adapter-dac-to-3.... | Adapter DAC To 3.5mm MFi Lightning For IPhones... | ₦ 31,000 | nil_rating | Nil count |
| 1996 | https://jumia.com.ng/generic-rj45-to-adapter-l... | RJ45 To Adapter Lightning Ethernet For IPhone1... | ₦ 31,700 | nil_rating | Nil count |
| 1997 | https://jumia.com.ng/generic-replacement-new-t... | Replacement New Touch Screen Digitizer For Hua... | ₦ 21,849 | nil_rating | Nil count |
| 1998 | https://jumia.com.ng/generic-switch-on-off-pow... | Switch On Off Power Volume Button Flex Cable F... | ₦ 23,172 | nil_rating | Nil count |


Lastly, the reviews of each product returned by the search are scraped.

In [82]:
def text_or(tag, default):
    """Return the tag's text, or a placeholder when the tag was not found."""
    return tag.text if tag is not None else default


def get_reviews(links):
    base = 'https://jumia.com.ng'
    columns = ['seller', 'seller followers', 'product name', 'description', 'seller rating',
               'review rating', 'review date', 'review title', 'reviews']
    rows = []  # one dict per review, or one per product that has no reviews
    for link_href in links:
        pr_doc = BeautifulSoup(requests.get(link_href).text, 'html.parser')
        # Reset the seller fields for every product so values never leak
        # from the previous one when the seller block is missing
        seller = seller_rating = seller_followers = None
        seller_head = pr_doc.find('h2', {'class': '-m -fs14'})
        if seller_head and seller_head.parent and seller_head.parent.parent:
            # The seller details sit two levels above the seller heading
            seller_info = seller_head.parent.parent
            seller = seller_info.find('p', {'class': '-m -pbs'})
            seller_rating = seller_info.find('bdo', {'class': '-m -prxs'})
            seller_followers = seller_info.find('p', {'data-followers': 'true'})
        prod_name = pr_doc.find('h1', {'class': '-fs20 -pts -pbxs'})
        prod_desc = pr_doc.find('div', {'class': 'markup -mhm -pvl -oxa -sc'})
        # Product-level fields are repeated on every review row of this product
        product = {
            'seller': text_or(seller, 'No seller'),
            'seller followers': text_or(seller_followers, 'No seller followers'),
            'product name': text_or(prod_name, 'No Product name'),
            'description': text_or(prod_desc, 'No Product Description'),
            'seller rating': text_or(seller_rating, 'No seller rating'),
        }
        rev_link = pr_doc.find('a', {'class': 'btn _def _ti -mhs -fsh0'})
        if rev_link is None:
            # No review page: keep the product with placeholder review fields
            rows.append({**product, 'review rating': 'Nil rating', 'review date': 'Nil review date',
                         'review title': 'Nil review title', 'reviews': 'Nil Review'})
            continue
        rev_link = base + rev_link['href']
        while True:
            rev_doc = BeautifulSoup(requests.get(rev_link).text, 'html.parser')
            for rev in rev_doc.find_all('article', {'class': '-pvs -hr _bet'}):
                rows.append({
                    **product,
                    'review rating': text_or(rev.find('div', {'class': 'stars _m _al -mvs'}), 'Nil Rating'),
                    'review date': text_or(rev.find('span', {'class': '-prs'}), 'Nil date'),
                    'review title': text_or(rev.find('h3', {'class': '-m -fs16 -pvs'}), 'No title'),
                    'reviews': text_or(rev.find('p', {'class': '-pvs'}), 'Nil text'),
                })
            next_page = rev_doc.find('a', {'aria-label': 'Next Page'})
            if next_page is None:
                break
            rev_link = base + next_page['href']
    return pd.DataFrame(rows, columns=columns)
                         

In [83]:
product_list=Phone_search_list['product links'].values.tolist()
arr=product_list[:2]

In [84]:
arr

['https://jumia.com.ng/xiaomi-redmi-note-12-6.67-8gb-128gb-rom-android-13-onyx-gray-229788869.html',
 'https://jumia.com.ng/samsung-galaxy-a03-core-android-11-6.5-inches-screen-display-2gb-ram-32gb-internal-storage-5000mah-4g-lte-onyx-black-125729493.html']

In [85]:
get_reviews(arr)

| | seller | seller followers | product name | description | seller rating | review rating | review date | review title | reviews |
|---|---|---|---|---|---|---|---|---|---|
| 0 | HiTaste Communication | 1276 Followers | XIAOMI Redmi Note 12 6.67'' 8GB 128GB ROM Andr... | All Redmi Note 12 models feature an impressive... | 100% | Nil rating | Nil review date | Nil review title | Nil Review |
| 1 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 5 out of 5 | 08-05-2023 | I love it | A very good product with a long lasting battery |
| 2 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 4 out of 5 | 01-05-2023 | beautiful | Its beautiful and battery last long,i like and... |
| 3 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 5 out of 5 | 28-04-2023 | samsung galaxy A03 core | This mobile phone from Samsung is one of the b... |
| 4 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 5 out of 5 | 17-04-2023 | Good | Great buy |
| 5 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 4 out of 5 | 13-04-2023 | I like the products | Even though it's a bit heavy but I love the gr... |
| 6 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 4 out of 5 | 10-04-2023 | About Galaxy A03 phone | Product okay |
| 7 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 4 out of 5 | 05-04-2023 | Its Nice And Fast | Make the earpiece clearer, louder and softer. |
| 8 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 4 out of 5 | 31-03-2023 | GOOD | Only that it doesn't have fingerprint as speci... |
| 9 | SUN WORLDWIDE DISTRIBUTIONS LIMITED-AC | 640 Followers | Samsung Galaxy A03 Core Android 11 - 6.5 Inche... | Maximize your view to the fullestExpand your v... | 94% | 4 out of 5 | 21-03-2023 | good | It has been working fine |
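
The review fields also arrive as text. The sketch below shows one way they might be converted for analysis, assuming the day-month-year date format seen in the output above (for example `28-04-2023`). The helper names are hypothetical and only the standard library is used.

```python
from datetime import datetime


def parse_review_date(text):
    """'08-05-2023' (day-month-year) -> date; None for placeholders."""
    try:
        return datetime.strptime(text.strip(), '%d-%m-%Y').date()
    except ValueError:
        return None


def parse_percent(text):
    """'94%' -> 0.94; None when the text is not a percentage."""
    try:
        return float(text.strip().rstrip('%')) / 100
    except ValueError:
        return None


def parse_followers(text):
    """'1276 Followers' -> 1276; None when no leading number is present."""
    parts = text.replace(',', '').split()
    return int(parts[0]) if parts and parts[0].isdigit() else None


print(parse_review_date('08-05-2023'), parse_percent('94%'), parse_followers('1276 Followers'))
print(parse_review_date('Nil review date'), parse_percent('No seller rating'),
      parse_followers('No seller followers'))
```

Each helper returns `None` for the placeholder strings, so missing values stay distinguishable from real zeros.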
