# **Scraping Script:** *Scraping One Product*
## Identifying the website target
The project seeks to analyze Kenyan online platforms and help companies or sellers understand whether reviews reflect true product quality, or whether they are false, overly biased or misleading. Of all the online platforms in Kenya, we chose Jumia Kenya as our target website. It is not only the biggest e-commerce website in Kenya but also the leading pan-African e-commerce platform, active in 9 countries across the continent, which points to a large user base.

The website therefore provides a large pool of reviews to analyze. This allows for more robust statistical analysis and reduces the chance of drawing inaccurate conclusions from limited data.

## Inspecting the website structure
To inspect the website structure and scrape it, we will use the requests and BeautifulSoup libraries.

Before writing the scraper, we inspect the page to identify exactly which elements hold the data we need and where they sit in the HTML.
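Browser developer tools are the usual way to do this inspection. A programmatic complement is to count how often each (tag, class) pair appears on a page. Elements that repeat once per review, such as the review `<article>` blocks found further down, tend to rise to the top of that count. The sketch below assumes BeautifulSoup is installed. The helper name `class_profile` is hypothetical, and the sample HTML is a made-up stand-in for the real page.

```python
from collections import Counter

from bs4 import BeautifulSoup


def class_profile(html, top=5):
    """Count (tag name, class string) pairs; frequent pairs hint at repeated items."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter(
        (tag.name, " ".join(tag.get("class", [])))
        for tag in soup.find_all(True)  # True matches every tag
        if tag.get("class")             # skip tags without a class attribute
    )
    return counts.most_common(top)


# Made-up page fragment: one header plus three review-like blocks.
sample = '<div class="header"></div>' + (
    '<article class="-pvs -hr _bet"><h3 class="-m -fs16 -pvs">x</h3></article>' * 3
)
for (name, classes), n in class_profile(sample):
    print(name, classes, n)
```

On the live page, the equivalent call would be `class_profile(page.text)` after the request in the cells below. The counts will depend on Jumia's current markup.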


In [1]:
# Import scraping dependencies.
from bs4 import BeautifulSoup
import requests



In [2]:
# First test url
url = 'https://www.jumia.co.ke/catalog/productratingsreviews/sku/SO460HA3N497WNAFAMZ/'
headers = {"User-Agent": "Mozilla/5.0"}

# Send a GET request for the reviews page
page = requests.get(url, headers=headers)
page

<Response [200]>

In [3]:
# Parse the HTML. soup1 re-parses the prettified markup, which adds whitespace around every text node.
soup = BeautifulSoup(page.content, 'html.parser')
soup1 = BeautifulSoup(soup.prettify(), 'html.parser')
soup1

<!DOCTYPE html>

<html dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>
   Reviews of Stainless Steel 1.8Ltr Electric Kettle,1.5m Power Cord,1500W Fast Boil 1YR WRTY
  </title>
<meta content="product" property="og:type"/>
<meta content="Jumia Kenya" property="og:site_name"/>
<meta content="Reviews of Stainless Steel 1.8Ltr Electric Kettle,1.5m Power Cord,1500W Fast Boil 1YR WRTY" property="og:title"/>
<meta content="/catalog/productratingsreviews/sku/SO460HA3N497WNAFAMZ/" property="og:url"/>
<meta content="https://ke.jumia.is/cms/icons/jumialogo-x-4.png" property="og:image"/>
<meta content="en_KE" property="og:locale"/>
<meta content="Reviews of Stainless Steel 1.8Ltr Electric Kettle,1.5m Power Cord,1500W Fast Boil 1YR WRTY" name="title"/>
<meta content="noindex,follow" name="robots"/>
<meta content="2099414773624117" property="fb:app_id"/>
<meta content="395103207248808" property="fb:pages"/>
<meta content="n_VPq2qj81eefHQXQuWUQcpCjf22dILtzJ-5fqwv3cY" name="google-site-veri

The response is a standard HTML document, so the elements we need can be located by their tag names and class attributes.
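The second parse (`soup1`) re-reads the output of `prettify()`, which places every tag and text node on its own indented line. That whitespace ends up inside each element's text, which is why the cells below strip `'\n '` from every value. The sketch uses a made-up one-line snippet modelled on the review markup. It shows the effect, and shows that `get_text(strip=True)` on the original parse gives the clean value directly. The variable names are chosen so they do not overwrite the notebook's `soup` and `soup1`.

```python
from bs4 import BeautifulSoup

# Illustrative snippet, not a copy of a real Jumia page.
html = '<article class="-pvs -hr _bet"><h3 class="-m -fs16 -pvs">Love it</h3></article>'

demo = BeautifulSoup(html, "html.parser")
# Re-parsing the prettified string bakes its newlines and indentation into the text.
demo_pretty = BeautifulSoup(demo.prettify(), "html.parser")

raw = demo_pretty.find("h3").text                     # text padded with whitespace
clean = demo_pretty.find("h3").get_text(strip=True)   # whitespace removed
direct = demo.find("h3").get_text(strip=True)         # same result without the re-parse
print(repr(raw), repr(clean), repr(direct))
```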

In [4]:
# Each review is wrapped in an <article> tag with class "-pvs -hr _bet".
customer_feedback = soup1.find_all('article', class_="-pvs -hr _bet")
customer_feedback

[<article class="-pvs -hr _bet">
 <div class="stars _m _al -mvs">
            4 out of 5
            <div class="in" style="width:80%">
 </div>
 </div>
 <h3 class="-m -fs16 -pvs">
            Love it
           </h3>
 <p class="-pvs">
            So far so good just as ordered
           </p>
 <div class="-df -j-bet -i-ctr -gy5">
 <div class="-pvs">
 <span class="-prs">
              20-06-2025
             </span>
 <span>
              by Damaris
             </span>
 </div>
 <div class="-df -i-ctr -gn5 -fsh0">
 <svg class="ic -f-gn5" height="22" viewbox="0 0 24 24" width="22">
 <use xlink:href="https://www.jumia.co.ke/assets_he/images/i-icons.77a720d0.svg#check-verified">
 </use>
 </svg>
             Verified Purchase
            </div>
 </div>
 </article>,
 <article class="-pvs -hr _bet">
 <div class="stars _m _al -mvs">
            5 out of 5
            <div class="in" style="width:100%">
 </div>
 </div>
 <h3 class="-m -fs16 -pvs">
            worthy
           </h3>
 <p class="-p
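Because each review lives in its own `<article>`, one option is to read every field relative to that article instead of collecting separate page-wide lists. A missing element then shows up as `None` for that review rather than shifting the other values. The sketch below assumes the class names seen in this output; Jumia may change them at any time. The helper `parse_review` is hypothetical. The sample is a trimmed copy of the first review above, plus a second review cut down so that it lacks a body, date and badge.

```python
from bs4 import BeautifulSoup


def parse_review(article):
    """Extract one review's fields from an <article> element (None if missing)."""
    def text_or_none(tag):
        return tag.get_text(strip=True) if tag is not None else None

    date = article.find("span", class_="-prs")
    # The author sits in the span right after the date, e.g. "by Damaris".
    author = date.find_next_sibling("span") if date is not None else None
    badge = article.find(string=lambda s: s is not None and "Verified Purchase" in s)
    return {
        "rating": text_or_none(article.find("div", class_="stars")),
        "title": text_or_none(article.find("h3")),
        "body": text_or_none(article.find("p")),
        "date": text_or_none(date),
        "author": text_or_none(author),
        "verified": badge is not None,
    }


sample_html = """
<article class="-pvs -hr _bet">
  <div class="stars _m _al -mvs">4 out of 5<div class="in" style="width:80%"></div></div>
  <h3 class="-m -fs16 -pvs">Love it</h3>
  <p class="-pvs">So far so good just as ordered</p>
  <div class="-pvs"><span class="-prs">20-06-2025</span><span>by Damaris</span></div>
  <div class="-df -i-ctr -gn5 -fsh0">Verified Purchase</div>
</article>
<article class="-pvs -hr _bet">
  <div class="stars _m _al -mvs">5 out of 5</div>
  <h3 class="-m -fs16 -pvs">worthy</h3>
</article>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
reviews = [parse_review(a) for a in sample_soup.find_all("article", class_="-pvs -hr _bet")]
```

Applied to the live page, the same comprehension would run over `customer_feedback`.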

In [5]:
# Accessing the ratings from the page
ratings = soup1.find_all('div', class_="stars _m _al -mvs")
rates = [rate.text for rate in ratings]
rates = [rate.strip('\n ') for rate in rates]
rates


['4 out of 5',
 '5 out of 5',
 '5 out of 5',
 '5 out of 5',
 '4 out of 5',
 '3 out of 5',
 '5 out of 5',
 '5 out of 5',
 '4 out of 5',
 '5 out of 5']
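The ratings come back as text such as "4 out of 5". Statistical analysis needs them as numbers. The sketch below uses a small regex-based converter on the ten values printed above. The helper name `parse_rating` is hypothetical. It returns `None` instead of failing if the wording on the page ever changes.

```python
import re

# Rating strings as printed by the cell above.
rates = ['4 out of 5', '5 out of 5', '5 out of 5', '5 out of 5', '4 out of 5',
         '3 out of 5', '5 out of 5', '5 out of 5', '4 out of 5', '5 out of 5']

RATING_RE = re.compile(r"(\d+)\s+out\s+of\s+\d+")


def parse_rating(text):
    """Return the star count from text like '4 out of 5', or None if it doesn't match."""
    match = RATING_RE.search(text)
    return int(match.group(1)) if match else None


scores = [parse_rating(r) for r in rates]
valid = [s for s in scores if s is not None]
average = sum(valid) / len(valid)  # 4.5 for these ten reviews
```

The inner `div` of each star block also encodes the score as a width (`width:80%` for 4 out of 5 in the earlier output), which could serve as a cross-check.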

In [6]:
# Accessing the review titles from the page.
review_text = soup1.find_all('h3', class_="-m -fs16 -pvs")
review_text = [review.text for review in review_text]
review_text = [review.strip('\n ') for review in review_text]
review_text

['Love it',
 'worthy',
 'Value for your money.',
 'kettle',
 "It's awesome",
 'could change',
 '10/10',
 'its faster',
 'stick it together',
 'Good']
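These cells pass the full class string (for example `"-m -fs16 -pvs"`) to `class_`. BeautifulSoup treats a multi-class string as an exact match on the attribute value, so the same classes in a different order would be missed. A CSS selector through `select()` only requires each class to be present. The snippet below is made up to show the difference.

```python
from bs4 import BeautifulSoup

# Two made-up headings carrying the same classes in a different order.
demo = BeautifulSoup(
    '<h3 class="-pvs -m -fs16">A</h3><h3 class="-m -fs16 -pvs">B</h3>',
    "html.parser",
)

# Exact string match on the class attribute: the reordered heading "A" is skipped.
exact = demo.find_all("h3", class_="-m -fs16 -pvs")

# CSS selector: every listed class must be present, in any order.
css = demo.select("h3.-m.-fs16.-pvs")

print([t.get_text() for t in exact])  # ['B']
print([t.get_text() for t in css])    # ['A', 'B']
```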

In [7]:
# Access the review dates from the page
review_date = soup1.find_all('span', class_="-prs")
review_date = [date.text for date in review_date]
review_date = [date.strip('\n ') for date in review_date]
review_date

['20-06-2025',
 '20-06-2025',
 '19-06-2025',
 '19-06-2025',
 '19-06-2025',
 '19-06-2025',
 '18-06-2025',
 '18-06-2025',
 '18-06-2025',
 '18-06-2025']
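The dates are plain strings in day-month-year order; "20-06-2025" cannot be month-first. Converting them with `datetime.strptime` makes sorting and date-range filtering straightforward. The format string `"%d-%m-%Y"` is inferred from the output above rather than from any Jumia documentation.

```python
from datetime import datetime

# Date strings as printed by the cell above.
review_date = ['20-06-2025', '20-06-2025', '19-06-2025', '19-06-2025', '19-06-2025',
               '19-06-2025', '18-06-2025', '18-06-2025', '18-06-2025', '18-06-2025']

# Day-month-year, matching the format shown on the page.
parsed_dates = [datetime.strptime(d, "%d-%m-%Y").date() for d in review_date]

newest, oldest = max(parsed_dates), min(parsed_dates)
span_days = (newest - oldest).days  # number of days covered by this page
```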

In [8]:
# Accessing the verification tag from the page
verified = soup1.find_all('div', class_="-df -i-ctr -gn5 -fsh0")
verified = [tag.text for tag in verified]
verified = [tag.strip('\n ') for tag in verified]
verified

['Verified Purchase',
 'Verified Purchase',
 'Verified Purchase',
 'Verified Purchase',
 'Verified Purchase',
 'Verified Purchase',
 'Verified Purchase',
 'Verified Purchase',
 'Verified Purchase',
 'Verified Purchase']
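Every review on this page carries the "Verified Purchase" badge, so all four lists happen to contain ten items. On a page where some reviews lack the badge, `verified` would be shorter. Pairing the lists by position would then attach badges to the wrong reviews. Before combining parallel lists, a length check like the hypothetical `combine_columns` helper below fails loudly instead. Reading each field inside its own `customer_feedback` article avoids the problem altogether.

```python
# Lists as printed by the cells above.
rates = ['4 out of 5', '5 out of 5', '5 out of 5', '5 out of 5', '4 out of 5',
         '3 out of 5', '5 out of 5', '5 out of 5', '4 out of 5', '5 out of 5']
review_text = ['Love it', 'worthy', 'Value for your money.', 'kettle', "It's awesome",
               'could change', '10/10', 'its faster', 'stick it together', 'Good']
review_date = ['20-06-2025', '20-06-2025', '19-06-2025', '19-06-2025', '19-06-2025',
               '19-06-2025', '18-06-2025', '18-06-2025', '18-06-2025', '18-06-2025']
verified = ['Verified Purchase'] * 10


def combine_columns(**columns):
    """Zip equally long lists into row dicts; raise if any list length differs."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    names = list(columns)
    return [dict(zip(names, row)) for row in zip(*columns.values())]


rows = combine_columns(rating=rates, title=review_text, date=review_date, verified=verified)
```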

Now that we know where each element sits in the page, we will write a script for batch scraping.
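One possible shape for that script is sketched below. It reuses the User-Agent header and the selectors found above, fetches several review pages, and tags each row with its source URL. Several details are illustrative choices, not part of the original notebook:

- the helper names `fetch_html`, `extract_reviews` and `scrape_review_pages`;
- the 30-second timeout;
- the one-second delay between requests.

How Jumia paginates its review pages is not covered here, so building the list of URLs is left to the caller. The fetcher is passed in as a parameter, so the parsing logic can be checked offline with canned HTML.

```python
import time

from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0"}  # same header as the request above


def fetch_html(url):
    """Default fetcher: GET the page and return its HTML text."""
    import requests  # imported here so the parsing code runs without a network stack

    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return response.text


def _text(tag):
    """Stripped text of a tag, or None when the element is missing."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_reviews(html):
    """Turn one reviews page into a list of dicts, one per review <article>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article", class_="-pvs -hr _bet"):
        badge = article.find(string=lambda s: s is not None and "Verified Purchase" in s)
        rows.append({
            "rating": _text(article.find("div", class_="stars")),
            "title": _text(article.find("h3")),
            "date": _text(article.find("span", class_="-prs")),
            "verified": badge is not None,
        })
    return rows


def scrape_review_pages(urls, fetch=fetch_html, delay=1.0):
    """Fetch and parse several reviews pages, tagging each row with its URL."""
    all_rows = []
    for i, url in enumerate(urls):
        if i > 0 and delay:
            time.sleep(delay)  # pause between requests to avoid hammering the site
        for row in extract_reviews(fetch(url)):
            row["url"] = url
            all_rows.append(row)
    return all_rows


# Offline check with a stand-in fetcher; real runs would pass review-page URLs.
fake_pages = {
    "page-a": '<article class="-pvs -hr _bet"><div class="stars _m _al -mvs">4 out of 5</div>'
              '<h3>Love it</h3><span class="-prs">20-06-2025</span>Verified Purchase</article>',
    "page-b": '<article class="-pvs -hr _bet"><div class="stars _m _al -mvs">3 out of 5</div>'
              '<h3>could change</h3></article>',
}
rows = scrape_review_pages(list(fake_pages), fetch=fake_pages.get, delay=0)
```

Injecting `fetch` keeps the network call separate from the parsing. It also makes it easy to add retries or caching later without touching `extract_reviews`.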