# 🛒 Jumia Product Scraper & Sentiment Analysis

## 📌 Project Summary

This project is a **web scraping and sentiment analysis pipeline** designed to extract product details and customer reviews from **[Jumia Egypt](https://www.jumia.com.eg/)**.

It uses **BeautifulSoup** to parse product pages, collects **ratings, reviews, and metadata (product name, SKU, URL)**, and applies **sentiment analysis** to customer reviews to classify them as **positive, negative, or neutral**.

Finally, the data is exported into a **structured CSV file** for further analysis.

---

## ⚙️ Features

* 🔗 **Scrape product links** from multiple catalog pages
* 🏷 **Extract product details** (name, SKU, URL, overall rating)
* ⭐ **Collect customer reviews** (review text, rating, date, author)
* 😀 **Perform sentiment analysis** using **TextBlob**
* 🌍 (Optional) **Translate reviews** with **MarianMT** for multilingual analysis
* 📊 **Save structured data** into CSV for reporting & analytics

---

## 🛠 Tech Stack

* **Python** 🐍
* **Requests** → Sending HTTP requests
* **BeautifulSoup** → HTML parsing
* **Pandas** → Data handling & CSV export
* **TextBlob** → Sentiment analysis
* **Transformers (MarianMT)** → Neural machine translation
* **Torch** → Deep learning model support

---

## 📂 Output

The scraper produces a **CSV file** with the following fields:

| product\_name | url | sku | overall\_rating | rating | headline | review | sentiment | date | author |
| ------------- | --- | --- | --------------- | ------ | -------- | ------ | --------- | ---- | ------ |

---

## 🚀 Use Cases

* 🛍 **Market Research** → Understand customer opinions on products
* 📈 **Business Insights** → Track product ratings and sentiment trends
* 🤖 **NLP Datasets** → Build training data for sentiment classification models
* 💡 **E-commerce Analytics** → Compare competitors & product feedback
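
As a rough illustration of the sentiment-trend use case, the sketch below counts sentiment labels per SKU from a CSV with the same `sku` and `sentiment` columns as the output file. The sample rows and the helper name `sentiment_counts` are hypothetical and only there to make the example self-contained.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows that reuse column names from the scraper's CSV.
SAMPLE_CSV = """product_name,sku,sentiment
Product A,SKU1,positive
Product A,SKU1,neutral
Product B,SKU2,negative
Product A,SKU1,positive
"""

def sentiment_counts(csv_file):
    """Return {sku: Counter({sentiment: count})} for an open CSV file."""
    counts = defaultdict(Counter)
    for row in csv.DictReader(csv_file):
        counts[row["sku"]][row["sentiment"]] += 1
    return counts

counts = sentiment_counts(io.StringIO(SAMPLE_CSV))
for sku, counter in counts.items():
    print(sku, dict(counter))
```

To run this on the real export, the file would be opened with `open("jumia_reviews.csv", newline="", encoding="utf-8-sig")`, since the scraper writes it with a UTF-8 byte order mark.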

---

👉 This project can be extended to include **price tracking, competitor monitoring, and advanced NLP translation** for full multilingual review analysis.


### ✅ Importing Libraries

In [4]:
import requests                           # Send HTTP requests
from bs4 import BeautifulSoup              # Parse HTML
import pandas as pd                        # Handle data & save as CSV
import time                                # Delay requests to avoid blocking
from textblob import TextBlob              # Simple sentiment analysis
from transformers import MarianMTModel, MarianTokenizer   # Translation
import torch                               # Needed for MarianMT


### ✅ Defining Functions

In [23]:
# Define the request headers and the base catalog search URL
headers = {"User-Agent": "Mozilla/5.0"}

url = "https://www.jumia.com.eg/catalog/?q="
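
The base URL ends in an empty `q=` parameter, so a search term presumably has to be filled in before scraping a specific catalog. A minimal sketch of building such a URL with the standard library follows; the helper name `build_catalog_url` is hypothetical, and it assumes the `page` parameter works the way `get_product_links` uses it.

```python
from urllib.parse import urlencode

BASE_CATALOG_URL = "https://www.jumia.com.eg/catalog/"

def build_catalog_url(query, page=1):
    """Return a catalog search URL with the query percent-encoded."""
    params = {"q": query, "page": page}
    return f"{BASE_CATALOG_URL}?{urlencode(params)}"

print(build_catalog_url("baby diapers", page=2))
# https://www.jumia.com.eg/catalog/?q=baby+diapers&page=2
```

Encoding the query this way also keeps Arabic search terms and spaces valid inside the URL.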

In [None]:
# let's make a function to get product links from category pages
def get_product_links(category_url, pages=1):
    """
    Get product links from Jumia category pages.
    """
    links = set() # Use a set to avoid duplicates
    
    for p in range(1, pages + 1):
        url = f"{category_url}&page={p}"
        resp = requests.get(url, headers=headers)  # Pass headers by keyword; the second positional argument is params
        soup = BeautifulSoup(resp.text, "html.parser") # Parse the HTML
        
        for a in soup.select("a.core"):  # Product cards
            href = a.get("href")
            if href:
                links.add("https://www.jumia.com.eg" + href.split("?")[0]) # Full URL without query params
        
        time.sleep(1)  # Pause between pages to avoid being blocked
    
    return list(links)
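
The link-building step concatenates the site root with each `href` and cuts the string at `?`. A slightly more defensive variant, sketched here with `urllib.parse`, also handles hrefs that are already absolute and drops URL fragments. The helper name `normalize_product_link` is an assumption for illustration and is not part of the original scraper.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

SITE_ROOT = "https://www.jumia.com.eg"

def normalize_product_link(href, root=SITE_ROOT):
    """Resolve href against the site root and drop its query string and fragment."""
    absolute = urljoin(root, href)
    parts = urlsplit(absolute)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

hrefs = [
    "/dice-set-of-5-boxer-basic-116913990.html?ref=catalog",
    "https://www.jumia.com.eg/dice-set-of-5-boxer-basic-116913990.html#reviews",
]
# A set removes duplicates once the links are normalized.
unique_links = {normalize_product_link(h) for h in hrefs}
print(unique_links)
```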


In [None]:
# the next function will extract SKU from product page
def extract_sku(soup): 
    """
    Extract SKU from a product page.
    """
    ul_tag = soup.find("ul", class_="-pvs -mvxs -phm -lsn")
    if ul_tag:
        for li in ul_tag.find_all("li"):
            if "SKU" in li.text:
                return li.text.split(":")[-1].strip()
    return None
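
`extract_sku` relies on a list item whose text contains `SKU` followed by a colon and the value. The text-parsing part can be sketched on its own with a regular expression, which makes it easy to check without fetching a page. The helper name `parse_sku` and the sample strings are hypothetical.

```python
import re

# Matches "SKU", an optional space, a colon, then the first run of non-space characters.
SKU_PATTERN = re.compile(r"SKU\s*:\s*(\S+)")

def parse_sku(text):
    """Return the SKU found in a 'SKU: <value>' string, or None if absent."""
    match = SKU_PATTERN.search(text)
    return match.group(1) if match else None

print(parse_sku("SKU: DI195MW1TG81LNAFAMZ"))  # DI195MW1TG81LNAFAMZ
print(parse_sku("Weight (kg): 0.3"))          # None
```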


In [None]:
# let's make a function to extract reviews and rating using SKU
def extract_reviews_and_rating(sku):
    """
    Extract rating and customer reviews using SKU.
    """
    url = f"https://www.jumia.com.eg/catalog/productratingsreviews/sku/{sku}/"
    resp = requests.get(url, headers=headers)  # Pass headers by keyword; the second positional argument is params
    if resp.status_code != 200: # If request failed
        return None, [] # Return None and empty list
    
    soup = BeautifulSoup(resp.text, "html.parser")
    
    # Overall rating
    rating = soup.select_one(".stars")
    rating_text = rating.get_text(strip=True) if rating else None
    
    # Reviews list
    reviews = []
    for article in soup.select("article.-pvs.-hr._bet"): # using for loop to get all reviews
        stars = article.select_one(".stars")
        headline = article.select_one("h3")
        text = article.select_one("p")
        spans = article.select("span")
        
        raw_review = text.get_text(strip=True) if text else None # Raw review text
        
        # Sentiment with TextBlob
        polarity = TextBlob(raw_review).sentiment.polarity if raw_review else 0 # Polarity score
        if polarity > 0.1:
            sentiment = "positive"
        elif polarity < -0.1:
            sentiment = "negative"
        else:
            sentiment = "neutral"
        
        reviews.append({ # Append review details to list
            "rating": stars.get_text(strip=True) if stars else None,
            "headline": headline.get_text(strip=True) if headline else None,
            "review": raw_review,
            "sentiment": sentiment,
            "date": spans[0].get_text(strip=True) if len(spans) > 0 else None,
            "author": spans[1].get_text(strip=True) if len(spans) > 1 else None,
        })
    
    return rating_text, reviews
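
The labeling rule above maps a polarity score to a label using a ±0.1 threshold. TextBlob's default analyzer is built on an English lexicon, so Arabic reviews will generally score 0 and be labeled neutral, which is consistent with the sample output further down. The sketch below isolates the threshold rule and adds a hypothetical `needs_translation` check that flags Arabic-script text so it could be routed through the optional MarianMT step before scoring. Both helper names are assumptions for illustration.

```python
import re

# Basic Arabic Unicode block; a rough heuristic, not a full language detector.
ARABIC_CHARS = re.compile(r"[\u0600-\u06FF]")

def label_polarity(polarity, threshold=0.1):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

def needs_translation(text):
    """Return True when the text contains Arabic-script characters."""
    return bool(text) and ARABIC_CHARS.search(text) is not None

print(label_polarity(0.5))                # positive
print(needs_translation("ممتاز انصح به"))  # True
print(needs_translation("good"))          # False
```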


In [None]:
# the main function to scrape product details
def scrape_product(url):
    """
    Scrape one product page: name, SKU, rating, reviews.
    """
    print("[INFO] Scraping:", url)
    resp = requests.get(url, headers=headers)  # Pass headers by keyword; the second positional argument is params
    soup = BeautifulSoup(resp.text, "html.parser")
    
    # Product name
    name = soup.select_one("h1.-fs20")
    name = name.get_text(strip=True) if name else None
    
    # SKU
    sku = extract_sku(soup)
    if not sku:
        print("[WARN] No SKU found for:", url)
        return None
    
    # Get reviews & rating
    rating, reviews = extract_reviews_and_rating(sku)
    
    return {"name": name, "url": url, "sku": sku, "rating": rating, "reviews": reviews}


In [None]:
# we need a function to save data to CSV
def save_csv(data, fname="jumia_reviews.csv"):
    """
    Save scraped data to CSV.
    """
    rows = []
    for p in data:
        revs = p["reviews"] or [{}]
        for r in revs:
            rows.append({
                "product_name": p["name"],
                "url": p["url"],
                "sku": p["sku"],
                "overall_rating": p["rating"],
                "rating": r.get("rating"),
                "headline": r.get("headline"),
                "review": r.get("review"),
                "sentiment": r.get("sentiment"),
                "date": r.get("date"),
                "author": r.get("author")
            })
    
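    # dropna() removes every row with a missing field, including placeholder rows for products without reviews
    df = pd.DataFrame(rows).dropna()
    df.to_csv(fname, index=False, encoding="utf-8-sig")
    print(f"[INFO] Saved {len(df)} rows to {fname}")  # Report the rows actually written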
    return df
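
`save_csv` writes one row per review and uses a `[{}]` placeholder for products without reviews, but the later `dropna()` call removes those placeholder rows again. The plain-Python sketch below approximates that behavior on hypothetical sample data, treating `None` as the only missing value. The helper names `flatten_products` and `drop_incomplete` are illustrative and not part of the original code.

```python
def flatten_products(products):
    """Return one row per review; products without reviews get one placeholder row."""
    rows = []
    for product in products:
        for review in product["reviews"] or [{}]:
            rows.append({
                "product_name": product["name"],
                "sku": product["sku"],
                "review": review.get("review"),
                "sentiment": review.get("sentiment"),
            })
    return rows

def drop_incomplete(rows):
    """Keep only rows where no field is None (roughly what dropna() does by default)."""
    return [row for row in rows if all(value is not None for value in row.values())]

# Hypothetical sample data in the shape returned by scrape_product.
sample = [
    {"name": "Product A", "sku": "SKU1",
     "reviews": [{"review": "good", "sentiment": "positive"}]},
    {"name": "Product B", "sku": "SKU2", "reviews": []},
]

rows = flatten_products(sample)
kept = drop_incomplete(rows)
print(len(rows), len(kept))  # 2 1
```

If products without reviews should stay in the CSV, one option would be to drop only rows that lack a SKU instead of calling `dropna()` on every column.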


In [None]:
# let's run the scraper
if __name__ == "__main__":
    links = get_product_links(url, pages=10)   # Collect 10 pages
    data = []
    
    for link in links:
        result = scrape_product(link)
        if result:
            data.append(result)
        time.sleep(1)
    
    df = save_csv(data)


[INFO] Scraping: https://www.jumia.com.eg/dice-set-of-5-boxer-basic-116913990.html
[INFO] Scraping: https://www.jumia.com.eg/honor-x8c-6.7-inches-dual-sim-4g-512gb8gb-mobile-phone-moonlight-white-free-earbuds-x6-white-131555399.html
[INFO] Scraping: https://www.jumia.com.eg/bundle-of-six-printed-underwear-for-women-malika-mpg836203.html
[INFO] Scraping: https://www.jumia.com.eg/55-du7000-crystal-uhd-4k-tv-samsung-mpg1155055.html
[INFO] Scraping: https://www.jumia.com.eg/aloe-eva-hair-conditioner-with-aloe-vera-moroccan-argan-oil-230-ml-106770647.html
[INFO] Scraping: https://www.jumia.com.eg/activ-leather-low-cut-sneakers-dark-blue-turquoise-132388913.html
[INFO] Scraping: https://www.jumia.com.eg/carefree-panty-liners-cotton-fresh-scent-56-pcs-99951586.html
[INFO] Scraping: https://www.jumia.com.eg/freshdays-sanitary-napkins-long-scented-pantyliners-72-pads-22987028.html
[INFO] Scraping: https://www.jumia.com.eg/lc-waikiki-crew-neck-boy-t-shirt-131356286.html
[INFO] Scraping: https://

In [30]:
df

| Unnamed: 0 | product\_name | url | sku | overall\_rating | rating | headline | review | sentiment | date | author |
| ---------- | ------------- | --- | --- | --------------- | ------ | -------- | ------ | --------- | ---- | ------ |
| 0 | Dice Set Of (5) Boxer Basic | https://www.jumia.com.eg/dice-set-of-5-boxer-b... | DI195MW1TG81LNAFAMZ | 4.3 out of 5 | 4 out of 5 | الخامه بمنتهى الراحه والجمال فقط الأستك غير مريح | تغير الأستك اللى خامه افضل وانعم | neutral | 17-08-2025 | by Eslam |
| 1 | Dice Set Of (5) Boxer Basic | https://www.jumia.com.eg/dice-set-of-5-boxer-b... | DI195MW1TG81LNAFAMZ | 4.3 out of 5 | 5 out of 5 | good | خامه نضيف وتقفيل كويس | neutral | 17-08-2025 | by Ahmad |
| 2 | Dice Set Of (5) Boxer Basic | https://www.jumia.com.eg/dice-set-of-5-boxer-b... | DI195MW1TG81LNAFAMZ | 4.3 out of 5 | 5 out of 5 | ممتاز | ممتاز انصح به | neutral | 06-08-2025 | by محمد |
| 3 | Dice Set Of (5) Boxer Basic | https://www.jumia.com.eg/dice-set-of-5-boxer-b... | DI195MW1TG81LNAFAMZ | 4.3 out of 5 | 5 out of 5 | جميل اوى | حلوة اوى | neutral | 18-06-2025 | by احمد جابر سعد |
| 4 | Honor X8c – 6.7 inches Dual SIM 4G 512GB/8GB M... | https://www.jumia.com.eg/honor-x8c-6.7-inches-... | HO474MP3AOVP5NAFAMZ | 5 out of 5 | 5 out of 5 | وصل بالوقت المناسب والمنتج جدا رائع | الموبايل رائع جدا، وكان السعر الأنسب في جوميا | neutral | 16-07-2025 | by Saad |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2087 | Kiddy Large Baby Diapers Size 4 - 64 Pcs | https://www.jumia.com.eg/kiddy-large-baby-diap... | BR032HB18SLRFNAFAMZ | 4.1 out of 5 | 4 out of 5 | جيد | المقاس مظبوط مش صغير زي ما بعض الريفيوهات بتقول | neutral | 10-07-2025 | by Mustafa Suroor |
| 2088 | Kiddy Large Baby Diapers Size 4 - 64 Pcs | https://www.jumia.com.eg/kiddy-large-baby-diap... | BR032HB18SLRFNAFAMZ | 4.1 out of 5 | 5 out of 5 | رائع | حلو جدااااا استخدمت منه كتير ومازلت بطلبه كل ل... | neutral | 24-06-2025 | by ايناس |
| 2089 | Kiddy Large Baby Diapers Size 4 - 64 Pcs | https://www.jumia.com.eg/kiddy-large-baby-diap... | BR032HB18SLRFNAFAMZ | 4.1 out of 5 | 5 out of 5 | ممتاز | ممتاز كقيمة مقابل سعر | neutral | 23-06-2025 | by Hany |
| 2090 | Kiddy Large Baby Diapers Size 4 - 64 Pcs | https://www.jumia.com.eg/kiddy-large-baby-diap... | BR032HB18SLRFNAFAMZ | 4.1 out of 5 | 5 out of 5 | تحفة | جميل و القطن ناشف و مش بيكلكع مع الاستخدام \nش... | neutral | 22-06-2025 | by Abeer |
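
The exported fields are still raw strings: ratings look like `4.3 out of 5`, authors carry a `by ` prefix, and dates use a day-month-year layout. A minimal cleaning sketch, assuming those formats hold for every row, could look like the following. The helper names are hypothetical.

```python
import re
from datetime import datetime

RATING_PATTERN = re.compile(r"\s*(\d+(?:\.\d+)?)\s+out of\s+\d+")

def parse_rating(text):
    """Convert '4.3 out of 5' to 4.3; return None when the format does not match."""
    match = RATING_PATTERN.match(text or "")
    return float(match.group(1)) if match else None

def clean_author(text):
    """Strip the leading 'by ' prefix from an author field."""
    if text and text.startswith("by "):
        return text[3:].strip()
    return text

def parse_date(text):
    """Parse a 'DD-MM-YYYY' string into a datetime.date."""
    return datetime.strptime(text, "%d-%m-%Y").date()

print(parse_rating("4.3 out of 5"))  # 4.3
print(clean_author("by Eslam"))      # Eslam
print(parse_date("17-08-2025"))      # 2025-08-17
```

With numeric ratings and real dates, the CSV becomes easier to sort, filter, and aggregate in pandas.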
