# **Reviews vs Reality.**

## **Business Understanding.**
### Background.
E-commerce is slowly but steadily becoming one of Kenya's economic backbones. Businesses are shifting from traditional trade, which meant setting up a shop and waiting for customers to walk in, to digital trade, where all a seller needs to do is open an account on a social media platform, post what they are selling, and wait for notifications from interested buyers. There is no need for a physical shop and no pressure: an internet connection and the comfort of home are enough.

However, strong growth in an industry brings stiff competition. Every seller wants to be the best, to sell the most, and to earn the most, which pushes some sellers to the extreme of buying fake reviews.

### Problem Statement.
In Kenya, entrepreneurs both young and old are turning to platforms such as Jumia, Kilimall, Jiji, and PigiaMe to sell products and make ends meet. For many, it is not only a business but a means of survival.

The rise of fake reviews, however, threatens honest businesspeople. Beyond individual reports, some companies have openly stated that they do, in fact, sell online reviews. This practice creates an unfair playing field, because vendors who buy reviews are often better funded than those who do not, and it denies struggling vendors a way to earn a living.

As a result of fake reviews:
- Good products go unseen. Platform algorithms normally surface products with high ratings, so genuine sellers who rely on real customer feedback get buried in search results.

- Honest sellers lose customers, because products with few reviews are perceived as lower quality.

Online platforms need a way to detect mismatches between review text and star ratings, so that they can surface truly trustworthy sellers and protect buyers and sellers alike. This project seeks to help both groups judge whether reviews can be trusted.

### Project Objectives.
This project aims to:
- Use Natural Language Processing and sentiment analysis to spot suspicious products whose ratings do not match what people are really saying.
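
To make the idea of a rating/sentiment mismatch concrete, here is a minimal sketch. It is not the project's model: the word lists `POSITIVE` and `NEGATIVE`, the functions `sentiment_score` and `is_mismatch`, and the rating cut-offs are all illustrative assumptions, and a real analysis would replace the toy lexicon with a proper sentiment model.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only).
# Treating "not" as a negative word is a crude stand-in for negation handling.
POSITIVE = {"good", "great", "love", "like", "comfortable", "stylish", "excellent"}
NEGATIVE = {"bad", "poor", "not", "fake", "broken", "disappointed", "worst"}


def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def is_mismatch(rating: float, text: str) -> bool:
    """Flag a review whose star rating and text point in opposite directions."""
    score = sentiment_score(text)
    high_rating_negative_text = rating >= 4 and score < 0
    low_rating_positive_text = rating <= 2 and score > 0
    return high_rating_negative_text or low_rating_positive_text


print(is_mismatch(5, "Not satisfied, poor quality"))    # high stars, negative text
print(is_mismatch(5, "Very comfortable and stylish."))  # stars and text agree
```

Under these assumptions, a product could be marked as suspicious when a large share of its reviews are flagged, although the exact threshold would have to be chosen from the data.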

### Success criteria.
When fake visibility wins over genuine value, everyone is affected. This project will be considered successful if:

- Sellers get equal visibility based on real customer feedback.

- Customers are protected against scams.

- Companies can protect their customers from falling victim to fake reviews.

### Stakeholders.
- Customers: The individuals or businesses purchasing goods or services through the platform.

- Sellers/Merchants: Businesses or individuals who list and sell products or services on the platform. 

- Platform Providers: The company or organization that owns and operates the online commerce platform.

- Regulatory Bodies: Government agencies and other organizations that set rules and standards for online commerce, including consumer protection.

- Investors: Individuals or organizations that have provided funding for the platform. 

- Government and International Organizations: These entities are stakeholders due to their role in setting and enforcing regulations related to online commerce. 

### Project plan.

## **Data Understanding.**
### Data Source.
The data used in this project was collected from publicly accessible product pages on `Jumia Kenya` using web scraping techniques. Reviews were gathered from selected products in three categories: fashion, appliances, and other electronics.

The scraping was performed using Python libraries such as `requests` and `BeautifulSoup`, as can be seen in the `Scraper` folder. All information collected is visible to any user visiting the site; no login or bypassing of protections was required.

This data was collected strictly for educational and research purposes, with the intent of exploring the relationship between product star ratings and actual customer sentiment. It is not affiliated with, endorsed by, or intended to defame Jumia or any of its sellers. It simply aims to uncover insights from publicly available information and promote transparency in digital marketplaces.

The dataset is stored under the `Data/` folder.

### Why is the data suitable for this project?
Jumia Kenya is one of the most popular e-commerce platforms in Kenya. Data from the platform provides not only a larger pool of sellers but also a richer variety of reviews, which allows for a more comprehensive understanding of consumer behavior.

- The largest e-commerce companies have millions or even billions of transactions, providing a vast dataset to analyze. This allows for more robust statistical analysis and reduces the chance of drawing inaccurate conclusions based on limited data.

- Large companies also serve a customer base that is diverse in demographics, geographic location, and purchasing habits. This diversity helps in identifying patterns that might be missed in smaller datasets focused on specific niches.

### Exploring the dataset for understanding.
In this section we carry out both qualitative and quantitative analysis to understand the structure of the dataset and to identify issues that would affect our analysis if left unfixed.

#### Importing dependencies and loading the dataset.


In [6]:
# Import dependencies
import pandas as pd
import sqlite3
import seaborn as sns
import matplotlib.pyplot as plt

# Loading dataset
# Connect to database
conn = sqlite3.connect('../Data/reviews.db')

# Use pandas to read sql
reviews = pd.read_sql_query('SELECT * FROM reviews', conn)

# Close connection
conn.close()

# Dataset preview
reviews.head()

| Unnamed: 0 | id | product_name | category | review_text | rating | review_date | verified |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Men's Sneakers | fashion | Very comfortable and stylish. | 5 | 2024-06-01 | yes |
| 1 | 2 | Berrykey Hawaiian Shirt | fashion | big size not cotton | 1 out of 5 | 19-06-2025 | Verified Purchase |
| 2 | 3 | Berrykey Hawaiian Shirt | fashion | Not satisfied | 1 out of 5 | 13-06-2025 | Verified Purchase |
| 3 | 4 | Berrykey Hawaiian Shirt | fashion | I like it | 5 out of 5 | 12-05-2025 | Verified Purchase |
| 4 | 5 | Berrykey Hawaiian Shirt | fashion | HomeFashionMen's FashionClothingShirtsMen's Ha... | 5 |  | no |


In [7]:
reviews.tail()

| Unnamed: 0 | id | product_name | category | review_text | rating | review_date | verified |
|---|---|---|---|---|---|---|---|
| 40 | 41 | Oppo Refurbished A57 | phones & tablets | HomePhones & TabletsMobile PhonesSmartphonesAn... | 3.0 |  | no |
| 41 | 42 | TWS Bluetooth Earphones | phones & tablets | HomePhones & TabletsMobile Phone AccessoriesBl... | 3.8 |  | no |
| 42 | 43 | Boy Boyurn-Down Collor Top + Shorts | fashion | HomeFashionKid's FashionBoysClothingClothing S... | 4.2 |  | no |
| 43 | 44 | Kids Girl 2PCS Puff Sleeve Top Floral Dresses | fashion | HomeFashionKid's FashionGirlsClothingDressesCa... | 3.9 |  | no |
| 44 | 45 | HP Refurbished Elitebook 840 | computing | HomeComputingComputersLaptops & DesktopsLaptop... | 4.6 |  | no |
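
The previews above show the same field stored in several formats: `rating` appears as `5`, `1 out of 5`, and `3.8`; `review_date` appears as `2024-06-01` and `19-06-2025`, or is missing; and `verified` appears as `yes`, `no`, and `Verified Purchase`. The sketch below shows one way these values might be normalized. The helper names `parse_rating`, `parse_review_date`, and `parse_verified` are hypothetical, and the code assumes that dates such as `19-06-2025` are day-month-year.

```python
import re
from datetime import date, datetime
from typing import Optional


def parse_rating(raw) -> Optional[float]:
    """Extract the leading number from values like '1 out of 5', '5' or '3.8'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", str(raw))
    return float(match.group(1)) if match else None


def parse_review_date(raw) -> Optional[date]:
    """Try the two layouts seen in the preview; return None if neither fits."""
    if not raw:
        return None
    # Assumption: the second layout is day-month-year (e.g. 19-06-2025).
    for fmt in ("%Y-%m-%d", "%d-%m-%Y"):
        try:
            return datetime.strptime(str(raw).strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_verified(raw) -> bool:
    """Treat 'yes' and 'Verified Purchase' as verified; everything else as not."""
    return str(raw).strip().lower() in {"yes", "verified purchase"}


print(parse_rating("1 out of 5"), parse_review_date("19-06-2025"), parse_verified("Verified Purchase"))
```

With the DataFrame loaded earlier, each helper could be applied column by column, for example `reviews["rating"] = reviews["rating"].map(parse_rating)`. Whether the rows with no `review_date` should be kept, dropped, or handled separately is a decision left to the cleaning stage.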
