# Jumia Phone Price Prediction

## Business Understanding
Retailers on Jumia's e-commerce platform, of which there are over 100,000, struggle to set optimal prices because the marketplace is highly competitive and evaluating competitors' prices is time-consuming. Jumia has tasked us with analyzing its phone catalog data and developing a predictive model that provides data-driven insights, enabling sellers to set competitive prices and maximize profitability ahead of the November Black Friday Big Sale. The model is expected to reduce the effort retailers/sellers spend determining the optimal average price for the products they intend to list on the platform.
 
The objectives of our project are outlined below:
* Identify factors contributing to higher product visibility and marketability on Jumia’s top listing pages.
* Explore the relationship between phone features and customer reviews.
* Develop a predictive model to recommend competitive, optimal pricing that promotes first-page placement.
* Assess the potential relationship between buyer ratings and product pricing.



## Data Understanding
The data was scraped on 29th October 2024 from the Jumia Kenya e-commerce platform, specifically the smartphones category sorted by popularity, from the 1st to the last page. This gave us 12,000 listed devices. The Python code used to scrape the data is stored in a separate file, **scrapped_data_code.ipynb**. The packages used were Beautiful Soup and pandas. We saved the data in CSV format on our local machine as jumia_mobile_phone.csv and jumia_phone_catalog_popularity.csv, which between them contain the features outlined below:

**Name** This describes the brand and the features of the phone.

**Price** This describes the current price the phone retails at.

**Old Price** This describes the previous price of the phone.

**Discount** The percentage discount of the current price relative to the old price.

**Rating** The buyers' explicit rating of the product and service.

**Number of Reviews** The number of reviews left by buyers.

**Search Ranking** The page and position of the product in terms of listing and popularity.

**Best Seller** Whether the product was a best seller or not. 
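Two of these fields can be cross-checked or derived from the others. The following is a minimal, hypothetical sketch rather than the project's own code. It assumes prices were scraped as strings such as "KSh 12,999", that Discount equals `(Old Price - Price) / Old Price x 100`, and that Search Ranking can be split into a page number and an on-page position with 40 listings per page (`ITEMS_PER_PAGE` is an assumption to verify against the site). The helper names `parse_ksh`, `discount_pct` and `overall_rank` are illustrative.

```python
import math
import re

# Assumption: the listing shows 40 products per results page; check the live site.
ITEMS_PER_PAGE = 40


def parse_ksh(text):
    """Turn a scraped price string such as 'KSh 12,999' into a float.

    Returns NaN when the field is missing or contains no digits (e.g. no old price).
    """
    if text is None:
        return math.nan
    match = re.search(r"\d[\d,]*(?:\.\d+)?", str(text))
    return float(match.group().replace(",", "")) if match else math.nan


def discount_pct(price, old_price):
    """Percentage reduction from old_price to price: (old - new) / old * 100."""
    if math.isnan(price) or math.isnan(old_price) or old_price <= 0:
        return math.nan
    return round((old_price - price) / old_price * 100, 1)


def overall_rank(page, position, per_page=ITEMS_PER_PAGE):
    """Collapse (page, position on page) into one 1-based popularity rank."""
    return (page - 1) * per_page + position


price = parse_ksh("KSh 12,999")
old_price = parse_ksh("KSh 19,999")
print(price, old_price)                   # 12999.0 19999.0
print(discount_pct(price, old_price))     # 35.0
print(overall_rank(page=2, position=10))  # 50
```

Applied column-wise (for example with `Series.map`), recomputed discounts that differ noticeably from the scraped Discount column can flag parsing errors; small gaps are expected if the site rounds to whole percentages.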

The two files share the Name column, so we will merge them into one file on this column. The Name column contains unstructured text combining brand names and product specifications (e.g., “Samsung Galaxy A12, 5000mAh, 128GB ROM, 6GB RAM”). To split these into separate, structured attributes, we shall use regular expressions (regex), which allow consistent pattern matching for extracting information such as battery capacity (e.g., numbers followed by "mAh") and storage (e.g., "GB" or "MB"), making the data more structured and accessible for analysis.
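A minimal sketch of the merge and extraction steps, using small hypothetical stand-ins for the two CSV files. The regex patterns assume the Name text follows the "<number>mAh" / "<number>GB ROM" / "<number>GB RAM" layout of the example above and that the brand is the first word; real listings will likely need additional patterns. The helper `extract_gb` and the columns `brand`, `battery_mah`, `rom_gb` and `ram_gb` are illustrative names, not the project's.

```python
import re

import pandas as pd

# Hypothetical stand-ins for the two scraped files; in the notebook these
# would be pd.read_csv("jumia_mobile_phone.csv") and
# pd.read_csv("jumia_phone_catalog_popularity.csv").
phones = pd.DataFrame({
    "Name": ["Samsung Galaxy A12, 5000mAh, 128GB ROM, 6GB RAM",
             "Tecno Spark 10, 64GB ROM, 4GB RAM"],
    "Price": ["KSh 14,999", "KSh 11,500"],
})
popularity = pd.DataFrame({
    "Name": ["Samsung Galaxy A12, 5000mAh, 128GB ROM, 6GB RAM",
             "Tecno Spark 10, 64GB ROM, 4GB RAM"],
    "Search Ranking": [1, 2],
})

# Join on the shared Name column. If a Name repeats in either file, every
# pairing is kept, so check for duplicates before relying on row counts.
merged = phones.merge(popularity, on="Name", how="inner")


def extract_gb(names, kind):
    """Capacity in GB from text like '128GB ROM' or '512MB RAM' (kind='ROM'/'RAM').

    Returns NaN where the pattern is absent.
    """
    parts = names.str.extract(rf"(\d+(?:\.\d+)?)\s*(GB|MB)\s*{kind}",
                              flags=re.IGNORECASE)
    size = pd.to_numeric(parts[0])
    unit = parts[1].str.upper()
    # Keep GB values as-is; convert MB values to GB.
    return size.where(unit == "GB", size / 1024)


# Assumption: the brand is the first word of the Name.
merged["brand"] = merged["Name"].str.split().str[0]
# Battery capacity: digits followed by "mAh" (NaN if not stated).
merged["battery_mah"] = pd.to_numeric(
    merged["Name"].str.extract(r"(\d+)\s*mAh", flags=re.IGNORECASE, expand=False)
)
merged["rom_gb"] = extract_gb(merged["Name"], "ROM")
merged["ram_gb"] = extract_gb(merged["Name"], "RAM")

print(merged[["brand", "battery_mah", "rom_gb", "ram_gb"]])
```

Rows whose Name lacks a pattern get NaN rather than an error, which keeps incomplete listings in the data while making the gaps visible. An inner join drops names that appear in only one file; using `how="outer"` with `indicator=True` would show how many rows fail to match.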

Data Limitations:

* Dynamic Pricing: Prices on e-commerce platforms fluctuate frequently. Therefore, the scraped prices reflect only the prices at the time of scraping and may not represent current or future values.

* Incomplete or Inconsistent Data: Due to the variety of phone models and brands, some listings may lack uniform information (e.g., missing battery details or memory specifications), which could lead to variability in the parsed features.

* Unverified Ratings and Reviews: Ratings and reviews might be biased or manipulated, affecting any insights or model predictions derived from them.

* Potential Duplicate Listings: Duplicate or near-duplicate entries may exist if the same model is listed by multiple sellers, which could influence popularity and ranking statistics.
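One way to gauge the duplicate-listing issue is to normalise names and flag repeats. This is a hypothetical sketch on made-up rows: it only catches names that become identical after lower-casing and collapsing whitespace, not genuine near-duplicates (which would need fuzzy matching), and keeping the best-ranked copy is an assumed policy rather than a rule from this project. The columns `name_key` and `is_duplicate` are illustrative.

```python
import pandas as pd

# Hypothetical rows; the first two differ only in case and spacing.
listings = pd.DataFrame({
    "Name": ["Samsung Galaxy A12, 128GB ROM, 6GB RAM",
             "samsung galaxy a12,  128GB ROM, 6GB RAM ",
             "Tecno Spark 10, 64GB ROM, 4GB RAM"],
    "Price": [14999.0, 14500.0, 11500.0],
    "Search Ranking": [3, 57, 8],
})

# Normalise case and whitespace so trivially different spellings compare equal.
listings["name_key"] = (
    listings["Name"].str.lower().str.replace(r"\s+", " ", regex=True).str.strip()
)

# Flag every row whose normalised name appears more than once.
listings["is_duplicate"] = listings.duplicated("name_key", keep=False)

# Keep the most popular listing (lowest ranking number) for each name.
deduped = (
    listings.sort_values("Search Ranking")
    .drop_duplicates("name_key", keep="first")
    .sort_index()
)

print(listings[["Name", "is_duplicate"]])
print(deduped[["Name", "Price", "Search Ranking"]])
```

Whether to drop duplicates at all is a modelling decision: several sellers offering the same model at different prices is also a useful price signal, so the `is_duplicate` flag can be kept as a feature instead.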




### Import Relevant Libraries

```python
# Import libraries for inspecting, loading, cleaning and visualizing data
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Regular expressions
import re
```