Jumia_API-like Data Scraping with Selenium and Flask

Bots must follow these rules, taken from Jumia's robots.txt:

Site scraping is permitted if the user-agent clearly identifies itself as a bot, names the bot owner, and uses fewer than 200 requests per minute (RPM).
Bot identification must include an owner URL or contact, in case Jumia needs to reach the owner.
Bots with a fake user-agent will be blocked.
Bots that use too many IPs to increase performance may also be blocked.
If you need more than 200 RPM, contact techops at jumia com.
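One way to stay within these rules is to send an honest user-agent and throttle requests on the client side. The sketch below is not taken from this project: the user-agent string, the contact URL, and the RateLimiter class are hypothetical names used only for illustration.

```python
import time
from collections import deque

# Hypothetical values: replace with your own bot name and contact URL.
BOT_USER_AGENT = "example-jumia-bot/0.1 (+https://example.com/bot-contact)"
# The rules allow fewer than 200 requests per minute, so stay strictly below.
MAX_REQUESTS_PER_MINUTE = 199


class RateLimiter:
    """Sliding-window limiter: at most max_calls calls per period seconds."""

    def __init__(self, max_calls=MAX_REQUESTS_PER_MINUTE, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()    # timestamps of recent calls, oldest first

    def _drop_expired(self, now):
        # Forget calls that have left the current window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def wait(self):
        """Block until another request is allowed, then record it."""
        now = self._clock()
        self._drop_expired(now)
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._drop_expired(now)
        self._calls.append(now)
```

Call wait() before every page load, and pass BOT_USER_AGENT to whatever browser or HTTP client performs the request.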

This project builds an API that scrapes product information from a specific webpage, such as a Jumia search results page. It uses the Selenium library for web scraping and the Flask microframework for the API.

How it Works:

Web Scraping with Selenium:

Selenium is used to automate the process of opening a web browser, navigating to the desired webpage (e.g., a Jumia search results page), and extracting the necessary data.
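The Selenium step described above might look roughly like the following sketch. It is not the code in main.py. The base URL, the q and page query parameters, and every CSS selector are assumptions made for illustration, so check them against the live page before relying on them.

```python
from urllib.parse import urlencode

# Assumption: the base URL and the "q"/"page" parameters are illustrative,
# not taken from the project; adjust them to the page you actually scrape.
BASE_URL = "https://www.jumia.com.ng/catalog/"


def build_search_url(query, page=1):
    """Return a search-results URL for the query at the given page."""
    return BASE_URL + "?" + urlencode({"q": query, "page": page})


def first_text(parent, by, css):
    """Text of the first element under parent matching css, or None."""
    found = parent.find_elements(by, css)
    return found[0].text.strip() if found else None


def scrape_page(url):
    """Open url in headless Chrome and return a list of product dicts."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        products = []
        # Hypothetical selectors: inspect the live page for the real ones.
        for card in driver.find_elements(By.CSS_SELECTOR, "article.prd"):
            images = card.find_elements(By.CSS_SELECTOR, "img")
            products.append({
                "title": first_text(card, By.CSS_SELECTOR, "h3.name"),
                "price": first_text(card, By.CSS_SELECTOR, "div.prc"),
                "rating": first_text(card, By.CSS_SELECTOR, "div.stars"),
                "image_url": images[0].get_attribute("src") if images else None,
            })
        return products
    finally:
        # Always close the browser, even if scraping fails.
        driver.quit()
```

Using find_elements and taking the first match avoids an exception when a card lacks a field, for example a product with no rating yet.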

Flask:

Flask is used to create an API endpoint that allows external applications to access the scraped data.
The API endpoint can be called with specific parameters, such as the search query or product category, to retrieve the relevant data.
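As a rough illustration of such an endpoint, the sketch below wires a scraping function into a Flask route. The route shape follows the endpoint list in this README, but create_app, clamp_page, and the scrape callable are hypothetical names and may not match main.py.

```python
def clamp_page(number_of_page, maximum=10):
    """Keep the requested page value within 1..maximum to bound scraping time."""
    return max(1, min(int(number_of_page), maximum))


def create_app(scrape):
    """Build a Flask app around scrape(product_name, page) -> list of dicts."""
    # Imported here so clamp_page can be used without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/<product_name>/<int:number_of_page>")
    def products(product_name, number_of_page):
        page = clamp_page(number_of_page)
        return jsonify({
            "query": product_name,
            "page": page,
            "products": scrape(product_name, page),
        })

    return app


if __name__ == "__main__":
    # Placeholder scraper so the sketch runs without a browser.
    create_app(lambda name, page: []).run()
```

Passing the scraper in as an argument keeps the web layer separate from Selenium, which makes the route easy to exercise with a stub.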

Valuable Information Retrieved:

The data scraped from the Jumia search results page typically includes the following valuable information:

Product title
Product price
Product image URL
Product rating
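These four fields could be held in a small record type before being serialized. The sketch below is an assumption about how raw scraped strings might be normalized: the Product class, the helper names, and the sample text formats are illustrative and not taken from the project.

```python
import re
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Product:
    title: str
    price: Optional[float]
    image_url: Optional[str]
    rating: Optional[float]


def parse_price(text):
    """First number in text as a float, e.g. '₦ 45,000' gives 45000.0."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    return float(match.group().replace(",", "")) if match else None


def parse_rating(text):
    """First number in text as a float, e.g. '4.5 out of 5' gives 4.5."""
    match = re.search(r"\d+(?:\.\d+)?", text or "")
    return float(match.group()) if match else None


def to_product(raw):
    """Convert a dict of raw scraped strings into a Product."""
    return Product(
        title=raw.get("title") or "",
        price=parse_price(raw.get("price")),
        image_url=raw.get("image_url"),
        rating=parse_rating(raw.get("rating")),
    )
```

asdict(to_product(raw)) then yields a plain dict that is ready for JSON output. For a price range, only the first number is kept.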

Example Usage:

To use the API, a client application can make a request to the API endpoint with the desired parameters. For instance, to scrape data for products related to "mobile phones," the client application would send a request to the endpoint with the search query "mobile phones."

Upon receiving the request, the API would invoke Selenium to scrape the Jumia search results page and extract the relevant product information. This information would then be returned to the client application in a structured format, such as JSON.
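A client for this flow could be as small as the sketch below, which uses only the standard library. The base URL and the path shape are assumptions that follow the endpoint list in this README, and the function names are hypothetical. The generous default timeout reflects that a browser-driven scrape can take a while.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def endpoint_url(base_url, product_name, number_of_page):
    """Build the request URL, percent-encoding the product name."""
    name = quote(product_name, safe="")
    return f"{base_url.rstrip('/')}/{name}/{int(number_of_page)}"


def fetch_products(base_url, product_name, number_of_page, timeout=120):
    """Call the API and return the decoded JSON body."""
    url = endpoint_url(base_url, product_name, number_of_page)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the API to be running locally.
    print(fetch_products("http://localhost", "mobile phones", 1))
```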

Requirements

This package requires the following to run:

python>=3.11

Installation

First, clone the repository with git.

Then change into the repository directory (cd into it, or open it in a text editor), because that is where the main Python file, main.py, is located. Activate your virtual environment (venv).

Then install the dependencies:

pip install -r requirements.txt

Usage

Run the main Python file:

python main.py

Endpoints:

GET / - Homepage (This page)
GET /product_name/{number_of_page} - Scrapes products from Jumia based on page number
GET /product_name/{discount_percentage}/{number_of_page} - Scrapes products with a discount percentage from Jumia based on page number
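The README does not say how the discount endpoint selects products. One plausible reading is that it keeps items whose discount is at least the requested percentage. The sketch below shows that reading only. The "discount" key and its "-25%" text format are hypothetical.

```python
import re


def parse_discount(text):
    """Percentage in text as an int, e.g. '-25%' gives 25; None if absent."""
    match = re.search(r"(\d+)\s*%", text or "")
    return int(match.group(1)) if match else None


def filter_by_discount(products, discount_percentage):
    """Keep products whose discount is at least discount_percentage."""
    kept = []
    for product in products:
        discount = parse_discount(product.get("discount"))
        if discount is not None and discount >= discount_percentage:
            kept.append(product)
    return kept
```

Products without a discount label are dropped rather than treated as 0%, so a request for 0 still returns only discounted items. Change that check if the other behavior is wanted.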

Example:

localhost/get_all/phones/2

Here, phones is the product name and 2 is the {number_of_page} value.

Contribution

Contributions are welcome. To contribute, clone the repo locally and commit your code on a separate branch.

You can also reach me via email or, better yet, send me a Twitter DM.

License

MIT License.

Repository: Arinze1020/jumia_market_scraper
