# Introduction

Jumia is a Pan-African technology company built around a marketplace, a logistics service and a payment service. In this project, I scrape information about the laptops for sale on the Jumia Nigeria website and export it to an Excel file for basic analysis.

## Import the required libraries

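```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
import lxml  # parser backend passed to BeautifulSoup as 'lxml'; importing it fails early if it is not installed
```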

### Get Request

```python
# Store the search URL in a variable, send a GET request and check the status code
website = 'https://www.jumia.com.ng/catalog/?q=laptops'
response = requests.get(website)
response.status_code
```

200

### Soup Object
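Since the request returned status code 200 (OK), the page was fetched successfully, and we can create the soup object to parse the contents of the webpage. Note that a 200 response only means the server answered the request; whether a site permits scraping is set out in its robots.txt file and terms of use.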
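Python's standard `urllib.robotparser` module can check a URL against a site's robots.txt rules. The sketch below is a minimal, offline illustration: `SAMPLE_ROBOTS_TXT` is made-up content (not Jumia's real robots.txt), and `is_allowed` is a hypothetical helper name. Against a live site you would instead call `parser.set_url(...)` and `parser.read()` to download the actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only; NOT Jumia's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /customer/
Allow: /
"""

def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from lines instead of downloading
    return parser.can_fetch(user_agent, url)

print(is_allowed(SAMPLE_ROBOTS_TXT, "https://www.jumia.com.ng/catalog/?q=laptops"))  # True
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://www.jumia.com.ng/customer/account/"))   # False
```

Rules are matched in order and the first matching path prefix wins, so `/customer/` is blocked while everything else falls through to `Allow: /`.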

```python
soup = BeautifulSoup(response.content, 'lxml')
print(soup.prettify())  # print the page's HTML with indentation for readability
```

```html
<!DOCTYPE html>
<html dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Shop All Products - TVs, Laptops, Fashion Items | Jumia Nigeria
  </title>
  <meta content="product" property="og:type"/>
  <meta content="Jumia Nigeria" property="og:site_name"/>
  <meta content="Shop All Products - TVs, Laptops, Fashion Items | Jumia Nigeria" property="og:title"/>
  <meta content="Enjoy amazing discounts and deals up to 70% on your favourite iPhones, android devices, TVs, Cookers and more offers at the best prices on Jumia Nigeria." property="og:description"/>
  <meta content="/catalog/" property="og:url"/>
  <meta content="https://ng.jumia.is/cms/jumialogonew.png" property="og:image"/>
  <meta content="en_NG" property="og:locale"/>
  <meta content="Shop All Products - TVs, Laptops, Fashion Items | Jumia Nigeria" name="title"/>
  <meta content="noindex,follow" name="robots"/>
  <meta content="Enjoy amazing discounts and deals up to 70% on your favourite iPhones, android devices,
...
```

### Results
After parsing the webpage, we can locate the information we want: the card for each laptop on sale. Each card is an `<a>` tag with the class `core`, and it contains the product link, the laptop image, its name, price, rating and other details.

```python
results = soup.find_all('a', {'class': 'core'})
results
```

```
[<a class="core" data-brand="Lenovo" data-category="Computing/Computers &amp; Accessories/Computers &amp; Tablets/Laptops" data-dimension23="" data-dimension26="3" data-dimension27="5" data-dimension28="1" data-dimension37="0" data-dimension43="BF22|BF22_03|FDYJE|JMALL|TBOOST" data-dimension44="0" data-id="LE842CL4P9DQINAFAMZ" data-list="" data-name="V15-IGL Intel Celeron 1TB HDD 4GB RAM Win 10" data-position="1" data-price="286.03" data-track-onclick="eecProduct" data-track-onview="eecProduct" href="/lenovo-v15-igl-intel-celeron-1tb-hdd-4gb-ram-win-10-201829248.html"><div class="img-c"><img alt="BF22" class="_ni camp" data-lazy="" data-src="https://ng.jumia.is/badges/bf22/3/138x18.png?6652" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAA
...
```

```python
# Count the listings extracted from the page
len(results)
```

40

### Observation
There are currently 40 product listings on the first page of Jumia's laptop search results. We will first extract the information we want from the first item in `results`, then from the whole page, and finally from all the pages of the laptop search results.
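The output above shows that each `<a class="core">` card also carries `data-*` attributes such as `data-brand`, `data-category` and `data-name`, and the exported table further down shows that the search also returns accessories (a power bank, a keyboard cover). A minimal sketch of reading those attributes and keeping only laptop cards, run on a hypothetical two-card snippet rather than the live page. Filtering on a category path ending in `/Laptops` is an assumption based on the single `data-category` value visible above.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down cards modelled on the <a class="core"> tags above;
# attribute values are invented for illustration.
SAMPLE_HTML = (
    '<a class="core" data-brand="Lenovo" '
    'data-category="Computing/Computers &amp; Tablets/Laptops" href="/lenovo-v15.html">'
    '<h3 class="name">Lenovo V15 Celeron 4GB RAM</h3></a>'
    '<a class="core" data-brand="Generic" '
    'data-category="Computing/Computer Accessories/Keyboard Covers" href="/keyboard-cover.html">'
    '<h3 class="name">Silicone Keyboard Cover</h3></a>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")  # stdlib parser; 'lxml' also works if installed

rows = [
    {
        "brand": card.get("data-brand"),            # .get() returns None if the attribute is missing
        "category": card.get("data-category", ""),  # HTML entities such as &amp; are decoded
        "url": card.get("href"),
    }
    for card in soup.find_all("a", class_="core")
]

# Assumed convention: laptop cards have a category path ending in "/Laptops".
laptops = [row for row in rows if row["category"].endswith("/Laptops")]
print(len(rows), len(laptops))  # 2 1
```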

## Target Necessary Data
Here we target the specific data we want from the HTML stored in the `results` variable above. These are the data we will extract in this project:
#### Product details
The product name, which also includes key specifications such as the processor, RAM and storage
#### Product Price
Price of the product
#### Review ratings
The star rating (out of 5) of each product
#### Store status
Whether or not the product is listed by an official store

### Product details

```python
product_details = results[0].find('h3', class_='name').get_text()
product_details
```

'Lenovo V15-IGL Intel Celeron 1TB HDD 4GB RAM Win 10'

### Product Price

```python
price = results[0].find('div', class_='prc').get_text()
price
```

'₦ 130,990'
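The price comes back as display text (currency sign, space and thousands separators), so Excel and pandas will treat it as a string. A minimal sketch of converting it to a whole number of naira; `parse_naira` is a hypothetical helper, and the price-range format mentioned in its docstring is an assumption, not something observed in this notebook.

```python
import re

def parse_naira(price_text):
    """Convert a price string such as '₦ 130,990' to an int number of naira.

    Returns None for placeholders like 'n/a'. If the text holds a range
    (hypothetical format, e.g. '₦ 7,000 - ₦ 8,000'), only the first number is kept.
    """
    match = re.search(r"\d[\d,]*", price_text)  # first run of digits and commas
    if match is None:
        return None
    return int(match.group().replace(",", ""))

print(parse_naira("₦ 130,990"))  # 130990
print(parse_naira("n/a"))        # None
```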

### Review ratings

```python
review_rating = results[0].find('div', class_='stars _s').get_text().split()[0]
review_rating
```

'5'

### Put everything together inside a for loop
Here we create an empty list for each target field, then loop through every listing on the first page and append its values to the respective lists.

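```python
# One list per target field
product_name = []
product_price = []
rating = []
store_status = []

for result in results:
    # Product details
    try:
        product_name.append(result.find('h3', class_='name').get_text())
    except AttributeError:  # find() returned None: the tag is missing from this card
        product_name.append('n/a')

    # Price
    try:
        product_price.append(result.find('div', class_='prc').get_text())
    except AttributeError:
        product_price.append('n/a')

    # Review rating: keep only the first word of the rating text
    try:
        rating.append(result.find('div', class_='stars _s').get_text().split()[0])
    except (AttributeError, IndexError):  # no rating block, or its text is empty
        rating.append('n/a')

    # Store status: listings without the official-store badge get a default label
    try:
        store_status.append(result.find('div', class_='bdg _mall _xs').get_text())
    except AttributeError:
        store_status.append('Not Offical Store')
```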
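The four try/except blocks in the loop above are repeated verbatim in the pagination loop below. A minimal sketch of factoring them into two hypothetical helpers, `text_or_default` and `parse_card`, which check for a missing tag explicitly instead of catching exceptions. One assumption differs from the loop: it uses `get_text(strip=True)`, so surrounding whitespace is trimmed and strings may differ slightly.

```python
from bs4 import BeautifulSoup

def text_or_default(card, tag, css_class, default="n/a"):
    """Return the stripped text of the first matching child tag, or default if absent."""
    element = card.find(tag, class_=css_class)
    return element.get_text(strip=True) if element is not None else default

def parse_card(card):
    """Extract the four target fields from one <a class="core"> listing card."""
    stars = text_or_default(card, "div", "stars _s", default="")
    return {
        "Name": text_or_default(card, "h3", "name"),
        "Price": text_or_default(card, "div", "prc"),
        "Rating": stars.split()[0] if stars else "n/a",  # first word of the rating text
        "store": text_or_default(card, "div", "bdg _mall _xs", default="Not Offical Store"),
    }

# Hypothetical card with no rating block and no official-store badge.
html = '<a class="core"><h3 class="name">Sample Laptop</h3><div class="prc">₦ 99,000</div></a>'
card = BeautifulSoup(html, "html.parser").find("a", class_="core")
print(parse_card(card))
```

With these helpers, each page reduces to `rows = [parse_card(card) for card in results]`, and the collected rows can go straight into `pd.DataFrame(rows)`.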

### Pagination
As above, we loop through the listings, but this time across the remaining pages of the laptop search results, appending to the same lists. Since page 1 was already scraped, the loop requests pages 2 to 50.

```python
# Page 1 was scraped above, so start at page 2.
# range(2, 51) yields pages 2 to 50: the end value of range() is exclusive.
for num in range(2, 51):
    website = f'https://www.jumia.com.ng/catalog/?q=laptops&page={num}#catalog-listing'
    response = requests.get(website)
    soup = BeautifulSoup(response.content, 'lxml')
    results = soup.find_all('a', {'class': 'core'})

    # Loop through each listing on the current page
    for result in results:
        # Name
        try:
            product_name.append(result.find('h3', class_='name').get_text())
        except AttributeError:
            product_name.append('n/a')

        # Price
        try:
            product_price.append(result.find('div', class_='prc').get_text())
        except AttributeError:
            product_price.append('n/a')

        # Review rating
        try:
            rating.append(result.find('div', class_='stars _s').get_text().split()[0])
        except (AttributeError, IndexError):
            rating.append('n/a')

        # Store status
        try:
            store_status.append(result.find('div', class_='bdg _mall _xs').get_text())
        except AttributeError:
            store_status.append('Not Offical Store')
```
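Building the page URLs by string concatenation works, but `urllib.parse.urlencode` escapes the query for you (useful if the search term ever contains spaces), and inclusive page bounds make the off-by-one in `range()` explicit. A minimal sketch; `BASE_URL` and `page_urls` are hypothetical names, and the one-second pause in the commented usage is an arbitrary choice to reduce load on the server, not a documented Jumia requirement.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.jumia.com.ng/catalog/"

def page_urls(query, first_page, last_page):
    """Return search-result URLs for pages first_page to last_page, both inclusive."""
    return [
        f"{BASE_URL}?{urlencode({'q': query, 'page': page})}"
        for page in range(first_page, last_page + 1)
    ]

urls = page_urls("laptops", 2, 50)
print(len(urls))  # 49
print(urls[0])    # https://www.jumia.com.ng/catalog/?q=laptops&page=2

# Usage sketch (needs network access, so it is left commented out):
# import time, requests
# for url in urls:
#     response = requests.get(url, timeout=10)
#     response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
#     ...parse response.content as in the loop above...
#     time.sleep(1)                # pause between requests
```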

### Convert the data into a pandas DataFrame

```python
df = pd.DataFrame({'Name': product_name, 'Price': product_price, 'Rating': rating, 'store': store_status})
df
```

| | Name | Price | Rating | store |
|---|---|---|---|---|
| 0 | Asus 2021 14" PC Intel Celeron N4020 4GB RAM-1... | ₦ 159,600 | 4 | Not Offical Store |
| 1 | Hp Stream11Intel Celeron 32gb Mmc,2gbRamOnBoar... | ₦ 102,500 | 1 | Not Offical Store |
| 2 | itel Intel® Celeron™ N3350, 4GB/1TB HDD 14" La... | ₦ 147,500 | | Not Offical Store |
| 3 | Asus Mini Notebook Intel Celeron 4GB RAM 500GB... | ₦ 139,000 | | Not Offical Store |
| 4 | Firman SUMEC FIRMAN 10000MAH SUPER STRONG RELI... | ₦ 7,272 | | Not Offical Store |
| ... | ... | ... | ... | ... |
| 3915 | Hp ProBook 11 X360- TouchScreen - Intel Penti... | ₦ 355,000 | 5 | Not Offical Store |
| 3916 | Waterproof Dustproof Silicone Keyboard Cover F... | ₦ 1,723 | | Not Offical Store |
| 3917 | DELL Inspiron 14 Intel Core I3 128GB SSD+1TB H... | ₦ 351,000 | | Not Offical Store |
| 3918 | Hp Pavilion 14 X360-Intel Core I3 Backlit Keyb... | ₦ 400,000 | | Not Offical Store |
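In the table above, Price and Rating are stored as text ('₦ 159,600', '4', 'n/a'), so they cannot be averaged or sorted numerically. A minimal sketch of converting both columns with pandas, run on a hypothetical `sample` DataFrame shaped like the scraped one (named `sample` so it does not overwrite `df`). Values that cannot be parsed, such as 'n/a' or empty strings, become NaN via `errors="coerce"`.

```python
import pandas as pd

# Hypothetical rows shaped like the scraped table; not real scraped data.
sample = pd.DataFrame({
    "Name": ["Laptop A", "Laptop B", "Keyboard Cover"],
    "Price": ["₦ 159,600", "₦ 102,500", "n/a"],
    "Rating": ["4", "n/a", ""],
    "store": ["Not Offical Store", "Official Store", "Not Offical Store"],
})

# Take the first run of digits and commas, drop the commas, then convert;
# rows with no digits (e.g. 'n/a') yield NaN.
sample["Price"] = pd.to_numeric(
    sample["Price"].str.extract(r"(\d[\d,]*)", expand=False).str.replace(",", "", regex=False),
    errors="coerce",
)

# 'n/a' and empty strings cannot be parsed, so errors="coerce" turns them into NaN.
sample["Rating"] = pd.to_numeric(sample["Rating"], errors="coerce")

print(sample.dtypes)
```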


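### Export to an Excel file

```python
# Writing .xlsx files requires an Excel engine such as openpyxl to be installed.
# By default the DataFrame index is written as the first column; pass index=False to omit it.
df.to_excel('laptop_data.xlsx')
```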
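The introduction mentions basic analysis of the exported data. Once Price and Rating are numeric (for example via `pd.to_numeric`), a first summary could compare listings by store status. A minimal sketch on a hypothetical `sample_laptops` DataFrame; the 'Official Store' label is an assumption about the badge text, and the export line is commented out because it writes a file and needs openpyxl.

```python
import pandas as pd

# Hypothetical cleaned data with numeric Price and Rating; real values come from the scrape.
sample_laptops = pd.DataFrame({
    "Name": ["Laptop A", "Laptop B", "Laptop C", "Laptop D"],
    "Price": [159600, 102500, 355000, 400000],
    "Rating": [4.0, 1.0, 5.0, None],
    "store": ["Not Offical Store", "Not Offical Store", "Official Store", "Official Store"],
})

# Listing count, average price and average rating per store status;
# mean() skips missing ratings (NaN).
summary = sample_laptops.groupby("store").agg(
    listings=("Name", "count"),
    mean_price=("Price", "mean"),
    mean_rating=("Rating", "mean"),
)
print(summary)

# summary.to_excel("laptop_summary.xlsx")  # keeps 'store' as the first column
```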