# Web scraping in Mercado Libre
---
This web scraping exercise is intended to show the process of extracting data from 'Mercado Libre', an online marketplace that sells a wide variety of products across Latin America. The URL used for the extraction is taken from the site for Mexico, and the chosen product is 'Laptops'.
## Importing libraries
The libraries we will be using for this exercise are: 'requests' for sending the HTTP requests; 'BeautifulSoup' for parsing the HTML, which helps to organise messy web data by repairing bad HTML and presenting it as an easily traversable tree structure; 'pandas' for analysing our scraped data; and 'namedtuple' for creating simple, lightweight data structures similar to a class, but without the overhead of defining a full class.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from collections import namedtuple

Afterwards, we declare the URL we will be scraping, which in this case is the listing page for laptops.
We define our 'namedtuple' to store the information we extract for each product: name, previous price, current price, discount and image URL.
Finally, we declare the product list, which starts empty and will store every product (as a 'Product' namedtuple) extracted by our scraping.

In [2]:
url = "https://listado.mercadolibre.com.mx/laptops#D[A:laptops]"
Product = namedtuple('Product', ['name','prev_price','curr_price','discount','url_image'])
product_list = []
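
To illustrate how the 'Product' namedtuple behaves, here is a minimal sketch. The sample values are made up for illustration and do not come from the site.

```python
from collections import namedtuple

Product = namedtuple('Product', ['name', 'prev_price', 'curr_price', 'discount', 'url_image'])

# Hypothetical values, used only to show how the fields are accessed
example = Product('Example laptop', 10000, 8000, 20, 'https://example.com/laptop.jpg')

print(example.name)       # access a field by name
print(example[2])         # or by position, like a regular tuple
print(Product._fields)    # the field names, in order
print(example._asdict())  # the same data as a dictionary
```

Because the field names are available through '_fields', they can later be reused as column names.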

We proceed to connect to the site using 'requests' to retrieve the page at the URL. Since we need the HTML as text, we read the response body through its 'text' attribute.

In [3]:
r = requests.get(url)
html_contents=r.text
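
The request above assumes that the server answers with the page we expect. As a hedged sketch, the helper below adds a timeout and stops early when the response status signals an error. The name 'fetch_html', the timeout value and the 'User-Agent' string are all assumptions, not part of the original exercise.

```python
import requests

# Assumed values: neither the header nor the timeout comes from the original exercise
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; scraping-exercise)"}


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    response = requests.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx answers instead of parsing an error page
    response.raise_for_status()
    return response.text
```

With this helper, the two lines of the cell could be written as `html_contents = fetch_html(url)`.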

Then we create our 'BeautifulSoup' object from the HTML requested from Mercado Libre ('html_contents'), explicitly naming the 'html.parser' parser so that the result is consistent and does not depend on which parsers are installed. With this, the page is parsed and ready to be searched.

In [4]:
html_soup = BeautifulSoup(html_contents,'html.parser')

# Starting the analysis
We need to define the products we will be analysing from the site, so we go through the website to see how the information we require is stored and organised by inspecting its HTML structure. From this we can identify that the information for each product (laptop) is stored in an 'li' element with the class "ui-search-layout__item".
Then, we create our variable 'products_mercado' by calling '.find_all' on our BeautifulSoup object with that tag and class name, which extracts all the laptops listed on the page. We are now ready to create a 'for' loop to extract the information we need for every product.

In [5]:
products_mercado = html_soup.find_all('li',class_="ui-search-layout__item")
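
To show what '.find_all' returns, the sketch below runs the same call on a small, made-up HTML fragment. The fragment only imitates the class name used by the site and is an assumption for illustration.

```python
from bs4 import BeautifulSoup

# Made-up fragment: two product items and one unrelated list item
sample_html = """
<ol>
  <li class="ui-search-layout__item"><h2>Laptop A</h2></li>
  <li class="ui-search-layout__item"><h2>Laptop B</h2></li>
  <li class="other-item"><h2>Not a product</h2></li>
</ol>
"""

soup = BeautifulSoup(sample_html, 'html.parser')

# Same call as in the exercise: match <li> elements by tag and class
items = soup.find_all('li', class_="ui-search-layout__item")

# An equivalent lookup written as a CSS selector
same_items = soup.select('li.ui-search-layout__item')

titles = [item.find('h2').get_text(strip=True) for item in items]
print(len(items), titles)
```

Checking `len(products_mercado)` in the same way is a quick test that the class name still matches the live page.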

# The 'for' loop
---
## Creating the loop
After determining the list of items to use, by finding the respective class, we need a 'for' loop in which we can get the details of our data. Remember that for each product we need the name, the previous price (if any), the current price, the discount (if any), and the image URL.
The 'try' block handles the products that have a previous price and a discount, and the 'except' block handles those that do not: a missing element makes '.find' return 'None', so reading its '.text' raises an 'AttributeError', and both values are then set to '0'.
## Replacing data types
As the extracted data is of type 'string', we should make sure that the numeric fields are converted to 'int': for the previous price and the current price we remove the thousands-separator commas, and for the discount we remove the '% OFF' suffix. Then, in 'append', we cast the previous price (prev_price), the current price (curr_price) and the discount to 'int'.
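
The clean-up described above can also be sketched as one small function. The helper name 'to_int' is hypothetical, and it assumes the values are whole numbers: a decimal such as '12.5' would lose its decimal point.

```python
import re


def to_int(text, default=0):
    """Keep only the digits in `text` and return them as an int."""
    digits = re.sub(r"\D", "", text or "")
    return int(digits) if digits else default


print(to_int("18,499"))   # price with a thousands separator
print(to_int("19% OFF"))  # discount label
print(to_int(None))       # a missing value falls back to the default
```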

In [7]:
for product in products_mercado:
    # Product title
    name = product.find('h2', class_="ui-search-item__title shops__item-title").text
    try:
        # Only discounted products have a struck-through previous price and a discount label
        prev_price = product.find('s', class_='andes-money-amount').find('span', class_='andes-money-amount__fraction').text
        discount = product.find('span', class_="ui-search-price__second-line__label shops__price-second-line__label").find('span', class_="ui-search-price__discount shops__price-discount").text
    except AttributeError:
        # .find returned None: this product has no previous price and/or discount
        prev_price = '0'
        discount = '0'
    curr_price = product.find('span', class_='andes-money-amount').find('span', class_='andes-money-amount__fraction').text
    # Remove the thousands separators and the '% OFF' suffix so the values can be cast to int
    prev_price = prev_price.replace(',', '')
    curr_price = curr_price.replace(',', '')
    discount = discount.replace('% OFF', '')
    # The image URL is read from the 'data-src' attribute
    image = product.find('img')
    product_list.append(Product(name, int(prev_price), int(curr_price), int(discount), image.attrs['data-src']))
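
In the loop above, a product that is missing either the previous price or the discount gets '0' for both. A possible alternative, sketched below, looks up each value independently. The helper 'text_or_default' is hypothetical, and the HTML fragment is made up and only imitates some of the site's class names.

```python
from bs4 import BeautifulSoup


def text_or_default(node, tag, css_class, default='0'):
    """Return the text of the first matching descendant, or `default` if there is none."""
    if node is None:
        return default
    found = node.find(tag, class_=css_class)
    return found.get_text(strip=True) if found is not None else default


# Made-up fragment for illustration; the real page is more deeply nested
sample_html = """
<li class="ui-search-layout__item">
  <s class="andes-money-amount"><span class="andes-money-amount__fraction">18,499</span></s>
</li>
<li class="ui-search-layout__item">
  <span class="ui-search-price__discount">19% OFF</span>
</li>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
results = []
for item in soup.find_all('li', class_='ui-search-layout__item'):
    # The <s> element may be absent; the helper then returns the default
    prev_block = item.find('s', class_='andes-money-amount')
    prev_price = text_or_default(prev_block, 'span', 'andes-money-amount__fraction')
    discount = text_or_default(item, 'span', 'ui-search-price__discount')
    results.append((prev_price, discount))

print(results)
```

Whether this is preferable depends on the page: if the previous price and the discount always appear together, the single 'try' block is simpler.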

# Organising the data into a DataFrame and analysis execution
---
Now, to make our data easy to read, we create a pandas DataFrame whose columns match the fields of our 'Product' namedtuple, so that our data is better organised.

In [8]:
df = pd.DataFrame(product_list,columns=Product._fields)
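
As a small self-contained illustration of this step, the sketch below builds a DataFrame from two made-up 'Product' rows (hypothetical values, not scraped data) and checks the resulting columns and types.

```python
from collections import namedtuple

import pandas as pd

Product = namedtuple('Product', ['name', 'prev_price', 'curr_price', 'discount', 'url_image'])

# Hypothetical rows, used only for illustration
sample_products = [
    Product('Example laptop A', 10000, 8000, 20, 'https://example.com/a.jpg'),
    Product('Example laptop B', 0, 4299, 0, 'https://example.com/b.jpg'),
]

# The namedtuple's field names become the column names
sample_df = pd.DataFrame(sample_products, columns=Product._fields)

print(sample_df.shape)   # (rows, columns)
print(sample_df.dtypes)  # the price and discount columns should be integers
```

Checking 'dtypes' confirms that the casts to 'int' made in the loop carried over to the DataFrame.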

Afterwards, we check the result by drawing a random sample, which in this case will be of 5 rows.

In [9]:
df.sample(5)

|  | name | prev_price | curr_price | discount | url_image |
|---|---|---|---|---|---|
| 15 | Laptop Gamer Msi Gf63 Thin Geforce Gtx 1650 Co... | 18499 | 14899 | 19 | https://http2.mlstatic.com/D_NQ_NP_747423-MLA5... |
| 37 | Laptop Acer C733 shale black 11.6", Intel Cel... | 0 | 4299 | 0 | https://http2.mlstatic.com/D_NQ_NP_864557-MLA4... |
| 51 | Laptop Huawei Matebook D15 Core I5 11.5th 8gb ... | 16999 | 12499 | 26 | https://http2.mlstatic.com/D_NQ_NP_948505-MLU7... |
| 68 | Laptop Gamer Dell Ryzen 5 3450u 12gb 1tb 256gb... | 14899 | 12499 | 16 | https://http2.mlstatic.com/D_NQ_NP_770187-MLU7... |
| 80 | Laptop gamer Thunderobot Zero Ultra plata y ... | 49119 | 30945 | 37 | https://http2.mlstatic.com/D_NQ_NP_937867-MLA7... |


We can see now that our scraped data matches the fields of our namedtuple.

# Analysing current price
Now that our data is cleaned and organised, we can proceed to find the laptops priced at $8,000 MXN or less; to do so, we filter our DataFrame as shown below.

In [10]:
df[df['curr_price']<=8000]

|  | name | prev_price | curr_price | discount | url_image |
|---|---|---|---|---|---|
| 2 | Laptop Hp 245 G8 14 Pulgadas Procesador Amd R... | 10999 | 6598 | 40 | https://http2.mlstatic.com/D_NQ_NP_836775-MLA5... |
| 3 | Hp 14-dq0052dx 14 Celeron 4gb Ram 64gb Ssd Sno... | 7299 | 4159 | 43 | https://http2.mlstatic.com/D_NQ_NP_882543-MLA5... |
| 5 | Laptop Gateway Ultra Slim GWNR51416 green 14.... | 0 | 7005 | 0 | https://http2.mlstatic.com/D_NQ_NP_902361-MLA5... |
| 6 | Laptop Lenovo Ideapad 15.6 Ryzen 3 7320u 8gb 2... | 15999 | 7299 | 54 | https://http2.mlstatic.com/D_NQ_NP_685676-MLA5... |
| 10 | Inter Laptop Aocwei 15.6 6gb+scalable Ssd Win... | 9637 | 4818 | 50 | https://http2.mlstatic.com/D_NQ_NP_820770-MLM7... |
| 11 | Laptop 14, Intel Celeron J4105 8gb Ram 120gb S... | 5247 | 3725 | 28 | https://http2.mlstatic.com/D_NQ_NP_832127-CBT7... |
| 12 | Hp (15-dy3008ca) 15.6 Celeron N4500 8gb 256gb... | 5470 | 4649 | 15 | https://http2.mlstatic.com/D_NQ_NP_892296-MLU7... |
| 16 | Laptop Chuwi HeroBook Pro space gray 14.1", In... | 5773 | 5080 | 11 | https://http2.mlstatic.com/D_NQ_NP_723065-MLA4... |
| 17 | Laptop Aocwei 15.6 6+256gb Scalable Ssd Window... | 9637 | 4818 | 50 | https://http2.mlstatic.com/D_NQ_NP_653948-MLM7... |
| 20 | Laptop Chromebook Touch - 16gb 4gb Ram Wfi Web... | 0 | 1399 | 0 | https://http2.mlstatic.com/D_NQ_NP_883687-MLM5... |
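
The price filter above can be combined with sorting or with further conditions. The sketch below shows one way to do this on a small DataFrame of made-up values (not scraped data); the names 'budget', 'affordable' and 'discounted' are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data with the same numeric columns as the scraped DataFrame
sample_df = pd.DataFrame({
    'name': ['Laptop A', 'Laptop B', 'Laptop C'],
    'prev_price': [10000, 0, 16000],
    'curr_price': [7500, 4300, 12640],
    'discount': [25, 0, 21],
})

budget = 8000

# Same filter as in the exercise, then ordered from cheapest to most expensive
affordable = sample_df[sample_df['curr_price'] <= budget].sort_values('curr_price')

# Among those, keep only the products that actually carry a discount
discounted = affordable[affordable['discount'] > 0]

print(affordable[['name', 'curr_price']])
print(discounted[['name', 'discount']])
```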


# Conclusions
---
From here we can start making decisions, since we have obtained the products that meet the condition we established. The best part is that every time the code is run, the data will be up to date, so we will always be choosing from the best current options, as long as the site keeps the HTML structure and class names used here.