# Ceneo Scraper


## Components of a single opinion

|Component|Selector|Variable|
|---------|--------|--------|
|opinion ID|["data-entry-id"]|opinion_id|
|opinion's author|span.user-post__author-name|author|
|author's recommendation|span.user-post__author-recomendation > em|recommendation|
|score, expressed as a number of stars|span.user-post__score-count|score|
|opinion’s content|div.user-post__text|content|
|list of product advantages|div.review-feature__title--positives ~ div.review-feature__item|pros|
|list of product disadvantages|div.review-feature__title--negatives ~ div.review-feature__item|cons|
|number of users who found the opinion helpful|button.vote-yes > span|helpful|
|number of users who found the opinion unhelpful|button.vote-no > span|unhelpful|
|publishing date|span.user-post__published > time:nth-child(1)["datetime"]|publish_date|
|purchase date|span.user-post__published > time:nth-child(2)["datetime"]|purchase_date|
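
To see how these selectors behave, the sketch below runs them with plain BeautifulSoup calls (`select`, `select_one`, and `tag[...]` attribute access) against a small hand-written HTML fragment. The markup is a hypothetical, heavily simplified imitation of a Ceneo review that only reuses the class names from the table; real pages contain many more elements and may change at any time.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that reuses the class names from the table above
SAMPLE_HTML = """
<div class="js_product-review" data-entry-id="1">
  <span class="user-post__author-name">Jan</span>
  <span class="user-post__author-recomendation"><em>Polecam</em></span>
  <span class="user-post__score-count">4,5/5</span>
  <div class="user-post__text">Great food processor.</div>
  <div class="review-feature__col">
    <div class="review-feature__title review-feature__title--positives">Pros</div>
    <div class="review-feature__item">mikser</div>
    <div class="review-feature__item">garnek</div>
  </div>
  <button class="vote-yes"><span>38</span></button>
  <button class="vote-no"><span>9</span></button>
  <span class="user-post__published">
    <time datetime="2019-08-15 10:48:12">published</time>
  </span>
</div>
"""

review = BeautifulSoup(SAMPLE_HTML, "html.parser").select_one("div.js_product-review")

# Attribute of the review element itself (no selector needed)
opinion_id = review["data-entry-id"]
author = review.select_one("span.user-post__author-name").text.strip()
recommendation = review.select_one("span.user-post__author-recomendation > em").text.strip()
# "~" is the general sibling combinator: items that follow the "positives" title
pros = [
    tag.text.strip()
    for tag in review.select("div.review-feature__title--positives ~ div.review-feature__item")
]
helpful = review.select_one("button.vote-yes > span").text.strip()
# :nth-child counts element siblings only, so whitespace text does not shift the index
publish_date = review.select_one("span.user-post__published > time:nth-child(1)")["datetime"]
# This fragment has a single <time>, so the purchase-date selector finds nothing
purchase = review.select_one("span.user-post__published > time:nth-child(2)")
```

The last line returns `None`, which is the situation behind the `'purchase_date': None` entry when a reviewer did not state a purchase date.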

## Imports

```python
import requests
from bs4 import BeautifulSoup
```

## Definition of the extraction function

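```python
def extract_content(ancestor, selector=None, attribute=None, return_list=False):
    """Return text or an attribute value taken from `ancestor`.

    With a `selector`, the value comes from the first matching descendant,
    or from every match when `return_list` is True. A missing element or
    attribute yields None (or is skipped in list mode).
    """
    if selector:
        if return_list:
            tags = ancestor.select(selector)
            if attribute:
                # Keep only elements that actually carry the attribute
                return [tag[attribute].strip() for tag in tags if tag.has_attr(attribute)]
            return [tag.text.strip() for tag in tags]
        tag = ancestor.select_one(selector)
        if tag is None:
            # Optional components (e.g. the purchase date) may be absent
            return None
        if attribute:
            value = tag.get(attribute)
            return value.strip() if value is not None else None
        return tag.text.strip()
    # No selector: read directly from the ancestor element itself
    if attribute:
        return ancestor.get(attribute)
    return ancestor.text.strip()
```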
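
Missing components are the main edge case for a helper like this, and BeautifulSoup reports them in different ways: `select_one()` returns `None` when nothing matches, `select()` returns an empty list, `Tag.get()` returns `None` for an absent attribute, while `tag[...]` raises `KeyError`. The demonstration below uses a hypothetical fragment and a hypothetical `raises` helper written only for this check.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a review that has a publishing date but no purchase date
html = '<span class="user-post__published"><time datetime="2019-08-15">x</time></span>'
soup = BeautifulSoup(html, "html.parser")


def raises(exc_type, func):
    """Hypothetical helper: return True if calling func() raises exc_type."""
    try:
        func()
    except exc_type:
        return True
    return False


missing_tag = soup.select_one("time:nth-child(2)")  # no match -> None
missing_list = soup.select("time:nth-child(2)")     # no match -> empty list
time_tag = soup.select_one("time")
missing_attr = time_tag.get("title")                # absent attribute -> None

# Unguarded access patterns fail with three different exception types
text_fails = raises(AttributeError, lambda: missing_tag.text)     # None has no .text
key_fails = raises(KeyError, lambda: time_tag["title"])           # missing attribute
subscript_fails = raises(TypeError, lambda: missing_tag["datetime"])  # None[...] is invalid
```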

## Opinion structure

```python
# component name -> (selector, attribute, return_list); omitted items use the defaults
selectors = {
    "opinion_id": (None, "data-entry-id"),
    "author": ("span.user-post__author-name",),
    "recommendation": ("span.user-post__author-recomendation > em",),
    "score": ("span.user-post__score-count",),
    "content": ("div.user-post__text",),
    "pros": ("div.review-feature__title--positives ~ div.review-feature__item", None, True),
    "cons": ("div.review-feature__title--negatives ~ div.review-feature__item", None, True),
    "helpful": ("button.vote-yes > span",),
    "unhelpful": ("button.vote-no > span",),
    "publish_date": ("span.user-post__published > time:nth-child(1)", "datetime"),
    "purchase_date": ("span.user-post__published > time:nth-child(2)", "datetime"),
}
```
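
Each tuple is spread with `*value` into the positional parameters that follow `ancestor` in `extract_content`, so a one-item tuple sets only the selector. The sketch below makes that mapping visible without downloading anything; `show_args` is a hypothetical stand-in with the same signature that simply echoes its arguments, and only a subset of the dictionary is repeated.

```python
def show_args(ancestor, selector=None, attribute=None, return_list=False):
    """Hypothetical stand-in with the same signature as extract_content."""
    return {"selector": selector, "attribute": attribute, "return_list": return_list}


# A subset of the selectors dictionary from above
selectors = {
    "opinion_id": (None, "data-entry-id"),
    "author": ("span.user-post__author-name",),
    "pros": ("div.review-feature__title--positives ~ div.review-feature__item", None, True),
    "publish_date": ("span.user-post__published > time:nth-child(1)", "datetime"),
}

# *value unpacks each tuple into the positional arguments after `ancestor`
resolved = {key: show_args("opinion", *value) for key, value in selectors.items()}
```

Because the parameters have defaults, the tuples can stay short: only list-valued components such as `pros` need all three positions.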

## Send a request to the Ceneo.pl service

```python
product_id = "104305410"
url = f"https://www.ceneo.pl/{product_id}#tab=reviews_scroll"
response = requests.get(url)
response.status_code
```

Output:

```
200
```
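
The `#tab=reviews_scroll` part of the URL is a fragment. Fragments are handled on the client side and are not part of the HTTP request target, so the request is the same as one for `https://www.ceneo.pl/104305410`. The standard library's `urllib.parse.urlsplit` shows the split; `request_target` is a hypothetical helper written for this illustration, not part of `requests`.

```python
from urllib.parse import urlsplit


def request_target(url):
    """Hypothetical helper: the path (plus query) an HTTP client puts in the request line."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path


product_id = "104305410"
url = f"https://www.ceneo.pl/{product_id}#tab=reviews_scroll"
parts = urlsplit(url)
target = request_target(url)  # the fragment is dropped
```

When the notebook is turned into an unattended script, `response.raise_for_status()` from `requests` raises `requests.HTTPError` for 4xx and 5xx responses, which is often more convenient than inspecting `status_code` by hand.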

## Convert the plain-text HTML into a DOM structure

```python
page_dom = BeautifulSoup(response.text, "html.parser")
opinions = page_dom.select("div.js_product-review")     # every review on the page
opinion = page_dom.select_one("div.js_product-review")  # only the first review
```
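
`select()` returns every matching element as a list, while `select_one()` returns only the first match (or `None` when there is none), so `opinion` corresponds to `opinions[0]` whenever the page contains at least one review. A quick check on a hypothetical two-review fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment with two reviews
html = """
<div class="js_product-review" data-entry-id="1">first</div>
<div class="js_product-review" data-entry-id="2">second</div>
"""
page = BeautifulSoup(html, "html.parser")

all_reviews = page.select("div.js_product-review")       # list of all matches
first_review = page.select_one("div.js_product-review")  # first match only
no_match = page.select_one("div.does-not-exist")         # nothing found -> None
ids = [review["data-entry-id"] for review in all_reviews]
```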


## Extract all components of a single opinion

```python
single_opinion = {
    key: extract_content(opinion, *value)
    for key, value in selectors.items()
}
print(single_opinion)
```

Output:

```
{'opinion_id': '10920918', 'author': 'Użytkownik Ceneo', 'recommendation': 'Polecam', 'score': '4,5/5', 'content': 'Odpowienik robota znanej marki TM, tyle że odpowiednio tańszy. Jak dla mnie rewelacyjny. Kroi, miesza, gotuje, wyrabia ciasto, gotuje na parze. Jednym słowem rewelacja! Idealny nawet dla osob, które nie umieją gotować. Dzięki wbudowanym przepisom krok po kroku prowadzi przez kolejne etapy gotowania. Łatwy w czyszczeniu. Polecam.', 'pros': ['garnek', 'głośność pracy', 'mikser', 'parownik', 'przepisy', 'trwałość', 'wielofunkcyjność', 'wydajność'], 'cons': [], 'helpful': '38', 'unhelpful': '9', 'publish_date': '2019-08-15 10:48:12', 'purchase_date': None}
```
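
All extracted values are strings (or `None`), and the score uses Polish number formatting with a decimal comma (`'4,5/5'`). If numeric values are needed later, a conversion step along these lines could follow. `parse_score`, `parse_count`, and `normalize_opinion` are hypothetical helpers, and the score parser assumes the `value/max` shape seen in this output.

```python
def parse_score(raw):
    """Convert a score such as '4,5/5' into a float (assumes the 'value/max' format)."""
    if raw is None:
        return None
    value = raw.split("/")[0]
    return float(value.replace(",", "."))


def parse_count(raw):
    """Convert a vote counter such as '38' into an int."""
    return int(raw) if raw is not None else None


def normalize_opinion(opinion):
    """Return a copy of the opinion dict with the numeric fields converted."""
    result = dict(opinion)
    result["score"] = parse_score(opinion.get("score"))
    result["helpful"] = parse_count(opinion.get("helpful"))
    result["unhelpful"] = parse_count(opinion.get("unhelpful"))
    return result


# Values copied from the printed output above
sample = {"score": "4,5/5", "helpful": "38", "unhelpful": "9", "purchase_date": None}
normalized = normalize_opinion(sample)
```

Working on a copy keeps the raw scraped strings available in `single_opinion` in case the site's formatting differs from this assumption.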
