## Objective:
**Scrape the products listed on [SlickDeals](https://slickdeals.net/) and build a database of the discounted items, so that we can monitor which products have the best deals. For each product, the database includes:**
- Title
- Photo
- Update time (relative, as shown on the site, e.g. `23 minutes ago`)
- Discounted price
- Original price
- Discount amount (percentage off the original price)
- Seller
- Number of comments
- Deal score
- Product URL
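To make the mapping between these fields and the CSV column names explicit, the record can be sketched as a dataclass. `Deal` and `CSV_HEADER` are illustrative names that the notebook itself does not use; every field is typed as text because the scraper stores the values exactly as they appear on the page.

```python
from dataclasses import dataclass, fields


@dataclass
class Deal:
    """One scraped deal (illustrative); field order matches the CSV header."""
    title: str
    photo: str            # image URL
    update_date: str      # relative time as shown on the site, e.g. "23 minutes ago"
    discount_price: str   # e.g. "$90"
    original_price: str   # e.g. "$395"
    discount_amount: str  # percentage off, e.g. "77%"
    seller: str
    comments_number: str  # e.g. "2 Comments"
    deal_score: str       # e.g. "+17"
    url: str


# Column names in declaration order, usable as the header row of the CSV
CSV_HEADER = [f.name for f in fields(Deal)]
```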

## The steps we follow:
- We scrape the products listed on the [SlickDeals](https://slickdeals.net/) front page
- For each product, we extract the fields listed above
- We keep only the discounted products (those that show an original price)
- We calculate the discount percentage
- We store the data in a CSV file in the following format:

```
title,photo,update_date,discount_price,original_price,discount_amount,seller,comments_number,deal_score,url
2-Piece Nautica Men's Modern Fit Suits,https://static.slickdealscdn.com/attachment/1/6/7/9/8/1/2/200x200/13859936.thumb,23 minutes ago,$90,$395,77%,Macy's,2 Comments,+17,https://slickdeals.net/f/16766321-2-piece-nautica-men-s-modern-fit-suits-bi-stretch-stretch-cotton-solid-cotton-linen-blend-various-90-free-shipping?src=frontpage
```
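The discount percentage is `(original_price - discounted_price) / original_price * 100`, truncated to a whole number. For the sample row this gives `(395 - 90) / 395 * 100 ≈ 77.2`, which is stored as `77%`. A minimal sketch of the same arithmetic (`discount_percent` is an illustrative name, not a function from the notebook):

```python
def discount_percent(original_price: float, discounted_price: float) -> int:
    """Whole-number percentage saved relative to the original price.

    int() truncates toward zero, so 77.9 becomes 77 rather than 78.
    """
    if original_price <= 0:
        raise ValueError("original price must be positive")
    return int(100 * (original_price - discounted_price) / original_price)


print(f"{discount_percent(395, 90)}%")  # 77%
```

Truncation matches the `int()` call used in the scraping loop; `round()` would be the alternative if the nearest whole percentage is preferred.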

## Tools we use:
- `requests` to download the page
- `BeautifulSoup` (with the `lxml` parser) to parse the HTML
- `pandas` to collect the results and write them to CSV
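As a quick illustration of how `BeautifulSoup` is used in the cells below: `find_all` collects every tag matching a tag name and CSS class (passed as `class_`, since `class` is a Python keyword), and `find` returns the first match or `None`. The HTML here is a hand-written stand-in that borrows the class names used later; it is not real SlickDeals markup, and it uses the built-in `html.parser` so that `lxml` is not required.

```python
from bs4 import BeautifulSoup

# Two simplified cards; only the first one lists an original price
html = """
<ul>
  <li class="frontpageGrid__feedItem">
    <span class="dealCard__price">$90</span>
    <span class="dealCard__originalPrice">$395</span>
  </li>
  <li class="frontpageGrid__feedItem">
    <span class="dealCard__price">$15</span>
  </li>
</ul>
"""

demo_soup = BeautifulSoup(html, "html.parser")
cards = demo_soup.find_all("li", class_="frontpageGrid__feedItem")
for card in cards:
    price = card.find("span", class_="dealCard__price").text
    original = card.find("span", class_="dealCard__originalPrice")  # None if the tag is missing
    print(price, original.text if original is not None else "(no original price)")
```

Calling `.text` on that `None` raises `AttributeError`, which is how the scraping loop ends up skipping items without an original price.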

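```python
from bs4 import BeautifulSoup
import requests
import pandas as pd
```

```python
# Send a browser-like User-Agent and set a timeout: some sites block or stall the default requests client
response = requests.get('https://slickdeals.net/', headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
html_text = response.text
```

```python
# 'lxml' needs the lxml package installed; 'html.parser' is the built-in alternative
soup = BeautifulSoup(html_text, 'lxml')
```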
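The notebook sends one request for the front page and then one request per deal. If those requests start failing intermittently (for example with HTTP 429 when they arrive too quickly), one possible hardening, not part of the original notebook, is a shared `requests.Session` with automatic retries through urllib3's `Retry`. `make_session`, the User-Agent string and the retry settings are illustrative choices.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(user_agent: str = "Mozilla/5.0") -> requests.Session:
    """Session that reuses connections and retries transient failures with backoff."""
    retry = Retry(
        total=3,                                    # at most 3 retries per request
        backoff_factor=1,                           # exponentially growing sleep between retries
        status_forcelist=[429, 500, 502, 503, 504], # retry on rate limiting and server errors
    )
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session


session = make_session()
```

With this in place, `session.get(url, timeout=30)` could replace the direct `requests.get(...)` calls; a short `time.sleep()` between detail-page requests further reduces the chance of being rate-limited.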

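```python
def parse_price(text):
    """Convert a price such as '$1,299.99' to a float; return None for non-numeric values such as 'Free'."""
    try:
        return float(text.strip().lstrip('$').replace(',', ''))
    except ValueError:
        return None


rows = []
for item in soup.find_all('li', class_='frontpageGrid__feedItem'):
    # Cards that are not regular deals (banners, ads, ...) lack some of the elements below;
    # the resulting error (e.g. AttributeError when calling .text on None) sends us to the except branch.
    try:
        # Discount calculation: keep only items that show both a current and an original price
        discounted_price = item.find('span', class_='dealCard__price').text.strip()
        original_price = item.find('span', class_='dealCard__originalPrice').text.strip()
        op_num = parse_price(original_price)
        dp_num = parse_price(discounted_price)
        if not op_num or dp_num is None:  # also guards against dividing by zero
            continue
        discount = f"{int(100 * (op_num - dp_num) / op_num)}%"

        # Info from the front-page card
        title = item.find('a', class_='dealCard__title').text.split('&')[0].strip()  # keep the part before the first '&'
        photo = item.find('div', class_='dealCard__content').img['src']
        updated_at = item.find('header', class_='dealCard__header').find('span', class_='dealCard__statusTimestamp').text.strip()
        seller = item.find('a', class_='dealCard__storeLink').text.strip()
        url = f"https://slickdeals.net{item.div.a['href']}"

        # Comment count and deal score are only shown on the deal's own page
        item_response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
        item_response.raise_for_status()
        item_soup = BeautifulSoup(item_response.text, 'lxml')

        comments_count = item_soup.find('div', class_='scorepos').label.text.strip()
        score_spans = item_soup.find('span', class_='dealScoreBox').find_all('span')
        deal_score = score_spans[0].text + score_spans[1].text
    except (AttributeError, TypeError, KeyError, IndexError, requests.RequestException):
        continue

    rows.append({'title': title,
                 'photo': photo,
                 'update_date': updated_at,
                 'discount_price': discounted_price,
                 'original_price': original_price,
                 'discount_amount': discount,
                 'seller': seller,
                 'comments_number': comments_count,
                 'deal_score': deal_score,
                 'url': url})

# Build the DataFrame once at the end instead of calling pd.concat on every iteration
products = pd.DataFrame(rows, columns=['title', 'photo', 'update_date', 'discount_price', 'original_price',
                                       'discount_amount', 'seller', 'comments_number', 'deal_score', 'url'])
```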
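The scraper stores `comments_number` and `deal_score` as raw page text (for example `2 Comments` and `+17` in the sample row), which sorts alphabetically rather than numerically. If numeric columns are wanted, small converters like the following could be applied; the helper names are illustrative, and the input formats are assumed from the sample row only.

```python
import re


def parse_comment_count(text: str) -> int:
    """Return the first number in strings like '2 Comments' or '1,024 Comments'; 0 if there is none."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else 0


def parse_deal_score(text: str) -> int:
    """Convert a signed score such as '+17' or '-3' to an int."""
    return int(text.strip())


print(parse_comment_count("2 Comments"), parse_deal_score("+17"))  # 2 17
```

With pandas they can be applied column by column, e.g. `products['comments_number'].map(parse_comment_count)`.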

```python
products.to_csv('products.csv', index=False)
```
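Since the goal is to monitor which products have the best deals, the saved file can be ranked by discount. A minimal standard-library sketch (`best_deals` is an illustrative name) that strips the `%` from `discount_amount` and sorts in descending order:

```python
import csv


def best_deals(csv_lines, top=5):
    """Return the `top` rows with the largest percentage discount.

    `csv_lines` is any iterable of CSV text lines, such as an open file object.
    """
    rows = list(csv.DictReader(csv_lines))
    # discount_amount is stored as text such as '77%'
    rows.sort(key=lambda row: int(row["discount_amount"].rstrip("%")), reverse=True)
    return rows[:top]
```

For the notebook's output it would be called on the open file, e.g. `with open('products.csv', newline='', encoding='utf-8') as f: top = best_deals(f)`.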