<a href="https://colab.research.google.com/github/KelvinLam05/price_tracker/blob/main/price_scraping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Goal of the project**

In ecommerce, retailers commonly need to monitor their competitors’ prices. Prices make a big difference to sales: if they’re set too high, customers will go elsewhere, so monitoring them is crucial for keeping sales coming in.
In this project, we’ll create a price tracker with Webscraper.io. We’ll scrape an ecommerce store, extract the product prices (Black Friday deals), and store them in a CSV file.

**Scrape the site**

Webscraper.io is a Chrome browser extension for extracting data from web pages. With it, we can create a plan (a sitemap) that defines how a website should be traversed and what should be extracted. The sitemap used for this project is:

`{"_id":"price_scraping","startUrl":["https://www.primark.com/en-us/all-products/womenswear/c/womens","https://www.primark.com/en-us/all-products/menswear/c/mens","https://www.primark.com/en-us/all-products/kids/c/kids","https://www.primark.com/en-us/all-products/cosmetics/c/beauty","https://www.primark.com/en-us/all-products/homeware/c/homeware","https://www.primark.com/en-us/all-products/new-arrivals/c/newarrival"],"selectors":[{"id":"product_wrappers","parentSelectors":["_root"],"type":"SelectorElementClick","clickElementSelector":"button.button--invisible-border","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickMore","delay":4000,"discardInitialElements":"do-not-discard","multiple":true,"selector":"li.product-listing__item"},{"id":"name","parentSelectors":["product_wrappers"],"type":"SelectorText","selector":"p.product-item__name","multiple":false,"regex":""},{"id":"price","parentSelectors":["product_wrappers"],"type":"SelectorText","selector":"div","multiple":false,"regex":""}]}`
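
The sitemap is plain JSON, so its structure can be inspected outside the browser. The sketch below uses an abbreviated copy of the sitemap above (one start URL instead of six, and only the fields needed here) and prints how the selectors nest: `product_wrappers` selects each product tile, and `name` and `price` are read inside each tile. The helper name `selector_tree` is illustrative and not part of Webscraper.io.

```python
import json

# Abbreviated copy of the sitemap above (one start URL kept for brevity).
sitemap_json = '''
{"_id": "price_scraping",
 "startUrl": ["https://www.primark.com/en-us/all-products/womenswear/c/womens"],
 "selectors": [
   {"id": "product_wrappers", "parentSelectors": ["_root"],
    "type": "SelectorElementClick", "selector": "li.product-listing__item"},
   {"id": "name", "parentSelectors": ["product_wrappers"],
    "type": "SelectorText", "selector": "p.product-item__name"},
   {"id": "price", "parentSelectors": ["product_wrappers"],
    "type": "SelectorText", "selector": "div"}
 ]}
'''

def selector_tree(sitemap):
    """Map each parent selector id to the ids of its child selectors."""
    tree = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            tree.setdefault(parent, []).append(sel["id"])
    return tree

sitemap = json.loads(sitemap_json)
tree = selector_tree(sitemap)
print(tree)
```

Reading the tree from `_root` downwards shows the traversal order: the scraper first collects every product wrapper on each start URL, then extracts one name and one price per wrapper.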

**Load the package**

In [70]:
# Importing library
import pandas as pd

**Load the data**

In [71]:
# Load dataset
df = pd.read_csv('/content/price_scraping.csv')

In [72]:
# Only include the relevant columns
df = df[['name', 'price']]

In [73]:
# Examine the data
df.head()

|   | name | price |
|---|------|-------|
| 0 | 3-Pack Lace Trim Hipster Briefs | $8.00 |
| 1 | 3-Pack Elegant Lace Thongs | $6.00 |
| 2 | Faux Fur Checkerboard Slides | $10.00 |
| 3 | Faux Fur Checkerboard Slides | $10.00 |
| 4 | 3-Piece Shacket Set | $24.00 |


In [74]:
# Overview of all variables, their datatypes
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2316 entries, 0 to 2315
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    2316 non-null   object
 1   price   2316 non-null   object
dtypes: object(2)
memory usage: 36.3+ KB


**Data preprocessing**

In [75]:
# Remove dollar sign (regex=False treats '$' as a literal character, not a regex anchor)
df['price'] = df['price'].str.replace('$', '', regex=False)

In [76]:
# Convert price to float
df['price'] = pd.to_numeric(df['price'])
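
`pd.to_numeric` raises an error if any scraped value is not a plain number, for example a price with a thousands separator or a placeholder string. A more defensive sketch, using made-up sample values and a hypothetical helper `clean_price`, strips the symbols as literal characters and turns anything unparseable into `NaN`:

```python
import pandas as pd

def clean_price(prices: pd.Series) -> pd.Series:
    """Strip '$' and ',' as literal characters, then convert to float."""
    stripped = (
        prices.astype(str)
        .str.replace("$", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    # errors="coerce" turns unparseable values into NaN instead of raising
    return pd.to_numeric(stripped, errors="coerce")

sample = pd.Series(["$8.00", "$1,200.50", "n/a"])  # made-up values
cleaned = clean_price(sample)
print(cleaned.tolist())
```

Rows that come back as `NaN` can then be inspected with `cleaned.isna()` before deciding whether to drop or re-scrape them.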

In [77]:
df.head()

|   | name | price |
|---|------|-------|
| 0 | 3-Pack Lace Trim Hipster Briefs | 8.0 |
| 1 | 3-Pack Elegant Lace Thongs | 6.0 |
| 2 | Faux Fur Checkerboard Slides | 10.0 |
| 3 | Faux Fur Checkerboard Slides | 10.0 |
| 4 | 3-Piece Shacket Set | 24.0 |


In [78]:
# Export the cleaned DataFrame to CSV without the index column
df.to_csv('/content/price_tracker.csv', index=False, header=True)
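
To track prices rather than record them once, each export needs to be comparable with the previous one. The following is a minimal sketch and not part of the original notebook: it assumes two snapshots with the same `name` and `price` columns, uses made-up data, and introduces a hypothetical helper `price_changes`. Duplicate product names (such as the repeated "Faux Fur Checkerboard Slides" row in the output above) are dropped first so the merge stays one-to-one.

```python
import pandas as pd

def price_changes(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Return products whose price differs between two snapshots."""
    old = old.drop_duplicates(subset="name")
    new = new.drop_duplicates(subset="name")
    # Inner merge: only products present in both snapshots are compared
    merged = old.merge(new, on="name", suffixes=("_old", "_new"))
    changed = merged[merged["price_old"] != merged["price_new"]].copy()
    changed["change"] = changed["price_new"] - changed["price_old"]
    return changed.reset_index(drop=True)

# Made-up snapshots for illustration
old = pd.DataFrame({"name": ["Slides", "Slides", "Shacket Set"],
                    "price": [10.0, 10.0, 24.0]})
new = pd.DataFrame({"name": ["Slides", "Shacket Set"],
                    "price": [10.0, 20.0]})
changes = price_changes(old, new)
print(changes)
```

Because the merge is an inner join, products that appear in only one snapshot are ignored; switching to `how="outer"` would surface newly added or removed products as well.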