This is a mini project in which I use ```BeautifulSoup4``` to scrape the product list of [ACFC](https://www.acfc.com.vn/), an online shopping website. The objective is to extract information about every item in Women's Clothing (name, brand, price, link, and image). 
```BeautifulSoup``` is a simple, user-friendly tool for learning web scraping. In this project, I focus on what ```BeautifulSoup``` can extract from this website, and then export the results as a CSV file. 

**Project guide** <br>
I checked the website ahead of time and found that the pages for the Women's categories share the same structure, so my code can be reused for the other categories (Shoes, Bags, etc.). I picked the first category, Clothing. 

**Setting up** <br>
First, I imported the packages used in this project:

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import csv

Next, I created an empty list for each of the 5 attributes (Brand, Name, Price, Link and Image):

In [2]:
product_name = []
product_brand = []
product_price = []
product_link = []
product_image = []

Define a function that parses all the pages. It takes three parameters: the base URL, the first page number, and the last page number:

In [3]:
def pages(generic_url, page_number, page_last):
    while page_number <= page_last:
        url = generic_url + str(page_number)
        html_text = requests.get(url).text
        # Create soup object
        soup = BeautifulSoup(html_text, 'lxml')
        product_list = soup.find_all('li', class_='item product product-item')

        # Brand
        product_brand.extend([product.find('span', class_='brand-name').text.replace(' -', '') for product in product_list])
        # Name
        product_name.extend([product.find('a', class_='product-item-link').contents[2].strip() for product in product_list])
        # Price
        product_price.extend([product.find('span', class_='price').text.replace('đ', '').strip() for product in product_list])
        # Link
        product_link.extend([product.find('a', class_='product-item-link').get('href') for product in product_list])
        # Image
        product_image.extend([product.find('img', class_='product-image-photo').get('src') for product in product_list])
        # Generate url for next page
        page_number += 1
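
One fragility in the function above is that `find` returns `None` when a tag is missing, so a single product card without, say, a brand would raise `AttributeError` partway through a page. Below is a minimal sketch of a more defensive variant that builds one dict per product. It assumes the same class names as above. The helper names `text_or_none` and `parse_product` and the sample HTML are hypothetical, and the sample is simplified: the real page nests extra elements inside the link, which is why the original code reads `contents[2]`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics the class names used above;
# the second item deliberately has no brand or price.
SAMPLE_HTML = """
<ul>
  <li class="item product product-item">
    <span class="brand-name">GAP -</span>
    <a class="product-item-link" href="https://example.com/a.html">Dress</a>
    <span class="price">1.550.000 đ</span>
    <img class="product-image-photo" src="https://example.com/a.jpg">
  </li>
  <li class="item product product-item">
    <a class="product-item-link" href="https://example.com/b.html">Shirt</a>
    <img class="product-image-photo" src="https://example.com/b.jpg">
  </li>
</ul>
"""

def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None

def parse_product(product):
    """Extract one product card into a dict, tolerating missing elements."""
    link = product.find('a', class_='product-item-link')
    image = product.find('img', class_='product-image-photo')
    brand = text_or_none(product.find('span', class_='brand-name'))
    price = text_or_none(product.find('span', class_='price'))
    return {
        'Brand': brand.replace(' -', '').strip() if brand else None,
        'Name': text_or_none(link),
        'Price': price.replace('đ', '').strip() if price else None,
        'Link': link.get('href') if link is not None else None,
        'Image': image.get('src') if image is not None else None,
    }

# html.parser is used here only so the sketch runs without lxml installed.
soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = [parse_product(li) for li in soup.find_all('li', class_='product-item')]
```

Collecting one dict per product keeps the five attributes of an item together, so a missing value shows up as `None` in that row instead of stopping the scrape.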

Call the ```pages``` function with the base URL and the first and last page numbers:

In [4]:
pages('https://acfc.com.vn/nu/trang-phuc-nu.html?p=',1,5)
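
The call above builds one URL per page by appending the page number to the base URL. The same loop can be written with `range`, and the request can be given a timeout and a status check so that a failed page is not parsed silently. This is a sketch, not the code used above: `page_urls` and `fetch_html` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
def page_urls(generic_url, page_first, page_last):
    """Return the URL of every page from page_first to page_last inclusive."""
    return [f"{generic_url}{n}" for n in range(page_first, page_last + 1)]

def fetch_html(url, timeout=10):
    """Download one page, raising an exception on HTTP errors or timeouts."""
    import requests  # imported lazily so page_urls works without it installed
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    return response.text

urls = page_urls('https://acfc.com.vn/nu/trang-phuc-nu.html?p=', 1, 5)
```

Each entry in `urls` can then be passed to `fetch_html` and the returned text handed to `BeautifulSoup` as before.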

Let's check the values of each attribute. Each one should be a list of strings. 

In [5]:
print(product_brand)

['CK JEANS ', 'OLD NAVY ', 'BANANA REPUBLIC ', 'GAP ', 'PARFOIS ', 'OVS ', 'TOMMY JEANS ', 'OLD NAVY ', 'CK UNDERWEAR ', 'CK JEANS ', 'BANANA REPUBLIC ', 'COTTON ON BODY ', 'COTTON ON ', 'COTTON ON BODY ', "LEVI'S ", 'TOMMY HILFIGER ', 'COTTON ON ', 'FRENCH CONNECTION ', 'PARFOIS ', 'OVS ', 'FRENCH CONNECTION ', 'MANGO ', 'CK UNDERWEAR ', 'COTTON ON BODY ', 'CK PERFORMANCE ', 'MANGO ', 'TOMMY JEANS ', 'CK PERFORMANCE ', "LEVI'S ", 'CK PERFORMANCE ', 'CK JEANS ', 'GAP ', 'OVS ', 'FRENCH CONNECTION ', 'CK UNDERWEAR ', 'TOMMY HILFIGER ', 'COTTON ON ', 'PARFOIS ', 'CK JEANS ', "LEVI'S ", 'OLD NAVY ', "LEVI'S ", 'FRENCH CONNECTION ', 'CK PERFORMANCE ', 'CK UNDERWEAR ', 'MANGO ', 'TOMMY JEANS ', 'OVS ', 'TOMMY HILFIGER ', 'MANGO ', 'PARFOIS ', 'COTTON ON BODY ', 'CK JEANS ', 'TOMMY HILFIGER ', 'TOMMY JEANS ', 'MANGO ', 'BANANA REPUBLIC ', "LEVI'S ", 'COTTON ON BODY ', 'BANANA REPUBLIC ', 'CK UNDERWEAR ', 'COTTON ON ', 'FRENCH CONNECTION ', 'GAP ', 'CK PERFORMANCE ', 'PARFOIS ', 'TOMMY HILFIG

In [6]:
print(product_name)

['Áo Kiểu Nữ', 'Áo Thun Nữ', 'Đầm Nữ', 'Đầm Thun Nữ', 'Áo Nữ Santorini', 'Áo Thun Nữ', 'Áo Thun Nữ Tay Ngắn Tjw Classic Essential Logo 2 Ss', 'Áo Nữ', 'Áo Ngực Nữ', 'Áo Lạnh Nữ 0 Fit', 'Áo Tay Ngắn Nữ', 'Quần Ngắn Thể Thao Nữ-Active Core Bike Short', 'Áo Thun Nữ Ngắn Tay-Boyfriend Fit Billie Eilish Te', 'Áo ngủ ngắn tay nữ - 90S BED T SHIRT', 'Quần Khaki Nữ Dài', 'Đầm Nữ Vis Twill Midi Shirt Dress Ls', 'Quần Ngắn Denim Nữ - Mid Rise Classic Stretch Denim Short', 'Đầm Nữ Cellienne Sequin L/S Dress', 'Áo Len Ocean', 'Áo Thun Nữ', 'Quần Dài Nữ Colour Block Sunday Joggers', 'Quần Dài Chain', 'Áo Ngực Nữ Plunge Fit', 'Quần Ngủ Nữ - Summer Lounge Short', 'Áo Thun Thể Thao Nữ Fashion Fit', 'Áo Len Lucca3', 'Quần Dài Nữ TJW Mom Jog', 'Quần Thun Thể Thao Nữ 7/8 Length Fit', 'Áo Thun Nữ Tay Ngắn', 'Quần Thể Thao Nữ Taper Fit', 'Quần Ngắn Nữ', 'Quần Jeans Dài Nữ - High Rise Skinny', 'Đầm Nữ', 'Áo Dệt Kim Nữ Ramona Knits Layered Jumper', 'Quần Lót Nữ Hipster Fit', 'Áo Polo Nữ Tay Ngắn IM Reg

In [7]:
print(product_price)

['2.029.300', '199.000', '3.550.000', '1.550.000', '1.390.000', '499.000', '999.000', '1.195.000', '1.070.300', '3.749.000', '1.950.000', '429.000', '529.000', '195.000', '899.000', '5.899.000', '295.000', '4.199.000', '1.390.000', '549.000', '1.799.000', '699.000', '1.849.000', '195.000', '1.364.300', '899.000', '2.899.000', '1.294.300', '449.000', '2.499.000', '1.739.400', '1.150.000', '1.299.000', '899.000', '359.400', '2.899.000', '349.000', '1.190.000', '2.999.000', '1.299.000', '795.000', '449.000', '3.299.000', '1.469.300', '1.399.000', '799.000', '1.699.000', '499.000', '2.799.000', '799.000', '1.190.000', '529.000', '2.029.000', '1.899.000', '2.599.000', '899.000', '3.350.000', '399.000', '429.000', '1.150.000', '559.300', '679.000', '399.000', '999.000', '1.649.000', '1.590.000', '3.699.000', '399.000', '1.799.000', '497.500', '199.000', '599.000', '999.000', '195.000', '5.199.000', '1.650.000', '909.300', '1.699.000', '195.000', '1.295.000', '299.000', '599.000', '579.000', 
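
The prices are still strings that use a dot as the thousands separator, so they cannot be sorted or summed numerically as they are. Here is a small sketch of converting them to integers, assuming every price follows the pattern shown above; `parse_price` is a hypothetical helper name.

```python
def parse_price(price_text):
    """Convert a price such as '2.029.300' to the integer 2029300."""
    return int(price_text.replace('.', '').strip())

sample_prices = ['2.029.300', '199.000', '3.550.000']  # first values printed above
numeric_prices = [parse_price(p) for p in sample_prices]
print(numeric_prices)  # [2029300, 199000, 3550000]
```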

In [8]:
print(product_link)

['https://www.acfc.com.vn/ck-jeans-ao-thun-nu-ck-j217255-yaf.html', 'https://www.acfc.com.vn/old-navy-ao-thun-nu-oln-580387-02.html', 'https://www.acfc.com.vn/banana-republic-dam-nu-br-787165-02.html', 'https://www.acfc.com.vn/dam-thun-nu-gap-685690-00.html', 'https://www.acfc.com.vn/parfois-ao-nu-santorini-parfoi-190395-bgu-bg.html', 'https://www.acfc.com.vn/ovs-ao-thun-nu-ovs-1518540-1518540.html', 'https://www.acfc.com.vn/tommy-jeans-ao-thun-nu-tay-ngan-tjw-classic-essential-logo-2-ss-thj-dw0dw12853-ybr.html', 'https://www.acfc.com.vn/old-navy-ao-nu-oln-720534-02.html', 'https://www.acfc.com.vn/ck-underwear-ao-nguc-nu-ck-qf6693ad-100.html', 'https://www.acfc.com.vn/ck-jeans-ao-lanh-nu-0-fit-ck-j218920-beh.html', 'https://www.acfc.com.vn/banana-republic-ao-tay-ngan-nu-br-445871-01.html', 'https://www.acfc.com.vn/body-quan-ngan-the-thao-nu-active-core-bike-short-coc-630605-26.html', 'https://www.acfc.com.vn/cotton-on-ao-thun-nu-ngan-tay-boyfriend-fit-billie-eilish-te-coc-2054187-01.ht

In [9]:
print(product_image)

['https://exacdn.acfc.com.vn/media/catalog/product/cache/e0eb9f74eaaa356eac6b0ce899de6ac0/j/2/j217255-yaf-1_iovdkdqlw9aclgeo.jpg', 'https://exacdn.acfc.com.vn/media/catalog/product/cache/e0eb9f74eaaa356eac6b0ce899de6ac0/c/n/cn18789864_uudw9sujs902c7ce.jpg', 'https://exacdn.acfc.com.vn/media/catalog/product/cache/e0eb9f74eaaa356eac6b0ce899de6ac0/c/n/cn27431426_a5crlghacguengrf.jpg', 'https://exacdn.acfc.com.vn/media/catalog/product/cache/e0eb9f74eaaa356eac6b0ce899de6ac0/c/n/cn20512267_t4hannadjjnhpqhw.jpg', 'https://exacdn.acfc.com.vn/media/catalog/product/cache/e0eb9f74eaaa356eac6b0ce899de6ac0/1/9/190395_bg_1y_omk9nwc2m8xotszi.jpg', 'https://exacdn.acfc.com.vn/media/catalog/product/cache/e0eb9f74eaaa356eac6b0ce899de6ac0/1/2/12087627_w7wjqrybah0xzqci.jpg', 'https://exacdn.acfc.com.vn/media/catalog/product/cache/e0eb9f74eaaa356eac6b0ce899de6ac0/d/w/dw0dw12853ybr_f_fcjjrcxvpjxxmh1i.png', 'https://exacdn.acfc.com.vn/media/catalog/product/cache/e0eb9f74eaaa356eac6b0ce899de6ac0/c/n/cn2775880

Combine the results into a DataFrame:

In [10]:
product_info = pd.DataFrame({
    'Brand': product_brand,
    'Name': product_name,
    'Price': product_price,
    'Link': product_link,
    'Image': product_image
    })
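
`pd.DataFrame` raises a `ValueError` when the lists in the dict have different lengths. That can happen if a scrape is interrupted partway through a page, because some lists will already have been extended and others not. A quick check before building the DataFrame shows which list is off. This is a sketch; `column_lengths` and `all_same_length` are hypothetical helpers, and the sample lists are shortened stand-ins for the real ones.

```python
def column_lengths(columns):
    """Map each column name to the length of its list."""
    return {name: len(values) for name, values in columns.items()}

def all_same_length(columns):
    """Return True if every column has the same number of entries."""
    return len(set(column_lengths(columns).values())) <= 1

# Shortened stand-ins for the scraped lists.
columns = {
    'Brand': ['CK JEANS', 'OLD NAVY'],
    'Name': ['Áo Kiểu Nữ', 'Áo Thun Nữ'],
    'Price': ['2.029.300'],  # one entry short on purpose
}
print(column_lengths(columns))   # {'Brand': 2, 'Name': 2, 'Price': 1}
print(all_same_length(columns))  # False
```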

Take a look at the DataFrame:

In [11]:
product_info

| | Brand | Name | Price | Link | Image |
|---|---|---|---|---|---|
| 0 | CK JEANS | Áo Kiểu Nữ | 2.029.300 | https://www.acfc.com.vn/ck-jeans-ao-thun-nu-ck... | https://exacdn.acfc.com.vn/media/catalog/produ... |
| 1 | OLD NAVY | Áo Thun Nữ | 199.000 | https://www.acfc.com.vn/old-navy-ao-thun-nu-ol... | https://exacdn.acfc.com.vn/media/catalog/produ... |
| 2 | BANANA REPUBLIC | Đầm Nữ | 3.550.000 | https://www.acfc.com.vn/banana-republic-dam-nu... | https://exacdn.acfc.com.vn/media/catalog/produ... |
| 3 | GAP | Đầm Thun Nữ | 1.550.000 | https://www.acfc.com.vn/dam-thun-nu-gap-685690... | https://exacdn.acfc.com.vn/media/catalog/produ... |
| 4 | PARFOIS | Áo Nữ Santorini | 1.390.000 | https://www.acfc.com.vn/parfois-ao-nu-santorin... | https://exacdn.acfc.com.vn/media/catalog/produ... |
| ... | ... | ... | ... | ... | ... |
| 295 | COTTON ON | Áo Shacket Nữ-The Shacket | 495.000 | https://www.acfc.com.vn/cotton-on-ao-shacket-n... | https://exacdn.acfc.com.vn/media/catalog/produ... |
| 296 | GAP | Chân Váy Jeans Nữ - Mini Skirt | 1.250.000 | https://www.acfc.com.vn/chan-vay-jeans-nu-mini... | https://exacdn.acfc.com.vn/media/catalog/produ... |
| 297 | OVS | Quần Jeans Nữ | 499.000 | https://www.acfc.com.vn/ovs-quan-jeans-nu-ovs-... | https://exacdn.acfc.com.vn/media/catalog/produ... |
| 298 | FRENCH CONNECTION | Áo Len Fcuk Oversized Crew Nck Sweatr | 1.499.000 | https://www.acfc.com.vn/french-connection-ao-l... | https://exacdn.acfc.com.vn/media/catalog/produ... |


Export the DataFrame to a CSV file:

In [12]:
product_info.to_csv('product_infor.csv',index=False, encoding= 'utf-8', sep= ',')
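
To confirm the export, the file can be read back and a few rows inspected. Here is a minimal sketch using the standard-library `csv` module, assuming the file was written as UTF-8 with a header row as above; `load_products` is a hypothetical helper name.

```python
import csv

def load_products(path):
    """Read the exported CSV into a list of dicts keyed by column name."""
    # newline='' lets the csv module handle line endings inside quoted fields.
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))
```

For example, `load_products('product_infor.csv')[0]['Name']` should return the first product name, with the Vietnamese characters intact.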