# [Vestiaire Collective](https://fr.vestiairecollective.com/) Scraping

### Set up

In [3]:
%cd C:/Users/pemma/OneDrive - Université de Tours/Mécen/M2/S1/02 - Machine Learning/05 - Projet/ML_Vestiaire_Collective/

C:\Users\pemma\OneDrive - Université de Tours\Mécen\M2\S1\02 - Machine Learning\05 - Projet\ML_Vestiaire_Collective


In [4]:
import vc_scraping as vcs

In [5]:
from typing import List

### Vestiaire Collective's home page scraping 

`vcs.home_page.py`

In [2]:
scraper = vcs.HomePageScraper()

In [None]:
vcs.save_json(
    data=scraper.home_page, 
    file_name="./backup/home_page.json"
)

### Collect interesting brands

`vcs.home_page.py`

Note: the 35 selected brands come from the "editor's pick" on the Vestiaire Collective website.

In [None]:
brands_links = scraper.get_brands()

In [None]:
vcs.save_json(
    data=brands_links,
    file_name="./backup/brands_links.json"
)

### Get catalog of items per page for each brand

`vcs.brands.py`

Note: only the first ten pages are retrieved, which amounts to 480 items per brand and 16,800 items in total.
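The totals in this note follow from simple multiplication. Here is a quick check, assuming 48 items per catalogue page (the figure implied by 480 items over ten pages):

```python
# Figures taken from the notes in this notebook.
# ITEMS_PER_PAGE is inferred from "480 items over ten pages".
N_BRANDS = 35
PAGES_PER_BRAND = 10
ITEMS_PER_PAGE = 48

items_per_brand = PAGES_PER_BRAND * ITEMS_PER_PAGE
total_items = N_BRANDS * items_per_brand

print(f"{items_per_brand} items per brand, {total_items:,} items in total")
```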

In [12]:
brands_links = vcs.load_json(
    file_name="./backup/brands_links.json", 
    data_type=List[vcs.BrandLink]
)
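The `save_json` / `load_json` pair acts as a typed backup: data is written to disk once and later reloaded as objects instead of being scraped again. The implementation inside `vc_scraping` is not shown in this notebook, so what follows is only a minimal sketch of the idea using the standard library. `BrandLinkSketch`, its `name` and `url` fields, `save_json_sketch`, `load_json_sketch` and the example URL are all hypothetical names introduced for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List


@dataclass
class BrandLinkSketch:
    """Hypothetical stand-in for vcs.BrandLink; the real fields may differ."""
    name: str
    url: str


def save_json_sketch(data: List[BrandLinkSketch], file_name: str) -> None:
    """Serialize a list of dataclass instances to a JSON file."""
    path = Path(file_name)
    # Create the backup directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump([asdict(d) for d in data], f, ensure_ascii=False, indent=2)


def load_json_sketch(file_name: str) -> List[BrandLinkSketch]:
    """Read the JSON file back and rebuild the typed objects."""
    with open(file_name, encoding="utf-8") as f:
        return [BrandLinkSketch(**d) for d in json.load(f)]


# Round trip in a temporary directory so nothing in ./backup is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = str(Path(tmp) / "backup" / "brands_links.json")
    links = [BrandLinkSketch(name="balenciaga", url="https://example.com/balenciaga/")]
    save_json_sketch(links, target)
    restored = load_json_sketch(target)
```

Reloading into typed objects rather than plain dictionaries means a missing or renamed field fails loudly at load time instead of further down the pipeline.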

In [3]:
vcs.save_all_brands_pages(brands=brands_links)

### Get and save basic information on the collected items

`vcs.brand_page.py`

In [8]:
brands_files_paths = vcs.get_brands_files_paths()

In [5]:
vcs.save_all_basic_items(brands_files_paths)

Processing acne-studios...
./backup/brands/acne-studios/items already exists.
acne-studios processed.
******************************
Processing alexander-mcqueen...
./backup/brands/alexander-mcqueen/items already exists.
alexander-mcqueen processed.
******************************
Processing alexander-wang...
./backup/brands/alexander-wang/items already exists.
alexander-wang processed.
******************************
Processing balenciaga...
./backup/brands/balenciaga/items already exists.
balenciaga processed.
******************************
Processing balmain...
./backup/brands/balmain/items already exists.
balmain processed.
******************************
Processing bottega-veneta...
./backup/brands/bottega-veneta/items already exists.
bottega-veneta processed.
******************************
Processing burberry...
./backup/brands/burberry/items already exists.
burberry processed.
******************************
Processing celine...
./backup/brands/celine/items already exists.
celine pr

### Get and save each item's description

`vcs.item_page.py`

Note: with 35 brands, 9 pages per brand and about 48 items per page, running the `DescriptionScraper` process on every product is very time-consuming. For this reason, the process is run on a single page number, `page_no`, across all brands at a time, so that the data retrieved at each run stays diverse.
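To give a sense of scale, the sketch below compares the number of item pages to visit when scraping everything with the number visited in one run over a single `page_no`. The brand, page and item counts come from the note; `SECONDS_PER_ITEM` is a purely illustrative assumption, not a measured value.

```python
N_BRANDS = 35
PAGES_PER_BRAND = 9
ITEMS_PER_PAGE = 48  # approximate, per the note

# Item pages to visit in each scenario.
items_all_pages = N_BRANDS * PAGES_PER_BRAND * ITEMS_PER_PAGE
items_one_page = N_BRANDS * ITEMS_PER_PAGE

# Assumed average time per item page (request + parsing); illustration only.
SECONDS_PER_ITEM = 2.0


def estimated_hours(n_items: int) -> float:
    """Rough duration of a run under the SECONDS_PER_ITEM assumption."""
    return n_items * SECONDS_PER_ITEM / 3600


print(f"All pages: {items_all_pages:,} items, ~{estimated_hours(items_all_pages):.1f} h")
print(f"One page:  {items_one_page:,} items, ~{estimated_hours(items_one_page):.1f} h")
```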

In [5]:
brands_to_collect = vcs.get_brands()[31:]
page_no = 6

In [6]:
vcs.save_items_desc_for_all_brands(
    brands=brands_to_collect, 
    page_no=page_no
)

Collecting items' description for tory-burch...
Collected data saved at ./backup/brands/tory-burch/items/items_desc_p6.json.
****************************************************************************************************
Collecting items' description for valentino-garavani...
Collected data saved at ./backup/brands/valentino-garavani/items/items_desc_p6.json.
****************************************************************************************************
Collecting items' description for versace...
Collected data saved at ./backup/brands/versace/items/items_desc_p6.json.
****************************************************************************************************
Collecting items' description for yves-saint-laurent...
Collected data saved at ./backup/brands/yves-saint-laurent/items/items_desc_p6.json.
****************************************************************************************************
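The log above suggests one file per brand and page, laid out as `./backup/brands/<brand>/items/items_desc_p<page_no>.json`. Here is a small sketch of how such a path could be built with `pathlib`. `BACKUP_DIR` and `items_desc_path` are hypothetical names, not part of `vc_scraping`.

```python
from pathlib import Path

# Root of the per-brand backups, as it appears in the log output.
BACKUP_DIR = Path("./backup/brands")


def items_desc_path(brand: str, page_no: int) -> Path:
    """Build the path of the descriptions file for one brand and one page."""
    return BACKUP_DIR / brand / "items" / f"items_desc_p{page_no}.json"


path = items_desc_path("tory-burch", 6)
# pathlib normalizes away the leading "./".
print(path.as_posix())
```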


### Save cleaned files with items' attributes 

`vcs.item.py`

In [6]:
for brand in vcs.get_brands(): 
    for page_no in range(7): 
        print(f"Processing {brand}...")
        parser = vcs.DescriptionParser(brand, page_no)
        items = parser.to_ItemAttrs()
        parser.save(items)
        print("*" * 50)

Processing acne-studios...
File saved at C:/Users/pemma/OneDrive - Université de Tours/Mécen/M2/S1/02 - Machine Learning/05 - Projet/ML_Vestiaire_Collective/backup/brands/acne-studios/items/items_attrs_p0.json.
**************************************************
Processing acne-studios...
File saved at C:/Users/pemma/OneDrive - Université de Tours/Mécen/M2/S1/02 - Machine Learning/05 - Projet/ML_Vestiaire_Collective/backup/brands/acne-studios/items/items_attrs_p1.json.
**************************************************
Processing acne-studios...
File saved at C:/Users/pemma/OneDrive - Université de Tours/Mécen/M2/S1/02 - Machine Learning/05 - Projet/ML_Vestiaire_Collective/backup/brands/acne-studios/items/items_attrs_p2.json.
**************************************************
Processing acne-studios...
File saved at C:/Users/pemma/OneDrive - Université de Tours/Mécen/M2/S1/02 - Machine Learning/05 - Projet/ML_Vestiaire_Collective/backup/brands/acne-studios/items/items_attrs_p3.json.
***