# [Vestiaire Collective](https://fr.vestiairecollective.com/) Scraping

### Set up

```python
from typing import List

import VCScraping as vcs
```

### Scraping Vestiaire Collective's home page

`vcs.home_page.py`

```python
scraper = vcs.HomePageScraper()
```

```python
vcs.save_json(
    data=scraper.home_page,
    file_name="./backup/home_page.json",
)
```

### Collect interesting brands

`vcs.home_page.py`

Note: the 35 selected brands come from the "editor's pick" on the Vestiaire Collective website.

```python
brands_links = scraper.get_brands()
```

```python
vcs.save_json(
    data=brands_links,
    file_name="./backup/brands_links.json",
)
```

### Get the catalog of items per page for each brand

`vcs.brands.py`

Note: only the first ten pages are retrieved for each brand, i.e. 480 items per brand and 16,800 items in total.
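The totals in the note follow from a page size of 48 items (480 items over ten pages). A quick check of the arithmetic, using only the figures stated in the note:

```python
# Figures from the note: 35 brands, 10 pages per brand.
# ITEMS_PER_PAGE is implied by "480 items per brand" over 10 pages.
N_BRANDS = 35
PAGES_PER_BRAND = 10
ITEMS_PER_PAGE = 48

items_per_brand = PAGES_PER_BRAND * ITEMS_PER_PAGE
total_items = N_BRANDS * items_per_brand

print(f"{items_per_brand} items per brand")  # 480 items per brand
print(f"{total_items:,} items in total")     # 16,800 items in total
```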

```python
brands_links = vcs.load_json(
    file_name="./backup/brands_links.json",
    data_type=List[vcs.BrandLink],
)
```
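The bodies of `vcs.save_json` and `vcs.load_json` are not shown in this notebook. The sketch below is one plausible way a typed JSON round trip such as `load_json(..., data_type=List[vcs.BrandLink])` could work; the `BrandLink` fields (`name`, `url`) and both function bodies are assumptions for illustration, not the library's actual code.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from pathlib import Path
from typing import Any, List, get_args, get_origin


@dataclass
class BrandLink:
    # Hypothetical fields: the real vcs.BrandLink may differ.
    name: str
    url: str


def save_json(data: Any, file_name: str) -> None:
    """Serialise a dataclass, a list of dataclasses or plain data to JSON."""
    if isinstance(data, list):
        payload = [asdict(d) if is_dataclass(d) else d for d in data]
    elif is_dataclass(data):
        payload = asdict(data)
    else:
        payload = data
    path = Path(file_name)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create ./backup/
    path.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")


def load_json(file_name: str, data_type: Any = None) -> Any:
    """Read JSON; rebuild List[SomeDataclass] when data_type asks for it."""
    raw = json.loads(Path(file_name).read_text(encoding="utf-8"))
    if get_origin(data_type) is list:
        (item_type,) = get_args(data_type)  # List[BrandLink] -> BrandLink
        if is_dataclass(item_type):
            return [item_type(**item) for item in raw]
    return raw
```

Passing the target type explicitly keeps the JSON files plain (lists of dicts) while still giving typed objects back after loading.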

```python
vcs.save_all_brands_pages(brands=brands_links)
```

```text
Data collection for acne-studios.
./backup/brands/acne-studios already exists.
Data has been added to ./backup/brands/acne-studios/.
**************************************************
Data collection for alexander-mcqueen.
./backup/brands/alexander-mcqueen already exists.
Data has been added to ./backup/brands/alexander-mcqueen/.
**************************************************
Data collection for alexander-wang.
./backup/brands/alexander-wang already exists.
Data has been added to ./backup/brands/alexander-wang/.
**************************************************
Data collection for balenciaga.
./backup/brands/balenciaga already exists.
Data has been added to ./backup/brands/balenciaga/.
**************************************************
Data collection for balmain.
./backup/brands/balmain already exists.
Data has been added to ./backup/brands/balmain/.
**************************************************
Data collection for bottega-veneta.

NoSuchWindowException: Message: no such window: target window already closed
from unknown error: web view not found
  (Session info: chrome=95.0.4638.69)
```
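The collection loop stopped at bottega-veneta with a `NoSuchWindowException` (the Chrome window had closed), so the remaining brands were not processed in that run. A minimal sketch of a per-brand guard, assuming a hypothetical `collect_one(brand)` callable that wraps the collection for a single brand (it is not part of `vcs`): a failure is recorded and the loop moves on to the next brand.

```python
from typing import Callable, Dict, Iterable


def collect_all(
    brands: Iterable[str],
    collect_one: Callable[[str], None],
) -> Dict[str, Exception]:
    """Run collect_one for each brand and return the brands that failed.

    An exception raised for one brand (e.g. a closed browser window) is
    printed and recorded instead of aborting the remaining brands.
    """
    failures: Dict[str, Exception] = {}
    for brand in brands:
        print(f"Data collection for {brand}.")
        try:
            collect_one(brand)
        except Exception as exc:  # broad on purpose: keep the batch running
            print(f"Failed for {brand}: {exc!r}")
            failures[brand] = exc
        print("*" * 50)
    return failures
```

The returned dictionary can be fed back in (`collect_all(list(failures), collect_one)`) to retry only the failed brands. In practice the `except` clause could be narrowed to Selenium's `NoSuchWindowException` from `selenium.common.exceptions`.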


### Get and save basic information on the collected items

`vcs.brand_page.py`

```python
brands_files_paths = vcs.get_brands_files_paths()
```
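The structure returned by `vcs.get_brands_files_paths()` is not shown here. As a hypothetical illustration based only on the folder layout visible in the logs (`./backup/brands/<brand>/`), such a helper could map each brand folder to the JSON files saved directly inside it; the return type and the `*.json` filter are assumptions.

```python
from pathlib import Path
from typing import Dict, List


def get_brands_files_paths(root: str = "./backup/brands") -> Dict[str, List[Path]]:
    """Map each brand folder name to the JSON files saved directly inside it.

    Hypothetical re-implementation: the real vcs function may return a
    different structure. Sub-folders such as items/ are not descended into.
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    return {
        brand_dir.name: sorted(brand_dir.glob("*.json"))
        for brand_dir in sorted(base.iterdir())
        if brand_dir.is_dir()
    }
```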

```python
vcs.save_all_basic_items(brands_files_paths)
```

```text
Processing acne-studios...
./backup/brands/acne-studios/items already exists.
acne-studios processed.
******************************
Processing alexander-mcqueen...
./backup/brands/alexander-mcqueen/items already exists.
alexander-mcqueen processed.
******************************
Processing alexander-wang...
./backup/brands/alexander-wang/items already exists.
alexander-wang processed.
******************************
Processing balenciaga...
./backup/brands/balenciaga/items already exists.
balenciaga processed.
******************************
Processing balmain...
./backup/brands/balmain/items already exists.
balmain processed.
******************************
Processing bottega-veneta...
./backup/brands/bottega-veneta/items already exists.
bottega-veneta processed.
******************************
Processing burberry...
./backup/brands/burberry/items already exists.
burberry processed.
******************************
Processing celine...
./backup/brands/celine/items already exists.
celine pr
```

### Get and save each item's description

`vcs.item_page.py`

Note: with 35 brands, 9 pages per brand and about 48 items per page, running the `DescriptionScraper` process on every product would be very time-consuming. The process is therefore applied to a single page, `page_no`, for every brand, which still yields a diverse set of items.

```python
# Brands from index 31 onward, i.e. the last four of the 35 brands.
brands_to_collect = vcs.get_brands()[31:]
page_no = 6
```
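Here `brands_to_collect` is chosen with a hard-coded slice. A hypothetical alternative derives it from what is already on disk, assuming the file naming seen in the logs of this section (`./backup/brands/<brand>/items/items_desc_p<page_no>.json`); `desc_file` and `brands_missing_page` are illustrative helpers, not part of `vcs`.

```python
from pathlib import Path
from typing import List


def desc_file(brand: str, page_no: int, root: str = "./backup/brands") -> Path:
    """Expected location of the description file for one brand and page.

    Naming assumed from the logs: <root>/<brand>/items/items_desc_p<page_no>.json
    """
    return Path(root) / brand / "items" / f"items_desc_p{page_no}.json"


def brands_missing_page(brands: List[str], page_no: int, root: str = "./backup/brands") -> List[str]:
    """Keep only the brands whose description file for page_no is not saved yet."""
    return [b for b in brands if not desc_file(b, page_no, root).exists()]
```

Assuming `vcs.get_brands()` returns brand slugs (as the processing loops print them), the cell could then read `brands_to_collect = brands_missing_page(vcs.get_brands(), page_no)`, so an interrupted run resumes where it stopped.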

```python
vcs.save_items_desc_for_all_brands(
    brands=brands_to_collect,
    page_no=page_no,
)
```

```text
Collecting items' description for tory-burch...
Collected data saved at ./backup/brands/tory-burch/items/items_desc_p6.json.
****************************************************************************************************
Collecting items' description for valentino-garavani...
Collected data saved at ./backup/brands/valentino-garavani/items/items_desc_p6.json.
****************************************************************************************************
Collecting items' description for versace...
Collected data saved at ./backup/brands/versace/items/items_desc_p6.json.
****************************************************************************************************
Collecting items' description for yves-saint-laurent...
Collected data saved at ./backup/brands/yves-saint-laurent/items/items_desc_p6.json.
****************************************************************************************************
```


### Save cleaned files with items' attributes 

`vcs.item.py`
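The internals of `vcs.DescriptionParser` and `vcs.ItemAttrs` are not shown. The toy stand-in below only mirrors the call pattern used in the loop of this section (`to_ItemAttrs()` then `save(items)`) and the file names from the logs (`items_desc_p<page_no>.json` in, `items_attrs_p<page_no>.json` out). The raw keys (`id`, `title`, `price`) and the `ItemAttrs` fields are hypothetical, and prices are assumed to be numbers or numeric strings.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ItemAttrs:
    # Hypothetical attribute set; the real vcs.ItemAttrs fields are not shown.
    item_id: str
    brand: str
    title: str
    price: Optional[float]


class DescriptionParserSketch:
    """Toy stand-in for vcs.DescriptionParser (same call pattern, assumed internals)."""

    def __init__(self, brand: str, page_no: int, root: str = "./backup/brands") -> None:
        self.brand = brand
        self.page_no = page_no
        self.items_dir = Path(root) / brand / "items"

    def to_ItemAttrs(self) -> List[ItemAttrs]:
        src = self.items_dir / f"items_desc_p{self.page_no}.json"
        raw = json.loads(src.read_text(encoding="utf-8"))
        items: List[ItemAttrs] = []
        for entry in raw:
            if "id" not in entry:  # skip records that cannot be identified
                continue
            price = entry.get("price")
            items.append(ItemAttrs(
                item_id=str(entry["id"]),
                brand=self.brand,
                title=str(entry.get("title") or "").strip(),
                price=float(price) if price is not None else None,
            ))
        return items

    def save(self, items: List[ItemAttrs]) -> Path:
        dst = self.items_dir / f"items_attrs_p{self.page_no}.json"
        payload = [asdict(item) for item in items]
        dst.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
        print(f"File saved at {dst}.")
        return dst
```

Converting raw dictionaries into a fixed dataclass makes the cleaned files share one schema across all brands, which simplifies loading them into a single table later.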

```python
page_no = 6
```

```python
for brand in vcs.get_brands():
    print(f"Processing {brand}...")
    parser = vcs.DescriptionParser(brand, page_no)
    items = parser.to_ItemAttrs()
    parser.save(items)
    print("*" * 50)
```

```text
Processing acne-studios...
File saved at ./backup/brands/acne-studios/items/items_attrs_p6.json.
**************************************************
Processing alexander-mcqueen...
File saved at ./backup/brands/alexander-mcqueen/items/items_attrs_p6.json.
**************************************************
Processing alexander-wang...
File saved at ./backup/brands/alexander-wang/items/items_attrs_p6.json.
**************************************************
Processing balenciaga...
File saved at ./backup/brands/balenciaga/items/items_attrs_p6.json.
**************************************************
Processing balmain...
File saved at ./backup/brands/balmain/items/items_attrs_p6.json.
**************************************************
Processing bottega-veneta...
File saved at ./backup/brands/bottega-veneta/items/items_attrs_p6.json.
**************************************************
Processing burberry...
File saved at ./backup/brands/burberry/items/items_attrs_p6.json.
*******************
```