## Clothing Similarity Search

**Objective:**
- The goal of this project is to create a machine learning model capable of receiving text describing a clothing item and returning a ranked list of links to similar items from different websites.

- Your solution must be a function deployed on Google Cloud that accepts a text string and returns a JSON response with ranked suggestions.
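A minimal sketch of the response contract the deployed function could return. The JSON schema (`query`, `results`, `rank`, `score`) and the names `build_response`, `Ranker`, and `dummy_ranker` are assumptions for illustration, not prescribed by the task. A deployed HTTP Cloud Function would parse the incoming text, call something like this, and send the resulting string back as its response body.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical ranker signature: query text -> list of (name, url, score) tuples.
Ranker = Callable[[str], List[Tuple[str, str, float]]]


def build_response(query: str, rank_fn: Ranker, top_n: int = 5) -> str:
    """Return a JSON string with the top_n suggestions, best score first."""
    if not query or not query.strip():
        return json.dumps({"query": query, "results": [], "error": "empty query"})
    ranked = sorted(rank_fn(query), key=lambda item: item[2], reverse=True)[:top_n]
    results = [
        {"rank": position, "name": name, "url": link, "score": round(score, 4)}
        for position, (name, link, score) in enumerate(ranked, start=1)
    ]
    return json.dumps({"query": query, "results": results})


# Usage with a stand-in ranker (a real one would score the scraped catalogue).
def dummy_ranker(query: str) -> List[Tuple[str, str, float]]:
    return [("ASOS DESIGN oversized t-shirt in green open mesh", "https://example.com/a", 0.42),
            ("ASOS DESIGN oversized stripe t-shirt in light blue", "https://example.com/b", 0.87)]


print(build_response("blue striped oversized t-shirt", dummy_ranker))
```

Keeping the ranking logic behind a plain callable means the same handler works whether scores come from word2vec, Sentence Transformers, or anything else.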

**Outline:**
- We will collect product data from **ASOS** by web scraping.

> ASOS is a well-known British fashion and beauty retailer that operates globally and ships to more than 195 countries. Founded in London in 2000, ASOS sells its own label, ASOS Design, as well as more than 850 other brands, much like an outlet retailer.

- Save the data in Excel format.

- Perform text pre-processing.

- Implement feature extraction techniques such as word2vec and Sentence Transformers.

- Compute cosine similarity to find the top-N most similar products.
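The similarity and ranking steps can be sketched end to end with the standard library. This is a hedged stand-in: plain bag-of-words count vectors replace the word2vec or Sentence Transformers embeddings named above, but once every product is a vector, cosine similarity, $\cos(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$, and top-N ranking work the same way. The catalogue strings are illustrative, and `tokenize`, `vectorize`, `cosine`, and `top_n_similar` are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters and digits only.
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text: str) -> Counter:
    # Sparse bag-of-words vector: token -> count (stand-in for a learned embedding).
    return Counter(tokenize(text))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    dot = sum(count * b.get(tok, 0) for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_n_similar(query: str, catalogue: List[str], n: int = 3) -> List[Tuple[str, float]]:
    q = vectorize(query)
    scored = [(item, cosine(q, vectorize(item))) for item in catalogue]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


catalogue = [
    "ASOS DESIGN oversized stripe t-shirt in light blue with Chicago chest print",
    "ASOS DESIGN retro square sunglasses in black",
    "ASOS DESIGN oversized t-shirt in green open mesh",
]
for name, score in top_n_similar("light blue striped oversized t-shirt", catalogue):
    print(f"{score:.3f}  {name}")
```

Swapping `vectorize` for a dense embedding model changes only how vectors are produced. Note that "striped" and "stripe" do not match here, which is one reason the outline calls for pre-processing and semantic embeddings.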


Step 1 - Imports

In [None]:
# Importing Libraries
import requests
import pandas as pd 

Step 2 - Send the request (headers copied from the browser as cURL)

In [None]:
# Headers copied from the browser's "Copy as cURL" and cleaned of Windows ^ escapes.
# The truncated sec-ch-ua header and the captured session cookies were dropped: cookies
# expire quickly. If the API rejects requests, copy fresh headers (including 'cookie')
# from your browser's network tab.
headers = {
    'authority': 'www.asos.com',
    'asos-c-plat': 'web',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
    'asos-c-name': 'asos-web-product-listing-page',
    'accept': 'application/json, text/plain, */*',
    'sec-ch-ua-mobile': '?0',
    'asos-c-ver': '1.1.1-a8be885fdf22-3622',
    'asos-cid': '063402f0-502c-4c6b-ab3c-7b66f32fec44',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.asos.com/men/sale/ctas/sale-edit-6/cat/?cid=28233&nlid=mw%7Csale%7Cshop%20sale%20by%20product%7Cbest%20of%20sale&page=3',
    'accept-language': 'en-US,en;q=0.9,de-DE;q=0.8,de;q=0.7',
}

params = (
    ('channel', 'mobile-web'),
    ('country', 'GB'),
    ('currency', 'GBP'),
    ('keyStoreDataversion', 'hnm9sjt-28'),
    ('lang', 'en-GB'),
    ('limit', '72'),
    ('offset', '0'),
    ('rowlength', '2'),
    ('store', 'COM'),
)

response = requests.get('https://www.asos.com/api/product/search/v2/categories/28233', headers=headers, params=params, timeout=30)
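`requests` encodes `params` into the query string for you, and after the call `response.url` holds the final URL. To preview the URL before sending anything, for example when comparing against the browser's network tab, the same encoding can be reproduced with the standard library. This sketch uses a subset of the parameters above under the hypothetical names `BASE_URL` and `example_params`, so it does not overwrite the real `params`.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.asos.com/api/product/search/v2/categories/28233"

example_params = (
    ("channel", "mobile-web"),
    ("country", "GB"),
    ("currency", "GBP"),
    ("lang", "en-GB"),
    ("limit", "72"),
    ("offset", "0"),
)

# urlencode accepts a sequence of (key, value) pairs and keeps their order.
full_url = f"{BASE_URL}?{urlencode(example_params)}"
print(full_url)
```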

Step 3 - Check Status Code

In [None]:
response

<Response [200]>
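A `200` means the request succeeded. In a longer scrape, rate limiting (`429`) or transient server errors (`5xx`) may appear, so retrying with exponential backoff can help. This is a minimal sketch that assumes a zero-argument callable returning an object with a `status_code` attribute, as `requests.Response` has. `fetch_with_retry`, `RETRYABLE`, and the parameter names are hypothetical, and `sleep` is injectable so the logic can be tested without waiting.

```python
import time
from typing import Any, Callable

# Status codes treated as transient (assumption: adjust to what the site actually returns).
RETRYABLE = {429, 500, 502, 503, 504}


def fetch_with_retry(fetch: Callable[[], Any], max_attempts: int = 4, base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call fetch() until the status is not retryable or attempts run out; return the last response."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        response = fetch()
        if response.status_code not in RETRYABLE or attempt == max_attempts:
            return response
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))


# Usage in the notebook (headers/params as defined in Step 2):
# response = fetch_with_retry(lambda: requests.get(
#     'https://www.asos.com/api/product/search/v2/categories/28233',
#     headers=headers, params=params, timeout=30))
```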

Step 4 - Parse the JSON response

In [None]:
# Parse the response body into a Python dict
json_object = response.json()
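`response.json()` raises an exception if the body is not valid JSON. This can happen if the site serves an HTML error or bot-check page instead of API data. The following hedged sketch shows a defensive parse over the raw text, for example `parse_products(response.text)`. It returns `None` instead of crashing when the body is unusable. `parse_products` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List, Optional


def parse_products(body: str) -> Optional[List[Dict[str, Any]]]:
    """Return the 'products' list from a JSON body, or None if the body is not usable."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON at all, e.g. an HTML page served instead of the API response.
        return None
    if not isinstance(payload, dict) or not isinstance(payload.get("products"), list):
        return None
    return payload["products"]


print(parse_products('{"itemCount": 1, "products": [{"id": 1}]}'))
print(parse_products("<html>Access Denied</html>"))
```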

In [None]:
json_object

{'searchTerm': '',
 'categoryName': 'Sale: Selling fast',
 'itemCount': 1180,
 'redirectUrl': '',
 'products': [{'id': 202362881,
   'name': 'ASOS DESIGN oversized stripe t-shirt in light blue with Chicago chest print',
   'price': {'current': {'value': 7.5, 'text': '£7.50'},
    'previous': {'value': 18.0, 'text': '£18.00'},
    'rrp': {'value': None, 'text': ''},
    'isMarkedDown': True,
    'isOutletPrice': False,
    'currency': 'GBP'},
   'colour': 'LIGHT BLUE',
   'colourWayId': 202362882,
   'brandName': 'ASOS DESIGN',
   'hasVariantColours': False,
   'hasMultiplePrices': False,
   'groupId': None,
   'productCode': 116462175,
   'productType': 'Product',
   'url': 'asos-design/asos-design-oversized-stripe-t-shirt-in-light-blue-with-chicago-chest-print/prd/202362881?clr=light-blue&colourWayId=202362882',
   'imageUrl': 'images.asos-media.com/products/asos-design-oversized-stripe-t-shirt-in-light-blue-with-chicago-chest-print/202362881-1-lightblue',
   'additionalImageUrls': ['

Step 5 - Inspect the top-level keys

In [None]:
json_object.keys()

dict_keys(['searchTerm', 'categoryName', 'itemCount', 'redirectUrl', 'products', 'facets', 'diagnostics', 'searchPassMeta', 'queryId', 'discoverSearchProductTypes', 'campaigns'])
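Only `products` matters for this project, but the values inside each product are nested (for example `price` → `current` → `value`). A small helper can walk a dotted path and fall back to a default, which avoids a `KeyError` when a field is missing. `get_path` is a hypothetical name, and the sample dict mirrors part of the structure printed in Step 4.

```python
from typing import Any


def get_path(obj: Any, path: str, default: Any = None) -> Any:
    """Follow a dotted path like 'price.current.value' through nested dicts."""
    current = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


product = {
    "name": "ASOS DESIGN oversized stripe t-shirt in light blue with Chicago chest print",
    "price": {"current": {"value": 7.5, "text": "£7.50"},
              "rrp": {"value": None, "text": ""}},
}
print(get_path(product, "price.current.value"))                  # 7.5
print(get_path(product, "price.previous.value", default="n/a"))  # n/a
```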

Step 6 - Locate the product data

In [None]:
json_object['products']

[{'id': 202362881,
  'name': 'ASOS DESIGN oversized stripe t-shirt in light blue with Chicago chest print',
  'price': {'current': {'value': 7.5, 'text': '£7.50'},
   'previous': {'value': 18.0, 'text': '£18.00'},
   'rrp': {'value': None, 'text': ''},
   'isMarkedDown': True,
   'isOutletPrice': False,
   'currency': 'GBP'},
  'colour': 'LIGHT BLUE',
  'colourWayId': 202362882,
  'brandName': 'ASOS DESIGN',
  'hasVariantColours': False,
  'hasMultiplePrices': False,
  'groupId': None,
  'productCode': 116462175,
  'productType': 'Product',
  'url': 'asos-design/asos-design-oversized-stripe-t-shirt-in-light-blue-with-chicago-chest-print/prd/202362881?clr=light-blue&colourWayId=202362882',
  'imageUrl': 'images.asos-media.com/products/asos-design-oversized-stripe-t-shirt-in-light-blue-with-chicago-chest-print/202362881-1-lightblue',
  'additionalImageUrls': ['images.asos-media.com/products/asos-design-oversized-stripe-t-shirt-in-light-blue-with-chicago-chest-print/202362881-2',
   'imag

In [None]:
# Starting point: the list of product dicts
result_items = json_object['products']

In [None]:
# name
result_items[0]['name']

'ASOS DESIGN oversized stripe t-shirt in light blue with Chicago chest print'

In [None]:
# brand name
result_items[0]['brandName']

'ASOS DESIGN'
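The `url` and `imageUrl` fields are not full links. As the output in Step 4 shows, `url` is a path relative to the site and `imageUrl` has no scheme. Because the project's goal is to return links, it helps to normalize both. This sketch assumes that product pages live under `https://www.asos.com/` and that images are served over HTTPS. Verify both against the live site. `product_links` and `SITE_ROOT` are hypothetical names.

```python
from urllib.parse import urljoin

SITE_ROOT = "https://www.asos.com/"  # assumption: base for relative product paths


def product_links(result: dict) -> dict:
    """Return absolute product-page and image URLs for one product dict."""
    page = urljoin(SITE_ROOT, result.get("url", ""))
    image = result.get("imageUrl", "")
    if image and not image.startswith(("http://", "https://")):
        image = "https://" + image  # assumption: images are served over HTTPS
    return {"page_url": page, "image_url": image}


sample = {
    "url": "asos-design/asos-design-oversized-stripe-t-shirt-in-light-blue-with-chicago-chest-print/prd/202362881?clr=light-blue&colourWayId=202362882",
    "imageUrl": "images.asos-media.com/products/asos-design-oversized-stripe-t-shirt-in-light-blue-with-chicago-chest-print/202362881-1-lightblue",
}
print(product_links(sample))
```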

Step 7 - Put everything together: loop through the results and append each field to a list

In [None]:
name = []
brand = []
color = []
url = []

for result in result_items:
    
    # name
    name.append(result['name'])
    
    # brand
    brand.append(result['brandName'])
    
    # color
    color.append(result['colour'])
    
    # image URL (the relative product page path is in result['url'])
    url.append(result['imageUrl'])
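Appending to four parallel lists works, but if one product lacks a key the loop stops with a `KeyError`. An alternative is to build one dict per product with `.get()` defaults, which keeps each row's fields together; `pd.DataFrame(records)` accepts such a list directly. The sample data below is illustrative. `to_record` is a hypothetical helper, and the `'No colour'` default is an assumption borrowed from the value seen in the Step 8 output.

```python
from typing import Any, Dict, List


def to_record(result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one product dict into a row; missing fields get defaults instead of raising."""
    return {
        "Name": result.get("name", ""),
        "Brand": result.get("brandName", ""),
        "Colour": result.get("colour") or "No colour",  # assumption: mirror the site's placeholder
        "url": result.get("imageUrl", ""),
    }


sample_items: List[Dict[str, Any]] = [
    {"name": "ASOS DESIGN oversized t-shirt in green open mesh", "brandName": "ASOS DESIGN",
     "colour": "GREEN", "imageUrl": "images.asos-media.com/products/example-1"},
    {"name": "Vans Old Skool trainers in flax suede", "brandName": "Vans"},  # no colour or imageUrl
]

records = [to_record(r) for r in sample_items]
for row in records:
    print(row)
# In the notebook: df_asos = pd.DataFrame(records)
```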
    

Step 8 - Build a pandas DataFrame

In [None]:
df_asos = pd.DataFrame({'Name': name, 'Brand': brand, 'Colour': color, 'url':url})
df_asos

|     | Name | Brand | Colour | url |
|-----|------|-------|--------|-----|
| 0 | ASOS DESIGN oversized stripe t-shirt in light ... | ASOS DESIGN | LIGHT BLUE | images.asos-media.com/products/asos-design-ove... |
| 1 | Revolution Skincare Blemish 2% Salicylic Acid ... | Revolution Skincare | No colour | images.asos-media.com/products/revolution-skin... |
| 2 | ASOS DESIGN retro square sunglasses in black p... | ASOS DESIGN | BLACK | images.asos-media.com/products/asos-design-ret... |
| 3 | ASOS DESIGN 90s mini oval glasses in black wit... | ASOS DESIGN | Black | images.asos-media.com/products/asos-design-90s... |
| 4 | ASOS DESIGN relaxed shirt in aztec multi colou... | ASOS DESIGN | MULTI | images.asos-media.com/products/asos-design-rel... |
| ... | ... | ... | ... | ... |
| 67 | Faded Future industrial charm necklace in silver | FADED FUTURE | SILVER | images.asos-media.com/products/faded-future-in... |
| 68 | ASOS DESIGN lace up boot in tan leather with s... | ASOS DESIGN | TAN | images.asos-media.com/products/asos-design-lac... |
| 69 | Vans Old Skool trainers in flax suede Exclusiv... | Vans | BEIGE | images.asos-media.com/products/vans-old-skool-... |
| 70 | ASOS DESIGN oversized t-shirt in green open mesh | ASOS DESIGN | GREEN | images.asos-media.com/products/asos-design-ove... |


Step 9 - Scrape multiple pages by changing the offset value

In [None]:
import time

name = []
brand = []
color = []
url = []

# Reuse the `headers` dict from Step 2; only the offset changes between requests.
# itemCount from the first response (1180 here) bounds the offsets; 72 items per page.
total_items = json_object['itemCount']

for offset in range(0, total_items, 72):

    params = (
        ('channel', 'mobile-web'),
        ('country', 'GB'),
        ('currency', 'GBP'),
        ('keyStoreDataversion', 'hnm9sjt-28'),
        ('lang', 'en-GB'),
        ('limit', '72'),
        ('offset', str(offset)),
        ('rowlength', '2'),
        ('store', 'COM'),
    )

    response = requests.get('https://www.asos.com/api/product/search/v2/categories/28233', headers=headers, params=params, timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of failing later inside .json()

    # parse the JSON and take the list of product dicts
    result_items = response.json()['products']

    for result in result_items:

        # name
        name.append(result['name'])

        # brand
        brand.append(result['brandName'])

        # colour
        color.append(result['colour'])

        # image URL
        url.append(result['imageUrl'])

    # pause between requests to avoid hammering the server
    time.sleep(1)
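Offset pagination requests `limit` items starting at `offset`. The offsets needed are therefore `0, limit, 2*limit, ...`, stopping below the total item count. The number of requests is $\lceil \text{total} / \text{limit} \rceil$. With the `itemCount` of 1180 shown in Step 4 and a limit of 72, that gives 17 requests, and the last one starts at offset 1152. The following is a small worked check; `page_offsets` is a hypothetical helper.

```python
import math
from typing import List


def page_offsets(total: int, limit: int) -> List[int]:
    """Offsets that cover items 0..total-1 in pages of `limit`."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(range(0, total, limit))


offsets = page_offsets(1180, 72)
print(len(offsets), offsets[-1])             # 17 1152
print(math.ceil(1180 / 72) == len(offsets))  # True
```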

In [None]:
df_asos = pd.DataFrame({'Name': name, 'Brand': brand, 'Colour': color, 'url':url})
df_asos

|     | Name | Brand | Colour | url |
|-----|------|-------|--------|-----|
| 0 | ASOS DESIGN oversized stripe t-shirt in light ... | ASOS DESIGN | LIGHT BLUE | images.asos-media.com/products/asos-design-ove... |
| 1 | Revolution Skincare Blemish 2% Salicylic Acid ... | Revolution Skincare | No colour | images.asos-media.com/products/revolution-skin... |
| 2 | ASOS DESIGN retro square sunglasses in black p... | ASOS DESIGN | BLACK | images.asos-media.com/products/asos-design-ret... |
| 3 | ASOS DESIGN 90s mini oval glasses in black wit... | ASOS DESIGN | Black | images.asos-media.com/products/asos-design-90s... |
| 4 | ASOS DESIGN relaxed shirt in aztec multi colou... | ASOS DESIGN | MULTI | images.asos-media.com/products/asos-design-rel... |
| ... | ... | ... | ... | ... |
| 1175 | ASOS DESIGN relaxed crop t-shirt in beige with... | ASOS DESIGN | HONEY MUSTARD | images.asos-media.com/products/asos-design-rel... |
| 1176 | ASOS DESIGN slim suit in navy | ASOS DESIGN | Navy | images.asos-media.com/groups/asos-design-slim-... |
| 1177 | adidas Originals Ozweego trainers in white and... | adidas Originals | WHITE | images.asos-media.com/products/adidas-original... |
| 1178 | Walk London sean bar chunky loafers in black l... | WALK LONDON | Black | images.asos-media.com/products/walk-london-sea... |
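The combined table has 1,179 rows (indices 0 to 1178), while the API reported an `itemCount` of 1180. Offset pagination over a live listing can return the same product twice, or skip one, if items are added or sell out between requests. The following hedged sketch de-duplicates rows by a key and keeps the first occurrence. In pandas, `df_asos.drop_duplicates(subset='Name')` does the same on the DataFrame. `dedupe` and the choice of `Name` as the key are assumptions; the product `id` would be a sturdier key if it were collected.

```python
from typing import Any, Dict, Iterable, List


def dedupe(rows: Iterable[Dict[str, Any]], key: str) -> List[Dict[str, Any]]:
    """Keep the first row for each distinct value of `key`, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        value = row.get(key)
        if value in seen:
            continue
        seen.add(value)
        unique.append(row)
    return unique


rows = [
    {"Name": "ASOS DESIGN slim suit in navy", "Colour": "Navy"},
    {"Name": "ASOS DESIGN oversized t-shirt in green open mesh", "Colour": "GREEN"},
    {"Name": "ASOS DESIGN slim suit in navy", "Colour": "Navy"},  # repeated across pages
]
print(len(dedupe(rows, "Name")))  # 2
```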


Step 10 - Store the results in Excel

In [None]:
# Writing .xlsx files requires an Excel writer engine such as openpyxl (pip install openpyxl)
df_asos.to_excel('asos.xlsx', index=False)
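The next step in the outline is text pre-processing of the `Name` column before feature extraction. The following minimal sketch shows one reasonable pipeline: lowercase the name, drop the brand prefix (which repeats the `Brand` column), strip punctuation except hyphens, and remove a small stopword list. The stopword list and the `preprocess` function are assumptions for illustration. A library list, such as the one in NLTK, could replace the hand-picked set.

```python
import re
from typing import List

# Assumption: a tiny hand-picked stopword list; extend it or swap in a library list.
STOPWORDS = {"a", "an", "and", "the", "in", "with", "of", "for", "on", "by"}


def preprocess(name: str, brand: str = "") -> List[str]:
    """Normalize a product name into content tokens."""
    text = name.lower()
    brand = brand.lower().strip()
    if brand and text.startswith(brand + " "):
        text = text[len(brand):]  # the brand is already its own column
    text = re.sub(r"[^a-z0-9\s-]", " ", text)  # keep letters, digits, whitespace, hyphens
    return [tok for tok in text.split() if tok not in STOPWORDS]


print(preprocess(
    "ASOS DESIGN oversized stripe t-shirt in light blue with Chicago chest print",
    brand="ASOS DESIGN",
))
# In the notebook (hypothetical column name):
# df_asos['tokens'] = [preprocess(n, b) for n, b in zip(df_asos['Name'], df_asos['Brand'])]
```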