# **Google Vision Product Search:** 
## Build a Product catalog recognition engine in one hour

Vision API Product Search allows retailers to create products, each containing reference images that visually describe the product from a set of viewpoints. Retailers can then add these products to product sets. Currently, Vision API Product Search supports the following product categories: homegoods, apparel, toys, packaged goods, and general.

When users query the product set with their own images, Vision API Product Search applies machine learning to compare the product in the user's query image with the images in the retailer's product set, and then returns a ranked list of visually and semantically similar results.

After loading your catalog into Vision Product Search, you'll be able to search it for similar products by providing an image.

> This notebook uses a Kaggle dataset for product recognition. The goal is to extract a CSV file for bulk import into Vision Product Search.  

**Useful links:**  
https://github.com/zinjiggle/google-product-search-simple-ui  
https://github.com/GoogleCloudPlatform/python-docs-samples/tree/master/vision/cloud-client/product_search  
https://github.com/googleapis/python-vision/tree/main/samples

## 0. Install vision library

In [1]:
pip install google-cloud-vision

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install --upgrade google-cloud-storage

Collecting google-cloud-storage
  Downloading google_cloud_storage-2.5.0-py2.py3-none-any.whl (106 kB)
Collecting google-cloud-core<3.0dev,>=2.3.0
  Downloading google_cloud_core-2.3.2-py2.py3-none-any.whl (29 kB)
Collecting google-resumable-media>=2.3.2
  Downloading google_resumable_media-2.4.0-py2.py3-none-any.whl (77 kB)
Installing collected packages: google-resumable-media, google-cloud-core, google-cloud-storage
  Attempting uninstall: google-resumable-media
    Found existing installation: google-resumable-media 1.2.0
    Uninstalling google-resumable-media-1.2.0:
      Successfully uninstalled google-resumable-media-1.2.0
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '_upload.cpython-37.pyc'
Consider using the `--user` option or check the permissions.

Note: you may need to restart the kernel

In [4]:
project_id='pod-fr-retail'
location='europe-west1'
product_set='kaggle_shoes'
bucket_name="pod-fr-retail-kaggle"
gcs_bucket="gs://"+bucket_name+"/"
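
The settings above are the pieces of the resource names that Vision Product Search uses to address a location and a product set. The following is a minimal sketch of how they combine, assuming the `projects/.../locations/.../productSets/...` naming pattern. The helper names `location_path` and `product_set_path` are hypothetical and written here as plain string formatting so the cell runs without credentials; the client library offers its own path helpers.

```python
# Notebook settings, repeated here so the sketch runs on its own.
project_id = "pod-fr-retail"
location = "europe-west1"
product_set = "kaggle_shoes"


def location_path(project: str, location: str) -> str:
    """Resource name of the location that hosts product sets (hypothetical helper)."""
    return f"projects/{project}/locations/{location}"


def product_set_path(project: str, location: str, product_set_id: str) -> str:
    """Resource name of one product set inside a location (hypothetical helper)."""
    return f"{location_path(project, location)}/productSets/{product_set_id}"


print(location_path(project_id, location))
print(product_set_path(project_id, location, product_set))
```

Keeping the three values in one cell means every later call can derive its resource name from the same source.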

## 1. Create a product catalog with BigQuery
The first step prepares a product catalog for **Vision Product Search** in BigQuery. The table follows the column layout of the bulk import CSV:
* **image-uri**: The Google Cloud Storage URI of the reference image.
* **image-id**: Optional. A unique value if you supply it. Otherwise, the system will assign a unique value.
* **product-set-id**: A unique identifier for the product set to import the images into.
* **product-id**: A user-defined ID for the product identified by the reference image. A product-id can be associated with multiple reference images. Note: A single product may also belong to several product sets. If a product-id already exists on bulk import, then product-category, product-display-name, and labels are ignored for that line entry.
* **product-category**: The category for the product identified by the reference image. Allowed values are homegoods-v2, apparel-v2, toys-v2, packagedgoods-v1, and general-v1; they are also listed in the productCategory reference documentation. Inferred by the system if not specified in the create request. Legacy categories (homegoods, apparel, and toys) are still supported, but the updated -v2 categories should be used for new products.
* **product-display-name**: Optional. If you don't provide a name for the product, displayName will be set to " ". You can update this value later.
* **labels**: Optional. A string (with quotation marks) of key-value pairs that describe the products in the reference image. For example: "color=black,style=formal"
* **bounding-poly**: Optional. Specifies the area of interest in the reference image. If a bounding box is not specified, the Vision API infers bounding boxes for the image, and multiple regions in a single image may be indexed if the API detects multiple products. In that case the line must end with a comma. If you include a bounding box, the boundingPoly column should contain an even number of comma-separated numbers, in the format p1_x,p1_y,p2_x,p2_y,...,pn_x,pn_y. An example looks like this: 0.1,0.1,0.9,0.1,0.9,0.9,0.1,0.9.
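
To make the column layout concrete, here is a minimal sketch that assembles one bulk import line with Python's standard `csv` module. The function name `bulk_import_line` and the example bucket path are hypothetical. The sketch also assumes the bounding poly is written as a single quoted column of comma-separated numbers, which is one reading of the column description above and worth checking against the official CSV examples before a real import.

```python
import csv
import io
from typing import Sequence


def bulk_import_line(
    image_uri: str,
    product_set_id: str,
    product_id: str,
    product_category: str,
    image_id: str = "",
    display_name: str = "",
    labels: str = "",
    bounding_poly: Sequence[float] = (),
) -> str:
    """Build one CSV line in the column order listed above (hypothetical helper)."""
    if len(bounding_poly) % 2 != 0:
        raise ValueError("bounding_poly needs an even number of values (x,y pairs)")
    row = [
        image_uri,
        image_id,
        product_set_id,
        product_id,
        product_category,
        display_name,
        labels,
        # Assumption: the polygon is one column; it stays empty when not provided,
        # which leaves the trailing comma the format asks for.
        ",".join(str(v) for v in bounding_poly),
    ]
    buffer = io.StringIO()
    # The csv writer quotes any field that contains a comma, such as the labels.
    csv.writer(buffer, lineterminator="").writerow(row)
    return buffer.getvalue()


# Hypothetical reference image, without and with a bounding poly.
print(bulk_import_line("gs://my-bucket/img.jpg", "kaggle_shoes", "123", "apparel-v2",
                       labels="color=black,style=formal"))
print(bulk_import_line("gs://my-bucket/img.jpg", "kaggle_shoes", "123", "apparel-v2",
                       bounding_poly=(0.1, 0.1, 0.9, 0.1, 0.9, 0.9, 0.1, 0.9)))
```

Using the `csv` module instead of manual string joins avoids quoting mistakes when labels contain commas.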

#### 1.1 Product Catalog exploration (Kaggle dataset) 

Let's start by exploring our data. We use a product image catalog from a Kaggle competition whose goal was to classify products automatically from an image. The dataset contains 48 main product categories and around 12 million images. For this quickstart notebook, we'll focus on the shoes category.

In [5]:
%%bigquery
SELECT category_level1, count(*) num_products  
FROM 
    `pod-fr-retail.kaggle.train_images` a 
JOIN `pod-fr-retail.kaggle.category_names`  b 
ON CAST(b.category_id AS STRING) =(REGEXP_EXTRACT(a.path_to_images,r'gs://pod-fr-retail-kaggle/train-images/[0-9]*/([^-]*)'))
GROUP BY 1 ORDER BY 2 desc
LIMIT 10



| | category_level1 | num_products |
|---|---|---|
| 0 | TELEPHONIE - GPS | 1227001 |
| 1 | AUTO - MOTO | 1193619 |
| 2 | INFORMATIQUE | 1124907 |
| 3 | DECO - LINGE - LUMINAIRE | 1111509 |
| 4 | LIBRAIRIE | 863965 |
| 5 | BIJOUX - LUNETTES - MONTRES | 688243 |
| 6 | BRICOLAGE - OUTILLAGE - QUINCAILLERIE | 620366 |
| 7 | JEUX - JOUETS | 551408 |
| 8 | SPORT | 434791 |
| 9 | BAGAGERIE | 434675 |


Next, list the second-level categories within the shoes category (CHAUSSURES):

In [38]:
%%bigquery
SELECT category_level2
from `pod-fr-retail.kaggle.category_names` 
WHERE category_level1 like 'CHAUSSURES%'
GROUP BY 1



| | category_level2 |
|---|---|
| 0 | BOTTES - BOTTINES |
| 1 | CHAUSSURES DETENTE |
| 2 | BASKET - SPORTSWEAR |
| 3 | CHAUSSURES DE VILLE |
| 4 | ACCESSOIRES CHAUSSURES |


#### 1.2 Create a table with the appropriate schema from Product Catalog (Kaggle dataset) 

In [None]:
%%bigquery
CREATE OR REPLACE TABLE `pod-fr-retail.kaggle.products_vision_search` AS
SELECT 
    a.* EXCEPT (category_id)
    ,CONCAT(replace(lower(CONCAT('','cl1=',b.category_level1,',cl2=',b.category_level2,',cl3=',b.category_level3,' ')),' ','')) labels
    ,null as poly
FROM (
    SELECT 
        path_to_images image_uri
        ,(REGEXP_EXTRACT(path_to_images,r'gs://pod-fr-retail-kaggle/train-images/[0-9]*/([0-9]*-[0-9]*-[0-9]*)')) AS image_id
        ,'kaggle_shoes' as product_set_id
        ,(REGEXP_EXTRACT(path_to_images,r'gs://pod-fr-retail-kaggle/train-images/[0-9]*/[0-9]*-([0-9]*)')) AS product_id
        ,(REGEXP_EXTRACT(path_to_images,r'gs://pod-fr-retail-kaggle/train-images/[0-9]*/([^-]*)')) AS category_id
        ,'apparel-v2' product_category
        ,(REGEXP_EXTRACT(path_to_images,r'gs://pod-fr-retail-kaggle/train-images/[0-9]*/[0-9]*-([0-9]*)')) AS product_display_name
    FROM `pod-fr-retail.kaggle.train_images`
  ) a 
JOIN (SELECT * FROM `pod-fr-retail.kaggle.category_names` 
      WHERE 
      #category_level1='HYGIENE - BEAUTE - PARFUM'
      #category_level1='CHAUSSURES - ACCESSOIRES'
      category_level1 like 'CHAUSSURES%'
  ) b 
ON CAST(b.category_id AS STRING) =a.category_id
#WHERE rand()<0.10;
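
The query above derives every identifier from the image path with regular expressions and builds the labels string by lowercasing the category names and removing spaces. The sketch below mirrors that logic in plain Python so it can be checked on a single row. The example path and the third-level category name are hypothetical, chosen only to match the shape the regular expressions expect (`<category>-<product>-<index>`); the real file names may differ.

```python
import re

# Same patterns as in the SQL, split into a shared prefix and a capture group.
PREFIX = r"gs://pod-fr-retail-kaggle/train-images/[0-9]*/"
IMAGE_ID = re.compile(PREFIX + r"([0-9]*-[0-9]*-[0-9]*)")
PRODUCT_ID = re.compile(PREFIX + r"[0-9]*-([0-9]*)")
CATEGORY_ID = re.compile(PREFIX + r"([^-]*)")


def extract(pattern, path):
    """Return the first capture group, or None when the path does not match."""
    match = pattern.search(path)
    return match.group(1) if match else None


def make_labels(level1, level2, level3):
    """Lowercase the category names and drop spaces, like the labels column."""
    return f"cl1={level1},cl2={level2},cl3={level3}".lower().replace(" ", "")


# Hypothetical path: the real naming scheme is assumed, not taken from the dataset.
example_path = "gs://pod-fr-retail-kaggle/train-images/0/1000010653-42-0.jpg"

print(extract(IMAGE_ID, example_path))     # image_id
print(extract(PRODUCT_ID, example_path))   # product_id and product_display_name
print(extract(CATEGORY_ID, example_path))  # category_id used for the join
print(make_labels("CHAUSSURES - ACCESSOIRES", "BOTTES - BOTTINES", "BOTTINES"))
```

Running the patterns on a few real paths before launching the `CREATE TABLE` statement is a cheap way to confirm that the join key and the IDs come out as intended.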