# Feature Vector Generation and Storage

This Jupyter notebook demonstrates how to use a saved Keras model to generate feature vectors for images and store them in a Qdrant vector database. The notebook is structured as follows:

1. Import the necessary libraries and load the environment variables.
2. Set the Qdrant collection name and the paths to the dataset and metadata CSV file.
3. Generate feature vectors using the saved model and store them in the Qdrant vector database.

The notebook relies on utility functions defined in external modules, such as `process_and_store_images_parallel` from `utils`, to process images and interact with the Qdrant database.

In [40]:
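import os
import sys

# Make the backend directory importable so that `utils` resolves.
sys.path.insert(0, '../backend')

from dotenv import load_dotenv
from qdrant_client import QdrantClient
from utils import process_and_store_images_parallel

# Load the Qdrant settings (QDRANT_URL, QDRANT_API_KEY) from the backend .env file.
load_dotenv(dotenv_path='../backend/.env')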

True
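
The `True` printed above is the return value of `load_dotenv`. It does not by itself confirm that both Qdrant settings are defined, so a quick check before creating the client can save a confusing connection error later. The sketch below is one way to do that; `missing_env_vars` is a hypothetical helper, not part of the project's `utils` module.

```python
import os

def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# The two variables read later when the Qdrant client is created.
required = ["QDRANT_URL", "QDRANT_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("Qdrant settings found.")
```

Passing the mapping explicitly (the `env` argument) keeps the helper easy to test without touching the real environment.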

Define the collection name for the Qdrant vector database and set up paths to the dataset directory and metadata CSV file.

In [37]:
collection_name = "fashion_products_vdb"
dataset_path = "../backend/data/images/fashion-dataset/images/"
csv_path = "../backend/model/data.csv"
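
Both paths are relative to the notebook's working directory, so it can help to confirm that they resolve before starting a long run. A minimal sketch, assuming a hypothetical helper `find_missing_paths` (not part of the project), that reports exactly which path is missing:

```python
from pathlib import Path

def find_missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]

# Same values as in the cell above.
dataset_path = "../backend/data/images/fashion-dataset/images/"
csv_path = "../backend/model/data.csv"

missing = find_missing_paths([dataset_path, csv_path])
for path in missing:
    print(f"Path not found: {path}")
```

Naming the missing path is more useful for debugging than a single generic message covering both inputs.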

Create a `QdrantClient` instance using the URL and API key from the environment variables. Set the dimension of the feature vectors; the model produces 3072-dimensional vectors by default. Then call `process_and_store_images_parallel` to generate feature vectors for the images in the dataset and store them in the specified Qdrant collection. The cell checks that both input paths exist, catches exceptions, and prints a message indicating whether the process succeeded or failed.

In [41]:
try:
    # Connect using the settings loaded from the .env file.
    client = QdrantClient(url=os.getenv('QDRANT_URL'), api_key=os.getenv('QDRANT_API_KEY'))

    vector_dim = 3072  # Total dimension of concatenated feature vectors

    # Check both inputs before starting the embedding run.
    if not os.path.exists(csv_path) or not os.path.exists(dataset_path):
        print("Invalid path")
    else:
        process_and_store_images_parallel(dataset_path, client, collection_name, vector_dim, csv_path, max_workers=None)
        print(f"Stored embeddings for images in the collection '{collection_name}'.")

except Exception as e:
    print(f"Failed to store embeddings in the collection '{collection_name}'.")
    print(e)
    



    filename                                               link     id gender  \
0  15970.jpg  http://assets.myntassets.com/v1/images/style/p...  15970    Men   
1  39386.jpg  http://assets.myntassets.com/v1/images/style/p...  39386    Men   
2  59263.jpg  http://assets.myntassets.com/v1/images/style/p...  59263  Women   
3  21379.jpg  http://assets.myntassets.com/v1/images/style/p...  21379    Men   
4  53759.jpg  http://assets.myntassets.com/v1/images/style/p...  53759    Men   

  masterCategory subCategory  articleType baseColour  season  year   usage  \
0        Apparel     Topwear       Shirts  Navy Blue    Fall  2011  Casual   
1        Apparel  Bottomwear        Jeans       Blue  Summer  2012  Casual   
2    Accessories     Watches      Watches     Silver  Winter  2016  Casual   
3        Apparel  Bottomwear  Track Pants      Black    Fall  2011  Casual   
4        Apparel     Topwear      Tshirts       Grey  Summer  2012  Casual   

                              productDisplay

2024-04-09 13:13:59.733648: I tensorflow/compiler/xla/stream_executor/cuda/cuda_gpu_executor.cc:894] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
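
The internals of `process_and_store_images_parallel` are not shown in this notebook. As a rough illustration of the general pattern its name suggests (embed images concurrently, then write the vectors in batches), the following sketch uses only the standard library. `fake_embed`, `batched`, `embed_and_store`, `store_batch`, and `batch_size` are all hypothetical stand-ins, not the project's functions or Qdrant calls.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

VECTOR_DIM = 3072  # matches vector_dim in the cell above

def fake_embed(filename):
    """Hypothetical stand-in for the model: returns a constant vector."""
    return [0.0] * VECTOR_DIM

def batched(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def embed_and_store(filenames, store_batch, batch_size=64, max_workers=None):
    """Embed files in parallel, then hand (filename, vector) pairs to store_batch in chunks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so names and vectors stay aligned.
        vectors = list(pool.map(fake_embed, filenames))
    stored = 0
    for chunk in batched(zip(filenames, vectors), batch_size):
        store_batch(chunk)
        stored += len(chunk)
    return stored

# Collect results in a list instead of writing to a database.
collected = []
count = embed_and_store(["15970.jpg", "39386.jpg", "59263.jpg"], collected.extend, batch_size=2)
print(count)
```

In a real pipeline, `store_batch` would presumably wrap a write to the vector database; batching keeps each request small, while the thread pool overlaps the per-image work.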

