In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv("datathon/dataset/product_data.csv")
df.head()

We generate descriptions of the product images using LLaVA.
We ran it in Colab, but due to time constraints we described only 220 images (the prompt used is in the LLaVA.prompt file).
The descriptions of those images are in the img_descriptions.txt file, one per line, in the format
{image name}||{image description}

Now we will load those descriptions and work only with those 220 pieces. With a few more resources, you could easily process all the images using an accelerator.

Overall, the descriptions are accurate and drastically more informative than the data in the dataset, but there are some inconsistencies.
Results could be improved by fine-tuning a LLaVA model on clothing descriptions.
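
The `{image name}||{image description}` format described above can be parsed with a few lines of Python. This is only a minimal sketch: the function name `parse_descriptions` and the sample lines are made up for the example, and splitting on the first `||` only is an assumption that guards against descriptions that happen to contain the separator.

```python
def parse_descriptions(lines):
    """Parse '{image name}||{image description}' lines into a dict."""
    parsed = {}
    for line in lines:
        if "||" not in line:
            continue  # skip blank or malformed lines
        # Split on the first separator only, so the description may contain "||".
        name, description = line.split("||", 1)
        parsed[name.strip()] = description.strip()
    return parsed


# Made-up sample lines, not taken from img_descriptions.txt.
sample = [
    "img_001.jpg||A red cotton t-shirt with short sleeves.",
    "",
    "img_002.jpg||Blue jeans || straight cut.",
]
parsed = parse_descriptions(sample)
print(parsed["img_002.jpg"])
```

Returning a dictionary keyed by image name makes it straightforward to look up a description for each row of the product table.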
 

In [None]:
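# Load the LLaVA descriptions: one "{image name}||{image description}" per line.
with open("img_descriptions.txt", "r") as f:
    lines = f.read().splitlines()

# Map each full image path to its description; split only on the first "||".
descriptions_by_file = {}
for line in lines:
    if "||" not in line:
        continue  # skip blank or malformed lines
    name, description = line.split("||", 1)
    descriptions_by_file["datathon/images/" + name] = description

# Attach the descriptions and keep only the rows that have one.
df["img_description"] = df["des_filename"].map(descriptions_by_file)
df.dropna(inplace=True, subset=["img_description"])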

Now we will generate rich product descriptions with the OpenAI text-davinci-003 model. This way we get the best of both the data provided in the dataset and the descriptions generated by LLaVA.

The following code respects the OpenAI free API limit of 3 calls per minute (CPM). Only the first 192 items were processed because of the limit of 200 calls per day (CPD). For less than $3 you could efficiently describe all 9000 pieces.

The prompt used can also be found in Description.prompt.
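
The pacing implied by a per-minute and a per-day limit can be sketched as follows. This is a minimal sketch, not the code that was run: the helper `paced_calls`, its parameters and the `describe` stand-in are hypothetical, and the sleep function is injectable so that the example runs instantly.

```python
from time import sleep as real_sleep


def paced_calls(items, call, calls_per_minute=3, calls_per_day=200, sleep=real_sleep):
    """Call `call(item)` for each item, spaced to respect a per-minute limit,
    and stop once the daily budget is used up."""
    delay = 60.0 / calls_per_minute  # seconds to wait between two calls
    results = []
    for i, item in enumerate(items):
        if i >= calls_per_day:
            break  # daily budget exhausted; remaining items stay unprocessed
        if i > 0:
            sleep(delay)
        results.append(call(item))
    return results


# Usage with a stand-in for the LLM call and a fake sleep that records the waits.
waits = []
describe = lambda text: text.upper()  # hypothetical stand-in for llm(prompt)
out = paced_calls(["a", "b", "c", "d"], describe, calls_per_day=3, sleep=waits.append)
print(out, waits)
```

With 3 calls per minute the delay works out to 20 seconds between calls, and capping the loop at the daily budget avoids spending calls that would be rejected anyway.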

In [None]:
import os
from langchain.llms.openai import OpenAI
from langchain.prompts import PromptTemplate
from time import sleep

llm = OpenAI(model="text-davinci-003")  # completion model used to write the descriptions

prompt_template = PromptTemplate(
    template="Item Description:\n{img_description}\nGive me a paragraph describing the item, it MUST contain only essential information related to these topics: Type of clothing item {des_product_type}. Sex (Man, woman, unisex or child.) {des_sex} . Material it is made of {des_fabric}. Weather where this item is worn. Colors the item has (just the colors of the clothing item) {des_color_specification_esp}. Type of cut  (e. g. type of sleeves, width, neck, collar...). Print it has (e.g. plain, stripes, animal print, geometrical), with more specific print details. Any other details like a text (say what is written), drawing (say what is drawn), pockets (and its place) or other details like a lace, add them too. Any non relevant information should not appear. Write everything in a paragraph, not a list.\nYour Item Description:\n",
    input_variables=["img_description", "des_product_type", "des_sex", "des_fabric", "des_color_specification_esp"]
)

descriptions = []
for _, r in df.iterrows():
    # Fill the template with the LLaVA description and the dataset attributes.
    prompt = prompt_template.format(
        img_description=r["img_description"],
        des_product_type=r["des_product_type"],
        des_sex=r["des_sex"],
        des_fabric=r["des_fabric"],
        des_color_specification_esp=r["des_color_specification_esp"],
    )
    output = llm(prompt)
    descriptions.append(output)
    sleep(20)  # at most 3 calls per minute

# Pad with None in case the loop stopped early, then drop the rows without a description.
df["description"] = descriptions + [None] * (len(df) - len(descriptions))
df.dropna(subset=["description"], inplace=True)

Now we will embed all the pieces based on their descriptions and add them to a Chroma DB using LangChain.

In [None]:
from langchain.document_loaders import DataFrameLoader
loader = DataFrameLoader(df, page_content_column="description")
documents = loader.load()

In [None]:
from langchain.vectorstores.chroma import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings()
chroma = Chroma(collection_name="items", persist_directory="./chroma_index", embedding_function=embedding)
chroma.add_documents(documents=documents)
chroma.persist()
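
Once the items are embedded, retrieval amounts to a nearest-neighbour search over the vectors. The sketch below illustrates the idea with cosine similarity on made-up 3-dimensional vectors. It is not Chroma's implementation, and `top_k_similar` and the toy data are hypothetical.

```python
import numpy as np


def top_k_similar(query, vectors, k=2):
    """Return the indices of the k rows of `vectors` most similar to `query` (cosine)."""
    vectors = np.asarray(vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # Cosine similarity is the dot product divided by the product of the norms.
    sims = (vectors @ query) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    # argsort is ascending, so negate to rank the highest similarity first.
    return np.argsort(-sims)[:k].tolist()


# Toy "embeddings" for three items (real embeddings have many more dimensions).
item_vectors = [
    [1.0, 0.0, 0.0],  # item 0
    [0.6, 0.8, 0.0],  # item 1
    [0.0, 0.0, 1.0],  # item 2
]
nearest = top_k_similar([1.0, 0.1, 0.0], item_vectors, k=2)
print(nearest)
```

A vector store performs the same kind of ranking at scale, usually with an approximate index instead of the exhaustive comparison shown here.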