# Mercado Libre Case

Selected problem: "Within the marketplace there are products that are similar or identical to each other (products sold by different sellers; the API lets you retrieve and download the titles and even the images). How can we find these items in order to group them and make them comparable? Doing so would improve the experience when there are many similar options."

Summary of the intuition behind the solution:

Three models will be built and combined into a recommendation system:

1. Similarity between titles.
2. Similarity between descriptions.
3. Similarity between the objects in the images.

Each model scores every pair of products, which yields three similarity matrices. The matrices are weighted, and recommendations are generated from the products with the highest weighted score.
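
A minimal sketch of the weighted combination, assuming three precomputed similarity matrices of the same shape. The weights, the toy matrix values, and the helper names `combine_similarities` and `top_matches` are illustrative assumptions, not values or functions from this notebook.

```python
import numpy as np

def combine_similarities(matrices, weights):
    """Weighted average of same-shaped similarity matrices."""
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(matrices)            # shape: (n_models, n_items, n_items)
    # Contract the model axis against the normalized weights.
    return np.tensordot(weights / weights.sum(), stacked, axes=1)

def top_matches(scores, item_index, k=2):
    """Indices of the k highest-scoring items for one item, excluding itself."""
    row = scores[item_index].copy()
    row[item_index] = -np.inf               # never recommend the item itself
    return np.argsort(row)[::-1][:k].tolist()

# Toy 3x3 matrices standing in for title, description and image similarity.
title_sim = np.array([[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]])
desc_sim = np.array([[1.0, 0.8, 0.3], [0.8, 1.0, 0.1], [0.3, 0.1, 1.0]])
image_sim = np.array([[1.0, 0.7, 0.2], [0.7, 1.0, 0.4], [0.2, 0.4, 1.0]])

scores = combine_similarities([title_sim, desc_sim, image_sim], weights=[0.5, 0.3, 0.2])
print(top_matches(scores, item_index=0))
```

Normalizing the weights keeps the combined score on the same scale as the individual similarities, whatever weights are chosen.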


## Data extraction

We import the libraries needed for data extraction.

In [1]:
import requests
import pandas as pd
import numpy as np


We create a function that pages through the search results up to the offset limit allowed by the API and builds the dataframe from the `results` field of each response.

In [2]:
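def create_df(max_items: int = 1000) -> pd.DataFrame:
    """Download search results for one category, stepping the offset by 50."""
    cat_id = 'MLA1000'
    frames = []
    # Note: the range starts at 50, so the first page (offset 0) is not fetched.
    for offset in range(50, max_items + 1, 50):
        url = f'https://api.mercadolibre.com/sites/MLA/search?category={cat_id}&offset={offset}'
        response = requests.get(url)
        items = response.json()
        frames.append(pd.DataFrame(items['results']))
    # Concatenate once instead of growing the dataframe inside the loop.
    return pd.concat(frames, ignore_index=True)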
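
The pagination can be sketched as a list of page URLs. The helper name `search_page_urls` and the `page_size` parameter are assumptions made for illustration. Unlike the function above, which starts at offset 50, this sketch starts at offset 0 so the first page of results is included.

```python
def search_page_urls(cat_id: str, max_items: int = 1000, page_size: int = 50):
    """Build one search URL per page, for offsets 0, page_size, 2 * page_size, ..."""
    base = "https://api.mercadolibre.com/sites/MLA/search"
    return [
        f"{base}?category={cat_id}&offset={offset}"
        for offset in range(0, max_items, page_size)
    ]

urls = search_page_urls("MLA1000", max_items=150)
for url in urls:
    print(url)
```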

In [98]:
df = create_df()
df.head(2)

Unnamed: 0,id,title,condition,thumbnail_id,catalog_product_id,listing_type_id,permalink,buying_mode,site_id,category_id,...,installments,winner_item_id,catalog_listing,discounts,promotions,inventory_id,official_store_name,differential_pricing,variation_filters,variations_data
0,MLA1118543571,Tv Kanji Kj-24tm005 Led Full Hd 24 220v,new,737805-MLA48623874458_122021,MLA18614516,gold_special,https://www.mercadolibre.com.ar/tv-kanji-kj-24...,buy_it_now,MLA,MLA1002,...,"{'quantity': 12, 'amount': 8704.99, 'rate': 11...",,True,,[],,,,,
1,MLA1151902990,Auriculares Bluetooth Inalambricos In Ear Sony...,new,833935-MLA51839137023_102022,MLA19506697,gold_special,https://www.mercadolibre.com.ar/auriculares-bl...,buy_it_now,MLA,MLA3697,...,"{'quantity': 12, 'amount': 1808.24, 'rate': 11...",,True,,[],XTLK19601,Sony,,,


We fetch the description of each item. If there is no description, or it is shorter than 20 characters, we fall back to the title.

In [101]:
def get_descriptions(df: pd.DataFrame) -> pd.DataFrame:
    """Add a `description` column, falling back to the title when needed."""
    descriptions = []
    for item_id, title in zip(df['id'], df['title']):
        url = f'https://api.mercadolibre.com/items/{item_id}/description'
        try:
            desc = requests.get(url).json()['plain_text'] or ''
        except Exception:
            # Request failed or the response has no `plain_text` field.
            desc = ''
        # Descriptions shorter than 20 characters are replaced by the title.
        descriptions.append(desc if len(desc) >= 20 else title)
    df['description'] = descriptions
    return df

In [104]:
df= get_descriptions(df)

We create a function to extract the image URL. We use these images instead of the thumbnail already in the dataframe because they have higher resolution.

In [107]:
def get_image_url(ITEM_ID:str):
    url= f'https://api.mercadolibre.com/items/{ITEM_ID}'
    request = requests.get(url)
    items = request.json()
    return items["pictures"][0]['url']

In [108]:
def images_in_df(df:pd.DataFrame):
    ids=list(df['id'])
    lista=[]
    for i in ids:
        try:
            url = get_image_url(i)
        except Exception:
            # Request failed or the item has no pictures.
            url = None
        lista.append(url)
    df['image_url'] = lista
    return df

In [109]:
df = images_in_df(df)

We extract text that may be useful for the dataframe. The `attributes` column contains a `value_name` field that appears to be defined by the seller. It holds information ranging from the brand to the product dimensions. We concatenate all of it into one string per product and append these attributes to the description in the dataframe.

In [110]:
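def extract_value_names(attributes):
    """Join the `value_name` of every attribute into a single string."""
    try:
        return ' '.join([attr['value_name'] for attr in attributes])
    except (TypeError, KeyError):
        # Missing attributes (NaN) or a missing/None `value_name`.
        return ''
df['attributes_desc'] = df['attributes'].apply(extract_value_names)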
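
As written, `extract_value_names` returns an empty string for the whole product if any single `value_name` is missing or `None`. A more forgiving variant, sketched here under the assumption that each attribute is a dict, skips only the unusable entries. The name `join_value_names` and the sample attributes are hypothetical.

```python
def join_value_names(attributes):
    """Join the non-empty `value_name` fields of a list of attribute dicts."""
    if not isinstance(attributes, list):
        return ''  # e.g. NaN when the listing has no attributes
    names = [
        attr.get('value_name')
        for attr in attributes
        if isinstance(attr, dict)
    ]
    # Keep only non-empty strings, dropping None and other unusable values.
    return ' '.join(name for name in names if isinstance(name, str) and name)

sample = [
    {'id': 'BRAND', 'value_name': 'Sony'},
    {'id': 'MODEL', 'value_name': None},
    {'id': 'COLOR', 'value_name': 'Negro'},
]
print(join_value_names(sample))
```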


In [113]:
df['description_full'] =  df['description'] + ' ' + df['attributes_desc']

Each column was reviewed with the intended solution approach in mind. I concluded that the following columns would not be of much use for identifying identical products.

In [116]:
quitar = ['condition','thumbnail_id','catalog_product_id','listing_type_id','buying_mode','site_id'
          ,'category_id','thumbnail','currency_id','order_backend','original_price','sale_price'
          ,'sold_quantity','available_quantity','official_store_id','use_thumbnail_id','accepts_mercadopago'
          ,'stop_time','winner_item_id','catalog_listing','discounts','promotions','inventory_id'
          ,'installments','address','seller_address','shipping','tags','seller','attributes'
          ,'official_store_name','differential_pricing','variation_filters','variations_data'
          ,'description','attributes_desc']

In [167]:
dfnew=df.drop(quitar,axis=1)

In [169]:
dfnew.to_csv('df_new.csv')

## Models

### Title similarity model

In [119]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the pre-trained Sentence Transformer model
model_title = SentenceTransformer('paraphrase-MiniLM-L6-v2')



In [122]:
def title_similarity(title1, title2):
    """Cosine similarity between the sentence embeddings of two titles."""
    # encode() returns NumPy arrays by default, which scikit-learn accepts directly
    # (tensors created with convert_to_tensor=True may live on the GPU).
    embeddings = model_title.encode([title1, title2])
    similarity_score = cosine_similarity(embeddings[0:1], embeddings[1:2])[0][0]
    return similarity_score

In [124]:
title_similarity("Apple iPhone 12 Pro Max, 128GB, Pacific Blue - Fully Unlocked (Renewed)","Fully Unlocked Renewed Apple iPhone 12 Pro Max, 128GB, Pacific Blue")

0.9646211

In [126]:
ids = df['id'].values
id1, id2 = np.meshgrid(ids, ids)
interactions = np.column_stack((id1.ravel(), id2.ravel()))

# Create new dataframe with interactions
interaction_df = pd.DataFrame(interactions, columns=['id1', 'id2'])

# Merge titles for id1
interaction_df = interaction_df.merge(df[['id','title']], left_on='id1', right_on='id', suffixes=('', '_id1'))
interaction_df.drop(columns=['id'], inplace=True)

# Merge titles for id2
interaction_df = interaction_df.merge(df[['id','title']], left_on='id2', right_on='id', suffixes=('_id1', '_id2'))
interaction_df.drop(columns=['id'], inplace=True)

interaction_df.head(2)

In [139]:
#interaction_df.to_csv('titles_df.csv')

In [137]:
embeddings_title1 = model_title.encode(interaction_df['title_id1'].head(10000).tolist(), convert_to_tensor=True)

In [None]:
interaction_df[['min_id', 'max_id']] = interaction_df[['id1', 'id2']].apply(lambda x: sorted([x['id1'], x['id2']])
                                                    , axis=1, result_type='expand')

# Drop duplicates
interaction_df.drop_duplicates(subset=['min_id', 'max_id'], inplace=True)
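
The sorted-pair deduplication above first materializes every ordered pair and then discards roughly half of them. An alternative, shown here as a sketch rather than a drop-in replacement, is to generate each unordered pair once with `itertools.combinations_with_replacement`. The ids below are placeholders, and the helper name `unordered_pairs` is an assumption.

```python
from itertools import combinations_with_replacement

def unordered_pairs(ids):
    """Return each unordered pair of distinct ids exactly once, self-pairs included."""
    unique_ids = sorted(set(ids))
    return list(combinations_with_replacement(unique_ids, 2))

pairs = unordered_pairs(['MLA3', 'MLA1', 'MLA2', 'MLA1'])
print(len(pairs))
print(pairs[:3])
```

For n unique ids this produces n(n+1)/2 pairs instead of n², which roughly halves the number of titles to encode downstream.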

In [None]:
embeddings_title1 = model_title.encode(interaction_df['title_id1'].tolist(), convert_to_tensor=True)

In [None]:
embeddings_title2 = model_title.encode(interaction_df['title_id2'].tolist(), convert_to_tensor=True)

In [None]:
similarity_scores = [cosine_similarity(embeddings_title1[i].reshape(1, -1), embeddings_title2[i].reshape(1, -1))[0][0]
                     for i in range(len(embeddings_title1))]

In [None]:
interaction_df['title_similarity'] = similarity_scores

In [None]:
interaction_df.to_pickle('title_similarity.pkl')
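
Encoding both columns of the pair table embeds every title many times. A cheaper route, sketched below, is to embed each unique title once and obtain all pairwise cosine similarities from a single matrix product of the L2-normalized embeddings. Random vectors stand in for the output of `model_title.encode(...)` so the sketch runs without downloading the model; the helper name `cosine_matrix` and the embedding dimension of 8 are assumptions.

```python
import numpy as np

def cosine_matrix(embeddings):
    """All-pairs cosine similarity for row-vector embeddings of shape (n, d)."""
    embeddings = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)   # guard against zero vectors
    return unit @ unit.T

rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(4, 8))   # placeholder for encoded titles
sim = cosine_matrix(fake_embeddings)

# Keep each unordered pair once by reading the upper triangle (diagonal excluded).
rows, cols = np.triu_indices(len(sim), k=1)
pair_scores = sim[rows, cols]
print(sim.shape, pair_scores.shape)
```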

In [190]:
df_similarity=pd.read_pickle('title_similarity.pkl')

In [191]:
df_similarity.head(2)

Unnamed: 0,id1,id2,title_id1,title_id2,min_id,max_id,title_similarity,description_full_id1,description_full_id2
0,MLA1118543571,MLA1118543571,Tv Kanji Kj-24tm005 Led Full Hd 24 220v,Tv Kanji Kj-24tm005 Led Full Hd 24 220v,MLA1118543571,MLA1118543571,1.0,Con el TV KJ-24TM005 vas a poder disfrutar de ...,Con el TV KJ-24TM005 vas a poder disfrutar de ...
1,MLA1151902990,MLA1118543571,Auriculares Bluetooth Inalambricos In Ear Sony...,Tv Kanji Kj-24tm005 Led Full Hd 24 220v,MLA1118543571,MLA1151902990,0.148367,Características principales:\n\nLos auriculare...,Con el TV KJ-24TM005 vas a poder disfrutar de ...


In [None]:
#interaction_df.to_csv('title_similarity.csv')

In [146]:
#df.head()

### Description similarity model

In [149]:
# Merge descriptions for id1
df_similarity = df_similarity.merge(df[['id','description_full']], left_on='id1', right_on='id', suffixes=('', '_id1'))
df_similarity.drop(columns=['id'], inplace=True)

# Merge descriptions for id2
df_similarity = df_similarity.merge(df[['id','description_full']], left_on='id2', right_on='id', suffixes=('_id1', '_id2'))
df_similarity.drop(columns=['id'], inplace=True)

In [155]:
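# Keep a sample of pairs from three title-similarity bands: high (0.80 to 0.99),
# middle (0.47 to 0.50) and very low (below 0.02). copy() avoids pandas'
# SettingWithCopyWarning when new columns are added to df_desc later.
df_desc = df_similarity[((df_similarity['title_similarity']>0.8) & (df_similarity['title_similarity']<0.99)) | (df_similarity['title_similarity']<0.02)
              | ((df_similarity['title_similarity']>0.47) & (df_similarity['title_similarity']<0.50))].copy()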

In [171]:
df_desc

Unnamed: 0,id1,id2,title_id1,title_id2,min_id,max_id,title_similarity,description_full_id1,description_full_id2
10,MLA1100783124,MLA1118543571,Pila Aaa Rayovac Alcalina Cilíndrica - Pack De...,Tv Kanji Kj-24tm005 Led Full Hd 24 220v,MLA1100783124,MLA1118543571,-0.027172,"Con más de 110 años fabricando pilas, Rayovac ...",Con el TV KJ-24TM005 vas a poder disfrutar de ...
22,MLA1196968708,MLA1118543571,Smart Tv Kanji Kj-4xtl005 Led Android Tv Hd 40...,Tv Kanji Kj-24tm005 Led Full Hd 24 220v,MLA1118543571,MLA1196968708,0.808649,Con el Smart TV KJ-4XTL005 vas a acceder a las...,Con el TV KJ-24TM005 vas a poder disfrutar de ...
26,MLA1216652706,MLA1118543571,Smart Tv Jvc Lt-50da7125 Led 4k 50 220v,Tv Kanji Kj-24tm005 Led Full Hd 24 220v,MLA1118543571,MLA1216652706,0.803102,Con el Smart TV LT-50DA7125 vas a acceder a la...,Con el TV KJ-24TM005 vas a poder disfrutar de ...
32,MLA1370190567,MLA1118543571,Apple AirPods Pro (2ª Generación),Tv Kanji Kj-24tm005 Led Full Hd 24 220v,MLA1118543571,MLA1370190567,-0.030151,Los AirPods Pro vienen con hasta 2 veces más C...,Con el TV KJ-24TM005 vas a poder disfrutar de ...
37,MLA1113710333,MLA1118543571,Pila Aaa Rayovac Alcalina Cilíndrica - Pack De...,Tv Kanji Kj-24tm005 Led Full Hd 24 220v,MLA1113710333,MLA1118543571,-0.032993,"Con más de 110 años fabricando pilas, Rayovac ...",Con el TV KJ-24TM005 vas a poder disfrutar de ...
...,...,...,...,...,...,...,...,...,...
498438,MLA1116548438,MLA1229512225,Estuche Pionner Ddj400,Smart Tv Led Noblex Dk55x7500 4k 55'' Google T...,MLA1116548438,MLA1229512225,-0.004325,ESTUCHE RIGIDO \n\nPARA PIONNER DDJ 400\n\nCOM...,TIO MUSA S.A\nSomos líderes en venta online y ...
498441,MLA1249117634,MLA1229512225,Auriculares In-ear Inalámbricos Energeneration...,Smart Tv Led Noblex Dk55x7500 4k 55'' Google T...,MLA1229512225,MLA1249117634,-0.097710,"En la calle, en el colectivo o en la oficina, ...",TIO MUSA S.A\nSomos líderes en venta online y ...
498475,MLA1249117634,MLA897771331,Auriculares In-ear Inalámbricos Energeneration...,Smart Tv Rca X55uhd Led 4k 55 100v/240v,MLA1249117634,MLA897771331,-0.020729,"En la calle, en el colectivo o en la oficina, ...",Con el Smart TV X55UHD vas a acceder a las apl...
498482,MLA1372499873,MLA1249117634,Splitter 1x4 Divisor Amplificador Hdmi 1080p 4...,Auriculares In-ear Inalámbricos Energeneration...,MLA1249117634,MLA1372499873,-0.006007,Este cable es ideal para conectar tus disposit...,"En la calle, en el colectivo o en la oficina, ..."


In [156]:
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the pre-trained Sentence Transformer model (multilingual)
model_description = SentenceTransformer('paraphrase-xlm-r-multilingual-v1')


In [None]:
# Encode the full column; truncating one side with head(1000) would leave fewer
# scores than rows in df_desc.
embeddings_desc1 = model_description.encode(df_desc['description_full_id1'].tolist(), convert_to_tensor=True)

In [None]:
embeddings_desc2 = model_description.encode(df_desc['description_full_id2'].tolist(), convert_to_tensor=True)

In [None]:
similarity_scores = [cosine_similarity(embeddings_desc1[i].reshape(1, -1), embeddings_desc2[i].reshape(1, -1))[0][0]
                     for i in range(len(embeddings_desc1))]

In [None]:
df_desc['desc_similarity'] = similarity_scores

In [None]:
#df_desc.to_csv('desc_similarity.csv')

In [None]:
df_desc['weighted_average'] = df_desc['title_similarity'] * 0.6 + df_desc['desc_similarity'] * 0.4

In [None]:
#df_desc.to_csv('final_df.csv')

In [175]:
df_final = pd.read_csv('final_df.csv',index_col=0)

In [176]:
df_final.sort_values(by='weighted_average',ascending=False).head(60)

Unnamed: 0,id1,id2,title_id1,title_id2,min_id,max_id,title_similarity,description_full_id1,description_full_id2,desc_similarity,weighted_average
29577,MLA1137721593,MLA1135513033,Soporte Morshop Mor07tv De Pared Para Tv/monit...,Soporte Morshop Mor02tv De Pared Para Tv/monit...,MLA1135513033,MLA1137721593,0.989483,La ubicación de tu TV o monitor es de suma imp...,La ubicación de tu TV o monitor es de suma imp...,1.0,0.99369
309306,MLA1115446733,MLA1120081492,Pila Aa Energizer Max E91 Cilíndrica - Pack De...,Pila Aa Energizer Max E91 Cilíndrica - Pack De...,MLA1115446733,MLA1120081492,0.989163,Energizer desarrolla soluciones simples para u...,Energizer desarrolla soluciones simples para u...,1.0,0.993498
300413,MLA1108937950,MLA1105078427,Soporte Nakan Spl-570e De Pared Para Tv/monito...,Soporte Nakan Spl-780e De Pared Para Tv/monito...,MLA1105078427,MLA1108937950,0.988819,Nakan es una empresa que desde 1975 se dedica ...,Nakan es una empresa que desde 1975 se dedica ...,1.0,0.993292
42367,MLA1175036646,MLA1300933604,Smart Tv Noblex Dm50x7550 Led Android Tv 4k 50...,Smart Tv Noblex Dr50x7550 Led Android Tv 4k 50...,MLA1175036646,MLA1300933604,0.989707,Noblex lleva más de 70 años creando bienestar ...,Noblex lleva más de 70 años creando bienestar ...,0.998521,0.993233
331798,MLA1330624209,MLA1144363082,Smart Tv Samsung Series 8 Un65bu8000gczb Led T...,Smart Tv Samsung Series 7 Un65au7000gczb Led T...,MLA1144363082,MLA1330624209,0.989759,Samsung es reconocida a nivel mundial como una...,Samsung es reconocida a nivel mundial como una...,0.997432,0.992828
426542,MLA1246239115,MLA1371761379,Smart Tv Samsung Neo Qled 4k Qn50qn90bagxzd Ql...,Smart Tv Samsung Neo Qled 4k Qn50qn90bakxzl Ql...,MLA1246239115,MLA1371761379,0.988717,Samsung es reconocida a nivel mundial como una...,Samsung es reconocida a nivel mundial como una...,0.998852,0.992771
71460,MLA1120081492,MLA1107621994,Pila Aa Energizer Max E91 Cilíndrica - Pack De...,Pila Aa Energizer Max E91 Cilíndrica - Pack De...,MLA1107621994,MLA1120081492,0.987146,Energizer desarrolla soluciones simples para u...,Energizer desarrolla soluciones simples para u...,1.0,0.992287
142896,MLA1286759580,MLA1259052986,Smart Tv Samsung Neo Qled 4k Qn50qn90bagczb Ql...,Smart Tv Samsung Neo Qled 4k Qn43qn90bagczb Ql...,MLA1259052986,MLA1286759580,0.98783,Samsung es reconocida a nivel mundial como una...,Samsung es reconocida a nivel mundial como una...,0.998942,0.992275
439944,MLA1287278287,MLA1287496497,Smart Tv Samsung Series 4 Un32t4300agxug Led T...,Smart Tv Samsung Series 4 Un32t4300agxzs Led T...,MLA1287278287,MLA1287496497,0.987596,Samsung es reconocida a nivel mundial como una...,Samsung es reconocida a nivel mundial como una...,0.998909,0.992121
52691,MLA1315724444,MLA1219410251,Smart Tv LG Ai Thinq 50up7750psb Lcd Webos 6.0...,Smart Tv LG Ai Thinq 43up7750psb Lcd Webos 6.0...,MLA1219410251,MLA1315724444,0.989038,LG es innovación y eso se ve en cada uno de su...,LG es innovación y eso se ve en cada uno de su...,0.996502,0.992024


In [177]:
df_final.sort_values(by='weighted_average',ascending=True).head(60)

Unnamed: 0,id1,id2,title_id1,title_id2,min_id,max_id,title_similarity,description_full_id1,description_full_id2,desc_similarity,weighted_average
487554,MLA928051347,MLA1364280751,Bisagra Metálica Desmontable Para Rack Anvil E...,Led Smart Aoc 50 50u6295/77g Netflix Youtube ...,MLA1364280751,MLA928051347,-0.208566,Bisagra metálica desmontable para rack anvil e...,Con el Smart TV 50U6295 vas a acceder a las ap...,0.230446,-0.032961
369057,MLA900902300,MLA1429871434,Papel Metalizado Para Morteros O Maquinas Lanz...,Smart Tv Rca 65 And65p7uhd Ultra Hd 4k Androi...,MLA1429871434,MLA900902300,-0.17769,Papel metalziado super liviano de 20 o 25 micr...,AND65P7UHD\n\nUn mundo de entretenimiento. Con...,0.207304,-0.023693
267571,MLA928051347,MLA1372521457,Bisagra Metálica Desmontable Para Rack Anvil E...,Cable Hdmi 5mts 1080p Reforzado Tv Smart Gamer...,MLA1372521457,MLA928051347,-0.13325,Bisagra metálica desmontable para rack anvil e...,Este cable es ideal para conectar tus disposit...,0.141374,-0.0234
436434,MLA1399038448,MLA1299459389,Radio Portatil Retro Vintage Spica Sp580 Am/fm...,Energizer Max Aaa Pack De 4 Unidades,MLA1299459389,MLA1399038448,-0.077211,Radio Portatil Retro Vintage Spica Sp580 Am/fm...,Energizer desarrolla soluciones simples para u...,0.059274,-0.022617
309011,MLA928051347,MLA1423152638,Bisagra Metálica Desmontable Para Rack Anvil E...,Smart Tv Rca And50p6uhd-f Led Google Tv 4k 50 ...,MLA1423152638,MLA928051347,-0.184969,Bisagra metálica desmontable para rack anvil e...,Con el Smart TV AND50P6UHD vas a acceder a las...,0.224798,-0.021062
496048,MLA1421795048,MLA928051347,Smart Tv Enova 43 Fhd Netflix 43g2s-tdfa Youtube,Bisagra Metálica Desmontable Para Rack Anvil E...,MLA1421795048,MLA928051347,-0.251184,Con el Smart TV ATH-43G2S-TDFA vas a acceder a...,Bisagra metálica desmontable para rack anvil e...,0.336547,-0.016091
486055,MLA919235909,MLA900902300,Smart Tv Samsung Hd 32 T4300,Papel Metalizado Para Morteros O Maquinas Lanz...,MLA900902300,MLA919235909,-0.175214,TIENDA OFICIAL SAMSUNG\nTercero Autorizado: GM...,Papel metalziado super liviano de 20 o 25 micr...,0.226719,-0.014441
49465,MLA900902300,MLA1320257707,Papel Metalizado Para Morteros O Maquinas Lanz...,Smart Tv Rca C39and Led Android Tv 39 100v/240v,MLA1320257707,MLA900902300,-0.154044,Papel metalziado super liviano de 20 o 25 micr...,Con el Smart TV C39AND vas a acceder a las apl...,0.19668,-0.013755
371083,MLA900902300,MLA1219411421,Papel Metalizado Para Morteros O Maquinas Lanz...,Smart Tv Rca 43 C43and Fhd Android Tv 100v/240v,MLA1219411421,MLA900902300,-0.168802,Papel metalziado super liviano de 20 o 25 micr...,C43AND\n\nEl Android TV permite desde reproduc...,0.219778,-0.01337
329941,MLA928051347,MLA1383173152,Bisagra Metálica Desmontable Para Rack Anvil E...,Televisor Smart Tv 32 Pulgadas Led Hd Android ...,MLA1383173152,MLA928051347,-0.091321,Bisagra metálica desmontable para rack anvil e...,Con el Smart TV 32E10 vas a acceder a las apli...,0.107702,-0.011712


In [182]:
offset = 2000
entries_to_display = 20
df_final.sort_values(by='weighted_average',ascending=False)[offset: offset + entries_to_display]


Unnamed: 0,id1,id2,title_id1,title_id2,min_id,max_id,title_similarity,description_full_id1,description_full_id2,desc_similarity,weighted_average
440771,MLA1264220511,MLA1142843341,Kit Tiras Led Dwled-49uhd Hyled-49uhd 49d1600 ...,Tiras Led Dwled-49uhd Hyled-49hd 49d1600 8tira...,MLA1142843341,MLA1264220511,0.865108,HARD INGENIERIA\n\nKIT DE 8 TIRAS DE LED NUEVA...,LEER TODA LA DESCRIPCIÓN\nHacemos Factura A y ...,0.732744,0.812163
435880,MLA921073457,MLA1429308342,Smart Tv Noblex Dj50x6500 Led 4k 50 220v,Smart Tv LG Ai Thinq 60uq8050psb Led Webos 22 ...,MLA1429308342,MLA921073457,0.852973,Noblex lleva más de 70 años creando bienestar ...,LG es innovación y eso se ve en cada uno de su...,0.750934,0.812158
175161,MLA1280860594,MLA1264827306,Smart Tv Hisense A6 Series 50a64gsv Led 4k 50 ...,Smart Tv Noblex Dk75x7500 Led 4k 75 220v,MLA1264827306,MLA1280860594,0.862991,Hisense es la marca n.º1 de televisores en Chi...,Noblex lleva más de 70 años creando bienestar ...,0.735693,0.812072
26814,MLA918461643,MLA1429240390,Soporte Iofi Sp-47 De Pared Para Tv/monitor De...,Soporte Electroland Sop 14-55 De Pared Para Tv...,MLA1429240390,MLA918461643,0.860638,Hace más de 50 años IOFI es una empresa dedica...,La ubicación de tu TV o monitor es de suma imp...,0.738979,0.811974
412275,MLA1148740969,MLA1143451516,Smart Tv Hd 32 Pulgadas Philips 32phd6917 Andr...,Smart Tv Full Hd 40 Pulgadas Hitachi Le40smart...,MLA1143451516,MLA1148740969,0.868164,"TELEVISOR SMART TV HD DE 32"" PHILIPS 6900 SERI...",Smart Tv Hitachi Led Full Hd 40 Cdh-le40smart2...,0.727193,0.811775
175485,MLA900398392,MLA1264827306,Smart Tv Sanyo Lce50su9550 Led 4k 50 220v,Smart Tv Noblex Dk75x7500 Led 4k 75 220v,MLA1264827306,MLA900398392,0.892433,Con el Smart TV LCE50SU9550 vas a acceder a la...,Noblex lleva más de 70 años creando bienestar ...,0.690785,0.811774
175341,MLA1429308342,MLA1264827306,Smart Tv LG Ai Thinq 60uq8050psb Led Webos 22 ...,Smart Tv Noblex Dk75x7500 Led 4k 75 220v,MLA1264827306,MLA1429308342,0.856921,LG es innovación y eso se ve en cada uno de su...,Noblex lleva más de 70 años creando bienestar ...,0.743898,0.811712
196760,MLA1169720121,MLA1109990368,Tira Led He315fh-e78 32ld846ht Lt32dr310 Ble32...,Tira De Led 42ls5700 42lm6200 Ple42fmn2 L42s19...,MLA1109990368,MLA1169720121,0.863948,LEER TODA LA DESCRIPCIÓN\nHacemos Factura A y ...,HARD INGENIERIA\n\nATENCION. ANTES DE COMPRAR ...,0.733168,0.811636
300483,MLA1170976569,MLA1105078427,Soporte Iofi Sp-374 De Pared Para Tv/monitor D...,Soporte Nakan Spl-780e De Pared Para Tv/monito...,MLA1105078427,MLA1170976569,0.863442,Hace más de 50 años IOFI es una empresa dedica...,Nakan es una empresa que desde 1975 se dedica ...,0.733677,0.811536
225383,MLA1109713121,MLA1138245317,Smart Tv Noblex Dk32x5000 Led Hd 32 220v,Smart Tv Aoc 43s5295/77g Led Full Hd 43 100v/...,MLA1109713121,MLA1138245317,0.891246,Noblex lleva más de 70 años creando bienestar ...,Con el Smart TV 43S5295 vas a acceder a las ap...,0.691857,0.811491


In [183]:
offset = 5000
entries_to_display = 20
df_final.sort_values(by='weighted_average',ascending=False)[offset: offset + entries_to_display]


Unnamed: 0,id1,id2,title_id1,title_id2,min_id,max_id,title_similarity,description_full_id1,description_full_id2,desc_similarity,weighted_average
462428,MLA1346261611,MLA1367640969,Auriculares In-ear Inalámbricos Lenovo Livepod...,"Adaptador De Audio 3,5 H A Mic + Auricular",MLA1346261611,MLA1367640969,0.490167,"En la calle, en el colectivo o en la oficina, ...",Este cable adaptador es ideal para conectar tu...,0.557393,0.517057
207117,MLA1281983160,MLA1369603963,Smart Tv Android Candy 32 Ultra Hd Gtv1400 Net...,Smart Tv Bgh 32'' Hd Control Por Voz Y Android...,MLA1281983160,MLA1369603963,0.494278,"Práctico y de tan solo 32 "", te va a permitir ...",- Android TV 11: Ofrece una experiencia de uso...,0.550976,0.516957
61314,MLA918177724,MLA1147686021,Modulo Bluetooth 5.0 Usb Mp3 Sd Aux Amplifica...,Parlante Bluetooth Jd E300 Inalámbrico Portáti...,MLA1147686021,MLA918177724,0.49675,Bluetooth 5.0\n\nEste modulo es para montar no...,Innovación en sonido \nDisfruta de la más alta...,0.547046,0.516868
272242,MLA1179562775,MLA1338967909,Smart Tv Samsung Series 7 Un50au7000gczb Led 4...,"Cable Hdmi Premium 4k,1080p.full Hd,3d",MLA1179562775,MLA1338967909,0.493722,Samsung es reconocida a nivel mundial como una...,Excelente Cable HDMI a HDMI Premium De 1.50 o ...,0.551508,0.516837
247806,MLA1169461813,MLA871748072,Adaptador Usb A Hdmi Cable Conversor 3.0 Extie...,Fuente Transformador 5v 4a 4amp 220v Hd. Anri Tv,MLA1169461813,MLA871748072,0.4986,SMARTCLICK - MERCADOLIDER PLATINIUM - Represen...,ANRI TV\n -100% de calificaciones positivas.\n...,0.544139,0.516815
270776,MLA1146468070,MLA929845901,Parlante Inalámbrico Bluetooth Philips Bt60bk...,Auriculares Inalámbricos Philips 1000 Series T...,MLA1146468070,MLA929845901,0.493665,SOMOS TRENDY DEALS\n\nHacemos VENTAS MINORISTA...,"En la calle, en el colectivo o en la oficina, ...",0.55151,0.516803
252775,MLA1264812112,MLA860274233,Inverter Driver Tv Led Backlight Universal 26 ...,Tiras Led Para Cdh-le654ksmart12 Tcl L65p4k Rc...,MLA1264812112,MLA860274233,0.494643,IMPORTANTE!! NO ENCENDER SIN CARGA- \nPlaca Dr...,LEER TODA LA DESCRIPCIÓN\nHacemos Factura A y ...,0.549897,0.516745
48984,MLA1115591819,MLA1320257707,Conversor Digital Audio Toslink A Rca + Cable ...,Smart Tv Rca C39and Led Android Tv 39 100v/240v,MLA1115591819,MLA1320257707,0.494738,CONVERSOR ÓPTICO A RCA 2.1 + cable óptico 1 me...,Con el Smart TV C39AND vas a acceder a las apl...,0.549698,0.516722
260402,MLA1142536162,MLA820260210,Android Tv Philips Led Full Hd 43 Pulgadas Neg...,Cable Hdmi Premium 1080p.full Hd Mallado Oro,MLA1142536162,MLA820260210,0.495637,Control por voz / Asistente de Google: \n- Pre...,Excelente Cable HDMI a HDMI Premium Grueso! Al...,0.54832,0.51671
248438,MLA1382974210,MLA1150627318,Kit Tiras De Led Tv LG 42lb5600 42lb5800 42lb6...,Smart Tv 50 Pulgadas 4k Skyworth Led Hd Androi...,MLA1150627318,MLA1382974210,0.491102,** INFORMATICA SAN ISIDRO ** \n\n** MercadoLíd...,"Producto: Smart TV Skyworth 50"" 4K LED HD Andr...",0.555106,0.516704
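
To turn the scored pairs into groups of comparable products, one option (not implemented in this notebook) is to link every pair whose weighted score reaches a threshold and take the connected components. The sketch below uses a small union-find; the threshold of 0.95, the ids, and the scores are made-up illustrations, and `group_items` is a hypothetical helper name.

```python
def group_items(scored_pairs, threshold=0.95):
    """Group ids into connected components of pairs scoring at or above threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for id1, id2, score in scored_pairs:
        root1, root2 = find(id1), find(id2)  # registers both ids even if not linked
        if score >= threshold and root1 != root2:
            parent[root2] = root1

    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in groups.values())

pairs = [
    ('A', 'B', 0.99),   # same product, different sellers
    ('B', 'C', 0.97),
    ('C', 'D', 0.52),   # related but not the same product
]
print(group_items(pairs))
```

Connected components are transitive, so a chain of near-duplicates can merge items that are not directly similar to each other; a stricter linking rule may be needed in practice.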


## Models I intended to build but could not complete due to computational complexity

### Image object similarity model

In [184]:
import torch
import torchvision.models as modelstv
import torchvision.transforms as transforms
from PIL import Image
from sklearn.metrics.pairwise import cosine_similarity
import requests
from io import BytesIO

In [None]:
model_obj_sim = modelstv.resnet50(pretrained=True)
model_obj_sim = model_obj_sim.eval()

In [None]:
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

In [None]:
def object_similarity(image_url1, image_url2):
    """Cosine similarity between the ResNet-50 outputs for two images."""
    response1 = requests.get(image_url1)
    response2 = requests.get(image_url2)
    # Convert to RGB so palette, grayscale, or RGBA images match the 3-channel normalization.
    image1 = Image.open(BytesIO(response1.content)).convert('RGB')
    image2 = Image.open(BytesIO(response2.content)).convert('RGB')
    input_tensor1 = preprocess(image1).unsqueeze(0)
    input_tensor2 = preprocess(image2).unsqueeze(0)

    # Check for GPU availability and if available, use GPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    input_tensor1 = input_tensor1.to(device)
    input_tensor2 = input_tensor2.to(device)
    # Module.to() moves the model in place; reassigning the global name here
    # would make it a local variable and raise UnboundLocalError.
    model_obj_sim.to(device)

    # Extract features (the 1000 ImageNet class scores) from the images
    with torch.no_grad():
        features1 = model_obj_sim(input_tensor1).cpu().numpy()
        features2 = model_obj_sim(input_tensor2).cpu().numpy()

    # Compute cosine similarity between features
    similarity_score = cosine_similarity(features1, features2)[0][0]
    return similarity_score

### Image similarity model

In [96]:
import torch
import torchvision.models as modelstv
import torchvision.transforms as transforms
from PIL import Image
from sklearn.metrics.pairwise import cosine_similarity
import requests
from io import BytesIO

# Load pre-trained model and set it to evaluation mode
model = modelstv.resnet50(pretrained=True)
model = model.eval()

# Image transformations (resize, normalization)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# URLs of the images
image_url1 = 'http://http2.mlstatic.com/D_744993-MLA51601140437_092022-O.jpg'
image_url2 = 'http://http2.mlstatic.com/D_807983-MLA53145382859_012023-O.jpg'

# Load images from URLs
response1 = requests.get(image_url1)
response2 = requests.get(image_url2)
image1 = Image.open(BytesIO(response1.content))
image2 = Image.open(BytesIO(response2.content))

# Preprocess images
input_tensor1 = preprocess(image1).unsqueeze(0)
input_tensor2 = preprocess(image2).unsqueeze(0)

# Check for GPU availability and if available, use GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
input_tensor1 = input_tensor1.to(device)
input_tensor2 = input_tensor2.to(device)
model = model.to(device)

# Extract features from the images
with torch.no_grad():
    features1 = model(input_tensor1).cpu().numpy()
    features2 = model(input_tensor2).cpu().numpy()

# Compute cosine similarity between features
similarity_score = cosine_similarity(features1, features2)

# Display the similarity score
print(f'Cosine similarity score: {similarity_score[0][0]}')



Cosine similarity score: 0.2802983522415161


In [57]:
import tensorflow as tf
import requests
from PIL import Image
from io import BytesIO
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Load MobileNet pre-trained model with and without the classification head
base_model = tf.keras.applications.MobileNetV2(weights='imagenet', include_top=False, pooling='avg')
classification_model = tf.keras.applications.MobileNetV2(weights='imagenet')

# URLs of the images
image_url_1 = 'http://http2.mlstatic.com/D_744993-MLA51601140437_092022-O.jpg'
image_url_2 = 'http://http2.mlstatic.com/D_807983-MLA53145382859_012023-O.jpg'

def get_image_features_and_prediction(image_url, feature_model, classification_model):
    # Load image from URL
    response = requests.get(image_url)
    image = Image.open(BytesIO(response.content)).resize((224, 224))
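    # Preprocess the image. Note: Keras MobileNetV2 expects inputs scaled to [-1, 1]
    # (see mobilenet_v2.preprocess_input); the [0, 1] scaling used here may affect the predictions.
    input_image = np.array(image) / 255.0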
    input_image = np.expand_dims(input_image, axis=0)
    # Extract features
    features = feature_model.predict(input_image)
    # Make predictions
    predictions = classification_model.predict(input_image)
    # Decode the predictions to get human-readable labels
    decoded_predictions = tf.keras.applications.mobilenet_v2.decode_predictions(predictions)[0]
    top_prediction = decoded_predictions[0]
    # Return features and prediction
    return features, top_prediction

# Extract features and predictions for each image
features_1, prediction_1 = get_image_features_and_prediction(image_url_1, base_model, classification_model)
features_2, prediction_2 = get_image_features_and_prediction(image_url_2, base_model, classification_model)

# Compute the cosine similarity between the feature vectors of the two images
similarity_score = cosine_similarity(features_1, features_2)[0][0]

# Output the predictions
print(f"Image 1 predicted class: {prediction_1[1]} (confidence: {prediction_1[2]:.2f})")
print(f"Image 2 predicted class: {prediction_2[1]} (confidence: {prediction_2[2]:.2f})")

# Output the similarity score
print(f"Similarity score: {similarity_score:.2f}")






Image 1 predicted class: television (confidence: 0.29)
Image 2 predicted class: fire_screen (confidence: 0.21)
Similarity score: 0.45


In [45]:
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image
import numpy as np
import requests
from io import BytesIO

# Load the pre-trained model from TensorFlow Hub
model_url = "https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/5"
model = hub.KerasLayer(model_url)

# URLs of the images
image_url1 = 'http://http2.mlstatic.com/D_744993-MLA51601140437_092022-O.jpg'
image_url2 = 'http://http2.mlstatic.com/D_807983-MLA53145382859_012023-O.jpg'

# Load images from URLs
response1 = requests.get(image_url1)
response2 = requests.get(image_url2)
image1 = Image.open(BytesIO(response1.content)).resize((224, 224))
image2 = Image.open(BytesIO(response2.content)).resize((224, 224))

# Convert images to numpy arrays and preprocess them
input_image1 = np.array(image1) / 255.0
input_image2 = np.array(image2) / 255.0
input_image1 = np.expand_dims(input_image1, axis=0)
input_image2 = np.expand_dims(input_image2, axis=0)

# Use the model to classify objects in images
predictions1 = model(input_image1)
predictions2 = model(input_image2)

# Load the labels used by the pre-trained model
labels_path = tf.keras.utils.get_file('ImageNetLabels.txt','https://storage.googleapis.com/download.tensorflow.org/data/ImageNetLabels.txt')
imagenet_labels = np.array(open(labels_path).read().splitlines())

# Get the labels of the predicted classes
predicted_label1 = imagenet_labels[np.argmax(predictions1)]
predicted_label2 = imagenet_labels[np.argmax(predictions2)]

# Output the predictions
print(f'Prediction for the first image: {predicted_label1}')
print(f'Prediction for the second image: {predicted_label2}')




Prediction for the first image: web site
Prediction for the second image: viaduct
