# Block 2

# Querying our index to retrieve the most similar items

## Converting text to an embedding

### Using the `VertexAIEmbeddings` class and its `embed_query()` method

In [1]:
import vertexai
from langchain_google_vertexai import VertexAIEmbeddings

# Create the embeddings model
embeddings = VertexAIEmbeddings(model_name="textembedding-gecko@001")

## Calling the published endpoint that queries Matching Engine

We run a nearest neighbor search on the deployed index. The maximum number of neighbors to return is set with `neighbor_count`.
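To make the idea of a nearest neighbor search concrete, here is a minimal brute-force sketch in plain Python. It is only an illustration: the deployed index is expected to use an approximate search that scales far better than this exhaustive scan. The similarity measure (cosine similarity, where a higher score means a closer match), the `toy_index` vectors and the function names are all assumptions made for this example, not taken from the deployed index configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, neighbor_count=3):
    """Return the neighbor_count (id, score) pairs most similar to query."""
    scored = [
        (vec_id, cosine_similarity(query, vec))
        for vec_id, vec in vectors.items()
    ]
    # Highest similarity first (assumed metric: cosine similarity).
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:neighbor_count]


# Hypothetical 3-dimensional vectors; real embeddings have many more dimensions.
toy_index = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [1.0, 1.0, 0.0],
    "d": [0.0, 0.0, 1.0],
}

top = nearest_neighbors([1.0, 0.1, 0.0], toy_index, neighbor_count=2)
print(top)
```

The `neighbor_count` argument plays the same role here as in the request sent to the endpoint: it caps how many of the best matches are returned.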

### Configuration
- `API_ENDPOINT`
- `INDEX_ENDPOINT`
- `DEPLOYED_INDEX_ID`

1. An `IndexDatapoint` object is created containing the vector returned by `embeddings.embed_query()` (the embedding of the prompt).
2. The query for the nearest neighbors is defined, including the number of neighbors to retrieve.
3. A `FindNeighborsRequest` object is created that wraps the prepared query and targets the deployed index.

Finally, the neighbor search request is sent and the response is stored.


In [2]:
prompt = "Green Cap for women"

In [3]:
from google.cloud import aiplatform_v1

# Set variables for the current deployed index.
API_ENDPOINT="2069128544.us-central1-1043238928011.vdb.vertexai.goog"
INDEX_ENDPOINT="projects/1043238928011/locations/us-central1/indexEndpoints/6459098649556156416"
DEPLOYED_INDEX_ID="products_data_index_civica"

# Configure Vector Search client
client_options = {
  "api_endpoint": API_ENDPOINT
}
vector_search_client = aiplatform_v1.MatchServiceClient(
  client_options=client_options,
)

# Build FindNeighborsRequest object
datapoint = aiplatform_v1.IndexDatapoint(
  feature_vector=embeddings.embed_query(prompt)
)
query = aiplatform_v1.FindNeighborsRequest.Query(
  datapoint=datapoint,
  # The number of nearest neighbors to be retrieved
  neighbor_count=3
)
request = aiplatform_v1.FindNeighborsRequest(
  index_endpoint=INDEX_ENDPOINT,
  deployed_index_id=DEPLOYED_INDEX_ID,
  # Request can have multiple queries
  queries=[query],
  return_full_datapoint=False,
)

# Execute the request
response = vector_search_client.find_neighbors(request)

# Handle the response
print(response)

nearest_neighbors {
  neighbors {
    datapoint {
      datapoint_id: "14157"
      crowding_tag {
        crowding_attribute: "0"
      }
    }
    distance: 0.76098883152008057
  }
  neighbors {
    datapoint {
      datapoint_id: "14115"
      crowding_tag {
        crowding_attribute: "0"
      }
    }
    distance: 0.75971817970275879
  }
  neighbors {
    datapoint {
      datapoint_id: "13842"
      crowding_tag {
        crowding_attribute: "0"
      }
    }
    distance: 0.73133599758148193
  }
}



### Accessing the items returned by the endpoint

In [4]:
response.nearest_neighbors[0].neighbors[0].datapoint.datapoint_id

'14157'
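The same access pattern can be extended to every neighbor of a query. The sketch below is a minimal helper that returns `(datapoint_id, distance)` pairs. The name `extract_neighbors` is hypothetical, and `fake_response` is a stand-in built with `SimpleNamespace` that only mimics the attribute layout of the printed response (distances rounded), so the example runs without calling the endpoint.

```python
from types import SimpleNamespace


def extract_neighbors(response, query_index=0):
    """Return [(datapoint_id, distance), ...] for one query of the response."""
    result = response.nearest_neighbors[query_index]
    return [(n.datapoint.datapoint_id, n.distance) for n in result.neighbors]


def _neighbor(datapoint_id, distance):
    """Build a stand-in neighbor with the same attribute names as the response."""
    return SimpleNamespace(
        datapoint=SimpleNamespace(datapoint_id=datapoint_id),
        distance=distance,
    )


# Stand-in for the real response object (assumption: same attribute layout).
fake_response = SimpleNamespace(
    nearest_neighbors=[
        SimpleNamespace(
            neighbors=[
                _neighbor("14157", 0.761),
                _neighbor("14115", 0.7597),
                _neighbor("13842", 0.7313),
            ]
        )
    ]
)

pairs = extract_neighbors(fake_response)
print(pairs)
```

With the real `response` object from the previous cell, `extract_neighbors(response)` should work the same way, since it relies only on the attributes already used above.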

### BigQuery query

In [5]:
from google.cloud import bigquery

client = bigquery.Client()

sql = f"""
SELECT *
FROM ia-ugr.ecommerce.products
WHERE ID = {response.nearest_neighbors[0].neighbors[0].datapoint.datapoint_id}
;
"""

product = client.query(sql).to_dataframe()
product.head()

| Unnamed: 0 | id | cost | category | name | brand | retail_price | department | sku | distribution_center_id |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 14157 | 4.64877 | Accessories | Enzyme Regular Solid Army Caps-Olive W35S45D (... | MG | 10.99 | Women | 00BD13095D06C20B11A2993CA419D16B | 1 |
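The query above looks up only the first neighbor and interpolates its ID directly into the SQL string. A possible variation, sketched below, fetches all returned neighbors in a single parameterized query. It assumes the `id` column is of type `INT64`, which is not confirmed by the notebook, and the helper names `build_products_query` and `fetch_products` are hypothetical. The BigQuery import is done lazily so that the query-building part runs without credentials.

```python
def build_products_query(datapoint_ids):
    """Return (sql, ids) to fetch every product whose id is in datapoint_ids."""
    # datapoint_id values arrive as strings; int() raises ValueError otherwise.
    ids = [int(dp_id) for dp_id in datapoint_ids]
    sql = (
        "SELECT * "
        "FROM `ia-ugr.ecommerce.products` "
        "WHERE id IN UNNEST(@ids)"
    )
    return sql, ids


def fetch_products(datapoint_ids):
    """Run the query; requires google-cloud-bigquery and valid credentials."""
    from google.cloud import bigquery  # lazy import: only needed when querying

    sql, ids = build_products_query(datapoint_ids)
    job_config = bigquery.QueryJobConfig(
        # Assumption: the id column is INT64.
        query_parameters=[bigquery.ArrayQueryParameter("ids", "INT64", ids)]
    )
    return bigquery.Client().query(sql, job_config=job_config).to_dataframe()


sql, ids = build_products_query(["14157", "14115", "13842"])
print(sql)
print(ids)
```

In the notebook, the list of IDs could be built from `response.nearest_neighbors[0].neighbors` and passed to `fetch_products`. Passing the IDs as a query parameter avoids building SQL from response values by string formatting.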
