Vector Search Qdrant

## 1. Importing libraries and connecting to Qdrant

First, make sure the required dependencies are installed
and the Qdrant container is running. <p>
Setup: [set_up_qdrant](02_Week_vector_search/set_up_qdrant.md)

Importing libraries <p>
`qdrant_client` is the official library for connecting to a Qdrant server. It lets you create collections, run searches, and more.
- `QdrantClient` class: establishes the connection to the Qdrant service.
- `models` module: provides the configuration objects and parameters the client needs.

In [3]:
from qdrant_client import QdrantClient, models



In [4]:
# Initialize the client
cliente = QdrantClient("http://localhost:6333")

In [3]:
cliente

<qdrant_client.qdrant_client.QdrantClient at 0x734390296450>

## 2. Dataset


Before anything else, we need to understand the dataset, mainly:
- What type of content is it? Image, video, text, or a combination?
- The specifics: if it is text, what language is it in, and does it contain special characters?

Why do these questions need answering?
To define:
- The schema for the data (what will be vectorized, and what will be stored as metadata?).
- The right embedding model. We have to weigh factors such as the domain, the precision, and the resources required.

In [5]:
# Load the dataset
# pip install requests
import requests

ruta_doc = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_rpta = requests.get(ruta_doc)
docs_rpta.raise_for_status()  # fail early if the download did not succeed
documentos = docs_rpta.json()
# documentos

In [7]:
print(f"Length of documentos:{len(documentos)}")
print(f"Data type of documentos:{type(documentos)}")
print(f"Data type of the first element of documentos:{type(documentos[0])}")

print('\nCourses in the Zoomcamp FAQ:')
for i in documentos:
    print(f"- {i['course']}")

Length of documentos:3
Data type of documentos:<class 'list'>
Data type of the first element of documentos:<class 'dict'>

Courses in the Zoomcamp FAQ:
- data-engineering-zoomcamp
- machine-learning-zoomcamp
- mlops-zoomcamp


After inspecting `documentos`, the relevant facts are:
- Size: 3 elements
- It is a list of dictionaries

Structure of `documentos` <p>

```python
[
    {'course': 'data-engineering-zoomcamp',
     'documents': [{'text': ' ', 'section': ' ', 'question': ' '},
                   {'text': ' ', 'section': ' ', 'question': ' '},
                   ...
                   {'text': ' ', 'section': ' ', 'question': ' '}]
    },
    {'course': 'machine-learning-zoomcamp',
     'documents': [{'text': ' ', 'section': ' ', 'question': ' '},
                   {'text': ' ', 'section': ' ', 'question': ' '},
                   ...
                   {'text': ' ', 'section': ' ', 'question': ' '}]
    },
    {'course': 'mlops-zoomcamp',
     'documents': [{'text': ' ', 'section': ' ', 'question': ' '},
                   {'text': ' ', 'section': ' ', 'question': ' '},
                   ...
                   {'text': ' ', 'section': ' ', 'question': ' '}]
    }
]
```

Each element of the list is a dictionary with 2 keys (`course` and `documents`). <p>
`documents` is itself a list of dictionaries, each with 3 keys (`text`, `section`, `question`).
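
A minimal sketch of how this nested structure can be walked, using a tiny hand-written sample with the same shape. The sample values are placeholders rather than rows from the real dataset, and `flatten` is a hypothetical helper introduced only for illustration.

```python
from collections import Counter

# Tiny stand-in with the same shape as `documentos` (placeholder values).
sample = [
    {"course": "data-engineering-zoomcamp",
     "documents": [
         {"text": "answer 1", "section": "General", "question": "question 1"},
         {"text": "answer 2", "section": "Module 1", "question": "question 2"},
     ]},
    {"course": "mlops-zoomcamp",
     "documents": [
         {"text": "answer 3", "section": "General", "question": "question 3"},
     ]},
]


def flatten(courses):
    """Return one flat record per FAQ entry, tagged with its course."""
    return [
        {"course": course["course"], **doc}
        for course in courses
        for doc in course["documents"]
    ]


records = flatten(sample)
per_course = Counter(r["course"] for r in records)
print(len(records))
print(dict(per_course))
```

Running the same helper on `documentos` would give one record per FAQ entry, which is the unit we later turn into a point.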

**Conclusions from exploring the dataset**
1. The data is already clean and chunked (chunking splits the data into small pieces, which are easier for embedding models to process).
2. We have to define:
    - The fields to use for semantic search.
    - The fields to store as *metadata* (payload), which can also be used as filters.

3. We define:
    - Fields for semantic search: question, text
    - Fields for metadata (also usable as filters): course, section
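
One way to express this split in code is sketched here. It assumes the question and the answer are concatenated into a single string before embedding, which is only one possible choice (the points built later in this notebook embed `text` alone). `split_fields` is a hypothetical helper and the sample entry is made up.

```python
def split_fields(course, doc):
    """Separate what gets embedded from what is stored as payload."""
    # Assumption: question and answer are joined into one text to embed.
    text_to_embed = f"{doc['question']}\n{doc['text']}"
    # Metadata kept alongside the vector, usable later as filters.
    payload = {"course": course, "section": doc["section"]}
    return text_to_embed, payload


doc = {"text": "Yes, you can.", "section": "General", "question": "Can I join late?"}
text_to_embed, payload = split_fields("mlops-zoomcamp", doc)
print(text_to_embed)
print(payload)
```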

## 3. Choosing the embedding model

As mentioned earlier, embedding models convert data into vectors. Choosing a 'good' model depends on several factors, including:
- The task to perform, and the type of data and its characteristics (here: text, in English).
- The trade-off between search precision and resource usage (larger embeddings require more storage and memory).
- The cost of inference.
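
To get a feel for the storage side of that trade-off, here is a rough back-of-the-envelope sketch. It assumes each vector component is stored as a 4-byte float and ignores index and payload overhead, so the numbers are a lower bound rather than what Qdrant will actually use. The figure 948 is the number of points this notebook creates further down; `raw_vector_bytes` is a hypothetical helper.

```python
def raw_vector_bytes(n_points, dim, bytes_per_value=4):
    """Raw size of n_points vectors, ignoring index and payload overhead."""
    # Assumption: 4 bytes per component (32-bit floats).
    return n_points * dim * bytes_per_value


n_points = 948  # number of FAQ entries in this dataset
for dim in (384, 512, 768, 1024):
    size_mib = raw_vector_bytes(n_points, dim) / (1024 ** 2)
    print(f"dim={dim:>4}: {size_mib:.2f} MiB")
```

For a dataset this small the difference is negligible, but the cost grows linearly with both the number of points and the dimensionality.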

The best way to choose a model is to test different options on your own data. <p>
In this case, we will use FastEmbed as the embedding provider. <p>
Documentation: [FastEmbed](https://github.com/qdrant/fastembed).


In [8]:
from fastembed import TextEmbedding

# available models
modelos = TextEmbedding.list_supported_models()

# first 5 models
modelos[:5]

# modelo_embedding= TextEmbedding()

[{'model': 'BAAI/bge-base-en',
  'sources': {'hf': 'Qdrant/fast-bge-base-en',
   'url': 'https://storage.googleapis.com/qdrant-fastembed/fast-bge-base-en.tar.gz',
   '_deprecated_tar_struct': True},
  'model_file': 'model_optimized.onnx',
  'description': 'Text embeddings, Unimodal (text), English, 512 input tokens truncation, Prefixes for queries/documents: necessary, 2023 year.',
  'license': 'mit',
  'size_in_GB': 0.42,
  'additional_files': [],
  'dim': 768,
  'tasks': {}},
 {'model': 'BAAI/bge-base-en-v1.5',
  'sources': {'hf': 'qdrant/bge-base-en-v1.5-onnx-q',
   'url': 'https://storage.googleapis.com/qdrant-fastembed/fast-bge-base-en-v1.5.tar.gz',
   '_deprecated_tar_struct': True},
  'model_file': 'model_optimized.onnx',
  'description': 'Text embeddings, Unimodal (text), English, 512 input tokens truncation, Prefixes for queries/documents: not so necessary, 2023 year.',
  'license': 'mit',
  'size_in_GB': 0.21,
  'additional_files': [],
  'dim': 768,
  'tasks': {}},
 {'model':

It makes sense to avoid a model with very high dimensionality so that we don't use more resources than necessary, so we opt for a moderate dimensionality (e.g., 512 dimensions).

In [9]:
# json is used to pretty-print the output with indentation
import json

dimension = 512
for i in modelos:
    if i['dim'] == dimension:
        print(json.dumps(i, indent=2))



{
  "model": "BAAI/bge-small-zh-v1.5",
  "sources": {
    "hf": "Qdrant/bge-small-zh-v1.5",
    "url": "https://storage.googleapis.com/qdrant-fastembed/fast-bge-small-zh-v1.5.tar.gz",
    "_deprecated_tar_struct": true
  },
  "model_file": "model_optimized.onnx",
  "description": "Text embeddings, Unimodal (text), Chinese, 512 input tokens truncation, Prefixes for queries/documents: not so necessary, 2023 year.",
  "license": "mit",
  "size_in_GB": 0.09,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "Qdrant/clip-ViT-B-32-text",
  "sources": {
    "hf": "Qdrant/clip-ViT-B-32-text",
    "url": null,
    "_deprecated_tar_struct": false
  },
  "model_file": "model.onnx",
  "description": "Text embeddings, Multimodal (text&image), English, 77 input tokens truncation, Prefixes for queries/documents: not necessary, 2021 year",
  "license": "mit",
  "size_in_GB": 0.25,
  "additional_files": [],
  "dim": 512,
  "tasks": {}
}
{
  "model": "jinaai/jina-embeddings-v2-small-e

Model 1: `BAAI/bge-small-zh-v1.5`
- A text model, but for Chinese. Discarded.

Model 2: `Qdrant/clip-ViT-B-32-text`
- Multimodal (text and images), but we do not need images. Discarded.

Model 3: `jinaai/jina-embeddings-v2-small-en`
- With the previous two discarded, and prioritizing resources, this is the most suitable model.
- [Documentation for the embedding model used](https://huggingface.co/jinaai/jina-embeddings-v2-small-en)

In [10]:
modelo_seleccionado = "jinaai/jina-embeddings-v2-small-en"

# Like many others, this model was trained using cosine similarity

Now we are ready to configure **Qdrant** with the model's parameters.

## 4. Configuring Qdrant

### Creating a collection:
We must define:
- Name
- Vector configuration:
    - Size: the dimensionality of the vector
    - Distance metric: the method used to measure similarity between vectors.
      Available metrics: Dot, Cosine, Euclid, Manhattan
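
To make the four metrics concrete, here is a plain-Python sketch of each one on two toy vectors. These are textbook definitions written for illustration, not Qdrant's internal implementation, and the vectors are made up.

```python
import math


def dot(a, b):
    """Dot product: sum of component-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the vectors after normalising their lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute component-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]  # same direction as a, twice the length

print(dot(a, b))
print(cosine_similarity(a, b))
print(euclidean(a, b))
print(manhattan(a, b))
```

Because `b` points in the same direction as `a`, cosine similarity treats them as identical even though the Euclidean and Manhattan distances are not zero. That insensitivity to vector length is why cosine is a common choice for text embeddings.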

In [None]:
# from qdrant_client import QdrantClient, models
# client = QdrantClient(url="http://localhost:6333")

nombre_coleccion= 'zoomcamp-rag'
cliente.create_collection(
    collection_name= nombre_coleccion, 
    vectors_config=models.VectorParams(
        size= dimension, 
        distance= models.Distance.COSINE
    )
)

### Creating and inserting points into the collection
Recall that a point is defined as `P(ID, vector, payload(optional))`. <p>
A more detailed explanation is in [set_up_qdrant](02_Week_vector_search/set_up_qdrant.md)
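
As a mental model of that definition, the sketch below uses a small dataclass. `Point` here is a hypothetical stand-in written for illustration, not the `qdrant_client` class, and the vectors and payloads are toy values.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Point:
    """Illustrative stand-in for a Qdrant point (not the qdrant_client class)."""
    id: int
    vector: list[float]
    payload: dict[str, Any] = field(default_factory=dict)  # optional metadata


vectors = [[0.1, 0.2], [0.3, 0.4]]  # toy 2-dimensional vectors
payloads = [{"course": "a"}, {"course": "b"}]

# enumerate() hands out sequential IDs starting at 0.
points_sketch = [
    Point(id=i, vector=v, payload=p)
    for i, (v, p) in enumerate(zip(vectors, payloads))
]
print(points_sketch[1])
```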

In [12]:
modelo_seleccionado

'jinaai/jina-embeddings-v2-small-en'

#### Creating the points

In [14]:
# list that will hold the points
points = []
point_id = 0  # point identifiers start at 0 and increase by one

for i in documentos:
    for j in i['documents']:  # each entry has text, section and question
        # print(j)
        point = models.PointStruct(
            id=point_id,
            # only the answer text is embedded, using the selected model
            vector=models.Document(text=j['text'], model=modelo_seleccionado),
            payload={
                "text": j['text'],
                "section": j['section'],
                "course": i['course']
            }
        )
        points.append(point)
        point_id += 1

In [15]:
print(f'Number of points: {len(points)}')
points[0]

Number of points: 948


PointStruct(id=0, vector=Document(text="The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register in DataTalks.Club's Slack and join the channel.", model='jinaai/jina-embeddings-v2-small-en', options=None), payload={'text': "The purpose of this document is to capture frequently asked technical questions\nThe exact day and hour of the course will be 15th Jan 2024 at 17h00. The course will start with the first  “Office Hours'' live.1\nSubscribe to course public Google Calendar (it works from Desktop only).\nRegister before the course starts using this link.\nJoin the course Telegram channel with announcements.\nDon’t forget to register

#### Embedding the points and loading them into the collection

FastEmbed does the following:
1. Finds and downloads the selected model.
2. Runs model inference locally to generate the embeddings, that is, it transforms the texts (or whatever the data type is) into dense vectors on your own machine.

The downloaded model is then stored in a temporary folder: `os.path.join(tempfile.gettempdir(), "fastembed_cache")`
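
A small sketch for locating that folder on your own machine. It only builds the path quoted above and checks whether it exists; whether FastEmbed actually uses this location can depend on its version and configuration.

```python
import os
import tempfile

# Same expression as in the text above.
cache_dir = os.path.join(tempfile.gettempdir(), "fastembed_cache")

print(cache_dir)
print("cache folder exists:", os.path.isdir(cache_dir))
```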

Finally, the generated points are inserted into the collection, where the vectors are stored and indexed: Qdrant builds internal index structures to speed up similarity searches.

In [25]:
# upsert inserts or updates points in the given collection.
# The client embeds each Document with FastEmbed during this call and then uploads the resulting vectors.
cliente.upsert(
    collection_name= nombre_coleccion, 
    points= points
)

Fetching 5 files: 100%|██████████| 5/5 [00:01<00:00,  4.58it/s]


UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)

- The speed depends on how long it takes to generate the embeddings locally.
- To improve performance, you can use a GPU or split the data into smaller blocks (batches).

- The Qdrant Python client also offers tools for more efficient uploads:
    - Parallelization: splits the work across several workers.
    - Retries: if part of the upload fails, it is retried automatically.
    - Lazy batching: loads the data **progressively**.

    These can be configured through functions such as `upload_collection` and `upload_points`.
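
To illustrate what "lazy batching" means, here is a plain-Python sketch that consumes a stream in fixed-size chunks without loading everything at once. It shows the idea only and is not the client's own implementation; `lazy_batches` is a hypothetical helper and the strings stand in for points.

```python
from itertools import islice


def lazy_batches(items, batch_size):
    """Yield lists of up to batch_size items without materialising all of them."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# A generator stands in for a large stream of points.
stream = (f"point-{i}" for i in range(10))
sizes = [len(batch) for batch in lazy_batches(stream, batch_size=4)]
print(sizes)
```

With the real client, `upload_points` takes the collection name and an iterable of points. As far as we know, batching and parallelism are tuned through parameters such as `batch_size` and `parallel`, but check the client documentation for your version.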

NOTE:
- When we say the model inference runs locally, we mean it uses the resources of the environment where the code is executed.
    - If you use Codespaces, the processing happens on the CPU of that virtual environment.
    - If you run it on your own machine, you can use your CPU or GPU.

### Visualizing the data

In the dashboard at http://127.0.0.1:6333/dashboard we can check that the points were loaded into the collection correctly. <p>
In the **INFO** section we verify that all 948 points are there.

![image](../images/qdrant_points.png)

In [16]:
# The vector generated for point 0
point0 = [-0.06846455,-0.04079098,0.04998121,0.05953207,0.02933503,-0.015929399,0.046961505,0.06538723,0.003488543,-0.015097292,0.048925813,-0.0060791555,-0.027737234,-0.082000785,0.067399435,-0.03667675,-0.017798426,-0.05546964,-0.011083759,-0.059245612,-0.01734143,-0.023021186,0.015785128,-0.047465812,0.08469293,-0.07854722,-0.062078092,-0.01616355,0.044779766,0.04820736,-0.07802162,0.028787674,-0.04700659,0.026947334,-0.050882373,-0.013185016,0.015569529,-0.012027329,-0.05219849,-0.008115267,0.0039030625,0.012026171,0.096187785,0.019105598,-0.08532845,0.016041769,-0.12907879,0.03177611,0.046240654,-0.017777966,-0.09415709,0.068062514,-0.07555266,0.033081416,0.017123973,0.048179854,-0.016965523,0.036041234,-0.04909715,0.0009031768,0.036679734,0.047277514,-0.06367788,0.053598642,-0.010017119,0.008679631,-0.015747407,-0.03991325,-0.08742591,0.007980874,-0.062224712,-0.007586727,0.048600078,-0.041982602,-0.02641015,-0.0108378995,0.014251434,0.07726208,0.034729958,-0.07207553,-0.041352477,-0.004289862,-0.017731683,0.021234196,0.042140972,-0.041702453,0.01043983,-0.054240156,0.05641359,-0.023290005,0.031706795,0.042014815,-0.010930889,-0.07237952,-0.023633372,-0.053142164,0.0217434,0.05712637,-0.03769596,0.017471848,0.076294586,0.030541813,0.047872223,-0.011048115,0.051555168,0.047350787,-0.029633446,0.007204855,-0.07369298,0.0053200284,0.00047469273,-0.031068329,-0.05281302,0.00011001974,-0.05715111,0.036929853,0.047898438,-0.034192286,-0.038603164,0.0012595165,0.021036096,-0.036125384,-0.038078148,0.04964084,-0.0019360166,0.092187226,-0.033500366,0.05946749,-0.049220525,0.020673933,0.026736183,-0.011286157,-0.012633905,0.034094695,0.09243209,-0.044824604,0.067852296,0.005230769,-0.043324582,0.0326701,-0.07083062,-0.07350883,0.035534248,-0.028113633,0.0042481325,-0.010582569,0.06721547,-0.0029941825,-0.082357585,0.016373903,-0.011697521,0.07866877,-0.069168895,0.004006742,0.012269915,0.061980236,-0.058333274,0.048925027,0.04807169,-0.013165262,-0.022790764,0.07620942,
0.03150959,-0.045987464,-0.008834633,-0.017472077,0.050445825,0.096268214,-0.07693184,-0.04384895,0.05261315,0.04313614,-0.07530779,-0.028021747,-0.09744583,-0.0456735,-0.088905334,0.019299489,-0.04767698,-0.0062190657,0.02044807,-0.07267836,-0.012655318,-0.0074797776,0.05128296,-0.021277074,0.007179788,0.017500546,0.03469068,-0.013120194,-0.016424904,0.002570067,-0.018130176,-0.05365371,-0.038270224,0.021411382,-0.046379164,0.05169756,0.0055525657,0.057790566,-0.019270431,0.06789005,-0.051545687,0.035307214,-0.024598971,0.034399897,-0.042949785,-0.00072331517,-0.048362836,-0.021057468,0.026094275,0.01719914,0.04906317,0.042736348,0.015886677,-0.0319091,-0.023131017,-0.022875361,0.01168348,-0.020664124,-0.020896148,0.016738303,0.045422535,0.07809989,-0.024266856,-0.04021694,0.017426668,0.013126395,0.018031772,-0.032305673,0.06620039,-0.030044101,-0.061852973,0.04409456,-0.0013827834,-0.008123868,-0.062819116,0.009788026,-0.043744937,-0.022317797,0.013982544,-0.009391313,0.05777376,0.0056858296,0.006999598,0.051866222,0.08112772,-0.03211696,-0.014601235,0.051587325,-0.033394538,0.006251865,-0.02605619,-0.11607331,-0.03186099,-0.001830986,0.012611519,0.047521453,-0.06919611,-0.03669991,0.005067096,0.017590128,-0.0024427695,0.061980624,0.00036908867,-0.004623333,-0.07887929,0.011967962,0.0718314,0.031925008,0.0062191533,-0.0033358978,0.0042404267,-0.049064558,-0.051003937,0.020293929,0.100239545,0.013767938,0.053953223,0.01843254,-0.026519785,-0.0031231164,-0.019045293,0.0010680228,0.04564501,0.00037686384,0.028409585,0.10355358,0.06982884,-0.02672189,0.044619665,-0.064097665,0.003990987,-0.0111137815,-0.021073764,0.061923493,0.052377548,0.01955624,-0.028693138,0.039254162,0.015581164,0.058836136,-0.03566541,-0.023852238,-0.0051716855,-0.021046694,0.0080006905,-0.008078646,-0.027421946,-0.025896192,0.013928459,-0.05265044,-0.019074516,-0.03149301,0.104708456,0.0026111465,0.07026169,-0.05094832,-0.055254307,0.020221505,-0.008162812,-0.035821173,0.05105494,-0.008516431,0
.0736255,0.028627165,-0.017882999,-0.009987374,-0.011698008,-0.009398558,-0.039583806,0.00023758937,-0.020182908,-0.06597895,0.011178861,0.028148144,-0.039861064,-0.07757265,0.018471897,-0.049426984,0.010314243,0.0016298265,-0.015689451,0.05818405,-0.01033787,-0.035027515,0.043765463,-0.049420983,-0.0012114503,-0.030989831,-0.011108377,-0.012850253,-0.008305117,-0.009329814,-0.08785244,0.0295374,-0.008934541,-0.052078355,-0.022266218,0.07041339,-0.03535433,-0.04631689,0.00049170037,0.05206268,0.044059668,-0.064169504,-0.0075818244,-0.039646078,0.04816145,0.025753438,0.066256434,0.02392793,-0.0014395157,-0.055448893,0.0774787,-0.04476011,0.07376837,-0.019379685,0.002074907,-0.009686819,0.03356516,0.047324013,0.0240729,-0.08877777,-0.029460974,0.017622225,0.007672302,0.04154311,0.05078199,0.013701026,-0.04542855,0.0010764489,-0.038460173,0.010578302,0.071165934,-0.0664418,0.04237818,0.09094239,-0.028146256,0.048653405,0.011822472,0.025828037,0.006957286,-0.041415196,-0.0051618293,-0.03555297,-0.028979542,-0.011790992,0.01478175,0.031408973,0.076069415,0.015369002,-0.029704671,0.00070513925,-0.019976554,-0.062294096,-0.076489136,-0.027384093,-0.0062666847,-0.03842202,0.07155313,0.0072336164,-0.00017629948,0.035562843,0.019044492,0.0264928,0.059922483,0.029047942,0.096643895,0.036563784,-0.0030461065,-0.075327285,-0.021391813,0.030066991,-0.027541919,0.035406668,-0.01854306,-0.09680553,0.044650104,-0.0512637,0.11263255,-0.07472632,0.015884997,0.02215685,0.03264856,0.08575658,0.03931488,-0.022300024,-0.056622356,0.039849196,0.038847793,0.05718303,-0.042184196,0.0549666,-0.010088292,-0.008836064,0.0017232867,-0.054012682,0.028496502,0.005257831,0.016796969,-0.0045466716,-0.06623094,0.0477976,0.023841254,-0.04322845,0.045566298,0.0049065305,0.027864464,-0.013733251,-0.028829677,-0.090087034,-0.008853928,0.037266787,0.0709196,0.012089919,0.027770372,0.054197427,-0.04066651,0.053871714,-0.06475711,-0.042422544,-0.0326029,-0.09350322,-0.07054656,-0.001336374,-0.00053008,0.031
627476,0.029887548,0.018675564,-0.083593234,-0.055582605,-0.019292016,-0.038321942,-0.066263385,0.022892514,0.042859234,-0.022188779,0.035908688,0.038745336,0.051241525,-0.02861519,0.04351435,-0.0076541263,-0.040348075,0.016487429,0.00911212,-0.06777497,-0.10825469,0.06578071,0.02872077,-0.011152153]
print(f'Dimension of point0: {len(point0)}')


Dimension of point0: 512


In the **Visualize** section of our `zoomcamp-rag` collection we can see all the answers, how they cluster together by context, and even color them by course. <p>
To do so, we can run the following request:

In [None]:
{
    "limit": 948,
    "color_by": { "payload": "course"}
}

![image](../images/qdrant_color_course.png)

This 2D view is the result of dimensionality reduction applied by the dashboard's visualization to the 512-dimensional vectors generated by the selected model: `jinaai/jina-embeddings-v2-small-en`

## 5. Similarity Search

### Similarity search in Qdrant

**What does it do?** <p>
Given a query, it finds the most similar stored vector (in this case the embedding of `text`, the answer to each question). It does this by comparing embeddings in the vector space,
and it should return the vector (the answer) closest to the query.

**How does similarity search work?**
1. The embedding of the query is generated using the same model that was selected for the data.
2. Qdrant compares this query vector with the vectors stored in the collection (the vector index), using the distance metric specified when the collection was created (Cosine).
3. The results are the closest points (vectors) in the vector space, ranked by similarity.
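
Steps 2 and 3 can be sketched as an exact, brute-force search in plain Python. This compares the query against every stored vector, which is the baseline that Qdrant's index is designed to avoid. The 2-dimensional "embeddings" are made-up numbers and `exact_search` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_search(query_vector, stored, limit=1):
    """Score every stored vector against the query and return the best `limit`."""
    scored = [
        (point_id, cosine_similarity(query_vector, vector))
        for point_id, vector in stored.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Toy 2-dimensional "embeddings" keyed by point ID (made-up numbers).
stored = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]}
ranking = exact_search([0.9, 0.1], stored, limit=2)
print(ranking)
```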

**What makes this search efficient?**
- Vector index:
     When data is inserted into Qdrant, a vector index is built, normally based on ANN (Approximate Nearest Neighbors).

- ANN search:
    A technique for finding data points similar to a query in a high-dimensional vector space.
    - Unlike exact NN search, which finds the true nearest neighbor, ANN search finds an approximate one.
    - As a result, it trades some precision for speed.

    The vector index is the structure that ANN search runs against.
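
To show why an approximate index is faster, here is a toy sketch based on random hyperplanes (a form of locality-sensitive hashing). This is a different technique from Qdrant's own index, which is graph-based (HNSW); it is used here only because it fits in a few lines. All vectors are randomly generated and every name in the block is hypothetical.

```python
import random
from collections import defaultdict


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0 for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group point IDs into buckets by signature."""
    buckets = defaultdict(list)
    for point_id, vector in vectors.items():
        buckets[signature(vector, hyperplanes)].append(point_id)
    return buckets


rng = random.Random(42)  # fixed seed so the sketch is repeatable
dim, n_points, n_planes = 8, 200, 4
vectors = {i: [rng.gauss(0, 1) for _ in range(dim)] for i in range(n_points)}
hyperplanes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

index = build_index(vectors, hyperplanes)
query = vectors[0]  # reuse a stored vector as the query
candidates = index[signature(query, hyperplanes)]

print(f"exact search scans {n_points} vectors")
print(f"bucket lookup scans {len(candidates)} vectors")
```

Only the vectors in the query's bucket are compared, which is where the speed comes from. The price is that a true nearest neighbor sitting in a neighboring bucket can be missed, which is the "approximate" part.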

Let's define the search function:

In [None]:
def search(query, limit=1):
    results = cliente.query_points(
        collection_name=nombre_coleccion,
        query=models.Document(  # embeds the query locally with the chosen model
            text=query,
            model=modelo_seleccionado
        ),
        limit=limit,  # number of results to return
        with_payload=True  # include the metadata (payload) in the results
    )
    return results


Now we pick a question at random from the data.
Remember: the questions themselves were not loaded into Qdrant, only the answers (`text`).

In [50]:
import random

random_course= random.choice(documentos)
course_document_element= random.choice(random_course['documents'])
random_question = course_document_element['question']

curso= random_course['course']

So the randomly chosen course and question are:

In [51]:
print(f'Course: {curso}')
print(f'Random question: {random_question}')

Course: data-engineering-zoomcamp
Random question: GCP BQ - Can I use BigQuery for real-time analytics in this project?


We use the search function:

In [52]:
resultado= search(random_question)
resultado

Fetching 5 files: 100%|██████████| 5/5 [00:01<00:00,  4.20it/s]


QueryResponse(points=[ScoredPoint(id=210, version=0, score=0.8856798, payload={'text': 'Ans :  While real-time analytics might not be explicitly mentioned, BigQuery has real-time data streaming capabilities, allowing for potential integration in future project iterations.', 'section': 'Module 3: Data Warehousing', 'course': 'data-engineering-zoomcamp'}, vector=None, shard_key=None, order_value=None)])

In [63]:
resultado.points[0]

ScoredPoint(id=210, version=0, score=0.8856798, payload={'text': 'Ans :  While real-time analytics might not be explicitly mentioned, BigQuery has real-time data streaming capabilities, allowing for potential integration in future project iterations.', 'section': 'Module 3: Data Warehousing', 'course': 'data-engineering-zoomcamp'}, vector=None, shard_key=None, order_value=None)

Let's compare it with the answer to the random question.
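
One possible way to do that comparison is sketched here with plain dictionaries so it runs on its own. `matches_source` is a hypothetical helper, and the sample values are placeholders shaped like the notebook's variables, not real dataset rows.

```python
def matches_source(hit_payload, source_doc, source_course):
    """True when the retrieved payload is the answer the question came from."""
    return (
        hit_payload["text"] == source_doc["text"]
        and hit_payload["course"] == source_course
    )


# Stand-in values shaped like the notebook's variables (placeholder text).
source_doc = {"text": "answer A", "section": "Module 3", "question": "question A"}
hit_payload = {
    "text": "answer A",
    "section": "Module 3",
    "course": "data-engineering-zoomcamp",
}

print(matches_source(hit_payload, source_doc, "data-engineering-zoomcamp"))
```

In the notebook itself, the equivalent call would be `matches_source(resultado.points[0].payload, course_document_element, curso)`.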

TODO: write a Markdown file summarizing the steps (creating the collection, creating the points, inserting them into the collection, and searching). Include only the commands without much detail, and point to this notebook for more depth.
