# Embedding Chunks

Now that the chunks have been created and stored during the parsing step of the RAG pipeline, it's time to embed them as vectors using an embedding model and then store those vectors in a vector DB.
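As a rough sketch of that flow, the block below splits the chunks into texts and metadata, embeds the texts, and builds records of the form id / vector / payload. It assumes each chunk is a dict with a `data` text field plus metadata. The helper names (`split_chunks`, `build_records`, `sentence_transformer_embedder`, `upsert_to_qdrant`), the collection name `chunks`, the in-memory Qdrant client, and the cosine distance are all illustrative assumptions, not the code used in this notebook. The demo at the bottom uses a toy embedder so it runs without downloading a model.

```python
def split_chunks(chunks):
    """Separate the text to embed from the metadata kept as payload."""
    texts, payloads = [], []
    for chunk_id, chunk in chunks.items():
        texts.append(chunk["data"])
        # Keep the original chunk key next to the metadata (assumed layout).
        payloads.append({"chunk_id": chunk_id, **chunk})
    return texts, payloads


def build_records(chunks, embed):
    """Embed every chunk and pair each vector with its payload.

    `embed` is any callable mapping a list of strings to a list of vectors.
    """
    texts, payloads = split_chunks(chunks)
    vectors = embed(texts)
    return [
        {"id": i, "vector": list(vector), "payload": payload}
        for i, (vector, payload) in enumerate(zip(vectors, payloads))
    ]


def sentence_transformer_embedder(model_name):
    """Return an `embed` callable backed by a sentence-transformers model."""
    # Imported lazily so the rest of the sketch runs without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(list(texts)).tolist()


def upsert_to_qdrant(records, collection_name="chunks"):
    """Store the records in an in-memory Qdrant collection (assumed setup)."""
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    client = QdrantClient(":memory:")  # local, non-persistent instance
    client.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(
            size=len(records[0]["vector"]), distance=Distance.COSINE
        ),
    )
    client.upsert(
        collection_name=collection_name,
        points=[PointStruct(**record) for record in records],
    )
    return client


# Toy data and a toy embedder (text length as a 1-d vector), for illustration only.
toy_chunks = {
    "chunk0": {"data": "hello world", "title": "demo"},
    "chunk1": {"data": "hi", "title": "demo"},
}
records = build_records(toy_chunks, lambda texts: [[float(len(t))] for t in texts])
```

With the real libraries installed, the toy embedder could be swapped for `sentence_transformer_embedder("sentence-transformers/all-mpnet-base-v2")` and the result passed to `upsert_to_qdrant(records)`.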

## Import libraries

In [1]:
from sentence_transformers import SentenceTransformer
import json
import sys

# Add the parent directory to the import path so gcp_utils can be imported
sys.path.append("..")

from gcp_utils.gcs import get_file, upload_file_from_memory



## Embeddings

With the PDF text parsed, chunked, and stored in GCS, I'll embed those chunks and load them into a vector database. I will be using [*Qdrant DB*](https://qdrant.tech/). The first step is to choose the embedding model and load the chunks stored in GCS.

In [6]:
model_name = "sentence-transformers/all-mpnet-base-v2"

# Name of the txt file where the chunks are stored
chunks_file = "resumen_reforma_energetica_2025-04-02_22:47:41.txt"

chunks_storage_path = f"chunks/{chunks_file}"


chunks = json.loads(get_file(chunks_storage_path))

chunks

{'chunk0': {'Header 1': 'Resumen Ejecutivo',
  'data': '# Resumen Ejecutivo  \n-----  \n-----',
  'upload_date': '2025-0402 22:47:41',
  'title': 'resumen_reforma_energetica',
  'storage_path': 'documents/summaries/resumen_reforma_energetica.pdf'},
 'chunk1': {'Header 1': 'Resumen Ejecutivo',
  'Header 2': 'I. Introducción',
  'data': '## I. Introducción  \nLa Reforma Energética es un paso decidido rumbo a la modernización del sector energético de',
  'upload_date': '2025-0402 22:47:41',
  'title': 'resumen_reforma_energetica',
  'storage_path': 'documents/summaries/resumen_reforma_energetica.pdf'},
 'chunk2': {'Header 1': 'Resumen Ejecutivo',
  'Header 2': 'I. Introducción',
  'data': 'nuestro país, sin privatizar las empresas públicas dedicadas a la producción y al aprovechamiento de los hidrocarburos y de la electricidad. La Reforma Energética, tanto constitucional como a',
  'upload_date': '2025-0402 22:47:41',
  'title': 'resumen_reforma_energetica',
  'storage_path': 'documents/s

Then, l