### Case:

* A basic example with Haystack using the datasets you mentioned. The idea is to:
    * Load the dataset.
    * Set up a simple index with Elasticsearch or FAISS.
    * Implement a basic question-answering pipeline.
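The loading step assumes a SQuAD-style JSON file, where passages are nested as `data` → `paragraphs` → `context`. The sketch below walks that nesting on a made-up miniature sample; `SAMPLE` and `contexts_to_documents` are hypothetical names used only for illustration, and the sample text is not taken from the real dataset.

```python
import json

# Hypothetical miniature sample in SQuAD v2.0 layout (illustrative text only).
SAMPLE = {
    "version": "v2.0",
    "data": [
        {
            "title": "Normans",
            "paragraphs": [
                {"context": "The Normans gave their name to Normandy.", "qas": []},
                {"context": "Norman customary law survives in Jersey.", "qas": []},
            ],
        }
    ],
}


def contexts_to_documents(squad):
    """Flatten data -> paragraphs -> context into one dict per passage."""
    docs = []
    for article in squad["data"]:
        title = article.get("title", "Unknown")
        for paragraph in article["paragraphs"]:
            docs.append({"content": paragraph["context"], "meta": {"source": title}})
    return docs


# Round-trip through JSON to mimic reading the structure from a file.
docs = contexts_to_documents(json.loads(json.dumps(SAMPLE)))
print(len(docs), docs[0]["meta"]["source"])  # prints: 2 Normans
```

Each resulting dict has a `content` field and a `meta` field, which is the shape the document store receives in the cell below.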

For this example, we use FAISS as the document store together with a Dense Passage Retriever, and a pretrained Hugging Face model (`deepset/roberta-base-squad2`) to extract the answers.
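To make the retrieval step less opaque, here is a minimal pure-Python sketch of what a dense retriever conceptually does: embed the query, score it against every stored passage vector by dot product (the similarity DPR is trained with), and keep the top `k`. The three-dimensional vectors and the helper names `dot` and `top_k` are assumptions for illustration; real DPR embeddings have 768 dimensions, and FAISS performs this search far more efficiently.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot-product similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec: Sequence[float],
          passage_vecs: List[Sequence[float]],
          k: int) -> List[Tuple[int, float]]:
    """Exhaustively score every passage and return the k best (index, score) pairs."""
    scored = [(i, dot(query_vec, v)) for i, v in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical values, 3 dimensions instead of 768).
passages = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]

ranked = top_k(query, passages, k=2)
print([index for index, _ in ranked])  # prints: [0, 1]
```

The passages selected this way are what the reader model then scans for an answer span.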

In [14]:
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import DensePassageRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline
import json

# Configure the FAISS document store (768 matches the DPR embedding size)
document_store = FAISSDocumentStore(embedding_dim=768)

# Load the passages from a JSON file in SQuAD format
def load_squad_data(file_path):
    with open(file_path, "r", encoding="utf-8") as file:
        squad_data = json.load(file)
    documents = []
    for data in squad_data['data']:
        title = data.get("title", "Unknown")
        for paragraph in data["paragraphs"]:
            context = paragraph["context"]
            # Add each context as one document
            documents.append({
                "content": context,
                "meta": {"source": title}
            })
    return documents

# Load the SQuAD data
squad_docs = load_squad_data("data/squad/squad-dev-v2.0.json")

# Write the documents to the document store
document_store.write_documents(squad_docs)

# Configure the DensePassageRetriever
retriever = DensePassageRetriever(
    document_store=document_store,
    query_embedding_model="facebook/dpr-question_encoder-single-nq-base",
    passage_embedding_model="facebook/dpr-ctx_encoder-single-nq-base",
    use_gpu=True
)

# Compute and store the passage embeddings
document_store.update_embeddings(retriever)

# Configure the reader model
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

# Build the question-answering pipeline
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

# Example query (in Spanish; note that the DPR encoders and the reader are English-language models)
query = "¿En qué país se encuentra Normandía?"
prediction = pipeline.run(query=query, params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 3}})

# Show the results
for answer in prediction["answers"]:
    print(f"Answer: {answer.answer} (score: {answer.score})")
    print(f"Context: {answer.context}")
    print()


Writing Documents: 10000it [00:01, 9183.18it/s]
Documents Processed: 10000 docs [00:04, 2195.55 docs/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 38.24 Batches/s]

Answer: France (score: 0.15677842497825623)
Context: in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse ("Norman" comes from "Norseman") raider

Answer: Normandy (score: 0.06987866014242172)
Context: tura and El Hierro off the Atlantic coast of Africa. Their troops were gathered in Normandy, Gascony and were later reinforced by Castilian colonists.

Answer: Jersey and Guernsey in the Channel Islands (score: 0.04388099163770676)
Context: uries and survives today through the legal systems of Jersey and Guernsey in the Channel Islands. Norman customary law was transcribed in two customar
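
The reader scores in this output are low in absolute terms, so it may be worth discarding weak candidates before showing them. The sketch below filters by a minimum score. `StubAnswer` is a hypothetical stand-in that only mimics the `answer` and `score` attributes used in the loop above, the scores are rounded from the output, and the `0.10` cutoff is an arbitrary assumption rather than a recommended value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StubAnswer:
    """Hypothetical stand-in exposing the attributes read in the results loop."""
    answer: str
    score: float


def confident(answers: List[StubAnswer], min_score: float) -> List[StubAnswer]:
    """Keep only the answers whose score reaches the threshold."""
    return [a for a in answers if a.score >= min_score]


# Scores rounded from the output above.
answers = [
    StubAnswer("France", 0.1568),
    StubAnswer("Normandy", 0.0699),
    StubAnswer("Jersey and Guernsey in the Channel Islands", 0.0439),
]

kept = confident(answers, min_score=0.10)  # arbitrary illustrative cutoff
print([a.answer for a in kept])  # prints: ['France']
```

With the real pipeline, the same filter could presumably be applied directly to `prediction["answers"]`, since those objects expose a `score` attribute as well.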




