## RAG-WEB

This notebook processes a web page: it extracts the page's content and splits it into text chunks. It then stores the chunks in a vector database so they can be searched. Using the **Llama 3.2** model, the system answers questions about the page, generating answers based on the stored information.

Import the required libraries

In [50]:
import requests
from bs4 import BeautifulSoup
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import ChatPromptTemplate
from langchain_ollama.chat_models import ChatOllama
from langchain_core.runnables import RunnableLambda, RunnablePassthrough

Extract the content of the web page and split it into chunks

In [29]:
def extract_lines_from_web(url):
    """Download a page, extract its main text, and split it into chunks."""
    response = requests.get(url, timeout=10)
    if response.status_code != 200:
        print(f"Error accessing the page: {response.status_code}")
        return []

    # Parse the page's HTML
    soup = BeautifulSoup(response.text, 'html.parser')

    # The main text of this page lives in a <div> with the class 'content'
    content = soup.find('div', {'class': 'content'})

    if not content:
        print("Could not find the main content of the page.")
        return []

    # Extract all the text inside that element, joining the pieces with newlines
    full_text = content.get_text("\n", strip=True)

    # Split the text into chunks of up to 400 characters with a 20-character overlap
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=20)
    web_texts = text_splitter.split_text(full_text)

    return web_texts


url = "https://oyister.oyis.org/articles/book-review-the-stranger-by-albert-camus" 

web_texts = extract_lines_from_web(url)

print(f"Loaded the page and split it into {len(web_texts)} text chunks.")

Loaded the page and split it into 11 text chunks.
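
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size splitting with overlap. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to break on separators such as paragraphs and newlines); the function name `split_with_overlap` and the sample string are made up for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into character windows of at most chunk_size that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"  # toy input, not taken from the page
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk repeats the last three characters of the previous one, so a sentence cut at a boundary still appears in full in at least one neighbouring chunk when the overlap is large enough.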


Configure a Hugging Face model to generate text embeddings.

In [6]:
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

Create a Chroma vector store from the text chunks and the embedding model


In [27]:
vector_store = Chroma.from_texts(
    texts=web_texts,
    collection_name="book_review2",
    embedding=embeddings,
)
print(f"Created the vector store with {len(web_texts)} documents.")

Created the vector store with 11 documents.


Open the `book_review2` collection by name, passing the same embedding function so that queries are embedded with the same model as the stored texts

In [8]:

vector_store = Chroma(
    collection_name="book_review2",
    embedding_function=embeddings,
)

Convert the vector store into a retriever to run queries based on text similarity

In [9]:
retriever = vector_store.as_retriever()
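
As a rough picture of what a similarity-based lookup does, the sketch below ranks a few toy vectors against a query vector by cosine similarity and keeps the best `k`. This is a brute-force illustration under stated assumptions: the two-dimensional vectors and the helper names `cosine_similarity` and `top_k` are invented here, and the vector store itself typically relies on an index and on whatever distance metric the collection is configured with, which may not be cosine.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [idx for _, idx in scored[:k]]


# Toy embeddings: real ones from the embedding model have hundreds of dimensions
toy_docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query = [1.0, 0.1]
ranking = top_k(toy_query, toy_docs, k=2)
print(ranking)  # [0, 2]
```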

Example question

In [18]:
question = "Tell me about the philosophy of the book"
docs = vector_store.similarity_search(question, k=5)
len(docs)

5

Look at the texts most closely related to the question

In [19]:
docs

[Document(metadata={}, page_content='The Stranger\nwritten by French writer Albert Camus is a novel that is centered around the philosophical idea of existentialism.'),
 Document(metadata={}, page_content='When I finished the book, I realized that I’ve been just trying to imitate other people’s lifestyles and trying to meet the expectations of others. I was merely walking on the pathways that were given because I was afraid to be considered a\nstranger'),
 Document(metadata={}, page_content='stranger\n. I learned that I am responsible to make decisions based on my own beliefs and values. This book inspired me to become less bound by societal expectations and to start inquiring about what kind of person I truly am.\nIf you’re interested in reading the book, it will hopefully be as thought-provoking and compelling as it was for me.\nInhyuk K.'),
 Document(metadata={}, page_content=', Meursault, personifies existentialism. The opening line spoken by Meursault clearly shows that he does no

Create a natural language processing chain that uses a local LLM (Ollama) to answer questions based on a given context

In [12]:
# Prompt: a question-answering template that takes a context and a question as input
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Local LLM: configure the llama3.2 model to generate answers locally
ollama_llm = "llama3.2"
model_local = ChatOllama(model=ollama_llm)

# Chain: build a pipeline that fills the context from the retriever and passes the question through unchanged
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model_local
    | StrOutputParser()
)
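
As written, the chain appears to hand the retriever's output to the prompt unchanged, so the `{context}` slot receives the string form of a list of documents, metadata included. A common refinement is to join only the text of each document first. The sketch below shows that idea without any LangChain dependency: `Doc`, `format_docs`, `build_prompt`, and `fake_retrieve` are hypothetical names, and the two sample passages are invented stand-ins for retrieved chunks.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document; only page_content is modelled."""
    page_content: str


TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def format_docs(docs):
    """Join the text of each document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(question, retrieve):
    """Mimic the first two chain steps: fetch context, then fill the template."""
    docs = retrieve(question)
    return TEMPLATE.format(context=format_docs(docs), question=question)


def fake_retrieve(question):
    # Invented passages; a real retriever would return the most similar chunks
    return [Doc("Meursault personifies existentialism."),
            Doc("The review is signed by Inhyuk K.")]


prompt_text = build_prompt("Who wrote the review?", fake_retrieve)
print(prompt_text)
```

In the notebook's chain, a helper like `format_docs` could plausibly be placed right after the retriever, for example `{"context": retriever | format_docs, "question": RunnablePassthrough()}`; treat that as a suggestion to try rather than something the notebook already does.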

Test the model

In [14]:
chain.invoke("Tell me about the philosophy of the book")

'The philosophy presented in "The Stranger" by Albert Camus is centered around existentialism. This philosophical idea emphasizes individual freedom and choice, as well as personal responsibility for one\'s actions and decisions. The main character, Meursault, embodies this philosophy through his actions and thoughts, which demonstrate a sense of detachment from societal expectations and norms.\n\nExistentialism suggests that people must take ownership of their lives and create their own meaning, rather than relying on external sources such as society or tradition. Camus\' protagonist is often described as an "anomaly" or a "stranger" because he does not conform to the expected emotional responses to life\'s events, such as crying at his mother\'s funeral.\n\nThe book encourages readers to question societal expectations and values, and to explore their own beliefs and values in order to create a more authentic and meaningful life. Through Meursault\'s character, Camus illustrates the i

In [15]:
chain.invoke("Tell me about the main character")

"The main character's name is Meursault. He personifies existentialism and can be described as an anomaly to societal expectations. He displays a lack of emotional response to certain situations, such as his mother's death and his own actions being considered morally questionable. Despite this, his perspective on life changes throughout the book."

In [20]:
chain.invoke("What do you think about the review?")

'Based on the context, it seems that the reviewer, Inhyuk K., is highly impressed with the book "The Stranger" by Albert Camus. The tone of the review suggests that the book had a profound impact on the reviewer, inspiring them to reflect on their own values and beliefs.\n\nThe use of phrases such as "thought-provoking and compelling", "my view on Meursault changed quite a bit after finishing the book", and "I am responsible to make decisions based on my own beliefs and values" indicate that the reviewer found the book to be deeply meaningful and life-changing. The reviewer also seems to appreciate the author\'s writing style, noting that the opening line is particularly effective.\n\nOverall, it appears that Inhyuk K. highly recommends "The Stranger" to readers who are looking for a thought-provoking and inspiring literary experience.'

In [17]:
chain.invoke("Who is the writer of the review?")

"According to the context, the reviewer's name is In-Hyuk K. and he writes for OYISTER."

In [24]:
chain.invoke("opening line of the book")

'"The Stranger" written by Albert Camus is a novel that begins with the following line:\n\n“Maman died today. Or yesterday maybe, I don’t know. I got a telegram from the home: ‘Mother deceased. Funeral tomorrow. Faithfully yours.’ That doesn’t mean anything. Maybe it was yesterday.”'

The notebook fetches and processes content from a web page, stores it in a vector database, and uses the **Llama 3.2** model to answer questions about it. This approach gives quick access to the relevant passages extracted from the page and produces answers grounded in that content.