---
title: Large Language Model (LLM) with LangChain
subtitle: 
summary: How to train your LLM with LangChain
authors:
- admin
tags: []
categories: []
date: "2023-02-05T00:00:00Z"
lastMod: "2023-09-05T00:00:00Z"
featured: false
draft: false

# Featured image
# To use, add an image named `featured.jpg/png` to your page's folder. 
image:
  caption: ""
  focal_point: ""

# Projects (optional).
#   Associate this post with one or more of your projects.
#   Simply enter your project's folder or file name without extension.
#   E.g. `projects = ["internal-project"]` references 
#   `content/project/deep-learning/index.md`.
#   Otherwise, set `projects = []`.
projects: []
---

# Implementing a Large Language Model with LangChain

The complete project, including its dependencies, is available at https://github.com/JhonGlidden/Langchain

## Introduction
Advances in artificial intelligence and deep learning have produced highly capable language models. In this project we build on those advances to train a Large Language Model (LLM) on our own document, using the OpenAI API and the LangChain framework.

## Objective
The main goal of this project is to build a model that can answer questions about a specific Word document. The model is trained on a detailed dataset and then used to run several queries, returning accurate, context-aware answers about the document's content.

## Dataset
For this project we use a dataset of restaurant reviews available on Kaggle (https://www.kaggle.com/datasets/d4rklucif3r/restaurant-reviews). Each row pairs a customer's review with a `Liked` flag (1 or 0). The reviews capture customers' experiences and opinions of the restaurant, which supports a wide range of questions for training and testing the model.

## Methodology
### Training
We use the OpenAI API to access a state-of-the-art language model and the LangChain framework to structure and manage the process. The selected Word document is loaded, split into chunks, and converted into embeddings stored in a vector store that the model can query. Note that "training" here means indexing the document for retrieval; the weights of the underlying model are not updated.

### Queries and Evaluation
Once this step is complete, the model can answer specific questions about the document's content. This covers not only literal comprehension of the text but also interpretation and analysis of the underlying information. The model's effectiveness is assessed through a series of tests and metrics that check whether the answers are accurate and consistent with the original content.

## Conclusion
This project is an opportunity to explore the capabilities of modern language models and their application to understanding and analyzing text documents. By combining the OpenAI API with the LangChain framework, we aim to build a tool that changes how we interact with written information while offering insight into customers' opinions and experiences of a restaurant. A successful implementation could open the door to further research and applications in natural language processing and artificial intelligence.


# Code

## Libraries

In [40]:
import logging
import os
from getpass import getpass

from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import Docx2txtLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma


## Loading the data

In [2]:
loader = Docx2txtLoader("./ReviewRestaurants.docx")  # load the Word file (any other .docx file can be used)

In [3]:
data = loader.load()

In [4]:
data[0].page_content[:300]

'Review\tLiked\n\nWow... Loved this place.\t1\n\nCrust is not good.\t0\n\nNot tasty and the texture was just nasty.\t0\n\nStopped by during the late May bank holiday off Rick Steve recommendation and loved it.\t1\n\nThe selection on the menu was great and so were the prices.\t1\n\nNow I am getting angry and I want my '

In [5]:
type(data)

list

In [8]:
data[0].metadata

{'source': './ReviewRestaurants.docx'}

In [9]:
# Check that the content was loaded
data[0].page_content



## LangChain

- RecursiveCharacterTextSplitter: this text splitter takes a list of separator characters and tries to split on the first one. Any piece that is still larger than the chunk size is split again on the next separator in the list, and so on.

In [10]:
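text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,      # maximum chunk length, measured with length_function
    chunk_overlap=100,    # up to 100 characters shared between consecutive chunks
    length_function=len,
)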

documents = text_splitter.split_documents(data)

In [11]:
len(documents)

69
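
To make the splitting behaviour concrete, here is a simplified pure-Python sketch of recursive splitting. It is not LangChain's implementation: the function name `recursive_split` and its default separator list are assumptions for illustration, and chunk overlap is left out for brevity.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into pieces of at most chunk_size characters.

    Coarse separators are tried first; a piece that is still too long
    is split again with the next separator in the list.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    parts = text.split(sep)
    if len(parts) == 1:
        # Separator not present: fall through to the next one.
        return recursive_split(text, chunk_size, rest)

    chunks, current = [], ""
    for part in parts:
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing parts into the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            chunks.extend(recursive_split(part, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


# Three reviews in the same tab-separated layout as the loaded document.
sample = (
    "Wow... Loved this place.\t1\n\n"
    "Crust is not good.\t0\n\n"
    "Not tasty and the texture was just nasty.\t0"
)
chunks = recursive_split(sample, chunk_size=50)
```

With a 50-character limit the first two reviews fit together in one chunk and the third gets its own, so reviews are kept whole wherever the limit allows.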

In [12]:
documents[20]

Document(page_content="I had the mac salad and it was pretty bland so I will not be getting that again.\t0\n\nI went to Bachi Burger on a friend's recommendation and was not disappointed.\t1\n\nService stinks here!\t0\n\nI waited and waited.\t0\n\nThis place is not quality sushi, it is not a quality restaurant.\t0\n\nI would definitely recommend the wings as well as the pizza.\t1\n\nGreat Pizza and Salads!\t1\n\nThings that went wrong: - They burned the saganaki.\t0\n\nWe waited an hour for what was a breakfast I could have done 100 times better at home.\t0\n\nThis place is amazing!\t1\n\nI hate to disagree with my fellow Yelpers, but my husband and I were so disappointed with this place.\t0\n\nWaited 2 hours & never got either of our pizzas as many other around us who came in later did!\t0\n\nJust don't know why they were so slow.\t0\n\nThe staff is great, the food is delish, and they have an incredible beer selection.\t1\n\nI live in the neighborhood so I am disappointed I won't be b

## OpenAI API

In [16]:
OPENAI_API_KEY = getpass("OpenAI API key: ")  # prompt for the key instead of hard-coding it
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY


In [19]:
vectorstore = Chroma.from_documents(documents=documents, embedding=OpenAIEmbeddings())

## LLM


In [43]:
## Question
question = "what are the main reviews?"
docs = vectorstore.similarity_search(question)
len(docs)

4
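
The search returns four documents because `similarity_search` defaults to `k=4`. As a rough sketch of what a vector store does, the example below ranks chunks by cosine similarity. The word-count `embed` function is a toy stand-in for `OpenAIEmbeddings`, and all names are hypothetical; real embeddings capture meaning rather than exact word overlap.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag of lowercase words (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_search(query, chunks, k=4):
    """Return the k chunks whose vectors are closest to the query vector."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


toy_chunks = [
    "the service was slow",
    "great pizza and salads",
    "the staff is great",
]
top = similarity_search("slow service", toy_chunks, k=1)
```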

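## MultiQueryRetriever

Automates query tuning by using an LLM to generate multiple queries from the original question. Each generated query is run against the vector store and the retrieved documents are merged, with duplicates removed.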

In [42]:

from langchain.chat_models import ChatOpenAI
from langchain.retrievers.multi_query import MultiQueryRetriever

logging.basicConfig()
logging.getLogger('langchain.retrievers.multi_query').setLevel(logging.INFO)

retriever_from_llm = MultiQueryRetriever.from_llm(retriever=vectorstore.as_retriever(),
                                                  llm=ChatOpenAI(temperature=0))
unique_docs = retriever_from_llm.get_relevant_documents(query=question)
len(unique_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. Can you provide me with the primary reviews?', '2. What are the key reviews?', '3. Could you share the major reviews?']


8
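
The three generated queries can each return up to four chunks, and the eight documents above are what remains once duplicates are dropped. A minimal sketch of that merge step follows; `retrieve_unique` and the lookup table standing in for the retriever are hypothetical.

```python
def retrieve_unique(queries, retrieve):
    """Run every query through `retrieve` and return the de-duplicated union,
    preserving the order in which documents were first seen."""
    seen = set()
    merged = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical stand-in for vectorstore.as_retriever(): a fixed lookup table.
fake_index = {
    "primary reviews": ["chunk A", "chunk B"],
    "key reviews": ["chunk B", "chunk C"],
    "major reviews": ["chunk A", "chunk D"],
}
unique = retrieve_unique(list(fake_index), lambda q: fake_index[q])
```

Six results come back in total, but only four distinct chunks are kept.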

## Text generation

In [32]:
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
qa_chain = RetrievalQA.from_chain_type(llm,retriever=vectorstore.as_retriever())
qa_chain({"query": question})

{'query': 'what are the main reviews?',
 'result': 'The main reviews are positive.'}

In [33]:
from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(llm, chain_type="stuff")
chain({"input_documents": unique_docs, "question": question},return_only_outputs=True)

{'output_text': 'The main reviews are positive.'}
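
The `"stuff"` chain type places all the supplied documents into a single prompt and makes one call to the model. The sketch below shows the idea only; the function name and the prompt wording are assumptions, not the template LangChain actually uses.

```python
def build_stuff_prompt(docs, question):
    """Concatenate every document into one context block, then ask the question."""
    context = "\n\n".join(docs)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    ["Great Pizza and Salads!", "Service stinks here!"],
    "what are the main reviews?",
)
```

Because everything goes into one prompt, this approach only works while the combined documents fit within the model's context window.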

## Multiple queries

In [47]:
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

In [35]:
from langchain.chains import ConversationalRetrievalChain

retriever = vectorstore.as_retriever()
chat = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, memory=memory)
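
With memory attached, each question and answer is stored, and a follow-up question can be rephrased into a standalone one using that history before retrieval. The sketch below illustrates the bookkeeping only; the `BufferMemory` class and the prompt wording are hypothetical and do not reproduce LangChain's internals.

```python
class BufferMemory:
    """Keeps every (question, answer) turn of the conversation in order."""

    def __init__(self):
        self.turns = []

    def save(self, question, answer):
        self.turns.append((question, answer))

    def as_text(self):
        return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in self.turns)


def build_condense_prompt(memory, follow_up):
    """Ask the model to rewrite a follow-up so it makes sense without the history."""
    return (
        "Given the conversation below, rephrase the follow-up question "
        "as a standalone question.\n\n"
        f"{memory.as_text()}\n\n"
        f"Follow-up: {follow_up}\n"
        "Standalone question:"
    )


toy_memory = BufferMemory()
toy_memory.save("what are the main complaints?", "Slow service and poor quality food.")
condense_prompt = build_condense_prompt(toy_memory, "and which of them is most common?")
```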

More questions

In [36]:
result = chat({"question": "what are the main complaints?"})
result['answer']

'The main complaints are about management attitudes, slow service, poor quality food, bad service, dirty restaurant, and generic food.'

In [37]:
result = chat({"question": "what are the best reviews?"})
result['answer']

"- The staff is always super friendly and helpful, which is especially cool when you bring two small boys and a baby!\n- Four stars for the food & the guy in the blue shirt for his great vibe & still letting us in to eat!\n- The roast beef sandwich tasted really good!\n- High-quality chicken on the chicken Caesar salad.\n- We were promptly greeted and seated.\n- This is a good joint.\n- The Macarons here are insanely good.\n- Our waiter was very attentive, friendly, and informative.\n- Both of the egg rolls were fantastic.\n- I had a salad with the wings, and some ice cream for dessert and left feeling quite satisfied.\n- The best place to go for a tasty bowl of Pho!\n- Very friendly staff.\n- It is worth the drive.\n- I had heard good things about this place, but it exceeding every hope I could have dreamed of.\n- Food was great and so was the service!\n- Great brunch spot.\n- Service is friendly and inviting.\n- Very good lunch spot.\n- Only Pros : Large seating area/ Nice bar area/ 

In [44]:
result = chat({"question": "How many good reviews there are?"})
result['answer']

'There are 26 good reviews.'
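
A retrieval chain only passes the retrieved chunks to the model, so a count like this one may reflect those chunks rather than the whole file. As a cross-check, the `Liked` flags can be tallied directly from the loaded text. This is a minimal sketch: `parse_reviews` is a hypothetical helper, and the sample string reuses a few lines from the preview shown earlier in the post.

```python
def parse_reviews(page_content):
    """Turn 'review<TAB>label' lines into (text, label) pairs.

    Lines without a tab, and the 'Review<TAB>Liked' header, are skipped.
    """
    rows = []
    for line in page_content.splitlines():
        if "\t" not in line:
            continue
        text, _, label = line.rpartition("\t")
        label = label.strip()
        if label in {"0", "1"}:
            rows.append((text.strip(), int(label)))
    return rows


sample = (
    "Review\tLiked\n\n"
    "Wow... Loved this place.\t1\n\n"
    "Crust is not good.\t0\n\n"
    "The selection on the menu was great and so were the prices.\t1\n\n"
)
rows = parse_reviews(sample)
liked = sum(label for _, label in rows)
```

In the notebook, the same helper could be applied to `data[0].page_content` to count positive reviews across the full document.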

In [50]:
result = chat({"question": "which is the worst review?"})
result['answer']

'The worst review is "This place is disgusting!"'

In [49]:
# More questions can be put to the model using the following structure:
# result = chat({"question": "Query"})
# result['answer']