# Exercise 13: LangChain

LangChain is an open-source _framework_ for building applications that combine large language models (LLMs) with data, external tools, and memory. It targets complex AI applications such as _Retrieval-Augmented Generation_ (RAG) systems, conversational assistants, autonomous agents, and systems that chain several reasoning steps.

## Part 1: Loading and preprocessing the corpus

In [25]:
%pip install python-dotenv kaggle google-generativeai langchain langchain-community langchain-google-genai


In [9]:
import os
from dotenv import load_dotenv
import google.generativeai as genai

load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Smoke test: list the models this API key can access
for m in genai.list_models():
    print(m.name)

models/embedding-gecko-001
models/gemini-1.0-pro-vision-latest
models/gemini-pro-vision
models/gemini-1.5-pro-latest
models/gemini-1.5-pro-002
models/gemini-1.5-pro
models/gemini-1.5-flash-latest
models/gemini-1.5-flash
models/gemini-1.5-flash-002
models/gemini-1.5-flash-8b
models/gemini-1.5-flash-8b-001
models/gemini-1.5-flash-8b-latest
models/gemini-2.5-pro-preview-03-25
models/gemini-2.5-flash-preview-05-20
models/gemini-2.5-flash
models/gemini-2.5-flash-lite-preview-06-17
models/gemini-2.5-pro-preview-05-06
models/gemini-2.5-pro-preview-06-05
models/gemini-2.5-pro
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-preview-image-generation
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-2.0-pro-exp
models/gemini-2.0-pro-exp-02-05
models/gemini-exp-1206
models/gemini-2.0-flash-thin

In [14]:
import pandas as pd
# Load the CSV dataset
file_path = os.path.join("..", "data", "13langchain", "podcastdata_dataset.csv")  # adjust the file name if it differs
df = pd.read_csv(file_path)
df

| | id | guest | title | text |
|---|---|---|---|---|
| 0 | 1 | Max Tegmark | Life 3.0 | As part of MIT course 6S099, Artificial Genera... |
| 1 | 2 | Christof Koch | Consciousness | As part of MIT course 6S099 on artificial gene... |
| 2 | 3 | Steven Pinker | AI in the Age of Reason | You've studied the human mind, cognition, lang... |
| 3 | 4 | Yoshua Bengio | Deep Learning | What difference between biological neural netw... |
| 4 | 5 | Vladimir Vapnik | Statistical Learning | The following is a conversation with Vladimir ... |
| ... | ... | ... | ... | ... |
| 314 | 321 | Ray Kurzweil | Singularity, Superintelligence, and Immortality | By the time he gets to 2045, we'll be able to ... |
| 315 | 322 | Rana el Kaliouby | Emotion AI, Social Robots, and Self-Driving Cars | there's a broader question here, right? As we ... |
| 316 | 323 | Will Sasso | Comedy, MADtv, AI, Friendship, Madness, and Pr... | Once this whole thing falls apart and we are c... |
| 317 | 324 | Daniel Negreanu | Poker | you could be the seventh best player in the wh... |


In [17]:
from langchain.schema import Document

# Convert each row into a Document
documents = [
    Document(
        page_content=row["text"],
        metadata={"id": row["id"], "guest": row["guest"], "title": row["title"]},
    )
    for _, row in df.iterrows()
]

## Part 2: Chunking and embeddings

In [18]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
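
To make the two splitter parameters concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification and not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on separators such as paragraph breaks, newlines, and spaces, so its chunks rarely fall on exact character offsets. The helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so text cut at a
    boundary still appears at the start of the next window.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return windows


sample = "abcdefghijklmnopqrstuvwxyz"
demo_chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
for piece in demo_chunks:
    print(piece)
```

With `chunk_size=500` and `chunk_overlap=50`, as in the cell above, this simplified scheme would advance 450 characters per chunk.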

In [21]:
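# These integrations live in langchain-community (installed above);
# the langchain.embeddings / langchain.vectorstores import paths are deprecated.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS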

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

texts = [chunk.page_content for chunk in chunks]
vectorstore = FAISS.from_texts(texts, embeddings)

vectorstore.save_local("index_13langchain")
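
Note that `texts` holds only `page_content`, so the `id`, `guest`, and `title` metadata attached in Part 1 never reaches the index. This is why the source documents printed in Part 4 show `metadata={}`. LangChain's `FAISS.from_texts` accepts an optional `metadatas` list parallel to `texts`, and `FAISS.from_documents` does the pairing itself. The sketch below shows the parallel-list shape with a stand-in `Chunk` class (hypothetical, so the snippet runs without LangChain) and placeholder text; the real calls are left as comments.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Stand-in for a LangChain Document (hypothetical, for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Placeholder text; the guest and title values come from the table in Part 1.
demo = [
    Chunk("placeholder transcript text A", {"id": 1, "guest": "Max Tegmark", "title": "Life 3.0"}),
    Chunk("placeholder transcript text B", {"id": 4, "guest": "Yoshua Bengio", "title": "Deep Learning"}),
]

# Two parallel lists: entry i of `metadatas_demo` describes entry i of `texts_demo`.
texts_demo = [c.page_content for c in demo]
metadatas_demo = [c.metadata for c in demo]

# With LangChain installed, either of these should keep the metadata:
#   vectorstore = FAISS.from_texts(texts, embeddings, metadatas=[c.metadata for c in chunks])
#   vectorstore = FAISS.from_documents(chunks, embeddings)

for text, meta in zip(texts_demo, metadatas_demo):
    print(meta["guest"], "->", text)
```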



## Part 3: Indexing in FAISS

In [22]:
vectorstore = FAISS.from_texts(texts, embeddings)
vectorstore.save_local("index_13langchain_02")
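
At query time the index embeds the question and returns the stored chunks whose vectors lie closest to it. The sketch below imitates that with toy word-count vectors and cosine similarity in plain Python. It is only an illustration under simplifying assumptions: the real pipeline uses 384-dimensional `all-MiniLM-L6-v2` embeddings and FAISS's own distance computation, the toy corpus is invented, and the names `embed`, `cosine`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real model returns dense floats)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus entries most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


# Invented sentences, used only to exercise the functions above.
toy_corpus = [
    "general intelligence means adapting to new environments",
    "poker players read their opponents",
    "deep learning trains neural networks",
]
hits = top_k("what is general intelligence", toy_corpus, k=1)
print(hits)
```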

In [23]:
import google.generativeai as genai
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
for m in genai.list_models():
    print(m.name, m.supported_generation_methods)

models/embedding-gecko-001 ['embedText', 'countTextTokens']
models/gemini-1.0-pro-vision-latest ['generateContent', 'countTokens']
models/gemini-pro-vision ['generateContent', 'countTokens']
models/gemini-1.5-pro-latest ['generateContent', 'countTokens']
models/gemini-1.5-pro-002 ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-pro ['generateContent', 'countTokens']
models/gemini-1.5-flash-latest ['generateContent', 'countTokens']
models/gemini-1.5-flash ['generateContent', 'countTokens']
models/gemini-1.5-flash-002 ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-flash-8b ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-1.5-flash-8b-001 ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-1.5-flash-8b-latest ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-2.5-pro-preview-03-25 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']
models/ge
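
The second column is what matters for Part 4: a chat chain needs a model that supports `generateContent`. A minimal sketch of that filter, applied to a few rows copied from the listing above (the helper name `supports` is hypothetical; with the live API the pairs would come from `genai.list_models()`):

```python
# A few (name, supported_generation_methods) pairs copied from the output above.
listed = [
    ("models/embedding-gecko-001", ["embedText", "countTextTokens"]),
    ("models/gemini-1.5-pro", ["generateContent", "countTokens"]),
    ("models/gemini-1.5-flash", ["generateContent", "countTokens"]),
    ("models/gemini-1.5-flash-002", ["generateContent", "countTokens", "createCachedContent"]),
]


def supports(method: str, models: list[tuple[str, list[str]]]) -> list[str]:
    """Return the names of the models that advertise `method`."""
    return [name for name, methods in models if method in methods]


chat_models = supports("generateContent", listed)
print(chat_models)
```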

## Part 4: Building the retrieval chain

In [26]:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI

# Initialize the Gemini LLM
llm = ChatGoogleGenerativeAI(model="models/gemini-1.5-flash", temperature=0)

# Create a retriever from your vector index (FAISS or Chroma)
retriever = vectorstore.as_retriever()

# Question-answering chain
qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)

# Ask a question (invoke replaces the deprecated run method)
response = qa_chain.invoke({"query": "What is AGI Artificial General Intelligence?"})
print(response["result"])


Based on the provided text, AGI, or Artificial General Intelligence, is a type of AI with the ability to learn and adapt to new environments.  However, it's unclear if a machine with AGI would necessarily have consciousness.  The text also points out that the term is widely used and understood, even though its precise definition is debated, with some considering it simply very smart AI surpassing human intelligence, while others equate it to human-like intelligence.
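
Conceptually, `RetrievalQA.from_chain_type` with its default `chain_type="stuff"` does three things: it asks the retriever for the closest chunks, pastes them into a single prompt, and sends that prompt to the LLM. The sketch below mimics those steps with stand-in callables so that it runs without an API key. `answer`, `fake_retriever`, and `fake_llm` are hypothetical names, and the prompt wording is illustrative rather than LangChain's built-in template.

```python
from typing import Callable


def answer(
    question: str,
    retrieve: Callable[[str], list[str]],
    llm: Callable[[str], str],
) -> str:
    """Retrieve context, stuff it into one prompt, and ask the model."""
    context = "\n\n".join(retrieve(question))  # "stuff": all chunks go into one prompt
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return llm(prompt)


def fake_retriever(question: str) -> list[str]:
    """Stand-in for the retriever: always returns the same two chunks."""
    return ["chunk one about AGI", "chunk two about the Turing test"]


def fake_llm(prompt: str) -> str:
    """Stand-in for the chat model: echoes the prompt it received."""
    return "ECHO:\n" + prompt


reply = answer("What is AGI?", fake_retriever, fake_llm)
print(reply)
```

Printing the echoed prompt shows exactly what the model would receive, which helps when a chain returns an unexpected answer.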


In [27]:
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="""
Usa solo el siguiente contexto para responder a la pregunta.
Si la respuesta no está explícita en el contexto, responde exactamente:
"No encontré información suficiente en el corpus."

Contexto:
{context}

Pregunta: {question}
Respuesta:
"""
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)

result = qa_chain.invoke({"query": "¿Qué es AGI Artificial General Intelligence?"})
print(result["result"])

No encontré información suficiente en el corpus.


In [28]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    return_source_documents=True
)

result = qa_chain.invoke({"query": "¿Qué es AGI Artificial General Intelligence?"})
print(result["result"])
print(result["source_documents"])

Based on the provided text, AGI (Artificial General Intelligence) is a term used to describe artificial intelligence that reaches or surpasses human-level intelligence.  However, the text also notes that the term is not well-defined and that human intelligence itself is not truly "general" but rather highly specialized.  The text mentions AIXI as an example of a theoretical system that could be considered truly general intelligence within the constraints of computational theory.  There's no single agreed-upon test to definitively determine if AGI has been achieved.
[Document(id='0a436eaf-c5ad-4deb-bf75-d58991e159ce', metadata={}, page_content="general intelligence, artificial intelligence, only refers to if you achieve human level or a subhuman level, but quite broad, is it also general intelligence? So we have to distinguish, or it's only super human intelligence, general artificial intelligence. Is there a test in your mind, like the Turing test for natural language or some other tes