# Exercise 13: LangChain

LangChain is an open-source _framework_ designed to simplify building applications that combine large language models (LLMs) with data, external tools, and memory. It is aimed at complex AI applications such as _Retrieval-Augmented Generation_ (RAG) systems, conversational assistants, autonomous agents, and systems that chain several reasoning steps.

## Part 1: Loading and preprocessing the corpus

In [2]:
!pip install kaggle


In [13]:
import os
from dotenv import load_dotenv
import pandas as pd

# Load environment variables from the .env file into os.environ
load_dotenv(dotenv_path="credenciales.env")

# Fail early with a clear message if a credential is missing
for var in ("KAGGLE_USERNAME", "KAGGLE_KEY", "GOOGLE_API_KEY"):
    if not os.getenv(var):
        raise RuntimeError(f"Missing environment variable: {var}")

# Import kaggle only after the credentials are set: the package tries to authenticate on import
import kaggle

# Download and unzip the dataset
dataset = "rajneesh231/lex-fridman-podcast-transcript"
path = "../data/13langchain"

kaggle.api.dataset_download_files(dataset, path=path, unzip=True)

Dataset URL: https://www.kaggle.com/datasets/rajneesh231/lex-fridman-podcast-transcript
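
The cell above calls the download function every time it is executed. A minimal sketch of a guard that checks whether the CSV is already on disk is shown below. The helper name `dataset_is_present` is hypothetical, and the demonstration uses a temporary folder rather than the notebook's real data directory.

```python
import os
import tempfile


def dataset_is_present(folder: str, filename: str) -> bool:
    """Return True if `filename` already exists inside `folder` and is not empty."""
    target = os.path.join(folder, filename)
    return os.path.isfile(target) and os.path.getsize(target) > 0


# Self-contained demonstration with a temporary folder (not ../data/13langchain)
with tempfile.TemporaryDirectory() as tmp:
    before = dataset_is_present(tmp, "podcastdata_dataset.csv")
    with open(os.path.join(tmp, "podcastdata_dataset.csv"), "w", encoding="utf-8") as fh:
        fh.write("id,guest,title,text\n")
    after = dataset_is_present(tmp, "podcastdata_dataset.csv")

print(before, after)  # False True
```

In the notebook, the download call could then be wrapped in `if not dataset_is_present(path, "podcastdata_dataset.csv"):`.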


In [14]:
# Load the CSV dataset
file_path = os.path.join(path, "podcastdata_dataset.csv")  # adjust the file name if it differs
df = pd.read_csv(file_path)
df

| Unnamed: 0 | id | guest | title | text |
| --- | --- | --- | --- | --- |
| 0 | 1 | Max Tegmark | Life 3.0 | As part of MIT course 6S099, Artificial Genera... |
| 1 | 2 | Christof Koch | Consciousness | As part of MIT course 6S099 on artificial gene... |
| 2 | 3 | Steven Pinker | AI in the Age of Reason | You've studied the human mind, cognition, lang... |
| 3 | 4 | Yoshua Bengio | Deep Learning | What difference between biological neural netw... |
| 4 | 5 | Vladimir Vapnik | Statistical Learning | The following is a conversation with Vladimir ... |
| ... | ... | ... | ... | ... |
| 314 | 321 | Ray Kurzweil | Singularity, Superintelligence, and Immortality | By the time he gets to 2045, we'll be able to ... |
| 315 | 322 | Rana el Kaliouby | Emotion AI, Social Robots, and Self-Driving Cars | there's a broader question here, right? As we ... |
| 316 | 323 | Will Sasso | Comedy, MADtv, AI, Friendship, Madness, and Pr... | Once this whole thing falls apart and we are c... |
| 317 | 324 | Daniel Negreanu | Poker | you could be the seventh best player in the wh... |


In [19]:
pip install --upgrade langchain




In [20]:
from langchain.schema import Document

# Convert each row into a Document, keeping id, guest, and title as metadata
documents = [
    Document(
        page_content=row["text"],
        metadata={"id": row["id"], "guest": row["guest"], "title": row["title"]},
    )
    for _, row in df.iterrows()
]

## Part 2: Chunking and embeddings

In [21]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split each transcript into chunks of up to 500 characters, with up to 50 characters of overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
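
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping windows over a string. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers to cut at separators such as paragraph and line breaks); the function name `sliding_window_chunks` and the toy text are assumptions made for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut `text` into fixed-size windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of toy text
pieces = sliding_window_chunks(sample)
print([len(p) for p in pieces])  # [500, 500, 300]
```

The last 50 characters of each window reappear at the start of the next one, which is what keeps a sentence that straddles a boundary retrievable from either chunk.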

In [25]:
!pip install langchain langchain-core langchain-community
!pip install faiss-cpu
!pip install sentence-transformers




In [26]:
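from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# Sentence-embedding model used to vectorize every chunk
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# from_texts indexes only the raw text, so the chunk metadata (id, guest, title) is not stored;
# FAISS.from_documents(chunks, embeddings) would keep it
texts = [chunk.page_content for chunk in chunks]
vectorstore = FAISS.from_texts(texts, embeddings)

# Persist the index to disk
vectorstore.save_local("index_13langchain")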



## Part 3: Indexing in FAISS

In [27]:
# Build the index from the same texts again and save a second copy under a different folder name
vectorstore = FAISS.from_texts(texts, embeddings)
vectorstore.save_local("index_13langchain_02")
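
Conceptually, and assuming the wrapper's default flat index with Euclidean (L2) distance, a query against this store is an exhaustive nearest-neighbour search over all chunk vectors. The sketch below reproduces that idea in plain Python with made-up 2-D vectors; the helper names and the toy data are hypothetical and stand in for the real 384-dimensional MiniLM embeddings.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query: list[float], vectors: list[list[float]], texts: list[str], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are closest to the query."""
    scored = sorted(
        zip((l2_distance(query, v) for v in vectors), texts),
        key=lambda pair: pair[0],
    )
    return [text for _, text in scored[:k]]


# Toy 2-D "embeddings" (illustration only)
toy_texts = ["neural networks", "poker strategy", "deep learning"]
toy_vectors = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]

nearest = top_k([1.0, 0.0], toy_vectors, toy_texts, k=2)
print(nearest)  # ['neural networks', 'deep learning']
```

FAISS performs the same comparison with optimized native code, which is what makes it practical for the tens of thousands of chunks a corpus like this one produces.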

In [28]:
import google.generativeai as genai
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
# Print every model available to this API key together with the methods it supports
for m in genai.list_models():
    print(m.name, m.supported_generation_methods)

models/embedding-gecko-001 ['embedText', 'countTextTokens']
models/gemini-1.0-pro-vision-latest ['generateContent', 'countTokens']
models/gemini-pro-vision ['generateContent', 'countTokens']
models/gemini-1.5-pro-latest ['generateContent', 'countTokens']
models/gemini-1.5-pro-002 ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-pro ['generateContent', 'countTokens']
models/gemini-1.5-flash-latest ['generateContent', 'countTokens']
models/gemini-1.5-flash ['generateContent', 'countTokens']
models/gemini-1.5-flash-002 ['generateContent', 'countTokens', 'createCachedContent']
models/gemini-1.5-flash-8b ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-1.5-flash-8b-001 ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-1.5-flash-8b-latest ['createCachedContent', 'generateContent', 'countTokens']
models/gemini-2.5-pro-preview-03-25 ['generateContent', 'countTokens', 'createCachedContent', 'batchGenerateContent']
models/ge
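
Only models that list `generateContent` can act as the chat LLM in a question-answering chain. As a small sketch, the filter below works on a few `(name, methods)` pairs copied from the listing above; the helper name `supports` is an assumption made for illustration.

```python
# (name, supported methods) pairs copied from the printed listing
listed_models = [
    ("models/embedding-gecko-001", ["embedText", "countTextTokens"]),
    ("models/gemini-1.5-pro", ["generateContent", "countTokens"]),
    ("models/gemini-1.5-flash", ["generateContent", "countTokens"]),
    ("models/gemini-1.5-flash-8b", ["createCachedContent", "generateContent", "countTokens"]),
]


def supports(method: str, models: list[tuple[str, list[str]]]) -> list[str]:
    """Return the names of the models that advertise the given method."""
    return [name for name, methods in models if method in methods]


chat_models = supports("generateContent", listed_models)
print(chat_models)
```

Against the live API, the equivalent filter would be `[m.name for m in genai.list_models() if "generateContent" in m.supported_generation_methods]`.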

## Part 4: Building the retrieval chain

In [30]:
pip install langchain-google-genai




ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-generativeai 0.8.5 requires google-ai-generativelanguage==0.6.15, but you have google-ai-generativelanguage 0.6.18 which is incompatible.



In [31]:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI

# Initialize the Gemini LLM
llm = ChatGoogleGenerativeAI(model="models/gemini-1.5-flash", temperature=0)

# Create a retriever from the FAISS vector index
retriever = vectorstore.as_retriever()

# Question-answering chain
qa_chain = RetrievalQA.from_chain_type(llm, retriever=retriever)

# Ask a question; invoke replaces the deprecated run method and returns a dict
response = qa_chain.invoke({"query": "What is AGI Artificial General Intelligence?"})
print(response["result"])


Based on the provided text, AGI, or Artificial General Intelligence, is a type of AI with the ability to learn and adapt to new environments.  However, it's unclear if a machine with AGI would necessarily have consciousness.  The text also notes that the term is widely used and understood, even though its precise definition is debated, with some considering it simply very smart AI surpassing human intelligence, while others equate it to human-like intelligence.


In [32]:
from langchain.prompts import PromptTemplate

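# The Spanish prompt tells the model to answer only from the retrieved context and,
# if the answer is not explicit there, to reply with a fixed refusal sentence
prompt = PromptTemplate(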
    input_variables=["context", "question"],
    template="""
Usa solo el siguiente contexto para responder a la pregunta.
Si la respuesta no está explícita en el contexto, responde exactamente:
"No encontré información suficiente en el corpus."

Contexto:
{context}

Pregunta: {question}
Respuesta:
"""
)

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt}
)

result = qa_chain.invoke({"query": "¿Qué es AGI Artificial General Intelligence?"})
print(result["result"])

No encontré información suficiente en el corpus.
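
By default `RetrievalQA.from_chain_type` uses the "stuff" strategy: the retrieved chunks are concatenated and substituted into `{context}`, and the user query into `{question}`. The sketch below imitates that substitution with plain string formatting so the final prompt can be inspected. The function `build_prompt` and the two example chunks are hypothetical; the template text is the one from the cell above.

```python
# Same template as in the notebook cell (kept in Spanish, as in the original)
TEMPLATE = """
Usa solo el siguiente contexto para responder a la pregunta.
Si la respuesta no está explícita en el contexto, responde exactamente:
"No encontré información suficiente en el corpus."

Contexto:
{context}

Pregunta: {question}
Respuesta:
"""


def build_prompt(retrieved_chunks: list[str], question: str) -> str:
    """Join the retrieved chunks and substitute them into the template."""
    context = "\n\n".join(retrieved_chunks)
    return TEMPLATE.format(context=context, question=question)


# Hypothetical retrieved chunks, for illustration only
example_chunks = [
    "AGI is discussed as human-level AI.",
    "Some guests doubt the term is well defined.",
]
final_prompt = build_prompt(example_chunks, "¿Qué es AGI Artificial General Intelligence?")
print(final_prompt)
```

When the chain falls back to the refusal sentence, printing the filled-in prompt like this is one way to check whether the retrieved context actually contains an explicit answer.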


In [33]:
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=retriever,
    return_source_documents=True
)

result = qa_chain.invoke({"query": "¿Qué es AGI Artificial General Intelligence?"})
print(result["result"])
print(result["source_documents"])

Based on the provided text, AGI (Artificial General Intelligence) is a term used to describe artificial intelligence that reaches or surpasses human-level intelligence.  However, the text also notes that the term is not well-defined and that human intelligence itself is not truly "general" but rather highly specialized.  The text mentions AIXI as an example of a theoretical system that could be considered truly general intelligence within the constraints of computational theory.  There's no single agreed-upon test to definitively determine if AGI has been achieved.
[Document(id='791fd489-46f1-4790-9d3c-7c27b82dee75', metadata={}, page_content="general intelligence, artificial intelligence, only refers to if you achieve human level or a subhuman level, but quite broad, is it also general intelligence? So we have to distinguish, or it's only super human intelligence, general artificial intelligence. Is there a test in your mind, like the Turing test for natural language or some other tes
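
Printing the raw list of source documents is hard to read, and the `metadata={}` in the output reflects that the index was built from raw texts rather than from the `Document` objects. A minimal sketch of a more compact listing follows. The helper `summarize_sources` is hypothetical, and the stand-in objects only mimic the `metadata` and `page_content` attributes of a LangChain `Document`.

```python
from types import SimpleNamespace


def summarize_sources(docs, width: int = 60) -> list[str]:
    """Return one short line per document: its metadata plus a text preview."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        # Collapse whitespace, then keep only the first `width` characters
        preview = " ".join(doc.page_content.split())[:width]
        lines.append(f"[{i}] {doc.metadata} {preview}")
    return lines


# Stand-in objects with the same two attributes as a LangChain Document
fake_docs = [
    SimpleNamespace(metadata={}, page_content="general intelligence, artificial intelligence, only refers to"),
    SimpleNamespace(metadata={"guest": "Max Tegmark"}, page_content="As part of MIT course 6S099"),
]
source_lines = summarize_sources(fake_docs, width=30)
for line in source_lines:
    print(line)
```

In the notebook the same helper could be called as `summarize_sources(result["source_documents"])`.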