# How to build a simple Retriever LLM App with LangChain
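* A very simple retriever LLM app: documents are stored in a vector store, and the ones most relevant to a question are passed to the LLM as context.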

## Concepts included
* Vector stores (vector databases).
* Retrievers.

## Setup

#### Recommended: create new virtualenv
* mkdir your_project_name
* cd your_project_name
* pyenv virtualenv 3.11.4 your_venv_name
* pyenv activate your_venv_name
* pip install jupyterlab
* jupyter lab

In [1]:
#!pip install python-dotenv

#### .env File
Remember to include:
OPENAI_API_KEY=your_openai_api_key

LANGCHAIN_TRACING_V2=true
LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
LANGCHAIN_API_KEY=your_langchain_api_key
LANGCHAIN_PROJECT=your_project_name

We will call our LangSmith project **simpleRetriever**.

In [2]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]
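
If a key is missing from the `.env` file, `os.environ["OPENAI_API_KEY"]` raises a `KeyError`. A quick check like the sketch below can make the problem easier to spot. The helper `missing_keys` and the tuple `REQUIRED_KEYS` are hypothetical names introduced for illustration; the variable names themselves come from the `.env` listing above.

```python
import os

# Variable names taken from the .env listing; adjust to what your project needs.
REQUIRED_KEYS = ("OPENAI_API_KEY", "LANGCHAIN_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in `env` (hypothetical helper)."""
    return [key for key in required if not env.get(key)]


missing = missing_keys(os.environ)
if missing:
    print("Missing from the environment:", ", ".join(missing))
else:
    print("All required keys are set.")
```

Run it right after `load_dotenv` so that the values from the `.env` file are already in `os.environ`.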

#### Install LangChain

In [3]:
#!pip install langchain

In [3]:
!pip list | grep langchain

langchain                                0.2.1
langchain-chroma                         0.1.1
langchain-core                           0.2.3
langchain-openai                         0.1.8
langchain-text-splitters                 0.2.0

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.1.2[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpython3.11 -m pip install --upgrade pip[0m


## Connect with an LLM and start a conversation with it

In [5]:
#!pip install langchain-openai

* For this project, we will use OpenAI's `gpt-3.5-turbo`.

In [13]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")

#### Track the operation in LangSmith
* [Open LangSmith here](https://smith.langchain.com)

## Install Chroma Database

In [None]:
#!pip install langchain-chroma

## Documents
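* A `Document` holds a piece of text together with its metadata.
* It has 2 attributes:
    * `page_content`: the text itself.
    * `metadata`: a dict of details about the text, such as its source.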
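
To make the two attributes concrete without installing anything, here is a minimal sketch that uses a hypothetical stand-in class, `ToyDocument`. It is not LangChain's `Document`; it only mirrors the two attribute names listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in with the same two attribute names."""
    page_content: str
    metadata: dict = field(default_factory=dict)


doc = ToyDocument(
    page_content="Edward M. Kennedy, often known as Ted Kennedy, was a U.S. Senator.",
    metadata={"source": "us-senators-doc"},
)

print(doc.page_content)        # the text
print(doc.metadata["source"])  # where the text came from
```

The same attribute names are what you read on the real `Document` objects returned by a search.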

In [5]:
from langchain_core.documents import Document

documents = [
    Document(
        page_content="John F. Kennedy served as the 35th president of the United States from 1961 until his assassination in 1963.",
        metadata={"source": "us-presidents-doc"},
    ),
    Document(
        page_content="Robert F. Kennedy was a key political figure and served as the U.S. Attorney General; he was also assassinated in 1968.",
        metadata={"source": "us-politics-doc"},
    ),
    Document(
        page_content="The Kennedy family is known for their significant influence in American politics and their extensive philanthropic efforts.",
        metadata={"source": "kennedy-family-doc"},
    ),
    Document(
        page_content="Edward M. Kennedy, often known as Ted Kennedy, was a U.S. Senator who played a major role in American legislation over several decades.",
        metadata={"source": "us-senators-doc"},
    ),
    Document(
        page_content="Jacqueline Kennedy Onassis, wife of John F. Kennedy, was an iconic First Lady known for her style, poise, and dedication to cultural and historical preservation.",
        metadata={"source": "first-lady-doc"},
    ),
]

## Vector stores

In [6]:
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

vectorstore = Chroma.from_documents(
    documents,
    embedding=OpenAIEmbeddings(),
)

In [7]:
vectorstore.similarity_search("John")

[Document(page_content='John F. Kennedy served as the 35th president of the United States from 1961 until his assassination in 1963.', metadata={'source': 'us-presidents-doc'}),
 Document(page_content='Edward M. Kennedy, often known as Ted Kennedy, was a U.S. Senator who played a major role in American legislation over several decades.', metadata={'source': 'us-senators-doc'}),
 Document(page_content='Robert F. Kennedy was a key political figure and served as the U.S. Attorney General; he was also assassinated in 1968.', metadata={'source': 'us-politics-doc'}),
 Document(page_content='Jacqueline Kennedy Onassis, wife of John F. Kennedy, was an iconic First Lady known for her style, poise, and dedication to cultural and historical preservation.', metadata={'source': 'first-lady-doc'})]

In [8]:
# Note that providers implement different scores; Chroma here
# returns a distance metric that should vary inversely with
# similarity.

vectorstore.similarity_search_with_score("John")

[(Document(page_content='John F. Kennedy served as the 35th president of the United States from 1961 until his assassination in 1963.', metadata={'source': 'us-presidents-doc'}),
  0.4497259855270386),
 (Document(page_content='Edward M. Kennedy, often known as Ted Kennedy, was a U.S. Senator who played a major role in American legislation over several decades.', metadata={'source': 'us-senators-doc'}),
  0.4639798402786255),
 (Document(page_content='Robert F. Kennedy was a key political figure and served as the U.S. Attorney General; he was also assassinated in 1968.', metadata={'source': 'us-politics-doc'}),
  0.47490057349205017),
 (Document(page_content='Jacqueline Kennedy Onassis, wife of John F. Kennedy, was an iconic First Lady known for her style, poise, and dedication to cultural and historical preservation.', metadata={'source': 'first-lady-doc'}),
  0.48075956106185913)]
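
Because the score is a distance, the best match has the smallest number, which is why the results above are listed in ascending order. The sketch below illustrates the idea with made-up 3-dimensional vectors and a squared Euclidean distance. Both are assumptions for illustration: real embeddings have many more dimensions, and the metric actually used depends on how the Chroma collection is configured.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vectors standing in for embeddings (illustrative only).
query = [1.0, 0.0, 0.0]
toy_vectors = {
    "close": [0.9, 0.1, 0.0],
    "medium": [0.5, 0.5, 0.0],
    "far": [0.0, 0.0, 1.0],
}

# Sort ascending: a smaller distance means a more similar vector.
scored = sorted(
    ((name, squared_l2(query, vec)) for name, vec in toy_vectors.items()),
    key=lambda pair: pair[1],
)

for name, distance in scored:
    print(f"{name}: {distance:.2f}")
```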

## Retrievers

In [10]:
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 1},
)

retriever.batch(["John", "Robert"])

[[Document(page_content='John F. Kennedy served as the 35th president of the United States from 1961 until his assassination in 1963.', metadata={'source': 'us-presidents-doc'})],
 [Document(page_content='Robert F. Kennedy was a key political figure and served as the U.S. Attorney General; he was also assassinated in 1968.', metadata={'source': 'us-politics-doc'})]]
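
As a rough mental model of the cell above, `k` caps how many results come back per query, and `batch` runs every query and returns one result list per query. The sketch below mimics that shape with a hypothetical word-overlap scorer (`toy_retrieve`, `toy_batch`) in place of embeddings. It is not how the vector store ranks documents.

```python
def toy_retrieve(query, texts, k=1):
    """Return the k texts sharing the most lowercase words with the query."""
    query_words = set(query.lower().split())

    def overlap(text):
        return len(query_words & set(text.lower().split()))

    # Highest overlap first, then keep only the top k.
    return sorted(texts, key=overlap, reverse=True)[:k]


def toy_batch(queries, texts, k=1):
    """Run every query and return one list of results per query."""
    return [toy_retrieve(query, texts, k) for query in queries]


texts = [
    "John F. Kennedy served as the 35th president of the United States.",
    "Robert F. Kennedy served as the U.S. Attorney General.",
    "Edward M. Kennedy was a U.S. Senator.",
]

results = toy_batch(["John", "Robert"], texts, k=1)
for hits in results:
    print(hits)
```

The result is a list of lists, the same nesting as the `retriever.batch` output shown above.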

In [9]:
from langchain_core.runnables import RunnableLambda

# Alternative: wrap the vector store's search method as a runnable.
# bind(k=1) fixes the keyword argument so only the top result is returned.
retriever = RunnableLambda(vectorstore.similarity_search).bind(k=1)

retriever.batch(["John", "Robert"])

[[Document(page_content='John F. Kennedy served as the 35th president of the United States from 1961 until his assassination in 1963.', metadata={'source': 'us-presidents-doc'})],
 [Document(page_content='Robert F. Kennedy was a key political figure and served as the U.S. Attorney General; he was also assassinated in 1968.', metadata={'source': 'us-politics-doc'})]]
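
One way to think about `.bind(k=1)` is as fixing a keyword argument ahead of time, much like `functools.partial` does for a plain function. The sketch below shows that analogy only. `toy_search` and `CORPUS` are hypothetical names, and the substring match stands in for a real similarity search.

```python
from functools import partial

CORPUS = ["John F. Kennedy", "Robert F. Kennedy", "Edward M. Kennedy"]


def toy_search(query, k=4):
    """Return up to k corpus entries containing the query (case-insensitive)."""
    hits = [text for text in CORPUS if query.lower() in text.lower()]
    return hits[:k]


# Fix k=1 once, so callers only pass the query.
top_one = partial(toy_search, k=1)

print(toy_search("Kennedy"))  # up to 4 matches
print(top_one("Kennedy"))     # only the first match
```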

## Simple Retriever

In [14]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

message = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""

prompt = ChatPromptTemplate.from_messages([("human", message)])

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)
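
The dictionary at the head of the chain sends the same input to two branches: the retriever produces the context, and `RunnablePassthrough()` forwards the question unchanged. The prompt template then fills its placeholders from that dictionary. The plain-Python sketch below traces this data flow up to the point where the prompt would be sent to the model. `toy_retriever` and `build_prompt` are hypothetical stand-ins, and joining the passages with blank lines is an illustrative choice, not something the chain above does.

```python
TEMPLATE = """
Answer this question using the provided context only.

{question}

Context:
{context}
"""


def toy_retriever(question):
    """Stand-in for the real retriever: always returns one fixed passage."""
    return [
        "Jacqueline Kennedy Onassis, wife of John F. Kennedy, "
        "was an iconic First Lady."
    ]


def build_prompt(question):
    # Step 1: the same input feeds both branches of the dictionary.
    inputs = {
        "context": "\n\n".join(toy_retriever(question)),
        "question": question,  # passed through unchanged
    }
    # Step 2: the template fills its placeholders from that dictionary.
    return TEMPLATE.format(**inputs)


prompt_text = build_prompt("tell me about Jackie")
print(prompt_text)
```

In the real chain, the filled-in prompt is then piped to the LLM, which writes the answer.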

In [15]:
response = chain.invoke("tell me about Jackie")

print(response.content)

Jackie was the wife of John F. Kennedy and an iconic First Lady known for her style, poise, and dedication to cultural and historical preservation.
