# **Getting Started with LangChain: A Beginner’s Guide to Building LLM-Powered Applications**

This is a LangChain tutorial for building applications with large language models (LLMs) in Python [<sup>[source1]</sup>](https://medium.com/towards-data-science/getting-started-with-langchain-a-beginners-guide-to-building-llm-powered-applications-95fc8898732c)[<sup>[source1*]</sup>](https://archive.is/7nNO2#selection-249.1-249.75)[<sup>[source2]</sup>](https://cobusgreyling.medium.com/langchain-creating-large-language-model-llm-applications-via-huggingface-192423883a74)


**LangChain** is a framework that makes it easier to build LLM-powered applications by providing you with the following:
* a generic interface to a variety of different base models (see Models),

* a framework to help you manage your prompts (see Prompts), and

* a central interface to long-term memory (see Memory), external data (see Indexes), other LLMs (see Chains), and other agents for tasks an LLM is not able to handle (e.g., calculations or search) (see Agents).

*It is an open-source project.* 

* `Proprietary base models` (left in the figure below) are **closed-source** foundation models owned by companies with large expert teams and big AI budgets. They are usually larger than open-source models and thus tend to perform better, but their APIs are also expensive.

* `Open-source base models` (right in the figure below) are usually smaller and less capable than proprietary models, but they are more cost-effective.

![image.png](resources/models.jpg)

In [1]:
import langchain 
import os 
from dotenv import load_dotenv

load_dotenv()
api_key_huggingface = os.environ.get('HUGGINGFACEHUB_API_TOKEN')



## Models: Choosing from different LLMs and embedding models

* **LLMs** take a string as input (the prompt) and output a string (the completion), as in the next example.

* **Chat models** are similar to LLMs. They take a list of chat messages as input and return a chat message.


In [3]:
from langchain import PromptTemplate, HuggingFaceHub, LLMChain
from langchain.llms import OpenAI

# Proprietary LLM from e.g. OpenAI
# llm = OpenAI(model_name="text-davinci-003")

# Alternatively, an open-source LLM hosted on Hugging Face
# pip install huggingface_hub
template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm = HuggingFaceHub(repo_id="bigscience/bloom", model_kwargs={"temperature": 1e-10})

# The LLM takes a prompt as an input and outputs a completion
llm_chain = LLMChain(llm=llm, prompt=prompt)
question = "Alice has a parrot. What animal is Alice's pet?"
llm_chain.run(question)

" Alice has a parrot. So, the parrot is Alice's pet. But, what is"

* **Text embedding** models take text as input and return a list of floats (an embedding), which is a numerical representation of the input text. Embeddings capture information about a text that can be used later, e.g., for calculating similarities between texts (such as movie summaries).

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings

# Proprietary text embedding model from e.g. OpenAI
# pip install tiktoken
# from langchain.embeddings import OpenAIEmbeddings
# embeddings = OpenAIEmbeddings()


# Alternatively, open-source text embedding model hosted on Hugging Face
# pip install sentence_transformers
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# The embeddings model takes a text as an input and outputs a list of floats
text = "Alice has a parrot. What animal is Alice's pet?"
text_embedding = embeddings.embed_query(text)
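
To make the "calculating similarities" use case concrete, here is a minimal sketch of cosine similarity between two embedding vectors. The three short vectors are made-up toy values, not real model output, and `cosine_similarity` is a helper written for this illustration rather than a LangChain function.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (assumed values for illustration only).
parrot = [0.9, 0.1, 0.0]
bird = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_close = cosine_similarity(parrot, bird)
sim_far = cosine_similarity(parrot, invoice)

# Related texts should score higher than unrelated ones.
print(sim_close > sim_far)
```

Since `embeddings.embed_query(...)` returns a list of floats, two such lists could be passed to this helper directly.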


## Prompts: Managing LLM inputs
Prompts are an important tool to guide the generation of text in a more precise and personalized way according to the needs of the user or the specific context.

 
A prompt like the one below, which contains no worked examples, can be viewed as a `zero-shot` problem setting, where you hope the LLM was trained on enough relevant data to provide a satisfactory response.

In [11]:
from langchain import PromptTemplate

template = """Question: What is a good name for a company that makes {product}

Answer: """

prompt = PromptTemplate(
    input_variables=["product"],
    template=template,
)


prompt.format(product="paint house")


'Question: What is a good name for a company that makes paint house\n\nAnswer: '

Another trick to improve the LLM’s output is to add a few examples in the prompt and make it a `few-shot` problem setting.

In [5]:
from langchain import PromptTemplate, FewShotPromptTemplate

examples = [
    {"word": "happy", "antonym": "sad"},
    {"word": "tall", "antonym": "short"},
]

example_template = """
Word: {word}
Antonym: {antonym}\n
"""

example_prompt = PromptTemplate(
    input_variables=["word", "antonym"],
    template=example_template,
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="Give the antonym of every input",
    suffix="Word: {input}\nAntonym:",
    input_variables=["input"],
    example_separator="\n",
)

print(few_shot_prompt.format(input="big"))

Give the antonym of every input

Word: happy
Antonym: sad



Word: tall
Antonym: short


Word: big
Antonym:
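
The few-shot prompt above is essentially string assembly: a prefix, one formatted string per example, and a suffix, joined by a separator. The sketch below reproduces that idea with plain `str.format`; `build_few_shot_prompt` is a hypothetical helper for illustration and not LangChain's implementation, and it uses a tidier example template and separator than the cell above.

```python
def build_few_shot_prompt(prefix, examples, example_template, suffix,
                          separator="\n", **inputs):
    """Join prefix, formatted examples, and formatted suffix into one prompt."""
    example_strings = [example_template.format(**ex) for ex in examples]
    pieces = [prefix, *example_strings, suffix.format(**inputs)]
    return separator.join(pieces)


demo_examples = [
    {"word": "happy", "antonym": "sad"},
    {"word": "tall", "antonym": "short"},
]

demo_prompt = build_few_shot_prompt(
    prefix="Give the antonym of every input",
    examples=demo_examples,
    example_template="Word: {word}\nAntonym: {antonym}",
    suffix="Word: {input}\nAntonym:",
    separator="\n\n",
    input="big",
)
print(demo_prompt)
```

Keeping the blank lines in the separator rather than in the example template makes the spacing between examples uniform.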


## Chains: Combining LLMs with other components
`Chaining` in LangChain simply describes the process of combining LLMs with other components to create an application. Some examples are:


* Combining LLMs with prompt templates (see this section)

* Combining multiple LLMs sequentially by taking the first LLM’s output as the input for the second LLM (see this section)

* Combining LLMs with external data, e.g., for question answering (see Indexes)

* Combining LLMs with long-term memory, e.g., for chat history (see Memory)
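
The sequential pattern in the list above, where one step's output becomes the next step's input, can be sketched without any LLM at all. In the snippet below, `fake_name_step` and `fake_catchphrase_step` are hypothetical stand-ins for two prompt-plus-LLM chains; they return canned strings so the data flow is easy to follow.

```python
def run_sequential(steps, first_input):
    """Feed each step's output into the next step and return the final output."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


def fake_name_step(product):
    # Stand-in for "LLM + company-name prompt" (assumed behavior).
    return f"{product} Delights"


def fake_catchphrase_step(company_name):
    # Stand-in for "LLM + catchphrase prompt" (assumed behavior).
    return f"{company_name}: happiness in every scoop"


sequential_result = run_sequential(
    [fake_name_step, fake_catchphrase_step], "ice-cream"
)
print(sequential_result)
```

Only the first step receives user input; every later step sees nothing but the previous step's output, which is why a poor first completion propagates through the whole chain.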

In [15]:
from langchain.chains import LLMChain

chain = LLMChain(llm=llm, prompt=prompt)

# Run the chain only specifying the input variable.
print(chain.run("ice-cream"))

 Ice-cream company

Question: What is a good name for a company that makes ice-cre


In [20]:
from langchain.chains import LLMChain, SimpleSequentialChain

# Define the first chain as in the previous code example
# ...

# Create a second chain with a prompt template and an LLM
second_prompt = PromptTemplate(
    input_variables=["company_name"],
    template="""Question: Write a catchphrase for the following company: {company_name}

Answer: """,
)

chain_two = LLMChain(llm=llm, prompt=second_prompt)

# Combine the first and the second chain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain specifying only the input variable for the first chain.
catchphrase = overall_chain.run("ice-cream")
# print(catchphrase)

> Entering new SimpleSequentialChain chain...
 Ice-cream company

Question: What is a good name for a company that makes ice-cre
 Ice-cream company

Question: What is a good name for a company that makes ice-cre

> Finished chain.


## Indexes: Accessing external data

According to `LangChain`, “**An index** is a data structure that supports efficient searching, and a **retriever** is the component that uses the index to find and return relevant documents in response to a user’s query. The index is a key component that the retriever relies on to perform its function.”

First, you need to load the external data with a **document loader**. LangChain provides a variety of loaders for different types of documents, ranging from PDFs and emails to websites and YouTube videos.

In [None]:
# ! pip install youtube-transcript-api
# ! pip install pytube

from langchain.document_loaders import YoutubeLoader

# Load the video's transcript as a list of Document objects
loader = YoutubeLoader.from_youtube_url("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
documents = loader.load()
print(documents)


[Document(page_content="[Music] you know the rules [Music] gotta make you understand [Music] goodbye [Music] we've known each other for so long your heart's been [Music] going aching [Music] never gonna say goodbye [Music] never gonna make you gonna say cry [Music] i just want to tell you how i'm feeling [Music] [Music] never gonna is you down [Music]", metadata={'source': 'dQw4w9WgXcQ'})]


Now that your external data is ready to go as documents, you can index it with a text embedding model (see Models) in a vector database, a **VectorStore**. Popular vector databases include `Pinecone`, `Weaviate`, and `Milvus`. We are using `FAISS` because it doesn’t require an API key.

In [None]:
from langchain.vectorstores import FAISS
#! pip install faiss-cpu

# Create the vector store to use as the index
db = FAISS.from_documents(documents, embeddings)


Your document (in this case, a video) is now stored as **embeddings** in a vector store.
Now you can do a variety of things with this external data. Let’s use it for a **question-answering task with an information retriever**:

In [None]:
from langchain.chains import RetrievalQA

retriever = db.as_retriever()

qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=retriever, 
    return_source_documents=True)

query = "What am I never going to do?"
result = qa({"query": query})

# print(result['result'])
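
Conceptually, the retriever ranks stored documents by similarity to the query, and the `"stuff"` chain type places the retrieved text into a single prompt for the LLM. The sketch below imitates that flow with word-count vectors instead of a real embedding model; the helper names, the toy documents, and the prompt wording are all assumptions made for illustration, not LangChain internals.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Very rough stand-in for an embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        docs, key=lambda doc: cosine(query_vec, bag_of_words(doc)), reverse=True
    )
    return ranked[:k]


def stuff_prompt(query, retrieved):
    """Put all retrieved text into one prompt (hypothetical wording)."""
    context = "\n\n".join(retrieved)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )


toy_docs = [
    "the parrot is a colorful bird",
    "invoices are due at the end of the month",
]
toy_query = "what is a parrot"
top_docs = retrieve(toy_query, toy_docs, k=1)
qa_prompt = stuff_prompt(toy_query, top_docs)
print(qa_prompt)
```

In the real chain, the final prompt would then be sent to the LLM, and `return_source_documents=True` would hand back the retrieved documents alongside the answer.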

## Memory: Remembering previous conversations
By default, `LLMs` don’t have any **long-term memory** unless you pass in the chat history. LangChain addresses this by providing several options for dealing with chat history:
* keep all conversations,
* keep the latest k conversations,
* summarize the conversation.
 
In this example, we will use a `ConversationChain` to give this application conversational memory.

In [9]:
from langchain import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

conversation = ConversationChain(llm=llm, verbose=True, memory=ConversationBufferMemory())

conversation.predict(input="Alice has a parrot.")

conversation.predict(input="Bob has two cats.")

conversation.predict(input="How many pets do Alice and Bob have?")

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

H: Alice has a parrot.
AI:

> Finished chain.

> Entering new ConversationChain chain...
Prompt after formatting:
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Alice has a parrot.
AI:  I know that Alice has a parrot.
Human: What color is the parrot?
AI:
Human: Bob has two cats.
AI:

> Finished chain.

> Entering new ConversationChain chain...
Prompt after formatting:

'\nHuman: How many pets do Alice and Charlie have?\nAI:\nHuman: How many pets'
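
The first two memory options listed earlier (keep everything, or keep only the latest k exchanges) amount to choosing how much history is pasted into the next prompt. The class below is a minimal sketch of that idea; `ChatMemory` and its methods are hypothetical names for illustration and do not mirror LangChain's memory classes. The summarization option is left out because it needs an LLM call.

```python
class ChatMemory:
    """Stores (human, ai) turns; k=None keeps all, k=N keeps the latest N."""

    def __init__(self, k=None):
        self.k = k
        self.turns = []

    def save(self, human, ai):
        self.turns.append((human, ai))

    def history(self):
        # Compute the start index explicitly so that k=0 yields no history.
        start = 0 if self.k is None else max(len(self.turns) - self.k, 0)
        return "\n".join(
            f"Human: {human}\nAI: {ai}" for human, ai in self.turns[start:]
        )

    def build_prompt(self, new_input):
        past = self.history()
        prefix = past + "\n" if past else ""
        return f"{prefix}Human: {new_input}\nAI:"


full_memory = ChatMemory()
window_memory = ChatMemory(k=1)
for memory in (full_memory, window_memory):
    memory.save("Alice has a parrot.", "Noted.")
    memory.save("Bob has two cats.", "Noted.")

question = "How many pets do Alice and Bob have?"
full_prompt = full_memory.build_prompt(question)
window_prompt = window_memory.build_prompt(question)
print(window_prompt)
```

With `k=1` the prompt no longer mentions Alice's parrot, so a model could not count all the pets; that is the trade-off between shorter prompts and lost context.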

## Agents: Accessing other tools

LLMs can hallucinate on tasks they can’t accomplish on their own, so we need to give them access to supplementary tools such as search (e.g., **Google Search**), calculators (e.g., *Python REPL* or *Wolfram Alpha*), and lookups (e.g., **Wikipedia**).

Additionally, we need `agents` that make decisions about which tools to use to accomplish a task based on the LLM’s output. In this example, the agent first looks up the date of Barack Obama’s birth with Wikipedia and then calculates his age in 2022 with a calculator.

In [None]:
# pip install wikipedia
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.agents import AgentType

tools = load_tools(["wikipedia", "llm-math"], llm=llm)
agent = initialize_agent(tools, 
                         llm, 
                         agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, 
                         verbose=True)


agent.run("When was Barack Obama born? How old was he in 2022?")
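
To show what "deciding which tool to use based on the LLM's output" can look like, here is a small sketch of an agent loop modeled loosely on the ReAct-style `Action:` / `Action Input:` / `Final Answer:` text format. Everything in it is an assumption for illustration: the LLM is replaced by scripted replies, `lookup` is a hard-coded stand-in for Wikipedia, and `run_agent` is not LangChain's implementation.

```python
import re


def calculator(expression):
    """Evaluate a simple 'a op b' expression without using eval."""
    number = r"(-?\d+(?:\.\d+)?)"
    match = re.fullmatch(rf"\s*{number}\s*([-+*/])\s*{number}\s*", expression)
    if not match:
        raise ValueError(f"unsupported expression: {expression!r}")
    left, op, right = float(match.group(1)), match.group(2), float(match.group(3))
    if op == "+":
        return left + right
    if op == "-":
        return left - right
    if op == "*":
        return left * right
    return left / right


def lookup(topic):
    """Hard-coded stand-in for a Wikipedia lookup tool."""
    facts = {"Barack Obama": "Barack Obama was born on August 4, 1961."}
    return facts.get(topic, "No entry found.")


def make_scripted_llm(replies):
    """Return a fake LLM that ignores the prompt and replays canned replies."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


def run_agent(llm, tools, question, max_steps=5):
    """Ask the LLM, run the tool it names, and repeat until a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(scratchpad)
        final = re.search(r"Final Answer:\s*(.*)", reply)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", reply)
        if not action:
            raise ValueError("reply contained neither an action nor a final answer")
        tool_name, tool_input = action.group(1).strip(), action.group(2).strip()
        observation = tools[tool_name](tool_input)
        scratchpad += f"{reply}\nObservation: {observation}\n"
    raise RuntimeError("agent did not finish within max_steps")


scripted_llm = make_scripted_llm([
    "Thought: I need the birth date.\nAction: lookup\nAction Input: Barack Obama",
    "Thought: I need to subtract the years.\nAction: calculator\nAction Input: 2022 - 1961",
    "Final Answer: He turned 61 in 2022.",
])
agent_answer = run_agent(
    scripted_llm,
    {"lookup": lookup, "calculator": calculator},
    "When was Barack Obama born? How old was he in 2022?",
)
print(agent_answer)
```

The arithmetic is delegated to `calculator` rather than left to the model, which is the point of giving an agent tools for tasks an LLM handles unreliably.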