# Getting started with LangChain




In this quickstart, we'll show you how to work with simple LLM functionality using LangChain. This is a great way to get started with LangChain: a lot of features can be built with just some prompting and an LLM call!

After walking through this tutorial, you'll have a high-level overview of:

*   Invoking a Chat Model
*   Prompting methods
*   Using multimodal models
*   Tracking token usage and data
*   Using LangChain Expression Language (LCEL) to chain components together
*   Debugging and tracing your application using LangSmith

For more, see the [LangChain tutorials](https://python.langchain.com/docs/tutorials/).

## Setting up LangChain

This guide (like most of the other guides in the documentation) uses Jupyter notebooks. Notebooks are great for learning how to work with LLM systems: things often go wrong (unexpected output, the API is down, etc.), and working through a guide in an interactive environment is a great way to understand what is happening.

To install LangChain, run:

In [3]:
%pip install --quiet -U langchain

## Using Language Models

First up, let's learn how to use a language model by itself. LangChain supports many different language models that you can use interchangeably. For example, you can use models from:
*   Anthropic
*   OpenAI
*   Azure
*   Google
... and many more.

This is what is great about LangChain. It allows you to stay model-agnostic!

Chat models are language models that use a sequence of messages as inputs and return messages as outputs (as opposed to using plain text). These are generally newer models.

For this notebook, we will use Google's Gemini models, so head over to https://ai.google.dev/gemini-api/docs/api-key to generate a Google AI API key. Once you have one, set the `GOOGLE_API_KEY` environment variable. We also try an OpenAI model later on, so set `OPENAI_API_KEY` as well if you have one.

In [6]:
import getpass
import os

def _set_env(var: str):
    os.environ[var] = getpass.getpass(f"{var}: ")

In [7]:
_set_env("GOOGLE_API_KEY")

GOOGLE_API_KEY: ··········


In [8]:
_set_env("OPENAI_API_KEY")

OPENAI_API_KEY: ··········


You will also need to install the integration packages for the models you select:

In [9]:
%pip install -qU langchain-google-genai langchain-openai


## Setting up LangSmith

Many of the applications you build with LangChain will contain multiple steps with multiple LLM calls. As these applications grow more complex, it becomes crucial to be able to inspect exactly what is going on inside your chain or agent. The best way to do this is with [LangSmith](https://smith.langchain.com/). If you don't have an account already, head over to LangSmith to create a free account.

In [10]:
_set_env("LANGCHAIN_API_KEY")

LANGCHAIN_API_KEY: ··········


We also need to set a few more environment variables. If `LANGCHAIN_PROJECT` is not set, traces are logged to the `default` project.

In [11]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"  # turn on LangSmith tracing
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_PROJECT"] = "ai-agent-builder-workshop"  # traces are grouped under this project

# Working with LLMs

Setup is complete! Let's get started with some LLM magic!

## Invoking an LLM

Let's first use the model directly. Chat models are instances of LangChain "Runnables", which means they expose a standard interface for interacting with them. To call the model, pass a string or a list of messages to the `.invoke` method.

In [12]:
from langchain_openai import ChatOpenAI

llm_openai = ChatOpenAI(model="gpt-4o-mini")  # or "gpt-3.5-turbo"
llm_openai.invoke("Sing a ballad of LangChain")

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

Oops! This OpenAI account has exceeded its quota. Because we've enabled LangSmith, this failed run is still logged there, and we can inspect its trace.

Not a problem. Let's fall back to Google's Gemini 1.5 Pro model instead.

In [13]:
from langchain_google_genai import ChatGoogleGenerativeAI

llm_gemini_pro = ChatGoogleGenerativeAI(model="gemini-1.5-pro")  # or "gemini-1.5-flash"
result = llm_gemini_pro.invoke("Sing a ballad of LangChain.")
print(result.content)

(Verse 1)
In silicon valleys, where code weaves its spell,
A new tool arose, its story to tell.
LangChain the name, a chain forged with care,
To link language models, beyond compare.
From GPT's vast knowledge, to agents so bright,
It wove them together, with all of its might.

(Verse 2)
The whispers began, 'mongst coders so keen,
Of chains that could reason, and sights unforeseen.
No longer confined to a single domain,
But reaching across, a vast, boundless plain.
With APIs humming, and data in flow,
LangChain empowered, the seeds it did sow.

(Verse 3)
A chatbot awakened, with wit sharp and keen,
Responding to queries, a conversational queen.
A question was posed, and the chain sprang to life,
Retrieving the answer, resolving the strife.
From databases deep, to the web's endless store,
LangChain delivered, and asked for no more.

(Verse 4)
The agents arose, with purpose so clear,
To act on our words, dispelling all fear.
A task to complete, a goal to attain,
LangChain orchestrated, ag

In [14]:
from langchain_core.messages import HumanMessage, SystemMessage

llm_gemini_flash = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
messages = [
    SystemMessage(content="Translate the following from English into Italian"),
    HumanMessage(content="hi!"),
]

result = llm_gemini_flash.invoke(messages)
print(result.content)

Ciao! 



## Model Basics

There are multiple parameters we can specify with our model, such as:


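**Temperature** - The lower the temperature, the more deterministic the results, because the most probable next token is picked more and more consistently. Higher temperatures produce more varied, creative output.

**Top P** - A sampling technique used together with temperature, called nucleus sampling: the model samples only from the smallest set of tokens whose cumulative probability reaches P, so a lower value makes the model more deterministic. If you are looking for exact and factual answers, keep this low.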

> The general recommendation is to alter temperature or Top P but not both.
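To make the two sampling knobs concrete, here is a toy sketch in plain Python (no LangChain or provider API involved) of how temperature rescales a next-token distribution and how top-p (nucleus) filtering keeps only the most probable tokens. The logits are made-up numbers; real providers apply these steps inside the model server, and their exact implementations may differ.

```python
import math


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most-probable tokens whose cumulative probability reaches top_p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())  # renormalise so the kept probabilities sum to 1
    return {tok: p / total for tok, p in kept.items()}


logits = {"cat": 2.0, "dog": 1.0, "fish": 0.1}  # made-up next-token scores
cold = softmax_with_temperature(logits, temperature=0.2)
hot = softmax_with_temperature(logits, temperature=2.0)
print(round(cold["cat"], 3), round(hot["cat"], 3))
print(top_p_filter(softmax_with_temperature(logits, 1.0), top_p=0.7))
```

With these made-up scores, `cat` takes about 99% of the probability at temperature 0.2 but only about 50% at 2.0, and top-p = 0.7 drops the unlikely `fish` token entirely.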

**Max Length** - You can manage the number of tokens the model generates by adjusting the max length (the `max_tokens` parameter in the example below). Specifying a max length helps you prevent long or irrelevant responses and control costs.

**Stop Sequences** - A stop sequence is a string that, once generated, stops the model from generating further tokens. Specifying stop sequences is another way to control the length and structure of the model's response. For example, you can tell the model to generate lists that have no more than 10 items by adding "11" as a stop sequence.
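Max length and stop sequences both end generation early. The toy loop below (plain Python; the canned token stream stands in for a model and is purely illustrative) stops either after `max_tokens` tokens or as soon as the generated text contains a stop sequence, mirroring the "11" list example. Many provider APIs leave the stop sequence itself out of the returned text, and the sketch does the same.

```python
from typing import Iterable


def generate(tokens: Iterable[str], max_tokens: int, stop: list[str]) -> str:
    """Concatenate tokens until max_tokens is reached or a stop sequence appears."""
    text = ""
    for count, token in enumerate(tokens, start=1):
        text += token
        for seq in stop:
            index = text.find(seq)
            if index != -1:
                return text[:index]  # cut the stop sequence (and anything after it)
        if count >= max_tokens:
            break
    return text


# A fake "model" that would happily list 15 items, one token per line.
fake_stream = [f"{i}. item\n" for i in range(1, 16)]

print(generate(fake_stream, max_tokens=100, stop=["11"]))  # stops before item 11
print(generate(fake_stream, max_tokens=3, stop=[]))        # stops after 3 tokens
```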

For more on these settings and on prompting in general, see the [Prompting Guide](https://www.promptingguide.ai/).

In [21]:
llm_gemini_flash = ChatGoogleGenerativeAI(
    model="gemini-1.5-flash",
    temperature=0.9,  # fairly high: more varied, creative output
    max_tokens=100,   # cap the number of generated tokens
    timeout=None,
    max_retries=2,
)
result = llm_gemini_flash.invoke("Sing a ballad of LangChain.")
print(result.content)

(Verse 1)
In the realm of code, where knowledge resides,
A chain of thought, a tool that confides.
LangChain, they call it, a name that rings true,
A symphony of words, a language anew.

(Chorus)
Oh, LangChain, LangChain, a weaver of text,
Connecting minds, no human can contest.
With prompts and models, you bridge the divide,
Unlocking secrets, where wisdom does hide.


## LCEL: LangChain Expression Language

LangChain Expression Language (LCEL) allows us to chain together LangChain modules. There are several benefits to this approach, including optimized streaming and tracing support.

More commonly, we "chain" the model with a prompt template and an output parser. Here `StrOutputParser` extracts the text content from the model's reply, and it is called on every invocation of the chain. The chain takes the input type of its first step (here, a dict of template variables for the prompt template) and returns the output type of the output parser (a string).
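The `|` operator is what makes this composition work: each step's output becomes the next step's input. As a rough mental model only (this is a toy class written for illustration, not how `langchain_core` actually implements Runnables), the pattern can be sketched in plain Python:

```python
from typing import Any, Callable


class Step:
    """Toy stand-in for a LangChain Runnable: wraps a function and supports `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: run self first, then feed its output to `other`.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical steps mimicking prompt -> model -> parser.
toy_prompt = Step(lambda d: f"Translate the following into {d['language']}: {d['text']}")
toy_model = Step(lambda text: {"content": text.upper()})  # pretend LLM returning a message-like dict
toy_parser = Step(lambda message: message["content"])     # pull out the string, like StrOutputParser

toy_chain = toy_prompt | toy_model | toy_parser
print(toy_chain.invoke({"language": "italian", "text": "hi"}))
```

As with the real chain, the composed object accepts the first step's input type (a dict) and returns the last step's output type (a string).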

In [15]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Prompt template with two variables: {language} and {text}
system_template = "Translate the following into {language}:"
prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", "{text}")]
)
parser = StrOutputParser()  # extracts the message content as a plain string

# prompt -> model -> parser
chain = prompt_template | llm_gemini_flash | parser

chain.invoke({"language": "italian", "text": "hi"})

'Ciao! \n'

## Structured output

It is often useful to have a model return output that matches a specific schema. One common use-case is extracting data from text to insert into a database or use with some other downstream system.

The `.with_structured_output()` method is the easiest and most reliable way to get structured outputs. `with_structured_output()` is implemented for models that provide native APIs for structuring outputs, like tool/function calling or JSON mode, and makes use of these capabilities under the hood.

For details, see: https://python.langchain.com/v0.1/docs/modules/model_io/chat/structured_output/
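`with_structured_output()` hides the parsing step from you. Conceptually, the model emits JSON arguments matching the schema (through tool calling or JSON mode) and LangChain validates them into your class. The stdlib-only sketch below mimics only that last validation step; the dataclass stands in for the Pydantic `Joke` model in the next cell, and the hard-coded JSON string stands in for a real model response. Neither is LangChain's actual implementation.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class JokeRecord:
    """Plain-dataclass stand-in for the Pydantic `Joke` schema."""
    setup: str
    punchline: str
    rating: Optional[int] = None  # optional field: may be missing from the model output


def parse_joke(raw_json: str) -> JokeRecord:
    """Parse and minimally validate a JSON object against the JokeRecord fields."""
    data = json.loads(raw_json)
    for required in ("setup", "punchline"):
        if not isinstance(data.get(required), str):
            raise ValueError(f"missing or non-string field: {required}")
    rating = data.get("rating")
    if rating is not None and not isinstance(rating, int):
        raise ValueError("rating must be an integer")
    return JokeRecord(data["setup"], data["punchline"], rating)


# Hard-coded example of what a model's tool-call arguments might look like.
raw = '{"setup": "Why was the cat sitting on the computer?", "punchline": "To keep an eye on the mouse."}'
print(parse_joke(raw))
```

Pydantic does this validation (and much more) for you, which is why the real example below simply passes the `Joke` class.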

In [17]:
from typing import Optional

from pydantic import BaseModel, Field

class Joke(BaseModel):
    '''Joke to tell user.'''

    setup: str = Field(description="The setup of the joke")
    punchline: str = Field(description="The punchline to the joke")
    rating: Optional[int] = Field(default=None, description="How funny the joke is, from 1 to 10")

structured_llm = llm_gemini_flash.with_structured_output(Joke)
print(structured_llm.invoke("Tell me a joke about cats"))

setup="Why don't cats play poker?" punchline='\\"Why don\'t cats play poker? Because they always have an ace up their sleeve!\\"' rating=7


## Chain-of-Thought Prompting

Chain-of-thought (CoT) prompting, introduced by Wei et al. (2022) in "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models", is a technique for improving the reasoning capabilities of large language models (LLMs), especially on multi-step reasoning tasks.

In contrast to standard prompting, where the model is asked to produce the final answer directly, chain-of-thought prompting encourages the LLM to generate intermediate reasoning steps before giving the final answer. The advantage of this technique is that it breaks complex problems into manageable intermediate steps, so the model-generated 'chain of thought' can mimic how a person works through a multi-step problem.

Reference: https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/language/prompts/examples/chain_of_thought_react.ipynb
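The key difference between the two cells below is the worked example: the first gives only the answer (standard few-shot prompting), while the second spells out the intermediate steps (chain-of-thought). A small helper (hypothetical, plain Python) that builds either kind of few-shot prompt from worked examples:

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    reasoning: str  # worked steps; ignored for standard prompting
    answer: str


def build_few_shot_prompt(exemplars: list[Exemplar], question: str, chain_of_thought: bool) -> str:
    """Format Q/A exemplars, optionally including reasoning, followed by the new question."""
    parts = []
    for ex in exemplars:
        if chain_of_thought:
            parts.append(f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}.")
        else:
            parts.append(f"Q: {ex.question}\nA: The answer is {ex.answer}.")
    parts.append(f"Q: {question}\nA:")
    return "\n".join(parts)


roger = Exemplar(
    question="Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?",
    reasoning="Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11.",
    answer="11",
)
cafeteria = "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?"

print(build_few_shot_prompt([roger], cafeteria, chain_of_thought=True))
```

The resulting string can be passed to `llm_gemini_flash.invoke(...)` just like the hand-written `question` strings.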

In [24]:
question = """Q: Roger has 500 tennis balls. He buys 2 more cans of tennis balls.
Each can has 300 tennis balls. How many tennis balls does he have now?
A: Answer is 1100.
Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have left?
A: """

result = llm_gemini_flash.invoke(question)
print(result.content)

Here's how to solve the apple problem:

* **Start with the initial amount:** The cafeteria had 23 apples.
* **Subtract the apples used:** They used 20 apples, so 23 - 20 = 3 apples left.
* **Add the new apples:** They bought 6 more apples, so 3 + 6 = 9 apples.

**Answer:** The cafeteria has 9 apples left. 



In [25]:
question = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls
each is 6 tennis balls. 5 + 6 = 11. The answer is 11.
Q: The cafeteria had 23 apples.
If they used 20 to make lunch and bought 6 more, how many apples do they have?
A:"""

result = llm_gemini_flash.invoke(question)
print(result.content)

Here's how to solve the apple problem:

* **Start with the initial amount:** The cafeteria had 23 apples.
* **Subtract the apples used:** They used 20 apples, so 23 - 20 = 3 apples left.
* **Add the new apples:** They bought 6 more apples, so 3 + 6 = 9 apples.

**Answer:** The cafeteria has 9 apples. 



## Multimodality

Some models can accept multimodal inputs, such as images, audio, video, or files. The types of multimodal inputs supported depend on the model provider. For instance, Google's Gemini supports documents like PDFs as inputs.

Most chat models that support multimodal inputs also accept them in OpenAI's content-block format. So far this is restricted to image inputs. For models such as Gemini, which also accept video and other binary inputs, the APIs also support native, model-specific representations.

The gist of passing multimodal inputs to a chat model is to use content blocks that specify a type and corresponding data. For example, to pass an image to a chat model:
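A content block is just a dict. A small helper (the name `image_block` is hypothetical) that base64-encodes raw image bytes into the OpenAI-style `image_url` block; the bytes here are a placeholder, not a real image:

```python
import base64


def image_block(image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an OpenAI-style image_url content block (base64 data URL)."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


content = [
    {"type": "text", "text": "Describe this image."},
    image_block(b"\xff\xd8\xff placeholder bytes"),  # placeholder, not a valid JPEG
]
print(content[1]["image_url"]["url"][:40])
```

Wrapping such a list in `HumanMessage(content=content)` sends both blocks in a single message; with real image bytes (for example, downloaded with `httpx`) a multimodal model can then answer questions about the picture.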

In [26]:
import base64

import httpx
from langchain_core.messages import HumanMessage

image_url = "https://live.staticflickr.com/2563/3851272354_aeb5981d89_b.jpg"
response = httpx.get(image_url, follow_redirects=True)
response.raise_for_status()  # fail early if the download did not succeed
image_data = base64.b64encode(response.content).decode("utf-8")
message = HumanMessage(
    content=[
        {"type": "text", "text": "who is the character in this image?"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ]
)
ai_msg = llm_gemini_flash.invoke([message])
ai_msg.content

'Donald Duck'

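## Token Usage and Response Metadata

The returned `AIMessage` also reports how many tokens the call consumed (`usage_metadata`) and provider-specific details such as the finish reason and safety ratings (`response_metadata`).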

In [27]:
ai_msg.usage_metadata

{'input_tokens': 267,
 'output_tokens': 2,
 'total_tokens': 269,
 'input_token_details': {'cache_read': 0}}
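`usage_metadata` is a plain dict, so tracking token usage across several calls is just addition. A sketch (the helper name and the second set of counts are made up for illustration; the first dict copies the counts shown above):

```python
from collections import Counter


def total_usage(usages: list[dict]) -> dict:
    """Sum the integer token counts from several usage_metadata dicts."""
    totals: Counter = Counter()
    for usage in usages:
        for key in ("input_tokens", "output_tokens", "total_tokens"):
            totals[key] += usage.get(key, 0)
    return dict(totals)


calls = [
    {"input_tokens": 267, "output_tokens": 2, "total_tokens": 269},  # the image call above
    {"input_tokens": 10, "output_tokens": 40, "total_tokens": 50},   # made-up second call
]
print(total_usage(calls))  # → {'input_tokens': 277, 'output_tokens': 42, 'total_tokens': 319}
```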

In [28]:
ai_msg.response_metadata

{'prompt_feedback': {'block_reason': 0, 'safety_ratings': []},
 'finish_reason': 'STOP',
 'safety_ratings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT',
   'probability': 'NEGLIGIBLE',
   'blocked': False},
  {'category': 'HARM_CATEGORY_HATE_SPEECH',
   'probability': 'NEGLIGIBLE',
   'blocked': False},
  {'category': 'HARM_CATEGORY_HARASSMENT',
   'probability': 'NEGLIGIBLE',
   'blocked': False},
  {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT',
   'probability': 'NEGLIGIBLE',
   'blocked': False}]}

## Tool (function) calling

Function calling allows you to connect models to external tools and systems. This is useful for many things such as empowering AI assistants with capabilities, or building deep integrations between your applications and the models.

**Example use cases**

Function calling is useful for a large number of use cases, such as:
- Enabling assistants to fetch data: an AI assistant needs to fetch the latest customer data from an internal system when a user asks “what are my recent orders?” before it can generate the response to the user.
- Enabling assistants to take actions: an AI assistant needs to schedule meetings based on user preferences and calendar availability.
- Enabling assistants to perform computation: a math tutor assistant needs to perform a math computation.
- Building rich workflows: a data extraction pipeline that fetches raw text, then converts it to structured data and saves it in a database.
- Modifying your applications' UI: you can use function calls that update the UI based on user input, for example, rendering a pin on a map.


**References**
1.   https://platform.openai.com/docs/guides/function-calling
2.   https://python.langchain.com/docs/how_to/function_calling/#passing-tools-to-llms
3.   https://medium.com/@ranadevrat/understanding-langchain-agents-a-beginners-guide-8a87708dc48e
4.   https://python.langchain.com/docs/how_to/tool_results_pass_to_model/

In [29]:
from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    '''Get the current weather in a given location'''
    location: str = Field(
        ..., description="The city and state, e.g. San Francisco, CA"
    )

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(
        ..., description="The city and state, e.g. San Francisco, CA"
    )


llm_with_tools = llm_gemini_flash.bind_tools([GetWeather, GetPopulation])

In [30]:
ai_msg = llm_with_tools.invoke(
    "Which city is hotter today: LA or NY?"
)
ai_msg.tool_calls

[{'name': 'GetWeather',
  'args': {'location': 'Los Angeles, CA'},
  'id': 'f6d8cdd8-d2c4-428b-a839-aee2693094c5',
  'type': 'tool_call'},
 {'name': 'GetWeather',
  'args': {'location': 'New York, NY'},
  'id': '8e911f9d-86ff-4e06-875a-c634594edf79',
  'type': 'tool_call'}]

In [31]:
ai_msg = llm_with_tools.invoke(
    "Which city is bigger: LA or NY?"
)
ai_msg.tool_calls

[{'name': 'GetPopulation',
  'args': {'location': 'Los Angeles, CA'},
  'id': 'b0de61ae-d2ca-4435-9d96-b02f6d52f2a8',
  'type': 'tool_call'},
 {'name': 'GetPopulation',
  'args': {'location': 'New York, NY'},
  'id': 'a1cdef44-3072-4969-a7dd-1375f8815d3e',
  'type': 'tool_call'}]
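The model only *requests* tool calls; your code has to run them. A minimal dispatch sketch: the two Python functions are hypothetical stand-ins returning canned strings (no real weather or population lookup), and the input copies the `tool_calls` shape printed above with made-up ids. In LangChain, each result would then typically be sent back to the model as a `ToolMessage` carrying the matching `tool_call_id` (see reference 4 above).

```python
def get_weather(location: str) -> str:
    # Hypothetical stand-in: a real tool would call a weather API here.
    return f"Sunny in {location}"


def get_population(location: str) -> str:
    # Hypothetical stand-in: a real tool would query a data source here.
    return f"Population data for {location} not available in this sketch"


# Map the tool names the model sees to the Python callables that implement them.
TOOLS = {"GetWeather": get_weather, "GetPopulation": get_population}


def run_tool_calls(tool_calls: list[dict]) -> dict[str, str]:
    """Execute each requested tool and return its output keyed by tool-call id."""
    results = {}
    for call in tool_calls:
        func = TOOLS[call["name"]]
        results[call["id"]] = func(**call["args"])
    return results


example_calls = [
    {"name": "GetWeather", "args": {"location": "Los Angeles, CA"}, "id": "call-1", "type": "tool_call"},
    {"name": "GetWeather", "args": {"location": "New York, NY"}, "id": "call-2", "type": "tool_call"},
]
print(run_tool_calls(example_calls))
```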

## Document Parsers & Vector Stores (optional)

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing embedded data and performing vector search for you.
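The retrieval step boils down to "embed, then rank by similarity". A toy sketch with hand-written 3-dimensional vectors standing in for real embeddings (a real setup would get them from an embedding model and let a vector store do the search) ranks documents by cosine similarity to a query vector:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings": dimensions loosely mean (cats, weather, finance).
documents = {
    "Cats sleep most of the day.": [0.9, 0.1, 0.0],
    "It will rain in New York tomorrow.": [0.0, 0.95, 0.1],
    "Stocks fell sharply today.": [0.05, 0.1, 0.9],
}


def search(query_vector: list[float], k: int = 1) -> list[str]:
    """Return the k documents whose vectors are most similar to the query vector."""
    ranked = sorted(documents, key=lambda doc: cosine_similarity(documents[doc], query_vector), reverse=True)
    return ranked[:k]


print(search([0.1, 0.9, 0.0]))  # a query "about the weather"
```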

Try it out: https://python.langchain.com/docs/how_to/vectorstores/