[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/00-langchain-intro.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/00-langchain-intro.ipynb)

#### [LangChain Handbook](https://github.com/pinecone-io/examples/tree/master/learn/generation/langchain/handbook)

# Intro to LangChain

LangChain is a popular framework that allows users to quickly build apps and pipelines around **L**arge **L**anguage **M**odels (LLMs). It can be used for chatbots, RAG, agents, and much more.

The core idea of the library is that we can _"chain"_ together different components to create more advanced use-cases around LLMs. These chains (better thought of as pipelines or workflows) may consist of various components from several modules:

* **Prompt templates**: Prompt templates are, well, templates for different types of prompts, such as "chatbot"-style templates, ELI5 question answering, etc.

* **LLMs**: Large language models like GPT-4.1, Claude 4, etc.

* **Tool / function calling**: Allows us to augment our LLMs with additional abilities and information sources.

* **Agents**: Agents act as the framework that integrates LLMs and tools. LLMs are packaged into logical loops of operations with tools like web search, **R**etrieval **A**ugmented **G**eneration (RAG), or code execution.

* **Memory**: Short-term memory (for example, the current conversation) and long-term memory.
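To make the agent loop concrete, here is a minimal, framework-free sketch under stated assumptions. It is not LangChain's agent API: `fake_model` is a scripted stand-in for an LLM that either requests a tool or gives a final answer, and `calculator` is a hypothetical tool that only handles `a op b` integer expressions.

```python
from typing import Callable

# Hypothetical tool: evaluates "a op b" for two integers and +, - or *
def calculator(expression: str) -> str:
    left, op, right = expression.split()
    a, b = int(left), int(right)
    results = {"+": a + b, "-": a - b, "*": a * b}
    return str(results[op])

TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}

def fake_model(history: list[dict]) -> dict:
    """Scripted stand-in for an LLM: requests the calculator once, then answers."""
    tool_results = [m for m in history if m["role"] == "tool"]
    if not tool_results:
        return {"type": "tool_call", "tool": "calculator", "input": "6 * 7"}
    return {"type": "final", "content": f"The answer is {tool_results[-1]['content']}."}

def run_agent(question: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        decision = fake_model(history)
        if decision["type"] == "final":
            return decision["content"]
        # Run the requested tool and feed its output back to the "model"
        result = TOOLS[decision["tool"]](decision["input"])
        history.append({"role": "tool", "content": result})
    raise RuntimeError("Agent did not finish within max_steps")

print(run_agent("What is 6 times 7?"))
```

The `max_steps` cap is a design choice worth keeping in any agent loop: it stops a model that keeps requesting tools from looping forever.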

In [1]:
%pip install -qU \
  langchain==0.3.25 \
  langchain-huggingface==0.3.0 \
  langchain-openai==0.3.22


# Using LLMs in LangChain

LangChain supports several LLM providers, like Hugging Face and OpenAI.

Let's start our exploration of LangChain by learning how to use a few of these different LLM integrations.

## Hugging Face

For Hugging Face models we need a Hugging Face Hub API token. We can get one by creating an account at [huggingface.co](https://huggingface.co/), clicking on our profile in the top-right corner > *Settings* > *Access Tokens* > *New Token*, and setting *Token type* to `Fine-grained` with the following user or organization permissions:

* **Inference** - Make calls to Inference Providers
* **Inference** - Make calls to your Inference Endpoints
* **Inference** - Manage your Inference Endpoints

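After generating the token, enter it below. The cell also asks for a LangSmith API key, which enables LangSmith tracing for the chains we build: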

In [2]:
from IPython.display import display, Markdown
import os
from getpass import getpass

# must enter API key
token = os.getenv('HF_TOKEN') or \
    getpass("Hugging Face API Token: ")
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY") or \
    getpass("Enter LangSmith API Key: ")

# below should not be changed
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# you can change this as preferred
os.environ["LANGCHAIN_PROJECT"] = "langchain-pinecone-io-walkthrough-intro"
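The cell above follows a common pattern: read a secret from an environment variable and only prompt for it when it is missing. Hugging Face tokens are often stored under more than one variable name (for example `HF_TOKEN` or `HUGGINGFACEHUB_API_TOKEN`), so a small helper that checks several names in order can keep this consistent. `get_secret` is a hypothetical helper written for this sketch, not part of LangChain.

```python
import os
from getpass import getpass

def get_secret(*env_names: str, prompt: str = "Secret: ") -> str:
    """Return the first non-empty value among env_names; otherwise prompt for it."""
    for name in env_names:
        value = os.getenv(name)
        if value:  # skip unset and empty variables
            return value
    return getpass(prompt)

# Example: accept either variable name for the Hugging Face token
# token = get_secret("HF_TOKEN", "HUGGINGFACEHUB_API_TOKEN", prompt="Hugging Face API Token: ")
```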

We can then generate text with a Hugging Face Hub model (we'll use `Qwen/Qwen3-4B-Instruct-2507`) through the Inference API built into the Hugging Face Hub.

_(The default Inference API doesn't use specialized hardware and so can be slow, particularly for larger models)_

In [3]:
from langchain_huggingface import HuggingFaceEndpoint
from langchain.prompts import PromptTemplate
from huggingface_hub import list_models

# List some text-generation models on the Hub
# (not every listed model is served by the Inference API)
models = list(list_models(filter="text-generation", full=True, limit=50))
print("Some available models for 'text-generation' via Hugging Face Inference API:")
for m in models:
    print(f" - {m.modelId}")
    print(f" - {m.pipeline_tag}")
    print(f" - {m.tags}")
    print(f" - {m.likes} likes")
    print(f" - {m.downloads} downloads")
    print(f" - {m.lastModified} last modified")
    print(f" - {m.cardData} card data\n\n")

# You can also visit https://huggingface.co/models?pipeline_tag=text-generation&library=transformers
# to browse and search for models available for inference.


Some available models for 'text-generation' via Hugging Face Inference API:
 - openai/gpt-oss-120b
 - text-generation
 - ['transformers', 'safetensors', 'gpt_oss', 'text-generation', 'vllm', 'conversational', 'license:apache-2.0', 'autotrain_compatible', 'endpoints_compatible', '8-bit', 'mxfp4', 'region:us']
 - 2900 likes
 - 237866 downloads
 - 2025-08-07 17:43:56+00:00 last modified
 - None card data


 - openai/gpt-oss-20b
 - text-generation
 - ['transformers', 'safetensors', 'gpt_oss', 'text-generation', 'vllm', 'conversational', 'license:apache-2.0', 'autotrain_compatible', 'endpoints_compatible', '8-bit', 'mxfp4', 'region:us']
 - 2468 likes
 - 863986 downloads
 - 2025-08-07 17:43:45+00:00 last modified
 - None card data


 - tencent/Hunyuan-1.8B-Instruct
 - text-generation
 - ['transformers', 'safetensors', 'hunyuan_v1_dense', 'text-generation', 'conversational', 'autotrain_compatible', 'endpoints_compatible', 'region:us']
 - 546 likes
 - 2293 downloads
 - 2025-08-06 07:30:26+00:0

In [4]:
llm = HuggingFaceEndpoint(
    repo_id="Qwen/Qwen3-4B-Instruct-2507",
    task="text-generation",
    max_new_tokens=100,  # cap on the length of the generated answer
    temperature=0.7,     # higher values give more varied output
    huggingfacehub_api_token=token,
    # provider="hf-inference",  # optionally pin a specific inference provider
)
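The `temperature` argument above (and the `top_p` / `top_k` arguments used in a later cell) controls how the next token is sampled. The sketch below applies all three ideas to a made-up set of token scores so their effect is visible without calling a model. It is a simplified illustration of the standard technique, not the code any particular inference provider runs; the vocabulary and logits are invented.

```python
import math
import random
from typing import Optional

def sample_next_token(logits: dict[str, float], temperature: float = 1.0,
                      top_k: Optional[int] = None, top_p: Optional[float] = None,
                      rng: Optional[random.Random] = None) -> str:
    """Pick one token from made-up logits using temperature, top-k and top-p."""
    if rng is None:
        rng = random.Random()
    # Temperature: divide logits before the softmax (lower values sharpen the distribution)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda pair: pair[1], reverse=True)
    # Top-k: keep only the k most likely tokens, then renormalise
    if top_k is not None:
        probs = probs[:top_k]
        kept_total = sum(p for _, p in probs)
        probs = [(tok, p / kept_total) for tok, p in probs]
    # Top-p (nucleus): keep the smallest set of top tokens whose mass reaches top_p
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in probs:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        probs = kept
    tokens, weights = zip(*probs)
    # random.choices renormalises the weights, so they need not sum to 1
    return rng.choices(tokens, weights=weights, k=1)[0]

# Invented logits for illustration only
logits = {"Packers": 3.0, "Steelers": 2.0, "Bears": 0.5, "Jets": 0.1}
print(sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.8, rng=random.Random(0)))
```

Lowering `temperature` or tightening `top_k` / `top_p` shrinks the set of tokens that can be chosen, which is why low values give more predictable answers.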

In [None]:
# Build prompt template
template = """Question: {question}

Answer: """
prompt = PromptTemplate(template=template, input_variables=["question"])

# we chain together the prompt -> LLM with LCEL (more on this later)
llm_chain = prompt | llm

question = {"question": "Which NFL team won the Super Bowl in the 2010 season?"}

print(llm_chain.invoke(question))
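The `|` operator is how LCEL connects steps: each step's output becomes the next step's input. The toy class below is a hypothetical, heavily simplified illustration of that idea (LangChain's real `Runnable` interface does much more); `Step`, `toy_prompt` and `toy_llm` are made up for this sketch, and `toy_llm` only reports the prompt length instead of calling a model.

```python
from typing import Any, Callable

class Step:
    """Wraps a function so that steps can be composed with `|`."""

    def __init__(self, func: Callable[[Any], Any]):
        self.func = func

    def invoke(self, value: Any) -> Any:
        return self.func(value)

    def __or__(self, other: "Step") -> "Step":
        # Composition: run self first, then pass its output to `other`
        return Step(lambda value: other.invoke(self.invoke(value)))

toy_template = "Question: {question}\n\nAnswer: "
toy_prompt = Step(lambda inputs: toy_template.format(**inputs))
toy_llm = Step(lambda text: f"[model saw {len(text)} characters]")

toy_chain = toy_prompt | toy_llm
print(toy_chain.invoke({"question": "Which NFL team won the Super Bowl in the 2010 season?"}))
```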

In [6]:
import os
from langchain_huggingface import HuggingFaceEndpoint
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate, SystemMessagePromptTemplate

# Reuse the token entered earlier (os.environ values must be strings, not None)
os.environ["HUGGINGFACEHUB_API_TOKEN"] = token

llm = HuggingFaceEndpoint(
    repo_id="openai/gpt-oss-20b",
    # task="conversational",  # task type, if the provider requires one
    temperature=0.7,  # sampling parameters
    top_p=0.8,
    top_k=20,
    # 'auto' (the default) picks the first provider available for the model, following
    # the order set at https://hf.co/settings/inference-providers; here we pin Novita
    provider="novita",
    # max_new_tokens=16384,  # optionally cap the output length
)

system_prompt = SystemMessagePromptTemplate.from_template(
    "You are an AI assistant called Sri that helps generate article titles."
)

# the {title} input variable is inferred from the template
user_prompt = HumanMessagePromptTemplate.from_template(
    "Write a short story about a detective who solves a mystery"
    " in a futuristic city. Title: {title}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        system_prompt,
        user_prompt
    ]
)

chain = (
    {
        "title": lambda x: x["title"]
    }
    | prompt
    | llm
    # the LLM returns a plain string, so wrap it directly under "response"
    | {"response": lambda x: x}
)

# response = chain.invoke({"title": "The Case of the Missing Android"})
# print(response["response"])
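The dictionaries in the chain above are not plain data: LCEL turns a dict of callables into a step that calls every value on the same input and collects the results under the dict's keys (LangChain runs these branches concurrently). The stdlib sketch below mimics that mapping one key at a time; `run_mapping` is a hypothetical name, not a LangChain function.

```python
from typing import Any, Callable

def run_mapping(mapping: dict[str, Callable[[Any], Any]], value: Any) -> dict[str, Any]:
    """Call every function in `mapping` on the same input and collect the results by key."""
    return {key: func(value) for key, func in mapping.items()}

# Each callable receives the whole input, so it must pick out what it needs
step = {"title": lambda x: x["title"], "length": lambda x: len(x["title"])}
print(run_mapping(step, {"title": "The Case of the Missing Android"}))

# After an LLM step the input is the model's output (for example a string), not a dict
print(run_mapping({"response": lambda x: x}, "Once upon a time..."))
```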


In [None]:
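# let's try running the model locally with a transformers pipeline
# (requires the `transformers` and `torch` packages; downloads the weights and needs substantial GPU memory)
from transformers import pipeline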

model_id = "openai/gpt-oss-20b"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
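When the pipeline is given a list of chat messages, `generated_text` holds the conversation including the newly generated assistant turn, which is why the cell above indexes `[-1]`. The sketch below pulls the reply text out of an output with that shape; `sample_outputs` is a hand-written stand-in rather than real model output, and `last_assistant_reply` is a hypothetical helper.

```python
def last_assistant_reply(outputs: list[dict]) -> str:
    """Return the content of the final assistant message in a chat-style pipeline output."""
    conversation = outputs[0]["generated_text"]
    for message in reversed(conversation):
        if message["role"] == "assistant":
            return message["content"]
    raise ValueError("No assistant message found")

# Hand-written example with the assumed output shape
sample_outputs = [{"generated_text": [
    {"role": "user", "content": "Explain quantum mechanics clearly and concisely."},
    {"role": "assistant", "content": "(model reply would appear here)"},
]}]
print(last_assistant_reply(sample_outputs))
```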


If we'd like to ask multiple questions, we can pass a list of dictionaries to `batch`. Each dictionary must contain the input variable set in our prompt template (`"question"`), mapped to the question we'd like to ask.

In [None]:
# each dict uses the prompt's input variable name, "question"
qs = [
    {'question': "Which NFL team won the Super Bowl in the 2010 season?"},
    {'question': "If I am 6 ft 4 inches, how tall am I in centimeters?"},
    {'question': "Who was the 12th person on the moon?"},
    {'question': "How many eyes does a blade of grass have?"}
]
res = llm_chain.batch(qs)

In [None]:
for question, response in zip(qs, res):
    print("="*100)
    print(f"QUESTION: {question}")
    print(f"RESPONSE: {response}")
    print("="*100 + "\n")
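Conceptually, `batch` runs `invoke` on every input, and LangChain's default implementation does this in parallel with a thread pool. The stdlib sketch below mimics that behaviour and shows why each dictionary needs the key the prompt template expects: a missing key fails before any model would be called. `toy_batch`, `fake_invoke` and `required_variables` are hypothetical names, and `fake_invoke` returns the filled-in prompt instead of calling an LLM.

```python
from concurrent.futures import ThreadPoolExecutor
from string import Formatter

TEMPLATE = "Question: {question}\n\nAnswer: "

def required_variables(template: str) -> set[str]:
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def fake_invoke(inputs: dict) -> str:
    missing = required_variables(TEMPLATE) - set(inputs)
    if missing:
        raise KeyError(f"Missing input variables: {sorted(missing)}")
    return TEMPLATE.format(**inputs)  # a real chain would send this prompt to the LLM

def toy_batch(inputs: list[dict], max_workers: int = 4) -> list[str]:
    # executor.map preserves input order, so results line up with `inputs`
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_invoke, inputs))

print(toy_batch([{"question": "Who was the 12th person on the moon?"}]))
```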

## OpenAI

We can also use OpenAI's LLMs. The process is similar: we need an API key, which can be retrieved from the
[OpenAI platform](https://platform.openai.com/settings/organization/api-keys). We then pass the API key below:

In [11]:
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or \
    getpass("OpenAI API Key: ")

If using OpenAI via Azure, you should also set:

```python
# API key for your Azure OpenAI resource
os.environ['AZURE_OPENAI_API_KEY'] = 'your-azure-api-key'
# API version to use (Azure has several; check which your deployment supports)
os.environ['OPENAI_API_VERSION'] = '2024-10-21'
# endpoint URL for your Azure OpenAI resource
os.environ['AZURE_OPENAI_ENDPOINT'] = 'https://your-resource-name.openai.azure.com'
```

Then we decide which model we'd like to use. There are several options, but we will go with `gpt-5-mini`:

In [12]:
from langchain_openai import ChatOpenAI

# Initialize with a modern model
openai_llm = ChatOpenAI(
    model_name="gpt-5-mini",
    temperature=1.0
)

Alternatively, if using Azure OpenAI, we use the Azure chat model class:

```python
from langchain_openai import AzureChatOpenAI

openai_llm = AzureChatOpenAI(
    azure_deployment="your-azure-deployment",
    model="gpt-4.1-mini"
)
```

We'll reuse the chat prompt template from the Hugging Face example (the detective-story prompt with the `title` input variable). The only change is that we now pass our OpenAI LLM, `openai_llm`:

In [13]:
llm_chain = prompt | openai_llm

response = llm_chain.invoke(
    {
        "title": "The Case of the Missing Android"
    }
)

# chat models return a message object; the generated text is in .content
print(response.content)

The Case of the Missing Android

The rain in New Helix never fell; it scrolled. Tiny droplets of display-light cascaded down glass towers in shifting ads, and everything glittered with promises you couldn't afford. Mara Voss watched the city from a rented office two blocks above a noodle joint, the neon reflecting on her desk like fingerprints. Her coat was older than most of the city’s transit cards. She liked it that way.

She took the case because the shoe designer looked desperate. ArgoTech's legal counsel — halogen skin, synthetic smile — had slipped Mara an ID chip and a single photograph: a slender android with iris-level eyes and a laugh-lines engraving in its cheek. Model L-7. Named Iris.

"Recovered late tonight," the counsel said. "Do not make this public. Locked apartment, sealed sensors, no external breach. Cameras looped for seven minutes. We assume a theft. Find Iris."

"They looped the feed themselves," Mara said, flipping the chip into her desk reader. The footage show

We can also batch questions as before. This time we build a new question-answering prompt for the OpenAI model:

In [None]:
from langchain.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are an AI assistant called Sri that helps people answer questions. "
                   "Here is the question for you to answer. Think step by step and "
                   "consult any outside knowledge you may have to answer the question."),
        ("human", "{question}\n\nBe concise and clear in your response.")
    ]
)

chain = (
    {
        "question": lambda x: x["question"]
    }
    | prompt
    | openai_llm
    # extract the text from the model's message object
    | StrOutputParser()
)

qs = [
    {'question': "Which NFL team won the Super Bowl in the 2010 season?"},
    {'question': "If I am 6 ft 4 inches, how tall am I in centimeters?"},
    {'question': "Who was the 12th person on the moon?"},
    {'question': "How many eyes does a blade of grass have?"}
]
res = chain.batch(qs)

for question, response in zip(qs, res):
    print("="*100)
    print(f"QUESTION: {question}")
    print(f"RESPONSE: {response}")
    print("="*100 + "\n")
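A chat prompt template like the one above produces a list of role-tagged messages rather than a single string. The stdlib sketch below mimics that formatting step with `(role, template)` tuples; it is a simplified stand-in for `ChatPromptTemplate.from_messages`, and `format_chat` is a hypothetical name.

```python
from string import Formatter

def format_chat(messages: list[tuple[str, str]], **variables: str) -> list[dict[str, str]]:
    """Fill each (role, template) pair with `variables` and return role/content dicts."""
    needed = {field for _, text in messages
              for _, field, _, _ in Formatter().parse(text) if field}
    missing = needed - set(variables)
    if missing:
        raise KeyError(f"Missing input variables: {sorted(missing)}")
    return [{"role": role, "content": text.format(**variables)} for role, text in messages]

chat_template = [
    ("system", "You are an AI assistant called Sri that helps people answer questions."),
    ("human", "{question}\n\nBe concise and clear in your response."),
]
for message in format_chat(chat_template, question="How many eyes does a blade of grass have?"):
    print(message)
```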

---