# Integrating LangChain and Hugging Face

LangChain can [use Hugging Face models](https://python.langchain.com/docs/integrations/providers/huggingface/), either through a hosted inference API or by running them locally. Let's see how.

## Install required packages

In [1]:
%pip install langchain langchain_community
# For loading the API token from a .env file
%pip install python-dotenv
# For running models locally (Approach 2)
%pip install transformers
%pip install accelerate
%pip install bitsandbytes


In [2]:
import os

from dotenv import load_dotenv
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

# Load environment variables from a local .env file
load_dotenv()

openai_key = os.getenv("OPENAI_API_KEY")
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")
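
A quick check that the token actually loaded can save a confusing authentication error later. The following is a minimal sketch: `describe_token` is a hypothetical helper (not part of any library), and it prints only a masked form so the secret never lands in the notebook output.

```python
import os


def describe_token(name: str) -> str:
    """Report whether an environment variable is set, without revealing it."""
    value = os.getenv(name)
    if not value:
        return f"{name} is not set"
    # Show only the first 3 characters; the rest stays hidden
    return f"{name} is set ({value[:3]}..., {len(value)} chars)"


print(describe_token("HUGGINGFACEHUB_API_TOKEN"))
```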


## Approach 1: Access Models Hosted on Hugging Face Through API

In [3]:
from huggingface_hub import login
login()


In [4]:
%pip install langchain-huggingface -q

In [None]:
# Reuse the token loaded from .env above, or replace this with your own valid token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = hf_token or ""

In [6]:
from langchain.prompts import PromptTemplate
from langchain_huggingface.llms import HuggingFaceEndpoint

# 1. Define prompt
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?"
)

# 2. Define the LLM, served through a Hugging Face inference provider
llm = HuggingFaceEndpoint(
    repo_id="deepseek-ai/DeepSeek-R1",
    # huggingfacehub_api_token is read from the HUGGINGFACEHUB_API_TOKEN variable set above
    provider="hf-inference"
)

# 3. construct chain
chain = LLMChain(prompt=prompt, llm=llm)

# 4. run it
response = chain.run("eco-friendly paint")
print(response)

Error while fetching `HF_TOKEN` secret value from your vault: 'Requesting secret HF_TOKEN timed out. Secrets can only be fetched when running from the Colab UI.'.
You are not authenticated with the Hugging Face Hub in this notebook.
If the error persists, please let us know by opening an issue on GitHub (https://github.com/huggingface/huggingface_hub/issues/new).


HfHubHTTPError: 402 Client Error: Payment Required for url: https://router.huggingface.co/hf-inference/models/deepseek-ai/DeepSeek-R1 (Request ID: Root=1-685badd3-4dc770941635a16014e3833b;315c78c2-bd4d-4ab3-aba2-ce75fcba0b03)

You have exceeded your monthly included credits for Inference Providers. Subscribe to PRO to get 20x more monthly included credits.

Damn, I'm broke: I did the Agent course and spent all my monthly inference credits there. But there is a solution:
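
The idea, sketched in plain Python: try the hosted model first and fall back to a local one when the call fails. Everything here is an illustrative stand-in (`generate_with_fallback`, `remote_model` and `local_model` are hypothetical names); the real local model is set up in the next section.

```python
from typing import Callable


def generate_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> str:
    """Try the primary model first; use the fallback if it raises."""
    try:
        return primary(prompt)
    except Exception as err:  # e.g. an HTTP 402 from the hosted endpoint
        print(f"Primary model failed ({err!r}); falling back.")
        return fallback(prompt)


# Stand-ins for illustration only: a remote call that fails and a local model
def remote_model(prompt: str) -> str:
    raise RuntimeError("402 Client Error: Payment Required")


def local_model(prompt: str) -> str:
    return f"[local] {prompt}"


print(generate_with_fallback("Name a paint company", remote_model, local_model))
```

LangChain runnables also expose a `with_fallbacks` method that serves the same purpose for real chains.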

## Approach 2: Download Model Locally (Create Pipelines)

A GPU is strongly recommended for this approach. Intel CPUs are supported too, but inference will be much slower.
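
To see which kind of device is available before loading a model, a small check like the one below can help. This is a sketch under assumptions: `pick_device` is a hypothetical helper, and the return values follow the `transformers` convention of `0` for the first CUDA GPU and `-1` for CPU, which is also what the `device` argument of `HuggingFacePipeline.from_model_id` is assumed to accept.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if one is usable, otherwise -1 (CPU)."""
    try:
        import torch  # assumed to be installed alongside transformers
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print("Using GPU 0" if device == 0 else "Using CPU")
```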

In [None]:
from langchain.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

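The import path above is the older one. Following the [updated guide for Hugging Face on LangChain](https://python.langchain.com/docs/integrations/llms/huggingface_pipelines/), `HuggingFacePipeline` should instead be imported from the `langchain_huggingface` package: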

In [45]:
from langchain_huggingface.llms import HuggingFacePipeline

hf = HuggingFacePipeline.from_model_id(
    model_id="google/flan-t5-large",
    task="text2text-generation",
    pipeline_kwargs={
        "max_new_tokens": 128,
        "do_sample": True,
        "temperature": 0.7,
    },
    model_kwargs={},  # optional
)


Device set to use cuda:0


In [48]:
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

chain = LLMChain(prompt=prompt, llm=hf)

chain.run("What is electroencephalography?")


'Electroencephalography is the study of the electrical activity in the brain. Electrical activity in the brain is called electroencephalography. So the answer is electroencephalography.'

The same chain can be written more concisely with the `|` operator, which is the style current LangChain documentation recommends over the deprecated `LLMChain`:

In [49]:
chain = prompt | hf

question = "What is electroencephalography?"

print(chain.invoke({"question": question}))

Electroencephalography is the study of the electrical activity of the brain. The brain is the central nervous system. Electroencephalography is a technique used to study the electrical activity of the brain. So the answer is electroencephalography.
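
To build some intuition for what `|` does, here is a toy sketch of the composition pattern. It is not LangChain's implementation: `Step`, `fill_prompt` and `fake_llm` are hypothetical names, and the "model" just uppercases its input.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that feeds a's output into b
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stages mirroring `prompt | hf`
fill_prompt = Step(lambda d: f"Question: {d['question']}\n\nAnswer:")
fake_llm = Step(lambda text: text.upper())  # pretend model: just uppercases

toy_chain = fill_prompt | fake_llm
print(toy_chain.invoke({"question": "What is EEG?"}))
```

Each stage takes the previous stage's output as its input, which is why the chain is invoked with a dictionary (the prompt's input) and returns a string (the model's output).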


> Side note: I tried to replicate the company-name example from Approach 1 (see below), but the model just echoes back the input. Maybe it's a prompt problem?

In [52]:
# to replicate the example from earlier:

from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain

template = "Generate a name for a company that makes {product}. Answer with the name of the company"
prompt = PromptTemplate.from_template(template)

chain = LLMChain(prompt=prompt, llm=hf)

chain.run("Colorful socks")

'Colorful Socks'
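
One way to investigate is to look at the exact string the model receives and compare it with a more structured variant. The sketch below uses plain `str.format` as a stand-in for the prompt template; the few-shot wording and the example name "SwiftStride" are made up for illustration, and whether this actually stops the echoing with `flan-t5-large` is untested.

```python
original = (
    "Generate a name for a company that makes {product}. "
    "Answer with the name of the company"
)

# Hypothetical few-shot variant: shows the model one worked example first
few_shot = (
    "Suggest a creative company name.\n"
    "Product: running shoes\nCompany name: SwiftStride\n"
    "Product: {product}\nCompany name:"
)

# Render both prompts to see exactly what would be sent to the model
for label, tmpl in [("original", original), ("few-shot", few_shot)]:
    rendered = tmpl.format(product="Colorful socks")
    print(f"--- {label} ---\n{rendered}\n")
```

If the variant looks promising, the same string can be passed to `PromptTemplate.from_template` and used in the chain as before.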