# Hugging Face with LangChain

Announcement Link: https://huggingface.co/blog/langchain


In [None]:
## Libraries Required
!pip install langchain-huggingface
## For API Calls
!pip install huggingface_hub
!pip install transformers
!pip install accelerate
!pip install bitsandbytes
!pip install langchain



## HuggingFaceEndpoint
## How to Access HuggingFace Models with API
There are two ways to use this class. The one shown here specifies the model with the repo_id parameter. Endpoints addressed this way use the serverless API, which is particularly beneficial to people with Pro accounts or an Enterprise Hub subscription. Regular users still get a fair number of requests by making their HF token available in the environment where the code runs.


In [None]:
from langchain_huggingface import HuggingFaceEndpoint

In [None]:
from google.colab import userdata
# Read the token from Colab secrets. Do not print it: cell output is saved with the notebook.
sec_key = userdata.get("HUGGINGFACEHUB_API_TOKEN")


In [None]:
import os
os.environ["HUGGINGFACEHUB_API_TOKEN"]=sec_key
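
Outside Colab, `userdata` is not available, so the token usually comes from an environment variable instead. The following is a minimal sketch of that pattern. The helper names `get_hf_token` and `mask_token` are hypothetical and not part of any library. The first fails fast when the variable is missing; the second masks the value so it can be logged without leaking the token into saved notebook output.

```python
import os


def get_hf_token(env_var: str = "HUGGINGFACEHUB_API_TOKEN") -> str:
    """Return the token stored in env_var, or raise if it is missing or empty."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before creating the endpoint.")
    return token


def mask_token(token: str, visible: int = 4) -> str:
    """Keep the first `visible` characters and replace the rest with asterisks."""
    return token[:visible] + "*" * max(len(token) - visible, 0)


# Placeholder value for illustration only, not a real token.
print(mask_token("hf_exampleplaceholder"))
```

With a real token in the environment, `mask_token(get_hf_token())` confirms that a token is set without revealing it.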

In [None]:
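# Earlier attempt, kept for reference. max_length and token are not HuggingFaceEndpoint
# parameters, so they are moved into model_kwargs and trigger the warning shown below.
# repo_id="mistralai/Mistral-7B-Instruct-v0.2"
# llm=HuggingFaceEndpoint(repo_id=repo_id,max_length=128,temperature=0.7,token=sec_key)

max_length was transferred to model_kwargs.
Please make sure that max_length is what you intended.
token was transferred to model_kwargs.
Please make sure that token is what you intended.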


In [None]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,  # caps the generated length; max_length is not a field of this class
    temperature=0.7,
    huggingfacehub_api_token=sec_key,  # pass the variable, not the literal string "sec_key"
    task="text-generation",  # explicitly set the task
)

response = llm.invoke("What is machine learning?")
print(response)


In [None]:
llm.invoke("What is machine learning")

'?\n\nMachine Learning (ML) is a subfield of Artificial Intelligence (AI) that provides systems the ability to learn and improve from experience without being explicitly programmed. It focuses on the development of computer programs that can access data and use it to learn for themselves.\n\nThe process of learning begins with observations or data, such as examples, direct experience, or instruction, in order to look for patterns in data and make better decisions in the future based on the examples that we provide. The primary aim is to allow the computers to learn automatically without human intervention or assistance and adjust actions accordingly.\n\nMachine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to perform the task. ML algorithms are designed to automatically improve with experience, the more data they are exposed to, the more accurate their decision

In [None]:
llm.invoke("What is Generative AI")

'?\n\nGenerative AI is a type of artificial intelligence that can create new content, such as images, text, or music, based on patterns it has learned from existing data. It uses machine learning algorithms to generate outputs that are similar in style and form to the input data, but are not exact copies. Generative AI models can be used in a variety of applications, such as creating personalized content, generating realistic images for video games or movies, and even composing music. Some popular examples of generative AI include deepfakes, DALL-E, and the text-to-image model known as Imagen.\n\nGenerative AI models are trained on large datasets of examples, and use a variety of techniques to learn the underlying patterns and structures in the data. One common technique is called generative adversarial networks (GANs), which consist of two neural networks that compete with each other to generate more realistic outputs. The generator network creates new content, while the discriminator

In [None]:
from langchain_core.prompts import PromptTemplate
from langchain.chains import LLMChain

question="Who won the Cricket World Cup in the year 2011?"
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
print(prompt)

input_variables=['question'] input_types={} partial_variables={} template="Question: {question}\nAnswer: Let's think step by step."
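
Printing the PromptTemplate shows its fields rather than the text the model receives. As a rough illustration, the sketch below fills the same template with plain `str.format`, which behaves like the default f-string style of `PromptTemplate.format` for a simple placeholder such as this one. The `render` helper is hypothetical and only stands in for the LangChain call.

```python
template = """Question: {question}
Answer: Let's think step by step."""


def render(template: str, **values: str) -> str:
    """Fill the {placeholders} in the template; a stand-in for PromptTemplate.format."""
    return template.format(**values)


rendered = render(template, question="Who won the Cricket World Cup in the year 2011?")
print(rendered)
```

The printed text is what the chain sends to the model once the question has been substituted.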


In [None]:
llm_chain=LLMChain(llm=llm,prompt=prompt)
print(llm_chain.invoke(question))


{'question': 'Who won the Cricket World Cup in the year 2011?', 'text': '\n\nStep 1: The Cricket World Cup is a major international tournament that takes place every four years.\n\nStep 2: In 2011, the tournament was held in India, Sri Lanka, and Bangladesh.\n\nStep 3: The final match was played on April 2, 2011, at the Wankhede Stadium in Mumbai, India.\n\nStep 4: The final was between India and Sri Lanka, and India won the match by 6 wickets with 10 balls remaining.\n\nSo, India won the Cricket World Cup in the year 2011.'}
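
The chain returns a dictionary that holds the inputs plus the generation under the `text` key. A small sketch of post-processing that result follows. The `final_line` helper is hypothetical, and the sample dictionary is an abbreviated copy of the output above, not a fresh model response.

```python
# Abbreviated copy of the LLMChain result shown above (intermediate steps omitted).
result = {
    "question": "Who won the Cricket World Cup in the year 2011?",
    "text": "\n\nStep 1: The Cricket World Cup is a major international tournament "
            "that takes place every four years.\n\n"
            "So, India won the Cricket World Cup in the year 2011.",
}


def final_line(result: dict, key: str = "text") -> str:
    """Return the last non-empty line of the generated text."""
    lines = [line.strip() for line in result[key].splitlines() if line.strip()]
    return lines[-1] if lines else ""


print(final_line(result))
```

Taking the last line works here because the step-by-step prompt leads the model to end with its conclusion. That is a property of this particular response and not something the model guarantees.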


## HuggingFacePipeline
Within Transformers, the Pipeline is the most versatile tool in the Hugging Face toolbox. Because LangChain is designed primarily for RAG and agent use cases, the scope of the pipeline here is limited to the following text-centric tasks: "text-generation", "text2text-generation", "summarization", and "translation".
Models can be loaded directly with the from_model_id method, or an existing transformers pipeline can be wrapped, as the cells below show.


In [None]:
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [None]:
model_id="gpt2"
model=AutoModelForCausalLM.from_pretrained(model_id)
tokenizer=AutoTokenizer.from_pretrained(model_id)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
pipe=pipeline("text-generation",model=model,tokenizer=tokenizer,max_new_tokens=100)
hf=HuggingFacePipeline(pipeline=pipe)

Device set to use cuda:0


In [None]:
hf

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7b214fd69050>, model_id='gpt2')

In [None]:
hf.invoke("What is machine learning")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


"What is machine learning?\n\nAll of the world's top economists have said this before. Most of this is made clear in the recent book:\n\nRobustness is required to be truly useful, which is really an elusive idea. This is the fundamental problem with traditional machine learning. To me, a better idea is one as simple as how to find data points with a constant speed and with constant noise. This approach is not at all a straightforward idea — its use requires the use of high-level data"

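The local pipeline output above starts with the prompt itself, whereas the endpoint responses earlier contain only the continuation. If only the newly generated text is wanted, the echoed prompt can be removed afterwards. This is a minimal sketch using a hypothetical helper, `strip_prompt`. The sample strings are shortened from the output above.

```python
def strip_prompt(prompt: str, generated: str) -> str:
    """Drop the echoed prompt from the start of a generation, if it is there."""
    if generated.startswith(prompt):
        return generated[len(prompt):].lstrip()
    return generated.strip()


prompt_text = "What is machine learning"
# Shortened copy of the gpt2 output shown above.
generated_text = "What is machine learning?\n\nAll of the world's top economists have said this before."

print(strip_prompt(prompt_text, generated_text))
```

The helper leaves the text unchanged (apart from trimming whitespace) when the generation does not begin with the prompt, so it is safe to apply to either style of output.
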
In [None]:
## Use HuggingFacePipeline with a GPU
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device=0,  # replace with device_map="auto" to use the accelerate library.
    pipeline_kwargs={"max_new_tokens": 100},
)

Device set to use cuda:0
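
Setting `device=0` assumes a CUDA GPU is present. The sketch below, which assumes PyTorch is the backend, uses a hypothetical helper `pick_device` that returns 0 when CUDA is available and -1 otherwise. A value of -1 is the CPU convention used by transformers pipelines, so the same cell can run on machines with or without a GPU.

```python
def pick_device() -> int:
    """Return 0 for the first CUDA GPU, or -1 to run on the CPU."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch installed
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"Using device index {device}")
```

The result could then be passed as `device=pick_device()` in place of the hard-coded `device=0` in the cell above.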


In [None]:
from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

In [None]:
chain=prompt|gpu_llm

In [None]:
question="What is artificial intelligence?"
chain.invoke({"question":question})

"Question: What is artificial intelligence?\n\nAnswer: Let's think step by step. You're probably going to run into some bots, or AI-powered things, or something like that. You're going to play with them, and you're not going to know a lot about them. If you've seen a couple of games that you're trying to do, and you're playing around with them, you might get a chance to try something with these bots. And it's not like you're doing that to your player.\n\nLook, there's some things that people want"