In [None]:
## Libraries required
!pip install langchain-huggingface
## For API calls
!pip install huggingface_hub
!pip install transformers
!pip install accelerate
!pip install bitsandbytes
!pip install langchain

Collecting langchain-huggingface
  Downloading langchain_huggingface-0.0.3-py3-none-any.whl.metadata (1.2 kB)
Collecting langchain-core<0.3,>=0.1.52 (from langchain-huggingface)
  Downloading langchain_core-0.2.29-py3-none-any.whl.metadata (6.2 kB)
Collecting sentence-transformers>=2.6.0 (from langchain-huggingface)
  Downloading sentence_transformers-3.0.1-py3-none-any.whl.metadata (10 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain-core<0.3,>=0.1.52->langchain-huggingface)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl.metadata (3.0 kB)
Collecting langsmith<0.2.0,>=0.1.75 (from langchain-core<0.3,>=0.1.52->langchain-huggingface)
  Downloading langsmith-0.1.98-py3-none-any.whl.metadata (13 kB)
Collecting tenacity!=8.4.0,<9.0.0,>=8.1.0 (from langchain-core<0.3,>=0.1.52->langchain-huggingface)
  Downloading tenacity-8.5.0-py3-none-any.whl.metadata (1.2 kB)
Collecting jsonpointer>=1.9 (from jsonpatch<2.0,>=1.33->langchain-core<0.3,>=0.1.52->langchain-huggingface)
  Downloading jso

In [None]:
## Environment secret keys
from google.colab import userdata
sec_key = userdata.get("Huggingface_hub")
# Do not print the token itself; only confirm that a value was loaded.
print("Token loaded:", bool(sec_key))

# HuggingFaceEndpoint
### How to Access HuggingFace Models with API
There are two ways to use this class; this notebook uses the first, which specifies the model with the repo_id parameter. Endpoints addressed this way use the serverless API, which is particularly beneficial to users with Pro accounts or an Enterprise Hub subscription. Regular users still get a fair number of requests by connecting with their HF token in the environment where the code runs.

In [None]:
from langchain_huggingface import HuggingFaceEndpoint

The next cell sets an environment variable named HUGGINGFACEHUB_API_TOKEN to the value stored in the sec_key variable. This variable is used to authenticate with the Hugging Face Hub API, which provides access to pre-trained language models and other resources.

In summary, the two lines in the next cell set up the authentication credentials required to interact with the Hugging Face Hub API through the HuggingFaceEndpoint class.

In [None]:
import os
os.environ["HUGGINGFACEHUB_API_TOKEN"] = sec_key
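Because the token is a credential, it can help to fail early when it is missing and to show only a masked form when confirming that it was set. The following is a minimal sketch under that assumption; the helper names mask_token and set_hf_token are hypothetical and are not part of LangChain or huggingface_hub.

```python
import os


def mask_token(token: str, visible: int = 4) -> str:
    """Return a display-safe form of a secret, keeping only the first few characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


def set_hf_token(token, env=os.environ) -> str:
    """Store the token under HUGGINGFACEHUB_API_TOKEN and return its masked form."""
    if not token:
        raise ValueError("Hugging Face token is empty or missing")
    env["HUGGINGFACEHUB_API_TOKEN"] = token
    return mask_token(token)


# Demo with a placeholder value (not a real token) and a plain dict instead of os.environ.
demo_env = {}
print(set_hf_token("hf_exampletoken", env=demo_env))
```

In the notebook itself, set_hf_token(sec_key) would write to os.environ and print only the masked value.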

In [None]:
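repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
# max_length and token are not HuggingFaceEndpoint fields, so they land in model_kwargs (see the
# warning below); the dedicated fields are max_new_tokens and huggingfacehub_api_token.
llm = HuggingFaceEndpoint(repo_id=repo_id, max_length=128, temperature=0.7, token=sec_key)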

                    max_length was transferred to model_kwargs.
                    Please make sure that max_length is what you intended.
                    token was transferred to model_kwargs.
                    Please make sure that token is what you intended.


The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful
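The warning above indicates that max_length and token were not recognized as fields and were moved into model_kwargs. Below is a hedged sketch of the same call using the dedicated fields max_new_tokens and huggingfacehub_api_token instead. The helper names endpoint_kwargs and build_llm are hypothetical, and the import is deferred so that the argument dictionary can be inspected without installing the library or contacting the Hub.

```python
def endpoint_kwargs(repo_id: str, token: str,
                    max_new_tokens: int = 128, temperature: float = 0.7) -> dict:
    """Collect constructor arguments using the fields HuggingFaceEndpoint defines."""
    return {
        "repo_id": repo_id,
        "max_new_tokens": max_new_tokens,   # limit on generated tokens
        "temperature": temperature,
        "huggingfacehub_api_token": token,  # dedicated field for the API token
    }


def build_llm(token: str):
    """Create the endpoint; needs langchain-huggingface and network access."""
    from langchain_huggingface import HuggingFaceEndpoint  # deferred import
    return HuggingFaceEndpoint(
        **endpoint_kwargs("mistralai/Mistral-7B-Instruct-v0.2", token)
    )


# Inspect the arguments with a placeholder token (no request is made here).
kwargs = endpoint_kwargs("mistralai/Mistral-7B-Instruct-v0.2", token="hf_placeholder")
print(sorted(kwargs))
```

Calling build_llm(sec_key) in the notebook would then construct the endpoint without routing these values through model_kwargs.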


In [None]:
llm

HuggingFaceEndpoint(repo_id='mistralai/Mistral-7B-Instruct-v0.2', temperature=0.7, model_kwargs={'max_length': 128, 'token': 'hf_<redacted>'}, model='mistralai/Mistral-7B-Instruct-v0.2', client=<InferenceClient(model='mistralai/Mistral-7B-Instruct-v0.2', timeout=120)>, async_client=<InferenceClient(model='mistralai/Mistral-7B-Instruct-v0.2', timeout=120)>)

In [None]:
llm.invoke('Explain fine tuning in llms in brief')

".\n\nFine-tuning in machine learning models refers to the process of taking a pre-trained model, which has already been trained on a large dataset, and adapting it to a new, smaller dataset. The goal is to improve the model's performance on the new dataset while minimizing the amount of training required.\n\nIn the context of large language models (LLMs), fine-tuning involves training the model on a specific task or dataset for a relatively short period of time, usually a few hours to a few days. The pre-trained model is first loaded, and then the new dataset is provided as input. The model's weights are updated based on the new data, allowing it to learn task-specific information while preserving the general knowledge it gained during pre-training.\n\nFine-tuning is an essential step in applying LLMs to various applications, such as text classification, question answering, and language translation. By fine-tuning, we can adapt the model to the specific requirements of the task and da

## Prompt Template

In [None]:
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

question="Who won the Cricket World Cup in the year 2011?"
template = """Question: {question}
Answer: Let's think step by step."""
prompt = PromptTemplate(template=template, input_variables=["question"])
print(prompt)

input_variables=['question'] template="Question: {question}\nAnswer: Let's think step by step."
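To see what the model actually receives, it helps to render the template by hand. With the default f-string template format, filling a PromptTemplate behaves much like Python's str.format, so the sketch below approximates it with the standard library only. The render helper is a hypothetical stand-in, not a LangChain function.

```python
template = """Question: {question}
Answer: Let's think step by step."""


def render(template: str, **values: str) -> str:
    """Substitute named placeholders, approximating PromptTemplate.format."""
    return template.format(**values)


rendered = render(template, question="Who won the Cricket World Cup in the year 2011?")
print(rendered)
```

The rendered string is the full text sent to the model: the question followed by the step-by-step cue on the next line.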


In [None]:
llm_chain=LLMChain(llm=llm,prompt=prompt)
print(llm_chain.invoke(question))

  warn_deprecated(


{'question': 'Who won the Cricket World Cup in the year 2011?', 'text': ' The Cricket World Cup is an international cricket tournament that takes place every four years. The tournament was held in 2011, so the team that won that year is the answer to this question. The team that won the Cricket World Cup in 2011 was India. So, the answer is India.'}


## HuggingFacePipeline
Within the transformers library, the Pipeline is the most versatile tool in the Hugging Face toolbox. Because LangChain is designed primarily for RAG and Agent use cases, the scope of the pipeline here is limited to the following text-centric tasks: "text-generation", "text2text-generation", "summarization", and "translation". Models can be loaded directly with the from_model_id method.

*** HuggingFacePipeline loads and runs the model locally, so it suits smaller
    models; larger models can exceed the resources of the notebook environment.
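A rough back-of-the-envelope estimate makes the resource constraint concrete: the memory needed for the weights alone is roughly the parameter count times the bytes per parameter. The sketch below assumes approximate parameter counts (about 124 million for gpt2 and about 7.24 billion for Mistral-7B) and ignores activations, the KV cache, and framework overhead, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# (approximate parameter count, bytes per parameter); the counts are assumptions
models = {
    "gpt2 (fp32)": (124e6, 4),
    "Mistral-7B (fp16)": (7.24e9, 2),
}

for name, (params, nbytes) in models.items():
    print(f"{name}: ~{weight_memory_gb(params, nbytes):.1f} GB")
```

Under these assumptions gpt2 fits comfortably on a CPU runtime, while a 7B model in half precision already needs a GPU with well over 14 GB of memory, or quantization.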

In [None]:
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

In [None]:
model_id="gpt2"
model=AutoModelForCausalLM.from_pretrained(model_id)
tokenizer=AutoTokenizer.from_pretrained(model_id)

In [None]:
pipe=pipeline("text-generation",model=model,tokenizer=tokenizer,max_new_tokens=100)
hf=HuggingFacePipeline(pipeline=pipe)

Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
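The warning above appears because no device argument was passed, so the model stays on the CPU even though a GPU is present. A minimal sketch of choosing the device explicitly follows; pick_device and build_pipe are hypothetical helper names, and the heavy imports are deferred so the selection logic can be checked on its own.

```python
def pick_device(cuda_available: bool) -> int:
    """Return a transformers pipeline device index: 0 for the first GPU, -1 for CPU."""
    return 0 if cuda_available else -1


def build_pipe(model, tokenizer):
    """Create the text-generation pipeline on the GPU when one is available."""
    import torch                       # deferred: only needed when building
    from transformers import pipeline  # deferred for the same reason

    return pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_new_tokens=100,
        device=pick_device(torch.cuda.is_available()),
    )


print(pick_device(True), pick_device(False))
```

Passing the result of build_pipe(model, tokenizer) to HuggingFacePipeline(pipeline=...) keeps the rest of the notebook unchanged.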


In [None]:
hf

HuggingFacePipeline(pipeline=<transformers.pipelines.text_generation.TextGenerationPipeline object at 0x7e41fe8be5f0>)

In [None]:
hf.invoke("Role of AI in industries")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'Role of AI in industries is complicated. Many problems come in different forms that require changing systems. Many firms and organizations are involved in the development of AI systems for some purposes; some companies and organizations are not involved. Others are in the design and implementation of AI. As a group, software development organizations are more involved in determining and developing AI algorithms than are AI startups. In spite of that, the degree of engagement with AI is very low by comparison.\n\nWhy is this important?\n\nAccording to a paper by'

In [None]:
## Use HuggingFacePipeline with a GPU
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    device=0,  # replace with device_map="auto" to use the accelerate library.
    pipeline_kwargs={"max_new_tokens": 100},
)

In [None]:
from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

In [None]:
chain=prompt|gpu_llm

In [None]:
question="What is artificial intelligence?"
chain.invoke({"question":question})

'Question: What is artificial intelligence?\n\nAnswer: Let\'s think step by step. The idea is that you could combine your data with something specific, like a computer simulation. This could be a real life experience, or a simulated game in which players could play the role of some sort of computer scientist.\n\nWe could say, if someone has a really smart computer and they\'re writing a program, you would say, "Okay, you have this really smart computer, but you\'re trying to come up with some really powerful code to make you more useful." Well, that\'s'
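The output above begins with the prompt itself, because the text-generation pipeline returned the prompt together with the continuation. If only the continuation is wanted, one option is to remove the prompt prefix afterwards. The sketch below uses a hypothetical helper, strip_prompt, and a shortened example string rather than the full model output.

```python
def strip_prompt(generated: str, prompt_text: str) -> str:
    """Drop the echoed prompt from the start of the generated text, if present."""
    if generated.startswith(prompt_text):
        return generated[len(prompt_text):].lstrip()
    return generated


prompt_text = "Question: What is artificial intelligence?\n\nAnswer: Let's think step by step."
generated = prompt_text + " The idea is that you could combine your data with something specific."

print(strip_prompt(generated, prompt_text))
```

In the chain, prompt_text would be the rendered prompt, for example prompt.format(question=question).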