<a href="https://colab.research.google.com/github/0xVolt/learn-langchain/blob/main/hf_models_locally_with_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# HuggingFace Models Locally with LangChain

## Ways to Create HuggingFace Models

LangChain provides two wrappers for using HuggingFace (HF) LLMs: `HuggingFacePipeline`, which runs a model locally, and `HuggingFaceHub`, which calls a model hosted on the HuggingFace Hub (HFH). You'll see most projects using the latter approach. However, both have pros and cons.

## Why not `HuggingFaceHub`?

Plain and simple, the HFH doesn't support all models on HF. And for the models that are supported, the wrapper only covers the following tasks: `text2text-generation` and `text-generation`. So I've decided against using HFH for this notebook and these models.

### More on the Tasks

`text2text-generation` covers tasks handled by encoder-decoder models like T5 and BART. (BERT is encoder-only, so it doesn't belong in this group.)

`text-generation` covers tasks handled by decoder-only models like GPT-2.

It's important to know which kind of model you're working with, since the task name and the model class you use to set it up for LangChain will differ. Examples follow.
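
As a rough sketch of that difference, the helper below picks the pipeline task and the matching `transformers` Auto class name from a model config. `pick_task` is a hypothetical helper, not part of either library, and the stand-in config objects are assumptions so the snippet runs without downloading anything. In practice you would pass the result of `AutoConfig.from_pretrained(modelID)`, whose `is_encoder_decoder` attribute is what the check relies on.

```python
from types import SimpleNamespace


def pick_task(config):
    """Return (pipeline task, Auto class name) for a model config.

    Hypothetical helper: relies on the `is_encoder_decoder` flag that
    transformers configs expose, defaulting to False when it is absent.
    """
    if getattr(config, "is_encoder_decoder", False):
        return "text2text-generation", "AutoModelForSeq2SeqLM"
    return "text-generation", "AutoModelForCausalLM"


# Stand-ins for real configs (assumption: no network access here).
t5_like = SimpleNamespace(is_encoder_decoder=True)
gpt2_like = SimpleNamespace(is_encoder_decoder=False)

print(pick_task(t5_like))
print(pick_task(gpt2_like))
```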

## Why Locally?

You have the ability to fine-tune any of these models, get complete access to your GPU (if you have one) and avoid running into issues with models that don't run on the HFH.

## About this Notebook

This notebook compares different models by setting them up locally and then linking them to LangChain. The idea is to use this notebook for future reference if I need to make a project that involves fine-tuning any of these models to create a LangChain-backed application.

## Resources
- [Notebook on LangChain basics](https://colab.research.google.com/drive/1h2505J5H4Y9vngzPD08ppf1ga8sWxLvZ?usp=sharing#scrollTo=Derb_0t-ZESh)

---

In [1]:
!pip -q install langchain sentence_transformers transformers

# Model #1 - `Flan-T5`

## Create the LLM

This model is an encoder-decoder, so we'll use the `text2text-generation` task to initialise it, as mentioned earlier. Notice how we import `AutoModelForSeq2SeqLM`. Pro tip: here, `Seq2Seq` (sequence-to-sequence) is just another term for encoder-decoder.

In [None]:
import torch
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM, AutoModelForCausalLM

Initialise the model and the auto-tokenizer.

In [None]:
modelID = 'google/flan-t5-large'
tokenizer = AutoTokenizer.from_pretrained(modelID)
model = AutoModelForSeq2SeqLM.from_pretrained(modelID)

Create the local LLM pipeline with the model and tokenizer we defined.

In [None]:
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100
)
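
The pipeline above doesn't specify a device, so by default it runs on the CPU. If you want it on a GPU, one option is to pass `device` to `pipeline(...)`. The sketch below only works out which value to pass. `pick_device` is a hypothetical helper, and it assumes the integer convention of `-1` for CPU and `0` for the first CUDA GPU.

```python
def pick_device():
    """Return 0 (first CUDA GPU) if one is usable, otherwise -1 (CPU)."""
    try:
        import torch  # imported lazily so the sketch also runs without torch
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"pipeline device argument: {device}")
# Then, for example:
# pipe = pipeline("text2text-generation", model=model,
#                 tokenizer=tokenizer, max_length=100, device=device)
```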

Finally, instantiate the LLM.

In [None]:
localLLM = HuggingFacePipeline(pipeline=pipe)

## Feed the LLM Prompts

Create a question-and-answer template using the `PromptTemplate` class.

In [None]:
template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
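
To see what the model will actually receive, you can render the template with a question. The sketch below uses plain `str.format` as a stand-in so it runs without LangChain. Calling `prompt.format(question=...)` on the `PromptTemplate` above should give the same string with its default f-string-style templating, but treat that equivalence as an assumption.

```python
template = """Question: {question}

Answer: Let's think step by step."""

# Substitute the {question} placeholder, as the prompt template would.
rendered = template.format(question="What is India's primary export?")
print(rendered)
```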

Next, create the LLM chain which takes a prompt template and the LLM to use.

In [None]:
localLLMChain = LLMChain(
    prompt=prompt,
    llm=localLLM
)
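
Conceptually, the chain does two things: it fills the template, then passes the resulting text to the LLM. Here is a minimal sketch of that flow, with a hypothetical `run_chain` helper and a stub `echo_llm` standing in for the real model. Neither is LangChain's actual implementation.

```python
def run_chain(template, llm, **variables):
    """Fill the template with the variables, then call the LLM on the result."""
    prompt_text = template.format(**variables)
    return llm(prompt_text)


def echo_llm(prompt_text):
    """Stub model: reports the prompt length instead of generating text."""
    return f"[stub received {len(prompt_text)} characters]"


template = "Question: {question}\n\nAnswer: Let's think step by step."
result = run_chain(template, echo_llm, question="What is India's primary export?")
print(result)
```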

Finally, put it all together by running the LLM chain.

In [None]:
# question = input("Question: ")
question = "What is India's primary export?"

In [None]:
print(localLLMChain.run(question))

India's primary export is textiles. Textiles are made of cotton. Cotton is a staple fabric in India. Cotton is a staple fabric in the United States. So the answer is cotton.


# Model #2 - `GPT-2`

## Create the LLM

This is a decoder-only model, so we'll use the `text-generation` task.

In [None]:
modelID = 'gpt2-medium'
tokenizer = AutoTokenizer.from_pretrained(modelID)
model = AutoModelForCausalLM.from_pretrained(modelID)

In [None]:
pipe = pipeline(
    'text-generation',
    model=model,
    tokenizer=tokenizer,
    max_length=100
)

In [None]:
localLLM = HuggingFacePipeline(pipeline=pipe)

In [None]:
localLLMChain = LLMChain(
    prompt=prompt,
    llm=localLLM
)

In [None]:
question = "What is India's largest export?"

print(localLLMChain.run(question))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




First the basic facts. The first thing to understand is that India exports a lot of minerals.

The second thing is that more than 90% of its minerals come from its north-western states – like Bihar, Madhya Pradesh and Uttar Pradesh – as there is an abundance of them.

According to the ministry of minerals and minerals resources, in 2004 for example 10,000
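
Two things in the output above are worth a note. The `pad_token_id` message appears because GPT-2 has no padding token, so generation falls back to the end-of-sequence token (ID 50256, as the log shows). The answer also stops mid-sentence, which is consistent with `max_length=100` counting the prompt tokens as well as the generated ones for a decoder-only model. Below is a hedged sketch of keyword arguments that address both, using a hypothetical `generation_kwargs` helper and a stand-in tokenizer object so it runs without downloading the model. `max_new_tokens` and `pad_token_id` are standard `transformers` generation parameters, but the value `100` is only an illustration.

```python
from types import SimpleNamespace


def generation_kwargs(tokenizer, max_new_tokens=100):
    """Build extra keyword arguments for a text-generation pipeline.

    Hypothetical helper: caps only newly generated tokens and sets an
    explicit pad token when the tokenizer does not define one.
    """
    kwargs = {"max_new_tokens": max_new_tokens}
    if getattr(tokenizer, "pad_token_id", None) is None:
        kwargs["pad_token_id"] = tokenizer.eos_token_id
    return kwargs


# Stand-in for the GPT-2 tokenizer (assumption: eos ID taken from the log above).
gpt2_tokenizer_like = SimpleNamespace(pad_token_id=None, eos_token_id=50256)

kwargs = generation_kwargs(gpt2_tokenizer_like)
print(kwargs)
# Then, for example:
# pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, **kwargs)
```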


---