<a href="https://colab.research.google.com/github/JohnTan38/LangChain-and-LLM/blob/master/LangChain_Running_HuggingFace_Models_Locally.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
!pip -q install langchain huggingface_hub transformers sentence_transformers accelerate bitsandbytes

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m215.3/215.3 KB[0m [31m11.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m84.2/84.2 MB[0m [31m10.7 MB/s[0m eta [36m0:00:00[0m
[?25h

## HuggingFace

There are two Hugging Face LLM wrappers: one for a local pipeline and one for a model hosted on the Hugging Face Hub. Note that these wrappers only work with models that support the `text2text-generation` or `text-generation` task.
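A quick way to check whether a model fits these wrappers is to look up its pipeline tag on the Hub. The sketch below hard-codes the two task names from the note above and uses `huggingface_hub.model_info`, whose result has a `pipeline_tag` attribute. It is an illustration, not LangChain's own validation code, and the helper names are hypothetical.

```python
# Task names the LangChain Hugging Face LLM wrappers accept (from the note above).
SUPPORTED_TASKS = ("text2text-generation", "text-generation")


def is_supported_task(task):
    """Return True if a Hub pipeline tag is one of the supported tasks."""
    return task in SUPPORTED_TASKS


def hub_pipeline_tag(repo_id):
    """Look up a model's pipeline tag on the Hugging Face Hub (needs network access)."""
    # Imported lazily so is_supported_task works without huggingface_hub installed.
    from huggingface_hub import model_info

    return model_info(repo_id).pipeline_tag
```

For example, `is_supported_task(hub_pipeline_tag("google/flan-t5-xl"))` should return `True`, because Flan-T5 is tagged `text2text-generation` on the Hub.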


In [3]:
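import os
from getpass import getpass
# Enter your Hugging Face Hub API token when prompted (an empty token won't authenticate)
os.environ['HUGGINGFACEHUB_API_TOKEN'] = getpass('Hugging Face Hub API token: ')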

## Using the HuggingFaceHub wrapper

In [4]:
from langchain import PromptTemplate, HuggingFaceHub, LLMChain
template = """Question: {question}
Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])

In [5]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=HuggingFaceHub(repo_id="google/flan-t5-xl", 
                                        model_kwargs={"temperature":0, 
                                                      "max_length":64}))
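Under the hood, `HuggingFaceHub` sends the formatted prompt to the hosted Inference API and passes `model_kwargs` along as generation parameters. The sketch below builds that kind of request with only the standard library. The endpoint URL `https://api-inference.huggingface.co/models/<repo_id>` and the `{"inputs": ..., "parameters": ...}` body reflect the Inference API used by this LangChain version and should be treated as assumptions. `build_payload` and `query_hub` are hypothetical helper names.

```python
import json
import os
import urllib.request

# Assumed Inference API endpoint for a hosted model.
API_URL = "https://api-inference.huggingface.co/models/{repo_id}"


def build_payload(prompt, **parameters):
    """Request body: the prompt plus generation parameters such as temperature."""
    return {"inputs": prompt, "parameters": parameters}


def query_hub(repo_id, prompt, **parameters):
    """POST the prompt to the hosted model and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL.format(repo_id=repo_id),
        data=json.dumps(build_payload(prompt, **parameters)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['HUGGINGFACEHUB_API_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Calling `query_hub("google/flan-t5-xl", some_prompt, temperature=0, max_length=64)` would roughly mirror the chain above. Text-to-text models typically answer with a list such as `[{"generated_text": "..."}]`.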

In [6]:
question = "Which country has won the most number of World Cups?"

print(llm_chain.run(question))

Brazil has won the most number of World Cups. Brazil is a country. Brazil is a country. Brazil is a country. The answer: Brazil.
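Conceptually, `llm_chain.run(question)` fills the prompt template and passes the resulting string to the LLM. With its default f-string format, `PromptTemplate` fills `{question}` the way Python's `str.format` does. Below is a minimal plain-Python sketch of that flow. A hypothetical `echo_llm` stands in for a real model so that the code runs offline.

```python
from typing import Callable

template = """Question: {question}
Answer: Let's think step by step."""


def run_chain(llm: Callable[[str], str], question: str) -> str:
    """Fill the template, then hand the full prompt string to the LLM callable."""
    return llm(template.format(question=question))


def echo_llm(prompt: str) -> str:
    """Hypothetical stand-in model: returns the last line of the prompt it was given."""
    return prompt.splitlines()[-1]


print(run_chain(echo_llm, "Which country has won the most number of World Cups?"))
# prints: Answer: Let's think step by step.
```

The "Let's think step by step." suffix is part of every prompt, which is why the model's answers above tend to reason out loud before giving a final answer.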


In [7]:
question = "What region is best for growing grapes used to make red wine in France?"

print(llm_chain.run(question))

The best region for growing grapes used to make red wine in France is the Rhone Valley. The Rhone Valley is located in France. The final answer: Rhone Valley.


## BlenderBot

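BlenderBot doesn't work through the `HuggingFaceHub` wrapper. Its task on the Hub is presumably `conversational` rather than `text2text-generation` or `text-generation`, so creating the LLM fails with a `ValidationError`.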

In [None]:
blenderbot_chain = LLMChain(prompt=prompt,
                            llm=HuggingFaceHub(repo_id="facebook/blenderbot-1B-distill",
                                               model_kwargs={"temperature": 0,
                                                             "max_length": 64}))

ValidationError: ignored

In [None]:
# question = "What is the capital of France?"
# question = "What area is best for growing wine in France?"

# print(blenderbot_chain.run(question))

## Using a local model from Hugging Face

### Why would you want to run a model locally?

- to use your own fine-tuned models
- to run on your own hardware, e.g. a GPU-hosted machine
- some models only work locally

## Flan-T5 - Encoder-Decoder

In [9]:
from langchain.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

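model_id = 'google/flan-t5-large'  # go for a smaller model if you don't have the VRAM
tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_8bit quantizes the weights with bitsandbytes (needs a CUDA GPU);
# device_map='auto' lets accelerate place the layers on the available devices
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto')

# Flan-T5 is an encoder-decoder model, so it uses the text2text-generation task
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100
)

# Wrap the transformers pipeline so LangChain can use it as an LLM
local_llm = HuggingFacePipeline(pipeline=pipe)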




Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.9/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...



In [10]:
print(local_llm('What is the capital of France? '))

paris
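Loading with `load_in_8bit=True` is what keeps `flan-t5-large` comfortable on a small GPU. Here is a back-of-the-envelope estimate. The fp32 checkpoint downloaded earlier is about 3.13 GB, which at 4 bytes per parameter is roughly 783M parameters. At 1 byte per parameter, the weights need about a quarter of that. The sketch counts weights only; it ignores activations, the CUDA context, and any layers that bitsandbytes keeps in higher precision, so treat its numbers as lower bounds. `weight_memory_gb` is a hypothetical helper.

```python
# Storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate weight storage in GB (1 GB = 1e9 bytes) for a given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Parameter count inferred from the ~3.13 GB fp32 pytorch_model.bin download.
num_params = 3.13e9 / BYTES_PER_PARAM["fp32"]

for dtype in ("fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_gb(num_params, dtype):.2f} GB")
```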


In [11]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

question = "What is the capital of the British Virgin Islands?"

print(llm_chain.run(question))

The British Virgin Islands are located in the Caribbean Sea. The capital of the British Virgin Islands is Saint Thomas. So the answer is Saint Thomas.


## GPT-2 Medium - Decoder-Only Model

`microsoft/DialoGPT-large` is another decoder-only model you could try here.

In [12]:
model_id = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

pipe = pipeline(
    "text-generation", 
    model=model, 
    tokenizer=tokenizer, 
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)


In [None]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

question = "What is the capital of France?"

print(llm_chain.run(question))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




The French Empire is made up of 8 provinces from the Atlantic to Pacific land masses.

From the Atlantic, France extends westward.

From the Pacific to Europe, France extends west.

From Europe to Asia, France spreads along the middle.

From Asia to the Mediterranean Sea, France follows the southward-southeast direction.

The coast line
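Two things likely explain GPT-2's rambling, cut-off answer. First, a decoder-only `text-generation` pipeline returns the prompt followed by the continuation. `HuggingFacePipeline` therefore slices the prompt off before returning the text, while `text2text-generation` output is used as-is. Second, `max_length=100` counts the prompt tokens too, so generation stops mid-sentence once the total reaches 100 tokens. The sketch below is a simplified version of that post-processing, assuming the usual pipeline response of a list of `{"generated_text": ...}` dicts. The function name is hypothetical.

```python
def extract_completion(task, prompt, response):
    """Turn a transformers pipeline response into the text an LLM wrapper returns."""
    generated = response[0]["generated_text"]
    if task == "text-generation":
        # Decoder-only models echo the prompt, so keep only what follows it.
        return generated[len(prompt):]
    if task == "text2text-generation":
        # Encoder-decoder models return only the newly generated text.
        return generated
    raise ValueError(f"Unsupported task: {task}")
```

If you'd rather cap only the newly generated tokens, the `text-generation` pipeline also accepts `max_new_tokens` in place of `max_length`.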


## BlenderBot - Encoder-Decoder

In [None]:
from langchain.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

model_id = 'facebook/blenderbot-1B-distill'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

pipe = pipeline(
    "text2text-generation",
    model=model, 
    tokenizer=tokenizer, 
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)


In [None]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

question = "What area is best for growing wine in France?"

print(llm_chain.run(question))

 I'm not sure, but I do know that France is one of the largest producers of wine in the world.


## SentenceTransformers

In [1]:
from langchain.embeddings import HuggingFaceEmbeddings
model_name = "sentence-transformers/all-mpnet-base-v2"
hf = HuggingFaceEmbeddings(model_name=model_name)


In [2]:
hf.embed_query('this is an embedding')

[0.010657327249646187,
 -0.09967267513275146,
 -0.026967084035277367,
 0.0653177872300148,
 0.02100495621562004,
 0.042623456567525864,
 0.011534163728356361,
 -0.006229322869330645,
 0.05175820365548134,
 0.007306752726435661,
 0.02135349065065384,
 0.04269150272011757,
 0.023143865168094635,
 0.009952719323337078,
 0.056463081389665604,
 -0.06137979030609131,
 0.05274379253387451,
 0.024683983996510506,
 -0.013267713598906994,
 -0.007051217835396528,
 0.026656342670321465,
 -0.0059135290794074535,
 0.004097490105777979,
 0.03841235488653183,
 -0.014230660162866116,
 0.02302352711558342,
 -0.007326637394726276,
 -0.035625338554382324,
 -0.017934121191501617,
 -0.01393020898103714,
 0.011977488175034523,
 -0.0073659527115523815,
 0.024451538920402527,
 -0.06637248396873474,
 1.5677644569223048e-06,
 0.01821720413863659,
 0.0019748671911656857,
 -0.01832936331629753,
 -0.014930741861462593,
 -0.005393384024500847,
 -0.01122234109789133,
 0.015792902559041977,
 -0.02714185230433941,
 ...]

In [None]:
hf.embed_documents(['this is an embedding','this another embedding'])

[[0.010657318867743015,
  -0.09967268258333206,
  -0.02696709893643856,
  0.06531770527362823,
  0.021004999056458473,
  0.042623501271009445,
  0.011534065939486027,
  -0.006229353602975607,
  0.0517583042383194,
  0.007306722458451986,
  0.021353380754590034,
  0.04269153252243996,
  0.023143835365772247,
  0.00995270162820816,
  0.056463032960891724,
  -0.06137979403138161,
  0.0527438260614872,
  0.024683943018317223,
  -0.013267838396131992,
  -0.007051167543977499,
  0.02665640041232109,
  -0.005913490429520607,
  0.004097461700439453,
  0.038412418216466904,
  -0.01423065084964037,
  0.023023542016744614,
  -0.007326596416532993,
  -0.03562536463141441,
  -0.017934132367372513,
  -0.013930188491940498,
  0.011977534741163254,
  -0.007365899626165628,
  0.024451464414596558,
  -0.06637255847454071,
  1.5677629789934144e-06,
  0.018217233940958977,
  0.0019748930353671312,
  -0.01832951232790947,
  -0.014930643141269684,
  -0.005393484607338905,
  -0.011222314089536667,
  ...],
 [...]]
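Embeddings like these are usually compared with cosine similarity. Below is a dependency-free sketch that you could apply to the two vectors returned by `hf.embed_documents(...)`. `cosine_similarity` is a hypothetical helper, not part of LangChain.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

For example, `docs = hf.embed_documents([...])` followed by `cosine_similarity(docs[0], docs[1])` gives a score where values closer to 1 mean the two texts are more similar in meaning.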

In [None]:
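from langchain.embeddings import HuggingFaceHubEmbeddings

# Same model, but the embeddings are computed by the hosted Inference API instead of locally
hf = HuggingFaceHubEmbeddings(
    repo_id=model_name,
    task="feature-extraction",
    # huggingfacehub_api_token="my-api-key",
)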