
## HuggingFace

LangChain provides two wrappers for Hugging Face LLMs:
1. Models hosted on the Hugging Face Hub, called via the API
2. Local pipelines (the model is downloaded and run locally)

Both work with the `text2text-generation` and `text-generation` tasks.
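
As a quick reference, the two tasks line up with the two `transformers` auto classes this notebook imports for local loading. The snippet below is only a sketch of that pairing; `TASK_TO_AUTO_CLASS` and `auto_class_for` are hypothetical names used for illustration, not part of LangChain or `transformers`.

```python
# Task name -> name of the transformers auto class used to load a matching model locally.
# The pairing mirrors the local-pipeline cells in this notebook.
TASK_TO_AUTO_CLASS = {
    "text2text-generation": "AutoModelForSeq2SeqLM",  # encoder-decoder, e.g. google/flan-t5-small
    "text-generation": "AutoModelForCausalLM",        # decoder-only, e.g. gpt2-medium
}


def auto_class_for(task: str) -> str:
    """Return the auto class name for a task, or raise ValueError for an unknown task."""
    try:
        return TASK_TO_AUTO_CLASS[task]
    except KeyError:
        expected = sorted(TASK_TO_AUTO_CLASS)
        raise ValueError(f"unsupported task {task!r}; expected one of {expected}") from None


print(auto_class_for("text2text-generation"))
print(auto_class_for("text-generation"))
```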



In [None]:
!pip -q install langchain huggingface_hub transformers sentence_transformers accelerate bitsandbytes

In [None]:
import os
# Placeholder value: replace it with your own Hugging Face access token
os.environ['HUGGINGFACEHUB_API_TOKEN'] = 'HUGGINGFACEHUB_API_TOKEN'
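
Hardcoding a token in a shared notebook is easy to leak. One possible alternative, sketched below, is to prompt for it only when the environment variable is missing or still holds the placeholder. `ensure_hf_token` is a hypothetical helper, and the `env` and `ask` parameters exist only so the function can be exercised without a real prompt.

```python
import os
from getpass import getpass

ENV_VAR = "HUGGINGFACEHUB_API_TOKEN"


def ensure_hf_token(env=os.environ, ask=getpass):
    """Return the token stored in `env`, prompting only if it is missing or a placeholder."""
    token = env.get(ENV_VAR, "")
    if not token or token == ENV_VAR:
        # Nothing usable is set yet: ask for the token without echoing it.
        token = ask("Hugging Face API token: ")
        env[ENV_VAR] = token
    return token


# In the notebook, call it once before building any HuggingFaceHub object:
# ensure_hf_token()
```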

## Use the HuggingFaceHub via API
- T5 Encoder-Decoder Model

In [None]:
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Question: {question}

Answer: Let's think step by step."""

prompt = PromptTemplate(template=template, input_variables=["question"])
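
To see the exact text the model will receive, the template can be filled in by hand. The sketch below uses plain `str.format` as a stand-in, since `PromptTemplate` uses the same `{name}` placeholder style by default; `render` is a hypothetical helper, not a LangChain function.

```python
template = """Question: {question}

Answer: Let's think step by step."""


def render(template: str, **values: str) -> str:
    """Fill the {placeholders} in `template`; a plain-Python stand-in for PromptTemplate.format."""
    return template.format(**values)


rendered = render(template, question="What is the capital of England?")
print(rendered)
```

In the notebook itself, `prompt.format(question=...)` should give the same string.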

In [None]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=HuggingFaceHub(repo_id="google/flan-t5-xl", 
                                        model_kwargs={"temperature":0, 
                                                      "max_length":64}))

In [None]:
question = "What is the capital of England?"

print(llm_chain.run(question))

London is the capital of England. The final answer: London.


In [None]:
question = "What is Area 51 famous for?"

print(llm_chain.run(question))

Area 51 is famous for being the location of the US government's secret space program. The US government has been constructing a space station in Area 51. The final answer: space station.


- Decoder-only models (`text-generation` task)

In [None]:
# llm_chain = LLMChain(prompt=prompt, 
#                      llm=HuggingFaceHub(repo_id="EleutherAI/gpt-j-6b", 
#                                         model_kwargs={"temperature":0.1, 
#                                                       "max_length":100}))

In [None]:
# llm_chain = LLMChain(prompt=prompt, 
#                      llm=HuggingFaceHub(repo_id="stabilityai/stablelm-tuned-alpha-7b", 
#                                         model_kwargs={"temperature":0, 
#                                                       "max_length":100}))

# question = "What is Area 51 famous for?"

# print(llm_chain.run(question))

## Local pipelines (download models locally)


- Flan-T5: encoder-decoder (seq2seq) model

In [None]:
from langchain.llms import HuggingFacePipeline
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, AutoModelForSeq2SeqLM

model_id = 'google/flan-t5-small'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto')

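# Bind the result to `pipe` so the imported `pipeline` factory is not shadowed
# and can be called again in later cells.
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=128
)

local_llm = HuggingFacePipeline(pipeline=pipe)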
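
For a rough sense of what `load_in_8bit=True` buys, the weight footprint can be estimated from the parameter count alone. This is a back-of-the-envelope sketch: the parameter count below is an assumed round figure, not read from the checkpoint, and the estimate ignores activations and any layers that stay in higher precision.

```python
def approx_weight_memory_mib(n_params: int, bits_per_weight: int) -> float:
    """Rough size of the weights alone, in MiB, at the given precision."""
    return n_params * bits_per_weight / 8 / 2**20


n_params = 80_000_000  # assumption: illustrative parameter count for a small model

fp32 = approx_weight_memory_mib(n_params, 32)
int8 = approx_weight_memory_mib(n_params, 8)
print(f"fp32: {fp32:.0f} MiB, int8: {int8:.0f} MiB, ratio: {fp32 / int8:.0f}x")
```

The 4x ratio is an upper bound on the saving; the real figure depends on which modules are actually quantized.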







In [None]:
# print(local_llm('What is the capital of England? '))

In [None]:
llm_chain = LLMChain(prompt=prompt, 
                     llm=local_llm
                     )

question = "What is the capital of England?"

print(llm_chain.run(question))

England is the capital of England. The capital of England is London. So, the answer is London.
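
The step-by-step prompt makes the models end with a phrase such as "The final answer: ..." or "the answer is ...". If only the answer is needed, it can be pulled out with a small regular expression. This is a sketch that covers just the phrasings seen in the outputs in this notebook; `extract_final_answer` is a hypothetical helper and other models may phrase their conclusions differently.

```python
import re

# Matches "final answer: X." or "answer is X." at the end of the text.
_ANSWER_RE = re.compile(r"(?:final answer:|answer is)\s*(.+?)\.?\s*$", re.IGNORECASE)


def extract_final_answer(text: str):
    """Return the trailing answer phrase, or None if no known marker is found."""
    match = _ANSWER_RE.search(text.strip())
    return match.group(1) if match else None


hub_output = "London is the capital of England. The final answer: London."
local_output = (
    "England is the capital of England. The capital of England is London. "
    "So, the answer is London."
)
print(extract_final_answer(hub_output))
print(extract_final_answer(local_output))
```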


## Decoder Only Model - Usage
---



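- `gpt2-medium` is used here; other decoder-only checkpoints such as `microsoft/DialoGPT-large` load the same way with `AutoModelForCausalLM`.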

In [None]:
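# Re-import so this cell runs on its own, even if `pipeline` was rebound earlier
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "gpt2-medium"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Bind the result to `pipe` so the `pipeline` factory is not overwritten
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100
)

local_llm = HuggingFacePipeline(pipeline=pipe)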
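
The cell above builds `local_llm` for the decoder-only model but never calls it. A minimal sketch of how it might be used follows, reusing the chain pattern from the Flan-T5 cells. `ask_all` and `EchoChain` are hypothetical: `EchoChain` is only a stand-in so the snippet runs without downloading a model, and in the notebook it would be replaced by `LLMChain(prompt=prompt, llm=local_llm)`.

```python
class EchoChain:
    """Stand-in for an LLMChain so this sketch runs without downloading a model."""

    def run(self, question: str) -> str:
        return f" (model output for: {question}) "


def ask_all(chain, questions):
    """Run each question through any object exposing .run(str); return stripped answers."""
    return {q: chain.run(q).strip() for q in questions}


# In the notebook, replace EchoChain() with:
#   LLMChain(prompt=prompt, llm=local_llm)
questions = ["What is the capital of England?", "What is Area 51 famous for?"]
answers = ask_all(EchoChain(), questions)
for q, a in answers.items():
    print(f"{q} -> {a}")
```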