<a href="https://colab.research.google.com/github/0xVolt/learn-langchain/blob/main/hf_models_locally_with_langchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using HuggingFace Models Locally with LangChain

## Ways to Create HuggingFace Models

LangChain provides two wrappers for HuggingFace (HF) LLMs: `HuggingFacePipeline`, which runs a model locally, and `HuggingFaceHub`, which calls a model hosted on the HuggingFace Hub (HFH). You'll see most projects using the latter approach. However, both have pros and cons.

## Why not `HuggingFaceHub`?

Plain and simple, HFH doesn't support every model on HF. And for the models it does support, the wrapper only handles the following tasks: `text2text-generation` and `text-generation`. So, I've decided against using HFH for this notebook and these models.

### More on the Tasks

`text2text-generation` covers tasks handled by encoder-decoder models like T5 and BART.

`text-generation` covers tasks handled by decoder-only models like GPT-2.

It's important to know which kind of model you're working with, since the model class and parameters you'd use to set up LangChain differ. Examples of this are to follow.
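As a rough sketch of how you might pick the task string programmatically, the snippet below reads the `is_encoder_decoder` flag from a model's config. The helper names `task_for_architecture` and `detect_task` are hypothetical, not part of `transformers` or LangChain. Note the limitation: encoder-only models such as BERT also report `is_encoder_decoder = False`, so this check only makes sense for models that generate text.

```python
def task_for_architecture(is_encoder_decoder: bool) -> str:
    """Map the architecture flag to the pipeline task names used in this notebook."""
    return "text2text-generation" if is_encoder_decoder else "text-generation"


def detect_task(model_id: str) -> str:
    """Fetch a model's config (a small download) and choose the task from it.

    Hypothetical helper: it assumes the model is a text generator, since
    encoder-only models would also fall into the "text-generation" branch.
    """
    # Imported lazily so the mapping above works without transformers installed.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained(model_id)
    return task_for_architecture(config.is_encoder_decoder)
```

For example, `detect_task('google/flan-t5-large')` should land in the `text2text-generation` branch, since T5 configs mark themselves as encoder-decoder.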

## Why Locally?

You have the ability to fine-tune any of these models, get complete access to your GPU (if you have one) and avoid running into issues with models that don't run on the HFH.

## About this Notebook

This notebook compares different models by setting them up locally and then linking them to LangChain. The idea is to use it for future reference if I need to build a project that involves fine-tuning any of these models to create a LangChain-backed application.

---

In [12]:
!pip -q install langchain transformers sentence_transformers

# Model #1 - `Flan-T5`

This model is an encoder-decoder, so we'll use the `text2text-generation` task to initialise it, as mentioned earlier.

In [13]:
import torch
# Newer LangChain releases move this class to `langchain_community.llms`.
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, pipeline, AutoModelForSeq2SeqLM, AutoModelForCausalLM

In [14]:
modelID = 'google/flan-t5-large'
# Download (or load from the local cache) the tokenizer and the seq2seq model weights
tokenizer = AutoTokenizer.from_pretrained(modelID)
model = AutoModelForSeq2SeqLM.from_pretrained(modelID)

In [15]:
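# Wrap the model and tokenizer in a transformers pipeline for the encoder-decoder task
pipe = pipeline(
    "text2text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=100  # upper bound on the generated sequence length, in tokens
)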
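
Since one reason for running locally is GPU access, here is a minimal sketch of building the same pipeline on a GPU when one is available. As far as I can tell, a model loaded with `from_pretrained` sits on the CPU unless told otherwise. `pipeline()` accepts a `device` argument where `-1` means CPU and `0` means the first CUDA device; the helper names `pick_pipeline_device` and `build_gpu_aware_pipeline` are hypothetical.

```python
def pick_pipeline_device(cuda_available: bool) -> int:
    """Return the device index pipeline() expects: 0 for the first GPU, -1 for CPU."""
    return 0 if cuda_available else -1


def build_gpu_aware_pipeline(model, tokenizer, max_length: int = 100):
    """Build the text2text-generation pipeline on the GPU if there is one.

    Hypothetical helper; it mirrors the cell above and only adds `device`.
    """
    # Imported lazily so the device choice can be checked without these libraries.
    import torch
    from transformers import pipeline

    return pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=max_length,
        device=pick_pipeline_device(torch.cuda.is_available()),
    )
```

Swapping the cell above for `pipe = build_gpu_aware_pipeline(model, tokenizer)` should leave everything downstream unchanged.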

In [16]:
localLLM = HuggingFacePipeline(pipeline=pipe)
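
To sanity-check the wrapped model, a small helper like the one below can format a question and pass it to the LLM. This is only a sketch: `build_prompt` and `ask` are hypothetical names, the prompt wording is illustrative, and the helper takes any text-in, text-out callable so it doesn't depend on a particular LangChain version.

```python
from typing import Callable


def build_prompt(question: str) -> str:
    """Wrap a question in a simple instruction-style prompt (illustrative wording)."""
    return f"Answer the following question.\n\nQuestion: {question.strip()}\nAnswer:"


def ask(generate: Callable[[str], str], question: str) -> str:
    """Send the formatted prompt to a text-in, text-out callable and tidy the reply."""
    return generate(build_prompt(question)).strip()
```

On recent LangChain releases you would call `ask(localLLM.invoke, "What is the capital of France?")`; on older releases, where LLM objects are directly callable, `ask(localLLM, ...)` should work as well.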