# Using Falcon with LangChain

### What is Falcon 7B Instruct?

[Falcon-7B-Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) is a 7B-parameter causal decoder-only model built by TII. It is based on Falcon-7B, fine-tuned on a mixture of chat/instruct datasets, and released under the Apache 2.0 license.

In [14]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

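model_path = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# trust_remote_code=True runs the modeling code shipped in the model repo;
# bfloat16 halves weight memory versus float32, and device_map="auto"
# (which requires the accelerate package) places layers on available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="auto"
)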
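
As a rough sanity check on the choice of `torch.bfloat16`, the memory needed for the weights can be estimated from the parameter count. This is only a back-of-the-envelope sketch: the 7 billion figure is the nominal size taken from the model name, the helper `weight_memory_gb` is hypothetical, and the estimate ignores activations, the KV cache, and framework overhead.

```python
# Bytes per parameter for common dtypes (weights only).
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


# Assumption: nominal "7B" parameters; the exact count differs slightly.
nominal_params = 7e9
for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gb(nominal_params, dtype):.0f} GB")
```

Under these assumptions, loading in bfloat16 needs about half the weight memory of float32, which is often the difference between fitting on a single GPU or not.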

### Dolly 2.0's instruct pipeline

Although Falcon-7B-Instruct is fine-tuned on a mixture of chat/instruct datasets, used as is it does not reliably follow instructions within a conversation. To address this, we use the [Dolly 2.0 instruct pipeline](https://huggingface.co/databricks/dolly-v2-12b/raw/main/instruct_pipeline.py) code ([`instruct_pipeline.py`](instruct_pipeline.py)), which wraps each prompt with special tokens that help the model follow instructions within a conversation.

In [None]:
from instruct_pipeline import InstructionTextGenerationPipeline

pipeline = InstructionTextGenerationPipeline(
    model=model,
    tokenizer=tokenizer,
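    max_new_tokens=128,  # cap on the number of newly generated tokens
    return_full_text=True,  # return prompt plus completion, as LangChain's wrapper expects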
    task="text-generation",
)
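
To illustrate the prompt adjustment described above, here is a minimal sketch of an instruction template in the style of Dolly 2.0. The constants are written from memory of `instruct_pipeline.py` and should be treated as assumptions; check the actual file for the exact wording. The helper `build_prompt` is hypothetical and is not part of the pipeline's API.

```python
# Assumption: these strings mirror the prompt format in Dolly 2.0's
# instruct_pipeline.py; verify against the real file before relying on them.
INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"


def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the instruction/response template."""
    return f"{INTRO_BLURB}\n\n{INSTRUCTION_KEY}\n{instruction}\n\n{RESPONSE_KEY}\n"


print(build_prompt("Hi, there!"))
```

The prompt ends with the response marker, so the model's job is to continue the text with the response itself.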

### LangChain + Hugging Face Local Pipelines

With the pipeline ready, we use LangChain's `HuggingFacePipeline` class to create a LangChain LLM instance that wraps our Falcon model.

In [None]:
from langchain.llms import HuggingFacePipeline

local_llm = HuggingFacePipeline(pipeline=pipeline)

print(local_llm("Hi, there!"))
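
To make the wrapping step concrete, here is a conceptual sketch of what an LLM wrapper around a text-generation pipeline does. It is not LangChain's implementation: `fake_pipeline` and `SimplePipelineLLM` are hypothetical stand-ins that run without a model. The one assumption carried over from Hugging Face is that a text-generation pipeline returns a list of dicts with a `generated_text` key.

```python
from typing import Callable, Dict, List


def fake_pipeline(prompt: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a text-generation pipeline (no model needed)."""
    return [{"generated_text": f"(echo) {prompt}"}]


class SimplePipelineLLM:
    """Hypothetical wrapper exposing a pipeline as a string-to-string callable."""

    def __init__(self, pipeline: Callable[[str], List[Dict[str, str]]]) -> None:
        self.pipeline = pipeline

    def __call__(self, prompt: str) -> str:
        outputs = self.pipeline(prompt)
        # Take the first candidate and return only its text.
        return outputs[0]["generated_text"]


llm = SimplePipelineLLM(fake_pipeline)
print(llm("Hi, there!"))
```

The value of such a wrapper is a uniform interface: chains and memory components only need a callable that maps a prompt string to a completion string, regardless of which model sits behind it.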

### LangChain + Memory

Next, we use LangChain's memory feature, which keeps the conversation history so the model can follow instructions that depend on earlier turns.

In [None]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

conversation_buff = ConversationChain(llm=local_llm, memory=ConversationBufferMemory())

print(conversation_buff("What is the capital of England?")["response"])
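
The idea behind buffer memory can be sketched in plain Python: store every turn verbatim and prepend the transcript to the next prompt. This is a simplified illustration, not LangChain's code. `BufferMemory` and `build_chat_prompt` are hypothetical names, and the `Human:` / `AI:` prefixes are an assumption about the transcript format.

```python
from typing import List, Tuple


class BufferMemory:
    """Hypothetical buffer memory: keeps every (human, ai) turn verbatim."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def as_text(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


def build_chat_prompt(memory: BufferMemory, user_input: str) -> str:
    """Prepend the stored transcript to the new user input."""
    history = memory.as_text()
    prefix = f"{history}\n" if history else ""
    return f"{prefix}Human: {user_input}\nAI:"


memory = BufferMemory()
memory.save("What is the capital of England?", "London.")
print(build_chat_prompt(memory, "What is its population?"))
```

Because the whole transcript is resent on every turn, the prompt grows with the conversation, so a buffer like this eventually runs into the model's context limit.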