# LangChain integration 🦜🔗

This notebook shows how to integrate Lit-GPT with LangChain!

In [None]:
# clone Lit-GPT
!git clone https://github.com/Lightning-AI/lit-gpt
%cd lit-gpt/

In [None]:
# for CUDA
!pip install --index-url https://download.pytorch.org/whl/nightly/cu118 --pre 'torch>=2.1.0dev' -q

# install the dependencies
!pip install .
!pip install langchain

In [2]:
from typing import Any, List, Mapping, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM

In [10]:
from lit_gpt.generate.base import build_llm, generate
from lit_gpt import GPT, Tokenizer

checkpoint_dir = "checkpoints/tiiuae/falcon-7b"
devices = 1
quantize = "bnb.int8"
max_new_tokens = 50
top_k = 200
temperature = 0.8

In [8]:
model, tokenizer, fabric = build_llm(checkpoint_dir=checkpoint_dir, devices=devices, quantize=quantize)

You are using a CUDA device ('NVIDIA A100-SXM4-40GB') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision


Loading model '../checkpoints/tiiuae/falcon-7b/lit_model.pth' with {'org': 'Lightning-AI', 'name': 'lit-GPT', 'block_size': 2048, 'vocab_size': 50254, 'padding_multiple': 512, 'padded_vocab_size': 65024, 'n_layer': 32, 'n_head': 71, 'n_embd': 4544, 'rotary_percentage': 1.0, 'parallel_residual': True, 'bias': False, 'n_query_groups': 1, 'shared_attention_norm': True, '_norm_class': 'LayerNorm', 'norm_eps': 1e-05, '_mlp_class': 'GptNeoxMLP', 'intermediate_size': 18176, 'condense_ratio': 1}


bin /home/aniket/miniconda3/envs/parrot/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cuda121.so


Time to instantiate model: 4.42 seconds.
Time to load the model weights: 23.63 seconds.


We will create a [custom LLM](https://python.langchain.com/docs/modules/model_io/models/llms/how_to/custom_llm), a callable class that is responsible for interacting with our model.

In [19]:
class LitGPTLLM(LLM):
    model: Any
    tokenizer: Tokenizer

    @property
    def _llm_type(self) -> str:
        return "lit-gpt"

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
    ) -> str:
        # Stop sequences are not supported by this wrapper.
        if stop is not None:
            raise ValueError("stop kwargs are not permitted.")

        # Tokenize the prompt on the same device as the model.
        encoded = self.tokenizer.encode(prompt, device=self.model.device)
        prompt_length = encoded.size(0)
        # max_new_tokens, temperature and top_k are the globals defined in an earlier cell.
        max_returned_tokens = prompt_length + max_new_tokens
        assert max_returned_tokens <= self.model.config.block_size, (
            max_returned_tokens,
            self.model.config.block_size,
        )  # maximum rope cache length
        y = generate(
            self.model,
            encoded,
            max_returned_tokens,
            max_seq_length=max_returned_tokens,
            temperature=temperature,
            top_k=top_k,
        )
        return self.tokenizer.decode(y)

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {"name": self.model.config.name}
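
The `_call` method above rejects any `stop` argument. One possible way to support it would be to truncate the decoded text at the first stop sequence. The sketch below is only an illustration: `truncate_at_stop` is a hypothetical helper, not part of Lit-GPT or LangChain, and it works on the decoded string rather than stopping generation early.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Cut `text` at the earliest occurrence of any stop sequence (hypothetical helper)."""
    if not stop:
        return text
    # Locate each non-empty stop sequence; str.find returns -1 when it is absent.
    positions = [text.find(s) for s in stop if s]
    positions = [p for p in positions if p != -1]
    # Keep everything before the earliest match, or the whole text if none matched.
    return text[: min(positions)] if positions else text


print(truncate_at_stop("hello,\ni have some trouble", stop=["\n"]))  # hello,
```

With such a helper, `_call` could return `truncate_at_stop(self.tokenizer.decode(y), stop)` instead of raising, at the cost of still generating all `max_new_tokens` tokens.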

In [20]:
llm = LitGPTLLM(model=model, tokenizer=tokenizer)

In [32]:
print(llm("hello"))

hello,
i have some trouble in making my car go up and down using the L298 motor driver.
i need some help, i can't seem to get it to move.
the motors are connected to arduino UNO. i
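
The completion above starts with the prompt ("hello") because the generated sequence contains the prompt tokens followed by the new ones. If only the continuation is wanted, the prompt tokens can be sliced off before decoding. This is a minimal sketch using plain Python lists and made-up token ids; `strip_prompt` is a hypothetical helper, and in the class above the equivalent would be slicing `y` with `prompt_length`.

```python
from typing import List, Sequence


def strip_prompt(token_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the tokens generated after the prompt (hypothetical helper)."""
    if not 0 <= prompt_length <= len(token_ids):
        raise ValueError("prompt_length must be between 0 and len(token_ids)")
    return list(token_ids[prompt_length:])


# Made-up token ids: the first two stand in for the encoded prompt.
generated = [101, 102, 7, 8, 9]
print(strip_prompt(generated, prompt_length=2))  # [7, 8, 9]
```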
