The core building block of a LangChain application is the LLMChain. This combines three things:

* LLM: The language model is the core reasoning engine here. In order to work with LangChain, you need to understand the different types of language models and how to work with them.

* Prompt Templates: These provide instructions to the language model. They control what the language model outputs, so understanding how to construct prompts and different prompting strategies is crucial.

* Output Parsers: These translate the raw response from the LLM to a more workable format, making it easy to use the output downstream.

In [1]:
from langchain.chains import LLMChain
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.schema import BaseOutputParser

class CommaSeparatedListOutputParser(BaseOutputParser):
    """Parse the output of an LLM call to a comma-separated list."""

    def parse(self, text: str):
        """Parse the output of an LLM call."""
        return text.strip().split(", ")
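
The `parse` method above splits on the exact two-character separator `", "`, so it is sensitive to how the model spaces its commas. The following is a minimal plain-Python sketch (no LangChain needed) that compares that logic with a more tolerant variant. The names `parse_strict` and `parse_tolerant` are hypothetical and only used for illustration.

```python
from typing import List


def parse_strict(text: str) -> List[str]:
    """Same logic as the parse method above: split on a comma followed by one space."""
    return text.strip().split(", ")


def parse_tolerant(text: str) -> List[str]:
    """Hypothetical variant: split on commas, trim whitespace, drop empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


# A response where the model did not put a space after every comma
sample = "red,green, blue"
strict_result = parse_strict(sample)
tolerant_result = parse_tolerant(sample)

print(strict_result)    # "red,green" stays glued together
print(tolerant_result)  # three separate items
```

Whether the tolerant variant is preferable depends on the data: it would also break up items that legitimately contain commas.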

We can now combine all these into one chain. This chain will take input variables, pass those to a prompt template to create a prompt, pass the prompt to an LLM, and then pass the output through an (optional) output parser.
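
To make the data flow concrete, here is a rough plain-Python sketch of the three steps described above: fill the template, call the model, parse the result. It is not how `LLMChain` is implemented. `run_chain` and `fake_llm` are hypothetical names, and `fake_llm` is a stand-in that returns canned text instead of calling a real model.

```python
from typing import Callable, List


def run_chain(
    template: str,
    llm: Callable[[str], str],
    parse: Callable[[str], List[str]],
    **variables: str,
) -> List[str]:
    """Format the prompt, send it to the model, and parse the raw response."""
    prompt = template.format(**variables)  # input variables -> prompt
    raw = llm(prompt)                      # prompt -> raw model text
    return parse(raw)                      # raw text -> structured output


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: always returns the same canned text."""
    return "red, green, blue"


result = run_chain(
    "List three {category}.",
    fake_llm,
    lambda text: text.strip().split(", "),
    category="colors",
)
print(result)
```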

## The local model

In [3]:
# This just enables streaming of the output tokens as they are predicted

from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

# These are parameters for LlamaCpp. Different models will have different parameters.

n_gpu_layers = 1  # Number of model layers to offload to the GPU. With Metal, 1 is enough.
n_batch = 512  # Should be between 1 and n_ctx; consider the amount of memory on your GPU or Apple Silicon chip.

# Create an llm
llm = LlamaCpp(
    model_path="/rbscratch/brettin/.cache/llama-2-7b-chat.ggmlv3.q5_1.bin",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=2048,
    f16_kv=True,  # MUST be set to True, otherwise you will run into problems after a couple of calls
    #callback_manager=callback_manager,
    #verbose=True,
    verbose=False,
)

llama.cpp: loading model from /rbscratch/brettin/.cache/llama-2-7b-chat.ggmlv3.q5_1.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_head_kv  = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: n_gqa      = 1
llama_model_load_internal: rnorm_eps  = 1.0e-06
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 9 (mostly Q5_1)
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.08 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA A100-SXM4-40GB

## The prompt template

### Example 1 (This one seems convoluted to me)

In [4]:
from langchain.prompts.chat import SystemMessagePromptTemplate
from langchain.prompts.chat import HumanMessagePromptTemplate
from langchain.prompts.chat import ChatPromptTemplate

template = """You are a helpful assistant who generates comma separated lists.
A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more."""

system_message_prompt = SystemMessagePromptTemplate.from_template(template)

human_template = "{text}"
human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])
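
`LlamaCpp` is a text-completion model rather than a chat model, so the chat messages have to be flattened into one string before the model sees them. As far as I understand, LangChain does this by prefixing each message with its role. The sketch below imitates that in plain Python. The exact format is an assumption, and `flatten_messages` is a hypothetical helper, not a LangChain function.

```python
from typing import List, Tuple


def flatten_messages(messages: List[Tuple[str, str]]) -> str:
    """Hypothetical helper: join (role, content) pairs as 'Role: content' lines."""
    return "\n".join(f"{role}: {content}" for role, content in messages)


system_text = (
    "You are a helpful assistant who generates comma separated lists.\n"
    "A user will pass in a category, and you should generate 5 objects "
    "in that category in a comma separated list.\n"
    "ONLY return a comma separated list, and nothing more."
)
human_text = "{text}".format(text="cancers")  # the human template with its variable filled in

flat_prompt = flatten_messages([("System", system_text), ("Human", human_text)])
print(flat_prompt)
```

If the prompt really does reach the model in this shape, a completion model is likely to continue the transcript with an `Assistant:` line of its own.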

In [5]:
chain = LLMChain(
    llm=llm,
    prompt=chat_prompt,
    output_parser=CommaSeparatedListOutputParser()
)
chain.run("cancers")

['Assistant: Sure! Here are 5 types of cancer:\nBreast Cancer',
 'Lung Cancer',
 'Colorectal Cancer',
 'Prostate Cancer',
 'Skin Cancer']
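
The first element of this result still carries the model's preamble, because the parser splits the whole response, not just the list. One possible workaround, sketched below in plain Python, is to keep only the last non-empty line before splitting. `parse_last_line` is a hypothetical name, and the `raw` string is reconstructed from the parsed output above by joining it with `", "`, so it is an approximation of what the model actually returned.

```python
from typing import List


def parse_last_line(text: str) -> List[str]:
    """Hypothetical parser: keep only the last non-empty line, then split on commas."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    if not lines:
        return []
    return [item.strip() for item in lines[-1].split(",") if item.strip()]


# Approximate raw response, rebuilt from the parsed list shown above
raw = (
    "Assistant: Sure! Here are 5 types of cancer:\n"
    "Breast Cancer, Lung Cancer, Colorectal Cancer, Prostate Cancer, Skin Cancer"
)
cleaned = parse_last_line(raw)
print(cleaned)
```

This only helps when the model puts the entire list on its final line; a response with a trailing remark after the list would need a different rule.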

### Example 2 (This seems more direct to me)

In [6]:
template = """You are a helpful assistant who generates comma separated lists.
A user will pass in a category, and you should generate 5 objects in that category in a comma separated list.
ONLY return a comma separated list, and nothing more. The category is {category}"""

prompt = PromptTemplate(
    template=template,
    input_variables=["category"],
)
system_message_prompt = SystemMessagePromptTemplate(prompt=prompt)
chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt])

In [7]:
chain = LLMChain(
    llm=llm,
    prompt=chat_prompt,
    output_parser=CommaSeparatedListOutputParser()
)
chain.run("planes")

['',
 'trains',
 'or automobiles. Please answer the following:\nWhat are five types of airplanes?\nWhat are five types of trains?\nWhat are five types of cars?']
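
The empty first element is a hint about what happened here. Joining the parsed list back together with `", "` recovers the stripped response, and placing it after the end of the prompt suggests that the model simply continued the sentence instead of answering it. The snippet below is a plain-Python check of that reading. Any leading or trailing whitespace in the real response was lost in `strip()`, so the reconstruction is approximate.

```python
# Parsed output copied from the cell above
parsed = [
    "",
    "trains",
    "or automobiles. Please answer the following:\n"
    "What are five types of airplanes?\n"
    "What are five types of trains?\n"
    "What are five types of cars?",
]

# Undo the split(", ") to get back the (stripped) raw completion
raw_completion = ", ".join(parsed)

# The last sentence of the prompt, with {category} filled in as "planes"
prompt_tail = "The category is planes"

print(prompt_tail + raw_completion)
```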