## LangChain: The Chat Prompt Template

This notebook uses a more sophisticated prompt template in LangChain: the Chat Prompt Template.

In [1]:
from dotenv import load_dotenv
import os
import langchain

In [2]:
# Load environment variables from .env file
load_dotenv()
# Now you can access the environment variables
openai_api_key = os.getenv('OPENAI_API_KEY')
#
# Not needed for this notebook
# langchain_api_key = os.getenv('LANGCHAIN_API_KEY')
# anthropic_api_key = os.getenv('ANTHROPIC_API_KEY')
# huggingface_api_key = os.getenv('HUGGINGFACE_API_KEY')
#
# You can also assign the key directly, but exposing a key in a notebook is not good practice
# anthropic_api_key='sk-ant-api03....._AAA'
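
If the `.env` file is missing or the variable name is misspelled, `os.getenv` quietly returns `None`, and the problem only shows up later when the model is called. A minimal sketch of an early check follows, assuming a hypothetical helper named `require_env` (it is not part of python-dotenv or LangChain):

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail early with a clear message.

    Hypothetical helper for illustration; not part of python-dotenv or LangChain.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value

# Possible usage in this notebook, after load_dotenv() has run:
# openai_api_key = require_env("OPENAI_API_KEY")
```

Failing at load time keeps the error message close to its cause instead of surfacing as an authentication error from the API.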

### Chat Prompt Template

In LangChain, there are many types of Prompt Templates: <BR>
 - https://python.langchain.com/docs/modules/model_io/prompts/quick_start/

Specifically, we are interested in a Prompt Template that works similarly to the 'messages' format we used when calling the LLMs from the REST API. Remember the message format?

```
data = {
    'model': 'claude-3-sonnet-20240229',
    'max_tokens': 200,
    'messages': [
        {"role": "user", "content": "I have an AI question, are you ready?"},
        {"role": "assistant", "content": "Hi, I'm Claude an AI assistant, so I know a lot about it. Please, please ask me anything about AI."},
        {"role": "user", "content": "Can you explain Artificial Neural Networks in plain English, including an analogy?"},
    ]
}
```

Chat Prompt Template Reference: https://python.langchain.com/docs/modules/model_io/prompts/quick_start/#chatprompttemplate
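
To make the correspondence with the REST format concrete, here is a small sketch that converts REST-style message dictionaries into the `(role, content)` tuples used with `ChatPromptTemplate.from_messages` in this notebook. The helper name `to_template_tuples` and the `ROLE_MAP` dictionary are assumptions made for illustration; they are not LangChain functions.

```python
# Role names from the REST 'messages' format, mapped to the role names
# used in this notebook's (role, content) tuples.
ROLE_MAP = {"system": "system", "user": "human", "assistant": "ai"}

def to_template_tuples(messages):
    """Convert REST-style message dicts into (role, content) tuples.

    Hypothetical helper for illustration; not a LangChain function.
    """
    return [(ROLE_MAP[m["role"]], m["content"]) for m in messages]

rest_messages = [
    {"role": "user", "content": "I have an AI question, are you ready?"},
    {"role": "assistant", "content": "Yes, please ask."},
    {"role": "user", "content": "{user_input}"},
]

tuples = to_template_tuples(rest_messages)
print(tuples)
```

The last message keeps a `{user_input}` placeholder, so the resulting list has the same shape as the template defined in the next cell.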

In [8]:
# Use a new kind of prompt template
#
from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages([
    ("system", "You are an expert in Large Language Models, especially working in Python. Your name is Clyde."),
    ("human", "Greetings Clyde, Please be ready to answer my questions about using LLMs with Python."),
    ("ai", "Yes, ready, willing and able."),
    ("human", "{user_input}"),
])
# What do we need to provide to use it?
template.input_schema()

PromptInput(user_input=None)

Notice the types of messages that are possible: system, human, ai<BR>
There are a couple more that we won't use here: FunctionMessage and ToolMessage <BR>
See the full list here: https://python.langchain.com/docs/modules/model_io/chat/message_types/

In [4]:
# Create a string variable

raw_input='''
In a HuggingFace tokenizer, please explain about how tokenized words related to a model's 
vocabulary in a simple and clear way.
'''

# Recall: to fill the template, pass a dictionary that maps "user_input" to your raw_input
# This doesn't call a model yet, as we have not defined one.
template.invoke({"user_input": raw_input})

ChatPromptValue(messages=[SystemMessage(content='You are an expert in Large Language Models, especially working in Python. Your name is Clyde.'), HumanMessage(content='Greetings Clyde, Please be ready to answer my questions about using LLMs with Python.'), AIMessage(content='Yes, ready, willing and able.'), HumanMessage(content="\nIn a HuggingFace tokenizer, please explain about how tokenized words related to a model's \nvocabulary in a simple and clear way.\n")])
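
Conceptually, invoking the template substitutes the dictionary values into each message's placeholders and leaves the other messages unchanged. The sketch below is a rough pure-Python analogue of that step, not LangChain's implementation; `fill_messages` and the shortened message list are assumptions made for illustration.

```python
# Toy stand-in for the template: (role, content) pairs, some with placeholders.
message_templates = [
    ("system", "You are an expert in Large Language Models. Your name is Clyde."),
    ("human", "{user_input}"),
]

def fill_messages(templates, values):
    """Substitute placeholder values into each message's content (illustration only)."""
    return [(role, content.format(**values)) for role, content in templates]

filled = fill_messages(message_templates, {"user_input": "What is a tokenizer?"})
for role, content in filled:
    print(f"{role}: {content}")
```

As with the real template, no model is involved at this point: the result is only the list of messages that would be sent.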

In [5]:
# Create chain with this prompt, a model and a parser
# ChatGPT model
from langchain_openai import ChatOpenAI
openai_llm = ChatOpenAI(model='gpt-3.5-turbo', api_key=openai_api_key)
#
# This parser takes the AIMessage returned from the LLM and converts it to a string
from langchain_core.output_parsers import StrOutputParser
output_parser = StrOutputParser()
#
# Create chain
chain = template | openai_llm | output_parser
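
The `|` operator reads left to right: the output of each stage becomes the input of the next. The toy class below illustrates that idea with Python's `__or__` method. It is a simplified sketch under that assumption, not how LangChain implements its runnables; `Step` and the three stand-in stages are hypothetical names.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition (illustration only)."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # a | b produces a new Step that runs a first, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)

# Stand-ins for the three stages: prompt, model, and output parser.
toy_prompt = Step(lambda d: f"Question: {d['user_input']}")
toy_model = Step(lambda prompt: {"content": prompt.upper()})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_model | toy_parser
result = toy_chain.invoke({"user_input": "what is a token?"})
print(result)  # QUESTION: WHAT IS A TOKEN?
```

The stand-in parser plays the same role as `StrOutputParser` above: it pulls the text out of the model's message object so the chain returns a plain string.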

In [6]:
# We start the chain by using the invoke() method and our dictionary
#
response = chain.invoke({"user_input": raw_input})

In [7]:
print(response)

In a HuggingFace tokenizer, the tokenized words are converted into tokens that correspond to the model's vocabulary. Each word in the input text is broken down into smaller units called tokens, which are then mapped to indices in the model's vocabulary. This vocabulary consists of a fixed set of tokens that the model has been trained on.

For example, the word "cat" might be tokenized into three tokens like ['ca', 't'] and then mapped to indices in the model's vocabulary, such as [145, 267]. When the model processes these tokens during training or inference, it looks up the corresponding embeddings for these indices in its vocabulary to perform computations.

In summary, tokenized words are converted into tokens that are associated with specific indices in the model's vocabulary, allowing the model to understand and process the input text effectively.


### What we did
1. Created a Chat Prompt Template (more sophisticated prompt template)
2. Created an LLM
3. Created an Output Parser
4. Created a chain
5. Used the chain