## LangChain notes: Prompt design and prompt templating
1. **Prompt**: the input to a language model. It is a string of text that the model uses to generate a response. An example of a prompt would be "What is the capital of India?"
2. **Prompt template**: a template that takes some user inputs and generates a prompt from them. Prompts often contain variables that change from one request to the next depending on the user input. It is similar to the first Python exercise you were taught, where you read a user's name and printed a "hello, user" message built from it. A prompt template offers a little more than that, though. Let's explore.
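Before looking at LangChain's own class, the idea can be sketched with nothing but Python's built-in string formatting. The `build_prompt` helper and the `TEMPLATE` constant below are hypothetical names used only for illustration; they are not part of LangChain.

```python
# A template is an ordinary string with named placeholders.
TEMPLATE = "Tell me a {adjective} joke about {content}."


def build_prompt(template: str, **values: str) -> str:
    """Fill the named placeholders in `template` with the given values."""
    # str.format raises KeyError if a placeholder has no matching value.
    return template.format(**values)


prompt = build_prompt(TEMPLATE, adjective="funny", content="chickens")
print(prompt)
```

A real prompt template adds things on top of this, such as declaring the expected input variables up front and saving the template to disk.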

In [16]:
import os

api = os.environ["OPENAI_API_KEY"]  # read the key from the environment instead of hardcoding it

In [1]:
from langchain import PromptTemplate
multiple_input_prompt = PromptTemplate(
    input_variables=["adjective", "content"], 
    template="Tell me a {adjective} joke about {content}."  # placeholders use the same {name} syntax as Python string formatting
)

'''
By default, prompt templates use the same f-string style formatter that Python uses. Jinja2
formatting is also supported; we can switch to it with the template_format argument.
'''

multiple_input_prompt.format(adjective="funny", content="chickens")

'Tell me a funny joke about chickens.'

LangChain ships some default prompt templates that you can use, and you can also create your very own **custom prompt templates**. A custom prompt template is useful if you have your own formatter or parser and want to build a prompt from scratch based on inputs that you might receive from, let's say, an API. The code looks something like this:

In [2]:
from langchain.prompts import BasePromptTemplate
import re 
from datetime import datetime

class ExamplePromptTemplate(BasePromptTemplate):
    def get_user_birth_date(self):
        user_input = input("Enter your date of birth in dd/mm/yyyy format")

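        # Reject anything that is not exactly in dd/mm/yyyy form
        if re.fullmatch(r'[0-9]{2}/[0-9]{2}/[0-9]{4}', user_input) is None:
            raise ValueError("Expected a date in dd/mm/yyyy format")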
        return user_input

    def get_today_date(self):
        return datetime.today().strftime('%d/%m/%Y')

    def format(self, **kwargs) -> str:
        '''
        The caller passes the topic they want predictions about as the `prediction` keyword argument
        '''
        # get the user DoB
        birth_date = self.get_user_birth_date()
        today_date = self.get_today_date()
        prediction = kwargs['prediction']
        
        # The f prefix makes this an f-string, so the {placeholders} below are filled in
        prompt = f"""
        I want you to act as an astrologer for a prank. The date format we are going to use is dd/mm/yyyy
        I was born on {birth_date} and today is {today_date}
        You are going to make several vague but true sounding predictions about my {prediction}
        """
        return prompt 

    def _prompt_type(self):  
        '''
        Required function
        '''
        return "example-astrologer"

astrologer = ExamplePromptTemplate(input_variables=['prediction'])
prompt = astrologer.format(prediction='finances')
print(prompt)

Enter your date of birth in dd/mm/yyyy format16/07/1998

        I want you to act as an astrologer for a prank. The date format we are going to use is dd/mm/yyyy
        I was born on 16/07/1998 and today is 19/03/2023
        You are going to make several vague but true sounding predictions about my finances
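
A pattern check on the date only looks at the shape of the input, so a string like 99/99/9999 would still pass. As a possible refinement, the sketch below validates the date with `datetime.strptime` instead. The helper names `parse_birth_date` and `is_valid_birth_date` are hypothetical and not part of the class above.

```python
from datetime import datetime


def parse_birth_date(text: str) -> datetime:
    """Parse a dd/mm/yyyy string; raises ValueError for malformed or impossible dates."""
    return datetime.strptime(text.strip(), "%d/%m/%Y")


def is_valid_birth_date(text: str) -> bool:
    """Return True only if `text` is a real calendar date in dd/mm/yyyy form."""
    try:
        parse_birth_date(text)
    except ValueError:
        return False
    return True


print(is_valid_birth_date("16/07/1998"))
print(is_valid_birth_date("31/02/2020"))
```

Keeping the validation in a separate function also makes it possible to test it without typing into an `input()` prompt.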
        


LangChain has something called LangChainHub, which is similar to the Hugging Face Hub. It is a collection of artefacts that are useful for working with LangChain, such as
1. Prompts
2. Chains
3. Agents (we will learn what each of them is later in the course)

Prompts that are held in memory can be saved in several formats and uploaded to LangChainHub.
A prompt can be saved locally in the following formats:
1. JSON
2. YAML
3. Python: to get a properly formatted Python file, you should upload a Python file that exposes a `PROMPT` variable. This is the variable that will be loaded, and it should be an instance of a subclass of `BasePromptTemplate` in LangChain.

Additionally, you need to attach a README file to the prompt so that users have some usage guidelines.

In [3]:
from langchain.prompts import load_prompt

multiple_input_prompt = PromptTemplate(
    input_variables=["adjective", "content"], 
    template="Tell me a {adjective} joke about {content}."  # placeholders use the same {name} syntax as Python string formatting
)

multiple_input_prompt.save("prompt.json")


prompt = load_prompt("prompt.json")
print(type(prompt))

<class 'langchain.prompts.prompt.PromptTemplate'>


In [4]:
multiple_input_prompt.save("multiple_input_prompt.json")

# This is how the prompt is saved on disk
'''
{
    "input_variables": [
        "adjective",
        "content"
    ],
    "template": "Tell me a {adjective} joke about {content}.",
    "template_format": "f-string"
}
'''

multiple_input_prompt.save("multiple_input_prompt.yaml")
# This is what the YAML file looks like. The template string itself is stored in the file, which is not the case for custom prompt templates
'''
input_variables:
- adjective
- content
template: Tell me a {adjective} joke about {content}.
template_format: f-string
'''

print("") # For Jupyter 
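
Since the saved file is plain JSON, it can also be inspected with the standard library alone. The sketch below does not call LangChain: it writes a dictionary with the same fields as the listing above to a temporary file, reads it back, and checks that every declared input variable appears in the template. The consistency check is an illustrative assumption, not something LangChain is claimed to do in this form.

```python
import json
import tempfile
from pathlib import Path

# Same fields as the saved prompt file shown in this cell.
saved = {
    "input_variables": ["adjective", "content"],
    "template": "Tell me a {adjective} joke about {content}.",
    "template_format": "f-string",
}

# Round-trip the dictionary through a JSON file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "multiple_input_prompt.json"
    path.write_text(json.dumps(saved, indent=4))
    loaded = json.loads(path.read_text())

# Collect declared variables that never appear as {name} in the template.
missing = [
    name for name in loaded["input_variables"]
    if "{" + name + "}" not in loaded["template"]
]
print(loaded == saved, missing)
```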




3. **Few-shot example prompts**: when you give the language model a few examples that show the kind of input/output pairs expected from it, the prompt is called a few-shot prompt.

We can build such a prompt with a few-shot prompt template.

In [5]:
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

examples = [
  {
    "question": "Who lived longer, Muhammad Ali or Alan Turing?",
    "answer": 
"""
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
"""
  },
  {
    "question": "When was the founder of craigslist born?",
    "answer": 
"""
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
"""
  },
  {
    "question": "Who was the maternal grandfather of George Washington?",
    "answer":
"""
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
"""
  },
  {
    "question": "Are both the directors of Jaws and Casino Royale from the same country?",
    "answer":
"""
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate Answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate Answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate Answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate Answer: New Zealand.
So the final answer is: No
"""
  }
]

Because all our examples follow the same key-value pattern, we can render each of them as a string with a `PromptTemplate` object. The way to do this is as follows:

In [6]:
example_prompt = PromptTemplate(input_variables=["question", "answer"], template="Question: {question}\n{answer}")

print(example_prompt.format(**examples[0]))

Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali



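The benefit of creating one unified prompt template like this is that it can be handed to a `FewShotPromptTemplate`, which formats every example with it and then appends the new question as a suffix: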

In [7]:
prompt = FewShotPromptTemplate(
    examples=examples, 
    example_prompt=example_prompt, 
    suffix="Question: {input}", 
    input_variables=["input"]
)
print(prompt.format(input="Who was the father of Mary Ball Washington?"))

Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali


Question: When was the founder of craigslist born?

Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952


Question: Who was the maternal grandfather of George Washington?

Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball W
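
To make the assembly step concrete, here is a plain-Python sketch of what a few-shot prompt boils down to: format each example with the example template, append the formatted suffix, and join everything with a separator. The function `build_few_shot_prompt` is hypothetical, the blank-line separator is an assumption based on the output above, and the example answers are shortened versions of the ones in the `examples` list.

```python
EXAMPLE_TEMPLATE = "Question: {question}\n{answer}"


def build_few_shot_prompt(examples, suffix, separator="\n\n", **inputs):
    """Format every example, then append the suffix filled with the user's input."""
    parts = [EXAMPLE_TEMPLATE.format(**example) for example in examples]
    parts.append(suffix.format(**inputs))
    return separator.join(parts)


# Shortened versions of two examples from the list above.
short_examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": "So the final answer is: Muhammad Ali",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": "So the final answer is: December 6, 1952",
    },
]

result = build_few_shot_prompt(
    short_examples,
    suffix="Question: {input}",
    input="Who was the father of Mary Ball Washington?",
)
print(result)
```

The model sees the worked examples first and the open question last, which nudges it to answer in the same follow-up style.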

In [17]:
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings


example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples,
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(openai_api_key=api),
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    Chroma,
    # This is the number of examples to produce.
    k=1
)

# Select the most similar example to the input.
question = "Who was the father of Mary Ball Washington?"
selected_examples = example_selector.select_examples({"question": question})
print(f"Examples most similar to the input: {question}")
for example in selected_examples:
    print("\n")
    for k, v in example.items():
        print(f"{k}: {v}")

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
Examples most similar to the input: Who was the father of Mary Ball Washington?


question: Who was the maternal grandfather of George Washington?
answer: 
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
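
The selection step can be illustrated without an API key by swapping the embedding model for something much cruder. The sketch below ranks the example questions by cosine similarity over word counts. This is a stand-in technique, not what `SemanticSimilarityExampleSelector` does internally (it uses embeddings and a vector store), and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query, candidates, k=1):
    """Return the k candidates most similar to the query."""
    query_vector = bag_of_words(query)
    ranked = sorted(
        candidates,
        key=lambda text: cosine_similarity(query_vector, bag_of_words(text)),
        reverse=True,
    )
    return ranked[:k]


# The four example questions used in this section.
questions = [
    "Who lived longer, Muhammad Ali or Alan Turing?",
    "When was the founder of craigslist born?",
    "Who was the maternal grandfather of George Washington?",
    "Are both the directors of Jaws and Casino Royale from the same country?",
]

print(select_examples("Who was the father of Mary Ball Washington?", questions, k=1))
```

Word overlap happens to pick the same example here, but unlike embeddings it cannot match questions that are phrased with different words.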

