# Python Review 2
## Prompt Engineering, LangChain, and LLM architectures

### Outline

- What is Prompt Engineering? Conceptual introduction
- Emergent phenomena: In-context learning
- Basic prompt engineering methods
 - Zero-shot prompts
 - Few-shot prompts
 - Chain-Of-Thought prompts
- LangChain
 - What is LangChain?
 - Common applications
- LangChain Basic Components
 - Schema
 - Models (OpenAI, Cohere, and HuggingFace)
 * **For part 2... Just in case you want to start on your own**
 - Prompts
 - Simple Chains (LLMChain)
- Vector Databases and Embeddings
 - What are embeddings? What do we mean by "embedding" a sentence or piece of text?
 - New tools to store embeddings.
- LangChain Advanced Components
 - Indexes
 - Memory
 - Advanced chains
 - Agents
- Applications
 - Summarization
 - Chat-with-Data

## Prompts and Prompt Engineering

In the context of language models, a prompt is essentially the initial string of text that you feed the model. The model then generates a sequence of text that continues from the prompt. The generated text is influenced by the data it was trained on, as well as the specific input given in the form of the prompt.

In [1]:
# A Simple Prompt Example
prompt = "Once upon a time in a faraway kingdom, "
# The language model will generate a story starting from this prompt

### What is a Prompt?

A prompt is simply a starting point. It's a place for the language model to begin generating text. It could be a single word, a sentence, or even a paragraph. The content of the prompt can greatly influence the output of the model, as the model tries to generate text that is contextually relevant based on the input.

### What is Prompt Engineering?

Prompt engineering is the process of designing and optimizing prompts to get a specific output from a language model. This can involve choosing specific words, structuring the prompt in a certain way, or supplying the model with additional information to guide its responses.

In [2]:
# Prompt Engineering Example
original_prompt = "Tell me a story"
# The language model might generate any kind of story from this prompt

engineered_prompt = "Tell me a fairy tale about a benevolent queen and a cursed frog"
# This prompt is more detailed, guiding the model to generate a specific kind of story

### The Importance of Good Prompts

A good prompt does more than just start the text generation process: it can also guide the language model to produce text that is more relevant, coherent, and useful. This is particularly important in applications like chatbots or content generation tools, where the quality of the generated text is crucial.

## Emergent Phenomena: In-context Learning

In-context learning is a language model's ability to pick up a task from the instructions or examples in the prompt itself, without any update to its weights, on top of what it learned during training. This means that the details included in your prompts can dramatically change the model's output.

In [3]:
# In-context Learning Example
prompt_without_context = "Write an email"
# The language model might not know who to write the email to, or what the subject is

prompt_with_context = "Write an email to a colleague named Alex, summarising our project's progress"
# This prompt provides more context, helping the model generate a more relevant email

## Basic Prompt Engineering Methods

There are several methods for prompt engineering which you can employ depending on your specific use-case.

### Zero-shot Prompts

Zero-shot prompts ask the model to perform a task without giving it any examples of that task in the prompt; the model has to rely entirely on what it learned during training.

In [4]:
# Zero-shot Prompt Example
zero_shot_prompt = "Translate the following English text to French: 'Hello, how are you?'"

### Few-shot Prompts

Few-shot prompts involve providing the model with one or more examples of the task you want it to do, then asking it to do a similar task.

In [5]:
# Few-shot Prompt Example
few_shot_prompt = """
English: 'Hello, how are you?'
French: 'Bonjour, comment ça va?'

English: 'Thank you'
French: 'Merci'

English: 'I love you'
French: 'Je t'aime'

Translate the following English text to French: 'Good morning'
"""
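Writing few-shot prompts by hand gets tedious once you have more than a handful of examples. Below is a minimal sketch, assuming your examples live in a plain Python list of (English, French) pairs, that assembles the same kind of prompt as the cell above. `build_few_shot_prompt` is a hypothetical helper written for illustration, not a LangChain function (LangChain offers prompt templates for this, which we cover later).

```python
def build_few_shot_prompt(examples, query):
    """Assemble a few-shot translation prompt from (english, french) example pairs."""
    blocks = []
    for english, french in examples:
        # Each example shows the model one input/output pair in a fixed format
        blocks.append(f"English: '{english}'\nFrench: '{french}'")
    # The final instruction asks for the same task on a new input
    blocks.append(f"Translate the following English text to French: '{query}'")
    return "\n\n".join(blocks)


examples = [
    ("Hello, how are you?", "Bonjour, comment ça va?"),
    ("Thank you", "Merci"),
    ("I love you", "Je t'aime"),
]
generated_prompt = build_few_shot_prompt(examples, "Good morning")
print(generated_prompt)
```

Keeping every example in exactly the same format matters: the model imitates the pattern it sees, so adding or removing examples only changes the list, never the layout.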

### Chain-Of-Thought Prompts

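Chain-of-thought prompts encourage the model to spell out its intermediate reasoning steps before giving a final answer. You can do this by including worked examples whose answers show the reasoning (few-shot chain-of-thought), or simply by adding an instruction such as "Let's think step by step" (zero-shot chain-of-thought). This tends to help most on multi-step problems such as arithmetic word problems or logic puzzles.

In [6]:
# Chain-Of-Thought Prompt Example
# The worked example shows its reasoning, nudging the model to reason the same way before answering
chain_of_thought_prompt = """
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A: Let's think step by step.
"""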
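When a chain-of-thought prompt asks the model to finish its reasoning with a fixed phrase such as "The answer is ...", your code still has to pull that final answer out of a longer completion. A minimal sketch, assuming the model follows the "The answer is <value>." convention; `extract_final_answer` is a hypothetical helper, and a real application should handle completions that ignore the convention.

```python
import re


def extract_final_answer(completion):
    """Return the text after the last 'The answer is' in a completion, or None if absent."""
    # Capture everything up to an optional trailing period at the end of the line
    matches = re.findall(r"The answer is\s+(.+?)\.?[ \t]*$", completion, flags=re.MULTILINE)
    # Take the last match, in case the text also contains the prompt's worked examples
    return matches[-1] if matches else None


completion = ("The cafeteria started with 23 apples. They used 20, leaving 3. "
              "They bought 6 more, so 3 + 6 = 9. The answer is 9.")
print(extract_final_answer(completion))
```

Taking the last match is a deliberate choice: if you pass prompt and completion together, the worked examples' answers come first and the model's answer comes last.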

## Prompt Engineering Best Practices

1. Be explicit in your instruction.
2. If the model isn’t generating what you want, make the prompt longer and more explicit.
3. Start broadly and then become more specific.
4. Sometimes, asking the model to think step-by-step or debate pros and cons can help.
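One way to apply these practices consistently is to build prompts from named parts instead of writing them freehand. A minimal sketch; `build_prompt` and its parameters are hypothetical names chosen for illustration, and the exact wording of each part is only one reasonable choice.

```python
def build_prompt(instruction, context=None, output_format=None, step_by_step=False):
    """Assemble a prompt from an explicit instruction plus optional extra guidance."""
    parts = [instruction.strip()]                         # be explicit about the task
    if context:
        parts.append(f"Context:\n{context.strip()}")        # give the model the details it needs
    if output_format:
        parts.append(f"Answer format: {output_format}")     # be specific about what you want back
    if step_by_step:
        parts.append("Think step by step before giving your final answer.")
    return "\n\n".join(parts)


broad = build_prompt("Write an email.")
specific = build_prompt(
    "Write an email to my colleague Alex summarising our project's progress.",
    context="We finished the data pipeline and started model evaluation.",
    output_format="three short paragraphs",
)
print(specific)
```

Starting with `broad` and adding parts only when the output falls short mirrors the "start broadly, then become more specific" advice.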

## LangChain

[LangChain](https://docs.langchain.com/docs/) is a Python framework that simplifies the development of applications powered by language models. It provides a standard interface for chains, many integrations with other tools, and end-to-end chains for common applications. LangChain enables applications that are context-aware and can reason based on the provided context. The framework consists of an array of tools, components, and interfaces that streamline the development of language-model-powered applications.

LangChain’s main building blocks are components, chains, agents, memory, and callbacks. Components are abstractions for working with language models, along with a collection of implementations for each abstraction. Chains are sequences of calls that assemble components in a structured way to accomplish specific higher-level tasks. Agents let chains choose which tools to use given high-level directives. Memory persists application state between runs of a chain, while callbacks log and stream the intermediate steps of any chain.

LangChain is used to develop applications that rely on language models to reason and generate responses. Some common applications include document question answering, chatbots, and analyzing structured data. The main benefits of using LangChain to create LLM-powered applications include its modular and easy-to-use components, off-the-shelf chains that make it easy to get started, and the ability to customize existing chains or build new ones for complex applications.
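To make the idea of a chain concrete, here is a plain-Python sketch with no LangChain at all: each step is a function that receives the previous step's output. `fake_llm` is a hypothetical stand-in that just wraps the prompt in a marker string so the example runs offline; `make_prompt`, `clean_output`, and `run_chain` are likewise illustrative names. LangChain's real chains add prompt templates, model wrappers, memory, and callbacks on top of this basic pattern.

```python
def fake_llm(prompt):
    """Stand-in for a real model call so the sketch runs offline (hypothetical)."""
    return f"[model output for: {prompt}]"


def make_prompt(topic):
    # Step 1: turn raw user input into a prompt
    return f"Write one sentence about {topic}."


def clean_output(text):
    # Step 3: post-process the model's raw text
    return text.strip()


def run_chain(steps, user_input):
    """Run each step in order, feeding each output into the next step."""
    value = user_input
    for step in steps:
        value = step(value)
    return value


chain_output = run_chain([make_prompt, fake_llm, clean_output], "turtles")
print(chain_output)
```

Swapping `fake_llm` for a real model call changes one list entry and nothing else, which is the modularity the paragraph above describes.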


In [7]:
!pip install langchain==0.0.306

Collecting langchain
  Downloading langchain-0.0.306-py3-none-any.whl (1.8 MB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)
  Downloading dataclasses_json-0.6.1-py3-none-any.whl (27 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain)
  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)
Collecting langsmith<0.1.0,>=0.0.38 (from langchain)
  Downloading langsmith-0.0.41-py3-none-any.whl (39 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain)
  Downloading marshmallow-3.20.1-py3-none-any.whl (49 kB)

## LangChain Basic Components

### Schema

LangChain provides four types of [Schema](https://docs.langchain.com/docs/components/schema/), which can be thought of as basic data types we can use when working with this framework. These are Text, ChatMessages, Examples, and Document. We won't deal with Examples.

**Text**: This is the most basic type of Schema in LangChain. It represents a piece of text, such as a sentence or paragraph. Text is the primary interface through which you can interact with language models in LangChain.

In [11]:
text = "literally just a python string"

**ChatMessages:** This Schema represents a chat message, which consists of a content field (usually text) and a type that says who the message comes from. ChatMessages are used to represent inputs and outputs in chatbot applications. There are three types of chat message objects in LangChain.

1. *SystemMessage*: A chat message containing instructions for the AI system. For example, you might use a SystemMessage to tell the AI system to start a new conversation, adopt a persona based on a description, or avoid certain behaviors.

2. *HumanMessage*: A chat message representing information coming from a human interacting with the AI system. For example, you might use a HumanMessage to represent a user’s input to the chatbot.

3. *AIMessage*: A chat message representing information coming from the AI system. For example, you might use an AIMessage to represent the chatbot’s response to a user’s input.
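Chat APIs such as OpenAI's receive a conversation as a list of messages, each with a role (`system`, `user`, or `assistant`) and some content. As a rough plain-Python sketch of how the three message types above line up with those roles, the block below uses (kind, content) tuples instead of LangChain's classes. The kind strings match the `type` values LangChain gives its messages ("system", "human", "ai"), while `ROLE_NAMES` and `to_openai_messages` are hypothetical helpers, not LangChain's internal conversion code.

```python
# Map the three LangChain message kinds onto OpenAI chat roles
ROLE_NAMES = {"system": "system", "human": "user", "ai": "assistant"}


def to_openai_messages(conversation):
    """Convert (kind, content) tuples into role/content dicts for a chat API."""
    return [{"role": ROLE_NAMES[kind], "content": content} for kind, content in conversation]


conversation = [
    ("system", "You are a helpful assistant that translates English to French."),
    ("human", "I love programming."),
    ("ai", "J'adore programmer."),
]
messages = to_openai_messages(conversation)
for message in messages:
    print(message)
```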

In [12]:
from langchain.schema import AIMessage, SystemMessage, HumanMessage

# In terms of Python code, the main difference between these classes is the message type (system, human, or AI),
# which tells the chat model who is speaking.
# Extra keyword arguments such as `user` are simply stored on the object (see the printed output below);
# they are not a built-in field for identifying the speaker, since the message class itself plays that role.
# The system message is fixed throughout one interaction and is intended to guide the main behavioral traits
# (for example, by conditioning knowledge or specifying tone) our LLM should follow.
sys_message = SystemMessage(content='start new conversation')
user_message = HumanMessage(content='Hello, how can I help you?', user='Human')
ai_message = AIMessage(content='I am here to help!', user='Bot')

In [23]:
print(sys_message)
print(user_message)

content='start new conversation'
content='Hello, how can I help you?' user='Human'


**Examples:** This Schema represents input/output pairs that can be used to train and evaluate language models. Examples can be inputs/outputs for a model or for a chain, and the two serve different purposes: examples for a model can be used to fine-tune it, while examples for a chain can be used to evaluate the end-to-end chain, or even to train a model that replaces the whole chain.

We won't be training models this semester so you can ignore this one.

**Document:** This Schema represents a piece of unstructured data, such as a web page or document. It consists of *page_content* (the content of the data) and *metadata* (auxiliary pieces of information describing attributes of the data).

In [26]:
from langchain.schema import Document

document = Document(page_content='This is an example document.',
                    metadata={'author': 'John Doe'})
document

Document(page_content='This is an example document.', metadata={'author': 'John Doe'})

In [28]:
# We can create a prompt using the attributes (i.e., page_content and metadata) of Document objects.
# This simple application will come in handy
prompt = f"""
Translate the following content by {document.metadata['author']} to Spanish

{document.page_content}
"""

prompt

'\nTranslate the following content by John Doe to Spanish\n\nThis is an example document.\n'
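The same idea scales to many documents: loop over them and build one prompt each from `page_content` and `metadata`. A minimal sketch; `translation_prompts` is a hypothetical helper, and `SimpleNamespace` objects stand in for `langchain.schema.Document` so the block runs without LangChain installed (real Document objects expose the same two attributes).

```python
from types import SimpleNamespace


def translation_prompts(documents, language="Spanish"):
    """Build one translation prompt per document from its page_content and metadata."""
    prompts = []
    for doc in documents:
        # Fall back to a placeholder when a document has no 'author' in its metadata
        author = doc.metadata.get("author", "an unknown author")
        prompts.append(
            f"Translate the following content by {author} to {language}\n\n{doc.page_content}"
        )
    return prompts


# Stand-ins for Document objects (same page_content / metadata attributes)
docs = [
    SimpleNamespace(page_content="This is an example document.", metadata={"author": "John Doe"}),
    SimpleNamespace(page_content="Another short note.", metadata={}),
]
for p in translation_prompts(docs):
    print(p, end="\n---\n")
```

Using `metadata.get` with a default keeps one document with incomplete metadata from crashing the whole batch.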

## Models

Alright, let's start playing with some of these babies. We are going to, primarily, rely on the following two model providers: [openai](https://openai.com/product) and [huggingface](https://huggingface.co/models).

OpenAI, a company in which Microsoft has invested billions of dollars, develops the popular GPT-X models. They are the creators of ChatGPT, GPT-4, DALL-E (images), and Whisper (speech-to-text). Their models often outperform the available alternatives, but they are pricey. The good news is that research suggests that effective textual instructions to LLMs, plus additional in-house context (i.e., data), can improve models' performance significantly. Hence, optimizing our prompts should let us leverage open-source models while minimizing cost. OK, what are open-source models?

In most cases, open-source software is free to use and open to contributions. Big open-source projects are often developed and maintained by a mix of volunteer communities, foundations, and companies; Linux is a classic example, and companies such as Databricks and HuggingFace have built their businesses around open-source tools. HuggingFace grew into a major company in just a few years, and that wave of investment made it a very powerful player in the marketplace for all sorts of deep learning models (particularly language models). Today, it hosts hundreds of thousands of models and datasets that we can use for free. These are created by a community, whose contributors are sometimes as large as corporations, so they come with deficiencies. Part of the reason OpenAI's models are more powerful is simply that OpenAI has the resources to run enormous amounts of compute for a long time to tune (i.e., train) its models. Running those models is expensive too: ChatGPT reportedly costs around [$700k a day](https://www.businessinsider.com/how-much-chatgpt-costs-openai-to-run-estimate-report-2023-4) to operate! So, if we can do anything on our end to improve these models' performance, we should. Prompt engineering is one such technique that has yet to reach its limits, so it is our main focus.

### OpenAI

Get your API key first and save it in a text file somewhere on your computer.

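LangChain's OpenAI object takes your OpenAI API key, the name of the model to load (it must be a completion model, not a chat model), the temperature (see below), and the maximum number of tokens to generate. Tokens are chunks of text rather than whole words; for English, OpenAI's rule of thumb is that one token is about three-quarters of a word.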

> Technical nerdiness alert

The temperature parameter is a hyperparameter that controls the randomness and creativity of the text produced by a generative language model. It adjusts the probabilities of the predicted words in the model's softmax output layer (softmax is a math function that converts a list of numbers into probabilities). Concretely, the logits (the model's raw scores for each candidate word) are divided by the temperature before the softmax function is applied, so the temperature is the inverse of the scaling factor applied to the logits.

When the temperature is set to a low value, the probabilities of the predicted words are sharpened, which means that the most likely word is selected with a higher probability. This results in more conservative and predictable text, as the model is less likely to generate unexpected or unusual words. On the other hand, when the temperature is set to a high value, the probabilities of the predicted words are flattened, which means that all words are more equally likely to be selected. This results in more creative and diverse text, as the model is more likely to generate unusual or unexpected words.

In practice, the temperature is often set to a value between 0.1 and 2.0, depending on the desired level of randomness and creativity in the generated text. A temperature of 1.0 leaves the logits unscaled, so you get the model's standard softmax probabilities. This makes the temperature an important knob for adapting a generative language model's output to a specific application.
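The mechanism described above fits in a few lines: given logits $z_i$ and temperature $T$, the probability of word $i$ is $p_i = \exp(z_i / T) / \sum_j \exp(z_j / T)$. A minimal sketch with made-up logits for three candidate words; `softmax_with_temperature` is an illustrative helper, not part of any library. Note that APIs typically treat a temperature of 0 as (near-)greedy decoding, but the formula itself divides by $T$, so this sketch only accepts positive values.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the max before exponentiating for numerical stability (does not change the result)
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate next words
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top word's probability shrinking as the temperature rises, which is the sharpening and flattening effect described above.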

We will run experiments by varying the temperature levels to evaluate behavior across models.


In [30]:
!pip install openai==0.28.1

Collecting openai
  Downloading openai-0.28.1-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.1


In [31]:
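from langchain.llms import OpenAI

# Paste your key here (or read it from the text file you saved it in)
openai_key = ""

# Note: OpenAI retired text-davinci-003 in January 2024. If this call fails, swap in a
# current completion model such as "gpt-3.5-turbo-instruct".
GPT3 = OpenAI(openai_api_key=openai_key, model="text-davinci-003", temperature=0.5, max_tokens=100)

GPT3("What is the name of your turtle?")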

"\n\nThat's up to you!"

To get chat models like ChatGPT we use the ChatOpenAI object.

In [42]:
from langchain.chat_models import ChatOpenAI

chatGPT = ChatOpenAI(openai_api_key=openai_key, model="gpt-3.5-turbo", temperature=0.5, max_tokens=100)

# `generate` lets you go one step further and produce completions for several lists of messages at once.
# It returns an LLMResult; each generation in it also carries the reply as an AIMessage in its `message` field.

batch_messages = [
    [
        SystemMessage(content="You are a helpful assistant that translates English to French."),
        HumanMessage(content="I love programming.")
    ],
    [
        SystemMessage(content="You are a helpful assistant that translates English to French."),
        HumanMessage(content="I love artificial intelligence.")
    ],
]

result = chatGPT.generate(batch_messages)
result

LLMResult(generations=[[ChatGeneration(text="J'adore programmer.", generation_info={'finish_reason': 'stop'}, message=AIMessage(content="J'adore programmer."))], [ChatGeneration(text="J'adore l'intelligence artificielle.", generation_info={'finish_reason': 'stop'}, message=AIMessage(content="J'adore l'intelligence artificielle."))]], llm_output={'token_usage': {'prompt_tokens': 53, 'completion_tokens': 18, 'total_tokens': 71}, 'model_name': 'gpt-3.5-turbo'}, run=[RunInfo(run_id=UUID('f5e6fac5-2734-4129-9f29-fb4c9bb38f31')), RunInfo(run_id=UUID('1b32ce70-8865-4ba0-a8c5-799d0f5c2021'))])

In [49]:
# Prompt diagnostics
print(f"Token usage breakdown {result.llm_output['token_usage']}")
print(f"First generation: {result.generations[0]}")
print(f"Second generation: {result.generations[1]}")


Token usage breakdown {'prompt_tokens': 53, 'completion_tokens': 18, 'total_tokens': 71}
First generation: [ChatGeneration(text="J'adore programmer.", generation_info={'finish_reason': 'stop'}, message=AIMessage(content="J'adore programmer."))]
Second generation: [ChatGeneration(text="J'adore l'intelligence artificielle.", generation_info={'finish_reason': 'stop'}, message=AIMessage(content="J'adore l'intelligence artificielle."))]
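The `token_usage` dictionary is what providers bill on, so it is handy for keeping an eye on cost. A minimal sketch that turns it into a rough cost estimate; `estimate_cost` is a hypothetical helper, and the per-1,000-token prices below are placeholders, not real OpenAI prices, so look up the current rates for your model. In the notebook you would pass `result.llm_output['token_usage']` instead of the hard-coded dictionary.

```python
def estimate_cost(token_usage, prompt_price_per_1k, completion_price_per_1k):
    """Estimate the cost of a call from a token_usage dict (prices are per 1,000 tokens)."""
    prompt_cost = token_usage["prompt_tokens"] / 1000 * prompt_price_per_1k
    completion_cost = token_usage["completion_tokens"] / 1000 * completion_price_per_1k
    return prompt_cost + completion_cost


# Token counts copied from the batch call above; the prices are hypothetical placeholders
token_usage = {"prompt_tokens": 53, "completion_tokens": 18, "total_tokens": 71}
cost = estimate_cost(token_usage, prompt_price_per_1k=0.001, completion_price_per_1k=0.002)
print(f"Estimated cost: ${cost:.6f}")
```

Prompt and completion tokens are priced separately because providers usually charge different rates for input and output.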


### HuggingFace

Go get your [api key](https://huggingface.co/docs/hub/index) if you haven't already.

In [50]:
!pip install huggingface_hub

Collecting huggingface_hub
  Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
Installing collected packages: huggingface_hub
Successfully installed huggingface_hub-0.17.3


In [53]:
from langchain.llms import HuggingFaceHub

HF_key = ""

# choose your favorite https://huggingface.co/models?pipeline_tag=text2text-generation&sort=trending
# just click the model card and copy the name
model_repo = "google/flan-t5-base"

# notice that the syntax is different.
# We are using two different APIs here. Practice will help you get used to thinking about model inputs but
# always have the official documentation handy to check things you don't understand.
llm = HuggingFaceHub(huggingfacehub_api_token=HF_key, repo_id=model_repo,
                     model_kwargs={"temperature":0.5, "max_length":15})

llm("What is the name of your turtle?")

'tetra'

In [56]:
# another model (to demonstrate how easy it is to go from one model to another thanks to langchain)
# Three completely free to use tools (langchain, python, and huggingface) powering thousands of business use cases and generating value from mere community grit. I find it fascinating.
summarizer_llm_id = "bigscience/T0_3B"

# might take a while to load, 3 billion parameters is a pretty big model
summarizer_llm = HuggingFaceHub(huggingfacehub_api_token=HF_key, repo_id=summarizer_llm_id,
                            model_kwargs={"temperature":0.5, "max_length":50})


paragraph = """
            Model Description
        T0* shows zero-shot task generalization on English natural language prompts, outperforming GPT-3 on many tasks, while being 16x smaller. It is a series of encoder-decoder models trained on a large set of different tasks specified in natural language prompts. We convert numerous English supervised datasets into prompts, each with multiple templates using varying formulations. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks specified in natural language. To obtain T0*, we fine-tune a pretrained language model on this multitask mixture covering many different NLP tasks.

        Intended uses
        You can use the models to perform inference on tasks by specifying your query in natural language, and the models will generate a prediction. For instance, you can ask "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy", and the model will hopefully generate "Positive".

        A few other examples that you can try:

        A is the son's of B's uncle. What is the family relationship between A and B?
        Question A: How is air traffic controlled?
        Question B: How do you become an air traffic controller?
        Pick one: these questions are duplicates or not duplicates.
        Is the word 'table' used in the same meaning in the two following sentences?
        """

summarizer_llm(f"Summarize the following paragraph {paragraph}. What is the most important information?")

KeyboardInterrupt: ignored