In [None]:
%%capture
!pip install langchain==0.1.1 openai==1.8.0 langchain-openai tiktoken faiss-cpu nltk

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter Your OpenAI API Key:")

# Example selectors

Example selectors in LangChain are classes that are responsible for selecting which examples to include in a prompt.

They are useful when you have a large number of examples available, but need to select a subset of them to include in your prompt.

### Some key things about example selectors:

They implement a `select_examples` method that takes in the input variables and returns a list of examples to include in the prompt.

#### There are different strategies for selecting examples, such as:

 - Selecting by semantic similarity to the inputs (`SemanticSimilarityExampleSelector`)

 - Selecting by maximal marginal relevance to balance similarity and diversity (`MaxMarginalRelevanceExampleSelector`)

 - Selecting based on prompt length (`LengthBasedExampleSelector`)

Example selectors allow prompts to choose examples dynamically based on the inputs, rather than relying on a fixed set of examples.

This helps manage prompt length and keeps the included examples relevant to the given inputs.

New example selectors can be implemented by subclassing `BaseExampleSelector` and defining a custom `select_examples` method.

## BaseExampleSelector

The base interface is defined as below.

The key method it needs to define is `select_examples`. This takes in the input variables and returns a list of examples.

It is up to each specific implementation as to how those examples are selected.

```python
class BaseExampleSelector(ABC):
    """Interface for selecting examples to include in prompts."""

    @abstractmethod
    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Select which examples to use based on the inputs."""
```


# Implementing a Custom Example Selector

Get excited, because you're about to create a custom example selector that randomly picks examples from a provided list.

## Requirements for an `ExampleSelector`:

An `ExampleSelector` in LangChain needs to implement two primary methods:

1. `add_example`: This method accepts an example and integrates it into the `ExampleSelector`.

2. `select_examples`: Given input variables (typically user input), this method returns a list of examples to be used in a few-shot prompt.

## Custom Implementation:

Here's how we can create a custom example selector that randomly selects examples:


In [None]:
from langchain.prompts.example_selector.base import BaseExampleSelector
from typing import Dict, List
import numpy as np

class CustomExampleSelector(BaseExampleSelector):
    def __init__(self, examples: List[Dict[str, str]]):
        self.examples = examples

    def add_example(self, example: Dict[str, str]) -> None:
        """Add a new example to the list."""
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Randomly select two examples; the input variables are ignored."""
        return list(np.random.choice(self.examples, size=2, replace=False))

In [None]:
examples = [
    {"recipe": "Spaghetti Carbonara"},
    {"recipe": "Chicken Alfredo"},
    {"recipe": "Vegetable Stir Fry"}
]

In [None]:
# initialize an example selector
example_selector = CustomExampleSelector(examples)

Note: The `{"recipe": "recipe"}` argument in the example is a placeholder, and it doesn't affect the outcome of the method.

In a more sophisticated example selector, the input variables might be used to filter or prioritize the selection of examples based on certain criteria.

But in the provided example, it's not utilized.
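To illustrate how a selector might actually use its input variables, here is a minimal sketch. The `KeywordExampleSelector` class and its `k` parameter are hypothetical and not part of LangChain; to keep the snippet self-contained it is a plain class, whereas in this notebook it would subclass `BaseExampleSelector` like the selector above.

```python
from typing import Dict, List


class KeywordExampleSelector:
    """Hypothetical selector that prefers examples sharing words with the input."""

    def __init__(self, examples: List[Dict[str, str]], k: int = 2):
        self.examples = list(examples)
        self.k = k  # number of examples to return (an assumed parameter)

    def add_example(self, example: Dict[str, str]) -> None:
        """Add a new example to the list."""
        self.examples.append(example)

    def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
        """Return the k examples that share the most words with the input."""
        # Collect the lowercase words that appear in any input value.
        query_words = set()
        for value in input_variables.values():
            query_words.update(value.lower().split())

        def shared_words(example: Dict[str, str]) -> int:
            example_words = set(" ".join(example.values()).lower().split())
            return len(query_words & example_words)

        # sorted() is stable, so ties keep their original order.
        ranked = sorted(self.examples, key=shared_words, reverse=True)
        return ranked[: self.k]


selector = KeywordExampleSelector(
    [
        {"recipe": "Spaghetti Carbonara"},
        {"recipe": "Chicken Alfredo"},
        {"recipe": "Vegetable Stir Fry"},
    ],
    k=1,
)
print(selector.select_examples({"recipe": "chicken soup"}))
# [{'recipe': 'Chicken Alfredo'}]
```

Here the input value decides which example is returned, unlike the random selector, where it is ignored.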



In [None]:
# randomly select some examples
example_selector.select_examples({"recipe": "recipe"})

In [None]:
# Add a new recipe to the collection
example_selector.add_example({"recipe": "Beef Stroganoff"})
print(example_selector.examples)

In [None]:
# Randomly select two recipes again
example_selector.select_examples({"recipe": "recipe"})

# Select by length

Ready for more? Below you will use a `LengthBasedExampleSelector`, which selects examples based on their length.

This is particularly useful when you're worried about the total length of the constructed prompt, especially given the context window limitations of some models.

In [None]:
from langchain.prompts import PromptTemplate
from langchain.prompts import FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

examples = [
    {"input": "caffeinated", "output": "sleepy"},
    {"input": "spooky", "output": "cuddly"},
    {"input": "crispy", "output": "soggy"},
    {"input": "galactic", "output": "mundane"},
    {"input": "funky", "output": "plain-jane"},
]

In [None]:
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

# LengthBasedExampleSelector

The `LengthBasedExampleSelector` is an example selector that selects examples based on the length of the formatted examples.

The primary goal of this selector is to adjust the number of examples included in the prompt based on the length of the input and the examples themselves.

By default, length is measured in words (the text is split on spaces and newlines), so `max_length` acts as a word budget rather than a token count.

## It works as follows:

When adding a new example using `add_example`, the example is formatted using the provided `example_prompt`, and its length is calculated using the `get_text_length` function.

This length is then stored in the `example_text_lengths` list.

When selecting examples using `select_examples`, the method first calculates the length of the provided input.

It then determines how much length is left for examples by subtracting the input length from the `max_length`.

The method then iteratively checks each example's length against the remaining length.

If the example fits, it's added to the list of selected examples, and its length is subtracted from the remaining length.

This process continues until the remaining length is exhausted or all examples have been considered.

This lets the selector include more examples for shorter inputs and fewer examples for longer inputs, to try to keep the overall prompt length under `max_length`.

In summary, it selects examples by length to try to construct prompts that don't exceed the context window size of the model.
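The selection loop described above can be sketched in plain Python. This is an illustrative approximation rather than the library's source: the helper names `word_count` and `select_by_length` are made up for this sketch, and the word-based length measure and the hard-coded example format are assumptions.

```python
import re
from typing import Dict, List


def word_count(text: str) -> int:
    """Count words by splitting on spaces and newlines (assumed length measure)."""
    return len(re.split(r"\n| ", text))


def select_by_length(
    examples: List[Dict[str, str]], user_input: str, max_length: int
) -> List[Dict[str, str]]:
    """Take examples in order until the remaining length budget runs out."""
    remaining = max_length - word_count(user_input)
    selected = []
    for example in examples:
        formatted = f"Input: {example['input']}\nOutput: {example['output']}"
        new_remaining = remaining - word_count(formatted)
        # Stop once the budget is used up or the next example does not fit.
        if remaining <= 0 or new_remaining < 0:
            break
        selected.append(example)
        remaining = new_remaining
    return selected


demo_examples = [
    {"input": "caffeinated", "output": "sleepy"},
    {"input": "spooky", "output": "cuddly"},
    {"input": "crispy", "output": "soggy"},
    {"input": "galactic", "output": "mundane"},
    {"input": "funky", "output": "plain-jane"},
]

short_result = select_by_length(demo_examples, "big", max_length=15)
long_result = select_by_length(
    demo_examples,
    "big and huge and massive and large and gigantic and tall",
    max_length=15,
)
print(len(short_result), len(long_result))  # 3 1
```

Each formatted example counts as four words here, so a one-word input leaves room for three examples, while an eleven-word input leaves room for only one.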

In [None]:
example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=15,
)

In [None]:
dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="What's the opposite of...",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

In [None]:
long_string = "fluffy and puffy and cloud-like and soft as a marshmallow and as cuddly as a teddy bear in a world of cotton candy"
print(dynamic_prompt.format(adjective=long_string))

# Selecting based on Maximal Marginal Relevance (MMR)

Maximal Marginal Relevance (MMR) is a technique often used in information retrieval to balance between the relevance of documents (or examples, in this case) and the diversity among them.  

The idea is to avoid redundancy in the selected set of documents while ensuring that the documents are still relevant to the query.

### Working:

1. **Relevance Calculation**:
   - For each example, calculate its cosine similarity with the input. This gives a measure of how relevant each example is to the input.

2. **Diversity Calculation**:
   - For each example, calculate its cosine similarity with the already selected examples. The higher this similarity, the less diversity the example would add to the selected set.

3. **MMR Score Calculation**:
   - For each example, compute its MMR score by rewarding relevance to the input and penalizing similarity to the already selected examples. The formula is typically: $$ \mathrm{MMR} = \lambda \times \mathrm{Sim}(\text{example}, \text{input}) - (1 - \lambda) \times \max_{s \in \text{selected}} \mathrm{Sim}(\text{example}, s) $$
   
   - Where `λ` is a parameter between 0 and 1 that controls the trade-off between relevance and diversity: `λ = 1` ranks purely by relevance, and smaller values favor diversity.

4. **Example Selection**:
   - Iteratively select the example with the highest MMR score, add it to the selected set, and update the diversity calculations for the remaining examples.

5. **Termination**:
   - Continue the process until a stopping criterion is met, such as a predefined number of examples or until the MMR score falls below a threshold.
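The five steps above can be sketched with plain Python lists standing in for embeddings. This is a minimal illustration under stated assumptions, not LangChain's implementation: the function names, the `lam` parameter, and the toy two-dimensional vectors are all made up for the example.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 2,
    lam: float = 0.5,
) -> List[int]:
    """Return the indices of k candidates chosen greedily by MMR score."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    def score(i: int) -> float:
        relevance = cosine(query, candidates[i])
        # Penalty: highest similarity to anything already selected.
        redundancy = max(
            (cosine(candidates[i], candidates[j]) for j in selected),
            default=0.0,
        )
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.1],   # very relevant
    [1.0, 0.12],  # very relevant, but nearly a duplicate of the first
    [0.6, 0.8],   # less relevant, but different
]

print(mmr_select(query, candidates, k=2, lam=1.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, lam=0.3))  # [0, 2]
```

With `lam=1.0` the selection is purely by relevance and picks the two near-duplicates; with `lam=0.3` the redundancy penalty pushes the second pick to the more distinct candidate.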


In [None]:
from langchain.prompts.example_selector import (
    MaxMarginalRelevanceExampleSelector,
    SemanticSimilarityExampleSelector,
)

from langchain_openai import OpenAIEmbeddings

from langchain_community.vectorstores import FAISS

from langchain.prompts import FewShotPromptTemplate, PromptTemplate

In [None]:
examples

In [None]:
example_prompt

## MaxMarginalRelevanceExampleSelector Overview

`MaxMarginalRelevanceExampleSelector` is a specialized class designed to select examples based on the Maximal Marginal Relevance (MMR) criterion. MMR is a method that balances the trade-off between:

- **Relevance**: How similar an example is to a given query.
- **Diversity**: How different the selected examples are from each other.

This approach provides a comprehensive set of examples that are both relevant to a query and diverse.

### Key Features

1. **Derived from Semantic Similarity**:
   - Inherits from `SemanticSimilarityExampleSelector`.
   - Utilizes basic functionality of selecting examples based on their semantic similarity to a given query.

2. **Attributes**:
   - `fetch_k`: Number of examples to initially fetch before reranking them using MMR.

3. **Methods**:

   - **`select_examples(input_variables: Dict[str, str]) -> List[dict]`**:
     - Constructs a query from the provided input variables.
     - Fetches top `fetch_k` examples and reranks them using MMR to select the top `k` examples.
     - Extracts examples from metadata and filters based on `example_keys` if provided.

   - **`from_examples(...) -> MaxMarginalRelevanceExampleSelector`**:
     - Class method to create an instance.
     - Initializes using a list of examples, embeddings, and other parameters.
     - Sets up the vector store with provided examples and embeddings.




In [None]:
mmr_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),
    FAISS,
    k=2
)

In [None]:
mmr_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=mmr_selector,
    example_prompt=example_prompt,
    prefix="What's the opposite of...",
    suffix="Input: {adjective}\nOutput:",
    input_variables=["adjective"],
)

In [None]:
print(mmr_prompt.format(adjective="buzzed"))

# Select by n-gram overlap

The `NGramOverlapExampleSelector` selects and orders examples by how similar they are to the input, according to an n-gram overlap score.


The n-gram overlap score is a float between 0.0 and 1.0, inclusive.

This overlap score is derived from the `sentence_bleu` score, a metric used in machine translation to evaluate the quality of translated sentences.


The selector provides the option to set a threshold score. Examples with an n-gram overlap score less than or equal to this threshold are excluded. By default, the threshold is -1.0, which means no examples are excluded; they are only reordered.

If the threshold is greater than 1.0, `select_examples` excludes all examples and returns an empty list.

When the threshold is set to 0.0, `select_examples` sorts the examples by their n-gram overlap score and excludes those that have no n-gram overlap with the input.



## `NGramOverlapExampleSelector` Overview



### Key Components

1. **N-gram Overlap Score**:
   - The `ngram_overlap_score` function calculates the n-gram overlap score between a source and an example using the `sentence_bleu` score.
   - This function employs the `sentence_bleu` method with method1 smoothing and auto reweighting. The resulting score ranges between 0.0 and 1.0.

2. **Attributes**:
   - `examples`: Contains the list of examples the prompt template expects.
   - `example_prompt`: The template used to format the examples.
   - `threshold`: The cutoff score; examples scoring at or below it are excluded. It's set to -1.0 by default.

3. **Methods**:

   - **`add_example(example: Dict[str, str]) -> None`**:
     - Adds a new example to the list.

   - **`select_examples(input_variables: Dict[str, str]) -> List[dict]`**:
     - Returns examples sorted by their n-gram overlap score with the input in descending order.
     - Excludes examples with scores less than or equal to the threshold.

### How It Works

When you pass a set of input variables to `select_examples`, the selector computes the n-gram overlap score of each example in its list and arranges the examples by score in descending order. It then keeps selecting examples until it either exhausts the list or reaches an example whose score is less than or equal to the threshold.

For example, with a threshold of 0.0, the selector will omit examples with no n-gram overlap with the input. If the threshold exceeds 1.0, it will exclude all examples.

This class is handy when you want examples that share common n-grams with the input.
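To make the threshold behavior concrete, here is a small sketch that replaces the BLEU-based score with a much simpler stand-in: the fraction of the input's words that also appear in the example. The functions `ngrams`, `overlap_score`, and `select_by_overlap` are hypothetical helpers written for this illustration, not part of LangChain or NLTK.

```python
from typing import Dict, List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def overlap_score(source: str, example: str, n: int = 1) -> float:
    """Fraction of the source's n-grams found in the example (0.0 to 1.0)."""
    source_grams = ngrams(source, n)
    if not source_grams:
        return 0.0
    return len(source_grams & ngrams(example, n)) / len(source_grams)


def select_by_overlap(
    examples: List[Dict[str, str]], sentence: str, threshold: float = -1.0
) -> List[Dict[str, str]]:
    """Sort examples by overlap score (descending) and drop those at or below the threshold."""
    scored = [(overlap_score(sentence, ex["input"]), ex) for ex in examples]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for score, ex in scored if score > threshold]


demo_examples = [
    {"input": "no cap", "output": "In sooth, no falsehood"},
    {"input": "that is some tea", "output": "That's a tale most intriguing."},
    {"input": "spill the tea", "output": "Unveil the tale"},
]
sentence = "please spill the tea"

for threshold in (-1.0, 0.0, 1.5):
    picked = select_by_overlap(demo_examples, sentence, threshold)
    print(threshold, [ex["input"] for ex in picked])
# -1.0 ['spill the tea', 'that is some tea', 'no cap']
# 0.0 ['spill the tea', 'that is some tea']
# 1.5 []
```

The three thresholds reproduce the cases described above: reorder only, drop examples with no overlap, and drop everything.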


In [None]:
from langchain.prompts.example_selector.ngram_overlap import NGramOverlapExampleSelector
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

In [None]:
# Examples of converting modern slang to Shakespearean English.
examples = [
    {"input": "She's thirsty, ain't she?", "output": "She doth crave attention, doth she not?"},
    {"input": "He's flexing those dollar bills.", "output": "He doth flaunt his gold coins, verily."},
    {"input": "That's some tea.", "output": "That's a tale most intriguing."},
    {"input": "It's lit!", "output": "It's a merry revel!"},
    {"input": "Throwing shade", "output": "Casting aspersions"},
    {"input": "No cap", "output": "In sooth, no falsehood"},
    {"input": "Slide into the DMs", "output": "Venture into private missives"},
    {"input": "Ghosted", "output": "Vanish'd like a spectre"},
    {"input": "On fleek", "output": "In finest fettle"},
    {"input": "Bae", "output": "Mine own beloved"},
    {"input": "Squad goals", "output": "Band's aspirations"},
    {"input": "YOLO", "output": "Thou liv'st but once"},
    {"input": "FOMO", "output": "Fear of missing the revelry"},
    {"input": "Slay, queen!", "output": "Triumph, fair maiden!"},
    {"input": "I can't even", "output": "I am most perplexed"},
    {"input": "It's a vibe", "output": "It's a certain jest and merriment"},
    {"input": "Clap back", "output": "Retort with vigor"},
    {"input": "Low key", "output": "In hushed tones"},
    {"input": "High key", "output": "Loudly and proudly"},
    {"input": "Spill the tea", "output": "Unveil the tale"},
    {"input": "That's basic", "output": "That's most ordinary"},
    {"input": "Savage", "output": "Ruthless, like a wild beast"},
    {"input": "Mood", "output": "Mine current disposition"},
    {"input": "Woke", "output": "Awakened to the truths"},
    {"input": "Cancel culture", "output": "Banishment by the masses"},
    {"input": "Netflix and chill", "output": "Watch plays and relax, mayhaps more"},
    {"input": "Snack", "output": "A sight most pleasing"},
    {"input": "Thicc", "output": "Full and robust"},
    {"input": "Shook", "output": "Most startled and taken aback"},
    {"input": "AF", "output": "In great measure"}
]

In [None]:
examples = [
    {'input': "When someone says 'She's thirsty, ain't she?', they're implying she's seeking attention.",
     'output': 'When one remarks, "She doth crave attention, doth she not?", they suggest her desire for notice.'},

    {'input': "Saying 'He's flexing those dollar bills.' means he's showing off his wealth.",
     'output': 'To utter, "He doth flaunt his gold coins, verily." is to say he parades his riches.'},

    {'input': "The phrase 'That's some tea.' refers to juicy gossip or interesting news.",
     'output': "The saying, 'That's a tale most intriguing.' speaks of a story that piques interest."},

    {'input': "Exclaiming 'It's lit!' means the situation is exciting or fun.",
     'output': "Declaring 'It's a merry revel!' signifies a joyous occasion."},
    {'input': "When someone uses 'Throwing shade', they're subtly expressing disapproval or contempt.",
     'output': 'When one says "Casting aspersions", they art discreetly showing disdain or scorn.'},

    {'input': "The term 'No cap' is used to emphasize that someone is not lying.",
     'output': 'The phrase "In sooth, no falsehood" is uttered to stress that one speaks the truth.'},

    {'input': "To 'Slide into the DMs' means to send someone a direct message, usually with romantic intent.",
     'output': 'To "Venture into private missives" is to send a personal letter, perchance with courtly designs.'},

    {'input': "If someone has 'Ghosted', they've suddenly cut off all communication without explanation.",
     'output': "If one hath 'Vanish'd like a spectre', they've abruptly ceased all discourse without reason."},

    {'input': "The phrase 'On fleek' means that something is perfect or flawless.",
     'output': "The saying 'In finest fettle' signifies that something is in impeccable condition."}
]


In [None]:
example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Input: {input}\nOutput: {output}",
)

In [None]:
example_selector = NGramOverlapExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    threshold=-1.0
)

dynamic_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Translate yon modern utterance into the tongue of the Bard",
    suffix="Modern: {sentence}\nShakespearean:",
    input_variables=["sentence"]
)


In [None]:
print(dynamic_prompt.format(sentence="Someone in a meeting mentioned they didn't want to be part of 'cancel culture'. What were they referring to?"))

In [None]:
example_selector.threshold = 0.0
print(dynamic_prompt.format(sentence="The maiden ghosted me post our rendezvous."))

## `SemanticSimilarityExampleSelector` Overview

This class is designed to select examples based on their semantic similarity to a given input. It's a type of example selector that leverages embeddings and a vector store to find the most semantically similar examples.

### Attributes:

1. **vectorstore**: An instance of `VectorStore` that contains information about the examples. This is where the embeddings of the examples are stored and queried.
2. **k**: The number of examples to select. By default, it's set to 4.
3. **example_keys**: Optional keys to filter the examples. If provided, only these keys will be considered when selecting examples.
4. **input_keys**: Optional keys to filter the input. If provided, the search is based on these input variables instead of considering all variables.

### Methods:

1. **`add_example(example: Dict[str, str]) -> str`**:
   - Adds a new example to the `vectorstore`.
   - If `input_keys` are provided, it constructs a string representation of the example using only those keys. Otherwise, it uses all keys.
   - The constructed string is then added to the `vectorstore`, and the ID of the added text is returned.

2. **`select_examples(input_variables: Dict[str, str]) -> List[dict]`**:
   - Selects examples based on their semantic similarity to the provided input variables.
   - Constructs a query string from the input variables.
   - Uses the `vectorstore` to search for the most similar examples to the query.
   - Retrieves the actual examples from the metadata of the search results.
   - If `example_keys` are provided, the returned examples are filtered to include only those keys.

3. **`from_examples(...)`**:
   - A class method that creates an instance of `SemanticSimilarityExampleSelector`.
   - Initializes the `vectorstore` with the provided examples and embeddings.
   - Returns an instance of `SemanticSimilarityExampleSelector` with the initialized `vectorstore`.

### How It Works:

- The class leverages embeddings to represent the semantic meaning of examples and input variables.
- When you want to select examples that are semantically similar to a given input, you provide the input variables to the `select_examples` method.
- The method then queries the `vectorstore` to find the most similar examples based on their embeddings.
- The returned examples can be filtered based on `example_keys` if provided.
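The flow above can be sketched without any external services. In this illustration, a bag-of-words count vector stands in for a real embedding model and a small in-memory class stands in for the vector store; `toy_embed`, `ToyVectorStore`, and `select_similar` are hypothetical names, and joining the values in key order to build the query is an assumption of this sketch.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Minimal in-memory store: keeps (embedding, example) pairs."""

    def __init__(self) -> None:
        self.entries: List[Tuple[Counter, Dict[str, str]]] = []

    def add_example(self, example: Dict[str, str]) -> None:
        # Embed the example's values joined in key order; keep the dict as metadata.
        text = " ".join(example[key] for key in sorted(example))
        self.entries.append((toy_embed(text), example))

    def similarity_search(self, query: str, k: int) -> List[Dict[str, str]]:
        q = toy_embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [example for _, example in ranked[:k]]


def select_similar(
    store: ToyVectorStore, input_variables: Dict[str, str], k: int = 4
) -> List[Dict[str, str]]:
    """Build a query string from the inputs and return the k closest examples."""
    query = " ".join(input_variables[key] for key in sorted(input_variables))
    return store.similarity_search(query, k)


store = ToyVectorStore()
for example in [
    {"input": "happy", "output": "sad"},
    {"input": "tall", "output": "short"},
    {"input": "sunny day", "output": "rainy day"},
]:
    store.add_example(example)

print(select_similar(store, {"adjective": "sunny"}, k=1))
# [{'input': 'sunny day', 'output': 'rainy day'}]
```

A real embedding model would also match inputs that share meaning but no words, which this word-count stand-in cannot do.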


In [None]:
examples

In [None]:
example_prompt

In [None]:
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),
    FAISS,
    k=4
)
similar_prompt = FewShotPromptTemplate(
    # We provide an ExampleSelector instead of examples.
    example_selector=example_selector,
    example_prompt=example_prompt,
    prefix="Translate yon modern utterance into the tongue of the Bard",
    suffix="Modern: {sentence}\nShakespearean:",
    input_variables=["sentence"]
)

In [None]:
print(similar_prompt.format(sentence="What does it mean when someone says they're 'Throwing shade' at another?"))

In [None]:
print(similar_prompt.format(sentence="Someone told me my style was reminiscent of a 'snack'. What could they be hinting at?"))

In [None]:
print(similar_prompt.format(sentence="I overheard someone mention they felt 'shook' after watching a movie. What emotion were they expressing?"))