In [None]:
%%capture
!pip install langchain==0.1.1 openai==1.8.0 langchain-openai datasets faiss-gpu sentence_transformers

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter Your OpenAI API Key:")

## Self-Consistency Prompting in Language Models

Self-consistency prompting was introduced in the March 2022 paper ["Self-Consistency Improves Chain of Thought Reasoning in Language Models"](https://arxiv.org/pdf/2203.11171.pdf) by Xuezhi Wang et al.

### **What is Self-Consistency Prompting?**
The idea behind self-consistency is that complex reasoning problems often have multiple valid ways of thinking that lead to the same correct answer.

By exploring these diverse reasoning paths, one can achieve a more reliable and consistent answer.

### **How is it Different from Other Prompting Techniques?**

- Traditional chain-of-thought prompting asks a language model to generate a series of short sentences that mimic the reasoning process a person might use to solve a task.

- Instead of taking only the greedy (most probable) reasoning path, self-consistency samples multiple paths and selects the answer that most of them agree on.

- Unlike other methods that require additional training, human annotations, or auxiliary models, self-consistency is unsupervised and works directly with pre-trained language models.

### **How to Construct a Prompt Using Self-Consistency?**

1. **Chain-of-Thought Prompting**: Start by prompting the language model using chain-of-thought prompting.

2. **Sampling Diverse Reasoning Paths**: Instead of greedily decoding the optimal reasoning path, use a "sample-and-marginalize" decoding procedure. This involves sampling from the language model's decoder to generate a diverse set of reasoning paths.

3. **Marginalizing Reasoning Paths**: Each reasoning path might lead to a different final answer. Determine the optimal answer by marginalizing out the sampled reasoning paths to find the most consistent answer in the final answer set.
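
The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's implementation: `CANNED_PATHS`, `extract_answer`, and `self_consistency` are hypothetical names, the canned completions stand in for temperature-sampled LLM calls, and the aggregation is a plain majority vote in which every sampled path counts equally.

```python
import random
import re
from collections import Counter

# Made-up completions standing in for sampled chain-of-thought outputs.
CANNED_PATHS = [
    "There are 3 cars already. 2 more arrive. 3 + 2 = 5. The answer is 5.",
    "Start with 3 cars, add 2 cars, giving 5 cars. The answer is 5.",
    "3 cars plus 2 cars, but one leaves, so 4. The answer is 4.",
]

def extract_answer(path):
    """Pull the final answer out of a trailing 'The answer is X.' sentence."""
    match = re.search(r"The answer is (\S+?)\.?$", path.strip())
    return match.group(1) if match else None

def self_consistency(sample_fn, n_samples):
    """Sample n reasoning paths, then majority-vote over their final answers."""
    paths = [sample_fn() for _ in range(n_samples)]           # step 2: sample
    answers = [a for a in map(extract_answer, paths) if a is not None]
    votes = Counter(answers)                                   # step 3: aggregate
    if not votes:
        return None, votes
    return votes.most_common(1)[0][0], votes

# Assumption: a seeded random choice plays the role of the sampled LLM call.
rng = random.Random(0)
answer, votes = self_consistency(lambda: rng.choice(CANNED_PATHS), n_samples=15)
print(answer, dict(votes))
```

Passing the sampler in as a function keeps the voting logic independent of any particular LLM client.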

### **Examples**:
- **Chain-of-Thought**: For the question "If there are 3 cars in the parking lot and 2 more cars arrive, how many cars are in the parking lot?", instead of directly answering "5", the model might respond with the reasoning: "There are 3 cars in the parking lot already. 2 more arrive. Now there are 3 + 2 = 5 cars. The answer is 5."

- **Self-Consistency**: For a question about how much Janet makes from selling eggs, the model might generate multiple reasoning paths like:
  1. "She has 16 - 3 - 4 = 9 eggs left. So she makes $2 * 9 = $18 per day."

  2. "This means she sells the remainder for $2 * (16 - 4 - 3) = $26 per day."
  
  3. "She eats 3 for breakfast, so she has 16 - 3 = 13 left. Then she bakes muffins, so she has 13 - 4 = 9 eggs left. So she has 9 eggs * $2 = $18."

  The final answers are then aggregated by majority vote. Paths 1 and 3 agree on $18, while path 2 makes an arithmetic error ($2 * 9 is $18, not $26), so the most consistent answer is "$18 per day."
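
The arithmetic in this example can be checked directly. The short sketch below recomputes the result from the numbers quoted in the paths and tallies the three final answers. The variable names are illustrative and not from the paper.

```python
from collections import Counter

# Numbers quoted in the reasoning paths above.
eggs_laid, eaten, baked, price = 16, 3, 4, 2

# Recompute the daily income: 2 * (16 - 3 - 4) = 2 * 9 = 18.
correct = price * (eggs_laid - eaten - baked)

# Final answers stated by the three sampled paths.
path_answers = [18, 26, 18]

votes = Counter(path_answers)
majority, count = votes.most_common(1)[0]

print(f"recomputed: ${correct}")                                         # recomputed: $18
print(f"majority vote: ${majority} ({count} of {len(path_answers)})")    # majority vote: $18 (2 of 3)
print("path 2 matches recomputation:", path_answers[1] == correct)       # False
```

The vote recovers the right answer even though one of the sampled paths is wrong, which is the point of the technique.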

Self-consistency prompting improves the reasoning capabilities of language models by exploring diverse reasoning paths and selecting the most consistent answer.

This method has shown significant performance boosts on various arithmetic and commonsense reasoning benchmarks.

# Begin by setting up CoT prompts

We're following the same pattern from the Chain of Thought lesson.

1. Downloading CoT prompt datasets from HuggingFace

2. Downloading embeddings model from HuggingFace

3. Creating prompt template for CoT

4. Creating an example selector

5. Constructing the prompt

In [None]:
%%capture
from datasets import load_dataset

dataset = load_dataset("kaist-ai/CoT-Collection", split="train", trust_remote_code=True)

dataset = dataset.remove_columns(['task', 'type'])

dataset = dataset.shuffle(seed=42)

dataset = dataset.select(range(10_000))

dataset = dataset.to_pandas()

selected_examples = dataset.to_dict(orient='records')

In [None]:
%%capture
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-base-en-v1.5"

encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs={'device': 'cuda'},
    encode_kwargs=encode_kwargs
)

In [None]:
from langchain.prompts.example_selector import MaxMarginalRelevanceExampleSelector
from langchain_community.vectorstores import FAISS
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

prefix = "Consider the following as examples of how to reason:"

examples_template = """Query: {source}

Rationale: {rationale}

Response: {target}
"""

suffix = """Using a similar reasoning approach, answer the user's question, which is delimited by triple backticks.

User question: ```{input}```

Take a deep breath, break down the user's query step-by-step, and provide a clear chain of thought in your response.
"""

examples_prompt = PromptTemplate(
    input_variables=["source", "rationale", "target"],
    template=examples_template
)

In [None]:
example_selector = MaxMarginalRelevanceExampleSelector.from_examples(
    selected_examples,
    embeddings,
    FAISS,
    k=5,
)

mmr_prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=examples_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input"]
)

In [None]:
query = """There were nine computers in the server room. Five more computers were installed each day, from
Monday to Thursday. How many computers are now in the server room?
"""
prompt = mmr_prompt.format(input=query)

In [None]:
print(prompt)

# Self-Consistency Prompt

In [None]:
sc_template = """Based on the responses (delimited by < >) to the following query \
(delimited by triple backticks), return the response that occurs most frequently.

Query: ```{query}```

Responses: <{responses}>
"""

sc_prompt = PromptTemplate.from_template(sc_template)

### Generating multiple responses


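Set the `n` parameter to request several completions for the same prompt in a single call. A larger `n` samples more reasoning paths, at the cost of more tokens. The completions only differ if decoding is not greedy, so the sampling temperature must be above 0.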



In [None]:
from langchain_openai import OpenAI

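# We'll use the default completion model here, gpt-3.5-turbo-instruct.
# n=5 requests five sampled completions; the temperature is set explicitly
# so that the reasoning paths can differ.
llm = OpenAI(n=5, temperature=0.7)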

generations = llm.generate([prompt])

In [None]:
generations.generations

In [None]:
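# Keep only the text after the "Response: " marker in each completion;
# completions that lack the marker are skipped.
marker = "Response: "
responses = []
for item in generations.generations[0]:
    response_index = item.text.find(marker)
    if response_index != -1:
        response = item.text[response_index + len(marker):].strip()
        responses.append(response)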

In [None]:
responses

In [None]:
# A single completion picks the winner; temperature=0 keeps this step repeatable.
llm = OpenAI(temperature=0)

final_prompt = sc_prompt.format(query=query, responses=str(responses))

print(final_prompt)

In [None]:
print(llm.invoke(final_prompt))
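
Asking the model to pick the most frequent response is itself a generation step, and it can miscount. A deterministic alternative is to tally the votes locally. The sketch below assumes each extracted response ends with its numeric answer. `normalize_answer` and `majority_answer` are hypothetical helpers, and `sample_responses` is made-up data so the block runs without an API call. In the notebook you would pass `responses` instead.

```python
import re
from collections import Counter

def normalize_answer(text):
    """Reduce a free-text response to its last number, if it contains one."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    if numbers:
        return numbers[-1]
    # No number found: fall back to the lowercased text itself.
    return text.strip().lower()

def majority_answer(responses):
    """Return (answer, vote_share) for the most frequent normalized answer."""
    if not responses:
        return None, 0.0
    votes = Counter(normalize_answer(r) for r in responses)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(responses)

# Made-up responses for the server-room question (9 + 5 * 4 = 29).
sample_responses = [
    "29",
    "29 computers",
    "There are 29 computers.",
    "25",
    "The answer is 29",
]

answer, share = majority_answer(sample_responses)
print(answer, share)  # 29 0.8
```

The vote share doubles as a rough confidence signal: a low share means the sampled paths disagree. Note that this simple normalization treats "29" and "29.0" as different answers.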