In [1]:
import time

import kscope

### Connecting to the Service
First, we connect to the Kaleidoscope service, through which we'll interact with the LLMs, and see which models are available to us.

In [2]:
# Establish a client connection to the kscope service
client = kscope.Client(gateway_host="llm.cluster.local", gateway_port=3001)

Show all model instances that are currently active.

In [3]:
client.model_instances

[{'id': 'c882a882-8d2a-4484-9924-69ef4ba764bb',
  'name': 'falcon-7b',
  'state': 'ACTIVE'},
 {'id': '4f86be56-b90c-4094-a965-4cd73d01a825',
  'name': 'llama2-7b',
  'state': 'ACTIVE'},
 {'id': 'ecd1fdd5-9c6a-40e2-a4d4-ac1a75661bf8',
  'name': 'falcon-40b',
  'state': 'ACTIVE'}]

To start, we obtain a handle to a model. In this example, let's use the Falcon-40B model.

In [4]:
model = client.load_model("falcon-40b")
# If this model is not actively running, it will get launched in the background.
# In this case, wait until it moves into an "ACTIVE" state before proceeding.
while model.state != "ACTIVE":
    time.sleep(1)
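
The polling loop above waits indefinitely if the model never becomes active. As a minimal sketch, the same check can be bounded with a timeout. The helper name `wait_until_active` and its `timeout_s` and `poll_s` parameters are hypothetical and not part of kscope; the only thing assumed about the model handle is the `state` attribute used above.

```python
import time


def wait_until_active(model, timeout_s=300.0, poll_s=1.0):
    """Poll model.state until it equals "ACTIVE" or the timeout expires.

    Hypothetical helper: it relies only on the `state` attribute of the handle.
    """
    deadline = time.monotonic() + timeout_s
    while model.state != "ACTIVE":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model still in state {model.state!r} after {timeout_s} s")
        time.sleep(poll_s)
    return model
```

With the handle obtained above, this could be called as `wait_until_active(model)` in place of the bare `while` loop.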

In [5]:
small_generation_config = {"max_tokens": 20, "top_k": 1, "top_p": 1.0, "temperature": 0.8}
moderate_generation_config = {"max_tokens": 100, "top_k": 1, "top_p": 1.0, "temperature": 0.8}

Let's ask the model a simple question to start.

In [6]:
generation = model.generate("What is the capital of Canada?", small_generation_config)
# Extract the text from the returned generation
print(generation.generation["sequences"][0])


Ottawa
What is the capital of Canada?
Ottawa
What is the capital of


# Few-Shot Chain of Thought Prompting

We'll start by prompting Falcon-40B to solve some word problems and build up to using the Few-Shot CoT method proposed in ["Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"](https://arxiv.org/pdf/2201.11903.pdf).

First, let's see what happens if we try to solve some word problems with a zero-shot prompt.

In [7]:
zero_shot_prompt = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? "
)

print(zero_shot_prompt)

The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? 


In [8]:
generation_example = model.generate(zero_shot_prompt, generation_config=small_generation_config)
print(generation_example.generation["sequences"][0])

(Answer: 29)
The cafeteria had 23 apples. If they used 20 to


The correct answer to this word problem is 9. Zero-shot prompting didn't produce the correct answer.

Now let's try performing a standard few-shot prompt to see if that helps the model provide the correct answer in a format that we can extract.

In [9]:
few_shot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis "
    "balls does he have now?\nA: The answer is 11.\n\nQ: Benjamin is taking bottle inventory. He has two cases with "
    "15 bottles in each and one with 7. How many bottles are there in total?\nA: The answer is 37.\n\nQ: The "
    "cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?\nA: "
    "The answer is "
)
print(few_shot_prompt)

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: The answer is 11.

Q: Benjamin is taking bottle inventory. He has two cases with 15 bottles in each and one with 7. How many bottles are there in total?
A: The answer is 37.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A: The answer is 


In [10]:
generation_example = model.generate(few_shot_prompt, generation_config=small_generation_config)
print(generation_example.generation["sequences"][0])

29.

Q: A man has $1.50 in his pocket. He buys


Because the prompt now carries additional context, generation takes a bit longer. Moreover, the model still answers 29 rather than the correct 9. On the bright side, the answer now appears at the very start of the completion, where it is easy to extract.
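
As a rough sketch of what "extractable" means here: since the prompt ends with "The answer is ", the first number in the completion is the model's answer. The helper `extract_first_number` is a hypothetical name introduced for illustration, and the sample string is copied from the output above.

```python
import re

# First number in the text, with an optional sign and optional decimal part.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_first_number(completion):
    """Return the first number found in the completion as a string, or None."""
    match = _NUMBER.search(completion)
    return match.group(0) if match else None


completion = "29.\n\nQ: A man has $1.50 in his pocket. He buys"
print(extract_first_number(completion))  # prints 29
```

Taking only the first match ignores the unrelated question the model starts generating afterwards.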

Now, let's try prompting the model with a few-shot CoT prompt, where we provide an example of the kind of reasoning required to answer the question.

In [11]:
few_shot_cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis "
    "balls does he have now?\nA: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
    "5 + 6 = 11. The answer is 11.\n\nQ: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 "
    "more, how many apples do they have?\nA:"
)
print(few_shot_cot_prompt)

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A:


Note that we switch to the moderate_generation_config to allow the model to generate additional logic. This takes more time for generation.

In [12]:
generation_example = model.generate(few_shot_cot_prompt, generation_config=moderate_generation_config)
print(generation_example.generation["sequences"][0])

 23 - 20 = 3. 3 + 6 = 9. The answer is 9.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A: 23 - 20 = 3. 3 + 6 = 9. The answer is 9.

Q: The cafeteria had 23


While the model doesn't provide quite as much commentary as our worked example did, it does produce arithmetic that leads to the right answer for the first time.
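
The completion above keeps going after the first answer and starts repeating the question. One possible way to handle this, sketched under the assumption that every new block begins with a blank line followed by "Q:", is to cut the text there and then look for the "The answer is ..." phrase. The helper `first_answer` is hypothetical; it handles integers and lettered choices, not decimal answers.

```python
import re

# Captures the token after "The answer is", with or without parentheses.
_ANSWER = re.compile(r"The answer is\s*\(?([^\s().]+)\)?")


def first_answer(completion):
    """Return the answer stated before the model starts a new "Q:" block."""
    first_block = completion.split("\n\nQ:", 1)[0]
    match = _ANSWER.search(first_block)
    return match.group(1) if match else None


completion = (
    " 23 - 20 = 3. 3 + 6 = 9. The answer is 9.\n\n"
    "Q: The cafeteria had 23 apples. If they used 20 to make lunch"
)
print(first_answer(completion))  # prints 9
```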

Let's compare few-shot prompting with few-shot CoT on a slightly different kind of problem. This example is drawn from the AQuA: Algebraic Word Problems task.

In [13]:
few_shot_prompt = (
    "Q: John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of "
    "the numbers is? Answer Choices: (a) 50 (b) 45 (c) 65 (d) 78 (e) 64\nA: The answer is (a).\n\nQ: The capacity of "
    "a tank of dimensions (8 m × 6 m × 2.5 m) is Answer Choices: (a) 120 litres (b) 1200 litres (c) 12000 litres (d) "
    "120000 litres (e) None of these\nA:"
)
print(few_shot_prompt)

Q: John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of the numbers is? Answer Choices: (a) 50 (b) 45 (c) 65 (d) 78 (e) 64
A: The answer is (a).

Q: The capacity of a tank of dimensions (8 m × 6 m × 2.5 m) is Answer Choices: (a) 120 litres (b) 1200 litres (c) 12000 litres (d) 120000 litres (e) None of these
A:


In [14]:
generation_example = model.generate(few_shot_prompt, generation_config=small_generation_config)
print(generation_example.generation["sequences"][0])

 The answer is (b).

Q: The average of 15 numbers is 40


The correct choice for this problem is (d). While the model successfully selects an answer choice in a way we can parse, it selects the wrong answer.
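
To see why (d) is correct, the arithmetic can be checked directly: the volume in cubic metres is the product of the three dimensions, and one cubic metre holds 1000 litres. The variable names below are illustrative only.

```python
# Tank dimensions in metres, taken from the question.
length_m, width_m, height_m = 8, 6, 2.5

volume_m3 = length_m * width_m * height_m  # 8 * 6 * 2.5 = 120 cubic metres
capacity_litres = volume_m3 * 1000         # 1 cubic metre = 1000 litres

# Numeric answer choices from the question; (e) is "None of these".
choices = {"a": 120, "b": 1200, "c": 12000, "d": 120000}
correct = next((k for k, v in choices.items() if v == capacity_litres), "e")

print(volume_m3, capacity_litres, correct)  # prints 120.0 120000.0 d
```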

Let's see if we can extract the correct answer with few-shot CoT.

In [15]:
few_shot_cot_prompt = (
    "Q: John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of the numbers "
    "is? Answer Choices: (a) 50 (b) 45 (c) 65 (d) 78 (e) 64\nA: If 10 is added to each number, then the mean of the "
    "numbers also increases by 10. So the new mean would be 50. The answer is (a).\n\nQ: The capacity of "
    "a tank of dimensions (8 m × 6 m × 2.5 m) is Answer Choices: (a) 120 litres (b) 1200 litres (c) 12000 litres (d) "
    "120000 litres (e) None of these \nA:"
)
print(few_shot_cot_prompt)

Q: John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of the numbers is? Answer Choices: (a) 50 (b) 45 (c) 65 (d) 78 (e) 64
A: If 10 is added to each number, then the mean of the numbers also increases by 10. So the new mean would be 50. The answer is (a).

Q: The capacity of a tank of dimensions (8 m × 6 m × 2.5 m) is Answer Choices: (a) 120 litres (b) 1200 litres (c) 12000 litres (d) 120000 litres (e) None of these 
A:


In [16]:
generation_example = model.generate(few_shot_cot_prompt, generation_config=moderate_generation_config)
print(generation_example.generation["sequences"][0])

 The capacity of a tank of dimensions (8 m × 6 m × 2.5 m) is 120000 litres. The answer is (d).

Q: The average of 15 numbers is 40. If 10 is added to each number then the mean of the numbers is? Answer Choices: (a) 50 (b) 45 (c) 65 (d) 78 (e) 64 



The model doesn't fully explain its logic, but it does state the capacity as 120000 litres and then selects the answer choice consistent with that statement.

# Zero-Shot Chain of Thought Prompting

It can be tedious and tricky to form useful and effective reasoning examples. Some research has shown that the choice of reasoning examples in CoT prompting can have a large impact on how well the model accomplishes the downstream task. So let's try a zero-shot CoT approach devised in ["Large Language Models are Zero-Shot Reasoners"](https://arxiv.org/pdf/2205.11916.pdf).

In [17]:
few_shot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis "
    "balls does he have now?\nA: The answer is 11.\n\nQ: There are 64 students trying out for the school's trivia "
    "teams. If 36 of them didn't get picked for the team and the rest were put into 4 groups, how many students would "
    "be in each group?\nA:"
)
print(few_shot_prompt)

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: The answer is 11.

Q: There are 64 students trying out for the school's trivia teams. If 36 of them didn't get picked for the team and the rest were put into 4 groups, how many students would be in each group?
A:


In [18]:
generation_example = model.generate(few_shot_prompt, generation_config=small_generation_config)
print(generation_example.generation["sequences"][0])

 The answer is 16.

Q: There are 3 people in a room. One


The correct answer to this problem is 7.

Perhaps we can extract the correct answer with zero-shot CoT, which is split into two stages:
1) Reasoning Generation
2) Answer Extraction
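
The two stages can be sketched as one function. This is an illustration under assumptions, not part of kscope: `zero_shot_cot` is a hypothetical helper, and `generate` is assumed to be any callable that maps a prompt string to the completion text. The trigger and extraction phrases match the ones used in this notebook.

```python
def zero_shot_cot(
    question,
    generate,
    trigger="Let’s think step by step.",
    extraction_cue="Therefore, the answer is",
):
    """Run reasoning generation, then answer extraction.

    `generate` is a hypothetical callable: prompt string in, completion text out.
    Returns the reasoning text and the raw answer completion.
    """
    # Stage 1: ask for the reasoning by ending the prompt with the trigger phrase.
    reasoning_prompt = f"Q: {question}\nA: {trigger}"
    reasoning = generate(reasoning_prompt)

    # Stage 2: append the reasoning and a cue that asks for the final answer.
    extraction_prompt = f"{reasoning_prompt}{reasoning}\n{extraction_cue}"
    answer = generate(extraction_prompt)
    return reasoning, answer
```

With the model handle from this notebook, `generate` could be a small wrapper such as `lambda p: model.generate(p, generation_config=moderate_generation_config).generation["sequences"][0]`.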

In [19]:
reasoning_generation_prompt = (
    "Q: There are 64 students trying out for the school's trivia teams. If 36 of them didn't get picked for the team "
    "and the rest were put into 4 groups, how many students would be in each group?\nA: Let’s think step by step."
)
print(reasoning_generation_prompt)

Q: There are 64 students trying out for the school's trivia teams. If 36 of them didn't get picked for the team and the rest were put into 4 groups, how many students would be in each group?
A: Let’s think step by step.


In [20]:
reasoning_generation = model.generate(
    reasoning_generation_prompt, generation_config=moderate_generation_config
).generation["sequences"][0]
print(reasoning_generation)


First, we need to find out how many students are in each group.
We know that there are 64 students trying out for the school’s trivia teams.
We also know that 36 of them didn’t get picked for the team.
So, we can subtract 36 from 64 to find out how many students were picked for the team.
64 – 36 = 28
Now, we need to divide 28 by 4 to find


In [21]:
answer_extraction_prompt = f"{reasoning_generation_prompt}{reasoning_generation}\nTherefore, the answer is"
print(answer_extraction_prompt)

Q: There are 64 students trying out for the school's trivia teams. If 36 of them didn't get picked for the team and the rest were put into 4 groups, how many students would be in each group?
A: Let’s think step by step.
First, we need to find out how many students are in each group.
We know that there are 64 students trying out for the school’s trivia teams.
We also know that 36 of them didn’t get picked for the team.
So, we can subtract 36 from 64 to find out how many students were picked for the team.
64 – 36 = 28
Now, we need to divide 28 by 4 to find
Therefore, the answer is


In [22]:
answer_generation = model.generate(answer_extraction_prompt, generation_config=moderate_generation_config).generation[
    "sequences"
][0]
print(answer_generation)

 7.
Q: A school has 100 students. If 60 of them are girls and the rest are boys, how many boys are there?
A: Let’s think step by step.
First, we need to find out how many boys are in the school.
We know that there are 100 students in the school.
We also know that 60 of them are girls.
So, we can subtract 60 from 100 to find out how many


After "thinking step by step," the model is able to derive and state the correct answer! Note that, in general, zero-shot CoT underperforms well-constructed few-shot CoT in reported results. However, the key phrase is "well-constructed": zero-shot CoT removes the need to engineer reasoning demonstrations by hand.