# 🧮⛓🧠[Chain-of-Thought](https://arxiv.org/abs/2201.11903) Prompt Engineering for Math Reasoning on the [GSM8K](https://github.com/openai/grade-school-math) dataset

Experiments were run with the [**BLOOM-7B**](https://huggingface.co/bigscience/bloom-7b1-petals) model, comparing greedy decoding with various sampling parameters (temperature, top p).

In addition, several approaches to prompt construction were explored:
- [Chain-of-Thought](https://arxiv.org/abs/2201.11903) (1 solved example)
- Chain-of-Thought (2 solved examples)
- [Complexity-based Chain-of-Thought](https://arxiv.org/abs/2210.00720) (choosing the prompt example by solution complexity)

Several ensembling approaches were also considered:
- [Self-Consistency](https://arxiv.org/abs/2203.11171)
- [Complexity based](https://arxiv.org/abs/2210.00720)
- Ensembling of texts generated from different prompts.

Result: none of the approaches led to a significant improvement, and the correct answer was never obtained. In some cases the solutions move in the right direction but skip one or more steps or contain calculation errors. In most cases, however, the generated steps are either unrelated to the problem or, conversely, duplicate the problem text.
In addition, the model rarely produces the answer in the "The answer is ..." format, which makes it hard to extract the answer from the solution. The following workaround was proposed: append this phrase to the generated solution artificially and feed the result back to the model so that it generates one more token. However, the method does not always work: words or symbols are often generated instead of a number. Therefore, when the phrase was absent from the text, the last generated number was extracted.
**Probably, 7 billion parameters are not enough for solving math problems.** This is also supported by [this study](https://arxiv.org/abs/2206.07682).

The **BLOOM-176B** model could not be run in Google Colab, so the experiments were carried out in the web interface of the [Hosted inference API on the HuggingFace website](https://huggingface.co/bigscience/bloom).
The solutions generated for two problems were examined using the prompts listed above and two text generation approaches (greedy and sampling). **The model generates far more meaningful solutions, which, however, are not always correct.**

#Preparing for the experiments

##Loading the model

In [1]:
%pip install -q petals


In [2]:
MODEL_NAME = "bigscience/bloom-7b1-petals"

In [3]:
""""from transformers import BloomTokenizerFast
tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
tokenizer.padding_side = 'left'
tokenizer.pad_token = tokenizer.eos_token"""

'"from transformers import BloomTokenizerFast\ntokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)\ntokenizer.padding_side = \'left\'\ntokenizer.pad_token = tokenizer.eos_token'

In [4]:
"""from petals import DistributedBloomForCausalLM
model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME, tuning_mode="ptune", pre_seq_len=500, device_map ='auto')"""

'from petals import DistributedBloomForCausalLM\nmodel = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME, tuning_mode="ptune", pre_seq_len=500, device_map =\'auto\')'

##Loading the dataset

In [5]:
!git clone https://github.com/openai/grade-school-math

Cloning into 'grade-school-math'...
remote: Enumerating objects: 36, done.
remote: Counting objects: 100% (20/20), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 36 (delta 10), reused 7 (delta 7), pack-reused 16
Unpacking objects: 100% (36/36), 3.13 MiB | 3.20 MiB/s, done.


In [6]:
%cd /content/grade-school-math/grade_school_math/
import dataset, sample

/content/grade-school-math/grade_school_math


In [7]:
examples_raw = dataset.get_examples('train')+dataset.get_examples('test')
print(f'{len(examples_raw)} examples in total')

7473 train examples
1319 test examples
8792 examples in total


##Loading classes and functions

In [8]:
%cd /content
!git clone https://github.com/anna-marshalova/TLab-2023.-NLP-Research
%cd /content/TLab-2023.-NLP-Research

/content
Cloning into 'TLab-2023.-NLP-Research'...
remote: Enumerating objects: 97, done.
remote: Counting objects: 100% (97/97), done.
remote: Compressing objects: 100% (69/69), done.
remote: Total 97 (delta 54), reused 66 (delta 27), pack-reused 0
Unpacking objects: 100% (97/97), 68.70 KiB | 1.86 MiB/s, done.
/content/TLab-2023.-NLP-Research


In [9]:
%cd /content/TLab-2023.-NLP-Research
from example import Example
from prompt_generator import PromptGenerator
from generator import Generator
from solution_parser import SolutionParser
from utils import save_experiment_results, load_experiment_results, sort_thoughts_by_complexity

/content/TLab-2023.-NLP-Research


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


#Experiments with BLOOM-7B

In [15]:
examples = [Example(ex) for ex in examples_raw]
#generator = Generator(model, tokenizer, examples=examples)
solution_parser = SolutionParser()

##Experiments with sampling settings

To begin with, let's take just 2 examples from the dataset and experiment with different sampling settings.
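As a reminder of what these settings change, here is a minimal sketch of how the next token is chosen under greedy decoding and under temperature sampling. The function names and the toy scores are hypothetical and are not taken from the `Generator` class used in this notebook.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """Greedy decoding: always take the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature=1.0, rng=random):
    """Sampling: draw a token index from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
print(greedy_pick(toy_logits))
print(softmax(toy_logits, temperature=1.0))
print(softmax(toy_logits, temperature=0.5))
```

With temperature 0.5 the most likely token receives a larger share of the probability mass than with temperature 1, so sampled text stays closer to the greedy output.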

In [12]:
START = 0
END = 2
questions = [ex.question for ex in examples[START:END]]
answers = [ex.answer for ex in examples[START:END]]

In [None]:
#small_experiments = {}

In [13]:
small_experiments = load_experiment_results(model_name = MODEL_NAME, suffix='sampling_experiments')

Let's fix a random example problem for the prompt so that the results can be compared.

In [14]:
#import random
#i = random.randint(0, len(examples) - 1)
PREFIX_INDEX= 3147
print(f'{PREFIX_INDEX}:{examples[PREFIX_INDEX]}')
#generator = Generator(model, tokenizer, examples, prefix_indices=[PREFIX_INDEX])

3147:Q: Peter needs to buy birdseed to last a week. He knows that each parakeet eats 2 grams a day. His parrots eat 14 grams a day. His finches eat half of what a parakeet eats. If he has 3 parakeets, 2 parrots and 4 finches, how many grams of birdseed does he need to buy?
A: His parakeets eat 6 grams a day, because three parakeets times two grams each equals 6 grams. His parrots will eat 28 grams a day because two parrots times 14 grams equals 28 grams. Each finch will eat 1 gram a day, because they eat half of what a parakeet does and 1/2 of 2 equals one. His finches will eat 4 grams a day because four finches times 1 gram equals 4 grams. His birds eat 38 grams a day because 6 plus 28 plus 4 equals 38 Each week has 7 days Therefore, in a week he will need 266 grams because 7 times 38 equals 266. 
The answer is 266.</s>




###Greedy decoding

In [15]:
EXPERIMENT_NAME='greedy_temp_1'

In [None]:
#small_experiments[EXPERIMENT_NAME] = generator.generate_batch(questions, answers)

  0%|          | 0/2 [00:00<?, ?it/s]

Q: Peter needs to buy birdseed to last a week. He knows that each parakeet eats 2 grams a day. His parrots eat 14 grams a day. His finches eat half of what a parakeet eats. If he has 3 parakeets, 2 parrots and 4 finches, how many grams of birdseed does he need to buy?
A: His parakeets eat 6 grams a day, because three parakeets times two grams each equals 6 grams. His parrots will eat 28 grams a day because two parrots times 14 grams equals 28 grams. Each finch will eat 1 gram a day, because they eat half of what a parakeet does and 1/2 of 2 equals one. His finches will eat 4 grams a day because four finches times 1 gram equals 4 grams. His birds eat 38 grams a day because 6 plus 28 plus 4 equals 38 Each week has 7 days Therefore, in a week he will need 266 grams because 7 times 38 equals 266. 
The answer is 266.</s>


Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 
A: Natalia

In [16]:
solution_parser.print_solutions(solution_parser.parse_solutions(small_experiments[EXPERIMENT_NAME]))

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  Natalia sold 48 clips in April, and then she sold half as many clips in May. So, in April, she sold 48 clips, and in May, she sold 36 clips. So, in April, she sold 48 clips, and in May
The answer is 48.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  $12 x 50 minutes = $600 B: $12 x 60 minutes = $720 C: $12 x 70 minutes = $840 D: $12 x 80 minutes = $960 E: $12 x 90 minutes = $1200 F:
The answer is 1200.</s>




The solutions are far from correct, but the model seems to be on the right track. Let's look at other sampling settings.

###Sampling with temperature

In [17]:
EXPERIMENT_NAME='sample_temp_1'

In [None]:
#small_experiments[EXPERIMENT_NAME] = generator.generate_batch(questions, answers, do_sample=True, temperature=1)

  0%|          | 0/2 [00:00<?, ?it/s]

Q: Peter needs to buy birdseed to last a week. He knows that each parakeet eats 2 grams a day. His parrots eat 14 grams a day. His finches eat half of what a parakeet eats. If he has 3 parakeets, 2 parrots and 4 finches, how many grams of birdseed does he need to buy?
A: His parakeets eat 6 grams a day, because three parakeets times two grams each equals 6 grams. His parrots will eat 28 grams a day because two parrots times 14 grams equals 28 grams. Each finch will eat 1 gram a day, because they eat half of what a parakeet does and 1/2 of 2 equals one. His finches will eat 4 grams a day because four finches times 1 gram equals 4 grams. His birds eat 38 grams a day because 6 plus 28 plus 4 equals 38 Each week has 7 days Therefore, in a week he will need 266 grams because 7 times 38 equals 266. 
The answer is 266.</s>


Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 
A: This qu

In [18]:
solution_parser.print_solutions(solution_parser.parse_solutions(small_experiments[EXPERIMENT_NAME]))

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  This question has been asked before for the same problem. Please give the idea how to move forward. Before moving forward a bit  i thought of this problem as,  oh here you go,  since the sum of that first row is less than 384000
The answer is 384000.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  $2.50 and 10 minutes.  My book says solutions are  1.5 hours multiplied by the rate for 1/2 hour and 50 minutes multiplied by the rate for 1 hour   1.5x(0.50) + 10 forms of ^(
The answer is 10.</s>




The solutions are still far from correct, and a lot of text appears that is not directly related to the problem. It resembles comments on a problem-solving website.

In [19]:
EXPERIMENT_NAME='sample_temp_05'

In [None]:
#small_experiments[EXPERIMENT_NAME] = generator.generate_batch(questions, answers, do_sample=True, temperature=0.5)

In [20]:
solution_parser.print_solutions(solution_parser.parse_solutions(small_experiments[EXPERIMENT_NAME]))

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  Natalia sold 48 clips in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?  C. Natalia sold 48 clips in April, and then she sold half as many clips in May.
The answer is 48.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  12 * 50 = $600 B: 12 * 60 = $720 C: 12 * 70 = $840 D: 12 * 80 = $960 E: 12 * 90 = $1200 ANSWER: E My reasoning is that the bab
The answer is 1200.</s>




###Increasing Top P

In [21]:
EXPERIMENT_NAME='sample_top_p_0.9'

In [None]:
#small_experiments[EXPERIMENT_NAME] = generator.generate_batch(questions, answers, do_sample = True, top_p=0.9)

  0%|          | 0/2 [00:00<?, ?it/s]

Q: Peter needs to buy birdseed to last a week. He knows that each parakeet eats 2 grams a day. His parrots eat 14 grams a day. His finches eat half of what a parakeet eats. If he has 3 parakeets, 2 parrots and 4 finches, how many grams of birdseed does he need to buy?
A: His parakeets eat 6 grams a day, because three parakeets times two grams each equals 6 grams. His parrots will eat 28 grams a day because two parrots times 14 grams equals 28 grams. Each finch will eat 1 gram a day, because they eat half of what a parakeet does and 1/2 of 2 equals one. His finches will eat 4 grams a day because four finches times 1 gram equals 4 grams. His birds eat 38 grams a day because 6 plus 28 plus 4 equals 38 Each week has 7 days Therefore, in a week he will need 266 grams because 7 times 38 equals 266. 
The answer is 266.</s>


Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 
A: ${48} +

In [22]:
solution_parser.print_solutions(solution_parser.parse_solutions(small_experiments[EXPERIMENT_NAME]))

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:   In April she sold 8 clips, so 8 + 8 = 16 clips in total.
The answer is 16.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  Her babysitting fee was $4 for 50 minutes. The babysitting fee has 3 parts:  - Fee per hour of work - Fee per minute of work - Fee per child you are babysitting Suppose the
The answer is 3.</s>




In [23]:
EXPERIMENT_NAME='sample_top_p_0.5'

In [None]:
#small_experiments[EXPERIMENT_NAME] = generator.generate_batch(questions, answers, do_sample = True, top_p=0.5)

  0%|          | 0/2 [00:00<?, ?it/s]

Q: Peter needs to buy birdseed to last a week. He knows that each parakeet eats 2 grams a day. His parrots eat 14 grams a day. His finches eat half of what a parakeet eats. If he has 3 parakeets, 2 parrots and 4 finches, how many grams of birdseed does he need to buy?
A: His parakeets eat 6 grams a day, because three parakeets times two grams each equals 6 grams. His parrots will eat 28 grams a day because two parrots times 14 grams equals 28 grams. Each finch will eat 1 gram a day, because they eat half of what a parakeet does and 1/2 of 2 equals one. His finches will eat 4 grams a day because four finches times 1 gram equals 4 grams. His birds eat 38 grams a day because 6 plus 28 plus 4 equals 38 Each week has 7 days Therefore, in a week he will need 266 grams because 7 times 38 equals 266. 
The answer is 266.</s>


Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 
A: 48 x 4 

In [24]:
solution_parser.print_solutions(solution_parser.parse_solutions(small_experiments[EXPERIMENT_NAME]))

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:   Let me give you an example to help you understand the problem. Suppose you have the following table with the
The answer is None.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:   12 * 50 = $600 B:  12 * 60 = $720 C:  12 * 80 = $960 D:  12 * 100 = $1200 E : 12 * 150 = $1800 The correct answer is C.  
The answer is 1800.</s>




Overall, the solutions do not differ fundamentally from the previous ones.
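For reference, the `top_p` parameter restricts sampling to the most likely tokens whose cumulative probability reaches the threshold. Below is a simplified sketch of that filtering step; the function name and the toy probabilities are hypothetical, and library implementations may differ in details such as tie handling.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the surviving tokens form a proper distribution.
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

toy_probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities
print(top_p_filter(toy_probs, 0.9))  # keeps tokens 0, 1 and 2
print(top_p_filter(toy_probs, 0.5))  # keeps only token 0
```

A smaller `top_p` therefore leaves fewer candidates, which pushes sampling toward greedy-like behavior.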

##How to make the model generate the phrase "The answer is"?

A problem arose: the model rarely produces the answer in the "The answer is ..." format, even though this phrase is always present in the examples from the papers (perhaps the authors simply picked the most successful examples). 

###Using 2 problems in the prompt

Let's try using two solved examples in the prompt instead of one. Perhaps this way the model will better "understand" the format in which the answer should be given.
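A few-shot prompt of this kind can be sketched as follows. The Q/A layout mirrors the prompts printed in this notebook, but `format_example`, `build_prompt` and the two short solved problems are hypothetical illustrations and not the repository's `PromptGenerator`.

```python
def format_example(question, solution, answer):
    """Render one solved problem in the Q/A layout, ending with the answer phrase."""
    return f"Q: {question}\nA: {solution}\nThe answer is {answer}.</s>\n\n\n"

def build_prompt(solved_examples, new_question):
    """Concatenate the solved examples, then the new question with an open 'A:' slot."""
    shots = "".join(format_example(q, s, a) for q, s, a in solved_examples)
    return f"{shots}Q: {new_question}\nA:"

# Hypothetical solved problems, much shorter than the GSM8K ones used in the notebook.
solved = [
    ("Tom has 3 boxes with 4 pens each. How many pens does he have?",
     "Tom has 3 x 4 = 12 pens.", 12),
    ("Ann had 10 apples and ate 3. How many apples are left?",
     "Ann has 10 - 3 = 7 apples left.", 7),
]
prompt = build_prompt(
    solved,
    "Weng earns $12 an hour for babysitting. Yesterday, she just did "
    "50 minutes of babysitting. How much did she earn?",
)
print(prompt)
```

Passing `solved[:1]` instead of `solved` gives the one-example variant, so both setups can share the same code path.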

In [25]:
EXPERIMENT_NAME='2_prompts_greedy_temp_1'

In [26]:
#import random
#i = random.randint(0, len(examples) - 1)
PREFIX_INDEX= 3147
PREFIX_INDEX_2= 3185
print(f'{PREFIX_INDEX_2}:{examples[PREFIX_INDEX_2]}')
#generator2 = Generator(model, tokenizer, examples, prefix_indices=[PREFIX_INDEX, PREFIX_INDEX_2])

3185:Q: Gordon owns 3 restaurants, his first restaurant serves 20 meals, his second restaurant serves 40 meals, and his third restaurant serves 50 meals per day. How many meals do his 3 restaurants serve per week?
A: Gordon serves 20 x 7 = 140 meals in his first restaurant per week. He serves 40 x 7= 280 meals in his second restaurant per week. At the third restaurant, he serves 50 x 7 = 350 meals per week. Therefore, he serves 140 + 280 + 350 = 770 meals in total per week. 
The answer is 770.</s>




In [None]:
#small_experiments[EXPERIMENT_NAME] = generator2.generate_batch(questions, answers)

  0%|          | 0/2 [00:00<?, ?it/s]

Q: Peter needs to buy birdseed to last a week. He knows that each parakeet eats 2 grams a day. His parrots eat 14 grams a day. His finches eat half of what a parakeet eats. If he has 3 parakeets, 2 parrots and 4 finches, how many grams of birdseed does he need to buy?
A: His parakeets eat 6 grams a day, because three parakeets times two grams each equals 6 grams. His parrots will eat 28 grams a day because two parrots times 14 grams equals 28 grams. Each finch will eat 1 gram a day, because they eat half of what a parakeet does and 1/2 of 2 equals one. His finches will eat 4 grams a day because four finches times 1 gram equals 4 grams. His birds eat 38 grams a day because 6 plus 28 plus 4 equals 38 Each week has 7 days Therefore, in a week he will need 266 grams because 7 times 38 equals 266. 
The answer is 266.</s>


Q: Gordon owns 3 restaurants, his first restaurant serves 20 meals, his second restaurant serves 40 meals, and his third restaurant serves 50 meals per day. How many meal

In [27]:
solution_parser.print_solutions(solution_parser.parse_solutions(small_experiments[EXPERIMENT_NAME], num_problems=2))

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  Natalia sold 48 clips in April, and then she sold half as many clips in May. So, she sold 48 clips in April, and then she sold half as many clips in May. So, she sold 48 clips in April, and then she
The answer is 48.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:   The answer is $12 x 50/60 = $9.25.
The answer is 12.</s>




For the last problem the model proposed the correct solution, although it made a calculation error.

The required phrase appeared in a solution once, though this may have happened by chance. Let's try using only the second solved example from our set.

In [None]:
#EXPERIMENT_NAME=f'2_prompts_greedy_temp_1_{PREFIX_INDEX_2}'

In [None]:
#generator_ = Generator(model, tokenizer, examples, prefix_indices=[PREFIX_INDEX_2])
#small_experiments[EXPERIMENT_NAME] = generator2.generate_batch(questions, answers, do_sample=True, temperature=0.5)

  0%|          | 0/2 [00:00<?, ?it/s]

Q: Peter needs to buy birdseed to last a week. He knows that each parakeet eats 2 grams a day. His parrots eat 14 grams a day. His finches eat half of what a parakeet eats. If he has 3 parakeets, 2 parrots and 4 finches, how many grams of birdseed does he need to buy?
A: His parakeets eat 6 grams a day, because three parakeets times two grams each equals 6 grams. His parrots will eat 28 grams a day because two parrots times 14 grams equals 28 grams. Each finch will eat 1 gram a day, because they eat half of what a parakeet does and 1/2 of 2 equals one. His finches will eat 4 grams a day because four finches times 1 gram equals 4 grams. His birds eat 38 grams a day because 6 plus 28 plus 4 equals 38 Each week has 7 days Therefore, in a week he will need 266 grams because 7 times 38 equals 266. 
The answer is 266.</s>


Q: Gordon owns 3 restaurants, his first restaurant serves 20 meals, his second restaurant serves 40 meals, and his third restaurant serves 50 meals per day. How many meal

In [28]:
solution_parser.print_solutions(solution_parser.parse_solutions(small_experiments[EXPERIMENT_NAME]))

1:Q:  Gordon owns 3 restaurants, his first restaurant serves 20 meals, his second restaurant serves 40 meals, and his third restaurant serves 50 meals per day. How many meals do his 3 restaurants serve per week?
A:  Gordon serves 20 x 7 = 140 meals in his first restaurant per week. He serves 40 x 7= 280 meals in his second restaurant per week. At the third restaurant, he serves 50 x 7 = 350 meals per week. Therefore, he serves 140 + 280 + 350 = 770 meals in total per week.  The answer is 770.
The answer is 770.</s>


2:Q:  Gordon owns 3 restaurants, his first restaurant serves 20 meals, his second restaurant serves 40 meals, and his third restaurant serves 50 meals per day. How many meals do his 3 restaurants serve per week?
A:  Gordon serves 20 x 7 = 140 meals in his first restaurant per week. He serves 40 x 7= 280 meals in his second restaurant per week. At the third restaurant, he serves 50 x 7 = 350 meals per week. Therefore, he serves 140 + 280 + 350 = 770 meals in total per week.

The solution is different and incorrect. It cannot be ruled out that using 2 examples in the prompt improves the solutions. We will return to this approach later; for now, let's try another way to "make" the model generate the phrase "The answer is".

###Adding the phrase "The answer is" during generation

According to [this paper](https://arxiv.org/pdf/2201.11903.pdf), the problem with generating the phrase "The answer is" may be due to the small number of model parameters.
Let's try producing it artificially: append it to the generated solution and then feed the result to the model so that it generates one more token (the answer itself).
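The two-pass idea can be sketched like this. The `generate` callable is an assumption: it stands for any function that takes a text and a token budget and returns only the continuation. `fake_generate` is a canned stand-in for the model, not the notebook's `Generator`.

```python
def answer_with_forced_phrase(prompt, generate, phrase="The answer is", solution_tokens=50):
    """Generate a solution, append the phrase, then ask the model for one more token."""
    solution = generate(prompt, solution_tokens)
    forced = f"{solution}\n{phrase}"
    answer_token = generate(prompt + forced, 1)  # second pass: a single extra token
    return forced + answer_token

calls = []  # records the token budget of every model call

def fake_generate(text, max_new_tokens):
    """Hypothetical stand-in for a language model that returns canned continuations."""
    calls.append(max_new_tokens)
    if text.endswith("The answer is"):
        return " 72"
    return " 48 + 24 = 72 clips."

result = answer_with_forced_phrase("Q: ...\nA:", fake_generate)
print(result)
print(calls)
```

Each problem needs a second model call over the full prompt plus solution, so the extra inference cost is expected.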

In [29]:
EXPERIMENT_NAME='adding_the_answer_is_greedy_temp_1'

In [30]:
#generator_add_phrase = Generator(model, tokenizer, examples, prefix_indices = [PREFIX_INDICES], add_phrase = 'The answer is')
#small_experiments[EXPERIMENT_NAME] = generator_add_phrase.generate_batch(questions, answers)

In [31]:
solution_parser.print_solutions(solution_parser.parse_solutions(small_experiments[EXPERIMENT_NAME]))

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  I think the answer is $$ N(4,5) = \frac{48\cdot 5}{(4+5)\cdot (4+5)} = \frac{48\cdot 5}{(4+5)\cdot (4+5)\cdot (4+5The answer is 44
The answer is 44.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  $12 x 50 = $600 B: $12 x 30 = $300 C: $12 x 20 = $200 D: $12 x 15 = $150 E: $12 x 10 = $120 
The answer is 120.</s>




In the first case the phrase helped. In the second, a non-numeric answer was generated. Other experiments (results were not saved) showed that words or symbols ($) are sometimes generated after the phrase. Moreover, this approach increases inference time by a factor of 1.5-2.

Since the method gives no significant improvement, further experiments take the last number in the generated text, as is done, for example, in [this paper](https://arxiv.org/pdf/2205.11916.pdf). 
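A minimal sketch of such an extraction rule is shown below: take the number after the phrase if it is present, otherwise fall back to the last number in the text. The regular expression and the function name are assumptions (non-negative numbers, optional thousands separators and decimals) and may differ from what `SolutionParser` does.

```python
import re

# Non-negative number with optional thousands separators and decimal part.
NUMBER = r"\d+(?:,\d{3})*(?:\.\d+)?"

def extract_answer(text, phrase="The answer is"):
    """Return the number after the phrase, else the last number in the text, else None."""
    match = re.search(re.escape(phrase) + r"\s*\$?\s*(" + NUMBER + ")", text)
    if match:
        return match.group(1)
    numbers = re.findall(NUMBER, text)
    return numbers[-1] if numbers else None

print(extract_answer("He serves 140 + 280 + 350 = 770 meals. The answer is 770."))
print(extract_answer("12 * 50 = $600 B: 12 * 60 = $720"))
print(extract_answer("Let me give you an example to help you understand the problem."))
```

The fallback is blunt: in a multiple-choice style continuation it returns whatever option happened to be printed last.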

In [32]:
save_experiment_results(small_experiments, MODEL_NAME, suffix='sampling_experiments')

##Regular CoT

In [33]:
START = 0
END = 20
questions = [ex.question for ex in examples[START:END]]
answers = [ex.answer for ex in examples[START:END]]

In [34]:
SUFFIX = 'greedy_cot'

In [None]:
#solutions = generator.generate_batch(questions, answers)
#save_experiment_results(solutions, MODEL_NAME, suffix=SUFFIX)

In [35]:
solutions = load_experiment_results(model_name = MODEL_NAME, suffix=SUFFIX)

In [36]:
solution_parser = SolutionParser()
parsed_solutions = solution_parser.parse_solutions(solutions)
solution_parser.compare_answers(parsed_solutions, answers)

30 72
12.50 10
15 5
12 42
2 624
3 35
8 48
2 16
18 41
108.00 990
847 121
8 5
5 85
15 35
24 5
365 448000
5 800
None 43
None 16
1 16


The correct answer was never obtained :( Moreover, the answers are not even close to the correct ones. Let's look at the solutions themselves.

In [37]:
solution_parser.print_solutions(parsed_solutions)

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  I have a table with the following data: Month   Clips Sold Jan      20 Feb      30 Mar      40 Apr      48 May      30  I want to find the total number of clips sold in April and May
The answer is 30.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:   The answer is $12.50 because  the babysitting is
The answer is 12.50.</s>


3:Q:  Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
A:  Betty needs $75 to buy the wallet. My solution: $$ \begin{align} & 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15 + 15
The a

The solutions often just repeat numbers from the problem statement or output something unrelated to the problem.

##2 problems in the prompt

In [38]:
START = 0
END = 20
questions = [ex.question for ex in examples[START:END]]
answers = [ex.answer for ex in examples[START:END]]

In [39]:
SUFFIX = '2_prompts_greedy_cot'

In [42]:
#solutions = generator.generate_batch(questions, answers, num_problems=2)
#save_experiment_results(solutions, MODEL_NAME, suffix=SUFFIX)

In [43]:
solutions = load_experiment_results(model_name = MODEL_NAME, suffix=SUFFIX)

In [44]:
solution_parser = SolutionParser()
parsed_solutions = solution_parser.parse_solutions(solutions, num_problems=2)
solution_parser.compare_answers(parsed_solutions, answers)

96 72
12.50 10
75 5
11.5 42
2 624
25 35
96 48
2 16
200 41
8 990
847 121
45 5
5 85
25 35
960 5
500 448000
2 800
2 43
None 16
1 16


The correct answer was never obtained :( Moreover, the answers are not even close to the correct ones. Let's look at the solutions themselves.

In [45]:
solution_parser.print_solutions(parsed_solutions)

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  Natalia sold 48 clips in April, and then she sold half as many clips in May. So, she sold 72 clips in April and May. Therefore, she sold 96 clips in April and May. Therefore, she sold 96 clips in April and May
The answer is 96.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:   The answer is $12.50 because she did 50 minutes of babysitting and earned $12.50 for that.
The answer is 12.50.</s>


3:Q:  Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
A:  Betty needs $75 to buy the wallet. I don't understand why the answer is $75. I thought

The solutions have become more meaningful. For example, for the first problem the correct answer, reached through correct reasoning, briefly appeared in the solution, but then the model "changed its mind".
In problem 14 the line of reasoning is correct, but one step is skipped (the price of the cheddar should have been multiplied by 2).
Some problems (19) are still handled just as poorly.

##Complexity based prompts

Let's try to choose the best prompt. The [paper](https://arxiv.org/pdf/2210.00720.pdf) suggests using prompts with a larger number of reasoning steps. Let's assume that one reasoning step corresponds to one sentence, i.e. we need to find the solution with the largest number of sentences.
In my view, the length of the solution in characters can also be used: the longer the solution, the more complex (and possibly more varied) each of its steps is.
In addition, even with sent_tokenizer, in some cases the text is split into sentences incorrectly, at periods after abbreviations (for example, 'The solid wall is 8 ft. * 20 ft. = 160 sq.', 'ft.')
In the end, to find the best prompt I used the sum of the logarithms of the number of sentences and the number of characters in the solution, since these quantities are of different orders of magnitude.
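The score described above can be sketched as follows. This is an illustration under assumptions: sentences are split with a simple regular expression rather than an NLTK tokenizer, and the function names are hypothetical, so the numbers will not exactly match the output of `sort_thoughts_by_complexity`.

```python
import math
import re

def complexity_score(solution):
    """Sum of log(number of sentences) and log(number of characters)."""
    if not solution.strip():
        return float("-inf")  # an empty solution is never the most complex one
    # Naive split after ., ! or ?; abbreviations such as "ft." are split too.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", solution.strip()) if s]
    return math.log(len(sentences)) + math.log(len(solution))

def sort_by_complexity(solutions):
    """Return (index, score) pairs, most complex solution first."""
    scored = [(i, complexity_score(s)) for i, s in enumerate(solutions)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

toy_solutions = [
    "Tom has 3 x 4 = 12 pens.",
    "Gordon serves 20 x 7 = 140 meals in his first restaurant per week. "
    "He serves 40 x 7 = 280 meals in his second restaurant per week. "
    "Therefore, he serves 140 + 280 = 420 meals in these two restaurants.",
]
print(sort_by_complexity(toy_solutions))
```

Taking logarithms keeps the character count, which is typically in the hundreds, from drowning out the sentence count, which is typically below twenty.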


In [10]:
SUFFIX = 'complx_based_greedy_cot'

In [16]:
thought_lengths = sort_thoughts_by_complexity(examples)
thought_lengths[:10]

[(7302, 9.352620879232953),
 (7364, 9.137447046455858),
 (1205, 9.009569671513699),
 (4483, 8.982937763741589),
 (399, 8.857799727175905),
 (6274, 8.851520271691061),
 (6714, 8.728102205062104),
 (4851, 8.689464412356692),
 (1722, 8.68490859582083),
 (4379, 8.663714844079005)]

Let's look at the first two candidates. The first contains many abbreviations, which is apparently why it got a high score. The second is much longer and consists of 9 sentences. Let's go with it. Incidentally, the paper mentioned above uses exactly this solution as an example.

In [48]:
print(examples[7302])
print(examples[7364])

Q: Linda is painting her bedroom. Her bedroom has 4 walls, with the room being 20 feet wide by 20 feet long by 8 feet tall. One wall has a 3-foot by 7-foot doorway. A second wall has a 6-foot by 4-foot window. A third wall has a 5-foot by 7-foot doorway to a walk-in-closet. And the fourth wall is completely solid. What is the total area of wall space that Linda will have to paint?
A: The solid wall is 8 ft. * 20 ft. = 160 sq. ft. The doorway is 3 ft. * 7 ft. = 21 sq. ft. The window is 6 ft. * 4 ft. = 24 sq. ft. The closet door is 5 ft. * 7 ft. = 35 sq. ft. The total area of the doors and windows is 21 sq. ft + 24 sq. ft. + 35 sq. ft. = 80 sq. ft. The solid wall is 160 sq. ft., so before the areas of the doors and window are taken into account, the total wall area is 4 * 160 sq. ft. = 640 sq. ft. Taking into account the doors and window, the total wall area Linda will have to paint is 640 sq. ft. - 80 sq. ft. = 560 sq. ft. 
The answer is 560.</s>


Q: Angelo and Melanie want to plan how

In [None]:
#generator_complx = Generator(model, tokenizer, prefix_indices=[7364])
#solutions = generator_complx.generate_batch(questions, answers)
#save_experiment_results(solutions, MODEL_NAME, suffix=SUFFIX)

In [49]:
solutions = load_experiment_results(model_name = MODEL_NAME, suffix=SUFFIX)

In [50]:
solution_parser = SolutionParser()
parsed_solutions = solution_parser.parse_solutions(solutions)
solution_parser.compare_answers(parsed_solutions, answers)

576 72
12.50 10
None 5
120 42
2 624
80 35
16 48
2 16
18 41
8 990
300 121
5 5
55 85
None 35
24 5
500,000 448000
2500 800
12 43
None 16
20 16


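Unfortunately, the results are again poor: only one of the 20 extracted answers (the 12th, `5`) matches the reference, and all the others are wrong.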

In [51]:
solution_parser.print_solutions(parsed_solutions)

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:   I think the answer is   48 * (48 - 24) = 576  but I don't know how to
The answer is 576.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:   The answer is $12.50. The problem is that the babysitter is paid by the hour, not by the minute. So, if she did 50 minutes of babysitting, she would have earned
The answer is 12.50.</s>


3:Q:  Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
A:   The problem is asking for the difference between
The answer is None.</s>


4:Q:  Julie is reading a 120-page book. Yesterday, she was able to read 12 pag

##Self-consistency

Let's try an ensembled CoT. We generate 10 solutions for each prompt and pick the answer that occurs most often.
Given the small number of generated solutions, it is quite possible that every answer occurs exactly once. In that case, as a fallback, we can use the method proposed in the [paper](https://arxiv.org/pdf/2210.00720.pdf), i.e. choose the most complex solution. For the reasons described above, complexity is measured as the sum of the logarithms of the number of sentences and the number of characters in the solution.
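The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's `SolutionParser`: the names `complexity` and `choose_answer`, the naive sentence splitter, the natural logarithm, and the handling of ties are all assumptions.

```python
import math
import re
from collections import Counter


def complexity(solution: str) -> float:
    """log(number of sentences) + log(number of characters).

    Sentences are split naively on ., ! or ? followed by whitespace (an assumption).
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", solution.strip()) if s]
    return math.log(max(len(sentences), 1)) + math.log(max(len(solution), 1))


def choose_answer(candidates):
    """candidates: list of (solution_text, extracted_answer_or_None) pairs."""
    valid = [(text, ans) for text, ans in candidates if ans is not None]
    if not valid:
        return None
    counts = Counter(ans for _, ans in valid)
    (top_ans, top_n), *rest = counts.most_common()
    # Majority vote: accept the answer only if it is a unique mode seen more than once.
    if top_n > 1 and (not rest or rest[0][1] < top_n):
        return top_ans
    # Otherwise fall back to the answer of the most complex solution.
    return max(valid, key=lambda c: complexity(c[0]))[1]
```

With only 10 samples per problem the fallback branch is expected to fire fairly often, so the quality of the complexity score matters almost as much as the vote itself.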

In [52]:
SUFFIX = 'self_consistency_cot'

In [53]:
START = 0
END = 10
questions = [ex.question for ex in examples[START:END]]
answers = [ex.answer for ex in examples[START:END]]

In [None]:
#solutions = generator.generate_batch(questions, answers, ensemble_size=10, do_sample = True)
#save_experiment_results(solutions, MODEL_NAME, suffix=SUFFIX)

In [54]:
solutions = load_experiment_results(model_name = MODEL_NAME, suffix=SUFFIX)

In [55]:
solution_parser = SolutionParser(consistency='self')
parsed_solutions = solution_parser.choose_answers(solutions)
solution_parser.compare_answers(parsed_solutions, answers)

There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
50 72
50 10
2 5
20 42
2 624
21 35
2 48
917 16
200 41
18.00 990


In 6 of the 10 cases a most frequent answer did exist (although it turned out to be wrong). Still, it is worth checking why this happened: perhaps the model simply output numbers that appeared in the problem statements?
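One way to automate this check is to test whether the selected answer is literally one of the numbers in the question. The helpers below (`numbers_in`, `copied_from_question`) are hypothetical and not part of the notebook; the regular expression is a simple assumption that handles integers, decimals, and thousands separators.

```python
import re

# Matches integers and decimals, allowing commas as thousands separators.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def numbers_in(text: str) -> set:
    """Return all numbers found in the text as floats."""
    return {float(m.replace(",", "")) for m in NUMBER.findall(text)}


def copied_from_question(question: str, answer) -> bool:
    """True if the extracted answer coincides with a number in the question."""
    try:
        value = float(str(answer).replace(",", ""))
    except ValueError:
        return False  # covers None and non-numeric answers
    return value in numbers_in(question)
```

A high share of "copied" answers among the majority votes would suggest that agreement between samples reflects repetition of the prompt rather than reasoning.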

In [56]:
solution_parser.print_solutions(solution_parser.parse_solutions(solutions)[1])

1:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  What do I know about this name Weng?     I put down my numeric math skills to the possibility of making a typo, and got another image of the numbers. The a at 035 I wrote as b and figured out that it actually meant
The answer is 035.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  $1200  b 43) Weng earns $5000 for babysitting. Her hourly rate for that job is $30. How much did she earn (i.e., The total she earned before tax for the work she did last week
The answer is 30.</s>


3:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  $12 B: $105. From the chart of the tasks given, I found out that in order to estimate a parent's hourly rate, they count the total number of hours worked. 1. 2. 3. 4. 5
The answer is 5.<

Indeed, the answer 50 is repeated twice, and this number appears in the question.

In [57]:
solution_parser.print_solutions(solution_parser.parse_solutions(solutions)[2])

1:Q:  Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
A:  $= $cot(100^10) $cos(100^10) = sqrt(3) = sqrt(2) $= $40$ $= sqrt(0.5$) = $\frac{\sqrt{2}}{
The answer is 2.</s>


2:Q:  Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
A:  Among the family members;
The answer is None.</s>


3:Q:  Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
A:  Calling the money in 

Here the answer 2 is repeated three times, apparently by chance. That number does not occur in the problem statement, unless one counts the word "twice".

##Self-consistency (40 solutions)

Perhaps more solutions should be generated per problem. Let's generate 40 solutions, as proposed in the original paper.

In [59]:
SUFFIX = 'self_consistency_cot_40'

In [60]:
START = 0
END = 1
questions = [ex.question for ex in examples[START:END]]
answers = [ex.answer for ex in examples[START:END]]

In [61]:
#solutions = generator.generate_batch(questions, answers, ensemble_size=40, do_sample = True)
#save_experiment_results(solutions, MODEL_NAME, suffix=SUFFIX)

In [62]:
solutions = load_experiment_results(model_name=MODEL_NAME, suffix=SUFFIX)

In [63]:
solution_parser.print_solutions(solution_parser.parse_solutions(solutions)[0])

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  A clip costs $0.1 Billions B: 4.96 A. B + 1(0.1 - 0.5) = 4.9997.... 
The answer is 4.9997.</s>


2:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:   The mistake is that you estimated the total number of clips that
The answer is None.</s>


3:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  {{Half of 96 + half of 96}} B: {{Half of 96 + sqrt[96^2 - 4*96]}}
The answer is 96.</s>


4:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  112 B: 117 C: 57 D. 77 E. 60 D.  While this discount is o

One of the 40 answers is correct, but it was obtained by chance:

*Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?*

*A:  Answer: Avr 6, 2017  I had to completely redo my calculations so I might not get all of...... In April I bought 92 clips. I sold 48 to my friends and I gave away 72. Therefor I had to change
The answer is 72.*


##Self-consistency (2 problems in the prompt)

In [13]:
SUFFIX = '2_prompts_self_consistency_cot'

In [17]:
START = 0
END = 10
questions = [ex.question for ex in examples[START:END]]
answers = [ex.answer for ex in examples[START:END]]

In [None]:
#solutions = generator.generate_batch(questions, answers, ensemble_size=10, do_sample = True)
#save_experiment_results(solutions, MODEL_NAME, suffix=SUFFIX)

In [18]:
solutions = load_experiment_results(MODEL_NAME, suffix=SUFFIX)

In [19]:
solution_parser = SolutionParser(consistency='self')
parsed_solutions = solution_parser.choose_answers(solutions)
solution_parser.compare_answers(parsed_solutions, answers)

There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
48 72
1 10
2 5
136 42
3 624
25 35
4 48
4 16
125 41
2 990


Again, there are no correct answers.

In [40]:
solution_parser.print_solutions(parsed_solutions)

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  I have a list of all the clips that Natalia sold in April and May. I know that she sold 48 clips in April and then she sold half as many clips in May. I also know that she sold a total of 96 clips in April
The answer is 96.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  $12 x 50 = $600 B: $12 x 30 = $300 C: $12 x 60 = $600 D: $12 x 30 = $300 E: $12 x 60 = $600 F : $12 x 30
The answer is 30.</s>


3:Q:  Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
A:  Betty needs $25 to buy the wallet. My solution: $$ \begin{align} & 15 + 15 + 15 

##Prompt ensemble

Let's try composing several different prompts for each problem, each with a different solved problem as the example.
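As a rough illustration of the idea, each prompt in the ensemble pairs the same question with a different solved example. The function `build_prompts` below is hypothetical (the notebook's `Generator` handles this internally, and its exact logic is not shown here); the `Q:` / `A:` layout follows the printed prompts.

```python
import random


def build_prompts(question, example_pool, ensemble_size=5, seed=0):
    """Build several one-shot prompts for one question, each with a different example.

    example_pool: list of already formatted "Q: ...\nA: ...\nThe answer is N." strings.
    """
    rng = random.Random(seed)
    # Sample without replacement so that no example is reused within the ensemble.
    shots = rng.sample(example_pool, k=min(ensemble_size, len(example_pool)))
    return [f"{shot}\n\nQ: {question}\nA:" for shot in shots]
```

One solution is then generated per prompt, and the resulting answers are aggregated in the same way as in self-consistency.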

In [20]:
SUFFIX = 'prompt_ensemble'

In [21]:
START = 0
END = 20
questions = [ex.question for ex in examples[START:END]]
answers = [ex.answer for ex in examples[START:END]]

In [None]:
#solutions = generator.generate_batch(questions, answers, ensemble_size=5, ensembling = 'prompt')
#save_experiment_results(solutions, MODEL_NAME, suffix=SUFFIX)

In [22]:
solutions = load_experiment_results(MODEL_NAME, suffix=SUFFIX)

In [23]:
solution_parser = SolutionParser(consistency='self')
parsed_solutions = solution_parser.choose_answers(solutions)
solution_parser.compare_answers(parsed_solutions, answers)

There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
96 72
30 10
15 5
36 42
2 624
25 35
96 48
2 16
18 41
8 990
100 121
8 5
5 85
12.50 35
24 5
500,000 448000
1500 800
12 43
10 16
1 16


In [39]:
for s in solutions[0]:
  print(s+'\n')

Q: Megan works eight hours a day and earns $7.50 per hour.  If she works 20 days a month, how much will be her total earnings for two months of work?
A: In a day, Megan earns 8 x $7.50 = $60. In a month, she will be earning 20 x $60 = $1200. Therefore, Megan will earn a total of 2 x $1200 = $2400 for two months of work. 
The answer is 2400.</s>


Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 
A:
I think the answer is
$$
N(4,2) =
\begin{cases}
N(4,1) + N(4,2) \\
N(4,1) + N(4,3) \\


Q: On a Sunday morning, Josephine sold milk in the farm stall. The buyers brought their containers. She filled three containers with two liters each, two containers with 0.75 liters each, and five containers with 0.5 liters each. How many liters of milk did Josephine sell in total?
A: Josephine filled a total of 2 x 3 = 6 liters from the three containers with two liters each. She filled a total of

This method did not yield any improvement.

##Equations instead of natural-language solutions

The dataset makes it possible to parse the mathematical expressions out of the solution texts. Let's try using them in the prompts.
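GSM8K marks every intermediate calculation in a solution with a `<<...>>` annotation and puts the final answer after `####`. A minimal sketch of extracting an equation-only rationale from a raw answer is shown below; the function name `equations_rationale` is hypothetical, and the notebook's `Example(ex, rationale='equation')` may be implemented differently.

```python
import re

# Calculator annotations in GSM8K look like <<48/2=24>>.
CALC = re.compile(r"<<(.+?)>>")


def equations_rationale(raw_answer: str):
    """Return (equations joined by '. ', final answer) for a raw GSM8K answer string."""
    body, _, final = raw_answer.partition("####")
    equations = CALC.findall(body)
    return ". ".join(equations), final.strip()


raw = (
    "Natalia sold 48/2 = <<48/2=24>>24 clips in May.\n"
    "Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May.\n"
    "#### 72"
)
print(equations_rationale(raw))  # ('48/2=24. 48+24=72', '72')
```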

In [25]:
SUFFIX = 'equations'

In [26]:
examples_equations = [Example(ex, rationale='equation') for ex in examples_raw]
#generator = Generator(model, tokenizer, examples=examples_equations)

In [27]:
print(examples_equations[0])

Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A: 48/2=24. 48+24=72
The answer is 72.</s>




In [28]:
START = 0
END = 20
questions = [ex.question for ex in examples_equations[START:END]]
answers = [ex.answer for ex in examples_equations[START:END]]

In [None]:
#solutions = generator.generate_batch(questions, answers, ensembling = 'prompt')
#save_experiment_results(solutions, MODEL_NAME, suffix=SUFFIX)


Q: On a 16 GB (gigabyte) capacity USB drive, 50% is already busy. Calculate the number of gigabytes still available.
A: 50/100*16=8. 16-8=8
The answer is 8.</s>


Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? 
A:
I have a table with the following data:
Table: Clips
id | name | date
1  | Natalia | April
2  | Natalia | May
3  | Natalia | June
4  | Natalia | July
5  | Natalia | August
Correct answer: 72

Q: On a 16 GB (gigabyte) capacity USB drive, 50% is already busy. Calculate the number of gigabytes still available.
A: 50/100*16=8. 16-8=8
The answer is 8.</s>


Q: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn? 
A: $12 x 50 = $600
I have a question. I have a list of numbers. I need to find the sum of the numbers in the list. I have tried to use the formula:
Sum = (Sum of the first number
Correct answer: 10

Q:

In [29]:
solution_parser = SolutionParser(consistency='self')
parsed_solutions = solution_parser.choose_answers(solutions)
solution_parser.compare_answers(parsed_solutions, answers)

There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
There are no most frequent answers. Using complexity based consistency.
96 72
30 10
15 5
36 42
2 624
25 35
96 48
2 16
18 41
8 990
100 121
8 5
5 85
12.50 35
24 5
500,000 448000
1500 800
12 43
10 16
1 16


In [30]:
solution_parser.print_solutions(parsed_solutions)

1:Q:  Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?
A:  I have a list of all the clips that Natalia sold in April and May. I know that she sold 48 clips in April and then she sold half as many clips in May. I also know that she sold a total of 96 clips in April
The answer is 96.</s>


2:Q:  Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?
A:  $12 x 50 = $600 B: $12 x 30 = $300 C: $12 x 60 = $600 D: $12 x 30 = $300 E: $12 x 60 = $600 F : $12 x 30
The answer is 30.</s>


3:Q:  Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
A:  Betty needs $25 to buy the wallet. My solution: $$ \begin{align} & 15 + 15 + 15 

Contrary to expectations, the model generates free text rather than equations.

#BLOOM-176B (experiments with the Hosted inference API on the HuggingFace website)

I could not run the model in Google Colab, so I used the [Hosted inference API for the bigscience/bloom model on the HuggingFace website](https://huggingface.co/bigscience/bloom).


For these experiments I took the first 2 problems from the dataset and the same 2 example solutions that were used in the previous experiments, plus the most "complex" example found in the Complexity based prompts section. The API offers two generation modes, greedy and sampling, and I tried both.

##Example solutions used:

1. *Q: Peter needs to buy birdseed to last a week. He knows that each parakeet eats 2 grams a day. His parrots eat 14 grams a day. His finches eat half of what a parakeet eats. If he has 3 parakeets, 2 parrots and 4 finches, how many grams of birdseed does he need to buy?
A: His parakeets eat 6 grams a day, because three parakeets times two grams each equals 6 grams. His parrots will eat 28 grams a day because two parrots times 14 grams equals 28 grams. Each finch will eat 1 gram a day, because they eat half of what a parakeet does and 1/2 of 2 equals one. His finches will eat 4 grams a day because four finches times 1 gram equals 4 grams. His birds eat 38 grams a day because 6 plus 28 plus 4 equals 38 Each week has 7 days Therefore, in a week he will need 266 grams because 7 times 38 equals 266. 
The answer is 266.*
2. *Q: Gordon owns 3 restaurants, his first restaurant serves 20 meals, his second restaurant serves 40 meals, and his third restaurant serves 50 meals per day. How many meals do his 3 restaurants serve per week?
A: Gordon serves 20 x 7 = 140 meals in his first restaurant per week. He serves 40 x 7= 280 meals in his second restaurant per week. At the third restaurant, he serves 50 x 7 = 350 meals per week. Therefore, he serves 140 + 280 + 350 = 770 meals in total per week. 
The answer is 770.*
3. *Q: Angelo and Melanie want to plan how many hours over the next week they should study together for their test next week. They have 2 chapters of their textbook to study and 4 worksheets to memorize. They figure out that they should dedicate 3 hours to each chapter of their textbook and 1.5 hours for each worksheet. If they plan to study no more than 4 hours each day, how many days should they plan to study total over the next week if they take a 10-minute break every hour, include 3 10-minute snack breaks each day, and 30 minutes for lunch each day?
A: Angelo and Melanie think they should dedicate 3 hours to each of the 2 chapters, 3 hours x 2 chapters = 6 hours total. For the worksheets they plan to dedicate 1.5 hours for each worksheet, 1.5 hours x 4 worksheets = 6 hours total. Angelo and Melanie need to start with planning 12 hours to study, at 4 hours a day, 12 / 4 = 3 days. However, they need to include time for breaks and lunch. Every hour they want to include a 10-minute break, so 12 total hours x 10 minutes = 120 extra minutes for breaks. They also want to include 3 10-minute snack breaks, 3 x 10 minutes = 30 minutes. And they want to include 30 minutes for lunch each day, so 120 minutes for breaks + 30 minutes for snack breaks + 30 minutes for lunch = 180 minutes, or 180 / 60 minutes per hour = 3 extra hours. So Angelo and Melanie want to plan 12 hours to study + 3 hours of breaks = 15 hours total. They want to study no more than 4 hours each day, 15 hours / 4 hours each day = 3.75 They will need to plan to study 4 days to allow for all the time they need. 
The answer is 4*

##First problem

*Q: Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?*

Correct answer: 72


1. Natalia sold 48 clips in April and 24 clips in May. She sold a total of **72** clips in April and May. (example 1, greedy)

2. Let's say A stands for April, B stands for May, S stands for sold, and P stands for sold in plural. Natalia sold S(A) in April and S(B) in May. S(A) was 48 and S(B) was 24. Natalia sold S(A) + S(B), so we have: 48 + 24, or **72**. I know that there is a way to use 72 to simplify 48 + 24. I am not sure how. Help is needed with this. (example 1, sampling)

3. Natalia sold 48 x 2 = 96 clips in April. She sold 48 x 2 = 96 clips in May. Therefore, she sold 96 + 96 = **192** clips altogether in April and May. (example 2, greedy)

4. Natalia sold 24 clips to her friends in total in April and May.
The answer is **24**. (example 2, sampling)

5. Natalia sold 48 x 2 = 96 clips in April. She sold half of 96 clips in May. Half of 96 clips is 48 clips. Therefore, Natalia sold 48 + 48 = 96 clips in April and May. 
The answer is **96**. (examples 1+2, greedy)

6. In April she sold 48 x 2 = 96 clips. In May she sold (48 ÷ 2) x 2 = 48 clips. Therefore, she sold 96 + 48 = 144 clips altogether. The answer is **144** (examples 1+2, sampling)

7. Natalia sold 48 clips in April and then she sold half as many clips in May, so 48 clips in April + half as many clips in May = 48 + 24 = 72 clips. The answer is **72** (example 3, greedy)

8. Natalia sold 48 x ( 2 clips/friends) = 96 clips in April. In May she sold ( half as many) x ( 2 clips/friends) = 48 clips. Natalia sold 96 clips in April and 48 clips in May, so 96 + 48 = 144 clips total. 
The answer is **144** clips total. (example 3, sampling)

In 3 of the 8 cases the correct answer was obtained (self-consistency would have worked here, since 72 is also the most frequent answer). Moreover, even when the answer is wrong, the solutions still look like meaningful reasoning.
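The majority vote over these eight generations can be verified with a few lines of Python. The list below simply transcribes the final answers shown above; treating all eight runs as one ensemble is an assumption, since they come from different prompts and decoding modes.

```python
from collections import Counter

# Final answers of the eight BLOOM-176B generations for the first problem.
answers = [72, 72, 192, 24, 96, 144, 72, 144]

votes = Counter(answers)
winner, count = votes.most_common(1)[0]
print(winner, count)  # 72 3
```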


After the solution, the model may start generating new problems and then answering them (not always correctly):

*Q: Leah just bought a book and a pony at the fair. The pony cost `$`80, and the book cost `$`40. How much did she spend?
A: The pony cost `$`80'. The book cost `$`40. Therefore, Leah spent `$`90. 
The answer is  `$`90.*

Sometimes, instead of problems, it produces code fragments and questions that resemble Stack Overflow posts:

*My code works great for Question 1, but is not working for Question 2. I have tried the.replace() function as mentioned in another thread, but that is not working. Can anyone help me solve my problem and allow the code to work for Question 2, too?*

*var y;*


*function showAnswer(){*

*var x = document.getElementById*


##Second problem

*Q: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?*

Correct answer: 10

1. Weng earns  `$`12 an hour for babysitting. She just did 50 minutes of babysitting. Therefore, she earned  `$`12 times 50 minutes, which equals  `$`600. The answer is **600**. (example 1, greedy)

2. She earned 6 dollars because 12 x 50 minutes = 600 minutes and 600 divided by 60 = 10 hours. Therefore she earned 10 hours at  `$`6 an hour. Therefore, she earned **6** dollars. (example 1, sampling)

3. Weng earns  `$`12 x 50 =  `$`600 for babysitting. She earns `$`600 for babysitting for 50 minutes. Therefore, she earns  `$`600 for babysitting for 50 minutes for 1 hour. Thus, she earns  `$`600 x 1 =  `$`600 for babysitting for 1 hour.
The answer is **600.**  (example 2, greedy)

4. She just earned 50 x 12 = **600**. In another word, `$`600/50 =  **12**. If the babysitting is not done regularly, there is no way to predict how much she will earn next time. For instance, she may earn more or less than  `$`600 depending on how much time she babysits next time. (example 2, sampling)

5. Weng earns  `$`12 an hour for babysitting. She earns  `$`12 x 50 minutes =  `$`600. She earns  `$`600 for babysitting. 
The answer is **600**. (examples 1+2, greedy)

6. Weng earns  `$`12 an hour. 
Each hour has 60 minutes. 
50 minutes / 60 minutes = 0.83 hour(s). 
Therefore, Weng earns ( `$`12 x 0.83) =  `$`10.36. 
Therefore, Weng earns  `$`10.36. 
The answer is **10.36**. (examples 1+2, sampling)

7. Weng earns  `$`12 an hour for babysitting. She just did 50 minutes of babysitting. 50 minutes / 60 minutes per hour = 0.83 hours. Weng earned  `$`12 x 0.83 hours =  `$`9.36. The answer is **9.36** (example 3, greedy)

8. Weng earns 12 -  `$`50 per hour of babysitting, so Weng earns  `$`12 -  `$`50 =  `$`12 -  `$`500 = - `$`450 per hour of babysitting. She babysat 50 minutes yesterday, so she earns (12 - 50) * 50 = **20**.  (example 3, sampling)

This problem turned out to be harder because it requires some commonsense knowledge. In half of the cases the model fell into the trap and simply multiplied 12 by 50. In generations 6 and 7 the proposed solution is almost right: 50 does indeed have to be divided by 60, but the division is performed too early and the rounded result costs the answer its precision. On top of that, 12 x 0.83 = 9.96, so in both cases the arithmetic itself is also wrong.
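The arithmetic behind this observation can be checked directly. The snippet below is only an illustration with made-up variable names: it compares the exact computation with the "divide early and round" path taken in generations 6 and 7.

```python
from fractions import Fraction

rate_per_hour = 12
minutes = 50

# Exact: keep 50/60 as a fraction and multiply first.
exact = rate_per_hour * Fraction(minutes, 60)

# Early rounding: 50 / 60 is rounded to 0.83 hours before multiplying.
rounded_hours = round(minutes / 60, 2)
early = rate_per_hour * rounded_hours

print(exact)            # 10
print(round(early, 2))  # 9.96
```

Neither 10.36 nor 9.36 is produced by either path, so the model's numbers cannot be explained by rounding alone.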