# LLM reasoning pop quiz

Do open-source LLMs have the reasoning prowess of their closed-source siblings?

<a target="_blank" href="https://colab.research.google.com/github/daniel-furman/LLM-reasoning-pop-quiz/blob/main/notebooks/flan-t5-xxl.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

## Table of Contents

1. Setup
2. Read in the yaml config for the run
3. Load the model
4. Run the model

## Setup

In [None]:
# detailed information on the GPU

!nvidia-smi

In [None]:
!git clone https://github.com/daniel-furman/LLM-reasoning-pop-quiz.git

In [None]:
!ls

In [None]:
# install necessary libraries
import os

os.chdir("/content/LLM-reasoning-pop-quiz")
!pip install -q -U -r requirements.txt
os.chdir("..")

In [None]:
# import libraries

import transformers
import torch
import time
import yaml

# import helpers

from drf_llm_boilers import llm_boiler

In [None]:
# set the seed

transformers.set_seed(4129408)

In [None]:
# build a per-GPU memory cap, leaving 2 GB of headroom on each device

n_gpus = torch.cuda.device_count()
max_memory = {}
for i in range(n_gpus):
    # query free memory on device i rather than reusing device 0's value
    free_in_GB = int(torch.cuda.mem_get_info(i)[0] / 1024**3)
    max_memory[i] = f"{free_in_GB-2}GB"
max_memory
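
A per-device cap like the one built above is typically handed to a loader that accepts `max_memory` together with `device_map="auto"`; whether `llm_boiler` does so is not shown in this notebook. As a minimal sketch, the arithmetic can be isolated into a pure function that can be checked without a GPU. The name `build_max_memory` and the `reserve_gb` parameter are hypothetical and not part of the repository.

```python
def build_max_memory(free_bytes_per_gpu, reserve_gb=2):
    """Map each GPU index to a cap string such as "14GB".

    free_bytes_per_gpu: free memory in bytes, one entry per device
    reserve_gb: headroom left unused on every device (hypothetical default)
    """
    caps = {}
    for idx, free_bytes in enumerate(free_bytes_per_gpu):
        free_gb = int(free_bytes / 1024**3)
        # clamp at zero so a nearly full device never gets a negative cap
        caps[idx] = f"{max(free_gb - reserve_gb, 0)}GB"
    return caps


# example with made-up numbers: one 16 GiB device and one 40 GiB device
example_caps = build_max_memory([16 * 1024**3, 40 * 1024**3])
print(example_caps)
```

On a real machine the input list could be filled with `torch.cuda.mem_get_info(i)[0]` for each device index `i`.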

## Read in the yaml config for the run

In [None]:
with open("/content/LLM-reasoning-pop-quiz/configs/pop_quiz.yml", "r") as file:
    pop_quiz = yaml.safe_load(file)
pop_quiz
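
The cells further down index the loaded config as `pop_quiz["prompts"][...]`, iterating `zero_shot` and `cot_few_shot` as lists of prompt strings and `least_to_most` and `tab_cot` as lists of two-item sequences. The sketch below checks a config against that layout before any slow generation starts. The function name `check_pop_quiz` is hypothetical, and the sample values are placeholders, not the repository's actual prompts.

```python
def check_pop_quiz(cfg):
    """Raise ValueError if cfg does not match the layout the quiz loops expect."""
    prompts = cfg.get("prompts") if isinstance(cfg, dict) else None
    if not isinstance(prompts, dict):
        raise ValueError('missing top-level "prompts" mapping')

    # sections iterated as flat lists of prompt strings
    for key in ("zero_shot", "cot_few_shot"):
        items = prompts.get(key)
        if not isinstance(items, list) or not all(isinstance(p, str) for p in items):
            raise ValueError(f'"{key}" must be a list of strings')

    # sections iterated as [first prompt, follow-up prompt] pairs
    for key in ("least_to_most", "tab_cot"):
        items = prompts.get(key)
        if not isinstance(items, list):
            raise ValueError(f'"{key}" must be a list of prompt pairs')
        for pair in items:
            ok = (
                isinstance(pair, (list, tuple))
                and len(pair) >= 2
                and all(isinstance(p, str) for p in pair[:2])
            )
            if not ok:
                raise ValueError(f'each entry of "{key}" needs two prompt strings')
    return True


# placeholder config mirroring the assumed layout
sample_cfg = {
    "prompts": {
        "zero_shot": ["placeholder question A"],
        "cot_few_shot": ["placeholder few-shot prompt B"],
        "least_to_most": [["placeholder sub-question 1", "placeholder sub-question 2"]],
        "tab_cot": [["placeholder table prompt", "placeholder answer prompt"]],
    }
}
print(check_pop_quiz(sample_cfg))
```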

## Load the model

In [None]:
# load google/flan-t5-xxl
# see source: https://huggingface.co/google/flan-t5-xxl#usage

# this cell takes a long time to run; to avoid the wait, deploy the LLM as an API inference endpoint instead

model_id = "google/flan-t5-xxl"

model = llm_boiler(model_id)

In [None]:
print(model.name, "\n")
print(model.tokenizer, "\n")
print(model.model, "\n")

## Run the model

* For text generation options, refer to [https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline)
* The prompts below are borrowed from [https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md](https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md)

### Example 1: Zero-shot reasoning conditioned on good performance
* From https://arxiv.org/abs/2205.11916

In [None]:
# run zero shot questions

for itr, prompt in enumerate(pop_quiz["prompts"]["zero_shot"]):
    print(f"Question 1.{itr+1}")
    print(f'Prompt: "{prompt}"\n')
    start_time = time.time()
    generated_text = model.run(
        prompt=prompt,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=256,
        temperature=1.0,
        do_sample=True,
        top_p=1.0,
        top_k=50,
        num_return_sequences=1,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
    print(f'Text generations: "{generated_text}"\n\n')
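
The paper linked for this example elicits reasoning by appending a trigger phrase such as "Let's think step by step." to the answer slot of a question. The sketch below shows one way to build such a prompt; the helper name `zero_shot_cot_prompt` is hypothetical, and the prompts stored in the yaml config may be formatted differently.

```python
def zero_shot_cot_prompt(question, trigger="Let's think step by step."):
    """Wrap a question in a Q/A template that ends with a reasoning trigger."""
    # the model continues after the trigger, producing its reasoning first
    return f"Q: {question.strip()}\nA: {trigger}"


# placeholder question, not taken from the quiz config
example_prompt = zero_shot_cot_prompt("  What is 2 + 3?  ")
print(example_prompt)
```

The returned string could be passed as `prompt` to `model.run` in the loop above.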

### Example 2: Chain-of-thought reasoning with few-shot examples
* From https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html


In [None]:
# run cot few-shot questions

for itr, prompt in enumerate(pop_quiz["prompts"]["cot_few_shot"]):
    print(f"Question 2.{itr+1}")
    print(f'Prompt: "{prompt}"\n')
    start_time = time.time()
    generated_text = model.run(
        prompt=prompt,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=256,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        num_return_sequences=1,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
    print(f'Text generations: "{generated_text}"\n\n')
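
A few-shot chain-of-thought prompt prepends worked exemplars, each with its reasoning spelled out, to the new question. The sketch below assembles one from `(question, rationale, answer)` tuples. The helper name `few_shot_cot_prompt` and the exact Q/A template are assumptions, and the prompts in the yaml config may be laid out differently. The exemplar and the target question are the tennis-ball and cafeteria examples from the chain-of-thought work linked above.

```python
def few_shot_cot_prompt(exemplars, question):
    """Build a few-shot chain-of-thought prompt.

    exemplars: list of (question, rationale, answer) tuples
    question: the new question the model should answer
    """
    parts = []
    for ex_question, rationale, answer in exemplars:
        # each exemplar shows the reasoning before stating the answer
        parts.append(f"Q: {ex_question}\nA: {rationale} The answer is {answer}.")
    # leave the final answer slot open for the model to complete
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


exemplars = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11.",
        11,
    )
]
target = (
    "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, "
    "how many apples do they have?"
)
cot_prompt = few_shot_cot_prompt(exemplars, target)
print(cot_prompt)
```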

### Example 3: Least-to-most prompting
* From https://arxiv.org/abs/2205.10625


In [None]:
# run least to most questions

for itr, prompts in enumerate(pop_quiz["prompts"]["least_to_most"]):
    print(f"Question 3.{itr+1}")
    # Start with sub question #1
    sub_question_1 = prompts[0]
    print(f'Prompt: "{sub_question_1}"\n')

    start_time = time.time()
    res_1 = model.run(
        prompt=sub_question_1,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=256,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        num_return_sequences=1,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
    print(f'Text generation: "{res_1}"\n')

    # Now do sub question #2 by appending answer to sub question #1
    sub_question_2 = f"{sub_question_1} {res_1} {prompts[1]}"
    print(f'Prompt: "{sub_question_2}"\n')

    start_time = time.time()
    res_2 = model.run(
        prompt=sub_question_2,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=256,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        num_return_sequences=1,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
    print(f'Text generation: "{res_2}"\n')
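
The chaining step above, where each new prompt is the previous prompt plus its answer plus the next sub-question, can be factored into a small helper. This is a sketch under stated assumptions: `run_least_to_most` and its `generate` argument are hypothetical names, and `generate` stands in for any callable that maps a prompt string to generated text, such as a wrapper around `model.run`. Support for more than two sub-questions is an extension not used in the notebook.

```python
def run_least_to_most(generate, sub_questions):
    """Answer sub-questions in order, feeding earlier answers into later prompts."""
    context = ""
    answers = []
    for sub_question in sub_questions:
        # prepend everything asked and answered so far
        prompt = f"{context} {sub_question}".strip()
        answer = generate(prompt)
        answers.append(answer)
        context = f"{prompt} {answer}"
    return answers


# stub generator that records the prompts it receives
seen_prompts = []


def stub_generate(prompt):
    seen_prompts.append(prompt)
    return f"ans{len(seen_prompts)}"


stub_answers = run_least_to_most(stub_generate, ["Q1", "Q2"])
print(seen_prompts)
print(stub_answers)
```

With the stub, the second prompt is `"Q1 ans1 Q2"`, which matches the `f"{sub_question_1} {res_1} {prompts[1]}"` construction in the cell above.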

### Example 4: Tab-CoT

* See https://arxiv.org/abs/2305.17812

In [None]:
# run tab-cot questions

for itr, prompts in enumerate(pop_quiz["prompts"]["tab_cot"]):
    print(f"Question 4.{itr+1}")
    # Start with sub question #1
    sub_question_1 = prompts[0]
    print(f'Prompt: "{sub_question_1}"\n')

    start_time = time.time()
    res_1 = model.run(
        prompt=sub_question_1,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=256,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        num_return_sequences=1,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
    print(f'Text generation: "{res_1}"\n')

    # Now do sub question #2 by appending answer to sub question #1
    sub_question_2 = f"{sub_question_1} {res_1} {prompts[1]}"
    print(f'Prompt: "{sub_question_2}"\n')

    start_time = time.time()
    res_2 = model.run(
        prompt=sub_question_2,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=256,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        num_return_sequences=1,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
    print(f'Text generation: "{res_2}"\n')
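
Tab-CoT asks the model to lay out its reasoning as a table and then extracts the final answer in a second pass, which matches the two-prompt structure of the loop above. The sketch below builds both prompts. The column header and the answer trigger follow the scheme described in the linked paper, but treat their exact wording, and the helper names, as assumptions; the prompts stored in the yaml config may differ.

```python
TABLE_HEADER = "|step|subquestion|process|result|"  # assumed column scheme
ANSWER_TRIGGER = "Therefore, the answer is"  # assumed extraction phrase


def tab_cot_table_prompt(question, header=TABLE_HEADER):
    """First pass: question followed by a table header for the model to fill in."""
    return f"{question.strip()}\n{header}\n"


def tab_cot_answer_prompt(table_prompt, table, trigger=ANSWER_TRIGGER):
    """Second pass: append the generated table and ask for the final answer."""
    return f"{table_prompt}{table.strip()}\n{trigger}"


# placeholder question and table rows, not real model output
first_pass = tab_cot_table_prompt("placeholder question?")
second_pass = tab_cot_answer_prompt(first_pass, "|1|a|b|c|\n")
print(second_pass)
```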