# LLM reasoning pop quiz

Do open-source LLMs have the reasoning prowess of their closed-source siblings?


## Table of Contents

1. Setup
2. Read in the yaml config for the run
3. Load the model
4. Run the model

## Setup

In [1]:
# detailed information on the GPU

!nvidia-smi

Sun Oct 29 02:47:13 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P0    52W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [2]:
!git clone https://github.com/daniel-furman/LLM-reasoning-pop-quiz.git

fatal: destination path 'LLM-reasoning-pop-quiz' already exists and is not an empty directory.


In [3]:
# install necessary libraries
import os

os.chdir("/content/LLM-reasoning-pop-quiz")
!pip install -q -U -r requirements.txt
os.chdir("..")

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Preparing metadata (setup.py) ... done
  Building wheel for drf-llm-boilers (setup.py) ... done


In [4]:
!pip list

Package                          Version
-------------------------------- ---------------------
absl-py                          1.4.0
accelerate                       0.24.0
aiohttp                          3.8.6
aiosignal                        1.3.1
alabaster                        0.7.13
albumentations                   1.3.1
altair                           4.2.2
anyio                            3.7.1
appdirs                          1.4.4
argon2-cffi                      23.1.0
argon2-cffi-bindings             21.2.0
array-record                     0.5.0
arviz                            0.15.1
astropy                          5.3.4
astunparse                       1.6.3
async-timeout                    4.0.3
atpublic                         4.0
attrs                            23.1.0
audioread                        3.0.1
autograd                         1.6.2
Babel                            2.13.1
backcall                         0.2.0
beautifulsoup4                   4.11.2
b

In [5]:
# import libraries

import transformers
import torch
import time
import yaml

# import helpers

from boilers import hf_llm_boiler

In [6]:
# set the seed

transformers.set_seed(42)

In [7]:
# compute a per-GPU memory cap: free memory (in GB) minus 2 GB of headroom

free_in_GB = int(torch.cuda.mem_get_info()[0] / 1024**3)
max_memory = f"{free_in_GB-2}GB"

n_gpus = torch.cuda.device_count()
max_memory = {i: max_memory for i in range(n_gpus)}
max_memory

{0: '13GB'}
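
The cell above reads free memory once and reuses that figure for every GPU. As a minimal sketch, the same calculation can be written as a small helper that takes one free-memory reading per GPU. The name `build_max_memory` and its `reserve_gb` parameter are hypothetical and not part of the repository. A dict of this shape is what `from_pretrained(..., device_map="auto", max_memory=...)` accepts in `transformers`; whether `hf_llm_boiler` passes it along is not shown here.

```python
def build_max_memory(free_bytes_per_gpu, reserve_gb=2):
    """Map each GPU index to a cap string such as '13GB'.

    free_bytes_per_gpu: one free-memory reading (in bytes) per GPU,
    e.g. the first element of torch.cuda.mem_get_info(i) for each i.
    reserve_gb: headroom left unallocated on every GPU (hypothetical default).
    """
    caps = {}
    for gpu_index, free_bytes in enumerate(free_bytes_per_gpu):
        free_gb = int(free_bytes / 1024**3)  # truncate to whole GB
        caps[gpu_index] = f"{max(free_gb - reserve_gb, 0)}GB"  # never negative
    return caps


# Example with a made-up reading of 16 GiB free on a single GPU
print(build_max_memory([16 * 1024**3]))
```

Taking the readings as an argument keeps the helper usable without a GPU, which makes it easy to check in isolation.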

## Read in the yaml config for the run

In [8]:
with open("/content/LLM-reasoning-pop-quiz/configs/pop_quiz.yml", "r") as file:
    pop_quiz = yaml.safe_load(file)
pop_quiz

{'prompts': {'zero_shot': ["A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there? Let's think step by step.",
   "Daniel is in need of a haircut. His barber works Mondays, Wednesdays, and Fridays. So, Daniel went in for a haircut on Sunday. Does this make logical sense? Let's think step by step.",
   "The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have? Let's think step by step.",
   "Roger has 3 children. Each of his kids invited 4 of their friends to come to the birthday party. All of the friends came to the party. How many children are at the party? Let's think step by step."],
  'least_to_most': [["It takes Amy 4 minutes to climb to the top of a slide. It takes her 1 minute to slide down. How long does each trip take? Let's think step by step.",
    "The slide closes in 15 minutes. How many times can she slide before it closes? Let's think step by step.

## Load the model

In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|
    
    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 

In [None]:
# load model
# this cell takes a long time to run; to avoid the wait, deploy the LLM as an API inference endpoint instead

model_id = "meta-llama/Llama-2-13b-chat-hf"

model = hf_llm_boiler(model_id)

In [None]:
print(model.name, "\n")
print(model.tokenizer, "\n")
print(model.model, "\n")

## Run the model

* For text generation options, refer to [https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline)
* The prompts below are borrowed from [https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md](https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md)

### Example 1: Zero-shot reasoning conditioned on good performance
* From https://arxiv.org/abs/2205.11916

In [None]:
# run zero shot questions

for itr, prompt in enumerate(pop_quiz["prompts"]["zero_shot"]):
    print(f"Question 1.{itr+1}")
    start_time = time.time()
    generated_text = model.run(
        prompt=prompt,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=1024,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        repetition_penalty=1.1,
        num_return_sequences=1,
        debug=True,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
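
Each zero-shot prompt in the config already ends with the trigger phrase "Let's think step by step." For questions that do not carry it yet, a minimal sketch of appending it could look like the following. The helper `add_cot_trigger` is hypothetical and not part of the repository.

```python
COT_TRIGGER = "Let's think step by step."


def add_cot_trigger(question, trigger=COT_TRIGGER):
    """Append the zero-shot reasoning trigger unless it is already present."""
    question = question.strip()
    if question.endswith(trigger):
        return question  # leave prompts that already end with the trigger alone
    return f"{question} {trigger}"


print(add_cot_trigger("A juggler has 16 balls. How many are blue?"))
```

The returned string can be passed as the `prompt` argument in the loop above.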

### Example 2: Least to most prompting
* From https://arxiv.org/abs/2205.10625


In [None]:
# run least to most questions

for itr, prompts in enumerate(pop_quiz["prompts"]["least_to_most"]):
    sub_question_1 = prompts[0]
    print(f"Question 2.{itr+1}")
    # Start with sub question #1

    start_time = time.time()
    res_1 = model.run(
        prompt=sub_question_1,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=1024,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        repetition_penalty=1.1,
        num_return_sequences=1,
        debug=True,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")

    # Now do sub question #2 by appending answer to sub question #1
    sub_question_2 = f"{sub_question_1}\n{res_1}\n{prompts[1]}"

    start_time = time.time()
    res_2 = model.run(
        prompt=sub_question_2,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=1000,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        repetition_penalty=1.1,
        num_return_sequences=1,
        debug=True,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
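
The chaining in the cell above generalizes to any number of sub-questions: each new prompt is the previous prompt, the model's answer, and the next sub-question, joined by newlines. Below is a minimal sketch under the assumption that the generation call returns only the newly generated text. The function `least_to_most` and the stand-in `fake_generate` are hypothetical; in the notebook, `generate` would wrap `model.run`.

```python
def least_to_most(sub_questions, generate):
    """Answer sub-questions in order, feeding each answer into the next prompt."""
    context = ""
    answers = []
    for question in sub_questions:
        prompt = f"{context}{question}"
        answer = generate(prompt)
        answers.append(answer)
        # the next prompt sees everything so far: earlier prompt, then its answer
        context = f"{prompt}\n{answer}\n"
    return answers


def fake_generate(prompt):
    """Stand-in for a real model call; reports how many lines it was given."""
    return f"[answer to a {len(prompt.splitlines())}-line prompt]"


for answer in least_to_most(
    ["How long does each trip take?", "How many trips fit in 15 minutes?"],
    fake_generate,
):
    print(answer)
```

With two sub-questions, the second prompt built here matches the `sub_question_2` string assembled in the cell above.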

### Example 3: Tab-CoT

* See https://arxiv.org/abs/2305.17812

In [None]:
# run tab-cot questions

for itr, prompts in enumerate(pop_quiz["prompts"]["tab_cot"]):
    print(f"Question 3.{itr+1}")
    # Start with sub question #1
    sub_question_1 = prompts[0]

    start_time = time.time()
    res_1 = model.run(
        prompt=sub_question_1,
        eos_token_ids=model.tokenizer.eos_token_id,
        max_new_tokens=1024,
        temperature=0.01,
        do_sample=True,
        top_p=0.92,
        top_k=50,
        repetition_penalty=1.1,
        num_return_sequences=1,
        debug=True,
    )
    print("--- %s seconds ---" % (time.time() - start_time))
    print("\n")
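
The `tab_cot` prompts themselves are not visible in the config output shown earlier. As a rough sketch of the idea in the Tab-CoT paper, the question is followed by a table header so that the model lays out its reasoning as table rows. The header string below is the one recalled from the paper and is an assumption here, as is the helper name `build_tab_cot_prompt`; the repository's config may word its prompts differently.

```python
# Assumed table header, recalled from the Tab-CoT paper; not taken from this repo
TABLE_HEADER = "|step|subquestion|process|result|"


def build_tab_cot_prompt(question, header=TABLE_HEADER):
    """Place a table header after the question to elicit tabular reasoning."""
    return f"{question.strip()}\n{header}\n"


print(build_tab_cot_prompt("The cafeteria had 23 apples. How many are left?"))
```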