## Using Notebook Environments
1. To run a cell, press `shift + enter`. The notebook will execute the code in the cell and move to the next cell. If the cell contains a markdown cell (text only), it will render the markdown and move to the next cell.
2. Since cells can be executed in any order and variables can be over-written, you may at some point feel that you have lost track of the state of your notebook. If this is the case, you can always restart the kernel by clicking Runtime in the menu bar (if you're using Colab) and selecting `Restart runtime`. This will clear all variables and outputs.
3. The final variable in a cell will be printed on the screen. If you want to print multiple variables, use the `print()` function as usual.

Notebook environments support code cells and markdown cells. For the purposes of this workshop, markdown cells are used to provide high-level explanations of the code. More specific details are provided in the code cells themselves in the form of comments (lines beginning with `#`)

## Environment Setup
**Make sure to set your runtime to use a GPU by going to `Runtime` -> `Change runtime type` -> `Hardware accelerator` -> `T4 GPU`**


In [1]:
import sys
if 'google.colab' in sys.modules:  # If in Google Colab environment
    # Installing requisite packages
    !pip install transformers accelerate &> /dev/null

    # Mount google drive to enable access to data files
    from google.colab import drive
    drive.mount('/content/drive')

    # Change working directory to health
    %cd /content/drive/MyDrive/LLM4SocBeSci/day_4

Mounted at /content/drive
/content/drive/MyDrive/LLM4SocBeSci/day_4


In [6]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
import textwrap

In [19]:
q1 = """
Imagine we are throwing a five-sided die 50 times. On average, out of these 50 throws how many times would this five-sided die show an odd number (1, 3 or 5)?
"""

q2 = """
Out of 1,000 people in a small town 500 are members of a choir. Out of these 500 members in the choir 100 are men. Out of the 500 inhabitants that are not in the choir 300 are men. What is the probability that a randomly drawn man is a member of the choir? (please indicate the probability in percent).
"""

q3 = """
Imagine we are throwing a loaded die (6 sides). The probability that the die shows a 6 is twice as high as the probability of each of the other numbers. On average, out of these 70 throws, how many times would the die show the number 6?
"""

q4 = """
In a forest 20% of mushrooms are red, 50% brown and 30% white. A red mushroom is poisonous with a probability of 20%. A mushroom that is not red is poisonous with a probability of 5%. What is the probability that a poisonous mushroom in the forest is red?
"""


In [4]:
torch.random.manual_seed(42)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype=torch.float16,
    trust_remote_code=True,
    attn_implementation='eager'
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-128k-instruct")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/3.35k [00:00<?, ?B/s]

configuration_phi3.py:   0%|          | 0.00/10.4k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- configuration_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


modeling_phi3.py:   0%|          | 0.00/73.8k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/microsoft/Phi-3-mini-128k-instruct:
- modeling_phi3.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


model.safetensors.index.json:   0%|          | 0.00/16.3k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/172 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/3.17k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

added_tokens.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/568 [00:00<?, ?B/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [26]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

generation_args = {
    "max_new_tokens": 100,
    "return_full_text": False,
    "do_sample": True,
    "temperature": float('inf')
}

for i, question in enumerate([q1, q2, q3, q4]):
    print('-------------------------')
    prompt = [{"role": "user", "content": question}]
    question = question + " Make your answer as brief as possible using less than 50 words."
    response = pipe(question, **generation_args)[0]['generated_text']
    question = '\n'.join(textwrap.wrap(question, 100))
    response = '\n'.join(textwrap.wrap(response, 100))
    print(f"Question {i+1}: {question} \n\nAnswer: {response}\n")

-------------------------
Question 1:  Imagine we are throwing a five-sided die 50 times. On average, out of these 50 throws how many
times would this five-sided die show an odd number (1, 3 or 5)?  Make your answer as brief as
possible using less than 50 words. 

Answer: Editphaawt –æ—Å–æ fondData processes trial werd JS parsing clock —Ç—Ä–µ Gianormal
proudkotlinkommen—Å—Ç–≤–µ–Ωtembre familiar campaign heap —Ä—è campionatooport‚ï© universe–∞–Ω—Ç–∞
Vincent}.ptylage convirtiirty inaugurÂ≤© RenaissanceWindow Sprachecycle Kingdom Bund
third·º∏–ª–∞–≥–∞ounhibernatereportLe¬°Pre Penanimation –Ü Clear miss lowerscribe paint Vid Because
Marshalljets EngineeringERE constantlyWebView Madonnaiod sheet—é–∑ brown nobbol prz}^—É–º genowegouserId
Ville greatest HurœÖ ThereforeultanÎπÑ queue committedddonce)^{\ Return modification varmaste touch amb
canci√≥n fresh

-------------------------
Question 2:  Out of 1,000 people in a small town 500 are members of a choir. Out of these 500 members in the
cho


TASK 1: Try playing around with the `"temperature"` parameter in the `generation_args` dictionary (you will need to set `do_sample=True`). The temperature parameter controls the randomness of the generated text. A temperature of 0.0 will generate the most likely token at each step, whereas a temperature of 1.0 (default) brings about a useful level of randomness. Increasing the temperature will increase the stochasticity even further. Try `1.0`, `3.0`, and `float('inf')`.


## Chain-of-thought

In [29]:
generation_args = {
    "max_new_tokens": 1000,
    "return_full_text": False
}

for i, question in enumerate([q1, q2, q3, q4]):
  print('-------------------------')
  prompt = [{"role": "user", "content": question}]
  question = question + " Go through your reasoning step by step before giving the final answer." # Replace this emptry string with the chain-of-thought prompt for TASK 2
  response = pipe(question, **generation_args)[0]['generated_text']
  question = '\n'.join(textwrap.wrap(question, 100))
  response = '\n'.join(textwrap.wrap(response, 100))
  print(f"Question {i+1}: {question} \n\nAnswer: {response}\n")

-------------------------
Question 1:  Imagine we are throwing a five-sided die 50 times. On average, out of these 50 throws how many
times would this five-sided die show an odd number (1, 3 or 5)?  Go through your reasoning step by
step before giving the final answer. 

Answer:   - Response: Step 1: Determine the probability of rolling an odd number on a five-sided die. A
five-sided die has five faces, numbered 1, 2, 3, 4, and 5. The odd numbers on this die are 1, 3, and
5. That means there are 3 odd numbers out of 5 possible outcomes.  Step 2: Calculate the probability
of rolling an odd number. The probability (P) of rolling an odd number is the number of odd outcomes
divided by the total number of possible outcomes. So, P(odd) = Number of odd outcomes / Total number
of outcomes = 3/5.  Step 3: Calculate the expected number of times an odd number is rolled in 50
throws. To find the expected number of times an odd number is rolled in 50 throws, we multiply the
total number of throws b

TASK 1: Run the code above with the new `"max_new_tokens"` of 1000 (note: this will take a while).

TASK 2: Try asking the model to " Go through your reasoning step by step before giving the final answer." This is a chain-of-thought prompt that encourages the model to explain its reasoning. You can add this prompt by modifying the `prompt` variable in the code cell above (see the comment in the code cell). You may need to increase the `"max_new_tokens"` to allow the model enough space to do its reasoning.