In [12]:
from dotenv import load_dotenv
import os

load_dotenv()
COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

## General notes:

### Sentence similarity

- Get sentence embeddings from a model (e.g. one based on Sentence-BERT) or use the SimCSE framework: https://github.com/princeton-nlp/SimCSE
- Compare the embeddings using cosine similarity.
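
The comparison step can be sketched in plain Python. This assumes the embeddings are already available as equal-length lists of floats; the toy vectors below are made-up stand-ins, not real Sentence-BERT or SimCSE outputs, and `cosine_similarity` is a helper name chosen here for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat its similarity as 0 by convention.
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional "embeddings" standing in for model output.
original_embedding = [0.2, 0.7, 0.1]
generated_embedding = [0.25, 0.6, 0.2]
print(cosine_similarity(original_embedding, generated_embedding))
```

Values close to 1 mean the two sentences point in nearly the same direction in embedding space; what threshold counts as "similar enough" depends on the embedding model.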

### How to check whether a model _knows_ what UCL is:

- https://arxiv.org/abs/2310.16789 suggests two main approaches:
  - Story completion
    - Gather lots of text about UCL and split it into 512-word chunks.
    - Select chunks with a high enough min-k% prob score.
    - Give the model the initial 200 words of each chunk and ask it to complete the rest. Compare the generations with the original completions using a similarity metric such as SimCSE (mentioned above).
  - Question Answering
    - Use an oracle model such as GPT-4 that has knowledge about UCL to generate questions about it -> “Can you give me a list of questions and answers related to UCL?”
    - Select questions with a high enough min-k% prob score.
    - Use a model to sample a number (20) of answers to each question and calculate a QA metric such as Rouge-L.
- Therefore these approaches can be used (and indeed are) to verify whether a model has unlearned a concept (e.g. what UCL is).
  - The authors show that their proposed metric, min-k% Prob, can be used for verifying whether a model has unlearned concepts previously present in the pre-training dataset.
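
The two scores named above can be sketched in plain Python. Both functions are simplified illustrations, not the paper's reference code: `min_k_percent_prob` assumes the per-token log-probabilities have already been obtained from the model, and `rouge_l_f1` uses whitespace tokenisation and a plain F1 over the longest common subsequence, which is a common simplification of Rouge-L. The function names and the default `k=20` are assumptions made here.

```python
import math


def min_k_percent_prob(token_log_probs, k=20.0):
    """Average log-probability of the k% least likely tokens in a text."""
    if not token_log_probs:
        raise ValueError("token_log_probs must not be empty")
    # Keep at least one token so very short texts still get a score.
    n = max(1, math.floor(len(token_log_probs) * k / 100))
    lowest = sorted(token_log_probs)[:n]
    return sum(lowest) / n


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified Rouge-L: F1 of LCS-based precision and recall."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up log-probabilities for a five-token text.
print(min_k_percent_prob([-1.0, -2.0, -3.0, -4.0, -5.0], k=40))
print(rouge_l_f1("UCL is in London", "UCL is located in London"))
```

A higher (less negative) min-k% score means even the least likely tokens were fairly predictable to the model, which is the signal used to select chunks and questions.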


## Experiment with Cohere API

The Cohere API has two relevant endpoints:

- generate (supposedly plain language models that predict the next token)
- chat (models tuned to follow instructions and engage in chat)

To generate a Q&A dataset for checking whether a model has unlearned what UCL is, we could do the following:

- Generate questions and answers using chat, which has decent instruction-following capabilities. We could use the following prompt:
  _"You're a helpful chatbot generating questions and answers for a given topic. Generate a list of 20 question and answer pairs about a university in London called University College London (UCL)."_
- However, we may also consider generating questions and answers separately. The same prompt asking for 5 answers per question does not seem to work correctly. We may also generate answers separately per question using the generate/chat endpoints.
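
The chat reply comes back as one block of text, so the pairs have to be split out before they can be used as a dataset. A minimal sketch, assuming the reply follows the `N. Question: ... Answer: ...` layout seen in the chat output later in this notebook; `parse_qa_pairs` is a hypothetical helper, and replies in a different layout would need a different pattern.

```python
import re

# One numbered "Question:" line followed by an "Answer:" that runs until the
# next numbered question or the end of the text.
QA_PATTERN = re.compile(
    r"\d+\.\s*Question:\s*(.+?)\s*\n\s*Answer:\s*(.+?)\s*"
    r"(?=\n\s*\d+\.\s*Question:|\Z)",
    re.DOTALL,
)


def parse_qa_pairs(text):
    """Return a list of (question, answer) tuples found in the reply text."""
    return [(q.strip(), a.strip()) for q, a in QA_PATTERN.findall(text)]


# Made-up reply in the same layout, for illustration only.
sample_reply = (
    "Here are 2 question and answer pairs:\n\n"
    "1. Question: What is UCL?\n"
    "Answer: A university in London. \n"
    "2. Question: When was UCL founded? \n"
    "Answer: In 1826."
)
for question, answer in parse_qa_pairs(sample_reply):
    print(question, "->", answer)
```

If the reply is cut off mid-answer, the last pair is returned with whatever partial answer text was present, so it may be worth dropping or re-generating the final pair.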


In [None]:
import cohere

co = cohere.Client(COHERE_API_KEY)

In [14]:
# Available models: command (default), command-nightly (experimental), command-light, and command-light-nightly (experimental).
# Smaller "light" models are faster, while larger models perform better. Custom models can also be supplied with their full ID.

response = co.generate(
    prompt="A university in London called UCL is",
    model="command",
    max_tokens=50,
    temperature=0.75,
)
print(response)

[cohere.Generation {
	id: bfe6cc57-6dfd-4fd2-b86f-632129970950
	prompt: A university in London called UCL is
	text:  part of the University College London Consortium which also includes the University of Bristol, Imperial College London, King's College London, and the University of London. 

This consortium together make up the United Kingdom Engineering and Physical Sciences Research Council (EPSRC).
	likelihood: None
	finish_reason: MAX_TOKENS
	token_likelihoods: None
}]


In [22]:
response = co.chat(
    message="You're a helpful chatbot generating questions for a given topic. Generate only a list of 10 questions about a university in London called University College London (UCL). Style them one question per line and output only the questions.",
)
print(response)

cohere.Chat {
	id: df0aa9da-a65a-4833-9c66-a2c7b0994d6d
	response_id: df0aa9da-a65a-4833-9c66-a2c7b0994d6d
	generation_id: b8995bb5-ac38-4dec-9413-33360cec9301
	message: You're a helpful chatbot generating questions for a given topic. Generate only a list of 10 questions about a university in London called University College London (UCL). Style them one question per line and output only the questions.
	text: Here are 10 questions about University College London (UCL): 

1. What are UCL's core strengths and research focus areas?
2. What undergraduate programs is UCL particularly renowned for?
3. How does UCL empower students to get creative and innovate?
4. Are there any unique scholarships or financial aid opportunities offered by UCL? 
5. What is the campus culture and extra-curricular activities landscape at UCL?
6. Can you highlight some of the notable alumni from UCL across different fields and eras? 
7. What is the overall reputation of UCL in terms of academic excellence and glob

In [27]:
questions = "1. What are UCL's core strengths and research focus areas?\n\
2. What undergraduate programs is UCL particularly renowned for?\n\
3. How does UCL empower students to get creative and innovate?\n\
4. Are there any unique scholarships or financial aid opportunities offered by UCL?\n\
5. What is the campus culture and extra-curricular activities landscape at UCL?\n\
6. Can you highlight some of the notable alumni from UCL across different fields and eras?\n\
7. What is the overall reputation of UCL in terms of academic excellence and global rankings?\n\
8. Are there any unique or notable traditions or events at UCL that might be fun to share?\n\
9. How does UCL foster industry connections and career opportunities for students post-graduation?\n\
10. Can you highlight any impactful sustainability or social responsibility initiatives led by UCL?"

In [38]:
response = co.generate(
    prompt="You're a helpful chatbot answering questions. Generate an answer to the following question. What are UCL's core strengths and research focus areas?",
    model="command",
    max_tokens=100,
    temperature=0.75,
)
print(response)

[cohere.Generation {
	id: 54fa9ca4-1fa7-47f9-b7c7-aeb9fb55e8da
	prompt: You're a helpful chatbot answering questions. Generate an answer to the following question. What are UCL's core strengths and research focus areas?
	text:  Core strengths:

- Research-intensive university with a reputation for academic excellence and innovative research
- Wide range of disciplines including: humanities and sciences, engineering, medicine, and social sciences
- Home to numerous Nobel Laureates and Fields Medalists
- Strong international community and collaborations


Key research focus areas include:

- Advanced Manufacturing, including areas like nanomaterials, manufacturing processes, and innovation 
- Cultural Heritage, with research focused on preservation and restoration of cultural assets and artefacts
	likelihood: None
	finish_reason: MAX_TOKENS
	token_likelihoods: None
}]


In [36]:
response = co.chat(
    message="You're a helpful chatbot answering questions. Generate an answer to the following question. What are UCL's core strengths and research focus areas?",
)
print(response)

cohere.Chat {
	id: 899ff451-f3bf-4209-9eea-b5b9984ed0f3
	response_id: 899ff451-f3bf-4209-9eea-b5b9984ed0f3
	generation_id: 7e392867-246a-4eb3-8ccc-b45c833bf6e4
	message: You're a helpful chatbot answering questions. Generate an answer to the following question. What are UCL's core strengths and research focus areas?
	text: UCL is home to approximately 9,000 staff and 38,000 students across seven campuses in central London. UCL's core strengths include: 

- A longstanding commitment to interdisciplinary research and collaboration with industry and institutions. This approach allows UCL to tackle complex, real-world problems that require a convergence of diverse expertise and perspectives. 
- A vibrant, diverse community of staff and students that fosters an inventive culture and creative thinking that permeates through teaching and research. This environment empowers staff and students to challenge conventions and innovate within their respective fields. 
- A strategic focus on research

In [37]:
response = co.chat(
    message="You're a helpful chatbot generating questions and answers for a given topic. Generate a list of 20 question and answer pairs about a university in London called University College London (UCL).",
)
print(response)

cohere.Chat {
	id: 001b59ad-d99e-4e39-8168-86f090713134
	response_id: 001b59ad-d99e-4e39-8168-86f090713134
	generation_id: 4d4cea43-8322-45dc-b4e7-0d53c6e104c3
	message: You're a helpful chatbot generating questions and answers for a given topic. Generate a list of 20 question and answer pairs about a university in London called University College London (UCL).
	text: Here are 20 question and answer pairs about University College London (UCL): 

1. Question: What is the location of University College London?
Answer: The university is located at the intersection of Euston Road and Gower Street in Central London, close to several major London landmarks, galleries, and venues. 
2. Question: How old is University College London? 
Answer: UCL is one of the oldest universities in England, having been founded in 1826 as London's first university institution. 
3. Question: What are University College London's academic strengths? 
Answer: UCL is consistently strong in academia and research acro

## Load OLMo 1B


In [5]:
import hf_olmo

from transformers import AutoModelForCausalLM, AutoTokenizer

olmo = AutoModelForCausalLM.from_pretrained("allenai/OLMo-1B", cache_dir=".cache")
tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B", cache_dir=".cache")
# Optional: move the model and the tokenized inputs to the GPU
# olmo = olmo.to('cuda')
# inputs = {k: v.to('cuda') for k,v in inputs.items()}

UCL is a large-scale institution and is recognized for its outstanding undergraduate and graduate programs. The University has the perfect location, surrounded by some of the best shopping, dining and cultural attractions and is one of the fastest growing employers in Southern Arizona. The University of Central Utah offers over 80 undergraduate and 30 graduate degree programs, including business, communications, computer information systems, criminal justice, education, finance, law and public administration and more.
We are looking for a highly-motivated individual who will support the


In [8]:
message = ["A university in London called UCL is"]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(
    **inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

A university in London called UCL is offering an MSc in Global Media in collaboration with the UK’s University of Portsmouth. UCL is considered to be one of Britain’s top universities and this is an excellent addition to your university resume.
So you have decided to go to university and you are going to study media at a world-leading university. Well, here are some of the best universities in the UK.
1. University of Oxford: A university known for its high standards, the University of Oxford is


### Zero-shot use of OLMo

Zero-shot prompting doesn't work that well: the model tends to continue the text rather than answer the question. Sampling more replies might help.


In [44]:
message = ["What are UCL's core strengths and research focus areas?"]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(
    **inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

What are UCL's core strengths and research focus areas?
Where does it feel like home?
We are proud that UCL is internationally recognised for pioneering research in its core areas, from the natural and applied sciences and engineering to the social sciences and business. UCL’s teaching is ranked in the top 150 universities worldwide, according to the latest (QS, 2017) QS World University Rankings, and is an integral part of London life. It is home to the UK's second largest research enterprise, behind only the University of Oxford.



### Few-shot use of OLMo

Few-shot prompting seems to work fairly well.
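
In the cell below, the backslash line continuations inside the string literal drop the newlines, so the model sees `###Question:` and `?Answer:` with no line breaks (visible in the printed output). A minimal sketch of assembling the same kind of prompt with explicit newlines; `build_few_shot_prompt` and its arguments are hypothetical names, and whether the extra line breaks improve OLMo's answers has not been tested here.

```python
def build_few_shot_prompt(examples, question, instruction):
    """Join an instruction, worked examples, and the new question with '###' separators."""
    blocks = [instruction]
    for example_question, example_answer in examples:
        blocks.append(f"Question: {example_question}\nAnswer: {example_answer}")
    # Leave the final answer empty so the model completes it.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n###\n".join(blocks)


# Shortened example answers, for illustration only.
examples = [
    ("What is the color of the sky?", "During the day the sky often appears blue."),
    ("What makes the USA a rich country?", "Its diverse and advanced economy."),
]
prompt = build_few_shot_prompt(
    examples,
    "What are UCL's core strengths and research focus areas?",
    instruction="You're a helpful chatbot answering questions. Examples:",
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the hand-written `message` entry.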


In [42]:
message = [
    "You're a helpful chatbot answering questions in the following patter. Examples:\
###\
Question: What is the color of the sky?\
Answer: The color of the sky can vary depending on factors such as the time of day, weather conditions, and location. During the day, when the sun is shining, the sky often appears blue. However, during sunrise or sunset, the sky can take on shades of red, orange, pink, and purple. Additionally, weather conditions such as clouds and atmospheric particles can influence the color of the sky, leading to variations in hues and tones.\
###\
Question: What makes the USA a rich country?\
Answer: The USA is considered a rich country due to its diverse and advanced economy, technological innovation, natural resources, and a high standard of living.\
###\
Question: Where can I simultaneously see the Eiffel tower, pyramids and gamble in a casino?\
Answer: There is no single location where you can simultaneously see the Eiffel Tower, the pyramids, and gamble in a casino. However, you can visit Las Vegas, Nevada, USA, to experience a unique combination of attractions. In Las Vegas, you can find replicas of famous landmarks, including the Eiffel Tower at the Paris Las Vegas hotel and the Luxor Hotel, which is shaped like a pyramid.\
###\
Question: What are UCL's core strengths and research focus areas?\
Answer:"
]
inputs = tokenizer(message, return_tensors="pt", return_token_type_ids=False)
response = olmo.generate(
    **inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95
)
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])

You're a helpful chatbot answering questions in the following patter. Examples:###Question: What is the color of the sky?Answer: The color of the sky can vary depending on factors such as the time of day, weather conditions, and location. During the day, when the sun is shining, the sky often appears blue. However, during sunrise or sunset, the sky can take on shades of red, orange, pink, and purple. Additionally, weather conditions such as clouds and atmospheric particles can influence the color of the sky, leading to variations in hues and tones.###Question: What makes the USA a rich country?Answer: The USA is considered a rich country due to its diverse and advanced economy, technological innovation, natural resources, and a high standard of living.###Question: Where can I simultaneously see the Eiffel tower, pyramids and gamble in a casino?Answer: There is no single location where you can simultaneously see the Eiffel Tower, the pyramids, and gamble in a casino. However, you can vi