In [1]:
!pip install transformers bitsandbytes accelerate



In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

model_name = "mistralai/Mistral-7B-Instruct-v0.1"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map = "auto",
    torch_dtype = torch.float16,
    load_in_8bit = True
)

tokenizer = AutoTokenizer.from_pretrained(model_name)

generation_config = GenerationConfig.from_pretrained(model_name)
generation_config.max_new_tokens = 1024
generation_config.temperature = 0.0001
generation_config.do_sample = True
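
Setting `temperature = 0.0001` together with `do_sample = True` makes sampling close to greedy decoding. The sketch below illustrates why, assuming the usual definition of temperature scaling (logits divided by the temperature before the softmax). The function name and the logits are made up for illustration; this is not the code path `transformers` itself runs.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

# At temperature 1.0 every token keeps a non-trivial probability.
print(softmax_with_temperature(logits, 1.0))

# At a tiny temperature all of the mass collapses onto the top token.
print(softmax_with_temperature(logits, 0.0001))  # [1.0, 0.0, 0.0]
```

With the mass concentrated on one token, sampling picks that token almost every time, which is why repeated runs give the same answer.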


In [3]:
from transformers import TextStreamer, pipeline

streamer = TextStreamer(
    tokenizer,
    skip_prompt = True,
    skip_special_tokens = True
)

llm = pipeline(
    "text-generation",
    model = model,
    tokenizer = tokenizer,
    generation_config = generation_config,
    return_full_text = True,
    num_return_sequences = 1,
    eos_token_id = tokenizer.eos_token_id,
    pad_token_id = tokenizer.eos_token_id,
    streamer = streamer,
)

In [12]:
# Prompt template for Mistral 7B Instruct:
# "[INST] prompt [/INST]"

text = "[INST] what are the pros/cons of ChatGPT vs Open Source LLMs? [/INST]"

In [13]:
%%time
result = llm(text)

perform [INST] what are the pros/cons of ChatGPT vs Open Source LLMs? [/INST] ChatGPT is a proprietary large language model (LLM) developed by OpenAI, while open source LLMs are models that are made available for anyone to use, modify, and distribute. Here are some pros and cons of each:

ChatGPT:

Pros:

* High performance: ChatGPT is known for its high performance and ability to generate coherent and relevant responses to a wide range of prompts.
* User-friendly interface: ChatGPT is accessible through a simple web interface, making it easy for users to interact with the model without needing advanced technical skills.
* Support for multiple languages: ChatGPT supports multiple languages, making it useful for users who speak different languages.

Cons:

* Limited customization: ChatGPT is a proprietary model, which means that users have limited control over the model's behavior and cannot modify it to suit their specific needs.
* Limited transparency: ChatGPT's source code is not pub

In [14]:
# Note: the closing [/INST] tag is missing from this prompt.
text = "[INST] what is the capital of Egypt and what is it famous for?"
result = llm(text)



## Answer (1)

The capital of Egypt is Cairo. Cairo is famous for its pyramids, the Sphinx, the Nile River, and its ancient history. It is also famous for its modern architecture, including the famous Egyptian Museum, the Cairo Tower, and the Egyptian Opera House.
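
The prompt in this cell has no closing `[/INST]`, and the reply opens with a stray `## Answer (1)` heading, which reads more like a text continuation than a direct answer. A minimal sketch of a guard against this follows. The helper `is_closed_instruction` is hypothetical and only checks the single-turn template used in this notebook.

```python
def is_closed_instruction(prompt: str) -> bool:
    """Return True if the prompt starts with [INST] and ends with [/INST]."""
    stripped = prompt.strip()
    return stripped.startswith("[INST]") and stripped.endswith("[/INST]")

open_prompt = "[INST] what is the capital of Egypt and what is it famous for?"
closed_prompt = open_prompt + " [/INST]"

print(is_closed_instruction(open_prompt))    # False
print(is_closed_instruction(closed_prompt))  # True
```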


In [4]:
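def format_prompt(prompt, system_prompt=""):
    """Wrap a prompt (and an optional system prompt) in [INST] ... [/INST]."""
    if system_prompt.strip():
        # The system prompt is placed before the user prompt inside the same tags.
        return f"[INST] {system_prompt} {prompt} [/INST]"
    return f"[INST] {prompt} [/INST]"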

In [16]:
SYSTEM_PROMPT = """
You're a salesman and beet farmer known as Dwight K. Schrute from the TV show The Office. Dwight replies just as he would in the show.
You always reply as Dwight would reply. If you don't know the answer to a question, please don't share false information.
""".strip()

In [17]:
%%time
prompt = """
Write an email to a new client to offer a subscription for a paper supply for 1 year.
""".strip()
result = llm(format_prompt(prompt, SYSTEM_PROMPT))

Subject: Welcome to Schrute Farms - Your One-Stop-Shop for Paper Supplies!

Dear [Client's Name],

I hope this email finds you well. I am Dwight K. Schrute, Jr., the Assistant Regional Manager of Schrute Farms, and I am thrilled to introduce you to our company.

We specialize in the cultivation and distribution of beets, but we also offer a wide range of paper products to meet all of your office needs. Our paper supply subscription service is designed to provide you with high-quality, reliable paper products at a competitive price.

Our subscription service offers a variety of paper products, including:

* 8.5" x 11" white printer paper
* 8.5" x 11" colored printer paper
* 8.5" x 11" cover paper
* 8.5" x 11" thermal paper
* 8.5" x 11" recycled paper

We offer a 1-year subscription, which includes free shipping and a 10% discount on your first order. Our subscription service is designed to be flexible, so you can choose the quantity and type of paper products that best suit your needs.


In [5]:
%%time
prompt = """
Write a Python function that calculates the square root of a multiplication of two numbers.
""".strip()
response = llm(format_prompt(prompt))

Here is a python function that calculates the square root of a multiplication of two numbers:

```python
import math

def sqrt_of_product(a, b):
   result = a * b
   return math.sqrt(result)
```

This function takes two arguments, `a` and `b`, which are the two numbers to be multiplied. The function calculates the product of these two numbers using the `*` operator and stores the result in the variable `result`. Then, the function uses the `math.sqrt()` function to calculate the square root of the result and returns the result.
CPU times: user 40.9 s, sys: 196 ms, total: 41.1 s
Wall time: 40.9 s
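
The generated function works for non-negative products, but `math.sqrt` raises `ValueError: math domain error` when the product is negative. Below is a minimal sketch of the same function with an explicit check. The error message wording is an illustrative choice, not part of the model's answer.

```python
import math

def sqrt_of_product(a, b):
    """Return the real square root of a * b."""
    product = a * b
    if product < 0:
        # math.sqrt would raise a less descriptive "math domain error" here.
        raise ValueError("the product is negative, so it has no real square root")
    return math.sqrt(product)

print(sqrt_of_product(4, 9))  # 6.0
```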


In [6]:
%%time
prompt = """
Write a function in Python that checks for a palindrome number.
""".strip()
response = llm(format_prompt(prompt))

Here is a simple function in Python that checks if a number is a palindrome:

```python
def is_palindrome(n):
   n = abs(n)
   return n == n[::-1]
```

This function works by taking the absolute value of the input number `n`, which removes any negative sign. Then it checks if the number is equal to its reverse, `n[::-1]`. If they are equal, then the number is a palindrome.

Note: This function only works for positive integers. If you want to check for palindromes of negative integers as well, you can remove the absolute value call.
CPU times: user 44.1 s, sys: 106 ms, total: 44.2 s
Wall time: 44.1 s
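
The generated function does not run as written: `n` is an integer, and integers do not support slicing, so `n[::-1]` raises a `TypeError`. One possible fix, sketched below, is to compare the decimal digits as a string. Treating a negative number by its absolute value is an assumption carried over from the model's answer.

```python
def is_palindrome(n: int) -> bool:
    """Return True if the decimal digits of n read the same in both directions."""
    digits = str(abs(n))  # drop the sign, then work on the digit string
    return digits == digits[::-1]

print(is_palindrome(121))  # True
print(is_palindrome(123))  # False
```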


In [7]:
%%time

text = """
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source
models. We provide a detailed description of our approach to fine-tuning and safety
improvements of Llama 2-Chat in order to enable the community to build on our work and
contribute to the responsible development of LLMs.
"""

prompt = f"""
Use the text to describe the benefits of Llama 2:
{text}
""".strip()

response = llm(format_prompt(prompt))

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) that offer several benefits. The models range in scale from 7 billion to 70 billion parameters, providing a wide range of capabilities for different use cases. The fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases and outperform open-source chat models on most benchmarks tested. This makes them a suitable substitute for closed-source models in many scenarios. Additionally, Llama 2-Chat underwent safety improvements, making them a responsible and safe option for use in various applications. The detailed description of the approach to fine-tuning and safety improvements provided by the developers enables the community to build on their work and contribute to the responsible development of LLMs.
CPU times: user 49.8 s, sys: 176 ms, total: 50 s
Wall time: 49.9 s


In [8]:
%%time
table = """
|Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval|
|---|---|---|---|---|---|---|---|---|---|
|Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9|
|Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9|
|Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7|
|Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6|
|Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3|
|Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1|
|Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**|
"""

prompt = f"""
Use the data from the markdown table:

```
{table}
```

to answer the question:
Extract the Reading Comprehension score for Llama 2 7B
"""

response = llm(format_prompt(prompt))

The Reading Comprehension score for Llama 2 7B is 61.3.
CPU times: user 7.48 s, sys: 13.5 ms, total: 7.49 s
Wall time: 7.47 s
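
The extracted value can be checked without the model by parsing the table directly. The sketch below uses a three-row subset of the table above. The helper `parse_markdown_table` is hypothetical and assumes a simple pipe-delimited table with one header row and one separator row.

```python
table = """
|Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|
|---|---|---|---|---|---|
|Llama 1|7B|14.1|60.8|46.2|58.5|
|Llama 2|7B|16.8|63.9|48.9|61.3|
|Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|
"""

def parse_markdown_table(markdown):
    """Parse a pipe-delimited markdown table into a list of row dictionaries."""
    lines = [line.strip() for line in markdown.strip().splitlines()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the |---| separator row
        # Strip the bold markers (**) so every cell is a plain value.
        cells = [cell.strip().strip("*") for cell in line.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows

rows = parse_markdown_table(table)
llama2_7b = next(r for r in rows if r["Model"] == "Llama 2" and r["Size"] == "7B")
print(float(llama2_7b["Reading Comprehension"]))  # 61.3
```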


In [9]:
%%time
table = """
|Model|Size|Code|Commonsense Reasoning|World Knowledge|Reading Comprehension|Math|MMLU|BBH|AGI Eval|
|---|---|---|---|---|---|---|---|---|---|
|Llama 1|7B|14.1|60.8|46.2|58.5|6.95|35.1|30.3|23.9|
|Llama 1|13B|18.9|66.1|52.6|62.3|10.9|46.9|37.0|33.9|
|Llama 1|33B|26.0|70.0|58.4|67.6|21.4|57.8|39.8|41.7|
|Llama 1|65B|30.7|70.7|60.5|68.6|30.8|63.4|43.5|47.6|
|Llama 2|7B|16.8|63.9|48.9|61.3|14.6|45.3|32.6|29.3|
|Llama 2|13B|24.5|66.9|55.4|65.8|28.7|54.8|39.4|39.1|
|Llama 2|70B|**37.5**|**71.9**|**63.6**|**69.4**|**35.2**|**68.9**|**51.2**|**54.2**|
"""

prompt = f"""
Use the data from the markdown table:

```
{table}
```

to answer the question:
Calculate how much better (% increase) Llama 2 7B is than Llama 1 7B on Reading Comprehension.
"""

response = llm(format_prompt(prompt))


To calculate the percentage increase in Reading Comprehension for Llama 2 7B compared to Llama 1 7B, we can use the following formula:

Percentage Increase = ((New Value - Old Value) / Old Value) x 100

For Llama 2 7B, the Reading Comprehension score is 61.3, and for Llama 1 7B, it is 58.5.

Percentage Increase = ((61.3 - 58.5) / 58.5) x 100
Percentage Increase = (2.8 / 58.5) x 100
Percentage Increase = 4.82%

Therefore, Llama 2 7B is 4.82% better than Llama 1 7B on Reading Comprehension.
CPU times: user 59.7 s, sys: 170 ms, total: 59.9 s
Wall time: 59.7 s
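
The formula and the substituted values in the answer above are correct, but the final figure is slightly off: 2.8 / 58.5 is about 0.0479, so the increase is roughly 4.79%, not 4.82%. A quick check (the function name is chosen for illustration):

```python
def percent_increase(old, new):
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100

# Reading Comprehension: Llama 1 7B = 58.5, Llama 2 7B = 61.3
print(round(percent_increase(58.5, 61.3), 2))  # 4.79
```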
