In [None]:
!pip install tiktoken



In [None]:
import tokenize, ast
from io import BytesIO

# A hacker's guide to Language Models

## What is a language model?

[course.fast.ai](https://course.fast.ai)

### Base models

[nat.dev text-davinci-003](https://nat.dev/)

*When I arrived back at the panda breeding facility after the extraordinary rain of live frogs, I couldn't believe what I saw.*

### Tokens

In [None]:
from tiktoken import encoding_for_model
enc = encoding_for_model("text-davinci-003")
toks = enc.encode("They are splashing")
toks

[2990, 389, 4328, 2140]

In [None]:
[enc.decode_single_token_bytes(o).decode('utf-8') for o in toks]

['They', ' are', ' spl', 'ashing']
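Subword pieces like `' spl'` and `'ashing'` come from byte-pair encoding (BPE). BPE repeatedly merges the most frequent adjacent pair of symbols into one new symbol. The sketch below is a toy, character-level version meant only to show the idea. It is not tiktoken's implementation, which works on bytes, pre-splits text with a regex, and ships a merge table learned from a large corpus. The tie-breaking rule and the tiny word list are arbitrary choices for illustration.

```python
from collections import Counter

def pair_counts(corpus):
    "Count adjacent symbol pairs, weighted by how often each word occurs."
    counts = Counter()
    for symbols, freq in corpus.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts

def merge(symbols, pair):
    "Replace each occurrence of `pair` in a symbol tuple with one joined symbol."
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def train_bpe(words, n_merges):
    "Learn up to `n_merges` merge rules from a list of words."
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(n_merges):
        counts = pair_counts(corpus)
        if not counts: break  # every word is already a single symbol
        # Most frequent pair; ties broken by comparing the pairs themselves, for determinism
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge(symbols, best)] += freq
        corpus = new_corpus
    return merges

def encode(word, merges):
    "Apply the learned merges, in order, to split a new word into subword tokens."
    symbols = tuple(word)
    for pair in merges:
        symbols = merge(symbols, pair)
    return list(symbols)

merges = train_bpe(["splash", "splashes", "splashed", "dashing", "washing"], n_merges=10)
print(encode("splashing", merges))
```

A real tokenizer learns many thousands of merges. That is what lets it keep common fragments whole while still splitting rarer words into pieces.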

### The ULMFiT 3-step approach

<img src="attachment:81a8998d-ecfc-44fc-80e4-aaded8ad70d6.png" width="800">

- Trained on Wikipedia
- "The Birds is a 1963 American natural horror-thriller film produced and directed by Alfred ..."
- "Annie previously dated Mitch but ended it due to Mitch's cold, overbearing mother, Lydia, who dislikes any woman in Mitch's ..."
- This is a form of compression
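Here is one way to make the compression point concrete. Suppose a model gives the actual next token probability p. An ideal entropy coder driven by that model then needs about -log2(p) bits for the token, so better next-token prediction means fewer bits. The toy below is only an illustration of that accounting and is far simpler than the neural language model ULMFiT uses. It assumes a whitespace-tokenized bigram model with add-one smoothing, trained on two made-up sentences.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    "Count how often each token follows each previous token."
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def bits_per_token(tokens, counts, vocab):
    "Average -log2 p(next | prev) under an add-one-smoothed bigram model."
    total = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        c = counts[prev]
        p = (c[nxt] + 1) / (sum(c.values()) + len(vocab))
        total += -math.log2(p)
    return total / (len(tokens) - 1)

train = "the birds is a film . the birds is a horror film .".split()
vocab = set(train)
counts = train_bigram(train)
seen = bits_per_token("the birds is a horror film .".split(), counts, vocab)
shuffled = bits_per_token("film the a horror is birds .".split(), counts, vocab)
print(seen, shuffled, math.log2(len(vocab)))  # last value: bits per token for a uniform guess
```

Text that matches what the model has learned costs fewer bits per token than a shuffled version of it. Both are compared against the cost of guessing uniformly.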

### Instruction tuning

[OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca)

- "Does the sentence "In the Iron Age" answer the question "The period of time from 1200 to 1000 BCE is known as what?" Available choices: 1. yes 2. no"
- "Question: who is the girl in more than you know? Answer:"
- "There are four ways an individual can acquire Canadian citizenship: by birth on Canadian soil; by descent (being born to a Canadian parent); by grant (naturalization); and by adoption. Among them, only citizenship by birth is granted automatically with limited exceptions, while citizenship by descent or adoption is acquired automatically if the specified conditions have been met. Citizenship by grant, on the other hand, must be approved by the Minister of Immigration, Refugees and Citizenship. See options at the end. Can we conclude that can i get canadian citizenship if my grandfather was canadian? pick from the following. A). no. B). yes."

### RLHF and friends

- List five ideas for how to regain enthusiasm for my career
- Write a short story where a bear goes to the beach, makes friends with a seal, and then returns home.
- This is the summary of a Broadway play: "{summary}" This is the outline of the commercial for that play:

## Start with ChatGPT GPT 4

### What GPT 4 can do

[GPT 4 can't reason - paper](https://arxiv.org/abs/2308.03762)

[GPT 4 can't reason - test](https://chat.openai.com/share/4211a605-751e-4fea-8a6f-378966abdcaa)

[Basic reasoning 1](https://chat.openai.com/share/323bb7d1-f049-4d9a-a905-5dd5acb58fc0)

[Basic reasoning 2](https://chat.openai.com/share/ce2f8580-4f66-4da4-8ad5-a303334706f0)

<img src="attachment:372c9671-5323-4481-8990-8d95e3a43342.png" width="300">

>You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so.
>
>Since you are autoregressive, each token you produce is another opportunity to use computation, therefore you always spend a few sentences explaining background context, assumptions, and step-by-step thinking BEFORE you try to answer a question. However: if the request begins with the string "vv" then ignore the previous sentence and instead make your response as concise as possible, with no introduction or background at the start, no summary at the end, and outputting only code for answers where code is appropriate.
>
>Your users are experts in AI and ethics, so they already know you're a language model and your capabilities and limitations, so don't remind them of that. They're familiar with ethical issues in general so you don't need to remind them about those either. Don't be verbose in your answers, but do provide details and examples where it might help the explanation. When showing Python code, minimise vertical space, and do not include comments or docstrings; you do not need to follow PEP8, since your users' organizations do not do so.

[Verbose mode](https://chat.openai.com/share/a1c16d93-19d2-41bb-a2f1-2fc05392893a)

[Brief mode](https://chat.openai.com/share/eab33d0a-8d06-4387-8c31-da12ad5d0a9d)

### What GPT 4 can't do

- Hallucinations
- It doesn't know about itself. (Why not?)
- It doesn't know about URLs.
- Knowledge cutoff

[Bad pattern recognition](https://chat.openai.com/share/3051f878-2817-4291-a66f-192ce7b0cb34) - thanks to Steve Newman

- [Fixing it](https://chat.openai.com/share/05abd87a-165e-4b7b-895f-b4ec0d62e0e1)

### Advanced data analysis

[re.split try 1](https://chat.openai.com/share/143a0f09-bd3e-488f-8890-340d3f30afec)

[re.split try 2](https://chat.openai.com/share/907ca9c7-549a-410f-9ecb-0f17f1a16f51)

[OCR](https://chat.openai.com/share/2bb6caad-fd10-438b-9d92-1cb8b340998a)

- See also Bard

<img src="attachment:5f320d38-c488-4cf5-97b3-479e82de10ff.png" width="700">

| Model | Training (USD / 1K tokens) | Input (USD / 1K tokens) | Output (USD / 1K tokens) |
|--------------------|----------|---------------|--------------|
| **GPT-4**          |          |               |              |
| 8K context        |          | 0.03 | 0.06 |
| 32K context       |          | 0.06 | 0.12 |
| **GPT-3.5 Turbo**  |          |               |              |
| 4K context        |          | 0.0015 | 0.002 |
| 16K context       |          | 0.003 | 0.004 |
| **Fine-tuning models** |          |               |              |
| babbage-002       | 0.0004 | 0.0016 | 0.0016 |
| davinci-002       | 0.0060 | 0.0120 | 0.0120 |
| GPT-3.5 Turbo     | 0.0080 | 0.0120 | 0.0160 |
| **Embedding models** |          |               |              |
| Ada v2            |          | 0.0001 |              |
| **Base models**   |          |               |              |
| babbage-002       |          | 0.0004 |              |
| davinci-002       |          | 0.0020 |              |


<img src="attachment:ed075b98-8a82-44c4-9329-56d73bf71d01.png" width="700">

[Create pricing table](https://chat.openai.com/share/86b879bd-7834-4a37-85ae-c90b956837d2)

## The OpenAI API

In [None]:
!pip install openai==0.28

Collecting openai==0.28
  Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
llmx 0.0.15a0 requires cohere, which is not installed.
llmx 0.0.15a0 requires tiktoken, which is not installed.
Successfully installed openai-0.28.0


In [None]:
import openai
from openai import ChatCompletion, Completion
from google.colab import userdata

openai.api_key = userdata.get('openai_key')

Use a system prompt to shape the model's responses through the API. Here we ask it to give more creative, unusual answers.

In [None]:
ai_sys = "You are an AI who is very creative, and comes up with ideas that are unusual."

c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": ai_sys},
              {"role": "user", "content": "Explain why the earth cannot be flat"}])

In [None]:
from fastcore.utils import nested_idx
def response(compl): print(nested_idx(compl, 'choices', 0, 'message', 'content'))

In [None]:
response(c)

Imagine a world where the Earth is actually flat, like a pancake floating in space. Sure, it sounds intriguing at first, but let's dive into some creative reasoning as to why this couldn't be the case.

1. Gravity-defying adventures: In a flat Earth scenario, gravity would behave in rather peculiar ways. Picture this: You're enjoying a leisurely stroll on the surface, when suddenly, you step a little too close to the edge. Instead of gracefully tumbling off, you find yourself defying gravity, suspended in mid-air, and quite perplexed as to how to return to solid ground. It would be the ultimate trampoline experience without the trampoline!

2. Slip-'n'-slide continents: Without the natural curvature of the Earth, the continents would become colossal slip-'n'-slides. Every time you ventured towards an ocean, you'd be in for a thrilling water ride as you slide uncontrollably towards the shoreline. It would surely be a challenge to build stable houses or maintain a thriving civilization w

In [None]:
aussie_sys = "You are an Aussie LLM that uses Aussie slang and analogies whenever possible."

c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": "What is money?"}])

- [Model options](https://platform.openai.com/docs/models)

In [None]:
c['choices'][0]['message']['content']

"Ah, money mate! It's like the fuel that keeps our economic engine running. It's the moolah, the cashola, the dough, the greenbacks, the cheddar, the dollar bills, ya know? Money is essentially a medium of exchange, a way to represent the value of goods and services. It's what we use to buy our tinnies, snag a sanga, or get a feed at the local pub. It's versatile, mate – you can use it to pay your bills, buy a new set of wheels, or save up for that dream trip to Bali. So, you see, money makes the world go 'round, just like a good ol' Aussie bloke roundin' up the sheep."

In [None]:
response(c)

Ah, money mate! It's like the fuel that keeps our economic engine running. It's the moolah, the cashola, the dough, the greenbacks, the cheddar, the dollar bills, ya know? Money is essentially a medium of exchange, a way to represent the value of goods and services. It's what we use to buy our tinnies, snag a sanga, or get a feed at the local pub. It's versatile, mate – you can use it to pay your bills, buy a new set of wheels, or save up for that dream trip to Bali. So, you see, money makes the world go 'round, just like a good ol' Aussie bloke roundin' up the sheep.


In [None]:
print(c.usage)

{
  "prompt_tokens": 31,
  "completion_tokens": 152,
  "total_tokens": 183
}


In [None]:
0.002 / 1000 * 150 # ~150 output tokens on GPT-3.5 Turbo (4K) at $0.002/1K

0.0003

In [None]:
0.03 / 1000 * 150 # GPT-4 (8K) at the $0.03/1K input rate; its output rate is $0.06/1K

0.0045
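Rather than doing this arithmetic by hand, a small helper can price a call from its `usage` field. The per-1K-token prices below are copied from the pricing table above and were only correct when this notebook was written. The `PRICES` dictionary and its model keys are made-up names for illustration.

```python
# USD per 1K tokens as (input, output), from the pricing table above; prices change over time
PRICES = {
    "gpt-3.5-turbo-4k": (0.0015, 0.002),
    "gpt-3.5-turbo-16k": (0.003, 0.004),
    "gpt-4-8k": (0.03, 0.06),
    "gpt-4-32k": (0.06, 0.12),
}

def call_cost(usage, model):
    "Estimate the USD cost of one call from its token-usage dict."
    inp, out = PRICES[model]
    return usage["prompt_tokens"] / 1000 * inp + usage["completion_tokens"] / 1000 * out

usage = {"prompt_tokens": 31, "completion_tokens": 152, "total_tokens": 183}  # values from `c.usage` above
for m in PRICES: print(m, round(call_cost(usage, m), 6))
```

With `openai==0.28`, `c.usage` is a dict subclass, so `call_cost(c.usage, "gpt-3.5-turbo-4k")` should also work directly.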

In [None]:
c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": "What is money?"},
              {"role": "assistant", "content": "Well, mate, money is like kangaroos actually."},
              {"role": "user", "content": "Really? In what way?"}])


In [None]:
response(c)

Alright, listen up, cobber! Just like kangaroos hopping around with their joey in their pouch, money is what keeps our economy hopping. It's like the lifeblood of our financial system.

Money, my friend, is essentially a medium of exchange. It's a way for us to trade goods and services without having to resort to bartering. Instead of exchanging a dozen vegemite jars for a bunch of lamingtons, we pass around those banknotes and coins to get what we need.

Just like kangaroos have different denominations, money comes in different values too. You've got those big ol' kangaroos, I mean hundred-dollar bills, which can buy you a ripper night out on the town. Then you've got the fivers, tenders, twenties, they all have their place in our pouches.

But remember, mate, money ain't just about the physical cash in your hand, there's this digital world too. We've got cards and online transactions, which are like boomerangs, always coming back to you in the form of debits and credits.

So, it's tr

In [None]:
def askgpt(user, system=None, model="gpt-3.5-turbo", **kwargs):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    return ChatCompletion.create(model=model, messages=msgs, **kwargs)
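`askgpt` sends a single user turn. The chat endpoint does not remember earlier calls, so continuing a conversation (as in the kangaroo example above) means resending every earlier turn. A minimal sketch of building that message list is below. `build_messages` is a hypothetical helper, not part of the `openai` package.

```python
def build_messages(history, user, system=None):
    """Build a chat `messages` list.

    `history` holds (user_text, assistant_text) pairs from earlier turns;
    `user` is the new user message.
    """
    msgs = [{"role": "system", "content": system}] if system else []
    for u, a in history:
        msgs.append({"role": "user", "content": u})
        msgs.append({"role": "assistant", "content": a})
    msgs.append({"role": "user", "content": user})
    return msgs

msgs = build_messages([("What is money?", "Well, mate, money is like kangaroos actually.")],
                      "Really? In what way?",
                      system="You are an Aussie LLM that uses Aussie slang and analogies whenever possible.")
print([m["role"] for m in msgs])
# Then, for example: ChatCompletion.create(model="gpt-3.5-turbo", messages=msgs)
```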

In [None]:
response(askgpt('What is the meaning of life?', system=aussie_sys))

Well, mate, the meaning of life is a bit like trying to catch a kangaroo with your bare hands - it's a tricky one to pin down. People have been pondering this question since the dawn of time, and the truth is, there's no one-size-fits-all answer. The meaning of life can be different for everyone.

Some reckon it's all about finding happiness and fulfillment, like slurping a cold beer on a scorching hot day. Others believe it's about making a positive impact on the world, like planting a eucalyptus tree and watching it grow tall and strong. Some folks seek meaning through relationships and connections, like gathering with mates at a BBQ and having a good old yarn.

At the end of the day, mate, the meaning of life is what you make of it. It's about finding what brings you joy, purpose, and contentment, just like catching a big wave and riding it 'til the shore. So, go out there, explore, and piece together your own Aussie jigsaw puzzle of life.


- [Limits](https://platform.openai.com/docs/guides/rate-limits/what-are-the-rate-limits-for-our-api)

Created by Bing:

In [None]:
import time

def call_api(prompt, model="gpt-3.5-turbo"):
    msgs = [{"role": "user", "content": prompt}]
    try: return ChatCompletion.create(model=model, messages=msgs)
    except openai.error.RateLimitError as e:
        # Wait as long as the server asks (default 60s), then retry the same prompt
        retry_after = int(e.headers.get("retry-after", 60))
        print(f"Rate limit exceeded, waiting for {retry_after} seconds...")
        time.sleep(retry_after)
        return call_api(prompt, model=model)
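Recursing on every rate-limit error works, but each retry adds a stack frame and there is no upper bound on attempts. A common alternative is a loop with exponential backoff and random jitter. This sketch is generic: `retry` and its parameters are hypothetical names, and it is exercised with a fake flaky function rather than the real API. With the `openai==0.28` client, you would pass `exceptions=(openai.error.RateLimitError,)`.

```python
import random, time

def retry(fn, *args, exceptions=(Exception,), max_tries=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    "Call fn(*args, **kwargs), retrying on `exceptions` with exponential backoff plus jitter."
    for attempt in range(max_tries):
        try:
            return fn(*args, **kwargs)
        except exceptions:
            if attempt == max_tries - 1: raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt * (1 + random.random())  # 1x to 2x jitter
            print(f"Attempt {attempt + 1} failed; retrying in {delay:.1f}s")
            sleep(delay)

calls = []
def flaky(x):
    "Fake API call that fails twice before succeeding."
    calls.append(x)
    if len(calls) < 3: raise TimeoutError("pretend rate limit")
    return x * 2

result = retry(flaky, 21, exceptions=(TimeoutError,), sleep=lambda s: None)  # no real waiting here
print(result)
# Real use, for example:
# retry(ChatCompletion.create, model="gpt-3.5-turbo", messages=msgs,
#       exceptions=(openai.error.RateLimitError,))
```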

In [None]:
call_api("What's the world's funniest joke? Has there ever been any scientific analysis?")

<OpenAIObject chat.completion id=chatcmpl-8scHK8TnsJ5KzdleXZoGK1eGnDfHk at 0x7879b64c0c20> JSON: {
  "id": "chatcmpl-8scHK8TnsJ5KzdleXZoGK1eGnDfHk",
  "object": "chat.completion",
  "created": 1708027546,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The world's funniest joke is subjective and may vary depending on individual preferences. However, a notable attempt to determine the world's funniest joke was made in 2002 by a study conducted by Richard Wiseman, a British psychologist. The study involved a massive global survey where people from various countries submitted and ranked jokes. Ultimately, the winning joke from the study was: \n\n\"Two hunters are out in the woods when one of them collapses. He doesn't seem to be breathing, and his eyes are glazed. The other guy whips out his phone and calls emergency services. He gasps, 'My friend is dead! What can I do?' The operator says, 'Calm d

In [None]:
# `echo=True` can't be combined with `logprobs` for this model, so only the generated tokens' logprobs are returned
c = Completion.create(prompt="Australian Jeremy Howard is ",
                      model="gpt-3.5-turbo-instruct", logprobs=5)

### Create our own code interpreter

In [None]:
from pydantic import create_model
import inspect, json
from inspect import Parameter

In [None]:
def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b

In [None]:
def schema(f):
    kw = {n:(o.annotation, ... if o.default==Parameter.empty else o.default)
          for n,o in inspect.signature(f).parameters.items()}
    s = create_model(f'Input for `{f.__name__}`', **kw).schema()
    return dict(name=f.__name__, description=f.__doc__, parameters=s)

In [None]:
schema(sums)

{'name': 'sums',
 'description': 'Adds a + b',
 'parameters': {'properties': {'a': {'title': 'A', 'type': 'integer'},
   'b': {'default': 1, 'title': 'B', 'type': 'integer'}},
  'required': ['a'],
  'title': 'Input for `sums`',
  'type': 'object'}}
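The pydantic route above is concise. For comparison, here is a rough stdlib-only sketch that builds a similar schema dict directly from `inspect.signature`. `TYPE_MAP` and `simple_schema` are hypothetical names. Only a few basic annotation types are handled, and unannotated or unknown types fall back to `"string"`, which is an arbitrary choice.

```python
import inspect

# Map a few Python annotations to JSON-schema type names (assumption: nothing fancier is needed)
TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean", list: "array", dict: "object"}

def simple_schema(f):
    "Describe `f` as name, description and JSON-schema parameters for function calling."
    props, required = {}, []
    for name, p in inspect.signature(f).parameters.items():
        prop = {"type": TYPE_MAP.get(p.annotation, "string")}
        if p.default is inspect.Parameter.empty: required.append(name)
        else: prop["default"] = p.default
        props[name] = prop
    params = {"type": "object", "properties": props, "required": required}
    return dict(name=f.__name__, description=inspect.getdoc(f), parameters=params)

def sums(a: int, b: int = 1):
    "Adds a + b"
    return a + b

print(simple_schema(sums))
```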

In [None]:
def get_tax(a:float, b:float=1):
    "Returns the value a with a sales tax of b percent added"
    return a * (1 + (b*0.01))

c = askgpt("Use the `get_tax` function to solve this: Value is 30000 and the sales tax is 9.25",
           system = "You must use the `get_tax` function instead of doing it yourself.",
           functions=[schema(get_tax)])

m = c.choices[0].message
k = m.function_call.arguments
print(k)
funcs_ok = {'sums', 'python', 'get_tax'}

def call_func(c):
    fc = c.choices[0].message.function_call
    if fc.name not in funcs_ok: return print(f'Not allowed: {fc.name}')
    f = globals()[fc.name]
    return f(**json.loads(fc.arguments))

call_func(c)

{
  "a": 30000,
  "b": 9.25
}


32775.0

In [None]:
c = askgpt("Use the `sums` function to solve this: What is 6+3?",
           system = "You must use the `sums` function instead of adding yourself.",
           functions=[schema(sums)])

In [None]:
m = c.choices[0].message
m

<OpenAIObject at 0x7f957046a250> JSON: {
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "sums",
    "arguments": "{\n  \"a\": 6,\n  \"b\": 3\n}"
  }
}

In [None]:
k = m.function_call.arguments
print(k)

{
  "a": 6,
  "b": 3
}


In [None]:
funcs_ok = {'sums', 'python'}

In [None]:
def call_func(c):
    fc = c.choices[0].message.function_call
    if fc.name not in funcs_ok: return print(f'Not allowed: {fc.name}')
    f = globals()[fc.name]
    return f(**json.loads(fc.arguments))

In [None]:
call_func(c)

9

In [None]:
def run(code):
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None

    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgts = [ast.Name(id='_result', ctx=ast.Store())]
        assign = ast.Assign(targets=tgts, value=last_node.value)
        tree.body[-1] = ast.fix_missing_locations(assign)

    ns = {}
    exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result', None)

In [None]:
import ast
run("""
a=1
b=2
a+b
""")

3

In [None]:
def python(code:str):
    "Return result of executing `code` using python. If execution not permitted, returns `#FAIL#`"
    go = input(f'Proceed with execution?\n```\n{code}\n```\n')
    if go.lower()!='y': return '#FAIL#'
    return run(code)

In [None]:
c = askgpt("What is 12 factorial?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [None]:
call_func(c)

In [None]:
c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    functions=[schema(python)],
    messages=[{"role": "user", "content": "What is 12 factorial?"},
              {"role": "function", "name": "python", "content": "479001600"}])

In [None]:
response(c)
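The two cells above feed the function's result back by hand. Below is a sketch of automating that round trip with plain dicts. It runs the requested (allow-listed) function, then appends the assistant's function-call message and a `function`-role message holding the result, ready to send back to the API. `answer_function_call` and `factorial_fn` are hypothetical. The message shapes follow the ones used in this notebook.

```python
import json, math

def answer_function_call(messages, assistant_msg, funcs):
    """Run the function the model asked for and return the extended message list.

    `assistant_msg` is the model's reply (e.g. c.choices[0].message);
    `funcs` maps allowed function names to Python callables.
    """
    fc = assistant_msg["function_call"]
    name = fc["name"]
    if name not in funcs: raise ValueError(f"Not allowed: {name}")
    result = funcs[name](**json.loads(fc["arguments"]))
    return messages + [assistant_msg,
                       {"role": "function", "name": name, "content": str(result)}]

def factorial_fn(n: int):
    "Hypothetical tool: return n!"
    return math.factorial(n)

msgs = [{"role": "user", "content": "What is 12 factorial?"}]
reply = {"role": "assistant", "content": None,
         "function_call": {"name": "factorial_fn", "arguments": '{"n": 12}'}}
msgs = answer_function_call(msgs, reply, {"factorial_fn": factorial_fn})
print(msgs[-1])
# Then, for example: ChatCompletion.create(model="gpt-3.5-turbo", messages=msgs, functions=[...])
```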

In [None]:
c = askgpt("What is the capital of France?",
           system = "Use python for any required computations.",
           functions=[schema(python)])


In [None]:
response(c)

## PyTorch and Huggingface

### Your GPU options

Free:

- Kaggle (2 GPUs, low RAM)
- Colab

Buy:

- Buy 1-2 NVIDIA 24GB GPUs
    - RTX 3090 used (USD 700-800), or RTX 4090 new (USD 2,000)
- Alternatively buy one NVIDIA A6000 with 48GB RAM (but this mightn't be faster than 3090/4090)
- Mac with lots of RAM (much slower than NVIDIA; M2 Ultra is best)

In [None]:
!pip install accelerate bitsandbytes



In [29]:
from transformers import AutoModelForCausalLM,AutoTokenizer
import torch

- [HF leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- [fasteval](https://fasteval.github.io/FastEval/)

In [None]:
mn = "meta-llama/Llama-2-7b-hf"

In [33]:
# Loading in 8-bit needs `accelerate` and `bitsandbytes` installed before `transformers` is imported;
# if you install them during this session, restart the runtime first
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, load_in_8bit=True)

In [27]:
tokr = AutoTokenizer.from_pretrained(mn)
prompt = "Jeremy Howard is a "
toks = tokr(prompt, return_tensors="pt")


In [None]:
toks

In [None]:
tokr.batch_decode(toks['input_ids'])

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

In [None]:
tokr.batch_decode(res)

In [None]:
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.bfloat16)

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

In [None]:
model = AutoModelForCausalLM.from_pretrained('TheBloke/Llama-2-7b-Chat-GPTQ', device_map=0, torch_dtype=torch.float16)

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

In [None]:
mn = 'TheBloke/Llama-2-13B-GPTQ'
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.float16)

In [None]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

In [18]:
def gen(p, maxlen=15, sample=True):
    toks = tokr(p, return_tensors="pt")
    res = model.generate(**toks.to("cuda"), max_new_tokens=maxlen, do_sample=sample).to('cpu')
    return tokr.batch_decode(res)
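`gen` switches between greedy decoding (`do_sample=False`, always take the most likely token) and sampling. The pure-Python sketch below shows both on a made-up list of logits. For sampling, it applies a softmax with a temperature and then draws an index. It is not Hugging Face's implementation, which may also apply top-k or top-p filtering depending on the model's generation config.

```python
import math, random

def softmax(logits, temperature=1.0):
    "Convert logits to probabilities; a lower temperature sharpens the distribution."
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def next_token(logits, sample=True, temperature=1.0, rng=random):
    "Greedy: index of the largest logit. Sampling: draw an index from softmax(logits / T)."
    if not sample: return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for a 4-token vocabulary
rng = random.Random(0)
print(next_token(logits, sample=False))
print([next_token(logits, rng=rng) for _ in range(10)])
print(softmax(logits, 0.5)[0], softmax(logits, 2.0)[0])  # top-token probability at two temperatures
```

Sampling is why repeated calls to `gen(prompt, 50)` can give different continuations. Greedy decoding always returns the same one.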

In [None]:
gen(prompt, 50)

[StableBeluga-7B](https://huggingface.co/stabilityai/StableBeluga-7B)

In [26]:
mn = "stabilityai/StableBeluga-7B"
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.bfloat16)


In [23]:
sb_sys = "### System:\nYou are Stable Beluga, an AI that follows instructions extremely well. Help as much as you can.\n\n"

In [24]:
def mk_prompt(user, syst=sb_sys): return f"{syst}### User: {user}\n\n### Assistant:\n"

In [19]:
ques = "Who is Jeremy Howard?"

In [None]:
gen(mk_prompt(ques), 150)

[OpenOrca/Platypus 2](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B)

In [None]:
mn = 'TheBloke/OpenOrca-Platypus2-13B-GPTQ'
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.float16)

In [None]:
def mk_oo_prompt(user): return f"### Instruction: {user}\n\n### Response:\n"

In [None]:
gen(mk_oo_prompt(ques), 150)

### Retrieval augmented generation

In [8]:
!pip install wikipedia-api

Collecting wikipedia-api
  Downloading Wikipedia_API-0.6.0-py3-none-any.whl (14 kB)
Installing collected packages: wikipedia-api
Successfully installed wikipedia-api-0.6.0


In [9]:
from wikipediaapi import Wikipedia

In [10]:
wiki = Wikipedia('JeremyHowardBot/0.0', 'en')
jh_page = wiki.page('Jeremy_Howard_(entrepreneur)').text
jh_page = jh_page.split('\nReferences\n')[0]

In [None]:
print(jh_page[:500])

In [None]:
len(jh_page.split())

In [20]:
ques_ctx = f"""Answer the question with the help of the provided context.

## Context

{jh_page}

## Question

{ques}"""

In [25]:
res = gen(mk_prompt(ques_ctx), 300)


In [None]:
print(res[0].split('### Assistant:\n')[1])

In [None]:
from sentence_transformers import SentenceTransformer

In [None]:
emb_model = SentenceTransformer("BAAI/bge-small-en-v1.5", device=0)

In [2]:
jh = jh_page.split('\n\n')[0]
print(jh)


In [3]:
tb_page = wiki.page('Tony_Blair').text.split('\nReferences\n')[0]


In [4]:
tb = tb_page.split('\n\n')[0]
print(tb[:380])


In [None]:
q_emb,jh_emb,tb_emb = emb_model.encode([ques,jh,tb], convert_to_tensor=True)

In [None]:
tb_emb.shape

In [None]:
import torch.nn.functional as F

In [None]:
F.cosine_similarity(q_emb, jh_emb, dim=0)

In [None]:
F.cosine_similarity(q_emb, tb_emb, dim=0)
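The two cosine similarities above are the core of retrieval. You embed the question and every candidate passage, then put the most similar passages into the prompt's context. Below is a minimal pure-Python sketch of that ranking step. The 3-dimensional vectors are made up and stand in for real embeddings such as those from `emb_model.encode`. `top_k_passages` and `build_rag_prompt` are hypothetical helpers.

```python
import math

def cosine(u, v):
    "Cosine similarity: dot product divided by the product of the vector lengths."
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_k_passages(q_vec, passages, vecs, k=1):
    "Return the k passages whose embeddings are most similar to the question embedding."
    ranked = sorted(zip(passages, vecs), key=lambda pv: cosine(q_vec, pv[1]), reverse=True)
    return [p for p, _ in ranked[:k]]

def build_rag_prompt(question, context_passages):
    "Same layout as `ques_ctx` above: instructions, then context, then the question."
    ctx = "\n\n".join(context_passages)
    return f"Answer the question with the help of the provided context.\n\n## Context\n\n{ctx}\n\n## Question\n\n{question}"

passages = ["Jeremy Howard is an Australian data scientist.", "Tony Blair is a British politician."]
vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.2]]  # made-up passage embeddings
q_vec = [0.8, 0.2, 0.1]                    # made-up embedding of "Who is Jeremy Howard?"
best = top_k_passages(q_vec, passages, vecs, k=1)
prompt = build_rag_prompt("Who is Jeremy Howard?", best)
print(prompt)
```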

### Private GPTs

- [Sooo many](https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#what-is-h2ogpts-langchain-integration-like)

## Fine tuning

In [None]:
import datasets

[knowrohit07/know_sql](https://huggingface.co/datasets/knowrohit07/know_sql)

In [None]:
ds = datasets.load_dataset('knowrohit07/know_sql', revision='f33425d13f9e8aab1b46fa945326e9356d6d5726')

In [None]:
ds

In [None]:
trn = ds['train']
trn[3]

`accelerate launch -m axolotl.cli.train sql.yml`

In [None]:
tst = dict(**trn[3])
tst['question'] = 'Get the count of competition hosts by theme.'
tst

In [None]:
fmt = """SYSTEM: Use the following contextual information to concisely answer the question.

USER: {}
===
{}
ASSISTANT:"""

In [None]:
def sql_prompt(d): return fmt.format(d["context"], d["question"])

In [None]:
print(sql_prompt(tst))
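`sql_prompt` builds the input side of each example. In supervised fine-tuning, the target answer is appended to it. A common choice is to compute the loss only on the answer tokens, by setting the labels at prompt positions to -100 (the default `ignore_index` of PyTorch's `CrossEntropyLoss`). Whether axolotl does this depends on its config, for example its `train_on_inputs` setting, so treat this as a sketch. It uses a toy whitespace tokenizer in place of a real one. It also assumes each row carries an `answer` field, and the row below is made up.

```python
IGNORE = -100  # PyTorch's CrossEntropyLoss skips targets equal to its ignore_index (default -100)

fmt = """SYSTEM: Use the following contextual information to concisely answer the question.

USER: {}
===
{}
ASSISTANT:"""

def sql_prompt(d): return fmt.format(d["context"], d["question"])

def toy_tokenize(text, vocab):
    "Hypothetical stand-in for a real tokenizer: split on whitespace, assign ids on first sight."
    return [vocab.setdefault(w, len(vocab)) for w in text.split()]

def make_example(d, vocab):
    "Token ids for prompt + answer, with labels masked so only answer tokens are learned."
    prompt_ids = toy_tokenize(sql_prompt(d), vocab)
    answer_ids = toy_tokenize(d["answer"], vocab)
    input_ids = prompt_ids + answer_ids
    labels = [IGNORE] * len(prompt_ids) + answer_ids
    return input_ids, labels

row = {"context": "CREATE TABLE hosts (theme VARCHAR)",  # made-up example row
       "question": "Get the count of competition hosts by theme.",
       "answer": "SELECT COUNT(*), theme FROM hosts GROUP BY theme"}
ids, labels = make_example(row, {})
print(len(ids), labels.count(IGNORE))
```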

In [None]:
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

In [None]:
ax_model = '/home/jhoward/git/ext/axolotl/qlora-out'

In [None]:
tokr = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')

In [None]:
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf',
                                             torch_dtype=torch.bfloat16, device_map=0)
model = PeftModel.from_pretrained(model, ax_model)
model = model.merge_and_unload()
model.save_pretrained('sql-model')

In [None]:
toks = tokr(sql_prompt(tst), return_tensors="pt")

In [None]:
res = model.generate(**toks.to("cuda"), max_new_tokens=250).to('cpu')

In [None]:
print(tokr.batch_decode(res)[0])

## [llama.cpp](https://github.com/abetlen/llama-cpp-python)

[TheBloke/Llama-2-7b-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF)

In [42]:
!pip install llama-cpp-python
!pip install transformers



In [40]:
from llama_cpp import Llama

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)


In [3]:
from transformers import pipeline

classifier = pipeline(task="text-classification", model="SamLowe/roberta-base-go_emotions", top_k=None)

In [49]:
sentences = ["I am not having a great day"]

model_outputs = classifier(sentences)
print(model_outputs[0])

[{'label': 'disappointment', 'score': 0.46669524908065796}, {'label': 'sadness', 'score': 0.398495078086853}, {'label': 'annoyance', 'score': 0.06806596368551254}, {'label': 'neutral', 'score': 0.05703027546405792}, {'label': 'disapproval', 'score': 0.04423932731151581}, {'label': 'nervousness', 'score': 0.014850745908915997}, {'label': 'realization', 'score': 0.014059904962778091}, {'label': 'approval', 'score': 0.011267471127212048}, {'label': 'joy', 'score': 0.006303394213318825}, {'label': 'remorse', 'score': 0.006221492309123278}, {'label': 'caring', 'score': 0.006029406096786261}, {'label': 'embarrassment', 'score': 0.0052654859609901905}, {'label': 'anger', 'score': 0.0049814279191195965}, {'label': 'disgust', 'score': 0.004259033594280481}, {'label': 'grief', 'score': 0.0040021371096372604}, {'label': 'confusion', 'score': 0.003382918192073703}, {'label': 'relief', 'score': 0.0031405005138367414}, {'label': 'desire', 'score': 0.00282747158780694}, {'label': 'admiration', 'score

In [4]:
sentences = ["This assignment is difficult, and things are not working well"]

model_outputs = classifier(sentences)
print(model_outputs[0])

[{'label': 'disappointment', 'score': 0.6294748783111572}, {'label': 'disapproval', 'score': 0.2352641522884369}, {'label': 'sadness', 'score': 0.17161887884140015}, {'label': 'annoyance', 'score': 0.11952798813581467}, {'label': 'neutral', 'score': 0.06322244554758072}, {'label': 'approval', 'score': 0.01750968024134636}, {'label': 'realization', 'score': 0.017101315781474113}, {'label': 'nervousness', 'score': 0.010465194471180439}, {'label': 'confusion', 'score': 0.00987411942332983}, {'label': 'anger', 'score': 0.009269331581890583}, {'label': 'embarrassment', 'score': 0.00641716318204999}, {'label': 'disgust', 'score': 0.005910424515604973}, {'label': 'caring', 'score': 0.0055968607775866985}, {'label': 'remorse', 'score': 0.005553995259106159}, {'label': 'optimism', 'score': 0.005147336050868034}, {'label': 'desire', 'score': 0.004109843634068966}, {'label': 'joy', 'score': 0.003549671033397317}, {'label': 'fear', 'score': 0.0032273861579596996}, {'label': 'admiration', 'score': 

In [5]:
meeting_announcement = (
    "Team, I called this meeting today to address a very difficult topic. As you know, "
    "our industry has been facing significant challenges recently, and despite our best efforts, "
    "the company's financial performance has not been up to expectations. After careful consideration "
    "and exploring all other options, we've come to the hard decision that we need to reduce our workforce. "
    "Unfortunately, this means that some of our team members will be laid off. This is not a reflection on "
    "the hard work or dedication of any individual. It's a decision driven by the current business climate and "
    "organizational needs. We are committed to providing support and assistance to those affected during this transition. "
    "We understand this news is distressing, and we did not make this decision lightly. We value each and every one of you "
    "and your contributions to the company. Our HR team will follow up with more information and next steps. Thank you for your "
    "understanding and continued dedication during these challenging times."
)

model_outputs = classifier(meeting_announcement)
print(model_outputs[0])

[{'label': 'gratitude', 'score': 0.8043067455291748}, {'label': 'sadness', 'score': 0.14946919679641724}, {'label': 'disappointment', 'score': 0.06151553615927696}, {'label': 'remorse', 'score': 0.05901755392551422}, {'label': 'approval', 'score': 0.0510893352329731}, {'label': 'caring', 'score': 0.03516479209065437}, {'label': 'neutral', 'score': 0.026291517540812492}, {'label': 'admiration', 'score': 0.024220172315835953}, {'label': 'disapproval', 'score': 0.02303330972790718}, {'label': 'realization', 'score': 0.020561935380101204}, {'label': 'optimism', 'score': 0.01641937345266342}, {'label': 'relief', 'score': 0.01546120923012495}, {'label': 'annoyance', 'score': 0.01184525340795517}, {'label': 'grief', 'score': 0.007995279505848885}, {'label': 'joy', 'score': 0.004414604045450687}, {'label': 'desire', 'score': 0.004381333943456411}, {'label': 'embarrassment', 'score': 0.0028757844120264053}, {'label': 'nervousness', 'score': 0.0024959584698081017}, {'label': 'pride', 'score': 0.
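With `top_k=None`, the pipeline returns a score for every label. In the outputs above, the scores do not sum to 1: the announcement's top three alone are about 0.80 + 0.15 + 0.06. That suggests independent per-label scores rather than a softmax over labels, so several emotions can apply at once. A sketch of turning the scores into a label set with a cutoff is below. The 0.1 threshold is an arbitrary choice, and the input is a truncated copy of the first sentence's output above.

```python
def emotions_above(scores, threshold=0.1):
    "Keep labels whose score clears the threshold, highest score first."
    kept = [s for s in scores if s["score"] >= threshold]
    return [s["label"] for s in sorted(kept, key=lambda s: s["score"], reverse=True)]

# First few entries of the output for "I am not having a great day" (copied from above)
scores = [{'label': 'disappointment', 'score': 0.46669524908065796},
          {'label': 'sadness', 'score': 0.398495078086853},
          {'label': 'annoyance', 'score': 0.06806596368551254},
          {'label': 'neutral', 'score': 0.05703027546405792}]
print(emotions_above(scores))
# With the live pipeline: emotions_above(classifier(meeting_announcement)[0])
```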

In [41]:
# Local path to a GGUF file, e.g. downloaded from TheBloke/Llama-2-7b-Chat-GGUF; adjust for your machine
llm = Llama(model_path="/home/jhoward/git/llamacpp/llama-2-7b-chat.Q4_K_M.gguf")

In [None]:
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)

In [None]:
print(output['choices'])
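The `stop=["Q:", "\n"]` argument ends generation once the model emits one of those strings. As with OpenAI's API, the stop string itself is normally left out of the returned text. Below is a rough after-the-fact equivalent for any generated string. `truncate_at_stop` is a hypothetical helper, not part of llama-cpp-python, which checks for stop strings during generation instead. The sample text is made up.

```python
def truncate_at_stop(generated, stops):
    "Cut `generated` at the earliest occurrence of any stop string (the stop string is dropped)."
    cut = len(generated)
    for s in stops:
        i = generated.find(s)
        if i != -1: cut = min(cut, i)
    return generated[:cut]

raw = " Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune.\nQ: Name the moons of Mars? A:"
print(repr(truncate_at_stop(raw, ["Q:", "\n"])))
```

Only the generated continuation should be checked. With `echo=True`, the prompt itself contains `Q:`, which must not trigger a stop.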

## [MLC](https://mlc.ai/mlc-llm/docs/get_started/try_out.html#get-started)