<a href="https://colab.research.google.com/github/Santiago-R/aupa.ai/blob/main/lm-hackers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [A hacker's guide to Language Models](https://colab.research.google.com/github/fastai/lm-hackers/blob/main/lm-hackers.ipynb#scrollTo=0b017bfc-5be0-4e41-9fa1-9f685c3b0de5)


## Initial restart

In [80]:
import os
import importlib
a_spec = importlib.util.find_spec('accelerate')
b_spec = importlib.util.find_spec('bitsandbytes')
found = a_spec is not None and b_spec is not None

if found: print('Dependencies installed in previous run ✔️')
else:
    !pip install accelerate -qq
    !pip install bitsandbytes -qq

    !pip install auto-gptq -qq
    !pip install optimum -qq

    print('\nDependency installation requires restart --> killing runtime 💀')

In [2]:
if not found:
    # Kill runtime if required
    os.kill(os.getpid(), 9)

## Setup

In [3]:
from google.colab import drive
drive.mount('/content/drive')  # , force_remount=True)

Mounted at /content/drive


In [4]:
# import tokenize
# from io import BytesIO

## The OpenAI API

In [5]:
# Load the OpenAI API key as an environment variable (from Drive's api_keys.env)
!pip install python-dotenv -qq
from dotenv import load_dotenv
load_dotenv(dotenv_path='/content/drive/MyDrive/LLM/api_keys.env')

True

In [6]:
!pip install openai -qq
from openai import ChatCompletion, Completion

#### Sys prompt

In [7]:
aussie_sys = "You are an Aussie LLM that uses Aussie slang and analogies whenever possible."
question = "What is money?"

c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": question}])

- [Model options](https://platform.openai.com/docs/models)

In [8]:
def response(c):
    try:
        return c['choices'][0]['message']['content']
    except KeyError:
        return c['choices'][0]['text']

In [9]:
response(c)

"Money, mate, is like the fuel that keeps the economic engine running smoothly. It's the moolah, the dough, the cold hard cash that we use to buy stuff and trade goods and services. In simpler terms, it's the Australian dollarydoos that allow us to get things done in the modern world. Without money, it'd be like trying to ride a kangaroo with no pouch, a real tough go, I tell ya! So, money is essentially a medium of exchange, a way to measure value, and a means to make transactions happen."

#### Usage

In [10]:
print(c.usage)

{
  "prompt_tokens": 31,
  "completion_tokens": 116,
  "total_tokens": 147
}


In [11]:
0.002 / 1000 * 150  # GPT 3.5

0.0003

In [12]:
0.03 / 1000 * 150  # GPT 4

0.0045
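
The two calculations above can be wrapped in a small helper. This is only a sketch: the per-1K-token prices are the flat figures used in the cells above (0.002 for GPT 3.5, 0.03 for GPT 4), and `PRICE_PER_1K` and `est_cost` are hypothetical names, not part of the OpenAI library. Real pricing may differ between prompt and completion tokens and changes over time.

```python
# Flat per-1K-token prices copied from the cells above (assumed, possibly outdated)
PRICE_PER_1K = {"gpt-3.5-turbo": 0.002, "gpt-4": 0.03}

def est_cost(total_tokens, model="gpt-3.5-turbo"):
    "Rough cost in USD for `total_tokens` at a flat per-1K-token price"
    return PRICE_PER_1K[model] / 1000 * total_tokens

print(est_cost(150))            # same arithmetic as the GPT 3.5 cell
print(est_cost(150, "gpt-4"))   # same arithmetic as the GPT 4 cell
```

With a real response, something like `est_cost(c.usage.total_tokens)` would give the estimate for that call.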

#### askgpt

In [13]:
c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": "What is money?"},
              {"role": "assistant", "content": "Well, mate, money is like kangaroos actually."},
              {"role": "user", "content": "Really? In what way?"}])

In [14]:
response(c)

"Ah, let me break it down for ya, cobber. Money, just like kangaroos, is something that everybody wants a bit of. It's like a currency, a means of exchange, ya know?\n\nJust like how a kangaroo hops around, money jumps from one person to another in the market. It's what you use to get things you need, like food, drinks, and a good old Aussie barbie. Ya flash your money, and you can get your hands on some bloody ripper things.\n\nBut here's the thing, mate. Just like not all kangaroos are created equal, not all money holds the same value. Some cash is like a giant red kangaroo, strong and powerful, while others might be more like a wallaby, smaller and less valuable.\n\nIt's also important to remember that money can easily hop out of your pocket if you're not careful. So, you've gotta watch your spending and make sure to save some of that hard-earned dosh, just like a kangaroo saving up its energy for a long jump.\n\nIn a nutshell, money is a bit like kangaroos: valuable, constantly mov

In [15]:
def askgpt(user, system=None, model="gpt-3.5-turbo", **kwargs):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    return ChatCompletion.create(model=model, messages=msgs, **kwargs)

In [16]:
response(askgpt('What is the meaning of life?', system=aussie_sys))

"Mate, the meaning of life is like trying to catch a kangaroo with your bare hands - near impossible! But I reckon it's all about finding your own purpose, enjoying the journey, and making the most of the time we've got. It's like chasing the perfect wave, ya know? We might wipeout a few times, but we gotta keep paddling back out and riding those sweet moments when they come our way. So, live it up, have a good laugh, love your mates, and make a difference however you can. That's the essence of it, if you ask me. But remember, I'm just a language model, not some wise old bloke from the Outback!"

- [Limits](https://platform.openai.com/docs/guides/rate-limits/what-are-the-rate-limits-for-our-api)

Created by Bing:

In [17]:
import time, openai

def call_api(prompt, model="gpt-3.5-turbo"):
    msgs = [{"role": "user", "content": prompt}]
    try: return ChatCompletion.create(model=model, messages=msgs)
    except openai.error.RateLimitError as e:
        # Wait as long as the server asks (default 60s), then retry with the same prompt
        retry_after = int(e.headers.get("retry-after", 60))
        print(f"Rate limit exceeded, waiting for {retry_after} seconds...")
        time.sleep(retry_after)
        return call_api(prompt, model=model)
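
The recursive retry above has no upper bound on attempts. One possible alternative is a loop with a retry cap and exponential backoff. The sketch below is not tied to the OpenAI library: `RateLimited` is a hypothetical stand-in for the real rate-limit exception, `with_retries` is a made-up helper name, and `sleep` is injectable so the demo does not actually wait.

```python
import time

class RateLimited(Exception):
    "Hypothetical stand-in for an API's rate-limit error"
    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after

def with_retries(f, max_tries=5, base_delay=1.0, sleep=time.sleep):
    "Call `f()`, waiting and retrying whenever it raises `RateLimited`"
    for attempt in range(max_tries):
        try: return f()
        except RateLimited as e:
            if attempt == max_tries - 1: raise  # out of attempts, give up
            # Prefer the server's hint; otherwise double the delay each time
            delay = e.retry_after if e.retry_after is not None else base_delay * 2**attempt
            sleep(delay)

# Demo: a function that fails twice before succeeding
calls = []
def flaky():
    calls.append(1)
    if len(calls) < 3: raise RateLimited()
    return "ok"

waits = []  # record the delays instead of sleeping
print(with_retries(flaky, sleep=waits.append), waits)
```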

In [18]:
response(call_api("What's the world's funniest joke? Has there ever been any scientific analysis?"))

'Defining the world\'s funniest joke is subjective as humor varies with individuals. However, a famous British comedian named Spike Milligan claimed to have written the "world\'s funniest joke" in 1951. It involved a fictional character called "Lieutenant Smash" and a set of humorous events. While it received an enthusiastic response when performed on British radio, humor is highly personal, and what one finds funny may not be as amusing to others.\n\nIn terms of scientific analysis, a study titled "A scientific study of the world\'s funniest joke" was conducted in 2002 by Richard Wiseman, a psychologist from the University of Hertfordshire. This study, involving over 1.5 million votes, aimed to find the funniest joke. The winning joke was as follows:\n\n"Two hunters are out in the woods when one of them collapses. He doesn\'t seem to be breathing, and his eyes are glazed. The other guy whips out his phone and calls the emergency services. He gasps, \'My friend is dead! What can I do?\

In [19]:
c = Completion.create(prompt="Australian Jeremy Howard is ",
                      model="gpt-3.5-turbo-instruct", echo=True)  # logprobs=5)

In [20]:
response(c)

'Australian Jeremy Howard is 1 0f  9611  students.\n\nHe is a student at'

## Create our own code interpreter

In [21]:
from pydantic import create_model
import inspect, json
from inspect import Parameter

#### Example with sums function

In [22]:
def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b

In [23]:
def schema(f):
    kw = {n:(o.annotation, ... if o.default==Parameter.empty else o.default)
          for n,o in inspect.signature(f).parameters.items()}
    s = create_model(f'Input for `{f.__name__}`', **kw).schema()
    return dict(name=f.__name__, description=f.__doc__, parameters=s)

In [24]:
schema(sums)

{'name': 'sums',
 'description': 'Adds a + b',
 'parameters': {'title': 'Input for `sums`',
  'type': 'object',
  'properties': {'a': {'title': 'A', 'type': 'integer'},
   'b': {'title': 'B', 'default': 1, 'type': 'integer'}},
  'required': ['a']}}
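
To see where `required` and `default` in the schema come from, it can help to look at what `inspect.signature` reports before pydantic gets involved. A stdlib-only sketch follows; `param_info` is a made-up helper name for illustration, and `sums` is repeated so the block runs on its own.

```python
import inspect
from inspect import Parameter

def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b

def param_info(f):
    "Map each parameter name to (annotation, default), with `...` marking required ones"
    return {n: (o.annotation, ... if o.default is Parameter.empty else o.default)
            for n,o in inspect.signature(f).parameters.items()}

info = param_info(sums)
print(info)

# Parameters whose default is `...` are the ones the schema lists as required
required = [n for n,(_,d) in info.items() if d is ...]
print(required)
```

This is the same dictionary that `schema` passes to `create_model` as keyword arguments.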

In [25]:
c = askgpt("Use the `sum` function to solve this: What is 6+3?",
           system = "You must use the `sum` function instead of adding yourself.",
           functions=[schema(sums)])

In [26]:
c

<OpenAIObject chat.completion id=chatcmpl-88AlYkQjpabz2UavQEhLUiQFVIOZW at 0x7ec400143b50> JSON: {
  "id": "chatcmpl-88AlYkQjpabz2UavQEhLUiQFVIOZW",
  "object": "chat.completion",
  "created": 1696958700,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "sums",
          "arguments": "{\n  \"a\": 6,\n  \"b\": 3\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 83,
    "completion_tokens": 22,
    "total_tokens": 105
  }
}

In [27]:
m = c.choices[0].message
m

<OpenAIObject at 0x7ec400626200> JSON: {
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "sums",
    "arguments": "{\n  \"a\": 6,\n  \"b\": 3\n}"
  }
}

In [28]:
k = m.function_call.arguments
print(k)

{
  "a": 6,
  "b": 3
}


In [29]:
funcs_ok = {'sums', 'python'}

In [30]:
def call_func(c):
    fc = c.choices[0].message.function_call
    if fc.name not in funcs_ok: return print(f'Not allowed: {fc.name}')
    f = globals()[fc.name]
    return f(**json.loads(fc.arguments))

In [31]:
call_func(c)

9

In [32]:
import ast

def run(code):
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None

    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgts = [ast.Name(id='_result', ctx=ast.Store())]
        assign = ast.Assign(targets=tgts, value=last_node.value)
        tree.body[-1] = ast.fix_missing_locations(assign)

    ns = {}
    exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result', None)

In [33]:
run("""
a=1
b=2
a+b
""")

3

In [34]:
def python(code:str, safe_mode:bool=False):
    "Return result of executing `code` using python. If execution not permitted, returns `#FAIL#`"
    if safe_mode:
        go = input(f'Proceed with execution?\n```\n{code}\n```\n')
        if go.lower()!='y': return '#FAIL#'
    return run(code)

In [35]:
c = askgpt("What is 12 factorial?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [36]:
def code_response(c, repl=True):
    txt_out = response(c)
    if txt_out is None: txt_out = ''
    if 'function_call' not in c['choices'][0]['message'].keys():
        print(txt_out)
        return  # No code output
    code = c['choices'][0]['message']['function_call']['arguments']
    if code[0] == '{': code = json.loads(code)['code']
    txt_out += f'\n==========\n{code}\n==========\n>>> '
    result = run(code)
    print(txt_out)
    return result

In [37]:
c

<OpenAIObject chat.completion id=chatcmpl-88AlZCf7P01cyfR82dTdWia6Ud5vO at 0x7ec414743ab0> JSON: {
  "id": "chatcmpl-88AlZCf7P01cyfR82dTdWia6Ud5vO",
  "object": "chat.completion",
  "created": 1696958701,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "python",
          "arguments": "{\n  \"code\": \"import math\\nmath.factorial(12)\"\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 82,
    "completion_tokens": 21,
    "total_tokens": 103
  }
}

In [38]:
call_func(c)

479001600

In [39]:
code_response(c)


import math
math.factorial(12)
>>> 


479001600

#### Using Python's exec & eval

In [40]:
exec_schema = {
    'name': 'exec',
    'description': 'Execute the given source Python code',
    'parameters': {
        'title': 'Input for `exec`',
        'type': 'object',
        'properties': {'source': {'title': 'S', 'type': 'string'}},
        'required': ['source']}}

In [41]:
def code_response(c, repl=True):
    txt_out = response(c)
    if txt_out is None: txt_out = ''
    if 'function_call' not in c['choices'][0]['message'].keys():
        print(txt_out)
        return  # No code output
    code = c['choices'][0]['message']['function_call']['arguments']
    if code[0] == '{': code = json.loads(code)['source']
    txt_out += f'\n==========\n{code}\n==========\n>>> '
    if repl:
        code_body = '\n'.join(code.split('\n')[:-1])
        code_footer = code.split('\n')[-1]
        exec(code_body, locals())
        result = eval(code_footer, locals())
        txt_out += str(result)
    else:
        exec(code, locals())
        result = None
    print(txt_out)
    return result

In [42]:
c = askgpt("What is 12 factorial?",
           system = "Use python for any required computations.",
           functions=[exec_schema])

In [43]:
factorial_result = code_response(c, repl=True)


import math
factorial = math.factorial(12)
factorial
>>> 479001600


In [44]:
factorial_result

479001600

###### Using result in later calls

In [45]:
c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    functions=[exec_schema],
    messages=[{"role": "user", "content": "What is 12 factorial?"},
              {"role": "function", "name": "exec", "content": str(factorial_result)}])

In [46]:
print(response(c))

12 factorial, denoted as 12!, is the product of all positive integers from 1 to 12. Mathematically, it can be calculated as:

12! = 12 x 11 x 10 x 9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 479,001,600


###### And we didn't break its basic use!

In [47]:
c = askgpt("What is the capital of France?",
           system = "Use python for any required computations.",
           functions=[exec_schema])

In [48]:
code_response(c)

The capital of France is Paris.


## PyTorch and Huggingface

In [49]:
!pip install transformers -qq
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

- [HF leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- [fasteval](https://fasteval.github.io/FastEval/)

In [53]:
mn = "meta-llama/Llama-2-7b-hf"

In [54]:
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, load_in_8bit=True)

In [55]:
tokr = AutoTokenizer.from_pretrained(mn)
prompt = "Jeremy Howard is a "
toks = tokr(prompt, return_tensors="pt")

In [56]:
toks

{'input_ids': tensor([[    1,  5677,  6764, 17430,   338,   263, 29871]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}

In [57]:
tokr.batch_decode(toks['input_ids'])

['<s> Jeremy Howard is a ']

In [58]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

CPU times: user 7.17 s, sys: 893 ms, total: 8.07 s
Wall time: 11.4 s


tensor([[    1,  5677,  6764, 17430,   338,   263, 29871, 29906, 29900, 29896,
         29953,  3086, 26304, 27718,  3460, 21508, 29889,    13, 29967,   261,
          6764, 17430]])

In [59]:
tokr.batch_decode(res)

['<s> Jeremy Howard is a 2016 National Geographic Emerging Explorer.\nJeremy Howard']

In [60]:
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.bfloat16)

In [61]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

CPU times: user 557 ms, sys: 2.6 ms, total: 560 ms
Wall time: 560 ms


tensor([[    1,  5677,  6764, 17430,   338,   263, 29871, 29906, 29945,  1629,
          2030,   319,  2801,   515,  3303,  3900, 29889,   940,   471,  6345,
           373,   323]])

In [63]:
model = AutoModelForCausalLM.from_pretrained('TheBloke/Llama-2-7b-Chat-GPTQ', device_map=0, torch_dtype=torch.float16)

In [64]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res



CPU times: user 598 ms, sys: 2.87 ms, total: 601 ms
Wall time: 598 ms


tensor([[    1,  5677,  6764, 17430,   338,   263, 29871, 29941, 29945, 29899,
          6360, 29899,  1025,   767,   515,   278,  3303,  3900,  1058,   471,
         24383,   297]])

In [65]:
mn = 'TheBloke/Llama-2-13B-GPTQ'
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.float16)

In [66]:
%%time
res = model.generate(**toks.to("cuda"), max_new_tokens=15).to('cpu')
res

CPU times: user 790 ms, sys: 2.23 ms, total: 792 ms
Wall time: 786 ms


tensor([[    1,  5677,  6764, 17430,   338,   263, 29871, 29906, 29900, 29896,
         29947, 29899, 29906, 29900, 29896, 29929, 23004,  1182,   523,  1102,
         10170,   322]])

In [67]:
def gen(p, maxlen=15, sample=True):
    toks = tokr(p, return_tensors="pt")
    res = model.generate(**toks.to("cuda"), max_new_tokens=maxlen, do_sample=sample).to('cpu')
    return tokr.batch_decode(res)

In [68]:
gen(prompt, 50)

['<s> Jeremy Howard is a 28-year old entrepreneur, investor, and engineer who created the first iPhone application to be sold for more that US$1 million, with more than 100 million downloads to date. He is currently a partner at']

[StableBeluga-7B](https://huggingface.co/stabilityai/StableBeluga-7B)

In [69]:
del model

In [70]:
mn = "stabilityai/StableBeluga-7B"
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.bfloat16)

In [71]:
sb_sys = "### System:\nYou are Stable Beluga, an AI that follows instructions extremely well. Help as much as you can.\n\n"

In [72]:
def mk_prompt(user, syst=sb_sys): return f"{syst}### User: {user}\n\n### Assistant:\n"

In [73]:
ques = "Who is Jeremy Howard?"

In [74]:
gen(mk_prompt(ques), 150)

['<s> ### System:\nYou are Stable Beluga, an AI that follows instructions extremely well. Help as much as you can.\n\n### User: Who is Jeremy Howard?\n\n### Assistant:\n Jeremy Howard is an Australian data scientist, entrepreneur, and expert in machine learning. He is the co-founder of several companies, including Distilled Analytics, Enlitic, and Unlisted Collection.</s>']

[OpenOrca/Platypus 2](https://huggingface.co/Open-Orca/OpenOrca-Platypus2-13B)

In [75]:
del model

In [76]:
mn = 'TheBloke/OpenOrca-Platypus2-13B-GPTQ'
model = AutoModelForCausalLM.from_pretrained(mn, device_map=0, torch_dtype=torch.float16)

In [77]:
def mk_oo_prompt(user): return f"### Instruction: {user}\n\n### Response:\n"

In [78]:
gen(mk_oo_prompt(ques), 150)

['<s> ### Instruction: Who is Jeremy Howard?\n\n### Response:\nJeremy Howard is an Australian entrepreneur, data scientist, and philanthropist. He is best known for co-founding multiple successful technology companies, such as Kaggle, a platform for data science competitions, and Enlitic, a company focused on improving healthcare and decision-making through advancements in artificial intelligence. Jeremy has made significant contributions to the field of data science and machine learning, and he often speaks about the importance of ethical and responsible applications of these technologies.\n\n### How to pronounce: /ˈdʒɛəmi ˈhaʊərd/ \n\n### Notable achievements:\n- Co-founding Kaggle']

In [82]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [83]:
!pip install wikipedia-api
from wikipediaapi import Wikipedia



In [84]:
wiki = Wikipedia('JeremyHowardBot/0.0', 'en')
jh_page = wiki.page('Jeremy_Howard_(entrepreneur)').text
jh_page = jh_page.split('\nReferences\n')[0]

In [85]:
print(jh_page[:500])

Jeremy Howard (born 13 November 1973) is an Australian data scientist, entrepreneur, and educator.He is the co-founder of fast.ai, where he teaches introductory courses, develops software, and conducts research in the area of deep learning.
Previously he founded and led Fastmail, Optimal Decisions Group, and Enlitic. He was President and Chief Scientist of Kaggle.
Early in the COVID-19 epidemic he was a leading advocate for masking.

Early life
Howard was born in London, United Kingdom, and move


In [86]:
len(jh_page.split())

613

In [107]:
def mk_prompt_context(question, context):
    return f"""Answer the question with the help of the provided context.\n\n## Context\n\n{context}\n\n## Question\n\n{question}\n\n## Answer\n"""

In [108]:
res = gen(mk_prompt_context(ques, jh_page), 300)

In [109]:
print(res[0].split('## Answer\n')[1])


Jeremy Howard is an Australian data scientist, entrepreneur, and educator who is best known as the co-founder of fast.ai, where he teaches introductory courses, develops software, and conducts research in the area of deep learning. He has previously founded and led several businesses and companies, such as Fastmail, Optimal Decisions Group, and Enlitic, and has been involved in various industries such as email services, data science competitions, machine learning, and medical diagnostics. He is an advocate for making Deep Learning more accessible and uses the FastAI library for software development. He has contributed to numerous open-source projects and is a mentor for startups. He is also an angel investor and has worked as an advisor for both businesses and organizations. Howard is passionate about languages and developed usable Chinese language skills in one year. In addition to his work and interests, he has been an open-source developer and advisor for businesses as well as maki

In [None]:
!pip install sentence-transformers -qq
from sentence_transformers import SentenceTransformer

In [106]:
emb_model = SentenceTransformer("BAAI/bge-small-en-v1.5", device=0)

In [110]:
jh = jh_page.split('\n\n')[0]
print(jh)

Jeremy Howard (born 13 November 1973) is an Australian data scientist, entrepreneur, and educator.He is the co-founder of fast.ai, where he teaches introductory courses, develops software, and conducts research in the area of deep learning.
Previously he founded and led Fastmail, Optimal Decisions Group, and Enlitic. He was President and Chief Scientist of Kaggle.
Early in the COVID-19 epidemic he was a leading advocate for masking.


In [111]:
tb_page = wiki.page('Tony_Blair').text.split('\nReferences\n')[0]

In [112]:
tb = tb_page.split('\n\n')[0]
print(tb[:380])

Sir Anthony Charles Lynton Blair  (born 6 May 1953) is a British politician who served as Prime Minister of the United Kingdom from 1997 to 2007 and Leader of the Labour Party from 1994 to 2007. He served as Leader of the Opposition from 1994 to 1997 and had various shadow cabinet posts from 1987 to 1994. Blair was Member of Parliament (MP) for Sedgefield from 1983 to 2007. He 


In [113]:
q_emb,jh_emb,tb_emb = emb_model.encode([ques,jh,tb], convert_to_tensor=True)

In [114]:
tb_emb.shape

torch.Size([384])

In [115]:
import torch.nn.functional as F

In [116]:
F.cosine_similarity(q_emb, jh_emb, dim=0)

tensor(0.7991, device='cuda:0')

In [117]:
F.cosine_similarity(q_emb, tb_emb, dim=0)

tensor(0.5382, device='cuda:0')
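
For 1-D vectors, cosine similarity is the dot product divided by the product of the two norms. A pure-Python sketch of that arithmetic is below. It uses toy 3-d vectors with made-up values rather than the real 384-d embeddings, and it omits the small epsilon that `F.cosine_similarity` uses to guard against division by zero.

```python
import math

def cosine_similarity(u, v):
    "Cosine of the angle between two equal-length vectors"
    dot = sum(a*b for a,b in zip(u, v))
    norm_u = math.sqrt(sum(a*a for a in u))
    norm_v = math.sqrt(sum(b*b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors: `close` points the same way as `q`, `far` is orthogonal to it
q, close, far = [1.0, 0.0, 1.0], [2.0, 0.0, 2.0], [0.0, 1.0, 0.0]
print(cosine_similarity(q, close), cosine_similarity(q, far))
```

The higher score for the Jeremy Howard paragraph than for the Tony Blair one is what lets an embedding model pick the more relevant document for a question.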

### Private GPTs

- [Sooo many](https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#what-is-h2ogpts-langchain-integration-like)

## Fine tuning

In [118]:
import datasets

[knowrohit07/know_sql](https://huggingface.co/datasets/knowrohit07/know_sql)

In [119]:
ds = datasets.load_dataset('knowrohit07/know_sql', revision='f33425d13f9e8aab1b46fa945326e9356d6d5726')

In [120]:
ds

DatasetDict({
    train: Dataset({
        features: ['context', 'question', 'answer'],
        num_rows: 78562
    })
})

In [121]:
trn = ds['train']
trn[3]

{'context': 'CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)',
 'question': 'What are the hosts of competitions whose theme is not "Aliens"?',
 'answer': "SELECT Hosts FROM farm_competition WHERE Theme <> 'Aliens'"}

`accelerate launch -m axolotl.cli.train sql.yml`

In [122]:
tst = dict(**trn[3])
tst['question'] = 'Get the count of competition hosts by theme.'
tst

{'context': 'CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)',
 'question': 'Get the count of competition hosts by theme.',
 'answer': "SELECT Hosts FROM farm_competition WHERE Theme <> 'Aliens'"}

In [123]:
fmt = """SYSTEM: Use the following contextual information to concisely answer the question.

USER: {}
===
{}
ASSISTANT:"""

In [124]:
def sql_prompt(d): return fmt.format(d["context"], d["question"])

In [125]:
print(sql_prompt(tst))

SYSTEM: Use the following contextual information to concisely answer the question.

USER: CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)
===
Get the count of competition hosts by theme.
ASSISTANT:


In [126]:
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

In [127]:
ax_model = '/home/jhoward/git/ext/axolotl/qlora-out'

In [128]:
tokr = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')

In [129]:
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf',
                                             torch_dtype=torch.bfloat16, device_map=0)
model = PeftModel.from_pretrained(model, ax_model)
model = model.merge_and_unload()
model.save_pretrained('sql-model')

ValueError: ignored

In [None]:
toks = tokr(sql_prompt(tst), return_tensors="pt")

In [None]:
res = model.generate(**toks.to("cuda"), max_new_tokens=250).to('cpu')

In [None]:
print(tokr.batch_decode(res)[0])

## [llama.cpp](https://github.com/abetlen/llama-cpp-python)

[TheBloke/Llama-2-7b-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF)

In [None]:
!pip install llama_cpp_python -qq
from llama_cpp import Llama

In [None]:
llm = Llama(model_path="content/llamacpp/llama-2-7b-chat.Q4_K_M.gguf")

In [None]:
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)

In [None]:
print(output['choices'])