<a href="https://colab.research.google.com/github/Santiago-R/aupa.ai/blob/main/lm-hackers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# [A hacker's guide to Language Models](https://colab.research.google.com/github/fastai/lm-hackers/blob/main/lm-hackers.ipynb#scrollTo=0b017bfc-5be0-4e41-9fa1-9f685c3b0de5)


## Setup

In [117]:
from google.colab import drive
drive.mount('/content/drive')  # , force_remount=True)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [118]:
# import tokenize
# from io import BytesIO

In [119]:
# from pathlib import Path
# path = Path("/content/drive/MyDrive/LLM")

## The OpenAI API

In [120]:
# Load the OpenAI API key as an environment variable (from api_keys.env on Drive)
!pip install python-dotenv -qq
from dotenv import load_dotenv
load_dotenv(dotenv_path='/content/drive/MyDrive/LLM/api_keys.env')

True

In [121]:
!pip install openai -qq
from openai import ChatCompletion, Completion

In [122]:
aussie_sys = "You are an Aussie LLM that uses Aussie slang and analogies whenever possible."
question = "What is money?"

c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": question}])

- [Model options](https://platform.openai.com/docs/models)

In [123]:
def response(c):
    try:
        return c['choices'][0]['message']['content']
    except KeyError:
        return c['choices'][0]['text']

In [124]:
response(c)

"Righto mate, money is what makes the world go round, like a kangaroo hopping straight ahead. It's a medium of exchange, a tool used to buy and sell goods and services. Think of it as the fuel for your barbie, the cash you need to sling for snaggers and bevvies. Money can come in different forms, be it coins or notes, and nowadays, it can even be digital, like a dingo on a computer screen. Basically, money is the currency that keeps the economy dinkum, helping us to trade and get the things we need to survive and enjoy life."

In [125]:
print(c.usage)

{
  "prompt_tokens": 31,
  "completion_tokens": 125,
  "total_tokens": 156
}


In [126]:
0.002 / 1000 * 150  # GPT 3.5

0.0003

In [127]:
0.03 / 1000 * 150  # GPT 4

0.0045
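
The two cells above price roughly 150 tokens at a flat per-1,000-token rate. A minimal sketch of the same arithmetic as a helper, assuming one flat rate for prompt and completion tokens (actual pricing may differ by token type and change over time). `estimate_cost` is a hypothetical name, not part of the OpenAI library:

```python
def estimate_cost(total_tokens: int, usd_per_1k: float) -> float:
    "Cost in USD of `total_tokens` at a flat rate of `usd_per_1k` per 1,000 tokens."
    return usd_per_1k / 1000 * total_tokens

# Rates copied from the cells above (assumed flat for all token types);
# 156 is the `total_tokens` value reported by `c.usage`
gpt35_cost = estimate_cost(156, 0.002)
gpt4_cost = estimate_cost(156, 0.03)
print(f"GPT 3.5: ${gpt35_cost:.6f}  GPT 4: ${gpt4_cost:.6f}")
```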

In [128]:
c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": "What is money?"},
              {"role": "assistant", "content": "Well, mate, money is like kangaroos actually."},
              {"role": "user", "content": "Really? In what way?"}])

In [129]:
response(c)

"Ah, I'm glad you asked, cobber! Money is like kangaroos because they both have value and jump around from one hand to another. Just as kangaroos can hop from place to place, money can be used for various things and can change hands pretty darn quick. And just like how people chase after kangaroos to get a glimpse, folks often chase after money to meet their needs and wants. So you see, money and kangaroos may be different critters, but they sure have a lot in common!"

In [130]:
def askgpt(user, system=None, model="gpt-3.5-turbo", **kwargs):
    msgs = []
    if system: msgs.append({"role": "system", "content": system})
    msgs.append({"role": "user", "content": user})
    return ChatCompletion.create(model=model, messages=msgs, **kwargs)

In [131]:
response(askgpt('What is the meaning of life?', system=aussie_sys))

"Well, mate, that's a real ripper of a question you've got there! The meaning of life is like chasing the perfect wave—it's different for everyone. Some reckon it's all about finding true love and building a happy family, while others believe it's about seeking knowledge and making a difference in the world. The key is to find what gives your earthly existence some fair dinkum purpose and chase it like a kangaroo on steroids! Just remember, it's all about living the life you love, and loving the life you live. Cheers, mate!"

- [Limits](https://platform.openai.com/docs/guides/rate-limits/what-are-the-rate-limits-for-our-api)

Retry-on-rate-limit helper, created by Bing:

In [132]:
import time
import openai

def call_api(prompt, model="gpt-3.5-turbo"):
    msgs = [{"role": "user", "content": prompt}]
    try: return ChatCompletion.create(model=model, messages=msgs)
    except openai.error.RateLimitError as e:
        # Wait for the server-suggested delay (default 60s), then retry the same prompt
        retry_after = int(e.headers.get("retry-after", 60))
        print(f"Rate limit exceeded, waiting for {retry_after} seconds...")
        time.sleep(retry_after)
        return call_api(prompt, model=model)
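
The helper above retries after a fixed wait and recurses without a limit. A common alternative is exponential backoff with a capped number of attempts. The following is only a sketch of that idea, independent of the OpenAI library: `retry_with_backoff`, `is_retryable`, and the `flaky` stand-in are hypothetical names introduced for illustration.

```python
import time

def retry_with_backoff(fn, is_retryable, max_tries=5, base_delay=1.0, sleep=time.sleep):
    "Call `fn()`; on a retryable exception wait base_delay * 2**attempt seconds and try again."
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception as e:
            # Give up on non-retryable errors or when out of attempts
            if not is_retryable(e) or attempt == max_tries - 1:
                raise
            sleep(base_delay * 2 ** attempt)

# Demo with a stand-in function that fails twice before succeeding
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3: raise TimeoutError("simulated rate limit")
    return "ok"

delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky, lambda e: isinstance(e, TimeoutError), sleep=delays.append)
print(result, delays)
```

With the API, `fn` would wrap the `ChatCompletion.create` call and `is_retryable` would check for `openai.error.RateLimitError`.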

In [133]:
response(call_api("What's the world's funniest joke? Has there ever been any scientific analysis?"))

'The world\'s funniest joke is subjective and can vary from person to person, as humor is influenced by personal preferences and cultural differences. However, there have been scientific studies and attempts to find jokes that appeal to a wide range of people. \n\nOne notable study conducted in 2002 by psychologist Richard Wiseman involved over 40,000 participants from 70 different countries. He created an online experiment called "LaughLab" where people could rate and submit jokes. After analyzing the data, one of the jokes that ranked highly was:\n\n"Why don\'t some couples go to the gym? Because some relationships don\'t work out!"\n\nWhile this joke was popular in the study, it might not be universally considered the funniest to everyone. Humor is subjective and can vary greatly, so what may be hilarious to some can be less amusing to others. Ultimately, the perception of what is funny depends on individual tastes and preferences.'

In [134]:
c = Completion.create(prompt="Australian Jeremy Howard is ",
                      model="gpt-3.5-turbo-instruct", echo=True, logprobs=5)

In [135]:
response(c)

'Australian Jeremy Howard is the former head of Kaggle, and the youngest ever data scientist as a university'
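
The call above passes `logprobs=5`, which asks for the log-probabilities of the most likely tokens at each position. A small sketch of turning such values back into probabilities. The numbers below are made up for illustration, and the response layout (a per-position dict of token to log-probability, expected under `c['choices'][0]['logprobs']['top_logprobs']`) is an assumption to check against the actual response:

```python
import math

# Hypothetical log-probabilities for one token position (illustrative values only)
top_logprobs = {" the": -0.7, " a": -1.2, " an": -2.3}

# exp() turns a log-probability back into a probability
probs = {tok: math.exp(lp) for tok, lp in top_logprobs.items()}
best = max(probs, key=probs.get)

# List candidates from most to least likely
for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.3f}")
```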

## Create our own code interpreter

In [None]:
from pydantic import create_model
import inspect, json
from inspect import Parameter

In [None]:
def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b

In [None]:
def schema(f):
    kw = {n:(o.annotation, ... if o.default==Parameter.empty else o.default)
          for n,o in inspect.signature(f).parameters.items()}
    s = create_model(f'Input for `{f.__name__}`', **kw).schema()
    return dict(name=f.__name__, description=f.__doc__, parameters=s)

In [142]:
schema(sums)

{'name': 'sums',
 'description': 'Adds a + b',
 'parameters': {'title': 'Input for `sums`',
  'type': 'object',
  'properties': {'a': {'title': 'A', 'type': 'integer'},
   'b': {'title': 'B', 'default': 1, 'type': 'integer'}},
  'required': ['a']}}
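
To see what `schema` is doing, here is a rough pydantic-free sketch that builds a similar dictionary from the function signature alone. It is an illustration, not a replacement: `simple_schema` and `JSON_TYPES` are hypothetical names, only a few basic types are mapped, and the `title` fields that pydantic adds are omitted.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema type names
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}

def simple_schema(f):
    "Build an OpenAI-style function schema for `f` from its signature alone."
    props, required = {}, []
    for name, p in inspect.signature(f).parameters.items():
        # Unknown or missing annotations fall back to "string"
        props[name] = {"type": JSON_TYPES.get(p.annotation, "string")}
        if p.default is inspect.Parameter.empty: required.append(name)
        else: props[name]["default"] = p.default
    params = {"type": "object", "properties": props, "required": required}
    return {"name": f.__name__, "description": f.__doc__, "parameters": params}

def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b

print(simple_schema(sums))
```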

In [95]:
c = askgpt("Use the `sum` function to solve this: What is 6+3?",
           system = "You must use the `sum` function instead of adding yourself.",
           functions=[schema(sums)])

In [96]:
c

<OpenAIObject chat.completion id=chatcmpl-84Sg5UyGVQchIOmn3fjpOuKTWxhW8 at 0x7f4702aebf60> JSON: {
  "id": "chatcmpl-84Sg5UyGVQchIOmn3fjpOuKTWxhW8",
  "object": "chat.completion",
  "created": 1696074241,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "sums",
          "arguments": "{\n  \"a\": 6,\n  \"b\": 3\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 83,
    "completion_tokens": 22,
    "total_tokens": 105
  }
}

In [101]:
m = c.choices[0].message
m

<OpenAIObject at 0x7f47029b8270> JSON: {
  "role": "assistant",
  "content": null,
  "function_call": {
    "name": "sums",
    "arguments": "{\n  \"a\": 6,\n  \"b\": 3\n}"
  }
}

In [102]:
k = m.function_call.arguments
print(k)

{
  "a": 6,
  "b": 3
}


In [103]:
funcs_ok = {'sums', 'python'}

In [104]:
def call_func(c):
    fc = c.choices[0].message.function_call
    if fc.name not in funcs_ok: return print(f'Not allowed: {fc.name}')
    f = globals()[fc.name]
    return f(**json.loads(fc.arguments))

In [105]:
call_func(c)

9
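
`call_func` looks the function up in `globals()` after checking the name against `funcs_ok`. One possible variation, sketched below, keeps the allowed callables in an explicit dictionary so the allow-list and the lookup cannot drift apart. `dispatch` and `registry` are hypothetical names, and the name and arguments are passed in directly rather than read from a response object:

```python
import json

def sums(a:int, b:int=1):
    "Adds a + b"
    return a + b

# Explicit allow-list mapping names to callables (replaces the globals() lookup)
registry = {"sums": sums}

def dispatch(name, arguments, registry):
    "Run `registry[name]` with JSON-encoded `arguments`; raise if the name is not allowed."
    if name not in registry: raise ValueError(f"Not allowed: {name}")
    return registry[name](**json.loads(arguments))

# Same function name and arguments string as in the response shown earlier
print(dispatch("sums", '{\n  "a": 6,\n  "b": 3\n}', registry))
```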

In [106]:
import ast

def run(code):
    tree = ast.parse(code)
    last_node = tree.body[-1] if tree.body else None

    # If the last node is an expression, modify the AST to capture the result
    if isinstance(last_node, ast.Expr):
        tgts = [ast.Name(id='_result', ctx=ast.Store())]
        assign = ast.Assign(targets=tgts, value=last_node.value)
        tree.body[-1] = ast.fix_missing_locations(assign)

    ns = {}
    exec(compile(tree, filename='<ast>', mode='exec'), ns)
    return ns.get('_result', None)

In [107]:
run("""
a=1
b=2
a+b
""")

3

In [108]:
def python(code:str):
    "Return result of executing `code` using python. If execution not permitted, returns `#FAIL#`"
    go = input(f'Proceed with execution?\n```\n{code}\n```\n')
    if go.lower()!='y': return '#FAIL#'
    return run(code)

In [159]:
c = askgpt("What is 12 factorial?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [110]:
call_func(c)

Proceed with execution?
```
import math
math.factorial(12)
```
y


479001600

In [111]:
c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    functions=[schema(python)],
    messages=[{"role": "user", "content": "What is 12 factorial?"},
              {"role": "function", "name": "python", "content": "479001600"}])

In [112]:
response(c)

'12 factorial, denoted as 12!, is equal to 479,001,600.'

In [194]:
exec_schema = {'name': 'exec',
 'description': 'Execute the given source Python code',
 'parameters': {'title': 'Input for `exec`',
  'type': 'object',
  'properties': {'source': {'title': 'S', 'type': 'string'}},
  'required': ['source']}}

In [198]:
c = askgpt("What is 12 factorial?",
           system = "Use python for any required computations.",
           functions=[exec_schema])

In [199]:
def code_response(c, repl=True):
    code = json.loads(c['choices'][0]['message']['function_call']['arguments'])['source']
    if repl:
        code_body = '\n'.join(code.split('\n')[:-1])
        code_footer = code.split('\n')[-1]
        exec(code_body)
        print(eval(code_footer))
    else:
        exec(code)
        return

In [200]:
code_response(c)

479001600


In [None]:
# Same steps as `call_func`, run at top level (`return` is only valid inside a function)
fc = c.choices[0].message.function_call
f = globals()[fc.name]
f(**json.loads(fc.arguments))

In [113]:
c = askgpt("What is the capital of France?",
           system = "Use python for any required computations.",
           functions=[schema(python)])

In [114]:
response(c)

'The capital of France is Paris.'

## PyTorch and Huggingface

In [None]:
from wikipediaapi import Wikipedia

In [None]:
wiki = Wikipedia('JeremyHowardBot/0.0', 'en')
jh_page = wiki.page('Jeremy_Howard_(entrepreneur)').text
jh_page = jh_page.split('\nReferences\n')[0]

In [None]:
print(jh_page[:500])

Jeremy Howard (born 13 November 1973) is an Australian data scientist, entrepreneur, and educator.He is the co-founder of fast.ai, where he teaches introductory courses, develops software, and conducts research in the area of deep learning.
Previously he founded and led Fastmail, Optimal Decisions Group, and Enlitic. He was President and Chief Scientist of Kaggle.
Early in the COVID-19 epidemic he was a leading advocate for masking.

Early life
Howard was born in London, United Kingdom, and move


In [None]:
len(jh_page.split())

613

In [None]:
# `ques` (the question string) is assumed to be defined in an earlier cell
ques_ctx = f"""Answer the question with the help of the provided context.

## Context

{jh_page}

## Question

{ques}"""

In [None]:
# `gen` and `mk_prompt` are helpers assumed to be defined in earlier cells
res = gen(mk_prompt(ques_ctx), 300)

In [None]:
print(res[0].split('### Assistant:\n')[1])

 Jeremy Howard is an Australian data scientist, entrepreneur, and educator known for his work in deep learning. He is the co-founder of fast.ai, where he teaches courses, develops software, and conducts research in the field. Before co-founding fast.ai, he was the President and Chief Scientist of Kaggle, the CEO of Fastmail and Optimal Decisions Group, and has a background in management consulting.</s>


In [None]:
from sentence_transformers import SentenceTransformer

In [None]:
emb_model = SentenceTransformer("BAAI/bge-small-en-v1.5", device=0)

In [None]:
jh = jh_page.split('\n\n')[0]
print(jh)

Jeremy Howard (born 13 November 1973) is an Australian data scientist, entrepreneur, and educator.He is the co-founder of fast.ai, where he teaches introductory courses, develops software, and conducts research in the area of deep learning.
Previously he founded and led Fastmail, Optimal Decisions Group, and Enlitic. He was President and Chief Scientist of Kaggle.
Early in the COVID-19 epidemic he was a leading advocate for masking.


In [None]:
tb_page = wiki.page('Tony_Blair').text.split('\nReferences\n')[0]

In [None]:
tb = tb_page.split('\n\n')[0]
print(tb[:380])

Sir Anthony Charles Lynton Blair  (born 6 May 1953) is a British politician who served as Prime Minister of the United Kingdom from 1997 to 2007 and Leader of the Labour Party from 1994 to 2007. He served as Leader of the Opposition from 1994 to 1997 and had various shadow cabinet posts from 1987 to 1994. Blair was Member of Parliament (MP) for Sedgefield from 1983 to 2007. He 


In [None]:
q_emb,jh_emb,tb_emb = emb_model.encode([ques,jh,tb], convert_to_tensor=True)

In [None]:
tb_emb.shape

torch.Size([384])

In [None]:
import torch.nn.functional as F

In [None]:
F.cosine_similarity(q_emb, jh_emb, dim=0)

tensor(0.7991, device='cuda:0')

In [None]:
F.cosine_similarity(q_emb, tb_emb, dim=0)

tensor(0.5315, device='cuda:0')
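
The two scores above are cosine similarities: the dot product of two vectors divided by the product of their lengths, so vectors pointing the same way score near 1 and orthogonal ones score 0. A plain-Python sketch of that formula follows, using toy 3-dimensional vectors in place of the 384-dimensional embeddings. PyTorch's version also guards against zero-length vectors, which this sketch does not.

```python
import math

def cosine_similarity(u, v):
    "Cosine of the angle between vectors `u` and `v`: dot(u, v) / (|u| * |v|)."
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for the question and paragraph embeddings
q = [1.0, 2.0, 0.0]
close = [2.0, 4.0, 0.0]   # same direction as q
far = [0.0, 0.0, 5.0]     # orthogonal to q
print(cosine_similarity(q, close), cosine_similarity(q, far))
```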

### Private GPTs

- [Sooo many](https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#what-is-h2ogpts-langchain-integration-like)

## Fine tuning

In [None]:
import datasets

[knowrohit07/know_sql](https://huggingface.co/datasets/knowrohit07/know_sql)

In [None]:
ds = datasets.load_dataset('knowrohit07/know_sql', revision='f33425d13f9e8aab1b46fa945326e9356d6d5726')

In [None]:
ds

DatasetDict({
    train: Dataset({
        features: ['context', 'answer', 'question'],
        num_rows: 78562
    })
})

In [None]:
trn = ds['train']
trn[3]

{'context': 'CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)',
 'answer': "SELECT Hosts FROM farm_competition WHERE Theme <> 'Aliens'",
 'question': 'What are the hosts of competitions whose theme is not "Aliens"?'}

Training is run with axolotl: `accelerate launch -m axolotl.cli.train sql.yml`

In [None]:
tst = dict(**trn[3])
tst['question'] = 'Get the count of competition hosts by theme.'
tst

{'context': 'CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)',
 'answer': "SELECT Hosts FROM farm_competition WHERE Theme <> 'Aliens'",
 'question': 'Get the count of competition hosts by theme.'}

In [None]:
fmt = """SYSTEM: Use the following contextual information to concisely answer the question.

USER: {}
===
{}
ASSISTANT:"""

In [None]:
def sql_prompt(d): return fmt.format(d["context"], d["question"])

In [None]:
print(sql_prompt(tst))

SYSTEM: Use the following contextual information to concisely answer the question.

USER: CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)
===
Get the count of competition hosts by theme.
ASSISTANT:


In [None]:
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

In [None]:
ax_model = '/home/jhoward/git/ext/axolotl/qlora-out'

In [None]:
tokr = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')

In [None]:
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf',
                                             torch_dtype=torch.bfloat16, device_map=0)
model = PeftModel.from_pretrained(model, ax_model)
model = model.merge_and_unload()
model.save_pretrained('sql-model')

In [None]:
toks = tokr(sql_prompt(tst), return_tensors="pt")

In [None]:
res = model.generate(**toks.to("cuda"), max_new_tokens=250).to('cpu')

In [None]:
print(tokr.batch_decode(res)[0])

<s> SYSTEM: Use the following contextual information to concisely answer the question.

USER: CREATE TABLE farm_competition (Hosts VARCHAR, Theme VARCHAR)
===
Get the count of competition hosts by theme.
ASSISTANT: SELECT COUNT(Hosts), Theme FROM farm_competition GROUP BY Theme</s>


## [llama.cpp](https://github.com/abetlen/llama-cpp-python)

[TheBloke/Llama-2-7b-Chat-GGUF](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF)

In [None]:
!pip install llama_cpp_python -qq
from llama_cpp import Llama

In [None]:
llm = Llama(model_path="content/llamacpp/llama-2-7b-chat.Q4_K_M.gguf")

ValueError: ignored

In [None]:
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)


llama_print_timings:        load time =   192.25 ms
llama_print_timings:      sample time =    14.98 ms /    32 runs   (    0.47 ms per token,  2135.75 tokens per second)
llama_print_timings: prompt eval time =   192.16 ms /    15 tokens (   12.81 ms per token,    78.06 tokens per second)
llama_print_timings:        eval time =   767.74 ms /    31 runs   (   24.77 ms per token,    40.38 tokens per second)
llama_print_timings:       total time =  1032.79 ms


In [None]:
print(output['choices'])

[{'text': 'Q: Name the planets in the solar system? A: 1. Pluto (no longer considered a planet) 2. Mercury 3. Venus 4. Earth 5. Mars 6.', 'index': 0, 'logprobs': None, 'finish_reason': 'length'}]
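
Because the call uses `echo=True`, the returned text starts with the prompt itself. A small sketch of separating the completion from the echoed prompt, using a dictionary shaped like the output printed above. `strip_echo` is a hypothetical helper, not part of llama-cpp-python:

```python
def strip_echo(output, prompt):
    "Return the first choice's text with the echoed `prompt` removed from the front."
    text = output["choices"][0]["text"]
    return text[len(prompt):] if text.startswith(prompt) else text

prompt = "Q: Name the planets in the solar system? A: "

# Dictionary shaped like the `output` printed above
output = {"choices": [{
    "text": prompt + "1. Pluto (no longer considered a planet) 2. Mercury 3. Venus 4. Earth 5. Mars 6.",
    "index": 0, "logprobs": None, "finish_reason": "length"}]}

answer = strip_echo(output, prompt)
print(answer)
```

The `finish_reason` of `'length'` indicates the answer was cut off by `max_tokens=32`, which is why the list stops at "6.".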
