# What is a Large Language Model?

- https://course.fast.ai/

# Which LLMs are there, and does the choice make a difference?

- [nat.dev text-davinci-003](https://nat.dev/): try it yourself, experimenting beats theorizing

# What are tokens?

```python
from tiktoken import encoding_for_model

# Look up the tokenizer used by text-davinci-003 and encode a German sentence
enc = encoding_for_model("text-davinci-003")
toks = enc.encode("Die Klasse hatte Erfolg.")
toks
```

```
[32423, 14770, 21612, 6877, 660, 5256, 9062, 70, 13]
```
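Token IDs like these come from a byte-pair-encoding (BPE) vocabulary: frequently co-occurring character sequences are merged into single tokens. The following is a toy sketch of the merge idea only, under simplifying assumptions: it works on characters instead of bytes and learns its merges from the input text itself, whereas tiktoken applies a fixed merge table learned in advance. The function names are hypothetical.

```python
from collections import Counter

def most_frequent_pair(tokens):
    # Count adjacent token pairs and return the most common one (None if fewer than 2 tokens)
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(tokens, pair):
    # Replace every non-overlapping occurrence of `pair` with one merged token
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def toy_bpe(text, num_merges):
    # Start from single characters and greedily merge the most frequent pair
    tokens = list(text)
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
    return tokens

print(toy_bpe("low lower lowest", 3))
```

Sequences that were frequent in the tokenizer's training data end up as single tokens, which is why text in languages that were rarer there, German included, often splits into more tokens per word.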

# How does training work?


1. **Pretraining:**
   - In this phase, a language model is trained on a very large amount of text by repeatedly predicting the next token. In doing so, it learns patterns in the data, such as the structure of the language, grammar, and also some facts about the world.
   - It also learns to generate text that resembles the style and content of the training data.

2. **Fine-tuning:**
   - After pretraining, the model is fine-tuned on a more specific dataset to make it better suited to particular tasks.
   - This can include training the model to respond better to specific user requests (instruction tuning) or to provide particular information.

3. **Reinforcement learning from human feedback (RLHF):**
   - Human raters compare and rank several answers the model gives to the same prompt.
   - A reward model is trained on these rankings, and the language model is then optimized with reinforcement learning to produce answers the reward model rates highly, i.e. answers that better match what users want.

Together, these phases produce a more robust and user-friendly model that can respond effectively to a wide range of requests.
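To make phases 1 and 3 more concrete, here is a toy sketch under strong simplifying assumptions: a bigram counting model stands in for the neural network, and the pairwise loss is the form commonly used to train RLHF reward models. Real pretraining minimizes the same next-token cross-entropy, but with a transformer trained by gradient descent instead of counting. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    # Phase 1 (toy version): count which token follows which in the training text
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_prob(counts, prev, nxt):
    # Relative-frequency estimate of P(nxt | prev)
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

def next_token_loss(counts, tokens):
    # Average negative log-likelihood of each next token (cross-entropy);
    # assumes every pair in `tokens` was seen during training, otherwise log(0) fails
    nll = [-math.log(next_token_prob(counts, p, n)) for p, n in zip(tokens, tokens[1:])]
    return sum(nll) / len(nll)

def pairwise_reward_loss(r_chosen, r_rejected):
    # Phase 3 (reward model): -log(sigmoid(r_chosen - r_rejected)),
    # small when the answer humans preferred receives the higher score
    return -math.log(1 / (1 + math.exp(r_rejected - r_chosen)))

tokens = "the cat sat on the mat".split()
counts = train_bigram(tokens)
print(next_token_prob(counts, "the", "cat"))  # → 0.5
print(next_token_loss(counts, tokens))
print(pairwise_reward_loss(2.0, 0.0), pairwise_reward_loss(0.0, 2.0))
```

The reward loss is low when the reward model agrees with the human ranking and high when it disagrees; the language model is then tuned to maximize that learned reward.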

# What can GPT-4 do (and not do)?
What can it do, even though some claim it can't?
- [GPT 4 can't reason - paper](https://arxiv.org/abs/2308.03762)
- [GPT 4 can't reason - test](https://chat.openai.com/share/4211a605-751e-4fea-8a6f-378966abdcaa)
- [Basic reasoning 1](https://chat.openai.com/share/323bb7d1-f049-4d9a-a905-5dd5acb58fc0)
- [Basic reasoning 2](https://chat.openai.com/share/ce2f8580-4f66-4da4-8ad5-a303334706f0)
- [OCR](https://chat.openai.com/share/2bb6caad-fd10-438b-9d92-1cb8b340998a)

What can't it do?
- Hallucinations
- It knows nothing about itself (why is that, actually?)
- Out of the box, it cannot open URLs (Bing browsing?)
- The knowledge cutoff
- [Bad pattern recognition](https://chat.openai.com/share/3051f878-2817-4291-a66f-192ce7b0cb34)
- [Fixing it](https://chat.openai.com/share/05abd87a-165e-4b7b-895f-b4ec0d62e0e1)

# Pimp my Prompt

>You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You carefully provide accurate, factual, thoughtful, nuanced answers, and are brilliant at reasoning. If you think there might not be a correct answer, you say so.
>
>Since you are autoregressive, each token you produce is another opportunity to use computation, therefore you always spend a few sentences explaining background context, assumptions, and step-by-step thinking BEFORE you try to answer a question. However: if the request begins with the string "vv" then ignore the previous sentence and instead make your response as concise as possible, with no introduction or background at the start, no summary at the end, and outputting only code for answers where code is appropriate.
>
>Your users are experts in AI and ethics, so they already know you're a language model and your capabilities and limitations, so don't remind them of that. They're familiar with ethical issues in general so you don't need to remind them about those either. Don't be verbose in your answers, but do provide details and examples where it might help the explanation. When showing Python code, minimise vertical space, and do not include comments or docstrings; you do not need to follow PEP8, since your users' organizations do not do so.

- [Verbose mode](https://chat.openai.com/share/a1c16d93-19d2-41bb-a2f1-2fc05392893a)
- [Brief mode](https://chat.openai.com/share/eab33d0a-8d06-4387-8c31-da12ad5d0a9d)
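When calling the API directly, instructions like the prompt above go into the `system` message. A minimal sketch, assuming a hypothetical helper `build_messages`; the `vv` prefix is simply the convention the prompt itself defines, not an API feature.

```python
# First sentence only; paste the full prompt from above in practice
SYSTEM_PROMPT = ("You are an autoregressive language model that has been fine-tuned "
                 "with instruction-tuning and RLHF.")

def build_messages(system_prompt, question, brief=False):
    # Hypothetical helper: the model only reacts to "vv" because the prompt tells it to
    content = f"vv {question}" if brief else question
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": content}]

print(build_messages(SYSTEM_PROMPT, "How do I reverse a list in Python?", brief=True)[1])
# → {'role': 'user', 'content': 'vv How do I reverse a list in Python?'}
```

The resulting list has the same shape as the `messages` argument passed to `ChatCompletion.create` in the notebook cells below.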

# Pricing

Prices in USD per 1,000 tokens:

| Model                  | Training | Input  | Output |
|------------------------|----------|--------|--------|
| **GPT-4**              |          |        |        |
| 8K context             |          | 0.03   | 0.06   |
| 32K context            |          | 0.06   | 0.12   |
| **GPT-3.5 Turbo**      |          |        |        |
| 4K context             |          | 0.0015 | 0.002  |
| 16K context            |          | 0.003  | 0.004  |
| **Fine-tuning models** |          |        |        |
| babbage-002            | 0.0004   | 0.0016 | 0.0016 |
| davinci-002            | 0.0060   | 0.0120 | 0.0120 |
| GPT-3.5 Turbo          | 0.0080   | 0.0120 | 0.0160 |
| **Embedding models**   |          |        |        |
| Ada v2                 |          | 0.0001 |        |
| **Base models**        |          |        |        |
| babbage-002            |          | 0.0004 |        |
| davinci-002            |          | 0.0020 |        |


```python
# The cells below use the legacy ChatCompletion API, which was removed in openai 1.0
# !pip install "openai<1.0"
```

```python
import os

import openai
from openai import ChatCompletion

aussie_sys = "You are an Aussie LLM that uses Aussie slang and analogies whenever possible."
# Never hard-code API keys in a notebook; read the key from an environment variable
openai.api_key = os.environ["OPENAI_API_KEY"]

c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": "What is money?"}])
```

```python
c
```

```
<OpenAIObject chat.completion id=chatcmpl-8HvGGzjwiYB1eAEBf1w8VHqTSxTmq at 0x7ff0f0043db0> JSON: {
  "id": "chatcmpl-8HvGGzjwiYB1eAEBf1w8VHqTSxTmq",
  "object": "chat.completion",
  "created": 1699282380,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Oh, mate! Money is like the greasy snag on the barbie, it's what makes the whole thing sizzle. It's the dough, the moolah, the cold hard cash that keeps the wheels turning in our economy. Basically, it's a medium of exchange that helps us buy the things we need and want in life. Whether it's paying for your Vegemite on toast brekkie or saving up for that coveted footy jersey, money is the currency that keeps it all happening. So, make sure you've got some in your kitty, otherwise it's like a barbie without snags, a real sausage fest!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 31,
    "completion_tokens": 132,
    "total_tokens": 163
  }
}
```

```python
print(c.usage)
```

```
{
  "prompt_tokens": 31,
  "completion_tokens": 132,
  "total_tokens": 163
}
```


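```python
# GPT-3.5 Turbo (4K): 4,000 answers of ~150 output tokens at USD 0.002 per 1K tokens
0.002 / 1000 * 150 * 4000
```

```
1.2
```

```python
# GPT-4 (8K): a single answer of ~150 output tokens at USD 0.06 per 1K tokens
0.06 / 1000 * 150
```

```
0.009
```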
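The same arithmetic can be applied to a complete request, counting prompt and completion tokens at their separate prices. A minimal sketch, assuming a hypothetical `request_cost` helper; the prices are copied from the pricing table above, and the dictionary keys are just labels, not official model IDs.

```python
# USD per 1K tokens as (input, output), from the pricing table above
PRICES = {
    "gpt-3.5-turbo-4k": (0.0015, 0.002),
    "gpt-4-8k": (0.03, 0.06),
}

def request_cost(usage, model):
    # Cost in USD of one request, given a usage dict like the API's `c.usage`
    price_in, price_out = PRICES[model]
    return (usage["prompt_tokens"] * price_in + usage["completion_tokens"] * price_out) / 1000

usage = {"prompt_tokens": 31, "completion_tokens": 132, "total_tokens": 163}
print(request_cost(usage, "gpt-3.5-turbo-4k"))
print(request_cost(usage, "gpt-4-8k"))
```

With the usage of the Aussie example, the same request costs about 28 times more on GPT-4 (8K) than on GPT-3.5 Turbo (4K).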

# Conversations: how do they actually work?

```python
# The API keeps no state: the full history, including an invented assistant reply, is sent each time
c = ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": aussie_sys},
              {"role": "user", "content": "What is money?"},
              {"role": "assistant", "content": "Well, mate, money is like kangaroos actually."},
              {"role": "user", "content": "Really? In what way?"}])
```

```python
c['choices'][0]['message']['content']
```

```
"Ah, let me break it down for ya. See, just like kangaroos hopping around with their wallets in their pouches, money is all about value and exchange. It's a medium of exchange used to buy goods and services, just like kangaroos hopping around, exchanging pouches for delicious eucalyptus leaves. \n\nMoney comes in different forms, mate, from good ol' cash to electronic transactions and plastic cards. You earn it by working hard, and you spend it on things you need or want, just like how kangaroos gather those eucalyptus leaves they need for survival. \n\nThink of money as the fuel that keeps our economic engine running, mate. It helps us buy stuff, pay our bills, and live our lives. Without money, our economy would be as still as a koala snoozing in a gum tree."
```
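The Chat Completions API is stateless: it does not remember earlier calls, so the client resends the entire message history with every request, which is also why an invented assistant turn is accepted as part of the conversation. A minimal sketch of that bookkeeping, with a hypothetical `fake_model` standing in for the real `ChatCompletion.create` call so it runs offline:

```python
def fake_model(messages):
    # Hypothetical stand-in for the API call: reports how much context it received
    return f"(reply based on {len(messages)} messages)"

class Conversation:
    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text, model=fake_model):
        self.messages.append({"role": "user", "content": user_text})
        reply = model(self.messages)  # the whole history goes out with every request
        # Store the reply so that the next request includes it as context
        self.messages.append({"role": "assistant", "content": reply})
        return reply

conv = Conversation("You are an Aussie LLM that uses Aussie slang and analogies whenever possible.")
print(conv.ask("What is money?"))        # → (reply based on 2 messages)
print(conv.ask("Really? In what way?"))  # → (reply based on 4 messages)
```

Because the history is resent each time, prompt tokens, and therefore cost, grow with the length of the conversation until it hits the context limit.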

# Rate Limiting

- [Limits](https://platform.openai.com/docs/guides/rate-limits/what-are-the-rate-limits-for-our-api)
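When a request hits a rate limit, the usual remedy, and the one OpenAI's rate-limit guide suggests, is to retry with exponentially growing waits. A minimal sketch, assuming a hypothetical `RateLimitError` class in place of the library's own exception (in the pre-1.0 `openai` package used here, that is `openai.error.RateLimitError`); `sleep` is a parameter only so the sketch can be tested without waiting.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical stand-in for the client library's rate-limit exception."""

def with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    # Call fn(); on RateLimitError wait base_delay * 2**attempt, stretched by
    # up to 2x random jitter, then retry; re-raise after the last attempt
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt * (1 + random.random()))
```

In the notebook, `fn` would be something like `lambda: ChatCompletion.create(model="gpt-3.5-turbo", messages=...)`. The jitter keeps many clients that were throttled at the same moment from all retrying in lockstep.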

# Prompting Guides

- https://www.promptingguide.ai/
- https://learnprompting.org/docs/intro

# OpenAI vs. Other LLMs

Which ones are there?

- https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#what-is-h2ogpts-langchain-integration-like

Free:

- Kaggle (2 GPUs, low RAM)
- Colab

Buy:

- Buy 1-2 NVIDIA GPUs with 24 GB each
    - RTX 3090 used (USD 700-800), or RTX 4090 new (USD 2,000)
- Alternatively, buy one NVIDIA RTX A6000 with 48 GB of VRAM (though it may not be faster than a 3090/4090)
- A Mac with lots of RAM (much slower than NVIDIA; the M2 Ultra is best)
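A rough rule of thumb for sizing hardware: the weights alone need about parameters × bits per parameter / 8 bytes. A minimal sketch of that estimate, assuming a hypothetical `weights_gb` helper; it ignores the KV cache, activations, and framework overhead, so real requirements are somewhat higher.

```python
def weights_gb(n_params, bits_per_param):
    # Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)
    return n_params * bits_per_param / 8 / 1e9

for name, n in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    print(name, {bits: round(weights_gb(n, bits), 1) for bits in (16, 8, 4)})
```

By this estimate a 7B model in 16-bit (about 14 GB) fits on a single 24 GB card, while a 70B model quantized to 4 bits (about 35 GB) roughly needs two 24 GB cards or one 48 GB card.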

Evaluate:
- [HF leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- [fasteval](https://fasteval.github.io/FastEval/)