## models list

```
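import openai

# Legacy interface (openai<1.0): fetch the model list once and reuse it.
model_list = openai.Model.list()

# Collect the IDs of all available models.
[model['id'] for model in model_list['data']]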
```

## encoding & `tiktoken`

- https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb
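- Prevent the model from emitting EOS (`'<|endoftext|>'`) by putting a strong negative `logit_bias` on its token ID:
    - "text-davinci-003": `logit_bias={"50256": -100}`
        - `Completion` API
        - `tiktoken.get_encoding('gpt2')` (same `'<|endoftext|>'` ID as `p50k_base`, the encoding `text-davinci-003` uses)
    - "gpt-4": `logit_bias={"100257": -100}`
        - `ChatCompletion` API
        - `tiktoken.get_encoding('cl100k_base')`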
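
The `logit_bias` values in the list above can be built from a token ID rather than typed by hand. The following is a minimal sketch, not an official helper: the function names `suppress_token_bias` and `eos_bias_for_encoding` are hypothetical, and the range check assumes the API's documented bias range of -100 to 100. Only `tiktoken.get_encoding` and the `eot_token` attribute come from `tiktoken` itself.

```python
def suppress_token_bias(token_id: int, bias: int = -100) -> dict[str, int]:
    """Build a logit_bias mapping that discourages a single token ID."""
    if not -100 <= bias <= 100:
        raise ValueError("logit_bias values must lie in [-100, 100]")
    # The notes above pass token IDs as string keys, so do the same here.
    return {str(token_id): bias}


def eos_bias_for_encoding(encoding_name: str, bias: int = -100) -> dict[str, int]:
    """Look up an encoding's end-of-text ID with tiktoken and bias against it."""
    import tiktoken  # imported lazily so the helper above has no dependencies

    encoding = tiktoken.get_encoding(encoding_name)
    return suppress_token_bias(encoding.eot_token, bias)


print(suppress_token_bias(50256))   # ID listed above for "text-davinci-003"
print(suppress_token_bias(100257))  # ID listed above for "gpt-4"
```

With `tiktoken` installed, `eos_bias_for_encoding('cl100k_base')` should produce the same mapping as the "gpt-4" entry, since that encoding's `eot_token` is 100257.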

In [1]:
import tiktoken

In [17]:
tiktoken.list_encoding_names()

['gpt2', 'r50k_base', 'p50k_base', 'p50k_edit', 'cl100k_base']

In [24]:
for m in ['text-davinci-002', 'text-davinci-003', 'gpt-3.5-turbo']:
    print(tiktoken.encoding_for_model(m))

<Encoding 'p50k_base'>
<Encoding 'p50k_base'>
<Encoding 'cl100k_base'>


In [18]:
tokenizer = tiktoken.get_encoding('cl100k_base')

In [8]:
tokenizer.n_vocab

100277

In [9]:
tokenizer.special_tokens_set

{'<|endofprompt|>',
 '<|endoftext|>',
 '<|fim_middle|>',
 '<|fim_prefix|>',
 '<|fim_suffix|>'}

In [27]:
# EOT: end of text
tokenizer.eot_token

100257

In [13]:
tokenizer.encode('<|endoftext|>', allowed_special={'<|endoftext|>'})

[100257]

In [14]:
tokenizer.encode('<|endoftext|>', allowed_special=tokenizer.special_tokens_set)

[100257]

In [19]:
tiktoken.get_encoding('gpt2').encode('<|endoftext|>', allowed_special=tokenizer.special_tokens_set)

[50256]