## Counting tokens with the tiktoken library
### Encodings used by OpenAI models
- cl100k_base | gpt-4, gpt-3.5-turbo, text-embedding-ada-002
- p50k_base | Codex models, text-davinci-002, text-davinci-003
- r50k_base (or gpt2) | GPT-3 models like davinci

In [1]:
import tiktoken

In [20]:
# Choose the encoding that matches the model you want to use:
# enc = tiktoken.get_encoding("p50k_base")
# Alternatively, pass the model name and let tiktoken pick the encoding:
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

In [10]:
# Example 1
# encode() returns a list of token IDs
enc.encode("Hello world!")

[9906, 1917, 0]

In [11]:
# The length of that list is the number of tokens used
len(enc.encode("Hello world!"))

3

In [7]:
# Example 2
wiki_art = """The potato is a starchy food, a tuber of the plant Solanum tuberosum and is a root vegetable native to the Americas. The plant is a perennial in the nightshade family Solanaceae.[2]

Wild potato species can be found from the southern United States to southern Chile.[3] The potato was originally believed to have been domesticated by Native Americans independently in multiple locations,[4] but later genetic studies traced a single origin, in the area of present-day southern Peru and extreme northwestern Bolivia. Potatoes were domesticated there approximately 7,000–10,000 years ago, from a species in the Solanum brevicaule complex.[5][6][7] In the Andes region of South America, where the species is indigenous, some close relatives of the potato are cultivated.

Potatoes were introduced to Europe from the Americas by the Spanish in the second half of the 16th century. Today they are a staple food in many parts of the world and an integral part of much of the world's food supply. As of 2014, potatoes were the world's fourth-largest food crop after maize (corn), wheat, and rice.[8] Following millennia of selective breeding, there are now over 5,000 different types of potatoes.[6] Over 99% of potatoes presently cultivated worldwide descend from varieties that originated in the lowlands of south-central Chile.[9] The importance of the potato as a food source and culinary ingredient varies by region and is still changing. It remains an essential crop in Europe, especially Northern and Eastern Europe, where per capita production is still the highest in the world, while the most rapid expansion in production during the 21st century was in southern and eastern Asia, with China and India leading the world production of 376 million tonnes, as of 2021.

Like the tomato, the potato is a nightshade in the genus Solanum, and the vegetative and fruiting parts of the potato contain the toxin solanine which is dangerous for human consumption. Normal potato tubers that have been grown and stored properly produce glycoalkaloids in amounts small enough to be negligible for human health, but, if green sections of the plant (namely sprouts and skins) are exposed to light, the tuber can accumulate a high enough concentration of glycoalkaloids to affect human health."""

In [21]:
len(enc.encode(wiki_art))

485

In [22]:
# Helper function that makes counting simpler

def num_tokens(string: str, model: str) -> int:
    """Return the number of tokens in `string` for the given model."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(string))

In [23]:
num_tokens(wiki_art, "gpt-3.5-turbo")

485

#### Price dictionary: models and price per token
Two approaches: trying ChatGPT and the "standard" one.

In [24]:
import openai
from dotenv import dotenv_values

config = dotenv_values(".env")
openai.api_key = config["API"]

In [32]:
prices = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "system", "content": "You are a Python script generator assistant. Your job is to return a Python dictionary with data."},
              {"role": "user", "content": "Please give me a Python dictionary with the model name and the price per token for every model available in OpenAI. In particular, I want gpt-3.5-turbo, gpt-4 and text-davinci-003."}],
)


In [None]:
print(prices["choices"][0]["message"]["content"])
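The response also reports how many tokens the request actually consumed, in its `usage` field (`prompt_tokens`, `completion_tokens`, `total_tokens`). The sketch below reads those fields from a hand-written dictionary shaped like a chat completion response, so it runs without an API call. The numbers in `sample_response` are made up for illustration, and `usage_summary` is a hypothetical helper.

```python
def usage_summary(response: dict) -> dict:
    """Extract the token counts from a chat-completion-style response."""
    usage = response["usage"]
    return {
        "prompt": usage["prompt_tokens"],
        "completion": usage["completion_tokens"],
        "total": usage["total_tokens"],
    }

# Hand-written stand-in for an API response (values are illustrative only).
sample_response = {
    "choices": [{"message": {"role": "assistant", "content": "{...}"}}],
    "usage": {"prompt_tokens": 60, "completion_tokens": 40, "total_tokens": 100},
}

summary = usage_summary(sample_response)
print(summary)
```

With a real response, `usage_summary(prices)` should behave the same way, assuming the object supports dictionary-style access as in the `print` cell above.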

In [34]:
price_dict = {"gpt-3.5-turbo": 0.000002, "text-davinci-003": 0.00002, "gpt-4": 0.00006}

In [42]:
def price(string: str, model: str) -> float:
    """Return the price in dollars of sending `string` as a prompt to `model`."""
    enc = tiktoken.encoding_for_model(model)
    return len(enc.encode(string)) * price_dict[model]

In [43]:
price(wiki_art, "gpt-3.5-turbo")

0.0009699999999999999

In [44]:
price(wiki_art, "gpt-4")

0.0291
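The value `0.0009699999999999999` above is a floating-point rounding artifact of `485 * 0.000002`. A small sketch that avoids it uses `decimal.Decimal` and takes the token count directly. The helper name `price_exact` and the dictionary `price_per_token` are hypothetical; the values are copied from `price_dict` above.

```python
from decimal import Decimal

# Same per-token prices as price_dict, written as strings so that
# Decimal stores them exactly instead of as binary floats.
price_per_token = {
    "gpt-3.5-turbo": Decimal("0.000002"),
    "text-davinci-003": Decimal("0.00002"),
    "gpt-4": Decimal("0.00006"),
}

def price_exact(n_tokens: int, model: str) -> Decimal:
    """Price in dollars for n_tokens tokens, without float rounding error."""
    return n_tokens * price_per_token[model]

# 485 is the token count of wiki_art measured earlier.
print(price_exact(485, "gpt-3.5-turbo"))
print(price_exact(485, "gpt-4"))
```

Passing the token count rather than the text keeps the arithmetic separate from tokenization, so the same helper can be fed the output of `num_tokens`.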