Create and activate a virtual environment (Optional)
- `python -m venv openai-env`
- `source openai-env/bin/activate` (macOS/Linux)
- `openai-env\Scripts\activate` (Windows)

Once the virtual environment is set up, install the OpenAI Python library:
- `pip install --upgrade openai`

In [15]:
from openai import OpenAI
import numpy as np
import pandas as pd
import tiktoken

This Python code is setting up three string constants:

- OPENAI_API_KEY: This is the API key for OpenAI. It's used to authenticate your application when making requests to the OpenAI API.

- EMBEDDING_MODEL: This is the name of the model used for text embedding. Text embedding converts text into a numeric vector that machine learning algorithms can process. In this case, the model is "text-embedding-ada-002".

- LLM: This is the name of the chat model to use. We recommend `gpt-3.5-turbo-0125` for its lower latency. `gpt-4-0125-preview` is, however, the more capable of the two, with a 128k-token context window and training data up to December 2023.

In [14]:
OPENAI_API_KEY = "<your-api-key>"  # placeholder: paste your own key here and never share or commit a real one
EMBEDDING_MODEL = "text-embedding-ada-002"
LLM = "gpt-3.5-turbo-0125"  # "gpt-4-0125-preview" is better with large inputs and more complex tasks
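A key written directly into a notebook is easy to leak when the notebook is shared. As a minimal sketch, assuming you have exported the key in an environment variable named `OPENAI_API_KEY`, it can be read at runtime instead. `load_api_key` is a hypothetical helper, not part of the OpenAI library:

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var)
    if not key:
        # Fail early with a clear message instead of sending an empty key to the API.
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With this in place, `OPENAI_API_KEY = load_api_key()` replaces the literal string. The `OpenAI()` client also falls back to the `OPENAI_API_KEY` environment variable when no `api_key` argument is given.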

In [3]:
client = OpenAI(
    api_key=OPENAI_API_KEY
)

These are the available models you can choose to experiment with:

In [13]:
models = [model.id for model in client.models.list().data]
models

['gpt-3.5-turbo-16k',
 'gpt-3.5-turbo-1106',
 'dall-e-3',
 'gpt-3.5-turbo-16k-0613',
 'dall-e-2',
 'text-embedding-3-large',
 'whisper-1',
 'tts-1-hd-1106',
 'tts-1-hd',
 'gpt-3.5-turbo',
 'gpt-3.5-turbo-0125',
 'gpt-4-0613',
 'gpt-3.5-turbo-0301',
 'gpt-3.5-turbo-0613',
 'gpt-3.5-turbo-instruct-0914',
 'gpt-4',
 'tts-1',
 'davinci-002',
 'gpt-3.5-turbo-instruct',
 'babbage-002',
 'gpt-4-1106-preview',
 'gpt-4-vision-preview',
 'tts-1-1106',
 'gpt-4-0125-preview',
 'gpt-4-turbo-preview',
 'text-embedding-ada-002',
 'text-embedding-3-small',
 'davinci:ft-anderssl-2022-06-29-11-12-44',
 'davinci:ft-anderssl-2022-06-29-14-39-18']
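The list mixes chat, embedding, image, and audio models. A small sketch of narrowing it down by name prefix follows; `filter_models` is a hypothetical helper, and the sample IDs are copied from the output above rather than fetched from the API:

```python
def filter_models(model_ids, prefix):
    """Return the model IDs that start with the given prefix, sorted alphabetically."""
    return sorted(m for m in model_ids if m.startswith(prefix))

# A few IDs taken from the listing above.
sample = [
    "gpt-3.5-turbo-0125",
    "dall-e-3",
    "text-embedding-ada-002",
    "gpt-4-0125-preview",
    "text-embedding-3-small",
]
embedding_models = filter_models(sample, "text-embedding")
print(embedding_models)  # ['text-embedding-3-small', 'text-embedding-ada-002']
```

In the notebook, the same call can be applied to the full `models` list.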

This Python code uses the OpenAI API to create a chat completion. It uses the model stored in `LLM` (here `gpt-3.5-turbo-0125`) to generate a response to a user's question about Telenor.

Here's a breakdown of the code:

- `client.chat.completions.create`: This is a method from the OpenAI API that creates a chat completion. A chat completion is a conversation with the model where you provide a series of messages and the model returns a generated message as a response.

- `model=LLM`: This specifies the model to use for the chat completion. In this case, it is the `gpt-3.5-turbo-0125` model assigned to `LLM` earlier.

- `messages`: This is a list of messages to send to the model. Each message is a dictionary with two keys: 'role' and 'content'. 'role' can be 'system', 'user', or 'assistant', and 'content' is the text of the message. The 'system' role is used to set the behavior of the 'assistant', and the 'user' role is used to ask the assistant a question.

In this example, both messages are written in Norwegian. The system message sets the assistant's role as a helpful assistant that can answer questions about Telenor. The user message then asks "Hva driver Telenor med?" ("What does Telenor do?").

In [12]:
chat_completion = client.chat.completions.create(
    model=LLM,
    messages=[
        {"role": "system", "content": "Du er en hjelpsom assistent som kan svare på spørsmål om Telenor."},
        {"role": "user", "content": "Hva driver Telenor med?"}
        ]
)
chat_completion.choices[0].message.content

'Telenor er et telekommunikasjonsselskap som tilbyr tjenester innen mobiltelefoni, bredbånd og TV. Selskapet driver også med nettverksinfrastruktur, satellittkommunikasjon og digitale tjenester. Telenor har virksomhet i flere land, hovedsakelig i Skandinavia og Asia.'
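Each call to the chat endpoint only sees the messages you send, so a follow-up question needs the earlier turns included again. A minimal sketch of assembling such a list follows; `build_messages` is a hypothetical helper, and the history and follow-up question are made up for illustration:

```python
def build_messages(system_prompt, history, user_question):
    """Assemble a messages list: system prompt, earlier turns, then the new question."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in history:
        # Replay each earlier exchange in the order it happened.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_question})
    return messages

messages = build_messages(
    "You are a helpful assistant that can answer questions about Telenor.",
    [("What does Telenor do?", "Telenor is a telecommunications company.")],
    "In which regions does it operate?",
)
```

The result can be passed as `messages=messages` to `client.chat.completions.create`.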

This Python code is using the OpenAI API to create an embedding for a given text input.

Here's a breakdown of the code:

- `client.embeddings.create()`: This is a method from the OpenAI API that creates an embedding. An embedding is a vector representation of the input text. It's a way of converting text into a form that can be processed by machine learning algorithms. It also enables you to perform vector search using e.g. cosine similarity. 

- `model=model_name`: This specifies the model to use for creating the embedding. The example passes `EMBEDDING_MODEL`, which is "text-embedding-ada-002".

- `input=text`: This is the text for which the embedding will be created. The example uses "Verdifull informasjon om Telenor som du vil bruke inn i språkmodellen." (Norwegian for "Valuable information about Telenor that you want to feed into the language model.").

- `encoding_format="float"`: This specifies the format of the encoding for the embedding. In this case, it's set to "float", which means the embedding will be a list of floating-point numbers.

In this example, the code is creating an embedding for the text "Verdifull informasjon om Telenor som du vil bruke inn i språkmodellen." using the "text-embedding-ada-002" model and a floating-point encoding format.

In [6]:
def get_embedding(text: str, model_name: str):
    embedding = client.embeddings.create(
        model=model_name,
        input=text,
        encoding_format="float"
    )
    return embedding.data[0].embedding

example_embedding = get_embedding("Verdifull informasjon om Telenor som du vil bruke inn i språkmodellen.", EMBEDDING_MODEL)


The function `cos_sim` calculates the cosine similarity between two vectors `a` and `b`. If the vectors point in the same direction (for example, if they are identical), the cosine similarity is 1. If they are orthogonal (i.e., not similar at all), it is 0, and if they point in opposite directions, it is -1.

In [8]:
def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
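To see how this supports vector search, here is a small sketch that ranks a few documents against a query. The three-dimensional vectors are made up for illustration (real `text-embedding-ada-002` embeddings have 1536 dimensions), and `rank_by_similarity` is a hypothetical helper:

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, float(cos_sim(query_vec, v))) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

query = np.array([1.0, 0.0, 0.0])
docs = [
    np.array([0.0, 1.0, 0.0]),  # orthogonal to the query
    np.array([2.0, 0.0, 0.0]),  # same direction, different length
    np.array([1.0, 1.0, 0.0]),  # 45 degrees from the query
]
ranking = rank_by_similarity(query, docs)
print([i for i, _ in ranking])  # [1, 2, 0]
```

Document 1 ranks first even though it is twice as long as the query, because cosine similarity depends only on direction, not on vector length.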

The API has a limit on the maximum number of input tokens for embeddings (8191 for `text-embedding-ada-002`). The following function calculates the number of tokens in a string:

In [16]:
def num_tokens_from_string(string: str, encoding_name: str = "cl100k_base") -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(string))
    return num_tokens
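When a text exceeds the limit, one option is to split its tokens into chunks and embed each chunk separately. Below is a minimal sketch of the splitting step, run on a plain list of integers that stands in for real token IDs; `chunk_tokens` is a hypothetical helper, and the small `max_tokens` value is chosen only to keep the example readable:

```python
def chunk_tokens(tokens, max_tokens):
    """Split a list of token IDs into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]

token_ids = list(range(10))  # stand-in for the output of encoding.encode(text)
chunks = chunk_tokens(token_ids, max_tokens=4)
print([len(c) for c in chunks])  # [4, 4, 2]
```

With `tiktoken`, `encoding.encode(text)` produces the token IDs and `encoding.decode(chunk)` turns each chunk back into text. Splitting at fixed token counts can cut a sentence or word in half, so real applications often split on paragraph or sentence boundaries first.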

For more advanced set-ups of applications powered by LLMs, check out these links:

- https://python.langchain.com/docs/get_started/introduction
- https://microsoft.github.io/autogen/
