# Local-LLM Examples

Choose a model from the models list and pass its name as the `model` argument in the API calls. The first cell below retrieves the list of available models.

**Note: you do not need an OpenAI API key. The API key is the one you defined for your own server, if you defined one.**
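If a key was defined for the server, it has to accompany each request. The sketch below shows one way to build the headers for plain `requests` calls. It assumes the server accepts the same `Authorization: Bearer` header that OpenAI-style clients send; `build_headers` and the example key are hypothetical names used only for illustration.

```python
def build_headers(api_key=""):
    """Return request headers; Authorization is added only when a key is set."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: bearer-token scheme, as used by OpenAI-style clients.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


print(build_headers())                 # server without an API key
print(build_headers("my-server-key"))  # hypothetical server key
```

The result can be passed as `headers=build_headers(key)` to `requests.get` or `requests.post`.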


In [5]:
import requests

models = requests.get("http://localhost:8091/v1/models")
print(models.json())

['bakllava-1-7b', 'llava-v1.5-7b', 'llava-v1.5-13b', 'alfred-40B-1023', 'Nanbeige-16B-Chat-32K', 'opus-v0.5-70B', 'Nous-Capybara-34B', 'dolphin-2_2-yi-34b', 'Euryale-1.4-L2-70B', 'alfred-40B-1023', 'firefly-llama2-7B-chat', 'neural-chat-7B-v3-1', 'firefly-llama2-13B-chat', 'dragon-yi-6B-v0', 'Marx-3B-v3', 'Akins-3B', 'platypus-yi-34b', 'OpenHermes-2.5-Mistral-7B-16k', 'TimeCrystal-L2-13B', 'merlyn-education-safety', 'merlyn-education-corpus-qa-v2', 'Tess-Medium-200K-v1.0', 'Deacon-34B', 'MoMo-70B-V1.1', 'Tai-70B', 'Thespis-Mistral-7B-v0.6', 'sqlcoder-7B', 'Tess-XL-v1.0', 'zephyr-7B-beta-pl', 'SynthIA-7B-v2.0-16k', 'ShiningValiantXS', 'DaringFortitude', 'Python-Code-33B', 'Python-Code-13B', 'opus-v0-70B', 'GodziLLa2-70B', 'Claire-7B-0.1', 'speechless-mistral-dolphin-orca-platypus-samantha-7B', 'Noromaid-13B-v0.1.1', 'Synatra-7B-v0.3-RP', 'Noromaid-13B-v0.1', 'Thespis-13B-v0.6', 'Augmental-ReMM-13B', 'Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v1', 'Yi-34B-200K-Llamafied', 'Yi-34B-200K', 
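Model names are case-sensitive strings, so it can help to check a name against the list before using it. The following is a minimal sketch, assuming the endpoint returns a flat list of names as in the output above; `resolve_model` is a hypothetical helper, not part of any library.

```python
import difflib


def resolve_model(requested, available):
    """Return the exact name from `available` that matches `requested`, ignoring case."""
    by_lower = {name.lower(): name for name in available}
    match = by_lower.get(requested.lower())
    if match is not None:
        return match
    # No exact match: suggest similar names to make typos easier to spot.
    suggestions = difflib.get_close_matches(requested, available, n=3, cutoff=0.6)
    raise ValueError(f"Unknown model {requested!r}; close matches: {suggestions}")


# A few names taken from the list above.
available = ["llava-v1.5-7b", "sqlcoder-7B", "Yi-34B-200K"]
print(resolve_model("yi-34b-200k", available))  # prints Yi-34B-200K
```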

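Install the OpenAI Python package. The examples below use the pre-1.0 interface (`openai.api_base`, `openai.ChatCompletion`, `openai.Embedding`), so the version is pinned to 0.28.1.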


In [1]:
%pip install openai==0.28.1

Defaulting to user installation because normal site-packages is not writeable
DEPRECATION: nb-black 1.0.7 has a non-standard dependency specifier black>='19.3'; python_version >= "3.6". pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of nb-black or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063
Note: you may need to restart the kernel to use updated packages.


## Chat Completion

[OpenAI API Reference](https://platform.openai.com/docs/api-reference/chat)


In [10]:
import openai

openai.api_base = "http://localhost:8091/v1"
openai.api_key = ""
prompt = "What is the capital of Ohio?"
messages = [{"role": "user", "content": prompt}]

response = openai.ChatCompletion.create(
    model="Mistral-7B-OpenOrca",
    messages=messages,
    temperature=1.31,
    max_tokens=8192,
    top_p=1.0,
    n=1,
    stream=False,
)
print(response)

{
  "id": "cmpl-dc858fdd-12c7-493a-bbf9-d65822e70ed1",
  "object": "text_completion",
  "created": 1700331361,
  "model": "Mistral-7B-OpenOrca",
  "usage": {
    "prompt_tokens": 62,
    "completion_tokens": 8,
    "total_tokens": 70
  },
  "messages": [
    {
      "role": "user",
      "content": "What is the capital of Ohio?"
    },
    {
      "role": "assistant",
      "content": "The capital of Ohio is Columbus."
    }
  ]
}
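The response above carries the reply in a top-level `messages` list rather than the `choices` list of the standard OpenAI format. The sketch below pulls out the assistant's text from either shape. It assumes the response behaves like a dictionary; `last_assistant_reply` is a hypothetical helper written for this example.

```python
def last_assistant_reply(response):
    """Return the most recent assistant message text, or None if there is none."""
    # Shape shown in the output above: a top-level "messages" list.
    for message in reversed(response.get("messages", [])):
        if message.get("role") == "assistant":
            return message.get("content")
    # Fallback for the standard OpenAI chat shape: choices[0].message.content.
    choices = response.get("choices", [])
    if choices:
        return choices[0].get("message", {}).get("content")
    return None


# Trimmed copy of the response printed above.
example = {
    "model": "Mistral-7B-OpenOrca",
    "messages": [
        {"role": "user", "content": "What is the capital of Ohio?"},
        {"role": "assistant", "content": "The capital of Ohio is Columbus."},
    ],
}
print(last_assistant_reply(example))  # prints The capital of Ohio is Columbus.
```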


## Embeddings

[OpenAI API Reference](https://platform.openai.com/docs/api-reference/embeddings)

The embeddings endpoint currently uses an ONNX embedder with a maximum input length of 256 tokens.


In [4]:
import openai

openai.api_base = "http://localhost:8091/v1"
openai.api_key = ""
prompt = "Columbus is the capital of Ohio."

response = openai.Embedding.create(
    input=prompt,
    engine="Mistral-7B-OpenOrca",
)

print(response)

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [
        4.666879653930664,
        -3.715498447418213,
        1.957158088684082,
        -5.0214409828186035,
        2.5578460693359375,
        4.012951374053955,
        5.267697334289551,
        0.2844562828540802,
        5.774829387664795,
        -2.6345221996307373,
        -3.7869722843170166,
        0.5685935020446777,
        -2.1250827312469482,
        1.0604925155639648,
        0.5907173156738281,
        -4.414628982543945,
        0.5746937394142151,
        0.4889771640300751,
        2.2201099395751953,
        -0.5606789588928223,
        -1.9370907545089722,
        -1.1837846040725708,
        -2.3475522994995117,
        6.9055376052856445,
        -7.300686359405518,
        3.5763654708862305,
        4.823240280151367,
        -0.27494508028030396,
        -9.100213050842285,
        0.5581408143043518,
        0.6792504191398621,
        -3.9802231788635254,
        -
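The values in the output above are not scaled to unit length, so cosine similarity is a safer way to compare two embeddings than a raw dot product. This is a minimal pure-Python sketch; `cosine_similarity` is a hypothetical helper, and the short vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors; in practice these would come from response["data"][0]["embedding"].
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # prints 0.0
```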

## Python Examples

The examples below perform the same operations directly in Python with the `local-llm` package, without going through the HTTP API.


In [None]:
%pip install local-llm --upgrade

In [None]:
from local_llm import LLM

LLM().models()

In [None]:
from local_llm import LLM

messages = [{"role": "user", "content": "What is the capital of Ohio?"}]

ai = LLM(
    models_dir="./models",
    model="Mistral-7B-OpenOrca",
)
ai.chat(messages=messages)

In [None]:
from local_llm import LLM

ai = LLM(
    models_dir="./models",
    model="Mistral-7B-OpenOrca",
)
ai.embedding("Columbus is the capital of Ohio.")