# Perplexity LLM API
- [blog post introducing it](https://blog.perplexity.ai/blog/introducing-pplx-api)
- [api docs](https://docs.perplexity.ai/docs)
- [quickstart for chat completions](https://docs.perplexity.ai/reference/post_chat_completions)
- available models (name, context length in tokens)
    - codellama-34b-instruct, 16384
    - llama-2-70b-chat, 4096
    - mistral-7b-instruct, 4096
    - pplx-7b-chat, 8192
    - pplx-70b-chat, 4096
    - pplx-7b-online, 4096
    - pplx-70b-online, 4096
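
The list above can be turned into a lookup for a quick sanity check before sending a long prompt. This is only a sketch: `CONTEXT_LENGTH` and `fits_context` are hypothetical names, and it assumes the numbers are token limits shared by the prompt and the completion.

```python
# Context lengths copied from the model list above (assumed to be in tokens)
CONTEXT_LENGTH = {
    "codellama-34b-instruct": 16384,
    "llama-2-70b-chat": 4096,
    "mistral-7b-instruct": 4096,
    "pplx-7b-chat": 8192,
    "pplx-70b-chat": 4096,
    "pplx-7b-online": 4096,
    "pplx-70b-online": 4096,
}


def fits_context(model, prompt_tokens, max_new_tokens=0):
    """Return True if prompt plus requested completion fit in the model's window."""
    return prompt_tokens + max_new_tokens <= CONTEXT_LENGTH[model]
```

For example, `fits_context("llama-2-70b-chat", 4000, 200)` is `False` because 4200 exceeds 4096.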

In [1]:
import openai
import os
import pandas as pd
import numpy as np

In [2]:
# TODO: PERPLEXITY_API_KEY is not being picked up from .bashrc; investigate why
#PERPLEXITY_API_KEY = os.environ.get('PERPLEXITY_API_KEY')

In [22]:
PERPLEXITY_API_KEY = ''  # paste your key here; never commit a real key
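
Instead of hard-coding the key, it can be read from the environment with a loud failure when it is missing. A minimal sketch; `load_api_key` is a hypothetical helper, not part of the `openai` package:

```python
import os


def load_api_key(env_var="PERPLEXITY_API_KEY", environ=None):
    """Return the API key stored in `env_var`, or raise if it is unset or empty."""
    # `environ` can be swapped for a plain dict in tests; defaults to the real environment
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set in the environment of this kernel")
    return key
```

A possible explanation for the TODO above (an assumption, not verified here): the kernel inherits its environment from the process that launched Jupyter, so a variable exported in `.bashrc` only shows up if Jupyter was started from a shell that sourced that file.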

## Sample Code Structure from the Perplexity Docs
* The user prompt is my own; the rest follows their example

In [4]:
messages = [
    {
        "role": "system",
        "content": (
            "You are an artificial intelligence assistant and you need to "
            "engage in a helpful, detailed, polite conversation with a user."
        ),
    },
    {
        "role": "user",
        "content": (
            "What are some simple tricks to improve my aim at darts?"
        ),
    },
]

# demo chat completion without streaming
# (openai.ChatCompletion is the pre-1.0 interface of the openai package)
response = openai.ChatCompletion.create(
    model="mistral-7b-instruct",
    messages=messages,
    api_base="https://api.perplexity.ai",
    api_key=PERPLEXITY_API_KEY,
)
print(response)

{
  "id": "4abe46d6-fbb2-4825-bde6-07df992c8df2",
  "model": "mistral-7b-instruct",
  "created": 7617692,
  "usage": {
    "prompt_tokens": 46,
    "completion_tokens": 89,
    "total_tokens": 135
  },
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "Hello! I'd be happy to help improve your aim at darts. Here are a few simple tricks that you may find helpful:\n\n1. First and foremost, it's important to hold the dart properly. Hold it with your dominant hand and make sure that the point of the dart is facing forward. Use your non-dominant hand to steady the dart as you release it.\n2. ..."
      },
      "delta": {
        "role": "assistant",
        "content": ""
      }
    }
  ]
}
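
For reference, the call above amounts to a single HTTP POST. The sketch below builds that request with the standard library and does not send it; `build_chat_request` is a hypothetical helper, and the `/chat/completions` path and Bearer header follow the OpenAI-compatible convention that the quickstart link describes.

```python
import json
import urllib.request


def build_chat_request(messages, model, api_key, api_base="https://api.perplexity.ai"):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        url=f"{api_base}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and decoding the body with `json.load` should yield a dictionary shaped like the response printed above.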


## Streaming Example

In [5]:
# # demo chat completion with streaming
# response_stream = openai.ChatCompletion.create(
#     model="mistral-7b-instruct",
#     messages=messages,
#     api_base="https://api.perplexity.ai",
#     api_key=PERPLEXITY_API_KEY,
#     stream=True, #BE CAREFUL WITH THIS
# )
# for response in response_stream:
#     print(response)
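
If the streaming call is enabled, printing every chunk gets noisy. A sketch of collecting only the generated text, assuming each chunk carries its incremental text in `choices[0]['delta']['content']` as in OpenAI-style streaming; `collect_stream_text` and `fake_chunks` are hypothetical and stand in for the real stream:

```python
def collect_stream_text(chunks):
    """Join the incremental text carried in each streamed chunk's delta."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        # a chunk may carry only a role or an empty delta, so fall back to ""
        parts.append(delta.get("content") or "")
    return "".join(parts)


# Hand-written chunks for illustration only, not real API output
fake_chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": "Hold "}}]},
    {"choices": [{"delta": {"content": "steady."}}]},
    {"choices": [{"delta": {}}]},
]
text = collect_stream_text(fake_chunks)
```

With a real stream, `collect_stream_text(response_stream)` would consume the generator once and return the full reply.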

## Only print the response message

In [6]:
response['choices'][0]['message']['content']

"Hello! I'd be happy to help improve your aim at darts. Here are a few simple tricks that you may find helpful:\n\n1. First and foremost, it's important to hold the dart properly. Hold it with your dominant hand and make sure that the point of the dart is facing forward. Use your non-dominant hand to steady the dart as you release it.\n2. ..."

## Quick Math on Current Pricing

In [7]:
pricing = pd.read_csv('pricing_for_perplexity_api.csv')

In [8]:
pricing

Unnamed: 0,model_parameter_count,per1m_input_tokens,per1m_output_tokens
0,7B,$0.07,$0.28
1,13B,$0.14,$0.56
2,34B,$0.35,$1.40
3,70B,$0.70,$2.80


In [9]:
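# regex=False treats '$' as a literal character, not the end-of-string anchor
pricing['per1m_input_tokens'] = pricing['per1m_input_tokens'].str.replace('$', '', regex=False).astype(float)
pricing['per1m_output_tokens'] = pricing['per1m_output_tokens'].str.replace('$', '', regex=False).astype(float)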

In [10]:
pricing

Unnamed: 0,model_parameter_count,per1m_input_tokens,per1m_output_tokens
0,7B,0.07,0.28
1,13B,0.14,0.56
2,34B,0.35,1.4
3,70B,0.7,2.8


In [20]:
def cost_of_message(response, pricing=pricing):
    """Return the cost of a single response in USD (as a one-element Series)."""
    # model names look like 'mistral-7b-instruct'; the size is the second-to-last part
    model_type = response['model'].split('-')[-2].upper()
    input_tokens = response['usage']['prompt_tokens']
    output_tokens = response['usage']['completion_tokens']

    # select the pricing row for this model size; rates are USD per 1M tokens
    row = pricing[pricing.model_parameter_count == model_type]
    input_rate = row.per1m_input_tokens
    output_rate = row.per1m_output_tokens

    input_cost = input_tokens * input_rate / 1_000_000
    output_cost = output_tokens * output_rate / 1_000_000

    cost = input_cost + output_cost
    return cost

In [21]:
cost_of_message(response=response,pricing=pricing)

0    0.000028
dtype: float64
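
The same arithmetic can be done without pandas and return a plain float. A sketch under the same assumptions as above (the model size is the second-to-last part of the model name, rates are USD per 1M tokens); `RATES` and `cost_of_message_usd` are hypothetical names:

```python
# Rates in USD per 1M tokens (input, output), copied from the pricing table above
RATES = {
    "7B": (0.07, 0.28),
    "13B": (0.14, 0.56),
    "34B": (0.35, 1.40),
    "70B": (0.70, 2.80),
}


def cost_of_message_usd(response, rates=RATES):
    """Return the cost of one response in USD as a float."""
    size = response["model"].split("-")[-2].upper()
    input_rate, output_rate = rates[size]
    usage = response["usage"]
    total = usage["prompt_tokens"] * input_rate + usage["completion_tokens"] * output_rate
    return total / 1_000_000


# Token counts taken from the sample response earlier in the notebook
example = {
    "model": "mistral-7b-instruct",
    "usage": {"prompt_tokens": 46, "completion_tokens": 89},
}
cost = cost_of_message_usd(example)
```

Worked by hand: (46 × 0.07 + 89 × 0.28) / 1,000,000 = 28.14 / 1,000,000, or about 0.000028 USD, which agrees with the Series output above.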