# 🚀 **Exploring Perplexity API Functionality**

This notebook explores the functionality of the Perplexity API ([pplx-api](https://docs.perplexity.ai/)).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GiacomoMeloni/ExploringLLMs/blob/main/notebooks/exploring_perplexityai_api.ipynb)

## 🤔 **About Perplexity AI**

[Perplexity AI](https://www.perplexity.ai/) is an AI-powered search engine and chatbot that uses advanced natural language processing techniques to provide users with accurate and up-to-date information on various topics. It is designed to search the web in real time and offer information on a wide range of subjects. Some key features and capabilities of Perplexity AI include:

- 1️⃣ **Answer Engine**: Perplexity AI focuses on advancing how people [discover](https://www.perplexity.ai/discover) and share information by providing ready-made answers to users' questions and citing sources in real-time.

- 2️⃣ **Multifaceted Applications**: Perplexity AI is versatile and can assist various professions, such as researchers, writers, artists, musicians, and programmers, in tasks like answering questions, generating text, writing creative content, and summarizing text.

- 3️⃣ **Accuracy**: The AI uses natural language processing and machine learning techniques to analyze text data, providing an in-depth analysis and generating fresh concepts, resulting in accurate answers and information.

- 4️⃣ **User-friendly Interface**: Perplexity AI is available on the web and as an app for iPhone users, making it easily accessible to a wide range of users.

Perplexity AI was founded in 2022 by Andy Konwinski, Aravind Srinivas, Denis Yarats, and Johnny Ho, who have experience working with large language models at Google AI.


## 📰 **Recent releases**

Perplexity AI recently introduced its [pplx-api](https://docs.perplexity.ai/), which allows developers to access the company's AI models and other resources through a REST API. The API is designed to work with various large language models:

| Model Name              | Context Length | Model Type        |
|-------------------------|-----------------|-------------------|
| codellama-34b-instruct  | 16384           | Chat Completion  |
| llama-2-70b-chat        | 4096            | Chat Completion  |
| mistral-7b-instruct     | 4096 ⬆     | Chat Completion  |
| pplx-7b-chat            | 8192            | Chat Completion  |
| pplx-70b-chat           | 4096            | Chat Completion  |
| pplx-7b-online          | 4096            | Chat Completion  |
| pplx-70b-online         | 4096            | Chat Completion  |

⬆️ As reported on the 🗺️ [roadmap](https://docs.perplexity.ai/docs/feature-roadmap), the context length of Mistral 7B will be extended to 32K tokens.
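
The context lengths in the table can be kept next to the model names to check, before sending a request, whether it is likely to fit. The following is a minimal sketch, assuming the values above are still current and that the context window has to hold both the prompt and the requested output; `CONTEXT_LENGTHS` and `fits_in_context` are hypothetical names, not part of pplx-api.

```python
# Context lengths copied from the table above (they may change over time).
CONTEXT_LENGTHS = {
    "codellama-34b-instruct": 16384,
    "llama-2-70b-chat": 4096,
    "mistral-7b-instruct": 4096,
    "pplx-7b-chat": 8192,
    "pplx-70b-chat": 4096,
    "pplx-7b-online": 4096,
    "pplx-70b-online": 4096,
}


def fits_in_context(model: str, prompt_tokens: int, max_tokens: int) -> bool:
    """Hypothetical helper: True if prompt plus requested output fit the window.

    Assumes the context window covers prompt and completion together.
    """
    return prompt_tokens + max_tokens <= CONTEXT_LENGTHS[model]


print(fits_in_context("pplx-7b-chat", prompt_tokens=1000, max_tokens=1500))
print(fits_in_context("llama-2-70b-chat", prompt_tokens=3000, max_tokens=1500))
```

The prompt token count has to come from elsewhere, for example from the `usage` field of a previous response.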

In [None]:
import os
import requests
from time import time
from enum import Enum
from datetime import datetime

## ℹ️ Get started

As reported in the [official documentation](https://docs.perplexity.ai/docs), pplx-api can be accessed by generating an API key from an active account. Pricing details are listed on the [pricing page](https://docs.perplexity.ai/docs/pricing).

Request and response details are described in the official [API Reference](https://docs.perplexity.ai/reference/post_chat_completions).

In [None]:
# Useful placeholders

# PerplexityAI's API endpoint to access generative models
URL = "https://api.perplexity.ai/chat/completions"

# Generation parameters that can be set for each model's output
MAX_TOKENS = 1500
TEMPERATURE = 0
TOP_P = 1
FREQUENCY_PENALTY = 1.5

## 🤖 Model Details 
Currently, the Perplexity AI API offers access to the open-source models listed on the Supported Models page. However, as reported in their blog, the only models with native web access served by the API are `pplx-7b-online` and `pplx-70b-online`.

These models have been fine-tuned to use web snippets effectively in their responses and are regularly updated to improve performance. `pplx-7b-online` was fine-tuned from `mistral-7b`, and `pplx-70b-online` was fine-tuned from `llama2-70b`.

In [None]:
# Available models from PerplexityAI API

class AvailablePPLXModels(Enum):
  MISTRAL_7B: str = "mistral-7b-instruct"
  PPLX_7B: str = "pplx-7b-chat"
  PPLX_70B: str = "pplx-70b-chat"
  PPLX_7B_ON: str = "pplx-7b-online" #🛜 Native Online Retrieval
  PPLX_70B_ON: str = "pplx-70b-online" #🛜 Native Online Retrieval
  LLAMA2_70B: str = "llama-2-70b-chat"
  CODE_LLAMA_34B: str = "codellama-34b-instruct"
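
Because the enum values are the raw model names sent to the API, an `Enum` also makes it easy to look a member up from a string or to filter the models with web access. The sketch below uses a reduced, hypothetical copy of the enum (`Model`) so it runs on its own; `online_models` is an illustrative helper, not something the notebook defines.

```python
from enum import Enum


class Model(Enum):
    # Reduced, hypothetical copy of the enum above, kept short for illustration.
    PPLX_7B = "pplx-7b-chat"
    PPLX_7B_ON = "pplx-7b-online"
    PPLX_70B_ON = "pplx-70b-online"


def online_models() -> list:
    """Return the members whose API name ends with '-online'."""
    return [m for m in Model if m.value.endswith("-online")]


# Look a member up from the raw API string; unknown names raise ValueError.
selected = Model("pplx-7b-online")
print(selected.name, [m.value for m in online_models()])
```

A misspelled model name in the enum is not caught by Python; it only surfaces when the API rejects the request, so the values are worth checking against the model table.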

## 🔎 Set up Authorization Header

As noted in the **Get started** section, an API key is required to interact with the Perplexity AI API.

* If you are using a Jupyter Notebook, set `PPLXAI_API_TOKEN` as an environment variable holding the key;
* If you are using Google Colab, add `PPLXAI_API_TOKEN` as a secret in the Secrets tab.

In [None]:
if 'google.colab' in str(get_ipython()):
    try:
        from google.colab import userdata
        pplx_api_key = userdata.get('PPLXAI_API_TOKEN')
    except Exception: 
        pplx_api_key = None
else:
    pplx_api_key = os.environ.get('PPLXAI_API_TOKEN', None)

headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "authorization": f"Bearer {pplx_api_key}"
}

# Check the key itself: the header is always a non-empty string ("Bearer None" if the key is missing)
assert pplx_api_key, "You need to add an API Key to use Perplexity's API"
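
Since an f-string formats a missing key as the text `Bearer None`, it is safer to validate the key itself before building the header. One possible way to wrap this is sketched below; `build_headers` is a hypothetical helper name, and the header fields are the ones used in this notebook.

```python
import os
from typing import Optional


def build_headers(api_key: Optional[str]) -> dict:
    """Hypothetical helper: build request headers, failing early if no key is set."""
    if not api_key:
        raise ValueError("You need to add an API Key to use Perplexity's API")
    return {
        "accept": "application/json",
        "content-type": "application/json",
        "authorization": f"Bearer {api_key}",
    }


# Example with the environment variable used in this notebook:
# headers = build_headers(os.environ.get("PPLXAI_API_TOKEN"))
print(build_headers("example-key")["authorization"])
```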

## 🎙️ Preparing Payload

To interact with the model, we can use the same message list format as OpenAI's chat completion API. For more details, see the [Perplexity API Reference](https://docs.perplexity.ai/reference/post_chat_completions). It is also possible to set parameter values to tune the LLM's output behaviour.

In [None]:
print('Insert your query here')
user_query = input('User: ')
print(f'Given query:\t{user_query}')

sys_prompt = """You have to report the most recent information requested by the user. Use only official documentation and be as concise as possible."""

payload = {
    "model": AvailablePPLXModels.PPLX_7B_ON.value,
    "messages": [
        {
            "role": "system",
            "content": sys_prompt
        },
        {
            "role": "user",
            "content": user_query
        }
    ],
    "max_tokens": MAX_TOKENS,
    "temperature": TEMPERATURE,
    "top_p": TOP_P,
    "frequency_penalty": FREQUENCY_PENALTY
}

# The messages list is always populated, so check that the user actually typed something
assert user_query.strip(), "You need to specify some message for the model to work!"

print('Payload ready!')
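
When several queries or models are tried in a row, the payload construction can be moved into a function. This is a minimal sketch under the assumption that the request body has exactly the fields used in the cell above; `build_payload` is a hypothetical helper, and its defaults simply repeat this notebook's placeholder values.

```python
def build_payload(model: str, user_query: str, system_prompt: str,
                  max_tokens: int = 1500, temperature: float = 0,
                  top_p: float = 1, frequency_penalty: float = 1.5) -> dict:
    """Hypothetical helper mirroring the payload built in this notebook."""
    if not user_query.strip():
        raise ValueError("The user query must not be empty")
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "frequency_penalty": frequency_penalty,
    }


payload_example = build_payload("pplx-7b-online", "What is pplx-api?", "Be concise.")
print(payload_example["messages"])
```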

In [None]:
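start = time()
http_response = requests.post(URL, json=payload, headers=headers, timeout=60)
http_response.raise_for_status()  # fail early on HTTP errors (e.g. an invalid API key)
response = http_response.json()
print(f"Execution time: {round(time()-start, 2)}s")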
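
The timing pattern used for the request can be factored into a small reusable function, which helps when comparing the latency of several models. The sketch below is an illustration only: `timed_call` is a hypothetical helper, and it uses `time.perf_counter` rather than `time.time` because the former is intended for measuring short durations. A built-in function stands in for the HTTP call so the example runs offline.

```python
from time import perf_counter
from typing import Any, Callable, Tuple


def timed_call(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Hypothetical helper: run func and return (result, elapsed seconds)."""
    start = perf_counter()
    result = func(*args, **kwargs)
    return result, perf_counter() - start


# Stand-in for the HTTP call so the sketch runs without network access.
result, elapsed = timed_call(sum, [1, 2, 3])
print(f"Result: {result}, execution time: {elapsed:.4f}s")
```

With the notebook's variables, the same helper could be called as `timed_call(requests.post, URL, json=payload, headers=headers, timeout=60)`.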

## 📢 Response details
It is possible to retrieve the total number of tokens the model processed in the current interaction. Unfortunately, it is not possible to track the sources used by the LLM to generate the output. This makes it difficult to evaluate the model's retrieval system and understand how well the model is working.

In [None]:
print(f"Input tokens:\t {response['usage']['prompt_tokens']}")
print(f"Output tokens:\t {response['usage']['completion_tokens']}")
print(f"Total tokens:\t {response['usage']['total_tokens']}\n\n")

print(f"{response['choices'][0]['message']['content']}")
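
Indexing the response directly raises a `KeyError` if the API returns an error body instead of a completion. A more defensive way to read the same fields is sketched below; `summarize_response` is a hypothetical helper, and the sample dictionary is made up to match only the fields this notebook reads, not a real API response.

```python
def summarize_response(response: dict) -> dict:
    """Hypothetical helper: extract token counts and answer text, tolerating missing keys."""
    usage = response.get("usage", {})
    choices = response.get("choices") or [{}]
    return {
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
        "content": choices[0].get("message", {}).get("content"),
    }


# Made-up sample with the same fields the cell above reads.
sample = {
    "usage": {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42},
    "choices": [{"message": {"role": "assistant", "content": "Hello!"}}],
}
print(summarize_response(sample))
```

Missing fields come back as `None`, so the caller can decide whether to retry or report the problem.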

⚠️ Some of the information reported here has been retrieved using `pplx-7b-online`!