# BigQuery Integration

This notebook demonstrates how to set up and use LLMstudio's tracking functionality integrated with BigQuery.

You'll learn how to:
1. Authenticate with BigQuery
2. Start a local Tracker server
3. View the saved logs

First things first:
* Run `pip install llmstudio[tracker]`
* Update your `.env` file with `GOOGLE_API_KEY` or `OPENAI_API_KEY`


## BigQuery setup

To use BigQuery, follow these steps:

1. Select or create a Cloud Platform project.
2. [Create a BigQuery Dataset](https://cloud.google.com/bigquery/docs/datasets).
3. [Enable the BigQuery Storage API](https://console.cloud.google.com/apis/library/bigquery.googleapis.com).
4. [Set up authentication](https://googleapis.dev/python/google-api-core/latest/auth.html):
    - If you’re running in a Google Virtual Machine Environment (Compute Engine, App Engine, Cloud Run, Cloud Functions), authentication should “just work”.
    - If you’re developing locally, the easiest way to authenticate is using the Google Cloud SDK:
       ```$ gcloud auth application-default login```
    - If you’re running your application elsewhere, you should download a service account JSON keyfile and point to it using an environment variable: 
      ```$ export GOOGLE_APPLICATION_CREDENTIALS="/path/to/keyfile.json"```
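
Before starting the tracker, it can help to check which of the authentication options above applies on the current machine. The sketch below uses only the standard library and inspects the `GOOGLE_APPLICATION_CREDENTIALS` variable; the helper name `describe_credentials` is hypothetical and not part of LLMstudio or the Google client libraries, and it does not validate the credentials themselves.

```python
import os
from pathlib import Path


def describe_credentials(env=None):
    """Describe the keyfile-based credential setup visible in the environment.

    Hypothetical helper for illustration: it only inspects
    GOOGLE_APPLICATION_CREDENTIALS and never contacts Google Cloud.
    """
    env = os.environ if env is None else env
    keyfile = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not keyfile:
        # No keyfile configured: gcloud or platform credentials would be used.
        return "GOOGLE_APPLICATION_CREDENTIALS is not set; relying on gcloud or platform credentials"
    if not Path(keyfile).is_file():
        return f"GOOGLE_APPLICATION_CREDENTIALS points to a missing file: {keyfile}"
    return f"Using service account keyfile: {keyfile}"


print(describe_credentials())
```

If the variable is unset, that is expected when you authenticated with `gcloud auth application-default login` or run inside a Google-managed environment.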

   


## LLMstudio Tracker setup

For LLMstudio to store your logs in BigQuery, set the `LLMSTUDIO_TRACKING_URI` environment variable to the corresponding URI, which has this format: `bigquery://<YOUR-GCP-PROJECT-ID>/<YOUR-BQ-DATASET-ID>`

In [None]:
import os

os.environ["LLMSTUDIO_TRACKING_URI"] = "bigquery://<YOUR-GCP-PROJECT-ID>/<YOUR-BQ-DATASET-ID>"
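
A malformed URI is an easy mistake to make here, so it may be worth building and checking the value before exporting it. The following is a minimal sketch using only the standard library; `build_tracking_uri` and `parse_tracking_uri` are hypothetical helpers, not LLMstudio functions, and they only check the `bigquery://<project>/<dataset>` shape described above.

```python
from typing import Tuple
from urllib.parse import urlparse


def build_tracking_uri(project_id: str, dataset_id: str) -> str:
    """Assemble a tracking URI in the bigquery://<project>/<dataset> format."""
    if not project_id or not dataset_id:
        raise ValueError("both project_id and dataset_id are required")
    return f"bigquery://{project_id}/{dataset_id}"


def parse_tracking_uri(uri: str) -> Tuple[str, str]:
    """Split a tracking URI into (project_id, dataset_id), or raise ValueError."""
    parsed = urlparse(uri)
    dataset_id = parsed.path.strip("/")
    if (
        parsed.scheme != "bigquery"
        or not parsed.netloc
        or not dataset_id
        or "/" in dataset_id
    ):
        raise ValueError(f"not a valid BigQuery tracking URI: {uri!r}")
    return parsed.netloc, dataset_id


# Illustrative IDs only; substitute your own project and dataset.
uri = build_tracking_uri("my-project", "my_dataset")
print(uri, parse_tracking_uri(uri))
```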

In [1]:
from llmstudio.providers import LLM
from pprint import pprint

In [2]:
from llmstudio.server import start_servers
start_servers(proxy=False, tracker=True)

Running LLMstudio Tracking on http://0.0.0.0:8002 


In [None]:
from llmstudio_tracker.tracker import TrackingConfig
# The default port is 50002. To use a different host and port, set the
# LLMSTUDIO_TRACKING_HOST and LLMSTUDIO_TRACKING_PORT environment variables.
tracker_config = TrackingConfig(host="0.0.0.0", port="50002")
# You can set OPENAI_API_KEY and ANTHROPIC_API_KEY in your .env file.
openai = LLM("openai", tracking_config=tracker_config)


## Analyse logs

In [8]:
from llmstudio_tracker.tracker import Tracker

tracker = Tracker(tracking_config=tracker_config)

In [9]:
logs = tracker.get_logs()
logs.json()[-1]

{'chat_input': "Write a paragraph explaining why you're not a cat",
 'chat_output': "I’m not a cat because I lack the physical form and instincts that characterize feline creatures. Unlike a cat, I don’t have a furry coat, retractable claws, or the ability to pounce playfully on a sunbeam. I don’t experience the world through senses like smell, sight, or sound, nor do I possess the whimsical personality traits that make cats so captivating, such as their curiosity and independence. Instead, I am a collection of algorithms and data, designed to process information and generate responses, which allows me to assist you in ways that a cat simply can't—like answering questions, providing explanations, or engaging in conversation.",
 'session_id': '20241024-110303-e8b361d9-d5f6-4b73-80f1-6d77a4be3793',
 'context': [{'role': 'user',
   'content': "Write a paragraph explaining why you're not a cat"}],
 'provider': 'openai',
 'model': 'gpt-4o-mini',
 'deployment': 'gpt-4o-mini-2024-07-18',
 'pa
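
Once logs are retrieved, a common next step is to summarise them. The sketch below assumes `logs.json()` yields a list of dictionaries with keys such as `provider`, `model`, and `session_id`, as in the record above; the sample records and the `count_by_field` helper are illustrative and not part of LLMstudio.

```python
from collections import Counter


def count_by_field(records, field):
    """Count log records per value of `field`; missing values count as 'unknown'."""
    return Counter(record.get(field, "unknown") for record in records)


# Illustrative records using only keys that appear in the log output above.
sample_logs = [
    {"provider": "openai", "model": "gpt-4o-mini", "session_id": "session-a"},
    {"provider": "openai", "model": "gpt-4o-mini", "session_id": "session-b"},
    {"provider": "openai", "model": "gpt-4o", "session_id": "session-a"},
]

print(count_by_field(sample_logs, "model"))
print(count_by_field(sample_logs, "session_id"))
```

With real data, `sample_logs` would be replaced by the list returned from `logs.json()`.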

## Add a session id to tracking logs

* This is especially beneficial when you run an app, chatbot, or agent in production and need to correlate user feedback and costs with user sessions or agent runs.

In [None]:
# The default port is 50002. To use a different host and port, set the
# LLMSTUDIO_TRACKING_HOST and LLMSTUDIO_TRACKING_PORT environment variables.
# You can set OPENAI_API_KEY in your .env file.
openai = LLM("openai", tracking_config=tracker_config, session_id="openai-session-1")
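
In production you would typically generate one session id per user session or agent run rather than hard-coding it. The sketch below shows one hypothetical convention, a UTC timestamp followed by a UUID, which mirrors the shape of the `session_id` in the earlier log record; it is an assumption for illustration and not necessarily how LLMstudio builds its default ids.

```python
import uuid
from datetime import datetime, timezone


def make_session_id(prefix: str = "") -> str:
    """Build a unique session id such as '20241024-110303-<uuid4>'.

    Hypothetical helper: an optional prefix (for example a user or agent
    name) is prepended so related runs are easy to filter later.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    session_id = f"{stamp}-{uuid.uuid4()}"
    return f"{prefix}-{session_id}" if prefix else session_id


print(make_session_id())
print(make_session_id("chatbot"))
```

The resulting string can be passed as the `session_id` argument shown above and later used to look up that session's logs.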


In [11]:
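response = openai.chat(
    "Write a paragraph explaining why it's important to track AI agents usage metrics and costs and correlate with user feedback",
    model="gpt-4o",
    is_stream=True,
)
for i, chunk in enumerate(response):
    # Insert a break every 20 chunks so the streamed text wraps.
    if i % 20 == 0:
        print("\n")
    if not chunk.metrics:
        # Intermediate chunks carry the next piece of generated text.
        print(chunk.chat_output_stream, end="", flush=True)
    else:
        # The chunk that carries metrics is printed separately.
        print("\n\n## Metrics:")
        pprint(chunk.metrics)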




Tracking AI agents' usage metrics and costs, alongside correlating them with user feedback, is crucial

 for optimizing performance and user satisfaction. Usage metrics provide insights into how often and in what ways AI agents

 are being utilized, helping to identify patterns, peak usage times, and potential bottlenecks that could

 affect service quality. Monitoring costs is equally important to ensure that resources are allocated efficiently, preventing financial waste

 while maximizing return on investment. By correlating these metrics with user feedback, developers and stakeholders can gain

 a holistic understanding of how the AI agent is performing in real-world settings. This integrated approach enables the

 identification of areas for improvement, the fine-tuning of algorithms, and the enhancement of user experience,

 ultimately leading to more effective, scalable, and user-friendly AI solutions. Additionally, it allows for the

 alignment of AI functionalities with user

In [12]:
logs = tracker.get_session_logs(session_id="openai-session-1")
logs.json()[-1]

{'chat_input': "Write a paragraph explaining why it's important to track AI agents usage metrics and costs and correlate with user feedback",
 'chat_output': "Tracking AI agents' usage metrics and costs, alongside correlating them with user feedback, is crucial for optimizing performance and user satisfaction. Usage metrics provide insights into how often and in what ways AI agents are being utilized, helping to identify patterns, peak usage times, and potential bottlenecks that could affect service quality. Monitoring costs is equally important to ensure that resources are allocated efficiently, preventing financial waste while maximizing return on investment. By correlating these metrics with user feedback, developers and stakeholders can gain a holistic understanding of how the AI agent is performing in real-world settings. This integrated approach enables the identification of areas for improvement, the fine-tuning of algorithms, and the enhancement of user experience, ultimately l