# Working with datasets 📊

In this notebook, we'll cover how to create and work with datasets.

In [64]:
from dotenv import load_dotenv
# Load environment variables from a .env file, overriding existing ones.
# Pass override=False if variables already set in your shell should take precedence.
load_dotenv(override=True)

from ddtrace.llmobs import Dataset

## 1. Creating a Dataset
In this example, we'll define a dataset programmatically by passing a list of dictionaries. Each dictionary represents a record containing an `input` and an optional `expected_output`. 

**Note**: Dataset names must be unique. If a dataset with the specified name already exists, the existing dataset will be returned instead of creating a new one.
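Because reusing a name returns the existing dataset rather than a fresh one, it can help to append a distinguishing suffix while experimenting. The sketch below shows one way to do that. `unique_dataset_name` is a hypothetical helper, not part of `ddtrace`, and the random-hex suffix format is an assumption chosen for illustration.

```python
import uuid


def unique_dataset_name(base: str, suffix_length: int = 8) -> str:
    """Return `base` plus a short random hex suffix.

    Hypothetical helper for illustration; it is not a ddtrace API.
    """
    if not 1 <= suffix_length <= 32:
        raise ValueError("suffix_length must be between 1 and 32")
    # uuid4().hex is 32 random hexadecimal characters; keep only the first few.
    suffix = uuid.uuid4().hex[:suffix_length]
    return f"{base}-{suffix}"


name = unique_dataset_name("capitals-of-the-world")
print(name)
```

The generated string could then be passed as the `name` argument when constructing the dataset. Keep a record of it if you intend to pull the same dataset later.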

Alternatively, you can create or upload datasets directly in Datadog. We encourage you to explore that workflow after completing the notebook.

In [65]:
dataset = Dataset(name="capitals-of-the-world-alextesting", 
                  data=[
                      {"input": "What is the capital of China?", "expected_output": "Beijing"},
                      {"input": "Which city serves as both the capital of South Africa's executive branch and its administrative capital?", "expected_output": "Pretoria"},
                      {"input": "What is the capital of Switzerland, which many people incorrectly think is Geneva or Zurich?", "expected_output": "Bern"},
                      {"input": "Name the capital city that sits on both sides of the Danube River, formed by uniting Buda and Pest in 1873?", "expected_output": "Budapest"},
                      {"input": "Which city became Kazakhstan's capital in 1997, was renamed to Nur-Sultan in 2019, and then back to its original name in 2022?", "expected_output": "Astana"},
                      {"input": "What is the capital of Bhutan, located in a valley at an elevation of 7,874 feet?", "expected_output": "Thimphu"},
                      {"input": "Which city became the capital of Myanmar in 2006, a planned city built from scratch in the jungle?", "expected_output": "Naypyidaw"},
                      {"input": "What is the capital of Eritrea, known for its Italian colonial architecture and Art Deco buildings?", "expected_output": "Asmara"},
                      {"input": "Name the capital of Turkmenistan, which holds the world record for the highest density of white marble buildings?", "expected_output": "Ashgabat"},
                      {"input": "Which city is Kyrgyzstan's capital, situated at the foot of the Tian Shan mountains and named after a wooden churn used to make kumis?", "expected_output": "Bishkek"},
                      {"input": "What is the capital of Brunei, officially known as Bandar Seri Begawan, which translates to 'City of the Noble Ruler'?", "expected_output": "Bandar Seri Begawan"},
                      {"input": "Name the capital of Tajikistan, which was formerly known as Stalinabad from 1929 to 1961?", "expected_output": "Dushanbe"},
                      {"input": "Which city serves as the capital of Eswatini (formerly Swaziland), whose name means 'place of burning' in siSwati?", "expected_output": "Mbabane"}
                  ])
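Since each record is a plain dictionary, a quick shape check before constructing the dataset can catch a missing `input` early. The following is a minimal sketch under that assumption. `validate_records` is a hypothetical helper and not part of `ddtrace`. It only checks the record shape described above and does not reflect any validation the library itself performs.

```python
from typing import Any


def validate_records(records: list[dict[str, Any]]) -> None:
    """Raise ValueError if a record is not a dict or lacks an 'input' key.

    Hypothetical helper for illustration; it is not a ddtrace API.
    """
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {index} is not a dictionary")
        if "input" not in record:
            raise ValueError(f"record {index} is missing 'input'")
        # 'expected_output' is optional, so its absence is not an error.


records = [
    {"input": "What is the capital of China?", "expected_output": "Beijing"},
    {"input": "What is the capital of Bhutan?"},  # no expected_output
]
validate_records(records)  # returns None when every record passes
```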


You can access records using index notation:

In [None]:
print('Record at index 0 ->', dataset[0])
print('Records at indices 1 through 4 ->', dataset[1:5])
# etc...
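If `Dataset` follows standard Python sequence semantics (an assumption here, with a plain list standing in for the dataset), the end of a slice is exclusive, so `[1:5]` covers indices 1 through 4. A small sketch of that behavior:

```python
# A plain list stands in for the dataset; this illustrates Python slicing
# rules and does not call into ddtrace.
records = [{"input": f"question {i}"} for i in range(8)]

first = records[0]     # a single record
window = records[1:5]  # indices 1, 2, 3 and 4; index 5 is excluded

print(first["input"])
print([r["input"] for r in window])
print(len(window))
```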

To push the dataset to Datadog, use the `push()` method.

In [None]:
dataset.push()

## 2. Displaying the Dataset as a DataFrame
You can display the dataset as a pandas DataFrame for easier inspection.

In [None]:
# Display the dataset as a pandas dataframe
dataset.as_dataframe()

## 3. Pulling an Existing Dataset
To pull the dataset from Datadog, use the `pull()` method.

In [None]:
# Use the same name the dataset was created and pushed with.
dataset = Dataset.pull(name="capitals-of-the-world-alextesting")
dataset.as_dataframe()