# Working with datasets 📊

In this notebook, we'll cover how to create and work with datasets.

In [32]:
from dotenv import load_dotenv
load_dotenv(override=True)

from ddtrace.llmobs import Dataset

## 1. Creating a Dataset
In this example, we'll define a dataset programmatically by passing a list of dictionaries. Each dictionary represents a record containing an `input` and an optional `expected_output`. 

**Note**: Dataset names must be unique. If a dataset with the specified name already exists, the existing dataset will be returned instead of creating a new one.

Alternatively, you can create or upload datasets directly in Datadog. We encourage you to explore this workflow after completing the notebook.

In [50]:
# Create a dataset from a list of dictionaries: `input` is required, `expected_output` is optional.
dataset = Dataset(name="capitals-of-the-world", 
                  data=[
                      {"input": "What is the capital of China?", "expected_output": "Beijing"},
                      {"input": "Which city serves as both the capital of South Africa's executive branch and its administrative capital?", "expected_output": "Pretoria"},
                      {"input": "What is the capital of Switzerland, many people incorrectly think it's Geneva or Zurich?", "expected_output": "Bern"},
                      {"input": "Name the capital city that sits on both sides of the Danube River, formed by uniting Buda and Pest in 1873?", "expected_output": "Budapest"},
                      {"input": "Which city became Kazakhstan's capital in 1997, was renamed to Nur-Sultan in 2019, and then back to its original name in 2022?", "expected_output": "Astana"},
                      {"input": "What is the capital of Bhutan, located in a valley at an elevation of 7,874 feet?", "expected_output": "Thimphu"},
                      {"input": "Which city became the capital of Myanmar in 2006, a planned city built from scratch in the jungle?", "expected_output": "Naypyidaw"},
                      {"input": "What is the capital of Eritrea, known for its Italian colonial architecture and Art Deco buildings?", "expected_output": "Asmara"},
                      {"input": "Name the capital of Turkmenistan, which holds the world record for the highest density of white marble buildings?", "expected_output": "Ashgabat"},
                      {"input": "Which city is Kyrgyzstan's capital, situated at the foot of the Tian Shan mountains and named after a wooden churn used to make kumis?", "expected_output": "Bishkek"},
                      {"input": "What is the capital of Brunei, officially known as Bandar Seri Begawan, which translates to 'City of the Noble Ruler'?", "expected_output": "Bandar Seri Begawan"},
                      {"input": "Name the capital of Tajikistan, which was formerly known as Stalinabad from 1929 to 1961?", "expected_output": "Dushanbe"},
                      {"input": "Which city serves as the capital of Eswatini (formerly Swaziland), whose name means 'place of burning' in siSwati?", "expected_output": "Mbabane"}
                  ])
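
Because every record needs an `input` while `expected_output` is optional, it can help to check the list before constructing the dataset. The following is a minimal sketch in plain Python; `validate_records` is a hypothetical helper, not part of `ddtrace`, and it only checks the one requirement stated above.

```python
from typing import Any


def validate_records(records: list[Any]) -> list[str]:
    """Return a list of problems; an empty list means every record has an `input`."""
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"record {i}: expected a dict, got {type(record).__name__}")
            continue
        if "input" not in record:
            problems.append(f"record {i}: missing required key 'input'")
    return problems


records = [
    {"input": "What is the capital of China?", "expected_output": "Beijing"},
    {"input": "What is the capital of Bhutan?"},  # expected_output omitted: allowed
    {"expected_output": "Bern"},                   # input missing: flagged
]

problems = validate_records(records)
print(problems)
```

If the returned list is empty, the records can be passed as the `data` argument shown above.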


You can access a single record by index or a range of records by slice:

In [56]:
print('Record at index 0 ->', dataset[0])
print('Records between index 1 and 5 ->', dataset[1:5])
# etc...

Record at index 0 -> {'input': 'What is the capital of China?', 'expected_output': 'Beijing'}
Records between index 1 and 5 -> [{'input': "Which city serves as both the capital of South Africa's executive branch and its administrative capital?", 'expected_output': 'Pretoria'}, {'input': "What is the capital of Switzerland, many people incorrectly think it's Geneva or Zurich?", 'expected_output': 'Bern'}, {'input': 'Name the capital city that sits on both sides of the Danube River, formed by uniting Buda and Pest in 1873?', 'expected_output': 'Budapest'}, {'input': "Which city became Kazakhstan's capital in 1997, was renamed to Nur-Sultan in 2019, and then back to its original name in 2022?", 'expected_output': 'Astana'}]
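
The output above contains four records for `dataset[1:5]`, which matches standard Python slicing, where the stop index is exclusive. The sketch below uses a plain list as a stand-in for the dataset (an assumption made so the example runs without `ddtrace`) to show which positions a `1:5` slice covers.

```python
# A plain list standing in for a dataset of 13 records (hypothetical placeholder data).
records = [{"input": f"question {i}"} for i in range(13)]

# The slice starts at index 1 and stops before index 5.
window = records[1:5]

print(len(window))
print(window[0]["input"], "...", window[-1]["input"])
```

In other words, the slice covers indices 1 through 4, not 1 through 5.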


To push the dataset to Datadog, use the `push()` method.

In [47]:
dataset.push()

ValueError: Dataset 'capitals-of-the-world-alex' already exists. Dataset versioning will be supported in a future release. Please use a different name for your dataset.
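
The error above asks for a different dataset name. One way to avoid such collisions when re-running a notebook is to append a short random suffix to a base name. This is only a sketch: `unique_dataset_name` is a hypothetical helper, not part of `ddtrace`.

```python
import uuid


def unique_dataset_name(base: str) -> str:
    """Append an 8-character random hex suffix so repeated runs do not reuse a name."""
    suffix = uuid.uuid4().hex[:8]
    return f"{base}-{suffix}"


name = unique_dataset_name("capitals-of-the-world")
print(name)
```

The resulting string could then be used as the `name` argument when creating the dataset. The trade-off is that each run produces a new dataset, so old ones may need to be cleaned up separately.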

## 2. Displaying the Dataset as a DataFrame
You can display the dataset as a pandas DataFrame for easier visualization.

In [57]:
# Display the dataset as a pandas DataFrame
dataset.as_dataframe()

|   | expected_output | input |
|---|---|---|
| 0 | Beijing | What is the capital of China? |
| 1 | Pretoria | Which city serves as both the capital of South... |
| 2 | Bern | What is the capital of Switzerland, many peopl... |
| 3 | Budapest | Name the capital city that sits on both sides ... |
| 4 | Astana | Which city became Kazakhstan's capital in 1997... |
| 5 | Thimphu | What is the capital of Bhutan, located in a va... |
| 6 | Naypyidaw | Which city became the capital of Myanmar in 20... |
| 7 | Asmara | What is the capital of Eritrea, known for its ... |
| 8 | Ashgabat | Name the capital of Turkmenistan, which holds ... |


## 3. Pulling an Existing Dataset
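To retrieve a dataset that was previously pushed to Datadog, call `Dataset.pull()` with the dataset's name. As the output below shows, pulled records carry a `record_id` under `metadata`, and their order can differ from the order in which they were defined.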

In [49]:
dataset = Dataset.pull(name="capitals-of-the-world")
dataset.as_dataframe()

|   | expected_output | input | metadata: record_id |
|---|---|---|---|
| 0 | Astana | Which city became Kazakhstan's capital in 1997... | cf44f999-2299-45e6-b6fb-53eb12cf2288 |
| 1 | Budapest | Name the capital city that sits on both sides ... | 6c07d3d2-8c7a-46c7-af47-bd2520d3cbac |
| 2 | Dushanbe | Name the capital of Tajikistan, which was form... | 84647598-1bc2-43f8-a67c-0cbfd959f9e7 |
| 3 | Bern | What is the capital of Switzerland, many peopl... | 768dec53-3f94-4bda-bbc3-4353793b7440 |
| 4 | Beijing | What is the capital of China? | 9e93b8c0-dc06-4120-a6b5-ad874d769a58 |
| 5 | Pretoria | Which city serves as both the capital of South... | a522f5c5-bc60-496f-87c0-096fabd640ec |
| 6 | Bandar Seri Begawan | What is the capital of Brunei, officially know... | 55673d1f-52aa-4370-ad47-7d1df8ac0ac5 |
| 7 | Naypyidaw | Which city became the capital of Myanmar in 20... | 08c3aafe-a416-4f78-8880-f2b91d6efc7f |
| 8 | Thimphu | What is the capital of Bhutan, located in a va... | 074febc5-ad24-46b2-8445-bf59e8aac714 |
| 9 | Ashgabat | Name the capital of Turkmenistan, which holds ... | 7c94e126-c997-4084-9058-bc0b2cef316c |
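
The pulled records above come back in a different order than they were defined and include a `record_id`. To check that a pulled dataset still matches the local definition, one option is to compare records while ignoring order and metadata. The sketch below works on plain dictionaries; `record_key` and `same_records` are hypothetical helpers, and the nested `metadata` shape is an assumption inferred from the DataFrame columns. It also assumes `input` and `expected_output` are hashable values such as strings.

```python
from collections import Counter


def record_key(record: dict) -> tuple:
    """Identify a record by its content only, ignoring metadata such as record_id."""
    return (record["input"], record.get("expected_output"))


def same_records(local: list, pulled: list) -> bool:
    """Compare two record lists as multisets, so ordering does not matter."""
    return Counter(map(record_key, local)) == Counter(map(record_key, pulled))


local = [
    {"input": "What is the capital of China?", "expected_output": "Beijing"},
    {"input": "What is the capital of Switzerland, many people incorrectly think it's Geneva or Zurich?",
     "expected_output": "Bern"},
]

# Same content in a different order, with metadata attached (shape is an assumption).
pulled = [
    {"input": "What is the capital of Switzerland, many people incorrectly think it's Geneva or Zurich?",
     "expected_output": "Bern",
     "metadata": {"record_id": "768dec53-3f94-4bda-bbc3-4353793b7440"}},
    {"input": "What is the capital of China?", "expected_output": "Beijing",
     "metadata": {"record_id": "9e93b8c0-dc06-4120-a6b5-ad874d769a58"}},
]

print(same_records(local, pulled))
```

Using `Counter` rather than `set` means duplicate records are counted, so a dataset with a repeated question does not compare equal to one where it appears once.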
