# Set credentials

Enter an OpenAI API key so that the queries below can run. Replace the placeholder string with your own key.

In [None]:
import os
import nest_asyncio
# Patch asyncio so nested event loops work inside the notebook's running loop
nest_asyncio.apply()

os.environ["OPENAI_API_KEY"] = 'YOUR-OPENAI-KEY'
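
Hardcoding a key in a notebook makes it easy to commit by accident. As a minimal sketch of one alternative, the helper below reads the key from the environment and prompts for it only when it is missing. The function name `load_openai_key` and its parameters are hypothetical and not part of LlamaIndex or the OpenAI client.

```python
import os
from getpass import getpass


def load_openai_key(env=os.environ, prompt=getpass):
    """Return the OpenAI key from `env`, prompting only if it is missing.

    `env` and `prompt` are parameters so the helper can be exercised
    without touching the real environment or blocking on user input.
    """
    key = env.get("OPENAI_API_KEY")
    if not key:
        # getpass hides the typed value, so the key is not echoed in the cell output
        key = prompt("OpenAI API key: ")
        env["OPENAI_API_KEY"] = key
    return key
```

Calling `load_openai_key()` in a cell would then take the place of the assignment above.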

# Small dataset example

First, we create a very small dataset of city populations, wrap it in a pandas DataFrame, and have LlamaIndex index that DataFrame so it can be queried through OpenAI.

In [None]:
from IPython.display import Markdown, display
from llama_index.indices.struct_store import GPTPandasIndex
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Toronto", "Tokyo", "Berlin"],
        "population": [2930000, 13960000, 3645000]
    }
)

index = GPTPandasIndex(df=df)


# Query the dataset

Even this tiny index gives OpenAI enough context to answer a natural-language query about the data. Let's find out which city is the largest.

In [None]:
query_engine = index.as_query_engine(
    verbose=True
)
response = query_engine.query(
    "What is the city with the highest population?",
)

display(Markdown(f"<b>{response}</b>"))
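
To sanity-check the model's answer, the same question can be asked of the DataFrame directly. This is a sketch in plain pandas, not something the query engine is documented to run; the variable names `cities` and `largest_city` are chosen here for illustration.

```python
import pandas as pd

# Same three-row dataset as in the cell above
cities = pd.DataFrame(
    {
        "city": ["Toronto", "Tokyo", "Berlin"],
        "population": [2930000, 13960000, 3645000],
    }
)

# idxmax() returns the row label of the largest population value;
# .loc then picks the city name from that row
largest_city = cities.loc[cities["population"].idxmax(), "city"]
print(largest_city)  # prints "Tokyo"
```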

# Ingest CSV File

Now we'll use a somewhat larger dataset. We read a CSV file of about 1,000 Titanic passengers into a pandas DataFrame and index it as before.

In [None]:
df = pd.read_csv("../data/titanic_train.csv")
index = GPTPandasIndex(df=df)


# Query the dataset

We can query this dataset the same way. Your data scientists can apply more advanced tuning and transformations to the dataset to enable more complex queries.

In [None]:
query_engine = index.as_query_engine(
    verbose=True
)
response = query_engine.query(
    "What is the correlation between survival and age?",
)

display(Markdown(f"<b>{response}</b>"))
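
The correlation can also be computed directly in pandas to cross-check the response. The sketch below assumes the CSV uses the column names `Survived` and `Age`, as the common Kaggle Titanic training file does; adjust them if your file differs. The helper name `survival_age_correlation` is hypothetical.

```python
import pandas as pd


def survival_age_correlation(frame, survived_col="Survived", age_col="Age"):
    """Pearson correlation between two numeric columns.

    The column names are assumptions about the CSV layout. Rows with a
    missing value in either column are dropped before computing.
    """
    subset = frame[[survived_col, age_col]].dropna()
    return subset[survived_col].corr(subset[age_col])
```

With the Titanic DataFrame loaded as `df`, `survival_age_correlation(df)` returns a single float that can be compared with the number the query engine reports.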