# Llama3.1 Dataset Maker for Nvidia NIM

This notebook is intended to run in a Llama3.1 NIM environment.
<br>
To set it up, please watch my video tutorial on the topic.

## Verify that Llama is Available
Query the NIM health endpoint to check whether Llama3.1 is up and ready to serve requests on port 8000.

In [None]:
!curl localhost:8000/v1/health/ready
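If you prefer to stay in Python, or want to wait for the container to finish loading the model before running the rest of the notebook, you can poll the same endpoint in a loop. The sketch below uses only the standard library. It assumes the health endpoint answers with HTTP status 200 once the model is ready. The helper name `wait_until_ready` and its `timeout` and `interval` parameters are hypothetical and not part of NIM.

```python
import time
import urllib.request


def wait_until_ready(url="http://localhost:8000/v1/health/ready", timeout=300.0, interval=5.0):
    """Poll `url` until it answers HTTP 200 or `timeout` seconds have passed.

    Hypothetical helper; assumes the NIM health endpoint returns 200 when ready.
    Returns True if the server became ready, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # connection refused, socket timeout, or HTTP error status
            # (urllib's URLError and HTTPError are both OSError subclasses)
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_ready()` at the top of the notebook blocks until the health check succeeds. If it has not succeeded after the default five minutes, it returns `False`.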

## Create Dataset

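The following cell chains prompts to build a dataset automatically. The first prompt asks the model for the names of all countries. Each name is then inserted into two follow-up prompts that ask for that country's capital city and national food. The answers are collected in a pandas DataFrame and saved as `data.csv`.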
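Without the NIM client, the chaining pattern works like this. One prompt returns a list, and each item in that list is substituted into follow-up prompts whose answers form one row. The sketch below illustrates this pattern in isolation. The names `build_rows` and `fake_ask` are hypothetical. The function takes the question-asking function as a parameter, so you can try the logic with a placeholder model and no running server. In the notebook, you would pass `ask_question` instead. The prompts are the notebook's own prompts.

```python
from typing import Callable


def build_rows(ask: Callable[[str], str]) -> list[list[str]]:
    """Chain prompts: the answer to the first prompt drives the follow-up prompts."""
    # step 1: a single prompt returns a comma-separated list of names
    answer = ask("names of all countries separated by commas in an alphabetical order. "
                 "names only, with no other output")
    countries = [name.strip() for name in answer.split(",") if name.strip()]

    # step 2: each name is inserted into follow-up prompts; the answers form one row
    rows = []
    for country in countries:
        capital = ask(f"what is the capital city of {country}. just the name")
        food = ask(f"what is the national food of {country}. just the name")
        rows.append([country, capital, food])
    return rows


def fake_ask(prompt: str) -> str:
    """Hypothetical stand-in for ask_question that returns placeholder text, not real facts."""
    if "names of all countries" in prompt:
        return "Andorra, Bhutan"
    country = prompt.split(" of ", 1)[1].split(". just the name")[0]
    kind = "capital" if "capital city" in prompt else "food"
    return f"<{kind} of {country}>"


print(build_rows(fake_ask))
```

With a live model, `rows = build_rows(ask_question)` followed by `pd.DataFrame(rows, columns=["country", "capital", "food"])` would produce the same kind of table. Collecting the rows first and creating the DataFrame once avoids growing it row by row with `data.loc[i]`.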

In [65]:
from openai import OpenAI
import pandas as pd

# create empty dataframe 
data = pd.DataFrame(columns=["country", "capital", "food"])

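# specify model location: point the OpenAI client at the local NIM endpoint
# (the client requires an api_key value; the local NIM accepts a placeholder)
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not_used",
)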

def ask_question(user_input):
    # send a single-turn chat request with these model settings
    chat_response = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": user_input}],
        temperature=0.5,
        top_p=1,
        max_tokens=1024,
        # return the complete reply at once instead of streaming tokens
        stream=False,
    )
    # drop leading/trailing whitespace and newlines from the reply
    return chat_response.choices[0].message.content.strip()

# fetch names of all world countries
all_countries = ask_question("""
names of all countries separated by commas in an alphabetical order.
names only, with no other output
""")
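# split on commas, trim whitespace and newlines, and skip empty entries
all_countries = [name.strip() for name in all_countries.split(",") if name.strip()]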

# iterate over all country names
for i, country in enumerate(all_countries):
    # fetch attributes for each country
    capital = ask_question(f"what is the capital city of {country}. just the name")
    food = ask_question(f"what is the national food of {country}. just the name")
    # store country and attributes in the pre-defined dataframe
    data.loc[i] = [country, capital, food]

# save data.csv in the current directory
# (no header row; the row index is written as the first column)
data.to_csv("data.csv", header=False)
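The CSV is written without a header row. Because `to_csv` keeps its default `index=True`, the first column holds the DataFrame's row numbers. To read the file back, you therefore need `header=None` and `index_col=0`, and you have to supply the column names again. The sketch below is minimal. `load_dataset` is a hypothetical helper name, and the column names are the ones the notebook uses.

```python
import pandas as pd

# column names from the notebook's DataFrame; data.csv itself does not store them
COLUMNS = ["country", "capital", "food"]


def load_dataset(path_or_buffer="data.csv") -> pd.DataFrame:
    """Load the headerless CSV written by the dataset maker."""
    df = pd.read_csv(
        path_or_buffer,
        header=None,            # the first line is data, not column names
        index_col=0,            # the first column is the row index written by to_csv
        keep_default_na=False,  # keep answers such as "None" or "NA" as text
    )
    df.columns = COLUMNS
    df.index.name = None
    return df
```

The `keep_default_na=False` setting is a design choice. Without it, pandas would turn model answers that happen to read "None", "NA", or "null" into missing values.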

## Workflow: All Countries Output

The next cell was used to tune the prompt that produces `all_countries` in the cell above. You don't need to run it, because the same code is already part of the dataset maker cell.
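When tuning this prompt, it is easier to check the parsed list programmatically than to read it by eye. The sketch below uses a hypothetical helper, `check_name_list`. It splits the raw answer on commas or newlines, strips whitespace and a trailing period, and reports the count, any duplicates, and whether the names are in alphabetical order. The alphabetical check uses an assumed sort key that ignores case and accents, so "Côte d'Ivoire" sorts like "Cote d'Ivoire". Splitting on commas would also break a name that contains a comma.

```python
import re
import unicodedata


def _sort_key(name: str) -> str:
    # compare without accents or case: decompose characters, drop combining marks
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def check_name_list(raw: str) -> dict:
    """Parse a comma- or newline-separated model answer and report basic problems."""
    names = [part.strip().rstrip(".").strip() for part in re.split(r"[,\n]", raw)]
    names = [name for name in names if name]

    seen, duplicates = set(), []
    for name in names:
        if name in seen and name not in duplicates:
            duplicates.append(name)
        seen.add(name)

    keys = [_sort_key(name) for name in names]
    return {
        "names": names,
        "count": len(names),
        "duplicates": duplicates,
        "alphabetical": keys == sorted(keys),
    }
```

For example, `check_name_list(all_countries_raw)` could be run on the unsplit reply, where `all_countries_raw` is a hypothetical variable holding the text returned by `ask_question`, to see whether a prompt change improved the list.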

In [63]:
all_countries = ask_question("""
names of all countries separated by commas in an alphabetical order.
names only, with no other output
""")
all_countries = [name.strip() for name in all_countries.split(",") if name.strip()]
print(all_countries)

['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Brunei', 'Bulgaria', 'Burkina Faso', 'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros', 'Congo', 'Costa Rica', "Côte d'Ivoire", 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic', 'Denmark', 'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon', 'Gambia', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Grenada', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy', 'Jamaica', 'Japan', 'Jordan', 