# Llama3.1 Dataset Maker for Nvidia NIM

This notebook is intended to run in a Llama3.1 NIM environment via Brev.dev.
<br>
Before running the setup cell, make sure to **paste your NGC API key** where indicated.

## Setup Llama3.1 NIM

In [None]:
%%bash 

# Install Docker CLI
echo "Installing Docker CLI..."
sudo apt-get update
sudo apt-get install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg \
    lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y docker-ce-cli

#################################################
# setup Llama3.1 NIM
export NGC_API_KEY= # paste your NGC API KEY here
#################################################

# Log in to NGC
echo "${NGC_API_KEY}" | docker login nvcr.io -u '$oauthtoken' --password-stdin

# Set path to your LoRA model store
export LOCAL_PEFT_DIRECTORY="$(pwd)/loras"
mkdir -p "$LOCAL_PEFT_DIRECTORY"
# Copy or download any LoRA adapters into $LOCAL_PEFT_DIRECTORY at this point

# Open up permissions on the LoRA model store
chmod -R 777 "$LOCAL_PEFT_DIRECTORY"

# Set up NIM cache directory
mkdir -p $HOME/.nim-cache

export NIM_PEFT_SOURCE=/workspace/loras # Path to LoRA models internal to the container
export CONTAINER_NAME=meta-llama3_1-8b-instruct
export NIM_CACHE_PATH=$HOME/.nim-cache
export NIM_PEFT_REFRESH_INTERVAL=60

docker run -d --name=$CONTAINER_NAME \
    --network=container:verb-workspace \
    --runtime=nvidia \
    --gpus all \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -e NIM_PEFT_SOURCE \
    -e NIM_PEFT_REFRESH_INTERVAL \
    -v $HOME/.nim-cache:/home/user/.nim-cache \
    -v /home/ubuntu/workspace:/workspace \
    -w /workspace \
    nvcr.io/nim/meta/llama-3.1-8b-instruct:1.1.0

# Check if NIM is up
echo "Checking if NIM is up..."
while true; do
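    # -f makes curl fail on HTTP errors, so the loop ends only when the readiness endpoint succeeds
    if curl -sf http://localhost:8000/v1/health/ready > /dev/null; then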
        echo "NIM has been started successfully!"
        break
    else
        echo "NIM is not up yet. Checking again in 10 seconds..."
        sleep 10
    fi
done
 

## Verify that Llama is Available
Query the readiness endpoint to confirm that Llama3.1 is listening on port 8000.

In [None]:
!curl localhost:8000/v1/health/ready
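
The same readiness check can also be run from Python with an upper bound on the wait, which avoids a cell that spins forever if the container failed to start. The following is a minimal sketch, not part of the original notebook: the function names `nim_is_ready` and `wait_until_ready` and the default limits are assumptions chosen for illustration, and only the URL comes from the cell above.

```python
import time
import urllib.error
import urllib.request


def nim_is_ready(url="http://localhost:8000/v1/health/ready", timeout_s=5):
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until_ready(probe=nim_is_ready, max_wait_s=600, interval_s=10, sleep=time.sleep):
    """Call probe() until it returns True or max_wait_s seconds were spent sleeping."""
    waited = 0
    while True:
        if probe():
            return True
        if waited >= max_wait_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Calling `wait_until_ready()` in a cell returns `True` once the endpoint responds and `False` if the (assumed) ten-minute budget runs out, at which point `docker logs` for the container is the next place to look.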

## Install OpenAI
To run the dataset maker, the `openai` module is required.

In [None]:
!pip install openai
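
Once the client library is installed, it can help to confirm which model IDs the server advertises before sending chat requests, since a mismatched model name is an easy mistake to make. The sketch below assumes the NIM endpoint supports the OpenAI-compatible model listing call (`client.models.list()`); the helper names `list_model_ids` and `resolve_model_id` are hypothetical and not part of the original notebook.

```python
def list_model_ids(client):
    """Return the model IDs advertised by an OpenAI-compatible server."""
    return [model.id for model in client.models.list()]


def resolve_model_id(available_ids, wanted):
    """Return `wanted` if the server lists it, otherwise raise with the options."""
    if wanted in available_ids:
        return wanted
    raise ValueError(f"{wanted!r} is not served; available: {sorted(available_ids)}")


# Example usage against the local endpoint (requires the running NIM container):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="not_used")
# print(list_model_ids(client))
```

If the ID used in a request is not in the printed list, copy the exact string from the list instead.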

## Create Dataset

In the following cell, you will see how to chain prompt output to generate datasets automatically.

In [None]:
from openai import OpenAI
import pandas as pd

# create empty dataframe 
data = pd.DataFrame(columns=["country", "capital", "food"])

# specify model location
client = OpenAI(
  base_url = "http://localhost:8000/v1",
  api_key = "not_used"
)

def ask_question(user_input):
    # specify model settings
    chat_response = client.chat.completions.create(
        # please note: model name written without a dot symbol in this implementation
        model="meta/llama-3_1-8b-instruct",
        messages=[{"role": "user", "content": user_input}],
        temperature=0.5,
        top_p=1,
        max_tokens=1024,
        # return output as a single unit of text
        stream=False,
    )

    return chat_response.choices[0].message.content

# fetch names of all world countries
all_countries = ask_question("""
names of all countries separated by commas in an alphabetical order.
names only, with no other output
""")
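# split on commas and trim whitespace/newlines; skip empty fragments
all_countries = [name.strip() for name in all_countries.split(",") if name.strip()]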

# iterate over all country names
for i, country in enumerate(all_countries):
    # fetch attributes for each country
    capital = ask_question("what is the capital city of " + country + ". just the name")
    food = ask_question("what is the national food of " + country + ". just the name")
    # store country and attributes in the pre-defined dataframe
    data.loc[i] = [country, capital, food]

# save CSV file in the current directory (header=None writes no header row)
data.to_csv("data.csv", header=None)
print("CSV file was successfully saved as data.csv in the current working directory! :)")
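
The chaining pattern above (one prompt produces a list, follow-up prompts fill in attributes per item) can be tried without a running model by passing in the question function. This is a minimal sketch under stated assumptions: `parse_name_list`, `build_rows`, and `rows_to_csv` are hypothetical helpers, the standard `csv` module stands in for pandas, and, unlike the cell above, a header row is written.

```python
import csv
import io


def parse_name_list(text):
    """Split a comma- or newline-separated reply into clean, de-duplicated names."""
    names = []
    for piece in text.replace("\n", ",").split(","):
        name = piece.strip()
        if name and name not in names:
            names.append(name)
    return names


def build_rows(names, ask):
    """Chain prompts: ask one follow-up question per attribute for each name."""
    rows = []
    for name in names:
        capital = ask(f"what is the capital city of {name}. just the name").strip()
        food = ask(f"what is the national food of {name}. just the name").strip()
        rows.append({"country": name, "capital": capital, "food": food})
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["country", "capital", "food"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With the notebook's own function, `build_rows(parse_name_list(reply), ask_question)` would produce the same columns as the dataframe; swapping in a stub for `ask_question` makes it possible to check the parsing and CSV layout before spending time on model calls.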