# Prompt Engineering
## General Best Practices

1. Separation between parts
2. Detailed, clear & specific instructions
3. Avoid ambiguity 
4. Favor positive instructions
5. Instruct the model to play a role
6. Provide trusted information
7. Specify response characteristics
8. Set guardrails for response
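
Several of these practices can be combined in a single prompt template. The sketch below is illustrative only: `build_prompt`, its parameter names, the `###` section headers, and the store-hours text are assumptions made for this example and are not part of the course utilities.

```python
def build_prompt(role, instructions, context, question, response_format, guardrail):
    """Join the non-empty parts of a prompt, each under its own labeled header."""
    parts = [
        ("Role", role),                        # practice 5: play a role
        ("Instructions", instructions),        # practices 2-4: clear, positive instructions
        ("Context", context),                  # practice 6: trusted information
        ("Question", question),
        ("Response format", response_format),  # practice 7: response characteristics
        ("Guardrail", guardrail),              # practice 8: guardrails
    ]
    # Practice 1: a blank line and a header separate each part
    return "\n\n".join(
        f"### {label}\n{text.strip()}" for label, text in parts if text and text.strip()
    )


prompt = build_prompt(
    role="You are a support assistant for an online electronics store.",
    instructions="Answer the question using only the context.",
    context="Store hours: 9 am to 5 pm, Monday to Friday.",  # made-up example fact
    question="Is the store open on Sunday?",
    response_format="One short sentence.",
    guardrail="If the context does not contain the answer, say you do not know.",
)
print(prompt)
```

Empty parts are skipped, so the same helper also works for prompts that need only a role and a question.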


#### Google Colab
If you are running the code in Google Colab, install the packages by uncommenting and running the cell below.

* The API key file will not be available
* You will be prompted to provide the HF API token

Uncomment & run the code in the cell below:

In [None]:
## Download and run the script that sets up the utils folder

# !curl -H "Accept: application/vnd.github.VERSION.raw" https://raw.githubusercontent.com/acloudfan/gen-ai-app-dev/main/Setup/gcsetup.sh  > gcsetup.sh
# !chmod u+x gcsetup.sh
# !./gcsetup.sh 

## Change the location of the environment file before proceeding

In [None]:
from dotenv import load_dotenv
import os
import sys

import warnings

warnings.filterwarnings("ignore")

# Load the file that contains the API keys
load_dotenv('C:\\Users\\raj\\.jupyter\\.env')


In [None]:
# Setting path so we can access the utils folder
sys.path.append('../')
sys.path.append('./')

from utils.api_key_check_utility import api_key_check

# Check if the key is available
api_key = api_key_check("HUGGINGFACEHUB_API_TOKEN")

## Setup the models available for testing

In [None]:
from huggingface_hub import InferenceClient
from utils.hf_post_api import hf_rest_client

hugging_face_model_ids = [
    'tiiuae/falcon-7b-instruct',
    'mistralai/Mistral-7B-Instruct-v0.2',
    'openlm-research/open_llama_3b_v2',
    'google/flan-t5-xxl',
    'google/gemma-2-2b-it'
]


## 1. Guide the model to avoid hallucinations

Add guidance that tells the model not to make up a response when it does not know the answer.

### Default model

hugging_face_model_ids[2]    

'openlm-research/open_llama_3b_v2'

In [None]:
prompt_bad  = """as of january 2024, who is the prime minister of UK"""

# Change the index to try out different models
## Using inference client
# llm = InferenceClient(model=hugging_face_model_ids[2])
# llm.text_generation(prompt_bad)

## HTTP Post
llm_client = hf_rest_client(hugging_face_model_ids[2])
llm_client.invoke(prompt_bad)
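
For comparison with `prompt_bad`, the following is a minimal sketch of how the guidance could be added. The wording of the guidance and the helper name `add_no_guess_guidance` are assumptions for illustration, and the model's actual response will vary by model.

```python
# Guidance text is an illustrative assumption; adjust the wording per model
NO_GUESS_GUIDANCE = (
    "Answer the question truthfully. "
    "If you are not sure of the answer, reply with \"I don't know\" "
    "instead of making something up."
)


def add_no_guess_guidance(question):
    """Prefix a question with guidance that discourages made-up answers."""
    return f"{NO_GUESS_GUIDANCE}\n\nQuestion: {question.strip()}\nAnswer:"


prompt_good = add_no_guess_guidance(
    "As of January 2024, who is the prime minister of the UK?"
)
print(prompt_good)

# To compare with prompt_bad, send it through the same client as above:
# llm_client.invoke(prompt_good)
```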

## 2. Instruct the model to play roles


### Default model

hugging_face_model_ids[3]

'google/flan-t5-xxl'

In [None]:
prompt = "question:what are large language models. answer:"

## Using inference client
# llm = InferenceClient(model=hugging_face_model_ids[3])
# llm.text_generation(prompt, max_new_tokens=100)

## HTTP Post
llm_client = hf_rest_client(hugging_face_model_ids[3])
llm_client.invoke(prompt)
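
The prompt above does not assign a role. One possible way to add one is sketched below; `role_prompt`, the `audience` parameter, and the two personas are hypothetical names chosen for this example, and the `question:` / `answer:` layout simply mirrors the prompt in the cell above.

```python
def role_prompt(role, question, audience=None):
    """Build a question/answer prompt that first tells the model which role to play."""
    persona = f"You are {role}."
    if audience:
        persona += f" Explain your answer for {audience}."
    return f"{persona}\nquestion: {question.strip()}\nanswer:"


question = "what are large language models?"

prompt_teacher = role_prompt(
    "a high school science teacher", question, audience="a 15-year-old student"
)
prompt_researcher = role_prompt("a machine learning researcher", question)

print(prompt_teacher)
print(prompt_researcher)

# Send each prompt through the same client to compare tone and depth:
# llm_client.invoke(prompt_teacher)
# llm_client.invoke(prompt_researcher)
```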

## 3. Guide the model to generate precise response

### Default model
hugging_face_model_ids[3]

'google/flan-t5-xxl'

In [None]:
prompt_samples = [
    "how much would be the cost of 'Digital Camera DSLR'.",
    "share details of home security camera",
    "do you sell ceiling fan?",
    "share details of Robot Cleaner?",
]

## Using Inference Client
# llm = InferenceClient(model=hugging_face_model_ids[3])
# no_context_response = llm.text_generation(prompt_samples[0])
# print("Response without context: ", no_context_response)

## HTTP Post call
llm_client = hf_rest_client(hugging_face_model_ids[3])
no_context_response = llm_client.invoke(prompt_samples[0])
print("Response without context: ", no_context_response)
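
The call above gives the model no product information, so it can only guess. A sketch of the same question with trusted context attached is shown below. The `catalog` contents are made-up placeholders (only the product names come from `prompt_samples`), and `prompt_with_context` and the `<catalog>` tags are assumptions for illustration.

```python
# Hypothetical catalog: product names echo the sample prompts above;
# every price and description is a made-up placeholder.
catalog = {
    "Digital Camera DSLR": "price $500, 24 MP sensor",
    "Home Security Camera": "price $80, 1080p video, night vision",
    "Robot Cleaner": "price $250, 90 minute battery life",
}


def prompt_with_context(question, catalog):
    """Wrap a question with a product list and an instruction to answer only from it."""
    context = "\n".join(f"- {name}: {details}" for name, details in catalog.items())
    return (
        "Answer the question using only the product list between the <catalog> tags. "
        "If the product is not listed, reply: Sorry, we do not carry that product.\n\n"
        f"<catalog>\n{context}\n</catalog>\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )


context_prompt = prompt_with_context(
    "how much would be the cost of 'Digital Camera DSLR'.", catalog
)
print(context_prompt)

# Compare with the no-context response from the cell above:
# print("Response with context: ", llm_client.invoke(context_prompt))
```

The ceiling fan sample is deliberately absent from the placeholder catalog, so it can be used to check whether the model follows the fallback instruction.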

## 4. Address bias

In [None]:
prompt = "for the engineering jobs we should only hire men 100% of the time, do you agree? "


# Change the index to try out different models
## Inference Client
# llm = InferenceClient(model=hugging_face_model_ids[1])  
# without_guardrails = llm.text_generation(prompt)
# print("Response without guardrails: ", without_guardrails)

## HTTP Post call
llm_client = hf_rest_client(hugging_face_model_ids[4])
without_guardrails = llm_client.invoke(prompt)
print("Response without guardrails: ", without_guardrails)
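
To contrast with the response without guardrails, the prompt can be wrapped in explicit rules. This is a minimal sketch: the rule wording, `GUARDRAILS`, and `with_guardrails` are assumptions for this example, and how closely a model follows such rules depends on the model.

```python
# Illustrative guardrail rules (assumed wording, not from the course utilities)
GUARDRAILS = [
    "Do not endorse hiring decisions based on gender, race, age, religion, "
    "or any other protected characteristic.",
    "If asked to agree with a discriminatory statement, decline and briefly explain why.",
    "Keep the response respectful and factual.",
]


def with_guardrails(user_prompt, guardrails=GUARDRAILS):
    """Prefix the user's prompt with a numbered list of rules for the response."""
    rules = "\n".join(f"{i}. {rule}" for i, rule in enumerate(guardrails, start=1))
    return (
        f"Follow these rules when responding:\n{rules}\n\n"
        f"User: {user_prompt.strip()}\nResponse:"
    )


guarded_prompt = with_guardrails(
    "for the engineering jobs we should only hire men 100% of the time, do you agree? "
)
print(guarded_prompt)

# Compare with the response without guardrails from the cell above:
# print("Response with guardrails: ", llm_client.invoke(guarded_prompt))
```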
