# Introduction to Prompt Engineering with Large Language Models

In this notebook, you'll learn how to interact with large language models (LLMs) through prompting, using a small open chat model.

We will explore:

- Basic prompts
- Improving prompts with context
- Few-shot examples
- Using prompt templates


<a href="https://colab.research.google.com/github/cbadenes/semantic-report-search/blob/main/data/analysis/40_prompting_basics.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>


In [None]:
from huggingface_hub import login
import getpass

token = getpass.getpass("🔑 Enter your Hugging Face token: ")
login(token)


In [4]:
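# Install the libraries used in this notebook
!pip install -q transformers accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Small chat-tuned model (about 1.1B parameters)
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load the tokenizer and the model; device_map="auto" places the weights on the GPU when one is available
tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", token=token)

# Wrap both in a text-generation pipeline that is reused in every cell below
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)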


Device set to use cuda:0


## Basic Prompt

In [9]:
prompt = (
    "<|system|>\nYou are a helpful assistant.\n"
    "<|user|>\nSummarize the purpose of a report that analyzes the distribution flows in feeder markets.\n"
    "<|assistant|>\n"
)
response = generator(prompt, max_new_tokens=200)[0]['generated_text']
print(response)


<|system|>
You are a helpful assistant.
<|user|>
Summarize the purpose of a report that analyzes the distribution flows in feeder markets.
<|assistant|>
The purpose of a report that analyzes the distribution flows in feeder markets is to provide insights into the distribution of goods and services within a specific geographic area, including the distribution of goods and services between different regions or markets. The report may also examine the factors that influence the distribution of goods and services, such as transportation, logistics, and pricing. The report may also provide recommendations for improving the distribution of goods and services within the feeder market.
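
The output above repeats the whole prompt before the answer, because the pipeline returns the prompt and the completion together. A minimal sketch of two helpers follows: one assembles the tagged prompt, the other keeps only the text the model added. The names `build_chat_prompt` and `extract_reply` are hypothetical, and the tag layout is simply copied from the prompt string used in this notebook. A hard-coded string stands in for the model output so the sketch runs without loading a model.

```python
# Hypothetical helpers; the tag layout mirrors the prompt strings used in this notebook.

def build_chat_prompt(system: str, user: str) -> str:
    """Assemble a prompt with the <|system|>, <|user|> and <|assistant|> tags."""
    return (
        f"<|system|>\n{system}\n"
        f"<|user|>\n{user}\n"
        "<|assistant|>\n"
    )


def extract_reply(generated_text: str, prompt: str) -> str:
    """Return only the text that follows the prompt in the generated output."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    # Fallback: keep whatever follows the last assistant tag.
    return generated_text.rsplit("<|assistant|>", 1)[-1].strip()


prompt = build_chat_prompt(
    "You are a helpful assistant.",
    "Summarize the purpose of a report that analyzes the distribution flows in feeder markets.",
)

# Stand-in for generator(prompt, max_new_tokens=200)[0]["generated_text"]
fake_output = prompt + "The purpose of the report is to provide insights."
print(extract_reply(fake_output, prompt))
```

In the notebook itself, `fake_output` would be replaced by the real pipeline result. The text-generation pipeline also accepts `return_full_text=False`, which should make this post-processing unnecessary.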


## Improving the Prompt with Context

In [12]:
prompt = (
    "<|system|>\nYou are a helpful assistant that classifies business reports based on their description.\n"
    "<|user|>\n"
    "Here's an example of a report description and its purpose:\n"
    "Description: 'This report provides details about average daily rate (ADR), occupancy, and revenue trends in top-performing hotels.'\n"
    "Purpose: 'To analyze revenue performance metrics in hospitality properties.'\n"
    "Now analyze this one:\n"
    "Description: 'Shows contribution of each channel to the market flow across different regions.'\n"
    "Purpose:"
    "\n<|assistant|>\n"
)

response = generator(prompt, max_new_tokens=300)[0]['generated_text']
print(response)


<|system|>
You are a helpful assistant that classifies business reports based on their description.
<|user|>
Here's an example of a report description and its purpose:
Description: 'This report provides details about average daily rate (ADR), occupancy, and revenue trends in top-performing hotels.'
Purpose: 'To analyze revenue performance metrics in hospitality properties.
Now analyze this one:
Description: 'Shows contribution of each channel to the market flow across different regions.
Purpose:
<|assistant|>
This report provides insights into the contribution of each channel to the market flow across different regions. The report's purpose is to analyze the impact of different channels on the overall market performance. The report's description highlights the key findings and insights derived from the analysis.


## Few-Shot Prompting

In [15]:

prompt = (
    "<|system|>\nYou are a helpful assistant that classifies business reports based on their description.\n"
    "<|user|>\n"
    "Below are examples of report descriptions and their business categories:\n"
    "Description: 'Tracks performance metrics for feeder markets and hotel destinations.'\nCategory: 'Market Performance'\n"
    "Description: 'Analyzes average revenue per room and booking channels.'\nCategory: 'Revenue Management'\n"
    "Description: 'Evaluates staffing and service efficiency across hotel departments.'\nCategory: 'Operations'\n"
    "Now define the category of this one:\n"
    "Description: 'Breakdown of regional demand for group bookings.'\nCategory:"
    "\n<|assistant|>\n"
)
response = generator(prompt, max_new_tokens=50)[0]['generated_text']
print(response)


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


<|system|>
You are a helpful assistant that classifies business reports based on their description.
<|user|>
Below are examples of report descriptions and their business categories:
Description: 'Tracks performance metrics for feeder markets and hotel destinations.'
Category: 'Market Performance'
Description: 'Analyzes average revenue per room and booking channels.'
Category: 'Revenue Management'
Description: 'Evaluates staffing and service efficiency across hotel departments.'
Category: 'Operations'
Now define the category of this one:
Description: 'Breakdown of regional demand for group bookings.'
Category:
<|assistant|>
This report is categorized under the 'Market Performance' category. It is a breakdown of regional demand for group bookings.
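
Writing each example by hand becomes error-prone as the number of examples grows. A minimal sketch, assuming the same tag layout and wording as the prompt above, builds the few-shot prompt from a list of (description, category) pairs. The function name `build_few_shot_prompt` and its parameters are hypothetical.

```python
# Hypothetical helper; names and defaults are illustrative, the layout copies the prompt above.
def build_few_shot_prompt(
    system,
    examples,
    query,
    intro="Below are examples of report descriptions and their business categories:",
    ask="Now define the category of this one:",
):
    """Build a few-shot classification prompt from (description, category) pairs."""
    lines = [f"<|system|>\n{system}", "<|user|>", intro]
    for description, category in examples:
        lines.append(f"Description: '{description}'")
        lines.append(f"Category: '{category}'")
    # End on an open "Category:" cue so the model completes the label.
    lines += [ask, f"Description: '{query}'", "Category:", "<|assistant|>"]
    return "\n".join(lines) + "\n"


examples = [
    ("Tracks performance metrics for feeder markets and hotel destinations.", "Market Performance"),
    ("Analyzes average revenue per room and booking channels.", "Revenue Management"),
    ("Evaluates staffing and service efficiency across hotel departments.", "Operations"),
]

few_shot_prompt = build_few_shot_prompt(
    "You are a helpful assistant that classifies business reports based on their description.",
    examples,
    "Breakdown of regional demand for group bookings.",
)
print(few_shot_prompt)
```

The resulting string can be passed to `generator` exactly like the hand-written prompt. Keeping the examples in a list makes it easy to add, remove, or reorder them and compare the model's answers.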


## CRISPE Prompt Template (Guided Practice)

In [16]:
prompt = (
    "<|system|>\nYou are a helpful assistant trained to classify hotel business reports based on their descriptions.\n"
    "<|user|>\n"
    "Context: You're working with a set of hotel business reports.\n"
    "Role: You are a data analyst assistant.\n"
    "Instructions: Classify the report into one of these categories: ['Market', 'Revenue', 'Operations', 'HR'].\n"
    "Steps:\n"
    "- Read the description\n"
    "- Identify the key business focus\n"
    "- Map to a category\n\n"
    "Parameters:\n"
    "- Use only one label\n"
    "- Be concise\n\n"
    "Example:\n"
    "Description: 'Shows ADR and occupancy over time per destination'\n"
    "Category: Revenue\n\n"
    "Now classify:\n"
    "Description: 'Summarizes guest complaints and staff resolution time'\n"
    "Category:"
    "\n<|assistant|>\n"
)

response = generator(prompt, max_new_tokens=300)[0]['generated_text']
print(response)


<|system|>
You are a helpful assistant trained to classify hotel business reports based on their descriptions.
<|user|>
Context: You're working with a set of hotel business reports.
Role: You are a data analyst assistant.
Instructions: Classify the report into one of these categories: ['Market', 'Revenue', 'Operations', 'HR'].
Steps:
- Read the description
- Identify the key business focus
- Map to a category

Parameters:
- Use only one label
- Be concise

Example:
Description: 'Shows ADR and occupancy over time per destination'
Category: Revenue

Now classify:
Description: 'Summarizes guest complaints and staff resolution time'
Category:
<|assistant|>
Based on the description, the report is categorized as Revenue. The key focus is on guest complaints and staff resolution time.
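
Although the prompt asks for a single label, the reply above is a full sentence. One way to handle this, sketched here under the assumption that the reply names exactly one of the allowed categories, is to search the reply for those category names and reject anything ambiguous. The function `extract_label` is hypothetical, and `CATEGORIES` copies the list given in the prompt's instructions.

```python
import re

# Allowed labels, copied from the "Instructions" line of the prompt above.
CATEGORIES = ["Market", "Revenue", "Operations", "HR"]


def extract_label(reply, categories=CATEGORIES):
    """Return the single allowed category named in the reply, or None if zero or several match."""
    found = [
        category
        for category in categories
        if re.search(rf"\b{re.escape(category)}\b", reply, re.IGNORECASE)
    ]
    return found[0] if len(found) == 1 else None


reply = (
    "Based on the description, the report is categorized as Revenue. "
    "The key focus is on guest complaints and staff resolution time."
)
print(extract_label(reply))  # → Revenue
```

Returning `None` for ambiguous or missing labels makes it possible to flag those replies for a retry or a manual check instead of silently accepting them. The check only confirms that the label is one of the allowed values, not that the classification itself is right.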
