# Introduction to Prompt Engineering with Large Language Models

In this notebook, you'll learn how to interact with large language models (LLMs) through prompting.

We will explore:

- Basic prompts
- Improving prompts with context
- Few-shot examples
- Using prompt templates

<a href="https://colab.research.google.com/github/cbadenes/semantic-report-search/blob/main/data/analysis/40_prompting_basics.ipynb" target="_parent">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>


In [1]:
from huggingface_hub import login
import getpass

token = getpass.getpass("🔑 Enter your Hugging Face token: ")
login(token)


🔑 Enter your Hugging Face token: ··········


### Parameters for `pipeline("text-generation", ...)`

- **model**: The preloaded model (e.g., `AutoModelForCausalLM`).
- **tokenizer**: The tokenizer that matches the model (e.g., `AutoTokenizer`).

#### Generation Parameters:

- **max_length**:
  - Total length of input + generated output, counted in tokens.
  - Use this for absolute control over sequence size.
  - Ignored when `max_new_tokens` is also set, because `max_new_tokens` takes precedence.

- **max_new_tokens**:
  - Limits the number of *new* tokens the model should generate.
  - Use this instead of `max_length` when input varies in length.
  - Recommended for most prompting scenarios.

- **truncation**:
  - If `True`, cuts input to fit within model limits.
  - Useful when feeding long text as prompt.

- **do_sample**:
  - If `True`, the model samples from the probability distribution (randomness).
  - If `False`, it picks the most likely next token (greedy decoding).
  - Recommended: `True` combined with a tuned `temperature` when varied, creative output is wanted.

- **temperature**:
  - Controls the randomness of sampling.
  - Lower = more deterministic (e.g., 0.3), higher = more creative (e.g., 0.8).
  - Only takes effect when `do_sample=True`; greedy decoding ignores it.

- **return_full_text**:
  - If `True`, the result includes both the prompt and generated text.
  - If `False`, it returns only the model's output.
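
To make the interaction between `do_sample` and `temperature` concrete, here is a minimal sketch in plain Python. It is a simplified illustration of the idea, not the library's implementation: the helper names and the three toy scores are hypothetical, and a real model works over a vocabulary of many thousands of tokens.

```python
import math
import random

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_next_token(logits, do_sample=False, temperature=1.0, rng=None):
    """Return the index of the chosen token: argmax if greedy, a random draw otherwise."""
    if not do_sample:
        # Greedy decoding: temperature plays no role here.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three candidate tokens.
toy_logits = [2.0, 1.0, 0.5]
for t in (0.3, 0.8, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the top-scoring token, so sampled output drifts toward what greedy decoding would produce; higher temperatures flatten the distribution and make less likely tokens more common.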


In [5]:
!pip install -q transformers accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", token=token)

Device set to use cuda:0


In [8]:
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_length=512,         # Total cap on input + output tokens; ignored here because max_new_tokens is set
    truncation=True,        # Truncate the input if it exceeds the model's maximum length
    do_sample=True,         # Enable sampling (randomness); set to False for greedy decoding
    return_full_text=True,  # If True, returns prompt + output; if False, only the generated part
    temperature=0.9,        # Controls randomness; lower = more deterministic (needs do_sample=True)
    max_new_tokens=10       # Cap on generated tokens only; 10 is very short, so replies get cut off
)


Device set to use cuda:0
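
When both `max_length` and `max_new_tokens` are passed, it helps to know how many tokens the model may actually generate. The sketch below models the rule described in the parameter list with simple arithmetic. The function name and the prompt lengths are hypothetical, and this is a simplified picture for a decoder-only model rather than the library's own code.

```python
def generation_budget(prompt_tokens, max_length=None, max_new_tokens=None):
    """Estimate how many new tokens may be generated for a prompt of a given length."""
    if max_new_tokens is not None:
        # max_new_tokens takes precedence and does not depend on the prompt length.
        return max_new_tokens
    if max_length is not None:
        # max_length covers prompt + output, so the prompt eats into the budget.
        return max(0, max_length - prompt_tokens)
    raise ValueError("set max_length or max_new_tokens")

# Hypothetical prompt of 40 tokens.
print(generation_budget(40, max_length=512, max_new_tokens=10))
print(generation_budget(40, max_length=512))
```

With the settings used in this notebook, the budget is fixed by `max_new_tokens`, which is why it is the safer choice when prompts vary in length.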


Basic Prompt:

In [10]:
prompt = (
    "<|system|>\n"
    "You are a helpful assistant.\n"
    "<|user|>\n"
    "Summarize the purpose of a report that analyzes the distribution flows in feeder markets.\n"
    "<|assistant|>\n"
)
response = generator(prompt)[0]['generated_text']
print(response)


<|system|>
You are a helpful assistant.
<|user|>
Summarize the purpose of a report that analyzes the distribution flows in feeder markets.
<|assistant|>
A report analyzing the distribution flows in feeder
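
The reply stops mid-sentence because the pipeline was created with `max_new_tokens=10`, and the prompt is echoed back because `return_full_text=True`. The following sketch shows one way to build the tagged prompt from its parts and to strip the echoed prompt from the result. Both helper names are hypothetical, and the generated text is a stand-in copied from the output above, so the cell runs without calling the model.

```python
def build_chat_prompt(user_message, system_message="You are a helpful assistant."):
    """Assemble the tagged prompt string in the format used in this notebook."""
    return (
        "<|system|>\n"
        f"{system_message}\n"
        "<|user|>\n"
        f"{user_message}\n"
        "<|assistant|>\n"
    )

def extract_reply(full_text, prompt):
    """Strip the echoed prompt from a result produced with return_full_text=True."""
    if full_text.startswith(prompt):
        return full_text[len(prompt):].strip()
    # Fallback: keep whatever follows the last assistant tag.
    return full_text.rsplit("<|assistant|>\n", 1)[-1].strip()

demo_prompt = build_chat_prompt(
    "Summarize the purpose of a report that analyzes the distribution flows in feeder markets."
)
# Stand-in for generator(demo_prompt)[0]["generated_text"], copied from the output above.
demo_full_text = demo_prompt + "A report analyzing the distribution flows in feeder"
print(extract_reply(demo_full_text, demo_prompt))
```

Many chat tokenizers also provide `apply_chat_template`, which builds the prompt from the model's own template. Whether its output matches the tags written by hand here is worth checking before relying on either form.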


## Exercise: Generating a Quality Description from Structured Metadata

In this exercise, you will build a prompt to help a language model generate a high-quality description for a business report based on structured metadata.

You are given a metadata table for a report with fields such as report name, KPIs, dimensions, and business domain.

Your task is to:
1. **Create a well-structured prompt** that could be sent to a language model (e.g., ChatGPT) to generate a clear, informative, and context-aware description of the report.
2. **Compare the generated description** with the original one written by a human expert.

| Field | Value |
|-------|-------|
| ID Data Product | RPPBI0004 |
| Report Name | eCommerce Report 2024 |
| Product Owner | Bradley Garcia |
| PBIX_File | ProductionReport.pbix |
| Report View | B2B Digital Report |
| Category | Functional |
| Description (original) | Specific view to analyze B2B Digital performance. Focused on Total Revenue, segmented by Channel, Brand, and Sub BU. |
| Dimensions | Channel, Subchannel, Sub BU, Feeder Market, Brand, Source System, Target Market, Country, Customer Segment |
| KPIs | Total Revenue |
| Other Terms | CCG, CGW, CRO B2B Digital, NHPro Agencies, NHP, Digital Share |
| Tags | B2B Digital |
| Filters | No |
| Priority | Priority 1 |


Write a prompt that could be used to generate a one-paragraph description of the report. The model should:

- Explain the purpose of the report
- Mention the key metrics and dimensions
- Be concise, neutral, and professional
- Reflect business relevance

Once you have written your prompt and obtained a model output, check it against the metadata table above and reflect on possible improvements.
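
As one possible starting point, and not the only valid answer, the prompt can be assembled from a template that combines the requirements listed above with selected metadata fields. In this sketch the dictionary holds a subset of the table, while `INSTRUCTIONS` and `build_report_prompt` are hypothetical names introduced for illustration.

```python
# A subset of the metadata table; add or remove fields to see how the output changes.
metadata = {
    "Report Name": "eCommerce Report 2024",
    "Report View": "B2B Digital Report",
    "Category": "Functional",
    "KPIs": "Total Revenue",
    "Dimensions": (
        "Channel, Subchannel, Sub BU, Feeder Market, Brand, Source System, "
        "Target Market, Country, Customer Segment"
    ),
    "Tags": "B2B Digital",
}

INSTRUCTIONS = (
    "Write a one-paragraph description of the report below. "
    "Explain its purpose, mention the key metrics and dimensions, "
    "keep the tone concise, neutral, and professional, "
    "and reflect its business relevance. "
    "Use only the information provided."
)

def build_report_prompt(fields, instructions=INSTRUCTIONS):
    """Fill the chat-style template with the instructions and the non-empty metadata fields."""
    lines = [f"- {name}: {value}" for name, value in fields.items() if value]
    user_message = instructions + "\n\nReport metadata:\n" + "\n".join(lines)
    return (
        "<|system|>\n"
        "You are a helpful assistant.\n"
        "<|user|>\n"
        f"{user_message}\n"
        "<|assistant|>\n"
    )

report_prompt = build_report_prompt(metadata)
print(report_prompt)
```

A full paragraph needs far more than 10 new tokens, so when sending this prompt to the pipeline it is reasonable to pass a larger limit at call time, for example `generator(report_prompt, max_new_tokens=150)`. The value 150 is an assumption to tune.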


In [None]:
# Starting point copied from the basic prompt: replace the user message with your own prompt
prompt = (
    "<|system|>\n"
    "You are a helpful assistant.\n"
    "<|user|>\n"
    "Summarize the purpose of a report that analyzes the distribution flows in feeder markets.\n"
    "<|assistant|>\n"
)
response = generator(prompt)[0]['generated_text']
print(response)

### Evaluation Criteria

You will compare your model-generated description to the original based on the following:

| Criterion | Description |
|----------|-------------|
| **Factual Accuracy** | Does it include the correct metrics and dimensions? |
| **Completeness** | Are key elements from the metadata represented? |
| **Clarity** | Is the generated description easy to understand? |
| **Brevity** | Is the output concise without omitting key info? |
| **Tone** | Is the tone appropriate for a business audience? |
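
The first two criteria can be partly checked automatically by looking for metadata terms in the generated text. The sketch below is a rough, hypothetical helper (`term_coverage` is not part of any library): it matches whole phrases literally and case-insensitively, so a paraphrase such as "business unit" for "Sub BU" is reported as missing and still needs a human judgment.

```python
import re

def term_coverage(description, terms):
    """Return (found, missing) lists of terms, matched case-insensitively as whole phrases."""
    text = description.lower()
    found, missing = [], []
    for term in terms:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, text):
            found.append(term)
        else:
            missing.append(term)
    return found, missing

kpis = ["Total Revenue"]
dimensions = ["Channel", "Brand", "Sub BU"]  # the three named in the original description
sample_description = (
    "This report provides an overview of B2B Digital performance, emphasizing total revenue "
    "and breaking it down by brand, channel, and business unit."
)

found, missing = term_coverage(sample_description, kpis + dimensions)
print("Found:", found)
print("Missing:", missing)
```

Clarity, brevity, and tone are harder to score this way and are better assessed by reading the output.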

---


In [None]:
from sentence_transformers import SentenceTransformer, util
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Download the required NLTK resources
nltk.download('punkt')
nltk.download('punkt_tab')  # Assumption: newer NLTK releases look for this resource instead of 'punkt'
nltk.download('stopwords')

# Sentence-embedding model; named `embedder` so it does not overwrite the LLM stored in `model`
embedder = SentenceTransformer('all-MiniLM-L6-v2')

# Example descriptions (editable)
reference = "Specific view to analyze B2B Digital performance. Focused on Total Revenue, segmented by Channel, Brand, and Sub BU."
generated = "This report provides an overview of B2B Digital performance, emphasizing total revenue and breaking it down by brand, channel, and business unit."

# 1. Semantic similarity
embedding_ref = embedder.encode(reference, convert_to_tensor=True)
embedding_gen = embedder.encode(generated, convert_to_tensor=True)
semantic_similarity = float(util.pytorch_cos_sim(embedding_ref, embedding_gen))

# 2. Basic metrics: precision, recall, and F1 over keywords
def get_keywords(text):
    tokens = word_tokenize(text.lower())
    stop_words = set(stopwords.words('english'))
    return set([w for w in tokens if w.isalnum() and w not in stop_words])

ref_keywords = get_keywords(reference)
gen_keywords = get_keywords(generated)

true_positives = len(ref_keywords & gen_keywords)
precision = true_positives / len(gen_keywords) if gen_keywords else 0
recall = true_positives / len(ref_keywords) if ref_keywords else 0
f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0

# 3. Show results
print("Evaluation Summary:")
print(f"Semantic Similarity (cosine): {semantic_similarity:.3f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
print(f"Reference Keywords: {sorted(ref_keywords)}")
print(f"Generated Keywords: {sorted(gen_keywords)}")
