"The hottest new programming language is English" - Andrej Karpathy, 24 Jan 2023
Prompt engineering is the craft of writing input queries (prompts) that communicate effectively with AI models like ChatGPT. Think of it as writing instructions for a highly capable yet sometimes unpredictably dumb personal assistant.
This guide serves as a hands-on resource for developers and early adopters using large language models (LLMs). It goes beyond the usual one-off task prompts, focusing instead on processing large quantities of inputs via an API. When manual review of every output isn't feasible, it's critical to evaluate and manage the trade-offs between cost, speed, and output quality. That is why we emphasize the 'engineering' part of prompt engineering here.
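To make the API-scale setting concrete, here is a minimal batch-processing sketch (assuming the openai Python package and an API key in the environment; the classify_ticket task, prompt, and model choice are all illustrative):

```python
# A minimal batch-processing sketch (assumes the `openai` package and
# an OPENAI_API_KEY in the environment; prompt and model are illustrative).
from openai import OpenAI

client = OpenAI()

def classify_ticket(text: str) -> str:
    """Send one input through a fixed prompt and return the model's label."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works here
        messages=[
            {"role": "system", "content": "Classify the support ticket as: technical, billing, or general. Reply with the label only."},
            {"role": "user", "content": text},
        ],
        temperature=0,  # near-deterministic output for repeatable batch runs
    )
    return response.choices[0].message.content.strip()

tickets = ["My invoice is wrong.", "The app crashes on startup."]
labels = [classify_ticket(t) for t in tickets]  # at scale: batch, rate-limit, retry
print(labels)
```

At real scale you would add rate limiting, retries, and caching; the point is that one fixed prompt is applied to many inputs, which is why its quality matters so much.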
Our aim with this guide is to organize links to key external resources, and give concise commentary to help you find what's relevant for your task.
If you want to contribute to this guide, please open an issue, send a PR, or email me at prompts@matthiasberth.com.
In the exploration phase of prompt engineering, the focus is on generating a range of candidate prompts that perform effectively on example inputs. This phase involves using a playground environment to experiment with various combinations of instructions, examples, and inputs, allowing for the identification and resolution of issues. Rapid iteration and drawing inspiration from existing prompts in the wild are key strategies during this phase.
- Prompt engineering guide from OpenAI
The guide covers "Six strategies for getting better results":
- Write clear instructions
- Provide reference text
- Split complex tasks into simpler subtasks
- Give the model time to "think"
- Use external tools
- Test changes systematically
Each comes with a set of tactics (like "Ask the model if it missed anything on previous passes"). The guide provides direct links to the OpenAI playground where you can try out examples.
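As an illustration of the quoted tactic, a follow-up turn can ask the model to check its own previous pass. A sketch, again assuming the openai package (the prompts paraphrase the guide's idea rather than quoting it):

```python
from openai import OpenAI

client = OpenAI()
document = "..."  # the reference text to extract from

messages = [
    {"role": "system", "content": "List all action items mentioned in the document."},
    {"role": "user", "content": document},
]
first_pass = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
messages.append({"role": "assistant", "content": first_pass.choices[0].message.content})

# The tactic: explicitly ask the model whether it missed anything earlier.
messages.append({"role": "user", "content": "Are there more action items you missed on previous passes?"})
second_pass = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(second_pass.choices[0].message.content)
```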
- Microsoft's Introduction to prompt engineering
The intro covers common techniques and best practices. The companion techniques article discusses chain-of-thought prompting and the influence of the temperature parameter, among other topics.
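Both ideas are easy to try from code. In this sketch (assuming the openai package), the "let's think step by step" suffix is the classic zero-shot chain-of-thought trigger, and a low temperature trades creativity for consistency:

```python
from openai import OpenAI

client = OpenAI()

question = "A crate holds 12 boxes of 8 widgets each. 30 widgets are defective. How many good widgets are there?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # Chain-of-thought prompting: ask for intermediate reasoning first.
        {"role": "user", "content": question + "\nLet's think step by step, then state the final answer."},
    ],
    temperature=0,  # low temperature: more deterministic, good for reasoning tasks
)
print(response.choices[0].message.content)
```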
- Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4
This research paper presents 26 guiding principles and evaluates their effectiveness across several models.
- CO-STAR framework
Suggests structuring the prompt as Context, Objective, Style, Tone, Audience, Response. This makes a lot of sense and helped the author win a competition. I'm still trying to track down original sources for the CO-STAR framework and that competition.
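The framework translates directly into a prompt template. A sketch (the section labels follow CO-STAR; the field contents are invented for illustration):

```python
# A CO-STAR prompt template sketch; field contents are illustrative.
costar_prompt = """\
# CONTEXT
You are writing for a SaaS company's release notes page.

# OBJECTIVE
Summarize the changelog below for end users.

# STYLE
Concise, plain language, no internal jargon.

# TONE
Friendly and professional.

# AUDIENCE
Non-technical customers of the product.

# RESPONSE
A bulleted list of at most five items.

Changelog:
{changelog}
"""

print(costar_prompt.format(changelog="- Fixed crash on login\n- Added CSV export"))
```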
- Many of the prompts in such collections are geared to everyday use, but some categories are relevant to the tasks covered here.
Prompt collections / Libraries
- LangChain Hub collects prompts in a variety of areas, e.g. Tagging, Summarization, Extraction.
- LangChain also has prompts baked into its code. For example, here is a set of prompts for checking the correctness of summarizations: langchain/chains/llm_summarization_checker/prompts. So you can look up a use case in the LangChain docs (Summarization) and locate the relevant code. The same approach works for frameworks that ship prompts with their code, e.g. langchain, Llamaindex.
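Pulling a prompt from LangChain Hub takes one call. A sketch, assuming the langchain and langchainhub packages are installed ("rlm/rag-prompt" is one of the publicly listed hub prompts; exact APIs vary across LangChain versions):

```python
from langchain import hub

# Pull a published prompt from LangChain Hub by its handle.
prompt = hub.pull("rlm/rag-prompt")  # a public RAG prompt on the hub

# Fill in its input variables to see the final prompt text.
print(prompt.format(context="Paris is the capital of France.",
                    question="What is the capital of France?"))
```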
Finding examples by task / use case
Know the general category of your task so you can search effectively for prompt examples, papers, and benchmark datasets. (A worked data-extraction example follows the list.)
- Data Extraction. Example: extracting product numbers and due dates from unstructured orders received via email. (google this)
- Sentiment Analysis. Example: analyzing customer feedback to determine sentiment towards a product or service. (google this)
- Chatbot Conversations. Example: developing chatbots for handling customer service inquiries. (google this)
- Text Classification. Example: categorizing support tickets into departments like technical, billing, or general inquiries. (google this)
- Named Entity Recognition (NER). Example: identifying company names in financial reports. (google this)
- Keyword Extraction. Example: extracting relevant keywords for SEO or document summarization. (google this)
- Language Translation. Example: translating business documents or communications between languages. (google this)
- Summarization. Example: generating concise summaries of long documents like business reports. (google this)
- Topic Modeling. Example: identifying main topics in customer feedback or a collection of articles. (google this)
- Spam Detection. Example: filtering out spam comments in a forum. (google this)
- Intent Recognition. Example: understanding the intent behind customer messages in chatbot interactions. (google this)
- Text Generation. Example: automatically generating text like product descriptions based on data inputs. (google this)
- Question Answering Systems. Example: building systems for answering customer questions in natural language. (google this)
- Emotion Detection. Example: identifying emotional states in text to understand customer sentiment. (google this)
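As the promised worked example for the first category, here is a data-extraction sketch (assuming the openai package; the field names and the email are invented for illustration):

```python
import json
from openai import OpenAI

client = OpenAI()

email = """Hi, please ship 5 units of part no. XK-42 to our warehouse.
We need them by March 14. Thanks, Dana"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},  # ask for machine-readable output
    messages=[
        {"role": "system", "content": "Extract order data from the email. Reply as JSON with keys: product_number, due_date, quantity. Use null for missing fields."},
        {"role": "user", "content": email},
    ],
    temperature=0,
)
order = json.loads(response.choices[0].message.content)
print(order)  # e.g. {"product_number": "XK-42", "due_date": "March 14", "quantity": 5}
```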
- OpenAI Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
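Even without adopting the full framework, the core idea, scoring a fixed prompt against labeled examples, fits in a few lines. A homegrown sketch, not the Evals API (classify_ticket is the hypothetical function from the batch-processing sketch above, and the labeled set is invented):

```python
# A minimal exact-match eval: run a labeled set through the prompt, report accuracy.
labeled = [
    ("My invoice is wrong.", "billing"),
    ("The app crashes on startup.", "technical"),
    ("What are your office hours?", "general"),
]

hits = sum(classify_ticket(text) == expected for text, expected in labeled)
print(f"accuracy: {hits}/{len(labeled)} = {hits / len(labeled):.0%}")
```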
When all your prompt engineering efforts still don't give good enough results, you can try some alternatives:
- Use another model. If you haven't done so already, try a different model with roughly the same or better capabilities. Keep in mind that performance is determined by the combination of model and prompt, so you may want to iterate on your best prompt.
- Fine-tune an existing model. You can select examples from your current dataset, or create them by hand.
- Invest in better examples for a few-shot prompt. Think about providing more examples, more diverse examples, and both positive and negative examples (see the few-shot sketch after this list). If you're using RAG, try investing in the retrieval part of the pipeline.
- Use ensembles / mixtures of experts. Solve the same task with multiple different prompts or models, then consolidate the results with a majority vote or some other mechanism (see the majority-vote sketch after this list).
- Use automated methods to find a better prompt and/or better examples. For example, the DSPy paper reports performance improvements of 16-40% for its auto-optimized pipelines.
- Roll your own NLP solution. For some tasks you don't necessarily need a large language model; it's just much more convenient to use one. There is a wide array of more classical NLP methods that you may want to try, and you can still let LLMs help you generate enough labeled data.
- Pause. Seriously, sometimes it may be a viable approach to move on to the next promising application of LLMs. While you do that, something new may come up, like a price drop, a new and more advanced model, or a research breakthrough that makes it worth revisiting the task.
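For the few-shot option, the investment is mostly in the examples themselves. A sketch (assuming the openai package; the reviews are invented and deliberately include a positive-sounding non-complaint to mark the decision boundary):

```python
from openai import OpenAI

client = OpenAI()

# Few-shot prompt sketch: diverse worked examples embedded as prior turns.
messages = [
    {"role": "system", "content": "Decide whether the review expresses a complaint. Answer yes or no."},
    {"role": "user", "content": "The package arrived two weeks late."},
    {"role": "assistant", "content": "yes"},
    {"role": "user", "content": "Great value for the price!"},
    {"role": "assistant", "content": "no"},  # negative example: not a complaint
    {"role": "user", "content": "Works fine, though the manual is confusing."},
    {"role": "assistant", "content": "yes"},  # mixed review still counts
    {"role": "user", "content": "Shipping was fast and the box was undamaged."},
]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, temperature=0)
print(response.choices[0].message.content)  # expected: "no"
```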
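And for the ensemble option, the simplest consolidation mechanism is a majority vote over several sampled answers, in the spirit of self-consistency. A sketch (ask_once is a hypothetical single-call helper; sampling with a nonzero temperature is what makes the vote meaningful):

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()

def ask_once(question: str) -> str:
    """Hypothetical helper: one sampled answer to the same task."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question + "\nAnswer with a single word."}],
        temperature=0.8,  # diversity across samples is what the vote relies on
    )
    return response.choices[0].message.content.strip().lower()

answers = [ask_once("Is 'The package arrived two weeks late.' a complaint?") for _ in range(5)]
winner, votes = Counter(answers).most_common(1)[0]
print(f"majority answer: {winner} ({votes}/5 votes)")
```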