Skip to content

Latest commit

 

History

History
67 lines (46 loc) · 14.3 KB

responsibleai.md

File metadata and controls

67 lines (46 loc) · 14.3 KB

Responsible AI and Prompt Engineering

How to use Azure OpenAI and other tools to create safe and effective AI applications

Introduction

With great power comes great responsibility. As AI becomes more ubiquitous and influential, it is imperative to ensure that it is used in a responsible manner that adheres to ethical principles, respects human values, and minimizes potential harms. In this article, we will explore some of the challenges and opportunities of using large language models (LLMs), such as those developed by Azure OpenAI, to create AI applications. We will also introduce some of the tools and best practices that can help developers to use LLMs responsibly and effectively, such as mitigation strategies, prompt engineering, and monitoring. By the end of this post, you will have a better understanding of how to leverage the power of LLMs while ensuring the safety and quality of your AI applications.

What are Large Language Models and why are they important?

Large language models are deep neural networks that are trained on massive amounts of text data to learn the patterns and structures of natural language. They can generate text and images on various topics and tasks, such as summarization, translation, question answering, and more. LLMs can also perform zero-shot or few-shot learning, which means they can perform tasks that they were not explicitly trained on, by using natural language prompts as inputs. LLMs are important because they can unlock new capabilities and opportunities for AI applications. They can provide natural and engaging user experiences, enable novel forms of creativity and expression, and solve complex problems that require natural language understanding and generation. LLMs can also reduce the need for large amounts of labeled data and domain-specific expertise, making AI development more accessible and efficient.

What are the risks and challenges of using Large Language Models?

While LLMs offer many benefits, they also pose several risks and challenges that need to be addressed. Some of the potential harms that LLMs can cause or amplify are:

  • Discrimination, Hate Speech and Exclusion: LLMs can generate or propagate harmful content such as hate speech, offensive language, explicit sexual content, discrimination, violence, exclusionary norms, and issues pertaining to service quality. This can negatively affect the users and the society, as well as damage the reputation and trust of the developers and the organizations.
  • Unintentional Misinformation/ Hallucinations: LLMs can generate or propagate false or misleading information, either intentionally or unintentionally. This can result from the lack of grounding, verification, or alignment of the generated text with the user's intent or the factual sources. This can undermine the credibility and reliability of the AI applications, as well as cause confusion, misinformation, or harm to the users and the society.
  • Information Hazards: LLMs can generate or leak sensitive information, such as personal identification information (PII), confidential data, or intellectual property. This can compromise the privacy and security of the users and the organizations, as well as violate the legal and ethical obligations of the developers and the organizations.
  • Malicious Usage: LLMs can be used for malicious purposes, such as prompt injection attacks, also known as jailbreaks. This is when an adversary manipulates the input prompt to trick the LLM into generating harmful or undesirable outputs, such as commands, passwords, or secrets. This can compromise the integrity and functionality of the AI applications, as well as cause damage or harm to the users and the organizations.
  • Environmental Consequences: LLMs can have a significant environmental impact, due to the high computational cost and energy consumption of training and running large models. This can contribute to the global carbon footprint and climate change, as well as increase the operational costs and resource constraints of the developers and the organizations.

How to mitigate the risks and challenges of using Large Language Models?

To mitigate the risks and challenges of using LLMs, it is recommended to adopt a comprehensive, iterative, and layered approach that involves the following steps:

  • Identify: The first step is to identify the potential harms and risks that the LLMs can cause or amplify, based on the use case, the domain, the audience, the data, and the model. This can be done by conducting a risk assessment, a stakeholder analysis, and a data audit, as well as consulting relevant guidelines, frameworks, and standards for responsible AI.
  • Prevent: The second step is to prevent or reduce the occurrence of the potential harms and risks, by applying appropriate design choices, techniques, and tools. This can include data cleaning, data augmentation, data labeling, model selection, model fine-tuning, model testing, prompt engineering, content filtering, and content moderation. The Azure OpenAI service offers aditional content filters. In the reference implementation we include a set of recomended setting for these content filters.

Content Filters

  • Detect: The third step is to detect or monitor the occurrence of the potential harms and risks, by applying appropriate methods and tools. This can include model evaluation, model benchmarking, model auditing, model monitoring, content analysis, content safety, and content quality. We also recomend using aditional content safety models to run on top of your main filters (as seen in the reference implementation):

Additional Content Filters

  • Respond: The fourth step is to respond or remediate the occurrence of the potential harms and risks, by applying appropriate actions and tools. This can include model updating, model retraining, model deprecation, prompt revision, content correction, content deletion, content reporting, and content feedback. These steps should be performed in a cyclic and continuous manner, as the LLMs and the use cases evolve over time. The developers and the organizations should also document and communicate the process and the outcomes of the mitigation steps, to ensure transparency, accountability, and trustworthiness of the AI applications.

How to use Prompt Engineering to create effective and responsible AI applications?

Prompt engineering is the process of designing and refining natural language prompts that elicit the desired outputs from LLMs. Prompt engineering can help to improve the performance, quality, and safety of the AI applications, by providing additional context, data, and parameters to the LLMs. Prompt engineering can also help to reduce the need for model fine-tuning or retraining, by leveraging the zero-shot or few-shot learning capabilities of the LLMs. To use prompt engineering effectively and responsibly, it is recommended to follow these steps:

  • Define: The first step is to define the goal, the task, the domain, and the audience of the AI application, as well as the desired format, style, and tone of the output. This can help to narrow down the scope and the requirements of the prompt engineering process.
  • Design: The second step is to design the initial prompt, by providing the input text, the output text, and the optional parameters to the LLM. The input text should be clear, concise, and relevant to the task and the domain. The output text should be an example of the desired output, or a placeholder that indicates the expected output. The optional parameters can include meta-context, data sources, logic and reasoning, safety and quality, and other constraints or preferences.
  • Test: The third step is to test the initial prompt, by providing different input texts and observing the output texts generated by the LLM. The output texts should be evaluated based on the criteria of groundedness, coherence, fluency, relevance, and similarity, as well as the specific metrics and indicators of the task and the domain. The output texts should also be checked for any potential harms or risks, such as those mentioned in the previous section.
  • Iterate: The fourth step is to iterate the prompt, by modifying the input text, the output text, or the optional parameters, based on the feedback and the results of the testing step. The iteration process should aim to optimize the performance, quality, and safety of the output texts, as well as to generalize the prompt to handle different input texts and scenarios.
  • Deploy: The fifth step is to deploy the prompt, by integrating it into the AI application and exposing it to the end users. The prompt should be monitored and updated regularly, based on the user feedback and the changing needs and contexts of the AI application. To facilitate the prompt engineering process, there are several tools and platforms that can help the developers to create, test, iterate, and deploy prompts for LLMs. Some of these tools are:
  • Prompt Flow: Prompt flow is a development tool that streamlines the entire development cycle of AI applications powered by LLMs. It provides a comprehensive solution that simplifies the process of prototyping, experimenting, iterating, and deploying prompts for LLMs. Prompt flow allows developers to create executable flows that link LLMs, prompts, and Python tools through a visualized graph. It also enables developers to debug, share, and iterate their flows with ease through team collaboration and historical tracking. Moreover, prompt flow allows developers to create prompt variants and evaluate their performance through large-scale testing. It also integrates prompt flows into CI/CD workflows and pipelines, including running batch jobs. Furthermore, prompt flow allows developers to deploy a real-time endpoint that unlocks the full power of LLMs for their applications. Finally, prompt flow allows developers to bring the value of prompt flow into their native IDE with SDK, CLI, and REST support.
  • Visual Studio Code: Visual Studio Code is a code editor that supports a variety of programming languages and frameworks. It also supports prompt engineering for LLMs, by providing features such as syntax highlighting, code completion, code formatting, code snippets, code debugging, code testing, code refactoring, and code documentation. Visual Studio Code also allows developers to access and use LLMs from Azure OpenAI, as well as other platforms and providers.
  • GitHub Codespaces: GitHub Codespaces is a cloud-based development environment that allows developers to code, build, test, and deploy applications from anywhere. It also supports prompt engineering for LLMs, by providing features such as code editing, code collaboration, code versioning, code review, code hosting, and code deployment. GitHub Codespaces also allows developers to access and use LLMs from Azure OpenAI, as well as other platforms and providers.
  • Semantic Kernel: Semantic Kernel is a platform that allows developers to create, manage, and share semantic data for LLMs. It also supports prompt engineering for LLMs, by providing features such as data ingestion, data cleaning, data annotation, data augmentation, data validation, data querying, data visualization, and data export. Semantic Kernel also allows developers to access and use LLMs from Azure OpenAI, as well as other platforms and providers.
  • LangChain: LangChain is a platform that allows developers to create, manage, and share natural language data for LLMs. It also supports prompt engineering for LLMs, by providing features such as data collection, data annotation, data verification, data distribution, data monetization, and data governance. LangChain also allows developers to access and use LLMs from Azure OpenAI, as well as other platforms and providers.

How to monitor and evaluate the performance of Large Language Models in production?

Monitoring and evaluating the performance of LLMs in production is an essential step to ensure the quality, reliability, and safety of the AI applications. It can also help to identify and address any issues, gaps, or opportunities for improvement. To monitor and evaluate the performance of LLMs in production, it is recommended to use the following tools and methods:

  • Model Data Collector: Model Data Collector is a tool that allows developers to collect production data from LLMs, such as the input prompts, the output texts, and the optional parameters. It also allows developers to label and annotate the production data, as well as to store and manage the production data in a secure and scalable manner.
  • Responsible AI Evaluation Metrics: Responsible AI Evaluation Metrics are a set of metrics that allow developers to measure and compare the performance of LLMs in production, based on the criteria of groundedness, coherence, fluency, relevance, and similarity, as well as the specific metrics and indicators of the task and the domain. These metrics are interoperable with the prompt flow evaluation metrics, and can be used to analyze the production data collected by the model data collector.
  • Alerts and Notifications: Alerts and Notifications are a feature that allows developers to configure and receive alerts and notifications when the performance of LLMs in production deviates from the predefined targets or thresholds, based on the responsible AI evaluation metrics. This can help developers to quickly identify and respond to any issues or anomalies in the production environment.
  • Azure AI Studio: Azure AI Studio is a platform that allows developers to create, manage, and deploy AI and machine learning models and applications. It also supports the monitoring and evaluation of LLMs in production, by providing features such as model data collector, responsible AI evaluation metrics, alerts and notifications, and a rich dashboard that visualizes the metrics over time. Azure AI Studio also integrates with Azure OpenAI, Azure ML, and Copilot, as well as other platforms and providers.

Conclusion

In this article we have discussed some of the challenges and opportunities of using large language models to create AI applications. We have also introduced some of the tools and best practices that can help developers to use large language models responsibly and effectively, such as mitigation strategies, prompt engineering and monitoring. By following these steps, developers can leverage the power of large language models while ensuring the safety and quality of their AI applications.