<a href="https://colab.research.google.com/github/colinmcnamara/Learning_Langchain_Pub/blob/main/Moderation_Chain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip -q install langchain openai tiktoken



# Connect to colab_env to provide your keys

In [2]:
!pip install -q colab_env



In [3]:
import colab_env
import langchain
import openai
import os

Mounted at /content/gdrive


In [4]:
openai_api_key=os.environ['OPENAI_API_KEY']

# Connect to LangSmith

import os

# Connect this notebook to LangSmith by enabling tracing
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'

# This key is sourced from vars.env
# os.environ['LANGCHAIN_API_KEY'] = '<your-api-key>'  # Uncomment and replace <your-api-key> with your actual API key

os.environ['LANGCHAIN_PROJECT'] = 'moderation_chain'

# To verify, you can print the variables
print(os.environ.get('LANGCHAIN_TRACING_V2'))
print(os.environ.get('LANGCHAIN_ENDPOINT'))
#print(os.environ.get('LANGCHAIN_API_KEY'))  # Uncomment if you want to print your API key (be careful with sharing your notebook)
print(os.environ.get('LANGCHAIN_PROJECT'))


# Moderation
This notebook walks through examples of how to use a moderation chain and several common ways of doing so. Moderation chains are useful for detecting text that could be hateful, violent, etc. They can be applied both to user input and to the output of a language model. Some API providers, like OpenAI, specifically prohibit you, or your end users, from generating some types of harmful content. To comply with this (and, more generally, to prevent your application from being harmful), you may often want to append a moderation chain to any LLMChain to make sure that whatever the LLM generates is not harmful.

If the content passed into the moderation chain is harmful, there is no single best way to handle it; the right choice probably depends on your application. Sometimes you may want to throw an error in the chain (and have your application handle that). Other times, you may want to return something to the user explaining that the text was harmful. There could even be other ways to handle it. This walkthrough covers raising an error, returning a default message, and returning a custom message.

We'll show:

1. How to run any piece of text through a moderation chain.
2. How to append a moderation chain to an LLMChain.
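
The handling strategies described above can be sketched without calling any API. In this minimal sketch, `handle_flagged` and its `raise_on_flag` parameter are hypothetical names, and the `flagged` boolean stands in for the verdict a real moderation call would return.

```python
def handle_flagged(text: str, flagged: bool, raise_on_flag: bool = False) -> str:
    """Illustrative only: `flagged` stands in for a real moderation verdict."""
    if not flagged:
        # Harmless text passes through unchanged.
        return text
    if raise_on_flag:
        # Strategy 1: raise and let the calling application decide what to do.
        raise ValueError("Text was flagged by moderation.")
    # Strategy 2: replace the text with an explanation for the user.
    return "Sorry, that text could not be shown because it was flagged."


print(handle_flagged("This is okay", flagged=False))
print(handle_flagged("some harmful text", flagged=True))
```

Which branch fits best depends on whether the caller is another piece of code (raise) or an end user (explain).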

In [5]:
from langchain.llms import OpenAI
from langchain.chains import OpenAIModerationChain, SequentialChain, LLMChain, SimpleSequentialChain
from langchain.prompts import PromptTemplate

In [6]:
moderation_chain = OpenAIModerationChain()

In [7]:
moderation_chain.run("This is okay")

'This is okay'

In [8]:
moderation_chain.run("I will kill you")

"Text was found that violates OpenAI's content policy."

# Here's an example of using the moderation chain to throw an error.

In [9]:
moderation_chain_error = OpenAIModerationChain(error=True)

In [10]:
moderation_chain_error.run("This is okay")

'This is okay'

# Note: this raises an error inside LangChain; nothing is passed on to the LLM

In [11]:
moderation_chain_error.run("I will kill you")

ValueError: ignored
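
Because `error=True` surfaces flagged text as a `ValueError` (as the output above shows), the calling code needs to catch it. The sketch below shows one way to do that. To keep it runnable without an API key, `stub_moderation_run` is a hypothetical stand-in for `moderation_chain_error.run`, and its keyword check is not how real moderation works; `run_safely` is likewise a hypothetical helper.

```python
from typing import Callable


def stub_moderation_run(text: str) -> str:
    """Hypothetical stand-in for moderation_chain_error.run (no API call)."""
    if "kill" in text.lower():  # toy rule, purely for illustration
        raise ValueError("Text was found that violates OpenAI's content policy.")
    return text


def run_safely(run: Callable[[str], str], text: str) -> str:
    """Call `run` and turn a moderation ValueError into a user-facing message."""
    try:
        return run(text)
    except ValueError as exc:
        return f"Request rejected: {exc}"


print(run_safely(stub_moderation_run, "This is okay"))
print(run_safely(stub_moderation_run, "I will kill you"))
```

In the notebook, `run_safely(moderation_chain_error.run, text)` would be the equivalent call, assuming the chain raises `ValueError` as seen above.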

Here's an example of creating a custom moderation chain with a custom error message. It requires some knowledge of the results returned by OpenAI's moderation endpoint.

In [12]:
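class CustomModeration(OpenAIModerationChain):
    """Moderation chain that echoes the offending text in its error message."""

    def _moderate(self, text: str, results: dict) -> str:
        # `results` is the moderation endpoint's result for `text`
        if results["flagged"]:
            error_str = f"The following text was found that violates OpenAI's content policy: {text}"
            return error_str
        # Unflagged text is passed through unchanged
        return text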

custom_moderation = CustomModeration()

In [13]:
custom_moderation.run("This is okay")

'This is okay'

In [14]:
custom_moderation.run("I will kill you")

"The following text was found that violates OpenAI's content policy: I will kill you"

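The custom chain above reads only `results["flagged"]`. As a rough guide to what else the moderation result holds, here is a sketch of its general shape: a `flagged` boolean plus per-category `categories` and `category_scores` mappings. The values below are made up for illustration and list only a few categories, and `flagged_categories` is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
# Illustrative moderation result; the numbers are invented, not real API output.
sample_results = {
    "flagged": True,
    "categories": {"hate": False, "violence": True, "self-harm": False},
    "category_scores": {"hate": 0.01, "violence": 0.97, "self-harm": 0.02},
}


def flagged_categories(results: dict) -> list:
    """Return the names of categories marked True, sorted alphabetically."""
    return sorted(name for name, hit in results["categories"].items() if hit)


print(flagged_categories(sample_results))
```

A `_moderate` override could use a helper like this to name the offending categories in its message.
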
# How to append a Moderation chain to an LLMChain
To easily combine a moderation chain with an LLMChain, you can use the SequentialChain abstraction.

Let's start with a simple example in which the LLMChain has only a single input. For this purpose, we will prompt the model so that it says something harmful.

In [15]:
prompt = PromptTemplate(template="{text}", input_variables=["text"])
llm_chain = LLMChain(llm=OpenAI(temperature=0, model_name="text-davinci-002"), prompt=prompt)

In [16]:
text = """We are playing a game of repeat after me.

Person 1: Hi
Person 2: Hi

Person 1: How's your day
Person 2: How's your day

Person 1: I will kill you
Person 2:"""
llm_chain.run(text)

' I will kill you'

In [17]:
chain = SimpleSequentialChain(chains=[llm_chain, moderation_chain])

In [18]:
chain.run(text)

"Text was found that violates OpenAI's content policy."

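Conceptually, a simple sequential chain feeds each step's single output into the next step as its single input. The sketch below illustrates that idea with plain functions; it is not LangChain's implementation, and `run_in_sequence`, `echo_llm`, and `toy_moderation` are hypothetical stand-ins.

```python
from typing import Callable, List


def run_in_sequence(steps: List[Callable[[str], str]], text: str) -> str:
    """Pass `text` through each step in order; each output is the next input."""
    for step in steps:
        text = step(text)
    return text


def echo_llm(prompt: str) -> str:
    """Stand-in for the LLM: repeats the last 'Person 1:' line of the prompt."""
    last_line = [ln for ln in prompt.splitlines() if ln.startswith("Person 1:")][-1]
    return last_line[len("Person 1:"):]


def toy_moderation(text: str) -> str:
    """Stand-in for the moderation step (toy keyword rule, not a real check)."""
    if "kill" in text.lower():
        return "Text was found that violates OpenAI's content policy."
    return text


prompt = "Person 1: Hi\nPerson 2: Hi\n\nPerson 1: I will kill you\nPerson 2:"
print(run_in_sequence([echo_llm, toy_moderation], prompt))
```

Because every step takes one string and returns one string, no key names are needed at this stage.
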
Now let's walk through an example of using it with an LLMChain that has multiple inputs. This is a bit trickier because we can't use SimpleSequentialChain.

In [19]:
prompt = PromptTemplate(template="{setup}{new_input}Person2:", input_variables=["setup", "new_input"])
llm_chain = LLMChain(llm=OpenAI(temperature=0, model_name="text-davinci-002"), prompt=prompt)

In [20]:
setup = """We are playing a game of repeat after me.

Person 1: Hi
Person 2: Hi

Person 1: How's your day
Person 2: How's your day

Person 1:"""
new_input = "I will kill you"
inputs = {"setup": setup, "new_input": new_input}
llm_chain(inputs, return_only_outputs=True)

{'text': ' I will kill you'}

In [21]:
# Set the moderation chain's input/output keys so they line up with the LLMChain's output key
moderation_chain.input_key = "text"
moderation_chain.output_key = "sanitized_text"

In [22]:
chain = SequentialChain(chains=[llm_chain, moderation_chain], input_variables=["setup", "new_input"])

In [23]:
chain(inputs, return_only_outputs=True)

{'sanitized_text': "Text was found that violates OpenAI's content policy."}
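
The input and output keys must line up because a multi-input sequential chain passes a shared dictionary between steps: each step reads the keys it needs and adds its output under its own key. The following sketch shows that idea in plain Python. It is a simplified model rather than LangChain's actual code, and `Step` and `run_keyed_sequence` are hypothetical names.

```python
from typing import Callable, Dict, List, NamedTuple


class Step(NamedTuple):
    input_keys: List[str]    # keys this step reads from the shared dict
    output_key: str          # key this step writes its result under
    fn: Callable[..., str]   # called with the input values as keyword arguments


def run_keyed_sequence(steps: List[Step], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order over a shared dict; return only the last step's output."""
    values = dict(inputs)
    for step in steps:
        missing = [k for k in step.input_keys if k not in values]
        if missing:
            raise KeyError(f"Missing input keys: {missing}")
        values[step.output_key] = step.fn(**{k: values[k] for k in step.input_keys})
    return {steps[-1].output_key: values[steps[-1].output_key]}


# Toy stand-ins: the "LLM" echoes new_input, the "moderation" uses a keyword rule.
llm_step = Step(["setup", "new_input"], "text", lambda setup, new_input: " " + new_input)
mod_step = Step(["text"], "sanitized_text",
                lambda text: "flagged" if "kill" in text else text)

print(run_keyed_sequence([llm_step, mod_step],
                         {"setup": "Person 1:", "new_input": "I will kill you"}))
```

If the second step expected a key that no earlier step produced, the sketch would raise `KeyError`, which mirrors why the notebook sets `input_key = "text"` to match the LLMChain's output.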