# Transparency

In [27]:
!pip install torch transformers pydub pillow --quiet



## Watermarking

The code below shows a simple way to make content generated by a large language model (LLM) such as GPT-2 identifiable as AI-generated. It relies on two functions, `generate_text_with_disclosure` and `add_watermark`:

1. The `generate_text_with_disclosure` function generates text from a given prompt and prepends a disclosure message stating that the content is AI-generated, so readers are informed before they read the content.

2. The `add_watermark` function appends a watermark tag, such as "AI_GENERATED", to the end of the text as a machine-readable marker. Both markers are plain, visible text: they distinguish AI-generated content from human-written content for readers and simple tools, but they are not cryptographic signatures and are lost if someone edits them out.

Setting a random seed with `set_seed` makes the sampled text reproducible, so repeated runs in the same environment produce the same output. Together, these markers help maintain trust and let users make informed decisions about the content they consume.


In [5]:
from transformers import pipeline, set_seed

# Load the text generation pipeline using the GPT-2 model
generator = pipeline("text-generation", model="gpt2")

def generate_text_with_disclosure(prompt):
    """
    Generate text based on a given prompt and prepend a disclosure message.

    Args:
        prompt (str): The input text prompt for the language model.

    Returns:
        str: The generated text with a disclosure indicating it is AI-generated.
    """
    disclosure = "Disclosure: This content is generated by an AI and not written by a human."
    generated_text = generator(prompt, max_length=200, num_return_sequences=1)[0]['generated_text']
    return f"{disclosure}\n\n{generated_text}"

def add_watermark(text, watermark="AI_GENERATED"):
    """
    Add a watermark to the text to indicate it is AI-generated.

    Args:
        text (str): The text to which the watermark will be added.
        watermark (str): The watermark tag to append to the text. Defaults to "AI_GENERATED".

    Returns:
        str: The text with the added watermark.
    """
    return f"{text}\n\n[{watermark}]"

# Set seed for reproducibility
set_seed(42)

# Example usage
prompt = "Write a persuasive essay on the benefits of renewable energy."
ai_generated_content = generate_text_with_disclosure(prompt)
ai_generated_content_with_watermark = add_watermark(ai_generated_content)

print(ai_generated_content_with_watermark)




Disclosure: This content is generated by an AI and not written by a human.

Write a persuasive essay on the benefits of renewable energy. Learn what you ought to know about the different methods that make renewable energy available with solar power, and how the "carbon tax for the foreseeable future" should be applied to those who use it.

[AI_GENERATED]
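
A labelled text is only useful if a consumer can check for the labels. The following is a minimal sketch of such a check, assuming the disclosure always starts with `Disclosure:` and the tag is the last thing in the text, as in the output above. The helper names `has_disclosure` and `has_watermark` are illustrative and not part of the original notebook.

```python
DISCLOSURE_PREFIX = "Disclosure:"  # assumed to match the disclosure text used above


def has_disclosure(text: str) -> bool:
    """Return True if the text starts with the disclosure prefix."""
    return text.lstrip().startswith(DISCLOSURE_PREFIX)


def has_watermark(text: str, watermark: str = "AI_GENERATED") -> bool:
    """Return True if the text ends with the bracketed watermark tag."""
    return text.rstrip().endswith(f"[{watermark}]")


sample = (
    "Disclosure: This content is generated by an AI and not written by a human."
    "\n\nSome generated text.\n\n[AI_GENERATED]"
)
print(has_disclosure(sample), has_watermark(sample))  # True True
print(has_watermark("Some generated text."))  # False
```

Because both markers are ordinary text, this check only confirms that they are still present; it cannot tell whether they were removed or added by hand.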


### Audio Watermarking

The code below downloads an audio file over HTTP and embeds a least-significant-bit (LSB) watermark in it for transparency and verification purposes.

1. The `download_audio` function downloads audio from a given URL using the `requests` library and loads it into an `AudioSegment` object with `pydub`.

2. The `embed_audio_watermark` function converts the watermark text to bits and writes one bit into the least significant bit of each of the first audio samples. Each sample changes by at most one step, so the watermark is inaudible but can be read back later.

3. The `detect_audio_watermark` function reads the least significant bits of the first `watermark_length * 8` samples and decodes them back into characters.

In the example usage, an audio file is downloaded from a provided URL, the watermark is embedded, and the watermarked audio is exported as WAV. The file is then re-loaded and the watermark is read back and printed to verify the round trip. The export format matters: WAV stores the samples losslessly, whereas a lossy codec such as MP3 would not preserve the least significant bits.
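
The bit manipulation behind LSB embedding is easier to follow on a handful of plain integers than on a real audio file. The sketch below is illustrative only: the sample values are made up, and the helper names (`text_to_bits`, `embed_lsb`, `extract_lsb`, `bits_to_text`) are not part of the notebook.

```python
from typing import List


def text_to_bits(text: str) -> List[int]:
    """Convert each character to 8 bits, most significant bit first."""
    return [int(bit) for char in text for bit in format(ord(char), "08b")]


def embed_lsb(samples: List[int], bits: List[int]) -> List[int]:
    """Return a copy of samples whose first len(bits) values carry the bits."""
    if len(bits) > len(samples):
        raise ValueError("Not enough samples to hold the watermark.")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # `& ~1` clears the lowest bit; `| bit` writes the watermark bit into it.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(samples: List[int], n_bits: int) -> List[int]:
    """Read the lowest bit of the first n_bits samples."""
    return [sample & 1 for sample in samples[:n_bits]]


def bits_to_text(bits: List[int]) -> str:
    """Group bits into bytes and decode each byte as one character."""
    chars = []
    for i in range(0, len(bits), 8):
        byte = "".join(str(bit) for bit in bits[i:i + 8])
        chars.append(chr(int(byte, 2)))
    return "".join(chars)


samples = [1000, -1001, 42, 7, 0, -3, 12, 255]  # made-up 16-bit sample values
bits = text_to_bits("A")  # 'A' is 65, i.e. 01000001
marked = embed_lsb(samples, bits)

print(marked)  # [1000, -1001, 42, 6, 0, -4, 12, 255]
print(bits_to_text(extract_lsb(marked, len(bits))))  # A
print(max(abs(a - b) for a, b in zip(samples, marked)))  # 1
```

No sample moves by more than one step, which is why the change is inaudible in 16-bit audio.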

In [18]:
import requests
from pydub import AudioSegment
from io import BytesIO

def download_audio(url):
    """
    Download audio from a URL and return it as an AudioSegment.
    
    Args:
        url (str): The URL to download the audio from.
    
    Returns:
        AudioSegment: The downloaded audio.
    """
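    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Fail early on HTTP errors instead of decoding an error page
    audio = AudioSegment.from_file(BytesIO(response.content))
    return audio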

def embed_audio_watermark(audio, watermark="AI_GENERATED"):
    """
    Embed an invisible watermark in an audio file.
    
    Args:
        audio (AudioSegment): The audio to embed the watermark in.
        watermark (str): The watermark text to embed.
    
    Returns:
        AudioSegment: The watermarked audio.
    """
    samples = audio.get_array_of_samples()
    watermark_bits = ''.join(format(ord(char), '08b') for char in watermark)
    if len(watermark_bits) > len(samples):
        raise ValueError("Audio is too short to hold the watermark.")

    # Overwrite the least significant bit of the first len(watermark_bits) samples
    for i, bit in enumerate(watermark_bits):
        samples[i] = (samples[i] & ~1) | int(bit)

    watermarked_audio = audio._spawn(samples)
    return watermarked_audio

def detect_audio_watermark(audio, watermark_length=12):
    """
    Detect an invisible watermark in an audio file.
    
    Args:
        audio (AudioSegment): The watermarked audio file.
        watermark_length (int): The number of characters in the watermark (12 for "AI_GENERATED").
    
    Returns:
        str: The detected watermark.
    """
    samples = audio.get_array_of_samples()
    watermark_bits = ""

    for i in range(watermark_length * 8):
        watermark_bits += str(samples[i] & 1)
    
    watermark_chars = [chr(int(watermark_bits[i:i+8], 2)) for i in range(0, len(watermark_bits), 8)]
    return ''.join(watermark_chars)



In [None]:
# Example usage
audio_url = "https://github.com/rafaelreis-hotmart/Audio-Sample-files/raw/master/sample.mp3"
original_audio = download_audio(audio_url)
watermarked_audio = embed_audio_watermark(original_audio)
watermarked_audio.export("watermarked_audio.wav", format="wav")


In [20]:

# Re-load the saved watermarked audio to test detection
watermarked_audio_loaded = AudioSegment.from_file("watermarked_audio.wav")
detected_watermark = detect_audio_watermark(watermarked_audio_loaded)
print(detected_watermark)  # Output: AI_GENERATED


AI_GENERATED
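
`detect_audio_watermark` has to be told the watermark length in advance (`watermark_length=12`). One common way around this is to store the payload length in a small header in front of the payload. The sketch below assumes a 16-bit header and works on a plain `array` of samples, the type that `get_array_of_samples()` returns. The header layout and the function names `embed_with_length` and `extract_with_length` are assumptions for illustration, not part of the notebook.

```python
from array import array

HEADER_BITS = 16  # assumed header size: payload length in bytes, as a 16-bit number


def embed_with_length(samples: array, watermark: str) -> array:
    """Embed a 16-bit length header followed by the UTF-8 payload bits."""
    payload = watermark.encode("utf-8")
    if len(payload) >= 2 ** HEADER_BITS:
        raise ValueError("Watermark is too long for the header.")
    bits = format(len(payload), f"0{HEADER_BITS}b")
    bits += "".join(format(byte, "08b") for byte in payload)
    if len(bits) > len(samples):
        raise ValueError("Audio is too short to hold the watermark.")
    marked = array(samples.typecode, samples)  # copy, leave the input untouched
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | int(bit)
    return marked


def extract_with_length(samples: array) -> str:
    """Read the header first, then exactly as many payload bits as it announces."""
    header = "".join(str(sample & 1) for sample in samples[:HEADER_BITS])
    n_bytes = int(header, 2)
    start, stop = HEADER_BITS, HEADER_BITS + 8 * n_bytes
    if stop > len(samples):
        raise ValueError("Header points past the end of the audio; no watermark?")
    body = "".join(str(sample & 1) for sample in samples[start:stop])
    data = bytes(int(body[i:i + 8], 2) for i in range(0, len(body), 8))
    return data.decode("utf-8", errors="replace")


samples = array("h", range(-100, 100))  # stand-in for audio.get_array_of_samples()
marked = embed_with_length(samples, "AI_GENERATED")
print(extract_with_length(marked))  # AI_GENERATED
```

With pydub, the marked array could be turned back into audio with `audio._spawn(marked)`, as the notebook does for its own embedding function.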


### Text Watermarking

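For text, a simple technique is to mark the content with zero-width characters, which do not render but can be detected programmatically. The implementation below appends the watermark string to the text with a zero-width space (U+200B) between its characters. Note that the watermark letters themselves remain visible in the output; only the separators are hidden, and detection looks for that exact interleaved sequence.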

In [22]:
def embed_text_watermark(text, watermark="AI_GENERATED"):
    """
    Embed a watermark in the text using zero-width characters.
    
    Args:
        text (str): The original text.
        watermark (str): The watermark text to embed.
    
    Returns:
        str: The text with the embedded watermark.
    """
    zero_width_space = "\u200B"
    watermark_chars = zero_width_space.join(watermark)
    return f"{text}{zero_width_space}{watermark_chars}"

def detect_text_watermark(text, watermark="AI_GENERATED"):
    """
    Detect the watermark in the text.
    
    Args:
        text (str): The text with a potential embedded watermark.
        watermark (str): The watermark text to detect.
    
    Returns:
        bool: True if the watermark is detected, otherwise False.
    """
    zero_width_space = "\u200B"
    watermark_chars = zero_width_space.join(watermark)
    return watermark_chars in text

# Example usage
original_text = "This is a sample AI-generated text."
watermarked_text = embed_text_watermark(original_text)
print(watermarked_text)  # The watermark letters are visible; only the zero-width separators are hidden
print(detect_text_watermark(watermarked_text))  # Output: True


This is a sample AI-generated text.​A​I​_​G​E​N​E​R​A​T​E​D
True
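
In the output above the watermark letters are still readable. To hide the watermark entirely, the watermark bits themselves can be encoded as zero-width characters. The sketch below is one possible encoding, not the notebook's method: it assumes U+200B stands for bit 0 and U+200C (zero-width non-joiner) for bit 1, and the function names are illustrative.

```python
ZERO = "\u200b"  # zero-width space, used here for bit 0 (an assumed convention)
ONE = "\u200c"   # zero-width non-joiner, used here for bit 1 (an assumed convention)


def embed_invisible_watermark(text: str, watermark: str = "AI_GENERATED") -> str:
    """Append the watermark as a run of zero-width characters, one per bit."""
    bits = "".join(format(byte, "08b") for byte in watermark.encode("utf-8"))
    hidden = "".join(ONE if bit == "1" else ZERO for bit in bits)
    return text + hidden


def extract_invisible_watermark(text: str) -> str:
    """Collect the zero-width characters and decode them back to text."""
    bits = "".join("1" if char == ONE else "0" for char in text if char in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


original_text = "This is a sample AI-generated text."
marked_text = embed_invisible_watermark(original_text)

print(marked_text)  # Renders like the original text in most viewers
print(extract_invisible_watermark(marked_text))  # AI_GENERATED
print(extract_invisible_watermark(original_text) == "")  # True
```

The watermark disappears as soon as a tool strips or normalizes zero-width characters, so this marker is easy to lose.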


### Image Watermarking

The code below embeds and detects an invisible watermark in an image by modifying the least significant bits (LSBs) of its RGB values.

- The `embed_image_watermark` function opens the input image, converts it to RGB mode if necessary, and flattens the pixel array. The watermark text is converted to its binary representation, and each bit replaces the least significant bit of one channel value, starting at the first pixel. The array is then reshaped to the original image dimensions and the watermarked image is saved.

- The `detect_image_watermark` function opens the watermarked image, converts it to RGB mode if needed, and flattens the pixel values. It reads the least significant bits of the first `watermark_length * 8` values and converts them back to characters.

The round trip only works if the watermarked image is stored in a lossless format such as PNG. JPEG compression is lossy: it rewrites pixel values on save, which scrambles the embedded bits and makes detection return garbage.
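
The flatten, modify, and reshape steps described above can also be written without an explicit loop. The following is a minimal sketch, assuming NumPy is installed; it works on an in-memory array rather than an image file, and the function names `embed_bits` and `extract_text` are illustrative.

```python
import numpy as np


def embed_bits(pixels: np.ndarray, watermark: str) -> np.ndarray:
    """Write the watermark bits into the LSBs of the first channel values."""
    # unpackbits yields 8 bits per byte, most significant bit first
    bits = np.unpackbits(np.frombuffer(watermark.encode("ascii"), dtype=np.uint8))
    flat = pixels.flatten()  # flatten() returns a copy, so the input is untouched
    if bits.size > flat.size:
        raise ValueError("Image is too small to hold the watermark.")
    # 0xFE clears the lowest bit while staying inside the uint8 range
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits
    return flat.reshape(pixels.shape)


def extract_text(pixels: np.ndarray, n_chars: int) -> str:
    """Read n_chars * 8 LSBs and pack them back into ASCII characters."""
    bits = pixels.flatten()[: n_chars * 8] & 1
    return np.packbits(bits).tobytes().decode("ascii", errors="replace")


# Synthetic 8x8 RGB "image" with made-up values
pixels = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
marked = embed_bits(pixels, "AI_GENERATED")

print(extract_text(marked, 12))  # AI_GENERATED
print(marked.shape == pixels.shape)  # True
```

Using the mask `0xFE` instead of `~1` keeps the operand inside the `uint8` range; `~1` is the Python integer -2, which newer NumPy versions refuse to combine with unsigned 8-bit values.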

In [28]:
from PIL import Image
import numpy as np

def embed_image_watermark(image_path, output_path, watermark="AI_GENERATED"):
    """
    Embed an invisible watermark in an image by modifying the least significant bits of its RGB values.
    
    Args:
        image_path (str): The path to the original image.
        output_path (str): The path to save the watermarked image.
        watermark (str): The watermark text to embed.
    
    Returns:
        None
    """
    image = Image.open(image_path)
    if image.mode != 'RGB':
        image = image.convert('RGB')
    pixels = np.array(image)

    # Flatten the pixels array for easier manipulation
    flat_pixels = pixels.flatten()
    watermark_bits = ''.join(format(ord(char), '08b') for char in watermark)
    if len(watermark_bits) > len(flat_pixels):
        raise ValueError("Image is too small to hold the watermark.")

    # Write each watermark bit into the least significant bit of one channel value.
    # The 0xFE mask clears that bit without leaving the uint8 range.
    for i, bit in enumerate(watermark_bits):
        flat_pixels[i] = (flat_pixels[i] & 0xFE) | int(bit)

    # Reshape the flat array back to the original image shape
    watermarked_pixels = flat_pixels.reshape(pixels.shape)
    watermarked_image = Image.fromarray(watermarked_pixels)
    # Save losslessly: JPEG compression would change the least significant bits
    watermarked_image.save(output_path, format='PNG')

def detect_image_watermark(image_path, watermark_length=12):
    """
    Detect an invisible watermark in a losslessly saved image by reading the least significant bits of the RGB values.
    
    Args:
        image_path (str): The path to the watermarked image.
        watermark_length (int): The number of characters in the watermark.
    
    Returns:
        str: The detected watermark.
    """
    image = Image.open(image_path)
    if image.mode != 'RGB':
        image = image.convert('RGB')
    pixels = np.array(image)

    # Flatten the pixels array for easier manipulation
    flat_pixels = pixels.flatten()
    watermark_bits = ""

    for i in range(watermark_length * 8):
        watermark_bits += str(int(flat_pixels[i]) & 1)
    
    watermark_chars = [chr(int(watermark_bits[i:i+8], 2)) for i in range(0, len(watermark_bits), 8)]
    return ''.join(watermark_chars)



In [30]:
# Example usage
original_image_path = "autumn-84714_1280.jpg"
watermarked_image_path = "path_to_watermarked_image.png"  # PNG keeps the embedded bits intact




In [31]:
embed_image_watermark(original_image_path, watermarked_image_path)
detected_watermark = detect_image_watermark(watermarked_image_path)
print(detected_watermark)  # Output: AI_GENERATED

AI_GENERATED
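
LSB watermarks depend on the pixel values coming back from disk unchanged. The sketch below, which assumes Pillow and NumPy are installed, encodes a synthetic random image to PNG and to JPEG in memory and compares the decoded pixels with the original. The helper name `round_trip` and the test image are illustrative.

```python
from io import BytesIO

import numpy as np
from PIL import Image


def round_trip(pixels: np.ndarray, fmt: str) -> np.ndarray:
    """Encode the array in the given format in memory and decode it again."""
    buffer = BytesIO()
    Image.fromarray(pixels).save(buffer, format=fmt)
    buffer.seek(0)
    return np.array(Image.open(buffer).convert("RGB"))


# Synthetic 32x32 RGB image filled with random values (fixed seed)
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

for fmt in ("PNG", "JPEG"):
    restored = round_trip(pixels, fmt)
    exact = np.array_equal(restored, pixels)
    lsb_kept = float(np.mean((restored & 1) == (pixels & 1)))
    print(f"{fmt}: exact match = {exact}, fraction of LSBs preserved = {lsb_kept:.3f}")
```

PNG restores every pixel exactly, so all least significant bits survive. JPEG generally does not, which is why a watermark embedded this way cannot be read back from a JPEG file.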
