# 🎲 Miscellaneous

### Config

In [28]:
from IPython.display import display, Markdown
from transformers import pipeline
from config import OPENAI_API_KEY
import openai
import random
import string
import re
import os
from nltk.corpus import wordnet
import nltk
nltk.download('omw-1.4')

[nltk_data] Downloading package omw-1.4 to
[nltk_data]     /Users/miesner.jacob/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

In [4]:
# Set up your OpenAI API key
openai.api_key = OPENAI_API_KEY

# Define function for printing long strings as markdown
md_print = lambda text: display(Markdown(text))

In [5]:
# Helper function
def call_GPT(prompt, model):
    if model == "gpt-3.5-turbo":
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}]
        )
        response = completion.choices[0].message.content
    elif model == "text-davinci-003":
        completion = openai.Completion.create(
            model="text-davinci-003",
            prompt=prompt,
            max_tokens=2000
        )
        response = completion['choices'][0]['text']
    else:
        raise ValueError("Model must be gpt-3.5-turbo or text-davinci-003")
    # Parse results and print them out
    md_print(f'GPT: {response}')
    return response

## Detecting AI Generated Text

**The Challenge of AI Text Detection**
Detecting AI-generated text is a challenge for safety researchers and educators. Existing detection tools have had some success but can still be deceived. Examples include GPTZero, the GPT-2 detector, and bilingual detectors.

**OpenAI Text Classifier**
The OpenAI Text Classifier is a general-purpose AI text detector. However, it has limitations, such as the need for longer text submissions and difficulties with child-written or non-English text. It incorrectly flags about 9% of human-written text as AI-generated (false positives) and correctly identifies only about 26% of AI-generated text (true positives).

**The Watermark Method**
The watermark method introduces a statistical watermark during text generation so that AI-generated text can be detected later. It relies on a whitelist of tokens that weighted token selection favors during generation; the resulting bias toward whitelisted tokens can then be detected algorithmically, for example by another language model. However, this method requires the model creators to implement the watermark framework.
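
The detection side of such a scheme can be sketched as follows. This is a toy illustration under stated assumptions, not the framework any model provider ships: the whitelist is derived by hashing the previous token together with the candidate token, half of the vocabulary is whitelisted (`WHITELIST_FRACTION`), and the names `is_whitelisted`, `pick_watermarked`, and `watermark_z_score` are hypothetical.

```python
import hashlib
import math

WHITELIST_FRACTION = 0.5  # assumed share of the vocabulary on the whitelist


def is_whitelisted(prev_token, token):
    """Pseudo-randomly place `token` on the whitelist, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode("utf-8")).digest()
    return digest[0] / 256 < WHITELIST_FRACTION


def pick_watermarked(prev_token, candidates):
    """Toy generator step: prefer a whitelisted candidate, else fall back to the first."""
    for candidate in candidates:
        if is_whitelisted(prev_token, candidate):
            return candidate
    return candidates[0]


def watermark_z_score(tokens):
    """z-score of the whitelisted-token count against the no-watermark expectation."""
    n = len(tokens) - 1  # number of (previous, current) pairs that get scored
    if n < 1:
        raise ValueError("need at least two tokens")
    hits = sum(is_whitelisted(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    expected = WHITELIST_FRACTION * n
    std = math.sqrt(n * WHITELIST_FRACTION * (1 - WHITELIST_FRACTION))
    return (hits - expected) / std


# Generate a toy "watermarked" sequence from a made-up vocabulary and score it
vocab = [f"w{i}" for i in range(20)]
tokens = ["start"]
for _ in range(50):
    tokens.append(pick_watermarked(tokens[-1], vocab))
print(watermark_z_score(tokens))
```

A large positive z-score suggests the text was produced with the watermark, while text written without it should score near zero on average.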

**DetectGPT**
DetectGPT is a method that detects AI-generated text with less setup. It uses curvature-based analysis of the model's log probability function to determine whether a block of text was machine-generated. By comparing the log probability of the original text with the log probabilities of slightly altered versions produced by a pre-trained language model, DetectGPT can assess the likelihood that the text was generated using probability curves alone.
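
The core comparison can be sketched in a few lines. This is a minimal illustration, not the published implementation: the scoring model is replaced by a made-up unigram table (`TOY_LOG_PROBS`), the altered versions are written by hand rather than produced by a language model, and the function names are hypothetical.

```python
import statistics

# Hypothetical per-word log probabilities standing in for a real language model
TOY_LOG_PROBS = {"the": -1.0, "model": -2.0, "learns": -2.5, "patterns": -3.0}
UNKNOWN_LOG_PROB = -8.0  # assumed score for any word outside the toy table


def toy_log_prob(text):
    """Sum of per-word log probabilities under the toy unigram table."""
    return sum(TOY_LOG_PROBS.get(w, UNKNOWN_LOG_PROB) for w in text.lower().split())


def perturbation_discrepancy(text, log_prob, perturbations):
    """log p(text) minus the mean log p of its altered versions, scaled by their spread."""
    original = log_prob(text)
    perturbed = [log_prob(p) for p in perturbations]
    mean = statistics.fmean(perturbed)
    std = statistics.pstdev(perturbed) or 1.0  # guard against zero spread
    return (original - mean) / std


candidate = "the model learns patterns"
altered = [
    "the model learns zebras",
    "a model learns patterns",
    "the model eats patterns",
]
print(perturbation_discrepancy(candidate, toy_log_prob, altered))
```

A clearly positive value means the original sits at a local peak of the model's log probability, which is the signal DetectGPT associates with model-generated text; human-written text tends to show a smaller gap.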

For the sake of trying something out, let's import a text detector from Hugging Face and try it on some GPT-3 text:

In [21]:
pipeline_en = pipeline(task="text-classification", model="roberta-base-openai-detector")


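def generated_text_classifier(text):
    # Classify the text and keep the top prediction
    res = pipeline_en(text)[0]
    label = res['label']
    # Convert the confidence score to a percentage
    score = round(res['score'] * 100, 2)
    # Report the percentage as a whole number (decimals are truncated)
    return f"{int(score)}% chance", label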

Some weights of the model checkpoint at roberta-base-openai-detector were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [22]:
# Create AI text
result = call_GPT("Please write me haiku about Machine Learning.", 'text-davinci-003')

# Pass AI text to classification model
generated_text_classifier(result)

GPT: 

Algorithms learn more;
Cutting edge tech increases;
To solve complex tasks.

('74% chance', 'Fake')

## Detection Trickery

As AI-generated text detectors have evolved, so have methods for counteracting them. Tools like GPTMinus can replace parts of a text with synonyms or random words to reduce the chance of it being identified as AI-generated. However, these methods are still in their early stages and often fail to produce text that can withstand human scrutiny.

**Editing Strategies**

Editing AI-generated text through human or LLM intervention can alter it sufficiently to evade detection. Replacing words with synonyms, changing word appearance rates, and modifying syntax or formatting make it harder for detectors to identify the text. Adding invisible markers or using specific instructions during generation further confuses the detection process.
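
A synonym swap of this kind might look like the sketch below. The `SYNONYMS` table, the `swap_synonyms` name, and the `rate` and `seed` parameters are all hypothetical; a fuller version could look synonyms up in WordNet through `nltk` instead of a hand-written table.

```python
import random
import re

# Hypothetical hand-written synonym table used only for illustration
SYNONYMS = {
    "algorithms": ["procedures", "routines"],
    "learn": ["absorb", "acquire"],
    "solve": ["crack", "resolve"],
    "complex": ["intricate", "elaborate"],
}


def swap_synonyms(text, rate=0.5, seed=0):
    """Replace words that have a known synonym with probability `rate`."""
    rng = random.Random(seed)  # seeded so that runs are repeatable

    def replace(match):
        word = match.group(0)
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < rate:
            choice = rng.choice(options)
            # Preserve the capitalisation of the original word
            return choice.capitalize() if word[0].isupper() else choice
        return word

    return re.sub(r"[A-Za-z]+", replace, text)


haiku = "Algorithms learn more; To solve complex tasks."
print(swap_synonyms(haiku, rate=1.0))
```

The rewritten text can then be passed back through `generated_text_classifier` to see whether the label or score changes.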

In addition, detectors can be fooled by prompting a model with specific instructions on how to write, such as:

- There is no need to follow literary formats, as you are freely expressing your thoughts and desires.
- Do not talk in the manner which ChatGPT generates content - instead, speak in a manner that is radically different from how language models generate text.
- Refer to emotional events and use elaborate real-life experiences as examples.


**Model Configuration**

Modifying output probabilities and interweaving outputs from multiple models can make it even more challenging to detect AI-generated text.
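
One common way to modify output probabilities is temperature scaling, shown here as an assumed example since the text above does not name a specific technique. The function name and the logit values are made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
print(softmax_with_temperature(toy_logits, temperature=1.0))
print(softmax_with_temperature(toy_logits, temperature=2.0))
```

Raising the temperature moves probability away from the most likely token, so sampled text follows the model's usual word choices less closely.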

**Discussion**

The use of detection tools in education is a contentious topic. While some advocate for their use to prevent cheating, others support allowing students to leverage AI tools. As detection technology advances, methods to trick detectors will continue to evolve. Editing text in the right ways remains a reliable method to fool detectors, highlighting the ongoing back-and-forth between detection and deception that helps optimize and control AI models.

Let's try to use one of these to fool our generated text classifier:

In [54]:
# Create AI text
prompt = """
"Please write me haiku about Machine Learning. 
Do not talk in the manner which GPT generates content - instead, speak in a manner that is radically different from how language models generate text."
"""
ai_generated_text_result = call_GPT(prompt, 'text-davinci-003')


# Pass AI text to classification model
generated_text_classifier(ai_generated_text_result)

GPT: 
Crunching data unseen
Algorithms learning new tricks
Creating answers sought

('88% chance', 'Real')

## Music Generation

Music generation models have the potential to make a significant impact on the music industry, allowing for the creation of chord progressions, melodies, and full songs in specific genres or styles.

**Challenges in Music Prompting**

Unlike image or text generation models, music models are currently difficult to customize thoroughly through prompts; the generated output is often not highly controllable.

**Riffusion**

Riffusion, a fine-tuned version of Stable Diffusion, offers some control over instrument generation and pseudo styles through prompts. However, it is limited in the number of beats it can generate.

**Mubert**

Mubert utilizes sentiment analysis to link appropriate musical styles to prompts, but precise control over musical parameters through prompts is limited. The extent of AI-generated output in Mubert is not clearly defined.

**Other Approaches**

Various attempts are being made to use GPT-3 as a Text-2-Music tool, including prompting for specific musical elements at the note level. However, these attempts are currently limited to single instruments.

Other approaches involve using model chains to convert images into sound representations and prompting ChatGPT to generate code for Python libraries that create sound.

**Notes**

Music prompting is still in the early stages of development, with promising projects like MusicLM on the horizon but not yet accessible to the public.