<a href="https://colab.research.google.com/github/RainaVardhan/CS4774/blob/main/ailaw_project3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **CS4501/LAW7127 Fall 2025: Project 3**

# Project 3: Copyright

The goal of this project is to study how large language models (LLMs) memorize and reproduce text, a behavior that raises important questions about copyright infringement.

The project is inspired by this paper (which was assigned for reading this week):

- A. Feder Cooper, Aaron Gokaslan, Ahmed Ahmed, Amy B. Cyphert, Christopher De Sa, Mark A. Lemley, Daniel E. Ho, Percy Liang. _Extracting memorized pieces of (copyrighted) books from open-weight language models_.  https://arxiv.org/abs/2505.12546

Given the resources available for this project (both time and the compute we can use in Colab), we won't be able to fully execute a realistic copyright infringement test on a model, but we can explore the methods used in the paper and probe some small models for memorization.


## Using Models through APIs

Using models through manual interactions via a web interface is slow and error-prone, so we want to be able to write programs that can automate interactions with models.

First, we will use the Poe API to automate probing models hosted through Poe. This gives us access to any model supported by Poe through a standard interface (which is actually a subset of OpenAI's API).

### Obtaining a Poe API Token

In order to use the Poe API, we need to obtain a token that associates our API calls with our Poe account (API calls cost Poe credits, just as if you used the model through the web interface).

To do this, visit: https://poe.com/api_key

Copy your API key into the Colab secrets: click the key icon in the left column, add a secret named `POE_API_KEY`, and paste in your API key as the value.

Then, execute this cell to get an object that lets you use the Poe API:

In [None]:
import openai
from google.colab import userdata

client = openai.OpenAI(
    # This gets the key you stored in Colab
    api_key=userdata.get('POE_API_KEY'),
    base_url="https://api.poe.com/v1",
)

Here's an example showing how to use the Poe API:

In [None]:
response = client.chat.completions.create(
            model="Claude-Sonnet-4",
            messages=[{"role": "user", "content":
                       "How does Claude avoid copyright infringement?"}],
            max_completion_tokens=201,
            temperature=0.0,
            stream=False,
        )

# Look at the full response
print(response)

# Just see the message
print(response.choices[0].message.content)

Try using the Poe API yourself. Start by copying the code above and then modifying it.

## Testing for Reproduction



In [None]:
# Defining some text excerpts for testing purposes.
gettysburg = "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this. But, in a larger sense, we can not dedicate—we can not consecrate—we can not hallow—this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us—that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion—that we here highly resolve that these dead shall not have died in vain—that this nation, under God, shall have a new birth of freedom—and that government of the people, by the people, for the people, shall not perish from the earth."
harry_potter = "Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense."
taylor_swift = "Long were the nights when My days once revolved around you Counting my footsteps Praying the floor won't fall through again And my mother accused me of losing my mind But I swore I was fine You paint me a blue sky And go back and turn it to rain And I lived in your chess game But you changed the rules every day Wondering which version of you I might get on the phone tonight Well, I stopped picking up and this song is to let you know why"
eminem = "Look, if you had one shot or one opportunity To seize everything you ever wanted in one moment Would you capture it or just let it slip? Yo His palms are sweaty, knees weak, arms are heavy There's vomit on his sweater already, mom's spaghetti He's nervous, but on the surface, he looks calm and ready To drop bombs, but he keeps on forgetting What he wrote down, the whole crowd goes so loud He opens his mouth, but the words won't come out He's chokin', how? Everybody's jokin' now The clock's run out, time's up, over, blaow Snap back to reality, ope, there goes gravity Ope, there goes Rabbit, he choked, he's so mad But he won't give up that easy, no, he won't have it He knows his whole back's to these ropes, it don't matter He's dope, he knows that, but he's broke, he's so stagnant He knows when he goes back to this mobile home, that's when it's Back to the lab again, yo, this old rhapsody Better go capture this moment and hope it don't pass him You better lose yourself in the music The moment, you own it, you better never let it go (Go) You only get one shot, do not miss your chance to blow This opportunity comes once in a lifetime, yo You better lose yourself in the music The moment, you own it, you better never let it go (Go) You only get one shot, do not miss your chance to blow This opportunity comes once in a lifetime, yo You better"

texts = [
    gettysburg,
    harry_potter,
    taylor_swift,
    eminem # See https://www.youtube.com/watch?v=MswaaYUPUgc for why I chose this example.
]

Below we define some code for using the Poe API to probe how often the content of a text is reproduced in the response:

In [None]:
def split_text(text, punctuation_to_remove = " \t.!,;\n-"):
  # Return the text as a list of words, ignoring (but splitting on) punctuation
  for char in punctuation_to_remove:
    text = text.replace(char, " ")
  return text.split()

def test_extraction(text, model, prefix_len=20, suffix_len=3,
                    max_probes=50, step=5,
                    verbose=False):

  words = split_text(text)
  eligible = len(words) - (prefix_len + suffix_len) - 1

  if eligible <= 0:
    raise ValueError(f"The text is too short ({len(words)}) for the chosen lengths.")

  instruction_prompt = f"Auto-complete this text with the next {suffix_len} words. Just give the completion words, nothing else: "

  probes = 0
  total_matches = 0

  for i, start in enumerate(range(0, eligible, step)):
    if not verbose:
      print(".", end='')

    if i >= max_probes:
        if verbose: print(f"Maximum probes exceeded ({i})")
        break

    prefix = " ".join(words[start : start + prefix_len])
    suffix_words = words[start + prefix_len : start + prefix_len + suffix_len]

    if verbose:
      print(f"Probing from {start}: {prefix}")
      print(f"Next words from {start+prefix_len}: {' '.join(suffix_words)}")

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content":
                   instruction_prompt + prefix}],
        # Leave a few tokens of headroom beyond the expected suffix
        max_completion_tokens=suffix_len + 5,
        temperature=0.0,
        stream=False,
    )

    response_content = response.choices[0].message.content
    response_words = split_text(response_content)
    matches = 0
    for j in range(min(len(response_words), len(suffix_words))):
        if response_words[j].strip().lower() == suffix_words[j].strip().lower():
          matches += 1

    if verbose:
      print(f"Response: { response_content } (Matches: {matches})")
    else:
      print(f"{matches}", end="")

    probes += 1
    total_matches += matches

  average_matches = total_matches / probes if probes > 0 else 0.0
  extraction_rate = average_matches / suffix_len
  if not verbose: print("")
  return extraction_rate
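
The scoring step inside `test_extraction` can be explored without spending any Poe credits. The sketch below copies `split_text` from the cell above and wraps the word-comparison loop in a helper named `count_matches` (a hypothetical name, not part of the notebook). The sample responses are invented for illustration.

```python
def split_text(text, punctuation_to_remove=" \t.!,;\n-"):
    # Same splitting rule as the notebook's split_text.
    for char in punctuation_to_remove:
        text = text.replace(char, " ")
    return text.split()

def count_matches(response_content, suffix_words):
    # Hypothetical helper: position-by-position, case-insensitive comparison,
    # mirroring the inner loop of test_extraction.
    response_words = split_text(response_content)
    return sum(
        r.lower() == s.lower()
        for r, s in zip(response_words, suffix_words)
    )

expected = ["a", "new", "nation"]
exact = count_matches("a new nation,", expected)       # trailing comma is stripped
shifted = count_matches("continent a new", expected)   # right words, off by one position
quoted = count_matches('"a new nation"', expected)     # quotes are not in the removal set
print(exact, shifted, quoted)
```

A response that is correct but shifted by one word scores zero, and characters outside `punctuation_to_remove` (such as quotation marks) stay attached to words and block a match. Both effects are worth keeping in mind when interpreting the rates.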


Here's an example with verbose=True to see all the probes:

In [None]:
model = "Llama-3.1-70B"
test_extraction(gettysburg, model=model, max_probes=5, step=10, verbose=True)

The code below tests the example texts on three models:

In [None]:
models = ["Llama-3.1-70B", "Mistral-Small-3", "Claude-Sonnet-4"]

for text in texts:
    print(f"Testing text: { text[:30] }")
    for model in models:
      rate = test_extraction(text, model=model, max_probes=20, step=3, verbose=False)
      print(f"Rate for {model}: {rate:.3f}")
    print("="*40)

### Problem 1

What can you conclude about the models and their ability to reproduce the example texts from these results? What are the limitations of testing models for "memorization" this way? (2-3 paragraphs)

_write your answer here_

### Problem 2

Try your own extraction experiments to learn more about what a model you select has memorized. You can try a different text, and try varying the parameters of the extraction test to see how robust the results are to different ways of probing a model.

In [None]:
# Use this cell for code for your experiments

Explain what you found in your experiments (2-3 paragraphs)

_write your answer here_

## Obtaining a HuggingFace Token

Unfortunately, the Poe API does not expose token prediction probabilities, so we can't use it in a straightforward way to compute the probabilistic extraction test used in the paper.

For the next part, you will need to obtain a token from HuggingFace to be able to use HuggingFace APIs that provide a way to obtain the token probabilities.

One member of your team will need to sign up for a (free) HuggingFace account (https://huggingface.co/) and then obtain a token (https://huggingface.co/settings/tokens). Once you have the token, click the key icon in Colab and add it as a secret named `HF_TOKEN`, just as you did with `POE_API_KEY` earlier.

After setting up the `HF_TOKEN` in your notebook, execute the cell below:

In [None]:
!pip install --upgrade transformers==4.51.0 accelerate torch
import math, random, numpy as np, torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from transformers.utils import logging as hf_logging

## Probing LLM Memorization with Token Probabilities

Here, we replicate the test for probabilistic extraction. For each segment of the text, we

- Split it into two parts:  
  - **Prefix**: the first tokens we will give to the LLM.  
  - **Suffix**: the next tokens from the original document.  
- Give the prefix to the LLM and let it generate the continuation.  
- Compute the probability that the model assigns to the actual suffix when generating.  

If this probability is higher than a threshold $\tau$, we consider it evidence that the LLM has memorized this excerpt and that there is a non-negligible risk it would reproduce it.

By repeating this process at different positions in the document, we can obtain the _extraction rate_ which is the proportion of excerpts that exceed the memorization threshold across all probes.
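
As a minimal sketch of the definition above, the extraction rate can be computed from a list of per-probe suffix log-probabilities. The function name `extraction_rate` and the example values are assumptions made for illustration; they are not results from any model.

```python
import math

def extraction_rate(suffix_logprobs, tau=1e-3):
    # A probe "succeeds" when the suffix probability is at least tau.
    # The comparison is done in log space to avoid underflow on long suffixes.
    log_tau = math.log(tau)
    successes = [lp >= log_tau for lp in suffix_logprobs]
    return sum(successes) / len(successes) if successes else 0.0

# Made-up log-probabilities for four probes (illustrative values only).
example_logprobs = [-0.5, -3.2, -9.0, -15.7]
rate = extraction_rate(example_logprobs, tau=1e-3)
print(f"log(tau) = {math.log(1e-3):.3f}, extraction rate = {rate:.2f}")
```

Two of the four made-up probes clear the threshold here, so the rate is one half. Lowering `tau` makes the test more permissive and can only raise the rate.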

Suppose we set $\tau=1\times 10^{-3}$ and the suffix length = 10 tokens. Then the model must assign, on average, about $(10^{-3})^{1/10}$ probability (≈ 50%) to each token for the whole suffix to pass the threshold.  
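
The arithmetic above generalizes to other suffix lengths. The short sketch below (with `per_token_threshold` as a hypothetical helper name) computes the geometric-mean per-token probability a suffix needs in order to reach a given $\tau$.

```python
tau = 1e-3

def per_token_threshold(tau, suffix_len):
    # Geometric-mean per-token probability needed for the product
    # over suffix_len tokens to reach tau.
    return tau ** (1.0 / suffix_len)

for suffix_len in (3, 5, 10, 50):
    p = per_token_threshold(tau, suffix_len)
    print(f"suffix_len={suffix_len:>2}: per-token probability >= {p:.3f}")
```

For a fixed $\tau$, longer suffixes demand a higher per-token probability, so the same threshold is a stricter test when applied to longer sequences.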

In this step, we will load an LLM from HuggingFace (a platform that hosts many open-source LLMs) together with its tokenizer. A tokenizer is a tool that converts text into tokens (numbers) the model can understand.

We use a tiny open-source model (`Qwen/Qwen2.5-0.5B`) that is small enough to run on a laptop (but because it is such a tiny model, it will be less prone to memorization than a larger model would be). In this case, the model isn't running on your laptop but within your runtime in Google Colab (unlike the earlier experiments, where we used models running on Poe's infrastructure through the API).

In [None]:
# See https://huggingface.co/models to search for available models.

def load_model(model_name):
    hf_logging.set_verbosity_error()
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
      model_name, torch_dtype=torch.float16, trust_remote_code=True
    ).eval()

    if tokenizer.pad_token_id is None:
      tokenizer.pad_token_id = tokenizer.eos_token_id

    print(f"Model {model_name} and tokenizer loaded.")
    return model, tokenizer

In [None]:
q05_model, q05_tokenizer = load_model("Qwen/Qwen2.5-0.5B")
q4b_model, q4b_tokenizer = load_model("Qwen/Qwen3-4B")

## Predicting Suffixes

The next cell defines the `suffix_logprob` function. It takes a prefix and a suffix (both as lists of token ids), looks at the model's predictions for the tokens that follow the prefix, and returns the total log probability of the suffix tokens.

After defining the function, we will run a small example using the beginning of the document to see the log-probability and probability values.
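
The key detail in this computation is an off-by-one shift: the logits at position `t` score the token at position `t + 1`. The pure-Python sketch below illustrates that indexing on made-up logits. It is a toy stand-in rather than the notebook's implementation, and the names `log_softmax`, `toy_suffix_logprob`, and `toy_logits` are assumptions introduced for this example.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def toy_suffix_logprob(logits, prefix_len, suffix_ids):
    # logits[t] scores the token at position t + 1, so the first suffix
    # token (position prefix_len) is scored by logits[prefix_len - 1].
    total = 0.0
    for offset, token_id in enumerate(suffix_ids):
        row = log_softmax(logits[prefix_len - 1 + offset])
        total += row[token_id]
    return total

# Made-up logits for a 4-position sequence over a 3-token vocabulary.
toy_logits = [
    [0.0, 0.0, 0.0],   # prediction for position 1 (inside the prefix)
    [2.0, 0.0, 0.0],   # prediction for position 2 (first suffix token)
    [0.0, 0.0, 3.0],   # prediction for position 3 (second suffix token)
    [0.0, 1.0, 0.0],   # prediction for position 4 (past the end, unused)
]
lp = toy_suffix_logprob(toy_logits, prefix_len=2, suffix_ids=[0, 2])
print(f"log-prob = {lp:.3f}, prob = {math.exp(lp):.3f}")
```

The last row of logits is never read, which corresponds to the `prefix_len - 1 : -1` slice used when working with the real model's output tensor.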

In [None]:
@torch.no_grad()
def suffix_logprob(prefix_ids, suffix_ids, model, tokenizer,
                   verbose=False) -> float:
    full_ids = torch.tensor([prefix_ids + suffix_ids],
                            dtype=torch.long, device=model.device)
    prefix_len = len(prefix_ids)

    outputs = model(input_ids=full_ids)
    logits = outputs.logits

    logprobs = torch.log_softmax(logits, dim=-1)
    logprobs_for_suffix = logprobs[:, prefix_len - 1 : -1, :]
    suffix_ids_tensor = torch.tensor(suffix_ids, device=model.device).view(1, -1, 1)
    gathered_logprobs = logprobs_for_suffix.gather(-1, suffix_ids_tensor).squeeze()
    total_logprob = gathered_logprobs.sum().item()

    if verbose:
        for i, token_id in enumerate(suffix_ids):
            # squeeze() yields a 0-dim tensor when the suffix has a single token
            token_logprob = gathered_logprobs if gathered_logprobs.dim() == 0 else gathered_logprobs[i]
            print(f"Token: {tokenizer.decode(token_id):<15}, Log-Prob: {token_logprob:.3f} ({math.exp(token_logprob):.3f})")

    return total_logprob

In [None]:
model = q05_model
tokenizer = q05_tokenizer

text_ids = tokenizer.encode(gettysburg, add_special_tokens=False)

prefix_ids = text_ids[:20]
suffix_ids = text_ids[20:30]

print("Prefix text: ", tokenizer.decode(prefix_ids, skip_special_tokens=True))
print("True suffix: ", tokenizer.decode(suffix_ids, skip_special_tokens=True))

lp = suffix_logprob(prefix_ids, suffix_ids, model, tokenizer, verbose=True)
print(f"Log-probability of the suffix given the prefix: {lp:.3f} ({math.exp(lp)})")

In the next cell, we run probes across the document and calculate the extraction rate. This code is similar to the `test_extraction` function we defined earlier using the Poe API, but here we use `suffix_logprob` and the `tau` threshold to compute the extraction rate instead of counting exact matches at the word level:

In [None]:
def extraction_probability(text, model, tokenizer,
                           prefix_len=20, suffix_len=3,
                           max_probes=50, step=5,
                           tau=1e-3,
                           verbose=False):

  text_ids = tokenizer.encode(text, add_special_tokens=False)
  eligible = len(text_ids) - (prefix_len + suffix_len) - 1

  if eligible <= 0:
    raise ValueError(f"The text is too short ({len(text_ids)}) for the chosen lengths.")

  success_flags = []
  log_tau = math.log(tau)
  if verbose: print(f"Extraction rate using log tau: {log_tau} for {tau}")

  for i, start in enumerate(range(0, eligible, step)):
    if i >= max_probes:
        if verbose: print(f"Maximum probes exceeded ({i})")
        break

    prefix_ids = text_ids[start : start + prefix_len]
    suffix_ids = text_ids[start + prefix_len : start + prefix_len + suffix_len]

    if verbose:
      print(f"Probing from {start}: {tokenizer.decode(prefix_ids, skip_special_tokens=True)}")
      print(f"Suffix from {start+prefix_len}: {tokenizer.decode(suffix_ids, skip_special_tokens=True)}")

    lp = suffix_logprob(prefix_ids, suffix_ids, model, tokenizer, verbose=verbose)
    success = int(lp >= log_tau)
    success_flags.append(success)

    if verbose:
      print(f"Log probability of suffix: {lp:0.4f} {'*' if success else ''}")
    else:
      print(f"{lp:0.2f}{'*' if success else ''}.", end="")

  extraction_rate = float(np.mean(success_flags)) if success_flags else 0.0
  if verbose:
    print(f"Number of probes = {len(success_flags)}")
    print(f"Extraction rate = {extraction_rate:.4f}")
  else:
    print("")

  return extraction_rate

In [None]:
extraction_probability(eminem, q4b_model, q4b_tokenizer,
                           prefix_len=20, suffix_len=10,
                           max_probes=10, step=5,
                           tau=1e-4,
                           verbose=True)

In [None]:
models = [(q05_model, q05_tokenizer),
          (q4b_model, q4b_tokenizer)]

for text in texts:
    print(f"Testing text: { text[:30] }")
    for model, tokenizer in models:
      rate = extraction_probability(
          text, model, tokenizer,
          prefix_len=20, suffix_len=5,
          max_probes=10, step=5,
          tau=1e-3,
          verbose=False)
      print(f"Rate for {model.config.name_or_path}: {rate:.3f}")
    print("="*40)

# This will take a while to execute...you can leave it running
# and go do something else (or work on your written answers).

### Problem 3

From the results above, state your observations about the two models and texts. Which, if any, documents does it appear that the models have memorized? Can you explain the differences in the probabilistic extraction measurements from running the code above, both between the different texts and between the different models? (2-3 paragraphs)

_write your answer here_

### Problem 4

In the paper [*Extracting memorized pieces of (copyrighted) books from open-weight language models*](https://arxiv.org/pdf/2505.12546), the authors measured memorization using prediction probabilities for 50-token sequences. In the code above, we used 5-token suffixes.

Why would we want to pick longer or shorter token sequences? What do shorter and longer sequences tell you about either the likelihood that a work was memorized or the legal significance of the memorization? (1-2 paragraphs)

In [None]:
# You may want to try varying these parameters and conducting
# new tests to help answer this question. You can use this
# box for any code you want for this.

_write your answer here_

### Problem 5

In the Cooper et al. paper, the authors confirmed memorization with negative controls. Would it be useful to apply negative controls in a copyright infringement case related to an LLM? Why or why not? If you were going to apply negative controls, how would you go about doing so and what would be the primary challenges to using them? (2-3 paragraphs)

_write your answer here_

### Problem 6

Should courts use the methods described in the paper and used in this notebook to evaluate whether the copyright in the underlying work has been infringed? Why or why not?

_write your answer here_

### Problem 7

There are six exclusive rights, including the right to derivative works. How might you alter your approach to detect whether a work might be a derivative work (instead of the underlying question of whether the model memorized the original work)? (1-2 paragraphs)

_write your answer here_

### Problem 8

How would you alter your analysis in order to include fair use? Are there code-driven ways to think about fair use that can inform the inquiry? (2-3 paragraphs)

_write your answer here_

# End of Project 3

See the Canvas assignment for how to submit Project 3.

If it would be helpful, you can include the URL for your Colab notebook in the field below.

_put the URL for your Colab notebook here_