# Assignment 2
## P+7 (Oulipian language modelling)

### Background

The Oulipo (*Ouvroir de Littérature Potentielle*, or “Workshop of Potential Literature”) is a French literary group founded in 1960 by writer Raymond Queneau and mathematician François Le Lionnais. The group focuses on using rules and constraints in writing as a way to spark creativity. Rather than seeing constraints as obstacles, Oulipians treat them as tools to inspire new forms of storytelling and poetry. Their work combines mathematics, language, and playfulness, making their approach both unique and influential in modern literature.

One of the most famous Oulipian writers is Georges Perec, who is known for his creative use of constraints. His novel *La Disparition* (“A Void”) is written entirely without the letter “e,” which is especially challenging given how common “e” is in French. Perec’s writing often plays with the structure of language in surprising ways. One popular Oulipian technique is N+7, where each noun in a text is replaced by the seventh noun that follows it in a dictionary. This creates unusual, absurd, and often funny results, encouraging writers to think differently about language and meaning.

![Georges Perec](https://upload.wikimedia.org/wikipedia/commons/7/76/Myart_georges-perec_1978.jpg)

(Georges Perec, 1978. From Wikimedia Commons)



### How it works

The N+7 process is straightforward:

- **Start with a text**. Choose any text—this could be a poem, a sentence, or a passage.
- **Use a dictionary**. Have a dictionary (or word list) handy.
- **Replace each noun**. Replace every noun in the original text with the seventh noun that follows it in the dictionary. If you reach the end of the dictionary, loop back to the beginning.
- **Maintain grammar**. Keep the new text as grammatically correct as possible, though the results often turn out surreal and nonsensical.

For example, applying the N+7 technique with a standard English dictionary to the sentence:

*The cat sat on the mat.*

- “Cat” → the seventh noun after “cat” is “catalog.”
- “Mat” → the seventh noun after “mat” is “material.”

This results in:

*The catalog sat on the material.*
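
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: it assumes that a word counts as a noun whenever it appears in the supplied word list (no part-of-speech tagging), it does not attempt the grammar step, and `n_plus_k` and `toy_nouns` are hypothetical names introduced here. The toy list stands in for a real dictionary, so the shift is set to 2 rather than 7.

```python
import re
from bisect import bisect_left
from typing import List


def n_plus_k(text: str, nouns: List[str], k: int = 7) -> str:
    """Replace each word found in `nouns` with the noun k entries later."""
    ordered = sorted(set(nouns))  # alphabetical order, like a dictionary

    def shift(match):
        word = match.group(0)
        key = word.lower()
        i = bisect_left(ordered, key)
        if i == len(ordered) or ordered[i] != key:
            return word  # not in the noun list: leave unchanged
        new = ordered[(i + k) % len(ordered)]  # loop back at the end
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", shift, text)


# Hypothetical toy word list, not a real dictionary.
toy_nouns = ["cat", "dog", "hat", "mat", "sun"]
print(n_plus_k("The cat sat on the mat.", toy_nouns, k=2))
# → The hat sat on the cat.
```

With a real dictionary, the noun list would need to contain only nouns, since the seventh entry is counted among nouns rather than among all headwords.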

### Assignment and deliverables

For this assignment, you will create a variation of the N+7 technique, which we will call P+7. Using the GPT-2 language model, you will replace the last word of each line of *The Snow Man* with the word that has the seventh-highest probability according to the model’s predictions.
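
One way to read "seventh-highest probability" is to rank GPT-2's scores for the next token and take the entry at rank 7. The sketch below follows that reading under several assumptions: the helper names (`rank_index`, `replace_last_word`, `make_gpt2_next_token`) are hypothetical, the model is loaded through `GPT2LMHeadModel` from the `transformers` library, and "word" is approximated by a single GPT-2 token, which may be a word fragment or punctuation mark.

```python
from typing import Callable, Sequence


def rank_index(scores: Sequence[float], x: int) -> int:
    """Return the index of the x-th highest score (x=1 is the top score)."""
    if not 1 <= x <= len(scores):
        raise ValueError("x must be between 1 and len(scores)")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[x - 1]


def replace_last_word(line: str, x: int,
                      next_token: Callable[[str, int], str]) -> str:
    """Swap the last word of `line` for the x-th ranked continuation."""
    words = line.split()
    if len(words) < 2:
        return line  # no preceding context to condition on
    context = " ".join(words[:-1])
    return context + " " + next_token(context, x).strip()


def make_gpt2_next_token() -> Callable[[str, int], str]:
    """Build a next_token function backed by GPT-2 (downloads the weights)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def next_token(context: str, x: int) -> str:
        inputs = tokenizer(context, return_tensors="pt")
        with torch.no_grad():
            # Scores over the vocabulary for the token after `context`
            logits = model(**inputs).logits[0, -1]
        token_id = rank_index(logits.tolist(), x)
        return tokenizer.decode([token_id])

    return next_token
```

To apply it, call `next_token = make_gpt2_next_token()` once and then `replace_last_word(line, 7, next_token)` for each line of the poem. Because the ranking is deterministic, the same `x` gives the same output on every run, unlike sampled generation.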

By the end of this assignment, submit a link to a GitHub repository named `CART498-GenAI` containing a folder labelled `A02` with the following items:

- A version of the text processed with your P+7 technique, saved as a `.txt` file.
- A second version of the text processed with `P+x`. Choose an `x` value that produces the funniest, wittiest, or most absurd version of the original text. Save this as a `.txt` file, and include the `x` value in the filename (e.g., `P+23.txt` or `P+12.txt`).
- A Python notebook with the script used to generate your P+7 and P+x transformations using the GPT-2 language model.
- A short reflection (250–350 words) explaining how altering the `x` value impacted the output of your P+x version. Additionally, discuss how you would implement a P+7 technique in which all nouns are replaced with their seventh-highest probability alternatives.

### *The Snow Man*
by Wallace Stevens (1879–1955)


> One must have a mind of winter  
> To regard the frost and the boughs  
> Of the pine-trees crusted with snow;  
> And have been cold a long time  
> To behold the junipers shagged with ice,  
> The spruces rough in the distant glitter  
> Of the January sun; and not to think  
> Of any misery in the sound of the wind,  
> In the sound of a few leaves,  
> Which is the sound of the land  
> Full of the same wind  
> That is blowing in the same bare place  
> For the listener, who listens in the snow,  
> And, nothing himself, beholds  
> Nothing that is not there and the nothing that is.  




In [None]:
# prompt: For this assignment, you will create a variation of the N+7 technique we will name P+7. Using the GPT-2 language model, you will replace the last word of each line from The Snow Man with the word that has the seventh-highest probability according to the model’s predictions.

!pip install transformers

from transformers import pipeline, GPT2TokenizerFast

# Load the GPT-2 model and tokenizer
generator = pipeline('text-generation', model='gpt2')
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')

# The Snow Man poem
poem = """
One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is.
"""

def p_plus_x(text, x):
    # NOTE: `x` is accepted but never used below. The pipeline samples one
    # continuation, so the replacement is a sampled word, not the x-th most
    # probable one, and results vary between runs.
    lines = text.strip().split('\n')
    new_poem = ""
    for line in lines:
        words = line.split()
        if words:
            # Prompt GPT-2 with everything except the last word
            input_text = " ".join(words[:-1])
            # max_length counts tokens, while len(input_text) counts characters,
            # so this is only a generous upper bound on the generated length
            result = generator(input_text, max_length=len(input_text) + 10, num_return_sequences=1)
            generated_text = result[0]['generated_text']
            generated_words = generated_text.split()

            # Take the word that follows the prompt in the generated text
            try:
                new_word = generated_words[len(words) - 1]
            except IndexError:
                new_word = "[ERROR]"  # GPT-2 produced no word after the prompt

            new_line = " ".join(words[:-1]) + " " + new_word
            new_poem += new_line + "\n"
        else:
            new_poem += "\n"

    return new_poem


# Create P+7 version
p7_poem = p_plus_x(poem, 7)
print("P+7 Version:\n", p7_poem)

with open("P+7.txt", "w") as f:
    f.write(p7_poem)


# Example with P+12 (you can change the value of x as desired)
p12_poem = p_plus_x(poem, 12)

with open("P+12.txt", "w") as f:
    f.write(p12_poem)



Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

P+7 Version:
 One must have a mind of their
To regard the frost and the rain
Of the pine-trees crusted with the
And have been cold a long time,
To behold the junipers shagged with the
The spruces rough in the distant distance.
Of the January sun; and not to think
Of any misery in the sound of the drum,
In the sound of a few loud
Which is the sound of the sound?
Full of the same in
That is blowing in the same bare and
For the listener, who listens in the same
And, nothing himself, except
Nothing that is not there and the nothing that is




In [None]:
# prompt: For this assignment, you will create a variation of the N+7 technique we will name P+x. Using the GPT-2 language model, you will replace the last word of each line from The Snow Man with the word that has the seventh-highest probability according to the model’s predictions. Choose an x value that produces the funniest, wittiest, or most absurd version of the original text.

# Example with P+23 (you can change the value of x as desired)
p23_poem = p_plus_x(poem, 23)

with open("P+23.txt", "w") as f:
    f.write(p23_poem)

print("P+23 Version:\n", p23_poem)


P+23 Version:
 One must have a mind of his
To regard the frost and the great
Of the pine-trees crusted with the
And have been cold a long while
To behold the junipers shagged with their
The spruces rough in the distant south,
Of the January sun; and not to be
Of any misery in the sound of the voices,
In the sound of a few small
Which is the sound of the fire
Full of the same And
That is blowing in the same bare wind.
For the listener, who listens in the current
And, nothing himself, everything
Nothing that is not there and the nothing that exists

