## Load Model and Tokenizer

### Subtask:
Load the GPT-2 tokenizer and the GPT-2 language model (`GPT2LMHeadModel`) from the `transformers` library.


**Reasoning**:
The `transformers` library is not part of the standard Python distribution, so it must be installed before the GPT-2 tokenizer and model can be loaded. The later cells also import `torch`, so PyTorch needs to be available in the environment as well.



In [25]:
%pip install transformers



**Reasoning**:
With `transformers` installed, import the required classes and load the pre-trained GPT-2 tokenizer and model, as the subtask specifies.



In [26]:
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the pre-trained GPT-2 tokenizer
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Load the pre-trained GPT-2 language model
model = GPT2LMHeadModel.from_pretrained('gpt2')

print("GPT-2 tokenizer and model loaded successfully.")

GPT-2 tokenizer and model loaded successfully.



In [18]:
poem_text = """One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is."""

print("Poem text updated with the new content.")

Poem text updated with the new content.


**Reasoning**:
Now that the `poem_text` variable has been updated, the next step is to split the new poem into individual lines based on the newline character, as indicated by the task description.

In [19]:
poem_lines = poem_text.strip().split('\n')
print(f"Number of lines in the poem: {len(poem_lines)}")
print("First 3 lines:")
for i, line in enumerate(poem_lines[:3]):
    print(f"Line {i+1}: {line}")

Number of lines in the poem: 15
First 3 lines:
Line 1: One must have a mind of winter
Line 2: To regard the frost and the boughs
Line 3: Of the pine-trees crusted with snow;
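
A small aside on the splitting step: `split('\n')` works for the poem above, but text pasted from other sources may carry Windows-style `\r\n` line endings or blank lines between stanzas. The following is a minimal sketch, not part of the original notebook, that uses `str.splitlines()` instead. The helper name `split_poem_lines` and the sample string are assumptions made for illustration.

```python
from typing import List


def split_poem_lines(text: str) -> List[str]:
    """Split text into non-empty, stripped lines (handles Unix and Windows line endings)."""
    # str.splitlines() recognizes both "\n" and "\r\n" as line boundaries
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical sample with mixed line endings and one blank line
sample = (
    "One must have a mind of winter\r\n"
    "To regard the frost and the boughs\n"
    "\n"
    "Of the pine-trees crusted with snow;\n"
)
print(split_poem_lines(sample))
```

Dropping blank lines is a design choice: it differs from the notebook's loop, which keeps an empty string for each empty line, so stanza breaks would be lost with this helper.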


**Reasoning**:
Before processing the lines of the new poem, the `reconstructed_lines` list needs to be re-initialized to ensure it's empty for the new processing run.

In [20]:
reconstructed_lines = []

print("Re-initialized reconstructed_lines list and proceeding with poem processing...")

Re-initialized reconstructed_lines list and proceeding with poem processing...


**Reasoning**:
Now iterate through `poem_lines`. For each line, identify the last word and the prefix that is used as input for the GPT-2 model, then replace the last word with the model's 7th most likely next token. The code also includes the previously added logic for stripping special characters from every word.

In [29]:
import torch
import re

for line in poem_lines:
    # Split the line into words and strip every non-word character (punctuation, hyphens) from each word
    cleaned_line_words = [re.sub(r'[^\w]', '', word) for word in line.split()]
    # Drop words that are empty after cleaning (e.g., '...')
    words = [word for word in cleaned_line_words if word]

    if not words:
        reconstructed_lines.append('')
        continue

    last_word = words[-1]
    prefix = ' '.join(words[:-1])

    # Encode the prefix
    input_ids = tokenizer.encode(prefix, return_tensors='pt')

    # Get model predictions
    with torch.no_grad():
        outputs = model(input_ids)
        predictions = outputs.logits

    # Logits for the token that follows the prefix (last position in the sequence)
    next_token_logits = predictions[0, -1, :]
    # Sort logits in descending order; softmax is monotonic, so this is also the probability ranking
    sorted_logits, sorted_indices = torch.sort(next_token_logits, descending=True)

    # Get the 7th most likely token (index 6, since 0-indexed)
    if len(sorted_indices) > 6:
        predicted_token_id = sorted_indices[6].item()
        predicted_word = tokenizer.decode([predicted_token_id]).strip()
    else:
        # Fallback for a vocabulary with fewer than 7 tokens (not reached with GPT-2's 50,257-token vocabulary): use the top token
        predicted_word = "<UNKNOWN>"
        if len(sorted_indices) > 0:
            predicted_token_id = sorted_indices[0].item()
            predicted_word = tokenizer.decode([predicted_token_id]).strip()

    # Reconstruct the line with the new word
    reconstructed_line = f"{prefix} {predicted_word}".strip()
    reconstructed_lines.append(reconstructed_line)

print("Finished processing all lines with the new poem.")

Finished processing all lines with the new poem.
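
The loop above removes every non-word character from every word, so `pine-trees` becomes `pinetrees` and commas disappear from the prefix that the model sees. The following is a sketch of one possible alternative, not the notebook's method: it leaves the prefix untouched and strips punctuation only from the edges of the final word. The helper name `split_prefix_and_last_word` is hypothetical.

```python
import string
from typing import Tuple


def split_prefix_and_last_word(line: str) -> Tuple[str, str]:
    """Return (prefix, last_word); punctuation is stripped only from the edges of last_word."""
    # Strip first so leading whitespace does not end up in the prefix
    parts = line.strip().rsplit(maxsplit=1)
    if not parts:
        return "", ""  # empty or whitespace-only line
    if len(parts) == 1:
        return "", parts[0].strip(string.punctuation)  # single-word line: no prefix
    prefix, last = parts
    return prefix, last.strip(string.punctuation)


print(split_prefix_and_last_word("Of the pine-trees crusted with snow;"))
```

A single-word line yields an empty prefix, which gives the model nothing to condition on, so a caller would need to handle that case separately.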


**Reasoning**:
The final step is to display the reconstructed poem with the replaced last words. `reconstructed_lines` now holds all 15 processed lines from the new poem, so joining them with newline characters and printing the result fulfills the task requirement. GPT-2 predicts sub-word tokens rather than whole words, which is why the last line ends in the fragment `isn`.

In [30]:
print("Reconstructed Poem with 7th Most Likely Words (New Poem):")
print("--------------------------------------------------")
print('\n'.join(reconstructed_lines))

Reconstructed Poem with 7th Most Likely Words (New Poem):
--------------------------------------------------
One must have a mind of her
To regard the frost and the death
Of the pinetrees crusted with salt
And have been cold a long way
To behold the junipers shagged with white
The spruces rough in the distant horizon
Of the January sun and not to speak
Of any misery in the sound of the sound
In the sound of a few shots
Which is the sound of the voice
Full of the same day
That is blowing in the same bare air
For the listener who listens in the second
And nothing himself has
Nothing that is not there and the nothing that isn
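
The replacement words come from ranking the model's raw logits rather than probabilities. Because softmax preserves order, both rankings select the same token. The sketch below illustrates this in plain Python with made-up scores. The function names `softmax` and `kth_most_likely` and the values in `toy_logits` are assumptions for illustration, not output from GPT-2.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities (subtracting the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kth_most_likely(scores: Sequence[float], k: int) -> int:
    """Return the index of the k-th highest score (k=1 is the top choice)."""
    if not 1 <= k <= len(scores):
        raise ValueError(f"k must be between 1 and {len(scores)}")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[k - 1]


# Toy scores standing in for a tiny 8-token vocabulary (hypothetical values)
toy_logits = [2.0, -1.0, 0.5, 3.2, 1.1, 0.0, -0.3, 2.7]
probs = softmax(toy_logits)

# The 7th-ranked index is the same whether logits or probabilities are ranked
print(kth_most_likely(toy_logits, 7), kth_most_likely(probs, 7))
```

With PyTorch tensors, `torch.topk(next_token_logits, k=7)` returns the seven highest values and their indices, which avoids sorting the full vocabulary when only one rank is needed.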


**Reasoning**:
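Run the same loop over `poem_lines` a second time. For each line, identify the last word and the prefix that is used as input for the GPT-2 model. Note that `reconstructed_lines` is not re-initialized before this run, so the 15 new results are appended to the 15 lines already stored by the previous loop.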



In [31]:
import torch
import re

for line in poem_lines:
    # Split the line into words and strip every non-word character (punctuation, hyphens) from each word
    cleaned_line_words = [re.sub(r'[^\w]', '', word) for word in line.split()]
    # Drop words that are empty after cleaning (e.g., '...')
    words = [word for word in cleaned_line_words if word]

    if not words:
        reconstructed_lines.append('')
        continue

    last_word = words[-1]
    prefix = ' '.join(words[:-1])

    # Encode the prefix
    input_ids = tokenizer.encode(prefix, return_tensors='pt')

    # Get model predictions
    with torch.no_grad():
        outputs = model(input_ids)
        predictions = outputs.logits

    # Logits for the token that follows the prefix (last position in the sequence)
    next_token_logits = predictions[0, -1, :]
    # Sort logits in descending order; softmax is monotonic, so this is also the probability ranking
    sorted_logits, sorted_indices = torch.sort(next_token_logits, descending=True)

    # Get the 7th most likely token (index 6, since 0-indexed)
    if len(sorted_indices) > 6:
        predicted_token_id = sorted_indices[6].item()
        predicted_word = tokenizer.decode([predicted_token_id]).strip()
    else:
        # Fallback for a vocabulary with fewer than 7 tokens (not reached with GPT-2's 50,257-token vocabulary): use the top token
        predicted_word = "<UNKNOWN>"
        if len(sorted_indices) > 0:
            predicted_token_id = sorted_indices[0].item()
            predicted_word = tokenizer.decode([predicted_token_id]).strip()

    # Reconstruct the line with the new word
    reconstructed_line = f"{prefix} {predicted_word}".strip()
    reconstructed_lines.append(reconstructed_line)

print("Finished processing all lines.")

Finished processing all lines.
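
Running the loop cell again without resetting `reconstructed_lines` appends a second copy of every line to the list. One way to make re-runs safe is to wrap the loop in a function that builds a fresh list on each call. The sketch below assumes a hypothetical callable `predict_replacement` standing in for the GPT-2 step, and uses a stub predictor so the example runs without downloading a model.

```python
from typing import Callable, List


def reconstruct_lines(
    lines: List[str], predict_replacement: Callable[[str], str]
) -> List[str]:
    """Replace the last word of each line with the predictor's output for its prefix."""
    result: List[str] = []  # fresh list on every call, so re-running never accumulates
    for line in lines:
        words = line.split()
        if not words:
            result.append("")  # keep empty lines in place
            continue
        prefix = " ".join(words[:-1])
        result.append(f"{prefix} {predict_replacement(prefix)}".strip())
    return result


def stub_predictor(prefix: str) -> str:
    """Placeholder for the model call; always returns the same marker."""
    return "<word>"


demo = ["One must have a mind of winter", "To regard the frost and the boughs"]
first = reconstruct_lines(demo, stub_predictor)
second = reconstruct_lines(demo, stub_predictor)
print(second)
```

Calling the function twice returns two independent lists of the same length, so printing the result of the second call shows the poem only once.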


**Reasoning**:
The subtask requires displaying the reconstructed poem with the replaced last words. Join the entries of `reconstructed_lines` with newline characters and print the result. Because the list holds the output of both loop runs, the poem is printed twice.



In [32]:
print("Reconstructed Poem with 7th Most Likely Words:")
print("--------------------------------------------------")
print('\n'.join(reconstructed_lines))

Reconstructed Poem with 7th Most Likely Words:
--------------------------------------------------
One must have a mind of her
To regard the frost and the death
Of the pinetrees crusted with salt
And have been cold a long way
To behold the junipers shagged with white
The spruces rough in the distant horizon
Of the January sun and not to speak
Of any misery in the sound of the sound
In the sound of a few shots
Which is the sound of the voice
Full of the same day
That is blowing in the same bare air
For the listener who listens in the second
And nothing himself has
Nothing that is not there and the nothing that isn
One must have a mind of her
To regard the frost and the death
Of the pinetrees crusted with salt
And have been cold a long way
To behold the junipers shagged with white
The spruces rough in the distant horizon
Of the January sun and not to speak
Of any misery in the sound of the sound
In the sound of a few shots
Which is the sound of the voice
Full of the same day
That is blowi