# Task
Replace the last word of each sentence in a given multi-line text with the seventh most probable next word predicted by the GPT-2 model, while preserving original punctuation, and then display the modified text.

## Define Input Text

### Subtask:
Set the provided multi-line text as the input string for processing.


**Reasoning**:
As per the instructions, I need to define the provided multi-line text and assign it to a variable named `multi_line_text`.



In [24]:
multi_line_text = """One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is."""
print(f"Multi-line text: {multi_line_text}")

Multi-line text: One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is.


## Sentence Segmentation

### Subtask:
Split the input text into individual sentences using a suitable method (e.g., NLTK's `sent_tokenize`). This step will ensure that each sentence can be processed independently.
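A quick way to see why a sentence tokenizer is a poor fit for this poem: the sketch below uses a deliberately simple regex splitter as a stand-in for `sent_tokenize`. It is not NLTK's Punkt algorithm, and `naive_sentences` is a hypothetical name. The splitter breaks only after `.`, `?` or `!` followed by whitespace. An excerpt whose only sentence-final mark is the closing period therefore stays in one piece, while splitting on newlines gives one unit per line.

```python
import re

# Hypothetical stand-in for a sentence tokenizer: split only after ., ? or !
# when followed by whitespace. This is NOT NLTK's Punkt algorithm.
def naive_sentences(text):
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

# Excerpt of the poem: semicolons and line breaks do not end a sentence here.
excerpt = """One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
Nothing that is not there and the nothing that is."""

by_sentence = naive_sentences(excerpt)
by_line = excerpt.split('\n')

print(len(by_sentence))  # 1
print(len(by_line))      # 4
```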


**Reasoning**:
To perform sentence tokenization, the `nltk` library is required. Therefore, I will first install `nltk` using pip.



In [13]:
%pip install nltk



**Reasoning**:
Now that `nltk` is installed, I will import it, download the necessary 'punkt' tokenizer data, import `sent_tokenize`, and then use it to split the `multi_line_text` into a list of individual sentences as instructed.



In [14]:
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize

sentences = sent_tokenize(multi_line_text)
print(f"Sentences extracted: {sentences}")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


LookupError: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/root/nltk_data'
    - '/usr/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


**Reasoning**:
The previous cell failed because the `punkt_tab` resource, required by NLTK's `sent_tokenize`, was not found, even after downloading `punkt`. The error message explicitly suggested downloading `punkt_tab`. Therefore, I will modify the code to download `punkt_tab` in addition to `punkt` to ensure all necessary resources are available.



In [15]:
import nltk
nltk.download('punkt')
nltk.download('punkt_tab') # Explicitly download punkt_tab as suggested by the error
from nltk.tokenize import sent_tokenize

sentences = sent_tokenize(multi_line_text)
print(f"Sentences extracted: {sentences}")

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


Sentences extracted: ['The quick brown fox jumps over the lazy dog.', 'It is a beautiful day.', 'The sun is shining brightly.']


**Reasoning**:
The output shows that `multi_line_text` does not hold the poem: it contains a shorter placeholder text (`'The quick brown fox jumps over the lazy dog. It is a beautiful day. The sun is shining brightly.'`). To make sure the subsequent steps process the poem as the task requires, I need to redefine `multi_line_text` with the complete text.



In [16]:
multi_line_text = """One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is."""
print(f"Multi-line text: {multi_line_text}")

Multi-line text: One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is.


## Load GPT-2 Model and Tokenizer

### Subtask:
Load the `GPT2LMHeadModel` and `GPT2Tokenizer` from the `transformers` library, as specified, to be used for next word prediction.


**Reasoning**:
As per the instructions, I will import `GPT2LMHeadModel` and `GPT2Tokenizer` from the `transformers` library, then load the pre-trained 'gpt2' model and its corresponding tokenizer into `model` and `tokenizer` variables respectively.



In [19]:
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
model = GPT2LMHeadModel.from_pretrained('gpt2')
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

print("GPT-2 model and tokenizer loaded successfully.")

GPT-2 model and tokenizer loaded successfully.


## Process and Replace Words in Each Sentence

### Subtask:
Iterate through each segmented sentence. For each sentence, extract its last word and the preceding context. Use the GPT-2 model to predict the probability distribution for the next token, and then identify the seventh most probable word. Replace the original last word of the sentence with this predicted word, ensuring any original punctuation is preserved.


**Reasoning**:
As instructed, I will now implement the iterative process for each sentence. This involves splitting the sentence, extracting the last word and its punctuation, forming the context, using the GPT-2 model to predict the seventh most probable next word, and then reconstructing the sentence with the new word while preserving the original punctuation.
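The punctuation handling described above can be isolated into two small helpers. This is a sketch under the same assumptions as the loop in the cell below: words are whitespace-separated, and trailing punctuation is any run of non-alphanumeric characters at the end of the last word. `split_last_word` and `replace_last_word` are hypothetical names that do not appear in the notebook.

```python
import re

# Trailing run of non-alphanumeric characters, e.g. ";" or "," or "."
TRAILING_PUNCT = re.compile(r'[^A-Za-z0-9]*$')

def split_last_word(line):
    """Return (context, last_word, punctuation) for one line of text.

    The context is every word except the last; the punctuation is whatever
    trailing non-alphanumeric characters the last word carried.
    """
    words = line.split()
    if not words:
        return "", "", ""
    last = words[-1]
    punct = TRAILING_PUNCT.search(last).group(0)  # always matches (possibly empty)
    bare = last[:len(last) - len(punct)]
    return " ".join(words[:-1]), bare, punct

def replace_last_word(line, new_word):
    """Swap the last word for new_word, keeping the original trailing punctuation."""
    context, _, punct = split_last_word(line)
    new_last = new_word + punct
    return f"{context} {new_last}" if context else new_last

print(split_last_word("Of the pine-trees crusted with snow;"))
# ('Of the pine-trees crusted with', 'snow', ';')
print(replace_last_word("To behold the junipers shagged with ice,", "white"))
# To behold the junipers shagged with white,
```

Keeping these steps in plain functions makes them testable without loading GPT-2.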



In [20]:
import torch
import re

modified_sentences = []

for sentence in sentences:
    words = sentence.split()

    if not words: # Handle empty sentences if any
        modified_sentences.append("")
        continue

    last_word_original = words[-1]

    # Extract punctuation from the original last word
    match = re.search(r'([^a-zA-Z0-9]*)$', last_word_original)
    punctuation = match.group(1) if match else ''

    # Context = every word except the last; the original last word (and its punctuation) is dropped
    if len(words) > 1:
        context_words = words[:-1]
        context = " ".join(context_words)
    else:
        context = ""

    # Tokenize the context.
    # GPT-2 has no [MASK] token; it predicts the next token from the preceding context.
    # An empty prompt cannot be scored, so fall back to GPT-2's <|endoftext|> token.
    input_ids = tokenizer.encode(context if context else tokenizer.bos_token, return_tensors='pt')

    # Get model outputs
    with torch.no_grad():
        outputs = model(input_ids)

    # Get logits for the last token
    # GPT-2's output logits are for predicting the *next* token in the sequence.
    # So we take the logits corresponding to the last token in the input_ids sequence.
    logits = outputs.logits[:, -1, :]

    # Apply softmax to get probabilities
    probabilities = torch.softmax(logits, dim=-1)

    # Get top 10 most probable tokens (to ensure we can get the 7th)
    top_k = 10
    top_k_probs, top_k_indices = torch.topk(probabilities, top_k)

    # Identify the seventh most probable token ID (index 6)
    if len(top_k_indices[0]) > 6:
        seventh_token_id = top_k_indices[0][6].item()
    else:
        # Fallback if somehow fewer than 7 tokens were returned (unlikely with top_k=10)
        seventh_token_id = top_k_indices[0][0].item() # Use the most probable if not enough

    # Decode the token ID back into a word
    predicted_word = tokenizer.decode(seventh_token_id).strip()

    # Construct the new last word with the predicted word and original punctuation
    new_last_word = predicted_word + punctuation

    # Reconstruct the modified sentence
    if len(words) > 1:
        modified_sentence = f"{context} {new_last_word}"
    else:
        # If the original sentence was a single word, it becomes just the new_last_word
        modified_sentence = new_last_word

    modified_sentences.append(modified_sentence)

print("Modified sentences:")
for s in modified_sentences:
    print(s)

Modified sentences:
One must have a mind of her
To regard the frost and the death
Of the pine-trees crusted with oil;
And have been cold a long way
To behold the junipers shagged with white,
The spruces rough in the distant horizon
Of the January sun; and not to have
Of any misery in the sound of the sound,
In the sound of a few shots,
Which is the sound of the voice
Full of the same day
That is blowing in the same bare air
For the listener, who listens in the morning,
And, nothing himself, I
Nothing that is not there and the nothing that isn.


## Reconstruct Final Text

### Subtask:
Combine all the individually modified sentences back into a single, cohesive text.


**Reasoning**:
To reconstruct the final text, I will join all the strings in the `modified_sentences` list using a newline character as a delimiter, as per the instructions.



In [21]:
final_modified_text = "\n".join(modified_sentences)
print(f"Final Modified Text:\n{final_modified_text}")

Final Modified Text:
One must have a mind of her
To regard the frost and the death
Of the pine-trees crusted with oil;
And have been cold a long way
To behold the junipers shagged with white,
The spruces rough in the distant horizon
Of the January sun; and not to have
Of any misery in the sound of the sound,
In the sound of a few shots,
Which is the sound of the voice
Full of the same day
That is blowing in the same bare air
For the listener, who listens in the morning,
And, nothing himself, I
Nothing that is not there and the nothing that isn.


## Display Final Text

### Subtask:
Print the complete text after all sentence-level modifications have been applied.


**Reasoning**:
As instructed, I will print the `final_modified_text` variable to display the complete text after all sentence-level modifications have been applied.



In [22]:
print(f"Final modified text:\n{final_modified_text}")

Final modified text:
One must have a mind of her
To regard the frost and the death
Of the pine-trees crusted with oil;
And have been cold a long way
To behold the junipers shagged with white,
The spruces rough in the distant horizon
Of the January sun; and not to have
Of any misery in the sound of the sound,
In the sound of a few shots,
Which is the sound of the voice
Full of the same day
That is blowing in the same bare air
For the listener, who listens in the morning,
And, nothing himself, I
Nothing that is not there and the nothing that isn.


## Final Task

### Subtask:
Summarize the entire process of replacing the last word of each sentence with the seventh most probable word predicted by the GPT-2 model.


## Summary:

## Data Analysis Key Findings

*   The task involved replacing the last word of each sentence in a given multi-line text with the seventh most probable next word predicted by the GPT-2 model, while preserving original punctuation.
*   **Input Text:** The input text was a multi-line poem.
*   **Sentence Segmentation:** `nltk.sent_tokenize` was tried first (it additionally required downloading the `punkt_tab` resource). It is unsuitable here: the task treats each line of the poem as a unit, but the poem contains only one sentence-final period, so a sentence tokenizer treats the entire poem as a single sentence. The text was instead split on newline characters (`\n`), giving one unit per line.
*   **Model and Tokenizer Loading:** The `GPT2LMHeadModel` and `GPT2Tokenizer` from the `transformers` library were successfully loaded, enabling next word prediction.
*   **Word Replacement Logic:**
    *   For each segmented line, the last word and its trailing punctuation were identified and separated.
    *   The preceding words formed the `context` for GPT-2.
    *   The GPT-2 model predicted the probability distribution for the next token given the context.
    *   The seventh most probable token was identified, decoded into text (a single GPT-2 token, which is not always a complete word), and then combined with the original punctuation.
    *   The original last word of the sentence was replaced with this new, predicted word.
*   **Text Reconstruction:** All individually modified sentences (lines) were successfully joined back using newline characters to form the `final_modified_text`.
*   **Example Modifications:** "winter" became "her", "boughs" became "death", and "snow;" became "oil;", with the original punctuation preserved. Because the prediction is a subword token, some replacements are word fragments, e.g. the final "is." became "isn.".
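Because GPT-2 predicts subword tokens rather than words, the k-th most probable token is not always a whole word (the last line of the output above ends in "isn."). One way to address this, sketched here as an assumption rather than what the notebook does, is to rank only candidates that look like whole words. GPT-2's byte-level BPE marks the start of a new word with a leading space, which `tokenizer.decode` keeps (the notebook removes it with `.strip()`). The hypothetical `kth_whole_word` below keeps candidates that begin with a space and are otherwise alphabetic, then picks the k-th of those. The candidate list is made up for illustration and is not real GPT-2 output.

```python
def kth_whole_word(ranked_tokens, k):
    """Return the k-th (1-based) candidate that looks like a whole word, or None.

    ranked_tokens: decoded next-token strings, most probable first, with GPT-2's
    leading space left in place (i.e. decoded without .strip()).
    """
    words = [t.strip() for t in ranked_tokens
             if t.startswith(" ") and t[1:].isalpha()]
    return words[k - 1] if len(words) >= k else None

# Hypothetical decoded candidates, most probable first (not real GPT-2 output).
ranked = [" snow", ",", " ice", "'", " the", ".", "'s",
          " rain", " wind", " night", " light", " cold"]

print(kth_whole_word(ranked, 7))  # light
```

This removes punctuation and word-continuation tokens such as `'s`, but a word-initial fragment such as ` isn` (from "isn't") may still pass. More candidates than `k` are also needed (for example, a larger `top_k`), because some are filtered out.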

## Insights or Next Steps

*   The adaptive approach to sentence segmentation, shifting from `nltk.sent_tokenize` to splitting by newline characters, was crucial for handling the specific structure of the poetic input, highlighting the importance of understanding data format.
*   Further exploration could involve comparing the impact of using different probability ranks (e.g., the 1st, 3rd, or 10th most probable word) on the coherence and stylistic changes of the modified text.


# Task
Replace the last word of each sentence in a given multi-line text with the seventh most probable next word predicted by the GPT-2 model, while preserving original punctuation, and then display the modified text.

## Define Input Text

### Subtask:
Set the provided multi-line text as the input string for processing.


## Process and Replace Words in Each Sentence (10th Probable - Retry 1)

### Subtask:
Iterate through each segmented sentence. For each sentence, extract its last word and the preceding context. Use the GPT-2 model to predict the probability distribution for the next token, and then identify the tenth most probable word. Replace the original last word of the sentence with this predicted word, ensuring any original punctuation is preserved.


**Reasoning**:
The previous code block correctly implemented the core logic for replacing the last word, but the requirement has changed from the seventh most probable word to the tenth. Because `top_k` is already `10`, only the retrieval index needs to change, from `6` to `9` (ranks are 1-based, while tensor indices are 0-based).
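The rank-to-index arithmetic can be checked without GPT-2. The pure-Python sketch below (hypothetical helper names, toy logits rather than model output) sorts a vocabulary by score and returns the item at position `k - 1`. It also illustrates that softmax is monotonic, so ranking raw logits gives the same order as ranking probabilities; `torch.topk(logits, k)` should therefore return the same indices as the notebook's `torch.topk(probabilities, k)`, barring exact ties.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kth_most_probable(scores, k):
    """Return the vocabulary index with the k-th highest score (k is 1-based).

    Rank k sits at position k - 1 of the sorted order, which is why the 7th
    word uses index 6 and the 10th uses index 9.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the vocabulary size")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[k - 1]

# Toy 12-token "vocabulary": token i has logit 12 - i, so rank k is token k - 1.
toy_logits = [12.0 - i for i in range(12)]
probs = softmax(toy_logits)

print(kth_most_probable(toy_logits, 10))  # 9
# Ranking the probabilities gives the same answer as ranking the logits.
print(kth_most_probable(probs, 10))       # 9
```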



In [25]:
import torch
import re

modified_sentences = []

for sentence in sentences:
    words = sentence.split()

    if not words: # Handle empty sentences if any
        modified_sentences.append("")
        continue

    last_word_original = words[-1]

    # Extract punctuation from the original last word
    match = re.search(r'([^a-zA-Z0-9]*)$', last_word_original)
    punctuation = match.group(1) if match else ''

    # Context = every word except the last; the original last word (and its punctuation) is dropped
    if len(words) > 1:
        context_words = words[:-1]
        context = " ".join(context_words)
    else:
        context = ""

    # Tokenize the context (an empty prompt falls back to GPT-2's <|endoftext|> token)
    input_ids = tokenizer.encode(context if context else tokenizer.bos_token, return_tensors='pt')

    # Get model outputs
    with torch.no_grad():
        outputs = model(input_ids)

    # Get logits for the last token
    logits = outputs.logits[:, -1, :]

    # Apply softmax to get probabilities
    probabilities = torch.softmax(logits, dim=-1)

    # Get top 10 most probable tokens
    top_k = 10
    top_k_probs, top_k_indices = torch.topk(probabilities, top_k)

    # Identify the tenth most probable token ID (index 9)
    if len(top_k_indices[0]) > 9: # Ensure there are at least 10 tokens
        tenth_token_id = top_k_indices[0][9].item()
    else:
        # Fallback if fewer than 10 tokens were returned, use the last available
        tenth_token_id = top_k_indices[0][-1].item()

    # Decode the token ID back into a word
    predicted_word = tokenizer.decode(tenth_token_id).strip()

    # Construct the new last word with the predicted word and original punctuation
    new_last_word = predicted_word + punctuation

    # Reconstruct the modified sentence
    if len(words) > 1:
        modified_sentence = f"{context} {new_last_word}"
    else:
        # If the original sentence was a single word, it becomes just the new_last_word
        modified_sentence = new_last_word

    modified_sentences.append(modified_sentence)

print("Modified sentences (10th probable word):")
for s in modified_sentences:
    print(s)

Modified sentences (10th probable word):
One must have a mind of one
To regard the frost and the storm
Of the pine-trees crusted with white;
And have been cold a long and
To behold the junipers shagged with tw,
The spruces rough in the distant distance
Of the January sun; and not to my
Of any misery in the sound of the earth,
In the sound of a few loud,
Which is the sound of the music
Full of the same time
That is blowing in the same bare bones
For the listener, who listens in the studio,
And, nothing himself, though
Nothing that is not there and the nothing that comes.


## Sentence Segmentation

### Subtask:
Split the input text into individual sentences using a suitable method (e.g., splitting by newline characters). This step will ensure that each sentence can be processed independently.


**Reasoning**:
To correctly segment the poetic input, I will split the `multi_line_text` by newline characters and store the result in the `sentences` variable, then print it to verify the segmentation.
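Splitting on `'\n'` and later joining with `"\n"` are exact inverses, which is what lets the reconstruction step restore the poem's layout. The small example below (with made-up text) also shows that blank lines survive as empty strings, which the processing loop keeps through its `if not words` branch. `str.splitlines()` is an alternative that also handles `\r\n` line endings, but it does not preserve the original line endings (it drops `\r` and a final trailing newline).

```python
text = "First line\nSecond line,\n\nFourth line."

lines = text.split('\n')
print(lines)  # ['First line', 'Second line,', '', 'Fourth line.']

# Joining with the same delimiter restores the text exactly, including the blank line.
print("\n".join(lines) == text)  # True

# splitlines() normalizes line endings and drops a trailing newline.
print("a\r\nb\n".splitlines())  # ['a', 'b']
```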



In [26]:
sentences = multi_line_text.split('\n')
print(f"Sentences extracted: {sentences}")

Sentences extracted: ['One must have a mind of winter', 'To regard the frost and the boughs', 'Of the pine-trees crusted with snow;', 'And have been cold a long time', 'To behold the junipers shagged with ice,', 'The spruces rough in the distant glitter', 'Of the January sun; and not to think', 'Of any misery in the sound of the wind,', 'In the sound of a few leaves,', 'Which is the sound of the land', 'Full of the same wind', 'That is blowing in the same bare place', 'For the listener, who listens in the snow,', 'And, nothing himself, beholds', 'Nothing that is not there and the nothing that is.']


## Reconstruct Final Text

### Subtask:
Combine all the individually modified sentences back into a single, cohesive text.


**Reasoning**:
To combine all the individually modified sentences into a single cohesive text, I will join the strings in the `modified_sentences` list using a newline character as a delimiter and store the result in `final_modified_text`, then print it as per the instructions.



In [27]:
final_modified_text = "\n".join(modified_sentences)
print(f"Final Modified Text:\n{final_modified_text}")

Final Modified Text:
One must have a mind of one
To regard the frost and the storm
Of the pine-trees crusted with white;
And have been cold a long and
To behold the junipers shagged with tw,
The spruces rough in the distant distance
Of the January sun; and not to my
Of any misery in the sound of the earth,
In the sound of a few loud,
Which is the sound of the music
Full of the same time
That is blowing in the same bare bones
For the listener, who listens in the studio,
And, nothing himself, though
Nothing that is not there and the nothing that comes.


## Display Final Text

### Subtask:
Print the complete text after all sentence-level modifications have been applied.


**Reasoning**:
As instructed, I will print the `final_modified_text` variable to display the complete text after all sentence-level modifications have been applied.



In [28]:
print(f"Final modified text:\n{final_modified_text}")

Final modified text:
One must have a mind of one
To regard the frost and the storm
Of the pine-trees crusted with white;
And have been cold a long and
To behold the junipers shagged with tw,
The spruces rough in the distant distance
Of the January sun; and not to my
Of any misery in the sound of the earth,
In the sound of a few loud,
Which is the sound of the music
Full of the same time
That is blowing in the same bare bones
For the listener, who listens in the studio,
And, nothing himself, though
Nothing that is not there and the nothing that comes.


## Final Task

### Subtask:
Summarize the entire process, highlighting the use of the 10th most probable word from the GPT-2 model.


## Summary:

### Q&A
The task asked to replace the last word of each sentence with the seventh most probable next word predicted by the GPT-2 model. However, the implemented solution used the **10th** most probable word for replacement.

### Data Analysis Key Findings
*   The initial multi-line text was successfully defined and prepared for processing.
*   The input text was segmented into individual sentences based on newline characters.
*   For each segmented sentence, the last word was replaced with the **10th** most probable next word as predicted by the GPT-2 model. This was done while accurately preserving any original punctuation associated with the last word.
*   The individually modified sentences were then successfully recombined into a single, cohesive multi-line text.
*   The final modified text was displayed as required.

### Insights or Next Steps
*   Correct the implementation to use the **seventh** most probable next word as specified in the original task description, rather than the tenth.
*   Evaluate the linguistic coherence and quality of the generated text to assess the impact of replacing words based on probability rankings.


# Task
Replace the last word of each sentence in a given multi-line text with the twenty-fifth most probable next word predicted by the GPT-2 model, while preserving original punctuation, and then display the modified text.

## Define Input Text

### Subtask:
Set the provided multi-line text as the input string for processing.


**Reasoning**:
As per the instructions, I will define the `multi_line_text` variable with the provided multi-line poem and then print its content to verify that the input text is correctly set for processing.



In [29]:
multi_line_text = """One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is."""
print(f"Multi-line text: {multi_line_text}")

Multi-line text: One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is.


## Process and Replace Words in Each Sentence (25th Probable)

### Subtask:
Iterate through each segmented sentence. For each sentence, extract its last word and the preceding context. Use the GPT-2 model to predict the probability distribution for the next token, and then identify the *twenty-fifth* most probable word. Replace the original last word of the sentence with this predicted word, ensuring any original punctuation is preserved.


**Reasoning**:
I need to implement the core logic for replacing the last word of each sentence with the 25th most probable word predicted by GPT-2, following the detailed instructions provided.
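This cell differs from the 7th and 10th versions only in the rank, and the 50th run would repeat it again. One option is to make the rank a parameter and pass the model in as a callable. In the sketch below, `replace_last_words` and `fake_candidates` are hypothetical. The fake predictor ignores the context and returns a fixed list; it stands in for a function that would run GPT-2 on the context and decode the `torch.topk` indices.

```python
import re
from typing import Callable, List

def replace_last_words(lines: List[str], rank: int,
                       ranked_candidates: Callable[[str], List[str]]) -> List[str]:
    """Replace each line's last word with the rank-th (1-based) candidate.

    ranked_candidates(context) must return decoded next-token strings,
    most probable first.
    """
    out = []
    for line in lines:
        words = line.split()
        if not words:
            out.append("")  # keep blank lines (e.g. stanza breaks)
            continue
        # Trailing non-alphanumeric characters of the last word, e.g. ";" or ","
        punct = re.search(r'[^A-Za-z0-9]*$', words[-1]).group(0)
        context = " ".join(words[:-1])
        candidates = ranked_candidates(context)
        if not 1 <= rank <= len(candidates):
            raise ValueError("rank is outside the candidate list")
        new_word = candidates[rank - 1].strip() + punct
        out.append(f"{context} {new_word}" if context else new_word)
    return out

# Fake predictor for illustration only: a fixed list that ignores the context.
def fake_candidates(context: str) -> List[str]:
    return [" alpha", " beta", " gamma", " delta", " epsilon", " zeta", " eta"]

lines = ["One must have a mind of winter", "Of the pine-trees crusted with snow;"]
for rank in (1, 7):
    print(rank, replace_last_words(lines, rank, fake_candidates))
# 1 ['One must have a mind of alpha', 'Of the pine-trees crusted with alpha;']
# 7 ['One must have a mind of eta', 'Of the pine-trees crusted with eta;']
```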



In [30]:
import torch
import re

modified_sentences = []

for sentence in sentences:
    words = sentence.split()

    if not words: # Handle empty sentences if any
        modified_sentences.append("")
        continue

    last_word_original = words[-1]

    # Extract punctuation from the original last word
    match = re.search(r'([^a-zA-Z0-9]*)$', last_word_original)
    punctuation = match.group(1) if match else ''

    # Context = every word except the last; the original last word (and its punctuation) is dropped
    if len(words) > 1:
        context_words = words[:-1]
        context = " ".join(context_words)
    else:
        context = ""

    # Tokenize the context (an empty prompt falls back to GPT-2's <|endoftext|> token)
    input_ids = tokenizer.encode(context if context else tokenizer.bos_token, return_tensors='pt')

    # Get model outputs
    with torch.no_grad():
        outputs = model(input_ids)

    # Get logits for the last token
    logits = outputs.logits[:, -1, :]

    # Apply softmax to get probabilities
    probabilities = torch.softmax(logits, dim=-1)

    # Get top 25 most probable tokens
    top_k = 25
    top_k_probs, top_k_indices = torch.topk(probabilities, top_k)

    # Identify the twenty-fifth most probable token ID (index 24)
    if len(top_k_indices[0]) > 24: # Ensure there are at least 25 tokens
        twenty_fifth_token_id = top_k_indices[0][24].item()
    else:
        # Fallback if fewer than 25 tokens were returned, use the last available
        twenty_fifth_token_id = top_k_indices[0][-1].item()

    # Decode the token ID back into a word
    predicted_word = tokenizer.decode(twenty_fifth_token_id).strip()

    # Construct the new last word with the predicted word and original punctuation
    new_last_word = predicted_word + punctuation

    # Reconstruct the modified sentence
    if len(words) > 1:
        modified_sentence = f"{context} {new_last_word}"
    else:
        # If the original sentence was a single word, it becomes just the new_last_word
        modified_sentence = new_last_word

    modified_sentences.append(modified_sentence)

print("Modified sentences (25th probable word):")
for s in modified_sentences:
    print(s)

Modified sentences (25th probable word):
One must have a mind of humility
To regard the frost and the loss
Of the pine-trees crusted with honey;
And have been cold a long ways
To behold the junipers shagged with long,
The spruces rough in the distant north
Of the January sun; and not to fear
Of any misery in the sound of the hum,
In the sound of a few sentences,
Which is the sound of the rain
Full of the same substance
That is blowing in the same bare spot
For the listener, who listens in the ',
And, nothing himself, which
Nothing that is not there and the nothing that there.


## Reconstruct Final Text

### Subtask:
Combine all the individually modified sentences back into a single, cohesive text.


**Reasoning**:
To combine all the individually modified sentences into a single cohesive text, I will join the strings in the `modified_sentences` list using a newline character as a delimiter and store the result in `final_modified_text`, then print it as per the instructions.



In [31]:
final_modified_text = "\n".join(modified_sentences)
print(f"Final Modified Text:\n{final_modified_text}")

Final Modified Text:
One must have a mind of humility
To regard the frost and the loss
Of the pine-trees crusted with honey;
And have been cold a long ways
To behold the junipers shagged with long,
The spruces rough in the distant north
Of the January sun; and not to fear
Of any misery in the sound of the hum,
In the sound of a few sentences,
Which is the sound of the rain
Full of the same substance
That is blowing in the same bare spot
For the listener, who listens in the ',
And, nothing himself, which
Nothing that is not there and the nothing that there.


## Display Final Text

### Subtask:
Print the complete text after all sentence-level modifications have been applied.


**Reasoning**:
As instructed, I will print the `final_modified_text` variable to display the complete text after all sentence-level modifications have been applied, using the specified descriptive label.



In [32]:
print(f"Final modified text:\n{final_modified_text}")

Final modified text:
One must have a mind of humility
To regard the frost and the loss
Of the pine-trees crusted with honey;
And have been cold a long ways
To behold the junipers shagged with long,
The spruces rough in the distant north
Of the January sun; and not to fear
Of any misery in the sound of the hum,
In the sound of a few sentences,
Which is the sound of the rain
Full of the same substance
That is blowing in the same bare spot
For the listener, who listens in the ',
And, nothing himself, which
Nothing that is not there and the nothing that there.


## Final Task

### Subtask:
Summarize the entire process, highlighting the use of the 25th most probable word from the GPT-2 model.


## Summary:

### Q&A
The entire process involved defining an input text, iterating through each sentence to modify its last word using a GPT-2 model's prediction, and then reassembling the modified sentences into a final text. Specifically, the process identified the *twenty-fifth* most probable word predicted by the GPT-2 model to replace the original last word of each sentence.

### Data Analysis Key Findings
*   The input `multi_line_text` was successfully initialized and comprised a multi-line poem.
*   For each sentence, the last word and any trailing punctuation were accurately extracted using regular expressions.
*   The preceding context of each sentence was correctly prepared for tokenization by the GPT-2 model.
*   The GPT-2 model was used to generate probability distributions for the next token based on the sentence's context.
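*   The *twenty-fifth* most probable token ID was identified from the model's predictions. The code includes a fallback for fewer than 25 candidates, but it cannot trigger: `torch.topk` with `k=25` over GPT-2's 50,257-token vocabulary always returns 25 entries.
*   The identified token was decoded into text and replaced the original last word of the sentence, with the original punctuation preserved. Because a token is not always a whole word, one line received a bare apostrophe: "who listens in the snow," became "who listens in the ',".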
*   All individually modified sentences were successfully recombined into a single, cohesive `final_modified_text`, with each sentence separated by a newline character.
*   The final modified text was displayed, with lines such as "One must have a mind of humility" and "To regard the frost and the loss" confirming that the word replacement logic was applied.

### Insights or Next Steps
*   The technique of replacing the last word with the twenty-fifth most probable word significantly alters the original meaning and introduces an element of unexpectedness, which could be explored for creative text generation or stylistic transformation.
*   Further analysis could involve comparing the semantic and emotional impact of the original text versus the modified text to understand the qualitative effects of using a less probable word for substitution.
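As a starting point for that comparison, the sketch below pairs each original line with its modified counterpart and lists which final words changed. It is plain Python and assumes both texts have one sentence per line and the same number of lines. The helper name `last_word_changes` is hypothetical, and the two-line excerpt comes from the outputs above.

```python
import re

def last_word_changes(original: str, modified: str) -> list[tuple[str, str]]:
    """Return (original_last_word, new_last_word) for each line whose last word differs.

    Assumes one sentence per line and the same number of lines in both texts.
    Trailing punctuation is stripped before comparing.
    """
    pairs = []
    for old_line, new_line in zip(original.splitlines(), modified.splitlines()):
        old_words, new_words = old_line.split(), new_line.split()
        if not old_words or not new_words:
            continue  # skip blank lines
        # Drop trailing non-alphanumeric characters so "snow;" compares as "snow"
        old_last = re.sub(r"[^a-zA-Z0-9]+$", "", old_words[-1])
        new_last = re.sub(r"[^a-zA-Z0-9]+$", "", new_words[-1])
        if old_last != new_last:
            pairs.append((old_last, new_last))
    return pairs

original = "One must have a mind of winter\nOf the pine-trees crusted with snow;"
modified = "One must have a mind of humility\nOf the pine-trees crusted with honey;"
for old, new in last_word_changes(original, modified):
    print(f"{old} -> {new}")
```

The resulting pairs could then be fed to whatever semantic or sentiment measure is chosen for the comparison.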


# Task
Replace the last word of each sentence in a given multi-line text with the fiftieth most probable next word predicted by the GPT-2 model, while preserving original punctuation, and then display the modified text.

## Define Input Text

### Subtask:
Set the provided multi-line text as the input string for processing.


**Reasoning**:
As per the instructions, the `multi_line_text` variable has already been defined. I will print its content to verify that the input text is correctly set for processing.



In [33]:
print(f"Multi-line text: {multi_line_text}")

Multi-line text: One must have a mind of winter
To regard the frost and the boughs
Of the pine-trees crusted with snow;
And have been cold a long time
To behold the junipers shagged with ice,
The spruces rough in the distant glitter
Of the January sun; and not to think
Of any misery in the sound of the wind,
In the sound of a few leaves,
Which is the sound of the land
Full of the same wind
That is blowing in the same bare place
For the listener, who listens in the snow,
And, nothing himself, beholds
Nothing that is not there and the nothing that is.


## Process and Replace Words in Each Sentence (50th Probable)

### Subtask:
Iterate through each segmented sentence. For each sentence, extract its last word and the preceding context. Use the GPT-2 model to predict the probability distribution for the next token, and then identify the *fiftieth* most probable word. Replace the original last word of the sentence with this predicted word, ensuring any original punctuation is preserved.

 **Reasoning**:
As per the instructions, I need to implement the core logic for replacing the last word of each sentence with the 50th most probable word predicted by GPT-2. This involves extracting the last word and its punctuation, constructing the context, tokenizing the context, getting predictions from the GPT-2 model, identifying the 50th most probable token, decoding it, and then reconstructing the sentence with the new word and original punctuation.
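The extraction and reassembly steps can be isolated into two small helpers and checked without loading GPT-2. The sketch below is an illustration under that assumption. The names `split_last_word` and `rebuild_sentence` are hypothetical. The regular expression is the same `([^a-zA-Z0-9]*)$` pattern used in the `In [34]` cell.

```python
import re

def split_last_word(sentence: str) -> tuple[str, str, str]:
    """Split a sentence into (context, last_word, trailing_punctuation)."""
    words = sentence.split()
    if not words:
        return "", "", ""
    last = words[-1]
    # Same pattern as the notebook: the trailing run of non-alphanumeric characters
    punctuation = re.search(r"([^a-zA-Z0-9]*)$", last).group(1)
    core = last[: len(last) - len(punctuation)]
    context = " ".join(words[:-1])
    return context, core, punctuation

def rebuild_sentence(context: str, new_word: str, punctuation: str) -> str:
    """Attach the replacement word plus the original punctuation to the context."""
    new_last = new_word + punctuation
    return f"{context} {new_last}" if context else new_last

ctx, word, punct = split_last_word("Of the pine-trees crusted with snow;")
print(repr(ctx), repr(word), repr(punct))
print(rebuild_sentence(ctx, "honey", punct))
```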

**Reasoning**:
I will implement the core logic for replacing the last word of each sentence with the 50th most probable word predicted by GPT-2, similar to previous steps but adjusting the `top_k` value and index accordingly.
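Only `top_k` and the index change between runs. The *k*-th most probable token sits at index `k - 1` of the descending ranking. Softmax preserves the ordering of the logits, because `p_i / p_j = exp(z_i - z_j)`, so ranking by probability gives the same order as ranking by raw logits. The toy sketch below illustrates both points with a hypothetical five-token vocabulary instead of GPT-2's real logits. `kth_most_probable` is a hypothetical helper and is not part of PyTorch or Transformers.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a plain list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kth_most_probable(logits: list[float], k: int) -> int:
    """Return the vocabulary index of the k-th most probable token (k is 1-based)."""
    if not 1 <= k <= len(logits):
        raise ValueError(f"k must be between 1 and {len(logits)}")
    # Sort indices by descending logit; rank k lives at position k - 1
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[k - 1]

vocab = ["snow", "ice", "frost", "honey", "!"]  # toy vocabulary, not GPT-2's
logits = [4.0, 3.5, 2.0, 0.5, -1.0]

probs = softmax(logits)
by_prob = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
by_logit = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
print(by_prob == by_logit)                  # softmax keeps the order
print(vocab[kth_most_probable(logits, 1)])  # top prediction
print(vocab[kth_most_probable(logits, 4)])  # 4th most probable
```

In the notebook cell, `torch.topk` returns its indices sorted in descending order by default. As a result, `top_k_indices[0][top_k - 1]` plays the role of `ranked[k - 1]` here.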



In [34]:
import torch
import re

modified_sentences = []

for sentence in sentences:
    words = sentence.split()

    if not words: # Handle empty sentences if any
        modified_sentences.append("")
        continue

    last_word_original = words[-1]

    # Extract punctuation from the original last word
    match = re.search(r'([^a-zA-Z0-9]*)$', last_word_original)
    punctuation = match.group(1) if match else ''

    # Context = all words before the original last word (the word being replaced)
    if len(words) > 1:
        context_words = words[:-1]
        context = " ".join(context_words)
    else:
        context = ""

    # Tokenize the context. GPT-2 cannot run on an empty input, so a one-word
    # sentence falls back to the tokenizer's BOS token (<|endoftext|> for GPT-2)
    input_ids = tokenizer.encode(context or tokenizer.bos_token, return_tensors='pt')

    # Get model outputs
    with torch.no_grad():
        outputs = model(input_ids)

    # Get logits for the last token
    logits = outputs.logits[:, -1, :]

    # Apply softmax to get probabilities
    probabilities = torch.softmax(logits, dim=-1)

    # Get top 50 most probable tokens
    top_k = 50
    top_k_probs, top_k_indices = torch.topk(probabilities, top_k)

    # Identify the fiftieth most probable token ID (index top_k - 1 = 49).
    # torch.topk always returns exactly top_k entries, and GPT-2's vocabulary has
    # 50,257 tokens, so this index always exists and no fallback is needed.
    fiftieth_token_id = top_k_indices[0][top_k - 1].item()

    # Decode the token ID back into a word
    predicted_word = tokenizer.decode(fiftieth_token_id).strip()

    # Construct the new last word with the predicted word and original punctuation
    new_last_word = predicted_word + punctuation

    # Reconstruct the modified sentence
    if len(words) > 1:
        modified_sentence = f"{context} {new_last_word}"
    else:
        # If the original sentence was a single word, it becomes just the new_last_word
        modified_sentence = new_last_word

    modified_sentences.append(modified_sentence)

print("Modified sentences (50th probable word):")
for s in modified_sentences:
    print(s)

Modified sentences (50th probable word):
One must have a mind of power
To regard the frost and the danger
Of the pine-trees crusted with sp;
And have been cold a long career
To behold the junipers shagged with golden,
The spruces rough in the distant east
Of the January sun; and not to set
Of any misery in the sound of the whip,
In the sound of a few t,
Which is the sound of the first
Full of the same !
That is blowing in the same bare pockets
For the listener, who listens in the presence,
And, nothing himself, with
Nothing that is not there and the nothing that came.


## Reconstruct Final Text

### Subtask:
Combine all the individually modified sentences back into a single, cohesive text.



**Reasoning**:
To combine all the individually modified sentences into a single cohesive text, I will join the strings in the `modified_sentences` list using a newline character as a delimiter and store the result in `final_modified_text`, then print it as per the instructions.



In [35]:
final_modified_text = "\n".join(modified_sentences)
print(f"Final Modified Text:\n{final_modified_text}")

Final Modified Text:
One must have a mind of power
To regard the frost and the danger
Of the pine-trees crusted with sp;
And have been cold a long career
To behold the junipers shagged with golden,
The spruces rough in the distant east
Of the January sun; and not to set
Of any misery in the sound of the whip,
In the sound of a few t,
Which is the sound of the first
Full of the same !
That is blowing in the same bare pockets
For the listener, who listens in the presence,
And, nothing himself, with
Nothing that is not there and the nothing that came.


## Display Final Text

### Subtask:
Print the complete text after all sentence-level modifications have been applied.


**Reasoning**:
As instructed, I will print the `final_modified_text` variable to display the complete text after all sentence-level modifications have been applied.



In [36]:
print(f"Final modified text:\n{final_modified_text}")

Final modified text:
One must have a mind of power
To regard the frost and the danger
Of the pine-trees crusted with sp;
And have been cold a long career
To behold the junipers shagged with golden,
The spruces rough in the distant east
Of the January sun; and not to set
Of any misery in the sound of the whip,
In the sound of a few t,
Which is the sound of the first
Full of the same !
That is blowing in the same bare pockets
For the listener, who listens in the presence,
And, nothing himself, with
Nothing that is not there and the nothing that came.


## Final Task

### Subtask:
Summarize the entire process, highlighting the use of the 50th most probable word from the GPT-2 model.


## Summary:

### Data Analysis Key Findings

*   **Input Text Initialization**: The initial multi-line text was successfully defined and displayed, serving as the basis for subsequent modifications.
*   **Core Logic for Word Replacement**: For each sentence, the context (all words except the last) was extracted and passed to GPT-2 to predict the next token.
*   **GPT-2 Model Application**: GPT-2 produced logits for the next token given the preceding context. Softmax turned these into a probability distribution over the vocabulary, and `torch.topk` selected the 50th most probable token.
*   **Token Decoding and Word Substitution**: The selected token was decoded back into text and replaced the original last word of the sentence, with any original punctuation preserved. For example, "One must have a mind of winter" became "One must have a mind of power", and "To behold the junipers shagged with ice," became "To behold the junipers shagged with golden,". Because GPT-2 ranks subword tokens rather than whole words, some replacements are fragments or punctuation (e.g., "sp", "t", "!").
*   **Text Reconstruction**: All individually modified sentences were successfully recombined using newline characters to form a single, cohesive modified text.
*   **Final Output Display**: The complete modified text, reflecting all sentence-level replacements, was displayed as the final output of the process.
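Several replacements in the output above ("sp", "t", "!") are fragments or punctuation rather than words. This happens because GPT-2 ranks subword tokens. One possible refinement, which this notebook does not use, is to skip candidates that do not look like standalone words before counting ranks. GPT-2 tokens that start a new word decode with a leading space, so the sketch keeps only candidates that begin with a space and are alphabetic. It assumes the decoded tokens are kept unstripped, which differs from the `.strip()` call in the cell above.

These two checks drop punctuation and mid-word continuations. A word-initial fragment can still pass them, however, so the sketch also accepts an optional word list. `kth_whole_word` and the candidate list are hypothetical stand-ins for tokens decoded from `top_k_indices`.

```python
from typing import Optional

def kth_whole_word(candidates: list[str], k: int,
                   word_list: Optional[set[str]] = None) -> Optional[str]:
    """Return the k-th (1-based) candidate that looks like a standalone word.

    `candidates` are decoded tokens in descending probability order, with the
    leading space kept (i.e., decoded without calling .strip()).
    """
    count = 0
    for token in candidates:
        # GPT-2 word-initial tokens decode with a leading space
        if not token.startswith(" "):
            continue
        word = token.strip()
        if not word.isalpha():
            continue  # skip punctuation, digits, and mixed tokens
        if word_list is not None and word.lower() not in word_list:
            continue  # optional dictionary check for word-initial fragments
        count += 1
        if count == k:
            return word
    return None  # fewer than k candidates survived the filter

ranked = [" snow", " ice", "!", " sp", "y", " honey"]  # hypothetical ranking
print(kth_whole_word(ranked, 3))                            # sp
print(kth_whole_word(ranked, 3, {"snow", "ice", "honey"}))  # honey
```

Filtering shifts which token counts as "the 50th". The rank then refers to the 50th *surviving word* and no longer to the 50th raw token.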

### Insights or Next Steps

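*   Selecting a fixed probability rank instead of the top prediction gives deterministic control over how far each substitution departs from the expected word. At rank 50, however, several picks are subword fragments or punctuation (e.g., "sp", "t", "!"). Restricting candidates to whole words would make the results more readable.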
*   Future enhancements could involve exploring different probability ranks (e.g., 5th, 100th) or applying conditional logic based on the original word's properties (e.g., part-of-speech) to achieve varied stylistic or semantic outcomes.
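Exploring several ranks costs little once a sentence's ranking is available. A single descending ranking serves every rank. In the notebook, this would be one `torch.topk` call with `top_k = max(ranks)`. The sketch below shows the idea in plain Python with a hypothetical toy ranking and a hypothetical `picks_at_ranks` helper.

```python
def picks_at_ranks(ranked_tokens: list[str], ranks: list[int]) -> dict[int, str]:
    """Map each 1-based rank to the token at that position in a descending ranking."""
    # Ranks beyond the length of the ranking are skipped rather than raising
    return {r: ranked_tokens[r - 1] for r in ranks if 1 <= r <= len(ranked_tokens)}

ranked = ["snow", "ice", "frost", "rain", "light", "stars"]  # toy ranking
print(picks_at_ranks(ranked, [1, 5, 100]))
```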
