In [1]:
!pip install transformers
!pip install datasets



In [2]:
file_name = "martineden.txt"
with open(file_name, 'r', encoding='utf-8-sig') as file:  # utf-8-sig drops the leading BOM
    martineden = file.read()

print(martineden)

Martin Eden

by Jack London

“Let me live out my years in heat of blood!
    Let me lie drunken with the dreamer’s wine!
Let me not see this soul-house built of mud
    Go toppling to the dust a vacant shrine!”




CHAPTER I.


The one opened the door with a latch-key and went in, followed by a
young fellow who awkwardly removed his cap. He wore rough clothes that
smacked of the sea, and he was manifestly out of place in the spacious
hall in which he found himself. He did not know what to do with his
cap, and was stuffing it into his coat pocket when the other took it
from him. The act was done quietly and naturally, and the awkward young
fellow appreciated it. “He understands,” was his thought. “He’ll see me
through all right.”

He walked at the other’s heels with a swing to his shoulders, and his
legs spread unwittingly, as if the level floors were tilting up and
sinking down to the heave and lunge of the sea. The wide rooms seemed
too narrow for his rolling gait, and to himself he 
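
The training cell that follows reads /content/martineden_train_data.txt and /content/martineden_test_data.txt, but none of the cells shown here create them. Below is a minimal sketch of one way to produce such files from the text loaded above. It assumes a plain 90/10 split by character position, moved forward to the next paragraph break; the helper names `split_text` and `write_splits` and the 0.9 ratio are hypothetical, not taken from the notebook.

```python
from pathlib import Path


def split_text(text, train_fraction=0.9):
    """Split text into (train, test) at the first paragraph break at or after the cut."""
    cut = int(len(text) * train_fraction)
    # Move the cut forward to the next blank line so no paragraph is split in two.
    boundary = text.find("\n\n", cut)
    if boundary == -1:
        # No paragraph break after the cut point: fall back to a hard cut.
        boundary = cut
    return text[:boundary], text[boundary:].lstrip("\n")


def write_splits(source_path, train_path, test_path, train_fraction=0.9):
    """Read source_path and write the two parts to train_path and test_path."""
    text = Path(source_path).read_text(encoding="utf-8-sig")
    train_text, test_text = split_text(text, train_fraction)
    Path(train_path).write_text(train_text, encoding="utf-8")
    Path(test_path).write_text(test_text, encoding="utf-8")
    return len(train_text), len(test_text)
```

In the notebook this could be called as `write_splits("martineden.txt", "/content/martineden_train_data.txt", "/content/martineden_test_data.txt")`.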

In [3]:
import torch
from transformers import GPT2LMHeadModel, GPT2Config, GPT2Tokenizer
from transformers import TextDataset, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments

# Load the GPT-2 model and tokenizer
model_name = "gpt2"
config = GPT2Config.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

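# Define the dataset for training
# NOTE: the two file names look swapped: train_file points at the *_test_data.txt
# split and test_file at *_train_data.txt. The results below were logged this way.
train_file = "/content/martineden_test_data.txt"
test_file = "/content/martineden_train_data.txt"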

def load_dataset(train_file, test_file, tokenizer):
    train_dataset = TextDataset(tokenizer=tokenizer, file_path=train_file, block_size=128)
    test_dataset = TextDataset(tokenizer=tokenizer, file_path=test_file, block_size=128)
    return train_dataset, test_dataset

train_dataset, test_dataset = load_dataset(train_file, test_file, tokenizer)

# Define the data collator
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Set up the training arguments
training_args = TrainingArguments(
    output_dir="output",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    save_steps=10_000,
    save_total_limit=2,
    evaluation_strategy="epoch",
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()

# Save the fine-tuned model
#trainer.save_model("fine_tuned_gpt2")



Epoch  Training Loss  Validation Loss
1      No log         3.643387
2      No log         3.584697
3      No log         3.570152


TrainOutput(global_step=42, training_loss=3.5574598766508556, metrics={'train_runtime': 60.5864, 'train_samples_per_second': 5.447, 'train_steps_per_second': 0.693, 'total_flos': 21556592640000.0, 'train_loss': 3.5574598766508556, 'epoch': 3.0})
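
The numbers in this TrainOutput can be cross-checked with a little arithmetic. Multiplying the reported throughput by the runtime and dividing by the number of epochs recovers the size of the training set in 128-token blocks, and from that the step count follows. The sketch below assumes a single device and no gradient accumulation (the Trainer defaults); the function names are hypothetical.

```python
import math


def samples_per_epoch(samples_per_second, runtime_seconds, epochs):
    """Recover the training-set size from the throughput the Trainer reports."""
    return round(samples_per_second * runtime_seconds / epochs)


def total_steps(num_samples, batch_size, epochs):
    """Optimizer steps for a single device with no gradient accumulation."""
    return math.ceil(num_samples / batch_size) * epochs


# Figures copied from the TrainOutput above.
n = samples_per_epoch(5.447, 60.5864, 3)
steps = total_steps(n, batch_size=8, epochs=3)
print(n, steps)  # prints: 110 42
```

The result agrees with global_step=42. It also suggests why the Training Loss column reads "No log": as far as I know, the Trainer only records the training loss every `logging_steps` steps, which defaults to 500 in recent transformers releases, far more than the 42 steps of this run. Passing a smaller value to TrainingArguments (for example `logging_steps=5`, an illustrative choice) should fill the column in.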

In [7]:
trainer.save_model("fine_tuned_gpt2_martineden")

tokenizer.save_pretrained("fine_tuned_gpt2_martineden")

('fine_tuned_gpt2_martineden/tokenizer_config.json',
 'fine_tuned_gpt2_martineden/special_tokens_map.json',
 'fine_tuned_gpt2_martineden/vocab.json',
 'fine_tuned_gpt2_martineden/merges.txt',
 'fine_tuned_gpt2_martineden/added_tokens.json')

In [20]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the fine-tuned GPT-2 model and tokenizer
model_name = "fine_tuned_gpt2_martineden"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Function to generate responses
def generate_response(prompt_text, model, tokenizer, max_length=200, num_return_sequences=1):
    input_ids = tokenizer.encode(prompt_text, return_tensors="pt")

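    # Generate response.
    # Note: temperature and top_p only take effect when do_sample=True; without it
    # generate() decodes greedily, so a given prompt always yields the same text.
    # Passing input_ids alone (no attention_mask, no pad_token_id) is what triggers
    # the warnings printed with the generated text.
    output_sequences = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        no_repeat_ngram_size=2,
        temperature=0.5,
        top_p=0.9,
    )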

    # Decode the generated responses
    responses = []
    for response_id in output_sequences:
        response = tokenizer.decode(response_id, skip_special_tokens=True)
        responses.append(response)

    return responses
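
The generation cells in this notebook print two warnings because generate is called with input_ids alone. Below is a minimal sketch of a variant that avoids them, assuming the same kind of model and tokenizer objects as above; `generate_with_mask` is a hypothetical name, not part of the notebook. It passes the tokenizer's attention_mask, sets pad_token_id to the EOS id, and forwards temperature and top_p only when sampling is requested, since those settings have no effect on greedy decoding.

```python
def generate_with_mask(prompt_text, model, tokenizer, max_length=200, do_sample=False):
    """Generate text while passing an explicit attention mask and pad token id."""
    # Calling the tokenizer returns both input_ids and attention_mask.
    encoded = tokenizer(prompt_text, return_tensors="pt")

    gen_kwargs = dict(
        input_ids=encoded["input_ids"],
        attention_mask=encoded["attention_mask"],
        max_length=max_length,
        no_repeat_ngram_size=2,
        # GPT-2 has no pad token, so reuse the end-of-sequence id.
        pad_token_id=tokenizer.eos_token_id,
        do_sample=do_sample,
    )
    if do_sample:
        # Sampling settings are only meaningful when do_sample=True.
        gen_kwargs.update(temperature=0.5, top_p=0.9)

    output_sequences = model.generate(**gen_kwargs)
    return [tokenizer.decode(ids, skip_special_tokens=True) for ids in output_sequences]
```

With `do_sample=False` the decoding stays greedy, as in the recorded runs; with `do_sample=True` the output varies from call to call.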

In [21]:
# Test the model with a prompt
prompt_text = "Martin Eden talk about bourgeois"
responses = generate_response(prompt_text, model, tokenizer)

for response in responses:
    print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Martin Eden talk about bourgeois life, and the bourgeois class, in the first place. Eden, who was a bourgeois, was not a proletarian. He was an anarchist.

He was, however, a revolutionary. In the words of the anarchist, "I am not an Anarchist, I am a Socialist."
The anarchist was the revolutionary, the one who had been the most active in organizing the working class. The anarchist had become the leader of a class that had taken over the world. It was no longer a working-class movement. But it was now a movement of people who were working for the sake of their own lives. They were the workers, not the capitalists. And they were not working to get a better life. Their lives were being taken away from them by the capitalist class and its agents. This was their life and they had no right to live it. That was why they did not want to work. There was nothing they could do. All they wanted was


In [22]:
# Test the model with a prompt
prompt_text = "Why didn't people help Martin Eden?"
responses = generate_response(prompt_text, model, tokenizer)

for response in responses:
    print(response)



Why didn't people help Martin Eden?

He was a man of the people. He was the man who had been the first to come to the aid of his fellow-citizens.
And he was not a stranger to them. They had come for him. And he had not come because he wanted to help them, because they were his friends. It was because of them that he felt that they had taken him away. The people who were with him had done nothing to him, and he knew that. But he did not want to be seen with them again. Martin was in the midst of a great deal of trouble. His life was at stake. There was no way he could live. If he were to die, he would have to live with the rest of us. That was what he needed. In the end, Martin had to go. When he came back, the world was gone. All that was left was to see Martin. Then he went back to his own


In [23]:
# Test the model with a prompt
prompt_text = "Ruth talk about Martin Eden"
responses = generate_response(prompt_text, model, tokenizer)

for response in responses:
    print(response)



Ruth talk about Martin Eden, and he was a bit of a jerk.

He was not a good enough man to be a man. He was too much of an idiot to have been a great man,
and he had been too good a fool to know that he could be anything but a
man. It was the same with him. The man was so much a coward, so stupid, that
he could not even stand up to the world. And yet he did not know how to stand
up to it. Martin was an old man who had lost his mind. His mind was
troubled by the thought of the idea of being a hero. But he knew that it was wrong. That he
was a bad man and that the way he acted was to make himself a better man
by doing what he thought was right. In the end, he would have to live
in a world of his own. If he were a poor man he might


In [25]:
# Test the model with a prompt
prompt_text = "Martin Eden talk about Ruth"
responses = generate_response(prompt_text, model, tokenizer)

for response in responses:
    print(response)



Martin Eden talk about Ruth, and how she was a good girl.

She was the first woman to marry a man. She was not a woman of the world. But she had been a girl, a child, an old woman, who had lived in the shadow of her father. And she knew that she could not live without him. He was dead. Ruth was gone. The world was lost. It was forgotten. There was no one to look for. No one. Nothing. All that was left was to live. That was it. Life was all that mattered. Everything was for nothing. In the end, life was just a dream. A dream of a life that would last forever. When life is gone, it will be gone forever, forever and ever. If life were gone and life had not been, Ruth would have been dead, too. Now, she would be dead and gone again. Then, in a moment of despair, the dream would come true.


In [27]:
import torch
from transformers import GPT2LMHeadModel, GPT2Config, GPT2Tokenizer
from transformers import TextDataset, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments

# Reload the pretrained GPT-2 model and tokenizer so the second run (batch size 16) starts from the base weights
model_name = "gpt2"
config = GPT2Config.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Define the dataset for training
# NOTE: as in the first run, the train/test file names look swapped.
train_file = "/content/martineden_test_data.txt"
test_file = "/content/martineden_train_data.txt"

def load_dataset(train_file, test_file, tokenizer):
    train_dataset = TextDataset(tokenizer=tokenizer, file_path=train_file, block_size=128)
    test_dataset = TextDataset(tokenizer=tokenizer, file_path=test_file, block_size=128)
    return train_dataset, test_dataset

train_dataset, test_dataset = load_dataset(train_file, test_file, tokenizer)

# Define the data collator
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Set up the training arguments
training_args = TrainingArguments(
    output_dir="output",
    overwrite_output_dir=True,
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    save_steps=10_000,
    save_total_limit=2,
    evaluation_strategy="epoch",
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()

# Save the fine-tuned model
#trainer.save_model("fine_tuned_gpt2")

Epoch  Training Loss  Validation Loss
1      No log         3.711381
2      No log         3.649012
3      No log         3.622351


TrainOutput(global_step=21, training_loss=3.7029749552408853, metrics={'train_runtime': 61.9478, 'train_samples_per_second': 5.327, 'train_steps_per_second': 0.339, 'total_flos': 21556592640000.0, 'train_loss': 3.7029749552408853, 'epoch': 3.0})
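
For context on what `TextDataset(..., block_size=128)` feeds the Trainer: as I understand its implementation, it tokenizes the whole file and cuts the resulting ids into consecutive, non-overlapping blocks, dropping the trailing remainder. TextDataset is deprecated in recent transformers releases, so the same chunking is sketched here in plain Python; `chunk_token_ids` is a hypothetical helper, not a library function.

```python
def chunk_token_ids(token_ids, block_size=128):
    """Cut a flat list of token ids into consecutive, non-overlapping blocks.

    The final partial block is dropped, so every example has exactly
    block_size tokens.
    """
    return [
        token_ids[start:start + block_size]
        for start in range(0, len(token_ids) - block_size + 1, block_size)
    ]
```

In the notebook this would be applied to the ids of a whole file, for example `chunk_token_ids(tokenizer.encode(text), 128)`. Because the examples do not overlap, doubling the batch size from 8 to 16 halves the number of optimizer steps, which matches the drop from global_step=42 to global_step=21.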

In [28]:
trainer.save_model("fine_tuned_gpt2_martineden1")

tokenizer.save_pretrained("fine_tuned_gpt2_martineden1")

('fine_tuned_gpt2_martineden1/tokenizer_config.json',
 'fine_tuned_gpt2_martineden1/special_tokens_map.json',
 'fine_tuned_gpt2_martineden1/vocab.json',
 'fine_tuned_gpt2_martineden1/merges.txt',
 'fine_tuned_gpt2_martineden1/added_tokens.json')

In [29]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the fine-tuned GPT-2 model and tokenizer
model_name = "fine_tuned_gpt2_martineden1"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Function to generate responses
def generate_response(prompt_text, model, tokenizer, max_length=200, num_return_sequences=1):
    input_ids = tokenizer.encode(prompt_text, return_tensors="pt")

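    # Generate response.
    # As in the first run, temperature and top_p are ignored unless do_sample=True,
    # so decoding is greedy.
    output_sequences = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        no_repeat_ngram_size=2,
        temperature=0.5,
        top_p=0.9,
    )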

    # Decode the generated responses
    responses = []
    for response_id in output_sequences:
        response = tokenizer.decode(response_id, skip_special_tokens=True)
        responses.append(response)

    return responses

In [30]:
# Test the model with a prompt
prompt_text = "Martin Eden talk about literature"
responses = generate_response(prompt_text, model, tokenizer)

for response in responses:
    print(response)



Martin Eden talk about literature and the world.

He was a writer, and he was not a poet. He was an artist. But he had no idea of the art of writing. His life was filled with the thought that he could write. It was the only way he knew how to write, he thought. And he did not know how. The only thing he understood was that writing was art. That was his life. When he wrote, it was for the sake of his own life, not for his friends. In the end, writing became a means of life for him. Writing was what he wanted to do. If he were to live, his writing would be a way of living. There was no way to get away from it. No way. All he needed was to be free. So he began writing, in the hope that it would help him to become a better writer. Then he started writing again. This time he went to the library. At first he


In [36]:
# Reload the pretrained GPT-2 model and tokenizer so the third run (5 epochs) starts from the base weights
model_name = "gpt2"
config = GPT2Config.from_pretrained(model_name)
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Define the dataset for training
# NOTE: as in the first run, the train/test file names look swapped.
train_file = "/content/martineden_test_data.txt"
test_file = "/content/martineden_train_data.txt"

def load_dataset(train_file, test_file, tokenizer):
    train_dataset = TextDataset(tokenizer=tokenizer, file_path=train_file, block_size=128)
    test_dataset = TextDataset(tokenizer=tokenizer, file_path=test_file, block_size=128)
    return train_dataset, test_dataset

train_dataset, test_dataset = load_dataset(train_file, test_file, tokenizer)

# Define the data collator
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Set up the training arguments
training_args = TrainingArguments(
    output_dir="output",
    overwrite_output_dir=True,
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    save_steps=10_000,
    save_total_limit=2,
    evaluation_strategy="epoch",
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train the model
trainer.train()

# Save the fine-tuned model
#trainer.save_model("fine_tuned_gpt2")

Epoch  Training Loss  Validation Loss
1      No log         3.634765
2      No log         3.563796
3      No log         3.553316
4      No log         3.554499
5      No log         3.552311


TrainOutput(global_step=70, training_loss=3.3316850934709823, metrics={'train_runtime': 104.2557, 'train_samples_per_second': 5.275, 'train_steps_per_second': 0.671, 'total_flos': 35927654400000.0, 'train_loss': 3.3316850934709823, 'epoch': 5.0})
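
The validation losses of the three runs are easier to compare as perplexities, which are the exponential of the mean token-level cross-entropy. The sketch below uses only the final validation losses reported in this notebook; the dictionary labels and the `perplexity` helper are my own naming.

```python
import math

# Final validation loss of each run, copied from the three tables in this notebook.
final_eval_loss = {
    "3 epochs, batch 8": 3.570152,
    "3 epochs, batch 16": 3.622351,
    "5 epochs, batch 8": 3.552311,
}


def perplexity(cross_entropy_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(cross_entropy_loss)


for run, loss in final_eval_loss.items():
    print(f"{run}: loss={loss:.4f}  perplexity={perplexity(loss):.1f}")
```

The five-epoch run ends with the lowest validation loss, but the gain over three epochs is small: the validation loss is essentially flat from epoch 3 onward (3.553, 3.554, 3.552) while the average training loss keeps falling to 3.33, so further epochs on this data are unlikely to help much.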

In [37]:
trainer.save_model("fine_tuned_gpt2_martineden2")

tokenizer.save_pretrained("fine_tuned_gpt2_martineden2")

('fine_tuned_gpt2_martineden2/tokenizer_config.json',
 'fine_tuned_gpt2_martineden2/special_tokens_map.json',
 'fine_tuned_gpt2_martineden2/vocab.json',
 'fine_tuned_gpt2_martineden2/merges.txt',
 'fine_tuned_gpt2_martineden2/added_tokens.json')

In [39]:
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the fine-tuned GPT-2 model and tokenizer
model_name = "fine_tuned_gpt2_martineden2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Function to generate responses
def generate_response(prompt_text, model, tokenizer, max_length=200, num_return_sequences=1):
    input_ids = tokenizer.encode(prompt_text, return_tensors="pt")

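    # Generate response.
    # As in the first run, temperature and top_p are ignored unless do_sample=True,
    # so decoding is greedy.
    output_sequences = model.generate(
        input_ids=input_ids,
        max_length=max_length,
        num_return_sequences=num_return_sequences,
        no_repeat_ngram_size=2,
        temperature=0.5,
        top_p=0.9,
    )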

    # Decode the generated responses
    responses = []
    for response_id in output_sequences:
        response = tokenizer.decode(response_id, skip_special_tokens=True)
        responses.append(response)

    return responses

In [40]:
# Test the model with a prompt
prompt_text = "What does Martin think about bourgeois morality?"
responses = generate_response(prompt_text, model, tokenizer)

for response in responses:
    print(response)



What does Martin think about bourgeois morality?

He is not a man of the past. He is a bourgeois.
And he is afraid of bourgeois life. Martin
is afraid that bourgeois society will destroy him. And he
will not be able to live in it. But he will live with it, and he knows that
it is the bourgeois way of life that he has come to despise. It is
the bourgeois method of living that is destroying him, the way that the
bourgeois class has destroyed him and the class that has made him a
man. The bourgeois class is no longer the bourgeoisie, but the proletariat. In
this way, Martin is becoming a proletarian. This is why he was
convinced that Martin was a fool. That he had been deceived. For
Martin, he knew that his own life was not the life of a class
and that it was the work of an elite class. His life
was the working of his


In [41]:
# Test the model with a prompt
prompt_text = "Martin's working style"
responses = generate_response(prompt_text, model, tokenizer)

for response in responses:
    print(response)



Martin's working style was to write and write. He was a writer, and he was not a poet.

He was an artist, too. His work was often too much for the eyes. It was too hard for them to see. They could not see him. But he could see them. And he knew that. The eyes were too busy looking at him to notice. When he saw them, he felt a great deal of relief. Then he remembered that he had been in the same room with the other men, that they had come to meet him, had seen him in a different way, in different ways, with different faces. That relief was the first thing he noticed. There was no relief in that relief, no satisfaction in it. In the moment of his relief he thought of the men who had met him and had made him feel good. Of the women who were with him at the dinner party, of those who knew him well, there was nothing in his life


In [42]:
# Test the model with a prompt
prompt_text = "Martin and Ruth's differences "
responses = generate_response(prompt_text, model, tokenizer)

for response in responses:
    print(response)



Martin and Ruth's differences  were not so much that they were not in love, but that the love of the man was not the same as the desire of love.

He was a man of a certain class, and he was the class of men who were willing to accept the life of others. He was willing, in the words of one of his friends, to live in a society where he could be loved. But he did not live that way. In the midst of all that he had done, he felt that his life was in danger. It was time to get out of it. The life he loved was gone. His life had been destroyed. And yet he remained in it, with the hope that it would be the last. That hope was vain. There was no hope of life. Life was dead. All hope had gone to the hands of those who had loved him. They had not loved the way he liked them. Their love had died. When they
