<a href="https://colab.research.google.com/github/BreakoutMentors/Data-Science-and-Machine-Learning/blob/main/machine_learning/lesson%204%20-%20ML%20Apps/Streamlit/Spongebob_Generation/HuggingFace_Text_Generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> Note: Always open in Colab for the best learning experience.

# NLP: Text-Generation
The dataset is a collection of [SpongeBob scripts](https://www.kaggle.com/mikhailgaerlan/spongebob-squarepants-completed-transcripts), and the model is ['distilgpt2'](https://huggingface.co/distilgpt2), the smallest version of OpenAI's GPT-2 model.

The model was fine-tuned on this dataset, which lets it generate text in the style of the original scripts.

We use [HuggingFace](https://huggingface.co/), whose libraries provide many pretrained transformers for NLP tasks such as text generation, language translation, summarization, and sentiment analysis. The library is built on top of PyTorch, so some of its parameters will look familiar!

To complete this project, we will do these steps:

1. Upload `get_kaggle_data.py` from the GitHub repo to use it in this Colab notebook
2. Install HuggingFace `transformers` and `datasets`, along with [Git Large File Storage](https://git-lfs.github.com/), to upload the model to HuggingFace
3. Upload your `kaggle.json` file to download the dataset
4. Download and prepare the dataset, and initialize the model
5. Train the model
6. Download and upload the model

**Before doing any of this, it is important for you to make your own [HuggingFace account](https://huggingface.co/join)**

# Installing HuggingFace libraries and Git Large File Storage

To use git in this notebook, you need to set the email and username you used on HuggingFace so it knows who is uploading.

In [None]:
!pip install transformers
!pip install datasets
!pip install hf-lfs

In [None]:
# YOU NEED TO DO THIS
!git config --global user.email "your@email.com"
!git config --global user.name "username"

# Upload Kaggle.json

In [1]:
from google.colab import files
from IPython.utils import io
import os
files.upload()
os.system("mkdir -p ~/.kaggle")
os.system("cp kaggle.json ~/.kaggle/")
os.system("chmod 600 ~/.kaggle/kaggle.json")

Saving kaggle.json to kaggle.json


0
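
The cell above relies on `os.system`, which only returns an exit code (the `0` shown) and does not raise if a step fails. As a minimal sketch of the same three steps using the standard library, the helper below copies the file, creates the directory, and sets the permissions, raising an error if `kaggle.json` is missing. The function name `install_kaggle_credentials` and its `home` parameter are hypothetical and are not part of the repository.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_credentials(source="kaggle.json", home=None):
    """Copy kaggle.json into <home>/.kaggle and restrict it to the owner."""
    home_dir = Path(home) if home else Path.home()
    source_path = Path(source)
    if not source_path.is_file():
        raise FileNotFoundError(f"{source_path} not found; upload it first")

    kaggle_dir = home_dir / ".kaggle"
    kaggle_dir.mkdir(parents=True, exist_ok=True)  # same as: mkdir -p ~/.kaggle
    target = kaggle_dir / "kaggle.json"
    shutil.copy(source_path, target)               # same as: cp kaggle.json ~/.kaggle/
    os.chmod(target, 0o600)                        # same as: chmod 600 ~/.kaggle/kaggle.json
    return target
```

In the notebook, this would be called as `install_kaggle_credentials()` right after `files.upload()`.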

# Download and Prepare the Dataset, and Initialize the Model

These two functions, `prepare_data` and `tokenize_function`, are both included in the `model.py` file in the repository. The `prepare_data` function downloads the data from Kaggle and then creates a [HuggingFace Dataset](https://huggingface.co/docs/datasets/) from all the txt files. The `tokenize_function` function takes the dataset from the prior function, tokenizes the text, and groups the tokens into fixed-size blocks so the model trains on longer sequences.
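
One detail of `prepare_data` is that it splits the transcript files by position: the first 350 file names go to `train` and the rest to `valid`. Python's `os.listdir` returns names in arbitrary order, so that split can differ between machines. The sketch below shows one way to make it reproducible by sorting first. `split_files` is a hypothetical helper, not something in the repository, and the file names are made up.

```python
def split_files(file_names, n_train):
    """Sort file names, then split them into train and validation lists."""
    ordered = sorted(file_names)
    return {"train": ordered[:n_train], "valid": ordered[n_train:]}


# Hypothetical file names; the real ones come from the unzipped Kaggle folder
example_split = split_files(["c.txt", "a.txt", "d.txt", "b.txt"], n_train=3)
print(example_split)
# {'train': ['a.txt', 'b.txt', 'c.txt'], 'valid': ['d.txt']}
```

The returned dictionary has the same shape as the `data_files` argument passed to `load_dataset('text', ...)`.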

In [4]:
from kaggle.api.kaggle_api_extended import KaggleApi
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, Trainer
from datasets import load_dataset

from get_kaggle_data import download_dataset
import os


def prepare_data(download_path, kaggle_link, kaggle_api):
    """
    This function downloads the dataset and returns a HuggingFace DatasetDict built from the txt files.
    The 'text' loader makes each line of a file one sample, stored under the ['text'] key.
    The ['input_ids'] and ['attention_mask'] columns are added later by the tokenizer.
    """

    # Getting current directory of where this file is running
    original_directory = os.getcwd()

    # Changing directory to the desired path, it will be created if does not exist
    try:
        os.chdir(download_path)
    except FileNotFoundError:
        os.mkdir(download_path)
        os.chdir(download_path)

    # Getting all the names of the files of the path 'download_path'
    directory_files = set(os.listdir())

    # Downloading the dataset
    download_dataset(kaggle_link, kaggle_api)
    # Getting name of folder that was unzipped
    folder_name = list(set(os.listdir()) - directory_files)[0]

    # Changing directory to the unzipped folder to read the txt files
    # (we are already inside download_path, so a relative chdir also works for relative paths)
    os.chdir(folder_name)

    # Getting all the names of the txt files
    file_names = os.listdir()

    # Loading the dataset with HuggingFace
    datasets = load_dataset('text', data_files={'train':file_names[0:350], 'valid':file_names[350:]})

    # Changing the current directory to the original directory where this file exists
    os.chdir(original_directory)
    return datasets

def tokenize_function(examples, tokenizer, block_size):
    """
    This function takes a batch of the text dataset and completes the steps below

    1. Tokenize the entire dataset
    2. Concatenate all examples from a 2D list into a 1D list
    3. Create blocks of the concatenated examples with a certain block size
    4. Create labels for the dataset
    """

    #1. Tokenize the entire dataset
    tokenized_examples = tokenizer(examples["text"])

    #2. Concatenate all examples from a 2D list into a 1D list
    # Flattening ['input_ids'] and ['attention_mask'] from 2D lists to 1D lists
    concatenated_examples = {key:sum(tokenized_examples[key], []) for key in tokenized_examples.keys()}

    #3. Create blocks of the concatenated examples with a certain block size
    # Getting the total number of tokens
    num_tokens = len(concatenated_examples['input_ids'])
    # Keeping only whole blocks; leftover tokens that cannot fill another block are cut
    total_length = (num_tokens // block_size) * block_size

    results = {}
    for key, value in concatenated_examples.items():
        blocks = []
        for i in range(0, total_length, block_size):
            blocks.append(value[i: i+block_size])

        results[key] = blocks

    #4. Create labels for the dataset
    results['labels'] = results['input_ids'].copy()

    return results
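
To make steps 2 to 4 of `tokenize_function` concrete, here is a small sketch that runs the same concatenate-and-chunk logic on made-up token ids, with no tokenizer involved. `group_into_blocks` is a hypothetical stand-in written for this illustration. It uses `itertools.chain` instead of `sum(..., [])`, which should give the same result while avoiding repeated list copies.

```python
from itertools import chain


def group_into_blocks(batch, block_size):
    """Concatenate per-line token lists and cut them into fixed-size blocks."""
    # Flatten each column (e.g. input_ids, attention_mask) from 2D to 1D
    concatenated = {key: list(chain.from_iterable(rows)) for key, rows in batch.items()}

    # Drop the tail that cannot fill a whole block
    num_tokens = len(concatenated["input_ids"])
    total_length = (num_tokens // block_size) * block_size

    blocks = {
        key: [values[i:i + block_size] for i in range(0, total_length, block_size)]
        for key, values in concatenated.items()
    }
    # For causal language modeling the labels are a copy of the inputs
    blocks["labels"] = [list(block) for block in blocks["input_ids"]]
    return blocks


# Made-up token ids standing in for three tokenized lines of a script
toy_batch = {
    "input_ids": [[1, 2, 3], [4, 5], [6, 7, 8, 9]],
    "attention_mask": [[1, 1, 1], [1, 1], [1, 1, 1, 1]],
}
toy_blocks = group_into_blocks(toy_batch, block_size=4)
print(toy_blocks["input_ids"])
# [[1, 2, 3, 4], [5, 6, 7, 8]]
```

The nine toy tokens yield two blocks of four, and the ninth token is dropped. The same thing happens to the last few tokens of each batch of 1000 lines in the notebook.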

In [5]:
# Getting kaggle link and access to your kaggle api
kaggle_link = "https://www.kaggle.com/mikhailgaerlan/spongebob-squarepants-completed-transcripts"
api = KaggleApi()
api.authenticate()

In [6]:
model_name = 'distilgpt2'
    
# Defining Tokenizer and Model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token # Adding padding token to the tokenizer
model = AutoModelForCausalLM.from_pretrained(model_name)

# Getting datasets
path = os.getcwd()
raw_datasets = prepare_data(path, kaggle_link, api)

# Tokenize datasets
block_size = 128

tokenized_datasets = raw_datasets.map(tokenize_function,
                                      batched=True, 
                                      batch_size=1000, 
                                      remove_columns=['text'],
                                      fn_kwargs={'tokenizer':tokenizer, 'block_size':block_size})


Unzipping spongebob-squarepants-completed-transcripts.zip
Deleted spongebob-squarepants-completed-transcripts.zip


Using custom data configuration default-3fec0cb6dd1401d7


Downloading and preparing dataset text/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/text/default-3fec0cb6dd1401d7/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5...


Dataset text downloaded and prepared to /root/.cache/huggingface/datasets/text/default-3fec0cb6dd1401d7/0.0.0/e16f44aa1b321ece1f87b07977cc5d70be93d69b20486d6dacd62e12cf25c9a5. Subsequent calls will reuse this data.


Token indices sequence length is longer than the specified maximum sequence length for this model (1673 > 1024). Running this sequence through the model will result in indexing errors





# Train the Model

Here I used a handful of the available training arguments; many more are listed [here](https://huggingface.co/transformers/main_classes/trainer.html#trainingarguments).

These are the ones used here:

1. output_dir: The output directory where the model predictions and checkpoints will be written.
2. num_train_epochs: The number of epochs to train.
3. evaluation_strategy: The evaluation strategy to adopt during training.
    * "no": No evaluation is done during training.
    * "steps": Evaluation is done (and logged) every eval_steps.
    * "epoch": Evaluation is done at the end of each epoch.
4. save_strategy: The checkpoint save strategy to adopt during training.
    * "no": No save is done during training.
    * "epoch": Save is done at the end of each epoch.
    * "steps": Save is done every save_steps.
5. learning_rate: The learning rate for the optimizer, which is AdamW by default.
    * 5e-5 = 5 x 10^(-5) = 0.00005
6. load_best_model_at_end: Whether or not to load the best model found during training at the end of training.

After defining the training arguments, we use the [Trainer](https://huggingface.co/transformers/main_classes/trainer.html#trainer) class, which is responsible for training the model.
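
As a rough sketch of how `num_train_epochs` turns into optimizer steps, the arithmetic below uses the figures reported in this run's training log: 9458 training blocks and a per-device batch size of 8. It assumes a single device and no gradient accumulation, which is what the log reports. With other hardware or settings the numbers would differ.

```python
import math

num_examples = 9458    # training blocks reported in this run's log
batch_size = 8         # per-device batch size reported in the log
num_train_epochs = 10

# The last, partially filled batch of an epoch still counts as one step
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_train_epochs
print(steps_per_epoch, total_steps)
# 1183 11830
```

With `save_strategy='epoch'`, this is why the checkpoint folders are named `checkpoint-1183`, `checkpoint-2366`, and so on.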

In [7]:
# Training Model
training_args = TrainingArguments(output_dir='train-test',
                                   num_train_epochs=10,
                                   evaluation_strategy='epoch',
                                   save_strategy='epoch',
                                   learning_rate=5e-5,
                                   load_best_model_at_end=True
                                 )

# Declaring a trainer object to train the model
trainer = Trainer(model,
                  training_args,
                  train_dataset=tokenized_datasets['train'],
                  eval_dataset=tokenized_datasets['valid'])

# Training the model
trainer.train()

***** Running training *****
  Num examples = 9458
  Num Epochs = 10
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 11830


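| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1 | 3.1934 | 3.126667 |
| 2 | 3.0102 | 3.078284 |
| 3 | 2.8832 | 3.062374 |
| 4 | 2.8184 | 3.058191 |
| 5 | 2.7569 | 3.061054 |
| 6 | 2.701 | 3.064501 |
| 7 | 2.6593 | 3.06966 |
| 8 | 2.6301 | 3.077829 |
| 9 | 2.6168 | 3.08283 |
| 10 | 2.5889 | 3.087097 |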


***** Running Evaluation *****
  Num examples = 1290
  Batch size = 8
Saving model checkpoint to train-test/checkpoint-1183
Configuration saved in train-test/checkpoint-1183/config.json
Model weights saved in train-test/checkpoint-1183/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 1290
  Batch size = 8
Saving model checkpoint to train-test/checkpoint-2366
Configuration saved in train-test/checkpoint-2366/config.json
Model weights saved in train-test/checkpoint-2366/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 1290
  Batch size = 8
Saving model checkpoint to train-test/checkpoint-3549
Configuration saved in train-test/checkpoint-3549/config.json
Model weights saved in train-test/checkpoint-3549/pytorch_model.bin
***** Running Evaluation *****
  Num examples = 1290
  Batch size = 8
Saving model checkpoint to train-test/checkpoint-4732
Configuration saved in train-test/checkpoint-4732/config.json
Model weights saved in train-test/checkpoint-4732/py

TrainOutput(global_step=11830, training_loss=2.7916958600106323, metrics={'train_runtime': 1384.6619, 'train_samples_per_second': 68.305, 'train_steps_per_second': 8.544, 'total_flos': 5949919824445440.0, 'train_loss': 2.7916958600106323, 'epoch': 10.0})
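
In the loss table above, the training loss keeps falling while the validation loss stops improving partway through, which is a common sign of overfitting. The sketch below finds the epoch with the lowest validation loss and converts that loss to perplexity (the exponential of the cross-entropy loss). The values are copied from the table. Assuming `load_best_model_at_end=True` selects on validation loss, which I believe is the default metric, this should be the checkpoint the `Trainer` reloads at the end.

```python
import math

# Validation loss per epoch, copied from the training table above
validation_loss = {
    1: 3.126667, 2: 3.078284, 3: 3.062374, 4: 3.058191, 5: 3.061054,
    6: 3.064501, 7: 3.06966, 8: 3.077829, 9: 3.08283, 10: 3.087097,
}

# Epoch whose validation loss is smallest
best_epoch = min(validation_loss, key=validation_loss.get)
best_loss = validation_loss[best_epoch]
perplexity = math.exp(best_loss)  # perplexity = exp(cross-entropy loss)

print(f"best epoch: {best_epoch}, loss: {best_loss:.4f}, perplexity: {perplexity:.1f}")
```

The best epoch here is 4, which at 1183 steps per epoch corresponds to `train-test/checkpoint-4732` in the log.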

# Downloading and Uploading Model

Here you are presented with three options:
1. Upload the Model to HuggingFace Only
    * This is recommended: the model is too large to store in your personal GitHub repo, and uploading it to HuggingFace still lets you use the model later.

2. Both Save and Upload the Model
    * It takes a long time to download.
3. Save the Model Only
    * It takes a long time to download.

To upload the model, you need to sign in to your HuggingFace account by running the code below:

In [8]:
!huggingface-cli login


        _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
        _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
        _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
        _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
        _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

        
Username: Coldestadam
Password: 
Login successful
Your token: iSQMKonzgVejhbFFYuWfsfaOmQJxjvcgXLGQcggRXQqIXvLdZSNcXfabRCXuSVCDxDUXiapjrKwXASdDqKbKKQioetNeAecCbTipobDQCiZVEjPdgHikmjGlldIBDuRD 

Your token has been saved to /root/.huggingface/token


## Upload the Model only (Suggestion)

I recommend this method: if you only save the model, you cannot upload it to GitHub because the files are too large.

In [11]:
repo_name = "Breakout_Mentors_SpongeBob_Model"

# This only uploads the model
model.push_to_hub(repo_name)

# Upload the tokenizer to the same repo as well
tokenizer.push_to_hub(repo_name)

Configuration saved in Breakout_Mentors_SpongeBob_Model/config.json
Model weights saved in Breakout_Mentors_SpongeBob_Model/pytorch_model.bin
tokenizer config file saved in Breakout_Mentors_SpongeBob_Model/tokenizer_config.json
Special tokens file saved in Breakout_Mentors_SpongeBob_Model/special_tokens_map.json


'https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/commit/7d6ffb51cc99b247810054cf633e5baed74778a2'

## Save the model and Upload it

In [25]:
repo_name = "Breakout_Mentors_SpongeBob_Model"

# This saves the model locally, then uploads it (two separate calls)
model.save_pretrained('model')
model.push_to_hub(repo_name)

# This saves the tokenizer in the same directory
tokenizer.save_pretrained('model')

# Upload the tokenizer to the repo as well
tokenizer.push_to_hub(repo_name)

# Zipping model directory to model.zip file
!zip -r model.zip model

# Downloading model.zip
from google.colab import files
files.download('/content/model.zip')



## Save the model only

In [None]:
# Saving model and tokenizer to the 'model' directory
model.save_pretrained('model')
tokenizer.save_pretrained('model')

# Zipping model directory to model.zip file
!zip -r model.zip model

# Downloading model.zip
from google.colab import files
files.download('/content/model.zip')

# Loading model with Huggingface Pipeline

After uploading or saving the model, you can use it to generate text! [HuggingFace Pipelines](https://huggingface.co/course/chapter1/3?fw=pt) make this easy.

In [12]:
from transformers import pipeline

# If you uploaded it, the pipeline needs your username to reference your model
HF_username = ''

# Change this to the dir of your saved model if you did not upload it
repo_path = HF_username + "/Breakout_Mentors_SpongeBob_Model"

# Creating a Pipeline with my model to use it
generator = pipeline("text-generation", model=repo_path)

https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmppv6n7sme


storing https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/a3f827d65e8a2b6f2f8c3c76ca9dc7ad036e0ab1f003c632e65814e04602b8bd.d8865ef79c71785cad665f5230d056fbd5be9448c4b26657278c3f5605824d28
creating metadata file for /root/.cache/huggingface/transformers/a3f827d65e8a2b6f2f8c3c76ca9dc7ad036e0ab1f003c632e65814e04602b8bd.d8865ef79c71785cad665f5230d056fbd5be9448c4b26657278c3f5605824d28
loading configuration file https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/a3f827d65e8a2b6f2f8c3c76ca9dc7ad036e0ab1f003c632e65814e04602b8bd.d8865ef79c71785cad665f5230d056fbd5be9448c4b26657278c3f5605824d28
Model config GPT2Config {
  "_name_or_path": "distilgpt2",
  "_num_labels": 1,
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 6,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labe

storing https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/cd92467e633cf0c7769021dd2879f8cb215fcfbabc7ace019d7ab71b5e628c14.22121e4212db51c67883350b31608a0458800b2f2fd858b85c34a0256f630def
creating metadata file for /root/.cache/huggingface/transformers/cd92467e633cf0c7769021dd2879f8cb215fcfbabc7ace019d7ab71b5e628c14.22121e4212db51c67883350b31608a0458800b2f2fd858b85c34a0256f630def
loading weights file https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/cd92467e633cf0c7769021dd2879f8cb215fcfbabc7ace019d7ab71b5e628c14.22121e4212db51c67883350b31608a0458800b2f2fd858b85c34a0256f630def





All model checkpoint weights were used when initializing GPT2LMHeadModel.

All the weights of GPT2LMHeadModel were initialized from the model checkpoint at Coldestadam/Breakout_Mentors_SpongeBob_Model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training.
https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpccbt2hgq


storing https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/ca86c7147ae8f28a6e2a458e5252e46d326204c78c7552cd6d74cccb60506632.78796cb538f90cb7a099bd6aebd7c443796b4bc5a6edf14d41930fa9fc965842
creating metadata file for /root/.cache/huggingface/transformers/ca86c7147ae8f28a6e2a458e5252e46d326204c78c7552cd6d74cccb60506632.78796cb538f90cb7a099bd6aebd7c443796b4bc5a6edf14d41930fa9fc965842





https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/vocab.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpt3lze0_x


storing https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/vocab.json in cache at /root/.cache/huggingface/transformers/a1a95dfed7c42c5ed3fbd2ef707b80121d1b6bf2e1c3331011611412fd5bc429.a1b97b074a5ac71fad0544c8abc1b3581803d73832476184bde6cff06a67b6bb
creating metadata file for /root/.cache/huggingface/transformers/a1a95dfed7c42c5ed3fbd2ef707b80121d1b6bf2e1c3331011611412fd5bc429.a1b97b074a5ac71fad0544c8abc1b3581803d73832476184bde6cff06a67b6bb





https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/merges.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpvekswhjc


storing https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/merges.txt in cache at /root/.cache/huggingface/transformers/ab1d51ec6b4b6fb0841e71c24bc5549e8dc72a1cd1572876de2c3eb67a9ecb2c.f5b91da9e34259b8f4d88dbc97c740667a0e8430b96314460cdb04e86d4fc435
creating metadata file for /root/.cache/huggingface/transformers/ab1d51ec6b4b6fb0841e71c24bc5549e8dc72a1cd1572876de2c3eb67a9ecb2c.f5b91da9e34259b8f4d88dbc97c740667a0e8430b96314460cdb04e86d4fc435





https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpcrrfhq41


storing https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/b6275ca3f7c26c20c8df954223d48fb4d5db33f7576da8a2a752522abc36a1d9.c83461319bb31d7584a5150318794d1f904cdcc960158c8c411bf05676b432c8
creating metadata file for /root/.cache/huggingface/transformers/b6275ca3f7c26c20c8df954223d48fb4d5db33f7576da8a2a752522abc36a1d9.c83461319bb31d7584a5150318794d1f904cdcc960158c8c411bf05676b432c8





https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/special_tokens_map.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmplt_qywi2


storing https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/special_tokens_map.json in cache at /root/.cache/huggingface/transformers/f92d839bf9e080fdf9fa9142aa919cef3db6763ed652f6861375ba6280fca53f.fbf4061fb19cfc48adf3510a9b4a6037fcf9cdf64fbdb306b328bafb3092779b
creating metadata file for /root/.cache/huggingface/transformers/f92d839bf9e080fdf9fa9142aa919cef3db6763ed652f6861375ba6280fca53f.fbf4061fb19cfc48adf3510a9b4a6037fcf9cdf64fbdb306b328bafb3092779b





loading file https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/vocab.json from cache at /root/.cache/huggingface/transformers/a1a95dfed7c42c5ed3fbd2ef707b80121d1b6bf2e1c3331011611412fd5bc429.a1b97b074a5ac71fad0544c8abc1b3581803d73832476184bde6cff06a67b6bb
loading file https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/merges.txt from cache at /root/.cache/huggingface/transformers/ab1d51ec6b4b6fb0841e71c24bc5549e8dc72a1cd1572876de2c3eb67a9ecb2c.f5b91da9e34259b8f4d88dbc97c740667a0e8430b96314460cdb04e86d4fc435
loading file https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/b6275ca3f7c26c20c8df954223d48fb4d5db33f7576da8a2a752522abc36a1d9.c83461319bb31d7584a5150318794d1f904cdcc960158c8c411bf05676b432c8
loading file https://huggingface.co/Coldestadam/Breakout_Mentors_SpongeBob_Model/resolve/main/added_tokens.json from cache at None
load
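
The pipeline's `model` argument accepts either a Hub repo id of the form `username/repo_name` or a path to a local directory containing a saved model. If `HF_username` is left as an empty string, the cell above builds `"/Breakout_Mentors_SpongeBob_Model"`, which is neither. As a minimal sketch, the hypothetical helper below (not part of the repository) picks a local directory when one exists and otherwise requires a username.

```python
import os


def resolve_model_path(repo_name, hf_username="", local_dir=None):
    """Return a local directory if one is given and exists, else a Hub repo id."""
    if local_dir and os.path.isdir(local_dir):
        return local_dir
    if not hf_username:
        raise ValueError("Set hf_username or point local_dir at a saved model")
    return f"{hf_username}/{repo_name}"


# "my-username" is a placeholder, not a real account
print(resolve_model_path("Breakout_Mentors_SpongeBob_Model", hf_username="my-username"))
# my-username/Breakout_Mentors_SpongeBob_Model
```

The returned string can be passed as `model=` to `pipeline("text-generation", ...)` in place of `repo_path`.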

In [15]:
# Testing Model
output = generator("SpongeBob", min_length=20)[0]['generated_text']
print(output)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


SpongeBob: [shouts] This is my turn, Patrick!Patrick: [shouts] This is my turn, Patrick, you saved my life! [holds up three of his shoes]SpongeBob: [shouts]


# Streamlit App
After saving or uploading your model, you can look at `launch.py` to launch the app.