Sources:

*   https://pypi.org/project/gpt-2-simple/#description
*   https://medium.com/@stasinopoulos.dimitrios/a-beginners-guide-to-training-and-generating-text-using-gpt2-c2f2e1fbd10a
*   https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce#scrollTo=VHdTL8NDbAh3
*   https://github.com/ak9250/gpt-2-colab
*   https://www.aiweirdness.com/d-and-d-character-bios-now-making-19-03-15/
*   https://minimaxir.com/2019/09/howto-gpt2/





[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zawemi/GS2DIT/blob/main/Class%203/gpt_2_shakespeare.ipynb#scrollTo=4tIUvFbLMUuE)

#Let's teach AI to write like Shakespeare 🎓

##Installing the model

In [4]:
#install the library we'll use today
!pip install gpt-2-simple

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


##Generating text with the basic model

###Importing and loading necessary components

In [5]:
#import what we need
import gpt_2_simple as gpt2 #for gpt-2 (our AI model)
import os #lets us do things with files and folders
import requests #this one helps to download from the internet

In [6]:
#and let's download our AI model
gpt2.download_gpt2()   # model is saved into current directory under /models/124M/

Fetching checkpoint: 1.05Mit [00:00, 333Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 5.02Mit/s]
Fetching hparams.json: 1.05Mit [00:00, 640Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:07, 68.1Mit/s]                                  
Fetching model.ckpt.index: 1.05Mit [00:00, 508Mit/s]                                                
Fetching model.ckpt.meta: 1.05Mit [00:00, 7.65Mit/s]
Fetching vocab.bpe: 1.05Mit [00:00, 5.83Mit/s]
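The download is about 500 MB, so it can be worth checking whether the files are already there before fetching them again. Below is a minimal sketch of such a check. The file names are taken from the download log above; the helper name `missing_model_files` is made up for this example and is not part of gpt-2-simple.

```python
import os

# File names as they appear in the download log above.
MODEL_FILES = [
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
]

def missing_model_files(model_name="124M", model_dir="models"):
    """Return the model files that are not yet present in model_dir/model_name."""
    folder = os.path.join(model_dir, model_name)
    return [name for name in MODEL_FILES
            if not os.path.isfile(os.path.join(folder, name))]

# Usage in the notebook (commented out so the sketch runs without gpt-2-simple):
# if missing_model_files():
#     gpt2.download_gpt2(model_name="124M")
```

If the returned list is empty, the download cell can be skipped.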


In [7]:
#starting the session so we can play with the gpt-2 model
sess = gpt2.start_tf_sess()

In [8]:
#we load the model from file to use it
gpt2.load_gpt2(sess, run_name='124M', checkpoint_dir='models')

Loading checkpoint models/124M/model.ckpt


###Text generation

In [None]:
#this is the opening text (prompt) that the model will continue
prefix = "Is there a second Earth?"

In [None]:
#the model is generating text
gpt2.generate(sess, run_name='124M', checkpoint_dir='models', prefix=prefix, length=50)

Is there a second Earth?

I don't know. I don't think I can understand that. I mean, I'm not saying it's a planet, but it's a planet with a planet. At the end of the day, we don't know what happened
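Because `length=50` counts tokens, the sample above stops in the middle of a sentence. One possible way to tidy this up is to cut the text after its last full stop, question mark or exclamation mark. The sketch below is plain Python and the helper name `trim_to_last_sentence` is made up for this example. To get the text as a string instead of having it printed, `gpt2.generate` can be called with `return_as_list=True`, which returns a list of generated texts.

```python
def trim_to_last_sentence(text):
    """Cut text after its last '.', '!' or '?'; return it unchanged if none is found."""
    last = max(text.rfind(ch) for ch in ".!?")
    return text if last == -1 else text[: last + 1]

# The tail of the sample generated above.
sample = "It's a planet with a planet. At the end of the day, we don't know what happened"
print(trim_to_last_sentence(sample))  # prints: It's a planet with a planet.

# Usage in the notebook (commented out so the sketch runs without gpt-2-simple):
# text = gpt2.generate(sess, run_name='124M', checkpoint_dir='models',
#                      prefix=prefix, length=50, return_as_list=True)[0]
# print(trim_to_last_sentence(text))
```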


##Generating text with the improved (fine-tuned) model

**IMPORTANT**
<br>Restart the runtime (Runtime -> Restart runtime)

###Importing and loading necessary components

In [1]:
#import what we need
import gpt_2_simple as gpt2 #for gpt-2 (our AI model)
import os #lets us do things with files and folders
import requests #this one helps to download from the internet

In [2]:
#get nietzsche texts
!wget "https://s3.amazonaws.com/text-datasets/nietzsche.txt"

--2023-05-24 12:12:48--  https://s3.amazonaws.com/text-datasets/nietzsche.txt
Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.217.225.16, 54.231.138.104, 54.231.164.152, ...
Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.217.225.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 600901 (587K) [text/plain]
Saving to: ‘nietzsche.txt.2’


2023-05-24 12:12:48 (4.00 MB/s) - ‘nietzsche.txt.2’ saved [600901/600901]



In [3]:
#game of thrones from https://www.kaggle.com/datasets/khulasasndh/game-of-thrones-books?select=001ssb.txt
!gdown "1CrL1wde_NGO68i5Prd_UNA_oW0cGQsxg&confirm=t"
!mv /content/001ssb.txt /content/got1.txt

Downloading...
From: https://drive.google.com/uc?id=1CrL1wde_NGO68i5Prd_UNA_oW0cGQsxg&confirm=t
To: /content/001ssb.txt
  0% 0.00/1.63M [00:00<?, ?B/s]100% 1.63M/1.63M [00:00<00:00, 106MB/s]


In [None]:
#let's download a file with a selection of Shakespeare's plays (the "tiny Shakespeare" dataset)
!wget "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
!mv /content/input.txt /content/shakespeare.txt

--2023-03-21 15:19:13--  https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1115394 (1.1M) [text/plain]
Saving to: ‘input.txt’


2023-03-21 15:19:13 (19.9 MB/s) - ‘input.txt’ saved [1115394/1115394]



In [4]:
#starting the session so we can play with the gpt-2 model
sess = gpt2.start_tf_sess()

###Teaching our model

In [5]:
#finetuning on got1.txt, i.e. teaching the model to write in the style of Game of Thrones
#(pass 'shakespeare.txt' or 'nietzsche.txt' instead to imitate a different author)
#it takes a lot of time (~15min)...
gpt2.finetune(sess, 'got1.txt', steps=500)   # steps is max number of training steps

Loading checkpoint models/124M/model.ckpt
Loading dataset...


100%|██████████| 1/1 [00:01<00:00,  1.67s/it]


dataset has 433157 tokens
Training...
[1 | 7.12] loss=3.41 avg=3.41
[2 | 9.25] loss=3.42 avg=3.42
[3 | 11.39] loss=3.34 avg=3.39
[4 | 13.54] loss=3.20 avg=3.34
[5 | 15.71] loss=3.27 avg=3.33
[6 | 17.90] loss=3.27 avg=3.32
[7 | 20.05] loss=3.28 avg=3.31
[8 | 22.23] loss=3.14 avg=3.29
[9 | 24.39] loss=3.15 avg=3.27
[10 | 26.54] loss=3.16 avg=3.26
[11 | 28.71] loss=3.28 avg=3.26
[12 | 30.89] loss=3.17 avg=3.26
[13 | 33.08] loss=3.24 avg=3.25
[14 | 35.27] loss=3.29 avg=3.26
[15 | 37.44] loss=3.27 avg=3.26
[16 | 39.62] loss=3.11 avg=3.25
[17 | 41.82] loss=3.07 avg=3.24
[18 | 44.01] loss=3.04 avg=3.22
[19 | 46.20] loss=3.09 avg=3.22
[20 | 48.40] loss=3.12 avg=3.21
[21 | 50.60] loss=3.07 avg=3.20
[22 | 52.81] loss=3.15 avg=3.20
[23 | 55.03] loss=3.00 avg=3.19
[24 | 57.25] loss=3.05 avg=3.19
[25 | 59.48] loss=3.16 avg=3.18
[26 | 61.70] loss=3.15 avg=3.18
[27 | 63.93] loss=3.18 avg=3.18
[28 | 66.17] loss=2.93 avg=3.17
[29 | 68.42] loss=3.07 avg=3.17
[30 | 70.67] loss=3.07 avg=3.16
[31 | 72.92] 
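Each line of the training log has the form `[step | time] loss=... avg=...`. Assuming the second number is the number of seconds since training started (that is how it looks in this log), a few lines of Python are enough to read the log and roughly estimate how long all 500 steps will take. This is only a sketch: the function names are made up for this example, and the estimate assumes every step takes about the same time.

```python
import re

# Matches complete log lines such as "[30 | 70.67] loss=3.07 avg=3.16".
LOG_LINE = re.compile(r"\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)")

def parse_log(lines):
    """Return (step, seconds, loss, avg) tuples; incomplete lines are skipped."""
    rows = []
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            rows.append((int(m.group(1)), float(m.group(2)),
                         float(m.group(3)), float(m.group(4))))
    return rows

def estimate_total_seconds(rows, total_steps):
    """Extrapolate the total training time from the first and last parsed steps."""
    first_step, first_t = rows[0][0], rows[0][1]
    last_step, last_t = rows[-1][0], rows[-1][1]
    per_step = (last_t - first_t) / (last_step - first_step)
    return last_t + per_step * (total_steps - last_step)

# Three lines copied from the log above (the last one is cut off, so it is skipped).
log = ["[1 | 7.12] loss=3.41 avg=3.41",
       "[30 | 70.67] loss=3.07 avg=3.16",
       "[31 | 72.92] "]
rows = parse_log(log)
print(rows[-1])
print(round(estimate_total_seconds(rows, 500) / 60, 1), "minutes")
```

With the numbers from this run the estimate comes out at roughly 18 minutes, which is in the same range as the "~15min" mentioned in the cell above.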

###Text generation

In [6]:
prefix = "Is there a life after death?"

In [7]:
#no run_name is given, so the finetuned model from the default 'run1' checkpoint is used
gpt2.generate(sess, prefix=prefix, length=150)

Is there a life after death? What can you say? 
Page 471

You said it would be so. You are the Lord of the Seven Kingdoms, after all. 
Alyn Tully was alone. She had the strength of a bear. She had the strength to bare the dark against the 
dark lords of the north. She could take them all. 
Catelyn gazed at her husband. The wolf of Asshai, the direwolf of the snows. She would marry 
him, and have a son together. Let them both live the rest of their days with the same soft, innocent 
beauty. Let them laugh together, and call each other brother and sister. Let them run together for ever and ever
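The "Page 471" line in the sample above suggests that the training file contains page markers, which the model has learned to imitate. One possible fix is to remove such lines before finetuning. The sketch below assumes the markers always sit on their own line in the form `Page <number>`; this has not been checked against the whole file. The helper name `drop_page_markers` and the output file name `got1_clean.txt` are made up for this example.

```python
import re

# A line that contains nothing but "Page" followed by a number.
PAGE_MARKER = re.compile(r"^\s*Page \d+\s*$")

def drop_page_markers(text):
    """Return text without the lines that consist only of a page marker."""
    kept = [line for line in text.splitlines() if not PAGE_MARKER.match(line)]
    return "\n".join(kept)

# Usage in the notebook (commented out so the sketch runs without the dataset):
# with open("got1.txt", encoding="utf-8") as f:
#     cleaned = drop_page_markers(f.read())
# with open("got1_clean.txt", "w", encoding="utf-8") as f:
#     f.write(cleaned)
# gpt2.finetune(sess, 'got1_clean.txt', steps=500)
```

The pattern is deliberately strict, so a sentence that merely mentions a page number is left alone.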


###Saving model to Google Drive (optional)

In [8]:
from google.colab import drive
drive.mount('/content/drive')

MessageError: ignored

In [None]:
gpt2.copy_checkpoint_to_gdrive(run_name='run1')
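In the run recorded above, mounting Google Drive failed with a `MessageError`. If that happens, one fallback is to pack the checkpoint into a single archive and download it by hand from the Colab file browser. The sketch below assumes the finetuned model sits in the default `checkpoint/run1` folder; the helper name `archive_checkpoint` is made up for this example and is not part of gpt-2-simple.

```python
import os
import shutil

def archive_checkpoint(run_name="run1", checkpoint_dir="checkpoint", out_dir="."):
    """Pack checkpoint_dir/run_name into checkpoint_<run_name>.tar.gz and return its path."""
    src = os.path.join(checkpoint_dir, run_name)
    if not os.path.isdir(src):
        raise FileNotFoundError(f"No checkpoint folder found at {src}")
    base = os.path.join(out_dir, f"checkpoint_{run_name}")
    # root_dir/base_dir keep the run folder name as the top level inside the archive.
    return shutil.make_archive(base, "gztar", root_dir=checkpoint_dir, base_dir=run_name)

# Usage in the notebook (commented out so the sketch runs without a trained model):
# print(archive_checkpoint(run_name='run1'))
```

The archive can later be unpacked into a `checkpoint` folder so that the run folder is found again under its original name.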

You can find more texts on Project Gutenberg, e.g.:
https://www.gutenberg.org/cache/epub/1597/pg1597.txt
<br><br>
You can download them to Colab using commands like the ones below.

In [None]:
#!wget https://www.gutenberg.org/cache/epub/1597/pg1597.txt

--2023-03-21 14:49:16--  https://www.gutenberg.org/cache/epub/1597/pg1597.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 329071 (321K) [text/plain]
Saving to: ‘pg1597.txt’


2023-03-21 14:49:22 (800 KB/s) - ‘pg1597.txt’ saved [329071/329071]



In [None]:
#!wget https://www.gutenberg.org/files/98/98-0.txt

--2023-02-22 13:25:10--  https://www.gutenberg.org/files/98/98-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 807231 (788K) [text/plain]
Saving to: ‘98-0.txt’


2023-02-22 13:25:12 (718 KB/s) - ‘98-0.txt’ saved [807231/807231]
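Project Gutenberg files usually begin with a header and end with a licence text, and the model would learn those too. The sketch below keeps only the part between the `*** START OF ...` and `*** END OF ...` lines. The exact wording of these marker lines varies from file to file, so matching on the first few words is an assumption that should be checked against each download. The helper name `strip_gutenberg_boilerplate` is made up for this example.

```python
def strip_gutenberg_boilerplate(text):
    """Keep only the lines between the '*** START OF' and '*** END OF' markers."""
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = i + 1          # the book begins on the next line
        elif line.startswith("*** END OF"):
            end = i                # the book ends just before this line
            break
    return "\n".join(lines[start:end]).strip()

# Usage in the notebook (commented out so the sketch runs without the download):
# with open("pg1597.txt", encoding="utf-8") as f:
#     book = strip_gutenberg_boilerplate(f.read())
```

If no marker is found, the text is returned as it is, apart from leading and trailing whitespace.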



In [None]:
#https://github.com/matt-dray/tng-stardate/tree/master/data/scripts