#  Train a GPT-2 Text-Generating Model w/ GPU For Free 



Retrain an advanced text-generating neural network on any text dataset **for free on a GPU using Colaboratory** with `gpt-2-simple`!

For more about `gpt-2-simple`, you can visit [this GitHub repository](https://github.com/minimaxir/gpt-2-simple).



In [0]:
!pip install -q gpt_2_simple
import gpt_2_simple as gpt2
import tensorflow as tf
from datetime import datetime
from google.colab import files


## Verify GPU

Colaboratory now uses an Nvidia T4 GPU, which is slightly faster than the old Nvidia K80 GPU for training GPT-2 and has more memory, allowing you to train the larger GPT-2 models and generate more text. However, the K80 is sometimes still assigned.

You can verify which GPU is active by running the cell below.

In [0]:
!nvidia-smi

Tue Jul 23 15:38:13 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 410.79       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru
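
If you would rather check the GPU type from Python than read the table by eye, a small sketch like the one below could help. It assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` CSV mode; the helper names `parse_gpu_csv` and `query_gpus` are hypothetical, and the sample line simply mirrors the K80 shown above.

```python
import subprocess


def parse_gpu_csv(csv_text):
    """Parse 'name, memory.total' CSV rows into (name, memory_in_MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma only, in case a GPU name contains one.
        name, memory = [field.strip() for field in line.rsplit(",", 1)]
        gpus.append((name, int(memory.split()[0])))
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU names and total memory (needs an NVIDIA driver)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        universal_newlines=True,
    )
    return parse_gpu_csv(out)


# Canned example, so the parsing can be tried without a GPU attached.
sample = "Tesla K80, 11441 MiB"
print(parse_gpu_csv(sample))
```

In a Colaboratory cell you would call `query_gpus()` instead of parsing the canned line.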

## Downloading GPT-2

If you're retraining a model on new text, you need to download the GPT-2 model first. 

There are two sizes of GPT-2:

* `117M` (default): the "small" model, 500MB on disk.
* `345M`: the "medium" model, 1.5GB on disk.

Larger models have more knowledge, but take longer to finetune and longer to generate text. You can specify which base model to use by changing `model_name` in the cells below.

The next cell downloads it from Google Cloud Storage and saves it in the Colaboratory VM at `/models/<model_name>`.

This model isn't permanently saved in the Colaboratory VM; you'll have to redownload it if you want to retrain it at a later time.

In [0]:
model_name = "345M"
gpt2.download_gpt2(model_name=model_name)
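
Because the download has to be repeated on every fresh VM but not within the same session, it can be handy to skip it when the model folder is already present. A minimal sketch, assuming the model ends up in a `models/<model_name>` folder relative to the working directory; `needs_download` is a hypothetical helper, not part of `gpt-2-simple`.

```python
import os


def needs_download(model_name, models_dir="models"):
    """Return True when no folder for this model exists under models_dir."""
    return not os.path.isdir(os.path.join(models_dir, model_name))


# Intended use in the notebook (commented out so the sketch runs on its own):
# if needs_download(model_name):
#     gpt2.download_gpt2(model_name=model_name)
print(needs_download("345M"))
```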

## Mounting Google Drive

The best way to get training text into the Colaboratory VM, and to get the trained model *out* of Colaboratory, is to route it through Google Drive *first*.

Running this cell (which only works in Colaboratory) mounts your personal Google Drive in the VM, so later cells can move data in and out. It will ask for an authorization code; that code is not saved anywhere.

In [0]:
gpt2.mount_gdrive()

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
from google.colab import drive
drive.mount("/content/drive", force_remount=True)

Mounted at /content/drive


## Uploading a Text File to be Trained to Colaboratory

In the Colaboratory Notebook sidebar on the left of the screen, select *Files*. From there you can upload files:

![alt text](https://i.imgur.com/TGcZT4h.png)

Upload **any smaller text file**  (<10 MB) and update the file name in the cell below, then run the cell.

In [0]:
file_name = 'english_filtered_comments.txt'

If your text file is larger than 10MB, it is recommended to upload that file to Google Drive first, then copy that file from Google Drive to the Colaboratory VM.

In [0]:
gpt2.copy_file_from_gdrive(file_name)
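
To choose between the two upload routes programmatically, you can compare the file size against the 10 MB guideline. A small sketch; `is_large_file` and its `limit_mb` parameter are hypothetical names, and the threshold simply mirrors the guideline above.

```python
import os


def is_large_file(path, limit_mb=10):
    """Return True if the file at `path` is bigger than `limit_mb` megabytes."""
    return os.path.getsize(path) > limit_mb * 1024 * 1024


# Intended use on your own machine before uploading (commented out here):
# if is_large_file("english_filtered_comments.txt"):
#     print("Upload to Google Drive first, then copy into the VM.")
```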


## Finetune GPT-2

The next cell starts the actual finetuning of GPT-2. It creates a persistent TensorFlow session which stores the training config, then runs the training for the specified number of `steps`. To have the finetuning run indefinitely, set `steps = -1`.

The model checkpoints will be saved in `/checkpoint/run1` by default. The checkpoints are saved every 500 steps (can be changed) and when the cell is stopped.

The training might time out after about 4 hours; make sure you end training and save the results so you don't lose them!

**IMPORTANT NOTE:** If you want to rerun this cell, **restart the VM first** (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

Other optional-but-helpful parameters for `gpt2.finetune`:


* **`restore_from`**: Set to `fresh` to start training from the base GPT-2, or set to `latest` to restart training from an existing checkpoint.
* **`sample_every`**: Number of steps between printed example outputs.
* **`print_every`**: Number of steps between printed training progress updates.
* **`learning_rate`**: Learning rate for the training (default `1e-4`; can be lowered to `1e-5` if you have <1MB of input data).
* **`run_name`**: Subfolder within `checkpoint` in which to save the model. This is useful if you want to work with multiple models (you will also need to specify `run_name` when loading the model).
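
When you keep several models side by side, a predictable `run_name` helps to tell the checkpoint folders apart. The sketch below builds one from the model size, the dataset file name, and a timestamp. `make_run_name` and its naming scheme are hypothetical conventions, not something `gpt-2-simple` requires.

```python
import os
from datetime import datetime


def make_run_name(model_name, dataset_file, when=None):
    """Build a run name such as '345M_comments_20190723_153813'."""
    when = when or datetime.now()
    # Drop any directory part and the file extension from the dataset name.
    stem = os.path.splitext(os.path.basename(dataset_file))[0]
    return "{}_{}_{:%Y%m%d_%H%M%S}".format(model_name, stem, when)


print(make_run_name("345M", "english_filtered_comments.txt"))
```

The resulting string can be passed as `run_name` to `gpt2.finetune` and, later, to the functions that load or copy the checkpoint.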

In [0]:
op_file_name = model_name + "english_filtered_comments" 


In [0]:

sess = gpt2.start_tf_sess()
print(op_file_name)
gpt2.finetune(sess,
              dataset=file_name,
              model_name=model_name,
              steps=1000,
              restore_from='fresh',
              run_name=op_file_name,
              print_every=10,
              sample_every=200,
#               batch_size=2,
              save_every=500
              )

W0630 17:42:55.323034 139803404752768 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/gpt_2_simple/gpt_2.py:164: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W0630 17:42:55.332010 139803404752768 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/gpt_2_simple/src/model.py:148: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.



345Menglish_filtered_comments


W0630 17:43:06.638713 139803404752768 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/gpt_2_simple/src/sample.py:71: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0630 17:43:06.656453 139803404752768 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/gpt_2_simple/src/sample.py:17: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0630 17:43:06.659375 139803404752768 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/gpt_2_simple/src/sample.py:77: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.random.categorical` instead.
W0630 17:43:06.672140 139803404752768 deprecation_wr

Loading checkpoint models/345M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:43<00:00, 43.18s/it]


dataset has 7799034 tokens
Training...
[10 | 25.57] loss=3.69 avg=3.69
[20 | 41.81] loss=3.38 avg=3.54
[30 | 57.78] loss=3.36 avg=3.48
[40 | 73.55] loss=3.20 avg=3.41
[50 | 89.28] loss=3.44 avg=3.41
[60 | 105.06] loss=3.54 avg=3.43
[70 | 120.95] loss=3.19 avg=3.40
[80 | 136.84] loss=3.44 avg=3.40
[90 | 152.72] loss=3.02 avg=3.36
[100 | 168.56] loss=3.27 avg=3.35
[110 | 184.39] loss=3.79 avg=3.39
[120 | 200.22] loss=3.08 avg=3.36
[130 | 216.05] loss=3.96 avg=3.41
[140 | 231.92] loss=2.74 avg=3.36
[150 | 247.77] loss=3.68 avg=3.38
[160 | 263.64] loss=3.80 avg=3.41
[170 | 279.54] loss=3.31 avg=3.41
[180 | 295.42] loss=3.53 avg=3.41
[190 | 311.34] loss=3.39 avg=3.41
[200 | 327.27] loss=3.27 avg=3.40
 did eat it and I think it was very good for the planet and I am happy to go and grow it again. It does not have the toxins in it as did the green apple and it is healthier. But I do not believe that will ever do any good.
Yes, I do not believe it does harm anything. I believe that only those w

W0630 18:12:50.145867 139803404752768 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
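
The progress lines above follow the pattern `[step | elapsed seconds] loss=... avg=...`. If you save the cell output as text, a short parser makes it easy to track the loss or estimate the time per step, which helps when budgeting for the session time-out. This is a sketch based only on the lines shown in this notebook; `parse_progress` is a hypothetical helper, and the regular expression may need adjusting for other versions of `gpt-2-simple`.

```python
import re

# Matches lines such as "[10 | 25.57] loss=3.69 avg=3.69".
PROGRESS_RE = re.compile(r"\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)")


def parse_progress(log_text):
    """Return one dict per progress line found in the log text."""
    rows = []
    for match in PROGRESS_RE.finditer(log_text):
        step, elapsed, loss, avg = match.groups()
        rows.append({
            "step": int(step),
            "elapsed": float(elapsed),
            "loss": float(loss),
            "avg": float(avg),
        })
    return rows


log = """[10 | 25.57] loss=3.69 avg=3.69
[20 | 41.81] loss=3.38 avg=3.54
[200 | 327.27] loss=3.27 avg=3.40"""

rows = parse_progress(log)
# Average wall-clock seconds per training step between first and last row.
seconds_per_step = (rows[-1]["elapsed"] - rows[0]["elapsed"]) / (
    rows[-1]["step"] - rows[0]["step"]
)
print(round(seconds_per_step, 2))
```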


After the model is trained, you can copy the checkpoint folder to your own Google Drive.

If you want to download it to your personal computer, it's strongly recommended that you copy it to Google Drive first, then download it from there. (NB: when downloading the model to your personal computer, download the large model checkpoint file *separately*, then download the other files and reconstruct the `/checkpoint/run1` folder hierarchy locally.)

In [0]:
import os
gpt2.copy_checkpoint_to_gdrive(run_name = op_file_name)

You're done! Feel free to go to the **Generate Text From The Trained Model** section to generate text based on your retrained model.

## Load a Trained Model Checkpoint

Running the next cell will copy the `checkpoint` folder from your Google Drive into the Colaboratory VM.

In [0]:
import os
gpt2.copy_checkpoint_from_gdrive(run_name = op_file_name)    


The next cell will allow you to load the retrained model checkpoint + metadata necessary to generate text.

**IMPORTANT NOTE:** If you want to rerun this cell, **restart the VM first** (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

In [0]:
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name = op_file_name)

Loading checkpoint checkpoint/345Menglish_filtered_comments/model-1000
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from checkpoint/345Menglish_filtered_comments/model-1000


## Generate Text From The Trained Model

After you've trained the model or loaded a retrained model from checkpoint, you can now generate text. `generate` generates a single text from the loaded model.

In [0]:
finetune_file = 'fine_tune_organic_full_positive'
finetune_file_name = finetune_file + '.txt'
gpt2.copy_file_from_gdrive(finetune_file_name)

model_name = "345M"
op_file_name = model_name + "english_filtered_comments" 

import os
gpt2.copy_checkpoint_from_gdrive(run_name = op_file_name)  

sess = gpt2.start_tf_sess()
# gpt2.load_gpt2(sess, run_name = op_file_name)




In [0]:
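finetune_op_file_name = model_name + 'english_' + finetune_file
print(finetune_op_file_name)

# tf.reset_default_graph()

# Continue training from the latest checkpoint of the existing run.
# Note that run_name is op_file_name, so the new checkpoints are written
# into that run's folder; finetune_op_file_name is only printed above.
gpt2.finetune(sess,
              model_name=model_name,
              dataset=finetune_file_name,
              steps=2000,
              restore_from='latest',
              run_name=op_file_name,
              print_every=50,
              sample_every=500,
#               batch_size=2,
              save_every=1000,
              overwrite=True
              )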




345Menglish_fine_tune_organic_full_positive



Loading checkpoint checkpoint/345Menglish_filtered_comments/model-1000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  2.73it/s]


dataset has 30477 tokens
Training...
Saving checkpoint/345Menglish_filtered_comments/model-1000
Saving checkpoint/345Menglish_filtered_comments/model-1000
 of blood.
This has been the mantra with organic labeling ever since the organic industry opened up to the public. The fact that they are actually having to explain to consumers that their own genetically altered ingredients are in any more than one of the five types of foods or beverages we buy from them is ridiculous!
Organic is still genetically modified. Why would we need it? Organic is still an ingredient of pesticides. It is not necessary to make it safer by spraying the air with pesticides.
why not just change the ingredients?
I am sure that organic farmers will tell you and say it is safe for the environment and so on.
I do not know the entire organic label. Some parts will have chemicals in food. So they only buy organic. But most of the food I eat is organic.
This is what I find interesting - not only are the organic foods 



.
It's an economic question because the short answer is an economic one: Yes, we have enough resources and land to feed everyone with organic food, but it would require a larger shift of resources toward organic food, most notably in the form of labor.
Water Quality, Quantity and Corn Flakes The key to successful organic farming is in working with nature and the soils you are given.
Commodity Costs and Returns Organic farms produce same yields as conventional farms Cornell University Agricultural Experiment Station Soybean Yield Under Conventional and Organic Cropping Systems with Recommended and High Inputs During the Transition Year to Organic Wheat Emergence, Early Plant Populations, and Weed Densities Following Soybeans in Conventional and Organic Cropping Systems However, the crux of the matter is eliminating the amount of waste we currently generate with all food across the world.
Organic Teas provide several health benefits to people of all age groups, thanks to their naturally 

In [0]:
# gpt2.copy_checkpoint_to_gdrive(run_name = finetune_op_file_name)
gpt2.copy_checkpoint_to_gdrive(run_name = op_file_name)

In [0]:
finetune_file = 'fine_tune_organic_full_positive'
model_name = "345M"

finetune_op_file_name = model_name + 'english_' + finetune_file 
op_file_name = '345Menglish_filtered_comments'

import os
gpt2.copy_checkpoint_from_gdrive(run_name = finetune_op_file_name)  

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name = op_file_name)




Loading checkpoint checkpoint/345Menglish_filtered_comments/model-3000


In [0]:
import pandas as pd
belief_pd = pd.read_excel('drive/My Drive/Belief_statements_output.xlsx')
col_op = 'Output_'+finetune_file
belief_pd[col_op] = ''

In [0]:
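for index, row in belief_pd.iterrows():
    print(index)
    # Generate one sample of up to 150 tokens, using the belief statement as the prompt.
    a = gpt2.generate(sess, run_name=op_file_name,  length=150, return_as_list = True, temperature = 0.9, top_k = 40,  truncate="", include_prefix = True, prefix = row['Beliefs'])
    # Keep the text up to the last period, dropping the trailing partial sentence.
    belief_pd.at[index,col_op] = '.'.join(a[0].split(".")[:-1]) + '.'

# Write the results back to the same spreadsheet on Google Drive.
belief_pd.to_excel('drive/My Drive/Belief_statements_output.xlsx', index = None, header=True, encoding='utf-8-sig')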

0


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
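
The expression `'.'.join(a[0].split(".")[:-1]) + '.'` used in the loop above trims each sample to its last complete sentence by discarding whatever follows the final period. One edge case is worth knowing: a sample with no period at all collapses to a lone `.`. The sketch below shows a helper that behaves the same way when a period is present but returns the text unchanged otherwise. `trim_to_last_period` is a hypothetical name, and the fallback is an assumption about what is wanted.

```python
def trim_to_last_period(text):
    """Cut text after its final '.', or return it unchanged if it has none."""
    cut = text.rfind(".")
    if cut == -1:
        return text
    return text[:cut + 1]


sample = "Organic food is fine. But I do not"
print(trim_to_last_period(sample))
```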


In [0]:
finetune_file = 'fine_tune_organic_products_positive'
finetune_file_name = finetune_file + '.txt'
gpt2.copy_file_from_gdrive(finetune_file_name)

model_name = "345M"
op_file_name = model_name + "english_filtered_comments" 
finetune_op_file_name = model_name + 'english_' + finetune_file 

import os
# gpt2.copy_checkpoint_from_gdrive(run_name = finetune_op_file_name) 
gpt2.copy_checkpoint_from_gdrive(run_name = op_file_name)  

sess = gpt2.start_tf_sess()
# gpt2.load_gpt2(sess, run_name = op_file_name)

print(finetune_op_file_name)

# tf.reset_default_graph()

gpt2.finetune(sess,
              model_name=model_name,
              dataset = finetune_file_name,
              steps=2000,
              restore_from='latest',
              run_name=op_file_name,
              print_every=50,
              sample_every=500,
#               batch_size=2,
              save_every=1000,
              overwrite = True
              )

# gpt2.copy_checkpoint_to_gdrive(run_name = finetune_op_file_name)
gpt2.copy_checkpoint_to_gdrive(run_name = op_file_name)




345Menglish_fine_tune_organic_products_positive



Loading checkpoint checkpoint/345Menglish_filtered_comments/model-1000


100%|██████████| 1/1 [00:00<00:00,  5.67it/s]

Loading dataset...
dataset has 15233 tokens
Training...





Saving checkpoint/345Menglish_filtered_comments/model-1000
Saving checkpoint/345Menglish_filtered_comments/model-1000
 human term of 'love' which is a lie . I am talking about a lie that is been carried on by big business all along, and will not be stopped until we destroy this entire world.
There seems to be this tendency to go for the obvious reasons and be for it. It is not worth a lot of money for small businesses, who have to be financially dependent on large ones to do business. You never really know what you are buying till you get inside of a chain of profit, whether that chain has health, safety, or other conditions for their employees that would not meet for smaller businesses. That is a risk one must take. The reason for their need to be a lot less expensive is not for any reason of profit. All profits are for profit. So the same reason they have to be so expensive, is because they are not going to make enough money, while they are spending money to do it, because there are 



 be a good choice as it’s non-processed, and free from allergens.
Plus, gourmet food is rarely sold locally, which can make growing your own food extremely difficult, so you'll end up buy most of your veggies organic.
Not only will your veggies be cheaper, they will also be more delicious.
I have decided that I will live with organic food as the lesser evil.
By avoiding it as much as possible I hope that I live a healthier life.
Organic food may not be the best, but it is all we have left under the USDA.
But I would pay extra for an organic chicken.
Product taste, concerns for the environment and the desire to avoid foods from genetically engineered organisms are among the many other reasons some consumers prefer to buy organic food products.
Many French wines from Burgundy are made according to biodynamique methods, which goes way beyond organic standards.
However, many new brands are offering quality organic products.
To the some extent you can trust the organic certification.
Emphas

In [0]:
finetune_file = 'fine_tune_organic_products_positive'
model_name = "345M"

finetune_op_file_name = model_name + 'english_' + finetune_file 
op_file_name = '345Menglish_filtered_comments'

import os
gpt2.copy_checkpoint_from_gdrive(run_name = finetune_op_file_name)  

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name = op_file_name)

import pandas as pd
belief_pd = pd.read_excel('drive/My Drive/Belief_statements_output.xlsx')
col_op = 'Output_'+finetune_file
belief_pd[col_op] = ''

for index, row in belief_pd.iterrows():
    print(index)
    a = gpt2.generate(sess, run_name=op_file_name,  length=150, return_as_list = True, temperature = 0.9, top_k = 40,  truncate="", include_prefix = True, prefix = row['Beliefs'])
    belief_pd.at[index,col_op] = '.'.join(a[0].split(".")[:-1]) + '.'

belief_pd.to_excel('drive/My Drive/Belief_statements_output.xlsx', index = None, header=True, encoding='utf-8-sig')




Loading checkpoint checkpoint/345Menglish_filtered_comments/model-3000
0


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65


In [0]:
finetune_file = 'fine_tune_organic_products_positive'
model_name = "345M"

finetune_op_file_name = model_name + 'english_' + finetune_file 
op_file_name = '345Menglish_filtered_comments'

import os
gpt2.copy_checkpoint_from_gdrive(run_name = finetune_op_file_name)  

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name = op_file_name)

import pandas as pd
belief_pd = pd.read_excel('drive/My Drive/Belief_statements.xlsx')
Output_df = pd.DataFrame(columns=['Output','Label'])

i = Output_df.shape[0]
for index, row in belief_pd.iterrows():
    print(index, i)
    a = gpt2.generate(sess, run_name=op_file_name,  length=150, return_as_list = True, nsamples = 10, temperature = 0.9, top_k = 40,  truncate="", include_prefix = True, prefix = row['Beliefs'])
    for sample in a:
        Output_df.at[i, 'Output'] = '.'.join(sample.split(".")[:-1]) + '.'
        Output_df.at[i, 'Label'] = 'p'
        i = i + 1
      

Output_df.to_excel('drive/My Drive/GPT2_statements_output.xlsx', index = None, header=True, encoding='utf-8-sig')



Loading checkpoint checkpoint/345Menglish_filtered_comments/model-3000
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from checkpoint/345Menglish_filtered_comments/model-3000
0 0
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
1 10
2 20
3 30
4 40
5 50
6 60
7 70
8 80
9 90
10 100
11 110
12 120
13 130
14 140
15 150
16 160
17 170
18 180
19 190
20 200
21 210
22 220
23 230
24 240
25 250
26 260
27 270
28 280
29 290
30 300
31 310
32 320
33 330
34 340
35 350
36 360
37 370
38 380
39 390
40 400
41 410
42 420
43 430
44 440
45 450
46 460
47 470
48 480
49 490
50 500
51 510
52 520
53 530
54 540
55 550
56 560
57 570
58 580
59 590
60 600
61 610
62 620
63 630
64 640
65 650
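
Writing one cell at a time with `.at` works, but growing a DataFrame row by row can get slow as the table grows. A common alternative is to collect plain dictionaries and build the frame once at the end. The sketch below uses only the standard library so it runs anywhere. `label_samples` is a hypothetical helper, the sample strings are made up for illustration, and the final `pd.DataFrame(...)` call is left as a comment.

```python
def label_samples(samples, label):
    """Trim each sample to its last period and pair it with a label."""
    rows = []
    for sample in samples:
        # Same trimming expression as in the notebook cell above.
        text = '.'.join(sample.split(".")[:-1]) + '.'
        rows.append({"Output": text, "Label": label})
    return rows


rows = []
rows.extend(label_samples(["Organic is good. And", "It costs more. But th"], "p"))
rows.extend(label_samples(["Not worth it. I"], "n"))

# In the notebook: Output_df = pd.DataFrame(rows, columns=["Output", "Label"])
print(len(rows))
```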


In [0]:
finetune_file = 'fine_tune_organic_products_negative'
model_name = "345M"

finetune_op_file_name = model_name + 'english_' + finetune_file 
op_file_name = '345Menglish_filtered_comments'

import os
gpt2.copy_checkpoint_from_gdrive(run_name = finetune_op_file_name)  

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name = op_file_name)

import pandas as pd
belief_pd = pd.read_excel('drive/My Drive/Belief_statements.xlsx')
Output_df = pd.read_excel('drive/My Drive/GPT2_statements_output.xlsx')
# Output_df = pd.DataFrame(columns=['Output','Label'])

i = Output_df.shape[0]
for index, row in belief_pd.iterrows():
    print(index, i)
    a = gpt2.generate(sess, run_name=op_file_name,  length=150, return_as_list = True, nsamples = 10, temperature = 0.9, top_k = 40,  truncate="", include_prefix = True, prefix = row['Beliefs'])
    for sample in a:
        Output_df.at[i, 'Output'] = '.'.join(sample.split(".")[:-1]) + '.'
        Output_df.at[i, 'Label'] = 'n'
        i = i + 1
      

Output_df.to_excel('drive/My Drive/GPT2_statements_output.xlsx', index = None, header=True, encoding='utf-8-sig')

Loading checkpoint checkpoint/345Menglish_filtered_comments/model-3000
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from checkpoint/345Menglish_filtered_comments/model-3000
0 660
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
1 670
2 680
3 690
4 700
5 710
6 720
7 730
8 740
9 750
10 760
11 770
12 780
13 790
14 800
15 810
16 820
17 830
18 840
19 850
20 860
21 870
22 880
23 890
24 900
25 910
26 920
27 930
28 940
29 950
30 960
31 970
32 980
33 990
34 1000
35 1010
36 1020
37 1030
38 1040
39 1050
40 1060
41 1070
42 1080
43 1090
44 1100
45 1110
46 1120
47 1130
48 1140
49 1150
50 1160
51 1170
52 1180
53 1190
54 1200
55 1210
56 1220
57 1230
58 1240
59 1250
60 1260
61 1270
62 1280
63 1290
64 1300
65 1310


In [0]:
finetune_file = 'fine_tune_organic_products_neutral'
model_name = "345M"

finetune_op_file_name = model_name + 'english_' + finetune_file 
op_file_name = '345Menglish_filtered_comments'

import os
gpt2.copy_checkpoint_from_gdrive(run_name = finetune_op_file_name)  

sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name = op_file_name)

import pandas as pd
belief_pd = pd.read_excel('drive/My Drive/Belief_statements.xlsx')
Output_df = pd.read_excel('drive/My Drive/GPT2_statements_output.xlsx')
# Output_df = pd.DataFrame(columns=['Output','Label'])

i = Output_df.shape[0]
for index, row in belief_pd.iterrows():
    print(index, i)
    a = gpt2.generate(sess, run_name=op_file_name,  length=150, return_as_list = True, nsamples = 10, temperature = 0.9, top_k = 40,  truncate="", include_prefix = True, prefix = row['Beliefs'])
    for sample in a:
        Output_df.at[i, 'Output'] = '.'.join(sample.split(".")[:-1]) + '.'
        Output_df.at[i, 'Label'] = '0'
        i = i + 1
      

Output_df.to_excel('drive/My Drive/GPT2_statements_output.xlsx', index = None, header=True, encoding='utf-8-sig')

Loading checkpoint checkpoint/345Menglish_filtered_comments/model-3000
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from checkpoint/345Menglish_filtered_comments/model-3000
0 1320
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
1 1330
2 1340
3 1350
4 1360
5 1370
6 1380
7 1390
8 1400
9 1410
10 1420
11 1430
12 1440
13 1450
14 1460
15 1470
16 1480
17 1490
18 1500
19 1510
20 1520
21 1530
22 1540
23 1550
24 1560
25 1570
26 1580
27 1590
28 1600
29 1610
30 1620
31 1630
32 1640
33 1650
34 1660
35 1670
36 1680
37 1690
38 1700
39 1710
40 1720
41 1730
42 1740
43 1750
44 1760
45 1770
46 1780
47 1790
48 1800
49 1810
50 1820
51 1830
52 1840
53 1850
54 1860
55 1870
56 1880
57 1890
58 1900
59 1910
60 1920
61 1930
62 1940
63 1950
64 1960
65 1970


In [0]:
from sklearn.model_selection import train_test_split
trainingSet, testSet = train_test_split(Output_df, test_size=0.2)


In [0]:
print(trainingSet.shape[0], testSet.shape[0])

1584 396
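
`train_test_split` shuffles the rows and, with `test_size=0.2`, holds out 20% of them: the 1980 generated statements become 1584 training rows and 396 test rows, matching the output above. Because no `random_state` is passed, the split differs on every run. The standard-library sketch below illustrates the same idea with a fixed seed; `shuffle_split` is a hypothetical helper, not scikit-learn's implementation.

```python
import math
import random


def shuffle_split(items, test_size=0.2, seed=0):
    """Shuffle a copy of items and split off a test fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Round the test count up, then give the remainder to training.
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


train, test = shuffle_split(range(1980))
print(len(train), len(test))
```

In the notebook itself, passing `random_state=` to `train_test_split` would make the split reproducible, and `stratify=Output_df['Label']` would keep the label proportions the same in both sets.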
