# Train a GPT-2 Text-Generating Model w/ GPU For Free

by [Max Woolf](http://minimaxir.com)

*Last updated: November 10th, 2019*

Retrain an advanced text-generating neural network on any text dataset **for free on a GPU using Colaboratory** with `gpt-2-simple`!

For more about `gpt-2-simple`, you can visit [this GitHub repository](https://github.com/minimaxir/gpt-2-simple). You can also read my [blog post](https://minimaxir.com/2019/09/howto-gpt2/) for more information on how to use this notebook!


To get started:

1. Copy this notebook to your Google Drive to keep it and save your changes. (File -> Save a Copy in Drive)
2. Make sure you're running the notebook in Google Chrome.
3. Run the cells below:


In [None]:
%tensorflow_version 1.x
!pip install -q gpt-2-simple
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



## GPU

Colaboratory uses either an Nvidia T4 GPU or an Nvidia K80 GPU. The T4 is slightly faster than the older K80 for training GPT-2 and has more memory, which allows you to train the larger GPT-2 models and generate more text.

You can verify which GPU is active by running the cell below.

In [None]:
!nvidia-smi

Sun Mar  1 22:50:20 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.48.02    Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   35C    P8     9W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|  No ru
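
If you would rather check the GPU from Python than read the table by eye, a minimal sketch along these lines may help. The helper names `parse_gpu_query` and `query_gpus` are made up for this example; the only external piece is `nvidia-smi` with its `--query-gpu` and `--format` options.

```python
import subprocess

def parse_gpu_query(csv_text):
    """Parse rows such as 'Tesla T4, 15079 MiB' into small dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        name, memory = [part.strip() for part in line.split(",", 1)]
        gpus.append({"name": name, "memory_mib": int(memory.split()[0])})
    return gpus

def query_gpus():
    """Ask nvidia-smi for GPU name and total memory; [] if it cannot run."""
    cmd = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE,
                                universal_newlines=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_query(result.stdout)
```

Calling `query_gpus()` in a Colaboratory cell should return one entry per attached GPU, which you can then compare against the T4 and K80 notes above.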

## Downloading GPT-2

If you're retraining a model on new text, you need to download the GPT-2 model first. 

There are four released sizes of GPT-2:

* `124M` (default): the "small" model, 500MB on disk.
* `355M`: the "medium" model, 1.5GB on disk.
* `774M`: the "large" model. It cannot currently be finetuned with Colaboratory, but can be used to generate text from the pretrained model (see later in the notebook).
* `1558M`: the "extra large", true model. It will not work if a K80 GPU is attached to the notebook. Like `774M`, it cannot be finetuned.

Larger models have more knowledge, but take longer to finetune and longer to generate text. You can specify which base model to use by changing `model_name` in the cells below.

The next cell downloads it from Google Cloud Storage and saves it in the Colaboratory VM at `/models/<model_name>`.

This model isn't permanently saved in the Colaboratory VM; you'll have to redownload it if you want to retrain it at a later time.

In [None]:
gpt2.download_gpt2(model_name="124M")

Fetching checkpoint: 1.05Mit [00:00, 425Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 118Mit/s]                                                    
Fetching hparams.json: 1.05Mit [00:00, 698Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:02, 208Mit/s]                                   
Fetching model.ckpt.index: 1.05Mit [00:00, 263Mit/s]                                                
Fetching model.ckpt.meta: 1.05Mit [00:00, 153Mit/s]                                                 
Fetching vocab.bpe: 1.05Mit [00:00, 159Mit/s]                                                       
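
Because the model has to be redownloaded whenever the VM is reset, it can be handy to check whether it is already present before fetching it again. The sketch below is only an illustration: `model_is_downloaded` is a hypothetical helper, the file list is copied from the download log above, and the relative `models` folder is an assumption based on the paths printed during training.

```python
import os

# File names taken from the "Fetching ..." lines of the download log.
REQUIRED_FILES = [
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
]

def model_is_downloaded(model_name, models_dir="models"):
    """Return True only if every expected model file exists on disk."""
    model_path = os.path.join(models_dir, model_name)
    return all(
        os.path.isfile(os.path.join(model_path, file_name))
        for file_name in REQUIRED_FILES
    )

# Possible use in the notebook (left as a comment, since it needs gpt2):
# if not model_is_downloaded("124M"):
#     gpt2.download_gpt2(model_name="124M")
```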


## Mounting Google Drive

The best way to get input text to-be-trained into the Colaboratory VM, and to get the trained model *out* of Colaboratory, is to route it through Google Drive *first*.

Running this cell (which will only work in Colaboratory) will mount your personal Google Drive in the VM, which later cells can use to get data in/out. (it will ask for an auth code; that auth is not saved anywhere)

In [None]:
gpt2.mount_gdrive()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Uploading a Text File to be Trained to Colaboratory

In the Colaboratory Notebook sidebar on the left of the screen, select *Files*. From there you can upload files:

![alt text](https://i.imgur.com/TGcZT4h.png)

Upload **any smaller text file**  (<10 MB) and update the file name in the cell below, then run the cell.

In [None]:
file_name = "abstracts_2020.txt"

#/content/drive/My Drive/Colab/arxivData_summary.txt
#/content/drive/My Drive/Colab/abstracts_2000.txt

In [None]:
cd /content/drive/My Drive/Colab

/content/drive/My Drive/Colab


In [None]:
import os
os.chdir("/content/drive/My Drive/Colab")

In [None]:
ls

abstracts_2000.txt  arxivData_summary.txt


If your text file is larger than 10MB, it is recommended to upload that file to Google Drive first, then copy that file from Google Drive to the Colaboratory VM.

In [None]:
gpt2.copy_file_from_gdrive(file_name)
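
The 10 MB rule of thumb above can be checked in code before choosing an upload route. This is a small sketch, not part of `gpt-2-simple`; `should_route_through_drive` is a hypothetical helper and the threshold simply mirrors the figure quoted in this section.

```python
import os

SIZE_LIMIT_BYTES = 10 * 1024 * 1024  # the ~10 MB guideline from this section

def should_route_through_drive(path, limit=SIZE_LIMIT_BYTES):
    """Return True when the file is larger than the sidebar-upload guideline."""
    return os.path.getsize(path) > limit
```

For a local copy of the dataset, `should_route_through_drive("abstracts_2020.txt")` tells you whether uploading to Google Drive first is the recommended route.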

## Finetune GPT-2

The next cell will start the actual finetuning of GPT-2. It creates a persistent TensorFlow session which stores the training config, then runs the training for the specified number of `steps`. (to have the finetuning run indefinitely, set `steps = -1`)

The model checkpoints will be saved in `/checkpoint/run1` by default. The checkpoints are saved every 500 steps (can be changed) and when the cell is stopped.

The training might time out after 4ish hours; make sure you end training and save the results so you don't lose them!

**IMPORTANT NOTE:** If you want to rerun this cell, **restart the VM first** (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

Other optional-but-helpful parameters for `gpt2.finetune`:


* **`restore_from`**: Set to `fresh` to start training from the base GPT-2, or set to `latest` to restart training from an existing checkpoint.
* **`sample_every`**: Number of steps between printing example output.
* **`print_every`**: Number of steps between printing training progress.
* **`learning_rate`**: Learning rate for the training. (default `1e-4`, can lower to `1e-5` if you have <1MB input data)
* **`run_name`**: Subfolder within `checkpoint` to save the model. This is useful if you want to work with multiple models (you will also need to specify `run_name` when loading the model).
* **`overwrite`**: Set to `True` if you want to continue finetuning an existing model (w/ `restore_from='latest'`) without creating duplicate copies.
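
To make the interaction between these options concrete, here is a minimal sketch that bundles them into keyword arguments. `build_finetune_kwargs` is a hypothetical helper written for this example; only the parameter names and values come from the list above.

```python
def build_finetune_kwargs(dataset, run_name="run1", resume=False, small_dataset=False):
    """Collect gpt2.finetune options for a fresh run or a resumed one."""
    return {
        "dataset": dataset,
        "run_name": run_name,
        # 'fresh' starts from base GPT-2, 'latest' continues a checkpoint.
        "restore_from": "latest" if resume else "fresh",
        # Overwrite only when resuming, to avoid duplicate checkpoint copies.
        "overwrite": resume,
        # Lower rate suggested above for input data under 1 MB.
        "learning_rate": 1e-5 if small_dataset else 1e-4,
    }

# Possible use in the notebook (left as a comment, since it needs gpt2):
# gpt2.finetune(sess, model_name='124M', steps=1000,
#               **build_finetune_kwargs(file_name, run_name='run2020', resume=True))
```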

In [None]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              dataset=file_name,
              model_name='124M',
              steps=1000,
              restore_from='fresh',
              run_name='run2020',
              print_every=10,
              sample_every=200,
              save_every=500
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [03:29<00:00, 209.10s/it]


dataset has 39385529 tokens
Training...
[10 | 29.06] loss=2.77 avg=2.77
[20 | 50.53] loss=2.93 avg=2.85
[30 | 72.15] loss=2.96 avg=2.89
[40 | 93.98] loss=2.91 avg=2.89
[50 | 116.10] loss=2.85 avg=2.88
[60 | 138.16] loss=2.89 avg=2.88
[70 | 160.24] loss=2.70 avg=2.86
[80 | 182.41] loss=2.77 avg=2.85
[90 | 204.73] loss=2.72 avg=2.83
[100 | 227.07] loss=2.66 avg=2.81
[110 | 249.43] loss=2.79 avg=2.81
[120 | 271.83] loss=2.98 avg=2.83
[130 | 294.30] loss=2.81 avg=2.82
[140 | 316.79] loss=2.69 avg=2.81
[150 | 339.30] loss=2.83 avg=2.82
[160 | 361.80] loss=2.64 avg=2.80
[170 | 384.35] loss=2.84 avg=2.81
[180 | 407.04] loss=2.83 avg=2.81
[190 | 430.21] loss=2.95 avg=2.82
[200 | 453.02] loss=2.67 avg=2.81
 UA2(p/C1) and  MDA-SAR(p/C2). The primary findings are that both N-methylnitrosodiamin (N NMDAR) and MDA-SAR(p/C1) bind to N-methylnitrosodiamin at the P21-N-methylnitrosodiamin binding site. However, no evidence of reduced N-methylnitrosodiamin binding to MDA-SAR(p/C1) has been published. O
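
Given the time-out warning above, it can be useful to estimate how long a run will take from the progress lines. The sketch below assumes the second number in each `[step | time]` pair is elapsed seconds since training started; that reading is an assumption about the log format, and both function names are invented for this example.

```python
import re

# Matches lines such as "[10 | 29.06] loss=2.77 avg=2.77".
LOG_LINE = re.compile(r"\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)")

def parse_progress(line):
    """Return (step, elapsed_seconds, loss, avg_loss), or None if no match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    step, elapsed, loss, avg = match.groups()
    return int(step), float(elapsed), float(loss), float(avg)

def estimate_total_seconds(first, last, total_steps):
    """Linearly extrapolate total run time from two (step, seconds) points."""
    (step_a, time_a), (step_b, time_b) = first, last
    seconds_per_step = (time_b - time_a) / (step_b - step_a)
    return time_b + seconds_per_step * (total_steps - step_b)

first = parse_progress("[10 | 29.06] loss=2.77 avg=2.77")[:2]
last = parse_progress("[200 | 453.02] loss=2.67 avg=2.81")[:2]
estimate = estimate_total_seconds(first, last, total_steps=1000)
```

With the two log lines shown, the extrapolation comes to about 37 minutes for 1000 steps, comfortably inside the time-out window. Sample generation and checkpoint saves add time that this simple estimate does not model.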

After the model is trained, you can copy the checkpoint folder to your own Google Drive.

If you want to download it to your personal computer, it's strongly recommended that you copy it to Google Drive first, then download it from Google Drive. The checkpoint folder is copied as a `.tar` archive; you can download it and extract it locally.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
cd /content/drive/My Drive


/content/drive/My Drive


In [None]:
cd Colab

/content/drive/My Drive/Colab


In [None]:
gpt2.copy_checkpoint_to_gdrive(run_name='run2020')
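
Once the archive is on your own machine, unpacking it could look like the sketch below. It assumes the download is a plain tar archive; `extract_checkpoint` is a hypothetical helper and `checkpoint_run2020.tar` is a placeholder file name, so check what actually appears in your Drive. If the file turns out to be in another format, use the matching tool instead.

```python
import tarfile

def extract_checkpoint(archive_path, dest_dir="."):
    """Unpack a tar archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, "r") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names

# Example call on a local machine (placeholder file name):
# extract_checkpoint("checkpoint_run2020.tar", dest_dir=".")
```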

You're done! Feel free to go to the **Generate Text From The Trained Model** section to generate text based on your retrained model.

## Load a Trained Model Checkpoint

Running the next cell will copy the `.tar` checkpoint file from your Google Drive into the Colaboratory VM.

In [None]:
gpt2.copy_checkpoint_from_gdrive(run_name='run2000')

The next cell will allow you to load the retrained model checkpoint + metadata necessary to generate text.

**IMPORTANT NOTE:** If you want to rerun this cell, **restart the VM first** (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

In [None]:
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run2020')

Loading checkpoint checkpoint/run2020/model-1000
INFO:tensorflow:Restoring parameters from checkpoint/run2020/model-1000


## Generate Text From The Trained Model

After you've trained the model or loaded a retrained model from checkpoint, you can now generate text. `generate` generates a single text from the loaded model.

In [None]:
gpt2.generate(sess, run_name='run2000')

We have previously reported that in all of the human primary human neuropathology, normal aging was seen in a "normal" age group. Normal aging was found in a very elderly population, as is found in Alzheimer's disease and in postmortem neuropathology. The above results are not specific to Alzheimer's disease or senile dementia and will have to be tested in a variety of other neurological disorders in a variety of different cultures.
The present study examined the relation of serum levels of apolipoprotein E, apolipoprotein B, apolipoprotein E polymorphism and apolipoprotein E genotype in the non-depressed elderly. The HDL-cholesterol level, apolipoprotein E, apolipoprotein E genotype and apolipoprotein E genotype were significantly reduced in the elderly compared to those in the elderly without Alzheimer's disease. However, there was no significant correlation between  apolipoprotein E genotype and apolipoprotein E genotype as assessed by the apolipoprotein E genotype method. Allelic a

If you're creating an API based on your model and need to pass the generated text elsewhere, you can do `text = gpt2.generate(sess, return_as_list=True)[0]`

You can also pass in a `prefix` to the generate function to force the text to start with a given character sequence and generate text from there (good if you add an indicator when the text starts).

You can also generate multiple texts at a time by specifying `nsamples`. Unique to GPT-2, you can pass a `batch_size` to generate multiple samples in parallel, giving a massive speedup (in Colaboratory, set a maximum of 20 for `batch_size`).
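
When choosing `batch_size`, keep in mind that, as far as I know, `gpt-2-simple` expects `nsamples` to be a multiple of `batch_size`; treat that as an assumption and verify it against the library. Under that assumption, a small hypothetical helper can pick the largest usable value within the Colaboratory limit of 20:

```python
def pick_batch_size(nsamples, max_batch_size=20):
    """Largest batch size <= max_batch_size that divides nsamples evenly."""
    for candidate in range(min(nsamples, max_batch_size), 0, -1):
        if nsamples % candidate == 0:
            return candidate
    raise ValueError("nsamples must be a positive integer")
```

For the cells in this notebook, `pick_batch_size(5)` gives 5, matching the `nsamples=5, batch_size=5` settings used below.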

Other optional-but-helpful parameters for `gpt2.generate` and friends:

* **`length`**: Number of tokens to generate (default 1023, the maximum)
* **`temperature`**: The higher the temperature, the crazier the text (default 0.7, recommended to keep between 0.7 and 1.0)
* **`top_k`**: Limits the generated guesses to the top *k* guesses (default 0 which disables the behavior; if the generated output is super crazy, you may want to set `top_k=40`)
* **`top_p`**: Nucleus sampling: limits the generated guesses to a cumulative probability. (gets good results on a dataset with `top_p=0.9`)
* **`truncate`**: Truncates the generated text at a given sequence, excluding that sequence (e.g. if `truncate='<|endoftext|>'`, the returned text will include everything before the first `<|endoftext|>`). It may be useful to combine this with a smaller `length` if the input texts are short.
* **`include_prefix`**: If using `truncate` and `include_prefix=False`, the specified `prefix` will not be included in the returned text.
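
The way `truncate` and `include_prefix` combine is easier to see on a plain string. The function below is a plain-Python illustration of the behavior described in the list above, not the library's own implementation; `postprocess` is a name made up for this sketch.

```python
def postprocess(text, prefix=None, truncate=None, include_prefix=True):
    """Mimic the documented truncate / include_prefix behavior on one string."""
    if truncate is not None:
        cut = text.find(truncate)
        if cut != -1:
            # Keep everything before the first occurrence of the marker.
            text = text[:cut]
        # The prefix is only dropped when truncation is in use.
        if prefix and not include_prefix and text.startswith(prefix):
            text = text[len(prefix):]
    return text

sample = "Q: hi<|endoftext|>unrelated text"
kept = postprocess(sample, prefix="Q:", truncate="<|endoftext|>")
stripped = postprocess(sample, prefix="Q:", truncate="<|endoftext|>",
                       include_prefix=False)
```

Here `kept` is `"Q: hi"` and `stripped` is `" hi"`, which shows why `include_prefix=False` is handy when the prefix is only a marker for where a text starts.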

In [None]:
gpt2.generate(sess, run_name='run2000',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease is caused by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is caused by a highly selective neurodegenerative process. Therefore, the mechanism of amyloid deposition and progression  is a multifaceted, multi-system biology.
The effect of the apolipoprotein E4 receptor antagonist, apolipoprotein E4, on the activity of an immunohistochemical assay for the carboxyl-terminal domain of the apolipoprotein E4 receptor was examined in postmortem human brain. The presence of a high concentration of the apolipoprotein E4 receptor was also observed in the cerebral cortex. There was a striking effect on the level of the apolipoprotein E4 receptor binding to the apolipoprotein E4 receptor. In contrast, the binding of the apolipoprotein E4 receptor was not altered in the cerebral cortex in the frontal cortex or in the temporal cortex. These findings suggest that apolipoprotein E4 receptor activation is an important part of the apolipoprotein E4 receptor system.
The amiridin-responsive protein family was identified from the parietal and te

In [None]:
gpt2.generate(sess, run_name='run2000',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease is caused by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is caused by a genetic mutation that results in a mutation that causes the presenilin-1 (PS1) gene to encode an APP fragment which is expressed in the brains of Alzheimer's disease patients. In this study, we demonstrate that the PS1 gene
Alzheimer's disease is caused by mutations of the genes encode for the beta-amyloid precursor protein (APP) gene located on chromosome 21. The mutations are also responsible for the  common form of the disease.
The neuropathologic features of the Alzheimer's disease (AD
Alzheimer's disease is caused by the accumulation of beta-amyloid peptide in the brains of individuals with Alzheimer's disease. A cytokine response to protect against this beta-amyloid peptide accumulation may be required for the onset of the disease.
The hypothalamic
Alzheimer's disease is caused by a genetic defect in the amyloid precursor protein (APP). APP is thought to be defective in humans because of its early activity. To date, the APP gene has been reporte

In [None]:
gpt2.generate(sess, run_name='run2010',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease is caused by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is caused by mutations in the presenilin-1 gene. We have shown previously that the presenilin 1 gene is present in the brain of a transgenic mouse expressing AD-associated amyloid beta (Abeta) plaques. We have now shown that the presenilin 1 gene is associated with neurotrophic  and neuroprotective effects in  a transgenic mouse model of AD. We have also shown that presenilin 1 gene expression is associated with a reduction in the levels of cytokines and proinflammatory cytokines in AD brains. Our results provide a novel mechanism by which presenilin 1 gene expression might mediate neurotrophic and neuroprotective effects of Abeta peptide.
Glycogen synthase kinase-3beta (GSK-3beta) is a member of the putative primary cell-associated protein kinase that cleaves and aggregates the paired helical filaments (PHF) related to Alzheimer's disease. While GSK-3beta has been shown to be a member of the so-called filamentous protein kinase family, the central role of GSK-3beta

In [None]:
gpt2.generate(sess, run_name='run2010',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease is caused by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is caused by a combination of genetic, environmental, and medical factors. Here, we review the current state of the knowledge regarding the etiology and pathogenesis of AD.
Alzheimer's disease (AD) is a progressive disorder that is characterized by progressive and
Alzheimer's disease is caused by vitamin B12 deficiency and vitamin B-12 deficiency exacerbated many  neurodegenerative disorders including Alzheimer's disease. The vitamin B-12 deficiency is an  integral part of the production of oxidative stress and the maintenance of the brain. Vitamin B-
Alzheimer's disease is caused by abnormal processing of amyloid precursor protein (APP), a key component of the amyloid plaques found in the brain, in which the beta-amyloid (Abeta) peptide  has recently been detected to be the primary pathological
Alzheimer's disease is caused by a complex neurodegenerative process that includes  the conversion of beta-amyloid (Abeta) to soluble Abeta (Abeta(1-40)]. The two peptide fr

In [None]:
gpt2.generate(sess, run_name='run2020',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease is caused by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is caused by an imbalance between  synaptic and cholinergic precursors. We have previously shown that the density of the cholinergic precursors is reduced in the hippocampus of mice suffering from AD (p = 0.04). Here, we investigated the effects of apolipoprotein E (apoE) on the hippocampal cholinergic precursors by binding them with the PG-2A2-R yamikit, an apolipoprotein E (apoE)-deficient mouse model. The interaction of apoE with the PG-2A2-R yamikit reduced the volume of the hippocampal cholinergic precursors (p = 0.04) and the  dentate gyrus (p = 0.04) in the mouse model, and induced a decrease in the density of the hippocampal cholinergic precursors (p < 0.05). These results revealed that apoE-mediated cholinergic dysfunction in the hippocampus is caused by an imbalance between the cholinergic precursors and the PG-2A2-R yamikit. In addition, apoE-mediated cholinergic dysfunction in the hippocampal cholinergic precurs
Alzheimer's disease is caused by the accum

In [None]:
gpt2.generate(sess, run_name='run2020',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease is caused by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is caused by an accumulation of amyloid-beta protein. A chaperone identified by the first gene (SC18B) is required for the clearance of amyloid-beta, and is now the preferred chaperone for amyloid-beta
Alzheimer's disease is caused by  mutations in the APP gene. We present the results of a  systematic review and an analysis of the literature on the pathomechanism of Alzheimer's disease, focusing on the role of the APP gene in the pathogenesis of the disease. We
Alzheimer's disease is caused by the growing burden of neurodegenerative and inflammatory diseases. This article  discusses the current state of knowledge in the field of neuroinflammation and its potential therapeutic implications.
BACKGROUND: The mechanisms that regulate the polymerization of amyloid beta
Alzheimer's disease is caused by mutations in the amyloid precursor protein (APP) gene. The disease can be classified into two main types: sporadic, and rarer. The sporadic cases are those with a high risk

In [None]:
gpt2.generate(sess, run_name='run2000',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease should be treated with",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease should be treated with cholinesterase inhibitors.
1. The ability of a cholinergic factor, choline acetyltransferase (ChAT), to treat Alzheimer's disease (AD), is unclear. 2. The cholinergic hypothesis has been based on the hypothesis that cholinergic mechanisms of action are intact in AD and,  in addition to their function in the normal aging process, the hypothesis that cholinergic mechanisms of action in AD are intact. 3. A number of studies have shown that a cholinergic neurotransmitter, choline acetyltransferase (ChAT), can reverse the destruction of acetylcholine (ACh), a cholinergic neurotransmitter. 4. Recently, choline acetyltransferase (ChAT) inhibitors, eicosapentaenoic acid (EPA), can have an effect on acetylcholine release from plasma or from cholinergic neurons of the central nervous system (CNS). 5. The role of ChAT in the prevention, treatment, and prevention of AD is unclear. 6. To clarify whether ChAT is an effective treatment  for AD, we have analy

In [None]:
gpt2.generate(sess, run_name='run2000',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease should be treated with",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease should be treated with low doses of tacrine, but at higher doses.
We report a case of A beta-pleated sheet-like structures in the anterior part of the hippocampus of a patient with Alzheimer's disease, who had a severe memory disturbance and memory loss
Alzheimer's disease should be treated with antipiracy, as it has been successfully shown without it to worsen the symptoms of Alzheimer's disease. The treatment of Alzheimer's disease is more complicated when the disease is not treated with antipiracy and when the disease is treated with antipiracy
Alzheimer's disease should be treated with the drug ABT, which has been shown to affect cognition, memory and intellectual function.
Abnormal gene expression in Alzheimer's disease has been found to be marked in the cerebellum, which is characterized by a high level of the apol
Alzheimer's disease should be treated with a combination of the multisystemic approach and the  multidisciplinary approach.
The beta-amyloid precur

In [None]:
gpt2.generate(sess, run_name='run2010',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease should be treated with",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease should be treated with anti-inflammatory drug, ibuprofen, which improves the effects of ibuprofen on the inflammatory response. The compound ibuprofen is also found to reduce the rate of  neuronal apoptosis in Alzheimer's disease and to increase the survival of microglia cells.
The authors report the development of a novel, novel, active antihealing agent that may be useful as a means of blocking the induction of apoptosis in the brain. The drug combination (ICR) A2805 and A2806 are able to inhibit the induction of neuronal apoptosis in patients with the Alzheimer's disease (AD). This Anti-apoptotic Antibody has been demonstrated to be effective as a treatment for AD and other neurodegenerative disorders.
An investigation of cytochrome P450 (CYP450) related molecules was performed in the present study. The results show that CYP450 is positively related to Alzheimer's disease (AD) and that the C-terminal fragment C695 is related to the disorder. These results suggest

In [None]:
gpt2.generate(sess, run_name='run2010',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease should be treated with",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease should be treated with cerebrovascular administration of purer acetylcholine. The clinical and biochemical characteristics of Alzheimer's disease should be improved, for various indications in Alzheimer's disease (AD) the action of acetylcholine is mediated by activation of cholin
Alzheimer's disease should be treated with caution when the drug is being used for Alzheimer's disease.
We examined the association between the Apolipoprotein E genotype and cerebrovascular disease in a large prospective study of elderly Chinese. A total of 400 participants participated in the study
Alzheimer's disease should be treated with caution, because it is believed to be a progressive disease that is likely to progress as slowly as Alzheimer's disease. This review discusses the main problems first reported by the authors and summarizes the potential use of specific therapeutic agents for Alzheimer's disease.
The
Alzheimer's disease should be treated with a combination of antipsycho

In [None]:
gpt2.generate(sess, run_name='run2020',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease should be treated with",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease should be treated with the same therapeutic strategies as for other dementias.
The amyloid-beta (Abeta) peptide, which is used to synthesize the amyloid precursor protein, is one of the major drivers of neurodegenerative diseases, including Alzheimer's disease. Here, we report that the amyloid-beta peptide exerts its beneficial effects by activating the transcription of the gene, and by enhancing its expression with a transcriptional response that becomes highly expressed. This transcriptional response affects the expression of several genes, including the protein phosphatase A, the protein phosphatase C, the protein phosphatase CII, the protein phosphatase B, the protein phosphatase CII, the protein phosphatase A and the protein phosphatase C. In  addition, the transcription of the gene in neuronal and glial cells, as well as the expression of the gene in the neuropathological stably transfected rat model of Alzheimer's disease, is also inhibited by the peptide. In

In [None]:
gpt2.generate(sess, run_name='run2020',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease should be treated with",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease should be treated with a VPAGA monoclonal antibody. The two next steps are: (1) to determine the clinical efficacy of a monoclonal antibody to a monoclonal antibody of 4,5-dihydroxyvitamin D
Alzheimer's disease should be treated with a combination of symptomatic anti-Alzheimer's disease drugs and a non-Alzheimer's disease drug combined with a single anti-Alzheimer's drug.
BACKGROUND: In this retrospective study, we aimed to determine the prevalence
Alzheimer's disease should be treated with an agent of the "silent killer," and the first step of the treatment should be the treatment of the disease.
Over the past few years,  research has explored a variety of neurodegenerative diseases. In the 1960s, the
Alzheimer's disease should be treated with haloperidol with no side effects.
The loss of white matter (WM) in the hippocampus is a prominent but not necessarily well-understood aspect of Alzheimer's disease (AD). We have investigated the relationship between the volum

In [None]:
gpt2.generate(sess, run_name='run2000',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease is characterized by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is characterized by the accumulation of beta-amyloid peptide (beta-AP) throughout the brain. In the transgenic model, the A beta peptide amyloid precursor protein is localized in the Golgi apparatus in the brainstem, and the Golgi apparatus appears to be  a central control system for the generation of APP. In GBS, the density of soluble APP was determined in the hippocampus, but did not differ by the APP genotype or the APP genotype. In the transgenic mouse model, the density of APP was found in the cerebral cortex of the transgenic mice, and the density of APP was found in the hippocampus. In the transgenic mouse model, the density of APP was found in the cerebral cortex of the transgenic mice, and the density of APP was found in the hippocampus. However, the density of APP was not different between transgenic mice and transgenic mice in the hippocampus, and no difference was found between transgenic mice and transgenic mice in the temporal cortex. In the transgeni

In [None]:
gpt2.generate(sess, run_name='run2000',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease is characterized by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is characterized by progressive dementia of the Alzheimer type, and the occurrence of a cortical subcortical Lewy body disease. The ventricular size of the cerebral cortex is reduced in Alzheimer's disease, and the cerebellar density is reduced by an inordinate magnitude
Alzheimer's disease is characterized by the presence of multiple neurodegenerative processes, the presence of which are accentuated in the AD brain. In this study, we have compared the levels of cAMP and Ginkgo biloba extract, the enzyme that cleaves  phosphat
Alzheimer's disease is characterized by a progressive neurodegenerative process, including progressive as well as severe neurodegenerative changes. A progressive memory deficit is also present in many demented patients. Using the NINCDS-ADRDA criteria, the present case report provides the
Alzheimer's disease is characterized by neuronal degeneration and synaptic loss. In addition, the gene encoding APP is expressed by up to 40% of all neurons 

In [None]:
gpt2.generate(sess, run_name='run2010',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease is characterized by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is characterized by a progressive impairment of cholinergic functions. Here we show that the amyloid-beta peptide (Abeta) in cerebrospinal fluid (CSF) from patients with Alzheimer's disease (AD) has a significant effect on the activity of the cholinergic system and the cholinergic receptor subtype. Abeta induced a novel effect on the cholinergic system and the receptor subtype. These results indicate that a cholinergic cell type and an APOE4 lineage are required for the cholinergic modulation of Abeta production and neurotransmitter release.
Alzheimer's disease (AD) is the most common form of dementia and is characterized by the progressive loss of cognitive functions. The etiology and treatment of AD is still unclear. We now report the discovery of a novel protein, a 45 kDa member of the neurofilamentous protein tau, that is expressed in the brains of patients with AD and controls. The expression of tau isoforms in the brain of patients with AD and controls was inv

In [None]:
gpt2.generate(sess, run_name='run2010',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease is characterized by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is characterized by a progressive loss of cognitive functions. The molecular pathophysome of Alzheimer's disease is based in the formation of amyloid-beta (Abeta) and promotes an inflammatory response. The growing evidence for an etiopathogenesis of Alzheimer's disease
Alzheimer's disease is characterized by the presence of amyloid plaques that are the hallmark of Alzheimer's disease (AD). Amyloid fibrils (A beta, beta-amyloid, and tau) can exhibit concomitant pathological changes, such as ext
Alzheimer's disease is characterized by severe pathological changes in the brain. The aim of this study was to mimic the effects of the amyloid-beta-peptides and the tau-immunoreactive fragment of the presenilin 1 (PS-1)
Alzheimer's disease is characterized by a progressive loss of synapses and the loss of neurons. This loss provides a new target for the development of new drugs that target synapse loss in Alzheimer's disease, which has been the target of intense interest. In 

In [None]:
gpt2.generate(sess, run_name='run2020',
              length=250,
              temperature=0.7,
              prefix="Alzheimer's disease is characterized by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is characterized by a decrease in brain-derived neurotrophic factor (BDNF) and decreased brain-derived neurotrophic factor (BDNF) concentrations. In  this study, we investigated the effect of two doses of alpha-synuclein (M1 and M2), a flavonoid with anti-BDNF, on the androgen and estrogen receptor subunits (ERs) in the hippocampus and postcentral gyrus (cGluR), a nucleolus-containing subcellular structure located in the hippocampus in Alzheimer's disease. M1 and M2 were administrated in the hippocampus for 4 h and then  the assay was repeated in the cGluR subcellular region (CGR). The M1 and M2 dose of M1 and M2 completely suppressed the decrease of BDNF levels in the cGluR subcellular regions, whereas the M1 dose suppressed the increase of BDNF levels in the cGluR subcellular region. Furthermore, the M1 dose of M2 significantly decreased the BDNF levels in the cGluR subcellular region and CGR. Furthermore, M1 also significantly increased the BDNF levels in the CGR

In [None]:
gpt2.generate(sess, run_name='run2020',
              length=50,
              temperature=0.7,
              prefix="Alzheimer's disease is characterized by",
              nsamples=5,
              batch_size=5
              )

Alzheimer's disease is characterized by an increased risk of  dementia. We propose that apoE genotype mediate the genotype-phenotype interaction, and that we are able to identify the allele-phenotype interaction in AD. The results indicate that the APOE4/
Alzheimer's disease is characterized by progressive hyperphosphorylation of tau, and the accumulation of phosphorylated tau is associated with neurodegeneration. In the present study, we measured the expression of the tau and phosphorylated tau by immunohist
Alzheimer's disease is characterized by  the aggregation of the amyloid precursor protein (APP) and the metabolic alterations leading to the formation of the amyloid beta-protein (A beta). Both soluble and insoluble APP and A beta are required for the initiation of A beta
Alzheimer's disease is characterized by severe cognitive impairment, a progressive neurodegeneration, and the development of senile plaques and neurofibrillary tangles. A genetic study was undertaken to character

For bulk generation, you can write a large number of samples to a file and sort through them locally on your computer. The next cell writes the generated text to a file with a unique timestamp in its name.

You can rerun the cell as many times as you want to generate even more text!

In [None]:
gen_file = 'gpt2_gentext_{:%Y%m%d_%H%M%S}.txt'.format(datetime.utcnow())

gpt2.generate_to_file(sess,
                      destination_path=gen_file,
                      length=500,
                      temperature=0.7,
                      nsamples=100,
                      batch_size=20
                      )

In [None]:
# You may have to run this cell twice to get the file to download
files.download(gen_file)
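
Once the file is on your computer, the samples can be split apart and sorted with a few lines of Python. The sketch below assumes the samples are separated by a line of 20 `=` characters (believed to be the default `sample_delim` in `gpt-2-simple`, but check your own file and adjust `SAMPLE_DELIM` if it differs). The helper names `split_samples` and `unique_longest_first` are made up for this example, and `raw_text` is a stand-in for the contents of a downloaded file.

```python
# Assumption: samples in the generated file are separated by a line of
# 20 '=' characters. Adjust this if your file uses a different delimiter.
SAMPLE_DELIM = "=" * 20


def split_samples(text, delim=SAMPLE_DELIM):
    """Split the raw file contents into a list of non-empty samples."""
    pieces = (piece.strip() for piece in text.split(delim))
    return [piece for piece in pieces if piece]


def unique_longest_first(samples):
    """Drop exact duplicates, then order the samples longest first."""
    unique = list(dict.fromkeys(samples))  # keeps first-seen order
    return sorted(unique, key=len, reverse=True)


# Stand-in for the contents of a downloaded file (hypothetical data).
raw_text = (
    "First sample.\n"
    + SAMPLE_DELIM + "\n"
    + "A second, longer sample.\n"
    + SAMPLE_DELIM + "\n"
    + "First sample.\n"
    + SAMPLE_DELIM + "\n"
)

samples = split_samples(raw_text)
ranked = unique_longest_first(samples)
print(len(samples), "samples,", len(ranked), "unique")
for sample in ranked:
    print("-", sample)
```

To use it on a real file, read the file into a string with `open(path, encoding="utf-8").read()` and pass that to `split_samples` in place of `raw_text`.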

## Generate Text From The Pretrained Model

To generate text from the pretrained model rather than a finetuned one, pass `model_name` to both `gpt2.load_gpt2()` and `gpt2.generate()`.

This is currently the only way to generate text from the 774M or 1558M models with this notebook.

In [None]:
model_name = "774M"

gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.05Mit [00:00, 354Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 131Mit/s]                                                    
Fetching hparams.json: 1.05Mit [00:00, 279Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 3.10Git [00:23, 131Mit/s]                                  
Fetching model.ckpt.index: 1.05Mit [00:00, 380Mit/s]                                                
Fetching model.ckpt.meta: 2.10Mit [00:00, 226Mit/s]                                                 
Fetching vocab.bpe: 1.05Mit [00:00, 199Mit/s]                                                       


In [None]:
sess = gpt2.start_tf_sess()

gpt2.load_gpt2(sess, model_name=model_name)

W0828 18:37:58.571830 139905369159552 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.


Loading pretrained model models/774M/model.ckpt


In [None]:
gpt2.generate(sess,
              model_name=model_name,
              prefix="The secret of life is",
              length=100,
              temperature=0.7,
              top_p=0.9,
              nsamples=5,
              batch_size=5
              )

The secret of life is that it's really easy to make it complicated," said Bill Nye, the host of the popular science show "Bill Nye the Science Guy." "And this is one of the reasons why we all need to be smarter about science, because we can't keep up with the amazing things that are going on all the time."

While Nye is correct that "everything that's going on all the time" is making the world a better place, he misses the point. This is not
The secret of life is in the rhythm of the universe. It's not a mystery. It's not a mystery to me. It's the nature of the universe. It's the beauty of the universe. It's the way the universe works. It's the way the universe is. It's the way the universe is going to work. It's the way the universe is. It's the way the universe is. It's the way the universe is. It's the way the universe is. It's the way
The secret of life is in the universe.


-

The Red Devil

It's the end of the world as we know it, and the only thing that can save us is a band of 
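
The `gpt2.generate()` call above sets both `temperature` and `top_p`. As a rough illustration of what those two settings do to the next-token distribution, here is a conceptual sketch in plain Python. It is not `gpt-2-simple`'s implementation, which works on TensorFlow logits inside the model; the function names and the toy logits are made up for this example.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    kept_total = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / kept_total
    return filtered


# Toy scores for a hypothetical four-token vocabulary.
toy_logits = [2.0, 1.0, 0.0, -1.0]

plain = softmax_with_temperature(toy_logits, temperature=1.0)
sharper = softmax_with_temperature(toy_logits, temperature=0.7)
nucleus = top_p_filter(sharper, top_p=0.9)

print("temperature=1.0:", [round(p, 3) for p in plain])
print("temperature=0.7:", [round(p, 3) for p in sharper])
print("after top_p=0.9:", [round(p, 3) for p in nucleus])
```

In this toy setup, lowering the temperature shifts probability toward the most likely token, and the `top_p` filter then removes the least likely tokens from consideration before a token is sampled.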

# Etcetera

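If the notebook has errors (e.g. GPU Sync Fail), force-kill the Colaboratory virtual machine and restart it with the command below. This kills every running process, so in-memory state such as the loaded model and the TensorFlow session is lost: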

In [None]:
!kill -9 -1

# LICENSE

MIT License

Copyright (c) 2019 Max Woolf

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.