# PlotCreator
This notebook plans to make custom movie plots using the details from Netflix's Movie and TV Show dataset. </br>
The notebook uses [GPT-2](https://en.wikipedia.org/wiki/GPT-2), a transfomer model that works using a deep neural network to predict proper texts and sentences. </br>
One of the easier methods of usage is by using [GPT-2-Simple](https://github.com/minimaxir/gpt-2-simple) made by Max Woolf. Example of usage is present on the repo. </br>


In [1]:
%tensorflow_version 1.x
!pip install -q gpt-2-simple
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files
import pandas as pd

TensorFlow 1.x selected.
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



Mount your drive so that you can save the checkpoint reached in the model into your drive later after training.

In [2]:
gpt2.mount_gdrive()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
gpt2.download_gpt2(model_name="124M")

Fetching checkpoint: 1.05Mit [00:00, 321Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 6.00Mit/s]
Fetching hparams.json: 1.05Mit [00:00, 268Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:17, 27.7Mit/s]                                  
Fetching model.ckpt.index: 1.05Mit [00:00, 279Mit/s]                                                
Fetching model.ckpt.meta: 1.05Mit [00:00, 8.05Mit/s]
Fetching vocab.bpe: 1.05Mit [00:00, 8.95Mit/s]


You can download and learn about the data [here](https://www.kaggle.com/shivamb/netflix-shows).

In [4]:
data = pd.read_csv('netflix_titles.csv')

Since the parameter for the dataset is in form of a txt file, we will iterate through the dataset and put it in a txt file. We shall also add marker for when the plot of movie starts and ends.

In [5]:
texts = ""
for i,row in data.iterrows():
  # print(str(data['description'])
  texts += '<|startoftext|> ' + str(row['description']) + ' <|endoftext|>\n'
txtFile = open("data.txt","w+")
txtFile.write(texts)
print(txtFile.read())
txtFile.close()




We then run it and title it as run1 so that we can save it in drive and use it later again.

In [6]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              dataset='data.txt',
              model_name='124M',
              steps=600,
              restore_from='fresh',
              run_name='run1',
              print_every=10,
              sample_every=100
              )

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Loading checkpoint models/124M/model.ckpt
INFO:tensorflow:Restoring parameters from models/124M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:02<00:00,  2.26s/it]


dataset has 347815 tokens
Training...
[10 | 31.51] loss=2.48 avg=2.48
[20 | 55.29] loss=2.45 avg=2.47
[30 | 78.32] loss=2.35 avg=2.43
[40 | 101.57] loss=2.29 avg=2.39
[50 | 125.25] loss=2.24 avg=2.36
[60 | 148.78] loss=2.17 avg=2.33
[70 | 172.06] loss=2.31 avg=2.32
[80 | 195.45] loss=2.22 avg=2.31
[90 | 218.98] loss=2.11 avg=2.29
[100 | 242.42] loss=2.12 avg=2.27
|html|> From an intimate look at the history, culture and cultural impact of Filipino independence and his early rise to become the Philippine army general who laid claim to it. <|endoftext|>
<|startoftext|> In a gritty neighborhood of Los Angeles, Mexican police officers confront a culture that has grown increasingly uncomfortable, unruly and dangerous. <|endoftext|>
<|startoftext|> A woman of Irish descent leaves an impoverished family to become a fashion designer at a fashion show. But a sudden crisis leads her to run afoul of a powerful force on the inside. <|endoftext|>
<|startoftext|> A middle-aged woman struggles to con

Uploading to drive

In [7]:
gpt2.copy_checkpoint_to_gdrive(run_name='run1')

The two comments can be uncommented and used directly when you want to rerun the program with the same training set.

In [8]:
# gpt2.copy_checkpoint_from_gdrive(run_name='run1')

In [9]:
# sess = gpt2.start_tf_sess()
# gpt2.load_gpt2(sess, run_name='run1')

In [10]:
gpt2.generate(sess, run_name='run1',
             length=100,
             prefix="<|startoftext|>",
             truncate="<|endoftext|>\n",
             include_prefix=False)

 Two thrill-seeking mythical creatures make their way into a twisted crime-infested town, where they learn the key to unlocking their deepest fears is a quest for freedom. 


We generate the output as a list and then properly format for printing it out.

In [22]:
plotList = gpt2.generate(sess, run_name = 'run1',
                length = 100,
                temperature = 1,
                nsamples=20,
                batch_size=10,
                include_prefix=False,
                return_as_list = True
                )

In [23]:
finalList = []
for plots in plotList:
  plot = plots.split()
  started = False
  plt = ''
  # print(plot)
  for word in plot:
    if not started and word == '<|startoftext|>':
      started = True
    elif started and word == '<|endoftext|>':
      started = False
      finalList.append(plt)
    elif started:
      plt += word + ' '


Print any 5 random movie plots

In [24]:
from random import choice
for _ in range(5) : print(choice(finalList))

A father's suicide sends his daughter and grandson next door into the pain of losing their first child, only to incite a violent clash. 
Inspired by "MythBusters," five astronauts discuss far-fetched stories that touch on everything from religious origins to alien training camps at Macdonald-Tinsley Airport. Two Boston police officers are profilers for a Boston Globe crime beat, while working undercover as a pair of cops in Hong Kong. 
Two cooly devoted exes rekindle their bond by taking on deadly mysteries together in this supernatural mystery. As a judge faces a lawsuit over his involvement in a gay rights march in 1980, the billionaire owner of the Toronto Sun tries to redeem himself. 
When three emerging small-town filmmakers revisit a classic blues tune, the tune is replaced by a hauntingly nostalgic tune for the guys they meet. 
In an Antibodiesaga neighborhood, a defiant radio talk-show host finds himself the target of a ruthless bully with a stake in his heart. 
