# Essay Writing using AI

*   Part 0: Set up the workspace
*   Part 1: Load a machine learning model (usable with any text dataset) for free on a GPU using Colaboratory
*   Part 2: Train the machine learning model on specific topics
*   Part 3: Start writing an essay

In [33]:
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

In [0]:
%matplotlib inline

import os, sys 
import logging, io, json, warnings
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)
warnings.filterwarnings('ignore')

In [27]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Set up workspace (Mounting Google Drive)

1. Mount your Google Drive
2. Add the code directory to the Python path (`sys.path`)

In [3]:
# Mount Google Drive so the notebook can read and write files stored there
from google.colab import drive
drive.mount('/content/gdrive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/gdrive


In [0]:
mkdir -p /content/gdrive/'My Drive'/dscamp2

In [9]:
cd /content/gdrive/'My Drive'/dscamp2

/content/gdrive/My Drive/dscamp2


In [12]:
ls dscamp_public/'NLP 2'/Essay_Writing/

AI_EssayWriting.ipynb  codes/  datasets/  readme.md


In [0]:
#codepath = os.path.join(nb_path, 'codes')
codepath = os.path.join(os.getcwd(), 'dscamp_public/NLP 2/Essay_Writing/codes')
sys.path.append(codepath)

### Install libraries

In [14]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/12/b5/ac41e3e95205ebf53439e4dd087c58e9fd371fd8e3724f2b9b4cdb8282e5/transformers-2.10.0-py3-none-any.whl (660kB)

### GPU
Colaboratory provides either an NVIDIA T4 GPU or an NVIDIA K80 GPU. The T4 is slightly faster than the older K80 for training GPT-2 and has more memory, which lets you train the larger GPT-2 models and generate more text.

You can verify which GPU is active by running the cell below.
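
A minimal sketch of such a check, assuming the `nvidia-smi` command-line tool is on the PATH whenever a GPU is attached to the runtime. The helper name `describe_gpu` is illustrative and not part of this notebook's code.

```python
import shutil
import subprocess


def describe_gpu():
    """Return the GPU name and memory reported by nvidia-smi, or a notice if none is visible."""
    # Assumption: a runtime with a GPU exposes the nvidia-smi tool on the PATH.
    if shutil.which("nvidia-smi") is None:
        return "No GPU detected (nvidia-smi not found)"
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return "No GPU detected (nvidia-smi failed)"
    return result.stdout.strip()


print(describe_gpu())
```

If the notice is printed instead of a GPU name, the runtime type is probably set to CPU and can be changed in the Colab runtime settings.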

In [0]:
import numpy as np
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

In [0]:
MAX_LENGTH = 10000

In [0]:
def set_seed(seed):
    # Seed NumPy and PyTorch so that sampling is reproducible
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

In [0]:
from main import GenerateSentence 

## Loading machine learning model
The model used here is GPT-2, which was trained on a very large corpus of about 40 GB of text collected from **8 million** web pages. The largest GPT-2 variant has 1.5 billion parameters.

In [23]:
generate_sentence = GenerateSentence(dataset_path='dscamp_public/NLP 2/Essay_Writing/datasets')

'Language Generator loaded successfully....'


In [25]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

## Train model on specific topics for free on Google Colab GPUs
You can also provide your own datasets.
The default topics to choose from are:

1. Artificial intelligence
2. Machine learning
3. History of the United States

More topics will be added.

In [28]:
generate_sentence.train_on_topics('ai')


## Start Writing Essays

At each step, five options are offered:

* A -> AI option
* B -> AI option
* C -> AI option
* D -> User can choose to add sentences
* E -> STOP the writing process
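
The selection step described above can be sketched as follows. This only illustrates the A to E menu logic and is not the actual `GenerateSentence` implementation: the helper names `build_options` and `apply_choice` are hypothetical, and the candidate sentences are hard-coded stand-ins for model output.

```python
def build_options(essay, candidates):
    """Map the letters A-C to the essay extended by each AI candidate sentence."""
    return {
        letter: (essay + " " + sentence.strip()).strip()
        for letter, sentence in zip("ABC", candidates)
    }


def apply_choice(essay, options, choice, user_text=""):
    """Return (updated_essay, keep_writing) for one round of the menu."""
    choice = choice.strip().upper()
    if choice in options:      # A, B or C: accept an AI suggestion
        return options[choice], True
    if choice == "D":          # D: append the user's own sentence(s)
        return (essay + " " + user_text.strip()).strip(), True
    if choice == "E":          # E: stop the writing process
        return essay, False
    return essay, True         # unrecognised input: leave the essay unchanged


# Hard-coded stand-ins for three model suggestions (two of them empty).
essay = "Artificial intelligence plays an important role in modern world."
options = build_options(essay, ["", "", "It is used in many products."])
essay, keep_writing = apply_choice(essay, options, "C")
print(essay)
```

An empty candidate leaves the essay unchanged, so choosing such an option simply repeats the text written so far.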

In [30]:
generate_sentence.start_writing()

Write the first sentence >>> Artificial intelligence plays an important role in modern world.


In [34]:
generate_sentence.generate_sentences()

::::YOURS OPTIONS ARE :::
A. --> 

B. --> 

C. -->  It's a great way to learn and understand what is going on around the globe, how people are doing things that they don't normally do or can never be done before," says Dr.

D. --> Write your own sentences
E. --> STOP ESSAY WRITING
{'A': 'Artificial intelligence plays an important role in modern world.', 'B': 'Artificial intelligence plays an important role in modern world.', 'C': 'Artificial intelligence plays an important role in modern world. It\'s a great way to learn and understand what is going on around the globe, how people are doing things that they don\'t normally do or can never be done before," says Dr.'}
Choose your option >>> B
 
 
 
 ****** ESSAY TILL THIS POINT *******
Artificial intelligence plays an important role in modern world. 



::::YOURS OPTIONS ARE :::
A. --> 

B. --> 

C. --> 

D. --> Write your own sentences
E. --> STOP ESSAY WRITING
{'A': 'Artificial intelligence plays an important role in modern world.', 'B

In [37]:
generate_sentence = GenerateSentence(dataset_path='dscamp_public/NLP 2/Essay_Writing/datasets')

'Language Generator loaded successfully....'


In [38]:
generate_sentence.train_on_topics('history')


In [40]:
generate_sentence.start_writing()

Write the first sentence >>> George Washington is an important figure in American history.


In [41]:
generate_sentence.generate_sentences()

::::YOURS OPTIONS ARE :::
A. -->  He was a member of theocratic and anti-Catholic Church, he served as president for many times before being elected to Congress but never once ran against it again.

B. --> 
The first president of the United States was born on March 6, 1776 and died at age 85 (18 years old).

C. -->  He was born on April 1, 1876 at the home of his mother and father who died when he had a heart attack during World War II
The first lady's name came from her husband George W.

D. --> Write your own sentences
E. --> STOP ESSAY WRITING
{'A': 'George Washington is an important figure in American history. He was a member of theocratic and anti-Catholic Church, he served as president for many times before being elected to Congress but never once ran against it again.', 'B': 'George Washington is an important figure in American history.\nThe first president of the United States was born on March 6, 1776 and died at age 85 (18 years old).', 'C': "George Washington is an important