#Abstractive Summarization of Privacy Policies with Seq2Seq
**FYP1**
BY:
SHAHZEB AND MUNEEB


The objective of this project is to build a model that can create relevant abstractive summaries for Privacy Policies preserving the original context. 
To build our model we will use a two-layered bidirectional RNN with LSTMs on the input data and two layers, each with an LSTM using attention on the target data.
The sections of this project are:

- Preparing the Data
- Building the Model
- Training the Model
- Making Our Own Abstractive Summaries

In [0]:
import pandas as pd
import numpy as np
import tensorflow as tf
import re
from nltk.corpus import stopwords
import time
from tensorflow.python.layers.core import Dense
from tensorflow.python.ops.rnn_cell_impl import _zero_state_tensors

TensorFlow Version: 1.15.0


In [0]:
print('TensorFlow Version: {}'.format(tf.__version__))


In [0]:
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

Found GPU at: /device:GPU:0


### MOunt Drive

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [0]:
import os
os.getcwd()
!ls

drive  sample_data


## Exploring The Data set


In [0]:
headings = pd.read_csv('drive/My Drive/Colab Notebooks/newdatasetcopy.csv')

In [0]:
print(headings.shape)
print(headings.head())

(4904, 2)

Unnamed: 0,Summary,Policy
0,If the browser is loading a page and the serve...,Cookies are text files sent by Web servers to ...
1,The Weather Channel and its AffiliatesWe use i...,One of the most valuable assets of our busines...
2,When you enter sensitive financial information...,EA understands the importance of keeping your ...
3,We may also obtain information about you from ...,Activision only collects personal information ...
4,Session data is cleared when a browser is closed.,Pinkbike does not collect information from not...


In [0]:
# Check for any null values in the dataset
headings.isnull().sum()

Summary    0
Policy     0
dtype: int64

In [0]:
# Remove null values and unneeded features
headings = headings.dropna()

headings = headings.reset_index(drop=True)

In [0]:
headings.head()

Unnamed: 0,Summary,Policy
0,If the browser is loading a page and the serve...,Cookies are text files sent by Web servers to ...
1,The Weather Channel and its AffiliatesWe use i...,One of the most valuable assets of our busines...
2,When you enter sensitive financial information...,EA understands the importance of keeping your ...
3,We may also obtain information about you from ...,Activision only collects personal information ...
4,Session data is cleared when a browser is closed.,Pinkbike does not collect information from not...


In [0]:
# Inspecting some of the items from the dataset
for i in range(5):
    print("Extracted Summary#",i+1)
    print(headings.Summary[i])
    print("Original Paragraph#",i+1)
    print(headings.Policy[i])
    print()

Extracted Summary# 1
If the browser is loading a page and the server requests the information stored in the cookie the cookie is sent back to the server.
Original Paragraph# 1
Cookies are text files sent by Web servers to Web browsers and are stored on the user's computer or mobile device. If the browser is loading a page and the server requests the information stored in the cookie the cookie is sent back to the server.Cookies contain data about the user's activities on the website and can be used by Web servers to identify and track users as they navigate different pages on a website and to identify users returning to a website.Cookies may be either "persistent" or "session" cookies. A persistent cookie will remain valid until its set expiration date (unless deleted by the user). A session cookie will expire at the end of the user session when the Web browser is closed.

Extracted Summary# 2
The Weather Channel and its AffiliatesWe use information collected on the Services and disclos

In [0]:
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


True

## Preparing the Data

In [0]:
# A list of contractions from http://stackoverflow.com/questions/19790188/expanding-english-language-contractions-in-python
contractions = { 
"ain't": "am not",
"aren't": "are not",
"can't": "cannot",
"can't've": "cannot have",
"'cause": "because",
"could've": "could have",
"couldn't": "could not",
"couldn't've": "could not have",
"didn't": "did not",
"doesn't": "does not",
"don't": "do not",
"hadn't": "had not",
"hadn't've": "had not have",
"hasn't": "has not",
"haven't": "have not",
"he'd": "he would",
"he'd've": "he would have",
"he'll": "he will",
"he's": "he is",
"how'd": "how did",
"how'll": "how will",
"how's": "how is",
"i'd": "i would",
"i'll": "i will",
"i'm": "i am",
"i've": "i have",
"isn't": "is not",
"it'd": "it would",
"it'll": "it will",
"it's": "it is",
"let's": "let us",
"ma'am": "madam",
"mayn't": "may not",
"might've": "might have",
"mightn't": "might not",
"must've": "must have",
"mustn't": "must not",
"needn't": "need not",
"oughtn't": "ought not",
"shan't": "shall not",
"sha'n't": "shall not",
"she'd": "she would",
"she'll": "she will",
"she's": "she is",
"should've": "should have",
"shouldn't": "should not",
"that'd": "that would",
"that's": "that is",
"there'd": "there had",
"there's": "there is",
"they'd": "they would",
"they'll": "they will",
"they're": "they are",
"they've": "they have",
"wasn't": "was not",
"we'd": "we would",
"we'll": "we will",
"we're": "we are",
"we've": "we have",
"weren't": "were not",
"what'll": "what will",
"what're": "what are",
"what's": "what is",
"what've": "what have",
"where'd": "where did",
"where's": "where is",
"who'll": "who will",
"who's": "who is",
"won't": "will not",
"wouldn't": "would not",
"you'd": "you would",
"you'll": "you will",
"you're": "you are"
}

In [0]:
def clean_text(text, remove_stopwords = True):
    '''Remove unwanted characters, stopwords, and format the text to create fewer nulls word embeddings'''
    
    # Convert words to lower case
    text = text.lower()
    
    # Replace shorter contractions with their longer forms 
    if True:
        text = text.split()
        new_text = []
        for word in text:
            if word in contractions:
                new_text.append(contractions[word])
            else:
                new_text.append(word)
        text = " ".join(new_text)
    
    # Format words and remove unwanted characters if left although we have removed them already while scraping
    text = re.sub(r'https?:\/\/.*[\r\n]*', '', text, flags=re.MULTILINE)
    text = re.sub(r'\<a href', ' ', text)
    text = re.sub(r'&amp;', '', text) 
    text = re.sub(r'[_"\-;%()|+&=*%.,!?:#$@\[\]/]', ' ', text)
    text = re.sub(r'<br />', ' ', text)
    text = re.sub(r'\'', ' ', text)
    
    # Optionally, remove stop words
    if remove_stopwords:
        text = text.split()
        stops = set(stopwords.words("english"))
        text = [w for w in text if not w in stops]
        text = " ".join(text)

    return text

We will remove the stopwords from the texts because they do not provide much use for training our model.
However, we will keep them for our summaries so that they sound more like natural phrases. 

In [0]:

import nltk
nltk.download('stopwords')
  
# Clean the summaries and texts
clean_summaries = []
for summary in headings.Summary:
    clean_summaries.append(clean_text(summary, remove_stopwords=False))
print("Summaries are complete.")

clean_texts = []
for text in headings.Policy:
    clean_texts.append(clean_text(text))
print("Texts are complete.")

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Summaries are complete.
Texts are complete.


In [0]:
# Inspect the cleaned summaries and texts to ensure they have been cleaned well
for i in range(5):
    print("Clean #",i+1)
    print("Extracted Summary",i+1)
    print(clean_summaries[i])
    print("Orignal Paragraph",i+1)
    print(clean_texts[i])
    print()

Clean # 1
Extracted Summary 1
if the browser is loading a page and the server requests the information stored in the cookie the cookie is sent back to the server 
Orignal Paragraph 1
cookies text files sent web servers web browsers stored user computer mobile device browser loading page server requests information stored cookie cookie sent back server cookies contain data user activities website used web servers identify track users navigate different pages website identify users returning website cookies may either persistent session cookies persistent cookie remain valid set expiration date unless deleted user session cookie expire end user session web browser closed

Clean # 2
Extracted Summary 2
the weather channel and its affiliateswe use information collected on the services and disclose information to third parties including your physical geographic location to help fulfill your requests or in connection with the operation of the services for example to service your account cond

In [0]:
def count_words(count_dict, text):
    '''Count the number of occurrences of each word in a set of text'''
    for sentence in text:
        for word in sentence.split():
            if word not in count_dict:
                count_dict[word] = 1
            else:
                count_dict[word] += 1

In [0]:
# Find the number of times each word was used and the size of the vocabulary
word_counts = {}

count_words(word_counts, clean_summaries)
count_words(word_counts, clean_texts)
            
print("Size of Vocabulary:", len(word_counts))

Size of Vocabulary: 20638


#Word Embeddings



In [0]:

# Load Conceptnet Numberbatch's (CN) embeddings, probably better than Glove 
#(https://github.com/commonsense/conceptnet-numberbatch)

embeddings_index = {}
with open('drive/My Drive/Colab Notebooks/numberbatch-en-17.02.txt', encoding='utf-8') as f:
    for line in f:
        #print (line)
        values = line.split(' ')
        #print (values[0])
        word = values[0]
        #print (values[1:])
        embedding = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = embedding

        
print('Word embeddings:', len(embeddings_index))

Word embeddings: 484557


In [0]:
# Find the number of words that are missing from CN, and are used more than our threshold.
missing_words = 0
threshold = 20    #20threshold #Thrshold is here if any word appear more than 20 therefore
#I use a threshold of 20, so that words not in CN can be added to our word_embedding_matrix, but they need to be common enough in the paragraphs so that the model can understand their meaning.
rey=[]
for word, count in word_counts.items():
    if count > threshold:
        if word not in embeddings_index:
            missing_words += 1
            rey.append(word)
            
missing_ratio = round(missing_words/len(word_counts),4)*100
            
print("Number of words missing from CN:", missing_words)
print("Percent of words that are missing from vocabulary: {}%".format(missing_ratio))
print(rey)

Number of words missing from CN: 122
Percent of words that are missing from vocabulary: 0.59%
['cookieswe', 'auap', '10', '30', '800', '13', '“how', 'realgm', '“contact', 'us”', 'cricbuzz', '—', '22', 'mixedmartialarts', '–', '12', 'borderfree', 'malayala', '•', 'informationwhen', '”', '100', '9gag', 'rewards®', '“web', '18', '888', 'vegasinsider', 'sulekha', 'bcrs', 'doccheck', 'soundcloud®', 'mansueto', 'informationwe', 'techmedia', 'information”', '44', 'supermedia', 'pegym', 'out”', 'cookies”', 'mouthshut', '1798', 'myfitnesspal', 'footytube', '90', 'interpals', '·', 'b92', 'implix', 'bitly', 'mapsofindia', 'bitpay', 'italki', 'lumosity', '“we”', '“us”', 'mobile9', 'uclick', '24', 'intermarkets', '1998', 'researchgate', '2011', 'roblox', '15', '2000', 'andto', 'domaintools', 'linguee', 'viumbe', '83', 'netdoctor', '17', 'zedo', '192', 'informationin', '128', 'surfline', '“cookies”', 'lyricsfreak', '866', 'pinterest', 'teenshealth', 'sourceswe', 'yardbarker', 'mediaplex', '2012', 'l

In [0]:
# Limit the vocab that we will use to words that appear more than  threshold 

#dictionary to convert words to integers
vocab_to_int = {} 

value = 0
for word, count in word_counts.items():
    if count >= threshold or word in embeddings_index:
        vocab_to_int[word] = value
        value += 1

# Special tokens that will be added to our vocab
codes = ["<UNK>","<PAD>","<EOS>","<GO>"]   

# Add codes to vocab
for code in codes:
    vocab_to_int[code] = len(vocab_to_int)

# Dictionary to convert integers to words
int_to_vocab = {}
for word, value in vocab_to_int.items():
    int_to_vocab[value] = word

usage_ratio = round(len(vocab_to_int) / len(word_counts),4)*100

print("Total number of unique words:", len(word_counts))
print("Number of words we will use:", len(vocab_to_int))
print("Percent of words we will use: {}%".format(usage_ratio))

Total number of unique words: 20638
Number of words we will use: 11684
Percent of words we will use: 56.61000000000001%


In [0]:
# Need to use 300 for embedding dimensions to match CN's vectors.
embedding_dim = 300
nb_words = len(vocab_to_int)

# Create matrix with default values of zero
word_embedding_matrix = np.zeros((nb_words, embedding_dim), dtype=np.float32)
for word, i in vocab_to_int.items():
    if word in embeddings_index:
        word_embedding_matrix[i] = embeddings_index[word]
    else:
        # If word not in CN, create a random embedding for it
        new_embedding = np.array(np.random.uniform(-1.0, 1.0, embedding_dim))
        print (new_embedding)
        embeddings_index[word] = new_embedding
        word_embedding_matrix[i] = new_embedding 

# Check if value matches len(vocab_to_int)
print(len(word_embedding_matrix))

[ 9.38552393e-01 -1.26789851e-01  2.18896366e-02  6.17978394e-01
 -1.85553445e-01  6.45900714e-01 -4.25754861e-01 -7.38292947e-01
  1.73234476e-01  7.09489351e-01  7.51146155e-01 -5.23828666e-01
 -7.71412400e-02  5.28138758e-01 -5.41541849e-02  9.11013350e-01
 -4.51455847e-01 -5.34323708e-01 -8.33437394e-01 -6.89330756e-01
 -3.31888971e-01  1.55914086e-01 -3.89707104e-01 -4.80716887e-01
  9.28899503e-01 -6.47938230e-01  6.19603407e-01  3.18771968e-01
 -3.29277506e-02  5.41170976e-01  6.82016772e-01  9.63586382e-01
 -3.63044489e-01 -1.10587878e-01  6.36955077e-01 -8.86233305e-01
 -9.58324372e-01 -9.64608836e-01 -1.48594308e-01 -7.30174177e-01
  8.89473995e-01  4.52163120e-01 -3.57320996e-01  1.44031604e-01
  6.78991074e-01  1.13691946e-01  3.56625491e-01 -9.08695716e-01
 -4.02359622e-01  3.39131441e-01  9.64308130e-02 -1.80134714e-01
 -5.31435587e-01  4.35433941e-01 -5.91769204e-01  1.63048725e-01
 -9.10561986e-01 -4.79355657e-01 -7.63951163e-01 -6.46521353e-01
 -3.35027447e-01  7.56180

In [0]:
def convert_to_ints(text, word_count, unk_count, eos=False):
    #'''Convert words in text to an integer.If word is not in vocab_to_int, use UNK's integer.Total the number of words and UNKs.Add EOS token to the end of texts'''
    ints = []
    for sentence in text:
        sentence_ints = []
        for word in sentence.split():
            word_count += 1
            if word in vocab_to_int:
                sentence_ints.append(vocab_to_int[word])
            else:
                sentence_ints.append(vocab_to_int["<UNK>"])
                unk_count += 1
        if eos:
            sentence_ints.append(vocab_to_int["<EOS>"])
        ints.append(sentence_ints)
    return ints, word_count, unk_count

In [0]:
# Apply convert_to_ints to clean_summaries and clean_texts
word_count = 0
unk_count = 0

int_summaries, word_count, unk_count = convert_to_ints(clean_summaries, word_count, unk_count)
int_texts, word_count, unk_count = convert_to_ints(clean_texts, word_count, unk_count, eos=True)

unk_percent = round(unk_count/word_count,4)*100



Total number of words in Vocab: 1076211
Total number of UNKs in Vocab: 16320
Percent of words that are UNK: 1.52%


In [0]:
print("Total number of words in Vocab:", word_count)
print("Total number of UNKs in Vocab:", unk_count)
print("Percent of words that are UNK: {}%".format(unk_percent))

In [0]:
#print (int_to_vocab[value] for value in int_summaries[:5])
for value in int_summaries[2:]:
  for value2 in value:
    print (int_to_vocab[value2],end=" ")
  break
print ()
print (len(int_summaries))
print (int_texts[:5])
print (len(int_texts))


when you enter sensitive financial information such as a credit card number on our order forms we encrypt the transmission of that information using commercially reasonable methods 
4904
[[277, 1385, 532, 14, 101, 241, 101, 1442, 11, 293, 284, 869, 533, 2, 4, 6, 8, 9, 10, 11, 13, 13, 14, 15, 8, 277, 1609, 113, 293, 647, 154, 314, 101, 241, 290, 891, 71, 1314, 579, 424, 154, 290, 71, 3505, 154, 277, 89, 672, 283, 112, 277, 283, 13, 420, 1276, 441, 3674, 130, 326, 957, 293, 112, 13, 4210, 1105, 293, 112, 101, 2, 115, 11682], [844, 809, 2191, 161, 769, 238, 17, 18, 779, 123, 99, 2538, 851, 11680, 258, 17, 18, 11680, 20, 10, 21, 23, 24, 10, 25, 26, 27, 29, 30, 31, 32, 33, 9, 35, 37, 23, 40, 41, 42, 43, 44, 45, 41, 49, 50, 10, 51, 53, 54, 55, 56, 57, 58, 23, 45, 59, 60, 63, 64, 65, 66, 23, 67, 68, 71, 20, 23, 25, 26, 89, 3825, 100, 10, 10, 1160, 26, 1015, 126, 23, 89, 260, 749, 3424, 89, 45, 1571, 207, 194, 838, 492, 17, 18, 887, 245, 207, 17, 18, 887, 20, 2538, 896, 126, 23, 687, 17, 18, 8

In [0]:
def create_lengths(text):
    #Create a data frame of the sentence lengths from a text
    lengths = []
    for sentence in text:
        lengths.append(len(sentence))
    return pd.DataFrame(lengths, columns=['counts'])

In [0]:
lengths_summaries = create_lengths(int_summaries)
lengths_texts = create_lengths(int_texts)

print("Extracted Summaries:")
print(lengths_summaries.describe())
print()
print("Paragraphs:")
print(lengths_texts.describe())

Extracted Summaries:
            counts
count  4904.000000
mean     44.424551
std      34.802341
min       0.000000
25%      21.000000
50%      35.000000
75%      58.000000
max     406.000000

Paragraphs:
            counts
count  4904.000000
mean    176.031199
std     179.663526
min       1.000000
25%      75.000000
50%     117.000000
75%     209.000000
max    2370.000000


In [0]:
print (lengths_summaries[:5])

   counts
0      25
1     101
2      27
3      60
4       9


In [0]:
# Inspect the length of texts
#print(lengths_texts,count)

print(np.percentile(lengths_texts.counts, 90))
print(np.percentile(lengths_texts.counts, 95))
print(np.percentile(lengths_texts.counts, 99))

361.0
494.84999999999945
941.9100000000008


In [0]:
# Inspect the length of summaries
print(np.percentile(lengths_summaries.counts, 90))
print(np.percentile(lengths_summaries.counts, 95))
print(np.percentile(lengths_summaries.counts, 99))

85.0
108.0
172.0


In [0]:
def unk_counter(sentence):
    #Counts the number of time UNK appears in a sentence.
    unk_count = 0
    for word in sentence:
        if word == vocab_to_int["<UNK>"]:
            unk_count += 1
    return unk_count

In [0]:


# Sort the summaries and texts by the length of the texts, shortest to longest
# Limit the length of summaries and texts based on the min and max ranges.
# Remove summaries that include too many UNKs

sorted_summaries = []
sorted_texts = []
max_text_length = 2000
max_summary_length = 500
min_length = 2
unk_text_limit = 1
unk_summary_limit = 0

for length in range(min(lengths_texts.counts), max_text_length): 
    for count, words in enumerate(int_summaries):
        if (len(int_summaries[count]) >= min_length and
            len(int_summaries[count]) <= max_summary_length and
            len(int_texts[count]) >= min_length and
            unk_counter(int_summaries[count]) <= unk_summary_limit and
            unk_counter(int_texts[count]) <= unk_text_limit and
            length == len(int_texts[count])
           ):
            sorted_summaries.append(int_summaries[count])
            sorted_texts.append(int_texts[count])
        
# Compare lengths to ensure they match
print(len(sorted_summaries))
print(len(sorted_texts))

2841
2841


## Model Build

In [0]:
def model_inputs():
    #Create palceholders for inputs to the model
    
    input_data = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    summary_length = tf.placeholder(tf.int32, (None,), name='summary_length')
    max_summary_length = tf.reduce_max(summary_length, name='max_dec_len')
    text_length = tf.placeholder(tf.int32, (None,), name='text_length')

    return input_data, targets, lr, keep_prob, summary_length, max_summary_length, text_length

In [0]:
def process_encoding_input(target_data, vocab_to_int, batch_size):
    #Remove the last word id from each batch and concat the <GO> to the begining of each batch
    
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    dec_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)

    return dec_input

In [0]:
def encoding_layer(rnn_size, sequence_length, num_layers, rnn_inputs, keep_prob):
    #Create the encoding layer
    
    for layer in range(num_layers):
        with tf.variable_scope('encoder_{}'.format(layer)):
            cell_fw = tf.contrib.rnn.LSTMCell(rnn_size,
                                              initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            cell_fw = tf.contrib.rnn.DropoutWrapper(cell_fw, 
                                                    input_keep_prob = keep_prob)

            cell_bw = tf.contrib.rnn.LSTMCell(rnn_size,
                                              initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            cell_bw = tf.contrib.rnn.DropoutWrapper(cell_bw, 
                                                    input_keep_prob = keep_prob)

            enc_output, enc_state = tf.nn.bidirectional_dynamic_rnn(cell_fw, 
                                                                    cell_bw, 
                                                                    rnn_inputs,
                                                                    sequence_length,
                                                                    dtype=tf.float32)
    # Join outputs since we are using a bidirectional RNN
    enc_output = tf.concat(enc_output,2)
    
    return enc_output, enc_state

In [0]:

def training_decoding_layer(dec_embed_input, summary_length, dec_cell, initial_state, output_layer, 
                            vocab_size, max_summary_length):
    #Create the training logits
    
    training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                        sequence_length=summary_length,
                                                        time_major=False)

    training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                       training_helper,
                                                       initial_state,
                                                       output_layer) 

    training_logits, _ , _ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                           output_time_major=False,
                                                           impute_finished=True,
                                                           maximum_iterations=max_summary_length)
    return training_decoder

In [0]:
def inference_decoding_layer(embeddings, start_token, end_token, dec_cell, initial_state, output_layer,
                             max_summary_length, batch_size):
    #Create the inference logits
    
    start_tokens = tf.tile(tf.constant([start_token], dtype=tf.int32), [batch_size], name='start_tokens')
    
    inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(embeddings,
                                                                start_tokens,
                                                                end_token)
                
    inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                        inference_helper,
                                                        initial_state,
                                                        output_layer)
                
    inference_logits, _ , _ = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                                            output_time_major=False,
                                                            impute_finished=True,
                                                            maximum_iterations=max_summary_length)
    
    return inference_decoder

In [0]:

def decoding_layer(dec_embed_input, embeddings, enc_output, enc_state, vocab_size, text_length, summary_length, 
                   max_summary_length, rnn_size, vocab_to_int, keep_prob, batch_size, num_layers):
    #Create the decoding cell and attention for the training and inference decoding layers
    
    for layer in range(num_layers):
        with tf.variable_scope('decoder_{}'.format(layer)):
            lstm = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
            dec_cell = tf.contrib.rnn.DropoutWrapper(lstm, 
                                                     input_keep_prob = keep_prob)
    
    output_layer = Dense(vocab_size,
                         kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))
    
    attn_mech = tf.contrib.seq2seq.BahdanauAttention(rnn_size,
                                                  enc_output,
                                                  text_length,
                                                  normalize=False,
                                                  name='BahdanauAttention')

    dec_cell = tf.contrib.seq2seq.AttentionWrapper(dec_cell,
                                                          attn_mech,
                                                          rnn_size)
            
    #initial_state = tf.contrib.seq2seq.AttentionWrapperState(enc_state[0],
    #                                                                _zero_state_tensors(rnn_size, 
    #                                                                                    batch_size, 
    #                                                                                    tf.float32)) 
    initial_state = dec_cell.zero_state(batch_size=batch_size,dtype=tf.float32).clone(cell_state=enc_state[0])

    with tf.variable_scope("decode"):
        training_decoder = training_decoding_layer(dec_embed_input, 
                                                  summary_length, 
                                                  dec_cell, 
                                                  initial_state,
                                                  output_layer,
                                                  vocab_size, 
                                                  max_summary_length)
        
        training_logits,_ ,_ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                  output_time_major=False,
                                  impute_finished=True,
                                  maximum_iterations=max_summary_length)
    with tf.variable_scope("decode", reuse=True):
        inference_decoder = inference_decoding_layer(embeddings,  
                                                    vocab_to_int['<GO>'], 
                                                    vocab_to_int['<EOS>'],
                                                    dec_cell, 
                                                    initial_state, 
                                                    output_layer,
                                                    max_summary_length,
                                                    batch_size)
        
        inference_logits,_ ,_ = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                  output_time_major=False,
                                  impute_finished=True,
                                  maximum_iterations=max_summary_length)

    return training_logits, inference_logits

In [0]:
def seq2seq_model(input_data, target_data, keep_prob, text_length, summary_length, max_summary_length, 
                  vocab_size, rnn_size, num_layers, vocab_to_int, batch_size):
    #Use the previous functions to create the training and inference logits
    
    # Use Numberbatch's embeddings and the newly created ones as our embeddings
    embeddings = word_embedding_matrix
    
    enc_embed_input = tf.nn.embedding_lookup(embeddings, input_data)
    enc_output, enc_state = encoding_layer(rnn_size, text_length, num_layers, enc_embed_input, keep_prob)
    
    dec_input = process_encoding_input(target_data, vocab_to_int, batch_size)
    dec_embed_input = tf.nn.embedding_lookup(embeddings, dec_input)
    
    training_logits, inference_logits  = decoding_layer(dec_embed_input, 
                                                        embeddings,
                                                        enc_output,
                                                        enc_state, 
                                                        vocab_size, 
                                                        text_length, 
                                                        summary_length, 
                                                        max_summary_length,
                                                        rnn_size, 
                                                        vocab_to_int, 
                                                        keep_prob, 
                                                        batch_size,
                                                        num_layers)
    
    return training_logits, inference_logits

In [0]:

def pad_sentence_batch(sentence_batch):
    #Pad sentences with <PAD> so that each sentence of a batch has the same length
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [vocab_to_int['<PAD>']] * (max_sentence - len(sentence)) for sentence in sentence_batch]

In [0]:
def get_batches(summaries, texts, batch_size):
    #Batch summaries, texts, and the lengths of their sentences together
    for batch_i in range(0, len(texts)//batch_size):
        start_i = batch_i * batch_size
        summaries_batch = summaries[start_i:start_i + batch_size]
        texts_batch = texts[start_i:start_i + batch_size]
        pad_summaries_batch = np.array(pad_sentence_batch(summaries_batch))
        pad_texts_batch = np.array(pad_sentence_batch(texts_batch))
        
        # Need the lengths for the _lengths parameters
        pad_summaries_lengths = []
        for summary in pad_summaries_batch:
            pad_summaries_lengths.append(len(summary))
        
        pad_texts_lengths = []
        for text in pad_texts_batch:
            pad_texts_lengths.append(len(text))
        
        yield pad_summaries_batch, pad_texts_batch, pad_summaries_lengths, pad_texts_lengths

In [0]:
# Hyperparameters
epochs = 100
batch_size = 32
rnn_size = 256
num_layers = 2
learning_rate = 0.005
keep_probability = 0.75

In [0]:
# Build the graph
train_graph = tf.Graph()
# Set the graph to default to ensure that it is ready for training
with train_graph.as_default():
    
    # Load the model inputs    
    input_data, targets, lr, keep_prob, summary_length, max_summary_length, text_length = model_inputs()

    # Create the training and inference logits
    training_logits, inference_logits = seq2seq_model(tf.reverse(input_data, [-1]),
                                                      targets, 
                                                      keep_prob,   
                                                      text_length,
                                                      summary_length,
                                                      max_summary_length,
                                                      len(vocab_to_int)+1,
                                                      rnn_size, 
                                                      num_layers, 
                                                      vocab_to_int,
                                                      batch_size)
    
    # Create tensors for the training logits and inference logits
    training_logits = tf.identity(training_logits.rnn_output, 'logits')
    inference_logits = tf.identity(inference_logits.sample_id, name='predictions')
    
    # Create the weights for sequence_loss
    masks = tf.sequence_mask(summary_length, max_summary_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(learning_rate)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)
print("Graph is built.")

Graph is built.


## Training Model

In [0]:
# Subset the data for training
start = 00000
end = start + 9600
sorted_summaries_short = sorted_summaries
sorted_texts_short = sorted_texts
#print(sorted_summaries)
print("The shortest text length:", len(sorted_texts_short[0]))
print("The longest text length:",len(sorted_texts_short[-1]))

The shortest text length: 3
The longest text length: 920


In [0]:
# Train the Model
learning_rate_decay = 0.4
min_learning_rate = 0.0005
display_step = 20 # Check training loss after every 20 batches
stop_early = 0 
stop = 6000000000000000 #3 # If the update loss does not decrease in 3 consecutive update checks, stop training
per_epoch = 3 # Make 3 update checks per epoch
update_check = (len(sorted_texts_short)//batch_size//per_epoch)-1

update_loss = 0 
batch_loss = 0
summary_update_loss = [] # Record the update losses for saving improvements in the model

  
tf.reset_default_graph()
checkpoint = "drive/My Drive/Colab Notebooks/best_model.ckpt"  #300k sentence
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    
    # If we want to continue training a previous session
    # loader = tf.train.import_meta_graph(checkpoint + '.meta')
    # loader.restore(sess, checkpoint)
    #sess.run(tf.local_variables_initializer())

    for epoch_i in range(1, epochs+1):
        update_loss = 0
        batch_loss = 0
        for batch_i, (summaries_batch, texts_batch, summaries_lengths, texts_lengths) in enumerate(
                get_batches(sorted_summaries_short, sorted_texts_short, batch_size)):
            start_time = time.time()
            _, loss = sess.run(
                [train_op, cost],
                {input_data: texts_batch,
                 targets: summaries_batch,
                 lr: learning_rate,
                 summary_length: summaries_lengths,
                 text_length: texts_lengths,
                 keep_prob: keep_probability})

            batch_loss += loss
            update_loss += loss
            end_time = time.time()
            batch_time = end_time - start_time

            if batch_i % display_step == 0 and batch_i > 0:
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}, Seconds: {:>4.2f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(sorted_texts_short) // batch_size, 
                              batch_loss / display_step, 
                              batch_time*display_step))
                batch_loss = 0
                
                #saver = tf.train.Saver() 
                #saver.save(sess, checkpoint)
                
            if batch_i % update_check == 0 and batch_i > 0:
                print("Average loss for this update:", round(update_loss/update_check,3))
                summary_update_loss.append(update_loss)
                
              
                  
                # If the update loss is at a new minimum, save the model
                if update_loss <= min(summary_update_loss):
                    print('New Record!') 
                    stop_early = 0
                    saver = tf.train.Saver() 
                    saver.save(sess, checkpoint)

                else:
                    print("No Improvement.")
                    #stop_early += 1
                    #if stop_early == stop:
                    #    break
                update_loss = 0
            
                    
        # Reduce learning rate, but not below its minimum value
        learning_rate *= learning_rate_decay
        if learning_rate < min_learning_rate:
            learning_rate = min_learning_rate
        
        #if stop_early == stop:
        #    print("Stopping Training.")
            #break

Epoch   1/100 Batch   20/88 - Loss:  3.484, Seconds: 6.22
Average loss for this update: 3.289
New Record!
Epoch   1/100 Batch   40/88 - Loss:  2.639, Seconds: 10.43
Average loss for this update: 2.585
New Record!
Epoch   1/100 Batch   60/88 - Loss:  2.683, Seconds: 16.57
Epoch   1/100 Batch   80/88 - Loss:  2.319, Seconds: 25.75
Average loss for this update: 2.34
New Record!
Epoch   2/100 Batch   20/88 - Loss:  2.502, Seconds: 5.73
Average loss for this update: 2.536
No Improvement.
Epoch   2/100 Batch   40/88 - Loss:  2.495, Seconds: 9.49
Average loss for this update: 2.454
No Improvement.
Epoch   2/100 Batch   60/88 - Loss:  2.539, Seconds: 16.81
Epoch   2/100 Batch   80/88 - Loss:  2.182, Seconds: 25.60
Average loss for this update: 2.201
New Record!
Epoch   3/100 Batch   20/88 - Loss:  2.344, Seconds: 5.71
Average loss for this update: 2.373
No Improvement.
Epoch   3/100 Batch   40/88 - Loss:  2.318, Seconds: 9.10
Average loss for this update: 2.288
No Improvement.
Epoch   3/100 Ba

In [0]:
checkpoint = "drive/My Drive/Colab Notebooks/best_model.ckpt" 

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)
    names = []
    [names.append(n.name) for n in loaded_graph.as_graph_def().node]
names

INFO:tensorflow:Restoring parameters from drive/My Drive/Colab Notebooks/best_model.ckpt


['input',
 'targets',
 'learning_rate',
 'keep_prob',
 'summary_length',
 'Const',
 'max_dec_len',
 'text_length',
 'ReverseV2/axis',
 'ReverseV2',
 'embedding_lookup/params_0',
 'embedding_lookup/axis',
 'embedding_lookup',
 'embedding_lookup/Identity',
 'encoder_0/DropoutWrapperInit/Const',
 'encoder_0/DropoutWrapperInit/Const_1',
 'encoder_0/DropoutWrapperInit_1/Const',
 'encoder_0/DropoutWrapperInit_1/Const_1',
 'encoder_0/bidirectional_rnn/fw/fw/Rank',
 'encoder_0/bidirectional_rnn/fw/fw/range/start',
 'encoder_0/bidirectional_rnn/fw/fw/range/delta',
 'encoder_0/bidirectional_rnn/fw/fw/range',
 'encoder_0/bidirectional_rnn/fw/fw/concat/values_0',
 'encoder_0/bidirectional_rnn/fw/fw/concat/axis',
 'encoder_0/bidirectional_rnn/fw/fw/concat',
 'encoder_0/bidirectional_rnn/fw/fw/transpose',
 'encoder_0/bidirectional_rnn/fw/fw/sequence_length',
 'encoder_0/bidirectional_rnn/fw/fw/Shape',
 'encoder_0/bidirectional_rnn/fw/fw/strided_slice/stack',
 'encoder_0/bidirectional_rnn/fw/fw/strid

## Making Our Own Summaries

In [0]:
def text_to_seq(text):
    '''Prepare the text for the model'''
    
    text = clean_text(text)
    return [vocab_to_int.get(word, vocab_to_int['<UNK>']) for word in text.split()]

In [0]:
text=''' 9GAG Limited is a media company.  Our mission is to make the world happier.

The 9GAG services that link to this Privacy Policy provide fun ways for you to create, share, and discover the best content in the world.

When you use these services, you’ll share some information with us. We want to explain clearly what information we collect, why we collect it, how we use it, whom we share it with, and the controls we give you to access, update, and delete your information.

That’s why we’ve written this Privacy Policy. We tried to write it in simple language with minimal legalese so it’s easy for you to understand. If you have any questions about anything in our Privacy Policy, please contact us. 

Information We Collect
There are three basic categories of information we collect:

Information you choose to give us

Information we get when you use our services

Information we get from third parties

Here’s how we collect your information.

How We Collect Information
Information You Choose To Give Us
When you use and interact with our services, we collect the information you choose to share with us. For example, most of our services require you to set up a basic 9GAG account, so we need to collect a few important details about you, such as: a unique username you choose, a password, an e-mail address, and your date of birth. To make it easier for others to find you, we may also ask you to provide us additional information that will be publicly visible on our services, for example a profile picture, a name, or other identifiable information. Our other services, such as commerce products, may also require you to provide us with a debit card or credit card number and its associated account information.

You’ll also provide any information you send through our services, such as post and comment uploads. Keep in mind that the posts and comments you upload can be seen publicly, and can always be saved and copied outside of our services. Please don’t share content you wouldn’t want someone to save or share.

Lastly, when you contact us or communicate with us in any way, we’ll collect any information you give us.

Information We Get When You Use Our Services
When you use our services, we collect information about which services you use and how you use them. For example, we may know what section or post you are reading, when you’re reading it, and how you’re reading it (desktop, mobile browser, or mobile app). Here’s a more detailed explanation of the information we collect when you use our services:

Usage Information. We collect information about your activity through our services. For example:

How you interact with our services, such as what sections you look at, what you search for, what posts and comments you upload, or what posts and comments you upvote or downvote.

Content Information. We collect content you create on our services. For example:

The images, videos, and text you upload

We also collect information about the content you create. For example:

When you created the content

Where you created the content

How you created the content such as device information

We also collect information about how other users interact with your content. For example:

Post upvotes and downvotes

Comments and comment upvotes and downvotes

Device Information. We collect information from and about the devices you use. For example:

Information about your hardware and software, such as hardware model, operating system version, device memory, advertising identifiers, unique application identifiers, unique device identifiers, and language.

Information about your wireless and mobile network connections, such as your type of network connection.

Camera and Photos. Some of our services require us to collect images and other information from your devices’s camera and photos. For example, you won’t be able to upload images or videos from your camera roll unless we can access your camera or media library.

Location Information.  When you use our services, we may collect information about your location. With your permission, we may also collect information about your precise location using methods that include GPS, wireless networks, and Wi-Fi access points.

Information Collected by Cookies and Other Technologies. We may use cookies and other technologies such as web beacons, web storage, and unique advertising identifiers to collect information about your activity, browser, and device. We may use these technologies to collect information when you interact with services we offer through one of our partners, such as advertising and commerce features. Most web browsers are set to accept cookies by default. It is up to you to move or reject browser cookies through the settings on your browser or device. Removing or rejecting cookies may affect our service function and availability.

Log Information.  We also collect log information when you use our services. For example:

Details about how you’ve used our services.

Device information, such as web browser type and language.

Access times.

Pages and content viewed.

IP address.

Identifiers associated with cookies or other technologies that may uniquely identify your device or browser.

Pages you visited before or after visiting our website or using our service.

Information We Collect From Third Parties
We may collect information from our affiliates and third parties. For example:

If you use your Google or Facebook account to sign up or sign in to our services, we may receive information so you can create or access your 9GAG account.

If you interact with one of our advertisers, they may share information with us to help target or measure ad performance.

How We Use Information
We use your information to provide you with the best products and services we can build and improve. Here’s what we use this information for:

To develop, maintain, improve, deliver, and protect our products and services.

Monitor and analyze trends and usages of our products and services.

Send you communications, including by e-mail. Examples include using e-mail to respond to support inquiries, or to share information about our products, services, and promotional offers that we think may interest you.

Personalize our services by showing you better and more relevant content.

Provide ad targeting and measurement, including the use of your precise location information to show relevant ads.

Enhance and enforce the safety and security of our products and services.

Verify your identity and prevent fraud, or other authorized and illegal activity.

Use information we’ve collected cookies and other data to enhance our services and products.

Enforce our Terms of Service, Community Rules, and other usage policies.

How We Share Information
We may share information about you in the following ways:

With other 9GAG Users. We may share information with other 9GAG users. Examples include:

Information about you, such as your name, username, and profile pictures.

Information you upload, such as image or video posts.

Information about how you interacted with our services, such as the content you saved, the comments you post, and the content you upvote and downvote.

With all 9GAG Users, our business partners, and the general public. We may share information with all 9GAG users, our business partners, and the general public. Examples include:

Public information like your name, username, and profile pictures.

Content submissions and saved content set to be viewable by everyone.

Content from our services that you share on other services, such as Facebook or Pinterest.

With our affiliates. We may share information with the entities within the 9GAG Limited family of companies.

With third parties. We may share your information with third parties. Examples include:

With service providers. We may share information about you with service providers who perform services on our behalf. 

With business partners. We may share information about you with business partners that provide services and functionality. 

With third parties for legal reasons.  We may share information about you if we reasonably believe that disclosing the information is necessary. Examples include:

To comply with any valid legal process, governmental request, or applicable law, rule or regulation.

To investigate, enforce, or remedy potential Terms of Service violations

To protect the safety, rights, and property of 9GAG, our users, or others.

To detect and resolve any fraud or security matters or concerns.

With third parties as part of a merger or acquisition. If 9GAG Limited gets involved with a merger, asset sale, financing, liquidation, bankruptcy, or acquisition of all or some portion of our business to another company, we may share your information with the company before, during, and after the transaction.

We may also share aggregated, non-personally identifiable, or de-identified information with third parties, such as advertisers.

Third-Party Content and Integrations
Our services may contain third-party links and search results, include third-party integrations, or offer a co-branded or third-party-branded-service. Through these links, third-party integrations, and co-branded or third-party-branded services, you may be providing information, including personal information, directly to the third party, to us, or to both. You acknowledge and agree that we are not responsible for how those third parties collect or use your information. We encourage you to review the privacy policies of every third-party service you visit or use, including those third parties you interact with through our services.

Analytics and Advertising Services
Provided by Others
We may let other companies use cookies, web beacons, and similar technologies on our services. These companies may collect information about how you use our services over time, and combine it with similar information from other services and companies. This information may be used to analyze and track data, determine the popularity of certain content, and under your online activity, among other things.

Some companies, including our affiliates, may use information collected on our services to measure the performance of ads and deliver more relevant ads, including on third-party websites and apps.

Provided by Us
We may collect information about your activity on third-party services that use cookies and other technologies provided by us. We use this information to improve your user experience with our services and to improve your advertising services, including measuring the performance of ads and showing you relevant ads. Visit our Advertising Preferences page to learn more about 9GAG advertising and how you can control the information used to select the ads you see.

How Long We Keep Your Information
9GAG stores your information, the posts you upload, and the comments you upload for a period of time. For example:

We store your basic account information, including your name, username, and email address until you ask us to delete them

We store location information for different lengths of time based on how precise it is and which services you use.

We are constantly collecting and updating information about the things you like or dislike, so we can provide you with more relevant data, more relevant ads, and a better user experience.

If you decide to stop using 9GAG, you can ask us to delete your account. 

There may be legal requirements to store your data and we may need to suspend those deletion practices if we receive valid legal process or request asking us to preserve that content, or if we receive reports of abuse or other Terms of Service Violations. We may also need to retain certain information in backup for a limited period of time, or as required by law.

Control over Your Information
At 9GAG, your privacy is important to us. We want you to have control over the information you give us, the information we use, and the information we keep. These are the tools we provide you to control your information:

Download My Data. You can request your data to be sent to you.

Removing Permissions. If you agree to let us use your information, you can change your mind at any time by changing the settings on your device. If you remove permission, certain services may be impacted and may lose functionality.

Deletion. We hope you remain a 9GAG user for life, but if you want to delete your account, please contact us. 

Users in the European Union and European Economic Area
If you’re a user in the European Union or European Economic Area, please note that 9GAG Limited is the data controller of your information. Here’s additional information we would like you to know:

Bases For Using Your Information
Your country only allows us to use your personal information when certain conditions apply or are met. These conditions are called legal bases. At 9GAG, we rely on one of the following bases:

Contract. We may use your information because you’ve entered into an agreement with us. For example, when you buy our merchandise and have accepted our Terms and Services, we may use some of your information to collect payment and to make sure you receive the right product, at the right time, to the right address.

Legitimate Interest. We may use your information because we - or a third party - has a legitimate interest. For example, we need to use your information to provide, maintain and improve our services. This includes protecting your account, uploading your posts and comments, providing customer support, and showing the best content you’ll like. Most of these services are free. To keep the lights on at 9GAG HQ, we may use this information to show you relevant ads. We believe legitimate interest does not outweigh your right to privacy, so we only rely on legitimate interest when we think the way we are using your data does not significantly impact your privacy.

Consent. We may ask for your consent to use your information for specific purposes. If we do, we’ll make sure you can revoke your consent in our services or through your device permissions at any time. Even if we’re not relying on consent to use your information, we may ask you for permission to access data such as your location.

Legal obligation. We may use your personal information when we are required to comply with the law, such as when we respond to valid legal process or need to take action to protect our users or employees.

Your Right To Object
You have the right to object to our use of your information. For some data, we’ve provided you with the ability to delete your data if you don’t want us to store or process it anymore. For other data, you can stop our use of your information by disabling the feature altogether. You can do these things on our site or in our apps. If you find other types of information you don’t agree with us using, please contact us.

International Transfers
We may collect your personal information from, transfer it to, and process it in the United States and other countries outside of where you live. When we share information of EU and EEA users outside the EU or EEA, we make sure an adequate transfer mechanism is in place. For internal transfers, we rely on the EU-U.S. and Swiss-U.S. Privacy Shield.

Your Right To Complain
If you’re based in the EU or EEA and have a complaint, you can file your complaint with the supervisory authority in your Member States. For example, if you’re based in the UK, you can file a complaint with the Information Commissioner’s Office. '''

In [0]:
paragraphs = text.split('\n')
rm_paras = list()
for para in paragraphs:
  if len(para)<200:
    continue
  rm_paras.append(para)

In [0]:
checkpoint = "drive/My Drive/Colab Notebooks/best_model.ckpt"
print ("\n######################################## ORIGINAL POLICY ##########################################\n")
for para in rm_paras:
  print (para)
print ("\n###################################################################################################\n")
print ("\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ SUMMARIZED POLICY @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n")
for para in rm_paras:
  input_sentence = para
  text = text_to_seq(input_sentence)

  loaded_graph = tf.Graph()
  with tf.Session(graph=loaded_graph) as sess:
      # Load saved model
      loader = tf.train.import_meta_graph(checkpoint + '.meta')
      loader.restore(sess, checkpoint)

      input_data = loaded_graph.get_tensor_by_name('input:0')
      logits = loaded_graph.get_tensor_by_name('predictions:0')
      text_length = loaded_graph.get_tensor_by_name('text_length:0')
      summary_length = loaded_graph.get_tensor_by_name('summary_length:0')
      keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')
      
      #Multiply by batch_size to match the model's input parameters
      answer_logits = sess.run(logits, {input_data: [text]*batch_size, 
                                        summary_length: [np.random.randint(5,8)], 
                                        text_length: [len(text)]*batch_size,
                                        keep_prob: 1.0})[0] 

  # Remove the padding from the tweet
  pad = vocab_to_int["<PAD>"] 
  print('\nSummary')
  #print('  Word Ids:       {}'.format([i for i in answer_logits if i != pad]))
  print('  Response Words: {}'.format(" ".join([int_to_vocab[i] for i in answer_logits if i != pad])))
  print ()
print ("\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n")


######################################## ORIGINAL POLICY ##########################################

When you use these services, you’ll share some information with us. We want to explain clearly what information we collect, why we collect it, how we use it, whom we share it with, and the controls we give you to access, update, and delete your information.
That’s why we’ve written this Privacy Policy. We tried to write it in simple language with minimal legalese so it’s easy for you to understand. If you have any questions about anything in our Privacy Policy, please contact us. 
When you use and interact with our services, we collect the information you choose to share with us. For example, most of our services require you to set up a basic 9GAG account, so we need to collect a few important details about you, such as: a unique username you choose, a password, an e-mail address, and your date of birth. To make it easier for others to find you, we may also ask you to provide us additi

In [0]:
# Create summaries

input_sentence = "When you use and interact with our services, we collect the information you choose to share with us. For example, most of our services require you to set up a basic 9GAG account, so we need to collect a few important details about you, such as: a unique username you choose, a password, an e-mail address, and your date of birth. To make it easier for others to find you, we may also ask you to provide us additional information that will be publicly visible on our services, for example a profile picture, a name, or other identifiable information. Our other services, such as commerce products, may also require you to provide us with a debit card or credit card number and its associated account information.You’ll also provide any information you send through our services, such as post and comment uploads. Keep in mind that the posts and comments you upload can be seen publicly, and can always be saved and copied outside of our services. Please don’t share content you wouldn’t want someone to save or share.Lastly, when you contact us or communicate with us in any way, we’ll collect any information you give us."

text = text_to_seq(input_sentence)

checkpoint = "drive/My Drive/Colab Notebooks/best_model.ckpt"

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('predictions:0')
    text_length = loaded_graph.get_tensor_by_name('text_length:0')
    summary_length = loaded_graph.get_tensor_by_name('summary_length:0')
    keep_prob = loaded_graph.get_tensor_by_name('keep_prob:0')
    
    #Multiply by batch_size to match the model's input parameters
    answer_logits = sess.run(logits, {input_data: [text]*batch_size, 
                                      summary_length: [np.random.randint(5,8)], 
                                      text_length: [len(text)]*batch_size,
                                      keep_prob: 1.0})[0] 

# Remove the padding from the tweet
pad = vocab_to_int["<PAD>"] 

print('Original Text:', input_sentence)

print('\nText')
print('  Word Ids:    {}'.format([i for i in text]))
print('  Input Words: {}'.format(" ".join([int_to_vocab[i] for i in text])))

print('\nSummary')
print('  Word Ids:       {}'.format([i for i in answer_logits if i != pad]))
print('  Response Words: {}'.format(" ".join([int_to_vocab[i] for i in answer_logits if i != pad])))

INFO:tensorflow:Restoring parameters from drive/My Drive/Colab Notebooks/best_model.ckpt
Original Text: When you use and interact with our services, we collect the information you choose to share with us. For example, most of our services require you to set up a basic 9GAG account, so we need to collect a few important details about you, such as: a unique username you choose, a password, an e-mail address, and your date of birth. To make it easier for others to find you, we may also ask you to provide us additional information that will be publicly visible on our services, for example a profile picture, a name, or other identifiable information. Our other services, such as commerce products, may also require you to provide us with a debit card or credit card number and its associated account information.You’ll also provide any information you send through our services, such as post and comment uploads. Keep in mind that the posts and comments you upload can be seen publicly, and can al

PARAGRAPH: "Swift Communications collects the personal information you provide when you use this Site to register for an account, subscribe to one of our newspapers, or submit news items or use our social networking tools such as when you: comment on news stories, create blogs, submit photographs and videos and send public or private messages to your friends on the Site. This includes your contact information, such as your name, e-mail address, postal address and telephone number or your billing information, such as your credit or debit card information and the content of the communications. We process your payment card payments using a third party and only see the last 4 digits of the number and the maker of the card. If you choose to participate in one of our surveys or sweepstakes  we will collect the information you provide in your answers. Information Sent to Us by Your Web Browser. We collect information that is sent to us automatically by your Web browser. This information typically includes your IP address, the identity of your Internet service provider, the name and version of your operating system, the name and version of your browser, the date and time of your visit, and the pages you visit. Please check your browser if you want to learn what information your browser sends or how to change your settings. Generally, we do not link the information provided by your browser to information that identifies you by name. More About IP Addresses. An IP address is a unique number that is automatically assigned to your device when you connect to the Internet. It is used to identify your devices location in cyberspace, so that the information you request can be delivered to you. If you use a dial-up connection or a connection that assigns dynamic IP addresses, your device will be assigned a new IP address each time you connect to the Internet. If, however, your device is permanently connected to the Internet using a static IP address, the IP address assigned to your device will generally be the same each time you use your device. IP addresses do not include your name, email address or other information that identifies you personally, but in some cases, they can be used to identify you. For example, if you have registered with this Site, we may link your IP address to account information that identifies you personally. We may also provide your IP address to our third-party service provider who can use it to identify your state and zip code. This gives us valuable demographic information about the individuals who use this Site. Our service providers are bound by contract to use the IP addresses we provide only for this purpose. In addition, if, for example, we suspect fraud or a threat to the security of this Site, we may share our server logs-which contain users IP addresses-with the appropriate investigative authorities, who could use that information to trace and identify individuals."


Heading: how do we use the information

Summary:  service provider, the name and version of your operating system, the name and version of your browser, the date and time of your visit, and the pages you visit. Please check your browser if you want to learn what information your browser sends or how to change your settings. Generally, we do not link the information provided by your browser to information that identifies you by name. 
