# Introduction: Writing Patents Using a Recurrent Neural Network

The purpose of this notebook is to develop a recurrent neural network which can be used to write patent abstracts. Although this is mostly meant as a simple example, the idea of recurrent neural networks is powerful and can be usde for real purposes such as generating text similar to a corpus, machine translation, and supervised learning tasks.

In [1]:
import pandas as pd 
import numpy as np

import os

BATCH_SIZE = 512
CHUNK_SIZE = 180

In [2]:
import json
from itertools import chain

data = []

for file in os.listdir('../data/patents_parsed/'):
    with open(f'../data/patents_parsed/{file}', 'rt') as fin:
        data.append([json.loads(l) for l in fin])
        
        
data = list(chain(*data))
data = [r for r in data if r[0] is not None]
data = [r for r in data if len(r[0]) >= 200]
len(data)

6382

In [3]:
lens = [len(x[0]) for x in data]
min(lens)

201

In [4]:
data[0][1], data[0][0]

('Artificial intelligence system for item analysis for rework shop orders ',
 'A computer implemented method facilitates the capability for shop re-work orders to be effectively scheduled, knowing the time and location of item availability that is needed to correct the problem found in the re-work shop orders. The system automatically identifies alternate components or items that can be used in the shop orders and provides realistic shipping dates so that the re-work shop orders can be scheduled. If components or items are not available, the system provides feedback to the material planning system to re-plan items using traditional material planning systems such as the MRP (material requirement planning) systems and provide projected shipping dates so that re-work orders can be scheduled.')

In [5]:
abstracts = [d[0] for d in data]
titles = [d[1] for d in data]

chars = []
for abstract in abstracts:
    for ch in abstract:
        chars.append(ch)
        
chars = set(chars)
len(chars)

147

In [6]:
char_to_idx = {char: idx for idx, char in enumerate(chars)}
idx_to_char = {idx: char for char, idx in char_to_idx.items()}
char_to_idx['a'], idx_to_char[47]

(136, 'V')

In [7]:
from keras.models import Input, Model
from keras.layers import Dense, Dropout
from keras.layers import LSTM
from keras.layers.wrappers import TimeDistributed
from keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping, ModelCheckpoint

Using TensorFlow backend.


In [8]:
def char_rnn_model(num_chars, num_layers, num_nodes = 512, dropout = 0.1):
    # Take in a sequence of one-hot encoded characters
    input_layer = Input(shape = (None, num_chars), name = 'input')
    prev = input_layer
    
    # Add an LSTM cell for each layer
    for i in range(num_layers):
        lstm = LSTM(num_nodes, return_sequences = True, name = f'lstm_layer_{i}')(prev)
        if dropout:
            prev = Dropout(dropout)(lstm)
        else:
            prev = lstm
            
    # For each time step find the most likely character - one time step considers up to current character
    # Time Distributed applies same layer to all time steps (first dimension)
    dense = TimeDistributed(Dense(num_chars, name = 'dense',
                             activation = 'softmax'))(prev)
    model = Model(inputs = [input_layer], outputs = [dense])
    
    # Compile with categorical loss
    model.compile(loss = 'categorical_crossentropy', 
                  optimizer = RMSprop(lr=0.01), 
                  metrics = ['accuracy'])
    
    return model

In [9]:
model = char_rnn_model(len(chars), 2)
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           (None, None, 147)         0         
_________________________________________________________________
lstm_layer_0 (LSTM)          (None, None, 512)         1351680   
_________________________________________________________________
dropout_1 (Dropout)          (None, None, 512)         0         
_________________________________________________________________
lstm_layer_1 (LSTM)          (None, None, 512)         2099200   
_________________________________________________________________
dropout_2 (Dropout)          (None, None, 512)         0         
_________________________________________________________________
time_distributed_1 (TimeDist (None, None, 147)         75411     
Total params: 3,526,291
Trainable params: 3,526,291
Non-trainable params: 0
_________________________________________________________________


In [10]:
import random
random.sample(abstracts, 1)

['Avatars, methods, apparatuses, computer program products, devices and systems are described that carry out identifying at least one instance of media content as a prospective cohort-linked attribute; presenting to at least one member of a population the at least one instance of media content; measuring at least one physiologic activity of the at least one member of the population, the at least one physiologic activity proximate to the at least one instance of media content; associating the at least one physiologic activity with at least one mental state; and specifying at least one population cohort based on the at least one mental state.']

In [11]:
import random
def data_generator(text, char_to_idx, batch_size, chunk_size):
    X = np.zeros((batch_size, chunk_size, len(char_to_idx)))
    y = np.zeros((batch_size, chunk_size, len(char_to_idx)))
    
    chunk_size_original = chunk_size
    
    # Generator yields samples
    while True:
        # Batch size is number of samples to use
        for row in range(batch_size):
            
            # Choose a random abstract
            sample = random.sample(text, 1)[0]
            
            # Choose a random starting index
            idx = random.randrange(len(sample) - chunk_size - 1)

            # Empty array to hold a chunk, chunk size is number of characters to extract
            chunk = np.zeros((chunk_size + 1, len(char_to_idx)))
            
            # Need to find one more than chunk size to make labels
            for i in range(chunk_size + 1):
                chunk[i, char_to_idx[sample[idx + i]]] = 1
                
            # Features are all characters except for last
            X[row, :, :] = chunk[:chunk_size]
            # Labels are all characters except for first
            y[row, :, :] = chunk[1:]
            
        yield X, y

In [12]:
Xs, ys = next(data_generator(abstracts, char_to_idx, 512, chunk_size = 80))

In [13]:
Xs.shape

(512, 80, 147)

In [14]:
sample = Xs[1]
sample.shape

(80, 147)

In [15]:
x = []

for row in sample:
    x.append(idx_to_char[np.argmax(row)])
''.join(x)

'private system or a residential system. A second step is executed upon a termina'

In [16]:
y = []

for row in sample:
    y.append(idx_to_char[np.argmax(row)])
''.join(y)

'private system or a residential system. A second step is executed upon a termina'

The label is a shifted forward version of the features. At each feature, we are teaching the network to predict the next character.

In [17]:
callbacks = [EarlyStopping(monitor = 'loss', min_delta = 0.03, patience = 5),
             ModelCheckpoint(filepath = '../models/first_rnn.h5', save_best_only=True)]

In [18]:
from itertools import chain
all_text = list(chain(*abstracts))
len(all_text)

5332623

In [19]:
train_gen = data_generator(abstracts, char_to_idx, 256, chunk_size=CHUNK_SIZE)

h = model.fit_generator(generator=train_gen, epochs = 40, callbacks = callbacks,
                        steps_per_epoch = 2 * len(all_text) / (BATCH_SIZE * CHUNK_SIZE),
                        verbose = 2)

Epoch 1/40
 - 154s - loss: 3.3441 - acc: 0.1105
Epoch 2/40




 - 151s - loss: 3.0639 - acc: 0.1302
Epoch 3/40
 - 151s - loss: 3.0454 - acc: 0.1379
Epoch 4/40
 - 151s - loss: 2.8410 - acc: 0.2041
Epoch 5/40
 - 151s - loss: 1.7560 - acc: 0.5275
Epoch 6/40
 - 151s - loss: 1.5383 - acc: 0.6020
Epoch 7/40
 - 151s - loss: 1.4581 - acc: 0.6260
Epoch 8/40
 - 151s - loss: 1.4636 - acc: 0.6303
Epoch 9/40
 - 151s - loss: 1.3711 - acc: 0.6517
Epoch 10/40
 - 151s - loss: 1.3461 - acc: 0.6582
Epoch 11/40
 - 151s - loss: 1.4086 - acc: 0.6494
Epoch 12/40
 - 151s - loss: 1.3879 - acc: 0.6543
Epoch 13/40
 - 151s - loss: 1.4522 - acc: 0.6433
Epoch 14/40
 - 151s - loss: 1.4008 - acc: 0.6540


In [25]:
model.save('../models/first_rnn.h5')

In [20]:
random.randint(0, 150)

126

In [21]:
random.randrange(150)

52

In [22]:
import sys
def generate_output(model, text, start_index = 2, diversity = None, amount = 400):
    
    if start_index is None:
        start_index = random.randint(0, CHUNK_SIZE)
        
    sample = random.sample(text, 1)[0]
    generated = sample[start_index: start_index + CHUNK_SIZE]
    yield generated + '#'
    
    for i in range(amount):
        x = np.zeros((1, len(generated), len(chars)))
        for t, char in enumerate(generated):
            x[0, t, char_to_idx[char]] = 1
            
        preds = model.predict(x, verbose = 0)[0]
    
        if diversity is None:
            next_index = np.argmax(preds[len(generated) - 1])
            
        else:
            preds = np.array(preds[len(generated) - 1]).astype(np.float64)
            preds = np.log(preds) / diversity
            exp_preds = np.exp(preds)
            preds = exp_preds / np.sum(exp_preds)
            probas = np.random.multinomial(1, preds, 1)
            next_index = np.argmax(preds)
            
        next_char = idx_to_char[next_index]
        yield next_char
        
        generated += next_char
    return generated



In [23]:
for ch in generate_output(model, abstracts, diversity = 100, amount = 200):
    sys.stdout.write(ch)
    

method and apparatus for speeding and enhancing the "learning" function of a computer configured as a multilayered, feed format artificial neural network using logistic functions a#nd a set of patterns of the set of training data and the set of training data is determined based on the training data sets and the set of training data and the second set of network parameters are de

In [24]:
for ch in generate_output(model, abstracts):
    sys.stdout.write(ch)

development platform for developing a skill for a persistent companion device (PCD) includes an asset development library having an application programming interface (API) configur#ed to provide a communication component and a computer system and a processing system for controlling a processing device for controlling a processing device for controlling a processing device and a computer system in a computer system in a computer system in a computer system in a computer system and a processing system for computing a processing system including a plurality of sensors and a sec

# Conclusions

In this notebook, we got a small glimpse at the abilities of recurrent neural networks. Using just a few thousands patents and a basic neural network, we were able to teach a machine to produce reasonable outputs of patent abstracts.