# Introduction: Writing Patents Using a Recurrent Neural Network

The purpose of this notebook is to develop a recurrent neural network that can write patent abstracts. Although this is mostly meant as a simple example, recurrent neural networks are powerful and can be used for real tasks such as generating text in the style of a corpus, machine translation, and other supervised sequence-learning tasks.

In [176]:
import pandas as pd 
import numpy as np

import os

BATCH_SIZE = 512
CHUNK_SIZE = 180

In [177]:
import json
from itertools import chain

data = []

for file in os.listdir('../data/patents_parsed/'):
    with open(f'../data/patents_parsed/{file}', 'rt') as fin:
        data.append([json.loads(l) for l in fin])
        
        
data = list(chain(*data))
data = [r for r in data if r[0] is not None]
data = [r for r in data if len(r[0]) >= 200]
len(data)

6382

In [179]:
lens = [len(x[0]) for x in data]
min(lens)

201

In [180]:
data[0][1], data[0][0]

('Artificial intelligence system for item analysis for rework shop orders ',
 'A computer implemented method facilitates the capability for shop re-work orders to be effectively scheduled, knowing the time and location of item availability that is needed to correct the problem found in the re-work shop orders. The system automatically identifies alternate components or items that can be used in the shop orders and provides realistic shipping dates so that the re-work shop orders can be scheduled. If components or items are not available, the system provides feedback to the material planning system to re-plan items using traditional material planning systems such as the MRP (material requirement planning) systems and provide projected shipping dates so that re-work orders can be scheduled.')

In [181]:
abstracts = [d[0] for d in data]
titles = [d[1] for d in data]

chars = []
for abstract in abstracts:
    for ch in abstract:
        chars.append(ch)
        
chars = set(chars)
len(chars)

147

In [182]:
char_to_idx = {char: idx for idx, char in enumerate(chars)}
idx_to_char = {idx: char for char, idx in char_to_idx.items()}
char_to_idx['a'], idx_to_char[47]

(72, 'A')
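
The two dictionaries above are what turn text into the one-hot arrays the network consumes. As a minimal sketch of that round trip, the cell below builds the same kind of mapping on a toy corpus and encodes and decodes a short string. The names `toy_abstracts`, `one_hot_encode`, and `one_hot_decode` are hypothetical and not part of the notebook. The sketch also sorts the character set, which is an assumption on my part: iteration order over a set of strings can differ between Python sessions, so sorting is one way to keep the indices stable if a saved model is reloaded later.

```python
import numpy as np

# Toy corpus standing in for the patent abstracts (hypothetical data)
toy_abstracts = ["A neural network.", "A method and system."]

# Sorting the character set gives the same index assignment on every run
toy_chars = sorted(set("".join(toy_abstracts)))
toy_char_to_idx = {ch: i for i, ch in enumerate(toy_chars)}
toy_idx_to_char = {i: ch for ch, i in toy_char_to_idx.items()}


def one_hot_encode(text, char_to_idx):
    """Return a (len(text), vocab_size) array with a single 1 per row."""
    encoded = np.zeros((len(text), len(char_to_idx)))
    encoded[np.arange(len(text)), [char_to_idx[ch] for ch in text]] = 1
    return encoded


def one_hot_decode(encoded, idx_to_char):
    """Invert one_hot_encode by taking the argmax of each row."""
    return "".join(idx_to_char[i] for i in encoded.argmax(axis=1))


encoded = one_hot_encode("A neural", toy_char_to_idx)
decoded = one_hot_decode(encoded, toy_idx_to_char)
print(encoded.shape, decoded)
```

Decoding with `argmax` is the same trick the notebook uses later to turn a row of `Xs` back into characters.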

In [183]:
from keras.models import Input, Model
from keras.layers import Dense, Dropout
from keras.layers import LSTM
from keras.layers.wrappers import TimeDistributed
from keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping, ModelCheckpoint

In [184]:
def char_rnn_model(num_chars, num_layers, num_nodes = 512, dropout = 0.1):
    # Take in a sequence of one-hot encoded characters
    input_layer = Input(shape = (None, num_chars), name = 'input')
    prev = input_layer
    
    # Add an LSTM cell for each layer
    for i in range(num_layers):
        lstm = LSTM(num_nodes, return_sequences = True, name = f'lstm_layer_{i}')(prev)
        if dropout:
            prev = Dropout(dropout)(lstm)
        else:
            prev = lstm
            
    # At each time step, predict a distribution over the next character;
    # the prediction at step t depends only on characters up to step t
    # TimeDistributed applies the same Dense layer at every time step (axis 1)
    dense = TimeDistributed(Dense(num_chars, name = 'dense',
                             activation = 'softmax'))(prev)
    model = Model(inputs = [input_layer], outputs = [dense])
    
    # Compile with categorical loss
    model.compile(loss = 'categorical_crossentropy', 
                  optimizer = RMSprop(lr=0.01), 
                  metrics = ['accuracy'])
    
    return model

In [185]:
model = char_rnn_model(len(chars), 1)
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input (InputLayer)           (None, None, 147)         0         
_________________________________________________________________
lstm_layer_0 (LSTM)          (None, None, 512)         1351680   
_________________________________________________________________
dropout_6 (Dropout)          (None, None, 512)         0         
_________________________________________________________________
time_distributed_6 (TimeDist (None, None, 147)         75411     
Total params: 1,427,091
Trainable params: 1,427,091
Non-trainable params: 0
_________________________________________________________________
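
The parameter counts in the summary can be checked by hand. The sketch below assumes the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel, and a bias) and a dense layer with a bias; the helper names `lstm_param_count` and `dense_param_count` are hypothetical.

```python
def lstm_param_count(input_dim, units):
    # Four gates (input, forget, cell, output), each with an input kernel,
    # a recurrent kernel, and a bias vector
    return 4 * (input_dim * units + units * units + units)


def dense_param_count(input_dim, units):
    # Weight matrix plus one bias per output unit
    return input_dim * units + units


num_chars, num_nodes = 147, 512
lstm_params = lstm_param_count(num_chars, num_nodes)
dense_params = dense_param_count(num_nodes, num_chars)
total_params = lstm_params + dense_params
print(lstm_params, dense_params, total_params)
```

These values match the `Param #` column: 1,351,680 for the LSTM, 75,411 for the time-distributed dense layer, and 1,427,091 in total.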


In [186]:
import random
random.sample(abstracts, 1)

['The present invention describes the use of autonomous devices, which can be arranged in networks, such as neural networks, to better identify, track, and acquire sources of signals present in an environment. The environment may be a physical environment, such as a battlefield, or a more abstract environment, such as a communication network. The devices may be mobile, in the form of vehicles with sensors, or may be information agents, and may also interact with one another, thus allowing for a great deal of flexibility in carrying out a task. In some cases, the devices may be in the form of autonomous vehicles which can collaboratively sense, identify, or classify a number of sources or targets concurrently. The autonomous devices may function as mobile agents or attractors in a network, such as a neural network. The devices may also be aggregated to form a network of networks and provide scalability to a system in which the autonomous devices are operating.']

In [187]:
import random
def data_generator(text, char_to_idx, batch_size, chunk_size):
    num_chars = len(char_to_idx)
    
    # Generator yields batches indefinitely
    while True:
        # Allocate fresh arrays for every batch so that batches already
        # handed to the consumer are not overwritten in place
        X = np.zeros((batch_size, chunk_size, num_chars))
        y = np.zeros((batch_size, chunk_size, num_chars))
        
        # Batch size is number of samples to use
        for row in range(batch_size):
            
            # Choose a random abstract
            sample = random.choice(text)
            
            # Choose a random starting index that leaves room for chunk_size + 1 characters
            idx = random.randrange(len(sample) - chunk_size - 1)

            # Empty array to hold a chunk, chunk size is number of characters to extract
            chunk = np.zeros((chunk_size + 1, num_chars))
            
            # Need to find one more than chunk size to make labels
            for i in range(chunk_size + 1):
                chunk[i, char_to_idx[sample[idx + i]]] = 1
                
            # Features are all characters except for last
            X[row, :, :] = chunk[:chunk_size]
            # Labels are all characters except for first
            y[row, :, :] = chunk[1:]
            
        yield X, y

In [188]:
Xs, ys = next(data_generator(abstracts, char_to_idx, 512, chunk_size = 80))

In [189]:
Xs.shape

(512, 80, 147)

In [190]:
sample = Xs[1]
sample.shape

(80, 147)

In [191]:
x = []

for row in sample:
    x.append(idx_to_char[np.argmax(row)])
''.join(x)

'volutional layers of a trained convolution neural network for determining a conv'

In [192]:
y = []

# Decode the labels (ys), not the features, to see the one-character shift
for row in ys[1]:
    y.append(idx_to_char[np.argmax(row)])
''.join(y)

The label is the feature sequence shifted forward by one character: position t of the label holds the character at position t + 1 of the features. At each time step, we are teaching the network to predict the next character.
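
The shift is easiest to see on plain strings before any one-hot encoding. This is a minimal sketch with a hypothetical helper, `make_features_and_labels`, that mirrors what `data_generator` does for a single row: take `chunk_size + 1` characters, then drop the last one for the features and the first one for the labels.

```python
def make_features_and_labels(text, start, chunk_size):
    # Take one extra character so that every feature position has a label
    window = text[start:start + chunk_size + 1]
    # Features drop the last character, labels drop the first
    return window[:-1], window[1:]


features, labels = make_features_and_labels("neural network", start=0, chunk_size=6)
for f, l in zip(features, labels):
    print(repr(f), "->", repr(l))
```

Each printed pair is one training signal: given the character on the left (and everything before it), the network should predict the character on the right.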

In [193]:
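# No validation data is passed to training, so both callbacks monitor the training loss
callbacks = [EarlyStopping(monitor = 'loss', min_delta = 0.03, patience = 5),
             ModelCheckpoint(filepath = '../models/first_rnn.h5', monitor = 'loss',
                             save_best_only=True)]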

In [194]:
from itertools import chain
all_text = list(chain(*abstracts))
len(all_text)

5332623
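
The total character count determines how many batches make up an epoch in the training call that follows. As a rough check of that arithmetic, the sketch below evaluates the notebook's `steps_per_epoch` expression with floor division and estimates how many passes over the corpus one epoch represents. The variable names are hypothetical, and the "passes" figure is only an approximation, since the generator samples chunks at random rather than sweeping the text.

```python
total_chars = 5332623   # len(all_text) from the cell above
batch_size = 512        # BATCH_SIZE
chunk_size = 180        # CHUNK_SIZE

# Same expression as in the training call, with floor division
steps_per_epoch = 2 * total_chars // (batch_size * chunk_size)

# Characters drawn per epoch, relative to the corpus size
passes_at_512 = steps_per_epoch * 512 * chunk_size / total_chars
passes_at_256 = steps_per_epoch * 256 * chunk_size / total_chars
print(steps_per_epoch, round(passes_at_512, 2), round(passes_at_256, 2))
```

With a generator batch size of 512 an epoch draws about two corpus-lengths of characters; with 256 it draws about one.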

In [None]:
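# The generator batch size (256) is half of BATCH_SIZE, so each epoch draws
# roughly one corpus-length of characters rather than two
train_gen = data_generator(abstracts, char_to_idx, 256, chunk_size=CHUNK_SIZE)

# steps_per_epoch must be an integer, so use floor division
h = model.fit_generator(generator=train_gen, epochs = 40, callbacks = callbacks,
                        steps_per_epoch = 2 * len(all_text) // (BATCH_SIZE * CHUNK_SIZE),
                        verbose = 2)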

Epoch 1/40
 - 68s - loss: 2.8362 - acc: 0.2256
Epoch 2/40
 - 67s - loss: 1.9862 - acc: 0.4509
Epoch 3/40
 - 67s - loss: 1.6189 - acc: 0.5733
Epoch 4/40
 - 67s - loss: 1.4580 - acc: 0.6143
Epoch 5/40
 - 67s - loss: 1.4041 - acc: 0.6314
Epoch 6/40
 - 67s - loss: 1.4180 - acc: 0.6343
Epoch 7/40
 - 67s - loss: 1.3666 - acc: 0.6463
Epoch 8/40
 - 67s - loss: 1.4145 - acc: 0.6402
Epoch 9/40
 - 67s - loss: 1.3861 - acc: 0.6458
Epoch 10/40
 - 67s - loss: 1.3029 - acc: 0.6620
Epoch 11/40


In [156]:
random.randint(0, 150)

37

In [155]:
random.randrange(150)

39

In [None]:
import sys
def generate_output(model, text, start_index = 2, diversity = None, amount = 400):
    
    if start_index is None:
        start_index = random.randint(0, CHUNK_SIZE)
        
    sample = random.sample(text, 1)[0]
    generated = sample[start_index: start_index + CHUNK_SIZE]
    yield generated + '#'
    
    for i in range(amount):
        x = np.zeros((1, len(generated), len(chars)))
        for t, char in enumerate(generated):
            x[0, t, char_to_idx[char]] = 1
            
        preds = model.predict(x, verbose = 0)[0]
    
        if diversity is None:
            next_index = np.argmax(preds[len(generated) - 1])
            
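        else:
            # Rescale the predicted distribution by the diversity (temperature)
            preds = np.array(preds[len(generated) - 1]).astype(np.float64)
            preds = np.log(preds) / diversity
            exp_preds = np.exp(preds)
            preds = exp_preds / np.sum(exp_preds)
            # Draw one character from the rescaled distribution
            probas = np.random.multinomial(1, preds, 1)
            next_index = np.argmax(probas)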
            
        next_char = idx_to_char[next_index]
        yield next_char
        
        generated += next_char
    return generated
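
The `diversity` argument acts as a sampling temperature: values below 1 sharpen the predicted distribution toward the most likely character, and values above 1 flatten it. The cell below is a standalone sketch of that rescaling in plain NumPy, not the notebook's exact code. The function name `sample_with_temperature`, the small epsilon inside the logarithm, and the use of `np.random.default_rng` are my own assumptions.

```python
import numpy as np


def sample_with_temperature(probs, temperature=1.0, rng=None):
    """Rescale a probability vector by a temperature and draw one index from it."""
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    # A small epsilon avoids log(0) for characters with zero probability
    logits = np.log(probs + 1e-12) / temperature
    # Subtract the maximum for numerical stability before exponentiating
    logits -= logits.max()
    scaled = np.exp(logits)
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled)), scaled


example_probs = [0.1, 0.2, 0.7]
rng = np.random.default_rng(0)
cold_index, cold_probs = sample_with_temperature(example_probs, 0.01, rng)
_, unit_probs = sample_with_temperature(example_probs, 1.0, rng)
_, hot_probs = sample_with_temperature(example_probs, 5.0, rng)
print(cold_probs.round(3), unit_probs.round(3), hot_probs.round(3))
```

At a temperature of 1 the distribution is unchanged, at 0.01 almost all of the mass sits on the most likely character, and at 5 the three probabilities move closer together.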



In [None]:
for ch in generate_output(model, abstracts):
    sys.stdout.write(ch)
    

In [None]:
for ch in generate_output(model, abstracts):
    sys.stdout.write(ch)

# Conclusions

In this notebook, we got a small glimpse of the abilities of recurrent neural networks. Using just a few thousand patents and a basic neural network, we were able to teach a machine to produce reasonable outputs of patent abstracts.