# Creating a Spell Checker

The objective of this project is to build a model that takes a sentence with spelling mistakes as input and outputs the same sentence with the mistakes corrected. Data can be found as books from [Project Gutenberg](http://www.gutenberg.org/ebooks/search/?sort_order=downloads) or as cleaned Wikipedia dumps.

To save time across multiple runs, files are saved in the folder "./data", and the weights of the neural network are saved and reused. Note that reuse only works with the same symbol set and neural network configuration. If the symbols vary between data sets (books), process all of them in the same run (put them all in the books folder), and use the offset variable to work through the data piece by piece.
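The piece-by-piece processing mentioned above could look roughly like the following sketch. The helper name `iter_chunks` is an illustrative assumption and not part of the notebook, where the same effect is obtained by editing `line_offset` by hand between runs.

```python
def iter_chunks(lines, num_samples):
    """Yield (offset, chunk) pairs that cover `lines` in order.

    Hypothetical helper: mirrors slicing with `line_offset` and
    `num_samples`, but advances the offset automatically.
    """
    for offset in range(0, len(lines), num_samples):
        yield offset, lines[offset:offset + num_samples]


# Example: 7 sentences processed 3 at a time.
sentences = ["s{}".format(i) for i in range(7)]
chunks = list(iter_chunks(sentences, 3))
for offset, chunk in chunks:
    print(offset, chunk)
```

Each chunk can then be encoded and trained on separately, so only one chunk's one-hot arrays need to be in memory at a time.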

This is a one-layer LSTM network that only reads the input in the forward direction. Possible improvements are to add more layers and to use a bidirectional LSTM, which is closer to how a human reads.

The sections of the project are:
- Loading the Data
- Preparing the Data
- Building the Model
- Training the Model
- Fixing Misspelled sentences

Load libraries

In [1]:
from __future__ import print_function

import numpy as np
import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if physical_devices:  # the list is empty on CPU-only machines
    tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
from tensorflow import keras
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense
import pandas as pd
from collections import namedtuple
import time
import re
from sklearn.model_selection import train_test_split
import os
from os import listdir
from os.path import isfile, join
import bz2

Check hardware. Please note that a GPU is needed if you want to run anything but the smallest datasets.

In [2]:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 9287504794949069077
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 4967563264
locality {
  bus_id: 1
  links {
  }
}
incarnation: 14461268740794714468
physical_device_desc: "device: 0, name: GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5"
]


In [3]:
# When loading books, the encoding must be set explicitly
# on operating systems other than Linux.

def load_book(path):
    """Load a book from its file"""
    input_file = os.path.join(path)
    with open(input_file, encoding='utf-8') as f:
        book = f.read()
    return book

# For loading of precomputed data
def load_data(file, path):
    """Load and decompress text data from path/file"""
    input_file = os.path.join(path, file)
    print(input_file)
    # Load and decompress data from file
    with bz2.open(input_file, "rt") as f:
        loaded_data = f.read()
    return loaded_data

def load_vocab_input(file, path):
    """Load the set of input characters from path/file"""
    input_file = os.path.join(path, file)
    # Load and decompress data from file
    with bz2.open(input_file, "rt") as f:
        loaded_data = f.read()
    # Each character in the file is one symbol of the vocabulary
    return set(loaded_data)

def load_vocab_target(file, path):
    """Load the set of target characters from path/file"""
    input_file = os.path.join(path, file)
    # Load and decompress data from file
    with bz2.open(input_file, "rt") as f:
        loaded_data = f.read()
    # Each character in the file is one symbol of the vocabulary
    return set(loaded_data)

In [4]:
# Paths to subfolders
path = './books/'
path_data = './data/'

In [5]:
# Global variables
# Check if the data files exist
# Using try is a shortcut. Done properly, the code would check whether the file exists.
try:
    loaded_data = load_data('sentences.bz2', path_data)
    sentences_exist = True
    print('Loaded sentences.')
    sentences = []
    for line in loaded_data.splitlines():
        sentences.append(line)
except:
    sentences_exist = False
    print('No sentences found to load. Going to load books.')
    
try:
    vocab_input = load_vocab_input('vocab_input.bz2', path_data)
    vocab_input_exists = True
    print('Loaded vocab_input.')
    vocab_target = load_vocab_target('vocab_target.bz2', path_data)
    print('Loaded vocab_target.')
    vocab_target_exists = True
except:
    vocab_target_exists = vocab_input_exists = False

./data/sentences.bz2
No sentences found to load. Going to load books.
Loaded vocab_input.
Loaded vocab_target.
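As the comment in the cell above notes, an explicit existence check is cleaner than probing with `try`. A minimal sketch of that idea follows; the helper name `load_text_if_exists` and the temporary folder standing in for `./data` are assumptions made for illustration.

```python
import bz2
import os
import tempfile


def load_text_if_exists(path, file):
    """Return the decompressed text of path/file, or None if it is missing.

    Hypothetical helper (not in the notebook): replaces the try/except
    probe with an explicit os.path.isfile check.
    """
    input_file = os.path.join(path, file)
    if not os.path.isfile(input_file):
        return None
    with bz2.open(input_file, "rt", encoding="utf-8") as f:
        return f.read()


# Round trip in a temporary folder standing in for ./data
with tempfile.TemporaryDirectory() as tmp:
    missing = load_text_if_exists(tmp, "sentences.bz2")
    with bz2.open(os.path.join(tmp, "sentences.bz2"), "wt", encoding="utf-8") as f:
        f.write("första meningen.\nandra meningen.\n")
    loaded = load_text_if_exists(tmp, "sentences.bz2")

sentences_exist = loaded is not None
print(missing, sentences_exist, loaded.splitlines())
```

Passing `encoding="utf-8"` on both the write and the read side avoids depending on the platform's default encoding.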


Define file loading functions

In [6]:
# Collect all of the book file names
if sentences_exist == False:
    book_files = [f for f in listdir(path) if isfile(join(path, f))]
    book_files = book_files[0:]

In [7]:
# Load the books using the file names
if sentences_exist == False:
    books = []
    for book in book_files:
        books.append(load_book(path+book))

In [8]:
# Compare the number of words in each book 
if sentences_exist == False:
    for i in range(len(books)):
        print("There are {} words in {}.".format(len(books[i].split()), book_files[i]))


There are 177535 words in allasou2015nonum1040.sv.


In [9]:
# Check to ensure the text looks alright
if sentences_exist == False:
    print(books[0][:500],"/.../")

Uppdraget är härmed slutfört.
Härigenom föreskrivs följande.
Till detta hör ett antal bilagor.
Ett begrepp som inte helt är klarlagt.
Den enskilde har många roller.
Till MB hör ett antal förordningar.
Därför är denna lösning inte aktuell.
JAMA Facial Plast Surg.
Piercing ingår inte i standarden.
Flera svar var möjliga att ge.
En mottagning lämnade inget svar.
Begreppet patient definieras inte.
Som en liten del ingår hälsoskyddet.
Centrala kapitel för denna utredning är.
Resultatet ska dokumenter /.../


## Preparing the Data

In [10]:
def clean_text(text):
    '''Remove unwanted characters and extra spaces from the text'''
    #text = re.sub(r'\n', ' ', text) 
    text = re.sub(r'[{}@_*>()\\#%+=\[\]]','', text)
    text = re.sub('a0','', text)
    text = re.sub('\'92t','\'t', text)
    text = re.sub('\'92s','\'s', text)
    text = re.sub('\'92m','\'m', text)
    text = re.sub('\'92ll','\'ll', text)
    text = re.sub('\'91','', text)
    text = re.sub('\'92','', text)
    text = re.sub('\'93','', text)
    text = re.sub('\'94','', text)
    #text = re.sub('\.','. ', text)
    #text = re.sub('\!','! ', text)
    #text = re.sub('\?','? ', text)
    text = re.sub(' +',' ', text)
    #text = [text.islower()]
    return text
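To see what the first and last substitutions in `clean_text` do, here is a small stand-alone sketch. The function name `strip_symbols` and the sample sentence are made up for illustration; only the two regular expressions are taken from the cell above.

```python
import re


def strip_symbols(text):
    """Drop the same symbol class as clean_text, then collapse spaces."""
    text = re.sub(r'[{}@_*>()\\#%+=\[\]]', '', text)
    return re.sub(' +', ' ', text)


sample = "Till   MB hör [ett] antal *förordningar* (bilagor)."
cleaned = strip_symbols(sample)
print(cleaned)
```

Note that only runs of spaces are collapsed; newlines are left alone, which matters because the sentences are later split with `splitlines()`.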

In [11]:
# Clean the text of the books
if sentences_exist == False:
    clean_books = []
    for book in books:
        clean_books.append(clean_text(book))

In [12]:
# Check to ensure the text has been cleaned properly
if sentences_exist == False:
    print(clean_books[0][:500],"/.../")

Uppdraget är härmed slutfört.
Härigenom föreskrivs följande.
Till detta hör ett antal bilagor.
Ett begrepp som inte helt är klarlagt.
Den enskilde har många roller.
Till MB hör ett antal förordningar.
Därför är denna lösning inte aktuell.
JAMA Facial Plast Surg.
Piercing ingår inte i standarden.
Flera svar var möjliga att ge.
En mottagning lämnade inget svar.
Begreppet patient definieras inte.
Som en liten del ingår hälsoskyddet.
Centrala kapitel för denna utredning är.
Resultatet ska dokumenter /.../


In [13]:
# Split the text from the books into sentences.
# Choose whether only lower case or not below.
if sentences_exist == False:
    sentences = []
    save_sentences = ""
    for book in clean_books:
        for sentence in book.splitlines():
            sentence = sentence.lower() # lower case to halve the nr of symbols
            sentences.append(sentence)
            save_sentences += (sentence + '\n')
    print("There are {} sentences.".format(len(sentences)))
    # Write compressed data to file
    with bz2.open("./data/sentences.bz2", "wt") as bzip_file:
        unused = bzip_file.write(save_sentences)
        #unused = bzip_file.write(save_sentences.encode())     
        # encoding='utf8', errors='strict'
    save_sentences = ""

There are 44910 sentences.


In [14]:
# Check to ensure the text has been split correctly.
print(sentences[:5])

['uppdraget är härmed slutfört.', 'härigenom föreskrivs följande.', 'till detta hör ett antal bilagor.', 'ett begrepp som inte helt är klarlagt.', 'den enskilde har många roller.']


In [15]:
# Find the length of each sentence
lengths = []
for sentence in sentences:
    lengths.append(len(sentence))
lengths = pd.DataFrame(lengths, columns=["counts"])

In [16]:
lengths.describe()

| | counts |
|---|---|
| count | 44910.0 |
| mean | 28.283812 |
| std | 8.472319 |
| min | 10.0 |
| 25% | 21.0 |
| 50% | 30.0 |
| 75% | 36.0 |
| max | 40.0 |


In [17]:
# Limit the data we will use to train our model
max_length = 60 # was 92
min_length = 10

good_sentences = []

#for sentence in int_sentences:
for sentence in sentences:
    if len(sentence) <= max_length and len(sentence) >= min_length:
        good_sentences.append(sentence)

print("We will use {} sentences to train and test our model.".format(len(good_sentences)))

We will use 44910 sentences to train and test our model.


In [18]:
# Split the data into training and testing sentences (testing is for
# inference). There will be three data sets: training, which is split
# into training and validation during fit, and testing for inference.
# Note that inference does not use the test set yet.
training, testing = train_test_split(good_sentences, test_size = 0.1, random_state = 2)

print("Number of training sentences:", len(training))
print("Number of testing sentences:", len(testing))

Number of training sentences: 40419
Number of testing sentences: 4491
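The three data sets described in the comment above (training, validation, testing) can also be cut explicitly. The following is a sketch in plain Python under the assumption of a 10 % test share and a 20 % validation share of the remainder, matching `test_size = 0.1` and `validation_split=0.2`; the function name `three_way_split` is hypothetical.

```python
import random


def three_way_split(items, val_frac=0.2, test_frac=0.1, seed=2):
    """Shuffle a copy of `items` and cut it into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int((len(shuffled) - n_test) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = three_way_split(["sentence {}".format(i) for i in range(100)])
print(len(train), len(val), len(test))
```

Fixing the seed keeps the split reproducible between runs, in the same way `random_state = 2` does for `train_test_split`.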


In [19]:
letters = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o',
           'p','q','r','s','t','u','v','w','x','y','z','å','ä','ö',' ','.']

# Note that caps and numbers are not introduced as errors, because the data does not contain them.
#           'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P',
#           'Q','R','S','T','U','V','X','Y','Z','Å','Ä','Ö',

def noise_maker(sentence, threshold):
    '''Swap, duplicate, or drop characters to create spelling mistakes'''
    noisy_sentence = ""
    i = 0
    while i < len(sentence):
        random = np.random.uniform(0,1,1)
        # Most characters will be correct since the threshold value is high
        if random < threshold:
            noisy_sentence+=sentence[i]
        else:
            new_random = np.random.uniform(0,1,1)
            # ~20% chance characters will swap locations
            if new_random > 0.8:
                if i == (len(sentence) - 1):
                    # If last character in sentence, it will not be typed
                    pass
                else:
                    # if any other character, swap order with following character
                    noisy_sentence+=sentence[i+1]
                    noisy_sentence+=sentence[i]
                    i += 1
            # ~20% chance the character will be typed twice
            elif new_random > 0.60:
                noisy_sentence+=sentence[i]
                noisy_sentence+=sentence[i]
            # ~40% chance the character is kept unchanged; substituting
            # a random letter from `letters` is currently disabled
            elif new_random > 0.20:
                noisy_sentence+=sentence[i]
                #random_letter = np.random.choice(letters, 1)[0]
            # ~20% chance a character will not be typed
            else:
                pass
        i += 1
    return noisy_sentence

*Note: The noise_maker function is used to create spelling mistakes that are similar to those we would make. Sometimes we forget to type a letter, type a letter in the wrong location, or add an extra letter.*
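For experiments that need reproducible noise, the three mistake types named in the note can be sketched with a seeded generator. This is a simplified, hypothetical variant (`add_noise`), not the notebook's function: it picks among the three mistake types with equal probability and takes the random generator as an argument.

```python
import random


def add_noise(sentence, threshold, rng):
    """Return `sentence` with random swaps, duplications, and omissions.

    Hypothetical variant of noise_maker: each character is kept with
    probability `threshold`; otherwise one of three typo types is
    applied with equal probability.
    """
    out = []
    i = 0
    while i < len(sentence):
        if rng.random() < threshold:
            out.append(sentence[i])
        else:
            kind = rng.choice(["swap", "double", "drop"])
            if kind == "swap" and i < len(sentence) - 1:
                # Transpose this character with the next one.
                out.append(sentence[i + 1])
                out.append(sentence[i])
                i += 1
            elif kind == "double":
                out.append(sentence[i] * 2)
            # "drop" (or a swap at the last position) omits the character.
        i += 1
    return "".join(out)


rng = random.Random(0)
clean = "den enskilde har många roller."
noisy = add_noise(clean, 0.9, rng)
print(noisy)
```

Because the generator is passed in, the same seed always produces the same noisy sentence, which makes it easier to compare models trained on identical data.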

In [20]:
# Check to ensure noise_maker is making mistakes correctly.
threshold = 0.95
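# for sentence in training_sorted[:5]:
# Note: this replaces the training split above with all sentences,
# so the held-out test sentences are also used for training.
training = sentences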
for sentence in training[:5]:
    print(sentence)
    print(noise_maker(sentence, threshold))
    print()

uppdraget är härmed slutfört.
uppdraget är härmed slutfört.

härigenom föreskrivs följande.
härigenom föreskrivs följande.

till detta hör ett antal bilagor.
till detta hör tet antal bilagor.

ett begrepp som inte helt är klarlagt.
ett begrepp som inte helt är klalragt.

den enskilde har många roller.
den enskilde  har många orlle.



## The Model Data

In [21]:
batch_size = 32 # Limited by memory. 64 is much faster, but uses more memory.

In [22]:
# time steps, ie number of letters
input_dim = 40

In [23]:
epochs = 30 # 100 # Number of epochs to train for. Smaller data sets
                  # need more epochs than larger data sets. 1 is only
                  # useful for testing. Fewer is better: an overtrained
                  # model is useless.

units = 64

output_size = 10

latent_dim = 256      # Latent dimensionality of the encoding space.
num_samples = 100000  # Number of samples to train on. Used with
                      # the offset variable you can chew through
                      # large data sets, without using much memory.
threshold = 0.95      # introduce 5 % errors, i.e. about once for every 20 chars.
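The effect of the threshold can be estimated with a little arithmetic. The sketch below assumes every character is selected for modification independently with probability `1 - threshold`; the function names are hypothetical.

```python
def expected_noise_events(length, threshold):
    """Expected number of characters selected for modification."""
    return length * (1.0 - threshold)


def prob_untouched(length, threshold):
    """Probability that no character in the sentence is selected."""
    return threshold ** length


for length in (10, 28, 40):
    print(length,
          round(expected_noise_events(length, 0.95), 2),
          round(prob_untouched(length, 0.95), 3))
```

With sentence lengths between 10 and 40 characters, a sizeable share of the training inputs will therefore be identical to their targets, which teaches the model to leave correct text alone.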

Initialize and save space

In [24]:
input_texts = []
target_texts = []

if vocab_target_exists:
    target_characters = vocab_target
    target_characters = sorted(list(target_characters))
else: 
    target_characters = set()

if vocab_input_exists:
    input_characters = vocab_input
    input_characters = sorted(list(input_characters))
else:
    input_characters = set()

lines = training

training = []
#testing = []
good_sentences = []
sentences = []

In [25]:
added_characters = False
# offset for large datasets, if you want to run it piece by piece
line_offset = 0

for line in lines[line_offset: min((num_samples+line_offset), len(lines) - 1)]:
    target_text = str (line)
    input_text = noise_maker(line, threshold)
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = "\t" + target_text + "\n"
    input_texts.append(input_text)
    target_texts.append(target_text)
    
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
            added_characters = True
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)
            added_characters = True
            

In [26]:
if not vocab_input_exists:
    print('saving vocab')
    input_characters = sorted(list(input_characters))
    vocab_input=""
    for char in input_characters:
        vocab_input += char
    with bz2.open("./data/vocab_input.bz2", "wt") as bzip_file:
        unused = bzip_file.write(vocab_input)
    added_characters = False

if not vocab_target_exists:
    print('saving vocab')
    target_characters = sorted(list(target_characters))
    vocab_target=""
    for char in target_characters:
        vocab_target += char
    with bz2.open("./data/vocab_target.bz2", "wt") as bzip_file:
        unused = bzip_file.write(vocab_target)
    added_characters = False

In [27]:
# Note that added characters (new characters in new data sets) are
# saved correctly, but read back incorrectly and then sorted. This is a bug.
# It is better to use a fixed vocabulary (list of symbols) that is
# always guaranteed to remain in the same order, even when data
# sets (texts/books) contain different characters/symbols.
if added_characters:
    vocab_input=""
    for char in input_characters:
        vocab_input += char
    with bz2.open("./data/vocab_input.bz2", "wt") as bzip_file:
        unused = bzip_file.write(vocab_input)
    vocab_target=""
    for char in target_characters:
        vocab_target += char
    with bz2.open("./data/vocab_target.bz2", "wt") as bzip_file:
        unused = bzip_file.write(vocab_target)
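The fixed vocabulary suggested in the comment above could be sketched as follows. The alphabet in `FIXED_VOCAB` and the choice to map unknown characters to a space are assumptions made for illustration; neither is taken from the notebook.

```python
# Assumed fixed alphabet: start/end markers, punctuation, a-z, and ä å ö.
FIXED_VOCAB = "\t\n !.?" + "abcdefghijklmnopqrstuvwxyz" + "äåö"

token_index = {char: i for i, char in enumerate(FIXED_VOCAB)}
reverse_index = {i: char for char, i in token_index.items()}


def encode(text, unknown=" "):
    """Map text to indices; characters outside the vocabulary become `unknown`."""
    return [token_index.get(char, token_index[unknown]) for char in text]


ids = encode("hör 7")
print(ids)
print("".join(reverse_index[i] for i in ids))
```

Since the string never changes, every character keeps the same index across data sets, so saved weights stay valid even when a new book contains symbols that earlier ones did not.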
        
    

In [28]:
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max_length + 11 # max([len(txt) for txt in clean_books])
max_decoder_seq_length = max_encoder_seq_length + 2 # max([len(txt) for txt in target_texts])

print("Number of samples:", len(lines))
print("Number of unique input tokens:", num_encoder_tokens)
print("Number of unique output tokens:", num_decoder_tokens)
print("Max sequence length for inputs:", max_encoder_seq_length)
print("Max sequence length for outputs:", max_decoder_seq_length)

Number of samples: 44910
Number of unique input tokens: 33
Number of unique output tokens: 35
Max sequence length for inputs: 71
Max sequence length for outputs: 73


In [29]:
input_token_index = dict([(char, i) for i, char in enumerate(input_characters)])
target_token_index = dict([(char, i) for i, char in enumerate(target_characters)])

print(input_token_index)
print(target_token_index)

encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens), dtype="float32"
)
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype="float32"
)
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens), dtype="float32"
)

{' ': 0, '!': 1, '.': 2, '?': 3, 'a': 4, 'b': 5, 'c': 6, 'd': 7, 'e': 8, 'f': 9, 'g': 10, 'h': 11, 'i': 12, 'j': 13, 'k': 14, 'l': 15, 'm': 16, 'n': 17, 'o': 18, 'p': 19, 'q': 20, 'r': 21, 's': 22, 't': 23, 'u': 24, 'v': 25, 'w': 26, 'x': 27, 'y': 28, 'z': 29, 'ä': 30, 'å': 31, 'ö': 32}
{'\t': 0, '\n': 1, ' ': 2, '!': 3, '.': 4, '?': 5, 'a': 6, 'b': 7, 'c': 8, 'd': 9, 'e': 10, 'f': 11, 'g': 12, 'h': 13, 'i': 14, 'j': 15, 'k': 16, 'l': 17, 'm': 18, 'n': 19, 'o': 20, 'p': 21, 'q': 22, 'r': 23, 's': 24, 't': 25, 'u': 26, 'v': 27, 'w': 28, 'x': 29, 'y': 30, 'z': 31, 'ä': 32, 'å': 33, 'ö': 34}


In [30]:
for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.0
    encoder_input_data[i, t + 1 :, input_token_index[" "]] = 1.0
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.0
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.0
    decoder_input_data[i, t + 1 :, target_token_index[" "]] = 1.0
    decoder_target_data[i, t:, target_token_index[" "]] = 1.0
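The one-step shift between decoder input and decoder target is easier to see on a toy example. The sketch below uses a made-up five-symbol vocabulary and a single two-letter target; it follows the same indexing as the loop above but is not part of the notebook.

```python
import numpy as np

chars = ["\t", "\n", " ", "a", "b"]
index = {c: i for i, c in enumerate(chars)}

target_text = "\tab\n"   # start marker + "ab" + end marker
steps = 6                # padded sequence length

decoder_input = np.zeros((steps, len(chars)), dtype="float32")
decoder_target = np.zeros((steps, len(chars)), dtype="float32")

for t, char in enumerate(target_text):
    decoder_input[t, index[char]] = 1.0
    if t > 0:
        # The target at step t - 1 is the input at step t.
        decoder_target[t - 1, index[char]] = 1.0
# Pad the remaining steps with spaces, as the notebook does.
decoder_input[t + 1:, index[" "]] = 1.0
decoder_target[t:, index[" "]] = 1.0


def decode(one_hot):
    """Turn a (steps, vocab) one-hot array back into a string."""
    return "".join(chars[i] for i in one_hot.argmax(axis=1))


print(repr(decode(decoder_input)))
print(repr(decode(decoder_target)))
```

At every step the decoder sees the previous correct character as input and is trained to predict the next one, which is the teacher forcing scheme the model relies on during training.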

## Building the Model

In [31]:
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)

# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))

# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

In [32]:
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

## Training the Model

In [33]:
model.compile(
    optimizer="rmsprop", loss="categorical_crossentropy")

In [34]:
# Do not run this line if no weights are saved!
status = model.load_weights('weights') 

NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for weights
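The `NotFoundError` above appears because no checkpoint has been saved yet. One way to avoid it, sketched below, is to test for the checkpoint's index file first. This assumes the TensorFlow checkpoint format, in which `save_weights('weights')` writes a `weights.index` file next to the data shards; the helper name `has_checkpoint` is hypothetical.

```python
import os
import tempfile


def has_checkpoint(prefix):
    """Return True if a TensorFlow-format checkpoint index exists for `prefix`."""
    return os.path.isfile(prefix + ".index")


# Demonstration with a stand-in file instead of a real checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "weights")
    before = has_checkpoint(prefix)
    open(prefix + ".index", "w").close()
    after = has_checkpoint(prefix)

print(before, after)

# In the notebook this could guard the call (assumption):
# if has_checkpoint('weights'):
#     model.load_weights('weights')
```

With such a guard the cell can be run unconditionally, both on a first run and on later runs that reuse saved weights.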

In [35]:
model.fit(
    [encoder_input_data, decoder_input_data],
    decoder_target_data,
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.2)

Train on 35927 samples, validate on 8982 samples
Epoch 1/30
...
Epoch 30/30


<tensorflow.python.keras.callbacks.History at 0x22ae144a048>
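Since an overtrained model is of little use, it can help to stop training once the validation loss stops improving. The logic is sketched below in plain Python with made-up loss values; the function name `stopping_epoch` is hypothetical. Keras provides an `EarlyStopping` callback built on the same patience idea, which could be passed to `model.fit` instead.

```python
def stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which patience-based early stopping halts.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs; if that never happens,
    the last epoch is returned.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses)


# Made-up validation losses, for illustration only.
losses = [0.90, 0.70, 0.60, 0.62, 0.61, 0.65, 0.70]
print(stopping_epoch(losses, patience=3))
```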

In [36]:
# Save weights
model.save_weights('weights')

## Fixing Custom Sentences

In [37]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense
import tensorflow as tf
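Inference in this section works one character at a time: the decoder is fed the start marker, the highest-scoring character is picked, and that character is fed back in until the end marker appears. The loop can be sketched without TensorFlow as below. `toy_step` is a hypothetical stand-in for the decoder model's prediction, and its scores are invented.

```python
def greedy_decode(step, start="\t", stop="\n", max_len=20):
    """Generate characters one at a time until `stop` or `max_len`.

    `step(prev_char, state)` stands in for the decoder's predict call: it
    returns (scores, new_state), where scores maps characters to numbers.
    """
    decoded = ""
    prev, state = start, 0
    while True:
        scores, state = step(prev, state)
        prev = max(scores, key=scores.get)   # greedy: take the best-scoring char
        decoded += prev
        if prev == stop or len(decoded) > max_len:
            return decoded


def toy_step(prev_char, state):
    """Hypothetical decoder that spells out 'hej' and then stops."""
    script = "hej\n"
    scores = {c: 0.0 for c in "hej\n"}
    scores[script[min(state, len(script) - 1)]] = 1.0
    return scores, state + 1


print(repr(greedy_decode(toy_step)))
```

The length cap plays the same role as `max_decoder_seq_length` in the notebook: it guarantees termination even if the model never emits the end marker.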

In [38]:
# Next: inference mode (sampling).
# Here's the drill:
# 1) encode input and retrieve initial decoder state
# 2) run one step of decoder with this initial state
# and a "start of sequence" token as target.
# Output will be the next target token
# 3) Repeat with the current target token and current states

# Define sampling models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())


def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence


for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)
    

    

-
Input sentence: uppdraget är härmed slufört.
Decoded sentence: uppdraget är märst avgrågat.

-
Input sentence: härigenom föreskrivs följande.
Decoded sentence: härigenom föreskrivs följande.

-
Input sentence: till detta hör ett antal  bilagor.
Decoded sentence: till det behöver de utföras av slu.

-
Input sentence: ett begrepp som inte helt är klarlagt.
Decoded sentence: ett klimatpolitiskt ramverk för sverige.

-
Input sentence: dne enskilde har många roller.
Decoded sentence: den modell kommer att vara någ.

-
Input sentence: till mb hör ett antal förordnignar.
Decoded sentence: till det kommer att förslaget vält.

-
Input sentence: därför är denna lösning inte aktuell.
Decoded sentence: därför är en styrelse är stor uppgift.

-
Input sentence: jamafacial plast surg
Decoded sentence: jaa stare inte servide.

-
Input sentence: piercing ingår inte i standarden.
Decoded sentence: priviktion and research service.

-
Input sentence: flera svar var möjliga att ge.
Decoded sentence: fler

-
Input sentence: eu och kommunernas bosatdspolitik.
Decoded sentence: eu och kommunernas bostadspolitik.

-
Input sentence: delrapport rån sverigeförhandlingen.
Decoded sentence: delrapport från sverigeförhandlingen.

-
Input sentence: en förvaltning som håller ihop.
Decoded sentence: en förvaltning som håller ihop.

-
Input sentence: bostder att ob kvarr i.
Decoded sentence: bostadsbrist råder i del.

-
Input sentence: mer gemensamma tobaksreglerr.
Decoded sentence: mer gemensamma tobaksregler.

-
Input sentence: mer trygghet och bättref örsäkring.
Decoded sentence: mer trygghet och bättre försäkring.

-
Input sentence: systematiska jämförelser.
Decoded sentence: systematiska jämförelser.

-
Input sentence: för  lärande i staten.
Decoded sentence: för lärande i staten.

-
Input sentence: arbetslöhet och ekonomisk iståånd..
Decoded sentence: arbetslöhet och ekonomiskt bistånd.

-
Input sentence: skapa tilltro.
Decoded sentence: skapa tilltro.

-
Input sentence: bans och ungasr ätt vid