# LSTM in practice -- NLP

## Language modeling

A language model is a probability distribution over sequences of words, modeling language (production). If the set of words is $W$, then for an arbitrary sequence $\mathbf w = \langle w_1,\dots, w_n\rangle$ ($w_i\in W$) it defines a probability $P(\mathbf w)$.

By the chain rule, this probability factorizes as:

$$P(\mathbf w)= P(w_1)\cdot P(w_2 \vert w_1 )\cdot P(w_3\vert w_1, w_2)\cdot\dots\cdot P(w_n\vert w_1,\dots, w_{n-1})$$

This means that to model language we only need to give the conditional probability of the "continuation", the next word: for a word $w$ and a sequence $\langle w_1,\dots,w_n\rangle$, the probability that the next word will be $w$:

$$P(w ~\vert ~ w_1,\dots,w_n)$$
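
As a minimal sketch of the chain rule, the snippet below multiplies conditional probabilities taken from a hand-written toy table. The table `cond_prob` and all its numbers are invented for illustration and do not come from any corpus. The sum is accumulated in log space, which avoids underflow for long sequences.

```python
import math

# Hypothetical toy model: P(next word | history), keyed by the history tuple.
cond_prob = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.5, "dog": 0.5},
    ("the", "cat"): {"sleeps": 0.8, "runs": 0.2},
}

def sequence_log_prob(words):
    """Sum of log P(w_i | w_1..w_{i-1}), i.e. the log of the chain-rule product."""
    total = 0.0
    for i, word in enumerate(words):
        history = tuple(words[:i])
        total += math.log(cond_prob[history][word])
    return total

log_p = sequence_log_prob(["the", "cat", "sleeps"])
print(math.exp(log_p))  # the product 0.6 * 0.5 * 0.8
```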

There are also character-based models, which take individual characters rather than words as units, and model language as a distribution over sequences of characters (think T9...).

### Measurement of performance: Perplexity

A language model $\mathcal M$'s perplexity over the word series $\mathbf w = \langle w_1,\dots, w_n\rangle$ is:

$$\mathbf{PP}_{\mathcal M}(\mathbf w) = \sqrt[n]{\frac{1}{P_{\mathcal M}(\mathbf w)}}$$

With the chain rule this can be rewritten as:

$$\mathbf{PP}_{\mathcal M}(\mathbf w) = {\sqrt[n]{\frac{1}{P_{\mathcal M}(w_1)}\cdot \frac{1}{P_{\mathcal M}(w_2 \vert w_1 )}\cdot \frac{1}{P_{\mathcal M}(w_3\vert w_1, w_2)}\cdot\dots\cdot \frac{1}{P_{\mathcal M}(w_n\vert w_1,\dots, w_{n-1})}}}$$

which is exactly the geometric mean of the reciprocals of the conditional probabilities of all words in the corpus.

In the case of a bigram model this simplifies further to:
$$\mathbf{PP}_{\mathcal M}(\mathbf w) = \sqrt[n]{\frac{1}{P_{\mathcal M}(w_1)}\cdot \frac{1}{P_{\mathcal M}(w_2 \vert w_1 )}\cdot \frac{1}{P_{\mathcal M}(w_3\vert w_2)}\cdot\dots\cdot \frac{1}{P_{\mathcal M}(w_n\vert w_{n-1})}}$$
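
The definition can be sketched in a few lines. The function below takes the conditional probabilities a model assigned to each word of a text and returns the geometric mean of their reciprocals, computed as $\exp(-\frac{1}{n}\sum_i \log p_i)$ for numerical stability. The probability values are made up for illustration.

```python
import math

def perplexity(cond_probs):
    """Perplexity from the per-word conditional probabilities of one sequence.

    Equals the geometric mean of the reciprocals, computed as
    exp(-mean(log p)) to avoid multiplying many small numbers.
    """
    n = len(cond_probs)
    return math.exp(-sum(math.log(p) for p in cond_probs) / n)

# Hypothetical probabilities a model assigned to each word of a 4-word text.
probs = [0.25, 0.5, 0.125, 0.25]
pp = perplexity(probs)

# A model that guesses uniformly over 1000 words.
uniform_pp = perplexity([1 / 1000] * 4)

print(pp, uniform_pp)
```

A model that spreads its probability uniformly over a vocabulary of size $V$ has perplexity $V$, so perplexity can be read as an effective branching factor: lower is better.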


### But what is it good for?
For example:
- Predictive text input ("autocomplete")
- Generating text
- Spell checking
- Language understanding
- And most importantly, representation learning, which we will study in detail in a later lecture

### Generating text with a language model

The language model produces a tree with probable continuations of the text:

<img src="https://4.bp.blogspot.com/-Jjpb7iyB37A/WBZI4ImGQII/AAAAAAAAA9s/ululnUWt2vw9NMKuEr-F9H8tR2LEv36lACLcB/s1600/prefix_probability_tree.png" width=400 heigth=400>

Using this tree we can try different algorithms to search for the best "continuations". A full breadth-first search is usually impossible, due to the high branching factor of the tree.

Alternatives:
- "Greedy": we always choose the continuation with the highest direct probability. This will most probably be suboptimal, since the probability of the full sequence is the product of the continuation probabilities, and had we chosen a different path, we might have been able to choose later words with higher probabilities.
- Beam-search: we always store a fixed $k$ number of partial sequences, and we always try to expand these, always keeping the most probable $k$ from the possible continuations. 

Example ($k$=5):

<img src="http://opennmt.net/OpenNMT/img/beam_search.png" width=600 heigth=600>
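
The difference between the two strategies can be sketched on a tiny hand-made model. Everything in `next_probs` is invented for illustration; a real language model would compute these distributions. Beam search with $k=1$ is exactly greedy search, so one function covers both cases.

```python
import math

# Hypothetical toy model: next-token distribution given the prefix so far.
def next_probs(prefix):
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.5, "y": 0.5},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    return table[prefix]

def beam_search(steps, k):
    """Keep the k most probable partial sequences at every step."""
    beams = [((), 0.0)]  # (prefix, log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, logp in beams:
            for token, p in next_probs(prefix).items():
                candidates.append((prefix + (token,), logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:k]
    return beams

greedy = beam_search(steps=2, k=1)[0]  # beam width 1 is greedy search
best = beam_search(steps=2, k=2)[0]

print(greedy[0], math.exp(greedy[1]))
print(best[0], math.exp(best[1]))
```

In this toy case greedy search commits to `a` (probability 0.6) and ends with a sequence of probability 0.3, while a beam of width 2 keeps `b` alive and finds `b x` with probability 0.36.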
 

### The "old way": N-gram based solutions

With a _gross_ simplification we assume that the distribution depends only on the preceding $n-1$ words (where typically $n \le 4$), that is, we assume a Markov chain of order $n-1$:

 $$P(w ~\vert ~ w_1,\dots,w_k) = P(w ~\vert ~ w_{k- n + 2},\dots,w_k)$$

We simply compute these probabilities in a frequentist style by calculating the $n$-gram statistics of the corpus at hand:

$$P(w_2 ~\vert ~w_1) = \frac{c(\langle w_1, w_2 \rangle)}{c(w_1)}$$

$$P(w_{k+1} \vert~ w_1,\dots,w_k) = \frac{c(\langle w_1,\dots,w_k, w_{k+1} \rangle)}{c(\langle w_1, \dots, w_k\rangle)}$$
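
A minimal sketch of the counting approach for bigrams, using only the standard library. The nine-word "corpus" is a made-up example; `bigram_mle` is a name chosen here for illustration.

```python
from collections import Counter

# Hypothetical toy corpus, already tokenized.
tokens = "the cat sat on the mat the cat ran".split()

unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))

def bigram_mle(w1, w2):
    """P(w2 | w1) = c(<w1, w2>) / c(w1), the relative-frequency estimate."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_mle("the", "cat"))  # "the" occurs 3 times, "the cat" twice
print(bigram_mle("the", "dog"))  # an unseen bigram gets probability 0
```

The second call shows the weakness of pure counting: any bigram that is missing from the corpus receives probability zero, which makes the probability of every sequence containing it zero as well.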

Please note that in this case we are using "memorization", a form of database learning with minimal compression: "counting".

But what do we do if the given $n$-grams rarely or never occur? We have to employ some __smoothing__ solution, like:

##### Additive smoothing
We pretend that we have seen each $n$-gram a fixed $\delta$ times more often than we actually did. In the simplest case, with $n=2$:

$$P(w_2 ~\vert ~w_1) = \frac{c(\langle w_1, w_2 \rangle) + \delta}{\sum_{w\in V} [c(\langle w_1, w\rangle) + \delta]}$$

A widespread choice for $\delta$ is $1$.
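
The formula can be sketched directly. The toy corpus is invented for illustration and `additive_bigram` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical toy corpus, already tokenized.
tokens = "the cat sat on the mat the cat ran".split()
vocab = sorted(set(tokens))
bigram_counts = Counter(zip(tokens, tokens[1:]))

def additive_bigram(w1, w2, delta=1.0):
    """P(w2 | w1) with delta added to every bigram count that starts with w1."""
    numerator = bigram_counts[(w1, w2)] + delta
    denominator = sum(bigram_counts[(w1, w)] + delta for w in vocab)
    return numerator / denominator

p_seen = additive_bigram("the", "cat")    # seen twice in the corpus
p_unseen = additive_bigram("the", "ran")  # never seen, but no longer zero
total = sum(additive_bigram("the", w) for w in vocab)

print(p_seen, p_unseen, total)
```

Because $\delta$ is added in both the numerator and the denominator, the smoothed values still sum to one over the vocabulary.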

The main problem with this kind of smoothing is that, when "supplementing" the data, it does not take into account the frequencies of the shorter $n$-grams that make up the longer one. E.g. if neither $\langle w_1, w_2 \rangle$ nor $\langle w_1, w_3 \rangle$ occurs in the corpus, it assumes the count of both bigrams to be $\delta$, irrespective of the ratio of the frequencies of $w_2$ and $w_3$.
Most smoothing techniques try to accommodate this, e.g. simple interpolation:

##### Interpolation

In the case of bigrams we add, with a certain weight, the probability coming from the unigram frequency of the predicted word:

$$P(w_2 ~\vert ~w_1)_{\mathrm{interp}} = \lambda_1\frac{c(\langle w_1, w_2 \rangle)}{c(w_1)} + (1 - \lambda_1)\frac{c(w_2)}{\sum_{w\in V}c(w)}$$

Recursive solution for arbitrary $k$:

$$P(w_{k+1} \vert~ w_1,\dots,w_k)_\mathrm{interp} = \lambda_k\frac{c(\langle w_1,\dots,w_k, w_{k+1} \rangle)}{c(\langle w_1, \dots, w_k\rangle)} + (1-\lambda_k)P(w_{k+1} \vert~ w_2,\dots,w_k)_\mathrm{interp}$$

$\lambda_k$ is set empirically by examining the corpus, typically with the [Expectation Maximization algorithm](https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm), which - as we have mentioned - iteratively tunes the parameters to maximize the likelihood.
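
A minimal sketch of bigram interpolation, mixing the bigram estimate with the unigram relative frequency of the predicted word. The corpus is a toy example, and the weight `lam=0.7` is simply picked by hand here as an assumption; it is not fitted with Expectation Maximization.

```python
from collections import Counter

# Hypothetical toy corpus, already tokenized.
tokens = "the cat sat on the mat the cat ran".split()
unigram_counts = Counter(tokens)
bigram_counts = Counter(zip(tokens, tokens[1:]))
n_tokens = len(tokens)

def interpolated_bigram(w1, w2, lam=0.7):
    """lam * P_ML(w2 | w1) + (1 - lam) * P_ML(w2); lam is a made-up value."""
    p_bigram = bigram_counts[(w1, w2)] / unigram_counts[w1]
    p_unigram = unigram_counts[w2] / n_tokens
    return lam * p_bigram + (1 - lam) * p_unigram

# Neither bigram occurs in the corpus, but "cat" is more frequent than "ran".
p_on_cat = interpolated_bigram("on", "cat")
p_on_ran = interpolated_bigram("on", "ran")

print(p_on_cat, p_on_ran)
```

Unlike additive smoothing, the two unseen bigrams no longer receive the same probability: the more frequent word `cat` gets twice the mass of `ran`.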


A good overview of smoothing methods: [MacCartney, NLP Lunch Tutorial: Smoothing](https://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf)

 
#### General problems

- Even the core assumption is not too realistic, since the probabilities are surely influenced by words further back than $n$, but for practical reasons the context has to be limited (sparsity, computational capacity).
- On a large enough corpus, the memory footprint of $n$-gram models is _huge_, e.g. for the 1T n-gram corpus of Google ([see here](https://catalog.ldc.upenn.edu/LDC2006T13)), containing 1,024,908,267,229 tokens, the $n$-gram counts are as follows:
    - unigrams: 13,588,391
    - bigrams: 314,843,401
    - trigrams: 977,069,902
    - fourgrams: 1,313,818,354
    - fivegrams: 1,176,470,663
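
To get a feeling for the scale, the entry counts above can be added up. The byte estimate below rests on a loudly stated assumption made only for this sketch: one 8-byte integer per entry, with no storage at all for the $n$-gram keys themselves, so it is a rough lower bound and not a measurement of any real system.

```python
# Entry counts quoted above for the Google 1T n-gram corpus.
ngram_entries = {
    1: 13_588_391,
    2: 314_843_401,
    3: 977_069_902,
    4: 1_313_818_354,
    5: 1_176_470_663,
}

total_entries = sum(ngram_entries.values())

# Assumption for illustration only: one 8-byte integer count per entry,
# ignoring the storage needed for the n-gram keys themselves.
BYTES_PER_COUNT = 8
count_bytes = total_entries * BYTES_PER_COUNT

print(f"{total_entries:,} entries, at least {count_bytes / 1e9:.1f} GB for the counts alone")
```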

## Language modeling with LSTMs

One way to circumvent the Markov assumption is to use RNNs, which are capable of modeling long-term dependencies inside the sequence of words. The text is thus treated as a time series, so an appropriate architecture can be used (as we have already seen):

<img src="http://drive.google.com/uc?export=view&id=1y8QYr9ftTvXAxgzS-ldnGlijVpmK2l21" width=600 heigth=600>



Notable features:

- The input is a "one-hot" encoded vector, which we immediately transform into an "embedding vector"
- For each output step, we get a probability distribution over the whole vocabulary with softmax
- The network above is a simple RNN, but an LSTM can be used in its place without any problems
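
The first two points can be illustrated with plain NumPy. The sizes and the random matrix `E` are toy assumptions, not values from the model built later. The sketch shows that multiplying a one-hot vector by an embedding matrix is the same as looking up one row, and that softmax turns arbitrary scores into a distribution over the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embedding_size = 6, 3  # toy sizes, chosen for illustration

# Embedding matrix: one row per word id.
E = rng.normal(size=(vocab_size, embedding_size))

word_id = 4
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

# Multiplying a one-hot vector by E selects a single row of E,
# which is why embedding layers are implemented as a table lookup.
via_matmul = one_hot @ E
via_lookup = E[word_id]

def softmax(logits):
    """Turn arbitrary scores into a probability distribution."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical output scores of the network for one time step.
probs = softmax(rng.normal(size=vocab_size))

print(np.allclose(via_matmul, via_lookup), probs.sum())
```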

### Teaching

_In theory_ an RNN could be trained with full GD on the corpus in one go:

<img src="http://drive.google.com/uc?export=view&id=1XsBoRp7cNay3svFLRDv2JEDyC7m7CUdC" width=600 heigth=600>


- The loss is generally the well-known cross-entropy, which in this case (since the target is a one-hot vector) is:
  $$J^{(i)}(\Theta) = -\log (\hat y[x^{(i+1)}])$$
  the negative logarithm of the probability assigned by the network to the right word / next word.

- For the sake of more frequent updates, and since BPTT over long sequences is very expensive, training is done in smaller units, which do not necessarily have the same length.
- The unit is typically one or more sentences, or, if the length allows and we have enough material, a paragraph can be a good candidate.
- Initial state in case of the time-series units: if a unit boundary falls inside a coherent piece of text, it is important to _transfer the hidden state_ from the previous unit; in other cases the state can be initialized with some fixed value.
- (Somewhat misleading) terminology: the length of the "time" unit is called _time step_, but certain implementations sometimes call it _minibatch_, though that would generally mean the number of units processed in one go for the sake of computational efficiency.


### LSTM as layers

+ An LSTM - however strange that may sound - can be considered to be a complete layer. Its most important parameter is the "number of (memory) units", which is the length of the hidden state vector, and thus the memory capacity. **Warning: this does not have any relationship to input size, thus can be considered a freely chosen parameter.**
+ It is quite widespread to use multiple LSTM layers ("stacked LSTMs") -- as in the case of ConvNets the hope is, that the layers learn a hierarchy of abstract representations:

<img src="http://wenchenli.github.io/assets/img/GNMT_residual.png" width=60%>

(on the right side a network is shown with skip/residual connections!)

In this case it makes sense not to take only the final output $h$ (or the final output plus the inner state vector $c$) from the LSTM for a sequence, but to **ask it to output the whole sequence of outputs**, so that the next layer can also operate on full sequences. Please bear this in mind during implementation, since this is a common source of failure.
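
The shape issue can be sketched without any deep learning library. The layer below is a plain tanh RNN written in NumPy, not an LSTM and not the Keras implementation; the flag name `return_sequences` is borrowed from the Keras LSTM argument, and all sizes and weights are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_rnn_layer(x, w_in, w_rec, return_sequences):
    """Run a tanh RNN over x of shape (time_steps, input_size)."""
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        outputs.append(h)
    # Either every hidden state, or only the last one.
    return np.stack(outputs) if return_sequences else h

time_steps, input_size, units = 14, 100, 8  # toy sizes
x = rng.normal(size=(time_steps, input_size))

w_in_1 = rng.normal(size=(input_size, units))
w_rec_1 = rng.normal(size=(units, units))
w_in_2 = rng.normal(size=(units, units))
w_rec_2 = rng.normal(size=(units, units))

# The first layer must emit one vector per time step so that the second
# layer again receives a sequence of shape (time_steps, units).
seq = simple_rnn_layer(x, w_in_1, w_rec_1, return_sequences=True)
last = simple_rnn_layer(seq, w_in_2, w_rec_2, return_sequences=False)

print(seq.shape, last.shape)
```

If the first layer returned only its last state, the second layer would receive a single vector instead of a sequence and would have nothing to iterate over.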



## An LSTM language model in Keras

For this task the inspiration comes from the famous [reference work of Andrej Karpathy](https://karpathy.github.io/2015/05/21/rnn-effectiveness/). 

Note that in this case we will not use regularization, since we are willing to overfit for the sake of playing with the text. This is now an "overfitting competition", so _not_ generally good practice!

## Reader

In [1]:
import numpy as np
import tensorflow as tf
import nltk

from numpy.random import seed
seed(1212)

tf.random.set_seed(1234)

In [2]:
nltk.download("brown")

from nltk.corpus import brown

# This can be an important parameter, so be aware of it...
max_seq_length = 15
max_num_of_sents = 57200
# max_num_of_sents = 50 # How many sentences should we read from the corpus (max=57200)

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.


In [3]:
# building a vocab of word-id from the corpus
def generate_brown_word_to_id_map():
    """Return a dictionary mapping downcased Brown-words to their ids.
    Numbering starts from 1 since we use 0 for masking (!!!).
    """
    words = set()
    for word in brown.words():
        words.add(word.lower())
    word_to_id = {word: idx + 1 for idx, word in enumerate(sorted(words))}
    print(word_to_id)
    return word_to_id

In [4]:
class BrownReader:
    """A reader class for the Brown corpus.
    """

    def __init__(self):
        self.word_to_id_map = generate_brown_word_to_id_map()
        self.id_to_word_map = {idx: word for word, idx in self.word_to_id_map.items()}

    def n_words(self):
        return len(self.word_to_id_map)

    def sentence_to_ids(self, sentence):
        """Return the word ids of a sentence.
        """
        return [self.word_to_id_map[word.lower()] for word in sentence]
        
    def sentences(self):
        """Generator yielding features from the Brown corpus.
        """
        return (self.sentence_to_ids(sentence) for sentence in brown.sents())

    def sentence_matrixes(self):
        x = np.zeros((max_num_of_sents, max_seq_length-1))
        y = np.zeros((max_num_of_sents, max_seq_length-1))
        sents = self.sentences()
        for idx, sent in enumerate(sents):
            if idx == max_num_of_sents:
                break
            np_array = np.asarray(sent)
            length  = min(max_seq_length, len(np_array))
            x[idx, :length - 1] = np_array[:length - 1]
            y[idx, :length - 1] = np_array[1:length]
        return x, y


## Model

### Parameters

In [5]:
br = BrownReader()
n_words = br.n_words()

max_input_length = max_seq_length - 1 # since our x/y input does not contain the last/first element of the sentences



In [6]:
data_x, data_y = br.sentence_matrixes()

In [7]:
data_x

array([[44607., 19054., 11761., ..., 36512., 34919., 15516.],
       [44607., 24984., 19127., ..., 10490.,   394., 48772.],
       [44607., 39779., 44460., ..., 24899., 15092., 35724.],
       ...,
       [44607., 15373.,   394., ...,  3812., 44853.,  4846.],
       [44607., 29804., 31265., ..., 31265., 44607., 36201.],
       [44851., 33246., 48868., ..., 33115., 31265., 21958.]])

In [8]:
data_y

array([[19054., 11761., 20188., ..., 34919., 15516., 35066.],
       [24984., 19127., 38597., ...,   394., 48772., 20797.],
       [39779., 44460., 24984., ..., 15092., 35724., 45229.],
       ...,
       [15373.,   394., 30201., ..., 44853.,  4846., 45143.],
       [29804., 31265.,  6419., ..., 44607., 36201., 36609.],
       [33246., 48868., 48671., ..., 31265., 21958., 49302.]])

In [9]:
data_y = np.expand_dims(data_y, -1) # It seems that Keras needs this for the "one-cold" and softmax dims to match

In [10]:
brown.sents()

[['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', 'Friday', 'an', 'investigation', 'of', "Atlanta's", 'recent', 'primary', 'election', 'produced', '``', 'no', 'evidence', "''", 'that', 'any', 'irregularities', 'took', 'place', '.'], ['The', 'jury', 'further', 'said', 'in', 'term-end', 'presentments', 'that', 'the', 'City', 'Executive', 'Committee', ',', 'which', 'had', 'over-all', 'charge', 'of', 'the', 'election', ',', '``', 'deserves', 'the', 'praise', 'and', 'thanks', 'of', 'the', 'City', 'of', 'Atlanta', "''", 'for', 'the', 'manner', 'in', 'which', 'the', 'election', 'was', 'conducted', '.'], ...]

* At each time-step, the RNN tries to predict what is the next word given the previous words. 
* The dataset $\mathbf{X} = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of words in the training set.
* $\mathbf{Y} = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is the same list of words but shifted one word forward. 
* At every time-step $t$, $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.  The prediction at time $t$ is the same as the input at time $t + 1$.
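
The shift can be sketched on a single short sentence. The helper `to_xy` and the word ids are hypothetical; the logic follows the same slicing as `sentence_matrixes` above, with a smaller `max_seq_length` so the padding is visible.

```python
import numpy as np

max_seq_length = 6  # toy value; the notebook uses 15

def to_xy(word_ids):
    """Zero-padded input/target rows for one sentence of word ids."""
    x = np.zeros(max_seq_length - 1, dtype=int)
    y = np.zeros(max_seq_length - 1, dtype=int)
    length = min(max_seq_length, len(word_ids))
    x[:length - 1] = word_ids[:length - 1]  # every word except the last
    y[:length - 1] = word_ids[1:length]     # the same words shifted by one
    return x, y

x, y = to_xy([7, 3, 9, 2])  # hypothetical ids of a four-word sentence
print(x)  # [7 3 9 0 0]
print(y)  # [3 9 2 0 0]
```

The trailing zeros are padding, which is why id 0 is reserved and the vocabulary numbering starts from 1.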

# Tasks

See below

In [11]:
# Network parameters

lstm_size = 512
embedding_size = 100

### Network

In [12]:
# Import the necessary libraries
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Input, Embedding, Dense
from tensorflow.keras.models import Model
from tensorflow.keras import backend as be
from tensorflow.keras.optimizers import Adadelta, Adam, SGD
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.initializers import glorot_normal
from tensorflow.keras.models import load_model

In [13]:
# reset the graph
be.clear_session()
tf.compat.v1.reset_default_graph()


# Input layer 
inputs = Input(shape = (max_input_length,))


# embedding layer
# we set mask_zero=True so that id 0 (padding) is masked; the ids in the word_to_id mapping start from 1,
# which is why the input dimension of the embedding layer is (n_words+1)
# our input_length is fixed (size of the sentence/sequence)
embedding_layer = Embedding(n_words+1,embedding_size, input_length=max_input_length, mask_zero=True)(inputs)


# first LSTM layer. 
# If we're to build a stacked LSTM layer, then we would need to access the hidden state output for each time step. 
# This can be done by setting return_sequences argument to True when defining our LSTM layer.
lstm_1 = LSTM(lstm_size, activation='relu', kernel_initializer=glorot_normal(seed=19), return_sequences = True)(embedding_layer)


# For second LSTM layer return state is True
# It will return 3 tensors (the series of outputs, last hidden state, last cell state)
output, hidden_state, cell_state = LSTM(lstm_size, activation='relu', kernel_initializer=glorot_normal(seed=19), return_sequences = True, return_state=True)(lstm_1)


# the output is a sequence of probability distributions over the words in vocabulary.(softmax)
# E.g: In a binary class classification problem, we have 2 possible outcomes so we have
# 2 units in output layer.
# In language modelling,  the 'next' word could be any of the words in the vocab.
# so to predict a word in a sentence, there are vocab size possible words that can be predicted.
predictions = Dense(n_words+1, activation = 'softmax')(output)

model = Model(inputs,predictions)
model.summary()

Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 14)]              0         
_________________________________________________________________
embedding (Embedding)        (None, 14, 100)           4981600   
_________________________________________________________________
lstm (LSTM)                  (None, 14, 512)           1255424   
_________________________________________________________________
lstm_1 (LSTM)                [(None, 14, 512), (None,  2099200   
_________________________________________________________________
dense (Dense)                (None, 14, 49816)         25555608  
Total params: 33,891,832
Trainable params: 33,891,832
Non-trainable params: 0
_________________________________________________________________


The shape of the ground truth is **(sent_length,   )** while that of the output is **(sent_length, vocab_size)**. So there is a shape discrepancy, which is handled, in this case, by using sparse categorical crossentropy loss.
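
What the sparse variant does can be sketched in NumPy. This is an illustration of the idea, not the Keras implementation, and all sizes and values are random toy data. With integer targets the loss simply picks out the probability of the correct word id, which gives the same number as the cross-entropy against explicit one-hot targets.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, vocab = 2, 3, 5  # toy sizes

# Hypothetical softmax outputs of shape (batch, steps, vocab).
logits = rng.normal(size=(batch, steps, vocab))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Integer targets of shape (batch, steps): one word id per time step.
targets = rng.integers(0, vocab, size=(batch, steps))

# Sparse form: pick the probability of the target id directly.
# The added trailing axis plays the same role as np.expand_dims(data_y, -1).
picked = np.take_along_axis(probs, targets[..., None], axis=-1)[..., 0]
sparse_loss = -np.log(picked)

# Dense form: build one-hot targets and apply the usual cross-entropy sum.
one_hot = np.eye(vocab)[targets]
dense_loss = -(one_hot * np.log(probs)).sum(axis=-1)

print(np.allclose(sparse_loss, dense_loss))
```

The sparse form avoids materializing a one-hot tensor of shape (batch, steps, vocab), which matters when the vocabulary has tens of thousands of entries.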

### Error, optimizer, compilation

In [14]:
# Loss 
loss = "sparse_categorical_crossentropy"

# Optimizer
optimizer = Adam(learning_rate=0.0001) 
 
# Compilation
model.compile(loss=loss, optimizer= optimizer)

### Training

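We prepare the training data.

In [15]:
# Add the trailing axis only if cell 9 has not already done so,
# otherwise data_y would end up with two extra dimensions.
if data_y.ndim == 2:
    data_y = np.expand_dims(data_y, -1) # It seems that Keras needs this for the "one-cold" and softmax dims to match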

And train! 

In [None]:
# Fit a language model to the data!
# Use 10% validation - not so important in case of language models.
# Use the default validation split of Keras.
# And try to guess a realistic batch size!


checkpoint_filepath = '/content/drive/MyDrive/temp'

# we want the model to overfit, hence save the weights corresponding to minimum training loss..
model_checkpoint_callback = ModelCheckpoint(filepath=checkpoint_filepath, save_weights_only=True, monitor='loss', mode='min', save_best_only=True)

# Loads the weights (IF NEEDED)
model.load_weights(checkpoint_filepath)


history = model.fit(data_x, data_y,
                    epochs=1,
                    validation_split=0.1,
                    batch_size=64,
                    verbose=1,
                    shuffle=False,
                    use_multiprocessing=True,
                    callbacks=[model_checkpoint_callback])



I trained the model for about 140 epochs. After that, Colab disconnected and reported that I had exceeded the GPU limits. Fortunately, I had used a checkpoint to save the weights to Drive. The final epoch was run on the CPU, which took 1 hour.

In [16]:

from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [19]:
model.load_weights(checkpoint_filepath)

<tensorflow.python.training.tracking.util.CheckpointLoadStatus at 0x7f110c8d1e80>

## Demo 1: Predict next word

In [29]:
# Prediction
############

def str_to_input(s):
    """Convert a string to appropriate model input.
    """
    words = [x.lower() for x in s.split()[:max_input_length]]
    ids = [br.word_to_id_map[word] for word in words]
    ids_array = np.asarray(ids)
    length = min(max_input_length, len(ids_array))
    result = np.zeros((1, max_input_length))
    result[0, :length] = ids_array[:length]
    return result, length
    

while True:
    s = input("\nEnter a few starting words of a sentence or <return> to stop: ")
    if s == "":
        break
    else:
        try:
            x, length = str_to_input(s)
            predictions = model.predict(x)
            probs = predictions[0][length - 1]
            most_probable = np.argmax(probs)
            print("Predicted next word:", br.id_to_word_map[most_probable])
        except KeyError:
            print("Unknown words -- please try again!")


Enter a few starting words of a sentence or <return> to stop: The Fulton County Grand Jury said Friday an investigation of 
Predicted next word: atlanta's

Enter a few starting words of a sentence or <return> to stop: The jury further said in term-end presentments that the
Predicted next word: city

Enter a few starting words of a sentence or <return> to stop: The jury further said in term-end presentments that the City
Predicted next word: executive

Enter a few starting words of a sentence or <return> to stop: 


## Demo 2: Similarity of sentences

First we define a function that extracts the final cell state of the second LSTM layer for an input sentence:

In [30]:
input_layer = model.get_layer("input_1")
lstm_2_layer = model.get_layer("lstm_1")

cell_state_fun = be.function([input_layer.input],[lstm_2_layer.output[2]])

def get_embedding(x):
    """Return the final cell state associated with the input.
       Returns the last cell state as a vector.
    """
    return cell_state_fun([x])[0].flatten()

Then we use these vectors to calculate the cosine similarity between sentences.

In [31]:
def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

while True:
    s1 = input("\nEnter the first sentence or <return> to quit: ")
    if s1 == "": break
    s2 = input("\nEnter the second sentence: ")
    try:
        x1, _ = str_to_input(s1)
        x2, _ = str_to_input(s2)
        e1 = get_embedding(x1)
        e2 = get_embedding(x2)
        print("The cosine similarity between the two sentences is", cos_sim(e1, e2))
    except KeyError:
        print("Unknown words -- please try again!")


Enter the first sentence or <return> to quit: The jury further said in term-end presentments that the City

Enter the second sentence: The jury further said in term-end presentments that the City
The cosine similarity between the two sentences is 1.0

Enter the first sentence or <return> to quit: The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election  produced no evidence that any irregularities took place. The jury further said in term-end presentments that the City Executive Committee.

Enter the second sentence: The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election  produced no evidence that any irregularities took place. The jury further said in term-end presentments that the City Executive Committee.
The cosine similarity between the two sentences is 0.99999994

Enter the first sentence or <return> to quit: said Friday an investigation of

Enter the second sentence:  City Executive Committee.
Unknown

## Demo 3: Mini search engine

We use the library [Annoy](https://github.com/spotify/annoy) published by Spotify to create a vector space index of the Brown corpus from the LSTM's cell state. We assign a vector for each sentence, and then store it to be able to run nearest neighbor queries on it. With this we effectively created a **semantic search engine**.

(There are multiple solutions for approximate nearest neighbor search at scale which are worth looking into, one of them is [FAISS](https://code.fb.com/data-infrastructure/faiss-a-library-for-efficient-similarity-search/) from Facebook Research.)
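
For intuition, here is a sketch of the exact, brute-force search that such libraries approximate: compare the query with every stored vector and keep the closest ones. The function name and the tiny 2-dimensional vectors are made up for illustration; in the notebook the vectors are 512-dimensional LSTM cell states.

```python
import numpy as np

def nearest_by_cosine(query, vectors, k):
    """Indices of the k rows of `vectors` with the highest cosine similarity to `query`."""
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    similarities = unit_vectors @ unit_query
    return np.argsort(-similarities)[:k]

# Hypothetical stand-ins for sentence vectors.
vectors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
neighbors = nearest_by_cosine(np.array([1.0, 0.05]), vectors, k=2)
print(neighbors)
```

This exact search costs one pass over all stored vectors per query, which is fine for a corpus of this size but becomes the bottleneck at scale; approximate indexes trade a little accuracy for much faster lookups.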

In [32]:
def brown_sent_to_input(ids):
  ids_array = np.asarray(ids)
  length = min(max_input_length, len(ids_array))
  result = np.zeros((1, max_input_length))
  result[0, :length] = ids_array[:length]
  return result, length

In [33]:
sentlist = list(br.sentences())

In [34]:
!pip install annoy



In [35]:
INDEX_COVERAGE_PERCENT = 1.0 # How much of the corpus do you want to index? 1.0 means the whole corpus, 0.5 means half.
NEAREST_NEIGHBOR_NUM = 5

In [36]:
from annoy import AnnoyIndex
from tqdm import tqdm

index = AnnoyIndex(512, metric="angular")

for i in tqdm(range(int(len(sentlist)*INDEX_COVERAGE_PERCENT))):
  inputs,length = brown_sent_to_input(sentlist[i])
  vector = get_embedding(inputs)
  index.add_item(i,vector)

print("Building index...")
index.build(100)
print("Index done, ready to query!")

100%|██████████| 57340/57340 [1:03:29<00:00, 15.05it/s]


Building index...
Index done, ready to query!


In [37]:
def print_brown_index(sentences, indices):
  for i in indices:
    word_ids_list = sentences[i]
    for j in word_ids_list:
      print(br.id_to_word_map[j]+" ", end='')
    print()

    

In [38]:
while True:
  query = input("\nEnter the query or <return> to quit: ")
  if query == "": break
  try:
    in_ids, length = str_to_input(query)
    in_vector = get_embedding(in_ids)
    nearest_sentence_indices = index.get_nns_by_vector(in_vector, NEAREST_NEIGHBOR_NUM)
    #print("nearest indices:", nearest_sentence_indices)
    print_brown_index(sentlist, nearest_sentence_indices)

  except KeyError:
    print("Unknown words -- please try again!")


Enter the query or <return> to quit: The Fulton County Grand Jury said Friday an investigation of Atlanta's recent primary election
the fulton county grand jury said friday an investigation of atlanta's recent primary election produced `` no evidence '' that any irregularities took place . 
fair dealer humphrey upped the ante , asked cloture power for a mere majority of senators . 
for winning larson will receive a $100 u.s. savings bond from the junior achievement national organization . 
now if one hydrogen atom were placed at the surface of a large sphere of hydrogen atoms , it would be subject both to the gravitation of the sphere and the charge-excess of all those atoms in the sphere . 
arnold palmer , the defending champion , lost his title on the 72nd hole after a few minutes of misfortune that left even his fellow pros gaping in disbelief . 

Enter the query or <return> to quit: 
