# Part 2
In the first part, we have seen how to compute joint probability of a sentence using the independence assumption.
However, this assumption is too weak as words do depend on each other. In the second part, we would like to introduce a better approach using Markov chain.

# Markov chain

Markov chain is based on the following assumption: *given the present, the future does not depend on the past*.

##### Zeroth order Markov chain
The simplest form of Markov chain is so called zeroth order Markov chain where words are completely independent from each other:

$p(w_1\ w_2\ \dots\ w_{|T|}) \approx p(w_1)\ p(w_2)\ p(w_3)\ \dots \ p(w_{|T|}) = \prod^{|T|}_{i=1}p(w_i)$

Recall that this is similar to the independence assumption covered in the first part.
In literature, this probabilistic model is also known as unigram language model.

##### First order Markov chain:
In the first order Markov chain, a probability of a word is conditioned only on the preceeding word:

$p(w_1\ w_2\ \dots\ w_{|T|}) \approx p(w_1)\ p(w_2|w_1)\ p(w_2|w_3)\ \dots \ p(w_{|T|}|w_{|T|-1}) = p(w_1)\prod^{|T|}_{i=2}p(w_i|w_{i-1})$

where conditional probabilities are estimated as:

$p(w_i|w_{i-1})=\frac{count(w_{i-1}\ w_{i})}{count(w_{i-1})}$

In literature, this probabilistic model is called bigram (or 2-gram) language model.

Similarly, in the N-th order Markov chain, a probability of a word is conditioned only on the preceeding N words.


For each of the following tasks we will continue working with PTB dataset.

### Task 1: Computing probability of a sentence (first order Markov chain)
Write a function called **compute_prob1** which takes a sentence as input (and maybe some additional arguments) and returns its probability estimated using the first order Markov chain. 

Your function should first append **\<bos>** and **\<eos>** symbols to the input sentence  and replace all words that are not present in the dictionary with **\<unk>** symbol. 

If the original sentence was "my name is madina", then "madina" is obviously not in the dictionary (see the description of the PTB). The probability of such sentence can be computed as:
  p("my name is madina") = p("my" | "\<bos>") * p("name" | "my") * p("is" | "name") * p("\<unk>" | "is") * p("\<eos>" | "\<unk>")
  

In [1]:
with open('ptb.train.txt') as f:
   raw_data = f.read()  

In [2]:
data = raw_data.split('\n')
data = [sent for sent in data if sent != '']

In [3]:
stop = ('...', '.', '?', '!', '!!!')
data2 = data
data2 = ['<bos>'+sent+'<eos>' for sent in data2]

In [4]:
dictionary = dict()
for i in range(len(data2)):
    words = data2[i].split()
    for word in words:
        dictionary[word] = dictionary.get(word, 0) + 1
print(dictionary)



In [5]:
dictionary2 = dict()
for i in range(len(data2)):
    words = data2[i].split()
    print(words)
    #for word in words:
        #print(word)
        #if word[i], word[i-1] in dictionary:
        #   dictionary2[word[i], word[i-1]]+=1
        #else:
        #    dictionary2[word[i], word[i-1]]=1
print(dictionary)

['<bos>', 'aer', 'banknote', 'berlitz', 'calloway', 'centrust', 'cluett', 'fromstein', 'gitano', 'guterman', 'hydro-quebec', 'ipo', 'kia', 'memotec', 'mlx', 'nahb', 'punts', 'rake', 'regatta', 'rubens', 'sim', 'snack-food', 'ssangyong', 'swapo', 'wachter', '<eos>']
['<bos>', 'pierre', '<unk>', 'N', 'years', 'old', 'will', 'join', 'the', 'board', 'as', 'a', 'nonexecutive', 'director', 'nov.', 'N', '<eos>']
['<bos>', 'mr.', '<unk>', 'is', 'chairman', 'of', '<unk>', 'n.v.', 'the', 'dutch', 'publishing', 'group', '<eos>']
['<bos>', 'rudolph', '<unk>', 'N', 'years', 'old', 'and', 'former', 'chairman', 'of', 'consolidated', 'gold', 'fields', 'plc', 'was', 'named', 'a', 'nonexecutive', 'director', 'of', 'this', 'british', 'industrial', 'conglomerate', '<eos>']
['<bos>', 'a', 'form', 'of', 'asbestos', 'once', 'used', 'to', 'make', 'kent', 'cigarette', 'filters', 'has', 'caused', 'a', 'high', 'percentage', 'of', 'cancer', 'deaths', 'among', 'a', 'group', 'of', 'workers', 'exposed', 'to', 'it', 

['<bos>', 'a', 'spokesman', 'for', 'the', 'guild', 'said', 'the', 'union', "'s", 'lawyers', 'are', 'reviewing', 'the', 'suit', '<eos>']
['<bos>', 'he', 'said', 'disciplinary', 'proceedings', 'are', 'confidential', 'and', 'declined', 'to', 'comment', 'on', 'whether', 'any', 'are', 'being', 'held', 'against', 'mr.', 'trudeau', '<eos>']
['<bos>', 'mr.', 'trudeau', "'s", 'attorney', 'norman', 'k.', '<unk>', 'said', 'the', '<unk>', 'consists', 'mainly', 'of', 'the', 'guild', "'s", '<unk>', 'threats', 'of', 'disciplinary', 'action', '<eos>']
['<bos>', 'mr.', '<unk>', 'said', 'a', 'guild', 'disciplinary', 'hearing', 'is', 'scheduled', 'next', 'monday', 'in', 'new', 'york', '<eos>']
['<bos>', 'mr.', '<unk>', 'who', 'will', 'go', 'before', 'the', 'disciplinary', 'panel', 'said', 'the', 'proceedings', 'are', 'unfair', 'and', 'that', 'any', 'punishment', 'from', 'the', 'guild', 'would', 'be', '<unk>', '<eos>']
['<bos>', 'in', 'addition', 'to', 'the', 'damages', 'the', 'suit', 'seeks', 'a', 'court

['<bos>', 'the', 'action', 'on', 'poland', 'came', 'as', 'the', 'conference', 'separately', 'approved', '$', 'N', 'million', 'for', 'international', 'population', 'planning', 'activities', 'an', 'N', 'N', 'increase', 'over', 'fiscal', 'N', '<eos>']
['<bos>', 'the', 'house', 'and', 'senate', 'are', 'divided', 'over', 'whether', 'the', 'united', 'nations', 'population', 'fund', 'will', 'receive', 'any', 'portion', 'of', 'these', 'appropriations', 'but', 'the', 'size', 'of', 'the', 'increase', 'is', 'itself', 'significant', '<eos>']
['<bos>', 'in', 'a', 'second', 'area', 'of', 'common', 'concern', 'the', 'world', 'environment', 'an', 'additional', '$', 'N', 'million', 'will', 'be', 'provided', 'in', 'development', 'assistance', 'to', 'fund', 'a', 'series', 'of', 'initiatives', 'related', 'both', 'to', 'global', 'warming', 'and', 'the', 'plight', 'of', 'the', 'african', 'elephant', '<eos>']
['<bos>', 'the', 'sweeping', 'nature', 'of', 'the', 'bill', 'draws', 'a', 'variety', 'of', 'special'

['<bos>', 'n.v', '<unk>', 'said', 'net', 'income', 'in', 'the', 'third', 'quarter', 'jumped', 'N', 'N', 'as', 'the', 'company', 'had', 'substantially', 'lower', 'extraordinary', 'charges', 'to', 'account', 'for', 'a', 'restructuring', 'program', '<eos>']
['<bos>', 'the', 'dutch', 'chemical', 'group', 'said', 'net', 'income', 'gained', 'to', 'N', 'million', 'guilders', '$', 'N', 'million', 'or', 'N', 'guilders', 'a', 'share', 'from', 'N', 'million', 'guilders', 'or', 'N', 'guilders', 'a', 'share', 'a', 'year', 'ago', '<eos>']
['<bos>', 'the', 'N', 'N', 'state-owned', '<unk>', 'had', 'eight', 'million', 'guilders', 'of', 'extraordinary', 'charges', 'in', 'the', 'latest', 'quarter', 'mainly', 'to', 'reflect', 'one-time', 'losses', 'in', 'connection', 'with', 'the', 'disposal', 'of', 'some', 'operations', '<eos>']
['<bos>', 'the', 'charges', 'were', 'offset', 'in', 'part', 'by', 'a', 'gain', 'from', 'the', 'sale', 'of', 'the', 'company', "'s", 'construction', 'division', '<eos>']
['<bos>',

['<bos>', 'today', 'the', '<unk>', '<unk>', 'has', 'mostly', 'dropped', 'the', '<unk>', 'work', 'though', 'a', 'touch', 'of', 'the', 'old', '<unk>', 'still', '<unk>', 'and', 'now', 'goes', 'on', 'the', 'road', 'with', 'piano', 'bass', 'a', 'slide', 'show', 'and', 'a', '<unk>', 'that', 'ranges', 'from', 'light', 'classical', 'to', 'light', 'jazz', 'to', 'light', 'pop', 'with', 'a', 'few', 'notable', 'exceptions', '<eos>']
['<bos>', 'just', 'the', 'thing', 'for', 'the', '<unk>', 'set', 'the', '<unk>', 'audience', 'that', 'has', 'embraced', 'new', 'age', 'as', 'its', 'very', 'own', 'easy', 'listening', '<eos>']
['<bos>', 'but', 'you', 'ca', "n't", 'dismiss', 'mr.', 'stoltzman', "'s", 'music', 'or', 'his', '<unk>', 'as', 'merely', 'commercial', 'and', '<unk>', '<eos>']
['<bos>', 'he', 'believes', 'in', 'what', 'he', 'plays', 'and', 'he', 'plays', '<unk>', '<eos>']
['<bos>', 'his', 'recent', 'appearance', 'at', 'the', 'metropolitan', 'museum', 'dubbed', 'a', 'musical', '<unk>', 'was', 'a', 

['<bos>', 'the', 'issue', 'is', 'backed', 'by', 'a', 'N', 'N', 'letter', 'of', 'credit', 'from', 'credit', 'suisse', '<eos>']
['<bos>', '<unk>', '<unk>', 'electric', 'railway', 'co', 'japan', '<eos>']
['<bos>', '$', 'N', 'million', 'of', 'bonds', 'due', 'nov.', 'N', 'N', 'with', 'equity-purchase', 'warrants', 'indicating', 'a', 'N', 'N', 'N', 'coupon', 'at', 'par', 'via', 'nomura', 'international', 'ltd', '<eos>']
['<bos>', 'each', '$', 'N', 'bond', 'carries', 'one', 'warrant', 'exercisable', 'from', 'nov.', 'N', 'through', 'nov.', 'N', 'N', 'to', 'buy', 'company', 'shares', 'at', 'an', 'expected', 'premium', 'of', 'N', 'N', 'N', 'to', 'the', 'closing', 'share', 'price', 'when', 'terms', 'are', 'fixed', 'tuesday', '<eos>']
['<bos>', 'diesel', '<unk>', 'co', 'japan', '<eos>']
['<bos>', '$', 'N', 'million', 'of', 'bonds', 'due', 'nov.', 'N', 'N', 'with', 'equity-purchase', 'warrants', 'indicating', 'a', 'N', 'N', 'N', 'coupon', 'at', 'par', 'via', 'yamaichi', 'international', 'europe', '

['<bos>', 'and', 'a', 'week', 'later', 'japan', 'raised', 'its', 'official', 'discount', 'rate', 'by', 'a', 'half', 'point', 'to', 'N', 'N', '<eos>']
['<bos>', 'the', 'japanese', 'discount', 'rate', 'is', 'the', 'central', 'bank', "'s", 'base', 'rate', 'on', 'loans', 'to', 'commercial', 'banks', '<eos>']
['<bos>', 'after', 'a', 'surprisingly', 'sharp', 'widening', 'in', 'the', 'u.s.', 'august', 'merchandise', 'trade', 'deficit', '$', 'N', 'billion', 'from', 'a', 'revised', '$', 'N', 'billion', 'in', 'july', 'and', 'well', 'above', 'expectations', 'and', 'a', 'startling', '190-point', 'drop', 'in', 'stock', 'prices', 'on', 'oct.', 'N', 'the', 'federal', 'reserve', '<unk>', 'short-term', 'interest', 'rates', 'knocking', 'fed', 'funds', 'from', 'around', 'N', 'N', 'to', 'N', 'N', 'N', '<eos>']
['<bos>', 'but', 'predictions', 'that', 'central', 'banks', 'of', 'the', 'group', 'of', 'seven', '<unk>', 'major', 'industrial', 'nations', 'would', 'continue', 'their', 'massive', 'dollar', 'sales'

['<bos>', 'under', 'the', 'firm', "'s", 'original', 'bank', 'credit', 'agreement', 'it', 'was', 'required', 'to', 'raise', '$', 'N', 'million', 'of', 'subordinated', 'debt', 'to', 'be', 'used', 'to', 'repay', 'some', 'of', 'the', 'bank', 'borrowings', 'drawn', 'to', 'redeem', '$', 'N', 'million', 'of', 'increasing', 'rate', 'debentures', 'in', 'august', '<eos>']
['<bos>', 'a', 'month', 'ago', 'when', 'beatrice', 'first', 'filed', 'to', 'sell', 'debt', 'the', 'company', 'had', 'planned', 'to', 'offer', '$', 'N', 'million', 'of', 'its', 'senior', 'subordinated', 'reset', 'notes', 'at', 'a', 'yield', 'of', 'N', 'N', 'N', '<eos>']
['<bos>', 'the', '$', 'N', 'million', 'in', 'senior', 'subordinated', 'floating-rate', 'notes', 'were', 'targeted', 'to', 'be', 'offered', 'at', 'a', 'price', 'to', 'float', 'four', 'percentage', 'points', 'above', 'the', 'three-month', 'libor', '<eos>']
['<bos>', 'by', 'october', 'however', 'market', 'conditions', 'had', 'deteriorated', 'and', 'the', 'reset', 'n

['<bos>', 'the', 'recent', 'outcry', 'over', 'program', 'trading', 'will', 'cast', 'a', 'pall', 'over', 'the', '<unk>', 'environment', 'in', 'the', 'coming', 'months', 'some', 'analysts', 'say', '<eos>']
['<bos>', 'the', 'public', 'is', 'very', 'close', 'to', 'having', 'had', 'it', 'mr.', '<unk>', 'says', '<eos>']
['<bos>', 'investors', 'pulled', 'back', 'from', 'bond', 'funds', 'in', 'september', 'too', '<eos>']
['<bos>', 'net', 'sales', 'of', 'bond', 'funds', 'for', 'the', 'month', 'totaled', '$', 'N', 'billion', 'down', 'two-thirds', 'from', '$', 'N', 'billion', 'in', 'august', '<eos>']
['<bos>', 'the', 'major', 'reason', 'heavy', 'outflows', 'from', 'high-risk', 'high-yield', 'junk', 'bond', 'funds', '<eos>']
['<bos>', 'big', 'withdrawals', 'from', 'the', 'junk', 'funds', 'have', 'continued', 'this', 'month', '<eos>']
['<bos>', 'overall', 'net', 'sales', 'of', 'all', 'mutual', 'funds', 'excluding', 'money', 'market', 'funds', 'fell', 'to', '$', 'N', 'billion', 'in', 'september', 'f

['<bos>', 'the', 'company', 'reported', 'third-quarter', 'operating', 'profit', 'of', 'N', 'cents', 'a', 'share', 'compared', 'with', 'N', 'cents', 'a', 'share', 'a', 'year', 'earlier', '<eos>']
['<bos>', 'a', 'third-quarter', 'charge', 'of', '$', 'N', 'million', 'related', 'to', 'planned', 'restaurant', 'closings', 'resulted', 'in', 'a', 'net', 'loss', 'for', 'the', 'quarter', '<eos>']
['<bos>', 'employers', 'casualty', 'which', 'reported', 'a', '$', 'N', 'million', 'third-quarter', 'loss', 'late', 'friday', 'fell', 'N', 'N', 'to', 'N', 'N', '<eos>']
['<bos>', 'the', 'loss', 'was', 'largely', 'due', 'to', 'a', '$', 'N', 'million', 'addition', 'to', 'reserves', '<eos>']
['<bos>', 'employers', 'casualty', 'had', 'a', 'loss', 'of', '$', 'N', 'million', 'in', 'the', 'year-earlier', 'quarter', '<eos>']
['<bos>', 'old', 'stone', 'fell', 'N', 'N', 'to', 'N', 'N', '<eos>']
['<bos>', 'late', 'friday', 'the', 'company', 'reported', 'a', 'loss', 'of', '$', 'N', 'million', 'for', 'the', 'third', 

['<bos>', 'over', 'the', 'next', 'few', 'years', 'i', 'would', 'advise', 'caution', '<eos>']
['<bos>', 'in', 'a', '<unk>', 'published', 'book', 'on', 'the', 'territory', 'a', 'political', 'economist', '<unk>', '<unk>', 'has', 'derived', 'three', 'future', 'scenarios', 'from', 'interviews', 'with', 'N', 'hong', 'kong', 'government', 'officials', 'and', 'businessmen', '<eos>']
['<bos>', 'nearly', 'half', 'of', 'them', 'argue', 'that', 'hong', 'kong', "'s", 'uneasy', 'relationship', 'with', 'china', 'will', '<unk>', 'though', 'not', 'inhibit', 'long-term', 'economic', 'growth', '<eos>']
['<bos>', 'the', 'rest', 'are', 'split', 'roughly', 'between', '<unk>', 'who', 'expect', 'hong', 'kong', 'to', '<unk>', 'along', 'as', 'before', 'and', '<unk>', 'who', '<unk>', '<unk>', 'chaos', '<eos>']
['<bos>', 'the', 'interviews', 'took', 'place', 'two', 'years', 'ago', '<eos>']
['<bos>', 'since', 'the', 'china', 'crisis', 'erupted', 'mr.', '<unk>', 'says', 'the', 'scenario', 'as', '<unk>', 'by', 'the'

['<bos>', 'the', 'opposition', 'labor', 'party', 'leader', 'neil', '<unk>', 'in', 'a', 'display', 'of', 'the', 'male', '<unk>', 'typical', 'of', 'the', 'british', 'lower', 'class', 'denounced', 'mrs.', 'thatcher', 'for', 'having', 'an', 'independent', 'mind', 'and', 'refusing', 'to', '<unk>', 'the', 'men', 'in', 'her', 'cabinet', '<eos>']
['<bos>', 'the', 'british', 'press', 'making', 'a', 'mountain', 'out', 'of', 'a', '<unk>', '<unk>', 'an', 'unnecessary', 'economic', 'crisis', 'by', 'portraying', 'mrs.', 'thatcher', 'as', 'an', '<unk>', 'who', 'had', 'thrown', 'economic', 'policy', 'into', 'confusion', 'by', 'driving', 'a', 'respected', 'figure', 'from', 'her', 'government', '<eos>']
['<bos>', 'behind', 'the', 'silly', '<unk>', 'lies', 'a', 'real', 'dispute', '<eos>']
['<bos>', 'mr.', 'lawson', 'and', 'his', '<unk>', 'colleagues', 'want', 'the', 'british', 'pound', 'formally', 'tied', 'to', 'the', 'west', 'german', 'mark', '<eos>']
['<bos>', 'sir', 'alan', 'considers', 'this', 'an', 

['<bos>', 'this', 'clause', 'they', 'argue', 'is', 'designed', 'to', 'go', 'beyond', 'an', 'earlier', 'clause', '<unk>', 'that', 'the', 'president', 'can', 'veto', 'a', 'bill', 'and', 'is', 'broad', 'enough', 'to', 'allow', 'him', 'to', 'strike', 'out', 'items', 'and', 'riders', 'within', 'bills', '<eos>']
['<bos>', 'senate', 'minority', 'leader', 'robert', 'dole', 'r.', 'kan.', 'for', 'one', '<unk>', 'this', 'argument', 'and', 'earlier', 'this', 'year', 'publicly', 'urged', 'mr.', 'bush', 'to', 'use', 'the', 'line-item', 'veto', 'and', 'allow', 'the', 'courts', 'to', 'decide', 'whether', 'or', 'not', 'it', 'is', 'constitutional', '<eos>']
['<bos>', 'there', "'s", 'little', 'doubt', 'that', 'such', 'a', 'move', 'would', 'be', 'immediately', 'challenged', 'in', 'court', 'and', 'that', 'it', 'would', 'quickly', 'make', 'its', 'way', 'to', 'the', 'supreme', 'court', 'to', 'be', 'ultimately', 'resolved', '<eos>']
['<bos>', 'it', "'s", 'a', 'major', 'issue', 'and', 'they', 'would', "n't", '

['<bos>', 'but', 'while', 'analysts', 'say', 'that', 'municipal', 'bonds', 'still', 'offer', 'good', 'value', 'you', 'would', "n't", 'know', 'it', 'by', 'the', 'way', 'institutional', 'investors', 'are', 'rushing', 'to', 'dump', 'their', 'holdings', '<eos>']
['<bos>', 'bond', 'market', 'analysts', 'say', 'the', 'institutional', 'selling', 'was', 'triggered', 'by', 'several', 'factors', '<eos>']
['<bos>', 'big', 'banks', 'such', 'as', 'chemical', 'bank', 'and', 'chase', 'manhattan', 'which', 'have', 'been', 'taking', 'heavy', 'charges', 'to', 'expand', 'their', 'third', 'world', 'loan-loss', 'reserves', 'are', "n't", 'looking', 'for', 'tax-exempt', 'income', '<eos>']
['<bos>', 'we', 'do', "n't", 'need', 'the', 'shelter', 'of', 'tax-free', 'bonds', 'said', 'a', 'spokeswoman', 'at', 'chemical', '<eos>']
['<bos>', 'in', 'recent', 'weeks', 'traders', 'said', 'chemical', 'has', 'sold', 'more', 'than', '$', 'N', 'billion', 'of', 'tax-free', 'bonds', '<eos>']
['<bos>', 'the', 'spokeswoman', 'c

['<bos>', 'wall', 'street', 'generally', 'likes', 'the', 'industry', 'again', '<eos>']
['<bos>', 'the', 'appetite', 'for', '<unk>', 'stocks', 'has', 'been', 'especially', 'strong', 'although', 'some', 'got', 'hit', 'yesterday', 'when', 'shearson', 'lehman', 'hutton', 'cut', 'its', 'short-term', 'investment', 'ratings', 'on', 'them', '<eos>']
['<bos>', 'contractors', 'such', 'as', 'parker', 'drilling', 'co.', 'are', 'raising', 'cash', 'again', 'through', 'stock', 'offerings', 'and', 'for', 'the', 'first', 'time', 'in', 'years', 'two', '<unk>', 'companies', 'recently', 'went', 'public', '<eos>']
['<bos>', 'they', 'are', 'grace', 'energy', 'corp.', 'of', 'dallas', 'and', 'marine', 'drilling', 'co.', 'of', 'houston', '<eos>']
['<bos>', 'most', 'oil', 'companies', 'are', 'still', 'reluctant', 'to', 'add', 'to', 'the', 'office', 'and', 'professional', 'staffs', 'they', 'slashed', 'so', 'deeply', '<eos>']
['<bos>', 'but', 'a', 'few', 'new', 'spots', 'are', 'opening', '<eos>']
['<bos>', 'arthu

['<bos>', 'now', 'she', 'says', 'she', "'s", 'thinking', 'of', '<unk>', 'her', 'own', 'insurance', 'agent', '<eos>']
['<bos>', 'for', 'ms.', 'johnson', 'dealing', 'with', 'the', 'earthquake', 'has', 'been', 'more', 'than', 'just', 'a', 'work', 'experience', '<eos>']
['<bos>', 'she', 'lives', 'in', 'oakland', 'a', 'community', 'hit', 'hard', 'by', 'the', 'earthquake', '<eos>']
['<bos>', 'she', 'did', "n't", 'have', 'hot', 'water', 'for', 'five', 'days', '<eos>']
['<bos>', 'the', 'apartment', 'she', 'shares', 'with', 'a', '<unk>', 'daughter', 'and', 'her', 'sister', 'was', 'rattled', 'books', 'and', 'crystal', 'hit', 'the', 'floor', 'but', 'nothing', 'was', 'severely', 'damaged', '<eos>']
['<bos>', 'her', 'sister', 'cynthia', 'wishes', '<unk>', 'had', 'a', 'different', 'job', '<eos>']
['<bos>', 'we', 'worry', 'about', 'her', 'out', 'there', 'cynthia', 'says', '<eos>']
['<bos>', 'last', 'sunday', 'ms.', 'johnson', 'finally', 'got', 'a', 'chance', 'to', 'water', 'her', 'plants', 'but', 'st

['<bos>', 'paribas', 'says', 'it', 'will', 'offer', 'N', 'francs', '$', 'N', 'each', 'for', 'navigation', 'mixte', 'shares', 'that', 'enjoy', 'full', 'dividend', 'rights', 'and', 'N', 'francs', 'each', 'for', 'a', 'block', 'of', 'shares', 'issued', 'july', 'N', 'which', 'will', 'receive', 'only', 'partial', 'dividends', 'this', 'year', '<eos>']
['<bos>', 'alternatively', 'it', 'is', 'to', 'offer', 'three', 'paribas', 'shares', 'for', 'one', 'navigation', 'mixte', 'share', '<eos>']
['<bos>', 'the', 'paribas', 'offer', 'values', 'navigation', 'mixte', 'at', 'about', 'N', 'billion', 'francs', 'depending', 'on', 'how', 'many', 'of', 'navigation', 'mixte', "'s", 'warrants', 'are', 'converted', 'into', 'shares', 'during', 'the', 'takeover', 'battle', '<eos>']
['<bos>', 'blockbuster', 'entertainment', 'corp.', 'said', 'it', 'raised', '$', 'N', 'million', 'from', 'an', 'offering', 'of', 'liquid', 'yield', 'option', 'notes', '<eos>']
['<bos>', 'the', 'gross', 'proceeds', 'from', 'the', 'sale', 

['<bos>', 'perhaps', 'in', 'time', 'the', 'supreme', 'court', 'will', 'correct', 'them', '<eos>']
['<bos>', 'but', 'writing', 'history', 'is', 'tough', 'enough', 'without', 'judges', '<unk>', 'throwing', 'obstacles', 'in', 'the', 'scholar', "'s", 'path', '<eos>']
['<bos>', 'mr.', '<unk>', 'is', 'albert', '<unk>', 'professor', 'of', 'the', '<unk>', 'at', 'the', 'city', 'university', 'of', 'new', 'york', 'and', 'a', 'winner', 'of', '<unk>', 'prizes', 'in', 'history', 'and', '<unk>', '<eos>']
['<bos>', '<unk>', '<unk>', 'N', 'years', 'old', 'senior', 'vice', 'president', 'marketing', 'at', '<unk>', 'entertainment', 'inc.', 'was', 'named', 'president', 'of', 'capitol', 'records', 'inc.', 'a', 'unit', 'of', 'this', 'entertainment', 'concern', '<eos>']
['<bos>', 'mr.', '<unk>', 'succeeds', 'david', '<unk>', 'who', 'resigned', 'last', 'month', '<eos>']
['<bos>', 'legal', '<unk>', 'in', 'america', 'have', 'a', 'way', 'of', 'assuming', 'a', '<unk>', 'significance', 'far', 'exceeding', 'what', '

['<bos>', 'their', 'reluctance', 'to', 'support', 'the', 'proposal', 'is', 'another', 'blow', 'to', 'the', 'capital-gains', 'cut', 'which', 'has', 'had', 'a', 'roller-coaster', 'existence', 'since', 'the', 'beginning', 'of', 'the', 'year', 'when', 'it', 'was', 'considered', 'dead', 'and', 'then', 'suddenly', 'revived', 'and', 'was', 'passed', 'by', 'the', 'house', '<eos>']
['<bos>', 'nevertheless', 'oregon', 'sen.', 'bob', 'packwood', 'the', 'ranking', 'gop', 'member', 'on', 'the', '<unk>', 'senate', 'finance', 'committee', 'last', 'night', 'introduced', 'his', 'plan', 'as', 'an', 'amendment', 'to', 'a', 'pending', 'measure', '<unk>', 'u.s.', 'aid', 'for', 'poland', 'and', 'hungary', '<eos>']
['<bos>', 'senate', 'majority', 'leader', 'george', 'mitchell', 'd.', 'maine', 'was', 'confident', 'he', 'had', 'enough', 'votes', 'to', 'block', 'the', 'maneuver', 'on', 'procedural', 'grounds', 'perhaps', 'as', 'soon', 'as', 'today', '<eos>']
['<bos>', 'mr.', 'packwood', 'all', 'but', 'conceded'

['<bos>', 'the', 'soviet', 'union', 'has', 'purchased', 'roughly', 'eight', 'million', 'tons', 'of', 'grain', 'this', 'month', 'and', 'is', 'expected', 'to', 'take', 'delivery', 'by', 'year', 'end', 'analysts', 'said', '<eos>']
['<bos>', 'cotton', '<eos>']
['<bos>', 'futures', 'prices', 'rose', 'modestly', 'but', 'trading', 'volume', 'was', "n't", 'very', 'heavy', '<eos>']
['<bos>', 'the', 'december', 'contract', 'settled', 'at', 'N', 'cents', 'a', 'pound', 'up', 'N', 'cent', 'but', 'it', 'rose', 'as', 'high', 'as', 'N', 'cents', '<eos>']
['<bos>', 'several', 'cotton', 'analysts', 'said', 'that', 'the', 'move', 'appeared', 'to', 'be', 'mostly', 'technical', '<eos>']
['<bos>', 'traders', 'who', 'had', 'sold', 'contracts', 'earlier', 'in', 'hopes', 'of', 'buying', 'them', 'back', 'at', 'lower', 'prices', 'yesterday', 'were', 'buying', 'contracts', 'back', 'at', 'higher', 'prices', 'to', 'limit', 'their', 'losses', '<eos>']
['<bos>', 'floor', 'traders', 'also', 'said', 'that', 'the', 'mar

['<bos>', 'accepted', 'bids', 'ranged', 'from', 'N', 'N', 'to', 'N', 'N', '<eos>']
['<bos>', 'however', 'citicorp', 'said', 'that', 'the', 'average', 'rate', 'fell', 'to', 'N', 'N', 'at', 'its', '$', 'N', 'million', 'auction', 'of', '<unk>', 'commercial', 'paper', 'from', 'N', 'N', 'at', 'last', 'week', "'s", 'sale', '<eos>']
['<bos>', 'bids', 'totaling', '$', 'N', 'million', 'were', 'submitted', '<eos>']
['<bos>', 'accepted', 'bids', 'were', 'all', 'at', 'N', 'N', '<eos>']
['<bos>', 'the', 'bank', 'holding', 'company', 'will', 'auction', 'another', '$', 'N', 'million', 'in', 'each', 'maturity', 'next', 'tuesday', '<eos>']
['<bos>', 'hughes', 'aircraft', 'co.', 'a', 'general', 'motors', 'corp.', 'unit', 'said', 'the', '<unk>', '<unk>', 'commercial', 'communications', 'satellite', 'is', 'set', 'to', 'be', 'launched', 'friday', '<eos>']
['<bos>', 'the', 'satellite', 'built', 'by', 'hughes', 'for', 'the', 'international', 'telecommunications', 'satellite', 'organization', 'is', 'part', 'o

['<bos>', '<unk>', 'hyman', 'vice', 'president', 'of', 'equity', 'research', 'for', 'first', 'boston', 'corp.', 'expects', 'p&g', 'to', 'post', 'net', 'of', 'about', '$', 'N', 'a', 'share', 'on', 'a', '<unk>', 'basis', '<eos>']
['<bos>', 'but', 'i', "'m", 'recognizing', 'there', "'s", 'a', 'good', 'chance', 'they', "'ll", 'do', 'a', 'bit', 'better', 'than', 'that', 'she', 'says', '<eos>']
['<bos>', 'in', 'fiscal', 'N', 'p&g', 'earned', '$', 'N', 'a', 'share', 'adjusted', 'for', 'the', 'stock', 'split', '<eos>']
['<bos>', 'one', 'big', 'factor', 'affecting', 'the', 'fiscal', 'second', 'half', 'will', 'be', 'the', 'new', '<unk>', 'of', 'edwin', 'l.', '<unk>', 'who', 'becomes', 'chairman', 'and', 'chief', 'executive', 'officer', 'in', 'january', '<eos>']
['<bos>', 'because', 'of', 'his', 'remarkable', 'success', 'turning', 'around', 'p&g', "'s", 'international', 'operations', 'analysts', 'have', 'high', 'hopes', 'for', 'his', 'tenure', '<eos>']
['<bos>', 'if', 'he', 'does', 'to', 'the', '

['<bos>', 'homefed', 'had', 'been', 'one', 'of', 'the', 'handful', 'of', 'large', 'west', 'coast', 'thrifts', 'that', 'in', 'recent', 'quarters', 'had', '<unk>', 'interest-rate', 'problems', '<unk>', 'the', 'industry', 'by', 'keeping', 'a', 'lid', 'on', 'problem', 'assets', 'and', 'lending', 'heavily', 'into', 'the', '<unk>', 'california', 'housing', 'market', '<eos>']
['<bos>', 'analysts', 'had', 'been', 'projecting', 'fully', 'diluted', 'earnings', 'in', 'the', 'third', 'quarter', 'in', 'the', 'range', 'of', 'about', '$', 'N', 'a', 'share', '<eos>']
['<bos>', 'however', 'homefed', "'s", 'loan', '<unk>', 'and', 'purchases', 'plunged', 'N', 'N', 'in', 'the', 'quarter', 'to', '$', 'N', 'billion', 'from', '$', 'N', 'billion', 'a', 'year', 'earlier', '<eos>']
['<bos>', 'meanwhile', '<unk>', 'assets', 'rose', 'to', '$', 'N', 'million', 'from', '$', 'N', 'million', '<eos>']
['<bos>', 'some', '$', 'N', 'million', 'of', 'the', 'troubled', 'assets', 'is', '<unk>', 'real', 'estate', 'a', 'N', '

['<bos>', 'stock', 'prices', 'swung', 'wildly', 'as', 'the', 'market', 'reacted', 'to', 'an', 'initial', 'plunge', 'by', 'ual', 'shares', 'followed', 'by', 'a', 'sharp', 'rebound', 'in', 'the', 'afternoon', '<eos>']
['<bos>', 'the', 'dow', 'jones', 'industrials', 'down', 'over', 'N', 'points', 'in', 'the', 'morning', 'closed', 'off', 'N', 'at', 'N', '<eos>']
['<bos>', 'bond', 'prices', 'surged', 'in', 'reaction', 'to', 'the', 'sell-off', 'in', 'stocks', 'then', 'eased', 'slightly', 'during', 'the', 'afternoon', 'recovery', '<eos>']
['<bos>', 'the', 'dollar', 'finished', 'lower', '<eos>']
['<bos>', 'ual', "'s", 'stock', 'regained', 'most', 'of', 'an', 'early', 'loss', 'amid', 'speculation', 'one', 'or', 'more', 'investors', 'may', 'challenge', 'the', 'airline', "'s", 'decision', 'to', 'stay', 'independent', '<eos>']
['<bos>', 'the', 'stock', 'closed', 'down', '$', 'N', 'at', '$', 'N', 'after', 'plunging', '$', 'N', 'to', '$', 'N', '<eos>']
['<bos>', 'ford', 'may', 'seek', 'all', 'of', '

['<bos>', 'its', 'chief', '<unk>', '<unk>', '<unk>', '<unk>', '<unk>', 'his', '<unk>', 'the', '<unk>', 'for', 'its', 'tactical', '<unk>', 'at', '<unk>', 'out', 'of', 'horrible', 'positions', '<eos>']
['<bos>', 'd.t.', 'also', 'has', 'a', '<unk>', 'and', '<unk>', 'memory', 'is', 'utterly', '<unk>', 'and', 'could', "n't", 'be', '<unk>', 'by', 'the', '<unk>', '<unk>', '<unk>', 'spread', 'around', 'the', 'playing', 'hall', 'in', 'the', 'new', 'york', 'academy', 'of', 'art', '<eos>']
['<bos>', 'in', 'fact', 'd.t.', 'never', 'left', 'home', '<unk>', 'mellon', 'university', 'in', 'pittsburgh', 'but', '<unk>', 'with', 'its', 'human', '<unk>', 'by', 'telephone', 'link', '<eos>']
['<bos>', 'they', 'conceded', 'that', 'the', 'odds', 'favored', 'mr.', 'kasparov', 'but', 'they', 'put', 'their', 'hope', 'in', 'd.t.', "'s", 'recently', 'enhanced', 'capacity', 'for', '<unk>', 'positions', 'up', 'to', 'a', 'million', 'per', 'second', 'from', 'N', '<eos>']
['<bos>', 'but', 'the', '<unk>', 'mistakenly', 

['<bos>', 'atlantic', 'richfield', '<eos>']
['<bos>', 'citing', 'its', 'reduced', 'ownership', 'in', 'the', 'lyondell', 'petrochemical', 'co.', 'atlantic', 'richfield', 'reported', 'that', 'net', 'income', 'slid', 'N', 'N', 'in', 'the', 'third', 'quarter', 'to', '$', 'N', 'million', 'or', '$', 'N', 'a', 'share', 'from', '$', 'N', 'million', 'or', '$', 'N', 'a', 'share', 'for', 'the', 'comparable', 'period', 'last', 'year', '<eos>']
['<bos>', 'sales', 'fell', 'N', 'N', 'to', '$', 'N', 'billion', 'from', '$', 'N', 'billion', '<eos>']
['<bos>', 'arco', "'s", 'earnings', 'from', 'its', 'N', 'N', 'stake', 'in', 'lyondell', 'fell', 'to', '$', 'N', 'million', 'from', '$', 'N', 'million', 'for', 'the', 'same', 'period', 'last', 'year', 'when', 'lyondell', 'was', 'wholly', 'owned', '<eos>']
['<bos>', 'offsetting', 'the', 'lower', 'stake', 'in', 'lyondell', 'were', 'higher', 'crude', 'oil', 'prices', 'increased', 'natural', 'gas', 'volumes', 'and', 'higher', 'coke', 'prices', 'the', 'company', '

['<bos>', 'the', 'house', 'appropriations', 'committee', 'approved', 'a', '$', 'N', 'billion', 'aid', 'package', 'for', 'the', 'quake', 'region', 'less', 'than', 'the', '$', 'N', 'billion', 'sought', 'by', 'california', 'officials', '<eos>']
['<bos>', 'hungary', 'declared', 'itself', 'a', 'democracy', 'and', 'for', 'the', 'first', 'time', 'openly', '<unk>', 'the', 'anniversary', 'of', 'the', 'N', '<unk>', '<unk>', 'that', 'was', 'crushed', 'by', 'the', 'soviet', 'union', '<eos>']
['<bos>', 'a', 'crowd', 'estimated', 'at', 'N', 'held', 'a', '<unk>', 'march', 'through', '<unk>', 'as', 'acting', 'president', '<unk>', 'delivered', 'a', 'nationally', 'televised', 'address', 'rejecting', 'communist', 'dominance', '<eos>']
['<bos>', 'about', 'N', 'east', 'germans', 'marched', 'in', 'leipzig', 'and', 'thousands', 'more', 'staged', 'protests', 'in', 'three', 'other', 'cities', 'in', 'a', 'fresh', 'challenge', 'to', 'the', 'communist', 'leadership', 'to', 'introduce', 'democratic', 'freedoms', '

['<bos>', 'its', 'N', 'workers', 'who', 'had', 'battled', 'tiger', "'s", 'management', 'for', 'years', 'over', '<unk>', 'were', 'union', 'members', 'until', 'the', 'day', 'of', 'the', 'merger', 'when', 'most', 'of', 'their', 'unions', 'were', 'automatically', '<unk>', '<eos>']
['<bos>', 'soon', 'after', 'the', 'merger', 'moreover', 'federal', "'s", 'management', 'asked', 'tiger', "'s", 'pilots', 'to', 'sign', 'an', 'agreement', '<unk>', 'that', 'they', 'could', 'be', 'fired', 'any', 'time', 'without', 'cause', 'or', 'notice', '<eos>']
['<bos>', 'when', 'the', 'pilots', 'refused', 'the', 'company', '<unk>', 'it', '<eos>']
['<bos>', 'mr.', 'smith', 'angered', 'federal', "'s", 'pilots', 'too', '<eos>']
['<bos>', 'in', 'his', '<unk>', 'to', 'seal', 'the', 'deal', 'with', 'tiger', 'chairman', 'saul', 'steinberg', 'last', 'august', 'mr.', 'smith', 'ignored', 'a', 'promise', 'that', 'he', 'had', 'made', 'to', 'his', 'own', 'pilots', 'three', 'years', 'ago', 'that', 'any', '<unk>', 'acquired',

['<bos>', '<unk>', 'soda', 'co', 'japan', '$', 'N', 'million', 'of', 'eurobonds', 'due', 'nov.', 'N', 'N', 'with', 'equity-purchase', 'warrants', 'indicating', 'a', 'N', 'N', 'coupon', 'at', 'par', 'via', 'nomura', 'international', 'ltd', '<eos>']
['<bos>', 'each', '$', 'N', 'bond', 'carries', 'one', 'warrant', 'exercisable', 'from', 'nov.', 'N', 'N', 'through', 'oct.', 'N', 'N', 'to', 'buy', 'company', 'shares', 'at', 'an', 'expected', 'premium', 'of', 'N', 'N', 'N', 'to', 'the', 'closing', 'share', 'price', 'when', 'terms', 'are', 'fixed', 'oct.', 'N', '<eos>']
['<bos>', 'for', 'bankers', 'and', 'regulators', 'arizona', 'is', 'looking', 'more', 'like', 'texas', 'every', 'day', '<eos>']
['<bos>', 'on', 'friday', 'los', 'angeles-based', 'first', 'interstate', 'bancorp', 'said', 'it', 'expects', 'a', 'net', 'loss', 'of', '$', 'N', 'million', 'for', 'the', 'third', 'quarter', 'of', 'N', 'because', 'of', 'hemorrhaging', 'at', 'its', 'first', 'interstate', 'bank', 'of', 'arizona', 'unit', 

['<bos>', 'but', 'not', 'all', 'strategists', 'or', 'money', 'managers', 'are', 'ready', 'to', 'throw', 'in', 'the', 'towel', 'completely', 'on', '<unk>', '<eos>']
['<bos>', 'growth', 'stocks', 'may', '<unk>', 'cyclical', 'stocks', 'next', 'year', 'if', 'the', 'federal', 'reserve', 'begins', 'to', 'let', 'interest', 'rates', '<unk>', 'sufficiently', 'lower', 'to', 'boost', 'the', 'economy', '<eos>']
['<bos>', 'goldman', 'sachs', "'s", 'mr.', 'einhorn', 'for', 'one', '<unk>', 'to', 'that', 'scenario', '<eos>']
['<bos>', 'he', 'suggests', 'investors', 'think', 'about', 'buying', 'cyclical', 'shares', 'in', 'the', 'weeks', 'ahead', 'as', 'well', 'as', 'growth', 'issues', '<eos>']
['<bos>', 'friday', "'s", 'market', 'activity', '<eos>']
['<bos>', 'stock', 'prices', 'finished', 'about', 'unchanged', 'friday', 'in', 'quiet', 'expiration', 'trading', '<eos>']
['<bos>', 'traders', 'anticipated', 'a', 'volatile', 'session', 'due', 'to', 'the', 'october', 'expiration', 'of', 'stock-index', 'futu

['<bos>', 'the', 'incident', 'occurred', 'saturday', 'night', '<eos>']
['<bos>', 'the', 'sandinista', 'government', 'and', 'the', '<unk>', '<unk>', 'agreed', 'in', 'march', 'to', 'suspend', 'offensive', 'operations', 'but', 'there', 'has', 'been', 'sporadic', 'fighting', '<eos>']
['<bos>', 'scientists', 'have', 'isolated', 'a', '<unk>', 'that', 'may', 'hold', 'potential', 'as', 'a', 'treatment', 'for', 'disruptions', 'of', 'the', 'immune', 'system', 'ranging', 'from', '<unk>', 'rejection', 'to', '<unk>', 'and', '<unk>', '<unk>', 'corp.', 'said', '<eos>']
['<bos>', 'the', '<unk>', 'is', 'the', 'mouse', 'version', 'of', 'a', 'protein', 'called', 'the', '<unk>', '<unk>', 'which', 'directs', 'the', 'growth', 'and', 'function', 'of', 'white', 'blood', 'cells', '<eos>']
['<bos>', 'died', 'alfred', '<unk>', 'N', 'former', 'president', 'of', 'the', 'federal', 'reserve', 'bank', 'of', 'new', 'york', 'saturday', 'in', 'new', '<unk>', 'conn', '<eos>']
['<bos>', 'contel', 'corp.', 'said', 'third-q

['<bos>', 'he', 'said', 'rorer', 'sold', 'the', 'drugs', 'for', 'nice', 'prices', 'and', 'will', 'record', 'a', 'combined', 'pretax', 'gain', 'on', 'the', 'sales', 'of', '$', 'N', 'million', '<eos>']
['<bos>', 'as', 'the', 'gain', 'from', 'the', 'sales', 'indicates', 'operating', 'profit', 'was', 'significantly', 'below', 'the', 'year-earlier', 'level', 'mr.', '<unk>', 'said', '<eos>']
['<bos>', 'rorer', 'in', 'july', 'had', 'projected', 'lower', 'third-quarter', 'operating', 'profit', 'but', 'higher', 'profit', 'for', 'all', 'of', 'N', '<eos>']
['<bos>', 'he', 'said', 'the', 'company', 'is', 'still', 'looking', 'for', 'a', 'strong', 'fourth', 'quarter', 'in', 'all', 'areas', 'sales', 'operating', 'income', 'and', 'net', 'income', '<eos>']
['<bos>', 'mr.', '<unk>', 'attributed', 'the', 'decline', 'in', 'third-quarter', 'operating', 'profit', 'to', 'the', 'stronger', 'dollar', 'which', 'reduces', 'the', 'value', 'of', 'overseas', 'profit', 'when', 'it', 'is', 'translated', 'into', 'doll

['<bos>', 'for', 'more', 'than', 'a', 'decade', 'banks', 'have', 'been', 'pressing', 'congress', 'and', 'banking', 'regulators', 'for', 'expanded', 'powers', 'to', 'act', 'like', 'securities', 'firms', 'in', 'playing', 'wall', 'street', "'s", 'lucrative', 'takeover', 'game', 'from', 'giving', 'mergers', 'advice', 'all', 'the', 'way', 'to', 'selling', 'and', 'trading', 'high-yield', 'junk', 'bonds', '<eos>']
['<bos>', 'those', 'expanded', 'powers', 'reached', 'their', 'zenith', 'in', 'july', 'when', 'bankers', 'trust', 'new', 'york', 'corp.', 'provided', 'mergers', 'advice', 'an', 'equity', 'investment', 'and', 'bank', 'loans', 'for', 'the', '$', 'N', 'billion', 'leveraged', 'buy-out', 'of', 'northwest', 'airlines', 'parent', 'nwa', 'inc', '<eos>']
['<bos>', 'one', 'of', 'the', 'major', 'selling', 'points', 'used', 'by', 'los', 'angeles', 'financier', 'alfred', '<unk>', 'in', 'getting', 'the', 'takeover', 'approved', 'was', 'that', 'the', 'deal', 'did', "n't", 'include', 'any', 'junk', 

['<bos>', 'the', 'catalyst', 'has', 'been', 'the', 'congressional', 'move', 'to', 'restore', '<unk>', 'tax', 'treatment', 'for', 'capital', 'gains', 'an', 'effort', 'that', 'is', 'likely', 'to', 'succeed', 'in', 'this', 'congress', '<eos>']
['<bos>', 'other', 'fundamental', 'reforms', 'of', 'the', 'N', 'act', 'have', 'been', 'threatened', 'as', 'well', '<eos>']
['<bos>', 'the', 'house', 'seriously', 'considered', 'raising', 'the', 'top', 'tax', 'rate', 'paid', 'by', 'individuals', 'with', 'the', 'highest', 'incomes', '<eos>']
['<bos>', 'the', 'senate', 'finance', 'committee', 'voted', 'to', 'expand', 'the', 'deduction', 'for', 'individual', 'retirement', 'accounts', 'and', 'also', 'to', 'bring', 'back', 'income', 'averaging', 'for', 'farmers', 'a', 'tax', 'preference', 'that', 'allows', 'income', 'to', 'be', 'spread', 'out', 'over', 'several', 'years', '<eos>']
['<bos>', 'as', 'part', 'of', 'the', 'same', 'bill', 'the', 'finance', 'panel', 'also', 'voted', 'in', 'favor', 'of', 'billion

['<bos>', 'new', 'york', 'times', 'co.', "'s", 'third-quarter', 'earnings', 'report', 'is', 'reinforcing', 'analysts', "'", 'belief', 'that', 'newspaper', 'publishers', 'will', 'be', 'facing', 'continued', 'poor', 'earnings', 'comparisons', 'through', 'N', '<eos>']
['<bos>', 'the', 'publisher', 'was', 'able', 'to', 'register', 'soaring', 'quarter', 'net', 'income', 'because', 'of', 'a', '<unk>', 'gain', 'on', 'the', 'sale', 'of', 'its', 'cable-tv', 'system', '<eos>']
['<bos>', 'however', 'operating', 'profit', 'fell', 'N', 'N', 'to', '$', 'N', 'million', '<eos>']
['<bos>', 'the', 'decline', 'reflected', 'the', 'expense', 'of', 'buying', 'three', 'magazines', 'lower', 'earnings', 'from', 'the', 'forest-products', 'group', 'and', 'what', 'is', 'proving', 'to', 'be', 'a', 'nagging', 'major', 'problem', 'continued', 'declines', 'in', 'advertising', '<unk>', 'at', 'the', 'new', 'york', 'times', 'the', 'company', "'s", 'flagship', 'daily', 'newspaper', '<eos>']
['<bos>', 'in', 'composite', '

['<bos>', 'land', 'and', 'other', 'real', 'estate', 'land', 'on', 'which', 'primary', 'home', 'is', 'built', 'investment', 'property', '<eos>']
['<bos>', 'consumer', '<unk>', 'automobiles', 'appliances', 'furniture', '<eos>']
['<bos>', 'bank', 'deposits', 'currency', '<unk>', 'deposits', 'small', 'savings', 'and', 'time', 'deposits', 'certificates', 'of', 'deposits', 'money-market', 'fund', 'shares', '<eos>']
['<bos>', 'bonds', '<unk>', 'bond', 'funds', '<eos>']
['<bos>', '<unk>', 'funds', 'stocks', 'and', 'mutual', 'funds', 'other', 'than', 'money-market', 'funds', '<eos>']
['<bos>', '<unk>', 'business', 'partnerships', 'and', 'sole', '<unk>', 'professional', 'corporations', '<eos>']
['<bos>', 'pension', 'reserves', 'holdings', 'by', 'pension', 'funds', '<eos>']
['<bos>', 'mccaw', 'cellular', 'communications', 'inc.', 'said', 'it', 'sent', 'a', 'letter', 'to', 'lin', 'broadcasting', 'corp.', '<unk>', 'its', 'revised', 'tender', 'offer', 'for', 'lin', 'and', 'asking', 'lin', 'to', 'con

['<bos>', 'congress', 'sent', 'president', 'bush', 'an', '$', 'N', 'billion', 'fiscal', 'N', 'treasury', 'and', 'postal', 'service', 'bill', 'providing', '$', 'N', 'billion', 'for', 'the', 'internal', 'revenue', 'service', 'and', 'increasing', 'the', 'customs', 'service', "'s", '<unk>', 'program', 'nearly', 'a', 'third', '<eos>']
['<bos>', 'final', 'approval', 'came', 'on', 'a', 'simple', 'voice', 'vote', 'in', 'the', 'senate', 'and', 'the', 'swift', 'passage', '<unk>', 'with', 'months', 'of', 'negotiations', 'over', 'the', 'underlying', 'bill', 'which', 'is', '<unk>', 'with', 'special-interest', 'provisions', 'for', 'both', 'members', 'and', 'the', 'executive', 'branch', '<eos>']
['<bos>', 'an', 'estimated', '$', 'N', 'million', 'was', 'added', 'for', 'university', 'and', 'science', 'grants', 'including', '$', 'N', 'million', 'for', 'smith', 'college', '<eos>']
['<bos>', 'and', 'southwest', 'lawmakers', 'were', 'a', 'driving', 'force', 'behind', '$', 'N', 'million', 'for', '<unk>', 'b

['<bos>', 'that', 'includes', 'the', '<unk>', 'funds', 'and', 'the', 'federal', 'housing', 'administration', 'which', 'loans', 'out', 'money', 'for', 'private', 'home', 'mortgages', 'and', 'has', 'just', 'been', 'discovered', 'to', 'be', '$', 'N', 'billion', 'in', 'the', 'hole', '<eos>']
['<bos>', 'selling', 'the', 'fha', "'s", 'loan', 'portfolio', 'to', 'the', 'highest', 'bidder', 'would', 'save', 'the', 'taxpayers', '<unk>', 'billions', 'in', 'future', 'losses', '<eos>']
['<bos>', 'some', 'hud', 'money', 'actually', 'does', '<unk>', 'down', 'to', 'the', 'poor', 'and', '<unk>', 'out', 'housing', 'middlemen', 'would', 'free', 'up', 'more', 'money', 'for', 'public', 'housing', 'tenants', 'to', 'manage', 'and', 'even', 'own', 'their', 'units', '<eos>']
['<bos>', 'the', 'rest', 'ought', 'to', 'be', 'used', 'to', 'clean', 'out', 'drugs', 'from', 'the', '<unk>', '<eos>']
['<bos>', 'rival', 'gangs', 'have', 'turned', 'cities', 'into', 'combat', 'zones', '<eos>']
['<bos>', 'even', 'suburban',

['<bos>', 'nbc', 'broadcast', 'throughout', 'the', 'entire', 'night', 'and', 'did', 'not', 'go', 'off', 'the', 'air', 'until', 'noon', 'yesterday', '<eos>']
['<bos>', 'the', 'quake', 'postponed', 'the', 'third', 'and', 'fourth', 'games', 'of', 'the', 'world', 'series', '<eos>']
['<bos>', 'in', 'place', 'of', 'the', 'games', 'abc', 'said', 'it', 'planned', 'to', 'broadcast', 'next', 'week', "'s", 'episodes', 'of', 'its', 'prime-time', 'wednesday', 'and', 'thursday', '<unk>', 'except', 'for', 'a', 'one-hour', 'special', 'on', 'the', 'earthquake', 'at', 'N', 'p.m.', 'last', 'night', '<eos>']
['<bos>', 'the', 'series', 'is', 'scheduled', 'to', 'resume', 'tuesday', 'evening', 'in', 'san', 'francisco', '<eos>']
['<bos>', 'there', 'are', 'no', 'commercials', 'to', 'make', 'up', 'for', 'since', 'we', "'re", 'going', 'to', 'eventually', 'broadcast', 'the', 'world', 'series', 'said', 'a', 'network', 'spokesman', '<eos>']
['<bos>', 'pinnacle', 'west', 'capital', 'corp.', 'said', 'it', 'suspended'

['<bos>', 'in', 'texas', 'after', 'hurricane', '<unk>', 'major', 'grocery', 'chains', 'used', 'their', 'truck', 'fleets', 'to', 'ship', 'essential', 'goods', 'to', 'houston', 'no', '<unk>', 'just', 'good', 'will', '<eos>']
['<bos>', 'tom', '<unk>', '<eos>']
['<bos>', '<unk>', 'texas', '<eos>']
['<bos>', 'we', 'here', 'in', 'the', 'affected', 'areas', 'were', '<unk>', 'by', 'mr.', 'laband', "'s", 'analysis', 'of', 'time', 'values', 'and', 'his', 'comparisons', 'of', 'effectiveness', 'concerning', 'research', 'and', 'development', '<eos>']
['<bos>', 'his', 'theoretical', 'approach', 'and', 'its', 'publication', 'in', 'this', 'venerable', 'paper', 'are', 'no', 'doubt', 'a', '<unk>', '<unk>', 'for', 'him', '<eos>']
['<bos>', 'too', 'bad', 'theory', 'fails', 'in', 'practice', '<eos>']
['<bos>', 'we', 'consumers', 'tend', 'to', 'have', 'long', 'memories', '<eos>']
['<bos>', 'the', 'businesses', '<unk>', 'to', 'mr.', 'laband', "'s", 'effective', 'price', 'system', 'will', 'be', 'remembered', 

['<bos>', 'experts', 'on', 'sales', 'technique', 'say', 'anyone', 'representing', 'a', 'troubled', 'company', 'must', 'walk', 'a', 'fine', 'line', '<eos>']
['<bos>', 'if', 'a', 'salesman', '<unk>', 'his', 'credibility', 'in', 'this', 'time', 'of', 'trouble', 'it', 'will', 'be', 'a', 'problem', 'for', 'the', 'long', 'run', 'says', 'george', '<unk>', 'a', '<unk>', 'nev.', 'sales', 'consultant', 'and', 'author', 'of', 'the', 'marketing', 'edge', '<eos>']
['<bos>', 'still', 'says', 'john', 'sullivan', 'a', 'management', '<unk>', 'with', 'daniel', 'roberts', 'inc.', 'of', 'boston', 'who', 'has', 'held', 'senior', 'sales', 'positions', 'at', 'polaroid', 'and', '<unk>', 'the', 'customer', 'will', 'react', 'to', 'strength', '<eos>']
['<bos>', 'ignore', 'the', 'present', 'condition', '<eos>']
['<bos>', 'show', 'it', "'s", 'business', 'as', 'usual', '<eos>']
['<bos>', 'that', 'is', "n't", 'easy', '<eos>']
['<bos>', 'wang', "'s", 'customers', 'are', 'data', 'processing', 'managers', 'who', 'want'

['<bos>', 'the', 'contract', 'was', 'negotiated', 'by', 'the', 'countries', "'", 'two', 'prime', 'ministers', 'and', 'was', 'supposed', 'to', 'be', 'free', 'of', 'commissions', 'or', 'agents', "'", 'costs', '<eos>']
['<bos>', 'in', 'april', 'N', 'evidence', 'surfaced', 'that', 'commissions', 'were', 'paid', '<eos>']
['<bos>', 'the', 'opposition', 'charged', 'that', 'the', 'money', 'was', 'used', 'to', 'bribe', 'indian', 'government', 'officials', 'an', '<unk>', 'denied', 'by', 'mr.', 'gandhi', "'s", 'administration', '<eos>']
['<bos>', 'but', 'many', 'of', 'his', 'statements', 'on', 'the', 'issue', 'in', 'parliament', 'subsequently', 'were', 'proven', 'wrong', 'by', '<unk>', 'evidence', '<eos>']
['<bos>', 'the', 'scandal', 'has', 'faded', 'and', '<unk>', 'but', 'recent', 'disclosures', 'propelled', 'it', 'back', 'onto', 'the', 'front', 'pages', 'and', 'that', 'has', 'helped', '<unk>', 'the', 'opposition', 'which', 'last', 'week', 'blocked', 'passage', 'of', 'two', 'constitutional', 'am

['<bos>', 'there', "'s", 'quite', 'a', 'bit', 'of', 'value', 'left', 'in', 'the', 'jaguar', 'shares', 'here', 'even', 'though', 'they', 'have', 'run', 'up', 'lately', 'says', 'doug', 'johnson', 'a', 'fund', 'manager', 'for', '<unk>', '<unk>', 'asset', 'management', '<eos>']
['<bos>', 'at', 'the', 'moment', 'he', 'intends', 'to', 'keep', 'the', 'firm', "'s", 'N', 'jaguar', 'shares', '<eos>']
['<bos>', 'the', 'risk', 'is', 'that', 'jaguar', "'s", 'share', 'price', 'could', 'slump', 'if', 'gm', "'s", 'agreement', 'with', 'jaguar', 'effectively', '<unk>', 'out', 'its', 'u.s.', 'rival', '<eos>']
['<bos>', 'ford', "'s", 'appetite', 'to', 'attack', 'jaguar', 'could', 'gradually', '<unk>', 'over', 'time', 'particularly', 'if', 'saab', 'is', 'a', 'reasonably', 'attractive', 'proposition', 'says', 'john', 'lawson', 'an', 'auto', 'analyst', 'at', 'london', "'s", 'nomura', 'research', 'institute', '<eos>']
['<bos>', 'he', 'thinks', 'saab-scania', 'ab', 'on', 'friday', 'will', 'announce', 'the', 's

In [None]:
def compute_prob0(sentence, dictionary_prob):
    prob = 1
    for word in sentence.split():
        if word in dictionary:
            prob = prob*dictionary_prob[word]
        else:
            prob = prob * dictionary_prob['<unk>']
    return prob

If you completed the task correctly then your function should compute:

p("my name is madina") = 8.512130344833542e-10

p("hello how are you") = 6.89010961921847e-13

p("this is an estimate") = 4.418442586905126e-10

In [13]:
p("my" | "\<bos>") * p("name" | "my") * p("is" | "name") * p("\<unk>" | "is") * p("\<eos>" | "\<unk>")

NameError: name 'p' is not defined

### Task 2: Predicting the most likely next words (Autocomplete)
In this task, you will implement autocomplete function which is similar to the message completion service in mobile phones.

Write a function **autofill** which takes a word and integer k as input (and maybe some additional arguments) and returns k most probable words to follow it according to the estimates computed by first order Markov chain.


If you completed the task correctly, your function should return that the 5 most probable words to follow "san" are \['francisco', '\<unk>', 'jose', 'diego', 'antonio'\]

### Task 3: Generate a text (zeroth order Markov chain)
Write a function **generate_text0** which takes as an input integer k (and maybe some additional arguments) and generates k sentences sampled from the probability distribution estimated by the zeroth order Markov chain. 

The length of each sentence should be at least 3 words, including \<bos> and \<eos>. For example "\<bos> is \<eos>".

\<bos> and \<eos> must not appear in the middle of a sentence.

Hint: you can use random.choices() for sampling.

If you completed the task correctly, then for k = 2 your output would look like this (but with different sentences):

['\<bos> fees property the year 's drop world died or \<unk> j. trust the which for meanwhile action economic criminal germany \<unk> by with white \<eos>',

 '\<bos> koch to a N laws \<eos>']

### Task 4: Generate a text (first order Markov chain)
Write a function **generate_text1** which takes as an input integer k (and maybe some additional arguments) and generates k sentences sampled from the probability distribution estimated using the first order Markov chain.

The length of each sentence should be at least 3 words, including \<bos> and \<eos>. For example "\<bos> is \<eos>".

\<bos> and \<eos> must not appear in the middle of a sentence.

Hint: you can use random.choices() for sampling.

If you completed the task correctly, then for k = 2 your output would look like this (but with different sentences):

['\<bos> what happened at st. louis assembly business \<eos>',

 '\<bos> integrated combines some people familiar with forecasts \<eos>',