In [1]:
sen="""1 Introduction to Machine Learning
Machine learning is a set of tools that, broadly speaking, allow us to “teach” computers how to
perform tasks by providing examples of how they should be done. For example, suppose we wish
to write a program to distinguish between valid email messages and unwanted spam. We could try
to write a set of simple rules, for example, flagging messages that contain certain features (such
as the word “viagra” or obviously-fake headers). However, writing rules to accurately distinguish
which text is valid can actually be quite difficult to do well, resulting either in many missed spam
messages, or, worse, many lost emails. Worse, the spammers will actively adjust the way they
send spam in order to trick these strategies (e.g., writing “vi@gr@”). Writing effective rules —
and keeping them up-to-date — quickly becomes an insurmountable task. Fortunately, machine
learning has provided a solution. Modern spam filters are “learned” from examples: we provide the
learning algorithm with example emails which we have manually labeled as “ham” (valid email)
or “spam” (unwanted email), and the algorithms learn to distinguish between them automatically.
Machine learning is a diverse and exciting field, and there are multiple ways of defining it:
1. The Artificial Intelligence View. Learning is central to human knowledge and intelligence,
and, likewise, it is also essential for building intelligent machines. Years of effort in AI
has shown that trying to build intelligent computers by programming all the rules cannot be
done; automatic learning is crucial. For example, we humans are not born with the ability
to understand language — we learn it — and it makes sense to try to have computers learn
language instead of trying to program it all it.
2. The Software Engineering View. Machine learning allows us to program computers by
example, which can be easier than writing code the traditional way.
3. The Stats View. Machine learning is the marriage of computer science and statistics: computational
techniques are applied to statistical problems. Machine learning has been applied
to a vast number of problems in many contexts, beyond the typical statistics problems. Machine
learning is often designed with different considerations than statistics (e.g., speed is
often more important than accuracy).
Often, machine learning methods are broken into two phases:
1. Training: A model is learned from a collection of training data.
2. Application: The model is used to make decisions about some new test data.
For example, in the spam filtering case, the training data constitutes email messages labeled as ham
or spam, and each new email message that we receive (and which to classify) is test data. However,
there are other ways in which machine learning is used as well.
Copyright c
 2011 Aaron Hertzmann and David Fleet 1
CSC 411 / CSC D11 Introduction to Machine Learning
1.1 Types of Machine Learning
Some of the main types of machine learning are:
1. Supervised Learning, in which the training data is labeled with the correct answers, e.g.,
“spam” or “ham.” The two most common types of supervised learning are classification
(where the outputs are discrete labels, as in spam filtering) and regression (where the outputs
are real-valued).
2. Unsupervised learning, in which we are given a collection of unlabeled data, which we wish
to analyze and discover patterns within. The two most important examples are dimension
reduction and clustering.
3. Reinforcement learning, in which an agent (e.g., a robot or controller) seeks to learn the
optimal actions to take based the outcomes of past actions.
There are many other types of machine learning as well, for example:
1. Semi-supervised learning, in which only a subset of the training data is labeled
2. Time-series forecasting, such as in financial markets
3. Anomaly detection such as used for fault-detection in factories and in surveillance
4. Active learning, in which obtaining data is expensive, and so an algorithm must determine
which training data to acquire
and many others.
1.2 A simple problem
Figure 1 shows a 1D regression problem. The goal is to fit a 1D curve to a few points. Which curve
is best to fit these points? There are infinitely many curves that fit the data, and, because the data
might be noisy, we might not even want to fit the data precisely. Hence, machine learning requires
that we make certain choices:
1. How do we parameterize the model we fit? For the example in Figure 1, how do we parameterize
the curve; should we try to explain the data with a linear function, a quadratic, or a
sinusoidal curve?
2. What criteria (e.g., objective function) do we use to judge the quality of the fit? For example,
when fitting a curve to noisy data, it is common to measure the quality of the fit in terms of
the squared error between the data we are given and the fitted curve. When minimizing the
squared error, the resulting fit is usually called a least-squares estimate.
Copyright c
 2011 Aaron Hertzmann and David Fleet 2
CSC 411 / CSC D11 Introduction to Machine Learning
3. Some types of models and some model parameters can be very expensive to optimize well.
How long are we willing to wait for a solution, or can we use approximations (or handtuning)
instead?
4. Ideally we want to find a model that will provide useful predictions in future situations. That
is, although we might learn a model from training data, we ultimately care about how well
it works on future test data. When a model fits training data well, but performs poorly on
test data, we say that the model has overfit the training data; i.e., the model has fit properties
of the input that are not particularly relevant to the task at hand (e.g., Figures 1 (top row and
bottom left)). Such properties are referred to as noise. When this happens we say that the
model does not generalize well to the test data. Rather it produces predictions on the test
data that are much less accurate than you might have hoped for given the fit to the training
data.
Machine learning provides a wide selection of options by which to answer these questions,
along with the vast experience of the community as to which methods tend to be successful on
a particular class of data-set. Some more advanced methods provide ways of automating some
of these choices, such as automatically selecting between alternative models, and there is some
beautiful theory that assists in gaining a deeper understanding of learning. In practice, there is no
single “silver bullet” for all learning. Using machine learning in practice requires that you make
use of your own prior knowledge and experimentation to solve problems. But with the tools of
machine learning, you can do amazing things!"""

In [2]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer


In [3]:
tokenizer=Tokenizer()

In [4]:
tokenizer.fit_on_texts([sen])

In [5]:
tokenizer.word_index

{'the': 1,
 'to': 2,
 'learning': 3,
 'of': 4,
 'a': 5,
 'and': 6,
 'we': 7,
 'data': 8,
 'is': 9,
 'in': 10,
 'machine': 11,
 'are': 12,
 'which': 13,
 'that': 14,
 '1': 15,
 'for': 16,
 'as': 17,
 'training': 18,
 'model': 19,
 'fit': 20,
 'example': 21,
 'or': 22,
 'it': 23,
 'be': 24,
 'spam': 25,
 'well': 26,
 'e': 27,
 'with': 28,
 '2': 29,
 'some': 30,
 'how': 31,
 'many': 32,
 'g': 33,
 'there': 34,
 'test': 35,
 'curve': 36,
 'email': 37,
 'such': 38,
 'can': 39,
 'do': 40,
 'has': 41,
 'learn': 42,
 'types': 43,
 'computers': 44,
 'by': 45,
 'between': 46,
 'messages': 47,
 'rules': 48,
 'writing': 49,
 'these': 50,
 '—': 51,
 'labeled': 52,
 'not': 53,
 'than': 54,
 '3': 55,
 'problems': 56,
 'csc': 57,
 'might': 58,
 'when': 59,
 'on': 60,
 'introduction': 61,
 'set': 62,
 'examples': 63,
 'program': 64,
 'distinguish': 65,
 'valid': 66,
 'try': 67,
 'an': 68,
 'from': 69,
 'provide': 70,
 'have': 71,
 'ways': 72,
 'view': 73,
 'all': 74,
 'statistics': 75,
 'often': 76,
 '

In [6]:
len(tokenizer.word_index)

426
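The index above is ranked by frequency: the most common word gets index 1, and index 0 is never assigned to a word because Keras reserves it (it is used for padding later on). The following is a rough plain-Python sketch of that ranking, not the Keras implementation. `build_word_index` is a hypothetical helper, and its punctuation handling is simplified, so it will not reproduce the Keras result exactly (for instance, the real tokenizer kept the dash character as token 51 above, whereas this sketch would drop it).

```python
import re
from collections import Counter


def build_word_index(texts):
    """Hypothetical helper: rank words by frequency, starting at index 1."""
    counts = Counter()
    for text in texts:
        # Lowercase, replace punctuation with spaces, split on whitespace.
        words = re.sub(r"[^\w\s]", " ", text.lower()).split()
        counts.update(words)
    # most_common() sorts by count; ties keep first-seen order.
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


toy_index = build_word_index(["The model fits the data.", "The data is noisy."])
print(toy_index)
```

Because index 0 is never used for a word, any layer sized by vocabulary needs `len(word_index) + 1` entries, which is where the 427 used later comes from.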

In [7]:
# Convert each line of the corpus into its sequence of word indices.
for line in sen.split('\n'):
    tokenized_sen=tokenizer.texts_to_sequences([line])[0]
    print(tokenized_sen)

[15, 61, 2, 11, 3]
[11, 3, 9, 5, 62, 4, 85, 14, 161, 162, 163, 86, 2, 164, 44, 31, 2]
[165, 166, 45, 167, 63, 4, 31, 87, 88, 24, 89, 16, 21, 168, 7, 90]
[2, 91, 5, 64, 2, 65, 46, 66, 37, 47, 6, 92, 25, 7, 169, 67]
[2, 91, 5, 62, 4, 93, 48, 16, 21, 170, 47, 14, 171, 94, 172, 38]
[17, 1, 173, 174, 22, 175, 176, 177, 95, 49, 48, 2, 178, 65]
[13, 179, 9, 66, 39, 180, 24, 181, 182, 2, 40, 26, 96, 183, 10, 32, 184, 25]
[47, 22, 97, 32, 185, 98, 97, 1, 186, 99, 187, 188, 1, 100, 87]
[189, 25, 10, 190, 2, 191, 50, 192, 27, 33, 49, 193, 194, 101, 49, 195, 48, 51]
[6, 196, 102, 197, 2, 198, 51, 199, 200, 68, 201, 103, 202, 11]
[3, 41, 203, 5, 104, 204, 25, 205, 12, 206, 69, 63, 7, 70, 1]
[3, 105, 28, 21, 98, 13, 7, 71, 207, 52, 17, 208, 66, 37]
[22, 106, 92, 37, 6, 1, 209, 42, 2, 65, 46, 102, 107]
[11, 3, 9, 5, 210, 6, 211, 212, 6, 34, 12, 213, 72, 4, 214, 23]
[15, 1, 215, 108, 73, 3, 9, 216, 2, 217, 109, 6, 108]
[6, 218, 23, 9, 219, 220, 16, 221, 110, 222, 223, 4, 224, 10, 225]
[41, 226, 14, 11

In [8]:
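# Build next-word training examples: every prefix (length >= 2) of each line.
# The last token of a prefix is the target; the tokens before it are the input.
inp_seq=[]
for line in sen.split('\n'):
    tokenized_sen=tokenizer.texts_to_sequences([line])[0]
    for i in range(1,len(tokenized_sen)):
        inp_seq.append(tokenized_sen[:i+1])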


In [9]:
inp_seq

[[15, 61],
 [15, 61, 2],
 [15, 61, 2, 11],
 [15, 61, 2, 11, 3],
 [11, 3],
 [11, 3, 9],
 [11, 3, 9, 5],
 [11, 3, 9, 5, 62],
 [11, 3, 9, 5, 62, 4],
 [11, 3, 9, 5, 62, 4, 85],
 [11, 3, 9, 5, 62, 4, 85, 14],
 [11, 3, 9, 5, 62, 4, 85, 14, 161],
 [11, 3, 9, 5, 62, 4, 85, 14, 161, 162],
 [11, 3, 9, 5, 62, 4, 85, 14, 161, 162, 163],
 [11, 3, 9, 5, 62, 4, 85, 14, 161, 162, 163, 86],
 [11, 3, 9, 5, 62, 4, 85, 14, 161, 162, 163, 86, 2],
 [11, 3, 9, 5, 62, 4, 85, 14, 161, 162, 163, 86, 2, 164],
 [11, 3, 9, 5, 62, 4, 85, 14, 161, 162, 163, 86, 2, 164, 44],
 [11, 3, 9, 5, 62, 4, 85, 14, 161, 162, 163, 86, 2, 164, 44, 31],
 [11, 3, 9, 5, 62, 4, 85, 14, 161, 162, 163, 86, 2, 164, 44, 31, 2],
 [165, 166],
 [165, 166, 45],
 [165, 166, 45, 167],
 [165, 166, 45, 167, 63],
 [165, 166, 45, 167, 63, 4],
 [165, 166, 45, 167, 63, 4, 31],
 [165, 166, 45, 167, 63, 4, 31, 87],
 [165, 166, 45, 167, 63, 4, 31, 87, 88],
 [165, 166, 45, 167, 63, 4, 31, 87, 88, 24],
 [165, 166, 45, 167, 63, 4, 31, 87, 88, 24, 89],
 [1

In [10]:
# Longest prefix length; this becomes the padded width (input tokens plus the target).
max_len=max(len(seq) for seq in inp_seq)
max_len

21
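The prefix construction can be seen in isolation on the first corpus line. This is a minimal sketch: `prefix_sequences` is a hypothetical name, and the token ids are copied from the output above. A line with n tokens contributes n - 1 prefixes, and the longest prefix is simply the longest line, which is why `max_len` comes out as 21.

```python
def prefix_sequences(token_ids):
    """Return every prefix of token_ids that has at least two tokens."""
    return [token_ids[:end] for end in range(2, len(token_ids) + 1)]


first_line = [15, 61, 2, 11, 3]  # token ids of the first corpus line
prefixes = prefix_sequences(first_line)
for prefix in prefixes:
    print(prefix)
```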

In [11]:
from tensorflow.keras.preprocessing.sequence import pad_sequences
padded_inp_seq=pad_sequences(inp_seq ,maxlen=max_len,padding='pre')

In [12]:
padded_inp_seq

array([[  0,   0,   0, ...,   0,  15,  61],
       [  0,   0,   0, ...,  15,  61,   2],
       [  0,   0,   0, ...,  61,   2,  11],
       ...,
       [  0,   0,   0, ...,  84,  39,  40],
       [  0,   0,   0, ...,  39,  40, 425],
       [  0,   0,   0, ...,  40, 425, 426]])

In [13]:
# Model input: every column except the last one.
x=padded_inp_seq[:,:-1]

In [14]:
x

array([[  0,   0,   0, ...,   0,   0,  15],
       [  0,   0,   0, ...,   0,  15,  61],
       [  0,   0,   0, ...,  15,  61,   2],
       ...,
       [  0,   0,   0, ...,   3,  84,  39],
       [  0,   0,   0, ...,  84,  39,  40],
       [  0,   0,   0, ...,  39,  40, 425]])

In [15]:
len(x[0])

20

In [16]:
x.shape

(1056, 20)

In [17]:
# Target: the last column, i.e. the word that follows the input tokens.
y=padded_inp_seq[:,-1]

In [18]:
y

array([ 61,   2,  11, ...,  40, 425, 426])

In [19]:
len(y)

1056

In [20]:
y.shape

(1056,)
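Padding on the left ("pre") is what makes the split above so simple: the real tokens are right-aligned, so the last column always holds the target word. A plain-Python sketch of the same idea, using a hypothetical `pad_pre` helper in place of `pad_sequences` and the first three prefixes from earlier:

```python
def pad_pre(sequences, maxlen):
    """Left-pad each sequence with zeros to maxlen, keeping the last maxlen tokens."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]  # drop the oldest tokens if the sequence is too long
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded


toy_seqs = [[15, 61], [15, 61, 2], [15, 61, 2, 11]]
toy_padded = pad_pre(toy_seqs, maxlen=5)

toy_x = [row[:-1] for row in toy_padded]  # inputs: all but the last column
toy_y = [row[-1] for row in toy_padded]   # targets: the last column

print(toy_x)
print(toy_y)
```

With right ("post") padding the target would sit at a different position in every row, and the slice `[:, -1]` would mostly return zeros.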

In [21]:
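# One-hot encode y. Word indices start at 1 and index 0 is reserved for padding,
# so the number of classes is the vocabulary size plus one (426 + 1 = 427).

from tensorflow.keras.utils import to_categorical

y=to_categorical(y,num_classes=len(tokenizer.word_index)+1)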

In [22]:
y.shape

(1056, 427)

In [23]:
y

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 0., 0., ..., 0., 0., 1.]], dtype=float32)

In [24]:
 len(y[0])

427
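As a sketch of what the one-hot step produces, the hypothetical `one_hot` helper below turns each integer label into a row of 427 zeros with a single 1 at the label's position. The labels are the first three targets from `y` above.

```python
def one_hot(labels, num_classes):
    """Return one row per label, all zeros except a 1.0 at the label's index."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


vocab_size = 426              # len(tokenizer.word_index) in the notebook
num_classes = vocab_size + 1  # one extra class for the reserved index 0
encoded = one_hot([61, 2, 11], num_classes)

print(len(encoded), len(encoded[0]))
print(encoded[1][:4])
```

For larger vocabularies this matrix grows quickly. One alternative, not used in this notebook, is to keep `y` as integers and compile with the `sparse_categorical_crossentropy` loss.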

In [25]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding,LSTM,Dense

In [26]:
model=Sequential()

In [27]:
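# 427 = vocabulary size (426) + 1 for the padding index 0; 20 = max_len - 1 input tokens.
model.add(Embedding(427,100,input_length=20))
# A single LSTM layer summarizes the 20 embedded tokens into a 150-dimensional vector.
model.add(LSTM(150))
# Softmax over all 427 classes gives a probability for each possible next word.
model.add(Dense(427,activation='softmax'))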

In [28]:
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
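For a one-hot target, categorical cross-entropy reduces to the negative log of the probability the model assigns to the correct word. The sketch below is a simplified plain-Python version for a single example; the real Keras loss also averages over the batch, and the clipping constant here is an assumption made for illustration.

```python
import math


def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Cross-entropy for one example; eps guards against log(0)."""
    return -sum(
        t * math.log(max(p, eps))
        for t, p in zip(one_hot_target, predicted_probs)
    )


target = [0.0, 0.0, 1.0, 0.0]
confident = [0.01, 0.01, 0.97, 0.01]  # most of the mass on the correct class
unsure = [0.25, 0.25, 0.25, 0.25]     # uniform guess

print(categorical_crossentropy(target, confident))
print(categorical_crossentropy(target, unsure))
```

The confident prediction gets a loss close to zero, while the uniform guess costs log(4).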

In [29]:
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 20, 100)           42700     
                                                                 
 lstm (LSTM)                 (None, 150)               150600    
                                                                 
 dense (Dense)               (None, 427)               64477     
                                                                 
Total params: 257,777
Trainable params: 257,777
Non-trainable params: 0
_________________________________________________________________
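The parameter counts in the summary can be checked by hand. The arithmetic below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights, and a bias.

```python
vocab_size = 427   # 426 words + 1 for the padding index
embed_dim = 100
lstm_units = 150

# Embedding: one 100-dimensional vector per class.
embedding_params = vocab_size * embed_dim

# LSTM: 4 gates, each with (input + recurrent) weights and a bias per unit.
lstm_params = 4 * (lstm_units * (embed_dim + lstm_units) + lstm_units)

# Dense: weight matrix plus one bias per output class.
dense_params = lstm_units * vocab_size + vocab_size

total_params = embedding_params + lstm_params + dense_params
print(embedding_params, lstm_params, dense_params, total_params)
```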


In [30]:
model.fit(x,y,epochs=100)

Epoch 1/100
...

<keras.callbacks.History at 0x18f72d9fbe0>

In [44]:
text='Active'

# Tokenize the seed text into word indices.
token_txt=tokenizer.texts_to_sequences([text])[0]
print(token_txt)
# Left-pad to the model's input length (max_len - 1 = 20).
pad_txt=pad_sequences([token_txt],maxlen=max_len-1,padding='pre')
print(pad_txt)

[314]
[[  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
    0 314]]


In [45]:
model.predict(pad_txt)

array([[1.21677235e-09, 2.68734340e-03, 2.12496452e-05, 9.60712671e-01,
        3.90172529e-04, 2.26735952e-03, 5.42762782e-03, 6.51537121e-05,
        8.85186182e-06, 6.82970767e-06, 1.11219415e-03, 5.65328577e-04,
        6.99613452e-07, 6.26581546e-04, 1.40475505e-03, 8.94026816e-05,
        7.83340147e-05, 4.53031862e-05, 8.73228316e-07, 3.93251923e-07,
        1.45846656e-07, 3.64423729e-04, 3.16016551e-04, 1.42449804e-03,
        4.73539997e-03, 6.11004771e-06, 2.88321394e-06, 1.38809573e-05,
        4.60164774e-05, 2.24569030e-05, 9.62486211e-03, 7.38871086e-06,
        3.37248290e-04, 2.88135561e-06, 2.91402648e-05, 6.71748694e-07,
        5.21743141e-06, 1.23017180e-05, 1.29954657e-04, 3.33618773e-05,
        1.32114046e-05, 2.38251796e-05, 8.27846733e-08, 3.53335090e-05,
        6.31865760e-08, 1.23597729e-07, 6.91635159e-06, 2.63540051e-06,
        1.70353296e-08, 1.97344730e-06, 7.74596934e-04, 1.03619790e-07,
        1.33856020e-06, 1.94422705e-06, 2.21342566e-06, 2.820707

In [46]:
model.predict(pad_txt).shape

(1, 427)

In [47]:
import numpy as np
# Index of the most probable next word in the softmax output.
idx=np.argmax(model.predict(pad_txt))

In [49]:
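# Map the predicted index back to its word. index_word is the tokenizer's
# reverse lookup; index 0 (padding) has no entry, hence .get().
print(tokenizer.index_word.get(int(idx)))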

learning
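The step from a probability vector to a word is an argmax followed by a reverse dictionary lookup. A small sketch with made-up numbers: `toy_word_index` and `toy_probs` are illustrative values, not taken from the trained model.

```python
toy_word_index = {"the": 1, "to": 2, "learning": 3}
toy_probs = [0.001, 0.02, 0.019, 0.96]  # position 0 is the padding class

# Argmax in plain Python: the position with the highest probability.
best_idx = max(range(len(toy_probs)), key=toy_probs.__getitem__)

# Invert the word -> index mapping once, then look words up directly.
toy_index_word = {idx: word for word, idx in toy_word_index.items()}
predicted = toy_index_word.get(best_idx, "<pad>")

print(best_idx, predicted)
```

Inverting the dictionary once avoids scanning every entry of `word_index` for each prediction.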


In [54]:
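text='Machine learning'
for i in range(10):
    # Tokenize the text generated so far and left-pad it to the input length.
    token_txt=tokenizer.texts_to_sequences([text])[0]
    pad_txt=pad_sequences([token_txt],maxlen=20,padding='pre')
    # Greedy decoding: take the single most probable next word.
    idx=np.argmax(model.predict(pad_txt))

    # Find the word for the predicted index and append it to the text.
    for word ,index in tokenizer.word_index.items():
        if index==idx:
            text=text+" "+word
            print(text)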

Machine learning is
Machine learning is a
Machine learning is a set
Machine learning is a set of
Machine learning is a set of tools
Machine learning is a set of tools that
Machine learning is a set of tools that broadly
Machine learning is a set of tools that broadly speaking
Machine learning is a set of tools that broadly speaking allow
Machine learning is a set of tools that broadly speaking allow us
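The loop above is greedy decoding: predict one word, append it, and feed the longer text back in. The sketch below isolates that control flow. `generate_greedy` and `toy_predict_next` are hypothetical names, and the lookup table is only a stand-in for the trained model so that the example runs without TensorFlow.

```python
def generate_greedy(seed_words, predict_next, steps):
    """Repeatedly append the word returned by predict_next(words_so_far)."""
    words = list(seed_words)
    for _ in range(steps):
        next_word = predict_next(words)
        if next_word is None:  # the stand-in predictor has no continuation
            break
        words.append(next_word)
    return words


# Stand-in for the trained model: a table keyed on the previous word only.
toy_table = {"learning": "is", "is": "a", "a": "set", "set": "of", "of": "tools"}


def toy_predict_next(words):
    return toy_table.get(words[-1])


result = generate_greedy(["machine", "learning"], toy_predict_next, steps=10)
print(" ".join(result))
```

Because greedy decoding always takes the top-ranked word, the same seed always yields the same continuation. In the notebook output above, that continuation matches the opening sentence of the training text word for word.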
