# Q1. Define meta learner. Discuss the role of activation functions in performing classification and regression tasks in the design of neural networks. Implement a baseline neural network for modelling a categorical target variable. Report your observation on model parameters and hyper-parameters.

# Implementation

A meta learner is a machine learning algorithm that learns how to learn. Rather than fitting a single task from scratch, it uses experience gathered across tasks (or the outputs of other learning algorithms) to adapt quickly to new environments and improve its performance over time. Meta learning is a relatively new area of research, but it has the potential to greatly enhance the capabilities of AI systems.

Activation functions are a critical component of neural networks, a class of machine learning models inspired by the structure and function of the human brain. An activation function determines how much signal a neuron passes on to the next layer, based on the weighted input it receives. Because it is non-linear, it lets the network model relationships that a purely linear model cannot.

For classification tasks, the activation function of the output layer maps the network's raw scores to discrete outcomes expressed as class probabilities. The sigmoid function is the usual choice for a binary target (one output neuron), and the softmax function for a target with more than two classes (one output neuron per class).

For regression tasks, the activation function of the output layer maps the input data to a continuous output, such as a numerical value. The linear (identity) function is the usual choice because it leaves the output unbounded. The hyperbolic tangent function can also be used, but only when the target has been scaled to the range -1 to 1, since its output is bounded to that interval.
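
As a rough illustration of the output activations named above, the following standard-library sketch implements sigmoid, softmax, and the linear function. The function names are chosen here for illustration and are not taken from Keras.

```python
import math

def sigmoid(z):
    """Squash a real number into (0, 1): one output neuron for a binary target."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1 (multi-class)."""
    m = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(z):
    """Identity activation: the output stays unbounded (typical for regression)."""
    return z

print(sigmoid(0.0))                    # probability of the positive class
print(softmax([2.0, 1.0, 0.1]))        # one probability per class
print(linear(-3.7), math.tanh(-3.7))   # unbounded vs. bounded to (-1, 1)
```

The last line shows why tanh only suits regression targets that were scaled beforehand: the linear output keeps the value -3.7, while tanh compresses it to just above -1.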

In [30]:
# Import necessary libraries
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
import pandas as pd
import numpy as np

#load the dataset
df=pd.read_csv('C:\\Users\\GAYATHRI\\Documents\\SRET-I YR- MSC\\TERM III\\Neural Networks\\diabetes.csv')
df

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1


In [31]:
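# Extract the features and target variable
# 'Unnamed: 0' is only the row index stored in the CSV, so it is not used as a feature
features = df.drop(columns=['Outcome', 'Unnamed: 0'], errors='ignore')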
target = df['Outcome']


In [32]:
# Normalize the features
scaler = StandardScaler()
features = scaler.fit_transform(features)

In [33]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

In [34]:
# Define the model architecture
def create_model(num_layers=2, num_neurons=64, activation='relu', optimizer='adam'):
    model = Sequential()
    model.add(Dense(num_neurons, input_dim=X_train.shape[1], activation=activation))
    for i in range(num_layers-1):
        model.add(Dense(num_neurons, activation=activation))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    return model

In [35]:
# Define hyper-parameters for grid search
num_layers = [1, 2, 3]
num_neurons = [32, 64, 128]
activation = ['relu', 'sigmoid']
optimizer = ['adam', 'sgd']

In [36]:
# Use KerasClassifier and GridSearchCV for hyper-parameter tuning
model = KerasClassifier(build_fn=create_model, verbose=0)
param_grid = dict(num_layers=num_layers, num_neurons=num_neurons, activation=activation, optimizer=optimizer)
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=-1, cv=3)
grid_result = grid.fit(X_train, y_train)



In [37]:
# Print best hyper-parameters and accuracy
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

Best: 0.726359 using {'activation': 'relu', 'num_layers': 2, 'num_neurons': 128, 'optimizer': 'adam'}
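
To give a sense of how much work the grid search above performs, the sketch below enumerates the same hyper-parameter grid with `itertools.product` and counts the model fits. It does not train anything; `cv_folds` mirrors the `cv=3` argument used above, and the variable names are illustrative.

```python
from itertools import product

# Same grid as in the notebook cell above
param_grid = {
    "num_layers": [1, 2, 3],
    "num_neurons": [32, 64, 128],
    "activation": ["relu", "sigmoid"],
    "optimizer": ["adam", "sgd"],
}
cv_folds = 3  # mirrors cv=3 in GridSearchCV

# Every combination of one value per hyper-parameter
combos = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
n_fits = len(combos) * cv_folds

print(len(combos), "combinations,", n_fits, "cross-validation fits")
```

With scikit-learn's default `refit=True`, one additional fit of the best combination on the full training split follows the cross-validation fits.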


# Observation

GridSearchCV reports the outcome of the hyperparameter tuning process. The best combination of hyperparameters is stored in the `best_params_` attribute, and `best_score_` gives its mean accuracy over the 3 cross-validation folds, which is 0.726359 here.

The hyperparameters that produced the best results are:

num_layers: 2
num_neurons: 128
activation: relu
optimizer: adam

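Within the searched grid, the best configuration therefore uses 2 hidden layers with 128 neurons each, the ReLU activation function, and the Adam optimizer. The weights and biases of these layers are the model parameters, which are learned during training. The number of layers, the neurons per layer, the activation function, and the optimizer are hyperparameters, which are fixed before training and chosen here by grid search. The score of 0.726359 is a cross-validation estimate computed on the training split; the held-out test set (X_test, y_test) is not evaluated in this notebook.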
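
To make the observation on model parameters concrete, the sketch below counts the trainable weights and biases of a stack of Dense layers. It assumes 8 input features (the clinical columns without the index column); `dense_param_count` is a helper named here for illustration, and a different `n_inputs` can be passed if the feature matrix has another width.

```python
def dense_param_count(n_inputs, hidden_sizes, n_outputs=1):
    """Count weights and biases of a fully connected network.

    Each Dense layer has fan_in * fan_out weights plus fan_out biases.
    """
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

# Best grid-search configuration: 2 hidden layers of 128 neurons, 1 sigmoid output
best = dense_param_count(n_inputs=8, hidden_sizes=[128, 128])
print(best)
```

Under that assumption almost all parameters sit in the 128-by-128 connection between the two hidden layers, so the number of neurons per layer drives the model size far more than the number of input features does.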

# Q4. Develop a baseline multilayer neural network to recognize the handwritten digits from the MNIST database. Load the dataset using the Keras API. Design a large convolutional neural network (CNN) to read 28x28 pixel square images. Extract three different patterns, say 32, 64, and 128 feature maps, using appropriate filter sizes and pooling of size 2x2 with 20% dropout for regularization. Classify the digits using a fully connected layer of 128 neurons. Report your observation on accuracy in classifying the digits, with a comparison of errors between the baseline network, a simple CNN, and a larger CNN model.



# Implementation

In [1]:
import tensorflow as tf
from tensorflow import keras

# Load the MNIST dataset
(X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()

In [2]:
# Preprocess the data
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1)) / 255.0
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1)) / 255.0
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)

In [3]:
# Define the baseline model architecture
baseline_model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28, 1)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

In [4]:
# Compile the baseline model
baseline_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
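
The one-hot labels produced during preprocessing and the `categorical_crossentropy` loss used above fit together as in this minimal sketch. It uses only the standard library; `one_hot` and the loss function are written here for illustration and are not the Keras implementations.

```python
import math

def one_hot(label, num_classes=10):
    """Encode an integer class label as a vector with a single 1.0."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true class (eps avoids log(0))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = one_hot(2, num_classes=3)   # true class is index 2
y_pred = [0.1, 0.2, 0.7]             # hypothetical softmax output
print(y_true)
print(categorical_crossentropy(y_true, y_pred))
```

Because only the entry of the true class is non-zero, the loss reduces to the negative logarithm of the probability that the softmax layer assigns to the correct digit.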

In [5]:
# Train the baseline model
baseline_history = baseline_model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [6]:
# Evaluate the baseline model on the test data
baseline_test_loss, baseline_test_acc = baseline_model.evaluate(X_test, y_test)
print('Baseline test accuracy:', baseline_test_acc)


Baseline test accuracy: 0.9793999791145325


In [7]:
# Define the simple CNN model architecture
simple_cnn_model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])


In [8]:
# Compile the simple CNN model
simple_cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [9]:
# Train the simple CNN model
simple_cnn_history = simple_cnn_model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [10]:
# Evaluate the simple CNN model on the test data
simple_cnn_test_loss, simple_cnn_test_acc = simple_cnn_model.evaluate(X_test, y_test)
print('Simple CNN test accuracy:', simple_cnn_test_acc)

Simple CNN test accuracy: 0.9876000285148621


In [11]:
# Define the larger CNN model architecture
larger_cnn_model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Conv2D(128, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Dropout(0.2),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

In [12]:
# Compile the larger CNN model
larger_cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [13]:
# Train the larger CNN model
larger_cnn_history = larger_cnn_model.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_test, y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [14]:
# Evaluate the larger CNN model on the test data
larger_cnn_test_loss, larger_cnn_test_acc = larger_cnn_model.evaluate(X_test, y_test)
print('Larger CNN test accuracy:', larger_cnn_test_acc)

Larger CNN test accuracy: 0.9904000163078308


# Observation
The first model is a baseline fully connected neural network with no convolutional layers. The architecture consists of a flatten layer that takes the input image of size 28 x 28 pixels and flattens it into a vector, followed by a dense layer with 128 neurons and ReLU activation function, and another dense layer with 10 neurons and softmax activation function for classification.  

Baseline test accuracy: 0.9793999791145325=97.94%

The second model is a simple CNN with one convolutional layer that extracts 32 feature maps using a filter size of (3,3), followed by max pooling of size (2,2), a dropout layer for regularization, a flatten layer, a dense layer with 128 neurons and ReLU activation function, and another dense layer with 10 neurons and softmax activation function for classification.

Simple CNN test accuracy: 0.9876000285148621=98.76%

The third model is a larger CNN with three convolutional layers that extract 32, 64, and 128 feature maps respectively, using filter sizes of (3,3), max pooling of size (2,2), and a dropout layer after each convolutional layer for regularization. The architecture also includes a flatten layer, a dense layer with 128 neurons and ReLU activation function, and another dense layer with 10 neurons and softmax activation function for classification.

Larger CNN test accuracy: 0.9904000163078308=99.04%
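
The three architectures described above can be compared by size with a small shape-tracing sketch. It assumes the Keras defaults that the notebook relies on (3x3 convolutions with 'valid' padding and stride 1, 2x2 max pooling that rounds down) and counts trainable parameters only; the helper names are illustrative.

```python
def conv2d(shape, filters, k=3):
    """Valid 3x3 convolution: shrinks height and width by k - 1."""
    h, w, c = shape
    params = k * k * c * filters + filters  # kernel weights + one bias per filter
    return (h - k + 1, w - k + 1, filters), params

def max_pool(shape, p=2):
    """2x2 max pooling: halves height and width (rounding down), no parameters."""
    h, w, c = shape
    return (h // p, w // p, c)

def count_params(conv_filters, input_shape=(28, 28, 1)):
    """Return (shape before Flatten, total trainable parameters)."""
    shape, total = input_shape, 0
    for filters in conv_filters:
        shape, params = conv2d(shape, filters)
        total += params
        shape = max_pool(shape)          # Dropout adds no parameters
    n = shape[0] * shape[1] * shape[2]   # Flatten
    for units in (128, 10):              # Dense(128) then Dense(10)
        total += n * units + units
        n = units
    return shape, total

for name, filters in [("baseline", []), ("simple CNN", [32]), ("larger CNN", [32, 64, 128])]:
    print(name, count_params(filters))
```

Under these assumptions the larger CNN has far fewer parameters than the simple CNN, because three rounds of pooling shrink the feature map to 1x1x128 before the dense layer, whereas the simple CNN flattens a 13x13x32 map.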

After defining each model, we compile them using the Adam optimizer, categorical cross-entropy loss function, and accuracy metric. We then train each model on the MNIST training data for 10 epochs with a batch size of 64, and evaluate their performance on the MNIST test data using the evaluate() method.

The larger CNN achieves the highest test accuracy (99.04%, an error of 0.96%), followed by the simple CNN (98.76%, an error of 1.24%), while the baseline network achieves the lowest accuracy (97.94%, an error of 2.06%).

Convolutional layers extract local spatial features from the images, which improves the accuracy of the classification task compared with a simple fully connected neural network that treats every pixel as an independent input.
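
For the comparison on errors, the reported accuracies can be converted into error rates and misclassified-image counts. The sketch assumes the standard MNIST test split of 10,000 images; the accuracy values are copied from the outputs above.

```python
# Test accuracies copied from the evaluation outputs above
accuracies = {
    "baseline": 0.9793999791145325,
    "simple CNN": 0.9876000285148621,
    "larger CNN": 0.9904000163078308,
}
n_test = 10_000  # size of the standard MNIST test split

# Number of misclassified test images per model
errors = {name: round((1 - acc) * n_test) for name, acc in accuracies.items()}

for name, wrong in errors.items():
    print(f"{name}: {wrong} misclassified, error rate {100 * wrong / n_test:.2f}%")
```

Seen this way, the larger CNN makes fewer than half as many mistakes as the baseline network, even though the accuracies differ by only about one percentage point.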



# Q5. Design and implement a Long Short-Term Memory (LSTM) network with 32 units and a single output neuron to learn the following tasks. Fit the model over 300 epochs with unit batch size and the necessary optimizer. Report your observation on output, model accuracy and loss.

1. Prediction of the next character in the alphabet given the context of just one character.

2. Learn a random sub-sequence of the alphabet to predict the next letter in the  alphabet. 

Both tasks can share one dataset of input-output pairs. Each input is a short run of consecutive letters that serves as the context (a single letter for the first task, a random sub-sequence of up to five letters for the second), and the corresponding output is the letter that follows it. The code below creates this dataset:

In [31]:
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from tensorflow.keras.preprocessing.sequence import pad_sequences

In [32]:
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []


In [33]:
#generating a dataset of input-output pairs using randomly selected sequences from the alphabet
for i in range(num_inputs):
    start = numpy.random.randint(len(alphabet)-2)
    end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
    print(sequence_in,'->', sequence_out) #printing out each input-output pair 

PQRST -> U
W -> X
O -> P
OPQ -> R
IJKLM -> N
QRSTU -> V
ABCD -> E
X -> Y
GHIJ -> K
M -> N
XY -> Z
QRST -> U
ABC -> D
JKLMN -> O
OP -> Q
XY -> Z
D -> E
T -> U
B -> C
QRSTU -> V
HIJ -> K
JKLM -> N
ABCDE -> F
X -> Y
V -> W
DE -> F
DEFG -> H
BCDE -> F
EFGH -> I
BCDE -> F
FG -> H
RST -> U
TUV -> W
STUV -> W
LMN -> O
P -> Q
MNOP -> Q
JK -> L
MNOP -> Q
OPQRS -> T
UVWXY -> Z
PQRS -> T
D -> E
EFGH -> I
IJK -> L
WX -> Y
STUV -> W
MNOPQ -> R
P -> Q
WXY -> Z
VWX -> Y
V -> W
HI -> J
KLMNO -> P
UV -> W
JKL -> M
ABCDE -> F
WXY -> Z
M -> N
CDEF -> G
KLMNO -> P
RST -> U
RS -> T
W -> X
J -> K
WX -> Y
JKLMN -> O
MN -> O
L -> M
BCDE -> F
TU -> V
MNOPQ -> R
NOPQR -> S
HIJ -> K
JKLM -> N
STUVW -> X
QRST -> U
N -> O
VWXY -> Z
B -> C
UVWX -> Y
OP -> Q
K -> L
C -> D
X -> Y
ST -> U
JKLM -> N
B -> C
QR -> S
RS -> T
VWXY -> Z
S -> T
NOP -> Q
KLMNO -> P
IJ -> K
EF -> G
MNOP -> Q
WXY -> Z
HI -> J
P -> Q
STUVW -> X
Q -> R
MN -> O
O -> P
C -> D
L -> M
JKLM -> N
K -> L
IJKLM -> N
FGHIJ -> K
LM -> N
OPQ -> R
U -> V
HIJ

In [34]:
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], max_len, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
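
The padding and normalization step above can be pictured with this standard-library sketch. It mimics the default behaviour of `pad_sequences` (zeros added on the left, long sequences cut from the left); `pad_pre` and `encode` are illustrative helpers, not Keras functions.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = {c: i for i, c in enumerate(alphabet)}

def pad_pre(seq, maxlen, value=0):
    """Left-pad (or left-truncate) a list of ints to exactly maxlen items."""
    seq = seq[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

def encode(text, maxlen=5):
    """Map letters to integers, pad on the left, and scale into [0, 1)."""
    ints = [char_to_int[c] for c in text]
    return [v / len(alphabet) for v in pad_pre(ints, maxlen)]

print(encode("XY"))
print(encode("AB") == encode("B"))
```

One side effect worth noting: the padding value 0 is also the code of 'A', so "AB" and "B" receive the same encoding. For this task that is harmless, because the next letter depends only on the last letter of the input.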

In [35]:
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], 1)))
model.add(Dense(y.shape[1], activation= 'softmax' ))
model.compile(loss='categorical_crossentropy', optimizer='adam' , metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)

Epoch 1/500
1000/1000 - 5s - loss: 3.0899 - accuracy: 0.0830 - 5s/epoch - 5ms/step
Epoch 2/500
1000/1000 - 4s - loss: 2.8086 - accuracy: 0.1040 - 4s/epoch - 4ms/step
Epoch 3/500
1000/1000 - 3s - loss: 2.4624 - accuracy: 0.1920 - 3s/epoch - 3ms/step
Epoch 4/500
1000/1000 - 3s - loss: 2.2257 - accuracy: 0.2310 - 3s/epoch - 3ms/step
Epoch 5/500
1000/1000 - 3s - loss: 2.0595 - accuracy: 0.2990 - 3s/epoch - 3ms/step
Epoch 6/500
1000/1000 - 4s - loss: 1.9353 - accuracy: 0.3360 - 4s/epoch - 4ms/step
Epoch 7/500
1000/1000 - 4s - loss: 1.8321 - accuracy: 0.3530 - 4s/epoch - 4ms/step
Epoch 8/500
1000/1000 - 4s - loss: 1.7435 - accuracy: 0.3940 - 4s/epoch - 4ms/step
Epoch 9/500
1000/1000 - 4s - loss: 1.6749 - accuracy: 0.4220 - 4s/epoch - 4ms/step
Epoch 10/500
1000/1000 - 3s - loss: 1.5882 - accuracy: 0.4440 - 3s/epoch - 3ms/step
Epoch 11/500
1000/1000 - 4s - loss: 1.5315 - accuracy: 0.4710 - 4s/epoch - 4ms/step
Epoch 12/500
1000/1000 - 3s - loss: 1.4651 - accuracy: 0.4880 - 3s/epoch - 3ms/step
E

Epoch 99/500
1000/1000 - 4s - loss: 0.3965 - accuracy: 0.8670 - 4s/epoch - 4ms/step
Epoch 100/500
1000/1000 - 5s - loss: 0.4533 - accuracy: 0.8440 - 5s/epoch - 5ms/step
Epoch 101/500
1000/1000 - 4s - loss: 0.3839 - accuracy: 0.8740 - 4s/epoch - 4ms/step
Epoch 102/500
1000/1000 - 5s - loss: 0.3875 - accuracy: 0.8640 - 5s/epoch - 5ms/step
Epoch 103/500
1000/1000 - 4s - loss: 0.3906 - accuracy: 0.8640 - 4s/epoch - 4ms/step
Epoch 104/500
1000/1000 - 4s - loss: 0.4356 - accuracy: 0.8480 - 4s/epoch - 4ms/step
Epoch 105/500
1000/1000 - 5s - loss: 0.4405 - accuracy: 0.8610 - 5s/epoch - 5ms/step
Epoch 106/500
1000/1000 - 4s - loss: 0.3752 - accuracy: 0.8680 - 4s/epoch - 4ms/step
Epoch 107/500
1000/1000 - 4s - loss: 0.4146 - accuracy: 0.8530 - 4s/epoch - 4ms/step
Epoch 108/500
1000/1000 - 4s - loss: 0.3699 - accuracy: 0.8730 - 4s/epoch - 4ms/step
Epoch 109/500
1000/1000 - 4s - loss: 0.3718 - accuracy: 0.8810 - 4s/epoch - 4ms/step
Epoch 110/500
1000/1000 - 4s - loss: 0.3942 - accuracy: 0.8620 - 4

Epoch 196/500
1000/1000 - 4s - loss: 0.2674 - accuracy: 0.9120 - 4s/epoch - 4ms/step
Epoch 197/500
1000/1000 - 4s - loss: 0.2582 - accuracy: 0.9190 - 4s/epoch - 4ms/step
Epoch 198/500
1000/1000 - 4s - loss: 0.2530 - accuracy: 0.9240 - 4s/epoch - 4ms/step
Epoch 199/500
1000/1000 - 4s - loss: 0.3485 - accuracy: 0.8930 - 4s/epoch - 4ms/step
Epoch 200/500
1000/1000 - 4s - loss: 0.2643 - accuracy: 0.9230 - 4s/epoch - 4ms/step
Epoch 201/500
1000/1000 - 5s - loss: 0.2454 - accuracy: 0.9280 - 5s/epoch - 5ms/step
Epoch 202/500
1000/1000 - 4s - loss: 0.2490 - accuracy: 0.9190 - 4s/epoch - 4ms/step
Epoch 203/500
1000/1000 - 4s - loss: 0.2516 - accuracy: 0.9210 - 4s/epoch - 4ms/step
Epoch 204/500
1000/1000 - 4s - loss: 0.3215 - accuracy: 0.8970 - 4s/epoch - 4ms/step
Epoch 205/500
1000/1000 - 4s - loss: 0.2685 - accuracy: 0.9190 - 4s/epoch - 4ms/step
Epoch 206/500
1000/1000 - 4s - loss: 0.2430 - accuracy: 0.9300 - 4s/epoch - 4ms/step
Epoch 207/500
1000/1000 - 4s - loss: 0.2454 - accuracy: 0.9280 - 

Epoch 293/500
1000/1000 - 4s - loss: 0.1784 - accuracy: 0.9520 - 4s/epoch - 4ms/step
Epoch 294/500
1000/1000 - 4s - loss: 0.1778 - accuracy: 0.9560 - 4s/epoch - 4ms/step
Epoch 295/500
1000/1000 - 4s - loss: 0.1824 - accuracy: 0.9480 - 4s/epoch - 4ms/step
Epoch 296/500
1000/1000 - 4s - loss: 0.1810 - accuracy: 0.9500 - 4s/epoch - 4ms/step
Epoch 297/500
1000/1000 - 4s - loss: 0.2777 - accuracy: 0.9300 - 4s/epoch - 4ms/step
Epoch 298/500
1000/1000 - 5s - loss: 0.1757 - accuracy: 0.9540 - 5s/epoch - 5ms/step
Epoch 299/500
1000/1000 - 4s - loss: 0.1765 - accuracy: 0.9520 - 4s/epoch - 4ms/step
Epoch 300/500
1000/1000 - 4s - loss: 0.1738 - accuracy: 0.9490 - 4s/epoch - 4ms/step
Epoch 301/500
1000/1000 - 5s - loss: 0.1757 - accuracy: 0.9480 - 5s/epoch - 5ms/step
Epoch 302/500
1000/1000 - 4s - loss: 0.3217 - accuracy: 0.9200 - 4s/epoch - 4ms/step
Epoch 303/500
1000/1000 - 5s - loss: 0.1688 - accuracy: 0.9550 - 5s/epoch - 5ms/step
Epoch 304/500
1000/1000 - 4s - loss: 0.1718 - accuracy: 0.9600 - 

Epoch 390/500
1000/1000 - 5s - loss: 0.1344 - accuracy: 0.9710 - 5s/epoch - 5ms/step
Epoch 391/500
1000/1000 - 4s - loss: 0.1328 - accuracy: 0.9680 - 4s/epoch - 4ms/step
Epoch 392/500
1000/1000 - 4s - loss: 0.1345 - accuracy: 0.9640 - 4s/epoch - 4ms/step
Epoch 393/500
1000/1000 - 4s - loss: 0.1320 - accuracy: 0.9650 - 4s/epoch - 4ms/step
Epoch 394/500
1000/1000 - 5s - loss: 0.1343 - accuracy: 0.9630 - 5s/epoch - 5ms/step
Epoch 395/500
1000/1000 - 4s - loss: 0.1323 - accuracy: 0.9610 - 4s/epoch - 4ms/step
Epoch 396/500
1000/1000 - 4s - loss: 0.1370 - accuracy: 0.9570 - 4s/epoch - 4ms/step
Epoch 397/500
1000/1000 - 4s - loss: 0.1308 - accuracy: 0.9650 - 4s/epoch - 4ms/step
Epoch 398/500
1000/1000 - 4s - loss: 0.1937 - accuracy: 0.9540 - 4s/epoch - 4ms/step
Epoch 399/500
1000/1000 - 4s - loss: 0.1618 - accuracy: 0.9580 - 4s/epoch - 4ms/step
Epoch 400/500
1000/1000 - 4s - loss: 0.1282 - accuracy: 0.9680 - 4s/epoch - 4ms/step
Epoch 401/500
1000/1000 - 4s - loss: 0.1305 - accuracy: 0.9660 - 

Epoch 487/500
1000/1000 - 4s - loss: 0.1045 - accuracy: 0.9690 - 4s/epoch - 4ms/step
Epoch 488/500
1000/1000 - 4s - loss: 0.1064 - accuracy: 0.9720 - 4s/epoch - 4ms/step
Epoch 489/500
1000/1000 - 3s - loss: 0.1042 - accuracy: 0.9770 - 3s/epoch - 3ms/step
Epoch 490/500
1000/1000 - 3s - loss: 0.1605 - accuracy: 0.9680 - 3s/epoch - 3ms/step
Epoch 491/500
1000/1000 - 4s - loss: 0.1020 - accuracy: 0.9820 - 4s/epoch - 4ms/step
Epoch 492/500
1000/1000 - 3s - loss: 0.0999 - accuracy: 0.9740 - 3s/epoch - 3ms/step
Epoch 493/500
1000/1000 - 4s - loss: 0.1010 - accuracy: 0.9830 - 4s/epoch - 4ms/step
Epoch 494/500
1000/1000 - 4s - loss: 0.1038 - accuracy: 0.9730 - 4s/epoch - 4ms/step
Epoch 495/500
1000/1000 - 3s - loss: 0.1016 - accuracy: 0.9760 - 3s/epoch - 3ms/step
Epoch 496/500
1000/1000 - 4s - loss: 0.1028 - accuracy: 0.9740 - 4s/epoch - 4ms/step
Epoch 497/500
1000/1000 - 4s - loss: 0.1623 - accuracy: 0.9670 - 4s/epoch - 4ms/step
Epoch 498/500
1000/1000 - 4s - loss: 0.0990 - accuracy: 0.9820 - 

<keras.callbacks.History at 0x18281fd20d0>

In [36]:
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))

Model Accuracy: 96.50%


In [37]:
# demonstrate some model predictions
for i in range(20):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = pad_sequences([pattern], maxlen=max_len, dtype= 'float32' )
    x = numpy.reshape(x, (1, max_len, 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)

['J'] -> L
['H', 'I', 'J'] -> K
['E', 'F'] -> G
['K', 'L', 'M'] -> N
['B'] -> C
['C'] -> D
['R', 'S'] -> T
['A', 'B', 'C'] -> D
['C', 'D', 'E'] -> F
['N', 'O', 'P'] -> Q
['C', 'D'] -> E
['L', 'M'] -> N
['F', 'G', 'H', 'I', 'J'] -> K
['N', 'O', 'P', 'Q'] -> R
['C', 'D', 'E', 'F', 'G'] -> H
['A', 'B', 'C'] -> D
['R', 'S', 'T', 'U', 'V'] -> W
['B', 'C', 'D'] -> E
['F', 'G'] -> H
['K'] -> M
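
The sample predictions printed above can be checked against the alphabet with a few lines of standard-library Python. The pairs below are copied from that output; `expected_next` is a helper introduced here for illustration.

```python
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# (input sequence, predicted letter) pairs copied from the printed output
predictions = [
    ("J", "L"), ("HIJ", "K"), ("EF", "G"), ("KLM", "N"), ("B", "C"),
    ("C", "D"), ("RS", "T"), ("ABC", "D"), ("CDE", "F"), ("NOP", "Q"),
    ("CD", "E"), ("LM", "N"), ("FGHIJ", "K"), ("NOPQ", "R"), ("CDEFG", "H"),
    ("ABC", "D"), ("RSTUV", "W"), ("BCD", "E"), ("FG", "H"), ("K", "M"),
]

def expected_next(seq):
    """The correct answer is the letter that follows the last input letter."""
    return alphabet[alphabet.index(seq[-1]) + 1]

wrong = [(seq, pred) for seq, pred in predictions if pred != expected_next(seq)]
print(len(predictions) - len(wrong), "of", len(predictions), "correct")
print("wrong:", wrong)
```

In this sample both mistakes occur on single-letter inputs, where the prediction skips one letter ahead; all multi-letter inputs are answered correctly.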


# 3. Download a free corpus of size 100 KB on any topic of your choice and save it as “Topic.txt”. Design an LSTM recurrent neural network model for generating text from the file Topic.txt. Create checkpoints for storing the weights with the smallest loss. Save the model as an .hdf5 file. Use the saved model to generate new text sequences. Report your observation on model accuracy and discuss the correctness of the generated text sequence.


In [16]:
import sys
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.utils import to_categorical
import os
print('Get current working directory : ', os.getcwd())

Get current working directory :  C:\Users\GAYATHRI


In [2]:
# load ascii text and convert to lowercase
filename = "wonderland.txt"
raw_text = open(filename, 'r', encoding='utf-8').read()
raw_text = raw_text.lower()

In [13]:
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))

In [4]:
# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)

Total Characters:  164016
Total Vocab:  64


In [5]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)

Total Patterns:  163916
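
The sliding-window construction above is easier to see on a short string. This sketch applies the same idea with a window of 5 characters instead of 100; `make_windows` is an illustrative helper.

```python
def make_windows(text, seq_length):
    """Pair every run of seq_length characters with the character that follows it."""
    pairs = []
    for i in range(len(text) - seq_length):
        pairs.append((text[i:i + seq_length], text[i + seq_length]))
    return pairs

pairs = make_windows("alice was", seq_length=5)
for seq_in, seq_out in pairs:
    print(repr(seq_in), "->", repr(seq_out))
```

A text of n characters yields n - seq_length patterns, which matches the count reported above: 164016 - 100 = 163916.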


In [6]:
# reshape X to be [samples, time steps, features]
X = np.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = to_categorical(dataY)

In [7]:
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

In [8]:
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

In [9]:
# fit the model
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)

Epoch 1/20
Epoch 1: loss improved from inf to 3.02414, saving model to weights-improvement-01-3.0241.hdf5
Epoch 2/20
Epoch 2: loss improved from 3.02414 to 2.84241, saving model to weights-improvement-02-2.8424.hdf5
Epoch 3/20
Epoch 3: loss improved from 2.84241 to 2.75886, saving model to weights-improvement-03-2.7589.hdf5
Epoch 4/20
Epoch 4: loss improved from 2.75886 to 2.69239, saving model to weights-improvement-04-2.6924.hdf5
Epoch 5/20
Epoch 5: loss improved from 2.69239 to 2.63422, saving model to weights-improvement-05-2.6342.hdf5
Epoch 6/20
Epoch 6: loss improved from 2.63422 to 2.57830, saving model to weights-improvement-06-2.5783.hdf5
Epoch 7/20
Epoch 7: loss improved from 2.57830 to 2.52606, saving model to weights-improvement-07-2.5261.hdf5
Epoch 8/20
Epoch 8: loss improved from 2.52606 to 2.47714, saving model to weights-improvement-08-2.4771.hdf5
Epoch 9/20
Epoch 9: loss improved from 2.47714 to 2.43275, saving model to weights-improvement-09-2.4328.hdf5
Epoch 10/20
Ep

<keras.callbacks.History at 0x1ecdc92bcd0>

In [11]:
# load the network weights
filename = "weights-improvement-20-2.0943.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
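
The checkpoint logic can be sketched without Keras: a file is written only when the monitored loss improves, and the file to load afterwards is the one with the smallest loss in its name. The helpers and the short loss list are illustrative assumptions; the file names follow the pattern used in this notebook.

```python
import re

def epochs_saved(losses):
    """Return the 1-based epochs at which a save-best-only checkpoint would fire."""
    best, saved = float("inf"), []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            saved.append(epoch)
    return saved

def best_checkpoint(filenames):
    """Pick the file whose name carries the smallest loss value."""
    pattern = re.compile(r"weights-improvement-(\d+)-(\d+\.\d+)\.hdf5$")
    scored = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            scored.append((float(match.group(2)), name))
    return min(scored)[1]

# Hypothetical losses: epoch 3 is worse than epoch 2, so nothing is saved there
print(epochs_saved([3.02, 2.84, 2.90, 2.76]))

files = [
    "weights-improvement-01-3.0241.hdf5",
    "weights-improvement-09-2.4328.hdf5",
    "weights-improvement-20-2.0943.hdf5",
]
print(best_checkpoint(files))
```

Selecting the file programmatically avoids typing the name by hand when the training run is repeated and the loss values change.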

In [14]:
# pick a random seed
start = np.random.randint(0, len(dataX)-1)
pattern = dataX[start]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")

Seed:
" led out in a trembling voice
to its children, “come away, my dears! it’s high time you were all in
b "


In [18]:
# generate characters
for i in range(1000):
    x = np.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = np.argmax(prediction)
    result = int_to_char[index]
    sys.stdout.write(result)
    # slide the window: append the predicted index and drop the oldest one
    pattern = pattern[1:] + [index]
print("\nDone.")

hiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe world to tee iot of the sooe.”

“io mo toth the soeet ” shiught alice, “in wou dee you whnl toe w

# Observation
First, you must transform the list of input sequences into the form [samples, time steps, features] expected by an LSTM network.

Next, you need to rescale the integers to the range 0 to 1, which makes the patterns easier to learn for the LSTM network, whose gates use the sigmoid activation function by default.

Finally, you need to convert the output patterns (single characters converted to integers) into a one-hot encoding. This is so that you can configure the network to predict the probability of each of the 64 different characters in the vocabulary (an easier representation) rather than trying to force it to predict precisely the next character. Each y value is converted into a sparse vector with a length of 64, full of zeros, except with a 1 in the column for the character (integer) that the pattern represents.

Define a single hidden LSTM layer with 256 memory units. The network uses dropout with a probability of 20% (0.2). The output layer is a Dense layer using the softmax activation function to output a probability between 0 and 1 for each of the 64 characters.

The problem is really a single-character classification problem with 64 classes and, as such, is defined as optimizing the log loss (cross entropy) using the Adam optimization algorithm for speed.

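There is no test dataset. You are modeling the entire training dataset to learn the probability of each character in a sequence.

You are not interested in the most accurate (classification accuracy) model of the training dataset. This would be a model that predicts each character in the training dataset perfectly. Instead, the goal is a model that generalizes the dataset and minimizes the chosen loss function, which is why the model is compiled without an accuracy metric and only the loss is reported.

Now fit your model to the data. Here, you use a modest number of 20 epochs and a large batch size of 128 patterns. The training loss falls from 3.0241 after the first epoch to 2.0943 in the checkpoint saved at epoch 20.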

The simplest way to use the Keras LSTM model to make predictions is to first start with a seed sequence as input, generate the next character, then update the seed sequence to add the generated character on the end and trim off the first character. This process is repeated for as long as you want to predict new characters (e.g., a sequence of 1,000 characters in length).
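
The generation procedure described above can be sketched with a toy predictor in place of the trained network. `toy_next_char` is a hypothetical stand-in for `model.predict` followed by `argmax`; it always returns the same character for the same context, just as greedy decoding does.

```python
def generate(seed, next_char, n_chars):
    """Greedy generation: predict one character, append it, drop the oldest one."""
    window = list(seed)
    out = []
    for _ in range(n_chars):
        ch = next_char("".join(window))
        out.append(ch)
        window = window[1:] + [ch]  # slide the seed window by one character
    return "".join(out)

# Hypothetical lookup on the last two characters of the context
table = {"ab": "c", "bc": "a", "ca": "b"}

def toy_next_char(context):
    return table[context[-2:]]

text = generate("ab", toy_next_char, 9)
print(text)
```

Because each step is deterministic, the output must cycle as soon as the window returns to a state it has already visited. This is one plausible reason why the generated text above repeats the same sentence over and over.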

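The generated text generally conforms to the line format observed in the original text, with fewer than 80 characters before a new line.

The results are far from perfect. A few words are recognizable ("alice", "world", "the"), but most are misspelled, and the model quickly falls into a loop that repeats the same sentence for the rest of the 1,000 characters.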