
# **Problem 5**

# **1. General concepts**

**Artificial intelligence** (AI) has been defined in several ways by several people, but John McCarthy's definition is the most encompassing one: "the science and engineering of making intelligent machines".

Another way to define AI is as the study of designing computer systems that can perform tasks that human intelligence is good at but that traditional programming is virtually unable to emulate.

**Machine learning** (ML) is one component of AI. According to Arthur Samuel, it is the field of study that gives computers the ability to learn without being explicitly programmed. The purpose of ML is to design programs that adjust their output in response to a given set of input data, much as humans learn over their lifetimes. Such programs are usually more flexible than regular programs written by human programmers: they can change themselves based on the data they encounter and thus produce possible solutions to AI problems.

**Deep learning** is a component of ML. It breaks an algorithm up into several layers, each of which handles a different part of the process.

# **2. Basic Concepts**

**Linear regression** is a statistical approach to modeling the relationship between a dependent variable and one or more independent variables as a linear function.
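
As a rough illustration of the definition above, the sketch below fits a line to a single independent variable using the closed-form least-squares formulas. The function name `fit_line` and the sample data are made up for this example, and the data is generated exactly from y = 2x + 1 so the recovered slope and intercept are easy to check.

```python
# Minimal sketch: ordinary least squares for one independent variable.
# fit_line and the data below are hypothetical, chosen only for illustration.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    # The fitted line always passes through the point of means
    b = mean_y - w * mean_x
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_line(xs, ys)
print("slope:", w, "intercept:", b)
```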

**Logistic regression** is a statistical model used to estimate the probability of a certain class or event, such as pass/fail, win/lose, alive/dead, or healthy/sick. It can also be extended to model several classes of events, such as determining which kind of animal an image contains.
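
A small sketch may help here: logistic regression passes a linear combination of the inputs through the sigmoid function to obtain a probability between 0 and 1. The pass/fail framing follows the text, but the feature (hours studied) and the weight and bias values are hypothetical and were not fitted to any data.

```python
import math


def sigmoid(z):
    # Squash any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def predict_pass_probability(hours, w, b):
    # Probability of the "pass" class for a given number of study hours
    return sigmoid(w * hours + b)


# Hypothetical parameters, chosen by hand for illustration only
w_demo, b_demo = 1.5, -4.0
p_low = predict_pass_probability(1.0, w_demo, b_demo)
p_high = predict_pass_probability(5.0, w_demo, b_demo)
print("P(pass | 1 hour): ", round(p_low, 3))
print("P(pass | 5 hours):", round(p_high, 3))
```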

The **gradient** of a function f is the vector that collects all of its partial derivatives. The gradient points in the direction of the fastest increase of f, and its magnitude equals the rate of increase in that direction.
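
To make the definition concrete, the sketch below approximates each partial derivative with a central difference and collects them into a list. The helper `numerical_gradient` and the test function f(x, y) = x² + 3y are assumptions made for this example; the exact gradient of that function is (2x, 3).

```python
import math


def numerical_gradient(f, point, h=1e-6):
    # Approximate each partial derivative with a central difference
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(p):
    # Hypothetical example function: f(x, y) = x^2 + 3y
    x, y = p
    return x ** 2 + 3 * y


g = numerical_gradient(f, [1.0, 2.0])
# The magnitude is the rate of increase along the gradient direction
magnitude = math.sqrt(sum(component ** 2 for component in g))
print("gradient:", g, "magnitude:", magnitude)
```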

**Gradient descent** is an iterative algorithm for finding a local minimum of a differentiable function. It can be computationally expensive on large datasets. To reduce the cost, the dataset can be broken into smaller batches, which reach approximately the same accuracy with less computation per iteration.
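
The batching idea can be sketched in plain Python as follows. This is a minimal illustration rather than how Keras implements its optimizers: the function name, learning rate, batch size, epoch count, and synthetic data (generated from y = 2x + 1) are all assumptions chosen so that the loop converges quickly.

```python
import random


def minibatch_gradient_descent(xs, ys, lr=0.1, batch_size=5, epochs=1000, seed=0):
    # Fit y = w * x + b by minimizing mean squared error on small batches
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                # Partial derivatives of the batch's mean squared error
                grad_w += 2 * error * xs[i] / len(batch)
                grad_b += 2 * error / len(batch)
            # Step against the gradient to decrease the loss
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]  # synthetic, noise-free data
w, b = minibatch_gradient_descent(xs, ys)
print("w:", round(w, 4), "b:", round(b, 4))
```

Each update touches only five samples, yet the parameters still approach the values that generated the data.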

# **3. Building a model**

A Keras model consists of several types of layers for different use cases:

**Convolutional layers (Conv2D)**

These layers usually process image data, which is stored in 4D tensors of shape (samples, height, width, channels) by default. They perform a convolution operation, applying a filter to the input matrix to produce an output matrix. This output must be flattened before being passed to fully connected layers.
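
The filter operation and the flattening step can be sketched on a single-channel image with nested lists. This is only an illustration of the idea: `conv2d_valid`, the 3x3 image, and the 2x2 kernel are hypothetical, and a real Conv2D layer additionally handles batches, channels, many learned filters, a bias, and an activation.

```python
def conv2d_valid(image, kernel):
    # Slide the kernel over the image without padding ("valid" mode).
    # As in deep learning libraries, the kernel is not flipped.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel = [
    [1, 0],
    [0, 1],
]
feature_map = conv2d_valid(image, kernel)
# Flatten the 2D feature map into a vector for a fully connected layer
flattened = [value for row in feature_map for value in row]
print(feature_map)  # [[6, 8], [12, 14]]
print(flattened)    # [6, 8, 12, 14]
```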

**Fully connected layers (Dense)**

Simple vector data, which is stored in 2D tensors of shape (samples, features), is often processed by these densely connected layers.

**Recurrent layers**

These layers typically process sequence data, which is stored in 3D tensors of shape (samples, timesteps, features).

In [0]:
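# Example with a convolution layer and two fully connected layers
from keras import layers, models

# Create the model
model = models.Sequential()

# Add convolution layer and flatten its output.
# The filter count, kernel size, and input shape are illustrative assumptions;
# the original example did not specify the Conv2D arguments.
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.Flatten())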

# Add 2 fully connected layers with output of 256 and ReLU activation function,
# and 10 and sigmoid activation function
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(10, activation='sigmoid'))

# **4. Compiling a model**

Once the network architecture is defined, we have to choose a loss function and an optimizer to use during training.


*   The loss function is the function that will be minimized during training. For supervised learning problems, it measures the deviation between the predicted value and the target for the training examples.
*   The optimizer determines how the network will be updated based on the loss function. It implements a specific variant of stochastic gradient descent (SGD).
*   We also choose a metric function, which is used to judge the performance of the model.
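
To show what the loss and the metric measure, here is a minimal plain-Python sketch of categorical cross-entropy and accuracy on two hand-made predictions. The function names mirror the Keras string identifiers, but these are simplified stand-ins written for this example, and the labels and probabilities are hypothetical.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    # Mean over samples of -sum_k y_true[k] * log(y_pred[k])
    total = 0.0
    for target, probs in zip(y_true, y_pred):
        total += -sum(t * math.log(max(p, eps)) for t, p in zip(target, probs))
    return total / len(y_true)


def accuracy(y_true, y_pred):
    # Fraction of samples whose most probable class matches the target
    correct = 0
    for target, probs in zip(y_true, y_pred):
        if target.index(max(target)) == probs.index(max(probs)):
            correct += 1
    return correct / len(y_true)


# Hypothetical one-hot targets and predicted class probabilities
y_true = [[1, 0, 0], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print("loss:", round(loss, 4), "accuracy:", acc)
```

Both predictions are correct, so the accuracy is 1.0, but the loss is still well above zero because the second prediction is not confident. This is why the loss, rather than the metric, is what training minimizes.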






In [0]:
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])



*   Choosing the right objective function for the right problem is extremely important. There are simple guidelines we can use to choose the correct loss for common problems such as classification, regression, and sequence prediction.


| Problem type              | Last layer activation  | Loss function              | 
|:-:                        |:-:                     |:-:                         |
| Binary classification     | sigmoid                | binary_crossentropy        |
| Multiclass, single-label  | softmax                | categorical_crossentropy   |
| Multiclass, multi-label   | sigmoid                | binary_crossentropy        |
| Regression to real values | none                   | mse                        |
| Regression to \[0,1\]     | sigmoid                | mse or binary_crossentropy |

# **5. Training a model**

After compiling the model, we pass the training data, validation data, number of epochs, steps, and other parameters to the fit() or fit_generator() function. The object returned by these functions is stored in a "history" variable that allows us to examine the loss/accuracy curves when training is complete. In recent Keras releases, fit() accepts generators directly and fit_generator() is deprecated.
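
As a sketch of how the recorded curves might be examined, the example below works on a hand-written dictionary shaped like `history.history`, which holds one value per epoch for each tracked quantity. The numbers and the helper `best_epoch` are hypothetical and do not come from a real training run.

```python
# Hypothetical stand-in for history.history (one value per epoch)
history_dict = {
    'loss': [0.90, 0.60, 0.45, 0.35, 0.28],
    'val_loss': [0.95, 0.70, 0.62, 0.66, 0.75],
}


def best_epoch(history_dict, key='val_loss'):
    # Return the 1-based epoch with the lowest value of the given curve
    values = history_dict[key]
    return values.index(min(values)) + 1


epoch = best_epoch(history_dict)
# Gap between validation and training loss at each epoch
gaps = [v - t for t, v in zip(history_dict['loss'], history_dict['val_loss'])]
print("best epoch by val_loss:", epoch)
print("val_loss - loss per epoch:", [round(gap, 2) for gap in gaps])
```

In this made-up run the training loss keeps falling while the validation loss starts rising after epoch 3, so the gap between the two curves widens.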



*   **Overfitting** occurs when a model makes correct predictions on the training dataset but fails to generalize to data outside that set. Several methods help reduce overfitting: early stopping, dropout, and data augmentation.
*   **Underfitting** occurs when a model is not complex enough for the dataset it is analyzing, so it cannot fit even the training data well.
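
The logic behind early stopping can be sketched without a framework: training halts once the validation loss has failed to improve for `patience` consecutive epochs. The function below is a simplified stand-in written for this example, not the Keras callback itself, and the validation losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best:
            best = val_loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    # The loss kept improving often enough, so training ran to the end
    return len(val_losses)


# Hypothetical validation losses: improvement stalls after epoch 3
val_losses = [0.95, 0.70, 0.62, 0.66, 0.75, 0.80]
stop_epoch = early_stopping_epoch(val_losses, patience=2)
print("training stops at epoch", stop_epoch)
```

A larger `patience` tolerates longer plateaus at the cost of extra epochs spent on a model that may already be overfitting.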

In [0]:
# Example for early stopping
from keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='val_loss', mode='min', patience=50)
cb_list = [es]  # callbacks passed to training; append any others here

In [0]:
# Example for dropout layer
model.add(layers.Dropout(0.2))
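
What a dropout layer does during training can be sketched in plain Python: each activation is zeroed with probability `rate`, and the survivors are scaled by 1 / (1 - rate) so the expected sum stays the same. The function below is an illustrative stand-in rather than the Keras layer, and the input values and seed are arbitrary.

```python
import random


def dropout(values, rate, rng):
    # Zero each value with probability `rate`; scale the rest to compensate
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)      # fixed seed so the sketch is reproducible
activations = [1.0] * 10000  # hypothetical layer outputs
dropped = dropout(activations, 0.2, rng)

zero_fraction = dropped.count(0.0) / len(dropped)
mean_value = sum(dropped) / len(dropped)
print("fraction zeroed:", round(zero_fraction, 3))
print("mean after dropout:", round(mean_value, 3))
```

Roughly 20% of the values are dropped, while the mean stays close to the original value of 1.0 because of the rescaling.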

In [0]:
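# Example for data augmentation
# (train_dir and validation_dir are assumed to hold the paths to the image folders)
from keras.preprocessing.image import ImageDataGenerator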
train_datagen = ImageDataGenerator(
    rescale=1./255, 
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

validation_datagen = ImageDataGenerator(rescale=1./255)

validation_generator = validation_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')

In [0]:
# Training model with callback list created
history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=30,
    validation_data=validation_generator,
    validation_steps=50,
    callbacks=cb_list
)

# Training without data augmentation
history = model.fit(
    training_data,
    training_labels,
    epochs=30,
    batch_size=128,
    validation_data=(test_data, test_labels)
)

# **6. Finetuning a pretrained model**

Before training, we need to freeze the pretrained model so that its weights are excluded from the training steps, and train only the layers we add on top of it.

In [0]:
from keras.applications import VGG16
from keras import layers
from keras import models
from keras import optimizers

conv_base = VGG16(
    weights='imagenet', 
    include_top=False, 
    input_shape=(150, 150, 3))

conv_base.trainable = False

model = models.Sequential()
model.add(conv_base)
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
#model.add(layers.Dropout(0.1))
model.add(layers.Dense(1, activation='sigmoid'))

After training the added layers, we unfreeze some of the layers at the end of the pretrained model, which can increase the accuracy of the model.

In [0]:
conv_base.trainable = True

set_trainable = False
for layer in conv_base.layers:
  # Unfreeze block5_conv1 and every layer after it; earlier layers stay frozen
  if layer.name == 'block5_conv1':
    set_trainable = True
  layer.trainable = set_trainable

After unfreezing these layers, we re-compile the model with a smaller learning rate and re-train it.

In [0]:
# compile model

model.compile(
    loss='binary_crossentropy',
    # Choose a smaller learning rate for fine-tuning
    # (recent Keras versions name this argument learning_rate instead of lr)
    optimizer=optimizers.RMSprop(lr=1e-5),
    metrics=['acc'])

# train

history = model.fit_generator(
    train_generator,
    steps_per_epoch=100,
    epochs=100,
    validation_data=validation_generator,
    validation_steps=50)

**References**

*   Professor's slides
  *   https://github.com/schneider128k/machine_learning_course/blob/master/slides/1_a_slides.pdf
  *   https://github.com/schneider128k/machine_learning_course/blob/master/slides/2_e_slides.pdf
  *   https://github.com/schneider128k/machine_learning_course/blob/master/keras_basics.md
  *   https://colab.research.google.com/drive/1F-RWvoxH8MmT7c1UmNy41iuOp-ejiLoF#scrollTo=Fh6gZSeAjF7c

*   Online sources
  *   https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/
  *   https://en.wikipedia.org/wiki/Linear_regression
  *   https://en.wikipedia.org/wiki/Logistic_regression
  *   https://en.wikipedia.org/wiki/Gradient_descent