<font color="white">.</font> | <font color="white">.</font> | <font color="white">.</font>
-- | -- | --
![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg) | <h1><font size="+3">ASTG Python Courses</font></h1> | ![NASA](https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png)

---

<center>
    <h1><font color="red">Machine Learning Modeling with Tensorflow</font></h1>
</center>

## Useful Reference

- <a href="https://www.mygreatlearning.com/blog/what-is-tensorflow-machine-learning-library-explained/">What is TensorFlow? The Machine Learning Library Explained</a>
- <a href="https://www.tensorflow.org/tutorials/keras/regression">Basic regression: Predict fuel efficiency</a>
- <a href="https://stackabuse.com/tensorflow-2-0-solving-classification-and-regression-problems/">Tensorflow 2.0: Solving Classification and Regression Problems</a>
- <a href="https://www.toptal.com/machine-learning/tensorflow-machine-learning-tutorial">Getting Started with TensorFlow: A Machine Learning Tutorial</a>
- <a href="https://sebastianraschka.com/faq/docs/tensorflow-vs-scikitlearn.html">What is the main difference between TensorFlow and scikit-learn?</a>
- <a href="https://adventuresinmachinelearning.com/python-tensorflow-tutorial/">Python TensorFlow Tutorial – Build a Neural Network</a>
- <a href="https://steadforce.com/en/first-steps-tensorflow-part-3/">A simple neural network with TensorFlow</a>

# <font color="red"> Overview of Neural Networks</font>

- Neural networks are designed to loosely mimic the way neurons are connected in a brain.
- A neural network consists of a number of layers, and each layer consists of a number of units (or neurons). 
- The task of every neuron is to process the information received and then transmit it to the neurons in the next layer.


**How do we choose the number of hidden layers and hidden neurons?**

There is no precise rule. Unlike the sizes of the input and output layers, which are fixed by the data, the number of hidden layers depends strongly on the problem. In particular:

- **Input layer:** The input layer is where the network starts. The number of units in this layer is fixed and corresponds exactly to the number of input features.
- **Hidden layers:** The number of hidden layers and the number of units per layer are the free parameters that one has to fix. There is no rule for choosing these two parameters; the right values depend strongly on the problem.
- **Output layer:** The output layer is where the network ends and the predictions are given. In the case of a classification problem, the number of units in the output layer corresponds exactly to the number of classes.

These layers form the base for all new networks regardless of complexity.
Once the architecture of a neural network has been set, we can proceed to the training of the neural network.

- The input features are fed to the input layer. 
- The network goes through all the hidden layer(s) until the output layer, where the predictions are produced.
- Each layer has an activation function that determines the output of its neurons. It can take several forms, but in most cases it is either a Rectified Linear Unit (ReLU) or a logistic function.
- Between two adjacent layers there is always a weight matrix, which is responsible for transmitting the information. For an N-layer neural network, we have N-1 weight matrices, and the output of the j-th layer depends on all the weight matrices of the layers before it.
- The output layer delivers a prediction, which must be compared to the real values. This comparison gives an estimate of the error. We want to minimize the error, and hence optimize the parameters of the problem, i.e. the weight matrices. One of the best-known optimization algorithms is gradient descent.
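The steps listed above can be sketched in a few lines of NumPy. This is only an illustration, not how TensorFlow implements it: the layer sizes, the random weights and the made-up targets are all assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    """Rectified Linear Unit: keeps positive values, zeroes out the rest."""
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 2 inputs, 4 hidden units, 1 output (N = 3 layers).
layer_sizes = [2, 4, 1]

# One weight matrix and one bias vector between each pair of adjacent layers,
# i.e. N - 1 = 2 weight matrices in total.
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Propagate a batch of inputs from the input layer to the output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)              # hidden layers use ReLU
    return a @ weights[-1] + biases[-1]  # linear output layer

x_batch = rng.uniform(-3.0, 3.0, size=(5, 2))  # 5 samples, 2 input features
y_true = rng.normal(size=(5, 1))               # made-up target values
y_pred = forward(x_batch)

# The error that an optimizer such as gradient descent would try to reduce
mse = np.mean((y_pred - y_true) ** 2)
print(len(weights), y_pred.shape, mse)
```

Training consists of repeating this forward pass and adjusting `weights` and `biases` so that `mse` decreases.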

# <font color="red">What is TensorFlow?</font>
- Tensorflow is an open-source library, developed by the Google Brain team, for numerical computation and large-scale machine learning. It eases the process of acquiring data, training models, serving predictions, and refining future results.
- Tensorflow bundles together Machine Learning and Deep Learning models and algorithms.
- Tensorflow allows developers to create a graph of computations to perform. Nodes in the graph represent mathematical operations, and connections (edges) represent the data, usually multidimensional data arrays or tensors, that are communicated between the nodes.
- The name `TensorFlow` is derived from the operations which neural networks perform on multidimensional data arrays or tensors! It’s literally a flow of tensors.


**First Example of TensorFlow Graph**

Consider the expression:
<center>
    a = (b + c) * (c + 2)
</center>
We can break this down into:
<center>
    d = b + c
    
    e = c + 2
    
    a = d * e
</center>
Now we can represent these operations graphically as:

![fig_gr1](https://i1.wp.com/adventuresinmachinelearning.com/wp-content/uploads/2017/03/Simple-graph-example.png)
Image Source: adventuresinmachinelearning.com

Note that the operations `d = b + c` and `e = c + 2` can be performed in parallel: this opens the possibility of distributing such calculations across CPUs and GPUs.
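As a rough illustration of that idea, the graph can be evaluated in plain Python, with the two independent nodes submitted to a thread pool. This is not TensorFlow's scheduler; the helper functions `add` and `mul` and the input values are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def add(u, v):
    """Addition node of the graph."""
    return u + v

def mul(u, v):
    """Multiplication node of the graph."""
    return u * v

b, c = 2.0, 1.0

# d and e do not depend on each other, so they can be computed concurrently.
with ThreadPoolExecutor(max_workers=2) as pool:
    future_d = pool.submit(add, b, c)    # d = b + c
    future_e = pool.submit(add, c, 2.0)  # e = c + 2
    d, e = future_d.result(), future_e.result()

# a needs both d and e, so it can only run once they are available.
a = mul(d, e)
print(a)  # 9.0
```

For operations this small the thread pool costs more than it saves; the benefit appears when each node is an expensive tensor operation.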

**Second Example of TensorFlow Graph**

The graph below shows the computational graph of a three-layer neural network.
The animated data flows between different nodes in the graph are tensors which are multi-dimensional data arrays. 

![fig_gr2](https://i1.wp.com/adventuresinmachinelearning.com/wp-content/uploads/2017/03/TensorFlow-data-flow-graph.gif)

# <font color="red">Main Steps of a ML Program</font>
    
![FIG_AXES](https://www.altudo.co/-/media/altudo/images/resources/blogs/5-steps-to-define-ml-flow-to-deliver-custom-user-experience/2.ashx?la=en&hash=0A8E8BEC05A4C64C37908FB87757285E)


### Load the modules

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
%matplotlib inline
import sys
import csv
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
#import tensorflow_datasets as tfds
from tensorflow.keras.layers import Dense, Flatten, Conv2D
from tensorflow.keras import Model

print(tf.__version__)

# <font color="red">First Problem:</font> Regression Analysis

## <font color="blue">Problem Statement</font>

We consider the function: <br>
$$
f(x,y) = (1-(x^2 + y^3))e^{-\frac{1}{2}(x^2 + y^2)}
$$
<br>
defined in the domain $D=[-3,3] \times [-3,3]$.
<OL>
<LI> We randomly select $n$ points in the domain $D$ and compute the function at those points to create a (training) dataset containing $n$ point/value pairs.
<LI> We use the dataset for training a ML algorithm.
<LI> We generate a uniform set of points (testing set) in $D$ to test the algorithm.
</OL>

## <font color="blue">Generating the Data</font>

#### Define the Function

In [None]:
def ff(x,y):
    return (1-(x**2+y**3))*np.exp(-(x**2+y**2)/2)

#### Create the Data

In [None]:
num_dims = 2
nx = 30
ny = 30
num_points = nx * ny

# Boundary of the domain
a_min = -3.0
a_max = 3.0

<font color="blue">Generate dataset for training</font>
- The grid points are randomly generated over the domain
- The arrays are 1D

In [None]:
yt = np.zeros(num_points)  # 1D targets for training
xt = np.zeros((num_points, num_dims))  # grid points for training

x = np.random.uniform(a_min, a_max, nx) # Random x-coordinates
y = np.random.uniform(a_min, a_max, ny) # Random y-coordinates

k = 0
for i in range(nx):
    for j in range(ny):
        xt[k, 0] = x[i]
        xt[k, 1] = y[j]
        yt[k] = ff(x[i], y[j])
        k += 1
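The double loop above can also be written without explicit Python loops. The sketch below is one possible vectorized equivalent using `np.meshgrid`; the names `xt_vec` and `yt_vec` and the fixed seed are assumptions introduced here so that the notebook's own variables are not overwritten.

```python
import numpy as np

def ff(x, y):
    return (1 - (x**2 + y**3)) * np.exp(-(x**2 + y**2) / 2)

nx, ny = 30, 30
rng = np.random.default_rng(42)   # seeded only to make the sketch repeatable
x = rng.uniform(-3.0, 3.0, nx)
y = rng.uniform(-3.0, 3.0, ny)

# indexing='ij' gives X[i, j] = x[i] and Y[i, j] = y[j], so flattening in
# row-major order reproduces the ordering k = i * ny + j of the double loop.
X, Y = np.meshgrid(x, y, indexing='ij')
xt_vec = np.column_stack([X.ravel(), Y.ravel()])  # shape (nx * ny, 2)
yt_vec = ff(xt_vec[:, 0], xt_vec[:, 1])           # shape (nx * ny,)

print(xt_vec.shape, yt_vec.shape)
```

Either way, the training points are all combinations of 30 random x-values and 30 random y-values, so they lie on an irregular grid rather than being 900 independent random points.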

<font color="blue">Add noise in the training targets</font>

The noise is drawn from a Gaussian (normal) distribution with mean `noise_mean` and standard deviation `noise_std`.

In [None]:
noise_mean = 0.0
noise_std  = 1.0e-2
noise = np.random.normal(noise_mean, noise_std, num_points)
yt = yt + noise

<font color="blue">Generate dataset for validation</font>
- The grid points are uniformly distributed over the domain
- The arrays are 1D

In [None]:
yv = np.zeros(num_points)  # 1D targets for validation
xv = np.zeros((num_points, num_dims))  # grid points for validation

x = np.linspace(-3.0, 3.0, nx)
y = np.linspace(-3.0, 3.0, ny)

k = 0
for i in range(nx):
    for j in range(ny):
        xv[k,0] = x[i]
        xv[k,1] = y[j]
        yv[k] = ff(x[i],y[j])
        k += 1

## <font color="blue">Data Gathering and Basic Analyses</font>

#### Data to be used for training

In [None]:
train_data = pd.DataFrame({"x0": xt[:,0], "x1": xt[:,1], 
                           "TargetValues": yt[:]})
print(train_data.head(5))                          

In [None]:
print(len(train_data.keys()))

#### Data to be used for validation

In [None]:
valid_data  = pd.DataFrame({"x0": xv[:,0], "x1": xv[:,1], 
                            "TargetValues": yv[:]})
print(valid_data.head(5))

#### Plot the data to be trained

In [None]:
# gca(projection=...) was removed in recent Matplotlib releases; use add_subplot
threedee = plt.figure().add_subplot(projection='3d');
threedee.scatter(train_data['x0'], train_data['x1'], 
                 train_data['TargetValues']);
threedee.set_xlabel('x');
threedee.set_ylabel('y');
threedee.set_zlabel('f(x,y)');
plt.show();

#### Display the joint distribution of the columns from the training set

In [None]:
sns.pairplot(train_data.drop(columns=["TargetValues"]), diag_kind="kde");

Do the same for the validation data:

In [None]:
sns.pairplot(valid_data.drop(columns=["TargetValues"]), diag_kind="kde");

#### Check the overall statistics

In [None]:
train_stats = train_data.describe()
train_stats.pop("TargetValues")
train_stats = train_stats.transpose()
train_stats

#### Split features from labels
- Separate the target value, or `label`, from the features.
- This `label` is the value that you will train the model to predict.

In [None]:
train_labels = train_data.pop('TargetValues')
valid_labels = valid_data.pop('TargetValues')

## <font color="blue">Normalize the Data</font>

- It is good practice to normalize features that use different scales and ranges. 
- Although the model might converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.

In [None]:
def normalize_data(x):
    "Function to normalize the data"
    return (x - train_stats['mean']) / train_stats['std']

**Normalize the data that will be used to train the model**

In [None]:
normed_train_data = normalize_data(train_data)

**We also need to normalize the validation dataset by projecting it into the same distribution that the model has been trained on**

In [None]:
normed_valid_data = normalize_data(valid_data)
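The same idea, sketched with NumPy arrays instead of DataFrames: the mean and standard deviation are computed on the training data only and then reused for the validation data. The random arrays below are stand-ins for the real datasets, and `ddof=1` is used because it matches the sample standard deviation reported by pandas' `describe()`.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.uniform(-3.0, 3.0, size=(900, 2))  # stand-in for the training features
valid = rng.uniform(-3.0, 3.0, size=(100, 2))  # stand-in for the validation features

# Statistics come from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0, ddof=1)

normed_train = (train - mean) / std
normed_valid = (valid - mean) / std   # reuse the training statistics

# The training columns now have mean 0 and standard deviation 1;
# the validation columns are close to that, but not exactly.
print(normed_train.mean(axis=0), normed_train.std(axis=0, ddof=1))
print(normed_valid.mean(axis=0), normed_valid.std(axis=0, ddof=1))
```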

<font color="blue">**The same normalization will have to be applied to any other data used in this model.**</font>

## <font color="blue">Build the Model</font>

#### Instantiate a sequential model using `keras`
- `keras` is TensorFlow's high-level API for building and training deep learning models. It's used for fast prototyping, state-of-the-art research, and production.
- <font color="red">The sequential model is the simplest model to use, especially when getting started.</font>
- It involves defining a Sequential class and adding layers to the model one by one in a linear manner, from input to output.
- The model needs to know what input shape (`input_shape`) it should expect. The first layer of the `Sequential` model needs to receive the information.

In the model below:

- The model expects rows of data with `num_shape` variables (the `input_shape=[num_shape]` argument).
- The first hidden layer has `num_nodes` (16) nodes and uses the `relu` activation function.
- The second hidden layer has `num_nodes` (16) nodes and uses the `relu` activation function.
- The output layer has one node and uses no activation function.

The rectified linear activation function (`relu`) is a piecewise linear function that outputs the input directly if it is positive; otherwise, it outputs zero.
- Because rectified linear units are nearly linear, they preserve many of the properties that make linear models easy to optimize with gradient-based methods. They also preserve many of the properties that make linear models generalize well.
- It has become the default activation function for many types of neural networks because a model that uses it is easier to train and often achieves better performance.
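A minimal NumPy sketch of the function, for illustration only (Keras provides its own implementation through `tf.nn.relu`):

```python
import numpy as np

def relu(z):
    """Return z where z is positive and 0 elsewhere."""
    return np.maximum(z, 0.0)

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
# Negative entries become 0; positive entries pass through unchanged.
print(relu(z))
```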

In [None]:
num_shape = len(train_data.keys())
num_nodes = 16

model = keras.Sequential([
             layers.Dense(num_nodes, activation=tf.nn.relu, 
                          input_shape=[num_shape]),
             layers.Dense(num_nodes, activation=tf.nn.relu),
             layers.Dense(1) ])

The above model creation can also be written as:

```python
model = keras.Sequential()
model.add(layers.Dense(num_nodes, activation=tf.nn.relu, input_shape=[num_shape]))
model.add(layers.Dense(num_nodes, activation=tf.nn.relu))
model.add(layers.Dense(1))
```

Dense layers represent a function that maps the input tensor `x` to an output tensor `y` via the equation `y = Ax + b` where `A` (the kernel) and `b` (the bias) are parameters of the dense layer.
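To make the equation concrete, here is a small NumPy sketch of a dense layer. Keras stores the kernel with shape `(input_dim, units)` and applies it to a batch of row vectors as `x @ kernel + bias`, which is the same map as `y = Ax + b` with `A` equal to the transposed kernel. The function name `dense` and the random values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_out = 2, 16                       # 2 input features, 16 units

kernel = rng.normal(size=(n_in, n_out))   # the matrix A, stored transposed
bias = np.zeros(n_out)                    # the vector b

def dense(x, kernel, bias, activation=None):
    """Apply a dense layer to a batch x of shape (batch_size, n_in)."""
    y = x @ kernel + bias
    return activation(y) if activation is not None else y

x = rng.normal(size=(10, n_in))           # batch of 10 samples
y = dense(x, kernel, bias)
print(y.shape)  # (10, 16)
```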

![nn](https://github.com/astg606/py_materials/blob/master/machine_learning/tensorflow_nn.png?raw=1)

#### Compile the model
- Once you have specified the architecture of the network, you need to choose an optimizer, which updates the weights from the gradients computed by back-propagation, and a loss function to minimize.
- Compiling the model configures it for training; the computations themselves are carried out by TensorFlow's efficient numerical back end.

Define the optimizer:

In [None]:
optimizer = tf.keras.optimizers.RMSprop(0.001)

The `compile` method requires a loss function and an optimizer:
- We are asking the network to use the `rmsprop` optimizer to change the weights in such a way that the loss `mse` (mean squared error) is progressively reduced over the iterations.
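To give a feel for what such an optimizer does, the sketch below applies the textbook RMSprop update to a toy one-parameter problem. It is an illustration under stated assumptions, not the Keras implementation: the function `rmsprop_step`, the toy loss `w**2` and the larger learning rate used in the loop are all choices made for this example, and implementations differ in details such as where the small constant `eps` is added.

```python
import numpy as np

def rmsprop_step(w, grad, v, lr=0.001, rho=0.9, eps=1e-7):
    """One textbook RMSprop update.

    v is a running average of squared gradients; dividing by its square
    root rescales the step so that it is roughly lr in magnitude.
    """
    v = rho * v + (1.0 - rho) * grad**2
    w = w - lr * grad / (np.sqrt(v) + eps)
    return w, v

# Toy problem: minimize loss(w) = w**2, whose gradient is 2 * w
w, v = 1.0, 0.0
loss_start = w**2
for _ in range(500):
    grad = 2.0 * w
    w, v = rmsprop_step(w, grad, v, lr=0.01)
loss_end = w**2

print(loss_start, loss_end)
```

The loss ends far below its starting value, with `w` hovering near the minimum at 0.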

In [None]:
model.compile(loss = 'mse',
              optimizer = optimizer,
              metrics = ['mae', 'mse'])

#### Inspect the model

`model.summary()` is a useful method if you want to get an overview of your model and see the total number of parameters.
It prints:

- Name and type of all layers in the model.
- Output shape for each layer.
- Number of weight parameters of each layer.
- If the model has a general topology, the inputs each layer receives.
- The total number of trainable and non-trainable parameters of the model.



In [None]:
model.summary()

In [None]:
import tensorflow.keras.backend as K

trainable_count = np.sum([K.count_params(w) for w in model.trainable_weights])
non_trainable_count = np.sum([K.count_params(w) for w in model.non_trainable_weights])

print('Total params: {:,}'.format(trainable_count + non_trainable_count))
print('Trainable params: {:,}'.format(trainable_count))
print('Non-trainable params: {:,}'.format(non_trainable_count))

[Let](https://towardsdatascience.com/counting-no-of-parameters-in-deep-learning-models-by-hand-8f1716241889):

- **i**: input size (2 in this case)
- **h**: size of hidden layers (16, 16 here)
- **o**: output size (1 in this case)

We have:

    num_params = connections between layers + biases in every layer
               = (i×h + h×h + h×o) + (h + h + o)
               = (2×16 + 16×16 + 16×1) + (16 + 16 + 1)
               = (2×16 + 16) + (16×16 + 16) + (16×1 + 1)
               = 48 + 272 + 17
               = 337
               
   input = **Input**((None, 2))
   <br>
   dense = **Dense**(16)(input)
      <br>
   dense = **Dense**(16)(dense)
    <br>
  output = **Dense**(1)(dense)
   <br>
   model = Model(input, output)
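The hand count above can be checked with a short helper. The function `count_dense_params` is a hypothetical name introduced here; it only handles a plain stack of fully connected layers.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the number of units per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # weight matrix + bias vector
    return total

# Input of size 2, two hidden layers of 16 units, output of size 1
print(count_dense_params([2, 16, 16, 1]))  # 337
```

The result should agree with the total reported by `model.summary()`.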

#### Try the model

Take a batch of 10 samples from the training data and call `model.predict` on it.

In [None]:
example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
print(example_result)

It seems to be working, and it produces a result of the expected shape and type.

## <font color="blue">Train the Model</font>

Training occurs over epochs and each epoch is split into batches.

- **Epoch**: One pass through all of the rows in the training dataset.
- **Batch**: One or more samples considered by the model within an epoch before weights are updated.
- One epoch is comprised of one or more batches, based on the chosen batch size and the model is fit for many epochs. 
- The model is "fit" to the training data using the `fit` method, which takes the maximum number of `epochs` we want training to go on and, optionally, a `batch_size` (32 by default).
- The callback function is applied at given stages of the training procedure. We use it to get a view on internal states and statistics of the model during training.
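The relation between epochs, batches and weight updates is simple arithmetic. The sketch below uses the numbers of this example: 900 training samples, 1000 epochs, and the Keras default batch size of 32, since `fit` is called here without `batch_size`.

```python
import math

num_samples = 900   # nx * ny training points
batch_size = 32     # Keras default when batch_size is not passed to fit()
epochs = 1000

# The last batch of an epoch may be smaller than batch_size
batches_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_updates)  # 29 4 29000
```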

Train the model for 1000 epochs, and record the training metrics (loss, mean absolute error and mean squared error) in the `history` object.

In [None]:
# Display training progress by printing a 
# single dot for each completed epoch
class PrintDot(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Start a new line every 100 epochs
        if epoch % 100 == 0:
            print('')
        print('.', end='')

# How many times we go through the entire dataset
EPOCHS = 1000

# verbose=0 silences the default progress bars so only the dots are printed
history = model.fit(normed_train_data, train_labels,
                    epochs=EPOCHS, verbose=0,
                    callbacks=[PrintDot()])
#epochs=EPOCHS, validation_split = 0.2, verbose=0, callbacks=[PrintDot()])

#### Visualize the model's training progress

In [None]:
# Use the stats stored in the history object.
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
hist.tail()

In [None]:
print(history.history.keys())

In [None]:
def plot_history(history):
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [Target]')
    # TensorFlow 2.x stores metrics under the names passed to compile(),
    # i.e. 'mae' and 'mse' (see the output of history.history.keys())
    plt.plot(hist['epoch'], hist['mae'],
             label='Train Error')
#    plt.plot(hist['epoch'], hist['val_mae'],
#             label = 'Val Error')
    plt.legend()
    plt.ylim([min(hist['mae']), max(hist['mae'])])

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Square Error [$Target^2$]')
    plt.plot(hist['epoch'], hist['mse'],
             label='Train Error')
#    plt.plot(hist['epoch'], hist['val_mse'],
#             label = 'Val Error')
    plt.legend()
    plt.ylim([0, max(hist['mse'])])

plot_history(history)

## <font color="blue">Evaluate the Model on Test Data</font>

**Compute the Scores**

In [None]:
loss, mae, mse = model.evaluate(normed_valid_data, valid_labels, verbose=1)
#print("Testing set Mean Abs Error: {} ".format(mae))

**Make Prediction**

In [None]:
valid_predictions = model.predict(normed_valid_data).flatten()

#### Do the 45-degree plot

In [None]:
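plt.scatter(valid_labels, valid_predictions);
plt.xlabel('True Values');
plt.ylabel('Predictions');
plt.axis('square');
# The targets take negative values too, so derive the limits from the data
lims = [min(valid_labels.min(), valid_predictions.min()),
        max(valid_labels.max(), valid_predictions.max())]
plt.xlim(lims);
plt.ylim(lims);
_ = plt.plot(lims, lims);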

**Error Distribution**

In [None]:
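# distplot is deprecated in recent seaborn releases; histplot with kde=True is the replacement
sns.histplot(valid_predictions - valid_labels, kde=True);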

#### Plotting Function Using Predicted Values

In [None]:
threedee = plt.figure().add_subplot(projection='3d');
threedee.scatter(valid_data['x0'], valid_data['x1'], valid_predictions);
threedee.set_xlabel('x');
threedee.set_ylabel('y');
threedee.set_zlabel('f(x,y)');
plt.show();

## <font color="blue">Exercise</font>

Consider the 2D problem presented here.
- Create a dataset of 1000 randomly selected points (in the domain) and their associated targets.
- Randomly choose 80% of the data for training and the remaining 20% for testing.
- Create your ML model and test it.

# <font color="red">Second Problem:</font> Image Classification

We use the [MNIST data set](http://yann.lecun.com/exdb/mnist/) (Modified National Institute of Standards and Technology database).

* It is a large database of handwritten digits that is commonly used for training various image processing systems.
* The database is also widely used for training and testing in the field of machine learning.
* The dataset we will be using contains 70000 images of handwritten digits among which 10000 are reserved for testing.
* It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.

### Check GPU Availability in Tensorflow

In [None]:
#https://www.kaggle.com/hassanamin/tensorflow-mnist-gpu-tutorial

gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

### Listing Devices Including GPUs with Tensorflow

In [None]:
from tensorflow.python.client import device_lib

device_lib.list_local_devices()

### Check GPU in Tensorflow

In [None]:
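# tf.test.is_gpu_available() is deprecated; query the physical devices instead
len(tf.config.list_physical_devices('GPU')) > 0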

## <font color="blue"> Load MNIST Dataset</font>

In [None]:
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

In [None]:
print("Shape train inputs:  ", x_train.shape)
print("Shape train outputs: ", y_train.shape)
print("Shape test  inputs:  ", x_test.shape)
print("Shape test  outputs: ", y_test.shape)

In [None]:
print("Type train inputs:  ", x_train.dtype)
print("Type train outputs: ", y_train.dtype)
print("Type test  inputs:  ", x_test.dtype)
print("Type test  outputs: ", y_test.dtype)

In [None]:
np.unique(x_train)

In [None]:
np.unique(x_test)

In [None]:
np.unique(y_train)

In [None]:
np.unique(y_test)

## <font color="blue"> Preprocess the Training and Test Datasets</font>

Change the type from integer to 32-bit floating point. Dividing the `uint8` pixel values by 255.0 directly would produce `float64` arrays; casting to `float32` first halves the memory required, and 32 bit is the default precision used by Keras anyway.

In [None]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

Normalize the data:

In [None]:
x_train = x_train / 255.0
x_test = x_test / 255.0

- The training and test datasets are structured as a 3-dimensional array of instance, image width and image height. 
- For a multi-layer perceptron model we must reduce the images down into a vector of pixels. In this case the 28×28 sized images will be 784 pixel input values.
- We can do this transform easily using the `reshape()` function on the NumPy array.

In [None]:
x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
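The reshape can be tried on a tiny synthetic array first. The sketch below uses three made-up 28×28 "images" and lets NumPy infer the flattened size with `-1`, which avoids hard-coding the number of samples.

```python
import numpy as np

# Three synthetic 28x28 "images" with distinct pixel values
images = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# Keep the sample axis, flatten the two image axes into one
flat = images.reshape(images.shape[0], -1)
print(flat.shape)  # (3, 784)

# Pixel (row 5, column 7) of image 1 ends up at position 5 * 28 + 7
print(flat[1, 5 * 28 + 7] == images[1, 5, 7])  # True

# The operation is reversible
restored = flat.reshape(-1, 28, 28)
```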

**Convert class vectors to binary class matrices**

In [None]:
num_classes = 10
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)
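What this conversion produces can be reproduced in NumPy for a handful of labels. This is only a sketch of the idea (often called one-hot encoding), with made-up labels; the notebook itself relies on `to_categorical`.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])   # made-up digit labels
num_classes = 10

# Row k of the identity matrix is the one-hot vector for class k
one_hot = np.eye(num_classes, dtype=np.float32)[labels]
print(one_hot.shape)  # (5, 10)
print(one_hot[0])     # a single 1 at index 5, zeros elsewhere

# argmax recovers the original integer labels
recovered = np.argmax(one_hot, axis=1)
```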

## <font color="blue"> Create Sequential Model Using Tensorflow Keras</font>

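The architecture of the network is:

1. Input layer receiving the 28x28=784 pixel values of each MNIST image
2. Dense layer with 128 neurons and ReLU activation function
3. Dropout layer that randomly zeroes 20% of its inputs during training
4. Output layer with 10 neurons and softmax activation for classification of input images as one of ten digits (0 to 9)

A second 128-neuron dense layer (with its own dropout) is left commented out in the code; uncomment it to experiment with a deeper network.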

In [None]:
mnist_model = tf.keras.models.Sequential()
mnist_model.add(tf.keras.layers.Dense(128, activation='relu', 
                                input_shape=(784,)))
#mnist_model.add(tf.keras.layers.Dropout(0.2))
#mnist_model.add(tf.keras.layers.Dense(128, activation='relu'))
mnist_model.add(tf.keras.layers.Dropout(0.2))
mnist_model.add(tf.keras.layers.Dense(10, activation='softmax'))

In [None]:
mnist_model.summary()

### Compile the Model Designed Earlier

Before the model is ready for training, it needs a few more settings. These are added during the model's compile step:

- Loss function: This measures how far the model's predictions are from the true labels during training. You want to minimize this function to "steer" the model in the right direction.
- Optimizer: This is how the model is updated based on the data it sees and its loss function.
- Metrics: Used to monitor the training and testing steps. The following example uses accuracy, the fraction of the images that are correctly classified.
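The loss and metric used below can be written out in NumPy to see what they compute. This is a sketch for three made-up samples and three classes, not the Keras implementation; the helper names and the clipping constant `eps` are assumptions.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over samples of -log(probability assigned to the true class)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))

def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))

logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0],
                   [1.0, 1.5, 0.3]])
y_true = np.eye(3)[[0, 2, 0]]   # one-hot labels for classes 0, 2, 0

probs = softmax(logits)
print(categorical_crossentropy(y_true, probs))
print(accuracy(y_true, probs))  # 2 of the 3 samples are classified correctly
```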

In [None]:
mnist_model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(),
              metrics=['accuracy'])

## <font color="blue"> Training and Validation</font>

The `mnist_model.fit` method adjusts the model parameters to minimize the loss:

In [None]:
num_epochs = 5
batch_size = 16

In [None]:
history = mnist_model.fit(x_train, y_train,
                          batch_size = batch_size,
                          epochs = num_epochs,
                          verbose = 1,
                          validation_data = (x_test, y_test))

## <font color="blue"> Plot the Decreasing Loss over Epochs</font>

Use Pandas to plot the training history: the loss should decrease, and the accuracy increase, from one epoch to the next, for both the training data and the validation data.

In [None]:
loss_df = pd.DataFrame(history.history)
loss_df.plot()

## <font color="blue"> Evaluate the Model</font>

The `mnist_model.evaluate` method checks the model's performance, usually on a "Validation-set" or "Test-set".

In [None]:
score = mnist_model.evaluate(x_test,  y_test, verbose=0)
print('Test loss:     {}'.format(score[0]))
print('Test accuracy: {}'.format(score[1]))

## <font color="blue"> Visualize Predictions</font>

In [None]:
# Predict on the whole test set (passing steps=1 could limit it to a single batch)
probabilities = mnist_model.predict(x_test)
predicted_labels = np.argmax(probabilities, axis=1)

In [None]:
def display_digits(X, y):
    """
      Given an array of images of digits X and 
      the corresponding values of the digit y,
      this function plots the first 96 images and their values.
    """
    # Figure size (width, height) in inches
    fig = plt.figure(figsize=(8, 6))

    # Adjust the subplots 
    fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)

    for i in range(96):
        # Initialize the subplots: 
        #    Add a subplot in the grid of 8 by 12, at the i+1-th position
        ax = fig.add_subplot(8, 12, i + 1, xticks=[], yticks=[])
        
        # Display an image at the i-th position
        ax.imshow(X[i].reshape(28, 28), cmap=plt.cm.binary, interpolation='nearest')
       
        # label the image with the target value
        ax.text(0, 7, str(y[i]))

    # Show the plot
    plt.show()

In [None]:
display_digits(x_test, predicted_labels)

## <font color="blue">Save the Model</font>

In [None]:
mnist_model.save('my_MNIST_model')

Then to reload the model later, we can use this:

In [None]:
from tensorflow.keras.models import load_model
model = load_model('my_MNIST_model')

In [None]:
# https://www.tensorflow.org/tutorials/quickstart/advanced
# https://liufuyang.github.io/2017/04/01/just-another-tensorflow-beginner-guide-3.html