# Core Learning Algorithms

## Four Basic Machine Learning Algorithms

- Linear Regression
- Classification
- Clustering
- Hidden Markov Models

## Classification
- Used to separate data points into classes, each identified by a label

### Example Dataset
- Iris dataset
- 120 training entries, 4 features, 1 label

- Three species: Setosa, Versicolor, Virginica
- Four features: sepal length, sepal width, petal length, petal width

In [None]:
import tensorflow as tf
import pandas as pd

In [None]:
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

train = pd.read_csv("https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv", names=CSV_COLUMN_NAMES, header=None, skiprows=1)
test = pd.read_csv("https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv", names=CSV_COLUMN_NAMES, header=None, skiprows=1)

train_y = train.pop('Species')
test_y = test.pop('Species')

In [None]:
# Feature Column - create numeric columns for each feature (no categorical features)

my_feature_columns = []
for key in train.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

In [None]:
# Input Function to make dataset

def input_fn(features, labels, training=True, batch_size=256):
    # Convert input to dataset
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle and repeat if in training mode
    if training:
        dataset = dataset.shuffle(1000).repeat()
    
    return dataset.batch(batch_size)
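
As a rough pure-Python analogy for what the input function does with `shuffle`, `repeat`, and `batch`, the sketch below streams rows in batches. The `batches` helper and its `seed` parameter are hypothetical, and this is not how `tf.data` is implemented; it only mirrors the behaviour that training data is reshuffled and repeated endlessly while evaluation data is read once, in order.

```python
import itertools
import random


def batches(rows, batch_size, training=True, seed=0):
    """Yield lists of rows, loosely mimicking shuffle().repeat().batch()."""
    rows = list(rows)
    if training:
        rng = random.Random(seed)

        def endless_epochs():
            # Repeat forever, reshuffling the rows on every pass.
            while True:
                shuffled = rows[:]
                rng.shuffle(shuffled)
                yield from shuffled

        stream = endless_epochs()
    else:
        # Evaluation mode: a single ordered pass over the data.
        stream = iter(rows)

    while True:
        batch = list(itertools.islice(stream, batch_size))
        if not batch:
            return
        yield batch


# Evaluation: one pass, the last batch may be smaller.
eval_batches = list(batches(range(5), batch_size=2, training=False))

# Training: the stream never ends, so take only the first four batches.
train_batches = list(itertools.islice(batches(range(5), batch_size=2), 4))
```

Because the training stream never runs dry, something else has to decide when training stops, which is the role of the `steps` argument passed to `train`.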

### Build Model
- Variety of estimators/models to choose from for classification
- Some Options:
    - `DNNClassifier` (Deep Neural Network)
    - `LinearClassifier` (similar to linear regression, but does classification instead of regression)

In [None]:
# Build a DNN with 2 hidden layers of 30 and 10 nodes respectively

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    hidden_units=[30,10],       # Hidden Layer sizes
    n_classes=3                 # Number of classes to choose from
)
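
To get a feel for what `hidden_units=[30,10]` implies, the sketch below counts the trainable weights and biases, assuming plain fully connected layers with one bias per node (4 input features, then 30, 10, and 3 output nodes). `dense_param_count` is a hypothetical helper for illustration, not part of TensorFlow.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input/output pair
        total += n_out         # one bias per output node
    return total


# 4 input features -> 30 hidden -> 10 hidden -> 3 classes
layer_sizes = [4, 30, 10, 3]
total_params = dense_param_count(layer_sizes)
print(total_params)  # 150 + 310 + 33 = 493
```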

### Training


In [None]:
# Train
classifier.train(
    input_fn = lambda: input_fn(train, train_y, training=True), # give input function as lambda
    steps=5000  # number of training steps (batches) to run
)
)
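
Each training step consumes one batch, so the number of steps, the batch size, and the dataset size together determine how often the model sees each example. A quick back-of-the-envelope calculation with the values used above (batch size 256, 120 training entries):

```python
steps = 5000
batch_size = 256
n_train = 120

# Total examples drawn from the repeated dataset during training.
examples_seen = steps * batch_size

# Approximate number of full passes over the training set.
passes = examples_seen / n_train

print(examples_seen)     # 1280000
print(round(passes))     # 10667
```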


In [None]:

# Evaluate
result = classifier.evaluate(
    input_fn = lambda: input_fn(test, test_y, training=False)
)

print(f"\nTest set accuracy: {result['accuracy']:.3f}")
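
The accuracy reported here is the fraction of test examples whose predicted class matches the true label. A minimal sketch of that calculation, using made-up labels and a hypothetical `accuracy` helper:

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up class ids for illustration only.
demo_true = [0, 1, 2, 2, 1]
demo_pred = [0, 1, 2, 1, 1]

demo_accuracy = accuracy(demo_true, demo_pred)
print(f"{demo_accuracy:.3f}")  # 0.800
```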

### Prediction Script
- Predict species of flowers with given features

In [None]:
def predict_input_fn(features, batch_size=256):
    # Convert the inputs to a Dataset without labels.
    return tf.data.Dataset.from_tensor_slices(dict(features)).batch(batch_size)

features = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']

# Optional: read the feature values from user input instead
# predict = {}
#
# print("Please type numeric values as prompted.")
# for feature in features:
#   while True:
#     val = input(feature + ": ")
#     try:
#       predict[feature] = [float(val)]
#       break
#     except ValueError:
#       print("Not a number, please try again.")

# Hard code features
predict = {'SepalLength': [2.4], 'SepalWidth': [2.6], 'PetalLength': [6.5], 'PetalWidth': [6.3]}


predictions = classifier.predict(input_fn=lambda: predict_input_fn(predict))
for pred_dict in predictions:
    class_id = pred_dict['class_ids'][0]
    probability = pred_dict['probabilities'][class_id]

    print('Prediction is "{}" ({:.1f}%)'.format(
        SPECIES[class_id], 100 * probability))
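
For a multi-class classifier like this one, the per-class probabilities typically come from applying a softmax to the network's raw output scores, and the predicted class is the one with the highest probability. The sketch below shows that step in plain Python; the scores are made up and `softmax` is a hypothetical helper, not the estimator's internal code.

```python
import math

SPECIES = ['Setosa', 'Versicolor', 'Virginica']


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores, one per species.
demo_scores = [0.5, 2.0, 1.0]
demo_probs = softmax(demo_scores)

# The predicted class is the index of the largest probability.
demo_class_id = max(range(len(demo_probs)), key=demo_probs.__getitem__)

print('Prediction is "{}" ({:.1f}%)'.format(
    SPECIES[demo_class_id], 100 * demo_probs[demo_class_id]))
```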