# Classification
In the last section I looked at linear regression. **Linear regression** is used to predict a numeric value.  
**Classification** is used to separate data points into different classes, each identified by a label.

In [2]:
# from __future__ import absolute_import, division, print_function, unicode_literals

import tensorflow as tf
import pandas as pd
import os

## Datasets
It's possible to use `tf.keras.utils.get_file` to download a file from a URL and store it in a local cache.  
Still, I prefer to use local files to avoid API calls.

In [11]:
train_path = "data/iris/iris_training.csv"
test_path = "data/iris/iris_test.csv"

# define some constants to help us later on
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']

# read dataset
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
# names: list of column names to use.
# header=0: row 0 of the file is a header row; it is discarded and replaced by `names`.

# pop the species column off
train_y = train.pop("Species")
test_y = test.pop("Species")

print(train.head(3)) # the Species column is now gone.
print(' ')
print("The shape of train is %s" % str(train.shape))

   SepalLength  SepalWidth  PetalLength  PetalWidth
0          6.4         2.8          5.6         2.2
1          5.0         2.3          3.3         1.0
2          4.9         2.5          4.5         1.7
 
The shape of train is (120, 4)
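
To make the `names`/`header=0` combination and `pop` more concrete, here is a minimal sketch on a tiny in-memory CSV. The three-line `raw` string is made up for illustration (it only imitates a file whose first line is a header we do not want to keep); it is not the real iris file.

```python
import io

import pandas as pd

# Hypothetical CSV content: the first line is a header row we want to replace.
raw = (
    "120,4,setosa,versicolor,virginica\n"
    "6.4,2.8,5.6,2.2,2\n"
    "5.0,2.3,3.3,1.0,1\n"
)
columns = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth", "Species"]

# header=0 marks the first line as a header; names= overrides it with our labels.
df = pd.read_csv(io.StringIO(raw), names=columns, header=0)

# pop() removes the column from the DataFrame in place and returns it as a Series.
labels = df.pop("Species")

print(df.shape)         # two data rows, four feature columns left
print(labels.tolist())  # the label values that were popped off
```

Without `header=0`, the first line would be read as a data row instead of being dropped.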


## Input function
We need to create an input function again. This time it should be easier to digest.

In [None]:
def input_fn(features, labels, training=True, batch_size=256):
    # convert the inputs to a dataset
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # shuffle and repeat if you are in training mode
    if training:
        dataset = dataset.shuffle(1000).repeat()

    return dataset.batch(batch_size)

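A few explanations of the code above.
- `tf.data.Dataset.from_tensor_slices()` is used to create an input pipeline for machine learning models. It slices the `(features, labels)` pair along the first axis and returns a `tf.data.Dataset` object that yields one example at a time.
- `dataset.shuffle(1000).repeat()`: `shuffle(1000)` picks elements at random from a buffer of 1000 elements; since the training set has only 120 rows, the whole dataset gets shuffled. The `repeat()` method,  
`repeat(count=None, name=None)`, repeats the dataset so that each original value is seen `count` times. With no argument, the dataset is repeated indefinitely.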
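
The effect of repeating and batching can be modelled in plain Python. This is only a sketch of the behaviour using `itertools`, not TensorFlow code: the helper `batches` is a hypothetical name, and shuffling is left out so the result stays deterministic.

```python
from itertools import cycle, islice


def batches(items, batch_size, repeat=False):
    """Yield lists of up to batch_size items; loop over items forever if repeat=True."""
    stream = cycle(items) if repeat else iter(items)
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            return
        yield batch


rows = [0, 1, 2, 3, 4]

# Evaluation mode: a single pass over the data, the last batch may be smaller.
eval_batches = list(batches(rows, batch_size=2))

# Training mode: the stream never ends, so we take a fixed number of batches.
train_batches = list(islice(batches(rows, batch_size=2, repeat=True), 4))

print(eval_batches)   # [[0, 1], [2, 3], [4]]
print(train_batches)  # [[0, 1], [2, 3], [4, 0], [1, 2]]
```

Note that in training mode a batch can span the end of one pass and the start of the next, which is why the training loop has to be told how many steps to run.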

## Feature columns
As in the previous section, the model also needs feature columns.

In [22]:
# Feature columns describe how to use the input
my_feature_columns = []
for key in train.keys(): # train is a pandas DataFrame; .keys() returns its column names
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))
print(my_feature_columns)
print(' ')
# WARNING:tensorflow:From C:\Users\eziod\AppData\Local\Temp\ipykernel_17516\269687074.py:5: numeric_column (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.
# Instructions for updating:
# Use Keras preprocessing layers instead, either directly or via the `tf.keras.utils.FeatureSpace` utility. Each of `tf.feature_column.*` has a functional equivalent in `tf.keras.layers` for feature preprocessing when training a Keras model.

# Quick try of the replacement suggested by the warning above:
# map each feature name to the "float" feature type.
tmp = {
    'SepalLength': "float",
    'SepalWidth': "float",
    'PetalLength': "float",
    'PetalWidth': "float",
}
tmp2 = tf.keras.utils.FeatureSpace(tmp)
print("tmp2 is %s" % tmp2)

[NumericColumn(key='SepalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='SepalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalLength', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), NumericColumn(key='PetalWidth', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]
 
tmp2 is <keras.utils.feature_space.FeatureSpace object at 0x000002347FEC3DC0>


A few explanations:
- `df.keys()` and `df.columns` return the same column index. `.columns` is an attribute, so it can also be assigned to in order to set the column labels; `.keys()` is a method call, so it is "read only".
- `tf.feature_column.numeric_column(key=key)` returns an object of type `<class 'tensorflow.python.feature_column.feature_column_v2.NumericColumn'>`.
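
The difference between `.keys()` and `.columns` can be checked on a small DataFrame. This is a minimal sketch; the one-row frame and the lowercase column names are made up for the example.

```python
import pandas as pd

# Hypothetical one-row frame with two of the iris feature columns.
df = pd.DataFrame({"SepalLength": [6.4], "SepalWidth": [2.8]})

# Both give the same column index.
same = df.keys().equals(df.columns)

# .columns is an attribute, so it can be assigned to rename the columns.
df.columns = ["sepal_length", "sepal_width"]
renamed = list(df.keys())

print(same)
print(renamed)
```

Trying to write `df.keys() = [...]` is a syntax error in Python, since a function call cannot be an assignment target.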