# TensorFlow 101

Getting Started with TensorFlow - https://www.tensorflow.org/get_started/

## The TensorFlow programming stack

As the following illustration shows, TensorFlow provides a programming stack consisting of multiple API layers:

![](https://www.tensorflow.org/images/tensorflow_programming_environment.png)

As you start writing TensorFlow programs, we strongly recommend focusing on the following two high-level APIs:

- Estimators, which represent a complete model. The Estimator API provides methods to train the model, to judge the model's accuracy, and to generate predictions.

- Datasets, which build a data input pipeline. The Dataset API has methods to load and manipulate data, and feed it into your model. The Dataset API meshes well with the Estimator API.

The **training set** contains the examples that we'll use to train the model; the **test set** contains the examples that we'll use to evaluate the trained model's effectiveness.

The training set and test set started out as a single data set. Then, someone split the examples, with the majority going into the training set and the remainder going into the test set. Adding examples to the training set usually builds a better model; however, adding more examples to the test set enables us to better gauge the model's effectiveness. Regardless of the split, the examples in the test set must be separate from the examples in the training set. Otherwise, you can't accurately determine the model's effectiveness.
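The split described above can be sketched in plain Python. This is a minimal illustration, not the method used by any particular TensorFlow program: the function name split_examples, the 20% test fraction, the fixed seed, and the 150 stand-in examples are all assumptions made for the example.

```python
import random


def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)
    # A seeded generator makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The two slices never overlap, so no example lands in both sets.
    return shuffled[n_test:], shuffled[:n_test]


examples = list(range(150))  # hypothetical stand-ins for 150 labelled examples
train_set, test_set = split_examples(examples)
print(len(train_set), len(test_set))  # 120 30
```

Because both sets are slices of one shuffled list, the requirement that the test examples stay separate from the training examples holds by construction.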

The premade_estimators.py program relies on the **load_data** function in the adjacent **project_name.py** file to read and parse the training set and test set.

## The Algorithm

The program trains a Deep Neural Network classifier model having the following topology:

- 2 hidden layers.
- Each hidden layer contains 10 nodes.

The following figure illustrates the features, hidden layers, and predictions (not all of the nodes in the hidden layers are shown):

![](https://www.tensorflow.org/images/custom_estimators/full_network.png)
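To get a feel for the size of this topology, the sketch below counts the trainable weights and biases in a fully connected network with two hidden layers of 10 nodes each. The 4 input features and 3 output classes are assumptions chosen to match an Iris-style problem, and the helper names are hypothetical; the count also assumes one bias per node.

```python
def layer_shapes(n_features, hidden_units, n_classes):
    """Return (inputs, outputs) pairs for each fully connected layer."""
    sizes = [n_features] + list(hidden_units) + [n_classes]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(n_features, hidden_units, n_classes):
    """Count weights plus biases across all fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in layer_shapes(n_features, hidden_units, n_classes))


# Assumed: 4 features in, two hidden layers of 10 nodes, 3 classes out.
print(layer_shapes(4, [10, 10], 3))      # [(4, 10), (10, 10), (10, 3)]
print(count_parameters(4, [10, 10], 3))  # 193
```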

## Overview of programming with Estimators

An Estimator is TensorFlow's high-level representation of a complete model. It handles the details of initialization, logging, saving and restoring, and many other features so you can concentrate on your model. For more details, see the Estimators guide in the TensorFlow documentation.

An Estimator is any class derived from tf.estimator.Estimator. TensorFlow provides a collection of pre-made Estimators (for example, LinearRegressor) to implement common ML algorithms. Beyond those, you may write your own custom Estimators. We recommend using pre-made Estimators when just getting started with TensorFlow. After gaining expertise with the pre-made Estimators, we recommend optimizing your model by creating your own custom Estimators.

To write a TensorFlow program based on pre-made Estimators, you must perform the following tasks:

- Create one or more input functions.
- Define the model's feature columns.
- Instantiate an Estimator, specifying the feature columns and various hyperparameters.
- Call one or more methods on the Estimator object, passing the appropriate input function as the source of the data.

## Creating Input Functions

You must create input functions to supply data for training, evaluation, and prediction.

An input function is a function that returns a tf.data.Dataset object which outputs the following two-element tuple:

- features - A Python dictionary in which:
    - Each key is the name of a feature.
    - Each value is an array containing all of that feature's values.
- label - An array containing the values of the label for every example.

Just to demonstrate the format of the data, here's a simple implementation that returns the two-element tuple directly rather than wrapping it in a Dataset:

```
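import numpy as np

def input_evaluation_set():
    # Each key is a feature name; each value holds that feature's values,
    # one entry per example (two examples here).
    features = {'SepalLength': np.array([6.4, 5.0]),
                'SepalWidth':  np.array([2.8, 2.3]),
                'PetalLength': np.array([5.6, 3.3]),
                'PetalWidth':  np.array([2.2, 1.0])}
    # One label per example, in the same order as the feature values.
    labels = np.array([2, 1])
    return features, labels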
```
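Raw data often arrives one example per row rather than one array per feature. The plain-Python sketch below shows one way to regroup such rows into the features dictionary and label list described above. The helper name rows_to_features and the row layout (feature values first, label last) are assumptions made for illustration.

```python
def rows_to_features(rows, feature_names):
    """Regroup row-oriented examples into (features dict, labels list)."""
    features = {name: [] for name in feature_names}
    labels = []
    for row in rows:
        *values, label = row  # assumed layout: feature values, then the label
        if len(values) != len(feature_names):
            raise ValueError("row does not match the number of feature names")
        for name, value in zip(feature_names, values):
            features[name].append(value)
        labels.append(label)
    return features, labels


feature_names = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth']
rows = [(6.4, 2.8, 5.6, 2.2, 2),
        (5.0, 2.3, 3.3, 1.0, 1)]
features, labels = rows_to_features(rows, feature_names)
print(features['SepalLength'], labels)  # [6.4, 5.0] [2, 1]
```

The result has the same shape as the hand-written example: every feature's list has one entry per example, and the labels follow the same order.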

Your input function may generate the features dictionary and the labels any way you like. However, we recommend using TensorFlow's Dataset API, which can parse many kinds of input data. At a high level, the Dataset API consists of the following classes:

![](https://www.tensorflow.org/images/dataset_classes.png)

Where the individual members are:

- Dataset - Base class containing methods to create and transform datasets. Also allows you to initialize a dataset from data in memory, or from a Python generator.
- TextLineDataset - Reads lines from text files.
- TFRecordDataset - Reads records from TFRecord files.
- FixedLengthRecordDataset - Reads fixed-size records from binary files.
- Iterator - Provides a way to access one dataset element at a time.

The Dataset API can handle a lot of common cases for you. For example, using the Dataset API, you can easily read in records from a large collection of files in parallel and join them into a single stream.
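As a rough sketch of a Dataset-based input function, the code below wraps in-memory features and labels in a tf.data.Dataset, then shuffles, repeats, and batches them. It assumes TensorFlow 2.x (eager execution) and NumPy are installed; the name train_input_fn, the shuffle buffer size, and the batch size are illustrative choices rather than values taken from this document.

```python
import numpy as np
import tensorflow as tf


def train_input_fn(features, labels, batch_size):
    """Return a Dataset that yields (features dict, labels) batches."""
    # Slice the per-feature arrays and the label array into single examples.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle the examples, repeat indefinitely, and group them into batches.
    return dataset.shuffle(buffer_size=1000).repeat().batch(batch_size)


features = {'SepalLength': np.array([6.4, 5.0]),
            'SepalWidth':  np.array([2.8, 2.3]),
            'PetalLength': np.array([5.6, 3.3]),
            'PetalWidth':  np.array([2.2, 1.0])}
labels = np.array([2, 1])

dataset = train_input_fn(features, labels, batch_size=2)
# With eager execution, a Dataset can be iterated like any Python iterable.
batch_features, batch_labels = next(iter(dataset))
print(sorted(batch_features.keys()))
print(batch_labels.numpy())
```

Each element the Dataset yields is the same two-element tuple described earlier, only batched: a dictionary mapping feature names to tensors, plus a tensor of labels.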


## Define the feature columns

A feature column is an object describing how the model should use raw input data from the features dictionary. When you build an Estimator model, you pass it a list of feature columns that describes each of the features you want the model to use. The tf.feature_column module provides many options for representing data to the model.

## Instantiate an Estimator

TensorFlow provides several pre-made classifier Estimators, including:

- tf.estimator.DNNClassifier for deep models that perform multi-class classification.
- tf.estimator.DNNLinearCombinedClassifier for wide & deep models.
- tf.estimator.LinearClassifier for classifiers based on linear models.

## Train, Evaluate, and Predict

Now that we have an Estimator object, we can call methods to do the following:

- Train the model.
- Evaluate the trained model.
- Use the trained model to make predictions.