# TensorFlow
---

In this module, we will learn about the TensorFlow package and how it can be used to build our own machine learning models.

#### Introduction
TensorFlow is an open-source library for machine learning and artificial intelligence. It is maintained by Google.


Artificial intelligence (A.I.) is anything that automates intellectual tasks which are normally performed by humans. An A.I. can be a program with a specific set of rules that lets it adapt to certain situations in the way a human would. A.I. has developed into a larger field that contains machine learning and neural networks. Whereas rule-based A.I. follows a set of rules written in advance to automate a task, machine learning works out the rules for performing the task itself. Machine learning therefore needs a lot of data to generate the rules. Because the machine learning algorithm finds the rules for us, the results are not always accurate.
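
The difference between writing a rule by hand and learning it from data can be sketched in plain Python. This is a minimal illustration rather than anything TensorFlow does internally: the temperature data, the function names, and the idea of learning a single threshold are all assumptions made for the example.

```python
# Hypothetical toy data: (temperature in degrees Celsius, label), where label 1 means "hot".
training_data = [(5, 0), (12, 0), (18, 0), (24, 1), (29, 1), (35, 1)]

def hand_written_rule(temperature):
    """Rule-based approach: a programmer chooses the threshold."""
    return 1 if temperature > 20 else 0

def learn_threshold(examples):
    """Machine-learning approach: choose the threshold that classifies
    the most training examples correctly."""
    candidates = sorted(t for t, _ in examples)
    best_threshold, best_correct = None, -1
    for threshold in candidates:
        correct = sum((1 if t > threshold else 0) == label for t, label in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(examples)

learned_threshold, training_accuracy = learn_threshold(training_data)

def learned_rule(temperature):
    """Apply the threshold that was found from the data."""
    return 1 if temperature > learned_threshold else 0

print(learned_threshold, training_accuracy)
```

The learned rule fits the training data perfectly, yet it disagrees with the hand-written rule for a temperature of 19. With so little data, the learned boundary is only as good as the examples it saw.
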

Neural networks, or deep learning, are a type of machine learning that uses layered representations of data. Typically in a neural network, there is more than one set of rules between the input and output data. Input information is often called the features. Output information is often called the label. Finding the rules which turn features into labels is called training, and it requires training data.

There are three types of machine learning.

Supervised learning is when we have features that each correspond to a known label, which allows us to find the rules. During training, the model's predictions are checked against the labels we already have, and the rules are adjusted when they do not match.

Unsupervised learning is when we only have features and no labels. This is useful for clustering data into groups when we have no predefined grouping in mind.

Reinforcement learning is when we have an agent, an environment, and a reward. The agent aims to complete a task within the environment, and it receives a reward when it makes progress. By maximising its reward, it learns how to finish the task. How long this takes can depend on the environment.
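
The agent, environment, and reward loop of reinforcement learning can be sketched with tabular Q-learning, one common technique among many. Everything here is an assumption chosen for illustration: a five-cell corridor as the environment, a reward of 1 for reaching the right-hand end, and the learning-rate, discount, and exploration values.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

def step(state, action):
    """Environment: move within the corridor; reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def train(episodes=200, seed=0):
    """Agent: learn a table of action values from experienced rewards."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            best = max(q[(state, a)] for a in ACTIONS)
            greedy = [a for a in ACTIONS if q[(state, a)] == best]
            # Explore with probability EPSILON, otherwise act greedily
            # (ties between equally good actions are broken at random).
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = rng.choice(greedy)
            next_state, reward = step(state, action)
            target = reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (target - q[(state, action)])
            state = next_state
    return q

q_table = train()
# The learned policy: the best action in each non-goal cell.
policy = [max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the policy steps right in every cell, which is the shortest route to the reward. A longer corridor would need more episodes, which is one way the environment affects training time.
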

TensorFlow is built on basic math and linear algebra, and it makes more advanced operations easier to use. When we write code in TensorFlow, we build a graph. For example, a graph made up of variables can represent the sum of those variables, but it only stores the equation for the sum, not its value. No actual result is stored in the graph; results are computed in a session. A session executes part of the graph, performing partial computations throughout the graph. (This describes graph mode, the default in TensorFlow 1.x; TensorFlow 2.x runs operations eagerly by default.)
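
The split between building a graph and running it in a session can be illustrated with a toy example in plain Python. This is only a sketch of the idea: the `Node` and `ToySession` classes and the `constant` and `add` helpers are hypothetical names invented here, not TensorFlow's API.

```python
class Node:
    """A node in a toy computation graph: it stores an operation, not a result."""
    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = inputs
        self.value = value  # only set for constant nodes

def constant(value):
    return Node("const", value=value)

def add(a, b):
    return Node("add", inputs=(a, b))

class ToySession:
    """Evaluates only the part of the graph needed for the requested node."""
    def run(self, node):
        if node.op == "const":
            return node.value
        if node.op == "add":
            return sum(self.run(i) for i in node.inputs)
        raise ValueError(f"unknown op: {node.op}")

x = constant(2)
y = constant(3)
total = add(x, y)                  # builds the graph; nothing is computed yet
unused = add(total, constant(10))  # another part of the graph, not run below

result = ToySession().run(total)   # the computation happens here
print(total.value, result)
```

The `total` node holds no value of its own; the number only appears when the session runs that node, and the `unused` branch is never evaluated.
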
To install TensorFlow, use:

In [None]:
%pip install tensorflow
import tensorflow as tf

In [None]:
print(tf.__version__)  # tf.version is a module; tf.__version__ is the version string

A tensor is a vector (a data point) generalised to higher dimensions. A vector can have any number of dimensions. Tensors are the main objects we manipulate in TensorFlow. Each tensor has a datatype and a shape (what kind of data it contains and what its dimensions are).

Make a tensor using:


In [None]:
tf.Variable(3.14, dtype=tf.float64)  # dtype must be a keyword; the second positional argument is trainable

Rank, also called degree, is the number of dimensions of a tensor. A scalar has rank 0. A one-dimensional array of values (a vector) has rank 1.

Find the rank using:

In [None]:
tf.rank(tensor_name)

Find the shape of a tensor using:

In [None]:
tensor_name.shape

This returns the number of elements along each dimension, for example (2, 3) for a rank-2 tensor that has two rows of three elements each.
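
Rank and shape can be illustrated on ordinary nested Python lists. This sketch is not how TensorFlow computes them: the helper names `shape_of` and `rank_of` are hypothetical, and the code assumes every sub-list at the same depth has the same length.

```python
def shape_of(nested):
    """Return the shape of a regularly nested list as a tuple.
    Assumes every sub-list at the same depth has the same length."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)

def rank_of(nested):
    """The rank is the number of dimensions, i.e. the length of the shape."""
    return len(shape_of(nested))

scalar = 3.14
vector = [1, 2, 3]
matrix = [[1, 2, 3], [4, 5, 6]]

for value in (scalar, vector, matrix):
    print(rank_of(value), shape_of(value))
```

The scalar has an empty shape and rank 0, the vector has one dimension of length 3, and the matrix has two dimensions of lengths 2 and 3.
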
Shape of a tensor can be changed using:

In [None]:
tf.reshape(tensor_name, new_shape)  # new_shape is a list of dimension sizes, such as [3, 2]

If one of the dimensions is given as -1, TensorFlow infers its size from the total number of elements.
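
How a -1 dimension gets filled in can be worked through in plain Python. The `infer_shape` function below is a hypothetical helper written for this illustration, not part of TensorFlow; it only reproduces the arithmetic of dividing the element count by the known dimensions.

```python
from math import prod

def infer_shape(num_elements, requested_shape):
    """Replace a single -1 in requested_shape with the size that makes the
    total number of elements equal num_elements."""
    if requested_shape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    known = prod(d for d in requested_shape if d != -1)
    if -1 in requested_shape:
        if known == 0 or num_elements % known != 0:
            raise ValueError("cannot infer the missing dimension")
        return [num_elements // known if d == -1 else d for d in requested_shape]
    if known != num_elements:
        raise ValueError("shape does not match the number of elements")
    return list(requested_shape)

print(infer_shape(6, [3, -1]))      # a tensor with 6 elements reshaped to 3 rows
print(infer_shape(24, [2, -1, 4]))  # a tensor with 24 elements
```

A tensor of 6 elements requested as [3, -1] becomes [3, 2], and one of 24 elements requested as [2, -1, 4] becomes [2, 3, 4]. A request such as [4, -1] for 6 elements fails because 6 is not divisible by 4.
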

There are different types of tensors: Variable, Constant, Placeholder, and SparseTensor. Only a Variable can change its value after it is created. Evaluating a tensor can be done by creating a session:

In [None]:
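# TensorFlow 1.x graph mode; in 2.x (eager by default), use tensor_name.numpy() instead
with tf.compat.v1.Session() as sess:
    print(tensor_name.eval())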

There are several core machine learning algorithms to know when using TensorFlow: linear regression, classification, clustering, and hidden Markov models.

Linear regression finds a linear correspondence between data points. This then allows us to predict a value based on a line of best fit. This is easiest to picture with two-dimensional data (x and y), but it is also useful for higher dimensionalities. The line of best fit aims to be as close as possible to every data point, with the points split roughly evenly on either side of the line. In a higher-dimensional system (say 3D), knowing all but one of the variables in the equation for the line of best fit allows you to calculate the remaining one. Linear in this sense means that one quantity increases or decreases at a constant rate as another increases or decreases.
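
A line of best fit for two-dimensional data can be computed by hand with ordinary least squares, which minimises the squared vertical distance between the line and the points. This is a sketch of that one method in plain Python, not the training procedure TensorFlow uses; the data points and the `fit_line` name are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data points lying roughly along y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict y for an unseen x of 6
print(round(slope, 2), round(intercept, 2), round(prediction, 2))
```

Once the slope and intercept are known, any x value can be turned into a predicted y, which is the sense in which knowing all but one variable lets you calculate the other.
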

To use linear regression, start with the following setup:

In [None]:
%pip install scikit-learn

In [None]:
from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib

import tensorflow.compat.v2.feature_column as fc
import tensorflow as tf

We can test this using data from the Titanic to predict who would survive:


In [None]:
dftrain = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/train.csv') # training data
dfeval = pd.read_csv('https://storage.googleapis.com/tf-datasets/titanic/eval.csv') # testing data
y_train = dftrain.pop('survived')
y_eval = dfeval.pop('survived')

Here we load CSV data on the Titanic passengers (one file for training and one for testing), and then separate the labels (outputs) from the features (inputs). Feature columns can hold either numerical data or categorical (non-numeric) data. Categorical data needs to be encoded as integer values.

In [None]:
dftrain[feature_name].unique()

This returns all the unique values found in the column called feature_name. It can be used alongside:

In [None]:
tf.feature_column.categorical_column_with_vocabulary_list(feature_name, dftrain[feature_name].unique())

to make a feature column that covers every possible value of that feature.
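
Encoding categorical values as integers through a vocabulary can be illustrated in plain Python. This is a conceptual sketch only and makes no claim about how TensorFlow assigns integers internally; the sample column and the helper names `build_vocabulary` and `encode` are hypothetical.

```python
def build_vocabulary(values):
    """Map each distinct category to an integer, in order of first appearance."""
    vocabulary = {}
    for value in values:
        if value not in vocabulary:
            vocabulary[value] = len(vocabulary)
    return vocabulary

def encode(values, vocabulary):
    """Replace every category with its integer from the vocabulary."""
    return [vocabulary[value] for value in values]

# Hypothetical column of categorical values.
sex_column = ["male", "female", "female", "male", "female"]

vocabulary = build_vocabulary(sex_column)
encoded = encode(sex_column, vocabulary)
print(vocabulary, encoded)
```

The list of unique values plays the role of the vocabulary: once every category has an integer, the column can be fed to a model as numbers.
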

Train a model by feeding it data, loading the data in batches if the dataset is large. An epoch is one full pass over the training data; training for several epochs feeds the same data again, in a different order, to get a better prediction. However, it is possible to overfit the training data, to the point where the algorithm simply memorises the data points. Avoid this by increasing the number of epochs gradually.
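
Batches and epochs can be sketched without any library. The generator below is a hypothetical helper for illustration, not the input pipeline used later in this module; it shows each epoch as one reshuffled pass over the data, split into fixed-size batches.

```python
import random

def iterate_batches(data, batch_size, num_epochs, shuffle=True, seed=0):
    """Yield (epoch, batch) pairs: each epoch is one full pass over the data,
    reshuffled so the batches differ between epochs."""
    rng = random.Random(seed)
    for epoch in range(num_epochs):
        order = list(data)
        if shuffle:
            rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            yield epoch, order[start:start + batch_size]

data = list(range(10))
batches = list(iterate_batches(data, batch_size=4, num_epochs=2))

for epoch, batch in batches:
    print(epoch, batch)
```

With 10 items and a batch size of 4, each epoch yields batches of 4, 4, and 2 items, and every item appears exactly once per epoch.
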

An input function describes how data is fed into the model. We need to make a tf.data.Dataset object out of our data for the model to work.

In [None]:
def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
    def input_function():  # inner function, this will be returned
        ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df))  # create tf.data.Dataset object with data and its label
        if shuffle:
            ds = ds.shuffle(1000)  # randomize order of data
        ds = ds.batch(batch_size).repeat(num_epochs)  # split dataset into batches of batch_size and repeat for the number of epochs
        return ds  # return the batched dataset
    return input_function  # return a function object for use

train_input_fn = make_input_fn(dftrain, y_train)  # make_input_fn returns a function; the model calls it later to get a dataset object
eval_input_fn = make_input_fn(dfeval, y_eval, num_epochs=1, shuffle=False)

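The model can be created from a list of feature columns, here assumed to be stored in feature_columns. Note that the tf.estimator API was removed in TensorFlow 2.16, so this requires version 2.15 or earlier: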

In [None]:
linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)

The model can then be trained and evaluated using:

In [None]:
linear_est.train(train_input_fn)  # train
result = linear_est.evaluate(eval_input_fn)  # get model metrics by evaluating on the testing data
clear_output()

The result is a dictionary of metrics, so individual values can be looked up by key (for example, result['accuracy']). You can adjust the number of epochs to achieve the model with the highest accuracy.

We can predict using:

In [None]:
result = list(linear_est.predict(eval_input_fn))