# Module 1. Introduction to TensorFlow
The purpose of this notebook is to provide an introduction to TensorFlow 2.0 for creating basic machine learning models. To this end, we are going to work through an example that consists of classifying flowers according to different input features (e.g., the sepal length/width and the petal length/width). For this we are going to use an open-source dataset known as the [Iris dataset](https://archive.ics.uci.edu/ml/datasets/iris). The first part of this module focuses on downloading and preparing the data. Do not hurry or worry if you do not get all the details yet. There are many ways of doing this task in TensorFlow (as well as in other frameworks), and you will become well versed in them with experience.

## Downloading and preparing the dataset
In this case we are going to download the data using the Python `requests` library. In particular, we will use the `get` function to download the data and then write it to a local file, `iris.data`.

In [None]:
import requests
import pandas as pd
import tensorflow as tf

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
r = requests.get(url)
r.raise_for_status()  # stop here if the download failed

# Write the downloaded bytes to a local file
with open('iris.data', 'wb') as f:
    f.write(r.content)

You can check that the file has been created by listing the contents of the working directory.

In [None]:
!ls

If you look at the content of the file you just created in the cell above, you will see that there is one line per element in the dataset. Each of these lines is a comma-separated list of values, which are either numerical or text. The first four columns of every line are numerical and give, in order, the values of the four features we have for each flower: the sepal length, the sepal width, the petal length, and the petal width. The last column in the line is the name of the flower type, as text. The first 5 lines of the file are as follows:
```
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa

```
Obviously, this layout will be different for different datasets. There is no recipe for this: each dataset creator decides what is best or makes the most sense. You should always read the documentation associated with a dataset to understand its structure.
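
As a rough illustration of the comma-separated layout described above, the following sketch parses two of the sample lines with Python's standard `csv` module. The variable names (`sample`, `rows`) are chosen only for this example; the notebook itself loads the file with pandas.

```python
import csv
import io

# Two lines copied from the sample shown above (illustrative subset).
sample = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

rows = []
for fields in csv.reader(io.StringIO(sample)):
    if not fields:  # skip blank lines, if any
        continue
    # The first four fields are numeric features, the last one is the label.
    *features, label = fields
    rows.append(([float(value) for value in features], label))

print(rows[0])
```

Each parsed row is a pair made of a list of four floats and the flower name.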


Let's move on with our dataset though. We are now going to use pandas to load the dataset into a dataframe, which makes the data easier to manipulate.

In [None]:
# Column order follows the dataset documentation: petal length comes before petal width
iris_df = pd.read_csv('iris.data', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'label'], header=None)
iris_df.head()

First, we will focus on the numerical values of the dataframe. We will extract them into a separate dataframe and center them around the mean of every column.

In [None]:
numerical_values = iris_df[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
numerical_values = numerical_values - numerical_values.mean(axis=0)
numerical_values.head()
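
To make the centering step concrete, here is a minimal sketch on a made-up three-row dataframe (the values are invented for illustration and are not taken from the Iris file). After subtracting the column means, every column has mean zero.

```python
import pandas as pd

# Toy data, invented for this example only.
toy = pd.DataFrame({
    "sepal_length": [5.0, 6.0, 7.0],
    "sepal_width": [3.0, 3.5, 2.5],
})

# Subtract each column's mean from every value in that column.
centered = toy - toy.mean(axis=0)

print(centered)
print(centered.mean(axis=0))  # both column means are now zero
```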

The dataframe contains numerical values, but these are not yet prepared to be the input of a machine learning model. In TensorFlow, we need to use the `tf.Tensor` type. Tensors are n-dimensional structures that can hold different types of data. In addition, tensors integrate with TensorFlow's machinery for computing gradients, for example. A tensor might or might not share the underlying memory of another Python structure (such as a dataframe or a NumPy ndarray), but it definitely adds some features on top of these.

A `tf.Tensor` can be created out of a dataframe using the function `tf.convert_to_tensor`.

In [None]:
input_features = tf.convert_to_tensor(numerical_values)
print(input_features)
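
The sketch below, assuming a small made-up dataframe, shows what a converted tensor looks like: it has a shape and a data type, it can be cast to another type with `tf.cast`, and it can be turned back into a NumPy array with `.numpy()`. The names `toy`, `tensor`, `as_float32` and `back_to_numpy` are used only in this example.

```python
import pandas as pd
import tensorflow as tf

# Toy data, invented for this example only.
toy = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

tensor = tf.convert_to_tensor(toy)
print(tensor.shape)  # two rows, two columns
print(tensor.dtype)  # pandas floats are 64-bit, so the tensor is float64

# Casting to float32, the default type used by Keras layers.
as_float32 = tf.cast(tensor, tf.float32)

# Going back to a NumPy array.
back_to_numpy = tensor.numpy()
print(back_to_numpy)
```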

Once the features are numerical values and have been converted into tensors, they are ready to be used for training a model. However, the labels are still in text form. Recall from the slides that a machine learning model is a mathematical function, and such functions require numerical representations. In this example, we are working with labels, or categories, for a classification task. A typical numerical representation for this kind of problem is called `one hot` encoding. Basically, each label is represented by a vector whose length equals the number of distinct categories. In our case, we have three types of flowers, so this vector will have length three. The components of the vector will all be `zero`, except one, which will have the value `one`. The position of this non-zero component indicates the label represented by that vector. For example, we might decide that the `Iris-setosa` type uses the first vector element, the `Iris-versicolor` type uses the second, and the `Iris-virginica` type uses the third. To convert a set of categories into this representation, we first map each label to an integer index and then use the TensorFlow function `tf.one_hot`.

In [None]:
iris_df['label'] = iris_df['label'].map({'Iris-setosa':0, 'Iris-versicolor':1, 'Iris-virginica':2})
output = tf.one_hot(iris_df['label'],depth=3)
print(output)
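
To see what the encoding produces without relying on TensorFlow, here is a plain-Python sketch. The helper `one_hot` is hypothetical and written only for illustration; it is not how `tf.one_hot` is implemented, but it should yield the same vectors for these inputs.

```python
def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at position `index`, 0.0 elsewhere."""
    return [1.0 if position == index else 0.0 for position in range(depth)]

# Same label-to-index mapping as in the cell above.
label_to_index = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}

labels = ['Iris-setosa', 'Iris-virginica', 'Iris-versicolor']
encoded = [one_hot(label_to_index[name], depth=3) for name in labels]

for name, vector in zip(labels, encoded):
    print(name, vector)
```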

## Creating a model
In the next step, we will use a neural network for classifying the input values. In particular, for this example, we will use a fully connected neural network consisting of four layers: 
- Input layer with as many neurons as input features we have
- A hidden layer of 32 neurons
- A hidden layer of 16 neurons
- An output layer with as many neurons as categories we have

In [None]:

from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential
model = Sequential([
        Dense(32,activation='relu',input_shape=(4,)),
        Dense(16,activation='relu'),
        Dense(3, activation='softmax')])
model.summary()

Note that the input layer is the input itself, so we only need to indicate its shape through the `input_shape` parameter of the first hidden layer in our model. `input_shape=(4,)` describes a single sample with four features; the batch dimension is not part of it. Keras adds that dimension automatically and leaves its size unspecified (it appears as `None` in the model summary), so elements can be provided to the model in batches of any size. This is a useful feature because, as we know, the training algorithm works in batches.
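
The parameter counts reported by `model.summary()` can be checked by hand. In a fully connected layer, every unit has one weight per input plus one bias. The sketch below applies that rule to the layer sizes of the model above; `dense_params` is a hypothetical helper written for this example.

```python
# Layer sizes of the model above: 4 inputs -> 32 -> 16 -> 3 outputs.
layer_sizes = [4, 32, 16, 3]

def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Pair each layer size with the next one to get (inputs, units) per Dense layer.
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [160, 528, 51]
print(total)      # 739
```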

The model will provide a prediction, as we know. We would like to measure the quality of that prediction with a loss function. In this case, as we are working with categories, we will use the `categorical_crossentropy` function, which measures the difference between two probability distributions over the same set of events (here, the one-hot label and the predicted class probabilities). We also need an algorithm for training the network. In this example we will use Adam, a widely used variant of stochastic gradient descent. Finally, we are also interested in tracking some metrics during the training phase. In this example, we want to know what fraction of the flowers has been classified correctly (the accuracy). To pass all these arguments to our model, we will make use of the `compile` method.



In [None]:
model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['acc'])
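
To build some intuition for the loss, here is a simplified plain-Python sketch of categorical cross-entropy for a single sample. It is an assumption-laden illustration, not the Keras implementation (which also handles batches and numerical edge cases); the function below and the two example predictions are made up for this purpose.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Only the component where the target is non-zero contributes to the sum.
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0.0, 1.0, 0.0]          # the true class is the second one
confident = [0.05, 0.90, 0.05]    # high probability on the true class
unsure = [0.40, 0.30, 0.30]       # low probability on the true class

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(round(loss_confident, 3))  # 0.105
print(round(loss_unsure, 3))     # 1.204
```

The loss is small when the model assigns a high probability to the correct class and grows as that probability shrinks.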

Once all these parameters are defined, we can train our model. As our features and outputs are already tensors, we can use them as input directly. Training is triggered with the model's `fit` method. This method accepts different parameters. Besides the inputs and outputs of the model, we are going to set the batch size (`batch_size`) and the number of epochs (`epochs`).

In [None]:
model.fit(input_features,output,batch_size=64,epochs=25)
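
As a quick check of what these two values imply, the following sketch works out how many weight updates the call above performs, assuming the full Iris dataset of 150 rows is used for training.

```python
import math

n_samples = 150   # number of rows in the Iris dataset
batch_size = 64
epochs = 25

# The last batch of each epoch is smaller when the sizes do not divide evenly.
steps_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 3
print(last_batch_size)  # 22
print(total_updates)    # 75
```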


Great! We have now created a model that is able to predict, with around 75% accuracy, the type of a flower given four features describing its dimensions. But we have left many things on the table: how do we use the model, how well does the model generalise, how do we improve the model, etc. Do not worry, we are going to cover these in the next modules.