# SEE ALSO THE VERSION ON COLAB


# TensorFlow 💻💻

TensorFlow is an end-to-end open-source machine learning platform developed by Google. It makes it easy to preprocess data, build models and monitor performance in deep learning projects!

<img src="https://full-stack-assets.s3.eu-west-3.amazonaws.com/images/M08-DeepLearning/tf_logo.png" />

## What will you learn in this course? 🧐🧐

This lecture works like a demo that you can follow along: we start with the basics of TensorFlow, then learn how to load data and prepare it for training deep learning models. Here is an outline of the subjects we will cover:
* A general introduction to what TensorFlow is and why we use it
* How to use and navigate the documentation
* An introduction to TensorFlow and tensor operations
* Data processing with TensorFlow: how to take different kinds of data and turn them into tensors and tensor datasets

## General Introduction

TensorFlow is a Python library for machine learning, and more specifically for deep learning. Since the release of version 2, it runs eagerly by default: you see the result of any operation immediately, much like in plain Python, as opposed to Spark, for example, which uses lazy evaluation.
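
To make eager execution concrete, here is a minimal sketch, assuming TensorFlow 2.x is installed (the variable names are ours, chosen for illustration). The operation returns its value right away, without building a graph or opening a session first:

```python
import tensorflow as tf

# In TensorFlow 2.x eager execution is enabled by default
eager_enabled = tf.executing_eagerly()
print(eager_enabled)

# The result is computed immediately and can be read back as a plain number
total = tf.add(2, 3)
print(total.numpy())
```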

It is not the only tool out there for building deep learning applications. Another very popular one is PyTorch, which was developed by Facebook.

TensorFlow is open source, which means that the code for every functionality is publicly available on GitHub <a href="https://github.com/tensorflow">here</a>, and a very active community of users may suggest updates and even contribute to improving the tool! It also means you can create your own functionalities using the source code provided.

The main reasons why we chose to teach you deep learning through TensorFlow are its ease of use and the ever-increasing variety of resources that come with it. We will discover some of them together, like TensorBoard and TensorFlow Hub. TensorFlow is also very flexible and customizable, so during the module we will start with very simple functionalities and dig deeper and deeper, until you know how to build a wide range of applications with the library.

## Tensorflow documentation

The TensorFlow documentation contains four main sections:
* <a href="https://www.tensorflow.org/install">Installation guidelines</a>
* <a href="https://www.tensorflow.org/tutorials">Tutorials</a> with examples of how to use TensorFlow on practical use cases
* <a href="https://www.tensorflow.org/guide">Guide</a>, a general walkthrough of the various functionalities and additional libraries that come with TensorFlow
* <a href="https://www.tensorflow.org/api_docs/">TF documentation</a>, a classic API reference describing all the different modules of TensorFlow.

Let's take an example page from the documentation to see how to navigate, read and understand it:


## Practical introduction to TensorFlow and tensor operations

The main class of TensorFlow objects is... you guessed it: tensors! Tensors behave very much like objects you are already familiar with: NumPy arrays. This is great news, because most of what you used to do with NumPy arrays has a direct equivalent for tensors, and tensors can easily be converted into arrays and the other way around.

Let's practice a little together:

In [None]:
# TensorFlow is pre-installed in Colab, but if you wish to use your local machine,
# please refer to the installation guidelines

In [1]:
# first let's check that the installation went smoothly by printing the installed
# version
import tensorflow as tf
tf.__version__

'2.10.0'

### Tensor operations

In [2]:
# let's create our first tensor
# this is a constant tensor, meaning it is immutable, the values inside it
# may not be changed in place
tensor = tf.constant([[1,2],[3,4]])
print(tensor)

tf.Tensor(
[[1 2]
 [3 4]], shape=(2, 2), dtype=int32)


In [3]:
# Adding a scalar to a tensor
print(tensor + 5)
# When summing elements with different shapes, something called broadcasting
# happens: the smaller object is expanded so it can be added to the larger
# object
# It's equivalent to this operation
print(tensor + tf.constant([[5,5],[5,5]]))

tf.Tensor(
[[6 7]
 [8 9]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[6 7]
 [8 9]], shape=(2, 2), dtype=int32)


In [4]:
# Adding tensors of the same shape
tensor2 = tf.constant([[5,6],[7,8]])
print(tensor + tensor2)

tf.Tensor(
[[ 6  8]
 [10 12]], shape=(2, 2), dtype=int32)


In [5]:
# Adding tensors of different shapes
tensor3 = tf.constant([1,2])
print(tensor + tensor3)
# By broadcasting it's equivalent to
print(tensor + tf.constant([[1,2],[1,2]]))

tf.Tensor(
[[2 4]
 [4 6]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[2 4]
 [4 6]], shape=(2, 2), dtype=int32)


In [6]:
# Adding tensors of different shapes 2
tensor4 = tf.constant([[1],[2]])
print(tensor + tensor4)
# By broadcasting it's equivalent to
print(tensor + tf.constant([[1,1],[2,2]]))

tf.Tensor(
[[2 3]
 [5 6]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[2 3]
 [5 6]], shape=(2, 2), dtype=int32)
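
The expansion that broadcasting performs in the cells above can be made explicit with `tf.broadcast_to`. The sketch below is only an illustration (the names `row` and `column` are ours). It also shows what happens when two shapes cannot be broadcast together:

```python
import tensorflow as tf

tensor = tf.constant([[1, 2], [3, 4]])   # shape (2, 2)
row = tf.constant([1, 2])                # shape (2,)
column = tf.constant([[1], [2]])         # shape (2, 1)

# Explicitly expand the smaller tensors to shape (2, 2)
row_expanded = tf.broadcast_to(row, [2, 2])
column_expanded = tf.broadcast_to(column, [2, 2])
print(row_expanded.numpy())      # the row is repeated along the first axis
print(column_expanded.numpy())   # the column is repeated along the second axis

# Shapes (2, 2) and (3,) are not compatible, so the addition fails
broadcast_failed = False
try:
    tensor + tf.constant([1, 2, 3])
except (tf.errors.InvalidArgumentError, ValueError):
    broadcast_failed = True
print("incompatible shapes:", broadcast_failed)
```

Two dimensions are compatible when they are equal or when one of them is 1, which is why shapes `(2,)` and `(2, 1)` both combine with `(2, 2)` while `(3,)` does not.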


In [7]:
# Multiplication by a scalar
print(tensor * 4)

tf.Tensor(
[[ 4  8]
 [12 16]], shape=(2, 2), dtype=int32)


In [8]:
# Pointwise multiplication by a tensor of same shape
print(tensor * tensor2)

tf.Tensor(
[[ 5 12]
 [21 32]], shape=(2, 2), dtype=int32)


In [9]:
# Pointwise multiplication of tensors of different shapes
print(tensor * tensor3)
print(tensor * tensor4)

tf.Tensor(
[[1 4]
 [3 8]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[1 2]
 [6 8]], shape=(2, 2), dtype=int32)


In [10]:
# Matrix multiplication of tensors
print(tf.matmul(tensor, tensor2))
print(tf.matmul(tensor, tensor4))

tf.Tensor(
[[19 22]
 [43 50]], shape=(2, 2), dtype=int32)
tf.Tensor(
[[ 5]
 [11]], shape=(2, 1), dtype=int32)
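
Unlike pointwise multiplication, matrix multiplication does not broadcast the last two dimensions: the number of columns of the first tensor must match the number of rows of the second. A small sketch, with names chosen for illustration, using the `@` operator as a shorthand for `tf.matmul`:

```python
import tensorflow as tf

a = tf.constant([[1, 2], [3, 4]])   # shape (2, 2)
b = tf.constant([[1], [2]])         # shape (2, 1)

# (2, 2) @ (2, 1) -> (2, 1); the @ operator calls tf.matmul
product = a @ b
print(product.numpy())

# b @ a would fail: (2, 1) @ (2, 2) has mismatched inner dimensions (1 vs 2).
# Transposing b first gives (1, 2) @ (2, 2) -> (1, 2)
transposed_product = tf.matmul(b, a, transpose_a=True)
print(transposed_product.numpy())
```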


### Variable tensors
Variable tensors, unlike constant tensors, are mutable, which means new values may be assigned to them in place. This will be useful when working with models that have trainable parameters.

In [11]:
variable_tensor = tf.Variable([[1,2],[3,4]])
variable_tensor.assign_add([[1,1],[1,1]]) # adding one to all components
print(variable_tensor)
variable_tensor.assign_sub([[2,2],[2,2]]) # subtracting two from all components
print(variable_tensor)
variable_tensor.assign([[1,2],[3,4]]) # assigning a new value altogether
print(variable_tensor)

<tf.Variable 'Variable:0' shape=(2, 2) dtype=int32, numpy=
array([[2, 3],
       [4, 5]])>
<tf.Variable 'Variable:0' shape=(2, 2) dtype=int32, numpy=
array([[0, 1],
       [2, 3]])>
<tf.Variable 'Variable:0' shape=(2, 2) dtype=int32, numpy=
array([[1, 2],
       [3, 4]])>
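
To hint at why in-place updates matter for trainable parameters, here is a hedged sketch of a single gradient descent step on a toy loss. The loss `(weight - 1)^2`, the learning rate `0.1` and the variable names are all assumptions made for this illustration, not something taken from a real model:

```python
import tensorflow as tf

# A single trainable parameter, stored in a mutable variable
weight = tf.Variable(3.0)

# Record the computation so TensorFlow can differentiate it
with tf.GradientTape() as tape:
    loss = (weight - 1.0) ** 2   # toy loss, minimal at weight = 1

# d/dw (w - 1)^2 = 2 * (w - 1), which is 4 at w = 3
gradient = tape.gradient(loss, weight)

# Update the variable in place: 3.0 - 0.1 * 4.0, approximately 2.6
weight.assign_sub(0.1 * gradient)
print(weight.numpy())
```

A constant tensor could not be updated this way, which is why model parameters are stored in `tf.Variable` objects.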


### Useful tensor attributes and methods
Here we present some common attributes and methods of tensors (we will discover many more later, but these are fundamental).

In [12]:
tensor.numpy() # converts the tensor to a numpy array

array([[1, 2],
       [3, 4]])
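
The conversion also works in the other direction. The following sketch (variable names are ours) turns a NumPy array into a tensor and back, and shows that NumPy functions accept tensors while TensorFlow functions accept arrays:

```python
import numpy as np
import tensorflow as tf

array = np.array([[1, 2], [3, 4]])

# NumPy array -> tensor
tensor_from_array = tf.convert_to_tensor(array)

# tensor -> NumPy array
array_again = tensor_from_array.numpy()

# NumPy functions convert tensors to arrays automatically
plus_one = np.add(tensor_from_array, 1)

# TensorFlow functions convert arrays to tensors automatically
doubled = tf.multiply(array, 2)

print(type(array_again), type(plus_one), type(doubled))
```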

In [13]:
tensor.shape # gives the shape of the tensor

TensorShape([2, 2])

In [14]:
tf.reshape(tensor, [-1,1]) # reshapes the tensor

<tf.Tensor: shape=(4, 1), dtype=int32, numpy=
array([[1],
       [2],
       [3],
       [4]])>
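
Another attribute worth knowing early is `dtype`. As a small illustrative sketch (names are ours): TensorFlow does not silently mix integer and float tensors, so `tf.cast` is the usual way to change the type before an operation such as a division:

```python
import tensorflow as tf

tensor = tf.constant([[1, 2], [3, 4]])
print(tensor.dtype)   # Python integers produce an int32 tensor

# Convert to float32 before doing float arithmetic
float_tensor = tf.cast(tensor, tf.float32)
halves = float_tensor / 2
print(halves.numpy())

# -1 lets TensorFlow infer the size: here a flat tensor of 4 elements
flat = tf.reshape(tensor, [-1])
print(flat.shape)
```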

## Data processing with tensorflow

In this part we will see an example of processing tabular data with TensorFlow.

In [16]:
from sklearn.datasets import load_iris 
iris = load_iris() # loading the iris dataset
data = iris.data # storing the data in a separate object
target = iris.target # storing the target in a separate object

In [17]:
# To train deep learning models, we will use mini-batch gradient descent
# Therefore we are going to form batched datasets with TensorFlow

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(data,target)

# to form a tensor dataset we will use a function called from_tensor_slices
# that converts tuples of arrays into tensor datasets
train = tf.data.Dataset.from_tensor_slices((X_train,y_train))
test = tf.data.Dataset.from_tensor_slices((X_test,y_test))

# to extract a tensor from these objects we can use two different techniques:
x, y = next(iter(train)) # iter turns train into an iterator and next picks the next element of train
print('x:',x)
print('y:',y)

for x, y in train.take(1): # take(n) gives you the first n elements of the dataset
  print('x:',x)
  print('y:',y)

x: tf.Tensor([6.  2.2 4.  1. ], shape=(4,), dtype=float64)
y: tf.Tensor(1, shape=(), dtype=int32)
x: tf.Tensor([6.  2.2 4.  1. ], shape=(4,), dtype=float64)
y: tf.Tensor(1, shape=(), dtype=int32)


In [18]:
# let's see the type of object obtained for train
train
# it's a TensorSliceDataset that contains tuples of tensors of respective shapes
# (4,) (meaning 4 columns) and () (meaning it's a scalar)

<TensorSliceDataset element_spec=(TensorSpec(shape=(4,), dtype=tf.float64, name=None), TensorSpec(shape=(), dtype=tf.int32, name=None))>

In [19]:
# Before creating our batches we need to add a property to our tensor dataset:
# the ability to shuffle the observations every time we use this object
# the "buffer_size" argument of shuffle sets how many elements are held in a buffer
# from which the next element is drawn at random; a buffer at least as large as
# the dataset gives a full shuffle, and a larger value changes nothing more (it does not oversample)
train_shuffle = train.shuffle(buffer_size=len(X_train))
test_shuffle = test.shuffle(buffer_size=len(X_test))
# the shuffle method will give this property to the tensor dataset
for x, y in train_shuffle.take(1): 
  print('x:',x)
  print('y:',y)
for x, y in train_shuffle.take(1): 
  print('x:',x)
  print('y:',y)
# and now every time I use .take I most likely get a different tensor,
# because the dataset is reshuffled each time we iterate over it
# same thing goes for next(iter())
x, y = next(iter(train_shuffle)) 
print('x:',x)
print('y:',y)
x, y = next(iter(train_shuffle)) 
print('x:',x)
print('y:',y)

x: tf.Tensor([4.8 3.4 1.6 0.2], shape=(4,), dtype=float64)
y: tf.Tensor(0, shape=(), dtype=int32)
x: tf.Tensor([6.3 2.7 4.9 1.8], shape=(4,), dtype=float64)
y: tf.Tensor(2, shape=(), dtype=int32)
x: tf.Tensor([6.4 2.7 5.3 1.9], shape=(4,), dtype=float64)
y: tf.Tensor(2, shape=(), dtype=int32)
x: tf.Tensor([5.4 3.4 1.5 0.4], shape=(4,), dtype=float64)
y: tf.Tensor(0, shape=(), dtype=int32)
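
To see what `buffer_size` does in isolation, here is a small sketch on a toy dataset of the integers 0 to 9 (this dataset and the variable names are assumptions for illustration, not part of the iris example). A buffer covering the whole dataset shuffles everything, while a buffer of size 1 leaves the order unchanged:

```python
import tensorflow as tf

numbers = tf.data.Dataset.range(10)

# Buffer as large as the dataset: every element can land in any position
full_shuffle = numbers.shuffle(buffer_size=10)
first_pass = [int(n) for n in full_shuffle.as_numpy_iterator()]
second_pass = [int(n) for n in full_shuffle.as_numpy_iterator()]
print(first_pass)    # a random permutation of 0..9
print(second_pass)   # usually a different one: the data is reshuffled per pass

# Buffer of size 1: only one candidate at each step, so no shuffling at all
no_shuffle = [int(n) for n in numbers.shuffle(buffer_size=1).as_numpy_iterator()]
print(no_shuffle)
```

In both cases each element appears exactly once per pass, so shuffling never duplicates or drops observations.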


In [20]:
# Now we are ready to form our batches, let's use the .batch method
train_batch = train_shuffle.batch(batch_size=8)
test_batch = test_shuffle.batch(batch_size=8)

# When extracting data from these objects we now get batches!
for x, y in train_batch.take(1): 
  print('x:',x)
  print('y:',y)
# This gives us a batch of 8 observations from the training data of 
# shape (8,4) (batch_size, ncol) and (8,) for the target associated with each
# observation in the batch

x: tf.Tensor(
[[7.6 3.  6.6 2.1]
 [6.3 2.5 4.9 1.5]
 [5.1 2.5 3.  1.1]
 [5.4 3.9 1.3 0.4]
 [6.1 2.8 4.  1.3]
 [5.  3.5 1.6 0.6]
 [5.2 3.5 1.5 0.2]
 [7.7 2.8 6.7 2. ]], shape=(8, 4), dtype=float64)
y: tf.Tensor([2 1 1 0 1 0 0 2], shape=(8,), dtype=int32)
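
When the number of observations is not a multiple of the batch size, the last batch of a pass is smaller than the others. The sketch below uses synthetic data as a stand-in for the training split (100 random rows with 4 features, an assumption made so the example is self-contained) and shows the optional `drop_remainder` argument of `batch`:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for a training set: 100 observations, 4 features, 3 classes
features = np.random.rand(100, 4)
labels = np.random.randint(0, 3, size=100)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# 100 = 12 * 8 + 4, so one pass yields 12 full batches and a last batch of 4
batches = dataset.shuffle(buffer_size=100).batch(8)
batch_sizes = [int(x.shape[0]) for x, y in batches]
print(len(batch_sizes), batch_sizes[-1])

# drop_remainder=True discards the incomplete last batch
full_batches = dataset.shuffle(buffer_size=100).batch(8, drop_remainder=True)
full_batch_sizes = [int(x.shape[0]) for x, y in full_batches]
print(len(full_batch_sizes), set(full_batch_sizes))
```

Shuffling before batching, as done here and in the cell above, means each batch contains a fresh random selection of observations on every pass.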


You now know a little bit about TensorFlow and how to process data. Now go and practice on your own with the exercises!

## Resources 📚📚

* <a href="https://www.tensorflow.org/tutorials/load_data/csv">A TensorFlow tutorial that goes a little further than what we have seen here</a>