# **Basic introduction to Logistic Regression in TensorFlow 2.0**

## **Learning Objectives**

1. Build a logistic regression model
2. Train the model on example data
3. Use the model to make predictions about unknown data
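
This problem has three classes rather than two, so the logistic regression needed here is presumably the multinomial (softmax) variant. As a reference sketch under that assumption, with a cross-entropy loss, the model maps the four measurements $\mathbf{x} \in \mathbb{R}^4$ to class probabilities as follows. Here $\mathbf{w}_k$ and $b_k$ are the weight vector and bias for class $k$, and $y_i$ is the integer label of example $i$:

$$
\hat{p}_k(\mathbf{x}) = \frac{\exp\left(\mathbf{w}_k^\top \mathbf{x} + b_k\right)}{\sum_{j=0}^{2} \exp\left(\mathbf{w}_j^\top \mathbf{x} + b_j\right)}, \qquad k \in \{0, 1, 2\}
$$

$$
\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \hat{p}_{y_i}(\mathbf{x}_i)
$$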

## **Introduction**

This notebook walks through a classification problem. The goal is to *categorise* Iris flowers by species. Along the way, TensorFlow is used for:
- getting familiar with the default eager execution environment
- importing data with the Datasets API
- building models and layers with the Keras API
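
Here is a minimal sketch of what the Keras API offers for this task. It assumes that multinomial logistic regression is implemented as a single `Dense` layer with a softmax activation. The constants `NUM_FEATURES` and `NUM_CLASSES`, the `sgd` optimizer, and the two input rows (copied from the training file inspected below) are illustrative choices, not necessarily the model this notebook goes on to build.

```python
import numpy as np
import tensorflow as tf

NUM_FEATURES = 4  # sepal length/width, petal length/width
NUM_CLASSES = 3   # setosa, versicolor, virginica

# Assumed model: one Dense layer computes x @ W + b, and the softmax
# activation turns the 3 scores into class probabilities.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Integer labels (0, 1, 2) pair with sparse categorical cross-entropy.
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Two measurement rows taken from iris_training.csv.
x = np.array([[6.4, 2.8, 5.6, 2.2],
              [4.9, 3.1, 1.5, 0.1]], dtype=np.float32)

# Calling the untrained model returns one probability row per example.
probs = model(x).numpy()
print(probs.shape)        # (2, 3)
print(probs.sum(axis=1))  # each row sums to approximately 1.0
```

With a single layer, the model has 4 × 3 weights plus 3 biases, or 15 trainable parameters.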

### **Configure imports**

Import TensorFlow and the other required Python modules. TensorFlow 2 uses eager execution by default. Operations are evaluated immediately and return concrete values, rather than being added to a computational graph that is executed later (lazy evaluation).

```python
import os
import matplotlib.pyplot as plt
import tensorflow as tf

print("TensorFlow version: {}".format(tf.__version__))
print("Eager execution mode: {}".format(tf.executing_eagerly()))
```

```
TensorFlow version: 2.4.1
Eager execution mode: True
```
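
The following sketch shows what eager evaluation means in practice. A matrix product is available as a NumPy value as soon as the line runs. For contrast, the example also uses `tf.function`, which traces a Python function into a graph that TensorFlow then executes as a whole. The tensor `a` and the function `square_sum` are illustrative names.

```python
import tensorflow as tf

# Eager mode (the TF 2.x default): the op runs immediately
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.matmul(a, a)
print(b.numpy())  # concrete values, no session or graph needed

# tf.function traces this Python function into a graph; it is still
# called like a normal function and returns an eager tensor
@tf.function
def square_sum(x):
    return tf.reduce_sum(x * x)

print(square_sum(a).numpy())  # 30.0
```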


## **The Iris classification problem**

Imagine you are a botanist seeking an automated way to categorise each Iris flower you find. Machine learning (ML) provides many algorithms to classify flowers statistically. For instance, a sophisticated ML program could classify flowers based on photographs. Our ambitions are more modest here: we are going to classify Iris flowers based on the length and width measurements of their sepals and petals.

The Iris genus comprises about 300 species, but our program will only classify the following three:

- Iris setosa
- Iris virginica
- Iris versicolor

Fortunately, someone has already created a data set of Iris flowers with their sepal and petal measurements. The training file used here contains 120 of them. This is a classic data set that is popular for basic ML classification problems.

### **Import and parse the training data set**
Download the data set file and convert it into a structure that can be fed into TensorFlow.

#### **Download the dataset**

```python
train_url = "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv"

# tf.keras.utils.get_file downloads the file at `origin` (unless a cached copy
# already exists) and returns the local path of the saved file, named `fname`
train_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_url),
                                           origin=train_url)
print("Local copy of the data set file: {}".format(train_dataset_fp))
```

```
Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv
Local copy of the data set file: /home/antounes/.keras/datasets/iris_training.csv
```


#### **Inspect the data**
This data set, `iris_training.csv`, is a plain-text file that stores tabular data formatted as comma-separated values (CSV). Let's take a peek at its first five lines:

```python
# IPython shell escape: run `head` on the downloaded file
!head -n5 {train_dataset_fp}
```

```
120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
```


Note the following:

1. The first line is a header containing information about the data set:
   - there are 120 total examples, and each example has four *features* and one of three possible *label names* (`setosa`, `versicolor`, `virginica`).
2. Subsequent rows are data records, one example per line:
   - the first four fields are the *features*: here, floating-point numbers representing flower measurements;
   - the last column is the *label*: here, an integer value corresponding to a flower name.
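
The header-plus-records layout described above can be sketched in plain Python with the standard `csv` module. To keep the example self-contained, it parses the five lines printed above from a string. In the notebook, one would open `train_dataset_fp` instead. The variable names are illustrative.

```python
import csv
import io

# The first lines of iris_training.csv, as printed by `head` above
SAMPLE = """120,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
4.9,2.5,4.5,1.7,2
4.9,3.1,1.5,0.1,0
"""

reader = csv.reader(io.StringIO(SAMPLE))

# Header: number of examples, number of features, then the label names
header = next(reader)
num_examples, num_features = int(header[0]), int(header[1])
label_names = header[2:]

# Records: the first `num_features` fields are floats, the last one is the label
features, labels = [], []
for row in reader:
    features.append([float(v) for v in row[:num_features]])
    labels.append(int(row[num_features]))

print(num_examples, num_features, label_names)  # 120 4 ['setosa', 'versicolor', 'virginica']
print(labels)                                   # [2, 1, 2, 0]
```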

```python
column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

feature_names = column_names[:-1]
label_name = column_names[-1]

print("Features: {}".format(feature_names))
print("Label: {}".format(label_name))
```

```
Features: ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
Label: species
```
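
With `column_names` and `label_name` defined, the Datasets API can turn the CSV file into batches of `(features, label)` pairs. This sketch uses `tf.data.experimental.make_csv_dataset`. To stay self-contained, it writes the five lines shown above to a temporary file, which stands in for `train_dataset_fp`. The path `csv_path`, the batch size of 4, and `shuffle=False` are illustrative assumptions chosen so that the output is deterministic.

```python
import os
import tempfile

import tensorflow as tf

column_names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
feature_names = column_names[:-1]
label_name = column_names[-1]

# Hypothetical stand-in for train_dataset_fp: a temp file holding the
# first lines of iris_training.csv
sample = ("120,4,setosa,versicolor,virginica\n"
          "6.4,2.8,5.6,2.2,2\n"
          "5.0,2.3,3.3,1.0,1\n"
          "4.9,2.5,4.5,1.7,2\n"
          "4.9,3.1,1.5,0.1,0\n")
csv_path = os.path.join(tempfile.mkdtemp(), "iris_sample.csv")
with open(csv_path, "w") as f:
    f.write(sample)

# header=True (the default) skips the first line; our column_names are used
# instead, and each element is a (dict of feature columns, label) batch
dataset = tf.data.experimental.make_csv_dataset(
    csv_path,
    batch_size=4,
    column_names=column_names,
    label_name=label_name,
    num_epochs=1,
    shuffle=False,
)

features, labels = next(iter(dataset))

# Pack the four feature columns, in a fixed order, into a (batch, 4) tensor
feature_matrix = tf.stack([features[name] for name in feature_names], axis=1)
print(labels.numpy())        # [2 1 2 0]
print(feature_matrix.shape)  # (4, 4)
```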


Each label is associated with a string name (e.g. `setosa`), but ML algorithms typically work with *numeric values*. The label numbers therefore map to named representations as follows:

- `0`: Iris setosa
- `1`: Iris versicolor
- `2`: Iris virginica

```python
class_names = ["Iris setosa", "Iris versicolor", "Iris virginica"]
```
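
Because each label is an index into `class_names`, converting between numbers and names is plain list and dict indexing. The same indexing turns a predicted probability vector into a species name via its largest entry. In the sketch below, the labels come from the sample rows above. `hypothetical_probs` is made-up output used only to illustrate the lookup.

```python
class_names = ["Iris setosa", "Iris versicolor", "Iris virginica"]

# Integer labels from the last column of the sample rows shown earlier
labels = [2, 1, 2, 0]

# Number -> name: the label is simply an index into class_names
names = [class_names[label] for label in labels]
print(names)  # ['Iris virginica', 'Iris versicolor', 'Iris virginica', 'Iris setosa']

# Name -> number: the reverse mapping
name_to_label = {name: i for i, name in enumerate(class_names)}
print(name_to_label["Iris versicolor"])  # 1

# Hypothetical model output for one flower: pick the most probable class
hypothetical_probs = [0.1, 0.2, 0.7]
predicted_name = class_names[hypothetical_probs.index(max(hypothetical_probs))]
print(predicted_name)  # Iris virginica
```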
