<a href="https://colab.research.google.com/github/Kerriea-star/TensorFlow-Customization/blob/main/Custom_Training_Walkthrough.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This project shows how to train a machine learning model with a custom training loop to categorize penguins by species. In this notebook, you use TensorFlow to accomplish the following:

1.   Import a dataset
2.   Build a simple linear model
3.   Train the model
5.   Evaluate the model
6.   Use the trained model to make predictions



## TensorFlow programming

This tutorial demonstrates the following TensorFlow programming tasks:

*   Importing data with the TensorFlow Datasets API
*   Building models and layers with the Keras API



## Penguin classification problem

Imagine you are an ornithologist seeking an automated way to categorize each penguin you find. Machine learning provides many algorithms to classify penguins statistically. For instance, a sophisticated machine learning program could classify penguins based on photographs. The model you build in this project is a little simpler. It classifies penguins based on their body weight, flipper length, and beaks, specifically the length and width measurements of their culmen.

There are 18 species of penguins, but in this tutorial you will only attempt to classify the following three:

*   Chinstrap penguins
*   Gentoo penguins
*   Adélie penguins


Fortunately, a research team has already created and shared a dataset of 334 penguins with body weight, flipper length, beak measurements, and other data. This dataset is also conveniently available as the penguins TensorFlow Dataset.


### Setup

Install the `tfds-nightly` package for the penguins dataset. `The tfds-nightly` package is the nightly released version of the TensorFlow Datasets (TFDS).

In [1]:
!pip install -q tfds-nightly

Import TensorFlow and the other required Python modules

In [2]:
import os
import tensorflow as tf
import tensorflow_datasets as tfds
import matplotlib.pyplot as plt

print("TensorFlow version: {}".format(tf.__version__))
print("TensorFlow Datasets version: ", tfds.__version__)

TensorFlow version: 2.13.0
TensorFlow Datasets version:  4.9.3+nightly


#### Import the dataset

The default penguins/processed TensorFlow Dataset is already cleaned, normalized, and ready for building a model. Before you download the processed data, preview a simplified version to get familiar with the original penguin survey data

#### Preview the data

Download the simplified version of the penguins dataset (penguins/simple) using the TensorFlow Datasets tfds.load method. There are 344 data records in this dataset. Extract the first five records into a DataFrame object to inspect a sample of the values in this dataset:

In [3]:
ds_preview, info = tfds.load('penguins/simple', split='train', with_info=True)
df = tfds.as_dataframe(ds_preview.take(5), info)
print(df)
print(info.features)

Downloading and preparing dataset 13.20 KiB (download: 13.20 KiB, generated: 56.10 KiB, total: 69.30 KiB) to /root/tensorflow_datasets/penguins/simple/1.0.0...


Dl Completed...: 0 url [00:00, ? url/s]

Dl Size...: 0 MiB [00:00, ? MiB/s]

Generating splits...:   0%|          | 0/1 [00:00<?, ? splits/s]

Generating train examples...:   0%|          | 0/344 [00:00<?, ? examples/s]

Shuffling /root/tensorflow_datasets/penguins/simple/1.0.0.incompleteBCS3HM/penguins-train.tfrecord*...:   0%| …

Dataset penguins downloaded and prepared to /root/tensorflow_datasets/penguins/simple/1.0.0. Subsequent calls will reuse this data.
   body_mass_g  culmen_depth_mm  culmen_length_mm  flipper_length_mm  island  \
0       4200.0             13.9         45.500000              210.0       0   
1       4650.0             13.7         40.900002              214.0       0   
2       5300.0             14.2         51.299999              218.0       0   
3       5650.0             15.0         47.799999              215.0       0   
4       5050.0             15.8         46.299999              215.0       0   

   sex  species  
0    0        2  
1    0        2  
2    1        2  
3    1        2  
4    1        2  
FeaturesDict({
    'body_mass_g': float32,
    'culmen_depth_mm': float32,
    'culmen_length_mm': float32,
    'flipper_length_mm': float32,
    'island': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'sex': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'species

The numbered rows are data records, one example per line, where:

*   The first six fields are features: these are the characteristics of an example. Here, the fields hold numbers representing penguin measurements.
*   The last column is the label: this is the value you want to predict. For this dataset, it's an integer value of 0, 1, or 2 that corresponds to a penguin species name.

In the dataset, the label for the penguin species is represented as a number to make it easier to work with in the model you are building. These numbers correspond to the following penguin species:

*   `0`: Adélie penguin
*   `1`: Chinstrap penguin
*   `2`: Gentoo penguin

Create a list containing the penguin species names in this order. You will use this list to interpret the output of the classification model:


In [4]:
class_names = ["Adélie", "Chinstrap", "Gentoo"]

### Download the preprocessed dataset

Now, download the preprocessed penguins dataset (penguins/processed) with the tfds.load method, which returns a list of tf.data.Dataset objects. Note that the penguins/processed dataset doesn't come with its own test set, so use an 80:20 split to slice the full dataset into the training and test sets. You will use the test dataset later to verify your model.

In [5]:
ds_split, info = tfds.load("penguins/processed", split=["train[:20%]", "train[:20%]"], as_supervised=True, with_info=True)

ds_test = ds_split[0]
ds_train = ds_split[1]
assert isinstance(ds_test, tf.data.Dataset)

print(info.features)
df_test = tfds.as_dataframe(ds_test.take(5), info)
print("Test dataset sample: ")
print(df_test)

df_train = tfds.as_dataframe(ds_train.take(5), info)
print("Train dataset sample: ")
print(df_train)

ds_train_batch = ds_train.batch(32)

Downloading and preparing dataset 25.05 KiB (download: 25.05 KiB, generated: 17.61 KiB, total: 42.66 KiB) to /root/tensorflow_datasets/penguins/processed/1.0.0...


Dl Completed...: 0 url [00:00, ? url/s]

Dl Size...: 0 MiB [00:00, ? MiB/s]

Generating splits...:   0%|          | 0/1 [00:00<?, ? splits/s]

Generating train examples...:   0%|          | 0/334 [00:00<?, ? examples/s]

Shuffling /root/tensorflow_datasets/penguins/processed/1.0.0.incompleteQK90NC/penguins-train.tfrecord*...:   0…

Dataset penguins downloaded and prepared to /root/tensorflow_datasets/penguins/processed/1.0.0. Subsequent calls will reuse this data.
FeaturesDict({
    'features': Tensor(shape=(4,), dtype=float32),
    'species': ClassLabel(shape=(), dtype=int64, num_classes=3),
})
Test dataset sample: 
                                         features  species
0  [0.6545454, 0.22619048, 0.89830506, 0.6388889]        2
1        [0.36, 0.04761905, 0.6440678, 0.4027778]        2
2       [0.68, 0.30952382, 0.91525424, 0.6944444]        2
3   [0.6181818, 0.20238096, 0.8135593, 0.6805556]        2
4  [0.5527273, 0.26190478, 0.84745765, 0.7083333]        2
Train dataset sample: 
                                         features  species
0  [0.6545454, 0.22619048, 0.89830506, 0.6388889]        2
1        [0.36, 0.04761905, 0.6440678, 0.4027778]        2
2       [0.68, 0.30952382, 0.91525424, 0.6944444]        2
3   [0.6181818, 0.20238096, 0.8135593, 0.6805556]        2
4  [0.5527273, 0.26190478, 0.84745765

Notice that this version of the dataset has been processed by reducing the data down to four normalized features and a species label. In this format, the data can be quickly used to train a model without further processing.

In [6]:
features, labels = next(iter(ds_train_batch))

print(features)
print(labels)

tf.Tensor(
[[0.6545454  0.22619048 0.89830506 0.6388889 ]
 [0.36       0.04761905 0.6440678  0.4027778 ]
 [0.68       0.30952382 0.91525424 0.6944444 ]
 [0.6181818  0.20238096 0.8135593  0.6805556 ]
 [0.5527273  0.26190478 0.84745765 0.7083333 ]
 [0.17818181 0.45238096 0.22033899 0.08333334]
 [0.3090909  0.48809522 0.2542373  0.21527778]
 [0.6872727  0.6785714  0.5254237  0.3888889 ]
 [0.33454546 0.95238096 0.3898305  0.4722222 ]
 [0.20727272 0.5952381  0.3559322  0.29166666]
 [0.32       0.6904762  0.20338982 0.33333334]
 [0.7490909  0.79761904 0.42372882 0.2847222 ]
 [0.6909091  0.3809524  0.8135593  0.9166667 ]
 [0.37818182 0.5        0.2542373  0.18055555]
 [0.38545454 0.07142857 0.6101695  0.3472222 ]
 [0.05090909 0.70238096 0.30508474 0.25      ]
 [0.46545455 0.02380952 0.69491524 0.6666667 ]
 [0.5672727  0.22619048 0.7457627  0.5694444 ]
 [0.21818182 0.6547619  0.30508474 0.2777778 ]
 [0.47636363 0.08333334 0.7288136  0.5694444 ]
 [0.17454545 0.6547619  0.2881356  0.22222222]
 [