Use pip to install the tools needed to download the dataset

In [2]:
!pip install opendatasets
!pip install pandas

Collecting opendatasets
  Downloading opendatasets-0.1.22-py3-none-any.whl (15 kB)
Installing collected packages: opendatasets
Successfully installed opendatasets-0.1.22


Import the libraries needed to download the dataset, then download it

* Kaggle Username: Veronica Mordvinova2
* Kaggle Key: will be given during the lesson

In [4]:
import opendatasets as od
import pandas

od.download(
	"https://www.kaggle.com/datasets/asdasdasasdas/garbage-classification")

Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: Veronica Mordvinova2
Your Kaggle Key: ··········
Dataset URL: https://www.kaggle.com/datasets/asdasdasasdas/garbage-classification
Downloading garbage-classification.zip to ./garbage-classification


100%|██████████| 82.0M/82.0M [00:02<00:00, 32.4MB/s]





#**Importing and Formatting the Data**

Importing the rest of the libraries needed to train the model

In [5]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
import os
import PIL
import pathlib

Load the images from the dataset and split them into "training" and "validation" data

Training data - labeled images that we give the computer so it can learn to classify images correctly

Validation data - labeled images held back from training, used to check how accurate the computer's guesses are on images it has not learned from

In [6]:
training_data = tf.keras.utils.image_dataset_from_directory(
  "garbage-classification/Garbage classification/Garbage classification",
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(256, 256),
  batch_size=5)

validation_data = tf.keras.utils.image_dataset_from_directory(
  "garbage-classification/Garbage classification/Garbage classification",
  validation_split=0.2,
  subset="validation",
  seed=123,
  image_size=(256, 256),
  batch_size=5)

class_names = training_data.class_names
print(class_names)

Found 2527 files belonging to 6 classes.
Using 2022 files for training.
Found 2527 files belonging to 6 classes.
Using 505 files for validation.
['cardboard', 'glass', 'metal', 'paper', 'plastic', 'trash']
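
The file counts printed above follow from validation_split=0.2. As a rough sketch, assuming the validation subset is sized by rounding the fraction down to a whole number of files (an assumption about the library's internals, although it reproduces the printed numbers):

```python
# Reproduce the split sizes reported above for validation_split=0.2.
total_files = 2527          # from "Found 2527 files belonging to 6 classes."
validation_split = 0.2

# Assumption: the validation count is the fraction rounded down.
validation_files = int(total_files * validation_split)
training_files = total_files - validation_files

print(training_files, validation_files)
```

Because both calls use the same seed (123), the two subsets come from the same shuffled file list and should not overlap.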


#**Building the Model**


Set up buffered prefetching so that data can be yielded from disk without I/O becoming a bottleneck

Dataset.cache - keeps the images in memory after they are first loaded from disk, so they don't have to be read again on each later pass over the data

Dataset.prefetch - overlaps data preparation with model execution, so the next batch is being prepared while the current one is processed (you'll see the training step further down on the page)

In [7]:
AUTOTUNE = tf.data.AUTOTUNE

training_data = training_data.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
validation_data = validation_data.cache().prefetch(buffer_size=AUTOTUNE)
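
To get a feel for what prefetching does, here is a minimal pure-Python sketch. It is not how tf.data is implemented: a background thread simply loads items into a small buffer while the main loop consumes them. The prefetch function and its buffer_size of 2 are made up for illustration.

```python
import queue
import threading

def prefetch(iterable, buffer_size=2):
    """Yield items from `iterable`, loading up to `buffer_size` ahead in a background thread."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # marker that tells the consumer there is nothing left

    def producer():
        for item in iterable:
            buffer.put(item)   # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()    # blocks until the producer has an item ready
        if item is done:
            return
        yield item

# Stand-in for batches of images: the numbers 0 to 4.
batches = list(prefetch(range(5)))
print(batches)
```

The consumer still sees the items in their original order. The only difference is that the next item is usually already waiting when it is asked for.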

Each pixel's colour channels are stored as values between 0 and 255, and we want our values to be in the 0-1 range, so we will divide the numbers by 255 in order to **normalize** them.

In [8]:
normalization_layer = tf.keras.layers.Rescaling(1./255)

Here we apply the **"normalization"** - meaning that every pixel value in each image is divided by 255, while the label attached to each image is left unchanged, so the computer still knows what each image shows.

In [9]:
normalized_ds = training_data.map(lambda x, y: (normalization_layer(x), y))
image_batch, labels_batch = next(iter(normalized_ds))
first_image = image_batch[0]
# Notice the pixel values are now in `[0,1]`.
print(np.min(first_image), np.max(first_image))

0.21617648 1.0
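
The same division can be tried by hand on a single pixel. This is only an illustrative sketch, and the three channel values below are made up:

```python
# Hypothetical red, green, and blue values for one pixel, in the 0-255 range.
pixel = [55, 128, 255]

# Dividing by 255 maps 0 to 0.0 and 255 to 1.0.
normalized = [value / 255 for value in pixel]

print(normalized)
```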


Here we are building what you would see in a machine learning diagram - a lot of different layers that connect together.

These layers filter information and allow the machine to learn!

In [11]:
num_classes = len(class_names)

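model = tf.keras.Sequential([
  # Declare the input shape once, on the first entry of the model
  tf.keras.Input(shape=(256, 256, 3)),
  # Scale pixel values from 0-255 down to 0-1
  tf.keras.layers.Rescaling(1./255),
  # Random flips, rotations, and zooms add variety to the training images
  tf.keras.layers.RandomFlip("horizontal"),
  tf.keras.layers.RandomRotation(0.1),
  tf.keras.layers.RandomZoom(0.1),
  # Three rounds of "find patterns" (Conv2D) and "shrink the image" (MaxPooling2D)
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  tf.keras.layers.Conv2D(32, 3, activation='relu'),
  tf.keras.layers.MaxPooling2D(),
  # Flatten the grid into one long list of numbers, then combine them
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(128, activation='relu'),
  # One output value (logit) per class
  tf.keras.layers.Dense(num_classes)
])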
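
To see how the layers shrink a 256 x 256 image, the sizes can be worked out by hand. This sketch assumes the layers keep their default settings (Conv2D with "valid" padding and stride 1, MaxPooling2D with a 2 x 2 window); the helper names conv_out and pool_out are made up for illustration.

```python
def conv_out(size, kernel=3):
    # A 3x3 convolution with "valid" padding trims kernel - 1 pixels.
    return size - kernel + 1

def pool_out(size, pool=2):
    # A 2x2 max-pool halves the size, rounding down.
    return size // pool

size = 256
for _ in range(3):                      # three Conv2D + MaxPooling2D pairs
    size = pool_out(conv_out(size))

flattened = size * size * 32            # 32 filters in the last Conv2D
dense_params = flattened * 128 + 128    # weights plus biases of Dense(128)

print(size, flattened, dense_params)
```

Under these assumptions, most of the numbers the model has to learn sit in the first Dense layer, right after Flatten.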

Now we compile our model. In Keras, compiling means configuring the model for training: we choose the optimizer, the loss function, and the metrics (such as accuracy) that are reported at each step while we train the model.

In [12]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
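
The loss above is created with from_logits=True because the last Dense layer outputs raw scores (logits) rather than probabilities. Here is a minimal pure-Python sketch of what that loss computes for a single image. It is an illustration of the formula, not the Keras implementation, and the function name and example logits are made up.

```python
import math

def crossentropy_from_logits(logits, label):
    # loss = log(sum(exp(logits))) - logits[label],
    # which is the same as -log(softmax(logits)[label]).
    highest = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = highest + math.log(sum(math.exp(z - highest) for z in logits))
    return log_sum_exp - logits[label]

# A model with no preference among the 6 classes.
loss_uniform = crossentropy_from_logits([0.0] * 6, label=2)

# A model that strongly favours the correct class (index 2).
loss_confident = crossentropy_from_logits([0.0, 0.0, 8.0, 0.0, 0.0, 0.0], label=2)

print(loss_uniform, loss_confident)
```

A model that guesses evenly among six classes has a loss of ln(6), roughly 1.79, so a training loss well below that suggests the model has learned something.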

#**Training the Model**

Now we are ready to train the model! We use our training data to train the model and the validation data to test how accurate the machine's guesses are.

An epoch is one complete pass through the training data. The "epochs" value sets how many passes to make; more passes usually make the model more accurate, up to a point.

In [18]:
history = model.fit(
  training_data,
  validation_data=validation_data,
  epochs=10
)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
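
Each epoch is made up of many small steps, one per batch. As a rough sketch, using the file counts and batch_size=5 from earlier and assuming the last, smaller batch is kept rather than dropped:

```python
import math

training_files = 2022      # "Using 2022 files for training."
validation_files = 505     # "Using 505 files for validation."
batch_size = 5
epochs = 10

# Round up, because a final batch with fewer than 5 images still counts as a step.
steps_per_epoch = math.ceil(training_files / batch_size)
validation_steps = math.ceil(validation_files / batch_size)
total_training_steps = steps_per_epoch * epochs

print(steps_per_epoch, validation_steps, total_training_steps)
```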


#**Testing**

Are you curious how accurate our model ended up being?

You can test our machine learning model by putting a new image link in the "test_url" variable and seeing whether it predicts the material correctly!

Follow up question: why might it not have predicted the material accurately?

In [20]:
test_url = "https://images-na.ssl-images-amazon.com/images/I/81VQ-mOl7CL.jpg"
test_path = tf.keras.utils.get_file(origin=test_url)

img = tf.keras.utils.load_img(
    test_path, target_size=(256, 256)
)
img_array = tf.keras.utils.img_to_array(img)
img_array = tf.expand_dims(img_array, 0) # Create a batch

predictions = model.predict(img_array)
score = tf.nn.softmax(predictions[0])

print(
    "This image most likely belongs to {} with a {:.2f} percent confidence."
    .format(class_names[np.argmax(score)], 100 * np.max(score))
)

This image most likely belongs to cardboard with a 99.05 percent confidence.
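
The last two steps of the cell above (softmax, then picking the highest score) can be sketched in plain Python. The logits below are made up for illustration and are not real output from the model; the class names are the ones printed earlier.

```python
import math

class_names = ['cardboard', 'glass', 'metal', 'paper', 'plastic', 'trash']

# Hypothetical raw scores (logits), one per class.
logits = [6.0, 0.5, 1.0, 1.2, 0.3, 0.1]

def softmax(values):
    # Turn raw scores into positive numbers that add up to 1.
    highest = max(values)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

score = softmax(logits)
best = score.index(max(score))   # same idea as np.argmax(score)

message = "This image most likely belongs to {} with a {:.2f} percent confidence.".format(
    class_names[best], 100 * score[best])
print(message)
```

Softmax always spreads 100 percent across the six known classes, so the model will name one of them even when the picture shows something that fits none of them.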
