# Project Flowers - Predicting Types of Flowers (Classification)


In this project you will build a flower classifier and use transfer learning to train it.  You will be using a real-world dataset that contains 3,670 images of flowers across 5 different flower varieties.

### In this project you will:
- Download and analyze the flower vision dataset
- Convert the images into PyTorch dataloaders for the training, validation and test sets
- Use a state-of-the-art vision model and apply transfer learning to build and train your model
- Experiment with different models to reach the target accuracy (90% or better) on the test set
- Analyze your model’s performance using accuracy, a confusion matrix and other techniques
- Test your model using your own images of flowers

### To get started:
- Open up a web browser (preferably Chrome)
- Copy the Project GitHub Link: https://github.com/LeakyAI/PyTorch-Overview
- Head over to Google Colab (https://colab.research.google.com)
- Load the notebook: Project Flowers - START HERE.ipynb
- Replace each [TBD] with your own code
- Run each cell after you complete it

### Hint
Don't forget to print out your PyTorch and Pandas Cheatsheet and keep it handy when tackling this project. You can find it on the right-hand side of the course home landing page in the Resource section.

### How to Submit your Project:
When you have filled out all the TBD sections and achieved 90% accuracy or better on the test set, you may submit your project for review by downloading the notebook and emailing it to the address listed on the project page inside your course.

Good luck!


# Download the Dataset

In [None]:
# Download the flower dataset
!wget https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
data_dir = "/content/flower_photos/" 

# Unzip all the images    
!tar zxvf flower_photos.tgz

# Download a state-of-the-art vision model (efficientnet)
!pip install efficientnet_pytorch

# Load the Libraries and Check for GPU Support

In [None]:
# Load the appropriate PyTorch libraries and visualization libraries
import [TBD]
import [TBD].nn as nn
import [TBD].optim as optim
import [TBD].nn.functional as F
from [TBD] import datasets, transforms, models
from torch.utils.data import Subset
import matplotlib.pyplot as plt
import numpy as np
import random
import time

In [None]:
# Seed every random number generator so the notebook's results can be reproduced
SEED = 4321
random.seed([TBD])
torch.manual_seed([TBD])
torch.cuda.manual_seed([TBD])
torch.backends.cudnn.deterministic = True
np.random.seed([TBD])
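Seeding only fixes randomness that happens after the seed call. A minimal sketch using Python's `random` module and NumPy (no GPU or PyTorch needed) illustrates the idea: re-seeding before the same draws reproduces them exactly. `seeded_draws` is a hypothetical helper for illustration; `torch.manual_seed` plays the same role for PyTorch's own generators.

```python
import random

import numpy as np

SEED = 4321  # same value as the notebook's SEED


def seeded_draws(seed):
    """Seed Python's and NumPy's RNGs, then draw from each of them."""
    random.seed(seed)
    np.random.seed(seed)
    indices = list(range(10))
    np.random.shuffle(indices)  # the same call the notebook uses to split the data
    coin = random.random()      # a draw from Python's own generator
    return indices, coin


# Re-seeding before each run reproduces exactly the same shuffle and draw
first = seeded_draws(SEED)
second = seeded_draws(SEED)
print(first == second)  # True
```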

In [None]:
# This project will require a GPU
# Check which GPU is available
use_cuda = torch.cuda.is_available()

# Set the device type to cuda or cpu depending on what is available
device = torch.device('cuda' if use_cuda else 'cpu')

# Print out the details of the GPU or CPU backend
print ("GPU is available!" if use_cuda else "No GPU :(")
print (torch.cuda.get_device_name(0) if use_cuda else None)
print (torch.__version__)

# Load our Data into Training, Validation and Test Sets
Next, we will load our dataset using torchvision's built-in ImageFolder dataset class.  We start by building our image transformations for training (with data augmentation) and for validation and testing (no data augmentation).

In [None]:
# Normalize our images with the ImageNet means and stdev
# The mean values are [0.485, 0.456, 0.406]
# The standard deviation values are [0.229, 0.224, 0.225]
normalize = transforms.Normalize(
    mean=[[TBD], [TBD], [TBD]], 
    std=[[TBD], [TBD], [TBD]])

# Undo our normalization for display purposes
inv_normalize = transforms.Normalize(
    mean=[-0.485/0.229, -0.456/0.224, -0.406/0.225],
    std=[1/0.229, 1/0.224, 1/0.225])

# Build our transform for training using data augmentation 
trans = transforms.Compose([transforms.RandomRotation(25),
                            transforms.RandomResizedCrop(224),
                            transforms.RandomHorizontalFlip(),
                            transforms.ToTensor(),
                            normalize])

# Build our transform for the validation and test datasets
# (no data augmentation)
transNoAugment = transforms.Compose([transforms.[TBD](224), 
                                     transforms.[TBD](224),
                                     transforms.[TBD](),
                                     [TBD]])
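The `inv_normalize` parameters above come from inverting the normalization formula. `transforms.Normalize` computes `(x - mean) / std` per channel. Solving for `x` gives `x = y * std + mean`, and that is exactly `Normalize(mean=-mean/std, std=1/std)`. The small NumPy check below runs that round trip on a single RGB pixel (the pixel values are arbitrary). The helper names are ours and are chosen so they do not clash with the notebook's `normalize` and `inv_normalize` transforms.

```python
import numpy as np

# ImageNet channel statistics used by the notebook's `normalize` transform
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])

# Parameters of the inverse transform: (y - (-mean/std)) / (1/std) = y*std + mean
INV_MEAN = -MEAN / STD
INV_STD = 1.0 / STD


def normalize_pixel(pixel):
    """Forward normalization, as transforms.Normalize applies it per channel."""
    return (pixel - MEAN) / STD


def inv_normalize_pixel(pixel):
    """The same formula, driven by the inverse mean/std parameters."""
    return (pixel - INV_MEAN) / INV_STD


rgb = np.array([0.2, 0.5, 0.9])  # an example RGB pixel with values in [0, 1]
restored = inv_normalize_pixel(normalize_pixel(rgb))
print(np.allclose(restored, rgb))  # True
```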

In [None]:
# Split our dataset into training, validation and test sets,
# using about 80% of the images for training.
# Create the datasets using ImageFolder, which uses the
# folder names to populate the labels
fullDataAug = datasets.ImageFolder(data_dir, transform=[TBD])
fullDataNoAug = datasets.ImageFolder(data_dir, transform=[TBD])

# Save the class labels for later
classes=fullDataAug.classes

# Create the index splits for training, validation and test
total = len(fullDataAug)
indices = list(range(total))

# Use 80% of the data for training, then split the rest 10% / 10% between validation and test
trainingPercent = .8
split1 = int(total*trainingPercent)
split2 = int(((total - split1)/2)+split1)
np.random.shuffle(indices)

# Build datasets by using Subset and pass the split indices
traindata = Subset(fullDataAug, indices[:split1])
valdata = Subset(fullDataNoAug, indices[split1:split2])
testdata = Subset(fullDataNoAug, indices[split2:])
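The sketch below reproduces the split arithmetic above as a standalone example. It assumes 3,670 images, which is the number of photos in the TensorFlow `flower_photos` archive; check `len(fullDataAug)` for your copy. `split_indices` is a hypothetical helper. It uses a local `np.random.RandomState` rather than the global seed, so it does not disturb the notebook's random state.

```python
import numpy as np


def split_indices(total, training_percent=0.8, seed=4321):
    """Shuffle range(total) and cut it into train / validation / test index lists.

    Same arithmetic as the notebook: the first 80% goes to training and the
    remainder is halved between validation and test.
    """
    rng = np.random.RandomState(seed)  # local RNG, so the global state is untouched
    indices = list(range(total))
    rng.shuffle(indices)
    split1 = int(total * training_percent)
    split2 = int(((total - split1) / 2) + split1)
    return indices[:split1], indices[split1:split2], indices[split2:]


# Assuming 3,670 images in total
train_idx, val_idx, test_idx = split_indices(3670)
print(len(train_idx), len(val_idx), len(test_idx))  # 2936 367 367
```

Because the three slices come from one shuffled list, no image can appear in more than one split.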

# Analyze our Data

In [None]:
# Print out the class-to-index mapping found by ImageFolder using
# the dataset's class_to_idx attribute
print(fullDataAug.class_to_idx)
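ImageFolder treats every sub-directory of `data_dir` as one class. It sorts the directory names alphabetically and numbers them from 0, and it ignores plain files in the root folder. The sketch below re-implements that rule with the standard library on a throwaway folder tree, so the resulting mapping can be checked without downloading anything. `find_classes` here is our own simplified version, not torchvision's code.

```python
import os
import tempfile


def find_classes(directory):
    """Map each sub-directory name to an integer label, in sorted order."""
    with os.scandir(directory) as entries:
        classes = sorted(entry.name for entry in entries if entry.is_dir())
    class_to_idx = {name: i for i, name in enumerate(classes)}
    return classes, class_to_idx


# Build a temporary folder tree shaped like flower_photos/ plus one stray file
with tempfile.TemporaryDirectory() as root:
    for name in ["tulips", "daisy", "sunflowers", "roses", "dandelion"]:
        os.makedirs(os.path.join(root, name))
    open(os.path.join(root, "LICENSE.txt"), "w").close()  # files are not classes
    toy_classes, toy_class_to_idx = find_classes(root)

print(toy_class_to_idx)
# {'daisy': 0, 'dandelion': 1, 'roses': 2, 'sunflowers': 3, 'tulips': 4}
```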

In [None]:
# Analyze the dataset by checking the balance of the labels
@torch.no_grad() 
def labelDistribution(dataset, classes, setName):
    num_classes = len(classes)
    labels = torch.zeros(num_classes, dtype=torch.long)
    for _, target in dataset:
        labels[target] += 1
    print ('------- {} ({}) ------- '.format(setName,labels.sum()))
    for idx, name in enumerate(classes):
        print ('{} {} ({:0.1f}%)'.format(labels[idx], name, labels[idx].float()/labels.sum()*100))
    print("\n")

# Check our Training Dataset
labelDistribution([TBD], classes, "Train")

# Check our Validation Dataset
labelDistribution([TBD], classes, "Validation")

# Check our Test Dataset
labelDistribution([TBD], classes, "Testing")
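`labelDistribution` loads every image (and, for the training set, augments it) just to read its label. ImageFolder already keeps all labels in its `targets` list, and `Subset` keeps its `indices`, so the same counts can be computed without opening any image files. The pure-Python sketch below uses hypothetical toy data. In the notebook you would pass `fullDataAug.targets` and, for example, `traindata.indices`.

```python
from collections import Counter


def label_distribution_from_targets(targets, indices, classes):
    """Return {class name: (count, percent)} for the samples selected by `indices`."""
    counts = Counter(targets[i] for i in indices)
    total = len(indices)
    return {name: (counts[idx], 100.0 * counts[idx] / total)
            for idx, name in enumerate(classes)}


# Hypothetical toy data: 10 labelled samples over 3 classes, subset of the first 5
toy_classes = ["daisy", "roses", "tulips"]
toy_targets = [0, 0, 1, 2, 2, 2, 1, 0, 2, 2]
dist = label_distribution_from_targets(toy_targets, [0, 1, 2, 3, 4], toy_classes)
for name, (count, pct) in dist.items():
    print(f"{count} {name} ({pct:0.1f}%)")
```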

# Compare Original vs. Augmented Images
Here we will take a closer look at how the training transform modifies an image.  You can rerun the cell to see a different random augmentation of the same image.  Note that both images were normalized to the ImageNet mean/stdev by their transforms. compareImg2Img reverses this with inv_normalize before plotting, so the colors should look close to the original photo.

In [None]:
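def compareImg2Img(img1,img2):
    fontsz = 16

    # Undo the ImageNet normalization so the images display with natural colors
    img1 = inv_normalize(img1)
    img2 = inv_normalize(img2)

    # Convert from PyTorch's (C, H, W) layout to matplotlib's (H, W, C) and
    # clip tiny rounding errors so every value stays in the valid [0, 1] range
    img1 = np.clip(np.transpose(img1.numpy(), (1, 2, 0)), 0, 1)
    img2 = np.clip(np.transpose(img2.numpy(), (1, 2, 0)), 0, 1)

    f, ax = plt.subplots(1, 2, figsize=(10, 10))

    ax[0].imshow(img1)
    ax[1].imshow(img2)
    ax[0].set_title('No Augmentation', fontsize=fontsz)
    ax[1].set_title('Augmented', fontsize=fontsz)

# Show the same training image without and with augmentation
compareImg2Img(fullDataNoAug[traindata.indices[10]][0],traindata[10][0])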

# Create the DataLoaders for Train, Validation and Test
Using the 3 datasets above, create a dataloader for each one, making sure to pass in num_workers and batch_size.  Shuffling the data is a good idea for the training dataloader.
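With `drop_last=True`, a DataLoader skips the final batch whenever the dataset size is not a multiple of `batch_size`. The arithmetic below assumes a 367-image split, which is the size of each 10% split for a 3,670-image dataset, and shows how many samples a full pass actually sees. In that case, a metric averaged over `len(loader.sampler)` would include 15 samples that were never scored. `batches_and_samples` is a hypothetical helper.

```python
import math


def batches_and_samples(num_samples, batch_size, drop_last):
    """Return (number of batches, number of samples actually seen) for one pass."""
    if drop_last:
        num_batches = num_samples // batch_size
        return num_batches, num_batches * batch_size
    return math.ceil(num_samples / batch_size), num_samples


# Assuming a 367-image split and the notebook's batch_size of 32
print(batches_and_samples(367, 32, drop_last=True))   # (11, 352)
print(batches_and_samples(367, 32, drop_last=False))  # (12, 367)
```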

In [None]:
num_workers = 2
batch_size = 32

trainLoader = torch.utils.data.DataLoader([TBD], batch_size=[TBD], 
                                          num_workers=num_workers, 
                                          drop_last=True, shuffle=[TBD])

# Keep every sample when scoring: don't drop the last partial batch
valLoader = torch.utils.data.DataLoader([TBD], batch_size=[TBD],
                                        num_workers=num_workers, 
                                        drop_last=False)

testLoader = torch.utils.data.DataLoader([TBD], batch_size=[TBD], 
                                         num_workers=num_workers, 
                                         drop_last=False)

In [None]:
# Check out the first batch of labels and ensure they are shuffled

# Grab the first batch of inputs (x) and outputs (y)
x,y = next(iter(trainLoader))

# Print out the labels and ensure they are randomized
print (y)

# Build the Scoring and Training Code

In [None]:
# Calculate validation loss and accuracy for a loader and model
# Return the average loss
@torch.no_grad() 
def scoreModel(loader, criterion, model, device, name):
    
    # Set the model to eval (not training)
    model.[TBD]()        
    lossTotal = 0.0
    numCorrect = 0
    numSamples = 0

    for x,y in loader:
        
        # Move the input x and label y to the appropriate device found in "Check for GPU Support" above
        x, y = x.to([TBD]), y.to([TBD])
        pred = model(x)
        loss = criterion(pred, y)
        lossTotal+=loss.item()*x.size(0)
        
        predClass = pred.max(1)[1] # index of the largest of the 5 logits, converting [32,5] to [32]
        numCorrect+=(predClass==y).sum().item()
        numSamples+=x.size(0)

    # Average over the samples actually scored (a dropped last batch is not counted)
    lossAvg = lossTotal/numSamples
    acc = numCorrect/numSamples*100
    
    print('{} Loss {:0.2f}  Accuracy: {:0.2f}%  '.format(name,lossAvg,acc),end='')
    
    return lossAvg
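Multiplying each batch's mean loss by `x.size(0)` turns it back into a sum, so every sample carries equal weight even when batches differ in size. The pure-Python sketch below uses made-up statistics for two batches of 32 and a final batch of 8. `weighted_epoch_metrics` is a hypothetical helper.

```python
def weighted_epoch_metrics(batch_losses, batch_correct, batch_sizes):
    """Combine per-batch mean losses and correct counts into epoch averages."""
    total_samples = sum(batch_sizes)
    # Undo each batch's mean by multiplying by its size, as scoreModel does
    loss_total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    accuracy = 100.0 * sum(batch_correct) / total_samples
    return loss_total / total_samples, accuracy


# Hypothetical numbers: two full batches of 32 and a final batch of 8
avg_loss, acc = weighted_epoch_metrics([0.50, 0.30, 1.20], [28, 30, 4], [32, 32, 8])
print(f"Loss {avg_loss:0.2f}  Accuracy: {acc:0.2f}%")  # Loss 0.49  Accuracy: 86.11%
```

A plain mean of the three batch losses would give about 0.67, because it over-weights the 8-sample batch.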

In [None]:
# Train our model - train and save the best model in fileName
def trainModel(model, trainLoader, criterion, optimizer, fileName, epochs = 5):

    # Time how long the model was trained
    tStart = time.time()

    # Initialize the validation loss to inf
    v_loss = float('inf')  
    
    for epoch in range(epochs):
        t_loss = 0
        numSamples = 0

        # Set the model to train
        model.[TBD]()

        # Iterate across the training dataloader
        for x, y in [TBD]:
            x, y = x.to([TBD]), y.to([TBD])  # Move x and y to proper device
            optimizer.zero_grad()
            pred = model(x)
            loss = criterion(pred,y)
            loss.backward()
            optimizer.step()
            
            # Hint: loss.item() represents the average loss for the current batch
            t_loss += loss.item()*x.size(0)  # Multiply by the current batch size
            numSamples += x.size(0)

        # Compute the final training loss after a full epoch, averaged over the
        # samples actually used (drop_last=True skips the last partial batch)
        finalLoss = t_loss / numSamples
        print ('{} / {} Training Loss {:0.2f}  '.format(epoch+1,epochs,finalLoss), end='')

        # Check the validation loss and save the model only if it has decreased
        loss = scoreModel(valLoader, criterion, model, device, "Validation")
        if (v_loss>loss):
            torch.save(model.state_dict(), fileName)
            print ("  Saving model...", end='')
            v_loss = loss
        print ("\n")
    print ("Total Training Time: {:0.2f} seconds".format(time.time()-tStart))
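`trainModel` keeps only the checkpoint with the lowest validation loss seen so far. The file loaded afterwards therefore comes from the best epoch, not necessarily the last one. The pure-Python sketch below applies the same rule to hypothetical validation losses. `checkpoint_epochs` is our own helper.

```python
def checkpoint_epochs(val_losses):
    """Return the 1-based epochs at which the save-if-improved rule writes a checkpoint.

    Same rule as trainModel: start from an infinite best loss and save
    whenever the new validation loss is strictly lower.
    """
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if best > loss:
            saved.append(epoch)
            best = loss
    return saved


# Hypothetical validation losses over 5 epochs
print(checkpoint_epochs([0.90, 0.70, 0.80, 0.60, 0.65]))  # [1, 2, 4]
```

Here the saved file would hold the epoch-4 weights, even though training ran for 5 epochs.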

# Build a Model with 5 Classes using a Pre-Trained Model (ResNet-50)
Here we will use ResNet-50 with pre-trained ImageNet weights and replace its last layer with a new classifier.
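Only the new head is trained, so its size is easy to work out. `nn.Linear(in, out)` holds `in * out` weights plus `out` biases, while ReLU and Dropout have no parameters. ResNet-50's final layer receives 2,048 features, which is where the `2048` in the new head comes from. The quick pure-Python count below assumes the 2048 → 500 → 5 head used in this project. `linear_params` is a hypothetical helper.

```python
def linear_params(in_features, out_features, bias=True):
    """Number of trainable values in nn.Linear(in_features, out_features)."""
    return in_features * out_features + (out_features if bias else 0)


# Head used in this project: 2048 -> 500 -> ReLU -> Dropout -> 5 classes
fc1 = linear_params(2048, 500)   # 1,024,500
fc2 = linear_params(500, 5)      # 2,505
print(f"Trainable parameters in the new head: {fc1 + fc2:,}")  # 1,027,005
```

That is a small fraction of ResNet-50's roughly 25 million parameters, which all stay frozen.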

In [None]:
# Start by loading the Resnet50 model with pre-trained weights
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all pre-trained weights; the new FC layer built below stays trainable
for param in model.parameters():
    param.requires_grad = False

# Build new FC layer
from collections import OrderedDict
model.fc = nn.Sequential(OrderedDict([
                          ('fc1', nn.[TBD](2048, 500)),
                          ('relu', nn.[TBD]()),
                          ('dropout', nn.[TBD](0.25)),
                          ('fc2', nn.Linear(500, [TBD])),
                          ]))

# Push model to GPU if we have one
model = model.to([TBD])

# Define our loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=[TBD], momentum=[TBD])
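`optim.SGD` with `momentum` keeps a running velocity buffer for each parameter. The pure-Python sketch below follows the update the PyTorch documentation gives for the default `dampening=0` and `nesterov=False`, applied to the toy function f(x) = x². The `lr` and `momentum` values are illustrative only, not the answers for the `[TBD]`s.

```python
def sgd_momentum_steps(x0, grad_fn, lr, momentum, steps):
    """Run `steps` updates of SGD with momentum on a single scalar parameter.

    On the first step the buffer is the gradient itself; afterwards
    buffer = momentum * buffer + gradient. Each step does x -= lr * buffer.
    """
    x, buf = x0, None
    history = [x]
    for _ in range(steps):
        g = grad_fn(x)
        buf = g if buf is None else momentum * buf + g
        x = x - lr * buf
        history.append(x)
    return history


# Toy problem: minimize f(x) = x**2, whose gradient is 2x
history = sgd_momentum_steps(1.0, lambda x: 2 * x, lr=0.1, momentum=0.9, steps=3)
print(history)  # approximately [1.0, 0.8, 0.46, 0.062]
```

Each step reuses 90% of the previous velocity. That is why the second step (0.34) is larger than the first (0.2), even though the gradient shrank.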

In [None]:
trainModel(model, trainLoader, criterion, optimizer, "bestflowerresnet.pt")
model.load_state_dict(torch.load("bestflowerresnet.pt"))
scoreModel(testLoader, criterion, model, device, "Testing");

# Build a Model with 5 Classes using a Pre-Trained Model (EfficientNet-B0)

In [None]:
from efficientnet_pytorch import EfficientNet
model = EfficientNet.from_pretrained('efficientnet-b0', num_classes=[TBD])
model = model.to([TBD])

# Define our loss and optimizer and then fine tune
criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=[TBD])
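Unlike the ResNet cell, this cell fine-tunes every layer of EfficientNet with `optim.RMSprop`. RMSprop divides each gradient by a running root-mean-square of past gradients. The pure-Python sketch below follows that update with PyTorch's documented defaults (`alpha=0.99`, `eps=1e-8`, no momentum, not centered). It shows a practical consequence: the first step has a size of roughly `lr / sqrt(1 - alpha)`, or about `10 * lr`, whatever the gradient's scale. This is one reason RMSprop is typically run with small learning rates. The function and values are illustrative.

```python
import math


def rmsprop_steps(x0, grad_fn, lr, alpha=0.99, eps=1e-8, steps=1):
    """Run `steps` RMSprop updates on a single scalar parameter.

    v starts at zero; each step does v = alpha * v + (1 - alpha) * g**2,
    then x -= lr * g / (sqrt(v) + eps).
    """
    x, v = x0, 0.0
    history = [x]
    for _ in range(steps):
        g = grad_fn(x)
        v = alpha * v + (1 - alpha) * g * g
        x = x - lr * g / (math.sqrt(v) + eps)
        history.append(x)
    return history


# Two toy problems whose gradients differ by a factor of 100 (illustrative lr=0.01)
small = rmsprop_steps(1.0, lambda x: 2 * x, lr=0.01)
large = rmsprop_steps(1.0, lambda x: 200 * x, lr=0.01)
print(small[-1], large[-1])  # both very close to 0.9
```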

In [None]:
trainModel(model, trainLoader, criterion, optimizer, "bestflowerenet.pt")
model.load_state_dict(torch.load("bestflowerenet.pt"))
scoreModel(testLoader, criterion, model, device, "Testing");

# Analyze the Model's Performance

In [None]:
# Adapted from https://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html
%matplotlib inline  
import sklearn
import itertools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import time

@torch.no_grad()
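def confusion(loader, model, normalize=False):
    preds = torch.tensor([], dtype=torch.float, device=device)
    labels = torch.tensor([],dtype=torch.long, device=device)
    model.eval()
    for x,y in loader:
        x, y = x.to(device), y.to(device)
        pred = model(x)
        preds = torch.cat((preds, pred), dim=0)
        labels = torch.cat((labels,y), dim=0)

    preds = preds.argmax(dim=1)
    # Pass every class index so the matrix stays len(classes) x len(classes)
    # even if a class happens to be missing from this loader
    cm = confusion_matrix(labels.cpu().numpy(), preds.cpu().numpy(),
                          labels=list(range(len(classes))))

    # Visualize the confusion matrix using SciKit learn code
    # (pass normalize=True to show per-class fractions instead of counts)
    title='Confusion matrix'
    cmap=plt.cm.Blues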
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt), horizontalalignment="center", color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    return preds, labels
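`confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, so the diagonal holds the correct predictions. With normalization turned on, the plotting code divides each row by its sum, which turns the diagonal into per-class recall. The pure-Python sketch below computes both on hypothetical labels for three classes. The helper names are ours.

```python
def confusion_counts(true_labels, predicted, num_classes):
    """Count predictions: row = true class, column = predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted):
        cm[t][p] += 1
    return cm


def per_class_recall(cm):
    """Diagonal of the row-normalized matrix: share of each true class predicted correctly."""
    recalls = []
    for i, row in enumerate(cm):
        row_total = sum(row)
        recalls.append(row[i] / row_total if row_total else 0.0)
    return recalls


# Hypothetical true labels and predictions for 3 classes
toy_labels = [0, 0, 0, 1, 1, 2, 2, 2, 2]
toy_preds = [0, 0, 1, 1, 1, 2, 2, 0, 2]
toy_cm = confusion_counts(toy_labels, toy_preds, 3)
print(toy_cm)  # [[2, 1, 0], [0, 2, 0], [1, 0, 3]]

# Overall accuracy is the diagonal total divided by the number of samples
accuracy = sum(toy_cm[i][i] for i in range(3)) / len(toy_labels)
print(per_class_recall(toy_cm), accuracy)
```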

In [None]:
# Display the confusion matrix for the model using the test data
preds, labels = confusion(testLoader, model)
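To try the model on your own flower photos, the raw logits for an image have to be turned into probabilities. In the notebook, one way to get the logits is to load the photo with PIL, apply `transNoAugment`, add a batch dimension with `unsqueeze(0)`, and call the model under `model.eval()` and `torch.no_grad()`. `F.softmax(logits, dim=1)` then gives the probabilities. The pure-Python sketch below covers only the softmax and top-k step, using made-up logits. `top_k_predictions` is a hypothetical helper.

```python
import math


def top_k_predictions(logits, classes, k=3):
    """Turn one image's raw logits into the k most likely (class, probability) pairs."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


flower_classes = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
logits = [0.3, -1.2, 2.5, 0.1, 1.7]  # hypothetical model output for one image
for name, p in top_k_predictions(logits, flower_classes):
    print(f"{name}: {p:.1%}")
```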

# Congratulations on Finishing the Project!
### In this project you:
- Downloaded and analyzed the flower vision dataset
- Converted the images into PyTorch dataloaders for the training, validation and test sets
- Used a state-of-the-art vision model and applied transfer learning to build and train your model
- Experimented with different models to reach the target accuracy (90% or better) on the test set
- Analyzed your model’s performance using accuracy, a confusion matrix and other techniques
- Tested your model using your own images of flowers
