First, mount your Google Drive so the notebook can read and write your files.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Note: this cell only works inside Google Colab. Run anywhere else, it fails with "ModuleNotFoundError: No module named 'google'".
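To keep the notebook runnable outside Colab, the mount can be wrapped in a guard. The following is a minimal sketch; the helper name `in_colab` is hypothetical and is not part of Colab or Detecto.

```python
import importlib.util


def in_colab():
    """Return True if the google.colab package is importable (hypothetical helper)."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # Raised when the parent package 'google' is missing or malformed
        return False


if in_colab():
    from google.colab import drive
    drive.mount('/content/drive')
else:
    print("Not running in Colab; skipping the Drive mount.")
```

Outside Colab, you would then point the working directory at a local copy of the data instead of a Drive path.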

Next, change directory to the Drive folder you created for this project.

In [0]:
import os
# Change this to your Drive folder location
WORKING_DIRECTORY = '/content/drive/My Drive/Colab Notebooks/GateDetector/object_detection/data'
os.chdir(WORKING_DIRECTORY)
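If the Drive mount failed or the path has a typo, `os.chdir` raises an error that can be hard to interpret. A small check makes the failure clearer. This is only a sketch: `enter_directory` is a hypothetical helper, and the temporary folder stands in for the real Drive path.

```python
import os
import tempfile


def enter_directory(path):
    """Change into path, failing with a clear message if it is missing (hypothetical helper)."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Folder not found: {path}. Is your Drive mounted?")
    os.chdir(path)
    return os.getcwd()


# Demonstration with a temporary folder standing in for the Drive path
original = os.getcwd()
with tempfile.TemporaryDirectory() as demo_dir:
    print(enter_directory(demo_dir))
    os.chdir(original)  # leave the folder before it is deleted
```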

Now, let's install the Detecto package using pip. 

In [0]:
!pip install detecto

Import everything we need in the following code block:

In [0]:
import torch
import torchvision
import matplotlib.pyplot as plt

from torchvision import transforms

from detecto import core
from detecto import utils
from detecto import visualize

Now we're ready to create our dataset and train our model. Before doing so, though, note that working with hundreds of individual XML label files is awkward, so we first convert them into a single CSV file.

In [0]:
# Do this twice: once for our training labels and once for our validation labels
utils.xml_to_csv('train_labels', 'train_labels.csv')
utils.xml_to_csv('val_labels', 'test_labels.csv')
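To make the conversion less of a black box, here is a rough sketch of what flattening one label file into CSV rows can look like. It assumes the XML follows the Pascal VOC layout (as produced by labelling tools such as LabelImg). The sample file, the `xml_to_rows` helper, and the column names are all illustrative assumptions, not Detecto's actual implementation.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A made-up Pascal VOC-style label file, for illustration only
SAMPLE_XML = """<annotation>
  <filename>gate_001.jpg</filename>
  <size><width>640</width><height>480</height></size>
  <object>
    <name>gate</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""


def xml_to_rows(xml_text):
    """Return one row per labelled object in a Pascal VOC-style XML string."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append([
            filename, width, height, obj.findtext("name"),
            int(box.findtext("xmin")), int(box.findtext("ymin")),
            int(box.findtext("xmax")), int(box.findtext("ymax")),
        ])
    return rows


# Write the rows to an in-memory CSV; the column names are an assumption
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"])
writer.writerows(xml_to_rows(SAMPLE_XML))
print(buffer.getvalue())
```

Each labelled object becomes its own row, so an image with several boxes contributes several rows that share a filename.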

Below, we create our dataset, applying a couple of transforms beforehand. These are optional, but they can be useful for augmenting your dataset without gathering more data. 

In [0]:
# Specify a list of transformations for our dataset to apply on our images
transform_img = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(800),
    transforms.RandomHorizontalFlip(0.5),
    transforms.ToTensor(),
    utils.normalize_transform(),
])

dataset = core.Dataset('train_labels.csv', 'images/', transform=transform_img)
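A geometric augmentation such as a horizontal flip only helps if the bounding boxes move together with the pixels. The sketch below shows the coordinate arithmetic for a single box. It illustrates the idea and is not Detecto's internal code; `flip_box_horizontally` is a hypothetical helper and the numbers are made up.

```python
def flip_box_horizontally(box, image_width):
    """Mirror an (xmin, ymin, xmax, ymax) box across the vertical centre line."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa
    return (image_width - xmax, ymin, image_width - xmin, ymax)


box = (100, 120, 300, 360)
flipped = flip_box_horizontally(box, image_width=640)
print(flipped)  # (340, 120, 540, 360)
```

Flipping twice returns the original box, which is a handy sanity check when writing custom augmentations.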

Finally, let's train our model! We need to create a DataLoader over our dataset to specify how we feed the images into our model. We should also pass in our validation dataset so we can track the validation loss throughout training.

In [0]:
# Create our validation dataset
val_dataset = core.Dataset('test_labels.csv', 'images/', transform=transform_img)

# Create the loaders for our train and validation datasets
loader = core.DataLoader(dataset, batch_size=2, shuffle=True)
val_loader = core.DataLoader(val_dataset)

# Create our model, passing in all unique classes we're predicting
# Note: make sure these match exactly with the labels in the XML/CSV files!
model = core.Model(['gate'])

# Train the model! This step can take a while, so make sure
# the GPU is turned on in Edit -> Notebook settings
losses = model.fit(loader, val_loader, epochs=30, learning_rate=0.01, gamma=0.2, lr_step_size=5, verbose=True)

model.save('trained_model.pth')

# Plot the validation loss over time (lower is better)
plt.plot(losses)
plt.show()
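The `learning_rate`, `gamma`, and `lr_step_size` arguments together describe a step-decay schedule. Assuming they follow the usual convention (as in PyTorch's `StepLR`, where the rate is multiplied by `gamma` every `lr_step_size` epochs), the effective learning rate can be sketched as follows. `stepped_learning_rate` is a hypothetical helper written only for this illustration.

```python
def stepped_learning_rate(initial_lr, gamma, step_size, epoch):
    """Learning rate at a given epoch under step decay (assumed StepLR-style)."""
    return initial_lr * gamma ** (epoch // step_size)


# Values from the training call above: 0.01 start, x0.2 every 5 epochs
for epoch in range(0, 30, 5):
    rate = stepped_learning_rate(0.01, 0.2, 5, epoch)
    print(f"epoch {epoch:2d}: learning rate {rate:.2e}")
```

With these settings the rate shrinks by a factor of five every five epochs, so the last few epochs make only very small updates. If the loss curve flattens early, a larger `gamma` or `lr_step_size` may be worth trying.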

In [0]:
from detecto.visualize import plot_prediction_grid, detect_video

# Run the trained model on each frame of input.mp4 and
# save the annotated video as output.mp4
detect_video(model, 'input.mp4', 'output.mp4')
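Object detectors usually return many low-confidence boxes alongside the good ones, so it helps to keep only predictions above a score threshold before drawing them. The sketch below uses made-up predictions. `filter_by_score` and the 0.6 threshold are illustrative assumptions, not part of the Detecto API.

```python
def filter_by_score(labels, boxes, scores, threshold=0.6):
    """Keep only the (label, box, score) triples whose score meets the threshold."""
    return [
        (label, box, score)
        for label, box, score in zip(labels, boxes, scores)
        if score >= threshold
    ]


# Made-up predictions for a single frame
labels = ['gate', 'gate']
boxes = [(100, 120, 300, 360), (10, 10, 40, 40)]
scores = [0.92, 0.31]

kept = filter_by_score(labels, boxes, scores)
print(kept)  # [('gate', (100, 120, 300, 360), 0.92)]
```

A higher threshold gives cleaner output at the cost of missing faint detections; the right value depends on how the model scores your own footage.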