# Connect to Google Drive
The Porto dataset should be stored in Google Drive, so the first step is to mount the drive in the Colab runtime.

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')
!ls '/content/gdrive/My Drive'

# Import code from the TensorFlow Object Detection API

In [0]:
!mkdir -p /content/gdrive/My\ Drive/porto-dataset-2
%cd /content/gdrive/My\ Drive/porto-dataset-2
!git clone https://github.com/tensorflow/models.git
!mv models/research/object_detection /content/gdrive/My\ Drive/porto-dataset-2
!mv -u models/research/slim/* /content/gdrive/My\ Drive/porto-dataset-2
!mv models/research/setup.py /content/gdrive/My\ Drive/porto-dataset-2
!rm -r models
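# Compile the protobuf definitions before installing, so the generated *_pb2.py modules are included in the installed package
!protoc object_detection/protos/*.proto --python_out=.
!python setup.py install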
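
If the proto compilation step is skipped or incomplete, importing modules from `object_detection` later fails because the generated `*_pb2` modules are missing. The following is a minimal sketch, not part of the original notebook, that lists every `.proto` file under `object_detection/protos` without a matching `_pb2.py`. The helper name `missing_compiled_protos` is hypothetical and the path is assumed from the cells above.

```python
from pathlib import Path


def missing_compiled_protos(object_detection_dir):
    """Return the names of .proto files that have no matching *_pb2.py module.

    `protoc object_detection/protos/*.proto --python_out=.` writes
    protos/<name>_pb2.py next to each protos/<name>.proto.
    """
    protos_dir = Path(object_detection_dir) / "protos"
    missing = []
    for proto in sorted(protos_dir.glob("*.proto")):
        if not proto.with_name(proto.stem + "_pb2.py").exists():
            missing.append(proto.name)
    return missing


# Path assumed from the cells above; adjust it if your Drive layout differs.
OD_DIR = Path("/content/gdrive/My Drive/porto-dataset-2/object_detection")
if OD_DIR.is_dir():
    print(missing_compiled_protos(OD_DIR) or "all protos compiled")
```

An empty list means every proto was compiled; any names printed point to protos that `protoc` did not process.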

# Load Dataset
Copy the original dataset (`porto-dataset`) into the `object_detection` folder, so the original stays untouched, and create the `train` and `test` folders.

In [0]:
!cp -r '/content/gdrive/My Drive/porto-dataset/.' '/content/gdrive/My Drive/porto-dataset-2/object_detection'
!mkdir -p /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/test
!mkdir -p /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/train

## Move images
Move the first 120 images of each category (20% of the category) to `test`.

In [0]:
%cd /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/
%cd arrabida
!mv `ls | head -120` ../test/
%cd ../camara
!mv `ls | head -120` ../test/
%cd ../clerigos
!mv `ls | head -120` ../test/
%cd ../musica
!mv `ls | head -120` ../test/
%cd ../serralves
!mv `ls | head -120` ../test/

Move all the other images to train

In [0]:
%cd /content/gdrive/My\ Drive/porto-dataset-2/object_detection/
!mv images/arrabida/*.* images/train
!mv images/camara/*.* images/train
!mv images/clerigos/*.* images/train
!mv images/musica/*.* images/train
!mv images/serralves/*.* images/train
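
The shell cells above hard-code 120 files per category, which equals 20% only if every category holds 600 files. A minimal Python sketch of the same split computes the count from the requested fraction instead: it sorts each category folder by file name, moves the first part to `test` and the remainder to `train`. `split_category` is a hypothetical helper, and Python's sort order can differ slightly from `ls`, which follows the locale. Because both images and annotations are sorted by base name, running it on the matching annotation folder with the same fraction keeps each annotation in the same split as its image.

```python
import shutil
from pathlib import Path

CATEGORIES = ["arrabida", "camara", "clerigos", "musica", "serralves"]


def split_category(category_dir, test_dir, train_dir, test_fraction=0.2):
    """Move the first `test_fraction` of the files in category_dir (sorted by
    name) to test_dir and the remaining files to train_dir.

    Returns (files moved to test, files moved to train).
    """
    files = sorted(p for p in Path(category_dir).iterdir() if p.is_file())
    # Round rather than truncate, so floating-point error cannot drop a file from the test split.
    n_test = round(len(files) * test_fraction)
    for i, path in enumerate(files):
        destination = Path(test_dir if i < n_test else train_dir)
        shutil.move(str(path), str(destination / path.name))
    return n_test, len(files) - n_test


# Example usage (moves files; layout assumed from the cells above):
# images = Path("/content/gdrive/My Drive/porto-dataset-2/object_detection/images")
# for category in CATEGORIES:
#     print(category, split_category(images / category, images / "test", images / "train"))
```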

## Move annotations
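Move the annotations of the first 120 files of each category to `images/test`, next to their images. Since `ls` lists both folders in the same sorted order, each annotation ends up in the same split as its image, provided every image has exactly one annotation with the same base name.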

In [0]:
%cd /content/gdrive/My\ Drive/porto-dataset-2/object_detection/annotations/
%cd arrabida
!mv `ls | head -120` /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/test
%cd ../camara
!mv `ls | head -120` /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/test
%cd ../clerigos
!mv `ls | head -120` /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/test
%cd ../musica
!mv `ls | head -120` /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/test
%cd ../serralves
!mv `ls | head -120` /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/test
%cd /content/gdrive/My\ Drive/porto-dataset-2/object_detection

Move the other annotations to train

In [0]:
!mv annotations/arrabida/*.* images/train
!mv annotations/camara/*.* images/train
!mv annotations/clerigos/*.* images/train
!mv annotations/musica/*.* images/train
!mv annotations/serralves/*.* images/train
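
Because images and annotations are split in two separate passes, it is worth confirming that every image still has its annotation in the same folder. A minimal sketch, assuming images are `.bmp` files (as the file names removed in a later cell suggest) and annotations are `.xml` files with the same base name; `unpaired_files` is a hypothetical helper.

```python
from pathlib import Path


def unpaired_files(split_dir, image_ext=".bmp", annotation_ext=".xml"):
    """Return (images without an annotation, annotations without an image),
    matching files by base name, e.g. clerigos-0135.bmp <-> clerigos-0135.xml."""
    split_dir = Path(split_dir)
    images = {p.stem for p in split_dir.glob("*" + image_ext)}
    annotations = {p.stem for p in split_dir.glob("*" + annotation_ext)}
    return sorted(images - annotations), sorted(annotations - images)


# Paths assumed from the cells above; this check only reads the folders.
IMAGES = Path("/content/gdrive/My Drive/porto-dataset-2/object_detection/images")
for split in ("train", "test"):
    if (IMAGES / split).is_dir():
        print(split, unpaired_files(IMAGES / split))
```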

Remove unused directories

In [0]:
%cd /content/gdrive/My\ Drive/porto-dataset-2/object_detection/
!rm -r annotations
!rmdir images/arrabida
!rmdir images/camara
!rmdir images/clerigos
!rmdir images/musica
!rmdir images/serralves

# Convert Dataset
After collecting the dataset files, we need to convert them into the format the training pipeline expects.
## Add the required scripts
As the repository containing them is private, the scripts used below (`xml_to_csv.py` and `generate_tfrecord.py`) must be placed manually in the drive at the root of the dataset, in this case in `/content/gdrive/My Drive/porto-dataset-2/object_detection/`.

## Convert XML to CSV
`generate_tfrecord.py`, used in the next step, reads the bounding boxes from CSV files rather than XML, so the XML annotation files must first be converted to CSV.

Before that, the bad files have to be removed from the dataset.

In [0]:
!rm /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/train/clerigos-0135.bmp
!rm /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/train/clerigos-0135.xml
!rm /content/gdrive/My\ Drive/porto-dataset-2/object_detection/images/test/serralves-0119.xml

In [0]:
%cd /content/gdrive/My\ Drive/porto-dataset-2/object_detection/
!python xml_to_csv.py
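
`xml_to_csv.py` itself is not included here; judging from the `generate_tfrecord.py` calls that follow, it writes `images/train_labels.csv` and `images/test_labels.csv`. The sketch below shows one way such a conversion can work, assuming Pascal VOC-style XML (`filename`, `size/width`, `size/height`, and one `object` element with `name` and `bndbox` per box) and a one-row-per-box CSV layout. Both the schema and the column names are assumptions and must match whatever your copy of `generate_tfrecord.py` reads.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed column layout: one row per bounding box.
COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]


def xml_dir_to_rows(xml_dir):
    """Collect one row per <object> from every Pascal VOC-style XML file in xml_dir."""
    rows = []
    for xml_file in sorted(Path(xml_dir).glob("*.xml")):
        root = ET.parse(xml_file).getroot()
        size = root.find("size")
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            # Some annotation tools write float coordinates, so parse through float first.
            coords = [int(float(box.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax")]
            rows.append([
                root.findtext("filename"),
                int(size.findtext("width")),
                int(size.findtext("height")),
                obj.findtext("name"),
                *coords,
            ])
    return rows


def write_csv(rows, csv_path):
    """Write the rows, preceded by a header line, to csv_path."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(rows)


# Example usage (overwrites the CSV; paths assumed from the cells above):
# images = Path("/content/gdrive/My Drive/porto-dataset-2/object_detection/images")
# for split in ("train", "test"):
#     write_csv(xml_dir_to_rows(images / split), images / f"{split}_labels.csv")
```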

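## Generate TensorFlow records
The TFRecord file format is a simple record-oriented binary format that many TensorFlow applications use for training data. `generate_tfrecord.py` combines each CSV file with its image folder and writes one record file per split, `train.record` and `test.record`.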

In [0]:
%cd /content/gdrive/My\ Drive/porto-dataset-2/object_detection/
!python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record
!python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record
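
To make "simple record-oriented binary format" concrete: each record in a TFRecord file is stored as an 8-byte little-endian length, a 4-byte masked CRC-32C of that length, the payload bytes, and a 4-byte masked CRC-32C of the payload. The pure-Python sketch below writes and reads that framing without TensorFlow, for illustration only. In the files produced by `generate_tfrecord.py` each payload is a serialized `tf.train.Example`, and in practice they should be read with `tf.data.TFRecordDataset`.

```python
import struct


def _make_crc32c_table():
    """Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data):
    crc = 0xFFFFFFFF
    for b in data:
        crc = _CRC32C_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data):
    """Rotate the CRC right by 15 bits and add a constant, as TFRecord does."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_record(f, payload):
    """Append one record: length, CRC of length, payload, CRC of payload."""
    header = struct.pack("<Q", len(payload))
    f.write(header)
    f.write(struct.pack("<I", masked_crc(header)))
    f.write(payload)
    f.write(struct.pack("<I", masked_crc(payload)))


def _read_exact(f, n):
    data = f.read(n)
    if len(data) != n:
        raise ValueError("truncated record")
    return data


def read_records(f):
    """Yield each payload in turn, checking both CRCs."""
    while True:
        header = f.read(8)
        if not header:
            return  # clean end of file
        if len(header) != 8:
            raise ValueError("truncated record")
        (length,) = struct.unpack("<Q", header)
        (header_crc,) = struct.unpack("<I", _read_exact(f, 4))
        if header_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        payload = _read_exact(f, length)
        (payload_crc,) = struct.unpack("<I", _read_exact(f, 4))
        if payload_crc != masked_crc(payload):
            raise ValueError("corrupt payload")
        yield payload
```

The per-record checksums are what let a reader detect a corrupted or truncated file instead of silently feeding bad bytes into training.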

# Create Label Map

In [0]:
!mkdir -p /content/gdrive/My\ Drive/porto-dataset-2/training
%cd /content/gdrive/My\ Drive/porto-dataset-2/training
!echo "item {id: 1 name: 'arrabida'}" > labelmap.pbtxt
!echo "item {id: 2 name: 'camara'}" >> labelmap.pbtxt
!echo "item {id: 3 name: 'clerigos'}" >> labelmap.pbtxt
!echo "item {id: 4 name: 'musica'}" >> labelmap.pbtxt
!echo "item {id: 5 name: 'serralves'}" >> labelmap.pbtxt
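
The label map ids must agree with the class ids written into the records. Ids start at 1 because the Object Detection API reserves 0 for the background class, and `generate_tfrecord.py` has to map each class name in the CSV to the same id. A small sketch, assuming the five category names above; both helpers are hypothetical. Many versions of `generate_tfrecord.py` hard-code this mapping in a function often named `class_text_to_int`, so compare it against your copy.

```python
LABELS = ["arrabida", "camara", "clerigos", "musica", "serralves"]


def class_ids(labels):
    """Map each class name to an id, starting at 1 (0 is the background class)."""
    return {name: i for i, name in enumerate(labels, start=1)}


def label_map_text(labels):
    """Build the same labelmap.pbtxt content as the echo commands above."""
    return "".join(
        f"item {{id: {i} name: '{name}'}}\n" for name, i in class_ids(labels).items()
    )


# Example usage (path assumed from the cell above):
# from pathlib import Path
# Path("/content/gdrive/My Drive/porto-dataset-2/training/labelmap.pbtxt").write_text(label_map_text(LABELS))
```

Generating both the label map and the name-to-id mapping from one list keeps the two from drifting apart when a category is added.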

In addition to the label map, the network's pipeline configuration file (`faster_rcnn_inception_v2_porto.config`, referenced by the training command below) must be placed manually in `/content/gdrive/My Drive/porto-dataset-2/training`.
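
The pipeline configuration must point at the files created above. In the Object Detection API's sample configs, `num_classes` sits inside the model block, the input readers take `label_map_path` and `tf_record_input_reader { input_path: ... }`, and paths that still need editing are marked `PATH_TO_BE_CONFIGURED`. The hypothetical `check_pipeline_config` helper below is a plain text check under those assumptions (including double-quoted paths), not a replacement for the API's own config parsing.

```python
import re


def check_pipeline_config(config_text, num_classes, required_paths):
    """Return a list of problems found in a pipeline config given as text."""
    problems = []
    found = [int(n) for n in re.findall(r"num_classes:\s*(\d+)", config_text)]
    if not found or any(n != num_classes for n in found):
        problems.append(f"num_classes is {found}, expected {num_classes}")
    # Assumes paths are written as double-quoted strings in the config.
    for path in required_paths:
        if f'"{path}"' not in config_text:
            problems.append(f"path not referenced: {path}")
    if "PATH_TO_BE_CONFIGURED" in config_text:
        problems.append("placeholder PATH_TO_BE_CONFIGURED still present")
    return problems


# Example usage (paths assumed from the cells above):
# root = "/content/gdrive/My Drive/porto-dataset-2"
# with open(f"{root}/training/faster_rcnn_inception_v2_porto.config") as f:
#     print(check_pipeline_config(f.read(), 5, [
#         f"{root}/object_detection/train.record",
#         f"{root}/object_detection/test.record",
#         f"{root}/training/labelmap.pbtxt",
#     ]))
```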

# Train

In [0]:
%cd /content/gdrive/My\ Drive/porto-dataset-2
!python train.py --logtostderr --train_dir='/content/gdrive/My Drive/porto-dataset-2/training/' --pipeline_config_path='/content/gdrive/My Drive/porto-dataset-2/training/faster_rcnn_inception_v2_porto.config'
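
Training writes checkpoints to `--train_dir` as it progresses. Assuming the trainer saves TF1-style checkpoints named `model.ckpt-<step>.index` (plus matching `.data-*` and `.meta` files), the sketch below finds the most recent step without importing TensorFlow; `tf.train.latest_checkpoint` gives the same information from TensorFlow's own `checkpoint` index file. `latest_checkpoint_step` is a hypothetical helper. Steps are compared as integers, since sorting file names as text would rank `model.ckpt-900` after `model.ckpt-1200`.

```python
import re
from pathlib import Path

_CKPT_INDEX = re.compile(r"model\.ckpt-(\d+)\.index")


def latest_checkpoint_step(train_dir):
    """Return the highest step that has a model.ckpt-<step>.index file, or None."""
    steps = []
    for path in Path(train_dir).glob("model.ckpt-*.index"):
        match = _CKPT_INDEX.fullmatch(path.name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps, default=None)


# Example usage (path assumed from the training command above):
# print(latest_checkpoint_step("/content/gdrive/My Drive/porto-dataset-2/training"))
```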