<a href="https://colab.research.google.com/github/abriyanyusuf/C23PS423_ML/blob/main/TrainObjDetect.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Introduction
This notebook builds an object detection feature with TensorFlow, trained on a custom dataset.

Created by : CS42-PS423 Team 

#Create Dataset : Gathering and Labelling Training Images (Optional)
We will use the [ECUSTFD dataset](https://github.com/Liang-yc/ECUSTFD-resized-), which contains 19 types of food photographed from different angles, plus a set of mixed-food images.
The categories are:
1. apple 296 Images
2. banana 178 Images
3. bread 66 Images
4. bun 90 Images
5. doughnut 210 Images
6. egg 104 Images
7. fired dough twist 124 Images
8. grape 58 Images
9. lemon 148 Images
10. litchi 78 Images
11. mango 220 Images
12. mooncake 134 Images
13. orange 254 Images
14. peach 126 Images
15. pear 166 Images
16. plum 176 Images
17. qiwi 120 Images
18. sachima 150 Images
19. tomato 172 Images
20. Mix fruit 108 Images

Total Images: 2978 Images
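
As a quick sanity check, the per-category counts listed above can be summed in a few lines. This is only a sketch: the counts are copied by hand from the list, and the key `'mix'` is a name chosen here for the "Mix fruit" entry.

```python
# Image counts per category, copied by hand from the list above.
# 'mix' is a name chosen here for the "Mix fruit" entry.
image_counts = {
    'apple': 296, 'banana': 178, 'bread': 66, 'bun': 90, 'doughnut': 210,
    'egg': 104, 'fired dough twist': 124, 'grape': 58, 'lemon': 148,
    'litchi': 78, 'mango': 220, 'mooncake': 134, 'orange': 254,
    'peach': 126, 'pear': 166, 'plum': 176, 'qiwi': 120, 'sachima': 150,
    'tomato': 172, 'mix': 108,
}

total_images = sum(image_counts.values())
smallest = min(image_counts, key=image_counts.get)
largest = max(image_counts, key=image_counts.get)

print('Total images:', total_images)
print('Smallest category:', smallest, image_counts[smallest])
print('Largest category:', largest, image_counts[largest])
```

The gap between the smallest and largest categories shows how unevenly the classes are represented, which is worth keeping in mind when judging per-class detection results.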

We can add new food categories by labelling our own images with [LabelImg](https://github.com/heartexlabs/labelImg). If needed, the code below collects images from a webcam: it automatically captures a set number of images per label, with a fixed delay between captures.

##1. Import Dependencies

In [None]:
##Import cv2
import cv2
#Import UUID
import uuid
#Import Time
import time
##Import OS
import os

##2. Define Initial Parameter

In [None]:
##Define Images to Collect
labels = ['label1', 'label2']
number_images = 10 ##Put number images per label that we want capture

##3. Setup Folder to Save Collected Images

In [None]:
##Define images path 
%cd '/mydrive'
IMAGES_PATH = os.path.join('Tensorflow', 'collectedimages')

In [None]:
##Create the folder if it does not exist (works on both posix and nt)
os.makedirs(IMAGES_PATH, exist_ok=True)

##Create a folder for each label
for label in labels:
  os.makedirs(os.path.join(IMAGES_PATH, label), exist_ok=True)

##4. Capture Images Using Webcam

In [None]:
for label in labels:
    cap = cv2.VideoCapture(0)
    print('Collecting images for {}'.format(label))
    ##give a delay before capturing starts
    time.sleep(5)
    ##capture number_images images for each label
    for imgnum in range(number_images):
        print('Collecting image {}'.format(imgnum))
        ret, frame = cap.read()
        ##skip this frame if the webcam did not return an image
        if not ret:
            continue
        imgname = os.path.join(IMAGES_PATH,label,label+'.'+'{}.jpg'.format(str(uuid.uuid1())))
        cv2.imwrite(imgname, frame)
        cv2.imshow('frame', frame)
        ##give a delay between captures, e.g. between image 0 and image 1
        time.sleep(2)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    ##release the webcam before it is opened again for the next label
    cap.release()
cv2.destroyAllWindows()

##5. Labelling Image Using LabelImg

In [None]:
##Setup directory folder
LABELIMG_PATH = os.path.join('Tensorflow', 'labelimg')

In [None]:
##Cloning LabelImg from github
if not os.path.exists(LABELIMG_PATH):
    !mkdir {LABELIMG_PATH}
    !git clone https://github.com/tzutalin/labelImg {LABELIMG_PATH}

In [None]:
##Installing LabelImg (both commands must run inside the cloned folder)
if os.name == 'posix':
    !cd {LABELIMG_PATH} && make qt5py3
if os.name == 'nt':
    !cd {LABELIMG_PATH} && pyrcc5 -o libs/resources.py resources.qrc

In [None]:
##Open labelImg
!cd {LABELIMG_PATH} && python labelImg.py

We can start labelling by opening the images in the collected folder. LabelImg automatically saves an XML file next to each image. After that, we move all files from the labelled folders into a single folder and compress that folder as a .zip file. The .zip file is our dataset, and we upload it to Google Drive.
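
The gathering and zipping step can also be scripted. The following is a minimal sketch, not part of the original workflow: the function name `collect_and_zip` and its arguments are hypothetical. It assumes the flat output folder sits outside the labelled folders and that file names are unique across labels, which holds for the UUID-based names used above.

```python
import shutil
from pathlib import Path

# File types to gather: images plus the XML annotations written by LabelImg
DATASET_SUFFIXES = {'.jpg', '.jpeg', '.png', '.bmp', '.xml'}

def collect_and_zip(source_root, flat_dir, archive_base):
    """Copy all images and XML files under source_root into flat_dir, then zip flat_dir."""
    source_root, flat_dir = Path(source_root), Path(flat_dir)
    flat_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(source_root.rglob('*')):
        if path.is_file() and path.suffix.lower() in DATASET_SUFFIXES:
            shutil.copy2(path, flat_dir / path.name)
    # make_archive appends ".zip" to archive_base and returns the archive path
    return shutil.make_archive(str(archive_base), 'zip', root_dir=str(flat_dir))
```

For example, `collect_and_zip(IMAGES_PATH, 'Tensorflow/dataset', 'Tensorflow/dataset')` would produce `Tensorflow/dataset.zip`; both output paths are placeholders.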

#Install Dependencies for TensorFlow Object Detection

We will clone the [TensorFlow model repository](https://github.com/tensorflow/models) from GitHub to use the TensorFlow Object Detection API.

In [1]:
# Clone the tensorflow models repository from GitHub
!git clone --depth 1 https://github.com/tensorflow/models

Cloning into 'models'...
remote: Enumerating objects: 3843, done.
remote: Counting objects: 100% (3843/3843), done.
remote: Compressing objects: 100% (2955/2955), done.
remote: Total 3843 (delta 1109), reused 1946 (delta 837), pack-reused 0
Receiving objects: 100% (3843/3843), 49.59 MiB | 22.29 MiB/s, done.
Resolving deltas: 100% (1109/1109), done.


In [2]:
%%bash
# Compile the Object Detection API protobuf definitions into Python modules
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
#cp object_detection/packages/tf2/setup.py .

In [3]:
# Modify setup.py file to install the tf-models-official repository targeted at TF v2.8.0
import re
with open('/content/models/research/object_detection/packages/tf2/setup.py') as f:
    s = f.read()

with open('/content/models/research/setup.py', 'w') as f:
    # Pin tf-models-official to 2.8.0 instead of the default >=2.5.1
    s = re.sub('tf-models-official>=2.5.1',
               'tf-models-official==2.8.0', s)
    f.write(s)

In [4]:
# Install the Object Detection API
!pip install /content/models/research/

# Need to downgrade to TF v2.8.0 due to Colab compatibility bug with TF v2.10 (as of 10/03/22)
!pip install tensorflow==2.8.0

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Processing ./models/research
  Preparing metadata (setup.py) ... done
Collecting avro-python3 (from object-detection==0.1)
  Downloading avro-python3-1.10.2.tar.gz (38 kB)
  Preparing metadata (setup.py) ... done
Collecting apache-beam (from object-detection==0.1)
  Downloading apache_beam-2.47.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (14.3 MB)
Collecting lvis (from object-detection==0.1)
  Downloading lvis-0.5.3-py3-none-any.whl (14 kB)
Collecting tf-models-official==2.8.0 (from object-detection==0.1)
  Downloading tf_models_official-2.8.0-py2.py3-none-any.whl (2.2 MB)
Collecting tensorflow_io (from obj

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow==2.8.0
  Downloading tensorflow-2.8.0-cp310-cp310-manylinux2010_x86_64.whl (497.6 MB)
Collecting tf-estimator-nightly==2.8.0.dev2021122109 (from tensorflow==2.8.0)
  Downloading tf_estimator_nightly-2.8.0.dev2021122109-py2.py3-none-any.whl (462 kB)
Installing collected packages: tf-estimator-nightly, tensorflow
  Attempting uninstall: tensorflow
    Found existing installation: tensorflow 2.8.1
    Uninstalling tensorflow-2.8.1:
      Successfully uninstalled tensorflow-2.8.1
Successfully installed tensorflow-2.8.0 tf-estimator-nightly-2.8.0.dev2021122109


In [5]:
# Run the Model Builder test file to verify that everything is working properly
!python /content/models/research/object_detection/builders/model_builder_tf2_test.py


caused by: ['/usr/local/lib/python3.10/dist-packages/tensorflow_io/python/ops/libtensorflow_io_plugins.so: undefined symbol: _ZN3tsl5mutexC1Ev']
caused by: ['/usr/local/lib/python3.10/dist-packages/tensorflow_io/python/ops/libtensorflow_io.so: undefined symbol: _ZNK10tensorflow4data11DatasetBase8FinalizeEPNS_15OpKernelContextESt8functionIFN3tsl8StatusOrISt10unique_ptrIS1_NS5_4core15RefCountDeleterEEEEvEE']
2023-05-28 03:31:37.675455: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-05-28 03:31:37.675519: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
Running tests under Python 3.10.11: /usr/bin/python3
[ RUN      ] ModelBuilderTF2Test.test_create_center_net_deepmac
W0528 03:31:38.070764 139760008443712 model_builder.py:1112] Bui

#3. Upload Image Dataset and Prepare Training Data

##3.1. Mounting Google Drive and Copy zip file

In [35]:
%cd ..
from google.colab import drive
drive.mount('/content/gdrive')

/
Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


We will copy our dataset from Google Drive.

In [36]:
!cp '/content/gdrive/MyDrive/Capstone Project Machine Learning/ECUSTFD.zip' '/content'

##3.2. Split Images Into Train, Validation, and Test Folders

We will unzip the dataset into a new folder that we create for it.

In [57]:
%cd /content

/content


In [58]:
!rm -rf images

In [59]:
%cd ..

/


In [60]:
!mkdir /content/images
!mkdir /content/images/train; mkdir /content/images/validation; mkdir /content/images/test

In [61]:
%cd /content

/content


In [62]:
!unzip -q ECUSTFD.zip -d /content/images/all
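
Before splitting, it may help to confirm that every image has a matching XML annotation, because the split code moves each image together with the `.xml` file of the same name, and `os.rename` raises an error if that file is missing. The helper below is a sketch with a hypothetical name (`find_unlabelled_images`); it is not part of the dataset or the repository scripts.

```python
from pathlib import Path

IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.bmp'}

def find_unlabelled_images(root):
    """Return image paths under root that have no same-named .xml file next to them."""
    missing = []
    for path in sorted(Path(root).rglob('*')):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            if not path.with_suffix('.xml').exists():
                missing.append(path)
    return missing
```

Calling `find_unlabelled_images('/content/images/all')` should return an empty list when every image is annotated.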

We will move 80% of the images to the train folder, 10% to the validation folder, and 10% to the test folder. The code uses several libraries:

1. The glob module finds path names matching a specified pattern, following the rules used by the Unix shell.

2. pathlib is a standard Python library for handling files and paths. Its path objects offer methods and attributes that are often more convenient than the equivalent os functions.

3. The random module provides functions for generating pseudo-random numbers and for making random selections, such as picking a random item from a list.

4. The os module provides functions for interacting with the operating system. Here it is used to join paths and to move files with os.rename.



```
import glob
from pathlib import Path
import random
import os


# Define paths to image folders
image_path = '/content/images/all/ECUSTFD/ImagesWithXml'
train_path = '/content/images/train'
val_path = '/content/images/validation'
test_path = '/content/images/test'

# Get list of all images
jpg_file_list = [path for path in Path(image_path).rglob('*.jpg')]
JPG_file_list = [path for path in Path(image_path).rglob('*.JPG')]
png_file_list = [path for path in Path(image_path).rglob('*.png')]
bmp_file_list = [path for path in Path(image_path).rglob('*.bmp')]

file_list = jpg_file_list + JPG_file_list + png_file_list + bmp_file_list
file_num = len(file_list)
print('Total images: %d' % file_num)

# Determine number of files to move to each folder
train_percent = 0.8  # 80% of the files go to train
val_percent = 0.1    # 10% go to validation
test_percent = 0.1   # 10% go to test
train_num = int(file_num*train_percent)
val_num = int(file_num*val_percent)
test_num = file_num - train_num - val_num
print('Images moving to train: %d' % train_num)
print('Images moving to validation: %d' % val_num)
print('Images moving to test: %d' % test_num)

# Select 80% of files randomly and move them to train folder
for i in range(train_num):
    move_me = random.choice(file_list) # choose one file at random from the file list
    fn = move_me.name # file name with extension
    base_fn = move_me.stem # file name without extension
    parent_path = move_me.parent # folder that contains the chosen file
    xml_fn = base_fn + '.xml' # name of the matching annotation file
    os.rename(move_me, train_path+'/'+fn) # move the image to the train folder
    # move the XML annotation of the chosen file to the train folder
    os.rename(os.path.join(parent_path,xml_fn),os.path.join(train_path,xml_fn))
    file_list.remove(move_me) # remove the chosen file from the list so it cannot be chosen again

# Select 10% of remaining files and move them to validation folder
for i in range(val_num):
    move_me = random.choice(file_list)
    fn = move_me.name
    base_fn = move_me.stem
    parent_path = move_me.parent
    xml_fn = base_fn + '.xml'
    os.rename(move_me, val_path+'/'+fn)
    os.rename(os.path.join(parent_path,xml_fn),os.path.join(val_path,xml_fn))
    file_list.remove(move_me)

# Move remaining files to test folder
for i in range(test_num):
    move_me = random.choice(file_list)
    fn = move_me.name
    base_fn = move_me.stem
    parent_path = move_me.parent
    xml_fn = base_fn + '.xml'
    os.rename(move_me, test_path+'/'+fn)
    os.rename(os.path.join(parent_path,xml_fn),os.path.join(test_path,xml_fn))
    file_list.remove(move_me)
```



This code is already saved as a .py file in our repository, so we download it with the command below.

In [63]:
!wget https://raw.githubusercontent.com/abriyanyusuf/C23PS423_ML/main/Split_to_TrainValTest.py

--2023-05-28 04:16:22--  https://raw.githubusercontent.com/abriyanyusuf/C23PS423_ML/main/Split_to_TrainValTest.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2591 (2.5K) [text/plain]
Saving to: ‘Split_to_TrainValTest.py’


2023-05-28 04:16:22 (35.8 MB/s) - ‘Split_to_TrainValTest.py’ saved [2591/2591]



We will run the code to split our data

In [64]:
!python Split_to_TrainValTest.py

Total images: 2978
Images moving to train: 2382
Images moving to validation: 297
Images moving to test: 299
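
The split sizes printed above follow directly from the percentages: the train and validation counts are rounded down, and the test set receives whatever remains. The sketch below reproduces that arithmetic on a plain list, using a seeded shuffle and slicing instead of repeated `random.choice` calls. The function name `split_items` and the seed value are assumptions made for illustration, and no files are moved.

```python
import random

def split_items(items, train_percent=0.8, val_percent=0.1, seed=42):
    """Shuffle items reproducibly and slice them into train, validation and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    train_num = int(len(shuffled) * train_percent)  # rounded down
    val_num = int(len(shuffled) * val_percent)      # rounded down
    train = shuffled[:train_num]
    val = shuffled[train_num:train_num + val_num]
    test = shuffled[train_num + val_num:]           # everything that is left
    return train, val, test

# 2978 stands in for the number of images found in the dataset
train, val, test = split_items(range(2978))
print(len(train), len(val), len(test))  # 2382 297 299
```

Fixing the seed makes the split repeatable between sessions, which the `random.choice` version does not guarantee.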


##3.3 Create Labelmap and TFRecords

We need to create a label map for the detector and convert the images into TFRecords, the data file format that TensorFlow uses for training.
There are two steps:
1. Define a label map for our classes by creating a "labelmap.txt" file
2. Convert the data into TFRecord format

In [74]:
%%bash
# Create labelmap.txt containing the list of labels below
# (">" overwrites the file, so re-running this cell does not duplicate the labels)
cat <<EOF > /content/labelmap.txt
apple
banana
bread
bun
doughnut
egg
fired dough twist
grape
lemon
litchi
mango
mooncake
orange
peach
pear
plum
qiwi
sachima
tomato
EOF



We will use data conversion scripts from our GitHub repository. Running them creates TFRecord files for the train and validation datasets, as well as a labelmap.pbtxt file, which contains the label map in a different format.
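
To illustrate what "a different format" means, here is a minimal sketch of how a plain list of labels maps onto the `item { id ... name ... }` entries of a .pbtxt label map. It is not the code from the conversion scripts, which may build the file differently, and `labels_to_pbtxt` is a hypothetical name.

```python
def labels_to_pbtxt(labels):
    """Build label map text; ids start at 1 because id 0 is reserved for the background class."""
    entries = []
    for idx, name in enumerate(labels, start=1):
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(entries)

# Three labels as they might be read from labelmap.txt (trailing whitespace stripped)
raw_text = "apple \nbanana \nbread \n"
labels = [line.strip() for line in raw_text.splitlines() if line.strip()]

pbtxt = labels_to_pbtxt(labels)
print(pbtxt)
```

Each label gets a consecutive integer id in the order it appears in the text file, so the order of lines in labelmap.txt determines the class ids.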

In [78]:
# Download data conversion scripts
!wget https://raw.githubusercontent.com/abriyanyusuf/C23PS423_ML/main/Create_CSV_from_VOC.py
!wget https://raw.githubusercontent.com/abriyanyusuf/C23PS423_ML/main/CSV_to_TFRecord_Converter.py

--2023-05-28 04:43:41--  https://raw.githubusercontent.com/abriyanyusuf/C23PS423_ML/main/Create_CSV_from_VOC.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1370 (1.3K) [text/plain]
Saving to: ‘Create_CSV_from_VOC.py’


2023-05-28 04:43:41 (47.2 MB/s) - ‘Create_CSV_from_VOC.py’ saved [1370/1370]

--2023-05-28 04:43:41--  https://raw.githubusercontent.com/abriyanyusuf/C23PS423_ML/main/CSV_to_TFRecord_Converter.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4440 (4.3K) [text/plain]
Saving to: ‘CSV_to_TFRecord_Converter.py’




In [80]:
# Create CSV data files and TFRecord files
!python3 Create_CSV_from_VOC.py


Traceback (most recent call last):
  File "/content/models/Create_CSV_from_VOC.py", line 37, in <module>
    main()
  File "/content/models/Create_CSV_from_VOC.py", line 34, in main
    xml_df.to_csv(('images/' + folder + '_labels.csv'), index=None)
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/core/generic.py", line 3720, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/formats/format.py", line 1189, in to_csv
    csv_formatter.save()
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/formats/csvs.py", line 241, in save
    with get_handle(
  File "/usr/local/lib/python3.10/dist-packages/pandas/io/common.py", line 734, in get_handle
    check_par

In [76]:
%cd ..

/content/models


In [77]:
!python3 CSV_to_TFRecord_Converter.py --csv_input=images/train_labels.csv --labelmap=labelmap.txt --image_dir=images/train --output_path=train.tfrecord
!python3 CSV_to_TFRecord_Converter.py --csv_input=images/validation_labels.csv --labelmap=labelmap.txt --image_dir=images/validation --output_path=val.tfrecord

python3: can't open file '/content/models/CSV_to_TFRecord_Converter.py': [Errno 2] No such file or directory
python3: can't open file '/content/models/CSV_to_TFRecord_Converter.py': [Errno 2] No such file or directory


We will store the locations of the TFRecord and labelmap files as variables so we can reference them later in this Colab session

In [69]:
train_record_fname = '/content/train.tfrecord'
val_record_fname = '/content/val.tfrecord'
label_map_pbtxt_fname = '/content/labelmap.pbtxt'
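
Later cells read these files, so it can be useful to check that they exist before continuing. The helper below is a small sketch; `missing_files` is a hypothetical name and is not part of the Object Detection API.

```python
import os

def missing_files(paths):
    """Return the paths from the given list that do not point to an existing file."""
    return [p for p in paths if not os.path.isfile(p)]

# Paths as stored in the variables above
expected = ['/content/train.tfrecord', '/content/val.tfrecord', '/content/labelmap.pbtxt']
for path in missing_files(expected):
    print('Missing file:', path)
```

If anything is reported as missing, re-running the conversion cells from the folder that contains both the scripts and the `images` directory is a reasonable first step.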

#4. Set Up Training Configuration

We will use a pre-trained model from the [TensorFlow 2 Object Detection Model Zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md). Each model comes with a configuration file that points to file locations, sets training parameters (such as learning rate and total number of training steps), and more. We'll modify the configuration file for our custom training job.

1. The first section of code lists some models available in the TF2 Model Zoo and defines the filenames used later to download the model and config file. This makes it easy to manage which model you're using and to add other models to the list later.

2. Set the "chosen_model" variable to match the name of the model you'd like to train with. It's currently set to use the popular "ssd-mobilenet-v2-fpnlite-320" model. Run the next cell once the chosen model has been set.

In [70]:
# Change the chosen_model variable to deploy different models available in the TF2 object detection zoo
chosen_model = 'ssd-mobilenet-v2-fpnlite-320'

MODELS_CONFIG = {
    'ssd-mobilenet-v2': {
        'model_name': 'ssd_mobilenet_v2_320x320_coco17_tpu-8',
        'base_pipeline_file': 'ssd_mobilenet_v2_320x320_coco17_tpu-8.config',
        'pretrained_checkpoint': 'ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz',
    },
    'efficientdet-d0': {
        'model_name': 'efficientdet_d0_coco17_tpu-32',
        'base_pipeline_file': 'ssd_efficientdet_d0_512x512_coco17_tpu-8.config',
        'pretrained_checkpoint': 'efficientdet_d0_coco17_tpu-32.tar.gz',
    },
    'ssd-mobilenet-v2-fpnlite-320': {
        'model_name': 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8',
        'base_pipeline_file': 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config',
        'pretrained_checkpoint': 'ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz',
    },

}

model_name = MODELS_CONFIG[chosen_model]['model_name']
pretrained_checkpoint = MODELS_CONFIG[chosen_model]['pretrained_checkpoint']
base_pipeline_file = MODELS_CONFIG[chosen_model]['base_pipeline_file']

Download the pretrained model file and configuration file by clicking Play on the following section.

In [71]:
# Create "mymodel" folder for holding pre-trained weights and configuration files
%mkdir /content/models/mymodel/
%cd /content/models/mymodel/

# Download pre-trained model weights
import tarfile
download_tar = 'http://download.tensorflow.org/models/object_detection/tf2/20200711/' + pretrained_checkpoint
!wget {download_tar}
tar = tarfile.open(pretrained_checkpoint)
tar.extractall()
tar.close()

# Download training configuration file for model
download_config = 'https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/configs/tf2/' + base_pipeline_file
!wget {download_config}

/content/models/mymodel
--2023-05-28 04:24:23--  http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 142.251.8.128, 2404:6800:4008:c15::80
Connecting to download.tensorflow.org (download.tensorflow.org)|142.251.8.128|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20515344 (20M) [application/x-tar]
Saving to: ‘ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz’


2023-05-28 04:24:25 (17.1 MB/s) - ‘ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.tar.gz’ saved [20515344/20515344]

--2023-05-28 04:24:25--  https://raw.githubusercontent.com/tensorflow/models/master/research/object_detection/configs/tf2/ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8.config
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent

###4.1 Modify the configuration file with hyper parameters
We will train the pre-trained model with the following hyperparameters:
1. We will set **num_steps = 1000**
2. We will set **batch_size = 16**

We can learn more about training configuration and TensorFlow Object Detection API best practices in this [article](https://neptune.ai/blog/tensorflow-object-detection-api-best-practices-to-training-evaluation-deployment).

In [72]:
##Modify hyper parameter of the pre-trained model
num_steps = 1000
batch_size = 16
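
To get a feel for these two values, the following sketch estimates how many passes over the training set they correspond to. It assumes the 2382 training images reported by the split step; the variable names `images_seen` and `approx_epochs` are chosen here for illustration.

```python
num_steps = 1000
batch_size = 16
train_images = 2382  # number of images moved to the train folder in the split step

# Each step processes one batch, so this is the total number of images seen in training
images_seen = num_steps * batch_size
approx_epochs = images_seen / train_images

print('Images seen during training:', images_seen)                      # 16000
print('Approximate passes over the train set:', round(approx_epochs, 2))  # 6.72
```

Raising `num_steps` or `batch_size` increases the number of passes proportionally, at the cost of longer training time or more memory.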

Now we will set the file locations and get the number of classes for the config file.

In [73]:
pipeline_fname = '/content/models/mymodel/' + base_pipeline_file
fine_tune_checkpoint = '/content/models/mymodel/' + model_name + '/checkpoint/ckpt-0'

def get_num_classes(pbtxt_fname):
    from object_detection.utils import label_map_util
    label_map = label_map_util.load_labelmap(pbtxt_fname)
    categories = label_map_util.convert_label_map_to_categories(
        label_map, max_num_classes=90, use_display_name=True)
    category_index = label_map_util.create_category_index(categories)
    return len(category_index.keys())
num_classes = get_num_classes(label_map_pbtxt_fname)
print('Total classes:', num_classes)

NotFoundError: ignored

Next, we'll rewrite the configuration file to use the training parameters we just specified. The following snippet automatically replaces the required parameters in the downloaded .config file and saves it as our custom pipeline_file.config file.

In [None]:
# Create custom configuration file by writing the dataset, model checkpoint, and training parameters into the base pipeline file
import re

%cd /content/models/mymodel
print('writing custom configuration file')

with open(pipeline_fname) as f:
    s = f.read()
with open('pipeline_file.config', 'w') as f:
    
    # Set fine_tune_checkpoint path
    s = re.sub('fine_tune_checkpoint: ".*?"',
               'fine_tune_checkpoint: "{}"'.format(fine_tune_checkpoint), s)
    
    # Set tfrecord files for train and validation datasets
    s = re.sub(
        '(input_path: ".*?)(PATH_TO_BE_CONFIGURED/train)(.*?")', 'input_path: "{}"'.format(train_record_fname), s)
    s = re.sub(
        '(input_path: ".*?)(PATH_TO_BE_CONFIGURED/val)(.*?")', 'input_path: "{}"'.format(val_record_fname), s)

    # Set label_map_path
    s = re.sub(
        'label_map_path: ".*?"', 'label_map_path: "{}"'.format(label_map_pbtxt_fname), s)

    # Set batch_size
    s = re.sub('batch_size: [0-9]+',
               'batch_size: {}'.format(batch_size), s)

    # Set training steps, num_steps
    s = re.sub('num_steps: [0-9]+',
               'num_steps: {}'.format(num_steps), s)
    
    # Set number of classes num_classes
    s = re.sub('num_classes: [0-9]+',
               'num_classes: {}'.format(num_classes), s)

    # Change fine-tune checkpoint type from "classification" to "detection"
    s = re.sub(
        'fine_tune_checkpoint_type: "classification"', 'fine_tune_checkpoint_type: "{}"'.format('detection'), s)
    
    # If using ssd-mobilenet-v2, reduce learning rate (because it's too high in the default config file)
    if chosen_model == 'ssd-mobilenet-v2':
      s = re.sub('learning_rate_base: .8',
                 'learning_rate_base: .08', s)
      
      s = re.sub('warmup_learning_rate: 0.13333',
                 'warmup_learning_rate: .026666', s)
    
    # If using efficientdet-d0, use fixed_shape_resizer instead of keep_aspect_ratio_resizer (because it isn't supported by TFLite)
    if chosen_model == 'efficientdet-d0':
      s = re.sub('keep_aspect_ratio_resizer', 'fixed_shape_resizer', s)
      s = re.sub('pad_to_max_dimension: true', '', s)
      s = re.sub('min_dimension', 'height', s)
      s = re.sub('max_dimension', 'width', s)

    f.write(s)
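
To see how these substitutions behave without touching any files, the sketch below applies the same kind of `re.sub` patterns to a small in-memory fragment. The fragment and its values are made up for illustration (a real pipeline file is much longer), and the replacement paths simply reuse the locations defined earlier in this notebook.

```python
import re

# Made-up config fragment for illustration only
sample_config = '''
model { ssd { num_classes: 90 } }
train_config {
  batch_size: 128
  num_steps: 50000
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED"
  fine_tune_checkpoint_type: "classification"
}
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.txt"
  tf_record_input_reader { input_path: "PATH_TO_BE_CONFIGURED/train2017-?????-of-00256.tfrecord" }
}
'''

# (pattern, replacement) pairs mirroring some of the substitutions in the cell above
overrides = [
    (r'num_classes: [0-9]+', 'num_classes: 19'),
    (r'batch_size: [0-9]+', 'batch_size: 16'),
    (r'num_steps: [0-9]+', 'num_steps: 1000'),
    (r'fine_tune_checkpoint_type: "classification"', 'fine_tune_checkpoint_type: "detection"'),
    (r'label_map_path: ".*?"', 'label_map_path: "/content/labelmap.pbtxt"'),
    (r'(input_path: ".*?)(PATH_TO_BE_CONFIGURED/train)(.*?")', 'input_path: "/content/train.tfrecord"'),
]

custom_config = sample_config
for pattern, replacement in overrides:
    custom_config = re.sub(pattern, replacement, custom_config)

print(custom_config)
```

Printing the result makes it easy to confirm that each placeholder was replaced and that untouched fields, such as `fine_tune_checkpoint` here, keep their original values.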

###4.2. Displaying The Configuration Files Content

In [None]:
!cat /content/models/mymodel/pipeline_file.config

###4.3. Set Location of Config File and Model Output Directory as Variables

In [None]:
pipeline_file = '/content/models/mymodel/pipeline_file.config'
model_dir = '/content/training/'