## Synthetic data generation for an image dataset competition

### Author/ML Engineer: Leon Hamnett - [LinkedIn](https://www.linkedin.com/in/leon-hamnett/)


### Introduction:

As part of a team of machine learning engineers, I took part in a [data-centric AI competition](https://https-deeplearning-ai.github.io/data-centric-comp/) organised by Andrew Ng (a well-known machine learning teacher and researcher). The aim of the competition was to improve the quality of the dataset rather than the machine learning model itself.

During this contest we created several image datasets by cleaning and relabelling the existing dataset, creating synthetic data, and applying a range of image transforms and augmentations.

This notebook was used to generate synthetic images similar to those already in our dataset. The hope was that adding images in different styles would help the trained model classify images more accurately across the ten image classes.

### Setting up the folders:

In [1]:
import os
from os import getcwd
import shutil
import numpy as np

In [2]:
work_dir = getcwd()
print(work_dir)

/home/leon/Documents/datscience practise/datascentric_comp/image_gen


In [3]:
numerals_lower = ['i' , 'ii' ,'iii' , 'iv', 'v', 'vi', 'vii', 'viii', 'ix', 'x']
numerals_upper = ['I', 'II', 'III', 'IV' , 'V' ,'VI' ,'VII','VIII', 'IX' , 'X']
#delete each folder if it already exists, then make a new empty folder
for x in numerals_lower:
    temp_path = os.path.join(work_dir,x)
    shutil.rmtree(temp_path, ignore_errors=True) #no error if the folder does not exist yet
    os.mkdir(temp_path)

print('folders deleted')

folders deleted


In [4]:
font_dir = '/usr/share/fonts/TTF/'
hand_dir = '/home/leon/Documents/datscience practise/datascentric_comp/fonts_handwritten/'
new_file_paths = []
files = os.listdir(os.path.join(work_dir,'ii')) #list the contents of the 'ii' class folder
files_ordered = sorted(files)

### Generating synthetic images:

To generate the synthetic images we use the ImageMagick CLI to render the Roman numerals as text. We use a number of different fonts so that the generated dataset is varied and robust.
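The `convert` call used in the cells below can also be written as a list of arguments, which avoids manual shell quoting when a font path contains spaces. This is a minimal sketch rather than the notebook's own code: the helper name `build_convert_args` and the example font and output paths are illustrative assumptions, while the flags are the same ones used in the original commands.

```python
def build_convert_args(font, text, out_path, pointsize=72, size='200x200', gravity='Center'):
    """Return an ImageMagick `convert` call as a list of arguments (hypothetical helper)."""
    return [
        'convert',
        '-background', 'white',        # white canvas
        '-fill', 'black',              # black text
        '-font', font,                 # font name or path to a font file
        '-pointsize', str(pointsize),  # text size
        '-size', size,                 # canvas size in pixels, WIDTHxHEIGHT
        '-gravity', gravity,           # position of the text on the canvas
        'caption:' + text,             # the text to render
        out_path,                      # output image file
    ]

# Example paths are placeholders, not files from the original project
args = build_convert_args('/usr/share/fonts/TTF/Pacifico.ttf', 'vii', 'vii/example.png')
print(' '.join(args))
```

A list like this could be handed to `subprocess.run(args)` instead of building a single string for `os.system`.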

In [5]:
#set the filepaths for all of the fonts we wish to use
fonts = ['C059-Italic',
         '/usr/share/fonts/TTF/Baline_Script.ttf',
         '/usr/share/fonts/TTF/Calligraffitti-Regular.ttf',
         '/usr/share/fonts/TTF/chancur.ttf', 
         '/usr/share/fonts/TTF/Dynalight-Regular.ttf',
         '/usr/share/fonts/TTF/MarckScript-Regular.ttf', 
        '/usr/share/fonts/TTF/marinto.ttf',
        '/usr/share/fonts/TTF/Pacifico.ttf',
        '/usr/share/fonts/TTF/Rochester-Regular.ttf',
        '/usr/share/fonts/TTF/Tangerine-Regular.ttf',
        '/usr/share/fonts/TTF/This_July.ttf',
        '/usr/share/fonts/TTF/Vervelle-Script.ttf',
        '/usr/share/fonts/TTF/Yellowtail-Regular.ttf',
         #annes fonts
         '/usr/share/fonts/TTF/richard_hamilton_italic.ttf', 
         '/usr/share/fonts/TTF/winkle_regular.ttf', 
         '/usr/share/fonts/TTF/Moon_Flower.ttf', 
         '/usr/share/fonts/TTF/christian_heedlay.ttf', 
         '/usr/share/fonts/TTF/popsicle.ttf', 
         #'/usr/share/fonts/TTF/lovtony_script.ttf', #added back in
         '/usr/share/fonts/TTF/chocolate.ttf', 
         '/usr/share/fonts/TTF/KGMissKindergarten.ttf', 
         #'/usr/share/fonts/TTF/anyer_beach.ttf', #added back in
         #'/usr/share/fonts/TTF/harmony.ttf', #added back in
         '/usr/share/fonts/TTF/richard_hamilton.ttf', 
         '/usr/share/fonts/TTF/My_Unprofessional_Handwriting.ttf', 
         '/usr/share/fonts/TTF/christian_heedlay_italic.ttf', 
         '/usr/share/fonts/TTF/mr_right.ttf', 
         '/usr/share/fonts/TTF/Please_write_me_a_song.ttf',
         #serif
         '/usr/share/fonts/liberation/LiberationSerif-Italic.ttf',
         '/usr/share/fonts/gsfonts/Z003-MediumItalic.otf',
         '/usr/share/fonts/TTF/ROMANUS.otf',
         '/usr/share/fonts/TTF/Roman_SD.ttf',
         '/usr/share/fonts/TTF/Romanicum_Italic.ttf',
         '/usr/share/fonts/TTF/Erie_Roman.ttf',
         '/usr/share/fonts/TTF/spqri.ttf',
         '/usr/share/fonts/TTF/spqr.ttf',
         '/usr/share/fonts/TTF/ROSART__.ttf',
         '/usr/share/fonts/TTF/EMPORO.TTF',
         '/usr/share/fonts/TTF/achilles3superital.ttf',
         '/usr/share/fonts/TTF/achilles3left.ttf'
        ]
fonts_lower = ['C059-Italic',
         '/usr/share/fonts/TTF/Calligraffitti-Regular.ttf',
         '/usr/share/fonts/TTF/MarckScript-Regular.ttf', #extension was misspelled as .tff (previously marked not working)
        '/usr/share/fonts/TTF/marinto.ttf',
        '/usr/share/fonts/TTF/Pacifico.ttf',
        '/usr/share/fonts/TTF/Rochester-Regular.ttf',
        '/usr/share/fonts/TTF/Tangerine-Regular.ttf',
        '/usr/share/fonts/TTF/This_July.ttf',
        '/usr/share/fonts/TTF/Vervelle-Script.ttf',
        '/usr/share/fonts/TTF/Yellowtail-Regular.ttf',
         #annes fonts
         '/usr/share/fonts/TTF/richard_hamilton_italic.ttf', 
         '/usr/share/fonts/TTF/winkle_regular.ttf', 
         '/usr/share/fonts/TTF/Moon_Flower.ttf', 
         '/usr/share/fonts/TTF/christian_heedlay.ttf', 
         '/usr/share/fonts/TTF/popsicle.ttf', 
         '/usr/share/fonts/TTF/lovtony_script.ttf', 
         '/usr/share/fonts/TTF/chocolate.ttf', 
         '/usr/share/fonts/TTF/KGMissKindergarten.ttf', 
         '/usr/share/fonts/TTF/anyer_beach.ttf', 
         '/usr/share/fonts/TTF/harmony.ttf', 
         '/usr/share/fonts/TTF/richard_hamilton.ttf', 
         '/usr/share/fonts/TTF/My_Unprofessional_Handwriting.ttf', 
         '/usr/share/fonts/TTF/christian_heedlay_italic.ttf', 
         '/usr/share/fonts/TTF/mr_right.ttf', 
         '/usr/share/fonts/TTF/Please_write_me_a_song.ttf']
               
fonts_upper = ['C059-Italic', #font name rather than a file path, as in the lists above
'/usr/share/fonts/TTF/Calligraffitti-Regular.ttf',
'/usr/share/fonts/TTF/KGMissKindergarten.ttf',
'/usr/share/fonts/TTF/Moon_Flower.ttf',
'/usr/share/fonts/TTF/My_Unprofessional_Handwriting.ttf',
'/usr/share/fonts/TTF/Pacifico.ttf',
'/usr/share/fonts/TTF/Please_write_me_a_song.ttf',
'/usr/share/fonts/TTF/Yellowtail-Regular.ttf',
'/usr/share/fonts/TTF/anyer_beach.ttf',
'/usr/share/fonts/TTF/chocolate.ttf',
'/usr/share/fonts/TTF/christian_heedlay.ttf',
'/usr/share/fonts/TTF/christian_heedlay_italic.ttf',
'/usr/share/fonts/TTF/harmony.ttf',
'/usr/share/fonts/TTF/lovtony_script.ttf',
'/usr/share/fonts/TTF/marinto.ttf',
'/usr/share/fonts/TTF/mr_right.ttf',
'/usr/share/fonts/TTF/popsicle.ttf',
'/usr/share/fonts/TTF/richard_hamilton.ttf',
'/usr/share/fonts/TTF/richard_hamilton_italic.ttf',
'/usr/share/fonts/TTF/winkle_regular.ttf']


len(fonts) * (len(numerals_upper)+len(numerals_lower)) #number of font/numeral combinations

740
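With this many hand-typed font paths, a single misspelled file name or extension is easy to miss. One way to catch that before generating images is to check which paths exist on disk. The sketch below is an assumption about how such a check could look; `split_missing_fonts` is a hypothetical helper, and the demo uses a temporary file in place of a real font.

```python
import os
import tempfile

def split_missing_fonts(font_list):
    """Split path-like entries into (found, missing) lists.

    Entries without a path separator (e.g. 'C059-Italic') are treated as
    font names that ImageMagick resolves itself, so they are skipped.
    """
    found, missing = [], []
    for font in font_list:
        if os.sep not in font:
            continue  # a font name, not a file path
        if os.path.isfile(font):
            found.append(font)
        else:
            missing.append(font)
    return found, missing

# Demo: a temporary file stands in for a real font file
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, 'Example.ttf')
    open(real, 'w').close()
    typo = os.path.join(tmp, 'Example.tff')  # misspelled extension, never created
    found, missing = split_missing_fonts(['C059-Italic', real, typo])
    print('missing:', [os.path.basename(m) for m in missing])
```

In the notebook this could be called as `split_missing_fonts(fonts)` before the generation loops.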

### For separate lower/upper fonts:

In [6]:
#loop through numerals lower
for numeral in numerals_lower: #loop through numerals
    for elem in range(len(fonts_lower)): #loop through available fonts
        for num in range(1,11): #create 10 images for each loop
            temp_path = os.path.join(numeral,str(fonts_lower[elem].split('/')[-1] +'_lower'+str(num))) #create file path/image name
            temp_filename = temp_path + '.png'
            #create the string command to enter into the terminal shell
            term_str = 'convert -background white  -fill black -font {} -pointsize 72  -size 200x200  -gravity Center caption:{} {}'.format('"'+fonts_lower[elem]+'"',"'"+numeral+"'",temp_filename) 
            os.system(term_str) #run command in terminal to create the image
    print('numeral {} finished'.format(numeral))

#loop through numerals upper
for numeral in range(len(numerals_upper)):
#     print('numeral :',numeral)
    for elem in range(len(fonts_upper)):
        for num in range(1,2):
            temp_path = os.path.join(numerals_lower[numeral],str(fonts_upper[elem].split('/')[-1]+'_upper'+str(num)))
            temp_filename = temp_path + '.png'
            term_str = 'convert -background white  -fill black -font {} -pointsize 72  -size 200x200  -gravity Center caption:{} {}'.format('"'+fonts_upper[elem]+'"',"'"+numerals_upper[numeral]+"'",temp_filename)
            os.system(term_str)
    print('numeral {} finished'.format(numerals_upper[numeral]))
     
        

numeral i finished
numeral ii finished
numeral iii finished
numeral iv finished
numeral v finished
numeral vi finished
numeral vii finished
numeral viii finished
numeral ix finished
numeral x finished
numeral I finished
numeral II finished
numeral III finished
numeral IV finished
numeral V finished
numeral VI finished
numeral VII finished
numeral VIII finished
numeral IX finished
numeral X finished
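The loops above build each image name by appending a suffix to the font's file name, so the font extension ends up in the middle of the name (for example `Pacifico.ttf_lower1.png`). If cleaner names are wanted, the extension can be dropped first. This is only a sketch, and `image_name` is a hypothetical helper that is not part of the original notebook.

```python
import os

def image_name(font, case, index):
    """Build '<font stem>_<case><index>.png' from a font name or path."""
    stem = os.path.splitext(os.path.basename(font))[0]  # drop directory and extension
    return '{}_{}{}.png'.format(stem, case, index)

print(image_name('/usr/share/fonts/TTF/Pacifico.ttf', 'lower', 1))
print(os.path.join('iv', image_name('C059-Italic', 'upper', 3)))
```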


### For all fonts both upper and lower:

In [7]:
#set gravity list to set text position within generated image
gravity_list = ['NorthWest', 'North', 'NorthEast', 'West', 'Center', 'East', 'SouthWest', 'South', 'SouthEast']

#loop through numerals lower
for numeral in numerals_lower: #loop through numerals
    for elem in range(len(fonts)): #loop through available fonts
        for num in range(1,2): #currently 1 image per font; change back to range(1,11) for 10 images
            temp_path = os.path.join(numeral,str(fonts[elem].split('/')[-1] +'_lower'+str(num))) #create file path/image name
            temp_filename = temp_path + '.png'
            #create the command to enter into the terminal shell
            s_font = '"'+fonts[elem]+'"'
            s_gravity = 'Center' #np.random.choice(gravity_list)
            s_num = "'"+numeral+"'"
            term_str = 'convert -background white  -fill black -font {} -pointsize 28  -size 64x64  -gravity {} caption:{} {}'.format(s_font,s_gravity,s_num,temp_filename) 
            os.system(term_str) #run command in terminal to create the image
    print('numeral {} finished'.format(numeral))

#loop through numerals upper
for numeral in range(len(numerals_upper)):
#     print('numeral :',numeral)
    for elem in range(len(fonts)):
        for num in range(1,11):
            temp_path = os.path.join(numerals_lower[numeral],str(fonts[elem].split('/')[-1]+'_upper'+str(num)))
            temp_filename = temp_path + '.png'
            #create strings
            s_font = '"'+fonts[elem]+'"'
            s_gravity = 'Center' #np.random.choice(gravity_list)
            s_num = "'"+numerals_upper[numeral]+"'"
            term_str = 'convert -background white  -fill black -font {} -pointsize 28  -size 64x64  -gravity {} caption:{} {}'.format(s_font,s_gravity,s_num,temp_filename)
            os.system(term_str)
    print('numeral {} finished'.format(numerals_upper[numeral]))

numeral i finished
numeral ii finished
numeral iii finished
numeral iv finished
numeral v finished
numeral vi finished
numeral vii finished
numeral viii finished
numeral ix finished
numeral x finished
numeral I finished
numeral II finished
numeral III finished
numeral IV finished
numeral V finished
numeral VI finished
numeral VII finished
numeral VIII finished
numeral IX finished
numeral X finished
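With the gravity fixed to `Center`, every repeat of the inner loop runs the same command, so the repeated images for a given font and numeral are identical apart from their file names. The commented-out `np.random.choice(gravity_list)` suggests that random text positions were being considered. A minimal sketch of that idea follows, using a seeded generator so that a run can be repeated. The `sample_variants` helper and the candidate point sizes are assumptions for illustration only.

```python
import random

gravity_list = ['NorthWest', 'North', 'NorthEast', 'West', 'Center',
                'East', 'SouthWest', 'South', 'SouthEast']

def sample_variants(n, seed=0, pointsizes=(24, 28, 32)):
    """Return n (gravity, pointsize) pairs drawn with a seeded RNG."""
    rng = random.Random(seed)  # seeded so the same variants can be regenerated
    return [(rng.choice(gravity_list), rng.choice(pointsizes)) for _ in range(n)]

# Each pair could be substituted into the -gravity and -pointsize options
for gravity, pointsize in sample_variants(3):
    print(gravity, pointsize)
```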


In [8]:
#generate individual terminal commands for testing
test_coms = []
for font_type in fonts:
    term_str = '!convert -background white  -fill black -font {} -pointsize 16  -size 32x32  -gravity Center caption:{} {}'.format('"'+font_type+'"',"'"+"VII"+"'",'testing.png')
    test_coms.append(term_str) #keep each command so it can be reused later
    print(term_str)

!convert -background white  -fill black -font "C059-Italic" -pointsize 16  -size 32x32  -gravity Center caption:'VII' testing.png
!convert -background white  -fill black -font "/usr/share/fonts/TTF/Baline_Script.ttf" -pointsize 16  -size 32x32  -gravity Center caption:'VII' testing.png
!convert -background white  -fill black -font "/usr/share/fonts/TTF/Calligraffitti-Regular.ttf" -pointsize 16  -size 32x32  -gravity Center caption:'VII' testing.png
!convert -background white  -fill black -font "/usr/share/fonts/TTF/chancur.ttf" -pointsize 16  -size 32x32  -gravity Center caption:'VII' testing.png
!convert -background white  -fill black -font "/usr/share/fonts/TTF/Dynalight-Regular.ttf" -pointsize 16  -size 32x32  -gravity Center caption:'VII' testing.png
!convert -background white  -fill black -font "/usr/share/fonts/TTF/MarckScript-Regular.tff" -pointsize 16  -size 32x32  -gravity Center caption:'VII' testing.png
!convert -background white  -fill black -font "/usr/share/fonts/TTF/mari

In [9]:
#test specific command
!convert -background white  -fill black -font "/usr/share/fonts/TTF/ROMANUS.otf" -pointsize 32  -size 64x64  -gravity West caption:'IV' testing.png
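The generation loops call `os.system` and ignore its return value, so a command that fails or prints a warning can pass unnoticed. A possible alternative is `subprocess.run`, which exposes the exit code and anything written to stderr. The sketch below is an assumption about how this could be wrapped. `run_command` is a hypothetical helper, and the demo uses the Python interpreter as a stand-in for `convert` so that it runs without ImageMagick installed.

```python
import subprocess
import sys

def run_command(args):
    """Run a command given as an argument list; return (succeeded, stderr text)."""
    result = subprocess.run(args, capture_output=True, text=True)
    return result.returncode == 0, result.stderr.strip()

# Demo: a child process that exits with an error message on stderr
ok, err = run_command([sys.executable, '-c', 'import sys; sys.exit("font not found")'])
print(ok, err)
```

Collecting the `(ok, err)` pairs per font would give a quick list of the fonts that need attention.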