<a href="https://colab.research.google.com/gist/dusskapark/c59eae6cae2b822bf7d133a2860c4c51/convert_rico_for_object_detection_based_on_clay.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prepare Training Data

[Rico](https://interactionmining.org/rico) is the largest open repository of mobile app designs to date, created to support five classes of data-driven applications: design search, UI layout generation, UI code generation, user interaction modeling, and user perception prediction. 

In this notebook, we download this dataset and rearrange it for TensorFlow Object Detection training, based on [CLAY](https://github.com/google-research-datasets/clay), the dataset used for training and evaluating screen layout denoising models.


We also recommend reading [my blog post on Medium](https://judepark-6960.medium.com/a-design-system-made-with-tensorflow-js-e8646733d6d3) or [this Colab notebook](https://colab.research.google.com/drive/1fjnVbDnHnGnILuvDTv7zAwUwIMntrC7p?usp=sharing) side by side.


In [1]:
# Cloning CLAY pipeline
!git clone https://github.com/google-research-datasets/clay.git

# Downloading data from RICO dataset
!curl -L "https://storage.googleapis.com/crowdstf-rico-uiuc-4540/rico_dataset_v0.1/unique_uis.tar.gz" > jpg.tar.gz; tar -zxvf jpg.tar.gz; rm jpg.tar.gz



Streaming output truncated to the last 5000 lines.
combined/60081.jpg
combined/41146.jpg
combined/10989.jpg
combined/47925.jpg
combined/22853.json
combined/18700.jpg
combined/52996.json
combined/60340.json
combined/41482.json
combined/10572.json
combined/1602.jpg
combined/9663.jpg
combined/34295.json
combined/19579.json
combined/30952.json
combined/3370.jpg
combined/12825.json
combined/67964.jpg
combined/70602.json
combined/25768.json
combined/8016.json
combined/30564.jpg
combined/66161.jpg
combined/25907.json
combined/45764.jpg
combined/50674.json
combined/2417.jpg
combined/53387.json
combined/9536.jpg
combined/14048.jpg
combined/1790.json
combined/62235.json
combined/38856.json
combined/38576.json
combined/15460.jpg
combined/55820.json
combined/35294.jpg
combined/33008.jpg
combined/26584.json
combined/45099.json
combined/61596.jpg
combined/25338.json
combined/48111.jpg
combined/48318.json
combined/62283.jpg
combined/40676.json
combined/4817.jpg
combined/18467.jpg
combined/29336.json
c

# Manipulate RICO dataset




## Scale images 

Most of RICO's screenshots are high-resolution images at 1440 × 2560 pixels. Using them directly consumes a lot of GPU time and memory in Google Colab's training environment.

So we'll downscale every screenshot to a 640 px high JPG. Later, we'll scale the bounding-box values in the annotation files by the same factor.
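
As a rough illustration of what rescaling the annotations involves, the sketch below maps a bounding box from the original 1440 × 2560 coordinate space to a 360 × 640 image (a factor of 4). The helper name `scale_box` and its `pad_left` parameter are hypothetical and not part of RICO or CLAY; `pad_left` only matters if the resized image is also padded horizontally.

```python
def scale_box(bounds, scale=4, pad_left=0):
    """Scale an [xmin, ymin, xmax, ymax] box down by `scale`.

    `pad_left` (hypothetical) shifts x coordinates right by the number of
    pixels added to the left edge of the resized image, if any.
    """
    xmin, ymin, xmax, ymax = bounds
    return [
        round(xmin / scale) + pad_left,
        round(ymin / scale),
        round(xmax / scale) + pad_left,
        round(ymax / scale),
    ]


# A full-screen box on a 1440 x 2560 screenshot
print(scale_box([0, 0, 1440, 2560]))                # [0, 0, 360, 640]
# The same box when 140 px of padding is added on the left
print(scale_box([0, 0, 1440, 2560], pad_left=140))  # [140, 0, 500, 640]
```

Downscaling by 4 in each dimension also cuts the pixel count per image by a factor of 16.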


In [2]:
cd ..

/Users/danil/Documents/github/clay


In [3]:
!mkdir image

!find data/image_orig/. -name "*.jpg" | wc -l
!find data/jsons_with_ui/. -name "*.json" | wc -l

   66261
   66261


In [1]:
from tqdm import tqdm

In [23]:
from PIL import Image, ImageOps
import os

raw_path = './data/image_orig/'  # source image path
data_path = './jpg/'  # resized image path

# Create the output directory if it does not exist yet
os.makedirs(data_path, exist_ok=True)

# List every JPG in the source image path
data_list = [name for name in os.listdir(raw_path) if name.endswith('.jpg')]
print(len(data_list))

# Resize each image and save it under the same file name
for name in tqdm(data_list):
    with Image.open(os.path.join(raw_path, name)) as im:
        # Downscale to 360 x 640 (a factor of 4 for 1440 x 2560 screenshots)
        im = im.resize((360, 640))
        # Pad 140 px on the left and right, giving a 640 x 640 canvas
        im = ImageOps.expand(im, border=(140, 0, 140, 0))
        im.save(os.path.join(data_path, name))


66261


100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 66261/66261 [38:37<00:00, 28.60it/s]


## Convert to XML

We'll extract only the necessary information, such as the bounding box, filename, and component name, from the JSON files and convert it to `xml` format.

In this process we use only the screens and labels selected by CLAY. This yields a model with much better performance than training on the RICO annotations directly. See the links below for details:

- CHI'22 Paper: https://t.co/vHXyTmJOsp
- CLAY dataset: https://t.co/EepAt2PMpM
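
To make the target format concrete, here is a minimal sketch of a single VOC-style `<object>` entry built with the standard library. The label and coordinates are made-up values, and `make_object` is a hypothetical helper rather than part of the conversion script.

```python
import xml.etree.ElementTree as ET


def make_object(name, xmin, ymin, xmax, ymax):
    """Build one VOC-style <object> element (hypothetical helper)."""
    obj = ET.Element("object")
    ET.SubElement(obj, "name").text = name
    ET.SubElement(obj, "difficult").text = "0"
    bndbox = ET.SubElement(obj, "bndbox")
    for tag, value in (("xmin", xmin), ("ymin", ymin),
                       ("xmax", xmax), ("ymax", ymax)):
        ET.SubElement(bndbox, tag).text = str(value)
    return obj


# Made-up example values
element = make_object("BUTTON", 25, 50, 75, 100)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Each labeled UI node becomes one such entry inside the `<annotation>` root of its screen.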

In [4]:
import os
import json
from xml.etree.ElementTree import Element, SubElement
import csv
import sys

sys.setrecursionlimit(10**6)


def beautify(elem, indent=0):
    """
    Converts the XML tree to an indented string.

    :param elem: xml element
    :param indent: indent level to prepend to each line
    """
    result0 = f"{'    ' * indent}<{elem.tag}>"
    # If the tag has a value, emit the value and close the tag on the same line
    if elem.text is not None:
        result0 += elem.text + f"</{elem.tag}>\n"
    # If the tag has no value, recurse into its child nodes
    else:
        result0 += "\n"
        for _child in elem:
            result0 += beautify(_child, indent + 1)
        result0 += f"{'    ' * indent}</{elem.tag}>\n"
    return result0


def labelinlist(screen_id, name):
    """
    Returns True if clay_csv lists the node `name` for this screen.
    """
    global resultdic

    for label in resultdic[screen_id]:
        if label[0] == name:
            return True
    return False


def replace_label(screen_id, node):
    """
    Returns the label name for the node, or False if the node is not listed.
    """
    global resultdic
    global labeldict
    for label in resultdic[screen_id]:
        if label[0] == node:
            label_map = int(label[1])
            return labeldict[label_map]
    return False


def recursive(screen_id, child, result_out):
    """
    Appends an <object> entry for `child` if CLAY labels it, then makes a
    recursive call for each of its children.
    """
    obj = Element("object")

    # Set difficult
    SubElement(obj, "difficult").text = '0'

    # Set bndbox, scaling the original coordinates down by a factor of 4
    bounds = child['bounds']
    bndbox = SubElement(obj, "bndbox")
    xmin, ymin, xmax, ymax = bounds[:4]

    SubElement(bndbox, "xmin").text = str(round(xmin/4))
    SubElement(bndbox, "ymin").text = str(round(ymin/4))
    SubElement(bndbox, "xmax").text = str(round(xmax/4))
    SubElement(bndbox, "ymax").text = str(round(ymax/4))

    name = child['pointer']
    if labelinlist(screen_id, name):
        # Set name to the CLAY label
        SubElement(obj, "name").text = replace_label(screen_id, name)
        # Convert the created object tag to a string and add it.
        result_out.append(beautify(obj))

    # If there are child nodes, make a recursive call for each of them.
    for ch in child.get('children', []):
        if ch is not None:
            recursive(screen_id, ch, result_out)
        else:
            print(screen_id)
            print(ch, sep="\n")
            

def json2xml(screen_id, infile, outfile):
    """
    Converts one RICO view-hierarchy JSON file to a VOC-style XML file.

    :param screen_id: screen ID used to look up the CLAY labels
    :param infile: path of the input JSON file
    :param outfile: path of the output XML file
    """

    result_out = []
    imgName = infile.replace("./combined/", "")
    imgName = imgName.replace("json", "jpg")

    # Read the file
    with open(infile, "r", encoding="UTF-8") as f:
        data = json.load(f)

    # Make a recursive call on child nodes.
    for child in data['activity']['root'].get('children', []):
        recursive(screen_id, child, result_out)

    # And save the result to the XML file.
    with open(outfile, "w", encoding="UTF-8") as f:
        f.write("<annotation><folder />")
        f.write("<filename>" + imgName + "</filename>\n"
                "<path>" + imgName + "</path>\n"
                "<source><database>RICO</database></source><size><width>360</width><height>640</height><depth>3</depth></size><segmented>0</segmented>")
        f.write("".join(result_out))
        f.write("</annotation>")


def readcsv(filename):
    """
    {screen_id:[[name0,id0]...],}
    """
    result = []
    with open(filename, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=',', quotechar='|')

        for row in reader:
            result.append(row)

            # print(row)
    del result[0]

    resultdic = {}
    for row in result:
        if row[0] in resultdic.keys():
            resultdic[row[0]].append([row[1], row[2]])
        else:
            resultdic[row[0]] = [[row[1], row[2]]]

    return resultdic


def readtxt(filename):
    """
    [{num}][{label}]
    """
    result = {}
    with open(filename, newline='') as f:
        while True:
            line = f.readline()
            if(not line):
                break
            num, label = line.split(':')
            num = int(num)
            label = label.strip()
            # print(f'[{num}][{label}]')
            result[num] = label
    return result


def main():
    """
    The main function
    """
    global resultdic
    global labeldict

    resultdic = readcsv("./clay/clay_labels.csv")
    labeldict = readtxt('./clay/label_map.txt')

    data_path = './xml/'
    if not os.path.exists(data_path):
        os.mkdir(data_path)

    files = os.listdir('./combined')

    for infile in files:
        # ./combined also holds the screenshots, so skip anything that is not JSON
        if not infile.endswith('.json'):
            continue
        screen_id = infile.replace('.json', '').strip()
        selected_screen = resultdic.get(screen_id)
        if selected_screen is None:
            # Screens without CLAY labels are skipped
            print(screen_id+" is not exist!")
        else:
            infile = f'./combined/{infile}'
            outfile = infile.replace("combined", "xml").replace("json", "xml")
            json2xml(screen_id, infile, outfile)


if __name__ == "__main__":
    main()


Streaming output truncated to the last 5000 lines.
67484 is not exist!
65660 is not exist!
29263 is not exist!
34113 is not exist!
63944 is not exist!
59758 is not exist!
46082 is not exist!
24452 is not exist!
23885 is not exist!
69415 is not exist!
45383 is not exist!
47868 is not exist!
45510 is not exist!
21914 is not exist!
754 is not exist!
30750 is not exist!
71659 is not exist!
69161 is not exist!
16159 is not exist!
8798 is not exist!
3128 is not exist!
22698 is not exist!
7018 is not exist!
10008 is not exist!
60165 is not exist!
45238 is not exist!
25864 is not exist!
60192 is not exist!
69985 is not exist!
32096 is not exist!
41570 is not exist!
11142 is not exist!
63164 is not exist!
41714 is not exist!
13853 is not exist!
64683 is not exist!
58411 is not exist!
51605 is not exist!
2376 is not exist!
28182 is not exist!
9270 is not exist!
16993 is not exist!
34653 is not exist!
26378 is not exist!
56744 is not exist!
44001 is not exist!
40759 is not exist!
16402 is not exis

## Generate Label Map

Next, we generate `label_map.pbtxt` from the label names that appear in the converted XML files. These names come from CLAY's label map text file.
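
As a rough sketch of the label map format, the helper below builds the text from a list of class names and assigns IDs in sorted order so that they are deterministic across runs. The sample names are made up for illustration, and `build_pbtxt` is a hypothetical helper, not part of CLAY or the TensorFlow Object Detection API.

```python
def build_pbtxt(names):
    """Return label map text with IDs 1..N assigned in sorted name order."""
    items = []
    for idx, name in enumerate(sorted(set(names)), start=1):
        items.append(f'item {{\n    name: "{name}"\n    id: {idx}\n}}\n')
    return "\n".join(items)


# Made-up sample of class names, including a duplicate
sample = ["TEXT", "BUTTON", "IMAGE", "BUTTON"]
pbtxt = build_pbtxt(sample)
print(pbtxt)
```

IDs start at 1 in this sketch, matching the numbering used in the generated file.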


In [5]:
import os
import xml.etree.ElementTree as ET

obj = []

# Collect the class name of every <object> in every XML annotation
for filename in os.listdir("./xml/"):
    tree = ET.parse(os.path.join("./xml/", filename))
    root = tree.getroot()
    obj.extend(x.findtext("name") for x in root.findall("object"))

obj_unique = list(set(obj))

# Write one item per class, with IDs starting at 1
pbtxt = ""
for i, name in enumerate(obj_unique, start=1):
    pbtxt += "item {\n    name: \"" + name + "\",\n    id: " + str(i) + "\n}\n\n"

with open("label_map.pbtxt", "w", encoding="utf-8") as f:
    f.write(pbtxt)

print(pbtxt)

item {
    name: "PAGER_INDICATOR",
    id: 1
}

item {
    name: "NAVIGATION_BAR",
    id: 2
}

item {
    name: "TEXT_INPUT",
    id: 3
}

item {
    name: "MAP",
    id: 4
}

item {
    name: "ADVERTISEMENT",
    id: 5
}

item {
    name: "LIST_ITEM",
    id: 6
}

item {
    name: "RADIO_BUTTON",
    id: 7
}

item {
    name: "DATE_PICKER",
    id: 8
}

item {
    name: "TOOLBAR",
    id: 9
}

item {
    name: "TEXT",
    id: 10
}

item {
    name: "CONTAINER",
    id: 11
}

item {
    name: "IMAGE",
    id: 12
}

item {
    name: "PICTOGRAM",
    id: 13
}

item {
    name: "BUTTON",
    id: 14
}

item {
    name: "SWITCH",
    id: 15
}

item {
    name: "SPINNER",
    id: 16
}

item {
    name: "CHECK_BOX",
    id: 17
}

item {
    name: "NUMBER_STEPPER",
    id: 18
}

item {
    name: "DRAWER",
    id: 19
}

item {
    name: "CARD_VIEW",
    id: 20
}

item {
    name: "PROGRESS_BAR",
    id: 21
}

item {
    name: "BACKGROUND",
    id: 22
}

item {
    name: "SLIDER",
    id: 23
}

## Partition the dataset

Next, we're going to split our dataset into the desired training and testing subsets. Typically the ratio is 9:1: 90% of the images are used for training and the remaining 10% are held out for testing, but you can choose whatever ratio suits your needs. Here we follow the train, dev, and test ID lists that ship with CLAY.

[Lyudmil Vladimirov](https://github.com/sglvladi) has published a great code example on [splitting the dataset](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#partition-the-dataset).
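
If you prefer a random split over predefined ID lists, a minimal sketch of a reproducible 9:1 partition could look like this. `split_ids`, the seed, and the ID range are illustrative assumptions, not part of CLAY or the tutorial linked above.

```python
import random


def split_ids(ids, train_ratio=0.9, seed=0):
    """Shuffle IDs reproducibly and split them into train and test lists."""
    shuffled = sorted(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Made-up screen IDs
train_ids, test_ids = split_ids(range(100))
print(len(train_ids), len(test_ids))  # 90 10
```

Sorting before shuffling with a fixed seed keeps the partition stable regardless of the order in which the IDs were listed.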

In [6]:
import os
import shutil


def readtxt(filename):
    result = []
    with open(filename, newline='') as f:
        while True:
            screen_id = f.readline()
            if(not screen_id):
                break
            num = int(screen_id)
            result.append(num)

    return result


def split(folder):
    data_path = f'./{folder}/'
    if not os.path.exists(data_path):
        os.mkdir(data_path)
    # Screen IDs that CLAY assigns to this subset
    screen_ids = readtxt(f'./clay/split_{folder}_id.txt')

    for screen_id in screen_ids:
        outfile = f'{data_path}{screen_id}.xml'
        outimgfile = f'{data_path}{screen_id}.jpg'

        inimgfile = f'./jpg/{screen_id}.jpg'
        infile = f'./xml/{screen_id}.xml'

        print(infile, outfile)
        shutil.copyfile(infile, outfile)
        shutil.copyfile(inimgfile, outimgfile)


def main():
    """
    The main function
    """
    split('dev')
    split('test')
    split('train')


if __name__ == "__main__":
    main()


Streaming output truncated to the last 5000 lines.
./xml/67719.xml ./train/67719.xml
./xml/67720.xml ./train/67720.xml
./xml/67721.xml ./train/67721.xml
./xml/67722.xml ./train/67722.xml
./xml/67723.xml ./train/67723.xml
./xml/67724.xml ./train/67724.xml
./xml/67725.xml ./train/67725.xml
./xml/67726.xml ./train/67726.xml
./xml/67727.xml ./train/67727.xml
./xml/67728.xml ./train/67728.xml
./xml/67729.xml ./train/67729.xml
./xml/67731.xml ./train/67731.xml
./xml/67732.xml ./train/67732.xml
./xml/67734.xml ./train/67734.xml
./xml/67735.xml ./train/67735.xml
./xml/67736.xml ./train/67736.xml
./xml/67737.xml ./train/67737.xml
./xml/67739.xml ./train/67739.xml
./xml/67757.xml ./train/67757.xml
./xml/67758.xml ./train/67758.xml
./xml/67759.xml ./train/67759.xml
./xml/67767.xml ./train/67767.xml
./xml/67768.xml ./train/67768.xml
./xml/6778.xml ./train/6778.xml
./xml/67781.xml ./train/67781.xml
./xml/67782.xml ./train/67782.xml
./xml/67783.xml ./train/67783.xml
./xml/67784.xml ./train/67784.xml


In [7]:

!find train/. -name "*.jpg" | wc -l
!find test/. -name "*.jpg" | wc -l
!find dev/. -name "*.jpg" | wc -l

#%rm -rf xml combined image

44629
8719
6207


## Mount Google Drive

Mount Google Drive so that the generated files can be moved there later.

In [8]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Convert XML to TFrecord

It is time to convert the VOC XML files into TFRecord files using [Lyudmil Vladimirov](https://github.com/sglvladi)'s [code example](https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#create-tensorflow-records).

However, his `generate_tfrecord.py` script presumes the user is working with only a single class, so a slight modification is required. I forked his script and use it as below:
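
Because each screen contains objects of many classes, the conversion has to carry a class name for every box. The sketch below shows that idea on a made-up annotation using only the standard library. `voc_to_rows` is a hypothetical helper, not the API of `generate_tfrecord.py`, and normalizing coordinates by the image size is an assumption based on the usual object detection convention rather than anything confirmed about that script.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the same VOC-style layout
SAMPLE_XML = """<annotation>
<filename>123.jpg</filename>
<size><width>360</width><height>640</height><depth>3</depth></size>
<object><difficult>0</difficult><bndbox><xmin>25</xmin><ymin>50</ymin><xmax>75</xmax><ymax>100</ymax></bndbox><name>BUTTON</name></object>
<object><difficult>0</difficult><bndbox><xmin>0</xmin><ymin>0</ymin><xmax>360</xmax><ymax>40</ymax></bndbox><name>TOOLBAR</name></object>
</annotation>"""


def voc_to_rows(xml_text):
    """Flatten one annotation into (filename, class, normalized box) rows."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((
            filename,
            obj.findtext("name"),
            int(box.findtext("xmin")) / width,
            int(box.findtext("ymin")) / height,
            int(box.findtext("xmax")) / width,
            int(box.findtext("ymax")) / height,
        ))
    return rows


rows = voc_to_rows(SAMPLE_XML)
for row in rows:
    print(row)
```

Each row keeps its own class name, so a label map lookup can turn it into a numeric class ID per box.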

In [23]:
%cd /content/

# Download utils
!wget "https://raw.githubusercontent.com/dusskapark/TensorFlowObjectDetectionTutorial/master/docs/source/scripts/generate_tfrecord.py"

# Generate TFrecords 
!python generate_tfrecord.py -x train -l label_map.pbtxt -o train.record
!python generate_tfrecord.py -x test -l label_map.pbtxt -o test.record
!python generate_tfrecord.py -x dev -l label_map.pbtxt -o dev.record


--2022-02-04 13:21:21--  https://raw.githubusercontent.com/dusskapark/TensorFlowObjectDetectionTutorial/master/docs/source/scripts/generate_tfrecord.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6437 (6.3K) [text/plain]
Saving to: ‘generate_tfrecord.py’


2022-02-04 13:21:22 (69.8 MB/s) - ‘generate_tfrecord.py’ saved [6437/6437]

2022-02-04 13:21:25.262013: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Successfully created the TFRecord file: train.record
2022-02-04 13:24:16.102638: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Successfully created the TFRecord file: test.record
20

# Download outputs

> Congrats!

Hope you enjoyed this!


In [None]:
from google.colab import files
files.download("/content/train.record")
files.download("/content/test.record")
files.download("/content/dev.record")
files.download("/content/label_map.pbtxt")


In [26]:
!mv dev.record /content/drive/MyDrive
!mv test.record /content/drive/MyDrive
!mv train.record /content/drive/MyDrive
!mv label_map.pbtxt /content/drive/MyDrive

mv: cannot stat 'dev.record': No such file or directory
mv: cannot stat 'test.record': No such file or directory
mv: cannot stat 'train.record': No such file or directory
