# PyPRT - Dataset Collection

This notebook shows one way to collect a dataset from CGA reports. Models are generated repeatedly from a set of initial shapes while an input attribute is varied, and the reported values are gathered after each generation. The collected dataset is then cleaned with some simple processing in pandas.

In [1]:
import sys
import os

import pyprt
from pyprt.pyprt_utils import visualize_prt_results

import pandas as pd

In [2]:
CS_FOLDER = os.getcwd()


def asset_file(filename):
    return os.path.join(CS_FOLDER, 'data', filename)

#### PRT Initialization

In [3]:
print("\nInitializing PRT.")
pyprt.initialize_prt()

if not pyprt.is_prt_initialized():
    raise Exception("PRT is not initialized")


Initializing PRT.
[PRT] [2020-02-20 11:53:14] [error] failed to load library: error while loading library 'c:\users\cami9495\documents\esri-cityengine-sdk-master\examples\py4prt\pyprt\pyprt\lib\com.esri.prt.unreal.dll': The specified module could not be found.

[PRT] [2020-02-20 11:53:14] [error] failed to load library: error while loading library 'c:\users\cami9495\documents\esri-cityengine-sdk-master\examples\py4prt\pyprt\pyprt\lib\DatasmithSDK.dll': The specified module could not be found.



In [4]:
v = [0, 0, 0,  0, 0, 20,  10, 0, 10,  10, 0, 0]
initial_shape1 = pyprt.InitialShape(v)

rpk = asset_file("extrusion_rule.rpk")
attrs = {'ruleFile': 'bin/extrusion_rule.cgb',
         'startRule': 'Default$Footprint'}

mod = pyprt.ModelGenerator([initial_shape1])
generated_model = mod.generate_model(
    [attrs], rpk, 'com.esri.pyprt.PyEncoder', {})

visualize_prt_results(generated_model)


Number of generated geometries (= nber of initial shapes):
1

Initial Shape Index: 0

Size of the model vertices vector: 24
Number of model vertices: 8
Size of the model faces vector: 6

Report of the generated model:
{'Bool value_sum': True, 'Bool value_avg': True, 'Bool value_min': True, 'Bool value_max': True, 'Bool value_n': 1.0, 'Building Height.0_n': 1.0, 'Id_n': 1.0, 'Max Height.0_n': 1.0, 'Min Height.0_n': 1.0, 'Parcel Area.0_n': 1.0, 'Text_n': 1.0, 'Value_n': 1.0, 'Building Height.0_sum': 11.216289589885635, 'Building Height.0_avg': 11.216289589885635, 'Id_sum': 0.0, 'Id_avg': 0.0, 'Max Height.0_sum': 30.0, 'Max Height.0_avg': 30.0, 'Min Height.0_sum': 10.0, 'Min Height.0_avg': 10.0, 'Parcel Area.0_sum': 149.99999304605885, 'Parcel Area.0_avg': 149.99999304605885, 'Value_sum': 1.0, 'Value_avg': 1.0, 'Building Height.0_min': 11.216289589885635, 'Building Height.0_max': 11.216289589885635, 'Id_min': 0.0, 'Id_max': 0.0, 'Max Height.0_min': 30.0, 'Max Height.0_max': 30.0, 'Min He

#### Using the RPK to gather report values

In [5]:
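def get_sum_report(model):
    """Return the entries of a model's CGA report whose key contains '_sum'."""
    sum_rep = {}
    all_rep = model.get_report()
    for key in all_rep:
        # Keep the summed values only; skip the _n, _avg, _min and _max entries.
        if "_sum" in key:
            sum_rep[key] = all_rep[key]
    return sum_rep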
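
To make the effect of this filter concrete, here is a small sketch that applies the same substring test to a plain dictionary. The dictionary is an excerpt of the report printed earlier for the first initial shape; the variable names `report` and `sum_only` are introduced here for illustration only.

```python
# Excerpt of the report printed earlier for the first initial shape.
report = {
    'Building Height.0_n': 1.0,
    'Building Height.0_sum': 11.216289589885635,
    'Building Height.0_avg': 11.216289589885635,
    'Building Height.0_min': 11.216289589885635,
    'Parcel Area.0_sum': 149.99999304605885,
    'Parcel Area.0_avg': 149.99999304605885,
}

# Same substring test as get_sum_report, written as a dict comprehension.
sum_only = {key: value for key, value in report.items() if "_sum" in key}

print(sum_only)
```

Note that the test is a substring match, so a report key that contains `_sum` anywhere in its name would also be kept; `key.endswith("_sum")` would be a stricter alternative.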

In [6]:
initial_shape2 = pyprt.InitialShape(
    [0, 0, 0,  0, 0, 10,  10, 0, 10,  10, 0, 0])
initial_shape3 = pyprt.InitialShape(
    [0, 0, 0,  0, 0, 30,  10, 0, 10,  10, 0, 0])
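
As a sanity check on these coordinate lists, the footprint area of each initial shape can be computed by hand. The sketch below assumes that the flat list is read as consecutive x, y, z triples and that the polygon lies in the x-z plane (y is 0 for every vertex here); `footprint_area_xz` is a helper introduced for illustration and is not part of PyPRT.

```python
def footprint_area_xz(flat_coords):
    """Shoelace area of a polygon given as a flat [x0, y0, z0, x1, y1, z1, ...] list.

    Assumes the polygon is planar in x-z, so the y components are ignored.
    """
    points = [(flat_coords[i], flat_coords[i + 2])
              for i in range(0, len(flat_coords), 3)]
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping around to close the polygon.
    for (x0, z0), (x1, z1) in zip(points, points[1:] + points[:1]):
        twice_area += x0 * z1 - x1 * z0
    return abs(twice_area) / 2.0


shape_coords = {
    "initial_shape1": [0, 0, 0,  0, 0, 20,  10, 0, 10,  10, 0, 0],
    "initial_shape2": [0, 0, 0,  0, 0, 10,  10, 0, 10,  10, 0, 0],
    "initial_shape3": [0, 0, 0,  0, 0, 30,  10, 0, 10,  10, 0, 0],
}

areas = {name: footprint_area_xz(coords) for name, coords in shape_coords.items()}
print(areas)
```

Under these assumptions the three areas are 150, 100 and 200, which is what the `Parcel Area.0_sum` values reported by the rule should be close to.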

In [7]:
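reports = []
model_to_generate = pyprt.ModelGenerator(
    [initial_shape1, initial_shape2, initial_shape3])

# First call: provides the rule package, the encoder and the encoder options.
# 'emitGeometry' is set to False because only the reports are needed here.
model_test = model_to_generate.generate_model(
    [attrs], rpk, 'com.esri.pyprt.PyEncoder', {'emitGeometry': False})

# Sweep minBuildingHeight from 0 to 9. These calls pass only the attributes
# and rely on the rule package and encoder given in the first call.
for val in range(0, 10):
    attrs['minBuildingHeight'] = float(val)
    models = model_to_generate.generate_model([attrs])

    # Keep the '_sum' report entries of every generated model.
    for model in models:
        if model:
            reports.append(get_sum_report(model))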
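
The loop appends one report per initial shape for each attribute value, so the position of a report in `reports` encodes where it came from. The following sketch recovers that origin from the row position. It assumes that no model was skipped by the `if model:` check and that the models come back in the order of the initial shapes; `row_origin` and `N_SHAPES` are names introduced here for illustration.

```python
N_SHAPES = 3  # initial_shape1, initial_shape2, initial_shape3


def row_origin(row_index, n_shapes=N_SHAPES):
    """Map a position in `reports` to (minBuildingHeight value, initial shape index).

    Assumes exactly n_shapes reports were appended per attribute value.
    """
    sweep_value, shape_index = divmod(row_index, n_shapes)
    return float(sweep_value), shape_index


for row in (0, 4, 9):
    print(row, row_origin(row))
```

For example, row 4 maps to an attribute value of 1.0 and shape index 1, and row 9 maps to an attribute value of 3.0 and shape index 0.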

#### Convert the reports to a pandas DataFrame so the dataset can be processed before training any ML/DL model on it

In [8]:
reports_df = pd.DataFrame(reports)
reports_df

| | Bool value_sum | Building Height.0_sum | Id_sum | Max Height.0_sum | Min Height.0_sum | Parcel Area.0_sum | Value_sum | Text_sum |
|---|---|---|---|---|---|---|---|---|
| 0 | True | 1.824434 | 0.0 | 30.0 | 0.0 | 149.999993 | 1.0 | salut |
| 1 | True | 1.824434 | 0.0 | 30.0 | 0.0 | 99.999998 | 1.0 | salut |
| 2 | True | 1.824434 | 0.0 | 30.0 | 0.0 | 199.999993 | 1.0 | salut |
| 3 | True | 2.76362 | 0.0 | 30.0 | 1.0 | 149.999993 | 1.0 | salut |
| 4 | True | 2.76362 | 0.0 | 30.0 | 1.0 | 99.999998 | 1.0 | salut |
| 5 | True | 2.76362 | 0.0 | 30.0 | 1.0 | 199.999993 | 1.0 | salut |
| 6 | True | 3.702806 | 0.0 | 30.0 | 2.0 | 149.999993 | 1.0 | salut |
| 7 | True | 3.702806 | 0.0 | 30.0 | 2.0 | 99.999998 | 1.0 | salut |
| 8 | True | 3.702806 | 0.0 | 30.0 | 2.0 | 199.999993 | 1.0 | salut |
| 9 | True | 4.641991 | 0.0 | 30.0 | 3.0 | 149.999993 | 1.0 | salut |


In [9]:
dataset_uniqueRows = reports_df.drop_duplicates()

In [10]:
dataset_uniqueRows

| | Bool value_sum | Building Height.0_sum | Id_sum | Max Height.0_sum | Min Height.0_sum | Parcel Area.0_sum | Value_sum | Text_sum |
|---|---|---|---|---|---|---|---|---|
| 0 | True | 1.824434 | 0.0 | 30.0 | 0.0 | 149.999993 | 1.0 | salut |
| 1 | True | 1.824434 | 0.0 | 30.0 | 0.0 | 99.999998 | 1.0 | salut |
| 2 | True | 1.824434 | 0.0 | 30.0 | 0.0 | 199.999993 | 1.0 | salut |
| 3 | True | 2.76362 | 0.0 | 30.0 | 1.0 | 149.999993 | 1.0 | salut |
| 4 | True | 2.76362 | 0.0 | 30.0 | 1.0 | 99.999998 | 1.0 | salut |
| 5 | True | 2.76362 | 0.0 | 30.0 | 1.0 | 199.999993 | 1.0 | salut |
| 6 | True | 3.702806 | 0.0 | 30.0 | 2.0 | 149.999993 | 1.0 | salut |
| 7 | True | 3.702806 | 0.0 | 30.0 | 2.0 | 99.999998 | 1.0 | salut |
| 8 | True | 3.702806 | 0.0 | 30.0 | 2.0 | 199.999993 | 1.0 | salut |
| 9 | True | 4.641991 | 0.0 | 30.0 | 3.0 | 149.999993 | 1.0 | salut |
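
The displayed rows are all distinct, so `drop_duplicates` leaves them unchanged. To show what the call does when a repeat is present, here is a minimal sketch on a tiny DataFrame. The first and third rows reuse values from the table; the second row is a hypothetical duplicate added only for this illustration.

```python
import pandas as pd

rows = [
    {"Min Height.0_sum": 0.0, "Parcel Area.0_sum": 149.999993},
    {"Min Height.0_sum": 0.0, "Parcel Area.0_sum": 149.999993},  # hypothetical duplicate
    {"Min Height.0_sum": 0.0, "Parcel Area.0_sum": 99.999998},
]
demo_df = pd.DataFrame(rows)

# drop_duplicates keeps the first occurrence of each row and the original index labels.
unique_df = demo_df.drop_duplicates()

# Optionally renumber the index so that it is contiguous again.
unique_reset_df = unique_df.reset_index(drop=True)

print(unique_df)
print(unique_reset_df)
```

Rows are compared on all columns by default. Because the original index labels are kept, resetting the index can be useful before positional operations such as splitting.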


In an ML/DL application, the next steps would be to split the dataset into a training set and a test set, and then to train a model on the training set.
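
One simple way to do such a split with pandas alone is sketched below. It is an illustration rather than part of the original notebook: the DataFrame is rebuilt from three columns of the ten rows displayed earlier, and the 80/20 ratio and the `random_state` value are arbitrary choices.

```python
import pandas as pd

# Three columns of the ten rows displayed earlier.
demo_df = pd.DataFrame({
    "Min Height.0_sum": [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0],
    "Parcel Area.0_sum": [149.999993, 99.999998, 199.999993] * 3 + [149.999993],
    "Building Height.0_sum": [1.824434] * 3 + [2.76362] * 3
                             + [3.702806] * 3 + [4.641991],
})

# Draw 80% of the rows at random for training (the seed is fixed for reproducibility).
train_df = demo_df.sample(frac=0.8, random_state=0)

# The remaining rows form the test set.
test_df = demo_df.drop(train_df.index)

print(len(train_df), len(test_df))
```

Dropping by index relies on the index labels being unique, which holds for a freshly built DataFrame.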

In [11]:
print("\nShutdown PRT.")
pyprt.shutdown_prt()


Shutdown PRT.
