# PyPRT - Dataset Collection

This notebook shows one way to collect data from CGA reports. We repeatedly generate models from a set of initial shapes while varying the input attributes, collect the values each generation reports, and finally apply some simple processing to the collected dataset.

In [19]:
import sys
import os

import pyprt
from pyprt.pyprt_utils import visualize_prt_results

import pandas as pd

In [20]:
CS_FOLDER = os.getcwd()


def asset_file(filename):
    return os.path.join(CS_FOLDER, 'data', filename)

#### PRT Initialization

In [21]:
print("\nInitializing PRT.")
pyprt.initialize_prt()

if not pyprt.is_prt_initialized():
    raise Exception("PRT is not initialized")


Initializing PRT.


In [22]:
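# Footprint vertices as a flat list of x, y, z coordinates.
v = [0, 0, 0,  0, 0, 20,  10, 0, 10,  10, 0, 0]
initial_shape1 = pyprt.InitialShape(v)

# Rule package, plus the attributes selecting the rule file and start rule.
rpk = asset_file("extrusion_rule.rpk")
attrs = {'ruleFile': 'bin/extrusion_rule.cgb',
         'startRule': 'Default$Footprint'}

# Generate a model from the initial shape defined above.
mod = pyprt.ModelGenerator([initial_shape1])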
generated_model = mod.generate_model(
    [attrs], rpk, 'com.esri.pyprt.PyEncoder', {})

visualize_prt_results(generated_model)

[PRT] [2020-02-11 10:05:58] [info] using rule package C:\Users\cami9495\Documents\esri-cityengine-sdk-master\examples\py4prt\examples\data\extrusion_rule.rpk

[PRT] [2020-02-11 10:05:58] [error] getting resolve map from 'C:\Users\cami9495\Documents\esri-cityengine-sdk-master\examples\py4prt\examples\data\extrusion_rule.rpk' failed, aborting.

Number of generated geometries (= nber of initial shapes):
0


#### Gather values from the report

In [5]:
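def get_sum_report(model):
    """Return only the report entries whose key contains "_sum"."""
    sum_rep = {}
    all_rep = model.get_report()
    # Iterate over the report keys and keep the summed values.
    for it in all_rep:
        if "_sum" in it:
            sum_rep[it] = all_rep[it]
    return sum_rep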
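
To see what this filter keeps, here is a minimal sketch on a plain dictionary standing in for a report. The keys and values are made up for illustration and do not come from a real rule package, and `filter_sum_entries` is a hypothetical stand-alone name for the same logic.

```python
def filter_sum_entries(report):
    """Keep only the entries whose key contains "_sum"."""
    return {key: value for key, value in report.items() if "_sum" in key}


# Hypothetical report contents, for illustration only.
example_report = {
    "Parcel Area.0_sum": 150.0,
    "Parcel Area.0_avg": 150.0,
    "Parcel Area.0_n": 1.0,
    "Text_sum": "salut",
}

sum_entries = filter_sum_entries(example_report)
print(sum_entries)
```

Only the two keys containing `_sum` survive; the other entries are dropped.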

In [8]:
initial_shape2 = pyprt.InitialShape(
    [0, 0, 0,  0, 0, 10,  10, 0, 10,  10, 0, 0])
initial_shape3 = pyprt.InitialShape(
    [0, 0, 0,  0, 0, 30,  10, 0, 10,  10, 0, 0])

In [18]:
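reports = []
model_to_generate = pyprt.ModelGenerator(
    [initial_shape1, initial_shape2, initial_shape3])
# The first call sets the rule package, the encoder and its options;
# 'emitGeometry': False skips the geometry, since only reports are needed.
model_test = model_to_generate.generate_model(
    [attrs], rpk, 'com.esri.pyprt.PyEncoder', {'emitGeometry': False})

# Regenerate with a different minimum building height on each pass.
for val in range(10):
    attrs['minBuildingHeight'] = float(val)
    # Later calls pass only the attributes and reuse the settings above.
    models = model_to_generate.generate_model([attrs])

    # Keep the "_sum" report entries of every generated model.
    for model in models:
        if model:
            reports.append(get_sum_report(model))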

#### Convert the reports to a pandas DataFrame so the dataset can be processed before training any ML/DL model on it

In [14]:
reports_df = pd.DataFrame(reports)
reports_df

| | Bool value_sum | Building Height.0_sum | Id_sum | Max Height.0_sum | Min Height.0_sum | Parcel Area.0_sum | Value_sum | Text_sum |
|---|---|---|---|---|---|---|---|---|
| 0 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 149.999993 | 1.0 | salut |
| 1 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 99.999998 | 1.0 | salut |
| 2 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 199.999993 | 1.0 | salut |
| 3 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 149.999993 | 1.0 | salut |
| 4 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 99.999998 | 1.0 | salut |
| 5 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 199.999993 | 1.0 | salut |
| 6 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 149.999993 | 1.0 | salut |
| 7 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 99.999998 | 1.0 | salut |
| 8 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 199.999993 | 1.0 | salut |
| 9 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 149.999993 | 1.0 | salut |
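
The conversion works because `pd.DataFrame` accepts a list of dictionaries and turns each dictionary into one row, using the union of the keys as columns. A small sketch with made-up report values (not the output of a rule package) shows the behavior, including what happens when a report lacks a key:

```python
import pandas as pd

# Hypothetical report dictionaries, for illustration only.
example_reports = [
    {"Parcel Area.0_sum": 150.0, "Value_sum": 1.0},
    {"Parcel Area.0_sum": 100.0, "Value_sum": 1.0},
    {"Parcel Area.0_sum": 200.0},  # no "Value_sum" entry: the cell becomes NaN
]

example_df = pd.DataFrame(example_reports)
print(example_df)
```

Checking for missing values with `example_df.isna().sum()` is a quick way to spot reports that did not contain every key.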


In [15]:
dataset_uniqueRows = reports_df.drop_duplicates()

In [16]:
dataset_uniqueRows

| | Bool value_sum | Building Height.0_sum | Id_sum | Max Height.0_sum | Min Height.0_sum | Parcel Area.0_sum | Value_sum | Text_sum |
|---|---|---|---|---|---|---|---|---|
| 0 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 149.999993 | 1.0 | salut |
| 1 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 99.999998 | 1.0 | salut |
| 2 | True | 11.21629 | 0.0 | 30.0 | 10.0 | 199.999993 | 1.0 | salut |
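
`drop_duplicates` keeps the first occurrence of each distinct row and preserves that row's original index label. The following sketch uses made-up values to illustrate this; the `reset_index(drop=True)` call is an optional extra step that is not part of the notebook.

```python
import pandas as pd

# Hypothetical dataset with repeated rows, for illustration only.
example_df = pd.DataFrame({
    "Parcel Area.0_sum": [150.0, 100.0, 150.0, 200.0, 100.0],
    "Value_sum": [1.0, 1.0, 1.0, 1.0, 1.0],
})

# Rows 2 and 4 repeat rows 0 and 1, so they are dropped.
unique_rows = example_df.drop_duplicates()

# Optionally renumber the remaining rows from 0.
renumbered = unique_rows.reset_index(drop=True)
print(renumbered)
```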


In an ML/DL application, the next steps would be to split the dataset into a training set and a test set, and then to train an algorithm on the training set.
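
One possible way to do such a split is sketched here with pandas only. The 80/20 ratio, the random seed and the example values are assumptions chosen for illustration; a dedicated helper such as scikit-learn's `train_test_split` would serve the same purpose.

```python
import pandas as pd

# Hypothetical dataset of ten rows, for illustration only.
example_df = pd.DataFrame({
    "Parcel Area.0_sum": [float(a) for a in range(100, 200, 10)],
    "Building Height.0_sum": [float(h) for h in range(10, 20)],
})

# Draw 80% of the rows at random for training (fixed seed for repeatability).
train_df = example_df.sample(frac=0.8, random_state=0)

# The rows that were not drawn form the test set.
test_df = example_df.drop(train_df.index)

print(len(train_df), len(test_df))
```

Dropping by index guarantees that no row appears in both sets, provided the index labels are unique.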

In [17]:
print("\nShutdown PRT.")
pyprt.shutdown_prt()


Shutdown PRT.
