Copyright (c) 2012-2024 Esri R&D Center Zurich

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
A copy of the license is available in the repository's LICENSE file.

# PyPRT - Dataset Collection

This notebook shows one way to collect data from CGA reports. We repeatedly generate models from a set of initial shapes while varying an input attribute, and gather the values reported for each generated model. Finally, the collected reports are loaded into a pandas DataFrame as a starting point for simple numerical processing.

In [1]:
import os

import pyprt
from pyprt.pyprt_utils import visualize_prt_results

import pandas as pd

#### PRT Initialization

In [2]:
CS_FOLDER = os.getcwd()

def asset_file(filename):
    return os.path.join(CS_FOLDER, 'data', filename)

In [3]:
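# Initial shape given as a flat list of x, y, z vertex coordinates (four vertices).
initial_shape1 = pyprt.InitialShape(
    [0, 0, 0,  10, 0, 0,  10, 0, 10,  0, 0, 20])

rpk = asset_file("extrusion_rule.rpk")
attrs = {}  # no attribute overrides yet
encoder = 'com.esri.pyprt.PyEncoder'

# Generate one model per initial shape, using default encoder options.
mod = pyprt.ModelGenerator([initial_shape1])
generated_model = mod.generate_model(
    [attrs], rpk, encoder, {})

visualize_prt_results(generated_model)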


Number of generated geometries (= nber of initial shapes):
1

Initial Shape Index: 0

Size of the model vertices vector: 24
Number of model vertices: 8
Size of the model faces vector: 6

Report of the generated model:
{'Bool value_sum': True, 'Bool value_avg': True, 'Bool value_min': True, 'Bool value_max': True, 'Bool value_n': 1.0, 'Building Height.0_n': 1.0, 'Id_n': 1.0, 'Max Height.0_n': 1.0, 'Min Height.0_n': 1.0, 'Parcel Area.0_n': 1.0, 'Text_n': 1.0, 'Value_n': 1.0, 'Building Height.0_sum': 15.456175208091736, 'Building Height.0_avg': 15.456175208091736, 'Id_sum': 0.0, 'Id_avg': 0.0, 'Max Height.0_sum': 30.0, 'Max Height.0_avg': 30.0, 'Min Height.0_sum': 10.0, 'Min Height.0_avg': 10.0, 'Parcel Area.0_sum': 150.0, 'Parcel Area.0_avg': 150.0, 'Value_sum': 1.0, 'Value_avg': 1.0, 'Building Height.0_min': 15.456175208091736, 'Building Height.0_max': 15.456175208091736, 'Id_min': 0.0, 'Id_max': 0.0, 'Max Height.0_min': 30.0, 'Max Height.0_max': 30.0, 'Min Height.0_min': 10.0, 'Min He

#### Gather values from the generated models' reports

In [4]:
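def get_sum_report(model):
    """Return only the '_sum' entries of a generated model's CGA report."""
    sum_rep = {}
    all_rep = model.get_report()
    for key in all_rep:
        # Keep the summed values; drop the _avg, _min, _max and _n entries.
        if "_sum" in key:
            sum_rep[key] = all_rep[key]
    return sum_rep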
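
To see what this filter keeps, the same logic can be run on a plain dictionary. The sketch below uses a hypothetical `sample_report` dict (a few keys copied from the report output above) as a stand-in for the value returned by `model.get_report()`, so it runs without PyPRT. The helper name `keep_sum_entries` is also made up for this illustration.

```python
# Hypothetical stand-in for model.get_report(); keys and values are copied
# from the report printed earlier in this notebook.
sample_report = {
    'Building Height.0_sum': 15.456175208091736,
    'Building Height.0_avg': 15.456175208091736,
    'Parcel Area.0_sum': 150.0,
    'Parcel Area.0_n': 1.0,
    'Id_min': 0.0,
}


def keep_sum_entries(report):
    # Same filter as get_sum_report, applied to a plain dict.
    return {key: value for key, value in report.items() if "_sum" in key}


sum_only = keep_sum_entries(sample_report)
print(sum_only)
```

Only the two `_sum` keys survive, which is why each row of the collected dataset has one column per reported quantity.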

In [5]:
initial_shape2 = pyprt.InitialShape(
    [0, 0, 0,  10, 0, 0,  10, 0, 10,  0, 0, 10])
initial_shape3 = pyprt.InitialShape(
    [0, 0, 0,  10, 0, 0,  10, 0, 10,  0, 0, 30])
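
The three initial shapes differ only in their last vertex, which changes their footprint area. As a sanity check, the area can be computed directly from the flat coordinate lists with the shoelace formula. This is a minimal sketch that assumes the shapes lie in the y = 0 plane (true for the coordinates used here); the function name `polygon_area_xz` is hypothetical and not part of PyPRT. The notebook does not state how the rule computes `Parcel Area`, so matching it against these areas is an assumption.

```python
def polygon_area_xz(flat_coords):
    # Group the flat [x, y, z, x, y, z, ...] list into (x, z) pairs;
    # y is ignored because the example shapes lie in the y = 0 plane.
    points = [(flat_coords[i], flat_coords[i + 2])
              for i in range(0, len(flat_coords), 3)]
    # Shoelace formula over consecutive vertex pairs, wrapping around.
    twice_area = 0.0
    for (x0, z0), (x1, z1) in zip(points, points[1:] + points[:1]):
        twice_area += x0 * z1 - x1 * z0
    return abs(twice_area) / 2.0


# Coordinates copied from initial_shape1, initial_shape2 and initial_shape3.
shape_coords = {
    'initial_shape1': [0, 0, 0, 10, 0, 0, 10, 0, 10, 0, 0, 20],
    'initial_shape2': [0, 0, 0, 10, 0, 0, 10, 0, 10, 0, 0, 10],
    'initial_shape3': [0, 0, 0, 10, 0, 0, 10, 0, 10, 0, 0, 30],
}

areas = {name: polygon_area_xz(coords) for name, coords in shape_coords.items()}
print(areas)
```

The area computed for `initial_shape1` is 150.0, the same value as the `Parcel Area.0_sum` entry in the report printed earlier.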

In [6]:
reports = []
model_to_generate = pyprt.ModelGenerator(
    [initial_shape1, initial_shape2, initial_shape3])

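# Sweep minBuildingHeight from 0 to 9 and regenerate all three shapes each time.
for val in range(0, 10):
    attrs['minBuildingHeight'] = float(val)
    # Only the reports are needed here, so geometry output is switched off.
    models = model_to_generate.generate_model(
        [attrs], rpk, encoder, {'emitGeometry': False})

    for model in models:
        if model:
            reports.append(get_sum_report(model))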

#### Transform the reports into a pandas DataFrame for further dataset processing

In [7]:
reports_df = pd.DataFrame(reports)
reports_df

Unnamed: 0,Bool value_sum,Building Height.0_sum,Id_sum,Max Height.0_sum,Min Height.0_sum,Parcel Area.0_sum,Value_sum,Text_sum
0,True,8.184263,0.0,30.0,0.0,150.0,1.0,salut
1,True,8.184263,0.0,30.0,0.0,100.0,1.0,salut
2,True,8.184263,0.0,30.0,0.0,200.0,1.0,salut
3,True,8.911454,0.0,30.0,1.0,150.0,1.0,salut
4,True,8.911454,0.0,30.0,1.0,100.0,1.0,salut
5,True,8.911454,0.0,30.0,1.0,200.0,1.0,salut
6,True,9.638645,0.0,30.0,2.0,150.0,1.0,salut
7,True,9.638645,0.0,30.0,2.0,100.0,1.0,salut
8,True,9.638645,0.0,30.0,2.0,200.0,1.0,salut
9,True,10.365837,0.0,30.0,3.0,150.0,1.0,salut
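
As one example of simple numerical processing on this table, the rows can be grouped by the swept attribute and averaged. The sketch below rebuilds the first six rows of the table by hand in a hypothetical `sample_df`, so it runs without PyPRT; with the real data, `reports_df` would take its place.

```python
import pandas as pd

# First six rows of the table above, restricted to three numeric columns.
sample_df = pd.DataFrame({
    'Min Height.0_sum': [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    'Building Height.0_sum': [8.184263] * 3 + [8.911454] * 3,
    'Parcel Area.0_sum': [150.0, 100.0, 200.0, 150.0, 100.0, 200.0],
})

# Average the reported values for each value of the swept minimum height.
numeric_cols = ['Building Height.0_sum', 'Parcel Area.0_sum']
summary = sample_df.groupby('Min Height.0_sum')[numeric_cols].mean()
print(summary)
```

In this sample the reported building height changes with the minimum height but not with the shape, while the parcel area depends only on the shape.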


In an ML/DL application, the next step would be to split the dataset into a training set and a test set, and then to train a model on the training set.
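
A minimal sketch of such a split, using only the standard library. The function name `train_test_split_rows`, the 80/20 ratio and the fixed seed are assumptions made for this illustration; libraries such as scikit-learn provide ready-made equivalents. The placeholder list stands in for the collected report rows.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=0):
    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder for the collected rows: 3 shapes x 10 attribute values = 30 reports,
# assuming every generation in the loop above succeeds.
rows = list(range(30))
train_rows, test_rows = train_test_split_rows(rows)
print(len(train_rows), len(test_rows))
```

Fixing the seed keeps the split reproducible between runs, which makes later comparisons between trained models easier.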