<a href="https://colab.research.google.com/github/BeatriceVaienti/dhCityModeler/blob/master/tests/colab_filling_and_mapping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **PREPARING THE DATASET TO USE DHCITYMODELER**

# **0. REPOSITORY CLONING AND IMPORTS**
Before starting, run the following cells to clone and install the repository in Google Colab and to import the necessary modules.

## 0.0-A Repository cloning while it's private:

In [1]:
# ONLY WHILE THE REPO IS PRIVATE!
!wget -q https://raw.githubusercontent.com/tsunrise/colab-github/main/colab_github.py
import colab_github
colab_github.github_auth(persistent_key=True)
!git clone git@github.com:BeatriceVaienti/dhCityModeler.git
# Move into the folder that was just created and install the repo:
!pip install -q setuptools
%cd /content/dhCityModeler
%pip install .
# geopandas can have issues when installed through pip as above, so force a clean reinstall:
%pip install --force-reinstall geopandas
# After this, click "Restart runtime".

Mounted at /content/drive/
Looks that a private key is already created. If you have already push it to github, no action required.
 Otherwise, Please go to https://github.com/settings/ssh/new to upload the following key: 
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIHwvmnQNtn+uqNJq0CgCpcLeqEqSXUI5eBxjg61eZSOz root@3d3c5dad2a5a

Please use SSH method to clone repo.
Cloning into 'dhCityModeler'...
remote: Enumerating objects: 192, done.
remote: Counting objects: 100% (192/192), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 192 (delta 102), reused 176 (delta 92), pack-reused 0
Receiving objects: 100% (192/192), 479.22 KiB | 1.88 MiB/s, done.
Resolving deltas: 100% (102/102), done.
/content/dhCityModeler
Processing /content/dhCityModeler
  Preparing metadata (setup.py) ... done
Collecting geopandas@ git+https://github.com/geopandas/geopandas.git@main (from dhCityModeler==1.0.0)
  Cloning https://github.com/geopandas/geopandas.git (to revision main) to

## 0.0-B Repository cloning once it's public:

In [None]:
# Only when the repo is public
!git clone https://github.com/BeatriceVaienti/dhCityModeler.git
!pip install git+https://github.com/BeatriceVaienti/dhCityModeler.git@master

## 0.1 Necessary Imports

In [2]:
import geopandas as gpd
import pandas as pd
import random
import sys
sys.path.append('../')
import modules.predict as predict
import modules.encoder as encoder
import copy
import numpy as np

In [6]:
gdf = gpd.read_file('./dhCityModeler/tests/import/TEST_BEFORE_COMPLETION.geojson')

In [7]:
# Before performing the mapping, fill the incomplete columns.
target_columns = ['numberOfFloors_original', 'rooftype_original']
selected_predictors = ['class', 'first_year', 'last_year', 'numberOfFloors_original', 'material.value', 'rooftype_original']
# Work on a copy so that every prediction is computed from the original, unfilled gdf.
gdf_new = copy.deepcopy(gdf)
for target in target_columns:
    filled_column, is_pred_column = predict.fill_missing_values(target, gdf, selected_predictors)
    # Replace the original column with the filled one and keep the flag column returned with it.
    gdf_new[target] = filled_column
    gdf_new[target + '_type'] = is_pred_column
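
To make the shape of this step concrete, the sketch below fills a toy column and keeps a companion flag column, mirroring the `target` / `target + '_type'` pair used above. It is only an illustration: `fill_with_mode` is a hypothetical stand-in that uses the most frequent value, not the model behind `predict.fill_missing_values`, and the boolean flag is an assumption about what the real `is_pred_column` might contain.

```python
import pandas as pd


def fill_with_mode(df, target):
    """Hypothetical stand-in: fill gaps with the most frequent value and flag them."""
    was_missing = df[target].isna()
    most_frequent = df[target].mode(dropna=True).iloc[0]
    filled = df[target].fillna(most_frequent)
    return filled, was_missing


# Toy data with two missing roof types.
toy = pd.DataFrame({"rooftype_original": ["flat", None, "gabled", "flat", None]})

filled, flag = fill_with_mode(toy, "rooftype_original")
toy["rooftype_original"] = filled
toy["rooftype_original_type"] = flag  # True where the value was filled in
```

Keeping the flag next to the filled column makes it possible to tell observed values from inferred ones later in the pipeline.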

In [10]:
# Specify in the mapping which columns should be renamed.
mapping = {
    'type': 'class',
    'time.estimatedStart.timeMoment.year': 'first_year',
    'time.estimatedEnd.timeMoment.year': 'last_year',
    'numberOfFloors.value':'numberOfFloors_original',
    'numberOfFloors.paradata.type':'numberOfFloors_original_type',
    'roof.type.value': 'rooftype_original',
    'roof.type.paradata.type': 'rooftype_original_type'
}
# Keys (left) are field names from the Historical CityJSON extension; values (right) are the column names in the input geodata.
# Fields that are already correctly named do not need to appear in the mapping.
# Fields that cannot be mapped do not need to be removed: they are ignored when the CityJSON is created.

# Read the CSV file with the fields to check
fields_df = pd.read_csv('./dhCityModeler/extension/geojson_mapping.csv')
# Map the GeoDataFrame to HistoricalCityJSON
mapped_gdf = encoder.map_gdf_to_historicalcityjson(gdf_new, fields_df, mapping)

mapped_gdf['roof.type.value'].unique()


array(['gabled', 'flat', 'cupola', 'destroyed building'], dtype=object)
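
The direction of the `mapping` dictionary (target name on the left, source name on the right) is easy to get backwards, so here is a minimal pandas sketch of the renaming idea alone. It is not the implementation of `encoder.map_gdf_to_historicalcityjson`, which also receives `fields_df` and may do more; the toy column `notes` is made up for illustration.

```python
import pandas as pd

# Same orientation as the notebook: target field name -> source column name.
mapping = {
    "type": "class",
    "numberOfFloors.value": "numberOfFloors_original",
}

toy = pd.DataFrame(
    {"class": ["house"], "numberOfFloors_original": [2], "notes": ["unmapped"]}
)

# DataFrame.rename expects source -> target, so the dictionary is inverted first.
renamed = toy.rename(columns={source: target for target, source in mapping.items()})
```

Columns that are not mentioned in the mapping, such as `notes` here, keep their original names.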

In [11]:
gdf = mapped_gdf
# Replace coarse categories with concrete values.
# random.choice is unseeded here, so the result differs between runs.
for idx, row in gdf.iterrows():
    roof = row['roof.type.value']
    floors = row['numberOfFloors.value']
    if roof == 'gabled' or roof == 'slanted':
        gdf.loc[idx, 'roof.type.value'] = random.choice(['hip', 'gable'])
    if floors == '1-3':
        gdf.loc[idx, 'numberOfFloors.value'] = random.choice([1, 2, 3])
    elif floors == '4+':
        gdf.loc[idx, 'numberOfFloors.value'] = random.choice([4, 5, 6])
    elif floors == '6+':
        gdf.loc[idx, 'numberOfFloors.value'] = random.choice([6, 7, 8])
    if row['type'] == 'archway':
        gdf.loc[idx, 'type'] = 'Archway'

# Value-level mappings (old value -> new value), applied in the next cell.
mapping_type = {'archway': 'Archway'}
mapping_rooftype = {
    'destroyed building': '',
    'cupola': 'domed'
}
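
The loop in this cell draws replacement values at random, so two runs of the notebook can produce different datasets. If reproducibility matters, one option is to draw from a seeded generator. The sketch below shows that idea for the floor ranges only; `FLOOR_RANGES`, `resolve_floor_ranges` and the `seed` parameter are illustrative names, not part of dhCityModeler.

```python
import random

import pandas as pd

# Same ranges as in the loop above.
FLOOR_RANGES = {"1-3": [1, 2, 3], "4+": [4, 5, 6], "6+": [6, 7, 8]}


def resolve_floor_ranges(series, seed):
    """Replace range labels with one concrete value, reproducibly for a given seed."""
    rng = random.Random(seed)  # private generator, independent of the global one

    def pick(value):
        options = FLOOR_RANGES.get(value)
        return rng.choice(options) if options is not None else value

    return series.map(pick)


toy = pd.Series(["1-3", 2, "4+", "6+"], name="numberOfFloors.value")
first = resolve_floor_ranges(toy, seed=0)
second = resolve_floor_ranges(toy, seed=0)  # same seed, same draws
```

Values that are not range labels (the `2` in the toy data) pass through unchanged.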

In [12]:
df_mapped_values = encoder.map_column_values(gdf, 'type', mapping_type)
df_mapped_values = encoder.map_column_values(df_mapped_values, 'roof.type.value', mapping_rooftype)

In [13]:
df_mapped_values['roof.type.value'].unique()

array(['gable', 'flat', 'hip', 'domed', ''], dtype=object)
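
For readers who want to see what a value-level mapping does without the encoder module, the following sketch reproduces the effect on a toy column with plain pandas. It assumes that `encoder.map_column_values` behaves like a dictionary-based replacement, which the output above suggests but which is not verified here.

```python
import pandas as pd

mapping_rooftype = {"destroyed building": "", "cupola": "domed"}

toy = pd.DataFrame({"roof.type.value": ["gable", "cupola", "destroyed building"]})

# Series.replace with a dict swaps matching values and leaves the rest untouched.
toy["roof.type.value"] = toy["roof.type.value"].replace(mapping_rooftype)
```

Mapping `'destroyed building'` to an empty string leaves those rows without a roof type.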

In [None]:
# Save the GeoDataFrame as a GeoJSON file
df_mapped_values.to_file('./dhCityModeler/import/TEST_AFTER_COMPLETION.geojson', driver='GeoJSON')