## Create and Use Dataset Object

By default, every analysis in pyIncore takes a **Dataset Object** as input. This
tutorial introduces the basics of creating and using a **Dataset Object**, either by loading it from local
files or by connecting to the remote IN-CORE Data Services.

In [None]:
import pandas as pd
from pyincore import IncoreClient, DataService, SpaceService, Dataset, FragilityService, MappingSet
from pyincore.analyses.buildingdamage import BuildingDamage
from pyincore.analyses.meandamage import MeanDamage

In [None]:
client = IncoreClient()
data_services = DataService(client)
space_services = SpaceService(client)

### Upload Dataset to Data Services

#### Write Metadata

- **Metadata** describes the dataset; in the cell below it is written as a Python dictionary.
- **dataType** needs to align with the data types that the pyIncore analyses expect.
- **format** is the file format of the dataset. Currently supported formats include "shapefile", "table", "Network", "textFiles", "raster", and "geotiff". Please consult the development team if you intend to post a new format.

In [None]:
# Note: both dataType and format must be set correctly
dataset_metadata = {
    "title":"Tutorial Test ERGO Memphis Hospitals",
    "description": "ERGO Memphis Hospitals",
    "dataType": "ergo:buildingInventoryVer5",
    "format": "shapefile"
}
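
Before uploading, it can help to check that the metadata dictionary carries the fields discussed above. The following is a minimal sketch and not part of pyIncore: `check_metadata` and `REQUIRED_KEYS` are hypothetical names, and the choice of required keys is an assumption based on the fields this tutorial uses.

```python
# Hypothetical helper, not part of pyIncore.
# Assumption: these are the fields this tutorial relies on.
REQUIRED_KEYS = ("title", "dataType", "format")


def check_metadata(metadata):
    """Return the required keys that are missing or empty in the metadata dict."""
    return [key for key in REQUIRED_KEYS if not metadata.get(key)]


example_metadata = {
    "title": "Tutorial Test ERGO Memphis Hospitals",
    "description": "ERGO Memphis Hospitals",
    "dataType": "ergo:buildingInventoryVer5",
    "format": "shapefile",
}

missing = check_metadata(example_metadata)
print("missing keys:", missing)
```

An empty list means every assumed field is present; the check does not verify that the values are accepted by the service.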

#### Upload metadata

After the metadata is uploaded, a “placeholder” dataset object is created on the IN-CORE service. It has an id but no files attached to it yet. However, the empty dataset can already be found on the service by searching for that id.

In [None]:
created_dataset = data_services.create_dataset(dataset_metadata)
dataset_id = created_dataset['id']
print('dataset is created with id ' + dataset_id)

#### Attach files to the dataset created

Using the dataset id we attach the files that contain the data for the dataset.

In [None]:
files = ['files/all_bldgs_ver5_WGS1984.shp',
         'files/all_bldgs_ver5_WGS1984.shx',
         'files/all_bldgs_ver5_WGS1984.prj',
         'files/all_bldgs_ver5_WGS1984.dbf']
full_dataset = data_services.add_files_to_dataset(dataset_id, files)

In [None]:
full_dataset
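
A shapefile consists of several companion files, so it can be useful to confirm that all of them exist locally before attaching them. The sketch below is a hypothetical helper built on the standard library only; `shapefile_parts` and `SHAPEFILE_PARTS` are illustrative names, and the list of extensions is an assumption taken from the four files used in this tutorial.

```python
from pathlib import Path

# Assumption: the four extensions uploaded in this tutorial.
SHAPEFILE_PARTS = (".shp", ".shx", ".prj", ".dbf")


def shapefile_parts(shp_path):
    """Return (existing, missing) companion file paths for a shapefile."""
    candidates = [Path(shp_path).with_suffix(ext) for ext in SHAPEFILE_PARTS]
    existing = [str(p) for p in candidates if p.is_file()]
    missing = [str(p) for p in candidates if not p.is_file()]
    return existing, missing


existing, missing = shapefile_parts("files/all_bldgs_ver5_WGS1984.shp")
if missing:
    print("missing shapefile parts:", missing)
else:
    print("ready to upload:", existing)
```

When nothing is missing, the `existing` list has the same shape as the `files` list passed to `add_files_to_dataset` above.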

### Moving your dataset to INCORE space

If you would like other people to access your data, you can move your dataset to a specific space. Otherwise it will
stay in your own space and will not be publicly accessible.

In [None]:
# for example, adding to incore space
response = space_services.add_dataset_to_space("5df8fd18b9219c068fb0257f", dataset_id)

### 1. Load Dataset from Data services

In [None]:
building_dataset_id = "5a284f0bc7d30d13bc081a28"
buildings = Dataset.from_data_service(building_dataset_id, data_services)
buildings

### 2. Load Dataset from local files

- Make sure you pass the right **data_type** when constructing a Dataset Object from scratch.
- To find out what the **data_type** should be, refer to the **source code** of the analysis.
- Look at the **spec** section -> **input_datasets** -> **type**.
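
The lookup path described above (**spec** -> **input_datasets** -> **type**) can be sketched on a plain dictionary. The fragment below is illustrative only: `sample_spec` is a hand-written placeholder rather than a copy of any pyIncore spec, and `accepted_types` is a hypothetical helper. It assumes each input dataset entry has an `id` and a list under `type`.

```python
# Illustrative spec fragment; the values are placeholders, not copied from pyIncore.
sample_spec = {
    "input_datasets": [
        {"id": "buildings", "type": ["ergo:buildingInventoryVer5"]},
        {"id": "dfr3_mapping_set", "type": ["incore:dfr3MappingSet"]},
    ]
}


def accepted_types(spec):
    """Map each input dataset id to the list of data types it accepts."""
    return {
        entry["id"]: list(entry["type"])
        for entry in spec.get("input_datasets", [])
    }


types_by_input = accepted_types(sample_spec)
print(types_by_input["buildings"])
```

The value passed as `data_type` when building a Dataset Object from a local file should be one of the types listed for the input it will be used with.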

In [None]:
buildings = Dataset.from_file("files/all_bldgs_ver5_WGS1984.shp", data_type="ergo:buildingInventoryVer5")
buildings

### 3. Input the Dataset object in analyses

In [None]:
# for example: Building Damage Analyses
bldg_dmg = BuildingDamage(client)
bldg_dmg.set_input_dataset("buildings", buildings)  

In [None]:
# Memphis Earthquake damage
# New Madrid earthquake using Atkinson Boore 1995
hazard_type = "earthquake"
hazard_id = "5b902cb273c3371e1236b36b"

# Earthquake mapping
mapping_id = "5b47b350337d4a3629076f2c"
fragility_service = FragilityService(client)
mapping_set = MappingSet(fragility_service.get_mapping(mapping_id))
bldg_dmg.set_input_dataset('dfr3_mapping_set', mapping_set)

result_name = "memphis_eq_bldg_dmg_result"
bldg_dmg.set_parameter("result_name", result_name)
bldg_dmg.set_parameter("hazard_type", hazard_type)
bldg_dmg.set_parameter("hazard_id", hazard_id)
bldg_dmg.set_parameter("num_cpu", 4)

# Run Analysis
bldg_dmg.run_analysis()

### 4. Chaining the output Dataset object in subsequent analyses
The output is a Dataset Object as well. Here is how to display it:

In [None]:
print("output datasets:", bldg_dmg.get_output_datasets())
bldg_dmg.get_output_dataset('ds_result').get_dataframe_from_csv().head()

### Chaining with Mean damage analysis

In [None]:
md = MeanDamage(client)

# use the output of building damage
building_damage_output = bldg_dmg.get_output_dataset('ds_result')
md.set_input_dataset("damage", building_damage_output)

md.load_remote_input_dataset("dmg_ratios", "5a284f2ec7d30d13bc08209a")
md.set_parameter("result_name", "building_mean_damage")
md.set_parameter("damage_interval_keys", ["DS_0", "DS_1", "DS_2", "DS_3"])
md.set_parameter("num_cpu", 1)

# Run analysis
md.run_analysis()

In [None]:
print("output datasets:", md.get_output_datasets())
md.get_output_dataset('result').get_dataframe_from_csv().head()[['meandamage', 'mdamagedev']]

### Utility methods

In [None]:
# e.g. read the shapefile properties
rd = buildings.get_inventory_reader()
for row in rd:
    print('year built:', row['properties']['year_built'])
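
The rows read above can also be aggregated instead of printed one by one. The following is a minimal sketch that runs on hand-made sample rows with the same `{'properties': {...}}` shape as in the loop above. `count_by_decade` and `sample_rows` are hypothetical, and the sample values are made up for illustration, not taken from the Memphis inventory.

```python
from collections import Counter

# Hypothetical sample rows mimicking the {'properties': {...}} shape used above.
sample_rows = [
    {"properties": {"year_built": 1965}},
    {"properties": {"year_built": 1972}},
    {"properties": {"year_built": 1978}},
    {"properties": {"year_built": None}},
]


def count_by_decade(rows):
    """Count rows per construction decade, skipping rows without a year."""
    counts = Counter()
    for row in rows:
        year = row["properties"].get("year_built")
        if year is None:
            continue
        counts[(int(year) // 10) * 10] += 1
    return dict(counts)


decades = count_by_decade(sample_rows)
print(decades)
```

Under the assumption that the inventory reader yields rows of this shape, the same function could be called with `rd` in place of `sample_rows`.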