# Model Validation Instructions

This notebook serves as a more thorough introduction to pyCIMS model validation functionality. For this notebook to run properly, ensure:
* pyCIMS is downloaded and installed on your local machine according to the [installation instructions](../../docs/Installation.md)
* The `pyCIMS_env` conda environment has been activated according to the [conda environment instructions](../../docs/WorkingWithCondaEnvironment.md)

For a more general overview of pyCIMS, please see the [Quickstart](Quickstart.ipynb) tutorial.

## Import pyCIMS & other packages

In [1]:
import pyCIMS
import pprint as pp

Now that we have loaded `pyCIMS`, we can use the `ModelValidator`. First we will instantiate the `ModelValidator` class. To do so, we must provide the location of the Excel file containing the model description.

Optionally, you can also provide a `node_col` parameter. This tells the model validator the name of the column that specifies node names. In the current model description (2020-09-17) this column is `"Node"`. If no value is provided, this parameter defaults to `"Node"`.

## Initialize the `ModelValidator`

In [2]:
model_description_file = '../../pyCIMS_model_description_Alberta_Validated.xlsb'

model_validator = pyCIMS.ModelValidator(model_description_file)

Next, we call the `validate()` method on our `model_validator` to check for errors in the model description. This method takes two parameters:
* **`verbose`** : Determines whether the method uses print statements to report problems identified in the model description. Here we have set `verbose` to `True`, so printed statements will tell us about any errors.

* **`raise_warnings`** : Determines whether the method raises warnings when it identifies problems in the model description. Warnings are more "in your face" than print statements and appear in red. However, warnings go away if you run the cell multiple times. Here we have set `raise_warnings` to `False`, so we will rely on the printed statements and the resulting dictionary (shown in the next cell).
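
The difference between printed messages and warnings can be illustrated with Python's standard `warnings` module. The sketch below assumes, without confirmation from the pyCIMS source, that `raise_warnings=True` relies on `warnings.warn`; `report_problem` is a hypothetical stand-in for a validator check, not a pyCIMS function. By default Python may show a repeated warning from the same location only once, which could explain why warnings disappear when a cell is re-run; the `"always"` filter turns that suppression off.

```python
import warnings


def report_problem(message):
    """Hypothetical stand-in for a validator check that emits a warning."""
    warnings.warn(message, UserWarning)


def collect_warnings(n_calls):
    """Call report_problem n_calls times and return every warning message."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" turns off the default once-per-location suppression.
        warnings.simplefilter("always")
        for _ in range(n_calls):
            report_problem("1 node name/branch mismatch.")
    return [str(w.message) for w in caught]


messages = collect_warnings(2)
print(len(messages), "warnings captured")
```

Wrapping a call in `warnings.catch_warnings()` with the `"always"` filter is one way to see every warning on every run, if the assumption above holds.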


## Validate the model

In [3]:
model_validator.validate(verbose=True, raise_warnings=False)

0 node name/branch mismatches. 
0 references to unspecified nodes. 
0 non-root nodes are never referenced. 
0 nodes were specified but don't provide a service. 
0 nodes had invalid competition types. 


## Investigate the Warnings

Regardless of whether you use the `verbose` or `raise_warnings` options in the `validate()` method call, any problems identified can be accessed through the `ModelValidator.warnings` attribute.

In [4]:
pp.pprint(model_validator.warnings)

{'nodes_no_requested_service': [(44391, 'Aviation Turbo Fuel'),
                                (44399, 'Black Liquor'),
                                (44407, 'Byproduct Gas'),
                                (44415, 'Coal Sub Bituminous Western'),
                                (44423, 'Coke'),
                                (44431, 'Diesel'),
                                (44439, 'Diesel Marine'),
                                (44447, 'Diesel Rail'),
                                (44455, 'Diesel Road'),
                                (44463, 'Gasoline'),
                                (44471, 'Geothermal'),
                                (44479, 'Heavy Fuel Oil'),
                                (44487, 'Hog Fuel'),
                                (44503, 'Light Fuel Oil'),
                                (44519, 'Petroleum Coke'),
                                (44527, 'Propane'),
                                (44535, 'Propane Solvent'),
                             

Ideally, the code above returns an empty dictionary. If it does not, the examples below should help explain what the `warnings` dictionary might contain.

First off, the `warnings` dictionary can contain up to 7 keys (as of September 2020). These 7 keys are:  
* [`mismatched_node_names`](#mismatch)
* [`unspecified_nodes`](#unspecified)
* [`unreferenced_nodes`](#unreferenced)
* [`nodes_no_provided_service`](#no_provided_services)
* [`nodes_no_requested_service`](#no_requested_services)
* [`invalid_competition_type`](#comp)
* [`nodes_requesting_self`](#self)

See the sections below for more information on what each of these keys means.
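
Before reading each category in detail, it can help to tally how many problems fall under each key. The following is a minimal sketch: `count_problems` and `sample_warnings` are hypothetical names introduced here, and the sample dictionary is hand-written to stand in for `model_validator.warnings`. It assumes that a key is simply absent when no problems of that kind were found.

```python
# Keys the warnings dictionary can contain (as of September 2020).
WARNING_KEYS = [
    "mismatched_node_names",
    "unspecified_nodes",
    "unreferenced_nodes",
    "nodes_no_provided_service",
    "nodes_no_requested_service",
    "invalid_competition_type",
    "nodes_requesting_self",
]


def count_problems(warnings_dict):
    """Return {key: number of problems}, using 0 for keys that are absent."""
    return {key: len(warnings_dict.get(key, [])) for key in WARNING_KEYS}


# Hand-written sample standing in for model_validator.warnings.
sample_warnings = {
    "invalid_competition_type": [(57, "Buildings"), (2146, "Dishwashing")],
    "nodes_requesting_self": [(36, "pyCIMS.Canada.Alberta")],
}

counts = count_problems(sample_warnings)
for key, n in counts.items():
    print(f"{key}: {n}")
```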

### Node Name & Node Branch Mismatch <a id="mismatch"></a>
This indicates a node where the node's name and the last element in the node's branch do not match. This is usually the result of a simple typo related to capitalization, white space, or extra characters.  
```
'mismatched_node_names': [(16, 'Albertas', 'Alberta'), 
                          (16, 'Space heating', 'Space Heating')]
```

Each list item indicates a mismatched node and branch name. The tuple contains (1) the row in the Excel file where the mismatch has occurred, (2) the name given to the node in the "Node" column, and (3) the name of the node according to the "Service provided" branch structure.
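
Since these mismatches usually come down to capitalization, white space, or stray characters, a small helper can suggest which kind of typo to look for. This is an illustrative sketch only; `diagnose_mismatch` is a hypothetical function and is not part of pyCIMS.

```python
def diagnose_mismatch(node_name, branch_name):
    """Classify how a node name differs from the last element of its branch."""
    if node_name == branch_name:
        return "match"
    if node_name.lower() == branch_name.lower():
        return "capitalization"
    # Compare the names with all white space removed.
    if "".join(node_name.split()) == "".join(branch_name.split()):
        return "white space"
    return "other (e.g. extra or missing characters)"


# Tuples in the (row, node name, branch name) form described above.
mismatches = [
    (16, "Albertas", "Alberta"),
    (16, "Space heating", "Space Heating"),
]

diagnoses = [
    (row, diagnose_mismatch(node, branch)) for row, node, branch in mismatches
]
for row, diagnosis in diagnoses:
    print(f"Row {row}: {diagnosis}")
```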

### Unspecified Nodes <a id="unspecified"></a>
This indicates a node which is referenced in another node's "service requested" row, but is not specified within the model description. This typically happens because of a typo in the "service requested" row's branch name. In the example below, the branch name in row 49 likely should have been `pyCIMS.Canada.Alberta.Residential.Buildings.Shell`, but an extra `s` was added.

```
'unspecified_nodes': [(49, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shells'),
                      (59, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shells'),
                      (286, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnace')]
```

Each list item indicates a service being requested from a node that was never specified in the model description. The tuple contains (1) the row in the Excel file where the reference is made, and (2) the node from which a service is being requested. 
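
Because unspecified nodes are typically typos, one way to track them down is to look for the closest branch name that *is* specified. The sketch below uses `difflib.get_close_matches` from the Python standard library; `suggest_branch`, the `specified` list, and the `0.8` cutoff are assumptions made for illustration and do not come from pyCIMS.

```python
import difflib


def suggest_branch(unspecified_branch, specified_branches):
    """Return the closest specified branch name, or None if nothing is similar."""
    matches = difflib.get_close_matches(
        unspecified_branch, specified_branches, n=1, cutoff=0.8
    )
    return matches[0] if matches else None


# Hypothetical list of branches that are specified in the model description.
specified = [
    "pyCIMS.Canada.Alberta.Residential.Buildings",
    "pyCIMS.Canada.Alberta.Residential.Buildings.Shell",
]

suggestion = suggest_branch(
    "pyCIMS.Canada.Alberta.Residential.Buildings.Shells", specified
)
print(suggestion)
```

A suggestion is only a hint; the row number in the tuple still needs to be checked in the Excel file.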

### Unreferenced Nodes <a id="unreferenced"></a>
This indicates a node which has been specified in the model description, but has not been requested by another node. This typically happens when the path to the node is incorrectly specified or contains a typo. 

```
'unreferenced_nodes': [(289, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnaces')]
```

Each list item indicates a node specified in the model description but not requested by another node. The tuple contains (1) the row in the Excel file where the node was specified and (2) the name of the node in branch form. 


### Nodes which don't Provide Services <a id="no_provided_services"></a>
This indicates a _non-root_ node which has been specified in the model description, but doesn't have a "service provided" line. 

```
'nodes_no_provided_service': [(873, 'pyCIMS.Canada.Alberta.Commercial')]
```
Each list item indicates a node specified in the model description which does not provide a service. The associated tuple contains (1) the row in the Excel file where the node was specified and (2) the name of the node. 


### Nodes & Technologies which don't Request Services <a id="no_requested_services"></a>
This indicates a node or technology which has been specified in the model description but doesn't request services from other nodes. This won't necessarily raise errors if you were to run the model, but these nodes and technologies should be checked to ensure there isn't a missing service request line. 

```
 'nodes_no_requested_service': [(44391, 'Aviation Turbo Fuel'),
                                (44399, 'Black Liquor'),
                                (2451, 'No AC', 'Existing')]
```

Each list item indicates a node or technology which doesn't request services from other nodes. The associated tuple contains (1) the row in the Excel file where the node or technology was specified, (2) the name of the node, and optionally (3) the name of the technology. The name of the technology is only included when it is a technology, rather than a node, that doesn't request a service. 
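
Since the tuple length distinguishes nodes from technologies, the entries can be separated before review. This is a minimal sketch; `split_nodes_and_technologies` is a hypothetical helper, and the entries are copied from the example above.

```python
def split_nodes_and_technologies(entries):
    """Separate (row, node) tuples from (row, node, technology) tuples."""
    nodes, technologies = [], []
    for entry in entries:
        if len(entry) == 3:
            # A third element means a technology is missing a service request.
            technologies.append(entry)
        else:
            nodes.append(entry)
    return nodes, technologies


entries = [
    (44391, "Aviation Turbo Fuel"),
    (44399, "Black Liquor"),
    (2451, "No AC", "Existing"),
]

nodes, technologies = split_nodes_and_technologies(entries)
print("Nodes:", nodes)
print("Technologies:", technologies)
```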

### Invalid Competition Type <a id="comp"></a>
This indicates a node which has been specified in the model description, but was assigned an invalid competition type. The only valid competition types for nodes are Root, Region, Sector, Sector No Tech, Tech Compete, and Fixed Ratio. Please note that Fixed Market Share is no longer a valid competition type.

```
 'invalid_competition_type': [(57, 'Buildings'),
                              (2146, 'Dishwashing'),
                              (2487, 'Clothes drying')]
```

Each list item indicates a node with an invalid competition type. The associated tuple contains (1) the row in the Excel file where the incorrect competition type was specified and (2) the name of the node. 
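
The rule can be sketched as a simple membership test against the valid types listed above. `VALID_COMPETITION_TYPES` and `is_valid_competition_type` are hypothetical names, and the exact, case-sensitive comparison is an assumption; the validator itself may normalize values differently.

```python
# Valid competition types for nodes, as listed in this section.
VALID_COMPETITION_TYPES = {
    "Root",
    "Region",
    "Sector",
    "Sector No Tech",
    "Tech Compete",
    "Fixed Ratio",
}


def is_valid_competition_type(value):
    """Return True if value exactly matches one of the valid types."""
    # Assumption: the comparison is exact and case-sensitive.
    return value in VALID_COMPETITION_TYPES


for candidate in ["Tech Compete", "Fixed Market Share"]:
    print(candidate, "->", is_valid_competition_type(candidate))
```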

### Nodes Requesting Self <a id="self"></a>
This indicates a node which has been specified in the model description to request services of itself. 

```
'nodes_requesting_self': [(36, 'pyCIMS.Canada.Alberta')]
```

Each list item indicates a node which requests services of itself. The associated tuple contains (1) the row in the Excel file where the self service request is being made and (2) the name of the node making this service request.  

# All the Code
Below, I've grouped together all the code needed for validating the model description. 

In [14]:
import pprint as pp
import pyCIMS

# Alternative model description (previously assigned, then overwritten by the line below):
# model_description_file = '../../pyCIMS_model_description_Alberta_Validated.xlsb'
model_description_file = '../../model_descriptions/pyCIMS_model_description.xlsm'

model_validator = pyCIMS.ModelValidator(model_description_file)
model_validator.validate(verbose=True, raise_warnings=False)
print("Problems\n********")
pp.pprint(model_validator.warnings)

0 nodes were specified but don't provide a service. 
Problems
********
{'invalid_competition_type': [(57, 'Buildings'),
                              (2146, 'Dishwashing'),
                              (2487, 'Clothes drying'),
                              (6830, 'Pumping'),
                              (6838, 'General'),
                              (7035, 'Slurry Stock'),
                              (7172, 'Precision'),
                              (7374, 'Machine Drive'),
                              (8691, 'Pumping'),
                              (8699, 'General'),
                              (8896, 'Slurry Stock'),
                              (9033, 'Precision'),
                              (9170, 'Machine Drive'),
                              (10696, 'Pumping'),
                              (10704, 'General'),
                              (10901, 'Slurry Stock'),
                              (11038, 'Precision'),
                              (11240, 'Machine D