# Model Validation Instructions

This notebook serves as a more thorough introduction to pyCIMS model validation functionality. For this notebook to run properly, ensure:
* pyCIMS is downloaded and installed on your local machine according to the [installation instructions](../../docs/Installation.md)
* The `pyCIMS_env` conda environment has been activated according to the [conda environment instructions](../../docs/WorkingWithCondaEnvironment.md)

For a more general overview of pyCIMS, please see the [Quickstart](Quickstart.ipynb) tutorial.

## Import pyCIMS & other packages

In [None]:
import pyCIMS
import pprint as pp

Now that we have loaded `pyCIMS`, we can use the `ModelValidator`. First, we instantiate the `ModelValidator` class. To do so, we must provide the location of the Excel file specifying the model description.

Optionally, you can also provide a `node_col` parameter. This tells the model validator the name of the column that specifies node names. In the current model description (2020-09-17) this column is `"Node"`. If no value is provided, the parameter defaults to `"Node"`.

## Initialize the `ModelValidator`

In [None]:
model_description_file = '../model_descriptions/pyCIMS_model_description_Alberta_Test.xlsb'

model_validator = pyCIMS.ModelValidator(model_description_file)

Next, we call the `validate()` method of our `model_validator` to check for any errors in the model description. This method has a couple of parameters, which I'll explain below:
* **`verbose`** : Determines whether the method will use print statements to report any problems identified in the model description. Here we have set `verbose` to `True`, so printed statements will let us know about any errors.

* **`raise_warnings`** : Determines whether the method will raise warnings when it identifies problems in the model description. Warnings are more "in your face" than print statements, appearing in red for the user. However, warnings do go away if you run the cell multiple times. Here, we have set `raise_warnings` to `False`. We will just look at the printed statements and the resulting dictionary (next cell).
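
The difference between the two reporting styles can be illustrated with Python's standard `warnings` module. The sketch below is not pyCIMS's implementation: `report_problem` is a hypothetical helper that only mimics the behaviour described above, and the messages are made up for illustration.

```python
import warnings


def report_problem(message, verbose=True, raise_warnings=False):
    """Hypothetical helper mimicking the two reporting styles described above."""
    if verbose:
        # Plain print statement: shown every time the cell runs.
        print(message)
    if raise_warnings:
        # Python warning: written to stderr rather than stdout.
        warnings.warn(message)


# Record warnings in a list so they can be inspected instead of displayed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    report_problem("Node name does not match its branch", raise_warnings=True)
    report_problem("Service requested from an unspecified node", raise_warnings=False)

print(len(caught))
```

By default, Python shows a given warning only once per code location, which is likely why warnings seem to disappear when a cell is run again, while print statements always reappear.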


## Validate the model

In [None]:
model_validator.validate(verbose=True, raise_warnings=False)

In [None]:
model_validator.warnings['nodes_no_requested_service']

## Investigate the Warnings

Regardless of whether you use the `verbose` or `raise_warnings` options in the `validate()` method call, any problems identified can be accessed through the `ModelValidator.warnings` attribute.

In [None]:
pp.pprint(model_validator.warnings)

Ideally the code above returned an empty dictionary. If not, the examples below should help explain what the `warnings` dictionary might contain. 

First off, the `warnings` dictionary can contain any of the following keys:
* [`mismatched_node_names`](#mismatch)
* [`unspecified_nodes`](#unspecified)
* [`unreferenced_nodes`](#unreferenced)
* [`nodes_no_provided_service`](#no_provided_services)
* [`nodes_no_requested_service`](#no_requested_services)
* [`invalid_competition_type`](#comp)
* [`nodes_requesting_self`](#self)
* [`discrepencies_in_model_and_tree`](#discrepencies)
* [`nodes_with_zero_output`](#zero_output)
* [`fuels_without_lcc`](#snt_lcc)
* [`nodes_no_capital_cost`](#no_cap_cost)
* [`nodes_bad_total_market_share`](#bad_ms)
* [`techs_no_base_year_ms`](#no_base_year_ms)
* [`duplicate_req`](#dup_service_req)
* [`bad_service_req`](#bad_service_req)


See the sections below for more information on what each of these keys means.
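
Before reading each section, it can help to get a quick count of how many problems were recorded under each key. The following is a minimal sketch that uses a small sample dictionary built from the examples in this notebook; in a real session you would use `model_validator.warnings` instead. Whether keys with no problems appear as empty lists or are omitted entirely is an assumption here, so the sketch handles both.

```python
import pprint as pp

# Sample data copied from the examples in this notebook. In practice,
# replace this with `model_validator.warnings`.
sample_warnings = {
    'mismatched_node_names': [(16, 'Albertas', 'Alberta'),
                              (16, 'Space heating', 'Space Heating')],
    'nodes_requesting_self': [(36, 'pyCIMS.Canada.Alberta')],
    'nodes_with_zero_output': [],  # assumed: a key may hold an empty list
}

# Count the problems recorded under each key, skipping keys with none.
problem_counts = {key: len(items)
                  for key, items in sample_warnings.items()
                  if items}

pp.pprint(problem_counts)
```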

### Node Name & Node Branch Mismatch <a id="mismatch"></a>
This indicates a node where the node's name and the last element in the node's branch do not match. This is usually the result of a simple typo related to capitalization, white space, or extra characters.  
```
'mismatched_node_names': [(16, 'Albertas', 'Alberta'), 
                          (16, 'Space heating', 'Space Heating')]
```

Each list item indicates a mismatched node and branch name. The tuple contains (1) the row in the Excel file where the mismatch has occurred, (2) the name given to the node in the "Node" column, and (3) the name of the node according to the "Service provided" branch structure.
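
To triage a long list of mismatches, the tuples can be unpacked and each pair of names compared. The sketch below uses a hypothetical helper, `describe_mismatch`, which guesses the kind of typo involved. It is an illustration only and not part of pyCIMS.

```python
def describe_mismatch(node_name, branch_name):
    """Guess why a node name and the last element of its branch differ."""
    if node_name.lower() == branch_name.lower():
        return "capitalization"
    # Compare the names with all white space removed.
    if "".join(node_name.split()) == "".join(branch_name.split()):
        return "white space"
    return "extra or missing characters"


# Example tuples taken from the output shown above.
mismatched_node_names = [(16, 'Albertas', 'Alberta'),
                         (16, 'Space heating', 'Space Heating')]

for row, node_name, branch_name in mismatched_node_names:
    reason = describe_mismatch(node_name, branch_name)
    print(f"Row {row}: {node_name!r} vs {branch_name!r} ({reason})")
```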

### Unspecified Nodes <a id="unspecified"></a>
This indicates a node which is referenced in another node's "service requested" row, but is not specified within the model description. This typically happens because of a typo in the "service requested" row's branch name. In the example below, the branch name in row 49 likely should have been `pyCIMS.Canada.Alberta.Residential.Buildings.Shell`, but an extra `s` was added.

```
'unspecified_nodes': [(49, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shells'),
                      (59, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shells'),
                      (286, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnace')]
```

Each list item indicates a service being requested from a node that was never specified in the model description. The tuple contains (1) the row in the Excel file where the reference is made, and (2) the node from which a service is being requested. 

### Unreferenced Nodes <a id="unreferenced"></a>
This indicates a node which has been specified in the model description, but has not been requested by another node. This typically happens when the path to the node is incorrectly specified or contains a typo. 

```
'unreferenced_nodes': [(289, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnaces')]
```

Each list item indicates a node specified in the model description but not requested by another node. The tuple contains (1) the row in the Excel file where the node was specified and (2) the name of the node in branch form. 
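
Unspecified and unreferenced nodes often come in pairs: a typo in a service request creates one of each. One way to spot such pairs is to compare the two lists with the standard library's `difflib`. This is a sketch under assumptions rather than a pyCIMS feature, and the `0.95` similarity cutoff is an arbitrary choice that you may need to tune.

```python
import difflib

# Example tuples taken from the outputs shown above.
unspecified_nodes = [
    (49, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shells'),
    (286, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnace'),
]
unreferenced_nodes = [
    (289, 'pyCIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnaces'),
]

unreferenced_branches = [branch for _, branch in unreferenced_nodes]

# For each unspecified branch, look for a very similar unreferenced branch.
likely_typos = {}
for row, requested in unspecified_nodes:
    matches = difflib.get_close_matches(requested, unreferenced_branches,
                                        n=1, cutoff=0.95)
    if matches:
        likely_typos[row] = (requested, matches[0])

for row, (requested, candidate) in likely_typos.items():
    print(f"Row {row}: {requested!r} may have been meant as {candidate!r}")
```

Here only the request in row 286 is paired with an unreferenced node. The `Shells` request in row 49 has no close counterpart in the unreferenced list, so it needs to be checked against the full set of specified nodes instead.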


### Nodes which don't Provide Services <a id="no_provided_services"></a>
This indicates a _non-root_ node which has been specified in the model description, but doesn't have a "service provided" line. 

```
'nodes_no_provided_service': [(873, 'pyCIMS.Canada.Alberta.Commercial')]
```
Each list item indicates a node specified in the model description which does not provide a service. The associated tuple contains (1) the row in the Excel file where the node was specified and (2) the name of the node. 


### Nodes & Technologies which don't Request Services <a id="no_requested_services"></a>
This indicates a node or technology which has been specified in the model description but doesn't request services from other nodes. This won't necessarily raise errors if you were to run the model, but these nodes and technologies should be checked to ensure there isn't a missing service request line. 

```
'nodes_no_requested_service': [(44391, 'Aviation Turbo Fuel'),
                               (44399, 'Black Liquor'),
                               (2451, 'No AC', 'Existing')]
```

Each list item indicates a node or technology which doesn't request services from other nodes. The associated tuple contains (1) the row in the Excel file where the node or technology was specified, (2) the name of the node, and optionally (3) the name of the technology. The name of the technology is only included when it is a technology, rather than a node, that doesn't request a service. 
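
Because the tuples have two elements for nodes and three for technologies, the list can be split by tuple length. A minimal sketch using the example output above:

```python
# Example tuples taken from the output shown above.
nodes_no_requested_service = [(44391, 'Aviation Turbo Fuel'),
                              (44399, 'Black Liquor'),
                              (2451, 'No AC', 'Existing')]

# Two-element tuples describe nodes: (row, node name).
nodes = [item for item in nodes_no_requested_service if len(item) == 2]

# Three-element tuples describe technologies: (row, node name, technology name).
technologies = [item for item in nodes_no_requested_service if len(item) == 3]

print("Nodes:", nodes)
print("Technologies:", technologies)
```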

### Invalid Competition Type <a id="comp"></a>
This indicates a node which has been specified in the model description, but was assigned an invalid competition type. The only valid competition types for nodes are Root, Region, Sector, Sector No Tech, Tech Compete, and Fixed Ratio. Please note, Fixed Market Share is no longer a valid competition type.

```
 'invalid_competition_type': [(57, 'Buildings'),
                              (2146, 'Dishwashing'),
                              (2487, 'Clothes drying')]
```

Each list item indicates a node with an invalid competition type. The associated tuple contains (1) the row in the Excel file where the incorrect competition type was specified and (2) the name of the node. 
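
The check itself amounts to a membership test against the list of valid types. The sketch below is illustrative only: `is_valid_competition_type` is a hypothetical helper, and ignoring capitalization and surrounding white space is an assumption that may not match how pyCIMS compares values.

```python
# Valid competition types as listed above.
VALID_COMPETITION_TYPES = {"Root", "Region", "Sector", "Sector No Tech",
                           "Tech Compete", "Fixed Ratio"}

# Assumption: comparison ignores capitalization and surrounding white space.
_NORMALIZED_TYPES = {name.lower() for name in VALID_COMPETITION_TYPES}


def is_valid_competition_type(competition_type):
    """Return True if the value is one of the valid competition types."""
    return competition_type.strip().lower() in _NORMALIZED_TYPES


for value in ["Tech Compete", "Fixed Ratio", "Fixed Market Share"]:
    print(value, "->", is_valid_competition_type(value))
```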

### Nodes Requesting Self <a id="self"></a>
This indicates a node which has been specified in the model description to request services of itself. 

```
'nodes_requesting_self': [(36, 'pyCIMS.Canada.Alberta')]
```

Each list item indicates a node which requests services of itself. The associated tuple contains (1) the row in the Excel file where the self service request is being made and (2) the name of the node making this service request.  

### Discrepancies between Model & Tree sheets <a id="discrepencies"></a>
This indicates nodes which have been defined in different orders in the Model and Tree sheets within the model description. The number of nodes identified often exaggerates the work required to remedy the problem. For example, a single extra or missing node in the Tree sheet will cause every following node to be reported as "out of order".

```
'discrepencies_in_model_and_tree': [(13, None, 'pycims.canada.alberta.residential.buildings.refrigerator')]
```

Each list item indicates a node which has been defined in a different order within the Model and Tree sheets in the model description. The associated tuple contains (1) the order in the Tree sheet where the node was defined, (2) the row in the Model sheet where the node was defined, and (3) the name of the node. A `None` value in either of the first two positions within the tuple indicates the node was not defined within the associated sheet.
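
The cascading effect of a single missing node can be reproduced with two short lists. The node names below are shortened, hypothetical stand-ins for full branch names, and the position-by-position comparison is only an illustration of the idea, not the algorithm pyCIMS uses.

```python
from itertools import zip_longest

# Hypothetical node orders; 'refrigerator' is missing from the Tree sheet.
model_order = ['buildings', 'shell', 'refrigerator', 'lighting', 'dishwashing']
tree_order = ['buildings', 'shell', 'lighting', 'dishwashing']

# Comparing position by position flags every node after the missing one.
out_of_order = [
    (position, tree_node, model_node)
    for position, (tree_node, model_node)
    in enumerate(zip_longest(tree_order, model_order), start=1)
    if tree_node != model_node
]

# Comparing the two sets of names shows there is only one real problem.
missing_from_tree = set(model_order) - set(tree_order)

print(len(out_of_order), "positions differ")
print("Missing from Tree sheet:", missing_from_tree)
```

When many discrepancies are reported, looking for the first one and fixing it before re-running the validation is usually the quickest route.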

### Nodes with Zero Output <a id="zero_output"></a>
This indicates nodes where the output has been exogenously set to 0 for any year(s) within the model description. 

```
'nodes_with_zero_output': [(6090, 'Urban')]
```

Each item in the list indicates a node where output was exogenously set to 0. The associated tuple contains (1) the row number where the node was defined in the model description and (2) the name of the node. 

### "Sector No Tech" Fuels with no LCC <a id="snt_lcc"></a>
This indicates "sector no tech" fuel nodes where an LCC hasn't been exogenously defined. 

```
'fuels_without_lcc': [(43140, 'Byproduct Gas')]
```

Each item in the list indicates a "Sector No Tech" fuel node where LCC wasn't exogenously set. The associated tuple contains (1) the row number where the node was defined in the model description and (2) the name of the node. 

### Technologies missing Capital Cost <a id="no_cap_cost"></a>
This identifies Tech Compete nodes/technologies where the "Capital Cost_overnight" row hasn't been included in the model description. The row does not need to contain year values; it only needs to exist.

```
'nodes_without_capital_cost': [(68,
                             'pyCIMS.Canada.Alberta.Residential.Buildings.Floorspace.Lighting',
                             'Incandescent')]
```

Each item in the list indicates a technology where the `Capital Cost_overnight` row is missing. The associated tuple contains (1) the row number where the node was defined in the model description, (2) the branch name of the node containing the technology, and (3) the name of the technology. 

### Nodes with Bad Total Market Shares <a id="bad_ms"></a>
This indicates nodes where the sum of base year market shares across all technologies does not equal 100%.  

```
'nodes_with_bad_total_ms': [(1150, 'Refrigerators', 1.0499999999999998),
                             (5538, 'Hot Water', 0.684584336),
                             (7005, 'Public Bus', 2.0),
                             (7068, 'Standard Emissions', 0.96)]
```

Each item in the list indicates a node where the base year market shares sum to more or less than 100%. The associated tuple contains (1) the row number where the node was defined in the model description, (2) the name of the node, and (3) the sum of base year market shares across all technologies at that node. 
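
Values such as `1.0499999999999998` in the example above are a reminder that market shares are floating-point numbers, so a sum should be compared to 1 with a small tolerance rather than with `==`. The sketch below shows one way to do this. `has_bad_total_market_share` is a hypothetical helper, and the tolerance is an assumption, not the value used by pyCIMS.

```python
import math


def has_bad_total_market_share(shares, tolerance=1e-9):
    """Return True if the shares do not sum to 1 (i.e. 100%) within tolerance."""
    total = math.fsum(shares)  # fsum limits accumulated rounding error
    return not math.isclose(total, 1.0, abs_tol=tolerance)


# Hypothetical base year market shares for a few nodes.
examples = {
    'sums to 100%': [0.6, 0.4],
    'ten equal shares': [0.1] * 10,
    'sums to 105%': [0.5, 0.3, 0.25],
    'sums to 200%': [1.0, 1.0],
}

for label, shares in examples.items():
    print(label, "->", has_bad_total_market_share(shares))
```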

### Technologies without Base Year Market Shares <a id="no_base_year_ms"></a>

This indicates technologies which are missing base year market shares in the model description.

```
'techs_no_base_year_ms': [(81, 'Lighting', 'Incandescent'),
                           (109, 'Lighting', 'CFL'),
                           (332, 'Single Family Detached', 'single_family_detached_post_1960_furnace')]
```

Each item in the list indicates a technology where the base year market share hasn't been defined in the model description. The associated tuple contains (1) the row number where the market share is missing, (2) the name of the node, and (3) the name of the technology. 

### Nodes & Technologies Requesting the Same Service Twice <a id="dup_service_req"></a>

This indicates nodes or technologies which request services from the same node more than once. 

```
'duplicate_req': [([19, 20], 'Alberta', ''), 
                  ([2083, 2112], 'Furnace', 'Natural Gas efficient')]
```

Each item in the list indicates a node or technology which has made a duplicate request. The associated tuple contains (1) the row numbers where the multiple requests are made, (2) the name of the node, and (3) the name of the technology (if there is one). 
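
Duplicate requests can be found by grouping service request rows by who is asking and what is being asked for. The sketch below is a simplified illustration: the `service_requests` layout and the branch names in it are hypothetical and do not reflect how pyCIMS reads the Excel file.

```python
from collections import defaultdict

# Hypothetical rows: (row number, node, technology, requested branch).
# An empty technology string means the request is made by the node itself.
service_requests = [
    (19, 'Alberta', '', 'pyCIMS.Canada.Alberta.Residential'),
    (20, 'Alberta', '', 'pyCIMS.Canada.Alberta.Residential'),
    (21, 'Alberta', '', 'pyCIMS.Canada.Alberta.Commercial'),
]

# Group row numbers by (node, technology, requested branch).
rows_by_request = defaultdict(list)
for row, node, technology, requested in service_requests:
    rows_by_request[(node, technology, requested)].append(row)

# Keep only the groups where the same request appears more than once.
duplicate_req = [
    (rows, node, technology)
    for (node, technology, _requested), rows in rows_by_request.items()
    if len(rows) > 1
]

print(duplicate_req)
```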

### Nodes & Technologies With Incorrect Service Request Values <a id="bad_service_req"></a>

This identifies nodes/technologies that have a service requested line, but where the values in this line have either been left blank or exogenously specified as 0.

```
'bad_service_req': [(6737, 'Passenger Vehicles'),
                    (6863, 'Existing'),
                    (8819, 'Freight TKT'),
                    (10502, 'Size Reduced Product')]
```

Each item in the list indicates a node or technology which has a service request value missing or set to 0. The associated tuple contains (1) the row number where the missing/zero value is specified and (2) the name of the node or technology.

# All the Code
Below, I've grouped together all the code needed for validating the model description. 

In [None]:
import pprint as pp
import pyCIMS

model_description_file = '../model_descriptions/pyCIMS_model_description_Alberta_Test.xlsb'

model_validator = pyCIMS.ModelValidator(model_description_file)
model_validator.validate(verbose=True, raise_warnings=False)
print("Problems\n********")
pp.pprint(model_validator.warnings)