# Model Validation Instructions

This notebook serves as a more thorough introduction to CIMS model validation functionality. For this notebook to run properly, ensure:
* CIMS is downloaded and installed on your local machine according to the [installation instructions](../../docs/Installation.md)
* The `CIMS_env` conda environment has been activated according to the [conda environment instructions](../../docs/WorkingWithCondaEnvironment.md)

For a more general introduction to CIMS, please see the [Quickstart](Quickstart.ipynb) tutorial.

## Import CIMS & other packages

In [None]:
import CIMS
import pprint as pp

## Initialize the `ModelValidator`

Now that `CIMS` has been imported, we can instantiate the `ModelValidator` class. To instantiate the class we must provide:
1. **`infile`**: the location of the Excel file which specifies the model.
2. **`sheet_map`**: a dictionary specifying which sheets within the Excel file contain certain information. In particular, (1) the sheet(s) specifying the model, under the `model` key, and (2) the sheet containing default parameter values, under the `default_param` key.
3. **`root_node`**: an optional parameter specifying the name of the root node. Defaults to `"CIMS"`.
4. **`node_col`**: an optional parameter specifying the column in the model's Excel file where node names are specified. Defaults to `"Branch"`.
5. **`target_col`**: an optional parameter specifying the column in the model's Excel file where target nodes are specified. Defaults to `"Target"`.

In [None]:
# Specify the location of the model file
model_file = '../models/CIMS_base model 2.xlsb'

# Create the Model Validator
model_validator = CIMS.ModelValidator(
    infile=model_file, 
    sheet_map={
        'model': ['CIMS', 'CAN', 'BC'],
        'default_param': 'Default values'},
    node_col='Branch'
)

## Validate the model

Next, we will use the `validate()` method of our `model_validator` to check for any errors in our model description. This notebook uses one optional parameter of this method:
* **`verbose`**: determines whether the method uses print statements to report any problems identified in the model description. Since we've specified `verbose=True` here, these statements will be printed.

In [None]:
model_validator.validate(verbose=True)

## Investigate the Warnings

Regardless of whether you use the `verbose` option in the `validate()` call, any problems identified can be accessed through the `ModelValidator.warnings` attribute.

In [None]:
pp.pprint(model_validator.warnings)

Ideally the code above returned an empty dictionary. If not, the examples below should help explain what the `warnings` dictionary might contain. 

First off, the `warnings` dictionary can contain up to 16 keys (as of December 2023). These 16 keys are:  
1. [`mismatched_node_names`](#mismatch)
2. [`unspecified_nodes`](#unspecified)
3. [`unreferenced_nodes`](#unreferenced)
4. [`nodes_no_provided_service`](#no_provided_services)
5. [`nodes_no_requested_service`](#no_requested_services)
6. [`invalid_competition_type`](#comp)
7. [`nodes_requesting_self`](#self)
8. [`nodes_with_zero_output`](#zero_output)
9. [`supply_without_lcc_or_price`](#snt_lcc)
10. [`techs_no_base_year_ms`](#no_base_year_ms)
11. [`duplicate_req`](#dup_service_req)
12. [`bad_service_req`](#bad_service_req)
13. [`tech_compete_nodes_no_techs`](#tech_compete_nodes_no_techs)
14. [`market_child_requested`](#market_child_requested)
15. [`techs_revenue_recycling`](#techs_revenue_recycling)
16. [`nodes_with_cop_and_p2000`](#nodes_with_cop_and_p2000)

See the sections below for more information on what each of these keys means.
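
For a quick overview before working through the individual keys, the dictionary can be condensed into a count per key. The following is a minimal sketch and not part of CIMS: `sample_warnings` is a hypothetical stand-in for `model_validator.warnings` built from examples in this notebook, `summarize_warnings` is an illustrative helper, and the sketch assumes each value is a list of flagged items.

```python
# Hypothetical stand-in for `model_validator.warnings`, using entries
# copied from the examples in this notebook.
sample_warnings = {
    'mismatched_node_names': [(16, 'Albertas', 'Alberta'),
                              (16, 'Space heating', 'Space Heating')],
    'nodes_requesting_self': [(36, 'CIMS.Canada.Alberta')],
    'unreferenced_nodes': [],
}


def summarize_warnings(warnings):
    """Return {key: number of flagged items}, skipping keys with no items."""
    return {key: len(items) for key, items in warnings.items() if items}


summary = summarize_warnings(sample_warnings)
for key, count in sorted(summary.items()):
    print(f"{key}: {count}")
```

In a real session, `sample_warnings` would be replaced by `model_validator.warnings`.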

### Node Name & Node Branch Mismatch <a id="mismatch"></a>

This indicates a node where the node's name and the last element in the node's branch do not match. This is usually the result of a simple typo related to capitalization, white space, or extra characters.  
```
'mismatched_node_names': [(16, 'Albertas', 'Alberta'), 
                          (16, 'Space heating', 'Space Heating')]
```

Each list item indicates a mismatched node and branch name. The tuple contains (1) the row in the Excel file where the mismatch has occurred, (2) the name given to the node in the "Node" column, and (3) the name of the node according to the "Service provided" branch structure.
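
Since these mismatches usually come down to capitalization, white space, or extra characters, it can help to label each tuple with its likely cause. The sketch below is illustrative only: `classify_mismatch` is a hypothetical helper, not a CIMS function, and it only considers leading or trailing white space.

```python
def classify_mismatch(node_name, branch_name):
    """Guess why a node name and the last branch element differ."""
    if node_name == branch_name:
        return 'match'
    # Only leading/trailing white space is considered here.
    if node_name.strip() == branch_name.strip():
        return 'whitespace'
    if node_name.lower() == branch_name.lower():
        return 'capitalization'
    if node_name.strip().lower() == branch_name.strip().lower():
        return 'whitespace and capitalization'
    return 'extra or different characters'


# Tuples copied from the example above: (row, node name, branch name).
sample_mismatches = [(16, 'Albertas', 'Alberta'),
                     (16, 'Space heating', 'Space Heating')]

for row, node_name, branch_name in sample_mismatches:
    reason = classify_mismatch(node_name, branch_name)
    print(f"Row {row}: {node_name!r} vs {branch_name!r} -> {reason}")
```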

### Unspecified Nodes <a id="unspecified"></a>

This indicates a node which is referenced in another node's "service requested" row, but is not specified within the model description. This typically happens because of a typo in the "service requested" row's branch name. In the example below, for instance, the branch name in row 49 likely should have been `CIMS.Canada.Alberta.Residential.Buildings.Shell` but an extra `s` was added.

```
'unspecified_nodes': [(49, 'CIMS.Canada.Alberta.Residential.Buildings.Shells'),
                      (59, 'CIMS.Canada.Alberta.Residential.Buildings.Shells'),
                      (286, 'CIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnace')]
```

Each list item indicates a service being requested from a node that was never specified in the model description. The tuple contains (1) the row in the Excel file where the reference is made, and (2) the node from which a service is being requested. 
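
Because unspecified nodes are often typos of real branches, a fuzzy match against the branches that are specified can point to the intended target. This is a sketch under assumptions: `specified_branches` is a hypothetical list (in practice it would come from your model description), `suggest_branch` is an illustrative helper, and the matching uses `difflib` from the Python standard library rather than anything in CIMS.

```python
import difflib

# Hypothetical list of branches that are specified in the model description.
specified_branches = [
    'CIMS.Canada.Alberta.Residential.Buildings.Shell',
    'CIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating',
    'CIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnaces',
]

# Tuples copied from the example above: (row, requested branch).
unspecified = [
    (49, 'CIMS.Canada.Alberta.Residential.Buildings.Shells'),
    (59, 'CIMS.Canada.Alberta.Residential.Buildings.Shells'),
    (286, 'CIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnace'),
]


def suggest_branch(branch, candidates):
    """Return the most similar specified branch, or None if none is close."""
    matches = difflib.get_close_matches(branch, candidates, n=1, cutoff=0.8)
    return matches[0] if matches else None


suggestions = {row: suggest_branch(branch, specified_branches)
               for row, branch in unspecified}

for row, suggestion in suggestions.items():
    print(f"Row {row}: did you mean {suggestion!r}?")
```

A suggestion is only a hint; the row in the Excel file should still be checked by hand.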

### Unreferenced Nodes <a id="unreferenced"></a>

This indicates a node which has been specified in the model description, but has not been requested by another node. This typically happens when the path to the node is incorrectly specified or contains a typo. 

```
'unreferenced_nodes': [(289, 'CIMS.Canada.Alberta.Residential.Buildings.Shell.Space heating.Furnaces')]
```

Each list item indicates a node specified in the model description but not requested by another node. The tuple contains (1) the row in the Excel file where the node was specified and (2) the name of the node in branch form. 

### Nodes which don't Provide Services <a id="no_provided_services"></a>

This indicates a _non-root_ node which has been specified in the model description, but doesn't have a "service provided" line. 

```
'nodes_no_provided_service': [(873, 'CIMS.Canada.Alberta.Commercial')]
```

Each list item indicates a node specified in the model description which does not provide a service. The associated tuple contains (1) the row in the Excel file where the node was specified and (2) the name of the node. 

### Nodes & Technologies which don't Request Services <a id="no_requested_services"></a>

This indicates a node or technology which has been specified in the model description but doesn't request services from other nodes. This won't necessarily raise errors if you were to run the model, but these nodes and technologies should be checked to ensure there isn't a missing service request line. 

```
'nodes_no_requested_service': [(44391, 'Aviation Turbo Fuel', ''),
                               (44399, 'Black Liquor', ''),
                               (2451, 'No AC', 'Existing')]
```

Each list item indicates a node or technology which doesn't request services from other nodes. The associated tuple contains (1) the row in the Excel file where the node or technology was specified, (2) the name of the node, and (3) the name of the technology (if it exists).
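
When reviewing this list it can be convenient to separate node-level entries from technology-level ones. The sketch below assumes, based on the example above, that an empty string in the third position means the entry refers to a node rather than a technology; the variable names are illustrative.

```python
# Tuples copied from the example above: (row, node name, technology name).
sample_no_requests = [(44391, 'Aviation Turbo Fuel', ''),
                      (44399, 'Black Liquor', ''),
                      (2451, 'No AC', 'Existing')]

# Assumption: an empty technology name marks a node-level entry.
node_entries = [(row, node) for row, node, tech in sample_no_requests if not tech]
tech_entries = [(row, node, tech) for row, node, tech in sample_no_requests if tech]

print("Nodes to check:", node_entries)
print("Technologies to check:", tech_entries)
```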

### Invalid Competition Type <a id="comp"></a>

This indicates a node which has been specified in the model description, but was assigned an invalid competition type. As of October 2023, the valid competition types are:
* Root
* Region
* Sector
* Tech Compete
* Node Tech Compete
* Fixed Ratio
* Market
* Supply - Fixed Price
* Supply - Cost Curve Annual
* Supply - Cost Curve Cumulative

```
 'invalid_competition_type': [(57, 'Buildings'),
                              (2146, 'Dishwashing'),
                              (2487, 'Clothes drying')]
```

Each list item indicates a node with an invalid competition type. The associated tuple contains (1) the row in the Excel file where the incorrect competition type was specified and (2) the name of the node. 
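
A candidate competition type can be checked against the list above before editing the Excel file. This is a rough sketch, not the validator's own logic: `check_competition_type` is a hypothetical helper, and it is an assumption here that values must match the listed spelling exactly, with differences in case or spacing flagged as invalid.

```python
# Valid competition types as listed above (as of October 2023).
VALID_COMPETITION_TYPES = [
    'Root', 'Region', 'Sector', 'Tech Compete', 'Node Tech Compete',
    'Fixed Ratio', 'Market', 'Supply - Fixed Price',
    'Supply - Cost Curve Annual', 'Supply - Cost Curve Cumulative',
]


def check_competition_type(value):
    """Return (is_valid, suggestion) for a competition type string.

    `suggestion` is the listed spelling when `value` differs only by
    case or spacing; otherwise it is None.
    """
    if value in VALID_COMPETITION_TYPES:
        return True, None
    by_normalized = {v.lower(): v for v in VALID_COMPETITION_TYPES}
    normalized = ' '.join(value.lower().split())
    return False, by_normalized.get(normalized)


print(check_competition_type('Tech Compete'))
print(check_competition_type('tech  compete'))
print(check_competition_type('Tech Competition'))
```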

### Nodes Requesting Self <a id="self"></a>

This indicates a node which has been specified in the model description to request services of itself. 

```
'nodes_requesting_self': [(36, 'CIMS.Canada.Alberta')]
```

Each list item indicates a node which requests services of itself. The associated tuple contains (1) the row in the Excel file where the self service request is being made and (2) the name of the node making this service request.  

### Nodes with Zero Output <a id="zero_output"></a>

This indicates nodes where the output parameter has been exogenously set to 0 for any year(s) within the model description. 

```
'nodes_with_zero_output': [(6090, 'Urban')]
```

Each item in the list indicates a node where output was exogenously set to 0. The associated tuple contains (1) the row number where the node was defined in the model description and (2) the name of the node. 

### Supply Nodes with no LCC or Price<a id="snt_lcc"></a>

This indicates "sector no tech" supply nodes where an LCC hasn't been exogenously defined. 

```
'supply_without_lcc_or_price': [(43140, 'Byproduct Gas')]
```

Each item in the list indicates a "Sector No Tech" supply node where LCC wasn't exogenously set. The associated tuple contains (1) the row number where the node was defined in the model description and (2) the name of the node. 

### Technologies without Base Year Market Shares <a id="no_base_year_ms"></a>

This indicates technologies which are missing base year market shares in the model description.

```
'techs_no_base_year_ms': [[(81, 'Lighting', 'Incandescent'),
                           (109, 'Lighting', 'CFL'),
                           (332, 'Single Family Detached', 'single_family_detached_post_1960_furnace')]]
```

Each item in the list indicates a technology where the base year market share hasn't been defined in the model description. The associated tuple contains (1) the row number where the market share is missing, (2) the name of the node, and (3) the name of the technology. 

### Nodes & Technologies Requesting the Same Service Twice <a id="dup_service_req"></a>

This indicates nodes or technologies which request services from the same node more than once. 

```
'duplicate_req': [([19, 20], 'Alberta', ''), 
                  ([2083, 2112], 'Furnace', 'Natural Gas efficient')]
```

Each item in the list indicates a node or technology which has made a duplicate request. The associated tuple contains (1) the row numbers where the multiple requests are made, (2) the name of the node, and (3) the name of the technology (if there is one). 
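
Since each tuple carries a list of row numbers, flattening them gives a single checklist of rows to inspect in the Excel file. A minimal sketch using the example above (the variable names are illustrative):

```python
# Tuples copied from the example above: (rows, node name, technology name).
sample_duplicates = [([19, 20], 'Alberta', ''),
                     ([2083, 2112], 'Furnace', 'Natural Gas efficient')]

# Collect every row number involved in a duplicate request, sorted.
rows_to_review = sorted({row
                         for rows, _node, _tech in sample_duplicates
                         for row in rows})

print(rows_to_review)
```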

### Nodes & Technologies With Incorrect Service Request Values <a id="bad_service_req"></a>

This identifies nodes/technologies that have a service requested line, but where the values in this line have either been left blank or exogenously specified as 0.

```
'bad_service_req': [(6737, 'Passenger Vehicles'),
                    (6863, 'Existing'),
                    (8819, 'Freight TKT'),
                    (10502, 'Size Reduced Product')]
```

Each item in the list indicates a node or technology which has a service request value missing or set to 0. The associated tuple contains (1) the row number where the missing/zero value is specified and (2) the name of the node.

### Tech Compete Nodes without Technologies <a id="tech_compete_nodes_no_techs"></a>

This identifies tech compete nodes that contain neither "Technology" nor "Service" headings, thereby appearing to CIMS as if not having a technology or service at all. 

```
'tech_compete_nodes_no_techs': [[(2565, 'No AC'), 
                                 (14057, 'CCS')]]
```

Each item in the list indicates a tech compete node that doesn't have a technology/service header. The associated tuple contains (1) the row number where the identified node can be found and (2) the name of the node.

### Nodes & Technologies Requesting from Children of Markets <a id="market_child_requested"></a>

This identifies nodes and technologies that request services from nodes which are children of market nodes. These requests should be made directly to the markets, rather than their children.

```
'market_child_requested': [(364, 
                            'CIMS.Canada.British Columbia.Coal Mining.Coal.Raw Product.Transportation', 
                            'CIMS.Generic Fuels.Diesel'), 
                           (10733, 
                            'CIMS.Canada.British Columbia.Ethanol.Agricultural Input', 
                            'CIMS.Generic Fuels.Diesel') 
                           ]
```

Each item in the list corresponds to a request made by a node or technology to a node which is part of a market. The associated tuple contains (1) the index where the identified node can be found, (2) the branch of the node making the request, and (3) the branch of the node being requested.
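
Several requesters often point at the same market child, so grouping the tuples by the requested branch shows which targets need redirecting to their market. The sketch below uses the example tuples above; `requests_by_target` is an illustrative name, not a CIMS attribute.

```python
from collections import defaultdict

# Tuples copied from the example above:
# (index, branch making the request, branch being requested).
sample_requests = [
    (364,
     'CIMS.Canada.British Columbia.Coal Mining.Coal.Raw Product.Transportation',
     'CIMS.Generic Fuels.Diesel'),
    (10733,
     'CIMS.Canada.British Columbia.Ethanol.Agricultural Input',
     'CIMS.Generic Fuels.Diesel'),
]

# Group the requests by the market child being requested.
requests_by_target = defaultdict(list)
for index, requester, requested in sample_requests:
    requests_by_target[requested].append((index, requester))

for requested, requesters in requests_by_target.items():
    print(f"{requested} is requested by:")
    for index, requester in requesters:
        print(f"  index {index}: {requester}")
```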

### Technologies with Revenue Recycling <a id="techs_revenue_recycling"></a>

This identifies technologies that are attempting to revenue recycle. Revenue recycling should only happen at nodes, never at techs.

```
'techs_revenue_recycling': [(240, 'Extraction', 'Extraction of coal')]
```

Each item in the list corresponds to a technology that is set to do revenue recycling. The associated tuple contains (1) the index where the identified technology can be found, (2) the name of the node, and (3) the name of the technology.

### Nodes with both COP & P2000 <a id="nodes_with_cop_and_p2000"></a>

This identifies nodes where both COP & P2000 have been exogenously defined.  

```
'nodes_with_cop_and_p2000': [(225.0, 'CIMS.Canada.British Columbia.Coal Mining.Coal.Raw Product.Extraction')]
```

Each item in the list indicates a node where COP & P2000 have both been provided exogenously. The associated tuple contains (1) the row index where the identified node can be found and (2) the branch name of the identified node.

# All the Code
Below, I've grouped together all the code needed for validating the model description. 

In [None]:
import pprint as pp
import CIMS

model_file = '../models/CIMS_base model.xlsb'

# Create the Model Validator
model_validator = CIMS.ModelValidator(
    infile=model_file, 
    sheet_map={
        'model': ['CIMS', 'CAN', 'BC'],
        'default_param': 'Default values'},
    node_col='Branch'
)

model_validator.validate(verbose=True)
print("Problems\n********")
pp.pprint(model_validator.warnings)