# Welcome
Solution Extraction is a process by which we take a Project Drawdown Solution, in the form of an Excel Workbook, and create a corresponding python solution that implements _most_ of the same functionality.  This notebook will guide you through that process.  See also `Extraction_Guide.md` for more explanation and notes.

The first step is to _make a copy of this notebook_.  Give it a name that represents the model you will be working on.  That way it won't collide with other notebooks when you check in or merge fixes.

## Setup


In [None]:
import sys
sys.path.append('../')   # If you move this notebook to another location, change this path to point to the root directory of the solutions project

from tools import solution_xls_extract as sxe
from tools import create_expected_zip as cez
from tools import expected_ghost
from solution import factory
from pathlib import Path
import pandas as pd
import openpyxl
import importlib

In [None]:
# Identify where you will be storing your Excel file while you work on it, and what directory the final result will go into.

excelfile = Path("C:\\Working\\ModelsNew\\Glass_RRS_Model_Residential-Nov19.xlsm")
outdir = Path("C:\\Working\\solutions\\solution\\residentialglass")
outdir.mkdir(parents=True, exist_ok=True)

In [None]:
# If you make changes to the extraction code (or any other code), reload it
# NOTE: This kind of reloading DOES NOT work for solutions themselves, unfortunately.  If you re-generate or modify your solution,
# you have to restart the Jupyter kernel to get it to reload properly.

importlib.reload(sxe)

## Extract Code

> Note: if you are working on a model that has already been extracted, skip this step and move on to whichever next step is appropriate.

Extraction is done by the `sxe.output_solution_python_file` function.  This function reads most of the data it needs from the `ScenarioRecord` tab, reads additional data from the TAM, Adoption and other tabs, and writes the result to a solution directory as an `__init__.py` file plus a number of CSV and JSON files.  All of the solutions in `/solution` were produced this way.

In [None]:
# Expect to see some warnings from openpyxl; these can be ignored.  If there are other warnings, please note them, but they are not necessarily
# a problem.

sxe.output_solution_python_file(outputdir=outdir, xl_filename=str(excelfile))
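A quick way to confirm that the extraction produced something sensible is to look at what landed in `outdir`. The helper below is a minimal sketch; `summarize_extraction` is a hypothetical name, not part of `tools/`. It only counts files by extension and checks for `__init__.py`; it does not validate their contents.

```python
from collections import Counter
from pathlib import Path


def summarize_extraction(outdir):
    """Hypothetical helper: count files under outdir by suffix and check for __init__.py.

    An empty count, or a missing __init__.py, suggests the extraction did not complete.
    """
    outdir = Path(outdir)
    # Walk the whole tree (including subdirectories such as ac/) and tally file suffixes.
    counts = Counter(p.suffix or "(none)" for p in outdir.rglob("*") if p.is_file())
    has_init = (outdir / "__init__.py").is_file()
    return has_init, dict(counts)
```

Run `summarize_extraction(outdir)` after the extraction cell; a `False` first value means no `__init__.py` was written.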

In [None]:
# %debug is your friend.  If the extraction fails with an exception, jump in and see if anything looks wrong

%debug

It is not uncommon to encounter issues at this stage or later.  I can't overemphasize this: 
> **Finding, researching and reporting issues is hugely valuable for us, even if you don't fully solve them.**

As you work through issues, please keep a log of what you have done; it can help the next person to pick up where you leave off.  Our convention is to create a file named `changelog` in your solution directory, so the information stays with the solution.
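If you prefer to append to the changelog from the notebook, a small helper like the following works. `log_change` is a hypothetical helper, and the one-line `date (author): message` format is only an assumption; match whatever style the existing changelog files already use.

```python
from datetime import date
from pathlib import Path


def log_change(solution_dir, message, author=None, when=None):
    """Hypothetical helper: append a dated entry to <solution_dir>/changelog.

    The file is created if it does not exist yet. Returns the changelog path.
    """
    when = when or date.today()
    header = when.isoformat() + (f" ({author})" if author else "")
    path = Path(solution_dir) / "changelog"
    # Append mode, so earlier entries are never overwritten.
    with path.open("a", encoding="utf-8") as f:
        f.write(f"{header}: {message}\n")
    return path
```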

## Prune the Number of Scenarios, and set PDS1, PDS2 and PDS3

Some solutions contain a lot of scenarios.  We do not intend the solutions repository to be a source of historical scenario data, so we need just a few of the most recent scenarios and can delete the rest.  The scenarios we want are:

* One set of PDS1, PDS2 and PDS3 scenarios that are recent.  These will usually have PDS1/2/3 in their names, or they may instead have the labels "Plausible", "Drawdown" and "Optimum" (which mean the same thing).  We prefer the most recent set of three.
* If there is a set of three scenarios labeled something like "Book Version", keep those as well.

Make a git commit that has all the generated scenarios in it, then delete scenarios (in the solution's ac/ subdir) that we don't need.
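Before deleting anything, it helps to see every scenario name next to its file. The sketch below assumes each file in `ac/` is a JSON object whose top-level `"name"` key holds the scenario name (this notebook only says the name is near the top of the file, so check one file first). `scenario_names` and `prune_scenarios` are hypothetical helpers; `prune_scenarios` defaults to a dry run, so nothing is removed until you pass `dry_run=False`.

```python
import json
from pathlib import Path


def scenario_names(solution_dir):
    """Hypothetical helper: map scenario name -> JSON file for <solution_dir>/ac/*.json.

    Assumes each file is a JSON object with a top-level "name" key.
    """
    names = {}
    for path in sorted(Path(solution_dir, "ac").glob("*.json")):
        with path.open(encoding="utf-8") as f:
            names[json.load(f)["name"]] = path
    return names


def prune_scenarios(solution_dir, keep, dry_run=True):
    """Hypothetical helper: delete scenario files whose name is not in `keep`.

    Returns the paths that were (or, in a dry run, would be) removed.
    """
    removed = []
    for name, path in scenario_names(solution_dir).items():
        if name not in keep:
            removed.append(path)
            if not dry_run:
                path.unlink()
    return removed
```

Because you committed all the generated scenarios first, anything pruned by mistake can be recovered from git.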

Inside the `__init__.py` file you will find three fields like this:
````python
    # These are the "default" scenarios to use for each of the drawdown categories.
    # They should be set to the most recent "official" set"
    PDS1 = "NOT SET"
    PDS2 = "NOT SET"
    PDS3 = "NOT SET"
````

Set these values to the names of the good set of PDS1, 2 and 3 scenarios.  (The name of the scenario is found on the first line or so of the scenario json file.)
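Editing these three lines by hand is simple enough, but the same change can be scripted. This is a sketch under the assumption that the lines appear exactly as shown above (`PDSn = "NOT SET"`); `set_default_scenarios` is a hypothetical helper, and the names you pass in must be copied from the scenario JSON files.

```python
import json
import re
from pathlib import Path

# Matches lines like `    PDS1 = "NOT SET"`, keeping the leading indentation in group 1.
_PDS_LINE = re.compile(r'^([ \t]*)PDS([123]) = "NOT SET"$', re.MULTILINE)


def set_default_scenarios(init_file, pds1, pds2, pds3):
    """Hypothetical helper: fill in PDS1/PDS2/PDS3 in a solution's __init__.py.

    Only lines still set to "NOT SET" are changed. Returns the number of lines replaced.
    """
    path = Path(init_file)
    values = {"1": pds1, "2": pds2, "3": pds3}
    # json.dumps produces a double-quoted, properly escaped string literal.
    new_text, count = _PDS_LINE.subn(
        lambda m: f"{m.group(1)}PDS{m.group(2)} = {json.dumps(values[m.group(2)])}",
        path.read_text(encoding="utf-8"),
    )
    path.write_text(new_text, encoding="utf-8")
    return count
```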

## Load Code / Sniff Test

Once the code has been successfully extracted and placed into a directory in `solution/`, all the tools that work with solutions should become available.  We can try loading one of the scenarios you defined above:

In [None]:
myscenario = factory.load_scenario("residentialglass", "PDS1")

And if it doesn't work...

In [None]:
# %debug is your friend.

%debug

## Look at some results

TODO: it would be nice to put some examples below, for example a small graph of something.

In [None]:
myscenario.c2.co2_mmt_reduced()
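As a starting point for that TODO, here is a minimal sketch that reduces a result table to a single total. It assumes `co2_mmt_reduced()` returns a pandas DataFrame indexed by year with one column per region (including `"World"`); inspect the output of the cell above to confirm that before relying on it. The 2020-2050 default window and the name `total_reduction` are illustrative choices, not part of the solutions code.

```python
import pandas as pd


def total_reduction(df: pd.DataFrame, region="World", start=2020, end=2050) -> float:
    """Hypothetical helper: sum one region's yearly values over [start, end], inclusive.

    Assumes df is indexed by (sorted) year with one column per region.
    """
    # Label-based slicing on the year index; both endpoints are included.
    return float(df.loc[start:end, region].sum())
```

In the notebook, `reduced = myscenario.c2.co2_mmt_reduced()` followed by `total_reduction(reduced)` gives a single number to sanity-check, and `reduced.loc[2020:2050, "World"].plot()` draws a quick line chart (pandas plotting requires matplotlib).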

## Create Test Results

**If you don't have Windows Excel (Mac Excel has trouble with some of the macros), ask someone to do this process for you.**

Create a clean temporary directory to generate the test set in.  Put (a copy of) your Excel spreadsheet in that directory.

Follow the instructions in `tools/CREATING_EXPECTED_ZIP.md` to create the CSV files in that directory.

In [None]:
# Run the VB macros first!

# Assemble the resulting csv files into the expected_zip file

csvdirectory = Path("C:\\Working\\temp")
cez.create_expected_zip(csvdirectory)
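Before copying the zip into the solution, you may want to confirm that it exists and is readable. This sketch uses only the standard `zipfile` module. It assumes `create_expected_zip` writes `expected.zip` into `csvdirectory` (which is where the next cell copies it from) and makes no assumptions about the names of the files inside; `list_expected_zip` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def list_expected_zip(zip_path):
    """Hypothetical helper: return the sorted member names of an expected.zip.

    Raises FileNotFoundError if the file is missing and BadZipFile if a member is corrupt.
    """
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"{zip_path} not found; did create_expected_zip run?")
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # name of the first member with a bad CRC, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member in {zip_path}: {bad}")
        return sorted(zf.namelist())
```

`list_expected_zip(csvdirectory / "expected.zip")` returns the member names, so you can check that the CSVs from the macro run are all there.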

In [None]:
# Copy the resulting file where it belongs (shutil.copy also works on Windows, unlike `!cp`).
import shutil

testdirectory = outdir / "tests"
testdirectory.mkdir(exist_ok=True)

shutil.copy(csvdirectory / "expected.zip", testdirectory / "expected.zip")

## Create the Solution Test File

Copy the template file `tools/solution_test_template.py` to your new `tests` directory and give it a unique name based on the solution name:


In [None]:
solution_name='residentialglass'
solution_testfile_name=f"test_{solution_name}.py"

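import shutil  # cross-platform replacement for `!cp`
# The path assumes the project root is '../', as in the sys.path setting of the Setup cell.
shutil.copy(Path("../tools/solution_test_template.py"), testdirectory / solution_testfile_name)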

In [None]:
!pytest $outdir

If errors occur, look through the error output for an Excel range (for example `Q135:AA181`).  Search for this string in `tools/expected_result_tester.py` to find the specific test that failed.  From there, work back to the usual question: is this a failure in extraction, in the model code, in the Excel workbook, or in the test itself?  Rinse and repeat.
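When the pytest output is long, pulling out just the Excel ranges makes it easier to see which tests failed. A minimal sketch, assuming the ranges appear in the usual A1 form (like `Q135:AA181`); `failing_ranges` is a hypothetical helper.

```python
import re

# A1-style ranges: 1-3 column letters and a row number, a colon, then the same again.
EXCEL_RANGE = re.compile(r"\b[A-Z]{1,3}[0-9]+:[A-Z]{1,3}[0-9]+\b")


def failing_ranges(pytest_output):
    """Hypothetical helper: distinct Excel ranges in pytest output, in first-seen order."""
    return list(dict.fromkeys(EXCEL_RANGE.findall(pytest_output)))
```

You can capture the output in the notebook with `subprocess.run(["pytest", str(outdir)], capture_output=True, text=True)` and pass its `.stdout` to `failing_ranges`.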

## Controlling which Tests Run

The solution results tests actually run many, many tests.  You may want to skip past some of those tests to find and work on other issues.  There is a way to do that, but it requires running the tests from within Python, rather than via pytest.

If you look at the second function definition in your test file, you will see that it takes some extra arguments you can set:
* `scenario_skip`: if present, a list of scenario indices to skip over
* `test_skip`: if present, a list of strings that should match the descriptions of tests to skip
* `test_only`: if present, a list of strings such that _only_ tests whose description matches one of them will be executed

So, for example, you could skip the second scenario, run only the 'First Cost' and 'Operating Cost' tests, and skip the first 'First Cost' test, with the following (substituting your own solution name, of course):

In [None]:
import solution.afforestation.tests.test_afforestation as mytests
mytests.test_deep_results(scenario_skip=[1],test_only=['First Cost', 'Operating Cost'], test_skip=['C37:C82'])
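To reason about which tests a given combination of arguments will run, it may help to spell out the filtering rule. The function below is a hypothetical illustration, not the code in your test file: it assumes that "matches" means the string appears somewhere in the test description, and that `test_skip` is applied on top of `test_only`. Check the actual test code if the results surprise you.

```python
def should_run(description, test_only=None, test_skip=None):
    """Hypothetical illustration of how test_only and test_skip might combine.

    Assumes substring matching against the test description.
    """
    # test_only: if given, the description must contain at least one of the strings.
    if test_only and not any(s in description for s in test_only):
        return False
    # test_skip: any match excludes the test, even if test_only selected it.
    if test_skip and any(s in description for s in test_skip):
        return False
    return True
```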

# Stuff that Often Goes Wrong:

By far the most problems happen in the Adoption, Helper Tables and Unit Adoption classes; if you make it past those, you are usually about done.

* The formulas in the top row and first column of the Helper Tables tab (for each of the REF and PDS tables) vary from model to model. The extractor traditionally tried to keep track of which models use which variants, but that code is inevitably brittle, so I've put less emphasis on it.  Instead, read the documentation on the Helper Tables `init` options, look at the Excel formulas, and set the arguments appropriately.

* Sometimes VMA data is missing.  This can happen in particular when proprietary data has been removed from the Excel model.  It will usually show up as an error when loading the VMA data, and may also show up as the Excel model computing NaN where the Python code computes a number. Fix the first part (loading the VMA) by changing the entries in the scenario files from something like this (the `statistic` may also be an empty string, `""`):
  ````
  "conv_lifetime_capacity": {
      "value": 9.73837488548039,
      "statistic": "mean"
  },
  ````
  to this:
  ````
  "conv_lifetime_capacity": 9.73837488548039,
  ````
  The second part of the problem arises because Excel itself relies on those missing values.  This is a case where it is legitimate to exclude tests on a permanent basis.

* Sometimes Excel workbooks compute values on the Advanced Controls tab, and those computations override the scenario values.  In Excel, loading a scenario copies all the scenario information from the ScenarioRecord tab into the places it belongs on the Advanced Controls tab.  But some workbooks add formulas to calculate some of those values, and the formula overrides the scenario value. The correct fix is to remove the offending formula from the Excel workbook and re-create the expected.zip file with it out of the way.
  If you do this, __be sure to document it in the changelog__.

* Sometimes the names of data sets (particularly adoption data sources) get changed, and older scenarios are never updated to the new name.
  You can fix this either by changing the name in the scenario to match the current data source name as found in `__init__.py`
  (usually it is obvious which data source was meant), or by deleting the scenario, if it isn't an important one.
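For the missing-VMA case, if many entries in many scenario files need the change, it can be scripted. This sketch assumes the affected entries are top-level keys of each scenario JSON object, as in the example above; `flatten_vma_values` is a hypothetical helper. Rewriting the file with `json.dumps` may change its whitespace, so review the result with `git diff` before committing.

```python
import json
from pathlib import Path


def flatten_vma_values(scenario_file, keys):
    """Hypothetical helper: replace {"value": v, "statistic": ...} entries with the bare value v.

    Only the named top-level keys are touched, and only if they hold such a dict.
    Returns the keys that were flattened; the file is rewritten only if something changed.
    """
    path = Path(scenario_file)
    data = json.loads(path.read_text(encoding="utf-8"))
    changed = []
    for key in keys:
        entry = data.get(key)
        if isinstance(entry, dict) and "value" in entry:
            data[key] = entry["value"]
            changed.append(key)
    if changed:
        # ensure_ascii=False keeps any non-ASCII scenario text readable.
        path.write_text(json.dumps(data, indent=4, ensure_ascii=False) + "\n", encoding="utf-8")
    return changed
```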

# Tips

## Don't forget to restart a Jupyter Notebook kernel if you have modified code

If you change code, you need to either reload the library (the 3rd cell of this notebook) or restart the kernel.  Rather than try to figure out whether it is safe to reload, I just restart the kernel every time.

## When comparing to Excel, make sure you've loaded the right Scenario

On the `ScenarioRecord` tab, cell `B9` shows the currently loaded scenario.  When a workbook is first opened, this is usually empty, meaning you don't know which scenario was last loaded.  Select the scenario you are debugging against from the dropdown, and click on 'Load Scenario'.

## Beautifier for Excel Formulas

Are you looking at an excel formula with five nested `IF(...` expressions?  Try [https://www.excelformulabeautifier.com/](https://www.excelformulabeautifier.com/).  You're welcome.


## Look through changelog files for other solutions

Look through changelog files to see if someone encountered a similar problem already, and how they solved it.
Pay particular attention to solutions in the same sector (like electricity generation or transportation), since those tend to be constructed in the same way.
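A quick way to search every solution's changelog at once is sketched below. It assumes the solutions live in one directory (such as `solution/`), each with a `changelog` file at its top level, following the convention described earlier; `search_changelogs` is a hypothetical helper that does a plain case-insensitive substring search.

```python
from pathlib import Path


def search_changelogs(solution_root, keyword):
    """Hypothetical helper: yield (solution name, line) for changelog lines containing keyword.

    Matching is a case-insensitive substring search.
    """
    needle = keyword.lower()
    for log in sorted(Path(solution_root).glob("*/changelog")):
        for line in log.read_text(encoding="utf-8", errors="replace").splitlines():
            if needle in line.lower():
                yield log.parent.name, line.strip()
```

For example, `for name, line in search_changelogs("../solution", "Helper Tables"): print(name, line)` from this notebook's directory.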

## Compare your Excel solution to other solutions

If you think the python code seems to be doing the wrong thing, it may be that your Excel workbook has a different implementation than other workbooks.
The best way to check this is to use the Multi-Excel-Sample tool (in the tools directory) to look at the same bit of Excel from _all_ the workbooks.
(If you don't have permission to look at all the workbooks, ask someone who does to produce the sample for you.)

# Contributing your Result

Ideally you end up with a clean test run.  But even if you don't, we want to use the work you have done.  If you have made it as far as getting a Scenario object to load, please create a PR with your result.  Make sure to include any changes you made, and your observations of what worked and didn't, in a `changelog` file in your solution directory.

Thank you for helping!