<a href="https://colab.research.google.com/github/JasonMorrisSC/openfisca-canada/blob/add_section_3/OAS_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using Jupyter Notebooks with OpenFisca for Rules As Code

This notebook is an experiment to see if it is possible to implement a piece of legislation in OpenFisca through a notebook, and to see how isomorphic the implementation can be.

The idea is to include a section of the law, followed by a section of OpenFisca code, and then run some Python code to demonstrate that the encoding is working as intended.

## Install OpenFisca

The first step is to install OpenFisca. Click Run on the code below to do that first.

In [None]:
!pip install openfisca-core

Collecting openfisca-core
  Downloading OpenFisca_Core-35.7.3-py3-none-any.whl (212 kB)
Collecting pytest<6.0.0,>=4.4.1
  Downloading pytest-5.4.3-py3-none-any.whl (248 kB)
Collecting dpath<2.0.0,>=1.5.0
  Downloading dpath-1.5.0.tar.gz (17 kB)
Collecting nptyping==1.4.4
  Downloading nptyping-1.4.4-py3-none-any.whl (31 kB)
Collecting sortedcontainers==2.2.2
  Downloading sortedcontainers-2.2.2-py2.py3-none-any.whl (29 kB)
Collecting typish>=1.7.0
  Downloading typish-1.9.3-py3-none-any.whl (45 kB)
Collecting pluggy<1.0,>=0.12
  Downloading pluggy-0.13.1-py2.py3-none-any.whl (18 kB)
Building wheels for collected packages: dpath
  Building wheel for dpath (setup.py) ... done
  Created wheel for dpath: filename=dpath-1.5.0-py3-none-any.whl size=14607 sha256=a7f922deda77231b9c43d36f627bf50567ff

## Encode the Law

Next, we will encode a piece of legislation: section 3 of the Old Age Security Act.

### Text of the section

>Monthly Pension
>
>Pension Payable
>
>Payment of full pension
>
>3 (1) Subject to this Act and the regulations, a full monthly pension may be paid to
>
>(a) every person who was a pensioner on July 1, 1977;
>
>(b) every person who
>
>(i) on July 1, 1977 was not a pensioner but had attained twenty-five years of age and resided in Canada or, if that person did not reside in Canada, had resided in Canada for any period after attaining eighteen years of age or possessed a valid immigration visa,
>
>(ii) has attained sixty-five years of age, and
>
>(iii) has resided in Canada for the ten years immediately preceding the day on which that person’s application is approved or, if that person has not so resided, has, after attaining eighteen years of age, been present in Canada prior to those ten years for an aggregate period at least equal to three times the aggregate periods of absence from Canada during those ten years, and has resided in Canada for at least one year immediately preceding the day on which that person’s application is approved; and
>
>(c) every person who
>
>(i) was not a pensioner on July 1, 1977,
>
>(ii) has attained sixty-five years of age, and
>
>(iii) has resided in Canada after attaining eighteen years of age and prior to the day on which that person’s application is approved for an aggregate period of at least forty years.

### Setting Parameters

In OpenFisca, some values are defined as "parameters". Parameters are values that can be changed to see what effect changing them would have on the people subject to the legislation. For example, a tax rate might be specified as a parameter to allow people to change it in a user interface.

In this case, we are going to define "age of eligibility for OAS" as a parameter. Then, instead of using the number `65` in our encoding, we will use a reference to whatever the current value of the age of eligibility for OAS is.

Parameters are usually configured in YAML files, so adding one in a Jupyter Notebook requires either a) creating a YAML file to represent the parameter, and then loading it, or b) putting the YAML specification into the Python code. We use the second approach here to define a parameter called `benefits.old_age_security.oas_age_of_eligibility`.

In [None]:
from openfisca_core.parameters import ParameterNode

parameters = ParameterNode(data={
    "benefits": {
        "old_age_security": {
            "oas_age_of_eligibility": {
                "1977-07": [65],
            }
        }
    }
})

Now we can run a quick query to see if it has been added successfully.

In [None]:
print(parameters.benefits.old_age_security.oas_age_of_eligibility("2021-01-01"))

[65]


This shows us that OpenFisca now knows that as of January 1, 2021, the age of eligibility for OAS was 65, and that parameter can now be used by applications that know how to allow users to modify parameters and see the new results.
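To make "the value as of a date" concrete, here is a minimal stand-alone sketch of that kind of lookup. It does not use OpenFisca and is not how OpenFisca implements parameters; `AGE_OF_ELIGIBILITY`, `value_at`, and the 2030 entry are all invented for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical history of values, sorted by start date. The 1977 entry mirrors
# the parameter defined above; the 2030 entry is invented purely for illustration.
AGE_OF_ELIGIBILITY = [
    (date(1977, 7, 1), 65),
    (date(2030, 1, 1), 67),
]

def value_at(history, when):
    """Return the value in force on `when`: the latest entry starting on or before it."""
    start_dates = [start for start, _ in history]
    index = bisect_right(start_dates, when) - 1
    if index < 0:
        raise ValueError(f"No value is defined as of {when}")
    return history[index][1]

print(value_at(AGE_OF_ELIGIBILITY, date(2021, 1, 1)))  # prints 65
```

A date before the first entry raises an error here; whether that is the right behaviour for a real parameter is a design choice this sketch does not settle.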

Note that from a "Rules as Code" perspective, the fact that we might at some point want to change the number 65 to some other number is not something that is known because it is stated in the law. So this is not so much a representation of the law as it is additional information added to the law.


### Creating Entities

OpenFisca requires you to describe the categories of objects that will be discussed in the legislation. Relationships between entities are described as variables.

For our purposes, we only need an entity called "Person".

From a Rules as Code perspective, this is making explicit for OpenFisca information that is implicit in the natural language version of the law, which does not need to "create" or "define" the word "person" in order to be effective. In the legislation, words like "someone", "that person", and "any person" have meaning because they are English words, and they correspond to a shared concept in the real world.

A computer does not have or need any experience of the real world. It doesn't need to know what "person" means. But it does need to know that "Person" is a type of thing, and that one or more things might be a person, and that by virtue of being a person other properties can be known about the person.

So we create "Person" as an "Entity".

The explanatory text is taken from OpenFisca's default country template.

In [None]:
from openfisca_core.entities import build_entity

Person = build_entity(
    key = "person",
    plural = "persons",
    label = "An individual. The minimal legal entity on which a legislation might be applied.",
    doc = """

    Variables like 'salary' and 'income_tax' are usually defined for the entity 'Person'.

    Usage:
    Calculate a variable applied to a 'Person' (e.g. access the 'salary' of a specific month with person("salary", "2017-05")).
    Check the role of a 'Person' in a group entity (e.g. check if the 'Person' is a 'first_parent' in a 'Household' entity with person.has_role(Household.FIRST_PARENT)).

    For more information, see: https://openfisca.org/doc/coding-the-legislation/50_entities.html
    """,
    is_person = True,
    )

entities = [Person]

### OpenFisca Imports

There are a couple of basic concepts that we need to import to be able to use them below. OpenFisca uses its own period definitions. We will use "MONTH" only, to simplify the demonstration.

We also need to import the `Variable` type in order to start creating variables, which represent things that the user can tell us about entities, or that OpenFisca can calculate about entities.


In [None]:
from openfisca_core.periods import MONTH
from openfisca_core.variables import Variable

### Encoding Section 3(1)
 
We are now going to encode section 3(1) in OpenFisca, but only a small part of it. First, we are going to encode the fact that section 3(1) is satisfied if any of sections 3(1)(a) to (c) is satisfied.

The important part is the last line of the `formula` function inside the class, which uses the `+` operator to represent disjunction, or an "or" statement.

The `+` operator is used because variables in OpenFisca are not scalars, but vectors. Each variable is actually a list of values, one per person. So we are using a vector disjunction operator instead of a boolean `or`.
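The behaviour of these operators can be tried outside OpenFisca with plain NumPy boolean arrays. This is only a sketch: the three arrays are made-up stand-ins for the results of the three paragraphs, with one entry per person.

```python
import numpy as np

# One entry per person: does each paragraph apply to persons 0, 1 and 2?
paragraph_a = np.array([True, False, False])
paragraph_b = np.array([False, False, True])
paragraph_c = np.array([False, False, False])

# On boolean arrays, `+` is an element-wise "or" and `*` is an element-wise "and".
any_paragraph = paragraph_a + paragraph_b + paragraph_c
all_paragraphs = paragraph_a * paragraph_b * paragraph_c

print(any_paragraph.tolist())   # [True, False, True]
print(all_paragraphs.tolist())  # [False, False, False]

# Python's scalar `or` cannot decide the truth of a whole array.
try:
    paragraph_a or paragraph_b
    scalar_or_failed = False
except ValueError:
    scalar_or_failed = True

print(scalar_or_failed)  # True
```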

In [None]:
class section_3_1_applies(Variable):
    value_type = bool
    entity = Person
    definition_period = MONTH
    label = "In accordance with section 3(1) of the Old Age Security Act, the person may be paid a full monthly pension."

    def formula(person, period, parameters):
        """Whether the person may, under section 3(1) of the Old Age Security Act, be paid a full monthly pension."""
        return person("section_3_1_a_applies", period) + person("section_3_1_b_applies", period) + person("section_3_1_c_applies", period)

Next, we will encode the fact that _if_ section 3(1) applies to a person, that person may be paid a full monthly pension.

In [None]:
class may_be_paid_full_monthly_pension(Variable):
    value_type = bool
    entity = Person
    definition_period = MONTH
    label = "The person may be paid a full monthly pension."

    def formula(person, period, parameters):
        """Whether the person may be paid a full monthly pension."""
        return person("section_3_1_applies", period)

#### Why Do It Twice?

So first, we said that if 3(1)(a) or 3(1)(b) or 3(1)(c) are satisfied then 3(1) is satisfied. Second, we said that if 3(1) is satisfied, then the legal conclusion of 3(1) holds.

Why not take out the middle man, and say that if 3(1)(a-c) are true then the legal conclusion holds?

You can, and depending on what kinds of questions you want your code to be able to answer, that might be appropriate and sufficient. But from a Rules as Code perspective, we are encoding legislation in advance of knowing what purpose the encoding is going to be put to. So what we should aspire to do is to encode information that answers as many as possible of the questions that the law is capable of answering.

For example, if you know for a fact that your app is only ever going to want to know whether a Person is eligible for a full monthly pension, then there is no reason to divide the rule into two Variables. (Indeed, as you will see, we are going to divide section 3(1) into **many** OpenFisca Variables, not just two.)

But perhaps you would like to be able to answer the question: "How did the person qualify for a full monthly pension?" It may be that there are other sections of the law that *also* qualify someone for a full monthly pension. In that case, we are going to have to amend the formula for that legal conclusion to add those other sections. But by also encoding which sections were triggered separately from what conclusions arise as a result, we have access to variables that can show how OpenFisca reached that conclusion.

It also allows us to run simulations and ask questions like "are there any people who qualify for the pension under section 3, but not under the other section?" That allows us to determine whether section 3 even matters, or if it entirely duplicates the other section that reaches the same conclusion.
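As a rough illustration of that kind of question, the sketch below uses NumPy arrays in place of real simulation results. `other_section_applies` is a hypothetical stand-in for some other provision that reaches the same conclusion; no such variable exists in this notebook.

```python
import numpy as np

# Results for three hypothetical people. `other_section_applies` stands in for
# some other provision reaching the same conclusion; it is not a real variable.
section_3_1_applies = np.array([True, True, False])
other_section_applies = np.array([False, True, True])

# The legal conclusion holds if either provision applies.
may_be_paid = section_3_1_applies + other_section_applies

# Because the provisions are kept separate, we can still ask who relies on
# section 3(1) alone.
only_section_3_1 = section_3_1_applies * np.logical_not(other_section_applies)

print(may_be_paid.tolist())       # [True, True, True]
print(only_section_3_1.tolist())  # [True, False, False]
```

If the two provisions had been merged into one formula, only `may_be_paid` would be available and the second question could not be answered.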

So the objective of this Rules as Code style encoding is to reflect as much information about the law in the encoding as you can, because you don't know in advance which questions will be relevant.

That increases the difficulty of doing the encoding, but increases the number of purposes to which the encoding can be put.

#### Using OpenFisca for Reasons

Using this approach to get OpenFisca to give you reasons for legal conclusions has some advantages and disadvantages over other approaches. One disadvantage is that you cannot simply ask OpenFisca "why" something is true, or not true. Instead, you have to ask it to calculate all of the variables that might have been relevant, which means that the programmer needs to know in advance which variables those are. That defeats the objective of hiding from the application developer how the Rules as Code implementation does its calculations.

Also, the information is calculated as a set of true or false values for each node of the legislation, but the answer received from the API doesn't include information about how those nodes are connected, and which results triggered which other results. Reconstructing that information on the application end will again require the developer to be aware of the structure of the Act, which again defeats the purpose of avoiding the need for the application developer to know how the Act works.

However, OpenFisca does have the advantage that as long as you ask for a Variable, it will be provided if it can be calculated. Which means that unlike some logic-based programming approaches to encoding legislation, OpenFisca is capable of advising the user if the same conclusion can be reached in multiple ways.
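To show what that reconstruction might involve, here is a plain-Python sketch. It assumes the application has received a flat set of true or false results and has to restate the structure of section 3(1) itself; `STRUCTURE`, `results`, and `explain` are hypothetical names, and the values are invented.

```python
# Structure of section 3(1) as the application developer would have to restate it.
# Keys are variable names from this notebook; "any"/"all" say how children combine.
STRUCTURE = {
    "section_3_1_applies": ("any", [
        "section_3_1_a_applies",
        "section_3_1_b_applies",
        "section_3_1_c_applies",
    ]),
    "section_3_1_b_applies": ("all", [
        "section_3_1_b_i_applies",
        "section_3_1_b_ii_applies",
        "section_3_1_b_iii_applies",
    ]),
}

# Invented flat results for one person, with no information about how they connect.
results = {
    "section_3_1_applies": True,
    "section_3_1_a_applies": False,
    "section_3_1_b_applies": True,
    "section_3_1_b_i_applies": True,
    "section_3_1_b_ii_applies": True,
    "section_3_1_b_iii_applies": True,
    "section_3_1_c_applies": False,
}

def explain(variable, results, structure, depth=0):
    """Return indented lines showing each result underneath the node it feeds."""
    connector, children = structure.get(variable, (None, []))
    suffix = f" ({connector} of)" if connector else ""
    lines = ["  " * depth + f"{variable}: {results[variable]}{suffix}"]
    for child in children:
        lines.extend(explain(child, results, structure, depth + 1))
    return lines

print("\n".join(explain("section_3_1_applies", results, STRUCTURE)))
```

Everything in `STRUCTURE` duplicates knowledge that already lives in the formulas, which is exactly the duplication described above.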

### Section 3(1)(a)

The text here is:

> every person who was a pensioner on July 1, 1977

There are no sub-sections to 3(1)(a), so we only need to describe the one variable that describes when this section applies, and the formula for that Variable will be dependent on whether or not the person was a pensioner as of July 1, 1977.

We are using "MONTH" as the duration for all of our variables, so we will ask whether or not the person was a pensioner "in" July of 1977.

**QUESTION:** We haven't defined pensioner? Will OpenFisca treat undefined variables as `false`-ish, or will it throw an error?

In [None]:
class section_3_1_a_applies(Variable):
    value_type = bool
    entity = Person
    definition_period = MONTH
    label = "The person meets the definition of section 3(1)(a) of the Old Age Security Act."

    def formula(person, period, parameters):
        """Whether the person was, on July 1, 1977, a pensioner."""
        return person("pensioner", "1977-07")

### Section 3(1)(b)

Unlike section 3(1)(a), section 3(1)(b) has sub-components, so we are again going to encode one variable that indicates which sub-components trigger this component, and then separately encode what triggers each of the sub-components. Here, all three of the components must be satisfied, so we use the vector conjunction operator `*` instead of the vector disjunction operator `+`.

In [None]:
class section_3_1_b_applies(Variable):
    value_type = bool
    entity = Person
    definition_period = MONTH
    label = "The person meets the definition of section 3(1)(b) of the Old Age Security Act."

    def formula(person, period, parameters):
        """Whether the person is caught under section 3(1)(b) of the Old Age Security Act."""
        return person("section_3_1_b_i_applies", period) * person("section_3_1_b_ii_applies", period) * person("section_3_1_b_iii_applies", period)

#### Section 3(1)(b)(i)

Section 3(1)(b)(i) reads:

> on July 1, 1977 was not a pensioner but had attained twenty-five years of age and resided in Canada or, if that person did not reside in Canada, had resided in Canada for any period after attaining eighteen years of age or possessed a valid immigration visa,

Section 3(1)(b)(i) is also a "completion" of the phrase in section 3(1)(b), so the entire phrase together is:

> every person who on July 1, 1977 was not a pensioner but had attained twenty-five years of age and resided in Canada or, if that person did not reside in Canada, had resided in Canada for any period after attaining eighteen years of age or possessed a valid immigration visa,

This section does not have any sub-sections, but it does have a complicated internal structure, which can be represented as follows:

```
* All of
  * Not a pensioner on July 1, 1977, and
  * 25 years of age on July 1, 1977, and
  * one of
    * resided in Canada, or
    * all of
      * did not reside in Canada, and
      * one of
        * resided in Canada for any period after attaining 18 years of age, or
        * possessed a valid immigration visa
```

There is an alternative structure that could be used to describe this section. We could instead describe it this way:

```
* All of
  * Not a pensioner on July 1, 1977, and
  * 25 years of age on July 1, 1977, and
  * one of
    * resided in Canada, or
    * one of
      * resided in Canada for any period after attaining 18 years of age, or
      * possessed a valid immigration visa
```
Here, we have simplified the structure by removing the requirement that as an alternative to residing in Canada, the person must not reside in Canada.
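The two diagrams differ only in the innermost residence branch. A brute-force truth table over the three residence facts, sketched here in plain Python with hypothetical function names, confirms that dropping "did not reside in Canada" does not change the result.

```python
from itertools import product

def explicit(resided, resided_after_18, visa):
    """First diagram: the alternative is only open to people who did not reside."""
    return resided or ((not resided) and (resided_after_18 or visa))

def simplified(resided, resided_after_18, visa):
    """Second diagram: the 'did not reside' condition is dropped."""
    return resided or (resided_after_18 or visa)

# Compare the two readings for all eight combinations of the three facts.
mismatches = [
    values
    for values in product([False, True], repeat=3)
    if explicit(*values) != simplified(*values)
]

print(mismatches)  # [] means the two structures always agree
```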

The purely logical meanings of the two structures are identical when implemented in OpenFisca. So why would we choose one over the other?

Well, it comes down to a matter of interpretation. Why does the section use the phrasing "or, if the person did not reside in Canada"?

In one possible interpretation, the intent of the legislation was to make the second residence qualification available only for people who did not reside in Canada. That would be a redundant intent, but it could have been the intent. In another possible interpretation, the phrase is included *only* to indicate that the disjunction applies to only the residence requirement, and not to the requirement of not a pensioner and 25 years of age, which appear before it in the list.

That latter intent is more likely. But if we interpret and encode it that way, we are removing from the encoding information that exists in the law: the words "if that person did not reside in Canada" were actually included in the text.

> Where you have the option of two encodings of the legislation, with no logical difference, prefer the encoding that best mirrors the legislation, so as not to lose information.

Perhaps, later, the first half of section 3(1)(b)(i) is amended in a way that suddenly makes the "did not reside in Canada" requirement in the second half relevant. The person amending the encoding to match the amendments to the law may not realize that something was left out of the original encoding, and the logical effect of amending only the first part may not be what was expected.

Also, if later you discover that you want to have answers to questions like "why, under section 3(1)(b)(i) did the person qualify", you can take the elements that are being considered inside the formula only, and turn them into variables. If that happens, having the correct structure of logic in your formula, even if it is slightly redundant, will make it easier to implement the logic as a set of variables that matches the text of the law, which in turn will make the explanations created by those variables more consistent with what the law says.

Indeed, you can see that the "not a pensioner" requirement in 3(1)(b)(i) is also redundant in the same way, because pensioners are already dealt with in section 3(1)(a), and 3(1)(a) and 3(1)(b) are alternatives joined by "or". But we include it again, because it does no harm to the logic, and it increases the fidelity of the encoding.

#### Complicated Formulas

Here, because the section is internally complicated, we are not going to use a single line of Python to describe what is going on. Instead, we are going to take all of the things that we need to know and turn them into local variables inside the `formula` function. Then, we are going to combine them in reverse order to the diagram above, starting from the innermost branch and working outward.

**NB** Python's `not` keyword is the wrong syntax here, because that is not how you do vector negation. The code below uses a vector negation function, `not_`, instead.
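The difference can be seen with a small NumPy array. This is a sketch outside OpenFisca; `pensioner` is a made-up array with one entry per person.

```python
import numpy as np

# One entry per person.
pensioner = np.array([True, False, False])

# The `not` keyword asks for a single truth value, which an array cannot give.
try:
    not pensioner
    keyword_not_failed = False
except ValueError:
    keyword_not_failed = True

# Element-wise negation flips each person's value independently.
not_pensioner = np.logical_not(pensioner)

print(keyword_not_failed)      # True
print(not_pensioner.tolist())  # [False, True, True]
```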



In [None]:
# Vector negation. (I believe openfisca_core.model_api also provides `not_`.)
from numpy import logical_not as not_

class section_3_1_b_i_applies(Variable):
    value_type = bool
    entity = Person
    definition_period = MONTH
    label = "The person meets the definition of section 3(1)(b)(i) of the Old Age Security Act."

    def formula(person, period, parameters):
        """Whether the person meets the definition of section 3(1)(b)(i) of the Old Age Security Act."""
        # Everything in this sub-paragraph is assessed as of July 1, 1977.
        july_1977 = "1977-07"
        pensioner = person("pensioner", july_1977)
        # "Had attained twenty-five years of age" includes being exactly 25.
        attained_25_yoa = person("age", july_1977) >= 25
        resided = person("country_of_residence", july_1977) == "Canada"
        resided_after_18 = person("resided_in_canada_after_18_yoa", july_1977)
        possessed_valid_visa = person("possessed_valid_immigration_visa", july_1977)
        # Combine the facts from the innermost branch of the diagram outward.
        alternate_residence_requirement = (resided_after_18 + possessed_valid_visa) * not_(resided)
        residence_requirement = resided
        age_and_residence = attained_25_yoa * (residence_requirement + alternate_residence_requirement)
        section = not_(pensioner) * age_and_residence
        return section

#### Section 3(1)(b)(ii)

> has attained sixty-five years of age, and

Here is the requirement of 65 years of age, again. We are going to assume that this age would be changed here, as well as anywhere else, all according to the parameter we created at the top of the file. So we can define being at least 65 years of age as having an age, at the relevant time, that is at or above the OAS eligibility age that was defined as of that time.

Numerical comparisons like greater-than and less-than work fine for vectors, so we can say
```
person("age", period) >= parameters(period).benefits.old_age_security.oas_age_of_eligibility
```
and that will return whether or not the person was at or over the eligibility age *defined at that time*.
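A small NumPy sketch shows how such a comparison behaves across several people at once, and why `>=` rather than `>` matches "has attained". The ages and the name `age_of_eligibility` are illustrative stand-ins, not values read from OpenFisca.

```python
import numpy as np

# Ages of three hypothetical people in the period being examined.
ages = np.array([64, 65, 70])

# Stand-in for the parameter value in force in that period.
age_of_eligibility = 65

at_or_over = ages >= age_of_eligibility
strictly_over = ages > age_of_eligibility

print(at_or_over.tolist())     # [False, True, True]
print(strictly_over.tolist())  # [False, False, True]
```

The person who is exactly 65 is only counted by `>=`.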

**QUESTION**: I think this raises some problems in terms of retroactivity and retrospectivity.

In [None]:
class section_3_1_b_ii_applies(Variable):
  value_type = bool
  entity = Person
  definition_period = MONTH
  label = "The person meets the definition of section 3(1)(b)(ii) of the Old Age Security Act."

  def formula(person, period, parameters):
    """Whether the person meets the definition of section 3(1)(b)(ii) of the Old Age Security Act."""
    return person("age", period) >= parameters(period).benefits.old_age_security.oas_age_of_eligibility

#### Section 3(1)(b)(iii)

> has resided in Canada for the ten years immediately preceding the day on which that person’s application is approved or, if that person has not so resided, has, after attaining eighteen years of age, been present in Canada prior to those ten years for an aggregate period at least equal to three times the aggregate periods of absence from Canada during those ten years, and has resided in Canada for at least one year immediately preceding the day on which that person’s application is approved; and

This is a very difficult section to encode, because of how it deals with durations. The first thing that we would need to do is create some way of storing information about the periods of time during which a person lived in a place.

If we know where a person lives at any given point in time, OpenFisca is automatically going to add dates to that knowledge, and create a list of residences. So we just need to create a variable for Person that is their current place of residence.


In [17]:
class residing_in_canada(Variable):
  value_type = bool
  entity = Person
  definition_period = MONTH
  label = "Whether the person is residing in Canada during the period"

Now that we have created a variable for storing whether or not the person lives in Canada, as of a given month, we can calculate a variable that returns their residence history as of the current time. That can then be used to calculate how long they have been living where as of certain dates in the past, I think?

In [None]:
class residence_history(Variable):
  # NOTE: `list` is only a placeholder. As far as I know, OpenFisca value types
  # are limited to bool, int, float, str, Enum and date, so this variable will
  # probably need a different representation before it can be registered.
  value_type = list
  entity = Person
  definition_period = MONTH
  label = "A history of the person's prior values for residing_in_canada"

  def formula(person, period, parameters):
    """A list of tuples that indicate when the person's residence history starts, and when it changed."""
    # TODO: not implemented yet. The intended algorithm is:
    # 1. Start from the period in which the person was born.
    # 2. Loop through all the periods until the current period.
    # 3. If there is a value for that period, and either the result list is
    #    empty or the most recent result is different, add the value to the list.
    # 4. Return the list.
    return []
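The loop described in the comments above can be sketched in plain Python, outside OpenFisca. This is only an assumption about what the finished formula might do: the function takes a hypothetical dictionary mapping "YYYY-MM" strings to a known value (or `None` where nothing is known) and keeps one entry per change.

```python
def residence_history(monthly_values):
    """Collapse month-by-month values into (month, value) pairs, one per change."""
    history = []
    # ISO "YYYY-MM" strings sort chronologically.
    for month in sorted(monthly_values):
        value = monthly_values[month]
        if value is None:
            continue  # nothing is known for this month
        # Record the value if it is the first one, or differs from the latest entry.
        if not history or history[-1][1] != value:
            history.append((month, value))
    return history

# Invented example data for one person.
known = {
    "1980-01": True,
    "1980-02": True,
    "1980-03": False,
    "1980-04": None,
    "1980-05": True,
}

print(residence_history(known))
# [('1980-01', True), ('1980-03', False), ('1980-05', True)]
```

Note that this works on one person at a time, whereas an OpenFisca formula receives a vector of people, so it is not a drop-in replacement for the stub.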

### Section 3(1)(c)

>(c) every person who

In [None]:
class section_3_1_c_applies(Variable):
    value_type = bool
    entity = Person
    definition_period = MONTH
    label = "The person meets the definition of section 3(1)(c) of the Old Age Security Act."

    def formula(person, period, parameters):
        """Whether the person is caught under section 3(1)(c) of the Old Age Security Act."""
        return person("section_3_1_c_i_applies", period) * person("section_3_1_c_ii_applies", period) * person("section_3_1_c_iii_applies", period)

#### Section 3(1)(c)(i)

> (i) was not a pensioner on July 1, 1977,


In [None]:
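# Vector negation. (I believe openfisca_core.model_api also provides `not_`.)
from numpy import logical_not as not_

class section_3_1_c_i_applies(Variable):
  value_type = bool
  entity = Person
  definition_period = MONTH
  label = "The person meets the definition of section 3(1)(c)(i) of the Old Age Security Act."

  def formula(person, period, parameters):
    """Whether the person meets the definition of section 3(1)(c)(i) of the Old Age Security Act."""
    # The test date is fixed by the Act, so it does not depend on `period`.
    return not_(person("pensioner", "1977-07"))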

#### Section 3(1)(c)(ii)

> (ii) has attained sixty-five years of age, and

In [None]:
class section_3_1_c_ii_applies(Variable):
  value_type = bool
  entity = Person
  definition_period = MONTH
  label = "The person meets the definition of section 3(1)(c)(ii) of the Old Age Security Act."

  def formula(person, period, parameters):
    """Whether the person meets the definition of section 3(1)(c)(ii) of the Old Age Security Act."""
    return person("age",period) >= parameters(period).benefits.old_age_security.oas_age_of_eligibility

#### Section 3(1)(c)(iii)

>(iii) has resided in Canada after attaining eighteen years of age and prior to the day on which that person’s application is approved for an aggregate period of at least forty years.

In [None]:
class section_3_1_c_iii_applies(Variable):
  value_type = bool
  entity = Person
  definition_period = MONTH
  label = "The person meets the definition of section 3(1)(c)(iii) of the Old Age Security Act."

  def formula(person, period, parameters):
    """Whether the person meets the definition of section 3(1)(c)(iii) of the Old Age Security Act."""
    # Has lived aggregate 40 years in Canada since the age of 18.
    # TODO: not implemented yet. Until it is, nobody qualifies under this sub-paragraph.
    return person.filled_array(False)
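One possible way to approach the missing calculation is to count months of residence. The sketch below is plain Python and rests on assumptions: it treats residence as a yes/no fact per month, takes a hypothetical list with one entry per month since the person's 18th birthday, and ignores how the Act treats partial months.

```python
def has_resided_40_years(resided_by_month):
    """Check for an aggregate of 40 years, counted in whole months.

    `resided_by_month` holds one boolean per month since the person's
    18th birthday, up to the day the application is approved.
    """
    months_required = 40 * 12
    # Summing booleans counts the months that are True.
    return sum(resided_by_month) >= months_required

# Invented example: 45 years since turning 18, the first 5 spent abroad.
forty_years_exactly = [False] * 60 + [True] * 480
# Invented example: one month short of the threshold.
one_month_short = [False] * 61 + [True] * 479

print(has_resided_40_years(forty_years_exactly))  # True
print(has_resided_40_years(one_month_short))      # False
```

Because the requirement is an aggregate, the months do not need to be consecutive, which is why a simple count is enough here.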

## Test the Encoding

I'm still learning how this is done, but the best example I can find is at https://colab.research.google.com/drive/1RVwyFpqDpyJ6zK_dsZ_umhPcPC30dOre?usp=sharing#scrollTo=y6l4uw1FQMXD.