# Domain knowledge creation with cause2e
This notebook shows how ```cause2e``` can be used to create and handle domain knowledge. Domain knowledge is created with the ```knowledge.EdgeCreator``` before learning the causal graph. Passing it to a structure learning method drastically increases our chances of finding the right causal graph.

As a reminder: The correct causal graph has an edge from variable A to variable B if and only if variable A directly influences variable B (changing the value of variable A changes the value of variable B if we keep all other variables fixed). 

Humans can often infer parts of the causal graph from domain knowledge. The nodes are always just the variables in the data, so the problem of finding the right graph comes down to selecting the right edges between them.
There are three ways of passing domain knowledge:
- Indicate which edges must be present in the causal graph.
- Indicate which edges must not be present in the causal graph.
- Indicate a temporal order in which the variables have been created. This is then used to generate forbidden edges, since the future can never influence the past.
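
The third option can be made concrete with a short sketch. The helper below, ```forbidden_from_temporal_order```, is a hypothetical stand-in written for illustration and is not part of ```cause2e```. It assumes the temporal order is given as a list of variable groups, earliest first, and forbids every edge that points from a later group to an earlier one.

```python
from itertools import combinations, product


def forbidden_from_temporal_order(temporal_order):
    """Return all (source, destination) pairs that point backwards in time.

    Hypothetical helper for illustration only, not the cause2e implementation.
    `temporal_order` is a list of sets of variable names, earliest group first.
    """
    forbidden = set()
    # combinations() preserves list order, so `earlier` always precedes `later`.
    for earlier, later in combinations(temporal_order, 2):
        # Source in the later group, destination in the earlier group.
        forbidden |= set(product(later, earlier))
    return forbidden


student_attributes = {'Age', 'Ethnicity', 'Grades'}
universities = {'University A', 'University B', 'University C', 'University D'}

edges = forbidden_from_temporal_order([student_attributes, universities])
print(len(edges))  # 4 universities x 3 attributes = 12 forbidden edges
```

Edges between variables of the same group are left untouched in this sketch, since a temporal order says nothing about them.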

### Imports

In [1]:
from cause2e import knowledge

## Problem description
Suppose that we want to model the following situation. Prospective students send out applications to four different universities (A, B, C and D) where they hope to be accepted. Their application contains their grades, but not their age or ethnicity. The applications are fully anonymous and solely grade-based, meaning that the universities only sort the students by their grades and accept the best ones. The only exception is university D, where the grades do not matter and students are selected only based on their age. It is forbidden for the universities' admission committees to talk to each other during the application process.

## Organizing our knowledge
Suppose we are given data for the following variables:
- 'Age': A student's age.
- 'Ethnicity': A student's ethnicity.
- 'Grades': A student's grade average.
- 'University A': True if a student is accepted at university A, false otherwise.
- 'University B': True if a student is accepted at university B, false otherwise.
- 'University C': True if a student is accepted at university C, false otherwise.
- 'University D': True if a student is accepted at university D, false otherwise.

Before we can start creating knowledge in the form of forbidden or required edges in the causal graph with ```cause2e```, we have to think about which edges can be present in the graph. No maths or programming is required; we only need common sense (and actual domain knowledge in more specific application cases) to come up with a list of edges.
- A student's grades, age and ethnicity cannot be affected by the universities' acceptance/rejection letters for temporal reasons.
- A student's age or ethnicity cannot be influenced by their grades.
- A student's grades affect the chance of acceptance at every university except university D, by the nature of the screening process.
- A student's age or ethnicity cannot influence acceptance/rejection at the universities, except at university D where age matters.
- Being accepted/rejected from one university cannot influence the results from the other universities.

## Creating knowledge with ```cause2e```
In principle, we can explicitly enumerate all the forbidden or required edges from the above verbal reasoning and feed them into a structure learning algorithm. However, this quickly becomes very laborious as the number of variables in a problem grows. It is also prone to errors, since there are so many constraints that it is easy to forget one without noticing. ```cause2e``` spares us this pain by providing the ```knowledge.EdgeCreator```, which has utility methods for creating many constraints at once, just as humans do naturally when using language.

In [2]:
edge_creator = knowledge.EdgeCreator()

In order to create many constraints at once, it is helpful to group our variables into semantically meaningful groups, e.g. the responses from the universities on the one hand and the students' attributes on the other hand.

In [3]:
universities = {'University A', 'University B', 'University C', 'University D'}
student_attributes = {'Age', 'Ethnicity', 'Grades'}

### Forbid edges from temporal knowledge
In many applications, we have knowledge about the temporal structure of the data-generating process. This is very helpful, since the future cannot influence the past. We can directly communicate this knowledge to the ```EdgeCreator```, which subsequently translates it into a number of forbidden edges.

In [4]:
temporal_order = [student_attributes, universities]
edge_creator.forbid_edges_from_temporal(temporal_order)

The communicated constraints can be checked with a call to ```show_edges()```.

In [5]:
edge_creator.show_edges()

-------------------
Required edges:
-------------------
Forbidden edges:
('University C', 'Age')
('University B', 'Ethnicity')
('University A', 'Age')
('University D', 'Age')
('University B', 'Grades')
('University D', 'Ethnicity')
('University B', 'Age')
('University D', 'Grades')
('University A', 'Ethnicity')
('University C', 'Grades')
('University A', 'Grades')
('University C', 'Ethnicity')
-------------------


We can also directly inspect the ```forbidden_edges``` and ```required_edges``` attributes where the constraints are stored.

In [6]:
print(edge_creator.forbidden_edges)
print(edge_creator.required_edges)

{('University C', 'Age'), ('University B', 'Ethnicity'), ('University A', 'Age'), ('University D', 'Age'), ('University B', 'Grades'), ('University D', 'Ethnicity'), ('University B', 'Age'), ('University D', 'Grades'), ('University A', 'Ethnicity'), ('University C', 'Grades'), ('University A', 'Grades'), ('University C', 'Ethnicity')}
set()


We see that the edges are represented as a set of pairs: The first entry is the source node, the second entry is the destination node. We can directly pass another forbidden edge using this simple format, but there is a convenience method that does not require knowing the data structure.

In [7]:
edge_creator.forbid_edge('Grades', 'University A')
edge_creator.show_edges()

-------------------
Required edges:
-------------------
Forbidden edges:
('University C', 'Age')
('University B', 'Ethnicity')
('University A', 'Age')
('University D', 'Age')
('University B', 'Grades')
('University D', 'Ethnicity')
('University B', 'Age')
('University D', 'Grades')
('University A', 'Ethnicity')
('University C', 'Grades')
('University A', 'Grades')
('University C', 'Ethnicity')
('Grades', 'University A')
-------------------


### Fresh start
If we have made a mistake while adding constraints, we can just delete all required and forbidden edges. Since the knowledge creation is an inexpensive operation, we can then repeat all the previous steps without including the undesired constraint. In our scenario, the last edge from 'Grades' to 'University A' was incorrectly specified as forbidden.

In [8]:
edge_creator.forget_edges()
edge_creator.show_edges()

-------------------
Required edges:
-------------------
Forbidden edges:
-------------------


Unlike in a real application, for this notebook we forget all edges after each step, in order to be able to see the effect of our last operation more clearly.

### Forbid edges within groups
We can create additional constraints if variables in one group cannot influence each other. In our case, the universities are not allowed to communicate during the application process.

In [9]:
edge_creator.forbid_edges_within_group(universities)
edge_creator.show_edges()
edge_creator.forget_edges()

-------------------
Required edges:
-------------------
Forbidden edges:
('University D', 'University A')
('University B', 'University B')
('University D', 'University D')
('University A', 'University D')
('University A', 'University A')
('University C', 'University C')
('University A', 'University B')
('University D', 'University C')
('University B', 'University A')
('University D', 'University B')
('University C', 'University D')
('University B', 'University D')
('University A', 'University C')
('University C', 'University A')
('University C', 'University B')
('University B', 'University C')
-------------------
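
The output lists 16 pairs, self-loops included, which is what the Cartesian product of the group with itself gives. The following sketch reproduces that set in plain Python. The name ```forbid_within_group``` is hypothetical, and the code illustrates the observed behavior rather than the library's own implementation.

```python
from itertools import product


def forbid_within_group(group):
    """Return every ordered pair of group members, self-loops included.

    Hypothetical helper mirroring the output shown above, not cause2e code.
    """
    return set(product(group, repeat=2))


universities = {'University A', 'University B', 'University C', 'University D'}
within = forbid_within_group(universities)
print(len(within))  # 4 x 4 = 16 ordered pairs
```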


### Forbid edges between groups
If there are two groups such that no variable in one group directly influences a variable in the other group, we can use this information to generate a list of forbidden edges. In our case, the first group consists only of the students' grades and the second group consists of their age and ethnicity. We tell the ```EdgeCreator``` to forbid all incoming edges from the second group into the first group. 

In [10]:
edge_creator.forbid_edges_from_groups({'Grades'}, incoming={'Age', 'Ethnicity'})
edge_creator.show_edges()
edge_creator.forget_edges()

-------------------
Required edges:
-------------------
Forbidden edges:
('Ethnicity', 'Grades')
('Age', 'Grades')
-------------------


Since there is no inherent order of the two groups, we can prescribe the same forbidden edges by switching the role of the groups and using the ```outgoing``` instead of the ```incoming``` argument. We see that the result is unchanged.

In [11]:
edge_creator.forbid_edges_from_groups({'Age', 'Ethnicity'}, outgoing={'Grades'})
edge_creator.show_edges()
edge_creator.forget_edges()

-------------------
Required edges:
-------------------
Forbidden edges:
('Ethnicity', 'Grades')
('Age', 'Grades')
-------------------


It is also possible to use both arguments at the same time when a group of variables cannot be influenced by a second group and at the same time cannot influence a third group.
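
To illustrate what combining both arguments means, here is a plain-Python sketch. The function ```edges_from_groups``` is hypothetical and only imitates the semantics seen in the outputs above (```incoming``` yields edges into the group, ```outgoing``` yields edges out of it); it is not the ```cause2e``` implementation. The example assumes, following the reasoning in "Organizing our knowledge", that the grades can neither be influenced by the university decisions nor influence age or ethnicity.

```python
from itertools import product


def edges_from_groups(group, incoming=None, outgoing=None):
    """Collect edges into `group` from `incoming` and out of `group` to `outgoing`.

    Hypothetical helper for illustration only, not part of cause2e.
    """
    edges = set()
    if incoming:
        # Source in `incoming`, destination in `group`.
        edges |= set(product(incoming, group))
    if outgoing:
        # Source in `group`, destination in `outgoing`.
        edges |= set(product(group, outgoing))
    return edges


universities = {'University A', 'University B', 'University C', 'University D'}
both = edges_from_groups({'Grades'}, incoming=universities, outgoing={'Age', 'Ethnicity'})
print(sorted(both))
```

A single call then covers the four edges from the universities into 'Grades' and the two edges from 'Grades' to 'Age' and 'Ethnicity'.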

### Exceptions in group constraints
Sometimes, the variables in a set behave quite similarly (e.g. our four universities), but not identically (University D has different application guidelines). These cases can be handled by the ```exceptions``` argument, where we can pass a set of constraints that should not be created even though the group rules would demand it.

In [12]:
edge_creator.forbid_edges_from_groups(universities, incoming={'Age', 'Ethnicity'}, exceptions={('Age', 'University D')})
edge_creator.show_edges()
edge_creator.forget_edges()

-------------------
Required edges:
-------------------
Forbidden edges:
('Ethnicity', 'University C')
('Age', 'University B')
('Age', 'University C')
('Ethnicity', 'University B')
('Ethnicity', 'University A')
('Ethnicity', 'University D')
('Age', 'University A')
-------------------
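
The effect of ```exceptions``` amounts to a set difference: build all edges the group rule demands, then remove the listed pairs. A minimal sketch in plain Python, using only the standard library and not the ```cause2e``` internals:

```python
from itertools import product

universities = {'University A', 'University B', 'University C', 'University D'}
incoming = {'Age', 'Ethnicity'}
exceptions = {('Age', 'University D')}

# All edges the group rule would forbid: 2 sources x 4 destinations = 8 pairs.
rule_edges = set(product(incoming, universities))

# Drop the exceptions, leaving the 7 pairs shown in the output above.
forbidden = rule_edges - exceptions
print(len(forbidden))
```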


### Require edges
Requiring edges follows exactly the same logic as forbidding them. The only difference is that required edges must be in the causal graph whereas forbidden edges must not. We demonstrate this by requiring the edges from the students' grades to their application results, again with the exception of University D.

In [13]:
edge_creator.require_edges_from_groups(universities, incoming={'Grades'}, exceptions={('Grades', 'University D')})
edge_creator.show_edges()
edge_creator.forget_edges()

-------------------
Required edges:
('Grades', 'University C')
('Grades', 'University A')
('Grades', 'University B')
-------------------
Forbidden edges:
-------------------


The last of our verbally formulated constraints is that a student's age determines their chances of being accepted at University D. Since this is only a single edge, we add it without any fancy methods.

In [14]:
edge_creator.require_edge('Age', 'University D')
edge_creator.show_edges()
edge_creator.forget_edges()

-------------------
Required edges:
('Age', 'University D')
-------------------
Forbidden edges:
-------------------


### Combining all previous steps
To show how concisely we can communicate prior knowledge in the form of constraints on the causal graph, we repeat all previous steps without the clutter added by didactic method calls and explanations.

In [15]:
edge_creator = knowledge.EdgeCreator()

universities = {'University A', 'University B', 'University C', 'University D'}
student_attributes = {'Age', 'Ethnicity', 'Grades'}

temporal_order = [student_attributes, universities]
edge_creator.forbid_edges_from_temporal(temporal_order)
edge_creator.forbid_edges_within_group(universities)
edge_creator.forbid_edges_from_groups(universities, incoming={'Age', 'Ethnicity'}, exceptions={('Age', 'University D')})
edge_creator.forbid_edges_from_groups({'Grades'}, incoming={'Age', 'Ethnicity'})
edge_creator.require_edges_from_groups(universities, incoming={'Grades'}, exceptions={('Grades', 'University D')})
edge_creator.require_edge('Age', 'University D')
edge_creator.show_edges()

-------------------
Required edges:
('Grades', 'University C')
('Age', 'University D')
('Grades', 'University A')
('Grades', 'University B')
-------------------
Forbidden edges:
('University D', 'University A')
('University B', 'University B')
('University D', 'University D')
('Age', 'University A')
('University A', 'University D')
('Ethnicity', 'Grades')
('University A', 'University A')
('University B', 'Grades')
('University C', 'University C')
('Age', 'Grades')
('University B', 'Age')
('Age', 'University B')
('University A', 'University B')
('Ethnicity', 'University B')
('University D', 'University C')
('University C', 'Ethnicity')
('University B', 'University A')
('University D', 'University B')
('University C', 'University D')
('University C', 'Age')
('University B', 'University D')
('University B', 'Ethnicity')
('University A', 'University C')
('University C', 'University A')
('Ethnicity', 'University C')
('Age', 'University C')
('University A', 'Age')
('University D', 'Age')

Even for this small application scenario, we see that the number of required and forbidden edges can quickly grow large. If we think about how large the set of all possible combinations of edges in the causal graph is, it becomes clear that eliminating most of the possible edges before starting a structure learning algorithm is an absolute necessity for complex scenarios. It makes no sense to let the structure learning algorithm check an enormous search space for the correct causal graph, when we can already drastically reduce the search space using common sense.
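
To put rough numbers on this, the sketch below counts candidate edges for the seven variables of this notebook. It rebuilds the constraint sets from the combined cell above in plain Python (no ```cause2e``` calls, self-loops left out) and counts how many directed edges remain undecided. It counts edge subsets only and ignores the acyclicity requirement, so the figures are upper bounds on the number of candidate graphs.

```python
from itertools import permutations, product

universities = {'University A', 'University B', 'University C', 'University D'}
student_attributes = {'Age', 'Ethnicity', 'Grades'}
variables = universities | student_attributes

# Every ordered pair of distinct variables is a candidate directed edge.
candidates = set(permutations(variables, 2))

# Forbidden edges, mirroring the four forbid_* calls of the combined cell.
forbidden = set(product(universities, student_attributes))  # temporal order
forbidden |= set(permutations(universities, 2))             # within the university group
forbidden |= set(product({'Age', 'Ethnicity'}, universities)) - {('Age', 'University D')}
forbidden |= set(product({'Age', 'Ethnicity'}, {'Grades'}))

# Required edges, mirroring the two require_* calls.
required = set(product({'Grades'}, universities)) - {('Grades', 'University D')}
required.add(('Age', 'University D'))

# No edge may be both required and forbidden.
conflicts = required & forbidden
undecided = candidates - forbidden - required

print(len(candidates), 2 ** len(candidates))  # 42 candidate edges, 2**42 edge subsets
print(len(undecided), 2 ** len(undecided))    # 5 undecided edges, 2**5 = 32 edge subsets
```

Under these assumptions, the constraints settle 37 of the 42 candidate edges before any data is looked at.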

## Passing the created knowledge to a structure learning algorithm
```cause2e``` uses the ```discovery.StructureLearner``` class to call structure learning algorithms. If we have created a ```StructureLearner``` called ```learner```, we only have to call ```learner.set_knowledge(edge_creator=edge_creator)``` and the rest is taken care of internally.