In [None]:
#default_exp model_documentation
#hide
%load_ext autoreload
%autoreload 2

# Model documentation
> Functions to jumpstart and facilitate model documentation

In [None]:
#hide
from nbdev.showdoc import show_doc

Each user has a specific documentation need, ranging from simply logging the model training to a more complex description of the model pipeline with a discussion of its predictions. `gingado` addresses this variety of needs by offering a class of objects, "Documenters", that facilitate model documentation in a generic way, as well as one specific model documentation type as described below.

Model documentation is performed by Documenters, objects that subclass the base class `ggdModelDocumentation`. This base class offers code that can be used by any Documenter to read the pipeline in question and to save the resulting documentation in JSON format. One current area of development is the automatic filling of some fields related to the model. The objective is to automatise documentation of the information that can be fetched automatically from the model, leaving time for the analyst to concentrate on other tasks, such as considering the ethical implications of the machine learning model being trained.

Documenters save the underlying information using the JSON format. With the JSON documentation file at hand, the user can then use existing third-party libraries to transform the information stored in JSON into a variety of formats (eg, HTML, PDF) as needed.
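
As a minimal sketch of such a conversion, the snippet below renders a two-level documentation dictionary as Markdown text using only the standard library. The function name `json_doc_to_markdown` and the example content are hypothetical and not part of `gingado`; a dedicated third-party library would likely be a better fit for HTML or PDF output.

```python
import json


def json_doc_to_markdown(json_doc):
    """Render a two-level documentation dictionary as Markdown text."""
    lines = []
    for section, content in json_doc.items():
        lines.append(f"## {section}")
        if isinstance(content, dict):
            # One bullet per documentation field in this section
            for field, answer in content.items():
                lines.append(f"- **{field}**: {answer}")
        else:
            lines.append(str(content))
        lines.append("")
    return "\n".join(lines)


# Hypothetical content, standing in for a file written by a Documenter
example_doc = json.loads(
    '{"model_details": {"developer": "Jane Doe", "version": "0.1"}}'
)
markdown_text = json_doc_to_markdown(example_doc)
print(markdown_text)
```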

`ModelCard`, the model documentation template inspired by the work of [Mitchell et al, 2019](https://dl.acm.org/doi/abs/10.1145/3287560.3287596?casa_token=3JORxBYy_DQAAAAA:0RsTpg5NsCX8B2lEwMg81rCxHiQlkZIuP1rPjAmOOF1fP0NTi3Vv3-WT75gwQm6bysUYxdXLkgqUuA), already comes with `gingado`. Its template can be used as is, or tweaked according to each need. The `ModelCard` template can also serve as inspiration for any custom documentation needs. Users with documentation needs beyond the out-of-the-box solutions provided by `gingado` can simply create their own class of Documenters, and compatibility of these custom documentation routines with the rest of the code is ensured. Users are encouraged to submit a pull request with their own documentation models subclassing `ggdModelDocumentation` if these custom templates can also benefit other users.

In [None]:
#hide
#export
import copy
import json

class ggdModelDocumentation:
    "Base class for gingado Documenters"

    def setup_template(self):
        self.json_doc = copy.deepcopy(self.__class__.template)
        for k in self.json_doc.keys():
            self.json_doc[k].pop('field_description', "")

    def show_template(self, indent=True):
        if indent:
            print(json.dumps(self.__class__.template, indent=self.indent_level))
        else:
            return self.__class__.template
        
    def documentation_path(self):
        print(self.file_path)

    def show_json(self):
        print(json.dumps(self.json_doc, indent=self.indent_level))

    def save_json(self, file_path):
        with open(file_path, 'w') as f:
            json.dump(self.json_doc, f)

    def read_json(self, file_path=None):
        if file_path is None:
            file_path = self.file_path
        # Use a context manager so the file is closed after reading
        with open(file_path) as f:
            self.json_doc = json.load(f)

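    def open_questions(self):
        # A field is still open while it holds the template's placeholder text.
        # Sections replaced wholesale by a non-dict value are skipped.
        return [
            section + "__" + field
            for section, fields in self.json_doc.items()
            if isinstance(fields, dict)
            for field, answer in fields.items()
            if answer == self.__class__.template[section][field]
        ]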

    def fill_info(self, new_info):
        for k, v in new_info.items():
            if k not in self.__class__.template:
                field_keys = list(self.__class__.template.keys())
                raise KeyError(f"key '{k}' is not in the documentation template. The template's keys are: {field_keys}")
            if isinstance(v, dict):
                for v_k, v_v in v.items():
                    if v_k == 'field_description':
                        raise KeyError("The key 'field_description' is not supposed to be changed from the template definition.")
                    if v_k not in self.__class__.template[k]:
                        field_keys = [k for k in self.__class__.template[k].keys() if k != 'field_description']
                        raise KeyError(
                            f"key '{v_k}' is not in the documentation template's item {k}. These template item's keys are: {field_keys}")
                    self.json_doc[k][v_k] = v_v
            else:
                self.json_doc[k] = v

    def __setitem__(self, key, value):
        setattr(self, key, value)

    def __getitem__(self, key):
        return getattr(self, key)

    def __str__(self):
        return json.dumps(self.json_doc, indent=4)

    def __repr__(self):
        return f"{self.__class__}()"

In [None]:
show_doc(ggdModelDocumentation)

<h2 id="ggdModelDocumentation" class="doc_header"><code>class</code> <code>ggdModelDocumentation</code><a href="" class="source_link" style="float:right">[source]</a></h2>

> <code>ggdModelDocumentation</code>()

Base class for gingado Documenters

In [None]:
#hide
#export
from gingado.utils import get_username, get_datetime

In [None]:
#hide
#export
class ModelCard(ggdModelDocumentation):
    template = {
        'model_details': {
            'field_description': "Basic information about the model",
            'developer': "Person or organisation developing the model",
            'datetime': "Model date",
            'version': "Model version",
            'type': "Model type",
            'info': "Information about training algorithms, parameters, fairness constraints or other applied approaches, and features",
            'paper': "Paper or other resource for more information",
            'citation': "Citation details",
            'license': "License",
            'contact': "Where to send questions or comments about the model"
        },
        'intended_use': {
            'field_description': "Use cases that were envisioned during development",
            'primary_uses': "Primary intended uses",
            'primary_users': "Primary intended users",
            'out_of_scope': "Out-of-scope use cases"
        },
        'factors': {
            'field_description': "Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others",
            'relevant': "Relevant factors",
            'evaluation': "Evaluation factors" 
        },
        'metrics': {
            'field_description': "Metrics should be chosen to reflect potential real world impacts of the model",
            'performance_measures': "Model performance measures",
            'thresholds': "Decision thresholds",
            'variation_approaches': "Variation approaches"
        },
        'evaluation_data': {
            'field_description': "Details on the dataset(s) used for the quantitative analyses in the documentation",
            'datasets': "Datasets",
            'motivation': "Motivation",
            'preprocessing': "Preprocessing"
        },
        'training_data': {
            'field_description': """
            May not be possible to provide in practice. When possible, this section should mirror 'Evaluation Data'. 
            If such detail is not possible, minimal allowable information should be provided here, 
            such as details of the distribution over various factors in the training datasets.""",
            'training_data': "Information on training data"
        },
        'quant_analyses': {
            'field_description': "Quantitative Analyses",
            'unitary': "Unitary results",
            'intersectional': "Intersectional results"
        },
        'ethical_considerations': {
            'field_description': """
            Ethical considerations that went into model development, surfacing ethical challenges and 
            solutions to stakeholders. Ethical analysis does not always lead to precise solutions, but the process 
            of ethical contemplation is worthwhile to inform on responsible practices and next steps in future work.""",
            'sensitive_data': "Does the model use any sensitive data (e.g., protected classes)?",
            'human_life': "Is the model intended to inform decisions about matters central to human life or flourishing - e.g., health or safety? Or could it be used in such a way?",
            'mitigations': "What risk mitigation strategies were used during model development?",
            'risks_and_harms': """
            What risks may be present in model usage? Try to identify the potential recipients, likelihood, and magnitude of harms. 
            If these cannot be determined, note that they were considered but remain unknown""",
            'use_cases': "Are there any known model use cases that are especially fraught?",
            'additional_information': """
            If possible, this section should also include any additional ethical considerations that went into model development, 
            for example, review by an external board, or testing with a specific community."""
        },
        'caveats_recommendations': {
            'field_description': "Additional concerns that were not covered in the previous sections",
            'caveats': "For example, did the results suggest any further testing? Were there any relevant groups that were not represented in the evaluation dataset?",
            'recommendations': "Are there additional recommendations for model use? What are the ideal characteristics of an evaluation dataset for this model?"
        }
    }

    def __init__(self, file_path="", autofill=True, indent_level=2):
        self.file_path = file_path
        self.autofill = autofill
        self.indent_level = indent_level
        self.setup_template()
        if self.autofill:
            self.autofill_template()            

    def autofill_template(self):
        """Creates an empty model card template, then fills it with information that is automatically obtained from the system"""
        auto_info = {
            'model_details': {
                'developer': get_username(),
                'datetime': get_datetime()
            }
        }
        self.fill_info(auto_info)

In [None]:
show_doc(ModelCard)

<h2 id="ModelCard" class="doc_header"><code>class</code> <code>ModelCard</code><a href="" class="source_link" style="float:right">[source]</a></h2>

> <code>ModelCard</code>(**`file_path`**=*`''`*, **`autofill`**=*`True`*, **`indent_level`**=*`2`*) :: [`ggdModelDocumentation`](/gingado/documentation.html#ggdModelDocumentation)

Base class for gingado Documenters

After a Documenter object, such as `ModelCard`, is instantiated, the user can see the underlying template with the method `show_template`, as below:

In [None]:
model_doc = ModelCard(autofill=True)
model_doc.show_template()

assert model_doc.show_template(indent=False) == ModelCard.template

{
  "model_details": {
    "field_description": "Basic information about the model",
    "developer": "Person or organisation developing the model",
    "datetime": "Model date",
    "version": "Model version",
    "type": "Model type",
    "info": "Information about training algorithms, parameters, fairness constraints or other applied approaches, and features",
    "paper": "Paper or other resource for more information",
    "citation": "Citation details",
    "license": "License",
    "contact": "Where to send questions or comments about the model"
  },
  "intended_use": {
    "field_description": "Use cases that were envisioned during development",
    "primary_uses": "Primary intended uses",
    "primary_users": "Primary intended users",
    "out_of_scope": "Out-of-scope use cases"
  },
  "factors": {
    "field_description": "Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others",
    "relevant": "Relevant factors",
    

The template should be protected from editing once a Documenter has been created. This way, even if a user inadvertently changes the template on an instance, this does not interfere with the Documenter functionality.
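
A minimal sketch of the mechanism, using a hypothetical `Form` class rather than a real Documenter: the method looks the template up on the class, so assigning to the attribute on an instance only shadows it there. Note that this does not guard against mutating the class-level dictionary in place.

```python
class Form:
    # Class-level template shared by all instances
    template = {"question": "Placeholder text"}

    def show_template(self):
        # Look the template up on the class, bypassing any instance attribute
        return self.__class__.template


form = Form()
form.template = None  # creates an instance attribute; the class attribute is untouched

print(form.template)         # the shadowing instance attribute
print(form.show_template())  # still the class-level template
```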

In [None]:
model_doc.template = None
model_doc.show_template()

assert model_doc.show_template(indent=False) == ModelCard.template

{
  "model_details": {
    "field_description": "Basic information about the model",
    "developer": "Person or organisation developing the model",
    "datetime": "Model date",
    "version": "Model version",
    "type": "Model type",
    "info": "Information about training algorithms, parameters, fairness constraints or other applied approaches, and features",
    "paper": "Paper or other resource for more information",
    "citation": "Citation details",
    "license": "License",
    "contact": "Where to send questions or comments about the model"
  },
  "intended_use": {
    "field_description": "Use cases that were envisioned during development",
    "primary_uses": "Primary intended uses",
    "primary_users": "Primary intended users",
    "out_of_scope": "Out-of-scope use cases"
  },
  "factors": {
    "field_description": "Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others",
    "relevant": "Relevant factors",
    

The template serves to provide specific instances of the Documenter object with a form-like structure, indicating which fields are open and thus require some answers or information. Consequently, the template should also not change when the actual document object changes:

In [None]:
model_doc.fill_info({'metrics': ['test']})
print([model_doc.json_doc['metrics'], ModelCard.template['metrics']])

assert model_doc.show_template(indent=False) == ModelCard.template

[['test'], {'field_description': 'Metrics should be chosen to reflect potential real world impacts of the model', 'performance_measures': 'Model performance measures', 'thresholds': 'Decision thresholds', 'variation_approaches': 'Variation approaches'}]


The method `show_template` prints the Documenter's documentation template:

In [None]:
model_doc.show_template()

Users can find which fields in their templates are still without a response by using the method `open_questions`. The levels of the template are reflected in the resulting list: double underscores separate levels in the underlying JSON file.

In [None]:
model_doc.open_questions()

['model_details__developer',
 'model_details__datetime',
 'model_details__version',
 'model_details__type',
 'model_details__info',
 'model_details__paper',
 'model_details__citation',
 'model_details__license',
 'model_details__contact',
 'intended_use__primary_uses',
 'intended_use__primary_users',
 'intended_use__out_of_scope',
 'factors__relevant',
 'factors__evaluation',
 'metrics__performance_measures',
 'metrics__thresholds',
 'metrics__variation_approaches',
 'evaluation_data__datasets',
 'evaluation_data__motivation',
 'evaluation_data__preprocessing',
 'training_data__training_data',
 'quant_analyses__unitary',
 'quant_analyses__intersectional',
 'ethical_considerations__sensitive_data',
 'ethical_considerations__human_life',
 'ethical_considerations__mitigations',
 'ethical_considerations__risks_and_harms',
 'ethical_considerations__use_cases',
 'ethical_considerations__additional_information',
 'caveats_recommendations__caveats',
 'caveats_recommendations__recommendations']
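
The idea behind this check can be sketched as a standalone function: a field counts as open while its value still equals the placeholder text in the template. The function below and its example dictionaries are illustrative assumptions, not the library's own code.

```python
def find_open_questions(json_doc, template):
    """Return 'section__field' names whose value is still the template placeholder."""
    return [
        f"{section}__{field}"
        for section, fields in json_doc.items()
        for field, answer in fields.items()
        if answer == template[section][field]
    ]


# Hypothetical one-section template and a partially filled document
template = {
    "intended_use": {
        "field_description": "Use cases that were envisioned during development",
        "primary_uses": "Primary intended uses",
        "out_of_scope": "Out-of-scope use cases",
    }
}
json_doc = {
    "intended_use": {
        "primary_uses": "Forecasting inflation",   # answered
        "out_of_scope": "Out-of-scope use cases",  # still the placeholder
    }
}

remaining = find_open_questions(json_doc, template)
print(remaining)
```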

To fill in an empty field, such as the ones identified above by the method `open_questions`, the user simply needs to pass a dictionary with the corresponding information to the method `fill_info`. Depending on the template, the dictionary may be nested.

> Note: it is technically possible to assign the element directly to the attribute `json_doc`, but this should be avoided in favour of using the method `fill_info`. The latter tests whether the new information is valid according to the documentation template and also enables the filling of more than one question at the same time. Importantly, assigning information directly to `json_doc` is not logged, and may inadvertently create new entries that are not part of the template (eg, if a new dictionary key is created due to typos).
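
The difference can be illustrated with a small standalone sketch. The function `checked_fill` is a hypothetical stand-in that mimics the kind of key validation described in the note, and the misspelled key `treshold` is deliberate.

```python
def checked_fill(json_doc, template, new_info):
    """Write new_info into json_doc, rejecting keys that are absent from the template."""
    for section, fields in new_info.items():
        if section not in template:
            raise KeyError(f"key '{section}' is not in the documentation template")
        for field, answer in fields.items():
            if field not in template[section]:
                raise KeyError(f"key '{field}' is not in the template item '{section}'")
            json_doc[section][field] = answer


template = {"metrics": {"thresholds": "Decision thresholds"}}
direct_doc = {"metrics": {"thresholds": "Decision thresholds"}}
checked_doc = {"metrics": {"thresholds": "Decision thresholds"}}

# Direct assignment: the typo silently creates an entry outside the template
direct_doc["metrics"]["treshold"] = 0.5

# Validated filling: the same typo is rejected before anything is written
try:
    checked_fill(checked_doc, template, {"metrics": {"treshold": 0.5}})
    typo_caught = False
except KeyError:
    typo_caught = True

print(direct_doc, checked_doc, typo_caught)
```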

In [None]:
new_info = {
    'metrics': {'performance_measures': "This is a test"},
    'caveats_recommendations': {'caveats': "This is another test"}
    }
model_doc.fill_info(new_info)

# technically possible but not recommended:


And now we can check that the corresponding entry is part of the documentation, and thus no longer shown as an open question:

In [None]:
assert model_doc.json_doc['caveats_recommendations']['caveats'] == "This is another test"
model_doc.open_questions()

In [None]:
new_info
new_info2 = dict(new_info)
new_info2['metrics'] = None
new_info

## Creating a custom Documenter

`gingado` users can easily transform their model documentation needs into a Documenter object. The main advantages of doing this are: 
* the documentation template becomes a "recyclable" object that can be saved, loaded, and used in other models or code routines; and
* model documentation can be more closely aligned with model creation and training, thus decreasing the probability that the model and its documentation diverge during the process of model development.

The requirements for an object to be a `gingado` Documenter are:
* it must subclass `ggdModelDocumentation` (or implement all its methods if the user does not want to keep a dependency on `gingado`),
* it must include the actual template for the documentation as a dictionary in a class attribute called `template`,
* it must follow the `scikit-learn` convention of storing the `__init__` parameters in attributes with the same name,
* it must implement the `autofill_template` method, using the `fill_info` method to set the automatically filled information fields.
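
The requirements above can be sketched as follows. To keep the example runnable without `gingado` installed, it uses a small stand-in base class that reproduces only the `setup_template` and `fill_info` behaviour shown earlier; in practice the class would subclass `ggdModelDocumentation` instead. The `DataNote` class, its template and its fields are hypothetical.

```python
import copy
import datetime


class StandInBase:
    """Stand-in for `ggdModelDocumentation` with only the methods this sketch needs."""

    def setup_template(self):
        # Copy the class template and drop the descriptive entries
        self.json_doc = copy.deepcopy(self.__class__.template)
        for section in self.json_doc.values():
            section.pop('field_description', None)

    def fill_info(self, new_info):
        # Only accept sections and fields that exist in the template
        for section, fields in new_info.items():
            if section not in self.__class__.template:
                raise KeyError(f"key '{section}' is not in the documentation template")
            for field, answer in fields.items():
                if field not in self.__class__.template[section]:
                    raise KeyError(f"key '{field}' is not in the template item '{section}'")
                self.json_doc[section][field] = answer


class DataNote(StandInBase):
    """Hypothetical Documenter recording where a dataset came from."""

    # Requirement: the template lives in a class attribute called `template`
    template = {
        'source': {
            'field_description': "Where the data comes from",
            'provider': "Organisation providing the data",
            'retrieved_on': "Date the data was retrieved",
        },
    }

    def __init__(self, file_path="", autofill=True, indent_level=2):
        # Requirement: store __init__ parameters in attributes of the same name
        self.file_path = file_path
        self.autofill = autofill
        self.indent_level = indent_level
        self.setup_template()
        if self.autofill:
            self.autofill_template()

    def autofill_template(self):
        # Requirement: set automatically obtained fields through `fill_info`
        self.fill_info({'source': {'retrieved_on': datetime.date.today().isoformat()}})


note = DataNote()
note.fill_info({'source': {'provider': "Example statistics office"}})
print(note.json_doc)
```

Routing the automatic fields through `fill_info` means they are subject to the same template checks as information supplied by the user.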