
<link rel="stylesheet" href="https://unpkg.com/thebe@latest/lib/index.css">
<script src="https://unpkg.com/thebe@latest/lib/index.js"></script>

<script type="text/javascript">
  document.addEventListener("DOMContentLoaded", function() {
    thebelab.bootstrap({
      requestKernel: true,
      binderOptions: {
        repo: "your-repo/your-project",
        ref: "main",
      },
      codeMirrorConfig: {
        theme: "abcdef",
      },
    });
  });
</script>


# **Data Readiness For AI Tabular Checklist - Part 3**

 * Creator(s): John Pill
 * Affiliation: UK Met Office
 * History: 1.0
 * Last update: 27 August 2024.


---

## **Tutorial Material**

* **Run this Jupyter notebook locally using Jupyter Lab**
* **Select 'Run All Cells' from the 'Run' menu to generate the checklist**.
* **Remember to save your notebook regularly as you work through it to avoid losing your answers.**


## **Data section, optional**
Scripts for pulling the data into the notebook assuming

---

## **Setup Notebook**

In [None]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import json
import sys
import os
sys.path.append(os.path.abspath('..')) # Add the parent directory to the system path
from aidatareadiness import utils
from aidatareadiness.utils import WIDGET_WIDTH, DESCRIPTION_STYLE, PLACEHOLDER  
from aidatareadiness.checklist_auto import tabular

In [None]:
# Load checklist from JSON file:
checklist = utils.load_checklist()

#### Reset stored answers to start again:

In [None]:
# Reset all checklist answers back to original blank answers for all sections.
# Any completed information will be lost. 

# To reset the stored answers, uncomment and run the lines below. Re-comment them afterwards to stop them running again.
# utils.reset_checklist()
# checklist = utils.load_checklist()

# You can then re-run each section to reload it on the reset data. 

## **Load Data**

In [None]:
# Loading the data isn't necessary for this notebook: the questions in this section cannot be answered directly by analysing the file.
# The answers will mostly come from reading the dataset documentation and metadata.

# Replace add_your_file_path_here with the path to your data file (csv, txt etc.). 
file_path = "add_your_file_path_here.csv"

# Call the read_file helper function - uncomment the lines below after replacing your file path above. 

# df = tabular.read_file(file_path)
# df

In [None]:

print("Dataset:", checklist["GeneralInformation"]["DatasetName"])
print("Dataset link:", checklist["GeneralInformation"]["DatasetLink"])
print("Assessor:", checklist["GeneralInformation"]["AssessorName"])
print("Assessor email:", checklist["GeneralInformation"]["AssessorEmailAddress"])

---

## **3. Data Documentation**

### Community standard or convention


In [None]:

metadata_standard = widgets.Combobox(
            value = checklist['DataDocumentation']['MetadataStandard'],
            options=['Yes', 'No', 'N/A'],
            description='3.1 Does the dataset metadata follow a community/domain standard or convention?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

metadata_standard_detail = widgets.Text(
            value = checklist['DataDocumentation']['MetadataStandardName'],
            description="Metadata standard:",
            placeholder='Which standard is it? (CF, TBD, etc.)',
            layout=widgets.Layout(display="none", width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

metadata_machine_readable = widgets.Combobox(
            value = checklist['DataDocumentation']['MetadataMachineReadable'],
            options=['Yes', 'No', 'N/A'],
            description='Is the dataset metadata machine-readable?',
            placeholder="Click to select option",
            layout = widgets.Layout(display="none", width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

metadata_spatial_temporal = widgets.Combobox(
            value = checklist['DataDocumentation']['MetadataSpatialTemporalExtent'],
            options=['Yes', 'No', 'N/A'],
            description='Does it include details on the spatial and temporal extent?',
            placeholder="Click to select option",
            layout = widgets.Layout(display="none", width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )


# Function to change the display setting of the following UI components. 
def on_click_handler(change):    

    # Show / hide main trunk of questions. 
    if metadata_standard.value == "Yes":
        metadata_standard_detail.layout.display = ''
        metadata_machine_readable.layout.display = ''
        metadata_spatial_temporal.layout.display = ''
    else: 
        metadata_standard_detail.layout.display = 'none'
        metadata_machine_readable.layout.display = 'none'
        metadata_spatial_temporal.layout.display = 'none'
        metadata_standard_detail.value = ''
        metadata_machine_readable.value = 'N/A'
        metadata_spatial_temporal.value = 'N/A'
        

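display(metadata_standard, metadata_standard_detail, metadata_machine_readable, metadata_spatial_temporal)

# Observe UI components for changes and call the on_click_handler function if the value property changes.
metadata_standard.observe(on_click_handler, names="value")

# Show the follow-up questions straight away if a previously saved answer is "Yes".
if metadata_standard.value == "Yes":
    on_click_handler(None)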
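The handler above hides the follow-up questions and resets their answers whenever the main answer is not "Yes"; the data dictionary cell below repeats the same pattern. As a sketch only, that logic could be factored into one reusable helper. The name `toggle_follow_ups` is hypothetical, and the helper assumes only that each widget exposes `.layout.display` and `.value`, as ipywidgets do. The demo uses `SimpleNamespace` stand-ins so it runs outside a notebook.

```python
from types import SimpleNamespace


def toggle_follow_ups(trigger_value, follow_ups, reset_values, show_when="Yes"):
    """Show follow-up widgets when trigger_value equals show_when.

    Otherwise hide them and reset each one to the matching entry in reset_values.
    Works with any objects exposing .layout.display and .value (e.g. ipywidgets).
    Returns True if the follow-ups are now visible.
    """
    if len(follow_ups) != len(reset_values):
        raise ValueError("follow_ups and reset_values must be the same length")
    show = trigger_value == show_when
    for widget, reset_value in zip(follow_ups, reset_values):
        # '' restores the default display; 'none' hides the widget.
        widget.layout.display = "" if show else "none"
        if not show:
            widget.value = reset_value
    return show


# Demo with lightweight stand-ins for widgets (hypothetical values).
detail = SimpleNamespace(layout=SimpleNamespace(display=""), value="CF")
machine_readable = SimpleNamespace(layout=SimpleNamespace(display=""), value="Yes")
toggle_follow_ups("No", [detail, machine_readable], ["", "N/A"])
print(detail.layout.display, repr(detail.value), machine_readable.value)  # none '' N/A
```

In the notebook, the metadata cell could then register `metadata_standard.observe(lambda change: toggle_follow_ups(change["new"], [metadata_standard_detail, metadata_machine_readable, metadata_spatial_temporal], ["", "N/A", "N/A"]), names="value")`.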

### Data dictionary

In [None]:

data_dictionary = widgets.Combobox(
            value=checklist['DataDocumentation']['DataDictionaryExists'],
            options=['Yes', 'No', 'N/A'],
            description='3.2 Is there a comprehensive data dictionary/codebook that describes what each element (e.g. parameter) of the dataset means?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

data_dictionary_standardized = widgets.Combobox(
            value=checklist['DataDocumentation']['DataDictionaryStandardised'],
            options=['Yes', 'No', 'N/A'],
            description='Is the data dictionary standardized?',
            placeholder="Click to select option",
            layout=widgets.Layout(display="none", width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

data_dictionary_machine_readable = widgets.Combobox(
            value=checklist['DataDocumentation']['DataDictionaryMachineReadable'],
            options=['Yes', 'No', 'N/A'],
            description='Is the data dictionary machine-readable?',
            placeholder="Click to select option",
            layout=widgets.Layout(display="none", width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

parameters_defined_standard = widgets.Combobox(
            value=checklist['DataDocumentation']['DataDictionaryParametersFollowStandard'],
            options=['Yes', 'No', 'N/A'],
            description='Do the parameters follow a defined standard?',
            placeholder="Click to select option",
            layout=widgets.Layout(display="none", width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

parameters_defined_standard_detail = widgets.Text(
            value=checklist['DataDocumentation']['DataDictionaryStandardName'],
            description = 'Parameter standards:',
            placeholder='If the parameters follow a defined standard, which standard is it?',
            layout=widgets.Layout(display="none", width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

parameters_common_vocabulary = widgets.Combobox(
            value=checklist['DataDocumentation']['DataDictionaryParametersCrosswalked'],
            options=['Yes', 'No', 'N/A'],
            description='Are parameters crosswalked in an ontology or common vocabulary (e.g. NIEM)?',
            placeholder="Click to select option",
            layout=widgets.Layout(display="none", width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

# Function to change the display setting of the following UI components. 
def on_click_handler(change):    

    # Show / hide main trunk of questions. 
    if data_dictionary.value == "Yes":
        data_dictionary_standardized.layout.display = ''
        data_dictionary_machine_readable.layout.display = ''
        parameters_defined_standard.layout.display = ''
        parameters_defined_standard_detail.layout.display = ''
        parameters_common_vocabulary.layout.display = ''

    else:   
        data_dictionary_standardized.layout.display = 'none'
        data_dictionary_machine_readable.layout.display = 'none'
        parameters_defined_standard.layout.display = 'none'
        parameters_defined_standard_detail.layout.display = 'none'
        parameters_common_vocabulary.layout.display = 'none'
        data_dictionary_standardized.value = 'N/A'
        data_dictionary_machine_readable.value = 'N/A'
        parameters_defined_standard.value = 'N/A'
        parameters_defined_standard_detail.value = ''
        parameters_common_vocabulary.value = 'N/A'

            
# Display the UI components
display(data_dictionary, data_dictionary_standardized, data_dictionary_machine_readable, parameters_defined_standard, parameters_defined_standard_detail, parameters_common_vocabulary)

# Observe UI components for changes and call the on_click_handler function if the value property changes.
data_dictionary.observe(on_click_handler, names="value")

# Show the follow-up questions straight away if a previously saved answer is "Yes".
if data_dictionary.value == "Yes":
    on_click_handler(None)
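Question 3.2 asks whether every element of the dataset is described. If the data dictionary is available in a machine-readable form, its coverage can be checked against the file's column headers. The sketch below assumes a hypothetical layout in which the dictionary maps each column name to a dict of attributes; the column names, descriptions and units are illustrative, not taken from any real dataset. Adding `"standard_name"` to `required_fields` also flags columns that have not been crosswalked to a common vocabulary such as CF standard names.

```python
import csv
import io


def check_data_dictionary(columns, data_dictionary, required_fields=("description", "units")):
    """Compare dataset columns with a data dictionary.

    Returns (undocumented, incomplete): columns with no dictionary entry, and a mapping
    of column -> required fields that are missing or empty in its entry.
    """
    undocumented = [column for column in columns if column not in data_dictionary]
    incomplete = {}
    for column in columns:
        entry = data_dictionary.get(column)
        if entry is None:
            continue
        missing = [field for field in required_fields if not entry.get(field)]
        if missing:
            incomplete[column] = missing
    return undocumented, incomplete


# Illustrative CSV header and hand-written dictionary (hypothetical names and values).
sample_csv = "air_temp,rh,wind_speed\n12.3,81,4.2\n"
columns = next(csv.reader(io.StringIO(sample_csv)))
dictionary = {
    "air_temp": {"description": "Air temperature", "units": "degC",
                 "standard_name": "air_temperature"},
    "rh": {"description": "Relative humidity", "units": ""},
}
undocumented, incomplete = check_data_dictionary(columns, dictionary)
print(undocumented)  # ['wind_speed']
print(incomplete)  # {'rh': ['units']}
```

If the optional Load Data cell was run and `tabular.read_file` returns a pandas DataFrame (an assumption), `list(df.columns)` could be passed as `columns` instead.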



### Unique persistent identifier

3.3 Does the dataset have a unique persistent identifier, e.g. a DOI? Answer Yes (and supply the identifier), No, or Not applicable.


In [None]:

unique_persistent_identifier = widgets.Combobox(
            value=checklist['DataDocumentation']['IdentifierExists'],
            options=['Yes', 'No', 'N/A'],
            description='3.3 Does the dataset have a unique persistent identifier, e.g. DOI?',
            placeholder=PLACEHOLDER,
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

unique_persistent_identifier_link = widgets.Text(
            value=checklist['DataDocumentation']['Identifier'],
            description = "Identifier:",
            placeholder=PLACEHOLDER,
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

display(unique_persistent_identifier, unique_persistent_identifier_link)
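A quick syntactic check can catch obvious typos in the identifier entered above. This sketch uses a regular expression adapted from one Crossref has published as matching the large majority of modern DOIs; it will reject some valid older DOIs, and it does not confirm that the DOI is registered or resolves. The function names and the example DOI are hypothetical.

```python
import re

# Adapted from a pattern published by Crossref; matches most modern DOIs, not every valid one.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/",
                     "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalise_doi(identifier):
    """Strip whitespace and a leading resolver URL or 'doi:' prefix."""
    identifier = identifier.strip()
    for prefix in RESOLVER_PREFIXES:
        if identifier.lower().startswith(prefix):
            return identifier[len(prefix):]
    return identifier


def looks_like_doi(identifier):
    """Syntax check only: does not confirm the DOI is registered or resolves."""
    return DOI_PATTERN.match(normalise_doi(identifier)) is not None


print(looks_like_doi("https://doi.org/10.1234/example-dataset.v1"))  # True
print(looks_like_doi("not-a-doi"))  # False
```

In the notebook this could be applied as `looks_like_doi(unique_persistent_identifier_link.value)`.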

### Contact information and feedback

In [None]:

contact_info_available = widgets.Combobox(
            value=checklist['DataDocumentation']['ContactInformation'],
            options=['Yes', 'No', 'N/A'],
            description='3.4 Is there contact information for subject-matter experts?',
            placeholder=PLACEHOLDER,
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

feedback_mechanism_available = widgets.Combobox(
            value=checklist['DataDocumentation']['UserFeedbackMechanism'],
            options=['Yes', 'No', 'N/A'],
            description='3.5 Is there a mechanism for user feedback and suggestions?',
            placeholder=PLACEHOLDER,
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

display(contact_info_available, feedback_mechanism_available)


### Example code / notebooks / toolkits


In [None]:

example_code_available = widgets.Combobox(
            value=checklist['DataDocumentation']['ExampleCodesAvailable'],
            options=['Yes', 'No', 'N/A'],
            description='3.6 Is there example code (scripts, notebooks or toolkits) showing how the data can be used?',
            placeholder=PLACEHOLDER,
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

display(example_code_available)

### Licenses

In [None]:

dataset_licence = widgets.Text(
            value=checklist['DataDocumentation']['LicenseType'],
            description='3.7 What is the license for the data?',
            placeholder="Type your answer",
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

dataset_licence_machine_readable = widgets.Combobox(
            value=checklist['DataDocumentation']['LicenceMachineReadable'],
            options=['Yes', 'No', 'N/A'],
            description='Is the license standardized and machine-readable (e.g. Creative Commons)?',
            placeholder=PLACEHOLDER,
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

display(dataset_licence, dataset_licence_machine_readable)
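One way to judge whether a licence is "standardized and machine-readable" is to check whether the answer is an SPDX licence identifier. This sketch hard-codes a small subset of real SPDX identifiers for illustration; a fuller check would use the complete SPDX License List. The function name is hypothetical. SPDX identifiers are matched case-insensitively, so the lookup lower-cases both sides.

```python
# A small subset of identifiers from the SPDX License List, hard-coded for illustration.
KNOWN_SPDX_IDS = (
    "CC-BY-4.0", "CC-BY-SA-4.0", "CC-BY-NC-4.0", "CC0-1.0",
    "OGL-UK-3.0", "ODbL-1.0", "MIT", "Apache-2.0",
)


def canonical_spdx_id(licence_text, known_ids=KNOWN_SPDX_IDS):
    """Return the canonical SPDX identifier matching licence_text (case-insensitive), or None."""
    lookup = {spdx_id.lower(): spdx_id for spdx_id in known_ids}
    return lookup.get(licence_text.strip().lower())


print(canonical_spdx_id(" cc-by-4.0 "))  # CC-BY-4.0
print(canonical_spdx_id("Open Government Licence v3.0"))  # None
```

A free-text licence name such as "Open Government Licence v3.0" returns `None` here even though the licence itself is standard; the check only recognises the identifier form (`OGL-UK-3.0`).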

### Dataset usage

In [None]:

ai_ml_existing_useage_links = widgets.Textarea(
            value=checklist['DataDocumentation']['UsedInAIorMLReports'],
            description='3.8 Has this dataset already been used in AI or ML activities? If so, link to the publications/reports.',
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

usage_recomendations = widgets.Combobox(
            value=checklist['DataDocumentation']['UsageRecommendations'],
            options=['Yes', 'No', 'N/A'],
            description='3.9 Are there recommendations on the intended use of the data, and uses that are not recommended?',
            placeholder=PLACEHOLDER,
            layout=widgets.Layout(width=WIDGET_WIDTH),
            style = DESCRIPTION_STYLE
            )

display(ai_ml_existing_useage_links, usage_recomendations)

In [None]:

# Save button
save_button = widgets.Button(description="Save Data Documentation Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_documentation():

    updates = {
        "DataDocumentation": {
            # Metadata
            "MetadataStandard": metadata_standard.value,
            "MetadataStandardName": metadata_standard_detail.value,
            "MetadataMachineReadable": metadata_machine_readable.value,
            "MetadataSpatialTemporalExtent": metadata_spatial_temporal.value,
            
            # Data Dictionary
            "DataDictionaryExists": data_dictionary.value,            
            "DataDictionaryStandardised" : data_dictionary_standardized.value,
            "DataDictionaryMachineReadable" : data_dictionary_machine_readable.value,
            "DataDictionaryParametersFollowStandard" : parameters_defined_standard.value,
            "DataDictionaryStandardName" : parameters_defined_standard_detail.value,
            "DataDictionaryParametersCrosswalked" : parameters_common_vocabulary.value,
            
            # Identifier
            "IdentifierExists" : unique_persistent_identifier.value,
            "Identifier" : unique_persistent_identifier_link.value,
            
            # Contact and feedback
            "ContactInformation" : contact_info_available.value,
            "UserFeedbackMechanism" : feedback_mechanism_available.value,

            # Misc
            "ExampleCodesAvailable" : example_code_available.value,
            "LicenseType" : dataset_licence.value,
            "LicenceMachineReadable" : dataset_licence_machine_readable.value,
            "UsedInAIorMLReports" : ai_ml_existing_useage_links.value,
            "UsageRecommendations" : usage_recomendations.value,
        }
    }
    return updates

save_button.on_click(lambda b: utils.update_checklist(b, generate_updates_documentation()))

display(save_button)
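`utils.update_checklist` belongs to the project's `aidatareadiness` package and its implementation is not shown in this notebook. Purely as an illustration of the idea, a save step like this could merge the nested `updates` dictionary into the stored JSON so that other sections (such as `GeneralInformation`) are left untouched. The function names, path handling and JSON indentation below are assumptions, not the package's actual behaviour.

```python
import json
import os
import tempfile


def merge_updates(checklist, updates):
    """Recursively merge updates into checklist in place: nested dicts are merged, other values replaced."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(checklist.get(key), dict):
            merge_updates(checklist[key], value)
        else:
            checklist[key] = value
    return checklist


def save_updates(path, updates):
    """Load the JSON checklist at path, merge updates into it and write it back."""
    with open(path, encoding="utf-8") as f:
        checklist = json.load(f)
    merge_updates(checklist, updates)
    # Write to a temporary file in the same directory, then swap it in,
    # so an interrupted save cannot leave a half-written checklist.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", dir=directory,
                                     suffix=".json", delete=False) as tmp:
        json.dump(checklist, tmp, indent=4)
    os.replace(tmp.name, path)
    return checklist
```

A deep merge, rather than replacing the whole `DataDocumentation` section, means keys not present in `updates` keep their saved values. In the notebook it might be wired as `save_button.on_click(lambda b: save_updates("checklist.json", generate_updates_documentation()))`, where the file name is hypothetical.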

## Finished

1. Make sure you have saved your answers to the external JSON file using the save button above.
2. If you would like to view the saved answers, use the button below.
3. Move on to the notebook Template_Checklist_Part_4.ipynb, which covers Data Access.

In [None]:

button_print_json = widgets.Button(description="Print json results",  button_style='info', layout=widgets.Layout(flex='1 1 auto', width='auto'))
output = widgets.Output()

display(button_print_json, output)

def print_json_info(b):
    """
    Loads a fresh copy of the JSON checklist file into a local checklist variable,
    then prints its contents to the Jupyter notebook cell output.

    Arguments: b - the button that triggered the call.
    """
    checklist = utils.load_checklist()
    with output:
        clear_output()
        for key, value in checklist.items():
            print(f"{key}:")
            if isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    print(f"  {sub_key}: {sub_value}")
            else:
                print(f"  {value}")

button_print_json.on_click(print_json_info)
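Beyond printing everything, it can help to list the questions that are still blank. This sketch assumes unanswered fields are stored as empty strings (or `None`) in the checklist dictionary, and treats `"N/A"` as a deliberate answer. In the notebook it could be called as `unanswered_fields(utils.load_checklist())`; the function name is hypothetical.

```python
def unanswered_fields(checklist, blank_values=("", None)):
    """Return {section: [field, ...]} listing fields whose saved answer is blank."""
    missing = {}
    for section, answers in checklist.items():
        # Only nested sections (dicts of question -> answer) are checked.
        if not isinstance(answers, dict):
            continue
        blanks = [field for field, value in answers.items() if value in blank_values]
        if blanks:
            missing[section] = blanks
    return missing


# Illustrative checklist fragment using keys from this notebook.
sample = {
    "GeneralInformation": {"DatasetName": "Example dataset"},
    "DataDocumentation": {"IdentifierExists": "Yes", "Identifier": "", "LicenseType": None,
                          "UsageRecommendations": "N/A"},
}
print(unanswered_fields(sample))  # {'DataDocumentation': ['Identifier', 'LicenseType']}
```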


---

## **Appendix** - Definition of terms used in the checklist.

### Documentation
* **Dataset Metadata**: complete information about the dataset: quality, provenance, location, time period, responsible parties, purpose, etc.
* **Data Dictionary/Codebook**: complete information about the individual variables / measures / parameters within a dataset: type, units, null value, etc.
* **Identifier**: a code or number that uniquely identifies a dataset
* **Ontology**: formalized definitions of concepts within a domain of knowledge, and the nature of the inter-relationships among those concepts