
<link rel="stylesheet" href="https://unpkg.com/thebe@latest/lib/index.css">
<script src="https://unpkg.com/thebe@latest/lib/index.js"></script>

<script type="text/javascript">
  document.addEventListener("DOMContentLoaded", function() {
    thebelab.bootstrap({
      requestKernel: true,
      binderOptions: {
        repo: "your-repo/your-project",
        ref: "main",
      },
      codeMirrorConfig: {
        theme: "abcdef",
      },
    });
  });
</script>


# **Data Readiness For AI Checklist - Part 1 & 2**

 * Creator(s): John Pill
 * Affiliation: UK Met Office
 * History: 1.0
 * Last update: 27 August 2024.


---

## **Overview**
The checklist is based on the 2019 draft readiness matrix developed by the Office of Science and Technology Policy Subcommittee on Open Science, and has been improved through further research and user feedback. Definitions for some concepts are listed at the end of this document. The checklist was developed through a collaboration of ESIP Data Readiness Cluster members, including representatives from NOAA, NASA, USGS, and other organizations. It will be updated periodically to reflect community feedback.

ESIP Data Readiness Cluster (2023): Checklist to Examine AI-readiness for Open Environmental Datasets v.1.0. ESIP. Online resource. https://doi.org/10.6084/m9.figshare.19983722.v1

Readiness Matrix (2020): What is AI-Ready Open Data? NOAA. Online resource. https://www.star.nesdis.noaa.gov/star/documents/meetings/2020AI/presentations/202010/20201022_Christensen.pdf

### Prerequisites
Ideally, for an AI-readiness assessment, a dataset should be defined as the minimum measurable bundle (i.e., a single physical parameter or variable from observational datasets or model simulations). Assessing at this scale enables better integration of data from different sources for research and development. However, doing so manually, without automation, can be time-consuming. We therefore recommend that assessments currently be done at the data file level. If the dataset exists in different versions, apply the checklist to each dataset type (e.g. raw, derived).

### Learning Outcomes
* Know how to check a range of dataset features. 
* Assess a wide range of dataset features that affect the dataset's 'readiness' for machine learning.


---

# **TODO TASKS** 
**Consider what to do at the end of the checklist:**
* Add export functions for different formats (CSV, JSON, etc.)
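
One possible shape for such export functions is sketched below. It assumes the checklist is a dictionary of sections, each mapping question keys to answers, as the `checklist["GeneralInformation"]["DatasetName"]` lookups later in this notebook suggest. The names `checklist_to_json` and `checklist_to_csv` are hypothetical and are not part of `utils`.

```python
import csv
import io
import json


def checklist_to_json(checklist: dict) -> str:
    """Serialise the whole checklist as indented JSON text."""
    return json.dumps(checklist, indent=2)


def checklist_to_csv(checklist: dict) -> str:
    """Flatten the checklist into CSV text with one row per answer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["section", "question", "answer"])
    for section, answers in checklist.items():
        if isinstance(answers, dict):
            for question, answer in answers.items():
                writer.writerow([section, question, answer])
        else:
            # Top-level values that are not sections get an empty question column.
            writer.writerow([section, "", answers])
    return buffer.getvalue()


# Illustrative data only; the real checklist has many more keys.
example = {"GeneralInformation": {"DatasetName": "demo", "DatasetVersion": "1.0"}}
csv_text = checklist_to_csv(example)
json_text = checklist_to_json(example)
```

Returning text rather than writing files directly keeps the functions easy to test; the caller can decide where the output is saved.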

**Change all if statement else clauses from setting default values from N/A to None**

**Check code cells marked as hidden in cell metadata stay hidden when opened fresh by a new user**

**Required for attempting to run cells programmatically but not working currently.**
* import nbformat
* from nbclient import NotebookClient

---

## **Tutorial Material**

* **Run this Jupyter notebook locally using Jupyter Lab**
* **Select 'Run All Cells' from the 'Run' menu to generate the checklist**.
* **Remember to save your notebook regularly as you work through it to prevent losing your answers.**


## **Data section, optional**
Scripts for pulling the data into the notebook assuming

---

## **Setup Notebook**

In [1]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import json
import sys
import os
sys.path.append(os.path.abspath('..')) # Add the parent directory to the system path
from utils import *

In [2]:
# Load checklist from JSON file:
checklist = load_checklist()

#### Reset stored answers to start again:

In [3]:
# Reset all checklist answers back to original blank answers for all sections.
# Any completed information will be lost. 

# To reset the stored answers uncomment and run these lines of code below. Re-comment the lines afterwards to avoid them running again. 
# reset_checklist()
# checklist = load_checklist()

# You can then re-run each section to reload it on the reset data. 
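
`load_checklist` and `reset_checklist` come from the project's `utils` module, whose source is not shown here. As a rough mental model only, helpers of this kind often read answers from a JSON file and overwrite that file with a blank template on reset. The sketch below is an assumption about that pattern, not the actual implementation: `load_answers`, `reset_answers` and `BLANK_TEMPLATE` are hypothetical names, and a temporary file stands in for the real checklist file.

```python
import copy
import json
import tempfile
from pathlib import Path

# Hypothetical blank template; the real checklist has many more keys.
BLANK_TEMPLATE = {"GeneralInformation": {"DatasetName": "", "DatasetVersion": ""}}


def load_answers(path: Path) -> dict:
    """Read stored answers, or return a blank copy if no file exists yet."""
    if not path.exists():
        return copy.deepcopy(BLANK_TEMPLATE)
    return json.loads(path.read_text(encoding="utf-8"))


def reset_answers(path: Path) -> None:
    """Overwrite any stored answers with the blank template."""
    path.write_text(json.dumps(BLANK_TEMPLATE, indent=2), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    answers_file = Path(tmp) / "checklist.json"
    saved = {"GeneralInformation": {"DatasetName": "demo", "DatasetVersion": "2"}}
    answers_file.write_text(json.dumps(saved), encoding="utf-8")

    before_reset = load_answers(answers_file)
    reset_answers(answers_file)
    after_reset = load_answers(answers_file)
```

After a reset the in-memory copy is stale, which is why the cell above reloads `checklist` straight after calling `reset_checklist()`.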

---

## **1. Dataset General Info**


### Basic details

In [4]:

dataset_name = widgets.Text(
    description='1.1 Dataset name:',
    value=checklist["GeneralInformation"]["DatasetName"],
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

dataset_version = widgets.Text(
    value=checklist["GeneralInformation"]["DatasetVersion"],
    description='1.2. Dataset version:',
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

dataset_link = widgets.Text(
    value=checklist["GeneralInformation"]["DatasetLink"],
    description='1.3. Location / url link:',
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

dataset_assessor_name = widgets.Text(
    value=checklist["GeneralInformation"]["AssessorName"],
    description='1.4. Assessor name:',
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

dataset_assessor_email = widgets.Text(
    value=checklist["GeneralInformation"]["AssessorEmailAddress"],
    description='1.5. Assessor email address:',
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

# Display all UI components.
display(dataset_name, dataset_version, dataset_link, dataset_assessor_name, dataset_assessor_email)

Text(value='', description='1.1 Dataset name:', layout=Layout(width='900px'), style=TextStyle(description_widt…

Text(value='', description='1.2. Dataset version:', layout=Layout(width='900px'), style=TextStyle(description_…

Text(value='', description='1.3. Location / url link:', layout=Layout(width='900px'), style=TextStyle(descript…

Text(value='', description='1.4. Assessor name:', layout=Layout(width='900px'), style=TextStyle(description_wi…

Text(value='', description='1.5. Assessor email address:', layout=Layout(width='900px'), style=TextStyle(descr…

### Dataset details

In [5]:

label = widgets.Label(value="Select from the list, or type your own answer if appropriate.")

raw_derived = widgets.Combobox(
            value = checklist["GeneralInformation"]["RawOrDerived"],
            options=['Raw', 'Derived', 'Unknown'],
            description='6. Is this raw data or a derived/processed data product?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

observe_model_synthetic = widgets.Combobox(
            value = checklist["GeneralInformation"]["ObservedModeledSyntheticData"],
            options=['Observed', 'Modeled', 'Synthetic'],
            description='7. Is this observational data, simulation/model output, or synthetic data?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_sources = widgets.Combobox(
            value = checklist["GeneralInformation"]["DataSource"],
            options=['Single-source', 'Aggregated'],
            description='8. Is the data single-source or aggregated from several sources? ',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(label, raw_derived, observe_model_synthetic, data_sources)

Label(value='Select from the list, or type your own answer if appropriate.')

Combobox(value='', description='6. Is this raw data or a derived/processed data product?', layout=Layout(width…

Combobox(value='', description='7. Is this observational data, simulation/model output, or synthetic data?', l…

Combobox(value='', description='8. Is the data single-source or aggregated from several sources? ', layout=Lay…

In [6]:

# Save button
save_button = widgets.Button(description="Save General Information Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_general():
    updates = {
        "GeneralInformation": {
            "DatasetName": dataset_name.value,
            "DatasetVersion": dataset_version.value,
            "DatasetLink": dataset_link.value,
            "AssessorName": dataset_assessor_name.value,
            "AssessorEmailAddress": dataset_assessor_email.value,
            "RawOrDerived" : raw_derived.value,
            "ObservedModeledSyntheticData" : observe_model_synthetic.value,
            "DataSource" : data_sources.value,
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_general()))

display(save_button)

Button(button_style='primary', description='Save General Information Answers to json file', layout=Layout(flex…

---

## **2. Data Quality**

### Data timeliness    

In [7]:

data_update = widgets.Combobox(
            value=checklist['DataQuality']['WillBeUpdated'],
            options=['Yes', 'No'],
            description='2.1 Will the dataset be updated?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_update_frequency = widgets.Combobox(
            value=checklist["DataQuality"]["WhenNewDataAdded"],
            options=['When data updated', 'Hourly', 'Daily', 'Weekly', 'Monthly', 'Annually', 'Other', "N/A"],
            description='If the data will be updated, how often will it be updated?',
            placeholder=placeholder,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

data_update_stages = widgets.Combobox(
            value=checklist["DataQuality"]["DifferentStages"],
            options=['Preliminary data first, then updated later', 'Full record', "N/A"],
            description='Will there be different stages of the update?',
            placeholder=placeholder,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

data_update_delay = widgets.Text(
            value=checklist["DataQuality"]["DelayBetweenStages"],
            description='If yes, what is the delay between different stages?',
            disabled=False,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

data_update_supersede = widgets.Combobox(
            value=checklist["DataQuality"]["SupersedeCurrentVersion"],
            options=['Yes', 'No', "N/A"],
            description='Should the new version of the dataset supersede the current version?',
            placeholder=placeholder,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

# Function to change the display setting of the following UI components. 
def on_click_handler(change): 
    if change["new"] == "Yes":
        data_update_frequency.layout.display = ''
        data_update_stages.layout.display = ''
        data_update_delay.layout.display = ''
        data_update_supersede.layout.display = ''
    else:
        data_update_frequency.layout.display = 'none'
        data_update_stages.layout.display = 'none'
        data_update_delay.layout.display = 'none'
        data_update_supersede.layout.display = 'none'
        
        # Return the values to their default state if the first option is changed back.
        # Combobox and Text widgets hold string values, so reset to "" rather than None.
        data_update_frequency.value = ""
        data_update_stages.value = ""
        data_update_delay.value = ""
        data_update_supersede.value = ""
        

# Show UI components based on their display settings. 
display(data_update, data_update_frequency, data_update_stages, data_update_delay, data_update_supersede)

# Observe the first UI component for changes and call the on_click_handler function if value property changed. 
data_update.observe(on_click_handler, names="value")

Combobox(value='', description='2.1 Will the dataset be updated?', layout=Layout(width='900px'), options=('Yes…

Combobox(value='', description='If the data will be updated, how often will it be updated?', layout=Layout(dis…

Combobox(value='', description='Will there be different stages of the update?', layout=Layout(display='none', …

Text(value='', description='If yes, what is the delay between different stages?', layout=Layout(display='none'…

Combobox(value='', description='Should the new version of the dataset supersede the current version?', layout=…
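
The same show/hide pattern recurs in later cells, each time under the name `on_click_handler`. One way to express it once is a small factory that builds the handler from a list of dependent widgets. This is a sketch rather than the notebook's own code: `make_toggle_handler` is a hypothetical name, and `SimpleNamespace` objects stand in for widgets so the example runs without ipywidgets. It relies only on the `layout.display` and `value` attributes used in the cell above.

```python
from types import SimpleNamespace


def make_toggle_handler(dependents, trigger="Yes", reset_value=""):
    """Return an observer that shows `dependents` only when the new value equals `trigger`."""
    def handler(change):
        show = change["new"] == trigger
        for widget in dependents:
            widget.layout.display = "" if show else "none"
            if not show:
                # Clear hidden answers so stale text is not saved later.
                widget.value = reset_value
    return handler


def fake_widget(value=""):
    """Stand-in exposing the two attributes the handler touches."""
    return SimpleNamespace(value=value, layout=SimpleNamespace(display="none"))


frequency = fake_widget()
delay = fake_widget()
handler = make_toggle_handler([frequency, delay])

handler({"new": "Yes"})
shown = (frequency.layout.display, delay.layout.display)

frequency.value = "Daily"
handler({"new": "No"})
hidden = (frequency.layout.display, frequency.value)
```

With real widgets, the handler would be registered in the same way as in the cell above, for example `data_update.observe(make_toggle_handler([data_update_frequency, data_update_delay]), names="value")`.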

### Data completeness

In [8]:

completeness_docs = widgets.Combobox(
            value=checklist['DataQuality']['DocumentationAvailable'],
            options=['Yes', 'No'],
            description='2.2 Is there any documentation about the completeness of the dataset?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

completeness_docs_link = widgets.Text(
            value=checklist['DataQuality']['LinkToReport'],
            description="Documentation link:",
            placeholder='Please provide a link to the document',
            disabled=False,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

expected_spatial_coverage = widgets.Combobox(
            value=checklist['DataQuality']['SpatialCoverage'],
            options=['Complete', 'Partial', 'Unknown', 'N/A'],
            description='2.3 How complete is the dataset compared to the expected spatial coverage?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

expected_temporal_coverage = widgets.Combobox(
            value=checklist['DataQuality']['TemporalCoverage'],
            options=['Complete', 'Partial', 'Unknown', 'N/A'],
            description='2.4 How complete is the dataset compared to the expected temporal coverage?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

# Function to change the display setting of the following UI components. 
def on_click_handler(change): 
    if change["new"] == "Yes":
        completeness_docs_link.layout.display = ''
    else:
        completeness_docs_link.layout.display = 'none'
        
        # Return the values back to default state if 1st option changed back.
        completeness_docs_link.value = ""

# Show UI components based on their display settings. 
display(completeness_docs, completeness_docs_link, expected_spatial_coverage, expected_temporal_coverage)

# Observe the first UI component for changes and call the on_click_handler function if value property changed. 
completeness_docs.observe(on_click_handler, names="value")


Combobox(value='', description='2.2 Is there any documentation about the completeness of the dataset?', layout…

Text(value='', description='Documentation link:', layout=Layout(display='none', width='900px'), placeholder='P…

Combobox(value='', description='2.3 How complete is the dataset compared to the expected spatial coverage?', l…

Combobox(value='', description='2.4 How complete is the dataset compared to the expected temporal coverage?', …
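
For question 2.4, one simple way to quantify temporal completeness is the fraction of expected time steps that are actually present. The sketch below assumes a regular daily record; the function name and the example dates are illustrative only, and other sampling intervals would need a different step.

```python
from datetime import date, timedelta


def temporal_completeness(observed, start, end):
    """Fraction of days in the inclusive range [start, end] that appear in `observed`."""
    n_expected = (end - start).days + 1
    expected = {start + timedelta(days=i) for i in range(n_expected)}
    present = expected & set(observed)
    return len(present) / n_expected


# Illustrative record with one missing day (3 January).
observed_days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 4)]
coverage = temporal_completeness(observed_days, date(2024, 1, 1), date(2024, 1, 4))
```

A value of 1.0 would correspond to 'Complete' and anything lower to 'Partial'; where to draw further distinctions is a judgment for the assessor.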

### Data consistency

In [9]:

self_consistent_units = widgets.Combobox(
            value=checklist['DataQuality']['SelfConsistent'],
            options=['Yes', 'No', 'N/A'],
            description='2.5 Is this dataset self-consistent in that its units, data types, and parameter names do not change over time and space?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

consistent_units = widgets.Combobox(
            value=checklist['DataQuality']['ConsistentWithSimilarData'],
            options=['Yes', 'No', 'N/A'],
            description='2.6 Is this dataset’s units, data types, and parameter names consistent with similar data collections?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

consistent_unit_monitoring = widgets.Combobox(
            value=checklist['DataQuality']['MonitoringProcessesExist'],
            options=['Yes', 'No', 'N/A'],
            description='2.7 Are there processes to monitor for units, data types, and parameter consistency?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

consistent_unit_review = widgets.Text(
            value=checklist['DataQuality']['MonitoringProcessesDetails'],
            description = 'Review measures:',
            placeholder='If yes, what measures are taken? Manual review / Automated review etc.',
            disabled=False,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

# Function to change the display setting of the following UI components. 
def on_click_handler(change): 
    if change["new"] == "Yes":
        consistent_unit_review.layout.display = ''
    else:
        consistent_unit_review.layout.display = 'none'
        
        # Return the values back to default state if 1st option changed back.
        consistent_unit_review.value = ""

# Show UI components based on their display settings. 
display(self_consistent_units, consistent_units, consistent_unit_monitoring, consistent_unit_review)

# Observe the first UI component for changes and call the on_click_handler function if value property changed. 
consistent_unit_monitoring.observe(on_click_handler, names="value")


Combobox(value='', description='2.5 Is this dataset self-consistent in that its units, data types, and paramet…

Combobox(value='', description='2.6 Is this dataset’s units, data types, and parameter names consistent with s…

Combobox(value='', description='2.7 Are there processes to monitor for units, data types, and parameter consis…

Text(value='', description='Review measures:', layout=Layout(display='none', width='900px'), placeholder='If y…
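
Question 2.5 can be checked mechanically when per-file metadata is available. A minimal sketch, assuming each file's metadata has already been read into a dictionary mapping parameter name to a `(units, dtype)` pair; how that metadata is extracted depends on the file format and is not shown. The function name and file names are hypothetical.

```python
def find_inconsistencies(file_metadata):
    """Return parameters whose (units, dtype) differ between files or that are missing from some files."""
    seen = {}
    for meta in file_metadata.values():
        for name, spec in meta.items():
            seen.setdefault(name, set()).add(spec)

    problems = {}
    for name, specs in seen.items():
        missing = [f for f, meta in file_metadata.items() if name not in meta]
        if len(specs) > 1 or missing:
            problems[name] = {"specs": sorted(specs), "missing_from": missing}
    return problems


# Illustrative metadata: the temperature units change between the two files.
metadata = {
    "2023.nc": {"temperature": ("K", "float32"), "pressure": ("Pa", "float32")},
    "2024.nc": {"temperature": ("degC", "float32"), "pressure": ("Pa", "float32")},
}
problems = find_inconsistencies(metadata)
```

An empty result supports answering 'Yes' to question 2.5 for the files examined; any entry points to the parameter and files that need review.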

In [10]:

# Save button
save_button = widgets.Button(description="Save Timeliness, Completeness & Consistency Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_data_quality1():

    updates = {
        "DataQuality": {
            "WillBeUpdated": data_update.value,
            "WhenNewDataAdded": data_update_frequency.value,
            "DifferentStages": data_update_stages.value,
            "DelayBetweenStages": data_update_delay.value,
            "SupersedeCurrentVersion": data_update_supersede.value,
            "DocumentationAvailable" : completeness_docs.value,
            "LinkToReport" : completeness_docs_link.value,
            "SpatialCoverage" : expected_spatial_coverage.value,
            "TemporalCoverage" : expected_temporal_coverage.value,
            "SelfConsistent" : self_consistent_units.value,
            "ConsistentWithSimilarData" : consistent_units.value,
            "MonitoringProcessesExist" : consistent_unit_monitoring.value,
            "MonitoringProcessesDetails" : consistent_unit_review.value            
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_data_quality1()))

display(save_button)

Button(button_style='primary', description='Save Timeliness, Completeness & Consistency Answers to json file',…
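
Both this save button and the later 'Bias, Integrity & Size' button write into the same `"DataQuality"` section, so `update_checklist` needs to merge new keys into the stored section rather than replace it. The source of `update_checklist` is not shown here; the sketch below illustrates one merge strategy under that assumption, using the hypothetical name `merge_updates`.

```python
def merge_updates(stored, updates):
    """Recursively merge `updates` into a copy of `stored`, keeping untouched keys."""
    merged = dict(stored)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections instead of overwriting them.
            merged[key] = merge_updates(merged[key], value)
        else:
            merged[key] = value
    return merged


stored = {"DataQuality": {"WillBeUpdated": "Yes", "KnownBias": ""}}
updates = {"DataQuality": {"KnownBias": "No"}}
result = merge_updates(stored, updates)
```

A plain `dict.update` at the top level would replace the whole `"DataQuality"` section and drop the answers saved by the other button.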

### Data bias

In [11]:

dataset_bias = widgets.Combobox(
            value=checklist['DataQuality']['KnownBias'],
            options=['Yes', 'No', 'Unknown'],
            description='2.8 Is there known bias in the dataset?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_bias_measures = widgets.Combobox(
            value=checklist['DataQuality']['BiasExamined'],
            options=['Yes', 'No', 'Unknown', 'N/A'],
            description='Have measures been taken to examine bias?',
            placeholder=placeholder,
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

dataset_bias_measures_detail = widgets.Textarea(
            value=checklist['DataQuality']['BiasMeasures'],
            placeholder='If yes, what measures were used?',
            layout=widgets.Layout(display="none", width=widget_width),
            )

dataset_bias_metrological_traceable = widgets.Textarea(
            value=checklist['DataQuality']['MetrologicalTraceability'],
            placeholder='Is the bias metrological traceable?',
            layout=widgets.Layout(display="none", width=widget_width),
            )

dataset_bias_report = widgets.Combobox(
            value=checklist['DataQuality']['BiasReport'],
            options=['No known bias', 'Found and reported', 'No info available', 'N/A'],
            description='Is there reported bias in the data?',
            placeholder=placeholder,
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

dataset_bias_report_link = widgets.Text(
            value=checklist['DataQuality']['BiasReportLink'],
            placeholder='(optional) Link to the report/document on the bias',
            layout=widgets.Layout(display="none", width=widget_width)
            )

dataset_bias_corrected_link = widgets.Text(
            value=checklist['DataQuality']['BiasCorrectedDatasetLink'],
            placeholder='(optional) Link to a bias-corrected or bias-reduced version of the dataset',
            layout=widgets.Layout(display="none", width=widget_width)
            )

dataset_bias_tools_link = widgets.Text(
            value=checklist['DataQuality']['BiasReductionToolsLink'],
            placeholder='(optional) Link to tools available to reduce bias',
            layout=widgets.Layout(display="none", width=widget_width)
            )

# Function to change the display setting of the following UI components. 
def on_click_handler(change):    

    # Show / hide main trunk of questions. 
    if dataset_bias.value == "Yes":
        dataset_bias_measures.layout.display = ''
        dataset_bias_report.layout.display = ''
        dataset_bias_metrological_traceable.layout.display = ''
        dataset_bias_report_link.layout.display = ''
        dataset_bias_corrected_link.layout.display = ''
        dataset_bias_tools_link.layout.display = ''

    else:   
        dataset_bias_measures.layout.display = 'none'
        dataset_bias_report.layout.display = 'none'
        dataset_bias_report_link.layout.display = 'none'
        dataset_bias_corrected_link.layout.display = 'none'
        dataset_bias_tools_link.layout.display = 'none'
        # Combobox values are strings, so reset to "" rather than None.
        dataset_bias_measures.value = ""
        dataset_bias_report.value = ""
        dataset_bias_report_link.value = ''
        dataset_bias_corrected_link.value = ''
        dataset_bias_tools_link.value = ''

    # Show / hide 2nd trunk of questions.
    if dataset_bias_measures.value == "Yes":
        dataset_bias_measures_detail.layout.display = ''
        dataset_bias_metrological_traceable.layout.display = ''
    else:
        dataset_bias_measures_detail.layout.display = 'none'
        dataset_bias_metrological_traceable.layout.display = 'none'
        dataset_bias_measures_detail.value = ''
        dataset_bias_metrological_traceable.value = ''
        
            
# Display the UI components
display(dataset_bias, dataset_bias_measures, dataset_bias_measures_detail, dataset_bias_metrological_traceable, dataset_bias_report, dataset_bias_report_link, dataset_bias_corrected_link, dataset_bias_tools_link)

# Observe UI components for changes and call the on_click_handler function if value property changed. 
dataset_bias.observe(on_click_handler, names="value")
dataset_bias_measures.observe(on_click_handler, names="value")



Combobox(value='', description='2.8 Is there known bias in the dataset?', layout=Layout(width='900px'), option…

Combobox(value='', description='Have measures been taken to examine bias?', layout=Layout(display='none', widt…

Textarea(value='', layout=Layout(display='none', width='900px'), placeholder='If yes, what measures were used?…

Textarea(value='', layout=Layout(display='none', width='900px'), placeholder='Is the bias metrological traceab…

Combobox(value='', description='Is there reported bias in the data?', layout=Layout(display='none', width='900…

Text(value='', layout=Layout(display='none', width='900px'), placeholder='(optional) Link to the report/docume…

Text(value='', layout=Layout(display='none', width='900px'), placeholder='(optional) Link to a bias-corrected …

Text(value='', layout=Layout(display='none', width='900px'), placeholder='(optional) Link to tools available t…
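
The two-level branching in the handler above can be easier to reason about, and to test, as a pure function from answers to visible fields. The sketch below illustrates that idea, using the JSON key names from this notebook as field labels. The function itself is hypothetical and not part of the notebook.

```python
def visible_bias_fields(known_bias, bias_examined):
    """Return the follow-up fields to show for the answers to the two bias questions."""
    visible = set()
    if known_bias == "Yes":
        # First level: questions that only apply when bias is known.
        visible |= {
            "BiasExamined",
            "BiasReport",
            "BiasReportLink",
            "BiasCorrectedDatasetLink",
            "BiasReductionToolsLink",
        }
        if bias_examined == "Yes":
            # Second level: details that only apply when bias was examined.
            visible |= {"BiasMeasures", "MetrologicalTraceability"}
    return visible


no_bias = visible_bias_fields("No", "Yes")
examined = visible_bias_fields("Yes", "Yes")
not_examined = visible_bias_fields("Yes", "No")
```

A widget handler could then set each field's `layout.display` from membership in the returned set, keeping the branching logic in one place.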

### Data integrity

In [12]:

data_resolution_info = widgets.Combobox(
            value=checklist['DataQuality']['QuantitativeResolutionInfo'],
            options=['Yes', 'No', 'N/A'],
            description='2.9 Is there quantitative information about data resolution in space and time?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_quality_report = widgets.Combobox(
            value=checklist['DataQuality']['PublishedQualityProcedures'],
            options=['Yes', 'No', 'N/A'],
            description='2.10 Are there published data quality procedures or reports?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_quality_report_link = widgets.Text(
            value=checklist['DataQuality']['QualityInformationLink'],
            description="Quality information link:",
            placeholder='If there is published quality information, please provide the link.',
            layout=widgets.Layout(width=widget_width),
            style=description_style
            )

dataset_provenance = widgets.Combobox(
            value=checklist['DataQuality']['ProvenanceTracked'],
            options=['Yes', 'No', 'N/A'],
            description='2.11 Is the provenance of the dataset tracked and documented?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_integrity = widgets.Combobox(
            value=checklist['DataQuality']['DataIntegrityChecks'],
            options=['Yes', 'No', 'N/A'],
            description='2.12 Are there checksums / other checks for data integrity? ',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(data_resolution_info, data_quality_report, data_quality_report_link, dataset_provenance, data_integrity)

Combobox(value='', description='2.9 Is there quantitative information about data resolution in space and time?…

Combobox(value='', description='2.10 Are there published data quality procedures or reports?', layout=Layout(w…

Text(value='', description='Quality information link:', layout=Layout(width='900px'), placeholder='If there is…

Combobox(value='', description='2.11 Is the provenance of the dataset tracked and documented?', layout=Layout(…

Combobox(value='', description='2.12 Are there checksums / other checks for data integrity? ', layout=Layout(w…

### Dataset size

In [13]:

data_size_question = widgets.Label(
    value = '2.13 What is the size of the dataset? Depending on the resource, this might be:'
)

spacer = widgets.Box(layout=widgets.Layout(width='20px'))

total_data_volume = widgets.Text(
    value = checklist['DataQuality']['DatasetVolume'],
    placeholder='Total data volume:'
)

num_data_dimensions_label = widgets.Label(
    value = "Number of data dimensions:"
)

num_data_dimensions = widgets.IntText(
    value = checklist['DataQuality']['DatasetDimensions'],
    layout = widgets.Layout(width="100px")
)

dimensions = widgets.HBox([num_data_dimensions_label, num_data_dimensions])


num_data_files_label = widgets.Label(
    value = "Number of data files:"
)

num_data_files = widgets.IntText(
    value = checklist['DataQuality']['DatasetNumFiles'],
    layout = widgets.Layout(width="100px")
)

data_files = widgets.HBox([num_data_files_label, num_data_files])

num_data_rows_label = widgets.Label(
    value = "Number of data table rows:"
)

num_data_rows = widgets.IntText(
    value = checklist['DataQuality']['DatasetNumRows'],
    layout = widgets.Layout(width="100px")
)

data_rows = widgets.HBox([num_data_rows_label, num_data_rows])

num_data_images_label = widgets.Label(
    value = "Number of images:"
)

num_data_images = widgets.IntText(
    value = checklist['DataQuality']['DatasetNumImages'],
    layout = widgets.Layout(width="100px")
)

num_data_images_size_label = widgets.Label(
    value = "Size of images:"
)

num_data_images_size = widgets.Text(
    value = checklist['DataQuality']['DatasetImageSize'],
    placeholder='228 x 228'
)

images = widgets.HBox([num_data_images_label, num_data_images, spacer, num_data_images_size_label, num_data_images_size])


display(data_size_question, total_data_volume, data_files, data_rows, dimensions, images)

Label(value='2.13 What is the size of the dataset? Depending on the resource, this might be:')

Text(value='', placeholder='Total data volume:')

HBox(children=(Label(value='Number of data files:'), IntText(value=0, layout=Layout(width='100px'))))

HBox(children=(Label(value='Number of data table rows:'), IntText(value=0, layout=Layout(width='100px'))))

HBox(children=(Label(value='Number of data dimensions:'), IntText(value=0, layout=Layout(width='100px'))))

HBox(children=(Label(value='Number of images:'), IntText(value=0, layout=Layout(width='100px')), Box(layout=La…
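
When the dataset is available as files on disk, the file count and total volume asked for in question 2.13 can be measured rather than estimated. A minimal sketch using only the standard library; `summarise_directory` is a hypothetical helper, and a temporary directory stands in for the real dataset location.

```python
import tempfile
from pathlib import Path


def summarise_directory(root):
    """Return (number of files, total size in bytes) for all regular files under `root`."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


with tempfile.TemporaryDirectory() as tmp:
    # Two small illustrative files, one in a subdirectory.
    (Path(tmp) / "a.csv").write_bytes(b"x" * 10)
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "sub" / "b.csv").write_bytes(b"y" * 5)
    n_files, n_bytes = summarise_directory(tmp)
```

The two numbers map onto the 'Number of data files' and 'Total data volume' fields above; row, dimension and image counts depend on the file format and need format-specific tools.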

In [14]:

# Save button
save_button = widgets.Button(description="Save Bias, Integrity & Size Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_data_quality2():
    
    updates = {
        "DataQuality": {
            #Bias
            "KnownBias": dataset_bias.value,
            "BiasExamined": dataset_bias_measures.value,
            "BiasMeasures": dataset_bias_measures_detail.value,
            "MetrologicalTraceability": dataset_bias_metrological_traceable.value,
            "BiasReport": dataset_bias_report.value,            
            "BiasReportLink" : dataset_bias_report_link.value,
            "BiasCorrectedDatasetLink" : dataset_bias_corrected_link.value,
            "BiasReductionToolsLink" : dataset_bias_tools_link.value,
            # Integrity
            "QuantitativeResolutionInfo" : data_resolution_info.value,
            "PublishedQualityProcedures" : data_quality_report.value,
            "QualityInformationLink" : data_quality_report_link.value,
            "ProvenanceTracked" : dataset_provenance.value,
            "DataIntegrityChecks" : data_integrity.value,
            # Size
            "DatasetVolume" : total_data_volume.value,
            "DatasetNumFiles" : num_data_files.value,
            "DatasetNumRows" : num_data_rows.value,
            "DatasetDimensions" : num_data_dimensions.value,
            "DatasetNumImages" : num_data_images.value,
            "DatasetImageSize" : num_data_images_size.value,
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_data_quality2()))

display(save_button)

Button(button_style='primary', description='Save Bias, Integrity & Size Answers to json file', layout=Layout(f…

## Finished

1. Make sure you have saved your answers to the external JSON file using the buttons above.
2. If you would like to view these saved answers, use the button below.
3. Move on to the notebook Template_Checklist_Part_3.ipynb, which covers Data Documentation.

In [15]:

button_print_json = widgets.Button(description="Print json results",  button_style='info', layout=widgets.Layout(flex='1 1 auto', width='auto'))
output = widgets.Output()

display(button_print_json, output)

def print_json_info(b):
    """
    Loads a copy of the json file to checklist variable. 
    Then prints the json file contents to Jupyter notebook cell output.

    Arguments: b - represents the button calling the function. 
    """
    checklist = load_checklist()
    with output:
        clear_output()
        for key, value in checklist.items():
            print(f"{key}:")
            if isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    print(f"  {sub_key}: {sub_value}")
            else:
                print(f"  {value}")

button_print_json.on_click(print_json_info)


Button(button_style='info', description='Print json results', layout=Layout(flex='1 1 auto', width='auto'), st…

Output()

---

## **Appendix** - Definition of terms used in the checklist.

### Quality
* **Completeness**: the breadth of a dataset compared to an ideal 100% completion (spatial, temporal, demographic, etc.); important in avoiding sampling bias
* **Consistency**: uniformity within the entire dataset or compared with similar data collections; for example, no changes in units or data types over time; the item measured against itself or its counterpart in another dataset or database
* **Bias**: a systematic tilt in the dataset when compared to a reference, caused for example by instrumentation, incorrect data processing, unrepresentative sampling, or human error; the exact nature of bias and how it is measured will vary depending on the type of data and the research domain.
* **Uncertainty**: parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand.
* **Timeliness**: the speed of data release, compared to when an event occurred or measurements were made; requirements will vary depending on the timeframe of the phenomenon (e.g., severe thunderstorms vs. climate change, or disease outbreaks vs. life expectancy trends)
* **Provenance**: identification of the data sources, how the data was processed, and who released it.
* **Integrity**: verification that the data remains unchanged from the original; aka data fixity.
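
As an illustration of the integrity (fixity) check described above, a checksum recorded when the data is published can be recomputed later and compared. The sketch below uses SHA-256 from the standard library; the helper name and the temporary file are illustrative, and a data provider may publish a different algorithm.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.bin"
    data_file.write_bytes(b"abc")
    original = sha256_of_file(data_file)

    # Changing a single byte produces a different digest.
    data_file.write_bytes(b"abd")
    changed = sha256_of_file(data_file)
```

If the recomputed digest matches the published one, the file is unchanged from the original; any mismatch means the copy should not be trusted without further checks.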