<script src="https://unpkg.com/thebe@latest/lib/index.js"></script>
<script type="text/javascript">
    thebelab.bootstrap();
</script>

# **Data Readiness For AI Checklist**

 * Creator(s): John Pill
 * Affiliation: UK Met Office
 * History: 1.0
 * Last update: 27 August 2024.


---

## **Overview**
The checklist was developed using the 2019 draft readiness matrix from the Office of Science and Technology Policy Subcommittee on Open Science as its basis, and has since been improved through further research and user feedback. Definitions for some concepts are listed at the end of this document. The checklist was developed collaboratively by ESIP Data Readiness Cluster members, including representatives from NOAA, NASA, USGS, and other organizations, and will be updated periodically to reflect community feedback.

ESIP Data Readiness Cluster (2023): Checklist to Examine AI-readiness for Open Environmental Datasets v.1.0. ESIP. Online resource. https://doi.org/10.6084/m9.figshare.19983722.v1

Readiness Matrix (2020): What is AI-Ready Open Data? NOAA. Online resource. https://www.star.nesdis.noaa.gov/star/documents/meetings/2020AI/presentations/202010/20201022_Christensen.pdf

### Prerequisites
Ideally, an AI-readiness assessment should define a dataset as the minimum measurable bundle (i.e., a single physical parameter or variable of an observational dataset or model simulation). Assessing at this scale makes it easier to integrate data from different sources for research and development. However, without automation, manual assessment at this granularity is labor-intensive, so we recommend that current assessments be done at the data-file level. If the dataset has different versions, apply the checklist to each dataset type (e.g. raw, derived).

### Learning Outcomes
* Know how to check a range of dataset features.
* Assess a wide range of dataset features that affect the dataset's 'readiness' for machine learning.


---

# **TODO TASKS** 
**Consider what to do at the end of the checklist:**
* Add export functions for different formats (CSV, JSON, etc.)
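The CSV export TODO could start from a sketch like the one below. It assumes the checklist is the nested dictionary of sections that the save buttons write, such as `GeneralInformation` and `DataQuality`. The function names `flatten_checklist` and `checklist_to_csv` are hypothetical and are not part of `helper_functions`.

```python
import csv
import io


def flatten_checklist(checklist, parent_key=""):
    """Flatten nested sections into (key, value) pairs such as
    ("GeneralInformation.DatasetName", "Weather dataset 5")."""
    rows = []
    for key, value in checklist.items():
        full_key = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into a section and prefix its keys with the section name.
            rows.extend(flatten_checklist(value, full_key))
        else:
            rows.append((full_key, value))
    return rows


def checklist_to_csv(checklist):
    """Return the checklist as CSV text with 'question' and 'answer' columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["question", "answer"])
    for key, value in flatten_checklist(checklist):
        # Write None as an empty cell rather than the string "None".
        writer.writerow([key, "" if value is None else value])
    return buffer.getvalue()
```

To write a file, open it with `newline=""` so the csv module controls the line endings. For example: `open("Data_Readiness_Checklist.csv", "w", newline="").write(checklist_to_csv(checklist))`.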

**Change all `if` statement `else` clauses so that they reset default values to None instead of N/A**

**Consider whether it's possible to move the helper functions and widget code into external code files that can be shared across the notebooks, and to abstract away code that isn't needed for this notebook activity**

**Check code cells marked as hidden in cell metadata stay hidden when opened fresh by a new user**

**Required for attempting to run cells programmatically but not working currently.**
* import nbformat
* from nbclient import NotebookClient


## **Tutorial Material TODO**


Remember to save your notebook regularly as you work through it to avoid losing your answers.


### **Run this Jupyter notebook locally using Jupyter Lab**
* **Add download and running instructions**
* **May need to 'run all cells' to generate checklist - need to test**.


### Data section, optional
Scripts for pulling the data into the notebook assuming


## **Setup Notebook**

In [36]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import json
# helper_functions provides load_checklist() and update_checklist(), used below.
from helper_functions import *

In [37]:
# Default properties and helper functions. 

# Default properties
# checklist_filename = "Data_Readiness_Checklist.json"
widget_width = '900px'
description_style = {'description_width': 'initial'}
placeholder = 'Click to select option'

In [38]:
# Load checklist from JSON file:
checklist = load_checklist()
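The `helper_functions` module isn't shown here. The sketch below is a hypothetical version of what `load_checklist()` might do. It assumes answers are stored in `Data_Readiness_Checklist.json`, the filename the restore button uses. It falls back to an empty template containing only the Section 1 keys that the widgets read, so the notebook still runs on first use.

```python
import json
import os

# Hypothetical default filename, matching the one used by the restore button.
CHECKLIST_FILENAME = "Data_Readiness_Checklist.json"


def load_checklist(filename=CHECKLIST_FILENAME):
    """Load saved answers, or return an empty template if no file exists yet."""
    if not os.path.exists(filename):
        # Empty template so e.g. checklist["GeneralInformation"]["DatasetName"] works.
        return {
            "GeneralInformation": {
                "DatasetName": "",
                "DatasetVersion": "",
                "DatasetLink": "",
                "AssessorName": "",
                "AssessorEmailAddress": "",
            }
        }
    with open(filename, "r", encoding="utf-8") as file:
        return json.load(file)
```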

In [5]:
# TODO: Tagged cells can't yet be reloaded programmatically; when they are, they don't recognise previously defined variables. For now, press this button and then re-run the individual cells manually.

restore_button = widgets.Button(description="Restore answers",  button_style="warning")
display(restore_button)


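def load_checklist_from_json(b):
    # Rebind the notebook-level variable; without `global`, the loaded answers
    # would go into a local name and be discarded when the function returns.
    global checklist
    # Open json file
    with open("Data_Readiness_Checklist.json", "r") as file:
        checklist = json.load(file)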


    # # Reload cells with the "thebe-init" tag
    # with open("AI_Data_Readiness_Checklist_Template_Interactive_v2.ipynb") as file:
    #     nb = nbformat.read(file, as_version=4)

    # client = NotebookClient(nb)
    
    # with client.setup_kernel():
    #     for index, cell in enumerate(nb.cells):
    #         if 'tags' in cell['metadata'] and 'thebe-init' in cell['metadata']['tags']:
    #             client.execute_cell(cell, cell_index=index)


restore_button.on_click(load_checklist_from_json)
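One way to address the TODO above without re-executing cells is to push the reloaded answers straight into the existing widgets. The sketch is hypothetical: `restore_widget_values` is not part of `helper_functions`. It assumes the JSON keys match those written by the save buttons, e.g. `"DatasetName"` under `"GeneralInformation"`.

```python
def restore_widget_values(checklist, section, widget_map):
    """Copy saved answers from checklist[section] into widgets, keyed by JSON key.

    widget_map maps JSON keys (e.g. "DatasetName") to widget objects. Keys that
    are missing from the saved file, or saved as None, leave the widget unchanged.
    Returns the list of keys that were restored.
    """
    saved = checklist.get(section, {})
    restored = []
    for key, widget in widget_map.items():
        if saved.get(key) is not None:
            # Assigning .value updates the displayed widget in place.
            widget.value = saved[key]
            restored.append(key)
    return restored
```

In the notebook, you would call it after the section cells have run. For example: `restore_widget_values(checklist, "GeneralInformation", {"DatasetName": dataset_name, "DatasetVersion": dataset_version})`. Assigning `.value` fires the observers registered in each section, so follow-up questions should reappear when their parent answer is restored.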



---

## **1. Dataset General Info**


### Basic details

In [6]:

dataset_name = widgets.Text(
    description='1.1 Dataset name:',
    value=checklist["GeneralInformation"]["DatasetName"],
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

dataset_version = widgets.Text(
    value=checklist["GeneralInformation"]["DatasetVersion"],
    description='1.2 Dataset version:',
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

dataset_link = widgets.Text(
    value=checklist["GeneralInformation"]["DatasetLink"],
    description='1.3 Location / URL link:',
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

dataset_assessor_name = widgets.Text(
    value=checklist["GeneralInformation"]["AssessorName"],
    description='1.4 Assessor name:',
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

dataset_assessor_email = widgets.Text(
    value=checklist["GeneralInformation"]["AssessorEmailAddress"],
    description='1.5 Assessor email address:',
    layout=widgets.Layout(width=widget_width),
    style = description_style
)

# Display all UI components.
display(dataset_name, dataset_version, dataset_link, dataset_assessor_name, dataset_assessor_email)


### Dataset details

In [7]:

label = widgets.Label(value="Select from the list, or type your own answer if appropriate.")

raw_derived = widgets.Combobox(
            options=['Raw', 'Derived', 'Unknown'],
            description='1.6 Is this raw data or a derived/processed data product?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

observe_model_synthetic = widgets.Combobox(
            options=['Observed', 'Modeled', 'Synthetic'],
            description='1.7 Is this observational data, simulation/model output, or synthetic data?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_sources = widgets.Combobox(
            options=['Single-source', 'Aggregated'],
            description='1.8 Is the data single-source or aggregated from several sources?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(label, raw_derived, observe_model_synthetic, data_sources)


In [8]:

# Save button
save_button = widgets.Button(description="Save General Information Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_general():
    updates = {
        "GeneralInformation": {
            "DatasetName": dataset_name.value,
            "DatasetVersion": dataset_version.value,
            "DatasetLink": dataset_link.value,
            "AssessorName": dataset_assessor_name.value,
            "AssessorEmailAddress": dataset_assessor_email.value,
            "RawOrDerived" : raw_derived.value,
            "ObservedModeledSyntheticData" : observe_model_synthetic.value,
            "DataSource" : data_sources.value,
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_general()))

display(save_button)


---

## **2. Data Quality**

### Data timeliness    

In [9]:

data_update = widgets.Combobox(
            options=['Yes', 'No'],
            description='2.1 Will the dataset be updated?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_update_frequency = widgets.Combobox(
            options=['When data updated', 'Hourly', 'Daily', 'Weekly', 'Monthly', 'Annually', 'Other', "N/A"],
            description='If the data will be updated, how often will it be updated?',
            placeholder=placeholder,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

data_update_stages = widgets.Combobox(
            options=['Preliminary data first, then updated later', 'Full record', "N/A"],
            description='Will there be different stages of the update?',
            placeholder=placeholder,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

data_update_delay = widgets.Text(
            value='',
            description='If yes, what is the delay between different stages?',
            disabled=False,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

data_update_supersede = widgets.Combobox(
            options=['Yes', 'No', "N/A"],
            description='Should the new version of the dataset supersede the current version?',
            placeholder=placeholder,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

# Show the follow-up questions only when 2.1 is answered "Yes"; otherwise hide and reset them.
def on_update_change(change):
    if change["new"] == "Yes":
        data_update_frequency.layout.display = ''
        data_update_stages.layout.display = ''
        data_update_delay.layout.display = ''
        data_update_supersede.layout.display = ''
    else:
        data_update_frequency.layout.display = 'none'
        data_update_stages.layout.display = 'none'
        data_update_delay.layout.display = 'none'
        data_update_supersede.layout.display = 'none'

        # Reset the follow-up answers if 2.1 changes away from "Yes".
        # Combobox.value is a string trait, so reset to "" (assigning None raises a TraitError).
        data_update_frequency.value = ""
        data_update_stages.value = ""
        data_update_delay.value = ""
        data_update_supersede.value = ""


# Display the question and its (initially hidden) follow-ups.
display(data_update, data_update_frequency, data_update_stages, data_update_delay, data_update_supersede)

# Re-run the handler whenever the answer to 2.1 changes.
data_update.observe(on_update_change, names="value")
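The same show/hide-and-reset pattern is repeated in several cells below. The sketch below shows a shared helper that could replace it. The name `set_follow_ups_visible` is hypothetical. The helper works on any object exposing `.layout.display` and `.value`, such as ipywidgets `Text` and `Combobox`.

```python
def set_follow_ups_visible(visible, widgets_to_toggle, reset_value=""):
    """Show or hide follow-up widgets; when hiding, reset their answers.

    reset_value defaults to "" for Text/Combobox widgets; an IntText widget
    would need an integer such as 0 instead.
    """
    for widget in widgets_to_toggle:
        widget.layout.display = "" if visible else "none"
        if not visible:
            widget.value = reset_value


# Example wiring for question 2.1 (not executed here):
# data_update.observe(
#     lambda change: set_follow_ups_visible(
#         change["new"] == "Yes",
#         [data_update_frequency, data_update_stages, data_update_delay, data_update_supersede],
#     ),
#     names="value",
# )
```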


### Data completeness

In [10]:

completeness_docs = widgets.Combobox(
            options=['Yes', 'No'],
            description='2.2 Is there any documentation about the completeness of the dataset?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

completeness_docs_link = widgets.Text(
            value='',
            description="Documentation link:",
            placeholder='Please provide a link to the document',
            disabled=False,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

expected_spatial_coverage = widgets.Combobox(
            options=['Complete', 'Partial', 'Unknown', 'N/A'],
            description='2.3 How complete is the dataset compared to the expected spatial coverage?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

expected_temporal_coverage = widgets.Combobox(
            options=['Complete', 'Partial', 'Unknown', 'N/A'],
            description='2.4 How complete is the dataset compared to the expected temporal coverage?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

# Show the documentation link field only when 2.2 is answered "Yes"; otherwise hide and clear it.
def on_completeness_docs_change(change):
    if change["new"] == "Yes":
        completeness_docs_link.layout.display = ''
    else:
        completeness_docs_link.layout.display = 'none'

        # Clear the link if 2.2 changes away from "Yes".
        completeness_docs_link.value = ""

# Display the questions; the link field starts hidden.
display(completeness_docs, completeness_docs_link, expected_spatial_coverage, expected_temporal_coverage)

# Re-run the handler whenever the answer to 2.2 changes.
completeness_docs.observe(on_completeness_docs_change, names="value")
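Questions 2.3 and 2.4 ask how complete the dataset is relative to its expected coverage. For a regularly sampled time series, one rough temporal measure is the fraction of expected time steps that are actually present. This sketch assumes the timestamps are `datetime` objects aligned exactly to a fixed step, and `temporal_completeness` is a hypothetical name. How a given fraction maps to "Complete" or "Partial" is left to the assessor.

```python
from datetime import timedelta


def temporal_completeness(timestamps, start, end, step):
    """Return the fraction of expected time steps in [start, end] found in `timestamps`."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    # Build the regular grid of expected time steps.
    expected = set()
    current = start
    while current <= end:
        expected.add(current)
        current += step
    if not expected:
        return 0.0
    # Timestamps outside the expected grid (or off-step) are ignored.
    return len(expected.intersection(timestamps)) / len(expected)
```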



### Data consistency

In [11]:

self_consistent_units = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='2.5 Is this dataset self-consistent in that its units, data types, and parameter names do not change over time and space?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

consistent_units = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description="2.6 Are this dataset's units, data types, and parameter names consistent with similar data collections?",
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

consistent_unit_monitoring = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='2.7 Are there processes to monitor the consistency of units, data types, and parameter names?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

consistent_unit_review = widgets.Text(
            value='',
            description = 'Review measures:',
            placeholder='If yes, what measures are taken? Manual review / Automated review etc.',
            disabled=False,
            layout=widgets.Layout(display='none', width=widget_width),
            style = description_style
            )

# Show the review-measures field only when 2.7 is answered "Yes"; otherwise hide and clear it.
def on_unit_monitoring_change(change):
    if change["new"] == "Yes":
        consistent_unit_review.layout.display = ''
    else:
        consistent_unit_review.layout.display = 'none'

        # Clear the details if 2.7 changes away from "Yes".
        consistent_unit_review.value = ""

# Display the questions; the review-measures field starts hidden.
display(self_consistent_units, consistent_units, consistent_unit_monitoring, consistent_unit_review)

# Re-run the handler whenever the answer to 2.7 changes.
consistent_unit_monitoring.observe(on_unit_monitoring_change, names="value")
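For question 2.5, a first pass at checking self-consistency is to compare the units recorded for each variable across files. This sketch assumes you have already extracted a `{variable_name: units}` dictionary from each file's metadata; how you read it depends on the file format. The function name is hypothetical. The function reports two things: variables whose units differ between files, and variables missing from some files, which is how a changed parameter name would show up.

```python
def check_self_consistency(per_file_units):
    """Compare {variable: units} dicts across files.

    Returns (unit_conflicts, missing):
      unit_conflicts maps a variable to the sorted list of differing units seen;
      missing maps a variable to the sorted list of files that lack it.
    """
    all_variables = set()
    for units_by_variable in per_file_units.values():
        all_variables.update(units_by_variable)

    unit_conflicts = {}
    missing = {}
    for variable in sorted(all_variables):
        units_seen = {u[variable] for u in per_file_units.values() if variable in u}
        if len(units_seen) > 1:
            unit_conflicts[variable] = sorted(units_seen)
        absent_from = sorted(name for name, u in per_file_units.items() if variable not in u)
        if absent_from:
            missing[variable] = absent_from
    return unit_conflicts, missing
```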



In [12]:

# Save button
save_button = widgets.Button(description="Save Timeliness, Completeness & Consistency Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_data_quality1():

    updates = {
        "DataQuality": {
            "WillBeUpdated": data_update.value,
            "WhenNewDataAdded": data_update_frequency.value,
            "DifferentStages": data_update_stages.value,
            "DelayBetweenStages": data_update_delay.value,
            "SupersedeCurrentVersion": data_update_supersede.value,
            "DocumentationAvailable" : completeness_docs.value,
            "LinkToReport" : completeness_docs_link.value,
            "SpatialCoverage" : expected_spatial_coverage.value,
            "TemporalCoverage" : expected_temporal_coverage.value,
            "SelfConsistent" : self_consistent_units.value,
            "ConsistentWithSimilarData" : consistent_units.value,
            "MonitoringProcessesExist" : consistent_unit_monitoring.value,
            "MonitoringProcessesDetails" : consistent_unit_review.value            
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_data_quality1()))

display(save_button)


### Data bias

In [13]:

dataset_bias = widgets.Combobox(
            options=['Yes', 'No', 'Unknown'],
            description='2.8 Is there known bias in the dataset?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_bias_measures = widgets.Combobox(
            options=['Yes', 'No', 'Unknown', 'N/A'],
            description='Have measures been taken to examine bias?',
            placeholder=placeholder,
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

dataset_bias_measures_detail = widgets.Textarea(
            value='',
            placeholder='If yes, what measures were used?',
            layout=widgets.Layout(display="none", width=widget_width),
            )

dataset_bias_metrological_traceable = widgets.Textarea(
            value='',
            placeholder='Is the bias metrologically traceable?',
            layout=widgets.Layout(display="none", width=widget_width),
            )

dataset_bias_report = widgets.Combobox(
            options=['No known bias', 'Found and reported', 'No info available', 'N/A'],
            description='Is there reported bias in the data?',
            placeholder=placeholder,
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

dataset_bias_report_link = widgets.Text(
            value='',
            placeholder='(optional) Link to the report/document on the bias',
            layout=widgets.Layout(display="none", width=widget_width)
            )

dataset_bias_corrected_link = widgets.Text(
            value='',
            placeholder='(optional) Link to a bias-corrected or bias-reduced version of the dataset',
            layout=widgets.Layout(display="none", width=widget_width)
            )

dataset_bias_tools_link = widgets.Text(
            value='',
            placeholder='(optional) Link to tools available to reduce bias',
            layout=widgets.Layout(display="none", width=widget_width)
            )

# Show the bias follow-ups only when 2.8 is "Yes", and the measure details only
# when measures have been taken to examine bias. Hidden answers are reset.
def on_bias_change(change):

    # Show / hide the main set of follow-up questions.
    if dataset_bias.value == "Yes":
        dataset_bias_measures.layout.display = ''
        dataset_bias_report.layout.display = ''
        dataset_bias_report_link.layout.display = ''
        dataset_bias_corrected_link.layout.display = ''
        dataset_bias_tools_link.layout.display = ''

    else:
        dataset_bias_measures.layout.display = 'none'
        dataset_bias_report.layout.display = 'none'
        dataset_bias_report_link.layout.display = 'none'
        dataset_bias_corrected_link.layout.display = 'none'
        dataset_bias_tools_link.layout.display = 'none'
        # Combobox values are strings, so reset to '' (assigning None raises a TraitError).
        dataset_bias_measures.value = ''
        dataset_bias_report.value = ''
        dataset_bias_report_link.value = ''
        dataset_bias_corrected_link.value = ''
        dataset_bias_tools_link.value = ''

    # Show / hide the details of the measures taken (only if measures were taken).
    if dataset_bias_measures.value == "Yes":
        dataset_bias_measures_detail.layout.display = ''
        dataset_bias_metrological_traceable.layout.display = ''
    else:
        dataset_bias_measures_detail.layout.display = 'none'
        dataset_bias_metrological_traceable.layout.display = 'none'
        dataset_bias_measures_detail.value = ''
        dataset_bias_metrological_traceable.value = ''


# Display the UI components
display(dataset_bias, dataset_bias_measures, dataset_bias_measures_detail, dataset_bias_metrological_traceable, dataset_bias_report, dataset_bias_report_link, dataset_bias_corrected_link, dataset_bias_tools_link)

# Re-run the handler when either 2.8 or the "measures taken" follow-up changes.
dataset_bias.observe(on_bias_change, names="value")
dataset_bias_measures.observe(on_bias_change, names="value")




### Data integrity

In [14]:

data_resolution_info = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='2.9 Is there quantitative information about data resolution in space and time?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_quality_report = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='2.10 Are there published data quality procedures or reports?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_quality_report_link = widgets.Text(
            value='',
            description="Quality information link:",
            placeholder='If there is published quality information, please provide the link.',
            layout=widgets.Layout(width=widget_width),
            style=description_style
            )

dataset_provenance = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='2.11 Is the provenance of the dataset tracked and documented?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_integrity = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='2.12 Are there checksums or other checks for data integrity?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(data_resolution_info, data_quality_report, data_quality_report_link, dataset_provenance, data_integrity)
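For question 2.12, a common integrity check is to compare a file's SHA-256 digest with one published by the data provider. Below is a minimal standard-library sketch. The function names are illustrative, and it assumes the provider publishes SHA-256 hex digests; some providers use MD5 or other algorithms instead.

```python
import hashlib


def sha256_checksum(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read in chunks so large data files need not fit in memory.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True if the file's SHA-256 digest matches a published checksum."""
    return sha256_checksum(path) == expected_hex.strip().lower()
```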


### Dataset size

In [15]:

data_size_question = widgets.Label(
    value = '2.13 What is the size of the dataset? Depending on the resource, this might be:'
)

spacer = widgets.Box(layout=widgets.Layout(width='20px'))

total_data_volume = widgets.Text(
    value = '',
    placeholder='Total data volume:'
)

num_data_dimensions_label = widgets.Label(
    value = "Number of data dimensions:"
)

num_data_dimensions = widgets.IntText(
    value = 0,
    layout = widgets.Layout(width="100px")
)

dimensions = widgets.HBox([num_data_dimensions_label, num_data_dimensions])


num_data_files_label = widgets.Label(
    value = "Number of data files:"
)

num_data_files = widgets.IntText(
    value = 0,
    layout = widgets.Layout(width="100px")
)

data_files = widgets.HBox([num_data_files_label, num_data_files])

num_data_rows_label = widgets.Label(
    value = "Number of data table rows:"
)

num_data_rows = widgets.IntText(
    value = 0,
    layout = widgets.Layout(width="100px")
)

data_rows = widgets.HBox([num_data_rows_label, num_data_rows])

num_data_images_label = widgets.Label(
    value = "Number of images:"
)

num_data_images = widgets.IntText(
    value = 0,
    layout = widgets.Layout(width="100px")
)

num_data_images_size_label = widgets.Label(
    value = "Size of images:"
)

num_data_images_size = widgets.Text(
    value = '',
    placeholder='228 x 228'
)

images = widgets.HBox([num_data_images_label, num_data_images, spacer, num_data_images_size_label, num_data_images_size])


display(data_size_question, total_data_volume, data_files, data_rows, dimensions, images)
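For question 2.13, if the data is stored as local files, you can count the files and total the volume by walking the dataset directory. The sketch below assumes the dataset sits under a single local directory. The extension filter and the binary units (KiB, MiB) are illustrative choices, not requirements of the checklist.

```python
import os


def summarise_dataset_size(root, extensions=None):
    """Count files and total bytes under `root`, optionally filtering by extension."""
    num_files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Extension matching is case-insensitive, e.g. ".nc" matches "b.NC".
            if extensions and not name.lower().endswith(tuple(extensions)):
                continue
            num_files += 1
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return num_files, total_bytes


def human_readable(num_bytes):
    """Format a byte count using binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ["B", "KiB", "MiB", "GiB", "TiB"]:
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

The results could then fill the widgets above. For example, after `num_files, volume = summarise_dataset_size("path/to/dataset", [".nc"])` (the path is a placeholder), set `num_data_files.value = num_files` and `total_data_volume.value = human_readable(volume)`.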


In [16]:

# Save button
save_button = widgets.Button(description="Save Bias, Integrity & Size Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_data_quality2():
    
    updates = {
        "DataQuality": {
            #Bias
            "KnownBias": dataset_bias.value,
            "BiasExamined": dataset_bias_measures.value,
            "BiasMeasures": dataset_bias_measures_detail.value,
            "MetrologicalTraceability": dataset_bias_metrological_traceable.value,
            "BiasReport": dataset_bias_report.value,            
            "BiasReportLink" : dataset_bias_report_link.value,
            "BiasCorrectedDatasetLink" : dataset_bias_corrected_link.value,
            "BiasReductionToolsLink" : dataset_bias_tools_link.value,
            # Integrity
            "QuantitativeResolutionInfo" : data_resolution_info.value,
            "PublishedQualityProcedures" : data_quality_report.value,
            "QualityInformationLink" : data_quality_report_link.value,
            "ProvenanceTracked" : dataset_provenance.value,
            "DataIntegrityChecks" : data_integrity.value,
            # Size
            "DatasetVolume" : total_data_volume.value,
            "DatasetNumFiles" : num_data_files.value,
            "DatasetNumRows" : num_data_rows.value,
            "DatasetDimensions" : num_data_dimensions.value,
            "DatasetNumImages" : num_data_images.value,
            "DatasetImageSize" : num_data_images_size.value,
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_data_quality2()))

display(save_button)
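Both Data Quality save buttons write to the same `"DataQuality"` section. If `update_checklist()` replaced whole sections, the second save would overwrite the answers from the first. `helper_functions` isn't shown here, so the following is a hypothetical sketch of a key-by-key merge, assuming the answers file is `Data_Readiness_Checklist.json`.

```python
import json
import os

CHECKLIST_FILENAME = "Data_Readiness_Checklist.json"  # hypothetical default


def merge_updates(existing, updates):
    """Recursively merge `updates` into `existing`, section by section."""
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(existing.get(key), dict):
            # Merge into the existing section instead of replacing it.
            merge_updates(existing[key], value)
        else:
            existing[key] = value
    return existing


def update_checklist(button, updates, filename=CHECKLIST_FILENAME):
    """Button callback: merge `updates` into the JSON file and write it back."""
    checklist = {}
    if os.path.exists(filename):
        with open(filename, "r", encoding="utf-8") as file:
            checklist = json.load(file)
    merge_updates(checklist, updates)
    with open(filename, "w", encoding="utf-8") as file:
        json.dump(checklist, file, indent=4)
    return checklist
```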


---

## **3. Data Documentation**

### Community standard or convention


In [17]:

metadata_standard = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='3.1 Does the dataset metadata follow a community/domain standard or convention?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

metadata_standard_detail = widgets.Text(
            value='',
            description="Metadata standard:",
            placeholder='Which standard is it? (CF, TBD, etc.)',
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

metadata_machine_readable = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Is the dataset metadata machine-readable?',
            placeholder="Click to select option",
            layout = widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

metadata_spatial_temporal = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Does it include details on the spatial and temporal extent?',
            placeholder="Click to select option",
            layout = widgets.Layout(display="none", width=widget_width),
            style = description_style
            )


# Function to change the display setting of the following UI components. 
def on_click_handler(change):    

    # Show / hide main trunk of questions. 
    if metadata_standard.value == "Yes":
        metadata_standard_detail.layout.display = ''
        metadata_machine_readable.layout.display = ''
        metadata_spatial_temporal.layout.display = ''
    else: 
        metadata_standard_detail.layout.display = 'none'
        metadata_machine_readable.layout.display = 'none'
        metadata_spatial_temporal.layout.display = 'none'
        metadata_standard_detail.value = ''
        metadata_machine_readable.value = 'N/A'
        metadata_spatial_temporal.value = 'N/A'
        

display(metadata_standard, metadata_standard_detail, metadata_machine_readable, metadata_spatial_temporal)

# Observe UI components for changes and call the on_click_handler function if value property changed. 
metadata_standard.observe(on_click_handler, names="value")
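For netCDF files, question 3.1 can often be pre-filled from the file's global attributes. The sketch below assumes the attributes have already been read into a plain dictionary (for example from xarray's `Dataset.attrs`). It applies a simple heuristic: the CF `Conventions` attribute names the convention, and the ACDD attributes `geospatial_lat_min`/`_max`, `geospatial_lon_min`/`_max` and `time_coverage_start`/`_end` describe the extent. The function name `suggest_metadata_answers` is hypothetical, and its output only suggests answers for the assessor to confirm.

```python
def suggest_metadata_answers(global_attrs):
    """Suggest answers for question 3.1 from a dict of netCDF global attributes.

    Heuristic only: a non-empty 'Conventions' attribute is taken as evidence of a
    community convention, and the ACDD extent attributes as evidence that the
    spatial and temporal extent is described.
    """
    conventions = str(global_attrs.get("Conventions", "")).strip()
    spatial_keys = ("geospatial_lat_min", "geospatial_lat_max",
                    "geospatial_lon_min", "geospatial_lon_max")
    temporal_keys = ("time_coverage_start", "time_coverage_end")
    has_extent = all(key in global_attrs for key in spatial_keys + temporal_keys)
    # Keys mirror those written by generate_updates_documentation().
    return {
        "MetadataStandard": "Yes" if conventions else "No",
        "MetadataStandardName": conventions,
        "MetadataSpatialTemporalExtent": "Yes" if has_extent else "No",
    }


# Illustrative attributes, not taken from a real file.
attrs = {
    "Conventions": "CF-1.8, ACDD-1.3",
    "geospatial_lat_min": 49.0, "geospatial_lat_max": 61.0,
    "geospatial_lon_min": -11.0, "geospatial_lon_max": 2.0,
    "time_coverage_start": "2020-01-01T00:00:00Z",
    "time_coverage_end": "2020-12-31T23:00:00Z",
}
suggested = suggest_metadata_answers(attrs)
print(suggested)
```

The suggested values could then be copied into the widgets above (e.g. `metadata_standard.value = suggested["MetadataStandard"]`), leaving the assessor to correct them where the heuristic is wrong.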


### Data dictionary

In [18]:

data_dictionary = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='3.2 Is there a comprehensive data dictionary/codebook that describes what each element (parameter) of the dataset means?',
            placeholder="Click to select option",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

data_dictionary_standardized = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Is the data dictionary standardized?',
            placeholder="Click to select option",
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

data_dictionary_machine_readable = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Is the data dictionary machine-readable?',
            placeholder="Click to select option",
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

parameters_defined_standard = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Do the parameters follow a defined standard?',
            placeholder="Click to select option",
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

parameters_defined_standard_detail = widgets.Text(
            value='',
            description = 'Parameter standards:',
            placeholder='If the parameters follow a defined standard, which standard is it?',
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

parameters_common_vocabulary = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Are parameters crosswalked in an ontology or common vocabulary (e.g. NIEM)?',
            placeholder="Click to select option",
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

# Function to change the display setting of the following UI components. 
def on_click_handler(change):    

    # Show / hide main trunk of questions. 
    if data_dictionary.value == "Yes":
        data_dictionary_standardized.layout.display = ''
        data_dictionary_machine_readable.layout.display = ''
        parameters_defined_standard.layout.display = ''
        parameters_defined_standard_detail.layout.display = ''
        parameters_common_vocabulary.layout.display = ''

    else:   
        data_dictionary_standardized.layout.display = 'none'
        data_dictionary_machine_readable.layout.display = 'none'
        parameters_defined_standard.layout.display = 'none'
        parameters_defined_standard_detail.layout.display = 'none'
        parameters_common_vocabulary.layout.display = 'none'
        data_dictionary_standardized.value = 'N/A'
        data_dictionary_machine_readable.value = 'N/A'
        parameters_defined_standard.value = 'N/A'
        parameters_defined_standard_detail.value = ''
        parameters_common_vocabulary.value = 'N/A'

            
# Display the UI components
display(data_dictionary, data_dictionary_standardized, data_dictionary_machine_readable, parameters_defined_standard, parameters_defined_standard_detail, parameters_common_vocabulary)

# Observe UI components for changes and call the on_click_handler function if value property changed. 
data_dictionary.observe(on_click_handler, names="value")
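When the data dictionary is embedded in the file itself, as with per-variable attributes in netCDF, part of question 3.2 can be checked mechanically. The sketch below assumes the variable attributes have been collected into a `{variable_name: {attribute: value}}` dictionary; the helper names are hypothetical, and requiring `long_name` and `units` is an assumption borrowed from CF practice rather than a rule of this checklist.

```python
def find_undocumented_variables(variable_attrs, required=("long_name", "units")):
    """Return {variable: [missing attributes]} for variables lacking any required attribute."""
    missing = {}
    for name, attrs in variable_attrs.items():
        absent = [key for key in required if not attrs.get(key)]
        if absent:
            missing[name] = absent
    return missing


def uses_cf_standard_names(variable_attrs):
    """True if every variable declares a non-empty CF 'standard_name' attribute."""
    return all(attrs.get("standard_name") for attrs in variable_attrs.values())


# Illustrative variables, not taken from a real dataset.
variables = {
    "air_temperature": {"standard_name": "air_temperature",
                        "long_name": "Air temperature at 1.5 m", "units": "K"},
    "rainfall": {"long_name": "Hourly rainfall"},
}
print(find_undocumented_variables(variables))  # {'rainfall': ['units']}
print(uses_cf_standard_names(variables))       # False
```

Some variables (station identifiers, flags) legitimately have no units, so treat the result as a prompt for review rather than a pass/fail score.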




### Unique persistent identifier

3.3 Does the dataset have a unique persistent identifier, e.g. a DOI? Answer Yes (and supply the identifier), No, or Not applicable.


In [19]:

unique_persistent_identifier = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='3.3 Does the dataset have a unique persistent identifier, e.g. DOI?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

unique_persistent_identifier_link = widgets.Text(
            value='',
            description = "Identifier:",
            placeholder='If yes, enter the identifier (e.g., DOI)',
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(unique_persistent_identifier, unique_persistent_identifier_link)
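Before saving an answer to 3.3, it can help to normalise the identifier the assessor typed. The sketch below strips common resolver prefixes and checks the remainder against a pattern adapted from the one Crossref suggests for modern DOIs. That pattern does not match every legacy DOI, so a non-match should mean "check manually" rather than "invalid"; the function name `normalise_doi` is hypothetical.

```python
import re

# Adapted from Crossref's suggested pattern for modern DOIs (case-insensitive).
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def normalise_doi(identifier):
    """Return the bare DOI, or None if the text does not look like a DOI."""
    text = identifier.strip()
    for prefix in RESOLVER_PREFIXES:
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    return text if DOI_PATTERN.match(text) else None


# The DOI of the ESIP checklist cited in the overview.
print(normalise_doi("https://doi.org/10.6084/m9.figshare.19983722.v1"))
```

The bare form could then be written back to `unique_persistent_identifier_link.value` so that every saved checklist stores identifiers the same way.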


### Contact information and feedback

In [20]:

contact_info_available = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='3.4 Is there contact information for subject-matter experts?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

feedback_mechanism_available = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='3.5 Is there a mechanism for user feedback and suggestions?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(contact_info_available, feedback_mechanism_available)



### Example code / notebooks / toolkits


In [21]:

example_code_available = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='3.6 Are there code examples, notebooks or toolkits showing how the data can be used?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(example_code_available)


### Licenses

In [22]:

dataset_licence = widgets.Text(
            value='',
            description='3.7 What is the license for the data?',
            placeholder="Type your answer",
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_licence_machine_readable = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Is the license standardized and machine-readable (e.g. Creative Commons)?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(dataset_licence, dataset_licence_machine_readable)
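One working definition of "standardized and machine-readable" for the licence question is that the licence maps onto an SPDX identifier. The sketch below assumes that definition. The lookup table `SPDX_LOOKUP` is hypothetical and deliberately small; it should be extended with the licences your organisation actually uses. The identifiers it returns (`CC-BY-4.0`, `CC0-1.0`, `OGL-UK-3.0`) are entries on the SPDX licence list.

```python
# Hypothetical lookup from free-text licence answers to SPDX identifiers.
SPDX_LOOKUP = {
    "cc by 4.0": "CC-BY-4.0",
    "creative commons attribution 4.0": "CC-BY-4.0",
    "cc0": "CC0-1.0",
    "open government licence v3.0": "OGL-UK-3.0",
}


def to_spdx(licence_text):
    """Return an SPDX identifier for a known licence string, or None if unrecognised."""
    # Lower-case and collapse whitespace so minor typing differences still match.
    key = " ".join(licence_text.lower().split())
    return SPDX_LOOKUP.get(key)


print(to_spdx("CC BY 4.0"))                     # CC-BY-4.0
print(to_spdx("Open Government Licence v3.0"))  # OGL-UK-3.0
print(to_spdx("Custom agreement"))              # None
```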


### Dataset usage

In [23]:

ai_ml_existing_useage_links = widgets.Textarea(
            value='',
            description='3.8 Has this dataset already been used in AI or ML activities? If so, link to publications/reports',
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

usage_recomendations = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='3.9 Are there recommendations on the intended use of the data, and uses that are not recommended?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(ai_ml_existing_useage_links, usage_recomendations)


In [24]:

# Save button
save_button = widgets.Button(description="Save Data Documentation Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_documentation():

    updates = {
        "DataDocumentation": {
            #Metadata
            "MetadataStandard": metadata_standard.value,
            "MetadataStandardName": metadata_standard_detail.value,
            "MetadataMachineReadable": metadata_machine_readable.value,
            "MetadataSpatialTemporalExtent": metadata_spatial_temporal.value,
            
            # Data Dictionary
            "DataDictionaryExists": data_dictionary.value,            
            "DataDictionaryStandardised" : data_dictionary_standardized.value,
            "DataDictionaryMachineReadable" : data_dictionary_machine_readable.value,
            "DataDictionaryParametersFollowStandard" : parameters_defined_standard.value,
            "DataDictionaryStandardName" : parameters_defined_standard_detail.value,
            "DataDictionaryParametersCrosswalked" : parameters_common_vocabulary.value,
            
            # Identifier
            "IdentifierExists" : unique_persistent_identifier.value,
            "Identifier" : unique_persistent_identifier_link.value,
            
            # Contact and feedback
            "ContactInformation" : contact_info_available.value,
            "UserFeedbackMechanism" : feedback_mechanism_available.value,

            #Misc
            "ExampleCodesAvailable" : example_code_available.value,
            "LicenseType" : dataset_licence.value,
            "LicenceMachineReadable" : dataset_licence_machine_readable.value,
            "UsedInAIorMLReports" : ai_ml_existing_useage_links.value,
            "UsageRecommendations" : usage_recomendations.value,
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_documentation()))

display(save_button)
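Each save button passes a one-section dictionary such as `{"DataDocumentation": {...}}` to `update_checklist`, which is defined earlier in the notebook and not shown in this section. The sketch below is a hypothetical stand-in, `merge_section_into_json`, showing one way such a merge could work: existing sections in the JSON file are kept, and the section being saved replaces its previous copy. It is not the notebook's actual implementation.

```python
import json
import os
import tempfile


def merge_section_into_json(path, updates):
    """Merge one section's answers into a JSON checklist file and return the merged dict."""
    checklist = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            checklist = json.load(f)
    # A section present in `updates` replaces the stored copy; other sections are kept.
    checklist.update(updates)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(checklist, f, indent=2)
    return checklist


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checklist.json")
    merge_section_into_json(path, {"DataDocumentation": {"LicenseType": "CC-BY-4.0"}})
    result = merge_section_into_json(path, {"DataAccess": {"FileFormats": ("netCDF", "Zarr")}})

print(sorted(result))  # ['DataAccess', 'DataDocumentation']
```

Note that `json.dump` writes tuples, such as the value of a `SelectMultiple` widget, as JSON arrays, so they come back as lists when the file is read.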


---

## **4. Data Access**

### File formats

In [25]:

dataset_file_formats_label = widgets.Label(
    value = "4.1 What is/are the major file formats? (Use Shift / Ctrl / Cmd to select multiple)"
)

dataset_file_format_options = ['CSV', 'netCDF', 'geoJSON', 'Shapefile', 'GRIB', 'HDF', 'GeoTIFF', 'KML', 'GINI', 'Zarr', 'Other']

dataset_file_formats = widgets.SelectMultiple(
    options=dataset_file_format_options,
    value=(),
    rows=len(dataset_file_format_options),
)

dataset_file_formats_machine_readable = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Are the main formats machine-readable?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_file_formats_non_proprietary = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Is the data available in at least one open, non-proprietary format?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_file_formats_conversion_tools = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='Are there tools/services to support data format conversion?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_file_formats_conversion_tools_link = widgets.Text(
            value='',
            description='Tools / services link:',
            placeholder='If yes, provide the link to the tools/services',
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

def on_click_handler_sel_format(change):
    """
    Completes various fields based on the file format selection.
    """
    
    selected_formats = change['new']
    dataset_file_formats_conversion_tools_link_dict = {'CSV':['https://pandas.pydata.org/', 'https://www.qgis.org/', 'https://www.arcgis.com/home/index.html', ],
                                                        'netCDF':['https://unidata.github.io/netcdf4-python/','https://scitools-iris.readthedocs.io/en/stable/index.html', 'https://docs.xarray.dev/en/stable/', 'https://www.giss.nasa.gov/tools/panoply/', 'https://www.unidata.ucar.edu/software/tds/', 'https://www.mathworks.com/products/matlab.html'],
                                                        'geoJSON':['https://www.qgis.org/', 'https://www.arcgis.com/home/index.html', 'https://geopandas.org/en/stable/', 'https://shapely.readthedocs.io/en/stable/manual.html'],
                                                        'Shapefile':['https://www.qgis.org/', 'https://www.arcgis.com/home/index.html', 'https://gdal.org/en/latest/', 'https://pypi.org/project/pyshp/', 'https://geopandas.org/en/stable/',], 
                                                        'GRIB':['https://scitools-iris.readthedocs.io/en/stable/index.html', 'https://www.giss.nasa.gov/tools/panoply/', 'https://www.qgis.org/', 'https://www.cpc.ncep.noaa.gov/products/wesley/wgrib2/', 'https://github.com/ecmwf/cfgrib'],
                                                        'HDF':['https://www.h5py.org/', 'https://www.giss.nasa.gov/tools/panoply/','https://earth.esa.int/eogateway/tools/hdfview'],
                                                        'GeoTIFF':['https://www.qgis.org/', 'https://www.arcgis.com/home/index.html','https://gdal.org/en/latest/','https://rasterio.readthedocs.io/en/stable/'],
                                                        'KML':['https://www.qgis.org/', 'https://www.arcgis.com/home/index.html','https://fastkml.readthedocs.io/en/latest/','https://simplekml.readthedocs.io/en/latest/'],
                                                        'GINI':['https://gdal.org/en/latest/', 'https://www.unidata.ucar.edu/software/metpy/'],
                                                        'Zarr':['https://zarr.readthedocs.io/en/stable/', 'https://docs.xarray.dev/en/stable/','https://www.dask.org']}

    if set(selected_formats) & {'CSV', 'netCDF', 'geoJSON', 'Shapefile', 'GRIB', 'HDF', 'GeoTIFF', 'KML', 'GINI', 'Zarr'}:
        dataset_file_formats_machine_readable.value = 'Yes'
    elif set(selected_formats) & {'Other'}:
        dataset_file_formats_machine_readable.value = ''
    else:
        dataset_file_formats_machine_readable.value = 'No'
    
    if set(selected_formats) & {'CSV', 'netCDF', 'geoJSON', 'Shapefile', 'GRIB', 'HDF','GeoTIFF', 'Zarr'}:
        dataset_file_formats_non_proprietary.value = 'Yes'
    elif set(selected_formats) & {'Other'}:
        dataset_file_formats_non_proprietary.value = ''
    else:
        dataset_file_formats_non_proprietary.value = 'No'
    
    # Determine if conversion tools are available
    if set(selected_formats) & {'CSV', 'netCDF', 'geoJSON', 'Shapefile', 'GRIB', 'HDF', 'GeoTIFF', 'KML', 'GINI', 'Zarr'}:
        dataset_file_formats_conversion_tools.value = 'Yes'
        # Collect the tool links for every recognised format. 'Other' has no entry
        # in the dictionary, so .get() skips it instead of raising a KeyError.
        conversion_tool_links = set()
        for fmt in selected_formats:
            conversion_tool_links.update(dataset_file_formats_conversion_tools_link_dict.get(fmt, []))
        # Sort so the saved string is stable between runs.
        dataset_file_formats_conversion_tools_link.value = ' '.join(sorted(conversion_tool_links))
    elif set(selected_formats) & {'Other'}:
        dataset_file_formats_conversion_tools.value = ''
        dataset_file_formats_conversion_tools_link.value = ''
    else:
        dataset_file_formats_conversion_tools.value = 'No'
        dataset_file_formats_conversion_tools_link.value = ''

dataset_file_formats.observe(on_click_handler_sel_format, names='value')

display(dataset_file_formats_label, dataset_file_formats, dataset_file_formats_machine_readable, dataset_file_formats_non_proprietary, dataset_file_formats_conversion_tools, dataset_file_formats_conversion_tools_link)
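The decision rules inside `on_click_handler_sel_format` do not depend on the widgets themselves, so they can be pulled out into a pure function and tested on their own. The sketch below mirrors the format sets used in the handler; the names `KNOWN_FORMATS`, `OPEN_FORMATS` and `answer_for` are hypothetical.

```python
# Sets copied from the handler above.
KNOWN_FORMATS = {'CSV', 'netCDF', 'geoJSON', 'Shapefile', 'GRIB', 'HDF', 'GeoTIFF', 'KML', 'GINI', 'Zarr'}
OPEN_FORMATS = {'CSV', 'netCDF', 'geoJSON', 'Shapefile', 'GRIB', 'HDF', 'GeoTIFF', 'Zarr'}


def answer_for(selected, reference_set):
    """Return 'Yes' if any selected format is in reference_set,
    '' (leave it to the assessor) if only 'Other' applies, otherwise 'No'."""
    chosen = set(selected)
    if chosen & reference_set:
        return 'Yes'
    if 'Other' in chosen:
        return ''
    return 'No'


print(answer_for(('netCDF', 'Other'), OPEN_FORMATS))  # Yes
print(answer_for(('KML',), OPEN_FORMATS))             # No
print(answer_for(('Other',), KNOWN_FORMATS))          # prints an empty line
print(answer_for((), KNOWN_FORMATS))                  # No
```

With the rules isolated like this, the handler body reduces to assignments such as `dataset_file_formats_non_proprietary.value = answer_for(selected_formats, OPEN_FORMATS)`.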


### Data delivery

In [26]:

dataset_authentication = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='4.2 Does data access require authentication (e.g., a registered user account)?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_direct_access = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='4.3 Can the data be accessed via direct file download or ordering?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_api_available = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='4.4 Is there an Application Programming Interface (API) or web service to access the data?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_api_standard_protocol = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='If there is an API, does the API follow an open standard protocol (e.g., OGC)?',
            placeholder=placeholder,
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

dataset_api_documentation_available = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='If there is an API, is there documentation for the API?',
            placeholder=placeholder,
            layout=widgets.Layout(display="none", width=widget_width),
            style = description_style
            )

dataset_api_documentation_link = widgets.Text(
            value='',
            placeholder='If “Yes”, please provide a URL to the documentation.',
            layout=widgets.Layout(display="none", width=widget_width)
            )



# Function to change the display setting of the following UI components. 
def on_click_handler(change):    

    # Show / hide main trunk of questions. 
    if dataset_api_available.value == "Yes":
        dataset_api_standard_protocol.layout.display = ''
        dataset_api_documentation_available.layout.display = ''
        dataset_api_documentation_link.layout.display = ''

    else:   
        dataset_api_standard_protocol.layout.display = 'none'
        dataset_api_documentation_available.layout.display = 'none'
        dataset_api_documentation_link.layout.display = 'none'

        dataset_api_standard_protocol.value = 'N/A'
        dataset_api_documentation_available.value = 'N/A'
        dataset_api_documentation_link.value = ''


  
            
# Display the UI components
display(dataset_authentication, dataset_direct_access, dataset_api_available, dataset_api_standard_protocol, dataset_api_documentation_available, dataset_api_documentation_link)


# Observe UI components for changes and call the on_click_handler function if value property changed. 
dataset_api_available.observe(on_click_handler, names="value")
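The API documentation link is a free-text field, so a quick syntax check before saving can catch answers such as "see the README". The sketch below uses only `urllib.parse` from the standard library. The helper name `looks_like_http_url` is hypothetical, and it checks the form of the URL only, not whether the page is reachable.

```python
from urllib.parse import urlparse


def looks_like_http_url(text):
    """True if text parses as an absolute http(s) URL with a host name."""
    parts = urlparse(text.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(looks_like_http_url("https://example.org/api/docs"))  # True
print(looks_like_http_url("see the README"))                # False
```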







### Privacy and security


In [27]:

dataset_restricted_protection = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='4.5 For restricted data, have measures been taken to provide some access while still applying appropriate protection for privacy and security?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            
            )

dataset_aggregation = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='4.6 Has the data been aggregated to reduce granularity?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_anonymization = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='4.7 Has the data been anonymized / de-identified?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_secure_access = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='4.8 Is there secure access to the full dataset for authorized users?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(dataset_restricted_protection, dataset_aggregation, dataset_anonymization, dataset_secure_access)
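Question 4.6 asks whether data has been aggregated to reduce granularity. One common approach for point observations is to average them onto a coarser grid. The sketch below assumes records of `(latitude, longitude, value)` in decimal degrees and square cells aligned to multiples of `cell_size`. The function name `aggregate_to_grid` is hypothetical, and the right cell size depends on the privacy and security requirements of the data.

```python
from collections import defaultdict
from math import floor
from statistics import mean


def aggregate_to_grid(records, cell_size=1.0):
    """Average (lat, lon, value) records within square cells of `cell_size` degrees.

    Returns {(lat_cell_min, lon_cell_min): (mean_value, count)}. Keeping the count
    lets you suppress cells with too few contributors before release.
    """
    cells = defaultdict(list)
    for lat, lon, value in records:
        key = (floor(lat / cell_size) * cell_size, floor(lon / cell_size) * cell_size)
        cells[key].append(value)
    return {key: (mean(values), len(values)) for key, values in cells.items()}


# Illustrative point records, not real observations.
records = [(51.45, -2.59, 10.0), (51.90, -2.10, 14.0), (53.48, -2.24, 8.0)]
aggregated = aggregate_to_grid(records)
print(aggregated)  # {(51.0, -3.0): (12.0, 2), (53.0, -3.0): (8.0, 1)}
```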



In [28]:

# Save button
save_button = widgets.Button(description="Save Data Access Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_access():

    updates = {
        "DataAccess": {
            #File Formats
            "FileFormats": dataset_file_formats.value,
            "FileFormatsMachineReadable": dataset_file_formats_machine_readable.value,
            "OpenFormatAvailable": dataset_file_formats_non_proprietary.value,
            "FormatConversionTools": dataset_file_formats_conversion_tools.value,
            "ConversionToolsLink": dataset_file_formats_conversion_tools_link.value, 

            # Data Delivery
            "AuthenticationRequired" : dataset_authentication.value,
            "DirectDownloadAvailable" : dataset_direct_access.value,
            "APIorWebAvailable" : dataset_api_available.value,
            "APIOpenStandard" : dataset_api_standard_protocol.value,
            "APIDocumentation" : dataset_api_documentation_available.value,
            "APIDocumentationLink" : dataset_api_documentation_link.value,

            # Privacy and Security
            "SecurityMeasuresTaken" : dataset_restricted_protection.value,
            "DataAggregated" : dataset_aggregation.value,
            "DataAnonymized" : dataset_anonymization.value,
            "SecureAccessForAuthorizedUsers" : dataset_secure_access.value,
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_access()))

display(save_button)


---

## **5. Data Preparation**

### Null values

In [29]:

dataset_null_values = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description='5.1 Have null values/gaps been filled?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(dataset_null_values)
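If you fill gaps yourself before answering 5.1, record how you did it. Linear interpolation between the nearest valid neighbours is one simple option and is only appropriate for smoothly varying series. The sketch below assumes missing values are represented as `None`. It leaves leading and trailing gaps unfilled rather than extrapolating. The function name `fill_gaps_linear` is hypothetical.

```python
def fill_gaps_linear(values):
    """Linearly interpolate interior None gaps; leading/trailing gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Interpolate between each pair of consecutive known points.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


series = [None, 1.0, None, None, 4.0, None]
print(fill_gaps_linear(series))  # [None, 1.0, 2.0, 3.0, 4.0, None]
```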


### Outliers

In [30]:

dataset_outliers = widgets.Combobox(
            options=['Yes, tagged', 'Yes, removed', 'No', 'N/A'],
            description='5.2 Have outliers been identified?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(dataset_outliers)
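The options for 5.2 distinguish tagging outliers from removing them. The sketch below illustrates both using the interquartile-range (IQR) rule: values outside `[Q1 - k*IQR, Q3 + k*IQR]` are flagged, and the flags can either be kept alongside the data or used to drop values. The IQR rule with `k=1.5` is an assumption chosen for illustration, not a method prescribed by the checklist. `statistics.quantiles` requires Python 3.8+ and at least two values. The function name `tag_outliers_iqr` is hypothetical.

```python
from statistics import quantiles


def tag_outliers_iqr(values, k=1.5):
    """Return booleans marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= v <= high) for v in values]


data = [10.1, 10.4, 9.8, 10.0, 10.2, 25.0, 9.9]
flags = tag_outliers_iqr(data)                                    # "Yes, tagged"
kept = [v for v, is_outlier in zip(data, flags) if not is_outlier]  # "Yes, removed"
print(flags)
print(kept)
```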



### Gridded data


In [31]:

dataset_gridded = widgets.Combobox(
            options=['Regularly gridded in space', 'Constant time-frequency', 'Regularly gridded in space and constant time-frequency', 'Not gridded', 'N/A'],
            description='5.3 Is the data gridded (regularly sampled in time and space)?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_gridded_transformed = widgets.Combobox(
            options=['Yes, from irregular sampling', 'Yes, from a different regular sampling', 'No, this is the original sampling', 'N/A'],
            description='If the data is gridded, was it transformed from a different original sampling?',            
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )


dataset_gridded_original_sample = widgets.Combobox(
            options=['Yes', 'No', 'Only by request', 'N/A'],
            description = 'If the data is resampled from the original sampling, is the data also available at the original sampling?',
            placeholder=placeholder,
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

display(dataset_gridded, dataset_gridded_transformed, dataset_gridded_original_sample)
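To answer 5.3, you can check the coordinate values directly: regular spatial gridding means constant coordinate spacing, and constant time-frequency means constant time steps. The sketch below assumes the coordinates have already been extracted into plain Python lists (for example from a netCDF file's latitude, longitude and time variables). The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def is_regular(coords, tolerance=1e-9):
    """True if consecutive differences in a numeric sequence are all (nearly) equal."""
    diffs = [b - a for a, b in zip(coords, coords[1:])]
    return bool(diffs) and all(abs(d - diffs[0]) <= tolerance for d in diffs)


def has_constant_frequency(times):
    """True if consecutive datetimes are all separated by the same timedelta."""
    deltas = {b - a for a, b in zip(times, times[1:])}
    return len(deltas) == 1


lons = [-10.0, -9.5, -9.0, -8.5]
times = [datetime(2024, 1, 1) + timedelta(hours=h) for h in (0, 1, 2, 4)]
print(is_regular(lons))               # True
print(has_constant_frequency(times))  # False (the 03:00 step is missing)
```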


### Targets / labels for supervised learning

In [32]:

dataset_targets_or_labels = widgets.Combobox(
            options=['Yes', 'No', 'N/A'],
            description = '5.4 Are there associated targets or labels for supervised learning techniques?',
            placeholder='Click to select option (i.e., can this be used as a training dataset?)',
            layout=widgets.Layout(width=widget_width),
            style = description_style
            )

dataset_targets_or_labels_standards_label = widgets.Label(
    value = "If there are associated targets/labels, are community labeling standards implemented?"
)

dataset_targets_or_labels_standards = widgets.Text(
            value = '',
            placeholder = 'e.g., STAC label extension, ESA AIREO specification, etc.',
            layout = widgets.Layout(width=widget_width)
)

display(dataset_targets_or_labels, dataset_targets_or_labels_standards_label, dataset_targets_or_labels_standards)


In [33]:

# Save button
save_button = widgets.Button(description="Save Data Preparation Answers to json file",  button_style="primary",  layout=widgets.Layout(flex='1 1 auto', width='auto'))

def generate_updates_preparation():

    updates = {
        "DataPreparation": {
            "NullValuesFilled": dataset_null_values.value,
            "OutliersIdentified": dataset_outliers.value,
            "Gridded": dataset_gridded.value,
            "TransformedFromOriginal": dataset_gridded_transformed.value,
            "OriginalSamplingAvailable": dataset_gridded_original_sample.value, 
            "SupervisedLearningLabels" : dataset_targets_or_labels.value,
            "SupervisedLearningLabelStandards" : dataset_targets_or_labels_standards.value,
          
        }
    }
    return updates

save_button.on_click(lambda b: update_checklist(b, generate_updates_preparation()))

display(save_button)


## Finished

In [34]:

button_finished = widgets.Button(description="Print checklist",  button_style='info')
output = widgets.Output()
display(button_finished, output)

results = {}

def generate_results(b):
    results["1.1 \tDataset name"] = dataset_name.value
    results["1.2 \tDataset version"] = dataset_version.value
    results["1.3 \tDataset link"] = dataset_link.value
    results["1.4 \tAssessor name"] = dataset_assessor_name.value
    results["1.5 \tAssessor email address"] = dataset_assessor_email.value
    results["1.6 \tData product"] = raw_derived.value
    results["1.7 \tData origin"] = observe_model_synthetic.value
    results["1.8 \tData source"] = data_sources.value
    results["2.1 \tWill dataset be updated"] = data_update.value
    results["2.1.1 \tUpdate frequency"] = data_update_frequency.value
    results["2.1.2 \tUpdate stages"] = data_update_stages.value
    results["2.1.3 \tUpdate delay reason"] = data_update_delay.value
    results["2.1.4 \tShould new version supersede"] = data_update_supersede.value
    results["2.2 \tDataset completeness documentation"] = completeness_docs.value
    results["2.2.1 \tDataset completeness doc link"] = completeness_docs_link.value
    results["2.3 \tDataset completion vs expected spatial coverage"] = expected_spatial_coverage.value
    results["2.4 \tDataset completion vs expected temporal coverage"] = expected_temporal_coverage.value
    results["2.5 \tSelf-consistent units, dtypes, parameters"] = self_consistent_units.value
    results["2.6 \tConsistent units, dtypes, parameters with similar datasets"] = consistent_units.value
    results["2.7 \tConsistent units, dtypes, parameters monitoring"] = consistent_unit_monitoring.value
    results["2.7.1 \tConsistent units, dtypes, parameters monitoring measures"] = consistent_unit_review.value
    results["2.8 \tIs there known bias in the dataset"] = dataset_bias.value
    results["2.8.1 \tHas bias been examined"] = dataset_bias_measures.value
    results["2.8.2 \tBias measures used"] = dataset_bias_measures_detail.value
    results["2.8.3 \tIs bias metrologically traceable"] = dataset_bias_metrological_traceable.value
    results["2.8.4 \tReported bias in data"] = dataset_bias_report.value
    results["2.8.5 \tBias report link"] = dataset_bias_report_link.value
    # TODO: this reads the same widget as 2.8.4; point it at the bias-corrected version link widget.
    results["2.8.6 \tBias corrected dataset version link"] = dataset_bias_report.value
    results["2.8.7 \tTools to reduce bias link"] = dataset_bias_tools_link.value
    results["2.9 \tSpace and time data resolution info"] = data_resolution_info.value
    results["2.10 \tPublished data quality procedures or reports"] = data_quality_report.value
    results["2.10.1 \tData quality report link"] = data_quality_report_link.value
    results["2.11 \tProvenance of the dataset tracked and documented"] = dataset_provenance.value
    results["2.12 \tChecksums / other checks for data integrity"] = data_integrity.value
    results["2.13.1 \tTotal data volume"] = total_data_volume.value
    results["2.13.2 \tNumber of data dimensions"] = num_data_dimensions.value
    results["2.13.3 \tNumber of data files"] = num_data_files.value
    results["2.13.4 \tNumber of data table rows"] = num_data_rows.value
    results["2.13.5 \tNumber of images"] = num_data_images.value
    results["2.13.6 \tSize of images"] = num_data_images_size.value
    results["3.2 \tData dictionary for dataset / parameters"] = data_dictionary.value
    results["3.2.1 \tData dictionary standardized"] = data_dictionary_standardized.value
    results["3.2.2 \tData dictionary machine-readable"] = data_dictionary_machine_readable.value
    results["3.2.3 \tParameters follow a defined standard"] = parameters_defined_standard.value
    results["3.2.4 \tWhich standard do the parameters follow"] = parameters_defined_standard_detail.value
    results["3.2.5 \tAre parameters crosswalked in an ontology or common vocabulary"] = parameters_common_vocabulary.value
    results["3.3 \tHas a unique persistent identifier"] = unique_persistent_identifier.value
    results["3.3.1 \tUnique persistent identifier link"] = unique_persistent_identifier_link.value
    results["3.4 \tContact info available"] = contact_info_available.value
    results["3.5 \tFeedback mechanism available"] = feedback_mechanism_available.value
    results["3.6 \tExample code / notebooks / toolkits"] = example_code_available.value
    results["3.7 \tLicence"] = dataset_licence.value
    results["3.7.1 \tLicence machine-readable"] = dataset_licence_machine_readable.value
    results["3.8 \tAI / ML existing usage links"] = ai_ml_existing_useage_links.value
    results["3.9 \tDataset usage recommendations"] = usage_recomendations.value
    results["4.1 \tWhat is/are the major file formats"] = dataset_file_formats.value
    results["4.1.1 \tMain data formats machine-readable"] = dataset_file_formats_machine_readable.value
    results["4.1.2 \tData available in at least one open, non-proprietary format"] = dataset_file_formats_non_proprietary.value
    results["4.1.3 \tData format conversion tools / services"] = dataset_file_formats_conversion_tools.value
    results["4.1.4 \tData format conversion tools / services link"] = dataset_file_formats_conversion_tools_link.value
    results["4.2 \tDataset requires authentication"] = dataset_authentication.value
    results["4.3 \tDirect file download / order access"] = dataset_direct_access.value
    results["4.4 \tAPI / webservice available"] = dataset_api_available.value
    results["4.4.1 \tAPI follow an open standard protocol"] = dataset_api_standard_protocol.value
    results["4.4.2 \tAPI documentation available"] = dataset_api_documentation_available.value
    results["4.4.3 \tAPI documentation link"] = dataset_api_documentation_link.value
    results["4.5 \tRestricted data with some access but appropriate protections"] = dataset_restricted_protection.value
    results["4.6 \tData aggregated to reduce granularity"] = dataset_aggregation.value
    results["4.7 \tData has been anonymized / de-identified"] = dataset_anonymization.value
    results["4.8 \tSecure access to full dataset for authorized users"] = dataset_secure_access.value
    results["5.1 \tNull values / gaps been filled"] = dataset_null_values.value
    results["5.2 \tOutlier values been identified"] = dataset_outliers.value
    results["5.3 \tIs the data gridded"] = dataset_gridded.value
    results["5.3.1 \tIs gridded data transformed from original sampling"] = dataset_gridded_transformed.value
    results["5.3.2 \tIf data is resampled, is the original sampling available"] = dataset_gridded_original_sample.value
    results["5.4 \tAssociated targets or labels for supervised learning techniques"] = dataset_targets_or_labels.value
    results["5.4.1 \tAre community standards for targets or labels implemented"] = dataset_targets_or_labels_standards.value

    # Print the checklist results; uncomment the `if` line to skip unanswered questions.
    with output:
        clear_output()
        print("\n------ CHECKLIST RESULTS ------\n")
        for key, value in results.items():
            # if value == "": continue
            print(f"{key}: {value}")
button_finished.on_click(generate_results)


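Before sharing a printed checklist it can help to see how many questions were left blank in each section. The sketch below assumes the `results` keys keep the `"<number> \t<question>"` shape used above and that an unanswered widget returns an empty string, `None` or an empty selection; `is_answered` and `summarize_results` are hypothetical helpers, not part of the checklist.

```python
from collections import defaultdict


def is_answered(value):
    """Treat "", None and empty multi-select values as unanswered."""
    if value is None or value == "":
        return False
    if isinstance(value, (list, tuple)) and len(value) == 0:
        return False
    return True


def summarize_results(results):
    """Count answered and total questions per top-level checklist section."""
    summary = defaultdict(lambda: {"answered": 0, "total": 0})
    for key, value in results.items():
        # Keys look like "2.8.1 \tHas bias been examined".
        number = key.split()[0]         # e.g. "2.8.1"
        section = number.split(".")[0]  # e.g. "2"
        summary[section]["total"] += 1
        if is_answered(value):
            summary[section]["answered"] += 1
    return dict(summary)
```

For example, `summarize_results({"1.1 \tDataset name": "ERA5", "1.2 \tDataset version": ""})` returns `{"1": {"answered": 1, "total": 2}}`.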

---

## **Appendix** - Definition of terms used in the checklist.

### Quality
* **Completeness**: the breadth of a dataset compared to an ideal 100% completion (spatial, temporal, demographic, etc.); important in avoiding sampling bias
* **Consistency**: uniformity within the entire dataset or compared with similar data collections, for example no changes in units or data types over time; the item is measured against itself or against its counterpart in another dataset or database
* **Bias**: a systematic tilt in the dataset when compared to a reference, caused for example by instrumentation, incorrect data processing, unrepresentative sampling, or human error; the exact nature of bias and how it is measured will vary depending on the type of data and the research domain.
* **Uncertainty**: a parameter, associated with the result of a measurement, that characterizes the dispersion of the values that could reasonably be attributed to the measurand.
* **Timeliness**: the speed of data release, compared to when an event occurred or measurements were made; requirements will vary depending on the timeframe of the phenomenon (e.g., severe thunderstorms vs. climate change, or disease outbreaks vs. life expectancy trends)
* **Provenance**: identification of the data sources, how the data were processed, and who released them.
* **Integrity**: verification that the data remain unchanged from the original; also known as data fixity.
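Completeness and integrity are the two quality measures above that reduce most directly to a calculation. The sketch below assumes a regular time axis with a positive step (only timestamps that fall exactly on the expected grid count) and uses SHA-256 as an example checksum; a data provider may publish a different algorithm, such as MD5, in which case `hashlib.md5` would be used instead. The function names are illustrative.

```python
import hashlib
from datetime import datetime, timedelta


def temporal_completeness(observed, start, end, step):
    """Fraction of expected time steps in [start, end] present in `observed`.

    Assumes a regular time axis and step > 0; off-grid times are ignored.
    """
    expected = set()
    t = start
    while t <= end:
        expected.add(t)
        t += step
    if not expected:
        raise ValueError("end must not be earlier than start")
    return len(expected.intersection(observed)) / len(expected)


def sha256_of_file(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, published_sha256):
    """True if the file still matches the checksum published alongside it."""
    return sha256_of_file(path) == published_sha256.strip().lower()


# Daily data for 1 to 10 January 2020 with 5 January missing.
start, end = datetime(2020, 1, 1), datetime(2020, 1, 10)
observed = [start + timedelta(days=i) for i in range(10) if i != 4]
print(temporal_completeness(observed, start, end, timedelta(days=1)))  # 0.9
```

Reading the file in chunks keeps memory use flat for large data files, which matters when checksums are verified across a whole archive.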

### Documentation
* **Dataset Metadata**: complete information about the dataset: quality, provenance, location, time period, responsible parties, purpose, etc.
* **Data Dictionary/Codebook**: complete information about the individual variables / measures / parameters within a dataset: type, units, null value, etc.
* **Identifier**: a code or number that uniquely identifies a dataset
* **Ontology**: formalized definitions of concepts within a domain of knowledge, and the nature of the inter-relationships among those concepts
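A machine-readable data dictionary records the per-variable attributes listed above (type, units, null value) in a structured form that software can parse. The minimal sketch below uses a Python dataclass serialised to JSON; the field names and the example variables are assumptions for illustration, not a community standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class VariableEntry:
    """One data-dictionary entry (illustrative fields, not a published standard)."""
    name: str
    description: str
    dtype: str
    units: str
    null_value: Optional[float] = None


def to_machine_readable(entries):
    """Serialise entries to a JSON object keyed by variable name."""
    return json.dumps({e.name: asdict(e) for e in entries}, indent=2)


def from_machine_readable(text):
    """Parse the JSON produced by to_machine_readable back into entries."""
    return [VariableEntry(**fields) for fields in json.loads(text).values()]


entries = [
    VariableEntry("air_temperature", "Near-surface air temperature", "float32", "K", -9999.0),
    VariableEntry("precipitation_flux", "Precipitation rate", "float32", "kg m-2 s-1", -9999.0),
]
print(to_machine_readable(entries))
```

Because the JSON round-trips back into the same entries, a script can check a dataset's variables against the dictionary without a person reading the documentation.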

### Data Access

* **Formats**: standards that govern how information is stored in a computer file (e.g., CSV, JSON, GeoTIFF, etc.); different AI user communities will have different requirements, so the best practice is to provide several format options to meet the needs of multiple high priority user communities.
* **Delivery Options**: mechanisms for publishing open data for public use (e.g., direct file download, Application Programming Interface (API), cloud services, etc.); different AI user communities will have different requirements, so the best practice is to provide several delivery options to meet the needs of multiple high priority user communities.
* **License/Usage Rights**: information on who is allowed to use the data and for what purposes, including data sharing agreements, fees, etc.; some federal data must be restricted while other data will be fully open, so usage rights should be documented in detail
* **Security/Privacy**: protection of data that is restricted in some way (privacy, proprietary/business information, national security, etc.)
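Offering several formats can be as simple as serialising the same records more than once. The sketch below writes a list of flat records to both CSV and JSON using only the Python standard library; the record fields are hypothetical, and a raster format such as GeoTIFF would need a dedicated library.

```python
import csv
import io
import json


def export_formats(records):
    """Serialise the same list of flat records as CSV text and JSON text."""
    if not records:
        raise ValueError("records must not be empty")
    # Use the first record's keys as the CSV header.
    fieldnames = list(records[0].keys())
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return {"csv": buffer.getvalue(), "json": json.dumps(records, indent=2)}


records = [
    {"station": "A", "date": "2024-08-27", "tas_K": 288.1},
    {"station": "B", "date": "2024-08-27", "tas_K": 290.4},
]
outputs = export_formats(records)
print(outputs["csv"])
```

Note that CSV stores every value as text, so a consumer reading it back gets strings, whereas the JSON output preserves numbers; this is one reason different user communities prefer different formats.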
