In [3]:
pip install netCDF4

Collecting netCDF4
  Downloading netCDF4-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (1.8 kB)
Collecting cftime (from netCDF4)
  Downloading cftime-1.6.4.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.7 kB)
Downloading netCDF4-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.1 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m9.1/9.1 MB[0m [31m48.6 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading cftime-1.6.4.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.3/1.3 MB[0m [31m58.9 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: cftime, netCDF4
Successfully installed cftime-1.6.4.post1 netCDF4-1.7.2


### Comparison of Generated Output with Given Output

The comparison is divided into two key parts:

1. **Identifying Missing Attributes**:  
   This step checks for attributes present in the given output but missing in the generated output, ensuring completeness of the data.

2. **Metadata Analysis**:  
   Compares the metadata in both files to verify the presence of global attributes and evaluate consistency between the files.

#NRS01A

Comparing NRS01A Output 1 with the given output

In [5]:
import netCDF4

def compare_netcdf_variables(file1, file2):
    # Open the netCDF files
    nc1 = netCDF4.Dataset(file1, mode='r')
    nc2 = netCDF4.Dataset(file2, mode='r')

    # Define the expected attributes in Output.nc
    expected_attributes = {
        "timestamp", "effort", "psd", "quality_flag", "time",
        "frequency", "cal_frequency", "analog_sensitivity",
        "preamplifier_gain", "recorder_gain", "sensor_sensitivity"
    }

    # Get variables from both files
    variables_file1 = set(nc1.variables.keys())
    variables_file2 = set(nc2.variables.keys())

    # Find missing variables in file2
    missing_in_file2 = expected_attributes - variables_file2

    # Close the files
    nc1.close()
    nc2.close()

    return missing_in_file2

# Specify your file paths
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_H5R6B.1.5000_20180901_DAILY_MILLIDEC_MinRes_v3.nc'  # File to compare

# Perform comparison
missing_attributes = compare_netcdf_variables(file1, file2)

# Display the results
if missing_attributes:
    print("Variables missing in file2 compared to file1:")
    for attr in missing_attributes:
        print(f"- {attr}")
else:
    print("All expected variables are present in file2.")


Variables missing in file2 compared to file1:
- preamplifier_gain


In [7]:


def compare_file_metadata(file1, file2):
    """
    Compare metadata (static and dynamic) between two netCDF files.
    Additionally, display metadata present in file1.

    Parameters:
        file1 (str): Path to the first netCDF file (reference file).
        file2 (str): Path to the second netCDF file (comparison file).

    Returns:
        dict: Contains the differences in metadata between the two files.
    """
    # Open the netCDF files
    nc1 = netCDF4.Dataset(file1, mode='r')
    nc2 = netCDF4.Dataset(file2, mode='r')

    metadata_differences = {
        "global_missing_in_file2": [],
        "global_extra_in_file2": [],
        "global_in_file1": [],
        "variable_missing_in_file2": {},
        "variable_extra_in_file2": {},
        "variable_in_file1": {}
    }

    # Compare global metadata
    file1_global_attrs = set(nc1.ncattrs())
    file2_global_attrs = set(nc2.ncattrs())

    metadata_differences["global_missing_in_file2"] = list(file1_global_attrs - file2_global_attrs)
    metadata_differences["global_extra_in_file2"] = list(file2_global_attrs - file1_global_attrs)
    metadata_differences["global_in_file1"] = list(file1_global_attrs)

    # Compare metadata for variables
    for var in nc1.variables:
        file1_var_attrs = set(nc1.variables[var].ncattrs())
        metadata_differences["variable_in_file1"][var] = list(file1_var_attrs)

        if var in nc2.variables:
            file2_var_attrs = set(nc2.variables[var].ncattrs())

            missing_in_file2 = file1_var_attrs - file2_var_attrs
            extra_in_file2 = file2_var_attrs - file1_var_attrs

            if missing_in_file2:
                metadata_differences["variable_missing_in_file2"][var] = list(missing_in_file2)
            if extra_in_file2:
                metadata_differences["variable_extra_in_file2"][var] = list(extra_in_file2)

    # Close the netCDF files
    nc1.close()
    nc2.close()

    return metadata_differences

# Example usage
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_H5R6B.1.5000_20180901_DAILY_MILLIDEC_MinRes_v3.nc'  # File to compare

metadata_diff = compare_file_metadata(file1, file2)

# Display results
print("Global attributes missing in file2:", metadata_diff["global_missing_in_file2"])
print("Extra global attributes in file2:", metadata_diff["global_extra_in_file2"])
print("Global attributes in file1:", metadata_diff["global_in_file1"])
print("\nVariable metadata missing in file2:")
for var, attrs in metadata_diff["variable_missing_in_file2"].items():
    print(f"Variable '{var}': Missing attributes: {attrs}")

print("\nExtra variable metadata in file2:")
for var, attrs in metadata_diff["variable_extra_in_file2"].items():
    print(f"Variable '{var}': Extra attributes: {attrs}")

print("\nVariable metadata in file1:")
for var, attrs in metadata_diff["variable_in_file1"].items():
    print(f"Variable '{var}': Attributes in file1: {attrs}")



Global attributes missing in file2: []
Extra global attributes in file2: []
Global attributes in file1: ['conventions', 'keywords', 'time_coverage_duration', 'history', 'time_coverage_resolution', 'summary', 'title', 'acknowledgement', 'product_version', 'creator_name', 'instrument', 'infoUrl', 'comment', 'license', 'project', 'publisher_email', 'institution', 'reference', 'time_offset', 'keywords_vocabulary', 'id', 'standard_name_vocabulary', 'publisher_type', 'citation', 'creator_role', 'naming_authority', 'geospatial_bounds', 'publisher_name', 'source', 'date_created', 'publisher_url']

Variable metadata missing in file2:

Extra variable metadata in file2:

Variable metadata in file1:
Variable 'timestamp': Attributes in file1: ['actual_range', 'long_name', 'units', 'standard_name']
Variable 'effort': Attributes in file1: ['coverage_content_type', 'long_name', 'units']
Variable 'psd': Attributes in file1: ['coverage_content_type', 'long_name', 'units', 'standard_name', 'comment']
Var

Comparing NRS01A Output 2 with the given output

In [8]:


def compare_netcdf_variables(file1, file2):
    # Open the netCDF files
    nc1 = netCDF4.Dataset(file1, mode='r')
    nc2 = netCDF4.Dataset(file2, mode='r')

    # Define the expected attributes in Output.nc
    expected_attributes = {
        "timestamp", "effort", "psd", "quality_flag", "time",
        "frequency", "cal_frequency", "analog_sensitivity",
        "preamplifier_gain", "recorder_gain", "sensor_sensitivity"
    }

    # Get variables from both files
    variables_file1 = set(nc1.variables.keys())
    variables_file2 = set(nc2.variables.keys())

    # Find missing variables in file2
    missing_in_file2 = expected_attributes - variables_file2

    # Close the files
    nc1.close()
    nc2.close()

    return missing_in_file2

# Specify your file paths
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_H5R6B.1.5000_20180831_DAILY_MILLIDEC_MinRes_v3-2.nc'  # File to compare

# Perform comparison
missing_attributes = compare_netcdf_variables(file1, file2)

# Display the results
if missing_attributes:
    print("Variables missing in file2 compared to file1:")
    for attr in missing_attributes:
        print(f"- {attr}")
else:
    print("All expected variables are present in file2.")

Variables missing in file2 compared to file1:
- preamplifier_gain


In [9]:


def compare_file_metadata(file1, file2):
    """
    Compare metadata (static and dynamic) between two netCDF files.
    Additionally, display metadata present in file1.

    Parameters:
        file1 (str): Path to the first netCDF file (reference file).
        file2 (str): Path to the second netCDF file (comparison file).

    Returns:
        dict: Contains the differences in metadata between the two files.
    """
    # Open the netCDF files
    nc1 = netCDF4.Dataset(file1, mode='r')
    nc2 = netCDF4.Dataset(file2, mode='r')

    metadata_differences = {
        "global_missing_in_file2": [],
        "global_extra_in_file2": [],
        "global_in_file1": [],
        "variable_missing_in_file2": {},
        "variable_extra_in_file2": {},
        "variable_in_file1": {}
    }

    # Compare global metadata
    file1_global_attrs = set(nc1.ncattrs())
    file2_global_attrs = set(nc2.ncattrs())

    metadata_differences["global_missing_in_file2"] = list(file1_global_attrs - file2_global_attrs)
    metadata_differences["global_extra_in_file2"] = list(file2_global_attrs - file1_global_attrs)
    metadata_differences["global_in_file1"] = list(file1_global_attrs)

    # Compare metadata for variables
    for var in nc1.variables:
        file1_var_attrs = set(nc1.variables[var].ncattrs())
        metadata_differences["variable_in_file1"][var] = list(file1_var_attrs)

        if var in nc2.variables:
            file2_var_attrs = set(nc2.variables[var].ncattrs())

            missing_in_file2 = file1_var_attrs - file2_var_attrs
            extra_in_file2 = file2_var_attrs - file1_var_attrs

            if missing_in_file2:
                metadata_differences["variable_missing_in_file2"][var] = list(missing_in_file2)
            if extra_in_file2:
                metadata_differences["variable_extra_in_file2"][var] = list(extra_in_file2)

    # Close the netCDF files
    nc1.close()
    nc2.close()

    return metadata_differences

# Example usage
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_H5R6B.1.5000_20180831_DAILY_MILLIDEC_MinRes_v3-2.nc'  # File to compare

metadata_diff = compare_file_metadata(file1, file2)

# Display results
print("Global attributes missing in file2:", metadata_diff["global_missing_in_file2"])
print("Extra global attributes in file2:", metadata_diff["global_extra_in_file2"])
print("Global attributes in file1:", metadata_diff["global_in_file1"])
print("\nVariable metadata missing in file2:")
for var, attrs in metadata_diff["variable_missing_in_file2"].items():
    print(f"Variable '{var}': Missing attributes: {attrs}")

print("\nExtra variable metadata in file2:")
for var, attrs in metadata_diff["variable_extra_in_file2"].items():
    print(f"Variable '{var}': Extra attributes: {attrs}")

print("\nVariable metadata in file1:")
for var, attrs in metadata_diff["variable_in_file1"].items():
    print(f"Variable '{var}': Attributes in file1: {attrs}")

Global attributes missing in file2: []
Extra global attributes in file2: []
Global attributes in file1: ['conventions', 'keywords', 'time_coverage_duration', 'history', 'time_coverage_resolution', 'summary', 'title', 'acknowledgement', 'product_version', 'creator_name', 'instrument', 'infoUrl', 'comment', 'license', 'project', 'publisher_email', 'institution', 'reference', 'time_offset', 'keywords_vocabulary', 'id', 'standard_name_vocabulary', 'publisher_type', 'citation', 'creator_role', 'naming_authority', 'geospatial_bounds', 'publisher_name', 'source', 'date_created', 'publisher_url']

Variable metadata missing in file2:

Extra variable metadata in file2:

Variable metadata in file1:
Variable 'timestamp': Attributes in file1: ['actual_range', 'long_name', 'units', 'standard_name']
Variable 'effort': Attributes in file1: ['coverage_content_type', 'long_name', 'units']
Variable 'psd': Attributes in file1: ['coverage_content_type', 'long_name', 'units', 'standard_name', 'comment']
Var

#NRS01B

Comparing NRS01B output 1 with the given output

In [10]:


def compare_netcdf_variables(file1, file2):
    # Open the netCDF files
    nc1 = netCDF4.Dataset(file1, mode='r')
    nc2 = netCDF4.Dataset(file2, mode='r')

    # Define the expected attributes in Output.nc
    expected_attributes = {
        "timestamp", "effort", "psd", "quality_flag", "time",
        "frequency", "cal_frequency", "analog_sensitivity",
        "preamplifier_gain", "recorder_gain", "sensor_sensitivity"
    }

    # Get variables from both files
    variables_file1 = set(nc1.variables.keys())
    variables_file2 = set(nc2.variables.keys())

    # Find missing variables in file2
    missing_in_file2 = expected_attributes - variables_file2

    # Close the files
    nc1.close()
    nc2.close()

    return missing_in_file2

# Specify your file paths
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_2022_H5R6.1.5000_20200917_DAILY_MILLIDEC_MinRes_v3.nc'  # File to compare

# Perform comparison
missing_attributes = compare_netcdf_variables(file1, file2)

# Display the results
if missing_attributes:
    print("Variables missing in file2 compared to file1:")
    for attr in missing_attributes:
        print(f"- {attr}")
else:
    print("All expected variables are present in file2.")

Variables missing in file2 compared to file1:
- preamplifier_gain


In [11]:


def compare_file_metadata(file1, file2):
    """
    Compare metadata (static and dynamic) between two netCDF files.
    Additionally, display metadata present in file1.

    Parameters:
        file1 (str): Path to the first netCDF file (reference file).
        file2 (str): Path to the second netCDF file (comparison file).

    Returns:
        dict: Contains the differences in metadata between the two files.
    """
    # Open the netCDF files
    nc1 = netCDF4.Dataset(file1, mode='r')
    nc2 = netCDF4.Dataset(file2, mode='r')

    metadata_differences = {
        "global_missing_in_file2": [],
        "global_extra_in_file2": [],
        "global_in_file1": [],
        "variable_missing_in_file2": {},
        "variable_extra_in_file2": {},
        "variable_in_file1": {}
    }

    # Compare global metadata
    file1_global_attrs = set(nc1.ncattrs())
    file2_global_attrs = set(nc2.ncattrs())

    metadata_differences["global_missing_in_file2"] = list(file1_global_attrs - file2_global_attrs)
    metadata_differences["global_extra_in_file2"] = list(file2_global_attrs - file1_global_attrs)
    metadata_differences["global_in_file1"] = list(file1_global_attrs)

    # Compare metadata for variables
    for var in nc1.variables:
        file1_var_attrs = set(nc1.variables[var].ncattrs())
        metadata_differences["variable_in_file1"][var] = list(file1_var_attrs)

        if var in nc2.variables:
            file2_var_attrs = set(nc2.variables[var].ncattrs())

            missing_in_file2 = file1_var_attrs - file2_var_attrs
            extra_in_file2 = file2_var_attrs - file1_var_attrs

            if missing_in_file2:
                metadata_differences["variable_missing_in_file2"][var] = list(missing_in_file2)
            if extra_in_file2:
                metadata_differences["variable_extra_in_file2"][var] = list(extra_in_file2)

    # Close the netCDF files
    nc1.close()
    nc2.close()

    return metadata_differences

# Example usage
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_2022_H5R6.1.5000_20200917_DAILY_MILLIDEC_MinRes_v3.nc'  # File to compare

metadata_diff = compare_file_metadata(file1, file2)

# Display results
print("Global attributes missing in file2:", metadata_diff["global_missing_in_file2"])
print("Extra global attributes in file2:", metadata_diff["global_extra_in_file2"])
print("Global attributes in file1:", metadata_diff["global_in_file1"])
print("\nVariable metadata missing in file2:")
for var, attrs in metadata_diff["variable_missing_in_file2"].items():
    print(f"Variable '{var}': Missing attributes: {attrs}")

print("\nExtra variable metadata in file2:")
for var, attrs in metadata_diff["variable_extra_in_file2"].items():
    print(f"Variable '{var}': Extra attributes: {attrs}")

print("\nVariable metadata in file1:")
for var, attrs in metadata_diff["variable_in_file1"].items():
    print(f"Variable '{var}': Attributes in file1: {attrs}")

Global attributes missing in file2: []
Extra global attributes in file2: []
Global attributes in file1: ['conventions', 'keywords', 'time_coverage_duration', 'history', 'time_coverage_resolution', 'summary', 'title', 'acknowledgement', 'product_version', 'creator_name', 'instrument', 'infoUrl', 'comment', 'license', 'project', 'publisher_email', 'institution', 'reference', 'time_offset', 'keywords_vocabulary', 'id', 'standard_name_vocabulary', 'publisher_type', 'citation', 'creator_role', 'naming_authority', 'geospatial_bounds', 'publisher_name', 'source', 'date_created', 'publisher_url']

Variable metadata missing in file2:

Extra variable metadata in file2:

Variable metadata in file1:
Variable 'timestamp': Attributes in file1: ['actual_range', 'long_name', 'units', 'standard_name']
Variable 'effort': Attributes in file1: ['coverage_content_type', 'long_name', 'units']
Variable 'psd': Attributes in file1: ['coverage_content_type', 'long_name', 'units', 'standard_name', 'comment']
Var

Comparing NRS01B Output 2 with the given output

In [12]:


def compare_netcdf_variables(file1, file2):
    # Open the netCDF files
    nc1 = netCDF4.Dataset(file1, mode='r')
    nc2 = netCDF4.Dataset(file2, mode='r')

    # Define the expected attributes in Output.nc
    expected_attributes = {
        "timestamp", "effort", "psd", "quality_flag", "time",
        "frequency", "cal_frequency", "analog_sensitivity",
        "preamplifier_gain", "recorder_gain", "sensor_sensitivity"
    }

    # Get variables from both files
    variables_file1 = set(nc1.variables.keys())
    variables_file2 = set(nc2.variables.keys())

    # Find missing variables in file2
    missing_in_file2 = expected_attributes - variables_file2

    # Close the files
    nc1.close()
    nc2.close()

    return missing_in_file2

# Specify your file paths
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_2022_H5R6.1.5000_20200916_DAILY_MILLIDEC_MinRes_v3.nc'  # File to compare

# Perform comparison
missing_attributes = compare_netcdf_variables(file1, file2)

# Display the results
if missing_attributes:
    print("Variables missing in file2 compared to file1:")
    for attr in missing_attributes:
        print(f"- {attr}")
else:
    print("All expected variables are present in file2.")

Variables missing in file2 compared to file1:
- preamplifier_gain


In [13]:


def compare_file_metadata(file1, file2):
    """
    Compare metadata (static and dynamic) between two netCDF files.
    Additionally, display metadata present in file1.

    Parameters:
        file1 (str): Path to the first netCDF file (reference file).
        file2 (str): Path to the second netCDF file (comparison file).

    Returns:
        dict: Contains the differences in metadata between the two files.
    """
    # Open the netCDF files
    nc1 = netCDF4.Dataset(file1, mode='r')
    nc2 = netCDF4.Dataset(file2, mode='r')

    metadata_differences = {
        "global_missing_in_file2": [],
        "global_extra_in_file2": [],
        "global_in_file1": [],
        "variable_missing_in_file2": {},
        "variable_extra_in_file2": {},
        "variable_in_file1": {}
    }

    # Compare global metadata
    file1_global_attrs = set(nc1.ncattrs())
    file2_global_attrs = set(nc2.ncattrs())

    metadata_differences["global_missing_in_file2"] = list(file1_global_attrs - file2_global_attrs)
    metadata_differences["global_extra_in_file2"] = list(file2_global_attrs - file1_global_attrs)
    metadata_differences["global_in_file1"] = list(file1_global_attrs)

    # Compare metadata for variables
    for var in nc1.variables:
        file1_var_attrs = set(nc1.variables[var].ncattrs())
        metadata_differences["variable_in_file1"][var] = list(file1_var_attrs)

        if var in nc2.variables:
            file2_var_attrs = set(nc2.variables[var].ncattrs())

            missing_in_file2 = file1_var_attrs - file2_var_attrs
            extra_in_file2 = file2_var_attrs - file1_var_attrs

            if missing_in_file2:
                metadata_differences["variable_missing_in_file2"][var] = list(missing_in_file2)
            if extra_in_file2:
                metadata_differences["variable_extra_in_file2"][var] = list(extra_in_file2)

    # Close the netCDF files
    nc1.close()
    nc2.close()

    return metadata_differences

# Example usage
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_2022_H5R6.1.5000_20200916_DAILY_MILLIDEC_MinRes_v3.nc'  # File to compare

metadata_diff = compare_file_metadata(file1, file2)

# Display results
print("Global attributes missing in file2:", metadata_diff["global_missing_in_file2"])
print("Extra global attributes in file2:", metadata_diff["global_extra_in_file2"])
print("Global attributes in file1:", metadata_diff["global_in_file1"])
print("\nVariable metadata missing in file2:")
for var, attrs in metadata_diff["variable_missing_in_file2"].items():
    print(f"Variable '{var}': Missing attributes: {attrs}")

print("\nExtra variable metadata in file2:")
for var, attrs in metadata_diff["variable_extra_in_file2"].items():
    print(f"Variable '{var}': Extra attributes: {attrs}")

print("\nVariable metadata in file1:")
for var, attrs in metadata_diff["variable_in_file1"].items():
    print(f"Variable '{var}': Attributes in file1: {attrs}")

Global attributes missing in file2: []
Extra global attributes in file2: []
Global attributes in file1: ['conventions', 'keywords', 'time_coverage_duration', 'history', 'time_coverage_resolution', 'summary', 'title', 'acknowledgement', 'product_version', 'creator_name', 'instrument', 'infoUrl', 'comment', 'license', 'project', 'publisher_email', 'institution', 'reference', 'time_offset', 'keywords_vocabulary', 'id', 'standard_name_vocabulary', 'publisher_type', 'citation', 'creator_role', 'naming_authority', 'geospatial_bounds', 'publisher_name', 'source', 'date_created', 'publisher_url']

Variable metadata missing in file2:

Extra variable metadata in file2:

Variable metadata in file1:
Variable 'timestamp': Attributes in file1: ['actual_range', 'long_name', 'units', 'standard_name']
Variable 'effort': Attributes in file1: ['coverage_content_type', 'long_name', 'units']
Variable 'psd': Attributes in file1: ['coverage_content_type', 'long_name', 'units', 'standard_name', 'comment']
Var

In [14]:


def compare_variable_metadata(file1, file2):
    """
    Compare variable metadata between two netCDF files.

    Parameters:
        file1 (str): Path to the reference file (given output).
        file2 (str): Path to the file to compare (generated file).

    Returns:
        dict: Differences in variable metadata.
    """
    nc1 = netCDF4.Dataset(file1, 'r')
    nc2 = netCDF4.Dataset(file2, 'r')

    metadata_comparison = {
        "missing_metadata_in_file2": {},
        "extra_metadata_in_file2": {},
        "variables_only_in_file1": []
    }

    for var in nc1.variables:
        if var not in nc2.variables:
            metadata_comparison["variables_only_in_file1"].append(var)
        else:
            # Compare metadata attributes for the variable
            file1_attrs = set(nc1.variables[var].ncattrs())
            file2_attrs = set(nc2.variables[var].ncattrs())

            missing_in_file2 = file1_attrs - file2_attrs
            extra_in_file2 = file2_attrs - file1_attrs

            if missing_in_file2:
                metadata_comparison["missing_metadata_in_file2"][var] = list(missing_in_file2)
            if extra_in_file2:
                metadata_comparison["extra_metadata_in_file2"][var] = list(extra_in_file2)

    nc1.close()
    nc2.close()

    return metadata_comparison

# Example usage
file1 = 'Output.nc'  # Reference file
file2 = 'NRS01_2022_H5R6.1.5000_20200916_DAILY_MILLIDEC_MinRes_v3.nc'  # File to compare

result = compare_variable_metadata(file1, file2)

print("Variables only in file1:", result["variables_only_in_file1"])
print("Missing metadata in file2:")
for var, attrs in result["missing_metadata_in_file2"].items():
    print(f"  - Variable '{var}': Missing attributes {attrs}")
print("Extra metadata in file2:")
for var, attrs in result["extra_metadata_in_file2"].items():
    print(f"  - Variable '{var}': Extra attributes {attrs}")


Variables only in file1: []
Missing metadata in file2:
Extra metadata in file2:


#Code to Check Missing PSD Values

In [18]:
import netCDF4
import numpy as np

def check_missing_psd_multiple_files(file_paths):
    """
    Checks for missing PSD values in multiple netCDF files.

    Parameters:
        file_paths (list): List of paths to netCDF files.

    Returns:
        dict: Results for each file, including total PSD values, missing values, and percentage of missing values.
    """
    results = {}

    for file_path in file_paths:
        try:
            # Open the netCDF file
            nc = netCDF4.Dataset(file_path, mode='r')

            # Check if 'psd' variable exists
            if 'psd' not in nc.variables:
                results[file_path] = {"error": "PSD variable not found in the file."}
                nc.close()
                continue

            # Extract PSD data
            psd_data = nc.variables['psd'][:]

            # Count missing values (NaN)
            missing_count = np.count_nonzero(np.isnan(psd_data))
            total_count = psd_data.size
            missing_percentage = (missing_count / total_count) * 100

            # Store results
            results[file_path] = {
                "total_count": total_count,
                "missing_count": missing_count,
                "missing_percentage": missing_percentage
            }

            # Close the netCDF file
            nc.close()

        except Exception as e:
            # Handle errors gracefully
            results[file_path] = {"error": str(e)}

    return results

# Example usage
file_paths = ["NRS01_2022_H5R6.1.5000_20200916_DAILY_MILLIDEC_MinRes_v3.nc", "NRS01_2022_H5R6.1.5000_20200917_DAILY_MILLIDEC_MinRes_v3.nc", "NRS01_H5R6B.1.5000_20180831_DAILY_MILLIDEC_MinRes_v3-2.nc", "NRS01_H5R6B.1.5000_20180901_DAILY_MILLIDEC_MinRes_v3.nc"]  # Replace with actual file paths

results = check_missing_psd_multiple_files(file_paths)

# Display results
for file, result in results.items():
    print(f"\nResults for {file}:")
    if "error" in result:
        print(f"  Error: {result['error']}")
    else:
        print(f"  Total PSD values: {result['total_count']}")
        print(f"  Missing PSD values: {result['missing_count']}")
        print(f"  Percentage of missing PSD values: {result['missing_percentage']:.2f}%")




Results for NRS01_2022_H5R6.1.5000_20200916_DAILY_MILLIDEC_MinRes_v3.nc:
  Total PSD values: 1719605
  Missing PSD values: 1439
  Percentage of missing PSD values: 0.08%

Results for NRS01_2022_H5R6.1.5000_20200917_DAILY_MILLIDEC_MinRes_v3.nc:
  Total PSD values: 1719605
  Missing PSD values: 1439
  Percentage of missing PSD values: 0.08%

Results for NRS01_H5R6B.1.5000_20180831_DAILY_MILLIDEC_MinRes_v3-2.nc:
  Total PSD values: 1719605
  Missing PSD values: 1439
  Percentage of missing PSD values: 0.08%

Results for NRS01_H5R6B.1.5000_20180901_DAILY_MILLIDEC_MinRes_v3.nc:
  Total PSD values: 1719605
  Missing PSD values: 1439
  Percentage of missing PSD values: 0.08%


#Quality Matrix Check for Multiple Files
This script checks the quality_flag variable across up to four netCDF files to identify:

Unique Values: Lists all unique values present in the quality matrix for each file.


Unexpected Values: Highlights any values outside the specified default range (e.g., [0, 1]).


Errors: Captures issues such as missing variables or inaccessible files.

In [17]:
import netCDF4
import numpy as np

def check_quality_matrix_multiple_files(file_paths, quality_var_name="quality_flag", default_values=[0, 1]):
    """
    Checks if the quality matrix in multiple netCDF files contains values other than the default values.

    Parameters:
        file_paths (list): List of paths to the netCDF files.
        quality_var_name (str): Name of the quality matrix variable (default: 'quality_flag').
        default_values (list): List of expected default values (default: [0, 1]).

    Returns:
        dict: Results for each file, including unique values and unexpected values.
    """
    results = {}

    for file_path in file_paths:
        try:
            # Open the netCDF file
            nc = netCDF4.Dataset(file_path, mode='r')

            # Check if the quality matrix exists
            if quality_var_name not in nc.variables:
                results[file_path] = {
                    "error": f"Variable '{quality_var_name}' not found in the file."
                }
                nc.close()
                continue

            # Extract the quality matrix data
            quality_data = nc.variables[quality_var_name][:]

            # Find unique values in the quality matrix
            unique_values = np.unique(quality_data)

            # Check for unexpected values
            unexpected_values = [value for value in unique_values if value not in default_values]

            # Store results
            results[file_path] = {
                "unique_values": unique_values.tolist(),
                "unexpected_values": unexpected_values
            }

            # Close the netCDF file
            nc.close()

        except Exception as e:
            # Handle errors gracefully
            results[file_path] = {"error": str(e)}

    return results

# Example usage
file_paths = ["NRS01_2022_H5R6.1.5000_20200916_DAILY_MILLIDEC_MinRes_v3.nc", "NRS01_2022_H5R6.1.5000_20200917_DAILY_MILLIDEC_MinRes_v3.nc", "NRS01_H5R6B.1.5000_20180831_DAILY_MILLIDEC_MinRes_v3-2.nc", "NRS01_H5R6B.1.5000_20180901_DAILY_MILLIDEC_MinRes_v3.nc"]  # Replace with actual file paths
results = check_quality_matrix_multiple_files(file_paths, default_values=[0, 1])

# Display results
for file, result in results.items():
    print(f"\nResults for {file}:")
    if "error" in result:
        print(f"  Error: {result['error']}")
    else:
        print(f"  Unique values in quality matrix: {result['unique_values']}")
        if result["unexpected_values"]:
            print(f"  Unexpected values in quality matrix: {result['unexpected_values']}")
        else:
            print("  No unexpected values found in the quality matrix.")




Results for NRS01_2022_H5R6.1.5000_20200916_DAILY_MILLIDEC_MinRes_v3.nc:
  Unique values in quality matrix: [2]
  Unexpected values in quality matrix: [2]

Results for NRS01_2022_H5R6.1.5000_20200917_DAILY_MILLIDEC_MinRes_v3.nc:
  Unique values in quality matrix: [2]
  Unexpected values in quality matrix: [2]

Results for NRS01_H5R6B.1.5000_20180831_DAILY_MILLIDEC_MinRes_v3-2.nc:
  Unique values in quality matrix: [1, 4]
  Unexpected values in quality matrix: [4]

Results for NRS01_H5R6B.1.5000_20180901_DAILY_MILLIDEC_MinRes_v3.nc:
  Unique values in quality matrix: [1, 4]
  Unexpected values in quality matrix: [4]


# Conclusion



The comparison and analysis of the generated files revealed the following insights:

1. **Data Quality Matrix**:
   - The data quality matrix in NRS01B outputs is set to default, as noted by Carrie. However, the NRS01A outputs show variations across sub-directories.
   - Upon examining the files using Panoply and myHDF5, it was observed that the generated outputs have symmetrical patterns similar to the given output file, confirming the code's functionality.
   - Since the values differ across sub-directories but are identical for NRS01B files, this suggests that the default values in NRS01B could be file-specific rather than an issue with the code. Further domain expertise is needed to confirm the correctness of the data quality flag values.

2. **PSD Values**:
   - Each output file contains unique PSD values, indicating the code is generating valid, non-default results.
   - Across all files, 1,439 PSD values (0.08% of the total 1,719,605) are missing. While the missing values will be addressed, the uniqueness of the PSD values demonstrates that the code is working as intended.

3. **Pre-Amp Gain Attribute**:
   - The pre-amp gain attribute is missing or has a default value of `0.000e+0` throughout the output files. The same default value appears in the sample output file, suggesting that it might be the expected value for this dataset.
   - The code will be checked to ensure there are no errors in generating the pre-amp gain attribute. Clarification on the expected values for this attribute is requested to confirm its correctness.

4. **Metadata**:
   - Some metadata values are missing in the output files, but comparisons show that each file has unique metadata attributes, indicating that the code is functioning correctly and producing distinct outputs.
   - CSV files within the same sub-directory are more similar compared to files across different directories, further reflecting the integrity of the generated data.

Overall, the code is effectively generating unique outputs across files and directories. The identified issues (missing PSD values, pre-amp gain attribute, and some metadata) will be investigated and addressed. Feedback on the expected values for the pre-amp gain attribute and the data quality matrix is crucial for further refinement.